Data Catalog Application Evaluation Rubric: Difference between revisions

imported>Razzi
No edit summary
 
imported>Milimetric
No edit summary

Revision as of 20:49, 15 November 2021

Evaluating potential data catalogs for https://phabricator.wikimedia.org/T293643.

Read the Data-as-a-Service Execution plan here.

{| class="wikitable"
|+
!''Name''
!Amundsen
!Atlas
!DataHub
!MediaWiki
|-
|Tagline
|Open source data discovery and metadata engine
|Atlas is a scalable and extensible set of core foundational governance services – enabling enterprises to effectively and efficiently meet their compliance requirements within Hadoop and allows integration with the whole enterprise data ecosystem.
|The Metadata Platform for the Modern Data Stack
|MediaWiki is a collaboration and documentation platform brought to you by a vibrant community.
|-
|Wikipedia Page
|
|
|
|
|-
|Release Date
|
|
|
|
|-
|Website
|https://www.amundsen.io
|https://atlas.apache.org
|https://datahubproject.io
|https://mediawiki.org
|-
|Author
|
|
|
|
|-
|Owner
|
|
|
|
|-
|License
|
|
|
|
|-
|UX
|
|
|
|
|-
|Robustness (criteria TBD)
|
|
|
|
|-
|Comment
|
|
|
|The dogfooding approach: run our data catalog on Wikitech. MediaWiki by itself could be used and updated manually, and any programmatic data access could be accomplished using MediaWiki extensions.
|}

Some other candidates, with the reasons they were not considered more seriously:

{| class="wikitable"
|+
!''Name''
!Metacat
|-
|Tagline
|Metacat is a unified metadata exploration API service. You can explore Hive, RDS, Teradata, Redshift, S3 and Cassandra. Metacat provides you information about what data you have, where it resides and how to process it. Metadata in the end is really data about the data. So the primary purpose of Metacat is to give a place to describe the data so that we could do more useful things with it.
|-
|Link
|https://github.com/Netflix/metacat
|-
|Disqualifying Reasons
|Documentation is still in the "TODO" phase, there are no references to a community or to the kind of organization that Apache projects enjoy, and the scope is somewhat limited.
|}