
2021 data catalog selection/Rubric/Amundsen

Revision as of 18:04, 4 July 2022 by imported>Neil P. Quinn-WMF (Neil P. Quinn-WMF moved page Data Catalog Application Evaluation/Rubric/Amundsen to 2021 data catalog selection/Rubric/Amundsen: Clarify this is not a living document and use title case)

Core Service and Dependency Setup

Ingestion Configuration

Progress Status

Perceptions

Outcome

Razzi's take on Amundsen

Pros:

- simple architecture: three Flask services, all in Python (whereas DataHub uses both Java and Python)

- ingestion architecture is simple: Python scripts or Airflow DAGs that make HTTP API requests

- "social" UI features, such as showing a table's frequent users and owners

- loose coupling means you can use a relational database as the data store rather than Neo4j (https://github.com/amundsen-io/amundsenrds)
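
To make the ingestion point concrete, here is a minimal sketch of what a script-based ingester that talks to an HTTP API could look like. The URL, the payload shape, and the function names are hypothetical and chosen only for illustration; they are not taken from the Amundsen API documentation.

```python
import json
import urllib.request

# Hypothetical endpoint, for illustration only (not a documented Amundsen route).
METADATA_URL = "http://amundsen-metadata.example:5002/hypothetical/table"


def build_ingest_request(table: dict, url: str = METADATA_URL) -> urllib.request.Request:
    """Build (but do not send) an HTTP PUT request carrying one table's metadata."""
    body = json.dumps(table, sort_keys=True).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def ingest(tables, send=urllib.request.urlopen):
    """Send one request per table.

    `send` is injectable so the loop can be exercised without a running service.
    """
    return [send(build_ingest_request(table)) for table in tables]
```

The same two functions could be called from a cron-driven script or wrapped in an Airflow task, which is the sense in which the ingestion side stays simple.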

Cons:

- the community seems to be losing steam: https://github.com/amundsen-io/amundsen#blog-posts-and-interviews lists a flurry of events in 2019 and 2020 but nothing in 2021

- only supports polling for data updates, unless we also deploy Apache Atlas; a push ingest API is on their roadmap

- documentation is somewhat lacking; few ingestion examples, and broken links in docs

- some dependencies are getting out of date: Elasticsearch 6 (v7 was released in 2019), Node.js 12 (v13 was released in 2019)
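
The polling limitation can be illustrated with a small sketch. It assumes a hypothetical `fetch_snapshot` callable that returns a `{table_name: schema_version}` mapping; nothing here is Amundsen code. The point is that a change in the source only becomes visible when the next poll runs, whereas a push API would let the source announce it immediately.

```python
def diff_snapshots(previous: dict, current: dict) -> dict:
    """Compare two {table_name: schema_version} snapshots from successive polls."""
    return {
        "added": sorted(current.keys() - previous.keys()),
        "removed": sorted(previous.keys() - current.keys()),
        "changed": sorted(
            name
            for name in current.keys() & previous.keys()
            if current[name] != previous[name]
        ),
    }


def poll(fetch_snapshot, rounds: int):
    """Call `fetch_snapshot` `rounds` times, yielding the diff after each poll.

    A real job would be scheduled (for example hourly) rather than looping
    back to back; the schedule is left out to keep the sketch testable.
    """
    previous = {}
    for _ in range(rounds):
        current = fetch_snapshot()
        yield diff_snapshots(previous, current)
        previous = current
```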

File:Screenshot of Amundsen home page.png
The Amundsen home page running in Docker, after loading their small sample dataset from example/scripts/sample_data_loader.py
File:Amundsen README screenshot.png
Summary of Amundsen from the README on GitHub.

Amundsen was created by Lyft and is now hosted by the Linux Foundation.