[[File:Atlas data lineage screenshot.png|center|frameless|800x800px]]This was much harder to get running than DataHub. The documentation is on par for Apache, which is to say, quite lacking, and it took dozens of minutes just to start the server.
Atlas .was on .
Currently [[User:Razzi|razzi]] is trying to get Atlas running on the test cluster ([[phab:T296670|T296670]]), which will be much harder since it won't be using Docker. Maybe someday we'll be able to run rootless Docker and generate keytabs on the fly, so that we can use Docker for development and deployment.
to get the [:]
Revision as of 10:32, 8 February 2022
Core Service and Dependency Setup
Atlas 2.2.0 was downloaded as a tarball and compiled on an-test-coord1001 without using any root privileges.
At a later date the HEAD from the GitHub repository was also tried as version 3.0.0-SNAPSHOT.
We used Maven to build the project and selected the BerkeleyDB & Apache Solr profile, which automatically built and started the Solr and Zookeeper dependencies on the same host.
The daemons were started with bin/atlas_start.sh, and the web service was available on port 21000.
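Since the server took a long time to start, a small probe of the web service on port 21000 can confirm when it is actually up. The following is a minimal sketch, not what was used in the evaluation: it assumes the admin version endpoint at <code>/api/atlas/admin/version</code> and HTTP basic authentication, and the host name and credentials passed in are placeholders to be replaced.

```python
import base64
import json
import urllib.request

ATLAS_PORT = 21000  # port reported for the web service in this evaluation


def version_url(host: str, port: int = ATLAS_PORT) -> str:
    """Build the URL of the (assumed) Atlas admin version endpoint."""
    return f"http://{host}:{port}/api/atlas/admin/version"


def fetch_version(host: str, user: str, password: str, timeout: float = 10.0) -> dict:
    """Return the JSON document served by the version endpoint.

    Assumes basic authentication; the credentials are supplied by the
    caller and are not taken from this page.
    """
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(
        version_url(host),
        headers={"Authorization": f"Basic {token}"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Calling <code>fetch_version("localhost", user, password)</code> on the build host raises <code>urllib.error.URLError</code> while the service is still starting, so it can be retried in a loop until it returns.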
The key ingestion elements that we wanted to get working were the Hive Hook and Bridge components, along with the import_hive.sh utility script.
The hook and bridge provide real-time metadata synchronization in Atlas whenever data is changed in Hive.
The import_hive.sh script provides a one-off import of existing Hive data.
The bulk of the time spent on the evaluation went into trying to get the script working.
Getting the import script to talk to the Hive Metastore using an individual's existing Kerberos session took some time. Unfortunately, once this had been achieved, we discovered that the Hive integration with Atlas depends on Hive version 3.1.0 or later. We currently run Hive version 2.3.6.
Therefore, the only ways in which we could proceed with Atlas and its Hive integration were either:
- Upgrade our existing Hive services, along with the underlying Hadoop services
- Downgrade Atlas to version 1.2.0
The upgrade option was the more appealing of the two, but it would require a great deal of work before we could continue with the evaluation.
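The version constraint described above is the kind of check worth running before investing time in the integration. As a hedged illustration only, the comparison can be sketched as follows; the two version strings come from this page, while the function names are hypothetical and the parsing assumes plain dotted numeric versions with no suffixes such as <code>-SNAPSHOT</code>.

```python
from typing import Tuple

HIVE_INSTALLED = "2.3.6"  # Hive version currently deployed, per this page
HIVE_REQUIRED = "3.1.0"   # minimum Hive version needed by the Atlas Hive integration


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a dotted numeric version such as '2.3.6' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def meets_minimum(installed: str, required: str) -> bool:
    """True when the installed version is at least the required one."""
    return parse_version(installed) >= parse_version(required)


if not meets_minimum(HIVE_INSTALLED, HIVE_REQUIRED):
    print(f"Hive {HIVE_INSTALLED} is older than the required {HIVE_REQUIRED}")
```

Comparing tuples of integers rather than raw strings avoids ordering mistakes such as treating "3.10.0" as older than "3.2.0".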
In some ways, Atlas really seems like it might have hit the spot for this requirement. The real-time integration with Hive would have been particularly useful and other projects, such as Amundsen, can build upon this further.
However, there are also a number of ways in which the project did not impress.
- The community did not respond to our requests for assistance on the mailing list.
- There were errors in the pristine 2.2.0 tarball that prevented building, suggesting a lack of quality.
- The monolithic nature of the project makes it difficult to address a single component, such as the Hive connector.
We ceased working on the prototype once the requirement for Hive 3.1 became clear.