Analytics/Systems/DataHub/Upgrading: Difference between revisions

imported>Milimetric
imported>ODimitrijevic
 
The upstream DataHub repository is: https://github.com/linkedin/datahub/
#REDIRECT [[Data Engineering/Systems/DataHub/Upgrading]]
 
At the moment we maintain a fork of DataHub here: https://gerrit.wikimedia.org/r/admin/repos/analytics/datahub
 
The reasons why we do this are:
 
* The DataHub project does not publish binary artifacts other than its Docker images
* We need to keep [[PipelineLib]] configuration files and [[Blubber]] build pipeline files alongside the codebase
 
Currently our changes are made in a [https://gerrit.wikimedia.org/r/plugins/gitiles/analytics/datahub/+/refs/heads/wmf wmf branch], and we frequently squash the changes on that branch down to a single commit.
 
When a new release is required, we perform the following operations:
 
* Update the code in a feature branch
* Merge to the wmf branch to publish the new containers
* Create a feature branch in the deployment-charts repository and update the image version in the helm charts
* Deploy the new version with <code>helmfile</code>
 
== Update the code ==
* Check out the code locally.
* Add the upstream remote if it does not already exist
<code>
git remote add linkedin-github git@github.com:datahub-project/datahub.git
</code>
* Fetch the latest upstream changes, including the master branch, from the <code>linkedin-github</code> remote.
<code>
git remote update linkedin-github
</code>
* Push the master branch from the upstream repository to our gerrit repository.
<code>
git push origin linkedin-github/master:master
</code>
* Also push the tags to the remote repository
<code>
git push origin --tags
</code>
* Check out the <code>wmf</code> branch.
<code>
git checkout wmf
</code>
* Rebase the <code>wmf</code> branch onto the tag of the new version. In this example the tag is <code>v0.8.34</code>.
<code>
git rebase -i v0.8.34
</code>
* Resolve any conflicts encountered during the rebase, then continue with <code>git rebase --continue</code>
* Force-push the branch to gerrit
<code>
git push --force-with-lease
</code>
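
The sequence above can be reviewed as a whole before anything is run. The following is a minimal sketch, not part of any existing tooling: it only prints the git commands from this section for a given release tag, so that they can be checked and then pasted one at a time. The function name <code>plan_upgrade</code> and the variable <code>UPSTREAM_REMOTE</code> are hypothetical, and the tag check is deliberately loose. The one-off <code>git remote add</code> step is left out because it is only needed the first time.

```shell
#!/bin/sh
# Sketch only: prints the git commands from this section for one release tag.
# plan_upgrade and UPSTREAM_REMOTE are illustrative names (assumptions).
set -eu

UPSTREAM_REMOTE="linkedin-github"

plan_upgrade() {
  tag="$1"
  # Loose sanity check: accept only names shaped roughly like v0.8.34.
  case "$tag" in
    v[0-9]*.[0-9]*.[0-9]*) ;;
    *) echo "unexpected tag format: $tag" >&2; return 1 ;;
  esac
  # Nothing is executed here; the commands are only written to stdout.
  cat <<EOF
git remote update ${UPSTREAM_REMOTE}
git push origin ${UPSTREAM_REMOTE}/master:master
git push origin --tags
git checkout wmf
git rebase -i ${tag}
git push --force-with-lease
EOF
}
```

For example, <code>plan_upgrade v0.8.34</code> lists the six commands with the tag filled in, while <code>plan_upgrade master</code> is rejected. Printing rather than executing is intentional, since the interactive rebase and the force-push both need a human decision.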
 
== Deploy datahub CLI tool ==
 
The version of the CLI tool has to match the server version, so we have to:
 
* Update the datahub-cli version in the [https://gerrit.wikimedia.org/r/c/analytics/refinery/+/792215 packaged virtual environment]
* Build and publish to Archiva: [[Analytics/Systems/Archiva#Uploading_Dependency_Artifacts]]
* Update the artifact version and metadata ingestion jobs in the airflow jobs repository
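
Because the CLI and server versions have to match, a small comparison can catch a mismatch before the jobs are updated. This is a hedged sketch: the helper names <code>normalize_version</code> and <code>versions_match</code> are hypothetical, and it assumes that the two version strings have already been looked up by hand and that the only difference in notation is a leading <code>v</code> on the git tag.

```shell
#!/bin/sh
# Sketch only: compares a CLI version string with a server version string.
# normalize_version and versions_match are hypothetical helper names.
set -eu

normalize_version() {
  # Strip a single leading "v" so "v0.8.34" and "0.8.34" compare equal.
  printf '%s\n' "${1#v}"
}

versions_match() {
  # Succeeds (exit status 0) only when both normalized strings are identical.
  [ "$(normalize_version "$1")" = "$(normalize_version "$2")" ]
}
```

A possible use is <code>versions_match "0.8.34" "v0.8.34" || echo "CLI and server versions differ"</code>, with the two arguments replaced by the values found in the virtual environment and in the deployed release.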

Latest revision as of 16:31, 2 September 2022