Apache Gobblin is Hadoop ingestion software used at WMF primarily to import data from Kafka into HDFS.
Gobblin jobs are declared in Puppet.
WMF's Gobblin fork
The Data Engineering team maintains a fork of Gobblin. We use the fork's wmf branch to maintain our own Gobblin module, gobblin-wmf, which mostly contains code for interacting with Event Platform based events in Kafka. The master branch should track upstream.
Releasing new Gobblin versions
We upload our gobblin-wmf artifacts directly to Archiva, then add them as git-fat jar files in analytics/refinery (see Analytics/Systems/Cluster/Deploy/Refinery), and deploy them as we do other jar artifacts in analytics/refinery.
As of 2021-07, we do not have an automated release process for Gobblin. You must manually upload the packaged artifact jars to Archiva, then manually download them and git add them to analytics/refinery.