Analytics/Systems/Airflow: Difference between revisions

WIP documentation page.
#REDIRECT [[Data Engineering/Systems/Airflow]]
See also
= Airflow Instances =
== analytics ==
Airflow instance owned by the Data / Analytics engineering team.
== analytics-test ==
Airflow test instance owned by the Data / Analytics engineering team.
{| class="wikitable"
| Host || an-test-coord1001.eqiad.wmnet
|-
| Web UI Port || 8600
|-
| Dags || refinery/airflow/dags
|}
SSH tunnel to the Web UI:
  ssh -t -N -L8600:localhost:8600 an-test-coord1001.eqiad.wmnet
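Once the tunnel is running, it can be useful to confirm that the forwarded port accepts connections before opening a browser. The following is a minimal sketch, assuming the tunnel binds local port 8600 on <code>localhost</code>; the helper name <code>is_port_open</code> is hypothetical and not part of any Airflow or Wikimedia tooling.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    # 8600 is the Web UI port listed in the table above; "localhost" is the
    # assumed local end of the SSH tunnel.
    print("tunnel up" if is_port_open("localhost", 8600) else "tunnel down")
```

This only checks that something is listening on the port; it does not verify that the listener is the Airflow webserver.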
== search ==
= Administration =
== Creating a new Airflow Instance ==
In this example, we'll be creating a new Airflow instance named 'test'.
=== Create the Airflow MySQL Database ===
You'll need a running MariaDB instance somewhere.  If your MariaDB instance is replicated, you'll need to run the <code>GRANT</code> statement on the replicas as well.
<syntaxhighlight lang="sql">
CREATE DATABASE airflow_test;
CREATE USER 'airflow_test' IDENTIFIED BY 'password_here';
GRANT ALL PRIVILEGES ON airflow_test.* TO 'airflow_test';
</syntaxhighlight>
Make sure your MariaDB config sets <code>explicit_defaults_for_timestamp = on</code>; Airflow requires this setting for MySQL-compatible database backends.
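If you create several instances, the three statements can be generated from the instance name. This is a rough sketch, assuming the database and user are both named <code>airflow_&lt;instance&gt;</code> as in the example for 'test'; the function <code>airflow_db_statements</code> is hypothetical, and it rejects awkward passwords instead of trying to escape them.

```python
import re


def airflow_db_statements(instance: str, password: str) -> list:
    """Build the CREATE/GRANT statements for an Airflow instance database.

    Assumes the naming convention airflow_<instance> for both the
    database and the user, mirroring the 'test' example.
    """
    if not re.fullmatch(r"[a-z][a-z0-9_]*", instance):
        raise ValueError(f"unsupported instance name: {instance!r}")
    if "'" in password or "\\" in password:
        # Keep the sketch simple: refuse characters that would need escaping.
        raise ValueError("password must not contain quotes or backslashes")
    name = f"airflow_{instance}"
    return [
        f"CREATE DATABASE {name};",
        f"CREATE USER '{name}' IDENTIFIED BY '{password}';",
        f"GRANT ALL PRIVILEGES ON {name}.* TO '{name}';",
    ]


if __name__ == "__main__":
    for statement in airflow_db_statements("test", "password_here"):
        print(statement)
```

Review the generated statements before running them, and remember that the <code>GRANT</code> may also need to be applied on replicas.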
=== Configure the Airflow instance in Puppet ===
Add the <code>profile::airflow</code> class to your node's role in Puppet and configure the Airflow instance(s) in your role's hiera.
Let's assume we're adding this instance in a role class <code>role::airflow::test</code>.
<syntaxhighlight lang="puppet">
class role::airflow::test {
    include ::profile::airflow
    # profile::kerberos::keytabs is needed if your Airflow
    # instance needs to authenticate with Kerberos.
    # You'll need to create and configure the keytab for the Airflow instance's
    # $service_user we'll set below.
    include ::profile::kerberos::keytabs
}
</syntaxhighlight>
Then, in <code>hieradata/role/common/airflow/test.yaml</code>:
<syntaxhighlight lang="yaml">
# Set up airflow instances.
profile::airflow::instances:
  # airflow@test instance.
  test:
    # Since we set security: kerberos, a keytab must be deployed for the service_user.
    service_user: test_user
    service_group: test_group
    # Set this to true if you want to enable alerting for your airflow instance.
    monitoring_enabled: false
    # Configuration for /srv/airflow-test/airflow.cfg
    # Any airflow configs can go here.
    airflow_config:
      core:
        security: kerberos # you don't need to set this if you don't use Kerberos.
        executor: LocalExecutor
        # This can be an ERB template that will be rendered in airflow::instance.
        # db_user and db_password params should be set in puppet private
        # in profile::airflow::instances_secrets.
        sql_alchemy_conn: mysql://<%= @db_user %>:<%= @db_password %>@my-db-host.eqiad.wmnet/airflow_test?ssl_ca=/etc/ssl/certs/Puppet_Internal_CA.pem

# Make sure the keytab for test_user is deployed via profile::kerberos::keytabs
profile::kerberos::keytabs::keytabs_metadata:
  - role: 'test_user'
    owner: 'test_user'
    group: 'test_group'
    filename: 'test_user.keytab'
</syntaxhighlight>
See [[Analytics/Systems/Kerberos#Create_a_keytab_for_a_service|Create_a_keytab_for_a_service]] for instructions on creating keytabs.
Note that we didn't set <code>db_user</code> or <code>db_password</code>.  These are secrets and should be set in the [[Puppet#Private_puppet|operations puppet private repository]] in the hiera variable <code>profile::airflow::instances_secrets</code>.  So, in puppet private in the <code>hieradata/role/common/airflow/test.yaml</code> file:
<syntaxhighlight lang="yaml">
# Set up airflow instances.
profile::airflow::instances_secrets:
  # airflow@test instance.
  test:
    db_user: airflow_test
    db_password: password_here
</syntaxhighlight>
<code>profile::airflow::instances_secrets</code> will be merged with <code>profile::airflow::instances</code> by the <code>profile::airflow</code> class, and the resulting parameters to <code>airflow::instance</code> will be available to the <code>sql_alchemy_conn</code> ERB template.
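To make the merge-then-render step concrete, here is an illustrative Python sketch. It is not how Puppet implements this: it assumes a shallow per-instance merge and only handles simple <code>&lt;%= @name %&gt;</code> placeholders. The function names <code>merge_instances</code> and <code>render</code> are hypothetical.

```python
import re

# Matches simple ERB-style placeholders such as <%= @db_user %>.
ERB_VAR = re.compile(r"<%=\s*@(\w+)\s*%>")


def merge_instances(instances: dict, secrets: dict) -> dict:
    """Overlay each instance's secrets on top of its public parameters."""
    merged = {}
    for name, params in instances.items():
        merged[name] = {**params, **secrets.get(name, {})}
    return merged


def render(template: str, params: dict) -> str:
    """Replace each placeholder with the matching parameter value.

    Raises KeyError if the template references a parameter that is not set.
    """
    return ERB_VAR.sub(lambda m: str(params[m.group(1)]), template)


if __name__ == "__main__":
    instances = {"test": {"service_user": "test_user"}}
    secrets = {"test": {"db_user": "airflow_test", "db_password": "password_here"}}
    template = "mysql://<%= @db_user %>:<%= @db_password %>@my-db-host.eqiad.wmnet/airflow_test"
    params = merge_instances(instances, secrets)["test"]
    print(render(template, params))
```

The <code>KeyError</code> on a missing parameter mirrors the practical failure mode: if the secrets are not defined in the private repository, the connection string cannot be rendered.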
Once this is merged and applied, the node with the <code>role::airflow::test</code> role will run the systemd services <code>airflow-scheduler@test</code>, <code>airflow-webserver@test</code> and <code>airflow-kerberos@test</code>, as well as two 'control' systemd services, <code>airflow@test</code> and <code>airflow</code>, that can be used to manage the Airflow test instance.
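After the Puppet run, you may want to check that the per-instance services are active. The sketch below derives the unit names from the instance name and asks <code>systemctl is-active</code> about each one. The helper names are hypothetical, and it assumes it is run on the Airflow host itself.

```python
import subprocess


def instance_units(instance: str) -> list:
    """Return the per-instance systemd unit names described above."""
    return [
        f"airflow-scheduler@{instance}",
        f"airflow-webserver@{instance}",
        f"airflow-kerberos@{instance}",
    ]


def unit_is_active(unit: str) -> bool:
    """Return True if `systemctl is-active` exits 0 for the unit."""
    result = subprocess.run(["systemctl", "is-active", "--quiet", unit])
    return result.returncode == 0


if __name__ == "__main__":
    for unit in instance_units("test"):
        state = "active" if unit_is_active(unit) else "not active"
        print(f"{unit}: {state}")
```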

Latest revision as of 16:30, 2 September 2022