Analytics/Systems/Airflow
Revision as of 14:04, 29 July 2021

Work-in-progress documentation page.

See also:


= Airflow Instances =

== analytics ==

Airflow instance owned by the Data / Analytics engineering team.

{| class="wikitable"
|-
| Host || an-launcher1002.eqiad.wmnet
|-
| Web UI Port || 8600
|-
| Dags || [https://gerrit.wikimedia.org/r/plugins/gitiles/analytics/refinery/+/refs/heads/master/airflow/dags refinery/airflow/dags]
|-
| Service user || analytics
|}

SSH Tunnel to Web UI:

 ssh -t -N -L8600:127.0.0.1:8600 an-launcher1002.eqiad.wmnet

and navigate to http://localhost:8600
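If you open this tunnel often, an SSH client configuration entry saves retyping it. A hypothetical <tt>~/.ssh/config</tt> fragment (the <tt>Host</tt> alias is an invented name, and any <tt>User</tt> or <tt>ProxyJump</tt> settings your environment needs are up to you):

 # Hypothetical alias; add User/ProxyJump as your SSH setup requires.
 Host airflow-analytics
     HostName an-launcher1002.eqiad.wmnet
     LocalForward 8600 127.0.0.1:8600

With that in place, <tt>ssh -N airflow-analytics</tt> opens the same tunnel.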

== analytics-test ==

Airflow test instance owned by the Data / Analytics engineering team.

{| class="wikitable"
|-
| Host || an-test-client1001.eqiad.wmnet
|-
| Web UI Port || 8600
|-
| Dags || /srv/airflow-analytics-test-dags
|-
| Service user || analytics
|}

SSH Tunnel to Web UI:

 ssh -t -N -L8600:127.0.0.1:8600 an-test-client1001.eqiad.wmnet

and navigate to http://localhost:8600

== search ==

TODO

== research ==

Airflow instance owned by the Research team.

{| class="wikitable"
|-
| Host || an-airflow1002.eqiad.wmnet
|-
| Web UI Port || 8600
|-
| Dags || <tt>/srv/airflow-research/dags</tt>
|-
| Service user || analytics-research
|}

SSH Tunnel to Web UI:

 ssh -t -N -L8600:127.0.0.1:8600 an-airflow1002.eqiad.wmnet

and navigate to http://localhost:8600

== platform_eng ==

Airflow instance owned by the Platform Engineering team.

{| class="wikitable"
|-
| Host || an-airflow1003.eqiad.wmnet
|-
| Web UI Port || 8600
|-
| Dags || <tt>/srv/airflow-platform_eng/dags</tt>
|-
| Service user || analytics-platform-eng
|}

SSH Tunnel to Web UI:

 ssh -t -N -L8600:127.0.0.1:8600 an-airflow1003.eqiad.wmnet

and navigate to http://localhost:8600

= Administration =

== Creating a new Airflow Instance ==

In this example, we'll be creating a new Airflow instance named 'test'.

=== Create the Airflow MySQL Database ===

You'll need a running MariaDB instance somewhere.

 CREATE DATABASE airflow_test;
 CREATE USER 'airflow_test' IDENTIFIED BY 'password_here';
 GRANT ALL PRIVILEGES ON airflow_test.* TO 'airflow_test';

Make sure your MariaDB config sets explicit_defaults_for_timestamp = on. See: https://airflow.apache.org/docs/apache-airflow/2.1.0/howto/set-up-database.html#setting-up-a-mysql-database
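Concretely, that means a fragment like the following in the server's MariaDB configuration. This is a minimal sketch; the file path is the stock Debian/MariaDB layout, not something this page prescribes:

 # e.g. in /etc/mysql/mariadb.conf.d/50-server.cnf (path is illustrative)
 [mysqld]
 explicit_defaults_for_timestamp = on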

=== Configure the Airflow instance in Puppet ===

Add the profile::airflow class to your node's role in Puppet and configure the Airflow instance(s) in your role's hiera.

Let's assume we're adding this instance in a role class role::airflow::test.

class role::airflow::test {
    include ::profile::airflow
    # profile::kerberos::keytabs is needed if your Airflow
    # instance needs to authenticate with Kerberos.
    # You'll need to create and configure the keytab for the Airflow instance's
    # $service_user we'll set below.
    include ::profile::kerberos::keytabs
}

Then, in hieradata/role/common/airflow/test.yaml:

# Set up airflow instances.
profile::airflow::instances:
  # airflow@test instance.
  test:
    # Since we set security: kerberos a keytab must be deployed for the service_user.
    service_user: test_user
    service_group: test_group
    # Set this to true if you want to enable alerting for your airflow instance.
    monitoring_enabled: false
    # Configuration for /srv/airflow-test/airflow.cfg
    # Any airflow configs can go here. See:
    # https://airflow.apache.org/docs/apache-airflow/stable/configurations-ref.html#webserver
    airflow_config:
      core:
        security: kerberos # you don't need to set this if you don't use Kerberos.
        executor: LocalExecutor
        # This can be an ERB template that will be rendered in airflow::instance.
        # db_user and db_password params should be set in puppet private
        # in profile::airflow::instances_secrets.
        sql_alchemy_conn: mysql://<%= @db_user %>:<%= @db_password %>@my-db-host.eqiad.wmnet/airflow_test?ssl_ca=/etc/ssl/certs/Puppet_Internal_CA.pem

# Make sure the keytab for test_user is deployed via profile::kerberos::keytabs
profile::kerberos::keytabs::keytabs_metadata:
  - role: 'test_user'
    owner: 'test_user'
    group: 'test_group'
    filename: 'test_user.keytab'

See [[Create a keytab for a service]] for instructions on creating keytabs.

Note that we didn't set db_user or db_password. These are secrets and should be set in the operations puppet private repository in the hiera variable profile::airflow::instances_secrets. So, in puppet private in the hieradata/role/common/airflow/test.yaml file:

# Set up airflow instances.
profile::airflow::instances_secrets:
  # airflow@test instance.
  test:
    db_user: airflow_test
    db_password: password_here

profile::airflow::instances_secrets is merged with profile::airflow::instances by the profile::airflow class, and the merged parameters to airflow::instance are then available when sql_alchemy_conn is rendered as an ERB template.
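To illustrate the templating step outside of Puppet, here is a simplified Ruby sketch. The values are hypothetical stand-ins for the puppet-private secrets, and it uses local variables rather than the <tt>@db_user</tt>/<tt>@db_password</tt> instance variables Puppet exposes:

```ruby
require 'erb'

# Stand-in values; in production these come from puppet private
# via profile::airflow::instances_secrets.
db_user     = 'airflow_test'
db_password = 'password_here'

# Simplified version of the sql_alchemy_conn ERB template.
template = 'mysql://<%= db_user %>:<%= db_password %>@my-db-host.eqiad.wmnet/airflow_test'
conn = ERB.new(template).result(binding)
puts conn
# -> mysql://airflow_test:password_here@my-db-host.eqiad.wmnet/airflow_test
```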

Once this is merged and applied, the node with the role::airflow::test will run the systemd services airflow-scheduler@test, airflow-webserver@test, airflow-kerberos@test, as well as some 'control' systemd services airflow@test and airflow that can be used to manage the Airflow test instance.

Create the Airflow tables by running:

 sudo -u test_user airflow-test db upgrade

The airflow services were probably already started by the earlier puppet run. Restart them now that the airflow tables are created properly.

 sudo systemctl restart airflow@test.service