Analytics/Systems/Airflow
WIP documentation page.
Airflow Instances
analytics
Airflow instance owned by the Data / Analytics engineering team.
Host: an-launcher1002.eqiad.wmnet
Web UI Port: 8600
Dags: refinery/airflow/dags (https://gerrit.wikimedia.org/r/plugins/gitiles/analytics/refinery/+/refs/heads/master/airflow/dags)
Service user: analytics
SSH Tunnel to Web UI:
ssh -t -N -L8600:127.0.0.1:8600 an-launcher1002.eqiad.wmnet
and navigate to http://localhost:8600
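These hosts are only reachable through the production bastions, so the tunnel command assumes your SSH client already knows how to route to *.eqiad.wmnet. A minimal sketch of an ~/.ssh/config fragment, assuming you have production shell access (the user and bastion names below are placeholders, not actual values from this page):

Host *.eqiad.wmnet
    User your-shell-username
    ProxyJump your-bastion.wikimedia.org

The same tunnel pattern applies to every instance below; only the target host changes.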
analytics-test
Airflow test instance owned by the Data / Analytics engineering team.
Host: an-test-client1001.eqiad.wmnet
Web UI Port: 8600
Dags: /srv/airflow-analytics-test-dags
Service user: analytics
SSH Tunnel to Web UI:
ssh -t -N -L8600:127.0.0.1:8600 an-test-client1001.eqiad.wmnet
and navigate to http://localhost:8600
search
TODO
research
Airflow instance owned by the Research team.
Host: an-airflow1002.eqiad.wmnet
Web UI Port: 8600
Dags: /srv/airflow-research/dags
Service user: analytics-research
SSH Tunnel to Web UI:
ssh -t -N -L8600:127.0.0.1:8600 an-airflow1002.eqiad.wmnet
and navigate to http://localhost:8600
platform-eng
Airflow instance owned by the Platform Engineering team.
Host: an-airflow1003.eqiad.wmnet
Web UI Port: 8600
Dags: /srv/airflow-platform_eng/dags
Service user: analytics-platform-eng
SSH Tunnel to Web UI:
ssh -t -N -L8600:127.0.0.1:8600 an-airflow1003.eqiad.wmnet
and navigate to http://localhost:8600
Administration
Creating a new Airflow Instance
In this example, we'll be creating a new Airflow instance named 'test'.
Create the Airflow MySQL Database
You'll need a running MariaDB instance somewhere.
CREATE DATABASE airflow_test;
CREATE USER 'airflow_test' IDENTIFIED BY 'password_here';
GRANT ALL PRIVILEGES ON airflow_test.* TO 'airflow_test';
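Optionally, sanity-check the new account with a standard MariaDB statement:

SHOW GRANTS FOR 'airflow_test';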
Make sure your MariaDB config sets explicit_defaults_for_timestamp = on. See: https://airflow.apache.org/docs/apache-airflow/2.1.0/howto/set-up-database.html#setting-up-a-mysql-database
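A minimal sketch of where that setting lives, assuming a stock MariaDB layout (the exact config file path varies by installation):

[mysqld]
explicit_defaults_for_timestamp = on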
Configure the Airflow instance in Puppet
Add the profile::airflow class to your node's role in Puppet and configure the Airflow instance(s) in your role's hiera. Let's assume we're adding this instance in a role class role::airflow::test.
class role::airflow::test {
    include ::profile::airflow

    # profile::kerberos::keytabs is needed if your Airflow
    # instance needs to authenticate with Kerberos.
    # You'll need to create and configure the keytab for the Airflow instance's
    # $service_user we'll set below.
    include ::profile::kerberos::keytabs
}
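The role then gets assigned to a node in manifests/site.pp with the role() function, per the usual operations/puppet convention (the node name here is purely illustrative):

node 'my-airflow-host.eqiad.wmnet' {
    role(airflow::test)
}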
Then, in hieradata/role/common/airflow/test.yaml:
# Set up airflow instances.
profile::airflow::instances:
  # airflow@test instance.
  test:
    # Since we set security: kerberos, a keytab must be deployed for the service_user.
    service_user: test_user
    service_group: test_group
    # Set this to true if you want to enable alerting for your airflow instance.
    monitoring_enabled: false
    # Configuration for /srv/airflow-test/airflow.cfg.
    # Any airflow configs can go here. See:
    # https://airflow.apache.org/docs/apache-airflow/stable/configurations-ref.html#webserver
    airflow_config:
      core:
        security: kerberos # you don't need to set this if you don't use Kerberos.
        executor: LocalExecutor
        # This can be an ERB template that will be rendered in airflow::instance.
        # db_user and db_password params should be set in puppet private
        # in profile::airflow::instances_secrets.
        sql_alchemy_conn: mysql://<%= @db_user %>:<%= @db_password %>@my-db-host.eqiad.wmnet/airflow_test?ssl_ca=/etc/ssl/certs/Puppet_Internal_CA.pem

# Make sure the keytab for test_user is deployed via profile::kerberos::keytabs.
profile::kerberos::keytabs::keytabs_metadata:
  - role: 'test_user'
    owner: 'test_user'
    group: 'test_group'
    filename: 'test_user.keytab'
See Create_a_keytab_for_a_service for instructions on creating keytabs.
Note that we didn't set db_user or db_password. These are secrets and should be set in the operations puppet private repository in the hiera variable profile::airflow::instances_secrets. So, in puppet private in the hieradata/role/common/airflow/test.yaml file:
# Set up airflow instances.
profile::airflow::instances_secrets:
  # airflow@test instance.
  test:
    db_user: airflow_test
    db_password: password_here
profile::airflow::instances_secrets will be merged with profile::airflow::instances by the profile::airflow class, and the parameters to airflow::instance will be available for use in the sql_alchemy_conn as an ERB template.
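With the example values above, the rendered sql_alchemy_conn would come out as (illustrative only; my-db-host is the placeholder from the config above):

mysql://airflow_test:password_here@my-db-host.eqiad.wmnet/airflow_test?ssl_ca=/etc/ssl/certs/Puppet_Internal_CA.pem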
Once this is merged and applied, the node with the role::airflow::test role will run the systemd services airflow-scheduler@test, airflow-webserver@test, and airflow-kerberos@test, as well as the 'control' systemd units airflow@test and airflow that can be used to manage the Airflow test instance.
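For example, to check on the individual units (standard systemd commands; the unit names are those listed above):

sudo systemctl status airflow-scheduler@test.service
sudo systemctl status airflow-webserver@test.service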
Create the airflow tables by running
sudo -u test_user airflow-test db upgrade
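To verify that Airflow can actually reach its database, Airflow 2.x also ships a db check subcommand; a sketch assuming the same per-instance wrapper as above:

sudo -u test_user airflow-test db check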
The airflow services were probably already started by the earlier puppet run. Restart them now that the airflow tables are created properly.
sudo systemctl restart airflow@test.service
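If a unit fails to come up after the restart, the systemd journal is the first place to look (standard journalctl usage with the unit names above):

sudo journalctl -u airflow-scheduler@test.service -f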