
Dumps/Airflow/Operations

From Wikitech

Making a change to the SQL/XML Dump DAGs

These DAGs are defined differently from the rest: they are code-generated instead of being written by hand. We used to define all of them in a single Python file, which took about 2 minutes to parse. Not only was that file a hotspot in the DAG parsing phase, but the DAGs would also temporarily disappear while that parsing was occurring.

To take full advantage of the DAG parsing parallelism (by default, we have twice as many DAG parsing workers as available CPUs in the scheduler pod), we decided to code-generate the XML/SQL DAG files, with each file defining a single DAG.

This means that if you want to make a change to these DAGs, you need to change the DAG template file and/or the code generation script, and then regenerate the DAG files.

  • The code generation script defines the wikis in scope for each DAG, the DAG keyword arguments, the Airflow pool, etc., and renders each DAG file by injecting these parameters into the DAG file template
  • The DAG file template contains the actual DAG code, with string template placeholders for each of the parameters injected by the code generation script
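The mechanics can be sketched as follows. This is an illustrative toy example, not the actual script: the real template, parameter names, and file layout live in airflow-dags.

```python
from string import Template

# A toy DAG file template; the real template in airflow-dags is far more involved.
DAG_TEMPLATE = Template(
    '''"""Code-generated file: do not edit by hand."""
dag_id = "$dag_id"
dag_kwargs = {
    "max_active_tasks": $max_active_tasks,
    "default_args": {"email": "$alert_email"},
}
wikis = $wikis
'''
)


def render_dag_file(dag_id, wikis, max_active_tasks, alert_email):
    """Inject per-DAG parameters into the template, producing the content of one DAG file."""
    return DAG_TEMPLATE.substitute(
        dag_id=dag_id,
        wikis=wikis,
        max_active_tasks=max_active_tasks,
        alert_email=alert_email,
    )


print(render_dag_file(
    "mediawiki_dumps_sql_xml_regular_a_to_b_full",
    ["aawiki", "abwiki"],
    32,
    "data-platform-alerts@wikimedia.org",
))
```

Because each rendered file is fully self-contained and defines a single DAG, the scheduler can parse all of them in parallel instead of serializing on one big module.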

For example, suppose we want to remove the wip=True DAG keyword argument from each of these DAGs. To do this, we would remove it from the code generation script.

diff --git a/scripts/generate_test_k8s_sql_xml_dump_dags.py b/scripts/generate_test_k8s_sql_xml_dump_dags.py
index a3c28060..a152e8b9 100755
--- a/scripts/generate_test_k8s_sql_xml_dump_dags.py
+++ b/scripts/generate_test_k8s_sql_xml_dump_dags.py
@@ -104,7 +104,6 @@ common_dag_kwargs = {
     "default_args": {
         "email": DUMPS_ALERTS_RECIPIENT,
     },
-    "wip": True,
 }

 RegularSqlXmlDumps = Dumps(

We would then regenerate all DAG files; the make target also runs black and isort on the generated files.

$ make test_k8s/dags/dumps/sql_xml
reformatted /Users/brouberol/wmf/airflow-dags/test_k8s/dags/dumps/sql_xml/mediawiki_dumps_sql_xml_large_a_to_z_full.py
...
reformatted /Users/brouberol/wmf/airflow-dags/test_k8s/dags/dumps/sql_xml/mediawiki_dumps_sql_xml_regular_t_to_v_partial.py

All done!  🍰 20 files reformatted, 1 file left unchanged.
Fixing /Users/brouberol/wmf/airflow-dags/test_k8s/dags/dumps/sql_xml/mediawiki_dumps_sql_xml_regular_l_to_m_full.py
...
Fixing /Users/brouberol/wmf/airflow-dags/test_k8s/dags/dumps/sql_xml/mediawiki_dumps_sql_xml_regular_a_to_b_full.py

You'll see your change reflected in all DAG files (partial diff for brevity):

diff --git a/test_k8s/dags/dumps/sql_xml/mediawiki_dumps_sql_xml_regular_a_to_b_partial.py b/test_k8s/dags/dumps/sql_xml/mediawiki_dumps_sql_xml_regular_a_to_b_partial.py
index 07170f48..ae42caf5 100644
--- a/test_k8s/dags/dumps/sql_xml/mediawiki_dumps_sql_xml_regular_a_to_b_partial.py
+++ b/test_k8s/dags/dumps/sql_xml/mediawiki_dumps_sql_xml_regular_a_to_b_partial.py
@@ -271,7 +271,6 @@ dag_kwargs = {
     "max_active_runs": 1,
     "max_active_tasks": 32,
     "default_args": {"email": "data-platform-alerts@wikimedia.org"},
-    "wip": True,
 }
 dag_kwargs["schedule"] = PARTIAL_DUMP_SCHEDULE
 dag_kwargs["user_defined_filters"] = filters
diff --git a/test_k8s/dags/dumps/sql_xml/mediawiki_dumps_sql_xml_regular_c_to_e_full.py b/test_k8s/dags/dumps/sql_xml/mediawiki_dumps_sql_xml_regular_c_to_e_full.py
index 68d293ee..3343974b 100644
--- a/test_k8s/dags/dumps/sql_xml/mediawiki_dumps_sql_xml_regular_c_to_e_full.py
+++ b/test_k8s/dags/dumps/sql_xml/mediawiki_dumps_sql_xml_regular_c_to_e_full.py
@@ -247,7 +247,6 @@ dag_kwargs = {
     "max_active_runs": 1,
     "max_active_tasks": 32,
     "default_args": {"email": "data-platform-alerts@wikimedia.org"},
-    "wip": True,
 }
 dag_kwargs["schedule"] = FULL_DUMP_SCHEDULE
 dag_kwargs["user_defined_filters"] = filters

Once you're done, open a merge request containing the change to the code generation script and/or template, along with the regenerated DAG files. The change will be automatically deployed once the MR is merged.

Re-running a failed dump command

TODO

Re-running a failed sync command

TODO

I'm getting paged

The dumps CephFS volume is filling up

You can dynamically resize the CephFS volume by adjusting the dumps.persistence.size value in deployment-charts/helmfile.d/dse-k8s-services/mediawiki-dumps-legacy/values.yaml. Send a patch, get it reviewed, and then SSH to the deployment server to run:

% ssh deployment.eqiad.wmnet
brouberol@deploy1003:~$ cd /srv/deployment-charts/helmfile.d/dse-k8s-services/mediawiki-dumps-legacy
brouberol@deploy1003:~$ helmfile -e dse-k8s-services -i apply

Do not live-edit the PVC size: the change would be reverted at the next helmfile apply (or the apply would fail, since the operator may refuse to downsize the volume).
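For illustration, the relevant values.yaml fragment looks something like this. The surrounding keys and the size value are assumptions for the sketch; check the actual chart for the exact structure and current value.

```yaml
# helmfile.d/dse-k8s-services/mediawiki-dumps-legacy/values.yaml (illustrative excerpt)
dumps:
  persistence:
    size: 30Ti  # placeholder value; increase to grow the CephFS volume (shrinking may be refused)
```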

The fetch_wiki_list_from_noc DAG is failing

If this DAG is failing, it means that https://noc.wikimedia.org exposes a different list of wikis than the one committed in airflow-dags.

In that case, the next dump DAG run would either dump a wiki that has since been removed, or skip a wiki that was recently added. To fix this, run the following in your local airflow-dags checkout:

make test_k8s/dags/dumps/sql_xml

Send a merge request with the changes. The new wiki lists will be automatically deployed to airflow once the MR is merged.
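Conceptually, the check boils down to a set comparison between the wiki list served by noc and the one committed in the repository. A minimal sketch of that comparison (the function and variable names are illustrative, not the actual DAG code):

```python
def diff_wiki_lists(noc_wikis, committed_wikis):
    """Compare the wiki list exposed by noc.wikimedia.org with the committed one.

    Returns the wikis that were added (present on noc, missing locally) and
    removed (present locally, gone from noc). Any non-empty result means the
    committed wiki lists are stale and must be regenerated.
    """
    noc, committed = set(noc_wikis), set(committed_wikis)
    return sorted(noc - committed), sorted(committed - noc)


added, removed = diff_wiki_lists(
    ["aawiki", "abwiki", "newwiki"],     # list fetched from noc (illustrative)
    ["aawiki", "abwiki", "closedwiki"],  # list committed in airflow-dags (illustrative)
)
print(added)    # wikis recently created, missing from the committed list
print(removed)  # wikis deleted or closed, still present in the committed list
```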

I need to clear a dump wiki lock

Exec into the mediawiki-dumps-legacy toolbox pod and remove the lock file:

% ssh deployment.eqiad.wmnet
brouberol@deploy1003:~$ kube_env mediawiki-dumps-legacy dse-k8s-eqiad
brouberol@deploy1003:~$ kubectl exec -it $(kubectl get pod -l component=toolbox --no-headers -o custom-columns=":metadata.name") -- bash
www-data@mediawiki-dumps-legacy-toolbox-66b8dc5599-x29r2:/$ rm /mnt/dumpsdata/xmldatadumps/private/<wiki>/<date>/lock_*

The external storage servers are under too much load because of the dumps

Go to https://airflow-test-k8s.wikimedia.org/pool/list/ and reduce the number of slots of the mediawiki-dumps-legacy-regular and mediawiki-dumps-legacy-large pools by 2.

Both pools default to 32 slots.