Dumps/Airflow/Operations
Making a change to the SQL/XML Dump DAGs
These DAGs are unusual in the way they are defined: they are code-generated instead of being written by hand. The reason is that we used to define all of them in a single Python file, which took about 2 minutes to parse. Not only was that file a hotspot in the DAG parsing phase, it also caused the DAGs to temporarily disappear while the parsing was in progress.
To take full advantage of DAG parsing parallelism (by default, we run twice as many DAG parsing workers as there are CPUs available in the scheduler pod), we decided to code-generate the XML/SQL DAG files, with each file defining a single DAG.
This means that if you want to make a change to these DAGs, you need to change the DAG template file and/or the code generation script, and then regenerate the DAG files.
- The code generation script is in charge of defining the wikis in scope for each DAG, the DAG keyword arguments, the Airflow pool, etc., and renders each DAG file by injecting these parameters into the DAG file template (see the sketch after this list)
- The DAG file template contains the actual DAG, with string template placeholders for each of the parameters injected by the code generation script
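To make the setup concrete, here is a minimal sketch of what such a generation script could look like, assuming a string.Template-based template file. Every file name and parameter below is illustrative; the real script (scripts/generate_test_k8s_sql_xml_dump_dags.py) derives its parameters from the wiki lists, pools, schedules, etc.
# Minimal, hypothetical sketch of the code-generation approach.
from pathlib import Path
from string import Template

# Hypothetical template file containing the DAG code, with placeholders
# such as ${dag_id} and ${pool}.
template = Template(Path("dag_template.py.tmpl").read_text())

# Hypothetical per-DAG parameters; the real script computes one entry per DAG.
dag_params = [
    {"dag_id": "mediawiki_dumps_sql_xml_regular_a_to_b_full", "pool": "mediawiki-dumps-legacy-regular"},
    {"dag_id": "mediawiki_dumps_sql_xml_large_a_to_z_full", "pool": "mediawiki-dumps-legacy-large"},
]

for params in dag_params:
    # One generated file per DAG keeps per-file parse time low and lets the
    # scheduler's parsing workers process the files in parallel.
    out_file = Path("test_k8s/dags/dumps/sql_xml") / f"{params['dag_id']}.py"
    out_file.write_text(template.substitute(**params))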
For example, let's assume we'd like to remove the wip=True DAG keyword argument from each of these DAGs. To do this, we would remove it from the code generation script:
diff --git a/scripts/generate_test_k8s_sql_xml_dump_dags.py b/scripts/generate_test_k8s_sql_xml_dump_dags.py
index a3c28060..a152e8b9 100755
--- a/scripts/generate_test_k8s_sql_xml_dump_dags.py
+++ b/scripts/generate_test_k8s_sql_xml_dump_dags.py
@@ -104,7 +104,6 @@ common_dag_kwargs = {
"default_args": {
"email": DUMPS_ALERTS_RECIPIENT,
},
- "wip": True,
}
RegularSqlXmlDumps = Dumps(
We would then regenerate all DAG files, which will also run black and isort on the generated files:
~/wmf/airflow-dags T406874 *14 !3 ?2 ❯ make test_k8s/dags/dumps/sql_xml
reformatted /Users/brouberol/wmf/airflow-dags/test_k8s/dags/dumps/sql_xml/mediawiki_dumps_sql_xml_large_a_to_z_full.py
...
reformatted /Users/brouberol/wmf/airflow-dags/test_k8s/dags/dumps/sql_xml/mediawiki_dumps_sql_xml_regular_t_to_v_partial.py
All done! ✨ 🍰 ✨
20 files reformatted, 1 file left unchanged.
Fixing /Users/brouberol/wmf/airflow-dags/test_k8s/dags/dumps/sql_xml/mediawiki_dumps_sql_xml_regular_l_to_m_full.py
...
Fixing /Users/brouberol/wmf/airflow-dags/test_k8s/dags/dumps/sql_xml/mediawiki_dumps_sql_xml_regular_a_to_b_full.py
You'll see your change reflected in all generated DAG files (partial diff shown for clarity):
diff --git a/test_k8s/dags/dumps/sql_xml/mediawiki_dumps_sql_xml_regular_a_to_b_partial.py b/test_k8s/dags/dumps/sql_xml/mediawiki_dumps_sql_xml_regular_a_to_b_partial.py
index 07170f48..ae42caf5 100644
--- a/test_k8s/dags/dumps/sql_xml/mediawiki_dumps_sql_xml_regular_a_to_b_partial.py
+++ b/test_k8s/dags/dumps/sql_xml/mediawiki_dumps_sql_xml_regular_a_to_b_partial.py
@@ -271,7 +271,6 @@ dag_kwargs = {
"max_active_runs": 1,
"max_active_tasks": 32,
"default_args": {"email": "data-platform-alerts@wikimedia.org"},
- "wip": True,
}
dag_kwargs["schedule"] = PARTIAL_DUMP_SCHEDULE
dag_kwargs["user_defined_filters"] = filters
diff --git a/test_k8s/dags/dumps/sql_xml/mediawiki_dumps_sql_xml_regular_c_to_e_full.py b/test_k8s/dags/dumps/sql_xml/mediawiki_dumps_sql_xml_regular_c_to_e_full.py
index 68d293ee..3343974b 100644
--- a/test_k8s/dags/dumps/sql_xml/mediawiki_dumps_sql_xml_regular_c_to_e_full.py
+++ b/test_k8s/dags/dumps/sql_xml/mediawiki_dumps_sql_xml_regular_c_to_e_full.py
@@ -247,7 +247,6 @@ dag_kwargs = {
"max_active_runs": 1,
"max_active_tasks": 32,
"default_args": {"email": "data-platform-alerts@wikimedia.org"},
- "wip": True,
}
dag_kwargs["schedule"] = FULL_DUMP_SCHEDULE
dag_kwargs["user_defined_filters"] = filters
Once you're done, open a merge request containing both the change to the code generation script/template and the regenerated DAG files. The change will then be deployed automatically when the MR is merged.
Re-running a failed dump command
TODO
Re-running a failed sync command
TODO
I'm getting paged
The dumps CephFS volume is filling up
You can dynamically resize the CephFS volume by adjusting the dumps.persistence.size value in deployment-charts/helmfile.d/dse-k8s-services/mediawiki-dumps-legacy/values.yaml. Send a patch, get it reviewed, and then ssh to the deployment server to run:
% ssh deployment.eqiad.wmnet
brouberol@deploy1003:~$ cd /srv/deployment-charts/helmfile.d/dse-k8s-services/mediawiki-dumps-legacy
brouberol@deploy1003:/srv/deployment-charts/helmfile.d/dse-k8s-services/mediawiki-dumps-legacy$ helmfile -e dse-k8s-eqiad -i apply
Note that you can only grow the volume, not shrink it (or at least the helmfile apply would fail, as the operator might refuse to downsize the volume).
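For reference, the values.yaml change could look like the following. The sizes shown are made up for illustration; the only thing that matters is the dumps.persistence.size key:
--- a/helmfile.d/dse-k8s-services/mediawiki-dumps-legacy/values.yaml
+++ b/helmfile.d/dse-k8s-services/mediawiki-dumps-legacy/values.yaml
 dumps:
   persistence:
-    size: 8Ti  # hypothetical current size
+    size: 10Ti  # hypothetical new size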
The fetch_wiki_list_from_noc DAG is failing
If this DAG is failing, it means that https://noc.wikimedia.org exposes a different list of wikis than the one we have in airflow-dags.
If that is the case, the next dump DAG run would either dump a wiki that has since been removed, or skip a wiki that was recently added. To fix this, run the following command in your local airflow-dags checkout:
make test_k8s/dags/dumps/sql_xml
Send a merge request with the changes. The new wiki lists will be deployed to Airflow automatically once the MR is merged.
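To eyeball the discrepancy before regenerating, you can compare the two lists directly. This is a hypothetical diagnostic snippet, not part of the repo: it assumes noc serves the full wiki list at /conf/dblists/all.dblist, and the local file path is a placeholder you'd point at whichever file in your checkout holds the current wiki list.
# Hypothetical diagnostic: compare the wiki list served by noc with the local one.
import urllib.request

NOC_DBLIST_URL = "https://noc.wikimedia.org/conf/dblists/all.dblist"  # assumed path

with urllib.request.urlopen(NOC_DBLIST_URL) as resp:
    noc_wikis = set(resp.read().decode().split())

with open("path/to/local/wiki_list.txt") as f:  # placeholder path
    local_wikis = set(f.read().split())

print("On noc but missing locally:", sorted(noc_wikis - local_wikis))
print("Locally present but gone from noc:", sorted(local_wikis - noc_wikis))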
I need to clear a dump wiki lock
Exec into the mediawiki-dumps-legacy toolbox pod to remove the lock file:
% ssh deployment.eqiad.wmnet
brouberol@deploy1003:~$ kube_env mediawiki-dumps-legacy dse-k8s-eqiad
brouberol@deploy1003:~$ kubectl exec -it $(kubectl get pod -l component=toolbox --no-headers -o custom-columns=":metadata.name") -- bash
www-data@mediawiki-dumps-legacy-toolbox-66b8dc5599-x29r2:/$ rm /mnt/dumpsdata/xmldatadumps/private/<wiki>/<date>/lock_*
The external storage servers are too loaded because of the dumps
Go to https://airflow-test-k8s.wikimedia.org/pool/list/ and reduce the number of slots of the mediawiki-dumps-legacy-regular and mediawiki-dumps-legacy-large pools by 2.
The defaults are … slots for mediawiki-dumps-legacy-regular and 32 slots for mediawiki-dumps-legacy-large.