Portal:Data Services/Admin/Quarry: Difference between revisions
m (BryanDavis moved page Portal:Data Services/Admin/quarry to Portal:Data Services/Admin/Quarry: Sentence case)
|Line 43:||Line 43:|
Revision as of 17:30, 29 September 2020
This page contains administrative documentation for the Quarry service.
The service is composed of a web interface and query runners, with a shared database by both to persist state of operations.
Query runners launch SQL queries to Wiki Replicas.
A webproxy exists quarry.wmflabs.org which redirects to the web server.
The main service is a nginx server listening on port 80/tcp, which enforces a HTTPS redirection. Then, a uwsgi quarry-specific app is use, which is controlled by the uwsgi-quarry-web.service systemd service file.
The code of this app is hosted at analytics/quarry/web in gerrit (mirrored to https://github.com/wikimedia/analytics-quarry-web), which is git cloned locally to something like /srv/quarry.
Logs produced by the uwsgi proccess can be found in /var/log/syslog with the uwsgi-quarry-web keyword.
When a query is launched in the web interface by an user, it is submitted via celery to workers/runners.
There is a celery-quarry-worker.service systemd service file which controls the workers.
Again, the code lives in something like /srv/quarry.
Logs produced by the celery worker can be found in /var/log/syslog with the celery-quarry-worker keyword.
The local mariadb database stores state of users and queries for the Quarry service. Is controlled by the mariadb.service systemd service file.
The local redis database stores web sessions and a queue for worker jobs. Is controlled by the redis-instance-tcp_6379.service systemd service file.
Queries results are stored in NFS at /data/project/quarry/results/. This is something to improve in the future.
In the case of long queries, there is a cronjob in the quarry user of the web node that runs every minute and checks queries running longer than 30 minutes to kill them.
Logs can be found at /var/log/quarry/killer.log.
The service is deployed in a Cloud VPS project (called quarry) and is composed of several virtual machines instances.
These VM are usually like this (source of truth is openstack-browser):
quarry-db-01.quarry.eqiad1.wikimedia.cloud quarry-web-01.quarry.eqiad1.wikimedia.cloud quarry-worker-01.quarry.eqiad1.wikimedia.cloud quarry-worker-02.quarry.eqiad1.wikimedia.cloud
The deployment is persisted into operations/puppet.git in several places, specially in:
After the initial git clone by puppet, code deployments are done with simple git fetch as we go:
root@quarry-web-01:/srv/quarry# git fetch && git checkout FETCH_HEAD root@quarry-worker-01:/srv/quarry# git fetch && git checkout FETCH_HEAD root@quarry-worker-02:/srv/quarry# git fetch && git checkout FETCH_HEAD
And then restart the affected services:
root@quarry-web-01:~# systemctl restart uwsgi-quarry-web.service root@quarry-worker-01:~# systemctl restart celery-quarry-worker.service root@quarry-worker-02:~# systemctl restart celery-quarry-worker.service
Some well known admin operations of this service.
Get runners current work
In worker nodes, go to /srv/quarry and then:
user@quarry-worker-01:/srv/quarry $ sudo /srv/quarry/venv/bin/celery -A quarry.web.worker inspect active
Sometimes if a query is stuck in running state but not actually running, check the celery logs in workers node to find more info.
Clear running or queued queries
It is possible that queries was running when the database was lock or the runners killed during maintenance window. This will mark all less than 30 minutes ones in running or queued as "killed" state:
UPDATE query join query_revision on query.latest_rev_id = query_revision.id join query_run on latest_run_id = query_run.id SET status=3 where (status = 0 or status = 2) and query_run.timestamp <= DATE_ADD(NOW(), INTERVAL -1 HOUR);
See quarry/web/models/queryrun.py for query run numeric status correspondences.
Maintenance mode warning
Planned maintenance windows should be warned to end-users using a maintenance message present in /srv/quarry/config.yaml. It's commanded to add relevant phabricator task as HTML link in the message. Restarting the web service on -web instance is needed after config edit.
Block a user
INSERT INTO user_group (user_id, group_name) VALUES (XXX, "blocked");
- Admin panel (contains SAL, monitoring, etc).