
Nova Resource:Deployment-prep/Dumps

Revision as of 21:15, 7 January 2018 by imported>ArielGlenn (→‎Dumps testing in deployment-prep)

Dumps testing in deployment-prep

For information on how to set up instances, see Nova_Resource:Deployment-prep/Dumps/Setup_notes.

What you can test

Currently you can test dumps of very small wikis in beta, or of a couple of larger ones if you have the patience; even those larger wikis are tiny compared to the wikis in production. A complete run for one of the larger wikis may nonetheless take a couple of hours.

How to test

  • SSH into the instance, become root, and then run su - dumpsgen to become the dumpsgen user with its environment.
  • cd /srv/deployment/dumps/dumps/xmldatadumps to get to the directory with the scripts.
  • Decide whether you want to test one dump job, or all of them, for a given wiki.
  • To run all jobs for, e.g., enwikinews, run python ./worker.py --configfile /etc/dumps/conf... stuff... copy pasta tomorrow
  • To run one job for enwikinews, run ....stuff..
  • In either case, output will appear on the console as the run progresses. The run of all jobs for enwikinews should not take more than a couple of minutes.
  • If you see an exception from a job, you can run just that job and give the --verbose argument before the wikiname, thus: ...stuff...
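As a rough sketch of the steps above: the instance name and config file path below are placeholders, and the exact worker.py invocations are elided in the text, so everything past the cd line is an assumption (check python ./worker.py --help on the instance for the real options):

```shell
# Become the dumpsgen user on the instance and move to the scripts directory
ssh <your-dumps-instance>        # placeholder: use the actual instance name
sudo -s
su - dumpsgen
cd /srv/deployment/dumps/dumps/xmldatadumps

# Fill in the real config file path; it is elided in the text above
CONF=/etc/dumps/...

# Run all dump jobs for a small wiki (invocation shape is an assumption)
python ./worker.py --configfile "$CONF" enwikinews

# Re-run with extra output if a job raised an exception; per the text,
# --verbose goes before the wikiname
python ./worker.py --configfile "$CONF" --verbose enwikinews
```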

I recommend enwikinews as a nice, small wiki to test with.

For testing all the dumps capabilities, you'll want to run the stubs and metahistory jobs for a large wiki, e.g. enwiki; these take a few minutes each. You can run those by...
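The exact commands are elided above; a hedged sketch follows, assuming worker.py accepts a --job option and that the stubs and meta-history job names look like the ones below (both names are guesses; list the real ones with python ./worker.py --help on the instance):

```shell
# Fill in the real config file path; it is elided in the text above
CONF=/etc/dumps/...

# Job names here are assumptions, not confirmed by the text
python ./worker.py --configfile "$CONF" --job xmlstubsdump enwiki
python ./worker.py --configfile "$CONF" --job metahistorybz2dump enwiki
```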

After you have finished testing, please clean up your run by removing the run directory: .... Yes, the next dump run will remove an old run to make sure the dumps don't fill up the disk, but I would prefer that we keep the oldest run around so it can be used for prefetch testing; if you leave a bunch of broken runs lying around, eventually all the good runs will have been removed and no good page content dumps will be available for prefetch for the next test.
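The run directory path is elided above; as a sketch, assuming one directory per run named by date under a per-wiki output tree (the base path below is a guess; check the config file for the real output location):

```shell
# Assumed layout: <base>/<wiki>/<rundate>; verify against the config file
DUMPBASE=/mnt/data/xmldatadumps/public   # assumption
WIKI=enwikinews
RUNDATE=20180107                         # the test run you want to discard

# List first, delete second; leave the oldest good run in place for prefetch testing
ls "$DUMPBASE/$WIKI"
rm -rf "$DUMPBASE/$WIKI/$RUNDATE"
```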