You are browsing a read-only backup copy of Wikitech. The live site can be found at wikitech.wikimedia.org

Google storage

From Wikitech-static
Revision as of 22:20, 25 January 2014 by imported>Pathoschild (rm single-page category)

Google has donated some storage space to us for XML dumps and other items.

How to get set up

We like gsutil, a Python client built on the boto library. You can get it here. Unpack it wherever is convenient; each user who runs it will end up with a ~/.boto config file in their home directory. Each user also needs to add lines equivalent to the following to their .bashrc:

export PATH=${PATH}:full-path-to-where-you-unpacked-it/gsutil
export PYTHONPATH=${PYTHONPATH}:full-path-to-where-you-unpacked-it/gsutil/boto
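
A slightly more defensive variant of the lines above, as a minimal sketch. GSUTIL_HOME and its default value are hypothetical names introduced here for illustration; point it at the directory where you actually unpacked gsutil. The PYTHONPATH line avoids a leading ':' when PYTHONPATH was previously unset.

```shell
# Hypothetical install location; substitute the directory where you unpacked gsutil.
GSUTIL_HOME="${GSUTIL_HOME:-$HOME/gsutil}"

# Append the gsutil directory to PATH so the gsutil script is found.
export PATH="${PATH}:${GSUTIL_HOME}"

# Append the bundled boto library to PYTHONPATH.
# ${PYTHONPATH:+...} expands to "old-value:" only when PYTHONPATH is non-empty,
# so an unset PYTHONPATH does not produce a leading ':'.
export PYTHONPATH="${PYTHONPATH:+${PYTHONPATH}:}${GSUTIL_HOME}/boto"
```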

If you run gsutil without arguments, it prints a usage message.

The first run should be something like gsutil ls (that is, a command with an argument); it will prompt you for your dev keys and then exit. After that you can run real commands.
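
To see whether that first run has already happened for your user, you can check for the config file. This is only a sketch: it assumes that the first run is what writes ~/.boto, and gsutil_configured is a hypothetical helper name, not part of gsutil.

```shell
# gsutil_configured [FILE]: succeed if the boto config file exists.
# FILE defaults to ~/.boto, the per-user config file mentioned above.
gsutil_configured() {
    [ -f "${1:-$HOME/.boto}" ]
}

if gsutil_configured; then
    echo "Found ~/.boto; the first-run key prompt has probably been done."
else
    echo "No ~/.boto yet; run 'gsutil ls' once and enter your keys."
fi
```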

Getting keys

Ah yes, you need to get a set. We're in the process of working out a procedure for that.

Basics

  • gsutil ls
    lists all buckets

There are no directories or subdirectories, only "buckets". Filenames can contain forward slashes.

  • gsutil ls -L -b gs://my-bucket-name
    gives additional detail about a specific bucket
  • gsutil ls gs://my-bucket-name
    lists the files in the specified bucket
  • gsutil cp reallygreatfile gs://my-bucket-name
    copies a file from local system to the bucket

Copies can be done from one bucket to another as well.
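
A bucket-to-bucket copy can be sketched as follows. The bucket name my-other-bucket, the helper copy_between_buckets and the GSUTIL override variable are all hypothetical, introduced here for illustration; the underlying command is simply gsutil cp with a gs:// URI on both sides.

```shell
# Hypothetical bucket names, for illustration only.
SRC_BUCKET="gs://my-bucket-name"
DST_BUCKET="gs://my-other-bucket"

# copy_between_buckets NAME: copy one file from SRC_BUCKET to DST_BUCKET.
# NAME may contain slashes (e.g. "dumps/2014/reallygreatfile"); they are
# part of the filename, not directories.
# Setting GSUTIL=echo prints the arguments instead of running gsutil.
copy_between_buckets() {
    ${GSUTIL:-gsutil} cp "${SRC_BUCKET}/$1" "${DST_BUCKET}/$1"
}
```

For example, copy_between_buckets reallygreatfile would run gsutil cp gs://my-bucket-name/reallygreatfile gs://my-other-bucket/reallygreatfile.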

Debugging

gsutil takes the -d option, which shows some HTTP headers, and the -D option, which shows more. When neither is enough, edit your ~/.boto config file, uncomment the line "#is_secure False" so that requests go over plain HTTP, and then use tcpdump to capture the HTTP packets as the command runs.
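
The two header levels can be wrapped in a small helper, sketched below. gsutil_debug and the GSUTIL override variable are hypothetical names introduced here; the tcpdump invocation in the comments is an assumption about a typical Linux setup (capture on any interface, plain HTTP on port 80) rather than a tested recipe.

```shell
# gsutil_debug LEVEL ARGS...: run a gsutil command with HTTP header output.
#   LEVEL 1 -> -d (some headers)
#   LEVEL 2 -> -D (more headers)
# Setting GSUTIL=echo prints the arguments instead of running gsutil.
gsutil_debug() {
    level="$1"
    shift
    case "$level" in
        1) flag="-d" ;;
        2) flag="-D" ;;
        *) echo "usage: gsutil_debug 1|2 <gsutil args>" >&2; return 2 ;;
    esac
    ${GSUTIL:-gsutil} "$flag" "$@"
}

# For the tcpdump step (after uncommenting the is_secure line in ~/.boto),
# something like the following, run in a second terminal, prints packet
# payloads as ASCII; the interface and port are assumptions:
#   sudo tcpdump -i any -s 0 -A 'tcp port 80'
```

For example, gsutil_debug 2 ls gs://my-bucket-name runs gsutil -D ls gs://my-bucket-name.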