You are browsing a read-only backup copy of Wikitech. The primary site can be found at wikitech.wikimedia.org

Google storage

Google has donated storage space to us for XML dumps and other items.

How to get set up

We like gsutil, a Python client built on the boto library. You can get it here. Unpack it wherever is convenient. Each user who runs it will end up with their own ~/.boto config file, and each user will also need to add lines equivalent to the following to their .bashrc:

export PATH=${PATH}:full-path-to-where-you-unpacked-it/gsutil
export PYTHONPATH=${PYTHONPATH}:full-path-to-where-you-unpacked-it/gsutil/boto
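
The two export lines above repeat the unpack location. As a minimal sketch, the same setup can name that location once. GSUTIL_HOME is a hypothetical variable name chosen for illustration, and $HOME/gsutil is only an assumed unpack location; substitute your own.

```shell
# Hypothetical unpack location; change this to wherever you unpacked gsutil.
GSUTIL_HOME="${HOME}/gsutil"

# Append the gsutil directory to PATH so the command can be found.
export PATH="${PATH}:${GSUTIL_HOME}"

# Append the bundled boto directory to PYTHONPATH.
# The ${VAR:+...} form avoids a leading colon when PYTHONPATH is unset or empty.
export PYTHONPATH="${PYTHONPATH:+${PYTHONPATH}:}${GSUTIL_HOME}/boto"
```

After editing .bashrc, open a new shell (or source the file) so that the changes take effect.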

If you run gsutil without arguments, it prints a usage message.

The first run should be a command with an argument, such as gsutil ls; it will prompt you for your dev keys and then exit. After that you can run real commands.
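
The first-run step can be sketched as a pair of shell functions that only trigger the key prompt when no config file exists yet. This assumes the first run is what writes ~/.boto; the function names have_boto_config and first_run_if_needed are made up for this example.

```shell
# Succeed if a boto config file exists (default: ~/.boto).
have_boto_config() {
    [ -f "${1:-${HOME}/.boto}" ]
}

# Run the first-time "gsutil ls" (which prompts for keys) only if
# no config file is present yet. An optional argument overrides the
# config path that is checked.
first_run_if_needed() {
    if have_boto_config "$@"; then
        echo "boto config already present"
    else
        gsutil ls
    fi
}
```

Calling first_run_if_needed with no arguments checks ~/.boto.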

Getting keys

Ah yes, you need to get a set. We're in the process of working out a procedure for that.

Basics

  • gsutil ls
    lists all buckets

There are no directories or subdirectories, only "buckets". Filenames can contain forward slashes.

  • gsutil ls -L -b gs://my-bucket-name
    gives additional detail about a specific bucket
  • gsutil ls gs://my-bucket-name
    lists the files in the specified bucket
  • gsutil cp reallygreatfile gs://my-bucket-name
    copies a file from local system to the bucket

Copies can be done from one bucket to another as well.
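
As a rough sketch of the two points above (names containing forward slashes, and bucket-to-bucket copies), the following functions wrap gsutil cp. The function names gs_url, upload_as and copy_between_buckets are invented for this example and are not part of gsutil.

```shell
# Build a gs:// URL from a bucket name and a file name.
# The file name may contain forward slashes.
gs_url() {
    printf 'gs://%s/%s\n' "$1" "$2"
}

# Upload a local file under a chosen name in a bucket.
# usage: upload_as LOCAL_FILE BUCKET NAME_IN_BUCKET
upload_as() {
    gsutil cp "$1" "$(gs_url "$2" "$3")"
}

# Copy one file from one bucket to another, keeping its name.
# usage: copy_between_buckets SRC_BUCKET DST_BUCKET NAME_IN_BUCKET
copy_between_buckets() {
    gsutil cp "$(gs_url "$1" "$3")" "$(gs_url "$2" "$3")"
}
```

For instance, upload_as reallygreatfile my-bucket-name some/prefix/reallygreatfile would store the file under a name that looks like a path, even though the bucket has no real directories. The prefix some/prefix/ is purely illustrative.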

Debugging

gsutil takes the -d option, which prints some HTTP headers, and the -D option, which prints more. When neither is enough, edit your ~/.boto config file, uncomment the line "#is_secure False" so that requests go over plain HTTP instead of HTTPS, and then use tcpdump to capture the HTTP packets as the command runs.
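
The debugging levels above can be sketched as shell helpers. The function names gsutil_debug and capture_http are made up for this example. The tcpdump invocation assumes a Linux host (for the "any" interface) and that the unencrypted traffic uses TCP port 80; adjust both for your environment.

```shell
# Run a gsutil command with one of the two debug levels.
# usage: gsutil_debug 1|2 GSUTIL_ARGS...
gsutil_debug() {
    case "$1" in
        1) shift; gsutil -d "$@" ;;   # some HTTP headers
        2) shift; gsutil -D "$@" ;;   # more HTTP headers
        *) echo "usage: gsutil_debug 1|2 GSUTIL_ARGS..." >&2; return 2 ;;
    esac
}

# Capture plain-HTTP packets and print their payload as ASCII.
# Only useful once is_secure has been turned off in ~/.boto.
# Usually needs root; port 80 is an assumption, see above.
capture_http() {
    tcpdump -i any -s 0 -A 'tcp port 80'
}
```

A typical session would run capture_http in one terminal and something like gsutil_debug 2 ls gs://my-bucket-name in another.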