Analytics/Web publication: Difference between revisions
imported>Wargo m (Undo revision 1853780 by Dick balls cock (talk)) |
imported>Neil P. Quinn-WMF (Add SWAP-specific instructions) |
Revision as of 04:43, 11 February 2020
This page describes how to make safe, non-identifying datasets, notebooks, or other research products public on the web in the analytics.wikimedia.org/published directory. For guidelines on how to formally release an open dataset (with metadata and persistent identifiers), please refer to Data releases. For regular, structured, and maintained datasets, please see Analytics#Datasets.
If you're looking for data here, some of it may not be maintained or documented. If possible, please reach out to the authors of the data for help, or to Analytics/Team. If you're publishing data here, there are some guidelines in the README on the server (https://analytics.wikimedia.org/datasets/README).
Instructions
- Double-check that the dataset or notebook you want to publish is safe and non-identifying.
- Decide where you want to publish it. There are separate folders for notebooks and datasets; within those, browse the existing subfolders and decide where your file fits. For example, if you have my-data-2020-01.tsv, you may want to publish it as datasets/one-off/my-data/my-data-2020-01.tsv. Please try to use names that complete strangers viewing the website will understand!
- Make sure it's on one of the Analytics clients.
- Copy it to the corresponding location within the /srv/published/ folder on that machine. Create the intermediate folders if necessary. If you're using SWAP, for security reasons you will not be able to access this file from the terminal in your browser. You'll need to SSH directly into the notebook host and move the file using the command line.
Once you do this, the file will be synced to the website by a script that runs every 15 minutes. If you want to run the sync immediately, you can do it manually with the published-sync command.
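The copy step can also be scripted. The following is only a minimal sketch, assuming Python 3 is available on the Analytics client: the function name publish_file and its parameters are hypothetical and not part of any Wikimedia tooling; only the /srv/published/ folder and the example file names come from the instructions above. It creates the intermediate folders and refuses destinations that would land outside the published folder.

```python
import shutil
from pathlib import Path


def publish_file(source, relative_dest, published_root="/srv/published"):
    """Copy `source` to `published_root/relative_dest`, creating folders as needed.

    Hypothetical helper for illustration; it is not an existing Wikimedia tool.
    """
    source = Path(source)
    root = Path(published_root).resolve()
    destination = (root / relative_dest).resolve()

    # Refuse destinations that escape the published folder (e.g. via "..").
    if root not in destination.parents:
        raise ValueError(f"{destination} is outside {root}")

    # Create the intermediate folders if necessary, then copy the file.
    destination.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(source, destination)
    return destination


# Example call, using the file name and location from the instructions:
# publish_file(
#     "my-data-2020-01.tsv",
#     "datasets/one-off/my-data/my-data-2020-01.tsv",
# )
```

After the copy, the file should appear on the website at the next scheduled sync, or sooner if you run published-sync by hand. The sketch deliberately does not call published-sync itself, and it does nothing to check that the data is safe and non-identifying; that review still has to happen first.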