Analytics/EventLogging/Publishing

Revision as of 14:30, 21 September 2015 by imported>Halfak (Removed reports. No consensus for that.)

WMF's EventLogging database is private because it may hold sensitive information during a certain time window. To access it, one must be an employee of the Wikimedia Foundation or have signed an NDA. Any data set based on EventLogging data is therefore potentially harmful and must be reviewed before it can be published.

Publishing data sets

We consider a data set to be a collection of (whole or partial) records extracted from the database for the purpose of enabling future analyses.

The preferred option is NOT to release any such data sets publicly. If you would like to request an exception, please contact both the Legal team and the Community Advocacy team so they can review your data set and ensure that it contains no sensitive data. If you have other questions, please ask the Analytics team or the Research team.

Which data will be vetted?

  • PII (Personally identifiable information), like clientIp, userAgent, userName, userId, editCount, and in general, any piece of information that can uniquely identify a physical or virtual person.
  • User-entered text fields, like pageTitle, imageTitle, summary, userName, userText, etc. Schemas containing this kind of data are marked as such on the schema talk page.
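
As a rough aid when preparing a data set for review, the field names listed above could be checked mechanically. The following is a minimal sketch only, not an official tool: the function name `fields_needing_review` and the example record are hypothetical, and the two sets contain only the fields named on this page, which are examples rather than a complete list. A record with no flagged fields is therefore not automatically safe to publish and still needs review by the teams above.

```python
# Field names copied from the two categories listed on this page.
# The lists are examples, not exhaustive, so this check is only a first pass.
PII_FIELDS = {"clientIp", "userAgent", "userName", "userId", "editCount"}
USER_TEXT_FIELDS = {"pageTitle", "imageTitle", "summary", "userName", "userText"}


def fields_needing_review(record):
    """Return the sorted names of fields in `record` that fall in either category.

    `record` is assumed to be a dict mapping field names to values
    (a hypothetical shape, not a real EventLogging schema).
    """
    flagged = (PII_FIELDS | USER_TEXT_FIELDS) & set(record)
    return sorted(flagged)


# Hypothetical event record used purely for illustration.
example = {"clientIp": "192.0.2.1", "pageTitle": "Main Page", "action": "click"}
print(fields_needing_review(example))
```

Matching on exact field names keeps the check simple, but it will miss sensitive values stored under other names, which is one more reason it cannot replace human review.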