Category:Data platform
Technical documentation for users of the Data Platform, which is maintained primarily by Data Platform Engineering. Documentation that is mostly for administrators and maintainers of the infrastructure, pipelines, components, etc. is also categorized under Category:Data_platform_systems.
Pages in category "Data platform"
The following 200 pages are in this category, out of 227 total.
A
- Analytics/Archive/EventLogging
- Analytics/Archive/EventLogging/Administration
- Analytics/Archive/EventLogging/Architecture
- Analytics/Archive/EventLogging/Backfilling
- Analytics/Archive/EventLogging/Data representations
- Analytics/Archive/EventLogging/EventCapsule
- Analytics/Archive/EventLogging/Monitoring
- Analytics/Archive/EventLogging/NotErrorLogging
- Analytics/Archive/EventLogging/Outages
- Analytics/Archive/EventLogging/Performance
- Analytics/Archive/EventLogging/Publishing
- Analytics/Archive/EventLogging/Sanitization vs Aggregation
- Analytics/Archive/EventLogging/Schema Guidelines
- Analytics/Archive/EventLogging/Sensitive Fields
- Analytics/Archive/EventLogging/TestingOnBetaCluster
- Analytics/Archive/EventLogging/User agent sanitization
D
- Data Platform
- Data Platform Engineering/Ops week
- Data Platform/Analyze data
- Data Platform/AQS
- Data Platform/AQS/Media metrics
- Data Platform/AQS/Mediarequests/Limitations
- Data Platform/AQS/Pageviews/Pageviews per project
- Data Platform/AQS/Wikistats 2/Data Quality/VettingPerProject
- Data Platform/AQS/Wikistats 2/DataQuality/Vetting of mediarequest metrics
- Data Platform/AQS/Wikistats 2/DataQuality/VettingPerProjectFamilies
- Data Platform/Dashboard tutorial
- Data Platform/Data access
- Data Platform/Data access guidelines
- Data Platform/Data Incident management
- Data Platform/Data Lake
- Data Platform/Data Lake/Content
- Obsolete:Data Platform/Data Lake/Content/Mediawiki wikitext current
- Obsolete:Data Platform/Data Lake/Content/Mediawiki wikitext history
- Data Platform/Data Lake/Content/Wikidata entity
- Data Platform/Data Lake/Content/Wikidata item page link
- Data Platform/Data Lake/Data Issues/2021-02-09 Unique Devices By Family Overcount
- Data Platform/Data Lake/Data Issues/2021-06-04 Traffic Data Loss
- Data Platform/Data Lake/Data Issues/2023-01-08 Webrequest Data Loss
- Data Platform/Data Lake/Data Issues/2023-11 eventgate-analytics-external Data Loss
- Data Platform/Data Lake/Edits
- Data Platform/Data Lake/Edits/Edit hourly
- Data Platform/Data Lake/Edits/Geoeditors
- Data Platform/Data Lake/Edits/Geoeditors/Public
- Data Platform/Data Lake/Edits/MediaWiki history
- Data Platform/Data Lake/Edits/MediaWiki history dumps
- Data Platform/Data Lake/Edits/MediaWiki history dumps/FAQ
- Data Platform/Data Lake/Edits/Mediawiki history dumps/Python Dask examples
- Data Platform/Data Lake/Edits/Mediawiki history dumps/Python Pandas examples
- Data Platform/Data Lake/Edits/MediaWiki history dumps/Python spark examples
- Data Platform/Data Lake/Edits/MediaWiki history dumps/Scala spark examples
- Data Platform/Data Lake/Edits/Mediawiki history reduced
- Data Platform/Data Lake/Edits/MediaWiki history/Revision identity reverts
- Data Platform/Data Lake/Edits/Mediawiki page history
- Data Platform/Data Lake/Edits/Mediawiki project namespace map
- Data Platform/Data Lake/Edits/Mediawiki user history
- Data Platform/Data Lake/Edits/Metrics
- Data Platform/Data Lake/Edits/Public
- Data Platform/Data Lake/Edits/Structured data/Commons entity
- Data Platform/Data Lake/Events
- Data Platform/Data Lake/Project History
- Data Platform/Data Lake/Public Data Lake
- Data Platform/Data Lake/Traffic
- Data Platform/Data Lake/Traffic/Banner activity
- Data Platform/Data Lake/Traffic/BotDetection
- Data Platform/Data Lake/Traffic/Browser general
- Data Platform/Data Lake/Traffic/Caching
- Data Platform/Data Lake/Traffic/Interlanguage
- Data Platform/Data Lake/Traffic/Mediacounts
- Data Platform/Data Lake/Traffic/mediawiki api request
- Data Platform/Data Lake/Traffic/mobile apps session metrics
- Data Platform/Data Lake/Traffic/mobile apps uniques
- Data Platform/Data Lake/Traffic/Pagecounts-ez
- Data Platform/Data Lake/Traffic/Pageview actor
- Data Platform/Data Lake/Traffic/Pageview hourly
- Data Platform/Data Lake/Traffic/Pageview hourly/Fingerprinting Over Time
- Data Platform/Data Lake/Traffic/Pageview hourly/Identity reconstruction analysis
- Data Platform/Data Lake/Traffic/Pageview hourly/K Anonymity Threshold Analysis
- Data Platform/Data Lake/Traffic/Pageview hourly/Sanitization
- Data Platform/Data Lake/Traffic/Pageview hourly/Sanitization algorithm proposal
- Data Platform/Data Lake/Traffic/Pageviews
- Data Platform/Data Lake/Traffic/Pageviews/Bots
- Data Platform/Data Lake/Traffic/Pageviews/Bots Research
- Data Platform/Data Lake/Traffic/Pageviews/Redirects
- Data Platform/Data Lake/Traffic/Projectview hourly
- Data Platform/Data Lake/Traffic/ReaderCounts
- Data Platform/Data Lake/Traffic/referrer daily
- Data Platform/Data Lake/Traffic/referrer daily/Dashboard
- Data Platform/Data Lake/Traffic/SessionLength
- Data Platform/Data Lake/Traffic/Unique Devices
- Data Platform/Data Lake/Traffic/Unique Devices/Automated traffic correction
- Data Platform/Data Lake/Traffic/Unique Devices/Last access solution
- Data Platform/Data Lake/Traffic/Unique Devices/Last access solution/Validation
- Data Platform/Data Lake/Traffic/UserRetention
- Data Platform/Data Lake/Traffic/Virtualpageview hourly
- Data Platform/Data Lake/Traffic/Webrequest
- Data Platform/Data Lake/Traffic/Webrequest/RawIPUsage
- Data Platform/Data Lake/Traffic/Webrequest/Tagging
- Data Platform/Data lifecycle management
- Data Platform/Data modeling guidelines
- Data Platform/Data quality/Entrophy alarms
- Data Platform/Data quality/User agent entropy
- Data Platform/Dataset archiving and deletion
- Data Platform/Dataset creation
- Data Platform/Discover data
- Data Platform/Evaluations
- Data Platform/Evaluations/2021 data catalog selection
- Data Platform/Evaluations/2021 data catalog selection/Rubric
- Data Platform/Evaluations/2021 data catalog selection/Rubric/Amundsen
- Data Platform/Evaluations/2021 data catalog selection/Rubric/Atlas
- Data Platform/Evaluations/2021 data catalog selection/Rubric/DataHub
- Data Platform/Evaluations/2021 data catalog selection/Rubric/OpenMetadata
- Data Platform/Evaluations/Data Format Experiments
- Data Platform/Evaluations/Dumps
- Data Platform/Evaluations/Event Platform/EventStreams
- Data Platform/Evaluations/Event Platform/Stream Processing/Framework Evaluation
- Data Platform/Evaluations/SQL Engine on Cloud
- Data Platform/Evaluations/Workflow management tools study
- Data Platform/Event Sanitization
- Data Platform/Fundraising
- Data Platform/Geoeditors
- Data Platform/Internal API requests
- Data Platform/Mysql/Utility Datasets
- Data Platform/Sessions
- Data Platform/Systems
- Data Platform/Systems/Airflow
- Data Platform/Systems/Airflow/Developer guide
- Data Platform/Systems/Airflow/Developer guide/Python Job Repos
- Data Platform/Systems/Airflow/Instances
- Data Platform/Systems/Airflow/Upgrading
- Data Platform/Systems/Analytics Meta
- Data Platform/Systems/analytics.wikimedia.org
- Data Platform/Systems/AQS
- Data Platform/Systems/AQS/Scaling
- Data Platform/Systems/AQS/Scaling/2016/Hardware Refresh
- Data Platform/Systems/AQS/Scaling/2017/Cluster Expansion
- Data Platform/Systems/AQS/Scaling/2020/Cluster Expansion
- Data Platform/Systems/AQS/Scaling/LoadTesting
- Data Platform/Systems/Archiva
- Data Platform/Systems/Bigtop Packages
- Data Platform/Systems/Ceph
- Data Platform/Systems/Cluster
- Data Platform/Systems/Cluster/Geotagging
- Data Platform/Systems/Cluster/Hadoop/Load
- Data Platform/Systems/Cluster/Spark History
- Data Platform/Systems/Conda
- Data Platform/Systems/Coordinator
- Data Platform/Systems/Dashiki
- Data Platform/Systems/Data deletion and sanitization
- Data Platform/Systems/Data Quality
- Data Platform/Systems/DataHub
- Data Platform/Systems/DataHub/Administration
- Data Platform/Systems/DataHub/Data Catalog Documentation Guide
- Data Platform/Systems/DataHub/Upgrading
- Data Platform/Systems/DB Replica
- Data Platform/Systems/Dealing with data loss alarms
- Data Platform/Systems/Druid
- Data Platform/Systems/Druid/Alerts
- Data Platform/Systems/Druid/Load test
- Data Platform/Systems/Edit data loading
- Data Platform/Systems/Edit history administration
- Data Platform/Systems/Edit serving layer
- Data Platform/Systems/Event Data retention
- Data Platform/Systems/Event Data retention/AppInstallId
- Data Platform/Systems/Exporting from HDFS to Swift
- Data Platform/Systems/Geolocation
- Data Platform/Systems/Gobblin
- Data Platform/Systems/Hadoop
- Data Platform/Systems/Hadoop Event Ingestion Lifecycle
- Data Platform/Systems/Hadoop/Administration
- Data Platform/Systems/Hadoop/Alerts
- Data Platform/Systems/Hadoop/Test
- Data Platform/Systems/Hive
- Data Platform/Systems/Hive to Druid Ingestion Pipeline
- Data Platform/Systems/Hive/Alerts
- Data Platform/Systems/Hive/Avro
- Data Platform/Systems/Hive/Compression
- Data Platform/Systems/Hive/Counting uniques
- Data Platform/Systems/Hive/Queries
- Data Platform/Systems/Hive/Queries/Wikidata
- Data Platform/Systems/Hive/Querying using UDFs
- Data Platform/Systems/Iceberg
- Data Platform/Systems/Iceberg/Migration Dependencies
- Data Platform/Systems/Java
- Data Platform/Systems/Jupyter
- Data Platform/Systems/Jupyter/Administration
- Data Platform/Systems/Kerberos
- Data Platform/Systems/Kerberos/Administration
- Data Platform/Systems/Maintenance Schedule
- Data Platform/Systems/Managing systemd timers
- Data Platform/Systems/Manual maintenance
- Data Platform/Systems/Matomo
- Data Platform/Systems/Mediawiki history reduced algorithm
- Data Platform/Systems/Mediawiki History Snapshot Check
- Data Platform/Systems/MediaWiki replicas
- Data Platform/Systems/Page and user history reconstruction
- Data Platform/Systems/Page and user history reconstruction algorithm
- Data Platform/Systems/Presto
- Data Platform/Systems/Presto/Administration