You are browsing a read-only backup copy of Wikitech. The live site can be found at wikitech.wikimedia.org

Analytics/Systems/Cluster/Geotagging

Revision as of 13:44, 7 April 2017 by Milimetric (moved page Analytics/Cluster/Geotagging to Analytics/Systems/Cluster/Geotagging: Reorganizing documentation)

Geotagging (IP geolocation) functions for Hadoop are provided by jars available at hdfs:///wmf/refinery/current/artifacts.

Libraries

refinery-core.jar

org.wikimedia.analytics.refinery.core.Geocode exposes two functions:

Function name                 Data returned
getCountryCode(String ip)     country code
getGeocodedData(String ip)    map containing geocoding information:
                                • continent
                                • country_code
                                • country
                                • subdivision
                                • city
                                • postal_code
                                • latitude
                                • longitude
                                • timezone
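As a rough sketch of how calling code might consume these two functions, the Java below programs against a hypothetical `GeocodeLike` interface that mirrors the documented signatures. It does not use the real `Geocode` class, whose constructor, exceptions and map value types are not described on this page. The helper names `GeoSummary`, `missingKeys`, `placeLabel` and `describe` are illustrative only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the two documented functions of
// org.wikimedia.analytics.refinery.core.Geocode. This is NOT the real class;
// its constructor, exceptions and map value types are assumptions.
interface GeocodeLike {
    String getCountryCode(String ip);

    Map<String, Object> getGeocodedData(String ip);
}

// Illustrative helpers for consuming lookup results (names are hypothetical).
final class GeoSummary {
    // The keys this page lists for the map returned by getGeocodedData.
    static final List<String> DOCUMENTED_KEYS = Collections.unmodifiableList(Arrays.asList(
            "continent", "country_code", "country", "subdivision", "city",
            "postal_code", "latitude", "longitude", "timezone"));

    private GeoSummary() {}

    // Returns the documented keys absent from a result map (all of them for null).
    static List<String> missingKeys(Map<String, ?> geo) {
        List<String> missing = new ArrayList<>();
        for (String key : DOCUMENTED_KEYS) {
            if (geo == null || !geo.containsKey(key)) {
                missing.add(key);
            }
        }
        return missing;
    }

    // Joins whichever of city, subdivision and country are present and non-empty.
    static String placeLabel(Map<String, ?> geo) {
        List<String> parts = new ArrayList<>();
        if (geo != null) {
            for (String key : Arrays.asList("city", "subdivision", "country")) {
                Object value = geo.get(key);
                if (value != null && !value.toString().isEmpty()) {
                    parts.add(value.toString());
                }
            }
        }
        return String.join(", ", parts);
    }

    // Combines both lookups for one IP into a short human-readable string.
    static String describe(GeocodeLike geocoder, String ip) {
        String code = geocoder.getCountryCode(ip);
        String label = placeLabel(geocoder.getGeocodedData(ip));
        return label.isEmpty() ? code : label + " (" + code + ")";
    }
}
```

To use this with the real library, you would pass an adapter around a `Geocode` instance as the `GeocodeLike` argument. Checking `missingKeys` first may help when a lookup returns only part of the documented fields.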

refinery-hive.jar

This library provides wrapper functions usable as Hive UDFs:

Hive UDF                                                      Wrapped function
org.wikimedia.analytics.refinery.hive.GetCountryISOCodeUDF    org.wikimedia.analytics.refinery.core.Geocode.getCountryCode
org.wikimedia.analytics.refinery.hive.GetGeoDataUDF           org.wikimedia.analytics.refinery.core.Geocode.getGeocodedData
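The wrapper idea can be illustrated in plain Java without a Hive dependency. In this sketch, the hypothetical `LookupWrapperSketch` class exposes an `evaluate` method. That is the method name Hive's simple UDF API (`org.apache.hadoop.hive.ql.exec.UDF`) calls. The method passes a null input through as null and creates the core lookup lazily, once per wrapper instance, so an expensive setup such as opening the MaxMind database is not repeated for every row. The real refinery-hive classes may be structured differently, for example as `GenericUDF` subclasses.

```java
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical, Hive-free sketch of a UDF-style wrapper. Hive's simple UDF API
// calls public methods named evaluate; this class does not extend any Hive type.
// Not thread-safe: it assumes one instance is used by a single task thread.
final class LookupWrapperSketch<T> {
    // Builds the core lookup, e.g. something backed by Geocode (assumed).
    private final Supplier<Function<String, T>> factory;
    // Created on the first non-null input, then reused for every later row.
    private Function<String, T> lookup;

    LookupWrapperSketch(Supplier<Function<String, T>> factory) {
        this.factory = factory;
    }

    public T evaluate(String ip) {
        if (ip == null) {
            return null; // a SQL NULL input stays NULL
        }
        if (lookup == null) {
            lookup = factory.get(); // expensive setup happens only once
        }
        return lookup.apply(ip);
    }
}
```

In Hive itself, packaged UDFs are registered per session with `ADD JAR` followed by `CREATE TEMPORARY FUNCTION`, for example `CREATE TEMPORARY FUNCTION geocode_country AS 'org.wikimedia.analytics.refinery.hive.GetCountryISOCodeUDF';`. Here the name `geocode_country` is arbitrary, and the jar path passed to `ADD JAR` is assumed to be `hdfs:///wmf/refinery/current/artifacts/refinery-hive.jar`.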

Updates

These functions use a copy of the MaxMind database that is updated weekly and downloaded to /usr/share/GeoIP on every node of the cluster.
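Because each node reads its own copy under /usr/share/GeoIP, a job that depends on current data could check the local file's age against the weekly update cycle. This sketch assumes a hypothetical file name (`GeoIP2-City.mmdb`) and an arbitrary one-day grace period on top of the seven days. Neither value comes from the cluster configuration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.time.Duration;
import java.time.Instant;

// Sketch: flag a local GeoIP database whose last-modified time is older than
// the weekly refresh cycle would suggest.
final class GeoIpFreshness {
    // Hypothetical file name inside the documented /usr/share/GeoIP folder.
    static final Path DEFAULT_DB = Paths.get("/usr/share/GeoIP", "GeoIP2-City.mmdb");
    // Seven days per update cycle plus one day of grace (an assumed tolerance).
    static final Duration MAX_AGE = Duration.ofDays(7).plusDays(1);

    private GeoIpFreshness() {}

    // Pure check on timestamps; a modification time in the future counts as fresh.
    static boolean isStale(Instant lastModified, Instant now) {
        return Duration.between(lastModified, now).compareTo(MAX_AGE) > 0;
    }

    // Reads the file's modification time and applies the check above.
    static boolean isStale(Path database, Instant now) {
        try {
            return isStale(Files.getLastModifiedTime(database).toInstant(), now);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A caller might run `GeoIpFreshness.isStale(GeoIpFreshness.DEFAULT_DB, Instant.now())` before a long job and log a warning if it returns true. The file's modification time is only a proxy for when the update actually ran.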