Data Engineering/Systems/Cluster/Geotagging
Geotagging functions in Hadoop are provided by jars available at hdfs:///wmf/refinery/current/artifacts
Libraries
refinery-core.jar
org.wikimedia.analytics.refinery.core.Geocode exposes two functions:
Function Name | Data Returned |
---|---|
getCountryCode(String ip) | country code |
getGeocodedData(String IP) | map containing geocoding information |
refinery-hive.jar
This library provides wrapper functions usable as Hive UDFs.
Hive UDF | Wrapped Function |
---|---|
org.wikimedia.analytics.refinery.hive.GetCountryISOCodeUDF | org.wikimedia.analytics.refinery.core.Geocode.getCountryCode |
org.wikimedia.analytics.refinery.hive.GetGeoDataUDF | org.wikimedia.analytics.refinery.core.Geocode.getGeocodedData |
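As a rough illustration, the UDFs listed above could be registered and called from a Hive session along the following lines. This is a sketch under assumptions, not a verified recipe: the jar is assumed to sit directly under the artifacts folder with the unversioned name refinery-hive.jar, the function aliases country_code and geo_data are arbitrary names chosen here, and the table my_db.my_requests with its ip column is a hypothetical placeholder.

```sql
-- Make the UDF classes available to this Hive session.
-- Assumption: the jar is stored under this exact, unversioned file name.
ADD JAR hdfs:///wmf/refinery/current/artifacts/refinery-hive.jar;

-- The aliases on the left are arbitrary local names, not official ones.
CREATE TEMPORARY FUNCTION country_code AS 'org.wikimedia.analytics.refinery.hive.GetCountryISOCodeUDF';
CREATE TEMPORARY FUNCTION geo_data AS 'org.wikimedia.analytics.refinery.hive.GetGeoDataUDF';

-- Hypothetical table and column, used only to show the call shape.
SELECT
  ip,
  country_code(ip) AS country,  -- country code for the address
  geo_data(ip)     AS geo       -- map of geocoding information
FROM my_db.my_requests
LIMIT 10;
```

Temporary functions last only for the current session, so the ADD JAR and CREATE TEMPORARY FUNCTION statements would need to be repeated in each new session.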
Updates
These functions use a copy of the MaxMind database that is updated weekly and downloaded to every node of the cluster, in the folder /usr/share/GeoIP.