These are some notes about JVM tuning, in particular memory and garbage collection settings for various services and clusters.
Overview of JVMs and GCs
Services relying on JVMs:
- Cassandra (RESTBase, Maps, Analytics query service)
- Hadoop (analytics)
- Kafka (analytics)
- Kafka (production: EventBus)
- Wikidata query service served from a labs project
As of this writing:
- OpenJDK 8 is in use on restbase*, maps*, aqs*, cerium, praseodymium and xenon. The last three are Cassandra test hosts.
- OpenJDK 7 is in use on analytics*, cobalt (gerrit), conf* (?), contint1001, druid*, elastic*, kafka*, logstash*, meitnerium (archiva), notebook*, relforge*, stat*, and wdqs* hosts.
- OpenJDK 6 is no longer in use anywhere, thank goodness.
Garbage collectors in use for the various JVMs include CMS (Concurrent Mark Sweep), G1 (Garbage First), and ParallelGC/ParallelOldGC (the default for OpenJDK 7 and 8).
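To check which of these collectors a given JVM is actually running, one option is to ask the JVM itself through the standard java.lang.management API. The sketch below is illustrative only: the class name CollectorNames is made up, and the bean names it matches ("PS Scavenge", "ParNew", "G1 Young Generation", and so on) are the ones HotSpot is assumed to report on OpenJDK 7 and 8; other JVMs may use different names.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

final class CollectorNames {
    /** Maps a GarbageCollectorMXBean name to a collector family (assumed HotSpot naming). */
    static String family(String beanName) {
        if (beanName.startsWith("G1 ")) {
            return "G1";
        }
        if (beanName.equals("ParNew") || beanName.equals("ConcurrentMarkSweep")) {
            return "CMS";
        }
        if (beanName.startsWith("PS ")) {
            return "Parallel";
        }
        return "unknown";
    }

    /** Names of the collectors registered in the running JVM. */
    static List<String> runningCollectors() {
        List<String> names = new ArrayList<String>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        for (String name : runningCollectors()) {
            System.out.println(name + " -> " + family(name));
        }
    }
}
```

Run it with the same GC flags as the service in question; the output depends on those flags and on the JVM version.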
(a few) Options that apply to all JVMs
Heap size settings
Guessing heap size is one of the first tuning exercises. That's going to be specific to your application so no numbers will be given here. Whatever value you use, you likely want to use the same for both the initial and maximum values, so that your JVM does not spend time resizing the heap. Set options -Xms and -Xmx for the initial and maximum heap sizes, respectively. You want compressed ordinary object pointers (they save space); these are the default for heap sizes under 32G, so don't specify more than that. For determining the exact safe spot, see the docs.
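A quick way to verify that a running JVM follows this advice is to read back its own startup arguments. The following is a minimal sketch, not a definitive check: the class name HeapSettings and the helper names are hypothetical, and the 32G constant is only the rough ceiling mentioned above, not the exact safe value for any particular JVM.

```java
import java.lang.management.ManagementFactory;
import java.util.Locale;

final class HeapSettings {
    // Assumption: rough compressed-oops ceiling; the exact safe value is JVM-specific.
    static final long ASSUMED_LIMIT_BYTES = 32L * 1024 * 1024 * 1024;

    /** Parses a size as written after -Xms/-Xmx, e.g. "512m" or "8g", into bytes. */
    static long parseSize(String text) {
        String s = text.trim().toLowerCase(Locale.ROOT);
        char unit = s.charAt(s.length() - 1);
        long factor = 1L;
        if (unit == 'k') {
            factor = 1024L;
        } else if (unit == 'm') {
            factor = 1024L * 1024;
        } else if (unit == 'g') {
            factor = 1024L * 1024 * 1024;
        }
        String digits = (factor == 1L) ? s : s.substring(0, s.length() - 1);
        return Long.parseLong(digits) * factor;
    }

    /** True when initial and maximum sizes match and stay under the assumed limit. */
    static boolean looksSane(long initialBytes, long maxBytes) {
        return initialBytes == maxBytes && maxBytes < ASSUMED_LIMIT_BYTES;
    }

    public static void main(String[] args) {
        String xms = null;
        String xmx = null;
        // Inspect the flags this JVM was started with.
        for (String arg : ManagementFactory.getRuntimeMXBean().getInputArguments()) {
            if (arg.startsWith("-Xms")) {
                xms = arg.substring(4);
            } else if (arg.startsWith("-Xmx")) {
                xmx = arg.substring(4);
            }
        }
        if (xms == null || xmx == null) {
            System.out.println("-Xms and/or -Xmx not set explicitly");
            return;
        }
        System.out.println("-Xms=" + xms + " -Xmx=" + xmx
                + " sane=" + looksSane(parseSize(xms), parseSize(xmx)));
    }
}
```

The sketch reads the startup flags rather than the heap sizes the JVM reports at runtime, since the reported maximum can differ slightly from the -Xmx value with some collectors.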
Garbage collection logging
Garbage collection is the bane of JVMs everywhere, especially when low latency is expected. "Full" garbage collection entails freezing all application threads for the duration; concurrent garbage collection generally has "Mark" phases which also require a total freeze, though for less time. If you have latency issues and suspect garbage collection, you should enable GC logging. Some useful options:
- -XX:+PrintGCDetails -- print out a detailed message after each garbage collection
- -XX:+PrintGCDateStamps -- print the date and time that each garbage collection starts
- -XX:+PrintHeapAtGC -- (which java versions is this good for?) show start and top of various parts of heap (CMS/ParallelGC only?), see next section for info
- -XX:+PrintGCApplicationStoppedTime -- print time application spent inside safepoints (includes stops due to GC)
- -XX:+PrintTenuringDistribution -- (is this good for all GCs?) print byte counts of objects in age groups of the survivor spaces for YoungGen (see next section for info)
- -XX:+PrintPromotionFailure -- show information about failures to move objects from pools for younger objects to pools for older ones
- -XX:PrintFLSStatistics=1 -- print free list stats after garbage collection (good for CMS GC, any others?)
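Once -XX:+PrintGCApplicationStoppedTime is enabled, the log can be summarized to see how much time the application spent frozen. The sketch below assumes the line format "Total time for which application threads were stopped: N seconds" as written by OpenJDK 7/8 HotSpot; the class name StoppedTime and the sample lines (including their timestamps and durations) are invented for illustration. If the log uses a different format or a locale with a comma as decimal separator, the pattern would need adjusting.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class StoppedTime {
    // Assumed format of the lines produced by -XX:+PrintGCApplicationStoppedTime.
    private static final Pattern STOPPED = Pattern.compile(
            "Total time for which application threads were stopped: ([0-9]+\\.[0-9]+) seconds");

    /** Sums the stop durations found in the given log lines. */
    static double totalStoppedSeconds(List<String> logLines) {
        double total = 0.0;
        for (String line : logLines) {
            Matcher m = STOPPED.matcher(line);
            if (m.find()) {
                total += Double.parseDouble(m.group(1));
            }
        }
        return total;
    }

    /** Returns the longest single stop found, or 0.0 if there is none. */
    static double longestStopSeconds(List<String> logLines) {
        double longest = 0.0;
        for (String line : logLines) {
            Matcher m = STOPPED.matcher(line);
            if (m.find()) {
                longest = Math.max(longest, Double.parseDouble(m.group(1)));
            }
        }
        return longest;
    }

    public static void main(String[] args) throws IOException {
        // Invented sample lines; pass a GC log path as the first argument to use real data.
        List<String> sample = Arrays.asList(
                "2016-10-20T12:00:01.123+0000: 1.234: Total time for which application threads were stopped: 0.2500000 seconds",
                "some other GC log line",
                "2016-10-20T12:00:05.456+0000: 5.567: Total time for which application threads were stopped: 0.5000000 seconds");
        List<String> lines = args.length > 0
                ? Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8)
                : sample;
        System.out.println("total stopped seconds:   " + totalStoppedSeconds(lines));
        System.out.println("longest stop in seconds: " + longestStopSeconds(lines));
    }
}
```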
Log rotation options (should be self-explanatory):
Note that log rotation works by appending ".0" to the end of the current log file while all other log files are renumbered accordingly. Note also bug: T148655
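As a sketch of how GC log rotation might be wired up, the helper below assembles the flags for a JVM command line. It assumes the standard HotSpot rotation flags of OpenJDK 7/8 (-Xloggc, -XX:+UseGCLogFileRotation, -XX:NumberOfGCLogFiles, -XX:GCLogFileSize); the class name GcLogFlags, the log path, the file count and the file size are all hypothetical values chosen for illustration, not settings taken from any cluster.

```java
import java.util.ArrayList;
import java.util.List;

final class GcLogFlags {
    /**
     * Builds the GC log rotation flags (assumed OpenJDK 7/8 HotSpot flag names).
     *
     * @param logPath     where the GC log is written
     * @param fileCount   how many rotated files to keep
     * @param maxFileSize size at which a file is rotated, e.g. "20M"
     */
    static List<String> rotationFlags(String logPath, int fileCount, String maxFileSize) {
        List<String> flags = new ArrayList<String>();
        flags.add("-Xloggc:" + logPath);
        flags.add("-XX:+UseGCLogFileRotation");
        flags.add("-XX:NumberOfGCLogFiles=" + fileCount);
        flags.add("-XX:GCLogFileSize=" + maxFileSize);
        return flags;
    }

    public static void main(String[] args) {
        // Hypothetical values; adjust to the service in question.
        StringBuilder line = new StringBuilder();
        for (String flag : rotationFlags("/var/log/example/gc.log", 10, "20M")) {
            line.append(flag).append(' ');
        }
        System.out.println(line.toString().trim());
    }
}
```

The resulting list can be appended to whatever mechanism the service uses to pass JVM options.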
Specific options for different garbage collectors
Eden/YoungGen/OldGen/PermGen/Metaspace, the sound bite version:
The way Java garbage collectors work is based on two assumptions:
- most objects are short-lived
- objects that have stuck around for a while are likely to stick around even longer
So objects at creation time are plopped into an area for new objects (typically called "Eden"), then moved to a "survivor" area after the first round of GC. Those two areas comprise the "YoungGen" area. Objects that survive later rounds are moved from the survivor area to "OldGen". OpenJDK 7 additionally has PermGen, which holds class and method metadata rather than objects promoted from OldGen. Note that OpenJDK 8 does not have PermGen; it has Metaspace (not part of the heap, it's native memory!) for class metadata.
The process of moving objects from a "younger objects" area to an "older objects" one is called "promotion". See the -XX:+PrintPromotionFailure option in the section above.
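The areas described above can be seen on a live JVM through the memory pool beans of the java.lang.management API. The following is a small sketch; the class name MemoryPools and the output format are made up. The pool names printed (for example an Eden space, a survivor space, an old generation, and PermGen or Metaspace) depend on the collector and Java version in use.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

final class MemoryPools {
    /** Formats one pool as a single line of text. */
    static String describe(String name, String type, long usedBytes) {
        return name + " [" + type + "] used=" + usedBytes + " bytes";
    }

    /** Describes every memory pool the running JVM exposes. */
    static List<String> describeAll() {
        List<String> lines = new ArrayList<String>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();
            // getUsage() may return null for a pool that is no longer valid.
            long used = (usage == null) ? -1L : usage.getUsed();
            lines.add(describe(pool.getName(), pool.getType().name(), used));
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : describeAll()) {
            System.out.println(line);
        }
    }
}
```

The type column is HEAP or NON_HEAP as reported by the JVM, which gives a quick view of which pools count against the -Xmx limit.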
- -XX:PermSize - Set the initial size of PermGen, the area for class and method metadata. OpenJDK 7 only; OpenJDK 8 has no PermGen under any collector and ignores this option with a warning.
- -XX:MaxPermSize - Set the maximum size of PermGen. OpenJDK 7 only, for the same reason.
- -XX:MaxGCPauseMillis - "The VM will adjust the java heap size and other GC-related parameters in an attempt to keep GC-induced pauses shorter than nnn milliseconds." (source: Oracle docs) This is a best effort setting, no guarantees are made. (How does that interact with other settings?) (Parallel, CMS, G1)
- -XX:InitiatingHeapOccupancyPercent - how full the (entire) heap must be before a concurrent-type (not full!) garbage collection is kicked off (G1)
- -XX:CMSInitiatingOccupancyFraction - what percentage of the old generation heap space must be used before the first concurrent GC is started (CMS)
- -XX:+UseCMSInitiatingOccupancyOnly - use the CMSInitiatingOccupancyFraction value to decide when to start all CMS concurrent GC runs, not just the first one (CMS)
- -XX:+CMSClassUnloadingEnabled - Do GC on the PermGen (classes/methods) space. This is expensive (app threads freeze). (CMS)
- -XX:G1HeapRegionSize - in G1, the heap is composed of many small regions, any of which may be marked for use by younger or older objects. This setting sets the size of those regions. (G1)
There are many other settings but start with those as they are the basics.
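Since several of these options only make sense for one collector, a small sanity check over the JVM arguments can catch mismatches. The sketch below is an illustration under stated assumptions, not an exhaustive validator: the class name GcOptionCheck and the warning texts are invented, and it assumes an OpenJDK 7/8 JVM where CMS and G1 are selected explicitly with -XX:+UseConcMarkSweepGC and -XX:+UseG1GC.

```java
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

final class GcOptionCheck {
    private static boolean anyStartsWith(List<String> args, String prefix) {
        for (String arg : args) {
            if (arg.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    /** Returns warnings about collector-specific options that look inconsistent. */
    static List<String> warnings(List<String> jvmArgs) {
        List<String> out = new ArrayList<String>();
        boolean cms = jvmArgs.contains("-XX:+UseConcMarkSweepGC");
        boolean g1 = jvmArgs.contains("-XX:+UseG1GC");
        boolean fraction = anyStartsWith(jvmArgs, "-XX:CMSInitiatingOccupancyFraction=");
        boolean only = jvmArgs.contains("-XX:+UseCMSInitiatingOccupancyOnly");

        if ((fraction || only) && !cms) {
            out.add("CMS-specific options set without -XX:+UseConcMarkSweepGC");
        }
        if (fraction && !only) {
            out.add("CMSInitiatingOccupancyFraction only governs the first concurrent run"
                    + " unless -XX:+UseCMSInitiatingOccupancyOnly is also set");
        }
        boolean g1Options = anyStartsWith(jvmArgs, "-XX:InitiatingHeapOccupancyPercent=")
                || anyStartsWith(jvmArgs, "-XX:G1HeapRegionSize=");
        if (g1Options && !g1) {
            out.add("G1-specific options set without -XX:+UseG1GC");
        }
        return out;
    }

    public static void main(String[] args) {
        // Check the flags of the JVM running this program.
        List<String> found = warnings(ManagementFactory.getRuntimeMXBean().getInputArguments());
        if (found.isEmpty()) {
            System.out.println("no warnings");
        }
        for (String warning : found) {
            System.out.println("warning: " + warning);
        }
    }
}
```

The same method can be fed a list of flags taken from a service's configuration instead of the running JVM.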