Puppet coding

Revision as of 22:35, 7 June 2022 by Andrea Denisse

Warning: The instructions below refer to rbenv and .ruby-version. The latter file is gone (it was outdated for a long time) and the former may have better replacements. This documentation should be updated accordingly. Until then, if your system has the same Ruby version as the puppet masters, things will likely be okay.

This page is about writing puppet code: how to write it, when to write it, where to put it. For information about how to install or manage the Puppet software itself, visit Puppet.

Set up local environment

It is possible to run some tests locally. To do that, you will need an environment setup.

Getting the source

Source code: operations/puppet

$ git clone --recursive


You need to make sure that you have a ruby version installed that matches the version in production (see the .ruby-version file in the puppet repo). If your system version of ruby differs from that, you may use rbenv to build the required version for you.

Via macports:
# port install rbenv ruby-build
# port search tox
# port install py<ver>-tox
Via Homebrew:
# brew install tox rbenv
If using the ruby system package:
$ sudo apt install ruby bundler tox
If not using the ruby system package:
$ sudo apt install rbenv ruby-build tox

Application environment

  1. Either run rbenv init to hook rbenv to your shell or follow instructions from here
  2. Go to your local puppet repo to have rbenv install the appropriate ruby version
    $ rbenv install
  3. List versions available (including your system version):
    $ rbenv versions
  4. Install bundler (skip this if using the bundler system package):
    $ rbenv exec gem install bundler
  5. Install dependencies:
    $ rbenv exec bundle install
  6. If using ruby/bundler system package:
    $ bundle install --path vendor/bundle

Test that it works; the following will show you a list of tasks:

 $ rbenv exec bundle exec rake --tasks

Done, your env is ready!

When we use Puppet

Puppet is our configuration management system. Anything related to the configuration files & state of a server should be puppetized. There are a few cases where configurations are deployed into systems without involving Puppet but these are the exceptions rather than the rule (MediaWiki configuration etc.); all package installs and configurations should happen via Puppet in order to ensure peer review and reproducibility.

However, Puppet is not used as a deployment system at Wikimedia. Pushing code via Puppet, e.g. with the define git::clone, should be avoided. Depending on the case, Debian packages or our deployment system (Scap3) should be used instead. Deploying software via Scap3 can be achieved in Puppet by using the puppet class scap::target.

Wikimedia Cloud VPS

Cloud VPS users have considerable freedom in the configuration of their systems, and the state of machines is frequently not puppetized. Specific projects (e.g. Toolforge) often have their own systems for maintaining code and configuration.

That said, any system being used for semi-production purposes (e.g. public test sites, bot hosts, etc.) should be fully puppetized. VPS users should always be ready for their instance to vanish at a moment's notice, and have a plan to reproduce the same functionality on a new instance -- generally this is accomplished using Puppet.

The node definitions for VPS instances are not stored in manifests/site.pp -- they are configured via the OpenStack Horizon user interface and stored in a backing persistent data store. In order to test and develop puppet manifests for Cloud VPS it might be helpful to set up a Standalone puppetmaster.

We maintain certain instance standards that must be preserved, such as: LDAP, DNS, security settings, administrative accounts, etc. Removing or overriding these settings means an instance is no longer manageable as part of the Cloud VPS environment. These instances may be removed or turned off out of necessity. This is part of the instance lifecycle.


As of December 2016, we decided[1] to adopt our own variation of the role/profile pattern that is pretty common in puppet coding nowadays. Please note that existing code might not respect this convention, but any new code should definitely follow this model.

The code should be organized in modules, profiles and roles, where

  1. Modules should be basic units of functionality (e.g. "set up, configure and run HHVM")
  2. Profiles are collections of resources from modules that represent a high-level functionality (e.g. "a webserver able to serve mediawiki")
  3. Roles represent a collective function of one class of servers (e.g. "A mediawiki appserver for the API cluster")
  4. Any node declaration must only include one role, invoked with the role function. No exceptions to this rule. If you need to include two roles in a node, create a new role that includes both.

Let's see in more detail what rules apply to each of these logical divisions.


Modules

Modules should represent basic units of functionality and should be mostly general-purpose and reusable across very different environments. Rules regarding organizing code in modules are simple enough:

  1. Any class, define or resource in a module must not use classes from other modules, and should avoid, wherever possible, using defines from other modules as well.
  2. No hiera call, explicit or implicit, should happen within a module.

These rules will ensure the amount of WMF-specific code that makes it into modules is minimal, and improve debugging/refactoring/testing of modules as they don't really depend on each other. Keeping up with the HHVM module example, the base class hhvm is a good example of what should be in a module, while hhvm::admin is a good example of what should not be in a module; surely not in this one: it is a class to configure apache to forward requests to HHVM, depends mostly on another module (apache) and also adds ferm rules, which of course requires the WMF-specific network::constants class.


Profiles

Profiles are the classes where resources from modules are collected together, organized and configured. There are several rules on how to write a profile, specifically:

  1. Profile classes should only have parameters that are set via explicit lookup calls. You may optionally provide a default by using the {default_value} construct.
    • $web_workers = lookup('profile::ores::web::workers') is good.
    • $web_workers = lookup('profile::ores::web::workers', {'default_value' => 48}) is also good.
  2. Use of the legacy hiera functions (hiera, hiera_hash & hiera_include) is now deprecated, and CI will throw an error.
  3. No hiera call should be made outside of said parameters.
  4. Classes should not be added to a profile with include, but with explicit class instantiations. Only very specific exceptions are allowed, such as global classes like network::constants.
  5. If a profile needs another one as a precondition, it must be listed with a require ::profile::foo at the start of the class, but it's preferred to have roles compose profiles that don't need to depend on each other. See below for more context.

Most of what we used to call "roles" at the WMF are in fact profiles. Following our example, an apache webserver that proxies to a local HHVM installation should be configured via a profile::hhvm::webproxy class; a mediawiki installation served through such a webserver should be configured via a profile::mediawiki::web class.


Roles

Roles are the abstraction describing a class of servers, and:

  1. Roles must only include profiles via the include keyword, plus a system::role definition describing the role
  2. A role can include more than one profile, but no conditionals, hiera calls, etc are allowed.
  3. Inheritance can be used between roles, but it is strongly discouraged: for instance, it should be remembered that inheritance will not work in hiera.
  4. All roles should include the standard profile

Following our example, we should have a role::mediawiki::web that just includes profile::mediawiki::web and profile::hhvm::webproxy.


Hiera

Hiera is a powerful tool to decouple data from code in puppet. However, as we saw while transitioning to its use, it is not without dangers and can easily become a tangled mess. To make it easier to understand and debug, the following rules apply when using it:

  1. No class parameter autolookup is allowed, ever. This means you should explicitly declare any variable as a parameter of the profile class, and pass it along explicitly to the classes/defines within the profile.
  2. Hiera calls can only be defined in profiles, as default values of class parameters.
  3. All hiera definitions for such parameters should be defined in the role hierarchy. The only exceptions are shared data structures that are used by many profiles or that feed different modules with their data. Those should go in the common/$site global hierarchy. A good example in our codebase is ganglia_clusters, or any list of servers (the memcached hosts for mediawiki, for example).
  4. Per-host hiera should only be used to allow tweaking some knobs for testing or to maybe declare canaries in a cluster. It should not be used to add/subtract functionality. If you need to do that, add a new role and configure it accordingly, within its own hiera namespace.
  5. Hiera keys must reflect the most specific common shared namespace of the puppet classes trying to look them up. This should allow easier grepping and avoid conflicts. Global variables should be avoided as much as possible. This means that a parameter specific to a profile will have its namespace (say profile::hhvm::webproxy::fcgi_settings), while things shared among all profiles for a technology should be at the smallest common level (say the common settings for any hhvm install, profile::hhvm::common_settings), and finally global variables should have no namespace (the ganglia_clusters example above) and use only snake case and no semicolons.
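
The first two rules can be sketched as follows. This is a minimal illustration, not code from the repository: the example module, its workers parameter and the hiera key profile::example::workers are all hypothetical, and the two classes would live in separate files.

```puppet
# modules/example/manifests/init.pp (hypothetical module)
# Module class: plain parameters only, no hiera access of any kind.
class example (
    Integer $workers,
) {
    # ... resources using $workers would go here ...
}

# modules/profile/manifests/example.pp (hypothetical profile)
# Profile class: lookup() appears only as a parameter default.
class profile::example (
    Integer $workers = lookup('profile::example::workers', {'default_value' => 4}),
) {
    # The value is passed along explicitly; nothing relies on automatic
    # class parameter lookup of an 'example::workers' key.
    class { 'example':
        workers => $workers,
    }
}
```

With this layout, grepping for profile::example::workers finds both the hiera data and the single place where it is consumed.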

Nodes in site.pp

Our traditional node definitions included a lot of global variables and boilerplate that needed to be added. Nowadays, your node definitions should just include a simple one-liner:

node 'redis2001.codfw.wmnet' {
    role(db::redis)
}

this will include the class role::db::redis and look for role-related hiera configs in hieradata/role/codfw/db/redis.yaml and then in hieradata/role/common/db/redis.yaml, which may for example be:

cluster: redis
  - redis-admins

A working example: deployment host

Say we want to set up a deployment host role with the following things:

  • A scap 3 master, which includes the scap configuration and a simple http server
  • A docker build and deploy environment, which will need the kubernetes client cli tool

So we will want to have a simple module that installs and does the basic scap setup:

# Class docs, see below for the format
class scap3 (
    $config = {}
) {
    require_package('python-scap3', 'git')
    scap3::config { 'main':
        config => $config,
    }
}

as you can see, the only resource referenced is coming from the same module. In order for a master to work, we also need apache set up and firewall rules to be created. This is going to be a profile: we are defining one specific unit of functionality, the scap master.

# Class docs, as usual
class profile::scap3::master (
    $scap_config = lookup('profile::scap3::base_config'), # This might be shared with other profiles
    $server_name = lookup('profile::scap3::master::server_name'),
    $max_post_size = lookup('profile::scap3::master::max_post_size'),
    $mediawiki_proxies = lookup('scap_mediawiki_proxies'), # This is a global list
) {

    class { 'scap3':
        config => merge({ 'server_name' => $server_name}, $scap_config),
    }
    # Set up apache
    apache::conf { 'Max_post_size':
        content => "LimitRequestBody ${max_post_size}",
    }
    apache::site { 'scap_master':
        content => template('profile/scap/scap_master.conf.erb'),
    }
    # Firewalling
    ferm::service { 'scap_http':
        proto => 'tcp',
        port  => $scap_config['http_port'],
    }
    # Monitoring
    monitoring::service { 'scap_http':
        command => "check_http!${server_name}!/",
    }
}

For the docker building environment, you will probably want to set up a specific profile, and then set one up for the docker deployment environment. The latter actually depends on the former in order to work. Assuming we already have a docker module that helps install docker and set it up on a server, and that we have a kubernetes module that has a class for installing the cli tools, we can first create a profile for the build environment:

class profile::docker::builder(
    $proxy_address = lookup('profile::docker::builder::proxy_address'),
    $proxy_port = lookup('profile::docker::builder::proxy_port'),
    $registry = lookup('docker_registry'), # this is a global variable again
) {
    # Let's suppose we have a simple class setting up docker, with no params
    # to override in this case
    class { 'docker':
    }

    # We will need docker baseimages
    class { 'docker::baseimages':
        docker_registry => $registry,
        proxy_address   => $proxy_address,
        proxy_port      => $proxy_port,
        distributions   => ['jessie'],
    }

    # we will need some build scripts; they belong here
    file { '/usr/local/bin/build-in-docker':
        source => 'puppet:///modules/profile/docker/builder/',
    }
    # Monitoring goes here, not in the docker class
    nrpe::monitor_systemd_unit_state { 'docker-engine': }
}

and then the deployment profile will need to add credentials for the docker registry for uploading, the kubernetes cli tools, and a deploy script. It can't work if the profile::docker::builder profile is not included, though:

class profile::docker::deployment_server (
    $registry = lookup('docker_registry'),
    $registry_credentials = lookup('profile::docker::registry_credentials'),
) {
    # Require a profile needed by this one
    require ::profile::docker::builder
    # auth config
    file { '/root/.docker/config.json':
        # ...
    }
    class { 'kubernetes::cli':
    }
    # Kubernetes-based deployment script
    # ...
}

Then the role class for the whole deployment server will just include the relevant profiles:

class role::deployment_server {
    system::role { 'role::deployment_server':
        description => 'Deployment server for production.',
    }
    # Standard is a profile, to all effects
    include standard
    include profile::scap3::master
    include profile::docker::deployment_server
}

and in site.pp

node /deployment.*/ {
    role(deployment_server)
}

Configuration will be done via hiera; most of the definitions will go into the role hierarchy, so in this case: hieradata/role/common/deployment_server.yaml

profile::scap3::master::server_name: "deployment.%{::site}.wmnet"
# This could be shared with other roles and thus defined in a common hierarchy in some cases.
profile::scap3::base_config:
    use_proxies: yes
    deployment_dir: "/srv"
profile::scap3::master::max_post_size: "100M"
profile::docker::builder::proxy_address: 'http://webproxy.eqiad.wmnet:3128'

some other definitions are global, and they might go in the common hierarchy, so: hieradata/common.yaml

docker_registry: ''
scap_mediawiki_proxies:
    - mw1121.eqiad.wmnet
    - mw2212.eqiad.wmnet

while others can be shared between multiple profiles (in this case, note the 'private' prefix as this is supposed to be a secret) hieradata/private/common/profile/docker.yaml

profile::docker::registry_credentials: "some_secret"

When should a profile include another profile

There has been some confusion regarding one of the rules in the style guide, specifically the one about profiles requiring each other.

The point of underlining that profiles should mostly not include other profiles is that we don't want to create fat profiles that deprive roles of their "composition of functionality" function.

Let's make a simple example: we want to install a python application (ORES) that needs to use a redis database. One might be tempted to do

class profile::ores {
   # ores needs redis on localhost
   require profile::redis
   class {'ores': }
}

class role::ores {
  include profile::ores
}

This is what we want to prevent. The application can work just fine with a remote redis, so the profile doesn't really require the redis one. The code above should instead be something like

class profile::ores($redis_dsn = lookup(...)) {
   class {'ores': redis_dsn => $redis_dsn}
}

# Role that installs ores and a local redis cache
class role::ores {
  include profile::redis
  include profile::ores
}

The advantage is clear: you will be able to also define a different role where redis is reached on a remote machine, without having to rewrite all of your classes. Also, your role clearly states what's installed on your server.

Now, let's say our python application needs a uWSGI server to be installed, and we already have a working `profile::uwsgi`. ORES would not work unless this profile is included too.

In this case, you are not only allowed, but encouraged to mark that dependency clearly:

class profile::ores($redis_dsn = lookup(...)) {
   require profile::uwsgi
   class {'ores': redis_dsn => $redis_dsn}
}

# Role that installs ores and a local redis cache
class role::ores {
  include profile::redis
  include profile::ores
}

Please note that, if you want it to be easy to see that profile::uwsgi is included in the role, you can still add it explicitly to `role::ores`. But `profile::ores` will still need to add the `require` keyword.

WMF Design conventions

  • Always include the 'base' class for every node (note that standard includes base and should be used in most cases)
  • For every service deployed, please use a system::role definition (defined in modules/system/manifests/role.pp) to indicate what a server is running. This will be put in the MOTD. As the definition name, you should normally use the relevant puppet class. For example:
system::role { "role::cache::bits": description => "bits Varnish cache server" }
  • Files that are fully deployed by Puppet using the file type, should generally use a read-only file mode (i.e., 0444 or 0555). This makes it more obvious that this file should not be modified, as Puppet will overwrite it anyway.
  • For each service, create a nested class with the name profile::service::monitoring (e.g. profile::squid::monitoring) which sets up any required (Nagios) monitoring configuration on the monitoring server.
  • Any top-level class definitions should be documented with a descriptive header, like this:
 # Mediawiki_singlenode: A one-step class for setting up a single-node MediaWiki install,
 #  running from a Git tree.
 #  Roles can insert additional lines into LocalSettings.php via the
 #  $role_requires and $role_config_lines vars.
 #  etc.

Such descriptions are especially important for role classes. Comments like these are used to generate our online puppet documentation.

Coding Style

Please read the upstream style-guide. And install puppet-lint.

Our codebase is only compatible with Puppet 4.5 and above. Use of puppet 4.x constructs like loops, new functions, and in particular parameter types is strongly encouraged. See the slides linked here for more details.

The slideset for a short presentation about the new things in recent versions of the puppet language.
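
As a rough illustration of these constructs, the sketch below combines typed parameters with iteration. The class name, paths and values are hypothetical, and it assumes the stdlib module (which provides the Stdlib::Unixpath type) is available.

```puppet
# Hypothetical class and paths, for illustration only.
class example_snippets (
    Stdlib::Unixpath $conf_dir = '/etc/example.d',
    Array[String]    $snippets = ['first', 'second'],
) {
    file { $conf_dir:
        ensure => directory,
        owner  => 'root',
        group  => 'root',
        mode   => '0555',
    }

    # each() with two lambda parameters yields the index and the value.
    $snippets.each |Integer $index, String $snippet| {
        file { "${conf_dir}/${index}-${snippet}.conf":
            ensure  => file,
            owner   => 'root',
            group   => 'root',
            mode    => '0444',
            content => "# ${snippet}\n",
        }
    }
}
```

The parameter types make the compiler reject bad input (for example a relative path) before any resource is applied.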


Many existing manifests use two-space indentation (as suggested in the style guide) instead of our four-space standard; when working on existing code, always follow the existing whitespace style of the file or module you are editing. Please do not mix cleanup changes with functional changes in a single patch.

Spacing, Indentation, & Whitespace

  • Must use four-space soft tabs.
  • Must not use literal tab characters.
  • Must not contain trailing white space
  • Must align fat comma arrows (=>) within blocks of attributes.


Quoting

  • Must use single quotes unless interpolating variables.
  • All variables should be enclosed in braces ({}) when being interpolated in a string.
  • Variables standing by themselves should not be quoted.
  • Must not quote booleans: true is ok, but not 'true' or "true"
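
A short sketch of these quoting rules, using a hypothetical variable and path:

```puppet
# Hypothetical variable and path, for illustration only.
$service_name = 'example'

# Single quotes for plain strings; double quotes and braces only when
# a variable is interpolated.
file { "/etc/${service_name}/enabled":
    ensure  => file,
    content => 'yes',
}

# A variable standing by itself is not quoted, and booleans are never quoted.
service { $service_name:
    ensure => running,
    enable => true,
}
```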


Resources

  • Must single quote all resource names and their attribute values, except ensure (unless they contain a variable, of course).
  • Ensure must always be the first attribute.
  • Put a trailing comma after the final resource parameter.
  • Again: Must align fat comma arrows (=>) within blocks of attributes.
  • Don't group resources of the same type (a.k.a. compression):

do

file { '/etc/default/exim4':
    require => Package['exim4-config'],
    owner   => 'root',
    group   => 'root',
    mode    => '0444',
    content => template('exim/exim4.default.erb'),
}
file { '/etc/exim4/aliases/':
    ensure  => directory,
    require => Package['exim4-config'],
    mode    => '0755',
    owner   => 'root',
    group   => 'root',
}

don't do

file { '/etc/default/exim4':
    require => Package['exim4-config'],
    owner   => 'root',
    group   => 'root',
    mode    => '0444',
    content => template('exim/exim4.default.erb');
'/etc/exim4/aliases/':
    ensure  => directory,
    require => Package['exim4-config'],
    mode    => '0755',
    owner   => 'root',
    group   => 'root',
}

  • Keep the resource name and the resource type on the same line. No need for extra indentation.


  • Don't use selectors inside resources:

do

$file_mode = $facts['os']['name'] ? {
    debian => '0007',
    redhat => '0776',
    fedora => '0007',
}
file { '/tmp/readme.txt':
    content => "Hello World\n",
    mode    => $file_mode,
}

don't do

file { '/tmp/readme.txt':
    mode => $facts['os']['name'] ? {
        debian => '0777',
        redhat => '0776',
        fedora => '0007',
    },
}
  • Case statements should have default cases.
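
A minimal sketch of a case statement with a default branch; the mapping of operating systems to file modes is made up for illustration:

```puppet
# Hypothetical mapping, for illustration only.
case $facts['os']['name'] {
    'Debian': {
        $file_mode = '0444'
    }
    'Ubuntu': {
        $file_mode = '0440'
    }
    default: {
        # Fail loudly rather than silently leaving $file_mode undefined.
        fail("Unsupported operating system: ${facts['os']['name']}")
    }
}
```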


Classes

All classes and resource type definitions must be in separate files in the manifests directory of their module.

  • Do not nest classes.
  • NEVER EVER use inheritance, puppet is not good at that. Also, inheritance will make your life harder when you need to use hiera - really, don't.
  • When referring to facts, prefer the $facts hash, to distinguish them from global variables:
# good
$facts['os']['name']
# bad
$::operatingsystem
  • When referring to global variables, always use their fully qualified names:
# good
$::site
# bad
$site
  • Do not use dashes in class names, preferably use alphabetic names only.
  • In parameterized class and defined resource type declarations, parameters that are required should be listed before optional parameters.
  • It is in general better to avoid parameters that don't have a default; that will only make your life harder as you need to define that variable for every host that includes it.
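
The parameter ordering rule could look like this in practice. The defined type, its parameters and the file path are hypothetical:

```puppet
# Hypothetical defined type, for illustration only.
define example::site (
    # Required parameter (no default) is listed first.
    String                    $server_name,
    # Optional parameters, each with a default, follow.
    Integer[1, 65535]         $port   = 80,
    Enum['present', 'absent'] $ensure = 'present',
) {
    file { "/etc/example/sites/${title}.conf":
        ensure  => $ensure,
        owner   => 'root',
        group   => 'root',
        mode    => '0444',
        content => "server_name ${server_name}\nport ${port}\n",
    }
}

# Only the required parameter has to be passed by the caller.
example::site { 'wiki':
    server_name => 'wiki.example.org',
}
```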


Including

  • One include per line.
  • One class per include.
  • Include only the class you need, not the entire scope.
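
For instance, with hypothetical profile names:

```puppet
# good: one include per line, one class per include
include profile::example::webserver
include profile::example::monitoring

# bad: several classes in a single include
include profile::example::webserver, profile::example::monitoring
```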

Useful global variables

These are useful variables you can refer to from anywhere in the Puppet manifests. Most of these get defined in realm.pp or base.pp.

$::realm: The "realm" the system belongs to. As of July 2021 we have the realms production and labs.
$::site: Contains the 5-letter site name of the server, e.g. "eqiad", "codfw", "esams", "ulsfo" or "eqsin".

Testing a patch

Please see the main testing page

Puppet modules

There are currently two high level types of modules. For most things, modules should not contain anything that is specific to the Wikimedia Foundation. Non-WMF-specific modules could be usable in another puppet repository at any other organization. A WMF-specific module is different: it may contain configurations specific to WMF (duh), but remember that it is still a module, so it must be usable on its own as well. Users of either type of module should be able to use the module without editing anything inside of the module. WMF-specific modules will probably be higher level abstractions of services that use and depend on other modules, but they may not refer to anything inside of the top level manifests/ directory. E.g. the 'applicationserver' module abstracts usages of apache, php and pybal to set up a WMF application server.

Often it will be difficult to choose between creating role classes and creating a WMF specific module. There isn't a hard rule on this. You should use your best judgement. If role classes start to get overly complicated, you might consider creating a WMF specific module instead.

3rd party or upstream modules

There are so many great modules out there! Why spend time writing your own?!

Well, for good reasons. Puppet runs as root on the production nodes. We can't import just any 3rd party module, as we can't be sure to trust them. Not because they would do something malicious (although they might), but because they might do something stupid.

All 3rd party modules must be reviewed in the same manner that we review our own code before it goes to production.

Adding a 3rd Party module

Third party modules used to be added using git submodules; however, this proved to be somewhat error-prone, and they were complex to update. As such, we copy the few third party modules we use directly into the puppet tree and update them in a single commit. As an example, to update the stdlib module to 6.6.0, one would use the following process:

$ cd ~/git/puppetlabs-stdlib/
$ git fetch --all
Fetching origin
Fetching upstream
remote: Enumerating objects: 1119, done.
remote: Counting objects: 100% (1119/1119), done.
remote: Compressing objects: 100% (115/115), done.
remote: Total 753 (delta 632), reused 725 (delta 619), pack-reused 0
Receiving objects: 100% (753/753), 105.71 KiB | 1.53 MiB/s, done.
Resolving deltas: 100% (632/632), completed with 338 local objects.
   3373a08c..a26d0c59  main                                       -> upstream/main
 * [new branch]        pdksync_maint/main/deprecate_rhel_5_family -> upstream/pdksync_maint/main/deprecate_rhel_5_family
 * [new branch]        pdksync_maint/main/deprecate_sles11        -> upstream/pdksync_maint/main/deprecate_sles11
 * [new branch]        pdksync_maint/main/perform_pdk_update      -> upstream/pdksync_maint/main/perform_pdk_update
 + 1152b998...b5afe245 pdksync_remove_puppet5                     -> upstream/pdksync_remove_puppet5  (forced update)
   9492b860..f11132d6  release                                    -> upstream/release
 * [new tag]           v7.0.0                                     -> v7.0.0
$ git checkout v6.6.0
M       spec/type_aliases/unixpath_spec.rb
M       types/unixpath.pp
Note: checking out 'v6.6.0'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by performing another checkout.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -b with the checkout command again. Example:

  git checkout -b <new-branch-name>

HEAD is now at 9492b860 Merge pull request #1158 from sanfrancrisko/release_prep_v6_6_0
$ cd ~/git/puppet
$ git fetch --all ; git reset --hard origin/production 
Fetching origin
remote: Counting objects: 6903, done
remote: Finding sources: 100% (20/20)
remote: Getting sizes: 100% (8/8)
remote: Total 20 (delta 9), reused 18 (delta 9)
Unpacking objects: 100% (20/20), done.
From ssh://
   0e6f06938a..e7d94c968e  production -> origin/production
HEAD is now at e7d94c968e mariadb: Productionize db2145
$ rm -rf modules/stdlib/
$ cp -r ~/git/puppetlabs-stdlib modules/stdlib
$ git add modules/stdlib/
$ git commit -m 'stdlib: update stdlib to v6.6.0'
[production 536e4aafd4] stdlib: update stdlib to v6.6.0
 2 files changed, 2 insertions(+), 1 deletion(-)
$ git-review


VIM guidelines

The following in ~/.vim/ftdetect/puppet.vim can help with a lot of formatting errors

" detect puppet filetype
autocmd BufRead,BufNewFile *.pp set filetype=puppet
autocmd BufRead,BufNewFile *.pp setlocal tabstop=4 shiftwidth=4 softtabstop=4 expandtab textwidth=80 smarttab

And for proper syntax highlighting, the following can be done:

$ sudo aptitude install vim-puppet
$ mkdir -p ~/.vim/syntax
$ cp /usr/share/vim/addons/syntax/puppet.vim ~/.vim/syntax/

And definitely have a look at the vim plugin which will report puppet errors directly in your buffer whenever you save the file (works for python/php etc as well).

Of course symlinks can be used, or you can just install vim-addon-manager to manage plugins. vim-puppet provides ftplugin and indent plugins as well. Maybe they are worth the time, but it is up to each user to decide.

Emacs guidelines

Syntax Highlighting

The elpa-puppet-mode deb package can be used for emacs syntax highlighting, or the raw emacs libraries can be found here.

elpa-puppet-mode - Emacs major mode for Puppet manifests

The following two sections can be added to a .emacs file to help with 4-space indentation and trailing whitespace.

;; Puppet config with 4 spaces
(setq puppet-indent-level 4)
(setq puppet-include-indent 4)

;; Remove Trailing Whitespace on Save
(add-hook 'before-save-hook 'delete-trailing-whitespace)

Puppet for Wikimedia VPS Projects

There is currently only one Puppet repository, and it is applied both to Cloud VPS and production instances. Classes may be added to the Operations/Puppet repository that are only intended for use on Cloud VPS instances. The future is uncertain, though: code is often reused, and Cloud VPS services are sometimes promoted to production. For that reason, any changes made to Operations/Puppet must be held to the same style and security standards as code that would be applied on a production service.

Packages that use pip, gem, etc.

Other than mediawiki and related extensions, any software installed by puppet should be a Debian package and should come either from the WMF apt repo or from an official upstream Ubuntu repo. Never use source-based packaging systems like pip or gem as these options haven't been properly evaluated and approved as secure by WMF Operations staff.

Different ways to install apt packages with puppet

There are several different ways to install concrete apt packages with puppet, each with pros and cons.

At WMF we recommend the usage of the puppet stdlib function ensure_packages when possible, which will simplify many of the typical usages (see task T266479 ):

 ensure_packages(['python3-git', 'python3-pynetbox', 'python3-requests'])

The following is a discussion of the options of the lower-level package resource, for more specialized uses.

In all cases, you need a repo configured in the host that provides this package.

method 1: ensure present

The most basic method is to declare a package like this:

package { 'htop':
    ensure => present,
}

Puppet will ensure the package is present (installed). If the package is already installed, puppet will do nothing. If it isn't installed, puppet will install the proper candidate (as reported by apt-cache policy <package>).


Pros:

  • simple and straightforward.
  • allows manually modifying the version of the package on the system (i.e., a manual upgrade of the package)
  • good for security management (i.e., security package upgrades)

Cons:

  • not very smart. If you need a concrete package version, this method may not be enough.

method 2: ensure latest

Using this method, puppet will ensure that the most recent package is installed.

package { 'htop':
    ensure => latest,
}

Puppet will not only ensure the package is present, but also that the highest available version is installed (as reported by apt-cache policy <package>). For packages that are frequently updated, this means that puppet will upgrade them whenever a new version appears, without further review.


Pros:
  • simple and straightforward.
  • keeps the most recent version of the package installed, without further action by the administrator.
  • security package upgrades will be done automatically in most cases.


Cons:
  • may be too smart. Uncontrolled package upgrades may lead to unexpected results, especially for packages that ship services (the upgrade will likely restart them).
  • may clash with manual package upgrades by the administrator (i.e., running a manual upgrade on the server).
  • results are hard to predict if you add different repos to the system that contain different versions of the same package.
  • since automatic upgrades may have unexpected results, this should be done only for trusted packages from trusted repos, or in other special cases.

method 3: ensure version

Using this method, puppet will ensure that a specific version of the package is installed.

package { 'htop':
    ensure => 'x.y.z',
}

This overrides the candidate version reported by apt-cache policy <package>.


Pros:
  • simple and straightforward.
  • a way to get a specific version of a package installed.


Cons:
  • requires a puppet patch whenever the package version needs to change (for example, for security management).
  • if installing packages from different repos, the resolver may find clashes with the dependencies of other packages.
  • needs an exact version specification, which is not flexible enough for some cases.
  • this is not the preferred mechanism for getting a specific version installed (see below). It may revert package security upgrades if the puppet code wasn't patched accordingly.

method 4: install options

You can pass specific install options to apt by means of the package declaration:

package { 'htop':
    ensure          => <whatever>,
    install_options => ['-t', 'stretch-backports'],
}

In this example, the effect is the same as passing -t stretch-backports to apt, i.e., temporarily giving stretch-backports maximum priority when the resolver calculates candidates.

You can combine this option with any 'ensure' option.


Pros:
  • simple way to override the default resolver behaviour.
  • allows for manual package installations.
  • TODO: fill me.


Cons:
  • if using ensure present, this is not enough for puppet to upgrade the package if it was previously installed (for example, by debian-installer).
  • TODO: fill me.

method 5: apt pinning

You can create an apt pin and let the apt resolver use it when dealing with the package.

apt::pin { 'my_pinning':
    package  => 'htop',
    pin      => 'component a=jessie-backports',
    priority => 1002,
    before   => Package['htop'],
}

package { 'htop':
    ensure => present,
}

This method creates a pinning configuration file in /etc/apt/preferences.d/. The pinning affects how the resolver behaves for that package (check the result with apt-cache policy <package>).

The pin can target a component, a release or a version. You can combine this method with any 'ensure' option. Note that with 'present', if the package is already installed but does not meet the pinning requirement, puppet won't reinstall it (because it is already present).


Pros:
  • this is the preferred way of getting a given package version installed.
  • if using version pinning, the version specification is more flexible, allowing wildcards like x.y.z*.
  • TODO: fill me.


Cons:
  • may require a puppet patch whenever the package version needs to change (if using version pinning rather than component pinning), for example for security management.
  • if installing packages from different repos, the resolver may find clashes with the dependencies of other packages.

method 6: combination

You can combine any of the options above. You probably don't need this method except in very specific situations.

apt::pin { 'my_pinning':
    package  => 'htop',
    pin      => 'version x.y.z*',
    priority => 1002,
    before   => Package['htop'],
}

package { 'htop':
    ensure          => latest,
    install_options => ['-t', 'stretch-backports'],
}


Pros:
  • the most expressive way to do almost anything regarding package installation.
  • if using version pinning, the version specification is more flexible, allowing wildcards like x.y.z*.
  • TODO: fill me.


Cons:
  • requires a puppet patch whenever the package version needs to change (for example, for security management).
  • if installing packages from different repos, the resolver may find clashes with the dependencies of other packages.
  • manual package upgrades may create a ping-pong between your manual operation and the puppet run.
  • this is not the preferred mechanism for getting a specific version installed (see above). It may revert package security upgrades if the puppet code wasn't patched accordingly.

method 7: apt configuration before installation

In some cases it is useful to create a given apt repo/pinning configuration and then install packages in the same puppet agent run. This is a complex case, so we have created a custom resource, apt::package_from_component:

apt::package_from_component { 'librenms_php72':
    component => 'component/php72',
    distro    => 'buster-wikimedia',
    packages  => ['php-cli', 'php-curl', 'php-gd'],
}


Pros:
  • TODO: fill me.


Cons:
  • TODO: fill me.
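
To illustrate why method 7 is a complex case, the sketch below shows the ordering that has to hold within a single agent run: the apt configuration is written first, the package index is refreshed, and only then is the package installed. It uses generic Puppet resources with hypothetical names and a hypothetical pin; it is an assumption about the general pattern, not the actual implementation of apt::package_from_component.

```puppet
# Illustrative only: the file name, exec title and pin are hypothetical.
file { '/etc/apt/preferences.d/example_htop.pref':
    ensure  => file,
    content => "Package: htop\nPin: release a=stretch-backports\nPin-Priority: 1002\n",
    # Refresh the package index whenever the pin changes.
    notify  => Exec['example_apt_update'],
}

exec { 'example_apt_update':
    command     => '/usr/bin/apt-get update',
    # Run only when notified, not on every agent run.
    refreshonly => true,
}

package { 'htop':
    ensure  => present,
    # Ordered after the index refresh, so the resolver
    # already sees the new configuration.
    require => Exec['example_apt_update'],
}
```

Without the explicit notify and require relationships, Puppet could install the package before the new apt configuration takes effect, and a second agent run would be needed to converge.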