You are browsing a read-only backup copy of Wikitech. The primary site can be found at wikitech.wikimedia.org

DNS/Netbox: Difference between revisions

imported>Elukey
imported>Volans
(→‎DNS records involved: Add section on missing AAAA records)
(11 intermediate revisions by 4 users not shown)
Line 1: Line 1:
IP allocation is moving to [[Netbox]], that will be our IPAM, and DNS records will be automatically generated from Netbox data.
IP allocation is done via [[Netbox]], that is our IPAM, and DNS records of IPs allocated in Netbox are automatically generated from Netbox data.


== Infrastructure ==
== Infrastructure ==
Line 7: Line 7:
* Netbox data is checked out on the authoritative DNS servers in <code>/srv/git/netbox_dns_snippets</code>.
* Netbox data is checked out on the authoritative DNS servers in <code>/srv/git/netbox_dns_snippets</code>.
* When compiling the <code>gdnsd</code> final zones, the Netbox data is copied into <code>/etc/gdnsd/zones/netbox</code> for later inclusion.
* When compiling the <code>gdnsd</code> final zones, the Netbox data is copied into <code>/etc/gdnsd/zones/netbox</code> for later inclusion.
* In the actual zonefiles, within an <code>$ORIGIN</code>, the related snippet file is included using the <code>$INCLUDE</code> directive.
* In the actual zonefiles, within an <code>$ORIGIN</code> , the related snippet file auto-generated from Netbox data is included using the <code>$INCLUDE</code> directive.


== IP Allocation ==
== IP Allocation ==


The migration to the automated system requires that we move the allocation of IPs to Netbox that will gradually become the authoritative source of truth for IPAM.
Netbox is currently the IPAM for the production infrastructure and IP allocation must be done in Netbox unless it's on some very specific subnets that are exceptions and not managed within Netbox for now. The frack hosts are exempt from this as their IPs are not tracked in Netbox.


=== Cutoff dates ===
As historical data, the management IP address allocation has been migrated to Netbox on Wed. June 24th 2020 and the primary IPs address allocation for production infrastructure devices has been migrated to Netbox on Sep. 14th 2020.


* '''[Wednesday June 24th 2020 10:00am UTC]''' '''All''' the management IP address allocation will be performed in Netbox from now on. Either via the [https://netbox.wikimedia.org/extras/scripts/interface_automation/AssignIPs/ Add interfaces and IPs to devices] Netbox script for provisioning new devices or manually via the ''Add an IP Address'' button in the ''IP Addresses'' tab of any IP Prefix that will assign the first available IP in that subnet. The [https://netbox.wikimedia.org/extras/scripts/offline_device/OfflineDevice/ Offline a device with extra actions] Netbox script takes instead care of the removal of interfaces and IPs when setting a device offline.
== DNS records involved ==
* '''[Monday September 14th 2020 11:00am UTC]''' '''All''' IP address allocation except frack (Fundraising-tech) will be performed in Netbox from now on.
The following records are managed by Netbox in all data centres (except the frack infrastructure):
*Management forward (<code>A</code>) and reverse (<code>PTR</code>) records for both the hostname (<code>foo.mgmt.eqiad.wmnet</code>) and the asset tag (<code>wmf1234.mgmt.eqiad.wmnet</code>)
* Primary IPv4 (<code>A</code> ) and IPv6 (<code>AAAA</code> ) and related reverse (<code>PTR</code> ) records for the hostname (<code>foo.eqiad.wmnet</code> or <code>foo.wikimedia.org</code> )
*Any additional IP in Netbox that has the ''DNS Name'' property set.


== DNS records involved ==
=== Missing AAAA records for primary IPv6 ===
Although we're aiming for all hosts to have <code>A/AAAA</code> DNS records for their primary IPs, some clusters do not yet fully support IPv6 and as such do not have the DNS Name set in Netbox for their primary IPv6 address. Although not ideal, that's still considered ok as long as there is a mid/long term plan to remove the technical debt and make the cluster IPv6 compatible. If you manage one of those clusters, make sure to '''always''' notify DCOps of the fact that the <code>AAAA</code> records should be skipped when provisioning new hosts to prevent their addition.


* Management forward (<code>A</code>) and reverse (<code>PTR</code>) records for both the hostname (<code>foo.mgmt.eqiad.wmnet</code>) and the asset tag (<code>wmf1234.mgmt.eqiad.wmnet</code>)
==== Mixed clusters ====
* Primary IPv4 (<code>A</code>) and IPv6 (<code>AAAA</code>) and related reverse (<code>PTR</code>) records for the hostname (<code>foo.eqiad.wmnet</code> or <code>foo.wikimedia.org</code>)
A separate, more dangerous case is when a given cluster is mixed, having some hosts with the <code>AAAA</code> records for their primary IPv6 and some without. This is an indication that the cluster was not fully supporting IPv6 in the past and some hosts now have their primary IPv6 with the <code>AAAA</code> records, potentially because DCOps was not notified to skip the <code>AAAA</code> assignment at provision time. In this case there are two possible options:


=== Active ===
# The hosts that have <code>AAAA</code> records are mis-configured, clients are trying to connect via IPv6, fail and have to retry on IPv4, and the <code>AAAA</code> records should be removed. In this case get in touch with [[SRE/Infrastructure Foundations/Contact]] for their removal.
==== Management ====
# The hosts that have <code>AAAA</code> records are functioning correctly, meaning that the cluster actually supports IPv6 and we could just add the missing <code>AAAA</code> records to the remaining hosts. In this case follow [[DNS/Netbox#Add_missing_DNS_name_to_the_primary_IPv6_address]].
* <code>ulsfo</code>
* <code>eqsin</code>
* <code>esams</code>
* <code>frack</code> in <code>codfw</code>
* <code>frack</code> in <code>eqiad</code>
* <code>codfw</code>
* <code>eqiad</code>
==== Primary IPs ====
*<code>ulsfo</code>
*<code>eqsin</code>
*<code>esams</code>
*<code>eqiad</code>
=== To be migrated ===
==== Management ====
* NONE, all migrated
==== Primary IPs ====
* <code>codfw</code>


== Operations ==
== Operations ==
Line 63: Line 50:
=== Convert an hardcoded $ORIGIN to Netbox ===
=== Convert an hardcoded $ORIGIN to Netbox ===
This is an [https://gerrit.wikimedia.org/r/c/operations/dns/+/585545 example patch] to convert an hardcoded <code>$ORIGIN</code> to the dynamically generated data.
This is an [https://gerrit.wikimedia.org/r/c/operations/dns/+/585545 example patch] to convert an hardcoded <code>$ORIGIN</code> to the dynamically generated data.
=== Check if any automatically generated zone file is not included ===
This is the way to check if any of the automatically generated zone files is not <code>$INCLUDE</code>ed in the <code>operations/dns</code> repository. From a local checkout of the automatically generated repository (see [[Netbox#DNS]]), assuming that the <code>operations/dns</code> repository is checked out in <code>../dns/</code>, run:<syntaxhighlight lang="bash">
for file in $(ls *); do git -C ../dns/ grep "INCLUDE" | grep -q "netbox/${file}" || echo "missing ${file}"; done
</syntaxhighlight>Some outstanding files are expected at the moment, namely related to SVC, wikimediacloud, and frack records.
You can use this [https://github.com/topranks/random_wmf/blob/main/netbox_scripts/gen_zones.py script] to help generate reverse zonefile includes and submit a CR to the dns repo to add them.


=== Atomically deploy auto-generated records and a manual change  ===
=== Atomically deploy auto-generated records and a manual change  ===
Line 70: Line 64:
#* CI will fail if there is any <code>$INCLUDE</code> of files not yet existing in the generated data, that's expected
#* CI will fail if there is any <code>$INCLUDE</code> of files not yet existing in the generated data, that's expected
# Modify the data in Netbox
# Modify the data in Netbox
# Run the <code>sre.dns.netbox</code> cookbook as described above in [[DNS/Netbox#Update_generated_records]] '''adding the option --skip-authdns-update'''
# Run the <code>sre.dns.netbox</code> cookbook as described above in [[DNS/Netbox#Update_generated_records]] '''adding the option <code>--skip-authdns-update</code>'''
# Comment ''recheck'' in the CR sent for the <code>operations/dns</code> repository, CI should now pass
# Comment ''recheck'' in the CR sent for the <code>operations/dns</code> repository, CI should now pass
# Merge and deploy the patch, once deployed it will include also the generated data that was pushed but not deployed by the cookbook, making the change atomical from the DNS point of view.
# Merge and deploy the patch, once deployed it will include also the generated data that was pushed but not deployed by the cookbook, making the change atomic from the DNS point of view.


The above procedure should be run all together without let too much time pass between each step and it will be wise to ask in the various SRE channels to refrain during this operation from running <code>authdns-update</code> or any of the cookbooks that in turn run the <code>sre.dns.netbox</code> one (as of Oct. 2020 <code>sre.hosts.decommission</code> and <code>sre.ganeti.makevm</code>) or the <code>sre.dns.netbox</code> cookbook itself.
The above procedure should be run all together without let too much time pass between each step and it will be wise to ask in the various SRE channels to refrain during this operation from running <code>authdns-update</code> or any of the cookbooks that in turn run the <code>sre.dns.netbox</code> one (as of Oct. 2020 <code>sre.hosts.decommission</code> and <code>sre.ganeti.makevm</code>) or the <code>sre.dns.netbox</code> cookbook itself.
Line 79: Line 73:


=== Modify the generated data in an emergency ===
=== Modify the generated data in an emergency ===
To modify the generated data in an emergency it's possible just running the <code>sre.dns.netbox</code> cookbook as described above in [[DNS/Netbox#Update_generated_records]] '''adding the option --emergency-manual-edit'''.
To modify the generated data in an emergency it's possible to just run the <code>sre.dns.netbox</code> cookbook as described above in [[DNS/Netbox#Update_generated_records]] '''adding the option <code>--emergency-manual-edit</code>'''.


After the generation of the data the cookbook will stop and prompt the user to make the modifications, something like:<syntaxhighlight lang="text">
After the generation of the data the cookbook will stop and prompt the user to make the modifications, something like:<syntaxhighlight lang="text">
Line 86: Line 80:
Then run "git log --pretty=oneline -1" and copy the new SHA1 of HEAD
Then run "git log --pretty=oneline -1" and copy the new SHA1 of HEAD
</syntaxhighlight>'''N.B.: any subsequent run of the cookbook will try to revert the manual changes, make all SREs aware of the emergency situation.'''
</syntaxhighlight>'''N.B.: any subsequent run of the cookbook will try to revert the manual changes, make all SREs aware of the emergency situation.'''
=== Add missing DNS name to the primary IPv6 address ===
Adding a missing DNS name to a primary IPv6 address is simple, but if it needs to be done on many hosts, it can be done programmatically, see below.
Before proceeding make sure that the host does fully support IPv6, all services are listening also on the IPv6 address, any ACL/grant/ferm rule is also setup for the IPv6 space. Keep in mind that most clients will default to connect via IPv6 once the AAAA records are live.
==== Single host ====
# Find the host's primary IPv6 address in Netbox, click edit and fill the DNS Name field with the FQDN of the host. To double check verify that the host primary IPv4 has the same DNS Name.
# Run the <code>sre.dns.netbox</code> cookbook as described above in [[DNS/Netbox#Update_generated_records]]
==== Multiple hosts ====
When the addition has been tested on few hosts and you're ready to convert multiple hosts:
# Get in touch with the [[SRE/Infrastructure Foundations/Contact]] so that the DNS names can be added programmatically in Netbox running something like [[phab:T271143#7953387|this]].
# Run the <code>sre.dns.netbox</code> cookbook as described above in [[DNS/Netbox#Update_generated_records]]


== Transition FAQ ==
== Transition FAQ ==
Line 93: Line 103:


* I '''never read or contribute''' to this repository:
* I '''never read or contribute''' to this repository:
** you're '''not affected''' and nothing will change for you. You can stop reading here.
** you're '''not affected''' and nothing has changed for you. You can stop reading here.
* I '''sometimes read or search''' for things in this repository:
* I '''sometimes read or search''' for things in this repository:
** you're '''marginally''' '''affected''' as the manual records will gradually disappear from the operations/dns repository to be replaced by the auto-generated files. You can search directly in [https://netbox.wikimedia.org/ Netbox]. If you want to see directly the content of the generated files you can clone the auto-generated repository to read or search in it following the instructions in [[Netbox#DNS]]. You can optionally read the rest of the document.
** you're '''marginally''' '''affected''' as the manual records were moved from the <code>operations/dns</code> repository and replaced by the auto-generated files. You can search directly in [https://netbox.wikimedia.org/ Netbox]. If you want to see the raw content of the generated files you can clone the auto-generated repository to read or search in it following the instructions in [[Netbox#DNS]]. You can optionally read the rest of the document.
* I '''contribute''' to the repository:
* I '''contribute''' to the repository:
** '''you're affected''' and should keep reading this FAQ section and the rest of the document.
** '''you're affected''' and should keep reading this FAQ section and the rest of the document.


=== What is changing ===
=== What has changed ===


* IP allocation that is currently done manually as part of the DNS record definition in the DNS repository zone files is moving to [[Netbox]], which will be our [[:en:IP_address_management|IPAM]] tool. This transition will be done all at once to ensure consistency. Only Fundraising-tech (frack) non-mgmt records will be left out of this transition.
* IP allocation is now done directly on [[Netbox]] , that is our [[:en:IP_address_management|IPAM]] tool. Only Fundraising-tech (frack) non-mgmt records were left out of this transition.
** The '''cutoff date for all remaining IP allocation except frack to be moved to Netbox is Monday September 14th around 11:00am UTC'''.
** All new host's primary IPv4/IPv6 are automatically assigned to them at provision time.
** All existing IPs except for frack ones will be automatically imported into Netbox prior to the cutoff time (a sneak peak can be found in [https://netbox-next.wikimedia.org netbox-next.wikimedia.org]).
** Additional IPs require manual allocation in Netbox [see below] unless they are for Cassandra instances, that use case is already covered by the provisioning script.
** The changes in the Server Lifecycle procedure are outlined in the [[Server_Lifecycle/DNS_Transition]] page and DCOps is up to speed with the process.
* The automatic DNS record generation (see above [[DNS/Netbox#Update_generated_records]] ) generates all of the records present in Netbox.
** After that date all IPs except frack non-mgmt ones '''must''' be allocated in Netbox '''prior''' to assigning them a DNS record in the DNS repository.
 
** All new host's primary IPv4/IPv6 will be automatically assigned to them at provision time.
=== Why some IPv6 do not have a related AAAA/PTR DNS records? ===
** Additional IPs will require manual allocation in Netbox [see below]
Some clusters do not (yet?) support IPv6 on their infrastructure <code>$hostname.$dc.$wmnet / $hostname.wikimedia.org</code> addresses. For those, when the data was imported into Netbox, to keep the existing behaviour, the <code>DNS Name</code> field of the related IPv6 address in Netbox has been left empty. That means that at generation time only the IPv4 A/PTR records are generated and no AAAA/PTR record is generated for the IPv6.
** Right after the cutoff time, all newly allocated IPs '''will still need a manual patch in the operations/dns repository''' until their zone has been migrated [see below].
* The automatic DNS record generation (see above [[DNS/Netbox#Update_generated_records]]) generates all of the records present in Netbox, but they will be included in the DNS repository and hence in production on a per-<code>$ORIGIN</code> basis, which will not be rolled out simultaneously:
** If a given <code>$ORIGIN</code> '''has been migrated''' to the automated zone file, updating Netbox and running the cookbook will change the DNS records.
** If a given <code>$ORIGIN</code> '''has not yet been migrated''' to the automated zone file, a manual change to the DNS repository which adds the record in question is still needed after the Netbox allocation.
** To check if an <code>$ORIGIN</code> has been migrated, just look for a <code>$INCLUDE netbox/zone_name</code> line right below the <code>$ORIGIN</code> line.


=== Who can I ping for questions? ===
=== Who can I ping for questions? ===
For questions, concerns or comments please get in touch with [[User:CRusnov|Cas]] or [[User:Volans|Riccardo]]. If unable to find either of us get in touch with the SRE Infrastructure Foundations team.
For questions, concerns or comments please get in touch with the  [[SRE/Infrastructure Foundations/Contact]].


=== What to do if the diff has spurious changes? ===
=== What to do if the diff has spurious changes? ===
Line 123: Line 128:


==== Physical hosts ====
==== Physical hosts ====
The management, primary IPv4 and primary IPv6 for any new physical host will be automatically assigned at provision time by DCOps running a [https://netbox.wikimedia.org/extras/scripts/interface_automation/AssignIPs/ Netbox script], see also [[Server_Lifecycle/DNS_Transition#Provisioning_2]].  
The management, primary IPv4 and primary IPv6 for any new physical host will be automatically assigned at provision time by DCOps running a [https://netbox.wikimedia.org/extras/scripts/interface_automation/ProvisionServerNetwork/ Netbox script].  


==== Virtual machines ====
==== Virtual machines ====
For Ganeti virtual machines the <code>sre.ganeti.makevm</code> cookbook has been updated to take care of the new workflow automatically. During the transition phase, if needed, it will prompt the user to create a manual DNS patch with the newly pre-allocated IPs.
For Ganeti virtual machines the <code>sre.ganeti.makevm</code> cookbook takes care of the new workflow automatically.


=== How to manually allocate a special purpose IP address in Netbox ===
=== How to manually allocate a special purpose IP address in Netbox ===
Line 134: Line 139:
# Search for the correct VLAN based on datacenter, type, row (if applicable), etc.
# Search for the correct VLAN based on datacenter, type, row (if applicable), etc.
# Click on the desired prefix (v4 or v6) in the ''Prefixes'' column for that VLAN
# Click on the desired prefix (v4 or v6) in the ''Prefixes'' column for that VLAN
#*If you are setting up a new LVS service, use the prefix for LVS service IPs:  [https://netbox.wikimedia.org/ipam/prefixes/92/ip-addresses/ codfw], [https://netbox.wikimedia.org/ipam/prefixes/93/ip-addresses/ eqiad]
# Click on the ''IP Addresses'' tab in the prefix page
# Click on the ''IP Addresses'' tab in the prefix page
# Click on the ''Add an IP Address'' green button on the top-right, Netbox will automatically select the first available IP in that subnet
# Click on the ''Add an IP Address'' green button on the top-right, Netbox will automatically select the first available IP in that subnet
#* Make sure to '''change the netmask''' to /32 in the address field.
#* Make sure to '''change the netmask''' to /32 in the address field for IPv4.
#* To create an IPv6 that is a mapped version of an existing IPv4, modify the ''Address'' field at the top to override the automatically selected address.
#* To create an IPv6 that is a mapped version of an existing IPv4, modify the ''Address'' field at the top to override the automatically selected address, set the netmask to /128
#* If this is a VIP, make sure you get the same last octect in both eqiad and codfw datacentres
#* If this is a VIP, make sure you get the same last octect in both eqiad and codfw datacentres
# Select the relevant Role (VIP for LVS, anycast, etc.)
# Select the relevant Role (VIP for LVS, anycast, etc.)
Line 143: Line 149:
# Select the Tenant if applicable (FR-Tech, RIPE)
# Select the Tenant if applicable (FR-Tech, RIPE)
# Click on the ''Create'' blue button at the bottom
# Click on the ''Create'' blue button at the bottom
 
#Once all Netbox changes are completed follow the instructions at [[DNS/Netbox#Update_generated_records]]
=== When will the zones be migrated to the new auto-generated files? ===
#For VIPs that are in the <code>*.svc.*</code> zones, until [[phab:T270071|T270071]] will be solved, it is also required to perform a manual change in the <code>operations/dns</code> repository to add the IP manually. This is the one that will be used, the automatically generated records for <code>*.svc.*</code> zones are not currently used.
Right after the cutoff date the migration will start on a per-<code>$ORIGIN</code> basis. We plan to migrate all related zones within a month after the cutoff date.
[[Category:Wikimedia infrastructure]]
[[Category:Wikimedia infrastructure]]
[[Category:SRE Infrastructure Foundations]]

Revision as of 08:48, 7 July 2022

IP allocation is done via Netbox, which is our IPAM, and the DNS records for IPs allocated in Netbox are automatically generated from Netbox data.

Infrastructure

  • IP allocation is done on Netbox.
  • Netbox data is exported via Netbox#DNS.
  • Netbox data is checked out on the authoritative DNS servers in /srv/git/netbox_dns_snippets.
  • When compiling the gdnsd final zones, the Netbox data is copied into /etc/gdnsd/zones/netbox for later inclusion.
  • In the actual zonefiles, within an $ORIGIN, the related snippet file auto-generated from Netbox data is included using the $INCLUDE directive.
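
To see which parts of a zonefile already pull in Netbox-generated snippets, the $ORIGIN and $INCLUDE lines can be paired up. The following is a minimal sketch, not part of the official tooling: the function name list_netbox_includes is hypothetical, and it assumes the include lines have the form "$INCLUDE netbox/<snippet>" placed below the $ORIGIN they belong to.

```bash
#!/usr/bin/env bash
# Print each $ORIGIN that includes a Netbox snippet, followed by the snippet path.
# Assumption: include lines look like "$INCLUDE netbox/<snippet>".
list_netbox_includes() {
    local zonefile="$1"
    awk '
        # Remember the most recent $ORIGIN seen in the file.
        $1 == "$ORIGIN" { origin = $2 }
        # Report only includes that point into the netbox/ directory.
        $1 == "$INCLUDE" && $2 ~ /^netbox\// { print origin, $2 }
    ' "$zonefile"
}
```

Called as list_netbox_includes path/to/zonefile, it prints one "origin snippet" pair per line; an $ORIGIN that never appears in the output has no Netbox include.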

IP Allocation

Netbox is currently the IPAM for the production infrastructure and IP allocation must be done in Netbox unless it's on some very specific subnets that are exceptions and not managed within Netbox for now. The frack hosts are exempt from this as their IPs are not tracked in Netbox.

For historical reference, management IP address allocation was migrated to Netbox on Wed. June 24th 2020, and primary IP address allocation for production infrastructure devices was migrated to Netbox on Sep. 14th 2020.

DNS records involved

The following records are managed by Netbox in all data centres (except the frack infrastructure):

  • Management forward (A) and reverse (PTR) records for both the hostname (foo.mgmt.eqiad.wmnet) and the asset tag (wmf1234.mgmt.eqiad.wmnet)
  • Primary IPv4 (A) and IPv6 (AAAA) and related reverse (PTR) records for the hostname (foo.eqiad.wmnet or foo.wikimedia.org)
  • Any additional IP in Netbox that has the DNS Name property set.

Missing AAAA records for primary IPv6

Although we're aiming for all hosts to have A/AAAA DNS records for their primary IPs, some clusters do not yet fully support IPv6 and as such do not have the DNS Name set in Netbox for their primary IPv6 address. Although not ideal, that's still considered ok as long as there is a mid/long term plan to remove the technical debt and make the cluster IPv6 compatible. If you manage one of those clusters, make sure to always notify DCOps of the fact that the AAAA records should be skipped when provisioning new hosts to prevent their addition.

Mixed clusters

A separate, more dangerous case is when a given cluster is mixed, having some hosts with the AAAA records for their primary IPv6 and some without. This is an indication that the cluster was not fully supporting IPv6 in the past and some hosts now have their primary IPv6 with the AAAA records, potentially because DCOps was not notified to skip the AAAA assignment at provision time. In this case there are two possible options:

  1. The hosts that have AAAA records are misconfigured: clients try to connect via IPv6, fail, and have to retry on IPv4. The AAAA records should be removed. In this case get in touch with SRE/Infrastructure Foundations/Contact for their removal.
  2. The hosts that have AAAA records are functioning correctly, meaning that the cluster actually supports IPv6 and we could just add the missing AAAA records to the remaining hosts. In this case follow DNS/Netbox#Add_missing_DNS_name_to_the_primary_IPv6_address.
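
A quick way to spot a mixed cluster is to ask DNS, for every host in it, whether an AAAA record exists. The sketch below is only an illustration: the function name aaaa_report and the DIG_CMD override are hypothetical, and it relies on the standard dig +short AAAA query.

```bash
#!/usr/bin/env bash
# Report, for each FQDN given, whether an AAAA record resolves.
# DIG_CMD is a hypothetical override hook (useful for testing); it defaults to dig.
DIG_CMD="${DIG_CMD:-dig}"

aaaa_report() {
    local fqdn answer
    for fqdn in "$@"; do
        # +short prints only the record data, or nothing if there is no record.
        answer="$("$DIG_CMD" +short AAAA "$fqdn")"
        if [ -n "$answer" ]; then
            echo "has-aaaa ${fqdn}"
        else
            echo "no-aaaa ${fqdn}"
        fi
    done
}
```

If the output for one cluster contains both has-aaaa and no-aaaa lines, the cluster is mixed and one of the two options above applies.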

Operations

Update generated records

To update the dynamically generated records based on the current Netbox data and deploy them to all the authoritative DNS servers, the sre.dns.netbox cookbook must be run. The cookbook must be run anytime records are changed in Netbox. See also Cookbooks#Cookbook_Operations. For example:

 sudo cookbook sre.dns.netbox -t T12345 "Add newly racked cp hosts in eqiad"

An Icinga check alerts if changes in Netbox are not committed after a while, see Monitoring/Netbox_DNS_uncommitted_changes for troubleshooting.

If, when running the cookbook, the presented diff shows changes unrelated to your work, follow the instructions in Monitoring/Netbox_DNS_uncommitted_changes#What_to_do.

Force update generated records

It might happen that one or more authdns hosts fail to run authdns-update, leading to an inconsistent state. The sre.dns.netbox cookbook offers a --force option, which takes as input the SHA of the git commit that all authdns servers should be synced to. To find the SHA, run:

ssh netbox.wikimedia.org
sudo -i
cd /srv/netbox-exports/dns.git
git log -1
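
Because a mistyped or abbreviated SHA is easy to paste by accident, it can help to validate it before building the cookbook invocation. This is a hedged sketch: force_sync_command is a hypothetical helper that only prints the command, and the argument order and trailing message are assumptions modelled on the regular cookbook example above.

```bash
#!/usr/bin/env bash
# Build (but do not run) the cookbook command that force-syncs authdns hosts.
# Assumption: --force is followed by the SHA, then the usual message argument.
force_sync_command() {
    local sha="$1" message="$2"
    # Reject empty input and anything containing non-hexadecimal characters.
    case "$sha" in
        *[!0-9a-f]*|"") echo "invalid SHA: ${sha}" >&2; return 1 ;;
    esac
    # Accept only a full 40-character SHA-1.
    if [ "${#sha}" -ne 40 ]; then
        echo "invalid SHA length: ${sha}" >&2
        return 1
    fi
    printf 'sudo cookbook sre.dns.netbox --force %s "%s"\n' "$sha" "$message"
}
```

The full SHA can be obtained without the rest of the log output with git log -1 --format=%H in the same directory.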

Convert a hardcoded $ORIGIN to Netbox

This is an example patch to convert a hardcoded $ORIGIN to the dynamically generated data.

Check if any automatically generated zone file is not included

To check whether any of the automatically generated zone files is not included via $INCLUDE in the operations/dns repository, run the following from a local checkout of the automatically generated repository (see Netbox#DNS), assuming that the operations/dns repository is checked out in ../dns/:

for file in *; do git -C ../dns/ grep "INCLUDE" | grep -q "netbox/${file}" || echo "missing ${file}"; done

Some outstanding files are expected at the moment, namely related to SVC, wikimediacloud, and frack records.

You can use this script to help generate reverse zonefile includes and submit a CR to the dns repo to add them.

Atomically deploy auto-generated records and a manual change

When a change in the generated Netbox data also requires a simultaneous change in the manually maintained operations/dns repository, follow this procedure:

  1. Prepare the patch for the operations/dns repository and send it for review
    • CI will fail if there is any $INCLUDE of files not yet existing in the generated data, that's expected
  2. Modify the data in Netbox
  3. Run the sre.dns.netbox cookbook as described above in DNS/Netbox#Update_generated_records adding the option --skip-authdns-update
  4. Comment recheck in the CR sent for the operations/dns repository, CI should now pass
  5. Merge and deploy the patch. Once deployed, it will also include the generated data that was pushed but not deployed by the cookbook, making the change atomic from the DNS point of view.

The above procedure should be run in one go, without letting too much time pass between steps. It is wise to ask in the various SRE channels that, for the duration of the operation, nobody runs authdns-update, the sre.dns.netbox cookbook itself, or any of the cookbooks that in turn run it (as of Oct. 2020 sre.hosts.decommission and sre.ganeti.makevm).

As an example, the above procedure was used when a new prefix was created and, as a result, the generated data moved from one file to another; see operations/dns/+/632953.
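
The command-line part of the procedure can be summarised as a printed checklist. This is a sketch only: atomic_deploy_plan is a hypothetical helper that executes nothing, the task ID and message are placeholders, and it assumes that the merged operations/dns change is deployed by running authdns-update.

```bash
#!/usr/bin/env bash
# Print the command sequence for an atomic Netbox + operations/dns change.
# Nothing is executed; the task ID and message are caller-supplied placeholders.
atomic_deploy_plan() {
    local task="$1" message="$2"
    cat <<EOF
# 1. Upload the operations/dns patch for review (CI is expected to fail for now)
# 2. Edit the data in Netbox
# 3. Push the generated data without deploying it:
sudo cookbook sre.dns.netbox --skip-authdns-update -t ${task} "${message}"
# 4. Comment "recheck" on the operations/dns change and wait for CI to pass
# 5. Merge the change, then deploy it (assumed here to be done with authdns-update):
sudo authdns-update
EOF
}
```

For example, atomic_deploy_plan T12345 "Move records to new prefix" prints the five steps with the cookbook invocation filled in, ready to be pasted into a coordination message.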

Modify the generated data in an emergency

In an emergency, the generated data can be modified by running the sre.dns.netbox cookbook as described above in DNS/Netbox#Update_generated_records with the additional option --emergency-manual-edit.

After the generation of the data the cookbook will stop and prompt the user to make the modifications, something like:

Generated temporary files are available on netbox1001.wikimedia.org:/tmp/dns-c25pcHBldHM-iad8k5x_
SSH there, as root modify any file, git stage them and run "git commit --amend" to commit them
Then run "git log --pretty=oneline -1" and copy the new SHA1 of HEAD

N.B.: any subsequent run of the cookbook will try to revert the manual changes, so make all SREs aware of the emergency situation.
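
The git steps requested by the prompt can be sketched as a small helper, to be run as root on the host named in the prompt after editing the files. The function name amend_generated_commit is hypothetical and the --no-edit flag is an addition that keeps the generated commit message; the temporary directory is whatever path the cookbook printed.

```bash
#!/usr/bin/env bash
# Stage every modification in the temporary checkout, fold it into the
# generated commit, and print the new HEAD SHA-1 to hand back to the cookbook.
amend_generated_commit() {
    local workdir="$1"
    git -C "$workdir" add --all || return 1
    # --no-edit reuses the existing commit message instead of opening an editor.
    git -C "$workdir" commit --quiet --amend --no-edit || return 1
    # The first field of the one-line log is the full SHA-1 of HEAD.
    git -C "$workdir" log --pretty=oneline -1 | cut -d' ' -f1
}
```

It would be called with the directory from the prompt as its only argument, and the printed SHA-1 is the value the cookbook asks for.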

Add missing DNS name to the primary IPv6 address

Adding a missing DNS name to a primary IPv6 address is simple, but if it needs to be done on many hosts, it can be done programmatically, see below.

Before proceeding, make sure that the host fully supports IPv6: all services must also listen on the IPv6 address, and any ACL/grant/ferm rule must also be set up for the IPv6 space. Keep in mind that most clients will default to connecting via IPv6 once the AAAA records are live.

Single host

  1. Find the host's primary IPv6 address in Netbox, click edit and fill the DNS Name field with the FQDN of the host. To double-check, verify that the host's primary IPv4 address has the same DNS Name.
  2. Run the sre.dns.netbox cookbook as described above in DNS/Netbox#Update_generated_records

Multiple hosts

When the addition has been tested on a few hosts and you're ready to convert multiple hosts:

  1. Get in touch with the SRE/Infrastructure Foundations/Contact so that the DNS names can be added programmatically in Netbox running something like this.
  2. Run the sre.dns.netbox cookbook as described above in DNS/Netbox#Update_generated_records

Transition FAQ

Am I affected?

Whether your workflows are affected by this change depends entirely on your interaction with the operations/dns repository:

  • I never read or contribute to this repository:
    • you're not affected and nothing has changed for you. You can stop reading here.
  • I sometimes read or search for things in this repository:
    • you're marginally affected as the manual records were moved from the operations/dns repository and replaced by the auto-generated files. You can search directly in Netbox. If you want to see the raw content of the generated files you can clone the auto-generated repository to read or search in it following the instructions in Netbox#DNS. You can optionally read the rest of the document.
  • I contribute to the repository:
    • you're affected and should keep reading this FAQ section and the rest of the document.

What has changed

  • IP allocation is now done directly in Netbox, which is our IPAM tool. Only Fundraising-tech (frack) non-mgmt records were left out of this transition.
    • The primary IPv4/IPv6 addresses of all new hosts are automatically assigned to them at provision time.
    • Additional IPs require manual allocation in Netbox [see below] unless they are for Cassandra instances; that use case is already covered by the provisioning script.
  • The automatic DNS record generation (see above DNS/Netbox#Update_generated_records) generates all of the records present in Netbox.

Why do some IPv6 addresses not have related AAAA/PTR DNS records?

Some clusters do not (yet?) support IPv6 on their infrastructure $hostname.$dc.$wmnet / $hostname.wikimedia.org addresses. For those, when the data was imported into Netbox, the DNS Name field of the related IPv6 address was left empty to keep the existing behaviour. That means that at generation time only the IPv4 A/PTR records are generated and no AAAA/PTR record is generated for the IPv6 address.

Who can I ping for questions?

For questions, concerns or comments please get in touch with the SRE/Infrastructure Foundations/Contact.

What to do if the diff has spurious changes?

Follow the instructions in Monitoring/Netbox_DNS_uncommitted_changes#What_to_do.

How to allocate primary IPs for a server

Physical hosts

The management, primary IPv4 and primary IPv6 for any new physical host will be automatically assigned at provision time by DCOps running a Netbox script.

Virtual machines

For Ganeti virtual machines the sre.ganeti.makevm cookbook takes care of the new workflow automatically.

How to manually allocate a special purpose IP address in Netbox

This procedure is meant to be used only to create IPs in Netbox that are not attached to any device's interface because they have special purposes, like virtual IP addresses (VIPs, which are generally used for service addresses). Depending on real life use cases the following procedure might be automated into a Netbox script in the near future.

  1. Go to the VLANs page in Netbox VLANs and Netbox Prefixes
  2. Search for the correct VLAN based on datacenter, type, row (if applicable), etc.
  3. Click on the desired prefix (v4 or v6) in the Prefixes column for that VLAN
    • If you are setting up a new LVS service, use the prefix for LVS service IPs: codfw, eqiad
  4. Click on the IP Addresses tab in the prefix page
  5. Click on the Add an IP Address green button on the top-right; Netbox will automatically select the first available IP in that subnet
    • Make sure to change the netmask to /32 in the address field for IPv4.
    • To create an IPv6 that is a mapped version of an existing IPv4, modify the Address field at the top to override the automatically selected address and set the netmask to /128.
    • If this is a VIP, make sure you get the same last octet in both eqiad and codfw datacentres
  6. Select the relevant Role (VIP for LVS, anycast, etc.)
  7. Set the DNS Name field with the FQDN to assign to this IP
  8. Select the Tenant if applicable (FR-Tech, RIPE)
  9. Click on the Create blue button at the bottom
  10. Once all Netbox changes are completed follow the instructions at DNS/Netbox#Update_generated_records
  11. For VIPs that are in the *.svc.* zones, until T270071 is resolved, a manual change in the operations/dns repository is also required to add the IP. The manual record is the one that will be used; the automatically generated records for *.svc.* zones are not currently used.
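
The "same last octet" requirement for VIPs in eqiad and codfw is easy to verify mechanically before clicking Create. A minimal sketch follows; same_last_octet is a hypothetical helper and the addresses in the usage note are placeholders.

```bash
#!/usr/bin/env bash
# Succeed only if two IPv4 addresses (with or without a /mask) share the last octet.
same_last_octet() {
    # Strip an optional "/mask" suffix from each address.
    local a="${1%%/*}" b="${2%%/*}"
    # Compare what follows the final dot of each address.
    [ "${a##*.}" = "${b##*.}" ]
}
```

For example, same_last_octet 10.2.1.42/32 10.2.2.42/32 succeeds, while same_last_octet 10.2.1.42/32 10.2.2.43/32 returns a non-zero status.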