Anycast: Difference between revisions
imported>Ayounsi
imported>Arturo Borrero Gonzalez (Troubleshooting: mention the case in which the VIP is not being announced by BGP)
Latest revision as of 15:49, 5 May 2023
Anycast-based Wikimedia deployments include Anycast authoritative DNS, Wikimedia DNS and its accompanying verification server, Durum. In production there are Anycast recursive DNS and anycast syslog (see the Syslog section).
Anycast works as follows:
- The VIP (virtual IP) is configured on the server's loopback interface
- Bird (routing daemon) advertises the VIP to the routers using BGP
- (optional) A BFD session is established between Bird and the routers to ensure fast failover in case of server or link failure
- anycast_healthchecker monitors the local (anycasted) service by querying it every second
- If a service failure is detected, the VIP stops being advertised to the routers
- When the service is restored, anycast_healthchecker waits 10s before re-advertising the VIP to avoid flaps
- The bird service is linked (systemd bind) to the anycast_healthchecker service, so bird is stopped if anycast_healthchecker is not running or has crashed
- The time between a local service failure and clients being redirected to a different server (advertising the same VIP) is 1s max
- All servers advertise the same VIP worldwide; clients are routed to the closest server in the BGP sense (same DC, then shorter AS path, etc.), which is not based on latency
- Routers do per-flow load balancing (ECMP) between all local (same site) servers. Hashing is done on IP and port (L4)
- As a last-resort backup, in case all servers stop advertising the VIP (e.g. global misconfiguration), eqiad and codfw routers have less specific (/30) backup static routes pointing to their local servers
Limitations:
- The server monitors itself: if it fails in a way where BGP is up but the service (e.g. DNS) is unreachable from outside via the VIP (e.g. because of iptables), this will cause an outage
- By the nature of Anycast, Icinga will only check the health of the VIP closest to it
  - This could be worked around by checking the anycasted service health from various vantage points - task T311618
  - Health checks to the servers' real IPs still work
Deploying a new service
- Assign an IP in DNS, from the 10.3.0.0/24 range (e.g. Gerrit CR 524045)
- Configure the server side (e.g. Gerrit CR 524037)
  - Add include ::profile::bird::anycast where you see fit (usually to the service's role)
  - Configure the VIP and its required attributes (usually in hieradata/role/common/):
      profile::bird::advertise_vips:
        <vip_fqdn>: # Used as identifier
          address: 10.3.x.x # VIP to advertise
          check_cmd: '/bin/true' # Any command to check the health of the service
          service_type: foobar # Can be any string, if underlying applications need to distinguish VIPs
          check_fail: 2 # (Optional, default = 1) number of tries before the service is considered down
    Notes:
    - check_cmd is run once per second as user "bird"
    - For complex commands, create a small bash script or use check_cmd: "/bin/sh -c '<commands>'"
    - anycast-healthchecker uses the return code of the health-check script: 0 = good, anything else is considered a failure
    - IPv6 is supported but not enabled by default. See the section below on how to enable it.
- Configure the router side:
  - set protocols bgp group Anycast4 neighbor <server_IP>
- Add monitoring to the VIP, similar to any Icinga checks, but in modules/profile/manifests/bird/anycast_monitoring.pp
- (Optional) if deploying a new type of service, ask Netops to add a backup static route
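Since check_cmd can be any command whose exit status is 0 on success, a health check can be a short script. The following is a minimal sketch, assuming the service is healthy when it accepts a TCP connection; the function names and the half-second timeout are illustrative choices, not part of the Puppet profile.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 0.5) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable and timed-out connections all count as unhealthy.
        return False


def exit_code(healthy: bool) -> int:
    """Map a health result to the exit status convention: 0 = good, non-zero = failure."""
    return 0 if healthy else 1


def main(argv) -> int:
    """Hypothetical entry point, called as: <script> <host> <port>."""
    host, port = argv[1], int(argv[2])
    return exit_code(tcp_port_open(host, port))
```

A wrapper script would end with `sys.exit(main(sys.argv))`. The timeout is kept well under one second because the check is run once per second.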
Other relevant configurations
Hiera keys:
# Service the anycast-healthchecker binds to.
# bird is automatically bound to anycast-healthchecker
profile::bird::bind_anycast_service: haproxy.service
# Router IPs with which Bird establishes BGP sessions
# Usually set per site
profile::bird::neighbors_list:
  - routerIP
  - other_router_IP
# Fast failure detection between router and server (optional, true by default)
profile::bird::bfd: true
# Usually set per service (role)
# But can be set for a specific host as well, for example to remove the VIP from a host to be decommissioned.
profile::bird::advertise_vips:
  <vip_fqdn>: # Used as identifier
    address: 10.3.x.x # VIP to advertise (required)
    check_cmd: '/bin/true' # Any command to check the health of the service, run as user "bird" once per second (required)
    service_type: foobar # Can be any string, if underlying applications need to distinguish VIPs (required)
    ensure: present # Set to absent to cleanly remove the check (optional, present by default)
# IPv6 support (experimental!)
# IPv6 is not enabled by default and needs to be explicitly enabled, as the use case is limited and current deployments use IPv4
# To enable IPv6 support, set do_ipv6 to true and then set the relevant IPv6 settings (address_ipv6 and check_cmd_ipv6)
profile::bird::do_ipv6: true
profile::bird::advertise_vips:
  <vip_fqdn>: # Used as identifier
    address: 10.3.x.x # VIP to advertise (required)
    check_cmd: '/bin/true' # Any command to check the health of the service, run as user "bird" once per second (required)
    service_type: foobar # Can be any string, if underlying applications need to distinguish VIPs (required)
    ensure: present # Set to absent to cleanly remove the check (optional, present by default)
    address_ipv6: 2620:0:860 # /128 IPv6 VIP to advertise (required if do_ipv6 is set to true)
    check_cmd_ipv6: '/bin/true' # Command to check the health of the service, for IPv6
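The required and optional attributes listed above can be checked before a change is submitted. The sketch below takes an already-parsed advertise_vips mapping and reports missing keys. It is an illustration only, not the validation done by the Puppet profile; treating check_cmd_ipv6 as required when do_ipv6 is true is an assumption based on the comment above, and `missing_vip_keys` is a hypothetical name.

```python
REQUIRED_KEYS = ("address", "check_cmd", "service_type")
# Assumption: both IPv6 settings are needed once do_ipv6 is enabled.
IPV6_KEYS = ("address_ipv6", "check_cmd_ipv6")


def missing_vip_keys(vips: dict, do_ipv6: bool = False) -> dict:
    """Return {vip_fqdn: [missing keys]} for entries lacking required attributes."""
    wanted = REQUIRED_KEYS + (IPV6_KEYS if do_ipv6 else ())
    problems = {}
    for fqdn, attrs in vips.items():
        missing = [key for key in wanted if key not in attrs]
        if missing:
            problems[fqdn] = missing
    return problems
```

For example, an entry that only sets `address` would be reported as missing `check_cmd` and `service_type`.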
Routing
Note: this section is being considered for removal. Reason: code duplication, this will often be out of date. (Discuss in Talk:Anycast)
Configuration
# show protocols bgp group Anycast4
type external;
/* T209989 */
multihop {
    ttl 193;
}
local-address 208.80.153.193; # Router's loopback
damping;
import anycast_import; # See below
family inet {
    unicast {
        prefix-limit {
            maximum 50; # Take the session down if more than 50 prefixes
            teardown; # are learned from the servers (e.g. misconfiguration)
        }
    }
}
export NONE;
peer-as 64605; # Servers' ASN
bfd-liveness-detection {
    minimum-interval 300; # Take the session down after 3*300ms failures
}
multipath; # Enable load balancing (remove for active/passive)
neighbor 208.80.153.111; # Server IPs
neighbor 208.80.153.77;
# show policy-options policy-statement anycast_import
term anycast4 {
    from {
        prefix-list-filter anycast-internal4 longer; # Only accept prefixes in the defined range
    }
    then {
        damping default;
        accept;
    }
}
then reject;
# show policy-options prefix-list anycast-internal4
10.3.0.0/24;
# show routing-options static route 10.3.0.0/30
next-hop 208.80.153.111;
readvertise;
no-resolve;
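Two ideas in the configuration above can be illustrated with Python's ipaddress module: the import policy only accepts prefixes that sit inside 10.3.0.0/24 and are more specific than it ("longer"), and the /30 static route only takes effect when no more specific BGP route exists (longest prefix match). This is a simplified IPv4-only model, not router behaviour; the function names and the route labels are hypothetical.

```python
import ipaddress

ANYCAST_RANGE = ipaddress.ip_network("10.3.0.0/24")  # the anycast-internal4 prefix list


def import_accepts(prefix: str) -> bool:
    """Model of 'prefix-list-filter ... longer': inside the range and more specific."""
    net = ipaddress.ip_network(prefix)
    return net.subnet_of(ANYCAST_RANGE) and net.prefixlen > ANYCAST_RANGE.prefixlen


def best_route(destination: str, routes: dict) -> str:
    """Return the label of the most specific route that covers `destination`."""
    addr = ipaddress.ip_address(destination)
    matches = [
        (ipaddress.ip_network(prefix), label)
        for prefix, label in routes.items()
        if addr in ipaddress.ip_network(prefix)
    ]
    if not matches:
        raise LookupError(f"no route to {destination}")
    _, label = max(matches, key=lambda match: match[0].prefixlen)
    return label
```

With both a BGP /32 and the static /30 present, `best_route` picks the /32; once the /32 is withdrawn, the same lookup falls back to the static route.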
Routing to specific VIPs
Here both next hops (servers) are load balanced, as they are under the same *[BGP] block.
> show route 10.3.0.1
10.3.0.1/32 *[BGP/170] 1w4d 08:54:21, localpref 100, from 208.80.153.77
AS path: 64605 I, validation-state: unverified
to 208.80.153.77 via ae3.2003
> to 208.80.153.111 via ae4.2004
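Per-flow load balancing between the two next hops can be pictured as hashing the flow's addresses and ports and using the result to pick a next hop. The sketch below uses SHA-256 purely for illustration; routers use their own vendor-specific hash functions, and `pick_next_hop` and the client address in the usage example are hypothetical.

```python
import hashlib


def pick_next_hop(src_ip, src_port, dst_ip, dst_port, proto, next_hops):
    """Pick a next hop for a flow; the same flow always maps to the same hop."""
    key = f"{src_ip}|{src_port}|{dst_ip}|{dst_port}|{proto}".encode()
    digest = hashlib.sha256(key).digest()
    # Use the first 8 bytes of the digest as an integer and reduce it to an index.
    index = int.from_bytes(digest[:8], "big") % len(next_hops)
    return next_hops[index]


hops = ["208.80.153.77", "208.80.153.111"]
first = pick_next_hop("10.64.0.10", 40000, "10.3.0.1", 53, "udp", hops)
again = pick_next_hop("10.64.0.10", 40000, "10.3.0.1", 53, "udp", hops)
```

Here `first` and `again` are always equal, which keeps the packets of one flow on one server, while flows from different source ports spread across both servers.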
MTR can also be used for a coarser (site-level) view, e.g.:
bast5001:~$ mtr 10.3.0.1 --report
Start: Fri Apr 5 16:48:21 2019
HOST: bast5001 Loss% Snt Last Avg Best Wrst StDev
1.|-- ae1-510.cr2-eqsin.wikimed 0.0% 10 0.3 0.7 0.2 4.4 1.1
2.|-- ae0.cr1-eqsin.wikimedia.o 0.0% 10 0.2 1.0 0.2 7.8 2.3
3.|-- xe-5-1-2.cr1-codfw.wikime 0.0% 10 195.1 195.3 195.1 196.5 0.3
4.|-- recdns.anycast.wmnet 0.0% 10 195.1 195.1 195.1 195.1 0.0
Monitoring
anycast_healthchecker logs can be viewed at /var/log/anycast-healthchecker/anycast-healthchecker.log
Bird's health can be monitored on its Grafana dashboard.
Syslog
A limitation of a non-anycast setup: some appliances, such as network devices or PDUs, can only send syslog to UDP endpoints and not through the regular pipeline. The previous setup relied on two endpoints: syslog.codfw.wmnet and syslog.eqiad.wmnet, both CNAMEs. Some of those devices resolve the configured endpoint FQDN only when the configuration is applied. This causes two issues:
- Changing the CNAME doesn't make the device send logs to the new endpoint
- Using DNS round robin is not possible
This left us with two options:
- Configure the devices with only one endpoint (e.g. the geographically closest one), which means a SPOF
- Configure the devices with both endpoints (not always supported), which means duplicated data in syslog
Using anycast, if an endpoint goes down, logs are automatically routed to another one.
Configuration
The VIP 10.3.0.4 (syslog.anycast.wmnet) is advertised by role syslog::centralserver (centrallog2002.codfw.wmnet and centrallog1001.eqiad.wmnet as of Dec 2021).
Anycast healthchecker checks whether a process is listening on port udp/10514.
The rsyslog daemon on the centrallog hosts binds port 10514 UDP with a plaintext syslog listener, and forwards any logs received on this port to the Kafka logging/logstash pipeline using a config called "netdev_kafka_relay" or "netdev-kafka-relay".
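The actual check command used for the syslog VIP is not shown here. As one possible way to test for a bound UDP port, the sketch below parses the contents of /proc/net/udp, where the local port appears in hexadecimal in the second column. Reading that file is a Linux-specific assumption (IPv6 sockets live in /proc/net/udp6), and the function names are hypothetical.

```python
def udp_ports_bound(proc_net_udp: str) -> set:
    """Return the set of local UDP ports found in /proc/net/udp-style text."""
    ports = set()
    for line in proc_net_udp.splitlines()[1:]:  # the first line is the column header
        fields = line.split()
        if len(fields) < 2:
            continue
        # local_address looks like "00000000:2912" (hex IP, then hex port)
        _, _, port_hex = fields[1].rpartition(":")
        ports.add(int(port_hex, 16))
    return ports


def udp_listener_present(proc_net_udp: str, port: int) -> bool:
    """True if some socket is bound to the given local UDP port."""
    return port in udp_ports_bound(proc_net_udp)
```

A check script built on this would read /proc/net/udp, call `udp_listener_present(text, 10514)` and exit 0 or 1 accordingly. Note that this only proves a socket is bound, not that the daemon behind it is processing logs.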
Netconsole
The syslog hosts also run a Linux netconsole server to receive urgent kernel messages over UDP; the syslog anycast IP address is used by default. See Netconsole for more information.
Pooling/Depooling
To temporarily depool a service, disable Puppet, then stop bird.service.
To depool a server long-term, either:
- Deactivate the neighbor IP on the router side
- (Cleaner) Add a specific profile::bird::advertise_vips with the same identifier to the server, and check_cmd: /bin/false or ensure: absent
Upgrading Bird
See the Bird 2 upgrade task (https://phabricator.wikimedia.org/T310574) for possible pitfalls.
Notably, make sure the anycast-healthchecker and prometheus-bird-exporter packages and tools are compatible.
anycast-healthchecker logging
In the default configuration, anycast-healthchecker sets logging level to info and saves eight backups of logs to disk, taking care of log rotation itself. Since this may not be desired for hosts where anycast-hc is already functioning, you can decrease the verbosity and change the number of backups it maintains by using this Hiera configuration:
profile::bird::anycasthc_logging:
  level: 'critical'
  num_backups: 2
You can choose from the following logging levels: 'debug', 'info', 'warning', 'error', 'critical'.
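To picture what the two settings control, the sketch below maps them onto Python's standard logging module: a level threshold and a rotating file handler that keeps a fixed number of old files. Whether anycast-healthchecker wires its logging exactly this way is an assumption; `build_handler` and the `max_bytes` default are hypothetical.

```python
import logging
import logging.handlers

LEVELS = ("debug", "info", "warning", "error", "critical")


def build_handler(path: str, level: str, num_backups: int, max_bytes: int = 1_000_000):
    """Build a rotating file handler from a level name and a backup count."""
    if level not in LEVELS:
        raise ValueError(f"unknown logging level: {level!r}")
    handler = logging.handlers.RotatingFileHandler(
        path,
        maxBytes=max_bytes,       # rotate once the file reaches this size
        backupCount=num_backups,  # number of rotated files kept on disk
        delay=True,               # do not create the file until the first record
    )
    handler.setLevel(getattr(logging, level.upper()))
    return handler
```

With level 'critical' and num_backups 2, such a handler would drop everything below CRITICAL and keep at most two rotated files next to the active log.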
Troubleshooting
Bird daemon not running
This will trigger an automatic de-pool of the faulty server. Unless this is happening to multiple servers it is not an emergency.
Open a netops task and investigate like any daemon issue.
Has it been shut down on purpose? Have any changes been made recently?
What do the logs say (grep for "bird" in /var/log/syslog)?
Can it be restarted with service bird start?
Anycast healthchecker not running
Same as above.
Logs are in /var/log/anycast-healthchecker/anycast-healthchecker.log
Check the process status with sudo service anycast-healthchecker status
VIP not being announced by BGP
The VIP announcement status is controlled by the anycast-healthchecker mechanism.
In particular, if the command configured in check_cmd fails, the VIP will be removed from the BGP announcement.
This works as follows:
- the anycast-healthchecker service reads the configuration for VIPs from /etc/anycast-healthchecker.d/*.conf files
- for each file, it runs the check_cmd command
- the service dynamically adds the VIP to, or removes it from, the /etc/bird/anycast-prefixes.conf file based on the result of the check
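When debugging a missing announcement, it can help to confirm whether the VIP is currently listed in the prefixes file. The sketch below extracts IPv4 prefixes from the file's text, ignoring anything after a # on each line. The exact layout of anycast-prefixes.conf is not described here, so the parser only assumes that announced VIPs appear as /32 prefixes outside comments; the function names are hypothetical.

```python
import re

# Matches IPv4 prefixes such as 10.3.0.1/32
PREFIX_RE = re.compile(r"\b(\d{1,3}(?:\.\d{1,3}){3}/\d{1,2})\b")


def announced_prefixes(conf_text: str) -> set:
    """Collect IPv4 prefixes found outside of # comments."""
    prefixes = set()
    for line in conf_text.splitlines():
        code = line.split("#", 1)[0]  # drop the comment part of the line
        prefixes.update(PREFIX_RE.findall(code))
    return prefixes


def vip_is_announced(conf_text: str, vip: str) -> bool:
    """True if the VIP appears as a /32 prefix in the file text."""
    return f"{vip}/32" in announced_prefixes(conf_text)
```

If the VIP is absent from the file, the next step is to run the configured check_cmd by hand as user "bird" and look at its exit status.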