{{Archive|date=2013-2020|reason=Since 2020, see [[Kafka HTTP purging]] instead.}}
'''[[w:multicast|Multicast]] [[w:HTCP|HTCP]] purging''' was the dominant method of purging Varnish/ATS HTTP cache objects in our Traffic infrastructure until July 2020, when we switched to [[Kafka HTTP purging|Kafka-based CDN purging]]. See [https://phabricator.wikimedia.org/T250781 T250781] for the details of the migration work.


== Typical Purge Flow ==


*  A MediaWiki instance detects that a purge is needed. It sends a multicast HTCP packet for each individual URI that needs to be purged.
*  Native multicast routing propagates the packet to all of our datacenters.
*  The daemon vhtcpd (replaced by [[purged]] in 2020 during the migration to Kafka), which runs on every relevant cache machine and subscribes to the appropriate multicast group(s), receives a copy of the HTCP request.
*  vhtcpd forwards the request to the Varnish instances on the local host over persistent HTTP/1.1 connections, using the PURGE request method (a simplified sketch of this step follows the list).
*  PURGE requests are handled by our custom [[VCL]] and cause the URI in question to be purged.
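
The vhtcpd→Varnish leg of this flow is ordinary HTTP with a non-standard request method. The following is a minimal Python sketch of that step only (it is not the real vhtcpd, which is a C daemon with persistent connections and real HTCP parsing); the localhost address, port and Host-header handling are assumptions for illustration, since the exact PURGE handling depends on the local [[VCL]].
<pre>
# Minimal sketch (not the real vhtcpd): issue an HTTP/1.1 PURGE request to a
# local cache instance, the way vhtcpd does after decoding an HTCP CLR packet.
# The host/port defaults below are assumptions for illustration only.
import http.client
from urllib.parse import urlsplit

def send_purge(url, cache_host="127.0.0.1", cache_port=80):
    """Send a PURGE for `url` to the cache listening on cache_host:cache_port."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    conn = http.client.HTTPConnection(cache_host, cache_port, timeout=5)
    # The Host header tells the cache which virtual host's object to purge;
    # which headers are honoured ultimately depends on the VCL.
    conn.request("PURGE", path, headers={"Host": parts.netloc})
    resp = conn.getresponse()
    resp.read()
    conn.close()
    return resp.status

if __name__ == "__main__":
    print(send_purge("https://en.wikipedia.org/wiki/Foo"))
</pre>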


== Risks for loss of purge requests ==


In general, multicast HTCP requests are UDP and have no response or confirmation back to the sender, so there is ''always'' the potential for requests to be silently lost at various points along the path. Specifically:
* Application UDP send buffers - if the sending application (e.g. MediaWiki) does not allocate a sufficiently large UDP send buffer (e.g. via setsockopt(SO_SNDBUF)) to handle its own outgoing packet rate spikes, packets can be dropped from the send buffer before they ever leave the application host (see the buffer-sizing sketch after this list).
* Network infrastructure - any router or switch could drop packets on the floor due to excessive congestion or other similar issues.
* vhtcpd's UDP receive buffers - the inverse of the first risk, on the receiving side. vhtcpd configures its listening sockets with fairly large (16MB) receive buffers to help mitigate this risk. Its internal code structure also prioritizes pulling requests from the UDP buffers into the internal memory queue over sending the requests on towards the varnishes; this is the inverse of the usual priority pattern for such software (which would be to prioritize emptying the internal queue over filling it), but it helps avoid these potential UDP buffer losses.
* vhtcpd queue overflow - vhtcpd's internal buffers are large (default: 256MB, current config: 1GB) so that they can absorb long rate spikes and ride out temporary varnishd downtimes. However, if conditions conspire to completely fill the internal buffer, vhtcpd's only recourse is to wipe the buffer and start fresh. The count of buffer wipes is visible via the queue_overflows statistic.
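
To make the buffer-related risks above concrete, here is a small Python sketch showing how a sender or receiver can request larger UDP socket buffers via setsockopt(), as the first and third bullets describe; the 16MB figure mirrors what vhtcpd requests on the receive side. Note that on Linux the kernel doubles the requested value and caps it at net.core.wmem_max / net.core.rmem_max, so always check what was actually granted.
<pre>
# Sketch: request larger UDP send/receive buffers and report what the kernel
# actually granted. On Linux the kernel doubles the requested value and caps
# it at net.core.wmem_max / net.core.rmem_max, so check the result.
import socket

REQUESTED = 16 * 1024 * 1024  # 16MB, the size vhtcpd requests for receiving

def sized_udp_socket(sndbuf=REQUESTED, rcvbuf=REQUESTED):
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, sndbuf)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, rcvbuf)
    granted_snd = s.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
    granted_rcv = s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    print(f"send buffer: asked {sndbuf}, got {granted_snd}")
    print(f"recv buffer: asked {rcvbuf}, got {granted_rcv}")
    return s

if __name__ == "__main__":
    sized_udp_socket()
</pre>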


== MediaWiki ==
MediaWiki was extended with a SquidPurge::HTCPPurge method, which takes an HTCP multicast group address, an HTCP port number, and a multicast TTL (see <tt>DefaultSettings.php</tt>) and sends every URL to be purged to that group. It can't make use of persistent sockets, but the overhead of setting up a UDP socket is minimal, and it doesn't have to worry about handling responses.


All Apaches are configured through <tt>CommonSettings.php</tt> to send HTCP purge requests to the appropriate multicast group. They use a multicast [[w:Time to live|Time To Live]] of '''8''' (instead of the default, 1) because the messages need to cross multiple routers.
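
The real sender is MediaWiki's PHP code (SquidPurge::HTCPPurge), which builds binary HTCP CLR packets per RFC 2756. The Python sketch below only illustrates the multicast mechanics described above; the group and port are taken from the examples elsewhere on this page, and the payload is a placeholder rather than a valid HTCP packet.
<pre>
# Sketch of the multicast mechanics only: the real sender is MediaWiki's
# SquidPurge::HTCPPurge, which builds a binary HTCP CLR packet (RFC 2756).
# Group/port below match the examples elsewhere on this page; the payload is
# a placeholder, not a valid HTCP packet.
import socket

HTCP_GROUP = "239.128.0.112"   # multicast group used in the tcpdump example below
HTCP_PORT = 4827

def send_htcp_datagram(payload: bytes, ttl: int = 8):
    """Send one UDP datagram to the HTCP multicast group with the given TTL."""
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # TTL 8 so the datagram survives crossing several routers between sites.
    s.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, ttl)
    s.sendto(payload, (HTCP_GROUP, HTCP_PORT))
    s.close()

if __name__ == "__main__":
    send_htcp_datagram(b"placeholder, not a real HTCP CLR packet")
</pre>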


== One-off purge ==


On [[mwmaint1002]], run:
 
<pre>
$ echo 'https://example.org/foo?x=y' | mwscript purgeList.php
</pre>
 


Note that static content under <code>/static/</code> must always be purged via the hostname '<code>en.wikipedia.org</code>'. This is the shared virtual hostname under which Varnish caches content for <code>/static/</code>, regardless of the requesting wiki's hostname. Note also that mobile hostnames are cached independently of desktop hostnames. For example, to purge all copies of enwiki's article about Foo, one must purge both https://en.wikipedia.org/wiki/Foo '''and''' https://en.m.wikipedia.org/wiki/Foo.
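
When purging by hand it is easy to forget one of the variants, so a throwaway helper along the lines of the (hypothetical) Python sketch below can expand a desktop URL into both URLs before piping them to purgeList.php. The "insert <code>m</code> after the first hostname label" rule is an assumption that fits wikipedia.org-style hostnames; adjust it for other projects.
<pre>
# Hypothetical helper: expand a desktop article URL into the desktop and
# mobile variants that both need purging. The "insert .m after the first
# hostname label" rule is an assumption for wikipedia.org-style hostnames.
from urllib.parse import urlsplit, urlunsplit

def purge_variants(url: str):
    parts = urlsplit(url)
    labels = parts.hostname.split(".")
    mobile_host = ".".join([labels[0], "m"] + labels[1:])
    mobile = urlunsplit((parts.scheme, mobile_host, parts.path, parts.query, parts.fragment))
    return [url, mobile]

if __name__ == "__main__":
    # Print both URLs, e.g. to pipe into: mwscript purgeList.php
    for u in purge_variants("https://en.wikipedia.org/wiki/Foo"):
        print(u)
</pre>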


== Troubleshooting ==


Confirming that varnish is receiving and processing purge requests:
 # frontend instance:
 varnishlog -c -n frontend -m RxRequest:PURGE
 # backend instance:
 varnishlog -c -m RxRequest:PURGE


Confirming that vhtcpd is operating correctly:
 cat /tmp/vhtcpd.stats


This will show output similar to:
 start:1453497429 uptime:353190 inpkts_recvd:611607720 inpkts_sane:611607720 inpkts_enqueued:611607720 inpkts_dequeued:594452309 queue_overflows:3 queue_size:0 queue_max_size:2282100


The file is written out every 15 seconds or so.  The fields are:
* start - the unix timestamp the daemon started at
* uptime - seconds the daemon has been running
* inpkts_recvd - input HTCP packets received from the network
* inpkts_sane - packets from above that survived sanity-checking and parsing
* inpkts_enqueued - packets from above that made it into the internal queue
* inpkts_dequeued - packets from the above that have been dequeued (sent) to all local varnish daemons
* queue_overflows - number of times the internal queue has reached the maximum size limit and wiped back to zero
* queue_size - current size of the internal request queue
* queue_max_size - the maximum size the queue has ever been since startup or the last overflow wipe


Note that both local varnish daemons (frontend and backend) must dequeue a packet before it leaves the queue.  If one daemon is stuck or stopped, that will eventually cause a queue overflow!
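
For a quick look at these counters, something like the following ad-hoc Python snippet (not a maintained tool) can parse the stats line and print the values that usually matter when purges go missing.
<pre>
# Ad-hoc helper (not a maintained tool): parse /tmp/vhtcpd.stats into a dict
# and show the counters that usually matter when purges go missing.
def read_vhtcpd_stats(path="/tmp/vhtcpd.stats"):
    with open(path) as f:
        line = f.readline().strip()
    # Format: space-separated "name:value" pairs, all integer values.
    return {k: int(v) for k, v in (field.split(":", 1) for field in line.split())}

if __name__ == "__main__":
    stats = read_vhtcpd_stats()
    # Note: after an overflow wipe this difference also includes wiped packets,
    # so it is only an exact backlog figure while queue_overflows stays at 0.
    backlog = stats["inpkts_enqueued"] - stats["inpkts_dequeued"]
    print(f"queue overflows since start: {stats['queue_overflows']}")
    print(f"current queue size: {stats['queue_size']}")
    print(f"enqueued but not yet dequeued by every local varnish: {backlog}")
</pre>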


To dump traffic directly off the network interfaces, use e.g.:
 tcpdump -n -v udp port 4827 and host 239.128.0.112


(Note that you will only see traffic if the machine has joined the multicast group, and generally the vhtcpd daemon must be up and listening for that to happen.)
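
To check multicast delivery independently of vhtcpd, a throwaway listener such as the Python sketch below can join the group and count incoming datagrams. This is only an illustration: you may not be able to bind the HTCP port while vhtcpd already holds it, in which case test with a spare port and group, or on a host where vhtcpd is stopped.
<pre>
# Throwaway listener: join the HTCP multicast group and count datagrams, to
# check multicast delivery independently of vhtcpd. Group/port are the ones
# used in the tcpdump example above.
import socket
import struct

GROUP, PORT = "239.128.0.112", 4827

s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
s.bind(("", PORT))
# Ask the kernel to join the multicast group on the default interface.
mreq = struct.pack("4s4s", socket.inet_aton(GROUP), socket.inet_aton("0.0.0.0"))
s.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)

count = 0
while True:
    data, src = s.recvfrom(65535)
    count += 1
    print(f"{count} datagrams, last {len(data)} bytes from {src[0]}")
</pre>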
 
 


== External links ==
* [http://www.faqs.org/rfcs/rfc2756.html Hyper Text Caching Protocol HTCP/0.0]
* [http://www.squid-cache.org/mail-archive/squid-dev/200310/att-0011/htcp.c.diff Original HTCP CLR patch]
* [http://www.nedworks.org/~mark/patches/squid-htcp-clr.diff Mark's improved patch]
* [[mw:Multimedia/Cache Invalidation Misses]]
* [https://github.com/wikimedia/htcp-purge Node.js HTCP purge module]
 
[[Category:Caching]]
