Observations of differences between MediaWiki servers on stretch vs buster, before and after task T245757, January 2021

TCP errors

At first glance, TCP errors appear to have dropped on these 2 example hosts after the upgrade. The data gap in the middle is when the reimage to buster happened.

But once you zoom out and look at the entire week before the upgrade, it turns out this isn't actually a pattern.

mw1268 - TCP errors - over an entire week before the upgrade
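
The "zoom out" check above can also be done numerically. The following is a minimal sketch, not the method used for these observations: it asks whether the level seen after a change still falls inside the normal variation of a longer baseline window. The function name `within_baseline`, the threshold `k`, and all sample values are hypothetical and are not taken from the Grafana data.

```python
from statistics import mean, pstdev

def within_baseline(baseline, after, k=2.0):
    """Return True if the mean of `after` lies within k standard
    deviations of the mean of `baseline` (hypothetical helper)."""
    mu = mean(baseline)
    sigma = pstdev(baseline)
    return abs(mean(after) - mu) <= k * sigma

# Made-up sample values standing in for a metric such as TCP errors.
week_before = [12, 3, 15, 2, 14, 4, 13, 3]   # a noisy week before the change
after_reimage = [3, 4, 2, 3]                 # looks "low" in a short window

# If this prints True, the apparent drop is within the week's normal swings.
print(within_baseline(week_before, after_reimage))
```

If the check returns True, the short-window difference is not distinguishable from the swings already present in the baseline, which is the same conclusion the week-long graph suggests visually.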

Disk utilization

Similarly, at first it looks as if disk utilization went through the roof after the upgrade:

But once you zoom out, you see that these spikes also occur separately from the upgrade event:

mw1268 - disk utilization over a week leading up to the buster upgrade day

Performance (avg response time)

Looking at average response time, it can appear as if a buster server is actually slower when we compare mw1268 (stretch) with mw1267 (buster) over a 6-hour span:

mw1268 (stretch) vs mw1267 (buster) - avg response time - over 6 hours on 2021-01-27

The same appears if we compare these hosts over a week:

mw1268 (stretch) vs mw1267 (buster) - avg response time - over a week in Jan 2021


Example Grafana links; dashboards used: host-overview, application-servers-red-dashboard-wkandek