Server Admin Log

From Wikitech

2017-08-22

  • 23:39 thcipriani@tin: Synchronized php-1.30.0-wmf.15/extensions/Linter/includes/LintErrorsPager.php: SWAT: Fix up 11f4a97ba6bcd0c1de (duration: 00m 49s)
  • 23:35 thcipriani@tin: Finished scap: SWAT: Fix typo in the notification message (duration: 21m 48s)
  • 23:13 thcipriani@tin: Started scap: SWAT: Fix typo in the notification message
  • 23:10 thcipriani@tin: Synchronized php-1.30.0-wmf.14/extensions/Popups/includes/PopupsHooks.php: SWAT: Remove aborting of BeforePageDisplay hook T173411 (duration: 00m 49s)
  • 21:18 niharika29@tin: Synchronized wmf-config/InitialiseSettings.php: Retry CodeMirror deployment T170966 (duration: 00m 49s)
  • 20:41 ejegg: updated CiviCRM from 4aa177b to fa882f2
  • 20:25 mutante: restbase-dev* - puppet runs fail due to E: Version '3.11.0' for 'cassandra' was not found
  • 20:00 thcipriani@tin: rebuilt wikiversions.php and synchronized wikiversions files: group0 to 1.30.0-wmf.15
  • 19:56 urandom: starting cassandra restbase2004-a and restbase2006-c, OOMs
  • 19:46 thcipriani@tin: Finished scap: testwiki to php-1.30.0-wmf.15 and rebuild l10n cache (duration: 45m 55s)
  • 19:00 thcipriani@tin: Started scap: testwiki to php-1.30.0-wmf.15 and rebuild l10n cache
  • 18:50 robh: labservices1002 update completed (not 1003, typo)
  • 18:50 robh: labservices1003 update completed
  • 18:40 robh: firmware update of labservices1002 in progress
  • 17:54 andrewbogott: removing obsolete apache2 and puppetmaster packages from labcontrol boxes for https://phabricator.wikimedia.org/T171786
  • 17:07 thcipriani: starting branch cut for 1.30.0-wmf.15
  • 13:58 godog: bounce varnish on cp1074 / cp1049 / cp1073 - mailbox problems
  • 13:48 zeljkof: EU SWAT finished
  • 13:45 zfilipin@tin: Synchronized wmf-config/InitialiseSettings.php: SWAT: relatedArticles: Tidy up config (T165991) (duration: 00m 44s)
  • 13:44 zfilipin@tin: Synchronized wmf-config/CommonSettings.php: SWAT: relatedArticles: Tidy up config (T165991) (duration: 00m 44s)
  • 13:29 zfilipin@tin: Synchronized wmf-config/InitialiseSettings.php: SWAT: Set $wmgUseWikimediaShopLink to true for ptwiki (T173768) (duration: 00m 45s)
  • 10:41 moritzm: upgrading hhvm-luasandbox on mw1293-mw1295 (image scalers)
  • 10:21 Amir1: another run of rebuildTermSqlIndex (T171460)
  • 09:54 moritzm: upgrading hhvm-luasandbox on mw1189-mw1208 (API servers)
  • 09:42 marostegui: Renaming user Darwinius → DarwIn - T173159
  • 09:35 moritzm: upgrading hhvm-luasandbox on mw1180-1188 and mw1209-mw1220 (app servers)
  • 09:34 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1064 - T172996 (duration: 00m 44s)
  • 08:59 akosiaris: upload apertium-bel-rus_0.2.0~r81186-1+wmf to apt.wikimedia.org/jessie-wikimedia/main
  • 08:55 moritzm: upgrading hhvm-luasandbox on mw1161-mw1167 (job runners)
  • 08:43 akosiaris: upload apertium-rus_0.1.0~r81184-1+wmf to apt.wikimedia.org/jessie-wikimedia/main
  • 08:43 akosiaris: upload apertium-bel_0.1.0~r81357-1+wmf to apt.wikimedia.org/jessie-wikimedia/main
  • 08:35 godog: bounce varnish on cp1072 - mailbox problems
  • 08:33 godog: bounce varnish on cp4015 - mailbox problems
  • 08:24 marostegui: Drop s4 from db1095 with NO replication - T172996
  • 08:23 moritzm: upgrading hhvm-luasandbox on deployment servers / script runners
  • 07:34 marostegui: Stop replication on db1064 sanitarium2 and sanitarium3 master to move labsdb1009,10 and 11 s4 from db1095 to db1102 - https://phabricator.wikimedia.org/T172996
  • 07:32 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1064 - T172996 (duration: 00m 52s)
  • 07:28 moritzm: installing ruby security updates on trusty (Debian already fixed)
  • 07:11 moritzm: installing c-ares security updates on trusty (Debian already fixed)
  • 06:54 moritzm: installing graphite2 (the image library, not the metrics tool) security updates on trusty (Debian already fixed)
  • 02:30 l10nupdate@tin: ResourceLoader cache refresh completed at Tue Aug 22 02:30:47 UTC 2017 (duration 6m 55s)
  • 02:23 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.14) (duration: 07m 11s)

2017-08-21

  • 22:50 robh: rebooting tin for firmware update since it's idle
  • 22:20 mutante: praseodymium - installing BIOS upgrade, reboot
  • 22:11 mutante: xenon - installing BIOS upgrade
  • 21:58 mutante: cerium (cassandra test) - rebooting for firmware upgrade
  • 21:54 mutante: cerium - installing Dell BIOS upgrade (T162850)
  • 21:26 mutante: bast2001 - running Dell BIOS firmware upgrade (T162850)
  • 21:20 arlolra@tin: Finished deploy [parsoid/deploy@2210a38]: Updating Parsoid to 28a9a22b (duration: 10m 59s)
  • 21:09 arlolra@tin: Started deploy [parsoid/deploy@2210a38]: Updating Parsoid to 28a9a22b
  • 18:40 niharika29@tin: Synchronized dblists/pp_stage1.dblist: Roll page previews out to all wikis except en and de wiki T162672 (duration: 00m 44s)
  • 18:34 niharika29@tin: Synchronized wmf-config/CommonSettings.php: Enable OOjs UI EditPage on all wikis https://gerrit.wikimedia.org/r/#/c/366868/ (duration: 00m 44s)
  • 18:33 niharika29@tin: Synchronized wmf-config/InitialiseSettings.php: Enable OOjs UI EditPage on all wikis https://gerrit.wikimedia.org/r/#/c/366868/ (duration: 00m 44s)
  • 18:30 niharika29@tin: Synchronized wmf-config/Wikibase.php: Wikibase on deployment-prep: Exclude non-existent wikis from clientDbList T173571 (duration: 00m 44s)
  • 18:25 niharika29@tin: Synchronized wmf-config/InitialiseSettings.php: Increase AbuseFilter autodisable thresholds for Meta-Wiki T173633 (duration: 00m 44s)
  • 18:22 niharika29@tin: Synchronized wmf-config/InitialiseSettings.php: Add CookBook and Cookbook Talk NS on hiwikibooks T173398 (duration: 00m 45s)
  • 18:16 niharika29@tin: Synchronized wmf-config/InitialiseSettings.php: Administrators to add/remove 'transwiki' at nowiktionary T172365 (duration: 00m 45s)
  • 18:12 niharika29@tin: Synchronized static/images/: Set project logo for wikimania2018wiki T173042 (duration: 00m 44s)
  • 18:11 niharika29@tin: Synchronized wmf-config/InitialiseSettings.php: Set project logo for wikimania2018wiki T173042 (duration: 00m 44s)
  • 17:22 mobrovac@tin: Finished deploy [cxserver/deploy@f43ef96]: Bring back cxserver on scb2001 to a stable state - T173038 (duration: 00m 14s)
  • 17:22 mobrovac@tin: Started deploy [cxserver/deploy@f43ef96]: Bring back cxserver on scb2001 to a stable state - T173038
  • 17:12 gehel@tin: Finished deploy [wdqs/wdqs@a1c4f1f]: (no justification provided) (duration: 02m 27s)
  • 17:10 gehel@tin: Started deploy [wdqs/wdqs@a1c4f1f]: (no justification provided)
  • 15:40 bblack: cp1099 - varnish backend restart for mailbox lag
  • 14:55 mobrovac: cxserver depool scb2001 to debug failed checks - T173038
  • 14:54 mobrovac@tin: Finished deploy [cxserver/deploy@1065ffe]: Deploy 1065ffe2 to canary scb2001 for debugging - T173038 (duration: 00m 18s)
  • 14:53 mobrovac@tin: Started deploy [cxserver/deploy@1065ffe]: Deploy 1065ffe2 to canary scb2001 for debugging - T173038
  • 14:52 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1079 - T163190 (duration: 00m 44s)
  • 14:04 zeljkof: EU SWAT finished
  • 14:04 zfilipin@tin: Synchronized wmf-config/InitialiseSettings.php: SWAT: Add HD logos for srwikisource, update them too (T172268) (duration: 00m 44s)
  • 14:02 zfilipin@tin: Synchronized static/images/project-logos/: SWAT: Add HD logos for srwikisource, update them too (T172268) (duration: 00m 44s)
  • 13:53 zfilipin@tin: Synchronized wmf-config/InitialiseSettings.php: SWAT: Update logos for srwikinews, add HD version for them (T172255) (duration: 00m 49s)
  • 13:52 jynus: stop dbstore2001 mariadb@s4, start mariadb@s7
  • 13:52 zfilipin@tin: Synchronized static/images/project-logos/: SWAT: Update logos for srwikinews, add HD version for them (T172255) (duration: 00m 44s)
  • 13:46 zfilipin@tin: Synchronized wmf-config/InitialiseSettings.php: SWAT: Set X-Frame-Options: SAMEORIGIN if UploadWizard enabled (T173631) (duration: 00m 44s)
  • 13:46 ottomata: adding index on (database, rev_timestamp) on mediawiki_page_create_2 table on db1047: T170990
  • 13:37 zfilipin@tin: Synchronized wmf-config/InitialiseSettings.php: SWAT: Update logos for srwiktionary, add HD logos for srwiktionary (T172245) (duration: 00m 44s)
  • 13:36 zfilipin@tin: Synchronized static/images/project-logos/: SWAT: Update logos for srwiktionary, add HD logos for srwiktionary (T172245) (duration: 00m 45s)
  • 13:27 jynus: stop dbstore2001 mariadb@s7
  • 13:27 zfilipin@tin: Synchronized wmf-config/InitialiseSettings.php: SWAT: Add new logo for the Bashkir Wikibooks (T173471) (duration: 00m 44s)
  • 13:26 zfilipin@tin: Synchronized static/images/project-logos/: SWAT: Add new logo for the Bashkir Wikibooks (T173471) (duration: 00m 44s)
  • 13:26 ottomata: adding index on (database, rev_timestamp) on mediawiki_page_create_2 table on dbstore1002: T170990
  • 13:11 zfilipin@tin: Synchronized wmf-config/InitialiseSettings.php: SWAT: Reopen bawikibooks (T173471) (duration: 00m 44s)
  • 13:10 zfilipin@tin: Synchronized dblists/closed.dblist: SWAT: Reopen bawikibooks (T173471) (duration: 00m 44s)
  • 12:52 marostegui: Stop replication on db1079 and db1041 to compare their data - T163190
  • 12:52 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1079 - T163190 (duration: 00m 45s)
  • 12:35 marostegui: Stop MySQL on db1015 to decommission it - T173570
  • 09:50 jynus: restart dbstore2001 mariadb@s1
  • 09:40 jynus: restart dbstore2001 mariadb@x1
  • 09:29 moritzm: upgrading hhvm-luasandbox on mw1262-1265
  • 09:17 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Remove db1015 - T173570 (duration: 00m 44s)
  • 09:15 marostegui@tin: Synchronized wmf-config/db-codfw.php: Remove db1015 - T173570 (duration: 00m 44s)
  • 09:01 moritzm: upgrading hhvm-luasandbox on mw1261
  • 08:08 marostegui: Rename the article_assessment, article_assessment_pages and article_assessment_ratings tables from testwiki on db1078 - T173590
  • 07:42 marostegui: Drop cx_drafts table from x1 - T172364
  • 07:13 marostegui: Stop s4 replication thread on db1095 - T172996
  • 07:09 marostegui@tin: Synchronized wmf-config/db-codfw.php: Repool db2047 (duration: 00m 45s)
  • 02:36 l10nupdate@tin: ResourceLoader cache refresh completed at Mon Aug 21 02:36:58 UTC 2017 (duration 7m 3s)
  • 02:29 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.14) (duration: 08m 57s)

2017-08-19

  • 13:17 Amir1: another run: (Hotwc3 → HotWC3) (Lamia Bahy → Albedo11) (Monóxido de carbono → Roquetero) (PaulMichaels → PaulBenario) (Rodrigo.dst → RodrigoTavares) (Sadia Tasnim (Moyna) → মুহাম্মদ সুমন মাহমুদ) (Syou 18331322 → Ms3102) (TzvetelinaOOD1 → Tzveti1) (World Para Taekwondo → TKD at World Para Taekwondo) (Yaellerner → Ya1levy777) (平井 俊光 → Toshimit) (T173419)
  • 13:00 Amir1: running the script for ("Clopper228" "CGminded") and ("Gregory.lussier" "StevenSmith83473") (T173419)
  • 12:47 Amir1: ladsgroup@terbium:~$ mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=metawiki --logwiki=metawiki "Bolsée" "Kathmandu2017" (T173419)
  • 03:21 krinkle@tin: Synchronized wmf-config/InitialiseSettings.php: Enable jQuery 3 on test.wikidata and mediawiki.org - Id4d42d8c53 (duration: 00m 45s)

2017-08-18

  • 21:05 mutante: gerrit2001 - restarted gerrit again after reverting gerrit:372426, using systemctl commands, not 'service' or init.d
  • 20:55 mutante: restarting gerrit on gerrit2001
  • 20:48 demon@tin: Synchronized scap/plugins/clean.py: Completeness (duration: 00m 44s)
  • 20:43 RainbowSprinkles: prior thing was a no-op, testing
  • 20:42 demon@tin: Pruned MediaWiki: 1.30.0-wmf.9 [keeping static files] (duration: 01m 43s)
  • 19:03 bblack: cp1074 - varnish backend restart (mailbox lag)
  • 19:01 bblack: reboot cp4021-8
  • 18:21 bblack: puppeting cp4021-8 - expect possibility of ipsec alerts, etc...
  • 13:32 Amir1: another run of /usr/local/bin/mwscript extensions/Wikidata/extensions/Wikibase/repo/maintenance/rebuildTermSqlIndex.php --wiki wikidatawiki --entity-type=item --from-id $(tail -100 /tmp/rebuildTermSqlIndex.log | grep -E "Processed up to page (\d+?)" | sed -E "s/Processed up to page //; s/ \(Q.+?//" | tail -1) >>/tmp/rebuildTermSqlIndex.log 2>&1
  • 13:26 Amir1: ladsgroup@terbium:~$ mwscript namespaceDupes.php --wiki=hiwikiversity (T172977)
  • 12:44 Amir1: start of ladsgroup@terbium:~$ time /usr/local/bin/mwscript extensions/Wikidata/extensions/Wikibase/repo/maintenance/rebuildTermSqlIndex.php --wiki wikidatawiki --entity-type=item
  • 11:36 jynus: change topology of dbstore2001:x1 and dbstore2002:x1
  • 11:00 elukey: reboot dbstore1002 for kernel updates
  • 10:34 elukey: restart mysql on dbstore1002 - attempt to reclaim space after big table drop (stop slaves and el_sync, check running queries, stop mysql, check process, start mysql)
  • 09:55 Amir1: one small pass of ladsgroup@terbium:~$ time /usr/local/bin/mwscript extensions/Wikidata/extensions/Wikibase/repo/maintenance/rebuildTermSqlIndex.php --wiki wikidatawiki --entity-type=entity (T171460)
  • 09:46 Amir1: ladsgroup@terbium:~$ time /usr/local/bin/mwscript extensions/Wikidata/extensions/Wikibase/repo/maintenance/rebuildTermSqlIndex.php --wiki wikidatawiki --entity-type=property (T171460)
  • 09:42 moritzm: installing libmspack security updates
  • 09:16 marostegui@tin: Synchronized wmf-config/db-codfw.php: Repool db2077 - T168409 (duration: 00m 44s)
  • 07:28 marostegui: Stop MySQL on db2077 to copy it to dbstore2001 - T168409
  • 07:26 marostegui@tin: Synchronized wmf-config/db-codfw.php: Depool db2077 - T168409 (duration: 00m 44s)
  • 07:20 moritzm: installing openjdk-7 security updates on trusty
  • 06:58 jynus: and nginx
  • 06:55 jynus: systemctl restart squid3.service on install2002, apt seems stuck on some servers
  • 06:24 jynus: stopping and upgrading db2033
  • 05:43 jynus: upgrading and restarting all mariadb instances on dbstore2002
  • afk: updated fundraising dash from 696a3ff to 7dfc969
  • 00:48 ebernhardson@tin: Synchronized php-1.30.0-wmf.14/extensions/WikimediaEvents/modules/ext.wikimediaEvents.searchSatisfaction.js: T171213: Increase sampling rate of cirrus satisfaction schema (again) to 1k per bucket per day (duration: 00m 44s)
  • 00:18 ebernhardson@tin: Synchronized php-1.30.0-wmf.14/extensions/WikimediaEvents/modules/ext.wikimediaEvents.searchSatisfaction.js: T171213: Increase sampling rate of cirrus satisfaction schema (duration: 00m 44s)
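
The 13:32 entry above resumes rebuildTermSqlIndex.php by pulling the last processed page id out of /tmp/rebuildTermSqlIndex.log with a tail/grep/sed pipeline. The same lookup can be sketched in Python. This is a minimal illustration, not the script that was run: the function name last_processed_page is hypothetical, and the progress line shape ("Processed up to page <id> (Q...)") is an assumption inferred from the patterns in that pipeline.

```python
import re

# Assumed progress line shape, inferred from the grep/sed patterns in the
# logged command: "Processed up to page 12345 (Q6789)".
PROGRESS_RE = re.compile(r"Processed up to page (\d+)")


def last_processed_page(log_text, tail_lines=100):
    """Return the page id from the last progress line found in the final
    `tail_lines` lines of the log, or None if there is no such line."""
    last = None
    for line in log_text.splitlines()[-tail_lines:]:
        match = PROGRESS_RE.search(line)
        if match:
            last = int(match.group(1))
    return last
```

The returned value would be what gets passed as --from-id on the next run; returning None makes the "no progress line in the tail" case explicit rather than passing an empty argument.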

2017-08-17

  • 23:31 thcipriani@tin: Synchronized php-1.30.0-wmf.14/extensions/WikimediaEvents/modules/ext.wikimediaEvents.searchSatisfaction.js: SWAT: Revert "Disable cirrus MLR ab test" (duration: 00m 44s)
  • 23:21 thcipriani@tin: Synchronized php-1.30.0-wmf.14/resources/src/mediawiki.rcfilters/dm/mw.rcfilters.dm.FilterGroup.js: SWAT: RCFilters: Fix validation for single_option groups T173303 (duration: 00m 44s)
  • 23:14 thcipriani@tin: Synchronized wmf-config/InitialiseSettings.php: SWAT: Reapply "Remove temporary wgStructuredChangeFiltersEnableExperimentalViews setting" (duration: 00m 45s)
  • 22:10 ppchelko@tin: Finished deploy [changeprop/deploy@2c553a6]: Lower the concurrency for transcludes to decrease cassandra load during cluster reshaping (duration: 01m 14s)
  • 22:09 thcipriani@tin: Synchronized php-1.30.0-wmf.14/includes/specials/SpecialNewpages.php: Restore the newFromId() approach in SpecialNewpages::feedItemDesc T173541 (duration: 00m 46s)
  • 22:09 ppchelko@tin: Started deploy [changeprop/deploy@2c553a6]: Lower the concurrency for transcludes to decrease cassandra load during cluster reshaping
  • 22:07 gwicke: restarting pdfrender on scb100* nodes
  • 21:38 ejegg: updated dash from bec0077 to 696a3ff
  • 21:28 niharika29@tin: Synchronized wmf-config/InitialiseSettings.php: Enable LoginNotify on sites without Echo too. :| https://gerrit.wikimedia.org/r/#/c/372456/ (duration: 00m 44s)
  • 21:19 niharika29@tin: Synchronized wmf-config/InitialiseSettings.php: Enable LoginNotify finally! Yippeeeeee https://gerrit.wikimedia.org/r/#/c/372450/ (duration: 00m 45s)
  • 21:17 bblack: cp1063: varnish backend restart (mailbox lag)
  • 20:07 thcipriani@tin: rebuilt wikiversions.php and synchronized wikiversions files: all wikis to 1.30.0-wmf.14
  • 19:28 chasemp: disable puppet for cloud things for some careful refactor merging
  • 19:14 thcipriani@tin: Synchronized php: group1 wikis to wmf.14 (duration: 00m 46s)
  • 19:12 thcipriani@tin: rebuilt wikiversions.php and synchronized wikiversions files: group1 wikis to wmf.14
  • 19:07 thcipriani@tin: Finished scap: ProofReadPage Revert to db7507246665e69384c1d92af2aedc62263a5116 T173520 (duration: 06m 13s)
  • 19:01 thcipriani@tin: Started scap: ProofReadPage Revert to db7507246665e69384c1d92af2aedc62263a5116 T173520
  • 18:28 ebernhardson: restart logstash on logstash1003 to see why it's not reading EventError messages from kafka
  • 18:24 niharika29@tin: Synchronized php-1.30.0-wmf.14/extensions/RevisionSlider/: Revert Reintroduce hover and bar clicking - https://gerrit.wikimedia.org/r/#/c/372384/ (duration: 00m 48s)
  • 18:16 jynus: upgrading and restarting all mariadb instances on dbstore2001
  • 18:12 niharika29@tin: Synchronized php-1.30.0-wmf.14/extensions/LoginNotify/: Log usage statistics https://gerrit.wikimedia.org/r/#/c/372214/ (duration: 00m 51s)
  • 18:03 thcipriani@tin: Synchronized php-1.30.0-wmf.14/extensions/UploadWizard/UploadWizard.config.php: SWAT: Preserve array keys (language keys) when sorting the language dropdown T173522 (duration: 00m 51s)
  • 17:53 jynus: restarting mariadb (dbstore2001:x1) to test new buffer pool configuration
  • 17:20 thcipriani@tin: rebuilt wikiversions.php and synchronized wikiversions files: group1 wikis back to wmf.13 now T173520
  • 17:10 RainbowSprinkles: gerrit: cpu spikes, lots of large gc logs. Looking into it. (nb: things might be a little slow, but it is /up/)
  • 16:54 thcipriani@tin: rebuilt wikiversions.php and synchronized wikiversions files: group1 wikis back to wmf.14 now for T164173
  • 16:51 thcipriani@tin: Synchronized php-1.30.0-wmf.14/includes/jobqueue/jobs/RefreshLinksJob.php: Avoid lock acquisition errors for multi-title refreshlinks jobs T173462 (duration: 00m 51s)
  • 16:45 ejegg: updated SmashPig from c501f53 to 98c5516
  • 16:25 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Pool db2077 - T170662 (duration: 00m 49s)
  • 16:24 marostegui@tin: Synchronized wmf-config/db-codfw.php: Pool db2077 - T170662 (duration: 00m 51s)
  • 16:16 urandom: T169939: RESTBase: converting (33) new keyspaces to time-windowed compaction
  • 16:11 urandom: T169939: Lower eqiad compaction throughput from 20MB/s to 15MB/s
  • 15:46 godog: reimage restbase2001 - T169939
  • 15:42 filippo@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase2001.codfw.wmnet
  • 14:32 bblack@neodymium: conftool action : set/pooled=yes; selector: name=cp3036.*
  • 13:30 zeljkof: EU SWAT finished
  • 13:24 zfilipin@tin: Synchronized static/favicon/wikiversity.ico: SWAT: Update wikiversity favicon (T160491) (duration: 00m 50s)
  • 13:15 zfilipin@tin: Synchronized wmf-config/InitialiseSettings.php: SWAT: Change $wgArticleCountMethod to any for srwikiquote (T172974) (duration: 00m 51s)
  • 13:08 zfilipin@tin: Synchronized wmf-config/throttle.php: SWAT: Add one throttling exception (T173444) (duration: 00m 51s)
  • 12:38 moritzm: installing libgd/libsoup security updates
  • 12:08 urandom: T169939: Decommissioning Cassandra/restbase2001-c.codfw.wmnet
  • 12:07 urandom: T169939: Decommissioning Cassandra/restbase2001-b.codfw.wmnet
  • 10:29 gilles: Thumbor stress test finished
  • 10:20 gilles: Load testing Thumbor
  • 10:13 godog: roll-restart nginx on thumbor to apply https://gerrit.wikimedia.org/r/372199
  • 08:36 godog: roll-restart thumbor to apply webp support change
  • 08:30 godog: restart varnish on cp1062 to fix mailbox lag
  • 08:05 godog: reboot ms-be2021, unreachable on network
  • 06:24 marostegui: Stop slave on db2047 to fix duplicate keys - T151029
  • 05:55 marostegui: Stop replication on db2077 to fix duplicate entries - T151029
  • 05:48 marostegui: Stop replication in sync on db1078 and db1015 - T164488
  • 03:11 l10nupdate@tin: ResourceLoader cache refresh completed at Thu Aug 17 03:11:13 UTC 2017 (duration 7m 18s)
  • 03:03 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.14) (duration: 03m 58s)
  • 02:50 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.13) (duration: 08m 33s)
  • 02:27 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.11) (duration: 08m 52s)
  • 00:45 urandom: T169939: Decommissioning Cassandra/restbase2001-b.codfw.wmnet

2017-08-16

  • 22:30 urandom: T169939: Decommissioning Cassandra/restbase2001-a.codfw.wmnet
  • 22:14 urandom: T169939: Cleaning up wikipedia parsoid snapshots
  • 22:13 urandom: T169939: Rolling restart of Cassandra complete
  • 21:38 bawolff: deleting private info from securepoll_votes that the script missed due to ref-integ issues for old elections (T173393)
  • 21:37 thcipriani: train is on hold pending resolution of T173462
  • 21:33 bawolff: deleting private info in enwiki arbcom1_vote table (T173393)
  • 21:30 urandom: T169939: Rolling restart of Cassandra instances, eqiad, rack d
  • 21:22 thcipriani@tin: rebuilt wikiversions.php and synchronized wikiversions files: revert group1 wikis to 1.30.0-wmf.14 for T173462
  • 21:21 thcipriani@tin: Synchronized php: revert group1 wikis to 1.30.0-wmf.14 for T173462 (duration: 00m 47s)
  • 20:46 urandom: T169939: Rolling restart of Cassandra instances, eqiad, rack b
  • 20:45 arlolra@tin: Finished deploy [parsoid/deploy@a9dc803]: Updating Parsoid to 1832a78e (duration: 08m 52s)
  • 20:37 arlolra@tin: Started deploy [parsoid/deploy@a9dc803]: Updating Parsoid to 1832a78e
  • 19:57 urandom: T169939: Rolling restart of Cassandra instances, eqiad, rack a
  • 19:52 MaxSem: Manually cleaning up PI on enwiki (T173393)
  • 19:37 thcipriani@tin: Synchronized php: group1 wikis to 1.30.0-wmf.14 (duration: 00m 46s)
  • 19:35 thcipriani@tin: rebuilt wikiversions.php and synchronized wikiversions files: group1 wikis to 1.30.0-wmf.14
  • 19:06 urandom: T169939: Rolling restart of Cassandra instances, codfw, rack d
  • 18:57 thcipriani@tin: Synchronized wmf-config/CirrusSearch-common.php: SWAT: cirrus Tune ordering of crossproject search results on enwiki T171803 PART II (duration: 00m 50s)
  • 18:56 thcipriani@tin: Synchronized wmf-config/InitialiseSettings.php: SWAT: cirrus Tune ordering of crossproject search results on enwiki T171803 PART I (duration: 00m 51s)
  • 18:41 thcipriani@tin: Synchronized wmf-config: SWAT: Apply token count limits to phrase queries on all wikis T172653 (duration: 00m 53s)
  • 18:30 thcipriani@tin: Synchronized php-1.30.0-wmf.13/extensions/WikimediaEvents/modules/ext.wikimediaEvents.searchSatisfaction.js: SWAT: Disable cirrus MLR ab test T171214 (duration: 00m 50s)
  • 18:29 thcipriani@tin: Synchronized php-1.30.0-wmf.14/extensions/WikimediaEvents/modules/ext.wikimediaEvents.searchSatisfaction.js: SWAT: Disable cirrus MLR ab test T171214 (duration: 00m 51s)
  • 18:20 thcipriani@tin: Synchronized wmf-config/InitialiseSettings.php: SWAT: JobQueueEventBus: Enable group1 T163380 (duration: 00m 54s)
  • 18:18 urandom: T169939: Rolling restart of Cassandra instances, codfw, rack c
  • 17:31 urandom: T169939: Rolling restart of Cassandra instances, codfw, rack b
  • 17:07 demon@tin: Synchronized php-1.30.0-wmf.14/extensions/AntiSpoof/SpoofUser.php: T173394 (duration: 00m 51s)
  • 15:54 marostegui: Stop MySQL and shutdown db1078 for HW checks - T173365
  • 15:43 elukey: drop PageContentSaveComplete_5588433_15423246 from db1047 and dbstore1002 (analytics-slaves)
  • 15:41 godog: delete outdated CFs cassandra metrics from graphite2002 and graphite1003
  • 15:35 marostegui: Rename cx_drafts table on db1029 - T172364
  • 15:09 godog: restart varnish on cp1072 to clear mailbox lag
  • 14:56 godog: restart varnish on cp1099 to clear mailbox lag
  • 14:48 godog: restart varnish on cp1074 to clear mailbox lag
  • 14:44 godog: restart varnish on cp1049 to clear mailbox lag
  • 14:38 gilles: Thumbor stress test finished
  • 14:29 gilles: Stress testing Thumbor from single IP
  • 13:57 zeljkof: EU SWAT finished
  • 13:35 zfilipin@tin: Synchronized dblists/pp_stage1.dblist: SWAT: pagePreviews: Deploy to next 100 stage 1 wikis (T162672) (duration: 00m 50s)
  • 13:09 zfilipin@tin: Synchronized portals: (no justification provided) (duration: 00m 52s)
  • 13:08 zfilipin@tin: Synchronized portals/prod/wikipedia.org/assets: (no justification provided) (duration: 00m 52s)
  • 12:25 Amir1: ladsgroup@terbium:~$ time /usr/local/bin/mwscript extensions/Wikidata/extensions/Wikibase/repo/maintenance/rebuildTermSqlIndex.php --wiki testwikidatawiki --entity-type=property (T172776, T171460)
  • 12:25 moritzm: powercycling cp3036
  • 12:24 marostegui: Compressing InnoDB on db2077 - T168409
  • 12:22 jmm@puppetmaster1001: conftool action : set/pooled=no; selector: cp3036.esams.wmnet
  • 10:46 marostegui: Stop replication in sync on db1015 and db1078 - T164488
  • 10:42 marostegui@tin: Synchronized wmf-config/db-codfw.php: Pool db2076 - T170662 (duration: 00m 51s)
  • 10:03 godog: copy ubuntu-font-family-sources to stretch-wikimedia - T170817
  • 08:37 marostegui: Drop wikigrok tables from s1, s3 and s5 - T172020
  • 08:12 marostegui: Stop MySQL on db2047 to copy its content to db2077 - T170662
  • 08:10 moritzm: bounced pdfrender on scb1004 (T159922)
  • 08:07 marostegui@tin: Synchronized wmf-config/db-codfw.php: Depool db2047 - T170662 (duration: 01m 06s)
  • 07:25 marostegui: Stop MySQL on db2076 to copy its content to dbstore2001 - T168409
  • 07:08 elukey: executed sudo find -type f -mtime +30 -exec rm {} \; in /var/log/carbon to free some space
  • 06:40 marostegui: Run pt-table-checksum on s3 for revision table - T164488
  • 06:01 marostegui: Stop replication on db2076 to fix duplicate entries - T151029
  • 03:35 l10nupdate@tin: ResourceLoader cache refresh completed at Wed Aug 16 03:35:43 UTC 2017 (duration 7m 21s)
  • 03:28 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.14) (duration: 16m 32s)
  • 02:51 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.13) (duration: 08m 08s)
  • 02:28 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.11) (duration: 08m 51s)
  • 00:43 dereckson@tin: Synchronized php-1.30.0-wmf.12/extensions/Wikidata/Wikidata.php: Explicitly load badges extension (Gerrit:372088) (duration: 00m 51s)
  • 00:42 dereckson@tin: Synchronized php-1.30.0-wmf.14/extensions/Wikidata/Wikidata.php: Explicitly load badges extension (Gerrit:372051) (duration: 00m 51s)
  • 00:38 dereckson@tin: Synchronized wmf-config/InitialiseSettings.php: Enable Timeless on three French wikis (T154371) + Fixes for Wikidata: Remove wbq_evaluation logging, Update Wikidata property blacklist (Gerrit:367913 and Gerrit:370846) (duration: 00m 53s)
  • 00:02 dereckson@tin: Synchronized wmf-config/InitialiseSettings.php: Enable WikidataPageBanner on test wikis (T173388) (duration: 00m 51s)
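
The 07:08 entry above frees space in /var/log/carbon by deleting regular files last modified more than 30 days ago. A dry-run version of that selection can be sketched in Python, so the candidate list can be reviewed before anything is removed. This is an illustration only: the function name stale_files is hypothetical, and the age cutoff is a plain "older than N × 24 hours" comparison, which only approximates how find rounds -mtime.

```python
import os
import time


def stale_files(root, max_age_days, now=None):
    """List regular files under `root` whose modification time is more than
    `max_age_days` days before `now`. Nothing is deleted."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks so only regular files are reported.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            if os.path.getmtime(path) < cutoff:
                stale.append(path)
    return sorted(stale)
```

Calling stale_files("/var/log/carbon", 30) would return the paths to review; deletion is deliberately left as a separate step.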

2017-08-15

  • 23:58 dereckson@tin: Synchronized php-1.30.0-wmf.14/skins/Timeless/resources/screen-common.less: Fix messed up recent changes/watchlist legends (T173151) (duration: 00m 50s)
  • 23:55 dereckson@tin: Synchronized php-1.30.0-wmf.13/skins/Timeless/resources/screen-common.less: Fix messed up recent changes/watchlist legends (T173151) (duration: 00m 54s)
  • 23:14 Dereckson: Ran updateArticleCount.php on sr.wikiquote (T172974)
  • 23:12 dereckson@tin: Synchronized wmf-config/InitialiseSettings.php: Create a few of namespace aliases for hiwikiversity (T172977) (duration: 00m 53s)
  • 22:50 ppchelko@tin: Started restart [changeprop/deploy@444223d]: (no justification provided)
  • 22:45 Pchelolo: scb - codfw: enabling puppet and starting changeprop back T169939
  • 21:40 mobrovac: scb - codfw: disabling puppet and stopping changeprop to release load on Cassandra while dropping old tables - T169939
  • 21:26 urandom: Starting Cassandra restbase2005-b
  • 19:24 demon@tin: rebuilt wikiversions.php and synchronized wikiversions files: group0 to wmf.14
  • 19:19 mobrovac@tin: Finished deploy [restbase/deploy@1139d00]: Use only new parsoid tables (before the removal of old ones), part #2 - T169939 (duration: 02m 50s)
  • 19:16 mobrovac@tin: Started deploy [restbase/deploy@1139d00]: Use only new parsoid tables (before the removal of old ones), part #2 - T169939
  • 19:14 demon@tin: Finished scap: bootstrap wmf.14 v2.0 (duration: 30m 56s)
  • 19:04 mobrovac@tin: Started deploy [restbase/deploy@1139d00]: Use only new parsoid tables (before the removal of old ones) - T169939
  • 19:03 mobrovac@tin: Finished deploy [restbase/deploy@1139d00] (staging): Use only new parsoid tables (before the removal of old ones) - T169939 (duration: 04m 39s)
  • 18:58 mobrovac@tin: Started deploy [restbase/deploy@1139d00] (staging): Use only new parsoid tables (before the removal of old ones) - T169939
  • 18:43 demon@tin: Started scap: bootstrap wmf.14 v2.0
  • 18:39 demon@tin: scap failed: CalledProcessError Command '/usr/local/bin/mwscript rebuildLocalisationCache.php --wiki="testwiki" --outdir="/tmp/scap_l10n_3982832257" --threads=10 --lang en --quiet' returned non-zero exit status 255 (duration: 03m 05s)
  • 18:35 demon@tin: Started scap: bootstrap wmf.14
  • 18:18 demon@tin: Pruned MediaWiki: 1.30.0-wmf.12 [keeping static files] (duration: 02m 47s)
  • 17:59 bsitzmann@tin: Finished deploy [mobileapps/deploy@34a1304]: Update mobileapps to 33b80dd (T172829 T152441 T172021 T103362) (duration: 04m 00s)
  • 17:55 bsitzmann@tin: Started deploy [mobileapps/deploy@34a1304]: Update mobileapps to 33b80dd (T172829 T152441 T172021 T103362)
  • 15:21 mobrovac: restarting pdfrender on scb1001, added some debug messages to help us diagnose T159922
  • 13:42 gehel: cleanup of old leftover logs on deployment-logstash2:/var/log/logstash
  • 13:21 moritzm: bounced pdfrender on scb1001 (T159922)
  • 11:32 moritzm: installing Linux updates on stretch systems (no reboots yet)
  • 10:45 moritzm: installing PHP security updates on trusty
  • 10:09 jynus: rebooting db1078
  • 09:39 jynus@tin: Synchronized wmf-config/db-eqiad.php: depool db1078 (duration: 00m 49s)
  • 07:56 moritzm: installing cvs security updates
  • 07:18 elukey: restart pdfrender on scb1003
  • 02:56 l10nupdate@tin: ResourceLoader cache refresh completed at Tue Aug 15 02:56:14 UTC 2017 (duration 6m 42s)
  • 02:49 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.13) (duration: 07m 53s)
  • 02:28 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.11) (duration: 08m 58s)

2017-08-14

  • 22:16 Dereckson: Previous log entry is related to Gerrit:371960 and Gerrit:371961.
  • 22:15 dereckson@tin: Synchronized php-1.30.0-wmf.11/extensions/EventBus/JobQueueEventBus.php: JobQueueEventBus: Populate the database field + not set properties are accessed (duration: 00m 48s)
  • 20:44 ppchelko@tin: Finished deploy [restbase/deploy@4d6c706]: Temporary fallback to the new storage buckets before truncation (duration: 09m 35s)
  • 20:35 ppchelko@tin: Started deploy [restbase/deploy@4d6c706]: Temporary fallback to the new storage buckets before truncation
  • 17:09 gehel: resetting mode on stat1005:/srv/published-datasets/discovery recursively
  • 16:29 elukey: execute sudo find -type f -mtime +60 -exec rm {} \; in /var/log/carbon on graphite2001 to free some space in /
  • 13:32 elukey: Execute systemctl mask nfacctd on rhenium.wikimedia.org for T172681
  • 13:08 zeljkof: nothing for EU SWAT
  • 12:21 jynus: stopping replication on all instances of dbstore2001 T169516
  • 11:01 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Add db2076 - T170662 T151029 (duration: 00m 47s)
  • 11:00 marostegui@tin: Synchronized wmf-config/db-codfw.php: Add db2076 - T170662 T151029 (duration: 01m 02s)
  • 10:34 marostegui: Restart MySQL on db1069 to pick up new replication filters
  • 10:27 marostegui: Restart MySQL on db1095 to pick up new replication filters
  • 09:16 oblivian@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=thumbor2001.codfw.wmnet
  • 09:07 _joe_: stopping thumbor on thumbor2001 after depooling it for testing
  • 09:05 oblivian@puppetmaster1001: conftool action : set/pooled=no; selector: service=thumbor,name=thumbor2001.codfw.wmnet
  • 03:15 l10nupdate@tin: ResourceLoader cache refresh completed at Mon Aug 14 03:15:31 UTC 2017 (duration 7m 6s)
  • 03:08 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.13) (duration: 06m 47s)
  • 02:39 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.11) (duration: 08m 29s)

2017-08-13

  • 22:45 ebernhardson: restart elasticsearch on elastic1017 after setting md2 readahead to 256 to match md2 on 1032-152
  • 19:36 godog: upload python-thumbor-wikimedia 1.2 - T161719
  • 19:01 godog: bounce pdfrender on scb1001 and scb1003 - T159922
  • 18:46 godog: bounce carbon and uwsgi on graphite1003
  • 17:53 reedy@tin: Synchronized wmf-config/db-eqiad.php: fix comment (duration: 00m 47s)
  • 17:52 reedy@tin: Synchronized wmf-config/db-codfw.php: fix comment (duration: 00m 47s)
  • 17:51 reedy@tin: Synchronized refresh-dblist: phpcs (duration: 00m 48s)

2017-08-12

  • 20:00 krinkle@tin: Synchronized php-1.30.0-wmf.13/includes/jobqueue/JobQueueGroup.php: T171371 - Log job pushes to bogus wikis (duration: 00m 53s)
  • 16:09 Reedy: Deleted some bogus user languages from commonswiki.user_properties
  • 15:25 elukey: powercycle mw2256 (able to use com2 but not to login as root, regular ssh hanging) - T163346

2017-08-11

  • 21:47 legoktm@tin: Finished scap: (no justification provided) (duration: 07m 09s)
  • 21:41 jmm@puppetmaster1001: conftool action : set/pooled=inactive; selector: mw2256.codfw.wmnet
  • 21:40 legoktm@tin: Started scap: (no justification provided)
  • 21:40 legoktm@tin: scap failed: LockFailedError Failed to acquire lock "/var/lock/scap.unknown-but-probably-mediawiki.lock"; owner is "legoktm"; reason is "Deploying Timeless (try 2) - T154371" (duration: 00m 00s)
  • 21:02 ebernhardson: unban elastic1017 from elasticsearch cluster
  • 20:52 bblack: varnish backend restart on cp1099
  • 20:46 legoktm@tin: Started scap: Deploying Timeless (try 2) - T154371
  • 20:42 legoktm@tin: scap aborted: Deploying Timeless - T154371 (duration: 05m 10s)
  • 20:36 legoktm@tin: Started scap: Deploying Timeless - T154371
  • 20:19 bblack: varnish backend restart on cp1049 + cp1074 (mailbox lag)
  • 19:59 ebernhardson: ban elastic1017 from eqiad search cluster
  • 18:28 moritzm: installing subversion security updates
  • 17:19 jynus@tin: Synchronized wmf-config/db-codfw.php: Repool db2075 (duration: 00m 47s)
  • 17:05 twentyafterfour: Deploying phabricator security update
  • 16:21 jynus_: stop db1069:s6 replication and dropping frwiki, jawiki, ruwiki
  • 15:41 jynus_: stopping db2075 to clone it to dbstore2001
  • 15:39 moritzm: installing git security updates on trusty (jessie/stretch already fixed)
  • 15:38 jynus@tin: Synchronized wmf-config/db-codfw.php: Depool db2075 (duration: 00m 48s)
  • 13:51 elukey: moved the eventbus scap deployment dirs on kafka[12]00[123] to deploy-service:deploy-service to allow scap to depool/pool - T171506
  • 13:49 mobrovac@tin: Finished deploy [eventlogging/eventbus@41e3418]: (no justification provided) (duration: 00m 27s)
  • 13:49 mobrovac@tin: Started deploy [eventlogging/eventbus@41e3418]: (no justification provided)
  • 13:42 mobrovac@tin: Finished deploy [eventlogging/eventbus@41e3418]: (no justification provided) (duration: 00m 13s)
  • 13:42 mobrovac@tin: Started deploy [eventlogging/eventbus@41e3418]: (no justification provided)
  • 13:25 mobrovac@tin: Finished deploy [eventlogging/eventbus@41e3418]: (no justification provided) (duration: 00m 34s)
  • 13:24 mobrovac@tin: Started deploy [eventlogging/eventbus@41e3418]: (no justification provided)
  • 13:16 mobrovac@tin: Finished deploy [eventlogging/eventbus@41e3418]: (no justification provided) (duration: 00m 12s)
  • 13:15 mobrovac@tin: Started deploy [eventlogging/eventbus@41e3418]: (no justification provided)
  • 13:00 mobrovac@tin: Finished deploy [eventlogging/eventbus@41e3418]: (no justification provided) (duration: 00m 17s)
  • 13:00 mobrovac@tin: Started deploy [eventlogging/eventbus@41e3418]: (no justification provided)
  • 12:51 mobrovac@tin: Finished deploy [eventlogging/eventbus@41e3418]: (no justification provided) (duration: 00m 04s)
  • 12:51 mobrovac@tin: Started deploy [eventlogging/eventbus@41e3418]: (no justification provided)
  • 12:44 mobrovac@tin: Finished deploy [eventlogging/eventbus@41e3418]: (no justification provided) (duration: 00m 05s)
  • 12:44 mobrovac@tin: Started deploy [eventlogging/eventbus@41e3418]: (no justification provided)
  • 12:05 marostegui@tin: Synchronized wmf-config/db-codfw.php: Repool db2046 - T151029 (duration: 00m 48s)
  • 11:25 jynus: stopping and upgrading labsdb1010
  • 11:19 marostegui: Stop replication on db2046 to fix duplicate entries - T151029
  • 09:28 jynus: stopping and restarting es2013 for upgrade
  • 09:07 jynus: stopping and restarting db2046 for upgrade
  • 07:35 elukey: restart pdfrender on scb1004

2017-08-10

  • 23:43 maxsem@tin: Synchronized wmf-config/CommonSettings.php: https://gerrit.wikimedia.org/r/#/c/371196/2 (duration: 00m 47s)
  • 23:26 maxsem@tin: Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/370310/ (duration: 00m 47s)
  • 23:25 maxsem@tin: Synchronized wmf-config/CommonSettings.php: https://gerrit.wikimedia.org/r/#/c/370310/ (duration: 00m 46s)
  • 22:31 twentyafterfour@tin: Synchronized php-1.30.0-wmf.13/includes/specials/SpecialDoubleRedirects.php: Hopefully fix T173045 (duration: 00m 48s)
  • 22:17 moritzm: installing git security-updates
  • 20:51 mobrovac@tin: Finished deploy [cassandra/metrics-collector@d0169ee] (staging): First Scap3 deployment, take #2.5 - T137371 (duration: 00m 21s)
  • 20:51 mobrovac@tin: Started deploy [cassandra/metrics-collector@d0169ee] (staging): First Scap3 deployment, take #2.5 - T137371
  • 20:50 mobrovac@tin: Finished deploy [cassandra/metrics-collector@d0169ee] (staging): First Scap3 deployment, take #2 - T137371 (duration: 00m 09s)
  • 20:50 mobrovac@tin: Started deploy [cassandra/metrics-collector@d0169ee] (staging): First Scap3 deployment, take #2 - T137371
  • 20:35 mobrovac@tin: Finished deploy [cassandra/metrics-collector@d0169ee] (staging): First Scap3 deployment - T137371 (duration: 00m 05s)
  • 20:34 mobrovac@tin: Started deploy [cassandra/metrics-collector@d0169ee] (staging): First Scap3 deployment - T137371
  • 20:00 mobrovac@tin: Finished deploy [cassandra/metrics-collector@5db1a43] (staging): First Scap3 deployment - T137371 (duration: 18m 17s)
  • 19:42 mobrovac@tin: Started deploy [cassandra/metrics-collector@5db1a43] (staging): First Scap3 deployment - T137371
  • 19:35 reedy@tin: Synchronized wmf-config/InitialiseSettings.php: touch (duration: 00m 47s)
  • 19:33 reedy@tin: Synchronized wmf-config/: Newsletter on testwiki (duration: 00m 49s)
  • 19:31 mobrovac@tin: Finished deploy [cassandra/logstash-logback-encoder@d085ffa] (aqs): first Scap3 deployment - T116340 (duration: 01m 08s)
  • 19:30 mobrovac@tin: Started deploy [cassandra/logstash-logback-encoder@d085ffa] (aqs): first Scap3 deployment - T116340
  • 19:30 mobrovac@tin: Finished deploy [cassandra/logstash-logback-encoder@d085ffa]: first Scap3 deployment - T116340 (duration: 00m 34s)
  • 19:29 mobrovac@tin: Started deploy [cassandra/logstash-logback-encoder@d085ffa]: first Scap3 deployment - T116340
  • 19:28 mobrovac@tin: Finished deploy [cassandra/logstash-logback-encoder@d085ffa] (staging): first Scap3 deployment (rest of the nodes) - T116340 (duration: 00m 12s)
  • 19:28 mobrovac@tin: Started deploy [cassandra/logstash-logback-encoder@d085ffa] (staging): first Scap3 deployment (rest of the nodes) - T116340
  • 19:27 twentyafterfour@tin: rebuilt wikiversions.php and synchronized wikiversions files: All wikis (except wikidata) to 1.30.0-wmf.13 refs T170631
  • 19:16 godog: restart cassandra instances on xenon to test logstash-logback-encoder deploy
  • 19:13 mobrovac@tin: Finished deploy [cassandra/logstash-logback-encoder@d085ffa] (staging): first Scap3 deployment - T116340 (duration: 00m 03s)
  • 19:13 mobrovac@tin: Started deploy [cassandra/logstash-logback-encoder@d085ffa] (staging): first Scap3 deployment - T116340
  • 18:56 Reedy: created newsletter tables on testwiki
  • 18:55 reedy@tin: Synchronized php-1.30.0-wmf.13/extensions/Newsletter/: sql file updates (duration: 00m 52s)
  • 18:53 reedy@tin: Synchronized php-1.30.0-wmf.13/extensions/WikimediaMaintenance/createExtensionTables.php: newsletter (duration: 00m 52s)
  • 18:21 demon@tin: Synchronized wmf-config/InitialiseSettings.php: Fix commons/commonswiki snafu (duration: 00m 52s)
  • 17:53 kartik@tin: Finished deploy [cxserver/deploy@f43ef96]: (no justification provided) (duration: 00m 44s)
  • 17:52 kartik@tin: Started deploy [cxserver/deploy@f43ef96]: (no justification provided)
  • 17:42 kartik@tin: Finished deploy [cxserver/deploy@1065ffe]: Update cxserver to 686f4f3 (duration: 00m 40s)
  • 17:42 kartik@tin: Started deploy [cxserver/deploy@1065ffe]: Update cxserver to 686f4f3
  • 16:23 ema: cp1072: restart varnish backend
  • 15:48 XioNoX: troubleshooting interface errors between pfw3-codfw and fasw-codfw
  • 15:09 marostegui: Stop replication on db2046 to fix duplicate entries - T151029
  • 15:05 marostegui@tin: Synchronized wmf-config/db-codfw.php: Depool db2046 - T151029 (duration: 00m 50s)
  • 15:03 marostegui: Poweroff es2013 for maintenance - T172265
  • 14:45 marostegui@tin: Synchronized wmf-config/db-codfw.php: Pool db2075 - T170662 (duration: 00m 51s)
  • 14:32 marostegui@tin: Synchronized wmf-config/db-codfw.php: Repool db2045 - T151029 (duration: 00m 51s)
  • 14:11 elukey: restart kafka1012 temporary with some logs to TRACE to debug T172681
  • away: updated civicrm from 200abc2 to 4aa177b
  • 13:43 marostegui: Compress cebwiki on db1095 - T153058
  • 13:32 zeljkof: EU SWAT finished
  • 13:26 zfilipin@tin: Synchronized wmf-config/InitialiseSettings.php: SWAT: Set $wgArticleCountMethod to any on srwikisource (T172974) (duration: 00m 53s)
  • 13:18 zeljkof: EU SWAT, part two
  • 13:17 marostegui: Drop m3 databases from dbstore1002 - T156758
  • 13:16 zeljkof: EU SWAT finished
  • 13:15 zfilipin@tin: Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable NewUserMessage on knwiki (T172894) (duration: 00m 52s)
  • 12:56 marostegui: Stop MySQL on db2045 to upgrade socket location - T148507
  • 12:07 elukey: restored varnishkafka on cp3032
  • 11:17 elukey: disabled puppet on cp3032 and restarted varnishkafka with debug logging
  • 09:58 jynus: continuing cloning of dbstore2002 to dbstore2001
  • 09:30 gehel: repooling wdqs2001, long after data reload completed
  • 09:30 gehel@puppetmaster1001: conftool action : set/pooled=yes; selector: name=wdqs2001.codfw.wmnet
  • 09:27 gehel@tin: Finished deploy [wdqs/wdqs@c186e3e]: (no justification provided) (duration: 01m 31s)
  • 09:25 gehel@tin: Started deploy [wdqs/wdqs@c186e3e]: (no justification provided)
  • 09:17 jynus: disabling lag notification on all s4 replicas
  • 08:59 elukey: update librdkafka1 to 0.9.4.1 on eventlog1001
  • 08:15 elukey: add 50G to carbon lv on graphite1003 and 100G on graphite2002
  • 07:32 jynus: enabling semisync master replication on db1068
  • 07:29 jynus: disabling semisync slave replication on all SSD hosts on s4
  • 06:45 elukey: powercycle mw2256 - T163346
  • 06:38 elukey: restart pdfrender on scb1004
  • 06:04 Dereckson: Removed 2FA for GoldRingChip account (T172878)
  • 03:23 bblack: cp1008, restbase-dev100[456] have puppet disabled, manual "rm /etc/ssh/userkeys/dzahn"
  • 03:20 bblack: batched cumin puppet agent run on all hosts (not forced)
  • 02:28 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.11) (duration: 09m 12s)

2017-08-09

  • 22:06 maxsem@tin: Synchronized wmf-config/InitialiseSettings.php: Get baaack you broken CodeMirror! (duration: 00m 51s)
  • 21:20 maxsem@tin: Synchronized wmf-config/InitialiseSettings.php: Deploy CodeMirror https://gerrit.wikimedia.org/r/#/c/370943/ (duration: 00m 50s)
  • 21:11 maxsem@tin: Synchronized php-1.30.0-wmf.12/extensions/CodeMirror: https://gerrit.wikimedia.org/r/#/c/370904/ (duration: 00m 51s)
  • 21:05 ppchelko@tin: Finished deploy [restbase/deploy@f97beeb]: Rollback on canary (duration: 07m 30s)
  • 21:01 godog: add 100G to carbon lv on graphite1003 and graphite2002
  • 20:58 ppchelko@tin: Started deploy [restbase/deploy@f97beeb]: Rollback on canary
  • 20:54 ppchelko@tin: Finished deploy [restbase/deploy@f97beeb]: Temporary fallback to the new storage buckets before truncation. All tables created, finish deploy (duration: 07m 12s)
  • 20:51 marostegui: Remove m3 replication from dbstore1002 - T156758
  • 20:47 ppchelko@tin: Started deploy [restbase/deploy@f97beeb]: Temporary fallback to the new storage buckets before truncation. All tables created, finish deploy
  • 20:43 ppchelko@tin: Finished deploy [restbase/deploy@f97beeb]: Temporary fallback to the new storage buckets before truncation. Attempt 3 (duration: 08m 05s)
  • 20:37 marostegui@tin: Synchronized wmf-config/db-codfw.php: Repool db2046 - T170662 (duration: 00m 49s)
  • 20:35 ppchelko@tin: Started deploy [restbase/deploy@f97beeb]: Temporary fallback to the new storage buckets before truncation. Attempt 3
  • 20:22 twentyafterfour: twentyafterfour@tin rebuilt wikiversions.php and synchronized wikiversions files: Actually sync all group1 wikis (except wikidata) to 1.30.0-wmf.13 refs T170631
  • 18:57 ppchelko@tin: Started deploy [restbase/deploy@f97beeb]: Temporary fallback to the new storage buckets before truncation. Attempt 2
  • 18:39 marostegui: Stop replication on db2045 to fix duplicate keys - T151029
  • 18:36 urandom: Restarting Cassandra, restbase2001-b.codfw.wmnet (schema jankiness)
  • 18:27 urandom: Restarting Cassandra, restbase2001-a.codfw.wmnet (schema jankiness)
  • 18:10 ebernhardson: restart es on elastic1018 to test interleaved numa
  • 17:50 marostegui: Add db2076 to tendril - T170662
  • 17:41 marostegui: Compress innodb on s6 on db2076 - T170662
  • 17:31 legoktm: changed mediawiki/vendor to fast forward only
  • 17:24 ppchelko@tin: Finished deploy [restbase/deploy@f97beeb]: Rollback on canary (duration: 01m 27s)
  • 17:23 ppchelko@tin: Started deploy [restbase/deploy@f97beeb]: Rollback on canary
  • 16:35 ppchelko@tin: Finished deploy [restbase/deploy@f97beeb]: Temporary fallback to the new storage buckets before truncation (duration: 04m 46s)
  • 16:30 ppchelko@tin: Started deploy [restbase/deploy@f97beeb]: Temporary fallback to the new storage buckets before truncation
  • 16:29 urandom: T172384: Upgrading Cassandra in RESTBase dev to 3.11.0-wmf2 (patched to disable use of FastThreadLocal)
  • 16:15 marostegui: Stop mysql on db2046 to copy its content to db2076 - T170662
  • 16:14 elukey: rolling restart of eventstream on scb hosts to deploy https://gerrit.wikimedia.org/r/370793
  • 16:11 marostegui@tin: Synchronized wmf-config/db-codfw.php: Depool db2046 - T170662 (duration: 00m 50s)
  • 16:08 gehel: config update on logstash, a few logs might be lost during restart - T172713
  • 15:19 XioNoX: removing old pfw related config from cr1-codfw - T171970
  • 14:08 mutante: purging req.urls with "^/resources" from varnish cluster, to fix redirect with cached 404
  • 14:04 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Add db2075 to s5 - T170662 (duration: 00m 50s)
  • 14:03 marostegui@tin: Synchronized wmf-config/db-codfw.php: Add db2075 to s5 - T170662 (duration: 00m 51s)
  • 13:55 marostegui: Reboot db2076 for maintenance - T170662
  • 13:21 marostegui: Stop replication on db2075 for maintenance - T170662
  • 13:18 ladsgroup@tin: Synchronized wmf-config/InitialiseSettings.php: Allow bureaucrats remove confirmed user group (T101983) (duration: 00m 51s)
  • 13:10 gehel@tin: Finished deploy [wdqs/wdqs@6620e0f]: (no justification provided) (duration: 01m 34s)
  • 13:09 gehel@tin: Started deploy [wdqs/wdqs@6620e0f]: (no justification provided)
  • 12:41 gehel: restarting updater on wdqs1001 (real fix coming up soon)
  • 10:20 jynus: stopping dbstore2002's mysqls and cloning them to dbstore2001
  • 08:39 jynus: disable puppet on dbstore2001, about to be reimaged
  • 07:47 jynus@tin: Synchronized wmf-config/db-eqiad.php: Repool db1065, db1064, db1070 (duration: 01m 04s)
  • 07:12 jynus: stopping replication in sync between db1069 and db1065, db1044, db1064, db1070
  • 07:07 jynus@tin: Synchronized wmf-config/db-eqiad.php: Depool db1065, db1064, db1070 (duration: 00m 47s)
  • 06:26 jynus@tin: Synchronized wmf-config/db-eqiad.php: Repool db1079 (duration: 00m 51s)
  • 06:11 jynus: stopping db1069 (s7) and db1079 replication in sync
  • 06:03 jynus@tin: Synchronized wmf-config/db-eqiad.php: Repool db1085, depool db1079 (duration: 00m 51s)
  • 05:42 jynus: stopping db1069 (s6) and db1085 replication in sync
  • 05:32 jynus@tin: Synchronized wmf-config/db-eqiad.php: Repool db1060, depool db1085 (duration: 00m 50s)
  • 04:56 jynus: stopping db1069 (s2) and db1060 replication in sync
  • 04:54 jynus@tin: Synchronized wmf-config/db-eqiad.php: Depool db1060 (duration: 00m 58s)
  • 04:48 jynus: restarting pdfrender on scb1001, unresponsive
  • 04:09 jynus: powercycling mw2256
  • 03:23 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.13) (duration: 07m 33s)
  • 02:56 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.12) (duration: 07m 45s)
  • 02:38 ebernhardson: restart elasticsearch1017 with niofs store
  • 02:28 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.11) (duration: 08m 38s)
  • 00:27 ebernhardson: test vm.zone_reclaim_mode=0 on elastic1017

2017-08-08

  • 23:23 dereckson@tin: Synchronized wmf-config/InitialiseSettings.php: Allow bureaucrats on WMF wikis to grant and remove 'confirmed' (T101983) (duration: 00m 51s)
  • 21:25 twentyafterfour@tin: rebuilt wikiversions.php and synchronized wikiversions files: group0 wikis to 1.30.0-wmf.13 refs T170631
  • 21:08 ppchelko@tin: Finished deploy [restbase/deploy@c16fb6b]: Update summary and licensing information for pageviews API (duration: 08m 31s)
  • 21:05 twentyafterfour@tin: Finished scap: again: deploy 1.30.0-wmf.13 to testwikis and rebuild l10n refs T170631 (duration: 42m 55s)
  • 20:59 ppchelko@tin: Started deploy [restbase/deploy@c16fb6b]: Update summary and licensing information for pageviews API
  • 20:32 twentyafterfour: updated mediawiki.org changelog for 1.30.0-wmf.13
  • 20:22 twentyafterfour@tin: Started scap: again: deploy 1.30.0-wmf.13 to testwikis and rebuild l10n refs T170631
  • 19:46 ema: restart varnish backend on cp1074
  • 19:45 godog: bounce thumbor-instances on thumbor1003 to make sure all memory limits are applied
  • 19:44 twentyafterfour@tin: scap failed: CalledProcessError Command '/usr/local/bin/mwscript rebuildLocalisationCache.php --wiki="test2wiki" --outdir="/tmp/scap_l10n_224168097" --threads=10 --lang en --quiet' returned non-zero exit status 255 (duration: 07m 33s)
  • 19:36 twentyafterfour@tin: Started scap: deploy 1.30.0-wmf.13 to testwikis and rebuild l10n refs T170631
  • 18:28 XioNoX: frack-codfw moved to new infrastructure
  • 18:27 elukey: re-enabled irc-echo after the puppet shower
  • 18:11 elukey: stop ircecho to avoid puppet shower
  • 16:59 twentyafterfour: Branching mediawiki/master to mediawiki/wmf/1.30.0-wmf.13 refs T170631
  • 16:04 elukey: rolling restart of varnishkafka-webrequest to apply https://gerrit.wikimedia.org/r/#/c/370659/ (puppet automatically restarts)
  • 15:03 XioNoX: starting pfw-codfw migration - T171970
  • 14:19 elukey: restart of all the varnishkafka statsv/eventlogging instances on caching hosts to pick up https://gerrit.wikimedia.org/r/370644 (puppet automatic restarts)
  • 14:16 elukey: set mw2256 pooled=inactive + downtime to allow BIOS upgrade - T163346
  • 14:07 ladsgroup@tin: Synchronized wmf-config/Wikibase-production.php: SWAT: Add copyright info for Wikidata API (T112606) (duration: 00m 47s)
  • 13:56 gehel: dump of task API for elasticsearch eqiad - T169498
  • 13:54 ladsgroup@tin: Synchronized static/images/project-logos: SWAT: Fix srwiki logos (T150618) (duration: 00m 48s)
  • 13:00 elukey: restart varnishkafka-webrequest with kafka.broker.version.fallback=0.9.0.1 + kafka.api.version.request=false on cp3032 (local test, to rollback remove the lines from /etc/varnishkafka/webrequest.conf)
  • 12:49 Amir1: start of ladsgroup@terbium:~$ /usr/local/bin/mwscript extensions/Wikidata/extensions/Wikibase/repo/maintenance/rebuildTermSqlIndex.php --wiki wikidatawiki --entity-type=property --rebuild-all-terms (T172776)
  • 12:32 elukey: restart pdfrender on scb1002
  • 12:20 Amir1: start of ladsgroup@terbium:~$ /usr/local/bin/mwscript extensions/Wikidata/extensions/Wikibase/repo/maintenance/rebuildTermSqlIndex.php --wiki wikidatawiki --entity-type=property (T172776)
  • 12:15 gehel: restarting wdqs-updater on wdqs1001
  • 12:14 elukey: stop eventlogging on eventlog1001 to test kafka consumer failures
  • 11:22 Amir1: start of ladsgroup@terbium:~$ timeout 3500s /usr/local/bin/mwscript extensions/Wikidata/extensions/Wikibase/repo/maintenance/rebuildTermSqlIndex.php --wiki wikidatawiki --entity-type=item >>/tmp/rebuildTermSqlIndex.log 2>&1 (T171460)
  • 10:12 elukey: update librdkafka1* on notebook100[12] and stat1003
  • 09:09 jynus@tin: Synchronized wmf-config/db-eqiad.php: Pool db1098 as the main recentchanges/watchlist s6 role (duration: 00m 47s)
  • 08:46 jynus@tin: Synchronized wmf-config/db-eqiad.php: Pool db1098 at 50% load (duration: 00m 46s)
  • 08:28 jynus@tin: Synchronized wmf-config/db-eqiad.php: Pool db1098 with limited load (duration: 00m 46s)
  • 07:53 jynus@tin: Synchronized wmf-config/db-codfw.php: Add db1098 (duration: 00m 47s)
  • 07:41 elukey: stop puppet on cp3032 (cache::text) to set varnishkafka-webrequest logging to debug
  • 07:34 Amir1: stopped the script and re-running without --deduplicate-terms (T171460)
  • 07:21 Amir1: start of ladsgroup@terbium:~$ time mwscript extensions/Wikidata/extensions/Wikibase/repo/maintenance/rebuildTermSqlIndex.php --wiki=wikidatawiki --entity-type=property --deduplicate-terms (T171460)
  • 06:12 elukey: alert users with big home directories for stat1005 disk alarms (will erase data later on only if they don't answer)
  • 06:12 elukey: restart pdfrender on scb1003
  • 03:55 mutante: phab1001 /usr/local/bin/community_metrics.sh | /usr/local/bin/project_changes.sh creating stats mails to admins (which failed before) (T163938)
  • 03:25 mutante: phab1001 /srv/phab/tools/public_task_dump.py to create dump; the cron job had failed due to missing /srv/dumps/
  • 02:53 l10nupdate@tin: ResourceLoader cache refresh completed at Tue Aug 8 02:53:13 UTC 2017 (duration 6m 53s)
  • 02:46 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.12) (duration: 06m 59s)
  • 02:36 ejegg: enabled donation queue consumer
  • 02:26 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.11) (duration: 08m 25s)
  • 00:46 ejegg: disabled donation queue consumer

2017-08-07

  • 21:28 urandom: T172384: Upgrading Cassandra to 3.11.0-wmf1 in dev environment (build patched to disable in-built heap dumping)
  • 19:51 ejegg: updated CiviCRM from f24ba78 to 200abc2
  • 19:05 ebernhardson@tin: Synchronized php-1.30.0-wmf.12/extensions/WikimediaEvents/modules/ext.wikimediaEvents.searchSatisfaction.js: T171212 - Turn on CirrusSearch MLR AB test (duration: 00m 46s)
  • 19:00 ebernhardson@tin: Synchronized wmf-config/CirrusSearch-common.php: T169498 - Enable max token count for phrase rescore on zh lang wikis (step 2) (duration: 00m 46s)
  • 18:59 ebernhardson@tin: Synchronized wmf-config/InitialiseSettings.php: T169498 - Enable max token count for phrase rescore on zh lang wikis (step 1) (duration: 00m 46s)
  • 18:53 ebernhardson@tin: Synchronized wmf-config/CirrusSearch-production.php: T171212 - Update CirrusSearch AB test rescore profiles (duration: 00m 46s)
  • 18:44 ebernhardson@tin: Synchronized wmf-config/Wikibase-labs.php: T112606 - beta only - Add copyright info for Wikidata API (duration: 00m 46s)
  • 18:43 ebernhardson@tin: Synchronized wmf-config/InitialiseSettings.php: T172630 - Enable wgMinervaEnableSiteNotice for kowiki (duration: 00m 46s)
  • 18:38 ebernhardson@tin: Synchronized wmf-config/InitialiseSettings.php: T170687 - Exclude files from Special:ShortPages on commons (duration: 00m 46s)
  • 18:23 ebernhardson@tin: Synchronized php-1.30.0-wmf.12/extensions/CirrusSearch/: T169498 limit phrase token count, T172464 constant boost ltr queries (duration: 00m 58s)
  • 18:10 ebernhardson@tin: Synchronized wmf-config/InitialiseSettings.php: T172594 - Translate sitename for nl.wikinews (duration: 00m 47s)
  • 18:05 ebernhardson@tin: Synchronized wmf-config/InitialiseSettings.php: Grant autopatrol to editor in en.wikibooks - T172561 (duration: 00m 47s)
  • 17:57 mutante: phab2001 - re-enabling puppet, but closing firewall for 80/443
  • 17:56 jynus: stopping slave and repartitioning db1098
  • 17:10 marostegui: Restart s7 instance on db1102 to pick up new replication filters - T172693
  • 17:05 gehel@tin: Finished deploy [wdqs/wdqs@da33919]: (no justification provided) (duration: 02m 28s)
  • 17:02 gehel@tin: Started deploy [wdqs/wdqs@da33919]: (no justification provided)
  • 16:44 marostegui: Restart s7 instance on db1069 to pick up new replication filters - T172693
  • 16:37 XioNoX: manually restarted varnish on cp1099
  • 15:10 thcipriani: restarting jenkins for plugin upgrade
  • 14:51 gehel: reducing elasticsearch eqiad concurrent rebalance to 4 (from 8)
  • 14:38 elukey: updated librdkafka1 and ++1 to 0.9.4.1 on hafnium
  • 14:32 mutante: phab2001 - stopping Apache,schedule downtime for http and puppet
  • 14:22 herron: mx[1,2]001, fermium: Installed libmail-dkim-perl and restarted spamassassin service - T172689
  • 13:15 jynus: reboot db1098
  • 12:39 _joe_: restarting pdfrender on scb1001, T159922
  • 12:39 elukey: restart kafka on kafka1018 to force it out of the kafka topic leaders - T172681
  • 12:26 marostegui@tin: Synchronized wmf-config/db-codfw.php: Repool db2074 - T171321 (duration: 00m 45s)
  • 12:08 gehel: deploying https://gerrit.wikimedia.org/r/#/c/299825/ - some logs will be lost during logstash restart
  • 10:02 marostegui: Add dbstore2002:3313 to tendril - T171321
  • 09:47 jynus: stopping db1050's mysql and cloning it to db1089
  • 09:06 elukey: set net.netfilter.nf_conntrack_tcp_timeout_time_wait=65 (was 120) on all the analytics kafka brokers - T136094
  • 09:03 marostegui@tin: Synchronized wmf-config/db-codfw.php: Repool db2065 after fixing: linter, page and watchlist tables (duration: 00m 47s)
  • 08:12 marostegui: Force BBU re-learn on db1016 - T166344
  • 07:02 marostegui: Stop replication on db2065 to reimport: page, linter and watchlist tables
  • 07:02 marostegui@tin: Synchronized wmf-config/db-codfw.php: Depool db2065 to reimport: page, linter and watchlist tables (duration: 00m 47s)
  • 06:38 marostegui: Stop MySQL on db2074 - T171321
  • 06:37 marostegui@tin: Synchronized wmf-config/db-codfw.php: Depool db2074 - T171321 (duration: 00m 46s)
  • 06:33 marostegui: Stop replication on db2075 - T170662
  • 06:27 marostegui@tin: Synchronized wmf-config/db-codfw.php: Repool db2073 - T171321 (duration: 00m 47s)
  • 06:20 marostegui: Force BBU re-learn on db1016 - T166344
  • 02:57 l10nupdate@tin: ResourceLoader cache refresh completed at Mon Aug 7 02:57:42 UTC 2017 (duration 6m 42s)
  • 02:51 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.12) (duration: 07m 56s)
  • 02:30 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.11) (duration: 10m 16s)

2017-08-06

  • 13:17 elukey: powercycle mw2256 - com2 frozen - T163346
  • 13:13 elukey: restart pdfrender on scb1002
  • 06:18 ebernhardson@tin: Synchronized wmf-config/PoolCounterSettings.php: T169498: Reduce cirrus search pool counter to 200 parallel requests cluster wide (duration: 02m 54s)
  • 01:28 chasemp: conf2002:~# service etcdmirror-conftool-eqiad-wmnet restart (not sure what else to do; the service failed)

2017-08-05

  • 14:40 Reedy: created oauth tables on foundationwiki T172591
  • 14:13 reedy@tin: Synchronized php-1.30.0-wmf.12/extensions/WikimediaMaintenance/createExtensionTables.php: add oauth (duration: 00m 48s)

2017-08-04

  • 23:51 mutante: phab2001 - removed outdated /etc/hosts entries, that fixed rsync, syncing /srv/repos/ from phab1001
  • 23:35 mutante: phab2001 rebooting
  • 23:35 mutante: phab2001 - installing various package upgrades, apt-get autoremove old kernel images
  • 23:12 mutante: "reserved" UID 498 for phd on https://wikitech.wikimedia.org/wiki/UID | phab2001: find -exec chown to fix all the files , restart cron
  • 23:04 mutante: phab2001 - changing UID/GID for phd user from 997:997 to 498:498 to make it match phab1001, to fix rsync breaking permissions. (rsync forces --numeric-ids when fetching from and rsyncd configured with chroot=yes). chown -R phw:www-data /srv/repos/
  • 22:37 ejegg: restarted donations and refund queue consumers
  • 21:44 ejegg: stopped donations and refund queue consumers
  • 21:24 urandom: T172384: Disabling Puppet in dev environment to prevent unattended Cassandra restarts
  • 20:19 mutante: renewing SSL cert for status.wm.org (just like wikitech-static, but that one didn't have monitoring?)
  • 20:02 mutante: wikitech-static-ord - apt-get install certbot
  • 19:41 ejegg: updated CiviCRM from f1fd7f0 to f24ba78
  • 19:38 mutante: renaming graphite varnish director/fixing config, running puppet on cache misc, tested on cp1045
  • 18:17 andrewbogott: switched most cloud instance to new puppetmasters, as per https://phabricator.wikimedia.org/T171786
  • 11:46 marostegui: Deploy schema change directly on s3 master for maiwikimedia - T172485
  • 11:30 marostegui: Deploy schema change directly on s3 master for kbpwiki - T172485
  • 11:14 marostegui: Deploy schema change directly on s3 master for dinwiki - T172485
  • 10:14 marostegui: Deploy schema change directly on s3 master for atjwiki - T172485
  • 10:05 marostegui: Stop replication on db2073 for maintenance
  • 09:22 marostegui: Add dbstore2002 to tendril - T171321
  • 09:19 marostegui: Deploy schema change directly on s3 master for techconductwiki - T172485
  • 08:35 marostegui: Deploy schema change directly on s3 master for hiwikiversity - T172485
  • 08:19 marostegui: Deploy schema change directly on s3 master for wikimania2018wiki - T172485
  • 08:04 marostegui: Sanitize wikimania2018wiki on sanitarium and sanitarium2 - T155041
  • 07:47 marostegui: Stop MySQL on db2073 to copy its data to dbstore2002 - https://phabricator.wikimedia.org/T171321
  • 07:47 marostegui@tin: Synchronized wmf-config/db-codfw.php: Depool db2073 - T171321 (duration: 00m 47s)
  • 07:07 moritzm: installing imagemagick regression security updates on trusty
  • 06:47 marostegui: Sanitize hiwikiversity on sanitarium and sanitarium2 - T171829
  • 05:23 mutante: phab1001 sudo ip addr del 10.64.32.186/32 dev eth0 (T172478)
  • 02:28 dzahn@neodymium: conftool action : set/pooled=yes; selector: name=phab1001-vcs.eqiad.wmnet
  • 02:28 dzahn@neodymium: conftool action : set/pooled=no; selector: name=phab1001-vcs.eqiad.wmnet
  • 02:15 mutante: phab1001 can't talk to mx servers via IPv6, but works via IPv4. iridium and other mailservers can also talk IPv6 to it. why? it did not change even when stopping ferm on client and on server it allows from anywhere. workaround for now was to hardcode IPv4 IP in phab config. (T163938)
  • 02:13 twentyafterfour: outgoing phab mail is working again
  • 01:16 twentyafterfour: twentyafterfour@phab1001:/srv/repos$ sudo chown -R phd:www-data /srv/repos
  • 01:16 twentyafterfour: twentyafterfour@phab1001:/srv/repos$ sudo chmod -x /usr/local/sbin/sync-srv-repos
  • 01:12 mutante: phab1001 - stopped and started exim, which is now running with same options as iridium
  • 00:53 twentyafterfour: starting phd to shut up icinga
  • 00:40 ebernhardson@tin: Synchronized php-1.30.0-wmf.12/extensions/WikimediaEvents/modules/ext.wikimediaEvents.searchSatisfaction.js: Turn CirrusSearch MLR test back off (duration: 00m 47s)
  • 00:35 twentyafterfour: phabricator web service is also up
  • 00:34 mutante: phab1001 - service IPs switched - puppet ran - ssh-phab service up
  • 00:30 twentyafterfour: phab1001 chown -R phd:phd /srv/repos
  • 00:28 twentyafterfour: testing phd on phab1001
  • 00:24 mutante: phab1001 sudo ip addr add 2620:0:861:103:10:64:32:186/128 dev eth0
  • 00:24 mutante: phab1001 sudo ip addr add 10.64.32.186/32 dev eth0
  • 00:23 mutante: iridium sudo ip addr del 2620:0:861:ed1a::3:16/128 dev lo
  • 00:23 mutante: iridium sudo ip addr del 208.80.154.250/32 dev lo
  • 00:23 mutante: iridium sudo ip addr del 2620:0:861:103:10:64:32:186/128 dev eth0
  • 00:23 mutante: iridium sudo ip addr del 10.64.32.186/32 dev eth0
  • 00:07 twentyafterfour: stopped phd and apache on iridium
  • 00:07 twentyafterfour: stopped ssh-phab on iridium
  • 00:03 mutante: phab1001 - /usr/bin/rsync -av rsync://iridium.eqiad.wmnet/srv-repos /srv/repos/
  • 00:02 twentyafterfour: Taking phabricator down for maintenance / migration to a new server: phab1001

2017-08-03

  • 23:43 ebernhardson@tin: Synchronized php-1.30.0-wmf.12/extensions/WikimediaEvents/modules/ext.wikimediaEvents.searchSatisfaction.js: T171212: Turn on CirrusSearch MLR AB test (duration: 00m 46s)
  • 23:36 maxsem@tin: Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/370117/2 (duration: 00m 47s)
  • 23:14 ebernhardson@tin: Synchronized php-1.30.0-wmf.12/extensions/CodeMirror/resources/ext.CodeMirror.js: Only show popup if CodeMirror button exists (duration: 00m 46s)
  • 23:12 ebernhardson@tin: Synchronized php-1.30.0-wmf.12/extensions/WikimediaEvents/extension.json: (no justification provided) (duration: 00m 47s)
  • 23:08 ebernhardson@tin: Synchronized wmf-config/CirrusSearch-production.php: CirrusSearch config for MLR test (step 2) (duration: 00m 46s)
  • 23:07 ebernhardson@tin: Synchronized wmf-config/InitialiseSettings.php: CirrusSearch config for MLR test (step 1) (duration: 00m 47s)
  • 22:06 ejegg: restarted donation and refund queue consumers
  • 21:51 maxsem@tin: Synchronized wmf-config/CommonSettings-labs.php: https://gerrit.wikimedia.org/r/#/c/370096/ (duration: 00m 47s)
  • 21:51 ejegg: enabled ingenico_audit drupal module
  • 21:45 ejegg: updated civicrm from 5c741b1 to f1fd7f0
  • 21:43 ejegg: disabled donation and refund queue consumers
  • 21:31 chasemp: clear out nova-fullstack project so we can monitor fresh on new puppetmaster
  • 21:30 maxsem@tin: Synchronized wmf-config/InitialiseSettings.php: Enable CodeMirror https://gerrit.wikimedia.org/r/#/c/370072/ (duration: 00m 47s)
  • 21:20 maxsem@tin: Synchronized php-1.30.0-wmf.12/extensions/CodeMirror: https://gerrit.wikimedia.org/r/#/c/370066/ https://gerrit.wikimedia.org/r/#/c/370074/ (duration: 00m 47s)
  • 21:04 herron: copper /var/cache/pbuilder 95% full - grew /dev/copper-vg/pbuilder fs by +5G and tune2fs -m 0. now at 85% full
  • 20:40 milimetric@tin: Finished deploy [analytics/refinery@cc40bf2]: Fix sqoop script with updated scap config (duration: 11m 51s)
  • 20:29 twentyafterfour@tin: rebuilt wikiversions.php and synchronized wikiversions files: all wikis except wikidata to 1.30.0-wmf.12, leaving wikidata behind due to T172320
  • 20:28 milimetric@tin: Started deploy [analytics/refinery@cc40bf2]: Fix sqoop script with updated scap config
  • 20:24 milimetric@tin: Finished deploy [analytics/refinery@cc40bf2]: Fix sqoop script (duration: 33m 17s)
  • 19:51 milimetric@tin: Started deploy [analytics/refinery@cc40bf2]: Fix sqoop script
  • 19:21 twentyafterfour@tin: Synchronized php-1.30.0-wmf.12/extensions/Wikidata/extensions/Wikibase/client/includes/Changes: be sure https://gerrit.wikimedia.org/r/#/c/369847/ is sync'd (duration: 00m 46s)
  • 19:12 twentyafterfour@tin: rebuilt wikiversions.php and synchronized wikiversions files: group1 wikis to 1.30.0-wmf.12 refs T168053
  • 19:08 twentyafterfour: deploying 1.30.0-wmf.12 to group1, will proceed to group2 after verifying the branch is stable.
  • 18:59 arlolra: Updated Parsoid to 6e1a20d5 (T168765, T155038, T165977)
  • 18:53 demon@tin: Synchronized wmf-config/InitialiseSettings.php: Final gerrit 369960: "pagePreviews: Deploy to first 50 of stage 1 wikis" (duration: 00m 46s)
  • 18:52 arlolra@tin: Finished deploy [parsoid/deploy@48be65b]: Updating Parsoid to 6e1a20d5 (duration: 06m 25s)
  • 18:51 demon@tin: Synchronized wmf-config/CommonSettings.php: new dblist for gerrit 369960 (duration: 00m 46s)
  • 18:46 arlolra@tin: Started deploy [parsoid/deploy@48be65b]: Updating Parsoid to 6e1a20d5
  • 18:41 demon@tin: Synchronized docroot/: gerrit 369960 (duration: 00m 47s)
  • 18:39 demon@tin: Synchronized dblists/: gerrit 369960 (duration: 00m 47s)
  • 18:29 demon@tin: Synchronized wmf-config/InitialiseSettings.php: last part of gerrit 351287 (duration: 00m 47s)
  • 18:23 mobrovac@tin: Finished deploy [restbase/deploy@65af18d]: Expose the recommendation API publicly and activate hiwikiversity - T170877 T168765 (duration: 08m 33s)
  • 18:17 demon@tin: Synchronized wmf-config/CommonSettings.php: Add new dblist for page preview stuff, basically no-op (duration: 00m 47s)
  • 18:15 demon@tin: Synchronized docroot/noc/conf/pp_stage0.dblist: tidy up pp config (duration: 00m 46s)
  • 18:15 mobrovac@tin: Started deploy [restbase/deploy@65af18d]: Expose the recommendation API publicly and activate hiwikiversity - T170877 T168765
  • 18:14 demon@tin: Synchronized dblists/pp_stage0.dblist: clean up config (duration: 00m 47s)
  • 18:11 mobrovac@tin: Finished deploy [restbase/deploy@65af18d] (staging): (no justification provided) (duration: 01m 57s)
  • 18:09 mobrovac@tin: Started deploy [restbase/deploy@65af18d] (staging): (no justification provided)
  • 18:03 arlolra: Updated Parsoid to 651f12c2 (T119802, T73386, T170289)
  • 17:55 andrewbogott: restarting rabbitmq-server on labcontrol1001
  • 17:50 arlolra@tin: Finished deploy [parsoid/deploy@612e711]: Updating Parsoid to 651f12c2 (duration: 11m 17s)
  • 17:39 arlolra@tin: Started deploy [parsoid/deploy@612e711]: Updating Parsoid to 651f12c2
  • 16:15 jynus: repointing dbproxy1008 back to db1043
  • 16:06 demon@tin: Finished deploy [gerrit/gerrit@15f1544]: This is a test, disregard (duration: 00m 03s)
  • 16:05 demon@tin: Started deploy [gerrit/gerrit@15f1544]: This is a test, disregard
  • 16:03 demon@tin: Synchronized wmf-config/: doc fixes, no-op (duration: 00m 48s)
  • 16:02 demon@tin: Synchronized scap/plugins/prep.py: doc fixes, no-op (duration: 00m 47s)
  • 16:00 godog: silence cassandra-related alerts on restbase-dev cluster, known OOMs
  • 15:44 urandom: T172384: lower tombstone failure threshold in RESTBase dev to 1000
  • 15:39 reedy@tin: Synchronized wmf-config/interwiki.php: Update for 3 new wikis (duration: 00m 46s)
  • 15:31 reedy@tin: rebuilt wikiversions.php and synchronized wikiversions files: techconductwiki
  • 15:27 reedy@tin: Synchronized wmf-config/InitialiseSettings.php: techconductwiki (duration: 00m 46s)
  • 15:25 reedy@tin: Synchronized static/images/project-logos/: techconductwiki (duration: 00m 44s)
  • 15:24 reedy@tin: Synchronized dblists/: techconductwiki (duration: 00m 47s)
  • 15:03 gehel@puppetmaster1001: conftool action : set/pooled=yes; selector: name=elastic1028.eqiad.wmnet
  • 15:02 gehel: unbanning and repooling elastic1028 - T168816
  • 14:58 reedy@tin: Synchronized static/images/project-logos/: hiwikiversity (duration: 00m 46s)
  • 14:57 reedy@tin: Synchronized wmf-config/InitialiseSettings.php: hiwikiversity (duration: 00m 48s)
  • 14:55 reedy@tin: rebuilt wikiversions.php and synchronized wikiversions files: hiwikiversity
  • 14:55 reedy@tin: Synchronized dblists: hiwikiversity (duration: 00m 49s)
  • 14:53 gehel@puppetmaster1001: conftool action : set/pooled=yes; selector: name=elastic10(29|30|31|32).eqiad.wmnet
  • 14:53 Dereckson: mwscript initSiteStats.php --wiki srwikiquote --update (T172241)
  • 14:52 gehel: unbanning and repooling elastic10(29|30|31|32) - T168816
  • 14:29 reedy@tin: Synchronized wmf-config/InitialiseSettings.php: wikimania2018wiki (duration: 00m 47s)
  • 14:28 reedy@tin: rebuilt wikiversions.php and synchronized wikiversions files: (no justification provided)
  • 14:27 reedy@tin: Synchronized dblists: wikimania2018wiki (duration: 00m 47s)
  • 14:21 marostegui: Restart MySQL on db2019 to get the new socket location updated
  • 14:19 marostegui@tin: Synchronized wmf-config/db-codfw.php: Repool db2045 - T170662 (duration: 00m 46s)
  • 14:06 marostegui: Compress s5 on db2075 - T170662
  • 14:02 gehel@puppetmaster1001: conftool action : set/pooled=no; selector: name=elastic10(28|29|30|31|32).eqiad.wmnet
  • 14:01 gehel: depooling and shutting down elastic10(28|29|30|31|32) - T168816
  • 14:00 marostegui: Add db2075 to tendril - T170662
  • 13:44 marostegui: Enable gtid back on codfw s4 slaves - T170351
  • 13:35 marostegui@tin: Synchronized wmf-config/db-codfw.php: Promote db2051 as s4 codfw master - T170351 (duration: 00m 46s)
  • 13:28 elukey: drop CookieBlock backup tables for T171883
  • 13:26 marostegui: Starting the actual s4 codfw failover db2019 -> db2051 - T170351
  • 13:22 gehel: banning elastic1032 - T168816
  • 13:19 chasemp: disabling puppet across wmcs things for testing
  • 13:14 marostegui: Start topology change for s4 in codfw, slaves will be moved under db2051 - T170351
  • 13:10 aude@tin: Synchronized php-1.30.0-wmf.12/extensions/Wikidata: Fix bug in InjectRCRecordsJob (duration: 02m 12s)
  • 13:01 marostegui: Disable gtid on s4 codfw slaves to get ready for the topology change - T170351
  • 12:15 marostegui: Restart MySQL on db2051 - T170351
  • 10:21 elukey: removed /run/pacct_shadow.d on stat1005 to allow atopacct.service restart
  • 10:01 moritzm: installing poppler security updates on trusty
  • 09:57 _joe_: systemctl reset-failed puppetmaster.service on labpuppetmaster1002
  • 09:52 moritzm: installing mysql 5.5 security updates (package as shipped by Debian jessie, not our internal wmf-mariadb package)
  • 09:32 gehel: banning elastic103[01] - T168816
  • 08:56 ema: upgrading cache_upload to varnish 4.1.8-1wm1
  • 08:36 gehel: banning elastic102[89] - T168816
  • 08:35 marostegui: Stop MySQL on db2045 to copy its data to db2075 - T170662
  • 08:34 marostegui@tin: Synchronized wmf-config/db-codfw.php: Depool db2045 - T170662 (duration: 00m 47s)
  • 08:33 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db2045 - T170662 (duration: 00m 47s)
  • 07:58 marostegui: Add db2073 and db2074 to tendril - T170662
  • 06:19 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Add db2074 to s3 - T170662 (duration: 00m 46s)
  • 06:18 marostegui@tin: Synchronized wmf-config/db-codfw.php: Add db2074 to s3 - T170662 (duration: 00m 46s)
  • 06:01 marostegui: Compress s7 on db1102 - T172169
  • 05:57 marostegui@tin: Synchronized wmf-config/db-codfw.php: Repool db2051 - T170351 (duration: 00m 54s)
  • 05:27 marostegui: Deploy alter table on enwiki - labsdb1010 - T166204
  • 05:17 marostegui: Stop replication on labsdb1011 for maintenance - T153743
  • 03:09 l10nupdate@tin: ResourceLoader cache refresh completed at Thu Aug 3 03:09:22 UTC 2017 (duration 7m 18s)
  • 03:02 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.12) (duration: 05m 47s)
  • 02:33 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.11) (duration: 07m 53s)
  • 01:40 twentyafterfour: repositories synced for phabricator migration.
  • 00:28 reedy@tin: Synchronized wmf-config/InitialiseSettings-labs.php: Simply ores (duration: 00m 47s)
  • 00:26 twentyafterfour: scheduled downtime for phabricator migration
  • 00:12 mutante: phab1001 starting ferm service
  • 00:11 mutante: iridium restarted ferm - rsync fragment was in config but not applied, breaking data rsync to phab1001
  • 00:05 mutante: rsyncing /srv/repos from iridium to phab1001 (T163938)

2017-08-02

  • 23:53 ebernhardson@tin: Finished scap: dblists/rtl.dblist T172305: Adding RTL database list for project with default RTL languages (duration: 36m 21s)
  • 23:17 ebernhardson@tin: Started scap: dblists/rtl.dblist T172305: Adding RTL database list for project with default RTL languages
  • 21:48 ejegg: updated SmashPig from f4ca53c to c501f53
  • 20:24 bsitzmann@tin: Finished deploy [mobileapps/deploy@3b61ced]: Update mobileapps to 2d8e8f6 (T170325) (duration: 05m 38s)
  • 20:19 bsitzmann@tin: Started deploy [mobileapps/deploy@3b61ced]: Update mobileapps to 2d8e8f6 (T170325)
  • 19:50 twentyafterfour@tin: rebuilt wikiversions.php and synchronized wikiversions files: group1 wikis to 1.30.0-wmf.11 refs T168053 - rollback due to T172320
  • 19:24 twentyafterfour@tin: rebuilt wikiversions.php and synchronized wikiversions files: group1 wikis to 1.30.0-wmf.12 refs T168053
  • 19:23 twentyafterfour: group1 wikis to 1.30.0-wmf.12 refs T168053
  • 18:41 thcipriani@tin: Synchronized php-1.30.0-wmf.12/includes/specials/SpecialRecentchanges.php: SWAT: Follow-up 31be7d0: send tags list if experimental mode is disabled (duration: 00m 47s)
  • 18:21 thcipriani@tin: Synchronized wmf-config/InitialiseSettings-labs.php: SWAT: beta-only change Enable HTML5 sections in betalabs (duration: 00m 46s)
  • 18:20 gehel@puppetmaster1001: conftool action : set/pooled=yes; selector: name=elastic10(24|25|26|27).eqiad.wmnet
  • 18:18 thcipriani@tin: Synchronized php-1.30.0-wmf.12/includes/specials/SpecialUndelete.php: SWAT: Fix Special:Undelete search - use variable and not request param (duration: 00m 46s)
  • 18:18 gehel: un-banning and repooling elastic102[4567] - T168816
  • 18:13 thcipriani@tin: Synchronized wmf-config/jobqueue.php: SWAT: JobQueueEventBus: Enable job events in group0 wikis T163380 Part II (duration: 00m 47s)
  • 18:11 thcipriani@tin: Synchronized wmf-config/InitialiseSettings.php: SWAT: JobQueueEventBus: Enable job events in group0 wikis T163380 Part I (duration: 00m 47s)
  • 17:56 ottomata: restart kafka1012 broker with listeners=PLAINTEXT://:9092 to verify https://gerrit.wikimedia.org/r/#/c/356232/ before merge. This should be a functional no-op
  • 17:44 mobrovac@tin: Finished deploy [cxserver/deploy@f43ef96]: Revert canary scb2001 to f43ef963, take #2 (duration: 00m 16s)
  • 17:44 mobrovac@tin: Started deploy [cxserver/deploy@f43ef96]: Revert canary scb2001 to f43ef963, take #2
  • 17:44 mobrovac@tin: Finished deploy [cxserver/deploy@f43ef96]: Revert canary scb2001 to f43ef963 (duration: 00m 29s)
  • 17:43 mobrovac@tin: Started deploy [cxserver/deploy@f43ef96]: Revert canary scb2001 to f43ef963
  • 17:26 mobrovac@tin: Finished deploy [recommendation-api/deploy@8dfae34]: (no justification provided) (duration: 01m 36s)
  • 17:25 mobrovac@tin: Started deploy [recommendation-api/deploy@8dfae34]: (no justification provided)
  • 17:18 herron: installed libmail-spf-perl and restarted spamassassin service on mx[1,2]001 - T172299
  • 16:45 gehel@puppetmaster1001: conftool action : set/pooled=no; selector: name=elastic10(24|25|26|27).eqiad.wmnet
  • 16:43 gehel: depooling and shutting down elastic102[4567] - T168816
  • 15:34 mobrovac@tin: Finished deploy [cxserver/deploy@cf3e280]: (no justification provided) (duration: 00m 22s)
  • 15:34 mobrovac@tin: Started deploy [cxserver/deploy@cf3e280]: (no justification provided)
  • 15:18 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Change db2051 IP - T169501 (duration: 00m 46s)
  • 15:17 marostegui@tin: Synchronized wmf-config/db-codfw.php: Change db2051 IP - T169501 (duration: 00m 46s)
  • 15:17 nschaaf@tin: Finished deploy [recommendation-api/deploy@baa11c0]: source parameter validation (duration: 03m 22s)
  • 15:14 nschaaf@tin: Started deploy [recommendation-api/deploy@baa11c0]: source parameter validation
  • 15:06 marostegui: Power off db2051 to get it moved to another rack - T169501
  • 14:46 ema: upgrading cache_misc to varnish 4.1.8-1wm1
  • 14:31 ema: varnish 4.1.8-1wm1 (fixes VSV00001, DSA 3924-1) built and uploaded to apt.w.o
  • 14:12 marostegui: Stop MySQL on db2051 in order to get it ready to move to another rack - T170351
  • 13:16 hashar@tin: Synchronized wmf-config/Wikibase-production.php: Write to term_full_entity_id column in wb_terms table in prod too - T167229 (duration: 00m 46s)
  • 13:07 hashar@tin: Synchronized php-1.30.0-wmf.12/extensions/Wikidata: fix constraint type checks - T169326 (duration: 02m 13s)
  • 12:59 gehel: banning elastic102[67] - T168816
  • 12:41 gehel: banning elastic102[45] - T168816
  • 11:42 jynus: disabling autolearn on newest db1* hosts
  • 11:39 jynus: disabling autolearn on newest db2* hosts
  • 11:33 marostegui: Stop MySQL on db2051 for maintenance - T170351
  • 11:29 jynus: setting all es hosts with disabled auto-learn bbu properties
  • 11:22 kartik@tin: (no justification provided)
  • 11:21 kartik@tin: (no justification provided)
  • 11:19 marostegui@tin: Synchronized wmf-config/db-codfw.php: Depool db2051 - T170351 (duration: 00m 46s)
  • 11:18 kartik@tin: Finished deploy [cxserver/deploy@cf3e280]: Update cxserver to fe03ad7 (duration: 00m 56s)
  • 11:17 kartik@tin: Started deploy [cxserver/deploy@cf3e280]: Update cxserver to fe03ad7
  • 11:12 jynus@tin: Synchronized wmf-config/db-codfw.php: Depool es2013 (duration: 00m 47s)
  • 11:02 hashar: Upgraded tox on CI to 2.5.0
  • 10:49 marostegui: Force a re-learn cycle on es2013
  • 10:18 marostegui: Rename wikigrok tables on enwiki on db1089 - T172020
  • 09:03 marostegui: Drop table click_tracking_user_properties wherever it exists - T115982
  • 09:02 marostegui@tin: Synchronized wmf-config/db-codfw.php: Repool db2057 - T170662 (duration: 00m 46s)
  • 08:51 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Add db2073 to s4 - T170662 (duration: 00m 47s)
  • 08:50 marostegui@tin: Synchronized wmf-config/db-codfw.php: Add db2073 to s4 - T170662 (duration: 00m 57s)
  • 08:32 marostegui: Drop table click_tracking wherever it exists - T115982
  • 08:15 godog: bounce pdfrender on scb1001, stuck
  • 07:09 marostegui: Disable BBU auto-learn on es2013
  • 06:58 marostegui: Stop MySQL on db2057 for maintenance - T148507
  • 06:50 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1055 - T166204 (duration: 00m 45s)
  • 05:47 demon@tin: Synchronized dblists/: No-op (duration: 00m 47s)
  • 02:59 l10nupdate@tin: ResourceLoader cache refresh completed at Wed Aug 2 02:59:33 UTC 2017 (duration 7m 3s)
  • 02:52 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.12) (duration: 05m 54s)
  • 02:36 eileen: update process-control to 3d3978a
  • 02:27 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.11) (duration: 07m 35s)

2017-08-01

  • 23:47 reedy@tin: Synchronized wmf-config/CommonSettings.php: Make Babel use databasey stuff T145366 (duration: 00m 46s)
  • 23:46 eileen: update process control to 33a0262d87aa1512888c82ac6b4ed91de3d3cac5
  • 23:44 reedy@tin: Synchronized wmf-config/CommonSettings.php: wfLoadExtension Scribunto (duration: 00m 46s)
  • 23:43 reedy@tin: Synchronized wmf-config/extension-list: json! (duration: 00m 46s)
  • 23:41 reedy@tin: Synchronized wmf-config/CommonSettings.php: Unbreak ContactPage T172199 (duration: 00m 46s)
  • 23:39 reedy@tin: Synchronized wmf-config/MetaContactPages.php: Unbreak ContactPage T172199 (duration: 00m 46s)
  • 23:36 reedy@tin: Synchronized wmf-config/InitialiseSettings.php: T167071 (duration: 00m 47s)
  • 23:30 reedy@tin: Synchronized wmf-config/InitialiseSettings.php: wikinews - T172211 (duration: 00m 46s)
  • 23:26 reedy@tin: Synchronized php-1.30.0-wmf.12/resources/src/mediawiki.rcfilters/mw.rcfilters.Controller.js: T172156 T171514 (duration: 00m 46s)
  • 23:24 reedy@tin: Synchronized static/images/project-logos/: optimise pngs (duration: 00m 46s)
  • 23:23 eileen: update process_control to ecb6669
  • 23:23 reedy@tin: Synchronized static/apple-touch/: optimise pngs (duration: 00m 46s)
  • 23:21 reedy@tin: Synchronized docroot/noc/css/images/: optimise pngs (duration: 00m 47s)
  • 23:13 reedy@tin: Synchronized wmf-config/InitialiseSettings.php: Disable popups on Special pages T170893 (duration: 00m 47s)
  • 23:09 reedy@tin: Synchronized wmf-config/InitialiseSettings.php: New wordmarks T168203 T171769 (duration: 00m 47s)
  • 23:08 reedy@tin: Synchronized static/images/mobile/copyright/: 2 new wordmarks (duration: 00m 48s)
  • 22:57 chasemp: push procs off swap for labcontrol1001 w/ swapoff -a & swapon -a
  • 22:20 maxsem@tin: Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/369434/3 (duration: 00m 47s)
  • 22:20 eileen: tools updated from 58bcbf3 to 9554229
  • 22:17 eileen: update process_control to c198b7f
  • 21:48 demon@tin: Started deploy [gerrit/gerrit@15f1544]: Initial deploy x2
  • 21:47 demon@tin: Finished deploy [gerrit/gerrit@15f1544]: Initial deploy (duration: 00m 40s)
  • 21:47 demon@tin: Started deploy [gerrit/gerrit@15f1544]: Initial deploy
  • 21:46 demon@tin: Started deploy [gerrit/gerrit@15f1544]: (no justification provided)
  • 21:45 demon@tin: Finished deploy [gerrit/gerrit@15f1544]: (no justification provided) (duration: 00m 06s)
  • 21:45 demon@tin: Started deploy [gerrit/gerrit@15f1544]: (no justification provided)
  • 21:44 demon@tin: Started deploy [gerrit/gerrit@15f1544]: (no justification provided)
  • 20:40 twentyafterfour@tin: rebuilt wikiversions.php and synchronized wikiversions files: group0 wikis to 1.30.0-wmf.12 refs T168053
  • 19:33 twentyafterfour@tin: Finished scap: Sync 1.30.0-wmf.12 and build l10n refs T168053 (duration: 29m 58s)
  • 19:19 urandom: restarting Cassandra, restbase1004-a.eqiad.wmnet, aberrant read latency
  • 19:03 twentyafterfour@tin: Started scap: Sync 1.30.0-wmf.12 and build l10n refs T168053
  • 18:49 twentyafterfour@tin: Finished deploy [phabricator/deployment@3d728e1]: testing phab1001 deployment (duration: 00m 02s)
  • 18:49 twentyafterfour@tin: Started deploy [phabricator/deployment@3d728e1]: testing phab1001 deployment
  • 18:47 twentyafterfour@tin: Finished deploy [phabricator/deployment@3d728e1]: testing phab1001 deployment (duration: 00m 16s)
  • 18:47 twentyafterfour@tin: Started deploy [phabricator/deployment@3d728e1]: testing phab1001 deployment
  • 18:43 twentyafterfour@tin: Finished deploy [phabricator/deployment@3d728e1]: (no justification provided) (duration: 00m 48s)
  • 18:42 twentyafterfour@tin: Started deploy [phabricator/deployment@3d728e1]: (no justification provided)
  • 17:23 twentyafterfour: MediaWiki train for 1.30.0-wmf.12 - finished `scap prep` & `scap patch` refs T168053
  • 16:41 ejegg: updated CiviCRM from 23f2bbf to 5c741b1
  • 16:25 twentyafterfour: MediaWiki Train: Creating new branch wmf/1.30.0-wmf.12 from master. See T168053 for deployment blockers.
  • 16:17 dcausse: restarting elastic on relforge100x servers to pick up new version of the plugins
  • 15:54 bblack: varnish backend restart on cp1072 (mailbox lag)
  • 15:43 bblack: rebooting lvs1002
  • 15:42 marostegui: db1069: Migrate trwiktionary.page from TokuDB to InnoDB
  • 15:40 bblack: stopping pybal on lvs1002 for impending reboot
  • 15:39 bblack: stopping pybal on 1002 for impending reboot
  • 15:33 marostegui: Stop s3 on db1069 - replication stuck
  • 15:17 marostegui: Stop MySQL on db1055 for maintenance - https://phabricator.wikimedia.org/T148507
  • 14:56 andrewbogott: rebooting labvirt1016
  • 14:46 marostegui: Deploy InnoDB compression on s3 - db2074 for the following tables (revision, pagelinks and templatelinks) - T170662
  • 14:45 ema: lvs1001-1003 (eqiad primaries): upgrade to pybal 1.13.11 - one-packet-scheduling, instrumentation fixes. T104442, T103882
  • 14:28 ema: lvs1004-1006 (eqiad secondaries): upgrade to pybal 1.13.11 - one-packet-scheduling, instrumentation fixes. T104442, T103882
  • 14:27 bblack: restart varnish backend on cp1049 (mailbox lag)
  • 14:12 bblack: restart varnish backend on cp1074 (mailbox lag)
  • 13:53 ema@neodymium: conftool action : set/pooled=yes; selector: name=achernar.wikimedia.org,service=pdns_recursor
  • 13:26 ema@neodymium: conftool action : set/pooled=no; selector: name=achernar.wikimedia.org,service=pdns_recursor
  • 13:10 hashar@tin: Synchronized wmf-config/InitialiseSettings.php: Enable OOjs UI EditPage on all wikis except Commons (duration: 00m 44s)
  • 13:06 marostegui: Compress s2 on db1102 - T172169
  • 12:49 elukey: restart hive daemons on analytics1003 to pick up new jvm settings (bigger Xmx, JMX ports)
  • 12:06 dcausse: 100% cpu spike on elastic1023 caused percentiles to jump for a short period of time (T169498)
  • 12:04 elukey: stop eventlogging_sync on analytics-slaves && rename all CookieBlock* tables (log db) to CookieBlock*_backup - T171883
  • 11:52 marostegui: Stop MySQL on db2057 to copy its data to db2074 - T170662
  • 11:36 marostegui@tin: Synchronized wmf-config/db-codfw.php: Depool db2057 - T170662 (duration: 00m 43s)
  • 11:10 marostegui@tin: Synchronized wmf-config/db-codfw.php: Repool db2065 - T170662 (duration: 00m 43s)
  • 10:31 hashar: Enabling Zuul/CI again and reenabling puppet on contint1001
  • 10:24 hashar: contint1001 stopped puppet agent to prevent the Zuul server from coming back up
  • 10:12 hashar: Stopped Zuul / CI for mass mediawiki extension changes
  • 08:55 ema: lvs2001-2003 (codfw primaries): upgrade to pybal 1.13.11 - one-packet-scheduling, instrumentation fixes. T104442, T103882
  • 08:32 ema: lvs2004-2006 (codfw secondaries): upgrade to pybal 1.13.11 - one-packet-scheduling, instrumentation fixes. T104442, T103882
  • 08:03 ema: lvs3*: upgrade to pybal 1.13.11 - one-packet-scheduling, instrumentation fixes. T104442, T103882
  • 07:40 ema: lvs4001, lvs4002 (ulsfo primaries): upgrade to pybal 1.13.11 - one-packet-scheduling, instrumentation fixes. T104442, T103882
  • 07:35 ema: lvs4003, lvs4004 (ulsfo secondaries): upgrade to pybal 1.13.11 - one-packet-scheduling, instrumentation fixes. T104442, T103882
  • 07:33 ema: pybal 1.13.11 uploaded to apt.w.o T103882
  • 06:20 marostegui: Stop MySQL on db2065 to copy its data to db2073 - T170662
  • 06:13 marostegui@tin: Synchronized wmf-config/db-codfw.php: Depool db2065 - T170662 (duration: 00m 43s)
  • 05:21 marostegui: Restart MySQL on labsdb1003 as it is totally stuck
  • 03:50 mobrovac@tin: Finished deploy [restbase/deploy@0d12138]: Add nl.wikinews - T171897 (duration: 07m 57s)
  • 03:42 mobrovac@tin: Started deploy [restbase/deploy@0d12138]: Add nl.wikinews - T171897
  • 02:32 l10nupdate@tin: ResourceLoader cache refresh completed at Tue Aug 1 02:32:17 UTC 2017 (duration 6m 36s)
  • 02:25 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.11) (duration: 07m 46s)

2017-07-31

  • 23:34 thcipriani@tin: Synchronized dblists: SWAT: Revert "Make ptwikimedia a fishbowl wiki" T171501 (duration: 00m 42s)
  • 23:32 thcipriani@tin: Synchronized wmf-config/InitialiseSettings.php: SWAT: Path for enwikiquote logo T171810 (duration: 00m 43s)
  • 23:17 thcipriani@tin: Synchronized dblists: SWAT: Make ptwikimedia a fishbowl wiki T171501 (duration: 00m 43s)
  • 23:11 thcipriani@tin: Synchronized wmf-config/Wikibase-production.php: SWAT: Cleanup old BC config for JsonUnitStorage T171107 (duration: 00m 42s)
  • 22:38 mobrovac@tin: Finished deploy [citoid/deploy@7ad598d]: Do not wait for PubMed requests to complete - T162886 (duration: 05m 52s)
  • 22:32 mobrovac@tin: Started deploy [citoid/deploy@7ad598d]: Do not wait for PubMed requests to complete - T162886
  • 20:49 mutante: restarting pdfrender service on scb1001 after icinga alert (T159922)
  • 20:33 cscott: Updated Parsoid to version 08114f35 (T43716, T154718, T166413)
  • 20:32 cscott@tin: Finished deploy [parsoid/deploy@c1cba48]: Updating Parsoid to 08114f35 (duration: 10m 50s)
  • 20:22 cscott@tin: Started deploy [parsoid/deploy@c1cba48]: Updating Parsoid to 08114f35
  • 19:43 ejegg: updated payments-wiki from 2d10807 to bd9f730
  • 19:29 ejegg: updated payments-wiki from c531e11 to 2d10807
  • 18:37 robh: we're migrating mr1-ulsfo, disregard mgmt icinga alerts
  • 18:30 ejegg: updated payments-wiki from 084d0f9 to c531e11
  • 18:19 thcipriani@tin: Synchronized php-1.30.0-wmf.11/extensions/OpenStackManager/special/SpecialNovaRole.php: SWAT: Do not clobber $out in local scope T172077 (duration: 00m 42s)
  • 18:14 thcipriani@tin: Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable archive search via Elastic everywhere except Wikidata T163235 (duration: 00m 42s)
  • 18:13 demon@tin: Pruned MediaWiki: 1.30.0-wmf.10 [keeping static files] (duration: 01m 18s)
  • 17:52 gehel@puppetmaster1001: conftool action : set/pooled=yes; selector: name=elastic10(23).eqiad.wmnet
  • 17:52 gehel: un-banning and repooling elastic1023 - T168816
  • 17:25 gehel@puppetmaster1001: conftool action : set/pooled=yes; selector: name=elastic10(20|21|22).eqiad.wmnet
  • 17:25 gehel: un-banning and repooling elastic102[012] - T168816
  • 17:12 gehel@tin: Finished deploy [wdqs/wdqs@bdf3494]: (no justification provided) (duration: 01m 48s)
  • 17:12 herron: scb1001 restarted pdfrender service - T159922
  • 17:10 gehel@tin: Started deploy [wdqs/wdqs@bdf3494]: (no justification provided)
  • 16:58 ejegg: updated Misc fundraising tools from 457bddb to 58bcbf3
  • 16:30 gehel: mistaken restart of elastic1030 as part of T168816
  • 16:15 gehel@puppetmaster1001: conftool action : set/pooled=no; selector: name=elastic10(20|21|22|23).eqiad.wmnet
  • 16:15 gehel: depooling and shutting down elastic102[0123] for thermal paste - T168816
  • 15:33 marostegui: Create index on u2041__ores_p.monthly_wp10_enwiki - T146718
  • 15:14 marostegui@tin: Synchronized wmf-config/db-codfw.php: Specify mariadb running version on db2072 (duration: 00m 43s)
  • 14:59 gehel: banning elastic10(22|23) - T168816
  • 14:37 zeljkof: EU SWAT finished
  • 14:37 gehel@puppetmaster1001: conftool action : set/pooled=yes; selector: name=elastic10(17|18|19).eqiad.wmnet
  • 14:36 gehel: banning and repooling elastic10(20|21) - T168816
  • 14:36 gehel: un-banning and repooling elastic10(17|18|19) - T168816
  • 14:34 zfilipin@tin: Synchronized wmf-config/Wikibase-production.php: SWAT: Add P279 to $wgPropertySuggesterClassifyingPropertyIds (T169060) (duration: 00m 42s)
  • 14:20 zfilipin@tin: Synchronized wmf-config/InitialiseSettings.php: SWAT: Remove wm?gRevisionSliderBetaFeature (duration: 00m 42s)
  • 14:19 zfilipin@tin: Synchronized wmf-config/CommonSettings.php: SWAT: Remove wm?gRevisionSliderBetaFeature (duration: 00m 42s)
  • 14:14 zfilipin@tin: Synchronized wmf-config/Wikibase-production.php: SWAT: Turn on reading from the term_full_entity_id in testwikidata (T165197) (duration: 00m 42s)
  • 14:08 gehel@puppetmaster1001: conftool action : set/pooled=no; selector: name=elastic10(17|18|19).eqiad.wmnet
  • 14:08 gehel: shutting down elastic10(17|18|19) for thermal paste - T168816
  • 14:04 zfilipin@tin: Synchronized wmf-config/InitialiseSettings.php: SWAT: Remove wgRevisionSliderAlternateSlider (duration: 00m 42s)
  • 13:57 zfilipin@tin: Synchronized wmf-config/CommonSettings.php: SWAT: Remove Wikibase vs Interwikisorting checks (T150183) (duration: 00m 43s)
  • 13:47 zfilipin@tin: Synchronized wmf-config/InitialiseSettings.php: SWAT: Remove WMDE log Channel (T168635) (duration: 00m 43s)
  • 13:30 chasemp: disable puppet for cloud-y things
  • 12:56 gehel: un-banning elastic1020 since it seems to have an impact on cluster performance - T168816
  • 12:39 gehel: banning elastic10(17|18|19|20) to prepare for thermal paste - T168816
  • 11:12 marostegui: Compress s6 on db1102 - T153743
  • 09:28 marostegui: Stop replication on labsdb1009 and labsdb1010 for maintenance - T153743
  • 08:55 elukey: update nodejs* on aqs100[56789] to 6.11 - T170790
  • 08:45 marostegui: Rename table click_tracking and click_tracking_user_properties on db1089 (s1) - T115982
  • 08:35 marostegui: Drop table old_growth on s1 - T115982
  • 07:17 marostegui: Stop replication on s7 on db1102 for maintenance - T153743
  • 07:12 marostegui: Deploy alter table on db1055 - T166204
  • 07:07 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1055 - T166204 (duration: 00m 52s)
  • 02:31 l10nupdate@tin: ResourceLoader cache refresh completed at Mon Jul 31 02:31:39 UTC 2017 (duration 6m 40s)
  • 02:24 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.11) (duration: 07m 41s)
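
Note on the conftool selectors logged above (e.g. name=elastic10(17|18|19).eqiad.wmnet): assuming the selector value is matched as a regular expression against the full host name, that pattern would cover exactly three hosts. A minimal sketch of that matching, using Python's standard re module rather than conftool itself; the candidate host list is hypothetical and only for illustration:

```python
import re

# Selector value copied from the log entries.
# Treating it as a regular expression is an assumption, not confirmed by the log.
SELECTOR = r"elastic10(17|18|19).eqiad.wmnet"

# Hypothetical candidate hosts (elastic1017 .. elastic1021), for illustration only.
candidates = [f"elastic10{n}.eqiad.wmnet" for n in range(17, 22)]


def matching_hosts(pattern, hosts):
    """Return the hosts whose full name matches the pattern."""
    compiled = re.compile(pattern)
    return [host for host in hosts if compiled.fullmatch(host)]


selected = matching_hosts(SELECTOR, candidates)
print(selected)
```

Under that assumption, elastic1020 and elastic1021 fall outside the pattern, which is consistent with them being handled in a separate entry.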

2017-07-29

  • 06:15 _joe_: restarting pdfrender on scb1003

2017-07-28

  • 23:32 mutante: puppetmaster2001 - git pulled in /var/lib/git/operations/puppet to sync with puppetmaster1001 - accidentally interrupted puppet-merge
  • 20:21 foks: removing 2FA from User:SPoore (WMF)
  • 19:40 mutante: releases2001 - OS install worked this time, could not reproduce grub error, signing puppet cert, initial puppet run (T171917)
  • 18:48 chasemp: enable and force puppet on labtestservices2001,labtestvirt2001,labtestcontrol2001,labservices1002,labcontrol1002,labnet1002,labvirt1014 and labtestneutron2001 to see a newly installed host get the change instead of a noop
  • 18:33 chasemp: disabling puppet for labs things for trying out refactor rollout
  • 17:18 herron: cleaned up core files in mw1209:/var/tmp/core to clear disk alert
  • 16:21 andrewbogott: apt-get install apache2 on labcontrol1001 and labcontrol1002 for security updates
  • 16:19 andrewbogott: apt-get install apache2 on silver for security updates
  • 16:18 andrewbogott: apt-get install apache2 on californium for security updates
  • 15:23 jynus: upgrading and restarting db1102
  • 14:11 paravoid: upgrading rhenium to stretch via dist-upgrade
  • 13:04 jynus: upgrading and restarting db1095
  • 10:31 jynus: upgrading and restarting labsdb1009 and labsdb1011
  • 09:41 elukey: re-enable irc-echo on einsteinium
  • 09:07 moritzm: installing apache security updates on puppet masters
  • 08:41 elukey: stop ircecho on einsteinium as puppet-error-shower countermeasure
  • 07:56 elukey: update nodejs to 6.11 on aqs1004 (testing prod node after beta qa) - T170790
  • 07:52 gehel: repooling wdqs1001 (data import completed)
  • 07:52 elukey: forced mii-tool -r eth0 on analytics1034 to get 1G negotiated speed
  • 07:52 gehel@puppetmaster1001: conftool action : set/pooled=yes; selector: name=wdqs1001.eqiad.wmnet
  • 06:35 moritzm: installing apache security updates on trusty systems
  • 02:26 mutante: scb1002 - systemctl restart pdfrender - was "connect to address 10.64.16.21 and port 5252: Connection refused" in Icinga for a couple of hours (T159922) - recovered
  • 02:08 ottomata: stat1002: disabled puppet, umounted /tmp, /home and /a, poweroff
  • 00:51 mutante: releases1001 - rsynced reprepro db data from bromine
  • 00:27 mutante: bromine sudo -E reprepro clearvanished to delete unused precise-mediawiki causing reprepro errors

2017-07-27

  • 23:48 catrope@tin: Synchronized php-1.30.0-wmf.11/resources/src/mediawiki.rcfilters/: T171368 (duration: 00m 42s)
  • 23:39 eileen: update process-control to 24c7bbe (re-enable omnirecipient)
  • 23:36 eileen: update process-control to 2c1c8a3bcb0186 - new frequency on recipient load
  • 23:35 catrope@tin: Synchronized wmf-config/: Enable emails for minor edits everywhere but keep default prefs (T29884, T142727) (duration: 00m 45s)
  • away: disabled Omnimail recipient load job
  • 23:14 catrope@tin: Finished scap: wmf-config/InitialiseSettings.php Enable experimental RCFilters on group2 too (duration: 02m 55s)
  • 23:11 catrope@tin: Started scap: wmf-config/InitialiseSettings.php Enable experimental RCFilters on group2 too
  • 22:46 ejegg: enabled omnimail recipient load job, throttling inserts to 15,000 every 60 sec
  • 21:46 ejegg: updated CiviCRM from ceff739 to 23f2bbf
  • 21:40 urandom: Restarting Cassandra, restbase-dev1004-{a,b} to apply updated data directories list
  • 21:08 herron: bast3002 repointed mdadm at null alias to clear systemd degraded state alert
  • 20:31 gwicke: restarting all pdfrender instances on scb in eqiad; one of them was hanging & causing user requests to fail
  • 20:24 Krinkle: Un-dirtying state of /srv/deployment/jobrunner/jobrunner on tin (from T129148). Checking-out https://gerrit.wikimedia.org/r/367743 instead.
  • 20:22 moritzm: installing apache security updates on cobalt
  • 20:13 ppchelko@tin: Finished deploy [restbase/deploy@cfb9c46]: Correctly escape the quotes in PDF names (duration: 07m 48s)
  • 20:05 ppchelko@tin: Started deploy [restbase/deploy@cfb9c46]: Correctly escape the quotes in PDF names
  • 19:50 thcipriani@tin: rebuilt wikiversions.php and synchronized wikiversions files: all wikis to 1.30.0-wmf.11
  • 19:43 mutante: switching https://releases.wikimedia.org backend from bromine to releases1001 - all files have been rsynced (T164030)
  • 19:37 robh: cp4021 shutting down for relocation in rack, will put in maint mode for next 2 hours
  • 19:21 catrope@tin: Synchronized wmf-config: T171556 (duration: 00m 47s)
  • 19:20 catrope@tin: Synchronized dblists: T171556 (duration: 00m 46s)
  • 19:11 catrope@tin: Synchronized wmf-config/InitialiseSettings.php: Enable namespace and tag filters in RCFilters on group0 and group1 (duration: 00m 46s)
  • 19:07 catrope@tin: Synchronized wmf-config/InitialiseSettings.php: T171751 (duration: 00m 46s)
  • 19:06 catrope@tin: Synchronized php-1.30.0-wmf.11/autoload.php: (no justification provided) (duration: 00m 45s)
  • 19:05 catrope@tin: Synchronized php-1.30.0-wmf.11/includes/: T168501, take two (duration: 01m 27s)
  • 19:05 mutante: added new misc::cache director "releases" for releases* servers, releases moving away from bromine (T164030)
  • 18:44 catrope@tin: Synchronized php-1.30.0-wmf.11/includes/: Temporarily revert patch for T168501 while I fix it (duration: 01m 27s)
  • 18:32 mutante: labpuppetmaster1001 - restarted ferm twice, DNS lookup for AAAA worked, error gone on second time. then did same on labpuppetmaster1002 (T171880)
  • 18:29 catrope@tin: Synchronized php-1.30.0-wmf.11/includes/: T168501 and T163380 (duration: 01m 31s)
  • 18:22 catrope@tin: Synchronized wmf-config/InitialiseSettings.php: Enable WikidataPageBanner on euwiki (T171763) (duration: 00m 46s)
  • 18:15 catrope@tin: Synchronized wmf-config/InitialiseSettings.php: Limit FeaturedFeed on dewiki to 7 days (T159664) (duration: 00m 47s)
  • 18:05 herron: installed libmail-spf-perl on fermium to address spamassassin "module not installed: Mail::SPF ('require' failed)" error
  • 17:05 ejegg: restarted recurring Ingenico charge job
  • 16:01 ejegg: updated CiviCRM from e83c012 to ceff739
  • 15:44 jynus: stopping mysql, upgrading and restarting labsdb1010
  • 15:03 moritzm: stopping jobrunner/jobchron on mw1260 to investigate a few failing ffmpeg2theora invocations (T145742)
  • 13:31 ema: lvs1009, lvs1010: upgrade to pybal 1.13.10 (one-packet-scheduling) T104442
  • 13:19 moritzm: installing apache updates on mendelevium and terbium
  • 13:08 hashar@tin: Synchronized wmf-config/InitialiseSettings.php: Enable mapframe for cswiki - T171588 (duration: 00m 47s)
  • 10:32 moritzm: reimaging mw2246 to jessie (T145742)
  • 10:31 jynus: shutting down and rebooting db2016
  • 10:14 _joe_: restarting puppetdb on nihal, T170740
  • 10:06 jynus@tin: Synchronized wmf-config/db-codfw.php: Promote db2048 as the new codfw-s1 master (duration: 00m 46s)
  • 10:05 jynus: starting actual master failover s1-codfw db2016->db2048
  • 09:48 ema: pybal 1.13.10 (one-packet-scheduling) built and uploaded to apt.w.o T104442
  • 09:23 jynus: starting s1-codfw database topology changes
  • 09:02 godog: copy python-conftool to stretch-wikimedia for scap dep
  • 08:54 jynus: stopping mysql and restarting db2048
  • 08:48 jynus@tin: Synchronized wmf-config/db-codfw.php: Depool db2048 (duration: 00m 46s)
  • 08:28 jynus: disable puppet on db2016 and db2048 to prepare for switchover
  • 08:21 godog: upload scap 3.6.0-1 - T127762
  • 08:12 moritzm: installing apache security updates on graphite*
  • 07:37 moritzm: upgrading apache on planet1001
  • 07:02 moritzm: installing spice security updates on trusty hosts (jessie already fixed)
  • 03:29 ejegg: disabled recurring Ingenico charges
  • 03:12 l10nupdate@tin: ResourceLoader cache refresh completed at Thu Jul 27 03:12:30 UTC 2017 (duration 7m 16s)
  • 03:05 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.11) (duration: 06m 00s)
  • 02:39 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.10) (duration: 14m 21s)

2017-07-26

  • 23:33 eileen: civicrm update from fb83798 to e83c012
  • 23:30 thcipriani@tin: Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable Echo per-user blacklist on meta T150419 (duration: 00m 49s)
  • 23:29 mutante: tin rm /var/lock/scap.operations_mediawiki-config.lock
  • 22:46 ejegg: reactivated remaining fundraising queue consumers
  • 22:38 ejegg: reactivated antifraud / payment-init queue consumer
  • 22:34 ejegg: updated CiviCRM from 461900e to fb83798
  • 21:39 andrewbogott: restarting rabbitmq on labcontrol1001
  • 21:10 mobrovac@tin: Finished deploy [electron-render/deploy@8dd5f13]: (no justification provided) (duration: 01m 37s)
  • 21:08 mobrovac@tin: Started deploy [electron-render/deploy@8dd5f13]: (no justification provided)
  • 21:03 mobrovac@tin: Finished deploy [electron-render/deploy@8dd5f13]: (no justification provided) (duration: 02m 38s)
  • 21:00 mobrovac@tin: Started deploy [electron-render/deploy@8dd5f13]: (no justification provided)
  • 20:54 mforns@tin: Finished deploy [analytics/refinery@58176d0]: deploying refinery to use 0.0.49 jars (duration: 03m 12s)
  • 20:51 mforns@tin: Started deploy [analytics/refinery@58176d0]: deploying refinery to use 0.0.49 jars
  • 20:32 mobrovac@tin: Finished deploy [mathoid/deploy@44ea6d8]: Switch node_modules to Node v6.11 (duration: 03m 16s)
  • 20:31 mobrovac@tin: Started deploy [electron-render/deploy@8dd5f13]: Switch node_modules to node v6.11
  • 20:30 mobrovac@tin: Finished deploy [recommendation-api/deploy@e7adea0]: Switch node_modules to node v6.11 (duration: 02m 26s)
  • 20:29 mobrovac@tin: Started deploy [mathoid/deploy@44ea6d8]: Switch node_modules to Node v6.11
  • 20:28 mobrovac@tin: Finished deploy [eventstreams/deploy@a2a0f19]: Switch node_modules to Node v6.11 (duration: 02m 57s)
  • 20:27 mobrovac@tin: Started deploy [recommendation-api/deploy@e7adea0]: Switch node_modules to node v6.11
  • 20:27 mobrovac@tin: Finished deploy [trending-edits/deploy@22967f3]: Switch node_modules to node v6.11 (duration: 07m 01s)
  • 20:25 mobrovac@tin: Started deploy [eventstreams/deploy@a2a0f19]: Switch node_modules to Node v6.11
  • 20:24 mobrovac@tin: Finished deploy [changeprop/deploy@444223d]: Switch node_modules to Node v6.11 (duration: 01m 35s)
  • 20:22 mobrovac@tin: Started deploy [changeprop/deploy@444223d]: Switch node_modules to Node v6.11
  • 20:22 mobrovac@tin: Finished deploy [mobileapps/deploy@bb81d91]: Switch node_modules to Node v6.11 (duration: 04m 08s)
  • 20:20 mobrovac@tin: Started deploy [trending-edits/deploy@22967f3]: Switch node_modules to node v6.11
  • 20:19 mobrovac@tin: Finished deploy [graphoid/deploy@1707b3c]: Switch node_modules to node v6.11 (duration: 07m 50s)
  • 20:18 mobrovac@tin: Started deploy [mobileapps/deploy@bb81d91]: Switch node_modules to Node v6.11
  • 20:12 demon@tin: Finished scap: no-op, ideal timing scenario (duration: 03m 35s)
  • 20:12 mobrovac@tin: Finished deploy [citoid/deploy@43c2776]: Switch node_modules to Node v6.11 (duration: 02m 56s)
  • 20:11 mobrovac@tin: Started deploy [graphoid/deploy@1707b3c]: Switch node_modules to node v6.11
  • 20:10 mobrovac@tin: Finished deploy [cxserver/deploy@f43ef96]: Switch node_modules to node v6.11 (duration: 02m 36s)
  • 20:09 demon@tin: Started scap: no-op, ideal timing scenario
  • 20:09 mobrovac@tin: Started deploy [citoid/deploy@43c2776]: Switch node_modules to Node v6.11
  • 20:08 mobrovac@tin: Started deploy [cxserver/deploy@f43ef96]: Switch node_modules to node v6.11
  • 20:01 demon@tin: Finished scap: group1 to wmf.11 (duration: 13m 22s)
  • 19:48 demon@tin: Started scap: group1 to wmf.11
  • 19:47 demon@tin: scap failed: RuntimeError scap failed: average error rate on 1/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/2cc7028226a539553178454fc2f14459 for details) (duration: 12m 03s)
  • 19:47 demon@tin: scap failed: average error rate on 1/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/2cc7028226a539553178454fc2f14459 for details)
  • 19:35 demon@tin: Started scap: group1 to wmf.11
  • 19:30 mutante: mx1001 - temp disable puppet to test adjusted sudo privileges for an icinga check
  • 19:24 ejegg: disabled queue consumers for CiviCRM update
  • 19:06 bblack: cp1074: run-no-puppet varnish-backend-restart (mailbox lag in icinga)
  • 19:01 gehel@puppetmaster1001: conftool action : set/pooled=no; selector: name=wdqs1001.eqiad.wmnet
  • 19:01 gehel@puppetmaster1001: conftool action : set/pooled=no; selector: name=wdqs1001.wmnet
  • 19:01 gehel: depooling wdqs1001 for data reload - T166244
  • 18:50 niharika29@tin: Synchronized php-1.30.0-wmf.11/resources/src/mediawiki.rcfilters/: RCFilters: Followup I78e23f85c3: Don't disable RCFilters system when fetching results https://gerrit.wikimedia.org/r/#/c/367850/ (duration: 00m 46s)
  • 18:35 niharika29@tin: Synchronized php-1.30.0-wmf.11/resources/src/mediawiki.rcfilters/: RCFilters: Improve loading animation https://gerrit.wikimedia.org/r/#/c/367833/, RCFilters UI: Unbreak limit and days widgets in non-experimental mode https://gerrit.wikimedia.org/r/#/c/367837/ (duration: 00m 45s)
  • 18:20 niharika29@tin: Synchronized wmf-config/InitialiseSettings.php: Create 'rollbacker' user group in frwiki https://gerrit.wikimedia.org/r/#/c/365538/ (duration: 00m 47s)
  • 17:46 bblack: nitrogen: disabled puppet agent, manually hacked puppetdb.service unit file, restarted puppetdb.service...
  • 17:12 moritzm: restarting gerrit to pick up Java security update
  • 17:11 moritzm: installing openjdk-8 security updates on cobalt and removing unused openjdk-7 packages
  • 16:55 jynus@tin: Synchronized wmf-config/db-codfw.php: Repool db2070 (duration: 00m 45s)
  • 16:27 jynus: upgrading and rebooting db2070
  • 16:22 jynus@tin: Synchronized wmf-config/db-codfw.php: Repool db2069, depool db2070 (duration: 00m 45s)
  • 16:14 moritzm: upgraded nodejs on restbase*
  • 15:48 jynus: upgrade and reboot db2069
  • 15:45 jynus@tin: Synchronized wmf-config/db-codfw.php: Repool db2068, depool db2069, pool db2072 with more weight (duration: 00m 46s)
  • 15:39 moritzm: rolling upgrade/service restarts of nodejs in eqiad
  • 15:32 andrewbogott: patching puppetmaster1001, possible puppet hiccups coming up
  • 15:29 moritzm: upgrade nodejs on remaining scb hosts (along with service restarts)
  • 15:20 moritzm: upgrade nodejs on scb2001 (currently depooled for testing)
  • 15:17 jynus: restarting and upgrading db2068
  • 15:15 jynus@tin: Synchronized wmf-config/db-codfw.php: Depool db2068 (duration: 00m 46s)
  • 14:40 moritzm: installing spice security updates
  • 14:21 jynus@tin: Synchronized wmf-config/db-codfw.php: Pool db2072 (duration: 00m 45s)
  • 13:44 gehel: restarting cassandra on maps clusters
  • 13:36 zeljkof: EU SWAT finished
  • 13:34 zfilipin@tin: Synchronized wmf-config/InitialiseSettings.php: SWAT: Revert eswikisource paths due to oversized logos (T170604) (duration: 00m 46s)
  • 13:24 zfilipin@tin: Synchronized wmf-config/InitialiseSettings.php: SWAT: HD logos for eswikivoyage and added some missing paths to the config (T170604) (duration: 00m 46s)
  • 13:22 zfilipin@tin: Synchronized static/images/project-logos/: SWAT: HD logos for eswikivoyage and added some missing paths to the config (T170604) (duration: 00m 54s)
  • 12:12 moritzm: installing xorg-server updates from jessie 8.9 point release
  • 11:02 hoo: Updated the Wikidata property suggester with data from Monday's JSON dump and applied the T132839 workarounds
  • 09:58 jmm@puppetmaster1001: conftool action : set/pooled=no; selector: mw2119.codfw.wmnet
  • 09:58 jmm@puppetmaster1001: conftool action : set/pooled=no; selector: mw2119.codfw.wmnet
  • 09:58 jmm@puppetmaster1001: conftool action : set/pooled=yes; selector: mw2119.codfw.wmnet
  • 09:57 moritzm: reimaging mw2152 to jessie (T145742)
  • 09:52 jynus@tin: Synchronized wmf-config/db-eqiad.php: Repool db1066 after maintenance (duration: 00m 46s)
  • 08:47 elukey: rollout logster 0.0.10-2~jessie1 to the cache hosts
  • 08:46 elukey: upload logster 0.0.10-2~jessie1 to jessie-wikimedia
  • 08:42 jynus: upgrading and restarting db1066
  • 08:26 jynus@tin: Synchronized wmf-config/db-eqiad.php: Depool db1066 for maintenance (duration: 00m 46s)
  • 08:26 moritzm: reimaging mw2119 to jessie (T145742)
  • 08:02 moritzm: installing Java security updates on jessie-based stat systems
  • 07:59 moritzm: restarting cassandra-metrics-collector on maps* to pick up openjdk security update
  • 07:56 moritzm: restarting cassandra-metrics-collector on restbase* to pick up openjdk security update
  • 07:53 jynus: start defragmenting on pc1* hosts T167784
  • 07:14 ema: cp1008: use sdb only in varnish.service, waiting for Chris to replace sda T171028
  • 05:53 _joe_: moving all conf* servers to the future puppet parser
  • 03:01 l10nupdate@tin: LocalisationUpdate failed: git pull of extensions failed
  • 01:15 reedy@tin: Synchronized multiversion/: phpcs (duration: 01m 06s)
  • 01:13 reedy@tin: Synchronized phpcs.xml: phpcs (duration: 00m 46s)
  • 01:12 reedy@tin: Synchronized tests/multiversion/: phpcs (duration: 00m 46s)

2017-07-25

  • 23:13 reedy@tin: Synchronized wmf-config/abusefilter.php: Allow contentadmin/sysop to configure blocking AbuseFilters (duration: 00m 46s)
  • 22:14 reedy@tin: Synchronized php-1.30.0-wmf.11/includes/specials/SpecialUndelete.php: T171523 (duration: 00m 46s)
  • 22:12 reedy@tin: Synchronized php-1.30.0-wmf.10/includes/specials/SpecialUndelete.php: T171523 (duration: 00m 47s)
  • 21:13 urandom: Rolling restart of eqiad Cassandra instances (applying OpenJDK update)
  • 21:12 ejegg: updated SmashPig from 523d6dd to f4ca53c
  • 20:39 ejegg: updated fundraising process-control to adb3325
  • 19:31 demon@tin: rebuilt wikiversions.php and synchronized wikiversions files: group0 to wmf.11
  • 18:52 mobrovac@tin: Finished deploy [restbase/deploy@36ca85f]: (no justification provided) (duration: 03m 49s)
  • 18:48 mobrovac@tin: Started deploy [restbase/deploy@36ca85f]: (no justification provided)
  • 18:48 mobrovac@tin: Finished deploy [restbase/deploy@36ca85f]: (no justification provided) (duration: 00m 49s)
  • 18:47 mobrovac@tin: Started deploy [restbase/deploy@36ca85f]: (no justification provided)
  • 18:46 mobrovac@tin: Finished deploy [restbase/deploy@36ca85f]: Switch to Node v6.11 - T170548 (duration: 05m 17s)
  • 18:42 krinkle@tin: Synchronized wmf-config/InitialiseSettings.php: Enable jQuery 3 on testwikis - I37a68472cf (duration: 00m 50s)
  • 18:41 mobrovac@tin: Started deploy [restbase/deploy@36ca85f]: Switch to Node v6.11 - T170548
  • 18:28 mobrovac: restbase upgrading node to v6.11 - T170548
  • 18:14 demon@tin: Finished scap: bootstrap wmf.11 (x2) (duration: 19m 23s)
  • 17:55 demon@tin: Started scap: bootstrap wmf.11 (x2)
  • 17:54 demon@tin: scap failed: RuntimeError scap failed: average error rate on 1/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/2cc7028226a539553178454fc2f14459 for details) (duration: 16m 32s)
  • 17:53 demon@tin: scap failed: average error rate on 1/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/2cc7028226a539553178454fc2f14459 for details)
  • 17:49 urandom: Rolling restart of codfw Cassandra instances (applying OpenJDK update)
  • 17:40 halfak@tin: Finished deploy [ores/deploy@835d848]: T171505 (duration: 34m 56s)
  • 17:37 demon@tin: Started scap: bootstrap wmf.11
  • 17:33 jynus: creating new database on m1 (rddmarc) T170158
  • 17:10 demon@tin: Pruned MediaWiki: 1.30.0-wmf.9 [keeping static files] (duration: 01m 39s)
  • 17:05 halfak@tin: Started deploy [ores/deploy@835d848]: T171505
  • 16:16 jynus: about to delete orphan files on einsteinium T149557
  • 15:49 moritzm: installing imagemagick security updates on trusty hosts (jessie already fixed)
  • 15:31 cmjohnson1: updating firmware lvs1007 T167299
  • 15:03 oblivian@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase1015.eqiad.wmnet
  • 14:58 _joe_: enabled hyperthreading on restbase1015.eqiad.wmnet T162735, rebooting the server
  • 14:52 _joe_: shutting down restbase1015, T162735
  • 14:51 oblivian@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase1015.eqiad.wmnet
  • 14:36 urandom: draining restbase1015.eqiad.wmnet T162735
  • 14:35 oblivian@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase2002.codfw.wmnet
  • 14:29 _joe_: enabled hyperthreading on restbase2002.codfw.wmnet T162735, rebooting the server
  • 14:23 _joe_: shutting down restbase2002, T162735
  • 14:14 oblivian@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase2002.codfw.wmnet
  • 14:04 dcausse: restarting elastic on relforge100x servers to test new config
  • 13:32 moritzm: rebooting hydrogen for kernel update
  • 13:15 moritzm: rebooting achernar for kernel update
  • 13:13 hashar: Purged project-logos for eswikisource/eswikiquote high density logos T170604  : find static/images/project-logos -maxdepth 1 -type f| sed -e 's%^%https://en.wikipedia.org/%'
  • 13:11 hashar@tin: Synchronized wmf-config/InitialiseSettings.php: High density logos for es.wikisource - T170604 (duration: 00m 43s)
  • 13:10 hashar@tin: Synchronized static/images/project-logos: High density logos for es.wikisource - T170604 (duration: 00m 44s)
  • 13:08 hashar@tin: Synchronized wmf-config/InitialiseSettings.php: High density logos for es.wikiquote - T170604 (duration: 00m 49s)
  • 13:07 hashar@tin: Synchronized static/images/project-logos: High density logos for es.wikiquote - T170604 (duration: 00m 46s)
  • 13:05 dcausse: restarting elastic relforge100x servers to pick up new version of the ltr plugin
  • 12:56 elukey: rolling restart of aqs* for jvm upgrades
  • 12:45 moritzm: enabling mw1260 (jessie-based video scaler) for job processing
  • 12:44 gehel: restarting cassandra on maps clusters
  • 12:37 godog: powercycle ms-be1016, couldn't get getty output from console
  • 12:09 godog: upgrade diamond to 4.0.515 in eqiad - T97635
  • 12:08 aude: ran rebuildTermSqlIndex.php on test.wikidata
  • 11:59 jynus: testing defragmenting pc2004 - if lag is created, ignore
  • 11:58 moritzm: installing binutils update from jessie point release
  • 11:20 marostegui: Start a run of "timeout 10h purgeParserCache.php" on terbium, which will be killed at around 21:00 UTC so it doesn't overlap with the normal cron run - T167784
  • 11:16 moritzm: upgrading/restarting logstash* for openjdk security update
  • 11:13 marostegui: Killing old running instances of purgeParserCache.php in terbium - https://phabricator.wikimedia.org/T167784
  • 11:02 moritzm: installing openjdk security updates on elastic*
  • 10:50 moritzm: installing openjdk security updates on restbase*
  • 10:41 marostegui: Run mwscript purgeParserCache.php --wiki=aawiki --age=1900800 --msleep 500 from terbium - T167784
  • 10:28 marostegui@tin: Synchronized wmf-config/InitialiseSettings.php: Parsercache: Reduce expiration time to 22 days - T167784 (duration: 00m 44s)
  • 10:27 moritzm: upgrade restbase2010 to latest OpenJDK security update
  • 09:20 godog: upgrade diamond to 4.0.515 in codfw - T97635
  • 09:15 moritzm: upgrade restbase-test* and restbase-dev* to latest OpenJDK security update
  • 09:14 godog: upgrade diamond to 4.0.515 in ulsfo and esams - T97635
  • 07:27 moritzm: installing apache security updates on app servers in eqiad
  • 05:29 mattflaschen@tin: Synchronized wmf-config/CommonSettings-labs.php: Article reminder: Beta Cluster only (duration: 00m 44s)
  • 03:00 l10nupdate@tin: LocalisationUpdate failed: git pull of extensions failed
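
Note on the parser cache entries above: the --age=1900800 passed to purgeParserCache.php at 10:41 lines up with the 22-day expiration synced at 10:28, assuming --age is expressed in seconds. A quick check of that arithmetic (the helper name is hypothetical):

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86400


def days_to_seconds(days):
    """Convert a whole number of days to seconds."""
    return days * SECONDS_PER_DAY


# 22 days is the expiration time named in the 10:28 sync message.
age_seconds = days_to_seconds(22)
print(age_seconds)
```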

2017-07-24

  • 23:29 eileen1: update process-control from 915bbf9 to 4eb053d
  • 23:22 catrope@tin: Synchronized wmf-config/InitialiseSettings.php: Enable ORES on sqwiki and rowiki (T170723) (duration: 00m 44s)
  • 22:35 ejegg: updated payments-wiki from c3be2bf to 084d0f9
  • 21:43 eileen1: update CiviCRM from 382a189 to 461900e
  • 21:11 dcausse: unbanning elastic1027/elastic1017
  • 21:04 reedy@tin: Synchronized wmf-config/CommonSettings.php: Add a global email blacklist (duration: 00m 43s)
  • 20:59 dcausse: banning elastic1027 after elastic1017 to move shards around
  • 20:26 bsitzmann@tin: Finished deploy [mobileapps/deploy@2b4ca3b]: Update mobileapps to b608ec8 (duration: 04m 06s)
  • 20:22 bsitzmann@tin: Started deploy [mobileapps/deploy@2b4ca3b]: Update mobileapps to b608ec8
  • 19:55 ebernhardson: ban elastic1031 from elasticsearch cluster, it's overloaded
  • 19:30 otto@tin: Finished deploy [eventlogging/analytics@41e3418]: unique index only for id columns (duration: 00m 02s)
  • 19:30 otto@tin: Started deploy [eventlogging/analytics@41e3418]: unique index only for id columns
  • 19:26 reedy@tin: Synchronized tests/: phpcs (duration: 00m 43s)
  • 19:26 reedy@tin: Synchronized wmf-config/: phpcs (duration: 00m 44s)
  • 18:36 reedy@tin: Synchronized composer.lock: phpunit (duration: 00m 43s)
  • 18:35 reedy@tin: Synchronized composer.json: phpunit (duration: 00m 43s)
  • 18:30 otto@tin: Finished deploy [eventlogging/eventbus@c1c2c39]: statsd dns fixes (duration: 00m 12s)
  • 18:30 otto@tin: Started deploy [eventlogging/eventbus@c1c2c39]: statsd dns fixes
  • 18:29 reedy@tin: Synchronized wmf-config/CommonSettings.php: T153271 (duration: 00m 43s)
  • 18:28 otto@tin: Finished deploy [eventlogging/eventbus@c1c2c39]: statsd dns fixes (duration: 00m 10s)
  • 18:28 otto@tin: Started deploy [eventlogging/eventbus@c1c2c39]: statsd dns fixes
  • 18:27 otto@tin: Finished deploy [eventlogging/eventbus@c1c2c39]: statsd dns fixes (duration: 00m 14s)
  • 18:27 otto@tin: Started deploy [eventlogging/eventbus@c1c2c39]: statsd dns fixes
  • 18:27 otto@tin: Finished deploy [eventlogging/eventbus@c1c2c39]: statsd dns fixes (duration: 00m 39s)
  • 18:26 otto@tin: Started deploy [eventlogging/eventbus@c1c2c39]: statsd dns fixes
  • 18:25 otto@tin: Finished deploy [eventlogging/eventbus@c1c2c39]: test deploy with scap depool on kafka2003 (duration: 00m 17s)
  • 18:25 otto@tin: Started deploy [eventlogging/eventbus@c1c2c39]: test deploy with scap depool on kafka2003
  • 18:25 reedy@tin: Synchronized wmf-config/CommonSettings.php: T169478 T169481 (duration: 00m 42s)
  • 18:24 reedy@tin: Synchronized wmf-config/InitialiseSettings.php: T169478 T169481 (duration: 00m 43s)
  • 18:20 reedy@tin: Synchronized wmf-config/InitialiseSettings.php: T159895 (duration: 00m 43s)
  • 18:19 reedy@tin: Synchronized php-1.30.0-wmf.10/includes/specials/pagers/UsersPager.php: T171332 (duration: 00m 43s)
  • 18:18 otto@tin: Finished deploy [eventlogging/eventbus@c1c2c39]: test deploy with scap depool on kafka2002 (duration: 01m 06s)
  • 18:17 otto@tin: Started deploy [eventlogging/eventbus@c1c2c39]: test deploy with scap depool on kafka2002
  • 18:16 reedy@tin: Synchronized wmf-config/Wikibase.php: T125500 (duration: 00m 43s)
  • 18:15 reedy@tin: Synchronized php-1.30.0-wmf.10/extensions/Echo/modules/styles/mw.echo.ui.NotificationBadgeWidget.less: T171302 (duration: 00m 45s)
  • 18:15 otto@tin: Finished deploy [eventlogging/eventbus@c1c2c39]: test deploy with scap depool on kafka2001 (duration: 00m 04s)
  • 18:14 otto@tin: Started deploy [eventlogging/eventbus@c1c2c39]: test deploy with scap depool on kafka2001
  • 18:13 reedy@tin: Synchronized php-1.30.0-wmf.10/extensions/Thanks/extension.json: T170917 (duration: 00m 43s)
  • 18:11 reedy@tin: Synchronized wmf-config/unitConversionConfig.json: T168582 (duration: 00m 43s)
  • 18:06 otto@tin: Finished deploy [eventlogging/eventbus@c1c2c39]: test deploy with scap depool on kafka2001 (duration: 01m 39s)
  • 18:05 otto@tin: Started deploy [eventlogging/eventbus@c1c2c39]: test deploy with scap depool on kafka2001
  • 18:03 reedy@tin: Synchronized phpcs.xml: phpcs (duration: 00m 43s)
  • 18:02 reedy@tin: Synchronized tests: phpcs (duration: 00m 43s)
  • 18:01 reedy@tin: Synchronized w: phpcs (duration: 00m 43s)
  • 18:00 reedy@tin: Synchronized docroot/: phpcs (duration: 00m 44s)
  • 17:59 reedy@tin: Synchronized wmf-config/: phpcs (duration: 00m 45s)
  • 17:40 gehel@tin: Finished deploy [wdqs/wdqs@c1b5c27]: (no justification provided) (duration: 01m 58s)
  • 17:38 gehel@tin: Started deploy [wdqs/wdqs@c1b5c27]: (no justification provided)
  • 16:20 oblivian@puppetmaster1001: conftool action : set/pooled=yes; selector: service=pdfrender
  • 15:29 oblivian@puppetmaster1001: conftool action : set/pooled=yes; selector: service=pdfrender
  • 14:29 marostegui: Run maintain-views on labsdb1009, labsdb1010 and labsdb1011 for s2 wikis - T153743
  • 14:15 marostegui: Global rename of Carrotkit - T171474
  • 14:11 zeljkof: EU SWAT finished
  • 14:07 zeljkof: extending EU SWAT until https://gerrit.wikimedia.org/r/#/c/367384/ is deployed
  • 14:03 zfilipin@tin: Synchronized wmf-config/InitialiseSettings.php: SWAT: pagePreviews: Increase i13n sampling rate for ruwiki (T171325) (duration: 00m 43s)
  • 14:03 hashar: mwdebug1002 ran scap pull
  • 13:57 zfilipin@tin: scap failed: average error rate on 1/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/2cc7028226a539553178454fc2f14459 for details)
  • 13:53 moritzm: installing bind security updates (we're using client-side libs/tools only)
  • 13:45 zfilipin@tin: Synchronized wmf-config/InitialiseSettings.php: SWAT: Add some import sources for tawikisource (T171395) (duration: 00m 43s)
  • 13:32 zfilipin@tin: Synchronized wmf-config/InitialiseSettings.php: SWAT: Allow flooders to remove themselves from the flood group on zhwiki (T171379) (duration: 00m 43s)
  • 13:28 paravoid: upgrading nagios-nrpe-server to 3.0.1-3+deb9u1 on all stretch hosts
  • 13:23 zfilipin@tin: Synchronized wmf-config/InitialiseSettings.php: SWAT: pagePreviews: Increase instrumentation sampling rate (T171325) (duration: 00m 43s)
  • 13:14 gehel: restarting elasticsearch on relforge for jvm upgrade
  • 13:12 aude@tin: Synchronized wmf-config/Wikibase.php: Bump cache epoch for wikidata (duration: 00m 43s)
  • 13:04 aude@tin: Synchronized php-1.30.0-wmf.10/extensions/Wikidata: Fix several Wikidata bugs (duration: 02m 10s)
  • 11:13 moritzm: updates for jessie 8.8 and stretch 9.1 point updates
  • 10:22 godog: roll restart thumbor to apply new memory limits
  • 10:09 moritzm: installing openjdk security updates on praseodymium/cerium/xenon
  • 09:30 moritzm: uploaded openjdk-8 8u145-b15 to apt.wikimedia.org/jessie-wikimedia
  • 09:04 godog: restart thumbor on thumbor1004 with MemoryLimit=8G
  • 08:29 godog: restart thumbor on thumbor1001 temporarily without memory cgroup limitations
  • 07:17 marostegui: Rename table old_growth on db1089 - T115982
  • 07:15 jynus@tin: Synchronized wmf-config/db-eqiad.php: Fix indenting (duration: 00m 43s)
  • 07:14 jynus@tin: Synchronized wmf-config/StartProfiler.php: Fix indenting (duration: 00m 43s)
  • 07:13 jynus@tin: Synchronized wmf-config/db-codfw.php: Fix indenting (duration: 00m 45s)
  • 06:45 moritzm: installing apache security updates on appserver canaries
  • 05:53 eileen: CiviCRM updated from 38f246d to 382a189
  • 05:40 marostegui: Configure and start s2 replication on labsdb1011 - T153743
  • 03:05 l10nupdate@tin: LocalisationUpdate failed: git pull of extensions failed

2017-07-23

  • 23:26 legoktm@tin: Synchronized php-1.30.0-wmf.10/includes/page/Article.php: [SECURITY] Restore ability to suppress pages while deleting - T171405 (duration: 00m 45s)
  • 12:06 hoo: Restarted hhvm and apache2 on mwdebug1001
  • 12:02 hashar: CI should self recover when the queue is processed. Will check again in an hour or so
  • 12:02 hashar: CI is overloaded due to a mass update of mediawiki-codesniffer to 0.10.1

2017-07-21

  • 22:21 eileen: civicrm updated from 74f9588 to 38f246d
  • 21:29 jynus_: dropping enwiki database from dbstore2002:3306 (default instance) - new s1 already imported on 3311
  • 19:58 hashar: Restarting Jenkins
  • 17:02 jynus: now that db2072 is compressed and fixed, stop it to finally clone it to dbstore2002 T171321
  • 16:36 Reedy: run namespaceDupes.php against tawikisource T165813
  • 15:16 jynus: restarting replication on db2072 after maintenance T151029
  • 15:02 moritzm: installing apache security updates on labmon1001 and netmon*
  • 14:59 moritzm: installing apache security updates on krypton and auth*
  • 14:54 hashar: Restarting Jenkins
  • 14:33 _joe_: ocg started again on ocg1003
  • 14:30 _joe_: stopping ocg temporarily on ocg1003, T162780
  • 13:39 moritzm: installing apache security updates on hafnium, bromine, krypton, rutherfordium
  • 13:29 moritzm: installing apache security updates on fermium/lists.wikimedia.org
  • 12:50 moritzm: rebooting cp* spares for kernel update
  • 11:51 godog: run compiler-update-facts
  • 10:55 oblivian@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ocg1001.eqiad.wmnet
  • 09:42 godog: add 100G to graphite2002/graphite1003 vgs
  • 08:44 jynus: stopping replication on db2072 to fix some duplicate key errors
  • 08:09 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Fix some indents (duration: 00m 43s)
  • 07:25 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1051 - T166204 (duration: 00m 44s)
  • 02:24 eileen: update CiviCRM from 2de7f2a to 74f9588
  • 00:48 reedy@tin: Synchronized search-redirect.php: phpcs (duration: 00m 43s)
  • 00:47 reedy@tin: Synchronized rpc/RunJobs.php: phpcs (duration: 00m 43s)
  • 00:46 reedy@tin: Synchronized w: phpcs (duration: 00m 43s)
  • 00:45 reedy@tin: Synchronized wmf-config/: phpcs (duration: 00m 44s)
  • 00:44 reedy@tin: Synchronized tests/: phpcs (duration: 00m 43s)
  • 00:22 reedy@tin: Synchronized wmf-config/wikitech.php: phpcs (duration: 00m 43s)
  • 00:21 reedy@tin: Synchronized docroot/noc/db.php: phpcs (duration: 00m 43s)
  • 00:11 reedy@tin: Synchronized phpcs.xml: phpcs (duration: 00m 43s)
  • 00:09 reedy@tin: Synchronized docroot/: phpcs (duration: 00m 44s)
  • 00:05 reedy@tin: Synchronized wmf-config/missing.php: phpcs (duration: 00m 43s)

2017-07-20

  • 23:57 reedy@tin: Synchronized wmf-config/: phpcs (duration: 00m 45s)
  • 23:56 reedy@tin: Synchronized w: phpcs (duration: 00m 43s)
  • 23:55 reedy@tin: Synchronized tests: phpcs.xml (duration: 00m 42s)
  • 23:54 reedy@tin: Synchronized errorpages/404.php: phpcs (duration: 00m 43s)
  • 23:53 reedy@tin: Synchronized docroot/: phpcs (duration: 00m 44s)
  • 23:36 reedy@tin: Synchronized php-1.30.0-wmf.10/extensions/Wikidata: Update Wikidata - fix uncaught exception in constraints (duration: 02m 09s)
  • 23:15 reedy@tin: Synchronized wmf-config/Wikibase-production.php: T169647 T168938 (duration: 00m 42s)
  • 23:10 reedy@tin: Synchronized wmf-config/InitialiseSettings.php: Author namespace for tawikisource T165813 (duration: 00m 43s)
  • 23:06 reedy@tin: Synchronized phpcs.xml: phpcs (duration: 00m 43s)
  • 23:05 reedy@tin: Synchronized w/health-check.php: phpcs (duration: 00m 43s)
  • 23:04 reedy@tin: Synchronized errorpages/hhvm-fatal-error.php: phpcs (duration: 00m 44s)
  • 23:03 reedy@tin: Synchronized docroot/search.wikimedia.org/index.php: phpcs (duration: 00m 43s)
  • 23:02 reedy@tin: Synchronized wmf-config/: phpcs (duration: 00m 45s)
  • 23:01 reedy@tin: Synchronized tests: phpcs (duration: 00m 44s)
  • 22:22 reedy@tin: Synchronized wmf-config/CommonSettings.php: phpcs (duration: 00m 43s)
  • 22:21 reedy@tin: Synchronized search-redirect.php: phpcs (duration: 00m 43s)
  • 22:20 reedy@tin: Synchronized phpcs.xml: phpcs (duration: 00m 43s)
  • 22:19 reedy@tin: Synchronized tests: phpcs (duration: 00m 44s)
  • 22:18 reedy@tin: Synchronized wmf-config/: phpcs (duration: 00m 46s)
  • 20:29 nuria@tin: Finished deploy [eventlogging/analytics@c1c2c39]: (no justification provided) (duration: 00m 02s)
  • 20:29 nuria@tin: Started deploy [eventlogging/analytics@c1c2c39]: (no justification provided)
  • 19:56 robh@puppetmaster1001: conftool action : set/pooled=yes; selector: name=wdqs1003.eqiad.wmnet
  • 19:14 demon@tin: rebuilt wikiversions.php and synchronized wikiversions files: group2 to wmf.10
  • 18:46 otto@tin: Finished deploy [eventlogging/analytics@36846d6]: auto add mysql indexes for meta style events (duration: 00m 04s)
  • 18:46 otto@tin: Started deploy [eventlogging/analytics@36846d6]: auto add mysql indexes for meta style events
  • 18:37 andrewbogott: upgraded mediawiki version on wikitech-static
  • 18:36 thcipriani@tin: Synchronized php-1.30.0-wmf.9/resources/src/mediawiki.widgets.visibleByteLimit/mediawiki.widgets.visibleByteLimit.js: SWAT: mw.widgets.visibleByteLimit: Temporarily disable whilst OOjs UI label bug is fixed T169982 (duration: 00m 47s)
  • 18:35 thcipriani@tin: Synchronized php-1.30.0-wmf.10/resources/src/mediawiki.widgets.visibleByteLimit/mediawiki.widgets.visibleByteLimit.js: SWAT: mw.widgets.visibleByteLimit: Temporarily disable whilst OOjs UI label bug is fixed T169982 (duration: 00m 48s)
  • 18:24 thcipriani@tin: Synchronized php-1.30.0-wmf.10/skins/MonoBook/main.css: SWAT: Revert "Remove `position: absolute` and z-index from #p-logo" T171195 (duration: 00m 47s)
  • 18:16 thcipriani@tin: Synchronized static/images/project-logos: SWAT: Fix hywiki big and medium logos (duration: 00m 47s)
  • 17:54 arlolra: Updated Parsoid to a89a9cc4 (T169293)
  • 17:48 arlolra@tin: Finished deploy [parsoid/deploy@97dbabb]: Updating Parsoid to a89a9cc4 (duration: 09m 09s)
  • 17:39 arlolra@tin: Started deploy [parsoid/deploy@97dbabb]: Updating Parsoid to a89a9cc4
  • 17:16 demon@tin: rebuilt wikiversions.php and synchronized wikiversions files: wikidata back to wmf.10
  • 17:14 ottomata: killed tranquility instances tranq-banners and tranq-netflow running on druid1003 in joal's screen sessions
  • 14:41 godog: upload diamond 4.0.515-4~bpo8+2 to jessie-wikimedia - T97635
  • 14:33 ema@neodymium: conftool action : set/pooled=yes; selector: name=acamar.wikimedia.org
  • 14:31 ema@neodymium: conftool action : set/pooled=no; selector: name=acamar.wikimedia.org
  • 14:23 godog: upload diamond 4.0.515-4~bpo8+1 to jessie-wikimedia - T97635
  • 14:10 andrewbogott: upgrading apache on labs via "dpkg -s apache2 && apt-get -y install apache2"
  • 14:07 mobrovac@tin: Finished deploy [restbase/deploy@5aa7bc1]: Translation API bug fix (duration: 07m 58s)
  • 14:00 godog: test diamond 4.0.515-4~bpo8+1 on cp1008
  • 13:59 mobrovac@tin: Started deploy [restbase/deploy@5aa7bc1]: Translation API bug fix
  • 13:59 mobrovac@tin: Finished deploy [restbase/deploy@5aa7bc1] (staging): (no justification provided) (duration: 01m 31s)
  • 13:57 mobrovac@tin: Started deploy [restbase/deploy@5aa7bc1] (staging): (no justification provided)
  • 13:52 moritzm: upgrading nodejs on wtp*
  • 13:42 ema: cp1050 stuck at 'Initializing firmware interfaces...', trying to powerdown/powerup
  • 13:37 zeljkof: EU SWAT finished
  • 13:29 zfilipin@tin: Synchronized wmf-config/InitialiseSettings.php: SWAT: Revert "Revert "Stop RelatedArticles A/B test and clean up config"" (T169948) (duration: 00m 46s)
  • 13:29 cmjohnson1: downtimed restbase-dev100[1-3] to power off and move ssds to newly racked restbase-dev100[4-6] phab task: T166181
  • 13:29 ema: cp1050 stuck rebooting, power-cycling
  • 13:28 zfilipin@tin: Synchronized wmf-config/CommonSettings.php: SWAT: Revert "Revert "Stop RelatedArticles A/B test and clean up config"" (T169948) (duration: 00m 47s)
  • 13:12 zfilipin@tin: Synchronized wmf-config/throttle.php: SWAT: Add new throttle rule (T171146) (duration: 00m 48s)
  • 12:58 ema@neodymium: conftool action : set/pooled=yes; selector: name=acamar.wikimedia.org
  • 12:55 ema@neodymium: conftool action : set/pooled=no; selector: name=acamar.wikimedia.org
  • 12:37 ema@neodymium: conftool action : set/pooled=yes; selector: name=acamar.wikimedia.org
  • 12:25 ema@neodymium: conftool action : set/pooled=no; selector: name=acamar.wikimedia.org
  • 09:04 ema: eqiad cache_text/upload: upgrade to varnish 4.1.7-1wm1 and reboot for kernel updates
  • 09:04 hashar: Restored CI cache storage (castor) on a fresh new instance. Cache is empty though so jobs will be a bit slower until the cache is populated - T171148
  • 09:02 moritzm: uploaded apache2 2.4.10-10+deb8u10+wmf1 (rebase of WMF-specific patches on top of latest DSA) to apt.wikimedia.org/jessie
  • 08:34 marostegui: Force a BBU relearn on db1016 - T166344
  • 08:29 ema@neodymium: conftool action : set/pooled=yes; selector: name=cp3048.esams.wmnet
  • 08:25 hashar: CI is restored albeit in degraded mode (lack of Castor cache) - T171148
  • 08:01 marostegui: Stop replication on labsdb1011 for maintenance - T153743
  • 07:55 marostegui: Start importing s2 into labsdb1011 - T153743
  • 07:48 godog: restart diamond on serpens/seaborgium to pick up the updated CA
  • 07:41 elukey: powercycle cp3048 - mgmt reachable - T171145
  • 06:54 marostegui: Force a BBU relearn on db1016 - T166344
  • 06:24 mutante: netmon1002 - librenms: fix permissions on /srv/librenms/rrd data after rsyncing, mismatching UIDs vs netmon1001 and rsyncd in chroot-issue
  • 06:00 oblivian@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=cp3048.esams.wmnet
  • 05:46 TimStarling: on contint1001 restarted zuul and zuul-merger
  • 05:30 TimStarling: on contint1001 restarted jenkins
  • 05:05 marostegui: Configure replication for s2 on labsdb1009 and labsdb1010 - T153743
  • 04:42 mutante: netmon1002 - restarted Apache for LDAP issue - librenms.wm.org switched back to it, after rsyncing rrd data, re-enabling puppet
  • 04:05 andrewbogott: restarting rabbitmq-server on labcontrol1001
  • 03:34 andrewbogott: service nova-network restart on labnet1001
  • 03:32 andrewbogott: service uwsgi-labspuppetbackend restart on labcontrol1001
  • 03:02 l10nupdate@tin: LocalisationUpdate failed: git pull of extensions failed
  • 02:22 mutante: netmon1001 - rsyncing librenms rrd data to netmon1002 - T159756
  • 01:17 andrewbogott: restarting keystone on labcontrol1001
  • 01:14 twentyafterfour: phabricator upgrade complete
  • 01:10 twentyafterfour: begin (belated) phabricator upgrade, expect momentary downtime.
  • 00:09 dereckson@tin: Synchronized php-1.30.0-wmf.9/resources/src/mediawiki.widgets/mw.widgets.SearchInputWidget.js: Revert "Make mw.widgets.SearchInputWidget extend OO.ui.SearchInputWidget" (3/3) (duration: 00m 46s)
  • 00:08 dereckson@tin: Synchronized php-1.30.0-wmf.9/resources/Resources.php: Revert "Make mw.widgets.SearchInputWidget extend OO.ui.SearchInputWidget" (2/3) (duration: 00m 46s)
  • 00:08 dereckson@tin: Synchronized php-1.30.0-wmf.9/includes/widget/SearchInputWidget.php: Revert "Make mw.widgets.SearchInputWidget extend OO.ui.SearchInputWidget" (1/3) (duration: 00m 46s)
  • 00:06 dereckson@tin: Synchronized php-1.30.0-wmf.10/resources/src/mediawiki.widgets/mw.widgets.SearchInputWidget.js: Revert "Make mw.widgets.SearchInputWidget extend OO.ui.SearchInputWidget" (3/3) (duration: 00m 46s)
  • 00:04 dereckson@tin: Synchronized php-1.30.0-wmf.10/resources/Resources.php: Revert "Make mw.widgets.SearchInputWidget extend OO.ui.SearchInputWidget" (2/3) (duration: 00m 46s)
  • 00:03 dereckson@tin: Synchronized php-1.30.0-wmf.10/includes/widget/SearchInputWidget.php: Revert "Make mw.widgets.SearchInputWidget extend OO.ui.SearchInputWidget" (1/3) (duration: 00m 46s)

2017-07-19

  • 23:29 dereckson@tin: Synchronized wmf-config/Wikibase-production.php: Use correct class name for JsonUnitStorage (T171107) (duration: 00m 48s)
  • 22:14 reedy@tin: Synchronized multiversion/: (no justification provided) (duration: 01m 11s)
  • 22:01 mutante: hafnium, labmon1001 - restarted apache
  • 22:00 demon@tin: Finished scap: all kinds of code style stuff for James_F & Reedy (duration: 05m 23s)
  • 21:59 mutante: bromine _transparency.wm.org - restarted apache
  • 21:59 mutante: dbmonitor2001 - restarted apache
  • 21:57 ejegg: re-enabled CiviCRM de-dupe job
  • 21:56 mutante: graphite200* - restarted apache
  • 21:54 demon@tin: Started scap: all kinds of code style stuff for James_F & Reedy
  • 21:52 mutante: netmon1003 - puppet run, restarted apache - fixed servermon.wikimedia.org
  • 21:50 mutante: tegmen - restarted apache
  • 21:47 mutante: netmon1001 - adding manual ferm rule for 80/443 - fixed librenms.wm.org
  • 21:44 mutante: netmon1001 (librenms) - re-enable puppet once to get new CA, restart Apache, disable puppet again
  • 21:39 jynus: reloading haproxy on dbproxy1005 for repooling db1009
  • 21:35 mutante: graphite1001 - restarted apache, ran puppet
  • 21:34 chasemp: labstore1004/1005 puppet agent --test && service nslcd restart
  • 21:31 RainbowSprinkles: running puppet & restarting gerrit/apache on cobalt/gerrit2001
  • 21:29 mutante: tungsten - restarted apache for CA change (xhgui)
  • 21:26 mutante: logstash1001/1002 - restarted apache for CA change (logstash/kibana back)
  • 21:25 RainbowSprinkles: Ran puppet and restarted apache on logstash100[1..3]
  • 21:23 madhuvishy: Ran puppet and restarted apache on thorium (Runs hue, yarn, and pivot)
  • 21:21 demon@tin: rebuilt wikiversions.php and synchronized wikiversions files: most of group1 back on wmf.10
  • 21:05 mutante: krypton - run puppet, restart apache, fixed grafana-admin
  • 20:19 andrewbogott: restarting slapd on seaborgium
  • 20:15 mobrovac@tin: Finished deploy [restbase/deploy@3bb90c9]: (no justification provided) (duration: 09m 19s)
  • 20:13 chasemp: seaborgium:~# service slapd restart
  • 20:12 chasemp: serpens:~# service slapd restart
  • 20:05 mobrovac@tin: Started deploy [restbase/deploy@3bb90c9]: (no justification provided)
  • 20:05 mobrovac@tin: Finished deploy [restbase/deploy@3bb90c9] (staging): (no justification provided) (duration: 01m 39s)
  • 20:03 mobrovac@tin: Started deploy [restbase/deploy@3bb90c9] (staging): (no justification provided)
  • 19:57 urandom: Restarting Cassandra; restbase-dev1001-a to apply additional data_file_directory (T170276)
  • 19:50 demon@tin: rebuilt wikiversions.php and synchronized wikiversions files: Abort wmf.10
  • 19:48 demon@tin: Finished scap: group1 to wmf.10 + symlink swap (duration: 21m 37s)
  • 19:38 ejegg: disabled civicrm dedupe job
  • 19:27 demon@tin: Started scap: group1 to wmf.10 + symlink swap
  • 19:16 mobrovac@tin: Finished deploy [restbase/deploy@c5938f4]: Expose the translation API end points and fix SwaggerUI - T107914 T170729 (duration: 08m 02s)
  • 19:08 mobrovac@tin: Started deploy [restbase/deploy@c5938f4]: Expose the translation API end points and fix SwaggerUI - T107914 T170729
  • 19:07 mobrovac@tin: Finished deploy [restbase/deploy@c5938f4] (staging): (no justification provided) (duration: 01m 42s)
  • 19:05 mobrovac@tin: Started deploy [restbase/deploy@c5938f4] (staging): (no justification provided)
  • 19:05 niharika29@tin: Synchronized php-1.30.0-wmf.10/maintenance/updateRestrictions.php: Set batch size to 1000 in updateRestrictions https://gerrit.wikimedia.org/r/#/c/366301/ (duration: 00m 47s)
  • 19:04 niharika29@tin: Synchronized php-1.30.0-wmf.9/maintenance/updateRestrictions.php: Set batch size to 1000 in updateRestrictions https://gerrit.wikimedia.org/r/#/c/366302/ (duration: 00m 46s)
  • 18:49 niharika29@tin: Synchronized php-1.30.0-wmf.10/includes/collation/IcuCollation.php: Update FIRST_LETTER_VERSION for rowiki changes https://gerrit.wikimedia.org/r/#/c/366299/ (duration: 00m 46s)
  • 18:47 niharika29@tin: Synchronized php-1.30.0-wmf.9/includes/collation/IcuCollation.php: Update FIRST_LETTER_VERSION for rowiki changes https://gerrit.wikimedia.org/r/#/c/366298/ (duration: 00m 46s)
  • 18:40 niharika29@tin: Synchronized wmf-config/InitialiseSettings.php: Remove 'din' from wmgExtraLanguageNames [mediawiki-config] - https://gerrit.wikimedia.org/r/362876 (https://phabricator.wikimedia.org/T168523) (duration: 00m 47s)
  • 18:25 ariel@tin: Finished deploy [dumps/dumps@63705de]: write list of special dump files with no dump job content (duration: 00m 02s)
  • 18:25 ariel@tin: Started deploy [dumps/dumps@63705de]: write list of special dump files with no dump job content
  • 18:24 niharika29@tin: Synchronized php-1.30.0-wmf.10/includes/collation/IcuCollation.php: IcuCollation: Fix diacritic characters for Romanian (ro) headings https://gerrit.wikimedia.org/r/#/c/366296/ (duration: 00m 46s)
  • 18:23 niharika29@tin: Synchronized php-1.30.0-wmf.9/includes/collation/IcuCollation.php: IcuCollation: Fix diacritic characters for Romanian (ro) headings https://gerrit.wikimedia.org/r/#/c/366295/ (duration: 00m 47s)
  • 16:35 robh@puppetmaster1001: conftool action : set/pooled=yes; selector: name=wdqs1002.eqiad.wmnet
  • 16:27 robh@puppetmaster1001: conftool action : set/pooled=no; selector: name=wdqs1003.eqiad.wmnet
  • 16:27 robh@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=wdqs1003.eqiad.wmnet
  • 16:23 robh@puppetmaster1001: conftool action : set/pooled=active; selector: name=wdqs1002.eqiad.wmnet
  • 16:16 marostegui: Compressing innodb on dbstore1002 for the following wikis: viwiki ukwiki kowiki huwiki hewiki frwiktionary fawiki eswiki cawiki arwiki - T168303
  • 16:10 robh@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=mw1196.eqiad.wmnet
  • 16:04 robh: mw1196 has hardware failure and is being decommissioned
  • 16:04 robh@puppetmaster1001: conftool action : set/pooled=no; selector: name=mw1196.eqiad.wmnet
  • 14:44 hashar: Restarting Jenkins
  • 14:31 moritzm: installing imagemagick security updates
  • 13:55 _joe_: running clear-host-cache.js for ocg1001 decommission T170886
  • 13:53 oblivian@puppetmaster1001: conftool action : set/pooled=inactive; selector: cluster=pdf,name=ocg1001.eqiad.wmnet
  • 13:46 marostegui: Compress database rowiki on dbstore1002 - T168303
  • 13:45 elukey: restart hive-server on analytics1003 - Java OOM issue due to a huge query
  • 13:34 hashar: European SWAT completed
  • 13:32 elukey: Limit the access to the conf* zookeeper ports via ferm rules - https://gerrit.wikimedia.org/r/366228
  • 13:29 hashar: Purged all 1685 project-logos ( find static/images/project-logos -maxdepth 1 -type f | sed -e 's%^%https://en.wikipedia.org/%' | mwscript purgeList.php --wiki=enwiki )
  • 13:26 hashar@tin: Synchronized wmf-config/InitialiseSettings.php: Change timezone on nl.wikinews to Europe/Berlin - T170985 (duration: 00m 44s)
  • 13:24 marostegui: Optimize EditConflict_8860941_15423246 and Echo_7731316 on dbstore1002 - T168303
  • 13:17 hashar@tin: Synchronized static/images/project-logos/nlwikinews.png: Change logo on nl.wikinews - T170984 (duration: 00m 47s)
  • 13:11 jmm@puppetmaster1001: conftool action : set/pooled=yes; selector: nescio.wikimedia.org
  • 13:10 hashar@tin: Synchronized wmf-config/InitialiseSettings.php: Update wikiversity logos to 2017 - T160491 (duration: 00m 46s)
  • 13:09 hashar@tin: Synchronized static/images/project-logos: Update wikiversity logos to 2017 - T160491 (duration: 00m 48s)
  • 13:05 hashar@tin: Synchronized wmf-config/throttle.php: Extend throttle rule - T170844 (duration: 00m 48s)
  • 12:56 moritzm: rebooting nescio (DNS recursor) for kernel update
  • 12:55 jmm@puppetmaster1001: conftool action : set/pooled=no; selector: nescio.wikimedia.org
  • 12:49 jmm@puppetmaster1001: conftool action : set/pooled=yes; selector: maerlant.wikimedia.org
  • 12:40 Reedy: running foreachwiki updateRestrictions.php T166184
  • 12:34 moritzm: rebooting maerlant (DNS recursor) for kernel update
  • 12:29 jmm@puppetmaster1001: conftool action : set/pooled=no; selector: maerlant.wikimedia.org
  • 12:10 ema@neodymium: conftool action : set/pooled=yes; selector: name=acamar.wikimedia.org
  • 12:09 ema@neodymium: conftool action : set/pooled=no; selector: name=acamar.wikimedia.org
  • 12:07 ariel@tin: Finished deploy [dumps/dumps@f95292e]: fix api call bug, page range query min pages (duration: 00m 03s)
  • 12:07 ariel@tin: Started deploy [dumps/dumps@f95292e]: fix api call bug, page range query min pages
  • 11:40 kartik@tin: Finished deploy [cxserver/deploy@1029833]: Update cxserver to d28ad0c (duration: 03m 01s)
  • 11:37 moritzm: rebooting acamar (DNS recursor) for kernel update
  • 11:37 kartik@tin: Started deploy [cxserver/deploy@1029833]: Update cxserver to d28ad0c
  • 10:58 marostegui: Global rename of user Moros - T170941
  • 10:09 ema: ulsfo cache_text/upload: upgrade to varnish 4.1.7-1wm1 and reboot for kernel updates
  • 10:06 marostegui: Deploy alter table on s1 - db1051 - T166204
  • 10:04 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1051 - T166204 (duration: 00m 47s)
  • 10:01 filippo@tin: Finished deploy [statsv/statsv@0a86be8]: (no justification provided) (duration: 00m 03s)
  • 10:01 filippo@tin: Started deploy [statsv/statsv@0a86be8]: (no justification provided)
  • 09:58 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1065 - T166204 (duration: 00m 47s)
  • 09:16 ema: finish up codfw cache_text/upload varnish/kernel upgrades
  • 09:05 oblivian@neodymium: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=(restbase-async|citoid)
  • 09:03 XioNoX: codfw repooled in dns - T170380
  • 09:01 hashar: restarting nodepool for upgrade 0.1.1-wmf7 -> 0.1.1-wmf8
  • 08:48 moritzm: uploaded nodepool 0.1.1+wmf8 to apt.wikimedia.org
  • 08:29 XioNoX: asw-c-codfw back online - T170380
  • 08:28 XioNoX: asw-c-codfw restarted 8min ago for switch upgrade - T170380
  • 08:09 akosiaris: disable librenms crons on netmon1002 for a while
  • 07:57 marostegui: Drop migrateuser_medium from s7 - T170310
  • 07:49 oblivian@neodymium: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=(restbase-async|citoid)
  • 05:26 _joe_: ran systemctl reset-failed on codfw jobrunners after the jobrunner process was activated by mistake running scap at 21.20 UTC yesterday
  • 03:03 l10nupdate@tin: LocalisationUpdate failed: git pull of extensions failed
  • 01:27 mutante: netmon1001 - stopping all the services, killing snmpwalk, disarming keyholder
  • 00:35 reedy@tin: Synchronized wmf-config/CommonSettings.php: Remove rcs1001 and rcs1002 from CommonSettings wgRCFeeds. Stops a load of logspam T170157 (duration: 00m 48s)

2017-07-18

  • 23:53 mutante: netmon1002 - copied Letsencrypt cert/key for librenms from netmon1001 for migration after netmon1002 has been reinstalled and now has RAID. (T159756)
  • 23:40 thcipriani@tin: Synchronized wmf-config/InterwikiSortOrders.php: SWAT: Add din to InterwikiSortOrders T168518 (duration: 00m 46s)
  • 23:35 thcipriani@tin: Synchronized wmf-config/InitialiseSettings.php: SWAT: Add Welsh mobile logo (just changes 'k' to 'c' PART II (duration: 00m 46s)
  • 23:34 thcipriani@tin: Synchronized static/images/mobile/copyright/wikipedia-wordmark-cy.svg: SWAT: Add Welsh mobile logo (just changes 'k' to 'c' PART I (duration: 00m 47s)
  • 23:27 thcipriani@tin: Synchronized php-1.30.0-wmf.9/extensions/Thanks/extension.json: SWAT: Add missing jQueryMsg dependency for mobile diff view T170917 (duration: 00m 47s)
  • 23:22 thcipriani@tin: Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable OOjs UI EditPage buttons on all Wikipedias T162849 (duration: 00m 47s)
  • 23:13 thcipriani@tin: Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable CodeMirror on simplewiki for better testing and more exposure (duration: 00m 48s)
  • 22:58 thcipriani: restarted jobrunner on mw1299.eqiad.wmnet mw1168.eqiad.wmnet mw1164.eqiad.wmnet mw1305.eqiad.wmnet mw1304.eqiad.wmnet mw1301.eqiad.wmnet mw1259.eqiad.wmnet mw1166.eqiad.wmnet mw1300.eqiad.wmnet
  • 22:42 krinkle@tin: Finished deploy [jobrunner/jobrunner@5f6099f]: (no justification provided) (duration: 08m 18s)
  • 22:34 krinkle@tin: Started deploy [jobrunner/jobrunner@5f6099f]: (no justification provided)
  • 22:02 krinkle@tin: Finished deploy [jobrunner/jobrunner@5f6099f]: (no justification provided) (duration: 07m 58s)
  • 21:54 krinkle@tin: Started deploy [jobrunner/jobrunner@5f6099f]: (no justification provided)
  • 21:43 Krinkle: Attempt to deploy mediawiki/services/jobrunner – https://gerrit.wikimedia.org/r/#/c/349364/ - failed.
  • 19:56 dzahn@neodymium: conftool action : set/pooled=yes; selector: name=mw2202.codfw.wmnet
  • 19:48 robh: starting wipe on cp400[1-4] per T169020
  • 19:15 demon@tin: rebuilt wikiversions.php and synchronized wikiversions files: group0 to wmf.10
  • 18:59 demon@tin: Synchronized php-1.30.0-wmf.9/extensions/MobileFrontend/extension.json: One (more) last thing (duration: 02m 49s)
  • 18:51 demon@tin: Synchronized php-1.30.0-wmf.9/extensions/MobileFrontend/extension.json: One last thing (duration: 02m 55s)
  • 18:42 mutante: netmon1002 - reinstall OS - didn't use the right partman recipe - didn't have md0 - revoke old puppet cert , salt-key, scheduled downtime, services over at netmon2001
  • 18:36 mutante: mw2202 - scheduled downtime - mainboard replacement
  • 18:36 ejegg: updated payments-wiki from bdc5226 to c3be2bf
  • 18:29 demon@tin: Finished scap: mobilefrontend wmf.9 + forced l10n rebuild (duration: 20m 53s)
  • 18:26 mutante: mw2202 - remove /etc/udev/rules.d/70-persistent-net.rules for mainboard replacement - to detect new NICs with new MACs (T170307)
  • 18:24 dzahn@neodymium: conftool action : set/pooled=no; selector: name=mw2202.codfw.wmnet
  • 18:08 demon@tin: Started scap: mobilefrontend wmf.9 + forced l10n rebuild
  • 18:02 ottomata: stopping kafka on kafka1012 again, i think we swapped the wrong disk T168927
  • 17:55 awight@tin: Finished deploy [ores/deploy@1d35aa5]: T170485 (duration: 35m 06s)
  • 17:47 mutante: smokeping - switched to netmon2001 - ping times to codfw hosts went down - ping times to eqiad hosts went up - since service is on both but data has been synced over
  • 17:41 demon@tin: Synchronized wmf-config/InitialiseSettings.php: labtest typofix for tgr (duration: 00m 46s)
  • 17:21 mobrovac@tin: Finished deploy [parsoid/deploy@1eaa07e]: Bring wtp2019 up to date and repool it - T146113 (duration: 01m 02s)
  • 17:20 mobrovac@tin: Started deploy [parsoid/deploy@1eaa07e]: Bring wtp2019 up to date and repool it - T146113
  • 17:20 awight@tin: Started deploy [ores/deploy@1d35aa5]: T170485
  • 17:18 demon@tin: Finished scap: testwiki to wmf.10 + l10n cache build (duration: 24m 23s)
  • 17:16 ottomata: stopping kafka broker on kafka1012 to replace disk T168927
  • 16:53 demon@tin: Started scap: testwiki to wmf.10 + l10n cache build
  • 16:45 oblivian@tin: Started deploy [search/MjoLniR@0140aed]: init
  • 16:44 oblivian@tin: Started deploy [search/MjoLniR@0140aed]: (no justification provided)
  • 16:40 demon@tin: Pruned MediaWiki: 1.30.0-wmf.7 [keeping static files] (duration: 06m 06s)
  • 16:31 godog: finish rollout of thumbor 1.1 in eqiad - T170677
  • 16:00 marostegui: Deploy alter table on s1 - labsdb1003 - T166204
  • 15:59 ema: power-cycle cp2017, stuck rebooting
  • 15:45 tgr@tin: Synchronized wmf-config/InitialiseSettings.php: T170863 deploy TemplateStyles to some non-content wikis (all target wikis) (duration: 00m 45s)
  • 15:37 tgr@tin: Finished scap: T170863 deploy TemplateStyles to some non-content wikis (first step: testwiki/labstestwiki only) (forcing; canary errors are unrelated) (duration: 10m 19s)
  • 15:26 tgr@tin: Started scap: T170863 deploy TemplateStyles to some non-content wikis (first step: testwiki/labstestwiki only) (forcing; canary errors are unrelated)
  • 15:14 marostegui: Stop MySQL and shutdown pc2006 for mainboard replacement - T170520
  • 15:08 tgr@tin: scap failed: RuntimeError scap failed: average error rate on 1/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/3888cca979647b9381a7739b0bdbc88e for details) (duration: 09m 42s)
  • 15:07 tgr@tin: scap failed: average error rate on 1/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/3888cca979647b9381a7739b0bdbc88e for details)
  • 14:58 tgr@tin: Started scap: T170863 deploy TemplateStyles to some non-content wikis (first step: testwiki/labstestwiki only)
  • 14:55 godog: upload and roll-upgrade thumbor to 1.1 - T170677
  • 14:44 zeljkof: EU SWAT finished!
  • 14:42 awight@tin: Finished deploy [ores/deploy@1d35aa5]: T170485 (duration: 00m 26s)
  • 14:41 awight@tin: Started deploy [ores/deploy@1d35aa5]: T170485
  • 14:39 zfilipin@tin: Synchronized portals: (no justification provided) (duration: 00m 45s)
  • 14:38 zfilipin@tin: Synchronized portals/prod/wikipedia.org/assets: (no justification provided) (duration: 00m 44s)
  • 14:37 moritzm: installing apache updates on silver
  • 14:16 zfilipin@tin: Synchronized static/images/project-logos/: SWAT: Make maiwikimedias logo a little bit bigger (T170922) (duration: 00m 43s)
  • 14:10 zfilipin@tin: Synchronized wmf-config/throttle.php: SWAT: New throttle rule (T170844) (duration: 00m 43s)
  • 14:07 zfilipin@tin: Synchronized static/images/project-logos/enwikiquote.png: SWAT: Update enwikiquotes logo (T170722) (duration: 00m 43s)
  • 14:01 zeljkof: continuing with EU SWAT
  • 13:51 zfilipin@tin: Synchronized wmf-config/InitialiseSettings.php: SWAT: Provide HD logos for several Wikiquotes (T150618) (duration: 00m 43s)
  • 13:50 ema: codfw cache_text/upload: upgrade to varnish 4.1.7-1wm1 and reboot for kernel updates
  • 13:48 zfilipin@tin: Synchronized static/images/project-logos: SWAT: Provide HD logos for several Wikiquotes (T150618) (duration: 00m 44s)
  • 13:19 zfilipin@tin: Synchronized wmf-config/InitialiseSettings-labs.php: SWAT: Enable mobile non-JavaScript editing on all MobileFrontend wikis (T125174) (duration: 00m 43s)
  • 13:18 zfilipin@tin: Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable mobile non-JavaScript editing on all MobileFrontend wikis (T125174) (duration: 00m 43s)
  • 13:16 zfilipin@tin: Synchronized wmf-config/mobile.php: SWAT: Enable mobile non-JavaScript editing on all MobileFrontend wikis (T125174) (duration: 00m 44s)
  • 12:32 marostegui: Run maintain-views on labsdb1001,1003,1009,1010 and 1011 - T168788
  • 12:10 akosiaris: remove oresrdb.svc.eqiad.wmnet in scb1001's /etc/hosts, but do not restart/reload uwsgi-ores and ores-celery-worker
  • 11:56 akosiaris: add oresrdb.svc.eqiad.wmnet in scb1001's /etc/hosts, restart uwsgi-ores and ores-celery-worker
  • 11:25 ema: powercycle cp3034, not rebooting properly
  • 11:00 ema: lvs200[12] upgrade pybal to 1.13.9 T82747 T154759
  • 10:59 ema: lvs200[45] upgrade pybal to 1.13.9 T82747 T154759
  • 10:54 ema: lvs400[12] upgrade pybal to 1.13.9 T82747 T154759
  • 10:52 ema: lvs400[34] upgrade pybal to 1.13.9 T82747 T154759
  • 10:43 ema: lvs100[12] upgrade pybal to 1.13.9 T82747 T154759
  • 10:33 moritzm: rebooting oresrdb1002 for kernel update
  • 10:32 ema: lvs100[45] upgrade pybal to 1.13.9 T82747 T154759
  • 10:28 moritzm: rebooting oresrdb2002 for kernel update
  • 09:54 ema: esams cache_text/upload: upgrade to varnish 4.1.7-1wm1 and reboot for kernel updates
  • 09:29 ema: cp3030: upgrade to varnish 4.1.7-1wm1 and reboot for kernel update
  • 09:15 ema: lvs300[12] upgrade pybal to 1.13.9 T82747
  • 09:13 ema: lvs300[34] upgrade pybal to 1.13.9 T82747
  • 09:08 elukey: reboot conf1003 for kernel updates
  • 09:00 elukey: reboot conf1002 for kernel updates
  • 07:52 moritzm: upgrade wtp1001 to nodejs 6.11
  • 07:32 elukey: moved /home to /srv/home on stat1006 to free disk space (created symlink from /home -> /srv/home too) - T152712
  • 06:42 moritzm: upgrading restbase on the various test clusters to nodejs 6.11
  • 05:59 marostegui: Deploy alter table on s1 - db1065 - T166204
  • 05:59 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1065 - T166204 (duration: 00m 43s)
  • 05:41 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1073 - T166204 (duration: 00m 44s)
  • 04:22 Jamesofur: remove 2FA from NativeForeigner per T170911
  • 02:45 l10nupdate@tin: ResourceLoader cache refresh completed at Tue Jul 18 02:45:25 UTC 2017 (duration 6m 36s)
  • 02:38 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.9) (duration: 13m 25s)
  • 02:13 mutante: nitrogen/nihal - rm /usr/lib/ganglia/python_modules/postgresql.py ; rm /etc/ganglia/conf.d/* ; restart gmond (T169953)

2017-07-17

  • 23:46 dzahn@neodymium: conftool action : set/pooled=yes; selector: name=mw2201.codfw.wmnet
  • 23:33 demon@tin: Synchronized wmf-config/InitialiseSettings.php: all wikis to minervaneue (duration: 00m 44s)
  • 23:19 demon@tin: Synchronized wmf-config/InitialiseSettings.php: testwiki to minervaneue (duration: 00m 44s)
  • 22:16 thcipriani@tin: Finished scap: Add missing Minerva skin description message key prep for MinervaNeue deployment (duration: 18m 57s)
  • 21:57 thcipriani@tin: Started scap: Add missing Minerva skin description message key prep for MinervaNeue deployment
  • 21:21 mutante: ocg1001 - Type: General Protection Fault (13) Source: Software (UEFI0011) - depooled
  • 21:20 dzahn@neodymium: conftool action : set/pooled=no; selector: name=ocg1001.eqiad.wmnet
  • 21:19 mutante: ocg1001 - dead - "Exception Inside the Exception Handler"
  • 21:18 mutante: powercycling ocg1001 which went down and had no console output at all
  • 21:13 eileen1: update CiviCRM from 15831ac to 2de7f2a
  • 21:04 mutante: mw2202 - renew puppet cert that was accidentally revoked
  • 21:01 mutante: mw2201 - revoke old puppet cert, salt key, accept/sign new cert and key, initial puppet run .. T170307
  • 20:52 eileen1: revision for civicrm changed...
  • 20:39 eileen1: update civicrm from 8840b94 to e4824fb
  • 20:33 mutante: mw2201 - reinstalling OS after mainboard replacement (network interfaces became eth2/eth3 from eth0/eth1 so ferm failed etc) - T170307
  • 20:26 thcipriani@tin: rebuilt wikiversions.php and synchronized wikiversions files: all wikis to 1.30.0-wmf.9
  • 20:19 thcipriani@tin: Synchronized php-1.30.0-wmf.9/extensions/RelatedArticles: Add limit via ResourceLoaderGetConfigVars (duration: 02m 38s)
  • 19:26 thcipriani@tin: Synchronized php-1.30.0-wmf.9/extensions/CirrusSearch: Add PoolCounter specifically for morelike T170648 (duration: 03m 02s)
  • 18:44 mutante: rebooting mw2201 for MAC address change
  • 18:36 niharika29@tin: Synchronized php-1.30.0-wmf.9/resources/src/mediawiki.rcfilters/: RCFilters: Allow experimental live update feature to be enabled with query string parameter https://gerrit.wikimedia.org/r/#/c/365413/ (duration: 02m 51s)
  • 18:20 mobrovac@tin: Finished deploy [restbase/deploy@f5ca520]: Activate dinwiki support (duration: 07m 39s)
  • 18:16 niharika29@tin: Synchronized wmf-config/PoolCounterSettings.php: Configure CirrusSearch-MoreLike pool counter [mediawiki-config] - https://gerrit.wikimedia.org/r/365406 (T170648) (duration: 02m 54s)
  • 18:12 mobrovac@tin: Started deploy [restbase/deploy@f5ca520]: Activate dinwiki support
  • 18:07 mobrovac@tin: Finished deploy [changeprop/deploy@f80c333]: (no justification provided) (duration: 01m 17s)
  • 18:06 mobrovac@tin: Started deploy [changeprop/deploy@f80c333]: (no justification provided)
  • 17:39 mobrovac@tin: Finished deploy [restbase/deploy@f5ca520]: Bringing restbase2001 up to date (duration: 01m 21s)
  • 17:38 mobrovac@tin: Started deploy [restbase/deploy@f5ca520]: Bringing restbase2001 up to date
  • 16:41 _joe_: trying to revive pdfrender on scb1002, the usual bug with its restarts
  • 16:12 dzahn@neodymium: conftool action : set/pooled=no; selector: name=mw2201.codfw.wmnet
  • 15:47 marostegui: Stop MySQL on pc2006 - T170520
  • 15:35 ema: restart pybal on lvs100[36] T169765
  • 15:32 jynus: starting table compressing at db2072 (lag is possible)
  • 15:29 zeljkof: EU SWAT finished! (updateCollation.php still running in the background)
  • 15:21 zfilipin@tin: Synchronized wmf-config/InitialiseSettings.php: SWAT: Set collation for Romanian wikis to uca-ro-u-kn (T168711) (duration: 00m 47s)
  • 15:07 zfilipin@tin: Synchronized wmf-config/InitialiseSettings.php: SWAT: Allow uploads to autoconfirmed-only at huwiki (T169438) (duration: 00m 47s)
  • 15:03 moritzm: uploaded Linux 4.9.30-2+deb9u2 backport to jessie-wikimedia
  • 14:58 ema: restart pybal on lvs100[12] T169765
  • 14:57 ema: restart pybal on lvs100[45] T169765
  • 14:56 marostegui: Deploy, manually, alter tables on enwiki on db1047 - T166204
  • 14:47 zfilipin@tin: Synchronized static/images/: SWAT: Run optipng -o7 at all PNGs (T170569) (duration: 00m 47s)
  • 14:46 elukey: reboot conf1001 for kernel updates
  • 14:39 Dereckson: Created account "Biplab Anand" at bureaucrat level on mai.wikimedia (T168782)
  • 14:34 andrewbogott: changing nodepool rate to '6' and restarting nodepool
  • 14:34 marostegui: Run maintain-views on labsdb1009,10 and 11 for s6 - T153743
  • 14:13 zfilipin@tin: Synchronized wmf-config/InitialiseSettings.php: SWAT: Add HD logos for several Wiktionaries (T150618) (duration: 00m 46s)
  • 14:11 zfilipin@tin: Synchronized static/images/project-logos/: SWAT: Add HD logos for several Wiktionaries (T150618) (duration: 00m 49s)
  • 14:11 ottomata: decommissioning rcs100[12] to spare::system: T170157
  • 14:02 zeljkof: Extending EU SWAT
  • 13:59 zfilipin@tin: Synchronized wmf-config/InitialiseSettings.php: SWAT: Provide HD logos for several Wikiversities (T150618) (duration: 00m 46s)
  • 13:58 zfilipin@tin: Synchronized static/images/project-logos: SWAT: Provide HD logos for several Wikiversities (T150618) (duration: 00m 47s)
  • 13:46 zfilipin@tin: Synchronized wmf-config/InitialiseSettings.php: SWAT: Provide HD logos for several Wikipedias (T150618) (duration: 00m 46s)
  • 13:43 marostegui: Deploy alter table on s4 - dbstore1001 - T168661
  • 13:37 zfilipin@tin: scap failed: average error rate on 1/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/3888cca979647b9381a7739b0bdbc88e for details)
  • 13:36 zfilipin@tin: Synchronized static/images/project-logos/: SWAT: Provide HD logos for several Wikipedias (T150618) (duration: 00m 48s)
  • 11:35 marostegui: Deploy alter table on s4 - dbstore1002 - T168661
  • 11:33 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1056 - T168661 (duration: 00m 46s)
  • 10:16 moritzm: installing apache updates on graphite hosts
  • 10:02 moritzm: installing apache updates on logstash
  • 10:01 moritzm: installing apache updates on otrs.wikimedia.org
  • 09:48 jynus@tin: Synchronized wmf-config/db-codfw.php: Repool db2062 (duration: 00m 47s)
  • 09:24 marostegui: Disable puppet on labsdb1010 for maintenance - T153743
  • 09:22 marostegui: Stop replication on labsdb1009 and labsdb1010 for maintenance - T153743
  • 09:05 marostegui: Disable puppet on labsdb1009 for maintenance - T153743
  • 08:28 akosiaris: reboot helium/heze for kernel upgrades
  • 08:23 marostegui: Deploy alter table s1 - labsdb1001 - T166204
  • 08:20 marostegui: Increase expire_logs_days on db1069:3311 from 7 to 14 temporarily - T166204
  • 08:17 ema: lvs100[39]: upgrade pybal to 1.13.9 T82747 T154759
  • 08:06 ema: lvs2003: upgrade pybal to 1.13.9 T82747 T154759
  • 07:57 ema@neodymium: conftool action : set/pooled=inactive; selector: name=wdqs1002.eqiad.wmnet
  • 07:55 akosiaris: upgrade nodejs to 6.11 on etherpad1001
  • 07:32 moritzm: updating ruthenium to nodejs 6.11
  • 07:12 marostegui: Stop slave s2 on db1102 for maintenance - T153743
  • 07:09 marostegui: Deploy alter table s4 - db1056 - T168661
  • 07:06 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1056 - T168661 (duration: 00m 46s)
  • 07:05 marostegui@tin: scap failed: average error rate on 1/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/3888cca979647b9381a7739b0bdbc88e for details)
  • 07:00 marostegui: Rename labsdb1011 main replication thread to a specific one - T153743
  • 06:50 marostegui: Stop replication on db1095 for maintenance - T153743
  • 06:48 marostegui: Deploy alter table on s1 - db1073 - T166204
  • 06:47 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1073 - T166204 (duration: 01m 04s)
  • 05:21 marostegui: Add 50G to /srv on db1069
  • 05:09 marostegui: Restart MySQL on labsdb1009 for maintenance - T170657
  • 03:13 l10nupdate@tin: ResourceLoader cache refresh completed at Mon Jul 17 03:13:51 UTC 2017 (duration 7m 16s)
  • 03:06 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.9) (duration: 12m 48s)
  • 02:31 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.7) (duration: 09m 13s)

2017-07-15

  • 10:42 elukey: puppetdb restarted on nitrogen, puppet agents re-enabled - T170740
  • 10:06 akosiaris: disable puppet on the entire fleet for puppetdb debugging on nitrogen
  • 02:53 mutante: servermon - switched to netmon1003 backend (jessie ganeti)
  • 02:46 mutante: netmon1001 - stopping "make_updates" cron , migration to netmon1003, flipping cache::backend to netmon1003 (T170653)

2017-07-14

  • 21:48 mutante: netmon1003 - reinstalled with jessie - saw nothing on ganeti console at all which was a bit confusing, but install finished anyways - adding to puppet / signing cert (T170655)
  • 20:47 bblack: mailbox lag: restarting cp1074 backend
  • 19:50 mutante: wikitech-static: re-enabled HSTS - line was commented out in Apache config, activated it again
  • 18:54 herron: added exim from/subject filter for spam observed from qq.com - T170601
  • 16:36 herron: lowered mailman/lists spam_score exim acl to 6 - T170601
  • 11:41 marostegui: Add 50G to /srv/ on dbstore1002 - T168303
  • 11:35 jynus: stop db2062 and db2072 for cloning
  • 10:43 jynus: altering wmde_analytics_betafeature_users_today table to ENGINE=InnoDB
  • 10:17 jynus@tin: Synchronized wmf-config/db-codfw.php: Depool db2062 (duration: 00m 47s)
  • 09:57 moritzm: uploaded nodejs_6.11.0~dfsg-1+wmf to apt.wikimedia.org (for jessie and stretch) (T170548)
  • 07:22 marostegui: Stop replication on labsdb1011 for maintenance - T153743
  • 06:59 marostegui: Create views for dinwiki on labsdb1009, 1010 and 1011 - T169193
  • 05:54 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1072 - T166204 (duration: 00m 46s)
  • 04:21 mutante: netmon1002/netmon2001 - change UID/GID for rancid to universal 445/445, use find -exec to chown existing files, for unmessy data syncing, define UID on wikitech page UID (T166180)

2017-07-13

  • 23:47 thcipriani@tin: rebuilt wikiversions.php and synchronized wikiversions files: revert all wikis to php-1.30.0-wmf.9, again
  • 23:19 thcipriani@tin: rebuilt wikiversions.php and synchronized wikiversions files: all wikis to php-1.30.0-wmf.9
  • 23:08 thcipriani@tin: rebuilt wikiversions.php and synchronized wikiversions files: revert all wikis to php-1.30.0-wmf.9
  • 22:57 thcipriani@tin: rebuilt wikiversions.php and synchronized wikiversions files: all wikis to 1.30.0-wmf.9
  • 22:04 bd808: Stashbot working after backend ElasticSearch cluster upgrade
  • 21:31 robh@puppetmaster1001: conftool action : set/pooled=no; selector: name=wdqs2001.codfw.wmnet
  • 21:34 demon@tin: Locking from deployment [operations/mediawiki-config]: Nobody use this (planned duration: 60m 00s)
  • 21:36 demon@tin: Unlocked for deployment [operations/mediawiki-config]: Nobody use this (duration: 01m 23s)
  • 21:28 robh@puppetmaster1001: conftool action : set/pooled=yes; selector: name=wdqs2002.codfw.wmnet
  • 21:28 robh@puppetmaster1001: conftool action : set/pooled=yes; selector: name=wdqs2003.codfw.wmnet
  • 20:56 demon@tin: Synchronized wmf-config/InitialiseSettings.php: MinervaNeue on testwiki (duration: 00m 47s)
  • 20:01 smalyshev@tin: Finished deploy [wdqs/wdqs@a32dbeb]: Redeploy GUI due to breakage in T165228 (duration: 02m 19s)
  • 19:59 smalyshev@tin: Started deploy [wdqs/wdqs@a32dbeb]: Redeploy GUI due to breakage in T165228
  • 18:39 dzahn@neodymium: conftool action : set/pooled=yes; selector: name=mw2202.codfw.wmnet
  • 18:38 dzahn@neodymium: conftool action : set/pooled=yes; selector: name=mw2201.codfw.wmnet
  • 18:31 arlolra: Updated Parsoid to 71c07681 (T169293)
  • 18:29 bblack: upgrading nginx on +wmf1 hosts: conf[1001-1003].eqiad.wmnet,cp1048.eqiad.wmnet,cp3036.esams.wmnet,elastic2020.codfw.wmnet,hassaleh.codfw.wmnet,hassium.eqiad.wmnet
  • 18:22 arlolra@tin: Finished deploy [parsoid/deploy@d0041f2]: Updating Parsoid to 71c07681 (duration: 11m 12s)
  • 18:11 arlolra@tin: Started deploy [parsoid/deploy@d0041f2]: Updating Parsoid to 71c07681
  • 17:46 volans: re-enabling puppet and force run on 'R:Package = nginx-common'
  • 17:38 bblack: restarting varnish-be on cp1049 (mailbox lag)
  • 17:36 bblack: restarting puppetmasters, staggered
  • 17:06 volans: disabled puppet on nitrogen
  • 16:34 chasemp: labstore2001:~# systemctl disable lvm2-activation && systemctl disable lvm2-activation-early && systemctl reset-failed (slated to be reimaged by madhu -- this alert is non-actionable)
  • 16:19 urandom: Starting cassandra-a, restbase2007 (OOM)
  • 16:03 dzahn@neodymium: conftool action : set/pooled=no; selector: name=mw2202.codfw.wmnet
  • 16:03 dzahn@neodymium: conftool action : set/pooled=no; selector: name=mw2201.codfw.wmnet
  • 15:33 marostegui: Deploy alter table on s1 - labsdb1009 - T166204
  • 15:14 ejegg: updated civicrm from 0aa0f8f to 8840b94
  • 14:49 marostegui: Skip maiwikimedia database creation which is breaking dbstore2001 replication - T168788
  • 14:42 godog: roll-restart cassandra in services-test to pick up renewed certs
  • 14:21 marostegui: Run redact_sanitarium on db1069 and db1095 for maiwikimedia - T168788
  • 14:19 Reedy: `mwscript extensions/WikimediaMaintenance/createExtensionTables.php --wiki=maiwikimedia Translate` for T168782
  • 14:13 moritzm: rebooting graphite1* for kernel update
  • 13:11 zeljkof: EU SWAT finished
  • 13:09 zfilipin@tin: Synchronized wmf-config/InitialiseSettings.php: SWAT: Add enwiki as import source for specieswiki (T170094) (duration: 00m 47s)
  • 12:52 moritzm: installing nginx security updates on cp1*
  • 12:12 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1053 - T168661 (duration: 01m 03s)
  • 11:40 akosiaris: stop ircecho, icinga is misbehaving badly, no point in having it around
  • 11:28 akosiaris: restart icinga, it's reporting wrong stuff all over the place
  • 10:35 moritzm: installing nginx security updates on cp3*
  • 10:09 moritzm: rebooting graphite2* for kernel update
  • 10:04 ema: lvs[12]006: upgrade pybal to 1.13.9 T82747 T154759
  • 09:43 ema: lvs1010: upgrade pybal to 1.13.9 T82747 T154759
  • 09:41 ema: pybal 1.13.9 uploaded to apt.w.o
  • 09:21 moritzm: installing nginx security updates on cp2*
  • 08:32 moritzm: enabling jobrunner/jobchron on mw1260 (jessie video scaler)
  • 08:19 godog: upgrade grafana to 4.4.1 on krypton - T169773
  • 08:11 jynus: powercycle pc2006, was down
  • 08:09 jmm@puppetmaster1001: conftool action : set/pooled=yes; selector: mw1260.eqiad.wmnet
  • 07:58 marostegui: Deploy alter table on s4 - db1053 - T168661
  • 07:48 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1053 - T168661 (duration: 00m 47s)
  • 07:33 moritzm: rebooting netmon1001
  • 06:57 moritzm: installing apache security updates on remaining mw1* hosts
  • 06:51 moritzm: installing nginx security updates on cp4*
  • 06:51 marostegui: Manually deploy some alter tables on dbstore1001 for enwiki - T166204
  • 06:42 _joe_: rolling restart of pybal on low-traffic balancers
  • 06:14 XioNoX: restricting ssh algorithms on network devices - T170369
  • 06:11 moritzm: fixed salt setup for reimaged stat1006
  • 03:09 l10nupdate@tin: ResourceLoader cache refresh completed at Thu Jul 13 03:09:35 UTC 2017 (duration 7m 8s)
  • 03:02 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.9) (duration: 07m 57s)
  • 02:34 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.7) (duration: 09m 54s)
  • 02:26 mutante: labtestpuppetmaster2001 - flapping icinga alerts about salt-minion starting and stopping constantly - there is an accepted salt-key but it was rejected by the master, server was reinstalled but still old key - deleted old key, accepted new key (T167157)

2017-07-12

  • 23:28 thcipriani@tin: Synchronized php-1.30.0-wmf.9/extensions/CirrusSearch/resources/ext.cirrus.explore-similar.js: SWAT: Adding full URLs to Explore Similar API calls T149809 T164856 (duration: 00m 47s)
  • 23:08 thcipriani@tin: Synchronized wmf-config/InitialiseSettings.php: SWAT: Index deletes everywhere T163235 (duration: 00m 47s)
  • 22:08 mobrovac@tin: Finished deploy [recommendation-api/deploy@d5076c2]: (no justification provided) (duration: 01m 50s)
  • 22:06 mobrovac@tin: Started deploy [recommendation-api/deploy@d5076c2]: (no justification provided)
  • 21:54 andrewbogott: restarting nodepool to pick up a config change
  • 21:42 mobrovac@tin: Finished deploy [recommendation-api/deploy@ca816ac]: (no justification provided) (duration: 02m 24s)
  • 21:40 mobrovac@tin: Started deploy [recommendation-api/deploy@ca816ac]: (no justification provided)
  • 21:29 demon@tin: Synchronized wmf-config/mobile.php: MinervaNeue config (duration: 00m 46s)
  • 21:28 demon@tin: Synchronized wmf-config/InitialiseSettings-labs.php: MinervaNeue config (duration: 00m 46s)
  • 21:27 demon@tin: Synchronized wmf-config/InitialiseSettings.php: MinervaNeue config (duration: 00m 47s)
  • 21:22 demon@tin: Finished scap: Rebuilding l10n cache for new skin (duration: 38m 47s)
  • 20:43 demon@tin: Started scap: Rebuilding l10n cache for new skin
  • 20:16 bsitzmann@tin: Finished deploy [mobileapps/deploy@3f90bf1]: Update mobileapps to d30dae2 (T169930, T170225) (duration: 05m 00s)
  • 20:11 bsitzmann@tin: Started deploy [mobileapps/deploy@3f90bf1]: Update mobileapps to d30dae2 (T169930, T170225)
  • 19:37 XioNoX: adding ignore-l3-incompletes to all peering/transit interfaces - T163542
  • 19:27 thcipriani@tin: Synchronized php: promote php symlink group1 wikis to 1.30.0-wmf.9 (duration: 00m 45s)
  • 19:25 thcipriani@tin: rebuilt wikiversions.php and synchronized wikiversions files: group1 wikis to 1.30.0-wmf.9
  • 19:10 demon@tin: Synchronized php-1.30.0-wmf.9/skins/MinervaNeue/: Latest code (duration: 00m 47s)
  • 19:09 demon@tin: Synchronized php-1.30.0-wmf.7/skins/MinervaNeue/: Latest code (duration: 00m 48s)
  • 18:16 mobrovac@tin: Finished deploy [recommendation-api/deploy@7fd10f2]: Use the domain parameter as the target language - T170439 (duration: 00m 40s)
  • 18:15 mobrovac@tin: Started deploy [recommendation-api/deploy@7fd10f2]: Use the domain parameter as the target language - T170439
  • 18:15 andrewbogott: depooling labvirt1015, deleting a bunch of stuck contintcloud instances
  • 18:14 demon@tin: Synchronized static/images/project-logos/: Fixing srwikiquote logos (duration: 00m 48s)
  • 17:52 chasemp: labnodepool1001:~# sudo puppet agent --enable
  • 17:29 chasemp: labnodepool1001:~# service nodepool stop
  • 17:21 _joe_: rolling restart of pybal on low-traffic LVS in eqiad,codfw
  • 17:19 chasemp: labnet1001:~# service nova-api restart
  • 17:15 chasemp: labcontrol1001:~# service rabbitmq-server restart
  • 17:13 robh@puppetmaster1001: conftool action : set/pooled=no; selector: name=wdqs2002.codfw.wmnet
  • 17:13 robh@puppetmaster1001: conftool action : set/pooled=no; selector: name=wdqs2003.codfw.wmnet
  • 17:07 moritzm: upgrading nginx on mwdebug*
  • 16:52 godog: roll-restart and upgrade thumbor in eqiad
  • 16:47 oblivian@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=recommendation-api
  • 16:36 oblivian@puppetmaster1001: conftool action : set/pooled=yes; selector: service=recommendation-api,dc=eqiad
  • 16:15 oblivian@puppetmaster1001: conftool action : set/pooled=yes; selector: service=recommendation-api,dc=codfw
  • 16:14 ema: downgrade pybal to 1.13.6 on lvs1010 T82747 T154759 (1.13.7 throwing exceptions)
  • 16:09 godog: upload thumbor 1.0-1 to install1002
  • 16:06 mobrovac@tin: Finished deploy [recommendation-api/deploy@eb2fef3]: (no justification provided) (duration: 00m 33s)
  • 16:06 ema: lvs1006, lvs1010: upgrade pybal to 1.13.7 T82747 T154759
  • 16:05 mobrovac@tin: Started deploy [recommendation-api/deploy@eb2fef3]: (no justification provided)
  • 15:57 mobrovac@tin: Finished deploy [recommendation-api/deploy@ed41fc4]: Initial deploy on canary scb2001, take #3 - T165760 (duration: 00m 46s)
  • 15:56 mobrovac@tin: Started deploy [recommendation-api/deploy@ed41fc4]: Initial deploy on canary scb2001, take #3 - T165760
  • 15:52 mobrovac@tin: Finished deploy [recommendation-api/deploy@ed41fc4]: Initial deploy on canary scb2001, take #2 - T165760 (duration: 00m 06s)
  • 15:52 mobrovac@tin: Started deploy [recommendation-api/deploy@ed41fc4]: Initial deploy on canary scb2001, take #2 - T165760
  • 15:51 mobrovac@tin: Finished deploy [recommendation-api/deploy@ed41fc4]: Initial deploy on canary scb2001 - T165760 (duration: 00m 15s)
  • 15:51 mobrovac@tin: Started deploy [recommendation-api/deploy@ed41fc4]: Initial deploy on canary scb2001 - T165760
  • 15:07 ema: lvs2006: upgrade pybal to 1.13.7 T82747 T154759
  • 14:56 marostegui: Run redact_sanitarium on db1069 and db1095 for maiwikimedia - T168788
  • 14:55 marostegui: Run redact_sanitarium on db1069 and db1095 for maiwikimedia - T169510
  • 14:37 moritzm: installing apache security updates on californium / horizon.wikimedia.org
  • 14:28 addshore@tin: Synchronized wmf-config/extension-list-labs: Add Newsletter to extension-list PT1/2 (duration: 00m 46s)
  • 14:27 addshore@tin: Synchronized wmf-config/extension-list: Add Newsletter to extension-list PT1/2 (duration: 00m 47s)
  • 14:17 jynus: restarting labsdb1005 (toolsdb)
  • 14:14 madhuvishy: Disable icinga notifications and event handler checks for labsdb1005
  • 14:07 marostegui: Run redact_sanitarium on db1069 for dinwiki - T169193
  • 13:23 marostegui: Deploy alter table s1 - db1072 - T166204
  • 13:21 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1072 - T166204 (duration: 00m 46s)
  • 13:17 zeljkof: EU SWAT finished!
  • 13:14 zfilipin@tin: Synchronized wmf-config/Wikibase-production.php: SWAT: Temporarily set $wgPropertySuggesterClassifyingPropertyIds to [ 31 ]. (T169058) (duration: 00m 46s)
  • 13:01 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1066 - T166204 (duration: 00m 46s)
  • 12:54 dereckson@tin: Synchronized wmf-config/interwiki.php: Interwiki map update (duration: 00m 46s)
  • 12:53 marostegui: Deploy alter table s1 - db1095 - T166204
  • 12:26 moritzm: reimage mw1260 (video scaler) to jessie
  • 12:20 dereckson@tin: Synchronized wmf-config/InitialiseSettings.php: Logos for mai.wikimedia (T168782) (duration: 00m 46s)
  • 12:18 dereckson@tin: Synchronized static/images/project-logos: Logos for mai.wikimedia (T168782) (duration: 00m 46s)
  • 12:14 godog: upgrade nginx on thumbor and prometheus machines
  • 12:04 dereckson@tin: Synchronized multiversion/MWMultiVersion.php: +mai.wikimedia new subdomain (duration: 00m 46s)
  • 12:03 dereckson@tin: Synchronized wmf-config/InitialiseSettings.php: Initial configuration for mai.wikimedia (T168782) (duration: 00m 46s)
  • 12:02 dereckson@tin: rebuilt wikiversions.php and synchronized wikiversions files: +maiwiki
  • 12:01 dereckson@tin: Synchronized dblists: +maiwikimedia (duration: 00m 46s)
  • 11:57 Dereckson: Run add wiki maintenance script for maiwikimedia database / mai.wikimedia.org (T168782)
  • 11:52 moritzm: installing apache security updates on mw*
  • 11:34 Dereckson: Run add wiki maintenance script for dinwiki database / din.wikipedia.org (T168518)
  • 11:28 moritzm: installing spice security updates
  • 11:21 dereckson@tin: Synchronized wmf-config/InitialiseSettings.php: Initial configuration for din.wikipedia (thanks Urbanecm) (T168518) (duration: 00m 46s)
  • 11:20 dereckson@tin: Synchronized langlist: +din (duration: 00m 46s)
  • 11:18 dereckson@tin: rebuilt wikiversions.php and synchronized wikiversions files: +dinwiki
  • 11:11 moritzm: installing tomcat security updates
  • 11:04 dereckson@tin: Synchronized dblists: Create din.wikipedia (duration: 00m 49s)
  • 10:54 moritzm: installing nginx updates on ms1001/dataset1001
  • 10:46 _joe_: running namespaceDupes.php on eswiki, T170176
  • 10:45 moritzm: upgrading nginx on meitnerium/archiva.wikimedia.org
  • 10:31 jynus: stopping all mysql instances on dbstore2002 and doing an in-place upgrade
  • 10:04 moritzm: installing nginx security updates on mw* canaries
  • 09:19 oblivian@puppetmaster1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=(citoid|restbase-async)
  • 08:58 moritzm: uploading nginx 1.11.10-1+wmf3 for jessie-wikimedia/stretch-wikimedia
  • 08:32 XioNoX: asw-b-codfw back up - T169345
  • 08:20 XioNoX: restarting asw-b-codfw for upgrade
  • 08:11 marostegui: Stop MySQL on db2033 (x1) - T169510
  • 07:54 oblivian@puppetmaster1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=(citoid|restbase-async)
  • 07:39 XioNoX: depooled codfw for T169345
  • 07:34 marostegui: Rename table migrateuser_medium on db1094 and db1079 - T170310
  • 07:29 marostegui: Drop table localisation_file_hash from testwiki and drop database l10nwiki on s3 - T119811
  • 07:27 marostegui: Drop table localisation_file_hash from enwiki - T119811
  • 07:04 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1064 - T168661 (duration: 00m 44s)
  • 06:49 _joe_: saved the current state of mediawiki-staging (in detached head) in the branch "wtf-live"; saved what is in master on tin in "wtf-master"; reset master to the latest commit in origin/master
  • 03:10 l10nupdate@tin: ResourceLoader cache refresh completed at Wed Jul 12 03:10:42 UTC 2017 (duration 6m 54s)
  • 03:03 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.9) (duration: 13m 57s)
  • 02:29 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.7) (duration: 09m 25s)

2017-07-11

  • 23:25 dereckson@tin: Synchronized wmf-config/CommonSettings.php: Config changes for LoginNotify (T107707) (duration: 00m 47s)
  • 21:20 bblack: varnish backend restart on cp1072 (mailbox lag)
  • 20:25 thcipriani@tin: rebuilt wikiversions.php and synchronized wikiversions files: group0 to 1.30.0-wmf.9
  • 20:23 thcipriani@tin: Synchronized wmf-config/CommonSettings.php: Can't use NS_MODULE constant T170317 (duration: 00m 43s)
  • 19:54 thcipriani@tin: rebuilt wikiversions.php and synchronized wikiversions files: revert group0 to 1.30.0-wmf.9
  • 19:53 thcipriani@tin: rebuilt wikiversions.php and synchronized wikiversions files: group0 to 1.30.0-wmf.9
  • 19:32 cmjohnson1: powering off mw1199 to reset idrac
  • 19:29 thcipriani@tin: Finished scap: testwiki to php-1.30.0-wmf.9 and rebuild l10n cache (duration: 25m 11s)
  • 19:17 paravoid: shutting down sodium for iDRAC reset (T169360)
  • 19:17 ejegg: updated payments-wiki from f935c06 to bdc5226
  • 19:04 thcipriani@tin: Started scap: testwiki to php-1.30.0-wmf.9 and rebuild l10n cache
  • 18:47 dzahn@neodymium: conftool action : set/pooled=yes; selector: name=mw2154.codfw.wmnet
  • 18:43 thcipriani@tin: Pruned MediaWiki: 1.30.0-wmf.6 [keeping static files] (duration: 06m 28s)
  • 18:28 dcausse: T169498: elastic@eqiad huge but short load spike on 24+ nodes (despite the workaround on token_count_router deployed)
  • 18:27 dzahn@neodymium: conftool action : set/pooled=no; selector: name=mw2154.codfw.wmnet
  • 18:25 mutante: mw2154 - depool for attempting IPMI fix
  • 18:17 mutante: mw2202 - repooled
  • 18:09 mutante: mw2201 - repooled
  • 17:20 mutante: mw2201, mw2202 - depool appservers for T169360 (drain flea power)
  • 17:19 thcipriani: starting branch cut for 1.30.0-wmf.9 T167893
  • 16:32 bblack: restarting varnish backend on cp1074 (mailbox lag)
  • 16:16 tzatziki: Removed 2FA for Arsog1985 SUL account (T168779)
  • 15:56 dcausse: restarting elastic on relforge100*.eqiad.wmnet to pickup a new version of the ltr plugin
  • 15:36 moritzm: rolling restart of thumbor to pick up tiff and expat security updates
  • 15:29 marostegui: Stop replication labsdb1010 for maintenance - T153743
  • 15:25 marostegui: Stop replication labsdb1009 for maintenance - T153743
  • 15:24 elukey: restart burrow on krypton
  • 15:21 moritzm: rebooting uranium for kernel update
  • 14:31 marostegui: Deploy alter table on db1064 - commonswiki and let it replicate to db1095 and labsdb1009, 1010 and 1011 - T168661
  • 14:28 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1064 - T168661 (duration: 00m 43s)
  • 14:26 moritzm: installing apache security updates on mw2*
  • 14:24 jmm@puppetmaster1001: conftool action : set/pooled=yes; selector: mw2118.codfw.wmnet
  • 14:21 jynus: rebooting labsdb1004 for kernel upgrade T168584
  • 14:16 jynus: upgrade wmf-mariadb10 on labsdb1004
  • 14:13 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Restore db1079 original weight (duration: 00m 42s)
  • 14:13 madhuvishy: Disable event handler icinga checks for labsdb1004
  • 14:10 madhuvishy: disabled icinga notifications for host and services for labsdb1004
  • 13:44 dereckson@tin: Synchronized php-1.30.0-wmf.7/extensions/EventLogging/modules/ext.eventLogging.subscriber.js: Don't subscribe EventLogging twice if window.onload fires twice (T170018) (duration: 00m 42s)
  • 13:35 Dereckson: Purged https://en.wikipedia.org/static/images/project-logos/eswikibooks.png
  • 13:29 dereckson@tin: Synchronized wmf-config/InitialiseSettings.php: High density logos for es.wikibooks (T170248, 2/2) (duration: 00m 42s)
  • 13:26 dereckson@tin: Synchronized static/images/project-logos/: High density logos for es.wikibooks (T170248, 1/2) (duration: 00m 43s)
  • 10:56 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1067 with 0 weight - T166204 (duration: 00m 41s)
  • 10:43 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Increase db1079 weight (duration: 00m 42s)
  • 10:37 akosiaris: enable puppet everywhere but on einsteinium (icinga host) for merge of https://gerrit.wikimedia.org/r/#/c/363295/5
  • 10:36 XioNoX: bump BFD timer from 300 to 600 on the eqiad-codfw link for T170131
  • 10:27 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1081 - T168661 (duration: 00m 42s)
  • 10:20 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Increase db1079 weight (duration: 00m 42s)
  • 10:11 marostegui: Drop table localisation_file_hash from commonswiki - T119811
  • 10:08 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Increase db1079 weight (duration: 00m 42s)
  • 09:54 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1079 with low weight - T153743 (duration: 00m 42s)
  • 09:41 moritzm: installing tiff security updates
  • 09:40 marostegui: Stop slave s6 on db1102 for exporting its content - T153743
  • 09:12 moritzm: reboot sarin for kernel update
  • 08:50 marostegui: Deploy alter table on s4 - db1081 - T168661
  • 08:49 moritzm: rebooting mw1169 for kernel update
  • 08:47 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1081 - T168661 (duration: 00m 42s)
  • 08:38 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1084 - T168661 (duration: 00m 42s)
  • 08:25 akosiaris: disable puppet everywhere but on einsteinium (icinga host) for merge of https://gerrit.wikimedia.org/r/#/c/363295/5
  • 08:24 akosiaris: disable puppet on einsteinium (icinga host) for merge of https://gerrit.wikimedia.org/r/#/c/363295/5
  • 08:18 marostegui: Drop localisation_file_hash table from dewiki (s5) - T119811
  • 07:57 marostegui: Drop localisation_file_hash table from frwiki and jawiki (s6) - T119811
  • 07:38 marostegui: Stop MySQL db1102 for maintenance - T153743
  • 07:35 volans: amending previous SAL, I meant ircecho ofc
  • 07:34 volans: bouncing icinga-wm (tcpircbot) on einsteinium to get back its primary nick
  • 07:26 marostegui: Stop MySQL on db1079 for maintenance - T153743
  • 07:15 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1079 - T153743 (duration: 00m 41s)
  • 07:07 marostegui: Deploy alter table db1084 - T168661
  • 07:06 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1084 - T168661 (duration: 00m 42s)
  • 06:58 marostegui: Deploy alter table on s1 - dbstore1002 - T166204
  • 06:56 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1059 - T168661 (duration: 00m 41s)
  • 05:14 marostegui: Deploy alter table on db1066 - T166204
  • 05:11 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1066 - T166204 (duration: 00m 43s)
  • 05:08 marostegui: Deploy alter table on enwiki - labsdb1011 - T166204
  • 02:32 l10nupdate@tin: ResourceLoader cache refresh completed at Tue Jul 11 02:32:17 UTC 2017 (duration 6m 39s)
  • 02:25 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.7) (duration: 08m 46s)
  • 00:07 legoktm: running mwscript refreshLinks.php --wiki=metawiki --namespace=2 on terbium (T145366)

2017-07-10

  • 23:25 thcipriani@tin: Synchronized php-1.30.0-wmf.7/extensions/Flow/Hooks.php: SWAT: Do not override other flags on enhanced recent changes T169181 (duration: 00m 42s)
  • 23:08 thcipriani@tin: Synchronized static/images/mobile/copyright/wikipedia-wordmark-sr.svg: SWAT Compress srlogo for Wikipedia T165896 (duration: 00m 43s)
  • 22:28 twentyafterfour: reloaded apache2 config on iridium to activate the changes from https://gerrit.wikimedia.org/r/#/c/363356/
  • 22:28 bawolff@tin: Synchronized php-1.30.0-wmf.7/extensions/CentralAuth/includes/specials/SpecialCentralAutoLogin.php: T134931 (duration: 00m 44s)
  • 22:13 MaxSem: Re-ran cleanupTitles.php on Meta with live fix applied, works now (ref T61837)
  • 22:08 reedy@tin: Synchronized php-1.30.0-wmf.7/includes/: (no justification provided) (duration: 01m 33s)
  • 22:00 bblack: restart varnish backend on cp1099 (mailbox lag)
  • 21:06 volans: running IPMI auditing to update status of T150160
  • 19:48 MaxSem: Running cleanupTitles.php on Meta
  • 19:14 niharika29@tin: Synchronized php-1.30.0-wmf.7/extensions/CirrusSearch/: Ignore archive records with null page_id (T169977) (duration: 00m 52s)
  • 19:09 niharika29@tin: Synchronized wmf-config/: Stop disabling MFTidyMobileViewSections (T168671) and Logo changes for various wiki projects (T165896) (duration: 00m 21s)
  • 19:07 niharika29@tin: Synchronized static/images/mobile/: Logo changes for various wiki projects [mediawiki-config] - https://gerrit.wikimedia.org/r/364241 (https://phabricator.wikimedia.org/T165896) (duration: 00m 20s)
  • 19:07 niharika29@tin: scap failed: average error rate on 2/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/3888cca979647b9381a7739b0bdbc88e for details)
  • 18:58 niharika29@tin: Synchronized wmf-config/InitialiseSettings.php: Logo and favicon changes for arbcom_dewiki (T166947) (duration: 00m 20s)
  • 18:57 niharika29@tin: Synchronized static/: Logo and favicon changes for arbcom_dewiki (T166947) (duration: 00m 20s)
  • 18:56 niharika29@tin: scap failed: average error rate on 2/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/3888cca979647b9381a7739b0bdbc88e for details)
  • 18:47 niharika29@tin: Synchronized wmf-config/InitialiseSettings.php: Add wgMetaNamespace / wgMetaNamespaceTalk for lv.wiktionary [mediawiki-config] - https://gerrit.wikimedia.org/r/364197 (T170065) (duration: 00m 20s)
  • 18:45 niharika29@tin: scap failed: average error rate on 1/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/3888cca979647b9381a7739b0bdbc88e for details)
  • 18:43 niharika29@tin: scap failed: average error rate on 1/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/3888cca979647b9381a7739b0bdbc88e for details)
  • 18:41 niharika29@tin: scap failed: average error rate on 1/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/3888cca979647b9381a7739b0bdbc88e for details)
  • 18:40 niharika29@tin: scap failed: average error rate on 1/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/3888cca979647b9381a7739b0bdbc88e for details)
  • 18:36 niharika29@tin: Synchronized wmf-config/InitialiseSettings.php: Logo updates for sr.wikiquote (T168444) (duration: 00m 40s)
  • 18:35 niharika29@tin: scap failed: average error rate on 1/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/3888cca979647b9381a7739b0bdbc88e for details)
  • 18:34 niharika29@tin: Synchronized static/images/project-logos: Logo updates for sr.wikiquote (T168444) (duration: 00m 40s)
  • 18:32 niharika29@tin: Synchronized static/images/project-logos/srwikiquote-1.5x.png: Logo updates for sr.wikiquote (T168444) (duration: 00m 41s)
  • 18:23 niharika29@tin: Synchronized wmf-config/InitialiseSettings.php: Remove Programs and Participation namespaces from meta.wikimedia [mediawiki-config] - https://gerrit.wikimedia.org/r/363745 (T61837) (duration: 00m 42s)
  • 18:22 niharika29@tin: scap aborted: wmf-config/InitialiseSettings.php Remove Programs and Participation namespaces from meta.wikimedia [mediawiki-config] - https://gerrit.wikimedia.org/r/363745 (T61837) (duration: 06m 19s)
  • 18:15 niharika29@tin: Started scap: wmf-config/InitialiseSettings.php Remove Programs and Participation namespaces from meta.wikimedia [mediawiki-config] - https://gerrit.wikimedia.org/r/363745 (T61837)
  • 17:55 otto@tin: Finished deploy [eventstreams/deploy@3d37f5d]: Redirect routes for RCStream deprecation (duration: 02m 41s)
  • 17:55 ottomata: disabling RCStream varnish routing: T170157
  • 17:52 otto@tin: Started deploy [eventstreams/deploy@3d37f5d]: Redirect routes for RCStream deprecation
  • 17:35 ejegg: updated payments-wiki from 8bdd706 to f935c06
  • 17:07 gehel@tin: Finished deploy [wdqs/wdqs@1b3b73e]: (no justification provided) (duration: 01m 42s)
  • 17:06 gehel@tin: Started deploy [wdqs/wdqs@1b3b73e]: (no justification provided)
  • 17:00 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1080 - T166204 (duration: 00m 42s)
  • 16:14 nuria@tin: Finished deploy [eventlogging/analytics@5e16da1]: (no justification provided) (duration: 00m 04s)
  • 16:14 nuria@tin: Started deploy [eventlogging/analytics@5e16da1]: (no justification provided)
  • 15:56 elukey@tin: Finished deploy [analytics/refinery@6da2774]: Update stat1002 with the last refinery deployment (duration: 00m 04s)
  • 15:55 elukey@tin: Started deploy [analytics/refinery@6da2774]: Update stat1002 with the last refinery deployment
  • 15:47 milimetric@tin: Finished deploy [analytics/refinery@6da2774]: Update Sqoop fix python error (duration: 00m 07s)
  • 15:47 milimetric@tin: Started deploy [analytics/refinery@6da2774]: Update Sqoop fix python error
  • 15:43 milimetric@tin: Finished deploy [analytics/refinery@6da2774]: Update Sqoop fix python error (duration: 01m 36s)
  • 15:42 milimetric@tin: Started deploy [analytics/refinery@6da2774]: Update Sqoop fix python error
  • 15:37 milimetric@tin: Finished deploy [analytics/refinery@6da2774]: Update Sqoop fix python error (duration: 02m 06s)
  • 15:35 milimetric@tin: Started deploy [analytics/refinery@6da2774]: Update Sqoop fix python error
  • 15:24 andrewbogott: adding two new hosts (labvirt1014 and labvirt1015) to the nova-compute scheduling pool. Possible nodepool side-effects, maybe good ones?
  • 15:14 marostegui: Drop ukwikimedia_p views from labsdb hosts - T169488
  • 15:00 moritzm: installing apache security updates on app server canaries
  • 14:40 andrewbogott: rebooting labvirt1015-1018 for kernel updates
  • 14:23 marostegui: Deploy alter table on db1059 - T168661
  • 14:22 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1059 - T168661 (duration: 00m 41s)
  • 14:20 marostegui@tin: scap failed: average error rate on 1/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/3888cca979647b9381a7739b0bdbc88e for details)
  • 14:14 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1091 - T168661 (duration: 00m 42s)
  • 14:13 moritzm: reimaging mw2118 (video scaler) to jessie
  • 14:12 TabbyCat: mwscript updateCollation.php --wiki=frwiktionary --previous-collation=uppercase is being run by zfilipin to finish T169810
  • 14:11 zeljkof: EU SWAT finished
  • 14:01 dcausse: elastic@eqiad unbanning elastic1018 & elastic1021
  • 13:49 chasemp: labstore2003:~# umount -fl /srv/backup/tools (for T169774 recovery)
  • 13:44 zfilipin@tin: Synchronized wmf-config/InitialiseSettings.php: SWAT: Set $wgCategoryCollation to uca-default for fr.wiktionary (T169810) (duration: 00m 42s)
  • 13:43 dcausse: elastic@eqiad banning elastic1018 & elastic1021 to rebalance heavy shards
  • 13:32 zfilipin@tin: Synchronized wmf-config/InitialiseSettings.php: SWAT: Add import sources for specieswiki (T170094) (duration: 00m 43s)
  • 13:29 moritzm: installing graphite2 security updates (image lib)
  • 13:28 marostegui: Disable puppet on db1102 to run check_private_data - T153743
  • 13:20 zfilipin@tin: Synchronized wmf-config/CirrusSearch-common.php: SWAT: [cirrus] Enable the token_count_router only for chinese (T169498) (duration: 00m 43s)
  • 13:06 milimetric@tin: Finished deploy [analytics/refinery@c22eb93]: Update Sqoop with better parallelism (duration: 02m 54s)
  • 13:04 milimetric@tin: Started deploy [analytics/refinery@c22eb93]: Update Sqoop with better parallelism
  • 12:58 marostegui: Run redact_sanitarium on s2 and s6 - db1102 - T153743
  • 12:26 marostegui@tin: Synchronized wmf-config/db-eqiad.php: db1079 to become master for sanitarium3 - T153743 (duration: 00m 41s)
  • 12:23 marostegui@tin: scap failed: average error rate on 2/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/3888cca979647b9381a7739b0bdbc88e for details)
  • 12:22 moritzm: installing xorg-server security updates
  • 12:22 marostegui@tin: scap failed: average error rate on 4/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/3888cca979647b9381a7739b0bdbc88e for details)
  • 12:12 marostegui: Deploy alter table on db1091 - T168661
  • 12:11 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1091 - T168661 (duration: 00m 42s)
  • 12:02 marostegui: Upgrade db1102 to 10.1 and enable rbr triggers - T153743
  • 12:02 moritzm: installing bind security updates (we only have client libs/tools installed)
  • 11:39 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Restore original weight for db1097 - T168661 (duration: 00m 42s)
  • 11:36 marostegui: Stop MySQL on db1102 for maintenance - T153743
  • 11:16 moritzm: installing libgcrypt and expat security updates
  • 11:16 kartik@tin: Finished deploy [cxserver/deploy@c209bec]: Update cxserver to 3375da5 (duration: 02m 49s)
  • 11:13 kartik@tin: Started deploy [cxserver/deploy@c209bec]: Update cxserver to 3375da5
  • 10:28 addshore: WMDE Summer campaign deploy slot DONE
  • 10:28 addshore@tin: Synchronized wmf-config/InitialiseSettings.php: WMDE Summer campaign - Add logging (fix spacing) NOOP (duration: 00m 43s)
  • 10:22 addshore@tin: Synchronized php-1.30.0-wmf.7/extensions/WikimediaEvents: WMDE Summer campaign - Add hook (duration: 00m 42s)
  • 10:21 addshore@tin: Synchronized php-1.30.0-wmf.7/extensions/CentralAuth: CentralAuth (undeployed patches) gerrit:363892, gerrit:363893, gerrit:363891 & revert gerrit:364182 T169261 (duration: 00m 47s)
  • 10:13 addshore: reverting https://gerrit.wikimedia.org/r/#/c/363891 as it is sitting on tin undeployed T169261
  • 09:59 moritzm: rebooting mc2* servers for kernel update
  • 09:54 addshore@tin: Synchronized wmf-config/InitialiseSettings.php: WMDE Summer campaign - Add logging (duration: 00m 45s)
  • 09:10 marostegui: Compress innodb on wikidata on dbstore2001
  • 09:00 moritzm: rebooting mw1168 (video scaler) for kernel update
  • 08:52 moritzm: rebooting mwlog2001 for kernel update
  • 08:35 moritzm: rebooting ms1001 for kernel update
  • 08:29 moritzm: rebooting francium for kernel update
  • 08:17 marostegui: Deploy alter table on db1097 - T168661
  • 08:17 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1097 - T168661 (duration: 00m 46s)
  • 08:16 marostegui@tin: scap failed: average error rate on 2/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/3888cca979647b9381a7739b0bdbc88e for details)
  • 08:03 marostegui: Drop database l10nwiki on s2 - T119811
  • 07:53 moritzm: rebooting hafnium for kernel update
  • 07:18 oblivian@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=cp3009.*
  • 07:13 moritzm: reboot netmon1001 for kernel update
  • 06:11 marostegui: Deploy alter table on s1 - db1080 and db1067 - T166204
  • 06:00 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1080, depool db1067 - T166204 (duration: 00m 42s)
  • 05:59 marostegui@tin: scap failed: average error rate on 1/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/3888cca979647b9381a7739b0bdbc88e for details)
  • 02:27 l10nupdate@tin: scap failed: average error rate on 2/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/3888cca979647b9381a7739b0bdbc88e for details)

2017-07-08

  • 22:14 bd808: Deleted ukwikimedia records in CentralAuth localuser and localnames tables for T170005.

2017-07-07

  • 21:54 legoktm@tin: Synchronized php-1.30.0-wmf.7/extensions/CentralAuth/: Fix handling of password hash upgrade on login - T169261 (duration: 00m 45s)
  • 21:52 demon@tin: Synchronized wmf-config/interwiki.php: Updating interwiki cache, T169979 (duration: 00m 43s)
  • 15:07 marostegui: Stop MySQL on db1102 for MariaDB upgrade
  • 15:00 dcausse: deleting commonswiki_file_1499379383 on elastic@eqiad (failed reindex)
  • 12:20 elukey: restart mysql on dbstore1002 - high swap used
  • 11:40 moritzm: rebooting rdb* servers in codfw for kernel update
  • 10:30 gehel: restarting elastic1043 (corrupted statistics)
  • 09:42 gehel: unbanning elastic1020 and 1026 from elasticsearch eqiad
  • 09:37 gehel: restarting elastic1036 (corrupted statistics)
  • 09:30 TabbyCat: Global rename of Idh0854 → Garam has finished (T167031)
  • 09:24 moritzm: installing NTP security updates on trusty hosts
  • 09:23 akosiaris: schedule a month's worth of downtime for ores100X
  • 08:56 moritzm: restarting HHVM on app server canaries to pick up libgcrypt and expat updates
  • 08:54 _joe_: reenabling puppet across the fleet
  • 08:52 _joe_: restarting apache on all puppetmaster, after a successful puppet run
  • 08:39 _joe_: disabling puppet across the fleet for enabling directory environments in puppet
  • 08:32 moritzm: installing expat security updates
  • 08:27 TabbyCat: Starting global rename of Idh0854 → Garam (T167031)
  • 08:23 gehel: banning elastic1020 and elastic1026 from elasticsearch eqiad cluster
  • 07:55 moritzm: installing libgcrypt security updates
  • 07:42 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1083 - T166204 (duration: 00m 42s)
  • 07:39 marostegui@tin: Synchronized wmf-config/db-codfw.php: Repool db2056 - T169510 (duration: 00m 43s)
  • 07:37 marostegui@tin: scap failed: average error rate on 3/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/3888cca979647b9381a7739b0bdbc88e for details)
  • 06:49 moritzm: rebooting bast3002 for kernel update

2017-07-06

  • 23:26 demon@tin: Synchronized php-1.30.0-wmf.7/extensions/MobileFrontend/includes/MobileFrontend.hooks.php: Only message box styles should be loaded on editor (duration: 00m 43s)
  • 23:15 demon@tin: Synchronized wmf-config/InitialiseSettings.php: Wikivoyage projects can show more than 3 related articles (duration: 00m 43s)
  • 21:52 ppchelko@tin: Finished deploy [changeprop/deploy@e1230e6]: Extend automatic blacklisting T169911 (duration: 01m 09s)
  • 21:51 ppchelko@tin: Started deploy [changeprop/deploy@e1230e6]: Extend automatic blacklisting T169911
  • 20:27 ejegg: turned all SmashPig jobs back on
  • 19:51 ejegg: updated SmashPig from d4458fa to 523d6dd
  • 19:46 ejegg: rolled back SmashPig to d4458fa
  • 19:44 ejegg: updated SmashPig from d4458fa to 523d6dd
  • 19:40 ejegg: disabled smashpig jobs and donation queue consumer
  • 19:23 chasemp: labstore2003 time bash restore.sh &> /tmp/restore_7_6_2017v1.log for T169774
  • 19:22 demon@tin: Finished scap: Forcing l10n rebuild for James_F, plus some wmf-config cleanup (duration: 17m 22s)
  • 19:05 demon@tin: Started scap: Forcing l10n rebuild for James_F, plus some wmf-config cleanup
  • 18:51 niharika29@tin: Synchronized php-1.30.0-wmf.7/extensions/CodeMirror: (no justification provided) (duration: 00m 43s)
  • 18:38 niharika29@tin: Synchronized php-1.30.0-wmf.7/extensions/CirrusSearch/: Fix metastore.php notices https://gerrit.wikimedia.org/r/#/c/363637/ (duration: 00m 53s)
  • 18:31 niharika29@tin: Synchronized php-1.30.0-wmf.7/extensions/CirrusSearch/: Fix metastore.php notices https://gerrit.wikimedia.org/r/#/c/363637/ (duration: 00m 54s)
  • 18:11 niharika29@tin: Synchronized wmf-config/InitialiseSettings.php: Add CodeMirror as a beta feature [mediawiki-config] - https://gerrit.wikimedia.org/r/363497 (duration: 00m 43s)
  • 18:03 herron: moved ununpentium to exim4-daemon-light - T169794
  • 16:50 demon@tin: Synchronized README: Testing testing 1 2 3 (duration: 00m 44s)
  • 16:45 godog: manually create mwdeploy's new home
  • 16:12 godog: bounce thumbor to apply https://gerrit.wikimedia.org/r/#/c/363626/
  • 16:02 ema@neodymium: conftool action : set/pooled=yes; selector: name=cp4014.ulsfo.wmnet,service=varnish-be
  • 15:12 herron: extend mx[1,2]001 exim log retention to 60 days - T167333
  • 14:50 moritzm: rebooting prometheus1003/1004 for kernel update
  • 14:39 moritzm: rebooting prometheus2004 for kernel update
  • 14:39 ema: repool cp4006
  • 14:29 ema: restart pybal on lvs4001 T169765
  • 14:26 moritzm: rebooting prometheus2003 for kernel update
  • 14:20 ema: restart pybal on lvs4003 T169765
  • 14:16 godog: upgrade labmon to grafana 4.4.1 - T169773
  • 14:08 ema: restart pybal on lvs4002 T169765
  • 14:06 ema: restart pybal on lvs4004 T169765
  • 14:02 ema: depool cp4006
  • 13:59 moritzm: rebooting restbase2001 for kernel update
  • 13:29 ema: cp4013: upgrade to varnish 4.1.7-1wm1 and reboot for kernel update
  • 13:28 marostegui: Stop MySQL on db2056 for maintenance - T169510
  • 13:27 marostegui@tin: Synchronized wmf-config/db-codfw.php: Depool db2056 - T169510 (duration: 00m 44s)
  • 12:58 herron: changed lists.wikimedia.org spf to soft fail (~all) - T167703
  • 12:48 moritzm: rebooting mc* servers in codfw for kernel update
  • 12:20 moritzm: reboot lithium for kernel update
  • 12:14 marostegui: Deploy alter table db1083 - https://phabricator.wikimedia.org/T166204
  • 12:14 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1083 - T166204 (duration: 00m 44s)
  • 11:15 elukey: reboot conf2003 for kernel updates
  • 11:07 moritzm: rebooting restbase1017 for kernel update
  • 10:52 elukey: reboot conf2002 for kernel update
  • 10:34 moritzm: rebooting ocg1001/1002 for kernel update
  • 10:18 moritzm: rebooting ocg1003 for kernel update
  • 08:52 moritzm: rebooting restbase-test cluster for kernel updates
  • 08:36 moritzm: rebooting restbase1014 for kernel update
  • 08:04 moritzm: rebooting restbase1013 for kernel update
  • 07:48 jynus@tin: Synchronized wmf-config/db-eqiad.php: Revert parsercaches to pc100[456] (duration: 00m 43s)
  • 07:42 moritzm: reboot wasat for kernel update
  • 07:15 marostegui: Stop MySQL on dbstore2002 for maintenance - T169510
  • 07:11 marostegui: Disable puppet on dbstore2002 - T169510
  • 06:45 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1089 - T168661 (duration: 00m 44s)
  • 06:42 moritzm: rebooting wtp1* for kernel update
  • 06:38 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1083, depool db1089 - T168661 (duration: 00m 43s)
  • 06:30 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1083 - T168661 (duration: 00m 42s)
  • 06:20 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1080 - T168661 (duration: 00m 42s)
  • 06:19 marostegui@tin: scap failed: average error rate on 1/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/3888cca979647b9381a7739b0bdbc88e for details)
  • 06:08 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1080 - T168661 (duration: 00m 42s)
  • 05:51 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1073 - T168661 (duration: 00m 42s)
  • 05:40 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1073 - T168661 (duration: 00m 42s)
  • 05:16 marostegui: Stop mysql on db2056 for maintenance - T148507 T169510
  • 05:12 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1055 - T168661 (duration: 00m 43s)
  • 05:03 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1055 - T168661 (duration: 00m 43s)
  • 04:56 marostegui: Deploy alter table on s1 eqiad hosts - T168661
  • 02:37 l10nupdate@tin: ResourceLoader cache refresh completed at Thu Jul 6 02:37:44 UTC 2017 (duration 6m 39s)
  • 02:31 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.7) (duration: 09m 25s)
  • 01:26 ejegg: re-enabled fundraising jobs
  • 00:46 mutante: labcontrol1002 has multiple IPs, 208.80.154.102 (no DNS name) and 208.80.154.12 (labservices1002). labservices1002 is another host that ALSO has the 208.80.154.12 IP and 208.80.154.20 (lab-recursor1). Can the duplicate IP be removed from one of them? T169039
  • 00:21 twentyafterfour: phabricator deployment really finished this time. really.
  • 00:18 twentyafterfour: diffusion fatals resolved by restarting apache and clearing phabricator's bytecode cache
  • 00:16 twentyafterfour: restarting apache and clearing phabricator caches
  • 00:04 twentyafterfour: phabricator update completed.
  • 00:01 twentyafterfour: preparing to deploy phabricator release/2017-07-05/1 (Milestone: https://phabricator.wikimedia.org/project/view/2881/ )

2017-07-05

  • 23:34 eileen: civicrm updated from a9e3e0c to 8914782
  • 22:47 Reedy: running `mwscript updateArticleCount.php --wiki=commonswiki --update` on screen on terbium T169822
  • 22:29 mutante: subra/suhail: re-enabled puppet, now with role::spare, no more poolcounter, scheduled icinga downtimes for decom (T169506)
  • 22:27 niharika29@tin: Synchronized wmf-config/InitialiseSettings.php: Syncing InitialiseSettings for CodeMirror deployment (take 4) (duration: 00m 42s)
  • 22:26 niharika29@tin: scap failed: average error rate on 1/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/3888cca979647b9381a7739b0bdbc88e for details)
  • 22:21 niharika29@tin: Synchronized wmf-config/InitialiseSettings.php: Syncing InitialiseSettings for CodeMirror deployment (take 2) (duration: 00m 42s)
  • 22:20 demon@tin: Synchronized wmf-config/CommonSettings.php: apache_request_headers protection (duration: 00m 42s)
  • 22:16 mutante: subra/suhail: disabling puppet, stopping poolcounterd, stopping other services, first step of decom, replaced by poolcounter200[12] (T169506)
  • 22:14 niharika29@tin: Finished scap: Deploying Codemirror on testwiki- full scap (T169284) (duration: 20m 43s)
  • 22:09 eileen: upgrade CiviCRM from ea9e3af to a9e3e0c
  • 21:53 niharika29@tin: Started scap: Deploying Codemirror on testwiki- full scap (T169284)
  • 21:53 niharika29@tin: Synchronized wmf-config/InitialiseSettings.php: Syncing InitialiseSettings for CodeMirror deployment (duration: 00m 42s)
  • 21:51 niharika29@tin: scap aborted: Deploying Codemirror on testwiki- full scap (T169284) (duration: 03m 10s)
  • 21:47 niharika29@tin: Started scap: Deploying Codemirror on testwiki- full scap (T169284)
  • 21:38 niharika29@tin: Synchronized php-1.30.0-wmf.7/extensions/CodeMirror/: Deploying CodeMirror to testwiki (T169284) (duration: 00m 44s)
  • 21:03 chasemp: add madhuvishy to wmf-nda phab group
  • 20:58 eileen: update CiviCRM from e53d621 to ea9e3af
  • 20:14 RainbowSprinkles: commonswiki: nevermind that article count thing
  • 20:07 mutante: phab2001 - deleted /etc/systemd/system/phd.service (base::service_unit uses /lib/systemd/system/phd.service; the two had DIFFERENT content and conflicted, causing systemd degradation after reboot)
  • 19:48 RainbowSprinkles: commonswiki: running updateArticleCount.php (against the vslow slave)
  • 19:31 mutante: phab2001 - rebooting for kernel upgrade
  • 19:19 niharika29@tin: Synchronized php-1.30.0-wmf.7/extensions/MobileApp/config/android.json: Syncing in hopes of invalidating cache (duration: 00m 42s)
  • 18:59 niharika29@tin: Synchronized wmf-config/InitialiseSettings.php: Fix nowikisource template namespace subpages [mediawiki-config] - https://gerrit.wikimedia.org/r/362272 (duration: 00m 42s)
  • 18:46 niharika29@tin: Synchronized wmf-config/InitialiseSettings.php: Add 'WP' namespace alias to ruwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/362267 (duration: 00m 42s)
  • 18:41 niharika29@tin: Synchronized wmf-config/InitialiseSettings.php: Add H as wgNamespaceAlias to NS_HELP for en.wikisource [mediawiki-config] - https://gerrit.wikimedia.org/r/362508 (duration: 00m 42s)
  • 18:36 niharika29@tin: Synchronized php-1.30.0-wmf.7/extensions/MobileApp/: Enable description editing for all wikis except enwiki. (T146705) (duration: 00m 43s)
  • 18:26 niharika29@tin: Synchronized wmf-config/InitialiseSettings.php: Set wgCategoryCollation to 'numeric' at he.wikisource [mediawiki-config] - https://gerrit.wikimedia.org/r/362592 (duration: 00m 43s)
  • 18:14 niharika29@tin: Synchronized wmf-config/InitialiseSettings.php: Enable OOjs UI EditPage buttons on es/fr/it/ja/ru-wiki and meta [mediawiki-config] - https://gerrit.wikimedia.org/r/360370 (duration: 00m 45s)
  • 18:08 niharika29@tin: Synchronized wmf-config/InitialiseSettings.php: Enable mobile non-JavaScript editing on ptwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/361455 (duration: 00m 45s)
  • 18:00 moritzm: rebooting tungsten for kernel update
  • 17:53 moritzm: rebooting osmium for kernel update
  • 17:46 gehel: cleaning /srv/wdqs/import on all wdqs servers
  • 17:41 apergos: re-enabled puppet on stat1003 (last dataset nfs client), manually mounted /mnt/data because puppet run has an unrelated error
  • 16:33 jynus: restart mysql on db2062
  • 16:04 ema: restart pybal on lvs200[12] to make them reconnect to conf2001
  • 16:03 ema: restart pybal on lvs200[45] to make them reconnect to conf2001
  • 15:54 jynus: restart mysql on db2072
  • 15:30 apergos: re-enabled puppet on stat1002, did a manual run, dataset filesystem available again there
  • 15:09 apergos: re-enabled puppet on snapshot6,7, still watching dataset1001 performance
  • 15:09 ema: restart pybal on lvs2003 to make it reconnect to conf2001
  • 14:45 ema: bounce pybal on lvs2006, not synced with etcd information
  • 14:40 moritzm: rebooting restbase1012 for kernel update
  • 14:19 moritzm: rebooting logstash100[4-6] for kernel update
  • 14:00 moritzm: rebooting logstash100[1-3] for kernel update
  • 13:59 ema: cache_misc: upgrade to varnish 4.1.7-1wm1 and reboot for kernel update
  • 13:48 apergos: re-enabling puppet on snapshot1001, 1005 for testing
  • 13:46 moritzm: rebooting restbase1011 for kernel update
  • 13:44 zeljkof: EU SWAT finished!
  • 13:43 zfilipin@tin: Synchronized wmf-config/Wikibase-production.php: SWAT: Set Wikibase readFullEntityIdColumn setting to false (duration: 00m 42s)
  • 13:35 zfilipin@tin: Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable WikiLove for ckbwiki (T169563) (duration: 00m 43s)
  • 13:24 zfilipin@tin: Synchronized dblists/closed.dblist: SWAT: Reopen nlwikinews (T168764) (duration: 02m 50s)
  • 13:21 jmm@puppetmaster1001: conftool action : set/pooled=inactive; selector: mw1196.eqiad.wmnet
  • 13:18 apergos: power cycled dataset1001, crashed, unresponsive on mgmt console
  • 13:18 zfilipin@tin: Synchronized dblists/closed.dblist: SWAT: Reopen nlwikinews (T168764) (duration: 02m 50s)
  • 13:16 elukey: reboot conf2001 for kernel updates
  • 13:09 moritzm: rebooting restbase1010 for kernel update
  • 12:49 marostegui: Force BBU relearn on db1016 - T166344
  • 12:36 marostegui: Move labsdb1010 main general replication thread to a named replication thread called db1095 - T153743
  • 12:33 marostegui: Stop all replication threads on db1095 for maintenance - T153743
  • 12:32 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1085 - T153743 (duration: 02m 49s)
  • 12:29 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1051 - T168661 (duration: 02m 50s)
  • 12:16 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1051 - T168661 (duration: 02m 51s)
  • 12:11 apergos: puppet is currently disabled again on snapshots 1,5,6,7 and on dataset1001; we saw the same nfs issue shortly after reboot, with no dump processes going, as snapshots 5,6,7 had not remounted the filesystem
  • 11:20 moritzm: rebooting wtp2* servers for kernel update
  • 11:14 moritzm: rebooting restbase1009 for kernel update
  • 10:56 hashar: restarting Jenkins for plugin upgrades
  • 10:45 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1072 - T168661 (duration: 02m 59s)
  • 10:41 marostegui: Run redact_sanitarium on s6 databases db1102 - T153743
  • 10:41 moritzm: rebooting wtp1001 for kernel update
  • 10:37 moritzm: rebooting restbase1008 for kernel update
  • 10:32 apergos: rebooting snapshot hosts to clean up hung nfs client processes
  • 10:30 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1072 - T168661 (duration: 02m 51s)
  • 10:24 apergos: rebooted dataset1001 to unstick nfsd and pick up new kernel, re-enabled puppet
  • 10:14 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1066 - T168661 (duration: 02m 50s)
  • 10:11 moritzm: rebooting restbase1007 for kernel update
  • 10:01 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1066 - T168661 (duration: 02m 50s)
  • 09:57 marostegui: Deploy alter table on s1 eqiad hosts - T168661
  • 09:48 godog: move 'instances' graphite hierarchy out of the way, do not delete yet - T143405
  • 09:27 marostegui: Stop MySQL on db1085 for maintenance - T153743
  • 09:21 godog: upload nginx_1.11.10-1+wmf2 to jessie-wikimedia and nginx_1.11.10-1+wmf2~stretch1 to stretch-wikimedia
  • 09:17 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1085 - T153743 (duration: 02m 50s)
  • 08:44 apergos: puppet disabled and processes accessing dataset1001's exported filesystem shot, on: stat1002,3, snapshot1001,5,6,7, while investigation continues
  • 07:27 moritzm: rebooting restbase-dev* for kernel update
  • 07:13 moritzm: rebooting notebook* hosts
  • 05:18 marostegui: Deploy alter table on s3 master - db1075 - T168661
  • 05:13 marostegui: Deploy alter table on s7 master - db1062 - T168661
  • 05:08 marostegui: Force a relearn on db1046's BBU - T166141
  • 02:27 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.7) (duration: 10m 23s)

2017-07-04

  • 21:40 volans: ACK'ed puppet not running on stat100[2-3],snapshot100[1,5-7] due to NFS overloaded on dataset1001 - T169680
  • 16:54 jynus: dropping ukwikimedia from several labsdbhosts
  • 16:10 moritzm: rebooting radium for kernel update
  • 15:09 mobrovac@tin: Finished deploy [citoid/deploy@9d22567]: Fallback to crossRef (T165105) and use MarcXML (T165105) (duration: 02m 52s)
  • 15:06 mobrovac@tin: Started deploy [citoid/deploy@9d22567]: Fallback to crossRef (T165105) and use MarcXML (T165105)
  • 15:02 godog: set operations/debs/nginx as hidden and update description
  • 14:57 ema: pybal 1.13.7 uploaded to apt.w.o, testing it on pybal-test2001 T82747 T154759
  • 14:31 godog: copy nginx from jessie-wikimedia to stretch-wikimedia
  • 14:15 paravoid: reset db2038's iLO
  • 13:06 filippo@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ms-fe2005.codfw.wmnet
  • 11:47 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Remove comments from db1039 status - T166208 (duration: 02m 50s)
  • 11:25 joal@tin: Finished deploy [analytics/refinery@88cbb9e]: Regular weekly deploy (2) - Bug patch (duration: 03m 38s)
  • 11:21 joal@tin: Started deploy [analytics/refinery@88cbb9e]: Regular weekly deploy (2) - Bug patch
  • 11:15 elukey: powercycle elastic1018, host unreachable
  • 11:02 joal@tin: Finished deploy [analytics/refinery@12c5f57]: Regular weekly deploy (duration: 04m 47s)
  • 11:00 moritzm: rebooting kubernetes workers for kernel update
  • 10:58 godog: copy wikimedia-lvs-realserver from jessie-wikimedia to stretch-wikimedia
  • 10:57 joal@tin: Started deploy [analytics/refinery@12c5f57]: Regular weekly deploy
  • 10:53 gehel: killing stuck wmf-reimage on puppetmaster1001 for maps-test2001
  • 10:40 marostegui: Stop replication on db1102 (sanitarium3) on s2 shard for maintenance - T153743
  • 10:33 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1060 - T153743 (duration: 02m 49s)
  • 10:23 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1035 - T168661 (duration: 02m 49s)
  • 10:14 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1035 - T168661 (duration: 02m 50s)
  • 09:58 marostegui: Move labsdb1009 main general replication thread to a named replication thread called db1095 - T153743
  • 09:54 marostegui: Stop replication on db1095 for maintenance - T153743
  • 09:38 moritzm: rebooting restbase2002-restbase2004 for kernel updates
  • 09:27 moritzm: rebooting thumbor1001/1002 for kernel updates
  • 08:54 marostegui: Run redact_sanitarium on db1102 (sanitarium3) - T153743
  • 08:39 moritzm: rebooting sca2* for kernel update
  • 08:25 elukey: restart redis 6380 (slave) jobqueue instance on rdb1004/2003 to force resync with master
  • 08:12 moritzm: powercycling mw1260, stuck in reboot
  • 07:56 moritzm: powercycling mw1259, stuck in reboot
  • 07:52 gehel: restart of relforge for kernel upgrade
  • 07:42 moritzm: rebooting video scalers in eqiad for kernel update
  • 07:15 marostegui: Deploy alter table on s3 hosts (eqiad) - T168661
  • 06:05 marostegui: Stop MySQL on db1060 for maintenance - T153743
  • 05:47 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1060 - T153743 (duration: 02m 51s)
  • 05:26 marostegui: Deploy alter table on s5 directly on s5 master (db1063) - T168661
  • 05:20 marostegui: Deploy alter table on s6 directly on s6 master (db1061) - T168661
  • 05:08 marostegui: Deploy alter table on s2 directly on s2 master (db1054) - T168661
  • 02:27 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.7) (duration: 10m 14s)
  • 01:30 mutante: releases1001: switching GID of reprepro and prometheus-node-exporter group (1000 vs 1001), changing reprepro UID to 13927. using find -exec to fix all the permissions and make it identical to bromine. prevent permissions snafu when rsyncing (T164030)

2017-07-03

  • 20:46 gehel: unbanning elastic1018 from elasticsearch eqiad cluster
  • 20:24 gehel: banning elastic1018 from elasticsearch eqiad cluster
  • 19:29 hashar: restarting jenkins
  • 19:10 nuria@tin: Finished deploy [eventlogging/analytics@328dea6]: (no justification provided) (duration: 00m 03s)
  • 19:09 nuria@tin: Started deploy [eventlogging/analytics@328dea6]: (no justification provided)
  • 17:35 chasemp: labvirt1003:~# service nova-compute restart
  • 16:55 bd808: Running maintain-views --all-databases --clean --replace-all --debug on labsdb1001
  • 16:51 mobrovac@tin: Finished deploy [mobileapps/deploy@58a5b19]: (no justification provided) (duration: 00m 41s)
  • 16:51 mobrovac@tin: Started deploy [mobileapps/deploy@58a5b19]: (no justification provided)
  • 16:02 chasemp: labvirt1002:~# service nova-compute restart
  • 15:43 mobrovac@tin: Finished deploy [mobileapps/deploy@58a5b19]: Remove pronunciation from the spec - T169299 (duration: 09m 30s)
  • 15:33 mobrovac@tin: Started deploy [mobileapps/deploy@58a5b19]: Remove pronunciation from the spec - T169299
  • 15:30 ema: cp1099: restart varnish-be
  • 15:16 chasemp: labcontrol1001 clean out admin-monitoring leaks
  • 15:12 chasemp: labvirt1001 service nova-compute restart
  • 14:40 elukey: running EventLogging alter tables on dbstore1002 (script in /home/elukey/dbstore1002.sql) - T167162
  • 14:33 akosiaris: set enable_notification=0 in icinga
  • 13:54 moritzm: rebooting ms-be2028 to ms-be2035 for kernel update
  • 13:43 marostegui: Global rename of Antero de Quintal → JMagalhães - T169527
  • 13:39 moritzm: uploaded apache2 2.4.10+deb8u9+wmf1 to apt.wikimedia.org/jessie-wikimedia
  • 12:40 marostegui: Compress innodb on db2056 - T169510
  • 12:36 marostegui@tin: Synchronized wmf-config/db-codfw.php: Add comments about db2056 status - T169510 (duration: 02m 50s)
  • 12:26 elukey: reimage stat1005 with Debian Stretch
  • 12:03 moritzm: rebooting scb1004 for kernel update
  • 10:39 moritzm: rebooting mw1298 for kernel update
  • 09:37 marostegui: Global rename of Markos90 → Mαρκος - T169396
  • 09:24 marostegui: Deploy alter table on s1 directly on codfw master (db2016) and let it replicate - T166204
  • 09:07 _joe_: restarting the passenger app on puppetmasters in codfw serially with a sleep of 3 seconds for T169493
  • 08:58 _joe_: restarting the passenger app on puppetmaster1002 for T169493
  • 08:49 gehel: unbanning elastic1020 from elasticsearch eqiad
  • 08:30 marostegui: Compress dewiki on dbstore2001 - T168354
  • 08:25 gehel: banning elastic1020 from elasticsearch eqiad waiting for its recovery
  • 08:02 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Add comments about db1039 status - T166208 (duration: 02m 49s)
  • 07:51 marostegui: Deploy alter table db1039 - s7 - T166208
  • 07:40 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1034 - T166208 (duration: 03m 00s)
  • 07:35 marostegui: Drop alter table s7 - labsdb1003 - T166208
  • 07:24 volans: bounced uwsgi-graphite-web on graphite1003, log stopped since Jul 2 10:23:45
  • 05:44 marostegui: Run redact sanitarium on db1069 - T160869
  • 05:31 marostegui: Run redact sanitarium on db1095 - T160869
  • 02:40 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.7) (duration: 14m 45s)

2017-07-02

2017-07-01

  • 13:08 ppchelko@tin: Finished deploy [restbase/deploy@8ea07d6]: Manual blacklist for russian wiki (duration: 07m 59s)
  • 13:00 ppchelko@tin: Started deploy [restbase/deploy@8ea07d6]: Manual blacklist for russian wiki
  • 00:54 mutante: APT - importing php-net-ipv4 to stretch (for librenms) T159756

2017-06-30

  • 23:16 dzahn@neodymium: conftool action : set/pooled=no; selector: name=mw1196.eqiad.wmnet
  • 23:12 legoktm@tin: Synchronized wmf-config/InitialiseSettings.php: Limit thanks for new users at pl.wikipedia to 3 per day - T169268 (duration: 02m 49s)
  • 23:05 mutante: librenms has been deployed on netmon1002 - works on stretch now - except Letsencrypt part, expected. not switched yet
  • 20:03 ariel@tin: Finished deploy [dumps/dumps@02c71bc]: permit batching of abstract jobs, fix a dryrun reporting typo, smaller stub/abstract queries (duration: 00m 03s)
  • 20:03 ariel@tin: Started deploy [dumps/dumps@02c71bc]: permit batching of abstract jobs, fix a dryrun reporting typo, smaller stub/abstract queries
  • 19:41 cmjohnson1: powering off mw1196 for unresponsive idrac
  • 19:37 cmjohnson1: powering off mw1191 for unresponsive idrac
  • 19:31 cmjohnson1: powering off mw1190 to reestablish idrac connection
  • 19:21 cmjohnson1: mw1182 powering down due to unresponsive idrac
  • 17:01 paravoid: rebooting mw1196
  • 15:39 gehel: unbanning elastic1019 from cluster and keeping an eye on it
  • 14:51 gehel: banning elastic1019 from cluster to move heavy shards around
  • 14:16 bblack: reboot cp4021
  • 13:39 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1028 - T166208 (duration: 00m 42s)
  • 13:32 bawolff: Reset 2FA of wikitech User:Samtar (T169332)
  • 12:47 jynus: just upgraded wmf-mariadb101-client on mariadb::client hosts
  • 12:26 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Add comments to db1060 about its future usage as a sanitarium master - T153743 (duration: 00m 42s)
  • 12:08 akosiaris@tin: Synchronized wmf-config/ProductionServices.php: Replace subra/suhail as poolcounters (duration: 00m 43s)
  • 12:07 akosiaris: replace subra and suhail as poolcounters in codfw
  • 11:22 _joe_: rebooting copper for kernel upgrade
  • 11:17 _joe_: purging varnish, varnish-dbg from copper
  • 11:08 jynus: removing leftover data on tegmen T149557
  • 10:55 elukey: deploy kafkatee 0.1.6-1 to oxygen
  • 10:41 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1026 - T168661 (duration: 00m 42s)
  • 10:38 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1026 - T168661 (duration: 00m 42s)
  • 09:54 elukey: uploaded kafkatee 0.1.6-1 to reprepro - T151748
  • 09:10 marostegui: Deploy alter table on s5 all eqiad hosts (primary master not included) - T168661
  • 09:01 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1037 - T168661 (duration: 00m 42s)
  • 08:58 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1037 - T168661 (duration: 00m 42s)
  • 08:34 ayounsi@tin: Finished deploy [librenms/librenms@3f407a7]: (no justification provided) (duration: 00m 05s)
  • 08:34 ayounsi@tin: Started deploy [librenms/librenms@3f407a7]: (no justification provided)
  • 08:29 ayounsi@tin: Finished deploy [librenms/librenms@b10cc7c]: (no justification provided) (duration: 00m 02s)
  • 08:29 ayounsi@tin: Started deploy [librenms/librenms@b10cc7c]: (no justification provided)
  • 08:16 akosiaris: poweroff labcontrol1003. It was in the Debian installer
  • 07:20 akosiaris: restart pdfrender on scb1002
  • 06:45 _joe_: started manually burrow on krypton, could not start due to a stale pidfile
  • 06:38 marostegui: Deploy alter table on s6 all eqiad hosts (primary master not included) - T168661
  • 06:12 marostegui: Deploy alter table on db1018 on s2 - T168661
  • 06:12 marostegui: Deploy alter table on db1090 on s2 - T168661
  • 06:11 marostegui: Deploy alter table on db1076 on s2 - T168661
  • 06:10 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1060 - T168661 (duration: 00m 42s)
  • 06:10 marostegui: Deploy alter table on db1074 on s2 - T168661
  • 06:07 marostegui: Deploy alter table on db1060 on s2 - T168661
  • 06:06 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1060 - T168661 (duration: 00m 42s)
  • 05:59 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1036 - T168661 (duration: 00m 43s)
  • 05:56 marostegui: Deploy alter table on db1047 on s2 - T168661
  • 05:56 marostegui: Deploy alter table on db1036 on s2 - T168661
  • 05:55 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1036 - T168661 (duration: 00m 42s)
  • 05:47 marostegui: Deploy alter table on db1021 on s2 - T168661
  • 05:44 marostegui: Deploy alter table on dbstore1002 on s2 - T168661
  • 05:43 marostegui: Deploy alter table on dbstore1001 on s2 - T168661
  • 05:37 marostegui: Deploy alter table on db1069 (and let it replicate) on s2 - T168661
  • 02:29 chasemp: labstore1005 start drbd
  • 02:14 chasemp: reboot labstore1005 (5m ago)
  • 01:25 chasemp: reboot labstore1005
  • 01:23 chasemp: fail nfs from labstore1005 to labstore1004 (I failed to log a previous failover to 1004 and back)
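
The Depool/Repool pairs logged above can be turned into a rough "time out of rotation" figure per host. The following is only a minimal sketch, not part of any Wikimedia tooling: the helper name depool_windows and the regular expression are assumptions made for illustration, and it assumes entries are newest-first, all from a single day, and follow the "HH:MM user@host: Synchronized <file>: Depool|Repool <dbhost>" wording seen in this log.

```python
import re
from datetime import datetime

# Hypothetical pattern, written against the wording of the entries in this log.
ENTRY = re.compile(
    r"(?P<time>\d{2}:\d{2}) \S+: Synchronized \S+: "
    r"(?P<action>Depool|Repool) (?P<host>\S+)"
)

def depool_windows(lines):
    """Return {host: minutes depooled}; lines must be newest-first, one day only."""
    repooled = {}  # host -> time of the Repool entry, waiting for its Depool
    windows = {}
    for line in lines:
        m = ENTRY.search(line)
        if not m:
            continue  # not a Depool/Repool sync entry
        t = datetime.strptime(m["time"], "%H:%M")
        if m["action"] == "Repool":
            repooled[m["host"]] = t
        elif m["host"] in repooled:
            delta = repooled.pop(m["host"]) - t
            windows[m["host"]] = int(delta.total_seconds() // 60)
    return windows

sample = [
    "10:41 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1026 - T168661 (duration: 00m 42s)",
    "10:38 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1026 - T168661 (duration: 00m 42s)",
    "09:01 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1037 - T168661 (duration: 00m 42s)",
    "08:58 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1037 - T168661 (duration: 00m 42s)",
]
print(depool_windows(sample))
```

The timestamps have minute resolution only, so the result is approximate. It also measures the gap between the two sync entries, not the moment the config change actually took effect on every host.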

2017-06-29

  • 23:32 RoanKattouw: Sorry I meant T169163
  • 23:31 catrope@tin: Synchronized php-1.30.0-wmf.7/resources/src/mediawiki.rcfilters/: RCFilters fixes (T169169, T169107, T169042) (duration: 00m 42s)
  • 23:28 catrope@tin: Synchronized php-1.30.0-wmf.7/extensions/WikimediaEvents/: Add event logging for explore-similar on SRP (T149809) (duration: 00m 42s)
  • 23:26 catrope@tin: Synchronized php-1.30.0-wmf.7/extensions/CirrusSearch/: "Explore similar" widget for CirrusSearch (T149809) (duration: 00m 54s)
  • 23:19 catrope@tin: Synchronized wmf-config/InitialiseSettings.php: Stop reader surveys (T131949) (duration: 00m 43s)
  • 22:16 chasemp: set cfq scheduler on labstore1005
  • 21:51 mutante: APT - uploading python-django-south from jessie to wikimedia-stretch for servermon on stretch (T159756)
  • 21:40 chasemp: reboot labstore1004 with grub set to gnulinux-advanced-1773f282-5a1b-441e-865c-8b70a0ebc925>gnulinux-4.4.0-3-amd64-advanced-1773f282-5a1b-441e-865c-8b70a0ebc925
  • 21:10 mobrovac@tin: Finished deploy [restbase/deploy@bcb83f4]: (no justification provided) (duration: 04m 21s)
  • 21:06 mobrovac@tin: Started deploy [restbase/deploy@bcb83f4]: (no justification provided)
  • 21:06 mobrovac@tin: Finished deploy [restbase/deploy@bcb83f4]: (no justification provided) (duration: 01m 02s)
  • 21:05 mobrovac@tin: Started deploy [restbase/deploy@bcb83f4]: (no justification provided)
  • 21:04 mobrovac@tin: Finished deploy [restbase/deploy@bcb83f4]: Fix special char handling in PDF back-end requests - T169223 (duration: 03m 14s)
  • 21:00 mobrovac@tin: Started deploy [restbase/deploy@bcb83f4]: Fix special char handling in PDF back-end requests - T169223
  • 20:46 mutante: APT - reprepro copy stretch-wikimedia jessie-wikimedia prometheus-snmp-exporter (to make it available on stretch for netmon1002) (T159756)
  • 20:39 ppchelko@tin: Finished deploy [changeprop/deploy@350076c]: Config: Enable red links processing. T133221 (duration: 01m 01s)
  • 20:38 ppchelko@tin: Started deploy [changeprop/deploy@350076c]: Config: Enable red links processing. T133221
  • 19:23 demon@tin: Synchronized scap/plugins/clean.py: Because I need to learn basic python syntax before trying stuff (duration: 00m 42s)
  • 19:21 twentyafterfour@tin: rebuilt wikiversions.php and synchronized wikiversions files: all wikis to 1.30.0-wmf.7 refs T167536
  • 19:16 twentyafterfour: deploying wmf/1.30.0-wmf.7 to all wikis refs T167536
  • 19:06 demon@tin: Pruned MediaWiki: 1.30.0-wmf.5 [keeping static files] (duration: 01m 16s)
  • 18:30 chasemp: restart nfs on labstore1004 (primary)
  • 18:10 demon@tin: Synchronized php-1.30.0-wmf.7/extensions/TextExtracts/extension.json: T107206 (duration: 00m 47s)
  • 17:28 arlolra: Updated Parsoid to b4187f18 (T168900, T168675, T168404, T153203)
  • 17:21 arlolra@tin: Finished deploy [parsoid/deploy@717df08]: Updating Parsoid to b4187f18 (duration: 09m 41s)
  • 17:12 arlolra@tin: Started deploy [parsoid/deploy@717df08]: Updating Parsoid to b4187f18
  • 17:08 mobrovac: scb2005 repooling back the services - T167763
  • 16:21 godog: temporarily stop ircecho, puppet spam
  • 16:05 akosiaris@tin: Synchronized wmf-config/ProductionServices.php: (no justification provided) (duration: 00m 46s)
  • 15:40 akosiaris: disable puppet on all of eqiad/esams, problems with ganeti and puppetdb
  • 15:38 chasemp: restart nfs-exportd on labstore1004
  • 15:34 akosiaris@tin: Synchronized wmf-config/ProductionServices.php: (no justification provided) (duration: 02m 54s)
  • 15:26 mobrovac: scb2005 depooled all services for T167763
  • 15:09 chasemp: set downtimes for labstore1004/1005 failover see https://etherpad.wikimedia.org/p/labstore_reboots
  • 15:02 akosiaris: purge d-i-test from puppet/salt
  • 14:57 akosiaris: reboot aluminium.wikimedia.org bromine.eqiad.wmnet etherpad1001.eqiad.wmnet d-i-test.eqiad.wmnet kubestagetcd1001.eqiad.wmnet mx1001.wikimedia.org seaborgium.wikimedia.org for kernel upgrades
  • 14:47 jynus: several restarts of db2072 services and host over the following hour
  • 14:30 ema: varnish 4.1.7-1wm1 uploaded to apt.w.o, cp1008 upgraded T164768
  • 14:08 marostegui: Deploy alter table on s7 on dbstore1001 - T166208
  • 13:54 godog: kick sdb out of mdadm arrays on bast3002 - T169035
  • 12:56 akosiaris@tin: Synchronized wmf-config/ProductionServices.php: (no justification provided) (duration: 00m 46s)
  • 12:47 akosiaris: reboot argon.eqiad.wmnet, darmstadtium.eqiad.wmnet, dbmonitor1001.wikimedia.org, etcd1001.eqiad.wmnet, etcd1006.eqiad.wmnet, krypton.eqiad.wmnet, mendelevium.eqiad.wmnet, mwdebug1001.eqiad.wmnet, roentgenium.eqiad.wmnet, sca1003.eqiad.wmnet for kernel upgrades
  • 12:41 akosiaris: reboot poolcounter1001 for kernel upgrades
  • 12:38 marostegui: Stop replication on dbstore1002 - x1 - T169050
  • 12:29 akosiaris: reboot nitrogen for kernel upgrades
  • 12:23 gehel: forcing reindex of cirrus / elasticsearch after switch upgrade
  • 12:23 akosiaris@tin: Synchronized wmf-config/ProductionServices.php: (no justification provided) (duration: 03m 05s)
  • 12:20 akosiaris: depool poolcounter1001 for kernel upgrades
  • 12:18 marostegui: Re-enable event scheduler on dbstore1001 - T169050
  • 11:58 marostegui: Stop replication on the same position for: dbstore1001 (s6) and db1050 - T169050
  • 11:51 godog: create xfs filesystems on fourth partition on ms-be machines - T151648
  • 11:48 ema: cp4015: restart varnish-be
  • 11:32 ema: route ulsfo back to codfw T168462
  • 11:09 ema@neodymium: conftool action : set/ttl=300; selector: dnsdisc=(citoid|restbase-async)
  • 11:06 ema: repool codfw in DNS after T168462
  • 11:03 ema@neodymium: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=citoid
  • 11:02 ema@neodymium: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=restbase-async
  • 11:02 ema@neodymium: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=restbase-async
  • 10:57 elukey@tin: Finished deploy [analytics/refinery@f6cccf9]: Weekly refinery deployment (duration: 02m 56s)
  • 10:54 elukey@tin: Started deploy [analytics/refinery@f6cccf9]: Weekly refinery deployment
  • 10:53 elukey@tin: Finished deploy [analytics/refinery@f6cccf9]: Weekly refinery deployment (duration: 00m 11s)
  • 10:53 elukey@tin: Started deploy [analytics/refinery@f6cccf9]: Weekly refinery deployment
  • 10:47 ema@neodymium: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=citoid
  • 10:46 ema@neodymium: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=restbase-async
  • 10:45 ema: switching citoid and restbase-async back to codfw after T168462
  • 10:34 ema: re-enable puppet and start pybal on lvs2001-2003 T168462
  • 10:30 ema@neodymium: conftool action : set/pooled=yes; selector: name=acamar.wikimedia.org,service=pdns_recursor
  • 10:30 ema: repooling acamar T168462
  • 09:29 godog: silence paging alerts for *.svc.codfw.wmnet for two hours - T168462
  • 08:34 marostegui: Shutdown MySQL and reboot db1034 for maintenance
  • 08:29 XioNoX: asw-a-codfw upgrade started - T168462
  • 08:25 ema: failover codfw LVSs to secondaries T168462
  • 08:19 elukey: restart pdfrender on scb1004 - xpra issue
  • 08:16 volans@neodymium: conftool action : set/pooled=false; selector: name=codfw,dnsdisc=restbase-async
  • 08:15 volans@neodymium: conftool action : set/pooled=false; selector: name=codfw,dnsdisc=citoid
  • 08:14 volans@neodymium: conftool action : set/ttl=60; selector: dnsdisc=restbase-async
  • 08:14 volans@neodymium: conftool action : set/ttl=60; selector: dnsdisc=citoid
  • 08:08 volans@neodymium: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=citoid
  • 08:07 volans@neodymium: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=restbase-async
  • 08:05 ema: bounce pybal on codfw secondary LVSs (lvs2004-2006)
  • 07:57 volans: switching citoid and restbase-async temporarily to eqiad for T168462
  • 07:47 XioNoX: Route cache traffic around codfw - T168462
  • 07:46 ema@neodymium: conftool action : set/pooled=no; selector: name=acamar.wikimedia.org,service=pdns_recursor
  • 07:44 XioNoX: codfw depooled from DNS - T168462
  • 07:36 elukey: depooled kafka2001.codfw.wmnet for T168462
  • 07:19 elukey@tin: Finished deploy [analytics/refinery@f6cccf9]: Updated stat1002 with the last refinery deployment (duration: 02m 55s)
  • 07:18 marostegui: Disable event scheduler temporarily on dbstore1001 - T169050
  • 07:16 elukey@tin: Started deploy [analytics/refinery@f6cccf9]: Updated stat1002 with the last refinery deployment
  • 07:13 marostegui: Deploy alter table on s7 - db1028 - T166208
  • 07:12 elukey@tin: Finished deploy [analytics/refinery@f6cccf9]: Updated stat1002 with the last refinery deployment (duration: 02m 36s)
  • 07:10 elukey@tin: Started deploy [analytics/refinery@f6cccf9]: Updated stat1002 with the last refinery deployment
  • 07:10 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1028 - T166208 (duration: 00m 47s)
  • 07:09 joal@tin: Finished deploy [analytics/refinery@f6cccf9]: (no justification provided) (duration: 05m 55s)
  • 07:03 joal@tin: Started deploy [analytics/refinery@f6cccf9]: (no justification provided)
  • 04:01 Krinkle: 'service hhvm restart' on mwdebug1001 and mwdebug1002 (T168540)
  • 03:56 Krinkle: 'service hhvm restart' on mwdebug1001 and mwdebug1002 to help investigate T168540
  • 03:09 kartik@tin: Finished deploy [cxserver/deploy@6f0e9a7]: Update cxserver to e69353b (duration: 02m 28s)
  • 03:07 kartik@tin: Started deploy [cxserver/deploy@6f0e9a7]: Update cxserver to e69353b
  • 02:52 l10nupdate@tin: ResourceLoader cache refresh completed at Thu Jun 29 02:52:57 UTC 2017 (duration 6m 52s)
  • 02:46 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.7) (duration: 07m 43s)
  • 02:26 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.6) (duration: 09m 24s)
  • 01:08 mutante: mwlog1001 - deleted /srv/xenon/logs from 2015 and 2016 as requested by Krinkle. Also merged https://gerrit.wikimedia.org/r/#/c/362114/ so now logs are retained for 14 days
  • 00:23 krinkle@tin: Synchronized wmf-config/InitialiseSettings.php: I8ce28a4ce7 - test2wiki config cleanup (duration: 00m 47s)

2017-06-28

  • 23:44 thcipriani@tin: Synchronized php-1.30.0-wmf.7/extensions/WikimediaEvents/modules/ext.wikimediaEvents.searchSatisfaction.js: SWAT: Adding ssclick events for sister-search results T168916 (duration: 00m 46s)
  • 23:36 thcipriani@tin: Synchronized php-1.30.0-wmf.6/extensions/MobileFrontend/includes/specials/SpecialMobileDiff.php: SWAT: Revert "Run DiffViewHeader in mobile mode, too" T169024 (duration: 00m 46s)
  • 23:35 thcipriani@tin: Synchronized php-1.30.0-wmf.7/extensions/MobileFrontend/includes/specials/SpecialMobileDiff.php: SWAT: Revert "Run DiffViewHeader in mobile mode, too" T169024 (duration: 00m 47s)
  • 22:10 demon@tin: Synchronized wmf-config/InitialiseSettings.php: rm more stupid logging, wow this stuff has piled up (duration: 00m 46s)
  • 22:09 ppchelko@tin: Finished deploy [eventstreams/deploy@ba71a84]: redeploy to pick up config changes (duration: 02m 01s)
  • 22:07 ppchelko@tin: Started deploy [eventstreams/deploy@ba71a84]: redeploy to pick up config changes
  • 22:06 demon@tin: Synchronized wmf-config/InitialiseSettings.php: kill temp-debug (duration: 00m 46s)
  • 21:50 robh: wtp1025-1048 are having icinga reporting errors, they are new installs on stretch
  • 21:48 demon@tin: Synchronized wmf-config/InitialiseSettings.php: kill weird testwiki logging (duration: 00m 47s)
  • 21:38 ppchelko@tin: Finished deploy [eventstreams/deploy@05bcc8f]: redeploy to pick up config changes (duration: 00m 20s)
  • 21:37 ppchelko@tin: Started deploy [eventstreams/deploy@05bcc8f]: redeploy to pick up config changes
  • 21:34 demon@tin: Synchronized wmf-config/InitialiseSettings.php: kill oai logging channel (duration: 00m 47s)
  • 20:17 twentyafterfour@tin: Synchronized php-1.30.0-wmf.7/extensions/VisualEditor/VisualEditor.hooks.php: sync https://gerrit.wikimedia.org/r/#/c/361941/ refs T169132 T167536 (duration: 00m 47s)
  • 20:08 mutante: migrating servermon to stretch on netmon1002 is currently blocked by "python-django-south" package not existing anymore
  • 19:36 robh: puppet suspended on install1002 for robh to livehack the dhcp file for a single reboot of wtp1025
  • 19:26 twentyafterfour@tin: rebuilt wikiversions.php and synchronized wikiversions files: group1 wikis to 1.30.0-wmf.7 refs T167536
  • 19:26 twentyafterfour@tin: Synchronized php-1.30.0-wmf.7/extensions/LoginNotify/includes/Hooks.php: deploy https://gerrit.wikimedia.org/r/#/c/361935/ to wmf.7 refs T168899 + T167536 (duration: 00m 45s)
  • 19:17 twentyafterfour: cherry-picked https://gerrit.wikimedia.org/r/#/c/361935/ to wmf.7 refs T168899 + T167536
  • 19:00 ebernhardson: starting load testing of elasticsearch in codfw
  • 18:31 joal@tin: Finished deploy [analytics/refinery@f6cccf9]: Regular deploy - One week late- Big changes (duration: 04m 49s)
  • 18:26 joal@tin: Started deploy [analytics/refinery@f6cccf9]: Regular deploy - One week late- Big changes
  • 18:13 thcipriani@tin: Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable autopatrol flag on ptwikivoyage T168981 (duration: 00m 47s)
  • 18:05 aaron@tin: Synchronized wmf-config/CommonSettings.php: Set $wgTrxProfilerLimits[PostSend] to avoid notices for now (duration: 00m 47s)
  • 18:04 kartik@tin: Finished deploy [cxserver/deploy@894e3fe]: (no justification provided) (duration: 02m 03s)
  • 18:02 kartik@tin: Started deploy [cxserver/deploy@894e3fe]: (no justification provided)
  • 15:52 marostegui: Temporarily ignore jawiki.watchlist table during replication on dbstore1001 - T169050
  • 15:47 kartik@tin: Finished deploy [cxserver/deploy@894e3fe]: (no justification provided) (duration: 02m 47s)
  • 15:44 kartik@tin: Started deploy [cxserver/deploy@894e3fe]: (no justification provided)
  • 15:29 jynus: slowly enabling puppet on pending database hosts, checking diff on each one
  • 14:42 hashar: pypi.python.org is back again - T169091
  • 14:06 hashar: pypi.python.org has an issue with its CDN. That would affect any CI jobs relying on tox/python - See https://status.python.org for updates and T169091
  • 14:03 hashar: pypi.python.org has an issue with its CDN. That would affect any CI jobs relying on tox/python - See https://status.python.org for updates
  • 13:51 XioNoX: tightening BGP configuration on cr* routers - T169048
  • 13:44 gehel: start reimage of the maps-test cluster - T169011
  • 13:30 akosiaris: renumber install1002
  • 12:47 marostegui: Deploy alter table on s3 directly on codfw master (db2018) and let it replicate - T168661
  • 12:42 jynus: starting enabling puppet on db2* hosts
  • 12:37 XioNoX: restricted inbound BGP to configured neighbors on pfw - T169048
  • 12:18 marostegui: Deploy alter table on s7 directly on codfw master (db2029) and let it replicate - T168661
  • 11:48 akosiaris: renumber dubnium fermium meitnerium ununpentium
  • 11:14 elukey: stop eventlogging_sync on db1047 - alter tables running
  • 11:04 jynus: restarting db2062's mysql
  • 10:52 jynus: restarting db2072's mysql for testing of new config
  • 09:05 legoktm@tin: Synchronized php-1.30.0-wmf.7/includes/parser/ParserCache.php: Add debug logging for T168040 (duration: 00m 46s)
  • 08:46 legoktm@tin: Synchronized php-1.30.0-wmf.6/includes/parser/ParserCache.php: Add debug logging for T168040 (duration: 00m 48s)
  • 07:49 jynus: disable puppet on all database hosts for deployment of gerrit:361456
  • 07:33 marostegui: Re-enable event scheduler on dbstore2001 - T168354
  • 07:01 elukey: stop jobrunner/jobchron on mw130[4,5,6] and reboot them for kernel updates
  • 06:43 elukey: stop jobrunner/jobchron on mw130[2,3] and reboot them for kernel updates
  • 06:37 elukey: restart pdfrender.service on scb1003 - xpra race condition
  • 06:35 elukey: executed sudo -u _graphite find /var/lib/carbon/whisper/eventstreams/rdkafka -type f -mtime +10 -delete on graphite1001 to free space
  • 06:34 marostegui: Stop Replication in sync on db2033 and dbstore2001 (x1) - T168354
  • 05:55 marostegui: Temporarily disable event scheduler on dbstore2001 - https://phabricator.wikimedia.org/T168354
  • 05:27 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Remove comments from db1033 status - T166208 (duration: 00m 47s)
  • 05:24 marostegui: Stop MySQL and reboot db1034 for maintenance - T166208
  • 03:05 l10nupdate@tin: ResourceLoader cache refresh completed at Wed Jun 28 03:05:57 UTC 2017 (duration 7m 0s)
  • 02:58 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.7) (duration: 14m 50s)
  • 02:46 eileen: Update civicrm from d558df2 to e53d621
  • 02:24 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.6) (duration: 07m 55s)
  • 01:43 demon@tin: Synchronized README: profiling (duration: 00m 47s)

2017-06-27

  • 23:26 demon@tin: Synchronized php-1.30.0-wmf.6/extensions/RelatedArticles/: Hygiene and stuff (duration: 00m 46s)
  • 23:22 demon@tin: Synchronized wmf-config/InitialiseSettings.php: Only enable logging on enwiki for MobileFormatter#moveFirstParagraphBeforeInfobox (duration: 00m 46s)
  • 23:20 demon@tin: Synchronized wmf-config/InitialiseSettings.php: Removing wgMFContentNamespace (duration: 00m 46s)
  • 23:14 demon@tin: Synchronized portals: (no justification provided) (duration: 00m 47s)
  • 23:13 demon@tin: Synchronized portals/prod/wikipedia.org/assets: (no justification provided) (duration: 00m 47s)
  • 23:05 demon@tin: Synchronized dblists/: ukwikimedia swapped from closed to deleted (duration: 00m 46s)
  • 22:44 demon@tin: Synchronized README: force co-master sync (duration: 00m 47s)
  • 21:58 bblack: pybal restarts on lvs4004,lvs4002 for misc@ulsfo
  • 21:50 bblack: removing cp4001-4 (cache_misc@ulsfo), expect a few minor related alerts from race conditions
  • 21:24 bblack: cp1074: restart backend (mailbox lag)
  • 21:03 twentyafterfour@tin: rebuilt wikiversions.php and synchronized wikiversions files: group0 wikis to 1.30.0-wmf.7 refs T167536
  • 20:46 twentyafterfour@tin: Finished scap: sync 1.30.0-wmf.7 and promote to test wikis - refs T167536 (duration: 30m 44s)
  • 20:16 twentyafterfour@tin: Started scap: sync 1.30.0-wmf.7 and promote to test wikis - refs T167536
  • 18:41 godog: switch thumbor back on with a fix for T168949
  • 18:35 godog: upgrade thumbor to 0.1.41
  • 18:25 gehel: reduce cluster_concurrent_rebalance to 8 and node_concurrent_recoveries to 4 on elasticsearch eqiad
  • 18:05 hashar: Some CI jobs are broken with "tidy.so: cannot open shared object file: No such file or directory" see T169004
  • 17:52 twentyafterfour: branching 1.30.0-wmf.7 - T167536
  • 17:44 bblack: restart pybal on lvs4004
  • 16:37 mutante: releases1001 - setting boot parameters to network, rebooting
  • 16:26 mutante: rebooting ganeti instance releases1001 - which is down network-wise but was running
  • 16:23 godog: revert back to imagescalers for thumbs - T168949
  • 16:22 twentyafterfour: restarted apache on iridium, phabricator was running an old version of libphutil
  • 14:22 elukey: stop jobchron/jobrunner on mw1300 and mw1301 and reboot the hosts for kernel updates
  • 13:52 marostegui: Rename table enwiki.localisation_file_hash on db1089 - T119811
  • 12:35 marostegui: Deploy alter table on s4 directly on codfw master (db2019) to let it replicate - T168661
  • 12:19 marostegui: Deploy alter table on s5 directly on codfw master (db2023) to let it replicate - T168661
  • 12:06 elukey: stop jobchron/jobrunner on mw1167 and mw1299 and reboot the hosts for kernel updates
  • 11:58 marostegui: Deploy alter table on s6 directly on codfw master (db2028) to let it replicate - T168661
  • 11:54 elukey: stop nova-spiceproxy and neutron-metadata-agent on labtestnet2001 to avoid root partition to fill up
  • 11:48 akosiaris: upload apertium-spa-cat_2.1.0~r79717-1 to apt.wikimedia.org/jessie-wikimedia/main
  • 11:36 elukey: stop jobcron/jobrunner on mw116[56] and reboot the hosts for kernel updates
  • 11:36 akosiaris: upload apertium-spa_1.1.0~r79716-1+wmf1 to apt.wikimedia.org/jessie-wikimedia/main
  • 11:36 akosiaris: upload apertium-cat_2.2.0~r79715-1+wmf1 to apt.wikimedia.org/jessie-wikimedia/main
  • 10:29 elukey: stop jobcron/jobrunner on mw116[34] and reboot the hosts for kernel updates
  • 10:25 elukey: re-enabled puppet and eventlogging_sync on db1047
  • 09:49 marostegui: executing alter tables to the log database on dbstore1002 for https://phabricator.wikimedia.org/T167162#3340421
  • 09:43 bawolff@tin: Synchronized php-1.30.0-wmf.6/api.php: Use redirect for api requests with pathinfo (duration: 00m 43s)
  • 09:24 gehel: restart of maps eqiad cluster completed
  • 08:59 elukey: stop puppet and eventlogging_sync on db1047
  • 08:46 elukey: executing alter tables to the log database on db1047 for https://phabricator.wikimedia.org/T167162#3340421
  • 08:44 gehel: reboot maps eqiad cluster
  • 08:33 gehel: restart of maps codfw cluster completed
  • 08:25 akosiaris: upload etherpad-lite_1.6.0-3 to apt.wikimedia.org/jessie-wikimedia/main
  • 08:18 elukey: stop jobcron/jobrunner on mw116[12] and reboot the hosts for kernel updates
  • 08:14 marostegui: Re-enable event scheduler on dbstore2001 - T168354
  • 08:08 godog: roll-restart swift-proxy on ms-fe1* to pick up thumbor changes
  • 07:57 gehel: reboot maps codfw cluster
  • 07:16 marostegui: Temporarily disable event scheduler on dbstore2001 - T168354
  • 07:11 marostegui: Deploy alter table db1034 - T166208
  • 06:48 marostegui: Deploy alter table s7 on labsdb1001 - T166208
  • 06:47 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1034 - T166208 (duration: 00m 43s)
  • 06:40 marostegui: Deploy alter table s7 - dbstore1002 - T166208
  • 05:58 elukey: restored rdb2004 as slave of rdb2003 (end of experiment)
  • 05:08 marostegui: Global rename of Green Cardamom → GreenC - T168776
  • 05:04 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1079 - T166208 (duration: 00m 43s)
  • 03:43 mutante: smokeping on stretch means 2.6.11-3 vs 2.6.9-1 we had before
  • 03:35 mutante: smokeping - stop/rsync/fix permissions/start one more time to minimize gaps in graphs - now fully migrated netmon1001->netmon1002, historic data has been copied (T159756)
  • 03:28 mutante: netmon1002 - ganglia apache_status.py broken in stretch (?), ganglia deprecated, stopping gmond, aggregator role got removed, was for torrus
  • 03:03 mutante: netmon1002 - fixing permissions on /var/lib/smokeping rrd files (rsynced, inconsistent UIDs)
  • 02:29 l10nupdate@tin: ResourceLoader cache refresh completed at Tue Jun 27 02:29:22 UTC 2017 (duration 6m 25s)
  • 02:22 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.6) (duration: 07m 46s)
  • 00:39 mutante: netmon1001 - rsyncing smokeping data (/var/lib/smokeping) over to netmon1002

2017-06-26

  • 23:51 maxsem@tin: Synchronized php-1.30.0-wmf.6/extensions/Kartographer/: https://gerrit.wikimedia.org/r/#/c/361584/ (duration: 00m 44s)
  • 23:38 maxsem@tin: Synchronized fonts/: https://gerrit.wikimedia.org/r/361195 (duration: 00m 45s)
  • 23:24 twentyafterfour@tin: Synchronized php-1.30.0-wmf.6/extensions/Scribunto/engines/LuaSandbox/Engine.php: deploy https://gerrit.wikimedia.org/r/#/c/361508 (duration: 00m 43s)
  • 23:23 twentyafterfour: deploying https://gerrit.wikimedia.org/r/#/c/361508
  • 22:56 halfak@tin: Finished deploy [ores/deploy@82dfd56]: Unscheduled/urgent deploy (T168099) (duration: 30m 55s)
  • 22:49 bd808: Updated LDAP loginShell to /bin/bash for 969 accounts that were still set to /usr/local/bin/sillyshell (T86668)
  • 22:34 legoktm@tin: Synchronized php-1.30.0-wmf.6/extensions/Linter/includes/ApiRecordLint.php: Add debug logging for missing 'dsr' - T168900 (duration: 00m 43s)
  • 22:32 legoktm@tin: Synchronized wmf-config/InitialiseSettings.php: Enable 'Linter' debug log channel (duration: 00m 44s)
  • 22:27 mutante: netmon1001 - deactivate rancid crons - now running on netmon1002 instead - avoid duplicate mails (T159756)
  • 22:25 halfak@tin: Started deploy [ores/deploy@82dfd56]: Unscheduled/urgent deploy (T168099)
  • 21:50 robh: shutting down and decommissioning mw117[0-9] per T168271
  • 21:27 bawolff: deployed patch for T128209
  • 21:00 robh: attempting firmware update on lvs1007, which is currently offline
  • 20:38 bsitzmann@tin: Finished deploy [mobileapps/deploy@07066c7]: Update mobileapps to 0b05026 (duration: 03m 41s)
  • 20:34 bsitzmann@tin: Started deploy [mobileapps/deploy@07066c7]: Update mobileapps to 0b05026
  • 19:56 herron: updated ops list accept_these_nonmembers regex (T168903)
  • 19:41 hashar: Restarted Jenkins to lower console log spam ( https://gerrit.wikimedia.org/r/#/c/359116/ )
  • 19:35 urandom: T160570: Upgrading restbase-dev1003 to Cassandra 3.11.0 (release)
  • 19:30 urandom: T160570: Upgrading restbase-dev1002 to Cassandra 3.11.0 (release)
  • 19:05 mobrovac@tin: Finished deploy [restbase/deploy@3975ab2]: Update Parsoid HTML version to 1.5.0 - T39902 (duration: 06m 16s)
  • 18:59 mobrovac@tin: Started deploy [restbase/deploy@3975ab2]: Update Parsoid HTML version to 1.5.0 - T39902
  • 18:51 arlolra: Updated Parsoid to b59045f2 (T39902, T149794)
  • 18:32 urandom: T160570: Upgrading restbase-dev1001 to Cassandra 3.11.0 (release)
  • 18:31 arlolra@tin: Finished deploy [parsoid/deploy@70538a6]: Updating Parsoid to b59045f2 (duration: 11m 13s)
  • 18:20 arlolra@tin: Started deploy [parsoid/deploy@70538a6]: Updating Parsoid to b59045f2
  • 18:18 niharika29@tin: Finished scap: wmf-config/InitialiseSettings.php Deploy Quiz extension on huwikibooks (https://gerrit.wikimedia.org/r/#/c/361084) (duration: 03m 14s)
  • 18:15 niharika29@tin: Started scap: wmf-config/InitialiseSettings.php Deploy Quiz extension on huwikibooks (https://gerrit.wikimedia.org/r/#/c/361084)
  • 18:14 niharika29@tin: scap failed: RuntimeError scap failed: average error rate on 1/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/3888cca979647b9381a7739b0bdbc88e for details) (duration: 02m 15s)
  • 18:14 niharika29@tin: scap failed: average error rate on 1/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/3888cca979647b9381a7739b0bdbc88e for details)
  • 18:11 niharika29@tin: Started scap: wmf-config/InitialiseSettings.php Deploy Quiz extension on huwikibooks (https://gerrit.wikimedia.org/r/#/c/361084)
  • 17:46 twentyafterfour@tin: rebuilt wikiversions.php and synchronized wikiversions files: all wikis to 1.30.0-wmf.6
  • 17:36 twentyafterfour: Deploying 1.30.0-wmf.6 to all wikis refs T167535
  • 17:35 twentyafterfour: resuming the train for wmf.6 which was blocked at group 1
  • 17:12 gehel@tin: Finished deploy [wdqs/wdqs@f8b9294]: (no justification provided) (duration: 03m 42s)
  • 17:09 gehel@tin: Started deploy [wdqs/wdqs@f8b9294]: (no justification provided)
  • 16:59 elukey: EXPERIMENT - T163337 - set slaveof no one on rdb2004 to remove its dependency to rdb2003 (puppet disabled on rdb2004, to rollback just systemctl unmask redis-instance-tcp_6380.service, enable/run puppet and start redis if it is not up)
  • 16:55 elukey: stop neutron-server on labtestnet2001 to keep the root partition from filling up
  • 15:41 marostegui: Deploy alter table s7 - db1079 - T166208
  • 15:38 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1079 - T166208 (duration: 00m 46s)
  • 15:33 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1086 - T166208 (duration: 00m 46s)
  • 14:47 marostegui: Deploy alter table on silver and labtestweb2001 - T168661
  • 13:49 marostegui: Deploy alter table s7 - db1033 - T166208
  • 13:48 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Add comments to db1033 status - T166208 (duration: 00m 48s)
  • 13:08 elukey: truncate /var/log/upstart/neutron-server.log on labtestnet2001 (root filled up, spam in logs for 'ERROR neutron.service OperationalError: (sqlite3.OperationalError) no such table:')
  • 12:58 marostegui: Deploy alter table on db2062 and db2055 - T168661
  • 12:55 elukey: reboot mw129[5,6,7,8] for kernel update (mw imagescalers, two at a time)
  • 12:02 marostegui: Deploy alter table on s2 codfw master (db2017) and let it replicate - T168661
  • 11:05 godog: roll-restart pybal in codfw to pick up thumbor.svc.codfw.wmnet
  • 10:28 elukey: reboot mw1288->90 for kernel updates (last batch of api-appservers)
  • 10:18 elukey: reboot mw128[4,5,6,7] for kernel updates (api-appservers)
  • 10:03 godog: roll-restart nginx on thumbor to disable te: chunked
  • 09:34 elukey: reboot mw128[0,1,2,3] for kernel updates (api-appservers)
  • 09:04 elukey: reboot mw127[6,7,8,9] for kernel updates (api-appservers)
  • 08:58 elukey: reboot mw127[3,4,5] for kernel updates (appservers)
  • 08:50 gehel: starting restart of elasticsearch codfw for kernel upgrade
  • 08:48 elukey: reboot mw1269 -> mw1272 for kernel updates (appservers)
  • 08:37 godog: roll-restart swift-proxy to use thumbor for commons
  • 08:28 elukey: reboot mw1258, 126[6,7,8] for kernel updates (appservers)
  • 08:11 elukey: reboot mw125[4,5,6,7] for kernel updates (appservers)
  • 07:55 marostegui: Stop replication on db1069:3313 (s3) and db1044 in the same position - T166546
  • 07:15 elukey: restart pdfrender on scb1002 for the xpra issue
  • 07:08 elukey: powercycle elastic1017 (stuck in console, no ssh access)
  • 06:57 marostegui: Drop table wikilove_image_log from silver - T127219
  • 06:56 elukey: truncated neutron-server.log files in /var/log on labtestnet2001 to free some space in root
  • 06:55 marostegui: Drop table wikilove_image_log from s1 - T127219
  • 06:51 marostegui: Drop table wikilove_image_log from s3 - T127219
  • 06:50 elukey: execute sudo -u _graphite find /var/lib/carbon/whisper/eventstreams/rdkafka -type f -mtime +15 -delete on graphite1001 to free some space for /var/lib/carbon
  • 06:49 marostegui: Drop table wikilove_image_log from s7 - T127219
  • 06:47 marostegui: Drop table wikilove_image_log from s2 - T127219
  • 06:45 marostegui: Drop table wikilove_image_log from s4 - T127219
  • 06:44 marostegui: Drop table wikilove_image_log from s6 - T127219
  • 06:36 marostegui: Deploy alter table s7 - db1086 - T166208
  • 06:35 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1086 - T166208 (duration: 00m 46s)
  • 06:26 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Remove comments from db1041 long running alter status - T166208 (duration: 00m 47s)
  • 03:01 l10nupdate@tin: ResourceLoader cache refresh completed at Mon Jun 26 03:01:35 UTC 2017 (duration 6m 52s)
  • 02:54 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.6) (duration: 08m 04s)
  • 02:27 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.5) (duration: 08m 03s)

2017-06-25

  • 09:00 elukey: Executing 'sudo -u _graphite find /var/lib/carbon/whisper/eventstreams/rdkafka -type f -mtime +15 -delete' on graphite1001 to free some space (/var/lib/carbon filling up) - T1075

2017-06-23

  • 23:42 akosiaris: bounce celery-ores-worker on scb1004
  • 19:38 ppchelko@tin: Finished deploy [changeprop/deploy@ffabd13]: Re-enable ORES rules back (duration: 01m 07s)
  • 19:37 ppchelko@tin: Started deploy [changeprop/deploy@ffabd13]: Re-enable ORES rules back
  • 19:34 akosiaris: restart celery-ores-workers on scb1001, scb1002, scb1003, leave scb1004 alone
  • 18:39 godog: roll restart celery-ores-worker in codfw
  • 17:01 mobrovac@tin: Finished deploy [changeprop/deploy@1f45fae]: Temporary disable ORES (ongoing outage) (duration: 01m 19s)
  • 16:59 mobrovac@tin: Started deploy [changeprop/deploy@1f45fae]: Temporary disable ORES (ongoing outage)
  • 16:44 mobrovac: scb1001 disabling puppet
  • 16:34 akosiaris: restart celery ores worker on scb1003
  • 15:54 hashar_: Restarted Jenkins
  • 15:45 godog: bounce celery-ores-worker on scb1001 with logging level INFO
  • 13:51 akosiaris: issue flushdb on oresrdb1001:6379
  • 13:21 akosiaris: issue flushdb on oresrdb1001:6379
  • 13:13 akosiaris: bump uwsgi-ores and celery-ores-worker on scb100*
  • 12:38 akosiaris: disable changeprop due to ORES issues
  • 12:26 Amir1: restarting celery and uwsgi on all scb nodes in eqiad
  • 11:55 Amir1: restarted uwsgi-ores and celery-ores-worker services in scb1003
  • 11:45 ema: scb1001: restart pdfrender.service
  • 09:55 elukey: reboot mw1250-53 for kernel updates
  • 09:27 jynus: reapplying dns change - small downtime on tendril until puppet deploy and run
  • 08:38 jynus: deploying dns change to tendril
  • 06:17 mutante: releases1001 - systemctl reset-failed to clear Icinga systemd status CRIT - service puppet
  • 06:17 marostegui: Deploy alter table on db1041 - s7 - T166208
  • 06:15 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Add comments to db1041 long running alter status - T166208 (duration: 00m 46s)
  • 06:08 marostegui@tin: Synchronized wmf-config/db-codfw.php: Repool db2066 - T168354 (duration: 00m 46s)
  • 05:59 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1026 - T166207 (duration: 00m 47s)
  • 00:15 mutante: RT (ununpentium) installing pending package upgrades

2017-06-22

  • 23:15 Dereckson: kbp.wikipedia wiki creation done.
  • 23:11 dereckson@tin: Synchronized wmf-config/interwiki.php: Add kbp.wikipedia to interwiki map (T160868) (duration: 00m 46s)
  • 23:07 dereckson@tin: Synchronized wmf-config/InitialiseSettings.php: Add kbp.wikipedia to interwiki map (T160868) (duration: 00m 47s)
  • 22:56 dereckson@tin: Synchronized wmf-config/InitialiseSettings.php: Initial configuration for kbp.wikipedia (T160868) (duration: 00m 45s)
  • 22:54 dereckson@tin: Synchronized langlist: +kbp (T160868) (duration: 00m 46s)
  • 22:53 dereckson@tin: rebuilt wikiversions.php and synchronized wikiversions files: +kbpwiki (T160868)
  • 22:52 dereckson@tin: Synchronized dblists: (no justification provided) (duration: 00m 48s)
  • 22:51 Dereckson: Create tables for kbpwiki (T160868)
  • 21:43 RainbowSprinkles: gerrit: Stopping momentarily, reindexing accounts
  • 21:03 andrewbogott: restarting rabbitmq-server on labcontrol1001
  • 20:34 mutante: icinga - re-enabling disabled notifications for IPMI temp checks on some mc* and mw* hosts where check is fine and OK
  • 20:21 andrewbogott: labtestnet2001 turning neutron debug logs off because they're flooding the (very small) '/' partition
  • 19:52 twentyafterfour: the train is currently blocked by https://phabricator.wikimedia.org/T168681
  • 19:31 thcipriani@tin: Finished scap: SWAT: Translation updates for QuickSurveys T131949 (duration: 22m 10s)
  • 19:09 thcipriani@tin: Started scap: SWAT: Translation updates for QuickSurveys T131949
  • 19:04 thcipriani@tin: Synchronized wmf-config: SWAT: Create a FeaturedFeed for the Wikimag bulletin on frwiki T168005 (duration: 00m 54s)
  • 18:51 thcipriani@tin: Synchronized wmf-config/InitialiseSettings.php: SWAT: Grant the "movefile" right to the "autopatrolled" group on rowiki T168192 (duration: 00m 48s)
  • 18:39 thcipriani@tin: Synchronized php-1.30.0-wmf.6/extensions/WikimediaEvents/modules/ext.wikimediaEvents.searchSatisfaction.js: SWAT: Switch to data-attribute for sister-search sidebar results T164854 (duration: 00m 50s)
  • 18:29 thcipriani@tin: Synchronized wmf-config: SWAT: relatedArticles: SamplingRate -> BucketSize PART II (duration: 00m 48s)
  • 18:27 thcipriani@tin: Synchronized wmf-config/InitialiseSettings.php: SWAT: relatedArticles: SamplingRate -> BucketSize PART I (duration: 00m 53s)
  • 18:24 jynus: restart db2062
  • 17:51 jynus: testing in-place upgrade from jessie to stretch of db2062
  • 17:34 bsitzmann@tin: Finished deploy [mobileapps/deploy@7bfe571]: Update mobileapps to 21f771d (duration: 02m 54s)
  • 17:31 bsitzmann@tin: Started deploy [mobileapps/deploy@7bfe571]: Update mobileapps to 21f771d
  • 17:24 gehel: restarting logstash on logstash1001 to validate plugin deployment with scap3
  • 17:23 gehel@tin: Finished deploy [logstash/plugins@720b648]: (no justification provided) (duration: 00m 02s)
  • 17:23 gehel@tin: Started deploy [logstash/plugins@720b648]: (no justification provided)
  • 17:14 gehel: moving to scap for logstash plugin deployment
  • 17:13 jynus: disable puppet on db2062 before maintenance
  • 17:05 andrewbogott: rebooting labsdb1007
  • 17:04 bd808: Log events between 15:46 and 17:03 missed due to stashbot downtime
  • 17:03 andrewbogott: rebooting labsdb1007
  • 15:46 moritzm: repooling scb1003 after hardware maintenance
  • 15:31 otto@tin: Finished deploy [eventlogging/analytics@328dea6]: inserting eventlogging events into mysql based on topic name if it exists, falling back to schema name (duration: 00m 03s)
  • 15:31 otto@tin: Started deploy [eventlogging/analytics@328dea6]: inserting eventlogging events into mysql based on topic name if it exists, falling back to schema name
  • 15:21 moritzm: rebooting restbase2005 for kernel update
  • 14:37 gehel: restarting maps-test cluster for kernel upgrade
  • 14:22 gehel: restart wdqs servers completed
  • 13:55 gehel: restart wdqs servers for kernel upgrade
  • 13:45 akosiaris: reboot planet1001 for kernel upgrades and renumbering
  • 13:21 moritzm: rebooting restbase2006 for kernel update
  • 13:09 hashar@tin: Synchronized wmf-config/InitialiseSettings.php: Enable Reader Survey using QuickSurveys - T131949 (duration: 01m 04s)
  • 12:17 moritzm: rebooting restbase2008 for kernel update
  • 11:21 moritzm: rebooting ms-be2026 to ms-be2030 for kernel update
  • 11:12 moritzm: rebooting restbase2009 for kernel update
  • 10:45 ema: cp1074: restart varnish backend
  • 10:25 moritzm: rebooting ms-be2022 to ms-be2025 for kernel update
  • 10:19 moritzm: rearmed keyholder on tin
  • 10:12 moritzm: rebooting restbase2010 for kernel update
  • 10:00 moritzm: depooled mw1228, broken disk caused boot failure
  • 09:50 moritzm: rebooting tin for kernel update
  • 09:46 jynus: reimage db2072
  • 09:42 moritzm: powercycling mw1228, stuck in reboot
  • 09:36 akosiaris: rebooting chlorine.eqiad.wmnet etcd1004.eqiad.wmnet etcd1005.eqiad.wmnet mwdebug1002.eqiad.wmnet neon.eqiad.wmnet sca1004.eqiad.wmnet for kernel upgrades
  • 09:25 moritzm: rebooting mw1221-mw1235 for kernel update
  • 09:15 moritzm: rebooting restbase2011 for kernel update
  • 09:11 marostegui: Deploy alter table s5 - labsdb1003 - T166207
  • 09:06 elukey: rebooting kafka100[23] for kernel updates (eventbus eqiad)
  • 09:01 moritzm: rebooting rhenium for kernel update
  • 08:55 marostegui: Stop MySQL and reboot labsdb1011 - T168584
  • 08:50 moritzm: rebooting restbase2012 for kernel update
  • 08:44 marostegui: Stop MySQL and reboot labsdb1010 - T168584
  • 08:40 moritzm: rearmed keyholder on naos
  • 08:32 akosiaris: reboot etcd1002 for kernel upgrades
  • 08:20 moritzm: rebooting naos for kernel update
  • 08:20 marostegui: Stop MySQL and reboot labsdb1009 - T168584
  • 08:07 moritzm: powercycling labtestservices2001 (didn't come up after reboot)
  • 07:26 moritzm: rebooting suhail/subra for kernel update
  • 07:24 elukey: reboot kafka1001 for kernel updates (eventbus eqiad)
  • 07:24 marostegui: Deploy alter table s5 - db1026 - T166207
  • 07:21 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1026 - T166207 (duration: 00m 44s)
  • 07:15 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1045 - T166207 (duration: 01m 03s)
  • 07:12 marostegui: Deploy alter table s5 - dbstore1001 - T166207
  • 06:53 moritzm: rebooting mw1205-mw1208 for kernel update
  • 06:38 moritzm: rebooting bast2001 for kernel update
  • 05:34 moritzm: rebooting mw1238-mw1249 for kernel update
  • 05:02 moritzm: rebooting ms-be2015-ms-be2020 for kernel update
  • 03:40 mutante: regarding my last log message: this is just true for stretch! ah!
  • 03:35 mutante: netmon1002 - installed psmisc to have 'killall' - will clean it up, but also suggest we add psmisc to base packages. it provides killall, fuser, pstree...
  • 02:49 l10nupdate@tin: ResourceLoader cache refresh completed at Thu Jun 22 02:49:58 UTC 2017 (duration 6m 53s)
  • 02:43 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.6) (duration: 07m 23s)
  • 02:24 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.5) (duration: 08m 10s)
  • 00:15 twentyafterfour: finished phabricator deployments
  • 00:13 twentyafterfour: deploying https://phabricator.wikimedia.org/D687

2017-06-21

  • 23:19 twentyafterfour@tin: Synchronized static/images/project-logos/wikimania2017wiki.png: swat (duration: 00m 45s)
  • 23:10 twentyafterfour@tin: Synchronized static/images/project-logos/wikimania2017wiki.png: swat (duration: 00m 45s)
  • 22:38 mutante: new language din.wikipedia.org has been created in DNS - Dinka is a Nilotic dialect cluster spoken by the Dinka people, the major ethnic group of South Sudan. (T168518) - https://en.wikipedia.org/wiki/Dinka_language
  • 22:34 mutante: DNS - authdns-gen-zones -f /srv/authdns/git/templates /etc/gdnsd/zones && gdnsd checkconf && gdnsd reload-zones to trigger template recreation after edit to langs.tmpl
  • 22:31 chasemp: remove manual 10.64.37.26 definition from eth1 on labstore1005 in /etc/network/interfaces
  • 22:27 chasemp: reboot labstore1004 to reset network config from boot
  • 21:44 RainbowSprinkles: cobalt: updated to 2.13.8-11-gde96955fb2 (T168360, T161206)
  • 21:40 RainbowSprinkles: gerrit2001: updated to 2.13.8-11-gde96955fb2 (T168360, T161206)
  • 21:14 mutante: apt.wm.org - reprepro copy stretch-wikimedia jessie-wikimedia gerrit - make gerrit available in stretch
  • 21:05 mutante: apt.wm.org - reprepro, include gerrit_2.13.8+git1-wmf.6 for jessie-wikimedia
  • 21:02 mutante: install1002 - rsynced gerrit packages from copper, closed firewall again, cleaned up rsyncd config from old unused things
  • 20:57 arlolra: Updated Parsoid to 881ade32 (T127421, T167933, T167714)
  • 20:50 mutante: install1002 - allow rsync from copper (build host) to /srv/wikimedia/incoming , temp for package upload
  • 20:49 arlolra@tin: Finished deploy [parsoid/deploy@2c4c0de]: Updating Parsoid to 881ade32 (duration: 12m 02s)
  • 20:37 arlolra@tin: Started deploy [parsoid/deploy@2c4c0de]: Updating Parsoid to 881ade32
  • 20:35 mutante: install1002 - removing rsyncd config fragments from carbon migration, running puppet
  • 20:25 twentyafterfour@tin: rebuilt wikiversions.php and synchronized wikiversions files: group1 wikis to 1.30.0-wmf.6
  • 20:15 bearND: rolled back deploy since scap could not connect to scb1003
  • 20:14 twentyafterfour@tin: Synchronized php-1.30.0-wmf.6/includes/gallery/ImageGalleryBase.php: deploy https://gerrit.wikimedia.org/r/#/c/360695/ refs T168479 to unblock the train (duration: 00m 56s)
  • 20:13 bsitzmann@tin: Finished deploy [mobileapps/deploy@7bfe571]: Update mobileapps to 21f771d (duration: 08m 43s)
  • 20:13 andrewbogott: deleting the old IAD wikitech-static server so we stop paying rackspace for it
  • 20:11 ppchelko@tin: Finished deploy [changeprop/deploy@63e6a7b]: Actually start black-listing and rate-limiting articles. T161710 (duration: 01m 16s)
  • 20:09 ppchelko@tin: Started deploy [changeprop/deploy@63e6a7b]: Actually start black-listing and rate-limiting articles. T161710
  • 20:04 bsitzmann@tin: Started deploy [mobileapps/deploy@7bfe571]: Update mobileapps to 21f771d
  • 19:41 mutante: copper: building gerrit_2.13.8+git1-wmf.6 for stretch (experimental)
  • 19:39 mutante: copper: building gerrit_2.13.8+git1-wmf.6 for jessie
  • 19:30 twentyafterfour: The train for wmf.6 (T167535) is currently blocked by T168479
  • 19:13 madhuvishy: Rebooting labstore1004 (secondary in drbd pair)
  • 18:55 andrewbogott: rebooting labnet1001, which will cause a labs-wide network outage
  • 18:46 gehel: restarting wdqs-updater on all wdqs servers
  • 18:38 krinkle@tin: Synchronized static/images/: I737e6f9fce (duration: 00m 46s)
  • 18:20 gehel@tin: Finished deploy [wdqs/wdqs@d67d4a4]: (no justification provided) (duration: 01m 50s)
  • 18:18 gehel@tin: Started deploy [wdqs/wdqs@d67d4a4]: (no justification provided)
  • 18:17 gehel: deploying wdqs to fix missing lib
  • 18:14 andrewbogott: rebooting labnet1002
  • 18:09 andrewbogott: rebooting labnodepool1001
  • 18:02 andrewbogott: rebooting labcontrol1001
  • 18:02 andrewbogott: rebooting labservices1001
  • 18:02 andrewbogott: rebooting silver
  • 18:02 andrewbogott: rebooting californium
  • 17:59 andrewbogott: rebooting labservices1001
  • 17:58 andrewbogott: disabling the openstack scheduler so that we don't get new inconsistent VMs during some reboots
  • 17:53 andrewbogott: rebooting labcontrol1002
  • 17:53 andrewbogott: rebooting labservices1002
  • 17:37 twentyafterfour: phabricator is back online
  • 17:36 andrewbogott: rebooting labvirt1013
  • 17:35 herron: iridium - upgraded exim packages and rebooted to apply kernel upgrade
  • 17:35 ottomata: beginning reboots of kafka10(14|18|20|22) for kernel upgrade
  • 17:34 twentyafterfour: phabricator will be offline momentarily while iridium reboots
  • 17:25 andrewbogott: rebooting labvirt1012
  • 17:12 andrewbogott: rebooting labvirt1011
  • 16:59 andrewbogott: rebooting labvirt1010
  • 16:57 herron: reboot fermium (lists) for kernel upgrade
  • 16:42 andrewbogott: rebooting labvirt1009
  • 16:41 moritzm: rebooting video scalers in codfw for kernel update
  • 16:35 moritzm: rebooting mw1293/mw1294 for kernel update
  • 16:32 andrewbogott: rebooting labvirt1008
  • 15:53 godog: upgrade ms-be10[31-39] to swift 2.10
  • 15:46 ema: reboot lvs[4001-4002] (ulsfo primaries) for kernel update
  • 15:45 moritzm: upgrade ms-be2013/ms-be2014 to final stretch release and reboot for kernel update
  • 15:34 ema: reboot lvs[4003-4004] (ulsfo secondaries) for kernel update
  • 15:32 moritzm: reboot image scalers in codfw for kernel update
  • 15:32 andrewbogott: rebooting labvirt1007
  • 15:13 andrewbogott: rebooting labvirt1006
  • 15:04 moritzm: rebooting ruthenium for kernel update
  • 15:01 moritzm: reboot job runners in codfw for kernel update
  • 15:01 elukey: reboot kafka200[23] for kernel updates (eventbus codfw)
  • 14:53 andrewbogott: rebooting labvirt1005
  • 14:40 moritzm: reboot remaining scb* hosts for kernel update
  • 14:38 andrewbogott: rebooting labvirt1004
  • 14:32 ema: reboot lvs[3001-3002] (esams primaries) for kernel update
  • 14:25 andrewbogott: rebooting labvirt1003
  • 14:21 andrewbogott: rebooting labvirt1002
  • 14:18 herron: rebooting mx1001 for kernel upgrade
  • 14:08 ema: reboot lvs[3003-3004] (esams secondaries) for kernel update
  • 14:03 elukey: reboot eventlog2001 for kernel update
  • 14:02 andrewbogott: rebooting labvirt1001
  • 14:01 gehel: restarting wdqs1001 for kernel upgrade
  • 14:01 godog: reimage ms-be1020 / ms-be1021 with stretch
  • 13:52 gehel: install analysis-kuromoji plugin on relforge
  • 13:52 herron: install exim security updates on fermium (lists)
  • 13:51 elukey: rebooting eventlog1001 for kernel update (eventlogging host)
  • 13:50 moritzm: pruning old kernels on prometheus*
  • 13:48 addshore@tin: Synchronized php-1.30.0-wmf.6/extensions/RevisionSlider/modules/ext.RevisionSlider.SliderView.js: SWAT: Fix errors leading to wrong slider scroll positions T168299 (duration: 00m 44s)
  • 13:47 addshore@tin: Synchronized php-1.30.0-wmf.5/extensions/RevisionSlider/modules/ext.RevisionSlider.SliderView.js: SWAT: Fix errors leading to wrong slider scroll positions T168299 (duration: 00m 46s)
  • 13:44 elukey: reboot aqs100[89] for kernel updates
  • 13:39 ema: reboot lvs[2001-2003] (codfw primaries) for kernel update
  • 13:29 elukey: reboot aqs1007 for kernel update
  • 13:22 marostegui: Deploy alter table on s7 - directly on codfw master (db2029) - this will generate lag on codfw - T166208
  • 13:21 elukey: reboot kafka1013 for kernel updates
  • 13:16 marostegui: Deploy alter table s5 - labsdb1001 - T166207
  • 13:15 marostegui: Deploy alter table s5 - db1045 - T166207
  • 13:14 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1045 - T166207 (duration: 00m 44s)
  • 13:08 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1070 - T166207 (duration: 00m 46s)
  • 13:05 elukey: reboot analytics1003 (Hue, Camus, Oozie, Hive master) for kernel upgrade
  • 12:32 gehel: deploying T167871 and restarting kartotherian / tilerator on maps eqiad
  • 12:32 moritzm: rebooting mw1189-mw1199 for kernel update
  • 12:10 akosiaris@puppetmaster1001: conftool action : set/pooled=yes; selector: name=sca1004.eqiad.wmnet
  • 12:09 akosiaris@puppetmaster1001: conftool action : set/pooled=yes; selector: name=mwdebug1002.eqiad.wmnet
  • 11:59 moritzm: rebooting mw1209-mw1220 for kernel update
  • 11:45 moritzm: rebooting mediawiki api servers in codfw for kernel update
  • 11:42 akosiaris: rollback change in asw-a-eqiad for ganeti interface range due to alerts
  • 11:23 akosiaris: reboot ganeti1007 for insertion into ganeti cluster
  • 11:14 elukey: reboot aqs1006 for kernel update
  • 11:04 moritzm: rebooting mw1180-mw1188 for kernel update
  • 11:02 akosiaris: starting up all instances on ganeti01.svc.codfw.wmnet
  • 11:01 godog: reimage ms-be1018 / 1019 with stretch
  • 10:58 ema: reboot lvs[2004-2006] (codfw secondaries) for kernel update
  • 10:50 akosiaris: rebooting all ganeti200X nodes
  • 10:47 akosiaris: shutdown all VMs on the ganeti01.svc.codfw.wmnet cluster
  • 10:43 elukey: reboot analytics1001 (Hadoop master) for kernel update
  • 10:35 akosiaris: rebooting the entire codfw ganeti cluster for kernel upgrades. Silenced hosts in icinga already. T167643
  • 10:30 moritzm: rebooting bast4001 for kernel update
  • 10:21 ema: reboot lvs[1001-1003] (eqiad primaries) for kernel update
  • 10:17 elukey: running a script in tmux on rdb[12]003 called "check" to periodically dump LLEN enwiki:jobqueue:enqueue:l-unclaimed, and stopped the one on rdb2004
  • 10:07 ema: reboot lvs[1004-1006] (eqiad secondaries) for kernel update
  • 10:01 elukey: reboot analytics1002 (Hadoop master standby) for kernel update
  • 10:01 moritzm: rebooting auth* servers for kernel update
  • 09:48 ema: reboot lvs[1010-1012] for kernel update
  • 09:48 elukey: reboot aqs1005 for kernel update
  • 09:10 elukey: reboot kafka2001 for kernel update (eventbus codfw)
  • 09:06 moritzm: rebooting restbase1017 for kernel update
  • 08:52 oblivian@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=restbase2001.codfw.wmnet,dc=codfw,service=restbase
  • 08:49 _joe_: correction: restarting pybal
  • 08:49 _joe_: restarting etcd on lvs2003/2006, connection lost to etcd
  • 08:34 elukey: reboot kafka1012 for kernel upgrades
  • 08:34 marostegui: Deploy alter table db1070 s5 - T166207
  • 08:33 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1070 - T166207 (duration: 00m 44s)
  • 08:27 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1082 - T166207 (duration: 00m 45s)
  • 08:26 godog: reimage ms-be1014 / 1015 with jessie
  • 07:37 marostegui: Stop and reset slave s5 on dbstore2001 - T168354
  • 06:23 mutante: planet2001 wget missing unpuppetized logo file from https://en.planet.wikimedia.org/images/planet-wm2.png - should fix puppet run
  • 06:19 marostegui: Stop replication and puppet on db2066 for maintenance - T168354
  • 06:18 marostegui@tin: Synchronized wmf-config/db-codfw.php: Depool db2066 - T168354 (duration: 00m 43s)
  • 06:08 elukey: reboot thorium for kernel upgrades (outage to all the analytics websites)
  • 06:05 marostegui: Deploy alter table s5 - db1082 - T166207
  • 06:04 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1082 - T166207 (duration: 00m 44s)
  • 06:04 marostegui: Deploy alter table s5 - dbstore1002 - T166207
  • 05:59 elukey: reboot stat100[2,3,4] for kernel upgrades
  • 05:57 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1087 - T166207 (duration: 00m 44s)
  • 05:54 marostegui: Deploy alter table s5 - labsdb1011 - T166207
  • 05:50 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1021 - T166205 (duration: 01m 00s)
  • 05:41 marostegui: Start relearn BBU cycle on db1016 - T166344
  • 03:13 mutante: planet - copying HTML files from docroot from planet1001 to planet2001 - (don't serve Debian default page)
  • 03:03 mutante: planet1001 - remove/purge all php5* packages
  • 02:57 l10nupdate@tin: ResourceLoader cache refresh completed at Wed Jun 21 02:57:19 UTC 2017 (duration 6m 41s)
  • 02:50 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.6) (duration: 06m 06s)
  • 02:26 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.5) (duration: 06m 52s)
  • 01:45 mutante: planet1001 - remove php5 package
  • 00:34 mutante: planet2001 - revoke old puppet cert, salt-key, re-add new cert/key after reinstall
  • 00:24 mutante: planet2001 - scheduled downtime, reinstall with stretch
  • 00:06 mutante: tin (deployment): manually remove l10nupdate cron, let puppet re-create it after gerrit:350749. stops l10nupdate cron from running on weekends. naos didn't need an action. (T164035).

2017-06-20

  • 23:06 aude@tin: Synchronized wmf-config/InitialiseSettings-labs.php: Remove temp wiktionary site link settings (duration: 00m 43s)
  • 23:05 aude@tin: Synchronized wmf-config/Wikibase-labs.php: Remove temp wiktionary site link settings (duration: 00m 44s)
  • 23:03 aude@tin: Synchronized wmf-config/Wikibase-production.php: Remove temp wiktionary site link settings for test wikidata (duration: 00m 43s)
  • 22:59 aude@tin: Synchronized wmf-config/InitialiseSettings.php: Enable Wikibase (phase 1) on Wiktionary wikis (duration: 00m 44s)
  • 22:49 aude: created wbc_entity_usage table and updated sites table on wiktionary wikis
  • 21:36 legoktm@tin: Synchronized wmf-config: touch (duration: 00m 45s)
  • 21:29 arlolra@tin: Started restart [parsoid/deploy@4b60bf9]: (no justification provided)
  • 21:17 legoktm@tin: Synchronized wmf-config/InitialiseSettings.php: Deploy Linter to all wikis (try #2) - T148609 (duration: 00m 44s)
  • 21:17 andrewbogott: rebooting labvirt1014 as practice for tomorrow's security reboots
  • 21:13 mutante: labtestpuppetmaster2001 - install-console, activate puppet, sign cert, initial puppet run, add salt key (T167157)
  • 20:54 twentyafterfour: Finished train deployment for group0, train will resume tomorrow as scheduled.
  • 20:53 twentyafterfour@tin: rebuilt wikiversions.php and synchronized wikiversions files: Group0 to 1.30.0-wmf.6 refs T167535
  • 20:44 twentyafterfour@tin: Synchronized php-1.30.0-wmf.6/includes/changes/EnhancedChangesList.php: deploy bad7bde refs T167535 (duration: 00m 53s)
  • 20:37 twentyafterfour@tin: Finished scap: sync 1.30.0-wmf.6 refs T167535 (duration: 29m 16s)
  • 20:08 twentyafterfour@tin: Started scap: sync 1.30.0-wmf.6 refs T167535
  • 19:17 twentyafterfour: Prepping 1.30.0-wmf.6 - T167535
  • 18:09 mutante: netmon1002 - arm keyholder with rancid key
  • 18:06 ema: route ulsfo back to codfw T167274
  • 18:02 chasemp: ssh labsdb101[0|1].eqiad.wmnet 'sudo maintain-meta_p --all-databases --debug'
  • 17:53 mutante: cobalt (gerrit) - re-enabling puppet, running it. nothing should change, the system unit file mentioned in T168360#3362314 does not get installed by puppet, it comes from the deb
  • 17:49 subbu: Since arlolra noticed some unexpected warnings from the canaries, the Parsoid deploy was rolled back, so Parsoid was not updated to e2e2b5f6 (contrary to what scap said above).
  • 17:48 gehel@tin: Finished deploy [wdqs/wdqs@b60d224]: (no justification provided) (duration: 01m 41s)
  • 17:47 XioNoX: repool codfw - T167274
  • 17:46 gehel@tin: Started deploy [wdqs/wdqs@b60d224]: (no justification provided)
  • 17:45 gehel: deploying wdqs blazegraph and GUI updates
  • 17:43 mutante: RT - ununpentium - upgraded rt4-db-mysql
  • 17:42 arlolra@tin: Finished deploy [parsoid/deploy@4b60bf9]: Updating Parsoid to e2e2b5f6 (duration: 07m 57s)
  • 17:40 mutante: mwreleases1001 - puppet node clean, puppet node deactivate - was reinstalled as releases1001
  • 17:34 arlolra@tin: Started deploy [parsoid/deploy@4b60bf9]: Updating Parsoid to e2e2b5f6
  • 17:29 elukey: running a script in tmux on rdb200[34] called "check" to dump periodically LLEN enwiki:jobqueue:enqueue:l-unclaimed
  • 17:21 elukey: restart redis-instance-tcp_6380.service on rdb2003 to force sync with its master
  • 17:16 elukey: restart redis-instance-tcp_6380.service on rdb2004 to force sync with its master
  • 17:04 XioNoX: re-enable igmp-snooping on asw-d-codfw
  • 17:01 bd808: Ran maintain-meta_p --all-databases on labsdb1003
  • 16:55 bd808: Ran maintain-meta_p --all-databases on labsdb1001
  • 16:53 paravoid: updating the d-i image for stretch in puppet volatile
  • 16:09 chasemp: openstack server delete admin-monitoring openstack project instances (we have leaked 7)
  • 16:05 elukey: reboot kafka1013 for kernel upgrade
  • 15:08 XioNoX: starting asw-d-codfw switch upgrade - T167274
  • 14:47 elukey: rolling restart of druid100[123] for kernel upgrades
  • 14:32 XioNoX: depooled codfw - T167274
  • 14:27 moritzm: rebooting scb1001 for kernel update
  • 14:17 hashar: CI is fully back up (following reboot of contint1001 / labnodepool1001)
  • 14:16 hashar: Upgraded Jenkins plugins
  • 14:05 hashar: Starting Jenkins on contint1001
  • 14:05 elukey: reboot kafka2001 for kernel upgrade
  • 14:02 hashar: Rebooting contint1001
  • 14:00 hashar: Stopping Nodepool service to prevent new builds
  • 13:55 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1087 - T166207 (duration: 01m 41s)
  • 13:55 marostegui: Deploy alter table db1087 - s5 - T166207
  • 13:47 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1071 - T166207 (duration: 00m 41s)
  • 13:44 aude@tin: Synchronized wmf-config/Wikibase-production.php: Enable Wiktionary site links on test.wikidata (duration: 00m 43s)
  • 13:42 _joe_: manually started nrpe on ms-be1016
  • 13:39 marostegui: Deploy alter table on db1049 - s5 - T166207
  • 13:39 moritzm: rebooting labnodepool1001 for kernel update
  • 13:37 hashar: Restarting Jenkins
  • 13:36 akosiaris@puppetmaster1001: conftool action : set/pooled=true; selector: name=sca1004.eqiad.wmnet
  • 13:36 akosiaris@puppetmaster1001: conftool action : set/pooled=true; selector: name=mwdebug1002.eqiad.wmnet
  • 13:33 godog: pool thumbor100[34] into service - T168297
  • 13:26 marostegui: Deploy alter table labsdb1010 - s5 - T166207
  • 13:14 moritzm: rebooting restbase staging cluster (cerium/praseodymium/xenon) for kernel update
  • 12:09 gehel: starting cluster restart elasticsearch eqiad
  • 12:00 elukey: reboot analytics1029 -> analytics1069 for kernel upgrades (Hadoop worker nodes)
  • 11:36 moritzm: installing libgcrypt security updates
  • 11:29 moritzm: rebooting mediawiki app servers in codfw for kernel update
  • 11:13 akosiaris: renumber sca1004, mwdebug1002. Downtime should be a few minutes
  • 11:08 akosiaris@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=mwdebug1002.eqiad.wmnet
  • 10:56 akosiaris@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=sca1004.eqiad.wmnet
  • 10:07 moritzm: rebooting mwdebug servers for kernel update
  • 10:03 elukey: reboot kafka1012, analytics1028, aqs1004 for kernel upgrades (canary hosts)
  • 10:00 godog: reimage ms-be1016 with stretch
  • 09:53 godog: reset ms-be1014 idrac via ipmitool
  • 09:46 moritzm: rebooting app server canaries for kernel update
  • 09:40 godog: roll-restart thumbor to increase swift timeout
  • 09:29 marostegui: Rename table on db1089 enwiki.wikilove_image_log - T127219
  • 08:46 marostegui: Drop table titlekey from s1 - T164949
  • 08:35 godog: roll restart swift-proxy on ms-fe* to pick up thumbor changes
  • 08:30 _joe_: restarting gerrit T168360
  • 08:25 _joe_: manually patching gerrit's systemd unit file to allow more open files
  • 08:22 marostegui: Drop table titlekey from s3 - T164949
  • 08:15 marostegui: Drop table titlekey from s4 - T164949
  • 08:06 marostegui: Drop table titlekey from s7 - https://phabricator.wikimedia.org/T164949
  • 07:45 marostegui: Drop table titlekey from s5 - T164949
  • 07:35 gehel: restarting elastic1017 to validate upgrades
  • 07:27 marostegui: kill alter table on enwiki.revision db1047 after running for 13 days - T166452
  • 07:23 moritzm: installing glibc security updates
  • 07:22 marostegui: Stop MySQL dbstore2001 for maintenance - T168354
  • 07:20 marostegui: Deploy alter table s5 - db1071 - T166207
  • 07:10 marostegui: Deploy alter table s5 - db1095 - T166207
  • 06:57 moritzm: install remaining exim security updates

2017-06-19

  • 23:38 andrewbogott: are we logging?
  • 23:35 legoktm: legoktm@tin: Synchronized static/images/project-logos/: Upload logos for the Dinka Wikipedia (duration: 00m 42s)
  • 22:45 andrewbogott: removed some big dirs from /home/ori on install1002
  • 22:30 andrewbogott: find /srv/carbon/whisper/archived_metrics -mtime +730 -type f -delete on labmon1001
  • afk: Added non-voting operations-puppet-tests-docker job for operations/puppet repo, should (hopefully) be fast, and will timeout after 1 minute if it's not. More info https://gerrit.wikimedia.org/r/#/c/360091/ + T166888
  • afk: updated payments-wiki from 7a50542 to 8bdd706
  • 19:39 mepps: correction: updated civicrm from dfc26f0 to d558df2
  • 19:28 mepps: updated from dfc26f0 to d558df2
  • 18:35 reedy@tin: Synchronized wmf-config/InitialiseSettings.php: (no justification provided) (duration: 00m 41s)
  • 18:34 reedy@tin: Synchronized wmf-config/CommonSettings.php: (no justification provided) (duration: 00m 42s)
  • 18:29 reedy@tin: Synchronized wmf-config/abusefilter.php: (no justification provided) (duration: 00m 41s)
  • 18:21 reedy@tin: Synchronized wmf-config/InitialiseSettings.php: logos (duration: 00m 41s)
  • 18:20 reedy@tin: Synchronized static/favicon/wmf.ico: (no justification provided) (duration: 00m 41s)
  • 18:19 reedy@tin: Synchronized wmf-config/flaggedrevs.php: Remove old setting that does nothing (duration: 00m 41s)
  • 18:18 reedy@tin: Synchronized static/images: (no justification provided) (duration: 00m 41s)
  • 18:10 reedy@tin: Synchronized dblists/securepollglobal.dblist: (no justification provided) (duration: 00m 41s)
  • 18:02 reedy@tin: Synchronized wmf-config/InterwikiSortOrders.php: Add atjwiki (duration: 00m 41s)
  • 17:42 ejegg: updated fundraising tools from 585f546 to 457bddb
  • 17:21 moritzm: installing exim4 security updates
  • 15:48 moritzm: uploaded linux-meta_1.13 to apt.wikimedia.org (with this update the linux-meta package now also defaults to 4.9 (previously 4.4))
  • 15:47 moritzm: uploaded linux_4.9.25-1~bpo8+3 to apt.wikimedia.org
  • 15:25 volans: installed python-setuptools-scm on copper
  • 15:16 marostegui: Deploy alter table labsdb1009 - T166207
  • 15:12 akosiaris@puppetmaster1001: conftool action : set/pooled=yes; selector: name=chlorine.eqiad.wmnet
  • 15:03 mobrovac: restbase restbase2001 is out of rotation, performing experiments with the new cassandra driver v3.2.2 which seems to be causing problems only in production
  • 14:59 godog: cold reset ms-be1013 drac
  • 14:53 gehel: pausing cluster restart of elasticsearch eqiad
  • 14:24 godog: roll-upgrade swift to 2.10 on ms-be10[22-30] - T162609
  • 14:08 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1070 - T153743 (duration: 00m 41s)
  • 14:06 gehel: starting cluster restart on elasticsearch / cirrus / eqiad for ltr plugin deployment
  • 14:01 gehel: restarting elasticsearch / relforge for ltr plugin deployment
  • 13:58 gehel: remove decommissioned nodes from redis / trebuchet for elasticsearch/plugins
  • 13:48 gehel: deploying latest elasticsearch plugin (ltr plugin)
  • 13:48 moritzm: fixing salt minion setup on wtp1047
  • 13:44 hashar: European SWAT completed
  • 13:44 aude@tin: Synchronized wmf-config/Wikibase.php: Remove old constraints section config (duration: 00m 41s)
  • 13:42 aude@tin: Synchronized wmf-config/Wikibase-production.php: Add constraints section to property pages on test.wikidata (duration: 00m 41s)
  • 13:29 dcausse@tin: Synchronized wmf-config/InitialiseSettings.php: [cleanup] remove old interwiki search config (duration: 00m 41s)
  • 13:28 dcausse@tin: Synchronized wmf-config/CirrusSearch-labs.php: [cleanup] remove old interwiki search config (duration: 00m 41s)
  • 13:21 hashar@tin: Synchronized wmf-config/InitialiseSettings.php: Enable OOjs UI buttons on EditPage for plwiki - T162849 (duration: 00m 42s)
  • 13:08 hashar@tin: Synchronized wmf-config/InitialiseSettings.php: Add sandbox link for dtywiki - T168038 (duration: 00m 42s)
  • 12:54 dcausse: restarting elasticsearch on relforge1* to pickup new snapshot of the ltr plugin
  • 12:36 akosiaris@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=chlorine.eqiad.wmnet
  • 12:04 elukey: run 'echo "autoLearnMode=1" > /tmp/disable_learn && megacli -AdpBbuCmd -SetBbuProperties -f /tmp/disable_learn -a0' on all the analytics workers to disable BBU Auto learn - T167809
  • 11:33 marostegui: Rename user Smuconlaw → Sgconlaw - T168109
  • 11:31 jynus: restarting replication on dbstore1002:s3 and db1015
  • 11:19 moritzm: rebooting cp3007 for kernel update
  • 11:01 _joe_: depooling mw1170-mw1179 for decommissioning, T168271
  • 10:15 godog: roll-upgrade swift to 2.10 on ms-fe1* - T162609
  • 09:56 akosiaris: migrate neon.eqiad.wmnet to ganeti01.svc.eqiad.wmnet's row_A nodegroup
  • 09:55 dcausse: restarting elasticsearch on relforge1* to pickup new snapshot of the ltr plugin
  • 09:33 jynus: temporarily stop dbstore1002:s3 and db1015 to fix srwiki
  • 09:30 marostegui: Deploy alter table on s2 - dbstore1001 - T166205
  • 09:18 godog: swift eqiad-prod: remove ms-be1001 - ms-be1012 - T166489
  • 09:13 paravoid: rebooting achernar to address CPU throttling and apply the BIOS update
  • 09:11 paravoid: upgrading achernar's BIOS from 1.2.4 to 2.4.2 hoping it will address recurring CPU throttling issue (T162850)
  • 09:07 akosiaris: restart ircecho on einsteinium, was not notifying due to a thrown exception
  • 08:35 marostegui: Drop table titlekey from s2 - T164949
  • 08:16 marostegui: Drop table titlekey on s6 - T164949
  • 07:59 jynus@tin: Synchronized wmf-config/db-codfw.php: Repool pc2004,5,6 after maintenance (duration: 00m 41s)
  • 07:42 moritzm: restarting app server canaries to pick up gnutls update
  • 07:13 marostegui: Reboot ms-be1010
  • 07:10 marostegui: Deploy alter table s5 - codfw master - db2023 (and will replicate) so this will generate lag on codfw slaves - T166207
  • 07:09 jynus: upgrade, reboot and clear data on pc2006
  • 07:05 jynus: upgrade, reboot and clear data on pc2005
  • 07:03 jynus@tin: Synchronized wmf-config/db-codfw.php: Depool pc2005 & pc2006 (duration: 00m 41s)
  • 06:58 moritzm: installing gnutls security updates
  • 06:38 marostegui: Deploy alter table s2 - labsdb1001 - T166205
  • 06:37 jynus: force learning cycle to db1046 controller T166141
  • 06:23 marostegui: Deploy alter table on s2 - db1021 - T166205
  • 06:18 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1021 - T166205 (duration: 00m 41s)
  • 04:21 reedy@tin: Synchronized composer.lock: update (duration: 00m 41s)
  • 04:20 reedy@tin: Synchronized composer.json: update (duration: 00m 41s)
  • 04:19 reedy@tin: Synchronized multiversion/vendor/: Update! (duration: 01m 05s)
  • 04:05 reedy@tin: Synchronized wmf-config/CommonSettings.php: Fix comments minor code style (duration: 00m 42s)
  • 02:26 l10nupdate@tin: ResourceLoader cache refresh completed at Mon Jun 19 02:26:06 UTC 2017 (duration 6m 8s)
  • 02:19 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.5) (duration: 07m 04s)

2017-06-18

  • 02:25 l10nupdate@tin: ResourceLoader cache refresh completed at Sun Jun 18 02:25:55 UTC 2017 (duration 6m 8s)
  • 02:19 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.5) (duration: 07m 27s)

2017-06-17

  • 19:30 ebernhardson: restarting elasticsearch on relforge to pick up new version of ltr-query
  • 16:51 volans: restarted pdfrender on scb200[2,4] T159922
  • 15:26 jynus: rebuild pc2004's (depooled) data from scratch
  • 02:29 l10nupdate@tin: ResourceLoader cache refresh completed at Sat Jun 17 02:29:51 UTC 2017 (duration 6m 8s)
  • 02:23 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.5) (duration: 07m 09s)

2017-06-16

  • 19:54 Reedy: disabled cluster 2fa for Chrissymad for T168064 (confirmed by email)
  • 19:26 ejegg: re-enabled paypal audit download and parse job
  • 19:13 ebernhardson: restarting elasticsearch on relforge to pick up new ltr-query plugin version
  • 18:14 mutante: ms-be1001: did not change config, tried again, now detected 13 drives again, coming back
  • 18:10 mutante: ms-be1001 - The following VDs are missing: 09
  • 18:08 mutante: ms-be1001 - powercycling crashed server - "[14076481.245487] general protection fault: 0000 [#4] SMP"
  • 13:36 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Remove comments from db1018 current status - T166205 (duration: 00m 41s)
  • 13:26 twentyafterfour: fixed phabricator "upgrade database" error.
  • 13:20 twentyafterfour: fixing phab database migrations
  • 13:01 jynus@tin: Synchronized wmf-config/db-eqiad.php: Repool db1091 after performance testing (duration: 00m 41s)
  • 10:18 jynus: running analyze on db1091 (depooled), may create lag
  • 10:11 jynus@tin: Synchronized wmf-config/db-eqiad.php: Depool db1091 for performance testing (duration: 00m 42s)
  • 09:52 moritzm: installing guile security updates
  • 09:13 moritzm: re-enabled puppet on mw2129 (no reason was given why it was disabled)
  • 08:50 jynus: bringing down pc1005 and pc1006 for maintenance T167567
  • 08:40 jynus@tin: Synchronized wmf-config/db-codfw.php: Add db1099 and db1001 hosts to config (duration: 00m 41s)
  • 08:23 jynus@tin: Synchronized wmf-config/db-eqiad.php: Switchover pc1005 and pc1006 to db1099 and db1001 (duration: 00m 45s)
  • 08:20 jynus: about to switchover pc1005 and pc1006 to db1099 and db1001
  • 05:45 ebernhardson: increase enwiki_content replicas on codfw from 2 to 3 to match eqiad
  • 02:37 l10nupdate@tin: ResourceLoader cache refresh completed at Fri Jun 16 02:37:05 UTC 2017 (duration 6m 25s)
  • 02:30 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.5) (duration: 07m 02s)

2017-06-15

  • 23:37 mutante: added stretch support for jenkins (https://gerrit.wikimedia.org/r/#/c/359227/, https://gerrit.wikimedia.org/r/#/c/359356/) | 'reprepro copy stretch-wikimedia jessie-wikimedia jenkins' to make .deb available on stretch | releases1001 now running jenkins , icinga recovered | (hashar) (T164030)
  • 23:30 mutante: APT - reprepro copy stretch-wikimedia jessie-wikimedia jenkins (copy existing jenkins package to stretch, it can be used on both)
  • 23:18 ebernhardson@tin: Synchronized wmf-config/InitialiseSettings.php: SWAT: T166408: Remove dead config variable MinervaPrintStyles (duration: 00m 41s)
  • 23:15 ebernhardson@tin: Finished scap: wmf-config Scap: T162276: Enable crossproject search (duration: 03m 37s)
  • 23:11 ebernhardson@tin: Started scap: wmf-config Scap: T162276: Enable crossproject search
  • 23:10 ebernhardson@tin: Synchronized wmf-config/InitialiseSettings.php: Scap: T162276: Enable crossproject search (duration: 00m 51s)
  • 22:59 mutante: mw2251 - repooled
  • 22:56 mutante: mw2251 - scap pull
  • 22:53 ebernhardson: restarting elasticsearch on relforge to pickup new ltr-query plugin
  • 22:30 ejegg: updated DjangoBannerStats from 9e6b117 to 5963e7c
  • 22:02 volans: restarted pdfrender on scb1001 T159922
  • 21:45 mutante: powercycling mw2251 (frozen console)
  • 21:39 volans@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=mw2251.codfw.wmnet
  • 21:37 volans: re-enabled puppet and force run to re-enable ircecho on einsteinium
  • 21:29 demon@tin: Finished scap: Removing Cards extension (duration: 21m 49s)
  • 21:08 demon@tin: Started scap: Removing Cards extension
  • 20:57 mutante: upgrading RT (request tracker)
  • 19:35 demon@tin: rebuilt wikiversions.php and synchronized wikiversions files: group2 to wmf.5
  • 19:22 ladsgroup@tin: Finished deploy [ores/deploy@ab88a74]: Deploying gerrit:359224/1 for missing config variables (duration: 24m 15s)
  • 19:17 XioNoX: Re-enabled link between cr2-codfw and cr1-eqdfw - T167261
  • 18:58 ladsgroup@tin: Started deploy [ores/deploy@ab88a74]: Deploying gerrit:359224/1 for missing config variables
  • 18:44 paravoid: restarting all puppetmasters
  • 18:40 paravoid: temporarily stopping icinga-wm
  • 18:27 demon@tin: Synchronized wmf-config/CirrusSearch-common.php: Remove quirks and enable token_count_router thingie (duration: 00m 44s)
  • 18:16 demon@tin: Synchronized php-1.30.0-wmf.5/includes/libs/objectcache/MultiWriteBagOStuff.php: T167465 (duration: 00m 44s)
  • 18:14 demon@tin: Synchronized wmf-config/InitialiseSettings.php: T167617 (duration: 00m 44s)
  • 18:12 demon@tin: Synchronized wmf-config/FeaturedFeedsWMF.php: T167617 (duration: 00m 44s)
  • 17:50 mutante: install2002 - re-enabled puppet, reverted live hack, back to normal (issue seems to be NIC or other)
  • 17:28 mutante: install2002 - temp disabling puppet and applying hot fix to debug install issue for papaul
  • 17:27 bblack: disabling puppet on cp*wmnet to avoid puppet races on https://gerrit.wikimedia.org/r/#/c/341729 merge
  • 14:39 gehel: killing stuck replication on maps1001
  • 14:38 krinkle@tin: Synchronized wmf-config/CommonSettings.php: no-op Ifc7b1ea80 - Remove EtcdConfig from beta (duration: 00m 45s)
  • 13:24 gehel: elasticsearch upgrade to 5.3.2 on relforge cluster completed, cluster still recovering - T163708
  • 13:23 aude@tin: Synchronized wmf-config/Wikibase.php: Add constraints statements section on Wikidata T167126 (duration: 00m 43s)
  • 13:19 dcausse: [cirrus] reindexing all zh wikis (eqiad & codfw)
  • 13:14 aude@tin: Synchronized wmf-config/InitialiseSettings.php: Enable BM25 for Chinese wikis (duration: 00m 44s)
  • 13:13 aude@tin: Synchronized tests/cirrusTest.php: (no justification provided) (duration: 00m 45s)
  • 13:02 gehel: starting elasticsearch upgrade to 5.3.2 on relforge cluster - T163708
  • 12:14 gehel: restart elasticsearch on relforge1001 to validate latest config changes
  • 10:16 moritzm: rollout remaining systemd updates from jessie point release
  • 09:14 jynus: shutting down and deleting data at pc1004 for cloning from db1096
  • 09:10 hashar: Jenkins back up and happy.
  • 09:05 moritzm: reenable puppet on notebook1002, was disabled for the merge of the zookeeper role refactor two days ago, can be re-enabled now
  • 09:04 hashar: Restarting Jenkins. It seems I managed to deadlock it
  • 08:52 ariel@tin: Finished deploy [dumps/dumps@1734c6d]: history dump rebalance script, fixup for extension script dumps, root logger for misc dumps (duration: 00m 02s)
  • 08:52 ariel@tin: Started deploy [dumps/dumps@1734c6d]: history dump rebalance script, fixup for extension script dumps, root logger for misc dumps
  • 08:40 gehel: restart relforge1001 to validate latest config changes
  • 08:16 akosiaris@tin: Finished deploy [citoid/deploy@ba0db9c]: Remove the bad PMCID test from spec (duration: 07m 44s)
  • 08:09 akosiaris@tin: Started deploy [citoid/deploy@ba0db9c]: Remove the bad PMCID test from spec
  • 08:02 moritzm: updating HHVM on terbium/wasat to 3.18
  • 07:57 akosiaris@tin: Finished deploy [citoid/deploy@ba0db9c]: Remove the bad PMCID test from spec (duration: 00m 38s)
  • 07:57 akosiaris@tin: Started deploy [citoid/deploy@ba0db9c]: Remove the bad PMCID test from spec
  • 07:48 akosiaris: schedule 2 hours downtime for all citoid endpoints health on scb boxes
  • 06:08 marostegui: Deploy alter table s2 - labsdb1003 - T166205
  • 05:50 marostegui: Deploy alter table s2 - db1018 - T166205
  • 05:49 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Add comments to db1018 current status - T166205 (duration: 00m 43s)
  • 05:41 marostegui: Deploy alter table s4 - dbstore1001 - T166206
  • 05:22 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1036 - T166205 (duration: 00m 44s)
  • 02:50 l10nupdate@tin: ResourceLoader cache refresh completed at Thu Jun 15 02:50:16 UTC 2017 (duration 6m 48s)
  • 02:43 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.5) (duration: 07m 34s)
  • 02:26 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.4) (duration: 09m 15s)
  • 01:17 mutante: releases1001 - reinstalling with stretch
  • 00:15 mutante: dumpsdata1001 - was reported in icinga as CRIT systemdstate - reason was puppet service was failed with "Invalid value '"no"' for boolean parameter: daemonize" (it was ok on other hosts??). commented the option, stopped puppet, systemctl reset-failed - which made it recover (T165368)
  • 00:02 twentyafterfour: Deploying phabricator update (tagged release/2017-06-14/1) details: https://phabricator.wikimedia.org/project/view/2831/

2017-06-14

  • 23:55 mutante: mwreleases: revoke puppet cert, delete salt key, remove from icinga. releases1001 still syncing disks for a while (50m), being created... T164030
  • 23:49 mutante: ganeti: removed instance mwreleases1001, created new instance releases1001 with same parameters (2 VCPUS,4G memory, 1 x 128G disk) (T164030)
  • 23:41 mutante: mwreleases1001 - scheduled downtime, shutdown, kill VM, re-install as releases1001 (T164030)
  • 23:33 catrope@tin: Synchronized php-1.30.0-wmf.5/includes/: Unbreak watchlist highlighting T167922 (duration: 01m 30s)
  • 23:30 catrope@tin: Synchronized wmf-config/InitialiseSettings.php: Send search traffic back to eqiad T149006 (duration: 00m 44s)
  • 23:23 catrope@tin: Synchronized wmf-config/: ORES config cleanups (duration: 00m 46s)
  • 22:43 reedy@tin: Synchronized php-1.30.0-wmf.5/extensions/WikimediaMaintenance/addWiki.php: Remove accountaudit (duration: 00m 44s)
  • 22:33 reedy@tin: Synchronized wmf-config/InitialiseSettings.php: meta namespace talk for atjwiki (duration: 00m 44s)
  • 21:36 reedy@tin: Synchronized wmf-config/interwiki.php: Update interwiki map for atjwiki T167714 (duration: 00m 44s)
  • 21:29 reedy@tin: Synchronized langlist: Add atj T167714 (duration: 00m 43s)
  • 21:29 reedy@tin: Synchronized static/images/project-logos/: atjwiki T167714 (duration: 00m 43s)
  • 21:27 reedy@tin: Synchronized wmf-config/InitialiseSettings.php: atjwiki T167714 (duration: 00m 43s)
  • 21:26 reedy@tin: rebuilt wikiversions.php and synchronized wikiversions files: Add atjwiki T167714
  • 21:25 reedy@tin: Synchronized dblists/: add atjwiki T167714 (duration: 00m 42s)
  • 21:22 reedy@tin: Synchronized php-1.30.0-wmf.4/extensions/WikimediaMaintenance/addWiki.php: Remove accountaudit (duration: 00m 44s)
  • 21:15 reedy@terbium: scap aborted: (no justification provided) (duration: 00m 01s)
  • 21:15 reedy@terbium: Started scap: (no justification provided)
  • 20:06 reedy@tin: Synchronized wmf-config/CommonSettings-labs.php: noop (duration: 00m 43s)
  • 20:05 reedy@tin: Synchronized wmf-config/CommonSettings.php: CollaborationKit loader code (duration: 00m 43s)
  • 20:03 reedy@tin: Synchronized wmf-config/InitialiseSettings.php: Add CollaborationKit to testwiki (duration: 00m 44s)
  • 19:47 demon@tin: Synchronized wmf-config/CommonSettings-labs.php: no-op (duration: 00m 44s)
  • 19:42 Reedy: running mwscript initSiteStats.php srnwiki --update
  • 19:37 demon@tin: Synchronized wmf-config/extension-list-labs: No-op (duration: 00m 44s)
  • 19:23 demon@tin: Synchronized php: symlink bump (duration: 00m 43s)
  • 19:17 bblack: restart varnish backend on cp1074
  • 19:08 demon@tin: rebuilt wikiversions.php and synchronized wikiversions files: group1 to wmf.5
  • 18:50 otto@tin: Finished deploy [eventlogging/analytics@1ce446d]: (no justification provided) (duration: 00m 04s)
  • 18:49 otto@tin: Started deploy [eventlogging/analytics@1ce446d]: (no justification provided)
  • 18:34 niharika29@tin: Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/358007/ Add wmgBabelMainCategory for many languages (duration: 00m 43s)
  • 18:32 niharika29@tin: scap failed: average error rate on 1/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/3888cca979647b9381a7739b0bdbc88e for details)
  • 18:25 niharika29@tin: Synchronized wmf-config/InitialiseSettings.php: Sort wmgBabelMainCategory alphabetically https://gerrit.wikimedia.org/r/#/c/358006/ (duration: 00m 44s)
  • 18:24 jynus: reimporting data from pc1004 to db1096
  • 18:17 niharika29@tin: Synchronized tests/cirrusTest.php: https://gerrit.wikimedia.org/r/#/c/358625/ Test elastic2020 does not fall out of cluster (duration: 00m 43s)
  • 18:13 niharika29@tin: Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/358625/ Test elastic2020 does not fall out of cluster (duration: 00m 44s)
  • 18:06 moritzm: installing unzip security updates
  • 17:55 moritzm: restarting hhvm on mw1261-mw1265 to pick up libxslt update
  • 17:49 moritzm: installing mongodb update from jessie point release on tungsten
  • 16:03 godog: point varnish upload in esams back to eqiad
  • 16:00 mobrovac@tin: Finished deploy [restbase/deploy@4c1cdd0]: (no justification provided) (duration: 04m 51s)
  • 15:55 mobrovac@tin: Started deploy [restbase/deploy@4c1cdd0]: (no justification provided)
  • 15:44 godog: point varnish upload back to swift eqiad
  • 15:14 ema: restart varnish-backend on cp2017
  • 15:08 moritzm: installing systemd bugfix updates from jessie point update
  • 15:00 ema: restart varnish-backend on cp2014
  • 13:50 zeljkof: eu swat finished
  • 13:42 zfilipin@tin: Synchronized wmf-config/InitialiseSettings.php: SWAT: Remove ContentTranslationTargetNamespace config (T167865) (duration: 00m 43s)
  • 13:41 zfilipin@tin: Synchronized wmf-config/CommonSettings.php: SWAT: Remove unneeded ContentTranslationTargetNamespace (T167865) (duration: 00m 44s)
  • 13:35 zfilipin@tin: scap failed: average error rate on 3/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/3888cca979647b9381a7739b0bdbc88e for details)
  • 12:24 jynus@tin: Synchronized wmf-config/db-codfw.php: Switchover pc2004 to db2072 (duration: 00m 43s)
  • 12:13 akosiaris: upload apertium-spa-ita_0.2.0~r78826-1+wmf to apt.wikimedia.org/jessie-wikimedia/main
  • 12:13 akosiaris: upload apertium-fra-cat_1.2.0~r78602-1+wmf to apt.wikimedia.org/jessie-wikimedia/main
  • 11:41 jynus@tin: Synchronized wmf-config/db-eqiad.php: Switchover pc1004 to db1096 (duration: 00m 54s)
  • 11:34 jynus: about to deploy performance-impacting change on the parsercache persistent storage T167567
  • 11:19 marostegui: Deploy alter table s4 - labsdb1011 - T166206
  • 09:46 marostegui: Rename table titlekey before dropping it on enwiki - db1089 - T164949
  • 09:18 godog: delete files older than 365d from 'servers' graphite hierarchy
  • 07:59 marostegui: Drop table updates on s3 - T139342
  • 07:32 moritzm: installing zziplib security updates on jessie
  • 07:04 elukey: restart pdfrender on scb200[2,4] (xpra race condition)
  • 07:03 elukey: restart pdfrender on scb1004 (xpra race condition)
  • 06:32 moritzm: installing remaining libtasn security updates
  • 03:14 l10nupdate@tin: ResourceLoader cache refresh completed at Wed Jun 14 03:14:28 UTC 2017 (duration 6m 56s)
  • 03:07 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.5) (duration: 14m 52s)
  • 02:32 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.4) (duration: 07m 58s)
  • 01:48 mutante: netmon1002 - chown rancid:rancid /var/lib/rancid ; touch /var/lib/rancid/.gitconfig, let rancid write to config, then git config --global user.email and user.name as the rancid user | fix permissions on .git/objects files, let rancid user own them all | re-commit .gitignore change | SSH_AUTH_SOCK=/run/keyholder/proxy.sock /usr/lib/rancid/bin/rancid-run as user "rancid" runs clean,
  • 01:20 mutante: netmon1002 - copied missing router.db, routers.all/.down/.up over from netmon1001 to /var/lib/rancid/core. routers.db is an untracked file, the others are in .gitignore. this is all like on netmon1001 as well. adding routers.db to .gitignore file on both, like the other router* files already were (T159756)
  • 01:00 mutante: netmon1002 - locally "git clone /var/lib/rancid/GIT/core" into /var/lib/rancid (I rsynced that, but it's a bare repository without a work tree; the work tree is /var/lib/rancid/core after this) (T159756)
  • 00:44 mutante: naos: disarmed keyholder and armed it again to prove I didn't break anything on jessie by fixing keyholder on stretch with gerrit:358884
  • 00:39 demon@tin: Synchronized wmf-config/CommonSettings.php: extdist update (duration: 00m 44s)
  • 00:09 aaron@tin: Synchronized wmf-config/InitialiseSettings.php: Capture messages on 'autoloader' debug log channel (duration: 00m 44s)

2017-06-13

  • 23:29 RainbowSprinkles: gerrit: upgrading on master 2.13.4-13-gc0c5cc4742 -> 2.13.8-1-g7c438d37a2 (been running on slave for a week)
  • 23:13 mutante: contint1001 - started zuul using the old init script
  • 23:05 mutante: netmon1001/1002: rsynced /var/lib/rancid/CVS and /var/lib/rancid/GIT from 1001 to 1002 for rancid migration (T159756)
  • 23:04 demon@tin: Synchronized php-1.30.0-wmf.4/extensions/OpenStackManager: Re-adding deleted special page (duration: 00m 45s)
  • 22:06 ejegg: updated fundraising tools from f2522cd to 585f546
  • 21:59 gwicke: restarted pdfrender on scb1003; was spinning on CPU & using 15G of memory (!)
  • 21:58 gwicke: restarted pdfrender on scb1002 and scb1004; was spinning on CPU
  • 21:56 hashar: Zuul back, running in an interactive terminal.
  • 21:46 mutante: netmon1002 - was able to "keyholder arm" after stretch install after applying https://gerrit.wikimedia.org/r/358884 as hotfix
  • 21:30 mobrovac@tin: Finished deploy [restbase/deploy@9a86d4c]: (no justification provided) (duration: 01m 06s)
  • 21:29 mobrovac@tin: Started deploy [restbase/deploy@9a86d4c]: (no justification provided)
  • 21:13 hashar: Gracefully restarting Zuul
  • 21:11 ppchelko@tin: Finished deploy [changeprop/deploy@4ba3c59]: Rate-limiter enhancements (duration: 01m 08s)
  • 21:10 ppchelko@tin: Started deploy [changeprop/deploy@4ba3c59]: Rate-limiter enhancements
  • 21:02 demon@tin: Synchronized php-1.30.0-wmf.5/extensions/CentralAuth/includes/CentralAuthHooks.php: Fix bad method name (duration: 00m 44s)
  • 20:37 hashar: Restarting Nodepool. Apparently confused in pool tracking and spawning too many Trusty nodes (7 instead of 4)
  • 20:02 demon@tin: Synchronized php-1.30.0-wmf.5/includes/api/ApiParse.php: T167826 (duration: 00m 44s)
  • 20:00 mobrovac@tin: Finished deploy [restbase/deploy@4c1cdd0]: (no justification provided) (duration: 04m 29s)
  • 19:56 mobrovac@tin: Started deploy [restbase/deploy@4c1cdd0]: (no justification provided)
  • 19:37 Amir1: restarting ores-related services in scb1001 (T167819)
  • 19:24 mutante: scb1001 - killed process 10971 (pdfrendering/electron)
  • 19:24 demon@tin: Synchronized php-1.30.0-wmf.5/extensions/CategoryTree/CategoryPageSubclass.php: Fix up variable visibility (duration: 00m 44s)
  • 19:12 demon@tin: rebuilt wikiversions.php and synchronized wikiversions files: group0 to wmf.5
  • 19:09 mobrovac@tin: Finished deploy [restbase/deploy@9a86d4c]: (no justification provided) (duration: 07m 33s)
  • 19:08 mutante: netmon1002 - reinstalled with stretch, revoked puppet cert, salt key, signing new cert, accepting new key, initial puppet run (T159756)
  • 19:01 mobrovac@tin: Started deploy [restbase/deploy@9a86d4c]: (no justification provided)
  • 18:56 mutante: reinstalling netmon1002 with stretch - scheduled icinga downtime
  • 18:54 legoktm: starting to delete all rows from linter tables on large wikis - T167758
  • 18:48 mobrovac@tin: Finished deploy [restbase/deploy@4c1cdd0]: (no justification provided) (duration: 04m 36s)
  • 18:43 mobrovac@tin: Started deploy [restbase/deploy@4c1cdd0]: (no justification provided)
  • 18:39 mobrovac@tin: Started deploy [restbase/deploy@4c1cdd0]: (no justification provided)
  • 18:37 mobrovac@tin: Finished deploy [restbase/deploy@4c1cdd0]: (no justification provided) (duration: 04m 19s)
  • 18:33 mobrovac@tin: Started deploy [restbase/deploy@4c1cdd0]: (no justification provided)
  • 18:27 demon@tin: Finished scap: testwiki to wmf.5 + l10n bootstrap (duration: 42m 16s)
  • 17:52 bblack: cp4021 reboot for bnx2x modparam change
  • 17:50 ottomata: merged removal of x_forwarded_for from all varnishkafka webrequest instances
  • 17:45 ladsgroup@tin: Finished deploy [ores/deploy@862aea9]: ORES deploy early June: T167223 (duration: 33m 52s)
  • 17:45 demon@tin: Started scap: testwiki to wmf.5 + l10n bootstrap
  • 17:42 demon@tin: Pruned MediaWiki: 1.30.0-wmf.2 [keeping static files] (duration: 01m 13s)
  • 17:40 demon@tin: Pruned MediaWiki: 1.30.0-wmf.1 [keeping static files] (duration: 05m 10s)
  • 17:39 bblack: restart varnish-be on cp2002 (mailbox lag, likely induced by swift traffic testing in codfw)
  • 17:11 ladsgroup@tin: Started deploy [ores/deploy@862aea9]: ORES deploy early June: T167223
  • 17:06 akosiaris: rebooting sca2003 for tests
  • 16:35 moritzm: upgrading osmium to HHVM 3.18
  • 16:08 moritzm: installing libnl security updates on trusty
  • 15:41 akosiaris: upload apertium-spa_1.0.0~r78827-1+wmf to apt.wikimedia.org/jessie-wikimedia/main
  • 15:41 akosiaris: upload apertium-ita_0.9.0~r78828-1+wmf to apt.wikimedia.org/jessie-wikimedia/main
  • 15:41 akosiaris: upload apertium-fra_1.1.0~r78695-1+wmf to apt.wikimedia.org/jessie-wikimedia/main
  • 15:41 akosiaris: upload apertium-cat_2.1.0~r78615-1+wmf to apt.wikimedia.org/jessie-wikimedia/main
  • 15:41 gehel: restart of relforge1001 to test https://gerrit.wikimedia.org/r/#/c/358353/
  • 15:09 gehel: applying new GC configuration on elastic1018 - T167636
  • 14:53 godog: update inter-routing for upload to point esams to codfw
  • 14:22 gehel: restarting elasticsearch on relforge to validate GC configuration - T167636
  • 14:17 ottomata: stopping puppet on cp1045, testing removal of xff from varnishkafka webrequest data
  • 14:14 godog: point upload varnish to swift in codfw - T162609
  • 14:11 moritzm: upgrading mw1299-mw1306 to HHVM 3.18
  • 14:10 urandom: T164865: Restart RESTBase dev; apply range delete probability of 1.0
  • 13:30 godog: Thumbor to group1 wikis + mediawiki.org - T167793
  • 13:15 hashar: European SWAT completed
  • 13:13 hashar@tin: Synchronized php-1.30.0-wmf.4/extensions/Popups: actions/rest: Use DB-key version of title - T167633 (duration: 00m 41s)
  • 13:08 hashar@tin: Synchronized php-1.30.0-wmf.4/includes/htmlform/OOUIHTMLForm.php: Do not try to parse empty argument in getErrorsOrWarnings in OOUI - T167644 (duration: 00m 41s)
  • 13:04 hashar@tin: Synchronized wmf-config/Wikibase-production.php: Enable Wikidata echo notifications for all wikis (except enwiki, frwiki, dewiki) - T142102 (duration: 00m 42s)
  • 12:44 marostegui: Deploy alter table on s2 on db1036 - T166205
  • 12:39 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1036 - T166205 (duration: 00m 41s)
  • 12:12 marostegui: Deploy alter table on s2 on dbstore1002 - T166205
  • 12:11 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1064 - T166206 (duration: 00m 51s)
  • 11:56 godog: enable thumbor serving for group0 wikis with media files - T167782
  • 11:41 moritzm: upgrading HHVM on tin/naos to HHVM 3.18
  • 10:59 moritzm: upgrading mw1283-mw1290 to HHVM 3.18
  • 10:21 godog: reenable thumbor swift storage, same paths as mediawiki - T167783
  • 10:11 elukey: completed rollout of https://gerrit.wikimedia.org/r/354449
  • 09:54 moritzm: upgrading mw2248-mw2250 to HHVM 3.18
  • 09:37 godog: disable thumbor shadow requests, enable thumbor-only serving for testwiki - T167490
  • 09:28 moritzm: upgrading mw1276-mw1282 to HHVM 3.18
  • 09:27 elukey: puppet disabled on kafka*, analytics*, druid*, conf* for https://gerrit.wikimedia.org/r/354449 - incremental rollout
  • 09:13 marostegui: Deploy alter table s4 - db1095 - T166206
  • 08:56 moritzm: upgrading mw1165-mw1167 to HHVM 3.18
  • 08:42 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1060 - T166205 (duration: 00m 41s)
  • 08:38 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Restore db1089 original weight - T166935 (duration: 00m 42s)
  • 08:21 gehel: restart OSM synchronisation on maps2001
  • 08:14 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Increase db1089 weight (duration: 00m 42s)
  • 08:05 gehel@puppetmaster1001: conftool action : set/pooled=yes; selector: name=elastic2020.codfw.wmnet
  • 08:01 gehel: adding elastic2020 back in the elasticsearch cluster - T149006
  • 07:48 marostegui: Drop table updates on enwiki (s1) - T139342
  • 07:41 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Increase db1089 weight (duration: 00m 41s)
  • 07:30 moritzm: restarting HHVM on mw canaries to pick up libtasn update
  • 07:21 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1089 with less weight (duration: 00m 41s)
  • 07:12 marostegui: Reboot scb2005 - T167638
  • 06:55 elukey: executed "cumin 'mw2*.codfw.wmnet' 'find /var/log/hhvm/* -user root -exec chown www-data:www-data {} \;'" to fix the last occurrences of wrong root:adm owned hhvm error logs
  • 06:51 moritzm: installing libtasn security updates
  • 06:43 marostegui: Stop MySQL on db1089 to upgrade its raid controller firmware - T166935
  • 06:43 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1089 - T166935 (duration: 00m 42s)
  • 02:33 l10nupdate@tin: ResourceLoader cache refresh completed at Tue Jun 13 02:33:23 UTC 2017 (duration 6m 12s)
  • 02:27 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.4) (duration: 08m 00s)
  • 01:29 papaul: OS install on labtestnet2002
  • 00:40 andyrussg@tin: Finished scap: Update CentralNotice (duration: 20m 51s)
  • 00:19 andyrussg@tin: Started scap: Update CentralNotice

2017-06-12

  • 23:22 mutante: netmon1002 - keyholder arm - loaded rancid deploy key (uses separate passphrase from deployment key)
  • 22:01 mutante: netmon1002 - apt-get -t jessie-backports install rancid (upgrade from 2.3.8 to 3.6.2 to match version on netmon1001) - rancid version is not specified in puppet so even though backports gets enabled the older version gets installed and this manual step is needed unless we start specifying the version in the manifest (T159756)
  • 20:30 mutante: ns0, ns1 - same as before - gen zones, check zones, reload zones, to add "atj.wikipedia.org" (T167714)
  • 20:26 mutante: ns2 - authdns-gen-zones -f /srv/authdns/git/templates /etc/gdnsd/zones && gdnsd checkconf && gdnsd reload-zones to add new Wikipedia language "atj" (needed when editing langlist but not touching templates) (T167714)
  • 19:10 thcipriani@tin: Synchronized wmf-config/throttle.php: SWAT: Lift IP throttle for Wikipedia workshop (14 June 2017) T167011 + Fix throttle rule for Scotland university editathon (duration: 00m 41s)
  • 18:46 thcipriani@tin: Synchronized wmf-config/throttle.php: SWAT: Lift IP throttle for Editathon (13 June 2017) T167517 (duration: 00m 41s)
  • 18:40 thcipriani@tin: Synchronized wmf-config/throttle.php: SWAT: Lift IP throttle for Wikipedia Editathon (June 16th 2017) T167201 (duration: 00m 41s)
  • 18:30 thcipriani@tin: Synchronized wmf-config/InitialiseSettings.php: SWAT: Add NS:100 to wgNamespacesToBeSearchedDefault for enwikisource T167511 (duration: 00m 41s)
  • 18:27 thcipriani@tin: Synchronized php-1.30.0-wmf.4/resources/src/mediawiki.rcfilters/mw.rcfilters.Controller.js: SWAT: RCFilters: Retain extra url params when comparing url equivalency T167551 (duration: 00m 41s)
  • 18:17 thcipriani@tin: Synchronized wmf-config/InitialiseSettings.php: SWAT: Setup the new wgPopupsGateway config variable. NOOP T165018 (duration: 00m 42s)
  • 17:24 joal@tin: Finished deploy [analytics/refinery@08fe129]: Bug correction on regular weekly deploy of refinery (2) (duration: 03m 00s)
  • 17:24 gehel: running stress + bonnie on elastic2020 to check new hardware - T149006
  • 17:21 joal@tin: Started deploy [analytics/refinery@08fe129]: Bug correction on regular weekly deploy of refinery (2)
  • 17:07 gehel@tin: Finished deploy [wdqs/wdqs@84557b8]: (no justification provided) (duration: 02m 32s)
  • 17:05 gehel@tin: Started deploy [wdqs/wdqs@84557b8]: (no justification provided)
  • 16:43 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1089 with less weight (duration: 00m 41s)
  • 14:32 gehel: restart elasticsearch on relforge1001 to validate GC configuration
  • 14:14 moritzm: updating tor on radium to 0.2.9.11-1~d80.jessie+1
  • 14:14 hashar: European SWAT completed
  • 14:13 hashar@tin: Synchronized static/images/project-logos/: Update logo for the Norwegian Wikisource - T167192 (duration: 00m 41s)
  • 14:12 hashar@tin: Synchronized static/images/: Delete duplicate HD logos for the Punjabi Wikipedia (duration: 00m 41s)
  • 14:04 moritzm: updating tor in jessie-wikimedia to 0.2.9.11-1~d80.jessie+1 (via reprepro update from tor repository)
  • 13:59 moritzm: upgrading mw1296-mw1298 to HHVM 3.18
  • 13:53 marostegui: Shutdown db1089 for maintenance - T166935
  • 13:48 hashar@tin: Synchronized php-1.30.0-wmf.4/includes/specials/SpecialNewimages.php: SpecialNewimages: Do not add the module when the special page is included - T167601 (duration: 00m 41s)
  • 13:40 hashar: redoing all the fawiki* updateCollation.php since I ran them without deploying the IS.php change :(
  • 13:38 hashar@tin: Synchronized wmf-config/InitialiseSettings.php: Change Persian Wikis from uca-fa to xx-uca-fa - T139110 (duration: 00m 41s)
  • 13:35 moritzm: uploaded openssl 1.1.0f to apt.wikimedia.org
  • 13:31 joal@tin: Finished deploy [analytics/refinery@0dda4a9]: Bug correction for regular weekly deploy of refinery (duration: 03m 40s)
  • 13:30 aharoni: running mwscript updateCollation.php --wiki=bawikibooks
  • 13:28 joal@tin: Started deploy [analytics/refinery@0dda4a9]: Bug correction for regular weekly deploy of refinery
  • 13:25 hashar: terbium: for T139110 mwscript updateCollation.php --wiki=fawikiquote --previous-collation=uca-fa
  • 13:24 hashar: terbium: for T139110 mwscript updateCollation.php --wiki=fawikinews --previous-collation=uca-fa
  • 13:24 hashar: terbium: for T139110 mwscript updateCollation.php --wiki=fawikibooks --previous-collation=uca-fa
  • 13:24 aharoni: running mwscript updateCollation.php --wiki=bawiki
  • 13:23 hashar: terbium: for T139110 mwscript updateCollation.php --wiki=fawiktionary --previous-collation=uca-fa
  • 13:22 hashar: terbium: for T139110 mwscript updateCollation.php --wiki=fawikisource --previous-collation=uca-fa
  • 13:21 hashar: terbium: for T139110 mwscript updateCollation.php --wiki=fawiki --previous-collation=uca-fa
  • 13:17 moritzm: upgrading cp1008 to openssl 1.1.0f
  • 13:13 hashar@tin: Synchronized wmf-config/InitialiseSettings.php: Set collation for Bashkir wikis to uppercase-ba - T162823 (duration: 00m 41s)
  • 13:10 hashar@tin: Synchronized wmf-config/InitialiseSettings.php: update some logos 6974b9ab4..76939d15f (duration: 00m 41s)
  • 13:08 hashar@tin: Synchronized static/images/project-logos: (no justification provided) (duration: 00m 43s)
  • 12:01 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1089 for maintenance - T166935 (duration: 00m 41s)
  • 12:01 moritzm: upgrading mw1266-mw1275 to HHVM 3.18
  • 11:09 joal@tin: Finished deploy [analytics/refinery@d9c3419]: Regular weekly deploy of refinery (mostly unique_devices patches) (duration: 06m 18s)
  • 11:05 moritzm: upgrading job runners mw1162-mw1164 to HHVM 3.18
  • 11:03 joal@tin: Started deploy [analytics/refinery@d9c3419]: Regular weekly deploy of refinery (mostly unique_devices patches)
  • 10:59 marostegui: Drop table updates on commonswiki (s4) - T139342
  • 10:28 moritzm: upgrading mw1250-mw1258 to HHVM 3.18
  • 09:55 moritzm: upgrading mw1221-mw1235 to HHVM 3.18
  • 09:25 godog: swift eqiad-prod finish decom ms-be1005/6/7 - T166489
  • 09:13 moritzm: upgrading mw1236-mw1249 to HHVM 3.18
  • 09:12 marostegui: Drop table updates on dewiki and wikidatawiki (s5) - T139342
  • 08:31 godog: reboot ms-be1002, load avg slowly creeping up
  • 08:22 elukey: powercycle scb2005 (console frozen, host unresponsive)
  • 07:40 elukey: restarted citoid on scb1001 (kept failing health checks for Error: write EPIPE)
  • 07:38 marostegui: Reboot ms-be1008 as xfs is failing
  • 07:31 marostegui: Deploy alter table s2 - db1060 - T166205
  • 07:31 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1060 - T166205 (duration: 00m 41s)
  • 07:26 elukey: ran restart-pdfrender on scb1001 (OOM errors in the dmesg from hours ago)
  • 07:22 elukey: ran restart-pdfrender on scb1002 (OOM errors in the dmesg from hours ago)
  • 07:21 marostegui: Deploy alter table s4 - db1064 - https://phabricator.wikimedia.org/T166206
  • 07:19 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1064 - T166206 (duration: 00m 41s)
  • 06:53 moritzm: upgrade remaining app servers running HHVM 3.18 to 3.18.2+wmf5
  • 05:38 marostegui: Deploy alter table s4 - labsdb1003 - T166206
  • 02:14 l10nupdate@tin: scap failed: average error rate on 1/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/3888cca979647b9381a7739b0bdbc88e for details)

2017-06-11

  • 14:14 elukey: executed cumin 'mw22[51-60].codfw.wmnet' 'find /var/log/hhvm/* -user root -exec chown www-data:www-data {} \;' to reduce cron-spam (new hosts added in March) - T146464
  • 02:25 l10nupdate@tin: ResourceLoader cache refresh completed at Sun Jun 11 02:25:53 UTC 2017 (duration 6m 6s)
  • 02:19 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.4) (duration: 07m 37s)

2017-06-10

  • 11:54 andrewbogott: cleared leaked instances out of the nova fullstack test. Six were up and running and reachable, one had a network failure.
  • 10:19 TimStarling: on terbium: running purgeParserCache.php prior to cron job due to observed disk space usage increase
  • 10:00 marostegui: Purge binary logs on pc1006-pc2006
  • 09:58 marostegui: Purge binary logs on pc1004-pc2004 and pc1005-pc2005
  • 02:22 l10nupdate@tin: ResourceLoader cache refresh completed at Sat Jun 10 02:22:22 UTC 2017 (duration 6m 13s)
  • 02:16 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.4) (duration: 05m 33s)

2017-06-09

  • 21:18 mobrovac@tin: Finished deploy [restbase/deploy@4e5cb35]: (no justification provided) (duration: 01m 40s)
  • 21:17 mobrovac@tin: Started deploy [restbase/deploy@4e5cb35]: (no justification provided)
  • 21:07 mobrovac@tin: Finished deploy [restbase/deploy@4e5cb35]: Ensure the extract field is always present in the summary response - T167045 (take #2) (duration: 05m 23s)
  • 21:02 mobrovac@tin: Started deploy [restbase/deploy@4e5cb35]: Ensure the extract field is always present in the summary response - T167045 (take #2)
  • 21:01 mobrovac@tin: Finished deploy [restbase/deploy@4e5cb35]: Ensure the extract field is always present in the summary response - T167045 (duration: 04m 57s)
  • 20:56 mobrovac@tin: Started deploy [restbase/deploy@4e5cb35]: Ensure the extract field is always present in the summary response - T167045
  • 20:54 mobrovac@tin: Finished deploy [restbase/deploy@4e5cb35] (staging): Ensure the extract field is always present in the summary response (duration: 03m 39s)
  • 20:50 mobrovac@tin: Started deploy [restbase/deploy@4e5cb35] (staging): Ensure the extract field is always present in the summary response
  • 20:12 demon@tin: Synchronized php-1.30.0-wmf.4/extensions/CirrusSearch/includes/Job/DeleteArchive.php: Really fix it this time (duration: 00m 43s)
  • 19:49 mutante: fermium: $ sudo /usr/local/sbin/disable_list wikino-bureaucrats (T166848)
  • 19:46 RainbowSprinkles: mw1299: running scap pull, maybe out of date?
  • 18:12 gehel: retry allocation of failed shards on elasticsearch eqiad
  • 15:47 _joe_: installed python-service-checker 0.1.3 on einsteinium,tegmen T167048
  • 15:44 _joe_: uploaded service-checker 0.1.3
  • 15:11 _joe_: upgraded python-service-checker to 0.1.2 on tegmen,einsteinium
  • 13:18 godog: upgrade thumbor to 0.1.40 - T167462
  • 12:36 gehel: reducing high watermark on elasticsearch eqiad to rebalance shards
  • 07:51 elukey: run megacli -LDSetProp -Direct -LALL -aALL on analytics[1058-1068] - T166140
  • 07:40 moritzm: upgrade app servers in codfw running HHVM 3.18 to +wmf5
  • 07:26 elukey: run megacli -LDSetProp ADRA -LALL -aALL on analytics[1058-1068] - T166140
  • 07:15 elukey: deleted /etc/logrotate.d/nova-manage from labtestvirt2003 to reduce cronspam (same solution used in T132422#2679434)
  • 06:58 moritzm: updating mw117* to HHVM 3.18+wmf5
  • 06:41 moritzm: updating mw1161 to HHVM 3.18
  • 05:57 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1056 - T166206 (duration: 00m 41s)
  • 05:51 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1074 - T166205 (duration: 00m 42s)
  • 02:25 l10nupdate@tin: ResourceLoader cache refresh completed at Fri Jun 9 02:25:29 UTC 2017 (duration 6m 27s)
  • 02:19 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.4) (duration: 06m 04s)
  • 00:36 ejegg: disabled banner impressions loader
  • 00:15 mutante: mw1275 depooled (T124956)
  • 00:08 ejegg: updated CiviCRM from 5a83ee1 to dfc26f0
  • 00:01 mutante: seeing "php: Lost parent, LightProcess exiting" in syslog on mw1275 today (T124956)

2017-06-08

  • 23:48 mutante: mw1275 - restarted hhvm (php: Lost parent, LightProcess exiting in syslog)
  • 23:37 demon@tin: rebuilt wikiversions.php and synchronized wikiversions files: remaining wikis to wmf.4
  • 23:16 demon@tin: Synchronized php-1.30.0-wmf.4/extensions/CirrusSearch/includes/Job/DeleteArchive.php: Fix array access bug (duration: 00m 43s)
  • 23:15 demon@tin: Synchronized php-1.30.0-wmf.4/extensions/GeoData/includes/Searcher.php: Temp hax to point GeoData at codfw DC (duration: 00m 43s)
  • 22:56 demon@tin: Synchronized php-1.30.0-wmf.4/extensions/RevisionSlider/src/RevisionSliderHooks.php: Re-syncing with permanent committed fix (duration: 00m 44s)
  • 22:36 ejegg: updated civicrm from c70ae65 to 5a83ee1
  • 22:29 demon@tin: Synchronized php-1.30.0-wmf.4/extensions/RevisionSlider/src/RevisionSliderHooks.php: Livehack/test (duration: 00m 44s)
  • 22:17 demon@tin: Synchronized php-1.30.0-wmf.4/extensions/MobileFrontend/includes/specials/SpecialMobileDiff.php: (no justification provided) (duration: 00m 44s)
  • 22:15 mobrovac@tin: Finished deploy [changeprop/deploy@836b070]: Rate limiting, attempt #2 (duration: 01m 23s)
  • 22:13 mobrovac@tin: Started deploy [changeprop/deploy@836b070]: Rate limiting, attempt #2
  • 21:56 mobrovac@tin: Finished deploy [changeprop/deploy@dc1948f]: (no justification provided) (duration: 01m 39s)
  • 21:54 mobrovac@tin: Started deploy [changeprop/deploy@dc1948f]: (no justification provided)
  • 21:54 mobrovac@tin: Finished deploy [changeprop/deploy@56f7511]: (no justification provided) (duration: 01m 32s)
  • 21:52 mobrovac@tin: Started deploy [changeprop/deploy@56f7511]: (no justification provided)
  • 21:50 mobrovac@tin: Finished deploy [changeprop/deploy@56f7511]: (no justification provided) (duration: 00m 34s)
  • 21:50 mobrovac@tin: Started deploy [changeprop/deploy@56f7511]: (no justification provided)
  • 21:42 urandom: T160570: Rolling Cassandra restart, restbase-dev
  • 21:35 ppchelko@tin: Finished deploy [changeprop/deploy@56f7511]: Revert previous deploy (duration: 01m 07s)
  • 21:34 ppchelko@tin: Started deploy [changeprop/deploy@56f7511]: Revert previous deploy
  • 21:31 ppchelko@tin: Started deploy [changeprop/deploy@56f7511]: dc1948f6bc7b1 Revert previous deploy
  • 21:29 ppchelko@tin: Finished deploy [changeprop/deploy@56f7511]: dc1948f6bc7b1 (duration: 00m 16s)
  • 21:29 ppchelko@tin: Started deploy [changeprop/deploy@56f7511]: dc1948f6bc7b1
  • 21:24 ppchelko@tin: Finished deploy [changeprop/deploy@56f7511]: Rate limiting code and config. T161710 (duration: 01m 46s)
  • 21:23 ppchelko@tin: Started deploy [changeprop/deploy@56f7511]: Rate limiting code and config. T161710
  • 20:23 RainbowSprinkles: gerrit2001: upgraded to 2.13.8+git1-wmf.5 / 2.13.8-1-g7c438d37a2
  • 20:12 mutante: imported gerrit_2.13.8+git1-wmf.5_amd64 on apt.wikimedia.org (T158946)
  • 19:26 demon@tin: rebuilt wikiversions.php and synchronized wikiversions files: group1 to wmf.4
  • 19:13 demon@tin: rebuilt wikiversions.php and synchronized wikiversions files: mw.org -> wmf.4
  • 19:05 demon@tin: Synchronized wmf-config/InitialiseSettings.php: New wordmark for mk/srwiki (duration: 00m 57s)
  • 19:03 demon@tin: Synchronized static/images/mobile/copyright/wikipedia-wordmark-sr.svg: new wordmark (duration: 00m 46s)
  • 18:59 maxsem@tin: Synchronized php-1.30.0-wmf.4/extensions/MobileFrontend/: https://gerrit.wikimedia.org/r/#/c/357846/ (duration: 00m 49s)
  • 18:55 maxsem@tin: scap failed: average error rate on 1/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/3888cca979647b9381a7739b0bdbc88e for details)
  • 18:49 urandom: Restarting Cassandra, restbase-dev1001-a to test alternative disk access mode
  • 18:42 mutante: built gerrit_2.13.8+git1-wmf.5 on copper (T158946)
  • 18:40 maxsem@tin: Synchronized php-1.30.0-wmf.4/extensions/LoginNotify/: https://gerrit.wikimedia.org/r/#/c/357743/ (duration: 00m 44s)
  • 18:36 maxsem@tin: Synchronized php-1.30.0-wmf.4/includes/EditPage.php: https://gerrit.wikimedia.org/r/#/c/357855/ (duration: 00m 45s)
  • 18:25 maxsem@tin: Synchronized multiversion/submodules.json: https://gerrit.wikimedia.org/r/#/c/352985/3 (duration: 00m 43s)
  • 18:17 maxsem@tin: Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/356881/4 (duration: 00m 44s)
  • 18:09 maxsem@tin: Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/354731/6 (duration: 00m 44s)
  • 17:55 arlolra: Updated Parsoid to 108eed81 (T136653, T167081)
  • 17:46 arlolra@tin: Finished deploy [parsoid/deploy@f82cb4f]: Updating Parsoid to 108eed81 (duration: 10m 12s)
  • 17:36 arlolra@tin: Started deploy [parsoid/deploy@f82cb4f]: Updating Parsoid to 108eed81
  • 16:44 nuria@tin: Finished deploy [analytics/refinery@2fbed63]: (no justification provided) (duration: 04m 08s)
  • 16:40 nuria@tin: Started deploy [analytics/refinery@2fbed63]: (no justification provided)
  • 16:33 godog: delete net.ifnames for ms-be2001 and ms-be2013 - T158429
  • 16:24 bblack: cp1074: varnish-backend-restart for mailbox lag
  • 15:22 moritzm: updating mw1262-mw1265 to HHVM 3.18.2+wmf5
  • 15:11 XioNoX: Upgrading rancid to 3 - T167288
  • 14:56 moritzm: updating mw1261 to HHVM 3.18.2+wmf5
  • 14:54 XioNoX: 2 blackhole IPs pushed to cr* routers
  • 14:02 aude@tin: Synchronized wmf-config/InitialiseSettings-labs.php: Do not enable Wikibase data access yet on beta wiktionary (duration: 00m 43s)
  • 13:47 aude@tin: Synchronized php-1.30.0-wmf.4/extensions/RevisionSlider: Fix fatal error: T167359 (duration: 00m 44s)
  • 13:41 aude@tin: scap failed: average error rate on 1/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/3888cca979647b9381a7739b0bdbc88e for details)
  • 13:33 aude@tin: Synchronized php-1.30.0-wmf.4/extensions/Wikidata: Fix warning in date formatting T167360 (duration: 02m 16s)
  • 13:31 XioNoX: blackhole v4 IPs removed from all cr* routers
  • 12:39 moritzm: updating mwdebug* to HHVM 3.18.2+wmf5
  • 12:17 moritzm: uploaded hhvm 3.18.2-dfsg-1+wmf5 to apt.wikimedia.org
  • 12:17 moritzm: updated hhvm 3.18.2-dfsg-1+wmf5 to apt.wikimedia.org
  • 11:41 marostegui: Drop table updates on s7 - T139342
  • 11:41 moritzm: powercycling mw1294, mgmt is unresponsive
  • 09:41 moritzm: updating mysql-connector-java on hadoop cluster
  • 09:05 elukey: upgrade zookeeper packages to 3.4.5+dfsg-2+deb8u2 on conf100[123], conf200[23] and druid100[123]
  • 08:58 godog: swift eqiad-prod: decom ms-be1005/6/7 - T166489
  • 08:50 TabbyCat: Rename user "Mlpearc" to "FlightTime" on Central Auth is now finished (T166028)
  • 08:36 godog: temporarily stop ircecho on tegmen, puppet spam
  • 08:22 TabbyCat: Starting big global rename as requested in T166028
  • 07:00 marostegui: Drop table updates on s6 - T139342
  • 05:59 _joe_: uploading new service-checker version to reprepro, T167048
  • 05:54 marostegui: Deploy alter table s2 - db1074 - T166205
  • 05:53 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1074 - T166205 (duration: 00m 43s)
  • 05:43 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1076 - T166205 (duration: 00m 45s)
  • 02:56 l10nupdate@tin: ResourceLoader cache refresh completed at Thu Jun 8 02:56:27 UTC 2017 (duration 6m 26s)
  • 02:50 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.4) (duration: 05m 07s)
  • 02:40 twentyafterfour: deploying hotfix for T166958
  • 02:34 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.2) (duration: 08m 41s)
  • 01:45 mutante: manually running mediawiki maintenance job "echo_mail_batch" (on terbium as www-data, just like cron). did _NOT_ get denied by DB (T167373)
  • 01:37 maxsem@tin: Synchronized php-1.30.0-wmf.2/extensions/GeoData/includes/Searcher.php: Livehack to stop exceptions (duration: 00m 46s)
  • 00:54 mutante: cp4019 - powercycled (same as others) | lvs1007 - sits at installer - waiting for IP to be configured (T150256)
  • 00:47 mutante: cp1059 - same thing - powercycle after failed boot after reimaging script
  • 00:41 mutante: cp4011 - like cp4010 - powercycling (host down, console sat at initramfs). It had the "did not detect disk by uid" issue but boots normally after powercycle
  • 00:34 mutante: cp4020 - powercycling (host down, console sat at initramfs)
  • 00:31 mutante: cp2012 - fixed salt key issue as for cp3005 (delete key, stop/start minion, accept new key)
  • 00:25 mutante: salt-master: deleted salt-key for cp3005, stopped/started minion on cp3005 - key got accepted again (was: Salt Master has rejected this minion's public key)

2017-06-07

  • 23:33 ppchelko@tin: Finished deploy [trending-edits/deploy@e0a8716]: Include reverts from bots to get rid of false positives (duration: 07m 00s)
  • 23:30 catrope@tin: Synchronized php-1.30.0-wmf.4/extensions/RelatedArticles/resources/ext.relatedArticles.readMore.eventLogging/index.js: T167236 (duration: 00m 43s)
  • 23:28 catrope@tin: Synchronized wmf-config/InitialiseSettings.php: Relaunch related pages A/B test to 98% of users on enwiki (T167310) (duration: 00m 44s)
  • 23:26 ppchelko@tin: Started deploy [trending-edits/deploy@e0a8716]: Include reverts from bots to get rid of false positives
  • 22:24 bblack: reimaging ex-cache_maps hosts (fresh role::spare::system installs)
  • 22:18 bblack: puppet node clean+deactivate for cp3003
  • 22:15 bblack: lvs4002 - restarting pybal to remove old maps table entries
  • 22:14 bblack: lvs3002 - restarting pybal to remove old maps table entries
  • 22:13 bblack: lvs2002 - restarting pybal to remove old maps table entries
  • 22:13 bblack: lvs1002 - restarting pybal to remove old maps table entries
  • 22:12 bblack: lvs4004 - restarting pybal to remove old maps table entries
  • 22:11 bblack: lvs3004 - restarting pybal to remove old maps table entries
  • 22:09 bblack: lvs2005 - restarting pybal to remove old maps table entries
  • 22:07 bblack: lvs1005 - restarting pybal to remove old maps table entries
  • 21:32 twentyafterfour@tin: rebuilt wikiversions.php and synchronized wikiversions files: group1 wikis to 1.30.0-wmf.2
  • 21:31 twentyafterfour: rolling back to wmf.2 due to error spike and popups no longer working refs T166829
  • 21:25 twentyafterfour@tin: rebuilt wikiversions.php and synchronized wikiversions files: group1 wikis to 1.30.0-wmf.4
  • 21:23 twentyafterfour@tin: Synchronized php-1.30.0-wmf.4/: sync 3248a17 refs T167343 (duration: 07m 52s)
  • 20:26 twentyafterfour@tin: Synchronized php-1.30.0-wmf.4/extensions/MobileFrontend: Deploy 66ef9cb refs T167216 (duration: 00m 46s)
  • 20:04 twentyafterfour: Preparing to deploy the MediaWiki train for group1 wikis, 1.30.0-wmf.4 refs T166829
  • 18:22 thcipriani@tin: Synchronized wmf-config: SWAT: Enable archive indexing on delete for select wikis T162302 (duration: 00m 47s)
  • 18:14 thcipriani@tin: Synchronized portals: SWAT: Updating portals stats T128546 (duration: 00m 44s)
  • 18:13 thcipriani@tin: Synchronized portals/prod/wikipedia.org/assets: SWAT: Updating portals stats T128546 (duration: 00m 44s)
  • 17:14 elukey: restart nutcracker on thumbor1002 (too many connections approaching the 1024 ulimit)
  • 15:37 akosiaris: disable puppet on puppetmaster1001, depool rhodium for tests
  • 14:51 filippo@puppetmaster1001: conftool action : set/pooled=no; selector: name=ms-fe2007.codfw.wmnet
  • 14:48 filippo@puppetmaster1001: conftool action : set/pooled=no; selector: name=ms-fe1007.codfw.wmnet
  • 14:11 dcausse: eu swat done
  • 12:56 aude@tin: Synchronized php-1.30.0-wmf.4/extensions/Wikidata: Fix parser function registration T167238 (duration: 02m 20s)
  • 12:43 marostegui: Drop table updates on s2 - T139342
  • 12:40 aude@tin: Synchronized wmf-config/InitialiseSettings-labs.php: Enable Wikibase Client on beta wiktionary sites T158323 (duration: 00m 43s)
  • 12:40 elukey: upgrade zookeeper packages on conf2002 to 3.4.5+dfsg-2+deb8u2
  • 12:32 bblack: cp1072, cp1063 restarting varnish backend for mailbox lag
  • 12:26 aude@tin: Synchronized wmf-config/Wikibase.php: Site links for non-main namespace wiktionary pages T158323 (duration: 00m 43s)
  • 12:19 aude@tin: Synchronized wmf-config/Wikibase-labs.php: Site links for non-main namespace wiktionary pages (duration: 00m 44s)
  • 11:08 gehel: restarting cron on logstash cluster
  • 10:29 moritzm: installing tiff regression security update on trusty
  • 10:26 ema: upgrade lvs1*/lvs2* to jessie 8.8 point release T164703
  • 09:49 ema: upgrade lvs[3001-3004] to jessie 8.8 point release T164703
  • 09:28 gehel: upgrading kibana to v5.3.3 on logstash cluster - T167266
  • 09:15 ema: upgrade lvs4001-4004 to jessie 8.8 point release T164703
  • 08:58 marostegui: Deploy alter table on s2 - db1076 - T166205
  • 08:58 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1076 - T166205 (duration: 00m 43s)
  • 08:50 marostegui: Deploy alter table s4 - db1056 - T166206
  • 08:34 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1056 - T166206 (duration: 00m 43s)
  • 08:02 marostegui: Run redact_sanitarium on db1095 for dewiki - T153743
  • 07:22 marostegui: Deploy alter table on db1047 enwiki.revision - T162807
  • 06:49 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1056 - T166206 (duration: 00m 44s)
  • 05:35 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1053, depool db1056 - T166206 (duration: 01m 03s)
  • 03:11 l10nupdate@tin: ResourceLoader cache refresh completed at Wed Jun 7 03:11:40 UTC 2017 (duration 6m 54s)
  • 03:04 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.4) (duration: 14m 29s)
  • 02:30 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.2) (duration: 07m 57s)
  • 00:21 RainbowSprinkles: gerrit: rolled back to 2.13.4-13-gc0c5cc4742 from 2.13.8. T152640 rearing its ugly head again (login issues)

2017-06-06

  • 23:59 thcipriani@tin: Synchronized php-1.30.0-wmf.2/extensions/Flow/includes/Content/BoardContentHandler.php: SWAT: Revert "Throw when unserializing invalid Flow workflow metadata JSON" T166100 T156813 (duration: 00m 43s)
  • 23:58 thcipriani@tin: Synchronized php-1.30.0-wmf.4/extensions/Flow/includes/Content/BoardContentHandler.php: SWAT: Revert "Throw when unserializing invalid Flow workflow metadata JSON" T166100 T156813 (duration: 00m 45s)
  • 23:56 RainbowSprinkles: gerrit: back from reindexing
  • 23:55 RainbowSprinkles: gerrit: force stopping for a second to reindex accounts
  • 23:17 thcipriani@tin: Synchronized wmf-config/InitialiseSettings.php: SWAT: Disable page previews on wikispecies T166894 (duration: 00m 44s)
  • 23:12 thcipriani@tin: Synchronized wmf-config/InitialiseSettings.php: SWAT: Update ContentNamespaces for Commons Wiki T167077 (duration: 00m 46s)
  • 21:57 RainbowSprinkles: gerrit: restarting last time, didn't work like I wanted
  • 21:53 RainbowSprinkles: gerrit: restarting to test a config tweak
  • 21:41 mutante: contint1001 - graceful'ed Apache to deploy gerrit:351391
  • 21:19 demon@tin: rebuilt wikiversions.php and synchronized wikiversions files: unbreak mw.org pref page
  • 20:21 RainbowSprinkles: gerrit: Down for just a moment, finally doing point release on cobalt
  • 19:57 demon@tin: rebuilt wikiversions.php and synchronized wikiversions files: group0 to wmf.4
  • 19:45 demon@tin: Finished scap: testwiki to wmf.4 + prepping l10n. again (x2) (duration: 20m 25s)
  • 19:36 mutante: cobalt - removed systemd unit file (that has issues with ulimit and isn't used yet) - ran "systemctl reset-failed" which cleared the "systemctl status" which made the Icinga check recover
  • 19:24 demon@tin: Started scap: testwiki to wmf.4 + prepping l10n. again (x2)
  • 19:23 demon@tin: scap failed: RuntimeError scap failed: average error rate on 1/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/3888cca979647b9381a7739b0bdbc88e for details) (duration: 13m 32s)
  • 19:23 demon@tin: scap failed: average error rate on 1/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/3888cca979647b9381a7739b0bdbc88e for details)
  • 19:10 demon@tin: Started scap: testwiki to wmf.4 + prepping l10n. again
  • 19:08 demon@tin: Synchronized README: No-op, just forcing co-master sync (duration: 01m 27s)
  • 19:01 demon@tin: rebuilt wikiversions.php and synchronized wikiversions files: testwiki back to wmf.2
  • 18:55 maxsem@tin: Finished scap: LoginNotify to testwiki - rebuild messages (duration: 38m 19s)
  • 18:16 maxsem@tin: Started scap: LoginNotify to testwiki - rebuild messages
  • 18:15 maxsem@tin: Synchronized wmf-config: https://gerrit.wikimedia.org/r/#/c/357317/2 (duration: 00m 44s)
  • 18:10 maxsem@tin: Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/357317/2 (duration: 00m 44s)
  • 18:03 demon@tin: Finished scap: testwiki to wmf.3, prepping l10n cache (duration: 31m 58s)
  • 17:31 demon@tin: Started scap: testwiki to wmf.3, prepping l10n cache
  • 16:53 moritzm: installing wireshark security updates on trusty (jessie already fixed)
  • 16:41 bblack: rebooted lvs1007 (kernel update)
  • 16:35 bblack: rebooted lvs1007 (kernel update)
  • 15:21 otto@tin: Finished deploy [eventlogging/analytics@37233cd]: (no justification provided) (duration: 00m 04s)
  • 15:21 otto@tin: Started deploy [eventlogging/analytics@37233cd]: (no justification provided)
  • 14:58 moritzm: installing libsndfile security updates on trusty
  • 14:01 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Restore db1094 original weight (duration: 00m 40s)
  • 13:46 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Increase db1094 weight (duration: 00m 40s)
  • 13:39 elukey: shutdown analytics1033 and analytics1039 to replace their BBU - T166140
  • 13:33 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1094 with low weight (duration: 00m 40s)
  • 12:58 marostegui: Shutdown db1094 for maintenance - T166518
  • 12:58 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1094 for maintenance - T166518 (duration: 00m 39s)
  • 12:51 godog: upgrade scap to 3.5.8 - T127762
  • 12:41 mobrovac@tin: Finished deploy [changeprop/deploy@e92dd66]: Bump src to bc8abf3 (duration: 01m 45s)
  • 12:40 mobrovac@tin: Started deploy [changeprop/deploy@e92dd66]: Bump src to bc8abf3
  • 12:16 bblack: cp1049 - restart varnish backend for mailbox lag
  • 12:08 gehel: kill stuck osm replication on maps1001
  • 11:28 akosiaris@tin: Finished deploy [servermon/servermon@4a2288f]: (no justification provided) (duration: 00m 04s)
  • 11:28 akosiaris@tin: Started deploy [servermon/servermon@4a2288f]: (no justification provided)
  • 11:17 moritzm: uploaded ferm 2.3.2+wmf1 to apt.wikimedia.org/stretch-wikimedia (T166653)
  • 11:02 ladsgroup@tin: Synchronized wmf-config/Wikibase-production.php: Enabling writing in full entity id in testwikidatawiki (T165197) (duration: 00m 39s)
  • 10:22 moritzm: installing NSS security updates
  • 09:43 moritzm: installing perl security updates
  • 09:41 akosiaris: stop jobchron/jobrunner processes across jobrunner and videoscalers in codfw
  • 09:35 akosiaris: restart jobchron service across videoscalers T129148
  • 09:33 akosiaris: restart jobchron service across jobrunners T129148
  • 09:32 akosiaris@tin: Finished deploy [jobrunner/jobrunner@161c84c]: (no justification provided) (duration: 01m 17s)
  • 09:31 akosiaris@tin: Started deploy [jobrunner/jobrunner@161c84c]: (no justification provided)
  • 09:29 akosiaris: running puppet on jobrunners T129148
  • 09:25 akosiaris: running puppet on videoscalers T129148
  • 09:25 akosiaris: moving around jobrunner/jobrunner was probably not required T129148
  • 09:19 akosiaris: running puppet again on tin, after moving /serv/deployment/jobrunner/jobrunner T129148
  • 09:12 akosiaris: running puppet on mw1161 T129148
  • 09:11 akosiaris: git pull and scap deploy --init for jobrunner T129148
  • 09:08 akosiaris: running puppet on tin T129148
  • 09:04 akosiaris: disable puppet on all jobrunners T129148
  • 09:04 akosiaris: disable puppet on all jobrunners
  • 08:54 dcausse: restarting elastic2014 to reclaim free space on deleted log file
  • 08:43 jynus: stopping db2035 and preparing for reimage
  • 08:39 gehel: raise log level to WARN for TransportShardBulkAction on elasticsearch cirrus - T167091
  • 07:53 gehel: starting upgrade to elasticsearch 5.3.2 on cirrus eqiad cluster - T163708
  • 06:40 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Add comments about current status of db1089 - T166935 (duration: 00m 39s)
  • 05:56 marostegui: Deploy alter table s3 on db1075 (eqiad master) - T166278
  • 02:27 l10nupdate@tin: ResourceLoader cache refresh completed at Tue Jun 6 02:27:37 UTC 2017 (duration 6m 3s)
  • 02:21 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.2) (duration: 07m 32s)

2017-06-05

  • 23:33 thcipriani: running on terbium: mwscript extensions/ORES/maintenance/CheckModelVersions.php frwiki && mwscript extensions/ORES/maintenance/PopulateDatabase.php frwiki
  • 23:32 thcipriani@tin: Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable ORES review tool in frwiki T165044 (duration: 00m 40s)
  • 23:23 thcipriani: frwiki create tables ores_model and ores_classification T165044
  • 22:03 bblack: cp1074 - varnish-backend-restart (mailbox lag)
  • 22:02 bblack: cp1099 - varnish-backend-restart (mailbox lag)
  • 21:34 bawolff: deployed patch for T165846
  • 21:01 reedy@tin: Synchronized wmf-config/CommonSettings.php: Run Pdf Processors in firejails T164145 T164000 (duration: 00m 40s)
  • 20:16 subbu: updated parsoid to 141fc07d (T166655)
  • 20:10 ssastry@tin: Finished deploy [parsoid/deploy@bb0613c]: Updating Parsoid to 141fc07d (duration: 07m 02s)
  • 20:03 ssastry@tin: Started deploy [parsoid/deploy@bb0613c]: Updating Parsoid to 141fc07d
  • 18:52 maxsem@tin: Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/357169/2 (duration: 00m 39s)
  • 18:43 MaxSem: ran mwscript maintenance/namespaceDupes.php --wiki=etwiki --fix
  • 18:41 maxsem@tin: Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/357025/2 (duration: 00m 39s)
  • 18:36 maxsem@tin: Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/355594/2 (duration: 00m 39s)
  • 18:29 maxsem@tin: Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/357186/2 (duration: 00m 42s)
  • 18:25 maxsem@tin: Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/357026/2 (duration: 00m 38s)
  • 18:11 maxsem@tin: Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/356437/4 (duration: 00m 40s)
  • 16:25 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Restore db1089 original weight - T166935 (duration: 00m 38s)
  • 16:19 jynus: stopping db2037 and preparing for reimage
  • 15:16 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Increase db1089 weight - T166935 (duration: 00m 39s)
  • 15:03 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Increase db1089 weight - T166935 (duration: 00m 38s)
  • 14:45 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1089 with low weight - T166935 (duration: 00m 39s)
  • 13:47 bblack: rebooting lvs1010 again
  • 13:27 zeljkof: eu swat finished
  • 13:16 zfilipin@tin: Synchronized wmf-config/throttle.php: SWAT: Lift IP throttle for Wikimedia Chile editathon (T166788) (duration: 00m 39s)
  • 13:02 bblack: rebooting lvs1010 (post-reinstall)
  • 12:54 marostegui: Stop MySQL db1047 - T166452
  • 09:06 marostegui: Stop replication on db1070 for maintenance - T153743
  • 08:10 godog: swift eqiad-prod decom ms-be1009 / 10 / 11 - T166489
  • 07:43 marostegui: Stop labsdb1011 to take a backup - T153743
  • 07:41 jynus: stopping db2038 mysql and preparing for reimage
  • 07:15 marostegui: Deploy alter table in s2 (codfw master) this will generate lag in codfw - T166205
  • 06:20 marostegui: Deploy alter table s4 - on labsdb1001 - T166206
  • 06:15 marostegui: Deploy alter table on s3 - db1069 - T166278
  • 06:13 marostegui: Deploy alter table on s4 - db1053 - T166206
  • 06:12 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1053 - T166206 (duration: 00m 39s)
  • 05:58 marostegui: Stop MySQL on db1095 to take a backup - T153743
  • 05:56 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Add comments to db1089's current status (duration: 00m 39s)
  • 02:27 l10nupdate@tin: ResourceLoader cache refresh completed at Mon Jun 5 02:27:53 UTC 2017 (duration 6m 2s)
  • 02:21 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.2) (duration: 08m 14s)

2017-06-04

  • 10:31 ema: mw2256 down, console stuck on 'Starti'. power cycled.
  • 02:23 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.2) (duration: 09m 12s)

2017-06-03

  • 05:20 marostegui: Reboot db1089 - T166933
  • 05:08 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1089 - it is broken (duration: 00m 41s)
  • 02:30 l10nupdate@tin: ResourceLoader cache refresh completed at Sat Jun 3 02:30:27 UTC 2017 (duration 6m 24s)
  • 02:24 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.2) (duration: 07m 47s)
  • 00:04 mutante: wikitech-static-iad: mv /etc/acme/cert/wikitech-static-iad-signed.csr /etc/acme/cert/wikitech-static-iad.chained.crt ; wikitech-static-ord: copy wiki logo: /srv/mediawiki/images# wget https://wikitech-static-iad.wikimedia.org/w/images/labswiki.png

2017-06-02

  • 23:53 demon@tin: Synchronized wmf-config/throttle.php: pruning some old throttle exceptions (duration: 00m 40s)
  • 23:46 mutante: wikitech-static-iad: edited acme_tiny.py to adjust URL to agreement PDF, to fix "Provided agreement URL [1] does not match current agreement URL [2]"
  • 23:45 mutante: wikitech-static-iad: create new cert for "iad" hostname, using acme-setup/acme-tiny: /usr/local/sbin# acme-setup -i "wikitech-static-iad" -s "wikitech-static-iad.wikimedia.org" ; python acme_tiny.py --account-key /etc/acme/acct/acct.key --csr /etc/acme/csr/wikitech-static-iad.pem --acme-dir /var/acme/challenge/ > /etc/acme/cert/wikitech-static-iad-signed.csr  ; had to hack acme_tiny.py
  • 23:22 mutante: wikitech-static-ord copied Lets-Encrypt intermediate certs from /usr/local/share/ca-certificates on old server
  • 23:19 mutante: wikitech-static (iad): adjust Apache config to use wikitech-static-iad
  • 23:18 mutante: wikitech-static-ord: installed package upgrades, installed vim, removing "ord" from Apache config after DNS change ..
  • 23:14 mutante: maintenance on status.wikimedia.org and wikitech-static.wikimedia.org
  • 20:08 ejegg: re-enabled AstroPay/dLocal payment methods
  • 19:36 ejegg: updated payments-wiki from 5edd788 to 7a50542
  • 19:23 ejegg: updated CiviCRM from 9c06bd2 to c70ae65
  • 18:29 mobrovac@tin: Finished deploy [restbase/deploy@4b14527]: (no justification provided) (duration: 00m 41s)
  • 18:29 mobrovac@tin: Started deploy [restbase/deploy@4b14527]: (no justification provided)
  • 18:28 mobrovac@tin: Started deploy [restbase/deploy@4b14527]: h
  • 17:01 bblack: starting wmf-auto-reimage on lvs1007-10
  • 16:16 RainbowSprinkles: gerrit2001: gerrit updated to 2.13.8+git1-wmf.4
  • 16:03 bblack: start wmf-auto-reimage of lvs1011, lvs1012
  • 15:01 jynus: restarting ircecho on tegmen
  • 14:32 mobrovac@tin: Finished deploy [restbase/deploy@4b14527]: Add the extract_html property to the summary end point for T165017 (duration: 06m 43s)
  • 14:25 mobrovac@tin: Started deploy [restbase/deploy@4b14527]: Add the extract_html property to the summary end point for T165017
  • 13:28 gehel: restart elastic2003 to reload logging configuration
  • 12:11 hashar: restarting Jenkins to upgrade the logstash plugin
  • 09:49 jynus: stopping db2041 to prepare it for reimage
  • 09:18 marostegui: Deploy alter table s3 - db1015 - T166278
  • 09:12 marostegui: Deploy alter table s3 - labsdb1003 - T166278
  • 07:47 marostegui: Resume alter table on db1047 enwiki.revision - T166452
  • 07:45 moritzm: uploaded gerrit 2.13.8+git1-wmf4 to apt.wikimedia.org
  • 07:43 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1059 - T166206 (duration: 00m 39s)
  • 07:36 marostegui: Deploy alter table on s4 - labsdb1009 - T166206
  • 07:02 akosiaris: starting fleet wide PCC for gerrit change 356030. Should take a while to complete
  • 05:25 jynus@tin: Synchronized wmf-config/db-eqiad.php: Emergency pool of db1049 (duration: 00m 48s)
  • 04:42 elukey: removed some old scap revs for the Analytics refinery on stat1002 to free space (git fat jars replicating after each deployment, known issue)
  • 02:46 bd808: Loadavg on mw1198 very high (44+) and nginx/hhvm checks flapping

2017-06-01

  • 23:33 twentyafterfour: phabricator upgrade complete.
  • 23:29 twentyafterfour: Performing phabricator update, expect momentary downtime.
  • 23:25 twentyafterfour: Preparing phabricator update to tag release/2017-06-01/1 [ https://phabricator.wikimedia.org/project/view/2802/ ]
  • 23:20 ebernhardson@tin: Synchronized wmf-config/CirrusSearch-common.php: T163463: apply sister search restrictions requested by enwiki (duration: 00m 39s)
  • 23:18 ebernhardson@tin: Synchronized wmf-config/InitialiseSettings.php: T163463: apply sister search restrictions requested by enwiki (duration: 00m 40s)
  • 21:59 RainbowSprinkles: gerrit2001: Upgraded to 2.13.8, seems to be running fine this time.
  • 20:37 mobrovac@tin: Finished deploy [citoid/deploy@ba0db9c]: Update spec to minimise alert noise - T163986 (duration: 05m 20s)
  • 20:32 mobrovac@tin: Started deploy [citoid/deploy@ba0db9c]: Update spec to minimise alert noise - T163986
  • 20:23 bsitzmann@tin: Finished deploy [mobileapps/deploy@2a8e648]: Update mobileapps to c4dc72d (duration: 05m 18s)
  • 20:18 bsitzmann@tin: Started deploy [mobileapps/deploy@2a8e648]: Update mobileapps to c4dc72d
  • 19:30 mepps: updated SmashPig from 4f84d88 to d4458fa
  • 19:25 thcipriani@tin: rebuilt wikiversions.php and synchronized wikiversions files: all wikis to 1.30.0-wmf.2
  • 19:23 gehel@tin: Finished deploy [wdqs/wdqs@3936e36]: (no justification provided) (duration: 01m 20s)
  • 19:22 gehel@tin: Started deploy [wdqs/wdqs@3936e36]: (no justification provided)
  • 19:08 thcipriani@tin: Synchronized wmf-config/CommonSettings.php: Revert "Add RejectParserCacheValue handler for mw-parser-output invalidation" T166345 (duration: 00m 43s)
  • 18:21 gehel@puppetmaster1001: conftool action : set/pooled=yes; selector: name=wdqs1002.eqiad.wmnet
  • 18:20 gehel: wdqs1002 back in LVS - thermal paste added - T166524
  • 17:42 gehel@puppetmaster1001: conftool action : set/pooled=no; selector: name=wdqs1002.eqiad.wmnet
  • 17:41 gehel: shutting down wdqs1002 for maintenance - T166524
  • 17:02 elukey: stop mysql, eventlogging_sync and shutdown db1047 (analytics-store) for maintenance - T159266
  • 16:22 jynus: retrying reimage of db2044
  • 15:03 elukey: restart kafka100[23] for jvm upgrades
  • 14:21 mforns@tin: Finished deploy [analytics/refinery@7540403]: (no justification provided) (duration: 02m 50s)
  • 14:18 mforns@tin: Started deploy [analytics/refinery@7540403]: (no justification provided)
  • 14:00 jynus@tin: Synchronized wmf-config/db-codfw.php: Repool db2048 after maintenance (duration: 00m 44s)
  • 13:18 marostegui: Deploy alter table s3 revision on labsdb1001 - T166278
  • 13:15 marostegui: Deploy alter table s3 revision on labsdb1011 - T166278
  • 13:11 gilles: restored original configuration on mwdebug1001
  • 11:33 godog: test upgrade of swift 2.10 on ms-fe2005 - T162609
  • 10:24 gilles: Point nutcracker to localhost on mwdebug1001
  • 10:06 godog: run puppet to blacklist acpi_power_meter across the fleet and rmmod the module
  • 09:51 _joe_: refreshing facts on the puppet compiler
  • 08:15 godog: upgrade grafana to 4.3.2 on labmon1001 / krypton
  • 07:49 gilles: editing wikiversions.php manually on mwdebug1001 to point enwiki to wmf.2
  • 06:08 marostegui: Deploy alter table on s3, labsdb1010 - T166278
  • 06:07 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1035 - T166278 (duration: 00m 57s)
  • 06:04 marostegui: Deploy alter table on s3, db1044 - T166278
  • 06:02 marostegui: Deploy alter table on s3, dbstore1001 - T166278
  • 05:58 elukey: powercycle cp3032 - T166758
  • 05:43 elukey@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp3032.esams.wmnet
  • 02:52 l10nupdate@tin: ResourceLoader cache refresh completed at Thu Jun 1 02:52:25 UTC 2017 (duration 6m 42s)
  • 02:45 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.2) (duration: 07m 02s)
  • 02:25 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.1) (duration: 07m 29s)

2017-05-31

  • 23:59 dereckson@tin: Synchronized wmf-config/throttle.php: Add throttle rules for 2017-06-01 Fortaleza event (T166619) (duration: 00m 41s)
  • 23:03 ejegg: disabled d*local payment methods
  • 22:37 ejegg: updated payments-wiki from 4786e7c to 5edd788
  • 22:14 thcipriani@tin: rebuilt wikiversions.php and synchronized wikiversions files: wikipedias back to 1.30.0-wmf.1
  • 21:41 thcipriani@tin: Synchronized wmf-config/InitialiseSettings.php: touch InitialiseSettings.php (duration: 00m 39s)
  • 21:37 ejegg: reverted payments-wiki to 4786e7c
  • 21:32 thcipriani@tin: rebuilt wikiversions.php and synchronized wikiversions files: wikipedias to 1.30.0-wmf.2
  • 21:29 AaronSchulz: Restored mwdebug1001 to wmf1 with normal nutcracker/memcached and puppet running
  • 21:23 ejegg: updated payments-wiki from 4786e7c to d467d3b
  • 20:17 RainbowSprinkles: gerrit: bringing offline for a few minutes for point release (2.13.4 -> 2.13.8, T158946)
  • 20:15 mobrovac@tin: Finished deploy [citoid/deploy@7d69554]: Relaxing date validation - T132308 (duration: 02m 32s)
  • 20:13 mobrovac@tin: Started deploy [citoid/deploy@7d69554]: Relaxing date validation - T132308
  • 19:31 demon@tin: Synchronized scap/plugins/clean.py: cleanup r us (duration: 00m 42s)
  • 19:13 gehel@tin: Finished deploy [wdqs/wdqs@af495a2]: (no justification provided) (duration: 01m 29s)
  • 19:11 gehel@tin: Started deploy [wdqs/wdqs@af495a2]: (no justification provided)
  • 17:30 godog: swift eqiad-prod decom ms-be100[128] - T166489
  • 16:53 ema: restart varnish-backend on cp1074
  • 16:53 ema: merge cache_maps into cache_upload: finished moving LVS IPs T164608
  • 16:33 hoo@tin: Synchronized wmf-config/InitialiseSettings.php: Index article placeholders up to Q16956 on cywiki (T162244) (duration: 00m 42s)
  • 15:58 hoo: Updated the Wikidata property suggester with data from last Monday's JSON dump and applied the T132839 workarounds
  • 15:31 ema: merge cache_maps into cache_upload: move LVS IPs T164608
  • 14:34 XioNoX: init7 fixed the issue, ping works from the init7 interface, reenabling the BGP session - T166663
  • 14:02 moritzm: upgrading install2002 to reprepro 5.1.1
  • 13:26 hoo@tin: Synchronized wmf-config/Wikibase-production.php: WikibaseClient: Don't persist Statement usages (T151717) (duration: 00m 41s)
  • 13:21 ema: cache_eqiad: upgrade to jessie 8.8 point release T164703
  • 13:20 hoo@tin: Synchronized wmf-config/InitialiseSettings.php: Log "api-readonly" errors (T164191, T123867) (duration: 00m 43s)
  • 13:15 ema: cache_codfw: upgrade to jessie 8.8 point release T164703
  • 13:10 ema: cache_esams: upgrade to jessie 8.8 point release T164703
  • 13:08 marostegui: Stop MySQL on db1048 and shutdown the host for maintenance - T160731
  • 13:08 moritzm: uploaded zookeeper 3.4.5+dfsg-2+deb8u2 to apt.wikimedia.org
  • 12:36 ema: cache_ulsfo: upgrade to jessie 8.8 point release T164703
  • 12:35 marostegui: Deploy alter table on s3 revision table - db1035 - https://phabricator.wikimedia.org/T166278
  • 12:35 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1077, depool db1035 - T166278 (duration: 00m 41s)
  • 12:22 ema: cp1008: upgrade to jessie 8.8 point release T164703
  • 12:11 XioNoX: Disable v6 BGP session with Init7 in knams because of routing loop on their network
  • 12:04 volans: merged stringify_facts=false for production hosts T166372
  • 10:59 jynus: preparing for backup and reimage to jessie of db2044
  • 10:35 moritzm: updated reprepro on install1002 to 5.1.1 from backports (for support of dbgsym and buildinfo files)
  • 10:29 godog: remove salt-minion salt-common from stretch-wikimedia - T166646
  • 09:30 marostegui: Deploy alter table on s3 revision table - db1078 - https://phabricator.wikimedia.org/T166278
  • 09:27 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1078, depool db1077 - T166278 (duration: 00m 42s)
  • 09:24 _joe_: etcd in eqiad in read-write mode
  • 09:22 _joe_: started etcd replica eqiad => codfw
  • 09:15 _joe_: etcd replica codfw => eqiad now stopped
  • 09:09 _joe_: etcd in read-only mode for switchover to eqiad
  • 08:27 godog: complete linux 4.9 upgrade on Debian ms-be2* machines
  • 08:24 moritzm: installing imagemagick security updates on trusty (jessie already fixed)
  • 07:47 elukey: restart kafka on kafka10[14,22,20] for jvm upgrades
  • 06:45 moritzm: installing sudo security updates
  • 06:45 marostegui: Deploy alter table s3 revision table - dbstore1002 - T166278
  • 06:31 marostegui: Deploy alter table on s4 - db1059 - T166206
  • 06:31 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1081, depool db1059 - T166206 (duration: 00m 41s)
  • 06:12 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1078 - T166278 (duration: 00m 43s)
  • 06:04 marostegui: Deploy alter table on s3 revision table - db1078 - T166278
  • 06:04 marostegui: Deploy alter table on s3 revision table - db1095 - T166278
  • 02:25 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.1) (duration: 07m 45s)

2017-05-30

  • 23:15 demon@tin: Synchronized wmf-config/InitialiseSettings.php: Page images can come outside the lead for all projects except Wikipedia (duration: 00m 41s)
  • 23:09 demon@tin: Synchronized wmf-config/InitialiseSettings.php: Add Wikipedia wordmark in Serbian/Macedonian (duration: 00m 45s)
  • 23:08 demon@tin: Synchronized static/images/mobile/copyright/: Compressed + new images (duration: 00m 42s)
  • 22:43 Reedy: created securepoll_elections.el_owner on testwiki T166568
  • 22:20 catrope@tin: Synchronized wmf-config/InitialiseSettings.php: Make Flow default in all namespaces on cawikiquote (T165497) (duration: 00m 43s)
  • 22:20 mutante: Welcome new root shell user herron (T166587)
  • 22:10 RoanKattouw: Running populateContentModel.php on all talk namespaces for all tables on cawikiquote
  • 21:28 RoanKattouw: Running Flow/convertNamespaceFromWikitext.php on all discussion namespaces on cawikiquote (T165497)
  • 21:21 mobrovac@tin: Finished deploy [zotero/translators@f051fe7]: Translators update for T95128 and T166292 (duration: 00m 05s)
  • 21:21 mobrovac@tin: Started deploy [zotero/translators@f051fe7]: Translators update for T95128 and T166292
  • 20:36 AaronSchulz: Set all wikis to wmf.2 via wikiversions.php on mwdebug1001 only; manual nutcracker running a screen to use local memcached for debugging
  • 20:18 mutante: LDAP - added uid=herron to groups "ops" and "wmf" for ops onboarding of Keith (T166587)
  • 20:09 gilles: Restarting nutcracker on mwdebug1001
  • 20:06 gilles: Overwriting nutcracker.yml on mwdebug1001 to point memcache cluster only to memcached on localhost
  • 20:05 gilles: Manually installed memcached on mwdebug1001, running on default port 11211
  • 20:04 gilles: Disabled puppet on mwdebug1001
  • 18:37 urandom: T160570: Upgrading dev env to Cassandra 3.11 (snapshot)
  • 17:55 thcipriani: branching 1.30.0-wmf.3 T165957
  • 17:28 arlolra: Updated Parsoid to d07dfe1a (T161151, T136653)
  • 17:17 arlolra@tin: Finished deploy [parsoid/deploy@744f719]: Updating Parsoid to d07dfe1a (duration: 08m 41s)
  • 17:09 arlolra@tin: Started deploy [parsoid/deploy@744f719]: Updating Parsoid to d07dfe1a
  • 16:40 moritzm: installing shadow regression update
  • 15:33 marostegui: Deploy alter table on s3.revision on labsdb1009 - T166278
  • 15:14 moritzm: installing bash security updates on trusty (jessie already fixed)
  • 15:03 moritzm: installing mysql-connector-java security update on analytics1031
  • 14:53 _joe_: failing citoid over to codfw, T165105
  • 14:48 moritzm: updating mw2140-mw2147, mw2251-mw2253 to HHVM 3.18
  • 14:27 _joe_: restarting squid on aluminium.
  • 13:58 moritzm: updating mw2240-mw2242, mw2254-mw2260 to HHVM 3.18
  • 13:47 aude@tin: Synchronized wmf-config/InitialiseSettings.php: Set wgPageImagesAPIDefaultLicense for wikidata (duration: 00m 41s)
  • 13:44 elukey: restart kafka on kafka1013 for jvm upgrades
  • 13:35 aude@tin: Synchronized wmf-config/Wikibase-production.php: Enable Wikibase echo notifications on Wikipedia, except enwiki, dewiki, frwiki T142102 (duration: 00m 42s)
  • 13:21 elukey: restart kafka on kafka1001 for jvm upgrades
  • 13:14 ema: upgrade prometheus-node-exporter to 0.14.0~git20170523-0 on ubuntu systems
  • 12:43 elukey: restart kafka on kafka200[123] for jvm upgrades (main-codfw, eventbus)
  • 12:10 moritzm: installing jbig2dec security updates
  • 12:07 elukey: restart kafka on kafka1012 for jvm upgrades
  • 12:01 moritzm: installing jbig2dec security updates
  • 11:48 marostegui: Rename update table on enwiki on db1089 host - T139342
  • 11:31 moritzm: installing fop security updates
  • 11:14 godog: upgrade grafana to 4.3.1 on krypton
  • 10:44 gilles: run refreshFileHeaders for group 0 wikis on Terbium
  • 10:32 akosiaris: enable calico IPv6 BGP peering for cr1-eqiad
  • 10:18 jynus: stopping and backing up db2048 in preparation for reimage
  • 09:50 ema: upgrade prometheus-node-exporter to 0.14.0~git20170523-0 on debian systems
  • 09:43 jynus: restarting db2055 for mariadb and kernel upgrade
  • 08:23 elukey: restart jmxtrans on all the kafka brokers (analytics+main-codfw/eqiad) for jvm upgrades
  • 08:17 elukey: restart kafka on kafka1018 for jvm upgrades
  • 07:38 gehel@puppetmaster1001: conftool action : set/pooled=yes; selector: name=wdqs1002.eqiad.wmnet
  • 07:38 gehel: wdqs1002 back in LVS - T166524
  • 07:09 marostegui: Deploy alter table on enwiki.revision on db1047 - T166452
  • 06:45 marostegui: Deploy alter table on s3 db1038 - T166278
  • 06:41 marostegui: Deploy alter table on s3 dbstore1002 - https://phabricator.wikimedia.org/T166278
  • 06:35 marostegui: Deploy alter table s4 - db1081 - https://phabricator.wikimedia.org/T166206
  • 06:35 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1084, depool db1081 - T166206 (duration: 00m 59s)
  • 06:23 marostegui: Deploy alter table on s3 dbstore2001 - T166278
  • 02:49 l10nupdate@tin: ResourceLoader cache refresh completed at Tue May 30 02:49:20 UTC 2017 (duration 6m 44s)
  • 02:42 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.2) (duration: 07m 54s)
  • 02:22 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.1) (duration: 08m 22s)

2017-05-29

  • 20:04 mobrovac@tin: Started restart [zotero/translation-server@50f216a]: Memory at 50%
  • 19:56 gehel: removing wdqs1002 from LVS pending investigation of T166524
  • 19:55 gehel@puppetmaster1001: conftool action : set/pooled=no; selector: name=wdqs1002.eqiad.wmnet
  • 18:57 gehel: restarting wdqs-updater on wdqs1002
  • 17:40 volans: re-enabled puppet on tegmen and re-enabled raid_handler T163998
  • 17:29 volans: disabled puppet on tegmen and disabled raid_handler temporarily T163998
  • 15:02 gehel: restarting wdqs-updater on wdqs1002
  • 14:33 moritzm: rebooting multatuli for systemd modules-load.d debugging
  • 14:24 godog: upgrade prometheus-hhvm-exporter to 0.3-1 in codfw/eqiad with less verbose logging - T158286
  • 14:15 gehel: reset remote for elasticsearch/plugins deployment - T163708
  • 14:14 marostegui: Stop MySQL labsdb1009 to take a backup - T153743
  • 14:04 gehel: starting upgrade to elasticsearch 5.3.2 on cirrus codfw cluster - T163708
  • 14:03 marostegui@tin: Synchronized wmf-config/db-codfw.php: Repool db2036 - T166278 (duration: 00m 41s)
  • 14:01 marostegui: Deploy alter table s3 on codfw master db2018 - T166278
  • 13:42 moritzm: updating gdb on mw* servers
  • 13:10 marostegui: Stop replication on db1070 to flush tables for export - T153743
  • 13:07 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1070 - T153743 (duration: 00m 41s)
  • 13:02 akosiaris: enable puppet across eqiad/esams after puppetmaster upgrade.
  • 12:52 akosiaris: disable puppet across eqiad/esams for puppetmaster upgrade. This should avoid any irc spam about failed puppet agent runs
  • 12:52 akosiaris: enable puppet across codfw/ulsfo after puppetmaster upgrade
  • 12:41 akosiaris: disable puppet across codfw/ulsfo for puppetmaster upgrade. This should avoid any irc spam about failed puppet agent runs
  • 12:36 moritzm: installing imagemagick security updates on jessie
  • 12:31 akosiaris: update kubernetes policy-options on cr{1,2}-{eqiad,codfw}. T165732
  • 10:39 moritzm: installing fop security updates
  • 10:18 ema: upgrade nginx to 1.11.10-1+wmf1 on hassium and hassaleh
  • 09:53 moritzm: upgrade remaining mw* hosts already running HHVM 3.18 to 3.18.2+dfsg-1+wmf4
  • 09:22 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1045 (duration: 00m 41s)
  • 09:01 marostegui: Drop gather tables from: testwiki, test2wiki, enwikivoyage, hewiki, enwiki - T166097
  • 08:02 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Remove db1023 - T166486 (duration: 00m 41s)
  • 08:02 marostegui@tin: Synchronized wmf-config/db-codfw.php: Remove db1023 - T166486 (duration: 00m 42s)
  • 07:38 marostegui: Stop MySQL on db1095 to take a backup - this will make labsdb1009,10 and 11 break replication while it is down - T153743
  • 07:01 _joe_: reenabling scap on mw2140, T166328
  • 06:45 _joe_: restarting changeprop on scb1002, using 15 gigs of RAM
  • 06:42 marostegui: Deploy alter table s3 - dbstore2002 - T166278
  • 06:41 marostegui: Deploy alter table s4 - dbstore1002 - T166206
  • 06:33 _joe_: trying to restart pdfrender on scb1002
  • 06:32 marostegui: Deploy alter table s3 - db2036 - T166278
  • 06:32 marostegui@tin: Synchronized wmf-config/db-codfw.php: Repool db2043, depool db2036 - T166278 (duration: 01m 44s)
  • 06:29 _joe_: powercycling mw1294
  • 06:11 marostegui: Deploy alter table on s4 db1084 - T166206
  • 06:10 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1091, depool db1084 - T166206 (duration: 02m 45s)
  • 06:01 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1091 - T166206 (duration: 03m 01s)
  • 05:54 marostegui: Restart MySQL on db1047 - T166452
  • 02:24 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.1) (duration: 08m 20s)

2017-05-28

  • 13:19 jynus: restart db1069:3313 mysql instance, stuck on replication
  • 02:24 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.1) (duration: 08m 46s)

2017-05-27

  • 02:51 l10nupdate@tin: ResourceLoader cache refresh completed at Sat May 27 02:51:13 UTC 2017 (duration 6m 49s)
  • 02:44 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.2) (duration: 07m 05s)
  • 02:25 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.1) (duration: 08m 41s)

2017-05-26

  • 14:29 marostegui: Stop pt-table-checksum on s1 - T162807
  • 14:04 marostegui: Deploy alter table on s3 revision table db2043 - T166278
  • 14:03 marostegui@tin: Synchronized wmf-config/db-codfw.php: Repool db2050, depool db2043 - T166278 (duration: 00m 41s)
  • 13:57 _joe_: consuming the backlog of htmlCacheUpdate jobs for enwiktionary
  • 13:19 gehel: restart wdqs-updater on all wdqs nodes - T166378
  • 12:55 marostegui: Deploy alter table s4 on db1097 - T166206
  • 12:44 elukey: Restart Hadoop daemons on analytics100[12] (Hadoop master nodes) for jvm upgrades
  • 12:43 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1097 - T166206 (duration: 00m 41s)
  • 10:56 gehel: restart wdqs-updater on all wdqs nodes - T166378
  • 09:30 volans: slowly testing if puppet stringify_facts=false is a noop across the fleet T166372
  • 08:45 volans: killed daemonized puppet on tegmen, lvs1006 T166203
  • 06:38 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1097 - T166206 (duration: 00m 40s)
  • 06:10 marostegui: Deploy alter table on s3 - db2050 - T166278
  • 06:09 marostegui@tin: Synchronized wmf-config/db-codfw.php: Repool db2057, depool db2050 - T166278 (duration: 00m 56s)
  • 06:05 marostegui: Resume pt-table-checksum on s1 - T162807
  • 02:58 l10nupdate@tin: ResourceLoader cache refresh completed at Fri May 26 02:58:48 UTC 2017 (duration 6m 37s)
  • 02:52 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.2) (duration: 07m 59s)
  • 02:25 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.1) (duration: 08m 31s)
  • 01:33 bsitzmann@tin: Finished deploy [mobileapps/deploy@a8d0c91]: Update mobileapps to db6493c (duration: 03m 45s)
  • 01:29 bsitzmann@tin: Started deploy [mobileapps/deploy@a8d0c91]: Update mobileapps to db6493c
  • 00:16 thcipriani@tin: Finished scap: SWAT: Fix version of DonationInterface deployed to donatewiki T166302 (duration: 19m 44s)

2017-05-25

  • 23:56 thcipriani@tin: Started scap: SWAT: Fix version of DonationInterface deployed to donatewiki T166302
  • 23:44 thcipriani@tin: Synchronized php-1.30.0-wmf.2/resources/src/jquery/jquery.makeCollapsible.js: SWAT: jquery.makeCollapsible: Restore considering empty <a> as part of toggle T166298 (duration: 00m 42s)
  • 23:20 thcipriani@tin: Synchronized wmf-config/CommonSettings.php: SWAT: Revert "Revert "Add Code of Conduct footer links to wikitech and mw.o"" PART II (duration: 00m 41s)
  • 23:19 thcipriani@tin: Synchronized wmf-config/InitialiseSettings.php: SWAT: Revert "Revert "Add Code of Conduct footer links to wikitech and mw.o"" PART I (duration: 00m 43s)
  • 22:58 thcipriani: mw1170 wikipedias back to 1.30.0-wmf.1
  • 22:26 thcipriani: mw1170 running wmf.2 for all wikis for troubleshooting T166345
  • 22:24 thcipriani: mw1161 wikipedias back to running wmf.1
  • 22:20 thcipriani: mw1161 running wmf.2 for all wikis for troubleshooting T166345
  • 22:17 papaul: ores200[1-9] - signing puppet certs, salt-key, initial run
  • 21:43 papaul: OS install on ores200[1-9]
  • 21:31 arlolra: Updated Parsoid to 5b52d07b (T166068)
  • 21:25 arlolra@tin: Finished deploy [parsoid/deploy@4a2c3f4]: Updating Parsoid to 5b52d07b (duration: 07m 43s)
  • 21:18 arlolra@tin: Started deploy [parsoid/deploy@4a2c3f4]: Updating Parsoid to 5b52d07b
  • 20:30 urandom: T164865: RESTBase dev, disable revision range deletes
  • 20:25 thcipriani@tin: rebuilt wikiversions.php and synchronized wikiversions files: wikipedias back to 1.30.0-wmf.1
  • 19:48 chasemp: restart redises on rdb2003
  • 19:44 thcipriani@tin: Synchronized wmf-config/InitialiseSettings.php: revert SWAT: Add Code of Conduct footer links to wikitech and mw.o Part II (duration: 00m 38s)
  • 19:43 thcipriani@tin: Synchronized wmf-config/CommonSettings.php: revert SWAT: Add Code of Conduct footer links to wikitech and mw.o Part I (duration: 00m 39s)
  • 19:23 thcipriani@tin: Synchronized wmf-config/CommonSettings.php: SWAT: Add Code of Conduct footer links to wikitech and mw.o Part II (duration: 00m 39s)
  • 19:22 thcipriani@tin: Synchronized wmf-config/InitialiseSettings.php: SWAT: Add Code of Conduct footer links to wikitech and mw.o Part I (duration: 00m 39s)
  • 19:09 thcipriani@tin: rebuilt wikiversions.php and synchronized wikiversions files: all wikis to 1.30.0-wmf.2
  • 18:47 volans: completed upgrade of facter across the fleet T166203 (apart from a few hosts that are down)
  • 18:39 volans: forcing BBU learn on db1016
  • 18:34 thcipriani@tin: Synchronized wmf-config/CommonSettings.php: SWAT: Remove special Math extension settings for hewiki Remove UseMathJax from CommonSettings.php T165475 (duration: 00m 43s)
  • 18:27 urandom: T164865: RESTBase dev, re-enable render range deletes
  • 18:12 thcipriani: mwscript namespaceDupes.php hewiki --fix
  • 18:12 thcipriani@tin: Synchronized wmf-config/InitialiseSettings.php: SWAT: Add namespace aliases for Hebrew Wikipedia T164858 (duration: 00m 47s)
  • 17:51 volans@sarin: conftool action : set/pooled=inactive; selector: name=mw2140.codfw.wmnet
  • 17:31 jynus@neodymium: conftool action : set/pooled=no; selector: name=mw2140.codfw.wmnet
  • 17:30 jynus@tin: Synchronized wmf-config/db-codfw.php: Repool db2055 after maintenance, 2nd try (duration: 02m 42s)
  • 17:25 jynus@neodymium: conftool action : set/pooled=inactive; selector: name=mw2140.codfw.wmnet
  • 17:14 bsitzmann@tin: Finished deploy [mobileapps/deploy@614d752]: Update mobileapps to 946fe1f (duration: 04m 04s)
  • 17:12 jynus: powercycling mw2140
  • 17:10 bsitzmann@tin: Started deploy [mobileapps/deploy@614d752]: Update mobileapps to 946fe1f
  • 17:07 jynus@tin: Synchronized wmf-config/db-codfw.php: Repool db2055 after maintenance (duration: 02m 43s)
  • 16:27 urandom: T164865: RESTBase dev, re-enable revision range deletes
  • 15:43 godog: delete thumbnails with > 2000px for wikivoyage / wikiversity / wikisource / wikiquote - T162796
  • 15:28 jynus: restarting and upgrading db2055 for kernel downgrade
  • 14:40 bblack: restart cp1074 backend (mailbox)
  • 14:08 godog: shut ms-be1021 for BBU replacement - T163777
  • 13:39 jynus: restarting and upgrading db2055 for maintenance
  • 13:29 jynus@tin: Synchronized wmf-config/db-codfw.php: Depool db2055 for maintenance (duration: 00m 41s)
  • 13:22 jynus@tin: Synchronized wmf-config/db-eqiad.php: Repool db1077 back to high load after maintenance (duration: 00m 41s)
  • 13:04 elukey: restart cassandra-a on aqs1004 to test https://gerrit.wikimedia.org/r/354107
  • 12:41 akosiaris: cordon kubernetes100{2,3,4} for testing calico-node on kubernetes1001
  • 10:01 elukey: restart HDFS datanode daemons on all the hadoop worker nodes for jvm upgrades
  • 09:39 elukey: reimage analytics1030 to Debian Jessie - T165529
  • 09:35 elukey: restart Yarn nodemanager daemons on all the hadoop worker nodes for jvm upgrades
  • 09:28 godog: ban commons object on request in ulsfo
  • 09:07 jynus@tin: Synchronized wmf-config/db-eqiad.php: Repool db1077 after maintenance with low weight (duration: 00m 41s)
  • 08:25 jynus: stopping and restarting db1077
  • 08:03 volans: resuming slow upgrade of facter across the fleet, checking that it is a noop T166203
  • 07:58 jynus@tin: Synchronized wmf-config/db-eqiad.php: Depool db1077 for maintenance and upgrade (duration: 00m 41s)
  • 07:40 marostegui@tin: Synchronized wmf-config/db-codfw.php: Depool db2057 - T166278 (duration: 00m 41s)
  • 07:28 godog: roll-restart jessie ms-be2* for linux 4.9 update - T162029
  • 06:21 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1097 - T166206 (duration: 00m 55s)
  • 05:58 marostegui: Start pt-table-checksum on s1 - T162807
  • 02:51 l10nupdate@tin: ResourceLoader cache refresh completed at Thu May 25 02:51:53 UTC 2017 (duration 6m 38s)
  • 02:45 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.2) (duration: 07m 18s)
  • 02:26 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.1) (duration: 09m 46s)
  • 00:18 aaron@tin: Synchronized wmf-config/ProductionServices.php: Enable HTTPs for Swift usage (duration: 00m 41s)
  • 00:15 aaron@tin: Synchronized wmf-config/filebackend.php: Enable HTTPs for Swift usage (duration: 00m 41s)
  • 00:10 twentyafterfour: phabricator upgrade complete, service is online
  • 00:06 twentyafterfour: upgrading phabricator, expect momentary downtime

2017-05-24

  • 23:53 ejegg: updated payments-wiki from 5fa4a70 to 4786e7c
  • 23:49 XenoRyet: updated civicrm from 9b7a74c to 9c06bd2
  • 23:28 catrope@tin: Synchronized wmf-config/InitialiseSettings.php: Allow page images outside the lead on Wikivoyage wikis (T166251) (duration: 00m 41s)
  • 23:19 catrope@tin: Synchronized wmf-config/InitialiseSettings.php: Enable related pages for everyone (T155079) (duration: 00m 42s)
  • 23:18 catrope@tin: Synchronized wmf-config/InitialiseSettings.php: Enable print styles in Minerva (T163287) (duration: 00m 42s)
  • 23:10 catrope@tin: Synchronized multiversion/MWMultiVersion.php: Allow absolute script path for getMediaWikiCli() (duration: 00m 44s)
  • 22:33 krinkle@tin: Synchronized php-1.30.0-wmf.2/extensions/wikihiero: Fix styles queue warning - T92459 (duration: 00m 42s)
  • 22:02 mutante: terbium: dbtree: git stash and git pull origin to fix unclean repo state, deploy fix to syntax error
  • 21:53 urandom: T164865: Disabling range delete-based render culling, dev env
  • 21:34 Dereckson: Run fixProofreadIndexPagesContentModel.php new version (with Gerrit:355534 fix) to every wikisource
  • 21:10 Dereckson: Fixed wikisource Index: content model for ta.wikisource, en.wikisource and non-wikisource databases (frrwiki + test2 + sourceswiki)
  • 21:10 demon@tin: Synchronized php-1.30.0-wmf.2/extensions/ProofreadPage/maintenance/fixProofreadIndexPagesContentModel.php: Now with proper batch support (duration: 00m 41s)
  • 20:38 demon@tin: Synchronized scap/plugins/clean.py: cleanups (duration: 00m 41s)
  • 20:29 Dereckson: Run fixProofreadIndexPagesContentModel on vec.wikisource (requested by Tpt), aborted after 50k (as that's greater than the expected number of rows)
  • 20:08 ejegg: reverted payments-wiki to 5fa4a70
  • 20:04 ejegg: updated payments-wiki from 5fa4a70 to 4786e7c
  • 19:25 thcipriani@tin: rebuilt wikiversions.php and synchronized wikiversions files: group1 wikis to 1.30.0-wmf.2
  • 19:20 otto@tin: Finished deploy [eventlogging/analytics@c90a609]: (no justification provided) (duration: 00m 01s)
  • 19:20 otto@tin: Started deploy [eventlogging/analytics@c90a609]: (no justification provided)
  • 19:14 otto@tin: Finished deploy [eventlogging/analytics@c90a609]: (no justification provided) (duration: 00m 02s)
  • 19:14 otto@tin: Started deploy [eventlogging/analytics@c90a609]: (no justification provided)
  • 19:12 otto@tin: Finished deploy [eventlogging/analytics@c90a609]: (no justification provided) (duration: 00m 01s)
  • 19:12 otto@tin: Started deploy [eventlogging/analytics@c90a609]: (no justification provided)
  • 19:12 otto@tin: Finished deploy [eventlogging/analytics@c90a609]: (no justification provided) (duration: 00m 02s)
  • 19:12 otto@tin: Started deploy [eventlogging/analytics@c90a609]: (no justification provided)
  • 19:12 demon@tin: Synchronized wmf-config/: Dropping old ExtensionMessages (duration: 00m 42s)
  • 19:11 otto@tin: Finished deploy [eventlogging/analytics@c90a609]: (no justification provided) (duration: 00m 02s)
  • 19:11 otto@tin: Started deploy [eventlogging/analytics@c90a609]: (no justification provided)
  • 19:11 otto@tin: Finished deploy [eventlogging/analytics@c90a609]: (no justification provided) (duration: 00m 02s)
  • 19:11 otto@tin: Started deploy [eventlogging/analytics@c90a609]: (no justification provided)
  • 19:07 demon@tin: Synchronized wmf-config/: Dropping old contribution-tracking-setup.php -- finally (duration: 00m 42s)
  • 19:03 demon@tin: Synchronized wmf-config/CommonSettings.php: Dropping old ContribTracking config (duration: 00m 41s)
  • 19:02 demon@tin: Synchronized .gitignore: Completeness (duration: 00m 41s)
  • 19:00 thcipriani@tin: Finished scap: SWAT: Use file width/height instead of metadata for getContentHeaders Batch/pipeline backend operations in refreshFileHeaders T150741 (duration: 03m 12s)
  • 18:57 thcipriani@tin: Started scap: SWAT: Use file width/height instead of metadata for getContentHeaders Batch/pipeline backend operations in refreshFileHeaders T150741
  • 18:56 thcipriani@tin: Synchronized php-1.30.0-wmf.2/extensions/TimedMediaHandler/handlers: SWAT: Make getContentHeaders rely on fallback width/height T150741 (duration: 00m 41s)
  • 18:55 thcipriani@tin: Synchronized php-1.30.0-wmf.2/extensions/PagedTiffHandler/PagedTiffHandler_body.php: SWAT: Update getContentHeaders signature T150741 (duration: 00m 42s)
  • 18:54 thcipriani@tin: Synchronized php-1.30.0-wmf.2/extensions/PdfHandler/PdfHandler_body.php: SWAT: Update getContentHeaders signature T150741 (duration: 00m 40s)
  • 18:31 thcipriani@tin: Synchronized wmf-config/InitialiseSettings.php: SWAT: mobileFrontend: Move first paragraph before infobox T150325 (duration: 00m 41s)
  • 18:18 thcipriani: running mwscript namespaceDupes.php trwiki --fix
  • 18:17 thcipriani@tin: Synchronized wmf-config/InitialiseSettings.php: SWAT: Create a new namespace "Vikiproje" for trwiki T166102 (duration: 00m 41s)
  • 18:10 thcipriani@tin: Synchronized wmf-config/InitialiseSettings.php: SWAT: Set $wgUploadNavigationUrl on srwiki T165901 (duration: 00m 42s)
  • 18:00 urandom: T164865: Upgrading Cassandra from 3.7.3-instaclustr to 3.10
  • 17:45 ottomata: rolling druid back to 0.9.0
  • 16:58 moritzm: installing ghostscript regression update on trusty (jessie security update was not affected)
  • 16:56 jynus: restarting and upgrading db2047
  • 16:54 volans: pause slowly upgrading facter across the fleet, resuming tomorrow T166203
  • 16:37 marostegui: Stop pt-table-checksum on s1 - T162807
  • 16:26 bblack: restarting varnish backend on cp1099 (mailbox lag)
  • 15:45 godog: test-upgrade grafana 4.3.1 on labmon1001
  • 15:35 krinkle@tin: Synchronized php-1.30.0-wmf.2/resources/Resources.php: Restore mediawiki.page.watch.ajax dependency - Iebfda85c7 (duration: 00m 42s)
  • 15:00 godog: deploy thumbor 0.1.39 for memcache-based throttling - T151065
  • 14:54 moritzm: uploaded gerrit 2.13.8+wmf2 to apt.wikimedia.org
  • 14:04 moritzm: installing jasper security updates on trusty (jessie already fixed)
  • 13:59 marostegui: Start running pt-table-checksum on s1 (will not run over night for now) - T162807
  • 13:59 paravoid: cr2-esams: enabling netflows experimentally
  • 13:54 elukey: upgrade Druid daemons on druid100[123] to 0.10 - T164008
  • 13:28 volans: slowly upgrading facter across the fleet, checking that it is a noop T166203
  • 13:14 godog: upload prometheus-hhvm-exporter 0.3-1 to jessie-wikimedia - T158286
  • 12:20 moritzm: upgrade application servers using HHVM 3.18 to the latest 3.18.2+wmf4 build
  • 12:09 moritzm: updating puppet on puppetmaster2002
  • 12:08 godog: bounce pybal on lvs1003 - T134893
  • 11:52 XioNoX: progressively adding "remove-private" to ix4/6 and transit4/6 bgp groups on cr2-esams T83037
  • 11:36 moritzm: uploaded puppet_3.8.5-2~bpo8+2 to apt.wikimedia.org
  • 10:50 akosiaris: repool esams T133387
  • 10:46 volans: stopped temporarily ircecho to avoid alert spam
  • 10:43 ema: upgrade prometheus-node-exporter on lvs hosts to 0.14.0~git20170523-0 T160156
  • 10:43 ema: upgrade prometheus-node-exporter on cache hosts to 0.14.0~git20170523-0 T160156
  • 10:05 volans: forcing puppet run on failed hosts only in esams T133387
  • 09:59 XioNoX: asw-esams back up (T133387)
  • 09:53 XioNoX: rebooting asw-esams for upgrade (T133387)
  • 09:49 ema: upgrade prometheus-node-exporter on cache hosts to 0.14.0~git20170523-0 T147569
  • 09:26 godog: upload prometheus-node-exporter 0.14.0~git20170523-0 to jessie-wikimedia - T160156
  • 09:15 akosiaris@puppetmaster1001: conftool action : set/pooled=yes; selector: cp3036.esams.wmnet
  • 09:10 akosiaris: drain esams for network tests for T133387
  • 08:52 marostegui: Deploy alter table on codfw master (db2019 and let it replicate) on s4 - T166206
  • 08:51 joal@tin: Finished deploy [analytics/refinery@9377d9c]: Deploying to fix yesterday's deploy bugs (duration: 02m 44s)
  • 08:49 akosiaris: depool cp3036 for T133387 testing
  • 08:49 akosiaris@puppetmaster1001: conftool action : set/pooled=no; selector: cp3036.esams.wmnet
  • 08:48 joal@tin: Started deploy [analytics/refinery@9377d9c]: Deploying to fix yesterday's deploy bugs
  • 07:29 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1094 - T164530 (duration: 00m 41s)
  • 07:17 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1086, depool db1094 - T164530 (duration: 00m 41s)
  • 07:03 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1079, depool db1086 - T164530 (duration: 00m 42s)
  • 06:56 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1079 - T164530 (duration: 00m 54s)
  • 06:34 marostegui: Deploy alter table on s2.fawiki directly on codfw master (db2029) after running the clean up duplicates script - https://phabricator.wikimedia.org/T164530
  • 06:04 marostegui: Run pt-table-checksum on s7.frwiktionary - https://phabricator.wikimedia.org/T163190
  • 06:02 marostegui: Deploy alter table on s2 db1047 - https://phabricator.wikimedia.org/T162611
  • 03:04 l10nupdate@tin: ResourceLoader cache refresh completed at Wed May 24 03:04:03 UTC 2017 (duration 6m 45s)
  • 02:57 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.2) (duration: 13m 38s)
  • 02:24 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.1) (duration: 07m 30s)

2017-05-23

  • 21:00 mepps: deployed payment wiki 0c06f8e
  • 20:24 bblack: enable BBR for all caches - T147569
  • 20:20 bblack: enable BBR for all caches @ codfw - T147569
  • 20:10 bblack: enable BBR for all caches @ ulsfo - T147569
  • 20:06 bblack: disabling puppet on all caches for BBR deploy control
  • 19:52 thcipriani@tin: rebuilt wikiversions.php and synchronized wikiversions files: group0 to 1.30.0-wmf.2
  • 19:34 thcipriani@tin: Finished scap: testwiki to php-1.30.0-wmf.2 and rebuild l10n cache (duration: 27m 52s)
  • 19:07 bblack: resetting cp1074 queues again: "fq flow_limit 200 buckets 10240"
  • 19:06 thcipriani@tin: Started scap: testwiki to php-1.30.0-wmf.2 and rebuild l10n cache
  • 18:43 bblack: resetting cp1074 queues again: "fq flow_limit 200 buckets 4096"
  • 17:40 bblack: fq on cp1074 reset to flow_limit 200 (resets counters)
  • 17:24 ladsgroup@tin: Finished deploy [ores/deploy@4874809]: Trying again with deploying ores (duration: 21m 30s)
  • 17:09 thcipriani: starting branch cut for 1.30.0-wmf.2 T163512
  • 17:03 ladsgroup@tin: Started deploy [ores/deploy@4874809]: Trying again with deploying ores
  • 16:50 volans: upgrading facter on mw[2250-2259] as a test batch
  • 16:49 bblack: BBR: enabling bbr on cp1074 - T147569
  • 16:43 bblack: BBR: enabling mq+fq on cp1074 - T147569
  • 16:26 bblack: puppet re-enables on caches
  • 16:24 demon@tin: Synchronized README: testing (duration: 00m 38s)
  • 16:17 bblack: disabled puppet on all cp* for RPS-related deployments (just in case!)
  • 16:16 bblack: disabled puppet on all lvs* for RPS-related deployments
  • 16:15 ema: cp1074: enable prometheus node_exporter qdisc collector T147569
  • 15:50 marostegui: Stop replication on dbstore1002 s7 thread for maintenance - T163190
  • 15:23 volans: re-enabled raid_handler and puppet on tegmen
  • 15:02 otto@tin: Finished deploy [eventlogging/analytics@UNKNOWN]: (no justification provided) (duration: 00m 02s)
  • 15:01 otto@tin: Started deploy [eventlogging/analytics@UNKNOWN]: (no justification provided)
  • 14:56 otto@tin: Finished deploy [eventlogging/analytics@25f8096]: (no justification provided) (duration: 00m 04s)
  • 14:56 otto@tin: Started deploy [eventlogging/analytics@25f8096]: (no justification provided)
  • 14:42 volans: temporarily disabled raid_handler and puppet on tegmen
  • 14:25 jynus: deploying new check_raid monitoring write policy for megacli T166108
  • 14:21 Dereckson: EU SWAT done
  • 14:21 dereckson@tin: Synchronized wmf-config/InitialiseSettings.php: Enable NewUserMessage on dty.wikipedia (T166121) (duration: 00m 38s)
  • 14:09 XioNoX: re-enabling BGP session to Init7 - T165288
  • 14:03 moritzm: installing nutcracker update in codfw (T163795)
  • 13:37 marostegui: Run CleanDuplicateScores script to clean up possible duplicates on fawiki before starting to create the UNIQUE keys - https://phabricator.wikimedia.org/T164530
  • 13:23 dereckson@tin: Synchronized wmf-config/InitialiseSettings.php: Add *.esa.int to CopyUploadsDomains (T164643) (duration: 00m 39s)
  • 12:47 elukey@tin: Finished deploy [analytics/refinery@679aeea]: Updated stat1002 with the last refinery deployment (duration: 00m 42s)
  • 12:46 elukey@tin: Started deploy [analytics/refinery@679aeea]: Updated stat1002 with the last refinery deployment
  • 12:46 elukey@tin: Finished deploy [analytics/refinery@679aeea]: (no justification provided) (duration: 00m 01s)
  • 12:45 elukey@tin: Started deploy [analytics/refinery@679aeea]: (no justification provided)
  • 12:39 joal@tin: Finished deploy [analytics/refinery@679aeea]: Weekly deploy (2 weeks late, big deploy)-2 (duration: 01m 35s)
  • 12:38 joal@tin: Started deploy [analytics/refinery@679aeea]: Weekly deploy (2 weeks late, big deploy)-2
  • 12:24 joal@tin: Finished deploy [analytics/refinery@679aeea]: Weekly deploy (with 2 weeks late, big deploy) (duration: 04m 24s)
  • 12:20 moritzm: upgrading mw1261-mw1265 to hhvm 3.18.2+dfsg-1+wmf4
  • 12:20 joal@tin: Started deploy [analytics/refinery@679aeea]: Weekly deploy (with 2 weeks late, big deploy)
  • 12:13 joal@tin: Finished deploy [analytics/refinery@222d0c0]: (no justification provided) (duration: 03m 56s)
  • 12:09 joal@tin: Started deploy [analytics/refinery@222d0c0]: (no justification provided)
  • 12:09 moritzm: uploaded hhvm 3.18.2+dfsg-1+wmf4 to apt.wikimedia.org (contains extended upstream fix for XML reader crash) (T162586)
  • 11:56 elukey: set vm.dirty_background_bytes=25165824 on aqs1004 as part of testing for https://gerrit.wikimedia.org/r/#/c/354107 (Rollback: set vm.dirty_background_ratio=10)
  • 11:51 _joe_: uploaded calico-cni 1.8.3-1~wmf1 to jessie-wikimedia
  • 11:51 _joe_: uploaded calicoctl 1.2.0-1~wmf1 to jessie-wikimedia
  • 11:44 _joe_: pushed calico/node:1.2.0 to the docker registry
  • 11:42 _joe_: pushed calico/kube-policy-controller:0.6.0 to the docker registry
  • 11:19 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1093 - T164530 (duration: 00m 38s)
  • 11:07 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1088, depool db1093 - T164530 (duration: 00m 38s)
  • 10:54 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1085, depool db1088 - T164530 (duration: 00m 38s)
  • 10:43 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1085 - T164530 (duration: 00m 38s)
  • 10:27 godog: upload kafkatee 0.1.5 to jessie-wikimedia, remove unused kafkatee 0.1.4 from trusty-wikimedia - T149451
  • 10:14 marostegui: Run pt-table-checksum on s7.frwiktionary - T165743
  • 09:56 moritzm: restarting cassandra on restbase1013, restbase1014, restbase1015, restbase1017 to pick up Java security updates
  • 09:49 godog: swift eqiad-prod: ms-be1028/ms-be1039 object weight 3500 - T160640
  • 09:46 addshore: addshore@terbium:~$ ~/mymwscriptwikiset extensions/Cognate/maintenance/purgeDeletedCognatePages.php et+wiktionary.dblist --batch-size=1000 >> ~/purge.201705161230.log T164407
  • 09:24 moritzm: restarting cassandra on restbase1007, restbase1009, restbase1012 to pick up Java security updates
  • 09:16 hashar: Restarting Jenkins on contint1001
  • 09:15 elukey: reverted manual hack on mw1161 with scap pull
  • 08:15 elukey: apply manually https://gerrit.wikimedia.org/r/#/c/351854/2/wmf-config/jobqueue.php (persistent connections between hhvm and redis) to mw1161 as production test
  • 08:13 marostegui: Force WB as a default policy on db1031 because of degraded BBU
  • 08:00 addshore: the last script I started is now stopped
  • 07:48 addshore: addshore@terbium:~$ ~/mymwscriptwikiset extensions/Cognate/maintenance/purgeDeletedCognatePages.php et+wiktionary.dblist --batch-size=1000 >> ~/purge.201705161230.log T164407
  • 07:25 moritzm: installing openjdk security updates on maps and wdqs clusters
  • 07:13 marostegui: Deploy schema change on ruwiki.ores_classification directly on codfw master (db2028) - T164530
  • 07:07 marostegui: Rename gather_list gather_list_flag gather_list_item on db1078 db1094 and db1089 - T166097
  • 06:29 marostegui: Deploy alter table on s7.frwiktionary db2040 and db1034 - T165743
  • 06:20 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1021 - T162611 (duration: 00m 38s)
  • 06:20 marostegui: Deploy alter table on s2 eqiad master db1054 - T162611
  • 02:29 l10nupdate@tin: ResourceLoader cache refresh completed at Tue May 23 02:29:17 UTC 2017 (duration 6m 16s)
  • 02:23 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.1) (duration: 07m 25s)

2017-05-22

  • 23:51 aaron@tin: Synchronized wmf-config: Move swift auth URL to ProductionServices (duration: 00m 52s)
  • 23:49 aaron@tin: Synchronized static/images/project-logos/hywiki-2x.png: Fix hy.wikipedia high resolution logos (duration: 00m 38s)
  • 23:48 aaron@tin: Synchronized static/images/project-logos/hywiki-1.5x.png: Fix hy.wikipedia high resolution logos (duration: 00m 38s)
  • 23:34 demon@tin: Synchronized wmf-config/ProductionServices.php: I4b19b4 (duration: 00m 38s)
  • 23:33 demon@tin: Synchronized wmf-config/filebackend.php: I4b19b4 (duration: 00m 38s)
  • 23:20 aaron@tin: Synchronized wmf-config/filebackend.php: Move swift auth URL to ProductionServices (duration: 00m 38s)
  • 23:19 aaron@tin: Synchronized wmf-config/ProductionServices.php: Move swift auth URL to ProductionServices (duration: 00m 38s)
  • 23:15 aaron@tin: Synchronized wmf-config/logging.php: Include DB shard in production SPI log entries (duration: 00m 38s)
  • 21:11 bblack: BBR: cp1065: reverted back to cubic+pfifo_fast - T147569
  • 21:10 bblack: BBR: cp1074: reverted back to cubic+pfifo_fast - T147569
  • 20:56 ladsgroup@tin: Finished deploy [ores/deploy@4874809]: Second deploy of ores for enabling frwiki damaging (duration: 05m 23s)
  • 20:50 ladsgroup@tin: Started deploy [ores/deploy@4874809]: Second deploy of ores for enabling frwiki damaging
  • 20:46 arlolra: Updated Parsoid to ebac1890 (T165139)
  • 20:43 ladsgroup@tin: Finished deploy [ores/deploy@263255a]: (no justification provided) (duration: 29m 07s)
  • 20:40 arlolra@tin: Finished deploy [parsoid/deploy@a9f2229]: Updating Parsoid to ebac1890 (duration: 07m 54s)
  • 20:32 arlolra@tin: Started deploy [parsoid/deploy@a9f2229]: Updating Parsoid to ebac1890
  • 20:14 ladsgroup@tin: Started deploy [ores/deploy@263255a]: (no justification provided)
  • 20:14 Amir1: starting deploy of ores:68cca85 to prod
  • 19:30 bblack: BBR: cp1074: switching congestion control to bbr manually - T147569
  • 19:29 bblack: BBR: cp1074: switching qdisc to mq+fq manually - T147569
  • 19:25 bblack: BBR: cp1065: switching congestion control to bbr manually - T147569
  • 19:16 bblack: BBR: cp1065: switching qdisc to mq+fq manually - T147569
  • 18:57 demon@tin: Synchronized README: forcing co-master sync (duration: 00m 42s)
  • 18:56 demon@tin: Pruned MediaWiki: 1.29.0-wmf.20 (duration: 01m 21s)
  • 18:22 ejegg: updated payments-wiki from 3b84521 to 5fa4a70
  • 18:18 bblack: rebooting acamar
  • 18:06 mepps: updated thank you send drush command
  • 18:01 mepps: updated civicrm 9b7a74c
  • 18:00 mepps: updated process control for new thank you send drush command 7c9572b
  • 17:49 ejegg: turned off paypal audit parser
  • 16:06 akosiaris: re-enable notifications in icinga
  • 15:27 _joe_: restarted puppetmasters in codfw
  • 13:23 dcausse@tin: Synchronized wmf-config/InitialiseSettings.php: Use wikitech db group instead of labswiki+ labtestwiki (duration: 00m 39s)
  • 13:22 akosiaris: silence icinga
  • 13:17 dcausse@tin: Synchronized wmf-config/CommonSettings.php: Enable TimedMediaHandler's new video player Beta Feature in Labs (duration: 00m 43s)
  • 13:02 _joe_: restarted etcdmirror on conf1002, consequence of https://gerrit.wikimedia.org/r/354095
  • 09:59 moritzm: repooled mw2221 (was down for hardware error)
  • 09:37 marostegui: Deploy alter table s7.frwiktionary on db1039 - https://phabricator.wikimedia.org/T165743
  • 09:15 marostegui: Drop table MediaWikiInstallPingback_15732959 from db1046, db1047 and dbstore1002 - T165836
  • 09:09 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1026, depool db1045 - T164530 (duration: 00m 39s)
  • 08:55 marostegui: Restart mysql on db1069 to apply new replication filters - T165977
  • 08:50 marostegui: Restart mysql on db1095 to apply new replication filters - T165977
  • 08:46 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1026 - T164530 (duration: 00m 38s)
  • 08:02 marostegui: Deploy alter table on s2 (revision table) db1021 - https://phabricator.wikimedia.org/T162611
  • 08:02 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1021 - T162611 (duration: 00m 38s)
  • 07:56 marostegui@tin: Synchronized wmf-config/db-codfw.php: Repool db2035 - T162611 (duration: 00m 38s)
  • 07:56 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db2035 - T162611 (duration: 00m 38s)
  • 07:54 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1036 - T162611 (duration: 00m 39s)
  • 07:22 moritzm: installing openjdk-7 security updates on jessie
  • 07:14 marostegui: Deploy alter table on s5 wikidatawiki.ores_classification directly on codfw master - T164530
  • 07:07 marostegui: Run CleanDuplicateScores script to clean up possible duplicates on wikidatawiki before starting to create the UNIQUE keys - T164530
  • 06:56 marostegui: Deploy alter table s7.frwiktionary on dbstore1001 - https://phabricator.wikimedia.org/T165743
  • 06:53 marostegui: Deploy alter table s7.frwiktionary on db2029 (codfw master) - https://phabricator.wikimedia.org/T165743
  • 06:47 marostegui: Deploy alter table on db2035 and db1036 for s2. bgwiktionary,eowiki, idwiki - T162611
  • 06:47 marostegui@tin: Synchronized wmf-config/db-codfw.php: Depool db2035 - T162611 (duration: 00m 38s)
  • 06:37 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1036 - T162611 (duration: 00m 39s)
  • 06:02 smalyshev@tin: Finished deploy [wdqs/wdqs@e4301da]: Redeploy GUI due to breakage in T165228 (duration: 01m 50s)
  • 06:00 smalyshev@tin: Started deploy [wdqs/wdqs@e4301da]: Redeploy GUI due to breakage in T165228
  • 02:26 l10nupdate@tin: ResourceLoader cache refresh completed at Mon May 22 02:26:59 UTC 2017 (duration 6m 0s)
  • 02:20 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.1) (duration: 07m 18s)

2017-05-21

  • 09:42 Reedy: force ran puppet on deployment-tin to pickup dbname in wmf-beta-update-database.py
  • 09:07 smalyshev@tin: Finished deploy [wdqs/wdqs@227ab25]: Redeploy GUI due to breakage in T165228 (duration: 00m 19s)
  • 09:06 smalyshev@tin: Started deploy [wdqs/wdqs@227ab25]: Redeploy GUI due to breakage in T165228
  • 02:27 l10nupdate@tin: ResourceLoader cache refresh completed at Sun May 21 02:27:46 UTC 2017 (duration 6m 3s)
  • 02:21 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.1) (duration: 07m 30s)

2017-05-20

  • 21:54 Dereckson: Run namespaceDupe on fr.wikisource and en.wikisource
  • 17:29 addshore: addshore@terbium:/srv/mediawiki/php-1.30.0-wmf.1$ mwscriptwikiset extensions/Cognate/maintenance/purgeDeletedCognatePages.php wiktionary.dblist --batch-size=1000 >> ~/purge.201705161230.log T164407
  • 17:29 addshore: addshore@terbium:/srv/mediawiki/php-1.30.0-wmf.1$ mwscriptwikiset extensions/Cognate/maintenance/purgeDeletedCognatePages.php wiktionary.dblist --batch-size=1000 >> ~/purge.201705161230.log
  • 09:08 thcipriani: restarting jenkins on contint1001
  • 08:24 smalyshev@tin: Finished deploy [wdqs/wdqs@227ab25]: Whitelist update (duration: 02m 32s)
  • 08:22 smalyshev@tin: Started deploy [wdqs/wdqs@227ab25]: Whitelist update
  • 07:52 gehel: restart wdqs-updater on all wdqs clusters (stuck on too large update)
  • 02:29 l10nupdate@tin: ResourceLoader cache refresh completed at Sat May 20 02:29:14 UTC 2017 (duration 6m 13s)
  • 02:23 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.1) (duration: 07m 17s)

2017-05-19

  • 16:44 reedy@tin: Synchronized wmf-config/throttle.php: Wikimedia Vienna Hackathon (duration: 00m 39s)
  • 15:40 mutante: planet10001 - manually deleting cron job for deleted sr.planet (should puppetize the "absence" too)
  • 13:58 marostegui@tin: Synchronized wmf-config/db-codfw.php: Repool db2047 - T165743 (duration: 00m 38s)
  • 13:47 marostegui: Deploy alter table s7.frwiktionary db1033 - T165743
  • 13:31 marostegui@tin: Synchronized wmf-config/db-codfw.php: Depool db2047 - T165743 (duration: 00m 39s)
  • 13:09 moritzm: downgraded mw1161 to HHVM 3.12 (crashes often compared to app servers, downgrade over the weekend)
  • 12:58 marostegui@tin: Synchronized wmf-config/db-codfw.php: Repool db2068 - T165743 (duration: 00m 39s)
  • 12:40 marostegui: Deploy alter table s7.frwiktionary on db2068 - T165743
  • 12:40 marostegui@tin: Synchronized wmf-config/db-codfw.php: Depool db2068 - T165743 (duration: 00m 40s)
  • 11:39 marostegui: Deploy alter table s2.revision table on labsdb1003 - T162611
  • 11:05 moritzm: uploaded nutcracker 0.4.1-1+wm3~jessie1 to apt.wikimedia.org (T163795)
  • 10:31 ebernhardson: restarting elasticsearch on relforge1001 to pull in remote reindex
  • 10:19 moritzm: powercycling mw2221, stuck in reboot and serial console unresponsive
  • 10:08 _joe_: moved stale repos to /srv/deployment/STALE on tin, T129290
  • 10:07 moritzm: rebooting mw2220/mw2221 for update to Linux 4.9 / HHVM 3.18 / nutcracker tests
  • 09:15 reedy@tin: Synchronized dblists/: Update size dblists (duration: 00m 39s)
  • 09:01 reedy@tin: Synchronized php-1.30.0-wmf.1/extensions/WikimediaMaintenance/makeSizeDBLists.php: Catch a silly error (duration: 00m 39s)
  • 08:14 jynus@tin: Synchronized wmf-config/db-codfw.php: Depool db2048 for reimage (duration: 00m 39s)
  • 07:36 akosiaris: reboot kubernetes2001 for tests
  • 06:51 moritzm: installing openjdk-7/trusty regression update
  • 06:34 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1051 - T159753 T164530 (duration: 00m 38s)
  • 06:23 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1051 - T159753 T164530 (duration: 00m 39s)
  • 06:09 marostegui: Deploy alter table s2.revision table - db1018 - https://phabricator.wikimedia.org/T162611
  • 06:07 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1060 - T162611 (duration: 00m 40s)
  • 05:56 jynus: shutting down db2049 and preparing it for reimage
  • 02:28 l10nupdate@tin: ResourceLoader cache refresh completed at Fri May 19 02:28:08 UTC 2017 (duration 6m 0s)
  • 02:22 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.1) (duration: 07m 53s)

2017-05-18

  • 20:47 mutante: wasat - git pull - bring to latest, the last change had never been deployed here like on terbium, but it's also not a backend for dbtree yet (T163141)
  • 20:44 mutante: terbium / dbtree - deploying gerrit:353388 (sudo -u mwdeploy git pull origin in /srv/dbtree) (T163143)
  • 20:03 urandom: T164865: restarting RESTBase-dev, range delete-based render retention
  • 19:52 urandom: T164865: restarting RESTBase-dev to apply range delete-based render retention
  • 19:06 urandom: T164865: configure RESTBase tables for size-tiered compaction (dev env only)
  • 18:37 dereckson@tin: Synchronized php-1.30.0-wmf.1/extensions/SecurePoll/includes/pages/DumpPage.php: Revert "Dump should return decrypted votes" (T145695) (duration: 00m 48s)
  • 17:10 robh: mr1-ulsfo having oob connection re-routed at ulsfo, will flap a bit from 1700-1730 gmt
  • 17:09 moritzm: upgrading mw2130-mw2139 to Linux 4.9 and HHVM 3.18
  • 16:28 moritzm: restarting cassandra on restbase1010, restbase1011, restbase1016, restbase1018 to pick up OpenJDK security updates
  • 16:11 elukey: upgraded cassandra-tools-wmf on aqs hosts
  • 15:54 _joe_: uploaded package cni to jessie-wikimedia
  • 15:34 marostegui: Deploy alter table s2.revision table - db1060 - https://phabricator.wikimedia.org/T162611
  • 15:34 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1074, depool db1060 - T162611 (duration: 00m 39s)
  • 14:46 moritzm: rebooting restbase1008 for update to Linux 4.9 and to pick up OpenJDK security updates
  • 14:32 XioNoX: rebooting mr1-ulsfo for software upgrade - T164970
  • 14:12 akosiaris: perform a final reboot on kubernetes200X
  • 13:47 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1055 - T159753 T164530 (duration: 00m 39s)
  • 13:40 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1055 - T159753 T164530 (duration: 01m 03s)
  • 13:33 jynus: stopping mariadb and preparing for reimage at db2051
  • 13:14 elukey: AMEND prev: reloaded kafkatee on oxygen
  • 13:14 elukey: reloaded kafkatee to test T151748
  • 12:57 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1066 - T159753 T164530 (duration: 00m 38s)
  • 12:51 moritzm: upgrading mw1209-mw1219 to Linux 4.9 and HHVM 3.18
  • 12:50 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1072, depool db1066 - T159753 T164530 (duration: 00m 38s)
  • 12:44 marostegui: Deploy alter table s2.revision table - db1074 - https://phabricator.wikimedia.org/T162611
  • 12:44 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1076, depool db1074 - T159753 T164530 (duration: 00m 39s)
  • 12:42 moritzm: upgrading mw1161 (job runner) to HHVM 3.18
  • 11:23 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1073, depool db1072 - T159753 T164530 (duration: 00m 39s)
  • 11:10 marostegui: Run pt-table-checksum on s7.metawiki - T163190
  • 09:49 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1080, depool db1073 - T159753 T164530 (duration: 00m 39s)
  • 09:47 moritzm: upgrading image scalers in codfw to Linux 4.9 and HHVM 3.18
  • 09:33 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1083, depool db1080 - T159753 T164530 (duration: 00m 38s)
  • 09:14 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1083 - T159753 T164530 (duration: 00m 39s)
  • 09:07 moritzm: upgrading image scalers mw1294/mw1295 to Linux 4.9 and HHVM 3.18
  • 09:06 marostegui: Deploy alter table s2.revision table - db1076 - https://phabricator.wikimedia.org/T162611
  • 09:06 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1090, depool db1076 - T162611 (duration: 00m 39s)
  • 08:52 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1089 - T159753 T164530 (duration: 00m 39s)
  • 08:46 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1089 - T159753 T164530 (duration: 00m 39s)
  • 08:32 moritzm: upgrading mw1180-mw1188, mw1200-mw1208 to new hhvm-luasandbox/hhvm-luasandbox-dbg packages
  • 08:16 apergos: reboot dataset1001 for kernel update
  • 08:09 marostegui: Deploy alter table on s1.enwiki directly on codfw master (db2016) after running the clean up duplicates script - https://phabricator.wikimedia.org/T164530
  • 08:01 moritzm: reboot rhenium for update to Linux 4.9
  • 07:36 moritzm: installing freetype security updates on trusty (jessie already fixed)
  • 07:27 akosiaris: restart nagios-nrpe-server on dbstore2001
  • 07:01 marostegui: Deploy alter table on s2.plwiki directly on codfw master (db2017) after running the clean up duplicates script - https://phabricator.wikimedia.org/T164530
  • 06:43 moritzm: installing tiff security updates
  • 06:24 marostegui: Deploy alter table on s2.ptwiki directly on codfw master (db2017) after running the clean up duplicates script - https://phabricator.wikimedia.org/T164530
  • 06:21 marostegui: Deploy alter table s2.revision table - labsdb1001 - T162611
  • 06:10 marostegui: Deploy alter table s2.revision table - dbstore1001 - T162611
  • 06:10 marostegui: Deploy alter table s2.revision table - db1090 - T162611
  • 06:09 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1090 - T162611 (duration: 00m 38s)
  • 06:05 marostegui@tin: Synchronized wmf-config/db-codfw.php: Repool db2062 - T116557 (duration: 00m 39s)
  • 05:01 Jamesofur: insert decryption key for WMF Board Election
  • 02:26 l10nupdate@tin: ResourceLoader cache refresh completed at Thu May 18 02:26:11 UTC 2017 (duration 5m 59s)
  • 02:20 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.1) (duration: 07m 14s)

2017-05-17

  • 23:45 cwd: re-enabled p-c jobs
  • 23:06 cwd: disabled p-c jobs
  • 22:27 ejegg: updated SmashPig from 0145e2d to 4f84d88
  • 22:00 urandom: T164865: altering compaction strategy to sizetiered, local_group_wikipedia_T_parsoid_html.data (in RESTBase dev)
  • 21:50 ejegg: rolled back SmashPig to 0145e2d
  • 21:47 ejegg: updated SmashPig from 0145e2d to 1affad1
  • 20:36 ejegg: updated paypal EC fallback currency in payments-wiki config
  • 19:21 robh: mr1-ulsfo replacement underway
  • 18:54 urandom: T164865: restarting RESTBase in dev env to apply range-delete probability bug-fix
  • 18:30 dereckson@tin: Synchronized php-1.30.0-wmf.1/extensions/VisualEditor/modules/ve-mw/init/targets/ve.init.mw.DesktopArticleTarget.init.js: Do not check for visual editor availability when loading source editor (Gerrit:354126) (duration: 00m 39s)
  • 18:24 dereckson@tin: Synchronized wmf-config/InitialiseSettings.php: Enable wgCiteResponsiveReferences on ilo. and ms.wikipedia (T164230, T165247) (duration: 00m 39s)
  • 18:08 paravoid: reprepro include facter 2.4.6 to jessie-wikimedia/trusty-wikimedia
  • 16:52 bblack: restarting varnish backend on cp1099 (mailbox)
  • 16:42 moritzm: upgrading mw2120-mw2129 to Linux 4.9 and HHVM 3.18
  • 15:08 moritzm: upgrading mw1189-mw1199 to new hhvm-luasandbox/hhvm-luasandbox-dbg packages
  • 14:50 marostegui: Deploy alter table on s2.revision table on db1069 - T162611
  • 14:26 demon@tin: Synchronized README: No-op, forcing co-master sync (duration: 00m 40s)
  • 14:20 demon@tin: Pruned MediaWiki: 1.29.0-wmf.21 [keeping static files] (duration: 00m 22s)
  • 14:20 moritzm: upgrading mw1170-mw1179 to new hhvm-luasandbox/hhvm-luasandbox-dbg packages
  • 14:19 demon@tin: Pruned MediaWiki: 1.29.0-wmf.19 (duration: 01m 07s)
  • 14:17 demon@tin: Pruned MediaWiki: 1.29.0-wmf.19 [keeping static files] (duration: 00m 12s)
  • 13:47 cmjohnson1: replacing optics on cr1-3/1/2 and/or asw-c-eqiad:xe-8/0/38 T165008
  • 13:47 addshore@tin: Synchronized php-1.30.0-wmf.1/extensions/TwoColConflict/modules/: SWAT Fix issues with column alignment T165129 (duration: 00m 39s)
  • 13:44 addshore@tin: Synchronized wmf-config/InitialiseSettings-labs.php: SWAT Take RevisionSlider out of beta on all sites NOOP PT 2/2 (duration: 00m 39s)
  • 13:42 addshore@tin: Synchronized wmf-config/InitialiseSettings.php: SWAT Take RevisionSlider out of beta on all sites T163685 PT 1/2 (duration: 00m 40s)
  • 13:42 elukey: shutdown analytics1030 for T165529
  • 13:41 moritzm: upgrading mw1261-mw1265 to new hhvm-luasandbox/hhvm-luasandbox-dbg packages
  • 13:23 dereckson@tin: Synchronized wmf-config/InitialiseSettings.php: Harden zerowiki config (T162771) (duration: 00m 41s)
  • 12:42 marostegui: Deploy alter table on s2.trwiki directly on codfw master (db2017) after running the clean up duplicates script - T164530
  • 11:27 moritzm: uploaded php-luasandbox_2.0.12~jessie3 to apt.wikimedia.org (adds a separate debug package hhvm-luasandbox-dbg)
  • 11:17 moritzm: rebooting restbase2012 for update to Linux 4.9 and to pick up openjdk security updates
  • 10:58 moritzm: rebooting restbase2011 for update to Linux 4.9 and to pick up openjdk security updates
  • 10:47 jynus: stopping db2052 and preparing it for reimage
  • 10:26 moritzm: rebooting restbase2010 for update to Linux 4.9 and to pick up openjdk security updates
  • 09:58 moritzm: rebooting restbase2009 for update to Linux 4.9 and to pick up openjdk security updates
  • 09:31 moritzm: rebooting restbase2008 for update to Linux 4.9 and to pick up openjdk security updates
  • 08:50 marostegui: Deploy alter table on codfw master (db2016) and let it replicate - T159753
  • 06:56 marostegui: Drop already renamed tables from labtestweb2001 (labtestwiki) - T164887
  • 06:54 marostegui: Drop already renamed tables from silver (labswiki) - T164887
  • 06:52 marostegui: Deploy alter table on s2 (revision table) dbstore1002 - T162611
  • 06:26 marostegui: Deploy alter table on s2 (revision table) db2017 (codfw master) - https://phabricator.wikimedia.org/T1626111
  • 06:22 marostegui@tin: Synchronized wmf-config/db-codfw.php: Repool db2041 and db2049 - T162611 (duration: 00m 39s)
  • 06:01 marostegui: Resume pt-table-checksum on s7.centralauth - https://phabricator.wikimedia.org/T163190
  • 02:25 l10nupdate@tin: ResourceLoader cache refresh completed at Wed May 17 02:25:59 UTC 2017 (duration 6m 1s)
  • 02:19 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.1) (duration: 06m 58s)

2017-05-16

  • 23:25 dereckson@tin: Synchronized wmf-config/InitialiseSettings.php: Update wmde-policy RSS feed on meta. (T165285) (duration: 00m 39s)
  • 22:42 Dereckson: Tin now has an up-to-date /srv/mediawiki-staging HEAD, with operations/mediawiki-config repo = prod = staging
  • 20:22 mobrovac@tin: Started restart [restbase/deploy@d98af6f] (dev-cluster): Apply the revision range deletion algorithm, take 2 - T164865
  • 20:10 mobrovac@tin: Started restart [restbase/deploy@d98af6f] (dev-cluster): Apply the revision range deletion algorithm - T164865
  • 18:49 jynus: rolled back to HEAD~2 on tin to leave things the way I found them
  • 18:41 jynus@tin: Synchronized wmf-config/db-eqiad.php: Repool db1055 after reimage (duration: 00m 39s)
  • 18:26 bblack: cp1074: run-no-puppet varnish-backend-restart (has high mailbox lag, causing small 503 spikes)
  • 17:23 cmjohnson1: swapping optics asw-c-eqiad xe-8/0/38 T165008
  • 17:05 moritzm: upgrading mw2017/mw2099 to Linux 4.9 and HHVM 3.18
  • 16:40 moritzm: upgrading mw2190-mw2199 to Linux 4.9 and HHVM 3.18
  • 16:22 robh@puppetmaster1001: conftool action : set/pooled=no; selector: name=mw2098.codfw.wmnet
  • 15:48 jynus: restarting and upgrading db1095
  • 15:01 marostegui@tin: Synchronized wmf-config/db-codfw.php: Remove old comment (duration: 00m 39s)
  • 14:53 marostegui: Deploy alter table on s2 (revision table) db2041 - https://phabricator.wikimedia.org/T162611
  • 14:53 marostegui@tin: Synchronized wmf-config/db-codfw.php: Repool db2056, depool db2041 - T162611 (duration: 00m 41s)
  • 14:50 mobrovac@tin: Started restart [restbase/deploy@d98af6f]: Apply new puppet role/profile paradigm
  • 14:36 kartik@tin: Finished deploy [cxserver/deploy@6118dda]: Update cxserver to 740641f (duration: 02m 21s)
  • 14:34 kartik@tin: Started deploy [cxserver/deploy@6118dda]: Update cxserver to 740641f
  • 14:27 moritzm: upgrading mw2180-mw2189 to Linux 4.9 and HHVM 3.18
  • 14:08 jynus: rolling restart labsdb1009,10,11 for mariadb upgrade (and kernel upgrade)
  • 14:06 moritzm: rebooting restbase2007 for update to Linux 4.9 and to pick up openjdk security updates
  • 13:53 moritzm: upgrading mw2170-mw2179 to Linux 4.9 and HHVM 3.18
  • 13:48 addshore: SWAT done
  • 13:48 addshore@tin: Synchronized php-1.30.0-wmf.1/extensions/QuickSurveys/extension.json: SWAT: Explicitly add mediawiki.cookie dependency (duration: 00m 39s)
  • 13:40 moritzm: rebooting restbase2006 for update to Linux 4.9 and to pick up openjdk security updates
  • 13:39 addshore@tin: Synchronized wmf-config/throttle.php: SWAT: Raise the account creation limit for www.enwp.org/WP:Meetup/Eugene/WikiAPA T165421 (duration: 00m 39s)
  • 13:36 addshore@tin: Synchronized wmf-config/: SWAT: #1 T164502, #2, #3 (duration: 00m 41s)
  • 13:19 moritzm: upgrading mw2163-mw2169 to HHVM 3.18
  • 13:07 moritzm: upgrading mw2110-mw2117 to HHVM 3.18
  • 12:55 marostegui: Run pt-table-checksum on s7.centralauth - https://phabricator.wikimedia.org/T163190
  • 12:06 marostegui: Deploy alter table on s2 (revision table) db2049 - https://phabricator.wikimedia.org/T162611
  • 12:05 marostegui@tin: Synchronized wmf-config/db-codfw.php: Depool db2049 - T162611 (duration: 00m 39s)
  • 11:37 moritzm: upgrading mw1190-mw1208 to Linux 4.9 and HHVM 3.18
  • 11:28 jynus: stopping db1055 before reimage for backup
  • 11:27 Amir1: ladsgroup@terbium:~$ mwscript extensions/ORES/maintenance/CleanDuplicateScores.php --wiki=enwiki
  • 11:25 Amir1: cleaning up is completely done current number of rows: 9,261,264 T159753
  • 11:24 jynus@tin: Synchronized wmf-config/db-eqiad.php: Depool db1055 for reimage (duration: 00m 39s)
  • 10:56 moritzm: upgrading codfw app servers already using HHVM 3.18 to 3.18.2+wmf3
  • 10:49 marostegui: Deploy schema change on testwikidatawiki.wb_terms on s3 codfw master - T165246
  • 10:36 jynus: upgrading and restarting db2062's mariadb service
  • 10:30 moritzm: installing openjdk-7 security updates on trusty hosts
  • 10:28 addshore: T164407 addshore@terbium mwscriptwikiset extensions/Cognate/maintenance/populateCognatePages.php wiktionary.dblist --batch-size=1000
  • 10:27 addshore: addshore@terbium mwscriptwikiset extensions/Cognate/maintenance/populateCognatePages.php wiktionary.dblist --batch-size=1000
  • 10:14 moritzm: upgrading mw1185-mw1189 to Linux 4.9 and HHVM 3.18
  • 09:26 moritzm: upgrading mw1189 / mw1293 from HHVM 3.18.2+wmf2 to 3.18.2+wmf3
  • 08:59 moritzm: upgrading mw1170-mw1184 from HHVM 3.18.2+wmf2 to 3.18.2+wmf3
  • 08:45 moritzm: upgrading git packages on tin/naos from local 2.11 backport to the version from jessie-backports
  • 08:22 moritzm: installing git security updates on trusty (jessie already fixed)
  • 07:39 godog: upload prometheus-mysqld-exporter 0.10.0 to jessie-wikimedia - T161296
  • 07:10 moritzm: upgrading mw1261-mw1265 to HHVM 3.18.2+wmf3
  • 07:06 Amir1_: start of cleaning up ores_classification table in enwiki last round (four hours) (T159753)
  • 06:58 moritzm: restarted hhvm on mw1165 (stuck in HPHP::Treadmill deadlock)
  • 06:37 marostegui: Stop replication at the same position on db1044 and db2018 - https://phabricator.wikimedia.org/T147166 https://phabricator.wikimedia.org/T130067
  • 06:32 marostegui: Disable replication codfw > eqiad on s3 https://phabricator.wikimedia.org/T147166 https://phabricator.wikimedia.org/T130067
  • 06:08 marostegui: Run pt-table-checksum on s7.viwiki - T163190
  • 06:02 marostegui: Deploy alter table on s2 (revision table) db2056 - T162611
  • 06:02 marostegui@tin: Synchronized wmf-config/db-codfw.php: Repool db2063, depool db2056 - T162611 (duration: 00m 40s)
  • 05:17 XioNoX: fyi, one of the links between codfw and eqiad is down for a scheduled Zayo maintenance. No outage, traffic routed around.
  • 02:26 l10nupdate@tin: ResourceLoader cache refresh completed at Tue May 16 02:26:19 UTC 2017 (duration 6m 3s)
  • 02:20 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.1) (duration: 07m 12s)
  • 00:30 ejegg: updated payments-wiki from 57451de to 3b84521
  • 00:10 ejegg: updated CiviCRM from 061cd61 to 4ece34c

2017-05-15

  • 23:41 bd808@tin: Synchronized php-1.30.0-wmf.1/resources/src/mediawiki.rcfilters/mw.rcfilters.Controller.js: RCFilters: Actually read/write highlight parameter (T165107) (duration: 00m 40s)
  • 22:23 mobrovac@tin: Finished deploy [restbase/deploy@d98af6f]: Wt2lint bug fix - T163091 (duration: 06m 44s)
  • 22:16 mobrovac@tin: Started deploy [restbase/deploy@d98af6f]: Wt2lint bug fix - T163091
  • 21:19 mobrovac@tin: Finished deploy [restbase/deploy@c52add0]: Expose the new /transform/wikitext/to/lint end point to the public - T163091 (duration: 06m 32s)
  • 21:13 mobrovac@tin: Started deploy [restbase/deploy@c52add0]: Expose the new /transform/wikitext/to/lint end point to the public - T163091
  • 20:48 gilles: run refreshImageMetadata --force for group1 + group2 wikis except commons on terbium T150741
  • 20:20 subbu: Updated Parsoid to a182c227 (T141226, T164792, T37247, T153107, T163091, T164006, T161151, T162920, T163549)
  • 20:11 ssastry@tin: Finished deploy [parsoid/deploy@132d0e5]: Updating Parsoid to a182c227 (duration: 07m 21s)
  • 20:04 ssastry@tin: Started deploy [parsoid/deploy@132d0e5]: Updating Parsoid to a182c227
  • 19:42 catrope@tin: Synchronized php-1.30.0-wmf.1/includes/api/ApiQueryRevisions.php: T165100 (duration: 00m 40s)
  • 18:45 catrope@tin: Synchronized php-1.30.0-wmf.1/extensions/MobileFrontend/: Revert "Use csrf token for watching" (T165209) (duration: 00m 41s)
  • 18:45 RoanKattouw: Canary failing on mw1279 due to Wikimedia\Rdbms\Database::makeList: empty input for field rev_id from ApiQueryRevisions
  • 18:20 catrope@tin: Synchronized wmf-config/InitialiseSettings.php: Disable test reader QuickSurveys (T131949, T164769, T164894, T164960, T164943) (duration: 00m 40s)
  • 17:24 mobrovac@tin: Finished deploy [restbase/deploy@c70a1e1] (dev-cluster): Bring RESTBase up to date in the Dev Cluster (duration: 01m 51s)
  • 17:22 mobrovac@tin: Started deploy [restbase/deploy@c70a1e1] (dev-cluster): Bring RESTBase up to date in the Dev Cluster
  • 15:39 akosiaris: upgrade pybal to 1.13.6 across the LVS fleet
  • 15:10 mobrovac@tin: Finished deploy [citoid/deploy@3ed34ef]: Better publishing date extraction support - T132308 (duration: 02m 49s)
  • 15:07 mobrovac@tin: Started deploy [citoid/deploy@3ed34ef]: Better publishing date extraction support - T132308
  • 14:24 mobrovac@tin: Started restart [restbase/deploy@c70a1e1] (dev-cluster): Restart after applying https://gerrit.wikimedia.org/r/#/c/352851/
  • 13:50 moritzm: upgrading mwdebug servers to 3.18.2+wmf3
  • 13:48 addshore@tin: Synchronized php-1.30.0-wmf.1/includes/media/DjVu.php: SWAT: Add X-Content-Dimensions support to DjVu T150741 (duration: 00m 39s)
  • 13:47 addshore@tin: Synchronized php-1.30.0-wmf.1/extensions/TimedMediaHandler/handlers: SWAT: Fix X-Content-Dimensions support T150741 (duration: 00m 40s)
  • 13:37 addshore@tin: Synchronized php-1.30.0-wmf.1/extensions/VisualEditor: SWAT: #1 #2 T165238 T165238 VisualEditor (duration: 00m 41s)
  • 13:27 moritzm: uploaded HHVM 3.18.2+dfsg-1+wmf3 to apt.wikimedia.org (addresses segfault in XML reader (T162586, T165074))
  • 13:20 addshore@tin: Synchronized php-1.30.0-wmf.1/extensions/Cognate/maintenance/populateCognatePages.php: SWAT: Add a clear-first option to populatePages script T164407 PT 2/2 (duration: 00m 39s)
  • 13:19 addshore@tin: Synchronized php-1.30.0-wmf.1/extensions/Cognate/src/CognateStore.php: SWAT: Add a clear-first option to populatePages script T164407 PT 1/2 (duration: 00m 40s)
  • 13:10 addshore@tin: Synchronized wmf-config/InitialiseSettings.php: SWAT: Add QuickSurvey for reader segmentation research T131949 T164769 T164894 T164960 T164963 (duration: 00m 40s)
  • 12:54 akosiaris: upload pybal 1.13.6 to apt.wikimedia.org/jessie-wikimedia/main
  • 12:33 aude@tin: Synchronized wmf-config/Wikibase-production.php: Enable data type for tabular data (duration: 00m 41s)
  • 11:09 Amir1_: cleaning up ores_classification has finished 18M rows deleted, current number of rows 38,937,217 (T159753)
  • 10:36 moritzm: rebooting mw2224-mw2242 for update to Linux 4.9
  • 10:18 moritzm: installing batik security updates on trusty
  • 10:14 moritzm: installing fop security updates on trusty
  • 09:34 moritzm: installing bind security updates (we're using client-side libs/tools only)
  • 09:10 godog: swift codfw-prod: more ms-be2001/ms-be2012 decom - T162785
  • 08:29 godog: swift eqiad-prod: ms-be1028/ms-be1039 object weight 3000 - T160640
  • 08:26 moritzm: installing rtmpdump security updates on jessie
  • 08:17 Amir1_: start of cleaning up ores_classification table
  • 02:27 l10nupdate@tin: ResourceLoader cache refresh completed at Mon May 15 02:27:02 UTC 2017 (duration 5m 59s)
  • 02:21 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.1) (duration: 07m 42s)
  • 01:25 bblack: depooled cp1053 from all services (possible hardware issues)

2017-05-14

  • 02:26 l10nupdate@tin: ResourceLoader cache refresh completed at Sun May 14 02:26:33 UTC 2017 (duration 6m 2s)
  • 02:20 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.1) (duration: 07m 17s)

2017-05-13

  • 12:27 gehel: restarting wdqs updater on wdqs cluster
  • 02:33 l10nupdate@tin: ResourceLoader cache refresh completed at Sat May 13 02:33:34 UTC 2017 (duration 6m 9s)
  • 02:27 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.1) (duration: 07m 07s)
  • 01:18 mobrovac: zotero restart as mem is above 50%
  • 00:54 urandom: T165139: Truncating RESTBase feed_aggregated tables (corruption)
  • 00:31 urandom: T165139: Truncating RESTBase summary tables (corruption)

2017-05-12

  • 20:55 demon@tin: Synchronized wmf-config/InitialiseSettings.php: Touch (duration: 00m 39s)
  • 20:49 demon@tin: Synchronized wmf-config/: Swapping DynamicSidebar to normal extension registration (duration: 00m 19s)
  • 19:20 thcipriani@tin: Synchronized php-1.30.0-wmf.1/extensions/TextExtracts/includes/ApiQueryExtracts.php: API: Change memcache key to clear cache T165161 (duration: 00m 39s)
  • 19:02 thcipriani@tin: Synchronized wmf-config/CommonSettings.php: Add RejectParserCacheValue handler for mw-parser-output T165161 (duration: 00m 40s)
  • 18:47 bblack: starting spaced-out ~4h run of "run-no-puppet varnish-frontend-restart" on cache_upload+cache_text to re-set transient storage levels (in screen on neodymium)
  • 18:10 thcipriani@tin: Finished scap: Revert "Wrap parser output in <div class="mw-parser-output">" 4/4 (duration: 19m 13s)
  • 17:51 thcipriani@tin: Started scap: Revert "Wrap parser output in <div class="mw-parser-output">" 4/4
  • 17:51 thcipriani@tin: Synchronized php-1.30.0-wmf.1/includes/api/ApiParse.php: Revert "Wrap parser output in <div class="mw-parser-output">" 3/4 (duration: 00m 42s)
  • 17:50 thcipriani@tin: Synchronized php-1.30.0-wmf.1/includes/cache/MessageCache.php: Revert "Wrap parser output in <div class="mw-parser-output">" 2/4 (duration: 00m 39s)
  • 17:49 thcipriani@tin: Synchronized php-1.30.0-wmf.1/includes/parser/Parser.php: Revert "Wrap parser output in <div class="mw-parser-output">" 1/4 (duration: 00m 39s)
  • 17:38 ema: cp4010: upgrade varnish back to 4.1.6-1wm1, transient storage issues are unrelated
  • 17:33 krinkle@tin: Synchronized php-1.30.0-wmf.1/includes/resourceloader/ResourceLoaderClientHtml.php: (no justification provided) (duration: 00m 40s)
  • 16:53 moritzm: powercycling mw1294 (machine inaccessible/locked up)
  • 16:23 moritzm: repooled mw1172 after scap pull (was down with hardware error)
  • 14:10 moritzm: rebooting mw2163-mw2179 for update to Linux 4.9
  • 13:47 moritzm: rebooting mw2110-mw2117 for update to Linux 4.9
  • 13:06 moritzm: repooled mw2098 (was down with hardware error)
  • 12:53 moritzm: downgrading mw1161 (job runner) to HHVM 3.12; there are some known instabilities and a fix for one of them in HHVM 3.18 will likely be available next week, so going the conservative way over the weekend
  • 11:35 gehel: cleaning old elasticsearch and logstash logs on logstash cluster
  • 10:38 _joe_: moved hpssacli.tar.gz to /root on puppetmaster1001
  • 09:59 hashar@tin: Synchronized php-1.30.0-wmf.1/extensions/MobileFrontend: Correctly handle the mw-parser-output wrapper - T164733 (duration: 00m 43s)
  • 09:02 akosiaris: move planet2001 to ganeti nodegroup row_A
  • 08:58 marostegui: Rename semantic tables before dropping them on wikitech hosts (silver and labtestweb2001) - T164887
  • 06:05 marostegui: Deploy alter table on s2 (revision table) db2063 - https://phabricator.wikimedia.org/T162611
  • 06:05 marostegui@tin: Synchronized wmf-config/db-codfw.php: Repool db2064, depool db2063 - T162611 (duration: 00m 39s)
  • 05:53 marostegui: Stop MySQL dbstore2001 for testing - T165033
  • 02:30 l10nupdate@tin: ResourceLoader cache refresh completed at Fri May 12 02:30:11 UTC 2017 (duration 6m 16s)
  • 02:23 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.1) (duration: 06m 49s)

2017-05-11

  • 23:49 thcipriani@tin: Synchronized wmf-config/CommonSettings-labs.php: SWAT: Enable saving RC Filters on Beta Cluster (beta-only-change) (duration: 00m 39s)
  • 23:39 thcipriani@tin: Synchronized php-1.30.0-wmf.1/resources/src/mediawiki.rcfilters: SWAT: Gate option to save RC filters to default false 3/3 (duration: 00m 39s)
  • 23:39 thcipriani@tin: Synchronized php-1.30.0-wmf.1/includes/specials/SpecialRecentchanges.php: SWAT: Gate option to save RC filters to default false 2/3 (duration: 00m 39s)
  • 23:38 thcipriani@tin: Synchronized php-1.30.0-wmf.1/includes/DefaultSettings.php: SWAT: Gate option to save RC filters to default false 1/3 (duration: 00m 39s)
  • 23:30 thcipriani@tin: Synchronized php-1.30.0-wmf.1/extensions/TemplateData/extension.json: SWAT: Fix styles queue violation for "ext.templateData" T92459 (duration: 00m 39s)
  • 23:23 twentyafterfour: restart apache on iridium to apply hotfix for T163967
  • 23:21 thcipriani@tin: Synchronized php-1.30.0-wmf.1/resources/src/mediawiki/mediawiki.Upload.Dialog.js: SWAT: mw.Upload.Dialog: Define .static.name T164999 (duration: 00m 40s)
  • 23:12 thcipriani@tin: Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable OOUI for EditPage on MW.org (duration: 00m 40s)
  • 23:09 Amir1: clean up for ores_classification is finished for now, 9M rows cleaned, current number of row: 55,959,017 (T159753)
  • 21:19 twentyafterfour@tin: Synchronized php-1.30.0-wmf.1/includes/specials/SpecialSearch.php: hotfix T165091 (duration: 00m 39s)
  • 21:02 Amir1: start of cleaning up ores_classification in enwiki for two hours (T159753)
  • 20:57 hashar: CI Phpunit jobs were segfaulting due to an upgrade of HHVM to 3.18. Got rolled back to 3.12 - T165074
  • 20:06 demon@tin: Synchronized scap/plugins/prep.py: scap prep is fast now (duration: 00m 44s)
  • 19:41 demon@tin: Synchronized scap/plugins/clean.py: no-op, completeness (duration: 00m 42s)
  • 19:35 twentyafterfour@tin: rebuilt wikiversions.php and synchronized wikiversions files: all wikis to 1.30.0-wmf.1
  • 18:53 thcipriani@tin: Synchronized php-1.30.0-wmf.1/extensions/Gadgets/includes/GadgetResourceLoaderModule.php: SWAT: Revert "Move gadget styles from main stylesheet request to site request" T165040 T165031 (duration: 00m 42s)
  • 18:47 thcipriani@tin: Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable OOUI in EditPage for fawiki T162849 (duration: 00m 42s)
  • 18:39 hoo: Updated the Wikidata property suggester with data from last Monday's JSON dump and applied the T132839 workarounds
  • 18:23 thcipriani@tin: Synchronized php-1.30.0-wmf.1/extensions/WikimediaEvents/modules/ext.wikimediaEvents.recentChangesClicks.js: SWAT: RecentChangesClicks: Address minor performance concerns T158458 (duration: 00m 42s)
  • 15:35 ladsgroup@tin: Synchronized wmf-config: Set oresDamagingPref default to values that actually exist (T165011) (duration: 00m 44s)
  • 15:35 Amir1: start of ladsgroup@tin:/srv/mediawiki-staging$ scap sync-dir wmf-config 'Set oresDamagingPref default to values that actually exist (T165011)'
  • 15:30 chasemp: rotate novaadmin in /labtest/ ldappasswd -H ldap://labtestservices2001.wikimedia.org -x -D "uid=novaadmin,ou=people,dc=wikimedia,dc=org" -W -A -S
  • 14:37 oblivian@puppetmaster1001: conftool action : set/pooled=yes; selector: name=sca2004.codfw.wmnet
  • 14:36 oblivian@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=sca2004.codfw.wmnet
  • 14:10 gehel@tin: Finished deploy [wdqs/wdqs@bc30531]: (no justification provided) (duration: 01m 23s)
  • 14:08 gehel@tin: Started deploy [wdqs/wdqs@bc30531]: (no justification provided)
  • 14:07 gehel: deploying WDQS to fix T165029
  • 14:01 mobrovac@tin: Started restart [zotero/translation-server@50f216a]: Zotero unresponsive
  • 13:59 aude@tin: Synchronized php-1.30.0-wmf.1/extensions/Wikidata: Update quality constraints (duration: 02m 14s)
  • 13:56 mobrovac@tin: Started restart [zotero/translation-server@6a4a828]: (no justification provided)
  • 13:48 addshore@tin: Synchronized wmf-config/jobqueue-labs.php: SWAT: LABS ONLY Re-enable persistent connection to Redis for jobrunners in lab (duration: 00m 41s)
  • 13:33 addshore@tin: Synchronized wmf-config/InitialiseSettings-labs.php: SWAT: (notask) wgRevisionSliderAlternateSlider true everywhere PT 2/2 (duration: 00m 42s)
  • 13:33 addshore@tin: Synchronized wmf-config/InitialiseSettings.php: SWAT: (notask) wgRevisionSliderAlternateSlider true everywhere PT 1/2 (duration: 00m 43s)
  • 13:31 addshore@tin: Synchronized wmf-config/InitialiseSettings.php: SWAT: T162796 Stop prerendering thumbs at 2560/2880 pixels (duration: 00m 41s)
  • 13:23 moritzm: rebooting restbase2005 for update to Linux 4.9 / new openjdk
  • 13:21 addshore@tin: Synchronized php-1.30.0-wmf.1/extensions/Cognate/src/CognateStore.php: SWAT: T165005 Dont pass ConnectionRefs to ConnectionManager::releaseConnection (duration: 00m 42s)
  • 13:10 addshore@tin: Synchronized wmf-config/InitialiseSettings.php: SWAT: T164888 Correct alias(es) from es.wikisource to eo.wikisource (duration: 00m 42s)
  • 12:55 akosiaris: migrate sca2004 to ganeti nodegroup row_A
  • 12:33 marostegui: Run pt-table-checksum on s7.ukwiki - https://phabricator.wikimedia.org/T163190
  • 12:19 elukey: reboot kafka100[23] for kernel upgrades (kafka main-eqiad, eventbus eqiad)
  • 11:03 marostegui: Deploy alter table on s2 (revision table) db2064 - T162611
  • 11:03 marostegui@tin: Synchronized wmf-config/db-codfw.php: Depool db2064 - T162611 (duration: 00m 42s)
  • 10:15 akosiaris: reboot ganeti200{5,6,7,8} for network reconfiguration
  • 10:10 marostegui: Run pt-table-checksum on s7.rowiki - https://phabricator.wikimedia.org/T163190
  • 10:07 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Restore db1056 original load (duration: 00m 49s)
  • 09:46 ema: cp4010: downgrade varnish to 4.1.5-1wm4 and check frontend transient memory usage
  • 09:12 ayounsi@puppetmaster1001: conftool action : set/pooled=yes; selector: name=logstash1003.eqiad.wmnet
  • 09:12 ayounsi@puppetmaster1001: conftool action : set/pooled=yes; selector: name=logstash1002.eqiad.wmnet
  • 09:12 ayounsi@puppetmaster1001: conftool action : set/pooled=yes; selector: name=logstash1001.eqiad.wmnet
  • 09:10 moritzm: upgrading mw1170-mw1188 to HHVM 3.18 / Linux 4.9 (also pruning HHVM CLI bytecode since downtimed anyway)
  • 08:55 moritzm: migrating mw1161 (job runner) to HHVM 3.18 and Linux 4.9
  • 08:47 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Increase db1056 load (duration: 00m 43s)
  • 08:35 marostegui: Run pt-table-checksum on s7.kowiki - https://phabricator.wikimedia.org/T163190
  • 08:32 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Increase db1056 load (duration: 00m 42s)
  • 08:26 moritzm: migrating mw1189 (API server) to HHVM 3.18 and Linux 4.9
  • 07:53 godog: roll-restart ms-fe1* for linux 4.9 upgrade - T162029
  • 06:50 moritzm: migrating mw1293 (image scaler) to HHVM 3.18 and Linux 4.9
  • 06:30 marostegui: Drop mira user on wikitech database - T164968
  • 06:11 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1056 with less load (duration: 00m 43s)
  • 05:56 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1067 - T147166 T130067 (duration: 00m 57s)
  • 03:14 l10nupdate@tin: ResourceLoader cache refresh completed at Thu May 11 03:14:51 UTC 2017 (duration 6m 44s)
  • 03:08 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.1) (duration: 13m 33s)
  • 02:46 Jamesofur: all election emails out
  • 02:41 Jamesofur: Sending English and all other language election emails via terbium
  • 02:35 l10nupdate@tin: scap sync-l10n completed (1.29.0-wmf.21) (duration: 13m 22s)
  • 02:21 Jamesofur: sending Chinese election emails via terbium
  • 02:18 Jamesofur: sending uk and vi election emails via terbium
  • 02:10 Jamesofur: sending pt,pt-br and ru election emails via terbium
  • 01:55 Jamesofur: sending Polish and Dutch election emails via terbium
  • 01:32 Jamesofur: sending Italian and Japanese election emails via terbium
  • 01:21 Jamesofur: sending he, hi and id election emails via terbium
  • 01:08 Jamesofur: sending French election emails via terbium
  • 01:05 twentyafterfour@tin: rebuilt wikiversions.php and synchronized wikiversions files: group1 wikis to 1.30.0-wmf.1
  • 01:00 Jamesofur: sending Farsi election emails via terbium
  • 00:50 Jamesofur: sending Spanish election emails via terbium
  • 00:36 Jamesofur: sending German election emails via terbium
  • 00:29 Jamesofur: sending bg and bn election emails via terbium
  • 00:11 Jamesofur: sending Arabic election emails via terbium
  • 00:03 maxsem@tin: Finished deploy [kartotherian/deploy@9401f38]: Try https://gerrit.wikimedia.org/r/#/c/352886/ and https://gerrit.wikimedia.org/r/#/c/353184/ on test hosts (duration: 145m 42s)

2017-05-10

  • 23:50 twentyafterfour@tin: Finished scap: Sync fix for T164983 plus i18n files leftover from swat. refs T162954 (duration: 30m 37s)
  • 23:19 twentyafterfour@tin: Started scap: Sync fix for T164983 plus i18n files leftover from swat. refs T162954
  • 23:13 catrope@tin: Synchronized php-1.30.0-wmf.1/extensions/WikimediaEvents/: T164617 (duration: 00m 42s)
  • 23:08 catrope@tin: Synchronized wmf-config/InitialiseSettings.php: Enable archive search on select wikis (T162302) (duration: 00m 41s)
  • 21:38 twentyafterfour@tin: Synchronized php-1.30.0-wmf.1/extensions/ORES/includes/Hooks.php: sync fix for T164984 refs T162954 (duration: 00m 42s)
  • 21:38 maxsem@tin: Started deploy [kartotherian/deploy@9401f38]: Try https://gerrit.wikimedia.org/r/#/c/352886/ and https://gerrit.wikimedia.org/r/#/c/353184/ on test hosts
  • 20:55 elukey: restart hhvm on mw1268 (HHVM 3.12, HPHP::Treadmill::getAgeOldestRequest issue)
  • 20:37 demon@tin: Synchronized README: no-op, comaster sync (duration: 00m 42s)
  • 20:36 Dereckson: Run namespaceDupes.php on es.wikisource (T164195)
  • 20:35 bsitzmann@tin: Finished deploy [mobileapps/deploy@5d3b34a]: Update mobileapps to 75b135e (duration: 03m 55s)
  • 20:33 dereckson@tin: Synchronized wmf-config/InitialiseSettings.php: Restore Autor: and Portal: namespaces on es.wikisource (T164195) (duration: 00m 42s)
  • 20:31 bsitzmann@tin: Started deploy [mobileapps/deploy@5d3b34a]: Update mobileapps to 75b135e
  • 19:51 twentyafterfour@tin: rebuilt wikiversions.php and synchronized wikiversions files: group1 wikis to 1.29.0-wmf.21
  • 19:51 twentyafterfour: rolling group1 back to 1.29.0-wmf.21 due to T164984
  • 19:45 twentyafterfour@tin: rebuilt wikiversions.php and synchronized wikiversions files: group1 wikis to 1.30.0-wmf.1
  • 19:33 twentyafterfour: deploying 1.30.0-wmf.1 to group1 wikis. refs T162954
  • 19:29 dereckson@tin: Synchronized php-1.30.0-wmf.1/extensions/TimedMediaHandler/: Store original media dimensions as additional header (T150741) (duration: 00m 43s)
  • 19:28 dereckson@tin: Synchronized php-1.30.0-wmf.1/extensions/PdfHandler/PdfHandler_body.php: Store original media dimensions as additional header (T150741) (duration: 00m 42s)
  • 19:27 dereckson@tin: Synchronized wmf-config/interwiki.php: Interwiki map update (disable __list sorting, T145337) (duration: 00m 41s)
  • 19:26 dereckson@tin: Synchronized php-1.30.0-wmf.1/extensions/PagedTiffHandler/PagedTiffHandler_body.php: Store original media dimensions as additional header (T150741) (duration: 00m 42s)
  • 19:17 dereckson@tin: Synchronized php-1.30.0-wmf.1/extensions/TwoColConflict/: Add "oojs-ui" dep to ext.TwoColConflict.filterOptionsJs (duration: 00m 42s)
  • 18:57 paravoid: mr1-ulsfo: request system snapshot media internal slice alternate; request system reboot
  • 18:53 dereckson@tin: Synchronized php-1.29.0-wmf.21/extensions/TwoColConflict/: Add "oojs-ui" dep to ext.TwoColConflict.filterOptionsJs (duration: 00m 42s)
  • 18:30 dereckson@tin: Synchronized php-1.30.0-wmf.1/extensions/CirrusSearch/maintenance/forceSearchIndex.php: Fix index usage on archive indexing (duration: 00m 42s)
  • 18:14 dereckson@tin: Synchronized wmf-config/InitialiseSettings.php: Put Cognate in write mode for all wiktionaries (T164407) (duration: 00m 42s)
  • 17:46 jynus: setting db1056's cpu scaling_governor to performance, rather than powersave
  • 17:20 moritzm: installing groovy security updates
  • 17:03 dereckson@tin: Synchronized wmf-config/InitialiseSettings.php: Revert "Create Autor and Portal namespaces on Spanish Wikisource" (T164195) (duration: 00m 43s)
  • 16:30 godog: roll-restart swift object servers to apply https://gerrit.wikimedia.org/r/#/c/353078
  • 15:44 moritzm: installing git security updates on jessie systems
  • 15:32 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1067 - T147166 T130067 (duration: 01m 43s)
  • 15:18 moritzm: uploaded HHVM 3.18.2 and HHVM extensions to apt.wikimedia.org/main (previously only in experimental)
  • 15:03 jynus: shutting down db1056 for physical maintenance T164944
  • 14:57 elukey: reboot kafka1001 for kernel upgrades (kafka main-eqiad, eventbus eqiad)
  • 14:50 marostegui: Stop replication at the same position on db1067 and db2016 - https://phabricator.wikimedia.org/T147166 https://phabricator.wikimedia.org/T130067
  • 14:50 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1067 - T147166 T130067 (duration: 00m 43s)
  • 14:43 marostegui: Run pt-table-checksum on s7.huwiki - https://phabricator.wikimedia.org/T163190
  • 14:41 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1097 (duration: 00m 43s)
  • 14:39 jynus: disabling puppet to solve disk mount issues T164915
  • 14:36 godog: roll-restart swift-proxy to apply https://gerrit.wikimedia.org/r/#/c/353078/
  • 14:36 filippo@puppetmaster1001: conftool action : set/pooled=no; selector: name=ms-fe1005.eqiad.wmnet
  • 14:34 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1097 (duration: 00m 43s)
  • 14:27 hashar: European SWAT completed
  • 14:21 moritzm: upgrading mw1263-mw1265 to latest HHVM package (including the redis QUIT patch)
  • 14:19 hashar@tin: Finished scap: Store original media dimensions as additional header - T150741 (duration: 03m 53s)
  • 14:15 hashar@tin: Started scap: Store original media dimensions as additional header - T150741
  • 14:15 hashar@tin: scap aborted: Store original media dimensions as additional header - T150741 (duration: 00m 00s)
  • 14:15 hashar@tin: Started scap: Store original media dimensions as additional header - T150741
  • 14:15 hashar@tin: scap aborted: (no justification provided) (duration: 00m 00s)
  • 14:15 hashar@tin: Started scap: (no justification provided)
  • 14:13 hashar: ValueError: /srv/mediawiki-staging/php-1.30.0-wmf.1/extensions/Collection/.eslintrc.json is an invalid JSON file
  • 13:53 elukey: reboot kafka200[23] for kernel upgrades (kafka main-codfw cluster, eventbus codfw)
  • 13:35 hashar@tin: Synchronized wmf-config/InitialiseSettings.php: Clean up inappropriate usages of wmg - T151891 (duration: 00m 42s)
  • 13:34 hashar@tin: Synchronized wmf-config/CommonSettings.php: Clean up inappropriate usages of wmg - T151891 (duration: 00m 42s)
  • 13:28 marostegui: Disable replication codfw > eqiad on s1 - https://phabricator.wikimedia.org/T147166 https://phabricator.wikimedia.org/T130067
  • 13:24 hashar@tin: Synchronized php-1.29.0-wmf.21/extensions/Popups: eventLogging: Discard events with duplicate tokens - T161769 T163198 (duration: 00m 43s)
  • 13:19 hashar@tin: Synchronized php-1.30.0-wmf.1/extensions/Popups: eventLogging: Discard events with duplicate tokens - T161769 T163198 (duration: 01m 08s)
  • 13:17 hashar@tin: Synchronized static/images/mobile/copyright/wikipedia-wordmark-ar.svg: (no justification provided) (duration: 00m 42s)
  • 13:13 hashar@tin: Synchronized static/images/mobile/copyright/wikipedia-wordmark-ar.svg: Add new Arabic Wikipedia logo - T164648 (duration: 00m 44s)
  • 13:12 akosiaris: restart pybal on lvs1006, lvs1009, lvs1012 to pick up the kubemaster LVS service
  • 13:09 hashar@tin: Synchronized wmf-config/InitialiseSettings.php: Add new Arabic Wikipedia logo - T164648 && Disable page previews beta features on various projects - T164740 (duration: 00m 42s)
  • 13:07 marostegui: Run pt-table-checksum on s7.hewiki - https://phabricator.wikimedia.org/T163190
  • 13:04 hashar@tin: Synchronized wmf-config/InitialiseSettings.php: Import sources on dty.wikipedia - T164573 (duration: 00m 43s)
  • 12:47 moritzm: installing irqbalance updates from jessie point update
  • 12:45 akosiaris: rebooting ganeti2007, ganeti2008 for networking config update
  • 12:34 moritzm: installing logback security updates
  • 11:27 jynus: stopping mariadb and preparing db1056 for reimage
  • 11:22 marostegui: Stop replication at the same position on db1049 and db2023
  • 11:14 marostegui: Stop replication at the same position on db1050 and db2028
  • 10:50 marostegui: Stop replication at the same position on db1033 and db2029 - T147166 T130067
  • 10:44 jynus@tin: Synchronized wmf-config/db-eqiad.php: Depool db1056 for reimage (duration: 00m 43s)
  • 10:43 marostegui: Disable replication codfw > eqiad on s7 - T147166 T130067
  • 09:36 godog: roll-restart ms-fe2* for linux 4.9 upgrade - T162029
  • 09:11 moritzm: installing vim security updates on jessie
  • 09:05 volans: updated CI puppet compiler facts from production
  • 08:59 moritzm: installing wget security updates on jessie
  • 08:35 moritzm: rebooting mx2001 for update to Linux 4.9
  • 08:35 hashar@tin: Synchronized wmf-config/InitialiseSettings.php: wmgUseTwoColConflict true for all wikis (duration: 00m 54s)
  • 07:30 marostegui: Stop replication at the same position on db10418 and db2017 - T147166 https://phabricator.wikimedia.org/T130067
  • 07:16 marostegui: Disable replication codfw > eqiad on s2 - T147166 T130067
  • 07:13 Amir1: another round of cleaning up ores_classification is done, 12M rows deleted. Current number of rows: 64,902,521 (T159753)
  • 06:36 moritzm: installing rtmpdump security updates on trusty
  • 06:15 marostegui: Deploy alter table wikidatawiki.wb_terms on dbstore1001 - T162539 T163190
  • 06:08 marostegui: Run pt-table-checksum on s7.frwiktionary - T163190
  • 05:04 Amir1: start of cleaning up ores_classification rows for three hours
  • 04:49 kartik@tin: Finished deploy [cxserver/deploy@533b4f4]: Update cxserver to 534619c (duration: 02m 38s)
  • 04:46 kartik@tin: Started deploy [cxserver/deploy@533b4f4]: Update cxserver to 534619c
  • 03:02 l10nupdate@tin: ResourceLoader cache refresh completed at Wed May 10 03:02:23 UTC 2017 (duration 6m 37s)
  • 02:55 l10nupdate@tin: scap sync-l10n completed (1.30.0-wmf.1) (duration: 06m 50s)
  • 02:30 l10nupdate@tin: scap sync-l10n completed (1.29.0-wmf.21) (duration: 08m 09s)
  • 00:29 maxsem@tin: Synchronized wmf-config/wikitech.php: https://gerrit.wikimedia.org/r/#/c/352980/3 (duration: 00m 42s)
  • 00:12 catrope@tin: Synchronized wmf-config/InitialiseSettings.php: Enable ORES on fiwiki (T163011) (duration: 00m 43s)
  • 00:10 RoanKattouw: Running extensions/ORES/maintenance/PopulateDatabase.php on fiwiki

2017-05-09

  • 23:54 catrope@tin: Synchronized wmf-config/InitialiseSettings.php: Enable RCFilters beta feature on all remaining wikis (T144458) (duration: 00m 44s)
  • 23:33 mutante: db1040 - remove from puppet, puppet node clean/deactivate, deleted salt-key, remove from icinga by running puppet on tegmen after that (T164057)
  • 23:23 demon@tin: Finished scap: rebuilding l10n for extension-list swap (duration: 34m 10s)
  • 23:13 mutante: analytics1027 - decom: revoke puppet cert, delete salt key, puppet node clean/deactivate, check icinga removal (T161597)
  • 22:49 demon@tin: Started scap: rebuilding l10n for extension-list swap
  • 22:46 reedy@tin: Synchronized wmf-config/extension-list-wikitech: Consistency (duration: 00m 42s)
  • 22:20 reedy@tin: Synchronized wmf-config/wikitech.php: Disable Semantic extensions (duration: 00m 42s)
  • 22:03 reedy@tin: scap aborted: (no justification provided) (duration: 00m 03s)
  • 22:03 reedy@tin: Started scap: (no justification provided)
  • 21:40 twentyafterfour: Mediawiki train group0 finished, will resume tomorrow with group 1 wikis. refs T162954
  • 21:32 twentyafterfour: group0 wikis to 1.30.0-wmf.1 refs T162954
  • 21:32 twentyafterfour@tin: rebuilt wikiversions.php and synchronized wikiversions files: group0 wikis to 1.30.0-wmf.1
  • 20:52 twentyafterfour@tin: Finished scap: MediaWiki sync new branch wmf/1.30.0-wmf.1 + localization cache and deploy to testwikis refs T162954 (duration: 29m 41s)
  • 20:22 twentyafterfour@tin: Started scap: MediaWiki sync new branch wmf/1.30.0-wmf.1 + localization cache and deploy to testwikis refs T162954
  • 19:47 maxsem@tin: Finished deploy [kartotherian/deploy@740235c]: https://gerrit.wikimedia.org/r/#/c/352886/ (duration: 05m 35s)
  • 19:42 maxsem@tin: Started deploy [kartotherian/deploy@740235c]: https://gerrit.wikimedia.org/r/#/c/352886/
  • 18:39 bblack: varnish: manually setting runtime lru_interval / nuke_limit via varnishadm for all clusters' backends to match start-time change in https://gerrit.wikimedia.org/r/#/c/352827/
  • 18:26 subbu: updated Parsoid to 9d8badc8 (T151277)
  • 18:22 ssastry@tin: Finished deploy [parsoid/deploy@0459ae3]: Updating Parsoid to 9d8badc8 (duration: 07m 09s)
  • 18:16 mepps: updated SmashPig from 200f63e to 0145e2d
  • 18:15 ssastry@tin: Started deploy [parsoid/deploy@0459ae3]: Updating Parsoid to 9d8badc8
  • 17:29 elukey: executing varnish-backend-restart on cp1072 as attempt to mitigate "FetchError Could not get storage" and "ExpKill LRU_Fail" - T145661
  • 17:25 elukey: executing varnish-backend-restart on cp1074 as attempt to mitigate "FetchError Could not get storage" and "ExpKill LRU_Fail" - T145661
  • 17:23 twentyafterfour: Preparing to branch 1.30.0-wmf.1 [ T162954 ]
  • 16:08 elukey: playing with mw2146 for T163674
  • 16:00 elukey: stopping Hadoop daemons and shutting down analytics[1032-1033,1040].eqiad.wmnet - T132256
  • 15:20 moritzm: installing rpcbind/libtirpc security updates on ms1001
  • 15:15 moritzm: uploaded kubernetes 1.5.5-1+wmf1 to stretch-wikimedia/experimental
  • 15:02 urandom: starting instances restbase2005
  • 14:55 moritzm: repooled mw1264 after hardware error has been fixed (and scap pull)
  • 14:45 hashar: European SWAT completed
  • 14:39 bblack: varnish: varnishadm runtime set default_ttl=86400 for text+upload fe+be layers via cumin, to match deployed start-time changes in https://gerrit.wikimedia.org/r/#/c/352826/
  • 14:22 hashar@tin: Finished scap: (no justification provided) (duration: 03m 10s)
  • 14:19 hashar@tin: Started scap: (no justification provided)
  • 14:16 elukey: correction: reboot kafka2001 for kernel upgrades (eventbus codfw)
  • 14:16 elukey: reboot kafka1001 for kernel upgrades (eventbus codfw)
  • 14:10 hashar@tin: Finished scap: TwoColConflict update (duration: 19m 30s)
  • 14:09 marostegui: Stop MySQL and shutdown db1048 (phabricator slave) to replace BBU - T160731
  • 14:06 marostegui: Run pt-table-checksum on s7.fawiki - T163190
  • 13:51 hashar@tin: Started scap: TwoColConflict update
  • 13:49 hashar@tin: Synchronized php-1.29.0-wmf.21/extensions/TwoColConflict: BACKPORTS from master - T162806 T163886 (duration: 00m 41s)
  • 13:47 hashar@tin: Synchronized wmf-config/Wikibase-production.php: Enable sending Wikidata notification on Wikivoyage - T142103 (duration: 00m 39s)
  • 13:46 gehel: upgrade deployment-prep cluster to elasticsearch 5.3.2 - T163707
  • 13:44 hashar@tin: Synchronized wmf-config/InitialiseSettings.php: Create Autor and Portal namespaces on Spanish Wikisource - T164195 (duration: 00m 39s)
  • 13:39 gehel: cancel upgrading elasticsearch on relforge (plugin under test is missing a release for 5.3.2) - T163703
  • 13:35 hashar@tin: Synchronized wmf-config/InitialiseSettings.php: Allow page move only autopatrolled at hiwiki - T164239 (duration: 00m 42s)
  • 13:33 hashar@tin: Synchronized wmf-config/InitialiseSettings.php: Allow new page patrol for autoconfirmed users on bnwiki - T164159 (duration: 00m 40s)
  • 13:26 ayounsi@tin: Finished deploy [librenms/librenms@b10cc7c]: (no justification provided) (duration: 00m 04s)
  • 13:26 ayounsi@tin: Started deploy [librenms/librenms@b10cc7c]: (no justification provided)
  • 13:25 hashar@tin: Synchronized php-1.29.0-wmf.21/extensions/ContentTranslation/modules/tools/ext.cx.tools.template.js: Fix the container calculation for template editor - T163105 (duration: 00m 40s)
  • 13:23 gehel: upgrading elasticsearch on relforge - T163703
  • 13:11 reedy@tin: Synchronized wmf-config/extension-list: PageTriage to extension.json in extension-list (duration: 00m 39s)
  • 13:08 reedy@tin: Synchronized wmf-config/mobile.php: wfLoadExtension for ZeroBanner (duration: 00m 41s)
  • 13:02 moritzm: rebooting restbase2004 for update to Linux 4.9 and new OpenJDK
  • 12:34 gehel: upgrade ELK on deployment-logstash2
  • 12:19 moritzm: rebooting restbase2003 for update to Linux 4.9 and new OpenJDK
  • 11:47 marostegui: Stop replication at the same position on db1049 and db2023 - https://phabricator.wikimedia.org/T147166 https://phabricator.wikimedia.org/T130067
  • 11:45 marostegui: Disable replication codfw > eqiad on s5 - https://phabricator.wikimedia.org/T147166 https://phabricator.wikimedia.org/T130067
  • 11:35 moritzm: rebooting restbase2002 for update to Linux 4.9 and new OpenJDK
  • 11:27 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Restore original weight for db1097 - T147166 T130067 (duration: 00m 39s)
  • 11:03 elukey: forced net.netfilter.nf_conntrack_tcp_timeout_time_wait = 65 to all the kafka brokers
  • 10:39 ayounsi@tin: Finished deploy [librenms/librenms@259e998]: (no justification provided) (duration: 00m 09s)
  • 10:39 ayounsi@tin: Started deploy [librenms/librenms@259e998]: (no justification provided)
  • 10:35 akosiaris@tin: Finished deploy [librenms/librenms@259e998]: (no justification provided) (duration: 00m 02s)
  • 10:35 akosiaris@tin: Started deploy [librenms/librenms@259e998]: (no justification provided)
  • 10:34 elukey: reboot kafka1022 for kernel upgrades
  • 10:09 elukey: reboot kafka1020 for kernel upgrades
  • 09:57 moritzm: restarting hhvm on mw1190, deadlocked in HPHP::Treadmill::getAgeOldestRequest
  • 09:41 marostegui: Stop replication at the same position on db1097 and db2019 - https://phabricator.wikimedia.org/T147166 https://phabricator.wikimedia.org/T130067
  • 09:37 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1097 - T147166 T130067 (duration: 00m 41s)
  • 09:21 marostegui: Disable replication codfw > eqiad on s4 - https://phabricator.wikimedia.org/T147166 https://phabricator.wikimedia.org/T130067
  • 09:12 marostegui: Run pt-table-checksum on s7.eswiki - T163190
  • 09:07 hoo: Removed 2fa from global account Jcornelius (T164682)
  • 08:05 godog: roll-restart swift proxy for ratelimit middleware - T162793
  • 07:53 moritzm: uploaded kubernetes 1.4.2-6 for stretch-wikimedia to apt.wikimedia.org
  • 07:34 moritzm: removing unneeded rpcbind/nfs-common packages (T106477)
  • 07:31 marostegui: Stop replication at the same position on db1050 and db2028 - T147166 T130067
  • 07:27 marostegui: Disable replication codfw > eqiad on s6 - T147166 T130067
  • 07:12 marostegui@tin: Synchronized wmf-config/db-codfw.php: Repool es2019 - T149526 (duration: 00m 39s)
  • 07:11 elukey: reboot kafka1014 for kernel upgrades
  • 07:01 _joe_: installing the new version of python-service-checker across the fleet
  • 06:37 marostegui: Run pt-table-checksum on s7.cawiki - T163190
  • 06:01 marostegui@tin: Synchronized wmf-config/db-codfw.php: Repool db2038 - T162539 T163548 (duration: 00m 41s)
  • 05:54 marostegui: Deploy alter table on wikidatawiki.wb_terms on codfw master db2023 - https://phabricator.wikimedia.org/T162539 - https://phabricator.wikimedia.org/T163548
  • 02:27 l10nupdate@tin: ResourceLoader cache refresh completed at Tue May 9 02:27:46 UTC 2017 (duration 5m 58s)
  • 02:21 l10nupdate@tin: scap sync-l10n completed (1.29.0-wmf.21) (duration: 08m 17s)

2017-05-08

  • 23:21 bd808@tin: Synchronized wmf-config/wikitech.php: Revert "Disable creation of new forms on wikitech" (T53642) (duration: 01m 10s)
  • 22:54 bd808@tin: Finished deploy [striker/deploy@00e8545]: openstack: Role modifications require global admin rights (T164787) (duration: 00m 27s)
  • 22:54 bd808@tin: Started deploy [striker/deploy@00e8545]: openstack: Role modifications require global admin rights (T164787)
  • 22:17 bd808: Deleted 2fa for user Mdann52 on wikitech after verifying account ownership via ssh file creation. T164804
  • 22:01 andrewbogott: rebooting labservices1002 to mess with the bios
  • 21:55 bblack: depooled cp3035 (memory issues - already scheduled for FE restart to fix, which will repool when it's reached in the list...)
  • 21:25 mobrovac@tin: Finished deploy [graphoid/deploy@a288409]: Switched to npm-stored graph-shared, fix mapsnapshot - T164046 (duration: 01m 39s)
  • 21:24 mobrovac@tin: Started deploy [graphoid/deploy@a288409]: Switched to npm-stored graph-shared, fix mapsnapshot - T164046
  • 21:22 mobrovac@tin: Finished deploy [graphoid/deploy@a288409]: Switched to npm-stored graph-shared, fix mapsnapshot - T164046 (duration: 03m 51s)
  • 21:18 mobrovac@tin: Started deploy [graphoid/deploy@a288409]: Switched to npm-stored graph-shared, fix mapsnapshot - T164046
  • 21:17 mobrovac@tin: Finished deploy [graphoid/deploy@a288409]: Switched to npm-stored graph-shared, fix mapsnapshot - T164046 (duration: 00m 38s)
  • 21:17 mobrovac@tin: Started deploy [graphoid/deploy@a288409]: Switched to npm-stored graph-shared, fix mapsnapshot - T164046
  • 21:12 mobrovac@tin: Finished deploy [restbase/deploy@c70a1e1]: Remove the mobile-text end point - T158128 (duration: 06m 23s)
  • 21:06 mobrovac@tin: Started deploy [restbase/deploy@c70a1e1]: Remove the mobile-text end point - T158128
  • 21:05 arlolra@tin: Finished deploy [parsoid/deploy@0459ae3]: Updating Parsoid to 9d8badc8 (duration: 02m 43s)
  • 21:02 arlolra@tin: Started deploy [parsoid/deploy@0459ae3]: Updating Parsoid to 9d8badc8
  • 20:48 arlolra@tin: Finished deploy [parsoid/deploy@0459ae3]: Updating Parsoid to 9d8badc8 (duration: 01m 36s)
  • 20:47 arlolra@tin: Started deploy [parsoid/deploy@0459ae3]: Updating Parsoid to 9d8badc8
  • 20:37 gehel: silencing elasticsearch shard icinga check, recovery after upgrade is going to take a long time - T161908
  • 20:34 arlolra@tin: Finished deploy [parsoid/deploy@0459ae3]: Updating Parsoid to 9d8badc8 (duration: 04m 50s)
  • 20:30 arlolra@tin: Started deploy [parsoid/deploy@0459ae3]: Updating Parsoid to 9d8badc8
  • 20:27 gehel: restarted kibana on logstash cluster - T161908
  • 20:21 gehel: upgrading kibana on logstash cluster - T161908
  • 20:02 gehel: restarting elasticsearch on logstash cluster after upgrade - T161908
  • 19:47 gehel: logstash / elasticsearch downtime coming up - T161908
  • 19:34 bd808: Deployment of Striker for T162508 complete; will continue debugging the keystone issue that is preventing Tool Labs membership requests from being approved
  • 19:34 bblack: restarted varnishxcache service on cp3031, was malfunctioning and sending crazy stats to grafana...
  • 19:28 gehel: starting ELK (logstash) upgrade - T161908
  • 19:17 bd808@tin: Finished deploy [striker/deploy@3836477]: Implement Tool Labs membership application and processing (T162508) (duration: 00m 32s)
  • 19:17 bd808@tin: Started deploy [striker/deploy@3836477]: Implement Tool Labs membership application and processing (T162508)
  • 19:15 bd808: Forced puppet run on californium to provision new striker config settings
  • 19:07 bd808: Applied database migration for T162508 to striker database on m5-master
  • 18:58 MaxSem: Restarted tilerator and tileratorui across the cluster
  • 18:52 catrope@tin: Synchronized wmf-config/InitialiseSettings.php: T164621 (duration: 00m 39s)
  • 18:50 bblack: running varnish frontend restarts to fix memory sizing on 256G+ hosts over the next ~4.5 h (mostly text+upload hosts)
  • 18:49 bblack: cp4006 repooled (frontend restarted)
  • 18:45 catrope@tin: Synchronized wmf-config/InitialiseSettings.php: T164498 (duration: 00m 39s)
  • 18:44 bblack: running varnish frontend restarts to fix memory sizing on 96GB and 192GB hosts over the next ~45m (mostly maps+misc hosts)
  • 18:41 catrope@tin: Synchronized php-1.29.0-wmf.21/extensions/Popups/: T163198 (duration: 00m 39s)
  • 18:40 bblack@neodymium: conftool action : set/pooled=no; selector: name=cp4006.ulsfo.wmnet
  • 18:38 catrope@tin: Synchronized php-1.29.0-wmf.21/extensions/VisualEditor/extension.json: T164472 (duration: 00m 39s)
  • 18:36 catrope@tin: Synchronized php-1.29.0-wmf.21/extensions/VisualEditor/modules/ve-mw/dm/metaitems/ve.dm.MWFlaggedMetaItem.js: T164054 (duration: 00m 38s)
  • 18:33 catrope@tin: Synchronized php-1.29.0-wmf.21/includes: T100999 (duration: 01m 24s)
  • 18:33 maxsem@tin: Finished deploy [tilerator/deploy@001811e]: 001811e, was in testing for 3 weeks (duration: 00m 20s)
  • 18:32 maxsem@tin: Started deploy [tilerator/deploy@001811e]: 001811e, was in testing for 3 weeks
  • 18:30 catrope@tin: Synchronized wmf-config/InitialiseSettings.php: T164614 (duration: 00m 40s)
  • 18:26 catrope@tin: Synchronized static/images/project-logos/: T163048 (duration: 00m 39s)
  • 18:15 gehel: restarting wdqs-updater
  • 17:08 gehel@tin: Finished deploy [wdqs/wdqs@e637cf0]: (no justification provided) (duration: 01m 36s)
  • 17:07 gehel@tin: Started deploy [wdqs/wdqs@e637cf0]: (no justification provided)
  • 16:27 _joe_: installing the new service-checker on restbase2001,scb2001
  • 16:01 papaul: ganeti200[7-8] - signing puppet certs, salt-key, initial run
  • 15:40 papaul: OS install on ganeti200[7-8]
  • 15:28 bblack: cp4016 repooled
  • 14:23 _joe_: uploading new version of service-checker to reprepro
  • 14:20 zeljkof: eu swat finished!
  • 14:19 zfilipin@tin: Synchronized wmf-config/CommonSettings.php: SWAT: $wmgRelatedArticlesShowInSidebar is now undefined (duration: 00m 39s)
  • 14:19 marostegui: Run pt-table-checksum on s7.arwiki - T163190
  • 14:15 chasemp: touch /forcefsck && /sbin/reboot labservices1002
  • 14:09 zfilipin@tin: Synchronized wmf-config/InitialiseSettings.php: SWAT: Add Bengali logo to mobile site (T164652) (duration: 00m 39s)
  • 14:08 zfilipin@tin: Synchronized static/images/mobile/copyright/wikipedia-wordmark-bn.svg: SWAT: Add Bengali logo to mobile site (T164652) (duration: 00m 39s)
  • 14:02 zeljkof: extending eu swat for a few minutes
  • 13:55 zfilipin@tin: Synchronized wmf-config/InitialiseSettings.php: SWAT: pagePreviews: Fix NavPopups gadget detection (T164044) (duration: 00m 39s)
  • 13:47 chasemp: labservices1002 'touch /forcefsck && sudo reboot'
  • 13:45 zfilipin@tin: Synchronized wmf-config/CommonSettings.php: SWAT: Wikivoyage should show related pages in footer of skin (T164391) (duration: 00m 39s)
  • 13:44 zfilipin@tin: Synchronized wmf-config/InitialiseSettings.php: SWAT: Wikivoyage should show related pages in footer of skin (T164391) (duration: 00m 39s)
  • 13:42 moritzm: depooled mw1264 (set to inactive), since the host is down (T164725)
  • 13:07 moritzm: restarting cassandra on restbase2001 to pick up openjdk security updates
  • 11:15 ladsgroup@tin: Synchronized wmf-config/Wikibase-production.php: USe redis lockManager for change dispatching (T159826) (duration: 00m 56s)
  • 11:14 Amir1: start of ladsgroup@tin:/srv/mediawiki-staging$ scap sync-file wmf-config/Wikibase-production.php 'USe redis lockManager for change dispatching (T159826)'
  • 09:54 moritzm: upgrading mw1261-mw1264 to Linux 4.9
  • 09:30 godog: swift eqiad-prod: ms-be1028/ms-be1039 object weight 2000 - T160640
  • 09:25 elukey: rolling restart of cassandra on aqs* hosts to pick up new jvm upgrades
  • 09:17 godog: swift codfw-prod: more ms-be2001/ms-be2012 decom - T162785
  • 08:55 elukey: restart Kafka mirror maker on kafka101[24]
  • 08:47 elukey: reboot kafka1013 for kernel upgrades
  • 08:25 godog: swift eqiad-prod: ms-be1028/ms-be1039 container/account full weight - T160640
  • 08:06 Amir1: clean up party of ores_classification is done now (T159753) 10M rows deleted. Current number of rows: 76,586,043
  • 06:31 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Decommission db1024 - T162699 (duration: 00m 39s)
  • 06:30 marostegui@tin: Synchronized wmf-config/db-codfw.php: Decommission db1024 - T162699 (duration: 00m 39s)
  • 06:18 marostegui@tin: Synchronized wmf-config/db-codfw.php: Repool db2045, depool db2038 - T162539 T163548 (duration: 00m 40s)
  • 05:09 Amir1: start of cleaning up ores_classification rows for two hours (T159753)
  • 02:27 l10nupdate@tin: ResourceLoader cache refresh completed at Mon May 8 02:27:37 UTC 2017 (duration 5m 58s)
  • 02:21 l10nupdate@tin: scap sync-l10n completed (1.29.0-wmf.21) (duration: 08m 25s)

2017-05-07

  • 21:09 elukey: depooled cp4016.ulsfo.wmnet (sudo -i depool from localhost) due to issues with vhtcpd (segfaults in dmesg).
  • 17:20 andrewbogott: clearing out broken instances in the nova fullstack queue and restarting the tests.
  • 17:12 andrewbogott: rebooting labservices1002 in hopes of getting its IO unstuck
  • 16:52 andrewbogott: switching primary designate server from labservices1002 to labservices1001
  • 16:07 andrewbogott: restarted designate-central on labservices1002 due to many log messages like 'Deadlock detected. Retrying...'
  • 16:05 andrewbogott: restarted pdns and pdns-recursor on labcontrol1002
  • 09:08 ema: cp4018: restart vhtcpd and varnish services; repool
  • 08:43 elukey: depooled cp4018.ulsfo.wmnet (sudo -i depool from localhost) due to issues with HTCP
  • 02:27 l10nupdate@tin: ResourceLoader cache refresh completed at Sun May 7 02:27:14 UTC 2017 (duration 5m 59s)
  • 02:21 l10nupdate@tin: scap sync-l10n completed (1.29.0-wmf.21) (duration: 07m 49s)
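
Entries in this log record how long a sync took in two slightly different forms, "(duration: 07m 49s)" and "(duration 5m 59s)". The following is a minimal sketch, not part of any Wikimedia tooling, of how both forms could be turned into seconds when analysing the log. The names `DURATION_RE` and `duration_seconds` are hypothetical and were chosen for this illustration only.

```python
import re

# Matches both "(duration: 00m 39s)" and "(duration 5m 58s)" as they appear in this log.
DURATION_RE = re.compile(r"\(duration:?\s+(\d+)m\s+(\d+)s\)")


def duration_seconds(entry):
    """Return the logged duration in seconds, or None if the entry has none."""
    match = DURATION_RE.search(entry)
    if match is None:
        return None
    minutes, seconds = match.groups()
    return int(minutes) * 60 + int(seconds)


# Sample entries copied from the 2017-05-07 section.
entries = [
    "02:27 l10nupdate@tin: ResourceLoader cache refresh completed at Sun May 7 02:27:14 UTC 2017 (duration 5m 59s)",
    "02:21 l10nupdate@tin: scap sync-l10n completed (1.29.0-wmf.21) (duration: 07m 49s)",
    "09:08 ema: cp4018: restart vhtcpd and varnish services; repool",
]

for entry in entries:
    print(duration_seconds(entry))
```

For the three sample entries this prints 359, 469 and None. Entries without a duration are reported as None rather than 0, so manual actions are not mistaken for instantaneous syncs.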

2017-05-06

  • 02:30 l10nupdate@tin: ResourceLoader cache refresh completed at Sat May 6 02:30:10 UTC 2017 (duration 6m 2s)
  • 02:24 l10nupdate@tin: scap sync-l10n completed (1.29.0-wmf.21) (duration: 07m 38s)

2017-05-05

  • 20:21 demon@tin: Synchronized scap/scap.cfg: no-op (duration: 00m 39s)
  • 18:54 jynus@tin: Synchronized wmf-config/db-eqiad.php: Repool db1070 after maintenance (duration: 00m 40s)
  • 18:21 mutante: ocg1002 - apt-get clean'ed for disk space
  • 16:09 jynus: shutting down db1070 for hw maintenance T160392
  • 16:06 jynus@tin: Synchronized wmf-config/db-eqiad.php: Depool db1070 for hw maintenance (duration: 00m 39s)
  • 15:30 jynus: running schema change on puppet.fact_values (m1)
  • 15:28 marostegui: Deploy alter table on wikidatawiki.wb_terms - db2045 - https://phabricator.wikimedia.org/T162539 https://phabricator.wikimedia.org/T163548
  • 15:28 marostegui@tin: Synchronized wmf-config/db-codfw.php: Repool db2052, depool db2045 - T162539 T163548 (duration: 00m 41s)
  • 15:18 elukey: increase nginx error log verbosity on mw2146 as test for T163674 (correct task)
  • 15:13 elukey: increase nginx error log verbosity on mw2146 as test for T164586
  • 15:04 bblack: nginx upgrading to 1.11.10-1+wmf1 on cache_upload
  • 14:59 bblack: nginx upgrading to 1.11.10-1+wmf1 on cache_text
  • 14:41 bblack: restarting all maps+misc varnish frontends for mem sizing update (spread over the next ~1.5h)
  • 14:30 bblack: restarting varnish frontend on cp4010 (text) for mem size update
  • 13:45 moritzm: installing remaining freetype security updates
  • 13:40 akosiaris@tin: Finished deploy [librenms/librenms@c0aa3ca]: Deploy WMF specific pages to librenms (duration: 00m 03s)
  • 13:39 akosiaris@tin: Started deploy [librenms/librenms@c0aa3ca]: Deploy WMF specific pages to librenms
  • 13:28 urandom: T163292: bootstrapping Cassandra on restbase1008-c
  • 13:25 chasemp: labstore1005/1004 'dpkg -i /home/jmm/*deb' for rpcbind fix (these are new security packages from moritzm)
  • 12:34 akosiaris@tin: Finished deploy [librenms/librenms@9fa1391]: (no justification provided) (duration: 00m 07s)
  • 12:34 akosiaris@tin: Started deploy [librenms/librenms@9fa1391]: (no justification provided)
  • 12:16 elukey: reboot kafka1018 for kernel upgrades
  • 11:30 akosiaris@tin: Finished deploy [librenms/librenms@b25a5e9]: (no justification provided) (duration: 00m 03s)
  • 11:30 akosiaris@tin: Started deploy [librenms/librenms@b25a5e9]: (no justification provided)
  • 11:29 moritzm: installing openjdk-8 security updates/cassandra restarts on restbase staging clusters
  • 11:26 akosiaris@tin: Finished deploy [librenms/librenms@b25a5e9]: (no justification provided) (duration: 00m 02s)
  • 11:26 akosiaris@tin: Started deploy [librenms/librenms@b25a5e9]: (no justification provided)
  • 11:17 akosiaris@tin: Finished deploy [librenms/librenms@b25a5e9]: (no justification provided) (duration: 00m 02s)
  • 11:17 akosiaris@tin: Started deploy [librenms/librenms@b25a5e9]: (no justification provided)
  • 11:09 akosiaris@tin: Finished deploy [librenms/librenms@b25a5e9]: (no justification provided) (duration: 00m 14s)
  • 11:08 akosiaris@tin: Started deploy [librenms/librenms@b25a5e9]: (no justification provided)
  • 11:02 akosiaris@tin: Finished deploy [librenms/librenms@b25a5e9]: (no justification provided) (duration: 00m 23s)
  • 11:02 akosiaris@tin: Started deploy [librenms/librenms@b25a5e9]: (no justification provided)
  • 11:01 akosiaris@tin: Finished deploy [librenms/librenms@b25a5e9]: (no justification provided) (duration: 00m 01s)
  • 11:01 akosiaris@tin: Started deploy [librenms/librenms@b25a5e9]: (no justification provided)
  • 11:01 akosiaris@tin: Finished deploy [librenms/librenms@b25a5e9]: (no justification provided) (duration: 00m 02s)
  • 11:00 akosiaris@tin: Started deploy [librenms/librenms@b25a5e9]: (no justification provided)
  • 11:00 akosiaris@tin: Finished deploy [librenms/librenms@b25a5e9]: (no justification provided) (duration: 00m 03s)
  • 11:00 akosiaris@tin: Started deploy [librenms/librenms@b25a5e9]: (no justification provided)
  • 10:58 akosiaris@tin: Finished deploy [librenms/librenms@b25a5e9]: (no justification provided) (duration: 00m 02s)
  • 10:58 akosiaris@tin: Started deploy [librenms/librenms@b25a5e9]: (no justification provided)
  • 10:57 akosiaris@tin: Finished deploy [librenms/librenms@b25a5e9]: (no justification provided) (duration: 00m 13s)
  • 10:57 akosiaris@tin: Started deploy [librenms/librenms@b25a5e9]: (no justification provided)
  • 09:00 elukey: re-arm keyholder on mira (new scap key added for librenms)
  • 08:48 elukey: re-arming keyholder on naos
  • 08:46 godog: swift codfw-prod: ms-be2001 - ms-be2012 weight 700 - T162785
  • 07:49 marostegui: Deploy alter table on wikidatawiki.wb_terms - dbstore1002 - https://phabricator.wikimedia.org/T162539 https://phabricator.wikimedia.org/T163548
  • 07:11 marostegui: Deploy alter table on wikidatawiki.wb_terms - dbstore2001 - https://phabricator.wikimedia.org/T162539 https://phabricator.wikimedia.org/T163548
  • 06:45 ema: starting cache_upload upgrades to varnish 4.1.6-1wm1
  • 05:55 marostegui: Deploy alter table on wikidatawiki.wb_terms - db2052 - T162539 T163548
  • 05:55 marostegui@tin: Synchronized wmf-config/db-codfw.php: Repool db2059, depool db2052 - T162539 T163548 (duration: 00m 40s)
  • 04:21 mutante: scheduled long downtime for mailman I/O stats on fermium - until we find better ways to deal with the normal spikes causing alerts
  • 02:38 l10nupdate@tin: ResourceLoader cache refresh completed at Fri May 5 02:38:35 UTC 2017 (duration 5m 14s)
  • 02:33 l10nupdate@tin: scap sync-l10n completed (1.29.0-wmf.21) (duration: 08m 29s)
  • 01:26 urandom: T163292: starting bootstrap of restbase1018-b

2017-05-04

  • 20:06 maxsem@tin: Synchronized php-1.29.0-wmf.21/extensions/JsonConfig: https://gerrit.wikimedia.org/r/#/c/351749/ (duration: 00m 40s)
  • 19:01 addshore@tin: Synchronized wmf-config/InitialiseSettings.php: T164407 wgCognateReadOnly false for medium wikis (duration: 00m 39s)
  • 18:18 addshore@tin: Synchronized wmf-config/InitialiseSettings.php: T164407 wgCognateReadOnly false for small wikis (duration: 00m 40s)
  • 17:32 bblack: nginx upgrading to 1.11.10-1+wmf1 on cache_maps
  • 17:30 thcipriani@tin: Synchronized php-1.29.0-wmf.21/extensions/Cognate: Add stats tracking for CognateRepo method usage (duration: 00m 39s)
  • 17:01 thcipriani@tin: Synchronized wmf-config: Revert revert Enable Cognate for Wiktionary in Read Only mode T164407 (duration: 00m 40s)
  • 16:59 thcipriani@tin: Synchronized php-1.29.0-wmf.21/extensions/Cognate/src/CognateStore.php: Construct DBReadOnlyError with null db (duration: 00m 39s)
  • 16:55 urandom: T163292: Starting bootstrap of restbase1018-a
  • 16:49 thcipriani@tin: Synchronized wmf-config: Revert Enable Cognate for Wiktionary in Read Only mode T164407 (duration: 00m 40s)
  • 16:42 thcipriani@tin: Synchronized wmf-config: Enable Cognate for Wiktionary in Read Only mode T164407 (duration: 00m 40s)
  • 16:29 thcipriani@tin: Synchronized php-1.29.0-wmf.21/extensions/Cognate: SWAT: Add read only mode T164407 (duration: 00m 56s)
  • 16:18 bblack: nginx upgraded to 1.11.10-1+wmf1 on all cache_misc
  • 16:14 thcipriani@tin: Synchronized README: test tin is back (duration: 01m 06s)
  • 16:09 filippo@tin: scap aborted: README (duration: 00m 28s)
  • 16:09 filippo@tin: Started scap: README
  • 16:03 urandom: T160759: restoring default Cassandra tombstone_threshold in eqiad
  • 16:00 godog: switch deployment server back to tin.eqiad.wmnet
  • 15:57 jynus@naos: Synchronized wmf-config/db-eqiad.php: Remove all read traffic from x1, es2 & es3-master-eqiad (duration: 01m 08s)
  • 15:45 oblivian@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=swift-rw,name=codfw
  • 15:45 bblack: nginx upgraded to 1.11.10-1+wmf1 on cp1051 (cache_misc)
  • 15:42 bblack: nginx upgraded to 1.11.10-1+wmf1 on cp1045 (cache_misc)
  • 15:36 godog: run-puppet-agent on cache_upload in codfw/swift for swift a/p in codfw
  • 15:34 chasemp: add cwd to acl*procurement-review for phab S4
  • 15:32 godog: run-puppet-agent on cache_upload in codfw/swift for swift a/a
  • 15:31 oblivian:: Setting swift-rw in eqiad UP
  • 15:31 oblivian:: Setting swift-rw in codfw DOWN
  • 15:16 marostegui: Deploy alter table on wikidatawiki.wb_terms - db2059- https://phabricator.wikimedia.org/T162539 https://phabricator.wikimedia.org/T163548
  • 15:15 chasemp: labsdb1003 maintain-views --databases ptwikimedia,pawikisource,wbwikimedia,dtywiki --replace-all --debug T164103
  • 15:14 marostegui@naos: Synchronized wmf-config/db-codfw.php: Repool db2066, depool db2059 - T162539 T163548 (duration: 01m 06s)
  • 15:03 marostegui@naos: Synchronized wmf-config/db-eqiad.php: Restore db1070 original weight - T160392 (duration: 00m 57s)
  • 14:46 oblivian:: Setting wdqs in codfw UP
  • 14:44 oblivian:: Setting restbase-async in eqiad DOWN
  • 14:43 oblivian:: Setting restbase in codfw DOWN
  • 14:43 _joe_: forcing a puppet run on cache (text,maps, misc) in eqiad/codfw to complete the switchback
  • 14:40 oblivian:: Setting restbase in eqiad UP
  • 14:39 oblivian:: Setting restbase-async in codfw UP
  • 14:36 moritzm: installing mysql-connector-java security updates on hadoop cluster
  • 14:35 _joe_: running puppet on varnishes in eqiad (text,misc,maps) to pick up the a/a traffic to services
  • 14:29 jynus: dropping and recreating user for maintain-views on labsdb1001 T164103
  • 14:24 marostegui@naos: Synchronized wmf-config/db-eqiad.php: Increase db1070 weight - T160392 (duration: 01m 10s)
  • 14:23 chasemp: maintain-meta_p --databases dtywiki,pawikisource,ptwikimedia,wbwikimedia --debug labsdb1003 for T164103
  • 14:16 chasemp: maintain-meta_p --all-databases --purge --debug labsdb1001 for T164103
  • 14:09 marostegui@naos: Synchronized wmf-config/db-eqiad.php: Repool db1070 with less weight - T160392 (duration: 01m 16s)
  • 14:03 chasemp: maintain-meta_p --all-databases --purge --debug labsdb1009/1010/1011 for T164103
  • 13:31 gehel: restart services on maps eqiad
  • 13:21 dereckson@naos: Synchronized wmf-config/throttle.php: Lift Account registration limit for cywiki for an event / T164482 (duration: 01m 08s)
  • 13:18 gehel: restart services on maps codfw
  • 13:15 gehel: restart services on maps-test
  • 12:42 marostegui: Stop MySQL db1070 for maintenance - T160392
  • 12:40 marostegui@naos: Synchronized wmf-config/db-eqiad.php: Depool db1070 for maintenance - T160392 (duration: 01m 35s)
  • 12:28 marostegui: Deploy alter table enwiki.revision on dbstore1001 - T132416
  • 11:56 moritzm: installing mysql-connector-java security updates
  • 11:45 ema: starting cache_text upgrades to varnish 4.1.6-1wm1
  • 11:38 marostegui@naos: Synchronized wmf-config/db-eqiad.php: Remove db1022 from config files as it will be decommissioned - T163778 (duration: 01m 06s)
  • 11:36 marostegui@naos: Synchronized wmf-config/db-codfw.php: Remove db1022 from config files as it will be decommissioned - T163778 (duration: 01m 25s)
  • 10:48 moritzm: installing tomcat security updates
  • 10:22 elukey: executed DEL ocg_job_status on rdb1007:6379 (new ocg_job_status hash is stored on the ocg* hosts) - T159850
  • 10:11 moritzm: restarting hhvm on mediawiki canaries to pick up freetype security update
  • 10:05 ema: restart varnish-be on cp2024 without RT experiment
  • 09:40 elukey: stop kafka on kafka1012 and reboot the host for kernel upgrade
  • 09:16 joal@naos: Finished deploy [analytics/refinery@9d35029]: (no justification provided) (duration: 02m 58s)
  • 09:13 joal@naos: Started deploy [analytics/refinery@9d35029]: (no justification provided)
  • 08:50 marostegui@naos: Synchronized wmf-config/db-codfw.php: Remove db1040 from config files as it will be decommissioned - T164057 (duration: 00m 48s)
  • 08:49 marostegui@naos: Synchronized wmf-config/db-eqiad.php: Remove db1040 from config files as it will be decommissioned - T164057 (duration: 00m 55s)
  • 08:23 gehel: restart elasticsearch on relforge for JDK update
  • 07:59 marostegui@naos: Synchronized wmf-config/db-eqiad.php: Remove tempdb2001 from config files as it will be decommissioned - T161712 (duration: 01m 07s)
  • 07:58 marostegui@naos: Synchronized wmf-config/db-codfw.php: Remove tempdb2001 from config files as it will be decommissioned - T161712 (duration: 01m 25s)
  • 07:25 _joe_: restarted cp3043 backend varnish at 7:13 UTC while trying to debug issues
  • 06:58 moritzm: installing freetype security updates
  • 06:26 marostegui@naos: Synchronized wmf-config/db-codfw.php: Depool tempdb2001, no longer needed - T161712 (duration: 01m 08s)
  • 06:17 marostegui: Stop MySQL on tempdb2001 to take a backup and prepare to decommission - T161712
  • 06:10 marostegui: Deploy alter table on wikidatawiki.wb_terms - db2066 - T162539 T163548
  • 06:10 marostegui@naos: Synchronized wmf-config/db-codfw.php: Depool db2066 - T162539 T163548 (duration: 01m 25s)
  • 06:09 Dereckson: CentralAuth: Removed MediaWiki 2FA for Alexsh (T164265)
  • 06:03 marostegui: Deploy alter table on wikidatawiki.wb_terms - dbstore2002 - T162539 T163548
  • 02:31 l10nupdate@naos: ResourceLoader cache refresh completed at Thu May 4 02:31:22 UTC 2017 (duration 5m 21s)
  • 02:26 l10nupdate@naos: scap sync-l10n completed (1.29.0-wmf.21) (duration: 08m 02s)
  • 02:08 mobrovac@naos: Finished deploy [restbase/deploy@4d04dfd]: blacklist dewiki page, take 3a (duration: 08m 37s)
  • 02:00 urandom: T160759: lowering tombstone threshold to 1000 on all eqiad nodes
  • 01:59 mobrovac@naos: Started deploy [restbase/deploy@4d04dfd]: blacklist dewiki page, take 3a
  • 01:58 mobrovac@naos: Finished deploy [restbase/deploy@4d04dfd]: blacklist dewiki page, take 3 (duration: 03m 29s)
  • 01:54 mobrovac@naos: Started deploy [restbase/deploy@4d04dfd]: blacklist dewiki page, take 3
  • 01:51 mobrovac@naos: Finished deploy [restbase/deploy@4d04dfd]: Blacklist a page on dewiki (duration: 03m 28s)
  • 01:47 mobrovac@naos: Started deploy [restbase/deploy@4d04dfd]: Blacklist a page on dewiki
  • 01:47 mobrovac@naos: Finished deploy [restbase/deploy@4d04dfd]: Blacklist a page on dewiki (duration: 04m 12s)
  • 01:42 mobrovac@naos: Started deploy [restbase/deploy@4d04dfd]: Blacklist a page on dewiki
  • 01:22 urandom_: T160759: lowering tombstone_threshold on restbase1013 & restbase1014
  • 01:09 urandom_: T160759: starting restbase1012-a

2017-05-03

  • 22:59 RainbowSprinkles: gerrit: Quick restart to pick up logging config change
  • 22:47 ejegg: updated fundraising tools from 20afe9d to f2522cd
  • 22:23 ejegg: updated fundraising tools from a1e9342 to 20afe9d
  • 21:06 demon@naos: Synchronized README: No-op, forcing co-master sync (duration: 02m 28s)
  • 20:35 mutante: mw1167 - same as mw1166 (jobrunners) - there was a hhvm[12547]: Fatal error: unknown exception followed by mysql slow query, SELECT MASTER_TID_WAIT... | systemctl restart hhvm recovers it
  • 20:30 mutante: mw1166 - restart hhvm service (Fatal error: request has exceeded memory limit)
  • 20:13 urandom: T160759: restoring default tombstone thresholds, restbase10{3,4,6}
  • 19:57 mutante: mw1287 - also restarting hhvm (with systemctl restart)
  • 19:56 mutante: mw1287 - restarted crashed apache (proxy_fcgi:error)
  • 19:48 demon@naos: Finished scap: Cleaning up some unused branches, no-op (duration: 15m 13s)
  • 19:33 demon@naos: Started scap: Cleaning up some unused branches, no-op
  • 19:32 demon@naos: Pruned MediaWiki: 1.29.0-wmf.18 (duration: 00m 19s)
  • 19:30 demon@naos: Pruned MediaWiki: 1.29.0-wmf.20 [keeping static files] (duration: 00m 44s)
  • 19:27 ppchelko@naos: Finished deploy [restbase/deploy@76d909f]: Blacklist a title to fix cassandra OOMs T160759 attempt #2 - checks timeout (duration: 01m 39s)
  • 19:26 ppchelko@naos: Started deploy [restbase/deploy@76d909f]: Blacklist a title to fix cassandra OOMs T160759 attempt #2 - checks timeout
  • 19:25 ppchelko@naos: Finished deploy [restbase/deploy@76d909f]: Blacklist a title to fix cassandra OOMs T160759 (duration: 07m 39s)
  • 19:18 ppchelko@naos: Started deploy [restbase/deploy@76d909f]: Blacklist a title to fix cassandra OOMs T160759
  • 18:48 papaul: db2084 - signing puppet certs, salt-key, initial run
  • 18:48 urandom: T160759: reducing tombstone threshold to 1000, restbase1014
  • 18:46 urandom: T160759: reducing tombstone threshold to 1000, restbase1016
  • 18:39 urandom: T160759: reducing tombstone threshold to 1000, restbase1013
  • 18:35 urandom: restarting restbase1016-c
  • 18:34 urandom: restarting restbase1013-b
  • 18:00 bblack: restart cp2005 backend (lag)
  • 17:34 moritzm: uploaded openjdk-8 u131 to apt.wikimedia.org
  • 17:14 jynus@naos: Synchronized wmf-config/InitialiseSettings.php: Disable cognate- it is causing an outage on x1 (duration: 01m 06s)
  • 16:30 jynus@naos: Synchronized wmf-config/db-eqiad.php: Fine-tune per-server load to reduce db connection errors (duration: 01m 27s)
  • 16:17 mutante: install2002 / db2084 - reverting live hack, re-enabling puppet. db2084 doesn't even talk to DHCP; all other new db servers are fine, just this one out of 22 is not. Seems to be an actually broken NIC: cable was switched, switch config was checked too
  • 16:08 mutante: install2002 - temp stop puppet to debug dhcp issue of db2084
  • 15:13 catrope@naos: Synchronized php-1.29.0-wmf.21/includes/logging/LogPager.php: Replace FORCE INDEX(ls_field_val) with IGNORE INDEX(ls_log_id) (https://gerrit.wikimedia.org/r/#/c/351653/ for T17441) (duration: 01m 14s)
  • 15:09 RoanKattouw: Live-hacked (cherry-picked) https://gerrit.wikimedia.org/r/#/c/351653/ onto naos and synced to mwdebug1002 for testing
  • 14:54 gehel: restart of elasticsearch on relforge
  • 14:43 END: (PASS) - Rolling restart of parsoid in codfw and eqiad - t09_restart_parsoid (switchdc/oblivian@neodymium)
  • 14:27 START: - Rolling restart of parsoid in codfw and eqiad - t09_restart_parsoid (switchdc/oblivian@neodymium)
  • 14:26 END: (PASS) - Update Tendril tree to start from the core DB masters in eqiad - t09_tendril (switchdc/oblivian@neodymium)
  • 14:25 START: - Update Tendril tree to start from the core DB masters in eqiad - t09_tendril (switchdc/oblivian@neodymium)
  • 14:25 godog: start swiftrepl on ms-fe1005
  • 14:24 END: (PASS) - Start MediaWiki jobrunners, videoscalers and maintenance in eqiad - t09_start_maintenance (switchdc/oblivian@neodymium)
  • 14:22 START: - Start MediaWiki jobrunners, videoscalers and maintenance in eqiad - t09_start_maintenance (switchdc/oblivian@neodymium)
  • 14:21 END: (PASS) - Restore the TTL of all the MediaWiki read-write discovery records and cleanup confd stale files - t09_restore_ttl (switchdc/oblivian@neodymium)
  • 14:21 START: - Restore the TTL of all the MediaWiki read-write discovery records and cleanup confd stale files - t09_restore_ttl (switchdc/oblivian@neodymium)
  • 14:20 END: (PASS) - Set MediaWiki in read-write mode in eqiad (db-eqiad config already merged and git pulled) - t08_stop_mediawiki_readonly (switchdc/oblivian@neodymium)
  • 14:20 MediaWiki: read-only period ends at: 2017-05-03 14:20:28.286697 (switchdc/oblivian@neodymium)
  • 14:20 root@naos: Synchronized wmf-config/db-eqiad.php: Set MediaWiki in read-write mode in datacenter eqiad (duration: 00m 32s)
  • 14:19 START: - Set MediaWiki in read-write mode in eqiad (db-eqiad config already merged and git pulled) - t08_stop_mediawiki_readonly (switchdc/oblivian@neodymium)
  • 14:19 END: (PASS) - Set core DB masters in read-write mode in eqiad, ensure masters in codfw are read-only - t07_coredb_masters_readwrite (switchdc/oblivian@neodymium)
  • 14:19 START: - Set core DB masters in read-write mode in eqiad, ensure masters in codfw are read-only - t07_coredb_masters_readwrite (switchdc/oblivian@neodymium)
  • 14:19 END: (PASS) - Switch the Redis masters from codfw to eqiad and invert the replication - t06_redis (switchdc/oblivian@neodymium)
  • 14:19 START: - Switch the Redis masters from codfw to eqiad and invert the replication - t06_redis (switchdc/oblivian@neodymium)
  • 14:18 END: (PASS) - Switch traffic flow to the appservers from codfw to eqiad - t05_switch_traffic (switchdc/oblivian@neodymium)
  • 14:17 START: - Switch traffic flow to the appservers from codfw to eqiad - t05_switch_traffic (switchdc/oblivian@neodymium)
  • 14:16 END: (FAIL) - Switch MediaWiki master datacenter and read-write discovery records from codfw to eqiad - t05_switch_datacenter (switchdc/oblivian@neodymium)
  • 14:16 root@naos: Synchronized wmf-config/CommonSettings.php: Switch MediaWiki active datacenter to eqiad (duration: 00m 31s)
  • 14:15 START: - Switch MediaWiki master datacenter and read-write discovery records from codfw to eqiad - t05_switch_datacenter (switchdc/oblivian@neodymium)
  • 14:15 END: (PASS) - Wipe and warmup caches in eqiad - t04_cache_wipe (switchdc/oblivian@neodymium)
  • 14:12 elukey: restart kafka-mirror-main-eqiad_to_analytics.service on kafka1012
  • 14:12 END: (PASS) - Resync the redis for jobqueues in eqiad with the masters in codfw - t04_resync_redis (switchdc/oblivian@neodymium)
  • 14:09 START: - Wipe and warmup caches in eqiad - t04_cache_wipe (switchdc/oblivian@neodymium)
  • 14:08 START: - Resync the redis for jobqueues in eqiad with the masters in codfw - t04_resync_redis (switchdc/oblivian@neodymium)
  • 14:08 END: (PASS) - Set core DB masters in read-only mode in codfw, ensure all masters are read-only - t03_coredb_masters_readonly (switchdc/oblivian@neodymium)
  • 14:08 START: - Set core DB masters in read-only mode in codfw, ensure all masters are read-only - t03_coredb_masters_readonly (switchdc/oblivian@neodymium)
  • 14:08 END: (PASS) - Set MediaWiki in read-only mode in codfw (db-codfw config already merged and git pulled) - t02_start_mediawiki_readonly (switchdc/oblivian@neodymium)
  • 14:07 root@naos: Synchronized wmf-config/db-codfw.php: Set MediaWiki in read-only mode in datacenter codfw (duration: 00m 45s)
  • 14:07 MediaWiki: read-only period starts at: 2017-05-03 14:07:08.261300 (switchdc/oblivian@neodymium)
  • 14:07 START: - Set MediaWiki in read-only mode in codfw (db-codfw config already merged and git pulled) - t02_start_mediawiki_readonly (switchdc/oblivian@neodymium)
  • 14:06 END: (PASS) - Stop MediaWiki jobrunners, videoscalers and cronjobs in codfw - t01_stop_maintenance (switchdc/oblivian@neodymium)
  • 14:01 START: - Stop MediaWiki jobrunners, videoscalers and cronjobs in codfw - t01_stop_maintenance (switchdc/oblivian@neodymium)
  • 14:00 godog: stop swiftrepl on ms-fe1005
  • 13:59 END: (PASS) - Reduce the TTL of all the MediaWiki read-write discovery records - t00_reduce_ttl (switchdc/oblivian@neodymium)
  • 13:59 START: - Reduce the TTL of all the MediaWiki read-write discovery records - t00_reduce_ttl (switchdc/oblivian@neodymium)
  • 13:59 END: (PASS) - Disabling puppet on selected hosts in codfw and eqiad - t00_disable_puppet (switchdc/oblivian@neodymium)
  • 13:58 START: - Disabling puppet on selected hosts in codfw and eqiad - t00_disable_puppet (switchdc/oblivian@neodymium)
  • 13:16 hashar: Restarting Jenkins
  • 13:06 marostegui: db1028: Increased /srv/ by 20G to clear the warning
  • 11:59 moritzm: rebooted kubernetes1002, not 1003
  • 11:59 moritzm: rebooting kubernetes1003 for update to Linux 4.9
  • 11:39 moritzm: rebooting kubernetes1001 for update to Linux 4.9
  • 11:37 oblivian@naos: Synchronized wmf-config: Changing the read-only reason for the DC switchover (T164177) (duration: 01m 20s)
  • 11:25 moritzm: uploaded nodepool 0.1.1+wmf7 to apt.wikimedia.org
  • 11:23 hashar: Upgrading Jenkins 2.46.1 -> 2.46.2 - T144106
  • 11:16 jynus: restarting replication on s*, and x1 eqiad -> codfw
  • 11:02 hashar: Restarting Nodepool
  • 10:58 moritzm: upgrading nodepool on labnodepool1001 to a package including https://gerrit.wikimedia.org/r/351608
  • 10:18 END: (PASS) - Switch MediaWiki master datacenter and read-write discovery records from eqiad to codfw - t05_switch_datacenter (switchdc/oblivian@neodymium)
  • 10:17 START: - Switch MediaWiki master datacenter and read-write discovery records from eqiad to codfw - t05_switch_datacenter (switchdc/oblivian@neodymium)
  • 10:14 END: (PASS) - Set MediaWiki in read-write mode in codfw (db-codfw config already merged and git pulled) - t08_stop_mediawiki_readonly (switchdc/oblivian@neodymium)
  • 10:14 START: - Set MediaWiki in read-write mode in codfw (db-codfw config already merged and git pulled) - t08_stop_mediawiki_readonly (switchdc/oblivian@neodymium)
  • 10:14 END: (PASS) - Set MediaWiki in read-only mode in eqiad (db-eqiad config already merged and git pulled) - t02_start_mediawiki_readonly (switchdc/oblivian@neodymium)
  • 10:13 START: - Set MediaWiki in read-only mode in eqiad (db-eqiad config already merged and git pulled) - t02_start_mediawiki_readonly (switchdc/oblivian@neodymium)
  • 10:13 _joe_: testing reverted steps of switchdc, non-dry-run --dc-from eqiad --dc-to codfw (should be noop)
  • 10:05 moritzm: installing icu security updates on trusty (jessie already fixed)
  • 09:50 marostegui: Restart db1097 to change its binlog to STATEMENT - T155099
  • 09:19 elukey: reboot mc[1019-1036].eqiad.wmnet for kernel upgrades
  • 09:18 moritzm: rebooting restbase1018 for update to Linux 4.9
  • 09:05 godog: rebuild mismounted FSes on ms-be1035 - T163673
  • 08:53 _joe_: rebooting restbase1018 T163280
  • 08:24 _joe_: deactivating restbase1018-vg for RAID failover and rebuild T163280
  • 08:01 hashar: Rolling back Jenkins 2.46.2 -> 2.46.1 - T144106
  • 07:53 hashar: Upgrading Jenkins 2.46.1 -> 2.46.2 - T144106
  • 07:42 _joe_: rebuilding RAIDs on restbase1018 T163280
  • 07:35 hashar: Restarting Nodepool to catch up with python-jenkins 0.4.14
  • 07:35 moritzm: updated python-jenkins on labnodepool1001 to 0.4.14 (needed by latest Jenkins LTS)
  • 02:48 l10nupdate@naos: ResourceLoader cache refresh completed at Wed May 3 02:48:33 UTC 2017 (duration 5m 21s)
  • 02:43 l10nupdate@naos: scap sync-l10n completed (1.29.0-wmf.21) (duration: 14m 02s)
  • 01:41 mutante: kubernetes - puppet fails because "E: Unable to locate package cni"

2017-05-02

  • 23:42 TimStarling: EtcdConfig changes all reverted
  • 23:17 tstarling@puppetmaster1001: conftool action : set/@read-only.yaml; selector: name=ReadOnly,scope=eqiad
  • 23:07 TimStarling: scap pull on mw2017 and mwdebug1001 for etcd testing
  • 23:00 TimStarling: locking scap on naos for deployment of EtcdConfig https://gerrit.wikimedia.org/r/#/c/351132/
  • 22:57 _joe_: upgrading python-conftool across the fleet
  • 22:38 mutante: gerrit (cobalt/gerrit2001) - deployed firewall change to allow ssh between gerrit servers for clustering, new iptables rules exist now (T152525)
  • 21:52 jynus: running previously failed alter tables on s3-eqiad T163912
  • 21:33 jynus: creating missing math table on bdwikimedia (s3)
  • 20:04 hashar: Restarting Jenkins for plugin rollback
  • 17:51 bblack: codfw->eqiad switchback: end-user edge traffic back to normal @ eqiad ( https://gerrit.wikimedia.org/r/#/c/351330/ ) - 10 minute TTL for bulk traffic pattern shift starts now.
  • 17:50 mobrovac@naos: Finished deploy [restbase/deploy@6adb0f2]: Include displaytitle and page_id in the summary output and bump the content type version - T163729 T164079 (duration: 06m 04s)
  • 17:48 papaul: new db servers: signing puppet certs, salt-key, initial run
  • 17:44 mobrovac@naos: Started deploy [restbase/deploy@6adb0f2]: Include displaytitle and page_id in the summary output and bump the content type version - T163729 T164079
  • 17:40 END: (PASS) - Start MediaWiki jobrunners, videoscalers and maintenance in codfw - t09_start_maintenance (switchdc/volans@neodymium)
  • 17:39 mobrovac@naos: Finished deploy [restbase/deploy@6adb0f2]: (no justification provided) (duration: 01m 34s)
  • 17:38 START: - Start MediaWiki jobrunners, videoscalers and maintenance in codfw - t09_start_maintenance (switchdc/volans@neodymium)
  • 17:37 mobrovac@naos: Started deploy [restbase/deploy@6adb0f2]: (no justification provided)
  • 17:37 END: (PASS) - Restore the TTL of all the MediaWiki read-write discovery records and cleanup confd stale files - t09_restore_ttl (switchdc/volans@neodymium)
  • 17:37 START: - Restore the TTL of all the MediaWiki read-write discovery records and cleanup confd stale files - t09_restore_ttl (switchdc/volans@neodymium)
  • 17:35 END: (PASS) - Set MediaWiki in read-write mode in codfw - t08_stop_mediawiki_readonly (switchdc/volans@neodymium)
  • 17:35 MediaWiki: read-only period ends at: 2017-05-02 17:35:48.111079 (switchdc/volans@neodymium)
  • 17:35 START: - Set MediaWiki in read-write mode in codfw - t08_stop_mediawiki_readonly (switchdc/volans@neodymium)
  • 17:35 oblivian@puppetmaster1001: conftool action : set/val=test; selector: name=ReadOnly,scope=codfw
  • 17:33 END: (PASS) - Set core DB masters in read-write mode in codfw, ensure masters in eqiad are read-only - t07_coredb_masters_readwrite (switchdc/volans@neodymium)
  • 17:33 START: - Set core DB masters in read-write mode in codfw, ensure masters in eqiad are read-only - t07_coredb_masters_readwrite (switchdc/volans@neodymium)
  • 17:32 END: (PASS) - Switch the Redis masters from eqiad to codfw and invert the replication - t06_redis (switchdc/volans@neodymium)
  • 17:32 START: - Switch the Redis masters from eqiad to codfw and invert the replication - t06_redis (switchdc/volans@neodymium)
  • 17:31 END: (PASS) - Switch MediaWiki master datacenter and read-write discovery records from eqiad to codfw - t05_switch_datacenter (switchdc/volans@neodymium)
  • 17:31 START: - Switch MediaWiki master datacenter and read-write discovery records from eqiad to codfw - t05_switch_datacenter (switchdc/volans@neodymium)
  • 17:23 END: (FAIL) - Switch MediaWiki master datacenter and read-write discovery records from eqiad to codfw - t05_switch_datacenter (switchdc/volans@neodymium)
  • 17:23 START: - Switch MediaWiki master datacenter and read-write discovery records from eqiad to codfw - t05_switch_datacenter (switchdc/volans@neodymium)
  • 17:20 END: (PASS) - Switch traffic flow to the appservers from eqiad to codfw - t05_switch_traffic (switchdc/volans@neodymium)
  • 17:17 START: - Switch traffic flow to the appservers from eqiad to codfw - t05_switch_traffic (switchdc/volans@neodymium)
  • 17:08 END: (FAIL) - Switch MediaWiki master datacenter and read-write discovery records from eqiad to codfw - t05_switch_datacenter (switchdc/volans@neodymium)
  • 17:08 START: - Switch MediaWiki master datacenter and read-write discovery records from eqiad to codfw - t05_switch_datacenter (switchdc/volans@neodymium)
  • 17:05 catrope@naos: Synchronized php-1.29.0-wmf.21/extensions/VisualEditor/modules/ve-mw/init/ve.init.mw.ArticleTarget.js: T164157 (duration: 01m 00s)
  • 17:03 END: (FAIL) - Set core DB masters in read-only mode in eqiad, ensure all masters are read-only - t03_coredb_masters_readonly (switchdc/volans@neodymium)
  • 17:03 START: - Set core DB masters in read-only mode in eqiad, ensure all masters are read-only - t03_coredb_masters_readonly (switchdc/volans@neodymium)
  • 16:58 END: (FAIL) - Set MediaWiki in read-only mode in eqiad - t02_start_mediawiki_readonly (switchdc/volans@neodymium)
  • 16:57 MediaWiki: read-only period starts at: 2017-05-02 16:57:37.952132 (switchdc/volans@neodymium)
  • 16:57 START: - Set MediaWiki in read-only mode in eqiad - t02_start_mediawiki_readonly (switchdc/volans@neodymium)
  • 16:56 ppchelko@naos: Finished deploy [restbase/deploy@6adb0f2]: Summary endpoint enhancements. Restart after a check timeout (duration: 07m 56s)
  • 16:53 END: (FAIL) - Stop MediaWiki jobrunners, videoscalers and cronjobs in eqiad - t01_stop_maintenance (switchdc/volans@neodymium)
  • 16:53 START: - Stop MediaWiki jobrunners, videoscalers and cronjobs in eqiad - t01_stop_maintenance (switchdc/volans@neodymium)
  • 16:52 END: (PASS) - Disabling puppet on selected hosts in eqiad and codfw - t00_disable_puppet (switchdc/volans@neodymium)
  • 16:51 START: - Disabling puppet on selected hosts in eqiad and codfw - t00_disable_puppet (switchdc/volans@neodymium)
  • 16:51 END: (PASS) - Reduce the TTL of all the MediaWiki read-write discovery records - t00_reduce_ttl (switchdc/volans@neodymium)
  • 16:50 START: - Reduce the TTL of all the MediaWiki read-write discovery records - t00_reduce_ttl (switchdc/volans@neodymium)
  • 16:50 END: (FAIL) - Reduce the TTL of all the MediaWiki read-write discovery records - t00_reduce_ttl (switchdc/volans@neodymium)
  • 16:50 START: - Reduce the TTL of all the MediaWiki read-write discovery records - t00_reduce_ttl (switchdc/volans@neodymium)
  • 16:48 ppchelko@naos: Started deploy [restbase/deploy@6adb0f2]: Summary endpoint enhancements. Restart after a check timeout
  • 16:47 volans: testing (not dry-run) tasks for tomorrow's switchover in reverse mode eqiad->codfw
  • 16:43 ppchelko@naos: Started deploy [restbase/deploy@6adb0f2]: Summary endpoint enhancements. Restart after a check fail
  • 16:42 ppchelko@naos: Finished deploy [restbase/deploy@6adb0f2]: Summary endpoint enhancements (duration: 05m 47s)
  • 16:37 ppchelko@naos: Started deploy [restbase/deploy@6adb0f2]: Summary endpoint enhancements
  • 16:36 END: (PASS) - Wipe and warmup caches in codfw - t04_cache_wipe (switchdc/oblivian@neodymium)
  • 16:32 END: (PASS) - Resync the redis for jobqueues in eqiad with the masters in codfw - t04_resync_redis (switchdc/oblivian@neodymium)
  • 16:32 _joe_: message about cache warmup is wrong, it is being executed in eqiad
  • 16:29 START: - Resync the redis for jobqueues in eqiad with the masters in codfw - t04_resync_redis (switchdc/oblivian@neodymium)
  • 16:29 START: - Wipe and warmup caches in codfw - t04_cache_wipe (switchdc/oblivian@neodymium)
  • 16:29 _joe_: testing (not dry-run) cache wipe/warmup and redis resync for the switchover codfw->eqiad
  • 16:25 papaul: OS install on new db servers
  • 16:16 elukey@naos: Synchronized wmf-config/ProductionServices.php: Replace Redis lock IPs after hw refresh (duration: 01m 16s)
  • 15:53 oblivian@puppetmaster1001: conftool action : set/@read-only.yaml; selector: name=ReadOnly,scope=eqiad
  • 15:36 ema: cache_misc: upgrade varnish to 4.1.6-1wm1
  • 15:24 _joe_: restarting confd in eqiad/esams to pick up the server change
  • 15:20 godog: add 100G to graphite1003 and graphite2002
  • 15:01 elukey: stopped and masked memcached on mc10[01-18].eqiad.wmnet
  • 14:35 moritzm: rebooting rdb1007 for update to latest 4.4 kernel
  • 14:22 moritzm: rebooting rdb1005 for update to latest 4.4 kernel
  • 13:52 moritzm: rebooting rdb1003 for update to latest 4.4 kernel
  • 13:39 moritzm: rebooting rdb1001 for update to latest 4.4 kernel
  • 13:26 gehel: stopping load on elastic2020 - T149006
  • 13:15 ema: cache_maps: upgrade varnish to 4.1.6-1wm1
  • 13:13 gehel: load testing elastic2020 before putting it back in the cluster - T149006
  • 13:03 godog: rebuild mismounted FSes on ms-be1036 - T163673
  • 12:22 moritzm: rebooting rdb1008 for kernel update to Linux 4.9
  • 12:19 oblivian@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=pdf,name=ocg1001.eqiad.wmnet
  • 12:15 _joe_: manually set ocg1001,3 to be redis slaves of ocg1002
  • 11:47 moritzm: rebooting rdb1006 for kernel update to Linux 4.9
  • 11:37 gehel: restart of relforge cluster to activate hebrew plugin
  • 11:30 moritzm: rebooting rdb1004 for kernel update to Linux 4.9
  • 11:23 hashar: Restarting Nodepool
  • 11:23 moritzm: downgraded python-jenkins on labnodepool1001 to 0.2.1 (0.4.11 is still broken with the new Jenkins LTS)
  • 11:06 moritzm: rebooting rdb1002 for kernel update to Linux 4.9
  • 10:51 hashar: Restarting Nodepool with python-jenkins 0.4.11
  • 10:50 moritzm: upgrading python-jenkins on labnodepool1001 to 0.4.11
  • 10:44 akosiaris: create new ganeti nodegroup called row_A holding ganeti2005, ganeti2006. Renamed the default nodegroup to row_B. T164011
  • 10:20 elukey: restart ocg on ocg1002 (localhost:8000 - frontend - not reachable)
  • 10:12 hashar: Upgrading Jenkins to 2.46.1 - T144106
  • 10:11 jynus: stopping replication on db1015
  • 09:58 END: (PASS) - Resync the redis for jobqueues in eqiad with the masters in codfw - t04_resync_redis (switchdc/oblivian@neodymium)
  • 09:56 START: - Resync the redis for jobqueues in eqiad with the masters in codfw - t04_resync_redis (switchdc/oblivian@neodymium)
  • 09:55 _joe_: testing pre-switchover the step to restart & resync redises in dc_to (eqiad)
  • 09:48 jynus@naos: Synchronized wmf-config/db-codfw.php: Add db1097 (duration: 01m 00s)
  • 09:47 jynus@naos: Synchronized wmf-config/db-eqiad.php: Depool db1015 & add db1097 (duration: 01m 17s)
  • 09:36 hashar: Jenkins/CI is back up!
  • 09:34 hashar: Nodepool cannot add instances to Jenkins any more. Rolling back Jenkins to 2.32.3
  • 09:29 akosiaris: Set description for ganeti2005, ganeti2006 on asw-a-codfw. T164011
  • 09:27 akosiaris: create interface range ganeti on asw-a-codfw. T164011
  • 09:24 akosiaris: remove configuration from ge-8/0/0, ge-8/0/3 from asw-b-codfw for ganeti2005, ganeti2006 move to row A. T164011
  • 09:21 hashar: Starting Nodepool
  • 09:16 hashar: Stopping Nodepool
  • 09:14 hashar: OpenStack / wmflabs fails to create new instances
  • 08:40 hashar: Upgrading Jenkins to 2.46.2 - T144106
  • 08:40 elukey: run puppet and restart nutcracker on eqiad hosts with profile::mediawiki::nutcracker
  • 08:33 hashar: Upgrading Jenkins to 2.32.3 - T144106
  • 08:32 elukey: stop and mask redis on mc1001-mc1018 - T137345
  • 08:26 hashar: Upgrading Jenkins to 2.19.4 - T144106
  • 08:14 hashar: Installing Jenkins Pipeline plugin
  • 08:04 hashar: Installing Jenkins plugin Pipeline: Stage View https://plugins.jenkins.io/pipeline-stage-view
  • 08:04 hashar: Upgrading Jenkins to 2.7.4 - T144106
  • 07:59 elukey: Swap mc1001->mc1012 with mc1019->mc2030 - T137345 (more informative :)
  • 07:58 elukey: Swap mc1001->mc1012 with mc1019->mc2030
  • 07:36 _joe_: starting etcd replication codfw => eqiad
  • 06:46 _joe_: disabling etcd auth on conf1*, converting to use nginx for TLS/auth T159687
  • 03:10 mattflaschen@naos: Synchronized php-1.29.0-wmf.21/extensions/FlaggedRevs/: Urgent deploy: Fix FlaggedRevs fatal, and also a filter issue: T164096 and T164049 (duration: 00m 56s)
  • 02:45 tstarling@naos: Synchronized php-1.29.0-wmf.21/includes/config/EtcdConfig.php: EtcdConfig backported bug fixes (duration: 01m 02s)
  • 02:34 tstarling@naos: Synchronized wmf-config/CommonSettings.php: siteinfo hook (duration: 02m 39s)
  • 00:33 tstarling@puppetmaster1001: conftool action : set/@read-write.yaml; selector: name=ReadOnly
  • 00:33 tstarling@puppetmaster1001: conftool action : set/@dc-codfw.yaml; selector: name=WMFMasterDatacenter
  • 00:25 TimStarling: populating production etcd with initial mediawiki config keys

2017-05-01

  • 23:41 mutante: netmon1002 - signed puppet cert, initial puppet run, accept salt-key,.. (T159756)
  • 23:15 mutante: netmon1002 - boot into PXE, initial OS install (T159756)
  • 23:06 bd808: Ran puppet cert clean striker-deploy03.striker.eqiad.wmflabs on labcontrol1001
  • 19:43 ejegg: updated payments-wiki from 4c56302 to 57451de
  • 19:10 mobrovac@naos: Finished deploy [mobileapps/deploy@b5afcb8]: Forced deploy to bring the targets to the current version (duration: 02m 08s)
  • 19:08 mobrovac@naos: Started deploy [mobileapps/deploy@b5afcb8]: Forced deploy to bring the targets to the current version
  • 18:46 mutante: temp. re-enabling puppet on restbase1018 and running it once to fix icinga config syntax error. then disabling it again. restbase service stopped before and after. this box has a broken disk.
  • 18:35 mutante: brought mc1018 back up, ran puppet on it and then on Icinga. parent was adjusted from asw-d-eqiad to asw2-2-eqiad. reduced icinga config errors by 50% :p (1 of 2 left, restbase1018)
  • 18:28 mutante: powercycling mc1018
  • 18:19 mutante: manually removed asw-d-eqiad remnants from /etc/icinga/puppet_hosts.cfg to fix icinga config after gerrit:351167 / T148506. fixes Icinga config error. then puppet adds it back
  • 18:03 andrewbogott: restarting nova-fullstack tests but saving instance 2d60e8c5-fb2a-4681-ac0a-ae2162bb13fb for future research
  • 17:03 mutante: phab2001 - start/stop phd service - that fixed "systemd state" icinga check, even though phd does not run just like before
  • 16:53 bblack: reverting inter-caching routing from codfw-switchover period: https://wikitech.wikimedia.org/wiki/Switch_Datacenter#Switchback
  • 16:52 bblack@neodymium: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=cache_upload,name=cp107[1234].eqiad.wmnet
  • 16:19 mobrovac@naos: Finished deploy [citoid/deploy@747777f]: Remove mwDeprecated - T93514 (duration: 02m 19s)
  • 16:17 mobrovac@naos: Started deploy [citoid/deploy@747777f]: Remove mwDeprecated - T93514
  • 15:46 jynus: shutting down db1063 for maintenance T164107
  • 15:13 bblack: restarting varnish backend on cp2002 (mailbox issues)
  • 12:58 Amir1: cleaning ores_classification rows for half an hour or so (T159753)
  • 11:31 jynus: running alter table on categorylinks on db1054, 68, 62 T164185
  • 11:25 jynus: running alter table on enwiki.categorylinks on db1052 T164185
  • 03:46 tstarling@naos: Synchronized wmf-config: https://gerrit.wikimedia.org/r/#/c/347537/ (duration: 01m 01s)
  • 03:44 tstarling@naos: Synchronized wmf-config/etcd.php: https://gerrit.wikimedia.org/r/#/c/347537/ (duration: 02m 39s)

2017-04-30

  • 16:35 urandom: T160759: Restoring default tombstone_threshold on restbase1009
  • 16:29 ppchelko@naos: Finished deploy [restbase/deploy@4f96ae3]: Blacklist a zhwiki page that's causing issues (duration: 07m 27s)
  • 16:21 ppchelko@naos: Started deploy [restbase/deploy@4f96ae3]: Blacklist a zhwiki page that's causing issues
  • 15:31 elukey: set tombstone_failure_threshold=1000 to restbase1009-a with P5165 on restbase1009-a - T160759
  • 15:24 elukey: set tombstone_failure_threshold=10000 to restbase1009-a with P5165 on restbase1009-a - T160759
  • 07:45 elukey: deleted /srv/cassandra-a/commitlog/CommitLog-5-1490738321543.log from restbase1009-a (empty commit log file created before OOM - backup in /home/elukey)

2017-04-29

  • 10:50 elukey: set sysctl -w net.netfilter.nf_conntrack_tcp_timeout_time_wait=65 to kafka[1018,1020,1022].eqiad.wmnet (was 120 - maybe related to T136094 ?)
  • 10:39 elukey: start ferm on kafka1020/18 (nodes were previously down for maintenance, not sure why ferm wasn't started)
  • 09:59 reedy@naos: Synchronized wmf-config/CommonSettings.php: Revert pdf processor firejails T164045 (duration: 02m 41s)

2017-04-28

  • 21:24 Dereckson: End of live debug on mwdebug1001, restored previous state with a local scap pull
  • 21:00 ejegg: updated payments-wiki from 1620b82 to 4c56302
  • 20:23 Dereckson: Live debug on mwdebug1001 for T164059
  • 19:30 jynus: shutting down db1063 - I see high temperatures reported, and going up T164107
  • 19:09 urandom: T163936: reenabling puppet on restbase-dev1001
  • 18:14 urandom: T163936: disabling puppet on restbase-dev1001 (t-shooting c-m-c)
  • 17:09 jynus: restarting replication on all nodes on s7-eqiad T164092
  • 16:38 jynus: stopping replication on all nodes on s7-eqiad in case db1062 boots up in a corrupted state
  • 16:36 jynus: restarting db1062 once more T164092
  • 15:56 godog: poweroff prometheus1004 for ram upgrade - T163385
  • 15:40 jynus: deploying new events_coredb_slave.sql on codfw T160984
  • 15:21 godog: poweroff prometheus1003 for ram upgrade - T163385
  • 14:55 gehel: shutting down elastic2020 for mainboard replacement - T149006
  • 14:32 marostegui@naos: Synchronized wmf-config/db-eqiad.php: Change db1063 IP and rack - T163895 (duration: 00m 48s)
  • 14:31 marostegui@naos: Synchronized wmf-config/db-codfw.php: Change db1063 IP and rack - T163895 (duration: 00m 50s)
  • 14:04 marostegui: Stop and shutdown db1063 - T163895
  • 14:04 marostegui@naos: Synchronized wmf-config/db-eqiad.php: Change db1062 rack location - T163895 (duration: 00m 52s)
  • 13:59 moritzm: installing ghostscript security updates
  • 13:56 urandom: T163936: restarting cassandra-metrics-collector, restbase production
  • 13:55 urandom: $ readlink /usr/local/lib/cassandra-metrics-collector/cassandra-metrics-collector.jar
  • 13:50 ema: varnish 4.1.6-1wm1 uploaded to apt.w.o
  • 13:46 urandom: T163936: restarting cassandra-metrics-collector on restbase1007
  • 13:46 marostegui@naos: Synchronized wmf-config/db-codfw.php: Change db1061 IP - T163895 (duration: 01m 00s)
  • 13:44 marostegui@naos: Synchronized wmf-config/db-eqiad.php: Change db1061 IP - T163895 (duration: 01m 19s)
  • 13:44 urandom: T163936: forcing puppet run on restbase1007
  • 13:30 marostegui: Stop MySQL and shutdown db1061 - T163895
  • 13:26 marostegui: Stop MySQL and shutdown db1062 - T163895
  • 10:47 akosiaris: migrate/evacuate ganeti2005, ganeti2006 for T164011
  • 10:42 akosiaris: reboot oresrdb1002 for kernel upgrade
  • 09:56 moritzm: installing libxslt security updates on trusty
  • 09:29 marostegui: upgrade mariadb db1059,db1056 from 10.0.22 to 10.0.28
  • 09:17 marostegui: upgrade mariadb db1071 from 10.0.23 to 10.0.28
  • 09:15 akosiaris: reboot oresrdb1001 for kernel upgrade
  • 09:02 marostegui: Upgrade mariadb on db1081 and db1084 from 10.0.23 to 10.0.28
  • 08:03 Amir1: cleanup done, 4M rows deleted (T159753)
  • 07:58 marostegui@naos: Synchronized wmf-config/db-eqiad.php: Repool db1045 - T162539 T163548 (duration: 02m 38s)
  • 06:48 Amir1: cleaning around 5-10M rows in ores_classification in enwiki (half-an-hour script, T159753)
  • 01:18 ejegg: rolled payments-wiki back to 1620b82
  • 01:15 ejegg: updated payments-wiki from 1620b82 to 4c56302

2017-04-27

  • 23:36 catrope@naos: Synchronized php-1.29.0-wmf.21/extensions/SecurePoll/includes/pages/CreatePage.php: Stop gap for fix global election creation (T164043) (duration: 00m 43s)
  • 23:34 catrope@naos: Synchronized wmf-config/InitialiseSettings.php: Enable WikidataPageBanner on viwikivoyage (T163662) (duration: 00m 46s)
  • 23:29 ejegg: rolled back payments-wiki to 1620b82
  • 23:29 catrope@naos: Synchronized wmf-config/InitialiseSettings.php: Enable responsive references on elwiki (T163074) (duration: 00m 49s)
  • 23:27 ejegg: updated payments-wiki from 1620b82 to 4c56302
  • 23:22 catrope@naos: Synchronized wmf-config/InitialiseSettings.php: Set ORES thresholds in new format for all enabled wikis (T162760) (duration: 00m 53s)
  • 23:16 catrope@naos: Synchronized php-1.29.0-wmf.21/includes/deferred/LinksUpdate.php: Release prior row locks beforehand in LinksUpdate::updateCategoryCounts (T163801) (duration: 01m 01s)
  • 23:13 catrope@naos: Synchronized wmf-config/CirrusSearch-common.php: Enable sistersearch title profile for wikivoyage (duration: 01m 19s)
  • 21:57 cwd: updated process-control to 1.0.6
  • 21:56 volans: shutting down gadolinium, it came up 1h25m ago and stole the public IP from meitnerium
  • 21:08 ppchelko@naos: Finished deploy [restbase/deploy@61c1ceb]: Automatically rerender parsoid, only store summaries if they are changed, don't rerender data-parsoid (duration: 07m 16s)
  • 21:01 ppchelko@naos: Started deploy [restbase/deploy@61c1ceb]: Automatically rerender parsoid, only store summaries if they are changed, don't rerender data-parsoid
  • 20:53 ppchelko@naos: Finished deploy [restbase/deploy@fcfc537]: Automatically rerender parsoid, only store summaries if they are changed (duration: 11m 33s)
  • 20:53 twentyafterfour@naos: rebuilt wikiversions.php and synchronized wikiversions files: all wikis to 1.29.0-wmf.21
  • 20:47 twentyafterfour@naos: Synchronized php-1.29.0-wmf.21/extensions/FlaggedRevs: deploy fix for T163994 (duration: 01m 17s)
  • 20:42 ppchelko@naos: Started deploy [restbase/deploy@fcfc537]: Automatically rerender parsoid, only store summaries if they are changed
  • 20:37 mutante: ocg1001 - has been reinstalled but ocg package deployment fails currently "has the minion key been accepted", should not be repooled just yet
  • 20:32 mutante: ores/cache::misc: switch ores back to codfw-only - everything is like it was before the failed deploy yesterday again
  • 20:21 andrewbogott: stripping a bunch of unneeded extensions from wikitech-static
  • 20:20 mutante: ocg1001 - re-added to puppet, initial run, reinstall ongoing (T161158)
  • 20:18 mutante: ores is active/active now, for a short time
  • 20:16 mutante: ocg1001 - revoke old puppet cert, salt key
  • 20:15 mutante: run puppet on cache::misc to push ores change - cumin -b 5 -s 10 'R:class = role::cache::misc' 'run-puppet-agent -q'
  • 20:03 twentyafterfour: 1.29.0-wmf.21 is blocked by T163994
  • 20:01 mutante: ocg1001 - reboot into PXE, re-install
  • 19:59 twentyafterfour@naos: Synchronized php-1.29.0-wmf.21/extensions/FlaggedRevs/frontend/FlaggedRevsUI.hooks.php: deploy fix for T163994 (duration: 01m 04s)
  • 19:33 twentyafterfour: start mediawiki deployment train group 2 - all wikis to 1.29.0-wmf.21
  • 19:24 reedy@naos: Synchronized wmf-config/CommonSettings.php: Run pdf processors in firejails T164000 (duration: 01m 20s)
  • 19:20 XenoRyet: Updated paymentswiki from ee7d402 to 1620b82
  • 18:47 addshore: Morning SWAT Done!
  • 18:46 addshore@naos: Synchronized wmf-config/InitialiseSettings.php: SWAT WMDE Spring campaign - Remove logging (no longer needed) (duration: 00m 47s)
  • 18:44 addshore@naos: Synchronized wmf-config/InitialiseSettings.php: SWAT wmgUseGettingStarted true for dewiki (duration: 00m 48s)
  • 18:41 addshore@naos: Synchronized wmf-config/InitialiseSettings.php: SWAT Enable Cognate Logging (duration: 00m 48s)
  • 18:40 XenoRyet: Roll back paymentswiki from 030b2f9 to ee7d402
  • 18:34 addshore@naos: Synchronized php-1.29.0-wmf.21/extensions/CirrusSearch: SWAT #1 #2 (duration: 00m 59s)
  • 18:31 addshore@naos: Synchronized wmf-config/CirrusSearch-common.php: SWAT update name of sistersearch profile for wikivoyage (duration: 00m 49s)
  • 18:24 addshore@naos: Synchronized php-1.29.0-wmf.21/extensions/WikimediaEvents/WikimediaEventsHooks.php: SWAT WMDE Spring campaign - Remove hook PT2/2 (duration: 00m 52s)
  • 18:23 urandom: T163936: restarting cassandra-metrics-collector, restbase production
  • 18:22 addshore@naos: Synchronized php-1.29.0-wmf.21/extensions/WikimediaEvents/extension.json: SWAT WMDE Spring campaign - Remove hook PT1/2 (duration: 00m 57s)
  • 18:21 urandom: T163936: restarting cassandra-metrics-collector, restbase staging
  • 18:20 addshore@naos: Synchronized php-1.29.0-wmf.21/includes/api/ApiQueryPagePropNames.php: SWAT Do not add limit to ApiQueryPagePropNames when database type is mysql (duration: 01m 04s)
  • 18:17 twentyafterfour: restarting apache on iridium to hotfix T164005
  • 18:07 addshore@naos: Synchronized wmf-config/Wikibase-production.php: SWAT Fix echoIcon for wikibase in testwikis (duration: 01m 27s)
  • 17:44 XenoRyet: Updated paymentswiki from ee7d402 to 030b2f9
  • 17:36 ladsgroup@naos: Finished deploy [ores/deploy@68cca85]: (no justification provided) (duration: 21m 50s)
  • 17:30 _joe_: started pybal on lvs1006 after network was fixed
  • 17:25 XenoRyet: reverted paymentswiki from 030b2f9 to ee7d402
  • 17:20 XenoRyet: Updated paymentswiki from ee7d402 to 030b2f9
  • 17:15 ladsgroup@naos: Started deploy [ores/deploy@68cca85]: (no justification provided)
  • 17:15 Amir1: ladsgroup@naos:/srv/deployment/ores/deploy$ scap deploy (T163950)
  • 17:12 demon@naos: Pruned MediaWiki: 1.29.0-wmf.18 [keeping static files] (duration: 00m 20s)
  • 17:08 _joe_: stop pybal on lvs1006 to stop announcing via BGP
  • 17:08 demon@naos: Pruned MediaWiki: 1.29.0-wmf.16 (duration: 00m 13s)
  • 17:04 demon@naos: Synchronized scap/plugins/clean.py: One last fix (duration: 01m 04s)
  • 16:53 gehel: unbanning all elasticsearch servers in eqiad row D - T148506
  • 16:48 demon@naos: Synchronized scap/plugins/clean.py: --keep-static is nice now. Also need a co-master sync (duration: 01m 28s)
  • 16:45 andrewbogott: re-enabling labs instance creation/deletion
  • 16:42 demon@naos: Pruned MediaWiki: 1.29.0-wmf.19 [keeping static files] (duration: 00m 15s)
  • 16:32 gehel: unbanning elasticsearch servers in eqiad row D - elastic10(17|18|19|20) - T148506
  • 15:56 elukey: restart of jmxtrans on all the hadoop worker nodes
  • 15:51 andrewbogott: disabling labs instance create/delete to avoid hilarity during network maintenance
  • 15:50 elukey: forced 'service ferm start' on the failed analytics hosts
  • 15:46 marostegui: Upgrade db1091 mariadb from 10.0.23 to 10.0.28
  • 15:39 marostegui: Upgrade db1089 mariadb from 10.0.23 to 10.0.28
  • 15:34 marostegui: Upgrade db1090 mariadb from 10.0.23 to 10.0.28
  • 15:22 jynus: stopping all replication channels on dbstore1001 for topology changes
  • 14:34 ema: upgrade upload-codfw to varnish 4.1.5-1wm4 T145661
  • 14:29 marostegui: Stop MySQL and shutdown es2019 for HW replacement - T149526
  • 14:26 ema: varnish 4.1.5-1wm4 uploaded to apt.w.o T145661
  • 14:08 marostegui: Deploy alter table labswiki.revision on labtestweb2001 - T132416
  • 14:04 marostegui: Deploy alter table labswiki.revision on silver - T132416
  • 13:57 _joe_: restarting HHVM on mw2213, stuck in HPHP::Treadmill::getAgeOldestRequest
  • 13:52 ladsgroup@naos: Synchronized wmf-config/Wikibase-production.php: SWAT: Set echoIcon for notification of wikibase in test wikis (T142102) (duration: 00m 57s)
  • 13:52 Amir1: start of scap sync-file wmf-config/Wikibase-production.php 'SWAT: Set echoIcon for notification of wikibase in test wikis (T142102)'
  • 13:45 ladsgroup@naos: Synchronized portals: (no justification provided) (duration: 01m 05s)
  • 13:44 ladsgroup@naos: Synchronized portals/prod/wikipedia.org/assets: (no justification provided) (duration: 01m 21s)
  • 13:43 Amir1: ladsgroup@naos:/srv/mediawiki-staging$ portals/sync-portals (T128546)
  • 12:53 volans: disabled puppet on rdb*
  • 12:06 marostegui: Upgrade es1011 and es1014 from mariadb 10.0.22 to mariadb 10.0.28
  • 11:50 marostegui: Upgrade mariadb from 10.0.22 to 10.0.28 on es1015
  • 09:46 moritzm: upgrading mysql on bohrium/piwik
  • 09:25 _joe_: restarting all redis instances for jobqueues on eqiad to force a full resync with masters in codfw T163337
  • 08:55 jynus: deploying alter table to all wikis on s6 T163979
  • 08:54 _joe_: restarting redis rdb1001:6380 after cleaning up the current AOF files for investigation of T163337
  • 08:50 moritzm: installing django security updates
  • 08:29 godog: ms-be1039 issue "controller slot=3 pd 1I:1:5 modify disablepd" to force failed sdc - T163690
  • 08:25 ema: restart varnish-be on cp2024 with expiry thread RT experiment enabled
  • 08:19 ema: upgrade varnish to 4.1.5-1wm3 on cp2024
  • 07:56 elukey: aqs100[69] back serving AQS traffic
  • 07:55 ema: varnish 4.1.5-1wm3 uploaded to apt.w.o T145661
  • 07:16 marostegui@naos: Synchronized wmf-config/db-eqiad.php: Repool hosts that needed to be moved for the network maintenance - T162681 (duration: 02m 32s)
  • 06:53 marostegui: Reboot es1014 for kernel upgrade - T162029
  • 06:50 elukey: executed kafka preferred-replica-election to rebalance topic leaders in the analytics cluster after maintenance
  • 06:45 marostegui: Reboot es1011 for kernel upgrade - T162029
  • 06:39 marostegui: Logging for the record: drop table hashs from s2, s3 and s7 (only places where it existed) - T54927
  • 06:23 _joe_: moving orphaned objects in ms-be1039's root partition in sdc1/stale_root to save space
  • 06:17 marostegui: Deploy schema change on s7 metawiki.pagelinks to remove partitioning on db1041 - T153300
  • 06:14 marostegui: Deploy alter table on s5 (wikidatawiki) on db1049 - T163548
  • 06:14 marostegui: Deploy alter table on s5 (wikidatawiki) on db1070 (running locally instead of neodymium as this host will be affected by the network maintenance) - T163548
  • 06:11 marostegui: Deploy alter table on s5 (wikidatawiki) on db1070 (running locally instead of neodymium as this host will be affected by the network maintenance) - T130067 T162539
  • 06:09 marostegui: Deploy alter table on s5 (wikidatawiki) on db1049 - T130067 T162539
  • 05:59 marostegui: Deploy alter table labsdb1003 (wikidatawiki) https://phabricator.wikimedia.org/T162539 https://phabricator.wikimedia.org/T163548
  • 05:24 Amir1: cleaning some rows in ores_classification in enwiki (T159753)
  • 03:44 ottomata: starting kafka broker on kafka1020
  • 03:40 ottomata: running kafka replica election to bring kafka1018 back as preferred leader
  • 02:21 Jamesofur: running populateEditCount.php in screen on wast for T163854, counting edits for board vote eligibility
  • 02:16 RoanKattouw: Reset 2FA for T163931 on labswiki
  • 00:14 twentyafterfour: starting phabricator update
  • 00:05 ebernhardson@naos: Synchronized php-1.29.0-wmf.21/extensions/CirrusSearch/includes/Searcher.php: cirrus: align sister search boost template config variable with documentation (duration: 00m 50s)

2017-04-26

  • 23:51 niharika29@naos: Synchronized php-1.29.0-wmf.21/includes/interwiki/ClassicInterwikiLookup.php: Interwiki: Dont override interwiki map order (T145337) (duration: 01m 00s)
  • 23:38 niharika29@naos: Synchronized php-1.29.0-wmf.21/extensions/CirrusSearch/: Align other index template boosting config names (duration: 00m 57s)
  • 23:34 niharika29@naos: Synchronized wmf-config/InitialiseSettings.php: Increase max field count for wikidata; Enable Flow beta feature on arwiki (T155720) (duration: 00m 58s)
  • 23:31 niharika29@naos: Synchronized wmf-config/InitialiseSettings.php: Increase max field count for wikidata; Enable Flow beta feature on arwiki (T155720) (duration: 01m 04s)
  • 23:29 niharika29@naos: Synchronized wmf-config/CirrusSearch-common.php: [cirrus] Increase max field count for wikidata (duration: 01m 23s)
  • 21:42 mutante: running puppet on all cache::misc nodes via cumin to switch ORES to eqiad
  • 21:30 mutante: restarting uwsgi-ores service on all scb2* with systemctl restart
  • 21:15 twentyafterfour: finished with mediawiki deployment train for group1. Everything appears stable, no increase in logspam.
  • 21:12 twentyafterfour@naos: rebuilt wikiversions.php and synchronized wikiversions files: group1 wikis to 1.29.0-wmf.21
  • 21:09 halfak@naos: Started restart [ores/deploy@cc12103]: (no justification provided)
  • 21:08 twentyafterfour@naos: Synchronized php-1.29.0-wmf.21/extensions/Flow/Hooks.php: sync https://gerrit.wikimedia.org/r/#/c/350481/ refs T163896 T161733 (duration: 01m 20s)
  • 21:05 arlolra: Updated Parsoid to 4949857a (T116508, T64270, T133673)
  • 20:55 arlolra@naos: Finished deploy [parsoid/deploy@8d109eb]: Updating Parsoid to 4949857a (duration: 06m 52s)
  • 20:48 arlolra@naos: Started deploy [parsoid/deploy@8d109eb]: Updating Parsoid to 4949857a
  • 20:48 twentyafterfour: deploying https://gerrit.wikimedia.org/r/#/c/350481/1 to get the train back on track refs T161733
  • 20:35 bsitzmann@naos: Finished deploy [mobileapps/deploy@b5afcb8]: Update mobileapps to 14bd4a5 (duration: 15m 17s)
  • 20:34 halfak@naos: Finished deploy [ores/deploy@cc12103]: T162892 (duration: 21m 28s)
  • 20:31 elukey: restart zookeeper on conf1003 after network maintenance
  • 20:20 bsitzmann@naos: Started deploy [mobileapps/deploy@b5afcb8]: Update mobileapps to 14bd4a5
  • 20:12 halfak@naos: Started deploy [ores/deploy@cc12103]: T162892
  • 19:50 elukey: restart kafka nodes (kafka1018 and kafka1020) after network maintenance
  • 19:45 twentyafterfour@naos: rebuilt wikiversions.php and synchronized wikiversions files: group1 wikis to 1.29.0-wmf.20
  • 19:42 twentyafterfour: rolling back group1 to wmf.20 due to T163896 refs T161733
  • 19:31 twentyafterfour@naos: rebuilt wikiversions.php and synchronized wikiversions files: group1 wikis to 1.29.0-wmf.21
  • 19:24 twentyafterfour: begin deployment train: group1 wikis to 1.29.0-wmf.21 refs T161733
  • 19:22 bblack: initiating cumin-based restart of all varnish backends for cache_upload in codfw to downgrade from experimental package. 30 minute spacing, 10 hosts, ~5h to completion...
  • 19:17 thcipriani@naos: Synchronized wmf-config/InitialiseSettings.php: SWAT: Disable collectionsaveascommunitypage right on es.wikipedia T163767 (duration: 00m 49s)
  • 19:05 bblack: restarting varnish frontend and backend on cp3033 to downgrade
  • 19:03 bblack: restarting varnish-frontend on cp2014 to downgrade
  • 18:58 thcipriani@naos: Synchronized wmf-config/CommonSettings.php: SWAT: Workaround issue of overriding whitelist config variable T163114 (duration: 00m 53s)
  • 18:56 bblack: downgrading varnish back to 4.1.5-wm1 on all -wm2 hosts
  • 18:50 thcipriani@naos: Synchronized php-1.29.0-wmf.21/extensions/CirrusSearch: SWAT: Provide a way to blacklist a set of wikis for crosswiki search T163546 (duration: 01m 02s)
  • 18:44 thcipriani@naos: Synchronized wmf-config/CirrusSearch-common.php: SWAT: Adjust sistersearch against wikivoyage to require title matching T163547 (duration: 01m 11s)
  • 18:38 thcipriani@naos: Synchronized wmf-config/CirrusSearch-common.php: SWAT: Configure multimedia search template boosting T163223 (duration: 00m 53s)
  • 18:30 thcipriani@naos: Synchronized php-1.29.0-wmf.20/extensions/SecurePoll: SWAT: Add voter scripts for board/fdc election 2017 T163854 (duration: 00m 57s)
  • 18:26 thcipriani@naos: Synchronized php-1.29.0-wmf.21/extensions/SecurePoll: SWAT: Add voter scripts for board/fdc election 2017 T163854 (duration: 01m 00s)
  • 18:23 thcipriani@naos: Synchronized dblists/commonsuploads.dblist: SWAT: Enable local uploads on knwiki T133137 (duration: 01m 06s)
  • 18:16 ema: start varnish-frontend on cp2014
  • 18:14 jynus: running alter table on all wikis of s3 T163912
  • 17:49 jynus: rebooting es1019 for upgrading and to fix race condition on services
  • 17:46 elukey: restart nutcracker on the eqiad mw hosts to pick up the new shard config (spamming elasticsearch memcached and triggering alarms)
  • 17:44 elukey: unmasking and starting daemons on restbase-dev1003
  • 17:41 reedy@naos: Synchronized wmf-config/InitialiseSettings.php: touch (duration: 01m 23s)
  • 17:02 mobrovac@naos: Started restart [trending-edits/deploy@7112062]: Restart for ICU lib update
  • 17:01 mobrovac@naos: Started restart [mobileapps/deploy@5c2b9a9]: Restart for ICU lib update
  • 17:00 mobrovac@naos: Started restart [mathoid/deploy@7eb4092]: Restart for ICU lib update
  • 16:43 mobrovac@naos: Started restart [electron-render/deploy@9156760]: Restart for ICU lib update
  • 16:39 mobrovac@naos: Started restart [graphoid/deploy@128206b]: Restart for ICU lib update
  • 16:37 mobrovac@naos: Started restart [eventstreams/deploy@05bcc8f]: Restart for ICU lib update
  • 16:37 mobrovac@naos: Started restart [electron-render/deploy@9156760]: Restart for ICU lib update
  • 16:36 mobrovac@naos: Started restart [cxserver/deploy@6899032]: Restart for ICU lib update
  • 16:34 mobrovac@naos: Started restart [citoid/deploy@b8c4cb2]: Restart for ICU lib update
  • 16:14 elukey: stop and mask cassandra and restbase on restbase-dev1003 for row-d maintenance
  • 16:07 _joe_: disabled and masked strongswan, memcached, redis on mc1013-17 for decommissioning
  • 15:43 XioNoX: VRRP priority removed, interfaces cr2/asw2 renamed - T148506
  • 15:40 _joe_: shutting down conf1003 T148506
  • 15:33 XioNoX: "cr2-eqiad# delete interfaces ae4 disable" done, confirmed links and LACP are up - T148506
  • 15:33 XioNoX: "cr2-eqiad# delete interfaces ae4 disable" done, confirmed links and LACP are up
  • 15:24 marostegui: Shutdown es2019 for maintenance with papaul and Dell - T149526
  • 15:12 XioNoX: switch ports for rack D7 and D8 configured - T148506
  • 14:47 marostegui: Stop MySQL db1070 (just in case) to test drac cold restart
  • 14:47 bblack@neodymium: conftool action : set/pooled=no; selector: dc=eqiad,cluster=cache_upload,name=cp107[1234].eqiad.wmnet
  • 14:26 elukey: depooling aqs100[69] from AQS for network maintenance
  • 14:20 elukey: stop zookeeper on conf1003 for row-d maintenance (Hadoop, Kafka related)
  • 14:04 XioNoX: "cr2-eqiad# set interfaces ae4 disable" done, (1 ping loss) - T148506
  • 14:00 marostegui@naos: Synchronized wmf-config/db-eqiad.php: Repool db1026, depool db1045 - T162539 T163548 (duration: 00m 53s)
  • 13:59 XioNoX: lowered VRRP priority for T148506
  • 13:58 andrewbogott: put labservices1001 into downtime to minimize (but probably not totally eliminate) alert spam
  • 13:56 andrewbogott: disabled instance creation on Horizon via https://gerrit.wikimedia.org/r/#/c/350414/ and on wikitech via a strategic edit in extensions/OpenStackManager/special/SpecialNovaInstance.php
  • 13:56 godog: downtime and poweroff ms-be 21 26 27 37 38 39 before switch relocation - T148506
  • 13:54 gehel: downtime "ElasticSearch health check for shards" checks for logstash and elasticsearch eqiad - T148506
  • 13:53 elukey: stop kafka on kafka1020 and kafka1018 for row-d extended maintenance (D2)
  • 13:44 _joe_: shutting down mc1013-18 for row D maintenance
  • 13:40 aude@naos: Synchronized wmf-config/CommonSettings-labs.php: (no justification provided) (duration: 00m 57s)
  • 13:32 aude@naos: Synchronized wmf-config/Wikibase-production.php: disable tabular-data for now on wikidata and enable echo notification on test wikis (duration: 01m 06s)
  • 13:29 marostegui: Deploy alter table on db1069 (wikidatawiki) https://phabricator.wikimedia.org/T162539 https://phabricator.wikimedia.org/T163548
  • 13:27 marostegui: Deploy alter table labsdb1001 https://phabricator.wikimedia.org/T162539 https://phabricator.wikimedia.org/T163548
  • 13:23 marostegui: Deploy alter table db1045 - https://phabricator.wikimedia.org/T162539 https://phabricator.wikimedia.org/T163548
  • 13:22 elukey: restart HDFS on analytics100[12] (Hadoop master nodes) to pick up recent topology changes for the cluster
  • 13:10 aude@naos: Synchronized wmf-config/throttle.php: (no justification provided) (duration: 01m 23s)
  • 13:02 ema@neodymium: conftool action : set/pooled=yes; selector: name=cp2014.codfw.wmnet,service=varnish-be
  • 13:00 ema: cp2017: restart varnish-be
  • 12:56 marostegui: Shutdown db1092 for maintenance - https://phabricator.wikimedia.org/T162681
  • 12:55 gehel: restart elasticsearch on relforge1001 to validate new config - T161830
  • 12:46 moritzm: installing mysql security updates (5.5 as packaged in Debian jessie)
  • 12:43 ema@neodymium: conftool action : set/pooled=no; selector: name=cp2014.codfw.wmnet,service=varnish-be
  • 11:32 jynus: applying new events_coredb_slave.sql on db2055 T160984
  • 11:31 moritzm: rebooting mwlog2001 for update to Linux 4.9
  • 10:47 ladsgroup@naos: Synchronized wmf-config/Wikibase-labs.php: T142104, part II (duration: 00m 56s)
  • 10:45 ladsgroup@naos: Synchronized static/images/wikibase/echoIcon.svg: T142104, part I (duration: 01m 04s)
  • 10:44 marostegui: Deploy alter table on s5, on db1063 (eqiad master) for tables: change_tag and tag_summary - https://phabricator.wikimedia.org/T147166
  • 10:39 jynus@naos: Synchronized wmf-config/db-eqiad.php: switch s5 eqiad master from db1049 to db1063 (duration: 01m 24s)
  • 09:48 jynus: migrating s5 eqiad replicas under db1063
  • 09:42 jynus: restarting mariadb at db1063
  • 09:24 marostegui: Shutdown db1094, db1093, db1091 for maintenance - T162681
  • 09:16 marostegui: Shutdown es1019 for maintenance - T162681
  • 08:32 elukey: Gracefully stopping hadoop daemons on Hadoop nodes affected by Row-D maintenance
  • 08:30 marostegui: Deploy alter table on change_tag and tag_summary on silver and labtestweb2001 - T147166
  • 08:27 marostegui@naos: Synchronized wmf-config/db-eqiad.php: Depool hosts that need to be moved for the network maintenance - T162681 (duration: 02m 25s)
  • 08:22 moritzm: reimaging terbium to jessie
  • 07:59 jynus: shutting down mariadb on db1040 as a backup before decommissioning
  • 07:48 marostegui: Deploy alter table on s1, on db1052 (eqiad master) for tables: change_tag and tag_summary - https://phabricator.wikimedia.org/T147166
  • 07:30 marostegui: Deploy alter table on s7, on db1062 (eqiad master) for tables: change_tag and tag_summary - https://phabricator.wikimedia.org/T147166
  • 07:24 marostegui: Deploy alter table on s4, on db1068 (eqiad master) for tables: change_tag and tag_summary - https://phabricator.wikimedia.org/T147166
  • 07:09 marostegui: Deploy alter table on s6, on db1061 (eqiad master) for tables: change_tag and tag_summary - https://phabricator.wikimedia.org/T147166
  • 06:56 marostegui@naos: Synchronized wmf-config/db-eqiad.php: Repool db1071 - T162539 T163548 (duration: 02m 24s)
  • 06:45 marostegui: Deploy alter table on s2, on db1054 (eqiad master) for tables: change_tag and tag_summary - https://phabricator.wikimedia.org/T147166
  • 06:10 marostegui: Deploy alter table on s3, on db1075 (eqiad master) for tables: change_tag and tag_summary - T147166
  • 05:57 marostegui: Deploy alter table enwiki.revision on labsdb1011 - T132416
  • 00:20 catrope@naos: Synchronized php-1.29.0-wmf.21/extensions/Flow/modules/flow/ui/widgets/mw.flow.ui.ReplyWidget.js: T163749 (duration: 01m 24s)

2017-04-25

  • 22:24 mutante: mediawiki maintenance servers: last log entry was _before_ merging https://gerrit.wikimedia.org/r/#/c/342777/ and making a change
  • 22:23 andrewbogott: re-enabling dns on labservices1001
  • 22:22 mutante: mediawiki maintenance servers: making wasat identical to terbium. wasat is currently the active server running crons. no change there at all. on terbium where crons are inactive, some log files were removed
  • 22:13 twentyafterfour@naos: rebuilt wikiversions.php and synchronized wikiversions files: group0 wikis to 1.29.0-wmf.21
  • 22:08 madhuvishy: Reenabled labs instance creation and deletion on horizon
  • 22:05 twentyafterfour@naos: Finished scap: sync 1.29.0-wmf.21 to testwikis (pre-group0) refs T161733 (attempt #5) (duration: 21m 52s)
  • 22:02 andrewbogott: causing an intentional outage of labs-ns0 and labs-recursor0 to make sure we're properly girded for tomorrow's switch replacement.
  • 21:43 twentyafterfour@naos: Started scap: sync 1.29.0-wmf.21 to testwikis (pre-group0) refs T161733 (attempt #5)
  • 21:41 twentyafterfour@naos: scap failed: CalledProcessError Command 'cp -r "/tmp/scap_l10n_66989801"/* "/srv/mediawiki-staging/php-1.29.0-wmf.21/cache/l10n"' returned non-zero exit status 1 (duration: 03m 38s)
  • 21:38 twentyafterfour@naos: Started scap: sync 1.29.0-wmf.21 to testwikis (pre-group0) refs T161733 (attempt #4)
  • 21:33 twentyafterfour@naos: scap failed: CalledProcessError Command 'cp -r "/tmp/scap_l10n_930292683"/* "/srv/mediawiki-staging/php-1.29.0-wmf.21/cache/l10n"' returned non-zero exit status 1 (duration: 03m 46s)
  • 21:30 twentyafterfour@naos: Started scap: sync 1.29.0-wmf.21 to testwikis (pre-group0) refs T161733 (attempt #3)
  • 21:23 twentyafterfour@naos: scap failed: CalledProcessError Command 'cp -r "/tmp/scap_l10n_2414756836"/* "/srv/mediawiki-staging/php-1.29.0-wmf.21/cache/l10n"' returned non-zero exit status 1 (duration: 00m 54s)
  • 21:23 twentyafterfour@naos: Started scap: sync 1.29.0-wmf.21 to testwikis (pre-group0) refs T161733 (attempt #2)
  • 21:09 twentyafterfour@naos: scap failed: CalledProcessError Command '/usr/local/bin/mwscript rebuildLocalisationCache.php --wiki="testwiki" --outdir="/tmp/scap_l10n_3498979833" --threads=30 --lang en --quiet' returned non-zero exit status 1 (duration: 01m 56s)
  • 21:07 twentyafterfour@naos: Started scap: sync 1.29.0-wmf.21 to testwikis (pre-group0) refs T161733
  • 20:00 madhuvishy: Labs instance creation and deletion on horizon temporarily disabled via https://gerrit.wikimedia.org/r/350266
  • 19:50 demon@naos: Synchronized wmf-config/CommonSettings-labs.php: no-op, beta change (duration: 01m 58s)
  • 18:55 chasemp: restart nova-fullstack on labnet1001
  • 18:50 chasemp: downtime labservices1001 as we fail away from it and puppet staleness on labservices1002
  • 18:38 andrewbogott: disabling nova-api for another try at labservices failover
  • 18:33 twentyafterfour: Deployment Train: Branching mediawiki wmf/1.29.0-wmf.21 from master refs T161733
  • 17:36 jynus: running test schema change on etwiki on eqiad (depooled) T17441
  • 17:35 RainbowSprinkles: gerrit: Quick reboot to pick up new bouncycastle library
  • 17:25 arlolra: Updated Parsoid to 55b90511 (T153885, T163330, T89262, T154709, T162919, T161306)
  • 17:20 moritzm: rebooting ruthenium for update to Linux 4.9
  • 17:19 otto@naos: Finished deploy [eventlogging/eventbus@e7da0cc]: (no justification provided) (duration: 00m 07s)
  • 17:19 otto@naos: Started deploy [eventlogging/eventbus@e7da0cc]: (no justification provided)
  • 17:18 otto@naos: Finished deploy [eventlogging/eventbus@e7da0cc]: (no justification provided) (duration: 00m 05s)
  • 17:18 otto@naos: Started deploy [eventlogging/eventbus@e7da0cc]: (no justification provided)
  • 17:18 otto@naos: Finished deploy [eventlogging/eventbus@e7da0cc]: (no justification provided) (duration: 00m 08s)
  • 17:18 otto@naos: Started deploy [eventlogging/eventbus@e7da0cc]: (no justification provided)
  • 17:18 arlolra@naos: Finished deploy [parsoid/deploy@719d7bd]: Updating Parsoid to 55b90511 (duration: 08m 02s)
  • 17:17 otto@naos: Finished deploy [eventlogging/eventbus@e7da0cc]: (no justification provided) (duration: 00m 07s)
  • 17:17 otto@naos: Started deploy [eventlogging/eventbus@e7da0cc]: (no justification provided)
  • 17:11 otto@naos: Finished deploy [eventlogging/eventbus@e7da0cc]: (no justification provided) (duration: 02m 18s)
  • 17:09 arlolra@naos: Started deploy [parsoid/deploy@719d7bd]: Updating Parsoid to 55b90511
  • 17:08 otto@naos: Started deploy [eventlogging/eventbus@e7da0cc]: (no justification provided)
  • 16:54 otto@naos: Finished deploy [eventlogging/eventbus@e7da0cc]: (no justification provided) (duration: 00m 25s)
  • 16:53 godog: flush wikiwix cache from planet2001 and rebuild files
  • 16:53 otto@naos: Started deploy [eventlogging/eventbus@e7da0cc]: (no justification provided)
  • 16:53 otto@naos: Finished deploy [eventlogging/eventbus@e7da0cc]: (no justification provided) (duration: 00m 07s)
  • 16:53 otto@naos: Started deploy [eventlogging/eventbus@e7da0cc]: (no justification provided)
  • 16:50 andrewbogott: labservices failover aborted due to cryptic routing/firewall issue
  • 16:45 akosiaris@puppetmaster1001: conftool action : set/pooled=yes; selector: name=mw2255.codfw.wmnet,service=apache2
  • 16:44 otto@naos: Finished deploy [eventlogging/eventbus@e7da0cc]: enable wildcard topic config (duration: 00m 20s)
  • 16:44 otto@naos: Started deploy [eventlogging/eventbus@e7da0cc]: enable wildcard topic config
  • 16:42 godog: flush wikiwix cache from planet1001 and rebuild files
  • 16:41 akosiaris@puppetmaster1001: conftool action : set/pooled=no; selector: name=mw2255.codfw.wmnet,service=apache2
  • 16:41 akosiaris@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=mw2256.codfw.wmnet,service=apache2
  • 16:40 akosiaris@puppetmaster1001: conftool action : set/pooled=no; selector: name=mw2256.codfw.wmnet,service=apache2
  • 16:38 andrewbogott: stopping nova-api for labservices switchover
  • 16:36 otto@naos: Finished deploy [eventlogging/eventbus@e7da0cc]: enable wildcard topic config (duration: 00m 53s)
  • 16:35 otto@naos: Started deploy [eventlogging/eventbus@e7da0cc]: enable wildcard topic config
  • 16:29 otto@naos: Finished deploy [eventlogging/eventbus@e7da0cc]: enable wildcard topic config (duration: 00m 04s)
  • 16:29 otto@naos: Started deploy [eventlogging/eventbus@e7da0cc]: enable wildcard topic config
  • 16:18 otto@naos: Finished deploy [eventlogging/eventbus@e7da0cc]: (no justification provided) (duration: 00m 06s)
  • 16:17 otto@naos: Started deploy [eventlogging/eventbus@e7da0cc]: (no justification provided)
  • 16:09 thcipriani@naos: Synchronized README: test new scap version (duration: 01m 03s)
  • 15:59 akosiaris: restart pybal on lvs[2001-2002].codfw.wmnet,lvs[3001-3002].esams.wmnet,lvs[4001-4002].ulsfo.wmnet,lvs[1001-1002].wikimedia.org T159687
  • 15:50 moritzm: installing libav security updates
  • 15:48 bawolff@naos: Synchronized wmf-config/CommonSettings-labs.php: Test account creation limits on labs (duration: 01m 14s)
  • 15:47 akosiaris: restart pybal on lvs2003.codfw.wmnet,lvs3003.esams.wmnet,lvs4003.ulsfo.wmnet,lvs1003.wikimedia.org T159687
  • 15:46 marostegui: Stop replication on db1086 and db1094 in sync - https://phabricator.wikimedia.org/T130067
  • 15:36 mobrovac@naos: Finished deploy [changeprop/deploy@7521b2f]: Bring back the concurrency level - T163292 (duration: 01m 13s)
  • 15:35 mobrovac@naos: Started deploy [changeprop/deploy@7521b2f]: Bring back the concurrency level - T163292
  • 15:33 jynus: stopping replication on dbstore1001 to change its replication topology
  • 15:33 akosiaris: restart pybal on lvs[2004-2006].codfw.wmnet,lvs3004.esams.wmnet,lvs4004.ulsfo.wmnet,lvs[1004-1006].wikimedia.org T159687
  • 15:28 filippo@neodymium: conftool action : set/pooled=yes; selector: name=mw2017.codfw.wmnet
  • 15:27 mobrovac@naos: Finished deploy [changeprop/deploy@e0e3684]: Bring back the concurrency level - T163292 (duration: 00m 10s)
  • 15:26 mobrovac@naos: Started deploy [changeprop/deploy@e0e3684]: Bring back the concurrency level - T163292
  • 15:18 ema: start cache_text upgrade to linux 4.9 T162029
  • 15:14 marostegui: Deploy alter table s7 on watchlist table directly on the master (db1062) - https://phabricator.wikimedia.org/T130067
  • 15:14 filippo@neodymium: conftool action : set/pooled=no; selector: name=mw2017.codfw.wmnet
  • 14:59 jynus@naos: Synchronized wmf-config/db-eqiad.php: switch s7 eqiad master from db1041 to db1062 (duration: 00m 54s)
  • 14:54 bblack: upgrading nginx on cp1008
  • 14:30 bawolff@naos: Synchronized private/PrivateSettings.php: rv change to T163477 to see if it fixes logging (duration: 01m 14s)
  • 14:27 bawolff: Logging seems to have stopped after the last deploy to private settings :(
  • 14:20 bblack: uploaded WMF nginx-1.11.10-1+wmf1 packages to jessie-wikimedia repo
  • 14:17 marostegui: Stop replication in sync on db1089 and db1083 for maintenance - https://phabricator.wikimedia.org/T130067
  • 14:08 jynus: restarting mariadb on db1062
  • 14:07 jynus: moving s7 eqiad replicas under db1062
  • 14:02 godog: poweroff ms-be1016 for controller swap - T150206
  • 14:02 bawolff@naos: Synchronized wmf-config/PrivateSettings.php: Hopefully cause previous changes to be picked up try2 (duration: 00m 44s)
  • 13:58 bawolff@naos: Synchronized wmf-config/PrivateSettings.php: Hopefully cause previous changes to be picked up (duration: 00m 44s)
  • 13:51 hashar: European SWAT complete
  • 13:49 hashar@naos: Synchronized wmf-config/InitialiseSettings.php: Re-enable ContentTranslation - T163344 (duration: 00m 44s)
  • 13:37 hashar@naos: Synchronized php-1.29.0-wmf.20/includes/media/TransformationalImageHandler.php: media: Capture stderr when running convert --version - T158649 (duration: 00m 47s)
  • 13:35 moritzm: rebooting einsteinium for update to Linux 4.9
  • 13:31 hashar@naos: Synchronized wmf-config/InitialiseSettings.php: Fix namespace Wikipedia_talk for zh_classicalwiki - T162547 (duration: 00m 48s)
  • 13:24 hashar@naos: Synchronized wmf-config/InitialiseSettings.php: Two namespace aliases for zh_classicalwiki - T162547 (duration: 00m 49s)
  • 13:22 marostegui: Deploy alter table on s3 (only etwiki) for tag_summary and change_tag tables - T147166
  • 13:20 hashar@naos: Synchronized php-1.29.0-wmf.20/includes: Fix bogus field reference in Category::getCountMessage() callback - T162941 (duration: 01m 14s)
  • 13:16 hashar@naos: Synchronized wmf-config/InitialiseSettings.php: Add NS aliases for zh_classicalwiki - T162547 (duration: 01m 00s)
  • 13:15 marostegui: Deploy alter table on silver.watchlist and labtestweb2001.labtestwiki for the watchlist table - T130067
  • 13:12 hashar@naos: Synchronized wmf-config/InitialiseSettings.php: Add Draft namespace to zh_classicalwiki - T163655 (duration: 01m 19s)
  • 13:10 hashar: zh_classicalwiki : renamed broken page via namespaceDupes.php : id=73504 ns=0 dbk=模板:Protected_logo -> 模板:Protected_logobroken
  • 12:35 marostegui: Stop replication in sync on db1092 and db1087 for maintenance - https://phabricator.wikimedia.org/T130067
  • 11:57 gehel: banning elasticsearch row D node in preparation for maintenance
  • 11:46 marostegui: Deploy alter table s5 on watchlist table directly on the master (db1049) - https://phabricator.wikimedia.org/T130067
  • 11:28 jynus@naos: Synchronized wmf-config/db-eqiad.php: Depool db1022, promote db1061 as the s6 eqiad master (duration: 01m 17s)
  • 11:27 marostegui: Deploy alter table s1 on watchlist table directly on the master (db1052) - https://phabricator.wikimedia.org/T130067
  • 11:01 jynus: switching eqiad s6 master to db1061
  • 10:45 jynus: stopping replication on db1050
  • 10:39 marostegui: Stop replication in sync on db1090 and db1076 for maintenance - https://phabricator.wikimedia.org/T130067
  • 10:15 jynus: restarting db1061's mysql process
  • 10:12 jynus: moving all slaves of s6 eqiad under db1061
  • 09:49 marostegui: Stop replication in sync on db1091 and db1084 for maintenance - T130067
  • 09:46 marostegui: Deploy alter table s2 on watchlist table directly on the master (db1054) - T130067
  • 09:10 jynus@naos: Synchronized wmf-config/db-eqiad.php: Promote db1054 as the new s2 master on eqiad (duration: 01m 19s)
  • 08:56 marostegui: Stop replication on db1088 and db1093 in sync - T130067
  • 08:53 jynus: stopping replication on s2-eqiad and restarting db1054
  • 08:52 marostegui: Deploy alter table s4 commonswiki.watchlist directly on db1068 (eqiad master) - T130067
  • 08:24 marostegui: Stop MySQL db1041 (eqiad master) to reclone db1062 from it - T163665
  • 08:03 jynus: moving all slaves of s2 eqiad under db1054
  • 07:14 ema: upgrade cp3033 varnish-be to varnish 4.1.5-1wm2, expiry thread lock/priority workaround T145661
  • 06:34 marostegui: Deploy alter table on s3, all the wikis to the watchlist table on db1075, eqiad master - T130067
  • 06:10 marostegui@naos: Synchronized wmf-config/db-codfw.php: Restore db2061 original weight (duration: 00m 57s)
  • 06:06 marostegui@naos: Synchronized wmf-config/db-eqiad.php: Repool db1071, depool db1026 - T162539 T163548 (duration: 01m 17s)
  • 05:41 marostegui: Deploy alter table enwiki.revision on labsdb1009 and labsdb1010 - T132416
  • 02:22 bawolff: deployed patch for T163477
  • 01:42 MaxSem: Deployed security patches for T163166
  • 00:53 bawolff: unconfirming emails associated with T163477
  • 00:38 mutante: ocg1001 - powercycle into installer, was sitting at partman step with "failure to read from sda"...
  • 00:25 twentyafterfour: restarted apache2 on iridium to tune rate limiting value
  • 00:16 twentyafterfour@naos: Synchronized wmf-config/CommonSettings.php: fix "Notice: Undefined variable: wmgRelatedArticlesFooterWhitelistedSkins" (duration: 01m 11s)

2017-04-24

  • 23:41 twentyafterfour@naos: Synchronized wmf-config/: deploy https://gerrit.wikimedia.org/r/#/c/348472/ refs T163114 (duration: 01m 05s)
  • 23:22 ejegg: updated civicrm from 40d88c0 to 061cd61
  • 23:08 ejegg: updated civicrm from a11c108 to 40d88c0
  • 22:46 bawolff: deploy patch for T155277
  • 21:53 hoo: Updated the sites and site_identifiers tables on all Wikidata clients for dtywiki T161529.
  • 21:41 ejegg: updated civicrm from 51dbbad to a11c108
  • 19:52 mattflaschen@naos: Finished scap: Full scap (due to ORES i18n change earlier), plus additional $wgHiddenPrefs change (duration: 17m 06s)
  • 19:35 mattflaschen@naos: Started scap: Full scap (due to ORES i18n change earlier), plus additional $wgHiddenPrefs change
  • 19:10 bblack: cp2026: restart to wm2 varnish package
  • 18:42 thcipriani@naos: Synchronized wmf-config/throttle.php: SWAT: New throttle rule T163726 (duration: 01m 03s)
  • 18:19 thcipriani@naos: Synchronized wmf-config/InitialiseSettings.php: SWAT: Remove defunct $wgForeignUploadTestEnabled for cross-wiki upload A/B test (duration: 00m 53s)
  • 18:18 jynus: disabling mysql replication eqiad -> codfw on s[1-7] and x1 shards T155099
  • 18:10 thcipriani@naos: Synchronized wmf-config/CommonSettings-labs.php: SWAT: Full path to xvfb-run (beta only change) (duration: 01m 07s)
  • 17:53 marostegui@naos: Synchronized wmf-config/db-codfw.php: Increase db2061 weight (duration: 00m 47s)
  • 17:46 marostegui: Alter table labtestwiki.user_groups on labtestweb2001 - T155605
  • 17:43 bblack: installing varnish 4.1.5-1wm2 on all cache_upload hosts @ codfw (no restarts)
  • 17:41 marostegui@naos: Synchronized wmf-config/db-codfw.php: Increase db2043 and db2061 weight (duration: 00m 49s)
  • 17:36 demon@naos: Synchronized dblists/group0.dblist: moving labtestwiki to group0 (duration: 00m 54s)
  • 17:35 bblack: upgrade cp2024 varnish-be to varnish 4.1.5-1wm2, expiry thread lock/priority workaround T145661
  • 17:28 marostegui@naos: Synchronized wmf-config/db-codfw.php: Increase db2043 and db2061 weight - T163339 (duration: 00m 58s)
  • 17:19 gehel: restarting wdqs-updater for new configuration
  • 17:10 gehel@naos: Finished deploy [wdqs/wdqs@481346a]: (no justification provided) (duration: 01m 47s)
  • 17:08 gehel@naos: Started deploy [wdqs/wdqs@481346a]: (no justification provided)
  • 16:58 marostegui@naos: Synchronized wmf-config/db-codfw.php: Repool db2043 and db2061 with less weight - T163339 (duration: 01m 16s)
  • 16:56 godog: poweroff prometheus2004 for memory upgrade - T163386
  • 16:11 ema: upgrade cp2017 varnish-be to varnish 4.1.5-1wm2, expiry thread lock/priority workaround T145661
  • 15:44 jynus: stopping all slaves on dbstore1001 for maintenance
  • 15:44 godog: poweroff prometheus2003 for memory upgrade - T163386
  • 15:28 mattflaschen@naos: Synchronized wmf-config/CommonSettings.php: T163696: Only copy filter thresholds if they are set (duration: 01m 10s)
  • 15:10 matt_flaschen: GuidedTour/RCFilters/ORES deployment complete and tested
  • 15:09 XioNoX: disabling the bgp session between pfw-codfw and cr2 for T163447
  • 15:07 ema: varnish 4.1.5-1wm2 uploaded to apt.w.o T145661
  • 15:06 matt_flaschen: Preference updates (for ORES on enwiki) done, using naos instead of terbium
  • 14:54 mattflaschen@naos: Synchronized php-1.29.0-wmf.20/extensions/ORES: Make the preference for the "r" flag on the RC page also control highlighting (duration: 00m 48s)
  • 14:50 mattflaschen@naos: Synchronized wmf-config/: Release RC Filters on more wikis and prep changes for that (duration: 00m 53s)
  • 14:39 matt_flaschen: Deployment of T152827 ("Enable GuidedTour on all wikis") complete and tested
  • 14:38 Dereckson: Created linter table on ptwikimedia and dtywiki
  • 14:34 mattflaschen@naos: Synchronized wmf-config/InitialiseSettings.php: Enable GuidedTour on all wikis (duration: 00m 59s)
  • 14:27 marostegui: Deploy alter table on s3 etwiki on watchlist table directly on the master (db1075) - T130067
  • 14:17 marostegui: Stop MySQL db2043 and db2061 for maintenance - https://phabricator.wikimedia.org/T163339
  • 14:14 marostegui@naos: Synchronized wmf-config/db-codfw.php: Depool db2043 and db2061 - T163339 (duration: 01m 08s)
  • 14:14 moritzm: rebooting ms1001 for kernel update to Linux 4.9
  • 14:10 hashar@naos: Finished scap: Full scap for namespaces related changes (T161529 and https://gerrit.wikimedia.org/r/#/c/349864/1) (duration: 16m 06s)
  • 14:09 ema@neodymium: conftool action : set/pooled=yes; selector: name=cp2002.codfw.wmnet,service=varnish-be
  • 14:08 ema: re-pooling cp2002's varnish-be with increased priority for expiry thread T145661
  • 13:57 ema@neodymium: conftool action : set/pooled=no; selector: name=cp2002.codfw.wmnet,service=varnish-be
  • 13:54 hashar@naos: Started scap: Full scap for namespaces related changes (T161529 and https://gerrit.wikimedia.org/r/#/c/349864/1)
  • 13:50 addshore: Initial run of populateCognatePages.php complete. 27,595,121 rows in cognate_pages & 17,263,411 in cognate_titles
  • 13:49 godog: swift eqiad-prod: more weight on ms-be1028 -> ms-be1039 - T160640
  • 13:47 elukey: reimage analytics1003 to Jessie (Oozie/Hive/Camus not available during this timeframe in the Analytics Hadoop cluster)
  • 13:47 marostegui: Deploy unscheduled alter table on silver (labswiki.user_groups) - T159416
  • 13:26 hashar@naos: Synchronized wmf-config/InitialiseSettings.php: Enable user group expiry in production - T159416 (duration: 00m 49s)
  • 13:16 marostegui: Remove replication codfw - eqiad on s3 (db2018 codfw master will not be a slave of eqiad master) - https://phabricator.wikimedia.org/T130067 https://phabricator.wikimedia.org/T147166 T162133
  • 13:14 hashar@naos: Synchronized php-1.29.0-wmf.20/extensions/ProofreadPage/ProofreadPage.namespaces.php: Fix language code for Norwegian (duration: 00m 54s)
  • 13:12 marostegui: Deploy alter table on wikidatawiki.wb_terms on db1082 - T162539 - T163548
  • 13:11 marostegui: Deploy alter table on wikidatawiki.wb_terms on db1063 - T162539 https://phabricator.wikimedia.org/T163548
  • 13:10 hashar@naos: Synchronized wmf-config/InitialiseSettings.php: Make sysops able to grant/remove confirmed user group at cswiki - T163206 (duration: 00m 55s)
  • 13:09 hashar@naos: Synchronized wmf-config/InitialiseSettings.php: Raise autoconfirmed status requirements to 4 days, 10 edits at cswiki - T163207 (duration: 01m 09s)
  • 13:06 hashar@naos: Synchronized wmf-config/InitialiseSettings.php: Set timezone to Asia/Kolkata on wb.wikimedia - T163322 (duration: 00m 44s)
  • 13:05 hashar@naos: Synchronized wmf-config/InitialiseSettings.php: Remove all feeds added in T127176 from RSS whitelist for mw.org - T163217 (duration: 00m 45s)
  • 13:03 hashar@naos: Synchronized wmf-config/InitialiseSettings.php: Enable NewUserMessage on zh_classicalwiki - T163043 (duration: 00m 46s)
  • 12:52 aude@naos: Synchronized wmf-config/Wikibase-production.php: Disable use of new column in wb_terms table for now (duration: 00m 48s)
  • 12:46 aude@naos: Synchronized wmf-config/Wikibase-production.php: (no justification provided) (duration: 00m 47s)
  • 12:41 Dereckson: pt.wikimedia.org and dty.wikipedia.org wikis creation done
  • 12:38 dereckson@naos: Synchronized wmf-config/interwiki.php: +dty +wmpt and other fixes (duration: 00m 48s)
  • 12:28 Dereckson: mwscript extensions/WikimediaMaintenance/filebackend/setZoneAccess.php dtywiki --backend=local-multiwrite (T162874)
  • 12:14 dereckson@naos: Synchronized wmf-config/InitialiseSettings.php: Initial configuration for dty.wikipedia (T161529) (duration: 00m 49s)
  • 12:13 dereckson@naos: Synchronized langlist: +dty (T161529) (duration: 00m 50s)
  • 12:09 dereckson@naos: rebuilt wikiversions.php and synchronized wikiversions files: +dtywiki
  • 12:08 Dereckson: Create dtywiki database (T161529)
  • 12:08 dereckson@naos: Synchronized dblists: +dtywiki (duration: 00m 56s)
  • 12:07 dereckson@naos: Synchronized static/images/project-logos/: Logo for dty.wikipedia (T161529) (duration: 01m 13s)
  • 11:59 Dereckson: Purged https://pt.wikimedia.org/ URL (T126832)
  • 11:55 dereckson@naos: Synchronized multiversion/MWMultiVersion.php: Entry point for pt.wikimedia.org (T126832) (duration: 00m 44s)
  • 11:50 Dereckson: mwscript extensions/WikimediaMaintenance/filebackend/setZoneAccess.php ptwikimedia --backend=local-multiwrite (T126832)
  • 11:48 dereckson@naos: Synchronized wmf-config/InitialiseSettings.php: Initial configuration for pt.wikimedia (T126832)
  • 11:42 dereckson@naos: rebuilt wikiversions.php and synchronized wikiversions files: +pt.wikimedia (T126832)
  • 11:42 dereckson@naos: Synchronized dblists/: Respawn pt.wikimedia configuration (duration: 00m 44s)
  • 11:41 Dereckson: Recreate database for ptwikimedia (T126832)
  • 11:28 dereckson@naos: Synchronized php-1.29.0-wmf.20/languages/messages/MessagesDty.php: Localize namespaces in Doteli (T162872) (duration: 00m 50s)
  • 11:27 dereckson@naos: Synchronized php-1.29.0-wmf.20/extensions/Gadgets/Gadgets.namespaces.php: Localize namespaces in Doteli (T162873) (duration: 00m 44s)
  • 11:26 dereckson@naos: Synchronized php-1.29.0-wmf.20/extensions/Scribunto/Scribunto.namespaces.php: Localize namespaces in Doteli (T162874) (duration: 00m 46s)
  • 11:16 addshore: addshore@wasat:~$ mwscriptwikiset extensions/Cognate/maintenance/populateCognatePages.php wiktionary.dblist --batch-size=1000
  • 11:14 addshore@naos: Synchronized wmf-config/InitialiseSettings-labs.php: Deploy Cognate to production wiktionaries T150182 PT 4/4 (duration: 00m 47s)
  • 11:12 addshore@naos: Synchronized wmf-config/InitialiseSettings.php: Deploy Cognate to production wiktionaries T150182 PT 3/4 (touched) (duration: 00m 52s)
  • 11:02 addshore@naos: Synchronized wmf-config/InitialiseSettings.php: Deploy Cognate to production wiktionaries T150182 PT 3/4 (duration: 00m 57s)
  • 11:01 addshore@naos: Synchronized wmf-config/CommonSettings-labs.php: Deploy Cognate to production wiktionaries T150182 PT 2/4 (duration: 01m 01s)
  • 10:57 addshore@naos: Synchronized wmf-config/CommonSettings.php: Deploy Cognate to production wiktionaries T150182 PT 1/4 (duration: 01m 18s)
  • 10:28 addshore: addshore@wasat:~$ mwscriptwikiset extensions/Cognate/maintenance/populateCognatePages.php wiktionary.dblist
  • 10:27 addshore: 180 rows added to cognate_titles & cognate_pages
  • 10:25 addshore: addshore@wasat:~$ mwscript extensions/Cognate/maintenance/populateCognatePages.php zawiktionary
  • 10:25 addshore: 172 sites added to cognate_sites
  • 10:24 addshore: addshore@wasat:~$ mwscript extensions/Cognate/maintenance/populateCognateSites.php enwiktionary --site-group=wiktionary
  • 10:16 addshore@naos: Finished scap: Add Cognate to extension-list T150182 (duration: 15m 26s)
  • 10:01 addshore@naos: Started scap: Add Cognate to extension-list T150182
  • 10:00 jynus: disabling puppet on app servers for apache config deploy T126832
  • 09:56 addshore@naos: Synchronized wmf-config/InitialiseSettings-labs.php: wmgUseInterwikiSorting true for wiktionaries PT 2/2 (duration: 00m 46s)
  • 09:54 addshore@naos: Synchronized wmf-config/InitialiseSettings.php: wmgUseInterwikiSorting true for wiktionaries PT 1/2 (duration: 00m 47s)
  • 09:51 addshore@naos: Synchronized wmf-config/InitialiseSettings.php: Configure InterwikiSorting orders for Wiktionaries PT 2/2 (duration: 00m 48s)
  • 09:50 addshore@naos: Synchronized wmf-config/InterwikiSortOrders.php: Configure InterwikiSorting orders for Wiktionaries PT 1/2 (duration: 00m 53s)
  • 09:49 jynus: testing mediawiki changes on mwdebug1001
  • 09:44 addshore@naos: Synchronized docroot/noc/conf/InterwikiSortOrders.php.txt: NOOP Add InterwikiSortOrders to noc docroot (docs only) (duration: 01m 00s)
  • 09:42 addshore@naos: Synchronized wmf-config/InitialiseSettings.php: Use group0 to reduce lines for WMDE related config settings (duration: 01m 18s)
  • 09:15 marostegui: Stop MySQL on db1062 to back up its MySQL - T163665
  • 09:14 jynus: dropping ptwikimedia from es1012,es1016,es1018,es2011,es2012,es2013, T126832
  • 09:11 jynus: dropping ptwikimedia from es3 T126832
  • 09:08 jynus: dropping ptwikimedia from es2 T126832
  • 09:04 jynus: dropping ptwikimedia from x1 T126832
  • 08:55 jynus: dropping ptwikimedia from s3 T126832
  • 08:03 marostegui: Deploy alter table enwiki.revision on db1095 (sanitarium2) - T132416
  • 07:34 marostegui@naos: Synchronized wmf-config/db-eqiad.php: Repool db1080 and db1067 (duration: 01m 18s)
  • 06:23 marostegui: Deploy alter table enwiki.revision db1052 (eqiad master) - T132416
  • 06:12 marostegui: Deploy alter table on wikidatawiki.wb_terms on db1087 - https://phabricator.wikimedia.org/T162539 https://phabricator.wikimedia.org/T163548
  • 06:12 marostegui@naos: Synchronized wmf-config/db-eqiad.php: Repool db1092, depool db1087 - T162539 T163548 (duration: 02m 19s)
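The sync and scap entries above end with a "(duration: MMm SSs)" suffix. The following is a minimal sketch, not part of scap or any logging bot, of how that suffix could be converted to seconds for tallying deploy time. The names DURATION_RE and duration_seconds are hypothetical, and the sample entries are copied from this day's log.

```python
import re

# Matches the "(duration: MMm SSs)" suffix seen on sync/scap entries in this log.
# DURATION_RE and duration_seconds are illustrative names, not part of any real tool.
DURATION_RE = re.compile(r"\(duration: (\d+)m (\d+)s\)")


def duration_seconds(entry):
    """Return the logged duration in seconds, or None if the entry has no duration."""
    match = DURATION_RE.search(entry)
    if match is None:
        return None
    minutes, seconds = (int(group) for group in match.groups())
    return minutes * 60 + seconds


# Sample entries copied from the log above.
entries = [
    "13:26 hashar@naos: Synchronized wmf-config/InitialiseSettings.php: "
    "Enable user group expiry in production - T159416 (duration: 00m 49s)",
    "10:16 addshore@naos: Finished scap: Add Cognate to extension-list "
    "T150182 (duration: 15m 26s)",
    "12:41 Dereckson: pt.wikimedia.org and dty.wikipedia.org wikis creation done",
]

durations = [duration_seconds(entry) for entry in entries]
print(durations)
```

Entries without a duration suffix yield None, so manual log lines can be filtered out before summing.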

2017-04-23

  • 19:13 ema: cp2020: restart varnish-be
  • 17:49 jynus: disabling puppet on db2062 and upgrading MariaDB package to 10.1 T116557
  • 03:12 andrewbogott: removing files in /srv/deployment/ocg/postmortem on ocg1003, another case of T162780

2017-04-22

  • 13:41 ema@neodymium: conftool action : set/pooled=no; selector: name=cp2024.codfw.wmnet,service=varnish-be
  • 07:53 jynus: restarting es2019.codfw.wmnet after upgrade
  • 07:43 jynus: powercycling es2019.codfw.wmnet, unresponsive
  • 07:21 jynus@naos: Synchronized wmf-config/db-codfw.php: Depool es2019 (duration: 02m 16s)
  • 03:21 bblack@neodymium: conftool action : set/pooled=yes; selector: name=cp2024.codfw.wmnet,service=varnish-be
  • 02:56 bblack@neodymium: conftool action : set/pooled=no; selector: name=cp2024.codfw.wmnet,service=varnish-be
  • 02:18 bblack@neodymium: conftool action : set/pooled=yes; selector: name=cp2002.codfw.wmnet,service=varnish-be
  • 00:34 bblack@neodymium: conftool action : set/pooled=no; selector: name=cp2002.codfw.wmnet,service=varnish-be

2017-04-21

  • 23:52 bblack@neodymium: conftool action : set/pooled=yes; selector: name=cp2026.codfw.wmnet,service=varnish-be
  • 22:49 bblack@neodymium: conftool action : set/pooled=no; selector: name=cp2026.codfw.wmnet,service=varnish-be
  • 15:06 marostegui@naos: Synchronized wmf-config/db-codfw.php: Increase weight db2071 (duration: 01m 17s)
  • 14:32 marostegui: Analyze revision, logging and page table on s1 db1067 - https://phabricator.wikimedia.org/T116557
  • 14:26 ema: ban objects with CT < 1024 on codfw cache_upload T145661
  • 14:00 moritzm: installing postgresql bugfix update from jessie point release on labsdb1004
  • 13:35 marostegui: Deploy alter table on wikidatawiki.wb_terms on db1092 - T162539 T163548
  • 13:20 marostegui@naos: Synchronized wmf-config/db-eqiad.php: Depool db1092 - T162539 T163548 (duration: 01m 18s)
  • 12:51 akosiaris: reboot puppetmaster1002 for kernel upgrade
  • 12:07 marostegui: Analyze revision, logging and page table on s1 db1080 - T116557
  • 12:07 marostegui@naos: Synchronized wmf-config/db-eqiad.php: Update db1080 depool reason (duration: 01m 18s)
  • 10:35 marostegui@naos: Synchronized wmf-config/db-eqiad.php: Repool db1071 - T163109 (duration: 01m 20s)
  • 09:20 moritzm: rebooting etherpad1001 (running etherpad.wikimedia.org) for update to Linux 4.9
  • 09:10 jynus: stopping and upgrading/reconfiguring db2062 (depooled) T116557
  • 08:49 jynus@naos: Synchronized wmf-config/db-codfw.php: Depool db2062 (duration: 01m 20s)
  • 08:32 akosiaris: looking at tcpircbot (logmsgbot) problems at tegmen
  • 08:20 elukey: rolling restart of aqs (nodejs) on aqs* to pick up upgrades
  • 08:01 moritzm: rolling restart of hhvm on application servers in eqiad to pick up ICU security update
  • 07:47 marostegui: Stop MySQL on db1071 and db1063 to reclone db1063 - T163109
  • 07:43 moritzm: installing further icu security updates
  • 06:21 marostegui: Restart MySQL on db1065 for maintenance - T163351
  • 06:09 marostegui: Deploy alter table enwiki.revision db1067 - T132416

2017-04-20

  • 22:28 twentyafterfour: enable rate limiting in phabricator
  • 22:17 paravoid: setting tw_reuse to 1 on dbproxy1003
  • 21:47 twentyafterfour: started phd on iridium
  • 21:31 twentyafterfour: stopped phd on iridium to reduce load on the database
  • 19:26 Amir1: deploy finished
  • 19:24 Amir1: start of ladsgroup@naos:/srv/mediawiki-staging/php-1.29.0-wmf.20$ scap sync-file php-1.29.0-wmf.20/extensions/ORES/includes/Hooks.php 'Disable ORES in Recentchangeslinked (T163063)'
  • 19:15 mutante: test logging in fundraising channel
  • 19:06 mutante: fixing duplicate ircecho situation - since today it should run from tegmen, the active icinga server
  • 17:51 mutante: restarted icinga-wm (ircecho) to pick up config change
  • 17:13 jynus: stopping replication on db1040
  • 17:09 andrewbogott: disabling puppet on serpens, seaborgium, pollux, dubnium, labservices1001, labservices1002 for tentative rollout of https://gerrit.wikimedia.org/r/#/c/348920/
  • 16:58 jynus: moving GTID s4 eqiad replicas under db1068
  • 16:46 ema: repool varnish-be on cp2017
  • 16:18 ema: depool varnish-be on cp2017
  • 16:08 elukey: uploaded piwik 2.17.1-1 to jessie-wikimedia main
  • 15:17 Amir1: deleting duplicate rows in ores_classification dated after revision 775502802 (dated April 15th) (T163337)
  • 15:16 XioNoX: disabling pybal on lvs2002 for T163323
  • 14:32 moritzm: upgrading tor on radium to 0.2.9.10
  • 14:23 moritzm: rebooting radium (tor relay) for kernel update to Linux 4.9
  • 14:09 moritzm: rebooting osmium for kernel update to Linux 4.9
  • 14:06 gehel: rolling restart of kartotherian / tilerator on maps codfw cluster
  • 13:58 gehel: rolling restart of kartotherian / tilerator on maps eqiad cluster
  • 13:58 marostegui: Stop MySQL on db1068 and db1081 for maintenance - T163110
  • 13:57 jynus: running reset slave all on db2019
  • 13:53 gehel: rolling restart of kartotherian / tilerator on maps-test cluster
  • 13:18 moritzm: restarting hhvm on mw2097/2098 to pick up icu security update
  • 13:11 elukey: upgrading Piwik to 2.17.1 (brief downtime during the maintenance announced)
  • 12:12 elukey: restart Yarn Resource manager on analytics1001 (hadoop master) to pick up new JVM settings
  • 12:11 moritzm: installing icu security updates
  • 11:32 _joe_: removing hack for jobqueue's refreshlinks T163418 from the jobrunners
  • 11:23 jynus: changing db2071 to replicate from db2016
  • 10:32 moritzm: installing remaining dbus updates from jessie point update
  • 10:07 elukey: restart Yarn Resource manager on analytics1002 (hadoop master standby) to pick up new JVM settings
  • 09:47 Amir1: running the cleanup script for ores_classification in enwiki
  • 09:38 _joe_: live-hack redeployed, running scap pull on codfw jobrunners T163418
  • 09:38 _joe_: live-hack redeployed, running scap pull on codfw jobrunners
  • 09:34 hashar@naos: Synchronized rpc/RunJobs.php: Revert "rpc: raise exception instead of die" - causes monitoring spam (duration: 01m 20s)
  • 09:17 _joe_: removed the live hack, running scap pull again on mw2154
  • 09:14 _joe_: scap pull of live hack for T163418 on mw2154
  • 08:47 _joe_: live-patching ./includes/jobqueue/jobs/RefreshLinksJob.php to drop all recursive jobs, T163418
  • 07:59 jynus: shutting down db1080 for cloning and upgrade T163413
  • 07:54 jynus@naos: Synchronized wmf-config/db-codfw.php: Add db2071, depooled (duration: 00m 53s)
  • 07:53 jynus@naos: Synchronized wmf-config/db-eqiad.php: Depool db1080 (duration: 01m 02s)
  • 07:53 marostegui: Deploy alter table enwiki.revision db1065 - https://phabricator.wikimedia.org/T132416
  • 07:31 marostegui@naos: Synchronized wmf-config/db-eqiad.php: Depool db1065 - T132416 (duration: 02m 18s)
  • 07:12 marostegui: Deploy alter table on s4.image on eqiad master db1040 (this will create lag on eqiad - all hosts have been silenced) - https://phabricator.wikimedia.org/T73563
  • 06:39 marostegui: Deploy alter table on s4.oldimage on eqiad master db1040 (this will create lag on eqiad - all hosts have been silenced) - T73563
  • 01:37 mutante: mw2150 - restarted hhvm (had 'thread leakage' alert)
  • 01:28 mutante: ran puppet on all (16) Dell R320 via cumin to add CPU frequency check
  • 00:37 ejegg: updated CiviCRM from 90d679b to 51dbbad

2017-04-19

  • 23:58 ejegg: updated payments-wiki from ccfbf98 to ee7d402
  • 22:37 papaul: OS installation on db2071
  • 21:44 ejegg: updated SmashPig from 17c56b0 to 200f63e
  • 21:37 krinkle@naos: Synchronized php-1.29.0-wmf.20/resources/src/startup.js: I34bbe8edf - Fix js fatal (duration: 01m 20s)
  • 20:08 ejegg: updated payments-wiki from 5398b23 to ccfbf98
  • 19:22 krinkle@naos: Synchronized php-1.29.0-wmf.20/resources/src/mediawiki/mediawiki.js: Ie50bdd (duration: 00m 58s)
  • 19:20 krinkle@naos: Synchronized php-1.29.0-wmf.20/extensions/WikimediaEvents/extension.json: T162604 (duration: 01m 20s)
  • 19:17 XenoRyet: Updated SmashPig from 3db064d to 17c56b0
  • 18:58 ejegg: rolled back payments-wiki to 5398b23
  • 18:56 ejegg: updated payments-wiki from 5398b23 to 68e3ac6
  • 18:27 ariel@naos: Finished deploy [dumps/dumps@ad621e6]: doc fixes thanks to awight (duration: 00m 04s)
  • 18:27 ariel@naos: Started deploy [dumps/dumps@ad621e6]: doc fixes thanks to awight
  • 18:25 ejegg: updated payments-wiki from 36f38f6 to 5398b23
  • 18:19 mobrovac: restbase stopping RB and disabling puppet on restbase1018 due to T163292
  • 18:18 ariel@naos: Finished deploy [dumps/dumps@101f8a4]: page range fixes and standalone scripts (duration: 00m 18s)
  • 18:18 ariel@naos: Started deploy [dumps/dumps@101f8a4]: page range fixes and standalone scripts
  • 17:27 Amir1: mwscript extensions/ORES/maintenance/CleanDuplicateScores.php on all wikis with ORES review tool enabled (T163337)
  • 17:26 thcipriani@naos: Synchronized docroot/noc/index.html: test scap on naos.codfw.wmnetdocroot/noc/index.html: trailing whitespace (duration: 02m 02s)
  • 17:25 mobrovac@naos: Started restart [restbase/deploy@1bfada4]: Restart to stop trying to connect to dead restbase1018 Cassandra instances - T163292
  • 17:08 thcipriani@naos.codfw.wmnet: test
  • 17:03 filippo@naos: Finished deploy [prometheus/jmx_exporter@7327459]: test deploy from naos (duration: 00m 03s)
  • 17:03 filippo@naos: Started deploy [prometheus/jmx_exporter@7327459]: test deploy from naos
  • 17:02 godog: bounce tcpircbot on einsteinium to pick up changes
  • 17:02 _joe_: manually running enwiki refreshLinks jobs to catch up a bit
  • 16:59 papaul: power balancing on mw2215
  • 16:58 Amir1: ladsgroup@naos:~$ mwscript extensions/ORES/maintenance/CleanDuplicateScores.php --wiki=enwiki froze
  • 16:49 Amir1: ladsgroup@naos:~$ mwscript extensions/ORES/maintenance/CleanDuplicateScores.php --wiki=enwiki (T163337)
  • 16:33 godog: deploy.fixurl on G@deployment_target:* after deployment server switchover
  • 16:20 gehel: disabling deprecation warning logs on elasticsearch eqiad - T163345
  • 16:19 jynus: setting db2033 as read write
  • 16:13 godog: run puppet on naos.codfw.wmnet - new deployment server
  • 16:03 gehel: disabling deprecation warning logs on elasticsearch codfw - T163345
  • 15:51 oblivian@puppetmaster1001: conftool action : set/pooled=inactive; selector: dc=codfw,cluster=elasticsearch,name=elastic2020.*
  • 15:49 jynus: shutting down db2033 (x1-master)
  • 15:48 oblivian@puppetmaster1001: conftool action : set/pooled=inactive; selector: dc=codfw,cluster=appserver,name=mw2256.*
  • 15:48 jynus@tin: Synchronized wmf-config/db-codfw.php: Failing over x1-master (duration: 00m 41s)
  • 15:46 gehel@puppetmaster1001: conftool action : set/pooled=no; selector: name=elastic2020.codfw.wmnet
  • 15:42 jynus@tin: Synchronized wmf-config/InitialiseSettings.php: Disable cx_translation- it is causing an outage on x1 (duration: 02m 44s)
  • 15:40 dzahn@puppetmaster2001: conftool action : set/pooled=no; selector: name=mw2256.codfw.wmnet
  • 15:32 mutante: mw2256 went down and showed " PANIC: double fault, error_code: 0x0"
  • 15:16 jynus@tin: Synchronized wmf-config/db-codfw.php: Pool db2055 as an additional API server (duration: 01m 02s)
  • 15:11 _joe_: ran cumin 'R:class = role::mediawiki::jobrunner and *.eqiad.wmnet' 'systemctl reset-failed' manually
  • 15:07 godog: start swiftrepl on ms-fe1005 for codfw switchover
  • 15:04 switchdc: (volans@sarin) END TASK - switchdc.stages.t09_restart_parsoid(eqiad, codfw) Successfully completed
  • 14:53 akosiaris@puppetmaster1001: conftool action : set/pooled=yes; selector: name=mw2256.codfw.wmnet,service=apache2
  • 14:53 akosiaris@puppetmaster1001: conftool action : set/pooled=yes; selector: name=mw2256.codfw.wmnet,service=nginx
  • 14:48 akosiaris@puppetmaster1001: conftool action : set/pooled=no; selector: name=mw2256.codfw.wmnet,service=nginx
  • 14:48 akosiaris@puppetmaster1001: conftool action : set/pooled=no; selector: name=mw2256.codfw.wmnet,service=apache2
  • 14:46 gehel: banning elastic2020 from codfw cluster - T149006
  • 14:46 switchdc: (volans@sarin) START TASK - switchdc.stages.t09_restart_parsoid(eqiad, codfw) Rolling restart parsoid in eqiad and codfw
  • 14:44 oblivian@tin: Synchronized wmf-config/ProductionServices.php: Fix redis locks (duration: 02m 24s)
  • 14:41 akosiaris: powercycle mw2256
  • 14:33 switchdc: (volans@sarin) END TASK - switchdc.stages.t09_tendril(eqiad, codfw) Successfully completed
  • 14:33 switchdc: (volans@sarin) START TASK - switchdc.stages.t09_tendril(eqiad, codfw) Update Tendril configuration for the new masters
  • 14:33 switchdc: (volans@sarin) END TASK - switchdc.stages.t09_start_maintenance(eqiad, codfw) Successfully completed
  • 14:31 switchdc: (volans@sarin) START TASK - switchdc.stages.t09_start_maintenance(eqiad, codfw) Start MediaWiki maintenance in the new master DC
  • 14:31 switchdc: (volans@sarin) END TASK - switchdc.stages.t09_restore_ttl(eqiad, codfw) Successfully completed
  • 14:31 switchdc: (volans@sarin) START TASK - switchdc.stages.t09_restore_ttl(eqiad, codfw) Restore the TTL of all the MediaWiki discovery records
  • 14:30 switchdc: (volans@sarin) END TASK - switchdc.stages.t08_stop_mediawiki_readonly(eqiad, codfw) Successfully completed
  • 14:30 switchdc: (volans@sarin) MediaWiki read-only period ends at: 2017-04-19 14:30:05.678665
  • 14:30 root@tin: Synchronized wmf-config/db-codfw.php: Set MediaWiki in read-write mode in datacenter codfw (duration: 00m 18s)
  • 14:29 switchdc: (volans@sarin) START TASK - switchdc.stages.t08_stop_mediawiki_readonly(eqiad, codfw) Set MediaWiki in read-write mode (db_to config already merged and git pulled)
  • 14:28 switchdc: (volans@sarin) END TASK - switchdc.stages.t07_coredb_masters_readwrite(eqiad, codfw) Successfully completed
  • 14:28 switchdc: (volans@sarin) START TASK - switchdc.stages.t07_coredb_masters_readwrite(eqiad, codfw) set core DB masters in read-write mode
  • 14:25 switchdc: (volans@sarin) END TASK - switchdc.stages.t06_redis(eqiad, codfw) Successfully completed
  • 14:25 switchdc: (volans@sarin) START TASK - switchdc.stages.t06_redis(eqiad, codfw) Switch the Redis replication
  • 14:25 switchdc: (volans@sarin) END TASK - switchdc.stages.t05_switch_traffic(eqiad, codfw) Successfully completed
  • 14:22 switchdc: (volans@sarin) START TASK - switchdc.stages.t05_switch_traffic(eqiad, codfw) Switch traffic flow to the appservers in the new datacenter
  • 14:22 switchdc: (volans@sarin) END TASK - switchdc.stages.t05_switch_datacenter(eqiad, codfw) Successfully completed
  • 14:22 root@tin: Synchronized wmf-config/CommonSettings.php: Switch MediaWiki active datacenter to codfw (duration: 00m 19s)
  • 14:21 switchdc: (volans@sarin) START TASK - switchdc.stages.t05_switch_datacenter(eqiad, codfw) Switch MediaWiki configuration to the new datacenter
  • 14:21 switchdc: (volans@sarin) END TASK - switchdc.stages.t04_cache_wipe(eqiad, codfw) Successfully completed
  • 14:15 switchdc: (volans@sarin) START TASK - switchdc.stages.t04_cache_wipe(eqiad, codfw) wipe and warmup caches
  • 14:15 switchdc: (volans@sarin) END TASK - switchdc.stages.t03_coredb_masters_readonly(eqiad, codfw) Successfully completed
  • 14:15 switchdc: (volans@sarin) START TASK - switchdc.stages.t03_coredb_masters_readonly(eqiad, codfw) set core DB masters in read-only mode
  • 14:14 switchdc: (volans@sarin) END TASK - switchdc.stages.t02_start_mediawiki_readonly(eqiad, codfw) Successfully completed
  • 14:14 root@tin: Synchronized wmf-config/db-eqiad.php: Set MediaWiki in read-only mode in datacenter eqiad (duration: 01m 29s)
  • 14:13 switchdc: (volans@sarin) MediaWiki read-only period starts at: 2017-04-19 14:12:54.007017
  • 14:12 switchdc: (volans@sarin) START TASK - switchdc.stages.t02_start_mediawiki_readonly(eqiad, codfw) Set MediaWiki in read-only mode (db_from config already merged and git pulled)
  • 14:09 switchdc: (volans@sarin) END TASK - switchdc.stages.t01_stop_maintenance(eqiad, codfw) Successfully completed
  • 14:07 switchdc: (volans@sarin) START TASK - switchdc.stages.t01_stop_maintenance(eqiad, codfw) Stop MediaWiki maintenance in the old master DC
  • 14:06 godog: stop swiftrepl on ms-fe1005 for codfw switchover
  • 14:06 switchdc: (volans@sarin) END TASK - switchdc.stages.t00_reduce_ttl(eqiad, codfw) Successfully completed
  • 14:06 switchdc: (volans@sarin) START TASK - switchdc.stages.t00_reduce_ttl(eqiad, codfw) Reduce the TTL of all the MediaWiki discovery records
  • 14:06 switchdc: (volans@sarin) END TASK - switchdc.stages.t00_disable_puppet(eqiad, codfw) Successfully completed
  • 14:05 switchdc: (volans@sarin) START TASK - switchdc.stages.t00_disable_puppet(eqiad, codfw) Disabling puppet on selected hosts
  • 14:00 bblack@neodymium: conftool action : set/pooled=yes; selector: name=cp2014.codfw.wmnet,service=varnish-be
  • 13:42 bblack@neodymium: conftool action : set/pooled=no; selector: name=cp2014.codfw.wmnet,service=varnish-be
  • 13:28 urandom: cqlsh -f /etc/cassandra/adduser.cql, recreating user/perms (as-needed)
  • 12:38 urandom: T163292: Starting removal of Cassandra instance restbase1018-c.eqiad.wmnet
  • 11:36 oblivian:: Setting swift-rw in eqiad DOWN
  • 11:36 oblivian:: Setting swift-rw in codfw UP
  • 11:36 ema: repool varnish-be on cp3044
  • 11:23 godog: add naos to git-deploy term on common-infrastructure4 - T162900
  • 11:03 switchdc: (oblivian@sarin) END TASK - switchdc.stages.t04_cache_wipe(eqiad, codfw) Successfully completed
  • 10:57 switchdc: (oblivian@sarin) START TASK - switchdc.stages.t04_cache_wipe(eqiad, codfw) wipe and warmup caches
  • 10:56 _joe_: running the warmup stage in codfw for final testing
  • 10:41 ema: depool varnish-be on cp3044 because of mailbox lag issues
  • 09:34 moritzm: installing dbus security updates
  • 09:11 elukey: cleaning up ocg1003's /srv/deployment/ocg/postmortem dir (root partition filled up)
  • 07:26 hoo: Updated the sites and site_identifiers tables on all Wikidata clients for T149522.
  • 06:57 switchdc: (oblivian@sarin) END TASK - switchdc.stages.t06_redis(codfw, eqiad) Successfully completed
  • 06:56 switchdc: (oblivian@sarin) START TASK - switchdc.stages.t06_redis(codfw, eqiad) Switch the Redis replication
  • 06:52 _joe_: artificially stopping slave replication on rdb2001 for a final test of the switchover redis stage
  • 03:53 urandom: T163292: Starting removal of Cassandra instance restbase1018-b.eqiad.wmnet
  • 03:49 mobrovac@tin: Started restart [restbase/deploy@1bfada4]: (no justification provided)
  • 03:40 mobrovac@tin: Started restart [restbase/deploy@1bfada4]: Kick RB to pick up restbase1018 instances are gone
  • 03:32 mobrovac@tin: Finished deploy [changeprop/deploy@a19ebf8]: Temp: Decrease the transclusion update from 400 to 200 for T163292 (duration: 00m 53s)
  • 03:31 mobrovac@tin: Started deploy [changeprop/deploy@a19ebf8]: Temp: Decrease the transclusion update from 400 to 200 for T163292
  • 01:58 mutante: naos: rsyncd is of course legitimately running on a deployment server separate from this (unlike in other cases where we used it for syncing during migration), so this was just the one config fragment for /home and not removing the service or anything
  • 01:56 mutante: naos: manually deleting rsyncd config remnants (puppet wouldn't know to clean up after itself)
  • 01:47 mutante: rsyncing /home from mira to naos (T162900)
  • 01:21 urandom: T163292: Starting removal of Cassandra instance restbase1018-a.eqiad.wmnet
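The switchdc entries above come in START TASK / END TASK pairs per stage. As a rough sketch of how those pairs could be matched to estimate how long each stage took, assuming the newest-first ordering used in this log: TASK_RE and stage_durations are hypothetical names and are not part of switchdc. Timestamps in the log have minute resolution only, so results are whole minutes, and a stage spanning midnight is not handled.

```python
import re
from datetime import datetime

# Hypothetical pattern for the switchdc lines in this log:
# "HH:MM switchdc: (user@host) START|END TASK - switchdc.stages.<name>(from, to) ..."
TASK_RE = re.compile(
    r"(\d{2}:\d{2}) switchdc: \(\S+\) (START|END) TASK - "
    r"(switchdc\.stages\.\w+)\((\w+), (\w+)\)"
)


def stage_durations(lines):
    """Map each stage to the whole minutes between its START and END entries.

    Lines are expected newest-first, as in the log, so they are reversed
    to walk them in chronological order.
    """
    started = {}
    result = {}
    for line in reversed(lines):
        match = TASK_RE.search(line)
        if match is None:
            continue  # not a switchdc task line
        clock, kind, stage, _dc_from, _dc_to = match.groups()
        when = datetime.strptime(clock, "%H:%M")
        if kind == "START":
            started[stage] = when
        elif stage in started:
            delta = when - started.pop(stage)
            result[stage] = int(delta.total_seconds() // 60)
    return result


# Sample lines copied from the 2017-04-19 entries above (newest first).
sample = [
    "14:25 switchdc: (volans@sarin) END TASK - switchdc.stages.t05_switch_traffic(eqiad, codfw) Successfully completed",
    "14:22 switchdc: (volans@sarin) START TASK - switchdc.stages.t05_switch_traffic(eqiad, codfw) Switch traffic flow to the appservers in the new datacenter",
    "14:21 switchdc: (volans@sarin) END TASK - switchdc.stages.t04_cache_wipe(eqiad, codfw) Successfully completed",
    "14:15 switchdc: (volans@sarin) START TASK - switchdc.stages.t04_cache_wipe(eqiad, codfw) wipe and warmup caches",
]

minutes_per_stage = stage_durations(sample)
print(minutes_per_stage)
```

An END entry with no preceding START in the input is ignored, so a partial excerpt of the log does not raise an error.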

2017-04-18

  • 23:04 dzahn@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase1018.eqiad.wmnet
  • 23:02 mutante: ms1001 - deleting old GlobalCert SSL cert for dumps.wm that was about to expire and is replaced by Letsencrypt,
  • 22:30 mutante: ocg1003 gzipping ocg.log for disk space
  • 21:12 bblack@neodymium: conftool action : set/pooled=yes; selector: name=cp2002.codfw.wmnet,service=varnish-be
  • 20:36 bblack@neodymium: conftool action : set/pooled=no; selector: name=cp2002.codfw.wmnet,service=varnish-be
  • 17:26 mobrovac@tin: Finished deploy [restbase/deploy@1bfada4]: Blacklist all user pages on commons (duration: 07m 12s)
  • 17:26 ssastry@tin: Finished deploy [parsoid/deploy@b067328]: Deploying Parsoid to bump heap limits to 900m (from 600m) (duration: 06m 25s)
  • 17:19 ssastry@tin: Started deploy [parsoid/deploy@b067328]: Deploying Parsoid to bump heap limits to 900m (from 600m)
  • 17:19 mobrovac@tin: Started deploy [restbase/deploy@1bfada4]: Blacklist all user pages on commons
  • 17:12 XenoRyet: updated tools from a8b8d72 to a1e9342
  • 17:09 elukey: restart nutcracker in codfw (profile::mediawiki::nutcracker) to make sure that all the daemons are running with the latest config
  • 16:26 bblack: completed Traffic-layer portions of codfw switchover ( https://wikitech.wikimedia.org/wiki/Switch_Datacenter#Switchover_2 )
  • 16:21 bblack: starting Traffic-layer portions of codfw switchover ( https://wikitech.wikimedia.org/wiki/Switch_Datacenter#Switchover_2 )
  • 16:15 jynus: reimporting some rows to dbstore1002 on jawiki and ruwiki T160509
  • 16:12 godog: reboot tin to fix cpu mhz issue and check bios settings - T163158
  • 16:09 mobrovac@tin: Finished deploy [restbase/deploy@960b468]: Blacklist an enwiki and a commons page (duration: 08m 16s)
  • 16:01 mobrovac@tin: Started deploy [restbase/deploy@960b468]: Blacklist an enwiki and a commons page
  • 16:00 mobrovac@tin: Finished deploy [restbase/deploy@960b468]: Dev Cluster: Blacklist an enwiki and a commons page (duration: 01m 42s)
  • 15:58 mobrovac@tin: Started deploy [restbase/deploy@960b468]: Dev Cluster: Blacklist an enwiki and a commons page
  • 15:20 elukey: restored default output-buffer config for rdb2005:6479
  • 15:08 godog: puppet-run on cache_upload in codfw/eqiad to pick up swift a/p changes
  • 15:02 godog: puppet-run on cache_upload in codfw/eqiad to pick up switch a/a changes
  • 15:02 gehel: upgrading elastic2020 to elasticsearch 5.1.2
  • 14:55 _joe_: switchover of services, misc things done
  • 14:54 oblivian:: Setting restbase-async in codfw DOWN
  • 14:54 oblivian:: Setting restbase-async in eqiad UP
  • 14:43 _joe_: switching traffic for all a/a services plus maps and restbase to codfw-only
  • 14:38 _joe_: forcing puppet run on caches for catching up with the a/a setting of maps and restbase
  • 14:33 oblivian:: Setting restbase in eqiad DOWN
  • 14:33 _joe_: starting switchover of services eqiad => codfw; external traffic will be switched over, as well as internal traffic to restbase
  • 14:25 gehel: un-ban elastic2020 to get ready for real-life test during switchover - T149006
  • 14:22 elukey: executed config set client-output-buffer-limit "normal 0 0 0 slave 2147483648 2147483648 300 pubsub 33554432 8388608 60" on rdb2005:6479 as an attempt to solve slave lagging - T159850
  • 14:21 oblivian:: Setting mobileapps in eqiad UP
  • 14:14 oblivian:: Setting mobileapps in eqiad DOWN
  • 14:11 elukey: executed CONFIG SET appendfsync everysec (default) to restore defaults on rdb2005:6479- T159850
  • 14:08 switchdc: (oblivian@sarin) END TASK - switchdc.stages.t09_restart_parsoid(codfw, eqiad) Successfully completed
  • 14:04 elukey: executed CONFIG SET appendfsync no on rdb2005:6479 to test if fsync stalls affect replication - T159850
  • 13:50 switchdc: (oblivian@sarin) START TASK - switchdc.stages.t09_restart_parsoid(codfw, eqiad) Rolling restart parsoid in eqiad and codfw
  • 13:35 switchdc: (oblivian@sarin) END TASK - switchdc.stages.t01_stop_maintenance(codfw, eqiad) Failed to execute
  • 13:35 switchdc: (oblivian@sarin) START TASK - switchdc.stages.t01_stop_maintenance(codfw, eqiad) Stop MediaWiki maintenance in the old master DC
  • 12:32 moritzm: upgrading labnodepool1001 to Linux 4.9
  • 12:13 moritzm: upgraded mw1261 to HHVM 3.18.2+wmf2
  • 11:39 switchdc: (volans@sarin) END TASK - switchdc.stages.t09_start_maintenance(codfw, eqiad) Successfully completed
  • 11:38 switchdc: (volans@sarin) START TASK - switchdc.stages.t09_start_maintenance(codfw, eqiad) Start MediaWiki maintenance in the new master DC
  • 11:37 switchdc: (volans@sarin) END TASK - switchdc.stages.t09_tendril(codfw, eqiad) Successfully completed
  • 11:37 switchdc: (volans@sarin) START TASK - switchdc.stages.t09_tendril(codfw, eqiad) Update Tendril configuration for the new masters
  • 11:35 switchdc: (volans@sarin) END TASK - switchdc.stages.t09_tendril(eqiad, codfw) Successfully completed
  • 11:35 switchdc: (volans@sarin) START TASK - switchdc.stages.t09_tendril(eqiad, codfw) Update Tendril configuration for the new masters
  • 11:34 switchdc: (volans@sarin) END TASK - switchdc.stages.t09_tendril(codfw, eqiad) Successfully completed
  • 11:34 switchdc: (volans@sarin) START TASK - switchdc.stages.t09_tendril(codfw, eqiad) Update Tendril configuration for the new masters
  • 11:33 switchdc: (volans@sarin) END TASK - switchdc.stages.t09_restore_ttl(codfw, eqiad) Successfully completed
  • 11:33 switchdc: (volans@sarin) START TASK - switchdc.stages.t09_restore_ttl(codfw, eqiad) Restore the TTL of all the MediaWiki discovery records
  • 11:31 switchdc: (volans@sarin) END TASK - switchdc.stages.t08_stop_mediawiki_readonly(codfw, eqiad) Successfully completed
  • 11:31 switchdc: (volans@sarin) START TASK - switchdc.stages.t08_stop_mediawiki_readonly(codfw, eqiad) Set MediaWiki in read-write mode (db_to config already merged and git pulled)
  • 11:30 switchdc: (volans@sarin) END TASK - switchdc.stages.t07_coredb_masters_readwrite(codfw, eqiad) Successfully completed
  • 11:30 switchdc: (volans@sarin) START TASK - switchdc.stages.t07_coredb_masters_readwrite(codfw, eqiad) set core DB masters in read-write mode
  • 11:18 switchdc: (volans@sarin) END TASK - switchdc.stages.t06_redis(codfw, eqiad) Successfully completed
  • 11:18 switchdc: (volans@sarin) START TASK - switchdc.stages.t06_redis(codfw, eqiad) Switch the Redis replication
  • 11:14 moritzm: upgrading logstash* to Linux 4.9
  • 10:58 switchdc: (volans@sarin) END TASK - switchdc.stages.t05_switch_traffic(codfw, eqiad) Successfully completed
  • 10:56 switchdc: (volans@sarin) START TASK - switchdc.stages.t05_switch_traffic(codfw, eqiad) Switch traffic flow to the appservers in the new datacenter
  • 10:56 switchdc: (volans@sarin) END TASK - switchdc.stages.t05_switch_datacenter(codfw, eqiad) Successfully completed
  • 10:55 switchdc: (volans@sarin) START TASK - switchdc.stages.t05_switch_datacenter(codfw, eqiad) Switch MediaWiki configuration to the new datacenter
  • 10:48 switchdc: (volans@sarin) END TASK - switchdc.stages.t03_coredb_masters_readonly(codfw, eqiad) Failed to execute
  • 10:48 switchdc: (volans@sarin) START TASK - switchdc.stages.t03_coredb_masters_readonly(codfw, eqiad) set core DB masters in read-only mode
  • 10:43 switchdc: (volans@sarin) END TASK - switchdc.stages.t02_start_mediawiki_readonly(codfw, eqiad) Successfully completed
  • 10:43 switchdc: (volans@sarin) START TASK - switchdc.stages.t02_start_mediawiki_readonly(codfw, eqiad) Set MediaWiki in read-only mode (db_from config already merged and git pulled)
  • 10:33 switchdc: (volans@sarin) END TASK - switchdc.stages.t01_stop_maintenance(codfw, eqiad) Failed to execute
  • 10:33 switchdc: (volans@sarin) START TASK - switchdc.stages.t01_stop_maintenance(codfw, eqiad) Stop MediaWiki maintenance in the old master DC
  • 10:31 switchdc: (volans@sarin) END TASK - switchdc.stages.t00_reduce_ttl(codfw, eqiad) Successfully completed
  • 10:31 switchdc: (volans@sarin) START TASK - switchdc.stages.t00_reduce_ttl(codfw, eqiad) Reduce the TTL of all the MediaWiki discovery records
  • 10:31 switchdc: (volans@sarin) END TASK - switchdc.stages.t00_disable_puppet(codfw, eqiad) Successfully completed
  • 10:31 switchdc: (volans@sarin) START TASK - switchdc.stages.t00_disable_puppet(codfw, eqiad) Disabling puppet on selected hosts
  • 10:28 switchdc: (volans@sarin) END TASK - switchdc.stages.t00_reduce_ttl(codfw, eqiad) Failed to execute
  • 10:28 switchdc: (volans@sarin) START TASK - switchdc.stages.t00_reduce_ttl(codfw, eqiad) Reduce the TTL of all the MediaWiki discovery records
  • 10:26 switchdc: (volans@sarin) END TASK - switchdc.stages.t00_disable_puppet(codfw, eqiad) Successfully completed
  • 10:26 switchdc: (volans@sarin) START TASK - switchdc.stages.t00_disable_puppet(codfw, eqiad) Disabling puppet on selected hosts
  • 10:25 volans: Final test of switchdc steps in the codfw->eqiad configuration, only idempotent changes, T160178
  • 10:25 moritzm: installing wireshark security updates
  • 10:20 moritzm: uploaded HHVM 3.18.2+wmf2 for jessie-wikimedia/experimental (includes fix for T162354)
  • 09:52 oblivian: Setting zotero in codfw UP
  • 09:50 _joe_: testing switchover script for services, will act on zotero in codfw
  • 09:45 _joe_: adding 60G to the ocg output partition on ocg1003
  • 09:17 oblivian@neodymium: conftool action : set/pooled=true; selector: dnsdisc=zotero,name=codfw
  • 09:03 volans: upgrading conftool to v0.4.1 on neodymium/sarin
  • 07:48 _joe_: uploaded python-conftool 0.4.1 to jessie-wikimedia
  • 07:42 _joe_: cleaning up orphaned COW images in /var/cache/pbuilder/build/ on copper
  • 06:16 marostegui: For the record: restarted s7 instance on db1069 - T163183
  • 00:36 catrope@tin: Synchronized php-1.29.0-wmf.20/extensions/MobileFrontend/resources/mobile.mainMenu/mainmenu.less: T163059 (duration: 03m 07s)
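
The switchdc entries above follow a regular START TASK / END TASK shape, so failed stages can be pulled out mechanically. The following is a minimal sketch, assuming the line layout seen in this log; the regular expression and the function name `failed_stages` are illustrative and not part of switchdc itself.

```python
import re

# Pattern inferred from the entries visible in this log (an assumption,
# not taken from the switchdc source code).
TASK_RE = re.compile(
    r"(?P<time>\d{2}:\d{2}) switchdc: \((?P<who>[^)]+)\) "
    r"(?P<kind>START|END) TASK - (?P<stage>switchdc\.stages\.\w+)"
    r"\((?P<dc_from>\w+), (?P<dc_to>\w+)\) (?P<message>.*)"
)


def failed_stages(lines):
    """Return (time, stage) for every END TASK entry that reports a failure."""
    failures = []
    for line in lines:
        match = TASK_RE.search(line)
        if match and match["kind"] == "END" and match["message"].startswith("Failed"):
            failures.append((match["time"], match["stage"]))
    return failures


sample = [
    "10:48 switchdc: (volans@sarin) END TASK - "
    "switchdc.stages.t03_coredb_masters_readonly(codfw, eqiad) Failed to execute",
    "10:43 switchdc: (volans@sarin) END TASK - "
    "switchdc.stages.t02_start_mediawiki_readonly(codfw, eqiad) Successfully completed",
]
print(failed_stages(sample))
```

A failure reported here is not necessarily a problem: several of these runs were tests against a configuration that was already switched, where some verification steps were expected to fail.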

2017-04-17

  • 23:37 mutante: running rmmod acpi_pad on the 16 R320 via cumin, since blacklisting in puppet does not actively remove, confirmed unloaded. (16/16) success ratio (>= 100.0% threshold) for command: 'lsmod|grep -c acpi_pad ||:' (T162850)
  • 23:33 mutante: running puppet via cumin on all 16 Dell PowerEdge R320, adding blacklist file for acpi_pad kernel module. 15/16 success, all but tin (T162850)
  • 22:46 catrope@tin: Synchronized php-1.29.0-wmf.20/extensions/WikimediaEvents/modules/ext.wikimediaEvents.recentChangesClicks.js: T158458 T163152 (duration: 03m 01s)
  • 22:42 mutante: tin - load average going down, acpi_pad processes gone, cpu usage low again (T163158)
  • 22:40 mutante: tin - rmmod acpi_pad (T163158)
  • 22:08 catrope@tin: Synchronized php-1.29.0-wmf.20/extensions/WikimediaEvents/modules/ext.wikimediaEvents.recentChangesClicks.js: T158458 T163152 (duration: 16m 23s)
  • 19:16 mutante: tegmen test ircecho stop/start service to confirm it's fine on jessie/prod icinga role (that's the passive server)
  • 19:02 demon@tin: Synchronized wmf-config/: Pruning some old extension message files, co-master sync (duration: 01m 52s)
  • 18:58 demon@tin: Pruned MediaWiki: 1.29.0-wmf.15 (duration: 00m 14s)
  • 18:46 maxsem@tin: Finished deploy [tilerator/deploy@001811e]: https://gerrit.wikimedia.org/r/#/c/348224/ to test hosts only (duration: 00m 19s)
  • 18:46 maxsem@tin: Started deploy [tilerator/deploy@001811e]: https://gerrit.wikimedia.org/r/#/c/348224/ to test hosts only
  • 18:45 maxsem@tin: scap aborted: https://gerrit.wikimedia.org/r/#/c/348224/ to test hosts only (duration: 00m 19s)
  • 18:45 maxsem@tin: Started scap: https://gerrit.wikimedia.org/r/#/c/348224/ to test hosts only
  • 15:48 mobrovac@tin: Finished deploy [restbase/deploy@6595298]: Update client caching headers for T161284 (duration: 08m 15s)
  • 15:40 mobrovac@tin: Started deploy [restbase/deploy@6595298]: Update client caching headers for T161284
  • 15:34 mobrovac@tin: Finished deploy [restbase/deploy@6595298]: (no justification provided) (duration: 01m 29s)
  • 15:33 mobrovac@tin: Started deploy [restbase/deploy@6595298]: (no justification provided)
  • 15:32 mobrovac@tin: Finished deploy [restbase/deploy@6595298]: (no justification provided) (duration: 01m 42s)
  • 15:31 mobrovac@tin: Started deploy [restbase/deploy@6595298]: (no justification provided)
  • 09:33 marostegui: Silence alerts for restbase2004 and restbase2009 T160759

2017-04-16

  • 15:44 elukey: restart ocg on ocg1003 to clean up deleted files in lsof
  • 15:35 elukey: executing sudo find -name *.pdf -mtime +3 -exec rm {} \; on ocg1003's /srv/deployment/ocg/output to clean up some disk space - T162780
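
The cleanup above removes PDF files by age. Below is a dry-run sketch of the same selection in Python, assuming only that the goal is "PDF files last modified more than three whole days ago"; the helper name `old_pdfs` is hypothetical, and it lists candidates rather than deleting them.

```python
import time
from pathlib import Path

SECONDS_PER_DAY = 86400


def old_pdfs(root, max_age_days=3, now=None):
    """List *.pdf files under root whose age in whole days exceeds max_age_days.

    This mirrors how find's `-mtime +N` truncates the age to whole days
    before comparing, so `+3` matches files at least four days old.
    """
    now = time.time() if now is None else now
    matches = []
    for path in Path(root).rglob("*.pdf"):
        if not path.is_file():
            continue
        age_days = int((now - path.stat().st_mtime) // SECONDS_PER_DAY)
        if age_days > max_age_days:
            matches.append(path)
    return sorted(matches)


if __name__ == "__main__":
    # Dry run against the current directory: print what would be removed.
    for candidate in old_pdfs("."):
        print(candidate)
```

Listing first and deleting in a separate step makes it easier to confirm the selection before freeing the disk space.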

2017-04-14

  • 23:14 jynus: skipping CREATE DATABASE wbwikimedia on dbstore2001- duplicate declaration due to multi-source
  • 22:58 jynus: skipping CREATE DATABASE pawikisource on dbstore2001- duplicate declaration due to multi-source
  • 22:49 volans: restarting parsoid to get the disable linter change T148609
  • 22:17 Reedy: created linter tables on wbwikimedia T148609
  • 22:16 Reedy: created linter tables on pawikisource T148609
  • 20:53 reedy@tin: Synchronized wmf-config/InitialiseSettings.php: Disable Linter on larger wikis T148609 (duration: 00m 41s)
  • 20:26 reedy@tin: Synchronized wmf-config/abusefilter.php: abusefilter-modify-restricted for trwiki T161960 (duration: 01m 38s)
  • 17:48 mutante: mw1297 - restarted hhvm and apache
  • 17:07 twentyafterfour: deployed phabricator hotfix for T162943
  • 10:29 elukey: rollback sysctl settings on mw1306 after experiment (stop jobchron/runner, stop hhvm, restore sysctl settings, restart hhvm and job* daemons)
  • 09:50 elukey: temporarily set sysctl -w net.netfilter.nf_conntrack_max=524288 on mw1306 (jobrunner) as test - (rollback: sysctl -w net.netfilter.nf_conntrack_max=262144)
  • 09:43 elukey: temporarily set sysctl -w net.ipv4.ip_local_port_range="15000 64000" on mw1306 (jobrunner) as test - (rollback: sysctl -w net.ipv4.ip_local_port_range="32768 60999") - T157968
  • 08:32 elukey: restored appendfsync to 'everysec' on Redis rdb2005:6380 (end of performance experiment)
  • 07:23 elukey: executed CONFIG SET appendfsync no on redis2005:6780 as performance test
  • 00:39 niharika29@tin: Synchronized wmf-config/abusefilter.php: Fix Abuse Filter configuration for tr.wikipedia (T161960) (duration: 00m 42s)
  • 00:30 niharika29@tin: Finished scap: Reword ORES preferences (T162831), Put ORES r behind a preference (T162831), Deploy Special:Autoblocklist (T146414) (duration: 24m 44s)
  • 00:05 niharika29@tin: Started scap: Reword ORES preferences (T162831), Put ORES r behind a preference (T162831), Deploy Special:Autoblocklist (T146414)
  • 00:03 mutante: mw1297 - restart hhvm/apache
  • 00:03 niharika29@tin: Synchronized wmf-config/InitialiseSettings.php: Remove use of blacklist for related pages feature (T162201) (duration: 00m 41s)
  • 00:02 niharika29@tin: Synchronized wmf-config/CommonSettings.php: Remove use of blacklist for related pages feature (T162201) (duration: 00m 41s)
  • 00:00 mutante: mw1293 - restart hhvm

2017-04-13

  • 23:56 niharika29@tin: Synchronized wmf-config/InitialiseSettings.php: Retry sync Revert Remove use of blacklist for related pages feature (T162201) (duration: 00m 40s)
  • 23:51 niharika29@tin: Synchronized wmf-config/InitialiseSettings.php: Revert Remove use of blacklist for related pages feature (T162201) (duration: 00m 41s)
  • 23:43 niharika29@tin: Synchronized wmf-config/CommonSettings.php: Remove use of blacklist for related pages feature (T162201) (duration: 00m 40s)
  • 23:41 niharika29@tin: Synchronized wmf-config/InitialiseSettings.php: Remove use of blacklist for related pages feature (T162201) (duration: 00m 40s)
  • 23:39 niharika29@tin: Synchronized wmf-config/InitialiseSettings.php: Enable related pages on Vector for htwiki (T126826) (duration: 00m 41s)
  • 23:26 niharika29@tin: Synchronized php-1.29.0-wmf.20/extensions/CirrusSearch/: Revert Workaround OOM issue on ngrams field (duration: 00m 54s)
  • 23:19 Dereckson: Create account for Jayantanth on wb.wikimedia (bureaucrat)
  • 23:09 dereckson@tin: Synchronized wmf-config/interwiki.php: DMOZ, pa.wikisource and wb.wikimedia interwiki map update (duration: 00m 41s)
  • 23:01 Dereckson: Create local-multiwrite stores for wb.wikimedia (T162510)
  • 23:01 dereckson@tin: Synchronized wmf-config/InitialiseSettings.php: Initial configuration for wb.wikimedia.org (T162510) (duration: 00m 40s)
  • 23:00 Dereckson: Create Translate extension tables for wb.wikimedia (T162510)
  • 22:59 dereckson@tin: Synchronized multiversion/MWMultiVersion.php: Add wb.wikimedia.org to wikimedia.org domains to serve as wikis (T162510) (duration: 00m 40s)
  • 22:59 dereckson@tin: rebuilt wikiversions.php and synchronized wikiversions files: Create wb.wikimedia.org (T162510)
  • 22:58 dereckson@tin: Synchronized dblists: Create wb.wikimedia.org (T162510) (duration: 00m 41s)
  • 22:47 dereckson@tin: Synchronized static/images/project-logos/: Logos for wb.wikimedia (T162510) (duration: 00m 41s)
  • 22:32 dereckson@tin: Synchronized wmf-config/InitialiseSettings.php: pa.wikisource creation (take two) (duration: 00m 41s)
  • 22:31 dereckson@tin: Synchronized w/static/images/project-logos/: pa.wikisource creation (take two) (duration: 00m 40s)
  • 22:30 dereckson@tin: rebuilt wikiversions.php and synchronized wikiversions files: pa.wikisource creation (take two)
  • 22:30 dereckson@tin: Synchronized dblists: pa.wikisource creation (take two) (duration: 00m 41s)
  • 22:15 dereckson@tin: Synchronized wmf-config/InitialiseSettings.php: Initial configuration for pa.wikisource (T149522) (duration: 00m 41s)
  • 22:14 dereckson@tin: Synchronized static/images/project-logos/: Logos for pa.wikisource (T149522) (duration: 00m 41s)
  • 22:12 dereckson@tin: rebuilt wikiversions.php and synchronized wikiversions files: (no justification provided)
  • 22:12 dereckson@tin: Synchronized dblists: pa.wikisource creation (T149522) (duration: 00m 41s)
  • 21:56 demon@tin: Finished scap: pruned cdb files from wmf.18 (duration: 07m 55s)
  • 21:48 demon@tin: Started scap: pruned cdb files from wmf.18
  • 20:07 urandom: T161243: Clearing all snapshots
  • 19:45 ejegg: updated civicrm from 908b9c1 to 90d679b
  • 19:43 ejegg: updated SmashPig from ab52dbe to 3db064d
  • 19:16 demon@tin: rebuilt wikiversions.php and synchronized wikiversions files: group2 to wmf.20
  • 18:57 reedy@tin: Synchronized wmf-config/InitialiseSettings.php: Clean Wikisource namespaces T46320 (duration: 00m 43s)
  • 18:42 reedy@tin: Synchronized wmf-config/InitialiseSettings.php: Enable Education Program on it.wikiversity T162692 (duration: 00m 43s)
  • 18:38 reedy@tin: Synchronized php-1.29.0-wmf.20/extensions/LiquidThreads: Remove extra parameter from hook (duration: 00m 45s)
  • 18:35 reedy@tin: Synchronized wmf-config/abusefilter.php: Enable AbuseFilter blocks on tr.wikipedia T161960 (duration: 00m 43s)
  • 18:30 reedy@tin: Synchronized wmf-config/InitialiseSettings.php: Enable NewUserMessage on tr.wikiquote T161962 (duration: 00m 43s)
  • 18:30 urandom: T161243: Truncating parsoid tables (wikimedia storage group)
  • 18:29 mutante: restarting jenkins service to apply logging change gerrit:347877. it was already tested on jenkinstest.integration.eqiad.wmflabs
  • 18:25 reedy@tin: Synchronized php-1.29.0-wmf.20/extensions/Wikidata: Stop some logspam for deprecated hooks (duration: 02m 06s)
  • 18:23 reedy@tin: Synchronized php-1.29.0-wmf.20/extensions/WikimediaEvents: Stop some logspam for deprecated hooks (duration: 00m 43s)
  • 18:21 reedy@tin: Synchronized php-1.29.0-wmf.20/extensions/LiquidThreads: Stop some logspam for deprecated hooks (duration: 00m 45s)
  • 18:19 reedy@tin: Synchronized php-1.29.0-wmf.19/extensions/Wikidata: Stop some logspam for deprecated hook usage (duration: 02m 14s)
  • 18:16 urandom: T161243: Truncating parsoid tables (default storage group)
  • 18:16 reedy@tin: Synchronized wmf-config/InitialiseSettings.php: Document EducationProgram config (duration: 00m 43s)
  • 18:12 reedy@tin: Synchronized wmf-config/InitialiseSettings.php: Set wgUsejQueryThree to false everywhere ahead of further testing (duration: 00m 43s)
  • 18:09 reedy@tin: Synchronized wmf-config/CommonSettings-labs.php: Run 3d2png with xvfb-run on beta (duration: 00m 43s)
  • 16:55 elukey: restored default value of client-output-buffer-limit on rdb1007:6379 - T159850
  • 16:23 mobrovac@tin: Finished deploy [citoid/deploy@b8c4cb2]: Test deploy for T162814 (duration: 02m 24s)
  • 16:21 mobrovac@tin: Started deploy [citoid/deploy@b8c4cb2]: Test deploy for T162814
  • 16:15 thcipriani@tin: Synchronized README: scap.cfg change test (duration: 00m 44s)
  • 15:49 mobrovac@tin: Finished deploy [citoid/deploy@212800d]: Enable multiple results for T115248 and remove b/c for T114515 (duration: 03m 10s)
  • 15:46 mobrovac@tin: Started deploy [citoid/deploy@212800d]: Enable multiple results for T115248 and remove b/c for T114515
  • 15:02 andrewbogott: disabling puppet on dubnium and pollux for a cautious merge of https://gerrit.wikimedia.org/r/#/c/348071
  • 15:01 andrewbogott: disabling puppet on seaborgium and serpens for a cautious merge of https://gerrit.wikimedia.org/r/#/c/348071
  • 14:56 ppchelko@tin: Finished deploy [changeprop/deploy@e47afea]: Provide separate rules for ORES precaching in both DCs (duration: 00m 58s)
  • 14:55 ppchelko@tin: Started deploy [changeprop/deploy@e47afea]: Provide separate rules for ORES precaching in both DCs
  • 14:50 moritzm: installing bouncycastle security updates
  • 14:27 bblack: disabling puppet on recnds/ntp boxes to control patch rollout
  • 13:28 moritzm: powercycling thumbor1001, stuck in reboot
  • 13:18 hashar@tin: Synchronized wmf-config/InitialiseSettings.php: (no justification provided) (duration: 00m 43s)
  • 13:16 hashar@tin: Synchronized dblists/closed.dblist: Close wikimania2016 - T161183 (duration: 00m 43s)
  • 13:14 hashar@tin: Synchronized static/images/project-logos: (no justification provided) (duration: 00m 46s)
  • 13:00 moritzm: Upgrading thumbor* to Linux 4.9
  • 12:52 elukey: temporarily set config set client-output-buffer-limit "slave 5368709120 5368709120 180" on rdb1007:6379
  • 12:34 volans@tin: Synchronized wmf-config/db-eqiad.php: Use a generic retry for the read only message T160178 (duration: 00m 44s)
  • 12:34 elukey: temporarily set config set client-output-buffer-limit "slave 3221225472 3221225472 180" on rdb1007:6379
  • 12:22 volans@tin: Synchronized wmf-config/db-codfw.php: Use a generic retry for the read only message T160178 (duration: 01m 54s)
  • 12:16 moritzm: restarting ntp on achernar
  • 11:59 elukey: temporarily set config set client-output-buffer-limit "slave 2536870912 2536870912 60" on rdb1007:6379
  • 11:37 elukey: temporarily set config set client-output-buffer-limit "slave 2147483648 2147483648 60" on rdb1007:6379 to give time to rdb2005's replication to catch up - T159850
  • 10:58 moritzm: rebooting alsafi to Linux 4.9
  • 10:58 moritzm: rebooting alfafi to Linux 4.9
  • 10:47 elukey: reverted previous config for Redis rdb2005
  • 10:47 XioNoX: Confirmed we can still reach cr2-knams:lo0 via v6 (from esams), disabling IPv4 transit sessions for T162601
  • 10:42 XioNoX: disable V6 transit BGP session on cr2-knams for T162601
  • 10:22 elukey: executed CONFIG SET appendfsync no (prev value: "everysec") to Redis instance 6380 on rdb2005 - T125735
  • 10:13 godog: upgrade thumbor to 0.1.38
  • 10:08 moritzm: rebooting restbase1016 to Linux 4.9
  • 09:39 moritzm: rebooting restbase1011 to Linux 4.9
  • 09:12 moritzm: rebooting restbase1010 to Linux 4.9
  • 06:29 elukey: re-arm keyholder on mira after reboot
  • 06:14 elukey: powercycle mira - eth0 errors in the dmesg, CPU system utilization skyrocketed
  • 04:14 mutante: ms-be2023 is rebooting
  • 04:12 mutante: ms-be2023 icinga alerts, no more swift processes. can't ssh to it. attempt to power cycle. mgmt console enormous spam of "rejecting I/O to offline device"
  • 01:58 bblack@neodymium: conftool action : set/pooled=yes; selector: name=achernar.wikimedia.org,dc=codfw,cluster=dns,service=pdns_recursor
  • 00:36 catrope@tin: Finished scap: Split RCFilters GuidedTour messages for ORES vs non-ORES (T162693) (duration: 53m 47s)
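
The Redis `client-output-buffer-limit` values logged above are raw byte counts in the form `<class> <hard limit> <soft limit> <soft seconds>`. A small sketch for reading them in GiB follows; the helper names `parse_buffer_limit` and `to_gib` are illustrative only.

```python
def parse_buffer_limit(value):
    """Split '<class> <hard> <soft> <seconds>' into a dict with sizes in bytes."""
    client_class, hard, soft, seconds = value.split()
    return {
        "class": client_class,
        "hard_bytes": int(hard),
        "soft_bytes": int(soft),
        "soft_seconds": int(seconds),
    }


def to_gib(size_bytes):
    """Convert a byte count to GiB (2**30 bytes)."""
    return size_bytes / 2**30


# Values copied from the rdb1007:6379 entries in this log, oldest first.
logged = [
    "slave 2147483648 2147483648 60",
    "slave 2536870912 2536870912 60",
    "slave 3221225472 3221225472 180",
    "slave 5368709120 5368709120 180",
]
for value in logged:
    limit = parse_buffer_limit(value)
    print(f"{limit['class']}: hard {to_gib(limit['hard_bytes']):.2f} GiB, "
          f"soft {to_gib(limit['soft_bytes']):.2f} GiB over {limit['soft_seconds']}s")
```

Read this way, the entries show the limit being raised in steps from 2 GiB to 5 GiB while the replica caught up, before the default was restored at 16:55.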

2017-04-12

  • 23:42 catrope@tin: Started scap: Split RCFilters GuidedTour messages for ORES vs non-ORES (T162693)
  • 23:37 catrope@tin: Synchronized php-1.29.0-wmf.20/extensions/MobileFrontend/: Log only infoboxes which are not a direct children of lead section (T149884) (duration: 01m 05s)
  • 23:35 catrope@tin: Synchronized php-1.29.0-wmf.20/resources/src/mediawiki.widgets: Fix setDisabled in mw.widgets.Complex* (T162667) (duration: 00m 42s)
  • 23:32 catrope@tin: Synchronized php-1.29.0-wmf.19/resources/src/mediawiki.widgets: Fix setDisabled in mw.widgets.Complex* (T162667) (duration: 00m 44s)
  • 23:25 awight: rebuilt and reenabled process-control jobs
  • 23:20 catrope@tin: Synchronized wmf-config/InitialiseSettings.php: Disable cross-wiki uploads to Commons (T162374) (duration: 00m 43s)
  • 23:19 cwd: removed p-c crontab to stop all jobs
  • 23:15 bblack@neodymium: conftool action : set/pooled=no; selector: name=achernar.wikimedia.org,dc=codfw,cluster=dns,service=pdns_recursor
  • 23:13 catrope@tin: Synchronized wmf-config/InitialiseSettings.php: Enable wgCiteResponsiveReferences on cawiki (T161307) and bgwiki (T162145) (duration: 00m 44s)
  • 23:02 bblack@neodymium: conftool action : set/pooled=yes; selector: name=acamar.wikimedia.org,dc=codfw,cluster=dns,service=pdns_recursor
  • 22:50 bblack: acamar fixed up BIOS: HT disabled and power mgmt was set to PPW (DAPC) instead of PPW (OS)
  • 22:45 bblack: downtiming acamar again to fixup bios stuff (HT at least)
  • 21:31 Dereckson: Create Education Program tables on it.wikiversity (T162692)
  • 20:44 legoktm@tin: Synchronized wmf-config/InitialiseSettings.php: Deploy Linter to all wikis - T148609 (duration: 00m 44s)
  • 20:42 bblack@neodymium: conftool action : set/pooled=no; selector: name=acamar.wikimedia.org,dc=codfw,cluster=dns,service=pdns_recursor
  • 20:25 mutante: planet2001 - manually updating all feeds to make it active (or would have to wait for crons)
  • 20:12 ssastry@tin: Finished deploy [parsoid/deploy@323cebb]: Updating Parsoid to 75debae3 (duration: 09m 16s)
  • 20:07 mutante: planet2001 - activating all the crons, making planet active/active eqiad/codfw
  • 20:03 ssastry@tin: Started deploy [parsoid/deploy@323cebb]: Updating Parsoid to 75debae3
  • 19:42 bd808@tin: Synchronized wmf-config/mc.php: Revert "wikitech: Enable binary memcached protocol" (duration: 00m 43s)
  • 19:05 demon@tin: rebuilt wikiversions.php and synchronized wikiversions files: group1 to wmf.20
  • 19:05 XenoRyet: reverted SmashPig from aede277 to ab52dbe
  • 19:05 demon@tin: Synchronized php: symlink bump (duration: 00m 42s)
  • 19:04 ejegg: updated payments-wiki from 0b396a3 to 36f38f6
  • 18:52 XenoRyet: updated SmashPig from ab52dbe to aede277
  • 18:45 thcipriani@tin: Synchronized php-1.29.0-wmf.20/extensions/MobileFrontend: SWAT: formatter: Increase log level of infobox message T149884 (duration: 00m 46s)
  • 18:44 ppchelko@tin: Finished deploy [changeprop/deploy@e403f56]: Config: Send ORES precache requests to both DCs. Attempt #2. T159615 (duration: 01m 15s)
  • 18:43 ppchelko@tin: Started deploy [changeprop/deploy@e403f56]: Config: Send ORES precache requests to both DCs. Attempt #2. T159615
  • 18:38 thcipriani@tin: Synchronized php-1.29.0-wmf.20/extensions/MobileFrontend: SWAT: formatter: Change log channel of infobox message T149884 (duration: 00m 46s)
  • 18:37 ppchelko@tin: Finished deploy [changeprop/deploy@0a9a008]: Config: Send ORES precache requests to both DCs. T159615 (duration: 06m 53s)
  • 18:30 ppchelko@tin: Started deploy [changeprop/deploy@0a9a008]: Config: Send ORES precache requests to both DCs. T159615
  • 18:26 thcipriani@tin: Synchronized php-1.29.0-wmf.20/extensions/MobileFrontend: SWAT: setMobileOptions at time of skin creation T125588 (duration: 00m 46s)
  • 18:18 thcipriani@tin: Synchronized wmf-config/InitialiseSettings.php: SWAT: Tweak Russian logo wordmark T162036 PART II (duration: 00m 43s)
  • 18:16 thcipriani@tin: Synchronized static/images/mobile/copyright/wikipedia-wordmark-ru.svg: SWAT: Tweak Russian logo wordmark T162036 PART I (duration: 00m 43s)
  • 16:46 awight@tin: rebuilt wikiversions.php and synchronized wikiversions files: (no justification provided)
  • 16:37 awight@tin: Synchronized php-1.29.0-wmf.20/extensions/FundraiserLandingPage: Fix for donatewiki T162716 (duration: 00m 45s)
  • 16:35 awight@tin: Synchronized php-1.29.0-wmf.19/extensions/FundraiserLandingPage: Fix for donatewiki T162716 (duration: 00m 48s)
  • 15:53 chasemp: remove 2fa for Freddy2001 on wikitech per T162772
  • 14:31 andrewbogott: running maintain-meta_p on labsdb1001/1003/1009/1010/1011
  • 14:23 hashar: Restarting Jenkins for git/scm plugins updates
  • 14:06 hashar: European SWAT complete
  • 13:51 switchdc: (volans@neodymium) END TASK - switchdc.stages.t05_switch_traffic(codfw, eqiad) Successfully completed
  • 13:48 switchdc: (volans@neodymium) START TASK - switchdc.stages.t05_switch_traffic(codfw, eqiad) Switch traffic flow to the appservers in the new datacenter
  • 13:42 volans: testing t05_switch_traffic of the switchdc
  • 13:41 elukey: apply SLOWLOG RESET and CONFIG SET slowlog-max-len 100000 (prev value 10000, 10ms) to rdb1005:6380 to track down slow reqs - T125735
  • 13:37 hoo@tin: Synchronized php-1.29.0-wmf.20/extensions/Wikidata: Update Wikibase/ ArticlePlaceholder (duration: 02m 19s)
  • 13:33 hoo@tin: Synchronized php-1.29.0-wmf.19/extensions/Wikidata: Update Wikibase/ ArticlePlaceholder (duration: 02m 16s)
  • 13:33 elukey: restored slowlog-log-slower-than 10000 on rdb2005
  • 13:25 elukey: applied CONFIG SET slowlog-log-slower-than 300000 to Redis 6379 on rdb2005 and reset slowlog history to play with the stats
  • 13:10 addshore@tin: Synchronized php-1.29.0-wmf.20/extensions/WikimediaEvents/extension.json: WMDE Spring campaign PT2/2 (duration: 00m 45s)
  • 13:09 addshore@tin: Synchronized php-1.29.0-wmf.20/extensions/WikimediaEvents/WikimediaEventsHooks.php: WMDE Spring campaign PT1/2 (duration: 00m 45s)
  • 13:08 hashar@tin: Synchronized wmf-config/InitialiseSettings.php: Revert "Temporarily enable change dispatch logging on testwikidata" - T159828 (duration: 00m 47s)
  • 12:23 elukey: restart HDFS datanode daemons on all the Hadoop worker nodes to pick up the new JVM settings
  • 12:18 kartik@tin: Finished deploy [cxserver/deploy@2842efa]: Update cxserver to 56a012d (duration: 03m 58s)
  • 12:14 kartik@tin: Started deploy [cxserver/deploy@2842efa]: Update cxserver to 56a012d
  • 11:57 elukey: restart Yarn nodemanager daemons on all the Hadoop worker nodes to pick up the new JVM settings
  • 11:05 _joe_: downgrading python-urllib3 on puppetmaster1001
  • 11:02 akosiaris: upgrade puppet across the trusty fleet to 3.8. T162462
  • 10:34 hashar: Upgrading Jenkins "Email Extension" plugin 2.57.1..2.57.2 and restarting Jenkins
  • 10:07 hashar: Upgrading Jenkins "Git client" plugin 2.3.0..2.4.1 and restarting Jenkins
  • 09:58 switchdc: (volans@neodymium) END TASK - switchdc.stages.t07_coredb_masters_readwrite(codfw, eqiad) Successfully completed
  • 09:58 switchdc: (volans@neodymium) START TASK - switchdc.stages.t07_coredb_masters_readwrite(codfw, eqiad) set core DB masters in read-write mode
  • 09:56 switchdc: (volans@neodymium) END TASK - switchdc.stages.t03_coredb_masters_readonly(codfw, eqiad) Failed to execute
  • 09:56 switchdc: (volans@neodymium) START TASK - switchdc.stages.t03_coredb_masters_readonly(codfw, eqiad) set core DB masters in read-only mode
  • 09:53 _joe_: removing the old directory of data from ocg1003
  • 09:52 volans: testing t03 and t07 DB-RO/RW stages of switchdc (codfw->eqiad), we are already in that situation, t03 will fail the verification, which is expected
  • 09:52 godog: swift codfw-prod: ms-be2001 - ms-be2012 initial decom - T162785
  • 09:47 _joe_: remounting the new partition under /srv/deployment/ocg/output, cleaning out the old dir. Will cause a service interruption for requests to ocg1003 for a few minutes. T162780
  • 09:42 gehel: starting load on elastic2020 - T149006
  • 09:41 addshore@tin: Synchronized wmf-config/InitialiseSettings.php: wmgUseGettingStarted false for dewiki (duration: 00m 45s)
  • 09:26 addshore@tin: Synchronized wmf-config/InitialiseSettings.php: WMDE Spring campaign - Add logging from WikimediaEvent (duration: 00m 46s)
  • 09:22 hashar: Restarting Jenkins for Matrix related plugins updates (3)
  • 09:12 _joe_: copying data from / to the new partition on ocg1003 T162462
  • 09:10 hashar: Restarting Jenkins for plugins update (2)
  • 09:06 _joe_: creating a LVM volume on ocg1003
  • 09:05 hashar: Restarting Jenkins for plugins update
  • 08:59 addshore@tin: Synchronized php-1.29.0-wmf.19/extensions/WikimediaEvents/extension.json: patch1 & patch2 WMDE Spring campaign PT2/2 (duration: 00m 45s)
  • 08:58 addshore@tin: Synchronized php-1.29.0-wmf.19/extensions/WikimediaEvents/WikimediaEventsHooks.php: patch1 & patch2 WMDE Spring campaign PT1/2 (duration: 00m 47s)
  • 08:52 ema: upgrade cache_upload to linux 4.9 T162029
  • 08:44 gehel: reimaging elastic2020 for testing - T149006
  • 08:24 switchdc: (oblivian@sarin) END TASK - switchdc.stages.t09_start_maintenance(codfw, eqiad) Successfully completed
  • 08:22 switchdc: (oblivian@sarin) START TASK - switchdc.stages.t09_start_maintenance(codfw, eqiad) Start MediaWiki maintenance in the new master DC
  • 08:14 switchdc: (oblivian@sarin) END TASK - switchdc.stages.t08_stop_mediawiki_readonly(codfw, eqiad) Failed to execute
  • 08:14 root@tin: Synchronized wmf-config/db-eqiad.php: Set MediaWiki in read-write mode in datacenter eqiad (duration: 00m 35s)
  • 08:13 switchdc: (oblivian@sarin) START TASK - switchdc.stages.t08_stop_mediawiki_readonly(codfw, eqiad) Set MediaWiki in read-write mode (db_to config already merged and git pulled)
  • 08:09 switchdc: (oblivian@sarin) END TASK - switchdc.stages.t06_redis(codfw, eqiad) Successfully completed
  • 08:09 switchdc: (oblivian@sarin) START TASK - switchdc.stages.t06_redis(codfw, eqiad) Switch the Redis replication
  • 08:02 switchdc: (oblivian@sarin) END TASK - switchdc.stages.t05_switch_datacenter(codfw, eqiad) Successfully completed
  • 08:02 switchdc: (oblivian@sarin) START TASK - switchdc.stages.t05_switch_datacenter(codfw, eqiad) Switch MediaWiki configuration to the new datacenter
  • 08:00 switchdc: (oblivian@sarin) END TASK - switchdc.stages.t09_restore_ttl(codfw, eqiad) Successfully completed
  • 07:59 switchdc: (oblivian@sarin) START TASK - switchdc.stages.t09_restore_ttl(codfw, eqiad) Restore the TTL of all the MediaWiki discovery records
  • 07:58 switchdc: (oblivian@sarin) END TASK - switchdc.stages.t05_switch_traffic(codfw, eqiad) Successfully completed
  • 07:55 switchdc: (oblivian@sarin) START TASK - switchdc.stages.t05_switch_traffic(codfw, eqiad) Switch traffic flow to the appservers in the new datacenter
  • 07:55 _joe_: resuming non-dry run tests of switchdc, all logs from switchdc by me are just tests
  • 06:57 _joe_: the last messages are just a test and nothing was really done, as codfw is already in read-only mode right now
  • 06:57 switchdc: (oblivian@sarin) END TASK - switchdc.stages.t02_start_mediawiki_readonly(codfw, eqiad) Failed to execute
  • 06:57 root@tin: Synchronized wmf-config/db-codfw.php: Set MediaWiki in read-only mode in datacenter codfw (duration: 00m 23s)
  • 06:57 switchdc: (oblivian@sarin) MediaWiki read-only period starts at: 2017-04-12 06:56:53.822926
  • 06:56 switchdc: (oblivian@sarin) START TASK - switchdc.stages.t02_start_mediawiki_readonly(codfw, eqiad) Set MediaWiki in read-only mode (db_from config already merged and git pulled)
  • 06:53 switchdc: (oblivian@sarin) END TASK - switchdc.stages.t01_stop_maintenance(codfw, eqiad) Failed to execute
  • 06:53 switchdc: (oblivian@sarin) START TASK - switchdc.stages.t01_stop_maintenance(codfw, eqiad) Stop MediaWiki maintenance in the old master DC
  • 06:50 _joe_: testing switchover codfw => eqiad, no destructive actions will be taken
  • 06:42 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1093 - T17441 (duration: 00m 46s)
  • 06:37 elukey: reimage mw2246.codfw.wmnet mw2152.codfw.wmnet to remove the /tmp partition (codfw videoscalers, switchover prep)
  • 06:32 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1072 - T132416 (duration: 00m 46s)
  • 06:28 _joe_: killing long-running puppet-agent on db2058 too
  • 06:20 _joe_: killing badly-started puppet agents on mc1010, tempdb2001,db1090, db2058, hydrogen, possibly others later
  • 06:13 marostegui: Deploy alter table on db1075 eqiad master (s3, image table) - T160415
  • 06:04 marostegui: Deploy schema change on s6 - db1093 - T17441
  • 06:04 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1093 (duration: 02m 00s)
  • 05:56 marostegui: Deploy alter table on db2108 codfw master (s3, image table) - T160415
  • 04:53 legoktm: started `mwscriptwikiset refreshLinks.php small.dblist` on terbium
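
Many entries above end with a `(duration: MMm SSs)` suffix. As a rough sketch, assuming that suffix format holds for every sync and deploy line, the time spent can be totalled like this; `total_duration_seconds` is a hypothetical helper name.

```python
import re

# Matches the "(duration: 00m 46s)" suffix used by the sync/deploy entries.
DURATION_RE = re.compile(r"\(duration: (\d+)m (\d+)s\)")


def total_duration_seconds(lines):
    """Sum the durations of all lines that carry a duration suffix."""
    total = 0
    for line in lines:
        match = DURATION_RE.search(line)
        if match:
            minutes, seconds = map(int, match.groups())
            total += minutes * 60 + seconds
    return total


sample = [
    "06:42 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1093 - T17441 (duration: 00m 46s)",
    "06:04 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1093 (duration: 02m 00s)",
    "06:04 marostegui: Deploy schema change on s6 - db1093 - T17441",
]
print(total_duration_seconds(sample))
```

Lines without a duration suffix, such as manual actions, are skipped rather than treated as errors.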

2017-04-11

  • 23:58 thcipriani@tin: Synchronized wmf-config/CirrusSearch-production.php: SWAT: Enable deleted archive indexing & searching T109561 PART II (duration: 00m 45s)
  • 23:56 thcipriani@tin: Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable deleted archive indexing & searching T109561 PART I (duration: 00m 45s)
  • 23:29 ejegg: updated fundraising-tools from 0a42db3 to a8b8d72
  • 23:27 thcipriani@tin: Synchronized portals: SWAT: Bumping portals to master T128546 (duration: 00m 46s)
  • 23:26 thcipriani@tin: Synchronized portals/prod/wikipedia.org/assets: SWAT: Bumping portals to master T128546 (duration: 00m 46s)
  • 23:23 mutante: ocg: clearing host cache for ocg1001 which is shutdown for hardware repair. (on ocg1003: sudo -u ocg -g ocg nodejs-ocg /srv/deployment/ocg/ocg/mw-ocg-service/scripts/clear-host-cache.js -c /etc/ocg/mw-ocg-service.js ocg1001) T161158
  • 23:15 thcipriani@tin: Synchronized docroot/noc/conf/pageassessments.dblist: SWAT: Adding pageassessments.dblist for maintenance script T159438 PART II (duration: 00m 45s)
  • 23:14 thcipriani@tin: Synchronized dblists/pageassessments.dblist: SWAT: Adding pageassessments.dblist for maintenance script T159438 PART I (duration: 00m 45s)
  • 23:11 mutante: ocg1001 - scheduled downtime in icinga for host and all services, confirmed it's not actively doing things anymore, shutting down for hardware replacement (T161158)
  • 23:10 thcipriani@tin: Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable Flow beta feature on frwikiversity T162022 (duration: 00m 46s)
  • 23:04 mutante: ocg1001 - apt-get clean for disk space
  • 22:36 mutante: ocg1003 started picking up jobs (mw-ocg-latexer) after it was enabled with gerrit:347781, ocg1001 was disabled in the same change. Also ganglia graphs confirm it. T84723 T161158
  • 22:22 addshore@tin: Synchronized wmf-config/InitialiseSettings.php: Enable alternate RevSlider slider on group0 T160410 (duration: 00m 45s)
  • 22:19 dzahn@puppetmaster1001: conftool action : set/pooled=no; selector: name=ocg1001.eqiad.wmnet
  • 22:17 addshore@tin: Synchronized wmf-config/InitialiseSettings.php: Enable TwoColConflict BetaFeature on fiwiki (duration: 00m 46s)
  • 21:23 mobrovac@tin: Finished deploy [restbase/deploy@a4042a6]: Update the legal text in the API docs (duration: 06m 49s)
  • 21:17 mobrovac@tin: Started deploy [restbase/deploy@a4042a6]: Update the legal text in the API docs
  • 21:16 mobrovac@tin: Finished deploy [restbase/deploy@a4042a6]: Staging: Update the legal text in the API docs (duration: 03m 55s)
  • 21:12 mobrovac@tin: Started deploy [restbase/deploy@a4042a6]: Staging: Update the legal text in the API docs
  • 21:12 mobrovac@tin: Finished deploy [restbase/deploy@a4042a6]: Dev cluster: Update the legal text in the API docs (duration: 01m 37s)
  • 21:11 mobrovac@tin: Started deploy [restbase/deploy@a4042a6]: Dev cluster: Update the legal text in the API docs
  • 20:51 _joe_: killed running 'puppet agent t-v' on ruthenium
  • 19:20 ppchelko@tin: Finished deploy [electron-render/deploy@5492cdb]: Update to latest upstream, full deploy, attempt#2 T160764 (duration: 01m 25s)
  • 19:18 ppchelko@tin: Started deploy [electron-render/deploy@5492cdb]: Update to latest upstream, full deploy, attempt#2 T160764
  • 19:11 ppchelko@tin: Finished deploy [electron-render/deploy@5492cdb]: Update to latest upstream, full deploy, T160764 (duration: 03m 38s)
  • 19:08 ppchelko@tin: Started deploy [electron-render/deploy@5492cdb]: Update to latest upstream, full deploy, T160764
  • 19:08 demon@tin: rebuilt wikiversions.php and synchronized wikiversions files: group0 to wmf.20
  • 19:01 ppchelko@tin: Finished deploy [electron-render/deploy@5492cdb]: Update to latest upstream, canary on scb2001, attempt#3 T160764 (duration: 00m 52s)
  • 19:00 ppchelko@tin: Started deploy [electron-render/deploy@5492cdb]: Update to latest upstream, canary on scb2001, attempt#3 T160764
  • 18:34 elukey: restart hhvm on mw1165 (debug in /tmp/hhvm.5384.bt.)
  • 18:25 demon@tin: Finished scap: testwiki to wmf.20 to bootstrap (duration: 35m 27s)
  • 17:49 demon@tin: Started scap: testwiki to wmf.20 to bootstrap
  • 17:49 demon@tin: Pruned MediaWiki: 1.29.0-wmf.17 [keeping static files] (duration: 00m 16s)
  • 17:41 mobrovac@tin: Finished deploy [restbase/deploy@e470b9f]: Initial Scap3 config deploy - T116335 (duration: 10m 39s)
  • 17:30 mobrovac@tin: Started deploy [restbase/deploy@e470b9f]: Initial Scap3 config deploy - T116335
  • 17:23 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Repool db1093 (duration: 00m 57s)
  • 17:14 switchdc: (oblivian@sarin) END TASK - switchdc.stages.t04_cache_wipe(eqiad, codfw) Successfully completed
  • 17:08 mobrovac: restbase enabling back puppet for T116335
  • 17:07 mobrovac@tin: Finished deploy [restbase/deploy@e470b9f]: Staging: Initial Scap3 config deploy, take 2 - T116335 (duration: 02m 12s)
  • 17:06 switchdc: (oblivian@sarin) START TASK - switchdc.stages.t04_cache_wipe(eqiad, codfw) wipe and warmup caches
  • 17:06 marostegui: Deploy unscheduled alter table on db1044 (s3, image table) - T160415
  • 17:05 mobrovac@tin: Started deploy [restbase/deploy@e470b9f]: Staging: Initial Scap3 config deploy, take 2 - T116335
  • 17:05 ppchelko@tin: Finished deploy [electron-render/deploy@5492cdb]: Update to latest upstream, canary on scb2001, attempt#2 T160764 (duration: 03m 22s)
  • 17:04 marostegui: Deploy unscheduled alter table on db1015 (s3, image table) - T160415
  • 17:02 mobrovac@tin: Finished deploy [restbase/deploy@e470b9f]: Dev Cluster: Initial Scap3 config deploy, take 2 - T116335 (duration: 00m 58s)
  • 17:02 marostegui: Deploy unscheduled alter table on db1038 (s3, image table) - T160415
  • 17:02 demon@tin: rebuilt wikiversions.php and synchronized wikiversions files: nope, no wmf.19 for donatewiki. life is hard
  • 17:02 mobrovac@tin: Started deploy [restbase/deploy@e470b9f]: Dev Cluster: Initial Scap3 config deploy, take 2 - T116335
  • 17:01 ppchelko@tin: Started deploy [electron-render/deploy@5492cdb]: Update to latest upstream, canary on scb2001, attempt#2 T160764
  • 17:00 marostegui: Deploy unscheduled alter table on db1035 (s3, image table) - T160415
  • 16:58 marostegui: Deploy unscheduled alter table on db1077 (s3, image table) - T160415
  • 16:56 marostegui: Deploy unscheduled alter table on db1078 (s3, image table) - T160415
  • 16:54 demon@tin: rebuilt wikiversions.php and synchronized wikiversions files: donatewiki back to wmf.19. you put your left foot in