Release Engineering/SAL: Difference between revisions
imported>Stashbot (hashar: Updating Quibble Jenkins jobs to 0.0.26) |
imported>Stashbot (thcipriani: bring integration-slave-docker-1037 back online after rm -rf /srv/jenkins-workspace/workspace/*) |
Line 1: | Line 1: | ||
== 2018-10-02 ==
* 16:45 thcipriani: bring integration-slave-docker-1037 back online after rm -rf /srv/jenkins-workspace/workspace/*
* 16:20 thcipriani: investigating integration-slave-docker-1037
* 08:56 godog: bounce logstash
* 07:59 legoktm: deployed https://gerrit.wikimedia.org/r/463905
* 07:38 legoktm: building quibble-stretch-php71 docker image
* 05:32 legoktm: rebuilding quibble-stretch images https://gerrit.wikimedia.org/r/463883
* 03:26 Krinkle: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/463886
== 2018-10-01 ==
* 20:09 thcipriani: deployment-deploy01:sudo rm -rf /tmp/scap_l10n_* to remove stale l10n json and free up space
* 17:13 marxarelli: bringing integration-slave-docker-1041 back online following source directory clean up ([[phab:T205902|T205902]])
* 16:52 marxarelli: removing old workspace src directories left by non-quibble docker jobs on integration-slave-docker-1041
* 07:10 mdholloway: deployment-maps04 updated kartotherian and tilerator to latest
* 05:24 mdholloway: deployment-mcs01 deployed [mobileapps/deploy@07cbfb4]: Update mobileapps to {{Gerrit|a1fa41b}}
== 2018-09-29 ==
* 14:48 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/463615
* 01:22 Krinkle: Marking integration-slave-docker-1041 as offline (again). Why/How did it come back?
* 00:24 Krinkle: Marking integration-slave-docker-1041 as offline. Various odd build failures, including https://integration.wikimedia.org/ci/job/mediawiki-quibble-composer-mysql-php70-docker/6913/console
== 2018-09-28 ==
* 15:31 Amir1: ladsgroup@deployment-deploy01:~$ mwscript extensions/CentralAuth/maintenance/deleteLocalPasswords.php --wiki=fawiki --prefix ([[phab:T201009|T201009]])
* 14:50 thcipriani: investigating integration-slave-docker-1041
* 07:35 mdholloway: deployment-mcs01 deployed [mobileapps/deploy@bf09080]: Update mobileapps to {{Gerrit|7878ffc}}
== 2018-09-27 ==
* 17:53 mdholloway: deployment-mcs01 deployed [mobileapps/deploy@a0054ba]: Update mobileapps to {{Gerrit|0d6c2b7}}
* 07:38 mdholloway: deployment-maps04 updated tilerator and kartotherian node modules ([[phab:T195513|T195513]], [[phab:T200594|T200594]])
== 2018-09-26 ==
* 15:13 thcipriani: integration-slave-docker-1034:sudo rm -rf /srv/jenkins-workspace/workspace/* and bring back online -- https://phabricator.wikimedia.org/P7592
* 15:05 thcipriani: integration-slave-docker-1033:sudo rm -rf /srv/jenkins-workspace/workspace/* and bring back online
* 14:47 thcipriani: investigating integration-slave-docker-103{3,4}
* 11:57 Amir1: [[gerrit:462927]] (ores) is going to beta
* 08:24 hashar: Restarting CI Jenkins on contint1001 [#2]
* 08:14 hashar: Restarting CI Jenkins on contint1001
== 2018-09-25 ==
* 23:01 marxarelli: configured new jenkins node integration-slave-docker-1043 with 6 executors
* 23:01 marxarelli: replaced integration-slave-docker-1042 with new integration-slave-docker-1043 instance
* 22:39 marxarelli: launching new integration-slave-docker-1042 bigram instance
* 22:33 marxarelli: deleting remaining m1.medium instances used as m4executors ([[phab:T205362|T205362]])
* 22:15 marxarelli: taking remaining m1.medium m4executor jenkins nodes offline ([[phab:T205362|T205362]])
* 18:16 marxarelli: reconfiguring bigram jenkins nodes to use 6 executors. 7 were configured by mistake ([[phab:T205362|T205362]])
* 18:00 marxarelli: configuring new integration-slave-docker-1041 jenkins node with 7 executors ([[phab:T205362|T205362]])
* 17:42 marxarelli: configuring new jenkins node integration-slave-docker-1040 with 7 executors ([[phab:T205362|T205362]])
* 17:38 marxarelli: launching integration-slave-docker-1041 bigram instance ([[phab:T205362|T205362]])
* 17:30 marxarelli: the puppet parameter for docker_lvm_volume specified in horizon was not applied correctly on the first puppet run for some reason. tearing down integration-slave-docker-1039...
* 17:25 marxarelli: launching integration-slave-docker-1040 bigram instance ([[phab:T205362|T205362]])
* 17:24 marxarelli: deleting instances integration-slave-docker-1007/1008 ([[phab:T205362|T205362]])
* 17:13 marxarelli: launching new integration-slave-docker-1039 bigram instance
* 17:12 marxarelli: taking integration-slave-docker-1007/1008 offline for replacement ([[phab:T205362|T205362]])
* 17:09 marxarelli: deleting integration-slave-docker-1030/1031 instances ([[phab:T205362|T205362]])
* 17:05 marxarelli: taking integration-slave-docker-1030/1031 offline for replacement
* 16:47 marxarelli: increasing executors to 7 for jenkins nodes integration-slave-docker-1033/1034
* 16:46 marxarelli: new instance creation delayed due to quota
* 16:45 marxarelli: launching new integration-slave-docker-1039/1040 bigram instances
* 01:21 legoktm: deployed https://gerrit.wikimedia.org/r/450508
* 00:36 legoktm: deploying https://gerrit.wikimedia.org/r/462609
* 00:22 legoktm: deploying https://gerrit.wikimedia.org/r/453447
== 2018-09-24 ==
* 20:21 bearND: (beta): Update mobileapps to {{Gerrit|badb463}}
* 10:55 hashar: gerrit: granting labs/tools/* project owners the ability to submit changes {{!}} https://gerrit.wikimedia.org/r/#/c/labs/tools/+/462420/
* 09:51 hashar: deployment-deploy01 : backed up /srv/mediawiki-staging/php-master/cache/gitinfo and created a new. Its size of 69632 bytes might cause slow writes?? {{!}} [[phab:T204762|T204762]]
* 09:24 hashar: Live hacked scap code on deployment-deploy01 for [[phab:T204762|T204762]] and reverted hack changes
* 08:32 hashar: deployment-deploy01 rm -fR /tmp/scap_l10n_*
* 06:41 legoktm: deploying https://gerrit.wikimedia.org/r/462341
* 03:45 kart_: Update cxserver to {{Gerrit|d913793}}
== 2018-09-23 ==
* 14:03 Krenair: rm stuff in deployment-deploy01:/tmp to try to clear space and stop shinken whining
* 01:05 andrewbogott: rebooted deployment-maps03; OOM and also [[phab:T205195|T205195]]
== 2018-09-22 ==
* 20:51 Hauskatze: github: deleting several wikimedia/mediawiki-extensions-Collection-.* mirror repos for [[phab:T183891|T183891]]
* 20:05 Hauskatze: github: deleted mirror wikimedia/mediawiki-extensions-Collection-OfflineContentGenerator-zim_renderer {{!}} [[phab:T183891|T183891]]; moving to the next one
* 18:21 Krenair: went to do the same with deployment-maps03 and accidentally broke SSH access to the server
* 18:21 Krenair: removed ferm package from deployment-snapshot01 as it appeared unmanaged by puppet and was causing problems with SSH access from the current deployment hosts (previous logs referenced [[phab:T153468|T153468]], this just explains why puppet hadn't purged stuff)
* 18:01 Krenair: rm deployment-maps03:/etc/ferm/conf.d/10_redis_exporter_6379 as it was breaking ferm from starting ([[phab:T153468|T153468]]), puppet has not re-created it so I assume it was historical (shouldn't puppet be purging such files?)
* 18:00 Krenair: rm deployment-snapshot01:/etc/ferm/conf.d/10_prometheus-nutcracker-exporter as it was breaking ferm from starting ([[phab:T153468|T153468]]), puppet has not re-created it so I assume it was historical (shouldn't puppet be purging such files?)
== 2018-09-21 ==
* 17:26 marxarelli: adding jenkins node integration-slave-docker-1038 with 7 executors
* 16:47 marxarelli: added new jenkins node integration-slave-docker-1037 with 7 executors
* 15:49 marxarelli: replacing integration-slave-docker-1036 with new bigram instance
* 15:48 marxarelli: taking node integration-slave-docker-1035 offline due to unusually high steal cpu time and long build durations
* 15:17 marxarelli: integration-slave-docker-1035/1036 showing unusually high cpu steal and unusually long mean build durations
* 15:15 marxarelli: taking integration-slave-docker-1036 offline due to unusually high cpu steal % trend
* 15:13 marxarelli: launching integration-slave-docker-1037 bigram instance
* 13:03 Amir1: ores:7b987a7 is going beta
* 05:32 legoktm: deployed https://gerrit.wikimedia.org/r/461510
== 2018-09-20 ==
* 23:48 marxarelli: adding new integration-slave-docker-1035/1036 jenkins nodes, each with 7 executors
* 23:23 marxarelli: launching integration-slave-docker-1035/1036 bigram instances
* 23:20 marxarelli: taking integration-slave-docker-1004/1005 offline for replacement ([[phab:T202160|T202160]])
* 16:52 Amir1: deploy ores:ee2d28b
* 11:21 hashar: Refreshing jenkins jobs to get rid of docker run option "--tmp /tmpfs" . It is mounted with 'noexec' which causes various jobs to fail. {{!}} [[phab:T203181|T203181]] and [[phab:T204919|T204919]]
* 11:17 hashar: deployment-deploy01: removed /srv/deployment/analytics/refinery-cache (8GBytes)
* 11:07 hashar: deployment-deploy01 is out of disk space (again)
== 2018-09-19 ==
* 21:18 Hauskatze: github: deleted https://github.com/wikimedia/mediawiki-services-ocg-collection {{!}} [[phab:T183891|T183891]]
* 20:07 bearND: (beta): Update mobileapps to {{Gerrit|a224e99}}
* 18:56 Amir1: ores:76fe25a goes to beta ([[phab:T204862|T204862]])
== 2018-09-18 ==
* 10:04 hashar: Updating Quibble Jenkins jobs to 0.0.26
Line 1,473: | Line 1,581: | ||
* 19:56 Amir1: restarting ores services in deployment-sca03 ([[phab:T183862|T183862]])
{{SAL-archives/Release Engineering}}
<noinclude>[[Category:SAL]]</noinclude>
Revision as of 16:45, 2 October 2018
2018-10-02
- 16:45 thcipriani: bring integration-slave-docker-1037 back online after rm -rf /srv/jenkins-workspace/workspace/*
- 16:20 thcipriani: investigating integration-slave-docker-1037
- 08:56 godog: bounce logstash
- 07:59 legoktm: deployed https://gerrit.wikimedia.org/r/463905
- 07:38 legoktm: building quibble-stretch-php71 docker image
- 05:32 legoktm: rebuilding quibble-stretch images https://gerrit.wikimedia.org/r/463883
- 03:26 Krinkle: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/463886
2018-10-01
- 20:09 thcipriani: deployment-deploy01:sudo rm -rf /tmp/scap_l10n_* to remove stale l10n json and free up space
- 17:13 marxarelli: bringing integration-slave-docker-1041 back online following source directory clean up (T205902)
- 16:52 marxarelli: removing old workspace src directories left by non-quibble docker jobs on integration-slave-docker-1041
- 07:10 mdholloway: deployment-maps04 updated kartotherian and tilerator to latest
- 05:24 mdholloway: deployment-mcs01 deployed [mobileapps/deploy@07cbfb4]: Update mobileapps to a1fa41b
2018-09-29
- 14:48 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/463615
- 01:22 Krinkle: Marking integration-slave-docker-1041 as offline (again). Why/How did it come back?
- 00:24 Krinkle: Marking integration-slave-docker-1041 as offline. Various odd build failures, including https://integration.wikimedia.org/ci/job/mediawiki-quibble-composer-mysql-php70-docker/6913/console
2018-09-28
- 15:31 Amir1: ladsgroup@deployment-deploy01:~$ mwscript extensions/CentralAuth/maintenance/deleteLocalPasswords.php --wiki=fawiki --prefix (T201009)
- 14:50 thcipriani: investigating integration-slave-docker-1041
- 07:35 mdholloway: deployment-mcs01 deployed [mobileapps/deploy@bf09080]: Update mobileapps to 7878ffc
2018-09-27
- 17:53 mdholloway: deployment-mcs01 deployed [mobileapps/deploy@a0054ba]: Update mobileapps to 0d6c2b7
- 07:38 mdholloway: deployment-maps04 updated tilerator and kartotherian node modules (T195513, T200594)
2018-09-26
- 15:13 thcipriani: integration-slave-docker-1034:sudo rm -rf /srv/jenkins-workspace/workspace/* and bring back online -- https://phabricator.wikimedia.org/P7592
- 15:05 thcipriani: integration-slave-docker-1033:sudo rm -rf /srv/jenkins-workspace/workspace/* and bring back online
- 14:47 thcipriani: investigating integration-slave-docker-103{3,4}
- 11:57 Amir1: gerrit:462927 (ores) is going to beta
- 08:24 hashar: Restarting CI Jenkins on contint1001 [#2]
- 08:14 hashar: Restarting CI Jenkins on contint1001
2018-09-25
- 23:01 marxarelli: configured new jenkins node integration-slave-docker-1043 with 6 executors
- 23:01 marxarelli: replaced integration-slave-docker-1042 with new integration-slave-docker-1043 instance
- 22:39 marxarelli: launching new integration-slave-docker-1042 bigram instance
- 22:33 marxarelli: deleting remaining m1.medium instances used as m4executors (T205362)
- 22:15 marxarelli: taking remaining m1.medium m4executor jenkins nodes offline (T205362)
- 18:16 marxarelli: reconfiguring bigram jenkins nodes to use 6 executors. 7 were configured by mistake (T205362)
- 18:00 marxarelli: configuring new integration-slave-docker-1041 jenkins node with 7 executors (T205362)
- 17:42 marxarelli: configuring new jenkins node integration-slave-docker-1040 with 7 executors (T205362)
- 17:38 marxarelli: launching integration-slave-docker-1041 bigram instance (T205362)
- 17:30 marxarelli: the puppet parameter for docker_lvm_volume specified in horizon was not applied correctly on the first puppet run for some reason. tearing down integration-slave-docker-1039...
- 17:25 marxarelli: launching integration-slave-docker-1040 bigram instance (T205362)
- 17:24 marxarelli: deleting instances integration-slave-docker-1007/1008 (T205362)
- 17:13 marxarelli: launching new integration-slave-docker-1039 bigram instance
- 17:12 marxarelli: taking integration-slave-docker-1007/1008 offline for replacement (T205362)
- 17:09 marxarelli: deleting integration-slave-docker-1030/1031 instances (T205362)
- 17:05 marxarelli: taking integration-slave-docker-1030/1031 offline for replacement
- 16:47 marxarelli: increasing executors to 7 for jenkins nodes integration-slave-docker-1033/1034
- 16:46 marxarelli: new instance creation delayed due to quota
- 16:45 marxarelli: launching new integration-slave-docker-1039/1040 bigram instances
- 01:21 legoktm: deployed https://gerrit.wikimedia.org/r/450508
- 00:36 legoktm: deploying https://gerrit.wikimedia.org/r/462609
- 00:22 legoktm: deploying https://gerrit.wikimedia.org/r/453447
2018-09-24
- 20:21 bearND: (beta): Update mobileapps to badb463
- 10:55 hashar: gerrit: granting labs/tools/* project owners the ability to submit changes | https://gerrit.wikimedia.org/r/#/c/labs/tools/+/462420/
- 09:51 hashar: deployment-deploy01 : backed up /srv/mediawiki-staging/php-master/cache/gitinfo and created a new. Its size of 69632 bytes might cause slow writes?? | T204762
- 09:24 hashar: Live hacked scap code on deployment-deploy01 for T204762 and reverted hack changes
- 08:32 hashar: deployment-deploy01 rm -fR /tmp/scap_l10n_*
- 06:41 legoktm: deploying https://gerrit.wikimedia.org/r/462341
- 03:45 kart_: Update cxserver to d913793
2018-09-23
- 14:03 Krenair: rm stuff in deployment-deploy01:/tmp to try to clear space and stop shinken whining
- 01:05 andrewbogott: rebooted deployment-maps03; OOM and also T205195
2018-09-22
- 20:51 Hauskatze: github: deleting several wikimedia/mediawiki-extensions-Collection-.* mirror repos for T183891
- 20:05 Hauskatze: github: deleted mirror wikimedia/mediawiki-extensions-Collection-OfflineContentGenerator-zim_renderer | T183891; moving to the next one
- 18:21 Krenair: went to do the same with deployment-maps03 and accidentally broke SSH access to the server
- 18:21 Krenair: removed ferm package from deployment-snapshot01 as it appeared unmanaged by puppet and was causing problems with SSH access from the current deployment hosts (previous logs referenced T153468, this just explains why puppet hadn't purged stuff)
- 18:01 Krenair: rm deployment-maps03:/etc/ferm/conf.d/10_redis_exporter_6379 as it was breaking ferm from starting (T153468), puppet has not re-created it so I assume it was historical (shouldn't puppet be purging such files?)
- 18:00 Krenair: rm deployment-snapshot01:/etc/ferm/conf.d/10_prometheus-nutcracker-exporter as it was breaking ferm from starting (T153468), puppet has not re-created it so I assume it was historical (shouldn't puppet be purging such files?)
2018-09-21
- 17:26 marxarelli: adding jenkins node integration-slave-docker-1038 with 7 executors
- 16:47 marxarelli: added new jenkin