User:Razzi: Difference between revisions
Revision as of 21:13, 14 June 2021
Learning the Wikimedia stack!
Documentation
- 2021-04-20
- 2021-05-5
- 2021-06-09
- 2021-06-1
- 2021-06-10
- 2021-06-14
- 2021-06-30
- 2021-07-01
- 2021-07-30
- 2021-08-02
- 2021-09-09
- 2021-09-24
- 2022-02-07
- 2022-03-28
- A week with the search team
- Analytics notes
- Debugging eventlogging to druid network flows internal hourly.service
- Developing cookbook locally
- Experiment: use puppet notice to show variable
- First logical volume resizing
- First pass at understanding T300164 varnishkafka alerts
- Ganeti error: Connection to console of instance datahubsearch1002.eqiad.wmnet failed
- How to depool / pool a host from etcd
- How to run systemd unit of another user
- How to show mysql host from sql query
- How to view pooled services for lvs
- Installing puppet on mac
- Learning about partitions for flerovium/furud
- Looking into The following units failed: wmf auto restart prometheus-mysqld-exporter@matomo.service
- NameNode vs DataNode
- Notes on clouddb views
- Plan to drain hadoop cluster
- Presto query logging: https://phabricator.wikimedia.org/T269832
- Puppetboard
- Set up haproxy on mediawiki-vagrant
- Setting up kerberos locally
- Spicerack python api repl
- Superset 1.3.1 upgrade recap
- T279304
- T280132 disk swap
- Triage Superset Dashboard Timeouts - T294768
- What is conftool
- alertname: Icinga/Check correctness of the icinga configuration
- an-master reimaging
- code search
- common.js
- deployment train 5-18
- firewall audit
- fm/CFSSL
- fm/SCSI
- grand SRE IC plan
- https://phabricator.wikimedia.org/T298505
- learning storage on vagrant
- logs
- new plan for reimaging an-masters
- rebase off of origin in one command
- reimage of db1125
- scratch
- snippets
- ssh config
- ssh single letter domain shortcut
- superset 1.3.1 errors
- vector.css
- working with apache atlas in docker
Lists
Questions
How does refine use salts? https://gerrit.wikimedia.org/r/c/operations/puppet/+/679939
Is /system a default directory for hadoop, or can we remove it?
Is there a place that lists the vlans?
How to check vlan for a host?
Q: Is it expected that when reimaging a host, we see the old name when running homer?
[edit interfaces interface-range disabled]
- member ge-1/0/13;
[edit interfaces interface-range vlan-analytics1-d-eqiad]
+ member ge-1/0/13;
  member ge-1/0/43 { ... }
[edit interfaces]
+ ge-1/0/13 {
+     description "db1125 {#2221}";
+ }
^ this is while decommissioning db1125
A: No, I skipped some netbox steps; when I fixed them this didn't show up
Q: How to submit a test job to the yarn queue to test if it is accepting jobs?
Q: What to do about this warning on analytics1068?
May 06 21:03:35 analytics1068 systemd[1]: /run/systemd/generator.late/hadoop-yarn-nodemanager.service:18: PIDFile= references path below legacy directory /var/run/, updating /var/run/hadoop-yarn/yarn-yarn-nodemanager.pid → /run/hadoop-yarn/yarn-yarn-nodemanager.pid; please update the unit file accordingly.
Q: Server Lifecycle#Rename while reimaging when to merge homer patch?
A: The homer patch is for the firewall and has nothing to do with the reimaging process. Merge it after the reimage is complete.
Q: What is the order for creating puppet patches when it comes to server lifecycle? Some things that might need to be avoided: having a site.pp entry for a node that is being decommissioned, or for a node that doesn't exist yet
Ideas
Script to show what tickets are currently in progress
Add homer-public to codesearch
Remove legacy analytics-hadoop from grafana
Random notes
sudo lsof -Xd DEL
- lists deleted files that are still memory-mapped by a running process (for example, libraries replaced by an upgrade)
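To see at a glance which processes are affected, the output of that command can be grouped by command name. This is only a minimal sketch: it assumes the default lsof column layout (one header row, COMMAND in the first column), and the function name `count_del_by_command` is made up for illustration.

```shell
# Hypothetical helper: summarize `lsof -Xd DEL` output read from stdin.
# Assumes the default lsof layout: one header row, COMMAND in column 1.
count_del_by_command() {
    # Skip the header, count rows per command, then sort by count (descending)
    # and by command name to keep the order stable.
    awk 'NR > 1 { count[$1]++ }
         END { for (cmd in count) print count[cmd], cmd }' | sort -k1,1nr -k2,2
}

# Usage (root is needed to see every process):
#   sudo lsof -Xd DEL | count_del_by_command
```

Commands near the top of the list hold the most deleted mappings and are probably the first candidates for a restart.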
Puppet
Why does sshing into mgmt not accept the password?
Because you forgot the `root@` part!
Instead of ssh dbstore1007.mgmt.e
do `ssh root@dbstore1007.mgmt.e`
Or make ssh use the root user in your ~/.ssh/config: https://stackoverflow.com/questions/10197559/ssh-configuration-override-the-default-username
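A minimal sketch of what that config entry could look like, generated by a small shell function so the stanza can be reviewed before it is added. The `*.mgmt.*` host pattern and the function name `mgmt_root_stanza` are assumptions for illustration, not taken from the actual setup; adjust the pattern to the real management domain.

```shell
# Hypothetical helper: print an ssh_config stanza that makes ssh log in
# as root on management interfaces. The default pattern is an assumption.
mgmt_root_stanza() {
    pattern="${1:-*.mgmt.*}"
    printf 'Host %s\n    User root\n' "$pattern"
}

# Usage:
#   mgmt_root_stanza >> ~/.ssh/config
#   ssh -G dbstore1007.mgmt.e | grep '^user '   # show the user ssh would use
```

ssh takes the first value it finds for each option, so the stanza should sit above any broader `Host *` block that also sets `User`.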
refactor this to run automatically
Why no homer diff?
TBD
how to check what vlan a host belongs to?
???
Proposal: stop using conda for infrastructure
Why not use standard pip?
How to apply hadoop config changes?
For example https://gerrit.wikimedia.org/r/c/operations/puppet/+/698194/1/hieradata/common.yaml
linux-host-entries.ttyS0-115200 versus linux-host-entries.ttyS1-115200
a mystery
sudo gnt-instance console an-airflow1002.eqiad.wmnet is stuck, is this normal?
?