Performance/Graphite/Synthetic Instance: Difference between revisions
imported>Phedenskog (Updated proxy for the security group) |
imported>Phedenskog (Updated storage schemas to new setup) |
Revision as of 12:09, 25 October 2021
Graphite for synthetic testing
We have our own instance of Graphite running outside of our environment to make it easy to add as many metrics as needed. You can see those metrics in our Grafana instance under the namespace sitespeed_io.
The instance is set up to keep metrics for 60 days. That gives us a two-month window to act on regressions and leaves room for adding many more tests/metrics if we want.
Access
You need to have the pem file to be able to access the server:
ssh -i "graphite.pem" ubuntu@wpt-graphite.wmftest.org
Start/stop
You use the docker compose file to start/stop Graphite. The compose location is /home/ubuntu/graphite/docker-compose.yml
Start the instance:
docker-compose up
Stop the instance:
docker-compose down
Setup
The instance runs on AWS on an m4.xlarge instance with an extra volume. We use AWS because the agents that collect the data also run on AWS, which lets us use security groups to make sure only our instances can post data to the instance.
The size of the instance was chosen because a big company that also runs Graphite uses the same setup for their synthetic testing. We can change that in the future.
When you set up a new instance, make sure it stores the data on a disk that doesn't belong to that instance. We have an extra 200 GB volume attached to the instance, set up using the official AWS documentation. Mount it and make sure it is automatically mounted after a reboot. The extra disk lives under /data/.
We run the official dockerized version of Graphite using a docker-compose file. To set up Graphite the way we want it, we need to set up five volumes/mappings.
- whisper is where we store all the metrics
- graphite.db is the database where Graphite's annotations are stored
- storage-schemas.conf configures how long we want to store the metrics
- storage-aggregation.conf configures how we want to aggregate metrics
- carbon.conf is the carbon/whisper setup; we have our own version because the default one allows only a very moderate number of new metrics to be created per minute.
Configurations
All configuration files live on the server in /home/ubuntu/graphite/.
Docker compose
Our docker compose file (docker-compose.yml) is simple. It specifies which Graphite version and which ports to use, restarts the container automatically if something fails, and maps all the volumes we need.
version: "3"
services:
  graphite:
    image: graphiteapp/graphite-statsd:1.1.5-12
    ports:
      - "2003:2003"
      - "8080:80"
    restart: always
    volumes:
      - /data/whisper:/opt/graphite/storage/whisper
      - /data/graphite.db:/opt/graphite/storage/graphite.db
      - /home/ubuntu/graphite/storage-schemas.conf:/opt/graphite/conf/storage-schemas.conf
      - /home/ubuntu/graphite/storage-aggregation.conf:/opt/graphite/conf/storage-aggregation.conf
      - /home/ubuntu/graphite/carbon.conf:/opt/graphite/conf/carbon.conf
  memcached:
    image: memcached:1.5.16
    ports:
      - "11211:11211"
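Since the compose file maps the container's web port 80 to host port 8080, a quick way to check that the instance answers is to ask Graphite's render API for recent data. The following is a minimal sketch, not part of our setup: it assumes plain HTTP on port 8080, the helper name build_render_url is made up for illustration, and the request only works from a host that the security group allows.

```python
from urllib.parse import urlencode

# Assumption: the host name is the one used for SSH access on this page,
# and graphite-web answers over plain HTTP on the mapped port 8080.
GRAPHITE_HOST = "wpt-graphite.wmftest.org"
GRAPHITE_PORT = 8080


def build_render_url(target, since="-1h"):
    """Build a Graphite render API URL that returns JSON for one target.

    target: a metric path or wildcard expression.
    since:  a relative start time understood by Graphite, such as "-1h".
    """
    query = urlencode({"target": target, "from": since, "format": "json"})
    return f"http://{GRAPHITE_HOST}:{GRAPHITE_PORT}/render?{query}"


# Example (hypothetical target; fetch it with any HTTP client from an
# allowed host):
# print(build_render_url("sitespeed_io.desktop.*"))
```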
Storage aggregation
storage-aggregation.conf
# Aggregation methods for whisper files. Entries are scanned in order,
# and first match wins. This file is scanned for changes every 60 seconds
#
# [name]
# pattern = <regex>
# xFilesFactor = <float between 0 and 1>
# aggregationMethod = <average|sum|last|max|min>
#
# name: Arbitrary unique name for the rule
# pattern: Regex pattern to match against the metric name
# xFilesFactor: Ratio of valid data points required for aggregation to the next retention to occur
# aggregationMethod: function to apply to data points for aggregation
#
[min]
pattern = \.min$
xFilesFactor = 0.1
aggregationMethod = min
[max]
pattern = \.max$
xFilesFactor = 0.1
aggregationMethod = max
[sum]
pattern = \.count$
xFilesFactor = 0
aggregationMethod = sum
[default_average]
pattern = .*
xFilesFactor = 0.0
aggregationMethod = average
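To make the "first match wins" rule and xFilesFactor easier to reason about, here is a small Python sketch of how the rules above would be applied to a metric name and to one bucket of data points. It is an illustration only, not carbon's actual code: the function names are made up, and it assumes patterns are applied as unanchored regular expression searches.

```python
import re

# Rules copied from storage-aggregation.conf above, in file order:
# (name, pattern, xFilesFactor, aggregationMethod)
AGGREGATION_RULES = [
    ("min", r"\.min$", 0.1, "min"),
    ("max", r"\.max$", 0.1, "max"),
    ("sum", r"\.count$", 0.0, "sum"),
    ("default_average", r".*", 0.0, "average"),
]


def pick_aggregation(metric):
    """Return (rule name, xFilesFactor, method) for the first matching rule."""
    for name, pattern, xff, method in AGGREGATION_RULES:
        if re.search(pattern, metric):
            return name, xff, method
    raise LookupError(f"no aggregation rule matches {metric!r}")


def aggregate(points, xff, method):
    """Roll up one bucket of points; None marks a missing point.

    Returns None when there are no known points or when the share of
    known points is below xFilesFactor.
    """
    known = [p for p in points if p is not None]
    if not known or len(known) / len(points) < xff:
        return None
    if method == "average":
        return sum(known) / len(known)
    if method == "sum":
        return sum(known)
    if method == "min":
        return min(known)
    if method == "max":
        return max(known)
    if method == "last":
        return known[-1]
    raise ValueError(f"unknown aggregation method {method!r}")
```

For example, a metric ending in .min is rolled up with min and needs at least 10% of its points to be present, while anything that matches no earlier rule falls through to default_average.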
Storage schemas
storage-schemas.conf
# Schema definitions for Whisper files. Entries are scanned in order,
# and first match wins. This file is scanned for changes every 60 seconds.
#
# [name]
# pattern = regex
# retentions = timePerPoint:timeToStore, timePerPoint:timeToStore, ...
# Carbon's internal metrics. This entry should match what is specified in
# CARBON_METRIC_PREFIX and CARBON_METRIC_INTERVAL settings
[carbon]
pattern = ^carbon\.
retentions = 60:1d
[crux]
pattern = ^sitespeed_io\.crux\.
retentions = 1d:2y
[alexa]
pattern = ^sitespeed_io\.desktop\.firstViewAlexa\.
retentions = 1h:30d
[sitespeed-firstview-desktop]
pattern = ^sitespeed_io\.desktop\.firstView\.
retentions = 1h:400d
[sitespeed-desktop-user-journey-login]
pattern = ^sitespeed_io\.desktop\.userJourneyLogin\.
retentions = 1h:400d
[webpagereplay-desktop]
pattern = ^sitespeed_io\.desktop\.webpagereplay\.
retentions = 1h:90d
[alexa-emulated-mobile]
pattern = ^sitespeed_io\.emulatedMobile\.firstViewAlexa\.
retentions = 1h:30d
[webpagereplay-emulated-mobile]
pattern = ^sitespeed_io\.emulatedMobile\.webpagereplay\.
retentions = 1h:90d
[sitespeed-firstview-emulated-mobile]
pattern = ^sitespeed_io\.emulatedMobile\.firstView\.
retentions = 1h:400d
[sitespeed-emulated-mobile-user-journey]
pattern = ^sitespeed_io\.emulatedMobile\.userJourneyLogin\.
retentions = 1h:400d
[sitespeed-wpt-desktop]
pattern = ^sitespeed_io\.webpagetest\.firstView\.pageSummary\.en_wikipedia_org\.
retentions = 1h:400d
[sitespeed-wpt-emulated-mobile]
pattern = ^sitespeed_io\.webpagetestEmulatedMobile\.firstView\.pageSummary\.en_m_wikipedia_org\.
retentions = 1h:40d
[sitespeed]
pattern = ^sitespeed_io\.
retentions = 1h:33d
[cath_them_all]
pattern = .*
retentions = 1h:60d
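When adding a new rule it helps to check which schema a metric name ends up in and roughly how much disk one metric will use. The sketch below mirrors the rules above in file order and estimates the Whisper file size from the retention. It is an illustration under assumptions: the function names are made up, patterns are applied as regular expression searches, and the size formula uses the Whisper on-disk layout (16 bytes of metadata, 12 bytes per archive header, 12 bytes per data point).

```python
import re

# Rules copied from storage-schemas.conf above, in file order (first match wins).
SCHEMAS = [
    ("carbon", r"^carbon\.", "60:1d"),
    ("crux", r"^sitespeed_io\.crux\.", "1d:2y"),
    ("alexa", r"^sitespeed_io\.desktop\.firstViewAlexa\.", "1h:30d"),
    ("sitespeed-firstview-desktop", r"^sitespeed_io\.desktop\.firstView\.", "1h:400d"),
    ("sitespeed-desktop-user-journey-login", r"^sitespeed_io\.desktop\.userJourneyLogin\.", "1h:400d"),
    ("webpagereplay-desktop", r"^sitespeed_io\.desktop\.webpagereplay\.", "1h:90d"),
    ("alexa-emulated-mobile", r"^sitespeed_io\.emulatedMobile\.firstViewAlexa\.", "1h:30d"),
    ("webpagereplay-emulated-mobile", r"^sitespeed_io\.emulatedMobile\.webpagereplay\.", "1h:90d"),
    ("sitespeed-firstview-emulated-mobile", r"^sitespeed_io\.emulatedMobile\.firstView\.", "1h:400d"),
    ("sitespeed-emulated-mobile-user-journey", r"^sitespeed_io\.emulatedMobile\.userJourneyLogin\.", "1h:400d"),
    ("sitespeed-wpt-desktop", r"^sitespeed_io\.webpagetest\.firstView\.pageSummary\.en_wikipedia_org\.", "1h:400d"),
    ("sitespeed-wpt-emulated-mobile", r"^sitespeed_io\.webpagetestEmulatedMobile\.firstView\.pageSummary\.en_m_wikipedia_org\.", "1h:40d"),
    ("sitespeed", r"^sitespeed_io\.", "1h:33d"),
    ("cath_them_all", r".*", "1h:60d"),
]

UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800, "y": 31536000}


def to_seconds(text):
    """Convert '1h' or '400d' to seconds; a bare number is already seconds."""
    if text.isdigit():
        return int(text)
    return int(text[:-1]) * UNIT_SECONDS[text[-1]]


def match_schema(metric):
    """Return (rule name, retentions) of the first rule matching the metric."""
    for name, pattern, retentions in SCHEMAS:
        if re.search(pattern, metric):
            return name, retentions
    raise LookupError(f"no schema matches {metric!r}")


def points_in(retention):
    """Number of data points kept by one 'timePerPoint:timeToStore' archive."""
    precision, duration = retention.split(":")
    if duration.isdigit():  # a bare number here is a point count
        return int(duration)
    return to_seconds(duration) // to_seconds(precision)


def whisper_file_bytes(retentions):
    """Estimated size in bytes of one Whisper file for a retentions string."""
    archives = [r.strip() for r in retentions.split(",")]
    return 16 + 12 * len(archives) + 12 * sum(points_in(a) for a in archives)
```

With these rules, a metric under sitespeed_io.desktop.firstView. keeps 9600 hourly points (1h:400d), which works out to roughly 115 kB per metric, while an unlisted sitespeed_io. metric falls through to the 1h:33d rule.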
Security groups
The instance has its own security group that makes sure we only get data from our blessed instances. The inbound rules look like this:
Custom TCP 8080 - Access from the proxy (install1002) that is used by grafana.wikimedia.org
Custom TCP 8080 - Access from the agents security group that runs sitespeed.io/browsertime/webpagetest (for annotations)
Custom TCP 2003 - Access from the agents security group that runs sitespeed.io/browsertime/webpagetest (for metrics)
SSH TCP 22 - Access only with the .pem file.
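Port 2003 is carbon's plaintext listener, where each metric is one line of the form "path value timestamp". A minimal sketch of sending a single data point is shown below, for example to verify that a new agent is allowed through the security group. The helper names and the example metric path are hypothetical, and the connection only succeeds from an instance in the agents security group.

```python
import socket
import time


def format_line(path, value, timestamp=None):
    """Format one data point for carbon's plaintext protocol."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"{path} {value} {int(timestamp)}\n"


def send_metric(path, value, host="wpt-graphite.wmftest.org", port=2003, timeout=5.0):
    """Open a TCP connection to carbon and send a single data point.

    Assumption: host is the instance named in the Access section and the
    caller runs on an instance that the inbound rules allow on port 2003.
    """
    line = format_line(path, value).encode("ascii")
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(line)


# Example (hypothetical metric name, run from an allowed agent):
# send_metric("sitespeed_io.example.connectivityCheck", 1)
```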