Shellbox
Shellbox is a library for remote command execution and a server for secure command execution. It was primarily implemented to sandbox lilypond (used by the Score extension) and to provide a way for MediaWiki to use external binaries without needing them to be in the same container. Shellbox relies on Kubernetes (and Linux containers/namespaces) to provide isolation and resource limits for external commands.
Documentation for integration in MediaWiki is available at mw:Shellbox; operational aspects are covered here on Wikitech.
Production Shellbox generally runs the same PHP version as production MediaWiki. During a migration to a newer PHP version, Shellbox services are often among the first to move (see e.g. T377038 for PHP 8.1 and T403284 for PHP 8.3), somewhat ahead of MediaWiki.
Architecture
Requests come into an Apache httpd container, which contains the Shellbox secret key as a configmap. The request is passed on to a php-fpm container, which contains the Shellbox code and necessary binaries. Once the request is authenticated, Shellbox executes the command as the www-data user. The response is then sent back.
MediaWiki talks to Shellbox over a local envoyproxy.
Shellboxes
We currently have six Shellbox deployments in active use:
- shellbox
- Used by: Score
- URLs: https://shellbox.discovery.wmnet:4008 and http://localhost:6024 on appservers
- Extra packages: lilypond, imagemagick, ghostscript, fluidsynth, lame, noto fonts
- T281423: New Service Request Shellbox
- shellbox-constraints
- Used by: WikibaseQualityConstraints regex checking
- URLs: https://shellbox-constraints.discovery.wmnet:4010 and http://localhost:6025 on appservers
- Extra packages: none
- T285104: Deploy Shellbox instance (shellbox-constraints) for Wikidata constraint regexes
- shellbox-media
- Used by: DjVu (in core), PdfHandler, and PagedTiffHandler
- URLs: https://shellbox-media.discovery.wmnet:4015 and http://localhost:6026 on appservers
- Extra packages: djvulibre-bin, libtiff-tools, poppler-utils
- T289228: Convert media handling code (PdfHandler, PagedTiffHandler) to use Shellbox
- shellbox-syntaxhighlight
- Used by: SyntaxHighlight
- URLs: https://shellbox-syntaxhighlight.discovery.wmnet:4014 and http://localhost:6027 on appservers
- Extra packages: python3, pygments
- T289227: Convert SyntaxHighlight to use Shellbox
- shellbox-timeline
- Used by: EasyTimeline
- URLs: https://shellbox-timeline.discovery.wmnet:4012 and http://localhost:6028 on appservers
- Extra packages: librsvg, perl, ploticus, various fonts
- T289226: Convert EasyTimeline extension to use Shellbox
- shellbox-video
- Used by: TimedMediaHandler for transcoding video
- URLs: https://shellbox-video.discovery.wmnet:4080 and http://localhost:6036 on appservers
- Extra packages: ffmpeg, fluidsynth, wmf-certificates
- T356241: Move video transcoding to use Shellbox
Monitoring
- Primary Shellbox dashboard, which supports all deployed Shellboxes.
Shellbox provides a /healthz endpoint that can be used to quickly check whether the service is up, e.g.:
user@host$ curl https://shellbox.discovery.wmnet:4008/healthz
{
"__": "Shellbox running",
"pid": 10782
}
All other requests are harder to construct externally, since they need to be signed with the Shellbox secret key.
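To probe every deployment listed under "Shellboxes" in one pass, the /healthz check can be wrapped in a small loop. This is a minimal sketch rather than an official tool: the function names are hypothetical, the hostnames and ports are copied from the list above, and it assumes each deployment answers /healthz with the same "Shellbox running" payload shown for shellbox.

```shell
#!/usr/bin/env bash
# Sketch only: function names are hypothetical; hosts and ports come from
# the "Shellboxes" list on this page.

# Print one /healthz URL per Shellbox deployment.
shellbox_healthz_urls() {
  local entry
  for entry in shellbox:4008 shellbox-constraints:4010 shellbox-media:4015 \
               shellbox-syntaxhighlight:4014 shellbox-timeline:4012 \
               shellbox-video:4080; do
    echo "https://${entry%%:*}.discovery.wmnet:${entry##*:}/healthz"
  done
}

# Probe each URL and print OK or FAIL; return non-zero if any probe failed.
check_all_shellboxes() {
  local url status=0
  for url in $(shellbox_healthz_urls); do
    if curl --silent --fail --max-time 5 "$url" | grep -q 'Shellbox running'; then
      echo "OK   $url"
    else
      echo "FAIL $url"
      status=1
    fi
  done
  return "$status"
}
```

Calling check_all_shellboxes from a host that can reach the discovery.wmnet addresses prints one line per deployment, so a single failing service stands out without hiding the others.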
Bugs should be reported and tracked in #Shellbox on Phabricator.
Logs
All logs from httpd and php-fpm should end up in logstash. You can filter for a specific Shellbox deployment with kubernetes.namespace_name:"shellbox-constraints". The actual log text is under the field log (not message, as for MediaWiki). The httpd access logs for HTTP 200 responses are dropped because of their volume and the minimal likelihood that they would be useful.
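When switching between deployments, it can help to generate the filter string instead of retyping it. The helper below is a hypothetical sketch: it uses the two field names mentioned above (kubernetes.namespace_name and log) and assumes the logstash search box accepts Lucene-style query strings, where AND combines clauses and field:"text" matches a phrase. It does not escape quotes inside the phrase.

```shell
#!/usr/bin/env bash
# Sketch only: shellbox_log_query is a hypothetical helper name.

# Build a Lucene-style query for one Shellbox namespace, optionally narrowed
# to log lines whose "log" field contains the given phrase.
shellbox_log_query() {
  local namespace="$1" phrase="${2:-}"
  local query="kubernetes.namespace_name:\"${namespace}\""
  if [ -n "$phrase" ]; then
    query="${query} AND log:\"${phrase}\""
  fi
  printf '%s\n' "$query"
}
```

For example, shellbox_log_query shellbox-constraints prints the filter quoted above, and a second argument appends a clause on the log field.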
All Shellbox invocations should still be logged under MediaWiki's exec log channel too.
Deploying a new version
After a patch to Shellbox is merged, new image versions are automatically built via the postmerge jobs (trigger-)shellbox-pipeline-publish; the new image versions should show up in the Wikimedia Docker registry within about 30 minutes.
If you'd like to build a new set of images but do not have a patch to merge (e.g., if you simply wish to pick up newer Debian packages), you have a couple of options. The simplest approach is to update 'rebuildCounter' in the Blubber file to trigger a new build. This is appropriate when you are comfortable using images that reflect the latest state of the default branch (master).
However, if there are unreleased code changes that have been merged and will require additional care to roll out safely, you might consider a "replay" rebuild approach, in order to decouple those changes from your needs. In more detail:
- You can trigger a replay of the shellbox-pipeline-publish postmerge job associated with any previously merged commit. This will require finding the build ID of the original job:
- If you know the Gerrit change that triggered the job, then you can find the build ID in the comments added by PipelineBot.
- If you wish to rebuild the current image referenced by shellbox.version (see below), you can find the build ID via the jenkins.build label on one of the associated container image variants (e.g., via docker image inspect).
- Once you've identified the build ID, navigate to it in the Jenkins UI and click the (replay) Rebuild link on the left-hand side. Fresh images will be built at the same commit as the original job, and published with a new tag timestamp (see Console Output on your new shellbox-pipeline-publish job).
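The label lookup mentioned above can be sketched with docker image inspect and a Go template. The function name is hypothetical and the image reference is left to the caller; the sketch assumes the image has already been pulled locally and that the build ID is stored under the jenkins.build label as described.

```shell
#!/usr/bin/env bash
# Sketch only: shellbox_image_build_id is a hypothetical helper name.

# Print the value of the jenkins.build label for a locally available image.
# $1 is a full image reference (registry/name:tag) chosen by the caller.
shellbox_image_build_id() {
  local image="$1"
  docker image inspect \
    --format '{{ index .Config.Labels "jenkins.build" }}' \
    "$image"
}
```

The index template function is used because the label key contains a dot, which plain .Config.Labels.jenkins.build field access would misread as nested fields.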
To deploy the new image version to production:
- Change the value of shellbox.version in helmfile.d/services/shellbox/global.yaml in the deployment-charts repository.
- Deploy each version of shellbox following the general deployment guidelines. If you want a quick way to cycle through them, once you've tested how they perform on staging:
cd /srv/deployment-charts/helmfile.d/services
# Change the DC according to your needs
DC=codfw
for deployment in shellbox*; do
  echo "#### Doing $deployment"
  sleep 5
  pushd $deployment
  helmfile -e $DC -i apply --context 5
  popd
done
Note: While the production migration to PHP 8.1 is in progress, you may see two helmfile releases per Shellbox deployment: main and migration. The latter is temporary and will be turned down when the migration completes. More details can be found in task T377038.
Smoke test
Quick verification that the containers are at least running:
cd /srv/deployment-charts/helmfile.d/services
# Change the DC according to your needs
DC=staging
for deployment in shellbox*; do
  echo "#### Checking $deployment in $DC"
  kube_env $deployment $DC
  curl https://staging.svc.eqiad.wmnet:$(kubectl get service shellbox-main-tls-service -o jsonpath='{.spec.ports[0].nodePort}')
done
This curl call is expected to return a JSON payload describing an error processing the request, because no action was encoded in the URL:
{
"__": "Shellbox server error",
"class": "Shellbox\\ShellboxError",
"message": "No action was specified",
"log": [
{
"level": 400,
"message": "Exception of class Shellbox\\ShellboxError: No action was specified",
"context": {
"trace": "#0 /srv/app/src/Server.php(72): Shellbox\\Server->guardedExecute('/srv/app/config...')\n#1 /srv/app/src/Server.php(61): Shellbox\\Server->execute('/srv/app/config...')\n#2 /srv/app/index.php(3): Shellbox\\Server::main('/srv/app/config...')\n#3 {main}"
}
}
]
}
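Because the expected smoke-test result is itself an error payload, it is easy to confuse with a real failure when scanning output by eye. The following hypothetical helper is a minimal sketch that reads a response body on standard input and succeeds only if it looks like the "No action was specified" error shown above; it matches on text rather than parsing JSON, which is an assumption that the message wording stays stable.

```shell
#!/usr/bin/env bash
# Sketch only: is_expected_smoke_response is a hypothetical helper name.

# Read a response body from stdin; exit 0 only if it is the Shellbox
# "No action was specified" error that the smoke test expects.
is_expected_smoke_response() {
  local body
  body="$(cat)"
  printf '%s' "$body" | grep -q '"__": *"Shellbox server error"' &&
    printf '%s' "$body" | grep -q 'No action was specified'
}
```

Piping the curl output from the loop above into is_expected_smoke_response turns each check into a pass or fail exit status, so a gateway error page or an empty reply is reported as a failure.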