AQS 2.0

Analytics Query Service (AQS) is the software behind the /metrics family of endpoints in RESTBase. It is a read-only HTTP proxy to results served from Cassandra and Druid. It is currently based on a very outdated fork of RESTBase, and has received few updates over the years. As part of the goal to sunset RESTBase, AQS 2.0 is a project to migrate the /metrics endpoints to a set of services exposed via the API Gateway.

AQS 2.0 services

Project overview

Epic task in Phabricator | API Platform Team workboard

  1. Implement the new, stand-alone AQS service(s) (in progress)
  2. Deploy to k8s
  3. Expose the /metrics hierarchy from the new service(s) using the API Gateway
  4. Switch RESTBase from proxying requests to the old AQS service to proxying them to the new k8s-based one
  5. Deprecate the http://{project}/api/rest_v1/metrics resources
  6. Eventually phase out the RESTBase /metrics hierarchy

Proposal

From phab:T263489

We propose to break down the rewrite along dataset boundaries — similar to the module structure in RESTBase — with a separate project used to implement each.

  • pageviews
  • unique devices
  • wikistats 2
  • mediarequests
  • geoeditors

The resulting service (or services) will be proxied by RESTBase and/or the API Gateway (the former to eventually be deprecated in favor of the latter) in order to maintain complete compatibility with the existing API.

The target language for these implementations is Go. While a complete comparison of JavaScript/Node.js and Go is out of scope for this issue, the (simplified) rationale is:

  • Strong, static typing; Statically typed languages eliminate entire classes of bugs common to dynamic languages, improve security, and make code easier to reason about
  • Ease of use; Go is more obvious, more explicit, and easier to understand. Complicated concepts like concurrency are easier to get right
  • Performance; Service latency can be expected to be both lower and, more importantly, more predictable with Go

Developer guide

Getting started

AQS 2.0 consists of several repositories. Some correspond to individual services that expose APIs. Others correspond to cross-service common functionality or test environments. These repositories are mostly stored in WMF's GitLab, but speculative/formative repositories may be stored elsewhere for now.

You will need:

Go (aka "golang") is an opinionated language in various ways. Among these is that you're probably much better off keeping your Go code under your "GOPATH" rather than wherever you may be used to keeping code. (There are, of course, always ways for savvy developers to cheat the system. If you choose to do that, any consequences are on you.) On my Mac, I cloned all the AQS 2.0 repositories under ~/go/src/.

The current list of repositories is:

Cassandra-backed services:

Common functionality:

Test environments:

There will be at least one additional service repository created before production release, to cover the Druid-backed wikistats2 endpoints. It is possible we may choose to break those endpoints into multiple services. It is also possible that one or more additional services may be required for new production endpoints that are being discussed, but for which we do not yet have details (or data to serve).

Running a service

Each service's README file contains details about running that particular service. In summary, you'll need to open several command-line (aka "terminal") windows/tabs and run commands in each. The following describes how to run the "pageviews" service. Other services operate similarly.

  • In one terminal, navigate to <GOPATH>/aqs-docker-test-env
  • Run "make startup", wait for it to say "Startup complete", then leave it running
  • In another terminal, also in <GOPATH>/aqs-docker-test-env, run "make bootstrap" and wait for it to complete
  • Navigate (either in that terminal or a different one) to <GOPATH>/pageviews
  • Run "make"
  • Run "./pageviews" (and leave it running)
  • In another terminal, navigate to <GOPATH>/pageviews and run "make test"
  • In your browser, visit http://localhost:8080/metrics/pageviews/per-article/en.wikipedia/all-access/all-agents/Banana/daily/20190101/20190102

We haven't started the Druid-based endpoint(s) yet, but the process will likely be similar, with perhaps some differences in how to launch the test environment.

Tips and troubleshooting

Because Go is an opinionated language, it may refuse to run over seemingly small things, such as whitespace. If you see something like this:

   goimports: format errors detected

You can execute this to see what Go is unhappy about:

   goimports -d *.go

And this to automatically fix it:

   goimports -w *.go

Our services depend on several packages, including our own “aqsassist”, which is in active development. This means you may sometimes need to update dependencies for your local service to run. You can fetch any missing dependencies of the package in the current directory via:

   go get .

upgrade its dependencies to their latest minor or patch releases via:

   go get -u .

or update specific dependencies via something like:

   go get gitlab.wikimedia.org/frankie/aqsassist

API documentation

The AQS 2.0 project is currently evaluating an OpenAPI-based toolset to create API docs:

  • Swag generates an OpenAPI specification based on a mix of code annotations and the code itself.
  • RapiDoc converts the specification into HTML.

Our goal is to create API docs that are reliable and easy to update by maintaining docs as close as possible to the code. Add your feedback about these tools to the evaluation section.

Adding swag docs to an API

Anywhere in main.go, add annotations to document general information about the API.

Example

main.go
// API documentation
// @title                  Wikimedia Pageviews API
// @version                1.0
// @description.markdown   api.md
// @contact.name           API Platform Team
// @contact.url            https://www.mediawiki.org/wiki/Talk:...
// @contact.email          example-email@host.org
// @license.name           Apache 2.0
// @license.url            http://www.apache.org/licenses/LICENSE-2.0.html
// @termsOfService         https://wikimediafoundation.org/wiki/Terms_of_Use
// @host                   api.wikimedia.org
// @basePath               /metrics/pageviews/
// @schemes                https

Using a markdown file for the description

In this example, the API description is stored in an api.md file in the root directory of the repository. This avoids having long descriptive text inside the code file.

Resources

Documenting an endpoint

Add annotations to any code file to document an endpoint. Endpoint annotations should be stored as close as possible to the code they describe. The block of endpoint annotations must end on a line immediately preceding a function.

Example

per_article.go
// API documentation
// @summary      Get pageviews for a page.
// @router       /per-article/{project}/{access} [get]
// @description  Given a wiki page and a date range, returns a time series of pageview counts.
// @param        project      path  string  true  "Domain of a Wikimedia project"            example(en.wikipedia.org)
// @param        access       path  string  true  "Method of access" Enums(all-access, desktop, mobile-app, mobile-web)  example(all-access)
// @produce      json
// @success      200 {object} PerArticle
func ...

Annotating the response format

Swag automatically gets information about the response format from the struct. To complete the schema in the docs, add these elements to the struct definition:

  • an example value in the field's struct tag, next to the JSON encoding definition, using the syntax example:"example value"
  • a description of the attribute as an inline comment

For example:

handler.go
type PerArticle struct {
	Project     string `json:"project" example:"en.wikipedia.org"` // Wikimedia project domain
	Article     string `json:"article" example:"Jupiter"`          // Name of the article
}

Resources

Generating the API specification

Install swag:

go install github.com/swaggo/swag/cmd/swag@latest

Generate the spec:

swag init --markdownFiles .

Swag outputs the spec in YAML and JSON formats (swagger.yaml and swagger.json) to a docs directory.

Viewing the API specification

To view the spec as a webpage with an interactive sandbox, visit the RapiDoc demo. Select the Local JSON File button in the header, and select the docs/swagger.json generated by swag. RapiDoc supports JSON only.

Alternative: Using RapiDoc locally

Copy the following HTML document into a local file (for example, rapidoc.html), and open it in a browser. Select Local JSON File, and select the docs/swagger.json generated by swag.

This is a minimum viable workflow intended for evaluation and testing.
rapidoc.html
<!doctype html> 
<html>
  <head>
    <meta charset="utf-8"> 
    <script type="module" src="https://unpkg.com/rapidoc/dist/rapidoc-min.js"></script>
  </head>
  <body>
    <rapi-doc
      spec-url="/"
      theme="dark"
      schema-style="table"
      fill-request-fields-with-example="false"
      primary-color="#049DFF"
      font-size="large"
    >
    </rapi-doc>
  </body>
</html>

Alternative: Using Swagger UI

You can also view the spec file in GitLab to see a preview of the spec rendered with Swagger UI.

Evaluation

Evaluation of swag

Benefits:

  • Swag gets information about the response object from the struct without needing additional annotations. This helps limit the number of annotations needed per endpoint and reduces duplication of information.
  • Robust feature set
  • Outputs both JSON and YAML

Issues:

  • Each parameter must be defined on one line. Depending on the length of the description, this can result in very long annotation lines, which can be difficult to read. Since most of the parameters are shared between services, maybe we could find a way to reduce duplication of these docs.
  • Swag only supports OpenAPI 2.0, not the newer OpenAPI 3.x versions.
  • Swag fails if you use tabs in annotations in main.go but, oddly, not in other files.

Evaluation of RapiDoc

Benefits:

  • API sandbox
  • Sidebar navigation is easier to use than the expandable sections used by Swagger UI
  • Can fill in parameters with example values, which makes the sandbox easier to try out (not offered by Swagger UI)
  • Dark mode, not offered by Swagger UI
  • Support for color and logo customization
  • Used by Toolhub

Issues:

  • Poor color contrast on small text in dark mode
  • No option to toggle between light mode and dark mode
  • The sandbox automatically encodes path parameters, which is confusing for parameters such as page titles that callers are expected to encode themselves. If you put an already-encoded page title into the sandbox, it will encode it again, resulting in an invalid page title. This is an issue with both RapiDoc and Swagger UI and seems to be the expected behavior of both tools. As a workaround, we can call out this difference in the docs, but it may still cause confusion.

See also