On consistency, I had put "...storage must provide read-your-write consistency...", which was then changed to "...storage must generally provide read-your-write consistency..." (emphasis mine). Why? Generally would seem to soften this requirement. Is this not a hard requirement? Eevans (talk) 21:13, 10 August 2016 (UTC)
- That was apparently edited by Gabriel. There seem to be discrepancies about the needs. Could you clarify that? -- Jcrespo 14:00, 16 August 2016 (UTC)
HAProxy back-end monitoring
Under the MySQL description it says that HAProxy "...monitors replication lag...", and "...disables servers that are lagging too severely...". Is there a reference to how this is done? What does disabled mean here, just that client connections will not be forwarded to the lagging machine? Eevans (talk) 21:39, 10 August 2016 (UTC)
- Something that I highly disapprove of is the concept of "failover" moving a service from one place to another. This, for a series of reasons, is a horrible pattern. Proxying and redirecting traffic is a much better design pattern; we use it successfully with our LVS-based architecture, and with HAProxy. Proxies are not trouble-free, but in most cases those issues can be worked around. So yes, proxies can be programmed with any check, no matter how complex, and act accordingly, dynamically changing the target of queries. This works especially well for simple systems (we cannot do that transparently on the main production servers because large transactions happen there (up to 24-hour queries), and strict serialization of writes is required/assumed). This is an example of how you can make an HTTP proxy lag-aware (the method there is simplistic, and we have better tools now, but it is essentially the same idea): https://www.percona.com/blog/2014/12/18/making-haproxy-1-5-replication-lag-aware-in-mysql/ -- Jcrespo 13:16, 16 August 2016 (UTC)
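To make the lag-aware check idea concrete, here is a minimal sketch of the decision such a check could make. It is not the script from the linked post or anything we run in production: the function name `lag_health`, the 30-second threshold, and the response strings are all hypothetical. The idea is only that a health endpoint answers 200 when the replica is close enough to the master and 503 otherwise, so the proxy stops forwarding new connections to it.

```python
from typing import Optional, Tuple

# Hypothetical threshold; the discussion does not specify one.
MAX_LAG_SECONDS = 30.0


def lag_health(seconds_behind: Optional[float],
               max_lag: float = MAX_LAG_SECONDS) -> Tuple[int, str]:
    """Map a replica's reported lag to an HTTP status a proxy check can act on.

    seconds_behind is None when replication is not running at all
    (assumption: the caller obtained it from the replica's status output).
    """
    if seconds_behind is None:
        # Replication stopped or broken: treat the replica as unhealthy.
        return 503, "replication not running"
    if seconds_behind > max_lag:
        # Too far behind: ask the proxy to take this back-end out of rotation.
        return 503, f"lagging {seconds_behind:.0f}s (limit {max_lag:.0f}s)"
    return 200, f"ok, lag {seconds_behind:.0f}s"


if __name__ == "__main__":
    for lag in (0, 12, 45, None):
        print(lag, lag_health(lag))
```

In this reading, "disabled" would mean only that the proxy marks the back-end down and routes new client connections elsewhere until the check passes again.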
Tuesday's (2016-08-16) meeting
I don't know about anyone else, but I didn't find today's meeting terribly useful. I'm not even entirely certain I can say what went wrong; we seemed to go back and forth on one or two issues (that may or may not be important), without arriving at consensus on anything. I would submit, though, that the process outlined is still sane; in my opinion we should:
- Clearly and unambiguously define the requirements (with justification)
- Objectively weigh the proposals against the requirements
Speaking strictly for myself, all things being equal, I would prefer to go with MySQL (yes, really). I genuinely think that it should be our default in such matters, and the justification for using anything else should be so obvious that the sort of contention we're experiencing here would be moot. That this isn't cut-and-dried would seem to indicate some imprecision in defining the problem(s) we are trying to solve.
I propose we double down on the requirements and make them as clear, concise, and detailed as possible (with input from Performance and Security) before continuing.
Replaced "Other" section with "Operations"
Jcrespo wrote: What about "team expertise"? Reliability? Current storage used? TCO? Availability (other than nodes down)? Performance in terms of latency and not only throughput? To this end I replaced the Other section with one entitled Operations, and attempted to break out what seemed like the distinct points into sub-sections. It's just a stub though; the sub-sections will need to be fleshed out.
The section on consistency
When I added the per-operation sub-sections (login, read, delete, etc), it was my intention to remove the opening text entirely. In other words, I was hoping to make the requirements more specific by calling out what they were for each operation (along with any reasoning or justification). If possible, please try to incorporate any changes to that section into the per-op subsections so that we can remove the opening material.
POST vs GET
Sessions are usually not created on POST. Login requires a session when the login form is displayed (for CSRF and AuthManager). Immutable sessions (where the association between the user and the authentication information in the request cannot be changed, e.g. OAuth) need to be created on an arbitrary request. CentralAuth has various session-creation methods (invisible pixel, redirect chain) which are also GET-based. The "remember me" case is already mentioned. The only case of POST session creation that I can think of is anonymous edits.
Also, while the final POST of the login sequence does always create a new session (due to session id reset), there is no guarantee that there is a final POST. When using an external authentication service which works with browser redirects (e.g. Google login), everything is a GET, since browsers do not reliably support POST redirects. Wikimedia servers currently do not support any external login provider, but it does not seem far-fetched that it could happen in the future.
Logouts are normally GETs (except API-based ones). Session deletion is needed for id resets as well (e.g. after authentication data change); that's normally a POST. IMO there is not much point in exposing session deletion failure to the user; we already delete cookies and on SUL wikis change the user token, and there is not much end users can do about it anyway. What's important is that a session write should never overwrite a session delete even if it happens later. --tgr (talk) 07:50, 13 September 2016 (UTC)
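To illustrate the "a write should never overwrite a delete" rule, here is a minimal in-memory sketch. It is not how MediaWiki or any of the proposed back-ends implement this; the class `SessionStore` and its methods are hypothetical names. The store remembers deleted ids (tombstones) and refuses any write to them, no matter when the write arrives.

```python
from typing import Dict, Optional, Set


class SessionStore:
    """Toy session store in which a delete always wins over a write."""

    def __init__(self) -> None:
        self._data: Dict[str, dict] = {}
        self._deleted: Set[str] = set()  # tombstones for deleted session ids

    def write(self, session_id: str, data: dict) -> bool:
        """Store session data; return False if the session was deleted."""
        if session_id in self._deleted:
            # A late or concurrent write must not resurrect the session.
            return False
        self._data[session_id] = dict(data)
        return True

    def delete(self, session_id: str) -> None:
        """Remove the session and leave a tombstone behind."""
        self._data.pop(session_id, None)
        self._deleted.add(session_id)

    def read(self, session_id: str) -> Optional[dict]:
        return self._data.get(session_id)


if __name__ == "__main__":
    store = SessionStore()
    store.write("abc", {"user": "Example"})
    store.delete("abc")
    print(store.write("abc", {"user": "Example"}))  # False: delete wins
    print(store.read("abc"))                        # None
```

A real store would need to expire tombstones at some point (for example, once the session's maximum lifetime has passed); that is left out here.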
Also note that GET-then-POST cross-DC consistency will be a common requirement for action API clients, which need a GET to obtain a CSRF token (which might create the session, or update it with the new CSRF token) and then will immediately POST to perform an action with the same token. The session creation/update should either be replicated from the secondary DC to the primary one in less time than an HTTP round trip (which might come from Tool Labs or some other colocated source), or the first GET should be delayed while the replication is happening. E.g. Cassandra with LOCAL_QUORUM writes does not guarantee this. --tgr (talk) 07:50, 13 September 2016 (UTC)
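The race can be shown with a toy model. This is not Cassandra and makes no claim about its actual replication timing; the class `TwoDCStore`, the DC names, and the delay values are all hypothetical. A write is visible immediately in the DC that accepted it and only after a fixed delay in the other one, so a POST reaching the primary DC sooner than that delay does not see the token written by the GET in the secondary DC.

```python
from typing import Dict, List, Optional, Tuple


class TwoDCStore:
    """Toy model: local writes are immediate, cross-DC replication is delayed."""

    def __init__(self, replication_delay: float) -> None:
        self.delay = replication_delay
        self.local: Dict[str, Dict[str, str]] = {"primary": {}, "secondary": {}}
        # Queued cross-DC updates: (visible_at, target_dc, key, value)
        self.pending: List[Tuple[float, str, str, str]] = []

    def write(self, dc: str, key: str, value: str, now: float) -> None:
        self.local[dc][key] = value
        other = "secondary" if dc == "primary" else "primary"
        self.pending.append((now + self.delay, other, key, value))

    def read(self, dc: str, key: str, now: float) -> Optional[str]:
        self._apply_replication(now)
        return self.local[dc].get(key)

    def _apply_replication(self, now: float) -> None:
        remaining = []
        for visible_at, dc, key, value in self.pending:
            if visible_at <= now:
                self.local[dc][key] = value
            else:
                remaining.append((visible_at, dc, key, value))
        self.pending = remaining


if __name__ == "__main__":
    store = TwoDCStore(replication_delay=0.2)
    # GET served by the secondary DC stores the CSRF token at t=0.
    store.write("secondary", "session:abc", "csrf-token", now=0.0)
    # POST reaches the primary DC 50 ms later: the token is not there yet.
    print(store.read("primary", "session:abc", now=0.05))
    # The same read after the replication delay succeeds.
    print(store.read("primary", "session:abc", now=0.25))
```

Either option from the comment above closes the gap in this model: keeping the delay below the client's round trip, or holding the GET response until the queued update has been applied in the primary DC.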
Cookie handling is not always a requirement for having sessions
T140813 mentions that it should not be possible, even with remote code execution on a MediaWiki box, to list sessions (i.e. to access the data of a session without knowing its ID). The goals here do not mention that. Is that still a requirement? It seems like some sort of custom service would be the only way to do that. --tgr (talk) 07:55, 13 September 2016 (UTC)