Talk:Shared Data Platform
Latest revision as of 20:58, 18 May 2022
I think it would be worth discussing RESTBase at greater length, since RESTBase + ChangeProp was originally proposed as exactly the sort of event-driven data store which is being proposed here.
The other big question I'd like to have answered is: how do we migrate everything to a new platform? I.e., in the end we don't want "one more storage system", but instead (at least) "one less storage system", since (hopefully) one or more existing systems can be replaced by the new thing. But how do we get there from here? Are there any unique capabilities of existing systems which can't be efficiently replicated by an MVP for the new Shared Data Platform? In my experience, these unique features are what tend to prevent us from ever fully replacing an existing system with a new one, and they lead to the present proliferation of storage systems accurately described by this document.
One somewhat basic question: what is the proposed data lifetime for this platform? Most of our existing storage systems (including RESTBase) started out with grand dreams of maintaining historical as well as current data, but then got descoped when the storage requirements of this plan became obvious. Is this a datastore for "the current state of the wiki", including events when that current state changes, or is it a datastore for "the historical state of the wiki", including archival analytics data and content derived from article revisions that are no longer current? Or something else? C. Scott Ananian (talk) 01:33, 1 April 2022 (UTC)
Internal vs. External vs. Privacy
I read this document as primarily addressing internal users (e.g. analytics, MW hosting), but there are occasional nods to data syndication. Do you imagine this platform would be used by both internal and external consumers? If so, I suspect we need to design privacy layers into the system from the ground up. It /might/ work to just bolt a redaction layer on at the last minute, in the style of the wiki replicas used by WMCS, but no one seems to like the way we're handling that now very much.