
Talk:Shared Data Platform: Difference between revisions

imported>C. Scott Ananian
(→‎RESTBase?: new section)
imported>Andrew Bogott
No edit summary
 
(One intermediate revision by one other user not shown)
Line 8: Line 8:


One somewhat basic question: what is the proposed data lifetime for this platform? Most of our existing storage systems (including RESTBase) started out with grand dreams of maintaining historical as well as current data, but then got descoped when the storage requirements of this plan became obvious. Is this a datastore for 'the current state of the wiki', including events when that current state changes, or is it a datastore for 'the historical state of the wiki', including archival analytics data and derived content from article revisions no longer current -- or something else?
[[User:C. Scott Ananian|C. Scott Ananian]] ([[User talk:C. Scott Ananian|talk]]) 01:33, 1 April 2022 (UTC)
== Internal vs. External vs. Privacy ==
I read this document as primarily addressing internal users (e.g. analytics, MW hosting), but there are occasional nods to data syndication. Do you imagine this platform would be used for both internal and external consumers? If so, I suspect we need to design privacy layers into the system from the ground up; it /might/ work to just bolt a redaction layer on at the last minute, in the style of the wiki replicas used by WMCS, but no one seems to like the way we're handling that now very much.

Latest revision as of 20:58, 18 May 2022

First

RESTBase?

I think it would be worth discussing RESTBase at greater length, since RESTBase + ChangeProp was originally proposed as exactly the sort of event-driven data store which is being proposed here.

The other big question I'd like to have answered is: how do we migrate everything to a new platform? That is, in the end we don't want "one more storage system", but instead (at least) "one less storage system", since (hopefully) one or more existing systems can be replaced by the new thing. But how do we get there from here? Are there any unique capabilities of existing systems which can't be efficiently replicated by an MVP for the new Shared Data Platform? In my experience, these unique features are what tend to prevent us from ever fully replacing an existing system with a new one, and they lead to the present proliferation of storage systems accurately described by this document.

One somewhat basic question: what is the proposed data lifetime for this platform? Most of our existing storage systems (including RESTBase) started out with grand dreams of maintaining historical as well as current data, but then got descoped when the storage requirements of this plan became obvious. Is this a datastore for 'the current state of the wiki', including events when that current state changes, or is it a datastore for 'the historical state of the wiki', including archival analytics data and derived content from article revisions no longer current -- or something else? C. Scott Ananian (talk) 01:33, 1 April 2022 (UTC)

Internal vs. External vs. Privacy

I read this document as primarily addressing internal users (e.g. analytics, MW hosting), but there are occasional nods to data syndication. Do you imagine this platform would be used for both internal and external consumers? If so, I suspect we need to design privacy layers into the system from the ground up; it /might/ work to just bolt a redaction layer on at the last minute, in the style of the wiki replicas used by WMCS, but no one seems to like the way we're handling that now very much.