imported>Avicennasis |
imported>Aklapper |
(42 intermediate revisions by 26 users not shown) |
Line 1: |
Line 1: |
| '''MySQL queries''' allow you to pull data from the Tool Labs replicated databases. | | '''MySQL queries''' against the [[Wiki Replicas]] databases allow you to search and collect metadata about pages and actions on the Wikimedia movement's public wikis. Also consider the [[m:Research:Quarry|Quarry]] web interface as an alternative for one-off queries. |
| (The original page has been imported from [//wiki.toolserver.org/view/MySQL_queries Toolserver MySQL queries])
| |
|
| |
|
| == Database layout == | | == Database layout == |
| The database layout is available at the MediaWiki wiki: [[mw:Manual:Database layout]]. | | The database layout is available at [[mw:Manual:Database layout]]. |
|
| |
|
| A dump of the currently running database layouts can be found [//tools.wmflabs.org/tools-info/schemas.php?schema=enwiki here]. | | A dump of the currently running database layouts can be found [//tools.wmflabs.org/tools-info/schemas.php?schema=enwiki here]. |
Line 10: |
Line 9: |
|
| |
|
| === Slices === | | === Slices === |
| The various databases are stored in slices. The slices are named with a preceding 's' and a digit, e.g., s1, s2, etc. followed by the internal domain name 'labsdb' (e.g. s1.labsdb). A table of this information is available here: [//tools.wmflabs.org/tools-info/?listmetap List meta_p.wiki]. | | The various databases are stored in 'slices'. The slices are named with a leading 's' and a digit—for example s1, s2, etc.—followed by the internal domain name '{analytics,web}.db.svc.wikimedia.cloud' (e.g. s1.analytics.db.svc.wikimedia.cloud and s1.web.db.svc.wikimedia.cloud). A table of this information is available here: [//tools.wmflabs.org/tools-info/?listmetap List meta_p.wiki]. |
|
| |
|
| === Data storage === | | === Data storage === |
Line 16: |
Line 15: |
|
| |
|
| # Page titles use underscores and never include the namespace prefix. (eg: page_title='The_Lord_of_the_Rings') | | # Page titles use underscores and never include the namespace prefix. (eg: page_title='The_Lord_of_the_Rings') |
| # User names use spaces, not underscores. (eg: rev_user_text='Jimbo Wales' | | # User names use spaces, not underscores. (eg: actor_name='Jimbo Wales') |
| # Namespaces are integers. A key to the integers is available [[mw:Manual:Namespace|here]]. <!-- The localized names for the namespaces can be obtained from the [[Toolserver database]]. This was on Toolserver --> | | # Namespaces are integers. A key to the integers is available [[mw:Manual:Namespace|here]]. <!-- The localized names for the namespaces can be obtained from the [[Toolserver database]]. This was on Toolserver --> |
|
| |
|
| === Views === | | === Views === |
| Tool Labs has exact replicas of Wikimedia's databases, however, certain information is restricted to Tool Labs users using MySQL views.
| | Toolforge has exact replicas of Wikimedia's databases, however, certain information is restricted using MySQL views. |
|
| |
|
| For example, the user table view does not show things like user_password or user_email to Tool Labs users. You can only access data from the public (redacted) databases marked as _p (eg: enwiki_p). | | For example, the user table view does not show things like user_password or user_email to Toolforge users. You can only access data from the public (redacted) databases marked as _p (eg: enwiki_p). |
|
| |
|
| === Alternative Views === | | === Alternative views === |
| There are some alternative views in which the data from the underlying tables are redacted in a different way, so that the corresponding indices can be used. If you utilize the listed columns in your queries (especially in WHERE clause or ORDER BY statement) it is recommended to use these alternative views. This will speed up things quite a lot. | | There are some alternative views in which the data from the underlying tables are redacted in a different way, so that the corresponding indices can be used. If you utilize the listed columns in your queries (especially in WHERE clause or ORDER BY statement) it is recommended to use these alternative views. This will speed up things quite a lot. |
|
| | |
| {| class=wikitable | | {| class=wikitable |
| ! columns | | ! columns |
Line 32: |
Line 31: |
| ! alternative view (recommended) | | ! alternative view (recommended) |
| |- | | |- |
| | ar_user || rowspan=2 | archive || rowspan=2 | archive_userindex | | | ar_actor || archive || archive_userindex |
| |-
| |
| | ar_user_text
| |
| |-
| |
| | rev_user || rowspan=2 | revision || rowspan=2 | revision_userindex
| |
| |- | | |- |
| | rev_user_text | | | fa_actor || filearchive || filearchive_userindex |
| |- | | |- |
| | log_namespace || rowspan=3 | logging || rowspan=3 | logging_logindex | | | log_namespace || rowspan=3 | logging || rowspan=3 | logging_logindex |
Line 46: |
Line 41: |
| | log_page | | | log_page |
| |- | | |- |
| | log_user || rowspan=2 | logging || rowspan=2 | logging_userindex | | | log_actor || logging || logging_userindex |
| |-
| |
| | log_user_text
| |
| |-
| |
| | oi_user || rowspan=2 | oldimage || rowspan=2 | oldimage_userindex
| |
| |-
| |
| | oi_user_text
| |
| |-
| |
| | rc_user || rowspan=2 | recentchanges || rowspan=2 | recentchanges_userindex
| |
| |- | | |- |
| | rc_user_text | | | oi_actor || oldimage || oldimage_userindex |
| |- | | |- |
| | rev_user || rowspan=2 | revision || rowspan=2 | revision_userindex | | | rc_actor || recentchanges || recentchanges_userindex |
| |- | | |- |
| | rev_user_text | | | rev_actor || revision || revision_userindex |
| |} | | |} |
|
| |
|
| === Federated Tables ===
| | {{note|There is also an ipblocks_ipindex view, but it’s unclear which columns it’s optimized for.}} |
| To enable access data across different slices (eg: enwiki and commonswiki) from within MySQL, there are so called federated tables which are stored in schemas marked as '_f_p'. An example is shown in [[#File_description_pages_without_an_associated_file]]
| | {{note|Also see [[News/Actor storage changes on the Wiki Replicas#The actor table seems really slow--so does comment|News/Actor storage changes on the Wiki Replicas]] for additional information about slow performance using the actor and/or comment tables}} |
|
| |
|
| Right now there exist federated tables for the following wikis:
| | {{remark| |
| * commonswiki_f_p (on every slice)
| | ; Technical bit |
| * wikidatawiki_f_p (on every slice)
| |
|
| |
|
| Federated tables are set up by MySQL admin's. If you need an additional one, you may request it in bugzilla. Further information can be found at [//dev.mysql.com/doc/refman/5.5/en/federated-storage-engine.html The Federated Storage Engine]
| | The script that creates and maintains the indexes used by the alternative views is called [[gerrit:plugins/gitiles/operations/puppet/+/production/modules/profile/files/labs/db/views/maintain_replica_indexes.py | maintain_replica_indexes.py]]. This script uses the definitions found in [[gerrit:plugins/gitiles/operations/puppet/+/production/modules/profile/templates/wmcs/db/wikireplicas/index-conf.yaml | index-conf.yaml]] to generate those indexes. That last file is where you can learn, for instance, that an additional index called <code>log_actor_deleted</code> is being created on the <code>logging</code> table using <code>log_actor, log_deleted</code> columns. |
| | |
| | The alternative tables boost the queries not just using indexes; they also do so by subsetting the original table. For instance, <code>actor_revision</code> table only includes those actors that match a row in the <code>revision</code> table's <code>rev_actor</code> column. The script that populates the alternative views can be found in [https://gerrit.wikimedia.org/r/plugins/gitiles/operations/puppet/+/refs/heads/production/modules/profile/files/wmcs/db/wikireplicas/views/maintain-views.py maintain-views.py]; this script uses [https://gerrit.wikimedia.org/r/plugins/gitiles/operations/puppet/+/refs/heads/production/modules/profile/templates/wmcs/db/wikireplicas/maintain-views.yaml maintain-views.yaml] to populate those views. The last file is where you can find the definition of <code>actor_revision</code> table, for instance. |
| | |reminder}} |
|
| |
|
| === Wiki text === | | === Wiki text === |
| Unfortunately there is no way to access the wikitext directly via MySQL replica databases. You have to use the the [[mw:API]] instead. | | Unfortunately there is no way to access the wikitext directly via MySQL replica databases. You have to use the [[mw:API]] instead. |
|
| |
|
| == Accessing the databases == | | == Accessing the databases == |
Line 82: |
Line 71: |
|
| |
|
| $ sql enwiki_p | | $ sql enwiki_p |
| This command will connect to the appropriate cluster (in this case, s1.labsdb) and give you a MySQL prompt where you can run queries. | | This command will connect to the appropriate cluster (in this case, s1.analytics.db.svc.wikimedia.cloud) and give you a MySQL prompt where you can run queries. |
| $ sql enwiki_p < test-query.sql > test-query.txt | | $ sql enwiki_p < test-query.sql > test-query.txt |
| This command takes a .sql file that contains your query, selects the appropriate slice, runs the query, and outputs to a text file. The advantage here is that it doesn't add lines around the data (making a table), it instead outputs the data in a tab-delimited format. | | This command takes a .sql file that contains your query, selects the appropriate slice, runs the query, and outputs to a text file. The advantage here is that it doesn't add lines around the data (making a table), it instead outputs the data in a tab-delimited format. |
Line 90: |
Line 79: |
| === mysql command === | | === mysql command === |
| If you wish to use the mysql command directly, a sample query would look like this: | | If you wish to use the mysql command directly, a sample query would look like this: |
| $ mysql --defaults-file=~/replica.my.cnf -h s1.labsdb enwiki_p -e "SHOW databases;" | | $ mysql --defaults-file=~/replica.my.cnf -h s1.analytics.db.svc.wikimedia.cloud enwiki_p -e "SHOW databases;" |
| The --defaults-file option points to the file replica.my.cnf where your MySQL credentials are stored (username, password). The -h option tells MySQL which host to access (in this case s1.labsdb) and enwiki_p is the database you want to access. The -e option tells MySQL to execute the following command enclosed in quotes. You can also have MySQL output the results to a file. | | The --defaults-file option points to the file replica.my.cnf where your MySQL credentials are stored (username, password). The -h option tells MySQL which host to access (in this case s1.analytics.db.svc.wikimedia.cloud) and enwiki_p is the database you want to access. The -e option tells MySQL to execute the following command enclosed in quotes. You can also have MySQL output the results to a file. |
| $ mysql --defaults-file=~/replica.my.cnf -h s1.labsdb enwiki_p < test-query.sql > test-query.txt | | $ mysql --defaults-file=~/replica.my.cnf -h s1.analytics.db.svc.wikimedia.cloud enwiki_p < test-query.sql > test-query.txt |
|
| |
|
| === mysql command with default === | | === mysql command with default === |
| Another way to use the mysql command directly but without the need to specify the ''--defaults-file'' option each time, is to copy your replica.my.cnf to .my.cnf ''once''. | | Another way to use the mysql command directly but without the need to specify the ''--defaults-file'' option each time, is to copy your replica.my.cnf to .my.cnf ''once''. |
| $ cp replica.my.cnf .my.cnf | | $ ln -s replica.my.cnf .my.cnf |
| From that point your login data is automatically invoked with mysql command. All subsequent commands then look similar as shown above. | | From that point your login data is automatically loaded by the <code>mysql</code> command. All subsequent commands then look similar as shown above. |
| $ mysql -h s1.labsdb -e "show databases;" | | $ mysql -h s1.analytics.db.svc.wikimedia.cloud -e "show databases;" |
|
| |
|
| == Writing queries == | | == Writing queries == |
| Because the replica databases are read-only, nearly all of the queries you will want to run will be SELECT queries. | | Because the replica databases are read-only, nearly all of the queries you will want to run will be SELECT queries. |
|
| |
|
| <source lang="sql"> | | <syntaxhighlight lang="sql"> |
| SELECT * FROM `user`; | | SELECT * FROM `user`; |
| </source> | | </syntaxhighlight> |
|
| |
|
| This query selects all columns from the user table. More information about MySQL queries are available below (in the example queries) and in the [//dev.mysql.com/doc/refman/5.5/en/select.html MySQL SELECT]. | | This query selects all columns from the user table. More information about MySQL queries are available below (in the example queries) and in the [//dev.mysql.com/doc/refman/5.5/en/select.html MySQL SELECT]. |
Line 111: |
Line 100: |
| Queries end in a semi-colon (;). If you want to cancel the query, end it in \c. If you want to output in a non-table format, use \G. | | Queries end in a semi-colon (;). If you want to cancel the query, end it in \c. If you want to output in a non-table format, use \G. |
|
| |
|
| == meta_p database == | | == Optimizing queries == |
| The "meta_p" database contains metadata about the various wikis and databases. It consists of one table: wiki and it is available on every slice (s1–s6.labsdb).
| | {{Tracked|T50875|Resolved}} |
| | To optimize queries, you can ask the MySQL server how the query is executed (a so-called EXPLAIN query). Because of the permissions restrictions on the Wiki Replica servers, a simple [https://mariadb.com/kb/en/library/explain/ EXPLAIN] query will not work. There is however a workaround using MariaDB's [https://mariadb.com/kb/en/library/show-explain/ SHOW EXPLAIN FOR]: |
|
| |
|
| [//tools.wmflabs.org/tools-info/?listmetap List meta_p.wiki]
| | # Open 2 SQL sessions connected to the same backend database server |
| | # In session 1: |
| | #* <code>SELECT CONNECTION_ID() AS conid;</code> |
| | #* Note the number returned. |
| | #* Run the query to be explained. |
| | # In session 2: |
| | #* Use the number you found above for <conid>. |
| | #* <code>SHOW EXPLAIN FOR <conid>;</code> |
|
| |
|
| == Example queries ==
| | If the rightmost column contains 'using filesort', your query is likely to take a long time to execute. Review whether the correct index is used, and consider switching to one of the alternative views. |
| :[[:en:Wikipedia:Database reports]] (and its equivalents in other languages) contain many useful examples.
| |
| :[//tools.wmflabs.org/tsreports/?wiki=http%3A%2F%2Fen.wikipedia.org tsreports] is another source of valuable suggestions
| |
| :{{done}} (Tool Labs) = These scripts have been tested (and if necessary modified) for Tool Labs
| |
| === Atypical log entries ===
| |
| <source lang="sql">
| |
| SELECT
| |
| user_name,
| |
| log_namespace,
| |
| log_timestamp,
| |
| log_action,
| |
| log_title,
| |
| log_comment
| |
| FROM logging
| |
| JOIN `user`
| |
| ON log_user = user_id
| |
| WHERE log_type='delete'
| |
| AND log_action != 'restore'
| |
| AND log_action != 'delete';
| |
| </source>
| |
| '''Explanation''': This pulls log entries from the deletion log that aren't restore or delete actions. {{done}} (Tool Labs)
| |
|
| |
|
| === Broken redirects ===
| | A helpful tool for this is the online [[toolforge:sql-optimizer|SQL Optimizer]], which will display the execution plan for a given query. [[:quarry:|Quarry]] can also show the plan for a currently executing query. |
| : See also source code of [http://svn.wikimedia.org/viewvc/mediawiki/trunk/phase3/includes/specials/SpecialBrokenRedirects.php?revision=71083&view=markup#l44 Special:BrokenRedirects]
| |
| <source lang="sql">
| |
| SELECT
| |
| p1.page_namespace,
| |
| p1.page_title
| |
| FROM redirect AS rd
| |
| JOIN page p1
| |
| ON rd.rd_from = p1.page_id
| |
| LEFT JOIN page AS p2
| |
| ON rd_namespace = p2.page_namespace
| |
| AND rd_title = p2.page_title
| |
| WHERE rd_namespace >= 0
| |
| AND p2.page_namespace IS NULL
| |
| ORDER BY p1.page_namespace ASC;
| |
| </source>
| |
| '''Explanation''': This pulls all broken redirects. {{done}}/modified (Tool Labs)
| |
|
| |
|
| === Cross-namespace redirects ===
| | {{Note|The Wiki Replica databases are split out between sets of separate mysql instances. To make sure you are on the same server instance, you should connect to the same database URL with both connections when trying <code>SHOW EXPLAIN FOR</code>. Successfully using <code>SHOW EXPLAIN FOR</code> requires both sessions to be connected to the same instance.}} |
| <source lang="sql"> | |
| SELECT
| |
| pt.page_namespace,
| |
| pf.page_title,
| |
| rd_title
| |
| FROM redirect, page AS pf, page AS pt
| |
| WHERE pf.page_namespace = 0
| |
| AND rd_title = pt.page_title
| |
| AND rd_namespace = pt.page_namespace
| |
| AND pt.page_namespace != 0
| |
| AND rd_from = pf.page_id
| |
| AND pf.page_namespace = 0;
| |
| </source> | |
| '''Explanation''': This pulls redirects from (Main) to any other namespace. {{done}} /modified (Tool Labs)
| |
|
| |
|
| === Deleted red-linked categories === | | == meta_p database == |
| <source lang="sql">
| | The "meta_p" database contains metadata about the various wikis and databases. It consists of one table: wiki and it is available on every slice (s1–s8.analytics.db.svc.wikimedia.cloud). |
| SELECT
| |
| cattmp.cl_to,
| |
| cattmp.cl_count,
| |
| user_name,
| |
| log_timestamp,
| |
| log_comment
| |
| FROM logging
| |
| JOIN `user` ON log_user = user_id
| |
| JOIN
| |
| (SELECT
| |
| cl_to,
| |
| COUNT(cl_to) AS cl_count
| |
| FROM categorylinks
| |
| LEFT JOIN page ON cl_to = page_title
| |
| AND page_namespace = 14
| |
| WHERE page_title IS NULL
| |
| GROUP BY cl_to) AS cattmp
| |
| ON log_title = cattmp.cl_to
| |
| WHERE log_namespace = 14
| |
| AND log_type = "delete"
| |
| AND log_timestamp = (SELECT
| |
| MAX(log_timestamp)
| |
| FROM logging AS last
| |
| WHERE log_namespace = 14
| |
| AND cattmp.cl_to = last.log_title);
| |
| </source>
| |
| '''Explanation''': This pulls non-existent used categories that have previously been deleted. {{done}} (Tool Labs)
| |
|
| |
|
| === Empty categories ===
| | [//tools.wmflabs.org/tools-info/?listmetap List meta_p.wiki] |
| <source lang="sql">
| |
| SELECT
| |
| page_title,
| |
| page_len
| |
| FROM categorylinks
| |
| RIGHT JOIN page ON cl_to = page_title
| |
| WHERE page_namespace = 14
| |
| AND page_is_redirect = 0
| |
| AND cl_to IS NULL
| |
| AND NOT EXISTS (SELECT
| |
| 1
| |
| FROM categorylinks
| |
| WHERE cl_from = page_id
| |
| AND cl_to = 'Wikipedia_category_redirects')
| |
| AND NOT EXISTS (SELECT
| |
| 1
| |
| FROM categorylinks
| |
| WHERE cl_from = page_id
| |
| AND cl_to = 'Disambiguation_categories')
| |
| AND NOT EXISTS (SELECT
| |
| 1
| |
| FROM templatelinks
| |
| WHERE tl_from = page_id
| |
| AND tl_namespace = 10
| |
| AND tl_title = 'Empty_category');
| |
| </source>
| |
| '''Explanation''': This pulls empty categories that aren't in specific categories and don't transclude a specific template. {{done}} Tools Labs
| |
|
| |
|
| === Fully-protected articles with excessively long expiries === | | == Example queries == |
| <source lang="sql">
| | * [[Help:MySQL queries/Example queries]] contains a large collection of examples which were once in this page directly. |
| SELECT
| | * [[:en:Wikipedia:Database reports]] (and its equivalents in other languages) contain many useful examples. |
| page_is_redirect,
| | * [https://quarry.wmflabs.org/query/runs/all?published=true Published queries on Quarry] are another good source of examples (try using [https://www.google.com/search?q=site%3Aquarry.wmflabs.org Google site search] for specific topics) |
| page_title,
| |
| user_name,
| |
| logs.log_timestamp,
| |
| pr_expiry,
| |
| logs.log_comment
| |
| FROM page
| |
| JOIN page_restrictions ON page_id = pr_page
| |
| AND page_namespace = 0
| |
| AND pr_type = 'edit'
| |
| AND pr_level = 'sysop'
| |
| AND pr_expiry > DATE_FORMAT(DATE_ADD(NOW(),INTERVAL 1 MONTH),'%Y%m%d%H%i%s')
| |
| AND pr_expiry != 'infinity'
| |
| LEFT JOIN logging AS logs ON logs.log_title = page_title
| |
| AND logs.log_namespace = 0
| |
| AND logs.log_type = 'protect'
| |
| LEFT JOIN `user` ON logs.log_user = user_id
| |
| WHERE CASE WHEN (NOT ISNULL(log_timestamp))
| |
| THEN log_timestamp = (SELECT MAX(last.log_timestamp)
| |
| FROM logging AS last
| |
| WHERE log_title = page_title
| |
| AND log_namespace = 0
| |
| AND log_type = 'protect')
| |
| ELSE 1 END;
| |
| </source>
| |
| '''Explanation''': Articles that are fully-protected from editing for more than one month. {{done}} /modified (Tool Labs)
| |
| | |
| === Excessively long IP blocks ===
| |
| <source lang="sql">
| |
| SELECT
| |
| ipb_address,
| |
| ipb_by_text,
| |
| ipb_timestamp,
| |
| ipb_expiry,
| |
| ipb_reason
| |
| FROM ipblocks
| |
| WHERE ipb_expiry > DATE_FORMAT(DATE_ADD(NOW(),INTERVAL 2 YEAR),'%Y%m%d%H%i%s')
| |
| AND ipb_expiry != "infinity"
| |
| AND ipb_user = 0;
| |
| </source>
| |
| '''Explanation''': Blocks of anonymous users that are longer than two years (but not indefinite). {{done}} (Tool Labs)
| |
| | |
| === Indefinitely fully-protected articles ===
| |
| <source lang="sql">
| |
| SELECT
| |
| page_is_redirect,
| |
| page_title,
| |
| user_name,
| |
| logs.log_timestamp,
| |
| logs.log_comment
| |
| FROM page
| |
| JOIN page_restrictions ON page_id = pr_page
| |
| AND page_namespace = 0
| |
| AND pr_type = 'edit'
| |
| AND pr_level = 'sysop'
| |
| AND pr_expiry = 'infinity'
| |
| LEFT JOIN logging AS logs ON logs.log_title = page_title
| |
| AND logs.log_namespace = 0
| |
| AND logs.log_type = 'protect'
| |
| LEFT JOIN `user` ON logs.log_user = user_id
| |
| WHERE CASE WHEN (NOT ISNULL(log_timestamp))
| |
| THEN log_timestamp = (SELECT MAX(last.log_timestamp)
| |
| FROM logging AS last
| |
| WHERE log_title = page_title
| |
| AND log_namespace = 0
| |
| AND log_type = 'protect')
| |
| ELSE 1 END;
| |
| </source>
| |
| '''Explanation''': Articles indefinitely fully-protected from editing. {{done}} (Tool Labs)
| |
| | |
| === Long pages ===
| |
| <source lang="sql">
| |
| SELECT
| |
| page_namespace,
| |
| page_title,
| |
| page_len
| |
| FROM page
| |
| WHERE page_len > 175000
| |
| AND page_title NOT LIKE "%/%"
| |
| ORDER BY page_namespace ASC;
| |
| </source>
| |
| '''Explanation''': Pages over 175,000 bytes in length; excludes titles with "/" in them (to avoid archives, etc.). {{done}} /modified (Tool Labs)
| |
| | |
| === Redirects obscuring page content ===
| |
| <source lang="sql">
| |
| SELECT
| |
| page_namespace,
| |
| page_title,
| |
| page_len
| |
| FROM page
| |
| WHERE page_is_redirect = 1
| |
| HAVING page_len > 449
| |
| ORDER BY page_namespace ASC;
| |
| </source>
| |
| '''Explanation''': Redirects with large page lengths. This usually indicates there is text below the redirect that should not be there. {{done}} /modified (Tool Labs)
| |
| | |
| === Mistagged non-free content ===
| |
| <source lang="sql">
| |
| SELECT
| |
| DISTINCT(enwiki_p.page.page_title),
| |
| commonswiki_p.image.img_name
| |
| FROM enwiki_p.image, commonswiki_p.image, enwiki_p.categorylinks, enwiki_p.page
| |
| WHERE enwiki_p.image.img_sha1 = commonswiki_p.image.img_sha1
| |
| AND enwiki_p.page.page_title = enwiki_p.image.img_name
| |
| AND enwiki_p.categorylinks.cl_from = enwiki_p.page.page_id
| |
| AND enwiki_p.categorylinks.cl_to = 'All_non-free_media'
| |
| AND enwiki_p.image.img_sha1 != 'phoiac9h4m842xq45sp7s6u21eteeq1';
| |
| </source>
| |
| '''Explanation''': This pulls files on a local wiki that are in non-free category, but also exist at Commons. This indicates that either the local image or the Commons image should be deleted. It excludes the SHA1 empty string due to bad database rows.
| |
| | |
| === Pages with the most revisions ===
| |
| <source lang="sql">
| |
| SELECT
| |
| page_namespace,
| |
| ns_name,
| |
| page_title,
| |
| COUNT(*)
| |
| FROM revision
| |
| JOIN page ON page_id = rev_page
| |
| JOIN toolserver.namespace ON page_namespace = ns_id
| |
| AND dbname = 'enwiki_p'
| |
| GROUP BY page_namespace, page_title
| |
| ORDER BY COUNT(*) DESC
| |
| LIMIT 1000;
| |
| </source>
| |
| '''Explanation''': This pulls the pages with the most revisions. On large wikis, it can take several hours (or days) to run.
| |
| | |
| === Page counts by namespace ===
| |
| <source lang="sql">
| |
| SELECT
| |
| page_namespace,
| |
| MAX(notredir),
| |
| MAX(redir)
| |
| FROM (
| |
| SELECT page.page_namespace,
| |
| IF( page_is_redirect, COUNT(page.page_namespace), 0 ) AS redir,
| |
| IF( page_is_redirect, 0, COUNT(page.page_namespace)) AS notredir
| |
| FROM page
| |
| GROUP BY page_is_redirect, page_namespace
| |
| ORDER BY page_namespace, page_is_redirect
| |
| ) AS pagetmp
| |
| GROUP BY page_namespace;
| |
| </source>
| |
| '''Explanation''': This pulls the number of redirects and non-redirects in each namespace. {{done}} /modified (Tool Labs)
| |
| | |
| === Orphaned talk pages ===
| |
| <source lang="sql">
| |
| SELECT
| |
| p1.page_namespace,
| |
| p1.page_title
| |
| FROM page AS p1
| |
| WHERE p1.page_title NOT LIKE "%/%"
| |
| AND p1.page_namespace NOT IN (0,2,3,4,6,8,9,10,12,14,16,18,100,102,104)
| |
| AND CASE WHEN p1.page_namespace = 1
| |
| THEN NOT EXISTS (SELECT
| |
| 1
| |
| FROM page AS p2
| |
| WHERE p2.page_namespace = 0
| |
| AND p1.page_title = p2.page_title)
| |
| ELSE 1 END
| |
| AND CASE WHEN p1.page_namespace = 5
| |
| THEN NOT EXISTS (SELECT
| |
| 1
| |
| FROM page AS p2
| |
| WHERE p2.page_namespace = 4
| |
| AND p1.page_title = p2.page_title)
| |
| ELSE 1 END
| |
| AND CASE WHEN p1.page_namespace = 7
| |
| THEN NOT EXISTS (SELECT
| |
| 1
| |
| FROM page AS p2
| |
| WHERE p2.page_namespace = 6
| |
| AND p1.page_title = p2.page_title)
| |
| AND NOT EXISTS (SELECT
| |
| 1
| |
| FROM commonswiki_f_p.page AS p2
| |
| WHERE p2.page_namespace = 6
| |
| AND p1.page_title = p2.page_title)
| |
| ELSE 1 END
| |
| AND CASE WHEN p1.page_namespace = 11
| |
| THEN NOT EXISTS (SELECT
| |
| 1
| |
| FROM page AS p2
| |
| WHERE p2.page_namespace = 10
| |
| AND p1.page_title = p2.page_title)
| |
| ELSE 1 END
| |
| AND CASE WHEN p1.page_namespace = 13
| |
| THEN NOT EXISTS (SELECT
| |
| 1
| |
| FROM page AS p2
| |
| WHERE p2.page_namespace = 12
| |
| AND p1.page_title = p2.page_title)
| |
| ELSE 1 END
| |
| AND CASE WHEN p1.page_namespace = 15
| |
| THEN NOT EXISTS (SELECT
| |
| 1
| |
| FROM page AS p2
| |
| WHERE p2.page_namespace = 14
| |
| AND p1.page_title = p2.page_title)
| |
| ELSE 1 END
| |
| AND CASE WHEN p1.page_namespace = 17
| |
| THEN NOT EXISTS (SELECT
| |
| 1
| |
| FROM page AS p2
| |
| WHERE p2.page_namespace = 16
| |
| AND p1.page_title = p2.page_title)
| |
| ELSE 1 END
| |
| AND CASE WHEN p1.page_namespace = 101
| |
| THEN NOT EXISTS (SELECT
| |
| 1
| |
| FROM page AS p2
| |
| WHERE p2.page_namespace = 100
| |
| AND p1.page_title = p2.page_title)
| |
| ELSE 1 END
| |
| AND p1.page_id NOT IN (SELECT
| |
| page_id
| |
| FROM page
| |
| JOIN templatelinks
| |
| ON page_id = tl_from
| |
| WHERE tl_title="G8-exempt"
| |
| AND tl_namespace = 10)
| |
| AND p1.page_id NOT IN (SELECT
| |
| page_id
| |
| FROM page
| |
| JOIN templatelinks
| |
| ON page_id = tl_from
| |
| WHERE tl_title="Go_away"
| |
| AND tl_namespace = 10)
| |
| AND p1.page_id NOT IN (SELECT
| |
| page_id
| |
| FROM page
| |
| JOIN templatelinks
| |
| ON page_id = tl_from
| |
| WHERE tl_title="Rtd"
| |
| AND tl_namespace = 10);
| |
| </source>
| |
| '''Explanation''': This (very, very hackishly) pulls all pages in the talk namespaces that don't have a corresponding subject-space page. It JOINs against Commons to ensure that File_talk: pages are truly orphaned. It also has some en.wiki-specific template checks in it. {{done}} /modified (Tool Labs)
| |
| | |
| === Ownerless pages in the user space ===
| |
| <source lang="sql">
| |
| SELECT
| |
| page_namespace,
| |
| page_title,
| |
| page_len
| |
| FROM page
| |
| LEFT JOIN `user`
| |
| ON user_name = REPLACE(page_title, '_', ' ')
| |
| WHERE page_namespace IN (2,3)
| |
| AND page_is_redirect = 0
| |
| AND page_title NOT LIKE "%/%"
| |
| AND page_title NOT RLIKE "(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)"
| |
| AND user_name IS NULL;
| |
| </source>
| |
| '''Explanation''': This pulls all User: and User_talk: pages not belonging to a registered user. Pages belonging to an anonymous user are excluded. {{done}} /modified (Tool Labs)
| |
| :This doesn't exclude IPv6 addresses.
| |
| | |
| === Polluted categories ===
| |
| <source lang="sql">
| |
| SELECT DISTINCT
| |
| cl_to
| |
| FROM categorylinks AS cat
| |
| JOIN page AS pg1
| |
| ON cat.cl_from = pg1.page_id
| |
| WHERE page_namespace = 2
| |
| AND EXISTS (SELECT
| |
| 1
| |
| FROM page AS pg2
| |
| JOIN categorylinks AS cl
| |
| ON pg2.page_id = cl.cl_from
| |
| WHERE pg2.page_namespace = 0
| |
| AND cat.cl_to = cl.cl_to)
| |
| AND cl_to NOT IN (SELECT
| |
| page_title
| |
| FROM page
| |
| JOIN templatelinks
| |
| ON tl_from = page_id
| |
| WHERE page_namespace = 14
| |
| AND tl_namespace = 10
| |
| AND tl_title = 'Pollutedcat')
| |
| LIMIT 250;
| |
| </source>
| |
| '''Explanation''': This pulls categories that contain pages in the (Main) namespace and the User: namespace. Generally categories hold one or the other. {{done}} (Tool Labs)
| |
| | |
| === Categories categorized in red-linked categories ===
| |
| <source lang="sql">
| |
| SELECT
| |
| page_title,
| |
| cl_to
| |
| FROM page
| |
| JOIN
| |
| (SELECT
| |
| cl_to,
| |
| cl_from
| |
| FROM categorylinks
| |
| LEFT JOIN page ON cl_to = page_title
| |
| AND page_namespace = 14
| |
| WHERE page_title IS NULL) AS cattmp
| |
| ON cattmp.cl_from = page_id
| |
| WHERE page_namespace = 14;
| |
| </source>
| |
| '''Explanation''': This pulls categories categorized in red-linked categories. {{done}} (Tool Labs)
| |
| | |
| === Articles containing red-linked files ===
| |
| <source lang="sql">
| |
| SELECT
| |
| page_title,
| |
| il_to
| |
| FROM page
| |
| JOIN imagelinks
| |
| ON page_id = il_from
| |
| WHERE (NOT EXISTS(
| |
| SELECT
| |
| 1
| |
| FROM image
| |
| WHERE img_name = il_to))
| |
| AND (NOT EXISTS(
| |
| SELECT
| |
| 1
| |
| FROM commonswiki_f_p.page
| |
| WHERE page_title = il_to
| |
| AND page_namespace = 6))
| |
| AND page_namespace = 0;
| |
| </source>
| |
| '''Explanation''': This pulls articles that contain red-linked images. It checks both Commons and the local wiki. {{done}} /modified (Tool Labs)
| |
| | |
| === Self-categorized categories ===
| |
| <source lang="sql">
| |
| SELECT
| |
| page_title,
| |
| cat_pages,
| |
| cat_subcats
| |
| FROM page
| |
| JOIN categorylinks ON cl_to = page_title
| |
| RIGHT JOIN category
| |
| ON cat_title = page_title
| |
| WHERE page_id = cl_from
| |
| AND page_namespace = 14;
| |
| </source>
| |
| '''Explanation''': This pulls self-categorized categories. {{done}} (Tool Labs)
| |
| | |
| === Uncategorized categories ===
| |
| <source lang="sql">
| |
| SELECT
| |
| page_title,
| |
| page_len,
| |
| cat_pages,
| |
| rev_timestamp,
| |
| rev_user_text
| |
| FROM revision
| |
| JOIN
| |
| (SELECT
| |
| page_id,
| |
| page_title,
| |
| page_len,
| |
| cat_pages
| |
| FROM category
| |
| RIGHT JOIN page ON cat_title = page_title
| |
| LEFT JOIN categorylinks ON page_id = cl_from
| |
| WHERE cl_from IS NULL
| |
| AND page_namespace = 14
| |
| AND page_is_redirect = 0) AS pagetmp
| |
| ON rev_page = pagetmp.page_id
| |
| AND rev_timestamp = (SELECT MAX(rev_timestamp)
| |
| FROM revision AS last
| |
| WHERE last.rev_page = pagetmp.page_id);
| |
| </source>
| |
| '''Explanation''': This pulls uncategorized categories.
| |
| | |
| === Pages in a specific category using a specific template ===
| |
| <source lang="sql">
| |
| SELECT
| |
| page_title
| |
| FROM page
| |
| JOIN templatelinks
| |
| ON tl_from = page_id
| |
| JOIN categorylinks
| |
| ON cl_from = page_id
| |
| WHERE cl_to = "Living_people"
| |
| AND tl_namespace = 10
| |
| AND tl_title = "Fact"
| |
| AND page_namespace = 0
| |
| LIMIT 500;
| |
| </source>
| |
| '''Explanation''': This pulls pages that are in a specific category and are using a specific template.
| |
| | |
| === Top edit timestamp for a category of users ===
| |
| <source lang="sql">
| |
| SELECT
| |
| rev_user_text,
| |
| rev_timestamp
| |
| FROM revision
| |
| JOIN (SELECT
| |
| page_title
| |
| FROM page
| |
| JOIN categorylinks
| |
| ON cl_from = page_id
| |
| WHERE cl_to = 'Category_name_goes_here'
| |
| AND page_namespace = 2
| |
| AND page_title NOT LIKE '%/%') AS cltmp
| |
| ON REPLACE(cltmp.page_title, '_', ' ') = rev_user_text
| |
| WHERE rev_timestamp = (SELECT
| |
| MAX(rev_timestamp)
| |
| FROM revision
| |
| WHERE rev_user_text = REPLACE(cltmp.page_title, '_', ' '));
| |
| </source>
| |
| '''Explanation''': This will take the user pages that do not contain a forward slash ("/") in "Category_name_goes_here" and get the top edit timestamp for each user.
| |
| | |
| === Top pages by in a specific namespace for a specific user ===
| |
| <source lang="sql">
| |
| SELECT
| |
| `page_title`,
| |
| COUNT(*)
| |
| FROM `revision`
| |
| JOIN `page`
| |
| ON `page_id` = `rev_page`
| |
| JOIN `user`
| |
| ON `user_id` = `rev_user`
| |
| WHERE `user_name` = 'User name'
| |
| AND `page_namespace` = 4
| |
| GROUP BY `page_title`
| |
| ORDER BY COUNT(*) DESC
| |
| LIMIT 25;
| |
| </source>
| |
| '''Explanation''': This will pull the page titles of a the most-edited pages by a specific user in a specific namespace.
| |
| | |
| === Number of deletions per day for a specific user ===
| |
| <source lang="sql">
| |
| SELECT
| |
| DATE( CONCAT( YEAR( log_timestamp), "-", MONTH( log_timestamp ), "-", DAY( log_timestamp ) ) ) AS day,
| |
| COUNT(log_timestamp) AS deletions
| |
| FROM logging
| |
| JOIN `user`
| |
| ON log_user = user_id
| |
| WHERE user_name = 'User name'
| |
| AND log_type = 'delete'
| |
| AND log_action = 'delete'
| |
| GROUP BY day;
| |
| </source>
| |
| '''Explanation''': This will group the number of deletions per day by a specific user.
| |
| | |
| === Number of deletions per day ===
| |
| <source lang="sql">
| |
| SELECT
| |
| DATE(CONCAT(YEAR(log_timestamp),"-",MONTH(log_timestamp),"-",DAY(log_timestamp))) AS day,
| |
| COUNT(log_id) AS deletions
| |
| FROM logging
| |
| WHERE log_type = 'delete'
| |
| AND log_action = 'delete'
| |
| GROUP BY day;
| |
| </source>
| |
| '''Explanation''': This will pull the number of deletions per day on a given wiki. {{done}} (Tool Labs)
| |
| | |
| === Most common deletion summaries ===
| |
| <source lang="sql">
| |
| SELECT
| |
| log_comment,
| |
| COUNT(log_id)
| |
| FROM logging
| |
| WHERE log_type = 'delete'
| |
| AND log_action = 'delete'
| |
| GROUP BY log_comment
| |
| ORDER BY COUNT(log_id) DESC;
| |
| </source>
| |
| '''Explanation''': This will pull the most common deletion summaries on a given wiki.
| |
| | |
| === Most common deletion summaries for a specific user ===
| |
| <source lang="sql">
| |
| SELECT
| |
| log_comment AS reason,
| |
| COUNT(*) AS uses
| |
| FROM logging_userindex
| |
| JOIN `user`
| |
| ON log_user = user_id
| |
| WHERE user_name = 'User name'
| |
| AND log_type = 'delete'
| |
| AND log_action = 'delete'
| |
| GROUP BY reason
| |
| ORDER BY uses DESC
| |
| LIMIT 25;
| |
| </source>
| |
| '''Explanation''': This will pull the most commonly used deletion summaries for a specific user. {{done}} /modified (Tool Labs)
| |
| | |
| === Most common edit summaries for a specific user ===
| |
| <source lang="sql">
| |
| SELECT
| |
| rev_comment,
| |
| COUNT(*)
| |
| FROM revision_userindex
| |
| WHERE rev_user_text = 'User name'
| |
| GROUP BY rev_comment
| |
| ORDER BY COUNT(*) DESC
| |
| LIMIT 25;
| |
| </source>
| |
| '''Explanation''': This will pull the most commonly used edit summaries for a specific user. {{done}} /modified (Tool Labs)
| |
| | |
| === Number of revisions per day ===
| |
| <source lang="sql">
| |
| SELECT
| |
| DATE(CONCAT(YEAR(rev_timestamp),"-",MONTH(rev_timestamp),"-",DAY(rev_timestamp))) AS day,
| |
| COUNT(rev_timestamp) AS revisions
| |
| FROM revision
| |
| GROUP BY day;
| |
| </source>
| |
| '''Explanation''': This will pull the number of (non-deleted) revisions to a particular wiki and group the numbers by day. {{done}} (Tool Labs)
| |
| | |
| === Number of revisions per day by a specific user ===
| |
| <source lang="sql">
| |
| SELECT
| |
| DATE(CONCAT(YEAR(rev_timestamp),"-",MONTH(rev_timestamp),"-",DAY(rev_timestamp))) AS day,
| |
| COUNT(rev_timestamp) AS revisions
| |
| FROM revision_userindex
| |
| WHERE rev_user_text = 'User name'
| |
| GROUP BY day;
| |
| </source>
| |
| '''Explanation''': This will pull the number of (non-deleted) revisions by a specific user per day. {{done}} /modified (Tool Labs)
| |
| | |
| === File description pages without an associated file ===
| |
| <source lang="sql">
| |
| SELECT
| |
| page_title
| |
| FROM page
| |
| WHERE NOT EXISTS (SELECT
| |
| img_name
| |
| FROM image
| |
| WHERE img_name = page_title)
| |
| AND NOT EXISTS (SELECT
| |
| img_name
| |
| FROM commonswiki_f_p.image
| |
| WHERE img_name = page_title)
| |
| AND page_namespace = 6
| |
| AND page_is_redirect = 0
| |
| LIMIT 1000;
| |
| </source>
| |
| '''Explanation''': This will pull file description pages that do not have an associated file, either locally or in the Commons repo. {{done}} /modified (Tool Labs)
| |
| | |
| === Files without an associated file description page ===
| |
| <source lang="sql">
| |
| SELECT
| |
| img_name
| |
| FROM image
| |
| WHERE NOT EXISTS (SELECT
| |
| page_title
| |
| FROM page
| |
| WHERE img_name = page_title
| |
| AND page_namespace = 6)
| |
| AND NOT EXISTS (SELECT
| |
| page_title
| |
| FROM commonswiki_p.page
| |
| WHERE img_name = page_title
| |
| AND page_namespace = 6);
| |
| </source>
| |
| '''Explanation''': This will pull files that have no associated file description page, either locally or in the Commons repo.
| |
| | |
| === File description pages containing no templates ===
| |
| <source lang="sql">
| |
| SELECT
| |
| ns_name,
| |
| page_title,
| |
| page_len
| |
| FROM page
| |
| JOIN toolserver.namespace
| |
| ON dbname = 'enwiki_p'
| |
| AND ns_id = page_namespace
| |
| LEFT JOIN templatelinks
| |
| ON tl_from = page_id
| |
| WHERE NOT EXISTS (SELECT
| |
| img_name
| |
| FROM commonswiki_p.image
| |
| WHERE img_name = page_title)
| |
| AND page_namespace = 6
| |
| AND page_is_redirect = 0
| |
| AND tl_from IS NULL
| |
| LIMIT 800;
| |
| </source>
| |
| '''Explanation''': This will pull file description pages containing no templates that do not have an associated file description page in the Commons repo.
| |
| | |
| === File description pages containing no templates or categories ===
| |
| <source lang="sql">
| |
| SELECT
| |
| ns_name,
| |
| page_title,
| |
| page_len
| |
| FROM page
| |
| JOIN toolserver.namespace
| |
| ON dbname = 'enwiki_p'
| |
| AND ns_id = page_namespace
| |
| LEFT JOIN templatelinks
| |
| ON tl_from = page_id
| |
| LEFT JOIN categorylinks
| |
| ON cl_from = page_id
| |
| WHERE NOT EXISTS (SELECT
| |
| img_name
| |
| FROM commonswiki_p.image
| |
| WHERE img_name = page_title)
| |
| AND page_namespace = 6
| |
| AND page_is_redirect = 0
| |
| AND tl_from IS NULL
| |
| AND cl_from IS NULL
| |
| LIMIT 800;
| |
| </source>
| |
| '''Explanation''': This will pull file description pages containing no templates or categories that do not have an associated file description page in the Commons repo.
| |
| | |
| === Links on a particular page ===
| |
| <source lang="sql">
| |
| SELECT
| |
| pl_namespace,
| |
| pl_title
| |
| FROM page
| |
| JOIN pagelinks
| |
| ON pl_from = page_id
| |
| WHERE page_namespace = 0
| |
| AND page_title = 'Don\'t_poke_the_bear';
| |
| </source>
| |
| '''Explanation''': This will pull the names of the links ''on'' a particular page. For example, the text of the "Don't poke the bear" page contains the link "[<nowiki/>[bear]]" so the output of this query will list "Bear" as a result.
| |
| | |
| === Links to a particular page ===
| |
| <source lang="sql">
| |
| SELECT
| |
| page_namespace,
| |
| page_title
| |
| FROM page
| |
| JOIN pagelinks
| |
| ON pl_from = page_id
| |
| WHERE pl_namespace = 0
| |
| AND pl_title = 'Don\'t_poke_the_bear';
| |
| </source>
| |
| '''Explanation''': This will pull the names of the links ''to'' a particular page. This is the equivalent of [[Special:WhatLinksHere]]. For example, the page "What not to do" may contain the link "[<nowiki/>[Don't poke the bear]]"; this query would output "What not to do" as a result.
| |
| | |
| === Pages containing 0 page links ===
| |
| <source lang="sql">
| |
| SELECT
| |
| page_namespace,
| |
| page_title
| |
| FROM page
| |
| LEFT JOIN pagelinks
| |
| ON pl_from = page_id
| |
| WHERE pl_namespace IS NULL
| |
| LIMIT 1000;
| |
| </source>
| |
| '''Explanation''': This will pull pages that contain 0 page links. This will not account for things like image links, template links, category links, or external links. {{done}} (Tool Labs)
| |
| | |
| === Pages with 0 links to them ===
| |
| <source lang="sql">
| |
| SELECT
| |
| page_namespace,
| |
| page_title
| |
| FROM page
| |
| LEFT JOIN pagelinks
| |
| ON pl_title = page_title
| |
| AND pl_namespace = page_namespace
| |
| WHERE pl_namespace IS NULL
| |
| LIMIT 1;
| |
| </source>
| |
| '''Explanation''': This will pull pages that have 0 links to them ([[Special:WhatLinksHere]] would be empty!).
| |
| | |
| === Short pages with a single author (excluding user pages and redirects) ===
| |
| <source lang="sql">
| |
| SELECT
| |
| CONCAT(ns_name, ':', page_title),
| |
| page_len
| |
| FROM page
| |
| JOIN toolserver.namespace
| |
| ON page_namespace = ns_id
| |
| AND dbname = 'enwiki_p'
| |
| LEFT JOIN templatelinks
| |
| ON tl_from = page_id
| |
| WHERE page_len < 50
| |
| AND page_is_redirect = 0
| |
| AND page_namespace NOT IN (2, 3)
| |
| AND tl_from IS NULL
| |
| AND (SELECT
| |
| COUNT(DISTINCT rev_user_text)
| |
| FROM revision
| |
| WHERE rev_page = page_id) = 1
| |
| ORDER BY page_len ASC
| |
| </source>
| |
| '''Explanation''': This will list short pages with a single author (excluding user pages and redirects).
| |
| | |
| === Unused templates ===
| |
| <source lang="sql">
| |
| SELECT
| |
| CONCAT('Template:', page_title)
| |
| FROM page
| |
| LEFT JOIN templatelinks
| |
| ON page_namespace = tl_namespace
| |
| AND page_title = tl_title
| |
| WHERE page_namespace = 10
| |
| AND tl_from IS NULL
| |
| </source>
| |
| '''Explanation''': This will pull unused templates.
| |
| | |
| === Broken image links ===
| |
| <source lang="sql">
| |
| SELECT
| |
| CONCAT("* [[", if (page_namespace = 0, page_title, concat(":", ns_name, ":", page_title)), "]]",
| |
| " - [[:Bilde:", il_to, "]]") as links
| |
| FROM imagelinks
| |
| JOIN page
| |
| ON page_id = il_from
| |
| JOIN toolserver.namespace
| |
| ON ns_id = page_namespace
| |
| AND dbname = "nowiki_p"
| |
| LEFT JOIN image as I
| |
| ON I.img_name = il_to
| |
| LEFT JOIN commonswiki_p.image as J
| |
| ON J.img_name = il_to
| |
| WHERE I.img_name IS NULL
| |
| AND J.img_name IS NULL /* SLOW_OK */;
| |
| </source>
| |
| '''Explanation''': List references to images that are not present locally nor on Commons (nowiki).
| |
| | |
| === Revision counting for specific user till specific time ===
| |
| <source lang="sql">
| |
| SELECT
| |
| COUNT(*)
| |
| FROM revision
| |
| WHERE rev_user_text = 'User name'
| |
| AND rev_timestamp < '20120327000000';
| |
| </source>
| |
| '''Explanation''': Return the number of revisions by a specific user up to a certain time.
| |
| | |
| === Log count in recent time by users not in group ===
| |
| | |
| A query like this outputs the users with:
| |
| * the most log actions of the kind specified (in the example, patrol/patrol minus autopatrolled edits),
| |
| * in the interval specified (in the example, last 24 months),
| |
| * excluding users in some user groups (in the example, sysop and rollbacker).
| |
| | |
| <source lang="sql">
| |
| /* SLOW_OK adapted from Krinkle, bug 25799 */
| |
| | |
| SELECT
| |
| count(*) AS counter,
| |
| user_name,
| |
| user_editcount,
| |
| GROUP_CONCAT(DISTINCT ug_group SEPARATOR ", ")
| |
| | |
| FROM logging log
| |
| | |
| JOIN user us
| |
| ON log.log_user = us.user_id
| |
| | |
| JOIN user_groups ug
| |
| ON log.log_user = ug.ug_user
| |
| | |
| WHERE ug.ug_user NOT IN
| |
| (SELECT DISTINCT user_groups.ug_user
| |
| FROM user_groups
| |
| WHERE ug_group = 'sysop'
| |
| OR ug_group = 'rollbacker'
| |
| )
| |
| AND log.log_type = 'patrol'
| |
| AND log.log_action = 'patrol'
| |
| AND log.log_timestamp > ( NOW() - INTERVAL 24 MONTH )
| |
| AND ( log.log_params LIKE '%"6::auto";i:0%'
| |
| OR log.log_params LIKE '%\n0' /* not autopatrolled */ )
| |
| GROUP BY log.log_user
| |
| | |
| ORDER BY counter DESC
| |
| LIMIT 50;
| |
| </source>
| |
| | |
| In wikis where "patrol" right is given to all users, the example query can be used to find active patrollers who would be likely candidates for additional rights (like autropatrolled, rollbacker, sysop etc. depending on the wiki's configuration).
| |
| | |
| === Redirects with more than one revision ===
| |
| <source lang="sql">
| |
| SELECT
| |
| page_title,
| |
| rd_namespace,
| |
| rd_title,
| |
| COUNT(*) as edits
| |
| FROM revision
| |
| INNER JOIN page
| |
| ON rev_page = page_id
| |
| INNER JOIN redirect
| |
| ON rev_page = rd_from
| |
| WHERE page_namespace = 0
| |
| AND page_is_redirect = 1
| |
| GROUP BY page_title
| |
| HAVING COUNT(*) > 1;
| |
| </source>
| |
| '''Explanation''': This query lists redirect pages with more than one revision.
| |
| | |
| === Short non-disambiguation pages ===
| |
| <source lang="sql">
| |
| SELECT
| |
| page_title
| |
| FROM page
| |
| WHERE page_namespace = 0
| |
| AND page_is_redirect = 0
| |
| AND page_id NOT IN (SELECT
| |
| tl_from
| |
| FROM templatelinks
| |
| WHERE tl_title = 'Disambiguation'
| |
| AND tl_namespace = 10)
| |
| ORDER BY page_len ASC
| |
| LIMIT 1000;
| |
| </source>
| |
| '''Explanation''': This query lists 1000 non-disambiguation short pages.
| |
| | |
| === List of external links ===
| |
| <source lang="sql">
| |
| SELECT
| |
| COUNT(*),
| |
| TRIM(LEADING 'www.' FROM SUBSTRING_INDEX(SUBSTRING_INDEX(el_to, '/', 3),'/',-1)) AS site
| |
| FROM externallinks, page
| |
| WHERE el_from = page_id
| |
| AND page_namespace = 0
| |
| GROUP BY 2
| |
| HAVING COUNT(*) > 10
| |
| ORDER BY 1;
| |
| </source>
| |
| '''Explanation''': This query compiles a list of external links (from the main namespace)
| |
| grouped by website.<!--
| |
| | |
| === Header ===
| |
| <source lang="sql">
| |
| | |
| </source>
| |
| '''Explanation''': -->
| |
| | |
| === List of external links from all wikis ===
| |
| List of external links to a given domain from all namespaces on all Wikimedia projects; requires [[iterating over wikis]], with a bash script in the example, syntax like <code>./extlinks.sh http://meta.wikimedia.org</code> (outputs DB name, page title, linked URL).
| |
| <source lang="bash">
| |
| #!/bin/bash
| |
| WIKIS=$(mysql -BN -h sql -e 'SELECT dbname FROM toolserver.wiki WHERE domain IS NOT NULL AND is_closed = 0;')
| |
|
| |
| for dbname in $WIKIS;
| |
| do
| |
| echo $dbname
| |
| echo "
| |
| /* SLOW_OK */
| |
| SELECT CONCAT(ns_name, ':', page_title), el_to
| |
| FROM externallinks
| |
| JOIN page ON el_from = page_id
| |
| JOIN toolserver.namespacename ON dbname = '$dbname' AND ns_is_favorite AND ns_id = page_namespace
| |
| WHERE el_to LIKE '$1%'
| |
| -- " | mysql -h ${dbname/_/-}.rrdb.toolserver.org -BcN $dbname
| |
| # mysql switches
| |
| # -c Prevent comment stripping, need to prevent the query killer
| |
| # -N No column name heading
| |
| # -B Bare formatted / -t Table formatted / -H HTML output / -X XML output
| |
| done;
| |
| </source>
| |
| | |
| === List of interwiki links from all wikis ===
| |
| As above; use lowercase interwikiki prefix, like <code>./iwlinks.sh meatball</code>.
| |
| <source lang="bash">
| |
| #!/bin/bash
| |
| WIKIS=$(mysql -BN -h sql -e 'SELECT dbname FROM toolserver.wiki WHERE domain IS NOT NULL AND is_closed = 0;')
| |
|
| |
| for dbname in $WIKIS;
| |
| do
| |
| echo $dbname
| |
| echo "
| |
| /* SLOW_OK */
| |
| SELECT CONCAT(ns_name, ':', page_title), CONCAT('$1:', iwl_title)
| |
| FROM iwlinks
| |
| JOIN page ON iwl_from = page_id
| |
| JOIN toolserver.namespacename ON dbname = '$dbname' AND ns_is_favorite AND ns_id = page_namespace
| |
| WHERE iwl_prefix = '$1'
| |
| -- " | mysql -h ${dbname/_/-}.rrdb.toolserver.org -BcN $dbname
| |
| # mysql switches
| |
| # -c Prevent comment stripping, need to prevent the query killer
| |
| # -N No column name heading
| |
| # -B Bare formatted / -t Table formatted / -H HTML output / -X XML output
| |
| done;
| |
| </source>
| |
| | |
| | |
| ===User Activity===
| |
| Activity of registered users, as edits per day. This query filters out flagged bots but might still miss [[en:List_of_Wikipedians_by_number_of_edits/Unflagged_bots|unflagged bot accounts]] though.
| |
| | |
| <source lang=SQL>
| |
| select
| |
| rev_user,
| |
| count(*) / ( unix_timestamp(max(rev_timestamp)) - unix_timestamp(min(rev_timestamp)) ) * 86400.0 as edits_per_day
| |
| from revision left join user_groups
| |
| on rev_user = user_id | |
| where (user_id is null or ug_group <> 'bot') and rev_user > 0
| |
| group by rev_user;
| |
| </source>
| |
| | |
| ===Article Size===
| |
| ''This query works on the WMF Slave db''
| |
| | |
| <source lang=SQL>
| |
| SELECT
| |
| rev_page AS page_id,
| |
| count(*) AS revisions,
| |
| count(distinct rev_user_text) AS editors,
| |
| sum(rev_len) AS size
| |
| FROM revision
| |
| GROUP BY rev_page;
| |
| </source>
| |
| | |
| ===Top 10 Articles in Size===
| |
| ''This query works on the WMF Slave db''
| |
| | |
| <source lang=SQL>
| |
| SELECT
| |
| rev_page,
| |
| count(distinct rev_user_text),
| |
| sum(rev_len)
| |
| page_title as title,
| |
| FROM revision
| |
| INNER JOIN page
| |
| page_id = rev_page
| |
| GROUP BY rev_page
| |
| ORDER BY sum(rev_len) DESC
| |
| LIMIT 10;
| |
| </source>
| |
| | |
| ===Top links, red and blue===
| |
| ''This query works on the WMF Slave db''
| |
| <source lang=SQL>
| |
| SELECT
| |
| pl_title,
| |
| count(*),
| |
| page_id
| |
| FROM enwiki.pagelinks
| |
| LEFT JOIN enwiki.page ON
| |
| pl_title = page_title
| |
| WHERE
| |
| pl_namespace = 0 AND
| |
| page_namespace = 0
| |
| GROUP BY 1
| |
| </source>
| |
| | |
| ===First messages to users, by tool/bot used (if any)===
| |
| <source lang=SQL>
| |
| | |
| SELECT
| |
| page_id, page_title, rev_comment,rev_user, rev_user_text,
| |
| cast(rev_timestamp as datetime) as rev_timestamp, cast(user_registration as datetime) as user_registration,
| |
| datediff(rev_timestamp,user_registration) as datediff_msg_reg, timediff(rev_timestamp,user_registration) as timediff_msg_reg,
| |
| DATE_FORMAT(user_registration,"%Y-%m-%d") as user_reg_ymd, DATE_FORMAT(rev_timestamp,"%Y-%m-%d") as first_msg_ymd, user_id,
| |
| (rev_comment LIKE '%WP:HG%' OR rev_comment LIKE '%WP:HUGGLE%') as huggle, (rev_comment LIKE '%WP:TW%') tw,
| |
| (rev_comment LIKE '%igloo|GLOO]]%') as gloo,
| |
| (rev_comment LIKE '%WP:STiki%') as stiki,
| |
| (rev_user_text LIKE 'ClueBot%') as cluebot,
| |
| (rev_user_text = 'XLinkBot') as xlinkbot,
| |
| (rev_user_text = 'CorenSearchBot') as corenbot,
| |
| (rev_user_text LIKE 'BetacommandBot%') as betacommandbot,
| |
| (rev_user_text LIKE '%Bot') as bot,
| |
| (rev_user_text LIKE page_title) as self
| |
| FROM enwiki.page
| |
| LEFT JOIN enwiki.revision ON page_id = rev_page
| |
| LEFT JOIN enwiki.user ON replace(page_title,"_"," ") = user_name
| |
| WHERE page_namespace = 3 AND LOCATE('/',page_title) = 0
| |
| GROUP BY page_id;
| |
| </source>
| |
| | |
| === Number of registrations per day ===
| |
| <source lang=SQL>
| |
| select DATE_FORMAT(log_timestamp,"%Y-%m-%d"), count(*)
| |
| from enwiki.logging
| |
| where log_action = 'create' AND log_type='newusers'
| |
| group by 1
| |
| order by log_timestamp;
| |
| </source>
| |
| | |
| === Automated tool and bot edits ===
| |
| <source lang = SQL>
| |
| UPDATE rev_table SET tool = 'rollback' WHERE rev_comment RLIKE "(Reverted ([0-9]+ )?edits by \[\[Special:Contributions/[^\|]+\|[^]]+\]\] \(\[\[User talk:[^\|]+\|talk\]\]\) to last version by .+)";
| |
| UPDATE rev_table SET tool = 'undo' WHERE rev_comment LIKE "Undid revision%";
| |
| UPDATE rev_table SET tool = 'huggle' WHERE rev_comment LIKE '%WP:HG%' OR rev_comment LIKE '%WP:HUGGLE%';
| |
| UPDATE rev_table SET tool = 'huggle' WHERE rev_comment RLIKE "(Message re\. \[\[[^]]+\]\])|(Level [0-9]+ warning re\. \[\[[^]]+\]\])";
| |
| UPDATE rev_table SET tool = 'twinkle' WHERE rev_comment LIKE '%WP:TW%';
| |
| UPDATE rev_table SET tool = 'friendly' WHERE rev_comment LIKE '%WP:FRIENDLY%' OR rev_comment LIKE '%WP:Friendly%';
| |
| UPDATE rev_table SET tool = 'vandalproof' WHERE rev_comment LIKE '%WP:VPRF%' OR rev_comment LIKE '%WP:VandalProof%' OR rev_comment LIKE '%VandalProof|VandalProof%' OR rev_comment LIKE '%WP:VP2%' OR rev_comment LIKE '%WP:VandalProof%';
| |
| UPDATE rev_table SET tool = 'stiki' WHERE rev_comment LIKE '%|STiki]]%';
| |
| UPDATE rev_table SET tool = 'npwatcher' WHERE rev_comment LIKE '%|NPWatcher%';
| |
| UPDATE rev_table SET tool = 'vandalsniper' WHERE rev_comment LIKE '%|VandalSniper%';
| |
| UPDATE rev_table SET tool = 'wikimonitor' WHERE rev_comment LIKE '%m:WikiMonitor%';
| |
| UPDATE rev_table SET tool = 'mwt' WHERE rev_comment LIKE '%MWT|MWT]]%';
| |
| UPDATE rev_table SET tool = 'awb' WHERE rev_comment LIKE '%AWB|AWB]]%' OR rev_comment LIKE '%AutoWikiBrowser%';
| |
| UPDATE rev_table SET tool = 'cluebot' WHERE lower(cast(rev_user_text as CHAR)) LOWER LIKE 'cluebot%';
| |
| UPDATE rev_table SET tool = 'antivandalbot' WHERE rev_user_text = 'AntiVandalBot';
| |
| UPDATE rev_table SET tool = 'orphanbot' WHERE rev_user_text = 'OrphanBot';
| |
| UPDATE rev_table SET tool = 'pseudobot' WHERE rev_user_text = 'PseudoBot';
| |
| UPDATE rev_table SET tool = 'voabot' WHERE rev_user_text LIKE 'VoABot%';
| |
| UPDATE rev_table SET tool = 'martinbot' WHERE rev_user_text = 'MartinBot';
| |
| UPDATE rev_table SET tool = 'stbot' WHERE rev_user_text LIKE 'STBot%';
| |
| UPDATE rev_table SET tool = 'sqlbot' WHERE rev_user_text = 'SQLbot';
| |
| UPDATE rev_table SET tool = 'tawkerbot' WHERE rev_user_text LIKE 'Tawkerbot%';
| |
| UPDATE rev_table SET tool = 'sinebot' WHERE rev_user_text = 'SineBot';
| |
| UPDATE rev_table SET tool = 'csdwarnbot' WHERE rev_user_text = 'CSDWarnBot';
| |
| UPDATE rev_table SET tool = 'antispambot' WHERE rev_user_text = 'AntiSpamBot';
| |
| </source>
| |
| | |
| === Determining Huggle and Twinkle messages types by revision comment ===
| |
| <source lang = SQL>
| |
| UPDATE rev_table SET tool = 'huggle' WHERE rev_comment LIKE '%WP:HG%' OR rev_comment LIKE '%WP:HUGGLE%';
| |
| UPDATE rev_table SET tool = 'twinkle' WHERE rev_comment LIKE '%WP:TW%';
| |
| | |
| UPDATE rev_table SET msg_type = 'warning', msg_type_sub='level1' WHERE rev_comment LIKE 'Message re.%' AND tool = 'huggle';
| |
| UPDATE rev_table SET msg_type = 'warning', msg_type_sub='level2' WHERE rev_comment LIKE 'Level 2 warning%' AND tool = 'huggle';
| |
| UPDATE rev_table SET msg_type = 'warning', msg_type_sub='level3' WHERE rev_comment LIKE 'Level 3 warning%' AND tool = 'huggle';
| |
| UPDATE rev_table SET msg_type = 'warning', msg_type_sub='level4' WHERE rev_comment LIKE 'Level 4 warning%' AND tool = 'huggle';
| |
|
| |
| UPDATE rev_table SET msg_type = 'warning', msg_type_sub='level1' WHERE rev_comment LIKE 'General note:%' AND tool = 'twinkle';
| |
| UPDATE rev_table SET msg_type = 'warning', msg_type_sub='level2' WHERE rev_comment LIKE 'Caution:%' AND tool = 'twinkle';
| |
| UPDATE rev_table SET msg_type = 'warning', msg_type_sub='level3' WHERE rev_comment LIKE 'Warning:%' AND tool = 'twinkle';
| |
| UPDATE rev_table SET msg_type = 'warning', msg_type_sub='level4' WHERE rev_comment LIKE 'Final warning:%' AND tool = 'twinkle';
| |
|
| |
| UPDATE rev_table SET msg_type = 'deletion', msg_type_sub = 'csd' WHERE rev_comment LIKE 'Notification: speedy deletion%' AND tool = 'twinkle';
| |
| UPDATE rev_table SET msg_type = 'deletion', msg_type_sub = 'tag' WHERE rev_comment LIKE 'Notification: tagging for deletion%' AND tool = 'twinkle';
| |
| UPDATE rev_table SET msg_type = 'deletion', msg_type_sub = 'prop' WHERE rev_comment LIKE 'Notification: proposed deletion%' AND tool = 'twinkle';
| |
| UPDATE rev_table SET msg_type = 'notification', msg_type_sub = 'listing' WHERE rev_comment LIKE 'Notification: listing at%' AND tool = 'twinkle';
| |
| UPDATE rev_table SET msg_type = 'talkback' WHERE rev_comment LIKE 'Talkback%' AND tool = 'twinkle';
| |
| UPDATE rev_table SET msg_type = 'notification', msg_type_sub = 'blocked' WHERE rev_comment LIKE 'You have been%' AND tool = 'twinkle';
| |
| UPDATE rev_table SET msg_type = 'notification', msg_type_sub = 'blocked' WHERE rev_comment LIKE 'Your IP address has been blocked%' AND tool = 'twinkle';
| |
| UPDATE rev_table SET msg_type = 'welcome' WHERE rev_comment LIKE 'Added welcome%' AND tool = 'twinkle';
| |
|
| |
| UPDATE rev_table SET msg_type = 'welcome' WHERE rev_comment LIKE 'Welcoming%' AND tool = 'huggle';
| |
| UPDATE rev_table SET msg_type = 'deletion', msg_type_sub = 'csd' WHERE rev_comment LIKE 'Notification: Speedy deletion%' AND tool = 'huggle';
| |
| UPDATE rev_table SET msg_type = 'deletion', msg_type_sub = 'prop' WHERE rev_comment LIKE 'Notification: Proposed deletion%' AND tool = 'huggle';
| |
| UPDATE rev_table SET msg_type = 'notification', msg_type_sub = 'blocked' WHERE rev_comment LIKE 'Notification: Blocked%' AND tool = 'huggle';
| |
| | |
| </source>
| |
| | |
| === Deletion count by day (Mainspace) ===
| |
| <source lang = SQL>
| |
| SELECT
| |
| DATE( log_timestamp ) AS Date,
| |
| COUNT( log_id ) AS deletions
| |
| FROM logging
| |
| WHERE log_action='delete'
| |
| AND log_namespace =0
| |
| GROUP BY Date( log_timestamp )
| |
| </source>
| |
| | |
| === List of users with a certain log action (not) among global users ===
| |
| | |
| For instance, a query to have a list of autocreated accounts (which have an entry in autocreate log) which are not known by CentralAuth ([[bugzilla:61876]]):
| |
| | |
| <source lang = SQL>
| |
| SELECT
| |
| log_title,
| |
| log_user,
| |
| log_timestamp
| |
| FROM logging
| |
| JOIN user ON logging.log_action = 'autocreate'
| |
| AND log_timestamp > 20130831000000
| |
| AND logging.log_user = user.user_id /* user_name and lu_name are normalised, log_* contains e.g. underscores */
| |
| WHERE user.user_name NOT IN (
| |
| SELECT lu_name
| |
| FROM centralauth_p.localuser lu
| |
| WHERE lu.lu_wiki = 'metawiki'
| |
| );
| |
| </source>
| |
| | |
| === Editors of a page by edit count ===
| |
| | |
| Something like this can be used for instance to quickly list users who (don't) meet requirements for voting in a poll, though usually requirements are about more than edit count (cf. [https://github.com/Pathoschild/Wikimedia-contrib.toolserver/blob/master/accounteligibility/index.php accounteligibility tool] for complex examples).
| |
| | |
| <source lang = SQL>
| |
| SELECT DISTINCT
| |
| rev_user_text,
| |
| user_editcount
| |
| FROM revision
| |
| LEFT JOIN user /* This will leave rows for unregistered users too */
| |
| ON revision.rev_user = user.user_id
| |
| WHERE revision.rev_page = 1000 /* Insert page ID here, e.g. from action=info */
| |
| ORDER BY user_editcount DESC;
| |
| </source>
| |
| | |
| === Frequency of a log action by day ===
| |
| | |
| Replace "create" (unregistered user signup) with your [[mw:Logging_table#log_action|log action]]; and "8) AS DAY" with "6) AS MONTH" and so on for lower resolution.
| |
| | |
| <source lang="sql">
| |
| SELECT
| |
| DAY,
| |
| ACTIONS
| |
| FROM (SELECT
| |
| SUBSTR(log_timestamp, 1, 8) AS DAY,
| |
| SUM(log_action = 'create') AS ACTIONS
| |
| FROM logging
| |
| GROUP BY DAY
| |
| ORDER BY log_id DESC
| |
| LIMIT 30)
| |
| AS logs;
| |
| </source>
| |
| | |
| === Number of edits prevented by AbuseFilter ===
| |
| | |
| <source lang="sql">
| |
| SELECT
| |
| SUBSTR(afl_timestamp, 1, 6) AS MONTH,
| |
| COUNT(afl_id) AS edits
| |
| FROM abuse_filter_log
| |
| WHERE afl_action = 'edit'
| |
| AND afl_actions LIKE '%disallow%'
| |
| GROUP BY MONTH;
| |
| </source>
| |
| | |
| === Number of edits hindered by AbuseFilter ===
| |
| | |
| <source lang="sql">
| |
| SELECT
| |
| SUBSTR(afl_timestamp, 1, 6) AS MONTH,
| |
| COUNT(afl_id) AS edits
| |
| FROM abuse_filter_log
| |
| WHERE afl_action = 'edit'
| |
| AND afl_actions RLIKE '.*(throttle|warn|disallow).*'
| |
| GROUP BY MONTH;
| |
| </source>
| |
| | |
| === Number of users hit by AbuseFilter ===
| |
| | |
| <source lang="sql">
| |
| SELECT
| |
| MONTH,
| |
| USERS
| |
| FROM (SELECT
| |
| SUBSTR(afl_timestamp, 1, 6) AS MONTH,
| |
| COUNT(DISTINCT(afl_user_text)) AS USERS
| |
| FROM abuse_filter_log
| |
| WHERE afl_actions RLIKE '.*(throttle|warn|disallow|blockautopromote|block|degroup).*'
| |
| GROUP BY MONTH
| |
| ORDER BY afl_id ASC)
| |
| AS logs;
| |
| </source>
| |
| | |
| === Edits per month with given tag and editcount ===
| |
| | |
| In the example, VisualEditor edits by users with a (current!) edit count of at least 1000 edits.
| |
| | |
| <source lang="sql">
| |
| SELECT
| |
| SUBSTR(rev_timestamp, 1, 6) AS MONTH,
| |
| COUNT(rev_id) AS edits
| |
| FROM revision
| |
| JOIN change_tag ON ct_rev_id = rev_id
| |
| JOIN user ON rev_user = user_id
| |
| WHERE ct_tag = 'visualeditor'
| |
| AND user_editcount > 1000
| |
| GROUP BY MONTH;
| |
| </source>
| |
| | |
| === Edits per month with two given tags ===
| |
| | |
| In the example, VisualEditor edits tagged "nowiki" (by AbuseFilter).
| |
| | |
| <source lang="sql">
| |
| SELECT
| |
| SUBSTR(rev_timestamp, 1, 6) AS MONTH,
| |
| COUNT(rev_id) AS edits
| |
| FROM change_tag as a
| |
| JOIN change_tag as b ON a.ct_rev_id = b.ct_rev_id
| |
| JOIN revision ON rev_id = a.ct_rev_id
| |
| WHERE a.ct_tag = 'nowiki'
| |
| AND b.ct_tag = 'visualeditor'
| |
| GROUP BY MONTH;
| |
| </source>
| |
| | |
| === Survived editors ===
| |
| | |
| Editor retention by month considering time between first and last edit, excluding bots, ordered by month of first edit.
| |
| | |
| <source lang=SQL>
| |
| SELECT
| |
| CONCAT(month, '/', year) monthly,
| |
| SUM(period < 7) less_than_1week,
| |
| SUM(period BETWEEN 7 AND 30) 1week_to_1month,
| |
| SUM(period BETWEEN 31 AND 365) 1month_to_1year,
| |
| SUM(period > 365) more_than_1year
| |
| FROM (
| |
| SELECT
| |
| SUBSTR(first, 5, 2) month,
| |
| SUBSTR(first, 1, 4) year,
| |
| TIMESTAMPDIFF(DAY, first, last) period
| |
| FROM (
| |
| SELECT
| |
| MIN(rev_timestamp) first,
| |
| MAX(rev_timestamp) last
| |
| FROM revision_userindex
| |
| WHERE rev_user != 0 AND rev_user NOT IN (SELECT ug_user FROM user_groups WHERE ug_group = 'bot')
| |
| GROUP BY rev_user
| |
| ) a
| |
| ) b
| |
| GROUP BY monthly
| |
| ORDER BY year, month;
| |
| </source>
| |
| | |
| === Top most active editors under some conditions ===
| |
| | |
| For instance the following gives the top 50 most active translators (editors of the "Translations" namespace, where available). A simple JOIN should give most of the filtering options you'll need.
| |
| | |
| <source lang = SQL>
| |
| SELECT
| |
| rev_user_text, COUNT(*)
| |
| FROM revision
| |
| JOIN page
| |
| ON rev_page = page_id
| |
| WHERE page_namespace = 1198
| |
| /* Other filters like page suffix and year
| |
| AND page_title LIKE '%/it'
| |
| AND rev_timestamp LIKE '2014%' */
| |
| GROUP BY rev_user_text
| |
| ORDER BY COUNT(*) DESC
| |
| LIMIT 50;
| |
| </source>
| |
| | |
| Or, if you want to count log actions and filter e.g. by current user group, in the example number of deletions by sysops:
| |
| | |
| <source lang = SQL>
| |
| SELECT
| |
| log_user_text, COUNT(*)
| |
| FROM logging
| |
| JOIN user_groups ON ug_user = log_user
| |
| WHERE log_type = 'delete'/*
| |
| AND log_action = 'delete'
| |
| AND log_timestamp > 20130101000000*/
| |
| AND ug_group = 'sysop'
| |
| GROUP BY log_user_text
| |
| ORDER BY COUNT(*) DESC
| |
| LIMIT 50;
| |
| </source>
| |
| | |
| ==Regular expressions==
| |
| ===Is an IPv4 address===
| |
| This query matches 999.999.999.999 as well as valid addresses like 129.47.61.5, but not IPv6 addresses.
| |
| | |
| <code>REGEXP '^[0-9]{1,3}[.][0-9]{1,3}[.][0-9]{1,3}[.][0-9]{1,3}$'</code>
| |
| | |
| ==Most edited pages in last month==
| |
| <source lang = SQL>
| |
| SELECT rc_title, count(*) as num_edits
| |
| FROM recentchanges
| |
| WHERE rc_namespace = 0
| |
| GROUP BY 1 ORDER BY 2 DESC
| |
| LIMIT 25;
| |
| </source>
| |
|
| |
|
| == See also == | | == See also == |
| * [//tools.wmflabs.org/tools-info/?listmetap Database meta_p] | | * [https://tools.wmflabs.org/tools-info/?listmetap Database meta_p] |
| | * [https://quarry.wmflabs.org/ Quarry] |
|
| |
|
| [[Category:Wikimedia Labs]] | | [[Category:Cloud Services]] |