T369024 SpecialUncategorizedPages slow query

SpecialUncategorizedPages slow query
Open, Needs Triage, Public

Description

I just caught this slow query on Commons. It is running as wikiadmin and coming from an mwmaint host, so it comes from a maintenance script:

| 294402967 | wikiadmin2023   | 10.64.16.77:52520    | commonswiki | Query     |   11314 | Sending data                                   | SELECT /* MediaWiki\Specials\SpecialUncategorizedPages::reallyDoQuery  */  page_namespace AS `namespace`,page_title AS `title`  FROM `page` LEFT JOIN `categorylinks` ON ((cl_from = page_id))   WHERE cl_from IS NULL AND page_namespace IN (6,0)  AND page_is_redirect = 0  ORDER BY page_namespace,page_title LIMIT 5000


cumin2024@db1248.eqiad.wmnet[(none)]> show explain for 294402967;
+------+-------------+---------------+-------+---------------------------------------------+-----------------+---------+--------------------------+----------+--------------------------------------+
| id   | select_type | table         | type  | possible_keys                               | key             | key_len | ref                      | rows     | Extra                                |
+------+-------------+---------------+-------+---------------------------------------------+-----------------+---------+--------------------------+----------+--------------------------------------+
|    1 | SIMPLE      | page          | range | page_name_title,page_redirect_namespace_len | page_name_title | 4       | NULL                     | 74007461 | Using where                          |
|    1 | SIMPLE      | categorylinks | ref   | PRIMARY                                     | PRIMARY         | 4       | commonswiki.page.page_id |        3 | Using where; Using index; Not exists |
+------+-------------+---------------+-------+---------------------------------------------+-----------------+---------+--------------------------+----------+--------------------------------------+
2 rows in set, 1 warning (0.071 sec)
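
The `Not exists` note in the plan reflects the shape of the query: a `LEFT JOIN` filtered on `cl_from IS NULL` is an anti-join that keeps only pages with no `categorylinks` row. A minimal sketch of that shape on a toy in-memory SQLite database follows. The simplified tables, the sample rows, and the helper names `uncategorized_pages` and `make_demo_db` are illustrative assumptions, not the production schema or MediaWiki code.

```python
import sqlite3

# Toy schema with only the columns the query touches. The real MediaWiki
# tables have more columns and indexes; this layout is an assumption.
SCHEMA = """
CREATE TABLE page (
    page_id INTEGER PRIMARY KEY,
    page_namespace INTEGER NOT NULL,
    page_title TEXT NOT NULL,
    page_is_redirect INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE categorylinks (
    cl_from INTEGER NOT NULL,
    cl_to TEXT NOT NULL,
    PRIMARY KEY (cl_from, cl_to)
);
"""

# Same shape as the slow query: keep pages for which the join found no match.
ANTI_JOIN = """
SELECT page_namespace AS namespace, page_title AS title
FROM page
LEFT JOIN categorylinks ON cl_from = page_id
WHERE cl_from IS NULL
  AND page_namespace IN (6, 0)
  AND page_is_redirect = 0
ORDER BY page_namespace, page_title
LIMIT ?
"""


def uncategorized_pages(conn, limit=5000):
    """Return (namespace, title) rows for pages without any category link."""
    return conn.execute(ANTI_JOIN, (limit,)).fetchall()


def make_demo_db():
    """Build a small in-memory database with hypothetical sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO page VALUES (?, ?, ?, ?)",
        [
            (1, 0, "Categorized_article", 0),
            (2, 0, "Orphan_article", 0),
            (3, 6, "Orphan_file.jpg", 0),
            (4, 0, "Orphan_redirect", 1),   # excluded: redirect
            (5, 14, "Orphan_category", 0),  # excluded: namespace not in (6, 0)
        ],
    )
    conn.execute("INSERT INTO categorylinks VALUES (1, 'Some_category')")
    return conn


if __name__ == "__main__":
    for row in uncategorized_pages(make_demo_db()):
        print(row)
```

In the plan above, the estimate of 74,007,461 rows on `page` suggests that most of the cost is the scan of the outer table, with one index probe into `categorylinks` per page.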

This seems to be the process:

www-data 10323  0.0  0.0  19944 11620 ?        Ss   Jul01   0:00 /usr/bin/python3 /usr/local/bin/mw-cli-wrapper /usr/local/bin/mwscriptwikiset updateSpecialPages.php s4.dblist
www-data 10412  0.0  0.0   2384   760 ?        S    Jul01   0:00 /bin/sh -c /usr/local/bin/mwscriptwikiset updateSpecialPages.php s4.dblist
www-data 10414  0.0  0.0   6724  3308 ?        S    Jul01   0:00 /bin/bash /usr/local/bin/mwscriptwikiset updateSpecialPages.php s4.dblist
www-data 10440  0.0  0.0   6724  3172 ?        S    Jul01   0:00 /bin/bash /usr/local/bin/mwscript updateSpecialPages.php commonswiki
www-data 10467  0.0  0.1 240056 87572 ?        S    Jul01   0:00 php /srv/mediawiki-staging/multiversion/MWScript.php updateSpecialPages.php commonswiki

Is there any chance this can be done in a different way?

Event Timeline

That's part of updating query page entries, and it is slow by design (it fills a cache). The proper fix is to move it to Hadoop (T309738: Move Mediawiki QueryPages computation to Hadoop, which I've been slowly working on), but in the meantime we can reduce its frequency.
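
The cache-filling design mentioned here can be sketched roughly as follows: an expensive query runs on a schedule, its rows are stored in a cache table, and readers only touch the cache. This is a minimal illustration using an in-memory SQLite database. The table layout, the functions `refresh_cache` and `read_cache`, and the `min_interval` parameter are hypothetical names chosen for the sketch, not MediaWiki's actual implementation.

```python
import sqlite3

# Hypothetical cache tables: one for cached rows, one for refresh timestamps.
CACHE_SCHEMA = """
CREATE TABLE report_cache (report TEXT NOT NULL, title TEXT NOT NULL);
CREATE TABLE report_cache_info (
    report TEXT PRIMARY KEY,
    refreshed_at INTEGER NOT NULL
);
"""


def refresh_cache(conn, report, compute_rows, now, min_interval):
    """Recompute `report` unless it was refreshed less than `min_interval`
    seconds before `now`. Returns True if the expensive query was run."""
    row = conn.execute(
        "SELECT refreshed_at FROM report_cache_info WHERE report = ?",
        (report,),
    ).fetchone()
    if row is not None and now - row[0] < min_interval:
        return False  # cache is fresh enough, skip the slow query

    titles = compute_rows()  # the slow part, e.g. the anti-join query
    with conn:  # replace the cached rows in a single transaction
        conn.execute("DELETE FROM report_cache WHERE report = ?", (report,))
        conn.executemany(
            "INSERT INTO report_cache VALUES (?, ?)",
            [(report, title) for title in titles],
        )
        conn.execute(
            "INSERT OR REPLACE INTO report_cache_info VALUES (?, ?)",
            (report, now),
        )
    return True


def read_cache(conn, report):
    """Serve the report from the cache without running the slow query."""
    cursor = conn.execute(
        "SELECT title FROM report_cache WHERE report = ? ORDER BY title",
        (report,),
    )
    return [r[0] for r in cursor]
```

In this sketch, raising `min_interval` is the analogue of reducing the update frequency: readers keep getting the cached rows, which are simply older between refreshes.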

Reducing it would work for me :)
Thanks!

Change #1052058 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[operations/mediawiki-config@master] Reduce frequency of two query pages in commonswiki

https://gerrit.wikimedia.org/r/1052058

Change #1052058 merged by jenkins-bot:

[operations/mediawiki-config@master] Reduce frequency of two query pages in commonswiki

https://gerrit.wikimedia.org/r/1052058

Mentioned in SAL (#wikimedia-operations) [2024-07-08T16:31:18Z] <ladsgroup@deploy1002> Started scap sync-world: Backport for [[gerrit:1052058|Reduce frequency of two query pages in commonswiki (T369024)]]

Mentioned in SAL (#wikimedia-operations) [2024-07-08T16:33:35Z] <ladsgroup@deploy1002> ladsgroup: Backport for [[gerrit:1052058|Reduce frequency of two query pages in commonswiki (T369024)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)

Mentioned in SAL (#wikimedia-operations) [2024-07-08T16:39:08Z] <ladsgroup@deploy1002> Finished scap: Backport for [[gerrit:1052058|Reduce frequency of two query pages in commonswiki (T369024)]] (duration: 07m 50s)

Ladsgroup claimed this task.

Doesn't this also need a puppet change to add a cron job like enwiki has, or is that not needed anymore? Commons reports it's been over a month since uncategorizedcategories was updated.

https://github.com/wikimedia/operations-puppet/tree/6d29f4037c0c34fd28e11fd68d4e72eb88cc71d7/modules/profile/manifests/mediawiki/maintenance/updatequerypages

Just FYI, there is some pushback on Commons about the longer update interval for Special:UncategorizedCategories - https://commons.wikimedia.org/w/index.php?title=Commons:Village_pump&oldid=908992771#Special:UncategorizedCategories

This bug is a bit confusing: the query in the bug description is for Special:UncategorizedPages, but the patch was for Special:Mostlinkedtemplates and Special:UncategorizedCategories. Special:UncategorizedPages remains on its original schedule of once every 3 days for Commons.

Change #1064424 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[operations/mediawiki-config@master] Change the disabled query page for commons

https://gerrit.wikimedia.org/r/1064424

Change #1064424 merged by jenkins-bot:

[operations/mediawiki-config@master] Change the disabled query page for commons

https://gerrit.wikimedia.org/r/1064424

Mentioned in SAL (#wikimedia-operations) [2024-08-21T17:28:05Z] <ladsgroup@deploy1003> Started scap sync-world: Backport for [[gerrit:1064424|Change the disabled query page for commons (T369024)]]

Mentioned in SAL (#wikimedia-operations) [2024-08-21T17:30:22Z] <ladsgroup@deploy1003> ladsgroup: Backport for [[gerrit:1064424|Change the disabled query page for commons (T369024)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)

Mentioned in SAL (#wikimedia-operations) [2024-08-21T17:35:42Z] <ladsgroup@deploy1003> Finished scap sync-world: Backport for [[gerrit:1064424|Change the disabled query page for commons (T369024)]] (duration: 07m 36s)

Does this category mean UncategorizedFiles is not needed, or does something use that report to populate the category? There are so few files in it that it seems some 'uncategorized' maintenance category gets added somehow. Special:UncategorizedPages also isn't really needed and contains only a few files. See T371662.
In contrast, UncategorizedCategories is relatively important to keep, and it should be updated at least as often as before. (Rather than relying only on users going over these items, I think suggesting categories via SuggestedEdits would be good, as would an additional report about categories with only redcats.)

@Prototyperspective That sounds off-topic; this ticket is specifically about queries on Special:UncategorizedPages being slow.