T360891 Wikibase Lua tracking sampling is broken

Closed, Resolved | Public | BUG REPORT

Description

Steps to replicate the issue (include links if applicable):

What happens?:
Notice that the cache-miss rate is above 2, that is above 200%!

What should have happened instead?:
It should be a value that is actually possible: a miss rate cannot exceed 1 (100%).

Other information (browser name/version, screenshots, etc.):
This seems to be caused by the fact that, for the last 3 years, we have only tracked a sample of the Lua events for performance reasons, about 1 in every 1000 events. That was introduced in the change "Add a sample rate for Lua function call tracking". The sampling is done by comparing the sampling rate to a randomly generated number. However, the random number generator is never explicitly initialized, so every request gets the exact same list of random numbers!
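The effect described above can be illustrated with a minimal sketch. It is written in Python rather than Lua, and it is not the Wikibase code: a fixed seed stands in for a generator that is never explicitly initialized, and the names `simulate_request` and `sampled_event_indexes` are hypothetical.

```python
import random

SAMPLE_RATE = 0.001  # roughly 1 in 1000 events, as described above
FIXED_SEED = 1       # assumption: stand-in for an RNG that is never seeded


def sampled_event_indexes(rng, num_events):
    """Return the indexes of the events that would be tracked."""
    return [i for i in range(num_events) if rng.random() < SAMPLE_RATE]


def simulate_request(num_events, seed):
    """Simulate one request that starts with a freshly created generator."""
    return sampled_event_indexes(random.Random(seed), num_events)


# Two "requests" that start from the same generator state pick exactly the
# same event positions, so the sample is not random across requests.
first = simulate_request(5000, FIXED_SEED)
second = simulate_request(5000, FIXED_SEED)
print(first == second)  # prints True
```

Because every request samples the same positions, a module that always makes its calls in the same order will always have the same calls tracked, or never tracked at all.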

In practice, this sadly seems to mean that the Lua tracking data of the last 3 years is of questionable worth.

Event Timeline

Change #1014000 had a related patch set uploaded (by Michael Große; author: Michael Große):

[mediawiki/extensions/Wikibase@master] Fix Lua tracking by initializing random generator

https://gerrit.wikimedia.org/r/1014000

Pulling directly into our review column as an “external” change to review (implemented / submitted by the Wikidata Integrations Team).

We (Brad Jorsch and I) didn't want random numbers in Scribunto because they encourage an inefficient implementation of things like "spotlight" templates that show a random featured article from a list of such articles. We want to cache the output from Scribunto, but then people will see the same random selection for months at a time, so users will inevitably try to defeat caching or incentivize purge requests.

For this application, I would suggest passing a random number in $setupOptions to LuaEngine::registerInterface(). Then implement your own LCG (linear congruential generator), which should be doable in 2 lines of code even though you only have floating point numbers. The multiplication needs to fit in the 53-bit mantissa of the floating point number without rounding, which is not a problem for, say, MINSTD with m=2^31-1 and a=48271: the largest product has a base 2 logarithm below 47.
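A rough sketch of that suggestion, written in Python with floats standing in for Lua's double-precision numbers. The class name `FloatLcg`, the seed handling, and the `should_sample` helper are assumptions for illustration, not part of Scribunto or Wikibase; in the real setup the seed would be the random number passed from PHP.

```python
MODULUS = 2.0**31 - 1   # MINSTD modulus m = 2^31 - 1, held as a float
MULTIPLIER = 48271.0    # MINSTD multiplier a


class FloatLcg:
    """MINSTD linear congruential generator using only float arithmetic."""

    def __init__(self, seed):
        # Map any non-negative integer seed into the valid range 1..m-1.
        self.state = float(seed % (int(MODULUS) - 1) + 1)

    def next_unit(self):
        """Advance the generator and return a value in (0, 1)."""
        # The product stays below 2^47, so it is exact in a 53-bit mantissa,
        # and the float remainder of two exact values is exact as well.
        self.state = (MULTIPLIER * self.state) % MODULUS
        return self.state / MODULUS


def should_sample(generator, sample_rate):
    """Hypothetical helper: decide whether to track the current event."""
    return generator.next_unit() < sample_rate


generator = FloatLcg(seed=20240325)  # assumption: seed supplied by the host
tracked = sum(should_sample(generator, 0.001) for _ in range(100000))
print(tracked)
```

The state update is the two lines referred to above; everything else is scaffolding to make the sketch runnable.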

Or just increment a counter and report every time the counter mod N is zero. Offset the counter at startup by a random number.
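The counter approach could look roughly like the following sketch. It is in Python rather than Lua, the class name `CounterSampler` is hypothetical, and `random.randrange` stands in for the random offset that would be chosen at startup.

```python
import random


class CounterSampler:
    """Report one event out of every `every_n`, starting at a random offset."""

    def __init__(self, every_n, offset=None):
        if every_n < 1:
            raise ValueError("every_n must be at least 1")
        self.every_n = every_n
        # Assumption: the offset is picked once at startup; here the
        # standard library provides it when the caller passes none.
        if offset is None:
            offset = random.randrange(every_n)
        self.counter = offset

    def should_report(self):
        """Count one event and return True when it should be reported."""
        self.counter += 1
        return self.counter % self.every_n == 0


sampler = CounterSampler(every_n=1000)
reported = sum(sampler.should_report() for _ in range(5000))
print(reported)  # prints 5
```

Any run of 5000 consecutive counter values contains exactly five multiples of 1000, so the number of reported events does not depend on the offset. Only which events are reported changes.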

If you actually wanted to unconditionally seed the global RNG, this would not be the right way to do it. Better to seed it in mw.executeModule() based on an additional parameter passed from PHP.

os.clock() only looks attractive as a random number source because we've deliberately avoided providing better random number sources to Lua.

"Or just increment a counter and report every time the counter mod N is zero. Offset the counter at startup by a random number."

I think that sounds good enough – since the counter is offset for each page parse, the same module will still have different calls sampled each time it’s used on a page, so that should be random enough. Thanks for your detailed reply!

Change #1014000 abandoned by Michael Große:

[mediawiki/extensions/Wikibase@master] Fix Lua tracking by initializing random generator

Reason:

I'm in a different team now and do not have the capacity to work on this anymore. Please see T360891 for the problems with the change in its current form and possible alternatives.

https://gerrit.wikimedia.org/r/1014000

Change #1021429 had a related patch set uploaded (by Hoo man; author: Hoo man):

[mediawiki/extensions/Wikibase@master] Fix sampling of Lua function tracking

https://gerrit.wikimedia.org/r/1021429

Change #1021429 merged by jenkins-bot:

[mediawiki/extensions/Wikibase@master] Fix sampling of Lua function tracking

https://gerrit.wikimedia.org/r/1021429