T225805 DatabaseTermIdsAcquirer fails on terms longer than 255 bytes

Closed, Resolved · Public

Description

When trying to store terms longer than 255 bytes in the new normalized term store (in real-world data, these are typically descriptions in non-Latin scripts, whose characters take two or more bytes each in UTF-8), ReplicaMasterAwareRecordIdsAcquirer throws a fail-safe exception. The database implicitly truncates the term text (cf. T108255), so a subsequent SELECT with wbx_text = 'untruncated term text' can’t find the row that was just inserted. (If ReplicaMasterAwareRecordIdsAcquirer didn’t detect this case and throw the fail-safe exception, it would keep trying to acquire an ID for the same text in an infinite loop.)
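The failure mode can be reproduced with a minimal in-memory sketch. Everything here is hypothetical: `TruncatingTable` stands in for a text table whose column silently cuts values to 255 bytes, and `acquire_id` is a simplified insert-then-select loop with a fail-safe, not the real ReplicaMasterAwareRecordIdsAcquirer (the replica/master distinction is omitted).

```python
MAX_BYTES = 255  # byte limit of the text column, as described in this task


class FailSafeError(Exception):
    """Raised instead of looping forever when a just-inserted row can't be found."""


class TruncatingTable:
    """Hypothetical stand-in for a DB table that silently truncates long text."""

    def __init__(self):
        self._rows = {}  # stored (possibly truncated) bytes -> id
        self._next_id = 1

    def insert_ignore(self, text):
        # The "database" cuts the value to MAX_BYTES bytes without raising an error.
        stored = text.encode("utf-8")[:MAX_BYTES]
        if stored not in self._rows:
            self._rows[stored] = self._next_id
            self._next_id += 1

    def select_id(self, text):
        # Exact-match lookup on the untruncated text, like WHERE wbx_text = '...'.
        return self._rows.get(text.encode("utf-8"))


def acquire_id(table, text, max_attempts=3):
    """Select; if missing, insert and retry. Gives up after max_attempts."""
    for _ in range(max_attempts):
        found = table.select_id(text)
        if found is not None:
            return found
        table.insert_ignore(text)
    # Without this check the loop would keep inserting and never find the row.
    raise FailSafeError(f"no ID for {len(text.encode('utf-8'))}-byte text")
```

A short Latin label round-trips, but a 180-character Cyrillic text (340 bytes in UTF-8) is stored truncated and never found again, so only the fail-safe stops the loop.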

We need to fix this somewhere between DatabaseTermIdsAcquirer and ReplicaMasterAwareRecordIdsAcquirer: callers of the TermIdsAcquirer interface shouldn’t be expected to truncate the terms to some store-specific length.
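One way to keep truncation out of callers' hands is for the acquirer itself to truncate before touching the store, while still returning IDs keyed by the caller's original text. The sketch below assumes that design; `truncate_utf8` and `TruncatingTermIdsAcquirer` are hypothetical names, and a dict stands in for the database table.

```python
MAX_BYTES = 255  # store-specific limit, hidden from callers


def truncate_utf8(text, max_bytes=MAX_BYTES):
    """Cut text to at most max_bytes UTF-8 bytes without splitting a character."""
    encoded = text.encode("utf-8")
    if len(encoded) <= max_bytes:
        return text
    # The input was valid UTF-8, so slicing can only break the final character;
    # errors="ignore" drops that incomplete trailing sequence.
    return encoded[:max_bytes].decode("utf-8", errors="ignore")


class TruncatingTermIdsAcquirer:
    """Hypothetical acquirer: callers pass full texts, truncation is internal."""

    def __init__(self):
        self._ids = {}  # truncated text -> id (stand-in for the DB table)

    def acquire_ids(self, texts):
        result = {}
        for text in texts:
            key = truncate_utf8(text)
            if key not in self._ids:
                self._ids[key] = len(self._ids) + 1
            # Keyed by the caller's original text, so callers never see truncation.
            result[text] = self._ids[key]
        return result
```

Because the stored value is already within the limit, the later lookup matches what the database holds. The trade-off is that two texts sharing their first 255 bytes receive the same ID, which is the truncation behavior this task keeps for parity with wb_terms.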

The equivalent problem in wb_terms was discussed in T142691. Since the normalized term store is, for now, only meant to replace wb_terms, we’ll keep truncating the values, even though that behavior is ultimately considered a bug.

Event Timeline

Change 517070 had a related patch set uploaded (by Lucas Werkmeister (WMDE); owner: Lucas Werkmeister (WMDE)):
[mediawiki/extensions/Wikibase@master] Truncate terms if necessary before they reach the database

https://gerrit.wikimedia.org/r/517070

Change 517070 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] Truncate terms if necessary before they reach the database

https://gerrit.wikimedia.org/r/517070