FirstGraphemeCluster does not need to preserve state across grapheme clusters · Issue #58 · rivo/uniseg · GitHub
Hi,
The FirstGraphemeCluster function can be used to iteratively extract grapheme clusters from a string (without additional allocations). Its documentation says that a state should be passed in (initially set to -1); the function then returns a new state, which should be passed back in on the next call so that some state is preserved across calls.
This state contains the current grapheme cluster parser state and the property of the next code point.
It did not make sense to me that decoding a grapheme cluster would depend on earlier state: I expected each grapheme cluster to be fully independent.
To test this, I took the full test case for grapheme cluster boundary processing of Unicode 14.0 (the version supported by the library) and ran a simple test by calling FirstGraphemeClusterInString and comparing the results with the spec:
When preserving the state across grapheme clusters: everything works (as expected: the library is compliant 😋)
When explicitly resetting the state to -1 between calls to FirstGraphemeClusterInString (which should be incorrect): everything still works, all tests pass!
This would mean that even when no state is preserved, the grapheme clusters that are returned are always the same.
So, from my understanding, there should be no need for any state between calls to the library, and the state parameter can be fully deprecated.
Full test case (see the TODO line); try running it in the Go playground (it prints All tests passed): https://gist.github.com/delthas/0965a2c198b3a114fbb6706435786b73
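For readers who just want the shape of the experiment, here is a minimal, library-free sketch. It is not uniseg's code and does not implement the Unicode grapheme cluster rules: `firstClusterFunc` is a hypothetical signature modeled on the calling convention described above, `toyFirstCluster` is a stand-in segmenter (one rune plus any following combining marks) whose state is only a cache, and `segment` and `sameClusters` are made-up helper names. It simply segments each input twice, once passing the returned state back in and once resetting it to -1 before every call, and checks that both runs agree.

```go
package main

import (
	"fmt"
	"unicode"
	"unicode/utf8"
)

// firstClusterFunc is a HYPOTHETICAL signature modeled on the convention
// described in this issue: pass a state (-1 initially), get back the first
// cluster, the rest of the string, and a state to pass to the next call.
type firstClusterFunc func(s string, state int) (cluster, rest string, newState int)

// toyFirstCluster is a stand-in, NOT uniseg: a cluster is one rune followed
// by any combining marks. Its state caches the byte size of the next
// cluster's first rune, so it only saves work and never changes the result.
func toyFirstCluster(s string, state int) (cluster, rest string, newState int) {
	if len(s) == 0 {
		return "", "", -1
	}
	// Reuse the cached size of the first rune if the caller kept the state.
	size := state
	if size <= 0 {
		_, size = utf8.DecodeRuneInString(s)
	}
	end := size
	// Absorb following combining marks into the current cluster.
	for end < len(s) {
		r, n := utf8.DecodeRuneInString(s[end:])
		if !unicode.IsMark(r) {
			// r starts the next cluster; hand its size back as the new state.
			return s[:end], s[end:], n
		}
		end += n
	}
	return s[:end], "", -1
}

// segment splits s into clusters using f. When keepState is false, the state
// is reset to -1 before every call, as in the second run of the experiment.
func segment(f firstClusterFunc, s string, keepState bool) []string {
	var clusters []string
	state := -1
	for len(s) > 0 {
		var c string
		c, s, state = f(s, state)
		clusters = append(clusters, c)
		if !keepState {
			state = -1
		}
	}
	return clusters
}

// sameClusters reports whether two segmentations are identical.
func sameClusters(a, b []string) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

func main() {
	inputs := []string{"abc", "e\u0301a\u0308o", "\u0301x", ""}
	for _, in := range inputs {
		kept := segment(toyFirstCluster, in, true)
		reset := segment(toyFirstCluster, in, false)
		fmt.Printf("%q: kept=%q reset=%q same=%v\n", in, kept, reset, sameClusters(kept, reset))
	}
}
```

To repeat the real experiment, one would swap the toy segmenter for a small adapter around FirstGraphemeClusterInString and feed it the Unicode test cases instead of the sample strings; the full test case linked above is the place to look for that.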