FirstGraphemeCluster does not need to preserve state across grapheme clusters #58

delthas · 2024-06-07T14:46:11Z

Hi,

The FirstGraphemeCluster function can be used to iteratively extract grapheme clusters from a string without additional allocations. Its documentation says that a state value (initially -1) must be passed in; the function returns an updated state, which should be passed back in on the next call so that some state is preserved across calls.

This state contains the current grapheme cluster parser state, and the property of the next codepoint.

It did not make sense to me that decoding a grapheme cluster would depend on earlier state: I expected each grapheme cluster to be fully independent.

To test this, I took the full grapheme cluster boundary test suite for Unicode 14.0 (the version supported by the library) and ran a simple test that calls FirstGraphemeClusterInString and compares the results with the spec:

- When preserving the state across grapheme clusters: everything works (as expected: the library is compliant 😋)
- When explicitly resetting the state to -1 before each call to FirstGraphemeClusterInString (which should be incorrect): everything still works, all tests pass!!!

This would mean that even when no state is preserved, the grapheme clusters returned are always the same.

So, from my understanding, there is no need for any state between calls to the library, and the state parameter can be fully deprecated.

Full test case (see the TODO line); try running it in the Go playground (it prints All tests passed): https://gist.github.com/delthas/0965a2c198b3a114fbb6706435786b73
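For readers who do not want to open the gist, here is a minimal, standard-library-only sketch of the comparison described above: segment the same input once with the state carried over and once with the state reset to -1 before every call, then compare the results. Everything in it is an assumption made for illustration, not code from the gist or from uniseg: the `segmenter` type, `split`, `sameSegmentation`, and both example segmenters are hypothetical names. `toySegmenter` is a crude stand-in (one rune plus trailing combining marks), not the Unicode grapheme cluster algorithm, and `alternating` is deliberately state-dependent to show that the harness can detect a difference.

```go
package main

import (
	"fmt"
	"unicode"
	"unicode/utf8"
)

// segmenter is a hypothetical type with the same general shape as a stateful
// "first cluster" call: it returns the first cluster, the rest of the input,
// and a new state.
type segmenter func(s string, state int) (cluster, rest string, newState int)

// toySegmenter is a stand-in, NOT the real grapheme algorithm: it returns
// one rune plus any combining marks that follow it, and ignores state.
func toySegmenter(s string, state int) (string, string, int) {
	if s == "" {
		return "", "", state
	}
	_, n := utf8.DecodeRuneInString(s)
	for n < len(s) {
		r, size := utf8.DecodeRuneInString(s[n:])
		if !unicode.IsMark(r) {
			break
		}
		n += size
	}
	return s[:n], s[n:], 0
}

// alternating is deliberately state-dependent: it takes two runes when the
// state is even and one rune otherwise, then increments the state.
func alternating(s string, state int) (string, string, int) {
	want := 1
	if state%2 == 0 {
		want = 2
	}
	n := 0
	for i := 0; i < want && n < len(s); i++ {
		_, size := utf8.DecodeRuneInString(s[n:])
		n += size
	}
	return s[:n], s[n:], state + 1
}

// split segments s with seg. If carry is false, the state is reset to -1
// before every call instead of being passed along.
func split(seg segmenter, s string, carry bool) []string {
	var out []string
	state := -1
	for len(s) > 0 {
		var c string
		c, s, state = seg(s, state)
		if c == "" {
			break // guard against a segmenter that makes no progress
		}
		out = append(out, c)
		if !carry {
			state = -1
		}
	}
	return out
}

// sameSegmentation reports whether carrying the state and resetting it
// produce identical clusters for s.
func sameSegmentation(seg segmenter, s string) bool {
	a, b := split(seg, s, true), split(seg, s, false)
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(sameSegmentation(toySegmenter, "e\u0301a\u0308o"))
	fmt.Println(sameSegmentation(alternating, "abcdef"))
}
```

To run the same check against the library itself, a small adapter around `uniseg.FirstGraphemeClusterInString` that drops its width return value could be passed as the `segmenter`, with the inputs taken from the Unicode test file as in the gist.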

aymanbagabas added a commit to charmbracelet/x that referenced this issue Aug 2, 2024

refactor(ansi): drop grapheme states

d5128f7

We don't need to keep track of grapheme states, see rivo/uniseg#58

aymanbagabas added a commit to charmbracelet/x that referenced this issue Aug 2, 2024

refactor(ansi): drop grapheme states

be935ba

We don't need to keep track of grapheme states, see rivo/uniseg#58

aymanbagabas added a commit to charmbracelet/x that referenced this issue Aug 5, 2024

refactor(ansi): drop grapheme states

fe2aaae

We don't need to keep track of grapheme states, see rivo/uniseg#58
