Wubi method
Wubi method | |
---|---|
Simplified Chinese | 五笔字型输入法 |
Literal meaning | five-stroke character model input method |
Alternative Chinese name | |
Simplified Chinese | |
Literal meaning | Wang code |
The Wubizixing input method (simplified Chinese: 五笔字型输入法; traditional Chinese:
The method is also known as Wang Ma (simplified Chinese:
The Wubi method is based on the structure of characters rather than their pronunciation. This makes it possible to input characters even when the user does not know how they are pronounced, and it means the method is not closely tied to any particular spoken variety of Chinese. It is also extremely efficient: nearly every character can be written with at most 4 keystrokes, and in practice most characters can be written with fewer. There are reports of experienced typists reaching 160 characters per minute with Wubi.[2] A character count in Chinese is not directly comparable to one in English, but Wubi is extremely fast when used by an experienced typist. The main reason is that, unlike with traditional phonetic input methods, one does not have to spend time selecting the desired character from a list of homophones: virtually all characters have a unique representation.
As its name suggests, the keyboard is divided into five regions. The Chinese character 笔 (bǐ), when used in the context of writing Chinese characters, refers to the brush strokes used in Chinese calligraphy. Each region is assigned a certain type of stroke.
- Region 1: horizontal (一)
- Region 2: vertical (丨)
- Region 3: downward right-to-left (丿)
- Region 4: dot strokes or downward left-to-right strokes (丶)
- Region 5: hook (乙)
A major drawback of Wubi is its steep learning curve: as a more complex system, it takes longer to acquire as a skill. Memorization and practice are key to proficient use.
Several input method implementations support Wubi, including Google Input Tools (used by Google Translate) and the keyboard options on Mac devices. The Wubi sequence for a specific character can be looked up in online dictionaries.
In this article, the following convention will be used: character will always mean Chinese character, whereas letter, key and keystroke will always refer to the keys on the keyboard.
How it works
Essentially, a character is broken down into components, which are usually (but not always) the same as radicals. These are typed in the order in which they would be written by hand. To ensure that extremely complex characters do not require an inordinate number of keystrokes, any character containing more than 4 components is entered by typing the first 3 components written, followed by the last. In this way, each character can be entered with no more than 4 keystrokes.
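The component rule above can be sketched in a few lines of Python. This is only an illustration: the function name `component_keys_to_code` is made up here, and the caller is assumed to already know which key each component sits on.

```python
def component_keys_to_code(keys):
    """Return the keystrokes for a character.

    `keys` holds one key per component, in handwriting order
    (hypothetical input format chosen for this sketch).
    """
    keys = list(keys)
    if not keys:
        raise ValueError("a character needs at least one component")
    if len(keys) > 4:
        # More than 4 components: first three written, followed by the last.
        keys = keys[:3] + keys[-1:]
    return "".join(keys)
```

With five components on the placeholder keys a, b, c, d and e, the fourth component is skipped and the result is `abce`.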
Wubi distributes its characters very evenly and as such the vast majority of characters are uniquely defined by the 4 keystrokes discussed above. One then types a space to move the character from the input buffer onto the screen. In the event that the 4 letter representation of the character is not unique, one would type a digit to select the relevant character (for example, if two characters have the same representation, typing 1 would select the first, and 2 the second). In most implementations, a space can always be typed and simply means 1 in an ambiguous setting. Intelligent software will try to make sure that the character in the default position is the one desired.
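The selection step can be sketched as follows. The function name and the idea of passing the candidates as a list are assumptions made for illustration; real input method engines differ in detail.

```python
def pick_candidate(candidates, key):
    """Choose among characters that share the same code.

    `key` is either a space (take the default, i.e. the first
    candidate) or a digit such as "1" or "2".
    """
    if not candidates:
        raise ValueError("no candidates for this code")
    index = 0 if key == " " else int(key) - 1
    if not 0 <= index < len(candidates):
        raise IndexError("no candidate at that position")
    return candidates[index]
```

When the code is unique, the candidate list has one entry and a space simply commits it.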
Many characters have more than one representation. This is sometimes for ease of use, in case there is more than one obvious way to break down a character. More often, though, it is because certain characters have a short representation of fewer than 4 letters as well as a "full" representation.
For characters with fewer than 4 components that do not have a short form representation, one types each component and then "fills up" the representation (that is, types enough extra keystrokes to make the representation 4 keystrokes) by manually typing the strokes of the last component, in the order they would be written. If there are too many strokes, one should write as many as possible, but put the last stroke last (this mirrors the component rule for characters with more than 4 components outlined above).
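A minimal sketch of this "fill up" rule is given below. It assumes that a bare stroke is typed with the first key of its zone (G, H, T, Y and N); the names `STROKE_KEYS` and `fill_up` are invented for the example.

```python
# Assumption: each bare stroke is typed with the first key of its zone.
STROKE_KEYS = {
    "horizontal": "g",
    "vertical": "h",
    "falling left": "t",
    "falling right": "y",
    "hook": "n",
}

def fill_up(component_keys, last_component_strokes):
    """Pad a code with the strokes of the last component, up to 4 keys."""
    code = list(component_keys)
    room = 4 - len(code)
    if room <= 0:
        return "".join(code[:4])
    strokes = [STROKE_KEYS[name] for name in last_component_strokes]
    if len(strokes) > room:
        # Too many strokes: keep as many as fit, but put the last stroke last.
        strokes = strokes[:room - 1] + strokes[-1:]
    return "".join(code + strokes)
```

For 广, which sits on the Y key and is written with a dot, a horizontal and a falling-left stroke, this gives `yygt`, the code quoted for it in the examples section.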
Once the algorithm is understood, one can type almost any character with a little practice, even if one has not typed it before. Muscle memory ensures that frequent typists using this method do not have to think about how the characters are actually constructed, just as the vast majority of English typists do not think very much about the spelling of words when they write.
Implementation of specific details
Many implementations employ further multiple-word optimizations. Usually, a commonly used digraph (two-character word) in which both characters have short-form two-keystroke representations can be combined into a single four-keystroke representation that generates two characters rather than one. There are also a few 3-character shortcuts, and even one rather longer, politically motivated one. Some examples of these are provided in the examples section below.
Another common feature is the use of the 'z' key as a wildcard. The Wubi method was actually designed with this feature in mind; this is why no components are assigned to the z key. Basically, one can type a z when unsure what the component should be, and the input method will help complete it. If one knew, for example, that the character ought to start with "kt", but was unsure what the next component should be, typing "ktz" would produce a list of all characters starting with "kt". In practice though, many input method engines use a tabular lookup method for all table based input systems, including for Wubi. This means that they simply have a large table in memory, associating different characters to their respective representations. The input method then simply becomes a table lookup. In such an implementation, the z key breaks the paradigm and as such is not found in much generalized software (although the Wubi input method commonly found in Chinese Windows implements the feature). For this same reason, the multiple character optimization described in the previous paragraph is also relatively rare.
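One way to picture the z wildcard on top of a table lookup is sketched below. It assumes that z stands for exactly one unknown key and that every code beginning with the resulting pattern is offered; the table contents are placeholders, not real Wubi codes.

```python
import re

def z_lookup(table, typed):
    """List candidates for input that may contain the 'z' wildcard.

    `table` maps codes to lists of characters (hypothetical layout).
    """
    # Each 'z' matches any one of the 25 keys that carry components.
    pattern = re.compile(
        "".join("[a-y]" if ch == "z" else re.escape(ch) for ch in typed)
    )
    results = []
    for code in sorted(table):
        if pattern.match(code):  # anchored at the start, so this is a prefix match
            results.extend(table[code])
    return results

# Placeholder table for illustration only.
demo_table = {"kt": ["A"], "kta": ["B"], "ktbc": ["C"], "kx": ["D"]}
```

Here `z_lookup(demo_table, "ktz")` returns the entries for `kta` and `ktbc`. An engine built purely on exact table lookup has no natural place for this scan, which is the paradigm break mentioned above.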
Some input methods, such as xcin (found on many UNIX-like systems), provide a generic wildcard functionality which can be used in all the table based input systems, including pinyin and virtually anything else. Xcin uses '*' for auto-complete and '?' for just one letter, following the conventions pioneered in UNIX file globbing. Other implementations have their own conventions.
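The glob-style convention can be imitated with Python's standard `fnmatch` module. This is not xcin's code, only a sketch of the same idea over a placeholder table.

```python
from fnmatch import fnmatchcase

def glob_lookup(table, pattern):
    """Return characters whose code matches a glob pattern.

    '*' matches any run of keys and '?' matches exactly one key.
    `table` maps codes to lists of characters (hypothetical layout).
    """
    return [
        ch
        for code in sorted(table)
        if fnmatchcase(code, pattern)
        for ch in table[code]
    ]

# Placeholder table for illustration only.
glob_table = {"ab": ["X"], "abc": ["Y"], "abcd": ["Z"], "b": ["W"]}
```

Because the matching is independent of what the codes mean, the same function would serve any table-based input system.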
Subdivision of the keyboard
The Wubi keyboard assumes a QWERTY-like layout, so users of keyboards implementing a nationalized or alternative layout (such as Dvorak or the French AZERTY) will probably have to do some remapping to make the system usable. Wubi does not position its components arbitrarily: there are far too many of them, and it is only with the introduction of a logical methodology that the system becomes easy to learn.
Basically, the keyboard is divided into 5 zones, each representing a stroke. Those five strokes are falling left, falling right, horizontal, vertical, and hook, and the zones that represent them are QWERT, YUIOP, ASDFG, HJKLM, and XCVBN, respectively. These zones are all laid out horizontally, with the exception of M, which is not in line with the rest of the letters in its zone.
In a general way, the keyboard can be thought of as divided down the center, between T and Y, G and H, and N and M. The keys in each zone are numbered moving away from this dividing line: so we should actually say that in zone QWERT, T is the first letter, R is the second, and E the third; in zone YUIOP, Y is the first, U is the second, I the third, etc. For XCVBN, N is the first, and so on. In HJKLM, consider M to be the last in the series, even though it does not lie on the line.
This is important because components in the first position will have one repetition of the stroke in question (the stroke assigned to the zone in which they belong); those in the second position, two; and those in the third, three. Components that are not easily classifiable using this paradigm are placed on the last letter.
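The numbering scheme can be written down as a small lookup. The dictionary below lists each zone's keys starting from the central dividing line, as described above; the names `ZONES`, `key_for` and `position_of` are chosen for this sketch only.

```python
# Keys of each zone, ordered outward from the centre of the keyboard.
ZONES = {
    "falling left": "trewq",
    "falling right": "yuiop",
    "horizontal": "gfdsa",
    "vertical": "hjklm",  # M is counted last although it is on another row
    "hook": "nbvcx",
}

def key_for(stroke, position):
    """Key holding components with `position` repetitions of `stroke`."""
    keys = ZONES[stroke]
    if not 1 <= position <= len(keys):
        raise ValueError("position must be between 1 and 5")
    return keys[position - 1]

def position_of(key):
    """Return (stroke, position) for a key."""
    for stroke, keys in ZONES.items():
        if key in keys:
            return stroke, keys.index(key) + 1
    raise KeyError(key)
```

For example, a component made of two falling-right (dot) strokes is expected on the second key of YUIOP, which is U.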
Therefore, one would expect
Furthermore, each letter of each zone has one component associated with it, its "main component". These are usually complete characters in their own right (with the exception of X). One can always type this main component by typing the letter it is situated on four times. So, for example, the main component of H is 目, and typing hhhh produces 目.
Each letter also has a shortcut character associated with it. In some cases, this character is the same as the component associated with the key in question, and sometimes not. This shortcut character is the character produced when one types just the letter and nothing else; these are all extremely common characters used when typing Chinese.
It is entirely possible that there are a number of components not listed below, either because of oversight, because they are rarely used, or because no simple Unicode representation for the component exists.
QWERT zone (falling left)
- The Q key's main component is 金.
- The W key's main component and shortcut character are both 人.
- The E key's main component is 月.
- The R key's main component is 白.
- The T key's main component is 禾, and its shortcut character is 和.
YUIOP zone (falling right)
This zone might also be called the dot zone, because its pattern of Y: 讠 U: 冫 I: 氵 and O: 灬 is not necessarily built up of right falling strokes. In fact, one could argue that the first stroke in 灬 actually falls left. It is called the falling right zone because the keys in this zone, when used to construct a character by stroke (rather than component), all represent right falling strokes for some character configuration (see the section on disambiguation strokes for more information).
- The Y key's main component is 言.
- The U key's main component is 立.
- The I key's main component is 水.
- The O key's main component is 火.
- The P key's main component is 之.
ASDFG zone (horizontal)
- The A key's shortcut character is 工.
- The S key's main component is 木, and its shortcut character is 要.
- The D key's main component is 大, and its shortcut character is 在.
- The F key's main component is 土, and its shortcut character is 地. The main component's name (earth) correlates to the shortcut character, which also means earth.
- The G key's main component is 王, and its shortcut character is 一.
HJKLM zone (vertical)
- The H key's main component is 目, and its shortcut character is 上.
- The J key's main component is 日, and its shortcut character is 是.
- The K key's main component is 口, and its shortcut character is 中.
- The L key's main component is 田, and its shortcut character is 国.
- The M key's main component is 山, and its shortcut character is 同.
XCVBN zone (hook)
- The X key's main component is 纟, and its shortcut character is 经.
- The C key's main component is 又, and its shortcut character is 以.
- The V key's main component is 女, and its shortcut character is 发.
- The B key's main component is 子, and its shortcut character is 了.
- The N key's main component is 已, and its shortcut character is 民.
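The two single-key rules (a lone letter gives the shortcut character, the same letter four times gives the main component) can be sketched as a pair of lookups. Only the keys of the ASDFG, HJKLM and XCVBN zones listed above are filled in; the function name and the choice to return `None` for everything else are assumptions of this sketch.

```python
# Main components and shortcut characters as listed for three of the zones.
MAIN_COMPONENT = {
    "s": "木", "d": "大", "f": "土", "g": "王",
    "h": "目", "j": "日", "k": "口", "l": "田", "m": "山",
    "x": "纟", "c": "又", "v": "女", "b": "子", "n": "已",
}
SHORTCUT = {
    "a": "工", "s": "要", "d": "在", "f": "地", "g": "一",
    "h": "上", "j": "是", "k": "中", "l": "国", "m": "同",
    "x": "经", "c": "以", "v": "发", "b": "了", "n": "民",
}

def single_key_input(typed):
    """Resolve input that uses one key only; return None otherwise."""
    if len(typed) == 1:
        return SHORTCUT.get(typed)
    if len(typed) == 4 and len(set(typed)) == 1:
        return MAIN_COMPONENT.get(typed[0])
    return None
```

So `h` alone yields 上, while `hhhh` yields 目.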
Disambiguation strokes
The strokes of the keyboard are divided into 5 zones:
Zone | Keys | Stroke | Shape |
---|---|---|---|
1 | GFDSA | 一 | Left-right (horizontal) |
2 | HJKLM | 丨 | Top-bottom (vertical) |
3 | TREWQ | 丿 | Falling left |
4 | YUIOP | 丶 | Falling right |
5 | NBVCX | 乙 | Hook |
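The two-digit labels attached to keys elsewhere in this article (such as G11 or the "(42,u)" in the examples) appear to combine the zone number from this table with the key's position inside the zone. A short sketch under that assumption, with the hypothetical helper name `key_number`:

```python
# Zones 1 to 5, each listed from its first key to its last.
ZONE_KEYS = ["gfdsa", "hjklm", "trewq", "yuiop", "nbvcx"]

def key_number(key):
    """Return the two-digit zone/position number of a key, e.g. 11 for G."""
    key = key.lower()
    if len(key) != 1:
        raise ValueError("expected a single key")
    for zone, keys in enumerate(ZONE_KEYS, start=1):
        position = keys.find(key)
        if position != -1:
            return zone * 10 + position + 1
    raise KeyError(key)
```

The Z key carries no components, so it has no number and raises `KeyError`.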
Examples
Characters with 4 components or fewer (but no need for strokes)
[edit]Example 1: 请
Consists of three components: y (讠, radical #10), g (
Characters with more than four components
Example 2: 遗
Consists of five components: k (
Characters with fewer than 4 components (needing strokes)
Example 3a:
Example 3b:
Example 3c: 广: The code for this character is 'YYGT'. At first, you type the key where this character is located, which is a 'Y'. Then, you type a
Characters requiring disambiguation strokes
Example 4:
Consists of three components: t (
Disambiguation strokes: The last stroke is 丶 and the character has a top-bottom structure (42,u) →
Poem
A poem was made as a mnemonic for the Wubi keyboard, associating a few characters with each key. The first character is the corresponding key's main component, while the following ones are components or associated characters.
1986 version
G11 王旁青头戋五一
F12 土士二干十寸雨
D13 大犬三羊古石厂
S14 木丁西
A15 工戈草头右框七
H21 目具上止卜虎皮
J22 日早两竖与虫依
K23 口与川,字根稀
L24 田甲方框四车力
M25 山由贝,下框几
T31 禾竹一撇双人立,反文条头共三一
R32 白手看头三二斤
E33 月彡(衫)乃用家衣底
W34 人和八,登祭头
Q35 金勺缺点无尾鱼,犬旁留义儿一点夕,氏无七
Y41 言文方广在四一,高头一捺谁人去
U42 立辛两点六门疒(病)
I43 水旁兴头小倒立
O44 火业头,四点米
P45 之宝盖,摘示衣
N51 已半巳满不出己,左框折尸心和羽
B52 子耳了也框向上
V53 女刀九臼山朝西
C54 又巴马,丢矢矣
X55 慈母无心弓和匕,幼无力
1998 version
G11 王旁青头五夫一
F12 土干十寸未甘雨,不要忘了革字底
D13 大犬戊其古石厂
S14 木丁西甫一四里
A15 工戈草头右框七
H21 目上卜止虎具头
J22 日早两竖与虫依
K23 口流川,码元稀
L24 田甲方框四车里
M25 山由贝骨下框集
T31 禾竹反文双人立
R32 白斤气丘叉手提
E33 月用力豸毛衣臼
W34 人八登头单人几
Q35 金夕鸟儿犭边鱼
Y41 言文方点谁人去
U42 立辛六羊病门里
I43 水族三点鳖头小
O44 火业广鹿四点米
P45 之字宝盖补示衣
N51 已类左框心尸羽
B52 子耳了也乃框皮
V53 女刀九良山西倒
C54 又巴牛入马失蹄
X55 幺母贯头弓和匕
New-century (3rd-generation) version
G11 王旁青头五一提
F12 土士二干十寸雨
D13 大三肆头古石厂
S14 木丁西边要无女
A15 工戈草头右框七
H21 目止具头卜虎皮
J22 日曰两竖与虫依
K23 口中两川三个竖
L24 田框四车甲单底
M25 山由贝骨下框里
T31 禾竹牛旁卧人立
R32 白斤气头叉手提
E33 月舟衣力豕豸臼
W34 人八登祭风头几
Q35 金夕犭儿包头鱼
Y41 言文方点在四一
U42 立带两点病门里
I43 水边一族三点小
O44 火变三态广二米
P45 之字宝盖补示衣
N51 已类左框心尸羽
B52 子耳了也乃齿底
V53 女刀九巡录无水
C54 又巴甬矣马失蹄
X55 幺母绞丝弓三匕
In media
In 2020, the history of Wubi was featured in a Radiolab episode titled "The Wubi Effect".[3]
Notes and references
- [1] This is the name used in Mac OS X.
- [2] Wicentowski, Joe (1996), Wubizixing for Speakers of English, archived from the original on 10 July 2015.
- [3] Adler, Simon. "The Wubi Effect: Radiolab". WNYC Studios, 14 August 2020.