Mojikyō
Developer(s) | Tadahisa Ishikawa, Tokio Furuya, Mojikyō Institute |
---|---|
Initial release | 1.0 / July 1997 |
Final release | 4.0 / December 15, 2018 |
Operating system | Microsoft Windows |
Size | 51 MB |
Available in | Japanese |
Type | Character set bundled with fonts and a character map |
License | Proprietary |
Website | mojikyo |
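Mojikyō (Japanese: 今昔文字鏡, Konjaku Mojikyō) is a character set bundled with fonts and a character map, developed by the Mojikyō Institute.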
Mojikyō was conceptualized in 1996,[3] and the first version of the CD-ROM was released in July 1997.[4] For a time, the Mojikyō Institute also offered a web subscription called "Mojikyō WEB".
As of September 2006, Mojikyō encoded 174,975 characters,[6] of which 150,366 (≈86%) belonged to the extended Chinese–Japanese–Korean–Vietnamese (CJKV)[note 2] family.[5] Many of Mojikyō's characters are considered obsolete or obscure and are not encoded by any other character set, including Unicode, the most widely used international text-encoding standard.
Mojikyō was originally a paid proprietary software product. In 2015, the Mojikyō Institute began uploading its latest releases to the Internet Archive as freeware,[7] as a memorial to one of its developers, Tokio Furuya.
Premise
The Mojikyō encoding was created to provide a complete index of Chinese, Korean, and Japanese characters. It also encodes a large number of characters from ancient scripts, such as the oracle bone script, the seal script, and Sanskrit (Siddhaṃ). Many of its characters are encoded by no other character encoding, and its data is often used as a starting point for Unicode proposals.[8][9] However, Mojikyō's criteria for encoding a character are much looser than Unicode's, so it contains many glyphs of dubious, or even unintentionally fictional, origin.[10][11] Consequently, while many Mojikyō characters not yet in Unicode are suitable for addition, not all of them can be added, because the two require different standards of evidence.
Composition
The Mojikyō fonts (
Encoding
A character encoded in Mojikyō is often referred to in the format MJXXXXXX, similar to the U+XXXX format used for Unicode. For example, the hentaigana 𛀈 (HENTAIGANA LETTER I-3) is MJ090007 in Mojikyō and U+1B008 in Unicode.[13] Unlike Unicode's U+ notation, which is hexadecimal, the digits of an MJ number are decimal.
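Because the two notations use different bases, the same digit string names different numbers. The following Python sketch parses and formats both label styles. It assumes, as described above, that an MJ label is "MJ" followed by exactly six decimal digits and that a U+ label has four to six hexadecimal digits; the function names are hypothetical helpers, not part of any Mojikyō or Unicode tool.

```python
import re

# Assumption: an MJ label is "MJ" followed by exactly six decimal digits,
# matching the MJXXXXXX format described above.
MJ_LABEL = re.compile(r"MJ([0-9]{6})")
# A Unicode label is "U+" followed by four to six hexadecimal digits.
U_LABEL = re.compile(r"U\+([0-9A-Fa-f]{4,6})")


def parse_mj(label: str) -> int:
    """Return the number in an MJ label, read as a decimal integer."""
    match = MJ_LABEL.fullmatch(label)
    if match is None:
        raise ValueError(f"not an MJ label: {label!r}")
    return int(match.group(1), 10)


def parse_u(label: str) -> int:
    """Return the code point in a U+ label, read as a hexadecimal integer."""
    match = U_LABEL.fullmatch(label)
    if match is None:
        raise ValueError(f"not a U+ label: {label!r}")
    return int(match.group(1), 16)


def format_mj(number: int) -> str:
    """Format a number as an MJ label with six zero-padded decimal digits."""
    return f"MJ{number:06d}"


def format_u(code_point: int) -> str:
    """Format a code point as a U+ label with at least four uppercase hex digits."""
    return f"U+{code_point:04X}"


if __name__ == "__main__":
    mj = parse_mj("MJ090007")  # decimal digits -> 90007
    cp = parse_u("U+1B008")    # hexadecimal digits -> 110600
    print(mj, cp)              # 90007 110600
    print(format_mj(mj), format_u(cp))  # MJ090007 U+1B008
```

Reading "090007" as hexadecimal instead would give 0x90007 = 589,831 rather than 90,007, so any converter has to track which notation a label uses.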
Mojikyō has both influenced and been influenced by Unicode. Glyphs originating from Mojikyō first appeared in a proposal submitted on 18 April 2002[16] to the Ideographic Rapporteur Group (IRG),[note 8] which is responsible for maintaining all CJK blocks in Unicode.[14][15] In May 2007, Mojikyō played a minor role in the series of proposals that eventually led to the Tangut script being encoded in Unicode;[17][note 9] Mojikyō had already encoded 6,000 Tangut characters by October 2002.[6]
The Unicode Standard's Unihan Database refers to Mojikyō as the "Japanese KOKUJI Collection" (
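In the Unihan database, each ideograph's Japanese source is recorded in the kIRG_JSource field (see the notes), and the references attribute the JK-prefixed sources to Mojikyō. A minimal Python sketch that tallies kIRG_JSource values by prefix, assuming the tab-separated `U+XXXX<TAB>field<TAB>value` line layout of the Unihan text files and treating the text before the first hyphen as the prefix:

```python
from collections import Counter
from typing import Iterable


def count_jsource_prefixes(lines: Iterable[str]) -> Counter:
    """Tally kIRG_JSource values by prefix (the text before the first '-').

    Assumes Unihan's plain-text layout: one 'U+XXXX<TAB>field<TAB>value'
    record per line, with comment lines starting with '#'.
    """
    counts: Counter = Counter()
    for raw in lines:
        line = raw.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.split("\t")
        if len(fields) != 3 or fields[1] != "kIRG_JSource":
            continue  # skip malformed lines and other fields
        prefix = fields[2].split("-", 1)[0]
        counts[prefix] += 1
    return counts


if __name__ == "__main__":
    # Invented lines using private-use code points; NOT real Unihan entries.
    sample = [
        "# sample data for illustration",
        "U+E000\tkIRG_JSource\tJK-00001",
        "U+E001\tkIRG_JSource\tJK-00002",
        "U+E002\tkIRG_JSource\tJ4-0101",
        "U+E003\tkIRG_GSource\tG0-3021",
    ]
    counts = count_jsource_prefixes(sample)
    print(counts["JK"], counts["J4"])  # 2 1

    # Against a real Unihan file (file name assumed):
    # with open("Unihan_IRGSources.txt", encoding="utf-8") as f:
    #     print(count_jsource_prefixes(f)["JK"])
```

The cited posts report 782 JK-prefixed ideographs as of July 2020 (367 in Extension C and 415 in Extension E); the count from any given Unihan release may differ.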
Blocks
As of September 2006, Mojikyō encoded 174,975 characters,[6] of which 150,366 belonged to the extended CJKV[note 2] family.[5] Many of these characters are considered obsolete or otherwise obscure and are not encoded by any other character set, including Unicode. Each Mojikyō character has a unique number, and the characters are organized into blocks.
Mojikyō puts CJKV characters in different blocks according to their traditional Kangxi radical. Common radicals containing an especially high number of characters, such as Radicals 9 (
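Grouping characters by radical can be illustrated with radical-and-stroke values in the "radical.additional strokes" notation used by Unihan's kRSUnicode field (for example, 9.2 means radical 9, 人, plus two strokes). The Python sketch below is hypothetical: it does not reproduce Mojikyō's actual block layout or numbering, the within-block order (stroke count, then code point) is an assumption, and the sample values are supplied by hand rather than read from any Mojikyō file.

```python
from collections import defaultdict


def parse_radical_stroke(value: str) -> tuple[int, int]:
    """Split a 'radical.strokes' value such as '9.2' into (9, 2).

    Apostrophes that mark simplified radical forms (e.g. "120'.3")
    are stripped, so simplified forms share their Kangxi radical's block.
    """
    radical, strokes = value.split(".")
    return int(radical.rstrip("'")), int(strokes)


def group_by_radical(entries: list[tuple[str, str]]) -> dict[int, list[str]]:
    """Group (character, 'radical.strokes') pairs into per-radical blocks.

    Blocks are keyed by Kangxi radical number; within a block, characters
    are ordered by additional stroke count, then by code point.
    """
    blocks = defaultdict(list)
    for char, value in entries:
        radical, strokes = parse_radical_stroke(value)
        blocks[radical].append((strokes, ord(char), char))
    return {
        radical: [char for _, _, char in sorted(items)]
        for radical, items in sorted(blocks.items())
    }


if __name__ == "__main__":
    sample = [("池", "85.3"), ("休", "9.4"), ("人", "9.0"),
              ("水", "85.0"), ("他", "9.3"), ("仁", "9.2")]
    for radical, chars in group_by_radical(sample).items():
        print(radical, "".join(chars))
    # 9 人仁他休
    # 85 水池
```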
No unification
Unlike Unicode, Mojikyō deliberately avoids Han unification. It makes no attempt to keep the encoding compact or, as Unicode does, to keep all common characters below U+FFFF.
Unicode, by contrast, assigns its CJK ideographs to blocks largely by how common they are: the most common generally go into the Basic Multilingual Plane,[note 14] while rare or obscure ones go into the supplementary planes.
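Keeping common characters below U+FFFF has a concrete storage benefit: in UTF-16, a Basic Multilingual Plane character takes one 16-bit code unit, while a supplementary-plane character needs a surrogate pair of two units (in UTF-8, BMP ideographs take three bytes and supplementary ones four). A minimal Python sketch of the plane calculation and the standard UTF-16 surrogate-pair arithmetic, using the hentaigana U+1B008 from the Encoding section as the supplementary example:

```python
def plane(code_point: int) -> int:
    """Return the Unicode plane (0 to 16) that contains a code point."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError(f"outside the Unicode code space: {code_point:#x}")
    return code_point >> 16  # each plane holds 0x10000 code points


def utf16_units(code_point: int) -> list[int]:
    """Encode one code point as a list of UTF-16 code units."""
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogate code points cannot be encoded on their own")
    if plane(code_point) == 0:
        return [code_point]               # BMP: a single 16-bit unit
    offset = code_point - 0x10000         # 20-bit value
    high = 0xD800 + (offset >> 10)        # top 10 bits -> high surrogate
    low = 0xDC00 + (offset & 0x3FF)       # bottom 10 bits -> low surrogate
    return [high, low]


if __name__ == "__main__":
    for cp in (0x4EBA, 0x1B008):          # 人 (BMP) and the hentaigana U+1B008
        units = " ".join(f"{u:04X}" for u in utf16_units(cp))
        print(f"U+{cp:04X}: plane {plane(cp)}, UTF-16 {units}")
    # U+4EBA: plane 0, UTF-16 4EBA
    # U+1B008: plane 1, UTF-16 D82C DC08
```

The surrogate pair computed for U+1B008 matches what Python's own "utf-16-be" codec produces for that character.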
For example, Radical 9 has two characters where Unicode has one: MJ054435 (
License
Mojikyō is proprietary software under a restrictive license. Originally, the Mojikyō Institute tried to prevent its character data from being used, and threatened people who published conversion tables to and from its character set. In July 2010, it abandoned its legal efforts to stop at least one Japanese user from publishing conversion tables and from converting characters encoded in Mojikyō to Unicode or other character sets.[23] In many jurisdictions, mere data, sometimes including the shapes of letters, is considered common property because it does not meet the threshold of originality.[note 16]
Because of this history, however, GlyphWiki still disallowed Mojikyō data as of 2020.[24]
Collected writing systems
Living
- Chinese — Hanzi
- Japanese — Kanji, Kana (including Hentaigana)
- Korean — Hanja
- Latin alphabet with diacritics
- Cyrillic script with diacritics
Dead or obsolete
- Ancient Chinese
- Taiwanese kana
- Vietnamese — Chữ Nôm
- Sanskrit — Siddhaṃ
- Tangut script
- Sui script
See also
References
- ^ "今昔文字鏡について" [About Mojikyō]. Mojikyō Institute (in Japanese). Archived from the original on 3 February 2001. Retrieved 6 July 2020.
- ^ ようこそ、今昔文字鏡の世界へ! [Welcome to the world of Mojikyō!] (in Japanese). Kinokuniya KK. Archived from the original on 4 March 2005. Retrieved 5 July 2020.
- ^ a b c Ishikawa, Tadahisa (August 2015). "古家時雄君を悼む" [Tokio Furuya, we grieve your death]. Mojikyō Institute (in Japanese). Retrieved 8 July 2020.
- ^ Konjaku Mojikyō 今昔文字鏡 (in Japanese), July 1997, ISBN 9784314900034.
- ^ a b c 今昔文字鏡とは [About Mojikyo] (in Japanese). Kinokuniya KK. Archived from the original on 27 April 2010. Retrieved 5 July 2020.
- ^ a b c 今昔文字鏡とは [What is Mojikyō?] (in Japanese). Kinokuniya KK. Archived from the original on 5 February 2005. Retrieved 5 July 2020.
- ^ "Search: creator:"MOJIKYO Institute"". Internet Archive. Retrieved 6 July 2020.
- ^ Takada, Tomokazu; Yada, Tsutomu; Saito, Tatsuya (18 September 2015). Proposal for hentaigana (PDF). Translated by Kobayashi, Tatsuo; Kobayashi, Daniel. Information Processing Society of Japan. L2/15-239. Retrieved 5 July 2020 – via Unicode Consortium.
- ^ Hiura, Hideki; Kobayashi, Tatsuo; et al. (31 October 2003). Ideograph Variation Selector and Variation Collection Identifier. Open Internationalization Initiative. L2/03-413. Retrieved 5 July 2020 – via Unicode Consortium.
- ^ Takada, Tomokazu [高田智和]; Oda, Tetsuji [織田哲治]; et al. (26 August 2013). 平成25年度第3回文字情報検討サブワーキンググループ議事録 [Meeting Minutes of the Third Character Information Examination Sub-Working Group of 2013 (Heisei 25)] (PDF). Information Technology Promotion Agency, Government of Japan (in Japanese). p. 2. Retrieved 6 July 2020. 文字鏡研究会の関係者にヒアリングしたところ、オランダから提案されたWG2 N36981には文字鏡のフォントが使用されているが、文字鏡研究会は関与しておらず、提案内容についても疑問があるとのことであった。 [According to an interview with a representative of the Mojikyō Institute, a Mojikyō font is used in WG2 N36981 proposed by the Netherlands, but the Mojikyō Institute itself is not involved with the proposal; it furthermore has doubts about some of the content of that proposal.]
- ^ a b Suzuki, Toshiya [鈴木俊哉] (30 July 2009). 統合漢字に申請された「殷周金文集成引得」図形文字の調査 [Investigation on Glyphs collected from "Index to Collection of Inscriptions of the Yin-Zhou Period" to submit to CJK Unified Ideographs]. IPSJ SIG Technical Report (in Japanese). 2009-DD-72 (7). Information Processing Society of Japan: 2 – via Internet Archive. しかし、拡張Cの標準化作業が8年の長期にわたり、また事後的に用例が必須とされたため、正式に公布された拡張C漢字の典拠は当初の典拠とはかなり異なるものとなっている。たとえば日本では当初は文字鏡研究会によって選定された1000文字程度の漢字を申請していた[。] [...]典拠用例確認は文字鏡とは独立に行なわれたため、字形が文字鏡漢字から変更されたものも多い。 [As the standardization effort for CJK Unified Ideographs Extension C has been eight long years in the making and examples of kanji have been requested after their encoding, the officially promulgated Extension C kanji standard is quite different from the original standard. For example, we, the Government of Japan, initially applied for about 1,000 kanji selected by the Mojikyō Institute[.] [...] Since the verification of the kanji was performed independently of the Mojikyō Institute, the character shapes were often changed from Mojikyō's version of that same codepoint.]
- ^ Ishikawa, Tadahisa (25 May 1999). "パソコン悠悠漢字術 今昔文字鏡徹底活用" [Kanji on your PC, Made Easy: The Complete Mojikyō Manual]. Mojikyō Institute. Retrieved 6 July 2020.
- ^ MJ文字情報一覧表 [Table of MJ Character Encodings] (in Japanese). Information Technology Promotion Agency. Archived from the original on 29 September 2018. Retrieved 5 July 2020.
- ^ "Unicode Standard Annex #45: U-source Ideographs". The Unicode Standard. Unicode Consortium.
- ^ "Appendix E: Han Unification History" (PDF). The Unicode Standard. Unicode Consortium. March 2020.
- ^ "CJK Extension C1 From Japan". Ideographic Rapporteur Group. IRG#19 N895 (N895-Japan_C1) – via The Chinese University of Hong Kong's Department of Computer Science and Engineering.
- ^ Cook, Richard (9 May 2007). Proposal to encode Tangut characters in UCS Plane 1 (PDF). UC Berkeley Script Encoding Initiative. p. 4. L2/07-143 – via Unicode Consortium.
- ^ Jenkins, John H.; Cook, Richard; Lunde, Ken, eds. (5 March 2020), "kIRG JSource", Unicode Standard Annex #38, Unicode Consortium.
- ^ Kobayashi, Tatsuo (3 December 2001). "List of Japanese Ideographs which may be proposed in Extension-C". ISO/IEC JTC1/SC2/WG2/IRG N853.
- ^ a b Ken Lunde [@ken_lunde] (6 July 2020). "In particular, all 782 JK-prefixed ideographs are indeed from 今昔文字鏡 per IRG N862. Most were encoded in #ExtensionC, and the stragglers were encoded in #ExtensionE." (Tweet). Retrieved 6 July 2020 – via Twitter.
- ^ Ken Lunde [@ken_lunde] (7 July 2020). "JK-prefixed J-Source ideographs came from 今昔文字鏡, which are in Extensions C and E (the mention of Extension D was simply that what became Extension E was originally targeted to become Extension D)" (Tweet). Archived from the original on 7 July 2020. Retrieved 6 July 2020 – via Twitter.
- ^ Ken Lunde [@ken_lunde] (7 July 2020). "367 JK-prefixed ideographs are in Extension C, and the remaining 415 are in Extension E." (Tweet). Retrieved 6 July 2020 – via Twitter.
- ^ "終戦宣言" [Announcement: The War is Over]. 青蛙亭漢語塾 [Seiwatei's Kanji Cram School] (in Japanese) (28 January 2016 ed.). 21 July 2010. Retrieved 7 July 2020.
- ^ "データ・記事のライセンス" [License of our data and articles]. GlyphWiki (9 June 2010 ed.). Retrieved 6 July 2020. 今昔文字鏡およびその関連製品、データは、そのライセンス上グリフウィキには用いることができません。文字鏡番号(独自部分)および文字鏡のフォントに収録されているグリフそのもの、およびそれを参照、利用して作成していると判断できる情報は、グリフウィキに登録する際の典拠とすることはできませんので、ご協力をお願いいたします。 [Konjaku Mojikyō and related products and associated data are licensed in such a way that they are incompatible with our above GlyphWiki license. Neither the number of the Mojikyō encoding slot, nor the appearance of the glyph itself in Mojikyō's fonts, nor any information whatsoever that can be judged to have been gathered by referring to a Mojikyō product, can be used when entering data into GlyphWiki. We absolutely cannot accept Mojikyō data. Please cooperate with us.]
Notes
- ^ This character does not yet have a Unicode encoding, so it is approximated here with CSS and U+30BB セ KATAKANA LETTER SE.
- ^ a b For Korean, this refers to Hanja; for Vietnamese, to Chữ Nôm.
- ^ Download the file MojikyoCmap400ALL49TTF.7z from the official website
- ^ English name from the title of the window produced by running the executable; Japanese name from the icon of the executable.
- ^ Also called the "Mojikyō Cmap".
- ^ See the screenshots on the official website
- ^ Into the system fonts directory C:\Windows\Fonts.
- ^ As of 2019, the IRG rebranded as the Ideographic Research Group.
- ^ The history of the encoding of the Tangut script is quite complicated; see Tangut (Unicode block) § History for a full listing of all the related proposals and a timeline.
- ^ Ideographic Description Sequence: ⿰魚嵐
- ^ This is a column name in the Unihan database; ⟨J⟩ here is short for "Japanese glyph source". The full name of the column is kIRG_JSource. Under Han unification, there are nine such sources. See §3.1 of UAX #38 for a complete list and more information.
- ^ Other J-Source prefixes exist, such as J4, meaning the character originates from JIS X 0213:2004.
- ^ That is to say, a glyph made up of the same radicals in the same positions.
- ^ a b Errors in large collections of ideographs are, of course, not uncommon. Such errors occur accidentally even in well-funded, government-produced collections, such as the famous kanji of unknown origin in the Japanese Industrial Standards Committee's JIS X 0208 double-byte character encoding standard. All of these JIS X 0208 error kanji (ghost characters, 幽霊文字; e.g., 彁) have made their way into Unicode despite not being "real" kanji.
- ^ For proof, see the list in the Mojikyō Character Map, MOCHRMAP.EXE.
- ^ See also: fictitious entry; trap street.
External links