Ideographic Description Characters | |
---|---|
Range | U+2FF0–U+2FFF (16 code points) |
Plane | Basic Multilingual Plane (BMP) |
Scripts | Common |
Assigned | 16 code points |
Unused | 0 reserved code points |
Source standards | GBK |
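Unicode version history | |
3.0 | 12 (+12) |
15.1 | 16 (+4) |
Note: When version 15.1 was prepared, the block had only four unassigned code points left, so the fifth new IDC was assigned U+31EF in the CJK Strokes block.[1][2][3] |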
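Ideographic Description Characters (IDC) were introduced in Unicode 3.0, and version 15.1 added five more.
Unicode encodes Han characters broadly by first collecting characters and then assigning each one a numeric code. However, the number of Han characters is enormous, so any collected set tends to be incomplete. Moreover, because Han characters are compositional and open-ended, users can readily coin new ones, so no repertoire can ever contain every character. Ideographic description characters therefore exist to describe how a given character is assembled from more basic components.
Code charts
Ideographic Description Characters [1][2] Official Unicode Consortium code chart (PDF) | ||||||||||||||||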
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F | |
U+2FFx | ⿰ | ⿱ | ⿲ | ⿳ | ⿴ | ⿵ | ⿶ | ⿷ | ⿸ | ⿹ | ⿺ | ⿻ | ⿼ | ⿽ | ⿾ | ⿿ |
CJK Strokes [1][2] Official Unicode Consortium code chart (PDF) | ||||||||||||||||
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F | |
U+31Cx | ㇀ | ㇁ | ㇂ | ㇃ | ㇄ | ㇅ | ㇆ | ㇇ | ㇈ | ㇉ | ㇊ | ㇋ | ㇌ | ㇍ | ㇎ | ㇏ |
U+31Dx | ㇐ | ㇑ | ㇒ | ㇓ | ㇔ | ㇕ | ㇖ | ㇗ | ㇘ | ㇙ | ㇚ | ㇛ | ㇜ | ㇝ | ㇞ | ㇟ |
U+31Ex | ㇠ | ㇡ | ㇢ | ㇣ | | | | | | | | | | | | ㇯ |
Unicode also encodes U+303E 〾 IDEOGRAPHIC VARIATION INDICATOR, which marks a character that resembles, but is not identical to, the one described.
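Because the description characters are split across two blocks, a membership test cannot be a single range check. The following is a minimal sketch, assuming the Unicode 15.1 repertoire described in this article; the names `IDC_BLOCK`, `IDC_SUBTRACTION`, `is_idc`, and `ALL_IDCS` are illustrative and not part of any library.

```python
# Assumption: the IDC repertoire as of Unicode 15.1.
IDC_BLOCK = range(0x2FF0, 0x3000)   # U+2FF0..U+2FFF, the dedicated block
IDC_SUBTRACTION = 0x31EF            # the one IDC placed in CJK Strokes


def is_idc(ch: str) -> bool:
    """Return True if the single character ch is a description character."""
    cp = ord(ch)
    return cp in IDC_BLOCK or cp == IDC_SUBTRACTION


# All description characters in code point order.
ALL_IDCS = [chr(cp) for cp in IDC_BLOCK] + [chr(IDC_SUBTRACTION)]
```

Note that U+303E is deliberately excluded here: it is a variation indicator, not a description character.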
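Ideographic Description Sequences
An Ideographic Description Sequence (IDS) is the syntax defined by the Unicode Standard for describing the structure of Han characters. A sequence consists of one description character followed by the required number of operands (usually Han characters, each of which may itself be a nested sequence), and it represents the abstract structure of a character.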
Unicode defines 17 description characters:
Code point | Character | Meaning | Example | Sequence | Example | Sequence |
---|---|---|---|---|---|---|
U+2FF0 | ⿰ | Left to right | 相 | ⿰木目 | 𠁢 | ⿰丨㇍ |
U+2FF1 | ⿱ | Above to below | 杏 | ⿱木口 | 𪧷 | ⿱夕寸 |
U+2FF2 | ⿲ | Left to middle and right | 衍 | ⿲彳氵亍 | 𠂗 | ⿲丿夕乚 |
U+2FF3 | ⿳ | Above to middle and below | 京 | ⿳亠口小 | 𠋑 | ⿳亼目口 |
U+2FF4 | ⿴ | Full surround | 回 | ⿴囗口 | 𠀬 | ⿴㐁人 |
U+2FF5 | ⿵ | Surround from above | 凰 | ⿵几皇 | 𧓉 | ⿵齊虫 |
U+2FF6 | ⿶ | Surround from below | 义 | ⿶乂丶 | 𱐍 | ⿶凵米 |
U+2FF7 | ⿷ | Surround from left | 匠 | ⿷匚斤 | 𧆬 | ⿷虎九 |
U+2FF8 | ⿸ | Surround from upper left | 病 | ⿸疒丙 | 𤆯 | ⿸耂火 |
U+2FF9 | ⿹ | Surround from upper right | 戒 | ⿹戈廾 | 𢧌 | ⿹或壬 |
U+2FFA | ⿺ | Surround from lower left | 超 | ⿺走召 | 𥘶 | ⿺礼分 |
U+2FFB | ⿻ | Overlaid | 巫 | ⿻工从 | 𣏃 | ⿻木⿻コ一 |
U+2FFC | ⿼ | Surround from right | 丑 | ⿼ユ十 | 𫜹 | ⿼コ一 |
U+2FFD | ⿽ | Surround from lower right | 斗 | ⿽十⺀ | 𠥼 | ⿽十十 |
U+2FFE | ⿾ | Horizontal reflection | 臦 | ⿰臣臣 | 𨙨 | ⿾邑 |
U+2FFF | ⿿ | Rotation | 𮗙 | ⿺見鬼 | 𰒥 | ⿱戈戈 |
U+31EF | ㇯ | Subtraction | 口 | ㇯曰一 | 𠀃 | ㇯且二 |
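An IDS uses prefix notation: the operator comes first, followed by the number of operands it requires. Because every operator has a fixed number of operands, the order of operations is unambiguous without parentheses or other delimiters.
In the Unicode Standard, a well-formed IDS consists of ideographs, radicals, strokes, and the fullwidth question mark (U+FF1F), joined by IDCs.[4]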
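The prefix notation described above can be parsed with a short recursive routine. The sketch below is illustrative only: `ARITY` and `parse_ids` are hypothetical names, the operand counts assume the Unicode 15.1 definitions (three operands for U+2FF2 and U+2FF3, one for U+2FFE and U+2FFF, two for the rest), and it does not check that each operand is actually an ideograph, radical, or stroke.

```python
# Assumption: operand counts per description character as of Unicode 15.1.
ARITY = {chr(cp): 2 for cp in range(0x2FF0, 0x2FFE)}
ARITY["\u2FF2"] = ARITY["\u2FF3"] = 3   # three-part layouts
ARITY["\u2FFE"] = ARITY["\u2FFF"] = 1   # reflection and rotation
ARITY["\u31EF"] = 2                     # subtraction


def parse_ids(ids: str):
    """Parse an IDS into a nested tuple (operator, operand, ...).

    A plain component is returned as a one-character string.
    Raises ValueError if operands are missing or characters are left over.
    """
    def parse_at(i: int):
        if i >= len(ids):
            raise ValueError("sequence ends before all operands are supplied")
        ch = ids[i]
        if ch not in ARITY:
            return ch, i + 1            # leaf component
        children = []
        i += 1
        for _ in range(ARITY[ch]):      # read exactly as many operands as needed
            child, i = parse_at(i)
            children.append(child)
        return (ch, *children), i

    tree, end = parse_at(0)
    if end != len(ids):
        raise ValueError(f"unexpected trailing characters: {ids[end:]!r}")
    return tree


print(parse_ids("⿻木⿻コ一"))  # ('⿻', '木', ('⿻', 'コ', '一'))
```

Python strings are indexed by code point, so supplementary-plane components need no special handling in this sketch.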
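Limitations
- Unicode defines a well-formed IDS as a prefix expression but does not prescribe a unique description for each character, so one character may have several valid sequences. For example, "巫" can be written as "⿻工从" or "⿻工⿰人人",
and "鸂" can be written as "⿰氵鷄" or "⿰溪鳥".
- The main purpose of an IDS is to express the abstract structure of a character, not to compose glyphs dynamically the way combining characters do. In practice, drawing a composed character involves many complications, and an IDS alone is not enough to render a glyph of acceptable quality. For example, the top-to-bottom or left-to-right proportions of a composed character are often not 1:1 but depend on the actual shapes of the two components; the proportions for upper-left/lower-right and three-sided surround layouts are harder still to compute; and overlaid components can only be read correctly with general knowledge of Han characters. "⿻工从", for instance, means placing one "人" in each of the left and right openings of "工", not simply stacking "工" on top of "从".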
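The lack of a unique description noted above can be made concrete. The sketch below uses a small hand-written decomposition table, which is a hypothetical stand-in for real IDS data, to expand both descriptions of "鸂" down to the same three components; `DECOMPOSITION` and `expand` are illustrative names.

```python
# Hypothetical decomposition table, written by hand for this illustration only.
DECOMPOSITION = {
    "鷄": "⿰奚鳥",
    "溪": "⿰氵奚",
    "从": "⿰人人",
}


def expand(ids: str, table=DECOMPOSITION) -> str:
    """Recursively replace every component that has an entry in table.

    Substituting a whole sequence for a single operand keeps the
    prefix expression well formed.
    """
    return "".join(expand(table[ch], table) if ch in table else ch for ch in ids)


a = expand("⿰氵鷄")   # ⿰氵⿰奚鳥
b = expand("⿰溪鳥")   # ⿰⿰氵奚鳥
print(a == b)          # False
```

Both results use the components 氵, 奚, and 鳥 in the same left-to-right order, yet the strings differ because the operands are grouped differently. Comparing sequences as plain strings is therefore not a reliable equivalence test, even after full expansion.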
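History
The following Unicode-related documents record the proposals for, and the finalization of, the characters in this block.
Version | Final code points[a] | Count | UTC ID | L2 ID | WG2 ID | IRG ID | Document |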
---|---|---|---|---|---|---|---|
3.0 | U+2FF0..2FFB | 12 | X3L2/95-111 | N1284 | Ideographic Structure Symbol (additional request), 1995-11-07 | ||
N1303 (html, doc) | Umamaheswaran, V. S.; Ksar, Mike, 8.13 Ideographic structure symbols, Minutes of Meeting 29, Tokyo, 1996-01-26 | ||||||
N1348 | Ideographic Components and Composition Scheme, 1996-02-05 | ||||||
N1357 | Revised Ideographic Structure Symbols, 1996-04-12 | ||||||
N1353 | Umamaheswaran, V. S.; Ksar, Mike, 9, Draft minutes of WG2 Copenhagen Meeting # 30, 1996-06-25 | ||||||
L2/97-026 | N1494 | IRG proposal: Ideographic structure character, 1996-06-27 | |||||
N1430 | N365 | Proposal Summary Form: Ideographic Structure Character, 1996-08-01 | |||||
N1453 | Ksar, Mike; Umamaheswaran, V. S., 9.6 Ideographic Structure Characters, WG 2 Minutes - Quebec Meeting 31, 1996-12-06 | ||||||
L2/97-023 | N1486 | N437 | IRG #8 Resolutions, 1997-01-16 | ||||
N1489 | Supplement to Ideographic Components and Composition Schemes, 1997-01-16 | ||||||
N1490 | N436 | Response to WG2 question on Ideographic Structure Characters, 1997-01-16 | |||||
L2/97-030 | N1503 (pdf, doc) | Umamaheswaran, V. S.; Ksar, Mike, 9.6, Unconfirmed Minutes of WG 2 Meeting #32, Singapore; 1997-01-20--24, 1997-04-01 | |||||
L2/97-114 | N1544 (html, doc ) | N453 | Sato, T. K., Questions on the "Han structure method" described in WG2 N1490 (IRG N436), 1997-04-08 | ||||
L2/97-255R | Aliprand, Joan, 4.B.2 Ideographic Structure Characters, Approved Minutes - UTC #73 & L2 #170 joint meeting, Palo Alto, CA - August 4-5, 1997, 1997-12-03 | ||||||
N1680 | Project Sub-Division Proposal on Scheme of Ideograph Description Sequence, 1997-12-18 | ||||||
N1782 | Clause X Ideographic Description Sequence (IDS) – IRG N575, 1998-05-06 | ||||||
L2/98-158 | Aliprand, Joan; Winkler, Arnold, SC2 SC2 Action re Ideographic Description Sequences, Draft Minutes - UTC #76 & NCITS Subgroup L2 #173 joint meeting, Tredyffrin, Pennsylvania, April 20-22, 1998, 1998-05-26 | ||||||
N1842 | Proposed text for a Draft for amendment 28 - Ideographic Description Sequences, 1998-06-03 | ||||||
L2/98-286 | N1703 | Umamaheswaran, V. S.; Ksar, Mike, 9.5, Unconfirmed Meeting Minutes, WG 2 Meeting #34, Redmond, WA, USA; 1998-03-16--20, 1998-07-02, The original proposal was to use character composition. It has changed from being composition to description over its three year development. | |||||
L2/98-317 | N1892 (pdf, doc ) | Combined CD registration and consideration ballot on WD for 10646-1/Amd. 28, AMENDMENT 28: Ideographic description characters, 1998-10-22 | |||||
L2/99-010 | N1903 (pdf, html , doc ) | Umamaheswaran, V. S., 10.3, Minutes of WG 2 meeting 35, London, U.K.; 1998-09-21--25, 1998-12-30 | |||||
L2/99-072.1 | N1971 | Irish Comments on SC 2 N 3186, 1999-01-19 | |||||
L2/99-072 | N1970 (html, doc) | Summary of Voting on SC 2 N 3186, PDAM ballot on WD for 10646-1/Amd. 28: Ideographic description characters, 1999-02-05 | |||||
N2023 | Paterson, Bruce, FPDAM 28 Text - Ideographic Description Characters, 1999-04-06 | ||||||
L2/99-120 | Text for FPDAM ballot of ISO/IEC 10646, Amd. 28 - Ideographic description characters, 1999-04-07 | ||||||
UTC/1999-014 | Jenkins, John, Recursion depth limit for IDC's, 1999-06-01 | ||||||
UTC/1999-015 | Whistler, Ken, Re: Brief note on length of ideograph descriptions, 1999-06-01 | ||||||
UTC/1999-020 | Jenkins, John, Diagram and language [for Ideograph Description Sequences], 1999-06-04 | ||||||
L2/99-176R | Moore, Lisa, Recursion Limit for Ideographic Description Characters, Minutes from the joint UTC/L2 meeting in Seattle, June 8-10, 1999, 1999-11-04 | ||||||
L2/99-232 | N2003 | Umamaheswaran, V. S., 6.1.2 PDAM28 - Ideographic Description Characters, Minutes of WG 2 meeting 36, Fukuoka, Japan, 1999-03-09--15, 1999-08-03 | |||||
L2/99-253 | N2067 | Summary of Voting on SC 2 N 3312, ISO 10646-1/FPDAM 28 - Ideographic description characters, 1999-08-19 | |||||
L2/99-301 | N2123 | Disposition of Comments Report on SC 2 N 3312, ISO/IEC 10646-1/FPDAM 28, AMENDMENT 28: Ideographic description characters, 1999-09-20 | |||||
L2/99-302 | N2124 | Paterson, Bruce, Revised Text for FDAM ballot of ISO/IEC 10646-1/FDAM 28, AMENDMENT 28: Ideographic description characters, 1999-09-24 | |||||
L2/00-010 | N2103 | Umamaheswaran, V. S., 6.4.3, Minutes of WG 2 meeting 37, Copenhagen, Denmark: 1999-09-13--16, 2000-01-05 | |||||
L2/00-045 | Summary of FDAM voting: ISO 10646 Amd. 28: Ideographic description characters, 2000-01-31 | ||||||
L2/02-221 | N2480 | Cook, Richard, Proposal to add Ideographic Description Characters (IDC) to the UCS, 2002-05-18 | |||||
L2/02-436 | N2534 | N955 | IRG Radical Classification, 2002-11-21 | ||||
L2/12-087 | Proposed Changes to ISO/IEC 10646 Annex I, Ideographic Description Characters, 2012-02-09 | ||||||
L2/12-007 | Moore, Lisa, Consensus 130-C13, UTC #130 / L2 #227 Minutes, 2012-02-14, Submit L2/12-087 on extensions to ideographic description sequences to WG2. | ||||||
L2/15-065 | Jenkins, John, Proposal to Add IDS Links to Online Unihan Database, 2015-02-02 | ||||||
L2/15-070 | Davis, Mark, IDS in Unihan, 2015-02-03 | ||||||
L2/15-313 | Lunde, Ken, Request for IDS Data, 2015-11-03 | ||||||
References
- [1] U+2FF0-2FFF (PDF). The Unicode Standard. Retrieved 2023-10-04.
- [2] U+31EF (PDF). The Unicode Standard. Retrieved 2023-10-04.
- [3] Enumerated Versions of The Unicode Standard. The Unicode Standard. Retrieved 2016-07-09.
- [4] The Unicode Standard, Version 6.0 – Core Specification (PDF). Retrieved 2020-02-10.