  • Code Page

    1,000+ reads 2011-05-05 14:57:00

    Code page

    From Wikipedia, the free encyclopedia

    Code page is another name for character encoding. It consists of a table of values that describes the character set for a particular language. The term code page originated from IBM's EBCDIC-based mainframe systems,[1] but many vendors use this term, including Microsoft, SAP,[2] and Oracle Corporation.[3] Vendors often allocate their own code page number to a character encoding, even if it is better known by another name (for example, the UTF-8 character encoding has code page number 1208 at IBM, 65001 at Microsoft, and 4110 at SAP).


    The code page numbering system

    IBM introduced the concept of systematically assigning a small, but globally unique, 16-bit number to each character encoding that a computer system or collection of computer systems might encounter. The IBM origin of the numbering scheme is reflected in the fact that the smallest (first) numbers are assigned to variations of IBM's EBCDIC encoding and slightly larger numbers refer to variations of IBM's extended ASCII encoding as used in its PC hardware.

    With the release of PC-DOS version 3.3 (and the near identical MS-DOS 3.3) IBM introduced the code page numbering system to regular PC users, as the code page numbers (and the phrase "code page") were used in new commands to allow the character encoding used by all parts of the OS to be set in a systematic way.[4]

    After IBM and Microsoft ceased to cooperate in the 1990s, the two companies maintained the list of assigned code page numbers independently of each other, resulting in some conflicting assignments. At least one third-party vendor (Oracle) also has its own different list of numeric assignments.[5] IBM's current assignments are listed in their CCSID repository. Microsoft's assignments seem not to be documented anywhere, but a list of the names and approximate IANA abbreviations for the installed code pages on any given Windows machine can be found in the Registry on that machine (this information is used by Microsoft programs such as Internet Explorer).

    Most well-known code pages, excluding those for the CJK languages and Vietnamese, fit all their code-points into 8 bits and do not involve anything more than mapping each code-point to a single character; furthermore, techniques such as combining characters, complex scripts, etc., are not involved.

    The text mode of standard (VGA-compatible) PC graphics hardware is built around using an 8-bit code page, though it is possible to use two at once with some color depth sacrifice, and up to 8 may be stored in the display adaptor for easy switching [1]. There was a selection of third-party code page fonts that could be loaded into such hardware. It is now commonplace, however, for operating system vendors to provide their own character encoding and rendering systems that run in a graphics mode and bypass this hardware limitation entirely. The system of referring to character encodings by a "code page" number nevertheless remains applicable, as an efficient alternative to string identifiers such as those specified by the IETF and IANA for use in various protocols such as e-mail and web pages.

    Relationship to ASCII

    The vast majority of code pages in current use are supersets of ASCII, a 7-bit code representing 128 characters and control codes. In the distant past, 8-bit implementations of the ASCII code often either set the top bit to zero, or used it as a parity bit in network data transmissions. When this bit was instead made available for representing character data, another 128 characters and control codes could be represented. Most vendors (including IBM) used this extended range to encode characters used by various languages and/or graphical elements that allowed the imitation of primitive graphics on text-only output devices. No formal standard existed for these "extended character sets"; IBM merely referred to the variants as code pages, as it had always done for variants of EBCDIC encodings. Other vendors such as DEC and Apple did the same, although they did not call them "code pages".
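
The relationship described above can be checked with a short sketch. It assumes Python and the codecs bundled with its standard library (`cp437`, `cp850`, `cp1252`, `latin-1`); the helper name `decode_under` is made up for this illustration. The ASCII range decodes identically under every code page, while a byte with the top bit set does not.

```python
# Decode the same bytes under several code pages shipped with Python.
CODE_PAGES = ["cp437", "cp850", "cp1252", "latin-1"]

def decode_under(data: bytes, code_pages=CODE_PAGES) -> dict:
    """Return {code page name: decoded text} for the given bytes."""
    return {cp: data.decode(cp) for cp in code_pages}

# Bytes below 0x80 are plain ASCII in all of these code pages.
ascii_view = decode_under(b"Code page")

# A byte with the top bit set means something different in each one.
high_view = decode_under(bytes([0xE9]))

for cp in CODE_PAGES:
    print(f"{cp:8} ASCII={ascii_view[cp]!r} 0xE9={high_view[cp]!r}")
```

The output makes the point of the section visible: one column is constant, the other varies with the code page chosen.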

    Relationship to Unicode

    Unicode is an effort to include all characters from previous code pages into a single character enumeration that can be used with a number of encoding schemes. In the process, duplicate characters are eliminated and new variants are introduced, like Fullwidth ASCII. Consistent use of any single Unicode encoding would theoretically eliminate the need to keep track of different code pages or character encodings. That need remains, however, because several encodings of Unicode exist and because compatibility with existing documents and systems that use the older encodings must be preserved. In practice the various Unicode character set encodings have simply been assigned their own code page numbers, and all the other code pages have been technically redefined as encodings for various subsets of Unicode.
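
Treating every code page as an encoding of a subset of Unicode suggests a simple way to convert between two code pages: decode to Unicode, then encode again. The sketch below assumes Python's standard codecs `cp1251` and `cp866`; the function name `transcode` is hypothetical and not part of any library.

```python
def transcode(data: bytes, source: str, target: str) -> bytes:
    """Convert bytes between two code pages by going through Unicode (str)."""
    return data.decode(source).encode(target)

text = "Привет"
as_cp1251 = text.encode("cp1251")                    # Windows Cyrillic
as_cp866 = transcode(as_cp1251, "cp1251", "cp866")   # DOS Cyrillic

print(as_cp1251.hex(), as_cp866.hex())     # same text, different bytes
print(as_cp866.decode("cp866") == text)

# A character outside the target code page's subset cannot be encoded.
try:
    "é".encode("cp1251")
except UnicodeEncodeError as exc:
    print("not representable in cp1251:", exc.reason)
```

The failure case is the practical meaning of "subset": a code page can only round-trip the characters it defines.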

    Various noteworthy code pages

    IBM PC (OEM) code pages

    These code pages were originally embedded directly in the text mode hardware of the graphic adapters used with the IBM PC and its clones, including the original MDA and CGA adapters, whose character sets could only be changed by physically replacing a ROM chip that contained the font. The interface of those adapters (emulated by all later adapters such as VGA) was typically limited to single-byte character sets with only 256 characters in each font/encoding (although VGA added partial support for slightly larger character sets). Since the original IBM PC code page (number 437) was not really designed for international use, several partially compatible country- or region-specific variants emerged. Microsoft refers to these as the OEM code pages because they were defined by the OEMs who licensed MS-DOS for distribution with their hardware, not by Microsoft or a standards body. Examples include:

    When dealing with older hardware, protocols and file formats, it is often necessary to support these code pages, but use of newer code pages, in particular Unicode, is encouraged for new designs.

    Code pages for DBCS character sets

    These code pages represent DBCS character encodings for various CJK languages. In Microsoft operating systems, these are used as both the "OEM" and "ANSI" code page for the applicable locale.

    Microsoft code page numbers for various other character encodings

    The following code page numbers are specific to Microsoft Windows. IBM may use different numbers for these code pages.

    Miscellaneous

    Windows (ANSI) code pages

    Microsoft defined a number of code pages known as the ANSI code pages (as the first one, 1252, was based on an apocryphal ANSI draft of what became ISO 8859-1). Code page 1252 is built on ISO 8859-1 but uses the range 0x80-0x9F for extra printable characters rather than the C1 control codes used in ISO 8859-1. Some of the others are based in part on other parts of ISO 8859 but often rearranged to make them closer to 1252.

    Microsoft recommends applications use UTF-8 or UCS-2/UTF-16 instead of these code pages.[7]

    Criticism

    Many older character encodings (unlike Unicode) suffer from several problems.

    1. Some code page vendors insufficiently document the meaning of all code point values. This decreases the reliability of handling textual data through various computer systems consistently.
    2. Some vendors add proprietary extensions to some code pages to add or change certain code point values. For example, byte 0x5C in Shift JIS can represent either a backslash or a yen currency symbol depending on the platform.
    3. In order to support several languages in a program that does not use Unicode, the code page used for each string/document needs to be stored.

    Due to Unicode's extensive documentation, vast repertoire of characters and stability policy of characters, these problems are rarely a concern for Unicode.

    Applications may also mislabel text in Windows-1252 as ISO-8859-1. Fortunately, the only difference between these code pages is that the code point values used by ISO-8859-1 for control characters are instead used as additional printable characters in Windows-1252. Since control characters have no function in HTML, web browsers tend to use Windows-1252 rather than ISO-8859-1.
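
The mislabeling problem is easy to reproduce. The sketch below assumes Python's `cp1252` and `latin-1` (ISO-8859-1) codecs and a made-up byte string. The same bytes yield printable punctuation under Windows-1252 and invisible C1 control characters under ISO-8859-1.

```python
# Bytes as a Windows-1252 application might write them: curly quotes and a euro sign.
data = bytes([0x93, 0x48, 0x69, 0x94, 0x20, 0x80, 0x35])

as_1252 = data.decode("cp1252")      # 0x93/0x94 are quotes, 0x80 is the euro sign
as_8859_1 = data.decode("latin-1")   # the same bytes become C1 control characters

print(repr(as_1252))
print(repr(as_8859_1))

# Outside 0x80-0x9F the two code pages agree; checked here for 0xA0-0xFF.
upper = bytes(range(0xA0, 0x100))
print(upper.decode("cp1252") == upper.decode("latin-1"))
```

This is why decoding text labeled ISO-8859-1 as Windows-1252 usually recovers what the author meant.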

    Private code pages

    When, early in the history of personal computers, users didn't find their character encoding requirements met, private or local code pages were created using Terminate and Stay Resident utilities or by re-programming BIOS EPROMs. In some cases, unofficial code page numbers were invented (e.g., cp895).

    When more diverse character set support became available, most of those code pages fell into disuse, with some exceptions such as the Kamenický or KEYBCS2 encoding for the Czech and Slovak alphabets. Another example is the Iran System encoding standard, created by the Iran System corporation for Persian language support. This standard was used in DOS-based programs in Iran and became obsolete after the introduction of Microsoft code page 1256. However, some Windows and DOS programs using this encoding are still in use, and some Windows fonts with this encoding exist.

     

    Source: http://en.wikipedia.org/wiki/Code_page
  • codepage

    1,000+ reads 2012-04-13 20:18:33
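    A character internal code (character code) is the code used to represent a character. Internal codes are used whenever a document is entered or stored, and they fall into two kinds:

        Single-byte internal codes -- Single-Byte Character Sets (SBCS), which can encode up to 256 characters.
        Double-byte internal codes -- Double-Byte Character Sets (DBCS), which can encode roughly 65,000 characters. They are used mainly for East Asian scripts with large character repertoires.

    A codepage is a selected list of character codes arranged in a specific order. For the early single-byte languages, the ordering in the codepage lets the system map a keyboard input value to the corresponding character code. For double-byte codes, the codepage provides a mapping table between the multibyte encoding and Unicode, so that characters stored as Unicode can be converted to the corresponding character code and back. In the Linux kernel, the corresponding functions are utf8_mbtowc and utf8_wctomb.

    When a multibyte encoding is used to display Chinese text, the decoder checks the value of the leading byte. If that value is greater than 0x7F, the byte and one or more of the bytes that follow are treated as a single character. GB, for example, treats the current byte and the next byte as one character, while UTF-8 treats a variable number of following bytes as part of the character.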
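
The lead-byte rule can be sketched in a few lines. This is a simplified illustration that assumes a purely one-or-two-byte encoding such as GB2312/GBK text and does not validate byte ranges; the function name `split_double_byte` is hypothetical.

```python
def split_double_byte(data: bytes) -> list:
    """Split bytes into characters: a byte above 0x7F starts a two-byte character."""
    chars = []
    i = 0
    while i < len(data):
        # ASCII bytes stand alone; a high lead byte takes the next byte with it.
        width = 2 if data[i] > 0x7F else 1
        chars.append(data[i:i + width])
        i += width
    return chars

sample = "A汉B字".encode("gbk")
pieces = split_double_byte(sample)
print([p.hex() for p in pieces])
print([p.decode("gbk") for p in pieces])
```

A real decoder would also reject invalid lead and trail bytes and handle a truncated final character, which this sketch leaves out.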

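    When wide characters are used to display Chinese text, each two-byte unit is treated directly as one character.

    The internal code is the encoding used to store, process, and transmit Chinese characters inside the computer. It must be compatible with ASCII without conflicting with it.

    For this reason the highest bit of each of the two bytes of the national standard (GB) code is set to 1, to distinguish it from Western text; the result is the internal code. The code used to enter a Chinese character is called the "external code", that is, the input code. Common external codes fall into numeric codes (such as the qu-wei, or row-cell, code), phonetic (pinyin) codes, and shape-based codes (such as Wubi).
       Now for the qu-wei code. The qu-wei code of "啊" is 1601, which in hexadecimal is 0x10, 0x01. This conflicts with the widely used ASCII encoding. To stay compatible with the ASCII range 00-7F, A0 is added to both the high and the low byte of the qu-wei code, so the encoding of "啊" becomes B0A1. The encoding with A0 added to both bytes is also commonly called the GB2312 encoding, although the GB2312 standard itself makes no mention of this.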
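
The arithmetic for 1601 can be written out as a small sketch. It assumes Python's `gb2312` codec for the final check; the function name `quwei_to_gb2312` is invented for this example.

```python
def quwei_to_gb2312(quwei: int) -> bytes:
    """Convert a 4-digit qu-wei (row-cell) code to its two GB2312 bytes."""
    row, cell = divmod(quwei, 100)      # 1601 -> row 16, cell 1
    # Adding 0xA0 to each half moves both bytes out of the ASCII range.
    return bytes([row + 0xA0, cell + 0xA0])

encoded = quwei_to_gb2312(1601)
print(encoded.hex().upper())
print(encoded.decode("gb2312"))
```

Both resulting bytes have the top bit set, which is exactly what lets a decoder tell them apart from ASCII.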
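      The internal code is the character encoding used inside the operating system. In early operating systems the internal code was language-dependent. Today's Windows uses Unicode internally throughout and relies on code pages to accommodate individual languages, so the notion of an "internal code" has become rather blurry. We usually call the encoding designated by the default code page the internal code. The term has no official definition, and "code page" is likewise just Microsoft's customary name. As programmers we only need to know what these things are; there is no need to research the terminology exhaustively.
      A code page is a character encoding for the script of a particular language. For example, the code page for GBK is CP936, the code page for BIG5 is CP950, and the code page for GB2312 is CP20936.
      Windows has the concept of a default code page, that is, the encoding used by default to interpret characters. Suppose Notepad opens a text file whose content is the byte stream BA, BA, D7, D6. How should Windows interpret it? As Unicode, as GBK, as BIG5, or as ISO8859-1? Interpreted as GBK, the bytes yield the two characters "汉字". Interpreted under another encoding, they may have no corresponding characters or may map to the wrong ones. "Wrong" here means different from what the author intended, and the result is garbled text (mojibake).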
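
The four bytes from the Notepad example can be decoded under several candidate encodings to see the effect. This sketch assumes Python's standard codecs; the helper `interpret` is hypothetical, and `errors="replace"` substitutes U+FFFD wherever a byte sequence has no mapping.

```python
def interpret(raw: bytes, encodings) -> dict:
    """Decode the same bytes under each encoding; unmappable bytes become U+FFFD."""
    return {name: raw.decode(name, errors="replace") for name in encodings}

raw = bytes([0xBA, 0xBA, 0xD7, 0xD6])
views = interpret(raw, ["gbk", "big5", "latin-1", "utf-8"])

for name, text in views.items():
    print(f"{name:8} -> {text!r}")
```

Only the GBK interpretation matches the author's intent; every other result is the garbled text the paragraph describes.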
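      The answer is that Windows interprets the byte stream of a text file according to the current default code page. The default code page can be set through Regional Options in the Control Panel. Notepad's Save As dialog has an ANSI option, which simply saves using the encoding of the default code page.
      The internal code of Windows is Unicode, so technically it can support several code pages at once. As long as a file declares which encoding it uses and the user has the corresponding code page installed, Windows can display it correctly. An HTML file, for instance, can specify its charset.
      Some HTML authors, especially those writing in English, assume that everyone in the world uses English and do not specify a charset. If such an author uses characters in the range 0x80-0xff and a Chinese Windows interprets them with its default GBK, the text is garbled. The fix is to add a statement specifying the charset to the HTML file, for example:
      <meta http-equiv="Content-Type" content="text/html; charset=ISO8859-1">
    As long as the code page the original author used is compatible with ISO8859-1, the garbling disappears.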

  • CodePage

    1,000+ reads 2013-11-22 12:10:48
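    Albanian 1250 Albanian
    Arabic 1256 Arabic (Algeria), Arabic (Bahrain), Arabic (Egypt), Arabic (Iran), Arabic (Jordan), Arabic (Kuwait), Arabic (Lebanon), Arabic (Libya), Arabic (Morocco), Arabic (Oman), Arabic (Qatar), Arabic (Saudi Arabia), Arabic (Syria), Arabic (Tunisia), Arabic (United Arab Emirates), Arabic (Yemen), Persian, Urdu
    Chinese_PRC 936 Chinese (Hong Kong SAR), Chinese (PRC), Chinese (Singapore)
    Chinese_PRC_Stroke 936 Chinese stroke-count sort order (PRC)
    Chinese_Taiwan_Bopomofo 950 Bopomofo sort order (Taiwan)
    Chinese_Taiwan_Stroke 950 Traditional Chinese (Taiwan)
    Croatian 1250 Croatian
    Cyrillic_General 1251 Bulgarian, Belarusian, Russian, Serbian
    Czech 1250 Czech
    Danish_Norwegian 1252 Danish, Norwegian (Bokmål), Norwegian (Nynorsk)
    Estonian 1257 Estonian
    Finnish_Swedish 1252 Finnish, Swedish
    French 1252 French (Belgium), French (Canada), French (Luxembourg), French (Standard), French (Switzerland)
    Georgian_Modern_Sort 1252 Georgian modern sort order
    German_PhoneBook 1252 German phone book sort order
    Greek 1253 Greek
    Hebrew 1255 Hebrew
    Hindi  Unicode data types only  Hindi
    Hungarian 1250 Hungarian
    Hungarian_Technical 1250
    Icelandic 1252 Icelandic
    Japanese 932 Japanese
    Japanese_Unicode 932
    Korean_Wansung 949 Korean
    Korean_Wansung_Unicode 949
    Latin1_General 1252 Afrikaans, Basque, Catalan, Dutch (Belgium), Dutch (Standard), English (Australia), English (Britain), English (Canada), English (Caribbean), English (Ireland), English (Jamaica), English (New Zealand), English (South Africa), English (United States), Faroese, German (Austria), German (Liechtenstein), German (Luxembourg), German (Standard), German (Switzerland), Indonesian, Italian, Italian (Switzerland), Portuguese (Brazil), Portuguese (Portugal)
    Latvian 1257 Latvian
    Lithuanian 1257 Lithuanian
    Lithuanian_Classic 1257
    FYRO Macedonian  1251 Macedonian (FYROM)
    Mexican_Trad_Spanish 1252 Spanish (Mexico), Spanish (Traditional Sort)
    Modern_Spanish 1252 Spanish (Argentina), Spanish (Bolivia), Spanish (Chile), Spanish (Colombia), Spanish (Costa Rica), Spanish (Dominican Republic), Spanish (Ecuador), Spanish (Guatemala), Spanish (Modern Sort), Spanish (Panama), Spanish (Paraguay), Spanish (Peru), Spanish (Uruguay), Spanish (Venezuela)
    Polish 1250 Polish
    Romanian 1250 Romanian
    Slovak 1250 Slovak
    Slovenian 1250 Slovenian
    Thai 874 Thai
    Turkish 1254 Turkish
    Ukrainian 1251 Ukrainian
    Vietnamese 1258 Vietnamese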
  • cmd code page

    2019-07-21 21:21:18

    On Windows 7, the console settings let you change the active code page. The server, however, runs Windows 2003, which has no such setting. The server itself displays Chinese correctly in windowed applications, and code page 936 is also visible on it.

    Figure1: setting in my laptop ( win 7 )
    Figure2: control panel in win 2003
    On the server, the default code page for cmd is 437, and switching to 936 fails with an error.

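  • The codepage attribute specifies the code page of a web page. If a page's script uses a code page different from the web server's default, the code page must be stated explicitly, as follows: codepage=936 Simplified Chinese (GBK) codepage=950 Traditional Chinese (BIG5) codepage=437 US/Canadian English codepage=...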
  • db2 codepage

    2017-01-21 12:07:00
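    There are two levels to consider: the DB2 CODEPAGE and the OS CODEPAGE (DB2SET DB2CODEPAGE effectively sets the OS CODEPAGE for the current instance). To check the CODEPAGE on Linux, run locale in a terminal. To check it on Windows, run chcp in cmd.exe, or right-click cmd....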
  • MFC Code Page

    2019-04-02 21:09:03
    WINBASEAPI BOOL WINAPI GetCPInfoExA( _In_ UINT CodePage, _In_ DWORD dwFlags, _Out_ LPCPINFOEXA lpCPInfoEx); WINBASEAPI BOOL WINAPI GetCPInfoExW( _In_ UINT C...
  • CodePage and SAP Code Page

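    1,000+ reads 2011-03-04 10:16:00
    A codepage is a selected list of character codes arranged in a specific order. For the early single-byte languages, the ordering in the codepage lets the system map a keyboard input value to the corresponding character code. For double-byte codes, it provides...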
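  • What is a CodePage

    2019-12-06 15:40:43
    That leaves the highest, 8th bit free, and every country wanted to put the remaining 128 values to use. Because languages differ, each defined the characters of the upper 128 values differently, which produced different sets of code points. These sets are called code pages. Reference: The Abso...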
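  • codepage and charset

    2017-06-15 15:09:00
    codepage and charset. codepage: put simply, this is a table that a program uses to encode characters. The code page is a server-side matter. Three common codepages: Simplified Chinese: 936, Traditional Chinese: 950, UTF-8: 65001. If you do not want to use the default...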
  • Code Page Identifiers

    2012-05-18 22:32:59
    http://msdn.microsoft.com/en-us/library/dd317756%28VS.85%29.aspx Code Page Identifiers The following table defines the available code page identifiers. Note ANSI code pages can be diffe
  • I recently installed ...Active code page: 65001' message when I try to pull any music. Is this related to the API issues? (This question comes from the open-source project Miserlou/SoundScrape.)
  • Setting db2codepage

    2018-04-11 10:05:18
    1. Viewing DB2 variables:  db2set -all  (connect to dbname) get db cfg  db2pd -osinfo (a very powerful command)  2. Setting DB2 variables with the command ... db2codepage=1386 (Simplified Chinese)  db2country
  • Codepage vs Charset

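    10,000+ reads 2016-08-18 15:28:06
    An earlier post on interview topics covered how to explain concisely the differences between the various char types. This post turns to another frequently asked question: how codepage and charset differ and how they relate. Although both concepts come up at work almost every day, in my experience, when faced with this question, those who can give...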
  • Original URL: ...Code Page Identifiers 05/31/2018 The following table defines the available code page identifiers. Note ANSI code...
  • Use class for Translate codepage to codepage.Data : g_codepage LIKE tcp0c-charco VALUE '1100'.CONSTANTS: c_unico...
  • true</code> in <code>ToolTask</code> will reset the console code page of the child process to the OEM code page. With this fixed, the batch file must also be written in the console code page since ...
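  • Application code page not determined, using ANSI codepage 1208. As the title says, I need to import an entire DB2 database from the server into another database, and the error above is reported when the import runs. What is the cause? db2move db_name import -io replace -u ...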
  • What is a codepage

    1,000+ reads 2012-07-29 10:53:18
    A codepage is the mapping table between a country's character encoding and Unicode.   Code page: An ordered set of characters in which a numeric index (code point value) is associated with each character. This term is generally ...
  • Encoding & Code page

    2011-09-16 18:08:20
    Western European(Windows) -code Page 1252 Chinese Simplified(GB2312) -Code Page 936 Unicode(UTF-8 without Signature) -Code Page 65001 Uni
  • Windows code page

    1,000+ reads 2009-08-10 16:38:00
    source: http://en.wikipedia.org/wiki/Windows_code_pages Windows code page From Wikipedia, the free encyclopedia (Redirected from Windows code pages) Jump to: navigation, search Windo
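  • Setting the CodePage in DB2

    2013-02-01 08:58:00
    After installing DB2 on Linux, some encoding errors can appear during data migration, so the CODEPAGE needs to be changed to match the CODEPAGE of the source database. The commands are: db2set codepage=value db2set DB2COUNTRY=86 db2set ...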
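  • db2 code page (codepage) and the db2set command

    1,000+ reads 2020-02-17 23:30:02
    db2: codepage and character set. When a computer processes text, it assigns a specific value to every character of a language. This table relating characters to numeric values is called a codepage or character set (IBM was the first to use the term codepage; a code page is equivalent to a character set), and can be understood as a mapping between characters and byte data...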
  • db2 codepage conversion

    2012-06-04 19:57:20
    Running DataStage today produced this error; it simply could not connect to the DB2 database: SQL0332N Character conversion from the source code page "1386" to the target code page "...db2set db2codepage=...
  • Code page (CodePage)

    2013-06-28 09:59:53
    #include "stdafx.h" ...void CodePage(int nCodePage) { SetConsoleOutputCP(nCodePage) ; for (int i=0; i; ++i) { printf("%c ",i) ; } } int main(int argc, char* argv[]) { CodePage(874)
  • DB2 CODEPAGE

    1,000+ reads 2008-07-08 17:20:00
    Table 41. Encoding declarations supported by XML Extender Category Encoding Code page Unico
  • Treat codepage 21010 as codepage 1200" https://github.com/SheetJS/js-xls/issues/44 "xls: error parsing bad.xls: Error: Unrecognized CP: 21010" ...
  • Setting the Code Page in Informatica

    1,000+ reads 2012-02-18 22:31:22
    When creating an Integration Service, if the encoding of the Integration Service and that of the repository do not...Code page mismatch. Service process is running in code page [ISO 8859-1 Western European] whereas the service is configured in the Admin Console to run in
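  • On the build error NotSupportedException: CodePage 435 not supported: as far as I can tell, the usual cause is a missing DLL, where some libraries were not included when the build was packaged. The fix is to locate the relevant libraries in the Unity installation directory, copy them into the project folder, and build again. E:\Unity\...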
