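  • python-latin1-to-utf8 converts incorrectly encoded Latin-1 characters to UTF-8 characters. It comes with a thorough description and explanation of the encoding problem. Usage: $ python latin1-to-utf8.py Automática > Automática. Related projects
  • l1u8recode is a simple utility for re-encoding files from ISO-8859-1 (latin1) to UTF8, with support for mixed encodings and/or formats. It can handle these two special cases: re-encoding can be applied to delimited parts of the input. It works on files that contain both text and binary data...
  • I inherited a web system that I need to develop further. The system seems to be created by someone who read two chapters of a PHP tutorial and thought he could code... So... the webpage itself is in UTF...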

    I inherited a web system that I need to develop further.

    The system seems to be created by someone who read two chapters of a PHP tutorial and thought he could code...

    So... the webpage itself is in UTF8 and displays and inputs everything in it. The database tables have been created with UTF8 character set. But, in the config, there is "SET NAMES LATIN1". In other words, UTF8 encoded strings are populated into the database with forced latin1 coding.

    Is there a way to convert this mess to actually store in utf8 and get rid of latin1?

    I tried this, but since the database table is set to utf8, this does not work. Also tried this one without success.

    I may be able to do this by reading all tables in PHP with latin1 encoding and then writing them back to a new database in utf8, but I want to avoid that if possible.

    Solution

    I managed to solve it by running updates on text fields like this:

    UPDATE table SET title = CONVERT(CONVERT(CONVERT(title USING latin1) USING binary) USING UTF8)
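
    The triple CONVERT above reinterprets the stored bytes rather than transcoding them. As a rough illustration of the same idea outside the database, here is a minimal Python 3 sketch. The function name fix_double_encoded is made up for this example, and it assumes the damage is exactly "UTF-8 bytes decoded as Latin-1". MySQL's latin1 behaves more like Windows-1252, so real data may need "cp1252" instead of "latin-1".

```python
def fix_double_encoded(text: str) -> str:
    """Undo text whose UTF-8 bytes were mistakenly decoded as Latin-1.

    Hypothetical helper for illustration only; it raises UnicodeError
    if the input was not damaged in exactly this way.
    """
    # Step 1: go back to the raw bytes that the wrong decoding produced.
    raw = text.encode("latin-1")
    # Step 2: decode those bytes with the encoding they were really in.
    return raw.decode("utf-8")


original = "Äpple"
# Simulate the bad round trip: UTF-8 bytes read through a latin1 connection.
garbled = original.encode("utf-8").decode("latin-1")
repaired = fix_double_encoded(garbled)

print(repr(garbled))
print(repaired)
```

    Text that was stored correctly will usually fail the second step with a UnicodeDecodeError, so it is worth testing on a copy of the data before running an UPDATE over a whole table.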

  • My client's web app has a large database with millions of ... All tables are encoded in latin1. When I fetch a text field that holds a lot of data and mail that string, a strange character issue com...

    My client's web app has a large database with millions of records. All tables are encoded in latin1.

    When I fetch a text field that holds a lot of data and mail that string, a strange character issue appears: when I receive the email, spaces have been converted into the character Â.

    Changing the DB encoding is not permitted.

    I tried the following PHP function, but with no result ;(

    $msg = mb_convert_encoding($msg, "UTF-8", "latin1");

    Please help

    Solution

    I would check for the encoding php thinks it is

    echo mb_detect_encoding($str);

    And then do

    iconv("detectedEncoding", "UTF-8", $str);

    Or if iconv is not installed, check if your encoding was right in your solution. ;)
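
    One common cause of a stray Â in front of spaces is a non-breaking space (U+00A0) that is already UTF-8 encoded and then gets read as Latin-1. Whether that is what happens in this particular system is an assumption; the Python 3 sketch below only reproduces the symptom. The function name misread_utf8_as_latin1 is invented for the example.

```python
def misread_utf8_as_latin1(text: str) -> str:
    """Show what UTF-8 encoded text looks like when read as Latin-1."""
    return text.encode("utf-8").decode("latin-1")


# A non-breaking space (U+00A0) between the words, as rich-text editors often insert.
source = "price:\u00a0100"
seen_by_reader = misread_utf8_as_latin1(source)

# U+00A0 is the two bytes C2 A0 in UTF-8; read as Latin-1 they become "Â" plus U+00A0.
print(repr(seen_by_reader))
```

    If this matches the symptom, converting from latin1 to UTF-8 a second time makes things worse, because the data is already UTF-8; declaring the mail's charset correctly is the more likely fix.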

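  • A non-unicode string (one without the u prefix, i.e. '\xc4pple' rather than u'\xc4pple') must be decoded from its native encoding (iso8859-1/latin1, unless modified with the enigmatic sys.setdefaultencoding function) to unicode, then encoded to a character set that can display the characters you need; in this case I...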

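    This is a common problem, so here is a relatively thorough illustration.

    A non-unicode string (one without the u prefix, i.e. '\xc4pple' rather than u'\xc4pple') must first be decoded to unicode from the encoding its bytes are actually in (iso8859-1/latin1 in this example; Python 2's implicit default codec is ASCII unless it has been modified with the enigmatic sys.setdefaultencoding function), and then encoded to a character set that can display the characters you need. In this case I would recommend UTF-8.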

    First, here is a handy utility function that helps illuminate the patterns of Python 2.7 strings and unicode:

        >>> def tell_me_about(s): return (type(s), s)

    A plain string

        >>> v = "\xC4pple" # iso-8859-1 aka latin1 encoded string
        >>> tell_me_about(v)
        (<type 'str'>, '\xc4pple')
        >>> v
        '\xc4pple' # representation in memory
        >>> print v
        ?pple # the raw byte 0xC4 is written to the terminal unchanged;
              # 0xC4 is "Ä" in iso-8859-1, but a lone 0xC4 is not valid
              # UTF-8, so a UTF-8 terminal shows "?" instead.

    Decoding an iso8859-1 string: converting a plain string to unicode

        >>> uv = v.decode("iso-8859-1")
        >>> uv
        u'\xc4pple' # decoding iso-8859-1 becomes unicode, in memory
        >>> tell_me_about(uv)
        (<type 'unicode'>, u'\xc4pple')
        >>> print v.decode("iso-8859-1")
        Äpple # convert unicode to the default character set
              # (utf-8, based on sys.stdout.encoding)
        >>> v.decode('iso-8859-1') == u'\xc4pple'
        True # one could have just used a unicode representation
             # from the start

    A little more illustration, with "Ä"

        >>> u"Ä" == u"\xc4"
        True # the native unicode char and escaped versions are the same
        >>> "Ä" == u"\xc4"
        False # typed in a UTF-8 terminal, the plain str "Ä" is the two
              # bytes '\xc3\x84' (its UTF-8 encoding), not latin1
        >>> "Ä".decode('utf8') == u"\xc4"
        True # one can decode the string to get unicode
        >>> "Ä" == "\xc4"
        False # the native character and the escaped string are
              # of course not equal ('\xc3\x84' != '\xc4').

    Encoding to UTF

        >>> u8 = v.decode("iso-8859-1").encode("utf-8")
        >>> u8
        '\xc3\x84pple' # convert iso-8859-1 to unicode to utf-8
        >>> tell_me_about(u8)
        (<type 'str'>, '\xc3\x84pple')
        >>> u16 = v.decode('iso-8859-1').encode('utf-16')
        >>> tell_me_about(u16)
        (<type 'str'>, '\xff\xfe\xc4\x00p\x00p\x00l\x00e\x00')
        >>> tell_me_about(u8.decode('utf8'))
        (<type 'unicode'>, u'\xc4pple')
        >>> tell_me_about(u16.decode('utf16'))
        (<type 'unicode'>, u'\xc4pple')
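
    For comparison, the same decode-then-encode step can be sketched in Python 3, where the 2.7 type str corresponds to bytes and unicode corresponds to str. This is not part of the original answer, and the helper name latin1_to_utf8 is made up for the example.

```python
def latin1_to_utf8(data: bytes) -> bytes:
    """Transcode ISO-8859-1 bytes to UTF-8 bytes via a Unicode string."""
    text = data.decode("iso-8859-1")   # bytes -> str (Unicode code points)
    return text.encode("utf-8")        # str -> bytes in the target encoding


v = b"\xc4pple"            # "Äpple" encoded as iso-8859-1
u8 = latin1_to_utf8(v)

print(u8)
print(u8.decode("utf-8"))
```

    Decoding iso-8859-1 never fails, because every byte value maps to a code point, so a successful run does not prove the input really was latin1.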

    The relationship between unicode, UTF and latin1

        >>> print u8
        Äpple # printing utf-8 - because of the encoding we now know
              # how to print the characters
        >>> print u8.decode('utf-8') # printing unicode
        Äpple
        >>> print u16 # printing 'bytes' of u16
        ���pple
        >>> print u16.decode('utf16')
        Äpple # printing unicode
        >>> v == u8
        False # v is a iso8859-1 string; u8 is a utf-8 string
        >>> v.decode('iso8859-1') == u8
        False # v.decode(...) returns unicode, while u8 is still a str
        >>> u8.decode('utf-8') == v.decode('latin1') == u16.decode('utf-16')
        True # all decode to the same unicode memory representation
             # (latin1 is iso-8859-1)

    Unicode exceptions

    In Python 2, calling .encode() on a str first decodes it implicitly with the default ASCII codec, which is why each call below fails with a UnicodeDecodeError rather than an encode error.

        >>> u8.encode('iso8859-1')
        Traceback (most recent call last):
          File "<stdin>", line 1, in <module>
        UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0:
          ordinal not in range(128)
        >>> u16.encode('iso8859-1')
        Traceback (most recent call last):
          File "<stdin>", line 1, in <module>
        UnicodeDecodeError: 'ascii' codec can't decode byte 0xff in position 0:
          ordinal not in range(128)
        >>> v.encode('iso8859-1')
        Traceback (most recent call last):
          File "<stdin>", line 1, in <module>
        UnicodeDecodeError: 'ascii' codec can't decode byte 0xc4 in position 0:
          ordinal not in range(128)
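
    Python 3 removes this particular trap: bytes objects have no encode method, so the implicit ASCII decode cannot happen. Encoding can still fail when the target character set lacks a character. The sketch below is an illustration only; the helper name safe_encode is an assumption of this example, not a standard function.

```python
def safe_encode(text: str, encoding: str) -> bytes:
    """Encode text, substituting '?' for characters the codec cannot represent."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError as exc:
        # exc.reason describes why the strict encode failed.
        print("strict encode failed:", exc.reason)
        return text.encode(encoding, errors="replace")


u = "Äpple"
as_latin1 = safe_encode(u, "iso-8859-1")   # Ä exists in latin1, so this succeeds
as_ascii = safe_encode(u, "ascii")         # Ä is not ASCII, so it is replaced

print(as_latin1)
print(as_ascii)
print(hasattr(b"\xc4pple", "encode"))      # bytes cannot be encoded again
```

    Replacing characters loses data, so errors="replace" suits display or logging better than storage.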

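    One can get around these problems by first converting from the specific encoding (latin-1, utf8, utf16) to unicode, e.g. u8.decode('utf8').encode('latin1').

    So perhaps one can draw the following principles and generalizations:

    type str is a sequence of bytes, which may be in one of a number of encodings such as Latin-1, UTF-8, and UTF-16

    type unicode is a sequence of Unicode code points that can be encoded to any number of encodings, most commonly UTF-8 and latin-1 (iso8859-1)

    the print command has its own logic for encoding, set to sys.stdout.encoding, which is typically UTF-8 on a UTF-8 terminal

    a str must be decoded to unicode before it is converted to another encoding.

    Of course, all of this changes in Python 3.x.

    Hope that is illuminating.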

    Further reading

    And the very illustrative rant by Armin Ronacher:

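  • Converting a MySQL database from latin1 to utf8

    1k+ views 2013-03-28 21:24:41
    I needed to convert a MySQL database from latin1 to utf8, and none of the methods I found online worked. After several attempts I found an approach that does: 0. Download the tool Navicat for MySQL 1. Export the database schema: mysqldump -uuser -p -d db > db.sql 2. Edit...
  • If the database was created with DEFAULT CHARSET = utf8 there is no problem, but with DEFAULT CHARSET = latin1 problems appear, and unfortunately many MySQL databases default to latin1. If the database's default
  • All my tables are InnoDB with collation "utf8_unicode_ci", and all my VARCHAR columns are "utf8_unicode_ci" as well. My PHP script contains Jáuò Iñe, and all my PHP files are encoded as UTF-8. So until now, every time I use diacritics...
  • When this happens, it is because although MySQL itself and the web pages are utf-8, xoops sends data to MySQL as latin1, so utf-8 data ends up stored in the database in latin1 format. From xoops's point of view nothing is wrong; it is only when viewed in phpMyAdmin that everything is garbled...
  • Problem: a production environment that cannot be restarted has the wrong character set configured; how can the default latin1 be changed to utf8? Correct procedure for production: 1. In the configuration file my.cnf, under [client], add default-character-set = utf8; this takes effect immediately for new connections created on this machine. 2. In...
  • Change default charset=latin1 to utf8. Use "Convert" in the menu to convert the file from ASC to UTF8, then upload the file to the server again and save it as product_kind.1. I uploaded in binary mode, not ASC mode. 3. Import into the new database. Most MySQL tables...
  • Key points of "MySQL tutorial: MySQL encoding conversion, latin1 to utf-8": this article covers converting latin1 to utf-8 in MySQL. Introduction: you run into this problem even though the whole system is set to utf-8...
  • Background: on a certain operating system, the MySQL database Databnsednname uses the default latin1 character set, and an operating system upgrade requires all data to be converted to utf-8 format. Target database: Databnsenewdbname (created with utf8). Method 1: Step 1, run on the command line: Mysql...
  • There are many threads about converting latin1_swedish_ci to utf8. But what about the reverse? I have been working on this for a long time and have not found a solution so far. Since I do not know what else accesses this database, I do not want to change the table's character encoding. I have a column in the table in latin1_...
  • The old website's MySQL database dnname uses the default latin1 character set, and a system upgrade requires all data to be converted to utf-8 format. Target database: newdbname (created with utf8). Method 1: Step 1, run on the command line: mysqldump --opt -hlocalhost -uroot -p*** ...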
  • I imported some data using LOAD DATA INFILE into a MySQL Database... The table itself and the columns are using the UTF8 character set, but the default character set of the database is latin 1. Becaus...
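  • 3. Converting latin1 to utf8. Taking an original character set of latin1 as an example, upgrade it to utf8. Original: databasename (default charset=latin1); new: new_databasename (default charset=utf8). mysql> show create database ...
  • How to convert latin1 to utf8

    10k+ views 2013-06-24 15:46:30
    MySQL's default encoding is latin1. I prefer utf8, since multilingual support depends on it. I did not pay attention when I started out, and only learned about latin1 and utf8 when EMS MySQL showed garbled text. Everything I saw was gibberish, and it drove me mad. I use ezSQL, so $db->query("set names utf8"); you get the idea ...
  • The default file encoding on Windows is GBK (gb2312), while Linux generally uses UTF-8. The following describes how to check a file's encoding on Linux and how to convert a file between encodings. Checking the file encoding: on Linux this can be done in the following ways: 1. In ... you can directly...
  • The following method converts the latin1 character set to gbk or utf8. The steps are: I. latin1 to gbk 1. Export the database: mysqldump --default-character-set=latin1 -h <database host IP> -u root -P 3306 -p<database password> db_name...
  • Background: we are refactoring a service and need to redesign the MySQL tables it uses. During migration we ran into garbled Chinese characters (the source tables default to LATIN1, the new tables default to UTF8), so I revisited how MySQL encodes and decodes and wrote up...
  • In the vast majority of cases a project uses one encoding throughout, for example all UTF-8 or all GBK. But when several projects are merged, newcomers join, and so on... Preliminary notes: Java / MySQL: UTF-8 / utf8, ISO-8859-1 / latin1. M...
  • 3. Character set conversion: latin1 to utf8. Taking an original character set of latin1 as an example, upgrade it to utf8. Original: databasename (default charset=latin1); new: new_databasename (default charset=utf8). The code is as follows: mysql> ...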
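  • The database is MySQL 5.1. After logging in to the site's phpMyAdmin I found that all Chinese text in the tables was displayed as garbage, and exporting then importing locally gave the same garbage, so the information I needed could not be read. Garbled text is usually an encoding problem; I checked and the target database's encoding is latin1, so presumably the site...
  • Using a string as the carrier, convert the default HTML encoding to UTF-8 or latin1 in R as follows: vector_cities = strsplit("Nova Lima,São Paulo,Contagem,Rio de Janeiro,Rio de Janeiro,São Paulo,Castanhal,Diadema,Rio de Janeiro,Rio Verde,...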
