  • 2020-12-10 06:12:51

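    This topic explains how to use xml.etree, the XML processing package in the Python standard library. The xml.etree package contains four submodules; cElementTree is only an alias of ElementTree, has long been deprecated, and was removed in Python 3.9. The topic covers:

    1. Using the ElementInclude module

    2. Using the ElementPath module

    3. Using the ElementTree module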

    I. Help for the etree package

    import xml.etree

    help(xml.etree)

    Help on package xml.etree in xml:

    NAME

    xml.etree

    DESCRIPTION

    # $Id: __init__.py 3375 2008-02-13 08:05:08Z fredrik $

    # elementtree package

    PACKAGE CONTENTS

    ElementInclude

    ElementPath

    ElementTree

    cElementTree

    FILE

    /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/xml/etree/__init__.py

    The package provides four modules:

    |- ElementInclude

    |- ElementPath

    |- ElementTree

    |- cElementTree

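    II. The ElementInclude module

    The ElementInclude module expands XInclude directives in XML documents.

    It defines one exception class, FatalIncludeError, with the following inheritance chain: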

    FatalIncludeError

    |-builtins.SyntaxError

    |-builtins.Exception

    |-builtins.BaseException

    |-builtins.object

    It provides two functions:

    FUNCTIONS

    |-default_loader(href, parse, encoding=None)

    |-include(elem, loader=None)

    It provides three constants:

    DATA

    |-XINCLUDE = '{http://www.w3.org/2001/XInclude}'

    |-XINCLUDE_FALLBACK = '{http://www.w3.org/2001/XInclude}fallback'

    |-XINCLUDE_INCLUDE = '{http://www.w3.org/2001/XInclude}include'

    2.1. XML files that use XInclude syntax

    1. The webpages.xml file

    2. The footer.xml file

    Mage Education, 2019

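    2.2. Loading an XML file with default_loader

    The default_loader function can load any text file. When asked to parse the file as XML it returns an Element object; otherwise it returns the file content as a str.

    The function prototype is: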

    default_loader(href, parse, encoding=None)

    |- href: the path of the file to load.

    |- parse: how to load the file, either 'xml' or 'text'. With 'xml' the function returns an xml.etree.ElementTree.Element; otherwise it returns a string.

    |- encoding: only used for non-XML (text) loading; specifies the text encoding, typically utf-8.

    import xml.etree.ElementInclude

    # Load as plain text: the result is a str
    result = xml.etree.ElementInclude.default_loader(
        href='codes/webpages.xml',
        parse='text',
        encoding='utf-8')
    print(':', type(result))
    print(result)
    print('------------------')

    # Load as XML: the result is an xml.etree.ElementTree.Element
    result = xml.etree.ElementInclude.default_loader(
        href='codes/webpages.xml',
        parse='xml',
        encoding='utf-8')
    print(':', type(result))
    print(result)

    :

    ------------------

    :
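The two loading modes can be tried without the codes/ directory used in this article. The sketch below is only an illustration: the file name note.xml, its content, and the helper load_both_ways are made up for the example, and the file is written to a temporary directory.

```python
import os
import tempfile
from xml.etree import ElementInclude, ElementTree


def load_both_ways(xml_text):
    """Write xml_text to a temporary file, then load it as text and as XML."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "note.xml")  # hypothetical file name
        with open(path, "w", encoding="utf-8") as f:
            f.write(xml_text)
        # parse='text' returns the raw file content as a str
        as_text = ElementInclude.default_loader(path, "text", encoding="utf-8")
        # parse='xml' parses the file and returns its root Element
        as_xml = ElementInclude.default_loader(path, "xml")
    return as_text, as_xml


as_text, as_xml = load_both_ways("<note><to>Tove</to></note>")
print(type(as_text).__name__, as_xml.tag)
```

The text result is the unparsed file content, while the XML result is a regular Element that supports find, findall, and iteration.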

    2.3. Expanding XInclude with the include function

    The include function expands the XInclude directives found in an XML tree.

    The function prototype is:

    include(elem, loader=None)

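    |- elem: the element whose XInclude directives should be expanded.

    |- loader: the loader used to fetch the included files; defaults to default_loader.

    Note:

    After include has processed an element, every include directive inside it is replaced by the content of the referenced file. Compare the output before and after the call in the example below.

    import xml.etree.ElementInclude

    # Documents may use a different XInclude namespace version than the module default;
    # the constant can be overridden to match the document being processed.
    xml.etree.ElementInclude.XINCLUDE_INCLUDE = '{http://www.w3.org/2003/XInclude}include'

    # Load as XML: the result is an xml.etree.ElementTree.Element
    result = xml.etree.ElementInclude.default_loader(
        href='codes/webpages.xml',
        parse='xml',
        encoding='utf-8')
    print('Before XInclude expansion')
    for ele in result:
        print(type(ele), ele)

    xml.etree.ElementInclude.include(result, loader=None)
    print('After XInclude expansion')
    for ele in result:
        print(type(ele), ele)

    Before XInclude expansion

    After XInclude expansion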
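Because the original XML files are not reproduced here, the following sketch shows the same expansion with an in-memory loader. Everything in it is an assumption for illustration: the DOCUMENTS table, the footer text, and the memory_loader function are hypothetical, and the document uses the module's default 2001 XInclude namespace.

```python
from xml.etree import ElementInclude, ElementTree

# Hypothetical "files", kept in memory so the example needs no disk access
DOCUMENTS = {
    "footer.xml": "<footer>Example footer</footer>",
}


def memory_loader(href, parse, encoding=None):
    """Loader with the same calling convention as default_loader."""
    text = DOCUMENTS[href]
    if parse == "xml":
        return ElementTree.fromstring(text)
    return text


page = ElementTree.fromstring(
    '<page xmlns:xi="http://www.w3.org/2001/XInclude">'
    '<body>Hello</body>'
    '<xi:include href="footer.xml" parse="xml"/>'
    '</page>'
)

tags_before = [child.tag for child in page]
ElementInclude.include(page, loader=memory_loader)  # expands in place
tags_after = [child.tag for child in page]

print(tags_before)
print(tags_after)
```

include modifies the tree in place: the xi:include child is replaced by the root element returned from the loader.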

    III. The ElementPath module

    This module provides support for a subset of XPath.

    The functions that implement this support are:

    FUNCTIONS

    |- find(elem, path, namespaces=None)

    |- findall(elem, path, namespaces=None)

    |- findtext(elem, path, default=None, namespaces=None)

    |- get_parent_map(context)

    |- iterfind(elem, path, namespaces=None)

    |- prepare_child(next, token)

    |- prepare_descendant(next, token)

    |- prepare_parent(next, token)

    |- prepare_predicate(next, token)

    |- prepare_self(next, token)

    |- prepare_star(next, token)

    |- xpath_tokenizer(pattern, namespaces=None)

    The module-level data it provides are:

    |- ops = {'': , '': , '.'...

    |- xpath_tokenizer_re = re.compile('('[^']'|\"[^\"]*\"|::|//?|\.....

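    Note:

    The prepare_XXX functions are the entries of the ops table. iterfind uses them to process each part of a path; they are not normally called directly and are not covered below.

    The path-based methods in ElementTree are implemented with the find family of functions from this module.

    The supported XPath syntax is:

    XPath syntax        Description
    tag                 Selects all child elements with the given tag.
    *                   Selects all child elements; for example, */egg selects all grandchildren named egg.
    .                   Selects the current element.
    //                  Selects all descendants at any depth; for example, .//egg selects egg elements at every level of the tree.
    ..                  Selects the parent element.
    [@attrib]           Selects elements that have the attribute attrib.
    [@attrib='value']   Selects elements whose attribute attrib equals value.
    [tag]               Selects elements that have a child element named tag.
    [tag='text']        Selects elements that have a child named tag whose text content equals text.
    [position]          Selects the element at the given position (1 is the first); last() is the last element, and last()-1 counts back from the last.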
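The syntax forms in the table can be exercised on a small inline document. This is a sketch only: the BOOKS string is invented for the example and merely resembles the books.xml file used elsewhere in this article.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data, not the article's codes/books.xml
BOOKS = """
<books>
  <book category="Python"><title>Web Crawlers</title><year>2018</year></book>
  <book category="Ops"><title>K8S Guide</title><year>2018</year></book>
  <book category="Blockchain"><title>Smart Contracts</title><year>2019</year></book>
</books>
"""
root = ET.fromstring(BOOKS)

# tag/tag: child and grandchild selection
titles = [e.text for e in root.findall("book/title")]
# [@attrib='value']: filter on an attribute value
by_attr = root.find("book[@category='Ops']/title").text
# [tag='text']: filter on the text of a child element
by_child_text = [b.get("category") for b in root.findall("book[year='2018']")]
# [position]: last() selects the final matching element
last_title = root.find("book[last()]/title").text
# .//tag: descendants at any depth
all_years = [e.text for e in root.findall(".//year")]
# ..: step back up to the parent
parents = [p.tag for p in root.findall(".//title/..")]

print(titles, by_attr, by_child_text, last_title, all_years, parents)
```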

    3.1. The findall function

    findall returns a list. Its implementation is:

    def findall(elem, path, namespaces=None):
        return list(iterfind(elem, path, namespaces))

    In other words, findall simply collects the results of iterfind into a list.

    Parameters:

    elem: the element to search from.

    path: the search path.

    namespaces=None: an optional mapping from namespace prefix to full URI, used to resolve prefixes that appear in path.

    import xml.etree.ElementPath
    import xml.etree.ElementInclude

    print(xml.etree.ElementPath.ops)
    print(xml.etree.ElementPath.xpath_tokenizer_re)

    root = xml.etree.ElementInclude.default_loader('codes/books.xml', 'xml')
    eles = xml.etree.ElementPath.findall(root, 'book')
    print('findall result:----------')
    print(eles)

    {'': , '*': , '.': , '..': , '//': , '[': }

    re.compile('(\'[^\']*\'|\\"[^\\"]*\\"|::|//?|\\.\\.|\\(\\)|[/.*:\\[\\]\\(\\)@=])|((?:\\{[^}]+\\})?[^/\\[\\]\\(\\)@=\\s]+)|\\s+')

    findall result:----------

    [, , ]
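The namespaces parameter is easiest to understand with a namespaced document. In this sketch the namespace URI http://example.com/books and the prefix bk are assumptions chosen for the example; the prefix used in the path does not have to match the prefix used in the document.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementPath

root = ET.fromstring(
    '<shelf xmlns:b="http://example.com/books">'
    '<b:book>A</b:book><b:book>B</b:book><note>n</note>'
    '</shelf>'
)

# Map a prefix of our choosing to the namespace URI
ns = {"bk": "http://example.com/books"}

# Prefixed form, resolved through the namespaces mapping
prefixed = ElementPath.findall(root, "bk:book", ns)
# Equivalent fully qualified form, no mapping required
expanded = ElementPath.findall(root, "{http://example.com/books}book")

print([e.text for e in prefixed], [e.text for e in expanded])
```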

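    3.2. The find function

    find returns the first matching element. Its implementation is:

    def find(elem, path, namespaces=None):
        return next(iterfind(elem, path, namespaces), None)

    As the implementation shows, find takes the first item from the iterator returned by iterfind and returns None when nothing matches. Every call to iterfind starts a new search, so find always returns the first match.

    import xml.etree.ElementPath
    import xml.etree.ElementInclude

    root = xml.etree.ElementInclude.default_loader('codes/books.xml', 'xml')
    ele = xml.etree.ElementPath.find(root, 'book')
    # Compare with None: an Element without children is falsy, so `if ele:` can skip a valid match
    if ele is not None:
        print(ele.attrib)

    {'category': 'Python'}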
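When checking the result of find, an explicit comparison with None is the safer test, because an element's truth value depends on whether it has children. A minimal sketch with an invented one-book document:

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<books><book category='Python'/></books>")

book = root.find("book")         # matches, but the element has no children
missing = root.find("magazine")  # no match, so the result is None

found = book is not None         # reliable "did we find it" test
has_children = len(book) > 0     # what a truthiness test would actually measure

print(found, has_children, missing)
```

Here the book element exists but is empty, so a plain `if book:` check would treat a successful match as a failure.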

    3.3. The findtext function

    findtext returns the text of the first element that matches path. Its implementation is:

    def findtext(elem, path, default=None, namespaces=None):
        try:
            elem = next(iterfind(elem, path, namespaces))
            return elem.text or ""
        except StopIteration:
            return default

    Note: the return value is the text of the matched element, not the element itself.

    Parameters:

    default: the value returned when no element matches path. If an element matches but has no text, an empty string is returned instead.

    import xml.etree.ElementPath
    import xml.etree.ElementInclude

    root = xml.etree.ElementInclude.default_loader('codes/note.xml', 'xml')
    ele = xml.etree.ElementPath.findtext(root, 'to', 'default value')
    print(ele)

    Tove
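The three possible outcomes of findtext follow directly from the implementation shown above. The sketch below uses an invented note document to show each case.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<note><to>Tove</to><body/></note>")

with_text = root.findtext("to", "fallback")      # element found, has text
without_text = root.findtext("body", "fallback")  # element found, no text
not_found = root.findtext("from", "fallback")     # no such element

print(repr(with_text), repr(without_text), repr(not_found))
```

Only the last call returns the default; an empty element yields an empty string, not the default.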

    3.4. The iterfind function

    iterfind returns an iterator; in practice it is a generator (class 'generator'). Its signature is:

    def iterfind(elem, path, namespaces=None):

    import xml.etree.ElementPath
    import xml.etree.ElementInclude

    root = xml.etree.ElementInclude.default_loader('codes/books.xml', 'xml')
    eles = xml.etree.ElementPath.iterfind(root, 'book')
    print(eles)
    print(type(eles))
    for e in eles:
        print(e)

    <generator object prepare_child.<locals>.select at 0x105a487d8>
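Since iterfind produces its results lazily, a caller can take one match and continue later, or stop early without collecting a full list. A small sketch with a generated five-book document (the id attribute is invented for the example):

```python
import xml.etree.ElementTree as ET

xml_text = "<books>" + "".join(f"<book id='{i}'/>" for i in range(5)) + "</books>"
root = ET.fromstring(xml_text)

matches = root.iterfind("book")

# Consume one result, then the remainder; the iterator remembers its position
first = next(matches)
rest = [e.get("id") for e in matches]

# Once exhausted, the iterator yields nothing more
leftover = list(matches)

print(first.get("id"), rest, leftover)
```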

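    IV. The ElementTree module

    The ElementTree module provides a tree-based (DOM-style) XML parsing implementation. Its classes are:

    builtins.SyntaxError(builtins.Exception)

    ParseError

    builtins.object

    Element: the basic unit of XML, an element

    ElementTree: the tree structure formed by XML elements

    QName: a qualified name wrapper (namespace URI plus local name)

    TreeBuilder: the XML tree builder

    XMLParser: the XML parser

    XMLPullParser: the non-blocking XML parser

    It also provides a set of convenience functions: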

    Comment(text=None)

    PI = ProcessingInstruction(target, text=None)

    ProcessingInstruction(target, text=None)

    SubElement(...)

    XML(text, parser=None)

    XMLID(text, parser=None)

    dump(elem)

    fromstring = XML(text, parser=None)

    fromstringlist(sequence, parser=None)

    iselement(element)

    iterparse(source, events=None, parser=None)

    parse(source, parser=None)

    register_namespace(prefix, uri)

    tostring(element, encoding=None, method=None, *, short_empty_elements=True)

    tostringlist(element, encoding=None, method=None, *, short_empty_elements=True)

    4.1. Blocking parsing with TreeBuilder and XMLParser

    TreeBuilder builds the tree of Element objects, and XMLParser parses the XML content.

    TreeBuilder provides the basic tree-building machinery; once the root Element is captured, the whole Element tree is reachable through it.

    # coding = utf-8
    from xml.etree.ElementTree import TreeBuilder
    from xml.etree.ElementTree import XMLParser

    class MyBuilder(TreeBuilder):
        is_root = True
        root_element = None

        def start(self, tag, attrs):
            elem = super().start(tag, attrs)
            # The first start event belongs to the root element
            if self.is_root:
                self.root_element = elem
                self.is_root = False
            return elem

    builder = MyBuilder()
    parser = XMLParser(target=builder)
    # Open the file with an explicit encoding and close it when done
    with open('codes/books.xml', 'r', encoding='utf-8') as fd:
        xml_data = fd.read()
    parser.feed(xml_data)
    root = builder.root_element

    # Iterate over elements directly; getchildren() was removed in Python 3.9
    for item in root:
        print(item.tag, ':', item.attrib)
        for it in item:
            print('\t|-', it, ':', it.tag, ':', it.text)

    book : {'category': 'Python'}

    |- : title : 网络爬虫开发

    |- : author : 蜘蛛精

    |- : year : 2018

    |- : price : 66.50

    |- : publisher : 清华大学出版社

    book : {'category': '系统运维'}

    |- : title : K8S运维指南r

    |- : author : 马哥教育

    |- : year : 2018

    |- : price : 99.00

    |- : publisher : 机械版社

    book : {'category': '区块链'}

    |- : title : 以太坊智能合约开发

    |- : author : 钱多多

    |- : year : 2019

    |- : price : 88.95

    |- : publisher : 邮电出版社
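The target passed to XMLParser does not have to be a TreeBuilder. As a sketch of the callback style described above, the hypothetical TagCounter class below implements the target methods itself and counts tags instead of building a tree; the class name and the sample XML are invented for the example.

```python
from xml.etree.ElementTree import XMLParser


class TagCounter:
    """Parser target that counts how often each tag is opened."""

    def __init__(self):
        self.counts = {}

    def start(self, tag, attrib):
        # Called for every opening tag
        self.counts[tag] = self.counts.get(tag, 0) + 1

    def end(self, tag):
        # Called for every closing tag; nothing to do here
        pass

    def data(self, data):
        # Called for text content; ignored in this example
        pass

    def close(self):
        # The return value becomes the result of parser.close()
        return self.counts


parser = XMLParser(target=TagCounter())
# The document may be fed in several pieces
parser.feed("<books><book><title>A</title></book>")
parser.feed("<book><title>B</title></book></books>")
counts = parser.close()

print(counts)
```

Because no tree is built, this style suits documents where only summary information is needed.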

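    4.2. Non-blocking parsing with XMLPullParser

    XMLPullParser differs from XMLParser in that it is non-blocking. XMLParser pushes parse events to callbacks on a target, whereas XMLPullParser queues the events so that the caller can pull them when convenient.

    XMLPullParser.read_events() returns an iterator (a generator) over (event, element) pairs.

    For a document without namespace declarations, the first event is the start of the root element, so the tree can be traversed from that element.

    # coding = utf-8
    from xml.etree.ElementTree import XMLPullParser

    events = ("start", "end", "start-ns", "end-ns")
    parser = XMLPullParser(events=events)
    with open('codes/books.xml', 'r', encoding='utf-8') as fd:
        xml_data = fd.read()
    parser.feed(xml_data)

    # Collect the events into a list
    re_events = list(parser.read_events())
    # The element of the first event is the root of the document
    root_element = re_events[0][1]

    # Traverse the element tree from the root
    def list_tree(element, depth):
        # element.text can be None, so fall back to an empty string before stripping
        text = (element.text or '').strip()
        print('\t' * depth, element.tag, ":", text)
        # Iterate over the children directly; getchildren() was removed in Python 3.9
        for e_ in element:
            list_tree(e_, depth + 1)

    list_tree(root_element, 0)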

    books :

    book :

    title : 网络爬虫开发

    author : 蜘蛛精

    year : 2018

    price : 66.50

    publisher : 清华大学出版社

    book :

    title : K8S运维指南r

    author : 马哥教育

    year : 2018

    price : 99.00

    publisher : 机械版社

    book :

    title : 以太坊智能合约开发

    author : 钱多多

    year : 2019

    price : 88.95

    publisher : 邮电出版社
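The non-blocking style is most useful when data arrives in pieces. The sketch below feeds an invented document in three arbitrary chunks and pulls whatever events are available after each one; the chunk boundaries are an assumption for illustration and deliberately split a tag in the middle.

```python
from xml.etree.ElementTree import XMLPullParser

parser = XMLPullParser(events=("start", "end"))
seen = []

# Chunks as they might arrive from a socket or a slow file read
for chunk in ("<books><book><ti", "tle>A</title>", "</book></books>"):
    parser.feed(chunk)
    # Pull only the events that are ready so far; never blocks
    for event, elem in parser.read_events():
        seen.append((event, elem.tag))

parser.close()
# Drain anything the parser released only when it was closed
seen.extend((event, elem.tag) for event, elem in parser.read_events())

print(seen)
```

How many events become available after each individual chunk can vary with the underlying parser, but the overall order of events is fixed by the document.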

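    4.3. Element and ElementTree objects

    ElementTree is essentially a wrapper around Element. As the XMLParser and XMLPullParser examples show, the parsers already produce a basic Element tree.

    ElementTree, XMLParser, and XMLPullParser all yield the same Element tree structure. ElementTree adds a more convenient interface for parsing and loading XML, and it can also save XML.

    An Element object wraps a single tag and exposes the following data:

    | -attrib

    |  A dictionary holding all attributes of the element.

    |

    | -tag

    |  A string holding the tag name of the element.

    |

    | -tail

    |  A string holding the text after the element's end tag; may be None.

    |

    | -text

    |  A string holding the text after the element's start tag; may be None.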
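The difference between text and tail shows up in mixed content. A minimal sketch with an invented paragraph element:

```python
import xml.etree.ElementTree as ET

elem = ET.fromstring("<p>Hello <b>bold</b> world</p>")
b = elem.find("b")

outer_text = elem.text   # text between <p> and the first child
inner_text = b.text      # text inside <b>
inner_tail = b.tail      # text after </b>, up to the next tag
outer_tail = elem.tail   # nothing follows </p>

# itertext() walks text and tail values in document order
full_text = "".join(elem.itertext())

print(repr(outer_text), repr(inner_text), repr(inner_tail), repr(outer_tail))
print(full_text)
```

The text that follows a child element belongs to that child's tail, not to the parent's text.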

    Constructor:

    | __init__(self, /, *args, **kwargs)

    It also provides a set of methods that operate on this data:

    |

    | -makeelement(self, tag, attrib, /)

    |

    | -append(self, subelement, /)

    |

    | -insert(self, index, subelement, /)

    |

    | -remove(self, subelement, /)

    |

    | -set(self, key, value, /)

    |

    | -clear(self, /)

    |

    | -extend(self, elements, /)

    |

    | -find(self, /, path, namespaces=None)

    |

    | -findall(self, /, path, namespaces=None)

    |

    | -findtext(self, /, path, default=None, namespaces=None)

    |

    | -get(self, /, key, default=None)

    |

    | -getchildren(self, /)

    |

    | -getiterator(...)

    |  iter($self, /, tag=None)

    |  --

    |

    | -items(self, /)

    |

    | -iter(self, /, tag=None)

    |

    | -iterfind(self, /, path, namespaces=None)

    |

    | -itertext(self, /)

    |

    | -keys(self, /)

    ElementTree objects provide the following constructor and methods:

    | - __init__(self, element=None, file=None)

    | - find(self, path, namespaces=None)

    | - findall(self, path, namespaces=None)

    | - findtext(self, path, default=None, namespaces=None)

    | - getiterator(self, tag=None)

    | - getroot(self)

    | - iter(self, tag=None)

    | - iterfind(self, path, namespaces=None)

    | - parse(self, source, parser=None)

    | - write(self, file_or_filename, encoding=None, xml_declaration=None, default_namespace=None, method=None, *, short_empty_elements=True)

    | - write_c14n(self, file)

    Note:

    About c14n: C14N (Canonical XML) is the W3C standard for normalizing XML data.

    In the library version shown here there is no real c14n implementation yet. Python 3.8 later added xml.etree.ElementTree.canonicalize(), which implements C14N 2.0.

    # coding = utf-8
    from xml.etree.ElementTree import ElementTree

    tree = ElementTree()
    tree.parse('codes/books.xml')
    root_element = tree.getroot()

    # Traverse the element tree from the root
    def list_tree(element, depth):
        # element.text can be None, so fall back to an empty string before stripping
        text = (element.text or '').strip()
        print('\t' * depth, element.tag, ":", text)
        # Iterate over the children directly; getchildren() was removed in Python 3.9
        for e_ in element:
            list_tree(e_, depth + 1)

    list_tree(root_element, 0)

    books :

    book :

    title : 网络爬虫开发

    author : 蜘蛛精

    year : 2018

    price : 66.50

    publisher : 清华大学出版社

    book :

    title : K8S运维指南r

    author : 马哥教育

    year : 2018

    price : 99.00

    publisher : 机械版社

    book :

    title : 以太坊智能合约开发

    author : 钱多多

    year : 2019

    price : 88.95

    publisher : 邮电出版社

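    ElementTree can also wrap an existing root element through its constructor, which makes further operations such as saving more convenient.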
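As a sketch of wrapping a root element, the example below builds a small tree with the Element methods listed earlier, wraps it in an ElementTree, writes it out, and parses it back. The tag names and values are invented, and an in-memory buffer stands in for a file on disk.

```python
import io
import xml.etree.ElementTree as ET

# Build a small tree by hand
root = ET.Element("books")
book = ET.SubElement(root, "book", {"category": "Python"})
ET.SubElement(book, "title").text = "Example Title"
book.set("lang", "en")  # add one more attribute after creation

# Wrap the root so the tree can be written out
tree = ET.ElementTree(root)
buffer = io.BytesIO()
tree.write(buffer, encoding="utf-8", xml_declaration=True)
serialized = buffer.getvalue()

# Parse the bytes back to confirm the round trip
reparsed = ET.parse(io.BytesIO(serialized))
title = reparsed.getroot().findtext("book/title")
category = reparsed.getroot().find("book").get("category")

print(serialized.decode("utf-8"))
print(title, category)
```

Passing a file name instead of the buffer to write and parse works the same way.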

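    V. The cElementTree module

    cElementTree is an alias of the xml.etree.ElementTree module. It has been deprecated since Python 3.3 and was removed in Python 3.9, so new code should import xml.etree.ElementTree directly.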


    This article collects typical usage examples of the lxml.etree.XPath method in Python. If you are wondering how etree.XPath is used in practice, the selected code examples below may help. You can also look further into usage examples for lxml.etree, the module that contains this method.

    A total of 28 code examples of etree.XPath are shown below, sorted by popularity by default.

    Example 1: post

    # Required import: from lxml import etree [as alias]
    # Or: from lxml.etree import XPath [as alias]
    def post(self, html):
        """
        Try to play with request ...
        """
        import urllib2
        response = urllib2.urlopen('file://%s' % html)
        data = response.read()
        post = etree.HTML(data)
        # find text function
        find_text = etree.XPath("//text()", smart_strings=False)
        LOG.info(find_text(post))
        post.clear()

    Developer: gramps-project; project: addons-source; lines of code: 21

    Example 2: test_parse_rule

    # Required import: from lxml import etree [as alias]
    # Or: from lxml.etree import XPath [as alias]
    def test_parse_rule():
        """Ensure parse_rule returns expected output."""
        expr = XPath("//Num")
        assert parse_rule(
            rule_name='',
            rule_values=dict(
                description='',
                expr=expr,
                example="a = 1",
                instead="a = int('1')",
                settings=Settings(included=[], excluded=[], allow_ignore=True),
            )
        ) == Rule(
            name='',
            description='',
            expr=expr,
            example="a = 1",
            instead="a = int('1')",
            settings=Settings(included=[], excluded=[], allow_ignore=True)
        )

    Developer: hchasestevens; project: bellybutton; lines of code: 22

    Example 3: _details_prepare_merge

    # Required import: from lxml import etree [as alias]
    # Or: from lxml.etree import XPath [as alias]
    def _details_prepare_merge(details):
        # We may mutate the details later, so copy now to prevent
        # affecting the caller's data.
        details = details.copy()

        # Prepare an nsmap in an OrderedDict. This ensures that lxml
        # serializes namespace declarations in a stable order.
        nsmap = OrderedDict((ns, ns) for ns in sorted(details))

        # Root everything in a namespace-less element. Setting the nsmap
        # here ensures that prefixes are preserved when dumping later.
        # This element will be replaced by the root of the lshw detail.
        # However, if there is no lshw detail, this root element shares
        # its tag with the tag of an lshw XML tree, so that XPath
        # expressions written with the lshw tree in mind will still work
        # without it, e.g. "/list//{lldp}something".
        root = etree.Element("list", nsmap=nsmap)

        # We have copied details, and root is new.
        return details, root

    Developer: maas; project: maas; lines of code: 22

    Example 4: _details_do_merge

    # Required import: from lxml import etree [as alias]
    # Or: from lxml.etree import XPath [as alias]
    def _details_do_merge(details, root):
        # Merge the remaining details into the composite document.
        for namespace in sorted(details):
            xmldata = details[namespace]
            if xmldata is not None:
                try:
                    detail = etree.fromstring(xmldata)
                except etree.XMLSyntaxError as e:
                    maaslog.warning("Invalid %s details: %s", namespace, e)
                else:
                    # Add the namespace to all unqualified elements.
                    for elem in detail.iter("{}*"):
                        elem.tag = etree.QName(namespace, elem.tag)
                    root.append(detail)

        # Re-home `root` in a new tree. This ensures that XPath
        # expressions like "/some-tag" work correctly. Without this, when
        # there's well-formed lshw data -- see the backward-compatibility
        # hack further up -- expressions would be evaluated from the first
        # root created in this function, even though that root is now the
        # parent of the current `root`.
        return etree.ElementTree(root)

    Developer: maas; project: maas; lines of code: 24

    Example 5: merge_details_cleanly

    # Required import: from lxml import etree [as alias]
    # Or: from lxml.etree import XPath [as alias]
    def merge_details_cleanly(details):
        """Merge node details into a single XML document.

        `details` should be of the form::

          {"name": xml-as-bytes, "name2": xml-as-bytes, ...}

        where `name` is the namespace (and prefix) where each detail's XML
        should be placed in the composite document; elements in each
        detail document without a namespace are moved into that namespace.

        This is similar to `merge_details`, but the ``lshw`` detail is not
        treated specially. The result of this function is not compatible
        with XPath expressions created for old releases of MAAS.

        The returned document is always rooted with a ``list`` element.
        """
        details, root = _details_prepare_merge(details)
        return _details_do_merge(details, root)

    Developer: maas; project: maas; lines of code: 21

    Example 6: match_xpath

    # Required import: from lxml import etree [as alias]
    # Or: from lxml.etree import XPath [as alias]
    def match_xpath(xpath, doc):
        """Return a match of expression `xpath` against document `doc`.

        :type xpath: Either `unicode` or `etree.XPath`
        :type doc: Either `etree._ElementTree` or `etree.XPathDocumentEvaluator`

        :rtype: bool
        """
        is_xpath_compiled = is_compiled_xpath(xpath)
        is_doc_compiled = is_compiled_doc(doc)

        if is_xpath_compiled and is_doc_compiled:
            return doc(xpath.path)
        elif is_xpath_compiled:
            return xpath(doc)
        elif is_doc_compiled:
            return doc(xpath)
        else:
            return doc.xpath(xpath)

    Developer: maas; project: maas; lines of code: 21

    Example 7: try_match_xpath

    # Required import: from lxml import etree [as alias]
    # Or: from lxml.etree import XPath [as alias]
    def try_match_xpath(xpath, doc, logger=logging):
        """See if the XPath expression matches the given XML document.

        Invalid XPath expressions are logged, and are returned as a
        non-match.

        :type xpath: Either `unicode` or `etree.XPath`
        :type doc: Either `etree._ElementTree` or `etree.XPathDocumentEvaluator`

        :rtype: bool
        """
        try:
            # Evaluating an XPath expression against a document with LXML
            # can return a list or a string, and perhaps other types.
            # Casting the return value into a boolean context appears to
            # be the most reliable way of detecting a match.
            return bool(match_xpath(xpath, doc))
        except etree.XPathEvalError as error:
            # Get a plaintext version of `xpath`.
            expr = xpath.path if is_compiled_xpath(xpath) else xpath
            logger.warning("Invalid expression '%s': %s", expr, str(error))
            return False

    Developer: maas; project: maas; lines of code: 24

    Example 8: scenario

    # Required import: from lxml import etree [as alias]
    # Or: from lxml.etree import XPath [as alias]
    def scenario(name, xpath, doc, expected_result, expected_log=""):
        """Return a scenario (for `testscenarios`) to test `try_match_xpath`.

        This is a convenience function to reduce the amount of
        boilerplate when constructing `scenarios_inputs` later on.

        The scenario it constructs defines an XML document, and XPath
        expression, the expectation as to whether it will match or
        not, and the expected log output.
        """
        doc = etree.fromstring(doc).getroottree()
        return (
            name,
            dict(
                xpath=xpath,
                doc=doc,
                expected_result=expected_result,
                expected_log=dedent(expected_log),
            ),
        )

    # Exercise try_match_xpath with a variety of different inputs.

    Developer: maas; project: maas; lines of code: 24

    Example 9: populate_tag_for_multiple_nodes

    # Required import: from lxml import etree [as alias]
    # Or: from lxml.etree import XPath [as alias]
    def populate_tag_for_multiple_nodes(tag, nodes, batch_size=DEFAULT_BATCH_SIZE):
        """Reevaluate a single tag for multiple nodes.

        Presumably this tag's expression has recently changed. Use `populate_tags`
        when many nodes need reevaluating AND there are rack controllers available
        to which to farm-out work. Use this only when many nodes need reevaluating
        locally, i.e. when there are no rack controllers connected.
        """
        # Same expression, multiple documents: compile expression with XPath.
        xpath = etree.XPath(tag.definition, namespaces=tag_nsmap)
        # The XML details documents can be large so work in batches.
        for batch in gen_batches(nodes, batch_size):
            probed_details = get_probed_details(batch)
            probed_details_docs_by_node = {
                node: merge_details(probed_details[node.system_id])
                for node in batch
            }
            nodes_matching, nodes_nonmatching = classify(
                partial(try_match_xpath, xpath, logger=maaslog),
                probed_details_docs_by_node.items(),
            )
            tag.node_set.remove(*nodes_nonmatching)
            tag.node_set.add(*nodes_matching)

    Developer: maas; project: maas; lines of code: 25

    Example 10: test_DictCharWidget_renders_with_empty_string_as_input_data

    # Required import: from lxml import etree [as alias]
    # Or: from lxml.etree import XPath [as alias]
    def test_DictCharWidget_renders_with_empty_string_as_input_data(self):
        names = [factory.make_string(), factory.make_string()]
        initials = []
        labels = [factory.make_string(), factory.make_string()]
        widget = DictCharWidget(
            [widgets.TextInput, widgets.TextInput, widgets.CheckboxInput],
            names,
            initials,
            labels,
            skip_check=True,
        )
        name = factory.make_string()
        html_widget = fromstring(
            "" + widget.render(name, "") + ""
        )
        widget_names = XPath("fieldset/input/@name")(html_widget)
        widget_labels = XPath("fieldset/label/text()")(html_widget)
        expected_names = [
            "%s_%s" % (name, widget_name) for widget_name in names
        ]
        self.assertEqual(
            [expected_names, labels], [widget_names, widget_labels]
        )

    Developer: maas; project: maas; lines of code: 25

    Example 11: filter_add

    # Required import: from lxml import etree [as alias]
    # Or: from lxml.etree import XPath [as alias]
    def filter_add(self, xpath):
        self.filters.append(ET.XPath(xpath))

    Developer: openSUSE; project: openSUSE-release-tools; lines of code: 4

    Example 12: group_by

    # Required import: from lxml import etree [as alias]
    # Or: from lxml.etree import XPath [as alias]
    def group_by(self, xpath, required=False):
        self.groups.append(ET.XPath(xpath))
        if required:
            self.filter_add(xpath)

    Developer: openSUSE; project: openSUSE-release-tools; lines of code: 6

    Example 13: makeXmlPageFromRaw

    # Required import: from lxml import etree [as alias]
    # Or: from lxml.etree import XPath [as alias]
    def makeXmlPageFromRaw(xml):
        """ Discard the metadata around a element in string"""
        root = etree.XML(xml)
        find = etree.XPath("//*[local-name() = 'page']")
        # The tag will inherit the namespace, like:
        #
        # FIXME: pretty_print doesn't seem to work, only adds a newline
        return etree.tostring(find(root)[0], pretty_print=True)

    Developer: WikiTeam; project: wikiteam; lines of code: 10

    Example 14: final_attribute_name

    # Required import: from lxml import etree [as alias]
    # Or: from lxml.etree import XPath [as alias]
    def final_attribute_name(xpath):
        """
        Find the final text element of an xpath which we will assume is the name
        of an attribute.

        TODO: find a better and less error-prone way to do this!
        """
        if type(xpath) == XPath:  ## in case compiled:
            pathstring = xpath.path
        else:
            pathstring = xpath
        # Raw string so that the escaped parentheses reach the regex engine unchanged
        fragments = re.split(r"[/:@\(\)]+", pathstring)
        return fragments[-1]

    Developer: CSTR-Edinburgh; project: Ossian; lines of code: 15

    Example 15: _make_xpath_builder

    # Required import: from lxml import etree [as alias]
    # Or: from lxml.etree import XPath [as alias]
    def _make_xpath_builder(self):
        namespaces = {
            'ds' : 'http://www.w3.org/2000/09/xmldsig#',
            'md' : 'urn:oasis:names:tc:SAML:2.0:metadata',
            'saml' : 'urn:oasis:names:tc:SAML:2.0:assertion',
            'samlp': 'urn:oasis:names:tc:SAML:2.0:protocol'
        }

        def xpath_with_namespaces(xpath_str):
            return etree.XPath(xpath_str, namespaces=namespaces)

        return xpath_with_namespaces

    Developer: bluedatainc; project: jupyterhub-samlauthenticator; lines of code: 14

    Example 16: _get_username_from_saml_etree

    # Required import: from lxml import etree [as alias]
    # Or: from lxml.etree import XPath [as alias]
    def _get_username_from_saml_etree(self, signed_xml):
        xpath_with_namespaces = self._make_xpath_builder()
        xpath_fun = xpath_with_namespaces(self.xpath_username_location)
        xpath_result = xpath_fun(signed_xml)

        if isinstance(xpath_result, etree._ElementUnicodeResult):
            return xpath_result
        if type(xpath_result) is list and len(xpath_result) > 0:
            return xpath_result[0]

        self.log.warning('Could not find name from name XPath')
        return None

    Developer: bluedatainc; project: jupyterhub-samlauthenticator; lines of code: 15

    Example 17: _get_roles_from_saml_etree

    # Required import: from lxml import etree [as alias]
    # Or: from lxml.etree import XPath [as alias]
    def _get_roles_from_saml_etree(self, signed_xml):
        if self.xpath_role_location:
            xpath_with_namespaces = self._make_xpath_builder()
            xpath_fun = xpath_with_namespaces(self.xpath_role_location)
            xpath_result = xpath_fun(signed_xml)

            if xpath_result:
                return xpath_result

            self.log.warning('Could not find role from role XPath')
        else:
            self.log.warning('Role XPath not set')

        return []

    Developer: bluedatainc; project: jupyterhub-samlauthenticator; lines of code: 16

    Example 18: lint_file

    # Required import: from lxml import etree [as alias]
    # Or: from lxml.etree import XPath [as alias]
    def lint_file(filepath, file_contents, rules):
        """Run rules against file, yielding any failures."""
        matching_rules = [
            rule
            for rule in rules
            if rule_settings_match(rule, filepath)
        ]
        if not matching_rules:
            return

        ignored_lines = get_ignored_lines(file_contents)
        xml_ast = file_contents_to_xml_ast(file_contents)  # todo - use caching module?

        for rule in sorted(matching_rules, key=attrgetter('name')):
            # TODO - hacky - need to find better way to do this (while keeping chain)
            # TODO - possibly having both filepath and contents/input supplied?
            if isinstance(rule.expr, XPath):
                matching_lines = set(find_in_ast(
                    xml_ast,
                    rule.expr.path,
                    return_lines=True
                ))
            elif isinstance(rule.expr, re._pattern_type):
                matching_lines = {
                    file_contents[:match.start()].count('\n') + 1  # TODO - slow
                    for match in re.finditer(rule.expr, file_contents)
                }
            elif callable(rule.expr):
                matching_lines = set(rule.expr(file_contents))
            else:
                continue  # todo - maybe throw here?

            if rule.settings.allow_ignore:
                matching_lines -= ignored_lines

            if not matching_lines:
                yield LintingResult(rule, filepath, succeeded=True, lineno=None)

            for line in matching_lines:
                yield LintingResult(rule, filepath, succeeded=False, lineno=line)

    Developer: hchasestevens; project: bellybutton; lines of code: 42

    Example 19: xpath

    # Required import: from lxml import etree [as alias]
    # Or: from lxml.etree import XPath [as alias]
    def xpath(loader, node):
        """Construct XPath expressions."""
        value = loader.construct_scalar(node)
        return XPath(value)

    Developer: hchasestevens; project: bellybutton; lines of code: 6

    Example 20: test_parse_rule_requires_settings

    # Required import: from lxml import etree [as alias]
    # Or: from lxml.etree import XPath [as alias]
    def test_parse_rule_requires_settings():
        """Ensure parse_rule raises an exception if settings are not provided."""
        with pytest.raises(InvalidNode):
            parse_rule(
                rule_name='',
                rule_values=dict(
                    description='',
                    expr=XPath("//Num"),
                    example="a = 1",
                    instead="a = int('1')",
                )
            )

    Developer: hchasestevens; project: bellybutton; lines of code: 14

    Example 21: _xp_all_of

    # Required import: from lxml import etree [as alias]
    # Or: from lxml.etree import XPath [as alias]
    def _xp_all_of(types):
        xp = is_instance_xpath(types)
        return XPath('''./descendant-or-self::*[
            {predicate}
        ]'''.format(predicate=xp))

    Developer: scrapinghub; project: js2xml; lines of code: 7

    Example 22: is_instance

    # Required import: from lxml import etree [as alias]
    # Or: from lxml.etree import XPath [as alias]
    def is_instance(tree, types=None):
        if types is None:
            types = (dict, list)
        return XPath(is_instance_xpath(types))(tree)

    Developer: scrapinghub; project: js2xml; lines of code: 6

    Example 23: escape_text

    # Required import: from lxml import etree [as alias]
    # Or: from lxml.etree import XPath [as alias]
    def escape_text(self, txt):
        result = txt
        for k, v in TiKZMaker.escapes.items():
            result = result.replace(k, v)
        return result

    # get_all_text = etree.XPath('.//text()')

    Developer: paaguti; project: svg2tikz; lines of code: 9

    Example 24: css

    # Required import: from lxml import etree [as alias]
    # Or: from lxml.etree import XPath [as alias]
    def css(self, css):
        return etree.XPath(HTMLTranslator().css_to_xpath(css))(self.tree)

    Developer: elliterate; project: capybara.py; lines of code: 4

    Example 25: process_node_tags

    # Required import: from lxml import etree [as alias]
    # Or: from lxml.etree import XPath [as alias]
    def process_node_tags(
        rack_id,
        nodes,
        tag_name,
        tag_definition,
        tag_nsmap,
        client,
        batch_size=None,
    ):
        """Update the nodes for a new/changed tag definition.

        :param rack_id: System ID for the rack controller.
        :param nodes: List of nodes to process tags for.
        :param client: A `MAASClient` used to fetch the node's details via
            calls to the web API.
        :param tag_name: Name of the tag to update nodes for
        :param tag_definition: Tag definition
        :param batch_size: Size of batch
        """
        # We evaluate this early, so we can fail before sending a bunch of data to
        # the server
        xpath = etree.XPath(tag_definition, namespaces=tag_nsmap)
        system_ids = [node["system_id"] for node in nodes]
        process_all(
            client,
            rack_id,
            tag_name,
            tag_definition,
            system_ids,
            xpath,
            batch_size=batch_size,
        )

    Developer: maas; project: maas; lines of code: 34

    Example 26: is_compiled_xpath

    # Required import: from lxml import etree [as alias]
    # Or: from lxml.etree import XPath [as alias]

    def is_compiled_xpath(xpath):
        """Is `xpath` a compiled expression?"""
        return isinstance(xpath, etree.XPath)

    Developer: maas, project: maas, lines of code: 5

    Example 27: test_logs_to_specified_logger

    # Required import: from lxml import etree [as alias]
    # Or: from lxml.etree import XPath [as alias]

    def test_logs_to_specified_logger(self):
        xpath = etree.XPath("/foo:bar")
        doc = etree.XML("")
        root_logger = self.useFixture(FakeLogger())
        callers_logger = Mock()
        try_match_xpath(xpath, doc, callers_logger)
        self.assertEqual("", root_logger.output)
        self.assertThat(
            callers_logger.warning,
            MockCalledOnceWith(
                "Invalid expression '%s': %s",
                "/foo:bar",
                "Undefined namespace prefix",
            ),
        )

    Developer: maas, project: maas, lines of code: 17

    Example 28: test_merges_into_new_tree

    # Required import: from lxml import etree [as alias]
    # Or: from lxml.etree import XPath [as alias]

    def test_merges_into_new_tree(self):
        xml = self.do_merge_details(
            {
                "lshw": b"Hello",
                "lldp": b"Hello",
            }
        )
        # The presence of a getroot() method indicates that this is a
        # tree object, not an element.
        self.assertThat(xml, MatchesStructure(getroot=IsCallable()))
        # The list tag can be obtained using an XPath expression
        # starting from the root of the tree.
        self.assertSequenceEqual(
            ["list"], [elem.tag for elem in xml.xpath("/list")]
        )

    Developer: maas, project: maas, lines of code: 17

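    Note: The lxml.etree.XPath examples in this article were collected from GitHub, MSDocs and other source-code and documentation platforms. The snippets were selected from open-source projects, and their copyright remains with the original authors; follow each project's License when distributing or using them. Do not reproduce without permission.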

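  • Fixing the missing etree in the lxml package
  • Python etree problem

    2020-11-25 17:09:22
    from lxml import etree headers = { omitted here } url = 'https://www.guazi.com/cs/buy/o1/' resp = requests.get(url,headers=headers) text = print(resp.content.decode('utf-8...
  • Operating environment, software versions and related information. Code related to the problem: import requests from lxml import etree class Spider(object): def start_request(self): #1. Request the novel title and create a folder response = requests.get(...
  • The xml.etree.ElementTree module parses XML through event-based and document-based APIs, can search a parsed document with XPath expressions, and supports creating, reading, updating and deleting document content. Be careful with large XML files, because the whole document is loaded into memory at once, so for a large XML file...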

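    What the xml.etree.ElementTree module does

    It parses XML through event-based and document-based APIs, can search a parsed document with XPath expressions, and supports creating, reading, updating and deleting document content. Be careful with large XML files, because the whole document is loaded into memory at once.

    For a large XML file this module is therefore not recommended; use a SAX parser instead.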
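To make the SAX recommendation concrete, here is a minimal sketch using the standard-library xml.sax module. The handler class NeighborCollector and the inline SAMPLE document are assumptions made up for illustration; they are not part of the original article.

```python
import xml.sax


class NeighborCollector(xml.sax.ContentHandler):
    """Hypothetical handler: record the name of each <neighbor> as it streams past."""

    def __init__(self):
        super().__init__()
        self.names = []

    def startElement(self, name, attrs):
        # Called once per opening tag; no tree is ever built in memory.
        if name == "neighbor":
            self.names.append(attrs.get("name"))


# Assumed sample data, shaped like the country document used in this article.
SAMPLE = b"""<data>
  <country name="Liechtenstein">
    <neighbor name="Austria" direction="E"/>
    <neighbor name="Switzerland" direction="W"/>
  </country>
</data>"""

handler = NeighborCollector()
xml.sax.parseString(SAMPLE, handler)
print(handler.names)
```

For a real file, xml.sax.parse(filename, handler) streams the document in the same way, so memory use stays roughly constant regardless of file size.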

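    Test content to parse

    test.xml has a data root element holding three country elements. Each country has rank, year and gdppc children, plus neighbor children that carry name and direction attributes. The element values are:

    country          rank   year   gdppc
    Liechtenstein    1      2008   141100
    Singapore        4      2011   59900
    Panama           68     2011   13600

    test.xml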

    1. Parsing an XML document

    from xml.etree import ElementTree

    with open('test.xml', 'r', encoding='utf-8') as rf:
        tree = ElementTree.parse(rf)

    print(tree)

    ElementTree_parse_xml.py

    Output

    # Returns an ElementTree object

    2. Walking the parsed XML tree and getting the node names

    from xml.etree import ElementTree

    with open('test.xml', 'r', encoding='utf-8') as rf:
        tree = ElementTree.parse(rf)

    for node in tree.iter():
        print(node.tag)

    ElementTree_dump_xml.py

    Output

    # Prints the name of every node

    data

    country

    rank

    year

    gdppc

    neighbor

    neighbor

    country

    rank

    year

    gdppc

    neighbor

    country

    rank

    year

    gdppc

    neighbor

    neighbor

    3. Walking the parsed XML tree and getting attribute values

    from xml.etree import ElementTree

    with open('test.xml', 'r', encoding='utf-8') as rf:
        tree = ElementTree.parse(rf)

    for node in tree.iter('neighbor'):
        attr_name = node.attrib.get('name')
        attr_direction = node.attrib.get('direction')
        # Print both values when both are set; otherwise print only the name
        if attr_name and attr_direction:
            print('{:<25}{:<25}'.format(attr_name, attr_direction))
        else:
            print('{:<25}'.format(attr_name))

    ElementTree_show_name_direction.py

    Output

    Austria E

    Switzerland W

    Malaysia N

    Costa Rica W

    Colombia E

    4. Finding nodes in an XML document with XPath

    from xml.etree import ElementTree

    with open('test.xml', 'r', encoding='utf-8') as rf:
        tree = ElementTree.parse(rf)

    for node in tree.findall('.//neighbor'):
        name = node.attrib.get('name')
        if name:
            print(name)

    ElementTree_find_feeds_by_tag.py

    Output

    Austria

    Switzerland

    Malaysia

    Costa Rica

    Colombia

    5. Finding nodes one level deeper with XPath

    from xml.etree import ElementTree

    with open('test.xml', 'r', encoding='utf-8') as rf:
        tree = ElementTree.parse(rf)

    for node in tree.findall('.//neighbor/neighbor'):
        name = node.attrib.get('name')
        if name:
            print(name)

    ElementTree_find_feeds_by_structure.py

    Output

    Malaysia

    6. Querying a node's attribute names and values with an XPath expression

    from xml.etree import ElementTree

    with open('test.xml', 'r', encoding='utf-8') as rf:
        tree = ElementTree.parse(rf)

    node = tree.find('./country')
    print('Tag name:', node.tag)
    for name, value in node.attrib.items():
        print('Attribute name: {name}, attribute value: {value}'.format(name=name, value=value))

    ElementTree_node_attributes.py

    Output

    Tag name: country

    Attribute name: name, attribute value: Liechtenstein

    7. Querying the text of several paths with XPath expressions

    from xml.etree import ElementTree

    with open('test.xml', 'r', encoding='utf-8') as rf:
        tree = ElementTree.parse(rf)

    for path in ['./country/year', './country/gdppc']:
        node = tree.find(path)
        print('Node name', node.tag)
        print(node.text)
        print(node.tail)

    ElementTree_node_text.py

    Output

    Node name year
    2008
    Node name gdppc
    141100

    8. Watching tag events while parsing

    from xml.etree.ElementTree import iterparse

    # Current nesting depth
    depth = 0
    # Width of the prefix
    prefix_width = 8
    # Dots used to build the prefix
    prefix_dots = '.' * prefix_width
    # Assemble the format string template
    line_template = ''.join([
        '{prefix:<0.{prefix_len}}',
        '{event:<8}',
        '{suffix:<{suffix_len}} ',
        '{node.tag:<12} ',
        '{node_id}',
    ])

    EVENT_NAMES = ['start', 'end', 'start-ns', 'end-ns']

    for (event, node) in iterparse('test.xml', EVENT_NAMES):
        # On an end event, decrease the depth by 1
        if event == 'end':
            depth -= 1

        # Length of the prefix actually shown
        prefix_len = depth * 2

        print(line_template.format(
            prefix=prefix_dots,  # content of the prefix
            prefix_len=prefix_len,  # length of the prefix
            suffix='',  # content of the suffix
            suffix_len=(prefix_width - prefix_len),  # suffix length = total prefix width - actual prefix length
            event=event,  # the current event
            node_id=id(node),  # memory id of the node
            node=node,  # the Element object
        ))

        # On a start event, increase the depth by 1
        if event == 'start':
            depth += 1

    ElementTree_show_all_events.py

    Output

    start data 3102087901736
    ..start country 3102087901816
    ....start rank 3102087901896
    ....end rank 3102087901896
    ....start year 3102087901976
    ....end year 3102087901976
    ....start gdppc 3102087902056
    ....end gdppc 3102087902056
    ....start neighbor 3102087902136
    ....end neighbor 3102087902136
    ....start neighbor 3102087902216
    ....end neighbor 3102087902216
    ..end country 3102087901816
    ..start country 3102087902296
    ....start rank 3102087902376
    ....end rank 3102087902376
    ....start year 3102087902456
    ....end year 3102087902456
    ....start gdppc 3102087902536
    ....end gdppc 3102087902536
    ....start neighbor 3102087902616
    ......start neighbor 3102087902776
    ......end neighbor 3102087902776
    ....end neighbor 3102087902616
    ..end country 3102087902296
    ..start country 3102087902936
    ....start rank 3102087903016
    ....end rank 3102087903016
    ....start year 3102087903096
    ....end year 3102087903096
    ....start gdppc 3102087903176
    ....end gdppc 3102087903176
    ....start neighbor 3102087903336
    ....end neighbor 3102087903336
    ....start neighbor 3102087903496
    ....end neighbor 3102087903496
    ..end country 3102087902936
    end data 3102087901736

    9. Converting XML to CSV (written to standard output here for testing; in production it would be written to disk)

    import csv
    import sys
    from xml.etree.ElementTree import iterparse

    writer = csv.writer(sys.stdout, quoting=csv.QUOTE_NONNUMERIC)
    group_name = ''
    parsing = iterparse('test.xml', events=['start'])

    for event, node in parsing:
        # Skip the tags we do not want
        if node.tag in ['rank', 'year', 'gdppc']:
            continue
        # An element without a name attribute is a parent tag; otherwise it is a child tag
        if not node.attrib.get('name'):
            group_name = node.attrib.get('text')
        else:
            writer.writerow(
                (group_name, node.attrib.get('name'), node.attrib.get('direction'))
            )

    ElementTree_write_podcast_csv.py

    Test output

    "Liechtenstein","Austria","E"

    "Liechtenstein","Switzerland","W"

    "Singapore","Malaysia","N"

    "Singapore","Malaysia","N"

    "Panama","Costa Rica","W"

    "Panama","Colombia","E"

    10. Creating a custom tree builder

    import csv
    import sys
    from xml.etree.ElementTree import XMLParser


    class PodcastListToCSV(object):

        def __init__(self, output_file):
            self.writer = csv.writer(
                output_file,
                quoting=csv.QUOTE_NONNUMERIC
            )

        def start(self, tag, attrib):
            if tag in ['rank', 'year', 'gdppc']:
                return
            if not attrib.get('name'):
                self.group_name = attrib.get('text')
            else:
                self.writer.writerow(
                    (self.group_name,
                     tag,
                     attrib['name'],
                     attrib['direction'])
                )

        def end(self, tag):
            """Ignore closing tags"""
            pass

        def data(self, data):
            """Ignore the data inside nodes"""
            pass

        def close(self):
            """Nothing special to do here"""
            pass


    target = PodcastListToCSV(sys.stdout)
    parser = XMLParser(target=target)
    with open('test.xml', 'rt') as rf:
        for line in rf:
            parser.feed(line)
    parser.close()

    ElementTree_podcast_csv_treebuilder.py

    Data source

    test.xml, reduced to a single country element with rank 1, year 2008 and gdppc 141100.

    Output

    "Liechtenstein","neighbor","Austria","E"

    "Liechtenstein","neighbor","Switzerland","W"

    11. Parsing XML recursively

    from xml.etree.ElementTree import XML


    def show_node(node):
        if node.text is not None and node.text.strip():
            print('Text content: %s' % node.text)
        if node.tail is not None and node.tail.strip():
            print('Tail content: %s' % node.tail)
        for name, value in sorted(node.attrib.items()):
            print('%s=%s' % (name, value))
        for child in node:
            show_node(child)


    parsed = XML("""
    <root>
      <child id="a">This is child "a".</child>
      <child id="b">This is child "b".</child>
      <child id="c">This is child "c".</child>
    </root>
    """)

    print('parsed =', parsed)

    for elem in parsed:
        show_node(elem)

    ElementTree_XML.py

    Output

    parsed = <Element 'root' at 0x...>
    Text content: This is child "a".
    id=a
    Text content: This is child "b".
    id=b
    Text content: This is child "c".
    id=c

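    12. Looking up child nodes by their id attribute

    from xml.etree.ElementTree import XMLID

    tree, id_map = XMLID('''
    <root>
      <child id="a">This is child "a".</child>
      <child id="b">This is child "b".</child>
      <child id="c">This is child "c".</child>
    </root>
    ''')

    for key, value in sorted(id_map.items()):
        print('%s=%s' % (key, value))

    ElementTree_XMLID.py

    Output

    a=<Element 'child' at 0x...>
    b=<Element 'child' at 0x...>
    c=<Element 'child' at 0x...>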

    13. Creating XML nodes and printing them

    from xml.etree.ElementTree import (Element, SubElement, Comment, tostring)

    top = Element('top')

    comment = Comment('This creates the XML top root node')
    top.append(comment)

    child = SubElement(top, 'child')
    child.text = 'Text of the child node'

    child_with_tail = SubElement(top, 'child_with_tail')
    child_with_tail.text = 'Text of child_with_tail'
    child_with_tail.tail = 'Tail of child_with_tail'

    print(tostring(top, encoding='utf-8').decode('utf-8'))

    ElementTree_create.py

    Output

    <top><!--This creates the XML top root node--><child>Text of the child node</child><child_with_tail>Text of child_with_tail</child_with_tail>Tail of child_with_tail</top>

    14. Creating XML nodes and pretty-printing them

    from xml.etree import ElementTree
    from xml.dom import minidom


    def prettify(elem):
        rough_string = ElementTree.tostring(elem, 'utf-8')
        reparsed = minidom.parseString(rough_string)
        return reparsed.toprettyxml(indent=" ")

    ElementTree_format.py

    from xml.etree.ElementTree import (Element, SubElement, Comment, tostring)
    from ElementTree_format import prettify

    top = Element('top')

    comment = Comment('This creates the XML top root node')
    top.append(comment)

    child = SubElement(top, 'child')
    child.text = 'Text of the child node'

    child_with_tail = SubElement(top, 'child_with_tail')
    child_with_tail.text = 'Text of child_with_tail'
    child_with_tail.tail = 'Tail of child_with_tail'

    print(prettify(top))

    ElementTree_pretty.py

    Output

    <?xml version="1.0" ?>
    <top>
     <!--This creates the XML top root node-->
     <child>Text of the child node</child>
     <child_with_tail>Text of child_with_tail</child_with_tail>
     Tail of child_with_tail
    </top>

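    15. Creating XML nodes and setting element attributes

    from xml.etree.ElementTree import (Element, SubElement, Comment)
    from ElementTree_format import prettify

    # Create the root node and set an attribute (method 1)
    root = Element('root')
    root.set('version', '1.0')

    # Add a comment
    root.append(Comment('This demo tests setting attribute values'))

    # Set an attribute (method 2)
    head = SubElement(root, 'head', {'name': 'My Cyc'})
    head.text = 'This is text'

    title = SubElement(root, 'title')
    title.text = 'My Title'

    print(prettify(root))

    ElementTree_set_attribute.py

    Output

    <?xml version="1.0" ?>
    <root version="1.0">
     <!--This demo tests setting attribute values-->
     <head name="My Cyc">This is text</head>
     <title>My Title</title>
    </root>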

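    16. Extending a node with newly created XML nodes

    from xml.etree.ElementTree import Element
    from ElementTree_format import prettify

    # Create the root node
    root = Element('top')

    # Build three child nodes with a list comprehension
    children = [
        Element('child', {'num': str(i)}) for i in range(3)
    ]

    # Extend the root node with the three children
    root.extend(children)

    print(prettify(root))

    ElementTree_extend.py

    Output

    <?xml version="1.0" ?>
    <top>
     <child num="0"/>
     <child num="1"/>
     <child num="2"/>
    </top>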

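    17. Creating nodes with XML() and extending a node with them

    from xml.etree.ElementTree import (Element, SubElement, XML)
    from ElementTree_format import prettify

    # Create the root node
    root = Element('top')

    # Attach parent to the root node
    parent = SubElement(root, 'parent')

    # Parse an XML fragment into an element
    children = XML('<root><child num="0" /><child num="1" /><child num="2" /></root>')

    # Extend the parent node with the children
    parent.extend(children)

    print(prettify(root))

    ElementTree_extend_node.py

    Output

    <?xml version="1.0" ?>
    <top>
     <parent>
      <child num="0"/>
      <child num="1"/>
      <child num="2"/>
     </parent>
    </top>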

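    18. Creating nodes with XML(): extending a node does not change existing parent-child relationships

    from xml.etree.ElementTree import (Element, SubElement, XML)
    from ElementTree_format import prettify

    # Create the root node
    root = Element('top')

    # Attach two parent nodes to the root node
    parent_a = SubElement(root, 'parent', {'id': 'a'})
    parent_b = SubElement(root, 'parent', {'id': 'b'})

    # Parse an XML fragment into an element
    childrens = XML('<root><child num="0" /><child num="1" /><child num="2" /></root>')

    # Give every child node an id attribute holding its memory id
    for child in childrens:
        child.set('id', str(id(child)))

    # Extend parent_a with the children
    print('A:')
    parent_a.extend(childrens)
    print(prettify(root))

    # Extend parent_b with the children
    print('B:')
    parent_b.extend(childrens)
    print(prettify(root))

    ElementTree_extend_node_copy.py

    Output

    A:
    <?xml version="1.0" ?>

    B:
    <?xml version="1.0" ?>

    As the output shows, the memory ids are the same under both parents: extend() attaches the same element objects rather than copies.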
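The sharing behaviour described above can be checked without reading memory ids off the printed XML. The following is a small sketch under stated assumptions: the XML literal and the 'touched' attribute are made up for illustration, and copy.deepcopy is shown as one possible way to get independent copies, not as something the original article does.

```python
import copy
from xml.etree.ElementTree import Element, SubElement, XML

root = Element('top')
parent_a = SubElement(root, 'parent', {'id': 'a'})
parent_b = SubElement(root, 'parent', {'id': 'b'})

# Assumed sample children; any well-formed fragment would do.
children = XML('<root><child num="0"/><child num="1"/></root>')

# extend() stores references to the same Element objects in both parents.
parent_a.extend(children)
parent_b.extend(children)
shared = all(a is b for a, b in zip(parent_a, parent_b))
print(shared)

# A change made through one parent is therefore visible through the other.
parent_a[0].set('touched', 'yes')
print(parent_b[0].get('touched'))

# If independent children are wanted, deep-copy each one before attaching it.
parent_c = SubElement(root, 'parent', {'id': 'c'})
parent_c.extend(copy.deepcopy(child) for child in children)
independent = parent_c[0] is not parent_a[0]
print(independent)
```

Element objects keep no reference to their parent, which is why the same object can sit under several parents at once.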

    19. Serializing the finished XML to the standard output stream

    import sys
    from xml.etree.ElementTree import (
        Element, SubElement, Comment, ElementTree,
    )
    from ElementTree_format import prettify

    root = Element('root')

    comment = Comment('Comment feature')
    root.append(comment)

    child = SubElement(root, 'child')
    child.text = 'Text of child'

    # The tag name must be a valid XML name, so use child_with_tail here
    child_with_tail = SubElement(root, 'child_with_tail')
    child_with_tail.text = 'Text of child_with_tail'
    child_with_tail.tail = 'Tail of child_with_tail'

    child_with_entity_ref = SubElement(root, 'child_with_entity_ref')
    child_with_entity_ref.text = 'Text of child_with_entity_ref'

    empty_child = SubElement(root, 'empty_child')

    sys.stdout.write(prettify(root))

    ElementTree_write.py

    Output

    <?xml version="1.0" ?>
    <root>
     <!--Comment feature-->
     <child>Text of child</child>
     <child_with_tail>Text of child_with_tail</child_with_tail>
     Tail of child_with_tail
     <child_with_entity_ref>Text of child_with_entity_ref</child_with_entity_ref>
     <empty_child/>
    </root>

    20. Serializing the finished XML to the standard output stream with the xml, html and text methods, which produce different results

    import sys
    from xml.etree.ElementTree import (
        Element, SubElement, ElementTree,
    )

    # Create the root node
    root = Element('root')

    # Add the child node to root
    child = SubElement(root, 'child')
    # Set the text of the child node
    child.text = 'Contains text.'

    # Add empty_child to root
    empty_child = SubElement(root, 'empty_child')

    for method in ['xml', 'html', 'text']:
        print(method)
        sys.stdout.flush()
        ElementTree(root).write(sys.stdout.buffer, method=method)
        print('\n')

    ElementTree_write_method.py

    Output

    xml
    <root><child>Contains text.</child><empty_child /></root>

    html
    <root><child>Contains text.</child><empty_child></empty_child></root>

    text
    Contains text.

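  • [Note] The xml.etree.ElementTree module is not secure against maliciously constructed data. Every element object has the following properties: 1. tag: a string identifying the kind of data the element represents. 2. attrib: a dictionary holding the element's attributes. 3. text: a string...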


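    Introduction

    The Element type is a flexible container object for storing structured data in memory.

    [Note] The xml.etree.ElementTree module is not secure against maliciously constructed data.

    Every element object has the following properties:

    1. tag: a string identifying the kind of data the element represents.

    2. attrib: a dictionary holding the element's attributes.

    3. text: a string holding the element's content.

    4. tail: a string holding the text that follows the element's closing tag.

    5. Any number of child elements.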

    <tag attrib="value">text</tag>tail

    Here tag, attrib, text and tail correspond to items 1 to 4 in the list above.
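The four properties are easiest to tell apart on mixed content. This is a minimal sketch; the `<p>`/`<b>` markup is an assumed sample, not taken from the original article.

```python
import xml.etree.ElementTree as ET

# Assumed sample: text before, inside and after a child element.
root = ET.fromstring('<p>Hello <b class="x">bold</b> world</p>')
bold = root[0]

# text is what comes right after the opening tag, up to the first child.
print(root.tag, repr(root.text))

# tail belongs to the child: it is the text after the child's closing tag.
print(bold.tag, bold.attrib, repr(bold.text), repr(bold.tail))
```

Note that " world" is stored on the `b` element as its tail, not on `p`, which is why code that only reads text can silently lose content.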

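    Elements are created with Element() or SubElement(). The former is the element constructor and builds a standalone element; the latter is a factory function that creates a child of an existing element.

    Once you have a set of elements, wrap them in the ElementTree class to write them out as an XML file, or use it to parse them from an XML file.

    In Python 2 you could speed things up with the C implementation of the API, xml.etree.cElementTree. Since Python 3.3, xml.etree.ElementTree uses the C accelerator automatically whenever it is available; cElementTree was deprecated and was removed in Python 3.9.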

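    Importing ElementTree

    Code that uses xml.etree.ElementTree often imports it as follows. The fallback branch keeps the code working on Python 3.9 and later, where cElementTree no longer exists:

    try:
        import xml.etree.cElementTree as ET
    except ImportError:
        import xml.etree.ElementTree as ET

    XML is a structured data format. ET uses ElementTree to represent the whole XML document as a tree, and Element to represent a single node in that tree.

    ET can load data from several kinds of sources, as follows: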

    # Read data from an XML file on disk
    import xml.etree.ElementTree as ET

    tree = ET.parse('country_data.xml')    # load the data
    root = tree.getroot()    # get the root node

    # Read data from a string
    root = ET.fromstring(country_data_as_string)

    [Note] fromstring() returns the root node parsed from the string directly, so root above is an Element.
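The difference between the two entry points can be seen in a short sketch. To keep it self-contained, io.StringIO stands in for the file on disk, and the one-country document is an assumed sample.

```python
import io
import xml.etree.ElementTree as ET

# Assumed sample data; a real program would read country_data.xml instead.
country_data_as_string = '<data><country name="Liechtenstein"/></data>'

# parse() accepts a file name or file object and returns an ElementTree.
tree = ET.parse(io.StringIO(country_data_as_string))
root_from_tree = tree.getroot()

# fromstring() returns the root Element directly.
root_from_string = ET.fromstring(country_data_as_string)

print(isinstance(tree, ET.ElementTree), ET.iselement(root_from_string))

# An Element can be wrapped in an ElementTree when write() is needed later.
wrapped = ET.ElementTree(root_from_string)
print(wrapped.getroot() is root_from_string)
```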

    An Element has child elements of its own, so you can iterate over it directly:

    >>> for child in root:
    ...     print(child.tag, child.attrib)
    ...
    country {'name': 'Liechtenstein'}
    country {'name': 'Singapore'}
    country {'name': 'Panama'}

    You can also reach child nodes directly by index:

    >>> root[0][1].text
    '2008'

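    Traversing and querying an Element

    Element.iter(tag=None): iterates over all descendants of the element; a tag can be given to restrict the search.

    Element.findall(path): finds the elements under the current element that match a tag or path. A bare tag matches direct children only.

    Element.find(path): finds the first element under the current element that matches a tag or path.

    Element.text: the text value of the current element.

    Element.get(key, default=None): returns the value of the attribute named key, or default if the element has no such attribute.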
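The query methods listed above can be compared side by side. This is a sketch on an assumed two-country sample; the 'population' attribute is deliberately absent to show the default value.

```python
import xml.etree.ElementTree as ET

# Assumed sample data, shaped like the country document used in this article.
root = ET.fromstring(
    '<data>'
    '<country name="Liechtenstein"><year>2008</year>'
    '<neighbor name="Austria"/><neighbor name="Switzerland"/></country>'
    '<country name="Singapore"><year>2011</year>'
    '<neighbor name="Malaysia"/></country>'
    '</data>'
)

# iter() walks every descendant, however deep.
all_neighbors = [n.get('name') for n in root.iter('neighbor')]

# findall() with a bare tag only sees direct children of root.
direct_neighbors = root.findall('neighbor')
countries = [c.get('name') for c in root.findall('country')]

# find() returns the first match for a path.
first_year = root.find('country/year').text

# get() falls back to the default for a missing attribute.
population = root.find('country').get('population', 'unknown')

print(all_neighbors, direct_neighbors, countries, first_year, population)
```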

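    The Element object

    class xml.etree.ElementTree.Element(tag, attrib={}, **extra)

    tag: string, the kind of data the element represents.

    text: string, the content of the element.

    tail: string, the text that follows the element's closing tag.

    attrib: dictionary, the element's attributes.

    # Operations on attributes

    clear(): removes all subelements and attributes, and sets text and tail to None.

    get(key, default=None): returns the value of the attribute named key, or default if the attribute does not exist.

    items(): returns the attributes as a sequence of (key, value) pairs.

    keys(): returns the names of all the element's attributes.

    set(key, value): sets an attribute to the given value.

    # Operations on subelements

    append(subelement): adds a direct child element.

    extend(subelements): appends a sequence of element objects as children. # New in Python 2.7

    find(match): finds the first matching subelement; match can be a tag or a path.

    findall(match): finds all matching subelements; match can be a tag or a path.

    findtext(match): finds the first matching subelement and returns its text; match can be a tag or a path.

    insert(index, element): inserts a child element at the given position.

    iter(tag=None): returns an iterator over all descendants of the element, or over the descendants with the given tag. # New in Python 2.7

    iterfind(match): finds all matching subelements by tag or path and returns them as an iterator.

    itertext(): iterates over the element and all its descendants and yields their text.

    remove(subelement): removes a child element.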
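Several of these methods are easiest to follow when applied in sequence to one small tree. This is a sketch with made-up tag names ('item', 'note'), not an example from the original article.

```python
import xml.etree.ElementTree as ET

root = ET.Element('root')
root.set('version', '1.0')            # attribute operations: set / get / keys

first = ET.SubElement(root, 'item')   # SubElement appends to root
first.text = 'one'

extra = ET.Element('item')
extra.text = 'zero'
root.insert(0, extra)                 # place a child at a given position

root.extend([ET.Element('note'), ET.Element('note')])  # append several at once
root.remove(first)                    # remove a direct child

tags = [child.tag for child in root]
texts = list(root.itertext())
print(tags, texts, root.keys())
print(ET.tostring(root, encoding='unicode'))
```

remove() compares by identity and only looks at direct children, so the element passed in must be the very object that was attached to this parent.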

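    The ElementTree object

    class xml.etree.ElementTree.ElementTree(element=None, file=None)

    element, if given, becomes the root node of the new ElementTree.

    _setroot(element): replaces the current root node with the given element. Use with care.

    # The following methods behave like the Element methods of the same name, except that they operate from the root node.

    find(match)

    findall(match)

    findtext(match, default=None)

    getroot(): returns the root node.

    iter(tag=None)

    iterfind(match)

    parse(source, parser=None): loads XML into the tree; source can be a file name or a file object.

    write(file, encoding="us-ascii", xml_declaration=None, default_namespace=None, method="xml")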
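A round trip through write() and parse() shows how the encoding and xml_declaration parameters are used. This is a sketch: io.BytesIO stands in for a file on disk, and the element names are assumptions for illustration.

```python
import io
import xml.etree.ElementTree as ET

root = ET.Element('root')
ET.SubElement(root, 'child').text = 'text'
tree = ET.ElementTree(root)

# With a byte encoding such as utf-8, write() needs a binary file object.
buffer = io.BytesIO()
tree.write(buffer, encoding='utf-8', xml_declaration=True)
data = buffer.getvalue()
print(data)

# Reading the bytes back gives an equivalent tree.
reloaded = ET.parse(io.BytesIO(data)).getroot()
print(reloaded.find('child').text)
```

Passing a file name instead of a file object works the same way, and write() then opens and closes the file itself.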

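    Module functions

    xml.etree.ElementTree.Comment(text=None)

    Creates a special element that the standard serializer writes as an XML comment. The comment text can be a bytestring or a Unicode string.

    xml.etree.ElementTree.dump(elem)

    Writes an element tree or element structure to sys.stdout. elem can be an element tree or a single element. This function is best used for debugging only.

    xml.etree.ElementTree.fromstring(text)

    text is a string containing XML data. Works like XML() and returns an Element instance.

    xml.etree.ElementTree.fromstringlist(sequence, parser=None)

    Parses an XML document from a sequence of string fragments. The default parser is XMLParser. Returns an Element instance.

    New in version 2.7.

    xml.etree.ElementTree.iselement(element)

    Checks whether an object is an element object.

    xml.etree.ElementTree.iterparse(source, events=None, parser=None)

    Incrementally parses a file name or file object containing XML data into an element tree and reports progress to the caller. events is the list of events to report; if it is omitted, only end events are reported.

    Note that when iterparse() emits a start event it only guarantees that it has seen the ">" character of the opening tag. The attributes are defined at that point, but text and tail are not yet, and neither are the child elements, so they may be missing. If you need a fully populated element, look for end events instead.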
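The order of start and end events, and the point at which an element's text is safe to read, can be seen in a short sketch. io.BytesIO stands in for a file here, and the one-country document is an assumed sample.

```python
import io
from xml.etree.ElementTree import iterparse

# Assumed sample data; iterparse() also accepts a file name.
source = io.BytesIO(
    b'<data><country name="Panama"><rank>68</rank></country></data>'
)

events = []
rank_text = None
for event, elem in iterparse(source, events=('start', 'end')):
    events.append((event, elem.tag))
    # Read the content on the end event, when the element is complete.
    if event == 'end' and elem.tag == 'rank':
        rank_text = elem.text

for item in events:
    print(item)
print(rank_text)
```

For large inputs, calling elem.clear() on elements that have been fully processed in the end branch keeps memory use down.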

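    xml.etree.ElementTree.parse(source, parser=None)

    Parses a file, given as a file name or a file object, into an element tree.

    xml.etree.ElementTree.ProcessingInstruction(target, text=None)

    Creates a special element that is serialized as an XML processing instruction.

    xml.etree.ElementTree.register_namespace(prefix, uri)

    Registers a namespace prefix. The registry is global, and any existing mapping for the given prefix or the given namespace URI is removed.

    New in version 2.7.

    xml.etree.ElementTree.SubElement(parent, tag, attrib={}, **extra)

    Subelement factory: creates an Element instance and appends it to an existing element.

    xml.etree.ElementTree.tostring(element, encoding="us-ascii", method="xml")

    Generates a string representation of an XML element, including all subelements. element is an Element instance; method is "xml", "html" or "text". Returns an encoded string containing the XML data. In Python 3 the result is a bytes object unless encoding="unicode" is passed.

    xml.etree.ElementTree.tostringlist(element, encoding="us-ascii", method="xml")

    Generates a string representation of an XML element, including all subelements. element is an Element instance; method is "xml", "html" or "text". Returns a list of encoded strings containing the XML data.

    New in version 2.7.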
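The effect of the encoding and method parameters is easier to see on an element with a non-ASCII character. This is a sketch for Python 3; the element names and text are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

root = ET.Element('root')
ET.SubElement(root, 'child').text = 'caf\u00e9'

# Default encoding is us-ascii: the result is bytes, and the non-ASCII
# character is written as a numeric character reference.
as_bytes = ET.tostring(root)

# encoding="unicode" returns a str and keeps the character as-is.
as_text = ET.tostring(root, encoding='unicode')

# method="text" drops the markup and keeps only the text content.
as_plain = ET.tostring(root, encoding='unicode', method='text')

# tostringlist() returns fragments that join to the tostring() result.
pieces = ET.tostringlist(root, encoding='unicode')

print(as_bytes)
print(as_text)
print(as_plain)
print(''.join(pieces) == as_text)
```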

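    xml.etree.ElementTree.XML(text, parser=None)

    Parses an XML section from a string constant. Returns an Element instance.

    xml.etree.ElementTree.XMLID(text, parser=None)

    Parses an XML section from a string constant and also returns a dictionary that maps each element id to its element.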
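A short sketch of the two return values of XMLID(); the XML literal is an assumed sample with id attributes on two children.

```python
from xml.etree.ElementTree import XMLID

# XMLID() returns a tuple: the root element and an id-to-element dictionary.
tree, id_map = XMLID(
    '<root><child id="a">A</child><child id="b">B</child><child>no id</child></root>'
)

print(tree.tag)
print(sorted(id_map))
print(id_map['a'].text)
```

Only elements that carry an id attribute appear in the dictionary, so the third child above is reachable through the tree but not through id_map.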

  • Python 3.7.2: installing lxml etree xml.etree.ElementTree python --version Python 3.7.2 pip install lxml==4.8.0 Collecting lxml==4.8.0 Downloading lxml-4.8.0-cp37-cp37m-win_amd64.whl (3.6 MB) ---------...
  • This article collects typical usage examples of the lxml.etree.Comment method in Python. If you are wondering how exactly etree.Comment is used in Python, or are looking for examples of etree.Comment in use, then congratulations, here...
  • Parsing XML with etree in Go

    2021-03-26 18:16:33
    Parsing XML with etree in Go. Contents: 1. A brief look at XML 2. Parsing XML with etree in Go (1) Reading XML (2) Locating an element or attribute by path (3) Handling multiple element nodes with the same name. 1. A brief look at XML: the figure below shows the elements and...
  • Some code first: import requests from lxml import etree html = requests.get('https://python123.io/ws/demo.html').text tree = etree.HTML(html) print(tree.xpath('//p[@class="title"]/b/text()')) # a list print(tree....