  • Event Extraction Paper Overview

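  Event extraction consists of three main subtasks: (1) entity recognition, (2) event detection, and (3) event role filling (argument extraction). I mainly collected papers from top conferences such as ACL on document-level event extraction, on event detection with and without triggers, and on event argument extraction.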
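  As a rough illustration of how the three subtasks fit together, the sketch below represents one extracted event as plain data: entity mentions (subtask 1), an event type with an optional trigger (subtask 2), and role-to-argument assignments (subtask 3). The class and field names, the offset convention, the sentence and the event type 收购 are all hypothetical and are not taken from any paper listed below; the trigger is optional because several of the listed methods are trigger-free.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Span = Tuple[int, int]  # [start, end) character offsets (hypothetical convention)


@dataclass
class EntityMention:  # output of subtask 1: entity recognition
    span: Span
    entity_type: str


@dataclass
class Event:
    event_type: str                 # output of subtask 2: event detection
    trigger: Optional[Span] = None  # stays None for trigger-free methods
    roles: Dict[str, List[EntityMention]] = field(default_factory=dict)  # subtask 3

    def arguments(self) -> List[Tuple[str, EntityMention]]:
        """Flatten the role fillers into (role, mention) pairs."""
        return [(role, m) for role, mentions in self.roles.items() for m in mentions]


text = "新东方收购东方优播"  # made-up example sentence
event = Event(
    "收购",  # illustrative event type
    trigger=(3, 5),
    roles={"收购方": [EntityMention((0, 3), "ORG")], "被收购方": [EntityMention((5, 9), "ORG")]},
)
print([(role, text[m.span[0]:m.span[1]]) for role, m in event.arguments()])
```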

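    Title | Model | Venue | Year | Summary | Notes | Code
    --- | --- | --- | --- | --- | --- | ---
    Joint Event and Temporal Relation Extraction with Shared Representations and Structured Prediction |  | EMNLP | 2019 |  | Temporal ordering of events | 
    Document-Level Event Role Filler Extraction using Multi-Granularity Contextualized Encoding |  | ACL | 2020 | Trigger-free document-level event extraction. The paper casts document-level event extraction as event role filler extraction and thereby as a sequence labeling task. It encodes the document at multiple granularities, fusing sentence-level and paragraph-level encodings, which improves results significantly. | Document-level event extraction fusing multiple granularities | https://github.com/xinyadu/doc_event_role
    Doc2EDAG: An End-to-End Document-level Framework for Chinese Financial Event Extraction | Doc2EDAG | EMNLP | 2019 | Trigger-free document-level event extraction built on three Transformers. Transformer-1 with a CRF recognizes entities, and max-pooling yields entity and sentence embeddings; Transformer-2 encodes the interactions among all entities and sentences. Max-pooling over all sentence embeddings gives a document embedding, from which a classification layer predicts the event types. Starting from each event type, the model follows a predefined role order and assigns each role an entity or NA. A memory tensor m keeps track of the entities already assigned along the path, and Transformer-3 encodes it to decide whether each entity fills the current role. |  | https://github.com/dolphin-zs/Doc2EDAG
    Joint Event Extraction via Recurrent Neural Networks | JRNN | NAACL | 2016 |  |  | 
    Jointly Extracting Event Triggers and Arguments by Dependency-Bridge RNN and Tensor-Based Argument Interaction | DBRNN | AAAI | 2018 |  |  | 
    Event Extraction via Dynamic Multi-Pooling Convolutional Neural Networks | DMCNN | ACL | 2015 | Treats event extraction as a two-stage multi-class classification task. Stage 1, trigger classification, uses a DMCNN to decide for every word in the sentence whether it is a trigger. If the sentence contains a trigger, stage 2, argument classification, uses a similar DMCNN on every candidate entity other than the trigger to identify the arguments related to that trigger and the roles they play. The DMCNN uses several features: word embeddings, position features and event-type features. It also introduces a dynamic multi-pooling layer: because a sentence may contain several events and a candidate argument may play different roles for different triggers, the convolutional feature map in argument classification is split into three parts at the positions of the trigger and the candidate argument, and the maximum of each part is kept as the extracted feature. |  | 
    Event Detection with Trigger-Aware Lattice Neural Network | TLNN | EMNLP | 2019 | Event detection (trigger identification and classification) formulated as sequence labeling. To handle trigger errors caused by word-segmentation mistakes in preprocessing and by polysemous triggers, the model dynamically fuses word and character information in a lattice-style LSTM similar to lattice LSTM for NER, and adds external knowledge from HowNet: the K senses of a word are encoded and integrated into the LSTM cell state, giving character representations enriched with multiple senses. |  | https://github.com/thunlp/TLNN
    Event Detection with Multi-Order Graph Convolution and Aggregated Attention |  | EMNLP | 2019 | Event detection, also formulated as sequence labeling. The authors argue that a trigger and its arguments may only be connected through multi-hop paths in the syntactic parse tree, so a multi-order GAT network is used to learn syntactic features. |  | 
    Event Detection without Triggers |  | NAACL | 2019 | Trigger-free event detection cast as multi-label classification: each input sample is converted into <s, t> pairs of sentence and event type. After LSTM encoding, the model computes attention between the local features and the event type and between the global feature and the event type, then fuses the two to produce the prediction. |  | https://github.com/liushulinle/event_detection_without_triggers
    HMEAE: Hierarchical Modular Event Argument Extraction | HMEAE | EMNLP | 2019 | Proposes HMEAE for event argument extraction (EAE), i.e. classifying argument roles, using flexible modular networks that exploit the concept hierarchy behind argument roles; this use of concept-level information is the paper's main contribution. Each instance (a sentence) is encoded by a CNN or BERT into a sequence of hidden embeddings, and dynamic multi-pooling over the trigger and candidate argument (an entity in the sentence) positions aggregates them into an instance embedding. Each superordinate concept module (SCM) uses attention to score every hidden embedding for its relevance to that concept. For a given role, the scores under its superordinate concepts are averaged to give each token i a role-specific attention score, which weights a sum of the hidden embeddings into a role-oriented embedding of the instance. Finally, the instance embedding and the role-oriented embedding are concatenated as the classifier input, multiplied with the role embedding and passed through a softmax to predict role r for instance x. |  | https://github.com/thunlp/HMEAE
    DCFEE: A Document-level Chinese Financial Event Extraction System based on Automatically Labeled Training Data | DCFEE | ACL | 2018 | Document-level event extraction. A BiLSTM+CRF first performs sentence-level event extraction, producing candidate arguments and triggers. Key-sentence detection concatenates the representations of the extracted arguments and trigger with the current sentence vector, encodes them with a CNN and decides whether the sentence is the key event sentence. An argument-filling strategy then automatically fills missing arguments from the surrounding sentences. |  | 
    基于联合标注和全局推理的篇章级事件抽取 (Document-level event extraction based on joint labeling and global inference) |  | Journal of Chinese Information Processing (中文信息学报) | 2019 | A self-attention-based sequence labeling model for entities and events recognizes entities and candidate arguments; a multilayer perceptron then classifies all role types from the entity, entity type, trigger, trigger type, text representation and position features. Finally, document-level global inference assembles document-level events, combining event descriptions and event-structure information (vector representations and TF-IDF) to decide whether two extractions describe the same event. |  | 
    Entity, Relation, and Event Extraction with Contextualized Span Representations | DyGIE++ | EMNLP | 2019 |  |  | https://github.com/dwadden/dygiepp
    A Two-Step Approach for Implicit Event Argument Detection |  | ACL | 2020 | Addresses implicit event argument detection in two steps: first detect the head word of each argument entity, then expand the head word into the full span. Head detection scores candidate head words against the predicate (trigger) and classifies them by the resulting probability; the head word is then expanded to the left and to the right. |  | https://github.com/zzsfornlp/zmsp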
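  The dynamic multi-pooling step described for DMCNN can be sketched in NumPy. This is a simplified illustration of the description in the table, not the authors' code: it assumes a convolution output of shape (seq_len, num_filters), 0-based positions, and that each boundary token (trigger or argument) belongs to the segment on its left; filling an empty segment with zeros is also an assumption of this sketch.

```python
import numpy as np


def dynamic_multi_pooling(feature_map: np.ndarray, trigger_pos: int, arg_pos: int) -> np.ndarray:
    """Max-pool each filter over three segments split at the trigger and argument positions.

    feature_map: (seq_len, num_filters) convolution output.
    Segments: [0, left], (left, right], (right, end), where left/right are the
    smaller/larger of the two positions. Returns shape (3 * num_filters,).
    """
    seq_len, num_filters = feature_map.shape
    left, right = sorted((trigger_pos, arg_pos))
    bounds = [(0, left + 1), (left + 1, right + 1), (right + 1, seq_len)]
    pooled = []
    for start, end in bounds:
        segment = feature_map[start:end]
        # Max over the tokens of this segment; zeros if the segment is empty (assumption).
        pooled.append(segment.max(axis=0) if len(segment) else np.zeros(num_filters))
    return np.concatenate(pooled)


fmap = np.array([[1., 0.], [3., 2.], [0., 5.], [2., 1.], [4., 0.]])  # 5 tokens, 2 filters
print(dynamic_multi_pooling(fmap, trigger_pos=1, arg_pos=3))
```

  Ordinary max pooling over the whole sentence would return only [4, 5] here; the three-way split keeps separate maxima for the regions before, between and after the trigger and the argument, which is what lets the same candidate receive different features for different triggers.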

     

  • LIC2021 Event Extraction Task Baseline

    PaddleNLP in Practice: LIC2021 Event Extraction Task Baseline

    Related notes in this series:

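    Paper reading: DuEE: A Large-Scale Dataset for Chinese Event Extraction in Real-World Scenarios (with dataset link)
    PaddleNLP in Practice: LIC2021 Event Extraction Task Baseline (with code)
    PaddleNLP in Practice: LIC2021 Relation Extraction Task Baseline (with code)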

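  Information extraction aims to extract structured knowledge, such as entities, relations and events, from unstructured natural-language text. Given a natural-language sentence and a predefined set of event types and argument roles, event extraction identifies every event of the target types in the sentence and extracts each event's arguments according to its set of roles. The target event type (event_type) and the roles (role) limit the scope of extraction, for example (event_type: 胜负 [win/loss], role: 时间 [time], 胜者 [winner], 败者 [loser], 赛事名称 [competition name]) and (event_type: 夺冠 [winning a title], role: 夺冠事件, 夺冠赛事 [competition won], 冠军 [champion]).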
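  Because the event type and role set restrict what may be extracted, a pipeline can discard any predicted (event_type, role) pair the schema does not allow. The sketch below assumes the schema is held as a plain dict from event type to allowed roles, seeded with the two example types above; the SCHEMA name, the triple format of the predictions and the function name are hypothetical and not part of the official DuEE tooling.

```python
from typing import Dict, List, Set, Tuple

# Schema from the examples above: event type -> allowed argument roles.
SCHEMA: Dict[str, Set[str]] = {
    "胜负": {"时间", "胜者", "败者", "赛事名称"},
    "夺冠": {"夺冠事件", "夺冠赛事", "冠军"},
}


def filter_by_schema(predictions: List[Tuple[str, str, str]]) -> List[Tuple[str, str, str]]:
    """Keep (event_type, role, argument) triples whose type exists and whose role it allows."""
    return [
        (event_type, role, arg)
        for event_type, role, arg in predictions
        if role in SCHEMA.get(event_type, set())
    ]


preds = [("胜负", "胜者", "A队"), ("胜负", "冠军", "A队"), ("未知类型", "时间", "昨天")]
print(filter_by_schema(preds))  # [('胜负', '胜者', 'A队')]
```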


  This example shows how to use PaddleNLP to quickly reproduce the LIC2021 event extraction baseline and then improve on it.

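    # Install paddlenlp; this walkthrough was run with 2.0.0rc16 (see the log below), so pin it for reproducibility
    !pip install --upgrade paddlenlp==2.0.0rc16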
    
    %cd event_extraction/
    
    Successfully installed paddlenlp-2.0.0rc16
    /home/aistudio/event_extraction
    

  The competition has two subtasks: document-level event extraction and sentence-level event extraction.

    1. Document-Level Event Extraction Baseline

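  DuEE-Fin is a document-level event extraction dataset for the financial domain. It defines 13 event types with their role constraints and contains 11,500 Chinese documents (some non-target documents are included as negative examples): 6,900 for training, 1,150 for validation and 3,450 for testing.

  On this dataset the baseline uses an ERNIE-based sequence labeling approach organized as a pipeline of three models: a sequence-labeling trigger extraction model, a sequence-labeling argument extraction model, and an enumerated-attribute classification model. The trigger model uses BIO tagging to identify trigger positions and their event types; the argument model uses BIO tagging to identify the arguments of each event and their roles; the enumerated-attribute model is an ERNIE classifier.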
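  To make the BIO scheme concrete, the sketch below converts character-level span annotations into B-<type>/I-<type>/O tags, the same shape as the trigger labels in the sample data further down (B-解除质押 followed by I-解除质押). The function name and the (start, end, type) span format are assumptions for illustration; the baseline's own preprocessing in run_duee_fin.sh may differ in details such as overlap handling.

```python
from typing import List, Tuple


def spans_to_bio(text: str, spans: List[Tuple[int, int, str]]) -> List[str]:
    """Tag each character of `text`; spans are [start, end) offsets with a type label.

    If spans overlap, later spans overwrite earlier ones (a simplification).
    """
    tags = ["O"] * len(text)
    for start, end, label in spans:
        tags[start] = f"B-{label}"        # first character of the span
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"        # remaining characters of the span
    return tags


text = "亚特集团解除质押1980万股"
print(spans_to_bio(text, [(4, 8, "解除质押")])[:9])
```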

    Evaluation Metric

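  The task is scored by argument-level F1. Within each document, every gold (target) event is matched, without replacement, to its most similar predicted event (event-level matching); the search prefers predicted events that have the same event type as the gold event and the largest number of correct roles and arguments.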

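  f1_score = (2 * P * R) / (P + R), where

  • a predicted argument is correct if its event type and role match the gold ones and the argument itself is correct
  • P = number of correct predicted arguments / number of all predicted arguments
  • R = number of correct predicted arguments / number of all human-annotated arguments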
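  A simplified sketch of this metric under stated assumptions: each event is a dict with an event_type and a set of (role, argument) pairs; for each gold event, the remaining predicted events of the same type are searched and the one sharing the most (role, argument) pairs is consumed; counts are summed over documents before computing P, R and F1. Tie-breaking (earliest prediction wins), the data format, the function names and the role names in the example are assumptions; the official LIC2021 evaluation script is authoritative.

```python
from typing import Dict, List, Tuple

# Hypothetical event format: {"event_type": str, "args": set of (role, argument) pairs}.
Event = Dict[str, object]


def match_document(gold: List[Event], pred: List[Event]) -> Tuple[int, int, int]:
    """Return (n_correct, n_pred, n_gold) argument counts for one document."""
    remaining = list(range(len(pred)))  # predicted events not yet matched
    n_correct = 0
    for g in gold:
        best_idx, best_hits = None, -1
        for i in remaining:
            if pred[i]["event_type"] != g["event_type"]:
                continue  # an argument only counts if the event type matches
            hits = len(g["args"] & pred[i]["args"])
            if hits > best_hits:  # ties keep the earliest prediction
                best_idx, best_hits = i, hits
        if best_idx is not None:
            remaining.remove(best_idx)  # matching without replacement
            n_correct += best_hits
    n_pred = sum(len(p["args"]) for p in pred)
    n_gold = sum(len(g["args"]) for g in gold)
    return n_correct, n_pred, n_gold


def prf1(counts: List[Tuple[int, int, int]]) -> Tuple[float, float, float]:
    """Sum per-document counts, then compute precision, recall and F1."""
    correct = sum(c for c, _, _ in counts)
    n_pred = sum(p for _, p, _ in counts)
    n_gold = sum(g for _, _, g in counts)
    p = correct / n_pred if n_pred else 0.0
    r = correct / n_gold if n_gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1


gold = [{"event_type": "质押", "args": {("质押方", "傅宇晨"), ("质押物", "公司股份")}}]
pred = [
    {"event_type": "质押", "args": {("质押方", "傅宇晨"), ("质押物", "部分股份")}},
    {"event_type": "股份回购", "args": {("回购方", "众安集团")}},
]
print(prf1([match_document(gold, pred)]))  # P = 1/3, R = 1/2, F1 = 0.4
```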

    1.1 Reproducing the Baseline, Step 1: Data Preprocessing and Loading

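  Download the dataset from the competition website, extract it to data/DuEE-Fin, and preprocess the raw data into sequence labeling format. The processed data are also stored under data/DuEE-Fin: trigger recognition data in data/DuEE-Fin/trigger, argument role recognition data in data/DuEE-Fin/role, and enumerated classification data in data/DuEE-Fin/enum.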

    !bash ./run_duee_fin.sh data_prepare
    
    check and create directory
    create dir * ./ckpt *
    create dir * ./ckpt/DuEE-Fin *
    create dir * ./submit *
    
    start DuEE-Fin data prepare
    
    =================DUEE FINANCE DATASET==============
    
    =================start schema process==============
    input path ./conf/DuEE-Fin/event_schema.json
    save trigger tag 27 at ./conf/DuEE-Fin/trigger_tag.dict
    save trigger tag 121 at ./conf/DuEE-Fin/role_tag.dict
    save enum tag 4 at ./conf/DuEE-Fin/enum_tag.dict
    =================end schema process===============
    
    =================start data process==============
    
    ********** start document process **********
    train 32795 dev 5302 test 140867
    ********** end document process **********
    
    ********** start sentence process **********
    
    ----trigger------for dir ./data/DuEE-Fin/sentence to ./data/DuEE-Fin/trigger
    train 7251 dev 1180
    
    ----role------for dir ./data/DuEE-Fin/sentence to ./data/DuEE-Fin/role
    train 9441 dev 1524
    
    ----enum------for dir ./data/DuEE-Fin/sentence to ./data/DuEE-Fin/enum
    train 429 dev 69
    ********** end sentence process **********
    =================end data process==============
    end DuEE-Fin data prepare
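  The tag counts in this log are consistent with BIO tagging: the 13 event types give 2 × 13 + 1 = 27 trigger tags (a B- and an I- tag per type, plus O), and under the same scheme the 121 tags written to role_tag.dict would correspond to 60 role labels. Below is a minimal sketch of building such a tag-to-id dictionary; the function name, the ordering (O first) and the placeholder type names are assumptions and may not match the actual layout of trigger_tag.dict.

```python
from typing import Dict, List


def build_bio_tag_dict(labels: List[str]) -> Dict[str, int]:
    """Map 'O' plus a B- and an I- tag for every label to consecutive integer ids."""
    tags = ["O"]
    for label in labels:
        tags += [f"B-{label}", f"I-{label}"]
    return {tag: idx for idx, tag in enumerate(tags)}


event_types = [f"type_{i}" for i in range(13)]  # placeholders for the 13 DuEE-Fin event types
tag_dict = build_bio_tag_dict(event_types)
print(len(tag_dict))  # 27
```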
    

  We can load a custom dataset by subclassing paddle.io.Dataset and implementing the __getitem__ and __len__ methods.

  For trigger recognition, for example, load the dataset in event_extraction/data/DuEE-Fin/trigger:

    import paddle
    from utils import load_dict
    
    class DuEventExtraction(paddle.io.Dataset):
        """DuEventExtraction"""
        def __init__(self, data_path, tag_path):
    
            self.label_vocab = load_dict(tag_path)
            self.word_ids = []
            self.label_ids = []
            with open(data_path, 'r', encoding='utf-8') as fp:
                # skip the head line
                next(fp)
                for line in fp.readlines():
                    words, labels = line.strip('\n').split('\t')
                    words = words.split('\002')
                    labels = labels.split('\002')
                    self.word_ids.append(words)
                    self.label_ids.append(labels)
    
            self.label_num = max(self.label_vocab.values()) + 1
    
        def __len__(self):
            return len(self.word_ids)
    
        def __getitem__(self, index):
            return self.word_ids[index], self.label_ids[index]
    
    train_ds = DuEventExtraction('./data/DuEE-Fin/trigger/train.tsv', './conf/DuEE-Fin/trigger_tag.dict')
    dev_ds = DuEventExtraction('./data/DuEE-Fin/trigger/dev.tsv', './conf/DuEE-Fin/trigger_tag.dict')
    
    count = 0
    for text, label in train_ds:
        print(f"text: {text}; label: {label}")
        count += 1
        if count >= 3:
            break
    
    text: ['原', '标', '题', ':', '万', '讯', '自', '控', '(', '7', '.', '4', '9', '0', ',', '-', '0', '.', '1', '0', ',', '-', '1', '.', '3', '2', '%', ')', ':', '傅', '宇', '晨', '解', '除', '部', '分', '股', '份', '质', '押', '、', '累', '计', '质', '押', '比', '例', '为', '3', '9', '.', '5', '5', '%', ',', ',', ',', ',', '来', '源', ':', '每', '日', '经', '济', '新', '闻', ',', '每', '经', 'a', 'i', '快', '讯', ',', '万', '讯', '自', '控', '(', 's', 'z', ',', '3', '0', '0', '1', '1', '2', ',', '收', '盘', '价', ':', '7', '.', '4', '9', '元', ')', '6', '月', '3', '日', '下', '午', '发', '布', '公', '告', '称', ',', '公', '司', '接', '到', '股', '东', '傅', '宇', '晨', '的', '通', '知', ',', '获', '悉', '傅', '宇', '晨', '将', '其', '部', '分', '股', '份', '办', '理', '了', '质', '押', '业', '务', '。', ',', '截', '至', '本', '公', '告', '日', ',', '傅', '宇', '晨', '共', '持', '有', '公', '司', '股', '份', '5', '7', '9', '0', '.', '3', '8', '万', '股', ',', '占', '公', '司', '总', '股', '本', '的', '2', '0', '.', '2', '5', '%', ';', '累', '计', '质', '押', '股', '份', '2', '2', '9', '0', '万', '股', ',', '占', '傅', '宇', '晨', '持', '有', '公', '司', '股', '份', '总', '数', '的', '3', '9', '.', '5', '5', '%', ',', '占', '公', '司', '总', '股', '本', '的', '8', '.', '0', '1', '%', '。']; label: ['O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'B-质押', 'I-质押', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 
'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O']
    text: ['客', '户', '端', ',', '新', '浪', '港', '股', '讯', ',', '众', '安', '集', '团', '(', '0', '.', '2', '4', '8', ',', '-', '0', '.', '0', '0', ',', '-', '0', '.', '8', '0', '%', ')', '(', '0', '0', '6', '7', '2', '.', 'h', 'k', ')', '发', '布', '公', '告', ',', '于', '2', '0', '1', '9', '年', '1', '0', '月', '1', '5', '日', ',', '公', '司', '耗', '资', '9', '4', '.', '5', '6', '万', '港', '元', '回', '购', '3', '8', '0', '.', '5', '万', '股', ',', '回', '购', '价', '格', '每', '股', '0', '.', '2', '4', '8', '-', '0', '.', '2', '4', '9', '港', '元', '。']; label: ['O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'B-股份回购', 'I-股份回购', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O']
    text: ['原', '标', '题', ':', '金', '徽', '酒', '(', '6', '0', '3', '9', '1', '9', '.', 's', 'h', ')', ':', '亚', '特', '集', '团', '解', '除', '质', '押', '1', '9', '8', '0', '万', '股', ',', ',', ',', ',', '来', '源', ':', '格', '隆', '汇', ',', '格', '隆', '汇', '8', '月', '5', '日', '丨', '金', '徽', '酒', '(', '6', '0', '3', '9', '1', '9', '.', 's', 'h', ')', '公', '布', ',', '公', '司', '近', '日', '收', '到', '控', '股', '股', '东', '甘', '肃', '亚', '特', '投', '资', '集', '团', '有', '限', '公', '司', '(', '“', '亚', '特', '集', '团', '”', ')', '将', '其', '持', '有', '的', '公', '司', '部', '分', '股', '份', '解', '除', '质', '押', '的', '通', '知', '。', ',', '2', '0', '1', '8', '年', '4', '月', '9', '日', ',', '亚', '特', '集', '团', '将', '其', '持', '有', '的', '公', '司', '5', '9', '8', '0', '万', '股', '有', '限', '售', '条', '件', '股', '份', '质', '押', '给', '兰', '州', '银', '行', '股', '份', '有', '限', '公', '司', '陇', '南', '分', '行', '。']; label: ['O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'B-解除质押', 'I-解除质押', 'I-解除质押', 'I-解除质押', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O']
    

    1.2 Reproducing the Baseline, Step 2: Building the Model

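  The sequence-labeling trigger extraction model is one part of the overall pipeline. Given the predefined event types, it identifies where event triggers occur in a sentence and which event type each belongs to. It is a sequence labeling model built on ERNIE; its architecture is shown below: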

    [Figure: architecture of the ERNIE-based trigger extraction model]
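  Likewise, the sequence-labeling argument extraction model is an ERNIE-based sequence labeling model; it identifies the arguments of each event and their roles. Its architecture is shown below: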
    [Figure: architecture of the ERNIE-based argument extraction model]

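  In the example above, the model identifies:

  1) the argument "新东方" (New Oriental), tagged "B-收购方", "I-收购方", "I-收购方" (收购方 = acquirer);
  2) the argument "东方优播", tagged "B-被收购方", "I-被收购方", "I-被收购方", "I-被收购方" (被收购方 = acquired party).

  The (role, argument) pairs finally extracted from the text are <收购方, 新东方> and <被收购方, 东方优播>.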
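  Recovering these (role, argument) pairs from predicted tags is the inverse of BIO encoding. The sketch below decodes a tag sequence into (label, text) pairs; treating an I- tag that does not continue an open span of the same label as O is an assumption of this sketch, the function name is hypothetical, and the sentence 新东方收购东方优播 is a made-up stand-in for the figure's example.

```python
from typing import List, Tuple


def bio_to_spans(chars: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Collect (label, text) pairs from BIO tags; stray I- tags are treated as O."""
    spans, label, buf = [], None, []
    for ch, tag in zip(chars, tags):
        if tag.startswith("I-") and label == tag[2:]:
            buf.append(ch)  # continue the currently open span
            continue
        if label is not None:
            spans.append((label, "".join(buf)))  # close the open span
        # Open a new span on B-, otherwise stay outside any span.
        label, buf = (tag[2:], [ch]) if tag.startswith("B-") else (None, [])
    if label is not None:
        spans.append((label, "".join(buf)))
    return spans


chars = list("新东方收购东方优播")
tags = ["B-收购方", "I-收购方", "I-收购方", "O", "O",
        "B-被收购方", "I-被收购方", "I-被收购方", "I-被收购方"]
print(bio_to_spans(chars, tags))  # [('收购方', '新东方'), ('被收购方', '东方优播')]
```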

  PaddleNLP provides ERNIE pretrained models with common sequence labeling heads, which can be loaded in one line by specifying the model name:

    from paddlenlp.transformers import ErnieForTokenClassification, ErnieForSequenceClassification
    
    label_map = load_dict('./conf/DuEE-Fin/trigger_tag.dict')
    id2label = {val: key for key, val in label_map.items()}
    model = ErnieForTokenClassification.from_pretrained("ernie-1.0", num_classes=len(label_map))
    
    [2021-04-10 16:11:55,651] [    INFO] - Downloading https://paddlenlp.bj.bcebos.com/models/transformers/ernie/ernie_v1_chn_base.pdparams and saved to /home/aistudio/.paddlenlp/models/ernie-1.0
    [2021-04-10 16:11:55,654] [    INFO] - Downloading ernie_v1_chn_base.pdparams from https://paddlenlp.bj.bcebos.com/models/transformers/ernie/ernie_v1_chn_base.pdparams
    100%|██████████| 390123/390123 [00:05<00:00, 72718.98it/s]
    /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/dygraph/layers.py:1303: UserWarning: Skip loading for classifier.weight. classifier.weight is not found in the provided dict.
      warnings.warn(("Skip loading for {}. ".format(key) + str(err)))
    /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/dygraph/layers.py:1303: UserWarning: Skip loading for classifier.bias. classifier.bias is not found in the provided dict.
      warnings.warn(("Skip loading for {}. ".format(key) + str(err)))
    

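  For the enumerated classification data, the baseline uses an ERNIE-based text classification model; the enumerated role is 环节 (listing stage). Its architecture is shown below: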

    [Figure: architecture of the ERNIE-based enumerated-attribute classification model]

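  Given a text, the model classifies it and outputs a probability for each class, for example 筹备上市 (preparing to list) 0.8, 暂停上市 (listing suspended) 0.02, 正式上市 (officially listed) 0.15 and 终止上市 (listing terminated) 0.03.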
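  Probabilities like these come from a softmax over the classifier's four output scores, and the predicted value is the class with the highest probability. A minimal NumPy sketch: the logits are chosen as the logarithms of the probabilities above purely so that the softmax reproduces them; they do not come from a trained model.

```python
import numpy as np

labels = ["筹备上市", "暂停上市", "正式上市", "终止上市"]
logits = np.log(np.array([0.8, 0.02, 0.15, 0.03]))  # illustrative logits only


def softmax(x: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the last axis."""
    shifted = x - x.max(axis=-1, keepdims=True)  # subtracting a constant does not change the result
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


probs = softmax(logits)
print(dict(zip(labels, probs.round(2))))
print(labels[int(np.argmax(probs))])  # 筹备上市
```

  Subtracting the maximum logit before exponentiating avoids overflow for large scores and leaves the probabilities unchanged, since softmax is invariant to adding the same constant to every logit.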

  Likewise, PaddleNLP provides ERNIE pretrained models with common text classification heads, also loadable in one line by model name:

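    from paddlenlp.transformers import ErnieForSequenceClassification
    
    # Enum classifier: use the enum label set, and a separate variable so the trigger model `model` is not overwritten
    enum_label_map = load_dict('./conf/DuEE-Fin/enum_tag.dict')
    enum_model = ErnieForSequenceClassification.from_pretrained("ernie-1.0", num_classes=len(enum_label_map))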
    

    1.3 Reproducing the Baseline, Step 3: Data Processing

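  We need to convert the raw data into a form the model can read. To make this easy, PaddleNLP ships a Tokenizer for each pretrained model that handles tokenization, conversion of tokens to IDs, truncation to a maximum length, and so on. Like the models, a tokenizer can be loaded in one line.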

  To process text, simply call the tokenizer; it returns the inputs the model needs.

    from paddlenlp.transformers import ErnieTokenizer, ErnieModel
    
    tokenizer = ErnieTokenizer.from_pretrained("ernie-1.0")
    ernie_model = ErnieModel.from_pretrained("ernie-1.0")
    
    # One call tokenizes, maps tokens to IDs and adds the special tokens
    encoded_text = tokenizer(text="请输入测试样例", return_length=True, return_position_ids=True)
    for key, value in encoded_text.items():
        print("{}:\n\t{}".format(key, value))
    
    # Convert to Paddle tensors (adding a batch dimension)
    input_ids = paddle.to_tensor([encoded_text['input_ids']])
    print("input_ids : \n\t{}".format(input_ids))
    
    segment_ids = paddle.to_tensor([encoded_text['token_type_ids']])
    print("token_type_ids : \n\t{}".format(segment_ids))
    
    # The tensors can now be fed to the ERNIE model
    sequence_output, pooled_output = ernie_model(input_ids, segment_ids)
    print("Token wise output shape: \n\t{}\nPooled output shape: \n\t{}".format(sequence_output.shape, pooled_output.shape))
    
    [2021-04-10 16:12:14,372] [    INFO] - Downloading vocab.txt from https://paddlenlp.bj.bcebos.com/models/transformers/ernie/vocab.txt
    100%|██████████| 89/89 [00:00<00:00, 4018.40it/s]
    [2021-04-10 16:12:14,586] [    INFO] - Already cached /home/aistudio/.paddlenlp/models/ernie-1.0/ernie_v1_chn_base.pdparams
    
    input_ids:
    	[1, 647, 789, 109, 558, 525, 314, 656, 2]
    token_type_ids:
    	[0, 0, 0, 0, 0, 0, 0, 0, 0]
    seq_len:
    	9
    position_ids:
    	[0, 1, 2, 3, 4, 5, 6, 7, 8]
    input_ids : 
    	Tensor(shape=[1, 9], dtype=int64, place=CUDAPlace(0), stop_gradient=True,
           [[1  , 647, 789, 109, 558, 525, 314, 656, 2  ]])
    token_type_ids : 
    	Tensor(shape=[1, 9], dtype=int64, place=CUDAPlace(0), stop_gradient=True,
           [[0, 0, 0, 0, 0, 0, 0, 0, 0]])
    Token wise output shape: 
    	[1, 9, 768]
    Pooled output shape: 
    	[1, 768]
    

  As the code above shows, the tokenizer makes it very convenient to produce the input format the model requires.

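  In the output above:

  • input_ids: the token IDs of the input text.
  • token_type_ids: whether each token belongs to the first or the second sentence of the input (Transformer-style pretrained models accept both single sentences and sentence pairs). See convert_example_to_feature() in sequence_labeling.py in the project files for details.
  • seq_len: the number of tokens in the input, including the special tokens [CLS] and [SEP] (7 characters + 2 = 9 here).
  • input_mask: whether each token is a padding token (not returned by the call above). Because the sentences in a batch differ in length, they are padded to a common length; 1 marks a real token and 0 marks padding.
  • position_ids: the position of each token in the input sequence.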

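  The ERNIE model returns two tensors:

  • sequence_output: a semantic representation of each input token, with shape (batch_size, num_tokens, hidden_size), here [1, 9, 768]. It is generally used for sequence labeling, question answering and similar tasks.
  • pooled_output: a semantic representation of the whole input, with shape (batch_size, hidden_size), here [1, 768]. It is generally used for text classification, information retrieval and similar tasks.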
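  A shape-level sketch of how the two outputs are typically consumed: a token classification head applies one linear layer at every position of sequence_output, while a sequence classification head applies one to pooled_output. This NumPy illustration uses random weights and omits the dropout that the PaddleNLP heads apply before their `classifier` layer; the tag count 27 and class count 4 are taken from the data preparation log, everything else is made up.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, num_tokens, hidden = 1, 9, 768        # shapes from the output above
num_trigger_tags, num_enum_classes = 27, 4   # sizes from the data preparation log

sequence_output = rng.standard_normal((batch, num_tokens, hidden))
pooled_output = rng.standard_normal((batch, hidden))

# Token classification: the same linear layer scores every position.
w_tok, b_tok = rng.standard_normal((hidden, num_trigger_tags)), np.zeros(num_trigger_tags)
token_logits = sequence_output @ w_tok + b_tok    # (1, 9, 27): one tag score vector per token

# Sequence classification: one linear layer on the pooled sentence vector.
w_seq, b_seq = rng.standard_normal((hidden, num_enum_classes)), np.zeros(num_enum_classes)
sentence_logits = pooled_output @ w_seq + b_seq   # (1, 4): one score vector per input

print(token_logits.shape, sentence_logits.shape)
```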

  NOTE:

  To use the ernie-tiny pretrained model, load the matching tokenizer with paddlenlp.transformers.ErnieTinyTokenizer.from_pretrained('ernie-tiny').

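  The example above shows the data processing steps needed for Transformer-style pretrained models. For convenience, PaddleNLP also provides higher-level APIs that return the model's input format in a single call.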

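  This baseline processes the data as follows:

  • Convert the raw data into a format the model can read: tokenize with the tokenizer, map tokens to input ids, build the token type ids, and so on.
  • Batch the data with paddle.io.DataLoader, which can also load data asynchronously in worker processes through its num_workers argument (left at its default here).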

    from functools import partial
    from paddlenlp.data import Stack, Tuple, Pad
    
    def convert_example_to_feature(example, tokenizer, label_vocab=None, max_seq_len=512, no_entity_label="O", ignore_label=-1, is_test=False):
        """Tokenize one (tokens, labels) example; label ids get an extra "O" for [CLS] and [SEP]."""
        tokens, labels = example
        tokenized_input = tokenizer(
            tokens,
            return_length=True,
            is_split_into_words=True,
            max_seq_len=max_seq_len)
    
        input_ids = tokenized_input['input_ids']
        token_type_ids = tokenized_input['token_type_ids']
        seq_len = tokenized_input['seq_len']
    
        if is_test:
            return input_ids, token_type_ids, seq_len
        elif label_vocab is not None:
            labels = labels[:(max_seq_len-2)]
            encoded_label = [no_entity_label] + labels + [no_entity_label]
            encoded_label = [label_vocab[x] for x in encoded_label]
            return input_ids, token_type_ids, seq_len, encoded_label
    
    
    no_entity_label = "O"
    # padding label value
    ignore_label = -1
    batch_size = 32
    max_seq_len = 300
    
    trans_func = partial(
        convert_example_to_feature,
        tokenizer=tokenizer,
        label_vocab=train_ds.label_vocab,
        max_seq_len=max_seq_len,
        no_entity_label=no_entity_label,
        ignore_label=ignore_label,
        is_test=False)
    batchify_fn = lambda samples, fn=Tuple(
        Pad(axis=0, pad_val=tokenizer.vocab[tokenizer.pad_token]), # input ids
        Pad(axis=0, pad_val=tokenizer.vocab[tokenizer.pad_token]), # token type ids
        Stack(), # sequence lens
        Pad(axis=0, pad_val=ignore_label) # labels
    ): fn(list(map(trans_func, samples)))
    
    train_loader = paddle.io.DataLoader(
        dataset=train_ds,
        batch_size=batch_size,
        shuffle=True,
        collate_fn=batchify_fn)
    dev_loader = paddle.io.DataLoader(
        dataset=dev_ds,
        batch_size=batch_size,
        collate_fn=batchify_fn)
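  What batchify_fn does to a batch can be imitated in plain NumPy: variable-length id sequences are right-padded to the longest one, sequence lengths are collected into an array, and label sequences are padded with ignore_label (-1) so that the loss can skip those positions. This sketch only mirrors the behavior of Pad and Stack as used above; the helper name pad_batch and the toy ids are hypothetical, and pad id 0 assumes ERNIE's [PAD] token has id 0.

```python
from typing import List

import numpy as np


def pad_batch(seqs: List[List[int]], pad_val: int) -> np.ndarray:
    """Right-pad every sequence to the batch maximum length, like Pad(axis=0, pad_val=...)."""
    max_len = max(len(s) for s in seqs)
    out = np.full((len(seqs), max_len), pad_val, dtype="int64")
    for i, s in enumerate(seqs):
        out[i, : len(s)] = s
    return out


input_ids = [[1, 647, 789, 2], [1, 558, 2]]         # toy token ids
labels = [[0, 3, 4, 0], [0, 1, 0]]                   # toy tag ids
batch_ids = pad_batch(input_ids, pad_val=0)          # assumed [PAD] id
batch_labels = pad_batch(labels, pad_val=-1)         # ignore_label
seq_lens = np.array([len(s) for s in input_ids])     # plays the role of Stack()
print(batch_ids, batch_labels, seq_lens, sep="\n")
```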
    

    1.4 Reproducing the Baseline, Step 4: Defining the Loss Function and Optimizer, and Training

  For this baseline we use cross-entropy as the loss function and paddle.optimizer.AdamW as the optimizer.
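  Because the labels are padded with ignore_label = -1 and the loss is created with ignore_index=ignore_label, padded positions should contribute nothing to the loss. The NumPy sketch below illustrates that computation, averaging only over non-ignored positions (the usual definition of a mean reduction with an ignore index); it is an illustration of the idea, not Paddle's implementation, and the function name is hypothetical.

```python
import numpy as np


def masked_cross_entropy(logits: np.ndarray, labels: np.ndarray, ignore_index: int = -1) -> float:
    """Mean cross-entropy over positions whose label != ignore_index.

    logits: (N, C) scores; labels: (N,) integer class ids.
    """
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    mask = labels != ignore_index
    picked = log_probs[np.arange(len(labels))[mask], labels[mask]]  # log p(gold class)
    return float(-picked.mean())


logits = np.array([[2.0, 0.0, 0.0], [0.0, 3.0, 0.0], [5.0, -5.0, 0.0]])
labels = np.array([0, 1, -1])  # the last position is padding
print(masked_cross_entropy(logits, labels))
```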

    import numpy as np
    
    @paddle.no_grad()
    def evaluate(model, criterion, metric, num_label, data_loader):
        """Return chunk-level precision, recall, F1 and the average loss on data_loader."""
        model.eval()
        metric.reset()
        losses = []
        for input_ids, seg_ids, seq_lens, labels in data_loader:
            logits = model(input_ids, seg_ids)
            loss = paddle.mean(criterion(logits.reshape([-1, num_label]), labels.reshape([-1])))
            losses.append(loss.numpy())
            preds = paddle.argmax(logits, axis=-1)
            n_infer, n_label, n_correct = metric.compute(None, seq_lens, preds, labels)
            metric.update(n_infer.numpy(), n_label.numpy(), n_correct.numpy())
        # Accumulate once, after every batch has been counted
        precision, recall, f1_score = metric.accumulate()
        avg_loss = np.mean(losses)
        model.train()
    
        return precision, recall, f1_score, avg_loss
    
    # Directory for model checkpoints (-p: no error if it already exists)
    !mkdir -p ckpt/DuEE-Fin/trigger/
    
    import warnings
    from paddlenlp.metrics import ChunkEvaluator
    
    warnings.filterwarnings('ignore')
    
    learning_rate=5e-5
    weight_decay=0.01
    num_epoch = 1
    
    checkpoints = 'ckpt/DuEE-Fin/trigger/'
    
    num_training_steps = len(train_loader) * num_epoch
    # Generate parameter names needed to perform weight decay.
    # All bias and LayerNorm parameters are excluded.
    decay_params = [
        p.name for n, p in model.named_parameters()
        if not any(nd in n for nd in ["bias", "norm"])
    ]
    optimizer = paddle.optimizer.AdamW(
        learning_rate=learning_rate,
        parameters=model.parameters(),
        weight_decay=weight_decay,
        apply_decay_param_fun=lambda x: x in decay_params)
    
    metric = ChunkEvaluator(label_list=train_ds.label_vocab.keys(), suffix=False)
    criterion = paddle.nn.loss.CrossEntropyLoss(ignore_index=ignore_label)
    
    step, best_f1 = 0, 0.0
    model.train()
    rank = paddle.distributed.get_rank()
    for epoch in range(num_epoch):
        for idx, (input_ids, token_type_ids, seq_lens, labels) in enumerate(train_loader):
            logits = model(input_ids, token_type_ids).reshape(
                [-1, train_ds.label_num])
            loss = paddle.mean(criterion(logits, labels.reshape([-1])))
            loss.backward()
            optimizer.step()
            optimizer.clear_grad()
            loss_item = loss.numpy().item()
            if step > 0 and step % 10 == 0 and rank == 0:
                print(f'train epoch: {epoch} - step: {step} (total: {num_training_steps}) - loss: {loss_item:.6f}')
            if step > 0 and step % 50 == 0 and rank == 0:
                p, r, f1, avg_loss = evaluate(model, criterion, metric, len(label_map), dev_loader)
                print(f'dev step: {step} - loss: {avg_loss:.5f}, precision: {p:.5f}, recall: {r:.5f}, ' \
                        f'f1: {f1:.5f} current best {best_f1:.5f}')
                if f1 > best_f1:
                    best_f1 = f1
                    print(f'==============================================save best model ' \
                            f'best performance {best_f1:5f}')
                    paddle.save(model.state_dict(), '{}/best.pdparams'.format(checkpoints))
            step += 1
    
    # save the final model
    if rank == 0:
        paddle.save(model.state_dict(), '{}/final.pdparams'.format(checkpoints))
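The optimizer block chooses decay targets by the structured name `n` from `named_parameters()` but stores `p.name`. That is deliberate: Paddle's `AdamW` calls `apply_decay_param_fun` with each parameter's `Tensor.name`, not its attribute path. The framework-free sketch below mimics that two-name bookkeeping. The parameter names are made up for illustration (real ERNIE names will differ), and `build_decay_fn` is a hypothetical helper. It also swaps the list for a set so each lookup is O(1), which only matters for large models.

```python
from typing import Callable, Iterable, Tuple

# Hypothetical (structured_name, tensor_name) pairs standing in for
# [(n, p.name) for n, p in model.named_parameters()]
NAMED_PARAMS = [
    ("ernie.embeddings.word_embeddings.weight", "embedding_0.w_0"),
    ("ernie.embeddings.layer_norm.weight", "layer_norm_0.w_0"),
    ("ernie.encoder.layers.0.linear1.weight", "linear_0.w_0"),
    ("ernie.encoder.layers.0.linear1.bias", "linear_0.b_0"),
    ("classifier.weight", "linear_1.w_0"),
]


def build_decay_fn(named_params: Iterable[Tuple[str, str]],
                   excluded=("bias", "norm")) -> Callable[[str], bool]:
    """Return a predicate over tensor names that is True only for parameters to decay."""
    # Filter on the readable structured name, but remember the tensor name,
    # because that is the string the optimizer passes to the predicate.
    decay_names = {tensor_name for structured, tensor_name in named_params
                   if not any(nd in structured for nd in excluded)}
    return lambda tensor_name: tensor_name in decay_names


apply_decay = build_decay_fn(NAMED_PARAMS)
for structured, tensor_name in NAMED_PARAMS:
    print(f"{structured:45s} decay={apply_decay(tensor_name)}")
```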
    
    train epoch: 0 - step: 10 (total: 227) - loss: 0.136036
    train epoch: 0 - step: 20 (total: 227) - loss: 0.130759
    train epoch: 0 - step: 30 (total: 227) - loss: 0.117360
    train epoch: 0 - step: 40 (total: 227) - loss: 0.126342
    train epoch: 0 - step: 50 (total: 227) - loss: 0.117132
    dev step: 50 - loss: 0.11086, precision: 0.00000, recall: 0.00000, f1: 0.00000 current best 0.00000
    train epoch: 0 - step: 60 (total: 227) - loss: 0.127355
    train epoch: 0 - step: 70 (total: 227) - loss: 0.120025
    train epoch: 0 - step: 80 (total: 227) - loss: 0.112086
    train epoch: 0 - step: 90 (total: 227) - loss: 0.106585
    train epoch: 0 - step: 100 (total: 227) - loss: 0.109516
    dev step: 100 - loss: 0.09834, precision: 0.00000, recall: 0.00000, f1: 0.00000 current best 0.00000
    train epoch: 0 - step: 110 (total: 227) - loss: 0.082624
    train epoch: 0 - step: 120 (total: 227) - loss: 0.056104
    train epoch: 0 - step: 130 (total: 227) - loss: 0.064101
    train epoch: 0 - step: 140 (total: 227) - loss: 0.059635
    train epoch: 0 - step: 150 (total: 227) - loss: 0.057752
    dev step: 150 - loss: 0.04139, precision: 0.35824, recall: 0.38144, f1: 0.36947 current best 0.00000
    ==============================================save best model best performerence 0.369475
    train epoch: 0 - step: 160 (total: 227) - loss: 0.045838
    train epoch: 0 - step: 170 (total: 227) - loss: 0.030626
    train epoch: 0 - step: 180 (total: 227) - loss: 0.029898
    train epoch: 0 - step: 190 (total: 227) - loss: 0.020956
    train epoch: 0 - step: 200 (total: 227) - loss: 0.032151
    dev step: 200 - loss: 0.01862, precision: 0.66860, recall: 0.71763, f1: 0.69225 current best 0.36947
    ==============================================save best model best performerence 0.692250
    train epoch: 0 - step: 210 (total: 227) - loss: 0.017710
    train epoch: 0 - step: 220 (total: 227) - loss: 0.012850
    

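Training the argument (role) recognition model works exactly like training the trigger model: just replace the data with the processed argument-recognition dataset. Training can be launched with `run_duee_fin.sh` as follows.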
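The role dataset keeps the sequence-labeling format of the trigger dataset (both runs use `sequence_labeling.py`); only the tag set changes, from event-type tags on trigger words to role tags on argument spans. A minimal sketch of that conversion, assuming character-level `B-<role>`/`I-<role>` tags (the prefix scheme that `ChunkEvaluator(suffix=False)` expects) and gold character offsets for each argument. `spans_to_bio`, the sentence and the role names are hypothetical; this is not the DuEE-Fin preprocessing script itself.

```python
from typing import List, Tuple


def spans_to_bio(text: str, spans: List[Tuple[int, str, str]]) -> List[str]:
    """Character-level BIO tags; each span is (start_offset, role, argument_text).

    Assumes spans do not overlap: a later span would overwrite an earlier one.
    """
    tags = ["O"] * len(text)
    for start, role, arg in spans:
        end = start + len(arg)
        if text[start:end] != arg:
            raise ValueError(f"argument {arg!r} not found at offset {start}")
        tags[start] = f"B-{role}"
        for i in range(start + 1, end):
            tags[i] = f"I-{role}"
    return tags


# Illustrative sentence and role names, not taken from the real dataset
text = "甲公司质押500万股"
spans = [(0, "质押方", "甲公司"), (5, "质押数量", "500万股")]
for ch, tag in zip(text, spans_to_bio(text, spans)):
    print(ch, tag)
```

Checking the offset against the argument text catches misaligned annotations early instead of silently producing wrong tags.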

    # Train the trigger recognition model
    !bash run_duee_fin.sh trigger_train
    
    This output exceeds 1000 lines and will be truncated when saved.
    
    check and create directory
    dir ./ckpt exist
    dir ./ckpt/DuEE-Fin exist
    dir ./submit exist
    
    start DuEE-Fin trigger train
    /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/setuptools/depends.py:2: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
      import imp
    -----------  Configuration Arguments -----------
    gpus: 0
    heter_worker_num: None
    heter_workers: 
    http_port: None
    ips: 127.0.0.1
    log_dir: log
    nproc_per_node: None
    server_num: None
    servers: 
    training_script: sequence_labeling.py
    training_script_args: ['--num_epoch', '20', '--learning_rate', '5e-5', '--tag_path', './conf/DuEE-Fin/trigger_tag.dict', '--train_data', './data/DuEE-Fin/trigger/train.tsv', '--dev_data', './data/DuEE-Fin/trigger/dev.tsv', '--test_data', './data/DuEE-Fin/trigger/test.tsv', '--predict_data', './data/DuEE-Fin/sentence/test.json', '--do_train', 'True', '--do_predict', 'False', '--max_seq_len', '300', '--batch_size', '16', '--skip_step', '10', '--valid_step', '50', '--checkpoints', './ckpt/DuEE-Fin/trigger', '--init_ckpt', './ckpt/DuEE-Fin/trigger/best.pdparams', '--predict_save_path', './ckpt/DuEE-Fin/trigger/test_pred.json', '--device', 'gpu']
    worker_num: None
    workers: 
    ------------------------------------------------
    WARNING 2021-04-10 16:29:19,740 launch.py:316] Not found distinct arguments and compiled with cuda. Default use collective mode
    launch train in GPU mode
    INFO 2021-04-10 16:29:19,742 launch_utils.py:471] Local start 1 processes. First process distributed environment info (Only For Debug): 
        +=======================================================================================+
        |                        Distributed Envs                      Value                    |
        +---------------------------------------------------------------------------------------+
        |                       PADDLE_TRAINER_ID                        0                      |
        |                 PADDLE_CURRENT_ENDPOINT                 127.0.0.1:54382               |
        |                     PADDLE_TRAINERS_NUM                        1                      |
        |                PADDLE_TRAINER_ENDPOINTS                 127.0.0.1:54382               |
        |                     FLAGS_selected_gpus                        0                      |
        +=======================================================================================+
    
    INFO 2021-04-10 16:29:19,742 launch_utils.py:475] details abouts PADDLE_TRAINER_ENDPOINTS can be found in log/endpoints.log, and detail running logs maybe found in log/workerlog.0
    /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/setuptools/depends.py:2: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
      import imp
    [2021-04-10 16:29:20,983] [    INFO] - Found /home/aistudio/.paddlenlp/models/ernie-1.0/vocab.txt
    [2021-04-10 16:29:20,997] [    INFO] - Already cached /home/aistudio/.paddlenlp/models/ernie-1.0/ernie_v1_chn_base.pdparams
    W0410 16:29:20.998939   762 device_context.cc:362] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 10.1, Runtime API Version: 10.1
    W0410 16:29:21.003577   762 device_context.cc:372] device: 0, cuDNN Version: 7.6.
    ============start train==========
    train epoch: 0 - step: 10 (total: 9080) - loss: 0.109321
    train epoch: 0 - step: 20 (total: 9080) - loss: 0.129953
    train epoch: 0 - step: 30 (total: 9080) - loss: 0.116185
    train epoch: 0 - step: 40 (total: 9080) - loss: 0.126599
    train epoch: 0 - step: 50 (total: 9080) - loss: 0.109494
    dev step: 50 - loss: 0.11120, precision: 0.00000, recall: 0.00000, f1: 0.00000 current best 0.00000
    train epoch: 0 - step: 60 (total: 9080) - loss: 0.111870
    train epoch: 0 - step: 70 (total: 9080) - loss: 0.156219
    train epoch: 0 - step: 80 (total: 9080) - loss: 0.104292
    train epoch: 0 - step: 90 (total: 9080) - loss: 0.129062
    train epoch: 0 - step: 100 (total: 9080) - loss: 0.116484
    dev step: 100 - loss: 0.10372, precision: 0.00000, recall: 0.00000, f1: 0.00000 current best 0.00000
    train epoch: 0 - step: 110 (total: 9080) - loss: 0.107833
    train epoch: 0 - step: 120 (total: 9080) - loss: 0.097913
    train epoch: 0 - step: 130 (total: 9080) - loss: 0.102398
    train epoch: 0 - step: 140 (total: 9080) - loss: 0.061798
    train epoch: 0 - step: 150 (total: 9080) - loss: 0.070677
    dev step: 150 - loss: 0.05695, precision: 0.25240, recall: 0.12324, f1: 0.16562 current best 0.00000
    ==============================================save best model best performerence 0.165618
    ……
    train epoch: 19 - step: 8660 (total: 9080) - loss: 0.000040
    train epoch: 19 - step: 8670 (total: 9080) - loss: 0.000292
    train epoch: 19 - step: 8680 (total: 9080) - loss: 0.000617
    train epoch: 19 - step: 8690 (total: 9080) - loss: 0.000061
    train epoch: 19 - step: 8700 (total: 9080) - loss: 0.000340
    dev step: 8700 - loss: 0.01594, precision: 0.86531, recall: 0.89704, f1: 0.88089 current best 0.89685
    train epoch: 19 - step: 8710 (total: 9080) - loss: 0.002070
    train epoch: 19 - step: 8720 (total: 9080) - loss: 0.000533
    train epoch: 19 - step: 8730 (total: 9080) - loss: 0.001161
    train epoch: 19 - step: 8740 (total: 9080) - loss: 0.007269
    train epoch: 19 - step: 8750 (total: 9080) - loss: 0.000043
    dev step: 8750 - loss: 0.01295, precision: 0.86478, recall: 0.90796, f1: 0.88584 current best 0.89685
    train epoch: 19 - step: 8760 (total: 9080) - loss: 0.002034
    train epoch: 19 - step: 8770 (total: 9080) - loss: 0.000233
    train epoch: 19 - step: 8780 (total: 9080) - loss: 0.000176
    train epoch: 19 - step: 8790 (total: 9080) - loss: 0.000349
    train epoch: 19 - step: 8800 (total: 9080) - loss: 0.001374
    dev step: 8800 - loss: 0.01408, precision: 0.86432, recall: 0.89938, f1: 0.88150 current best 0.89685
    train epoch: 19 - step: 8810 (total: 9080) - loss: 0.000389
    train epoch: 19 - step: 8820 (total: 9080) - loss: 0.003733
    train epoch: 19 - step: 8830 (total: 9080) - loss: 0.000166
    train epoch: 19 - step: 8840 (total: 9080) - loss: 0.000097
    train epoch: 19 - step: 8850 (total: 9080) - loss: 0.000143
    dev step: 8850 - loss: 0.01380, precision: 0.86353, recall: 0.90328, f1: 0.88296 current best 0.89685
    train epoch: 19 - step: 8860 (total: 9080) - loss: 0.000026
    train epoch: 19 - step: 8870 (total: 9080) - loss: 0.000193
    train epoch: 19 - step: 8880 (total: 9080) - loss: 0.001100
    train epoch: 19 - step: 8890 (total: 9080) - loss: 0.000031
    train epoch: 19 - step: 8900 (total: 9080) - loss: 0.000353
    dev step: 8900 - loss: 0.01387, precision: 0.88104, recall: 0.89548, f1: 0.88820 current best 0.89685
    train epoch: 19 - step: 8910 (total: 9080) - loss: 0.000200
    train epoch: 19 - step: 8920 (total: 9080) - loss: 0.000586
    train epoch: 19 - step: 8930 (total: 9080) - loss: 0.000042
    train epoch: 19 - step: 8940 (total: 9080) - loss: 0.000408
    train epoch: 19 - step: 8950 (total: 9080) - loss: 0.000845
    dev step: 8950 - loss: 0.01537, precision: 0.86103, recall: 0.91342, f1: 0.88645 current best 0.89685
    train epoch: 19 - step: 8960 (total: 9080) - loss: 0.000170
    train epoch: 19 - step: 8970 (total: 9080) - loss: 0.002247
    train epoch: 19 - step: 8980 (total: 9080) - loss: 0.000848
    train epoch: 19 - step: 8990 (total: 9080) - loss: 0.002282
    train epoch: 19 - step: 9000 (total: 9080) - loss: 0.000029
    dev step: 9000 - loss: 0.01638, precision: 0.88240, recall: 0.87207, f1: 0.87721 current best 0.89685
    train epoch: 19 - step: 9010 (total: 9080) - loss: 0.000446
    train epoch: 19 - step: 9020 (total: 9080) - loss: 0.000021
    train epoch: 19 - step: 9030 (total: 9080) - loss: 0.000486
    train epoch: 19 - step: 9040 (total: 9080) - loss: 0.003263
    train epoch: 19 - step: 9050 (total: 9080) - loss: 0.000346
    dev step: 9050 - loss: 0.01396, precision: 0.88304, recall: 0.88924, f1: 0.88613 current best 0.89685
    train epoch: 19 - step: 9060 (total: 9080) - loss: 0.000052
    train epoch: 19 - step: 9070 (total: 9080) - loss: 0.000063
    INFO 2021-04-10 17:34:32,659 launch.py:240] Local processes completed.
    end DuEE-Fin trigger train
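Because the notebook output is truncated, the dev evaluation that produced the best trigger F1 (0.89685) is not shown. The launcher says detailed logs may be found in `log/workerlog.0`, and a small parser over the `dev step:` lines can recover the best step. This sketch assumes exactly the line format printed above; `best_dev_step` is a hypothetical helper, and it keeps the first step on ties, mirroring the strict `f1 > best_f1` test in the training loop.

```python
import re
from typing import Iterable, Optional, Tuple

# Matches lines like:
# dev step: 8700 - loss: 0.01594, precision: 0.86531, recall: 0.89704, f1: 0.88089 ...
DEV_LINE = re.compile(
    r"dev step: (\d+) - loss: ([\d.]+), precision: ([\d.]+), "
    r"recall: ([\d.]+), f1: ([\d.]+)"
)


def best_dev_step(lines: Iterable[str]) -> Optional[Tuple[int, float]]:
    """Return (step, f1) of the highest dev F1 in the given log lines, or None."""
    best = None
    for line in lines:
        m = DEV_LINE.search(line)
        if m:
            step, f1 = int(m.group(1)), float(m.group(5))
            if best is None or f1 > best[1]:  # strict: earliest step wins a tie
                best = (step, f1)
    return best
```

For example, `with open("log/workerlog.0", encoding="utf-8") as f: print(best_dev_step(f))`. Run it right after trigger training finishes, since the later role and enum runs also log to `log/workerlog.0` and may replace its contents.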
    
    # Predict triggers
    !bash run_duee_fin.sh trigger_predict
    
    check and create directory
    dir ./ckpt exist
    dir ./ckpt/DuEE-Fin exist
    dir ./submit exist
    
    start DuEE-Fin trigger predict
    /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/setuptools/depends.py:2: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
      import imp
    [2021-04-10 17:34:34,610] [    INFO] - Found /home/aistudio/.paddlenlp/models/ernie-1.0/vocab.txt
    [2021-04-10 17:34:34,624] [    INFO] - Already cached /home/aistudio/.paddlenlp/models/ernie-1.0/ernie_v1_chn_base.pdparams
    W0410 17:34:34.625129  3383 device_context.cc:362] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 10.1, Runtime API Version: 10.1
    W0410 17:34:34.629817  3383 device_context.cc:372] device: 0, cuDNN Version: 7.6.
    ============start predict==========
    Loaded parameters from ./ckpt/DuEE-Fin/trigger/best.pdparams
    save data 140867 to ./ckpt/DuEE-Fin/trigger/test_pred.json
    end DuEE-Fin trigger predict
    
    # Train the argument (role) recognition model
    !bash run_duee_fin.sh role_train
    
    This output exceeds 1000 lines and will be truncated when saved.
    
    check and create directory
    dir ./ckpt exist
    dir ./ckpt/DuEE-Fin exist
    dir ./submit exist
    
    start DuEE-Fin role train
    /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/setuptools/depends.py:2: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
      import imp
    -----------  Configuration Arguments -----------
    gpus: 0
    heter_worker_num: None
    heter_workers: 
    http_port: None
    ips: 127.0.0.1
    log_dir: log
    nproc_per_node: None
    server_num: None
    servers: 
    training_script: sequence_labeling.py
    training_script_args: ['--num_epoch', '20', '--learning_rate', '5e-5', '--tag_path', './conf/DuEE-Fin/role_tag.dict', '--train_data', './data/DuEE-Fin/role/train.tsv', '--dev_data', './data/DuEE-Fin/role/dev.tsv', '--test_data', './data/DuEE-Fin/role/test.tsv', '--predict_data', './data/DuEE-Fin/sentence/test.json', '--do_train', 'True', '--do_predict', 'False', '--max_seq_len', '300', '--batch_size', '16', '--skip_step', '10', '--valid_step', '50', '--checkpoints', './ckpt/DuEE-Fin/role', '--init_ckpt', './ckpt/DuEE-Fin/role/best.pdparams', '--predict_save_path', './ckpt/DuEE-Fin/role/test_pred.json', '--device', 'gpu']
    worker_num: None
    workers: 
    ------------------------------------------------
    WARNING 2021-04-10 17:57:54,959 launch.py:316] Not found distinct arguments and compiled with cuda. Default use collective mode
    launch train in GPU mode
    INFO 2021-04-10 17:57:54,961 launch_utils.py:471] Local start 1 processes. First process distributed environment info (Only For Debug): 
        +=======================================================================================+
        |                        Distributed Envs                      Value                    |
        +---------------------------------------------------------------------------------------+
        |                       PADDLE_TRAINER_ID                        0                      |
        |                 PADDLE_CURRENT_ENDPOINT                 127.0.0.1:44116               |
        |                     PADDLE_TRAINERS_NUM                        1                      |
        |                PADDLE_TRAINER_ENDPOINTS                 127.0.0.1:44116               |
        |                     FLAGS_selected_gpus                        0                      |
        +=======================================================================================+
    
    INFO 2021-04-10 17:57:54,961 launch_utils.py:475] details abouts PADDLE_TRAINER_ENDPOINTS can be found in log/endpoints.log, and detail running logs maybe found in log/workerlog.0
    /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/setuptools/depends.py:2: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
      import imp
    [2021-04-10 17:57:56,200] [    INFO] - Found /home/aistudio/.paddlenlp/models/ernie-1.0/vocab.txt
    [2021-04-10 17:57:56,213] [    INFO] - Already cached /home/aistudio/.paddlenlp/models/ernie-1.0/ernie_v1_chn_base.pdparams
    W0410 17:57:56.215006  4136 device_context.cc:362] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 10.1, Runtime API Version: 10.1
    W0410 17:57:56.219677  4136 device_context.cc:372] device: 0, cuDNN Version: 7.6.
    ============start train==========
    train epoch: 0 - step: 10 (total: 11800) - loss: 1.228878
    train epoch: 0 - step: 20 (total: 11800) - loss: 1.163631
    train epoch: 0 - step: 30 (total: 11800) - loss: 1.130505
    train epoch: 0 - step: 40 (total: 11800) - loss: 1.303947
    train epoch: 0 - step: 50 (total: 11800) - loss: 1.111251
    dev step: 50 - loss: 1.14692, precision: 0.00000, recall: 0.00000, f1: 0.00000 current best 0.00000
    train epoch: 0 - step: 60 (total: 11800) - loss: 1.335606
    train epoch: 0 - step: 70 (total: 11800) - loss: 0.886442
    train epoch: 0 - step: 80 (total: 11800) - loss: 1.020030
    train epoch: 0 - step: 90 (total: 11800) - loss: 0.871939
    train epoch: 0 - step: 100 (total: 11800) - loss: 0.928532
    dev step: 100 - loss: 0.98844, precision: 0.00000, recall: 0.00000, f1: 0.00000 current best 0.00000
    train epoch: 0 - step: 110 (total: 11800) - loss: 1.005332
    train epoch: 0 - step: 120 (total: 11800) - loss: 0.769859
    train epoch: 0 - step: 130 (total: 11800) - loss: 0.761578
    train epoch: 0 - step: 140 (total: 11800) - loss: 0.653325
    train epoch: 0 - step: 150 (total: 11800) - loss: 0.899768
    dev step: 150 - loss: 0.71772, precision: 0.06080, recall: 0.00835, f1: 0.01468 current best 0.00000
    ==============================================save best model best performerence 0.014678
    train epoch: 0 - step: 160 (total: 11800) - loss: 0.690438
    train epoch: 0 - step: 170 (total: 11800) - loss: 0.774387
    train epoch: 0 - step: 180 (total: 11800) - loss: 0.615638
    train epoch: 0 - step: 190 (total: 11800) - loss: 0.483597
    train epoch: 0 - step: 200 (total: 11800) - loss: 0.571479
    dev step: 200 - loss: 0.52474, precision: 0.18197, recall: 0.12865, f1: 0.15073 current best 0.01468
    ==============================================save best model best performerence 0.150733
    train epoch: 0 - step: 210 (total: 11800) - loss: 0.540742
    train epoch: 0 - step: 220 (total: 11800) - loss: 0.524742
    train epoch: 0 - step: 230 (total: 11800) - loss: 0.464600
    train epoch: 0 - step: 240 (total: 11800) - loss: 0.478460
    train epoch: 0 - step: 250 (total: 11800) - loss: 0.523782
    dev step: 250 - loss: 0.42025, precision: 0.25433, recall: 0.23644, f1: 0.24506 current best 0.15073
    ==============================================save best model best performerence 0.245059
    train epoch: 0 - step: 260 (total: 11800) - loss: 0.374678
    train epoch: 0 - step: 270 (total: 11800) - loss: 0.530323
    train epoch: 0 - step: 280 (total: 11800) - loss: 0.325683
    train epoch: 0 - step: 290 (total: 11800) - loss: 0.375011
    train epoch: 0 - step: 300 (total: 11800) - loss: 0.385494
    dev step: 300 - loss: 0.34790, precision: 0.27753, recall: 0.26766, f1: 0.27251 current best 0.24506
    ==============================================save best model best performerence 0.272508
    train epoch: 0 - step: 310 (total: 11800) - loss: 0.353424
    train epoch: 0 - step: 320 (total: 11800) - loss: 0.410307
    train epoch: 0 - step: 330 (total: 11800) - loss: 0.322043
    train epoch: 0 - step: 340 (total: 11800) - loss: 0.384293
    train epoch: 0 - step: 350 (total: 11800) - loss: 0.271734
    dev step: 350 - loss: 0.30927, precision: 0.33494, recall: 0.44913, f1: 0.38372 current best 0.27251
    ==============================================save best model best performerence 0.383722
    train epoch: 0 - step: 360 (total: 11800) - loss: 0.424462
    train epoch: 0 - step: 370 (total: 11800) - loss: 0.398466
    train epoch: 0 - step: 380 (total: 11800) - loss: 0.220276
    train epoch: 0 - step: 390 (total: 11800) - loss: 0.329981
    train epoch: 0 - step: 400 (total: 11800) - loss: 0.291278
    dev step: 400 - loss: 0.28080, precision: 0.37307, recall: 0.44899, f1: 0.40752 current best 0.38372
    ==============================================save best model best performerence 0.407524
    train epoch: 0 - step: 410 (total: 11800) - loss: 0.315920
    train epoch: 0 - step: 420 (total: 11800) - loss: 0.335757
    train epoch: 0 - step: 430 (total: 11800) - loss: 0.331377
    train epoch: 0 - step: 440 (total: 11800) - loss: 0.339501
    train epoch: 0 - step: 450 (total: 11800) - loss: 0.216479
    dev step: 450 - loss: 0.27126, precision: 0.42649, recall: 0.48424, f1: 0.45353 current best 0.40752
    ==============================================save best model best performerence 0.453535
    train epoch: 0 - step: 460 (total: 11800) - loss: 0.334343
    train epoch: 0 - step: 470 (total: 11800) - loss: 0.246070
    train epoch: 0 - step: 480 (total: 11800) - loss: 0.266857
    train epoch: 0 - step: 490 (total: 11800) - loss: 0.262747
    train epoch: 0 - step: 500 (total: 11800) - loss: 0.250897
    dev step: 500 - loss: 0.25047, precision: 0.47231, recall: 0.60383, f1: 0.53003 current best 0.45353
    ==============================================save best model best performerence 0.530032
    train epoch: 0 - step: 510 (total: 11800) - loss: 0.223253
    train epoch: 0 - step: 520 (total: 11800) - loss: 0.228720
    train epoch: 0 - step: 530 (total: 11800) - loss: 0.246290
    train epoch: 0 - step: 540 (total: 11800) - loss: 0.287393
    train epoch: 0 - step: 550 (total: 11800) - loss: 0.297358
    dev step: 550 - loss: 0.24383, precision: 0.49097, recall: 0.55548, f1: 0.52123 current best 0.53003
    train epoch: 0 - step: 560 (total: 11800) - loss: 0.266396
    train epoch: 0 - step: 570 (total: 11800) - loss: 0.296538
    train epoch: 0 - step: 580 (total: 11800) - loss: 0.210442
    train epoch: 1 - step: 590 (total: 11800) - loss: 0.282502
    train epoch: 1 - step: 600 (total: 11800) - loss: 0.239531
    dev step: 600 - loss: 0.22736, precision: 0.49346, recall: 0.61347, f1: 0.54696 current best 0.53003
    ==============================================save best model best performerence 0.546959
    train epoch: 1 - step: 610 (total: 11800) - loss: 0.281700
    train epoch: 1 - step: 620 (total: 11800) - loss: 0.291554
    train epoch: 1 - step: 630 (total: 11800) - loss: 0.284449
    train epoch: 1 - step: 640 (total: 11800) - loss: 0.175821
    train epoch: 1 - step: 650 (total: 11800) - loss: 0.234460
    dev step: 650 - loss: 0.22660, precision: 0.50054, recall: 0.66628, f1: 0.57164 current best 0.54696
    ==============================================save best model best performerence 0.571640
    train epoch: 1 - step: 660 (total: 11800) - loss: 0.253709
    train epoch: 1 - step: 670 (total: 11800) - loss: 0.206524
    train epoch: 1 - step: 680 (total: 11800) - loss: 0.273749
    train epoch: 1 - step: 690 (total: 11800) - loss: 0.267098
    train epoch: 1 - step: 700 (total: 11800) - loss: 0.221125
    dev step: 700 - loss: 0.22382, precision: 0.50251, recall: 0.62052, f1: 0.55531 current best 0.57164
    train epoch: 1 - step: 710 (total: 11800) - loss: 0.194055
    train epoch: 1 - step: 720 (total: 11800) - loss: 0.213713
    train epoch: 1 - step: 730 (total: 11800) - loss: 0.266367
    train epoch: 1 - step: 740 (total: 11800) - loss: 0.265232
    train epoch: 1 - step: 750 (total: 11800) - loss: 0.222215
    dev step: 750 - loss: 0.23990, precision: 0.49661, recall: 0.71780, f1: 0.58707 current best 0.57164
    ==============================================save best model best performerence 0.587065
    ……
    train epoch: 19 - step: 11210 (total: 11800) - loss: 0.071786
    train epoch: 19 - step: 11220 (total: 11800) - loss: 0.126563
    train epoch: 19 - step: 11230 (total: 11800) - loss: 0.079284
    train epoch: 19 - step: 11240 (total: 11800) - loss: 0.097921
    train epoch: 19 - step: 11250 (total: 11800) - loss: 0.082845
    dev step: 11250 - loss: 0.26768, precision: 0.60864, recall: 0.73406, f1: 0.66549 current best 0.68086
    train epoch: 19 - step: 11260 (total: 11800) - loss: 0.040633
    train epoch: 19 - step: 11270 (total: 11800) - loss: 0.036113
    train epoch: 19 - step: 11280 (total: 11800) - loss: 0.090494
    train epoch: 19 - step: 11290 (total: 11800) - loss: 0.058005
    train epoch: 19 - step: 11300 (total: 11800) - loss: 0.086870
    dev step: 11300 - loss: 0.27434, precision: 0.65781, recall: 0.68772, f1: 0.67244 current best 0.68086
    train epoch: 19 - step: 11310 (total: 11800) - loss: 0.092861
    train epoch: 19 - step: 11320 (total: 11800) - loss: 0.081821
    train epoch: 19 - step: 11330 (total: 11800) - loss: 0.093358
    train epoch: 19 - step: 11340 (total: 11800) - loss: 0.041281
    train epoch: 19 - step: 11350 (total: 11800) - loss: 0.072158
    dev step: 11350 - loss: 0.26591, precision: 0.63945, recall: 0.72125, f1: 0.67789 current best 0.68086
    train epoch: 19 - step: 11360 (total: 11800) - loss: 0.056884
    train epoch: 19 - step: 11370 (total: 11800) - loss: 0.103474
    train epoch: 19 - step: 11380 (total: 11800) - loss: 0.053013
    train epoch: 19 - step: 11390 (total: 11800) - loss: 0.120952
    train epoch: 19 - step: 11400 (total: 11800) - loss: 0.096058
    dev step: 11400 - loss: 0.28324, precision: 0.59984, recall: 0.73752, f1: 0.66159 current best 0.68086
    train epoch: 19 - step: 11410 (total: 11800) - loss: 0.053519
    train epoch: 19 - step: 11420 (total: 11800) - loss: 0.084413
    train epoch: 19 - step: 11430 (total: 11800) - loss: 0.082539
    train epoch: 19 - step: 11440 (total: 11800) - loss: 0.025818
    train epoch: 19 - step: 11450 (total: 11800) - loss: 0.104579
    dev step: 11450 - loss: 0.27601, precision: 0.62382, recall: 0.71161, f1: 0.66483 current best 0.68086
    train epoch: 19 - step: 11460 (total: 11800) - loss: 0.023326
    train epoch: 19 - step: 11470 (total: 11800) - loss: 0.074468
    train epoch: 19 - step: 11480 (total: 11800) - loss: 0.131153
    train epoch: 19 - step: 11490 (total: 11800) - loss: 0.144081
    train epoch: 19 - step: 11500 (total: 11800) - loss: 0.059301
    dev step: 11500 - loss: 0.24404, precision: 0.63090, recall: 0.69881, f1: 0.66312 current best 0.68086
    train epoch: 19 - step: 11510 (total: 11800) - loss: 0.087042
    train epoch: 19 - step: 11520 (total: 11800) - loss: 0.103437
    train epoch: 19 - step: 11530 (total: 11800) - loss: 0.141086
    train epoch: 19 - step: 11540 (total: 11800) - loss: 0.073799
    train epoch: 19 - step: 11550 (total: 11800) - loss: 0.080609
    dev step: 11550 - loss: 0.26010, precision: 0.63815, recall: 0.71392, f1: 0.67391 current best 0.68086
    train epoch: 19 - step: 11560 (total: 11800) - loss: 0.070097
    train epoch: 19 - step: 11570 (total: 11800) - loss: 0.080336
    train epoch: 19 - step: 11580 (total: 11800) - loss: 0.083600
    train epoch: 19 - step: 11590 (total: 11800) - loss: 0.094290
    train epoch: 19 - step: 11600 (total: 11800) - loss: 0.070526
    dev step: 11600 - loss: 0.26730, precision: 0.63843, recall: 0.73536, f1: 0.68347 current best 0.68086
    ==============================================save best model best performerence 0.683475
    train epoch: 19 - step: 11610 (total: 11800) - loss: 0.081728
    train epoch: 19 - step: 11620 (total: 11800) - loss: 0.063919
    train epoch: 19 - step: 11630 (total: 11800) - loss: 0.126019
    train epoch: 19 - step: 11640 (total: 11800) - loss: 0.104756
    train epoch: 19 - step: 11650 (total: 11800) - loss: 0.077707
    dev step: 11650 - loss: 0.25038, precision: 0.63025, recall: 0.72140, f1: 0.67275 current best 0.68347
    train epoch: 19 - step: 11660 (total: 11800) - loss: 0.092881
    train epoch: 19 - step: 11670 (total: 11800) - loss: 0.068379
    train epoch: 19 - step: 11680 (total: 11800) - loss: 0.046535
    train epoch: 19 - step: 11690 (total: 11800) - loss: 0.078183
    train epoch: 19 - step: 11700 (total: 11800) - loss: 0.104983
    dev step: 11700 - loss: 0.26015, precision: 0.64215, recall: 0.70471, f1: 0.67197 current best 0.68347
    train epoch: 19 - step: 11710 (total: 11800) - loss: 0.086539
    train epoch: 19 - step: 11720 (total: 11800) - loss: 0.118713
    train epoch: 19 - step: 11730 (total: 11800) - loss: 0.081435
    train epoch: 19 - step: 11740 (total: 11800) - loss: 0.073214
    train epoch: 19 - step: 11750 (total: 11800) - loss: 0.129037
    dev step: 11750 - loss: 0.25711, precision: 0.62550, recall: 0.68067, f1: 0.65192 current best 0.68347
    train epoch: 19 - step: 11760 (total: 11800) - loss: 0.117920
    train epoch: 19 - step: 11770 (total: 11800) - loss: 0.048488
    train epoch: 19 - step: 11780 (total: 11800) - loss: 0.095776
    train epoch: 19 - step: 11790 (total: 11800) - loss: 0.122794
    INFO 2021-04-10 19:32:21,529 launch.py:240] Local processes completed.
    end DuEE-Fin role train
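The `total` in each progress line is the number of optimizer steps for the whole run, which the inline code computes as `len(train_loader) * num_epoch`. Assuming `sequence_labeling.py` does the same, dividing by `--num_epoch 20` gives the batches per epoch: 11800 / 20 = 590 for the role model, which matches the log switching from epoch 0 to epoch 1 at step 590, and 9080 / 20 = 454 for the trigger model. With a single trainer (as in the launcher table) and `--batch_size 16`, that bounds the number of training examples per epoch, since the last batch may be partial. `steps_per_epoch` and `epoch_of` are hypothetical helpers.

```python
def steps_per_epoch(total_steps: int, num_epoch: int) -> int:
    """Batches per epoch, given the `total` printed in the log and --num_epoch."""
    if total_steps % num_epoch:
        raise ValueError("total steps should be a multiple of num_epoch")
    return total_steps // num_epoch


def epoch_of(step: int, per_epoch: int) -> int:
    """Epoch index of a global step; the loop counts steps from 0, one per batch."""
    return step // per_epoch


role_per_epoch = steps_per_epoch(11800, 20)
trigger_per_epoch = steps_per_epoch(9080, 20)
print(role_per_epoch, epoch_of(580, role_per_epoch), epoch_of(590, role_per_epoch))  # → 590 0 1
# Upper bound on training examples per epoch at --batch_size 16
print(role_per_epoch * 16, trigger_per_epoch * 16)  # → 9440 7264
```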
    
    # Predict arguments (roles)
    !bash run_duee_fin.sh role_predict
    
    check and create directory
    dir ./ckpt exist
    dir ./ckpt/DuEE-Fin exist
    dir ./submit exist
    
    start DuEE-Fin role predict
    /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/setuptools/depends.py:2: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
      import imp
    [2021-04-10 19:32:29,053] [    INFO] - Found /home/aistudio/.paddlenlp/models/ernie-1.0/vocab.txt
    [2021-04-10 19:32:29,067] [    INFO] - Already cached /home/aistudio/.paddlenlp/models/ernie-1.0/ernie_v1_chn_base.pdparams
    W0410 19:32:29.068078  7827 device_context.cc:362] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 10.1, Runtime API Version: 10.1
    W0410 19:32:29.072537  7827 device_context.cc:372] device: 0, cuDNN Version: 7.6.
    ============start predict==========
    Loaded parameters from ./ckpt/DuEE-Fin/role/best.pdparams
    save data 140867 to ./ckpt/DuEE-Fin/role/test_pred.json
    end DuEE-Fin role predict
    
    # Train the enum classification model
    !bash run_duee_fin.sh enum_train
    
    check and create directory
    dir ./ckpt exist
    dir ./ckpt/DuEE-Fin exist
    dir ./submit exist
    
    start DuEE-Fin enum train
    /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/setuptools/depends.py:2: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
      import imp
    -----------  Configuration Arguments -----------
    gpus: 0
    heter_worker_num: None
    heter_workers: 
    http_port: None
    ips: 127.0.0.1
    log_dir: log
    nproc_per_node: None
    server_num: None
    servers: 
    training_script: classifier.py
    training_script_args: ['--num_epoch', '20', '--learning_rate', '5e-5', '--tag_path', './conf/DuEE-Fin/enum_tag.dict', '--train_data', './data/DuEE-Fin/enum/train.tsv', '--dev_data', './data/DuEE-Fin/enum/dev.tsv', '--test_data', './data/DuEE-Fin/enum/test.tsv', '--predict_data', './data/DuEE-Fin/sentence/test.json', '--do_train', 'True', '--do_predict', 'False', '--max_seq_len', '300', '--batch_size', '16', '--skip_step', '1', '--valid_step', '5', '--checkpoints', './ckpt/DuEE-Fin/enum', '--init_ckpt', './ckpt/DuEE-Fin/enum/best.pdparams', '--predict_save_path', './ckpt/DuEE-Fin/enum/test_pred.json', '--device', 'gpu']
    worker_num: None
    workers: 
    ------------------------------------------------
    WARNING 2021-04-10 19:52:37,709 launch.py:316] Not found distinct arguments and compiled with cuda. Default use collective mode
    launch train in GPU mode
    INFO 2021-04-10 19:52:37,711 launch_utils.py:471] Local start 1 processes. First process distributed environment info (Only For Debug): 
        +=======================================================================================+
        |                        Distributed Envs                      Value                    |
        +---------------------------------------------------------------------------------------+
        |                       PADDLE_TRAINER_ID                        0                      |
        |                 PADDLE_CURRENT_ENDPOINT                 127.0.0.1:53319               |
        |                     PADDLE_TRAINERS_NUM                        1                      |
        |                PADDLE_TRAINER_ENDPOINTS                 127.0.0.1:53319               |
        |                     FLAGS_selected_gpus                        0                      |
        +=======================================================================================+
    
    INFO 2021-04-10 19:52:37,711 launch_utils.py:475] details abouts PADDLE_TRAINER_ENDPOINTS can be found in log/endpoints.log, and detail running logs maybe found in log/workerlog.0
    /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/setuptools/depends.py:2: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
      import imp
    [2021-04-10 19:52:38,983] [    INFO] - Already cached /home/aistudio/.paddlenlp/models/ernie-1.0/ernie_v1_chn_base.pdparams
    W0410 19:52:38.984846  8459 device_context.cc:362] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 10.1, Runtime API Version: 10.1
    W0410 19:52:38.990355  8459 device_context.cc:372] device: 0, cuDNN Version: 7.6.
    /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/dygraph/layers.py:1303: UserWarning: Skip loading for classifier.weight. classifier.weight is not found in the provided dict.
      warnings.warn(("Skip loading for {}. ".format(key) + str(err)))
    /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/dygraph/layers.py:1303: UserWarning: Skip loading for classifier.bias. classifier.bias is not found in the provided dict.
      warnings.warn(("Skip loading for {}. ".format(key) + str(err)))
    /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/dygraph/parallel.py:423: UserWarning: The program will return to single-card operation. Please check 1, whether you use spawn or fleetrun to start the program. 2, Whether it is a multi-card program. 3, Is the current environment multi-card.
      warnings.warn("The program will return to single-card operation. "
    [2021-04-10 19:52:45,669] [    INFO] - Found /home/aistudio/.paddlenlp/models/ernie-1.0/vocab.txt
    ============start train==========
    train epoch: 0 - step: 1 (total: 540) - loss: 1.816590 acc 0.00000
    train epoch: 0 - step: 2 (total: 540) - loss: 1.258928 acc 0.16667
    train epoch: 0 - step: 3 (total: 540) - loss: 1.420988 acc 0.21875
    train epoch: 0 - step: 4 (total: 540) - loss: 1.131907 acc 0.27500
    train epoch: 0 - step: 5 (total: 540) - loss: 1.223589 acc 0.29167
    dev step: 5 - loss: 1.056646 accuracy: 0.57353, current best 0.00000
    ==============================================save best model best performerence 0.573529
    train epoch: 0 - step: 6 (total: 540) - loss: 0.891011 acc 0.62500
    train epoch: 0 - step: 7 (total: 540) - loss: 1.019258 acc 0.53125
    train epoch: 0 - step: 8 (total: 540) - loss: 0.944579 acc 0.54167
    train epoch: 0 - step: 9 (total: 540) - loss: 0.998457 acc 0.54688
    train epoch: 0 - step: 10 (total: 540) - loss: 1.451570 acc 0.52500
    dev step: 10 - loss: 0.973503 accuracy: 0.58824, current best 0.57353
    ==============================================save best model best performerence 0.588235
    train epoch: 0 - step: 11 (total: 540) - loss: 1.007745 acc 0.50000
    train epoch: 0 - step: 12 (total: 540) - loss: 0.987179 acc 0.56250
    train epoch: 0 - step: 13 (total: 540) - loss: 1.315943 acc 0.54167
    train epoch: 0 - step: 14 (total: 540) - loss: 0.999895 acc 0.53125
    train epoch: 0 - step: 15 (total: 540) - loss: 1.151808 acc 0.51250
    dev step: 15 - loss: 0.960856 accuracy: 0.57353, current best 0.58824
    train epoch: 0 - step: 16 (total: 540) - loss: 0.993396 acc 0.50000
    train epoch: 0 - step: 17 (total: 540) - loss: 0.963157 acc 0.56250
    train epoch: 0 - step: 18 (total: 540) - loss: 1.068855 acc 0.58333
    train epoch: 0 - step: 19 (total: 540) - loss: 0.926241 acc 0.53125
    train epoch: 0 - step: 20 (total: 540) - loss: 1.040999 acc 0.55000
    dev step: 20 - loss: 0.976091 accuracy: 0.57353, current best 0.58824
    train epoch: 0 - step: 21 (total: 540) - loss: 0.889343 acc 0.56250
    train epoch: 0 - step: 22 (total: 540) - loss: 1.093462 acc 0.53125
    train epoch: 0 - step: 23 (total: 540) - loss: 0.737294 acc 0.60417
    train epoch: 0 - step: 24 (total: 540) - loss: 0.808597 acc 0.64062
    train epoch: 0 - step: 25 (total: 540) - loss: 1.001462 acc 0.62500
    dev step: 25 - loss: 0.890632 accuracy: 0.58824, current best 0.58824
    train epoch: 0 - step: 26 (total: 540) - loss: 1.133129 acc 0.58333
    train epoch: 1 - step: 27 (total: 540) - loss: 0.722086 acc 0.60714
    train epoch: 1 - step: 28 (total: 540) - loss: 1.116035 acc 0.59091
    train epoch: 1 - step: 29 (total: 540) - loss: 0.887589 acc 0.61667
    train epoch: 1 - step: 30 (total: 540) - loss: 0.892591 acc 0.63158
    dev step: 30 - loss: 0.789007 accuracy: 0.66176, current best 0.58824
    ==============================================save best model best performerence 0.661765
    train epoch: 1 - step: 31 (total: 540) - loss: 0.553415 acc 0.93750
    train epoch: 1 - step: 32 (total: 540) - loss: 0.908041 acc 0.81250
    train epoch: 1 - step: 33 (total: 540) - loss: 0.635944 acc 0.81250
    train epoch: 1 - step: 34 (total: 540) - loss: 0.589399 acc 0.79688
    train epoch: 1 - step: 35 (total: 540) - loss: 0.848807 acc 0.75000
    dev step: 35 - loss: 0.724788 accuracy: 0.73529, current best 0.66176
    ==============================================save best model best performerence 0.735294
    train epoch: 1 - step: 36 (total: 540) - loss: 0.357636 acc 0.87500
    train epoch: 1 - step: 37 (total: 540) - loss: 0.589867 acc 0.87500
    train epoch: 1 - step: 38 (total: 540) - loss: 0.742335 acc 0.81250
    train epoch: 1 - step: 39 (total: 540) - loss: 0.882202 acc 0.76562
    train epoch: 1 - step: 40 (total: 540) - loss: 0.428002 acc 0.78750
    dev step: 40 - loss: 0.696543 accuracy: 0.76471, current best 0.73529
    ==============================================save best model best performerence 0.764706
    train epoch: 1 - step: 41 (total: 540) - loss: 1.359658 acc 0.50000
    train epoch: 1 - step: 42 (total: 540) - loss: 1.061078 acc 0.59375
    train epoch: 1 - step: 43 (total: 540) - loss: 0.830923 acc 0.60417
    train epoch: 1 - step: 44 (total: 540) - loss: 1.215348 acc 0.59375
    train epoch: 1 - step: 45 (total: 540) - loss: 0.437100 acc 0.65000
    dev step: 45 - loss: 0.735505 accuracy: 0.76471, current best 0.76471
    train epoch: 1 - step: 46 (total: 540) - loss: 0.742862 acc 0.68750
    train epoch: 1 - step: 47 (total: 540) - loss: 0.711089 acc 0.68750
    train epoch: 1 - step: 48 (total: 540) - loss: 0.544343 acc 0.72917
    train epoch: 1 - step: 49 (total: 540) - loss: 0.928760 acc 0.67188
    train epoch: 1 - step: 50 (total: 540) - loss: 0.650753 acc 0.70000
    dev step: 50 - loss: 0.666267 accuracy: 0.80882, current best 0.76471
    ==============================================save best model best performerence 0.808824
    train epoch: 1 - step: 51 (total: 540) - loss: 0.561961 acc 0.81250
    train epoch: 1 - step: 52 (total: 540) - loss: 0.444493 acc 0.84375
    train epoch: 1 - step: 53 (total: 540) - loss: 0.727330 acc 0.81818
    train epoch: 2 - step: 54 (total: 540) - loss: 0.535819 acc 0.85000
    train epoch: 2 - step: 55 (total: 540) - loss: 0.804540 acc 0.80263
    dev step: 55 - loss: 0.748626 accuracy: 0.75000, current best 0.80882
    ……
    train epoch: 19 - step: 521 (total: 540) - loss: 0.001116 acc 1.00000
    train epoch: 19 - step: 522 (total: 540) - loss: 0.001323 acc 1.00000
    train epoch: 19 - step: 523 (total: 540) - loss: 0.000761 acc 1.00000
    train epoch: 19 - step: 524 (total: 540) - loss: 0.000776 acc 1.00000
    train epoch: 19 - step: 525 (total: 540) - loss: 0.000688 acc 1.00000
    dev step: 525 - loss: 0.963112 accuracy: 0.83824, current best 0.86765
    train epoch: 19 - step: 526 (total: 540) - loss: 0.001005 acc 1.00000
    train epoch: 19 - step: 527 (total: 540) - loss: 0.000491 acc 1.00000
    train epoch: 19 - step: 528 (total: 540) - loss: 0.000759 acc 1.00000
    train epoch: 19 - step: 529 (total: 540) - loss: 0.000579 acc 1.00000
    train epoch: 19 - step: 530 (total: 540) - loss: 0.000592 acc 1.00000
    dev step: 530 - loss: 0.965140 accuracy: 0.83824, current best 0.86765
    train epoch: 19 - step: 531 (total: 540) - loss: 0.000727 acc 1.00000
    train epoch: 19 - step: 532 (total: 540) - loss: 0.000827 acc 1.00000
    train epoch: 19 - step: 533 (total: 540) - loss: 0.002026 acc 1.00000
    train epoch: 19 - step: 534 (total: 540) - loss: 0.001417 acc 1.00000
    train epoch: 19 - step: 535 (total: 540) - loss: 0.000947 acc 1.00000
    dev step: 535 - loss: 0.967908 accuracy: 0.83824, current best 0.86765
    train epoch: 19 - step: 536 (total: 540) - loss: 0.000558 acc 1.00000
    train epoch: 19 - step: 537 (total: 540) - loss: 0.000692 acc 1.00000
    train epoch: 19 - step: 538 (total: 540) - loss: 0.001994 acc 1.00000
    train epoch: 19 - step: 539 (total: 540) - loss: 0.000524 acc 1.00000
    INFO 2021-04-10 19:56:40,966 launch.py:240] Local processes completed.
    end DuEE-Fin enum train
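
      The enum training log above evaluates on the dev set every 5 steps (`--valid_step 5`). It writes a new best model only when dev accuracy beats the current best. The sketch below replays that rule on the dev accuracies printed in the log. The function name `replay_best_checkpoint` is hypothetical, and this is not the code in `classifier.py`. It only illustrates the behavior the log shows. It assumes a strict greater-than comparison, which the log suggests: the tie at step 25 does not trigger a save.

```python
from typing import List, Tuple


def replay_best_checkpoint(dev_scores: List[Tuple[int, float]]) -> List[int]:
    """Return the steps at which a new best checkpoint would be saved.

    Hypothetical helper: a checkpoint is saved only when the dev metric is
    strictly greater than the best seen so far, mirroring the
    "current best" / "save best model" lines in the log.
    """
    best = 0.0  # the log starts from "current best 0.00000"
    saved_steps = []
    for step, score in dev_scores:
        if score > best:  # ties (step 25, step 45) do not trigger a save
            best = score
            saved_steps.append(step)
    return saved_steps


# Dev accuracies copied from the enum training log above (valid_step = 5).
enum_dev = [(5, 0.57353), (10, 0.58824), (15, 0.57353), (20, 0.57353),
            (25, 0.58824), (30, 0.66176), (35, 0.73529), (40, 0.76471),
            (45, 0.76471), (50, 0.80882), (55, 0.75000)]

print(replay_best_checkpoint(enum_dev))  # [5, 10, 30, 35, 40, 50]
```

      Late in training, the train accuracy reaches 1.00000 while dev accuracy (0.83824) stays below the recorded best (0.86765). The later checkpoints are therefore never saved, and the prediction step loads `./ckpt/DuEE-Fin/enum/best.pdparams`.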
    
    # Predict with the enum classification model
    !bash run_duee_fin.sh enum_predict
    
    check and create directory
    dir ./ckpt exist
    dir ./ckpt/DuEE-Fin exist
    dir ./submit exist
    
    start DuEE-Fin enum predict
    /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/setuptools/depends.py:2: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
      import imp
    [2021-04-10 19:56:50,581] [    INFO] - Already cached /home/aistudio/.paddlenlp/models/ernie-1.0/ernie_v1_chn_base.pdparams
    W0410 19:56:50.583134  9015 device_context.cc:362] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 10.1, Runtime API Version: 10.1
    W0410 19:56:50.588418  9015 device_context.cc:372] device: 0, cuDNN Version: 7.6.
    /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/dygraph/layers.py:1303: UserWarning: Skip loading for classifier.weight. classifier.weight is not found in the provided dict.
      warnings.warn(("Skip loading for {}. ".format(key) + str(err)))
    /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/dygraph/layers.py:1303: UserWarning: Skip loading for classifier.bias. classifier.bias is not found in the provided dict.
      warnings.warn(("Skip loading for {}. ".format(key) + str(err)))
    /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/dygraph/parallel.py:423: UserWarning: The program will return to single-card operation. Please check 1, whether you use spawn or fleetrun to start the program. 2, Whether it is a multi-card program. 3, Is the current environment multi-card.
      warnings.warn("The program will return to single-card operation. "
    [2021-04-10 19:56:57,202] [    INFO] - Found /home/aistudio/.paddlenlp/models/ernie-1.0/vocab.txt
    ============start predict==========
    Loaded parameters from ./ckpt/DuEE-Fin/enum/best.pdparams
    save data 140867 to ./ckpt/DuEE-Fin/enum/test_pred.json
    end DuEE-Fin enum predict
    

    1.5 Quick baseline reproduction, Step 5: post-process the predictions and submit the results

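      Convert the predictions into the format required by the competition and submit them to the evaluation site. The results are saved to submit/test_duee_fin.json.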

    !bash run_duee_fin.sh pred_2_submit
    
    check and create directory
    dir ./ckpt exist
    dir ./ckpt/DuEE-Fin exist
    dir ./submit exist
    
    start DuEE-Fin predict data merge to submit fotmat
    trigger predict 140867 load from ./ckpt/DuEE-Fin/trigger/test_pred.json
    role predict 140867 load from ./ckpt/DuEE-Fin/role/test_pred.json
    enum predict 140867 load from ./ckpt/DuEE-Fin/enum/test_pred.json
    schema 13 load from ./conf/DuEE-Fin/event_schema.json
    submit data 30000 save to ./submit/test_duee_fin.json
    end DuEE-Fin role predict data merge
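
      The merge step reads 140867 sentence-level predictions from each of the trigger, role, and enum models and writes 30000 document-level submission records. The sketch below shows one way such a merge can work. It assumes a hypothetical record layout with a document `id` and an `event_list` per sentence. The actual post-processing script and the submission schema may differ. The sketch groups sentence predictions by document, merges events of the same type within a document, and drops duplicate (role, argument) pairs.

```python
from typing import Dict, List


def merge_sentence_predictions(sent_preds: List[dict]) -> List[dict]:
    """Merge sentence-level predictions into one record per document.

    Hypothetical input record (an assumption, not the baseline's format):
        {"id": doc_id,
         "event_list": [{"event_type": str,
                         "arguments": [{"role": str, "argument": str}]}]}
    """
    # doc_id -> event_type -> list of unique argument dicts
    # (plain dicts keep insertion order in Python 3.7+)
    docs: Dict[str, Dict[str, List[dict]]] = {}
    for rec in sent_preds:
        events = docs.setdefault(rec["id"], {})  # keep docs with no events
        for ev in rec.get("event_list", []):
            args = events.setdefault(ev["event_type"], [])
            for arg in ev.get("arguments", []):
                pair = {"role": arg["role"], "argument": arg["argument"]}
                if pair not in args:  # drop duplicate (role, argument) pairs
                    args.append(pair)
    return [
        {"id": doc_id,
         "event_list": [{"event_type": t, "arguments": a}
                        for t, a in events.items()]}
        for doc_id, events in docs.items()
    ]


preds = [
    {"id": "d1", "event_list": [{"event_type": "A",
                                 "arguments": [{"role": "r1", "argument": "x"}]}]},
    {"id": "d1", "event_list": [{"event_type": "A",
                                 "arguments": [{"role": "r1", "argument": "x"},
                                               {"role": "r2", "argument": "y"}]}]},
    {"id": "d2", "event_list": []},
]
merged = merge_sentence_predictions(preds)
```

      Documents with no predicted events (like `d2`) are kept, so the output has one record per input document even when nothing is predicted. Merging all events of one type into a single event is a simplification, because a document can describe several events of the same type.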
    

    II. Sentence-level event extraction baseline

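      This is the baseline model for event extraction on the sentence-level, general-domain event extraction dataset (DuEE 1.0). It adopts an ERNIE-based sequence labeling scheme and is a pipeline model with two parts: a sequence-labeling trigger extraction model and a sequence-labeling argument extraction model. The trigger extraction model uses BIO tagging to identify trigger positions and their event types. The argument extraction model uses BIO tagging to identify the arguments of an event and their argument roles. The model and the data processing are the same as for document-level event extraction and are not repeated here. Unlike the document-level task, sentence-level general-domain event extraction has no enum role classification.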
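
      Both pipeline stages reduce to character-level BIO tagging. The label set has a B- and an I- tag for each event type (trigger model) or each role (argument model), plus one O tag. Predicted tag sequences are then decoded back into spans. The sketch below shows both steps. The function names are hypothetical, and the example event type is only illustrative, not taken from the baseline code.

```python
from typing import List, Tuple


def build_bio_labels(types: List[str]) -> List[str]:
    """One B-/I- pair per distinct type plus a single "O" tag."""
    labels = []
    for t in types:
        if f"B-{t}" not in labels:  # skip duplicate type names
            labels.extend([f"B-{t}", f"I-{t}"])
    labels.append("O")
    return labels


def decode_bio(tags: List[str]) -> List[Tuple[str, int, int]]:
    """Convert a BIO tag sequence into (type, start, end) spans, end exclusive.

    An I- tag that does not continue a span of the same type starts a new
    span (a common lenient convention).
    """
    spans = []
    cur_type, start = None, -1
    for i, tag in enumerate(tags):
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != cur_type):
            if cur_type is not None:  # close the span in progress
                spans.append((cur_type, start, i))
            cur_type, start = tag[2:], i
        elif tag == "O":
            if cur_type is not None:
                spans.append((cur_type, start, i))
            cur_type = None
        # otherwise: an I- tag continuing the current span
    if cur_type is not None:
        spans.append((cur_type, start, len(tags)))
    return spans


# Illustrative trigger tagging of a short sentence, one tag per character.
chars = list("公司宣布裁员")
tags = ["O", "O", "O", "O", "B-组织关系-裁员", "I-组织关系-裁员"]
for etype, s, e in decode_bio(tags):
    print(etype, "".join(chars[s:e]))  # 组织关系-裁员 裁员
```

      Under this layout, a label set has 2N + 1 entries. The data-preparation log below reports 131 trigger tags and 243 role tags, which would correspond to 65 event types and 121 role labels, assuming the script builds its dictionaries this way.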

    # Data preprocessing
    !bash run_duee_1.sh data_prepare
    
    # Train the trigger recognition model
    !bash run_duee_1.sh trigger_train
    
    check and create directory
    dir ./ckpt exist
    create dir * ./ckpt/DuEE1.0 *
    dir ./submit exist
    
    start DuEE1.0 data prepare
    
    ===============DUEE 1.0 DATASET==============
    
    =================start schema process==============
    input path ./conf/DuEE1.0/event_schema.json
    save trigger tag 131 at ./conf/DuEE1.0/trigger_tag.dict
    save trigger tag 243 at ./conf/DuEE1.0/role_tag.dict
    =================end schema process===============
    
    =================start schema process==============
    
    ----trigger------for dir ./data/DuEE1.0 to ./data/DuEE1.0/trigger
    train 11959 dev 1499
    
    ----role------for dir ./data/DuEE1.0 to ./data/DuEE1.0/role
    train 13916 dev 1791 test 1
    =================end schema process==============
    end DuEE1.0 data prepare
    check and create directory
    dir ./ckpt exist
    dir ./ckpt/DuEE1.0 exist
    dir ./submit exist
    
    start DuEE1.0 trigger train
    /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/setuptools/depends.py:2: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
      import imp
    -----------  Configuration Arguments -----------
    gpus: 0
    heter_worker_num: None
    heter_workers: 
    http_port: None
    ips: 127.0.0.1
    log_dir: log
    nproc_per_node: None
    server_num: None
    servers: 
    training_script: sequence_labeling.py
    training_script_args: ['--num_epoch', '20', '--learning_rate', '5e-5', '--tag_path', './conf/DuEE1.0/trigger_tag.dict', '--train_data', './data/DuEE1.0/trigger/train.tsv', '--dev_data', './data/DuEE1.0/trigger/dev.tsv', '--test_data', './data/DuEE1.0/trigger/test.tsv', '--predict_data', './data/DuEE1.0/test.json', '--do_train', 'True', '--do_predict', 'False', '--max_seq_len', '300', '--batch_size', '16', '--skip_step', '10', '--valid_step', '50', '--checkpoints', './ckpt/DuEE1.0/trigger', '--init_ckpt', './ckpt/DuEE1.0/trigger/best.pdparams', '--predict_save_path', './ckpt/DuEE1.0/trigger/test_pred.json', '--device', 'gpu']
    worker_num: None
    workers: 
    ------------------------------------------------
    WARNING 2021-04-10 20:12:04,884 launch.py:316] Not found distinct arguments and compiled with cuda. Default use collective mode
    launch train in GPU mode
    INFO 2021-04-10 20:12:04,886 launch_utils.py:471] Local start 1 processes. First process distributed environment info (Only For Debug): 
        +=======================================================================================+
        |                        Distributed Envs                      Value                    |
        +---------------------------------------------------------------------------------------+
        |                       PADDLE_TRAINER_ID                        0                      |
        |                 PADDLE_CURRENT_ENDPOINT                 127.0.0.1:44437               |
        |                     PADDLE_TRAINERS_NUM                        1                      |
        |                PADDLE_TRAINER_ENDPOINTS                 127.0.0.1:44437               |
        |                     FLAGS_selected_gpus                        0                      |
        +=======================================================================================+
    
    INFO 2021-04-10 20:12:04,886 launch_utils.py:475] details abouts PADDLE_TRAINER_ENDPOINTS can be found in log/endpoints.log, and detail running logs maybe found in log/workerlog.0
    /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/setuptools/depends.py:2: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
      import imp
    [2021-04-10 20:12:06,137] [    INFO] - Found /home/aistudio/.paddlenlp/models/ernie-1.0/vocab.txt
    [2021-04-10 20:12:06,151] [    INFO] - Already cached /home/aistudio/.paddlenlp/models/ernie-1.0/ernie_v1_chn_base.pdparams
    W0410 20:12:06.152766  9531 device_context.cc:362] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 10.1, Runtime API Version: 10.1
    W0410 20:12:06.157284  9531 device_context.cc:372] device: 0, cuDNN Version: 7.6.
    ============start train==========
    train epoch: 0 - step: 10 (total: 14960) - loss: 0.399632
    train epoch: 0 - step: 20 (total: 14960) - loss: 0.439437
    train epoch: 0 - step: 30 (total: 14960) - loss: 0.408838
    train epoch: 0 - step: 40 (total: 14960) - loss: 0.298826
    train epoch: 0 - step: 50 (total: 14960) - loss: 0.394555
    dev step: 50 - loss: 0.36327, precision: 0.00000, recall: 0.00000, f1: 0.00000 current best 0.00000
    train epoch: 0 - step: 60 (total: 14960) - loss: 0.485982
    train epoch: 0 - step: 70 (total: 14960) - loss: 0.250205
    train epoch: 0 - step: 80 (total: 14960) - loss: 0.382578
    train epoch: 0 - step: 90 (total: 14960) - loss: 0.202613
    train epoch: 0 - step: 100 (total: 14960) - loss: 0.309972
    dev step: 100 - loss: 0.35608, precision: 0.00000, recall: 0.00000, f1: 0.00000 current best 0.00000
    train epoch: 0 - step: 110 (total: 14960) - loss: 0.310728
    train epoch: 0 - step: 120 (total: 14960) - loss: 0.324738
    train epoch: 0 - step: 130 (total: 14960) - loss: 0.262632
    train epoch: 0 - step: 140 (total: 14960) - loss: 0.432903
    train epoch: 0 - step: 150 (total: 14960) - loss: 0.436539
    dev step: 150 - loss: 0.35624, precision: 0.00000, recall: 0.00000, f1: 0.00000 current best 0.00000
    train epoch: 0 - step: 160 (total: 14960) - loss: 0.485794
    train epoch: 0 - step: 170 (total: 14960) - loss: 0.315029
    train epoch: 0 - step: 180 (total: 14960) - loss: 0.284743
    train epoch: 0 - step: 190 (total: 14960) - loss: 0.259944
    train epoch: 0 - step: 200 (total: 14960) - loss: 0.311902
    dev step: 200 - loss: 0.33042, precision: 0.00000, recall: 0.00000, f1: 0.00000 current best 0.00000
    train epoch: 0 - step: 210 (total: 14960) - loss: 0.330571
    train epoch: 0 - step: 220 (total: 14960) - loss: 0.273139
    train epoch: 0 - step: 230 (total: 14960) - loss: 0.378063
    train epoch: 0 - step: 240 (total: 14960) - loss: 0.250299
    train epoch: 0 - step: 250 (total: 14960) - loss: 0.290701
    dev step: 250 - loss: 0.29563, precision: 0.00000, recall: 0.00000, f1: 0.00000 current best 0.00000
    train epoch: 0 - step: 260 (total: 14960) - loss: 0.202284
    train epoch: 0 - step: 270 (total: 14960) - loss: 0.180812
    train epoch: 0 - step: 280 (total: 14960) - loss: 0.238939
    train epoch: 0 - step: 290 (total: 14960) - loss: 0.256409
    train epoch: 0 - step: 300 (total: 14960) - loss: 0.192298
    dev step: 300 - loss: 0.19781, precision: 0.28393, recall: 0.17765, f1: 0.21856 current best 0.00000
    ==============================================save best model best performerence 0.218557
    train epoch: 0 - step: 310 (total: 14960) - loss: 0.236116
    train epoch: 0 - step: 320 (total: 14960) - loss: 0.185691
    train epoch: 0 - step: 330 (total: 14960) - loss: 0.150023
    train epoch: 0 - step: 340 (total: 14960) - loss: 0.160092
    train epoch: 0 - step: 350 (total: 14960) - loss: 0.251915
    dev step: 350 - loss: 0.16887, precision: 0.41444, recall: 0.23408, f1: 0.29918 current best 0.21856
    ==============================================save best model best performerence 0.299179
    train epoch: 0 - step: 360 (total: 14960) - loss: 0.226977
    train epoch: 0 - step: 370 (total: 14960) - loss: 0.157772
    train epoch: 0 - step: 380 (total: 14960) - loss: 0.204087
    train epoch: 0 - step: 390 (total: 14960) - loss: 0.193559
    train epoch: 0 - step: 400 (total: 14960) - loss: 0.076721
    dev step: 400 - loss: 0.14077, precision: 0.40486, recall: 0.33520, f1: 0.36675 current best 0.29918
    ==============================================save best model best performerence 0.366748
    train epoch: 0 - step: 410 (total: 14960) - loss: 0.132487
    train epoch: 0 - step: 420 (total: 14960) - loss: 0.234711
    train epoch: 0 - step: 430 (total: 14960) - loss: 0.146011
    train epoch: 0 - step: 440 (total: 14960) - loss: 0.182145
    train epoch: 0 - step: 450 (total: 14960) - loss: 0.124297
    dev step: 450 - loss: 0.11585, precision: 0.47749, recall: 0.47989, f1: 0.47868 current best 0.36675
    ==============================================save best model best performerence 0.478685
    train epoch: 0 - step: 460 (total: 14960) - loss: 0.128533
    train epoch: 0 - step: 470 (total: 14960) - loss: 0.232507
    train epoch: 0 - step: 480 (total: 14960) - loss: 0.138922
    train epoch: 0 - step: 490 (total: 14960) - loss: 0.063667
    train epoch: 0 - step: 500 (total: 14960) - loss: 0.067490
    dev step: 500 - loss: 0.09702, precision: 0.55856, recall: 0.51955, f1: 0.53835 current best 0.47868
    ==============================================save best model best performerence 0.538350
    train epoch: 0 - step: 510 (total: 14960) - loss: 0.076103
    train epoch: 0 - step: 520 (total: 14960) - loss: 0.057995
    train epoch: 0 - step: 530 (total: 14960) - loss: 0.066106
    train epoch: 0 - step: 540 (total: 14960) - loss: 0.122683
    train epoch: 0 - step: 550 (total: 14960) - loss: 0.106140
    dev step: 550 - loss: 0.08321, precision: 0.62119, recall: 0.59274, f1: 0.60663 current best 0.53835
    ==============================================save best model best performerence 0.606632
    train epoch: 0 - step: 560 (total: 14960) - loss: 0.039723
    train epoch: 0 - step: 570 (total: 14960) - loss: 0.093354
    train epoch: 0 - step: 580 (total: 14960) - loss: 0.125624
    train epoch: 0 - step: 590 (total: 14960) - loss: 0.056028
    train epoch: 0 - step: 600 (total: 14960) - loss: 0.050333
    dev step: 600 - loss: 0.07346, precision: 0.67859, recall: 0.63575, f1: 0.65648 current best 0.60663
    ==============================================save best model best performerence 0.656475
    train epoch: 0 - step: 610 (total: 14960) - loss: 0.106334
    train epoch: 0 - step: 620 (total: 14960) - loss: 0.106583
    train epoch: 0 - step: 630 (total: 14960) - loss: 0.060192
    train epoch: 0 - step: 640 (total: 14960) - loss: 0.032199
    train epoch: 0 - step: 650 (total: 14960) - loss: 0.104459
    dev step: 650 - loss: 0.06579, precision: 0.69209, recall: 0.68939, f1: 0.69074 current best 0.65648
    ==============================================save best model best performerence 0.690736
    train epoch: 0 - step: 660 (total: 14960) - loss: 0.068539
    train epoch: 0 - step: 670 (total: 14960) - loss: 0.059690
    train epoch: 0 - step: 680 (total: 14960) - loss: 0.064414
    train epoch: 0 - step: 690 (total: 14960) - loss: 0.085624
    train epoch: 0 - step: 700 (total: 14960) - loss: 0.064715
    dev step: 700 - loss: 0.06439, precision: 0.68861, recall: 0.69553, f1: 0.69205 current best 0.69074
    ==============================================save best model best performerence 0.692051
    train epoch: 0 - step: 710 (total: 14960) - loss: 0.071924
    train epoch: 0 - step: 720 (total: 14960) - loss: 0.064167
    train epoch: 0 - step: 730 (total: 14960) - loss: 0.053353
    train epoch: 0 - step: 740 (total: 14960) - loss: 0.084605
    train epoch: 1 - step: 750 (total: 14960) - loss: 0.071954
    dev step: 750 - loss: 0.05509, precision: 0.71468, recall: 0.73184, f1: 0.72316 current best 0.69205
    ==============================================save best model best performerence 0.723158
    train epoch: 1 - step: 760 (total: 14960) - loss: 0.063369
    train epoch: 1 - step: 770 (total: 14960) - loss: 0.010517
    train epoch: 1 - step: 780 (total: 14960) - loss: 0.053650
    train epoch: 1 - step: 790 (total: 14960) - loss: 0.042259
    train epoch: 1 - step: 800 (total: 14960) - loss: 0.032458
    dev step: 800 - loss: 0.05442, precision: 0.70917, recall: 0.77374, f1: 0.74005 current best 0.72316
    ==============================================save best model best performerence 0.740048
    train epoch: 1 - step: 810 (total: 14960) - loss: 0.056759
    train epoch: 1 - step: 820 (total: 14960) - loss: 0.027823
    train epoch: 1 - step: 830 (total: 14960) - loss: 0.047783
    train epoch: 1 - step: 840 (total: 14960) - loss: 0.038662
    train epoch: 1 - step: 850 (total: 14960) - loss: 0.085002
    dev step: 850 - loss: 0.05003, precision: 0.72125, recall: 0.80223, f1: 0.75959 current best 0.74005
    ==============================================save best model best performerence 0.759587
    train epoch: 1 - step: 860 (total: 14960) - loss: 0.022502
    train epoch: 1 - step: 870 (total: 14960) - loss: 0.039028
    train epoch: 1 - step: 880 (total: 14960) - loss: 0.042963
    train epoch: 1 - step: 890 (total: 14960) - loss: 0.045788
    train epoch: 1 - step: 900 (total: 14960) - loss: 0.026486
    dev step: 900 - loss: 0.04721, precision: 0.74372, recall: 0.84302, f1: 0.79026 current best 0.75959
    ==============================================save best model best performerence 0.790259
    train epoch: 1 - step: 910 (total: 14960) - loss: 0.032655
    train epoch: 1 - step: 920 (total: 14960) - loss: 0.021889
    train epoch: 1 - step: 930 (total: 14960) - loss: 0.033798
    train epoch: 1 - step: 940 (total: 14960) - loss: 0.060657
    train epoch: 1 - step: 950 (total: 14960) - loss: 0.019720
    dev step: 950 - loss: 0.04749, precision: 0.73062, recall: 0.84246, f1: 0.78256 current best 0.79026
    train epoch: 1 - step: 960 (total: 14960) - loss: 0.037086
    train epoch: 1 - step: 970 (total: 14960) - loss: 0.027883
    train epoch: 1 - step: 980 (total: 14960) - loss: 0.044426
    train epoch: 1 - step: 990 (total: 14960) - loss: 0.021761
    train epoch: 1 - step: 1000 (total: 14960) - loss: 0.044189
    dev step: 1000 - loss: 0.04534, precision: 0.79933, recall: 0.80335, f1: 0.80134 current best 0.79026
    ==============================================save best model best performerence 0.801337
    train epoch: 1 - step: 1010 (total: 14960) - loss: 0.050067
    train epoch: 1 - step: 1020 (total: 14960) - loss: 0.033646
    train epoch: 1 - step: 1030 (total: 14960) - loss: 0.030856
    train epoch: 1 - step: 1040 (total: 14960) - loss: 0.045213
    train epoch: 1 - step: 1050 (total: 14960) - loss: 0.068307
    dev step: 1050 - loss: 0.04333, precision: 0.79307, recall: 0.81788, f1: 0.80528 current best 0.80134
    ==============================================save best model best performerence 0.805281
    train epoch: 1 - step: 1060 (total: 14960) - loss: 0.031629
    train epoch: 1 - step: 1070 (total: 14960) - loss: 0.034574
    train epoch: 1 - step: 1080 (total: 14960) - loss: 0.009664
    train epoch: 1 - step: 1090 (total: 14960) - loss: 0.022344
    train epoch: 1 - step: 1100 (total: 14960) - loss: 0.030906
    dev step: 1100 - loss: 0.04319, precision: 0.77368, recall: 0.84413, f1: 0.80737 current best 0.80528
    ==============================================save best model best performerence 0.807374
    train epoch: 1 - step: 1110 (total: 14960) - loss: 0.021814
    train epoch: 1 - step: 1120 (total: 14960) - loss: 0.015393
    train epoch: 1 - step: 1130 (total: 14960) - loss: 0.018273
    train epoch: 1 - step: 1140 (total: 14960) - loss: 0.012760
    train epoch: 1 - step: 1150 (total: 14960) - loss: 0.047260
    dev step: 1150 - loss: 0.04239, precision: 0.79338, recall: 0.83017, f1: 0.81136 current best 0.80737
    ==============================================save best model best performerence 0.811357
    train epoch: 1 - step: 1160 (total: 14960) - loss: 0.055832
    train epoch: 1 - step: 1170 (total: 14960) - loss: 0.023067
    train epoch: 1 - step: 1180 (total: 14960) - loss: 0.029046
    train epoch: 1 - step: 1190 (total: 14960) - loss: 0.022165
    train epoch: 1 - step: 1200 (total: 14960) - loss: 0.021577
    dev step: 1200 - loss: 0.04173, precision: 0.79144, recall: 0.82682, f1: 0.80874 current best 0.81136
    train epoch: 1 - step: 1210 (total: 14960) - loss: 0.040631
    train epoch: 1 - step: 1220 (total: 14960) - loss: 0.028234
    train epoch: 1 - step: 1230 (total: 14960) - loss: 0.033360
    train epoch: 1 - step: 1240 (total: 14960) - loss: 0.023661
    train epoch: 1 - step: 1250 (total: 14960) - loss: 0.051824
    dev step: 1250 - loss: 0.04070, precision: 0.77673, recall: 0.83184, f1: 0.80335 current best 0.81136
    train epoch: 1 - step: 1260 (total: 14960) - loss: 0.027152
    train epoch: 1 - step: 1270 (total: 14960) - loss: 0.027165
    train epoch: 1 - step: 1280 (total: 14960) - loss: 0.035664
    train epoch: 1 - step: 1290 (total: 14960) - loss: 0.038181
    train epoch: 1 - step: 1300 (total: 14960) - loss: 0.034335
    dev step: 1300 - loss: 0.03963, precision: 0.77882, recall: 0.83799, f1: 0.80732 current best 0.81136
    train epoch: 1 - step: 1310 (total: 14960) - loss: 0.045533
    train epoch: 1 - step: 1320 (total: 14960) - loss: 0.076441
    train epoch: 1 - step: 1330 (total: 14960) - loss: 0.035492
    train epoch: 1 - step: 1340 (total: 14960) - loss: 0.020915
    train epoch: 1 - step: 1350 (total: 14960) - loss: 0.009881
    dev step: 1350 - loss: 0.04082, precision: 0.78723, recall: 0.84749, f1: 0.81625 current best 0.81136
    ==============================================save best model best performerence 0.816250
    train epoch: 1 - step: 1360 (total: 14960) - loss: 0.037463
    train epoch: 1 - step: 1370 (total: 14960) - loss: 0.044000
    train epoch: 1 - step: 1380 (total: 14960) - loss: 0.033455
    train epoch: 1 - step: 1390 (total: 14960) - loss: 0.011349
    train epoch: 1 - step: 1400 (total: 14960) - loss: 0.027764
    dev step: 1400 - loss: 0.04117, precision: 0.79249, recall: 0.82570, f1: 0.80876 current best 0.81625
    train epoch: 1 - step: 1410 (total: 14960) - loss: 0.032213
    train epoch: 1 - step: 1420 (total: 14960) - loss: 0.024112
    train epoch: 1 - step: 1430 (total: 14960) - loss: 0.025826
    train epoch: 1 - step: 1440 (total: 14960) - loss: 0.039797
    train epoch: 1 - step: 1450 (total: 14960) - loss: 0.073417
    dev step: 1450 - loss: 0.03987, precision: 0.78395, recall: 0.85140, f1: 0.81628 current best 0.81625
    ==============================================save best model best performerence 0.816283
    train epoch: 1 - step: 1460 (total: 14960) - loss: 0.021326
    train epoch: 1 - step: 1470 (total: 14960) - loss: 0.018628
    train epoch: 1 - step: 1480 (total: 14960) - loss: 0.029017
    train epoch: 1 - step: 1490 (total: 14960) - loss: 0.048521
    ……
    train epoch: 19 - step: 14220 (total: 14960) - loss: 0.001144
    train epoch: 19 - step: 14230 (total: 14960) - loss: 0.000301
    train epoch: 19 - step: 14240 (total: 14960) - loss: 0.001033
    train epoch: 19 - step: 14250 (total: 14960) - loss: 0.003649
    dev step: 14250 - loss: 0.07217, precision: 0.83424, recall: 0.85754, f1: 0.84573 current best 0.85026
    train epoch: 19 - step: 14260 (total: 14960) - loss: 0.000222
    train epoch: 19 - step: 14270 (total: 14960) - loss: 0.001345
    train epoch: 19 - step: 14280 (total: 14960) - loss: 0.000353
    train epoch: 19 - step: 14290 (total: 14960) - loss: 0.004071
    train epoch: 19 - step: 14300 (total: 14960) - loss: 0.004355
    dev step: 14300 - loss: 0.07171, precision: 0.83568, recall: 0.86089, f1: 0.84810 current best 0.85026
    train epoch: 19 - step: 14310 (total: 14960) - loss: 0.001791
    train epoch: 19 - step: 14320 (total: 14960) - loss: 0.001619
    train epoch: 19 - step: 14330 (total: 14960) - loss: 0.003730
    train epoch: 19 - step: 14340 (total: 14960) - loss: 0.000157
    train epoch: 19 - step: 14350 (total: 14960) - loss: 0.000462
    dev step: 14350 - loss: 0.07241, precision: 0.83370, recall: 0.85698, f1: 0.84518 current best 0.85026
    train epoch: 19 - step: 14360 (total: 14960) - loss: 0.000490
    train epoch: 19 - step: 14370 (total: 14960) - loss: 0.000182
    train epoch: 19 - step: 14380 (total: 14960) - loss: 0.002310
    train epoch: 19 - step: 14390 (total: 14960) - loss: 0.000973
    train epoch: 19 - step: 14400 (total: 14960) - loss: 0.000543
    dev step: 14400 - loss: 0.07378, precision: 0.83623, recall: 0.86145, f1: 0.84865 current best 0.85026
    train epoch: 19 - step: 14410 (total: 14960) - loss: 0.000710
    train epoch: 19 - step: 14420 (total: 14960) - loss: 0.000122
    train epoch: 19 - step: 14430 (total: 14960) - loss: 0.003291
    train epoch: 19 - step: 14440 (total: 14960) - loss: 0.001306
    train epoch: 19 - step: 14450 (total: 14960) - loss: 0.002820
    dev step: 14450 - loss: 0.07792, precision: 0.83982, recall: 0.85531, f1: 0.84750 current best 0.85026
    train epoch: 19 - step: 14460 (total: 14960) - loss: 0.000153
    train epoch: 19 - step: 14470 (total: 14960) - loss: 0.009174
    train epoch: 19 - step: 14480 (total: 14960) - loss: 0.002065
    train epoch: 19 - step: 14490 (total: 14960) - loss: 0.001641
    train epoch: 19 - step: 14500 (total: 14960) - loss: 0.013356
    dev step: 14500 - loss: 0.07694, precision: 0.81122, recall: 0.86425, f1: 0.83689 current best 0.85026
    train epoch: 19 - step: 14510 (total: 14960) - loss: 0.000902
    train epoch: 19 - step: 14520 (total: 14960) - loss: 0.009084
    train epoch: 19 - step: 14530 (total: 14960) - loss: 0.000777
    train epoch: 19 - step: 14540 (total: 14960) - loss: 0.000141
    train epoch: 19 - step: 14550 (total: 14960) - loss: 0.001748
    dev step: 14550 - loss: 0.07448, precision: 0.81751, recall: 0.86592, f1: 0.84102 current best 0.85026
    train epoch: 19 - step: 14560 (total: 14960) - loss: 0.000747
    train epoch: 19 - step: 14570 (total: 14960) - loss: 0.012806
    train epoch: 19 - step: 14580 (total: 14960) - loss: 0.004823
    train epoch: 19 - step: 14590 (total: 14960) - loss: 0.001402
    train epoch: 19 - step: 14600 (total: 14960) - loss: 0.012385
    dev step: 14600 - loss: 0.07167, precision: 0.82297, recall: 0.87263, f1: 0.84707 current best 0.85026
    train epoch: 19 - step: 14610 (total: 14960) - loss: 0.003738
    train epoch: 19 - step: 14620 (total: 14960) - loss: 0.000189
    train epoch: 19 - step: 14630 (total: 14960) - loss: 0.004993
    train epoch: 19 - step: 14640 (total: 14960) - loss: 0.000982
    train epoch: 19 - step: 14650 (total: 14960) - loss: 0.000245
    dev step: 14650 - loss: 0.07632, precision: 0.83324, recall: 0.85978, f1: 0.84630 current best 0.85026
    train epoch: 19 - step: 14660 (total: 14960) - loss: 0.002108
    train epoch: 19 - step: 14670 (total: 14960) - loss: 0.002859
    train epoch: 19 - step: 14680 (total: 14960) - loss: 0.000802
    train epoch: 19 - step: 14690 (total: 14960) - loss: 0.001411
    train epoch: 19 - step: 14700 (total: 14960) - loss: 0.000175
    dev step: 14700 - loss: 0.07886, precision: 0.81485, recall: 0.87039, f1: 0.84171 current best 0.85026
    train epoch: 19 - step: 14710 (total: 14960) - loss: 0.000079
    train epoch: 19 - step: 14720 (total: 14960) - loss: 0.000239
    train epoch: 19 - step: 14730 (total: 14960) - loss: 0.002459
    train epoch: 19 - step: 14740 (total: 14960) - loss: 0.000840
    train epoch: 19 - step: 14750 (total: 14960) - loss: 0.000168
    dev step: 14750 - loss: 0.07765, precision: 0.82555, recall: 0.85922, f1: 0.84205 current best 0.85026
    train epoch: 19 - step: 14760 (total: 14960) - loss: 0.000097
    train epoch: 19 - step: 14770 (total: 14960) - loss: 0.000967
    train epoch: 19 - step: 14780 (total: 14960) - loss: 0.000198
    train epoch: 19 - step: 14790 (total: 14960) - loss: 0.000484
    train epoch: 19 - step: 14800 (total: 14960) - loss: 0.002144
    dev step: 14800 - loss: 0.07190, precision: 0.82507, recall: 0.86425, f1: 0.84420 current best 0.85026
    train epoch: 19 - step: 14810 (total: 14960) - loss: 0.000452
    train epoch: 19 - step: 14820 (total: 14960) - loss: 0.000663
    train epoch: 19 - step: 14830 (total: 14960) - loss: 0.022780
    train epoch: 19 - step: 14840 (total: 14960) - loss: 0.007530
    train epoch: 19 - step: 14850 (total: 14960) - loss: 0.000360
    dev step: 14850 - loss: 0.07089, precision: 0.83607, recall: 0.85475, f1: 0.84530 current best 0.85026
    train epoch: 19 - step: 14860 (total: 14960) - loss: 0.002914
    train epoch: 19 - step: 14870 (total: 14960) - loss: 0.000343
    train epoch: 19 - step: 14880 (total: 14960) - loss: 0.001293
    train epoch: 19 - step: 14890 (total: 14960) - loss: 0.000621
    train epoch: 19 - step: 14900 (total: 14960) - loss: 0.001378
    dev step: 14900 - loss: 0.06631, precision: 0.82437, recall: 0.87318, f1: 0.84807 current best 0.85026
    train epoch: 19 - step: 14910 (total: 14960) - loss: 0.000467
    train epoch: 19 - step: 14920 (total: 14960) - loss: 0.001079
    train epoch: 19 - step: 14930 (total: 14960) - loss: 0.002540
    train epoch: 19 - step: 14940 (total: 14960) - loss: 0.006217
    train epoch: 19 - step: 14950 (total: 14960) - loss: 0.000213
    dev step: 14950 - loss: 0.07010, precision: 0.83477, recall: 0.86369, f1: 0.84898 current best 0.85026
    INFO 2021-04-10 21:18:14,794 launch.py:240] Local processes completed.
    end DuEE1.0 trigger train
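Every `dev step` line in the trigger-training log uses the same format, so you can recover the dev F1 curve from a saved log. The helpers below are a minimal sketch. `parse_dev_lines`, `f1_score` and the regex are hypothetical names introduced here, not part of PaddleNLP or the DuEE scripts. The code also checks that each logged `f1` equals the harmonic mean 2PR/(P+R) of the logged precision and recall, up to rounding.

```python
import re

# Matches dev-evaluation lines such as:
# "dev step: 1450 - loss: 0.03987, precision: 0.78395, recall: 0.85140, f1: 0.81628 current best 0.81625"
DEV_LINE = re.compile(
    r"dev step: (?P<step>\d+) - loss: (?P<loss>[\d.]+), "
    r"precision: (?P<precision>[\d.]+), recall: (?P<recall>[\d.]+), "
    r"f1: (?P<f1>[\d.]+) current best (?P<best>[\d.]+)"
)


def parse_dev_lines(lines):
    """Return one dict per dev line; train-step lines and warnings are skipped."""
    records = []
    for line in lines:
        m = DEV_LINE.search(line)
        if m:
            rec = {k: float(v) for k, v in m.groupdict().items()}
            rec["step"] = int(m.group("step"))  # keep the step as an integer
            records.append(rec)
    return records


def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


sample = [
    "train epoch: 1 - step: 1400 (total: 14960) - loss: 0.027764",
    "dev step: 1400 - loss: 0.04117, precision: 0.79249, recall: 0.82570, f1: 0.80876 current best 0.81625",
    "dev step: 1450 - loss: 0.03987, precision: 0.78395, recall: 0.85140, f1: 0.81628 current best 0.81625",
]
recs = parse_dev_lines(sample)
best = max(recs, key=lambda r: r["f1"])
print(best["step"], best["f1"])  # 1450 0.81628

# The logged f1 agrees with 2PR/(P+R) to within rounding of the printed values
for r in recs:
    assert abs(f1_score(r["precision"], r["recall"]) - r["f1"]) < 1e-4
```

To get the full curve, run `parse_dev_lines` over the lines of a complete log, for example `log/workerlog.0`, which the launcher names as the location of the detailed logs.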
    
    # Trigger recognition: prediction
    !bash run_duee_1.sh trigger_predict
    
    check and create directory
    dir ./ckpt exist
    dir ./ckpt/DuEE1.0 exist
    dir ./submit exist
    
    start DuEE1.0 trigger predict
    /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/setuptools/depends.py:2: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
      import imp
    [2021-04-10 21:19:00,925] [    INFO] - Found /home/aistudio/.paddlenlp/models/ernie-1.0/vocab.txt
    [2021-04-10 21:19:00,939] [    INFO] - Already cached /home/aistudio/.paddlenlp/models/ernie-1.0/ernie_v1_chn_base.pdparams
    W0410 21:19:00.940081 12545 device_context.cc:362] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 10.1, Runtime API Version: 10.1
    W0410 21:19:00.944607 12545 device_context.cc:372] device: 0, cuDNN Version: 7.6.
    ============start predict==========
    Loaded parameters from ./ckpt/DuEE1.0/trigger/best.pdparams
    save data 499 to ./ckpt/DuEE1.0/trigger/test_pred.json
    end DuEE1.0 trigger predict
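The prediction step loads `./ckpt/DuEE1.0/trigger/best.pdparams`. The training logs show when that "best model" gets written: a `save best model` line appears only after a dev F1 that is strictly higher than the current best. For example, at step 1450 the F1 of 0.81628 beats 0.81625 and triggers a save, while the 0.00000 at the first role-training evaluation does not. The function below is a hypothetical sketch of that selection rule, inferred from the logs rather than taken from `sequence_labeling.py`.

```python
def checkpoint_saves(dev_f1s, initial_best=0.0):
    """Return (indices of evaluations that trigger a save, final best F1).

    Assumed rule, inferred from the log: a checkpoint is saved only when the
    new dev F1 is strictly greater than the best F1 seen so far.
    """
    best = initial_best
    saved = []
    for i, f1 in enumerate(dev_f1s):
        if f1 > best:  # strict improvement; a tie does not overwrite the checkpoint
            best = f1
            saved.append(i)
    return saved, best


# Role-training dev F1 at steps 50, 100, 150, 200, 250 (copied from the log)
print(checkpoint_saves([0.00000, 0.00051, 0.10121, 0.21341, 0.19722]))
# ([1, 2, 3], 0.21341)
```

With this rule, later epochs can post lower dev scores without harm. The role-training log ends with F1 around 0.55 to 0.58 while `current best` stays at 0.58724, and prediction still uses the best checkpoint rather than the last one.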
    
    # Argument (role) recognition: model training
    !bash run_duee_1.sh role_train
    
    check and create directory
    dir ./ckpt exist
    dir ./ckpt/DuEE1.0 exist
    dir ./submit exist
    
    start DuEE1.0 role train
    /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/setuptools/depends.py:2: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
      import imp
    -----------  Configuration Arguments -----------
    gpus: 0
    heter_worker_num: None
    heter_workers: 
    http_port: None
    ips: 127.0.0.1
    log_dir: log
    nproc_per_node: None
    server_num: None
    servers: 
    training_script: sequence_labeling.py
    training_script_args: ['--num_epoch', '20', '--learning_rate', '5e-5', '--tag_path', './conf/DuEE1.0/role_tag.dict', '--train_data', './data/DuEE1.0/role/train.tsv', '--dev_data', './data/DuEE1.0/role/dev.tsv', '--test_data', './data/DuEE1.0/role/test.tsv', '--predict_data', './data/DuEE1.0/test.json', '--do_train', 'True', '--do_predict', 'False', '--max_seq_len', '300', '--batch_size', '16', '--skip_step', '10', '--valid_step', '50', '--checkpoints', './ckpt/DuEE1.0/role', '--init_ckpt', './ckpt/DuEE1.0/role/best.pdparams', '--predict_save_path', './ckpt/DuEE1.0/role/test_pred.json', '--device', 'gpu']
    worker_num: None
    workers: 
    ------------------------------------------------
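`training_script_args` is a flat `['--key', 'value', ...]` list forwarded to `sequence_labeling.py`. Several of its values can be checked against the log that follows:

- `--skip_step 10` matches a train line every 10 steps.
- `--valid_step 50` matches a dev evaluation every 50 steps.
- The reported total of 17400 steps over `--num_epoch 20` gives 870 steps per epoch. This is consistent with epoch 1 first appearing at step 870.

The sketch below pairs up the list and does that arithmetic. `args_to_dict` is a hypothetical helper. The example count assumes every batch of `--batch_size 16` is full, so treat it as an estimate.

```python
def args_to_dict(argv):
    """Pair a flat ['--key', 'value', ...] list into {'key': 'value'} (values stay strings)."""
    if len(argv) % 2:
        raise ValueError("expected an even number of items")
    return {k.lstrip("-"): v for k, v in zip(argv[0::2], argv[1::2])}


# A subset of the role-training arguments printed above
role_args = args_to_dict([
    "--num_epoch", "20", "--learning_rate", "5e-5",
    "--max_seq_len", "300", "--batch_size", "16",
    "--skip_step", "10", "--valid_step", "50",
])

total_steps = 17400  # "(total: 17400)" in the role-training log
steps_per_epoch = total_steps // int(role_args["num_epoch"])
print(steps_per_epoch)  # 870

# Rough training-set size, assuming all batches are full
print(steps_per_epoch * int(role_args["batch_size"]))  # 13920

# The trigger log runs epochs 0..19 with "(total: 14960)", i.e. 748 steps per epoch
trigger_steps_per_epoch = 14960 // 20
print(trigger_steps_per_epoch)  # 748
```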
    WARNING 2021-04-10 21:19:31,729 launch.py:316] Not found distinct arguments and compiled with cuda. Default use collective mode
    launch train in GPU mode
    INFO 2021-04-10 21:19:31,731 launch_utils.py:471] Local start 1 processes. First process distributed environment info (Only For Debug): 
        +=======================================================================================+
        |                        Distributed Envs                      Value                    |
        +---------------------------------------------------------------------------------------+
        |                       PADDLE_TRAINER_ID                        0                      |
        |                 PADDLE_CURRENT_ENDPOINT                 127.0.0.1:39979               |
        |                     PADDLE_TRAINERS_NUM                        1                      |
        |                PADDLE_TRAINER_ENDPOINTS                 127.0.0.1:39979               |
        |                     FLAGS_selected_gpus                        0                      |
        +=======================================================================================+
    
    INFO 2021-04-10 21:19:31,731 launch_utils.py:475] details abouts PADDLE_TRAINER_ENDPOINTS can be found in log/endpoints.log, and detail running logs maybe found in log/workerlog.0
    /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/setuptools/depends.py:2: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
      import imp
    [2021-04-10 21:19:33,027] [    INFO] - Found /home/aistudio/.paddlenlp/models/ernie-1.0/vocab.txt
    [2021-04-10 21:19:33,041] [    INFO] - Already cached /home/aistudio/.paddlenlp/models/ernie-1.0/ernie_v1_chn_base.pdparams
    W0410 21:19:33.042527 12581 device_context.cc:362] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 10.1, Runtime API Version: 10.1
    W0410 21:19:33.047051 12581 device_context.cc:372] device: 0, cuDNN Version: 7.6.
    ============start train==========
    train epoch: 0 - step: 10 (total: 17400) - loss: 1.631316
    train epoch: 0 - step: 20 (total: 17400) - loss: 1.261623
    train epoch: 0 - step: 30 (total: 17400) - loss: 1.499143
    train epoch: 0 - step: 40 (total: 17400) - loss: 1.374749
    train epoch: 0 - step: 50 (total: 17400) - loss: 2.372678
    dev step: 50 - loss: 1.41820, precision: 0.00000, recall: 0.00000, f1: 0.00000 current best 0.00000
    train epoch: 0 - step: 60 (total: 17400) - loss: 1.314709
    train epoch: 0 - step: 70 (total: 17400) - loss: 1.248975
    train epoch: 0 - step: 80 (total: 17400) - loss: 1.268549
    train epoch: 0 - step: 90 (total: 17400) - loss: 1.528821
    train epoch: 0 - step: 100 (total: 17400) - loss: 1.331270
    dev step: 100 - loss: 1.19630, precision: 0.00465, recall: 0.00027, f1: 0.00051 current best 0.00000
    ==============================================save best model best performerence 0.000513
    train epoch: 0 - step: 110 (total: 17400) - loss: 1.180573
    train epoch: 0 - step: 120 (total: 17400) - loss: 1.466052
    train epoch: 0 - step: 130 (total: 17400) - loss: 1.448678
    train epoch: 0 - step: 140 (total: 17400) - loss: 1.050897
    train epoch: 0 - step: 150 (total: 17400) - loss: 1.163228
    dev step: 150 - loss: 1.00658, precision: 0.15019, recall: 0.07632, f1: 0.10121 current best 0.00051
    ==============================================save best model best performerence 0.101207
    train epoch: 0 - step: 160 (total: 17400) - loss: 1.152660
    train epoch: 0 - step: 170 (total: 17400) - loss: 0.800648
    train epoch: 0 - step: 180 (total: 17400) - loss: 0.863754
    train epoch: 0 - step: 190 (total: 17400) - loss: 1.399399
    train epoch: 0 - step: 200 (total: 17400) - loss: 0.933540
    dev step: 200 - loss: 0.87702, precision: 0.29212, recall: 0.16812, f1: 0.21341 current best 0.10121
    ==============================================save best model best performerence 0.213411
    train epoch: 0 - step: 210 (total: 17400) - loss: 0.622572
    train epoch: 0 - step: 220 (total: 17400) - loss: 0.693645
    train epoch: 0 - step: 230 (total: 17400) - loss: 0.456951
    train epoch: 0 - step: 240 (total: 17400) - loss: 0.852158
    train epoch: 0 - step: 250 (total: 17400) - loss: 0.692744
    dev step: 250 - loss: 0.74932, precision: 0.19567, recall: 0.19880, f1: 0.19722 current best 0.21341
    train epoch: 0 - step: 260 (total: 17400) - loss: 0.705850
    train epoch: 0 - step: 270 (total: 17400) - loss: 0.601921
    train epoch: 0 - step: 280 (total: 17400) - loss: 0.790073
    train epoch: 0 - step: 290 (total: 17400) - loss: 0.576146
    train epoch: 0 - step: 300 (total: 17400) - loss: 0.896055
    dev step: 300 - loss: 0.68762, precision: 0.26187, recall: 0.26806, f1: 0.26493 current best 0.21341
    ==============================================save best model best performerence 0.264931
    train epoch: 0 - step: 310 (total: 17400) - loss: 0.550488
    train epoch: 0 - step: 320 (total: 17400) - loss: 0.755333
    train epoch: 0 - step: 330 (total: 17400) - loss: 0.608667
    train epoch: 0 - step: 340 (total: 17400) - loss: 0.735348
    train epoch: 0 - step: 350 (total: 17400) - loss: 0.608221
    dev step: 350 - loss: 0.58279, precision: 0.30612, recall: 0.30554, f1: 0.30583 current best 0.26493
    ==============================================save best model best performerence 0.305831
    train epoch: 0 - step: 360 (total: 17400) - loss: 0.547571
    train epoch: 0 - step: 370 (total: 17400) - loss: 0.651604
    train epoch: 0 - step: 380 (total: 17400) - loss: 0.356159
    train epoch: 0 - step: 390 (total: 17400) - loss: 0.471009
    train epoch: 0 - step: 400 (total: 17400) - loss: 0.464584
    dev step: 400 - loss: 0.54754, precision: 0.28086, recall: 0.28055, f1: 0.28071 current best 0.30583
    train epoch: 0 - step: 410 (total: 17400) - loss: 0.626027
    train epoch: 0 - step: 420 (total: 17400) - loss: 0.362687
    train epoch: 0 - step: 430 (total: 17400) - loss: 0.477045
    train epoch: 0 - step: 440 (total: 17400) - loss: 0.504392
    train epoch: 0 - step: 450 (total: 17400) - loss: 0.452660
    dev step: 450 - loss: 0.51604, precision: 0.31064, recall: 0.29495, f1: 0.30259 current best 0.30583
    train epoch: 0 - step: 460 (total: 17400) - loss: 0.315736
    train epoch: 0 - step: 470 (total: 17400) - loss: 0.695824
    train epoch: 0 - step: 480 (total: 17400) - loss: 0.668844
    train epoch: 0 - step: 490 (total: 17400) - loss: 0.485630
    train epoch: 0 - step: 500 (total: 17400) - loss: 0.553830
    dev step: 500 - loss: 0.48320, precision: 0.35442, recall: 0.33623, f1: 0.34509 current best 0.30583
    ==============================================save best model best performerence 0.345087
    train epoch: 0 - step: 510 (total: 17400) - loss: 0.630377
    train epoch: 0 - step: 520 (total: 17400) - loss: 0.870098
    train epoch: 0 - step: 530 (total: 17400) - loss: 0.525724
    train epoch: 0 - step: 540 (total: 17400) - loss: 0.456801
    train epoch: 0 - step: 550 (total: 17400) - loss: 0.336868
    dev step: 550 - loss: 0.46332, precision: 0.34337, recall: 0.40576, f1: 0.37197 current best 0.34509
    ==============================================save best model best performerence 0.371966
    train epoch: 0 - step: 560 (total: 17400) - loss: 0.549533
    train epoch: 0 - step: 570 (total: 17400) - loss: 0.589998
    train epoch: 0 - step: 580 (total: 17400) - loss: 0.466343
    train epoch: 0 - step: 590 (total: 17400) - loss: 0.757425
    train epoch: 0 - step: 600 (total: 17400) - loss: 0.476054
    dev step: 600 - loss: 0.46148, precision: 0.32498, recall: 0.40494, f1: 0.36058 current best 0.37197
    train epoch: 0 - step: 610 (total: 17400) - loss: 0.867860
    train epoch: 0 - step: 620 (total: 17400) - loss: 0.423540
    train epoch: 0 - step: 630 (total: 17400) - loss: 0.584098
    train epoch: 0 - step: 640 (total: 17400) - loss: 0.333824
    train epoch: 0 - step: 650 (total: 17400) - loss: 0.506903
    dev step: 650 - loss: 0.41693, precision: 0.35424, recall: 0.40657, f1: 0.37860 current best 0.37197
    ==============================================save best model best performerence 0.378604
    train epoch: 0 - step: 660 (total: 17400) - loss: 0.349384
    train epoch: 0 - step: 670 (total: 17400) - loss: 0.551703
    train epoch: 0 - step: 680 (total: 17400) - loss: 0.407071
    train epoch: 0 - step: 690 (total: 17400) - loss: 0.340015
    train epoch: 0 - step: 700 (total: 17400) - loss: 0.514608
    dev step: 700 - loss: 0.39935, precision: 0.36408, recall: 0.44704, f1: 0.40132 current best 0.37860
    ==============================================save best model best performerence 0.401317
    train epoch: 0 - step: 710 (total: 17400) - loss: 0.391622
    train epoch: 0 - step: 720 (total: 17400) - loss: 0.411886
    train epoch: 0 - step: 730 (total: 17400) - loss: 0.396601
    train epoch: 0 - step: 740 (total: 17400) - loss: 0.408536
    train epoch: 0 - step: 750 (total: 17400) - loss: 0.490862
    dev step: 750 - loss: 0.38335, precision: 0.38105, recall: 0.40196, f1: 0.39122 current best 0.40132
    train epoch: 0 - step: 760 (total: 17400) - loss: 0.589839
    train epoch: 0 - step: 770 (total: 17400) - loss: 0.495729
    train epoch: 0 - step: 780 (total: 17400) - loss: 0.292985
    train epoch: 0 - step: 790 (total: 17400) - loss: 0.288670
    train epoch: 0 - step: 800 (total: 17400) - loss: 0.591148
    dev step: 800 - loss: 0.37288, precision: 0.37370, recall: 0.43998, f1: 0.40414 current best 0.40132
    ==============================================save best model best performerence 0.404141
    train epoch: 0 - step: 810 (total: 17400) - loss: 0.323106
    train epoch: 0 - step: 820 (total: 17400) - loss: 0.374065
    train epoch: 0 - step: 830 (total: 17400) - loss: 0.303335
    train epoch: 0 - step: 840 (total: 17400) - loss: 0.362465
    train epoch: 0 - step: 850 (total: 17400) - loss: 0.270363
    dev step: 850 - loss: 0.34926, precision: 0.40548, recall: 0.39788, f1: 0.40164 current best 0.40414
    train epoch: 0 - step: 860 (total: 17400) - loss: 0.640344
    train epoch: 1 - step: 870 (total: 17400) - loss: 0.317456
    train epoch: 1 - step: 880 (total: 17400) - loss: 0.338208
    train epoch: 1 - step: 890 (total: 17400) - loss: 0.270992
    train epoch: 1 - step: 900 (total: 17400) - loss: 0.262994
    dev step: 900 - loss: 0.35676, precision: 0.36447, recall: 0.50978, f1: 0.42505 current best 0.40414
    ==============================================save best model best performerence 0.425045
    train epoch: 1 - step: 910 (total: 17400) - loss: 0.187394
    train epoch: 1 - step: 920 (total: 17400) - loss: 0.319919
    train epoch: 1 - step: 930 (total: 17400) - loss: 0.364867
    train epoch: 1 - step: 940 (total: 17400) - loss: 0.167465
    train epoch: 1 - step: 950 (total: 17400) - loss: 0.378459
    dev step: 950 - loss: 0.33845, precision: 0.40183, recall: 0.50027, f1: 0.44568 current best 0.42505
    ==============================================save best model best performerence 0.445681
    train epoch: 1 - step: 960 (total: 17400) - loss: 0.505818
    train epoch: 1 - step: 970 (total: 17400) - loss: 0.318232
    train epoch: 1 - step: 980 (total: 17400) - loss: 0.354184
    train epoch: 1 - step: 990 (total: 17400) - loss: 0.473859
    train epoch: 1 - step: 1000 (total: 17400) - loss: 0.268665
    dev step: 1000 - loss: 0.34670, precision: 0.41990, recall: 0.50543, f1: 0.45871 current best 0.44568
    ==============================================save best model best performerence 0.458713
    train epoch: 1 - step: 1010 (total: 17400) - loss: 0.457268
    train epoch: 1 - step: 1020 (total: 17400) - loss: 0.279792
    train epoch: 1 - step: 1030 (total: 17400) - loss: 0.311157
    train epoch: 1 - step: 1040 (total: 17400) - loss: 0.266172
    train epoch: 1 - step: 1050 (total: 17400) - loss: 0.348649
    dev step: 1050 - loss: 0.33027, precision: 0.43967, recall: 0.50570, f1: 0.47038 current best 0.45871
    ==============================================save best model best performerence 0.470380
    train epoch: 1 - step: 1060 (total: 17400) - loss: 0.250878
    train epoch: 1 - step: 1070 (total: 17400) - loss: 0.255359
    train epoch: 1 - step: 1080 (total: 17400) - loss: 0.244313
    train epoch: 1 - step: 1090 (total: 17400) - loss: 0.394027
    train epoch: 1 - step: 1100 (total: 17400) - loss: 0.345162
    dev step: 1100 - loss: 0.31890, precision: 0.40973, recall: 0.53992, f1: 0.46590 current best 0.47038
    train epoch: 1 - step: 1110 (total: 17400) - loss: 0.351362
    train epoch: 1 - step: 1120 (total: 17400) - loss: 0.505625
    train epoch: 1 - step: 1130 (total: 17400) - loss: 0.254914
    train epoch: 1 - step: 1140 (total: 17400) - loss: 0.299322
    train epoch: 1 - step: 1150 (total: 17400) - loss: 0.230382
    dev step: 1150 - loss: 0.33202, precision: 0.39473, recall: 0.57387, f1: 0.46774 current best 0.47038
    train epoch: 1 - step: 1160 (total: 17400) - loss: 0.531530
    train epoch: 1 - step: 1170 (total: 17400) - loss: 0.327992
    train epoch: 1 - step: 1180 (total: 17400) - loss: 0.261732
    train epoch: 1 - step: 1190 (total: 17400) - loss: 0.416111
    train epoch: 1 - step: 1200 (total: 17400) - loss: 0.587504
    dev step: 1200 - loss: 0.31763, precision: 0.42678, recall: 0.57224, f1: 0.48892 current best 0.47038
    ==============================================save best model best performerence 0.488920
    train epoch: 1 - step: 1210 (total: 17400) - loss: 0.318957
    train epoch: 1 - step: 1220 (total: 17400) - loss: 0.240229
    train epoch: 1 - step: 1230 (total: 17400) - loss: 0.268677
    train epoch: 1 - step: 1240 (total: 17400) - loss: 0.306026
    train epoch: 1 - step: 1250 (total: 17400) - loss: 0.207791
    dev step: 1250 - loss: 0.32002, precision: 0.44161, recall: 0.53205, f1: 0.48263 current best 0.48892
    train epoch: 1 - step: 1260 (total: 17400) - loss: 0.328496
    train epoch: 1 - step: 1270 (total: 17400) - loss: 0.169225
    train epoch: 1 - step: 1280 (total: 17400) - loss: 0.154055
    train epoch: 1 - step: 1290 (total: 17400) - loss: 0.245896
    train epoch: 1 - step: 1300 (total: 17400) - loss: 0.307641
    dev step: 1300 - loss: 0.31898, precision: 0.43654, recall: 0.56328, f1: 0.49188 current best 0.48892
    ==============================================save best model best performerence 0.491877
    train epoch: 1 - step: 1310 (total: 17400) - loss: 0.333137
    train epoch: 1 - step: 1320 (total: 17400) - loss: 0.245721
    train epoch: 1 - step: 1330 (total: 17400) - loss: 0.284762
    train epoch: 1 - step: 1340 (total: 17400) - loss: 0.454689
    train epoch: 1 - step: 1350 (total: 17400) - loss: 0.181988
    dev step: 1350 - loss: 0.31523, precision: 0.43998, recall: 0.58039, f1: 0.50053 current best 0.49188
    ==============================================save best model best performerence 0.500527
    train epoch: 1 - step: 1360 (total: 17400) - loss: 0.207600
    train epoch: 1 - step: 1370 (total: 17400) - loss: 0.521199
    train epoch: 1 - step: 1380 (total: 17400) - loss: 0.212064
    train epoch: 1 - step: 1390 (total: 17400) - loss: 0.304855
    train epoch: 1 - step: 1400 (total: 17400) - loss: 0.364982
    dev step: 1400 - loss: 0.32255, precision: 0.45131, recall: 0.57523, f1: 0.50579 current best 0.50053
    ==============================================save best model best performerence 0.505791
    train epoch: 1 - step: 1410 (total: 17400) - loss: 0.282940
    train epoch: 1 - step: 1420 (total: 17400) - loss: 0.247372
    train epoch: 1 - step: 1430 (total: 17400) - loss: 0.204306
    train epoch: 1 - step: 1440 (total: 17400) - loss: 0.197937
    train epoch: 1 - step: 1450 (total: 17400) - loss: 0.248342
    dev step: 1450 - loss: 0.31655, precision: 0.43383, recall: 0.57605, f1: 0.49492 current best 0.50579
    train epoch: 1 - step: 1460 (total: 17400) - loss: 0.303543
    train epoch: 1 - step: 1470 (total: 17400) - loss: 0.228280
    train epoch: 1 - step: 1480 (total: 17400) - loss: 0.272400
    train epoch: 1 - step: 1490 (total: 17400) - loss: 0.295671
    train epoch: 1 - step: 1500 (total: 17400) - loss: 0.238553
    dev step: 1500 - loss: 0.29889, precision: 0.45878, recall: 0.50027, f1: 0.47863 current best 0.50579
    train epoch: 1 - step: 1510 (total: 17400) - loss: 0.340570
    train epoch: 1 - step: 1520 (total: 17400) - loss: 0.178270
    train epoch: 1 - step: 1530 (total: 17400) - loss: 0.304790
    train epoch: 1 - step: 1540 (total: 17400) - loss: 0.289224
    train epoch: 1 - step: 1550 (total: 17400) - loss: 0.371867
    dev step: 1550 - loss: 0.30130, precision: 0.45212, recall: 0.61162, f1: 0.51991 current best 0.50579
    ==============================================save best model best performerence 0.519912
    train epoch: 1 - step: 1560 (total: 17400) - loss: 0.240305
    train epoch: 1 - step: 1570 (total: 17400) - loss: 0.316205
    train epoch: 1 - step: 1580 (total: 17400) - loss: 0.311467
    train epoch: 1 - step: 1590 (total: 17400) - loss: 0.270995
    train epoch: 1 - step: 1600 (total: 17400) - loss: 0.184202
    dev step: 1600 - loss: 0.29522, precision: 0.43972, recall: 0.59234, f1: 0.50474 current best 0.51991
    train epoch: 1 - step: 1610 (total: 17400) - loss: 0.431742
    train epoch: 1 - step: 1620 (total: 17400) - loss: 0.234169
    train epoch: 1 - step: 1630 (total: 17400) - loss: 0.247429
    train epoch: 1 - step: 1640 (total: 17400) - loss: 0.355582
    train epoch: 1 - step: 1650 (total: 17400) - loss: 0.281345
    dev step: 1650 - loss: 0.29843, precision: 0.46141, recall: 0.58446, f1: 0.51570 current best 0.51991
    train epoch: 1 - step: 1660 (total: 17400) - loss: 0.201275
    train epoch: 1 - step: 1670 (total: 17400) - loss: 0.304434
    train epoch: 1 - step: 1680 (total: 17400) - loss: 0.330689
    train epoch: 1 - step: 1690 (total: 17400) - loss: 0.277704
    train epoch: 1 - step: 1700 (total: 17400) - loss: 0.196703
    dev step: 1700 - loss: 0.28736, precision: 0.46048, recall: 0.59017, f1: 0.51732 current best 0.51991
    train epoch: 1 - step: 1710 (total: 17400) - loss: 0.253590
    train epoch: 1 - step: 1720 (total: 17400) - loss: 0.238998
    train epoch: 1 - step: 1730 (total: 17400) - loss: 0.267489
    ……
    train epoch: 19 - step: 16530 (total: 17400) - loss: 0.090804
    train epoch: 19 - step: 16540 (total: 17400) - loss: 0.172505
    train epoch: 19 - step: 16550 (total: 17400) - loss: 0.041797
    dev step: 16550 - loss: 0.42366, precision: 0.53121, recall: 0.61706, f1: 0.57093 current best 0.58724
    train epoch: 19 - step: 16560 (total: 17400) - loss: 0.083284
    train epoch: 19 - step: 16570 (total: 17400) - loss: 0.027010
    train epoch: 19 - step: 16580 (total: 17400) - loss: 0.075735
    train epoch: 19 - step: 16590 (total: 17400) - loss: 0.055073
    train epoch: 19 - step: 16600 (total: 17400) - loss: 0.089312
    dev step: 16600 - loss: 0.40673, precision: 0.53275, recall: 0.62955, f1: 0.57712 current best 0.58724
    train epoch: 19 - step: 16610 (total: 17400) - loss: 0.140136
    train epoch: 19 - step: 16620 (total: 17400) - loss: 0.056313
    train epoch: 19 - step: 16630 (total: 17400) - loss: 0.080976
    train epoch: 19 - step: 16640 (total: 17400) - loss: 0.049731
    train epoch: 19 - step: 16650 (total: 17400) - loss: 0.029350
    dev step: 16650 - loss: 0.41901, precision: 0.53045, recall: 0.63878, f1: 0.57960 current best 0.58724
    train epoch: 19 - step: 16660 (total: 17400) - loss: 0.039192
    train epoch: 19 - step: 16670 (total: 17400) - loss: 0.114814
    train epoch: 19 - step: 16680 (total: 17400) - loss: 0.128558
    train epoch: 19 - step: 16690 (total: 17400) - loss: 0.090364
    train epoch: 19 - step: 16700 (total: 17400) - loss: 0.015403
    dev step: 16700 - loss: 0.40519, precision: 0.52265, recall: 0.61108, f1: 0.56342 current best 0.58724
    train epoch: 19 - step: 16710 (total: 17400) - loss: 0.110993
    train epoch: 19 - step: 16720 (total: 17400) - loss: 0.070296
    train epoch: 19 - step: 16730 (total: 17400) - loss: 0.062231
    train epoch: 19 - step: 16740 (total: 17400) - loss: 0.067118
    train epoch: 19 - step: 16750 (total: 17400) - loss: 0.041820
    dev step: 16750 - loss: 0.40756, precision: 0.51713, recall: 0.62710, f1: 0.56683 current best 0.58724
    train epoch: 19 - step: 16760 (total: 17400) - loss: 0.061612
    train epoch: 19 - step: 16770 (total: 17400) - loss: 0.121729
    train epoch: 19 - step: 16780 (total: 17400) - loss: 0.143003
    train epoch: 19 - step: 16790 (total: 17400) - loss: 0.092972
    train epoch: 19 - step: 16800 (total: 17400) - loss: 0.085720
    dev step: 16800 - loss: 0.39751, precision: 0.52164, recall: 0.61543, f1: 0.56466 current best 0.58724
    train epoch: 19 - step: 16810 (total: 17400) - loss: 0.121482
    train epoch: 19 - step: 16820 (total: 17400) - loss: 0.056438
    train epoch: 19 - step: 16830 (total: 17400) - loss: 0.142359
    train epoch: 19 - step: 16840 (total: 17400) - loss: 0.037087
    train epoch: 19 - step: 16850 (total: 17400) - loss: 0.090542
    dev step: 16850 - loss: 0.43593, precision: 0.54292, recall: 0.62520, f1: 0.58117 current best 0.58724
    train epoch: 19 - step: 16860 (total: 17400) - loss: 0.180082
    train epoch: 19 - step: 16870 (total: 17400) - loss: 0.053868
    train epoch: 19 - step: 16880 (total: 17400) - loss: 0.099053
    train epoch: 19 - step: 16890 (total: 17400) - loss: 0.041414
    train epoch: 19 - step: 16900 (total: 17400) - loss: 0.059607
    dev step: 16900 - loss: 0.40950, precision: 0.53281, recall: 0.64177, f1: 0.58223 current best 0.58724
    train epoch: 19 - step: 16910 (total: 17400) - loss: 0.081703
    train epoch: 19 - step: 16920 (total: 17400) - loss: 0.058062
    train epoch: 19 - step: 16930 (total: 17400) - loss: 0.029519
    train epoch: 19 - step: 16940 (total: 17400) - loss: 0.045415
    train epoch: 19 - step: 16950 (total: 17400) - loss: 0.078151
    dev step: 16950 - loss: 0.39955, precision: 0.52993, recall: 0.62520, f1: 0.57364 current best 0.58724
    train epoch: 19 - step: 16960 (total: 17400) - loss: 0.112182
    train epoch: 19 - step: 16970 (total: 17400) - loss: 0.072816
    train epoch: 19 - step: 16980 (total: 17400) - loss: 0.171157
    train epoch: 19 - step: 16990 (total: 17400) - loss: 0.017713
    train epoch: 19 - step: 17000 (total: 17400) - loss: 0.090382
    dev step: 17000 - loss: 0.41824, precision: 0.54227, recall: 0.61841, f1: 0.57785 current best 0.58724
    train epoch: 19 - step: 17010 (total: 17400) - loss: 0.126030
    train epoch: 19 - step: 17020 (total: 17400) - loss: 0.072342
    train epoch: 19 - step: 17030 (total: 17400) - loss: 0.060565
    train epoch: 19 - step: 17040 (total: 17400) - loss: 0.073558
    train epoch: 19 - step: 17050 (total: 17400) - loss: 0.033999
    dev step: 17050 - loss: 0.42881, precision: 0.52828, recall: 0.61896, f1: 0.57004 current best 0.58724
    train epoch: 19 - step: 17060 (total: 17400) - loss: 0.036299
    train epoch: 19 - step: 17070 (total: 17400) - loss: 0.052640
    train epoch: 19 - step: 17080 (total: 17400) - loss: 0.054092
    train epoch: 19 - step: 17090 (total: 17400) - loss: 0.042668
    train epoch: 19 - step: 17100 (total: 17400) - loss: 0.058963
    dev step: 17100 - loss: 0.42499, precision: 0.52823, recall: 0.62765, f1: 0.57366 current best 0.58724
    train epoch: 19 - step: 17110 (total: 17400) - loss: 0.030797
    train epoch: 19 - step: 17120 (total: 17400) - loss: 0.096806
    train epoch: 19 - step: 17130 (total: 17400) - loss: 0.078804
    train epoch: 19 - step: 17140 (total: 17400) - loss: 0.047607
    train epoch: 19 - step: 17150 (total: 17400) - loss: 0.056086
    dev step: 17150 - loss: 0.39892, precision: 0.53097, recall: 0.58908, f1: 0.55852 current best 0.58724
    train epoch: 19 - step: 17160 (total: 17400) - loss: 0.148140
    train epoch: 19 - step: 17170 (total: 17400) - loss: 0.096577
    train epoch: 19 - step: 17180 (total: 17400) - loss: 0.146454
    train epoch: 19 - step: 17190 (total: 17400) - loss: 0.045576
    train epoch: 19 - step: 17200 (total: 17400) - loss: 0.084547
    dev step: 17200 - loss: 0.39334, precision: 0.51481, recall: 0.59017, f1: 0.54992 current best 0.58724
    train epoch: 19 - step: 17210 (total: 17400) - loss: 0.081501
    train epoch: 19 - step: 17220 (total: 17400) - loss: 0.079089
    train epoch: 19 - step: 17230 (total: 17400) - loss: 0.063774
    train epoch: 19 - step: 17240 (total: 17400) - loss: 0.017078
    train epoch: 19 - step: 17250 (total: 17400) - loss: 0.086831
    dev step: 17250 - loss: 0.38374, precision: 0.52425, recall: 0.62819, f1: 0.57153 current best 0.58724
    train epoch: 19 - step: 17260 (total: 17400) - loss: 0.076878
    train epoch: 19 - step: 17270 (total: 17400) - loss: 0.036476
    train epoch: 19 - step: 17280 (total: 17400) - loss: 0.146443
    train epoch: 19 - step: 17290 (total: 17400) - loss: 0.182334
    train epoch: 19 - step: 17300 (total: 17400) - loss: 0.040053
    dev step: 17300 - loss: 0.40251, precision: 0.52484, recall: 0.63987, f1: 0.57667 current best 0.58724
    train epoch: 19 - step: 17310 (total: 17400) - loss: 0.107188
    train epoch: 19 - step: 17320 (total: 17400) - loss: 0.143759
    train epoch: 19 - step: 17330 (total: 17400) - loss: 0.113866
    train epoch: 19 - step: 17340 (total: 17400) - loss: 0.115857
    train epoch: 19 - step: 17350 (total: 17400) - loss: 0.035648
    dev step: 17350 - loss: 0.41305, precision: 0.52708, recall: 0.61325, f1: 0.56691 current best 0.58724
    train epoch: 19 - step: 17360 (total: 17400) - loss: 0.047787
    train epoch: 19 - step: 17370 (total: 17400) - loss: 0.057836
    train epoch: 19 - step: 17380 (total: 17400) - loss: 0.094507
    train epoch: 19 - step: 17390 (total: 17400) - loss: 0.066693
    INFO 2021-04-10 22:43:36,736 launch.py:240] Local processes completed.
    end DuEE1.0 role train
    
    # Argument (role) recognition: prediction
    !bash run_duee_1.sh role_predict
    
    check and create directory
    dir ./ckpt exist
    dir ./ckpt/DuEE1.0 exist
    dir ./submit exist
    
    start DuEE1.0 role predict
    /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/setuptools/depends.py:2: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
      import imp
    [2021-04-10 22:44:10,178] [    INFO] - Found /home/aistudio/.paddlenlp/models/ernie-1.0/vocab.txt
    [2021-04-10 22:44:10,192] [    INFO] - Already cached /home/aistudio/.paddlenlp/models/ernie-1.0/ernie_v1_chn_base.pdparams
    W0410 22:44:10.193476 16283 device_context.cc:362] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 10.1, Runtime API Version: 10.1
    W0410 22:44:10.198055 16283 device_context.cc:372] device: 0, cuDNN Version: 7.6.
    ============start predict==========
    Loaded parameters from ./ckpt/DuEE1.0/role/best.pdparams
    save data 499 to ./ckpt/DuEE1.0/role/test_pred.json
    end DuEE1.0 role predict
    
    # Post-process the predictions into the submission format
    # The result is saved to submit/test_duee_1.json
    !bash run_duee_1.sh pred_2_submit
    
    check and create directory
    dir ./ckpt exist
    dir ./ckpt/DuEE1.0 exist
    dir ./submit exist
    
    start DuEE1.0 predict data merge to submit fotmat
    trigger predict 499 load from ./ckpt/DuEE1.0/trigger/test_pred.json
    role predict 499 load from ./ckpt/DuEE1.0/role/test_pred.json
    schema 65 load from ./conf/DuEE1.0/event_schema.json
    submit data 499 save to ./submit/test_duee_1.json
    end DuEE1.0 role predict data merge
    

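    2.1 Evaluation method

      Each predicted event argument is matched against the manually annotated arguments and scored by character-level match F1, case-insensitively. If an argument has several acceptable surface forms, the highest of the resulting match F1 values is used.

      f1_score = (2 * P * R) / (P + R), where

      • P = sum of predicted-argument scores / total number of predicted arguments
      • R = sum of predicted-argument scores / total number of annotated arguments
      • predicted-argument score = (event type correct) * (argument role correct) * character-level match F1, where * is multiplication and each "correct" factor is 1 or 0
      • character-level match F1 = 2 * character-level P * character-level R / (character-level P + character-level R)
      • character-level P = number of characters shared by the predicted and annotated argument / number of characters in the predicted argument
      • character-level R = number of characters shared by the predicted and annotated argument / number of characters in the annotated argument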
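    A minimal Python sketch of the scoring rules above, meant for building intuition rather than reproducing the official evaluation script. Assumptions: "shared characters" are counted as a multiset intersection, arguments are plain (event_type, role, text) tuples, and predicted arguments have already been aligned with annotated ones. The function names are hypothetical.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Arg = Tuple[str, str, str]  # (event_type, role, text), a hypothetical representation


def char_f1(pred: str, gold: str) -> float:
    """Character-level match F1 between two argument strings, ignoring case."""
    pred, gold = pred.lower(), gold.lower()
    # Assumption: shared characters are counted with multiplicity.
    common = sum((Counter(pred) & Counter(gold)).values())
    if common == 0:
        return 0.0
    p = common / len(pred)
    r = common / len(gold)
    return 2 * p * r / (p + r)


def argument_score(pred: Arg, gold_forms: Sequence[Arg]) -> float:
    """Type match * role match * char F1, maximized over the annotated surface forms."""
    best = 0.0
    for event_type, role, text in gold_forms:
        type_ok = 1.0 if pred[0] == event_type else 0.0
        role_ok = 1.0 if pred[1] == role else 0.0
        best = max(best, type_ok * role_ok * char_f1(pred[2], text))
    return best


def overall_f1(scores: List[float], num_pred: int, num_gold: int) -> float:
    """P = sum(scores) / num_pred, R = sum(scores) / num_gold, F1 = 2PR / (P + R)."""
    total = sum(scores)
    p = total / num_pred if num_pred else 0.0
    r = total / num_gold if num_gold else 0.0
    return 2 * p * r / (p + r) if p + r else 0.0


print(char_f1("宝鸡市", "宝鸡"))  # P = 2/3, R = 1, so F1 is about 0.8
```

    Partial overlaps still earn credit through char_f1. A wrong event type or role zeroes the whole argument score, regardless of how well the text matches.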

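    3. Tricks

    3.1 Try more pretrained models

      The baseline uses the ERNIE pretrained model. PaddleNLP provides many other pretrained models, such as BERT, RoBERTa, ELECTRA and XLNet.

      See the PaddleNLP pretrained-model documentation.

      For example, you can switch to the Chinese RoBERTa-large model to improve results. Only the model and the tokenizer need to be replaced; the rest of the pipeline stays the same.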

    from paddlenlp.transformers import RobertaForTokenClassification, RobertaTokenizer
    
    # label_map: the tag-to-id mapping of your dataset (defined elsewhere)
    model = RobertaForTokenClassification.from_pretrained("roberta-wwm-ext-large", num_classes=len(label_map))
    tokenizer = RobertaTokenizer.from_pretrained("roberta-wwm-ext-large")
    
    [2021-04-10 22:48:18,899] [    INFO] - Downloading https://paddlenlp.bj.bcebos.com/models/transformers/roberta_large/roberta_chn_large.pdparams and saved to /home/aistudio/.paddlenlp/models/roberta-wwm-ext-large
    [2021-04-10 22:48:18,902] [    INFO] - Downloading roberta_chn_large.pdparams from https://paddlenlp.bj.bcebos.com/models/transformers/roberta_large/roberta_chn_large.pdparams
    100%|██████████| 1271615/1271615 [00:18<00:00, 69327.15it/s]
    [2021-04-10 22:48:42,145] [    INFO] - Downloading vocab.txt from https://paddlenlp.bj.bcebos.com/models/transformers/roberta_large/vocab.txt
    100%|██████████| 107/107 [00:00<00:00, 2073.95it/s]
    

    3.2 Modify the network architecture

      For sequence labeling, GRU+CRF is a common architecture. How can these layers be added on top of a pretrained model?

    import paddle.nn as nn
    from paddlenlp.transformers import ErnieModel
    from paddlenlp.layers.crf import LinearChainCrf, LinearChainCrfLoss, ViterbiDecoder
    
    
    class Model(nn.Layer):
        """Pretrained ERNIE encoder + BiGRU + CRF for sequence labeling."""
        def __init__(self, ernie, num_classes=2, dropout=None, gru_hidden_size=128):
            super(Model, self).__init__()
            self.num_classes = num_classes
            # pass in a pretrained encoder, e.g. ErnieModel.from_pretrained("ernie-1.0")
            self.ernie = ernie
            self.dropout = nn.Dropout(dropout if dropout is not None else
                                      self.ernie.config["hidden_dropout_prob"])
            # bidirectional GRU on top of the ERNIE token representations
            self.gru = nn.GRU(
                input_size=self.ernie.config["hidden_size"],
                hidden_size=gru_hidden_size,
                direction='bidirect')
            # forward and backward GRU states are concatenated, hence * 2
            self.fc = nn.Linear(
                in_features=gru_hidden_size * 2,
                out_features=num_classes)
            # CRF layer: loss for training, Viterbi decoding for prediction
            self.crf = LinearChainCrf(
                num_classes,
                with_start_stop_tag=False)
            self.crf_loss = LinearChainCrfLoss(self.crf)
            self.viterbi_decoder = ViterbiDecoder(
                self.crf.transitions,
                with_start_stop_tag=False)
    
        def forward(self,
                    input_ids,
                    token_type_ids=None,
                    position_ids=None,
                    attention_mask=None,
                    lengths=None,
                    labels=None):
            # lengths: unpadded length of each sequence, required by the CRF and the decoder
            # labels: gold tag ids; pass them during training to get the loss
            sequence_output, _ = self.ernie(
                input_ids,
                token_type_ids=token_type_ids,
                position_ids=position_ids,
                attention_mask=attention_mask)
            sequence_output = self.dropout(sequence_output)
            bigru_output, _ = self.gru(sequence_output)
            emission = self.fc(bigru_output)
            _, prediction = self.viterbi_decoder(emission, lengths)
            if labels is not None:
                # the argument list of LinearChainCrfLoss differs across PaddleNLP versions; check yours
                loss = self.crf_loss(emission, lengths, prediction, labels)
                return loss, lengths, prediction, labels
            else:
                return input_ids, lengths, prediction
    
    
    # usage (label_map: the tag-to-id mapping of your dataset):
    # model = Model(ErnieModel.from_pretrained("ernie-1.0"), num_classes=len(label_map))
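    To show what the CRF's ViterbiDecoder computes at prediction time, here is a plain-Python sketch of Viterbi decoding. It works over the emission scores (the output of self.fc) and a transition matrix, with no start/stop tags, matching with_start_stop_tag=False above. It illustrates the algorithm only. It is not PaddleNLP's implementation, and it decodes one unpadded sentence rather than a batch.

```python
from typing import List, Tuple


def viterbi_decode(emissions: List[List[float]],
                   transitions: List[List[float]]) -> Tuple[float, List[int]]:
    """Highest-scoring tag path for one sentence.

    emissions[t][j]: score of tag j at position t.
    transitions[i][j]: score of moving from tag i to tag j.
    """
    num_tags = len(emissions[0])
    score = list(emissions[0])  # best path score ending in each tag at position 0
    backpointers = []
    for t in range(1, len(emissions)):
        new_score, pointers = [], []
        for j in range(num_tags):
            # best previous tag i for current tag j
            best_i = max(range(num_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)
    # start from the best final tag and follow the back-pointers
    best_last = max(range(num_tags), key=lambda j: score[j])
    path = [best_last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return score[best_last], path


print(viterbi_decode([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]],
                     [[0.0, 0.0], [0.0, 0.0]]))  # (3.0, [0, 1, 0])
```

    With a transition matrix that penalizes switching tags (for example -5 off the diagonal), the same emissions decode to [0, 0, 0]. The learned transitions let the CRF rule out unlikely tag sequences that per-token argmax would produce.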
    

    3.3 Model ensembling

      Train several models and combine their predictions.
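    One simple way to combine the predictions is a token-level majority vote. The sketch assumes every model outputs a tag sequence over the same tokens; vote_tags is a hypothetical helper, not part of the baseline.

```python
from collections import Counter
from typing import List, Sequence


def vote_tags(predictions: Sequence[Sequence[str]]) -> List[str]:
    """Token-level majority vote over the tag sequences of several models.

    predictions[m][t] is model m's tag for token t. On a tie, the tag seen
    first in model order wins (Counter.most_common keeps first-seen order for ties).
    """
    if len({len(p) for p in predictions}) != 1:
        raise ValueError("predictions must be non-empty and all the same length")
    return [Counter(tags).most_common(1)[0][0] for tags in zip(*predictions)]


print(vote_tags([["B-TIME", "I-TIME", "O"],
                 ["B-TIME", "O", "O"],
                 ["O", "I-TIME", "O"]]))  # ['B-TIME', 'I-TIME', 'O']
```

    Voting tag by tag can produce an invalid sequence, such as an I- tag with no preceding B-. In practice you may want to repair the BIO sequence afterwards, or vote on whole argument spans instead.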

    References

      https://aistudio.baidu.com/aistudio/competition/detail/65

  • NLP event extraction: a worked example (with complete example and full code)

    2020-06-27 17:41:13

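    Definition

    Event extraction takes unstructured text, extracts the events a user cares about, and presents them in structured form. The task breaks down into four subtasks: trigger identification, event type classification, argument identification, and role classification. Trigger identification and event type classification can be merged into a single event identification task, and argument identification and role classification can be merged into an argument role classification task. Event identification decides which event type, if any, each word in a sentence belongs to, so it is a word-level multi-class classification task. Role classification is a word-pair multi-class classification task: for every pair of a trigger and an entity in the sentence, it decides what role the entity plays in that trigger's event.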
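    To make the two merged subtasks concrete, the toy sketch below builds their classification targets. Event identification gets one label per word, and role classification gets one instance per (trigger, entity) pair. The sentence, the token indices, and the "Accident" label are made up for illustration.

```python
from itertools import product
from typing import Dict, List, Tuple


def identification_targets(tokens: List[str], triggers: Dict[int, str]) -> List[str]:
    """Word-level multi-class labels: a trigger's event type, otherwise "None"."""
    return [triggers.get(i, "None") for i in range(len(tokens))]


def role_instances(tokens: List[str], trigger_ids: List[int],
                   entity_ids: List[int]) -> List[Tuple[str, str]]:
    """Word-pair instances: each (trigger, entity) pair is later labeled with a role or "None"."""
    return [(tokens[t], tokens[e]) for t, e in product(trigger_ids, entity_ids)]


tokens = ["Fire", "broke", "out", "in", "Baoji", "yesterday"]
print(identification_targets(tokens, {0: "Accident"}))
# ['Accident', 'None', 'None', 'None', 'None', 'None']
print(role_instances(tokens, [0], [4, 5]))
# [('Fire', 'Baoji'), ('Fire', 'yesterday')]
```

    A classifier would then label ("Fire", "Baoji") with a place role and ("Fire", "yesterday") with a time role. The number of pairs grows as triggers × entities, which is one reason the two stages are usually kept separate.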

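    The event extraction task:

    There are many kinds of events, such as causal events and adversative (turning-point) events.
    A common definition: an event usually has elements such as time, place, and people.
    Event extraction means pulling these elements out of the text.
    Enough background; on to the example.

    Example:

    Fire news example:
    For a news report about a fire, we care about when and where the accident happened, the casualties, and the cause.
    We extract these, and also attach a summary of the event.
    That is, the input is a fire news article, and the output is the accident location, time, casualties, cause, and summary.
    Method: regular expressions (rule-based).

    Import packages: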

    #!/usr/bin/env python3
    # -*- coding: utf-8 -*-
    # @Author: yudengwu
    # @Date  : 2020/6/27
    import re
    

    # Accident cause:

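    def pattern_cause(data):
        """data: list of text lines (a single string also works)."""
        data = ''.join(data)
    
        patterns = []
    
        key_words = ['起火', '事故', '火灾']
        # "<keyword>原因" ("cause of the fire/accident"), then the cause text up to the next punctuation mark
        pattern = re.compile('.*?(?:{0})原因(.*?)[,.?:;!,。?:;!]'.format('|'.join(key_words)))
        patterns.append(pattern)
        for c in patterns:
            m = c.search(data)
            if m:  # search() returns None when there is no match
                print('Accident cause:', m.group(1))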
    
    

    # Casualties:

    def pattern_lose(data):
        """data: list of text lines (a single string also works)."""
        data = ''.join(data)
        patterns = []
    
        key_words = ['伤亡', '损失']
        # "未造成...伤亡/损失": "caused no ... casualties/losses"
        pattern = re.compile('.*?(未造成.*?(?:{0}))[,.?:;!,。?:;]'.format('|'.join(key_words)))
        patterns.append(pattern)
    
        # "<N>人死亡/身亡/受伤/烧伤/坠楼身亡/遇难": N people died, were killed, injured, burned, fell to death, perished
        patterns.append(re.compile(r'(\d+人死亡)'))
        patterns.append(re.compile(r'(\d+人身亡)'))
        patterns.append(re.compile(r'(\d+人受伤)'))
        patterns.append(re.compile(r'(\d+人烧伤)'))
        patterns.append(re.compile(r'(\d+人坠楼身亡)'))
        patterns.append(re.compile(r'(\d+人遇难)'))
        for i in patterns:
            jieguo = i.search(data)  # jieguo: "result"
            if jieguo:
                print('Casualties:', jieguo.group(1))
    
    

    # Accident time:

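    def pattern_time(data):
        data = ''.join(data)  # list of lines -> one string
        PATTERN = r"([0-9零一二两三四五六七八九十]+年)?([0-9一二两三四五六七八九十]+月)?([0-9一二两三四五六七八九十]+[号日])?([上中下午晚早]+)?([0-9零一二两三四五六七八九十百]+[点:\.时])?([0-9零一二三四五六七八九十百]+分?)?([0-9零一二三四五六七八九十百]+秒)?"
        pattern = re.compile(PATTERN)
        # Every group is optional, so the pattern also matches the empty string.
        # Take the first match that has at least a year, month, day or hour part.
        m = next((x for x in pattern.finditer(data)
                  if any(x.group(i) for i in (1, 2, 3, 5))), None)
        if m is None:
            print('Accident time: not found')
            return
        # e.g. "19年1月14日18时19分39秒"
        year = m.group(1)  # year
        month = m.group(2)  # month
        day = m.group(3)  # day
        am = m.group(4)  # part of day: morning, noon, afternoon, evening
        hour = m.group(5)  # hour
        minutes = m.group(6)  # minutes
        seconds = m.group(7)  # seconds
        print('Accident time:', year, month, day, am, hour, minutes, seconds)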
    

    # Accident location:

    def pattern_address(data):
        data = ''.join(data)  # list of lines -> one string
        p_string = data.split(',')  # split into clauses on the full-width comma
        address = []
        PATTERN1 = r'([\u4e00-\u9fa5]{2,5}?(?:省|自治区|市)){0,1}([\u4e00-\u9fa5]{2,7}?(?:区|县|州)){0,1}([\u4e00-\u9fa5]{2,7}?(?:镇)){0,1}([\u4e00-\u9fa5]{2,7}?(?:村|街|街道)){0,1}([\d]{1,3}?(号)){0,1}'
        # \u4e00-\u9fa5 matches any common Chinese character
        # {2,5}? matches 2 to 5 times, as few as possible (non-greedy)
        # {0,1} makes the whole group optional (same as ?)
        # (?:pattern) is a non-capturing group: industr(?:y|ies) is a shorter form of 'industry|industries'
        # 省/自治区/市 = province/autonomous region/city, 区/县/州 = district/county/prefecture,
        # 镇 = town, 村/街/街道 = village/street/subdistrict, 号 = street number
        pattern = re.compile(PATTERN1)
        for line in p_string:
            m = pattern.search(line)
            # all groups are optional, so search() can return an empty match; skip it
            if m and m.group(0):
                address.append(m.group(0))
    
        print('Event location:', set(address))
    
    

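    # Event summary:
    The summarization method is explained in the linked post: Chinese text summary extraction (with code), based on Python
    Stopword list link: NLP Chinese stopword dataset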

    def shijian(data):
        import jieba
        text = ''.join(data)
        text = re.sub(r'\[[0-9]*\]', ' ', text)  # remove citation marks such as [1], [2]
        text = re.sub(r'\s+', ' ', text)  # collapse runs of whitespace into a single space
        sentences = re.split(r'[。!!.??]', text)  # split into sentences
    
        # load stopwords (one per line)
        def stopwordslist(filepath):
            with open(filepath, 'r', encoding='gbk') as f:
                return set(line.strip() for line in f)
    
        stopwords = stopwordslist("停用词.txt")
    
        # word frequencies over the whole text, excluding stopwords
        word2count = {}
        for word in jieba.cut(text):  # segment the whole text
            if word not in stopwords:
                word2count[word] = word2count.get(word, 0) + 1
        if not word2count:
            return
        # normalize by the highest frequency; compute the maximum once,
        # because the loop below overwrites the counts
        max_count = max(word2count.values())
        for key in word2count:
            word2count[key] = word2count[key] / max_count
    
        # sentence score = sum of the normalized frequencies of its words (sentences under 300 chars only)
        sent2score = {}
        for sentence in sentences:
            if len(sentence) >= 300:
                continue
            for word in jieba.cut(sentence):
                if word in word2count:
                    sent2score[sentence] = sent2score.get(sentence, 0) + word2count[word]
    
        # sort by score
        def dic_order_value_and_get_key(dicts, count):
            # return the `count` keys with the highest values
            return sorted(dicts, key=dicts.get, reverse=True)[:count]
    
        # output the summary: the five highest-scoring sentences
        final_result = dic_order_value_and_get_key(sent2score, 5)
        print('Main points of the event:', final_result)
    
    

    # Main function:

    def main(data):
        pattern_cause(data)
        pattern_lose(data)
        pattern_time(data)
        pattern_address(data)
        shijian(data)
    if __name__ =='__main__':
        # read the input data
        with open('新闻.txt', 'r', encoding='utf-8') as f:
            test_data = f.readlines()
        main(test_data)
    

    Dataset: 新闻.txt (the news text below)

    1月14日18时19分,宝鸡市渭滨区金陵街道机厂街社区铁路家属院17号楼一单元发生火灾,火势由二、三、四阳台向上蔓延,一名老人被困屋内,情况危急。宝鸡消防支队渭滨大队广元路中队接警后,迅速赶赴现场展开救援,将被困老人救出。记者了解到,火灾发生后,宝鸡消防支队渭滨大队广元路中队立即赶赴现场开展救援,经现场侦查发现,火势由二、三、四楼阳台向上蔓延,均已过火。由于小区内道路蜿蜒且狭窄,中队立即调派经一路、开元、宝光、电子街4个卫星消防站增援。中队到场后立即成立搜救组、灭火组、供水组开展救援工作。消防在搜救过程中发现1单元2楼南户有一名老人被困,中队立即进行营救,同时并对2单元30余名群众进行疏散。灭火小组从小区南北两侧对现场火势进行打压。铁塔路及新华路中队随后也赶到现场增援,20时10分现场明火被扑灭。火灾未造成人员伤亡,起火原因正在调查中。
    

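    Results
    The extraction results on this sample are fairly good.
    Reflections
    This code targets Chinese domestic news, because the address regex is written for Chinese addresses.
    The code is rough and far from complete. For other kinds of news or text, the regexes need adjusting (for example, casualty information may be missing entirely).
    A regex-based approach takes a lot of mental effort; the key is how to define the rules.
    When I have time, I would still like to try a model-based approach.
    余登武, a computing newcomer from electrical engineering. Writing articles is not easy; if you liked this one, please give it a like. Thanks.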

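    Text: IT可达鸭

    Images: IT可达鸭, web

    Introduction

    What is a compound event? Compound events include conditional, causal, sequential, and adversative (contrast) events.

    Many real-world NLP projects need compound event extraction. For example, event extraction for a knowledge graph builds an event logic graph, and event extraction in chatbot dialogues recognizes user intent. As a basic NLP module, compound event extraction still offers plenty to study.

    This article presents a rule-based method for Chinese event extraction, based on source code contributed by another developer. That code is optimized here. The data is separated from the code so the extraction rules are easier to extend later, and the code is simplified to make it easier to understand.

    The best way to learn is to read good source code, and the best way to check what you learned is to modify that code and work your own ideas into it.

    Follow along with IT可达鸭 to see how this project is built.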

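    Technical challenges

    Reading and writing JSON files, and handling garbled Chinese text.

    Reading a JSON file

    When saving, call json.dump with ensure_ascii set to False. Chinese characters are then written as-is rather than as \uXXXX escape sequences, which look like garbled text in the saved file.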
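    Here is a minimal sketch of both operations with the standard json module. The file name and the rule structure are illustrative assumptions, not the article's actual data.json.

```python
import json


def save_rules(rules: dict, path: str) -> None:
    """Write rules as UTF-8 JSON, keeping Chinese characters readable."""
    # ensure_ascii=False writes non-ASCII characters as-is instead of \uXXXX escapes,
    # so the file must be opened with an encoding that can hold them (UTF-8 here).
    with open(path, "w", encoding="utf-8") as f:
        json.dump(rules, f, ensure_ascii=False, indent=2)


def load_rules(path: str) -> dict:
    """Read the rules back; an explicit encoding avoids relying on the platform default."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


print(json.dumps({"pre": "不但"}))                      # {"pre": "\u4e0d\u4f46"}
print(json.dumps({"pre": "不但"}, ensure_ascii=False))  # {"pre": "不但"}
```

    Both forms load back to the same Python string. ensure_ascii only changes how the file looks to a human reader. Garbled text on reading usually comes from a mismatched encoding in open(), which is why both functions set it explicitly.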

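    Saving a JSON file

    How do we extract compound events? Here we use regular expressions, the most common rule-based method.

    Event sentence patterns involve more than a hundred connective keywords, which combine into thousands of variants, and the regular expressions must match every combination. So we write a function that generates regular expressions dynamically. The fixed connectives are collected by hand (see the data file data.json), and the program automatically combines them into all the possible forms.

    Connectives such as “不但” (not only) and “而且” (but also) become variables, written as Python placeholders. Reading the connective pairs from the file and filling in “pre” and “pos” in turn generates all the regular expressions.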
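    A sketch of the placeholder idea follows. A single template with {pre} and {pos} slots is expanded for every connective pair. The RULES dictionary, the English type names, and the template are hypothetical stand-ins for the contents of data.json; the real rules likely cover many more patterns.

```python
import re
from typing import Dict, List, Optional, Tuple

# Hypothetical rules in the spirit of data.json: for each compound-event type,
# a list of (pre, pos) connective pairs.
RULES: Dict[str, List[Tuple[str, str]]] = {
    "progressive": [("不但", "而且"), ("不仅", "还")],
    "causal": [("因为", "所以"), ("由于", "因此")],
}

# Template: any leading text, connective "pre", the first clause,
# an optional comma, connective "pos", the second clause.
TEMPLATE = r".*?{pre}(?P<first>.+?)[,,]?{pos}(?P<second>.+)"


def build_patterns(rules: Dict[str, List[Tuple[str, str]]]) -> List[Tuple[str, re.Pattern]]:
    """Fill the placeholders with every connective pair and compile the results."""
    patterns = []
    for event_type, pairs in rules.items():
        for pre, pos in pairs:
            regex = TEMPLATE.format(pre=re.escape(pre), pos=re.escape(pos))
            patterns.append((event_type, re.compile(regex)))
    return patterns


def extract(sentence: str,
            patterns: List[Tuple[str, re.Pattern]]) -> Optional[Tuple[str, str, str]]:
    """Return (event_type, first_clause, second_clause) for the first matching pattern."""
    for event_type, pattern in patterns:
        m = pattern.match(sentence)
        if m:
            return event_type, m.group("first"), m.group("second")
    return None


print(extract("他不但会唱歌,而且会跳舞", build_patterns(RULES)))
# ('progressive', '会唱歌', '会跳舞')
```

    Adding a connective pair to RULES, or to the JSON file it stands in for, yields a new pattern without any code change. That is the point of separating the data from the code.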

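    The code in this article is based on open-source code, optimized and improved here. The data is separated from the code, and the strategy pattern from design patterns is introduced. The strategy pattern solves the maintenance problem that long “if...else...” chains cause when there are several similar algorithms. It avoids piling up conditional branches and extends well.

    If your code contains many “if...else...” branches, consider whether the strategy pattern could replace them.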
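    In Python, the strategy pattern can be as light as a dictionary that maps each event type to its extraction function. Supporting a new type then means adding one entry instead of another elif branch. The two rules below are hypothetical examples, not the article's rules.

```python
import re
from typing import Callable, Dict, Optional, Tuple

Result = Optional[Tuple[str, str]]


def extract_causal(sentence: str) -> Result:
    """Hypothetical rule: 因为 <cause>, 所以 <effect>."""
    m = re.search(r"因为(.+?)[,,]?所以(.+)", sentence)
    return (m.group(1), m.group(2)) if m else None


def extract_conditional(sentence: str) -> Result:
    """Hypothetical rule: 如果 <condition>, 就 <consequence>."""
    m = re.search(r"如果(.+?)[,,]?就(.+)", sentence)
    return (m.group(1), m.group(2)) if m else None


# The strategy registry: event type -> extraction function.
STRATEGIES: Dict[str, Callable[[str], Result]] = {
    "causal": extract_causal,
    "conditional": extract_conditional,
}


def extract_all(sentence: str) -> Dict[str, Tuple[str, str]]:
    """Apply every registered strategy and keep the ones that matched."""
    results = {}
    for event_type, strategy in STRATEGIES.items():
        found = strategy(sentence)
        if found:
            results[event_type] = found
    return results


print(extract_all("因为下雨,所以比赛取消"))  # {'causal': ('下雨', '比赛取消')}
```

    The caller never branches on the event type. It loops over whatever is registered, so each strategy can be written and tested on its own.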

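    Overall project architecture diagram

    The overall architecture diagram makes it easy to keep track of the project's development progress.

    Detailed code

    Import the required Python packages; re is the regular-expression module used for pattern matching.

    All event connectives are written to data.json, in the format shown in the figure below. A JSON file makes later extension easy: new connectives and new event types can be added without modifying the source code.

    Compound event extraction is wrapped in a class; this is the basic form of its initializer.

    The data-loading module reads data.json and arranges the data into the needed format.

    All regular expressions are generated from the data produced by the data-loading module.

    Each input sentence is matched against the regular expressions, and its events are extracted.

    Here the input text is split into sentences, and events are extracted from each one.

    A main function tests the finished class.

    The final output; adapt the output format to whatever you need.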
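    The splitting and per-sentence matching steps described above can be sketched as follows. The punctuation set, the output field names, and the single-rule toy_causal extractor are assumptions for illustration. In the article, the extractor role is played by the class built from data.json.

```python
import re
from typing import Callable, List, Optional, Tuple

# An extractor takes one sentence and returns (event_type, first_part, second_part) or None.
Extractor = Callable[[str], Optional[Tuple[str, str, str]]]


def split_sentences(text: str) -> List[str]:
    """Split on full-width/ASCII sentence-ending punctuation and newlines; drop empty pieces."""
    parts = re.split(r"[。!?!?;;\n]", text)
    return [p.strip() for p in parts if p.strip()]


def extract_from_text(text: str, extractor: Extractor) -> List[dict]:
    """Run the extractor on every sentence and collect the hits."""
    results = []
    for sentence in split_sentences(text):
        found = extractor(sentence)
        if found:
            event_type, first, second = found
            results.append({"sentence": sentence, "type": event_type,
                            "pre_part": first, "post_part": second})
    return results


def toy_causal(sentence: str) -> Optional[Tuple[str, str, str]]:
    """Hypothetical single-rule extractor, for demonstration only."""
    m = re.search(r"因为(.+?)[,,]?所以(.+)", sentence)
    return ("causal", m.group(1), m.group(2)) if m else None


print(extract_from_text("因为下雨,所以比赛取消。今天很冷。", toy_causal))
# [{'sentence': '因为下雨,所以比赛取消', 'type': 'causal', 'pre_part': '下雨', 'post_part': '比赛取消'}]
```

    Commas are deliberately left out of the split set. Compound events span two clauses joined by a comma, so splitting on commas would separate the connectives a rule needs to see together.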

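    Conclusion

    This project uses regular expressions, file reading, and the strategy design pattern. These are basic programming skills that build up through everyday coding; they cannot be mastered overnight.

    Learning Python doesn't cost even a cup of milk tea; all it takes is a follow.

    If you found this article helpful, please like it and share it. To get the source code, follow the account and send a private message with the text “python事件抽取”, and I will send it to you. Thanks for reading, and all the best.

    This article is an original work by IT可达鸭. Follow for more.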
