• ## Natural Language Understanding

2018-08-14 00:44:20
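Our understanding of how machines can understand language took a long detour. Early research concentrated on rule-based methods, which solved some simple problems but could not make natural language understanding practical in any fundamental way. Only some twenty years later, when people began trying statistical methods for natural language processing, did breakthroughs and usable products appear. Today, connecting the input and output of speech understanding also depends on natural language understanding, which in turn now relies on the general-purpose technique of neural networks, and its performance has been improving rapidly. For programmers, mastering NLP is a dragon-slaying saber: a shortcut to sharper skills and higher pay. Note: this Chat covers fundamentals, akin to the first half of the Nine Yin Manual; readers who want to go straight to practicing the Nine Yin White Bone Claw can skip it and wait for the next installment. Main topics of this Chat: recurrent sequence models (Recurrent Neural Networks); Why Sequence Models?; Notation; Recurrent Neural Network Model; Backpropagation through time; different types of recurrent neural… Full text: http://gitbook.cn/gitchat/activity/5b63fa248170397a0ea45781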
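• Natural Language Understanding
• Chapter 6: Natural Language Understanding. Topic: natural language understanding. Learning process. Course: Introduction to Artificial Intelligence. Content: natural language understanding. Class hours: 6. Flipped-classroom periods: 1256. Teaching environment: multimedia classroom. Teaching methods: situational teaching, task-driven teaching, combined lecture and practice, group discussion ...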
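• What is natural language understanding? What are the criteria for it?
Like "intelligence", natural language understanding has many different interpretations and definitions. Broadly, it means using computers to understand natural language.
Criteria for natural language understanding: given a piece of natural-language text as input, a computer can be said to understand it if it can do the following:
Question answering: the computer correctly answers questions about the input text;
Summarization: the machine can produce a summary of the input text;
Paraphrasing: the machine restates the input text in different words and sentences;
Translation: the machine translates one language (the source language) into another (the target language).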


• Natural language understanding and natural language processing
What is natural language processing?
Natural language processing, or NLP, is a type of artificial intelligence (AI) that specializes in analyzing human language.
It does this by:
- Reading natural language, which has evolved through natural human usage and which we use to communicate with each other every day
- Interpreting natural language, typically through probability-based algorithms
- Analyzing natural language and providing an output
Have you ever used Apple's Siri and wondered how it understands (most of) what you're saying? This is an example of NLP in practice.
NLP is becoming an essential part of our lives, and together with machine learning and deep learning, produces results that are far superior to what could be achieved just a few years ago.
In this article we'll take a closer look at NLP, see how it's applied and learn how it works.
What can natural language processing do?
NLP is used in a variety of ways today. These include:
Machine translation
When was the last time you visited a foreign country and used your smartphone for language translation? Perhaps you used Google Translate? This is an example of NLP machine translation.
Machine translation works by using NLP to translate one language into another. Historically, simple rules-based methods were used to do this. Today's NLP techniques are a big improvement on those long-standing rules-based methods.
To do well at machine translation, NLP employs deep learning techniques. This form of machine translation is sometimes called neural machine translation (NMT), since it makes use of neural networks. NMT interprets language based on a statistical, trial-and-error approach and can deal with context and other subtleties of language.
Examples of machine translation in practice include:
- Translating plain text, web pages or files such as Excel, PowerPoint or Word. Systran is an example of a translation services company that does this.
- Translating social feeds in real time, as offered by SDL Government, a company specializing in public sector language services.
- Translating languages in medical situations, such as when an English-speaking doctor is treating a Spanish-speaking patient, as offered by Canopy Speak.
- Translating financial documents such as annual reports, investment commentaries and information documents, as offered by Lingua Custodia, a company specializing in financial translations.
Speech recognition
Earlier, we mentioned Siri as an example of NLP. One particular feature of NLP used by Siri is speech recognition. Alexa and Google Assistant ("OK Google") are other well-known examples of NLP speech recognition.
Speech recognition isn't a new science and has been around for over 50 years. Only recently, though, have its ease of use and accuracy improved significantly, thanks to NLP.
At the heart of speech recognition is the ability to identify spoken words, interpret them and convert them to text. A range of actions can then follow, such as answering questions, performing instructions or writing emails.
The powerful methods of deep learning used in NLP allow today's speech recognition applications to work better than ever before.
Chatbots
Chatbots are software programs that simulate natural human conversation. They are used by companies to help with customer service, consumer queries and sales enquiries.
You may have interacted with a chatbot the last time you logged on to a company website and used their online help system.
While simple chatbots use rules-based methods, today's more capable chatbots use NLP to understand what customers are saying and how to respond.
Well-known examples of chatbots include:
- The World Health Organization (WHO) chatbot, built on the WhatsApp platform, which shares information and answers queries about the spread of the COVID-19 virus
- National Geographic's Genius chatbot, which speaks like Albert Einstein and engages with users to promote the National Geographic show of the same name
- Kian, Korean car manufacturer Kia's chatbot on Facebook Messenger, which answers queries about Kia cars and helps with sales enquiries
- Whole Foods' chatbot, which helps with recipe information, cooking inspiration and product recommendations
Sentiment analysis
Sentiment analysis uses NLP to interpret and classify emotions contained in text data. This is used, for instance, to classify online customer feedback about products or services in terms of positive or negative experience.
In its simplest form, sentiment analysis can be done by categorizing text based on designated words that convey emotion, like "love", "hate", "happy", "sad" or "angry". This type of sentiment analysis has been around for a long time but is of limited practical use due to its simplicity.
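The keyword approach described above can be sketched in a few lines of Python. The word lists are illustrative assumptions built from the examples in the text, not a real sentiment lexicon, and the punctuation handling is deliberately crude.

```python
# Hypothetical word lists built from the examples in the text.
POSITIVE = {"love", "happy"}
NEGATIVE = {"hate", "sad", "angry"}


def keyword_sentiment(text: str) -> str:
    """Label text by counting designated emotion words."""
    words = [w.strip(".,!?\"'").lower() for w in text.split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(keyword_sentiment("I love this phone!"))
print(keyword_sentiment("I do not love this phone."))
```

The second call also returns "positive" because the method ignores negation, which is one reason this form of sentiment analysis is of limited practical use.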
Today's sentiment analysis uses NLP to classify text based on statistical and deep learning methods. The result is sentiment analysis that can handle complex and natural-sounding text.
There's huge interest in sentiment analysis nowadays from businesses worldwide. It can provide valuable insights into customer preferences, levels of satisfaction and feedback on opinions, which can help with marketing campaigns and product design.
Email classification
Email overload is a common challenge in the modern workplace. NLP can help to analyze and classify incoming emails so that they can be automatically forwarded to the right place.
In the past, simple keyword-matching techniques were used to classify emails. This had mixed success. NLP allows a far better classification approach as it can understand the context of individual sentences, paragraphs and whole sections of text.
Given the sheer volume of emails that businesses have to deal with today, NLP-based email classification can be a great help in improving workplace productivity. Classification using NLP helps to ensure that emails don't get forgotten in over-burdened inboxes and are properly filed for further action.
How does natural language processing work?
Now that we've seen what NLP can do, let's try to understand how it works.
In essence, NLP works by transforming a collection of text information into designated outputs.
If the application is machine translation, then the input text information would be documents in the source language (say, English) and the output would be the translated documents in the target language (say, French).
If the application is sentiment analysis, then the output would be a classification of the input text into sentiment categories. And so on.
The NLP workflow
Modern NLP is a mixed discipline that draws on linguistics, computer science and machine learning. The process, or workflow, that NLP uses has three broad steps:
Step 1: Text pre-processing
Step 2: Text representation
Step 3: Analysis and modeling
Each step may use a range of techniques which are constantly evolving with continued research.
Step 1: Text pre-processing
The first step is to prepare the input text so that it can be analyzed more easily. This part of NLP is well established and draws on a range of traditional linguistic methods.
Some of the key approaches used in this step are:
- Tokenization, which breaks up text into useful units (tokens). This separates words using blank spaces, for instance, or separates sentences using full stops. Tokenization also recognizes words that often go together, such as "New York" or "machine learning". As an example, the tokenization of the sentence "Customer service couldn't be better" would result in the following tokens: "customer service", "could", "not", "be" and "better".
- Normalization, which transforms words to their base form using techniques like stemming and lemmatization. This is done to help reduce "noise" and simplify the analysis. Stemming identifies the stems of words by removing their suffixes. The stem of the word "studies", for instance, is "studi". Lemmatization similarly removes suffixes, but also removes prefixes if required and results in words that are normally used in natural language. The lemma of the word "studies", for instance, is "study". In most applications, lemmatization is preferred to stemming as the resulting words have more meaning in natural speech.
- Part-of-speech (POS) tagging, which draws on morphology, or the study of inter-relationships between words. Words (or tokens) are tagged based on their function in sentences. This is done by using established rules from text corpora to identify the purpose of words in speech, i.e. verb, noun, adjective etc.
- Parsing, which draws on syntax, or the understanding of how words and sentences fit together. This helps to understand the structure of sentences and is done by breaking down sentences into phrases based on the rules of grammar. A phrase may contain a noun and an article, such as "my rabbit", or a verb as in "likes to eat carrots".
- Semantics, which identifies the intended meaning of words used in sentences. Words can have more than one meaning. For example, "pass" can mean (i) to physically hand over something, (ii) a decision to not take part in something, or (iii) a measure of success in an exam. A word's meaning can be understood better by looking at the words that appear before and after it.
Step 2: Text representation
In order for text to be analyzed using machine learning and deep learning methods, it needs to be converted into numbers. This is the purpose of text representation.
Some key methods used in this step are:
Bag of words
Bag of words, or BoW, is an approach that represents text by counting how many times each word in a known list of reference words (the vocabulary) occurs in an input document.
The result is a set of vectors that contain numbers depicting how many times each word occurs. These vectors are called "bags" as they don't include any information about the structure of the input documents.
To illustrate how BoW works, consider the sample sentence "the cat sat on the mat". This contains the words "the", "cat", "sat", "on" and "mat". The frequency of occurrence of these words can be represented by a vector of the form [2, 1, 1, 1, 1]. Here, the word "the" occurs twice and the other words occur once.
When compared with a large vocabulary, the vector will expand to include many zeros. This is because all of the words in the vocabulary which aren't contained in the sample sentence will have zero frequencies against them. The resulting vector may contain a large number of zeros and hence is referred to as a "sparse vector".
The BoW approach is fairly straightforward and easy to understand. The resulting sparse vectors, however, can be very large when the vocabulary is large. This leads to computationally challenging vectors that don't contain much information (i.e. are mostly zeros).
Further, BoW looks at individual words, so any information about words that go together is not captured. This results in a loss of context for later analysis.
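The counting described above can be sketched as follows. This is a minimal illustration that assumes whitespace tokenization; the extra vocabulary words in the second call are hypothetical and only there to show how the vector becomes sparse.

```python
from collections import Counter


def bag_of_words(text: str, vocabulary: list[str]) -> list[int]:
    """Count how often each vocabulary word occurs in text."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocabulary]


sentence = "the cat sat on the mat"
small_vocab = ["the", "cat", "sat", "on", "mat"]
print(bag_of_words(sentence, small_vocab))  # [2, 1, 1, 1, 1]

# A larger (hypothetical) vocabulary yields a sparser vector.
large_vocab = small_vocab + ["dog", "ran", "under", "table"]
print(bag_of_words(sentence, large_vocab))  # [2, 1, 1, 1, 1, 0, 0, 0, 0]
```

Note that the word order of the sentence plays no role in the result, which is the loss of structure the text refers to.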
Bag of n-grams
One way of reducing the loss of context with BoW is to create vocabularies of grouped words rather than single words. These grouped words are referred to as "n-grams", where "n" is the grouping size. The resulting approach is called "bag of n-grams" (BNG).
The advantage of BNG is that each n-gram captures more context than single words.
In the earlier sample sentence, "sat on" and "the mat" are examples of 2-grams, and "on the mat" is an example of a 3-gram.
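A short sketch of how the n-grams of the sample sentence can be enumerated, assuming whitespace tokenization (the function name is illustrative):

```python
def ngrams(text: str, n: int) -> list[str]:
    """Return every run of n consecutive words in text."""
    words = text.lower().split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


sentence = "the cat sat on the mat"
print(ngrams(sentence, 2))
# ['the cat', 'cat sat', 'sat on', 'on the', 'the mat']
print(ngrams(sentence, 3))
# ['the cat sat', 'cat sat on', 'sat on the', 'on the mat']
```

The resulting n-grams can then be counted against an n-gram vocabulary in the same way single words are counted in BoW.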
TF-IDF
One issue with counting the number of times a word appears in documents is that certain words, like "the", "a" or "it", start to dominate the count. These words tend to occur frequently but don't contain much information.
One way to deal with this is to treat words that appear frequently across documents differently to words that appear rarely. The words appearing frequently tend to be low-value words like "the". The counts of these words can be penalized to help reduce their dominance.
This approach is called "term frequency-inverse document frequency", or TF-IDF. Term frequency looks at the frequency of a word in a given document, while the inverse document frequency looks at how rare the word is across all documents.
The TF-IDF approach acts to downplay frequently occurring words and highlight more unique words that have useful information, such as "cat" or "mat". This can lead to better results.
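The weighting can be illustrated with one common textbook variant, in which term frequency is a word's count divided by the document length and inverse document frequency is the logarithm of the number of documents divided by the number of documents containing the word. The three-document corpus below is a hypothetical example, and libraries typically use smoothed variants of this formula.

```python
import math
from collections import Counter

# Hypothetical three-document corpus for illustration.
documents = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "the bird flew over the house",
]
tokenized = [doc.split() for doc in documents]


def tf_idf(word: str, doc_words: list[str], corpus: list[list[str]]) -> float:
    """Term frequency in one document times inverse document frequency."""
    tf = Counter(doc_words)[word] / len(doc_words)
    docs_with_word = sum(1 for words in corpus if word in words)
    if docs_with_word == 0:
        return 0.0
    idf = math.log(len(corpus) / docs_with_word)
    return tf * idf


for word in ["the", "sat", "cat"]:
    print(word, round(tf_idf(word, tokenized[0], tokenized), 3))
# the 0.0
# sat 0.068
# cat 0.183
```

Because "the" appears in every document its weight drops to zero, while "cat", which appears in only one document, receives the highest weight of the three.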
Word embedding
A more sophisticated approach to text representation involves word embedding. This maps each word to an individual vector, where the vectors tend to be "dense" rather than "sparse" (i.e. smaller and with fewer zeros). Each word and the words surrounding it are considered in the mapping process. The resulting dense vectors allow for a better analysis and comparison between words and their context.
Word embedding approaches use powerful machine learning and deep learning to perform the mapping. It is an evolving area which has produced some excellent results. Key algorithms in use today include Word2Vec, GloVe and FastText.
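Once words are mapped to dense vectors, they are often compared using cosine similarity. The sketch below shows only that comparison step; the three-dimensional vectors are made up for illustration and are not the output of Word2Vec, GloVe or FastText.

```python
import math

# Made-up 3-dimensional vectors for illustration only; real embeddings
# are learned from data and usually have hundreds of dimensions.
embeddings = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.8, 0.9, 0.2],
    "car": [0.1, 0.2, 0.9],
}


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


print(round(cosine_similarity(embeddings["cat"], embeddings["dog"]), 3))
print(round(cosine_similarity(embeddings["cat"], embeddings["car"]), 3))
```

With these invented values, "cat" scores closer to "dog" than to "car", which is the kind of comparison dense vectors make possible.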
Step 3: Analysis and modeling
The final step in the NLP process is to perform calculations on the vectors generated through steps 1 and 2, to produce the desired outcomes. Here, machine learning and deep learning methods are used. Many of the same machine learning techniques from non-NLP domains, such as image recognition or fraud detection, may be used in this analysis.
Consider sentiment analysis. This can be done using either supervised or unsupervised machine learning. Supervised machine learning requires pre-labeled data, while unsupervised machine learning uses pre-prepared databases of curated words (lexicons) to help with classifying sentiment.
Using machine learning, input text vectors are classified using a probabilistic approach. This is done either through a trained model (supervised machine learning) or by comparison with a suitable lexicon (unsupervised machine learning).
The outcomes are sentiment classifications based on the probabilities generated through the machine learning process.
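As a rough illustration of the supervised route, the sketch below trains a multinomial naive Bayes classifier on word counts. Naive Bayes is just one common choice of probabilistic model; the text does not name a specific algorithm, and the four labeled sentences are invented for the example.

```python
import math
from collections import Counter, defaultdict

# Tiny hand-labeled training set, invented for illustration.
training_data = [
    ("i love this product", "positive"),
    ("great service and happy staff", "positive"),
    ("i hate the long wait", "negative"),
    ("angry about the poor service", "negative"),
]


def train(examples):
    """Collect per-class word counts and class frequencies."""
    word_counts = defaultdict(Counter)
    class_counts = Counter()
    for text, label in examples:
        class_counts[label] += 1
        word_counts[label].update(text.split())
    vocabulary = {w for counts in word_counts.values() for w in counts}
    return word_counts, class_counts, vocabulary


def predict(text, word_counts, class_counts, vocabulary):
    """Return the class with the highest log-probability for text."""
    total_docs = sum(class_counts.values())
    scores = {}
    for label in class_counts:
        total_words = sum(word_counts[label].values())
        score = math.log(class_counts[label] / total_docs)
        for word in text.split():
            if word not in vocabulary:
                continue  # ignore words never seen in training
            # Add-one (Laplace) smoothing avoids zero probabilities.
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocabulary)))
        scores[label] = score
    return max(scores, key=scores.get)


model = train(training_data)
print(predict("happy with the great product", *model))  # positive
print(predict("i hate poor service", *model))           # negative
```

A realistic system would need far more labeled data and proper pre-processing; the point here is only that the class is chosen by comparing probabilities estimated from labeled examples.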
Conclusion
NLP is developing rapidly and is having an increasing impact on society. From language translation to speech recognition, and from chatbots to identifying sentiment, NLP is providing valuable insights and making our lives more productive.
Modern NLP works by using linguistics, computer science and machine learning. Over recent years, NLP has produced results that far surpass what we've seen in the past.
The basic workflow of NLP involves text pre-processing, text representation and analysis. A variety of techniques are in use today and more are being developed with ongoing research.
NLP promises to revolutionize many areas of industry and consumer practice. It has already become a familiar part of our daily lives.
With NLP, we have a powerful way of engaging with a digital future through a medium we are inherently comfortable with: our ability to communicate through natural language.
Translated from: https://towardsdatascience.com/natural-language-processing-a-simple-explanation-7e6379085a50
• Natural language understanding and natural language processing
by Mariya Yao
4 Approaches To Natural Language Processing & Understanding
In 1971, Terry Winograd wrote the SHRDLU program while completing his PhD at MIT.
SHRDLU features a world of toy blocks where the computer translates human commands into physical actions, such as "move the red pyramid next to the blue cube."
To succeed at such tasks, the computer must build up semantic knowledge iteratively, a process Winograd found to be brittle and limited.
The rise of chatbots and voice-activated technologies has renewed fervor in natural language processing (NLP) and natural language understanding (NLU) techniques that can produce satisfying human-computer dialogs.
Unfortunately, academic breakthroughs have not yet translated into improved user experience. Gizmodo writer Darren Orf declared Messenger chatbots "frustrating and useless" and Facebook admitted a 70% failure rate for their highly anticipated conversational assistant, "M."
Nevertheless, researchers forge ahead with new plans of attack, occasionally revisiting the same tactics and principles Winograd tried in the 70s.
OpenAI recently leveraged reinforcement learning to teach agents to design their own language by "dropping them into a set of simple worlds, giving them the ability to communicate, and then giving them goals that can be best achieved by communicating with other agents." The agents independently developed a simple "grounded" language.
MIT Media Lab presents this satisfying clarification on what "grounded" means in the context of language:
"Language is grounded in experience. Unlike dictionaries which define words in terms of other words, humans understand many basic words in terms of associations with sensory-motor experiences. People must interact physically with their world to grasp the essence of words like "red," "heavy," and "above." Abstract words are acquired only in relation to more concretely grounded terms. Grounding is thus a fundamental aspect of spoken language, which enables humans to acquire and to use words and sentences in context."
The antithesis of grounded language is inferred language. Inferred language derives meaning from words themselves rather than what they represent.
When trained only on large corpora of text, but not on real-world representations, statistical methods for NLP and NLU lack true understanding of what words mean.
OpenAI points out that such approaches share the weaknesses revealed by John Searle's famous Chinese Room thought experiment. Equipped with a universal dictionary to map all possible Chinese input sentences to Chinese output sentences, anyone can perform a brute force lookup and produce conversationally acceptable answers without understanding what they're actually saying.
Why Is Language So Complex?
Percy Liang, a Stanford CS professor and NLP expert, breaks down the various approaches to NLP / NLU into four distinct categories:
- Distributional
- Frame-based
- Model-theoretical
- Interactive learning
First, a brief linguistics lesson before we continue on to define and describe those categories.
There are three levels of linguistic analysis:
- Syntax: what is grammatically correct?
- Semantics: what is the meaning?
- Pragmatics: what is the purpose or goal?
Drawing upon a programming analogy, Liang likens successful syntax to "no compiler errors," semantics to "no implementation bugs," and pragmatics to "implemented the right algorithm."
He highlights that sentences can have the same semantics, yet different syntax, such as "3+2" versus "2+3". Similarly, they can have identical syntax yet different semantics; for example, 3/2 is interpreted differently in Python 2.7 vs Python 3.
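Liang's two arithmetic examples can be checked directly. The snippet assumes a Python 3 interpreter, where floor division is spelled `//`.

```python
# Same semantics, different syntax: both expressions evaluate to 5.
print(3 + 2 == 2 + 3)  # True

# Same syntax, different semantics across language versions:
# in Python 3 the "/" operator performs true division,
# whereas Python 2.7 performed integer (floor) division on two ints.
print(3 / 2)   # 1.5 under Python 3
print(3 // 2)  # 1, the result Python 2.7 gave for 3/2
```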
Ultimately, pragmatics is key, since language is created from the need to motivate an action in the world. If you implement a complex neural network to model a simple coin flip, you have excellent semantics but poor pragmatics, since there is a plethora of easier and more efficient approaches to solve the same problem.
Plenty of other linguistics terms exist which demonstrate the complexity of language. Words take on different meanings when combined with other words, such as "light" versus "light bulb" (that is, multi-word expressions), or when used in various sentences such as "I stepped into the light" and "the suitcase was light" (polysemy).
Hyponymy shows how a specific instance is related to a general term (a cat is a mammal) and meronymy denotes that one term is a part of another (a cat has a tail). Such relationships must be understood to perform the task of textual entailment, recognizing when one sentence is logically entailed in another. "You're reading this article" entails the sentence "you can read."
Aside from complex lexical relationships, your sentences also involve beliefs, conversational implicatures, and presuppositions. Liang provides excellent examples of each. Superman and Clark Kent are the same person, but Lois Lane believes Superman is a hero while Clark Kent is not.
If you say "Where is the roast beef?" and your conversation partner replies "Well, the dog looks happy", the conversational implicature is that the dog ate the roast beef.
Presuppositions are background assumptions that are true regardless of the truth value of a sentence. "I have stopped eating meat" has the presupposition "I once ate meat" even if you inverted the sentence to "I have not stopped eating meat."
Adding to the complexity are vagueness, ambiguity, and uncertainty. Uncertainty is when you see a word you don't know and must guess at the meaning.
If you're stalking a crush on Facebook and their relationship status says "It's Complicated", you already understand vagueness. Richard Socher, Chief Scientist at Salesforce, gave an excellent example of ambiguity at a recent AI conference: "The question 'can I cut you?' means very different things if I'm standing next to you in line or if I am holding a knife."
Now that you're more enlightened about the myriad challenges of language, let's return to Liang's four categories of approaches to semantic analysis in NLP and NLU.
1: Distributional Approaches
Distributional approaches include the large-scale statistical tactics of machine learning and deep learning. These methods typically turn content into word vectors for mathematical analysis and perform quite well at tasks such as part-of-speech tagging (is this a noun or a verb?), dependency parsing (does this part of a sentence modify another part?), and semantic relatedness (are these different words used in similar ways?). These NLP tasks don't rely on understanding the meaning of words, but rather on the relationship between words themselves.
Such systems are broad, flexible, and scalable. They can be applied widely to different types of text without the need for hand-engineered features or expert-encoded domain knowledge. The downside is that they lack true understanding of real-world semantics and pragmatics. Comparing words to other words, or words to sentences, or sentences to sentences can all result in different outcomes.
Semantic similarity, for example, does not mean synonymy. A nearest neighbor calculation may even deem antonyms as related.
Advanced modern neural network models, such as the end-to-end attentional memory networks pioneered by Facebook or the joint multi-task model invented by Salesforce, can handle simple question-answering tasks, but are still in early pilot stages for consumer and enterprise use cases.
Thus far, Facebook has only publicly shown that a neural network trained on an absurdly simplified version of The Lord of The Rings can figure out where the elusive One Ring is located.
Although distributional methods achieve breadth, they cannot handle depth. Complex and nuanced questions that rely on linguistic sophistication and contextual world knowledge have yet to be answered satisfactorily.
2：基于框架的方法 (2: Frame-Based Approach)
“A frame is a data-structure for representing a stereotyped situation,” explains Marvin Minsky in his seminal 1974 paper “A Framework For Representing Knowledge.” Think of frames as a canonical representation for which specifics can be interchanged.
Liang provides the example of a commercial transaction as a frame. In such situations, you typically have a seller, a buyer, goods being exchanged, and an exchange price.
Sentences that are syntactically different but semantically identical, such as “Cynthia sold Bob the bike for $200” and “Bob bought the bike for $200 from Cynthia”, can be fit into the same frame. Parsing then entails first identifying the frame being used, then populating the specific frame parameters, such as Cynthia and $200.
The obvious downside of frames is that they require supervision. In some domains, an expert must create them, which limits the scope of frame-based approaches. Frames are also necessarily incomplete. Sentences such as “Cynthia visited the bike shop yesterday” and “Cynthia bought the cheapest bike” cannot be adequately analyzed with the frame we defined above.
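A minimal sketch of frame filling for the commercial-transaction example follows. It assumes two hand-written surface patterns and hypothetical role names (`seller`, `buyer`, `goods`, `price`); a real frame-based parser would use a far richer grammar. It shows both points made above: two different syntaxes map to one frame, and sentences outside the expert-written patterns are simply not covered.

```python
import re

# Hypothetical surface patterns for one frame, "commercial transaction".
# Each pattern maps a different syntax onto the same named roles.
TRANSACTION_PATTERNS = [
    re.compile(
        r"^(?P<seller>\w+) sold (?P<buyer>\w+) the (?P<goods>\w+) "
        r"for \$(?P<price>\d+)$"
    ),
    re.compile(
        r"^(?P<buyer>\w+) bought the (?P<goods>\w+) "
        r"for \$(?P<price>\d+) from (?P<seller>\w+)$"
    ),
]


def parse_transaction(sentence):
    """Return the filled frame as a dict, or None if no pattern matches."""
    text = sentence.strip().rstrip(".")
    for pattern in TRANSACTION_PATTERNS:
        match = pattern.match(text)
        if match:
            roles = match.groupdict()
            roles["price"] = int(roles["price"])
            return {"frame": "commercial_transaction", **roles}
    return None


# Both sentences fill the same frame with the same role values.
print(parse_transaction("Cynthia sold Bob the bike for $200"))
print(parse_transaction("Bob bought the bike for $200 from Cynthia"))

# Sentences outside the hand-written patterns are not covered.
print(parse_transaction("Cynthia visited the bike shop yesterday"))
```

Every new phrasing needs another pattern written by hand, which is the supervision cost described above.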
3: Model-Theoretical Approach
The third category of semantic analysis falls under the model-theoretical approach. To understand this approach, we’ll introduce two important linguistic concepts: “model theory” and “compositionality”.
Model theory refers to the idea that sentences refer to the world, as is the case with grounded language (e.g. “the block is blue”). In compositionality, the meanings of the parts of a sentence can be combined to deduce the meaning of the whole.
Liang compares this approach to turning language into computer programs. To determine the answer to the query “what is the largest city in Europe by population”, you first have to identify the concepts of “city” and “Europe” and narrow your search space to cities contained in Europe. Then you need to compare the population of each shortlisted city and return the one with the largest value.
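The query above can be sketched as a small composed program. This is an illustration under stated assumptions, not Liang's implementation: the `WORLD` table is a hypothetical toy model with placeholder city names and made-up population figures, and the helper names (`of_type`, `contained_in`, `argmax`, `compose`) are invented for this example. Each helper corresponds to one part of the sentence, and the meaning of the whole query is their composition.

```python
# Hypothetical toy "world model"; names and populations are placeholders,
# not real statistics.
WORLD = [
    {"name": "Alphaville", "type": "city", "continent": "Europe", "population": 900},
    {"name": "Betatown", "type": "city", "continent": "Europe", "population": 1500},
    {"name": "Gammaburg", "type": "city", "continent": "Asia", "population": 4000},
    {"name": "Delta River", "type": "river", "continent": "Europe", "population": 0},
]


def of_type(kind):
    """Meaning of a noun such as "city": keep entities of that type."""
    return lambda entities: [e for e in entities if e["type"] == kind]


def contained_in(continent):
    """Meaning of "in Europe": keep entities located on that continent."""
    return lambda entities: [e for e in entities if e["continent"] == continent]


def argmax(key):
    """Meaning of "largest ... by <key>": pick the entity maximizing key."""
    return lambda entities: max(entities, key=lambda e: e[key])


def compose(*steps):
    """Combine the partial meanings into one executable program."""
    def program(entities):
        result = entities
        for step in steps:
            result = step(result)
        return result
    return program


largest_city_in_europe = compose(
    of_type("city"), contained_in("Europe"), argmax("population")
)
print(largest_city_in_europe(WORLD)["name"])
```

Executing the program against the world model is what gives the sentence its answer; a different world model would yield a different answer from the same program.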
Executing the sentence “Remind me to buy milk after my last meeting on Monday” requires a similar compositional breakdown and recombination.
Models vary from needing heavy-handed supervision by experts to light supervision from average humans on Mechanical Turk. The advantages of model-based methods include full-world representation, rich semantics, and end-to-end processing, which enable such approaches to answer difficult and nuanced search queries.
The major con is that the applications are heavily limited in scope due to the need for hand-engineered features. Applications of model-theoretic approaches to NLU generally start from the easiest, most contained use cases and advance from there.
The holy grail of NLU is both breadth and depth, but in practice you need to trade off between them. Distributional methods have scale and breadth, but shallow understanding. Model-theoretical methods are labor-intensive and narrow in scope. Frame-based methods lie in between.
4: Interactive Learning Approaches
Paul Grice, a British philosopher of language, described language as a cooperative game between speaker and listener. Liang is inclined to agree. He believes that a viable approach to tackling both breadth and depth in language learning is to employ interactive environments where humans teach computers gradually. In such approaches, the pragmatic needs of language inform the development.
To test this theory, Liang developed SHRDLURN as a modern-day version of Winograd’s SHRDLU. In this interactive language game, a human must instruct a computer to move blocks from a starting orientation to an end orientation. The challenge is that the computer starts with no concept of language. Step by step, the human says a sentence and then visually indicates to the computer what the result of the execution should look like.
If a human plays well, he or she adopts consistent language that enables the computer to rapidly build a model of the game environment and map words to colors or positions. The surprising result is that any language will do, even individually invented shorthand notation, as long as you are consistent.
The worst players, who take the longest to train the computer, often employ inconsistent terminology or illogical steps.
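The role of consistency can be sketched with a toy learner. This is not the method used in Liang's system; it is a deliberately simple co-occurrence counter, and the class name `WordGroundingLearner`, its methods, and the invented words "blik" and "dax" are all assumptions made for illustration. The learner starts with no vocabulary, sees an utterance together with the color the human indicates, and later guesses the color for a new utterance.

```python
from collections import Counter, defaultdict


class WordGroundingLearner:
    """Toy learner that maps words to colors from (utterance, result) pairs."""

    def __init__(self):
        # counts[token][color] = how often token was heard when color was shown
        self.counts = defaultdict(Counter)

    def teach(self, utterance, shown_color):
        """Record one round: the human speaks, then shows the result."""
        for token in utterance.lower().split():
            self.counts[token][shown_color] += 1

    def guess(self, utterance):
        """Return the most likely color, or None if every word is unknown."""
        scores = Counter()
        for token in utterance.lower().split():
            seen = self.counts.get(token)
            if not seen:
                continue
            total = sum(seen.values())
            for color, n in seen.items():
                # Each word votes with its relative frequency per color, so a
                # word used with every color ("take") carries little weight.
                scores[color] += n / total
        if not scores:
            return None
        return max(scores, key=scores.get)


learner = WordGroundingLearner()
# Invented shorthand works as long as it is used consistently.
learner.teach("take blik", "red")
learner.teach("take dax", "blue")
learner.teach("take blik", "red")
print(learner.guess("take dax"))
```

If the human instead used "blik" for red in one round and for blue in the next, the votes would cancel out and the learner would need many more rounds, which mirrors the observation about inconsistent players.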
Liang’s bet is that such approaches would enable computers to solve NLP and NLU problems end-to-end without explicit models. “Language is intrinsically interactive,” he adds. “How do we represent knowledge, context, memory? Maybe we shouldn’t be focused on creating better models, but rather better environments for interactive learning.”
Language is both logical and emotional. We use words to describe both math and poetry. Accommodating the wide range of our expressions in NLP and NLU applications may entail combining the approaches outlined above, ranging from the distributional / breadth-focused methods to model-based systems to interactive learning environments.
We may also need to rethink our approaches entirely, using interactive, human-computer cooperative learning rather than researcher-driven models.
If you have a spare hour and a half, I highly recommend you watch Percy Liang’s entire talk, on which this summary article is based:
A special thanks to Melissa Fabros for recommending Percy’s talk, Matthew Kleinsmith for highlighting the MIT Media Lab definition of “grounded” language, and Jeremy Howard and Rachel Thomas of fast.ai for facilitating our connection and conversation.
If you enjoyed my article, join the TOPBOTS community and get the best bot news and exclusive industry content.
Translated from: https://www.freecodecamp.org/news/how-natural-language-processing-powers-chatbots-4-common-approaches-a077a4de04d4/
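• Natural language is a product of human intelligence. Natural language processing (NLP) is one of the hardest problems in AI, and natural language understanding (NLU) has become a major problem in its own right, full of fascination and challenge. As a mere programmer I am in no position to do research in this area, but getting to know some of its basic concepts, for...
• This course is an introductory video series on natural language understanding within artificial intelligence. It introduces Python toolkits for natural language processing and how to use NLP methods. The course is aimed at students with a foundation in Python programming; building on that foundation, studying this video course...
• At the start of this article I must point out that there are far too many differing definitions and theories of natural language understanding online. What I give here is an overview that I personally find easy to follow and that sorts out the various subfields; if anything is wrong, please point it out. Natural language understanding is the hope that...
• Artificial Intelligence (AI), Chapter 10: Natural Language Understanding. Agent. Definition of an agent. Definition 1: an individual in a society that can arrive at the solution to a problem through negotiation is an agent (Minsky, 1986). Definition 2: something that perceives its environment through sensors and, through actuators, acts on that...
• Natural Language Understanding (2nd Edition)
• Natural Language Understanding: Syntactic Parsing Algorithms.ppt
• Artificial Intelligence. NLP: 1 Natural Language Understanding. The Principles of AI. NLP: 2 Main topics of this chapter: general problems of natural language understanding, lexical analysis, syntactic analysis, semantic analysis, processing of large-scale real-world text, Web information extraction...
• A natural language understanding (NLU) system is the cornerstone of higher-level applications such as question answering systems and chatbots. Basic NLU tooling covers two tasks: entity recognition and intent recognition.
• Contents: Chapter 1 Introduction; Chapter 2 Knowledge Representation; Chapter 3 Search Techniques; Chapter 4 Reasoning Techniques; Chapter 5 Machine Learning; Chapter 6 Expert Systems; Chapter 7 Automated Planning Systems; Chapter 8 Natural Language Understanding; Chapter 9 Intelligent Control; Chapter 10 Artificial Intelligence Programming. 8.1 General problems of language and its understanding. 8.1.1...
• These are the lecture notes for "Natural Language Understanding" by Zong Chengqing of the Institute of Automation, Chinese Academy of Sciences.
• Introduction to Artificial Intelligence: Natural Language Understanding. Language is the tool humans use to communicate, an achievement of the progress of human civilization, and the carrier of human thought, so understanding and processing language is very important. In computing, natural language understanding and processing means having computers understand and generate human language, and thereby...
• Natural language understanding (NLU) technology covers a very broad range of areas, including sentence detection, word segmentation, part-of-speech tagging, syntactic parsing, text classification/clustering, text perspective, information extraction/automatic summarization, machine translation, automatic question answering, text generation, and many more...
• The complete courseware for Zong Chengqing's natural language understanding course; this is probably newer than anything currently available online...
• An entity-attribute framework of a compilation model for natural language understanding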
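• ## NLU: Natural Language Understanding

Thousands of reads 2016-07-24 22:52:10
Natural language understanding (NLU) technology covers a very broad range of areas, including sentence detection, word segmentation, part-of-speech tagging, syntactic parsing, text classification/clustering, text perspective, information extraction/automatic summarization, machine translation, automatic question answering, text generation, and many more...
• 1) Natural language should not be treated as mere data; it should rather be seen as the "program" of the machine that is the human brain. (Many current methods overlook the control function of language and see only... The process of natural language understanding: language → translated through one's existing knowledge system and preferences → based on the corresponding...
• Natural language understanding is Natural Language Understanding, abbreviated NLU. (Figure 1) (Figure 2) 1. In summary, NLP covers, in addition to NLU (the part boxed in red in the figure), the processing stages before understanding and the application stages after it. In other words, NLU is a subset of NLP; they...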

...