• Instead of just fixing this specific issue, might it be possible to make the formatter or its callers more robust? I would prefer unformatted SQL to an exception. <p>Full code: </p><pre><code>...
• t let you use reserved words like delete as identifier. I'm pretty sure that's the correct behavior too. <p>I'm just learning my way around your new 'types' feature so I'd ...
• <div><p>While doing some general code cleanup on Npgsql, I came across a bit of strangeness in the PostGIS types; the types seem to be neither truly mutable, nor truly immutable. <p>For example, ...
• <div><p>In the Netherlands we have surnames which consist of multiple words. For example, "van den Berg" is a really common name. So common, it has it's ...
• <p>I have my nest hooked up but only after adding this skill, so echo is able to work with extra temperature words via the cloud, i.e. "set lounge to 20 degrees". Lounge being the device. <p>...
• In other words, it would be a Viper consistency error or a Java exception if not for the fold-unfold pass. <p>It is very easy to encode all unsupported types using the <em>same</em> abstract ...
• words, with single word Search Strings it is ok) 2. 3. What is the expected output? What do you see instead? If you put "The Beatles" Type "Music" File Types mp3 and wma in Google ...
• <div><p>I have been using the Visual Bag of Words to identify different types of standard scanned documents (the goal being to sort the 4 different with the possibility of using this classification to...
• s not on the safelist of known native error types</li><li>Grab the tamper-with-able <code>.constructor</code> property and compare that to a safelist</li><li>Work with the JS spec folks to store the ...
• <div><p>I find myself using things like <code>qq-enum-networks-masscan</code> but having trouble remembering exactly the set of words to go between <code>qq</code> and <code>masscan</code>. Equally, ...
In tasks such as information retrieval, document clustering, word-sense disambiguation, machine translation and text summarization, it is essential to measure the similarity between words, sentences, paragraphs and documents. This post discusses three types of text similarity approaches: string-based, corpus-based and knowledge-based. It also shows example implementations of some of these approaches using Python libraries.

String-Based Similarity

A string similarity or distance measure takes into account the degree to which two strings match each other.
String-based similarity can be further classified into character-based similarity measures and term-based similarity measures.
Character-Based Similarity Measures
The longest common substring (LCS) is a common example of a character-based similarity measure.
The Longest Common Substring (LCS) algorithm finds the longest contiguous chain of characters that exists in both strings.
from difflib import SequenceMatcher

def longestSubstring(str1, str2):
    # Return the longest contiguous block of characters shared by both strings
    seqMatch = SequenceMatcher(None, str1, str2)
    match = seqMatch.find_longest_match(0, len(str1), 0, len(str2))
    if match.size != 0:
        return str1[match.a: match.a + match.size]
    return None

sent1 = "It might help to study nlp if possible."
sent2 = "It can help to play football again if possible."
sent_1_2 = longestSubstring(sent1, sent2)
# The match starts with a space, so strip it before printing
print('longest substring between sent1 and sent2 :', sent_1_2.strip())

The output:
longest substring between sent1 and sent2 : if possible.
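
To make the idea of a "maximum-length contiguous chain" concrete, here is a minimal dynamic-programming sketch of the longest common substring. It is not how difflib works internally; the function name longest_common_substring is an assumption made for this illustration.

```python
def longest_common_substring(a: str, b: str) -> str:
    """Return the longest contiguous substring present in both a and b."""
    best_len = 0   # length of the best match found so far
    best_end = 0   # end index (exclusive) of that match in a
    # prev[j] = length of the common suffix of a[:i-1] and b[:j]
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the common suffix that ended one character earlier
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len = curr[j]
                    best_end = i
        prev = curr
    return a[best_end - best_len:best_end]


sent1 = "It might help to study nlp if possible."
sent2 = "It can help to play football again if possible."
print(repr(longest_common_substring(sent1, sent2)))
```

The table costs O(len(a) * len(b)) time; keeping only two rows keeps memory proportional to the shorter dimension of the loop.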

Another example of a character-based similarity measure is the Levenshtein edit distance. It defines the distance between two strings as the minimum number of single-character operations (insertion, deletion or substitution) needed to transform one string into the other. The Damerau-Levenshtein variant additionally counts a transposition of two adjacent characters as one operation; nltk.edit_distance enables it with transpositions=True, which is off by default.
import nltk

sent1 = "It might help to study nlp if possible."
sent2 = "It can help to play football again if possible."
# Levenshtein distance with unit costs (transpositions are not counted by default)
sent_1_2 = nltk.edit_distance(sent1, sent2)
print(sent_1_2, 'Edit Distance between sent1 and sent2')

The output:
22 Edit Distance between sent1 and sent2
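
For readers who want to see what the edit distance actually counts, the following is a minimal sketch of the classic dynamic-programming recurrence with unit costs. It is an illustration only, not NLTK's implementation, and the function name levenshtein is an assumption.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    # prev[j] = distance between the current prefix of a and b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute ca with cb (free if equal)
            ))
        prev = curr
    return prev[-1]


print(levenshtein("kitten", "sitting"))  # 3
```

Transpositions are deliberately left out here, so swapping two adjacent characters costs two operations in this sketch.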

Term-Based Similarity Measures
Cosine similarity measures the similarity between two vectors as the cosine of the angle between them.
Euclidean distance, or L2 distance, is the square root of the sum of squared differences between corresponding elements of the two vectors.
In the example that follows, each sentence is represented as a vector of term counts over a shared vocabulary.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity, euclidean_distances

def compute_vectors(*strs):
    # Build one term-count vector per input string over a shared vocabulary
    text = list(strs)
    vectorizer = CountVectorizer()
    return vectorizer.fit_transform(text).toarray()

def compute_cosine_sim(*strs):
    # Pairwise cosine similarity matrix between all input strings
    return cosine_similarity(compute_vectors(*strs))

def compute_euc_dis(*strs):
    # Pairwise Euclidean distance matrix between all input strings
    return euclidean_distances(compute_vectors(*strs))

sent1 = "It might help to study nlp if possible."
sent2 = "It can help to play football again if possible."
print("cosine_sim", compute_cosine_sim(sent1, sent2))
print("euclidean_dis", compute_euc_dis(sent1, sent2))

The output:
cosine_sim [[1.         0.58925565]
 [0.58925565 1.        ]]
euclidean_dis [[0.         2.64575131]
 [2.64575131 0.        ]]
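
The two definitions can also be worked through without scikit-learn. The sketch below is an illustration under simplifying assumptions: it tokenizes by lowercasing and splitting on whitespace, which is not the same tokenization CountVectorizer uses, and the helper names term_counts, cosine_sim and euclidean_dist are made up for this example.

```python
import math
from collections import Counter


def term_counts(text: str) -> Counter:
    """Term-frequency vector of a string (lowercased, whitespace tokens)."""
    return Counter(text.lower().split())


def cosine_sim(c1: Counter, c2: Counter) -> float:
    """Cosine of the angle between two term-count vectors."""
    dot = sum(c1[t] * c2[t] for t in c1.keys() & c2.keys())
    norm1 = math.sqrt(sum(v * v for v in c1.values()))
    norm2 = math.sqrt(sum(v * v for v in c2.values()))
    if norm1 == 0 or norm2 == 0:
        return 0.0  # convention chosen here for empty inputs
    return dot / (norm1 * norm2)


def euclidean_dist(c1: Counter, c2: Counter) -> float:
    """Square root of the summed squared differences over all terms."""
    # A Counter returns 0 for missing terms, so the union of keys is enough
    return math.sqrt(sum((c1[t] - c2[t]) ** 2 for t in c1.keys() | c2.keys()))


v1 = term_counts("red car")
v2 = term_counts("red bus")
print(cosine_sim(v1, v2), euclidean_dist(v1, v2))
```

Here the two vectors share one of their two terms, so the cosine is 1 / (sqrt(2) * sqrt(2)) = 0.5 and the distance is sqrt(2), since exactly two coordinates differ by one.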

Dice's coefficient is defined as twice the number of common terms in the two strings divided by the total number of terms in both strings.
Jaccard similarity is computed as the number of shared terms divided by the number of terms in the union of both strings.
The overlap coefficient divides the number of shared terms by the size of the smaller term set, so two strings count as a full match if one is a subset of the other.

def compute_jaccard_sim(str1, str2):
    # |A ∩ B| / |A ∪ B| over the sets of whitespace-separated terms
    a = set(str1.split())
    b = set(str2.split())
    c = a.intersection(b)
    return float(len(c)) / (len(a) + len(b) - len(c))

def compute_dice_sim(str1, str2):
    # 2 * |A ∩ B| / (|A| + |B|)
    a = set(str1.split())
    b = set(str2.split())
    c = a.intersection(b)
    return 2 * float(len(c)) / (len(a) + len(b))

def compute_overlap_sim(str1, str2):
    # |A ∩ B| / min(|A|, |B|)
    a = set(str1.split())
    b = set(str2.split())
    c = a.intersection(b)
    return float(len(c)) / min(len(a), len(b))

sent1 = "It might help to study nlp if possible."
sent2 = "It can help to play football again if possible."
print("jaccard:", compute_jaccard_sim(sent1, sent2))
print("dice:", compute_dice_sim(sent1, sent2))
print("overlap:", compute_overlap_sim(sent1, sent2))

The output :
jaccard: 0.4166666666666667
dice: 0.5882352941176471
overlap: 0.625

Corpus-Based Similarity
Corpus-based similarity determines the semantic similarity between words according to information gained from large corpora. Pointwise Mutual Information is an example of corpus-based similarity.
Pointwise Mutual Information - Information Retrieval (PMI-IR) is a method for computing the similarity between pairs of words. The more often two words co-occur near each other on a web page, the higher their PMI-IR similarity score.

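import nltk
from nltk.collocations import BigramCollocationFinder
from nltk.tokenize import word_tokenize  # requires the NLTK tokenizer data to be downloaded

text = "this is a foo bus red car foo bus bus blue car foo bar bar red car shep bus bus blue"
bigram_measures = nltk.collocations.BigramAssocMeasures()
finder = BigramCollocationFinder.from_words(word_tokenize(text))
# score_ngrams returns (bigram, score) pairs ordered from highest to lowest PMI
for i in finder.score_ngrams(bigram_measures.pmi):
    print(i)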

The output :
(('is', 'a'), 4.392317422778761)
(('this', 'is'), 4.392317422778761)
(('a', 'foo'), 2.8073549220576046)
(('car', 'shep'), 2.8073549220576046)
(('red', 'car'), 2.8073549220576046)
(('bar', 'bar'), 2.3923174227787607)
(('bar', 'red'), 2.3923174227787607)
(('car', 'foo'), 2.222392421336448)
(('shep', 'bus'), 2.0703893278913985)
(('bus', 'blue'), 2.070389327891398)
(('blue', 'car'), 1.8073549220576046)
(('foo', 'bar'), 1.8073549220576046)
(('foo', 'bus'), 1.485426827170242)
(('bus', 'red'), 1.070389327891398)
(('bus', 'bus'), 0.7484612330040363)
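
The scores above can be reproduced by hand, which shows what PMI measures: how much more often a bigram occurs than its two words would co-occur by chance. The sketch below assumes the normalizer is the total number of tokens, which appears consistent with the listed values (for example log2(21) ≈ 4.392 for a bigram whose words each occur once in this 21-token text). The function name bigram_pmi is an assumption for this illustration.

```python
import math
from collections import Counter


def bigram_pmi(tokens):
    """Map each adjacent word pair to log2(count(x, y) * N / (count(x) * count(y)))."""
    n = len(tokens)                             # N: total number of tokens
    unigrams = Counter(tokens)                  # count(x)
    bigrams = Counter(zip(tokens, tokens[1:]))  # count(x, y) for adjacent pairs
    return {
        (w1, w2): math.log2(count * n / (unigrams[w1] * unigrams[w2]))
        for (w1, w2), count in bigrams.items()
    }


text = "this is a foo bus red car foo bus bus blue car foo bar bar red car shep bus bus blue"
scores = bigram_pmi(text.split())
# Print bigrams from highest to lowest PMI
for pair, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(pair, score)
```

Rare words that always appear together score highest, while a frequent word such as "bus" pulls the score of its bigrams down.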

Knowledge-Based Similarity
Knowledge-based similarity measures the degree of similarity between words using information derived from semantic networks. WordNet is the most popular semantic network. It is a large lexical database of English in which nouns, verbs, adjectives and adverbs are grouped into sets of synonyms (synsets), each expressing a distinct concept.

Resnik similarity is the Information Content (IC) of the Least Common Subsumer, the lowest node in the hierarchy that is a hypernym of both input synsets.

Jiang-Conrath similarity is based on the IC of the Least Common Subsumer and that of the two input synsets. It is the inverse of a distance: the sum of the two synsets' IC minus twice the IC of the Least Common Subsumer.

Lin similarity uses the same three quantities as a ratio: twice the IC of the Least Common Subsumer divided by the sum of the two synsets' IC.

# retrieving IC of the Brown corpus (requires the NLTK 'wordnet' and 'wordnet_ic' data)
from nltk.corpus import wordnet_ic
from nltk.corpus import wordnet as wn
brown_ic = wordnet_ic.ic('ic-brown.dat')
# looking up noun words 'rat' and 'lion' using synset()
rat = wn.synset('rat.n.01')
lion = wn.synset('lion.n.01')
print("resnik:", rat.res_similarity(lion, brown_ic))
print("jc:", rat.jcn_similarity(lion, brown_ic))
print("lin:", rat.lin_similarity(lion, brown_ic))

The output (exact values depend on the IC file and the WordNet version):
resnik: 4.665415658815678
jc: 0.08207149300038069
lin: 0.5288091238271396
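
Since the three measures differ only in how they combine the same information-content values, a small sketch of the arithmetic may help. The IC numbers used here are hypothetical, chosen only to make the calculation easy to follow, and the function names are assumptions for this illustration. Returning infinity for identical concepts is a choice made in this sketch.

```python
import math


def resnik(ic_lcs: float) -> float:
    """Resnik: the IC of the least common subsumer itself."""
    return ic_lcs


def jiang_conrath(ic1: float, ic2: float, ic_lcs: float) -> float:
    """Jiang-Conrath: inverse of the distance IC(s1) + IC(s2) - 2 * IC(lcs)."""
    distance = ic1 + ic2 - 2 * ic_lcs
    # Assumption of this sketch: identical concepts get infinite similarity
    return math.inf if distance == 0 else 1 / distance


def lin(ic1: float, ic2: float, ic_lcs: float) -> float:
    """Lin: 2 * IC(lcs) / (IC(s1) + IC(s2)), a value between 0 and 1."""
    return 2 * ic_lcs / (ic1 + ic2)


# Hypothetical IC values for two synsets and their least common subsumer
ic_s1, ic_s2, ic_lcs = 8.0, 6.0, 4.0
print(resnik(ic_lcs))                        # 4.0
print(jiang_conrath(ic_s1, ic_s2, ic_lcs))   # 1 / 6
print(lin(ic_s1, ic_s2, ic_lcs))             # 8 / 14
```

Resnik ignores how specific the two input synsets are, whereas Jiang-Conrath and Lin both penalize pairs whose own IC is much higher than that of their common ancestor.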

References
A Survey of Text Similarity Approaches, Gomaa and Fahmy, International Journal of Computer Applications
NLTK WordNet HOWTO (http://www.nltk.org/howto/wordnet.html)
NLP course slides, IIT Gandhinagar (https://sites.google.com/a/iitgn.ac.in/nlp-2018/)


• s a question of how to represent an account that is derived from your main seed words vs from a wallet file. <h2>Proposal: Organized vaults <p>The account list section could be divided into sections. ...
• New behaviour is to search for any words. "avengers x-men" now finds all avengers <em>and</em> all x-men books. Too many results. The third behaviour should be to search for all words. "...
• <p>In other words: If we don't specify what we're sending, NCC will first try to guess what the type is, and then proceed accordingly if successful at doing so. <p>But if we DO specify what we...
• <p>Is it possible that there is some kind of dereferencing error in the compiled libsumojni.dll, in other words in the C++ world? The size of the StringVector ContinuationLanes is way too large. ...
• <p>I have a question on the word segmentation methods on the contexts of mentions. I noticed that you apply two different methods. In the first method, you split the words from <code>m['context'...
• <p>In my opinion, this definition of equality is wrong, because it does not include the type of the collection. I would consider this a bug. It is possible to create and use an own comparer or an own ...
• <p>In other words, lots of types. <p>Is there a possibility to extend the library in such a way that it can support any type? For example, I would like to put a time / date field in an appropriate ...
Value types

A value type is either a struct type or an enumeration type. C# provides a set of predefined struct types called the simple types. The simple types are identified through reserved words.

The System.ValueType type

All value types implicitly inherit from the class System.ValueType, which, in turn, inherits from class object. It is not possible for any type to derive from a value type, and value types are thus implicitly sealed (§10.1.1.2).

Note that System.ValueType is not itself a value-type. Rather, it is a class-type from which all value-types are automatically derived.

A struct type is a value type that can declare constants, fields, methods, properties, indexers, operators, instance constructors, static constructors, and nested types. The declaration of struct types is described in §11.1.

C# provides a set of predefined struct types called the simple types. The simple types are identified through reserved words, but these reserved words are simply aliases for predefined struct types in the System namespace, as described in the table below.

Reserved word    Aliased type
sbyte            System.SByte
byte             System.Byte
short            System.Int16
ushort           System.UInt16
int              System.Int32
uint             System.UInt32
long             System.Int64
ulong            System.UInt64
char             System.Char
float            System.Single
double           System.Double
bool             System.Boolean
decimal          System.Decimal

Because a simple type aliases a struct type, every simple type has members. For example, int has the members declared in System.Int32 and the members inherited from System.Object.

C# supports nine integral types: sbyte, byte, short, ushort, int, uint, long, ulong, and char. C# supports two floating point types: float and double.

The decimal type is a 128-bit data type suitable for financial and monetary calculations.

The bool type represents boolean logical quantities.

An enumeration type is a distinct type with named constants. Every enumeration type has an underlying type, which must be byte, sbyte, short, ushort, int, uint, long or ulong. The set of values of the enumeration type is the same as the set of values of the underlying type. Values of the enumeration type are not restricted to the values of the named constants. Enumeration types are defined through enumeration declarations.

A nullable type can represent all values of its underlying type plus an additional null value. A nullable type is written T?, where T is the underlying type. This syntax is shorthand for System.Nullable<T>, and the two forms can be used interchangeably.

A non-nullable value type conversely is any value type other than System.Nullable<T> and its shorthand T? (for any T), plus any type parameter that is constrained to be a non-nullable value type (that is, any type parameter with a struct constraint). The System.Nullable<T> type specifies the value type constraint for T (§10.1.5), which means that the underlying type of a nullable type can be any non-nullable value type. The underlying type of a nullable type cannot be a nullable type or a reference type. For example, int?? and string? are invalid types.

An instance of a nullable type T? has two public read-only properties:
· A HasValue property of type bool
· A Value property of type T

Reference types

Class type          Description
System.Object       The ultimate base class of all other types. See §4.2.2.
System.String       The string type of the C# language. See §4.2.4.
System.ValueType    The base class of all value types. See §4.1.1.
System.Enum         The base class of all enum types. See §14.
System.Array        The base class of all array types. See §12.
System.Delegate     The base class of all delegate types. See §15.
System.Exception    The base class of all exception types. See §16.

A reference type is a class type, an interface type, an array type, or a delegate type.

1.1.1 The object type

The object class type is the ultimate base class of all other types. Every type in C# directly or indirectly derives from the object class type.

The keyword object is simply an alias for the predefined class System.Object.

1.1.2 The dynamic type

The dynamic type, like object, can reference any object. When operators are applied to expressions of type dynamic, their resolution is deferred until the program is run. Thus, if the operator cannot legally be applied to the referenced object, no error is given during compilation. Instead an exception will be thrown when resolution of the operator fails at run-time.

The dynamic type is further described in §4.7, and dynamic binding in §7.2.2.

1.1.3 The string type

The string type is a sealed class type that inherits directly from object. Instances of the string class represent Unicode character strings.

Values of the string type can be written as string literals (§2.4.4.5).

The keyword string is simply an alias for the predefined class System.String.

1.1.4 Interface types

An interface defines a contract. A class or struct that implements an interface must adhere to its contract. An interface may inherit from multiple base interfaces, and a class or struct may implement multiple interfaces.

Interface types are described in §13.

1.1.5 Array types

An array is a data structure that contains zero or more variables which are accessed through computed indices. The variables contained in an array, also called the elements of the array, are all of the same type, and this type is called the element type of the array.

Array types are described in §12.

1.1.6 Delegate types

A delegate is a data structure that refers to one or more methods. For instance methods, it also refers to their corresponding object instances.

The closest equivalent of a delegate in C or C++ is a function pointer, but whereas a function pointer can only reference static functions, a delegate can reference both static and instance methods. In the latter case, the delegate stores not only a reference to the method's entry point, but also a reference to the object instance on which to invoke the method.

Delegate types are described in §15.
• <p>While and I were trying to solve a potential triggering problem, we began refactoring a small portion of <code>Words.pm</code>. <p>The triple ternary used to check for trigger matches has been ...
• Prior to this there was no mention of implicit types in the data model document, and the understanding (at least by many in the group if not all) was that all the types are explicit i.e. the data ...
11.1.3 Simple types

C# provides a set of predefined struct types called the simple types. The simple types are identified through reserved words, but these reserved words are simply aliases for predefined struct types in the System namespace, as described in the table below.

Reserved word    Aliased type
sbyte            System.SByte
byte             System.Byte
short            System.Int16
ushort           System.UInt16
int              System.Int32
uint             System.UInt32
long             System.Int64
ulong            System.UInt64
char             System.Char
float            System.Single
double           System.Double
bool             System.Boolean
decimal          System.Decimal

Because a simple type aliases a struct type, every simple type has members. [Example: For example, int has the members declared in System.Int32 and the members inherited from System.Object, and the following statements are permitted:

int i = int.MaxValue; // System.Int32.MaxValue constant
string s = i.ToString(); // System.Int32.ToString() instance method
string t = 123.ToString(); // System.Int32.ToString() instance method

end example] The simple types differ from other struct types in that they permit certain additional operations:

· Most simple types permit values to be created by writing literals (?.4.4). [Example: For example, 123 is a literal of type int and 'a' is a literal of type char. end example] C# makes no provision for literals of struct types in general, and non-default values of other struct types are ultimately always created through instance constructors of those struct types.
· When the operands of an expression are all simple type constants, the compiler evaluates the expression at compile-time. Such an expression is known as a constant-expression (?4.15). Expressions involving operators defined by other struct types are not considered to be constant expressions.
· Through const declarations, it is possible to declare constants of the simple types (?7.3). It is not possible to have constants of other struct types, but a similar effect is provided by static readonly fields.
· Conversions involving simple types can participate in evaluation of conversion operators defined by other struct types, but a user-defined conversion operator can never participate in evaluation of another user-defined operator (?3.4.2).
• Thinking with Types started, as so many of my projects do, accidentally. I was unemployed, bored, and starting to get tired of answering the same questions over and over again in Haskell chat-rooms. ...
• In other words, very similar to a tuple, but with pipes instead of commas (signifying or instead of and). <p>A value would have the same shape (also like tuples), with a value of appropriate type in ...
• </code> - enables feature engineering tied to information from maps, like big circle distances, distances to city centre, city, state, distances to POIs of different types, etc * <code>"NLP"...
11.1 Value types

A value type is either a struct type or an enumeration type. C# provides a set of predefined struct types called the simple types. The simple types are identified through reserved words.

value-type:
    struct-type
    enum-type
struct-type:
    type-name
    simple-type
simple-type:
    numeric-type
    bool
numeric-type:
    integral-type
    floating-point-type
    decimal
integral-type:
    sbyte
    byte
    short
    ushort
    int
    uint
    long
    ulong
    char
floating-point-type:
    float
    double
enum-type:
    type-name

All value types implicitly inherit from class object. It is not possible for any type to derive from a value type, and value types are thus implicitly sealed (§17.1.1.2).

A variable of a value type always contains a value of that type. Unlike reference types, it is not possible for a value of a value type to be null, or to reference an object of a more derived type.

Assignment to a variable of a value type creates a copy of the value being assigned. This differs from assignment to a variable of a reference type, which copies the reference but not the object identified by the reference.
• Currently this PR introduces fewer new words than #332, but I expect that as the number of potentially-compressed media types grows this approach will become more tedious. But we may not see further ...
• <p>In addition there needs to be some modification to the existing types as a small number of types currently have clashing names (which is how we found this problem in the CTS).</p><p>该提问来源于...
• <p>My solution is straight forward: instead of appending all state machine types, insert empty tuples type right before any generic types. <p>For instance, suppose we have a generic type: </p><pre>...

...