Different types of Text Similarity Approaches (2019-08-20)
In tasks such as information retrieval, document clustering, word-sense disambiguation, machine translation, and text summarization, it is essential to measure the similarity between words, sentences, paragraphs, and documents. This post discusses three types of text similarity approaches: string-based, corpus-based, and knowledge-based. It also shows example implementations of some of these approaches using Python libraries.
- String-Based Similarity
A string similarity or distance measure takes into account the degree to which two strings match each other.
String-based similarity can be further classified into character-based and term-based similarity measures.
The longest common substring (LCS) is a common example of a character-based similarity measure.
The longest common substring algorithm finds the maximum length of a contiguous chain of characters that exists in both strings.
from difflib import SequenceMatcher

def longest_substring(str1, str2):
    # find_longest_match(alo, ahi, blo, bhi) returns a Match(a, b, size) named tuple
    seq_match = SequenceMatcher(None, str1, str2)
    match = seq_match.find_longest_match(0, len(str1), 0, len(str2))
    if match.size != 0:
        return str1[match.a: match.a + match.size]
    return None

sent1 = "It might help to study nlp if possible."
sent2 = "It can help to play football again if possible."
print('longest substring between sent1 and sent2:', repr(longest_substring(sent1, sent2)))
The output (the match includes the leading space and the final period): longest substring between sent1 and sent2: ' if possible.'
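For comparison, below is a minimal dynamic-programming sketch of the longest common substring that does not depend on difflib. `SequenceMatcher.find_longest_match` searches for the longest matching block. However, its "junk" heuristics (for example `autojunk`, which applies to sequences of 200 or more items) can make it miss the true longest substring on long inputs. The function name `longest_common_substring` is hypothetical.

```python
def longest_common_substring(a: str, b: str) -> str:
    """Return the longest contiguous substring shared by a and b (first one found on ties)."""
    best_len, best_end = 0, 0
    # prev[j] = length of the longest common suffix of a[:i-1] and b[:j]
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # extend the common suffix ending at a[i-1] / b[j-1]
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return a[best_end - best_len:best_end]


sent1 = "It might help to study nlp if possible."
sent2 = "It can help to play football again if possible."
print(repr(longest_common_substring(sent1, sent2)))  # ' if possible.'
```

The sketch runs in O(len(a) · len(b)) time. It keeps only two rows of the table, so memory is O(len(b)).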
Another example of a character-based similarity measure is the Levenshtein edit distance. It defines the distance between two strings as the minimum number of single-character insertions, deletions, or substitutions needed to transform one string into the other. Also counting the transposition of two adjacent characters as a single operation gives the Damerau-Levenshtein variant.
import nltk

sent1 = "It might help to study nlp if possible."
sent2 = "It can help to play football again if possible."
# Levenshtein distance; pass transpositions=True for the Damerau-Levenshtein variant
sent_1_2 = nltk.edit_distance(sent1, sent2)
print(sent_1_2, 'Edit Distance between sent1 and sent2')
The output: 22 Edit Distance between sent1 and sent2
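The Levenshtein recurrence itself is short. Below is a minimal sketch of the classic Wagner-Fischer dynamic program. It assumes unit costs for insertion, deletion, and substitution, which is the setting `nltk.edit_distance` uses by default. The helper name `levenshtein` is hypothetical.

```python
def levenshtein(s: str, t: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions turning s into t."""
    # prev[j] = distance between the current prefix of s and t[:j]; start with the empty prefix
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i] + [0] * len(t)  # turning s[:i] into "" takes i deletions
        for j, ct in enumerate(t, start=1):
            curr[j] = min(
                prev[j] + 1,               # delete cs
                curr[j - 1] + 1,           # insert ct
                prev[j - 1] + (cs != ct),  # substitute (free when the characters match)
            )
        prev = curr
    return prev[-1]


print(levenshtein("kitten", "sitting"))  # 3
```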
Cosine similarity measures the similarity between two vectors as the cosine of the angle between them.
Euclidean distance, or L2 distance, is the square root of the sum of squared differences between corresponding elements of the two vectors. In the example below, each string is first turned into a bag-of-words count vector.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity, euclidean_distances

def compute_vectors(*strs):
    # Fit one shared vocabulary and return a term-count vector per string
    text = list(strs)
    vectorizer = CountVectorizer()
    vectorizer.fit(text)
    return vectorizer.transform(text).toarray()

def compute_cosine_sim(*strs):
    return cosine_similarity(compute_vectors(*strs))

def compute_euc_dis(*strs):
    return euclidean_distances(compute_vectors(*strs))

sent1 = "It might help to study nlp if possible."
sent2 = "It can help to play football again if possible."
print("cosine_sim", compute_cosine_sim(sent1, sent2))
print("euclidean_dis", compute_euc_dis(sent1, sent2))
The output: cosine_sim [[1. 0.58925565] [0.58925565 1.]] euclidean_dis [[0. 2.64575131] [2.64575131 0.]]
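To see where these numbers come from, the sketch below recomputes both measures with only the standard library. It assumes tokenization that mimics `CountVectorizer`'s defaults: lowercasing and the token pattern `(?u)\b\w\w+\b`. The helper names are hypothetical. The two sentences have 8 and 9 distinct tokens, each occurring once, and they share 5 (it, help, to, if, possible). The cosine is therefore 5 / sqrt(8 · 9) ≈ 0.589, and the Euclidean distance is sqrt(3 + 4) = sqrt(7) ≈ 2.646.

```python
import math
import re
from collections import Counter

# Assumption: mimic CountVectorizer's defaults (lowercase, tokens of 2+ word characters)
TOKEN_RE = re.compile(r"(?u)\b\w\w+\b")


def count_vector(text: str) -> Counter:
    """Bag-of-words term counts for one string."""
    return Counter(TOKEN_RE.findall(text.lower()))


def cosine_sim(a: Counter, b: Counter) -> float:
    """Dot product divided by the product of the L2 norms (undefined for empty vectors)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


def euclidean_dist(a: Counter, b: Counter) -> float:
    """Square root of the summed squared count differences over the joint vocabulary."""
    vocab = a.keys() | b.keys()
    return math.sqrt(sum((a[t] - b[t]) ** 2 for t in vocab))


v1 = count_vector("It might help to study nlp if possible.")
v2 = count_vector("It can help to play football again if possible.")
print(round(cosine_sim(v1, v2), 8), round(euclidean_dist(v1, v2), 8))  # 0.58925565 2.64575131
```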
Dice's coefficient is defined as twice the number of common terms in the two strings divided by the total number of terms in both strings.
Jaccard similarity is computed as the number of shared terms divided by the number of terms in the union of both strings.
The overlap coefficient divides the number of shared terms by the size of the smaller term set. It therefore considers two strings a full match (a score of 1) if the terms of one are a subset of the terms of the other.
def compute_jaccard_sim(str1, str2):
    # Jaccard: |A ∩ B| / |A ∪ B| over whitespace-separated tokens
    a = set(str1.split())
    b = set(str2.split())
    c = a.intersection(b)
    return float(len(c)) / (len(a) + len(b) - len(c))

def compute_dice_sim(str1, str2):
    # Dice: 2|A ∩ B| / (|A| + |B|)
    a = set(str1.split())
    b = set(str2.split())
    c = a.intersection(b)
    return 2 * float(len(c)) / (len(a) + len(b))

def compute_overlap_sim(str1, str2):
    # Overlap: |A ∩ B| / min(|A|, |B|)
    a = set(str1.split())
    b = set(str2.split())
    c = a.intersection(b)
    return float(len(c)) / min(len(a), len(b))

sent1 = "It might help to study nlp if possible."
sent2 = "It can help to play football again if possible."
print("jaccard:", compute_jaccard_sim(sent1, sent2))
print("dice:", compute_dice_sim(sent1, sent2))
print("overlap:", compute_overlap_sim(sent1, sent2))
The output : jaccard: 0.4166666666666667 dice: 0.5882352941176471 overlap: 0.625
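The same three measures can be written as formulas over the token sets A and B. Here the sets come from `str.split()`, so case and punctuation are kept and `possible.` is a single token. For the two example sentences |A| = 8, |B| = 9 and |A ∩ B| = 5, which reproduces the output above:

```latex
\begin{aligned}
\mathrm{Jaccard}(A,B) &= \frac{|A \cap B|}{|A \cup B|} = \frac{|A \cap B|}{|A| + |B| - |A \cap B|} = \frac{5}{8 + 9 - 5} = \frac{5}{12} \approx 0.4167 \\
\mathrm{Dice}(A,B) &= \frac{2\,|A \cap B|}{|A| + |B|} = \frac{2 \cdot 5}{8 + 9} = \frac{10}{17} \approx 0.5882 \\
\mathrm{Overlap}(A,B) &= \frac{|A \cap B|}{\min(|A|, |B|)} = \frac{5}{8} = 0.625
\end{aligned}
```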
Corpus-based similarity determines the semantic similarity between words according to information gained from large corpora. Pointwise mutual information (PMI) is an example of corpus-based similarity.
PMI-IR (Pointwise Mutual Information and Information Retrieval) is a method for computing the similarity between pairs of words. The more often two words co-occur near each other on web pages, the higher their PMI-IR similarity score.
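import nltk
from nltk.collocations import BigramCollocationFinder
from nltk.tokenize import word_tokenize

# word_tokenize needs the Punkt models: nltk.download('punkt') ('punkt_tab' in recent NLTK releases)
text = "this is a foo bus red car foo bus bus blue car foo bar bar red car shep bus bus blue"
bigram_measures = nltk.collocations.BigramAssocMeasures()
finder = BigramCollocationFinder.from_words(word_tokenize(text))
for i in finder.score_ngrams(bigram_measures.pmi):
    print(i)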
The output:
(('is', 'a'), 4.392317422778761)
(('this', 'is'), 4.392317422778761)
(('a', 'foo'), 2.8073549220576046)
(('car', 'shep'), 2.8073549220576046)
(('red', 'car'), 2.8073549220576046)
(('bar', 'bar'), 2.3923174227787607)
(('bar', 'red'), 2.3923174227787607)
(('car', 'foo'), 2.222392421336448)
(('shep', 'bus'), 2.0703893278913985)
(('bus', 'blue'), 2.070389327891398)
(('blue', 'car'), 1.8073549220576046)
(('foo', 'bar'), 1.8073549220576046)
(('foo', 'bus'), 1.485426827170242)
(('bus', 'red'), 1.070389327891398)
(('bus', 'bus'), 0.7484612330040363)
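The NLTK example computes plain PMI over adjacent word pairs in a single text, not the web-based PMI-IR estimate. Each bigram is scored as log2(c(w1 w2) · N / (c(w1) · c(w2))), where N is the number of tokens (21 here). For example, ('is', 'a') occurs once and each word occurs once, giving log2(21) ≈ 4.392. The from-scratch sketch below appears to reproduce the scores listed above. The helper name `bigram_pmi` is hypothetical, and the sketch is not NLTK's implementation.

```python
import math
from collections import Counter


def bigram_pmi(tokens):
    """Score each adjacent bigram by PMI = log2(c(w1 w2) * N / (c(w1) * c(w2)))."""
    n = len(tokens)                             # total number of tokens
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))  # adjacent word pairs
    scores = {
        pair: math.log2(count * n / (unigrams[pair[0]] * unigrams[pair[1]]))
        for pair, count in bigrams.items()
    }
    # Highest score first; ties broken alphabetically by the bigram itself
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


text = "this is a foo bus red car foo bus bus blue car foo bar bar red car shep bus bus blue"
for pair, score in bigram_pmi(text.split())[:3]:
    print(pair, round(score, 4))
```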
Knowledge-based similarity measures the degree of similarity between words using information derived from semantic networks. WordNet is the most popular semantic network. It is a large lexical database of English in which words are tagged as nouns, verbs, adjectives, and adverbs and grouped into sets of synonyms (synsets), each expressing a distinct concept.
Resnik similarity is based on the information content (IC) of the least common subsumer, the lowest node in the hierarchy that is a hypernym of both input synsets.
Jiang-Conrath similarity is based on the IC of the least common subsumer and the IC of the two input synsets, combined as the inverse of a distance.
Lin similarity uses the same three IC values, combined as a ratio: twice the IC of the least common subsumer divided by the sum of the ICs of the two input synsets.
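Written out, the three measures are as follows. Here IC(c) = -log P(c), where P(c) is the probability of encountering an instance of concept c, estimated from corpus counts such as the Brown-corpus IC file loaded below. To our knowledge this matches what NLTK's `res_similarity`, `jcn_similarity`, and `lin_similarity` compute. Treat it as a summary of the standard definitions, not a specification of the library.

```latex
\begin{aligned}
\mathrm{sim}_{\mathrm{res}}(c_1, c_2) &= \mathrm{IC}(\mathrm{lcs}(c_1, c_2)) \\
\mathrm{sim}_{\mathrm{jcn}}(c_1, c_2) &= \frac{1}{\mathrm{IC}(c_1) + \mathrm{IC}(c_2) - 2\,\mathrm{IC}(\mathrm{lcs}(c_1, c_2))} \\
\mathrm{sim}_{\mathrm{lin}}(c_1, c_2) &= \frac{2\,\mathrm{IC}(\mathrm{lcs}(c_1, c_2))}{\mathrm{IC}(c_1) + \mathrm{IC}(c_2)}
\end{aligned}
```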
# Retrieve information-content (IC) counts computed from the Brown corpus
# (requires nltk.download('wordnet') and nltk.download('wordnet_ic'))
from nltk.corpus import wordnet_ic
from nltk.corpus import wordnet as wn

brown_ic = wordnet_ic.ic('ic-brown.dat')

# Look up the first noun senses of 'rat' and 'lion' using synset()
rat = wn.synset('rat.n.01')
lion = wn.synset('lion.n.01')

print("resnik:", rat.res_similarity(lion, brown_ic))
print("jc:", rat.jcn_similarity(lion, brown_ic))
print("lin:", rat.lin_similarity(lion, brown_ic))
Reported output (the exact values depend on the WordNet version and the IC file used): resnik: 4.665415658815678 jc: 0.08207149300038069 lin: 0.5288091238271396
Gomaa and Fahmy, A Survey of Text Similarity Approaches, International Journal of Computer Applications.
NLP course slides, IIT Gandhinagar (https://sites.google.com/a/iitgn.ac.in/nlp-2018/)
4. Types (2018-06-15)

Value types
A value type is either a struct type or an enumeration type. C# provides a set of predefined struct types called the simple types. The simple types are identified through reserved words.
All value types implicitly inherit from the class System.ValueType, which, in turn, inherits from class object. It is not possible for any type to derive from a value type, and value types are thus implicitly sealed (§10.1.1.2).
Note that System.ValueType is not itself a value-type. Rather, it is a class-type from which all value-types are automatically derived.
A struct type is a value type that can declare constants, fields, methods, properties, indexers, operators, instance constructors, static constructors, and nested types. The declaration of struct types is described in §11.1.
C# provides a set of predefined struct types called the simple types. The simple types are identified through reserved words, but these reserved words are simply aliases for predefined struct types in the System namespace, as described in the table below.
Because a simple type aliases a struct type, every simple type has members. For example, int has the members declared in System.Int32 and the members inherited from System.Object.
C# supports nine integral types: sbyte, byte, short, ushort, int, uint, long, ulong, and char.
C# supports two floating point types: float and double.
The decimal type is a 128-bit data type suitable for financial and monetary calculations.
The bool type represents boolean logical quantities.
An enumeration type is a distinct type with named constants. Every enumeration type has an underlying type, which must be byte, sbyte, short, ushort, int, uint, long or ulong. The set of values of the enumeration type is the same as the set of values of the underlying type. Values of the enumeration type are not restricted to the values of the named constants. Enumeration types are defined through enumeration declarations.
A nullable type can represent all values of its underlying type plus an additional null value. A nullable type is written T?, where T is the underlying type. This syntax is shorthand for System.Nullable<T>, and the two forms can be used interchangeably.
A non-nullable value type, conversely, is any value type other than System.Nullable<T> and its shorthand T? (for any T), plus any type parameter that is constrained to be a non-nullable value type (that is, any type parameter with a struct constraint). The System.Nullable<T> type specifies the value type constraint for T (§10.1.5), which means that the underlying type of a nullable type can be any non-nullable value type. The underlying type of a nullable type cannot be a nullable type or a reference type. For example, int?? and string? are invalid types.
An instance of a nullable type T? has two public read-only properties:
· A HasValue property of type bool
· A Value property of type T
Reference types

Class type         Description
System.Object      The ultimate base class of all other types. See §4.2.2.
System.String      The string type of the C# language. See §4.2.4.
System.ValueType   The base class of all value types. See §4.1.1.
System.Enum        The base class of all enum types. See §14.
System.Array       The base class of all array types. See §12.
System.Delegate    The base class of all delegate types. See §15.
System.Exception   The base class of all exception types. See §16.
A reference type is a class type, an interface type, an array type, or a delegate type.
The object class type is the ultimate base class of all other types. Every type in C# directly or indirectly derives from the object class type.
The keyword object is simply an alias for the predefined class System.Object.
The dynamic type, like object, can reference any object. When operators are applied to expressions of type dynamic, their resolution is deferred until the program is run. Thus, if the operator cannot legally be applied to the referenced object, no error is given during compilation. Instead, an exception will be thrown when resolution of the operator fails at run-time.
The dynamic type is further described in §4.7, and dynamic binding in §7.2.2.
The string type is a sealed class type that inherits directly from object. Instances of the string class represent Unicode character strings.
Values of the string type can be written as string literals (§2.4.4.5).
The keyword string is simply an alias for the predefined class System.String.
An interface defines a contract. A class or struct that implements an interface must adhere to its contract. An interface may inherit from multiple base interfaces, and a class or struct may implement multiple interfaces.
Interface types are described in §13.
An array is a data structure that contains zero or more variables which are accessed through computed indices. The variables contained in an array, also called the elements of the array, are all of the same type, and this type is called the element type of the array.
Array types are described in §12.
A delegate is a data structure that refers to one or more methods. For instance methods, it also refers to their corresponding object instances.
The closest equivalent of a delegate in C or C++ is a function pointer, but whereas a function pointer can only reference static functions, a delegate can reference both static and instance methods. In the latter case, the delegate stores not only a reference to the method’s entry point, but also a reference to the object instance on which to invoke the method.
Delegate types are described in §15.
11.1.3 Simple types (2005-12-06)
C# provides a set of predefined struct types called the simple types. The simple types are identified through reserved words, but these reserved words are simply aliases for predefined struct types in the System namespace, as described in the table below.

Reserved word   Aliased type
sbyte           System.SByte
byte            System.Byte
short           System.Int16
ushort          System.UInt16
int             System.Int32
uint            System.UInt32
long            System.Int64
ulong           System.UInt64
char            System.Char
float           System.Single
double          System.Double
bool            System.Boolean
decimal         System.Decimal

Because a simple type aliases a struct type, every simple type has members. [Example: For example, int has the members declared in System.Int32 and the members inherited from System.Object, and the following statements are permitted:

    int i = int.MaxValue;      // System.Int32.MaxValue constant
    string s = i.ToString();   // System.Int32.ToString() instance method
    string t = 123.ToString(); // System.Int32.ToString() instance method

end example] The simple types differ from other struct types in that they permit certain additional operations:

• Most simple types permit values to be created by writing literals (§9.4.4). [Example: For example, 123 is a literal of type int and 'a' is a literal of type char. end example] C# makes no provision for literals of struct types in general, and non-default values of other struct types are ultimately always created through instance constructors of those struct types.
• When the operands of an expression are all simple type constants, the compiler evaluates the expression at compile-time. Such an expression is known as a constant-expression (§14.15). Expressions involving operators defined by other struct types are not considered to be constant expressions.
• Through const declarations, it is possible to declare constants of the simple types (§17.3). It is not possible to have constants of other struct types, but a similar effect is provided by static readonly fields.
• Conversions involving simple types can participate in evaluation of conversion operators defined by other struct types, but a user-defined conversion operator can never participate in evaluation of another user-defined operator (§13.4.2).
11.1 Value types (2005-12-06)
A value type is either a struct type or an enumeration type. C# provides a set of predefined struct types called the simple types. The simple types are identified through reserved words.

All value types implicitly inherit from class object. It is not possible for any type to derive from a value type, and value types are thus implicitly sealed (§17.1.1.2).

A variable of a value type always contains a value of that type. Unlike reference types, it is not possible for a value of a value type to be null, or to reference an object of a more derived type.

Assignment to a variable of a value type creates a copy of the value being assigned. This differs from assignment to a variable of a reference type, which copies the reference but not the object identified by the reference.