  • Instead of just fixing this specific issue, might it be possible to make the formatter or its callers more robust? I would prefer an unformatted sql to an exception. Full code: ...
  • t let you use reserved words like delete as identifier. I'm pretty sure that's the correct behavior too. I'm just learning my way around your new 'types' feature so I'd ...
  • While doing some general code cleanup on Npgsql, I came across a bit of strangeness in the PostGIS types; the types seem to be neither truly mutable, nor truly immutable. For example, ...
  • In the Netherlands we have surnames which consist of multiple words. For example, "van den Berg" is a really common name. So common, it has its ...
  • Other types of

    2020-11-28 02:39:39
    I have my nest hooked up but only after adding this skill, so echo is able to work with extra temperature words via the cloud, i.e. "set lounge to 20 degrees". Lounge being the device. ...
  • In other words, it would be a Viper consistency error or a Java exception if not for the fold-unfold pass. It is very easy to encode all unsupported types using the same abstract ...
  • words, with single word Search Strings it is ok) 2. 3. What is the expected output? What do you see instead? If you put "The Beatles" Type "Music" File Types mp3 and wma in Google ...
  • I have been using the Visual Bag of Words to identify different types of standard scanned documents (the goal being to sort the 4 different with the possibility of using this classification to...
  • s not on the safelist of known native error types; Grab the tamper-with-able .constructor property and compare that to a safelist; Work with the JS spec folks to store the ...
  • I find myself using things like qq-enum-networks-masscan but having trouble remembering exactly the set of words to go between qq and masscan. Equally, ...
  • In various tasks such as information retrieval, document clustering, word-sense disambiguation, machine translation and text summarization, it is essential to measure the similarity between words, sen...

    In various tasks such as information retrieval, document clustering, word-sense disambiguation, machine translation and text summarization, it is essential to measure the similarity between words, sentences, paragraphs and documents. This post discusses three types of text similarity approaches: string-based, corpus-based and knowledge-based. It also shows example implementations of some of these approaches using Python libraries.

    • String-Based Similarity

    A string similarity or distance measure captures the degree to which two strings match each other.

    String-based similarity can be further classified into character-based similarity measures and term-based similarity measures.

    1 Character-Based Similarity Measure

    The longest common substring (LCS) algorithm is a common example of a character-based similarity measure.

    It finds the longest contiguous run of characters that occurs in both strings.

    from difflib import SequenceMatcher

    def longestSubstring(str1, str2):
        # Compare the two strings without a junk filter (first argument None)
        seqMatch = SequenceMatcher(None, str1, str2)
        # Search the full extent of both strings for the longest matching block
        match = seqMatch.find_longest_match(0, len(str1), 0, len(str2))
        if match.size != 0:
            return str1[match.a: match.a + match.size]
        return None

    sent1 = "It might help to study nlp if possible."
    sent2 = "It can help to play football again if possible."
    sent_1_2 = longestSubstring(sent1, sent2)
    print('longest substring between sent1 and sent2 : ', sent_1_2)
    
    The output:
    longest substring between sent1 and sent2 :   if possible.
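
    For readers who want to see what the library call is doing, the same idea can be sketched with plain dynamic programming. This is an illustrative version rather than the code difflib uses internally; the function name longest_common_substring is made up for this example.

```python
def longest_common_substring(s1: str, s2: str) -> str:
    """Return the longest contiguous run of characters found in both strings."""
    best_len = 0
    best_end = 0  # index in s1 just past the end of the best match so far
    # prev[j] holds the length of the common suffix of s1[:i-1] and s2[:j]
    prev = [0] * (len(s2) + 1)
    for i in range(1, len(s1) + 1):
        curr = [0] * (len(s2) + 1)
        for j in range(1, len(s2) + 1):
            if s1[i - 1] == s2[j - 1]:
                # Extend the match that ended at the previous character pair
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len = curr[j]
                    best_end = i
        prev = curr
    return s1[best_end - best_len:best_end]


sent1 = "It might help to study nlp if possible."
sent2 = "It can help to play football again if possible."
print(repr(longest_common_substring(sent1, sent2)))  # ' if possible.'
```

    The table costs O(len(s1) * len(s2)) time, which is fine for sentences but slow for long documents.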
    

    Another example of a character-based similarity measure is the Levenshtein edit distance. It defines the distance between two strings as the minimum number of single-character operations (insertion, deletion or substitution) needed to transform one string into the other. The Damerau-Levenshtein variant additionally counts a transposition of two adjacent characters as a single operation; nltk.edit_distance only does so when called with transpositions=True.

    import nltk

    sent1 = "It might help to study nlp if possible."
    sent2 = "It can help to play football again if possible."
    # Character-level Levenshtein distance (transpositions are off by default)
    sent_1_2 = nltk.edit_distance(sent1, sent2)
    print(sent_1_2, 'Edit Distance between sent1 and sent2')
    
    The output:
    22 Edit Distance between sent1 and sent2
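
    The recurrence behind the edit distance is short enough to write out. The following is a minimal sketch of the classic dynamic-programming formulation with unit costs and no transpositions; the function name levenshtein is chosen for this example and is not part of NLTK.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    # prev[j] is the distance between the processed prefix of a and b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute ca with cb (free if equal)
            ))
        prev = curr
    return prev[-1]


print(levenshtein("kitten", "sitting"))  # 3
print(levenshtein("ab", "ba"))           # 2, since a swap counts as two edits here
```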
    

    Term-based Similarity Measures

    Cosine similarity measures the similarity between two vectors as the cosine of the angle between them.
    Euclidean distance, or L2 distance, is the square root of the sum of squared differences between corresponding elements of the two vectors.

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.metrics.pairwise import cosine_similarity, euclidean_distances

    def compute_vectors(*strs):
        # Build one term-count vector per input string over a shared vocabulary
        text = list(strs)
        vectorizer = CountVectorizer()
        vectorizer.fit(text)
        return vectorizer.transform(text).toarray()
    def compute_cosine_sim(*strs):
        # Pairwise cosine similarity matrix of the count vectors
        return cosine_similarity(compute_vectors(*strs))
    def compute_euc_dis(*strs):
        # Pairwise Euclidean distance matrix of the count vectors
        return euclidean_distances(compute_vectors(*strs))
    sent1 = "It might help to study nlp if possible."
    sent2 = "It can help to play football again if possible."
    print("cosine_sim", compute_cosine_sim(sent1, sent2))
    print("euclidean_dis", compute_euc_dis(sent1, sent2))
    
    The output :
    cosine_sim [[1.         0.58925565]
               [0.58925565 1.        ]]
    euclidean_dis [[0.        , 2.64575131],
                  [2.64575131, 0.        ]]
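
    Both measures can also be computed by hand from term counts, which makes the definitions concrete. The sketch below assumes a simple lowercase, whitespace-based tokenization, which is not the same as CountVectorizer's default tokenizer; the helper names are made up for this example.

```python
import math
from collections import Counter


def term_counts(text: str) -> Counter:
    # Assumption: lowercase and split on whitespace (punctuation stays attached)
    return Counter(text.lower().split())


def cosine_sim(a: Counter, b: Counter) -> float:
    # Dot product over shared terms divided by the product of vector lengths
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def euclidean_dist(a: Counter, b: Counter) -> float:
    # A Counter returns 0 for missing terms, so the union of keys is enough
    return math.sqrt(sum((a[t] - b[t]) ** 2 for t in a.keys() | b.keys()))


v1 = term_counts("It might help to study nlp if possible.")
v2 = term_counts("It can help to play football again if possible.")
print(round(cosine_sim(v1, v2), 4))      # 0.5893
print(round(euclidean_dist(v1, v2), 4))  # 2.6458
```

    For these two sentences the simplified tokenization happens to yield the same counts as the scikit-learn version, so the values agree with the off-diagonal entries above; on other inputs the two tokenizers can differ.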
    

    Dice's coefficient is defined as twice the number of common terms in the two strings divided by the total number of terms in both strings.
    Jaccard similarity is computed as the number of shared terms divided by the number of terms in the union of both strings.
    The overlap coefficient divides the number of shared terms by the size of the smaller term set, so two strings count as a full match if one is a subset of the other.

    def compute_jaccard_sim(str1, str2): 
        a = set(str1.split()) 
        b = set(str2.split())
        c = a.intersection(b)
        return float(len(c)) / (len(a) + len(b) - len(c))
    def compute_dice_sim(str1, str2): 
        a = set(str1.split()) 
        b = set(str2.split())
        c = a.intersection(b)
        return 2*float(len(c)) / (len(a) + len(b))
    def compute_overlap_sim(str1, str2): 
        a = set(str1.split()) 
        b = set(str2.split())
        c = a.intersection(b)
        return float(len(c)) / min(len(a) , len(b) )
    sent1 = "It might help to study nlp if possible."
    sent2 = "It can help to play football again if possible."
    print("jaccard: ", compute_jaccard_sim(sent1, sent2))
    print("dice: ", compute_dice_sim(sent1, sent2))
    print("overlap: ", compute_overlap_sim(sent1, sent2))
    
    The output :
    jaccard: 0.4166666666666667
    dice: 0.5882352941176471
    overlap: 0.625
    

    2 Corpus-Based Similarity

    Corpus-based similarity determines the semantic similarity between words according to information gained from large corpora. Pointwise Mutual Information is an example of corpus-based similarity.
    Pointwise Mutual Information - Information Retrieval (PMI-IR) is a method for computing the similarity between pairs of words. The more often two words co-occur near each other on a web page, the higher their PMI-IR similarity score.

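    import nltk
    from nltk.collocations import BigramCollocationFinder
    from nltk.tokenize import word_tokenize
    # word_tokenize needs the NLTK "punkt" tokenizer data to be installed
    text = "this is a foo bus red car foo bus bus blue car foo bar bar red car shep bus bus blue"
    bigram_measures = nltk.collocations.BigramAssocMeasures()
    finder = BigramCollocationFinder.from_words(word_tokenize(text))
    for i in finder.score_ngrams(bigram_measures.pmi):
        print(i)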
    
    The output :
    (('is', 'a'), 4.392317422778761)
    (('this', 'is'), 4.392317422778761)
    (('a', 'foo'), 2.8073549220576046)
    (('car', 'shep'), 2.8073549220576046)
    (('red', 'car'), 2.8073549220576046)
    (('bar', 'bar'), 2.3923174227787607)
    (('bar', 'red'), 2.3923174227787607)
    (('car', 'foo'), 2.222392421336448)
    (('shep', 'bus'), 2.0703893278913985)
    (('bus', 'blue'), 2.070389327891398)
    (('blue', 'car'), 1.8073549220576046)
    (('foo', 'bar'), 1.8073549220576046)
    (('foo', 'bus'), 1.485426827170242)
    (('bus', 'red'), 1.070389327891398)
    (('bus', 'bus'), 0.7484612330040363)
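
    The scores above can be reproduced from raw counts. The sketch below assumes PMI is computed as log2(count(x, y) * N / (count(x) * count(y))), with N the number of tokens, which appears to match the listed values; the function name bigram_pmi is made up for this example.

```python
import math
from collections import Counter


def bigram_pmi(tokens):
    """PMI of every adjacent word pair, estimated from raw counts."""
    n = len(tokens)
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    return {
        (x, y): math.log2(c_xy * n / (unigrams[x] * unigrams[y]))
        for (x, y), c_xy in bigrams.items()
    }


text = ("this is a foo bus red car foo bus bus blue car "
        "foo bar bar red car shep bus bus blue")
scores = bigram_pmi(text.split())
# Print the pairs from most to least strongly associated
for pair, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(pair, score)
```

    Rare pairs such as ('is', 'a') score highest because each word occurs only once, which is a known bias of PMI toward low-frequency events.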
    

    Knowledge-Based Similarity

    Knowledge-based similarity measures the degree of similarity between words using information derived from semantic networks. WordNet is the most popular semantic network. It is a large lexical database of English in which nouns, verbs, adjectives and adverbs are grouped into sets of synonyms (synsets), each expressing a distinct concept.

    Resnik similarity is the Information Content (IC) of the Least Common Subsumer, the lowest node in the hierarchy that is a hypernym of both input synsets.

    Jiang-Conrath similarity is based on the IC of the Least Common Subsumer and that of the two input synsets: 1 / (IC(s1) + IC(s2) - 2 * IC(lcs)).
    Lin similarity is based on the same three quantities, combined as a ratio: 2 * IC(lcs) / (IC(s1) + IC(s2)).

    # retrieving IC of the brown corpus
    from nltk.corpus import wordnet_ic
    from nltk.corpus import wordnet as wn
    brown_ic = wordnet_ic.ic('ic-brown.dat')
    # looking up noun words 'rat' and 'lion' using synset()
    rat = wn.synset('rat.n.01')
    lion = wn.synset('lion.n.01')
    print("resnik: ", rat.res_similarity(lion, brown_ic))
    print("jc: ", rat.jcn_similarity(lion, brown_ic))
    print("lin: ", rat.lin_similarity(lion, brown_ic))
    
    The output:
    resnik: 4.665415658815678
    jc: 0.08207149300038069
    lin: 0.5288091238271396
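
    To see how the three measures relate, they can be written as plain functions of the information content values. This is a sketch only: the IC numbers below are hypothetical and are not taken from WordNet or any corpus, and the function names are made up for this example.

```python
def resnik(ic_lcs: float) -> float:
    # Resnik: the information content of the least common subsumer itself
    return ic_lcs


def jiang_conrath(ic1: float, ic2: float, ic_lcs: float) -> float:
    # Jiang-Conrath: inverse of the IC-based distance between the two synsets
    distance = ic1 + ic2 - 2 * ic_lcs
    return float("inf") if distance == 0 else 1 / distance


def lin(ic1: float, ic2: float, ic_lcs: float) -> float:
    # Lin: shared information relative to the total information of both synsets
    total = ic1 + ic2
    return 0.0 if total == 0 else 2 * ic_lcs / total


# Hypothetical IC values for two synsets and their least common subsumer
ic_a, ic_b, ic_shared = 8.0, 9.0, 4.0
print(resnik(ic_shared))                     # 4.0
print(jiang_conrath(ic_a, ic_b, ic_shared))  # 0.1111... (1/9)
print(lin(ic_a, ic_b, ic_shared))            # 0.4705... (8/17)
```

    Resnik is unbounded and depends only on the shared ancestor, whereas Lin is normalised to the range 0 to 1, which makes scores easier to compare across word pairs.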
    

    References
    A Survey of Text Similarity Approaches, Gomaa and Fahmy, International Journal of Computer Applications
    http://www.nltk.org/howto/wordnet.html
    nlp course slides - IIT Gandhinagar (https://sites.google.com/a/iitgn.ac.in/nlp-2018/)

  • s a question of how to represent an account that is derived from your main seed words vs from a wallet file. Proposal: Organized vaults. The account list section could be divided into sections. ...
  • New behaviour is to search for any words. "avengers x-men" now finds all avengers and all x-men books. Too many results. The third behaviour should be to search for all words. "...
  • In other words: If we don't specify what we're sending, NCC will first try to guess what the type is, and then proceed accordingly if successful at doing so. But if we DO specify what we...
  • Is it possible that there is some kind of dereferencing error in the compiled libsumojni.dll, in other words in the C++ world? The size of the StringVector ContinuationLanes is way too large. ...
  • I have a question on the word segmentation methods on the contexts of mentions. I noticed that you apply two different methods. In the first method, you split the words from m['context'...
  • In my opinion, this definition of equality is wrong, because it does not include the type of the collection. I would consider this a bug. It is possible to create and use an own comparer or an own ...
  • Supported types

    2020-12-07 08:02:26
    In other words, lots of types. Is there a possibility to extend the library in such a way that it can support any type? For example, I would like to put a time / date field in an appropriate ...
  • This PR upgrades the types definitions of d3, google.visualization & selenium-webdriver. To view the status of the library upgrade please visit here. For testing doc please visit ...
  • 4. Types

    2018-06-15 11:35:43

    Value types

    A value type is either a struct type or an enumeration type. C# provides a set of predefined struct types called the simple types. The simple types are identified through reserved words.

    The System.ValueType type

    All value types implicitly inherit from the class System.ValueType, which, in turn, inherits from class object. It is not possible for any type to derive from a value type, and value types are thus implicitly sealed (§10.1.1.2).

    Note that System.ValueType is not itself a value-type. Rather, it is a class-type from which all value-types are automatically derived.


    A struct type is a value type that can declare constants, fields, methods, properties, indexers, operators, instance constructors, static constructors, and nested types. The declaration of struct types is described in §11.1.


    C# provides a set of predefined struct types called the simple types. The simple types are identified through reserved words, but these reserved words are simply aliases for predefined struct types in the System namespace, as described in the table below.

     

    Reserved word    Aliased type
    sbyte            System.SByte
    byte             System.Byte
    short            System.Int16
    ushort           System.UInt16
    int              System.Int32
    uint             System.UInt32
    long             System.Int64
    ulong            System.UInt64
    char             System.Char
    float            System.Single
    double           System.Double
    bool             System.Boolean
    decimal          System.Decimal

     

    Because a simple type aliases a struct type, every simple type has members. For example, int has the members declared in System.Int32 and the members inherited from System.Object.


    C# supports nine integral types: sbyte, byte, short, ushort, int, uint, long, ulong, and char.


    C# supports two floating point types: float and double.


    The decimal type is a 128-bit data type suitable for financial and monetary calculations.


    The bool type represents boolean logical quantities.

    An enumeration type is a distinct type with named constants. Every enumeration type has an underlying type, which must be byte, sbyte, short, ushort, int, uint, long or ulong. The set of values of the enumeration type is the same as the set of values of the underlying type. Values of the enumeration type are not restricted to the values of the named constants. Enumeration types are defined through enumeration declarations.


    A nullable type can represent all values of its underlying type plus an additional null value. A nullable type is written T?, where T is the underlying type. This syntax is shorthand for System.Nullable<T>, and the two forms can be used interchangeably.

    A non-nullable value type conversely is any value type other than System.Nullable<T> and its shorthand T? (for any T), plus any type parameter that is constrained to be a non-nullable value type (that is, any type parameter with a struct constraint). The System.Nullable<T> type specifies the value type constraint for T (§10.1.5), which means that the underlying type of a nullable type can be any non-nullable value type. The underlying type of a nullable type cannot be a nullable type or a reference type. For example, int?? and string? are invalid types.

    An instance of a nullable type T? has two public read-only properties:

    ·        A HasValue property of type bool

    ·        A Value property of type T


    Reference types

    Class type          Description
    System.Object       The ultimate base class of all other types. See §4.2.2.
    System.String       The string type of the C# language. See §4.2.4.
    System.ValueType    The base class of all value types. See §4.1.1.
    System.Enum         The base class of all enum types. See §14.
    System.Array        The base class of all array types. See §12.
    System.Delegate     The base class of all delegate types. See §15.
    System.Exception    The base class of all exception types. See §16.

    A reference type is a class type, an interface type, an array type, or a delegate type.

     

    1.1.1 The object type

    The object class type is the ultimate base class of all other types. Every type in C# directly or indirectly derives from the object class type.

    The keyword object is simply an alias for the predefined class System.Object.

    1.1.2 The dynamic type

    The dynamic type, like object, can reference any object. When operators are applied to expressions of type dynamic, their resolution is deferred until the program is run. Thus, if the operator cannot legally be applied to the referenced object, no error is given during compilation. Instead an exception will be thrown when resolution of the operator fails at run-time.

    The dynamic type is further described in §4.7, and dynamic binding in §7.2.2.

    1.1.3 The string type

    The string type is a sealed class type that inherits directly from object. Instances of the string class represent Unicode character strings.

    Values of the string type can be written as string literals (§2.4.4.5).

    The keyword string is simply an alias for the predefined class System.String.

    1.1.4 Interface types

    An interface defines a contract. A class or struct that implements an interface must adhere to its contract. An interface may inherit from multiple base interfaces, and a class or struct may implement multiple interfaces.

    Interface types are described in §13.

    1.1.5 Array types

    An array is a data structure that contains zero or more variables which are accessed through computed indices. The variables contained in an array, also called the elements of the array, are all of the same type, and this type is called the element type of the array.

    Array types are described in §12.

    1.1.6 Delegate types

    A delegate is a data structure that refers to one or more methods. For instance methods, it also refers to their corresponding object instances.

    The closest equivalent of a delegate in C or C++ is a function pointer, but whereas a function pointer can only reference static functions, a delegate can reference both static and instance methods. In the latter case, the delegate stores not only a reference to the method's entry point, but also a reference to the object instance on which to invoke the method.

    Delegate types are described in §15.
















  • While and I were trying to solve a potential triggering problem, we began refactoring a small portion of Words.pm. The triple ternary used to check for trigger matches has been ...
  • Implicit Types Insertion

    2020-12-09 12:44:25
    Prior to this there was no mention of implicit types in the data model document, and the understanding (at least by many in the group if not all) was that all the types are explicit i.e. the data ...
  • 11.1.3 Simple types

    2005-12-06 04:58:00
    11.1.3 Simple types
    C# provides a set of predefined struct types called the simple types. The simple types are identified through reserved words, but these reserved words are simply aliases for predefined struct types in the System namespace, as described in the table below.
    Reserved word Aliased type
    sbyte System.SByte
    byte System.Byte
    short System.Int16
    ushort System.UInt16
    int System.Int32
    uint System.UInt32
    long System.Int64
    ulong System.UInt64
    char System.Char
    float System.Single
    double System.Double
    bool System.Boolean
    decimal System.Decimal
    Because a simple type aliases a struct type, every simple type has members. [Example: For example, int has the members declared in System.Int32 and the members inherited from System.Object, and the following statements are permitted:
        int i = int.MaxValue; // System.Int32.MaxValue constant
        string s = i.ToString(); // System.Int32.ToString() instance method
        string t = 123.ToString(); // System.Int32.ToString() instance method
    end example] The simple types differ from other struct types in that they permit certain additional operations:
    • Most simple types permit values to be created by writing literals (§9.4.4). [Example: For example, 123 is a literal of type int and 'a' is a literal of type char. end example] C# makes no provision for literals of struct types in general, and non-default values of other struct types are ultimately always created through instance constructors of those struct types.
    • When the operands of an expression are all simple type constants, the compiler evaluates the expression at compile-time. Such an expression is known as a constant-expression (§14.15). Expressions involving operators defined by other struct types are not considered to be constant expressions.
    • Through const declarations, it is possible to declare constants of the simple types (§17.3). It is not possible to have constants of other struct types, but a similar effect is provided by static readonly fields.
    • Conversions involving simple types can participate in evaluation of conversion operators defined by other struct types, but a user-defined conversion operator can never participate in evaluation of another user-defined operator (§13.4.2).
  • Thinking with Types

    2019-07-04 11:05:24
    Thinking with Types started, as so many of my projects do, accidentally. I was unemployed, bored, and starting to get tired of answering the same questions over and over again in Haskell chat-rooms. ...
  • Anonymous sum types

    2021-01-08 18:49:01
    In other words, very similar to a tuple, but with pipes instead of commas (signifying or instead of and). A value would have the same shape (also like tuples), with a value of appropriate type in ...
  • More feature types

    2020-12-29 01:18:15
    - enables feature engineering tied to information from maps, like big circle distances, distances to city centre, city, state, distances to POIs of different types, etc * "NLP"...
  • 11.1 Value types

    2005-12-06 04:51:00
    11.1 Value types
    A value type is either a struct type or an enumeration type. C# provides a set of predefined struct types called the simple types. The simple types are identified through reserved words.
    value-type:
        struct-type
        enum-type
    struct-type:
        type-name
        simple-type
    simple-type:
        numeric-type
        bool
    numeric-type:
        integral-type
        floating-point-type
        decimal
    integral-type:
        sbyte
        byte
        short
        ushort
        int
        uint
        long
        ulong
        char
    floating-point-type:
        float
        double
    enum-type:
        type-name
    All value types implicitly inherit from class object. It is not possible for any type to derive from a value type, and value types are thus implicitly sealed (§17.1.1.2).
    A variable of a value type always contains a value of that type. Unlike reference types, it is not possible for a value of a value type to be null, or to reference an object of a more derived type.
    Assignment to a variable of a value type creates a copy of the value being assigned. This differs from assignment to a variable of a reference type, which copies the reference but not the object identified by the reference.
  • Currently this PR introduces fewer new words than #332, but I expect that as the number of potentially-compressed media types grows this approach will become more tedious. But we may not see further ...
  • In addition there needs to be some modification to the existing types as a small number of types currently have clashing names (which is how we found this problem in the CTS). This question comes from ...
  • My solution is straightforward: instead of appending all state machine types, insert empty tuples type right before any generic types. For instance, suppose we have a generic type: ...
