  • WordNet

    2019-09-05 14:04:30

    WordNet

    WordNet.java

    The main problem is deciding whether G is a directed acyclic graph with exactly one root. Acyclicity can be checked with BFS or DFS; here the built-in DirectedCycle class is used. It is easy to see that a vertex is a root exactly when its outdegree is 0, and, given acyclicity, every connected component must contain at least one vertex with outdegree 0 (by contradiction: if every vertex had an outgoing edge, repeatedly following outgoing edges in a finite component would eventually revisit a vertex and form a cycle). Therefore, if exactly one vertex has outdegree 0, the digraph must consist of a single component, and that vertex is its only root.

    import edu.princeton.cs.algs4.Digraph;
    import edu.princeton.cs.algs4.DirectedCycle;
    import edu.princeton.cs.algs4.In;
    
    import java.util.HashMap;
    import java.util.HashSet;
    import java.util.Map;
    import java.util.Set;
    
    public class WordNet {
        private final SAP sap;
        private final Map<Integer, String> ids;
        private final Map<String, Set<Integer>> nouns;
    
        // constructor takes the name of the two input files
        public WordNet(String synsets, String hypernyms) {
            if (synsets == null || hypernyms == null) {
                throw new IllegalArgumentException();
            }
    
            ids = new HashMap<>();
            nouns = new HashMap<>();
    
            // read the synsets file and build the id and noun maps
            In in = new In(synsets);
            while (in.hasNextLine()) {
                String[] synset = in.readLine().split(",");
                int id = Integer.parseInt(synset[0]);
                ids.put(id, synset[1]);
                for (String s : synset[1].split(" ")) {
                    if (!nouns.containsKey(s)) {
                        Set<Integer> temp = new HashSet<>();
                        temp.add(id);
                        nouns.put(s, temp);
                    } else {
                        nouns.get(s).add(id);
                    }
                }
            }
            in.close();
    
            // read the hypernyms file and build the digraph
            Digraph G = new Digraph(ids.size());
            in = new In(hypernyms);
            while (in.hasNextLine()) {
                String[] hypernym = in.readLine().split(",");
                int id = Integer.parseInt(hypernym[0]);
                for (int i = 1; i < hypernym.length; i++) {
                    int hyper = Integer.parseInt(hypernym[i]);
                    G.addEdge(id, hyper);
                }
            }
            in.close();
    
            // check that the graph is acyclic and has exactly one root
            DirectedCycle cycle = new DirectedCycle(G);
            int rootNum = 0;
            for (int i = 0; i < ids.size(); i++) {
                if (G.outdegree(i) == 0) {
                    rootNum++;
                }
            }
            if (cycle.hasCycle() || rootNum != 1) {
                throw new IllegalArgumentException();
            }
    
            // build the SAP for the digraph
            sap = new SAP(G);
        }
    
        // returns all WordNet nouns
        public Iterable<String> nouns() {
            return nouns.keySet();
        }
    
        // is the word a WordNet noun?
        public boolean isNoun(String word) {
            if (word == null) {
                throw new IllegalArgumentException();
            }
    
            return nouns.containsKey(word);
        }
    
        // distance between nounA and nounB (defined below)
        public int distance(String nounA, String nounB) {
            if (nounA == null || nounB == null || !isNoun(nounA) || !isNoun(nounB)) {
                throw new IllegalArgumentException();
            }
    
            return sap.length(nouns.get(nounA), nouns.get(nounB));
        }
    
        // a synset (second field of synsets.txt) that is the common ancestor of nounA and nounB
        // in a shortest ancestral path (defined below)
        public String sap(String nounA, String nounB) {
            if (nounA == null || nounB == null || !isNoun(nounA) || !isNoun(nounB)) {
                throw new IllegalArgumentException();
            }
    
            return ids.get(sap.ancestor(nouns.get(nounA), nouns.get(nounB)));
        }
    
        // do unit testing of this class
        public static void main(String[] args) {
            WordNet wn = new WordNet("synsets.txt", "hypernyms.txt");
            System.out.println(wn.distance("1750s", "1790s"));
        }
    }
    

    SAP.java

    length(int v, int w)
    Run BFS twice to get the shortest distances from v to every other vertex and from w to every other vertex; then iterate over all vertices i of the graph, and whenever both v and w have a path to i, check whether the sum of the path lengths v->i and w->i is smaller than the smallest sum found so far.

    length(Iterable v, Iterable w)
    The key is computing the shortest distances from a set of vertices to every other vertex; this only requires a small change to BFS: when initializing the queue, enqueue every vertex of the set before running BFS. The remaining steps are the same as in the single-vertex version.

    For brevity, the provided BreadthFirstDirectedPaths class is used; its source shows the implementation details, and a rough sketch of the multi-source initialization is given below.
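
    As a minimal, non-authoritative sketch of that idea (BreadthFirstDirectedPaths already behaves this way, so the hypothetical MultiSourceBFS class below is illustrative only; the actual SAP implementation follows):

    import edu.princeton.cs.algs4.Digraph;
    import edu.princeton.cs.algs4.Queue;

    // Hypothetical sketch of multi-source BFS: every source vertex is enqueued
    // with distance 0 before the search starts; everything else is standard BFS.
    public class MultiSourceBFS {
        private final boolean[] marked;   // marked[v]: is v reachable from the source set?
        private final int[] distTo;       // distTo[v]: shortest distance from the source set to v

        public MultiSourceBFS(Digraph G, Iterable<Integer> sources) {
            marked = new boolean[G.V()];
            distTo = new int[G.V()];
            Queue<Integer> queue = new Queue<>();
            for (int s : sources) {       // enqueue all sources before starting BFS
                marked[s] = true;
                distTo[s] = 0;
                queue.enqueue(s);
            }
            while (!queue.isEmpty()) {
                int v = queue.dequeue();
                for (int w : G.adj(v)) {
                    if (!marked[w]) {
                        marked[w] = true;
                        distTo[w] = distTo[v] + 1;
                        queue.enqueue(w);
                    }
                }
            }
        }

        public boolean hasPathTo(int v) {
            return marked[v];
        }

        public int distTo(int v) {
            return distTo[v];
        }
    }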

    import edu.princeton.cs.algs4.BreadthFirstDirectedPaths;
    import edu.princeton.cs.algs4.Digraph;
    import edu.princeton.cs.algs4.In;
    import edu.princeton.cs.algs4.StdIn;
    import edu.princeton.cs.algs4.StdOut;
    
    public class SAP {
        private final Digraph G;
        private int ancestor;
    
        // constructor takes a digraph (not necessarily a DAG)
        public SAP(Digraph G) {
            if (G == null) {
                throw new IllegalArgumentException();
            }
    
            this.G = new Digraph(G);        // copy G rather than storing the reference, since SAP should be immutable
        }
    
        // length of shortest ancestral path between v and w; -1 if no such path
        public int length(int v, int w) {
            if (v < 0 || v >= G.V() || w < 0 || w >= G.V()) {
                throw new IllegalArgumentException();
            }
    
            ancestor = -1;              // also record the ancestor of (v, w) as a side effect
            int minLength = -1;
            BreadthFirstDirectedPaths bfsv = new BreadthFirstDirectedPaths(G, v);
            BreadthFirstDirectedPaths bfsw = new BreadthFirstDirectedPaths(G, w);
    
            for (int i = 0; i < G.V(); i++) {
                if (bfsv.hasPathTo(i) && bfsw.hasPathTo(i)) {
                    int length = bfsv.distTo(i) + bfsw.distTo(i);
                    if (minLength == -1 || length < minLength) {
                        minLength = length;
                        ancestor = i;
                    }
                }
            }
    
            return minLength;
        }
    
        // a common ancestor of v and w that participates in a shortest ancestral path; -1 if no such path
        public int ancestor(int v, int w) {
            length(v, w);				
            return ancestor;
        }
    
        // length of shortest ancestral path between any vertex in v and any vertex in w; -1 if no such path
        public int length(Iterable<Integer> v, Iterable<Integer> w) {
            if (v == null || w == null) {
                throw new IllegalArgumentException();
            }
            // the iterables hold Integer (not int), so an element may be null
            for (Integer i : v) {
                if (i == null || i < 0 || i >= G.V()) throw new IllegalArgumentException();
            }
            for (Integer i : w) {
                if (i == null || i < 0 || i >= G.V()) throw new IllegalArgumentException();
            }
    
            ancestor = -1;              // also record the ancestor of (v, w) as a side effect
            int minLength = -1;
            BreadthFirstDirectedPaths bfsv = new BreadthFirstDirectedPaths(G, v);
            BreadthFirstDirectedPaths bfsw = new BreadthFirstDirectedPaths(G, w);
    
            for (int i = 0; i < G.V(); i++) {
                if (bfsv.hasPathTo(i) && bfsw.hasPathTo(i)) {
                    int length = bfsv.distTo(i) + bfsw.distTo(i);
                    if (minLength == -1 || length < minLength) {
                        minLength = length;
                        ancestor = i;
                    }
                }
            }
    
            return minLength;
        }
    
        // a common ancestor that participates in shortest ancestral path; -1 if no such path
        public int ancestor(Iterable<Integer> v, Iterable<Integer> w) {
            length(v, w);
            return ancestor;
        }
    
        // do unit testing of this class
        public static void main(String[] args) {
            In in = new In(args[0]);
            Digraph G = new Digraph(in);
            SAP sap = new SAP(G);
            while (!StdIn.isEmpty()) {
                int v = StdIn.readInt();
                int w = StdIn.readInt();
                int length = sap.length(v, w);
                int ancestor = sap.ancestor(v, w);
                StdOut.printf("length = %d, ancestor = %d\n", length, ancestor);
            }
        }
    }
    

    Outcast.java

    Implement it directly from the definition: the outcast is the noun whose total distance to all the other nouns is largest.

    import edu.princeton.cs.algs4.In;
    import edu.princeton.cs.algs4.StdOut;
    
    public class Outcast {
        private final WordNet wordNet;
    
        // constructor takes a WordNet object
        public Outcast(WordNet wordnet) {
            if (wordnet == null) {
                throw new IllegalArgumentException();
            }
    
            this.wordNet = wordnet;
        }
    
        // given an array of WordNet nouns, return an outcast
        public String outcast(String[] nouns) {
            int maxDis = -1;
            String outcast = null;
    
            for (int i = 0; i < nouns.length; i++) {
                int dis = 0;
                for (int j = 0; j < nouns.length; j++) {
                    if (j != i) {
                        dis += wordNet.distance(nouns[i], nouns[j]);
                    }
                }
                if (dis > maxDis) {
                    maxDis = dis;
                    outcast = nouns[i];
                }
            }
    
            return outcast;
        }
    
        // see test client below
        public static void main(String[] args) {
            WordNet wordnet = new WordNet(args[0], args[1]);
            Outcast outcast = new Outcast(wordnet);
            for (int t = 2; t < args.length; t++) {
                In in = new In(args[t]);
                String[] nouns = in.readAllStrings();
                StdOut.println(args[t] + ": " + outcast.outcast(nouns));
            }
        }
    }
    
  • wordnet

    2016-01-18 17:00:46

    http://www.nltk.org/howto/wordnet.html

    WordNet Interface

    WordNet is just another NLTK corpus reader, and can be imported like this:
    Importing WordNet:

    from nltk.corpus import wordnet

    For more compact code, we recommend:

    from nltk.corpus import wordnet as wn

    Words

    Look up a word using synsets(); this function has an optional pos argument which lets you constrain the part of speech of the word:
    A synset is a set of synonyms. The function can optionally be restricted by part of speech, e.g. VERB or NOUN.

    Get the synsets for dog:
    wn.synsets('dog') # doctest: +ELLIPSIS +NORMALIZE_WHITESPACE
    [Synset('dog.n.01'), Synset('frump.n.01'), Synset('dog.n.03'), Synset('cad.n.01'),
    Synset('frank.n.02'), Synset('pawl.n.01'), Synset('andiron.n.01'), Synset('chase.v.01')]

    wn.synsets('dog', pos=wn.VERB)
    [Synset('chase.v.01')]

    The other parts of speech are NOUN, ADJ and ADV. A synset is identified with a 3-part name of the form: word.pos.nn:

    wn.synset('dog.n.01')
    Synset('dog.n.01')
    print(wn.synset('dog.n.01').definition())
    a member of the genus Canis (probably descended from the common wolf) that has been domesticated by man since prehistoric times; occurs in many breeds
    len(wn.synset('dog.n.01').examples())
    1
    print(wn.synset('dog.n.01').examples()[0])
    the dog barked all night
    wn.synset('dog.n.01').lemmas()
    [Lemma('dog.n.01.dog'), Lemma('dog.n.01.domestic_dog'), Lemma('dog.n.01.Canis_familiaris')]
    [str(lemma.name()) for lemma in wn.synset('dog.n.01').lemmas()]
    ['dog', 'domestic_dog', 'Canis_familiaris']
    wn.lemma('dog.n.01.dog').synset()
    Synset('dog.n.01')
    The WordNet corpus reader gives access to the Open Multilingual WordNet, using ISO-639 language codes.

    sorted(wn.langs())
    ['als', 'arb', 'cat', 'cmn', 'dan', 'eng', 'eus', 'fas',
    'fin', 'fra', 'fre', 'glg', 'heb', 'ind', 'ita', 'jpn', 'nno',
    'nob', 'pol', 'por', 'spa', 'tha', 'zsm']
    wn.synsets(b'\xe7\x8a\xac'.decode('utf-8'), lang='jpn')
    [Synset('dog.n.01'), Synset('spy.n.01')]
    wn.synset('spy.n.01').lemma_names('jpn')
    ['\u3044\u306c', '\u307e\u308f\u3057\u8005', '\u30b9\u30d1\u30a4', '\u56de\u3057\u8005',
    '\u56de\u8005', '\u5bc6\u5075', '\u5de5\u4f5c\u54e1', '\u5efb\u3057\u8005',
    '\u5efb\u8005', '\u63a2', '\u63a2\u308a', '\u72ac', '\u79d8\u5bc6\u635c\u67fb\u54e1',
    '\u8adc\u5831\u54e1', '\u8adc\u8005', '\u9593\u8005', '\u9593\u8adc', '\u96a0\u5bc6']
    wn.synset('dog.n.01').lemma_names('ita')
    ['cane', 'Canis_familiaris']
    wn.lemmas('cane', lang='ita')
    [Lemma('dog.n.01.cane'), Lemma('hammer.n.01.cane'), Lemma('cramp.n.02.cane'),
    Lemma('bad_person.n.01.cane'), Lemma('incompetent.n.01.cane')]
    sorted(wn.synset('dog.n.01').lemmas('dan'))
    [Lemma('dog.n.01.hund'), Lemma('dog.n.01.k\xf8ter'),
    Lemma('dog.n.01.vovhund'), Lemma('dog.n.01.vovse')]
    sorted(wn.synset('dog.n.01').lemmas('por'))
    [Lemma('dog.n.01.cachorro'), Lemma('dog.n.01.c\xe3es'),
    Lemma('dog.n.01.c\xe3o'), Lemma('dog.n.01.c\xe3o')]
    dog_lemma = wn.lemma(b'dog.n.01.c\xc3\xa3o'.decode('utf-8'), lang='por')
    dog_lemma
    Lemma('dog.n.01.c\xe3o')
    dog_lemma.lang()
    'por'
    len(wordnet.all_lemma_names(pos='n', lang='jpn'))
    66027
    Synsets

    Synset: a set of synonyms that share a common meaning.
    That is, a group of words that share the same meaning.
    dog = wn.synset('dog.n.01')
    dog.hypernyms()
    hypernyms() returns the hypernyms (more general concepts).
    [Synset('canine.n.02'), Synset('domestic_animal.n.01')]
    dog.hyponyms() # doctest: +ELLIPSIS
    hyponyms() returns the hyponyms (more specific concepts).
    [Synset('basenji.n.01'), Synset('corgi.n.01'), Synset('cur.n.01'), Synset('dalmatian.n.02'), ...]
    dog.member_holonyms()
    [Synset('canis.n.01'), Synset('pack.n.06')]
    dog.root_hypernyms()
    root_hypernyms() returns the hypernyms at the root of the taxonomy.
    [Synset('entity.n.01')]
    wn.synset('dog.n.01').lowest_common_hypernyms(wn.synset('cat.n.01'))
    [Synset('carnivore.n.01')]

    Each synset contains one or more lemmas, which represent a specific sense of a specific word.
    A lemma can be thought of as a word entry; each synset is made up of lemmas.

    Note that some relations are defined by WordNet only over Lemmas:

    good = wn.synset('good.a.01')
    good.antonyms()
    Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
    AttributeError: 'Synset' object has no attribute 'antonyms'
    good.lemmas()[0].antonyms()
    [Lemma('bad.a.01.bad')]
    The relations that are currently defined in this way are antonyms, derivationally_related_forms and pertainyms.
    antonyms() returns the antonyms.
    Lemmas

    eat = wn.lemma('eat.v.03.eat')
    eat
    Lemma('feed.v.06.eat')
    print(eat.key())
    eat%2:34:02::
    eat.count()
    4
    wn.lemma_from_key(eat.key())
    Lemma('feed.v.06.eat')
    wn.lemma_from_key(eat.key()).synset()
    Synset('feed.v.06')
    wn.lemma_from_key('feebleminded%5:00:00:retarded:00')
    Lemma('backward.s.03.feebleminded')
    for lemma in wn.synset('eat.v.03').lemmas():
    ...     print(lemma, lemma.count())

    Lemma('feed.v.06.feed') 3
    Lemma('feed.v.06.eat') 4
    for lemma in wn.lemmas('eat', 'v'):
    ...     print(lemma, lemma.count())

    Lemma('eat.v.01.eat') 61
    Lemma('eat.v.02.eat') 13
    Lemma('feed.v.06.eat') 4
    Lemma('eat.v.04.eat') 0
    Lemma('consume.v.05.eat') 0
    Lemma('corrode.v.01.eat') 0
    Lemmas can also have relations between them:

    vocal = wn.lemma('vocal.a.01.vocal')
    vocal.derivationally_related_forms()
    [Lemma('vocalize.v.02.vocalize')]
    vocal.pertainyms()
    [Lemma('voice.n.02.voice')]
    vocal.antonyms()
    [Lemma('instrumental.a.01.instrumental')]
    The three relations above exist only on lemmas, not on synsets.

    Verb Frames

    wn.synset('think.v.01').frame_ids()
    [5, 9]
    for lemma in wn.synset('think.v.01').lemmas():
    ...     print(lemma, lemma.frame_ids())
    ...     print(" | ".join(lemma.frame_strings()))

    Lemma('think.v.01.think') [5, 9]
    Something think something Adjective/Noun | Somebody think somebody
    Lemma('think.v.01.believe') [5, 9]
    Something believe something Adjective/Noun | Somebody believe somebody
    Lemma('think.v.01.consider') [5, 9]
    Something consider something Adjective/Noun | Somebody consider somebody
    Lemma('think.v.01.conceive') [5, 9]
    Something conceive something Adjective/Noun | Somebody conceive somebody
    wn.synset('stretch.v.02').frame_ids()
    [8]
    for lemma in wn.synset('stretch.v.02').lemmas():
    ...     print(lemma, lemma.frame_ids())
    ...     print(" | ".join(lemma.frame_strings()))

    Lemma('stretch.v.02.stretch') [8, 2]
    Somebody stretch something | Somebody stretch
    Lemma('stretch.v.02.extend') [8]
    Somebody extend something
    Similarity

    dog = wn.synset('dog.n.01')
    cat = wn.synset('cat.n.01')
    hit = wn.synset('hit.v.01')
    slap = wn.synset('slap.v.01')
    synset1.path_similarity(synset2): Return a score denoting how similar two word senses are, based on the shortest path that connects the senses in the is-a (hypernym/hyponym) taxonomy. The score is in the range 0 to 1. By default, there is now a fake root node added to verbs, so for cases where previously a path could not be found (and None was returned) it should return a value. The old behavior can be achieved by setting simulate_root to False. A score of 1 represents identity, i.e. comparing a sense with itself will return 1.

    dog.path_similarity(cat) # doctest: +ELLIPSIS
    0.2...
    hit.path_similarity(slap) # doctest: +ELLIPSIS
    0.142...
    wn.path_similarity(hit, slap) # doctest: +ELLIPSIS
    0.142...
    print(hit.path_similarity(slap, simulate_root=False))
    None
    print(wn.path_similarity(hit, slap, simulate_root=False))
    None
    synset1.lch_similarity(synset2): Leacock-Chodorow Similarity: Return a score denoting how similar two word senses are, based on the shortest path that connects the senses (as above) and the maximum depth of the taxonomy in which the senses occur. The relationship is given as -log(p/2d) where p is the shortest path length and d the taxonomy depth.

    dog.lch_similarity(cat) # doctest: +ELLIPSIS
    2.028...
    hit.lch_similarity(slap) # doctest: +ELLIPSIS
    1.312...
    wn.lch_similarity(hit, slap) # doctest: +ELLIPSIS
    1.312...
    print(hit.lch_similarity(slap, simulate_root=False))
    None
    print(wn.lch_similarity(hit, slap, simulate_root=False))
    None
    synset1.wup_similarity(synset2): Wu-Palmer Similarity: Return a score denoting how similar two word senses are, based on the depth of the two senses in the taxonomy and that of their Least Common Subsumer (most specific ancestor node). Note that at this time the scores given do not always agree with those given by Pedersen’s Perl implementation of Wordnet Similarity.

    The LCS does not necessarily feature in the shortest path connecting the two senses, as it is by definition the common ancestor deepest in the taxonomy, not closest to the two senses. Typically, however, it will so feature. Where multiple candidates for the LCS exist, that whose shortest path to the root node is the longest will be selected. Where the LCS has multiple paths to the root, the longer path is used for the purposes of the calculation.

    dog.wup_similarity(cat) # doctest: +ELLIPSIS
    0.857...
    hit.wup_similarity(slap)
    0.25
    wn.wup_similarity(hit, slap)
    0.25
    print(hit.wup_similarity(slap, simulate_root=False))
    None
    print(wn.wup_similarity(hit, slap, simulate_root=False))
    None
    wordnet_ic Information Content: Load an information content file from the wordnet_ic corpus.

    from nltk.corpus import wordnet_ic
    brown_ic = wordnet_ic.ic('ic-brown.dat')
    semcor_ic = wordnet_ic.ic('ic-semcor.dat')
    Or you can create an information content dictionary from a corpus (or anything that has a words() method).

    from nltk.corpus import genesis
    genesis_ic = wn.ic(genesis, False, 0.0)
    synset1.res_similarity(synset2, ic): Resnik Similarity: Return a score denoting how similar two word senses are, based on the Information Content (IC) of the Least Common Subsumer (most specific ancestor node). Note that for any similarity measure that uses information content, the result is dependent on the corpus used to generate the information content and the specifics of how the information content was created.

    dog.res_similarity(cat, brown_ic) # doctest: +ELLIPSIS
    7.911...
    dog.res_similarity(cat, genesis_ic) # doctest: +ELLIPSIS
    7.204...
    synset1.jcn_similarity(synset2, ic): Jiang-Conrath Similarity Return a score denoting how similar two word senses are, based on the Information Content (IC) of the Least Common Subsumer (most specific ancestor node) and that of the two input Synsets. The relationship is given by the equation 1 / (IC(s1) + IC(s2) - 2 * IC(lcs)).

    dog.jcn_similarity(cat, brown_ic) # doctest: +ELLIPSIS
    0.449...
    dog.jcn_similarity(cat, genesis_ic) # doctest: +ELLIPSIS
    0.285...
    synset1.lin_similarity(synset2, ic): Lin Similarity: Return a score denoting how similar two word senses are, based on the Information Content (IC) of the Least Common Subsumer (most specific ancestor node) and that of the two input Synsets. The relationship is given by the equation 2 * IC(lcs) / (IC(s1) + IC(s2)).

    dog.lin_similarity(cat, semcor_ic) # doctest: +ELLIPSIS
    0.886...
    Access to all Synsets

    Iterate over all the noun synsets:

    for synset in list(wn.all_synsets('n'))[:10]:
    ...     print(synset)

    Synset('entity.n.01')
    Synset('physical_entity.n.01')
    Synset('abstraction.n.06')
    Synset('thing.n.12')
    Synset('object.n.01')
    Synset('whole.n.02')
    Synset('congener.n.03')
    Synset('living_thing.n.01')
    Synset('organism.n.01')
    Synset('benthos.n.02')
    Get all synsets for this word, possibly restricted by POS:

    wn.synsets('dog') # doctest: +ELLIPSIS
    [Synset('dog.n.01'), Synset('frump.n.01'), Synset('dog.n.03'), Synset('cad.n.01'), ...]
    wn.synsets('dog', pos='v')
    [Synset('chase.v.01')]
    Walk through the noun synsets looking at their hypernyms:

    from itertools import islice
    for synset in islice(wn.all_synsets('n'), 5):
    ...     print(synset, synset.hypernyms())

    Synset('entity.n.01') []
    Synset('physical_entity.n.01') [Synset('entity.n.01')]
    Synset('abstraction.n.06') [Synset('entity.n.01')]
    Synset('thing.n.12') [Synset('physical_entity.n.01')]
    Synset('object.n.01') [Synset('physical_entity.n.01')]
    Morphy

    Look up forms not in WordNet, with the help of Morphy:

    wn.morphy('denied', wn.NOUN)
    print(wn.morphy('denied', wn.VERB))
    deny
    wn.synsets('denied', wn.NOUN)
    []
    wn.synsets('denied', wn.VERB) # doctest: +NORMALIZE_WHITESPACE
    [Synset('deny.v.01'), Synset('deny.v.02'), Synset('deny.v.03'), Synset('deny.v.04'),
    Synset('deny.v.05'), Synset('traverse.v.03'), Synset('deny.v.07')]
    Morphy uses a combination of inflectional ending rules and exception lists to handle a variety of different possibilities:

    print(wn.morphy('dogs'))
    dog
    print(wn.morphy('churches'))
    church
    print(wn.morphy('aardwolves'))
    aardwolf
    print(wn.morphy('abaci'))
    abacus
    print(wn.morphy('book', wn.NOUN))
    book
    wn.morphy('hardrock', wn.ADV)
    wn.morphy('book', wn.ADJ)
    wn.morphy('his', wn.NOUN)
    Synset Closures

    Compute transitive closures of synsets

    dog = wn.synset('dog.n.01')
    hypo = lambda s: s.hyponyms()
    hyper = lambda s: s.hypernyms()
    list(dog.closure(hypo, depth=1)) == dog.hyponyms()
    True
    list(dog.closure(hyper, depth=1)) == dog.hypernyms()
    True
    list(dog.closure(hypo))
    [Synset('basenji.n.01'), Synset('corgi.n.01'), Synset('cur.n.01'),
    Synset('dalmatian.n.02'), Synset('great_pyrenees.n.01'),
    Synset('griffon.n.02'), Synset('hunting_dog.n.01'), Synset('lapdog.n.01'),
    Synset('leonberg.n.01'), Synset('mexican_hairless.n.01'),
    Synset('newfoundland.n.01'), Synset('pooch.n.01'), Synset('poodle.n.01'), ...]
    list(dog.closure(hyper))
    [Synset('canine.n.02'), Synset('domestic_animal.n.01'), Synset('carnivore.n.01'),
    Synset('animal.n.01'), Synset('placental.n.01'), Synset('organism.n.01'),
    Synset('mammal.n.01'), Synset('living_thing.n.01'), Synset('vertebrate.n.01'),
    Synset('whole.n.02'), Synset('chordate.n.01'), Synset('object.n.01'),
    Synset('physical_entity.n.01'), Synset('entity.n.01')]
    Regression Tests

    Bug 85: morphy returns the base form of a word if its input is given as a base form for a POS for which that word is not defined:

    wn.synsets('book', wn.NOUN)
    [Synset('book.n.01'), Synset('book.n.02'), Synset('record.n.05'), Synset('script.n.01'), Synset('ledger.n.01'), Synset('book.n.06'), Synset('book.n.07'), Synset('koran.n.01'), Synset('bible.n.01'), Synset('book.n.10'), Synset('book.n.11')]
    wn.synsets('book', wn.ADJ)
    []
    wn.morphy('book', wn.NOUN)
    'book'
    wn.morphy('book', wn.ADJ)
    Bug 160: wup_similarity breaks when the two synsets have no common hypernym

    t = wn.synsets('picasso')[0]
    m = wn.synsets('male')[1]
    t.wup_similarity(m) # doctest: +ELLIPSIS
    0.631...
    t = wn.synsets('titan')[1]
    s = wn.synsets('say', wn.VERB)[0]
    print(t.wup_similarity(s))
    None
    Bug 21: “instance of” not included in LCS (very similar to bug 160)

    a = wn.synsets("writings")[0]
    b = wn.synsets("scripture")[0]
    brown_ic = wordnet_ic.ic('ic-brown.dat')
    a.jcn_similarity(b, brown_ic) # doctest: +ELLIPSIS
    0.175...
    Bug 221: Verb root IC is zero

    from nltk.corpus.reader.wordnet import information_content
    s = wn.synsets('say', wn.VERB)[0]
    information_content(s, brown_ic) # doctest: +ELLIPSIS
    4.623...
    Bug 161: Comparison between WN keys/lemmas should not be case sensitive

    k = wn.synsets("jefferson")[0].lemmas()[0].key()
    wn.lemma_from_key(k)
    Lemma('jefferson.n.01.Jefferson')
    wn.lemma_from_key(k.upper())
    Lemma('jefferson.n.01.Jefferson')
    Bug 99: WordNet root_hypernyms gives incorrect results

    from nltk.corpus import wordnet as wn
    for s in wn.all_synsets(wn.NOUN):
    ...     if s.root_hypernyms()[0] != wn.synset('entity.n.01'):
    ...         print(s, s.root_hypernyms())

    Bug 382: JCN Division by zero error

    tow = wn.synset('tow.v.01')
    shlep = wn.synset('shlep.v.02')
    from nltk.corpus import wordnet_ic
    brown_ic = wordnet_ic.ic('ic-brown.dat')
    tow.jcn_similarity(shlep, brown_ic) # doctest: +ELLIPSIS
    1...e+300
    Bug 428: Depth is zero for instance nouns

    s = wn.synset("lincoln.n.01")
    s.max_depth() > 0
    True
    Bug 429: Information content smoothing used old reference to all_synsets

    genesis_ic = wn.ic(genesis, True, 1.0)
    Bug 430: all_synsets used wrong pos lookup when synsets were cached

    for ii in wn.all_synsets(): pass
    for ii in wn.all_synsets(): pass
    Bug 470: shortest_path_distance ignored instance hypernyms

    google = wordnet.synsets("google")[0]
    earth = wordnet.synsets("earth")[0]
    google.wup_similarity(earth) # doctest: +ELLIPSIS
    0.1...
    Bug 484: similarity metrics returned -1 instead of None for no LCS

    t = wn.synsets('fly', wn.VERB)[0]
    s = wn.synsets('say', wn.VERB)[0]
    print(s.shortest_path_distance(t))
    None
    print(s.path_similarity(t, simulate_root=False))
    None
    print(s.lch_similarity(t, simulate_root=False))
    None
    print(s.wup_similarity(t, simulate_root=False))
    None
    Bug 427: “pants” does not return all the senses it should

    from nltk.corpus import wordnet
    wordnet.synsets("pants", 'n')
    [Synset('bloomers.n.01'), Synset('pant.n.01'), Synset('trouser.n.01'), Synset('gasp.n.01')]
    Bug 482: Some nouns not being lemmatised by WordNetLemmatizer().lemmatize

    from nltk.stem.wordnet import WordNetLemmatizer
    WordNetLemmatizer().lemmatize("eggs", pos="n")
    'egg'
    WordNetLemmatizer().lemmatize("legs", pos="n")
    'leg'
    Bug 284: instance hypernyms not used in similarity calculations

    wn.synset('john.n.02').lch_similarity(wn.synset('dog.n.01')) # doctest: +ELLIPSIS
    1.335...
    wn.synset('john.n.02').wup_similarity(wn.synset('dog.n.01')) # doctest: +ELLIPSIS
    0.571...
    wn.synset('john.n.02').res_similarity(wn.synset('dog.n.01'), brown_ic) # doctest: +ELLIPSIS
    2.224...
    wn.synset('john.n.02').jcn_similarity(wn.synset('dog.n.01'), brown_ic) # doctest: +ELLIPSIS
    0.075...
    wn.synset('john.n.02').lin_similarity(wn.synset('dog.n.01'), brown_ic) # doctest: +ELLIPSIS
    0.252...
    wn.synset('john.n.02').hypernym_paths() # doctest: +ELLIPSIS
    [[Synset('entity.n.01'), ..., Synset('john.n.02')]]
    Issue 541: add domains to wordnet

    wn.synset('code.n.03').topic_domains()
    [Synset('computer_science.n.01')]
    wn.synset('pukka.a.01').region_domains()
    [Synset('india.n.01')]
    wn.synset('freaky.a.01').usage_domains()
    [Synset('slang.n.02')]
    Issue 629: wordnet failures when python run with -O optimizations

    # Run the test suite with python -O to check this
    wn.synsets("brunch")
    [Synset('brunch.n.01'), Synset('brunch.v.01')]
    Issue 395: wordnet returns incorrect result for lowest_common_hypernyms of chef and policeman

    wn.synset('policeman.n.01').lowest_common_hypernyms(wn.synset('chef.n.01'))
    [Synset('person.n.01')]

  • Wordnet

    2008-12-03 21:50:00
      WordNet is an English lexical database, grounded in cognitive linguistics, designed jointly by psychologists, linguists, and computer engineers at Princeton University. Rather than merely listing words alphabetically, it organizes them into a "network of words" according to their meanings.

      · Among WordNet's many applications, perhaps the most ambitious is knowledge engineering (see chapters 15 and 16 of the book WordNet).
      · Harabagiu and Moldovan (see chapter 16 of the book WordNet) point out that modeling commonsense reasoning requires an extended knowledge base with a huge number of concepts and relations. WordNet supplies the former, but its relations are not sufficient to support inference. Their solution is to disambiguate the glosses in WordNet, deriving additional relations between words and thereby turning the glosses into a semantic network that also contains relations across parts of speech. They give an example: there is a path between hungry and refrigerator, because the two tagged words meet at the node food; through food, hungry and refrigerator can be linked and used for commonsense reasoning.
  • wordnet — Note: this repository is no longer maintained. For a standalone Python module for WordNet with a similar API, see . Installation: although the project is no longer maintained, you can ...>>> wordnet = WordNet(wordnet_30_dir) # Uses WordNet v3.0 to be comparable t
  • spaCy WordNet is a simple custom component for using , and together with . The component ties WordNet and WordNet domains together, allowing users to: get all synsets of a processed token, e.g. get all synsets (word senses) of the word bank. ...
  • wordnet — WordNet visualization
  • wordnet — WordNet dictionary library
  • wordnet: a pure Crystal implementation of Stanford NLP WordNet
  • WordNet in GF — This is an attempt to port to GF. Besides the pure English WordNet, we have also built a version for Swedish and Bulgarian. The WordNets for all other languages are bootstrapped from existing resources and tuned with statistical methods. For details, check: ...
  • WordNet database — This project packages the WordNet database files into a jar that can be depended on via a Maven dependency.
  • An introduction to WordNet

    2020-09-25 09:37:14
  • A brief introduction to WordNet

    2014-10-23 10:44:37
    The lexical organization of WordNet; relations between words in WordNet
  • WordNet-LMF-EN — WordNet Lexical Markup Framework (LMF): English (EN). About: this is a module, -licensed, of 163K words, containing [dynamically downloaded data files of the English WordNet 2020 ( )], which is based on , together with the corresponding SQLite database files, generated using...
  • WordNet Reduced Information Set (RIS): English (EN). About: this is a module, -licensed, of 156K words, containing data files downloaded on the fly from , in the Reduced Information Set (RIS) format, generated on the fly using the module. Installation: $ npm install wordnet-ris wordnet-ris-en ...
  • wordnet doc

    2012-10-10 23:27:54
    brief intro to wordnet structure
  • A survey of WordNet

    2013-08-03 11:08:32
    Introduces the basic structure of WordNet and how it works
  • WordNet visualization, on Heroku
  • WordNet3.0

    2013-08-24 10:15:36
    WordNet 3.0 is the latest version. WordNet is an English lexical database, grounded in cognitive linguistics, designed jointly by psychologists, linguists, and computer engineers at Princeton University. Rather than merely listing words alphabetically, it organizes them into a "network of words" according to their meanings.
  • WordNet-LMF — WordNet Lexical Markup Framework (LMF). About: this is a command-line interface (CLI) and an underlying application programming interface (API) for parsing -format files and importing the data into a compact SQLite database file. The motivation for this approach: a 100MB LMF XML consumes...
  • WordNet 2.0 lexicon

    2018-10-15 19:24:15
    WordNet 2.0 lexicon data
