  • Conferences in the Information Retrieval Field

    2011-02-22 15:00:00

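    ACM Special Interest Group on Information Retrieval (SIGIR) annual conference

    European Conference on Information Retrieval (ECIR)

    Conference on Information and Knowledge Management (CIKM)

    Web Search and Data Mining conference (WSDM)

    TREC conference

    The classic references in information retrieval: the three books by Salton (1968; 1983) and van Rijsbergen (1979)

    van Rijsbergen: http://www.dcs.gla.ac.uk/Keith/Preface.html#PREFACE

    More recent books include Baeza-Yates and Ribeiro-Neto (1999) and Manning et al. (2008).

    Database conferences: VLDB and SIGMOD

    ACL and HLT (Association for Computational Linguistics and Human Language Technologies)

    "Search Engines: Information Retrieval in Practice" is worth reading when time allows; I don't know where to find an electronic copy.

    Common IR Test Collections

    http://web.eecs.utk.edu/research/lsi/corpa.html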

  • Information Retrieval Resources (A Guide to Information Retrieval)
    Organized by Hongfei Yan
    Last updated on Sept. 16, 2009
    
    ---------------------
    Contents
    	Books
    		+ Finding Out About: Search Engine Technology from a cognitive 
    			Perspective (Belew, R.K., 2000)
    			http://www-cse.ucsd.edu/~rik/foa/
    		+ Foundations of Statistical Natural Language Processing (C. Manning and H. Schutze, 1999)
    		+ Information Retrieval, 2nd edition (C.J. van Rijsbergen, 1979)
    			(full text)
    			http://www.dcs.gla.ac.uk/Keith/Preface.html
    		+ Information Retrieval: A Survey (Ed Greengrass, 2000)
    			http://www.csee.umbc.edu/cadip/readings/IR.report.120600.book.pdf
    		+ Information Retrieval: Data Structures & Algorithms
    			(Frakes, W. and Baeza-Yates, R., 1992)
    			http://www.dcc.uchile.cl/~rbaeza/iradsbook/irbook.html
    		+ Information Retrieval Interaction (Ingwersen, P., Taylor Graham, 1992)
    			http://www.db.dk/pi/iri/
    		+ Introduction to Information Retrieval
    			(Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schuetze, 2008)
    			http://www-csli.stanford.edu/~schuetze/information-retrieval-book.html
    		+ Managing Gigabytes: Compressing and Indexing Documents and Images,
    			2nd edition (Ian H. Witten, Alistair Moffat, and Timothy Bell, 1999)
    		+ Mining the Web: Discovering Knowledge from Hypertext Data 
    			(Soumen Chakrabarti, 2003)
    		+ Modeling the Internet and the Web: 
    			Probabilistic Methods and Algorithms 
    			(Pierre Baldi, Paolo Frasconi and Padhraic Smyth, 2003)
    		+ Modern Information Retrieval 
    			(Ricardo Baeza-Yates and Berthier Ribeiro-Neto, 1999)
    		+ Readings in Information Retrieval. 
    			(Sparck-Jones, K. and Willett, P., 1997)
    		+ Search Engines: Information Retrieval in Practice
    			(B. Croft, D. Metzler, T. Strohman, 2009)
    			http://www.pearsonhighered.com/croft1epreview/samples.html
    		+ Search Engine: Principle, Technology and Systems
    			搜索引擎-原理、技术与系统
    			(Xiaoming Li et al., 2005), (full text)
    			http://sewm.pku.edu.cn/book/dlbook.html
    		+ The Geometry of Information Retrieval 
    			(C.J. van Rijsbergen, 2004)
    			http://ir.dcs.gla.ac.uk/GeometryOfIR/
    		+ The Turn: Integration of Information Seeking and Retrieval in Context
    			(Ingwersen, P., and Jarvelin, K., 2005)
    		+ TREC: Experiment and Evaluation in Information Retrieval 
    			(Voorhees, E.M., and Harman, D.K., 2005)
    			http://mitpress.mit.edu/catalog/item/default.asp?ttype=2&tid=10667
    
    	Conferences and Workshops
    		+ CIKM: Conference on Information and Knowledge Management
    			http://www.csee.umbc.edu/cikm/
    		+ SIGIR: Special Interest Group on Information Retrieval
    			http://www.sigir.org/
    		+ SIGKDD: Knowledge Discovery and Data Mining
    			http://www.kdd.org/
    		+ World Wide Web
    			http://www.iw3c2.org/
    		+ SEWM: Symposium of Search Engine and WebMining
    			全国搜索引擎和网上信息挖掘学术研讨会
    			http://net.pku.edu.cn/~sewm/
    
    	Courses
    		+ CMU Information Retrieval
    			http://nyc.lti.cs.cmu.edu/classes/11-741/ (Spring 2006)
    			Instructors: Jamie Callan and Yiming Yang 
    		+ Cornell University The Structure of Information Networks (Spring 2006)
    			http://www.cs.cornell.edu/courses/cs685/2006sp/
    			Instructor: Jon Kleinberg
    		+ Peking University Web Based Information Architectures (Fall 2006)
    			http://net.pku.edu.cn/~wbia/
    			Instructors: Xiaoming Li, Jimin Wang and Bo Peng
    		+ Stanford Univ. Text Information Retrieval and Web Mining (Autumn 2005)
    			http://www.stanford.edu/class/cs276/
    			Instructors: Christopher Manning and Prabhakar Raghavan
    		+ UIUC Introduction to Text Information Systems (Spring 2007)
    			http://sifaka.cs.uiuc.edu/course/410s07/
    			Instructor: ChengXiang Zhai
    		+ UMass Univ. Information retrieval course (Spring 2005)
    			http://ciir.cs.umass.edu/cmpsci646/
    			Instructor: James Allan
    		+ Washington Univ. Search Engines course
    			http://courses.washington.edu/lis544/
    
    	Evaluation Resources
    		+ CLEF: Cross-Language Evaluation Forum
    			http://clef.iei.pi.cnr.it/
    		+ CWIRF: Chinese Web Information Retrieval Forum
    			http://www.cwirf.org/
    		+ DUC: Document Understanding Conferences
    			http://duc.nist.gov/
    		+ INEX: INitiative for the Evaluation of XML Retrieval
    			http://inex.is.informatik.uni-duisburg.de/
    		+ NTCIR: NII-NACSIS Test Collection for IR Systems
    			http://research.nii.ac.jp/ntcir/
    		+ TREC: Text REtrieval Conference 
    			http://trec.nist.gov/
    
    	Journals
    		+ Briefings in Bioinformatics (full text)
    			http://bib.oxfordjournals.org/archive/
    		+ Computational Linguistics, The MIT Press
    			http://mitpress.mit.edu/catalog/item/default.asp?ttype=4&tid=10
    		+ Data & Knowledge Engineering (DKE), Elsevier
    			http://www.elsevier.com/wps/find/journaldescription.cws_home/505608/description?navopenmenu=-2
    		+ D-Lib Magazine
    			http://www.dlib.org/
    		+ Information Processing Letters, Elsevier
    			http://www.elsevier.com/locate/issn/00200190
    		+ Information Processing and Management (IP&M), Elsevier
    			http://www.elsevier.com/locate/infoproman
    		+ Information Retrieval, Springer
    			http://www.springer.com/sgw/cda/frontpage/0,11855,3-0-70-35744790-detailsPage%253Djournal%257Cdescription%257Cdescription,00.html
    		+ Information Research
    			http://informationr.net/ir
    		+ International Journal on Digital Libraries, Springer
    			http://link.springer.de/link/service/journals/00799/index.htm
    		+ International Journal of Cooperative Information Systems (IJCIS), 
    			World Scientific
    			http://ejournals.wspc.com.sg/ijcis/ijcis.shtml
    		+ International Journal on Document Analysis and Recognition, Springer
    			http://link.springer.de/link/service/journals/10032/index.htm
    		+ International Journal of Intelligent Systems, Wiley
    			http://www3.interscience.wiley.com/cgi-bin/jhome/36062
    		+ International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems (IJUFKS), World Scientific
    			http://ejournals.wspc.com.sg/ijufks/ijufks.shtml
    		+ Journal of the American Society for Information Science and Technology (JASIST), Wiley
    			http://www3.interscience.wiley.com/cgi-bin/jhome/76501873
    		+ Journal of Documentation (JDoc), Emerald
    			http://www.emeraldinsight.com/0022-0418.htm
    		+ Journal of Intelligent Information Systems (JIIS), Springer
    			http://www.wkap.nl/journalhome.htm/0925-9902
    		+ Knowledge and Information Systems (KAIS), Springer
    			http://link.springer.de/link/service/journals/10115/index.htm
    		+ Natural Language Engineering, Cambridge University Press
    			http://www.cambridge.org/journals/journal_catalogue.asp?mnemonic=NLE
    		+ Transactions On Information Systems (TOIS), ACM
    			http://www.acm.org/tois/
    		+ Transactions on Knowledge and Data Engineering (TKDE), IEEE 
    			http://www.computer.org/tkde/
    
    	List Archives
    		+ SIG-IRList, http://www.sigir.org/sigirlist/index.html
    
    	Organizations and Special Interest Groups
    		+ Cambridge NLIP, http://www.cl.cam.ac.uk/Research/NL/
    		+ CMU LTI, http://www.lti.cs.cmu.edu/
    		+ DEC laboratories in Palo Alto, Calif.
    		+ Glasgow Information Retrieval Group, http://www.dcs.gla.ac.uk/ir/
    		+ Google Labs, http://labs.google.com/
    		+ Massachusetts CIIR, http://ciir.cs.umass.edu/
    		+ MSR Asia, Web Search & Data Mining Group
    			http://research.microsoft.com/wsm/
    		+ Stanford InfoLab, http://infolab.stanford.edu/
    		+ UIUC Information Retrieval Group, http://sifaka.cs.uiuc.edu/ir/
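    		+ PKU Tianwang Group, http://sewm.pku.edu.cn/
    		+ Institute of Computational Linguistics, Peking University, http://icl.pku.edu.cn/
    		+ Fudan University Information Retrieval and Natural Language Processing Group, 
    			http://www.cs.fudan.edu.cn/mcwil/irnlp/
    		+ HIT Information Retrieval Group, http://ir.hit.edu.cn/
    		+ State Key Laboratory of Intelligent Technology and Systems, Tsinghua University
    			http://www.csai.tsinghua.edu.cn/ 
    		#+ CAS Large-Scale Content Computing Group, http://159.226.40.18/ (site unreachable)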
    
    	Researchers
    		+ Andrew McCallum,
    			http://www.cs.umass.edu/~mccallum/
    		+ ChengXiang Zhai, developing Lemur
    			http://www-faculty.cs.uiuc.edu/~czhai/
    		+ Gerard Salton
    			http://www.cs.cornell.edu/Info/Department/Annual95/Faculty/Salton.html
    		+ Karen Sparck Jones, developing IDF
    			http://www.cl.cam.ac.uk/users/ksj/
    		+ Keith van Rijsbergen
    			http://www.dcs.gla.ac.uk/~keith/
    		+ Jamie Callan, 
    			http://www.cs.cmu.edu/~callan/
    		+ Jon Kleinberg, developing HITS
    			http://www.cs.cornell.edu/home/kleinber/
    		+ Li Xiaoming, developing Tianwang & Infomall
    		+ Nick Craswell, developing Terabyte Track
    			http://research.microsoft.com/~nickcr
    		+ Susan Dumais, developing LSI
    			http://research.microsoft.com/~sdumais/
    		+ Yiming Yang, developing text categorization
    			http://www.cs.cmu.edu/~yiming/
    		+ Stephen Robertson, 
    			http://research.microsoft.com/users/robertson/
    		+ Tefko Saracevic
    			http://www.scils.rutgers.edu/~tefko/
    		+ W. Bruce Croft
    			http://ciir.cs.umass.edu/personnel/croft.html
    
    	Research-related Resources
    		+ http://www-faculty.cs.uiuc.edu/~czhai/research.html
    
    	Software
    		+ Apache Lucene: a full-featured text search engine library
    			http://lucene.apache.org/java/docs/index.html
    		+ Gate: a general architecture for text engineering
    			http://gate.ac.uk/
    		+ Lemur: A full-text search engine
    			http://www.lemurproject.org/
    		+ MG: A full-text search engine
    			http://www.math.utah.edu/pub/mg/
    		+ Porter Stemmer: English stemming algorithm
    			http://www.tartarus.org/martin/PorterStemmer/
    		+ Nutch: an open source web search engine
    			http://sourceforge.net/projects/nutch/
    		+ TSE: A Tiny Search Engine
    			http://sewm.pku.edu.cn/src/TSE/
    
    ---------------------
    References: 
    [1] Information Retrieval Resources, http://www.sigir.org/resources.html
    [2] http://ir.dcs.gla.ac.uk/resources.html
    [3] http://www.cs.cmu.edu/~callan/Teaching/Resources.html
    [4] Diekemar, Information Retrieval Links, Jan. 28, 1999. 
    	http://web.syr.edu/~diekemar/ir.html
    [5] Chen Hongbiao, Studying Information Retrieval Online, Nov. 1999. 
    	http://159.226.40.18/freshman/resources/网上研习信息检索.doc
    [6] Data Mining Research Institute, http://www.dmresearch.net/
    [7] Speech and Natural Language Online, http://www.snlpinfo.com/index.php
    [8] PKU SEWM Group, http://sewm.pku.edu.cn/
    [9] http://www.cs.cmu.edu/~callan/Teaching/Resources.html
    [10] http://icl.pku.edu.cn/member/lisujian/maincontent.htm
    [11] http://www.cs.fudan.edu.cn/mcwil/irnlp/link.htm
    [12] Robert Krovetz, A Guide to the Literature of Information Retrieval,
    	http://159.226.40.18/freshman/resources/guide-to-ir-lit.ps
    [13] ACM Digital Library, 
    	http://portal.acm.org/portal.cfm
    	http://acm.lib.tsinghua.edu.cn/acm/
    [14] http://www.sigir.org/proceedings/Proc-Browse.html
    [15] SIGIR,
    	http://portal.acm.org/browse_dl.cfm?linked=1&part=series&idx=SERIES278&coll=portal&dl=ACM&CFID=72474811&CFTOKEN=69288563
    [16] WWW, International World Wide Web Conference
    	http://portal.acm.org/browse_dl.cfm?linked=1&part=series&idx=SERIES968&coll=portal&dl=ACM&CFID=72474811&CFTOKEN=69288563
    [17] China Digital Journal Community, http://wanfang.calis.edu.cn/wf/szhqk/index.html
    
    
    
    ---------------------
    
    More details are listed as follows
    ====================
    CIIR 
    (The Center for Intelligent Information Retrieval, 
    University of Massachusetts, USA)
    http://ciir.cs.umass.edu/
    
    The Center for Intelligent Information Retrieval, a National Science 
    Foundation-created S/IUCRC Center, is one of the leading information retrieval 
    research labs in the world. The CIIR develops tools that provide effective 
    and efficient access to large, heterogeneous, distributed, text and 
    multimedia databases.
    
    CIIR accomplishments include significant research advances in the areas of 
    distributed information retrieval, information filtering, topic detection, 
    multimedia indexing and retrieval, document image processing, terabyte 
    collections, data mining, summarization, resource discovery, interfaces 
    and visualization, and cross-lingual information retrieval.
    
    The Center for Intelligent Information Retrieval continues to support the 
    emerging information infrastructure, both through research and technology 
    transfer. The goal of the CIIR is to develop tools that provide effective 
    and efficient access to large, heterogeneous, distributed, text and 
    multimedia databases. 
    
    ====================
    Glasgow Information Retrieval Group
    http://www.dcs.gla.ac.uk/ir/
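    The information retrieval research group at the University of Glasgow, UK, led by
    Keith van Rijsbergen. The group gives equal weight to theory and practice and aims
    to build efficient, novel, and successful multimedia information retrieval systems
    for end users.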
    
    The Information Retrieval Group led by Professor Keith van Rijsbergen has a 
    vigorous programme of research, based on both theory and experiment, aimed at 
    giving end-users novel, effective, and efficient access to the world of 
    multi-media information. The group, part of the Department of Computing Science, 
    University of Glasgow, has a strong research history in a wide area of 
    information retrieval research from theoretical modelling of the retrieval 
    process to advanced system building and to the user-oriented evaluation of 
    information retrieval systems. The group's interests also include many areas 
    of Web information retrieval such as link analysis, summarisation and the 
    development of novel interaction techniques (e.g., ostension, implicit feedback 
    and graphical visualisation). Our research preserves a strong emphasis on 
    the evaluation of interactive IR systems, and the group maintains strong links 
    with researchers in Human-Computer Interaction and Psychology.
    
    ------
    Keith van Rijsbergen, http://www.dcs.gla.ac.uk/~keith/
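    University of Glasgow, UK. A leading representative of the logical-inference school
    of probabilistic IR, he published the classic IR textbook Information Retrieval,
    which focuses on probabilistic approaches to information retrieval.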
    
    =====================
    Cambridge NLIP Group 
    (Natural Language and Information Processing Group)
    http://www.cl.cam.ac.uk/Research/NL/
    
    Research in NLIP has been done in the Computer Laboratory for nearly fifty years. 
    The earliest work, by Roger Needham and Karen Sparck Jones, was on automatic 
    thesaurus construction, in the context of document retrieval and machine translation. 
    Subsequent research by Karen Sparck Jones during the 1960s and 70s focused on 
    statistical approaches to retrieval and included innovative work on term 
    weighting.  From the later 1970s research in language processing developed, 
    with work on syntax, semantics and discourse processing.
    
    ------
    Karen Sparck Jones, http://www.cl.cam.ac.uk/users/ksj/
    Karen Sparck Jones has been one of the most influential figures in Computing 
    since the 1950s. Her work on Information Retrieval and Natural Language Processing 
    has never been so central as it is today, with its implications for 
    search engine technology, the semantic web and even bioinformatics.
    
    In 1972, Karen Sparck Jones published in the Journal of Documentation the paper 
    which defined the term weighting scheme now known as inverse document frequency (IDF).
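
As a rough illustration of the idea behind inverse document frequency, the sketch below computes one common textbook form, idf(t) = log(N / df(t)), where N is the number of documents and df(t) is the number of documents containing term t. The function name, the toy documents, and the choice of natural logarithm are assumptions made for this example; it is not necessarily the exact formulation used in the 1972 paper.

```python
import math
from collections import Counter


def inverse_document_frequency(documents):
    """Return {term: idf} for a list of tokenized documents.

    Uses the common textbook form idf(t) = log(N / df(t)); this is an
    illustrative variant, not necessarily the 1972 paper's exact formula.
    """
    n_docs = len(documents)
    doc_freq = Counter()
    for tokens in documents:
        # Count each term at most once per document.
        doc_freq.update(set(tokens))
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}


# Hypothetical toy collection, already tokenized.
docs = [
    ["information", "retrieval", "systems"],
    ["information", "theory"],
    ["retrieval", "evaluation", "information"],
]
idf = inverse_document_frequency(docs)
```

In this toy collection, a term that occurs in every document ("information") gets weight zero, while a term that occurs in a single document ("theory") gets the largest weight. That is the intuition behind the scheme: rare terms discriminate between documents better than common ones.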
    
    Karen Sparck Jones is emeritus Professor of Computers and Information at the 
    Computer Laboratory, University of Cambridge. She has worked in automatic 
    language and information processing research since the late fifties, 
    and has many publications including several books, most recently `Evaluating 
    Natural Language Processing Systems' with Julia Galliers, and `Readings in 
    Information Retrieval', edited with Peter Willett. 
    
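    Winner of the 1988 Salton Award and another founder of the modern probabilistic IR
    model. She has made notable contributions to NLP, IR, and related fields, and has
    also done a great deal of organizational work. She currently works at the Computer
    Laboratory, University of Cambridge, UK.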
    
    ====================
    LTI
    CMU (Carnegie Mellon University) Language Technologies Institute,
    http://www.lti.cs.cmu.edu/
    
    The Language Technologies Institute (LTI) of the School of Computer Science at
    Carnegie Mellon University conducts research and provides graduate education
    in all aspects of language technology and information management. The LTI was
    established in 1996, as an expansion of the Center for Machine Translation
    (CMT).
    
    The Center for Machine Translation (CMT) was a research branch of the School
    of Computer Science devoted to basic and applied research in all aspects of
    natural language processing, with a primary focus on machine translation,
    speech processing, and information retrieval. Containing a unique mix of
    academic and industrial researchers specializing in various aspects of
    computer science, artificial intelligence, computational linguistics and
    theoretical linguistics, the CMT provided a rich and diverse environment for
    collaboration among faculty, staff, visiting scholars, and qualified students.
    
    ------
    Lemur Toolkit
    Lemur is a collection of search engine algorithms and information retrieval
    applications used for IR research, development and education. Lemur provides a
    rich query language that supports search against simple texts, structured
    (XML) texts, and texts annotated with part-of-speech, named-entity, and other
    annotations used in NLP and text-mining applications. Lemur's search engines
    comfortably support collections ranging from a few gigabytes to a few
    terabytes of text. The software is distributed under open-source license, and
    is used widely in the IR research community.
    
    ====================
    Stanford InfoLab
    http://infolab.stanford.edu/
    
    The Stanford WebBase Project
    http://dbpubs.stanford.edu:8091/~testbed/doc2/WebBase/
    
    The Stanford WebBase project is investigating various issues in crawling,
    storage, indexing, and querying of large collections of Web pages. The project
    builds on the previous Google activity that was part of the DLI1 initiative.
    The DLI2 WebBase project aims to build the necessary infrastructure to
    facilitate the development and testing of new algorithms for clustering,
    searching, mining, and classification of Web content.
    ====================
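    PKU Tianwang Group, http://sewm.pku.edu.cn/
    
        The Peking University Network Laboratory has worked on search engine research
    and system development since 1997. It has built up deep technical expertise, and
    its overall strength and academic influence have consistently led the field in
    China. The "Tianwang" search engine system we developed is the most influential
    campus-born search engine in China and has run continuously since October 1997.
    Tianwang has relatively strong advantages in incremental search, fast retrieval,
    and massive-scale information storage. Its continued development has trained
    successive cohorts of students with hands-on experience in processing massive
    volumes of web text, and they are widely welcomed by Chinese and foreign IT
    companies.
        Since 2001, building on its search engine technology, the group has collected
    and archived the history of the Chinese Internet, forming the "Chinese Internet
    Information Museum". To date it holds 2 billion Chinese web pages that appeared at
    different times, making it currently the largest historical web page archive and
    replay system in China. We have also tried interdisciplinary research on top of it.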
    
    ====================
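    CAS Large-Scale Content Computing Group
    http://159.226.40.18/
    
        The information retrieval team focuses on text retrieval research, has taken
    part in TREC many times, and has achieved good research results. The Tianluo
    retrieval system developed by the team is widely used in many important national
    information departments. Its main current research directions include Web
    information acquisition and Web information retrieval.
        The information analysis team focuses on the analysis and mining of
    large-scale, multi-source, heterogeneous information, mainly text classification
    and clustering, information filtering, personalized services, natural language
    question answering, and shallow natural language processing. The team has built a
    series of experimental platforms for text information processing, which can
    currently be demonstrated through the "Demos" link on its home page. Notably, the
    team runs an open-source initiative whose high-performance word segmentation
    system, ICTCLAS, is widely recognized and used by researchers.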
    
    ====================
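    Fudan University Information Retrieval and Natural Language Processing Group, 
    http://www.cs.fudan.edu.cn/mcwil/irnlp/ 	
    
    Large-scale text processing research studies techniques and methods for processing
    natural language (especially Chinese). It covers two areas. The first is
    foundational work, mainly basic theory and algorithms, including automatic word
    segmentation, unknown word recognition, part-of-speech and concept tagging,
    syntactic parsing, and semantic analysis, as well as corpus collection and
    curation. The second is applied Chinese information processing technology,
    including automatic indexing, text retrieval, text summarization, text
    classification, and text filtering, particularly their application in networked
    environments. This second area is the research focus of the text direction.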
    
    ====================
    HIT-IRLab, http://ir.hit.edu.cn/
    
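        The HIT Information Retrieval Laboratory (HIT-IRLab) was founded in March 2001.
    Its research directions include text retrieval, question answering, automatic
    summarization, text mining, and language analysis. The lab treats language analysis
    as its basic research and text filtering as its applied research. It treats
    information extraction as the extension of language analysis from sentence
    understanding to discourse understanding, and sentence retrieval as an intelligent,
    precise retrieval technique supported by language analysis and discourse
    understanding.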
    
    ====================
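    SIGIR (ACM Special Interest Group on Information Retrieval)
    TREC (Text REtrieval Conference, held annually)
    MUC (Message Understanding Conference, held annually)
    TIPSTER (the IR practice program of the U.S. Defense Advanced Research Projects Agency)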
    
    ====================
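    Institute of Computational Linguistics, Peking University
    http://icl.pku.edu.cn/
    
        The Institute of Computational Linguistics at Peking University was founded in
    1986. It is devoted to research in three areas: computational linguistics theory,
    foundational resources for language information processing, and applied technology.
        Centered on computational linguistics and natural language processing, its work
    covers three main directions. The first is research on and construction of
    foundational resources: computational lexicography and machine dictionaries,
    comprehensive language knowledge bases, corpus linguistics and corpus processing
    technology, and terminology, automatic term extraction, and term standardization.
    The second is basic theory and NLP models and methods: foundations of computational
    linguistics, core NLP technologies, modern Chinese grammar, lexical, syntactic, and
    semantic analysis of Chinese, statistical NLP models, and information-theoretic
    methods for language processing. The third is applied technology: methods,
    techniques, and system implementation for machine translation, information
    retrieval and extraction, evaluation methods and techniques for natural language
    information processing systems, controlled Chinese and its assisted-writing
    systems, and computer-aided research on classical Chinese poetry.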
    
    ====================
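    State Key Laboratory of Intelligent Technology and Systems, Tsinghua University
    http://www.csai.tsinghua.edu.cn/ 
    
        The State Key Laboratory of Intelligent Technology and Systems is hosted by
    Tsinghua University. It opened to outside researchers in February 1990. It mainly
    conducts basic and applied basic research on the fundamental principles and methods
    of artificial intelligence, including intelligent information processing, machine
    learning, intelligent control, and neural network theory. It also researches
    AI-related applied technology and system integration, chiefly intelligent robots
    and the processing of sound, graphics, images, text, and language.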
    
    ================
    Susan Dumais, 
    http://research.microsoft.com/~sdumais/
    
    I am interested in algorithms and interfaces for improved information
    retrieval, as well as general issues in human-computer interaction. I
    joined Microsoft Research in July 1997. I work on a wide variety of
    information access and management issues, including: personal information
    management, web search, question answering, information retrieval, text
    categorization, collaborative filtering, interfaces for improved search and
    navigation, and user/task modeling.
    
    Prior to coming to Microsoft, I worked on a statistical method for
    concept-based retrieval known as Latent Semantic Indexing. You can find
    pointers to this work on the Bellcore (now Telcordia) LSI page. 
    
    ===============
    UIUC Information Retrieval Group
    http://sifaka.cs.uiuc.edu/ir/
    
    The Information Retrieval (IR) group is part of the Database and Information
    Systems (DAIS) Lab  of the Computer Science Department at University of
    Illinois at Urbana-Champaign. We work on a wide spectrum of problems in the
    general area of text information management, including  retrieval,
    organization, filtering , and mining of textual information, aiming at
    developing advanced text information management techniques and systems that
    help people make better use of text information.
    
    ------
    ChengXiang Zhai, 
    http://www-faculty.cs.uiuc.edu/~czhai/
    
    Research Interests: Information Retrieval, Text Mining, Natural Language
    Processing, Bioinformatics
    
    ChengXiang Zhai, of the University of Illinois at Urbana-Champaign, is recognized for
    his work on user-centered, adaptive intelligent information access. His
    techniques are expected to improve search-engine performance, support better
    information organization and enable understanding of large volumes of
    information. Zhai's work in information retrieval is expected to enhance
    curricula and provide new educational tools for the growing information
    technology workforce.
    
    ===============
    Stephen Robertson, 
    http://research.microsoft.com/users/robertson/
    
    Stephen Robertson joined Microsoft Research Cambridge in April 1998.
    
    In 1998, he was awarded the Tony Kent STRIX award by the Institute of
    Information Scientists. In 2000, he was awarded the Salton Award by ACM SIGIR.
    He is a Fellow of Girton College, Cambridge.
    
    At Microsoft, he runs a group called Information Retrieval and Analysis, which
    is concerned with core search processes such as term weighting, document
    scoring and ranking algorithms, and combination of evidence from different
    sources. These are studied theoretically through the use of formal models,
    mainly statistical, and statistical methods including machine learning
    methods, and experimentally, through activities such as the Text Retrieval
    Conference (TREC) and with internally generated evaluation sets. The group
    (with its Keenbow evaluation environment) has had some excellent results at
    TREC. The group works closely with product groups to transfer ideas and
    techniques.
    
    His main research interests are in the design and evaluation of retrieval
    systems. He is the author, jointly with Karen Sparck Jones, of a probabilistic
    theory of information retrieval, which has been moderately influential. A
    further development of that model, with Stephen Walker, led to the term
    weighting and document ranking function known as Okapi BM25, which is used in
    many experimental text retrieval systems.
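
To make the term weighting and document ranking function mentioned above more concrete, here is a minimal sketch of a BM25-style scorer. The function name, the toy documents, the defaults k1=1.2 and b=0.75, and the non-negative IDF variant log(1 + (N - df + 0.5) / (df + 0.5)) are all assumptions chosen for illustration; published descriptions and implementations of Okapi BM25 differ in these details.

```python
import math
from collections import Counter


def bm25_scores(query, documents, k1=1.2, b=0.75):
    """Score each tokenized document against a tokenized query.

    A BM25-style sketch: k1 controls term-frequency saturation and b
    controls document-length normalization (both values are assumed).
    """
    n_docs = len(documents)
    avg_len = sum(len(tokens) for tokens in documents) / n_docs
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in documents:
        tf = Counter(tokens)
        # Longer-than-average documents are penalized when b > 0.
        length_norm = 1 - b + b * len(tokens) / avg_len
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            df = doc_freq[term]
            # Non-negative IDF variant (an assumption of this sketch).
            idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * length_norm)
        scores.append(score)
    return scores


# Hypothetical toy collection, already tokenized.
docs = [
    ["okapi", "bm25", "ranking"],
    ["term", "weighting", "ranking", "ranking"],
    ["document", "scoring"],
]
scores = bm25_scores(["ranking"], docs)
```

With these toy documents, the third document does not contain the query term and scores zero. The second scores highest because the term occurs twice, even though the document is longer than average and so is penalized by the length normalization.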
    
    Prior to joining Microsoft, he was at City University London, where he retains
    a part-time position as Professor of Information Systems in the Department of
    Information Science. He was Head of Department for eight years,
    during which time it achieved the highest possible rating in two successive
    research assessment exercises. He also started the Centre for Interactive
    Systems Research, the main research vehicle of which is the Okapi text
    retrieval system, which has also done well at TREC.
    
    Before joining City, he was a research fellow at University College London,
    where he took his PhD in the School of Library Archive and Information
    Studies. Before that he was in the research department at Aslib. He has an MSc
    in Information Science from City and a first degree in mathematics from
    Cambridge. 
    
    ===================
    Nick Craswell
    http://research.microsoft.com/~nickcr
    
    I am an associate researcher at Microsoft Research Cambridge, in the
    Information Retrieval and Analysis Group.
    
    Research Overview
    
    I am interested in Web search evaluation, mostly on enterprise-scale webs but
    also the World Wide Web. I built the VLC, VLC2, WT2g and .GOV test
    collections, which have been made available to research groups around the
    world. David Hawking and I coordinated the TREC Web Track experiments. I am
    currently involved in the TREC Terabyte Track and Enterprise Track. Some
    publications: Book chapter preprint (pdf), IR'01 (citeseer) and CSIRO'01
    (pdf).
    
    I also work on effective Web search, which means making use of information in
    pages, link structure and URL structure to generate more useful Web search
    results. Some papers: SIGIR'05 (pdf), SIGIR'01 (pdf), TOIS'03 (pdf) (copying
    is by permission of ACM, Inc.) and ADCS'03 (pdf).
    
    My PhD was in distributed information retrieval (thesis pdf) which means
    building a system on top of multiple engines/databases that already exist. My
    recent work in the area has considered whether (or when) DIR is really
    practical. Some papers: ADC'99 (ps), DL'00 (pdf), ADC'03 (pdf) and ADC'04
    (pdf). 
    
    ===============
    Web Search & Data Mining Group of MSR Asia
    http://research.microsoft.com/wsm/
    
    The goal of the Web Search & Data Mining Group of MSR Asia is to drive the
    next generation of Web search by leveraging data mining, machine learning, and
    knowledge discovery techniques for information analysis, organization,
    retrieval, and visualization. In addition, in contrast with current Web search
    methods, which essentially do document-level ranking and retrieval, the Web
    Search & Data Mining Group has created search at the object level to bring
    increased knowledge and intelligence to users.
    
    A Glimpse at Several Core Innovations:
    
    Large-scale Experimental Web Search Platform
    
    The Web Search & Data Mining Group is creating a large scale search platform
    to efficiently store, parse, index and search billions of Web pages and other
    types of documents. The search platform is flexible enough to allow for
    testing of various state-of-the-art search techniques that have been created
    at the lab using new technologies.
    
    Structuralizing the Web
    
    The biggest challenge facing both users and search engines over the next
    several decades is the continued unstructured growth of the Internet. As such,
    search functions that can effectively and efficiently dig out
    machine-understandable information and knowledge layers from unorganized and
    unstructured Web data will be the key to supporting relevant search results.
    To meet this challenge, the group is exploring technologies, namely Web
    information extraction, deep Web mining, and Web structure mining that can
    automatically classify structures and extract objects from the Web. The
    information and knowledge gathered using these new techniques greatly improves
    the performance of current Web search and even facilitates the creation of
    more sophisticated next generation search technologies.
    
    Vertical Search
    
    Today's conventional search engines can be described as page-level search
    engines whose main function is to rank web pages according to their relevance
    to a given query. Driving the future of the search industry are functions that
    delve deeper into vertical domains to provide knowledge and intelligence to
    query results. At MSR Asia, the Web Search & Data Mining Group is addressing
    the greatest challenges faced by vertical search including large scale web
    classification, object-level information extraction, object identification and
    integration, and object relationship mining and ranking. The results of these
    efforts are leading to more advanced search engines that deliver intelligence
    and insight to search results.
    
    Mobile Search
    
    The explosive growth of new computing devices such as handheld computers,
    Windows Mobile-based PocketPCs, and SmartPhones is driving demand for greater
    and more efficient information access. These devices, which leverage the power
    of the Web and allow greater access to information than ever before, are still
    not capable of performing at the level of a desktop PC. At MSR Asia, the Web
    Search & Data Mining Group is inventing new technologies to improve the mobile
    search and browsing experience and deliver the capabilities of a PC to users
    of these new devices. Project initiatives include developing innovative
    presentation schemes and user interfaces to facilitate search and browsing
    tasks on mobile devices and developing context aware search technologies to
    address the special information needs of mobile users.
    
    Multimedia Search
    
    The Web Search & Data Mining Group is conducting research into new
    technologies that index multimedia content such as images, videos, and audio.
    Through content analysis and advanced visualization techniques, the group is
    transforming today's conventional text based search engines to include
    multimedia content thus delivering more intelligent search results to users.
    For example, the group recently developed a new multimedia news reader which
    mines large archival news databases presenting text, map information, images,
    and background music within a unique user interface providing readers with a
    more efficient news search engine and a more enjoyable reading experience.
    
    ------
    Wei-Ying Ma
    http://research.microsoft.com/users/wyma/
    
    Senior Researcher, Research Manager, Microsoft Research Asia
    
    Dr. Wei-Ying Ma received the B.S. degree in electrical engineering from the
    National Tsing Hua University in Taiwan in 1990, and the M.S. and Ph.D.
    degrees in electrical and computer engineering from the University of
    California at Santa Barbara in 1994 and 1997, respectively. From 1994 to 1997
    he was engaged in the Alexandria Digital Library (ADL) project at UCSB while
    completing his Ph.D. He developed a web-based image retrieval system called
    Netra which has been frequently cited by other researchers and is regarded as
    one of the most representative image retrieval systems. From 1997 to 2001, he
    was with HP Labs where he worked in the field of multimedia adaptation and
    distributed media services infrastructure. He joined Microsoft Research Asia
    in 2001. Since then, he has been leading a research group to conduct research
    in the areas of information retrieval, web search, data mining, mobile
    browsing, and multimedia management. He currently serves as an Editor for the
    ACM/Springer Multimedia Systems Journal and Associate Editor for ACM
    Transactions on Information Systems (TOIS). He has served on the organizing and
    program committees of many international conferences including ACM Multimedia,
    ACM SIGIR, ACM CIKM, WWW, ICME, CVPR, SPIE Multimedia Storage and Archiving
    Systems, SPIE Multimedia Communication and Networking, etc. He is also the
    general co-chair of the International Multimedia Modeling (MMM) Conference 2005
    and International Conference on Image and Video Retrieval (CIVR) 2005. He has
    published 5 book chapters and over 100 international journal and conference
    papers.
    
    ====================
    Google Labs
    http://labs.google.com/
    
    Google Labs is a playground for Google engineers and adventurous Google users.
    Google staffers with wild and crazy ideas post their prototypes on Google Labs
    and solicit feedback on how the technology could be used or improved. None of
    these experiments are guaranteed to make it onto Google.com, as this is really
    the first phase in the development process. Google users with a desire to jump
    over the cutting edge are invited to check out any or all of the posted
    prototypes and send their comments directly to the Googlers who developed
    them. Please, remember to wear your safety goggles while using this site.
    
    Labs.google.com, Google's technology playground.
    Google Labs showcases a few of our favorite ideas that aren't quite ready for
    prime time. Your feedback can help us improve them. Please play with these
    prototypes and send your comments directly to the Googlers who developed them. 
    
    Want to learn more about Google technology? Here are some papers.
    http://labs.google.com/papers/index.html
    
    Passionate about these topics? You should work at Google.
    algorithms, artificial intelligence, compiler optimization,
    computer architecture, computer graphics,
    data compression, data mining, file system design,
    genetic algorithms, information retrieval,
    machine learning, natural language processing, operating systems,
    profiling, robotics, 
    text processing, user interface design,
    web information retrieval, and more! 
    
    http://www.google.com/press/podium.html
    Google Press Center: The Google Podium
    Here you'll find a selection of public presentations made by Google
    executives. From time to time, we will continue to add transcripts, audio or
    video clips and links to presentations hosted elsewhere.
    
    ====================
    Jon Kleinberg
    http://www.cs.cornell.edu/home/kleinber/
    
    Professor of Computer Science, Cornell University
    
    My research is concerned with algorithms that exploit the combinatorial
    structure of networks and information. My recent work has included
    * link analysis and modeling of the World Wide Web and related information networks;
    * discrete optimization and network algorithms; and
    * algorithmic approaches to clustering, indexing, and data mining. 
    ====================
    

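    Recently, at the 13th International Conference on Web Search and Data Mining (WSDM 2020), which concluded in Houston, USA, a joint team led by the Huawei Cloud Speech and Semantics Innovation Lab, with students from South China University of Technology, Huazhong University of Science and Technology, Jiangnan University, and Wuhan University, won the Gold Medal in the "citation intent recognition" task of WSDM Cup 2020.

    WSDM is regarded as one of the most influential and authoritative conferences in information retrieval. It focuses on search and data mining on the Web and social networks, with particular attention to search and data mining models, algorithm design and analysis, industrial applications, and experimental analysis aimed at improving accuracy and effectiveness. This year's conference is the 13th WSDM.

    This article describes the award-winning solution in detail.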

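    1. Background

    For centuries, social and technological progress has depended on candid scholarly exchange among scientists. New discoveries and theories are openly distributed and discussed in published articles, and influential contributions are usually recognized by the research community through citations. However, as competition for research funding intensifies, more and more people treat academic research as a way to compete for resources rather than purely as a way to advance knowledge. Some authors are "forced" to cite articles from particular journals in order to raise those journals' impact factors, and paper reviewers likewise push up citation counts or h-indexes. Such behavior offends the high standard of integrity expected of scientists and engineers; left unchecked, it may undermine public trust and hinder the future development of science and technology. For this reason, one of the WSDM Cup 2020 tasks focuses on identifying authors' citation intent: participants are asked to build a system that recognizes the citation intent of a given passage in a scholarly article and retrieves the relevant content.

    The Huawei Cloud Speech and Semantics Innovation Lab has full-stack expertise in natural language processing, covering word segmentation and syntactic parsing (NLP fundamentals), sentiment analysis, text classification, and semantic matching (natural language understanding), natural language generation, dialogue bots, and knowledge graphs. The technique most relevant to this competition is semantic matching. After analyzing the task, the Xiong team designed a "recall + rerank + ensemble" pipeline: lightweight text-similarity methods (such as BM25) recall candidate papers, pretrained deep language models such as BERT rerank them, and model fusion combines the results.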

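    2. Task Description

    The competition provides a paper library (about 800,000 papers) together with description paragraphs, which are taken from the passages in papers that introduce related work. Participants must match each description paragraph with its three most relevant papers.

    Example:

    Description:

    An efficient implementation based on BERT [1] and graph neural network (GNN) [2] is introduced.

    Related papers:

    [1] BERT: Pre-training of deep bidirectional transformers for language understanding.

    [2] Relational inductive biases, deep learning, and graph networks.

    Evaluation scheme: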

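    3. Data Analysis

    The task provides more than 800,000 candidate papers, more than 60,000 training samples, and more than 30,000 test samples. Each candidate paper has six fields: paper_id, title, abstract, journal, keyword, and year. Each training sample has three fields: description_id, paper_id, and description_text. The test data provides only description_id and description_text, and the corresponding paper_id must be matched.

    We computed length statistics for the titles and abstracts of the candidate papers and for the description texts, as shown in Figure 1. The figure shows that the texts are fairly long. For the single models used later, increasing the maximum sequence length from 300 to 512 improved performance by about 1%.

    Figure 1. Length distributions of candidate-paper titles (a), abstracts (b), and description texts (c)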

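    4. Overall Solution

    The overall architecture of our solution is shown in Figure 2. It has four parts: data processing, candidate paper recall, candidate paper reranking, and model fusion.

    Figure 2. Overall solution architecture (part of the figure is taken from [5])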

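    4.1 Data Processing

    Inspecting the data, we found that many of the description texts provided by the task are identical and differ only in the position of the citation marker. In other words, within the same article, different sentences cite different papers. We therefore extract the sentence at the citation marker as a new description sentence and use it to generate the candidate set.

    As shown in Table 1, we take the sentence preceding [[**##**]] in the description as the key sentence.

    Table 1. Key sentence generation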
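The key-sentence step can be sketched as follows. This is a minimal illustration, not the team's code: the function name `extract_key_sentence`, the regex-based sentence splitting, and the fallback when no marker is present are all assumptions. Only the `[[**##**]]` marker and the idea of keeping the sentence before it come from the text above.

```python
import re

# Citation marker used in the competition's description texts.
MARKER = "[[**##**]]"


def extract_key_sentence(description: str, marker: str = MARKER) -> str:
    """Return the sentence that immediately precedes the citation marker.

    Hypothetical helper: falls back to the whole description (marker removed)
    when the marker is missing or nothing precedes it.
    """
    head, found, _ = description.partition(marker)
    if found:
        # Naive sentence split on ., ! or ? followed by whitespace.
        sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", head) if s.strip()]
        if sentences:
            return sentences[-1]
    return description.replace(marker, "").strip()


text = ("Deep models are popular. BERT has been applied to retrieval "
        "[[**##**]]. Other work uses graphs.")
key = extract_key_sentence(text)
```

A production version would likely need a more robust sentence splitter, since abbreviations such as "et al." break the naive regex.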

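    4.2 Candidate Paper Recall

    As shown in Figure 3, we use BM25 and TF-IDF to recall papers. The top 80 papers recalled by BM25 and the top 20 recalled by TF-IDF are merged (as a union) to form the final recall set.

    Figure 3. Recall diagram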
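The recall step described in Section 4.2 can be sketched in pure Python. The 80/20 cut-offs and the union come from the text; everything else is an assumption made for illustration: whitespace tokenization, the BM25 parameters `k1=1.5` and `b=0.75`, the Lucene-style IDF, cosine-scored TF-IDF, and all function names. At the scale of 800,000 papers an inverted index or a dedicated library would be needed instead of this brute-force scoring.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumed tokenizer: lowercase + whitespace split.
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every document against the query with Okapi BM25."""
    toks = [tokenize(d) for d in docs]
    n = len(toks)
    avgdl = (sum(len(d) for d in toks) / n) or 1.0
    df = Counter(t for d in toks for t in set(d))
    scores = []
    for d in toks:
        tf = Counter(d)
        s = 0.0
        for t in tokenize(query):
            if t in tf:
                idf = math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))
                s += idf * tf[t] * (k1 + 1) / (tf[t] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores


def tfidf_scores(query, docs):
    """Cosine similarity between TF-IDF vectors of the query and each document."""
    toks = [tokenize(d) for d in docs]
    n = len(toks)
    df = Counter(t for d in toks for t in set(d))
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}

    def vec(tokens):
        return {t: c * idf[t] for t, c in Counter(tokens).items() if t in idf}

    q = vec(tokenize(query))
    qn = math.sqrt(sum(v * v for v in q.values()))
    scores = []
    for d in toks:
        dv = vec(d)
        dn = math.sqrt(sum(v * v for v in dv.values()))
        dot = sum(w * dv.get(t, 0.0) for t, w in q.items())
        scores.append(dot / (qn * dn) if qn and dn else 0.0)
    return scores


def top_k(scores, k):
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


def recall_candidates(query, docs, k_bm25=80, k_tfidf=20):
    """Union of the BM25 top-k and TF-IDF top-k document indices, order-preserving."""
    if not docs:
        return []
    merged, seen = [], set()
    for i in top_k(bm25_scores(query, docs), k_bm25) + top_k(tfidf_scores(query, docs), k_tfidf):
        if i not in seen:
            seen.add(i)
            merged.append(i)
    return merged


docs = [
    "bert language model pretraining",
    "graph neural network relational inductive bias",
    "protein folding in yeast cells",
]
candidates = recall_candidates("graph neural network", docs, k_bm25=1, k_tfidf=1)
```

Because the two rankers overlap heavily, the union usually contains fewer than 100 papers per description.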

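    4.3 Candidate Paper Reranking

    In this solution we use BERT as the base model. BERT is a semantic representation model that performs well on query-based passage reranking. Inspecting the data, we found that the papers mainly belong to the biomedical domain, so we focused on pretrained models trained on biomedical data. The query and the description field are then fed into the BERT model as a sentence pair for training. Our experiments show that on this task a single BioBERT outperforms BERT by 5 percentage points. Figure 4 shows the BioBERT architecture.

    Figure 4. BioBERT architecture (figure from [6])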
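The data flow of the reranking step can be sketched as below, with the model kept behind a callable so the example runs without any deep learning library. Several things here are assumptions rather than details from the article: that the second element of each pair is the candidate's title plus abstract, the names `build_pairs`, `rerank`, and `overlap_scorer`, and the use of a word-overlap scorer as a stand-in for a fine-tuned BioBERT sentence-pair classifier.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Pair = Tuple[str, str]


def build_pairs(description: str, candidates: Sequence[Dict[str, str]]) -> List[Pair]:
    """Pair the description with each candidate's text (title + abstract assumed)."""
    return [
        (description, f"{c.get('title', '')} {c.get('abstract', '')}".strip())
        for c in candidates
    ]


def rerank(
    description: str,
    candidates: Sequence[Dict[str, str]],
    score_pairs: Callable[[List[Pair]], List[float]],
    top_n: int = 3,
) -> List[str]:
    """Return the paper_ids of the top_n candidates by relevance score."""
    scores = score_pairs(build_pairs(description, candidates))
    order = sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)
    return [candidates[i]["paper_id"] for i in order[:top_n]]


def overlap_scorer(pairs: List[Pair]) -> List[float]:
    """Stand-in scorer (Jaccard word overlap); a real system would call the model here."""
    out = []
    for a, b in pairs:
        ta, tb = set(a.lower().split()), set(b.lower().split())
        out.append(len(ta & tb) / (len(ta | tb) or 1))
    return out


papers = [
    {"paper_id": "p1", "title": "BERT pretraining", "abstract": "language model"},
    {"paper_id": "p2", "title": "Graph neural network", "abstract": "relational inductive bias"},
    {"paper_id": "p3", "title": "Yeast", "abstract": "protein folding"},
]
top3 = rerank("graph neural network", papers, overlap_scorer)
```

Swapping `overlap_scorer` for a function that tokenizes each pair and returns the model's positive-class probability would turn this skeleton into the reranker the section describes.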

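    4.4 Model Fusion

    For model fusion we used nine pretrained models of six types, all trained on scientific and biomedical corpora: BioBERT_v1.1 × 3, BioBERT_v1.0_PubMed_PMC × 2, BioBERT_v1.0_PubMed × 1, BioBERT_v1.0_PMC × 1, BioBERT_dish × 1, and SciBERT × 1. Their single-model performance on this task is shown in Table 2.

    Table 2. Single-model performance

    We then blended the probability outputs of the single models, as shown in Figure 5, to obtain the final result, which is about 1 percentage point better than the best single model.

    Figure 5. Model fusion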
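The article does not say how the probabilities were blended, so the sketch below assumes the simplest variant, a weighted average of each model's per-candidate probability. The function name `blend` and the optional `weights` argument are hypothetical.

```python
from typing import List, Optional, Sequence


def blend(prob_lists: Sequence[Sequence[float]],
          weights: Optional[Sequence[float]] = None) -> List[float]:
    """Weighted average of per-candidate probabilities from several models.

    prob_lists[m][i] is model m's relevance probability for candidate i.
    With weights=None every model contributes equally.
    """
    n_models = len(prob_lists)
    if weights is None:
        weights = [1.0] * n_models
    total = sum(weights)
    n_items = len(prob_lists[0])
    return [
        sum(w * probs[i] for w, probs in zip(weights, prob_lists)) / total
        for i in range(n_items)
    ]


# Two hypothetical models scoring the same two candidates.
blended = blend([[0.2, 0.8], [0.4, 0.6]])
```

The blended scores are then sorted per description, and the three highest-scoring papers are submitted.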

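    5. Summary and Outlook

    This article introduced the key techniques used in the competition: data processing, candidate paper recall and reranking, and model fusion. In the competition, pretrained models trained on domain-specific data performed considerably better than general-domain pretrained models. Because of the competition's time limit, many methods remained untested. For example, the imbalance between positive and negative samples hurt model training; sensible upsampling or downsampling could bring the samples closer to balance and improve training.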
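The downsampling idea mentioned above, which the authors say they did not get to try, could look like the following. This is purely illustrative: the `(text, label)` sample layout, the `ratio` parameter, and the function name are assumptions.

```python
import random
from typing import List, Tuple

Sample = Tuple[str, int]  # (text, label) with label 1 = relevant, 0 = not relevant


def downsample_negatives(samples: List[Sample], ratio: float = 1.0, seed: int = 0) -> List[Sample]:
    """Keep all positives and at most ratio * len(positives) randomly chosen negatives."""
    positives = [s for s in samples if s[1] == 1]
    negatives = [s for s in samples if s[1] == 0]
    keep = min(len(negatives), int(len(positives) * ratio))
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    balanced = positives + rng.sample(negatives, keep)
    rng.shuffle(balanced)
    return balanced


data = [("relevant pair", 1)] + [(f"irrelevant pair {i}", 0) for i in range(10)]
balanced = downsample_negatives(data, ratio=2.0)
```

With 100 recalled candidates and only one correct paper per description, a ratio in the low single digits would already cut the training set substantially.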

    References

    [1] Yang W, Zhang H, Lin J. Simple applications of BERT for ad hoc document retrieval[J]. arXiv preprint arXiv:1903.10972, 2019.

    [2] Gupta V, Chinnakotla M, Shrivastava M. Retrieve and re-rank: A simple and effective IR approach to simple question answering over knowledge graphs[C]//Proceedings of the First Workshop on Fact Extraction and VERification (FEVER). 2018: 22-27.

    [3] Peters M E, Neumann M, Iyyer M, et al. Deep contextualized word representations[J]. arXiv preprint arXiv:1802.05365, 2018.

    [4] Radford A, Wu J, Child R, et al. Language models are unsupervised multitask learners[J]. OpenAI Blog, 2019, 1(8): 9.

    [5] Devlin J, Chang M W, Lee K, Toutanova K. BERT: Pre-training of deep bidirectional transformers for language understanding[J]. arXiv preprint arXiv:1810.04805, 2018.

    [6] Lee J, Yoon W, Kim S, Kim D, Kim S, So C H, Kang J. BioBERT: a pre-trained biomedical language representation model for biomedical text mining[J]. Bioinformatics, 2019.

    [7] Beltagy I, Lo K, Cohan A. SciBERT: A pretrained language model for scientific text[J]. arXiv preprint arXiv:1903.10676, 2019.

    [8] Nogueira R, Cho K. Passage re-ranking with BERT[J]. arXiv preprint arXiv:1901.04085, 2019.

    [9] Alsentzer E, Murphy J R, Boag W, et al. Publicly available clinical BERT embeddings[J]. arXiv preprint arXiv:1904.03323, 2019.

    Source: Huawei Cloud Community

  • A Guide to Information Retrieval (repost), reposted from http://net.pku.edu.cn/~webg/IR-Guide.txt A Guide to Information Retrieval, Organized by Hongfei Yan, Last updated on April 19, 2006 ---------...

    A Guide to Information Retrieval (repost)


    Reposted from http://net.pku.edu.cn/~webg/IR-Guide.txt

    信息检索领域相关资料 (A Guide to Information Retrieval)
    Organized by Hongfei Yan
    Last updated on April 19, 2006

    ---------------------
    Contents
    Books
    + Finding Out About: Search Engine Technology from a cognitive
    Perspective (Belew, R.K., 2000)
    http://www-cse.ucsd.edu/~rik/foa/
    + Foundations of Statistical Natural Language Processing (C. Manning and H. Schutze, 1999)
    + Information Retrieval, 2nd edition (C.J. van Rijsbergen, 1979)
    (full text)
    http://www.dcs.gla.ac.uk/Keith/Preface.html
    + Information Retrieval: A Survey (Ed Greengrass, 2000)
    http://www.csee.umbc.edu/cadip/readings/IR.report.120600.book.pdf
    + Information Retrieval: Data Structures & Algorithms
    (Frakes,

    Reposted from: https://www.cnblogs.com/cy163/archive/2007/05/21/754869.html

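  • Paper link: ... The typical Information Retrieval (IR) problem is: given some query terms (query), return a ranked list of documents (documents). The scope of IR applications, however, extends to document retrieval, web search, recommender systems, QA systems, and...
  • This is the first time I have come across an application of linear algebra beyond solving systems of linear equations, namely its use in information retrieval, so I am recording it here. Suppose the database contains the following books. B1. Applied Linear Algebra B2. Elementary Linear Algebra B3. Elementary Linear ...
  • Four directions in the field of information retrieval

    1,000+ views 2007-08-07 11:31:00
    I learned this during my internship at the Chinese Academy of Sciences from Mr. Cheng, who supervised my internship and has worked in information retrieval for more than ten years. 1. Third- and fourth-generation search engines, that is, so-called structured, human-centered search. 2. Collaborative search, P2P-like search. (Look it up online.) 3. Deep search and mining 4. Information...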
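  • Recommended books in the field of information retrieval

    1,000+ views 2013-04-23 17:23:11
    River's comment: The books here cover every aspect of information retrieval, natural language processing, machine learning, pattern recognition, and data mining, and each of them deserves in-depth reading, study, and discussion. I can therefore only comment, from my own understanding, on a few books that I know relatively well or that readers rate highly...
  • A Guide to Information Retrieval, Organized by Hongfei Yan, Last updated on July 27, 2007 --------------------- Contents Books + Finding Out About: Search Engine Technology from
  • Recommended books in the field of information retrieval [repost]

    1,000+ views 2014-10-08 14:24:45
    River's comment: The books here cover every aspect of information retrieval, natural language processing, machine learning, pattern recognition, and data mining, and each of them deserves in-depth reading, study, and discussion. I can therefore only comment, from my own understanding, on a few books that I know relatively well or that readers rate highly...
  • On February 15, at the 12th International Conference on Web Search and Data Mining (WSDM 2019), which concluded in Melbourne, Australia, Chen Weilong of the Wolong Big Data AI team took first place in the WSDM Cup challenge, which is...
  • A Guide to Information Retrieval, Organized by Hongfei Yan, Last updated on July 27, 2007 http://sewm.pku.edu.cn/IR-Guide.txt Contents Books + Finding Out About: Search Engine Technology from a cognitive .....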
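  • Accuracy, precision, and recall [2] are important concepts and metrics in information retrieval, artificial intelligence, and search engine design. The Chinese translations of these metrics vary, so using the English terms is generally recommended. Concept introduction ...
  • To address the problems that traditional retrieval techniques cannot provide personalized services and retrieve inefficiently, a personalized text information retrieval model based on domain ontology is proposed. The structure and key algorithms of the model are described, and the feasibility of the algorithms is verified. Experimental results show that the domain-ontology-based personalized text...
  • Reference: Information Retrieval (MOOC), 2014, Wuhan University, Prof. Huang Ruhua. Information literacy: the ability to seek, find, and decipher information; information retrieval: finding the needed information in a collection of information ...
  • Essential knowledge for understanding papers in information retrieval and web data mining: a summary of the models and techniques commonly used in papers in information retrieval and web data (WWW, SIGIR, CIKM, WSDM, ACL, EMNLP, etc.). Introduction: for PhD students in this field, being able to read papers is the entry...
  • Research on semantic information retrieval based on domain ontology

    1,000+ views 2009-01-03 16:56:00
    Research on semantic information retrieval based on domain ontology (Ma Wenhu, Department of Information Management, Nanjing University of Science and Technology) Contents Introduction... 1 1 Overview of information retrieval and ontology... 1 1.1 Information retrieval... 1 1.1.1 The concept of information retrieval... 1 1.1.2 Information retrieval models... 2 1.1.3 Information retrieval techniques... 2 1.1.4 ...
  • Information retrieval evaluation is the activity of assessing the performance of an information retrieval system (mainly its ability to satisfy users' information needs). Evaluation makes it possible to judge the strengths and weaknesses of different techniques and the effect of different factors on the system, and so keeps raising the level of research in the field. The goal of an information retrieval system is, with less consumption...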
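  • Domain knowledge base construction and information retrieval system

    1,000+ views 2019-05-07 17:38:13
    By configuring and extending web crawler tools, material from the industry domain is fetched automatically, or domain-related material is uploaded through the corpus management module, to form a corpus. The information extraction module and the information denoising module are then called to extract the content of the pdf, doc, ppt, html, excel, txt, and patent files in the corpus...
  • Analyzes research results of recent years in multimedia information retrieval, summarizes the current state of research, points out the directions in which the field is developing, and finally presents the challenges facing research on multimedia information retrieval technology.
  • Practical techniques for retrieving and using scientific and technical information in chemistry and chemical engineering; high-impact papers: sort by citation count in descending order; ESI highly cited papers; most recently published papers; article-level usage metrics: usage count; narrowing down to papers in a related field; refine results: Web of Science categories; review articles; refine results: document...
  • A summary of the models and techniques commonly used in papers in information retrieval and web data (WWW, SIGIR, CIKM, WSDM, ACL, EMNLP, etc.). Introduction: for PhD students in this field, being able to read papers is the research foundation for entering the field and learning what everyone is working on, and usually we would read a book. Reading a book...
  • Original post: Technical foundations for papers in information retrieval and web data mining. Author: 北武飘风. A summary of the models and techniques commonly used in papers in information retrieval and web data (WWW, SIGIR, CIKM, WSDM, ACL, EMNLP, etc.). Introduction: for PhD students in this field, being able to read...
  • Data mining and information retrieval

    2012-09-07 06:38:53
    The task in information retrieval is to find individual records using a database management system, or to find specific web pages through an Internet search engine. Data mining, in contrast, is an indispensable part of knowledge discovery: the process of turning raw data into useful information. Information retrieval mainly relies on...
  • A summary of the models and techniques commonly used in papers in information retrieval and web data (WWW, SIGIR, CIKM, WSDM, ACL, EMNLP, etc.). Introduction: for PhD students in this field, being able to read papers is the research foundation for entering the field and learning what everyone is working on, and usually we would read a book. Reading a book is of course...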
