Keynotes

14 True Facts About Knowledge Graphs

Abstract

Knowledge graphs have sprouted over the past decade, driving a wedge between traditional relational databases and IR-based indices. The fundamental change is modelling data as a graph with first-class relationships between entities: edges separate knowledge graphs from relational database tables, and strong node identities separate them from word-based IR indices. We present a survey of known knowledge graphs and the application classes they enable, including the LinkedIn Economic Graph. We discuss how graph databases differ from relational databases and IR indices, and where graph databases would benefit from IR techniques. We conclude with a set of challenges we have encountered in scaling knowledge graphs and graph databases for low-latency online applications.

Bio

Bogdan Arsintescu has worked with graphs and semantic data throughout his career: most recently at Google Knowledge Graph, leading the graph query language team; at Google Research, working on semantic trajectories using location data; and at LinkedIn, as a manager in the graph database team. He received his Ph.D. in CS from Technical University Delft, the Netherlands, and his MSc in EE from Politehnica University in Bucharest, Romania.

Deep Approaches to Semantic Text Matching

Abstract

Semantic matching is critical in many text applications, including paraphrase identification, information retrieval, and question answering. A variety of machine learning techniques, collectively referred to as “learning to match”, have been developed for semantic matching tasks. Recently, deep learning approaches have proven effective in this area, and a number of methods have been proposed. In this talk, I will discuss deep learning solutions to semantic matching at two levels: the word and the sentence. For word-level matching, I will discuss distributed word representations that bridge the semantic gap between different words. For sentence-level matching, I will discuss matching methods that capture proximity and text matching patterns. Potential applications and future directions of semantic text matching will also be discussed.

Bio

Dr. Jun Xu received his Ph.D. in Computer Science from Nankai University, China, in 2006. He then worked as an associate researcher, researcher, and senior researcher at Microsoft Research Asia and Huawei Noah’s Ark Lab. In 2014, he joined the Institute of Computing Technology, Chinese Academy of Sciences. Jun Xu’s research focuses on applying machine learning to information retrieval. He has published about 40 papers in top international journals and conferences, including TOIS, JMLR, SIGIR, and WWW. He is also active in the research community and has served or is serving top conferences and journals. For example, in 2017 he was a Senior PC member/Area Chair of SIGIR ’17 and ACML 2017; a PC member of KDD ’17, NIPS ’17, CIKM ’17, and WSDM ’17; and a reviewer for TOIS, JMLR, TKDE, and others.

From Facts to Acts: Knowledge Graphs for Personal Assistant

Abstract

The current generation of knowledge graphs (KGs) makes it easy to deliver answers to popular factoid questions, but provides weak support for more personalized, task-oriented assistance. There are general KGs and KGs associated with assistants, but they have quite different characteristics and are distinctly separate: the former are relatively well understood, stable, and focused on explicit factual knowledge; the latter are more volatile, focused on personal user state and implicit knowledge, and still largely being defined. How to interface the two is not well understood. This talk will present open research areas for both kinds of KGs in the assistant domain, including properties of their construction, representation, and inference.

Bio

Jeff Dalton is a Lecturer in the School of Computing at the University of Glasgow. Until recently he was a Software Engineer at Google, where projects included the Google Assistant Natural Language Understanding and automatic knowledge graph construction. He completed his PhD at the University of Massachusetts Amherst with James Allan in the Center for Intelligent Information Retrieval. His research focuses on the intersection of Information Retrieval and Natural Language Processing.

What People are Asking About You: Mining Entity Search Intents in CQA Sites

Abstract

In this work we propose a novel representation for named entities based on the questions people ask about them in a CQA site. The representation is composed of entity-related questions, answered by community members, that depict a meaningful search intent about the entity; we refer to these as Entity Search Intents (ESI). Based on the hypothesis that people ask similar questions about strongly related entities, we use the ESI representation for the task of entity relatedness estimation. Specifically, we estimate the relatedness between two entities based on the similarity between their associated search intents. Performance is evaluated by measuring the correlation of our approach with human relatedness judgments over a dataset of entity pairs. Our method proved highly effective for this task, achieving high correlation. In addition, we show that combining ESI-based relatedness with other entity similarity measures based on word embeddings significantly improves relatedness estimation accuracy. This is joint work with Hadas Raviv (Technion) and Idan Szpektor (Google).
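To make the evaluation loop above concrete, here is a minimal sketch of ESI-style relatedness estimation. The abstract does not say which similarity function or correlation statistic is used, so this sketch assumes a simple bag-of-words cosine similarity between the pooled questions of two entities and Spearman rank correlation against human scores. The function names (bag_of_words, esi_relatedness, spearman) and the toy questions are hypothetical illustrations, not the authors' method.

```python
import math
from collections import Counter


def bag_of_words(questions):
    """Pool an entity's questions into one term-frequency vector.

    Assumption: lowercase + whitespace tokenization; the talk's actual
    representation of search intents may differ.
    """
    counts = Counter()
    for q in questions:
        counts.update(q.lower().split())
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def esi_relatedness(questions_a, questions_b):
    """Hypothetical relatedness score: similarity of the two entities' question sets."""
    return cosine(bag_of_words(questions_a), bag_of_words(questions_b))


def rank(values):
    """1-based ranks, averaging ranks over ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the (tie-averaged) ranks."""
    rx, ry = rank(x), rank(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy) if sx and sy else 0.0


if __name__ == "__main__":
    # Toy, made-up question sets for three entities.
    esi = {
        "Paris": ["what to see in paris", "best museums in paris"],
        "Rome": ["what to see in rome", "best museums in rome"],
        "Python": ["how to install python packages"],
    }
    pairs = [("Paris", "Rome"), ("Paris", "Python"), ("Rome", "Python")]
    human = [0.8, 0.1, 0.1]  # made-up human relatedness judgments
    system = [esi_relatedness(esi[a], esi[b]) for a, b in pairs]
    print(system, spearman(system, human))
```

A production system would likely replace raw term counts with weighted or embedding-based question representations; the point here is only the two-step shape of the task: score entity pairs by intent similarity, then correlate those scores with human judgments.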

Bio

David is a Principal Research Scientist at Yahoo Research, Haifa, and an ACM Distinguished Engineer. David’s research focuses on search and content quality analysis in Web and Email, query performance prediction, entity search, and text mining. David has published more than 100 papers in IR and Web journals and conferences, serves on the editorial board of the IR journal, and has served as a senior PC member or Area Chair of many ACM conferences (SIGIR, WWW, WSDM, CIKM). He has organized a number of workshops and taught several tutorials at SIGIR and WWW. David is co-author of the book “Estimating the Query Difficulty for Information Retrieval”, published by Morgan & Claypool in 2010, and of the paper “Learning to estimate query difficulty”, which won the Best Paper Award at SIGIR 2005. David earned his PhD in Computer Science from the Technion, Israel Institute of Technology, in 1997.

Overview of and Lessons from the FEIII Challenges

Abstract

In 2014, the Office of Financial Research at the US Department of the Treasury approached NIST to explore how to set up information extraction challenge problems for the newly emerging field of computational finance. Led by Louiqa Raschid at the University of Maryland, we established the Financial Entity Identification and Information Integration (FEIII) Challenges. These challenges focused on linking entities across structured databases from different sources and on identifying entity roles and relationships between entities.

Bio

Dr. Ian Soboroff is a computer scientist and leader of the Retrieval Group at the National Institute of Standards and Technology (NIST). The Retrieval Group organizes the Text REtrieval Conference (TREC), the Text Analysis Conference (TAC), and the TREC Video Retrieval Evaluation (TRECVID). These are all large, community-based research workshops that drive the state of the art in information retrieval, video search, web search, information extraction, text summarization, and other areas of information access. He has co-authored many publications in information retrieval evaluation, test collection building, text filtering, collaborative filtering, and intelligent software agents. His current research interests include building test collections for social media environments and nontraditional retrieval tasks.

Semantic Search on Text and Knowledge Bases

Abstract

I will present an overview of our current research on semantic search on text and knowledge bases. By semantic search I mean search with meaning. Our goal is to make search as convenient as possible without losing transparency. One approach is to aid the user in incremental query construction. Another approach is to allow questions in free-form natural language and provide feedback on the interpretation of the question in an easily digestible form. I also believe that it is a good idea to combine text and knowledge bases. Structured information is best represented in a knowledge base (the semantics is then explicit in the data), but the bulk of our knowledge of the world will always be in the form of text. I will show many examples and demos.

Bio

Hannah Bast is a professor of computer science at the University of Freiburg, at the foot of the beautiful Black Forest in southwest Germany. She is a big fan of easy-to-use and powerful information systems of all kinds. One of the algorithms from her work is used for public transit routing on Google Maps. Her CompleteSearch engine powers the bibliography search on DBLP. She is convinced of the great potential of deep learning for natural language understanding. She believes that the world has more pressing problems than whether AI will eventually take over.