Enabling Machines to Understand Human Language by Knowledge Graphs

Can machines think like humans?

http://gdm.fudan.edu.cn/GDMWiki/attach/Yanghuaxiao/Language%20Understanding.pdf

Language is the tool of thinking


It is the ability to speak and understand language that distinguishes us from animals.
Enabling machines to understand human language is the essential path to intelligent information processing and a smart robot brain.

Obstacles to machine language understanding
• Language understanding by machines requires knowledge bases that are
• Large scale
• Semantically rich
• Friendly in structure
• Traditional knowledge representations cannot satisfy these requirements
• Ontologies
• Semantic networks
• Texts

Knowledge Graph

• A knowledge graph is a large-scale semantic network consisting of entities/concepts as well as the semantic relationships among them
• Higher coverage of entities and concepts
• Richer semantic relationships
• Usually organized as RDF
• Quality assurance by crowdsourcing
• Why knowledge graphs?
• Understanding the semantics of text needs background knowledge
• A robot brain needs a knowledge base to understand the world
• YAGO, WordNet, Freebase, Probase, NELL, Cyc, DBpedia…

 

How can we enable machines to understand human language with knowledge graphs?

Understanding Human Languages
• Understanding a concept/category (IJCAI2016)
• Understanding a set of entities (under review)
• Understanding a bag of words (IJCAI2015)
• Understanding verb phrases (AAAI2016)
• Understanding short texts (EMNLP2016)
• Understanding natural language questions (VLDB2017, IJCAI2016)
• Inference of missing facts (AAAI2017)

 

Language Cognitive Ability
• Conceptualization
• Newton -> scientist
• Association
• Microsoft -> Bill Gates
• Inference
• Man has a brain, a brain can think -> man can think
• Induction
• Ceremony, bride, rose -> wedding
• Categorization
• Sex=man, Marriage status=unmarried -> bachelor

 

 

Probase and Probase+

Extracted by Hearst patterns
• NP such as NP, NP, …, and|or NP
• such NP as NP,* or|and NP
• NP, NP*, or other NP
• NP, NP*, and other NP
• NP, including NP,* or|and NP
• NP, especially NP,* or|and NP
Examples:
• domestic animals such as cats and dogs …
• China is a developing country.
• Life is a box of chocolate.
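Two of the Hearst patterns above can be sketched as simple regular expressions. This is a toy illustration only: a noun phrase is approximated as an arbitrary text span, whereas a real pipeline (e.g. Probase's) chunks noun phrases first.

```python
import re

# "concept such as instance, instance, and/or instance"
SUCH_AS = re.compile(r"^(.+?) such as (.+)$")
# "instance, instance, and other concept"
AND_OTHER = re.compile(r"^(.+?),? and other (.+)$")

def split_items(text):
    """Split an enumeration like 'cats, dogs and wolves' into items."""
    return [s.strip() for s in re.split(r",\s*|\s+(?:and|or)\s+", text) if s.strip()]

def extract_isa(sentence):
    """Return (instance, concept) pairs found by the two Hearst patterns."""
    m = SUCH_AS.match(sentence)
    if m:  # concept comes first
        concept, items = m.group(1), m.group(2)
        return [(i, concept) for i in split_items(items)]
    m = AND_OTHER.match(sentence)
    if m:  # instances come first
        items, concept = m.group(1), m.group(2)
        return [(i, concept) for i in split_items(items)]
    return []

print(extract_isa("domestic animals such as cats and dogs"))
# -> [('cats', 'domestic animals'), ('dogs', 'domestic animals')]
```

Note that the last two example sentences above ("China is a developing country", "Life is a box of chocolate") use the copular "is a" pattern instead, which is far noisier: the second sentence is a metaphor, not a true isA fact.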

 

 

 

DBPedia and CN-DBPedia

• DBpedia


• Extract structured information
from Wikipedia
• Make this information available on
the Web under an open license
• Interlink the DBpedia dataset with
other datasets on the Web
• Contributors
• Freie Universität Berlin (Germany)
• Universität Leipzig (Germany)
• OpenLink Software (UK)
• Linking Open Data

CN-DBpedia: a Chinese counterpart of DBpedia
• Developed by Knowledge Works at Fudan
• Rich structured information for entities
• Contains many categories and tags for entities

 

Understanding a Concept/Category

Problem

How do we understand a concept/category?

Example:

Bachelor: Sex=man, Marriage status=unmarried

What Are the Defining Features of a Category?

Defining features are assumed to establish the necessary and sufficient conditions that characterize the meaning of the category.
• Any entity with the defining features should belong to the category
• Any entity belonging to the category must have the defining features

E.g., Category “Jay Chou albums”
Defining Features
{(Type, album), (Singer, Jay Chou)}
Non-Defining Features
{(Type, album), (Singer, Jay Chou), (genre, Pop music)}
{(Type, single), (Singer, Jay Chou)}
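The necessary-and-sufficient test can be sketched directly as a set comparison. The entity table below is invented for illustration:

```python
# Invented toy data: each entity maps to its attribute-value features.
entities = {
    "Fantasy":           {"Type": "album",  "Singer": "Jay Chou", "Genre": "Pop"},
    "November's Chopin": {"Type": "album",  "Singer": "Jay Chou"},
    "Nunchucks":         {"Type": "single", "Singer": "Jay Chou"},
}
category = {"Jay Chou albums": {"Fantasy", "November's Chopin"}}

def is_defining(features, cat):
    """A feature set is defining iff the entities that have all the
    features are exactly the members of the category."""
    members = category[cat]
    has_features = {e for e, attrs in entities.items()
                    if all(attrs.get(k) == v for k, v in features.items())}
    # sufficient: features -> membership; necessary: membership -> features
    return has_features == members

print(is_defining({"Type": "album", "Singer": "Jay Chou"}, "Jay Chou albums"))
# -> True (exactly the two albums)
print(is_defining({"Type": "album", "Singer": "Jay Chou", "Genre": "Pop"},
                  "Jay Chou albums"))
# -> False (too specific: misses "November's Chopin")
```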

 

Solution and Results

How do we measure the goodness of a set of features?

Challenge and Solutions

The search space of candidate feature sets is exponential.
Use frequent pattern mining to find candidate defining feature sets that are frequent enough.
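The frequent-pattern-mining step can be sketched with a naive enumeration; a real miner (Apriori, FP-growth) would prune the exponential space instead of enumerating it. The data below is invented:

```python
from collections import Counter
from itertools import combinations

# Invented toy data: each row is the feature set of one entity.
entity_features = [
    frozenset({("Type", "album"), ("Singer", "Jay Chou")}),
    frozenset({("Type", "album"), ("Singer", "Jay Chou"), ("Genre", "Pop")}),
    frozenset({("Type", "single"), ("Singer", "Jay Chou")}),
]

def frequent_feature_sets(rows, min_support=2, max_size=2):
    """Return feature sets of up to max_size features that occur in at
    least min_support entities; only these become DF candidates."""
    counts = Counter()
    for row in rows:
        for k in range(1, max_size + 1):
            for combo in combinations(sorted(row), k):
                counts[frozenset(combo)] += 1
    return {fs for fs, c in counts.items() if c >= min_support}

for fs in sorted(frequent_feature_sets(entity_features), key=len):
    print(sorted(fs))
```

Infrequent sets such as {(Genre, Pop)} are dropped before any scoring, which keeps the candidate space tractable.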

 

Solution Framework

Repeat until no new DFs can be found:
Step 1: Use a score function to find DFs of some categories
Steps 2 & 3: Use a rule-based method to get more DFs of categories
Step 4: Populate DBpedia using the DFs of categories discovered so far

Results
We finally obtain 60,247 new C-DFs with an average quality score of 2.82

Understanding a Set of Entities

Problem:
Given a set of entities, can we understand its concept and recommend the most related entity?

Applications:
E-commerce: if a user is searching for Samsung S6 and iPhone 6, what should we recommend, and why?

Understanding a Set of Entities

A naive solution:
Use a taxonomy, such as Probase, to find the nearest common ancestor

• Problems:
• The concept does not necessarily exist
• We can find China, Russia, Brazil, and India under the topic 'developing country', but there is no exact topic 'BRIC'.
• A concept may cover many non-relevant instances
• Under the topic 'developing country' there are many other countries, which makes it difficult to find the most related entities.
• The best concept is in most cases implicit
• The information in Probase is not clean enough

 

Model 1: use concepts as hidden variables and penalize concepts with too many member entities

Model 2: choose the entity whose union with the query entities best preserves the concept distribution of the query
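The idea behind Model 1 can be sketched as follows. Both the taxonomy and the size penalty (log of concept extension) are invented for illustration; they stand in for the hidden-variable model, not for the paper's actual scoring function:

```python
import math

# Invented toy taxonomy: concept -> set of instances.
concept_instances = {
    "fruit":   {"apple", "banana", "cherry", "mango"},
    "food":    {"apple", "banana", "cherry", "mango", "bread", "rice", "cheese"},
    "company": {"apple", "microsoft", "google", "amazon"},
}

def best_concept(query):
    """Score = coverage of the query, discounted by concept size,
    so huge generic concepts lose to tight ones."""
    def score(c):
        members = concept_instances[c]
        coverage = len(query & members) / len(query)
        return coverage / math.log2(len(members) + 1)
    return max(concept_instances, key=score)

print(best_concept({"apple", "banana"}))     # 'fruit', not the broader 'food'
print(best_concept({"apple", "microsoft"}))  # 'company'
```

Note how "apple" is disambiguated by its company in the query, which is exactly what a nearest-common-ancestor lookup cannot do when the best concept is implicit.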

Understanding a Set of Entities

Understanding Verb Phrase

E.g., "I watched The Amazing Spider-man 2 and thought it was impressive."
How do we understand "The Amazing Spider-man 2" using the verb "watch"?
Pattern: watch $movie -> "The Amazing Spider-man 2" isA movie
Linguists [Sinclair 1990] found two principles for verb phrases:
• idiom patterns: kick the ass / watch step
• conceptualized patterns: eat fruit (apple, banana, etc.), drink beverage (wine, tea, etc.)
Model: extract the patterns of verb phrases

Applications:
Conceptualization using verbs
"The apple (object) he ate (verb) yesterday has a bad taste."
Pattern: eat $food -> apple isA food
Parsing: finding the subject/object/etc. of a verb
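Conceptualization through verb patterns can be sketched as a lookup: if the verb has a learned conceptualized pattern, its object is typed with the pattern's concept; idiom patterns yield no concept. The pattern table below is invented:

```python
# Invented toy table of conceptualized verb patterns: verb -> concept slot.
verb_patterns = {"eat": "food", "watch": "movie", "drink": "beverage"}
isa = {}  # harvested isA facts: entity -> concept

def conceptualize(verb, obj):
    """Type the object of a verb using the verb's conceptualized pattern.
    Verbs with only idiom patterns (e.g. 'kick' in 'kick the bucket')
    are absent from the table and produce no fact."""
    concept = verb_patterns.get(verb)
    if concept is not None:
        isa[obj] = concept
    return concept

conceptualize("watch", "The Amazing Spider-man 2")
conceptualize("eat", "apple")
print(isa)
# -> {'The Amazing Spider-man 2': 'movie', 'apple': 'food'}
```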

Understanding Verb Phrase

Challenge: trade-off between generality and specificity

Generality: one general pattern is better than several specific patterns.
Specificity: a pattern and the entities assigned to it should match well.

Balanced by using MDL (minimum description length)

 

 

Results

Our approach outperforms the competitors
Verb patterns are helpful for conceptualization

Understanding Short Texts

Cover of iPhone 6 plus

• Short texts are
everywhere
• web queries
• instant messages

 

 

distance from earth from moon

Understanding the semantics of short texts
• syntactic parsing

 

thai food located in houston

 

 

 

 

Understanding Short Texts

Syntactic parsing of short texts is challenging
• Grammatical signals from function words and word order are not available
• There are no labeled dependency trees (treebank) for web queries, nor is there a standard for constructing such dependency trees

Our solution
• Infer dependency trees from complete sentences by heuristic rules
• e.g., words connected via function words

Train a transition-based dependency parser

 

Understanding Short Texts

Results
• The Stanford Parser relies heavily on grammatical signals such as function words and word order, while QueryParser relies more on the semantics of the query
• QueryParser consistently outperformed its competitors on the short query parsing task

Understanding a bag of words

 

Problem:
Given a bag of words, can we infer what the article is talking about?
Example:
china, japan, india, korea -> asian country
dinner, lunch, food, child, girl -> meal, child
bride, groom, dress, celebration -> wedding

Challenge: how to measure the “goodness” of the
labels we assign to a bag of words
Coverage: the conceptual labels should cover as
many words and phrases in the input as
possible
Minimality: the number of conceptual labels
should be as small as possible
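The coverage/minimality trade-off can be approximated with a greedy set-cover heuristic. This is a sketch, not the MDL-based model actually used; the concept-to-words table is invented:

```python
# Invented toy table: concept -> words it can label.
concept_words = {
    "asian country": {"china", "japan", "india", "korea"},
    "wedding":       {"bride", "groom", "dress", "celebration", "rose"},
    "meal":          {"dinner", "lunch", "food"},
    "child":         {"child", "girl", "boy"},
}

def label(words):
    """Greedily pick the concept covering the most uncovered words
    (coverage), stopping as soon as nothing new is covered (minimality)."""
    uncovered, labels = set(words), []
    while uncovered:
        best = max(concept_words, key=lambda c: len(concept_words[c] & uncovered))
        gain = concept_words[best] & uncovered
        if not gain:          # remaining words are noise; ignore them
            break
        labels.append(best)
        uncovered -= gain
    return labels

print(label({"dinner", "lunch", "food", "child", "girl"}))  # -> ['meal', 'child']
```

The stopping rule is what ignores noise words: a word no concept covers never forces an extra label.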

Applications
Topic labelling
• A topic is a bag of words that has no explicit semantics
• Conceptual labeling turns each topic into a small set of meaningful concepts
Language understanding
• Verb role labeling
• Can summarize the verb eat's direct objects (apple, breakfast, pork, beef, bullet) into a small set of concepts, such as fruit, meal, meat, bullet

 

 

Solution

Minimum description length

The best concepts should capture the regularities of the words as much as possible, which enables us to compress the data as much as possible.

Problem: given a bag of words x1, …, xm, find the set of concept labels that minimizes the description length.

Results

• Our solution finds the minimum number of concepts to label a bag of words
• Most conceptual labels are specific enough
• Noise words are ignored
• Our models can fully exploit the attributes of concepts to generate better labels

 

 

 

Understanding Natural Language Questions

The online procedure parses and answers a question
• Question Parsing: convert the question to a template by NER and conceptualization
• Predicate Lookup: look up the entity and the predicate of the given template and return the corresponding value
The offline procedure learns the mapping from templates to predicates
• Template Extraction: learn templates and their corresponding predicates
• Predicate Expansion: learn predicate paths

 

 

 

Key idea:

Understanding a question's intent by its template

 

 

 

A probabilistic generative model for template-based predicate inference:
1. Starting from question q, generate its entity e according to the distribution P(e|q).
2. Generate the template t according to the distribution P(t|q,e).
3. Infer the predicate p by P(p|t); the predicate p depends only on t.
4. Generate the answer value v by P(v|e,p).
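The four-step chain can be sketched with plain probability tables. All values and templates below are invented toy data, not learned distributions:

```python
# Invented toy distributions for the generative chain q -> e -> t -> p -> v.
Q = "how many people are there in China?"
T = "how many people are there in $country?"

P_e_given_q  = {Q: {"China": 1.0}}                      # entity linking
P_t_given_qe = {(Q, "China"): {T: 1.0}}                 # conceptualization
P_p_given_t  = {T: {"population": 0.9, "area": 0.1}}    # learned offline
P_v_given_ep = {("China", "population"): "1.4 billion"} # KB lookup

def answer(q):
    """Pick the answer maximizing P(e|q) * P(t|q,e) * P(p|t)."""
    best, best_prob = None, 0.0
    for e, pe in P_e_given_q.get(q, {}).items():
        for t, pt in P_t_given_qe.get((q, e), {}).items():
            for p, pp in P_p_given_t.get(t, {}).items():
                prob = pe * pt * pp
                if prob > best_prob and (e, p) in P_v_given_ep:
                    best, best_prob = P_v_given_ep[(e, p)], prob
    return best

print(answer(Q))  # -> '1.4 billion'
```

The template, not the raw question, carries the intent: any question matching "how many people are there in $country?" maps to the same predicate.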

 

 

KBQA finds significantly more templates and predicates than its competitors, even though the competing bootstrapping approaches use a larger corpus.

Missing isA Facts Inference

There are many missing links in a data-driven conceptual taxonomy such as Probase:
Newton isA scientist
Steve Jobs isA billionaire

Problem:
Can we infer missing facts from existing facts in the knowledge base?

Data bias: many common-sense facts cannot be observed in data

Example:
Can we infer that Steve Jobs is a billionaire from the fact that Bill Gates is a billionaire?

Missing isA facts Inference- Ideas and Results

Inference from similar instances

Inference from similar concepts
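"Inference from similar instances" can be sketched with a Jaccard similarity over shared concepts. This is a toy illustration with invented data, not the paper's actual feature model:

```python
# Invented toy data: entity -> concepts it is known to belong to.
entity_concepts = {
    "Bill Gates":   {"billionaire", "founder", "CEO", "philanthropist"},
    "Steve Jobs":   {"founder", "CEO", "visionary"},
    "Isaac Newton": {"scientist", "physicist"},
}

def jaccard(a, b):
    return len(a & b) / len(a | b)

def infer_isa(entity, concept, threshold=0.4):
    """Predict `entity isA concept` if some entity already known to have
    the concept is similar enough in its remaining concepts."""
    for other, concepts in entity_concepts.items():
        if other != entity and concept in concepts:
            sim = jaccard(entity_concepts[entity] - {concept},
                          concepts - {concept})
            if sim >= threshold:
                return True
    return False

print(infer_isa("Steve Jobs", "billionaire"))   # True, via Bill Gates
print(infer_isa("Isaac Newton", "billionaire")) # False, no similar billionaire
```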

Results:
• Our features are effective for finding missing facts
• Our models consistently achieve 90% precision
• The more similar the entities/concepts, the higher the accuracy

Open Challenges
• Common-sense knowledge
• humans cannot fly
• the sun rises in the east
• an object falls to the ground without any support
• Reasoning in language understanding
• Obama is a white man?

• Why is understanding common-sense knowledge challenging?
• No one mentions it explicitly in texts
• There is no source to extract it from
• Why is reasoning so hard?
• Hard inference always suffers from exceptions
• birds can fly, but ostriches cannot

Research Outline

 

Graph Analytics
1. Models for symmetry (Physical Review E 2008)
2. Graph simplification (Physical Review E 2008)
3. Complexity/distance measurement (Pattern Recognition 2008, Physica A 2008)
4. Graph index compression (EDBT2009)
5. Graph anonymization (EDBT2010)

 

 

Knowledge Graph Construction
1. IsA taxonomy completion (TKDE2017)
2. Implicit isA relation inference (AAAI2017)
3. Erroneous isA correction (AAAI2017)
4. Cross-lingual type inference (DASFAA2016)
5. End-to-end knowledge harvesting
6. Domain-specific knowledge harvesting

 

 

Natural Language Understanding by KG
1. Understanding a bag of words (IJCAI2015)
2. Understanding a set of entities
3. Understanding verb phrases (AAAI2016)
4. Understanding a concept (IJCAI2016)
5. Understanding short texts (EMNLP2016)
6. Understanding natural language questions (IJCAI2016)

 

Knowledgeable Search/Recommendation
1. Recommendation by KG (WWW2014, DASFAA2015)
2. User profiling by KG (ICDM2015, CIKM2015)
3. Categorization by KG (CIKM2015)
4. Entity suggestion with conceptual explanation
5. Entity search by long concept query

 

Big Graph Management
1. Big graph systems (SIGMOD2012)
2. Overlapping community search (SIGMOD2013)
3. Local community search (SIGMOD2014)
4. Big graph partitioning (ICDE2014)
5. Shortest distance query (VLDB2014)
6. Fast graph exploration (VLDB2016)

 

 

1. CN-DBpedia: CN-DBpedia is an effort to extract structured information from Chinese encyclopedia sites, such as Baidu Baike, and make this information available on the Web. CN-DBpedia allows you to ask sophisticated queries against Chinese encyclopedia sites, and to link other data sets on the Web to Chinese encyclopedia data.
2. Probase Plus: Probase is a web-scale taxonomy that contains 10 million concepts/entities and 16 million isA relations. ProbasePlus is an updated taxonomy with additional isA relations inferred from the original Probase. They are useful for conceptualization, reasoning, etc.
3. Verb Base: A verb pattern is a probabilistic semantic representation of verbs. We introduce verb patterns to represent verb semantics, such that each pattern corresponds to a single sense of the verb. We construct verb patterns with consideration of their generality and specificity.

 

• Knowledge Works@FUDAN
• http://Kw.fudan.edu.cn
• Knowledge Works is a studio focusing on building and managing large-scale, high-quality knowledge graphs, as well as on applying knowledge graphs in text understanding, intelligent search, and robot brains.
• Graph Data Management Lab@FUDAN
• http://gdm.fudan.edu.cn
• GDM@FUDAN focuses on studying and developing effective and efficient solutions to manage and mine graph data, aiming at understanding real graphs and supporting real applications built upon large real graphs. Recently, we are especially interested in knowledge graphs and their applications.

Our Mission: The construction, management and application of large scale
knowledge graphs

 

Knowledge Graph
A kind of semantic network that consists of entities/concepts as well as their semantic relationships. Compared with traditional semantic networks, higher coverage of entities and concepts, more abundant semantic relationships, more automatic construction, and higher accuracy are expected.
The key to intelligent information processing
KG has shown its potential in solving problems such as search-intent understanding, relationship explanation, and user profiling. It is of great business value in intelligent search, intelligent software, cyber security, and intelligent business.
The key to building a machine that thinks like a human
KG provides the necessary background knowledge to enable machines to understand language and think like humans.

 
