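## Knowledge Graph Fundamentals: Knowledge Sharing

1. Corpus annotation

2. Word vectors: word2vec

3. Common text features: TF-IDF, PMI

4. Common machine-learning methods for relation extraction

5. Research direction: applying deep learning (CNN + RNN) to knowledge extraction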
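
TF-IDF and PMI, the two text features named in the list above, can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the toy corpus, the function names `tf_idf` and `pmi`, the unsmoothed `log(N / df)` form of IDF, and the document-level co-occurrence counting for PMI are all assumptions made for the example.

```python
import math

# Hypothetical toy corpus; each document is already tokenized.
docs = [
    ["knowledge", "graph", "embedding"],
    ["knowledge", "graph", "reasoning"],
    ["word", "embedding"],
]

def tf_idf(term, doc, corpus):
    """TF-IDF with relative term frequency and idf = log(N / df), no smoothing."""
    tf = doc.count(term) / len(doc)
    df = sum(1 for d in corpus if term in d)  # number of documents containing the term
    return tf * math.log(len(corpus) / df) if df else 0.0

def pmi(x, y, corpus):
    """Document-level PMI: log(p(x, y) / (p(x) * p(y)))."""
    n = len(corpus)
    p_x = sum(1 for d in corpus if x in d) / n
    p_y = sum(1 for d in corpus if y in d) / n
    p_xy = sum(1 for d in corpus if x in d and y in d) / n
    if p_xy == 0:
        return float("-inf")  # the two terms never co-occur
    return math.log(p_xy / (p_x * p_y))

print(tf_idf("reasoning", docs[1], docs))  # rare term, higher weight
print(tf_idf("knowledge", docs[0], docs))  # common term, lower weight
print(pmi("knowledge", "graph", docs))     # positive: the terms co-occur often
```

Production code usually adds smoothing to the IDF term and counts co-occurrence within a sliding window rather than a whole document.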

## DeepPath: A Reinforcement Learning Method for Knowledge Graph Reasoning

https://arxiv.org/pdf/1707.06690.pdf

https://github.com/xwhan/DeepPath

Abstract
We study the problem of learning to reason in large scale knowledge graphs (KGs). More specifically, we describe a novel reinforcement learning framework for learning multi-hop relational paths: we use a policy-based agent with continuous states based on knowledge graph embeddings, which reasons in a KG vector space by sampling the most promising relation to extend its path. In contrast to prior work, our approach includes a reward function that takes the accuracy, diversity, and efficiency into consideration. Experimentally, we show that our proposed method outperforms a path-ranking based algorithm and knowledge graph embedding methods on Freebase and Never-Ending Language Learning datasets.

1 Introduction

In recent years, deep learning techniques have obtained many state-of-the-art results in various classification and recognition problems (Krizhevsky et al., 2012; Hinton et al., 2012; Kim, 2014). However, complex natural language processing problems often require multiple inter-related decisions, and empowering deep learning models with the ability of learning to reason is still a challenging issue. To handle complex queries where there are no obvious answers, intelligent machines must be able to reason with existing resources, and learn to infer an unknown answer.

More specifically, we situate our study in the context of multi-hop reasoning, which is the task of learning explicit inference formulas, given a large KG. For example, if the KG includes beliefs such as Neymar plays for Barcelona, and Barcelona are in the La Liga league, then machines should be able to learn the following formula: playerPlaysForTeam(P,T) ∧ teamPlaysInLeague(T,L) ⇒ playerPlaysInLeague(P,L). At test time, by plugging in the learned formulas, the system should be able to automatically infer the missing link between a pair of entities. This kind of reasoning machine will potentially serve as an essential component of complex QA systems.
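
To make the formula concrete, the sketch below applies an already-learned two-hop rule to a toy KG and infers the missing link. It illustrates only the test-time step of plugging in a formula; it is not the reinforcement learning path finder from the paper. The triple representation and the helper name `apply_path_rule` are assumptions made for this example.

```python
# Toy KG as a set of (head, relation, tail) triples, mirroring the Neymar example.
kg = {
    ("Neymar", "playerPlaysForTeam", "Barcelona"),
    ("Barcelona", "teamPlaysInLeague", "La Liga"),
}

def apply_path_rule(triples, body, head_relation):
    """Infer (start, head_relation, end) for every chain of relations matching `body`.

    `body` is the ordered list of relations on the left-hand side of the rule.
    """
    starts = {h for h, r, _ in triples if r == body[0]}
    inferred = set()
    for start in starts:
        frontier = {start}
        # Follow the relations one hop at a time.
        for relation in body:
            frontier = {t for h, r, t in triples if r == relation and h in frontier}
        inferred |= {(start, head_relation, end) for end in frontier}
    # Report only links that are not already in the KG.
    return inferred - set(triples)

rule_body = ["playerPlaysForTeam", "teamPlaysInLeague"]
print(apply_path_rule(kg, rule_body, "playerPlaysInLeague"))
# {('Neymar', 'playerPlaysInLeague', 'La Liga')}
```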

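## Enabling Machines to Understand Human Language via Knowledge Graphs

http://gdm.fudan.edu.cn/GDMWiki/attach/Yanghuaxiao/Language%20Understanding.pdf

• Machine language understanding
  • Large scale
  • Semantically rich
  • Friendly structure

• Traditional knowledge representations
  • Ontology
  • Semantic network
  • Texts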

## Enabling Machines to Understand Human Language by Knowledge Graphs

Can machines think like humans?

http://gdm.fudan.edu.cn/GDMWiki/attach/Yanghuaxiao/Language%20Understanding.pdf

Language is the tool of thinking

It is the ability to speak and understand language that distinguishes us from animals.
Enabling machines to understand human language is the essential path to intelligent information processing and a smart robot brain.

Obstacles to machine language understanding
• Language understanding of machines needs knowledge bases
  • Large scale
  • Semantically rich
  • Friendly structure
• Traditional knowledge representations cannot satisfy these requirements
  • Ontology
  • Semantic network
  • Texts

## Expeditious Generation of Knowledge Graph Embeddings

https://arxiv.org/abs/1803.07828v1
https://github.com/AKSW/KG2Vec

Tommaso Soru¹, Stefano Ruberto², Diego Moussallem¹, Edgard Marx¹, Diego Esteves³, and Axel-Cyrille Ngonga Ngomo⁴

¹ AKSW, University of Leipzig, D-04109 Leipzig, Germany, {tsoru,moussallem,marx}@informatik.uni-leipzig.de
² Gran Sasso Science Institute, INFN, I-67100 L’Aquila, Italy, stefano.ruberto@gssi.infn.it
³ SDA, University of Bonn, D-53113 Bonn, Germany, esteves@cs.uni-bonn.de
⁴ axel.ngonga@upb.de

Abstract. Knowledge Graph Embedding methods aim at representing entities and relations in a knowledge base as points or vectors in a continuous vector space. Several approaches using embeddings have shown promising results on triplet classification. However, only a few methods can compute low-dimensional embeddings of very large knowledge bases. In this paper, we propose KG2VEC, a novel approach to Knowledge Graph Embedding based on the skip-gram model. Instead of using a predefined scoring function, we learn it relying on Long Short-Term Memories. We evaluated the goodness of our embeddings on knowledge graph completion and show that KG2VEC is comparable to the quality of the scalable state-of-the-art approach RDF2Vec and can process large graphs by parsing more than a hundred million triples in less than 6 hours on common hardware.
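
The abstract does not say how triples are fed to the skip-gram model. One plausible reading, assumed here purely for illustration, is that each triple is treated as a three-token sentence, so every element is trained to predict the other two. The sketch below only generates the (target, context) training pairs; the triple, the helper name `skipgram_pairs`, and the window size are hypothetical, and the LSTM-based scoring function is not covered.

```python
def skipgram_pairs(triples, window=2):
    """Build (target, context) pairs, treating each triple as a 3-token sentence."""
    pairs = []
    for triple in triples:
        for i, target in enumerate(triple):
            lo = max(0, i - window)
            hi = min(len(triple), i + window + 1)
            # Every other token inside the window is a context token.
            pairs.extend((target, triple[j]) for j in range(lo, hi) if j != i)
    return pairs

# Hypothetical triple: (subject, predicate, object).
triples = [("Leipzig", "country", "Germany")]
for target, context in skipgram_pairs(triples):
    print(target, "->", context)
```

With a window of 2, a single triple yields six pairs, which could then be passed to any skip-gram trainer.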

## Neural Relation Extraction with Selective Attention over Instances

Yankai Lin¹, Shiqi Shen¹, Zhiyuan Liu¹,²∗, Huanbo Luan¹, Maosong Sun¹,²

¹ Department of Computer Science and Technology, State Key Lab on Intelligent Technology and Systems, National Lab for Information Science and Technology, Tsinghua University, Beijing, China
² Jiangsu Collaborative Innovation Center for Language Competence, Jiangsu, China

Abstract

Distant supervised relation extraction has been widely used to find novel relational facts from text. However, distant supervision is inevitably accompanied by the wrong labelling problem, and these noisy data will substantially hurt the performance of relation extraction. To alleviate this issue, we propose a sentence-level attention-based model for relation extraction. In this model, we employ convolutional neural networks to embed the semantics of sentences. Afterwards, we build sentence-level attention over multiple instances, which is expected to dynamically reduce the weights of those noisy instances. Experimental results on real-world datasets show that our model can make full use of all informative sentences and effectively reduce the influence of wrongly labelled instances. Our model achieves significant and consistent improvements on relation extraction as compared with baselines. The source code of this paper can be obtained from https://github.com/thunlp/NRE.
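
The sentence-level attention idea can be sketched as a softmax-weighted sum over the sentence vectors of one entity pair. This is a simplified illustration under stated assumptions: the sentence vectors are hand-written stand-ins for CNN outputs, the score is a plain dot product with a relation query vector (the scoring function in the paper may differ), and the name `selective_attention` is hypothetical.

```python
import math

def selective_attention(sentence_vecs, relation_query):
    """Return (attention weights, weighted bag representation) for one entity pair."""
    # Score each sentence against the relation query (dot product: an assumption).
    scores = [sum(x * q for x, q in zip(vec, relation_query)) for vec in sentence_vecs]
    # Numerically stable softmax over the sentences in the bag.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of sentence vectors; noisy sentences contribute less.
    dim = len(relation_query)
    bag = [sum(w * vec[k] for w, vec in zip(weights, sentence_vecs)) for k in range(dim)]
    return weights, bag

# Toy bag: the first sentence matches the relation, the last one is noise.
sentences = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
query = [2.0, 0.0]
weights, bag = selective_attention(sentences, query)
print(weights)  # largest weight on the first sentence
print(bag)
```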

1 Introduction

In recent years, various large-scale knowledge bases (KBs) such as Freebase (Bollacker et al., 2008), DBpedia (Auer et al., 2007) and YAGO (Suchanek et al., 2007) have been built and widely used in many natural language processing (NLP) tasks, including web search and question answering. These KBs are mostly composed of relational facts in triple format, e.g., (Microsoft, founder, Bill Gates). Although existing KBs contain a massive amount of facts, they are still far from complete compared to the infinite real-world facts. To enrich KBs, many efforts have been invested in automatically finding unknown relational facts. Therefore, relation extraction (RE), the process of generating relational data from plain text, is a crucial task in NLP.

∗ Corresponding author: Zhiyuan Liu (liuzy@tsinghua.edu.cn).

Most existing

# 1. Create a node

CREATE (ee:Person { name: "Emil", from: "Sweden", klout: 99 })

• CREATE creates data
• () denotes a node
• ee:Person: ee is the variable name, Person is the label
• {} holds the node's properties

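## Introduction to Cypher

Cypher is a declarative graph query language that allows expressive and efficient querying of the graph store without having to write traversal code for the graph structure. Cypher is still evolving and maturing, which means its syntax may change. It also means that, as a component, it has not undergone rigorous performance testing.

Cypher is inspired by a number of different approaches and builds on established practices for expressive querying. Many keywords, such as WHERE and ORDER BY, are inspired by SQL. Pattern matching borrows its expression approach from SPARQL. Regular expression matching is implemented using the Scala programming language.

Cypher is a declarative language. In contrast to imperative languages such as Java and scripting languages such as Gremlin and JRuby, it focuses on what to retrieve from the graph, not on how to retrieve it. This makes query optimization an implementation detail that is not exposed to the user.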

## Getting Started with Neo4j (Cypher Statements)

### Creating nodes and relationships

Create a node (小明, Xiao Ming): CREATE (n:people {name: '小明', age: '18', sex: '男'}) RETURN n;

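Create a relationship between two existing nodes, looked up by internal ID (the original START a = node(0), b = node(1) form is legacy syntax that newer Neo4j versions no longer accept): MATCH (a), (b) WHERE id(a) = 0 AND id(b) = 1 CREATE (a)-[n:gift]->(b) RETURN n;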