https://arxiv.org/abs/1803.07828v1

https://github.com/AKSW/KG2Vec

Expeditious Generation of Knowledge Graph Embeddings

Tommaso Soru¹, Stefano Ruberto², Diego Moussallem¹, Edgard Marx¹, Diego Esteves³, and Axel-Cyrille Ngonga Ngomo⁴

1 AKSW, University of Leipzig, D-04109 Leipzig, Germany
{tsoru,moussallem,marx}@informatik.uni-leipzig.de
2 Gran Sasso Science Institute, INFN, I-67100 L’Aquila, Italy
stefano.ruberto@gssi.infn.it
3 SDA, University of Bonn, D-53113 Bonn, Germany
esteves@cs.uni-bonn.de
4 Data Science Group, Paderborn University, D-33098 Paderborn, Germany
axel.ngonga@upb.de

Abstract. Knowledge Graph Embedding methods aim at representing entities
and relations in a knowledge base as points or vectors in a continuous vector
space. Several approaches using embeddings have shown promising results on
tasks such as link prediction, entity recommendation, question answering, and
triplet classification. However, only a few methods can compute low-dimensional
embeddings of very large knowledge bases. In this paper, we propose KG2VEC,
a novel approach to Knowledge Graph Embedding based on the skip-gram model.
Instead of using a predefined scoring function, we learn it relying on Long Short-Term
Memories. We evaluated the quality of our embeddings on knowledge
graph completion and show that KG2VEC is comparable in quality to the
scalable state-of-the-art approach RDF2Vec and can process large graphs, parsing
more than a hundred million triples in less than 6 hours on common hardware.

1 Introduction

Recently, the number of public datasets in the Linked Data cloud has grown significantly
to almost 10 thousand. At the time of writing, at least four of these datasets contain
more than one billion triples each.5 This huge amount of available data has become

a fertile ground for Machine Learning and Data Mining algorithms. Today, applications

of machine-learning techniques comprise a broad variety of research areas related to

Linked Data, such as Link Discovery, Question Answering, and Ontology Engineering.

The field of Knowledge Graph Embedding (KGE) has emerged in the Machine Learning

community during the last two years [22,8]. The underlying concept of KGE is

that in a knowledge base, each entity can be regarded as a point in a continuous vector

space while relations can be modelled as translation vectors. The generated vector representations

can be used by algorithms employing machine learning, deep learning, or

statistical relational learning to accomplish a given task. Several KGE approaches have

already shown promising results on tasks such as link prediction, entity recommendation,

question answering, and triplet classification [21,9,10]. Moreover, Distributional
Semantics techniques (e.g., WORD2VEC or DOC2VEC) are relatively new in the Semantic
Web community; HolE [14] and the RDF2Vec approaches [17,3] are examples
of pioneering research. To date, RDF2Vec is the only option for learning embeddings

on a large knowledge graph. However, the computation of embeddings on large graphs

is known to be expensive with regard to runtime and required memory. To this end, we

devise the KG2VEC approach, which applies skip-gram techniques to create embeddings
on large knowledge graphs in feasible time while maintaining the quality
of state-of-the-art embeddings. Our evaluation shows that KG2VEC might achieve a
better quality than RDF2Vec in terms of the cosine similarity of related entities and can scale to large graphs,
processing more than 100 million triples in less than 6 hours on common hardware.

5 http://lodstats.aksw.org

2 Related Work

KGE on Semantic Web. The field of KGE has considerably grown during the last

two years, earning a spot also in the Semantic Web community. In 2016, Nickel et al.

proposed HolE [14], which relies on holographic models of associative memory by employing

circular correlation to create compositional representations. HolE can capture

rich interactions by using correlation as the compositional operator but it simultaneously

remains efficient to compute, easy to train, and scalable to very large datasets.

In the same year, Ristoski et al. presented RDF2Vec [17] which uses language modeling

approaches for unsupervised feature extraction from sequences of words and adapts

them to RDF graphs. After generating sequences by leveraging local information from
graph substructures through random walks, RDF2Vec learns latent numerical representations
of entities in RDF graphs. RDF2Vec has recently been extended in order to reduce the
computational time and the bias of the random walks [4,3].

State of the Art of KGE approaches. An early effort to automatically generate features

from structured knowledge was proposed in [2]. RESCAL is a relational-learning algorithm

based on Tensor Factorization using Alternating Least-Squares which has been shown
to scale to large RDF datasets such as YAGO and to reach good results in the tasks of
link prediction, entity resolution, and collective classification [15,16,13]. Numerous approaches
which rely on translations have been implemented so far [1,20,5,10,18,21].

TransE is the first method where relationships are interpreted as translations operating

on the low-dimensional embeddings of the entities [1]. On the other hand, TransH models

a relation as a hyperplane together with a translation operation on it [20]. TransA explores

embedding methods for entities and relations belonging to two different knowledge

graphs finding the optimal loss function [5], whilst PTransE relies on paths to

build the final vectors [9]. The algorithms TransR and CTransR proposed in [10] aim

at building entity and relation embeddings in separate entity space and relation spaces,

so as to learn embeddings through projected translations in the relation space; an extension

of this algorithm makes use of rules to learn embeddings [18]. An effort to jointly

embed structured and unstructured data (such as text) was proposed in [19]. More recently,
TransG, a generative model that addresses the issue of multiple semantics of a
relation, has been shown to surpass state-of-the-art results [21].

3 KG2Vec

Formally, let G = (V, E) be a directed labeled graph composed of a set of vertices
(i.e., entities) V, a set of edges (i.e., relationships) E, and an edge labeling function
l : E → Σ⁺. A transformation function F defined as

F : T \rightarrow \mathbb{R}^n \qquad (1)

assigns a vector of size n to each element of the set T of things. Each element of T can
be a vertex or an edge (i.e., T := V ∪ E); however, some approaches consider only the
transformation of vertices (i.e., T := V) or subjects (i.e., T := {v ∈ V : ∃(v, v′) ∈ G}).

Existing KGE approaches based on the skip-gram model such as RDF2Vec [17]

submit paths built using random walks to the WORD2VEC algorithm. Instead, we preprocess

the input knowledge base by converting each triple into a small sentence of

three words. Our method is faster as it allows us to avoid the path generation step. The

generated text corpus is thus processed by the skip-gram model as follows.
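As a minimal sketch of this preprocessing step (assuming the knowledge base is available as an N-Triples dump; the file name and the naive tokenisation below are illustrative, not taken from the original code):

```python
# Sketch: convert each RDF triple into a three-word "sentence" for the skip-gram model.
# The N-Triples parsing here is deliberately naive and only illustrative.
def triples_to_sentences(path):
    sentences = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith('#'):
                continue
            parts = line.rstrip(' .').split(None, 2)  # subject, predicate, object
            if len(parts) == 3:
                sentences.append(parts)  # one sentence of exactly three tokens
    return sentences

corpus = triples_to_sentences('dbpedia_2015-10.nt')
```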

The Skip-Gram model. Considering a sequence of T words w_1, w_2, …, w_T, the aim
of the skip-gram model is to maximize the average log probability

\frac{1}{T} \sum_{t=1}^{T} \; \sum_{-c \le j \le c,\, j \ne 0} \log p(w_{t+j} \mid w_t) \qquad (2)

where c is the context window around word w_t [11]. In our case, we adopt c = 2, since
the sequence size is always T = 3. The probability above is theoretically defined as:

p(w_{t+j} \mid w_t) = \frac{\exp\!\left( {v'_{w_{t+j}}}^{\top} v_{w_t} \right)}{\sum_{w=1}^{W} \exp\!\left( {v'_{w}}^{\top} v_{w_t} \right)} \qquad (3)

where v_w and v'_w are the input and output vector representations of w, and W is the
vocabulary size. We employ negative sampling with 5 samples, i.e., 5 words are randomly selected
to have an output of 0 and the corresponding weights are consequently updated.
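As a concrete illustration, the corpus of three-word sentences can be fed to the Gensim skip-gram implementation with the parameters described above (window of 2, negative sampling of 5). This is a sketch rather than the original training script; note that the dimensionality parameter is called size in Gensim 3.x and vector_size in 4.x.

```python
from gensim.models import Word2Vec

# Sketch: train skip-gram embeddings on the triple "sentences" built above.
model = Word2Vec(
    sentences=corpus,   # list of [subject, predicate, object] token lists
    vector_size=300,    # 'size' in Gensim 3.x
    window=2,           # covers the whole three-token sentence
    sg=1,               # skip-gram rather than CBOW
    negative=5,         # negative sampling of 5, as described above
    min_count=1,
    workers=4,
)
model.save('kg2vec.model')
```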

Scoring Function. Several methods have been proposed to evaluate word embeddings.
The most common ones are based on analogies [12,7], where word vectors are summed
with each other, e.g.:

v[“queen”] ≈ v[“king”] + v[“woman”] − v[“man”]    (4)

An analogy can thus predict relationships among words, which in our environment
means predicting new links among entities:

score(\bar{s}, \bar{p}, \bar{o}) = \sum_{(s,\, \bar{p},\, o) \in KB} \begin{cases} 1 & \text{if } v_{\bar{s}} + v_{o} - v_{s} \approx v_{\bar{o}} \\ 0 & \text{otherwise} \end{cases} \qquad (5)
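A minimal sketch of this analogy-based score, assuming the Gensim vectors trained above and using cosine similarity as the “approximately equal” test (the similarity threshold is our assumption, not a value given in the paper):

```python
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def analogy_score(kv, kb_triples, s_bar, p_bar, o_bar, threshold=0.9):
    """Count known triples (s, p_bar, o) whose translation o - s maps s_bar close to o_bar."""
    hits = 0
    for s, p, o in kb_triples:
        if p != p_bar:
            continue
        predicted = kv[s_bar] + kv[o] - kv[s]
        if cosine(predicted, kv[o_bar]) >= threshold:  # approximate equality test
            hits += 1
    return hits
```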

We evaluate the scoring function above against a neural network based on Long

Short-Term Memories (LSTM). The neural network takes a sequence of embeddings as

input, namely v_s, v_p, v_o for a triple (s, p, o) ∈ KB. A dense hidden layer of the same
size as the embeddings is connected to a single output neuron with sigmoid activation,
which returns a value between 0 and 1. The negative triples are generated using two
strategies, i.e., for each triple in the training set (1) randomly extract a relation and its

two nodes or (2) corrupt the subject or the object. We use the Adam optimizer and 100

epochs of training.
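A minimal Keras sketch of such a scoring network follows; where the paper is silent we make explicit assumptions (the hidden-layer activation and the binary cross-entropy loss are our choices, not stated in the text):

```python
from keras.models import Sequential
from keras.layers import LSTM, Dense

EMBEDDING_DIM = 300  # size of the entity/relation vectors

# Sketch of the LSTM-based scoring function: the input is the sequence
# (v_s, v_p, v_o); the single output neuron estimates the plausibility of the triple.
scorer = Sequential([
    LSTM(EMBEDDING_DIM, input_shape=(3, EMBEDDING_DIM)),
    Dense(EMBEDDING_DIM, activation='relu'),  # hidden layer, same size as the embeddings (activation assumed)
    Dense(1, activation='sigmoid'),           # returns a value between 0 and 1
])
scorer.compile(optimizer='adam', loss='binary_crossentropy')  # loss function assumed

# Training sketch: X has shape (num_triples, 3, EMBEDDING_DIM) and stacks positive
# triples with negatives generated by strategy (1) or (2); y holds the 1/0 labels.
# scorer.fit(X, y, epochs=100)
```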

4 Evaluation

We implemented KG2VEC in Python 2.7 using the Gensim and Keras libraries with the
Theano backend. The code, datasets, and vectors obtained are available online.6

All experiments were carried out on an Ubuntu 16.04 server with 128 GB RAM and 40

CPUs.

Table 1. Details and runtimes for the generation of KG2VEC embeddings on two datasets.

Dataset            | AKSW-bib                    | DBpedia
Source             | AKSW.org + Bibsonomy        | DBpedia EN 2015-10
Number of triples  | 3,530 (train) + 392 (test)  | 164,369,887
Number of vectors  | 954                         | 14,921,691
Dimensions         | 10                          | 300
Runtime (s)        | 2.2                         | 18,332.0
Rate (triples/s)   | 1,604                       | 8,966

In this study, we aim at generating embeddings at a high rate while preserving accuracy.

In Table 1, we show that our simple pipeline can achieve a rate of almost 9,000
triples per second on a large dataset such as DBpedia [6]. In the following, we show the
most similar resources to the entity dbr:Tokyo in DBpedia EN 2015-10. Under the cosine
similarity, KG2VEC presents a better quality and higher accuracy than the RDF2Vec

models. This result paves the way for a more detailed evaluation of the learned embeddings.

Most similar to ‘Tokyo’ (RDF2Vec, 8-depth, 200d): Honshu (0.864),
Category:Populated places established in 1457 (0.859), Prefectures of Japan
(0.857), Yōichi Masuzoe (0.851), Category:Tokyo (0.824), Category:Kantō region
(0.796), Kantō region (0.768), Tarō (0.762), Shinzō Abe (0.751), Akihito
(0.744)

Most similar to ‘tokyo’ (KG2Vec, 300d): osaka (0.923), kyoto (0.900),
kanagawa prefecture (0.864), yokohama (0.860), nagoya (0.855), osaka prefecture
(0.837), chiba prefecture (0.836), fukuoka (0.835), kobe (0.829),
aichi prefecture (0.816)
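For reference, such neighbour lists can be produced with a standard nearest-neighbour query over the trained vectors; a sketch assuming the Gensim model from Section 3 (the exact token normalisation of the released vectors, e.g. lowercasing, is an assumption):

```python
# Sketch: ten nearest neighbours of an entity by cosine similarity.
for entity, similarity in model.wv.most_similar('tokyo', topn=10):
    print('{}\t{:.3f}'.format(entity, similarity))
```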

Comparing scoring functions. In Table 2, we show the results obtained with the different
strategies. Our LSTM-based scoring function outperforms the analogy-based one in

both settings.

6 http://tsoru.aksw.org/kg2vec/

Fig. 1. A selection of DBpedia resources along with their vectors in 3 dimensions obtained using

Principal Component Analysis. Blue points are resources, whilst red points are classes. As can

be seen, resources follow the distributional hypothesis.

Table 2. Filtered Hits@1, Hits@3, and Hits@10 values on link prediction on AKSW-bib using different strategies.

Strategy         | Hits@1 | Hits@3 | Hits@10
LSTM + corrupted | 3.84%  | 9.79%  | 19.23%
LSTM + random    | 1.39%  | 4.89%  | 10.49%
Analogy          | 0.00%  | 0.51%  | 3.82%
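As a reminder of how the metric in Table 2 is computed, filtered Hits@k counts how often the correct entity is ranked within the top k candidates once the other triples known to be true are removed from the candidate list; a short sketch, with the ranking step left abstract:

```python
def hits_at_k(ranks, k):
    """Fraction of test triples whose correct entity is ranked within the top k."""
    return sum(1 for r in ranks if r <= k) / float(len(ranks))

# 'ranks' would hold, for each test triple, the filtered rank of the true entity
# among all candidates scored by the LSTM-based or analogy-based function.
# Example: hits_at_k(ranks, 10) reproduces the Hits@10 column.
```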

5 Conclusion and Future Work

We presented a fast approach for generating knowledge graph embeddings dubbed

KG2VEC. We conclude that the skip-gram model, if trained directly on triples as

small sentences of length three, gains not only in runtime but also in the plausibility of
the generated embeddings with respect to existing RDF embedding approaches. We plan to evaluate

CBOW embeddings and compute the Hits@N accuracy on the benchmark datasets

for link prediction using our LSTM-based scoring function.

References

1. Antoine Bordes, Nicolas Usunier, Alberto Garcia-Duran, Jason Weston, and Oksana

Yakhnenko. Translating embeddings for modeling multi-relational data. In Advances in

Neural Information Processing Systems, pages 2787–2795, 2013.

2. Weiwei Cheng, Gjergji Kasneci, Thore Graepel, David Stern, and Ralf Herbrich. Automated

feature generation from structured knowledge. In CIKM, pages 1395–1404. ACM, 2011.

3. Michael Cochez, Petar Ristoski, Simone Paolo Ponzetto, and Heiko Paulheim. Biased graph

walks for rdf graph embeddings. In Proceedings of the 7th International Conference on Web

Intelligence, Mining and Semantics, page 21. ACM, 2017.

4. Michael Cochez, Petar Ristoski, Simone Paolo Ponzetto, and Heiko Paulheim. Global

rdf vector space embeddings. In International Semantic Web Conference, pages 190–207.

Springer, 2017.

5. Yantao Jia, Yuanzhuo Wang, Hailun Lin, Xiaolong Jin, and Xueqi Cheng. Locally adaptive

translation for knowledge graph embedding. arXiv preprint arXiv:1512.01370, 2015.

6. Jens Lehmann, Chris Bizer, Georgi Kobilarov, Sören Auer, Christian Becker, Richard Cyganiak,
and Sebastian Hellmann. DBpedia – a crystallization point for the web of data. Journal

of Web Semantics, 7(3):154–165, 2009.

7. Omer Levy and Yoav Goldberg. Linguistic regularities in sparse and explicit word representations.

In Proceedings of the eighteenth conference on computational natural language

learning, pages 171–180, 2014.

8. Yang Li and Tao Yang. Word embedding for understanding natural language: A survey. In

Guide to Big Data Applications, pages 83–104. Springer, 2018.

9. Yankai Lin, Zhiyuan Liu, and Maosong Sun. Modeling relation paths for representation

learning of knowledge bases. CoRR, abs/1506.00379, 2015.

10. Yankai Lin, Zhiyuan Liu, Maosong Sun, Yang Liu, and Xuan Zhu. Learning entity and

relation embeddings for knowledge graph completion. In AAAI, pages 2181–2187, 2015.

11. Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. Distributed

representations of words and phrases and their compositionality. In C.J.C. Burges, L. Bottou,

M. Welling, Z. Ghahramani, and K.Q. Weinberger, editors, Advances in Neural Information

Processing Systems 26, pages 3111–3119. Curran Associates, Inc., 2013.

12. Tomas Mikolov, Wen-tau Yih, and Geoffrey Zweig. Linguistic regularities in continuous

space word representations. In HLT-NAACL, pages 746–751, 2013.

13. Maximilian Nickel, Xueyan Jiang, and Volker Tresp. Reducing the rank in relational factorization

models by including observable patterns. In Advances in Neural Information Processing

Systems, pages 1179–1187, 2014.

14. Maximilian Nickel, Lorenzo Rosasco, Tomaso A Poggio, et al. Holographic embeddings of

knowledge graphs. In AAAI, pages 1955–1961, 2016.

15. Maximilian Nickel, Volker Tresp, and Hans-Peter Kriegel. A three-way model for collective

learning on multi-relational data. In Proceedings of the 28th international conference on

machine learning (ICML-11), pages 809–816, 2011.

16. Maximilian Nickel, Volker Tresp, and Hans-Peter Kriegel. Factorizing yago: scalable machine

learning for linked data. In WWW, pages 271–280. ACM, 2012.

17. Petar Ristoski and Heiko Paulheim. Rdf2vec: Rdf graph embeddings for data mining. In

International Semantic Web Conference, pages 498–514. Springer, 2016.

18. Quan Wang, Bin Wang, and Li Guo. Knowledge base completion using embeddings and

rules. In IJCAI, pages 1859–1865, 2015.

19. Zhen Wang, Jianwen Zhang, Jianlin Feng, and Zheng Chen. Knowledge graph and text

jointly embedding. In EMNLP, pages 1591–1601. Citeseer, 2014.

20. Zhen Wang, Jianwen Zhang, Jianlin Feng, and Zheng Chen. Knowledge graph embedding

by translating on hyperplanes. In AAAI, pages 1112–1119. Citeseer, 2014.

21. Han Xiao, Minlie Huang, Yu Hao, and Xiaoyan Zhu. Transg: A generative mixture model

for knowledge graph embedding. arXiv preprint arXiv:1509.05488, 2015.

22. Ye Zhang, Md Mustafizur Rahman, Alex Braylan, Brandon Dang, Heng-Lu Chang, Henna

Kim, Quinten McNamara, Aaron Angert, Edward Banner, Vivek Khetan, et al. Neural information

retrieval: A literature review. arXiv preprint arXiv:1611.06792, 2016.