Personal named entity linking based on simple partial tree matching and context free grammar
Abstract
Personal name disambiguation is the task of linking a personal name to a unique comparable
entry in the real world, also known as named entity linking (NEL). Algorithms for NEL
consist of three main components: extractor, searcher, and disambiguator.
Existing approaches to NEL use exact-match look-up over the surface form to generate
a set of candidate entities for each mentioned name. Exact-match look-up is inadequate
for generating candidate entities because personal names within a web page lack a
uniform representation. In addition, the performance of a disambiguator
in ranking candidate entities is limited by context similarity. Context similarity is
an inflexible feature for personal name disambiguation because natural language is highly variable.
We propose a new approach that can both identify and disambiguate personal
names mentioned on a web page. Our NEL algorithm uses, as an extractor, a control flow
graph and AlchemyAPI; as a searcher, Personal Name Transformation Modules (PNTM) based
on Context Free Grammar and the Jaro-Winkler text similarity metric; and, as a disambiguator,
the entity coherence method: the Occupation Architecture for Personal Name Disambiguation
(OAPnDis), personal name concepts, and Simple Partial Tree Matching (SPTM).
Experimental results on real-world data sets show that our NEL approach achieves an
accuracy of 92%, which is higher than that of previously used methods.
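
To illustrate why exact-match look-up can miss candidates that a similarity metric recovers, the following is a minimal sketch of Jaro-Winkler scoring used for candidate generation. It is not the PNTM searcher described above: the Context Free Grammar transformations are omitted, and the function names (jaro, jaro_winkler, candidates), the toy name list, and the 0.85 threshold are assumptions made only for this example.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1] between two strings."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters match only if they are equal and lie within this window.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    transpositions = out_of_order // 2
    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s1: str, s2: str, p: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro similarity boosted for a shared prefix of up to max_prefix characters."""
    base = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == max_prefix:
            break
        prefix += 1
    return base + prefix * p * (1.0 - base)


def candidates(mention, names, threshold=0.85):
    """Return (name, score) pairs at or above threshold, best first.

    The threshold is an illustrative assumption, not a value from the paper.
    """
    scored = [(n, jaro_winkler(mention.lower(), n.lower())) for n in names]
    return sorted((pair for pair in scored if pair[1] >= threshold),
                  key=lambda pair: -pair[1])


# Hypothetical name list standing in for a knowledge base.
known_names = ["Jon Smith", "John Smith", "Jane Doe"]
found = candidates("John Smith", known_names)
```

With this toy list, an exact-match look-up for "John Smith" would return only the identical entry, whereas the similarity-based search also keeps the variant spelling "Jon Smith" and still rejects "Jane Doe".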