dc.description.abstract | Developments in the ability to analyse online data of people’s names have provided
breakthroughs for social research; however, several challenges remain. This
thesis proposes several novel approaches for identifying hidden properties and text
diversity translation of people’s names. We start by studying the hidden properties
of people’s names and find that few existing methods can identify more
than one hidden property of a name at a time. We therefore propose
a ‘Hidden Property Bayes’ model that identifies more than one hidden
property of people’s names in Kanji and Hanzi at a time. In addition, our model
performs better than an existing system on name origin identification. We then
move on to text diversity translation and find that translating romanised names
back into their original language is a challenge. Therefore, we propose two novel models to
translate Pinyin names to Hanzi names. These two novel models perform better
than Google Translate on Mandarin name translation. We next investigate gender
prediction of people’s names and find that few existing tools can both predict gender and
analyse the data in a single process. Therefore, we propose a ‘Name-Gender’ tool that
predicts the gender of people’s names and also directly produces a statistical graph.
In addition, our tool outperforms an existing system in
predicting the genders of people’s names in Latin and Hanzi characters. We also
present novel findings from gender analysis in computer science using our ‘Name-Gender’
tool. Overall, our contributions provide effective novel approaches that
support social researchers in analysing online data sources of people’s names, helping
them to understand real-world events. | en |