Sentiment Analysis for micro-blogging platforms in Arabic
Refaee, Eshrag Ali Ahmad
MetadataShow full item record
Sentiment Analysis (SA) concerns the automatic extraction and classification of sentiments conveyed in a given text, i.e. labelling a text instance as positive, negative or neutral. SA research has attracted increasing interest in the past few years due to its numerous real-world applications. The recent interest in SA is also fuelled by the growing popularity of social media platforms (e.g. Twitter), as they provide large amounts of freely available and highly subjective content that can be readily crawled. Most previous SA work has focused on English with considerable success. In this work, we focus on studying SA in Arabic, as a less-resourced language. This work reports on a wide set of investigations for SA in Arabic tweets, systematically comparing three existing approaches that have been shown successful in English. Specifically, we report experiments evaluating fully-supervised-based (SL), distantsupervision- based (DS), and machine-translation-based (MT) approaches for SA. The investigations cover training SA models on manually-labelled (i.e. in SL methods) and automatically-labelled (i.e. in DS methods) data-sets. In addition, we explored an MT-based approach that utilises existing off-the-shelf SA systems for English with no need for training data, assessing the impact of translation errors on the performance of SA models, which has not been previously addressed for Arabic tweets. Unlike previous work, we benchmark the trained models against an independent test-set of >3.5k instances collected at different points in time to account for topic-shifts issues in the Twitter stream. Despite the challenging noisy medium of Twitter and the mixture use of Dialectal and Standard forms of Arabic, we show that our SA systems are able to attain performance scores on Arabic tweets that are comparable to the state-of-the-art SA systems for English tweets. The thesis also investigates the role of a wide set of features, including syntactic, semantic, morphological, language-style and Twitter-specific features. We introduce a set of affective-cues/social-signals features that capture information about the presence of contextual cues (e.g. prayers, laughter, etc.) to correlate them with the sentiment conveyed in an instance. Our investigations reveal a generally positive impact for utilising these features for SA in Arabic. Specifically, we show that a rich set of morphological features, which has not been previously used, extracted using a publicly-available morphological analyser for Arabic can significantly improve the performance of SA classifiers. We also demonstrate the usefulness of languageindependent features (e.g. Twitter-specific) for SA. Our feature-sets outperform results reported in previous work on a previously built data-set.