Extracting business performance signals from Twitter news
Social media and social networks underpin a revolution in communication between people, with the particular feature that much of that communication is open to all. This provides a massive pool of data that researchers can exploit for a wide variety of applications. Data from Twitter is of particular interest in this respect, given its large global usage levels and the availability of APIs and other tools that enable easy access to the publicly available stream of tweets. Owing to the wide public penetration of Twitter, many businesses use it to share their latest news, effectively treating Twitter as a gateway to end-users, consumers and/or investors. In this thesis, we focus on the potential for extracting information from Twitter that is relevant to the financial and competitiveness status of a business. We consider a collection of well-regarded Twitter accounts that are known for communicating recent business news, and we investigate the automated analysis of the stream of tweets from these sources, with a view to learning business-relevant information about specific companies. A key aspect of our approach is to target specific areas of business performance; we explore three such areas: productivity, competitiveness, and industrial risk. We propose a two-step model that first classifies a tweet into one of these areas, and then assigns a sentiment value (on a positive/negative scale). The resulting sentiment values across specific aspects represent novel business indicators that could add significant value to the toolset used by business analysts. Our experiments are based on a new manually pre-classified data set (available from a URL provided). Additionally, we propose n-grams made from non-contiguous words as a novel feature to enhance performance in this context.
Experiments involving a range of feature selection methods show that these new features provide valuable benefits in comparison with standard n-gram features. We also introduce the concept of an extra layer added to the primary classifier, with the role of filtering out noisy tweets before they enter the system. We use a One-Class SVM for this purpose. Broadly, we show that the methods developed in this thesis achieve promising results in both topic and sentiment classification in the business performance context, suggesting that Twitter can indeed be a useful source of signals related to different aspects of business performance. We also find that our system can provide valuable insight into unseen test data. However, more research is needed to extract robust signals for industrial risk, and there appears to be considerable promise for further development.
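
As an illustration of the n-grams made from non-contiguous words mentioned above, the following is a minimal sketch of one common reading of that idea (often called skip-grams). The abstract does not give the exact definition used in the thesis, so the function name `skip_ngrams`, the `max_skip` parameter and the sample phrase are all assumptions made for illustration only.

```python
from itertools import combinations


def skip_ngrams(tokens, n=2, max_skip=2):
    """Return n-grams whose words need not be adjacent in the text.

    Assumption: `max_skip` is the total number of tokens that may be
    skipped inside a single n-gram. With max_skip=0 this reduces to
    ordinary contiguous n-grams.
    """
    grams = []
    for start in range(len(tokens)):
        # Tokens that may follow the first word of the n-gram.
        window = tokens[start + 1 : start + n + max_skip]
        # Choose the remaining n-1 words from the window, keeping order.
        for rest in combinations(range(len(window)), n - 1):
            grams.append((tokens[start],) + tuple(window[i] for i in rest))
    return grams


# Made-up phrase, not taken from the thesis data set.
tokens = "profits fall sharply".split()
for gram in skip_ngrams(tokens, n=2, max_skip=1):
    print(" ".join(gram))
```

In this sketch, the pair "profits sharply" is produced in addition to the two contiguous bigrams, which is the kind of extra feature a standard n-gram extractor would miss. How such features are weighted or selected is left open here.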