Predictive power of web Big Data in Financial Economics
Thanks to the availability of big datasets, the digital revolution is profoundly changing our ability to understand society and to forecast the outcomes of many social and economic systems. Increasingly sophisticated semantic techniques are adopted to automatically interpret information published in articles, blogs, newspapers, etc. Unfortunately, irrelevant or already commonly known information can increase the noise in these signals, leaving their predictive power severely affected or
... d. In this thesis we present a novel methodology that combines the sentiment conveyed by public news with the browsing activity of the users of a finance-specialized portal to forecast price returns at daily and intra-day time scales. To this end we leverage a unique dataset consisting of a fragment of the Yahoo! Finance log, containing the news articles displayed on the website and the respective number of "clicks", i.e. the views made by users. Our analysis considers 100 highly capitalized US stocks over a one-year period between 2012 and 2013. Notably, the sentiment signal and the browsing activity, taken individually, have very little or no predictive power. Conversely, when we construct a signal that gives, for a given time interval, the average sentiment of the clicked news weighted by the number of clicks, we show that it Granger-causes price returns for more than 50% of the investigated companies. Our result indicates a wisdom-of-the-crowd effect that makes it possible to exploit users' activity to identify and properly weight the relevant and surprising news, considerably enhancing the forecasting power of news sentiment. In addition, we study whether Twitter messages have predictive power for price returns, in terms of both volumes and aggregate sentiment, and we present an "event study" methodology to measure the impact of days of high Twitter attention on the stock price.
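As a minimal sketch of the click-weighted signal described above, the snippet below computes, for each time interval, the average sentiment of the clicked news weighted by the number of clicks. The function name, the record layout (interval, sentiment, clicks) and the sample values are hypothetical and chosen only for illustration; the thesis's actual sentiment scale and aggregation details may differ.

```python
from collections import defaultdict


def click_weighted_sentiment(records):
    """Return {interval: click-weighted mean sentiment}.

    `records` is an iterable of (interval, sentiment, clicks) tuples.
    This layout is an assumption made for the sketch, not the format
    of the Yahoo! Finance log.
    """
    weighted_sum = defaultdict(float)
    total_clicks = defaultdict(int)
    for interval, sentiment, clicks in records:
        weighted_sum[interval] += sentiment * clicks
        total_clicks[interval] += clicks
    # Intervals with no clicks carry no signal and are skipped.
    return {
        t: weighted_sum[t] / total_clicks[t]
        for t in total_clicks
        if total_clicks[t] > 0
    }


# Hypothetical example: two news items on "day1", one on "day2".
sample = [
    ("day1", 1.0, 30),   # positive article, 30 clicks
    ("day1", -1.0, 10),  # negative article, 10 clicks
    ("day2", 0.5, 4),    # mildly positive article, 4 clicks
]
signal = click_weighted_sentiment(sample)
```

In this toy input the heavily clicked positive article dominates "day1", so the signal for that interval is positive even though one article is negative. The resulting per-interval series is the kind of input one could then pass, together with price returns, to a Granger causality test.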