Essays on Business Analytics
Mostafa Rezaei
2019
This dissertation consists of three separate essays on business analytics. The abstracts of the three essays are as follows.

Essay 1: Peer-production projects are increasingly attracting the attention of Human-Computer Interaction (HCI) scholars. Such complex socio-technical systems can be viewed as Complex Adaptive Systems (CAS). The complexity of such projects presents a challenge to researchers trying to understand the dynamics of co-production processes. Visualization makes relevant processes visible that would otherwise be difficult to interpret, and thus our objective is to develop an information visualization tool that surfaces important patterns in peer production. In this paper we introduce an interactive visualization simulation, WikiAttractors, that is inspired by techniques from the area of CAS and visualizes the process by which a knowledge-based product evolves over time. Using Wikipedia as an example, we trace the evolution of articles from their inception until they are fully developed (i.e., Featured Articles). We show how WikiAttractors is able to identify both local patterns (vandalism, negotiation) and global patterns (rate of convergence) of co-production.

Essay 2: Although data mining problems require a flat mining table as input, in many real-world applications analysts are interested in finding patterns in a relational database. To this end, new methods and software have recently been developed that automatically add attributes (or features) to a target table of a relational database, summarizing information from all other tables. When attributes are constructed automatically by these methods, selecting the important attributes is particularly difficult because a large number of the attributes are highly correlated. In this setting, attribute selection techniques such as the Least Absolute Shrinkage and Selection Operator (lasso), elastic net, and other machine learning methods tend to under-perform. In this paper, we introduce a novel attribute selection procedure in which, after an initial screening step, we cluster the attributes into groups and apply sparse modelling techniques (e.g., group lasso) to select both the true attribute groups and the true attributes. The procedure is particularly suited to high-dimensional data sets in which the attributes are highly correlated. We test our procedure on several simulated data sets and a real-world data set from a marketing database. The results show that our proposed procedure obtains higher predictive performance while selecting a much smaller set of attributes than other state-of-the-art methods.

Essay 3: Forecasting Emergency Medical Services (EMS) call volumes is critical for resource allocation and planning. The value of call volume forecasts increases with forecast accuracy and with spatial resolution; however, as the spatial resolution is increased, sample sizes for each spatial unit decrease, and hence accuracy decreases. Thus, there is a trade-off between forecast accuracy and spatial resolution. We study this trade-off in this paper, using five years of data from three cities in Alberta. We compare various exponential smoothing methods to capture weekly seasonality, differences, and correlations across neighbourhoods. Our findings suggest that including a seasonal component in the forecasting method improves accuracy, whereas including a trend component, transforming the call volumes, or accounting for autocorrelation among errors has little payoff. Furthermore, a top-down approach, where forecasts are made at a lower resolution than the resolution of interest and then divided, performs as well as a bottom-up approach, where forecasts are made at a higher resolution and then aggregated.
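The top-down and bottom-up approaches described for Essay 3 can be illustrated with a minimal sketch. It is not the dissertation's implementation: it assumes one common form of additive seasonal exponential smoothing without a trend term, fixed (not estimated) smoothing parameters `alpha` and `gamma`, a proportional split by historical shares for the top-down step, and hypothetical neighbourhood names (`area_a`, `area_b`) with synthetic daily counts.

```python
from typing import Dict, List, Tuple

SEASON = 7  # weekly seasonality in daily call counts


def seasonal_smooth_forecast(series: List[float], alpha: float = 0.2,
                             gamma: float = 0.3, horizon: int = 7) -> List[float]:
    """Additive seasonal exponential smoothing with no trend component."""
    if len(series) < 2 * SEASON:
        raise ValueError("need at least two full weeks of data")
    # Initialise the level and the day-of-week offsets from the first week.
    level = sum(series[:SEASON]) / SEASON
    seasonal = [y - level for y in series[:SEASON]]
    for t in range(SEASON, len(series)):
        s = seasonal[t % SEASON]
        new_level = alpha * (series[t] - s) + (1 - alpha) * level
        seasonal[t % SEASON] = gamma * (series[t] - new_level) + (1 - gamma) * s
        level = new_level
    n = len(series)
    # Forecast = last level plus the offset of the matching day of the week.
    return [level + seasonal[(n + h) % SEASON] for h in range(horizon)]


def bottom_up(by_area: Dict[str, List[float]],
              horizon: int = 7) -> Tuple[Dict[str, List[float]], List[float]]:
    """Forecast each neighbourhood separately, then aggregate to the city."""
    areas = {a: seasonal_smooth_forecast(s, horizon=horizon)
             for a, s in by_area.items()}
    total = [sum(f[h] for f in areas.values()) for h in range(horizon)]
    return areas, total


def top_down(by_area: Dict[str, List[float]],
             horizon: int = 7) -> Tuple[Dict[str, List[float]], List[float]]:
    """Forecast the city total, then divide it by historical shares."""
    n = len(next(iter(by_area.values())))
    city = [sum(s[t] for s in by_area.values()) for t in range(n)]
    city_fc = seasonal_smooth_forecast(city, horizon=horizon)
    grand_total = sum(city)
    shares = {a: sum(s) / grand_total for a, s in by_area.items()}
    areas = {a: [shares[a] * v for v in city_fc] for a in by_area}
    return areas, city_fc


# Synthetic toy data (four weeks of daily counts), for illustration only.
weekly = [5, 4, 4, 5, 6, 8, 7]
history = {
    "area_a": [float(weekly[t % 7] + (t % 3)) for t in range(28)],
    "area_b": [float(2 * weekly[t % 7] + (t % 2)) for t in range(28)],
}
bu_areas, bu_total = bottom_up(history)
td_areas, td_total = top_down(history)
```

Because the smoother in this sketch is linear in its input when the parameters are fixed, the two city-level totals coincide; the approaches differ only in the neighbourhood-level forecasts. If parameters were estimated separately for each series, the totals would generally differ as well.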
doi:10.7939/r3-dg0j-gn29
fatcat:uyr6ix5morclraxvod7v5q3gwe