Principal Component Analysis in Data Warehouses (Análise de componentes principais em data warehouses) [thesis]

Rafael Germano Rossi
Rossi, R. G. Principal Components Analysis in Data Warehouses. 2017. Dissertation (Master's Degree), Institute of Mathematics and Statistics, University of São Paulo, São Paulo, 2017. The main goal of the Principal Component Analysis (PCA) technique is to describe the variance and covariance structure of a set of variables. The technique is used to mitigate redundancy in a set of variables and as a means of achieving dimensionality reduction in various applications in the scientific, …ogical and administrative areas. The multidimensional data model, on the other hand, is composed of fact and dimension relations (tables) that describe an event using metrics and the relationships among its dimensions. However, the volume of stored data and the complexity of the dimensions usually involved in this model, especially in data warehouse environments, make correlation analysis between dimensions very difficult and sometimes impracticable. In this work, we propose an Application Programming Interface (API) for applying PCA to the multidimensional data model in order to facilitate characterization and dimensionality reduction, integrating the technique with data warehouse environments. To verify the effectiveness of this API, a case study was carried out using scientific production data obtained from the Lattes Platform, Web of Science, Google Scholar and Scopus, provided by the IT Superintendence of the University of São Paulo.
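The abstract does not describe the API itself, so the following is only a minimal sketch of the general idea: aggregate fact-table rows into a member × metric matrix, then compute principal components from its covariance matrix. It assumes the fact table has already been joined with a dimension and flattened into hypothetical `(member, metric, value)` rows. The names `FACT_ROWS`, `pivot`, `pca`, and `project` and all numbers are invented for illustration and do not come from the thesis or the Lattes data. To keep the example dependency-free, eigenpairs are found by power iteration with deflation rather than by a full eigendecomposition.

```python
import math
from collections import defaultdict

# Hypothetical toy fact rows (member, metric, value); invented for illustration only.
FACT_ROWS = [
    ("r1", "papers", 2), ("r1", "citations", 10), ("r1", "projects", 3),
    ("r2", "papers", 4), ("r2", "citations", 20), ("r2", "projects", 1),
    ("r3", "papers", 6), ("r3", "citations", 30), ("r3", "projects", 2),
    ("r4", "papers", 8), ("r4", "citations", 40), ("r4", "projects", 2),
]


def pivot(fact_rows):
    """Aggregate (member, metric, value) rows into a dense member x metric matrix."""
    members = sorted({m for m, _, _ in fact_rows})
    metrics = sorted({k for _, k, _ in fact_rows})
    cells = defaultdict(float)
    for member, metric, value in fact_rows:
        cells[(member, metric)] += value  # SUM aggregation, like a GROUP BY over the fact table
    return members, metrics, [[cells[(m, k)] for k in metrics] for m in members]


def covariance(matrix):
    """Sample covariance (divides by n - 1) of the columns, plus the column means."""
    n, p = len(matrix), len(matrix[0])
    means = [sum(row[j] for row in matrix) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in matrix]
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(p)] for i in range(p)]
    return cov, means


def power_iteration(a, max_iter=1000, tol=1e-12):
    """Dominant eigenpair of a symmetric positive semidefinite matrix."""
    p = len(a)
    v = [float(i + 1) for i in range(p)]  # asymmetric start vector
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(max_iter):
        w = [sum(a[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # matrix annihilates v: eigenvalue is 0
        w = [x / norm for x in w]
        done = max(abs(x - y) for x, y in zip(w, v)) < tol
        v = w
        if done:
            break
    # Rayleigh quotient v^T A v gives the eigenvalue for the unit vector v.
    eigenvalue = sum(v[i] * sum(a[i][j] * v[j] for j in range(p)) for i in range(p))
    return eigenvalue, v


def pca(matrix, k):
    """Top-k principal components (variance, loadings) via power iteration with deflation."""
    cov, means = covariance(matrix)
    total_variance = sum(cov[i][i] for i in range(len(cov)))  # trace = total variance
    a = [row[:] for row in cov]
    components = []
    for _ in range(k):
        lam, v = power_iteration(a)
        components.append((lam, v))
        # Deflation: subtract lam * v v^T so the next pass finds the next component.
        a = [[a[i][j] - lam * v[i] * v[j] for j in range(len(a))] for i in range(len(a))]
    return components, total_variance, means


def project(matrix, components, means):
    """Scores of each centered row on the retained components (the reduced data)."""
    return [[sum((row[j] - means[j]) * v[j] for j in range(len(row))) for _, v in components]
            for row in matrix]


members, metrics, X = pivot(FACT_ROWS)
components, total, means = pca(X, k=2)
for lam, v in components:
    print(f"variance={lam:.3f} ratio={lam / total:.3f} loadings={[round(x, 3) for x in v]}")
scores = project(X, components, means)
```

In this toy data `citations` is exactly five times `papers`, so the first component should absorb almost all of the variance. That is the kind of redundancy between metrics that PCA is meant to expose. In practice a linear algebra library routine such as NumPy's `numpy.linalg.eigh` would replace the hand-written power iteration.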
doi:10.11606/d.45.2018.tde-07012018-182730 fatcat:ql2pmnz3tjhcxnkkx25jsqbguq