DETECCIÓN DE OUTLIERS USANDO MÉTRICAS DE DISTANCIA Y ANÁLISIS CLUSTER (Outlier Detection Using Distance Metrics and Cluster Analysis) [post]

Santiago Cartagena Agudelo, Camilo Cossio Alzate
2022 unpublished
Many data science and machine learning techniques require measuring the separation between records. Cluster analysis methods, for example, need a degree of similarity between records. This is done with distances or metrics, which treats the data as points in an n-dimensional space. Distance measures therefore play an important role in grouping data points. Choosing the correct distance measure for a given data set is not a trivial problem and requires some prior knowledge to do well. This work studies and implements several of the best-known distance measures in use today: the Mahalanobis distance, the Euclidean distance, the Manhattan distance, and the cosine distance, which attracted the authors' interest by its name despite being less well known than the other three. The analysis was carried out to observe how these distances apply in practice, using a real data set of daily company stock data taken from Yahoo Finance.
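As a rough illustration of the four distances named in the abstract, the following minimal sketch computes each one for a pair of two-dimensional points using only the Python standard library. It is not the authors' implementation: the function names, the sample points, and the covariance matrix are all hypothetical, and the Mahalanobis distance is shown with a hand-picked 2x2 covariance rather than one estimated from data.

```python
import math

def euclidean(x, y):
    # Straight-line distance: square root of the summed squared differences.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def manhattan(x, y):
    # Sum of absolute coordinate differences.
    return sum(abs(a - b) for a, b in zip(x, y))

def cosine_distance(x, y):
    # One minus the cosine of the angle between x and y.
    # Undefined when either vector has zero length.
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return 1.0 - dot / (norm_x * norm_y)

def mahalanobis(x, y, inv_cov):
    # sqrt(d^T * inv_cov * d), where d = x - y and inv_cov is the
    # inverse covariance matrix given as a list of rows.
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    q = sum(d[i] * inv_cov[i][j] * d[j] for i in range(n) for j in range(n))
    return math.sqrt(q)

def invert_2x2(m):
    # Closed-form inverse of a 2x2 matrix (hypothetical helper for this sketch).
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

# Hypothetical sample points and covariance, chosen only for illustration.
x = [3.0, 4.0]
y = [4.0, 3.0]
cov = [[4.0, 0.0], [0.0, 1.0]]
inv_cov = invert_2x2(cov)

print("Euclidean:  ", euclidean(x, y))
print("Manhattan:  ", manhattan(x, y))
print("Cosine:     ", cosine_distance(x, y))
print("Mahalanobis:", mahalanobis(x, y, inv_cov))
```

With an identity covariance matrix the Mahalanobis distance reduces to the Euclidean distance, which is one way to see that it rescales each direction by the spread of the data before measuring separation.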
doi:10.31219/osf.io/ckqng fatcat:chjk6zslfbe55bfvecdlvzsaka