Automatic Lithofacies Classification with t-SNE and K-Nearest Neighbors Algorithm

Guilherme Loriato Potratz, Smith Washington Arauco Canchumuni, Jose David Bermudez Castro, Júlia Potratz, Marco Aurélio C. Pacheco
2021 Anuário do Instituto de Geociências  
One of the critical processes in hydrocarbon exploration is the identification and prediction of the lithofacies that constitute the reservoir. One of the cheapest and most efficient ways to do this is to interpret well log data, which are often acquired continuously and in the majority of drilled wells. The main methodologies used to correlate log data with data obtained from well cores are based on statistical analyses, machine learning models and artificial neural networks. This study tests a dimensionality-reduction algorithm combined with a nearest-neighbor classification method for predicting lithofacies automatically. The performance of the proposed methodology was compared with predictions made by artificial neural networks. We used t-Distributed Stochastic Neighbor Embedding (t-SNE) to map the well log data into a lower-dimensional feature space. Facies are then predicted with a K-Nearest Neighbors (KNN) algorithm. The method is assessed on the public dataset of the Hugoton and Panoma fields. Facies prediction with traditional artificial neural networks reached an accuracy of 69%, whereas prediction with the t-SNE + KNN algorithm reached an accuracy of 79%. Considering the nature of the data, which have high dimensionality and are not linearly correlated, the efficiency of t-SNE + KNN can be explained by the ability of the algorithm to identify hidden patterns along fuzzy boundaries in the data set. It is important to stress that the application of machine learning algorithms offers relevant benefits to the hydrocarbon exploration sector, such as identifying hidden patterns in high-dimensional datasets, searching for complex and non-linear relationships, and avoiding the need for a preliminary definition of mathematical relations among the model's input data.
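The classification step described in the abstract can be illustrated with a minimal sketch of K-Nearest Neighbors voting on points that already lie in a low-dimensional space. This is not the authors' implementation: the function name `knn_predict`, the 2-D coordinates (stand-ins for t-SNE output), the labels `facies_A` and `facies_B`, and the choice `k=3` are all hypothetical, and the t-SNE embedding step itself is not shown.

```python
from collections import Counter
from math import dist


def knn_predict(train_points, train_labels, query, k=3):
    """Label `query` by majority vote among its k nearest training points."""
    # Rank the labeled samples by Euclidean distance to the query point.
    ranked = sorted(
        zip(train_points, train_labels),
        key=lambda pair: dist(pair[0], query),
    )
    # Count the labels of the k closest samples and return the most common one.
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical 2-D coordinates standing in for t-SNE output (not real well data).
embedded = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (5.0, 5.1), (5.2, 4.9), (4.8, 5.0)]
# Hypothetical facies labels for the six points above.
facies = ["facies_A", "facies_A", "facies_A", "facies_B", "facies_B", "facies_B"]

prediction = knn_predict(embedded, facies, query=(0.3, 0.2), k=3)
print(prediction)
```

One design point to keep in mind when combining the two steps: t-SNE implementations commonly embed a fixed set of samples and provide no mapping for unseen ones (scikit-learn's `TSNE`, for example, offers `fit_transform` but no separate `transform`), so the labeled and unlabeled samples would typically have to be embedded together before the neighbor vote. How the paper handles this is not stated in the abstract.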
doi:10.11137/1982-3908_2021_44_35024 fatcat:ho42n7b3kfh5zlfvwgbgn3ao5y