Performance analysis of random forest and support vector machine models in predicting pore pressure from well-log data
Keywords:
pore pressure prediction, machine learning, random forest, SVM
Abstract
Pore pressure (PP) prediction is critical for well drilling operations and oil reservoir characterization
and management. Recent advances in the development of Machine Learning (ML) models have led to a growing
application of these methods for pore pressure prediction using well log records. In this work, we have evaluated
the performance of two ML models for the task of PP prediction, one based on random forest (RF) and another
based on a support vector machine (SVM). The study used geophysical logs (gamma-ray, sonic, and density) of
stratigraphic wells drilled in the offshore Sergipe Basin, NE Brazil, to predict the PP in the regional sedimentary
column of the basin. The values obtained by the ML models were compared with values of PP obtained by
classic approaches used in the industry to establish the actual accuracy of the methods tested. We split the
data into training and test sets in proportions of 70% and 30%, respectively, and evaluated performance with
the mean squared error (MSE) and R-squared metrics. On the training data, the MSE of the SVM model was
about one order of magnitude greater than that of the RF model. The validation data
showed a similar result. This pattern held across different training set sizes, which indicates that the
relative performance of the models does not depend on the amount of data used. We also examined the
scalability of the models. The results show that the fitting time of the RF model grows linearly with the
amount of data, while the fitting time of the SVM model grows exponentially. Finally, in
the test data, the RF model presented better results in all evaluated metrics, with an MSE about 90% lower
than that obtained by the SVM model. Comparing the values predicted by the models with the actual values,
the RF model reached an R-squared of 0.99, while the SVM model reached an R-squared of 0.96. Thus, the
RF model outperformed the SVM model in all aspects evaluated.
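
The evaluation protocol described above (a 70%/30% split scored with MSE and R-squared) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code: the function names, the fixed seed, and the toy values are all hypothetical, and no well-log data or model fitting is reproduced here.

```python
import random


def train_test_split_70_30(rows, seed=0):
    """Shuffle rows reproducibly and split them 70% / 30% (seed is an assumption)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(0.7 * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def mse(y_true, y_pred):
    """Mean squared error between actual and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def relative_mse_reduction(mse_a, mse_b):
    """Fraction by which mse_a is lower than mse_b."""
    return 1.0 - mse_a / mse_b


# Toy values for illustration only; they are not results from the study.
rows = list(range(100))
train, test = train_test_split_70_30(rows)

y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 3.0, 5.0]
toy_mse = mse(y_true, y_pred)
toy_r2 = r_squared(y_true, y_pred)
```

As a consistency check on the figures quoted above, an MSE that is one order of magnitude smaller corresponds to `relative_mse_reduction(0.1, 1.0)`, that is, a reduction of about 90%.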