Feature selection for reservoir analogues similarity ranking as model-based causal inference
Abstract
Petroleum geoscience has always been an important part of Earth sciences, with a focus on investigating subsurface reservoirs of oil and gas. In climate, environmental and social sciences, it is conventional to search for teleconnections and long-distance dependencies using spatially distributed data available across the globe. Similar attempts in subsurface geoscience, however, face numerous obstacles, including very sparse and confidential datasets, the massive stratigraphic time scale of geological events, and the limited classification of the enormous variety of sedimentological and lithological concepts. One of the few data-driven workflows widely accepted in the petroleum industry is the analysis of reservoir analogs, used to estimate missing values in available data and to transfer geological assumptions and development strategies from similar reservoirs. However, most datasets suffer from a high number of missing parameters, which compromises the quality of the predictions. As a result, similarity ranking of reservoirs and their formations has seen only a few successful implementations recently. To tackle this issue, we propose enhanced feature selection approaches for similarity ranking that enable robust missing-value imputation and visual analytics, along with the discovery of insightful causal relationships between reservoir parameters. Similarity measures between reservoirs are the primary tool for obtaining a ranking of analog formations. The measure must work with both categorical and continuous features, because the key geological parameters (up to 200 per dataset) are of both types. We conducted several sensitivity analyses of various similarity measures, including a combined approach based on the Gower function with weighted parameters. The selection of relevant features depends strongly on the response feature, which, in our case, is the recovery factor for hydrocarbon reserves.
We employed the Boruta and SHAP methods for the feature selection process based on similarity ranking of reservoir analogs, which allows us to delineate causal relationships between petrophysical parameters across petroleum basins worldwide. The methodology is tested on a few target reservoirs from the Middle East region, with more than a thousand reservoirs available for ranking.
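To make the idea of a weighted Gower measure over mixed categorical and continuous parameters concrete, a minimal sketch is given here. It assumes the standard Gower coefficient (range-normalised absolute difference for continuous features, exact match for categorical ones, missing values skipped); the abstract does not state the exact formulation used, so this should be read as an illustration rather than the authors' implementation. The function name `gower_similarity`, the toy features (porosity, log-permeability, lithology code, depositional-environment code) and all values are hypothetical.

```python
import numpy as np


def gower_similarity(target, candidates, is_categorical, weights=None):
    """Weighted Gower similarity between one target and each candidate row.

    Assumptions of this sketch: categorical features are integer-coded,
    missing values are NaN, and every column has at least one observed value.
    """
    target = np.asarray(target, dtype=float)
    candidates = np.asarray(candidates, dtype=float)
    is_categorical = np.asarray(is_categorical, dtype=bool)
    if weights is None:
        weights = np.ones(target.shape[0])
    weights = np.asarray(weights, dtype=float)

    # Observed range of each feature, used to normalise continuous differences.
    stacked = np.vstack([target, candidates])
    ranges = np.nanmax(stacked, axis=0) - np.nanmin(stacked, axis=0)
    ranges = np.where(ranges > 0, ranges, 1.0)  # avoid division by zero

    # Per-feature similarity in [0, 1].
    sim_continuous = 1.0 - np.abs(candidates - target) / ranges
    sim_categorical = (candidates == target).astype(float)
    per_feature = np.where(is_categorical, sim_categorical, sim_continuous)

    # A feature contributes only when observed in both target and candidate.
    observed = ~np.isnan(candidates) & ~np.isnan(target)
    per_feature = np.where(observed, per_feature, 0.0)
    effective_weights = weights * observed

    # Weighted average; NaN if a candidate shares no observed feature.
    with np.errstate(invalid="ignore", divide="ignore"):
        return (effective_weights * per_feature).sum(axis=1) / effective_weights.sum(axis=1)


# Hypothetical toy data: porosity, log-permeability, lithology code, environment code.
is_categorical = [False, False, True, True]
target = [0.20, 2.0, 1, 3]
candidates = [
    [0.22, 2.1, 1, 3],     # close analog
    [0.10, 1.0, 0, 3],     # poor analog
    [0.21, np.nan, 1, 2],  # partial data, one categorical mismatch
]

scores = gower_similarity(target, candidates, is_categorical)
ranking = np.argsort(-scores)  # candidate indices, most similar first
print(scores, ranking)
```

Passing a non-uniform `weights` vector, for instance one derived from feature-importance scores, changes how much each parameter contributes to the ranking; how such weights are chosen in the actual workflow is not specified in the abstract.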