ADASYN-CatBoost Method for Intelligent Identification of Logging Lithology Considering Unbalanced Data: A Case Study of Zhaoxian Gold Deposit in Northwestern Jiaodong Peninsula
Received date: 2023-04-24
Revised date: 2023-06-30
Online published: 2023-11-21
Logging lithology identification helps to identify the underlying strata and rock masses in overburden areas quickly and accurately, which is of great significance for geological prospecting and exploration of metal mines. Based on actual logging data from the Zhaoxian gold deposit in the northwestern Jiaodong Peninsula, this paper applies machine learning methods to the intelligent identification of lithology. Given the diversity and imbalance of the lithology distribution in the complex rock formations of the deposit, and considering the strong nonlinear relationship between logging response and lithology, this paper proposes an intelligent logging lithology identification method based on ADASYN imbalanced-data processing and CatBoost machine learning. First, the ADASYN algorithm was used to process the imbalanced logging sample data, generating synthetic samples according to the weighted distribution of the minority-class samples. Then, the CatBoost algorithm was used to construct a machine learning model relating logging features to lithology. Validation curves were used to determine the grid search range of the model hyperparameters, and the parameters were optimized by combining grid search with 10-fold cross-validation to establish the optimal lithology classification model. Finally, the performance of the model was evaluated on the test set using indices such as accuracy, recall and F1 score, and the lithology classification results were interpreted using the feature importance and partial dependence plots output by the model. As a case study, the method was applied to logging data from the Zhaoxian gold deposit in the northwestern Jiaodong Peninsula, where 10 types of lithology were identified and interpreted after the sample data had been balanced.
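The ADASYN step described above can be illustrated with a minimal, standard-library sketch. It is not the authors' implementation: it assumes a two-class toy problem with made-up 2-D points (not logging data), the function names `adasyn_counts` and `adasyn_oversample` are hypothetical, and interpolation partners are drawn from the k nearest minority samples, which is a common simplification. The idea shown is that minority samples surrounded by more samples of other classes receive a larger share of the synthetic samples.

```python
import math
import random


def adasyn_counts(X, y, minority, k=5, beta=1.0):
    """Number of synthetic samples ADASYN allocates to each minority sample."""
    min_idx = [i for i, label in enumerate(y) if label == minority]
    n_min = len(min_idx)
    n_maj = len(y) - n_min
    # Total synthetic samples to create; beta=1.0 aims at a fully balanced set.
    G = int((n_maj - n_min) * beta)
    ratios = []
    for i in min_idx:
        # k nearest neighbours of sample i in the whole data set (itself excluded).
        nearest = sorted((j for j in range(len(X)) if j != i),
                         key=lambda j: math.dist(X[i], X[j]))[:k]
        # r_i: fraction of those neighbours that belong to another class.
        ratios.append(sum(y[j] != minority for j in nearest) / k)
    total = sum(ratios)
    if total == 0:
        return min_idx, [0] * n_min  # no minority sample lies near other classes
    # Normalise r_i into a weight distribution and scale it by G.
    return min_idx, [round(r / total * G) for r in ratios]


def adasyn_oversample(X, y, minority, k=5, beta=1.0, seed=0):
    """Return a copy of (X, y) with synthetic minority samples appended."""
    rng = random.Random(seed)
    min_idx, counts = adasyn_counts(X, y, minority, k, beta)
    synthetic = []
    for i, g in zip(min_idx, counts):
        # Assumption: interpolate towards the k nearest minority neighbours.
        mates = sorted((j for j in min_idx if j != i),
                       key=lambda j: math.dist(X[i], X[j]))[:k]
        if not mates:
            continue
        for _ in range(g):
            j = rng.choice(mates)
            lam = rng.random()  # random position on the segment x_i -> x_j
            synthetic.append([a + lam * (b - a) for a, b in zip(X[i], X[j])])
    return X + synthetic, y + [minority] * len(synthetic)


# Toy data: 10 majority points and 4 minority points (illustrative values only).
majority = [[0, 0], [0, 1], [1, 0], [1, 1], [0.5, 0.5],
            [1.5, 1.5], [0.2, 0.8], [0.8, 0.2], [0.3, 0.3], [0.7, 0.7]]
minority = [[1.4, 1.4], [4.0, 4.0], [4.2, 4.1], [4.1, 4.3]]
X = majority + minority
y = ["majority"] * 10 + ["minority"] * 4

idx, counts = adasyn_counts(X, y, "minority", k=3)
X_bal, y_bal = adasyn_oversample(X, y, "minority", k=3)
print(counts, len(X_bal), y_bal.count("minority"))
```

In this toy case the minority point lying inside the majority cluster receives the largest allocation. For a multi-class problem such as the 10 lithologies considered here, a library implementation (for example the `ADASYN` class in imbalanced-learn) would normally be used instead of a hand-written routine.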
The model evaluation results show that the accuracy, recall and F1 score on the test set reached 98.21%, 98.20% and 98.20%, respectively. The CatBoost lithology classifier was compared with the GBDT and LightGBM algorithms; the results show that the CatBoost classifier performs best and that it outperforms lithology identification on sample data without balancing. Comparison with the core lithology of an example logging section verifies the validity of the model classification results. The feature importance output by the model indicates that the logging features that contribute most to lithology classification are resistivity, natural potential and natural gamma. The strong correlation between these logging features and lithology identification is a good indication of further mineralization.
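For readers unfamiliar with the evaluation indices named above, the following sketch computes accuracy, recall and F1 score for a multi-class prediction. It assumes macro averaging (an unweighted mean over classes); the abstract does not state which averaging scheme was used. The function name `classification_scores` and the toy labels are hypothetical and unrelated to the reported results.

```python
from collections import Counter


def classification_scores(y_true, y_pred):
    """Return (accuracy, macro recall, macro F1) for multi-class labels."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1          # correct prediction for class t
        else:
            fp[p] += 1          # class p was predicted wrongly
            fn[t] += 1          # class t was missed
    recalls, f1s = [], []
    for c in labels:
        precision = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        recall = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        recalls.append(recall)
        f1s.append(f1)
    accuracy = sum(tp.values()) / len(y_true)
    return accuracy, sum(recalls) / len(labels), sum(f1s) / len(labels)


# Toy labels for three classes (illustrative only).
y_true = ["a", "a", "a", "b", "b", "c"]
y_pred = ["a", "a", "b", "b", "b", "c"]
acc, macro_recall, macro_f1 = classification_scores(y_true, y_pred)
print(round(acc, 4), round(macro_recall, 4), round(macro_f1, 4))
```

Macro averaging gives every class the same weight regardless of its sample count, which is one reason it is often preferred when the class distribution is imbalanced.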
Fangying XU, Yanhong ZOU, Zhuowei YI, Fuqiang YANG, Xiancheng MAO. ADASYN-CatBoost Method for Intelligent Identification of Logging Lithology Considering Unbalanced Data: A Case Study of Zhaoxian Gold Deposit in Northwestern Jiaodong Peninsula[J]. Gold Science and Technology, 2023, 31(5): 721-735. DOI: 10.11872/j.issn.1005-2518.2023.05.063