Gold Science and Technology, 2023, Vol. 31, Issue (5): 721-735. doi: 10.11872/j.issn.1005-2518.2023.05.063

• Mineral Exploration and Resource Evaluation •

ADASYN-CatBoost Method for Intelligent Identification of Logging Lithology Considering Unbalanced Data: A Case Study of Zhaoxian Gold Deposit in Northwestern Jiaodong Peninsula

Fangying XU1,2, Yanhong ZOU1,2, Zhuowei YI1,2, Fuqiang YANG1,2, Xiancheng MAO1,2

  1. Key Laboratory of Metallogenic Prediction of Nonferrous Metals and Geological Environment Monitoring, Ministry of Education, Central South University, Changsha 410083, Hunan, China
  2. School of Geosciences and Info-Physics, Central South University, Changsha 410083, Hunan, China
  • Received: 2023-04-24 Revised: 2023-06-30 Online: 2023-10-31 Published: 2023-11-21
  • Contact: Yanhong ZOU E-mail: 205012135@csu.edu.cn; zouyanhong@csu.edu.cn

Abstract:

Logging-based lithology identification helps to quickly and accurately identify concealed strata and rock masses in covered areas, which is of great significance for the geological prospecting and exploration of metal mines. Based on actual logging data from the Zhaoxian gold deposit in the northwestern Jiaodong Peninsula, this paper applies machine learning methods to the intelligent identification of lithology. Given the diversity and imbalance of the lithology distribution in the complex rock formations of the deposit, and the strongly non-linear relationship between logging response and lithology, this paper proposes an intelligent logging lithology identification method based on ADASYN imbalanced-data processing and CatBoost machine learning. First, the ADASYN algorithm was used to process the unbalanced logging sample data, generating synthetic samples according to the weighted distribution of the minority-class samples. Then, the CatBoost algorithm was used to build a machine learning model relating logging features to lithology. Validation curves were used to determine the grid search range of the model hyperparameters, and the parameters were optimized by combining grid search with 10-fold cross validation to establish the optimal lithology classification model. Finally, the performance of the model was evaluated on the test set by indices such as precision, recall and F1 score, and the lithology classification results were interpreted using the feature importance and partial dependence plots output by the model. The method was applied to logging data from the Zhaoxian gold deposit, where identification and interpretation were carried out for 10 lithology types after the sample data had been balanced.
The model evaluation results show that the precision, recall and F1 score on the test set reached 98.21%, 98.20% and 98.20%, respectively. CatBoost lithology classification was compared with the GBDT and LightGBM algorithms; the results show that the CatBoost classifier performs best and that classification on ADASYN-balanced data outperforms classification on the unbalanced sample data. Comparison with the core lithology of an example logging section verifies the validity of the model's classification results. The feature importance output by the model indicates that the logging features contributing to lithology classification are, in decreasing order of importance, resistivity, spontaneous potential and natural gamma. The strong correlation between these logging features and lithology identification is a good indication of further mineralization.

Key words: lithology identification, ADASYN-CatBoost, logging, unbalanced data, machine learning, Zhaoxian gold deposit

CLC Number: 

  • P631.81

Fig.1 Flow chart of intelligent lithology identification for unbalanced logging data

Fig.2 Schematic diagram of ADASYN algorithm (Elnahas et al.,2021)
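
The adaptive weighting that ADASYN applies to minority-class samples, as described in the abstract, can be sketched in plain Python for a two-class toy problem. This is only a minimal illustration after He et al. (2008): the function name `adasyn_binary`, the parameters `k` and `beta`, and the toy points are assumptions, and the paper itself balances 10 lithology classes with an implementation it does not specify here.

```python
import math
import random


def adasyn_binary(minority, majority, k=3, beta=1.0, seed=0):
    """Toy two-class ADASYN; returns a list of synthetic minority samples."""
    rng = random.Random(seed)
    # Number of synthetic samples needed to reach the requested balance level.
    total = int((len(majority) - len(minority)) * beta)
    if total <= 0 or len(minority) < 2:
        return []

    # Label 0 = minority, 1 = majority; minority points come first.
    points = [(x, 0) for x in minority] + [(x, 1) for x in majority]

    # r_i: share of majority samples among the k nearest neighbours of x_i.
    ratios = []
    for i, x in enumerate(minority):
        dists = sorted((math.dist(x, p), lab)
                       for j, (p, lab) in enumerate(points) if j != i)
        ratios.append(sum(lab for _, lab in dists[:k]) / k)

    norm = sum(ratios)
    if norm == 0:
        # No minority point borders the majority class: use uniform weights.
        weights = [1 / len(minority)] * len(minority)
    else:
        weights = [r / norm for r in ratios]

    synthetic = []
    for i, (x, w) in enumerate(zip(minority, weights)):
        # Harder-to-learn points (larger weight) receive more synthetic samples.
        n_new = round(w * total)
        # Candidate partners: the k nearest minority neighbours of x.
        mates = sorted((m for j, m in enumerate(minority) if j != i),
                       key=lambda m: math.dist(x, m))[:k]
        for _ in range(n_new):
            mate = rng.choice(mates)
            lam = rng.random()
            # Interpolate on the segment between x and a minority neighbour.
            synthetic.append(tuple(a + lam * (b - a) for a, b in zip(x, mate)))
    return synthetic


# Hypothetical toy data: three minority points, seven majority points.
minority = [(0.0, 0.0), (0.2, 0.1), (1.0, 1.0)]
majority = [(1.1, 1.0), (1.0, 1.2), (0.9, 1.1), (1.2, 1.2),
            (3.0, 3.0), (3.1, 3.0), (3.0, 3.2)]
synthetic = adasyn_binary(minority, majority, k=3)
print(len(synthetic), synthetic)
```

In this toy case the minority point surrounded by majority samples receives the largest weight, which is the behaviour that distinguishes ADASYN from uniform oversampling.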

Fig.3 Geological map of Zhaoxian gold deposit in Northwest Jiaodong (modified after Yang et al.,2016)

Fig.4 Logging curves and observed lithology

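Table 1 Part of logging data training set

| Resistivity/(Ω·m) | Natural gamma/API | Spontaneous potential/mV | Lithology | Code |
|---|---|---|---|---|
| 84.6 | 30.8 | 23.04 | Potassic and sericite-quartz altered granitic cataclasite | 10 |
| 79.2 | 54.6 | 23.10 | Pyrite-sericite-quartz altered cataclasite | 9 |
| 90 | 53.2 | 6.09 | Sericite-quartz altered granitic cataclasite | 4 |
| 76.5 | 40.6 | 13.94 | Sericite-quartz altered granite | 6 |
| 99 | 23.8 | 27.70 | Medium-grained biotite-bearing monzogranite | 3 |
| … | … | … | … | … |
| 94.5 | 43.4 | 19.03 | Biotite-bearing monzogranite | 7 |
| 83.7 | 32.2 | 16.97 | Potassic and sericite-quartz altered granitic cataclasite | 10 |
| 1584.9 | 39.2 | 4.74 | Potassic-altered granitic cataclasite | 2 |
| 75.6 | 39.2 | 12.01 | Sericite-quartz altered granite | 6 |
| 93.6 | 54.6 | 21.97 | Medium-grained biotite-bearing monzogranite | 3 |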

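Table 2 Sample statistics for each lithological category in example logging data

| Lithology | Code | Samples before processing | Samples after ADASYN |
|---|---|---|---|
| Total | | 2609 | 10540 |
| Hornblende-bearing biotite tonalitic gneiss | 1 | 83 | 1047 |
| Potassic-altered granitic cataclasite | 2 | 173 | 1038 |
| Medium-grained biotite-bearing monzogranite | 3 | 1045 | 1045 |
| Sericite-quartz altered granitic cataclasite | 4 | 570 | 1094 |
| Potassic-altered biotite-bearing monzogranite | 5 | 27 | 1047 |
| Sericite-quartz altered granite | 6 | 217 | 1038 |
| Biotite-bearing monzogranite | 7 | 140 | 1072 |
| Pyrite-sericite-quartz altered granitic cataclasite | 8 | 149 | 1073 |
| Pyrite-sericite-quartz altered cataclasite | 9 | 80 | 1050 |
| Potassic and sericite-quartz altered granitic cataclasite | 10 | 125 | 1036 |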
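
To make the effect of the balancing step concrete, the short sketch below recomputes the totals and the ratio of largest to smallest class from the counts in Table 2. The helper name `imbalance_ratio` is an illustrative choice and does not come from the paper.

```python
# Sample counts per lithology code, copied from Table 2.
before = {1: 83, 2: 173, 3: 1045, 4: 570, 5: 27,
          6: 217, 7: 140, 8: 149, 9: 80, 10: 125}
after = {1: 1047, 2: 1038, 3: 1045, 4: 1094, 5: 1047,
         6: 1038, 7: 1072, 8: 1073, 9: 1050, 10: 1036}


def imbalance_ratio(counts):
    """Size of the largest class divided by the size of the smallest class."""
    return max(counts.values()) / min(counts.values())


for name, counts in (("before", before), ("after ADASYN", after)):
    print(f"{name}: total={sum(counts.values())}, "
          f"imbalance ratio={imbalance_ratio(counts):.2f}")
```

The ratio falls from about 38.7 (1045 against 27 samples) to about 1.06 (1094 against 1036). The classes are not exactly equal after processing, presumably because ADASYN derives the number of synthetic samples from neighbourhood-based weights rather than from a fixed target count.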

Fig.5 CatBoost validation curves

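Table 3 Search ranges and optimal values of the model hyperparameters

| Classifier | Hyperparameter | Search range | Optimal value |
|---|---|---|---|
| GBDT | Learning rate | 0.000001~0.5 | 0.1 |
| GBDT | Number of weak learners | 50~130 | 119 |
| GBDT | Minimum samples per leaf node | 5~50 | 10 |
| GBDT | Maximum tree depth | 2~30 | 25 |
| LightGBM | Learning rate | 0.001~0.800 | 0.2 |
| LightGBM | Number of weak learners | 50~130 | 102 |
| LightGBM | Maximum tree depth | 1~50 | 24 |
| LightGBM | Number of leaves per tree | 15~60 | 46 |
| LightGBM | Minimum data in leaf node | 5~55 | 30 |
| CatBoost | Learning rate | 0.001~0.800 | 0.1 |
| CatBoost | Tree depth | 3~17 | 10 |
| CatBoost | Maximum number of iterations | 50~500 | 300 |
| CatBoost | L2 regularization coefficient | 1~20 | 1 |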
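
The combination of grid search and 10-fold cross validation used to obtain the optimal values in Table 3 can be sketched with the standard library alone. The following is a minimal harness under stated assumptions: `k_fold_indices`, `grid_search_cv` and `knn_accuracy` are hypothetical names, the data are synthetic, and a k-nearest-neighbour vote stands in for the boosted classifiers so that the example stays self-contained.

```python
import itertools
import random
import statistics


def k_fold_indices(n_samples, n_splits=10, seed=0):
    """Yield (train, test) index lists; each sample is tested exactly once."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::n_splits] for i in range(n_splits)]
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


def grid_search_cv(param_grid, fit_and_score, n_samples, n_splits=10):
    """Return the parameter combination with the best mean CV score."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        scores = [fit_and_score(params, train, test)
                  for train, test in k_fold_indices(n_samples, n_splits)]
        mean_score = statistics.mean(scores)
        if mean_score > best_score:
            best_params, best_score = params, mean_score
    return best_params, best_score


# Synthetic two-class data on a single feature (not logging data).
rng = random.Random(1)
X = [rng.gauss(0.0, 0.5) for _ in range(50)] + [rng.gauss(1.5, 0.5) for _ in range(50)]
y = [0] * 50 + [1] * 50


def knn_accuracy(params, train, test):
    """Stand-in model: k-nearest-neighbour majority vote, scored by accuracy."""
    hits = 0
    for t in test:
        nearest = sorted(train, key=lambda j: abs(X[j] - X[t]))[:params["k"]]
        votes = sum(y[j] for j in nearest)
        hits += int((votes * 2 > len(nearest)) == bool(y[t]))
    return hits / len(test)


best_params, best_score = grid_search_cv({"k": [1, 3, 5, 7]}, knn_accuracy, len(X))
print(best_params, round(best_score, 3))
```

For the models in Table 3, `param_grid` would instead hold grid points chosen inside the listed search ranges, and `fit_and_score` would train the classifier on the training fold and score it on the held-out fold. The grid spacing is not given in this excerpt.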

Fig.6 Comparison of accuracy of the training and test sets of several models

Table 4 Precision, recall and F1 score (weighted average) of lithology identification on the test set

| Classifier | Precision | Recall | F1 score |
|---|---|---|---|
| GBDT | 0.9355 | 0.9349 | 0.9327 |
| LightGBM | 0.9561 | 0.9554 | 0.9552 |
| CatBoost | 0.9503 | 0.9600 | 0.9600 |
| ADASYN-GBDT | 0.9472 | 0.9469 | 0.9466 |
| ADASYN-LightGBM | 0.9695 | 0.9695 | 0.9695 |
| ADASYN-CatBoost | 0.9821 | 0.9820 | 0.9820 |
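
Table 4 reports weighted averages, which for a multi-class problem means that each class's score is weighted by its share of the true labels. A minimal sketch of that calculation follows; the function name `weighted_scores` and the tiny label lists are hypothetical and are not taken from the paper's test set.

```python
from collections import Counter


def weighted_scores(y_true, y_pred):
    """Support-weighted precision, recall and F1 over all classes."""
    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)
    n = len(y_true)
    totals = {"precision": 0.0, "recall": 0.0, "f1": 0.0}
    for c in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        weight = support[c] / n  # class share among the true labels
        totals["precision"] += weight * precision
        totals["recall"] += weight * recall
        totals["f1"] += weight * f1
    return totals


# Hypothetical lithology codes for seven test samples.
y_true = [3, 3, 3, 3, 4, 4, 6]
y_pred = [3, 3, 3, 4, 4, 4, 3]
scores = weighted_scores(y_true, y_pred)
print({name: round(value, 4) for name, value in scores.items()})
```

With this weighting, the weighted recall always equals the overall fraction of correctly classified samples, here 5 out of 7.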

Fig.7 Confusion matrix diagram of lithology identification of test set

Fig.8 Verification diagram of lithological identification results

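Table 5 Ranking results of feature importance for CatBoost model

| Rank | Feature | CatBoost importance |
|---|---|---|
| 1 | Resistivity log | 52.4% |
| 2 | Spontaneous potential log | 28.9% |
| 3 | Natural gamma log | 18.7% |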

Fig.9 Partial dependence diagram of single logging characteristics and lithology

Fig.10 Partial dependence diagram of logging feature combination and lithology
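
A partial dependence curve of the kind shown in Fig.9 is obtained by forcing one feature to each value on a grid and averaging the model output over all samples. The sketch below illustrates the idea under stated assumptions: `partial_dependence` and `toy_predict` are hypothetical names, the three rows reuse the first three feature rows of Table 1, and the linear scoring rule is arbitrary and is not the fitted CatBoost model.

```python
def partial_dependence(predict, rows, feature, grid):
    """Average prediction when `feature` is forced to each grid value."""
    curve = []
    for value in grid:
        # Overwrite one feature, keep the others as observed.
        modified = [{**row, feature: value} for row in rows]
        curve.append(sum(predict(r) for r in modified) / len(rows))
    return curve


def toy_predict(row):
    """Arbitrary linear score standing in for a trained classifier."""
    return 0.01 * row["resistivity"] + 0.02 * row["sp"]


# First three feature rows of Table 1 (resistivity, natural gamma, SP).
rows = [
    {"resistivity": 84.6, "gamma": 30.8, "sp": 23.04},
    {"resistivity": 79.2, "gamma": 54.6, "sp": 23.10},
    {"resistivity": 90.0, "gamma": 53.2, "sp": 6.09},
]
grid = [80.0, 90.0, 100.0]
curve = partial_dependence(toy_predict, rows, "resistivity", grid)
print(list(zip(grid, curve)))
```

Because the stand-in model is additive, its curve is a straight line in resistivity. A tree ensemble would typically give a stepwise curve, and forcing two features at once gives the two-feature surfaces of the kind shown in Fig.10.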

Batista G, Prati R C, Monard M C,2004.A study of the behavior of several methods for balancing machine learning training data[J].ACM SIGKDD Explorations Newsletter,6(1):20-29.
Chawla N V, Bowyer K W, Hall L O,et al,2002.SMOTE:Synthetic minority over-sampling technique[J].Journal of Artificial Intelligence Research,16:321-357.
Chen Ganghua, Liang Shasha, Wang Jun,et al,2019.Application of convolutional neural network in lithology identification[J].Well Logging Technology,43(2):129-134.
Dawson H L, Olivier D, Cédric M J,2023. Impact of dataset size and convolutional neural network architecture on transfer learning for carbonate rock classification[J].Computers and Geosciences,171:105284.
Elith J, Leathwick J R, Hastie T,2008.A working guide to boosted regression trees[J].Journal of Animal Ecology,77(4):802-813.
Elnahas M M, Hussein M, Keshk A,2021.Imbalanced data over-sampling technique based on convex combination method[J].International Journal of Computers and Information,9(1):15-28.
Friedman J H,2001.Greedy function approximation: A gradient boosting machine[J]. Annals of Statistics,29(5):1189-1232.
Fu Guangming, Yan Jiayong, Zhang Kun,et al,2017.Current status and progress of lithology identification technology[J].Progress in Geophysics,32(1):26-40.
Ge Yunfeng, Zhong Peng, Tang Huiming,et al,2019.Intelligent measurement on geometric information of rock discontinuities based on borehole image[J].Rock and Soil Mechanics,40(11):4467-4476.
Gu Y F, Bao Z D, Song X,et al,2019.Complex lithology prediction using probabilistic neural network improved by continuous restricted Boltzmann machine and particle swarm optimization[J].Journal of Petroleum Science and Engineering,179:966-978.
Gui Zhou, Chen Jianguo, Wang Chengbin,2017.Classification of imbalance geological data based on PCA-SMOTE algorithm and random forest:A case study of geochemical data from the eastern Tianshan of China[J].Journal of Guilin University of Technology,37(4):587-593.
Han Qidi, Zhang Xiaotong, Shen Wei,2019.Application of support vector machine based on decision tree feature extraction in lithology classification[J].Journal of Jilin University(Earth Science Edition),49(2):611-620.
He H B, Yang B, Garcia E A,et al,2008.ADASYN:Adaptive synthetic sampling approach for imbalanced learning[C]//2008 IEEE International Joint Conference on Neural Networks.Hong Kong:IEEE.
He Y W, Li W R, Dong Z Z,et al,2023.Lithologic identification of complex reservoir based on PSO-LSTM-FCN algorithm[J]. Energies,16(5):2135.
Han H, Wang W Y, Mao B H,2005.Borderline-SMOTE:A new over-sampling method in imbalanced data sets learning[C]//International Conference on Intelligent Computing.Berlin,Heidelberg:Springer Berlin Heidelberg.
Jiang J, Fang L, Zhang H B,et al,2022. Adaptive multiexpert learning for lithology recognition[J]. SPE Journal,27(6):3802-3813.
Kang Qiankun, Lu Laijun,2020.Application of random forest algorithm in classification of logging lithology[J].Global Geology,39(2):398-405.
Liu J M, Gao Y B, Hu F J,2021.A fast network intrusion detection system using adaptive synthetic oversampling and LightGBM[J].Computers and Security,106:102289.
Liu Ziyun, Wang Xianggong,1989.Determination of lithology through probability statistics[J].Journal of Oil and Gas Technology,(2):35-40.
Liu J J, Liu J C,2022. Integrating deep learning and logging data analytics for lithofacies classification and 3D modeling of tight sandstone reservoirs[J].Geoscience Frontiers,13(1):101311.
Lü Qingtian, Zhang Xiaopei, Tang Jingtian,et al,2019.Review on advancement in technology and equipment of geophysical exploration for metallic deposits in China[J].Chinese Journal of Geophysics,62(10):3629-3664.
Mou Dan, Wang Zhuwen, Huang Yulong,et al,2015.Lithological identification of volcanic rocks from SVM well logging data:Case study in the eastern depression of Liaohe Basin[J].Chinese Journal of Geophysics,58(5):1785-1793.
Ren X X, Hou J G, Song S H,et al,2019.Lithology identification using well logs:A method by integrating artificial neural networks and sedimentary patterns[J].Journal of Petroleum Science and Engineering,182:1-15.
Sun Jian, Zhou Kui, Ran Xiaofeng,et al,2009.Bayes discriminant analysis method in lithology recognition[J].Journal of Oil and Gas Technology,(2):74-77.
Tian Y, Xu H, Zhang X Y,et al,2016.Multi-resolution graph-based clustering analysis for lithofacies identification from well log data:Case study of intraplatform bank gas fields,Amu Darya Basin[J].Applied Geophysics,13(4):598-607.
Tripathy A, Agrawal A, Rath S K,2016.Classification of sentiment reviews using n-gram machine learning approach[J].Expert Systems with Applications,57:117-126.
Vikrant A D, Mario R E,2019.Formation lithology classification using scalable gradient boosted decision trees[J].Computers and Chemical Engineering,128:392-404.
Wang Chuanying, Zhong Sheng, Sun Weichun,2009. Study of connectivity of discontinuities of borehole based on digital borehole images[J].Chinese Journal of Rock Mechanics and Engineering,28(12):2405-2410.
Wang Heng, Jiang Yanan, Zhang Xin,et al,2021.Lithology identification method based on gradient boosting algorithm[J].Journal of Jilin University(Earth Science Edition),51(3):940-950.
Wang X W, Brownlee A, Woodward J R,et al,2021.Aircraft taxi time prediction:Feature importance and their implications[J].Transportation Research Part C:Emerging Technologies,124(1):102892.
Wang Yingpeng, Zhu Peigang, Zhang Wen,et al,2022.Geological significances and geochemical compositions of gold and gold-bearing minerals from Zhaoxian deeply-seated gold deposit,Jiaodong area[J].Mineral Deposits,41(2):255-272.
Xu Delong, Li Tao, Huang Baohua,et al,2012.Research on the identification of the lithology and fluid type of foreign oilfield by using the crossplot method[J].Progress in Geophysics,27(3):1123-1132.
Xu T T, Coco G, Neale M,2020.A predictive model of recreational water quality based on adaptive synthetic sampling algorithms and machine learning[J].Water Research,177(15):115788.
Xun Zhifeng, Yu Jifeng,2008.The application of cluster and discriminant analyses in logging lithology recognition[J].Journal of Shandong University of Science and Technology(Natural Science Edition),27(5):10-13.
Yang L Q, Deng J, Guo L N,et al,2016.Origin and evolution of ore fluid,and gold-deposition processes at the giant Taishang gold deposit,Jiaodong Peninsula,Eastern China[J].Ore Geology Reviews,72:585-602.
Yao Jinzhu, Fu Yaoqing, Wang Zhengyong,et al,2014.Identification of cuttings based on color and texture feature[J].Journal of Sichuan University(Natural Science Edition),51(2):313-318.
Zhang H, Yang S, Guo L,et al,2015.Comparisons of isomiR patterns and classification performance using the rank-based MANOVA and 10-fold cross-validation[J].Gene,569(1):21-26.
Zhang Tao, Li Yanping, Liu Xiaoyu,et al,2023.Lithology interpretation of deep metamorphic rocks with well logging based on APSO-LSSVM algorithm[J].Progress in Geophysics,38(1):382-392.
Zhang Xuchun,2021.Based on the CatBoost Model to Realize Monitoring and Early Warning for Discharge Situation of the Sewage Treatment Plant[D].Lanzhou:Lanzhou University.
Zhao Jian, Gao Fuhong,2003.Application of crossplots based on well log data in identifying volcanic lithology[J].Global Geology,(2):136-140.
Zhao S W, Zhou J H, Yang G R,2019.Averaging estimators for discrete choice by M-fold cross-validation[J].Economics Letters,174:65-69.
Zhao Xianling, Wang Guiwen, Zhou Zhenglong,et al,2015.A review of lithology interpretation methods using geophysical well logs[J].Progress in Geophysics,30(3):1278-1287.
Zheng J, Wang Y, Xu W,et al,2020.GSSA:Pay attention to graph feature importance for GCN via statistical self-attention[J].Neurocomputing,417:458-470.
Zhu L P, Li H Q, Yang Z G,et al,2018.Intelligent logging lithological interpretation with convolution neural networks[J].Petrophysics,59(6):799-810.
Zhu X Z, Wan Z H, Tsang D C,et al,2020.Machine learning for the selection of carbon-based materials for tetracycline and sulfamethoxazole adsorption[J].Chemical Engineering Journal,406:126782.
Zou Y H, Chen Y T, Deng H,2021.Gradient boosting decision tree for lithology identification with well logs:A case study of Zhaoxian gold deposit,Shandong Peninsula,China[J].Natural Resources Research,30(5):3197-3217.