Gold Science and Technology ›› 2023, Vol. 31 ›› Issue (5): 721-735. doi: 10.11872/j.issn.1005-2518.2023.05.063

• Mineral Exploration and Resource Evaluation •

ADASYN-CatBoost Method for Intelligent Identification of Logging Lithology Considering Unbalanced Data: A Case Study of Zhaoxian Gold Deposit in Northwestern Jiaodong Peninsula

Fangying XU1,2, Yanhong ZOU1,2, Zhuowei YI1,2, Fuqiang YANG1,2, Xiancheng MAO1,2

  1. Key Laboratory of Metallogenic Prediction of Nonferrous Metals and Geological Environment Monitoring, Ministry of Education, Central South University, Changsha 410083, Hunan, China
  2. School of Geosciences and Info-Physics, Central South University, Changsha 410083, Hunan, China
  • Received: 2023-04-24 Revised: 2023-06-30 Online: 2023-10-31 Published: 2023-11-21
  • Contact: Yanhong ZOU E-mail: 205012135@csu.edu.cn; zouyanhong@csu.edu.cn
  • About the author: Fangying XU (1999-), female, from Yueyang, Hunan; master's student working on 3D geological modeling. 205012135@csu.edu.cn
  • Funding:
    National Natural Science Foundation of China projects "Dynamic computational simulation of fault-controlled hydrothermal alteration and its mineralization process: a case study of Jiaojia-type gold deposits in Jiaodong" (41872249) and "Quantitative characterization and intelligent understanding of the spatiotemporal structure of ore deposits" (42030809); Hunan Provincial Science and Technology Innovation Program project "Innovation team for critical metal resource exploration" (2021RC4055)

Abstract:

Rapid and accurate identification of the strata and rock masses beneath covered areas is of great significance for geological prospecting in metal mines. Lithology distribution in the strata and rock masses of ore deposits is diverse and imbalanced, and the relationship between logging responses and lithology is strongly nonlinear. To address this, the paper proposes an intelligent logging lithology identification method that combines ADASYN imbalanced-data processing with CatBoost machine learning. First, the ADASYN algorithm is used to process the imbalanced logging samples, generating synthetic samples according to the weighted distribution of the minority-class samples. Then, the CatBoost algorithm is used to build a machine learning model relating logging features to lithology: validation curves determine the hyperparameter search ranges, and grid search combined with 10-fold cross-validation yields the optimal lithology classification model. Finally, model performance is evaluated on the test set by precision, recall and F1 score, and the classification results are interpreted using the feature importance output by the model and partial dependence plots. The method was applied to logging data from the Zhaoxian gold deposit in the northwestern Jiaodong Peninsula, where 10 lithology types were identified and interpreted after sample balancing.

The evaluation results show that precision, recall and F1 score on the test set reached 98.21%, 98.20% and 98.20%, respectively. CatBoost was compared with the GBDT and LightGBM algorithms; CatBoost performed best, and all three classifiers performed better with balanced samples than without balancing. Comparison with the core lithology of an actual logged section verifies the validity of the classification results. The feature importance output by the model indicates that the logging features contributing to lithology classification are, in order, resistivity, spontaneous potential and natural gamma. The strong correlation between these logging features and lithology identification is a useful indicator for further mineral prospecting.

Key words: lithology identification, ADASYN-CatBoost, logging, unbalanced data, machine learning, Zhaoxian gold deposit

CLC number:

  • P631.81

Fig. 1

Flowchart of intelligent lithology identification for imbalanced logging data

Fig. 2

Schematic diagram of the ADASYN algorithm (Elnahas et al., 2021)
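The idea behind ADASYN, generating more synthetic samples for minority points that are surrounded by other classes, can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a binary setting (one minority class against everything else), Euclidean distance, and hypothetical names (`adasyn`, `k_nearest`, `beta`). It follows the general scheme of He et al. (2008) using only the Python standard library.

```python
import math
import random


def k_nearest(point, candidates, k):
    """Return the k (features, label) pairs closest to `point`."""
    return sorted(candidates, key=lambda c: math.dist(point, c[0]))[:k]


def adasyn(minority, majority, k=5, beta=1.0, seed=0):
    """Generate synthetic minority samples (binary case, an assumption).

    minority, majority: lists of feature tuples.
    beta: desired balance level (1.0 aims at fully balanced classes).
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    # Total number of synthetic samples to create.
    total = int((len(majority) - len(minority)) * beta)
    labelled = [(x, 0) for x in minority] + [(x, 1) for x in majority]

    # r_i: share of majority points among the k nearest neighbours of x_i.
    ratios = []
    for x in minority:
        others = [p for p in labelled if p[0] is not x]
        neighbours = k_nearest(x, others, k)
        ratios.append(sum(label for _, label in neighbours) / k)

    ratio_sum = sum(ratios)
    if ratio_sum == 0:  # no minority point lies near the majority class
        return []

    synthetic = []
    for x, r in zip(minority, ratios):
        n_new = round(r / ratio_sum * total)  # harder points get more samples
        same_class = [(m, 0) for m in minority if m is not x]
        pool = [m for m, _ in k_nearest(x, same_class, k)]
        for _ in range(n_new):
            z = rng.choice(pool)
            lam = rng.random()
            # Interpolate between x and one of its minority neighbours.
            synthetic.append(tuple(a + lam * (b - a) for a, b in zip(x, z)))
    return synthetic
```

For a 10-class problem such as the one in this paper, a library implementation would typically repeat this procedure for every class smaller than the largest one. The rounding step means the number of generated samples can differ slightly from the target, which may be one reason resampled class sizes are close to, but not exactly equal to, the majority class size.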

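Fig. 3

Simplified geological map of the Zhaoxian gold deposit, northwestern Jiaodong (modified from Yang et al., 2016). 1. Quaternary; 2. Guojialing suite; 3. Linglong suite; 4. Malianzhuang suite; 5. Fractured alteration zone; 6. Fault; 7. Gold deposit; 8. Study area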

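Fig. 4

Logging curves and observed lithology. 1. Pyrite-sericite-quartz-altered granitic cataclasite; 2. Sericite-quartz-altered granitic cataclasite; 3. Potassic- and sericite-quartz-altered granitic cataclasite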

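Table 1

Part of the logging training data set

| Resistivity/(Ω·m) | Natural gamma/API | Spontaneous potential/mV | Lithology | Code |
|---|---|---|---|---|
| 84.6 | 30.8 | 23.04 | Potassic- and sericite-quartz-altered granitic cataclasite | 10 |
| 79.2 | 54.6 | 23.10 | Pyrite-sericite-quartz-altered cataclasite | 9 |
| 90 | 53.2 | 6.09 | Sericite-quartz-altered granitic cataclasite | 4 |
| 76.5 | 40.6 | 13.94 | Sericite-quartz-altered granite | 6 |
| 99 | 23.8 | 27.70 | Medium-grained biotite-bearing monzogranite | 3 |
| ... | ... | ... | ... | ... |
| 94.5 | 43.4 | 19.03 | Biotite-bearing monzogranite | 7 |
| 83.7 | 32.2 | 16.97 | Potassic- and sericite-quartz-altered granitic cataclasite | 10 |
| 1584.9 | 39.2 | 4.74 | Potassic-altered granitic cataclasite | 2 |
| 75.6 | 39.2 | 12.01 | Sericite-quartz-altered granite | 6 |
| 93.6 | 54.6 | 21.97 | Medium-grained biotite-bearing monzogranite | 3 |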

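Table 2

Sample counts for each lithology class in the case-study logging data

| Lithology | Class code | Samples before processing | Samples after ADASYN |
|---|---|---|---|
| Total | | 2609 | 10540 |
| Hornblende-bearing biotite tonalitic gneiss | 1 | 83 | 1047 |
| Potassic-altered granitic cataclasite | 2 | 173 | 1038 |
| Medium-grained biotite-bearing monzogranite | 3 | 1045 | 1045 |
| Sericite-quartz-altered granitic cataclasite | 4 | 570 | 1094 |
| Potassic-altered biotite-bearing monzogranite | 5 | 27 | 1047 |
| Sericite-quartz-altered granite | 6 | 217 | 1038 |
| Biotite-bearing monzogranite | 7 | 140 | 1072 |
| Pyrite-sericite-quartz-altered granitic cataclasite | 8 | 149 | 1073 |
| Pyrite-sericite-quartz-altered cataclasite | 9 | 80 | 1050 |
| Potassic- and sericite-quartz-altered granitic cataclasite | 10 | 125 | 1036 |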
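The effect of the balancing step can be quantified directly from the sample counts reported for the ten lithology classes. The short sketch below copies those counts and computes two simple summaries; the helper names `imbalance_ratio` and `synthetic_added` are illustrative and do not come from the paper.

```python
# Sample counts per lithology class code (1-10), copied from the table.
before = {1: 83, 2: 173, 3: 1045, 4: 570, 5: 27,
          6: 217, 7: 140, 8: 149, 9: 80, 10: 125}
after = {1: 1047, 2: 1038, 3: 1045, 4: 1094, 5: 1047,
         6: 1038, 7: 1072, 8: 1073, 9: 1050, 10: 1036}


def imbalance_ratio(counts):
    """Size of the largest class divided by the size of the smallest."""
    return max(counts.values()) / min(counts.values())


def synthetic_added(counts_before, counts_after):
    """Number of synthetic samples added to each class."""
    return {code: counts_after[code] - counts_before[code]
            for code in counts_before}


print(sum(before.values()), sum(after.values()))
print(round(imbalance_ratio(before), 2), round(imbalance_ratio(after), 2))
print(synthetic_added(before, after))
```

The per-class counts sum to the reported totals (2609 and 10540). The ratio between the largest and smallest class falls from about 38.7 (1045 against 27) to about 1.06 (1094 against 1036), and class 3, the largest class before processing, receives no synthetic samples.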

Fig. 5

CatBoost validation curves

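Table 3

Hyperparameter search ranges and optimal values of the models

| Classifier | Hyperparameter | Search range | Optimal value |
|---|---|---|---|
| GBDT | Learning rate | 0.000001 to 0.5 | 0.1 |
| GBDT | Number of weak learners | 50 to 130 | 119 |
| GBDT | Minimum samples per leaf | 5 to 50 | 10 |
| GBDT | Maximum tree depth | 2 to 30 | 25 |
| LightGBM | Learning rate | 0.001 to 0.800 | 0.2 |
| LightGBM | Number of weak learners | 50 to 130 | 102 |
| LightGBM | Maximum tree depth | 1 to 50 | 24 |
| LightGBM | Number of leaves per tree | 15 to 60 | 46 |
| LightGBM | Minimum data in leaf | 5 to 55 | 30 |
| CatBoost | Learning rate | 0.001 to 0.800 | 0.1 |
| CatBoost | Tree depth | 3 to 17 | 10 |
| CatBoost | Maximum number of iterations | 50 to 500 | 300 |
| CatBoost | L2 regularization coefficient | 1 to 20 | 1 |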
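The search procedure behind such a table, grid search scored by k-fold cross-validation, can be sketched in a few lines. The example below is only an illustration of the procedure: it uses a k-nearest-neighbour classifier as a stand-in (it is not CatBoost, GBDT or LightGBM), and the function names and the single hyperparameter `k` are assumptions made for the sake of a self-contained example.

```python
import itertools
import math
import random
from collections import Counter


def k_fold_indices(n_samples, n_splits=10, seed=0):
    """Shuffle the sample indices and deal them into n_splits folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::n_splits] for i in range(n_splits)]


def knn_predict(train_X, train_y, test_X, k):
    """Stand-in classifier (k-nearest neighbours), NOT CatBoost."""
    preds = []
    for x in test_X:
        nearest = sorted(range(len(train_X)),
                         key=lambda i: math.dist(x, train_X[i]))[:k]
        votes = Counter(train_y[i] for i in nearest)
        preds.append(votes.most_common(1)[0][0])
    return preds


def grid_search_cv(X, y, param_grid, n_splits=10):
    """Return (best_params, best_mean_accuracy) over every grid combination.

    Assumes len(X) >= n_splits so that no fold is empty.
    """
    folds = k_fold_indices(len(X), n_splits)
    names = sorted(param_grid)
    best_params, best_score = None, -1.0
    for values in itertools.product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        scores = []
        for fold in folds:
            held_out = set(fold)
            train = [i for i in range(len(X)) if i not in held_out]
            preds = knn_predict([X[i] for i in train], [y[i] for i in train],
                                [X[i] for i in fold], **params)
            hits = sum(p == y[i] for p, i in zip(preds, fold))
            scores.append(hits / len(fold))
        mean_score = sum(scores) / len(scores)
        if mean_score > best_score:  # ties keep the earlier combination
            best_params, best_score = params, mean_score
    return best_params, best_score
```

With a real gradient boosting library, the inner call would train the model on the nine training folds and score it on the held-out fold. The cost grows with the product of the grid sizes, which is why narrowing each range beforehand (for example with validation curves, as the abstract describes) matters in practice.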

Fig. 6

Comparison of training-set and test-set accuracy for several models

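Table 4

Precision, recall and F1 score (weighted average) of lithology identification on the test set

| Classifier | Precision | Recall | F1 score |
|---|---|---|---|
| GBDT | 0.9355 | 0.9349 | 0.9327 |
| LightGBM | 0.9561 | 0.9554 | 0.9552 |
| CatBoost | 0.9503 | 0.9600 | 0.9600 |
| ADASYN-GBDT | 0.9472 | 0.9469 | 0.9466 |
| ADASYN-LightGBM | 0.9695 | 0.9695 | 0.9695 |
| ADASYN-CatBoost | 0.9821 | 0.9820 | 0.9820 |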
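For readers unfamiliar with the "weighted average" in the table title, the sketch below shows one common way to compute it: per-class precision, recall and F1 are averaged with weights proportional to each class's number of true samples. The function name `weighted_scores` is hypothetical, and the paper does not state which software produced its figures, so this illustrates the definition and is not a reproduction of the authors' computation.

```python
from collections import Counter


def weighted_scores(y_true, y_pred):
    """Support-weighted average precision, recall and F1 over all classes."""
    support = Counter(y_true)          # number of true samples per class
    n = len(y_true)
    precision = recall = f1 = 0.0
    for label, count in support.items():
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        predicted = sum(p == label for p in y_pred)
        p_c = tp / predicted if predicted else 0.0   # class precision
        r_c = tp / count                             # class recall
        f_c = 2 * p_c * r_c / (p_c + r_c) if p_c + r_c else 0.0
        weight = count / n
        precision += weight * p_c
        recall += weight * r_c
        f1 += weight * f_c
    return precision, recall, f1


# Toy labels, not data from the paper.
y_true = [1, 1, 1, 1, 2, 2]
y_pred = [1, 1, 1, 2, 2, 2]
print(weighted_scores(y_true, y_pred))
```

One consequence of this definition is that the weighted recall always equals the overall accuracy, since each class recall is multiplied by that class's share of the samples.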

Fig. 7

Confusion matrix of lithology identification on the test set

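Fig. 8

Validation of the lithology identification results. 1. Hornblende-bearing biotite tonalitic gneiss; 2. Potassic-altered granitic cataclasite; 3. Medium-grained biotite-bearing monzogranite; 4. Sericite-quartz-altered granitic cataclasite; 5. Potassic-altered biotite-bearing monzogranite; 6. Sericite-quartz-altered granite; 7. Biotite-bearing monzogranite; 8. Pyrite-sericite-quartz-altered granitic cataclasite; 9. Pyrite-sericite-quartz-altered cataclasite; 10. Potassic- and sericite-quartz-altered granitic cataclasite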

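Table 5

Feature importance ranking of the CatBoost model

| Rank | Feature | CatBoost importance |
|---|---|---|
| 1 | Resistivity log | 52.4% |
| 2 | Spontaneous potential log | 28.9% |
| 3 | Natural gamma log | 18.7% |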

Fig. 9

Partial dependence plots of individual logging features and lithology

Fig. 10

Partial dependence plots of logging feature combinations and lithology
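A partial dependence value is the model's average prediction when one or more features are forced to fixed values while all other features keep their observed values. The sketch below shows that computation for single features and for feature combinations. The `predict` callables are toy stand-ins rather than a trained CatBoost model, and the column order (resistivity, natural gamma, spontaneous potential) and thresholds are assumptions chosen only for illustration.

```python
import itertools


def partial_dependence(predict, rows, features, grids):
    """Mean prediction for every combination of forced feature values.

    predict:  callable mapping one feature row (list) to a number,
              e.g. the predicted probability of one lithology class.
    rows:     observed feature rows.
    features: column indices to vary (one index, or two for an interaction).
    grids:    one list of candidate values per varied column.
    """
    result = {}
    for combo in itertools.product(*grids):
        total = 0.0
        for row in rows:
            modified = list(row)
            for col, value in zip(features, combo):
                modified[col] = value      # overwrite only the chosen columns
            total += predict(modified)
        result[combo] = total / len(rows)
    return result


# Toy rows: [resistivity, natural gamma, spontaneous potential] (assumed order).
rows = [[80.0, 30.0, 20.0], [120.0, 40.0, 10.0]]


def toy_model(row):
    """Hypothetical score that rises with resistivity and with SP."""
    return 0.5 * (row[0] > 100) + 0.5 * (row[2] > 15)


print(partial_dependence(toy_model, rows, [0], [[50.0, 150.0]]))
print(partial_dependence(toy_model, rows, [0, 2], [[50.0, 150.0], [10.0, 20.0]]))
```

Passing one column index gives a curve of the kind shown for individual features; passing two indices gives the grid of values behind a two-feature plot.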

Batista G, Prati R C, Monard M C,2004.A study of the behavior of several methods for balancing machine learning training data[J].ACM SIGKDD Explorations Newsletter,6(1):20-29.
Chawla N V, Bowyer K W, Hall L O,et al,2002.SMOTE:Synthetic minority over-sampling technique[J].Journal of Artificial Intelligence Research,16:321-357.
Chen Ganghua, Liang Shasha, Wang Jun,et al,2019.Application of convolutional neural network in lithology identification[J].Well Logging Technology,43(2):129-134.
Dawson H L, Olivier D, Cédric M J,2023. Impact of dataset size and convolutional neural network architecture on transfer learning for carbonate rock classification[J].Computers and Geosciences,171:105284.
Elith J, Leathwick J R, Hastie T,2008.A working guide to boosted regression trees[J].Journal of Animal Ecology,77(4):802-813.
Elnahas M M, Hussein M, Keshk A,2021.Imbalanced data over-sampling technique based on convex combination method[J].International Journal of Computers and Information,9(1):15-28.
Friedman J H,2001.Greedy function approximation: A gradient boosting machine[J]. Annals of Statistics,29(5):1189-1232.
Fu Guangming, Yan Jiayong, Zhang Kun,et al,2017.Current status and progress of lithology identification technology[J].Progress in Geophysics,32(1):26-40.
Ge Yunfeng, Zhong Peng, Tang Huiming,et al,2019. Intelligent measurement on geometric information of rock discontinuities based on borehole image[J].Rock and Soil Mechanics,40(11):4467-4476.
Gu Y F, Bao Z D, Song X,et al,2019.Complex lithology prediction using probabilistic neural network improved by continuous restricted Boltzmann machine and particle swarm optimization[J].Journal of Petroleum Science and Engineering,179:966-978.
Gui Zhou, Chen Jianguo, Wang Chengbin,2017.Classification of imbalance geological data based on PCA-SMOTE algorithm and random forest:A case study of geochemical data from the eastern Tianshan of China[J].Journal of Guilin University of Technology,37(4):587-593.
Han Qidi, Zhang Xiaotong, Shen Wei,2019.Application of support vector machine based on decision tree feature extraction in lithology classification[J].Journal of Jilin University(Earth Science Edition),49(2):611-620.
He H B, Yang B, Garcia E A,et al,2008.ADASYN:Adaptive synthetic sampling approach for imbalanced learning[C]//2008 IEEE International Joint Conference on Neural Networks.Hong Kong:IEEE.
He Y W, Li W R, Dong Z Z,et al,2023.Lithologic identification of complex reservoir based on PSO-LSTM-FCN algorithm[J]. Energies,16(5):2135.
Han H, Wang W Y, Mao B H,2005.Borderline-SMOTE:A new over-sampling method in imbalanced data sets learning[C]//International Conference on Intelligent Computing.Berlin,Heidelberg:Springer Berlin Heidelberg.
Jiang J, Fang L, Zhang H B,et al,2022. Adaptive multiexpert learning for lithology recognition[J]. SPE Journal,27(6):3802-3813.
Kang Qiankun, Lu Laijun,2020.Application of random forest algorithm in classification of logging lithology[J].Global Geology,39(2):398-405.
Liu J M, Gao Y B, Hu F J,2021.A fast network intrusion detection system using adaptive synthetic oversampling and LightGBM[J].Computers and Security,106:102289.
Liu Ziyun, Wang Xianggong,1989.Determination of lithology through probability statistics[J].Journal of Oil and Gas Technology,(2):35-40.
Liu J J, Liu J C,2022. Integrating deep learning and logging data analytics for lithofacies classification and 3D modeling of tight sandstone reservoirs[J].Geoscience Frontiers,13(1):101311.
Lü Qingtian, Zhang Xiaopei, Tang Jingtian,et al,2019. Review on advancement in technology and equipment of geophysical exploration for metallic deposits in China[J].Chinese Journal of Geophysics,62(10):3629-3664.
Mou Dan, Wang Zhuwen, Huang Yulong,et al,2015.Lithological identification of volcanic rocks from SVM well logging data:Case study in the eastern depression of Liaohe Basin[J].Chinese Journal of Geophysics,58(5):1785-1793.
Ren X X, Hou J G, Song S H,et al,2019.Lithology identification using well logs:A method by integrating artificial neural networks and sedimentary patterns[J].Journal of Petroleum Science and Engineering,182:1-15.
Sun Jian, Zhou Kui, Ran Xiaofeng,et al,2009.Bayes discriminant analysis method in lithology recognition[J].Journal of Oil and Gas Technology,(2):74-77.
Tian Y, Xu H, Zhang X Y,et al,2016.Multi-resolution graph-based clustering analysis for lithofacies identification from well log data:Case study of intraplatform bank gas fields,Amu Darya Basin[J].Applied Geophysics,13(4):598-607.
Tripathy A, Agrawal A, Rath S K,2016.Classification of sentiment reviews using n-gram machine learning approach[J].Expert Systems with Applications,57:117-126.
Vikrant A D, Mario R E,2019.Formation lithology classification using scalable gradient boosted decision trees[J].Computers and Chemical Engineering,128:392-404.
Wang Chuanying, Zhong Sheng, Sun Weichun,2009. Study of connectivity of discontinuities of borehole based on digital borehole images[J].Chinese Journal of Rock Mechanics and Engineering,28(12):2405-2410.
Wang Heng, Jiang Yanan, Zhang Xin,et al,2021.Lithology identification method based on gradient boosting algorithm[J].Journal of Jilin University(Earth Science Edition),51(3):940-950.
Wang X W, Brownlee A, Woodward J R,et al,2021.Aircraft taxi time prediction:Feature importance and their implications[J].Transportation Research Part C:Emerging Technologies,124(1):102892.
Wang Yingpeng, Zhu Peigang, Zhang Wen,et al,2022.Geological significances and geochemical compositions of gold and gold-bearing minerals from Zhaoxian deeply-seated gold deposit,Jiaodong area[J].Mineral Deposits,41(2):255-272.
Xu Delong, Li Tao, Huang Baohua,et al,2012.Research on the identification of the lithology and fluid type of foreign oilfield by using the crossplot method[J].Progress in Geophysics,27(3):1123-1132.
Xu T T, Coco G, Neale M,2020.A predictive model of recreational water quality based on adaptive synthetic sampling algorithms and machine learning[J].Water Research,177(15):115788.
Xun Zhifeng, Yu Jifeng,2008.The application of cluster and discriminant analyses in logging lithology recognition[J].Journal of Shandong University of Science and Technology(Natural Science Edition),27(5):10-13.
Yang L Q, Deng J, Guo L N,et al,2016.Origin and evolution of ore fluid,and gold-deposition processes at the giant Taishang gold deposit,Jiaodong Peninsula,Eastern China[J].Ore Geology Reviews,72:585-602.
Yao Jinzhu, Fu Yaoqing, Wang Zhengyong,et al,2014.Identification of cuttings based on color and texture feature[J].Journal of Sichuan University(Natural Science Edition),51(2):313-318.
Zhang H, Yang S, Guo L,et al,2015.Comparisons of isomiR patterns and classification performance using the rank-based MANOVA and 10-fold cross-validation[J].Gene,569(1):21-26.
Zhang Tao, Li Yanping, Liu Xiaoyu,et al,2023.Lithology interpretation of deep metamorphic rocks with well logging based on APSO-LSSVM algorithm[J].Progress in Geophysics,38(1):382-392.
Zhang Xuchun,2021.Based on the CatBoost Model to Realize Monitoring and Early Warning for Discharge Situation of the Sewage Treatment Plant[D].Lanzhou:Lanzhou University.
Zhao Jian, Gao Fuhong,2003.Application of crossplots based on well log data in identifying volcanic lithology[J].Global Geology,(2):136-140.
Zhao S W, Zhou J H, Yang G R,2019.Averaging estimators for discrete choice by M-fold cross-validation[J].Economics Letters,174:65-69.
Zhao Xianling, Wang Guiwen, Zhou Zhenglong,et al,2015.A review of lithology interpretation methods using geophysical well logs[J].Progress in Geophysics,30(3):1278-1287.
Zheng J, Wang Y, Xu W,et al,2020.GSSA:Pay attention to graph feature importance for GCN via statistical self-attention[J].Neurocomputing,417:458-470.
Zhu L P, Li H Q, Yang Z G,et al,2018.Intelligent logging lithological interpretation with convolution neural networks[J].Petrophysics,59(6):799-810.
Zhu X Z, Wan Z H, Tsang D C,et al,2020.Machine learning for the selection of carbon-based materials for tetracycline and sulfamethoxazole adsorption[J].Chemical Engineering Journal,406:126782.
Zou Y H, Chen Y T, Deng H,2021.Gradient boosting decision tree for lithology identification with well logs:A case study of Zhaoxian gold deposit,Shandong Peninsula,China[J].Natural Resources Research,30(5):3197-3217.