Development of explainable artificial intelligence models for chemical and drug toxicity prediction (Korean title: 화학 및 약물 독성 예측을 위한 설명 가능한 인공 지능 모델 개발)
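Author
Publication details
Jeonju : Jeonbuk National University Graduate School, 2023
Thesis information
Thesis (Ph.D.) -- Jeonbuk National University Graduate School : Division of Electronics and Information Engineering (Electronic Engineering), 2023. 8
Publication year
2023
Language
English
Keywords
Country (city) of publication
Jeonbuk State (전북특별자치도)
Physical description
xii, 157 p. ; 26 cm
General note
Advisor: 정길도
UCI identifier
I804:45011-000000056818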
Toxicity prediction is a crucial part of drug discovery and safety assessment, and explainable artificial intelligence (AI) models for toxicity prediction are of great interest to the scientific community. Traditional methods for toxicity prediction, such as animal testing, are costly, time-consuming, and ethically questionable. AI models have therefore emerged as a promising alternative, since they can process large datasets and identify complex patterns. However, AI models are often criticized as black boxes that lack transparency and interpretability, which limits their practical use in the drug discovery process. This thesis aims to develop and apply explainable AI models for the toxicity prediction of chemicals and drugs. The proposed models use optimal molecular descriptors and different machine learning algorithms to predict toxicity while maintaining a high level of transparency and interpretability. The models are developed and evaluated in three separate studies, each focusing on a different type of toxicity.
The first study addresses drug-induced liver toxicity, which poses a significant safety risk in drug development. The main objective is to develop quantitative structure-activity relationship (QSAR) models using machine learning algorithms and systematic feature selection over a comprehensive set of molecular descriptors. A dataset of 1253 diverse drug compounds was used to construct these models, and their performance was assessed through internal validation using 10-fold cross-validation. To improve predictive accuracy, various feature selection techniques were employed to identify the optimal subset of descriptors. Among the classifiers tested, the support vector machine (SVM) was the most effective, yielding the best classification accuracy even with a reduced number of molecular descriptors. The final optimized model achieved an accuracy of 81.10%, a sensitivity of 84.0%, a specificity of 78.30%, and a Matthews correlation coefficient (MCC) of 0.623 when evaluated against the internal validation set. The proposed model outperformed previous studies not only on the internal test sets but also on external datasets. This result is attributed to the careful selection of distinct molecular descriptors as modeling features, which produced an in silico model with strong predictive performance.
The second study addresses respiratory toxicity, a significant public health concern that arises from the adverse effects of drugs or chemicals. The pharmaceutical and chemical industries need reliable computational tools to accurately assess the respiratory toxicity of compounds. The main objective of this study was to develop robust quantitative structure-activity relationship (QSAR) models using a large dataset of chemical compounds associated with respiratory system toxicity. To make modeling more efficient, various feature selection techniques were explored to identify the optimal subset of molecular descriptors. Eight different machine learning algorithms were employed to construct respiratory toxicity prediction models. Among these, the support vector machine (SVM) classifier performed best, surpassing all other optimized models during 10-fold cross-validation. It achieved a prediction accuracy of 86.20% and a Matthews correlation coefficient (MCC) of 0.722 on the test set. To gain insight into the predictions made by the proposed SVM model, the SHapley Additive exPlanations (SHAP) approach was used. This approach identifies the key modeling descriptors that influence the prediction of respiratory toxicity. Understanding the relevance of these descriptors makes the model's predictions easier to comprehend and interpret. The proposed SVM model, with its high prediction accuracy and its explainability through SHAP, has considerable potential in the early stages of drug development. It can predict potentially respiratory-toxic compounds and support a deeper understanding of them, offering valuable input for decision-making.
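SHAP attributes a prediction to individual descriptors using Shapley values from cooperative game theory. The sketch below illustrates that underlying idea only: it computes exact Shapley values for a single sample by enumerating every feature subset, which is feasible for a handful of features. It is not the SHAP library and not the thesis's SVM; `shapley_values`, `toy_model`, the all-zero baseline, and the input values are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one sample.

    Features outside a subset are replaced by their baseline value, and each
    feature's marginal contribution is averaged over all subsets.
    """
    n = len(x)

    def value(subset):
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return predict(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Weight of a subset of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_model(z):
    """Hypothetical score: one additive term plus one interaction term."""
    return 2.0 * z[0] + z[1] * z[2]


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
print(phi)
```

The attributions sum to the difference between the prediction for `x` and the prediction for the baseline, and the interaction term is shared equally between the two features that form it. Exact enumeration grows exponentially with the number of features, which is why practical tools rely on approximations for models with many descriptors.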
The third study addresses organ toxicity caused by chemicals, including medications, insecticides, chemical products, and cosmetics. The occurrence and progression of chemical-induced organ damage have been linked to various adverse effects, particularly mitochondrial dysfunction. In this study, an explainable artificial intelligence (XAI) model was proposed to classify compounds as either mitochondrial toxic or non-toxic. To construct the model, Mordred descriptors were chosen as input features after applying feature selection techniques, and the selected features were combined with the CatBoost learning algorithm. The proposed model reached a prediction accuracy of 85% during 10-fold cross-validation and an accuracy of 87.10% in independent testing. These results represent a substantial improvement in prediction accuracy over existing state-of-the-art methods described in the literature. The proposed model, which uses a tree-based ensemble approach, provides valuable insight into the prediction of mitochondrial toxicity. Furthermore, the global model explanation offered by the XAI model helps pharmaceutical chemists better understand the factors that influence the prediction of mitochondrial toxicity. This understanding can contribute to more informed decision-making and support the development of safer chemical compounds in various industries.
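All three studies report 10-fold cross-validation. The sketch below shows one common way such folds can be built so that each fold keeps roughly the same ratio of toxic to non-toxic compounds. The abstract does not say whether the folds were stratified, so stratification is an assumption here, and `stratified_kfold` and the label list are hypothetical.

```python
import random


def stratified_kfold(labels, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs with class ratios roughly preserved."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]

    # Group sample indices by class label.
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    # Shuffle within each class, then deal indices round-robin into the folds.
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)

    for f in range(k):
        test_idx = sorted(folds[f])
        train_idx = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train_idx, test_idx


# Hypothetical labels: 60 toxic (1) and 40 non-toxic (0) compounds.
labels = [1] * 60 + [0] * 40
splits = list(stratified_kfold(labels, k=10, seed=0))
for train_idx, test_idx in splits[:2]:
    print(len(train_idx), len(test_idx), sum(labels[i] for i in test_idx))
```

In use, a model would be fitted on `train_idx` and scored on `test_idx` for each fold, and the fold scores averaged. An independent test set, such as the one mentioned for the mitochondrial toxicity model, would be set aside before the folds are created.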
Overall, the results of this thesis demonstrate the potential of explainable artificial intelligence models for the toxicity prediction of chemicals and drugs. By providing insight into the key molecular descriptors driving toxicity prediction, these models have the potential to improve our understanding of toxicity mechanisms and aid in the early identification of potentially toxic compounds, ultimately leading to safer and more effective drugs and chemicals.