Machine-learning diagnostics of breast cancer using piRNA biomarkers.

Publication Title

Biomarkers : biochemical indicators of exposure, response, and susceptibility to chemicals

Document Type

Article

Publication Date

3-1-2025

Keywords

Breast Neoplasms; Humans; Machine Learning; Female; RNA, Small Interfering; Biomarkers, Tumor; Logistic Models; Piwi-Interacting RNA; Biomarkers; blood-based piRNAs; breast cancer; circulating piRNAs; California; Santa Monica; pni

Abstract

BACKGROUND AND OBJECTIVES: Prior studies have shown that small non-coding RNAs (sncRNAs) are associated with cancer occurrence or development. PIWI-interacting RNAs (piRNAs), a recently discovered class of sncRNAs, have been found to play a vital role in physiological processes and cancer initiation. This study aims to use piRNAs as novel, noninvasive diagnostic biomarkers for breast cancer. Our objective is to develop computational methods that use piRNA attributes to predict breast cancer and to apply those methods in diagnostics.

METHODS: We created a set of piRNA sequence descriptors using information extracted from the piRNA sequences. To identify these sequences precisely, we devised a procedure for converting non-standard piRNA names to standard ones. Using these descriptors, we applied machine-learning (ML) techniques in WEKA (Waikato Environment for Knowledge Analysis) to a piRNA dataset to assess the predictive accuracy of four classifiers: Logistic Regression, Sequential Minimal Optimization (SMO), Random Forest, and Logistic Model Tree (LMT). We also performed SHapley Additive exPlanations (SHAP) analysis to determine which descriptors were most relevant to the predictions. The ML models were then validated on an independent dataset to evaluate their effectiveness in predicting breast cancer.
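
The abstract does not list the sequence descriptors that were used. As a rough illustration only, the Python sketch below computes a few descriptors commonly derived from an RNA sequence: length, GC content, single-nucleotide frequencies, and overlapping k-mer frequencies. The function name `sequence_descriptors`, the choice of features, and the example sequence are all assumptions made for this sketch, not the study's method or data.

```python
from collections import Counter
from itertools import product


def sequence_descriptors(seq: str, k: int = 2) -> dict:
    """Compute simple composition descriptors for one RNA sequence.

    Hypothetical feature set for illustration; the study's actual
    descriptors are not given in the abstract.
    """
    # Normalise case and treat DNA-style T as RNA U.
    seq = seq.upper().replace("T", "U")
    if not seq or set(seq) - set("ACGU"):
        raise ValueError("sequence must be non-empty and contain only A, C, G, U/T")

    n = len(seq)
    counts = Counter(seq)
    features = {
        "length": n,
        "gc_content": (counts["G"] + counts["C"]) / n,
    }

    # Single-nucleotide frequencies.
    for base in "ACGU":
        features[f"freq_{base}"] = counts[base] / n

    # Frequencies of all overlapping k-mers, in a fixed order so that
    # every sequence yields a feature vector of the same length.
    windows = n - k + 1
    kmers = Counter(seq[i:i + k] for i in range(windows))
    for kmer in map("".join, product("ACGU", repeat=k)):
        features[f"kmer_{kmer}"] = kmers[kmer] / windows if windows > 0 else 0.0

    return features


if __name__ == "__main__":
    # Made-up example sequence, not taken from any piRNA database.
    example = sequence_descriptors("UGACAUGGCUAGCUAGGCAUCGAUCGAU")
    print(example["length"], round(example["gc_content"], 3))
```

A table of such feature vectors, one row per piRNA with a class label, could then be exported (for example as CSV or ARFF) for use in a tool such as WEKA.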

RESULTS: The top three performing classifiers in WEKA were Logistic Regression, SMO, and LMT. The Logistic Regression model achieved an accuracy of 90.7% in predicting breast cancer, while SMO and LMT attained 89.7% and 85.65%, respectively.
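
For reference, accuracy as reported here is conventionally the fraction of samples whose predicted class matches the true class. The short Python sketch below shows that calculation; the labels are toy values chosen for illustration and are not the study's data.

```python
def accuracy(y_true, y_pred):
    """Return the fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


if __name__ == "__main__":
    # Toy labels (1 = cancer-associated, 0 = control); illustrative only.
    y_true = [1, 1, 0, 0, 1]
    y_pred = [1, 0, 0, 0, 1]
    print(f"{accuracy(y_true, y_pred):.1%}")  # 4 of 5 correct
```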

CONCLUSIONS: Our study demonstrates the effectiveness of using ML-based piRNA classifiers in diagnosing breast cancer and contributes to the growing body of evidence supporting piRNAs as biomarkers in cancer diagnosis. However, additional research is needed to validate these findings and further assess the clinical applicability of this approach.

Area of Special Interest

Cancer

Area of Special Interest

Women & Children

Area of Special Interest

Neurosciences (Brain & Spine)

Specialty/Research Institute

Oncology

Specialty/Research Institute

Neurosciences

DOI

10.1080/1354750X.2025.2461067
