Articles, Abstracts, and Reports

Data leakage in deep learning studies of translational EEG.

Publication Title

Front Neurosci

Geoffrey Brookshire
Jake Kasper
Nicholas M Blauch
Yunan Charles Wu
Ryan M. Glatt, Pacific Brain Health Center, Pacific Neuroscience Institute and Foundation, Santa Monica, CA, United States
David A Merrill, Pacific Brain Health Center, Pacific Neuroscience Institute and Foundation, Santa Monica, CA, United States; Saint John's Cancer Institute at Providence Saint John's Health Center, Santa Monica, CA, United States
Spencer Gerrol
Keith J Yoder
Colin Quirk
Ché Lucero

Document Type

Article

Publication Date

1-1-2024

Keywords

Alzheimer's disease; cross-validation; data leakage; deep neural networks; electroencephalography; epilepsy; california; sjci; pacific neurosci

Abstract

A growing number of studies apply deep neural networks (DNNs) to recordings of human electroencephalography (EEG) to identify a range of disorders. In many studies, EEG recordings are split into segments, and each segment is randomly assigned to the training or test set. As a consequence, data from individual subjects appears in both the training and the test set. Could high test-set accuracy reflect data leakage from subject-specific patterns in the data, rather than patterns that identify a disease? We address this question by testing the performance of DNN classifiers using segment-based holdout (in which segments from one subject can appear in both the training and test set), and comparing this to their performance using subject-based holdout (where all segments from one subject appear exclusively in either the training set or the test set). In two datasets (one classifying Alzheimer's disease, and the other classifying epileptic seizures), we find that performance on previously-unseen subjects is strongly overestimated when models are trained using segment-based holdout. Finally, we survey the literature and find that the majority of translational DNN-EEG studies use segment-based holdout. Most published DNN-EEG studies may dramatically overestimate their classification performance on new subjects.
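The difference between the two holdout schemes described in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the synthetic `(subject_id, segment_index)` pairs, the function names, the 80/20 split, and the seed are all assumptions made for illustration, and no EEG data or classifier is involved. The sketch only shows which subjects end up on both sides of each split.

```python
import random


def make_segments(n_subjects=10, segments_per_subject=20):
    """Build hypothetical (subject_id, segment_index) pairs standing in for EEG segments."""
    return [(s, k) for s in range(n_subjects) for k in range(segments_per_subject)]


def segment_based_holdout(segments, test_fraction=0.2, seed=0):
    """Assign each segment to train or test at random, ignoring which subject it came from."""
    rng = random.Random(seed)
    shuffled = segments[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def subject_based_holdout(segments, test_fraction=0.2, seed=0):
    """Assign whole subjects to train or test, so no subject contributes to both sets."""
    rng = random.Random(seed)
    subjects = sorted({s for s, _ in segments})
    rng.shuffle(subjects)
    n_test = max(1, int(len(subjects) * test_fraction))
    test_subjects = set(subjects[:n_test])
    train = [seg for seg in segments if seg[0] not in test_subjects]
    test = [seg for seg in segments if seg[0] in test_subjects]
    return train, test


def shared_subjects(train, test):
    """Return the subjects whose segments appear in both the training and the test set."""
    return {s for s, _ in train} & {s for s, _ in test}


segments = make_segments()
seg_train, seg_test = segment_based_holdout(segments)
sub_train, sub_test = subject_based_holdout(segments)

print("subjects shared under segment-based holdout:", len(shared_subjects(seg_train, seg_test)))
print("subjects shared under subject-based holdout:", len(shared_subjects(sub_train, sub_test)))
```

Under the subject-based split the overlap is empty by construction, so test-set accuracy can only reflect performance on subjects the model has not seen. Under the segment-based split, subjects generally appear on both sides, which is the route by which subject-specific patterns could leak into the test set.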

Area of Special Interest

Neurosciences (Brain & Spine)

Specialty/Research Institute

Neurosciences

DOI

10.3389/fnins.2024.1373515
