Analysis of clinical, single cell, and spatial data from the Human Tumor Atlas Network (HTAN) with massively distributed cloud-based queries.
Publication Title
Res Sq
Document Type
Article
Publication Date
11-17-2025
Keywords
washington; isb; artificial intelligence
Abstract
Cancer research increasingly relies on large-scale, multimodal datasets that capture the complexity of tumor ecosystems across diverse patients, cancer types, and disease stages. The Human Tumor Atlas Network (HTAN) generates such data, including single-cell transcriptomics, proteomics, and multiplexed imaging. However, the volume and heterogeneity of the data present challenges for researchers seeking to integrate, explore, and analyze these datasets at scale. To address these challenges, HTAN developed a cloud-based infrastructure that transforms clinical and assay metadata into aggregate Google BigQuery tables, hosted through the Institute for Systems Biology Cancer Gateway in the Cloud (ISB-CGC). This infrastructure introduces two key innovations: (1) a provenance-based HTAN ID table that simplifies cohort construction and cross-assay integration, and (2) a novel adaptation of BigQuery's geospatial functions for use in spatial biology, enabling neighborhood and correlation analysis of tumor microenvironments. We demonstrate these capabilities through R and Python notebooks that highlight use cases such as identifying precancer and organ-specific sample cohorts, integrating multimodal datasets, and analyzing single-cell and spatial data. By lowering technical and computational barriers, this infrastructure provides a cost-effective and intuitive entry point for researchers, highlighting the potential of cloud-based platforms to accelerate cancer discoveries.
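As a rough illustration of the neighborhood analysis mentioned in the abstract, the sketch below counts, for each cell, how many other cells lie within a fixed radius. It is a plain-Python analogue of a within-distance predicate (BigQuery offers one as ST_DWITHIN; whether the authors use that particular function is an assumption). The function name, the cell identifiers, and the coordinates are all hypothetical and are not HTAN data.

```python
from math import hypot


def count_neighbors_within(cells, radius):
    """Count, for each cell, the other cells within `radius`.

    `cells` is a list of (cell_id, x, y) tuples on a flat image plane;
    `radius` is in the same units as x and y. This is a brute-force
    O(n^2) comparison, meant only to show the idea.
    """
    counts = {}
    for cell_id, x, y in cells:
        counts[cell_id] = sum(
            1
            for other_id, ox, oy in cells
            if other_id != cell_id and hypot(x - ox, y - oy) <= radius
        )
    return counts


# Hypothetical cell centroids (id, x, y); illustrative values only.
cells = [
    ("c1", 0.0, 0.0),
    ("c2", 3.0, 4.0),
    ("c3", 10.0, 0.0),
    ("c4", 50.0, 50.0),
]

neighbor_counts = count_neighbors_within(cells, radius=10.0)
for cell_id, n in neighbor_counts.items():
    print(cell_id, n)
```

In a data warehouse, the same pairwise test would typically be written as a self-join filtered by a distance predicate, which lets the query engine distribute the comparison rather than looping in client code.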
Area of Special Interest
Cancer
Specialty/Research Institute
Institute for Systems Biology
Specialty/Research Institute
Oncology
DOI
10.21203/rs.3.rs-7769205/v1