About scMORA

A barcode-matched 10x Multiome resource for paired RNA-ATAC analysis

scMORA curates public 10x Multiome datasets in which RNA expression and ATAC accessibility are measured from the same cell barcodes. The database is designed for browsing paired single-cell regulatory profiles, selecting datasets by biological context, and reusing curated data for model training and benchmark construction.

Why scMORA

What makes this database different

True paired multiome profiles

Each accepted dataset preserves same-cell RNA and ATAC measurements, avoiding post hoc matching of separate scRNA-seq and scATAC-seq experiments.

Sample-level curation

Entries are organized by processed sample or analysis directory, while study accessions and biological source labels are retained for traceability.

Model-ready labels

Curated usage tags indicate whether a dataset is suitable for model training, controls, perturbation analysis, disease modeling, cancer studies, or related tasks.

Reusable assets

The website serves precomputed UMAP, QC, gene-expression, gene-activity and nearby-peak summaries for fast interactive exploration.

Data Inclusion Criteria

scMORA focuses on datasets generated by 10x Multiome or equivalent paired RNA-ATAC workflows where both modalities share cell barcode identifiers.

  • Single-cell RNA expression and ATAC accessibility from the same cells
  • Barcode-level pairing retained after processing
  • Sample source, condition, species and reference information available
  • Precomputed visualization assets available for website exploration

Barcode Matching Definition

Barcode matching means that one cell barcode maps to one RNA profile and one ATAC profile. This pairing is the core data unit used by scMORA.

cell_barcode_i
  -> RNA expression profile_i
  -> ATAC accessibility profile_i
  -> shared metadata_i
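
The pairing rule above can be illustrated with a minimal sketch. It assumes each modality is available as a mapping from cell barcode to profile; the barcodes, values and the function name `pair_by_barcode` are hypothetical and are not part of scMORA or the scmora-db package.

```python
# Hypothetical per-modality tables keyed by cell barcode (values are stand-ins).
rna_profiles = {"AAAC-1": [3, 0, 1], "AAAG-1": [0, 2, 5], "AACT-1": [1, 1, 0]}
atac_profiles = {"AAAC-1": [1, 0], "AAAG-1": [0, 1], "TTTG-1": [1, 1]}
metadata = {"AAAC-1": {"cluster": "B"}, "AAAG-1": {"cluster": "T"}}


def pair_by_barcode(rna, atac, meta):
    """Keep only barcodes that have both an RNA and an ATAC profile."""
    shared = sorted(rna.keys() & atac.keys())
    return {
        bc: {"rna": rna[bc], "atac": atac[bc], "meta": meta.get(bc, {})}
        for bc in shared
    }


paired = pair_by_barcode(rna_profiles, atac_profiles, metadata)
print(list(paired))
```

Barcodes seen in only one modality (here `AACT-1` and `TTTG-1`) are dropped, so every remaining unit carries one RNA profile, one ATAC profile and its shared metadata.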

Metadata Organization

The website reads curated sample metadata from data/metadata/metadata.csv. Public dataset cards display clean dataset IDs rather than local processing-folder suffixes.

  • Dataset_id: sample-level display ID and URL query ID
  • GSE_id: GEO study accession
  • Detail_source: curated source, tissue, cell type or model label
  • Condition and Detailed_condition: broad and specific biological context
  • Usage_primary and Usage_tags: model-usage labels
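
As a rough illustration of how these columns might be used, the sketch below selects dataset IDs by usage tag with the standard `csv` module. The column names come from the list above; the sample rows, the placeholder accessions and the `;` separator inside Usage_tags are assumptions, since the actual file layout is not described here.

```python
import csv
import io

# Stand-in for data/metadata/metadata.csv; every row below is illustrative only.
SAMPLE_CSV = """Dataset_id,GSE_id,Detail_source,Condition,Detailed_condition,Usage_primary,Usage_tags
sample_A,GSE_PLACEHOLDER_1,cell line A,normal,untreated,control,control;cell_line
sample_B,GSE_PLACEHOLDER_2,tissue B,disease,treated,perturbation,perturbation;disease_model
sample_C,GSE_PLACEHOLDER_2,tissue B,disease,vehicle,perturbation_control,perturbation_control
"""


def select_by_tag(handle, tag, sep=";"):
    """Return Dataset_id values whose Usage_tags field contains `tag` exactly.

    The separator `sep` is an assumption about how multiple tags are stored.
    """
    selected = []
    for row in csv.DictReader(handle):
        tags = {t.strip() for t in row["Usage_tags"].split(sep) if t.strip()}
        if tag in tags:
            selected.append(row["Dataset_id"])
    return selected


control_ids = select_by_tag(io.StringIO(SAMPLE_CSV), "control")
print(control_ids)
```

Splitting the field into a set and testing membership, rather than searching for a substring, keeps `control` from also matching `perturbation_control`.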

Model-Usage Labels

Usage labels describe how each dataset can be reused for model training, evaluation or biological comparison. A dataset can have multiple tags.

  • model_training
  • control
  • cell_line
  • disease_model
  • perturbation
  • induced_model
  • organoid_or_differentiation_model
  • cancer
  • perturbation_control

Website Modules

How to use scMORA

Home

Provides the database overview, key statistics, searchable entry access and interactive metadata summaries.

Browser

Filters datasets by condition, sample source and model-usage label, then opens sample-level visualization pages.

Visualization

Displays Joint, RNA and ATAC UMAPs with metadata coloring, QC summaries and composition charts.

Analysis

Supports gene-centered RNA FeaturePlot, ATAC-derived gene activity and nearby peak summaries.

Download

Provides access to full paired .h5mu datasets through Hugging Face and the scmora-db package.

Visualization Assets

Dataset pages load lightweight precomputed files instead of opening full .h5mu objects at request time.

data/visual/umap_qc/h5mu_plot/
  GSE166797/
    GSM5085810_GM12878_rep1/
      cell_embeddings.csv.gz
      rna_gene_index.csv.gz
      atac_peak_index.csv.gz
      dataset_summary.json

  • UMAP coordinates for Joint, RNA and ATAC embeddings when available
  • Metadata fields used for coloring and composition summaries
  • QC metrics such as RNA counts, detected genes, ATAC counts and detected peaks
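
A minimal sketch of reading one of these lightweight files is shown below. The directory layout and file name follow the tree above; the helper names and the column names `barcode`, `umap_1` and `umap_2` are hypothetical, because the real column schema is not listed here. The example writes a tiny temporary file so that it runs without the actual assets.

```python
import csv
import gzip
import tempfile
from pathlib import Path

ASSET_ROOT = Path("data/visual/umap_qc/h5mu_plot")


def embedding_path(root, gse_id, dataset_id):
    """Build the path to a dataset's precomputed embedding table."""
    return Path(root) / gse_id / dataset_id / "cell_embeddings.csv.gz"


def read_embeddings(path):
    """Read a gzipped CSV into a list of row dictionaries."""
    with gzip.open(path, "rt", newline="") as handle:
        return list(csv.DictReader(handle))


# Demonstration with a temporary file; the column names are hypothetical.
with tempfile.TemporaryDirectory() as tmp:
    path = embedding_path(tmp, "GSE166797", "GSM5085810_GM12878_rep1")
    path.parent.mkdir(parents=True)
    with gzip.open(path, "wt", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["barcode", "umap_1", "umap_2"])
        writer.writerow(["AAAC-1", "0.5", "-1.2"])
    rows = read_embeddings(path)

print(rows[0]["barcode"], float(rows[0]["umap_1"]))
```

Reading a small per-dataset table like this is much cheaper than opening a full .h5mu object, which is the motivation for precomputing the assets.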

Gene Activity Assets

Gene-level analysis is served from the Gene_Activity asset store, which is generated before website deployment.

data/visual/Gene_Activity/
  GSM5085810_GM12878_rep1/
    gene_metadata.parquet
    rna_values/
    activity_values/
    peak_summary.parquet
    peak_cluster_summary.parquet

  • RNA expression values for selected genes
  • ATAC-derived gene activity scores
  • Nearby peak accessibility and cluster summaries

Data Download

Full analysis-ready .h5mu files are hosted externally to keep the website lightweight. Users can search and retrieve selected datasets with the scmora-db Python package.

pip install scmora-db

scmora-db search --usage-tag control
scmora-db download --dataset-id GSM5085810_GM12878_rep1

Repository: shiny321/genome-db

Gene Activity Method

Gene activity is computed as an ATAC peak-based approximation for web visualization and fast interactive querying.

  • Gene window: gene body +/- 100 kb
  • TSS upstream extension: 5 kb
  • Distance decay: weight = exp(-abs(distance) / 5000) + exp(-1), with distance in bp
  • Count ceiling: 4
  • Cell normalization: scale to 10,000
  • Final transform: log1p
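
The parameters above can be combined into a small worked sketch for a single cell. This is not the scMORA pipeline itself: the function names, the gene and peak coordinates, and several details are assumptions, namely that distances are in bp, that the distance is measured from the peak midpoint to the gene body extended 5 kb upstream of the TSS (and is 0 inside it), that the ceiling is applied to each peak count before weighting, and that normalization scales each cell's summed scores to 10,000.

```python
import math

WINDOW = 100_000      # gene body +/- 100 kb
TSS_UPSTREAM = 5_000  # TSS upstream extension
DECAY = 5_000         # distance-decay scale (assumed bp)
CEILING = 4           # maximum count per peak
SCALE_TO = 10_000     # per-cell normalization target


def distance_weight(distance):
    """Distance decay: exp(-abs(distance) / 5000) + exp(-1)."""
    return math.exp(-abs(distance) / DECAY) + math.exp(-1)


def distance_to_gene(peak_mid, start, end, strand):
    """Distance to the gene body extended upstream of the TSS; 0 inside it."""
    if strand == "+":
        lo, hi = start - TSS_UPSTREAM, end
    else:
        lo, hi = start, end + TSS_UPSTREAM
    if peak_mid < lo:
        return lo - peak_mid
    if peak_mid > hi:
        return peak_mid - hi
    return 0


def raw_gene_score(peaks, gene):
    """Sum capped, distance-weighted peak counts inside the gene window."""
    start, end, strand = gene
    score = 0.0
    for mid, count in peaks:
        if start - WINDOW <= mid <= end + WINDOW:
            d = distance_to_gene(mid, start, end, strand)
            score += min(count, CEILING) * distance_weight(d)
    return score


def normalize_cell(raw_scores):
    """Scale one cell's scores to SCALE_TO in total, then apply log1p."""
    total = sum(raw_scores.values())
    if total == 0:
        return {gene: 0.0 for gene in raw_scores}
    return {g: math.log1p(s / total * SCALE_TO) for g, s in raw_scores.items()}


# Hypothetical genes (start, end, strand) and peaks (midpoint, count) for one cell.
genes = {"gene_a": (100_000, 110_000, "+"), "gene_b": (400_000, 420_000, "-")}
cell_peaks = [(105_000, 10), (90_000, 1), (500_000, 2), (900_000, 3)]

raw = {name: raw_gene_score(cell_peaks, gene) for name, gene in genes.items()}
activity = normalize_cell(raw)
print(activity)
```

The constant exp(-1) term keeps distant peaks inside the window from contributing nothing, while the ceiling limits the influence of any single high-count peak.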

API Endpoints

  • /api/genomes/examples: dataset list
  • /api/genomes/filter: condition, source and label filtering
  • /api/visualization/meta: dataset visualization metadata
  • /api/visualization/points: UMAP points and composition summaries
  • /api/visualization/gene-feature: RNA FeaturePlot payload
  • /api/visualization/gene-activity: ATAC-derived gene activity payload
  • /api/visualization/gene-peaks: nearby peak payload
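
For orientation, request URLs for these endpoints might be assembled as in the sketch below. Only the endpoint paths come from the list above; the host `https://example.org`, the query parameter names `dataset_id` and `gene`, and the gene symbol are placeholders chosen for illustration, so the API documentation should be consulted for the real parameters. The sketch builds URLs only and sends no request.

```python
from urllib.parse import urlencode, urljoin

BASE_URL = "https://example.org"  # placeholder host, not the real scMORA address


def build_url(endpoint, **params):
    """Join the base URL and endpoint, appending any non-empty query parameters."""
    query = urlencode({k: v for k, v in params.items() if v is not None})
    url = urljoin(BASE_URL, endpoint)
    return f"{url}?{query}" if query else url


# Parameter names below are hypothetical.
list_url = build_url("/api/genomes/examples")
activity_url = build_url(
    "/api/visualization/gene-activity",
    dataset_id="GSM5085810_GM12878_rep1",
    gene="CD19",
)
print(list_url)
print(activity_url)
```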

Current Scope and Limitations

  • scMORA prioritizes sample-level browsing rather than study-level re-integration.
  • FeaturePlot and gene activity require precomputed web assets for each dataset.
  • Gene activity is peak-based when fragment-level matrices are unavailable.
  • Fragment-dependent metrics such as footprinting are outside the current web scope.

Citation

If scMORA is used in published work, please cite the database manuscript and report the website version, the access date and the datasets that were downloaded.

scMORA: a barcode-matched 10x Multiome resource for paired RNA-ATAC data.
Version 0.1.0, 2026.