spVIPES#

Shared-private Variational Inference with Product of Experts and Supervision



About#

spVIPES enables robust integration of multi-group single-cell datasets through a principled shared-private latent space decomposition. The method leverages a Product of Experts (PoE) framework to learn both shared biological signals common across datasets and private representations capturing group-specific variations.
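To make the Product of Experts idea concrete, here is a minimal sketch of how several Gaussian "experts" can be fused into one distribution. It is not spVIPES's implementation: the function name `product_of_gaussian_experts` is hypothetical, and the example works on scalars rather than on the latent vectors produced by a neural encoder. It only illustrates the standard result that, for Gaussian experts, precisions add and the combined mean is the precision-weighted average.

```python
from typing import Sequence, Tuple


def product_of_gaussian_experts(
    means: Sequence[float], variances: Sequence[float]
) -> Tuple[float, float]:
    """Combine 1-D Gaussian experts into a single Gaussian (mean, variance).

    The product of Gaussian densities is, up to normalisation, Gaussian:
    the precisions (1 / variance) add, and the mean is the
    precision-weighted average of the expert means.
    """
    if not means or len(means) != len(variances):
        raise ValueError("need at least one expert and one variance per mean")
    precisions = [1.0 / v for v in variances]
    total_precision = sum(precisions)
    mean = sum(m * p for m, p in zip(means, precisions)) / total_precision
    return mean, 1.0 / total_precision


# Two experts that disagree: the more confident one (smaller variance)
# pulls the combined mean towards its own estimate.
shared_mean, shared_var = product_of_gaussian_experts([0.0, 2.0], [1.0, 0.25])
```

The combined variance is never larger than the smallest expert variance, which is why a PoE tends to sharpen the shared representation as more groups agree.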

Integration Strategies#

spVIPES provides three complementary approaches for dataset alignment:

| Method | Description | Best Use Case |
|---|---|---|
| Label-based PoE | Uses cell type annotations for direct supervision | High-quality cell type labels available |
| OT Paired PoE | Direct cell-to-cell correspondences via optimal transport | Known cellular correspondences (e.g., time series) |
| OT Cluster-based PoE | Automated cluster matching with transport plans | Similar cell populations, no direct correspondences |

Note: The method automatically selects the most appropriate strategy based on available annotations and transport information.
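As a rough illustration of that selection, the sketch below maps the setup arguments used later in this README (`label_key`, `transport_plan_key`, `match_clusters`) to a strategy name. The helper `describe_strategy` is hypothetical and not part of spVIPES. The precedence it applies when both labels and a transport plan are supplied is an assumption, as is returning `None` when neither is given; the README does not specify either case.

```python
from typing import Optional


def describe_strategy(
    label_key: Optional[str] = None,
    transport_plan_key: Optional[str] = None,
    match_clusters: bool = False,
) -> Optional[str]:
    """Hypothetical helper: name the strategy implied by the setup arguments."""
    # ASSUMPTION: a transport plan is checked before labels when both are
    # given. The real precedence inside spVIPES may differ.
    if transport_plan_key is not None:
        return "OT Cluster-based PoE" if match_clusters else "OT Paired PoE"
    if label_key is not None:
        return "Label-based PoE"
    # ASSUMPTION: with no alignment information, no strategy is reported.
    return None


strategy = describe_strategy(transport_plan_key="transport_plan", match_clusters=True)
```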

Installation#

Requirements#

  • Python 3.9-3.10

  • PyTorch (GPU support strongly recommended)
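If you want to verify the interpreter before installing, a small check along these lines may help. The function name `python_supported` is illustrative; the version bounds come from the requirement listed above.

```python
import sys


def python_supported(version_info=sys.version_info) -> bool:
    """Return True when the interpreter is Python 3.9 or 3.10."""
    return (3, 9) <= tuple(version_info[:2]) <= (3, 10)


if not python_supported():
    print("Warning: spVIPES lists Python 3.9-3.10 as its supported range.")
```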

Quick Install#

Install the latest stable release from PyPI:

pip install spVIPES

For the development version:

pip install git+https://github.com/nrclaudio/spVIPES.git@main

Quick Start#

Basic Workflow#

import spVIPES
import scanpy as sc

# Load your multi-group dataset
adata = sc.read_h5ad("data.h5ad")

# Configure integration strategy
spVIPES.model.setup_anndata(
    adata,
    groups_key="dataset",
    label_key="cell_type",  # Optional: for supervised integration
)

# Initialize and train model
model = spVIPES.model(adata)
model.train(max_epochs=200)

# Extract integrated representations
latent = model.get_latent_representation()
adata.obsm["X_spVIPES"] = latent

Integration Strategies#

📋 Label-based Integration

Use when high-quality cell type annotations are available:

spVIPES.model.setup_anndata(
    adata,
    groups_key="dataset",
    label_key="cell_type",
    batch_key="batch",  # Optional batch correction
)

🔄 Optimal Transport: Paired Cells

For datasets with known cell-to-cell correspondences:

# Assumes transport plan stored in adata.uns["transport_plan"]
spVIPES.model.setup_anndata(
    adata,
    groups_key="dataset",
    transport_plan_key="transport_plan",
    match_clusters=False,
)
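
To give an intuition for what a paired transport plan encodes, the sketch below reads cell-to-cell partners off a small dense matrix. It assumes rows index cells of one group and columns index cells of the other; the exact layout spVIPES expects in `adata.uns["transport_plan"]` is not stated here, so treat the shape and the helper `pairs_from_plan` as illustrative only.

```python
from typing import List, Sequence


def pairs_from_plan(plan: Sequence[Sequence[float]]) -> List[int]:
    """For each row (a cell in group A), return the index of the column
    (a cell in group B) that receives the most transport mass."""
    return [max(range(len(row)), key=lambda j: row[j]) for row in plan]


# Toy plan: 3 cells in group A (rows) x 3 cells in group B (columns).
# ASSUMPTION: this row/column convention is for illustration only.
toy_plan = [
    [0.30, 0.02, 0.01],
    [0.03, 0.01, 0.29],
    [0.02, 0.31, 0.01],
]
partners = pairs_from_plan(toy_plan)
```

A plan produced by an optimal transport solver is usually much less sharp than this toy example, so the hard argmax is a simplification.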

🧩 Optimal Transport: Cluster Matching

For automatic cluster-based alignment:

spVIPES.model.setup_anndata(
    adata,
    groups_key="dataset",
    transport_plan_key="transport_plan",
    match_clusters=True,  # Enables automated cluster matching
)
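
One plausible way to think about cluster matching is to sum the transport mass flowing between every pair of clusters and then match each cluster to the partner that receives the most mass. The sketch below does exactly that. It is an assumption about the general idea, not a description of what `match_clusters=True` does internally, and `match_clusters_by_mass` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, Hashable, Sequence, Tuple


def match_clusters_by_mass(
    plan: Sequence[Sequence[float]],
    clusters_a: Sequence[Hashable],
    clusters_b: Sequence[Hashable],
) -> Dict[Hashable, Hashable]:
    """Map each cluster in group A to the group-B cluster receiving most mass."""
    # Aggregate the cell-level plan into cluster-to-cluster mass.
    mass: Dict[Tuple[Hashable, Hashable], float] = defaultdict(float)
    for i, row in enumerate(plan):
        for j, value in enumerate(row):
            mass[(clusters_a[i], clusters_b[j])] += value
    # Keep, for every A cluster, the B cluster with the largest total mass.
    best: Dict[Hashable, Hashable] = {}
    for (ca, cb), m in list(mass.items()):
        if ca not in best or m > mass[(ca, best[ca])]:
            best[ca] = cb
    return best


# Toy example: 3 cells per group, 2 clusters per group.
plan = [
    [0.20, 0.05, 0.05],
    [0.25, 0.05, 0.00],
    [0.00, 0.20, 0.20],
]
matches = match_clusters_by_mass(plan, ["a0", "a0", "a1"], ["b0", "b1", "b1"])
```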

Advanced Configuration#

# Custom model parameters
model = spVIPES.model(
    adata,
    n_dimensions_shared=25,  # Shared latent dimensions
    n_dimensions_private=10,  # Private latent dimensions
    n_hidden=128,  # Hidden layer size
    dropout_rate=0.1,  # Regularization
)

# Training with custom settings
model.train(
    max_epochs=300, batch_size=512, early_stopping=True, check_val_every_n_epoch=10
)
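
To illustrate how `early_stopping` and `check_val_every_n_epoch` interact, here is a generic, framework-free sketch of patience-based early stopping. The `patience` parameter and the function `stopping_epoch` are hypothetical and do not appear in the call above; the actual stopping criterion is defined by the underlying training loop.

```python
from typing import Optional, Sequence


def stopping_epoch(
    val_losses: Sequence[float], check_every: int = 10, patience: int = 3
) -> Optional[int]:
    """Return the epoch at which training would stop, or None if it never does.

    val_losses[i] is the validation loss after epoch i + 1. The loss is only
    inspected every `check_every` epochs, and training stops after `patience`
    consecutive checks without a strict improvement.
    """
    best = float("inf")
    checks_without_improvement = 0
    for epoch in range(check_every, len(val_losses) + 1, check_every):
        loss = val_losses[epoch - 1]
        if loss < best:
            best = loss
            checks_without_improvement = 0
        else:
            checks_without_improvement += 1
            if checks_without_improvement >= patience:
                return epoch
    return None


# Loss improves until epoch 30, then plateaus for the remaining epochs.
val_losses = [0.8] * 10 + [0.6] * 10 + [0.4] * 40
stopped_at = stopping_epoch(val_losses, check_every=10, patience=3)
```

Checking validation less often makes training cheaper per epoch, but it also means a plateau is detected later: here the loss stops improving at epoch 30 and training halts at epoch 60.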

Documentation & Tutorials#

📚 Getting Started

Support & Community#

💬 Get Help

Citation#

If you use spVIPES in your research, please cite:

@article{spVIPES2023,
  title={Integrative learning of disentangled representations},
  author={Novella-Rausell, C. and Peters, D.J.M. and Mahfouz, A.},
  journal={bioRxiv},
  year={2023},
  doi={10.1101/2023.11.07.565957},
  url={https://www.biorxiv.org/content/10.1101/2023.11.07.565957v1}
}

Paper: bioRxiv preprint