spVIPES.data.prepare_adatas.prepare_adatas#

spVIPES.data.prepare_adatas.prepare_adatas(adatas, layers=None)#

Prepare and concatenate multiple AnnData objects for spVIPES integration.

This function takes multiple single-cell datasets and prepares them for multi-group integration by concatenating them into a single AnnData object while preserving group-specific metadata. It sets up all the necessary data structures for spVIPES to perform shared-private latent space learning.

Parameters:

adatas (dict[str, AnnData]) – Dictionary mapping group names (strings) to their corresponding AnnData objects. Each AnnData contains single-cell expression data for one group/dataset. Currently supports exactly 2 groups.
layers (list[list[str or None]], optional) – Specification of which layers to use from each AnnData object. Currently not implemented in the function body.

Returns:

AnnData Concatenated AnnData object containing all groups with additional metadata:

groups : Added to .obs indicating which group each cell belongs to
indices : Added to .obs with within-group cell indices
groups_var_indices : In .uns, indices of variables for each group
groups_obs_indices : In .uns, indices of observations for each group
groups_obs_names : In .uns, observation names for each group
groups_obs : In .uns, observation metadata for each group
groups_lengths : In .uns, number of features per group
groups_var_names : In .uns, variable names for each group
groups_mapping : In .uns, mapping from indices to group names

Raises:

ValueError – If more or fewer than 2 groups are provided (current limitation).

Notes

The function performs several important preprocessing steps:

Variable name prefixing: Adds group prefixes to avoid name conflicts
Metadata harmonization: Combines observation metadata across groups
Index tracking: Creates mappings to track group-specific indices
Outer join concatenation: Preserves all variables from all groups

This prepared data structure enables spVIPES to handle datasets with different feature sets (genes) while maintaining the ability to separate shared and private latent representations.

Examples

Basic usage with two datasets:

>>> import spVIPES
>>> import scanpy as sc
>>>
>>> # Load your datasets
>>> adata1 = sc.read_h5ad("dataset1.h5ad")
>>> adata2 = sc.read_h5ad("dataset2.h5ad")
>>>
>>> # Prepare for spVIPES
>>> adatas_dict = {"treatment": adata1, "control": adata2}
>>> combined_adata = spVIPES.data.prepare_adatas(adatas_dict)
>>>
>>> # Now ready for spVIPES setup
>>> spVIPES.model.spVIPES.setup_anndata(combined_adata, groups_key="groups")

Integration with different feature sets:

>>> # Datasets can have different genes
>>> print(f"Dataset 1: {adata1.n_vars} genes")
>>> print(f"Dataset 2: {adata2.n_vars} genes")
>>>
>>> combined = spVIPES.data.prepare_adatas({"batch1": adata1, "batch2": adata2})
>>> print(f"Combined: {combined.n_vars} genes")  # Union of all genes

spVIPES.data.prepare_adatas.prepare_adatas

Contents

spVIPES.data.prepare_adatas.prepare_adatas#