spVIPES.model.spvipes.spVIPES
- class spVIPES.model.spvipes.spVIPES
Bases: MultiGroupTrainingMixin, BaseModelClass
Implementation of the spVIPES model.
spVIPES (shared-private Variational Inference with Product of Experts and Supervision) is a method for integrating multi-group single-cell datasets using a shared-private latent space approach. The model learns both shared representations (common across groups) and private representations (group-specific) through a Product of Experts (PoE) framework.
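As a rough intuition for the Product of Experts idea mentioned above, the sketch below combines one-dimensional Gaussian "experts" using the standard closed-form product of Gaussians (precisions add, and the mean is precision-weighted). This is a generic textbook illustration, not spVIPES's internal code; the function name `product_of_gaussian_experts` is hypothetical.

```python
def product_of_gaussian_experts(means, variances):
    """Combine 1-D Gaussian experts N(mean_i, var_i) into a single Gaussian.

    Illustrative only: uses the standard product-of-Gaussians identity,
    which is an assumption about the general PoE idea, not spVIPES internals.
    """
    if not means or len(means) != len(variances):
        raise ValueError("means and variances must be non-empty and equal length")
    if any(v <= 0 for v in variances):
        raise ValueError("variances must be positive")

    # Precision is the inverse variance; precisions of the experts add up.
    precisions = [1.0 / v for v in variances]
    combined_variance = 1.0 / sum(precisions)

    # The combined mean is the precision-weighted average of the expert means.
    weighted_sum = sum(m * p for m, p in zip(means, precisions))
    combined_mean = combined_variance * weighted_sum
    return combined_mean, combined_variance


# Two equally confident experts that disagree on the mean.
mean, variance = product_of_gaussian_experts([0.0, 2.0], [1.0, 1.0])
print(mean, variance)
```

The combined distribution is narrower than either expert, which is why a product of experts tends to sharpen a representation that several groups agree on.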
- Parameters:
adata (AnnData) – AnnData object that has been registered via setup_anndata().
n_hidden (int, default 128) – Number of nodes per hidden layer in the neural networks.
n_dimensions_shared (int, default 25) – Dimensionality of the shared latent space. This space captures features common across all groups/datasets.
n_dimensions_private (int, default 10) – Dimensionality of the private latent spaces. Each group gets its own private latent space of this dimensionality.
dropout_rate (float, default 0.1) – Dropout rate for neural networks to prevent overfitting.
**model_kwargs – Additional keyword arguments passed to the underlying module.
Examples
Basic usage with cell type labels:
>>> import spVIPES
>>> adata = spVIPES.data.prepare_adatas({"dataset1": dataset1, "dataset2": dataset2})
>>> spVIPES.model.spVIPES.setup_anndata(adata, groups_key="groups", label_key="cell_type")
>>> model = spVIPES.model.spVIPES(adata)
>>> model.train()
>>> latents = model.get_latent_representation()
Usage with optimal transport:
>>> spVIPES.model.spVIPES.setup_anndata(adata, groups_key="groups", transport_plan_key="transport_plan")
>>> model = spVIPES.model.spVIPES(adata)
>>> model.train()
Notes
We recommend setting n_dimensions_private < n_dimensions_shared for optimal performance.
The model automatically selects the appropriate PoE variant based on the inputs provided to setup_anndata().
GPU acceleration is strongly recommended for large datasets.
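The `train()` and `get_latent_representation()` signatures documented on this page take a `group_indices_list` argument (a `list[list[int]]` with one inner list of cell indices per group). A minimal sketch of building such a list from per-cell group labels follows. The helper name `build_group_indices_list` and the toy labels are hypothetical, and the group ordering used here (order of first appearance) is an assumption rather than a documented requirement.

```python
def build_group_indices_list(group_labels):
    """Return one list of cell indices per group, in order of first appearance.

    `group_labels` is a plain sequence with one label per cell. With an
    AnnData object this would typically come from something like
    adata.obs["groups"].tolist() (assumed, not shown in the reference).
    """
    indices_by_group = {}
    for cell_index, group in enumerate(group_labels):
        # dicts preserve insertion order, so groups keep first-seen order.
        indices_by_group.setdefault(group, []).append(cell_index)
    return list(indices_by_group.values())


# Toy labels standing in for the column named by groups_key.
group_labels = ["dataset1", "dataset2", "dataset1", "dataset2", "dataset2"]
group_indices_list = build_group_indices_list(group_labels)
print(group_indices_list)

# Assumed usage, following the signatures listed on this page:
#   model.train(group_indices_list)
#   latents = model.get_latent_representation(group_indices_list)
```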
- __init__(adata, n_hidden=128, n_dimensions_shared=25, n_dimensions_private=10, dropout_rate=0.1, **model_kwargs)
- classmethod setup_anndata(cls, adata, groups_key, match_clusters=False, transport_plan_key=None, label_key=None, batch_key=None, layer=None, **kwargs)
Set up AnnData object for spVIPES model.
This method registers the AnnData object with the model, configuring the appropriate data fields and PoE strategy based on the provided parameters. The method automatically determines whether to use label-based PoE, optimal transport PoE, or cluster-based PoE.
- Parameters:
adata (AnnData) – Annotated data object containing the single-cell data to be integrated.
groups_key (str) – Key in adata.obs that defines the grouping of cells (e.g., dataset, batch, condition). This determines which cells belong to which group for integration.
match_clusters (bool, default False) – Whether to match clusters when using optimal transport. If True, enables cluster-based PoE which automatically matches cell clusters between groups.
transport_plan_key (str, optional) – Key in adata.uns containing the precomputed optimal transport plan. If provided, enables optimal transport PoE for data integration.
label_key (str, optional) – Key in adata.obs containing cell type labels. If provided, enables label-based PoE which uses supervised alignment based on cell types.
batch_key (str, optional) – Key in adata.obs for batch information to enable batch effect correction.
layer (str, optional) – Key in adata.layers to use for the expression data. If None, uses adata.X.
**kwargs – Additional keyword arguments passed to the parent setup method.
- Return type:
None
- Returns:
None. The method modifies the AnnData object in place and registers it with the model.
Notes
Priority of PoE strategies (when multiple options are available):
1. Label-based PoE (if label_key is provided)
2. Optimal transport PoE (if transport_plan_key is provided)
3. Cluster-based PoE (if match_clusters=True)
Examples
Basic setup with groups only:
>>> spVIPES.model.spVIPES.setup_anndata(adata, groups_key="dataset")
Setup with cell type supervision:
>>> spVIPES.model.spVIPES.setup_anndata(adata, groups_key="dataset", label_key="cell_type")
Setup with optimal transport:
>>> spVIPES.model.spVIPES.setup_anndata(adata, groups_key="dataset", transport_plan_key="transport_matrix")
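The priority order given in the notes for setup_anndata() can be read as a simple cascade of checks. The sketch below mirrors that description only; `select_poe_strategy` is a hypothetical helper, not part of spVIPES, and returning `None` when no option is supplied is an assumption, since the reference does not say which variant applies in the groups-only case.

```python
def select_poe_strategy(label_key=None, transport_plan_key=None, match_clusters=False):
    """Return the PoE strategy name implied by the documented priority order.

    Hypothetical illustration of the documented rules, not library code.
    """
    if label_key is not None:
        return "label-based"          # priority 1
    if transport_plan_key is not None:
        return "optimal transport"    # priority 2
    if match_clusters:
        return "cluster-based"        # priority 3
    return None                       # assumption: nothing explicitly selected


# Labels win even when a transport plan is also supplied.
print(select_poe_strategy(label_key="cell_type", transport_plan_key="transport_plan"))
print(select_poe_strategy(transport_plan_key="transport_plan", match_clusters=True))
print(select_poe_strategy(match_clusters=True))
```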
- get_latent_representation(group_indices_list, adata=None, indices=None, normalized=False, give_mean=True, mc_samples=5000, batch_size=None, drop_last=None)
Return the latent representation for each cell.
- Parameters:
group_indices_list (list[list[int]]) – List of lists containing the indices of cells in each of the groups used as input for spVIPES.
adata (Optional[AnnData] (default: None)) – AnnData object with equivalent structure to initial AnnData. If None, defaults to the AnnData object used to initialize the model.
indices (Optional[Sequence[int]] (default: None)) – Indices of cells in adata to use. If None, all cells are used.
normalized (bool (default: False)) – Whether to return the normalized (softmaxed) cell embedding.
give_mean (bool (default: True)) – Give mean of distribution or sample from it.
mc_samples (int (default: 5000)) – For distributions with no closed-form mean (e.g., logistic normal), how many Monte Carlo samples to take for computing mean.
batch_size (Optional[int] (default: None)) – Minibatch size for data loading into model. Defaults to scvi.settings.batch_size.
drop_last (Optional[bool] (default: None)) – Whether to drop the last incomplete batch. If None, automatically determined based on whether using paired PoE (True for paired, False for others).
- Return type:
- Returns:
Low-dimensional latent representation for each cell.
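The `normalized` option is described as returning a "softmaxed" embedding. For reference, a numerically stable softmax over one cell's latent vector can be sketched as below. That the normalization is applied per cell across latent dimensions is an assumption here; the reference does not state the axis.

```python
import math


def softmax(values):
    """Numerically stable softmax of a sequence of floats."""
    if not values:
        raise ValueError("values must be non-empty")
    # Subtracting the maximum avoids overflow in exp() without changing the result.
    shift = max(values)
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


# Toy latent vector for a single cell (made-up numbers).
latent = [1.0, 2.0, 3.0]
normalized = softmax(latent)
print(normalized)  # non-negative entries that sum to 1
```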
- get_loadings()
Extract per-gene weights in the linear decoder.
Shape is genes by n_latent.
- Return type:
- train(group_indices_list, batch_size=128, max_epochs=None, use_gpu=None, train_size=0.9, validation_size=None, early_stopping=False, plan_kwargs=None, n_steps_kl_warmup=None, n_epochs_kl_warmup=400, **trainer_kwargs)
Train a multigroup spVIPES model.
This method trains the model using a custom data splitter that handles multiple groups of cells separately while maintaining the shared-private latent space learning objective.
- Parameters:
group_indices_list (list[list[int]]) – List of indices corresponding to each group of samples. Each inner list contains the indices for cells belonging to that specific group.
max_epochs (int, optional) – Number of passes through the dataset. If None, defaults to np.min([round((20000 / n_cells) * 400), 400]).
use_gpu (str, int, bool, optional) – GPU usage specification. Use default GPU if available (if None or True), or index of GPU to use (if int), or name of GPU (if str, e.g., "cuda:0"), or use CPU (if False).
train_size (float, default 0.9) – Size of training set in the range [0.0, 1.0].
validation_size (float, optional) – Size of the validation set. If None, defaults to 1 - train_size. If train_size + validation_size < 1, the remaining cells belong to the test set.
batch_size (int, default 128) – Mini-batch size to use during training.
early_stopping (bool, default False) – Whether to perform early stopping. Additional arguments can be passed in **trainer_kwargs.
plan_kwargs (dict, optional) – Keyword arguments for the training plan. Arguments passed to train() will overwrite values present in plan_kwargs, when appropriate.
n_steps_kl_warmup (int, optional) – Number of training steps for KL warmup. Takes precedence over n_epochs_kl_warmup.
n_epochs_kl_warmup (int, default 400) – Number of epochs for KL divergence warmup.
**trainer_kwargs – Additional keyword arguments for the trainer.
- Return type:
None
- Returns:
None. The model is trained in-place.
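The documented default for `max_epochs` is `np.min([round((20000 / n_cells) * 400), 400])`, which caps training at 400 epochs and shortens it for datasets larger than 20,000 cells. The pure-Python sketch below evaluates the same expression for a few dataset sizes. The helper name `default_max_epochs` is hypothetical.

```python
def default_max_epochs(n_cells):
    """Evaluate the documented default: min(round((20000 / n_cells) * 400), 400)."""
    if n_cells <= 0:
        raise ValueError("n_cells must be positive")
    return min(round((20000 / n_cells) * 400), 400)


for n_cells in (5_000, 20_000, 40_000, 100_000):
    print(n_cells, default_max_epochs(n_cells))
```

Datasets with up to 20,000 cells therefore train for 400 epochs by default, while a 100,000-cell dataset trains for 80.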
Notes
This method uses a specialized MultiGroupDataSplitter that ensures proper handling of multiple cell groups during training, maintaining the integrity of the shared-private latent space learning.
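To make the `train_size` and `validation_size` rules concrete, the sketch below computes train, validation, and test counts for a given number of cells. The rounding choices (ceiling for the training set, floor for the validation set) are assumptions made for illustration; the reference does not specify them, and `split_sizes` is a hypothetical helper rather than the MultiGroupDataSplitter's actual logic.

```python
import math


def split_sizes(n_cells, train_size=0.9, validation_size=None):
    """Return (n_train, n_val, n_test) following the documented defaults.

    Rounding behaviour is an assumption, not taken from the reference.
    """
    if not 0.0 <= train_size <= 1.0:
        raise ValueError("train_size must be in [0.0, 1.0]")
    n_train = math.ceil(train_size * n_cells)

    if validation_size is None:
        # Documented default: validation_size = 1 - train_size, so no test set.
        n_val = n_cells - n_train
    else:
        if validation_size < 0.0 or train_size + validation_size > 1.0 + 1e-9:
            raise ValueError("train_size + validation_size must not exceed 1")
        n_val = math.floor(validation_size * n_cells)

    # Whatever is left over belongs to the test set.
    n_test = n_cells - n_train - n_val
    return n_train, n_val, n_test


print(split_sizes(1000))                                        # defaults
print(split_sizes(1000, train_size=0.8, validation_size=0.1))   # leaves a test set
```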