spVIPES.model.spvipes.spVIPES

class spVIPES.model.spvipes.spVIPES

Bases: MultiGroupTrainingMixin, BaseModelClass

Implementation of the spVIPES model.

spVIPES (shared-private Variational Inference with Product of Experts and Supervision) is a method for integrating multi-group single-cell datasets using a shared-private latent space approach. The model learns both shared representations (common across groups) and private representations (group-specific) through a Product of Experts (PoE) framework.
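For intuition, a product of diagonal Gaussian experts has a closed form: the precisions (inverse variances) of the experts add up, and the joint mean is the precision-weighted average of the expert means. The NumPy sketch below shows only that generic computation. It is not spVIPES's code: how the model parameterizes or weights its experts, and whether it also includes a standard-normal prior expert (as some PoE implementations do), is not described here. The function name `gaussian_poe` is hypothetical.

```python
import numpy as np


def gaussian_poe(means, logvars):
    """Combine diagonal Gaussian experts with a product of experts (hypothetical helper).

    means, logvars: array-likes of shape (n_experts, n_cells, n_latent).
    Returns the mean and log-variance of the renormalized product.
    """
    means = np.asarray(means, dtype=float)
    logvars = np.asarray(logvars, dtype=float)
    # Precision of each expert: 1 / variance = exp(-logvar).
    precisions = np.exp(-logvars)
    # In a product of Gaussians the precisions add up ...
    joint_precision = precisions.sum(axis=0)
    # ... and the mean is the precision-weighted average of the expert means.
    joint_mean = (means * precisions).sum(axis=0) / joint_precision
    joint_logvar = -np.log(joint_precision)
    return joint_mean, joint_logvar


# Two experts (e.g., one encoder per group) for 1 cell and 2 latent dimensions.
mu = [[[0.0, 2.0]], [[2.0, 2.0]]]
logvar = np.log([[[1.0, 1.0]], [[1.0, 3.0]]])
joint_mu, joint_logvar = gaussian_poe(mu, logvar)
print(joint_mu)               # approximately [[1.0, 2.0]]
print(np.exp(joint_logvar))   # approximately [[0.5, 0.75]]
```

The more confident expert (smaller variance) pulls the joint mean toward itself, and the joint variance is always smaller than that of any single expert.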

Parameters:
  • adata (AnnData) – AnnData object that has been registered via setup_anndata().

  • n_hidden (int, default 128) – Number of nodes per hidden layer in the neural networks.

  • n_dimensions_shared (int, default 25) – Dimensionality of the shared latent space. This space captures features common across all groups/datasets.

  • n_dimensions_private (int, default 10) – Dimensionality of the private latent spaces. Each group gets its own private latent space of this dimensionality.

  • dropout_rate (float, default 0.1) – Dropout rate for neural networks to prevent overfitting.

  • **model_kwargs – Additional keyword arguments passed to the underlying module.

Examples

Basic usage with cell type labels:

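>>> import numpy as np
>>> import spVIPES
>>> adata = spVIPES.data.prepare_adatas({"dataset1": dataset1, "dataset2": dataset2})
>>> spVIPES.model.spVIPES.setup_anndata(adata, groups_key="groups", label_key="cell_type")
>>> # train() and get_latent_representation() expect one list of cell indices per group
>>> groups = adata.obs["groups"]
>>> group_indices_list = [np.where(groups == g)[0].tolist() for g in groups.unique()]
>>> model = spVIPES.model.spVIPES(adata)
>>> model.train(group_indices_list)
>>> latents = model.get_latent_representation(group_indices_list)

Usage with optimal transport:

>>> spVIPES.model.spVIPES.setup_anndata(adata, groups_key="groups", transport_plan_key="transport_plan")
>>> model = spVIPES.model.spVIPES(adata)
>>> model.train(group_indices_list)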

Notes

  • We recommend setting n_dimensions_private < n_dimensions_shared, as in the defaults (10 and 25), for optimal performance.

  • The model automatically selects the appropriate PoE variant based on the arguments passed to setup_anndata().

  • GPU acceleration is strongly recommended for large datasets.

__init__(adata, n_hidden=128, n_dimensions_shared=25, n_dimensions_private=10, dropout_rate=0.1, **model_kwargs)

classmethod setup_anndata(adata, groups_key, match_clusters=False, transport_plan_key=None, label_key=None, batch_key=None, layer=None, **kwargs)

Set up AnnData object for spVIPES model.

This method registers the AnnData object with the model, configuring the appropriate data fields and PoE strategy based on the provided parameters. The method automatically determines whether to use label-based PoE, optimal transport PoE, or cluster-based PoE.

Parameters:
  • adata (AnnData) – Annotated data object containing the single-cell data to be integrated.

  • groups_key (str) – Key in adata.obs that defines the grouping of cells (e.g., dataset, batch, condition). This determines which cells belong to which group for integration.

  • match_clusters (bool, default False) – Whether to match clusters when using optimal transport. If True, enables cluster-based PoE which automatically matches cell clusters between groups.

  • transport_plan_key (str, optional) – Key in adata.uns containing the precomputed optimal transport plan. If provided, enables optimal transport PoE for data integration.

  • label_key (str, optional) – Key in adata.obs containing cell type labels. If provided, enables label-based PoE which uses supervised alignment based on cell types.

  • batch_key (str, optional) – Key in adata.obs for batch information to enable batch effect correction.

  • layer (str, optional) – Key in adata.layers to use for the expression data. If None, uses adata.X.

  • **kwargs – Additional keyword arguments passed to the parent setup method.
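setup_anndata() expects the optimal transport plan to be precomputed and stored in adata.uns. This documentation does not say how the plan should be computed or what layout it must have. The sketch below shows one common option, an entropically regularized plan obtained with Sinkhorn iterations in NumPy, assuming uniform weights over the cells of two groups, cells represented in some shared feature space (e.g., a joint PCA embedding), and a plan of shape (cells in group 1, cells in group 2). The function name `sinkhorn_plan` and all of these choices are assumptions, not spVIPES requirements.

```python
import numpy as np


def sinkhorn_plan(x, y, eps=0.1, n_iter=1000):
    """Entropic optimal transport plan between two point clouds (hypothetical helper).

    x: (n, d) array, y: (m, d) array; uniform marginals are assumed.
    Returns an (n, m) coupling whose rows sum to 1/n and columns to 1/m.
    """
    n, m = len(x), len(y)
    a = np.full(n, 1.0 / n)
    b = np.full(m, 1.0 / m)
    # Squared Euclidean cost, rescaled to [0, 1] so eps has a stable meaning.
    cost = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)
    cost = cost / cost.max()
    kernel = np.exp(-cost / eps)
    u = np.ones(n)
    v = np.ones(m)
    for _ in range(n_iter):
        # Alternately rescale rows and columns so the marginals match a and b.
        u = a / (kernel @ v)
        v = b / (kernel.T @ u)
    return u[:, None] * kernel * v[None, :]


rng = np.random.default_rng(0)
x = rng.normal(size=(5, 2))  # e.g., embeddings of 5 cells from group 1
y = rng.normal(size=(7, 2))  # e.g., embeddings of 7 cells from group 2
plan = sinkhorn_plan(x, y)
print(plan.shape)  # (5, 7)
```

Under those assumptions the result could be stored with `adata.uns["transport_plan"] = plan` before calling setup_anndata(..., transport_plan_key="transport_plan"). Smaller eps gives a sharper, closer-to-one-to-one plan but needs more iterations to converge.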

Return type:

None

Returns:

None. The method modifies the AnnData object in place and registers it with the model.

Notes

Priority of PoE strategies (when multiple options are available):

  1. Label-based PoE (if label_key is provided)

  2. Optimal transport PoE (if transport_plan_key is provided)

  3. Cluster-based PoE (if match_clusters=True)
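The priority order above can be read as a simple if/elif chain. The sketch below mirrors it literally; `select_poe_strategy` and its return strings are hypothetical and do not correspond to spVIPES internals. The documentation does not name the strategy used when none of the three options is given, so the sketch returns None in that case.

```python
from typing import Optional


def select_poe_strategy(
    label_key: Optional[str] = None,
    transport_plan_key: Optional[str] = None,
    match_clusters: bool = False,
) -> Optional[str]:
    """Mirror the documented PoE priority order (hypothetical helper)."""
    if label_key is not None:
        return "label"  # 1. supervised alignment on cell type labels
    if transport_plan_key is not None:
        return "optimal_transport"  # 2. precomputed plan in adata.uns
    if match_clusters:
        return "cluster"  # 3. automatic cluster matching between groups
    # Fallback strategy is undocumented; this sketch returns None.
    return None


# Labels win even when a transport plan is also provided.
print(select_poe_strategy(label_key="cell_type", transport_plan_key="transport_plan"))  # label
```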

Examples

Basic setup with groups only:

>>> spVIPES.model.spVIPES.setup_anndata(adata, groups_key="dataset")

Setup with cell type supervision:

>>> spVIPES.model.spVIPES.setup_anndata(adata, groups_key="dataset", label_key="cell_type")

Setup with optimal transport:

>>> spVIPES.model.spVIPES.setup_anndata(adata, groups_key="dataset", transport_plan_key="transport_matrix")

get_latent_representation(group_indices_list, adata=None, indices=None, normalized=False, give_mean=True, mc_samples=5000, batch_size=None, drop_last=None)

Return the latent representation for each cell.

Parameters:
  • group_indices_list (list[list[int]]) – List of lists containing the indices of cells in each of the groups used as input for spVIPES.

  • adata (Optional[AnnData] (default: None)) – AnnData object with equivalent structure to initial AnnData. If None, defaults to the AnnData object used to initialize the model.

  • indices (Optional[Sequence[int]] (default: None)) – Indices of cells in adata to use. If None, all cells are used.

  • normalized (bool (default: False)) – Whether to return the normalized (softmaxed) cell embedding.

  • give_mean (bool (default: True)) – Whether to return the mean of the distribution or a sample from it.

  • mc_samples (int (default: 5000)) – For distributions with no closed-form mean (e.g., logistic normal), the number of Monte Carlo samples used to compute the mean.

  • batch_size (Optional[int] (default: None)) – Minibatch size for data loading into model. Defaults to scvi.settings.batch_size.

  • drop_last (Optional[bool] (default: None)) – Whether to drop the last incomplete batch. If None, automatically determined based on whether using paired PoE (True for paired, False for others).
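Both get_latent_representation() and train() take a group_indices_list. Assuming the groups are the ones defined by the groups_key column passed to setup_anndata(), and that integer row positions (not obs names) are expected, the list can be built from adata.obs as sketched below. `make_group_indices_list` is a hypothetical helper, and whether spVIPES requires a particular group order is not specified in this documentation; the sketch uses order of first appearance.

```python
import numpy as np
import pandas as pd


def make_group_indices_list(obs: pd.DataFrame, groups_key: str) -> list[list[int]]:
    """Return one list of integer row positions per group (hypothetical helper)."""
    groups = obs[groups_key].to_numpy()
    # pd.unique keeps groups in the order they first appear in obs.
    return [np.flatnonzero(groups == g).tolist() for g in pd.unique(groups)]


# Stand-in for adata.obs with a "groups" column.
obs = pd.DataFrame({"groups": ["d1", "d2", "d1", "d2", "d2"]})
print(make_group_indices_list(obs, "groups"))  # [[0, 2], [1, 3, 4]]
```

With a real model this would be used as `group_indices_list = make_group_indices_list(adata.obs, "groups")` followed by `model.get_latent_representation(group_indices_list)`.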

Return type:

ndarray

Returns:

Low-dimensional latent representation for each cell.

get_loadings()

Extract per-gene weights in the linear decoder.

Shape is genes by n_latent.
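In a linear decoder, the loadings are the weight matrix that maps latent coordinates to gene-level outputs, so column k lists how strongly each gene responds to latent dimension k. The NumPy sketch below illustrates that reading with a made-up decoder (`W`, `bias`, and `decode` are hypothetical); it does not reproduce spVIPES's decoder, which may apply further output transformations, and the keys of the dict returned by get_loadings() are not documented here.

```python
import numpy as np

rng = np.random.default_rng(0)
n_genes, n_latent = 6, 3

# Hypothetical linear decoder: output = W @ z + bias, with W of shape
# (n_genes, n_latent). W plays the role of the loadings.
W = rng.normal(size=(n_genes, n_latent))
bias = rng.normal(size=n_genes)


def decode(z):
    return W @ z + bias


z = rng.normal(size=n_latent)
k = 1  # latent dimension to perturb
delta = np.zeros(n_latent)
delta[k] = 1.0
# Because the decoder is linear, a unit change in latent k shifts every gene's
# output by exactly column k of W: that column is the per-gene weight for k.
shift = decode(z + delta) - decode(z)
print(np.allclose(shift, W[:, k]))  # True
```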

Return type:

dict

static __new__(cls, *args, **kwargs)

Return type:

Any

train(group_indices_list, batch_size=128, max_epochs=None, use_gpu=None, train_size=0.9, validation_size=None, early_stopping=False, plan_kwargs=None, n_steps_kl_warmup=None, n_epochs_kl_warmup=400, **trainer_kwargs)

Train a multigroup spVIPES model.

This method trains the model using a custom data splitter that handles multiple groups of cells separately while maintaining the shared-private latent space learning objective.

Parameters:
  • group_indices_list (list[list[int]]) – List of indices corresponding to each group of samples. Each inner list contains the indices for cells belonging to that specific group.

  • max_epochs (int, optional) – Number of passes through the dataset. If None, defaults to np.min([round((20000 / n_cells) * 400), 400]).

  • use_gpu (str, int, bool, optional) – GPU usage specification: use the default GPU if available (None or True), the GPU with the given index (int), the named device (str, e.g., "cuda:0"), or the CPU (False).

  • train_size (float, default 0.9) – Size of training set in the range [0.0, 1.0].

  • validation_size (float, optional) – Size of the validation set. If None, defaults to 1 - train_size. If train_size + validation_size < 1, the remaining cells belong to the test set.

  • batch_size (int, default 128) – Mini-batch size to use during training.

  • early_stopping (bool, default False) – Whether to perform early stopping. Additional arguments can be passed in **trainer_kwargs.

  • plan_kwargs (dict, optional) – Keyword arguments for the training plan. Arguments passed to train() will overwrite values present in plan_kwargs, when appropriate.

  • n_steps_kl_warmup (int, optional) – Number of training steps for KL warmup. Takes precedence over n_epochs_kl_warmup.

  • n_epochs_kl_warmup (int, default 400) – Number of epochs for KL divergence warmup.

  • **trainer_kwargs – Additional keyword arguments for the trainer.
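The max_epochs default above is given as a formula; the first function below simply evaluates it, which caps training at 400 epochs for datasets of 20,000 cells or fewer and scales it down for larger ones. The second function sketches KL warmup as a linear ramp of the KL weight from 0 to 1, with the step-based setting taking precedence as documented. The linear shape and the 0 to 1 range are assumptions resembling the common scvi-tools convention, not taken from spVIPES itself; both function names are hypothetical.

```python
from typing import Optional

import numpy as np


def default_max_epochs(n_cells: int) -> int:
    """Evaluate the documented default for max_epochs."""
    return int(np.min([round((20000 / n_cells) * 400), 400]))


def kl_weight(
    epoch: int,
    step: int,
    n_epochs_kl_warmup: Optional[int] = 400,
    n_steps_kl_warmup: Optional[int] = None,
) -> float:
    """Linear KL warmup factor in [0, 1] (assumed schedule, hypothetical helper)."""
    if n_steps_kl_warmup:  # step-based warmup takes precedence
        return min(1.0, step / n_steps_kl_warmup)
    if n_epochs_kl_warmup:
        return min(1.0, epoch / n_epochs_kl_warmup)
    return 1.0  # no warmup: full KL term from the start


print(default_max_epochs(10_000))   # 400 (capped)
print(default_max_epochs(40_000))   # 200
print(kl_weight(epoch=100, step=0))  # 0.25
```

Note that with the defaults the warmup lasts 400 epochs, so for datasets with more than 20,000 cells training ends before the KL weight reaches 1 unless max_epochs or the warmup length is changed.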

Return type:

None

Returns:

None. The model is trained in place.

Notes

This method uses a specialized MultiGroupDataSplitter that ensures proper handling of multiple cell groups during training, maintaining the integrity of the shared-private latent space learning.
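The internals of MultiGroupDataSplitter are not documented here. The sketch below is a hypothetical stand-in (`split_groups` is not a spVIPES function) showing the idea of splitting each group's indices on its own according to train_size and validation_size, so that every group is represented in each split. The rounding rule (ceiling for the training set, floor for an explicit validation size, remainder as test) is an assumption modeled on the scvi-tools convention.

```python
import math
from typing import Optional

import numpy as np


def split_groups(group_indices_list, train_size=0.9, validation_size: Optional[float] = None, seed=0):
    """Split every group's cell indices separately into train/validation/test."""
    rng = np.random.default_rng(seed)
    splits = []
    for indices in group_indices_list:
        shuffled = rng.permutation(np.asarray(indices))
        n = len(shuffled)
        n_train = math.ceil(train_size * n)
        # Validation defaults to every cell not used for training.
        n_val = n - n_train if validation_size is None else math.floor(validation_size * n)
        splits.append({
            "train": shuffled[:n_train],
            "validation": shuffled[n_train:n_train + n_val],
            "test": shuffled[n_train + n_val:],
        })
    return splits


splits = split_groups([list(range(10)), list(range(10, 14))], train_size=0.5, validation_size=0.25)
print([{k: len(v) for k, v in s.items()} for s in splits])
# [{'train': 5, 'validation': 2, 'test': 3}, {'train': 2, 'validation': 1, 'test': 1}]
```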