API

DISCERN

DISCERN for expression reconstruction.

class discern.DISCERN(**kwargs)

Basic DISCERN model holding a lot of configuration.

Parameters

**kwargs – DISCERNConfig init args.

wae_model

Keras model.

Type

Union[None, tf.keras.Model]

start_step

Epoch to start training from.

Type

int

build_model(n_genes: int, n_labels: int, scale: float)

Initialize the auto-encoder model and define the loss and optimizer.

compile(optimizer: tensorflow.python.keras.optimizer_v2.optimizer_v2.OptimizerV2, scale: float = 15000.0)

Compile the model and set losses and metrics.

Parameters
  • optimizer (tf.keras.optimizers.Optimizer) – Optimizer to use.

  • scale (float) – Numeric scaling factor for the losses. Defaults to 15000.

property decoder: tensorflow.python.keras.engine.training.Model

Return the decoder.

Returns

The decoder model.

Return type

tf.keras.Model

Raises
  • ValueError – If the decoder is not present.

  • AttributeError – If the model is not built.

property encoder: tensorflow.python.keras.engine.training.Model

Return the encoder.

Returns

The encoder model.

Return type

tf.keras.Model

Raises
  • ValueError – If the encoder is not present.

  • AttributeError – If the model is not built.

classmethod from_json(jsondata: Dict[str, Any]) DISCERNConfigType

Create a DISCERNConfig instance from a hyperparameter JSON dictionary.

Parameters

jsondata (Dict[str, Any]) – Hyperparameters for this model.

Returns

An initialized DISCERNConfig instance.

Return type

“DISCERNConfig”

Raises

KeyError – If a required key is not found in the hyperparameter JSON.

generate_cells_from_latent(latent_codes: Union[tensorflow.python.framework.ops.Tensor, numpy.ndarray], output_batch_labels: Union[tensorflow.python.framework.ops.Tensor, numpy.ndarray], batch_size: int) Tuple[numpy.ndarray, numpy.ndarray]

Generate counts from latent codes and batch labels.

Parameters
  • latent_codes (Union[tf.Tensor, np.ndarray]) – Latent codes produced by encoder.

  • output_batch_labels (Union[tf.Tensor, np.ndarray]) – (One-hot) encoded batch labels for the output. Can also be continuous for fuzzy batch association.

  • batch_size (int) – Size of one batch.

Returns

The generated count data and dropout probabilities.

Return type

Tuple[np.ndarray, np.ndarray]

generate_latent_codes(counts: Union[tensorflow.python.framework.ops.Tensor, numpy.ndarray], batch_labels: Union[tensorflow.python.framework.ops.Tensor, numpy.ndarray], batch_size: int) Tuple[numpy.ndarray, numpy.ndarray]

Generate latent codes from counts and batch labels.

Parameters
  • counts (Union[tf.Tensor, np.ndarray]) – Count data.

  • batch_labels (Union[tf.Tensor, np.ndarray]) – (One-hot) encoded batch labels. Can also be continuous for fuzzy batch association.

  • batch_size (int) – Size of one batch.

Returns

Latent codes and sigma values.

Return type

Tuple[np.ndarray, np.ndarray]
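
A minimal sketch of chaining both generation methods, assuming a trained model instance model is available; the toy counts and one-hot labels arrays below stand in for real preprocessed data:

import numpy as np

# Placeholder inputs: 128 cells x 2000 genes and one-hot labels for 2 batches.
counts = np.random.rand(128, 2000).astype("float32")
labels = np.eye(2, dtype="float32")[np.random.randint(0, 2, size=128)]

# Encode counts into the latent space, then decode back to counts,
# optionally swapping the output batch labels to project between batches.
codes, sigmas = model.generate_latent_codes(counts, labels, batch_size=64)
generated, dropout_probs = model.generate_cells_from_latent(
    codes, output_batch_labels=labels, batch_size=64)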

get_optimizer() tensorflow.python.keras.optimizer_v2.optimizer_v2.OptimizerV2

Create an Optimizer instance.

Returns

The created optimizer.

Return type

tf.keras.optimizers.Optimizer

project_to_metadata(input_data: discern.io.DISCERNData, metadata: List[Tuple[str, str]], save_path: pathlib.Path, store_sigmas: bool = False)

Project to average batch with filtering for certain metadata.

Parameters
  • input_data (io.DISCERNData) – Input cells.

  • metadata (List[Tuple[str, str]]) – Column-value pairs used for filtering the cells. The column should match a name in input_data.obs and the value a key in this column.

  • save_path (pathlib.Path) – Path for saving the created AnnData objects.

  • store_sigmas (bool, optional) – Save sigmas in obsm. Defaults to False.

reconstruct(input_data: discern.io.DISCERNData, column: Optional[str], column_value: Optional[str], store_sigmas: bool = False) anndata._core.anndata.AnnData

Reconstruct expression data.

Parameters
  • input_data (io.DISCERNData) – DISCERN preprocessed input data.

  • column (Optional[str]) – Column used for reconstruction. If None, just auto-encode the data.

  • column_value (Optional[str], optional) – Value in column used for reconstruction. If None, project to the average from column.

  • store_sigmas (bool, optional) – Store latent space sigma values in final output. Defaults to False.

Returns

Reconstructed expression data with input data in raw and DISCERN latent space in obsm.

Return type

anndata.AnnData
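
A hedged sketch of restoring a trained model and projecting data to an average batch; the experiment layout, the parameters.json file name and the ‘batch’ column are assumptions:

import json
import pathlib

from discern import DISCERN
from discern.io import DISCERNData

exp_folder = pathlib.Path("exp/example_run")                        # hypothetical experiment folder
hparams = json.loads((exp_folder / "parameters.json").read_text())  # assumed hyperparameter file

data = DISCERNData.from_folder(exp_folder, batch_size=192)
model = DISCERN.from_json(hparams)
model.restore_model(exp_folder / "job")                             # assumed checkpoint directory

# Project all cells to the average of the (assumed) 'batch' column.
adata = model.reconstruct(data, column="batch", column_value=None)
adata.write(exp_folder / "projected.h5ad")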

restore_model(directory: pathlib.Path)

Restore the model from an HDF5 checkpoint and compile it.

Parameters

directory (pathlib.Path) – Checkpoint directory.

training(inputdata: discern.io.DISCERNData, callbacks: Optional[List[tensorflow.python.keras.callbacks.Callback]] = None, savepath: Optional[pathlib.Path] = None, max_steps: int = 25) Dict[str, float]

Train the network for up to max_steps epochs.

Parameters
  • inputdata (io.DISCERNData) – Training data.

  • max_steps (int) – Maximum number of epochs to train. Defaults to 25.

  • callbacks (List[tf.keras.callbacks.Callback], optional) – List of Keras callbacks to use. Defaults to None.

  • savepath (pathlib.Path, optional) – Filename to save the model. Defaults to None.

Returns

Metrics from fit method.

Return type

Dict[str,float]
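
A sketch of a typical training run, assuming preprocessing has already produced an experiment folder with TFRecords; the parameters.json file name, the ‘batch’ column and the scale value are assumptions, not part of the documented API:

import json
import pathlib

from discern import DISCERN
from discern.io import DISCERNData

exp_folder = pathlib.Path("exp/example_run")
# Assumed location of the hyperparameters written during preprocessing.
hparams = json.loads((exp_folder / "parameters.json").read_text())

data = DISCERNData.from_folder(exp_folder, batch_size=192)
model = DISCERN.from_json(hparams)
model.build_model(n_genes=data.shape[1],
                  n_labels=data.obs["batch"].nunique(),  # assumed batch column
                  scale=15000.0)                         # loss scaling factor (see compile)
metrics = model.training(data, savepath=exp_folder / "job", max_steps=25)
print(metrics)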

class discern.WAERecipe(params: Dict[str, Any], inputs: Optional[Dict[str, anndata._core.anndata.AnnData]] = None, input_files: Optional[Union[Dict[pathlib.Path, str], List[pathlib.Path]]] = None, n_jobs: int = - 1)

For storing and processing data.

Can apply filtering, clustering, merging and splitting.

Parameters
  • params (Dict[str,Any]) – Default parameters for preprocessing.

  • inputs (Dict[str, anndata.AnnData]) – Input AnnData with batch name as dict key. Defaults to None.

  • input_files (List[pathlib.Path]) – Paths to raw input data. Defaults to None.

  • n_jobs (int) – Number of jobs/processes to use. Defaults to -1.

sc_raw

Read and concatenated input data.

Type

io.DISCERNData

config

Parameters calculated during preprocessing.

Type

Dict[str, Any]

params

Default parameters for preprocessing.

Type

Dict[str,Any]

celltypes()

Aggregate celltype information.

property config

Configuration from preprocessing.

dump(job_dir: pathlib.Path)

Dump recipe results to directory.

Parameters

job_dir (pathlib.Path) – The directory to save the results at.

dump_tf_records(path: pathlib.Path)

Dump the TFRecords to disk.

Parameters

path (pathlib.Path) – Folder to save the TFrecords in.

filtering(min_genes: int, min_cells: int)

Apply filtering in-place.

Parameters
  • min_genes (int) – Minimum number of genes to be present for cell to be considered.

  • min_cells (int) – Minimum number of cells to be present for gene to be considered.

classmethod from_path(job_dir: pathlib.Path) discern.preprocessing.WAERecipe

Create WAERecipe from DISCERN directory.

Returns

The initialized object.

Return type

WAERecipe

kernel_mmd(neighbors_mmd: int = 50, no_cells_mmd: int = 2000)

Apply kernel mmd metrics based on nearest neighbors in-place.

Parameters
  • neighbors_mmd (int) – Number of neighbors. Defaults to 50.

  • no_cells_mmd (int) – Number of cells used for calculation of mmd. Defaults to 2000.

  • projector (Optional[np.ndarray]) – PCA projector to compute distances in precomputed PCA space. Defaults to None.

mean_var_scaling()

Apply Mean-Variance scaling if ‘fixed_scaling’ is present in params.

projection_pca(pcs: int = 25)

Apply PCA projection.

Parameters

pcs (int) – Number of principal components. Defaults to 25.

scaling(scale: int)

Apply scaling in-place.

Parameters

scale (int) – Value used to scale with LSN.

split(split_seed: int, valid_cells_ratio: Union[int, float], mmd_cells_ratio: Union[int, float] = 1.0)

Split cells to train and validation set.

Parameters
  • split_seed (int) – Seed used with numpy.

  • valid_cells_ratio (Union[int,float]) – Number or ratio of cells in the validation set.

  • mmd_cells_ratio (Union[int, float]) – Number of validation cells to use for MMD calculation during hyperparameter optimization. Defaults to 1.0, which corresponds to valid_cells_no.
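
A hedged sketch of the preprocessing recipe; the keys in params are illustrative only (the real keys come from the DISCERN configuration) and the input path is hypothetical:

import pathlib

from discern import WAERecipe

params = {"batch_size": 192}   # illustrative; the real preprocessing parameters come from the config
recipe = WAERecipe(params=params,
                   input_files=[pathlib.Path("data/pbmc_batch1.h5ad")],
                   n_jobs=4)

recipe.filtering(min_genes=200, min_cells=3)
recipe.scaling(scale=20000)            # LSN scaling target
recipe.projection_pca(pcs=25)
recipe.split(split_seed=0, valid_cells_ratio=0.1)
recipe.dump(pathlib.Path("exp/example_run"))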

Reconstruction functions

Basic module containing all functions for running and execution of Model.

class discern.estimators.DISCERNRunner(debug: bool = False, gpus: Optional[List[int]] = None)

Run DISCERN training or projection.

Basic DISCERN architecture.

class discern.estimators.batch_integration.DISCERN(**kwargs)

Basic DISCERN model holding a lot of configuration.

Parameters

**kwargs – DISCERNConfig init args.

wae_model

Keras model.

Type

Union[None, tf.keras.Model]

start_step

Epoch to start training from.

Type

int

build_model(n_genes: int, n_labels: int, scale: float)

Initialize the auto-encoder model and define the loss and optimizer.

compile(optimizer: tensorflow.python.keras.optimizer_v2.optimizer_v2.OptimizerV2, scale: float = 15000.0)

Compile the model and set losses and metrics.

Parameters
  • optimizer (tf.keras.optimizers.Optimizer) – Optimizer to use.

  • scale (float) – Numeric scaling factor for the losses. Defaults to 15000.

property decoder: tensorflow.python.keras.engine.training.Model

Return the decoder.

Returns

The decoder model.

Return type

tf.keras.Model

Raises
  • ValueError – If the decoder is not present.

  • AttributeError – If the model is not built.

property encoder: tensorflow.python.keras.engine.training.Model

Return the encoder.

Returns

The encoder model.

Return type

tf.keras.Model

Raises
  • ValueError – If the encoder is not present.

  • AttributeError – If the model is not built.

classmethod from_json(jsondata: Dict[str, Any]) DISCERNConfigType

Create a DISCERNConfig instance from a hyperparameter JSON dictionary.

Parameters

jsondata (Dict[str, Any]) – Hyperparameters for this model.

Returns

An initialized DISCERNConfig instance.

Return type

“DISCERNConfig”

Raises

KeyError – If a required key is not found in the hyperparameter JSON.

generate_cells_from_latent(latent_codes: Union[tensorflow.python.framework.ops.Tensor, numpy.ndarray], output_batch_labels: Union[tensorflow.python.framework.ops.Tensor, numpy.ndarray], batch_size: int) Tuple[numpy.ndarray, numpy.ndarray]

Generate counts from latent codes and batch labels.

Parameters
  • latent_codes (Union[tf.Tensor, np.ndarray]) – Latent codes produced by encoder.

  • output_batch_labels (Union[tf.Tensor, np.ndarray]) – (One-hot) encoded batch labels for the output. Can also be continuous for fuzzy batch association.

  • batch_size (int) – Size of one batch.

Returns

The generated count data and dropout probabilities.

Return type

Tuple[np.ndarray, np.ndarray]

generate_latent_codes(counts: Union[tensorflow.python.framework.ops.Tensor, numpy.ndarray], batch_labels: Union[tensorflow.python.framework.ops.Tensor, numpy.ndarray], batch_size: int) Tuple[numpy.ndarray, numpy.ndarray]

Generate latent codes from counts and batch labels.

Parameters
  • counts (Union[tf.Tensor, np.ndarray]) – Count data.

  • batch_labels (Union[tf.Tensor, np.ndarray]) – (One-hot) encoded batch labels. Can also be continuous for fuzzy batch association.

  • batch_size (int) – Size of one batch.

Returns

Latent codes and sigma values.

Return type

Tuple[np.ndarray, np.ndarray]

get_optimizer() tensorflow.python.keras.optimizer_v2.optimizer_v2.OptimizerV2

Create an Optimizer instance.

Returns

The created optimizer.

Return type

tf.keras.optimizers.Optimizer

project_to_metadata(input_data: discern.io.DISCERNData, metadata: List[Tuple[str, str]], save_path: pathlib.Path, store_sigmas: bool = False)

Project to average batch with filtering for certain metadata.

Parameters
  • input_data (io.DISCERNData) – Input cells.

  • metadata (List[Tuple[str, str]]) – Column-value pairs used for filtering the cells. The column should match a name in input_data.obs and the value a key in this column.

  • save_path (pathlib.Path) – Path for saving the created AnnData objects.

  • store_sigmas (bool, optional) – Save sigmas in obsm. Defaults to False.

reconstruct(input_data: discern.io.DISCERNData, column: Optional[str], column_value: Optional[str], store_sigmas: bool = False) anndata._core.anndata.AnnData

Reconstruct expression data.

Parameters
  • input_data (io.DISCERNData) – DISCERN preprocessed input data.

  • column (Optional[str]) – Column used for reconstruction. If None, just auto-encode the data.

  • column_value (Optional[str], optional) – Value in column used for reconstruction. If None, project to the average from column.

  • store_sigmas (bool, optional) – Store latent space sigma values in final output. Defaults to False.

Returns

Reconstructed expression data with input data in raw and DISCERN latent space in obsm.

Return type

anndata.AnnData

restore_model(directory: pathlib.Path)

Restore the model from an HDF5 checkpoint and compile it.

Parameters

directory (pathlib.Path) – Checkpoint directory.

training(inputdata: discern.io.DISCERNData, callbacks: Optional[List[tensorflow.python.keras.callbacks.Callback]] = None, savepath: Optional[pathlib.Path] = None, max_steps: int = 25) Dict[str, float]

Train the network for up to max_steps epochs.

Parameters
  • inputdata (io.DISCERNData) – Training data.

  • max_steps (int) – Maximum number of epochs to train. Defaults to 25.

  • callbacks (List[tf.keras.callbacks.Callback], optional) – List of Keras callbacks to use. Defaults to None.

  • savepath (pathlib.Path, optional) – Filename to save the model. Defaults to None.

Returns

Metrics from fit method.

Return type

Dict[str,float]

Module for custom callbacks, especially visualization(UMAP).

class discern.estimators.callbacks.DelayedEarlyStopping(delay: int = 0, monitor: str = 'val_loss', min_delta: float = 0.0, patience: int = 0, verbose: int = 0, mode: str = 'auto', baseline: Optional[float] = None, restore_best_weights: bool = False)

Stop when a monitored quantity has stopped improving, after some delay time.

Parameters
  • delay (int) – Number of epochs to wait until applying early stopping. Defaults to 0, which means standard early stopping.

  • monitor (str) – Quantity to be monitored. Defaults to ‘val_loss’.

  • min_delta (float) – Minimum change in the monitored quantity to qualify as an improvement, i.e. an absolute change of less than min_delta will count as no improvement. Defaults to 0.

  • patience (int) – Number of epochs with no improvement after which training will be stopped. Defaults to 0.

  • verbose (int) – Verbosity mode. Defaults to 0.

  • mode (str) – One of {“auto”, “min”, “max”}. In min mode, training will stop when the quantity monitored has stopped decreasing; in max mode it will stop when the quantity monitored has stopped increasing; in auto mode, the direction is automatically inferred from the name of the monitored quantity. Defaults to ‘auto’.

  • baseline (float, optional) – Baseline value for the monitored quantity. Training will stop if the model doesn’t show improvement over the baseline. Defaults to None.

  • restore_best_weights (bool) – Whether to restore model weights from the epoch with the best value of the monitored quantity. If False, the model weights obtained at the last step of training are used. Defaults to False.
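
For illustration, a sketch of a callback that ignores the first 10 epochs before monitoring the validation loss; it can then be passed via the callbacks argument of DISCERN.training:

from discern.estimators.callbacks import DelayedEarlyStopping

early_stopping = DelayedEarlyStopping(
    delay=10,             # wait 10 epochs before early stopping becomes active
    monitor="val_loss",
    min_delta=0.01,
    patience=5,
    restore_best_weights=True,
)
# e.g. model.training(data, callbacks=[early_stopping])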

on_batch_begin(batch, logs=None)

A backwards compatibility alias for on_train_batch_begin.

on_batch_end(batch, logs=None)

A backwards compatibility alias for on_train_batch_end.

on_epoch_begin(epoch, logs=None)

Called at the start of an epoch.

Subclasses should override for any actions to run. This function should only be called during TRAIN mode.

Parameters
  • epoch – integer, index of epoch.

  • logs – dict. Currently no data is passed to this argument for this method but that may change in the future.

on_epoch_end(epoch: int, logs: Optional[Dict[str, Any]] = None)

Call on epoch end to check for early stopping.

on_predict_batch_begin(batch, logs=None)

Called at the beginning of a batch in predict methods.

Subclasses should override for any actions to run.

Parameters
  • batch – integer, index of batch within the current epoch.

  • logs – dict. Has keys batch and size representing the current batch number and the size of the batch.

on_predict_batch_end(batch, logs=None)

Called at the end of a batch in predict methods.

Subclasses should override for any actions to run.

Parameters
  • batch – integer, index of batch within the current epoch.

  • logs – dict. Metric results for this batch.

on_predict_begin(logs=None)

Called at the beginning of prediction.

Subclasses should override for any actions to run.

Parameters

logs – dict. Currently no data is passed to this argument for this method but that may change in the future.

on_predict_end(logs=None)

Called at the end of prediction.

Subclasses should override for any actions to run.

Parameters

logs – dict. Currently no data is passed to this argument for this method but that may change in the future.

on_test_batch_begin(batch, logs=None)

Called at the beginning of a batch in evaluate methods.

Also called at the beginning of a validation batch in the fit methods, if validation data is provided.

Subclasses should override for any actions to run.

Parameters
  • batch – integer, index of batch within the current epoch.

  • logs – dict. Has keys batch and size representing the current batch number and the size of the batch.

on_test_batch_end(batch, logs=None)

Called at the end of a batch in evaluate methods.

Also called at the end of a validation batch in the fit methods, if validation data is provided.

Subclasses should override for any actions to run.

Parameters
  • batch – integer, index of batch within the current epoch.

  • logs – dict. Metric results for this batch.

on_test_begin(logs=None)

Called at the beginning of evaluation or validation.

Subclasses should override for any actions to run.

Parameters

logs – dict. Currently no data is passed to this argument for this method but that may change in the future.

on_test_end(logs=None)

Called at the end of evaluation or validation.

Subclasses should override for any actions to run.

Parameters

logs – dict. Currently no data is passed to this argument for this method but that may change in the future.

on_train_batch_begin(batch, logs=None)

Called at the beginning of a training batch in fit methods.

Subclasses should override for any actions to run.

Parameters
  • batch – integer, index of batch within the current epoch.

  • logs – dict. Has keys batch and size representing the current batch number and the size of the batch.

on_train_batch_end(batch, logs=None)

Called at the end of a training batch in fit methods.

Subclasses should override for any actions to run.

Parameters
  • batch – integer, index of batch within the current epoch.

  • logs – dict. Metric results for this batch.

on_train_begin(logs=None)

Called at the beginning of training.

Subclasses should override for any actions to run.

Parameters

logs – dict. Currently no data is passed to this argument for this method but that may change in the future.

on_train_end(logs=None)

Called at the end of training.

Subclasses should override for any actions to run.

Parameters

logs – dict. Currently no data is passed to this argument for this method but that may change in the future.

class discern.estimators.callbacks.VisualisationCallback(outdir: Union[str, pathlib.Path], data: anndata._core.anndata.AnnData, batch_size: int, freq: int = 10)

Redo prediction on datasets and visualize via UMAP.

Parameters
  • outdir (pathlib.Path) – Output directory for the figures.

  • data (anndata.AnnData) – Input cells.

  • batch_size (int) – Number of cells to visualize.

  • freq (int) – Frequency for computing visualisations in epochs. Defaults to 10.

on_batch_begin(batch, logs=None)

A backwards compatibility alias for on_train_batch_begin.

on_batch_end(batch, logs=None)

A backwards compatibility alias for on_train_batch_end.

on_epoch_begin(epoch, logs=None)

Called at the start of an epoch.

Subclasses should override for any actions to run. This function should only be called during TRAIN mode.

Parameters
  • epoch – integer, index of epoch.

  • logs – dict. Currently no data is passed to this argument for this method but that may change in the future.

on_epoch_end(epoch: int, logs: Optional[Dict[str, float]] = None)

Run on epoch end. Executes only at specified frequency.

Parameters
  • epoch (int) – Epoch number.

  • logs (Optional[Dict[str, float]]) – Losses and metrics passed by TensorFlow fit. Defaults to None.

on_predict_batch_begin(batch, logs=None)

Called at the beginning of a batch in predict methods.

Subclasses should override for any actions to run.

Parameters
  • batch – integer, index of batch within the current epoch.

  • logs – dict. Has keys batch and size representing the current batch number and the size of the batch.

on_predict_batch_end(batch, logs=None)

Called at the end of a batch in predict methods.

Subclasses should override for any actions to run.

Parameters
  • batch – integer, index of batch within the current epoch.

  • logs – dict. Metric results for this batch.

on_predict_begin(logs=None)

Called at the beginning of prediction.

Subclasses should override for any actions to run.

Parameters

logs – dict. Currently no data is passed to this argument for this method but that may change in the future.

on_predict_end(logs=None)

Called at the end of prediction.

Subclasses should override for any actions to run.

Parameters

logs – dict. Currently no data is passed to this argument for this method but that may change in the future.

on_test_batch_begin(batch, logs=None)

Called at the beginning of a batch in evaluate methods.

Also called at the beginning of a validation batch in the fit methods, if validation data is provided.

Subclasses should override for any actions to run.

Parameters
  • batch – integer, index of batch within the current epoch.

  • logs – dict. Has keys batch and size representing the current batch number and the size of the batch.

on_test_batch_end(batch, logs=None)

Called at the end of a batch in evaluate methods.

Also called at the end of a validation batch in the fit methods, if validation data is provided.

Subclasses should override for any actions to run.

Parameters
  • batch – integer, index of batch within the current epoch.

  • logs – dict. Metric results for this batch.

on_test_begin(logs=None)

Called at the beginning of evaluation or validation.

Subclasses should override for any actions to run.

Parameters

logs – dict. Currently no data is passed to this argument for this method but that may change in the future.

on_test_end(logs=None)

Called at the end of evaluation or validation.

Subclasses should override for any actions to run.

Parameters

logs – dict. Currently no data is passed to this argument for this method but that may change in the future.

on_train_batch_begin(batch, logs=None)

Called at the beginning of a training batch in fit methods.

Subclasses should override for any actions to run.

Parameters
  • batch – integer, index of batch within the current epoch.

  • logs – dict. Has keys batch and size representing the current batch number and the size of the batch.

on_train_batch_end(batch, logs=None)

Called at the end of a training batch in fit methods.

Subclasses should override for any actions to run.

Parameters
  • batch – integer, index of batch within the current epoch.

  • logs – dict. Metric results for this batch.

on_train_begin(logs: Optional[Dict[str, float]] = None)

Run on training start.

Parameters

logs (Optional[Dict[str, float]]) – Logs; not used, kept only for compatibility reasons.

on_train_end(logs: Optional[Dict[str, float]] = None)

Run on training end.

Parameters

logs (Optional[Dict[str, float]]) – Losses and metrics passed by TensorFlow fit. Defaults to None.

discern.estimators.callbacks.create_callbacks(early_stopping_limits: Dict[str, Any], exp_folder: pathlib.Path, inputdata: Optional[discern.io.DISCERNData] = None, umap_cells_no: Optional[int] = None, profile_batch: int = 2, freq_of_viz: int = 30) List[tensorflow.python.keras.callbacks.Callback]

Generate list of callbacks used by tensorflow model.fit.

Parameters
  • early_stopping_limits (Dict[str, Any]) – Patience, min_delta, and delay for early stopping.

  • exp_folder (pathlib.Path) – Folder where everything is saved.

  • inputdata (io.DISCERNData, optional) – Input data to use. Defaults to None

  • umap_cells_no (int) – Number of cells for UMAP.

  • profile_batch (int) – Batch number used for extensive profiling (see tf.keras.callbacks.TensorBoard). Defaults to 2.

  • freq_of_viz (int) – Frequency of visualization callback in epochs. Defaults to 30.

Returns

callbacks used by tensorflow model.fit.

Return type

List[callbacks.Callback]
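
A sketch of assembling the callback list; the keys of early_stopping_limits follow the parameter names mentioned above (patience, min_delta, delay), which is an assumption about the expected dictionary layout:

import pathlib

from discern.estimators import callbacks

early_stopping_limits = {"patience": 5, "min_delta": 0.01, "delay": 10}
cb_list = callbacks.create_callbacks(
    early_stopping_limits=early_stopping_limits,
    exp_folder=pathlib.Path("exp/example_run"),   # hypothetical experiment folder
    inputdata=None,                               # skip the UMAP visualisation callback
    umap_cells_no=None,
    freq_of_viz=30,
)
# cb_list can then be passed to DISCERN.training(..., callbacks=cb_list).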

Custom Keras Layers.

class discern.estimators.customlayers.GaussianReparametrization(trainable=True, name=None, dtype=None, dynamic=False, **kwargs)

Reparametrization layer using gaussians.

build(input_shape: Tuple[tensorflow.python.framework.ops.Tensor, tensorflow.python.framework.ops.Tensor])

Build the layer, usually automatically called at first call.

Parameters

input_shape (Tuple[tf.Tensor, tf.Tensor]) – Shape of the inputs. Both should have as last dimension the size of the latent space.

static call(inputs: Tuple[tensorflow.python.framework.ops.Tensor, tensorflow.python.framework.ops.Tensor], **kwargs: Dict[str, Any]) tensorflow.python.framework.ops.Tensor

Call the layer.

Parameters
  • inputs (Tuple[tf.Tensor, tf.Tensor]) – latent codes and sigmas from encoder

  • **kwargs (Dict[str, Any]) – Additional attributes; should contain ‘training’.

Returns

Rescaled latent codes.

Return type

tf.Tensor

class discern.estimators.customlayers.MMDPP(scale: float, **kwargs)

MMDPP penalty calculation as a Keras layer.

Parameters

scale (float) – Value used to scale the output.

scale

Value used to scale the output.

Type

float

build(input_shape: Tuple[tensorflow.python.framework.ops.Tensor, tensorflow.python.framework.ops.Tensor])

Build the layer, usually automatically called at first call.

Parameters

input_shape (Tuple[tf.Tensor, tf.Tensor]) – Shape of the inputs. Both shapes should have the size of the latent space as last dimension.

call(inputs: Tuple[tensorflow.python.framework.ops.Tensor, tensorflow.python.framework.ops.Tensor], **kwargs: Dict[str, Any]) tensorflow.python.framework.ops.Tensor

Call the layer.

Parameters

inputs (Tuple[tf.Tensor, tf.Tensor]) – The latent codes and sigma values from encoder.

Returns

mmdpp penalty loss.

Return type

tf.Tensor

get_config() Dict[str, Any]

Return configuration of the layer. Used for serialization.

Returns

Configuration of the layer

Return type

Dict[str,Any]

class discern.estimators.customlayers.SigmaRegularization(trainable=True, name=None, dtype=None, dynamic=False, **kwargs)

Regularization term to push sigmas near to one.

build(input_shape: tensorflow.python.framework.ops.Tensor)

Build the layer, usually automatically called at first call.

Parameters

input_shape (tf.Tensor) – Shape of the input.

call(inputs: tensorflow.python.framework.ops.Tensor, **kwargs: Dict[str, Any]) tensorflow.python.framework.ops.Tensor

Call the layer.

Parameters

inputs (tf.Tensor) – Inputs to layer consisting of sigma values.

Returns

Regularization loss

Return type

tf.Tensor

discern.estimators.customlayers.condlayernorm(input_cells: tensorflow.python.framework.ops.Tensor, labels: tensorflow.python.framework.ops.Tensor, size: int, regularization: Optional[Dict[str, Any]] = None) tensorflow.python.framework.ops.Tensor

Create a conditioning layer.

Parameters
  • input_cells (tf.Tensor) – Input to the layer.

  • labels (tf.Tensor) – Label for each sample.

  • size (int) – Size of the output/input.

Returns

The output of the conditioning layer, with the same size as the input, as specified in size.

Return type

tf.Tensor

discern.estimators.customlayers.getmembers() Dict[str, tensorflow.python.keras.engine.base_layer.Layer]

Return a dictionary of all custom layers defined in this module.

Returns

Name and class of custom layers.

Return type

Dict[str, tf.keras.layers.Layer]
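
getmembers is convenient when deserializing a saved model containing these layers; a sketch, assuming a hypothetical HDF5 checkpoint path:

import tensorflow as tf

from discern.estimators import customlayers, losses

custom_objects = {**customlayers.getmembers(), **losses.getmembers()}
model = tf.keras.models.load_model(
    "exp/example_run/job/model.hdf5",   # hypothetical checkpoint file
    custom_objects=custom_objects,
    compile=False,
)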

discern.estimators.customlayers.mmdpp_penalty(sample_qz: tensorflow.python.framework.ops.Tensor, sample_pz: tensorflow.python.framework.ops.Tensor, encoder_sigma: tensorflow.python.framework.ops.Tensor, total_number_cells: float, latent_dim: int) tensorflow.python.framework.ops.Tensor

Calculate the mmdpp penalty.

Based on https://github.com/tolstikhin/wae/blob/master/improved_wae.py

Parameters
  • sample_qz (tf.Tensor) – Sample from the aggregated posterior.

  • sample_pz (tf.Tensor) – Sample from the prior.

  • encoder_sigma (tf.Tensor) – Sigma values from the random encoder.

  • total_number_cells (int) – Total number of samples for scaling.

  • latent_dim (int) – Dimension of the latent space.

Returns

mmdpp penalty loss.

Return type

tf.Tensor

Module containing all losses.

class discern.estimators.losses.DummyLoss(reduction: int = 'auto', name: str = 'Dummy')

Dummy loss simply passing the input y_pred as loss output.

Parameters
  • reduction (int) – Reduction type to use. Defaults to tf.keras.losses.Reduction.AUTO.

  • name (str) – Name of the loss. Defaults to ‘Dummy’.

static call(y_true, y_pred)

Call the loss and return the predicted value.

classmethod from_config(config)

Instantiates a Loss from its config (output of get_config()).

Parameters

config – Output of get_config().

Returns

A Loss instance.

class discern.estimators.losses.HuberLoss(delta=1.0, reduction='auto', name='huber_loss')

Huber loss.

call(y_true, y_pred)

Calculate Huber loss.

classmethod from_config(config)

Instantiates a Loss from its config (output of get_config()).

Parameters

config – Output of get_config().

Returns

A Loss instance.

class discern.estimators.losses.Lnorm(p: int, name: str = 'LNorm', reduction: str = 'auto', axis: int = 0, epsilon: float = 1e-20, use_root: bool = False)

Calculate the Lnorm of input and output.

Parameters
  • p (int) – Which Lnorm to calculate, for example p=1 means L1-Norm.

  • name (str) – Name of the loss. Defaults to ‘LNorm’.

  • reduction (int) – Reduction type to use. Defaults to tf.keras.losses.Reduction.AUTO.

  • axis (int) – Axis on which the norm is calculated. Defaults to 0.

  • epsilon (float) – Small value to add if the (square) root is used. Defaults to 1e-20.

  • use_root (bool) – Use (square)root. Defaults to False.

pnorm

Which Lnorm to calculate, for example p=1 means L1-Norm.

Type

int

epsilon

Small value to add if (square)root is used.

Type

float

axis

Axis on which the norm is calculated.

Type

int

use_root

Use (square)root.

Type

bool

call(y_true, y_pred)

Call and return the loss.

classmethod from_config(config)

Instantiates a Loss from its config (output of get_config()).

Parameters

config – Output of get_config().

Returns

A Loss instance.

get_config()

Serialize the loss.
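
A small sketch using Lnorm as an L1 loss on toy tensors:

import tensorflow as tf

from discern.estimators.losses import Lnorm

l1_loss = Lnorm(p=1, name="L1Norm")
y_true = tf.constant([[0.0, 1.0, 2.0]])
y_pred = tf.constant([[0.5, 1.0, 1.0]])
print(l1_loss(y_true, y_pred))   # reduced L1 norm as a scalar tf.Tensor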

class discern.estimators.losses.MaskedCrossEntropy(zeros: numpy.ndarray, zeros_eps: float = 1e-06, lower_label_smoothing: float = 0.0, **kwargs)

Categorical crossentropy loss which creates a mask from the true data.

Parameters
  • zeros (np.ndarray) – Value(s) which represent zeros.

  • zeros_eps (float) – Value to check for approximate matching to zeros.

call(y_true: tensorflow.python.framework.ops.Tensor, y_pred: tensorflow.python.framework.ops.Tensor) tensorflow.python.framework.ops.Tensor

Call of the loss.

classmethod from_config(config)

Instantiates a Loss from its config (output of get_config()).

Parameters

config – Output of get_config().

Returns

A Loss instance.

get_config()

Return the configuration of the loss.

discern.estimators.losses.getmembers() Dict[str, Union[tensorflow.python.keras.losses.Loss, tensorflow.python.keras.metrics.Metric]]

Return a dictionary of all custom losses and metrics defined in this module.

Returns

Name and class of custom losses and metrics.

Return type

Dict[str, Union[tf.keras.losses.Loss,tf.keras.metrics.Metric]]

discern.estimators.losses.reconstruction_loss(loss_type: Dict[str, Any]) tensorflow.python.framework.ops.Tensor

Generate different loss classes based on dictionary.

Parameters

loss_type (Dict[str, Any]) – Dictionary with name as the class name of the loss and all parameters to be set.

Returns

Calculated loss (object)

Return type

tf.Tensor

Raises

KeyError – When the loss name is not supported.
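
A hedged sketch of building a loss from such a dictionary; the exact key layout (name plus constructor arguments) is an assumption based on the description above:

from discern.estimators.losses import reconstruction_loss

# Build an L2-type loss by class name plus its constructor parameters (assumed layout).
loss_obj = reconstruction_loss({"name": "Lnorm", "p": 2, "use_root": True})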

Basic module for running an experiment.

class discern.estimators.run_exp.CheckMetaData(dataframe: pandas.core.frame.DataFrame)

Lazily check metadata column-value pairs in a dataframe.

check(metadata_tuple: List[str]) Tuple[str, str]

Check if column value pair is present.

Parameters

metadata_tuple (List[str]) – Column-value pair.

Returns

Input column value pair.

Return type

Tuple[str, str]

class discern.estimators.run_exp.DISCERNRunner(debug: bool = False, gpus: Optional[List[int]] = None)

Run DISCERN training or project.

discern.estimators.run_exp.run_exp_multiprocess(exp_folder: pathlib.Path, available_gpus: List[int], func: Callable[..., None], kwargs: Optional[Dict[str, Any]] = None) int

Run an experiment with forced GPU setting (suitable for Python multiprocessing).

Parameters
  • exp_folder (pathlib.Path) – Path to the experiment.

  • available_gpus (List[int]) – List of available GPUs.

  • func (Callable[..., None]) – Train or eval function.

  • kwargs (Optional[Dict[str, Any]]) – Additional arguments passed to the called functions. Defaults to None.

Returns

Status code, 0 is success, 1 is failure.

Return type

int

discern.estimators.run_exp.run_projection(exp_folder: pathlib.Path, metadata: List[str], infile: Optional[Union[str, pathlib.Path]], all_batches: bool, store_sigmas: bool)

Run projection to metadata on trained model.

Parameters
  • exp_folder (pathlib.Path) – Folder / experiment name of the trained model.

  • metadata (List[str]) – Metadata to use for integration. Should be a list of ‘column name:value’ entries.

  • infile (Optional[Union[str, pathlib.Path]]) – Alternative input file.

  • all_batches (bool) – Project to all batches.

  • store_sigmas (bool) – Store sigmas after projection.

discern.estimators.run_exp.run_train(exp_folder: pathlib.Path, input_path: Optional[pathlib.Path] = None)

Run an experiment.

Parameters
  • exp_folder (pathlib.Path) – Experiment folder.

  • input_path (Optional[pathlib.Path]) – Input path for the TFRecords, if None the experiments folder is used. Defaults to None.

discern.estimators.run_exp.setup_exp(exp_folder: pathlib.Path) Tuple[discern.estimators.batch_integration.DISCERN, Dict[str, Any]]

Set up the experiment by assigning the GPU and parsing the model.

Parameters

exp_folder (pathlib.Path) – Experiment folder.

Returns

The model and all parameters.

Return type

Tuple[batch_integration.DISCERN, Dict[str, Any]]

A number of classes and functions used across all types of models.

discern.estimators.utilities_wae.create_decoder(latent_dim: int, output_cells_dim: int, dec_layers: List[int], dec_norm_type: List[str], activation_fn: Callable[tensorflow.python.framework.ops.Tensor, tensorflow.python.framework.ops.Tensor], output_fn: Optional[str], n_labels: int, regularization: float, output_lsn: Optional[float] = None, conditional_regularization: Optional[Dict[str, Any]] = None) tensorflow.python.keras.engine.training.Model

Create a decoder.

Parameters
  • latent_dim (int) – Dimension of the latent space.

  • output_cells_dim (int) – Dimension of the output.

  • dec_layers (List[int]) – Dimensions for the decoder layers.

  • dec_norm_type (List[str]) – Normalization type, e.g. BatchNormalization.

  • activation_fn (Callable[[tf.Tensor], tf.Tensor]) – Activation function in the model.

  • output_fn (str) – Function to produce gene counts.

  • n_labels (int) – Number of labels for the batch labels.

  • regularization (float) – Dropout rate.

  • output_lsn (Optional[float]) – Scaling parameter, used for softmax and LSN.

Returns

The decoder.

Return type

tf.keras.Model

discern.estimators.utilities_wae.create_encoder(latent_dim: int, enc_layers: List[int], enc_norm_type: List[str], activation_fn: Callable[tensorflow.python.framework.ops.Tensor, tensorflow.python.framework.ops.Tensor], input_dim: int, n_labels: int, regularization: float, conditional_regularization: Optional[Dict[str, Any]] = None) tensorflow.python.keras.engine.training.Model

Create an Encoder.

Parameters
  • latent_dim (int) – Dimension of the latent space.

  • enc_layers (List[int]) – Dimension of the encoding layers.

  • enc_norm_type (List[str]) – Normalization type, e.g. BatchNormalization.

  • activation_fn (Callable[[tf.Tensor], tf.Tensor]) – Activation function in the model.

  • input_dim (int) – Dimension of the input.

  • n_labels (int) – Number of labels for the batch labels.

  • regularization (float) – Rate of dropout.

Returns

The encoder.

Return type

tf.keras.Model

Raises

NotImplementedError – If enc_norm_type is not understood.

discern.estimators.utilities_wae.create_model(encoder: tensorflow.python.keras.engine.training.Model, decoder: tensorflow.python.keras.engine.training.Model, total_number_cells: float, name: str = 'WAE') tensorflow.python.keras.engine.training.Model

Generate a model from encoder and decoder, adding gaussian noise (reparametrization).

Parameters
  • encoder (tf.keras.Model) – The encoder.

  • decoder (tf.keras.Model) – The decoder.

  • total_number_cells (int) – Total number of cells used for scaling MMDPP.

  • name (str) – Name of the model. Defaults to “WAE”.

Returns

The created model including SigmaRegularization and MMDPP loss.

Return type

tf.keras.Model

discern.estimators.utilities_wae.load_model_from_directory(directory: pathlib.Path) Tuple[Union[None, tensorflow.python.keras.engine.training.Model], int]

Load model from latest checkpoint using its hdf5 file.

Parameters

directory (pathlib.Path) – Name of the directory with hdf5 files.

Returns

Full model and last step, or None and zero if no model could be loaded.

Return type

Tuple[Union[None, tf.keras.Model], int]

I/O functions

discern i/o operations.

class discern.io.DISCERNData(adata: anndata._core.anndata.AnnData, batch_size: int, cachefile: Optional[Union[str, pathlib.Path]] = '')

DISCERNData for storing and reading inputs.

property batch_size: int

Get batch size.

property config: Dict[str, Any]

Get DISCERN data dependent configuration.

classmethod from_folder(folder: pathlib.Path, batch_size: int, cachefile: Optional[Union[str, pathlib.Path]] = '') discern.io.DISCERNData

Read data from DISCERN folder.

Returns

The data including AnnData and TFRecords.

Return type

DISCERNData

classmethod read_h5ad(filename: pathlib.Path, batch_size: int, cachefile: Optional[Union[str, pathlib.Path]] = '') discern.io.DISCERNData

Create DISCERNData from anndata H5AD file.

Returns

The single cell data.

Return type

DISCERNData
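
A sketch of loading preprocessed data from an H5AD file; the path is hypothetical:

import pathlib

from discern.io import DISCERNData

data = DISCERNData.read_h5ad(pathlib.Path("data/processed.h5ad"), batch_size=192)
train_ds, valid_ds = data.tfdata   # associated tf.data.Datasets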

property tfdata: Tuple[tensorflow.python.data.ops.dataset_ops.DatasetV2, tensorflow.python.data.ops.dataset_ops.DatasetV2]

The associated tf.data.Datasets.

Returns

Training and validation data.

Return type

Tuple[tf.data.Dataset, tf.data.Dataset]

property zeros

Get Zero representation in current data.

class discern.io.TFRecordsWriter(out_dir: pathlib.Path)

Context manager to be used for writing tf.data.Dataset to TFRecord file.

Parameters

out_dir (pathlib.Path) – Path to the directory where to write the TFRecords.

out_dir

Path to the directory where to write the TFRecords.

Type

str

write_dataset(dataset: tensorflow.python.data.ops.dataset_ops.DatasetV2, split: str)

Write tf.data.Dataset to TFRecord specified by split.

Parameters
  • dataset (tf.data.Dataset) – Dataset to be written.

  • split (str) – Subfile to use: train or valid.

Raises

ValueError – If split is not supported.

discern.io.estimate_csr_nbytes(mat: numpy.ndarray) int

Estimate the size of a sparse matrix generated from a numpy array.

Parameters

mat (np.ndarray) – Input array.

Returns

Estimated size of the sparse matrix.

Return type

int

discern.io.generate_h5ad(counts: Union[numpy.ndarray, Tuple[numpy.ndarray, numpy.ndarray]], var: pandas.core.frame.DataFrame, obs: pandas.core.frame.DataFrame, save_path: Optional[pathlib.Path] = None, threshold: float = 0.1, **kwargs) anndata._core.anndata.AnnData

Generate an AnnData object and optionally save it to file.

Parameters
  • counts (Union[np.ndarray, Tuple[np.ndarray, np.ndarray]]) – Count data (X in AnnData).

  • var (pd.DataFrame) – Variables dataframe.

  • obs (pd.DataFrame) – Observations dataframe.

  • save_path (Optional[pathlib.Path]) – Save path for the AnnData in h5py file. Defaults to None.

  • threshold (float) – Set values lower than threshold to zero. Defaults to 0.1.

  • kwargs – Keyword arguments passed to anndata.AnnData.

Returns

The AnnData file.

Return type

anndata.AnnData
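
A small sketch wrapping a toy count matrix into AnnData:

import numpy as np
import pandas as pd

from discern.io import generate_h5ad

counts = np.random.rand(10, 5).astype("float32")
var = pd.DataFrame(index=[f"gene_{i}" for i in range(5)])
obs = pd.DataFrame(index=[f"cell_{i}" for i in range(10)])

# Values below the threshold are set to zero; nothing is written because save_path is None.
adata = generate_h5ad(counts, var=var, obs=obs, threshold=0.1)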

discern.io.make_dataset_from_anndata(adata: anndata._core.anndata.AnnData, for_tfrecord: bool = False) Tuple[tensorflow.python.data.ops.dataset_ops.DatasetV2, tensorflow.python.data.ops.dataset_ops.DatasetV2]

Generate TensorFlow Dataset from AnnData object.

Parameters
  • adata (anndata.AnnData) – Input cells

  • for_tfrecord (bool) – Make output for writing TFRecords. Defaults to False.

Returns

The training and validation datasets

Return type

Tuple[tf.data.Dataset, tf.data.Dataset]

discern.io.np_one_hot(labels: pandas.core.arrays.categorical.Categorical) numpy.ndarray

One hot encode a numpy array.

Parameters

labels (pd.Categorical) – Integer values used as indices.

Returns

One-hot encoded labels.

Return type

np.ndarray
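
A short example (sketch):

import pandas as pd

from discern.io import np_one_hot

labels = pd.Categorical(["batch_a", "batch_b", "batch_a"])
one_hot = np_one_hot(labels)   # one-hot matrix: one row per cell, one column per category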

discern.io.parse_tfrecords(tfr_files: Union[pathlib.Path, List[pathlib.Path]], genes_no: int, n_labels: int) tensorflow.python.data.ops.dataset_ops.DatasetV2

Generate TensorFlow dataset from TensorFlow records file(s).

Parameters
  • tfr_files (Union[pathlib.Path, List[pathlib.Path]]) – TFRecord file(s).

  • genes_no (int) – Number of genes in the TFRecords.

  • n_labels (int) – Number of batch labels

  • batch_size (int) – Size of one batch

Returns

Dataset containing ‘input_data’, ‘batch_input_enc’ and ‘batch_input_dec’.

Return type

tf.data.Dataset

Preprocessing functions

Contains the GeneMatrix class, used to represent the scRNA-seq data.

class discern.preprocessing.WAERecipe(params: Dict[str, Any], inputs: Optional[Dict[str, anndata._core.anndata.AnnData]] = None, input_files: Optional[Union[Dict[pathlib.Path, str], List[pathlib.Path]]] = None, n_jobs: int = - 1)

For storing and processing data.

Can apply filtering, clustering, merging and splitting.

Parameters
  • params (Dict[str,Any]) – Default parameters for preprocessing.

  • inputs (Dict[str, anndata.AnnData]) – Input AnnData with batch name as dict key. Defaults to None.

  • input_files (List[pathlib.Path]) – Paths to raw input data. Defaults to None.

  • n_jobs (int) – Number of jobs/processes to use. Defaults to -1.

sc_raw

Read and concatenated input data.

Type

io.DISCERNData

config

Parameters calculated during preprocessing.

Type

Dict[str, Any]

params

Default parameters for preprocessing.

Type

Dict[str,Any]

celltypes()

Aggregate celltype information.

property config

Configuration from preprocessing.

dump(job_dir: pathlib.Path)

Dump recipe results to directory.

Parameters

job_dir (pathlib.Path) – The directory to save the results at.

dump_tf_records(path: pathlib.Path)

Dump the TFRecords to disk.

Parameters

path (pathlib.Path) – Folder to save the TFrecords in.

filtering(min_genes: int, min_cells: int)

Apply filtering in-place.

Parameters
  • min_genes (int) – Minimum number of genes to be present for cell to be considered.

  • min_cells (int) – Minimum number of cells to be present for gene to be considered.

classmethod from_path(job_dir: pathlib.Path) discern.preprocessing.WAERecipe

Create WAERecipe from DISCERN directory.

Returns

The initialized object.

Return type

WAERecipe

kernel_mmd(neighbors_mmd: int = 50, no_cells_mmd: int = 2000)

Apply kernel mmd metrics based on nearest neighbors in-place.

Parameters
  • neighbors_mmd (int) – Number of neighbors. Defaults to 50.

  • no_cells_mmd (int) – Number of cells used for calculation of mmd. Defaults to 2000.

  • projector (Optional[np.ndarray]) – PCA projector to compute distances in precomputed PCA space. Defaults to None.

mean_var_scaling()

Apply Mean-Variance scaling if ‘fixed_scaling’ is present in params.

projection_pca(pcs: int = 25)

Apply PCA projection.

Parameters

pcs (int) – Number of principal components. Defaults to 25.

scaling(scale: int)

Apply scaling in-place.

Parameters

scale (int) – Value used to scale with LSN.

split(split_seed: int, valid_cells_ratio: Union[int, float], mmd_cells_ratio: Union[int, float] = 1.0)

Split cells to train and validation set.

Parameters
  • split_seed (int) – Seed used with numpy.

  • valid_cells_ratio (Union[int,float]) – Number or ratio of cells in the validation set.

  • mmd_cells_ratio (Union[int, float]) – Number of validation cells to use for MMD calculation during hyperparameter optimization. Defaults to 1.0, which corresponds to valid_cells_no.

discern.preprocessing.merge_data_sets(raw_inputs: Dict[str, anndata._core.anndata.AnnData], batch_keys: Dict[str, str]) Tuple[anndata._core.anndata.AnnData, Dict[int, str]]

Merge a dictionary of AnnData files to a single AnnData object.

Parameters

raw_inputs (Dict[str, anndata.AnnData]) – Names and AnnData objects.

Returns

Merged AnnData and mapping from codes to names.

Return type

Tuple[anndata.AnnData, Dict[int, str]]

discern.preprocessing.read_process_serialize(job_path: pathlib.Path, with_tfrecords: bool = True)

Read data, preprocess it and write the output as anndata.AnnData and TFRecords.

Parameters
  • job_path (pathlib.Path) – Path of the experiments folder.

  • with_tfrecords (bool) – Write TFRecord files. Defaults to True.

discern.preprocessing.read_raw_input(file_path: pathlib.Path) anndata._core.anndata.AnnData

Read input and convert it to an anndata.AnnData object.

Currently h5, h5ad, loom, txt and a directory with matrix.mtx, genes.tsv and optional barcodes.tsv are supported.

Parameters

file_path (pathlib.Path) – (File-) Path to the input data.

Returns

The read AnnData object.

Return type

anndata.AnnData

Raises

ValueError – Datatype of the input could not be inferred.
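
A sketch with a hypothetical input path:

import pathlib

from discern.preprocessing import read_raw_input

adata = read_raw_input(pathlib.Path("data/raw_counts.h5ad"))
print(adata)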

Online/Incremental learning

Module for supporting online learning.

class discern.online_learning.OnlineDISCERNRunner(debug: bool, gpus: List[int])

DISCERNRunner supporting online learning.

class discern.online_learning.OnlineWAERecipe(reference_adata: discern.io.DISCERNData, *args, **kwargs)

WAERecipe for the online setting.

This class applies the same preprocessing as done for the reference data, which allows further processing with DISCERN online learning.

celltypes()

Aggregate celltype information.

property config

Configuration from preprocessing.

dump(job_dir: pathlib.Path)

Dump recipe results to directory.

Parameters

job_dir (pathlib.Path) – The directory to save the results at.

dump_tf_records(path: pathlib.Path)

Dump the TFRecords to disk.

Parameters

path (pathlib.Path) – Folder to save the TFrecords in.

filtering(min_genes: int, *unused_args, **unused_kwargs)

Apply filtering in-place.

Parameters

min_genes (int) – Minimum number of genes to be present for cell to be considered.

fix_batch_labels()

Fix batch label codes by including old categories.

classmethod from_path(job_dir: pathlib.Path) discern.preprocessing.WAERecipe

Create WAERecipe from DISCERN directory.

Returns

The initialized object.

Return type

WAERecipe

kernel_mmd(neighbors_mmd: int = 50, no_cells_mmd: int = 2000)

Apply kernel mmd metrics based on nearest neighbors in-place.

Parameters
  • neighbors_mmd (int) – Number of neighbors. Defaults to 50.

  • no_cells_mmd (int) – Number of cells used for calculation of mmd. Defaults to 2000.

  • projector (Optional[np.ndarray]) – PCA projector to compute distances in precomputed PCA space. Defaults to None.

mean_var_scaling()

Apply Mean-Variance scaling if ‘fixed_scaling’ is present.

projection_pca(pcs: int = 25)

Apply PCA projection from reference.

Parameters

pcs (int) – Number of principal components. Defaults to 25.

scaling(scale: int)

Apply scaling in-place.

Parameters

scale (int) – Value used to scale with LSN.

split(split_seed: int, valid_cells_ratio: Union[int, float], mmd_cells_ratio: Union[int, float] = 1.0)

Split cells to train and validation set.

Parameters
  • split_seed (int) – Seed used with numpy.

  • valid_cells_ratio (Union[int,float]) – Number or ratio of cells in the validation set.

  • mmd_cells_ratio (Union[int, float]) – Number of validation cells to use for MMD calculation during hyperparameter optimization. Defaults to 1.0, which corresponds to valid_cells_no.

discern.online_learning.online_training(exp_folder: pathlib.Path, filename: pathlib.Path, freeze: bool)

Continue running an experiment.

Parameters
  • exp_folder (pathlib.Path) – Experiment folder.

  • filename (pathlib.Path) – Input path for new data set.

  • freeze (bool) – Freeze non-conditional layers.
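
A sketch of continuing training on a new dataset with frozen non-conditional layers; both paths are hypothetical:

import pathlib

from discern.online_learning import online_training

online_training(exp_folder=pathlib.Path("exp/example_run"),
                filename=pathlib.Path("data/new_batch.h5ad"),
                freeze=True)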

discern.online_learning.save_data(file: pathlib.Path, data: discern.io.DISCERNData, old_data: discern.io.DISCERNData)

Save data by concatenating to reference.

discern.online_learning.update_model(old_model: tensorflow.python.keras.engine.training.Model, new_model: tensorflow.python.keras.engine.training.Model, freeze_unchanged: bool = False) tensorflow.python.keras.engine.training.Model

Update the weights from an old model to a new model.

The new model can have a larger weight size for the layers in the first dimension.

Parameters
  • old_model (tf.keras.Model) – Old, possibly trained model.

  • new_model (tf.keras.Model) – New model, for which the weights should be set (inplace).

  • freeze_unchanged (bool, optional) – Freeze layers in the new model whose weights did not change in size compared to the old model. Defaults to False.

Returns

The updated new model.

Return type

tf.keras.Model

Other functions

Module containing diverse TensorFlow related functions.

discern.functions.get_function_by_name(func_str: str) Callable[..., Any]

Get a function by its name.

Parameters

func_str (str) – Name of function including module, like ‘tensorflow.nn.softplus’.

Returns

the function.

Return type

Callable[Any]

Raises

KeyError – Function does not exist.
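
For example, using the function name given above:

import tensorflow as tf

from discern.functions import get_function_by_name

softplus = get_function_by_name("tensorflow.nn.softplus")
print(softplus(tf.constant([-1.0, 0.0, 1.0])))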

discern.functions.getmembers(name: str) Dict[str, Any]

Return a dictionary of all classes defined in this module.

Parameters

name (str) – Name of the module. Usually __name__.

Returns

Name and class of module.

Return type

Dict[str,Any]

discern.functions.parse_mean_var(features: pandas.core.frame.DataFrame, scalings: Dict[str, Union[str, float]]) Tuple[numpy.ndarray, numpy.ndarray]

Get mean and variance from anndata.var and scaling dict.

Parameters
  • features (pd.DataFrame) – anndata.var.

  • scalings (Dict[str, Union[str, float]]) – Scalings dict.

Returns

Mean and variance

Return type

Tuple[np.ndarray, np.ndarray]

discern.functions.prepare_train_valid(input_tfr: pathlib.Path) Tuple[pathlib.Path, pathlib.Path]

Get all filenames for train and validation files.

Parameters

input_tfr (pathlib.Path) – Name of input directory.

Returns

Train and validation files in separate lists.

Return type

Tuple[List[pathlib.Path], List[pathlib.Path]]

discern.functions.rescale_by_params(adata: anndata._core.anndata.AnnData, scalings: Dict[str, Union[str, float, int]]) anndata._core.anndata.AnnData

Rescale counts by fixed mean and variance (inplace).

Reverts the function scale_by_params.

Parameters
  • adata (anndata.AnnData) – Data to be rescaled.

  • scalings (Dict[str, Union[str, float, int]]) – Mean and scale values used for rescaling. Can be numeric or ‘genes’. ‘genes’ means using precomputed values in the AnnData object, like adata.var[‘mean_scaling’] and adata.var[‘var_scaling’], respectively.

Raises

ValueError – If mean and variance are not numeric or ‘genes’.

Returns

The rescaled AnnData object

Return type

anndata.AnnData

discern.functions.sample_counts(counts: numpy.ndarray, probabilities: numpy.ndarray, var: Optional[pandas.core.frame.DataFrame] = None, uns: Optional[Dict[str, Any]] = None) numpy.ndarray

Sample counts using probabilities.

Parameters
  • counts (np.ndarray) – Count data.

  • probabilities (np.ndarray) – Probability of being non-zero.

Returns

Sampled count data.

Return type

np.ndarray

discern.functions.scale(adata: anndata._core.anndata.AnnData, mean: Optional[Union[numpy.ndarray, float]] = None, var: Optional[Union[numpy.ndarray, float]] = None) anndata._core.anndata.AnnData

Scale counts by fixed mean and variance.

Parameters
  • adata (anndata.AnnData) – Data to be scaled.

  • mean (Optional[np.ndarray]) – Mean for scaling (will be zero-centered). Defaults to None

  • var (Optional[np.ndarray]) – Variance for scaling (will be rescaled to 1). Defaults to None

Returns

The AnnData file.

Return type

anndata.AnnData

discern.functions.scale_by_params(adata: anndata._core.anndata.AnnData, scalings: Dict[str, Union[str, float]]) anndata._core.anndata.AnnData

Scale counts by fixed mean and variance (inplace).

Parameters
  • adata (anndata.AnnData) – Data to be scaled.

  • scalings (Dict[str, Union[str, float]]) – Mean and scale values used for scaling. Can be numeric or ‘genes’. ‘genes’ means using precomputed values in the AnnData object, like adata.var[‘mean_scaling’] and adata.var[‘var_scaling’], respectively.

Raises

ValueError – If mean and variance are not numeric or ‘genes’.

Returns

The scaled AnnData object

Return type

anndata.AnnData

discern.functions.set_gpu_and_threads(n_threads: int, gpus: Optional[List[int]])

Limit CPU and GPU usage.

Parameters
  • n_threads (int) – Number of threads to use (split between inter- and intra-op threads). Can be disabled by passing 0.

  • gpus (List[int]) – List of GPUs to use. Use all GPUs by passing None and no GPUs by passing an empty list.

Module for fast MMD calculations.

discern.mmd.mmd_loss(random_cells: numpy.ndarray, valid_cells: numpy.ndarray, sigma: float) float

Compute mmd loss between random cells and valid cells.

Parameters
  • random_cells (np.ndarray) – Random generated cells.

  • valid_cells (np.ndarray) – Valid (decoded) cells.

  • sigma (float) – Precalculated Sigma value.

Returns

MMD loss between random and valid cells.

Return type

float
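
A small sketch on toy arrays; the sigma value is arbitrary:

import numpy as np

from discern.mmd import mmd_loss

rng = np.random.default_rng(0)
random_cells = rng.normal(size=(100, 16)).astype("float32")
valid_cells = rng.normal(loc=0.5, size=(100, 16)).astype("float32")

print(mmd_loss(random_cells, valid_cells, sigma=1.0))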