API¶

DISCERN¶

DISCERN for expression reconstruction.

class discern.DISCERN(**kwargs)¶

Basic DISCERN model holding a lot of configuration.

Parameters: **kwargs – DISCERNConfig init args.

wae_model¶

Keras model.

Type: Union[None, tf.keras.Model]

start_step¶

Epoch to start training from

Type: int

build_model(n_genes: int, n_labels: int, scale: float)¶: Initialize the auto-encoder model and defining the loss and optimizer.

compile(optimizer: tensorflow.python.keras.optimizer_v2.optimizer_v2.OptimizerV2, scale: float = 15000.0)¶

Compile the model and sets losses and metrics.

Parameters

optimizer (tf.keras.optimizers.Optimizer) – Optimizer to use.
scale (float) – Numeric scaling factor for the losses. Defaults to 15000.

property decoder: tensorflow.python.keras.engine.training.Model¶

Return the decoder.

Returns

The decoder model.

Return type

tf.keras.Model

Raises

ValueError – If the decoder is not present.
AttributeError – If the model is not build.

property encoder: tensorflow.python.keras.engine.training.Model¶

Return the encoder.

Returns

The encoder model.

Return type

tf.keras.Model

Raises

ValueError – If the encoder is not present.
AttributeError – If the model is not build.

classmethod from_json(jsondata: Dict[str, Any]) → DISCERNConfigType¶

Create an DISCERNConfig instance form hyperparameter json dictionary.

Parameters: jsondata (Dict[str, Any]) – Hyperparameters for this model.
Returns: An initialized DISCERNConfig instance
Return type: “DISCERNConfig”
Raises: KeyError – If required key not found in hyperparameter json.

generate_cells_from_latent(latent_codes: Union[tensorflow.python.framework.ops.Tensor, numpy.ndarray], output_batch_labels: Union[tensorflow.python.framework.ops.Tensor, numpy.ndarray], batch_size: int) → Tuple[numpy.ndarray, numpy.ndarray]¶

Generate counts from latent codes and batch labels.

Parameters

latent_codes (Union[tf.Tensor, np.ndarray]) – Latent codes produced by encoder.
output_batch_labels (Union[tf.Tensor, np.ndarray]) – (One Hot) Encoded batch labels for the output. Can also be continous for fuzzy batch association.
batch_size (int) – Size of one batch.

Returns

the generated count data and dropout probabilities.

Return type

Tuple[np.ndarray, np.ndarray]

generate_latent_codes(counts: Union[tensorflow.python.framework.ops.Tensor, numpy.ndarray], batch_labels: Union[tensorflow.python.framework.ops.Tensor, numpy.ndarray], batch_size: int) → Tuple[numpy.ndarray, numpy.ndarray]¶

Generate latent codes from count and batch labels.

Parameters

counts (Union[tf.Tensor, np.ndarray]) – Count data.
batch_labels (Union[tf.Tensor, np.ndarray]) – (One Hot) Encoded batch labels. Can also be continous for fuzzy batch association.
batch_size (int) – Size of one batch.

Returns

latent codes and sigma values.

Return type

Tuple[np.ndarray, np.ndarray]

get_optimizer() → tensorflow.python.keras.optimizer_v2.optimizer_v2.OptimizerV2¶

Create an Optimizer instance.

Returns: The created optimizer.
Return type: tf.keras.optimizers.Optimizer

project_to_metadata(input_data: discern.io.DISCERNData, metadata: List[Tuple[str, str]], save_path: pathlib.Path, store_sigmas: bool = False)¶

Project to average batch with filtering for certain metadata.

Parameters

input_data (io.DISCERNData) – Input cells.
metadata (List[Tuple[str, str]]) – Column-value-Pair used for filerting the cells. Column should match to name in input_data.obs and value to a key in this column.
save_path (pathlib.Path) – Path for saving the created AnnData objects.
store_sigmas (bool, optional) – Save sigmas in obsm. Defaults to False.

reconstruct(input_data: discern.io.DISCERNData, column: Optional[str], column_value: Optional[str], store_sigmas: bool = False) → anndata._core.anndata.AnnData¶

Reconstruct expression data.

Parameters

input_data (io.DISCERNData) – DISCERN preprocessed input data
column (Optional[str]) – Column value used for reconstruction. If None just auto-encode the data.
column_value (Optional[str], optional) – Value in column used for reconstruction. If None, project to the average from column.
store_sigmas (bool, optional) – Store latent space sigma values in final output. Defaults to False.

Returns

Reconstructed expression data with: input data in raw and DISCERN latent space in obsm.

Return type

anndata.AnnData

restore_model(directory: pathlib.Path)¶

Restores model from hdf5 checkpoint and compiles it.

Parameters: directory (pathlib.Path) – checkpoint directory.

training(inputdata: discern.io.DISCERNData, callbacks: Optional[List[tensorflow.python.keras.callbacks.Callback]] = None, savepath: Optional[pathlib.Path] = None, max_steps: int = 25) → Dict[str, float]¶

Train the network max_steps times.

Parameters

inputdata (io.DISCERNData) – Training data.
max_steps (int) – Maximum number of epochs to train. Defaults to 25.
callbacks – (List[tf.keras.callbacks.Callback], optional): List of keras callbacks to use. Defaults to None.
savepath (pathlib.Path, optional) – Filename to save model. Defaults to None.

Returns

Metrics from fit method.

Return type

Dict[str,float]

class discern.WAERecipe(params: Dict[str, Any], inputs: Optional[Dict[str, anndata._core.anndata.AnnData]] = None, input_files: Optional[Union[Dict[pathlib.Path, str], List[pathlib.Path]]] = None, n_jobs: int = - 1)¶

For storing and processing data.

Can apply filtering, clustering. merging and splitting.

Parameters

params (Dict[str,Any]) – Default parameters for preprocessing.
inputs (Dict[str,anndata.AnnData]) – Input AnnData with batchname as dict-key. Defaults to None.
input_files (List[pathlib.Path]) – Paths to raw input data.
None. (Defaults to) –
n_jobs (int) – Number of jobs/processes to use. Defaults to -1.

sc_raw¶

Read and concatenated input data.

Type: io.DISCERNData

config¶

Parameters calculated during preprocessing.

Type: Dict[str, Any]

params¶

Default parameters for preprocessing.

Type: Dict[str,Any]

celltypes()¶: Aggregate celltype information.

property config¶: Configuration from preprocessing.

dump(job_dir: pathlib.Path)¶

Dump recipe results to directory.

Parameters: job_dir (pathlib.Path) – The directory to save the results at.

dump_tf_records(path: pathlib.Path)¶

Dump the TFRecords to disk.

Parameters: path (pathlib.Path) – Folder to save the TFrecords in.

filtering(min_genes: int, min_cells: int)¶

Apply filtering in-place.

Parameters

min_genes (int) – Minimum number of genes to be present for cell to be considered.
min_cells (int) – Minimum number of cells to be present for gene to be considered.

classmethod from_path(job_dir: pathlib.Path) → discern.preprocessing.WAERecipe¶

Create WAERecipe from DISCERN directory.

Returns: The initalized object.
Return type: WAERecipe

kernel_mmd(neighbors_mmd: int = 50, no_cells_mmd: int = 2000)¶

Apply kernel mmd metrics based on nearest neighbors in-place.

Parameters

neighbors_mmd (int) – Number of neighbors Defaults to 50.
no_cells_mmd (int) – Number of cells used for calculation of mmd. Defaults to 2000.
projector (Optional[np.ndarray]) – PCA-Projector to compute distancs in precomputed PCA space. Defaults to None.

mean_var_scaling()¶: Apply Mean-Variance scaling if ‘fixed_scaling’ is present in params.

projection_pca(pcs: int = 25)¶

Apply PCA projection.

Parameters: pcs (int) – Number of principle components. Defaults to 32.

scaling(scale: int)¶

Apply scaling in-place.

Parameters: scale (int) – Value use to scale with LSN.

split(split_seed: int, valid_cells_ratio: Union[int, float], mmd_cells_ratio: Union[int, float] = 1.0)¶

Split cells to train and validation set.

Parameters

split_seed (int) – Seed used with numpy.
valid_cells_ratio (Union[int,float]) – Number or ratio of cells in the validation set.
mmd_cells_ratio (Optional[Union[int, float]]) – Number of validation
optimization. (cells to use for mmd calculation during hyperparameter) – Defaults to 1. which is valid_cells_no.

Reconstruction functions¶

Basic module containing all functions for running and execution of Model.

class discern.estimators.DISCERNRunner(debug: bool = False, gpus: Optional[List[int]] = None)¶: Run DISCERN training or project.

Basic DISCERN architecture.

class discern.estimators.batch_integration.DISCERN(**kwargs)¶

Basic DISCERN model holding a lot of configuration.

Parameters: **kwargs – DISCERNConfig init args.

wae_model¶

Keras model.

Type: Union[None, tf.keras.Model]

start_step¶

Epoch to start training from

Type: int

build_model(n_genes: int, n_labels: int, scale: float)¶: Initialize the auto-encoder model and defining the loss and optimizer.

compile(optimizer: tensorflow.python.keras.optimizer_v2.optimizer_v2.OptimizerV2, scale: float = 15000.0)¶

Compile the model and sets losses and metrics.

Parameters

optimizer (tf.keras.optimizers.Optimizer) – Optimizer to use.
scale (float) – Numeric scaling factor for the losses. Defaults to 15000.

property decoder: tensorflow.python.keras.engine.training.Model¶

Return the decoder.

Returns

The decoder model.

Return type

tf.keras.Model

Raises

ValueError – If the decoder is not present.
AttributeError – If the model is not build.

property encoder: tensorflow.python.keras.engine.training.Model¶

Return the encoder.

Returns

The encoder model.

Return type

tf.keras.Model

Raises

ValueError – If the encoder is not present.
AttributeError – If the model is not build.

classmethod from_json(jsondata: Dict[str, Any]) → DISCERNConfigType¶

Create an DISCERNConfig instance form hyperparameter json dictionary.

Parameters: jsondata (Dict[str, Any]) – Hyperparameters for this model.
Returns: An initialized DISCERNConfig instance
Return type: “DISCERNConfig”
Raises: KeyError – If required key not found in hyperparameter json.

generate_cells_from_latent(latent_codes: Union[tensorflow.python.framework.ops.Tensor, numpy.ndarray], output_batch_labels: Union[tensorflow.python.framework.ops.Tensor, numpy.ndarray], batch_size: int) → Tuple[numpy.ndarray, numpy.ndarray]¶

Generate counts from latent codes and batch labels.

Parameters

latent_codes (Union[tf.Tensor, np.ndarray]) – Latent codes produced by encoder.
output_batch_labels (Union[tf.Tensor, np.ndarray]) – (One Hot) Encoded batch labels for the output. Can also be continous for fuzzy batch association.
batch_size (int) – Size of one batch.

Returns

the generated count data and dropout probabilities.

Return type

Tuple[np.ndarray, np.ndarray]

generate_latent_codes(counts: Union[tensorflow.python.framework.ops.Tensor, numpy.ndarray], batch_labels: Union[tensorflow.python.framework.ops.Tensor, numpy.ndarray], batch_size: int) → Tuple[numpy.ndarray, numpy.ndarray]¶

Generate latent codes from count and batch labels.

Parameters

counts (Union[tf.Tensor, np.ndarray]) – Count data.
batch_labels (Union[tf.Tensor, np.ndarray]) – (One Hot) Encoded batch labels. Can also be continous for fuzzy batch association.
batch_size (int) – Size of one batch.

Returns

latent codes and sigma values.

Return type

Tuple[np.ndarray, np.ndarray]

get_optimizer() → tensorflow.python.keras.optimizer_v2.optimizer_v2.OptimizerV2¶

Create an Optimizer instance.

Returns: The created optimizer.
Return type: tf.keras.optimizers.Optimizer

project_to_metadata(input_data: discern.io.DISCERNData, metadata: List[Tuple[str, str]], save_path: pathlib.Path, store_sigmas: bool = False)¶

Project to average batch with filtering for certain metadata.

Parameters

input_data (io.DISCERNData) – Input cells.
metadata (List[Tuple[str, str]]) – Column-value-Pair used for filerting the cells. Column should match to name in input_data.obs and value to a key in this column.
save_path (pathlib.Path) – Path for saving the created AnnData objects.
store_sigmas (bool, optional) – Save sigmas in obsm. Defaults to False.

reconstruct(input_data: discern.io.DISCERNData, column: Optional[str], column_value: Optional[str], store_sigmas: bool = False) → anndata._core.anndata.AnnData¶

Reconstruct expression data.

Parameters

input_data (io.DISCERNData) – DISCERN preprocessed input data
column (Optional[str]) – Column value used for reconstruction. If None just auto-encode the data.
column_value (Optional[str], optional) – Value in column used for reconstruction. If None, project to the average from column.
store_sigmas (bool, optional) – Store latent space sigma values in final output. Defaults to False.

Returns

Reconstructed expression data with: input data in raw and DISCERN latent space in obsm.

Return type

anndata.AnnData

restore_model(directory: pathlib.Path)¶

Restores model from hdf5 checkpoint and compiles it.

Parameters: directory (pathlib.Path) – checkpoint directory.

training(inputdata: discern.io.DISCERNData, callbacks: Optional[List[tensorflow.python.keras.callbacks.Callback]] = None, savepath: Optional[pathlib.Path] = None, max_steps: int = 25) → Dict[str, float]¶

Train the network max_steps times.

Parameters

inputdata (io.DISCERNData) – Training data.
max_steps (int) – Maximum number of epochs to train. Defaults to 25.
callbacks – (List[tf.keras.callbacks.Callback], optional): List of keras callbacks to use. Defaults to None.
savepath (pathlib.Path, optional) – Filename to save model. Defaults to None.

Returns

Metrics from fit method.

Return type

Dict[str,float]

Module for custom callbacks, especially visualization(UMAP).

class discern.estimators.callbacks.DelayedEarlyStopping(delay: int = 0, monitor: str = 'val_loss', min_delta: float = 0.0, patience: int = 0, verbose: int = 0, mode: str = 'auto', baseline: Optional[float] = None, restore_best_weights: bool = False)¶

Stop when a monitored quantity has stopped improving after some delay time. :param delay: Number of epochs to wait until applying early stopping. :type delay: int :param Defaults to 0: :param which means standard early stopping.: :param monitor: Quantity to be monitored. :type monitor: str :param min_delta: Minimum change in the monitored quantity :type min_delta: float :param to qualify as an improvement: :param i.e. an absolute: :param change of less than min_delta: :param will count as no: :param improvement. Defaults to val_loss.: :param patience: Number of epochs with no improvement :type patience: int :param after which training will be stopped. Defaults to 0.: :param verbose: verbosity mode. Defaults to 0. :type verbose: int :param mode: One of {“auto”, “min”, “max”}. In min mode, :type mode: str :param training will stop when the quantity: :param monitored has stopped decreasing; in max: :param mode it will stop when the quantity: :param monitored has stopped increasing; in auto: :param mode: :param the direction is automatically inferred: :param from the name of the monitored quantity. Defaults to auto.: :param baseline: Baseline value for the monitored quantity. :type baseline: float, optional :param Training will stop if the model doesn’t show improvement over the: :param baseline. Defaults to None.: :param restore_best_weights: Whether to restore model weights from :type restore_best_weights: bool :param the epoch with the best value of the monitored quantity.: :param If False: :param the model weights obtained at the last step of: :param training are used. Defaults to False.:

on_batch_begin(batch, logs=None)¶: A backwards compatibility alias for on_train_batch_begin.

on_batch_end(batch, logs=None)¶: A backwards compatibility alias for on_train_batch_end.

on_epoch_begin(epoch, logs=None)¶

Called at the start of an epoch.

Subclasses should override for any actions to run. This function should only be called during TRAIN mode.

Parameters

epoch – integer, index of epoch.
logs – dict. Currently no data is passed to this argument for this method but that may change in the future.

on_epoch_end(epoch: int, logs: Optional[Dict[str, Any]] = None)¶: Call on epoch end to check for early stopping.

on_predict_batch_begin(batch, logs=None)¶

Called at the beginning of a batch in predict methods.

Subclasses should override for any actions to run.

Parameters

batch – integer, index of batch within the current epoch.
logs – dict. Has keys batch and size representing the current batch number and the size of the batch.

on_predict_batch_end(batch, logs=None)¶

Called at the end of a batch in predict methods.

Subclasses should override for any actions to run.

Parameters

batch – integer, index of batch within the current epoch.
logs – dict. Metric results for this batch.

on_predict_begin(logs=None)¶

Called at the beginning of prediction.

Subclasses should override for any actions to run.

Parameters: logs – dict. Currently no data is passed to this argument for this method but that may change in the future.

on_predict_end(logs=None)¶

Called at the end of prediction.

Subclasses should override for any actions to run.

Parameters: logs – dict. Currently no data is passed to this argument for this method but that may change in the future.

on_test_batch_begin(batch, logs=None)¶

Called at the beginning of a batch in evaluate methods.

Also called at the beginning of a validation batch in the fit methods, if validation data is provided.

Subclasses should override for any actions to run.

Parameters

batch – integer, index of batch within the current epoch.
logs – dict. Has keys batch and size representing the current batch number and the size of the batch.

on_test_batch_end(batch, logs=None)¶

Called at the end of a batch in evaluate methods.

Also called at the end of a validation batch in the fit methods, if validation data is provided.

Subclasses should override for any actions to run.

Parameters

batch – integer, index of batch within the current epoch.
logs – dict. Metric results for this batch.

on_test_begin(logs=None)¶

Called at the beginning of evaluation or validation.

Subclasses should override for any actions to run.

Parameters: logs – dict. Currently no data is passed to this argument for this method but that may change in the future.

on_test_end(logs=None)¶

Called at the end of evaluation or validation.

Subclasses should override for any actions to run.

Parameters: logs – dict. Currently no data is passed to this argument for this method but that may change in the future.

on_train_batch_begin(batch, logs=None)¶

Called at the beginning of a training batch in fit methods.

Subclasses should override for any actions to run.

Parameters

batch – integer, index of batch within the current epoch.
logs – dict. Has keys batch and size representing the current batch number and the size of the batch.

on_train_batch_end(batch, logs=None)¶

Called at the end of a training batch in fit methods.

Subclasses should override for any actions to run.

Parameters

batch – integer, index of batch within the current epoch.
logs – dict. Metric results for this batch.

on_train_begin(logs=None)¶

Called at the beginning of training.

Subclasses should override for any actions to run.

Parameters: logs – dict. Currently no data is passed to this argument for this method but that may change in the future.

on_train_end(logs=None)¶

Called at the end of training.

Subclasses should override for any actions to run.

Parameters: logs – dict. Currently no data is passed to this argument for this method but that may change in the future.

class discern.estimators.callbacks.VisualisationCallback(outdir: Union[str, pathlib.Path], data: anndata._core.anndata.AnnData, batch_size: int, freq: int = 10)¶

Redo prediction on datasets and visualize via UMAP.

Parameters

outdir (pathlib.Path) – Output directory for the figures.
data (anndata.AnnData) – Input cells.
batch_size (int) – Numer of cells to visualize.
freq (int) – Frequency for computing visualisations in epochs. Defaults 10.

on_batch_begin(batch, logs=None)¶: A backwards compatibility alias for on_train_batch_begin.

on_batch_end(batch, logs=None)¶: A backwards compatibility alias for on_train_batch_end.

on_epoch_begin(epoch, logs=None)¶

Called at the start of an epoch.

Subclasses should override for any actions to run. This function should only be called during TRAIN mode.

Parameters

epoch – integer, index of epoch.
logs – dict. Currently no data is passed to this argument for this method but that may change in the future.

on_epoch_end(epoch: int, logs: Optional[Dict[str, float]] = None)¶

Run on epoch end. Executes only at specified frequency.

Parameters

epoch (int) – Epochnumber.
logs (Optional[Dict[str, float]]) – losses and metrics passed by tensorflow fit . Defaults to None.

on_predict_batch_begin(batch, logs=None)¶

Called at the beginning of a batch in predict methods.

Subclasses should override for any actions to run.

Parameters

batch – integer, index of batch within the current epoch.
logs – dict. Has keys batch and size representing the current batch number and the size of the batch.

on_predict_batch_end(batch, logs=None)¶

Called at the end of a batch in predict methods.

Subclasses should override for any actions to run.

Parameters

batch – integer, index of batch within the current epoch.
logs – dict. Metric results for this batch.

on_predict_begin(logs=None)¶

Called at the beginning of prediction.

Subclasses should override for any actions to run.

Parameters: logs – dict. Currently no data is passed to this argument for this method but that may change in the future.

on_predict_end(logs=None)¶

Called at the end of prediction.

Subclasses should override for any actions to run.

Parameters: logs – dict. Currently no data is passed to this argument for this method but that may change in the future.

on_test_batch_begin(batch, logs=None)¶

Called at the beginning of a batch in evaluate methods.

Also called at the beginning of a validation batch in the fit methods, if validation data is provided.

Subclasses should override for any actions to run.

Parameters

batch – integer, index of batch within the current epoch.
logs – dict. Has keys batch and size representing the current batch number and the size of the batch.

on_test_batch_end(batch, logs=None)¶

Called at the end of a batch in evaluate methods.

Also called at the end of a validation batch in the fit methods, if validation data is provided.

Subclasses should override for any actions to run.

Parameters

batch – integer, index of batch within the current epoch.
logs – dict. Metric results for this batch.

on_test_begin(logs=None)¶

Called at the beginning of evaluation or validation.

Subclasses should override for any actions to run.

Parameters: logs – dict. Currently no data is passed to this argument for this method but that may change in the future.

on_test_end(logs=None)¶

Called at the end of evaluation or validation.

Subclasses should override for any actions to run.

Parameters: logs – dict. Currently no data is passed to this argument for this method but that may change in the future.

on_train_batch_begin(batch, logs=None)¶

Called at the beginning of a training batch in fit methods.

Subclasses should override for any actions to run.

Parameters

batch – integer, index of batch within the current epoch.
logs – dict. Has keys batch and size representing the current batch number and the size of the batch.

on_train_batch_end(batch, logs=None)¶

Called at the end of a training batch in fit methods.

Subclasses should override for any actions to run.

Parameters

batch – integer, index of batch within the current epoch.
logs – dict. Metric results for this batch.

on_train_begin(logs: Optional[Dict[str, float]] = None)¶

Run on training start.

Parameters: logs (Optional[Dict[str, float]]) – logs, not used only for compatibility reasons.

on_train_end(logs: Optional[Dict[str, float]] = None)¶

Run on training end.

Parameters: logs (Optional[Dict[str, float]]) – losses and metrics passed by tensorflow fit . Defaults to None.

discern.estimators.callbacks.create_callbacks(early_stopping_limits: Dict[str, Any], exp_folder: pathlib.Path, inputdata: Optional[discern.io.DISCERNData] = None, umap_cells_no: Optional[int] = None, profile_batch: int = 2, freq_of_viz: int = 30) → List[tensorflow.python.keras.callbacks.Callback]¶

Generate list of callbacks used by tensorflow model.fit.

Parameters

early_stopping_limits (Dict[str,Any) – Patience, min_delta, and delay for early stopping.
exp_folder (str) – Folder where everything is saved.
inputdata (io.DISCERNData, optional) – Input data to use. Defaults to None
umap_cells_no (int) – Number of cells for UMAP.
profile_batch (int) – Number of the batch to do extensive profiling. Defaults to 2. (see tf.keras.callbacks.Tensorboard)
freq_of_viz (int) – Frequency of visualization callback in epochs. Defaults to 30.

Returns

callbacks used by tensorflow model.fit.

Return type

List[callbacks.Callback]

Custom Keras Layers.

class discern.estimators.customlayers.GaussianReparametrization(trainable=True, name=None, dtype=None, dynamic=False, **kwargs)¶

Reparametrization layer using gaussians.

build(input_shape: Tuple[tensorflow.python.framework.ops.Tensor, tensorflow.python.framework.ops.Tensor])¶

Build the layer, usually automatically called at first call.

Parameters: input_shape (Tuple[tf.Tensor, tf.Tensor]) – Shape of the inputs. Both should have as last dimension the size of the latent space.

static call(inputs: Tuple[tensorflow.python.framework.ops.Tensor, tensorflow.python.framework.ops.Tensor], **kwargs: Dict[str, Any]) → tensorflow.python.framework.ops.Tensor¶

Call the layer.

Parameters

inputs (Tuple[tf.Tensor, tf.Tensor]) – latent codes and sigmas from encoder
**kwargs (Dict[str,Any]) – Additional attributes, should contain ‘training’”.

Returns

Rescaled latent codes.

Return type

tf.Tensor

class discern.estimators.customlayers.MMDPP(scale: float, **kwargs)¶

mmdpp penalty calculation in keras layer.

Parameters: scale (float) – Value used to scale the output.

scale¶

Value used to scale the output.

Type: float

build(input_shape: Tuple[tensorflow.python.framework.ops.Tensor, tensorflow.python.framework.ops.Tensor])¶

Build the layer, usually automatically called at first call.

Parameters: input_shape (Tuple[tf.Tensor, tf.Tensor]) – Shape of the inputs. Both shapes should have the size of the latent space as last dimension.

call(inputs: Tuple[tensorflow.python.framework.ops.Tensor, tensorflow.python.framework.ops.Tensor], **kwargs: Dict[str, Any]) → tensorflow.python.framework.ops.Tensor¶

Call the layer.

Parameters: inputs (Tuple[tf.Tensor, tf.Tensor]) – The latent codes and sigma values from encoder.
Returns: mmdpp penalty loss.
Return type: tf.Tensor

get_config() → Dict[str, Any]¶

Return configuration of the layer. Used for serialization.

Returns: Configuration of the layer
Return type: Dict[str,Any]

class discern.estimators.customlayers.SigmaRegularization(trainable=True, name=None, dtype=None, dynamic=False, **kwargs)¶

Regularization term to push sigmas near to one.

build(input_shape: tensorflow.python.framework.ops.Tensor)¶

Build the layer, usually automatically called at first call.

Parameters: input_shape (tf.Tensor) – Shape of the input.

call(inputs: tensorflow.python.framework.ops.Tensor, **kwargs: Dict[str, Any]) → tensorflow.python.framework.ops.Tensor¶

Call the layer.

Parameters: inputs (tf.Tensor) – Inputs to layer consisting of sigma values.
Returns: Regularization loss
Return type: tf.Tensor

discern.estimators.customlayers.condlayernorm(input_cells: tensorflow.python.framework.ops.Tensor, labels: tensorflow.python.framework.ops.Tensor, size: int, regularization: Optional[Dict[str, Any]] = None) → tensorflow.python.framework.ops.Tensor¶

Create a conditioning layer.

Parameters

input_cells (tf.Tensor) – Input to the laxer
labels (tf.Tensor) – Label for each sample.
size (int) – Size of the output/input

Returns

The output of the conditioning layer, with the same size: as the input and spezified in size.

Return type

tf.Tensor

discern.estimators.customlayers.getmembers() → Dict[str, tensorflow.python.keras.engine.base_layer.Layer]¶

Return a dictionary of all custom layers defined in this module.

Returns: Name and class of custom layers.
Return type: Dict[str, tf.keras.layers.Layer]

discern.estimators.customlayers.mmdpp_penalty(sample_qz: tensorflow.python.framework.ops.Tensor, sample_pz: tensorflow.python.framework.ops.Tensor, encoder_sigma: tensorflow.python.framework.ops.Tensor, total_number_cells: float, latent_dim: int) → tensorflow.python.framework.ops.Tensor¶

Calculate the mmdpp penalty.

Based on https://github.com/tolstikhin/wae/blob/master/improved_wae.py

Parameters

sample_qz (tf.Tensor) – Sample from the aggregated posterior.
sample_pz (tf.Tensor) – Sample from the prior.
encoder_sigma (tf.Tensor) – Sigma values from the random encoder.
total_number_cells (int) – Total number of samples for scaling.
latent_dim (int) – Dimension of the latent space.

Returns

mmdpp penalty loss.

Return type

tf.Tensor

Module containing all losses.

class discern.estimators.losses.DummyLoss(reduction: int = 'auto', name: str = 'Dummy')¶

Dummy loss simpy passing the input y_pred as loss output.

Parameters

reduction (int) – Reduction type to use. Defaults to tf.keras.losses.Reduction.AUTO.
name (str) – Name of the loss. Defaults to ‘Dummy’.

static call(y_true, y_pred)¶: Call the loss and returns the predicted value.

classmethod from_config(config)¶

Instantiates a Loss from its config (output of get_config()).

Parameters: config – Output of get_config().
Returns: A Loss instance.

class discern.estimators.losses.HuberLoss(delta=1.0, reduction='auto', name='huber_loss')¶

Huber loss.

call(y_true, y_pred)¶: Calculate Huber loss.

classmethod from_config(config)¶

Instantiates a Loss from its config (output of get_config()).

Parameters: config – Output of get_config().
Returns: A Loss instance.

class discern.estimators.losses.Lnorm(p: int, name: str = 'LNorm', reduction: str = 'auto', axis: int = 0, epsilon: float = 1e-20, use_root: bool = False)¶

Calculate the Lnorm of input and output.

Parameters

p (int) – Which Lnorm to calculate, for example p=1 means L1-Norm.
name (str) – Description of parameter name. Defaults to ‘LNorm’.
reduction (int) – Reduction type to use. Defaults to tf.keras.losses.Reduction.AUTO.
axis (int) – Axis on which the norm is calculated. Defaults to 0.
epsilon (float) – Small value to add if (square)root is used.. Defaults to 1e-20.
use_root (bool) – Use (square)root. Defaults to False.

pnorm¶

Which Lnorm to calculate, for example p=1 means L1-Norm.

Type: int

epsilon¶

Small value to add if (square)root is used.

Type: float

axis¶

Axis on which the norm is calculated.

Type: int

use_root¶

Use (square)root.

Type: bool

call(y_true, y_pred)¶: Call and returns the loss.

classmethod from_config(config)¶

Instantiates a Loss from its config (output of get_config()).

Parameters: config – Output of get_config().
Returns: A Loss instance.

get_config()¶: Serialize the loss.

class discern.estimators.losses.MaskedCrossEntropy(zeros: numpy.ndarray, zeros_eps: float = 1e-06, lower_label_smoothing: float = 0.0, **kwargs)¶

Categorical crossentropy Loss with creates mask in true data.

Parameters

zeros (np.ndarray) – Value(s) which represent values to be zeros.
zeros_eps (float) – Value to check for approximate matching to zeros.

call(y_true: tensorflow.python.framework.ops.Tensor, y_pred: tensorflow.python.framework.ops.Tensor) → tensorflow.python.framework.ops.Tensor¶: Call of the loss.

classmethod from_config(config)¶

Instantiates a Loss from its config (output of get_config()).

Parameters: config – Output of get_config().
Returns: A Loss instance.

get_config()¶: Return the configuration of the loss.

discern.estimators.losses.getmembers() → Dict[str, Union[tensorflow.python.keras.losses.Loss, tensorflow.python.keras.metrics.Metric]]¶

Return a dictionary of all custom losses and metrics defined in this module.

Returns: Name and class of custom losses and metrics.
Return type: Dict[str, Union[tf.keras.losses.Loss,tf.keras.metrics.Metric]]

discern.estimators.losses.reconstruction_loss(loss_type: Dict[str, Any]) → tensorflow.python.framework.ops.Tensor¶

Generate different loss classes based on dictionary.

Parameters: loss_type (Dict[str, Any]) – Dictionary with name as classname of the loss and all parameter to be set.
Returns: Calculated loss (object)
Return type: tf.Tensor
Raises: KeyError – When the loss name is not supported.

Basic module for running an experiment.

class discern.estimators.run_exp.CheckMetaData(dataframe: pandas.core.frame.DataFrame)¶

Check MetaData column value pair in dataframe lazy.

check(metadata_tuple: List[str]) → Tuple[str, str]¶

Check if column value pair is present.

Parameters: metadata_tuple (List[str]) – Column value pair
Returns: Input column value pair.
Return type: Tuple[str, str]

class discern.estimators.run_exp.DISCERNRunner(debug: bool = False, gpus: Optional[List[int]] = None)¶: Run DISCERN training or project.

discern.estimators.run_exp.run_exp_multiprocess(exp_folder: pathlib.Path, available_gpus: List[int], func: Callable[..., None], kwargs: Optional[Dict[str, Any]] = None) → int¶

Run an experiment with forced GPU setting (suitable for python mp).

Parameters

exp_folder (pathlib.Path) – Path to the experiement.
available_gpus (List[int]) – List of available GPUs.
func (Callable[..., None]) – Train or eval function.
kwargs (Optional[Dict[str, Any]]) – Additional arguments passed to the called functions. Defaults to None.

Returns

Status code, 0 is success, 1 is failure.

Return type

int

discern.estimators.run_exp.run_projection(exp_folder: pathlib.Path, metadata: List[str], infile: Optional[Union[str, pathlib.Path]], all_batches: bool, store_sigmas: bool)¶

Run projection to metadata on trained model.

Parameters

exp_folder (pathlib.Path) – Folder/ Experiment name to the trained model.
metadata (List[str]) – Metadata to use for integration. Should be like List[column name:value,…]
infile (Optional[Union[str, pathlib.Path]]) – Alternative input file.
all_batches – (bool): Project to all batches.
store_sigmas – (bool): Store sigmas after projection.

discern.estimators.run_exp.run_train(exp_folder: pathlib.Path, input_path: Optional[pathlib.Path] = None)¶

Run an experiment.

Parameters

exp_folder (pathlib.Path) – Experiments folders.
input_path (Optional[pathlib.Path]) – Input path for the TFRecords, if None the experiments folder is used. Defaults to None.

discern.estimators.run_exp.setup_exp(exp_folder: pathlib.Path) → Tuple[discern.estimators.batch_integration.DISCERN, Dict[str, Any]]¶

Setup experiment, by assigning the GPU and parsing the model.

Parameters: exp_folder (pathlib.Path) – Experiment folder.
Returns: The model, the output path for training and all parameters.
Return type: Tuple[batch_integration.DISCERN, pathlib.Path, Dict[str, Any]]

A number of classes and functions used across all types of models.

discern.estimators.utilities_wae.create_decoder(latent_dim: int, output_cells_dim: int, dec_layers: List[int], dec_norm_type: List[str], activation_fn: Callable[tensorflow.python.framework.ops.Tensor, tensorflow.python.framework.ops.Tensor], output_fn: Optional[str], n_labels: int, regularization: float, output_lsn: Optional[float] = None, conditional_regularization: Optional[Dict[str, Any]] = None) → tensorflow.python.keras.engine.training.Model¶

Create a decoder.

Parameters

latent_dim (int) – Dimension of the latent space.
output_cells_dim (int) – Dimension of the output.
dec_layers (List[int]) – Dimensions for the decoder layers.
dec_norm_type (List[str]) – Normalization type, eg. BatchNormalization.
activation_fn (Callable[[tf.Tensor], tf.Tensor]) – Activation function in the model.
output_fn (str) – Function to produce gene counts.
n_labels (int) – Number of labels for the batch labels.
regularization (float) – Dropout rate.
output_lsn (Optional[float]) – Scaling parameter, used for softmax and LSN.

Returns

The decoder.

Return type

tf.keras.Model

discern.estimators.utilities_wae.create_encoder(latent_dim: int, enc_layers: List[int], enc_norm_type: List[str], activation_fn: Callable[tensorflow.python.framework.ops.Tensor, tensorflow.python.framework.ops.Tensor], input_dim: int, n_labels: int, regularization: float, conditional_regularization: Optional[Dict[str, Any]] = None) → tensorflow.python.keras.engine.training.Model¶

Create an Encoder.

Parameters

latent_dim (int) – Dimension of the latent space.
enc_layers (List[int]) – Dimension of the encoding layers.
enc_norm_type (List[str]) – Normalization type, eg. BatchNormalization.
activation_fn (Callable[[tf.Tensor], tf.Tensor]) – Activation function in the model.
input_dim (int) – Dimension of the input.
n_labels (int) – Number of labels for the batch labels.
regularization (float) – Rate of dropout.

Returns

The encoder.

Return type

tf.keras.Model

Raises

NotImplementedError – If enc_norm_type is not understood.

discern.estimators.utilities_wae.create_model(encoder: tensorflow.python.keras.engine.training.Model, decoder: tensorflow.python.keras.engine.training.Model, total_number_cells: float, name: str = 'WAE') → tensorflow.python.keras.engine.training.Model¶

Generate a model from encoder and decoder, adding gaussian noise (reparametrization).

Parameters

encoder (tf.keras.Model) – The encoder.
decoder (tf.keras.Model) – The decoder.
total_number_cells (int) – Total number of cells used for scaling MMDPP.
name (str) – Name of the model. Defaults to “WAE”.

Returns

The created model including SigmaRegularization and MMDPP loss.

Return type

tf.keras.Model

discern.estimators.utilities_wae.load_model_from_directory(directory: pathlib.Path) → Tuple[Union[None, tensorflow.python.keras.engine.training.Model], int]¶

Load model from latest checkpoint using its hdf5 file.

Parameters

directory (pathlib.Path) – Name of the directory with hdf5 files.

Returns

Full model and last step.: None and zero if no models could be loaded.

Return type

Tuple[Union[None, tf.keras.Model], int]

I/O functions¶

discern i/o operations.

class discern.io.DISCERNData(adata: anndata._core.anndata.AnnData, batch_size: int, cachefile: Optional[Union[str, pathlib.Path]] = '')¶

DISCERNData for storing and reading inputs.

property batch_size: int¶: Get batch size.

property config: Dict[str, Any]¶: Get DISCERN data dependent configuration.

classmethod from_folder(folder: pathlib.Path, batch_size: int, cachefile: Optional[Union[str, pathlib.Path]] = '') → discern.io.DISCERNData¶

Read data from DISCERN folder.

Returns: The data including AnnData and TFRecords.
Return type: DISCERNData

classmethod read_h5ad(filename: pathlib.Path, batch_size: int, cachefile: Optional[Union[str, pathlib.Path]] = '') → discern.io.DISCERNData¶

Create DISCERNData from anndata H5AD file.

Returns: The single cell data.
Return type: DISCERNData

property tfdata: Tuple[tensorflow.python.data.ops.dataset_ops.DatasetV2, tensorflow.python.data.ops.dataset_ops.DatasetV2]¶

The accociated tf.data.Datasets.

Returns: Training and validation data.
Return type: Tuple[tf.data.Dataset, tf.data.Dataset]

property zeros¶: Get Zero representation in current data.

class discern.io.TFRecordsWriter(out_dir: pathlib.Path)¶

Context manager to be used for writing tf.data.Dataset to TFRecord file.

Parameters: out_dir (pathlib.Path) – Path to the directory where to write the TFRecords.

out_dir¶

Path to the directory where to write the TFRecords.

Type: str

write_dataset(dataset: tensorflow.python.data.ops.dataset_ops.DatasetV2, split: str)¶

Write tf.data.Dataset to TFRecord specified by split.

Parameters

dataset (tf.data.Dataset) – Dataset to be written.
split (str) – Subfile to use: train or valid.

Raises

ValueError – If split is not supported.

discern.io.estimate_csr_nbytes(mat: numpy.ndarray) → int¶

Estimates the size of a sparse matrix generated from numpy array.

Parameters: mat (np.ndarray) – Input array.
Returns: Estimated size of the sparse matrix.
Return type: int

discern.io.generate_h5ad(counts: Union[numpy.ndarray, Tuple[numpy.ndarray, numpy.ndarray]], var: pandas.core.frame.DataFrame, obs: pandas.core.frame.DataFrame, save_path: Optional[pathlib.Path] = None, threshold: float = 0.1, **kwargs) → anndata._core.anndata.AnnData¶

Generate AnnData format and can save it to file.

Parameters

counts (Union[np.ndarray, Tuple[np.ndarray, np.ndarray]]) – Count data (X in AnnData).
var (pd.DataFrame) – Variables dataframe.
obs (pd.DataFrame) – Observations dataframe.
save_path (Optional[pathlib.Path]) – Save path for the AnnData in h5py file. Defaults to None.
threshold (float) – Set values lower than threshold to zero. Defaults to 0.1.
kwargs – Keyword arguments passed to anndata.AnnData.

Returns

The AnnData file.

Return type

anndata.AnnData

discern.io.make_dataset_from_anndata(adata: anndata._core.anndata.AnnData, for_tfrecord: bool = False) → Tuple[tensorflow.python.data.ops.dataset_ops.DatasetV2, tensorflow.python.data.ops.dataset_ops.DatasetV2]¶

Generate TensorFlow Dataset from AnnData object.

Parameters

adata (anndata.AnnData) – Input cells
for_tfrecord (bool) – make output for writing TFrecords. Defaults to False.

Returns

The training and validation datasets

Return type

Tuple[tf.data.Dataset, tf.data.Dataset]

discern.io.np_one_hot(labels: pandas.core.arrays.categorical.Categorical) → numpy.ndarray¶

One hot encode a numpy array.

Parameters: labels (pd.Categorical) – integer values used as indices
Returns: One hot encoded labels
Return type: np.ndarray

discern.io.parse_tfrecords(tfr_files: Union[pathlib.Path, List[pathlib.Path]], genes_no: int, n_labels: int) → tensorflow.python.data.ops.dataset_ops.DatasetV2¶

Generate TensorFlow dataset from TensorFlow records file(s).

Parameters

tfr_files (Union[pathlib.Path, List[pathlib.Path]]) – TFRecord file(s).
genes_no (int) – Number of genes in the TFRecords.
n_labels (int) – Number of batch labels
batch_size (int) – Size of one batch

Returns

Dataset containing ‘input_data’,: ’batch_input_enc’ and ‘batch_input_dec’

Return type

tf.data.Dataset

Preprocessing functions¶

Contains the GeneMatrix class, used to represent the scRNA-seq data.

class discern.preprocessing.WAERecipe(params: Dict[str, Any], inputs: Optional[Dict[str, anndata._core.anndata.AnnData]] = None, input_files: Optional[Union[Dict[pathlib.Path, str], List[pathlib.Path]]] = None, n_jobs: int = - 1)¶

For storing and processing data.

Can apply filtering, clustering. merging and splitting.

Parameters

params (Dict[str,Any]) – Default parameters for preprocessing.
inputs (Dict[str,anndata.AnnData]) – Input AnnData with batchname as dict-key. Defaults to None.
input_files (List[pathlib.Path]) – Paths to raw input data.
None. (Defaults to) –
n_jobs (int) – Number of jobs/processes to use. Defaults to -1.

sc_raw¶

Read and concatenated input data.

Type: io.DISCERNData

config¶

Parameters calculated during preprocessing.

Type: Dict[str, Any]

params¶

Default parameters for preprocessing.

Type: Dict[str,Any]

celltypes()¶: Aggregate celltype information.

property config¶: Configuration from preprocessing.

dump(job_dir: pathlib.Path)¶

Dump recipe results to directory.

Parameters: job_dir (pathlib.Path) – The directory to save the results at.

dump_tf_records(path: pathlib.Path)¶

Dump the TFRecords to disk.

Parameters: path (pathlib.Path) – Folder to save the TFrecords in.

filtering(min_genes: int, min_cells: int)¶

Apply filtering in-place.

Parameters

min_genes (int) – Minimum number of genes to be present for cell to be considered.
min_cells (int) – Minimum number of cells to be present for gene to be considered.

classmethod from_path(job_dir: pathlib.Path) → discern.preprocessing.WAERecipe¶

Create WAERecipe from DISCERN directory.

Returns: The initalized object.
Return type: WAERecipe

kernel_mmd(neighbors_mmd: int = 50, no_cells_mmd: int = 2000)¶

Apply kernel mmd metrics based on nearest neighbors in-place.

Parameters

neighbors_mmd (int) – Number of neighbors Defaults to 50.
no_cells_mmd (int) – Number of cells used for calculation of mmd. Defaults to 2000.
projector (Optional[np.ndarray]) – PCA-Projector to compute distancs in precomputed PCA space. Defaults to None.

mean_var_scaling()¶: Apply Mean-Variance scaling if ‘fixed_scaling’ is present in params.

projection_pca(pcs: int = 25)¶

Apply PCA projection.

Parameters: pcs (int) – Number of principle components. Defaults to 32.

scaling(scale: int)¶

Apply scaling in-place.

Parameters: scale (int) – Value use to scale with LSN.

split(split_seed: int, valid_cells_ratio: Union[int, float], mmd_cells_ratio: Union[int, float] = 1.0)¶

Split cells to train and validation set.

Parameters

split_seed (int) – Seed used with numpy.
valid_cells_ratio (Union[int,float]) – Number or ratio of cells in the validation set.
mmd_cells_ratio (Optional[Union[int, float]]) – Number of validation
optimization. (cells to use for mmd calculation during hyperparameter) – Defaults to 1. which is valid_cells_no.

discern.preprocessing.merge_data_sets(raw_inputs: Dict[str, anndata._core.anndata.AnnData], batch_keys: Dict[str, str]) → Tuple[anndata._core.anndata.AnnData, Dict[int, str]]¶

Merge a dictionary of AnnData files to a single AnnData object.

Parameters: raw_inputs (Dict[str, anndata.AnnData]) – Names and AnnData objects.
Returns: Merged AnnData and mapping from codes to names.
Return type: Tuple[anndata.AnnData, Dict[int, str]]

discern.preprocessing.read_process_serialize(job_path: pathlib.Path, with_tfrecords: bool = True)¶

Read data, preprocesses it and write output as anndata.AnnData and TFRecords.

Parameters

job_path (pathlib.Path) – Path of the experiments folder.
with_tfrecords (bool) – write tfrecord files. Defaults to True.

discern.preprocessing.read_raw_input(file_path: pathlib.Path) → anndata._core.anndata.AnnData¶

Read input and converts it to anndata.AnnData object.

Currently h5, h5ad, loom, txt and a directory with matrix.mtx. genes.tsv and optional barcodes.tsv is supported.

Parameters: file_path (pathlib.Path) – (File-) Path to the input data.
Returns: The read AnnData object.
Return type: anndata.AnnData
Raises: ValueError – Datatype of input could not be interfered.

Online/Incremental learning¶

Module for supporting online learning.

class discern.online_learning.OnlineDISCERNRunner(debug: bool, gpus: List[int])¶: DISCERNRunner supporting online learning.

class discern.online_learning.OnlineWAERecipe(reference_adata: discern.io.DISCERNData, *args, **kwargs)¶

WAERecipe for the online setting.

This class allows the same preprocessing as done for reference data, which allows further processing in using DISCERN online learning.

celltypes()¶: Aggregate celltype information.

property config¶: Configuration from preprocessing.

dump(job_dir: pathlib.Path)¶

Dump recipe results to directory.

Parameters: job_dir (pathlib.Path) – The directory to save the results at.

dump_tf_records(path: pathlib.Path)¶

Dump the TFRecords to disk.

Parameters: path (pathlib.Path) – Folder to save the TFrecords in.

filtering(min_genes: int, *unused_args, **unused_kwargs)¶

Apply filtering in-place.

Parameters: min_genes (int) – Minimum number of genes to be present for cell to be considered.

fix_batch_labels()¶: Fix batch labels codes by including old categories.

classmethod from_path(job_dir: pathlib.Path) → discern.preprocessing.WAERecipe¶

Create WAERecipe from DISCERN directory.

Returns: The initalized object.
Return type: WAERecipe

kernel_mmd(neighbors_mmd: int = 50, no_cells_mmd: int = 2000)¶

Apply kernel mmd metrics based on nearest neighbors in-place.

Parameters

neighbors_mmd (int) – Number of neighbors Defaults to 50.
no_cells_mmd (int) – Number of cells used for calculation of mmd. Defaults to 2000.
projector (Optional[np.ndarray]) – PCA-Projector to compute distancs in precomputed PCA space. Defaults to None.

mean_var_scaling()¶: Apply Mean-Variance scaling if ‘fixed_scaling’ is present.

projection_pca(pcs: int = 25)¶

Apply PCA projection from reference.

Parameters: pcs (int) – Number of principle components. Defaults to 32.

scaling(scale: int)¶

Apply scaling in-place.

Parameters: scale (int) – Value use to scale with LSN.

split(split_seed: int, valid_cells_ratio: Union[int, float], mmd_cells_ratio: Union[int, float] = 1.0)¶

Split cells to train and validation set.

Parameters

split_seed (int) – Seed used with numpy.
valid_cells_ratio (Union[int,float]) – Number or ratio of cells in the validation set.
mmd_cells_ratio (Optional[Union[int, float]]) – Number of validation
optimization. (cells to use for mmd calculation during hyperparameter) – Defaults to 1. which is valid_cells_no.

discern.online_learning.online_training(exp_folder: pathlib.Path, filename: pathlib.Path, freeze: bool)¶

Continue running an experiment.

Parameters

exp_folder (pathlib.Path) – Experiments folders.
filename (pathlib.Path) – Input path for new data set.
freeze (bool) – Freeze non conditional layers.

discern.online_learning.save_data(file: pathlib.Path, data: discern.io.DISCERNData, old_data: discern.io.DISCERNData)¶: Save data by concatenating to reference.

discern.online_learning.update_model(old_model: tensorflow.python.keras.engine.training.Model, new_model: tensorflow.python.keras.engine.training.Model, freeze_unchanged: bool = False) → tensorflow.python.keras.engine.training.Model¶

Update the weights from an old model to a new model.

New model can have a bigger weight size for the layers in the first dimension.

Parameters

old_model (tf.keras.Model) – Old, possibly trained model.
new_model (tf.keras.Model) – New model, for which the weights should be set (inplace).
freeze_unchanged (bool, optional) – Freeze layers in the new model, which weights didn’t changed in size compared to the old model. Defaults to False.

Returns

The updated new model.

Return type

tf.keras.Model

Other functions¶

Module containing diverse TensorFlow related functions.

discern.functions.get_function_by_name(func_str: str) → Callable[..., Any]¶

Get a function by its name.

Parameters: func_str (str) – Name of function including module, like ‘tensorflow.nn.softplus’.
Returns: the function.
Return type: Callable[Any]
Raises: KeyError – Function does not exists.

discern.functions.getmembers(name: str) → Dict[str, Any]¶

Return a dictionary of all classes defined in this module.

Parameters: name (str) – Name of the module. Usually __name__.
Returns: Name and class of module.
Return type: Dict[str,Any]

discern.functions.parse_mean_var(features: pandas.core.frame.DataFrame, scalings: Dict[str, Union[str, float]]) → Tuple[numpy.ndarray, numpy.ndarray]¶

Get mean and variance from anndata.var and scaling dict.

Parameters

features (pd.DataFrame) – anndata.var.
scalings – (Dict[str, Union[str, float]]): Scalings dict.

Returns

Mean and variance

Return type

Tuple[np.ndarray, np.ndarray]

discern.functions.prepare_train_valid(input_tfr: pathlib.Path) → Tuple[pathlib.Path, pathlib.Path]¶

Get all filennames for train and validation files.

Parameters

input_tfr (pathlib.Path) – Name of input directory.

Returns

Add train and validation: files to seperate lists.

Return type

Tuple[List[pathlib.Path], List[pathlib.Path]]

discern.functions.rescale_by_params(adata: anndata._core.anndata.AnnData, scalings: Dict[str, Union[str, float, int]]) → anndata._core.anndata.AnnData¶

Rescale counts by fixed mean and variance (inplace).

Reverting function scale_by_params.

Parameters

adata (anndata.AnnData) – Data to be rescaled.
scalings (Dict[str, Union[str, float, int]]) – Mean and scale values used for rescaling. Can be numeric or genes. genes means using precomputed values in anndata object like adata.var[‘mean_scaling’] and adata.var[‘var_scaling’], respectively.

Raises

ValueError – mean and variance not numeric or genes.

Returns

The rescaled AnnData object

Return type

anndata.AnnData

discern.functions.sample_counts(counts: numpy.ndarray, probabilities: numpy.ndarray, var: Optional[pandas.core.frame.DataFrame] = None, uns: Optional[Dict[str, Any]] = None) → numpy.ndarray¶

Sample counts using probabilities.

Parameters

counts (np.ndarray) – Count data.
probabilities (np.ndarray) – Probability of being non-zero.

Returns

Sampled count data.

Return type

np.ndarray

discern.functions.scale(adata: anndata._core.anndata.AnnData, mean: Optional[Union[numpy.ndarray, float]] = None, var: Optional[Union[numpy.ndarray, float]] = None) → anndata._core.anndata.AnnData¶

Scale counts by fixed mean and variance.

Parameters

adata (anndata.AnnData) – Data to be scaled.
mean (Optional[np.ndarray]) – Mean for scaling (will be zero-centered). Defaults to None
var (Optional[np.ndarray]) – Variance for scaling (will be rescaled to 1). Defaults to None

Returns

The AnnData file.

Return type

anndata.AnnData

discern.functions.scale_by_params(adata: anndata._core.anndata.AnnData, scalings: Dict[str, Union[str, float]]) → anndata._core.anndata.AnnData¶

Scale counts by fixed mean and variance (inplace).

Parameters

adata (anndata.AnnData) – Data to be scaled.
scalings (Dict[str, Union[str, float]]) – Mean and scale values used for scaling. Can be numeric or genes. genes means using precomputed values in anndata object like adata.var[‘mean_scaling’] and adata.var[‘var_scaling’], respectively.

Raises

ValueError – mean and variance not numeric or genes.

Returns

The scaled AnnData object

Return type

anndata.AnnData

discern.functions.set_gpu_and_threads(n_threads: int, gpus: Optional[List[int]])¶

Limits CPU and GPU usage.

Parameters

n_threads (int) – Number of threads to use (get splittet to inter- and intra-op threads). Can be disabled by feeding 0.
gpus (List[int]) – List of GPUs to use. Use all GPUs by passing None and no GPUs by passing an empty list.

Module for fast MMD calculations.

discern.mmd.mmd_loss(random_cells: numpy.ndarray, valid_cells: numpy.ndarray, sigma: float) → float¶

Compute mmd loss between random cells and valid cells.

Parameters

random_cells (np.ndarray) – Random generated cells.
valid_cells (np.ndarray) – Valid (decoded) cells.
sigma (float) – Precalculated Sigma value.

Returns

MMD loss between random and valid cells.

Return type

float