Module kitchen.ingredients

Resources and utility functions

Functions

def RPKM_to_TPM(RPKM_mat)

Convert RPKM_mat (reads per kilobase per million mapped reads) to TPM (transcripts per million)

Parameters

RPKM_mat : np.array
Matrix of RPKM values in samples x genes format (genes as columns)

Returns

TPM_mat : np.array
Matrix in same shape as RPKM_mat containing TPM values
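The conversion described above can be sketched as a per-sample rescaling, so that each row sums to one million. This is a minimal illustration that assumes the standard TPM definition; `rpkm_to_tpm_sketch` is a hypothetical name and not necessarily how the module implements it.

```python
import numpy as np

def rpkm_to_tpm_sketch(rpkm_mat):
    """Rescale each sample (row) of an RPKM matrix so it sums to one million."""
    rpkm_mat = np.asarray(rpkm_mat, dtype=float)
    # Per-sample totals, kept 2D so they broadcast across the gene columns
    row_sums = rpkm_mat.sum(axis=1, keepdims=True)
    return rpkm_mat / row_sums * 1e6

# Two samples (rows) by three genes (columns)
rpkm = np.array([[10.0, 30.0, 60.0],
                 [5.0, 5.0, 10.0]])
tpm = rpkm_to_tpm_sketch(rpkm)
```

After the rescaling, every row of `tpm` sums to 1e6 regardless of the original RPKM totals.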
def check_dir_exists(path)

Check whether a directory exists and create it if it does not

Parameters

path : str
path to directory

Returns

No return value is documented; the directory at path is created unless it already exists
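The same behavior is available from the standard library. A minimal sketch, assuming that creating intermediate directories is acceptable; `ensure_dir_sketch` is a hypothetical helper, not the module's code.

```python
import os
import tempfile

def ensure_dir_sketch(path):
    """Create the directory at `path` if it does not already exist."""
    # exist_ok=True makes the call a no-op when the directory is already there
    os.makedirs(path, exist_ok=True)

base = tempfile.mkdtemp()
target = os.path.join(base, "results", "plots")
ensure_dir_sketch(target)
ensure_dir_sketch(target)  # second call does nothing and raises no error
```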

def counts_to_RPKM(counts_mat, gene_lengths, mapped_reads=None)

Convert counts_mat to RPKM (reads per kilobase per million)

Parameters

counts_mat : np.array
Matrix of counts in samples x genes format (genes as columns)
gene_lengths : np.array
1D array of length counts_mat.shape[1] containing lengths for each gene in base pairs
mapped_reads : np.array, optional (default=None)
1D array of length counts_mat.shape[0] containing total mapped reads for each sample. If None, totals are computed by summing counts_mat across genes for each sample.

Returns

RPKM_mat : np.array
Matrix in same shape as counts_mat containing RPKM values
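As an illustration of the arithmetic, RPKM divides each count by the gene length in kilobases and by the sample's mapped reads in millions. The sketch below assumes that standard definition and that a missing `mapped_reads` falls back to per-sample row sums; `counts_to_rpkm_sketch` is a hypothetical name.

```python
import numpy as np

def counts_to_rpkm_sketch(counts_mat, gene_lengths, mapped_reads=None):
    """Convert a samples x genes count matrix to RPKM."""
    counts_mat = np.asarray(counts_mat, dtype=float)
    gene_lengths = np.asarray(gene_lengths, dtype=float)
    if mapped_reads is None:
        # Assumption: library size is the sum over all genes of each sample
        mapped_reads = counts_mat.sum(axis=1)
    mapped_reads = np.asarray(mapped_reads, dtype=float)
    # counts / (gene length in kb), broadcast across samples
    per_kb = counts_mat / (gene_lengths[np.newaxis, :] / 1e3)
    # ... / (mapped reads in millions), broadcast across genes
    return per_kb / (mapped_reads[:, np.newaxis] / 1e6)

counts = np.array([[100, 200, 700],
                   [50, 50, 900]])
lengths = np.array([1000, 2000, 500])  # base pairs
rpkm = counts_to_rpkm_sketch(counts, lengths)
```

Passing an explicit `mapped_reads` array is useful when `counts_mat` holds only a subset of genes, since row sums would then underestimate the library size.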
def counts_to_TPM(counts_mat, gene_lengths, mapped_reads=None)

Convert counts_mat to TPM (transcripts per million)

Parameters

counts_mat : np.array
Matrix of counts in samples x genes format (genes as columns)
gene_lengths : np.array
1D array of length counts_mat.shape[1] containing lengths for each gene in base pairs
mapped_reads : np.array, optional (default=None)
1D array of length counts_mat.shape[0] containing total mapped reads for each sample. If None, totals are computed by summing counts_mat across genes for each sample.

Returns

TPM_mat : np.array
Matrix in same shape as counts_mat containing TPM values
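A direct route from counts to TPM first normalizes by gene length and then rescales each sample to one million. The sketch below assumes the standard TPM definition; because a per-sample library size cancels in that final rescaling, the hypothetical `counts_to_tpm_sketch` omits `mapped_reads`.

```python
import numpy as np

def counts_to_tpm_sketch(counts_mat, gene_lengths):
    """Convert a samples x genes count matrix to TPM."""
    counts_mat = np.asarray(counts_mat, dtype=float)
    gene_lengths = np.asarray(gene_lengths, dtype=float)
    # Reads per kilobase for every gene in every sample
    rate = counts_mat / (gene_lengths[np.newaxis, :] / 1e3)
    # Rescale each sample so its length-normalized rates sum to one million
    return rate / rate.sum(axis=1, keepdims=True) * 1e6

counts = np.array([[100, 200, 700]])
lengths = np.array([1000, 2000, 500])  # base pairs
tpm = counts_to_tpm_sketch(counts, lengths)
```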
def fetch_decoupler_resources(resources=['msigdb', 'panglaodb', 'progeny', 'collectri', 'liana'], genome='human')

Retrieve prior-knowledge networks from OmniPath for use with decoupler pathway analysis methods

* MSigDB: biological pathways from HALLMARK (ORA)
* PanglaoDB: cell-type and cell-state for scRNA labeling (ORA)
* PROGENy: canonical signaling pathways (MLM)
* CollecTRI: transcription factor regulon networks (ULM)
* LIANA: ligand-receptor interactions (ULM)

Parameters

resources : list, optional (default=["msigdb","panglaodb","progeny","collectri","liana"])
List of resources to fetch. Defaults to all five; omit any networks you do not need.
genome : str literal, optional (default="human")
One of "human" or "mouse" to determine which gene symbols to return in genesymbol or target columns of network dataframes

Returns

nets : dict
Dictionary containing names of OmniPath networks (keys) and the corresponding dataframes containing gene-pathway information (values).
def filter_signatures_with_var_names(signatures_dict, adata)

Filter lists of genes in signatures_dict to include only genes present in adata.var_names
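The filtering step can be sketched without an AnnData object by passing the variable names directly. This is an illustration only: `filter_signatures_sketch` is hypothetical, takes a plain iterable in place of `adata`, and keeps signatures that end up empty, which the real function may or may not do.

```python
def filter_signatures_sketch(signatures_dict, var_names):
    """Keep only the genes of each signature that appear in `var_names`."""
    available = set(var_names)  # set membership keeps the filtering fast
    return {
        name: [gene for gene in genes if gene in available]
        for name, genes in signatures_dict.items()
    }

signatures = {"stem": ["LGR5", "OLFM4", "FAKE1"], "goblet": ["MUC2", "TFF3"]}
var_names = ["LGR5", "MUC2", "TFF3", "ACTB"]  # stand-in for adata.var_names
filtered = filter_signatures_sketch(signatures, var_names)
```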

def flip_signature_dict(signatures_dict)

"Flip" dictionary of signatures where keys are signature names and values are lists of features, returning a dictionary where keys are individual features and values are signature names

Parameters

signatures_dict : dict
dictionary where keys are signature names and values are lists of features

Returns

signatures_dict_flipped : dict
dictionary where keys are features and values are signature names
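The inversion can be sketched in a few lines. One detail the description leaves open is what happens when a feature belongs to several signatures; the hypothetical `flip_signature_dict_sketch` below assumes the last signature seen wins.

```python
def flip_signature_dict_sketch(signatures_dict):
    """Map each feature to the name of the signature that contains it."""
    flipped = {}
    for name, features in signatures_dict.items():
        for feature in features:
            # Assumption: a feature shared by several signatures keeps the last one seen
            flipped[feature] = name
    return flipped

signatures = {"stem": ["LGR5", "OLFM4"], "goblet": ["MUC2"]}
flipped = flip_signature_dict_sketch(signatures)
```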
def human_to_mouse_simple(symbol)

Convert a human gene symbol to a mouse gene symbol by simple case conversion
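Mouse gene symbols are conventionally written with only the first letter capitalized, so a case conversion can be sketched as below. This is an assumption about what "simple case-conversion" means here, and `human_to_mouse_sketch` is a hypothetical name. It is a naming heuristic, not an orthology lookup, so genes without a same-named mouse counterpart will be converted incorrectly.

```python
def human_to_mouse_sketch(symbol):
    """Convert an upper-case human symbol to mouse-style capitalization."""
    # Assumption: upper-case the first character and lower-case the rest
    return symbol.capitalize()

mouse_symbols = [human_to_mouse_sketch(s) for s in ["LGR5", "CDKN2A"]]
```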

def ingest_gene_signatures(sig_files, form='short', sig_col='signature', gene_col='gene')

Read in gene signatures from one or more flat files

Parameters

sig_files : Union[list, str]
Path to single signature file or list of paths to signature files
form : Literal('short','long'), Optional (default='short')
Format of the data in sig_files. If 'short', columns of sig_files are assumed to contain separate signatures, with first row of column headers as signature names. If 'long', expect sig_col and gene_col headers to describe signature names and constituent genes in long-form, respectively.
sig_col : str, Optional (default='signature')
Column in sig_files containing signature names. Ignored if form=='short'.
gene_col : str, Optional (default='gene')
Column in sig_files containing gene names. Ignored if form=='short'.

Returns

genes : dict
Dictionary of gene signatures with signature names as keys and lists of genes as values.
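To make the two layouts concrete, the sketch below parses a short-form and a long-form table with pandas and builds the same kind of dictionary from each. It assumes comma-separated input and uses in-memory text instead of files; the actual delimiter and parsing used by the function are not stated here.

```python
import io
import pandas as pd

# Short form: one column per signature, header row holds the signature names
short_csv = io.StringIO("stem,goblet\nLGR5,MUC2\nOLFM4,TFF3\nASCL2,\n")
# Long form: one row per (signature, gene) pair
long_csv = io.StringIO("signature,gene\nstem,LGR5\nstem,OLFM4\ngoblet,MUC2\n")

short_df = pd.read_csv(short_csv)
# Drop the NaN padding that unequal signature lengths leave behind
genes_short = {col: short_df[col].dropna().tolist() for col in short_df.columns}

long_df = pd.read_csv(long_csv)
# sort=False keeps signatures in order of first appearance
genes_long = long_df.groupby("signature", sort=False)["gene"].apply(list).to_dict()
```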
def signature_dict_from_rank_genes_groups(adata, uns_key='rank_genes_groups', groups=None, n_genes=5, ambient=False)

Extract DEGs from AnnData into signature dictionary

Parameters

adata : anndata.AnnData
AnnData object containing DEG results in .uns
uns_key : str, optional (default='rank_genes_groups')
Key from adata.uns containing DEG results. Should reflect categories in adata.obs[groupby].
groups : list of str, optional (default=None)
List of groups within adata.uns[uns_key] to extract DEGs for. If None, retrieve all groups.
n_genes : int, optional (default=5)
Number of top genes per group to extract
ambient : bool, optional (default=False)
Include ambient genes as a group in the output dictionary. If True, adata.var must have a boolean column called 'ambient' labeling ambient genes.

Returns

markers : dict
Dictionary of DEGs with group names as keys and lists of genes as values
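The extraction can be sketched with NumPy alone, assuming the DEG results store ranked gene names as a structured array with one field per group, which is the layout scanpy's rank_genes_groups writes under the 'names' entry. The array below is a hand-built stand-in, `top_genes_sketch` is a hypothetical name, and the `ambient` option is left out.

```python
import numpy as np

# Stand-in for adata.uns["rank_genes_groups"]["names"]: one field per group,
# rows ordered from the top-ranked gene downwards
names = np.array(
    [("LGR5", "MUC2"), ("OLFM4", "TFF3"), ("ASCL2", "CLCA1")],
    dtype=[("stem", "U10"), ("goblet", "U10")],
)

def top_genes_sketch(names, groups=None, n_genes=5):
    """Return the first `n_genes` names for each requested group."""
    if groups is None:
        groups = names.dtype.names  # all groups present in the results
    return {g: [str(x) for x in names[g][:n_genes]] for g in groups}

markers = top_genes_sketch(names, n_genes=2)
```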
def signature_dict_values(signatures_dict, unique=True)

Extract features from dictionary of signatures where keys are signature names and values are lists of features, returning a single list of features

Parameters

signatures_dict : dict
dictionary where keys are signature names and values are lists of features
unique : bool, optional (default=True)
get only unique features across all signatures_dict.values()

Returns

dict_values : list
List of features pooled from all signatures; duplicates are removed when unique=True
def signatures_to_long_form(sig_short, sig_col='signature', gene_col='gene')

Convert gene signatures dict or dataframe from short to long form

Parameters

sig_short : Union[dict,pd.DataFrame]
Gene signatures in dict or short form dataframe, where columns are assumed to contain separate signatures, with first row of column headers as signature names.
sig_col : str, Optional (default='signature')
Column in sig_long to contain signature names.
gene_col : str, Optional (default='gene')
Column in sig_long to contain gene names.

Returns

sig_long : pd.DataFrame
Gene signatures in long form, with signature names in sig_col and gene names in gene_col.
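A short-to-long conversion can be sketched by emitting one row per signature and gene pair. `to_long_sketch` is a hypothetical illustration that accepts either a dict or a short-form dataframe; it assumes NaN cells in the dataframe are padding and drops them.

```python
import pandas as pd

def to_long_sketch(sig_short, sig_col="signature", gene_col="gene"):
    """Return a two-column dataframe with one row per (signature, gene) pair."""
    if isinstance(sig_short, pd.DataFrame):
        # Treat each column as a signature and ignore NaN padding
        sig_short = {c: sig_short[c].dropna().tolist() for c in sig_short.columns}
    rows = [(name, gene) for name, genes in sig_short.items() for gene in genes]
    return pd.DataFrame(rows, columns=[sig_col, gene_col])

sig_long = to_long_sketch({"stem": ["LGR5", "OLFM4"], "goblet": ["MUC2"]})
```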
def signatures_to_short_form(sig_long, sig_col='signature', gene_col='gene')

Convert gene signatures dict or dataframe from long to short form

Parameters

sig_long : Union[dict,pd.DataFrame]
Gene signatures in dict or long form dataframe, where sig_col and gene_col headers describe signature names and constituent genes, respectively.
sig_col : str, Optional (default='signature')
Column in sig_long containing signature names.
gene_col : str, Optional (default='gene')
Column in sig_long containing gene names.

Returns

sig_short : pd.DataFrame
Gene signatures in short form, where columns are assumed to contain separate signatures, with first row of column headers as signature names.
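The reverse conversion groups genes by signature and places each group in its own column. The sketch below handles only the dataframe input case and pads shorter signatures with NaN; `to_short_sketch` is a hypothetical name, and the padding value used by the real function is not stated.

```python
import pandas as pd

def to_short_sketch(sig_long, sig_col="signature", gene_col="gene"):
    """Return a dataframe with one column per signature."""
    # sort=False keeps signatures in order of first appearance
    grouped = sig_long.groupby(sig_col, sort=False)[gene_col].apply(list)
    # Wrapping each list in a Series pads shorter signatures with NaN
    return pd.DataFrame({name: pd.Series(genes) for name, genes in grouped.items()})

sig_long = pd.DataFrame({
    "signature": ["stem", "stem", "goblet"],
    "gene": ["LGR5", "OLFM4", "MUC2"],
})
sig_short = to_short_sketch(sig_long)
```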