Module kitchen.ingredients
Resources and utility functions
Functions
def RPKM_to_TPM(RPKM_mat)-
Convert RPKM_mat (reads per kilobase per million) to TPM (transcripts per million)
Parameters
RPKM_mat:np.array- Matrix of RPKM values in samples x genes format (genes as columns)
Returns
TPM_mat:np.array- Matrix in same shape as RPKM_mat containing TPM values
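As a rough illustration of the conversion described above, the sketch below rescales each sample (row) of an RPKM matrix so that it sums to one million. The name `rpkm_to_tpm_sketch` is hypothetical and this is not the library's implementation; it assumes a dense samples x genes NumPy array with no all-zero rows.

```python
import numpy as np

def rpkm_to_tpm_sketch(rpkm_mat):
    """Hypothetical stand-in: rescale each sample (row) to sum to 1e6."""
    rpkm_mat = np.asarray(rpkm_mat, dtype=float)
    # Per-sample RPKM totals, kept 2D so they broadcast across genes
    row_sums = rpkm_mat.sum(axis=1, keepdims=True)
    return rpkm_mat / row_sums * 1e6

# Two samples (rows) by three genes (columns)
rpkm = np.array([[10.0, 30.0, 60.0],
                 [5.0, 5.0, 10.0]])
tpm = rpkm_to_tpm_sketch(rpkm)
```

After the conversion every row of `tpm` sums to one million, so values are comparable as proportions within each sample.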
def check_dir_exists(path)-
Check whether a directory exists and create it if it does not
Parameters
path:str- path to directory
Returns
Tries to make directory at path, unless it already exists
def counts_to_RPKM(counts_mat, gene_lengths, mapped_reads=None)-
Convert counts_mat to RPKM (reads per kilobase per million)
Parameters
counts_mat:np.array- Matrix of counts in samples x genes format (genes as columns)
gene_lengths:np.array- 1D array of length counts_mat.shape[1] containing lengths for each gene in base pairs
mapped_reads:np.array, optional(default=None)- 1D array of length counts_mat.shape[0] containing total mapped reads for each sample. If None, calculate per-sample totals by summing counts_mat across its columns (genes).
Returns
RPKM_mat:np.array- Matrix in same shape as counts_mat containing RPKM values
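The RPKM definition can be sketched in a few lines: divide counts by gene length in kilobases, then by each sample's mapped reads in millions. The function `counts_to_rpkm_sketch` is a hypothetical stand-in, not the library's code, and it assumes that a missing `mapped_reads` is filled in with each sample's row sum.

```python
import numpy as np

def counts_to_rpkm_sketch(counts_mat, gene_lengths, mapped_reads=None):
    """Hypothetical stand-in for the RPKM formula, not the library's code."""
    counts_mat = np.asarray(counts_mat, dtype=float)
    gene_lengths = np.asarray(gene_lengths, dtype=float)
    if mapped_reads is None:
        # Assumption: total mapped reads per sample = sum over genes (axis=1)
        mapped_reads = counts_mat.sum(axis=1)
    mapped_reads = np.asarray(mapped_reads, dtype=float)
    # reads / (gene length in kb) / (mapped reads in millions)
    per_kb = counts_mat / (gene_lengths / 1e3)
    return per_kb / (mapped_reads[:, None] / 1e6)

counts = np.array([[100, 200, 700],
                   [50, 50, 900]])
lengths = np.array([1000, 2000, 500])  # base pairs
rpkm = counts_to_rpkm_sketch(counts, lengths)
```

Passing `mapped_reads` explicitly matters when `counts_mat` holds only a subset of genes, since row sums would then underestimate the sequencing depth.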
def counts_to_TPM(counts_mat, gene_lengths, mapped_reads=None)-
Convert counts_mat to TPM (transcripts per million)
Parameters
counts_mat:np.array- Matrix of counts in samples x genes format (genes as columns)
gene_lengths:np.array- 1D array of length counts_mat.shape[1] containing lengths for each gene in base pairs
mapped_reads:np.array, optional(default=None)- 1D array of length counts_mat.shape[0] containing total mapped reads for each sample. If None, calculate per-sample totals by summing counts_mat across its columns (genes).
Returns
TPM_mat:np.array- Matrix in same shape as counts_mat containing TPM values
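A minimal sketch of going from counts to TPM directly: normalize counts by gene length, then rescale each sample so its values sum to one million. Any per-sample depth factor cancels in that rescaling, so this hypothetical `counts_to_tpm_sketch` takes no `mapped_reads` argument; how the library itself uses `mapped_reads` is not stated here.

```python
import numpy as np

def counts_to_tpm_sketch(counts_mat, gene_lengths):
    """Hypothetical stand-in: length-normalize, then rescale rows to 1e6."""
    counts_mat = np.asarray(counts_mat, dtype=float)
    gene_lengths = np.asarray(gene_lengths, dtype=float)
    # Reads per base pair for every gene in every sample
    rate = counts_mat / gene_lengths
    # Each gene's share of its sample's total rate, scaled to one million
    return rate / rate.sum(axis=1, keepdims=True) * 1e6

counts = np.array([[100, 200, 700],
                   [50, 50, 900]])
lengths = np.array([1000, 2000, 500])  # base pairs
tpm = counts_to_tpm_sketch(counts, lengths)
```

In the first sample the length-normalized rates are 0.1, 0.1 and 1.4, so the short, highly covered third gene takes the largest share of the million.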
def fetch_decoupler_resources(resources=['msigdb', 'panglaodb', 'progeny', 'collectri', 'liana'], genome='human')-
Retrieve prior-knowledge networks from OmniPath for use with decoupler pathway analysis methods
* MSigDB: biological pathways from HALLMARK (ORA)
* PanglaoDB: cell-type and cell-state for scRNA labeling (ORA)
* PROGENy: canonical signaling pathways (MLM)
* CollecTRI: transcription factor regulon networks (ULM)
* LIANA: ligand-receptor interactions (ULM)
Parameters
resources:list, optional(default=["msigdb","panglaodb","progeny","collectri","liana"])- List of resources to fetch. Default all; remove networks not desired.
genome:str literal, optional(default="human")- One of "human" or "mouse" to determine which gene symbols to return in genesymbol or target columns of network dataframes
Returns
nets:dict- Dictionary containing names of OmniPath networks (keys) and the corresponding dataframes containing gene-pathway information (values).
def filter_signatures_with_var_names(signatures_dict, adata)-
Filter lists of genes in signatures_dict to include only genes in adata.var_names
def flip_signature_dict(signatures_dict)-
"Flip" dictionary of signatures where keys are signature names and values are lists of features, returning a dictionary where keys are individual features and values are signature names
Parameters
signatures_dict:dict- dictionary where keys are signature names and values are lists of features
Returns
signatures_dict_flipped:dict- dictionary where keys are features and values are signature names
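The flip described above can be sketched as a nested loop over signatures and their features. `flip_signature_dict_sketch` is a hypothetical stand-in; in particular, it assumes that a feature appearing in several signatures simply keeps the last signature name seen, which may differ from the library's behavior.

```python
def flip_signature_dict_sketch(signatures_dict):
    """Hypothetical stand-in: map each feature to its signature name."""
    flipped = {}
    for sig_name, features in signatures_dict.items():
        for feature in features:
            # Assumption: a feature in several signatures keeps the last name seen
            flipped[feature] = sig_name
    return flipped

signatures = {"T cell": ["CD3E", "CD2"], "B cell": ["MS4A1", "CD79A"]}
flipped = flip_signature_dict_sketch(signatures)
```

The flipped form is convenient for labeling, for example looking up which signature a given gene belongs to.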
def human_to_mouse_simple(symbol)-
Convert human to mouse symbols by simple case-conversion
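Human gene symbols are conventionally written in upper case, while mouse symbols capitalize only the first letter. A minimal sketch of such a case conversion follows; `human_to_mouse_sketch` is a hypothetical name, and the assumption that the conversion amounts to `str.capitalize()` is not confirmed by the library.

```python
def human_to_mouse_sketch(symbol):
    """Hypothetical stand-in: uppercase first character, lowercase the rest."""
    return symbol.capitalize()

mouse_symbols = [human_to_mouse_sketch(s) for s in ["TP53", "EPCAM", "CD3E"]]
```

Case conversion is only a heuristic: it cannot recover orthologs whose mouse symbol differs from the human one by more than letter case.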
def ingest_gene_signatures(sig_files, form='short', sig_col='signature', gene_col='gene')-
Read in gene signatures from one or more flat files
Parameters
sig_files:Union[list, str]- Path to single signature file or list of paths to signature files
form:Literal('short','long'), Optional (default='short')- Format of the data in sig_files. If 'short', columns of sig_files are assumed to contain separate signatures, with first row of column headers as signature names. If 'long', expect sig_col and gene_col headers to describe signature names and constituent genes in long-form, respectively.
sig_col:str, Optional (default='signature')- Column in sig_files containing signature names. Ignored if form=='short'.
gene_col:str, Optional (default='gene')- Column in sig_files containing gene names. Ignored if form=='short'.
Returns
genes:dict- Dictionary of gene signatures with signature names as keys and lists of genes as values.
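To make the two layouts concrete, the sketch below parses the same signatures from a short-form and a long-form table into the dictionary shape described under Returns. The parsers `read_short_form` and `read_long_form` are hypothetical, and the comma delimiter and blank-cell padding are assumptions; the file format the library actually expects is not stated here.

```python
import csv
import io

def read_short_form(handle):
    """Hypothetical parser: each column is a signature, header row = names."""
    rows = list(csv.reader(handle))
    header, body = rows[0], rows[1:]
    # Skip blank cells, which pad signatures shorter than the longest one
    return {name: [row[i] for row in body if i < len(row) and row[i]]
            for i, name in enumerate(header)}

def read_long_form(handle, sig_col="signature", gene_col="gene"):
    """Hypothetical parser: one (signature, gene) pair per row."""
    genes = {}
    for row in csv.DictReader(handle):
        genes.setdefault(row[sig_col], []).append(row[gene_col])
    return genes

# The same two signatures written in both layouts (assumed comma-delimited)
short_text = "T cell,B cell\nCD3E,MS4A1\nCD2,CD79A\nCD5,\n"
long_text = ("signature,gene\nT cell,CD3E\nT cell,CD2\nT cell,CD5\n"
             "B cell,MS4A1\nB cell,CD79A\n")

from_short = read_short_form(io.StringIO(short_text))
from_long = read_long_form(io.StringIO(long_text))
```

Both calls produce the same dictionary, with signature names as keys and lists of genes as values.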
def signature_dict_from_rank_genes_groups(adata, uns_key='rank_genes_groups', groups=None, n_genes=5, add_down=False, ambient=False)-
Extract DEGs from AnnData into signature dictionary
Parameters
adata:anndata.AnnData- AnnData object containing DEG results in .uns
uns_key:str, optional(default='rank_genes_groups')- Key from adata.uns containing DEG results. Should reflect categories in adata.obs[groupby].
groups:list of str, optional(default=None)- List of groups within adata.uns[uns_key] to extract DEGs for. If None, retrieve all groups.
n_genes:int, optional(default=5)- Number of top genes per group to show
add_down:bool, optional(default=False)- Include bottom n_genes by score as well as top genes
ambient:bool, optional(default=False)- Include ambient genes as a group in the plot/output dictionary. If True, adata.var must have a boolean column called 'ambient' labeling ambient genes.
Returns
markers:dict- Dictionary of DEGs with group names as keys and lists of genes as values
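The extraction of top genes per group can be sketched without an AnnData object. The example assumes the DEG results follow scanpy's `rank_genes_groups` layout, where the `names` entry is a record array with one field per group and genes ordered by score; `top_genes_sketch` and the mock array are hypothetical, and the `add_down` and `ambient` options are not covered.

```python
import numpy as np

def top_genes_sketch(rank_names, groups=None, n_genes=5):
    """Hypothetical stand-in: take the first n_genes names for each group."""
    if groups is None:
        groups = rank_names.dtype.names  # one field per group
    return {g: [str(x) for x in rank_names[g][:n_genes]] for g in groups}

# Mock of a 'names' record array: one field per group, genes ranked by score
rank_names = np.rec.fromarrays(
    [["CD3E", "CD2", "CD5"], ["MS4A1", "CD79A", "CD19"]],
    names=["T_cell", "B_cell"],
)
markers = top_genes_sketch(rank_names, n_genes=2)
subset = top_genes_sketch(rank_names, groups=["B_cell"], n_genes=1)
```

With a real AnnData object, the array passed in would be taken from `adata.uns[uns_key]["names"]` under the same layout assumption.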
def signature_dict_values(signatures_dict, unique=True)-
Extract features from dictionary of signatures where keys are signature names and values are lists of features, returning a single list of features
Parameters
signatures_dict:dict- dictionary where keys are signature names and values are lists of features
unique:bool, optional(default=True)- get only unique features across all signatures_dict.values()
Returns
dict_values:list- single list of features from all signatures in signatures_dict
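Flattening a signature dictionary into one list might look like the sketch below. `signature_dict_values_sketch` is a hypothetical stand-in; it assumes that first-seen order should be preserved when `unique=True`, which the library does not necessarily guarantee.

```python
def signature_dict_values_sketch(signatures_dict, unique=True):
    """Hypothetical stand-in: flatten all feature lists into one list."""
    flat = [f for features in signatures_dict.values() for f in features]
    if unique:
        # dict.fromkeys drops duplicates while keeping first-seen order
        flat = list(dict.fromkeys(flat))
    return flat

signatures = {"sig1": ["A", "B"], "sig2": ["B", "C"]}
all_features = signature_dict_values_sketch(signatures, unique=False)
unique_features = signature_dict_values_sketch(signatures)
```

Here feature "B" belongs to both signatures, so it appears twice with `unique=False` and once with the default.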
def signatures_to_long_form(sig_short, sig_col='signature', gene_col='gene')-
Convert gene signatures dict or dataframe from short to long form
Parameters
sig_short:Union[dict,pd.DataFrame]- Gene signatures in dict or short form dataframe, where columns are assumed to contain separate signatures, with first row of column headers as signature names.
sig_col:str, Optional (default='signature')- Column in sig_long to contain signature names.
gene_col:str, Optional (default='gene')- Column in sig_long to contain gene names.
Returns
sig_long:pd.DataFrame- Gene signatures in long form, with signature names in sig_col and gene names in gene_col.
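A short-to-long conversion can be sketched with pandas by emitting one row per (signature, gene) pair. `to_long_form_sketch` is a hypothetical stand-in rather than the library's implementation, and it assumes that short-form dataframes pad shorter signatures with NaN.

```python
import pandas as pd

def to_long_form_sketch(sig_short, sig_col="signature", gene_col="gene"):
    """Hypothetical stand-in: one row per (signature, gene) pair."""
    if isinstance(sig_short, pd.DataFrame):
        # Assumption: short-form columns are padded with NaN; drop the padding
        sig_short = {c: sig_short[c].dropna().tolist() for c in sig_short.columns}
    pairs = [(sig, gene) for sig, genes in sig_short.items() for gene in genes]
    return pd.DataFrame(pairs, columns=[sig_col, gene_col])

sig_long = to_long_form_sketch({"T cell": ["CD3E", "CD2"], "B cell": ["MS4A1"]})
```

The resulting frame has three rows, one for each gene, with the signature name repeated in the `signature` column.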
def signatures_to_short_form(sig_long, sig_col='signature', gene_col='gene')-
Convert gene signatures dict or dataframe from long to short form
Parameters
sig_long:Union[dict,pd.DataFrame]- Gene signatures in dict or long form dataframe, where sig_col and gene_col headers describe signature names and constituent genes, respectively.
sig_col:str, Optional (default='signature')- Column in sig_long containing signature names.
gene_col:str, Optional (default='gene')- Column in sig_long containing gene names.
Returns
sig_short:pd.DataFrame- Gene signatures in short form, where columns are assumed to contain separate signatures, with first row of column headers as signature names.
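The reverse direction can be sketched by grouping the long-form rows by signature and placing each group's genes in its own column. `to_short_form_sketch` is a hypothetical stand-in that handles only dataframe input, and padding shorter signatures with NaN is an assumption about the short-form layout.

```python
import pandas as pd

def to_short_form_sketch(sig_long, sig_col="signature", gene_col="gene"):
    """Hypothetical stand-in: one column per signature, padded with NaN."""
    columns = {
        sig: grp[gene_col].reset_index(drop=True)
        for sig, grp in sig_long.groupby(sig_col, sort=False)
    }
    # Series of unequal length are aligned on their index and padded with NaN
    return pd.DataFrame(columns)

sig_long = pd.DataFrame({
    "signature": ["T cell", "T cell", "B cell"],
    "gene": ["CD3E", "CD2", "MS4A1"],
})
sig_short = to_short_form_sketch(sig_long)
```

Using `sort=False` keeps the signatures in the order they first appear in the long-form table rather than sorting them alphabetically.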