Module kitchen.ingredients
Resources and utility functions
Functions
def RPKM_to_TPM(RPKM_mat)
-
Convert
RPKM_mat
(reads per kilobase per million) to TPM (transcripts per million)
Parameters
RPKM_mat
:np.array
- Matrix of RPKM values in samples x genes format (genes as columns)
Returns
TPM_mat
:np.array
- Matrix in same shape as
RPKM_mat
containing TPM values
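A minimal sketch of this conversion, assuming the standard definition in which each sample's RPKM values are rescaled to sum to one million. The name `rpkm_to_tpm_sketch` is hypothetical and is not the module's implementation.

```python
import numpy as np

def rpkm_to_tpm_sketch(rpkm_mat):
    """Hypothetical illustration: rescale each sample (row) to sum to 1e6."""
    rpkm_mat = np.asarray(rpkm_mat, dtype=float)
    # per-sample totals, kept 2D so they broadcast across the gene columns
    row_sums = rpkm_mat.sum(axis=1, keepdims=True)
    return rpkm_mat / row_sums * 1e6

# toy input: 2 samples x 3 genes
rpkm = np.array([[10.0, 30.0, 60.0],
                 [5.0, 5.0, 10.0]])
tpm = rpkm_to_tpm_sketch(rpkm)
```

Under this definition every row of the result sums to one million, so values are comparable across samples as proportions.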
def check_dir_exists(path)
-
Check whether a directory exists and create it if it does not
Parameters
path
:str
- path to directory
Returns
tries to make directory at
path
, unless it already exists
def counts_to_RPKM(counts_mat, gene_lengths, mapped_reads=None)
-
Convert
counts_mat
to RPKM (reads per kilobase per million)
Parameters
counts_mat
:np.array
- Matrix of counts in samples x genes format (genes as columns)
gene_lengths
:np.array
- 1D array of length
counts_mat.shape[1]
containing lengths for each gene in base pairs
mapped_reads
:np.array
, optional(default=
None)
- 1D array of length
counts_mat.shape[0]
containing total mapped reads for each sample. If None, calculate sums manually from columns of counts_mat.
Returns
RPKM_mat
:np.array
- Matrix in same shape as
counts_mat
containing RPKM values
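The calculation can be sketched as follows, assuming the usual RPKM formula (counts scaled by gene length in kilobases and by millions of mapped reads) and assuming that missing `mapped_reads` are taken as each sample's total counts. `counts_to_rpkm_sketch` is a hypothetical name for illustration only.

```python
import numpy as np

def counts_to_rpkm_sketch(counts_mat, gene_lengths, mapped_reads=None):
    """Hypothetical illustration of RPKM on a samples x genes matrix."""
    counts_mat = np.asarray(counts_mat, dtype=float)
    gene_lengths = np.asarray(gene_lengths, dtype=float)
    if mapped_reads is None:
        # assumption: total mapped reads per sample = sum over that sample's genes
        mapped_reads = counts_mat.sum(axis=1)
    mapped_reads = np.asarray(mapped_reads, dtype=float)
    # RPKM = counts * 1e9 / (gene length in bp * total mapped reads)
    denom = gene_lengths[np.newaxis, :] * mapped_reads[:, np.newaxis]
    return counts_mat * 1e9 / denom

# toy input: 2 samples x 3 genes
counts = np.array([[100, 200, 700],
                   [50, 50, 900]])
lengths = np.array([1000, 2000, 500])
rpkm = counts_to_rpkm_sketch(counts, lengths)
```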
def counts_to_TPM(counts_mat, gene_lengths, mapped_reads=None)
-
Convert
counts_mat
to TPM (transcripts per million)
Parameters
counts_mat
:np.array
- Matrix of counts in samples x genes format (genes as columns)
gene_lengths
:np.array
- 1D array of length
counts_mat.shape[1]
containing lengths for each gene in base pairs
mapped_reads
:np.array
, optional(default=
None)
- 1D array of length
counts_mat.shape[0]
containing total mapped reads for each sample. If None, calculate sums manually from columns of counts_mat.
Returns
TPM_mat
:np.array
- Matrix in same shape as
counts_mat
containing TPM values
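For illustration, TPM can also be computed directly from counts, assuming the standard definition: divide counts by gene length in kilobases, then rescale each sample to sum to one million. Under that definition the per-sample read total cancels out, so this hypothetical `counts_to_tpm_sketch` takes no `mapped_reads` argument; the module's own function may proceed differently.

```python
import numpy as np

def counts_to_tpm_sketch(counts_mat, gene_lengths):
    """Hypothetical illustration of TPM on a samples x genes matrix."""
    counts_mat = np.asarray(counts_mat, dtype=float)
    gene_lengths = np.asarray(gene_lengths, dtype=float)
    # reads per kilobase for each gene in each sample
    rate = counts_mat / (gene_lengths[np.newaxis, :] / 1e3)
    # rescale each sample (row) so its rates sum to one million
    return rate / rate.sum(axis=1, keepdims=True) * 1e6

# toy input: 2 samples x 3 genes
counts = np.array([[100, 200, 700],
                   [50, 50, 900]])
lengths = np.array([1000, 2000, 500])
tpm = counts_to_tpm_sketch(counts, lengths)
```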
def fetch_decoupler_resources(resources=['msigdb', 'panglaodb', 'progeny', 'collectri', 'liana'], genome='human')
-
Retrieve prior-knowledge networks from OmniPath for use with
decoupler
pathway analysis methods:
* MSigDB: biological pathways from HALLMARK (ORA)
* PanglaoDB: cell-type and cell-state for scRNA labeling (ORA)
* PROGENy: canonical signaling pathways (MLM)
* CollecTRI: transcription factor regulon networks (ULM)
* LIANA: ligand-receptor interactions (ULM)
Parameters
resources
:list
, optional(default=["msigdb","panglaodb","progeny","collectri","liana"])
- List of resources to fetch. Default all; remove networks not desired.
genome
:str literal
, optional(default="human")
- One of "human" or "mouse" to determine which gene symbols to return in
genesymbol
or target
columns of network dataframes
Returns
nets
:dict
- Dictionary containing names of OmniPath networks (keys) and the corresponding dataframes containing gene-pathway information (values).
def filter_signatures_with_var_names(signatures_dict, adata)
-
Filter lists of genes in
signatures_dict
to include only genes present in adata.var_names
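The filtering idea can be sketched without an AnnData object, using a plain list as a stand-in for `adata.var_names`. The helper name `filter_signatures_sketch` and the toy gene lists are hypothetical.

```python
def filter_signatures_sketch(signatures_dict, var_names):
    """Hypothetical illustration: keep only genes found in var_names."""
    keep = set(var_names)  # set membership keeps the lookup fast
    return {
        name: [gene for gene in genes if gene in keep]
        for name, genes in signatures_dict.items()
    }

signatures = {"sigA": ["CD3E", "CD4", "FOXP3"], "sigB": ["MS4A1", "CD19"]}
var_names = ["CD3E", "CD4", "MS4A1"]  # stand-in for adata.var_names
filtered = filter_signatures_sketch(signatures, var_names)
```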
def flip_signature_dict(signatures_dict)
-
"Flip" dictionary of signatures where keys are signature names and values are lists of features, returning a dictionary where keys are individual features and values are signature names
Parameters
signatures_dict
:dict
- dictionary where keys are signature names and values are lists of features
Returns
signatures_dict_flipped
:dict
- dictionary where keys are features and values are signature names
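A minimal sketch of the flip. How the function treats a feature that appears in several signatures is not stated here, so this hypothetical `flip_signature_dict_sketch` simply assumes the last signature seen wins.

```python
def flip_signature_dict_sketch(signatures_dict):
    """Hypothetical illustration: map each feature to its signature name."""
    flipped = {}
    for name, features in signatures_dict.items():
        for feature in features:
            # assumption: a feature shared by several signatures keeps the last name seen
            flipped[feature] = name
    return flipped

signatures = {"T cell": ["CD3E", "CD4"], "B cell": ["MS4A1"]}
flipped = flip_signature_dict_sketch(signatures)
```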
def human_to_mouse_simple(symbol)
-
Convert human to mouse symbols by simple case-conversion
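One plausible reading of "simple case-conversion" is capitalizing the first letter and lower-casing the rest, which matches the usual mouse symbol style. The sketch below assumes exactly that; `human_to_mouse_sketch` is a hypothetical name and may differ from the module's behavior.

```python
def human_to_mouse_sketch(symbol):
    """Hypothetical illustration: 'TP53' -> 'Tp53' by case conversion only."""
    # str.capitalize upper-cases the first character and lower-cases the rest
    return symbol.capitalize()

converted = [human_to_mouse_sketch(s) for s in ["CD4", "FOXP3", "MS4A1"]]
```

A conversion like this does not look up orthologs, so it will miss genes whose mouse symbol differs from the human one by more than case (for example, human TP53 corresponds to mouse Trp53).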
def ingest_gene_signatures(sig_files, form='short', sig_col='signature', gene_col='gene')
-
Read in gene signatures from one or more flat files
Parameters
sig_files
:Union[list, str]
- Path to single signature file or list of paths to signature files
form
:Literal('short','long'), Optional (default='short')
- Format of the data in
sig_files
. If 'short', columns of sig_files are assumed to contain separate signatures, with the first row of column headers as signature names. If 'long', expect sig_col and gene_col headers to describe signature names and constituent genes in long form, respectively.
sig_col
:str, Optional (default='signature')
- Column in
sig_files
containing signature names. Ignored if form == 'short'.
gene_col
:str, Optional (default='gene')
- Column in
sig_files
containing gene names. Ignored if form == 'short'.
Returns
genes
:dict
- Dictionary of gene signatures with signature names as keys and lists of genes as values.
def signature_dict_from_rank_genes_groups(adata, uns_key='rank_genes_groups', groups=None, n_genes=5, ambient=False)
-
Extract DEGs from AnnData into signature dictionary
Parameters
adata
:anndata.AnnData
- AnnData object containing DEG results in
.uns
uns_key
:str
, optional(default='rank_genes_groups')
- Key from
adata.uns
containing DEG results. Should reflect categories in adata.obs[groupby].
groups
:list of str
, optional(default=
None)
- List of groups within
adata.uns[uns_key]
to extract DEGs for. If None, retrieve all groups.
n_genes
:int
, optional(default=5)
- Number of top genes per group to include
ambient
:bool
, optional(default=False)
- Include ambient genes as a group in the plot/output dictionary. If True, adata.var must have a boolean column called 'ambient' labeling ambient genes.
Returns
markers
:dict
- Dictionary with DEG group names as keys and lists of genes as values
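The extraction can be sketched on a stand-in for the ranked gene names, assuming the layout scanpy typically uses for `adata.uns['rank_genes_groups']['names']`: a structured array with one field per group, with genes ordered from top-ranked down. The helper `markers_from_names_sketch` and the toy data are hypothetical, and the `ambient` option is not covered.

```python
import numpy as np

# stand-in for adata.uns["rank_genes_groups"]["names"] (assumed layout):
# one field per group, genes ordered from highest to lowest rank
names = np.rec.fromarrays(
    [np.array(["CD3E", "CD4", "IL7R"]), np.array(["MS4A1", "CD79A", "CD19"])],
    names=["cluster0", "cluster1"],
)

def markers_from_names_sketch(names, groups=None, n_genes=5):
    """Hypothetical illustration: top n_genes per group as a dict of lists."""
    if groups is None:
        # default to every group stored in the structured array
        groups = names.dtype.names
    return {group: names[group][:n_genes].tolist() for group in groups}

markers = markers_from_names_sketch(names, n_genes=2)
```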
def signature_dict_values(signatures_dict, unique=True)
-
Extract features from dictionary of signatures where keys are signature names and values are lists of features, returning a single list of features
Parameters
signatures_dict
:dict
- dictionary where keys are signature names and values are lists of features
unique
:bool
, optional(default=
True)
- get only unique features across all
signatures_dict.values()
Returns
dict_values
:list
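A minimal sketch of the flattening. Whether the function preserves order when `unique=True` is not stated here; this hypothetical `signature_dict_values_sketch` assumes first-seen order.

```python
def signature_dict_values_sketch(signatures_dict, unique=True):
    """Hypothetical illustration: flatten all signature lists into one list."""
    features = [f for feats in signatures_dict.values() for f in feats]
    if unique:
        # dict.fromkeys drops duplicates while keeping first-seen order
        # (the ordering is an assumption of this sketch)
        features = list(dict.fromkeys(features))
    return features

signatures = {"sigA": ["CD3E", "CD4"], "sigB": ["CD4", "MS4A1"]}
unique_features = signature_dict_values_sketch(signatures)
all_features = signature_dict_values_sketch(signatures, unique=False)
```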
def signatures_to_long_form(sig_short, sig_col='signature', gene_col='gene')
-
Convert gene signatures dict or dataframe from short to long form
Parameters
sig_short
:Union[dict,pd.DataFrame]
- Gene signatures in dict or short form dataframe, where columns are assumed to contain separate signatures, with first row of column headers as signature names.
sig_col
:str, Optional (default='signature')
- Column in
sig_long
to contain signature names.
gene_col
:str, Optional (default='gene')
- Column in
sig_long
to contain gene names.
Returns
sig_long
:pd.DataFrame
- Gene signatures in long form, with signature names in
sig_col
and gene names in gene_col.
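The short-to-long reshaping can be sketched with pandas as follows. The helper `to_long_form_sketch` is hypothetical, and it assumes that a short-form dataframe pads shorter signatures with missing values.

```python
import pandas as pd

def to_long_form_sketch(sig_short, sig_col="signature", gene_col="gene"):
    """Hypothetical illustration: one (signature, gene) row per gene."""
    if isinstance(sig_short, pd.DataFrame):
        # assumption: shorter signatures are padded with NaN in short form
        sig_short = {c: sig_short[c].dropna().tolist() for c in sig_short.columns}
    rows = [(name, gene) for name, genes in sig_short.items() for gene in genes]
    return pd.DataFrame(rows, columns=[sig_col, gene_col])

sig_long = to_long_form_sketch({"sigA": ["CD3E", "CD4"], "sigB": ["MS4A1"]})
```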
def signatures_to_short_form(sig_long, sig_col='signature', gene_col='gene')
-
Convert gene signatures dict or dataframe from long to short form
Parameters
sig_long
:Union[dict,pd.DataFrame]
- Gene signatures in dict or long form dataframe, where
sig_col
and gene_col headers describe signature names and constituent genes, respectively.
sig_col
:str, Optional (default='signature')
- Column in
sig_long
containing signature names.
gene_col
:str, Optional (default='gene')
- Column in
sig_long
containing gene names.
Returns
sig_short
:pd.DataFrame
- Gene signatures in short form, where columns are assumed to contain separate signatures, with first row of column headers as signature names.
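The long-to-short reshaping can be sketched with pandas as follows. The helper `to_short_form_sketch` is hypothetical, handles only the dataframe input, and assumes that signatures of unequal length are padded with missing values.

```python
import pandas as pd

def to_short_form_sketch(sig_long, sig_col="signature", gene_col="gene"):
    """Hypothetical illustration: one column per signature."""
    # collect genes per signature, keeping signatures in first-seen order
    grouped = sig_long.groupby(sig_col, sort=False)[gene_col].apply(list)
    # wrapping each list in a Series pads shorter signatures with NaN
    return pd.DataFrame({name: pd.Series(genes) for name, genes in grouped.items()})

long_df = pd.DataFrame({
    "signature": ["sigA", "sigA", "sigB"],
    "gene": ["CD3E", "CD4", "MS4A1"],
})
sig_short = to_short_form_sketch(long_df)
```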