🎯 CSPOT Phenotyping
Assign a phenotype to each cell. Clustering is not always ideal for this task, so we developed a cell type assignment algorithm that assigns phenotypes hierarchically and iteratively.
Please keep in mind that the sample data is used for demonstration purposes only and has been simplified and reduced in size. It is solely intended to teach how to execute cspot and will not yield any meaningful results.
Download executable notebook here.
Make sure you have completed the Build cspot Model and Run cspot Algorithm tutorials before you try to execute this Jupyter Notebook!
# import packages
import cspot as cs
import pandas as pd
We need two basic inputs to perform phenotyping with CSPOT:
- The cspot Object
- A Phenotyping workflow based on prior knowledge
# set the Project directory
projectDir = '/Users/aj/Documents/cspotExampleData'
# Path to the CSPOT Object
csObject = projectDir + '/CSPOT/csObject/exampleImage_cspotPredict.ome.h5ad'
# load the phenotyping workflow
phenotype = pd.read_csv(str(projectDir) + '/phenotype_workflow.csv')
# view the table:
phenotype.style.format(na_rep='')
 | Unnamed: 0 | Unnamed: 1 | ECAD | CD45 | CD4 | CD3D | CD8A | KI67
---|---|---|---|---|---|---|---|---
0 | all | Immune | | anypos | anypos | anypos | anypos |
1 | all | ECAD+ | pos | | | | |
2 | ECAD+ | KI67+ ECAD+ | | | | | | pos
3 | Immune | CD4+ T | | | allpos | allpos | |
4 | Immune | CD8+ T | | | | allpos | allpos |
5 | Immune | Non T CD4+ cells | | | pos | neg | |
As can be seen from the table above,
(1) The first column has to contain the cells that are to be classified.
(2) The second column indicates the phenotype a particular cell will be assigned if it satisfies the conditions in the row.
(3) Column three and onward represent protein markers. If the protein marker is known to be expressed for that cell type, it is denoted by either pos or allpos. If the protein marker is known not to be expressed for a cell type, it can be denoted by neg or allneg. If the protein marker is irrelevant or its expression is uncertain for a cell type, it is left empty. anypos and anyneg apply to a set of markers: if any one of the markers is positive or negative, respectively, the cell type is assigned accordingly.
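The parent/child relationship encoded in the first two columns can be sketched with plain pandas. This is only an illustration of the hierarchy, not cspot's internal code: the column names `parent` and `phenotype` and the helper `assignment_order` are hypothetical, and the breadth-first order shown here is an assumption that need not match the order cspot uses.

```python
import pandas as pd

# First two columns of the workflow table above:
# the population to classify and the phenotype label assigned within it.
workflow = pd.DataFrame({
    "parent": ["all", "all", "ECAD+", "Immune", "Immune", "Immune"],
    "phenotype": ["Immune", "ECAD+", "KI67+ ECAD+",
                  "CD4+ T", "CD8+ T", "Non T CD4+ cells"],
})

def assignment_order(workflow, root="all"):
    """Walk the hierarchy breadth-first, starting from the root population.

    Returns a list of (parent, [child phenotypes]) pairs.
    """
    order = []
    queue = [root]
    while queue:
        parent = queue.pop(0)
        children = workflow.loc[workflow["parent"] == parent, "phenotype"].tolist()
        if children:
            order.append((parent, children))
            queue.extend(children)  # children may themselves be subdivided
    return order

order = assignment_order(workflow)
for parent, children in order:
    print(f"{parent}: {children}")
```

Every value in the first column other than `all` must also appear as a phenotype in the second column, otherwise that row is never reached by the walk.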
To give users maximum flexibility in identifying desired cell types, we have implemented the classification arguments described above. They are:
- allpos
- allneg
- anypos
- anyneg
- pos
- neg
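Because the workflow is an ordinary CSV, a typo in one of these keywords is easy to miss. The following is a minimal sketch of a sanity check, assuming the first two columns are labels and every later column is a marker. The helper `invalid_entries` is hypothetical and not part of cspot, and treating the keywords as exact lowercase matches is an assumption.

```python
import pandas as pd

# The six keywords listed above.
ALLOWED = {"pos", "neg", "allpos", "allneg", "anypos", "anyneg"}

def invalid_entries(workflow, n_label_columns=2):
    """Return (row, marker, value) for every non-empty cell that is not an allowed keyword."""
    markers = workflow.iloc[:, n_label_columns:]
    problems = []
    for marker in markers.columns:
        for row, value in markers[marker].items():
            if pd.isna(value) or str(value).strip() == "":
                continue  # empty cell: marker is irrelevant for this phenotype
            if str(value).strip() not in ALLOWED:
                problems.append((row, marker, value))
    return problems

# Toy table with one deliberate mistake ("positive" instead of "pos").
demo = pd.DataFrame({
    "parent": ["all", "all"],
    "phenotype": ["Immune", "ECAD+"],
    "CD45": ["anypos", None],
    "ECAD": [None, "positive"],
})
problems = invalid_entries(demo)
print(problems)
```

The same function can be applied to the `phenotype` DataFrame loaded earlier; an empty list means every marker cell is either blank or one of the six keywords.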
pos: "Pos" looks for cells positive for a given marker. If multiple markers are annotated as pos, all must be positive to denote the cell type. For example, a Regulatory T cell can be defined as CD3+CD4+FOXP3+ by passing pos to each marker. If one or more markers don't meet the criteria (e.g. CD4-), the program will classify it as Likely-Regulatory-T cell, pending user confirmation. This is useful in cases of technical artifacts or when cell types (such as cancer cells) are defined by marker loss (e.g. T-cell Lymphomas).
neg: Same as pos but looks for negativity of the defined markers.
allpos: "Allpos" requires all defined markers to be positive. Unlike pos, it doesn't classify cells as Likely-cellType, but strictly annotates cells positive for all defined markers.
allneg: Same as allpos but looks for negativity of the defined markers.
anypos: "Anypos" requires only one of the defined markers to be positive. For example, to define macrophages, a cell could be designated as such if any of CD68, CD163, or CD206 is positive.
anyneg: Same as anypos but looks for negativity of the defined markers.
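The rules above can be illustrated with a simplified boolean sketch for a single cell. This is not cspot's scoring code: the helper `evaluate_rules` is hypothetical, a marker is assumed to count as positive when its value is at or above the midpoint (0.5 here), and the handling of partial pos/neg matches as "likely" is a simplification of the behaviour described in the text.

```python
def evaluate_rules(cell, rules, midpoint=0.5):
    """Classify one cell against one workflow row.

    cell  : dict mapping marker name -> scaled value in [0, 1]
    rules : dict mapping marker name -> one of the six keywords
    Returns "match", "likely" or "no match".
    """
    def is_pos(marker):
        return cell[marker] >= midpoint  # assumed convention for positivity

    # Group markers by the keyword assigned to them.
    groups = {}
    for marker, rule in rules.items():
        groups.setdefault(rule, []).append(marker)

    # Strict rules: every allpos marker positive, every allneg marker negative.
    strict_ok = (all(is_pos(m) for m in groups.get("allpos", []))
                 and all(not is_pos(m) for m in groups.get("allneg", [])))

    # "Any" rules: at least one marker of the set must satisfy the condition.
    any_ok = True
    if "anypos" in groups:
        any_ok = any_ok and any(is_pos(m) for m in groups["anypos"])
    if "anyneg" in groups:
        any_ok = any_ok and any(not is_pos(m) for m in groups["anyneg"])

    if not (strict_ok and any_ok):
        return "no match"

    # Soft rules: pos/neg allow a partial match, reported as "likely".
    soft = ([is_pos(m) for m in groups.get("pos", [])]
            + [not is_pos(m) for m in groups.get("neg", [])])
    if all(soft):
        return "match"
    return "likely" if any(soft) else "no match"

treg_rules = {"CD3": "pos", "CD4": "pos", "FOXP3": "pos"}
full_treg = evaluate_rules({"CD3": 0.9, "CD4": 0.8, "FOXP3": 0.7}, treg_rules)
partial_treg = evaluate_rules({"CD3": 0.9, "CD4": 0.2, "FOXP3": 0.8}, treg_rules)

macrophage_rules = {"CD68": "anypos", "CD163": "anypos", "CD206": "anypos"}
macrophage = evaluate_rules({"CD68": 0.1, "CD163": 0.7, "CD206": 0.2}, macrophage_rules)

print(full_treg, partial_treg, macrophage)
```

In this sketch the CD4-low cell is reported as "likely" rather than rejected, which mirrors the Likely-Regulatory-T cell example, whereas the same cell under allpos would simply not match.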
adata = cs.csPhenotype(csObject=csObject,
                       phenotype=phenotype,
                       midpoint=0.5,
                       label="phenotype",
                       imageid='imageid',
                       pheno_threshold_percent=None,
                       pheno_threshold_abs=None,
                       fileName=None,
                       projectDir=projectDir)
Phenotyping Immune
Phenotyping ECAD+
-- Subsetting ECAD+
Phenotyping KI67+ ECAD+
-- Subsetting Immune
Phenotyping CD4+ T
Phenotyping CD8+ T
Phenotyping Non T CD4+ cells
Consolidating the phenotypes across all groups
Modified csObject is stored at "/Users/aj/Documents/cspotExampleData/CSPOT/csPhenotype
/Users/aj/miniconda3/envs/cspot/lib/python3.9/site-packages/cspot/csPhenotype.py:259: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
allpos_score['score'] = allpos_score.max(axis=1)
The same function can be run from the command line interface:
python csPhenotype.py \
--csObject /Users/aj/Documents/cspotExampleData/CSPOT/csObject/exampleImage_cspotPredict.ome.h5ad \
--phenotype /Users/aj/Documents/cspotExampleData/phenotype_workflow.csv \
--projectDir /Users/aj/Documents/cspotExampleData
If projectDir is provided, the modified csObject is stored in CSPOT/csPhenotype/; otherwise, the object is returned in memory.
# check the identified phenotypes
adata.obs['phenotype'].value_counts()
KI67+ ECAD+ 6159
CD4+ T 5785
CD8+ T 816
Name: phenotype, dtype: int64
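Raw counts are easier to compare across images when expressed as percentages. The sketch below rebuilds the counts printed above as a standalone pandas Series so it runs without the csObject; the variable names are illustrative only.

```python
import pandas as pd

# Counts copied from the value_counts() output above.
counts = pd.Series(
    {"KI67+ ECAD+": 6159, "CD4+ T": 5785, "CD8+ T": 816},
    name="phenotype",
)

# Share of each phenotype among all phenotyped cells, in percent.
percent = (counts / counts.sum() * 100).round(1)
print(percent)
```

On the real object, `adata.obs['phenotype'].value_counts(normalize=True)` returns the same proportions as fractions.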
# Tutorial Ends here (check out some of the helper functions!)