
csObject

Short Description

The csObject function creates a CSPOT object using the AnnData framework from two inputs: the csScore tables and the pre-calculated single-cell spatial feature tables. Keeping all information in one file streamlines downstream analysis and reduces the risk of losing data.

Function

csObject(spatialTablePath, csScorePath, CellId='CellID', uniqueCellId=True, split='X_centroid', removeDNA=True, remove_string_from_name=None, log=True, dropMarkers=None, verbose=True, projectDir=None)

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| spatialTablePath | list | Paths to the single-cell spatial feature tables, one unique path per image. A single path may also be passed as a string. | required |
| csScorePath | list | Paths to the DL score tables created with generateCSScore, given in the same order as the paths in spatialTablePath. | required |
| CellId | str | Name of the column that holds the cell ID (a unique name given to each cell). | 'CellID' |
| uniqueCellId | bool | If True, each cell is named by combining its imageid and CellId (`<imageid>_<CellId>`). Pass False to use the CellId alone; in that case, make sure the CellId values are unique, especially when loading multiple images together. | True |
| split | string | The spatial feature table usually holds single-cell expression data plus metadata such as X, Y coordinates and cell shape and size; the CSPOT object stores them separately. Expression columns must come first, followed by metadata columns. Give the name of the column that marks the split, i.e., the column immediately following the expression data. | 'X_centroid' |
| removeDNA | bool | Exclude DNA channels from the final output. Any column whose name contains `dna` or `dapi` (case-insensitive) is removed. | True |
| remove_string_from_name | string | Clean up channel names by removing a user-specified string from all marker names. | None |
| log | bool | Apply a log1p transformation to the expression data. The untransformed values are kept in `adata.raw`. | True |
| dropMarkers | list | Markers to remove from the analysis, for example `["background_channel", "CD20"]`. | None |
| verbose | bool | If True, print progress information to the console. | True |
| projectDir | string | Path to the output directory. The result is saved in `projectDir/CSPOT/csObject/`. | None |
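The parameters split, remove_string_from_name, removeDNA, dropMarkers and log are applied to the spatial table in that order. The sketch below mirrors those steps with plain pandas on a made-up three-cell table so the effect of each parameter is visible; the marker names, the `_cellMask` suffix and the values are hypothetical, and this illustrates the logic rather than calling CSPOT itself.

```python
import numpy as np
import pandas as pd

# Hypothetical spatial table: expression columns first, metadata from 'X_centroid' on
table = pd.DataFrame({
    "DNA1_cellMask": [100.0, 200.0, 150.0],
    "CD3_cellMask": [5.0, 0.0, 12.0],
    "CD20_cellMask": [1.0, 3.0, 0.0],
    "background_channel_cellMask": [2.0, 2.0, 2.0],
    "X_centroid": [10.5, 20.1, 30.7],
    "Y_centroid": [5.2, 8.9, 1.4],
})

split = "X_centroid"
remove_string_from_name = "_cellMask"   # hypothetical suffix to strip
drop_markers = ["background_channel"]

# split: everything before the split column is expression, the rest is metadata
split_idx = table.columns.get_loc(split)
meta = table.iloc[:, split_idx:].copy()
expr = table.iloc[:, :split_idx].copy()

# remove_string_from_name: literal substring removal from the marker names
expr.columns = expr.columns.str.replace(remove_string_from_name, "", regex=False)

# removeDNA: drop columns whose names contain 'dna' or 'dapi' (case-insensitive)
expr = expr.loc[:, ~expr.columns.str.contains("dna|dapi", case=False)]

# dropMarkers: drop only the listed markers that are actually present
expr = expr.drop(columns=[m for m in drop_markers if m in expr.columns])

# log: log1p-transform the remaining expression values
log_expr = np.log1p(expr)

print(list(expr.columns))  # ['CD3', 'CD20']
print(list(meta.columns))  # ['X_centroid', 'Y_centroid']
```

Because the marker list is recorded before DNA and user-listed markers are removed, `adata.uns['all_markers']` in the real object still contains every renamed channel.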

Returns:

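| Name | Type | Description |
|---|---|---|
| csObject | anndata | The CSPOT object. If projectDir is provided, it is saved as a .h5ad file in projectDir/CSPOT/csObject/ and the function returns None; otherwise the AnnData object is returned. |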
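When projectDir is set, the file name depends on how many tables were passed: with more than one spatial table the object is written as `csObject.h5ad`, otherwise it takes the stem of the first csScore file. The helper below, `expected_csobject_path`, is hypothetical (it is not part of CSPOT) and only reproduces that naming rule so you can predict where the result lands.

```python
from pathlib import Path


def expected_csobject_path(project_dir, spatial_table_paths, cs_score_paths):
    """Hypothetical helper: predict where csObject writes its .h5ad file."""
    # csObject accepts either a single path string or a list of paths
    if isinstance(spatial_table_paths, str):
        spatial_table_paths = [spatial_table_paths]
    if isinstance(cs_score_paths, str):
        cs_score_paths = [cs_score_paths]
    out_dir = Path(project_dir) / "CSPOT" / "csObject"
    if len(spatial_table_paths) > 1:
        name = "csObject"
    else:
        # Path.stem strips only the last suffix, so '.ome' stays in the name
        name = Path(cs_score_paths[0]).stem
    return out_dir / f"{name}.h5ad"


p = expected_csobject_path(
    "/Users/aj/Documents/cspotExampleData",
    "/Users/aj/Documents/cspotExampleData/quantification/exampleSpatialTable.csv",
    "/Users/aj/Documents/cspotExampleData/CSPOT/csScore/exampleImage_cspotPredict.ome.csv",
)
print(p.name)  # exampleImage_cspotPredict.ome.h5ad
```

The saved file can then be loaded back with `anndata.read_h5ad`.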

Example
# import CSPOT
import cspot as cs

# set the working directory & set paths to the example data
projectDir = '/Users/aj/Documents/cspotExampleData'

# Paths to the files required by the csObject function
spatialTablePath = projectDir + '/quantification/exampleSpatialTable.csv'
csScorePath = projectDir + '/CSPOT/csScore/exampleImage_cspotPredict.ome.csv'

# Note: several defaults below assume a particular structure in the spatialTable.
# Please confirm that your data match it, or modify the parameters accordingly.
# Check out the documentation for further details.
# Because projectDir is set, the object is written to disk and None is returned.
adata = cs.csObject(spatialTablePath=spatialTablePath,
                    csScorePath=csScorePath,
                    CellId='CellID',
                    uniqueCellId=True,
                    split='X_centroid',
                    removeDNA=True,
                    remove_string_from_name=None,
                    log=True,
                    dropMarkers=None,
                    projectDir=projectDir)

# Same function if the user wants to run it via the Command Line Interface
python csObject.py \
    --spatialTablePath /Users/aj/Documents/cspotExampleData/quantification/exampleSpatialTable.csv \
    --csScorePath /Users/aj/Documents/cspotExampleData/CSPOT/csScore/exampleImage_cspotPredict.ome.csv \
    --projectDir /Users/aj/Documents/cspotExampleData
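When several images are loaded at once, pass lists to spatialTablePath and csScorePath in the same order. If a spatial table has no `imageid` column, csObject fills it with the file stem, and with uniqueCellId=True each cell is named `<imageid>_<CellID>`. The pandas sketch below reproduces only that naming step on two made-up tables (the helper `name_cells` and the file names are hypothetical) to show why it matters: both images number their cells from 1, so the raw CellIDs collide.

```python
from pathlib import Path

import pandas as pd


def name_cells(path, table, cell_id="CellID"):
    """Hypothetical helper mimicking csObject's cell naming."""
    table = table.copy()
    # imageid defaults to the file name without its last suffix
    if "imageid" not in table.columns:
        table["imageid"] = Path(path).stem
    table.index = table["imageid"].astype(str) + "_" + table[cell_id].astype(str)
    return table


# Two hypothetical images whose cell IDs both start at 1
img1 = pd.DataFrame({"CellID": [1, 2], "CD3": [5.0, 0.0]})
img2 = pd.DataFrame({"CellID": [1, 2], "CD3": [3.0, 7.0]})

combined = pd.concat([
    name_cells("quantification/imageA.csv", img1),
    name_cells("quantification/imageB.csv", img2),
])
print(list(combined.index))          # ['imageA_1', 'imageA_2', 'imageB_1', 'imageB_2']
print(combined["CellID"].is_unique)  # False: raw IDs alone would collide
```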
Source code in cspot/csObject.py
import os
import pathlib

import anndata as ad
import numpy as np
import pandas as pd


def csObject (spatialTablePath,
                 csScorePath,
                 CellId='CellID',
                 uniqueCellId=True,
                 split='X_centroid',
                 removeDNA=True,
                 remove_string_from_name=None,
                 log=True,
                 dropMarkers=None,
                 verbose=True,
                 projectDir=None):
    """
Parameters:
    spatialTablePath (list):
        Provide a list of paths to the single-cell spatial feature tables, ensuring each image has a unique path specified.
        A single path may also be passed as a string.

    csScorePath (list):
        Supply a list of paths to the DL score tables created using generateCSScore,
        in the same order as the paths in spatialTablePath.

    CellId (str, optional):
        Specify the column name that holds the cell ID (a unique name given to each cell).

    uniqueCellId (bool, optional):
        The function generates a unique name for each cell by combining the imageid and
        CellId (`<imageid>_<CellId>`). Pass False to use only the CellId; in that case,
        make sure the CellId values are unique, especially when loading multiple images together.

    split (string, optional):
        The spatial feature table generally includes single-cell expression data
        and metadata such as X, Y coordinates and cell shape and size. The CSPOT
        object stores them separately. Ensure that the expression data columns come first,
        followed by the metadata columns. Provide the column name that marks the split,
        i.e., the column name immediately following the expression data.

    removeDNA (bool, optional):
        Exclude DNA channels from the final output. The function removes columns
        whose names contain the string `dna` or `dapi` (case-insensitive).

    remove_string_from_name (string, optional):
        Cleans up channel names by removing a user-specified string from all marker
        names.

    log (bool, optional):
        Apply a log1p transformation to the expression data. The untransformed
        values are kept in `adata.raw`.

    dropMarkers (list, optional):
        Specify a list of markers to be removed from the analysis, for
        example: ["background_channel", "CD20"].

    verbose (bool, optional):
        If True, print detailed information about the process to the console.

    projectDir (string, optional):
        Provide the path to the output directory. The result will be located at
        `projectDir/CSPOT/csObject/`.

Returns:
    csObject (anndata):
        The CSPOT object. If projectDir is provided, it is saved as a `.h5ad`
        file in `projectDir/CSPOT/csObject/` and the function returns None;
        otherwise the AnnData object is returned.

Example:
        ```python

        # import CSPOT
        import cspot as cs

        # set the working directory & set paths to the example data
        projectDir = '/Users/aj/Documents/cspotExampleData'

        # Paths to the files required by the csObject function
        spatialTablePath = projectDir + '/quantification/exampleSpatialTable.csv'
        csScorePath = projectDir + '/CSPOT/csScore/exampleImage_cspotPredict.ome.csv'

        # Note: several defaults below assume a particular structure in the spatialTable.
        # Please confirm that your data match it, or modify the parameters accordingly.
        # Check out the documentation for further details.
        # Because projectDir is set, the object is written to disk and None is returned.
        adata = cs.csObject(spatialTablePath=spatialTablePath,
                            csScorePath=csScorePath,
                            CellId='CellID',
                            uniqueCellId=True,
                            split='X_centroid',
                            removeDNA=True,
                            remove_string_from_name=None,
                            log=True,
                            dropMarkers=None,
                            projectDir=projectDir)

        # Same function if the user wants to run it via the Command Line Interface
        python csObject.py \
            --spatialTablePath /Users/aj/Documents/cspotExampleData/quantification/exampleSpatialTable.csv \
            --csScorePath /Users/aj/Documents/cspotExampleData/CSPOT/csScore/exampleImage_cspotPredict.ome.csv \
            --projectDir /Users/aj/Documents/cspotExampleData

        ```

    """

    # spatialTablePath list or string
    if isinstance(spatialTablePath, str):
        spatialTablePath = [spatialTablePath]
    spatialTablePath = [pathlib.Path(p) for p in spatialTablePath]
    # csScorePath list or string
    if isinstance(csScorePath, str):
        csScorePath = [csScorePath]
    csScorePath = [pathlib.Path(p) for p in csScorePath]

    # Import spatialTablePath
    def load_process_data(image):
        # Print the file that is being processed
        if verbose:
            print(f"Loading {image.name}")
        d = pd.read_csv(image)
        # If the data does not have an image ID column, derive one from the file name
        if 'imageid' not in d.columns:
            d['imageid'] = image.stem
        # Name each cell (unique across images when uniqueCellId is True)
        if uniqueCellId:
            d.index = d['imageid'].astype(str) + '_' + d[CellId].astype(str)
        else:
            d.index = d[CellId]

        # Move the CellId and imageid columns to the end
        d = d[[col for col in d.columns if col != CellId] + [CellId]]
        d = d[[col for col in d.columns if col != 'imageid'] + ['imageid']]
        # Replace infinite values with zero
        d = d.replace([np.inf, -np.inf], 0)
        return d

    # Import one csScore table (the first column is used as the index)
    def load_process_probTable(image):
        return pd.read_csv(image, index_col=0)

    # Load every spatial table and merge them into a single dataframe
    all_spatialTable = [load_process_data(image=p) for p in spatialTablePath]
    # Give every table the column names of the first one before concatenating
    for table in all_spatialTable:
        table.columns = all_spatialTable[0].columns
    entire_spatialTable = pd.concat(all_spatialTable, axis=0, sort=False)

    # Load every csScore table and merge them into a single dataframe
    all_probTable = [load_process_probTable(image=p) for p in csScorePath]
    for table in all_probTable:
        table.columns = all_probTable[0].columns
    entire_probTable = pd.concat(all_probTable, axis=0, sort=False)
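    # Give entire_probTable the same index as entire_spatialTable.
    # NOTE: this is a positional copy; it assumes both tables list the same cells in the same order.
    if len(entire_probTable) != len(entire_spatialTable):
        raise ValueError(
            f"csScore tables contain {len(entire_probTable)} cells but the spatial "
            f"tables contain {len(entire_spatialTable)}; they must describe the same "
            "cells in the same order.")
    entire_probTable.index = entire_spatialTable.index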


    # Split the data into expression data and metadata
    # Step 1: find the position of the `split` column (X_centroid by default)
    split_idx = entire_spatialTable.columns.get_loc(split)
    meta = entire_spatialTable.iloc[:, split_idx:]
    # Step 2: keep only the expression values
    entire_spatialTable = entire_spatialTable.iloc[:, :split_idx]

    # Remove the user-specified string from the marker names (literal match, not a regex)
    if remove_string_from_name is not None:
        entire_spatialTable.columns = entire_spatialTable.columns.str.replace(
            remove_string_from_name, '', regex=False)

    # Keep the full marker list (before any markers are removed) for adata.uns['all_markers']
    markers = list(entire_spatialTable.columns)

    # Remove DNA channels (marker names containing 'dna' or 'dapi', case-insensitive)
    if removeDNA is True:
        entire_spatialTable = entire_spatialTable.loc[
            :, ~entire_spatialTable.columns.str.contains('dna|dapi', case=False)]

    # Drop unnecessary markers
    if dropMarkers is not None:
        if isinstance(dropMarkers, str):
            dropMarkers = [dropMarkers]
        dropMarkers = list(set(dropMarkers).intersection(entire_spatialTable.columns))
        entire_spatialTable = entire_spatialTable.drop(columns=dropMarkers)

    # Create an AnnData object (cast to float64 first; the AnnData dtype argument is deprecated)
    adata = ad.AnnData(entire_spatialTable.astype(np.float64))
    adata.obs = meta
    adata.uns['all_markers'] = markers
    adata.uns['csScore'] = entire_probTable

    # Add log data
    if log is True:
        adata.raw = adata
        adata.X = np.log1p(adata.X)

    # Save data if requested
    if projectDir is not None:
        finalPath = pathlib.Path(projectDir) / 'CSPOT' / 'csObject'
        finalPath.mkdir(parents=True, exist_ok=True)
        if len(spatialTablePath) > 1:
            imid = 'csObject'
        else:
            imid = csScorePath[0].stem
        adata.write(finalPath / f'{imid}.h5ad')
        # Finish job
        if verbose is True:
            print(f'CSPOT Object has been created, head over to "{finalPath}" to view results')
    else:
        # Return data
        if verbose is True:
            print('CSPOT Object has been created')
        return adata
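The source assigns the spatial table's index to the concatenated csScore table by position, so it relies on both files listing the same cells in the same order. A stricter variant could also confirm the order before relabelling. The sketch below assumes, hypothetically, that the first column of each csScore CSV (read as its index) holds the same CellID values as the spatial table; `attach_score_index` is an illustrative helper, not part of CSPOT, so adapt it if your score files are keyed differently.

```python
import pandas as pd


def attach_score_index(spatial, scores, cell_id="CellID", check_ids=True):
    """Hypothetical helper: give `scores` the index of `spatial` after sanity checks."""
    if len(scores) != len(spatial):
        raise ValueError(
            f"csScore has {len(scores)} rows but the spatial table has {len(spatial)}")
    # Assumption: the csScore index holds the same CellIDs, in the same order
    if check_ids and scores.index.astype(str).tolist() != spatial[cell_id].astype(str).tolist():
        raise ValueError("csScore rows are not in the same order as the spatial table")
    scores = scores.copy()
    scores.index = spatial.index
    return scores


# Made-up example: spatial rows already carry '<imageid>_<CellID>' names
spatial = pd.DataFrame({"CellID": [1, 2, 3]}, index=["img_1", "img_2", "img_3"])
scores = pd.DataFrame({"CD3": [0.9, 0.1, 0.4]}, index=pd.Index([1, 2, 3], name="CellID"))
aligned = attach_score_index(spatial, scores)
print(list(aligned.index))  # ['img_1', 'img_2', 'img_3']
```

Failing loudly on a mismatch is the design choice here: a silently misaligned score table would attach each cell's prediction to a different cell.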