
csTrain

Short Description

csTrain trains one deep learning model per marker found in the provided training data. To train the full set of cspotModel models, point the function at the TrainingData folder. To train models for specific markers only, pass their folder names via the trainMarkers parameter. projectDir is the shared output root; the function automatically creates the subfolders in which the trained models are saved.
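
The folder selection described above can be sketched as follows. This is a minimal illustration of the filtering logic visible in the source listing further down, not the library's own helper: the name `list_marker_dirs` and the marker names used to build the temporary folder are hypothetical.

```python
import pathlib
import tempfile


def list_marker_dirs(training_data_path, train_markers=None):
    """Return the marker sub-folders that would each get their own model."""
    root = pathlib.Path(training_data_path)
    # Every sub-folder of TrainingData is treated as one marker.
    dirs = sorted(p for p in root.iterdir() if p.is_dir())
    if train_markers is not None:
        # A single marker name may be passed as a plain string.
        if isinstance(train_markers, str):
            train_markers = [train_markers]
        dirs = [p for p in dirs if p.stem in train_markers]
    return dirs


# Build a throw-away TrainingData folder with three hypothetical markers.
with tempfile.TemporaryDirectory() as tmp:
    for marker in ("CD3D", "CD4", "ECAD"):
        (pathlib.Path(tmp) / marker).mkdir()
    all_markers = [p.name for p in list_marker_dirs(tmp)]
    subset = [p.name for p in list_marker_dirs(tmp, ["CD3D", "CD4"])]
    single = [p.name for p in list_marker_dirs(tmp, "ECAD")]

print(all_markers)
print(subset)
print(single)
```

Leaving `train_markers` as None selects every sub-folder, which corresponds to the default behaviour of trainMarkers=None.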

Function

csTrain(trainingDataPath, projectDir, trainMarkers=None, artefactPath=None, imSize=64, nChannels=1, nClasses=2, nExtraConvs=0, nLayers=3, featMapsFact=2, downSampFact=2, ks=3, nOut0=16, stdDev0=0.03, batchSize=16, epochs=100, verbose=True)

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| trainingDataPath | str | The file path leading to the directory that holds the training data. | required |
| projectDir | str | Path to the output directory. The result will be located at projectDir/CSPOT/cspotModel/. | required |
| trainMarkers | list | Generate models for a specified list of markers. By default, models are created for all data in the TrainingData folder. To limit training to specific markers, pass their folder names (e.g. ['CD3D', 'CD4']). A single folder name may also be passed as a string. | None |
| artefactPath | str | Path to the directory from which the artefact data is loaded. When given, it is expected to contain training and validation subfolders. | None |
| imSize | int | Image size (assumed to be square). | 64 |
| nChannels | int | Number of channels in the input image. | 1 |
| nClasses | int | Number of classes in the classification problem. | 2 |
| nExtraConvs | int | Number of extra convolutional layers to add to the model. | 0 |
| nLayers | int | Total number of layers in the model (passed to UNet2D.setup as nDownSampLayers). | 3 |
| featMapsFact | int | Factor to multiply the number of feature maps by in each layer. | 2 |
| downSampFact | int | Factor to down-sample the feature maps by in each layer. | 2 |
| ks | int | Kernel size for the convolutional layers. | 3 |
| nOut0 | int | Number of filters in the first layer. | 16 |
| stdDev0 | float | Standard deviation for the initializer for the first layer. | 0.03 |
| batchSize | int | Batch size for training. | 16 |
| epochs | int | Number of training epochs (passed to UNet2D.train as nSteps). | 100 |
| verbose | bool | If True, print detailed information about the process to the console. | True |
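
To make the interaction between imSize, nOut0, nLayers, featMapsFact and downSampFact concrete, the sketch below tabulates the spatial size and feature-map count per layer. It is only an illustration of the parameter descriptions above: the internals of UNet2D are not shown on this page, so the assumption that each layer multiplies the feature maps by featMapsFact and divides the image size by downSampFact, and the helper name `layer_shapes`, are hypothetical.

```python
def layer_shapes(imSize=64, nOut0=16, nLayers=3, featMapsFact=2, downSampFact=2):
    """Return (spatial size, feature maps) per layer under the stated assumption."""
    shapes = []
    size, feats = imSize, nOut0
    for _ in range(nLayers):
        shapes.append((size, feats))
        # Assumption: every layer shrinks the image and widens the features.
        size //= downSampFact
        feats *= featMapsFact
    return shapes


default_shapes = layer_shapes()
for level, (size, feats) in enumerate(default_shapes):
    print(f"layer {level}: {size}x{size} pixels, {feats} feature maps")
```

Under this assumption, imSize should be divisible by downSampFact raised to the power nLayers so that the spatial size stays an integer at every level.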

Returns:

| Name | Type | Description |
| --- | --- | --- |
| Model | images and model | The result will be located at projectDir/CSPOT/cspotModel/. |
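
The per-marker output locations can be derived as in the sketch below, which mirrors the path construction in the source listing (cspotModel for the model, csTrain/.../tempTFLogs for logs, csTrain/.../TFprobMaps for probability maps). The helper name `output_paths` and the dictionary keys are hypothetical, chosen only for illustration.

```python
import pathlib


def output_paths(projectDir, marker):
    """Return the folders associated with one marker under projectDir/CSPOT."""
    base = pathlib.Path(projectDir) / "CSPOT"
    return {
        "model": base / "cspotModel" / marker,
        "logs": base / "csTrain" / marker / "tempTFLogs",
        "probMaps": base / "csTrain" / marker / "TFprobMaps",
    }


paths = output_paths("/Users/aj/Documents/cspotExampleData", "CD3D")
for name, path in paths.items():
    print(name, path.as_posix())
```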

Example
import cspot as cs

# High-level working directory
projectDir = '/Users/aj/Documents/cspotExampleData'

trainingDataPath = projectDir + '/CSPOT/TrainingData'

cs.csTrain(trainingDataPath=trainingDataPath,
           projectDir=projectDir,
           trainMarkers=None,
           artefactPath=None,
           imSize=64,
           nChannels=1,
           nClasses=2,
           nExtraConvs=0,
           nLayers=3,
           featMapsFact=2,
           downSampFact=2,
           ks=3,
           nOut0=16,
           stdDev0=0.03,
           batchSize=16,
           epochs=1)

# Same function if the user wants to run it via the command-line interface
python csTrain.py \
    --trainingDataPath /Users/aj/Documents/cspotExampleData/CSPOT/TrainingData \
    --projectDir /Users/aj/Documents/cspotExampleData/ \
    --epochs 1
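
The mapping from the command-line flags above to keyword arguments might look like the following sketch. Only the three flag names are taken from the example; the parser itself is not shown on this page, so the use of argparse, the `build_parser` helper and the choice of defaults are assumptions.

```python
import argparse


def build_parser():
    """Hypothetical parser covering only the flags used in the example."""
    parser = argparse.ArgumentParser(description="Sketch of a csTrain-style CLI")
    parser.add_argument("--trainingDataPath", type=str, required=True)
    parser.add_argument("--projectDir", type=str, required=True)
    # Default copied from the function signature (epochs=100).
    parser.add_argument("--epochs", type=int, default=100)
    return parser


# Parse the same arguments as the command-line example instead of sys.argv.
args = build_parser().parse_args([
    "--trainingDataPath", "/Users/aj/Documents/cspotExampleData/CSPOT/TrainingData",
    "--projectDir", "/Users/aj/Documents/cspotExampleData/",
    "--epochs", "1",
])
kwargs = vars(args)
print(kwargs)
```

The resulting dictionary could then be forwarded as `csTrain(**kwargs)`, which keeps the command-line and Python entry points in step.
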
Source code in cspot/csTrain.py
def csTrain(trainingDataPath,
               projectDir,
               trainMarkers=None,
               artefactPath=None,
               imSize=64,
               nChannels=1,
               nClasses=2,
               nExtraConvs=0,
               nLayers=3,
               featMapsFact=2,
               downSampFact=2,
               ks=3,
               nOut0=16,
               stdDev0=0.03,
               batchSize=16,
               epochs=100,
               verbose=True):
    """

Parameters:
    trainingDataPath (str):
        The file path leading to the directory that holds the training data.

    projectDir (str):
        Path to output directory. The result will be located at `projectDir/CSPOT/cspotModel/`.

    trainMarkers (list):
        Generate models for a specified list of markers. By default, models are
        created for all data in the TrainingData folder. If the user wants to
        limit it to a specific list, they can pass in the folder names (e.g. ['CD3D', 'CD4'])

    artefactPath (str):
        Path to the directory where the artefacts data is loaded from.

    imSize (int, optional):
        Image size (assumed to be square).

    nChannels (int, optional):
        Number of channels in the input image.

    nClasses (int, optional):
        Number of classes in the classification problem.

    nExtraConvs (int, optional):
        Number of extra convolutional layers to add to the model.

    nLayers (int, optional):
        Total number of layers in the model.

    featMapsFact (int, optional):
        Factor to multiply the number of feature maps by in each layer.

    downSampFact (int, optional):
        Factor to down-sample the feature maps by in each layer.

    ks (int, optional):
        Kernel size for the convolutional layers.

    nOut0 (int, optional):
        Number of filters in the first layer.

    stdDev0 (float, optional):
        Standard deviation for the initializer for the first layer.

    batchSize (int, optional):
        Batch size for training.

    epochs (int, optional):
        Number of training epochs.

    verbose (bool, optional):
        If True, print detailed information about the process to the console.  

Returns:
    Model (images and model):  
        The result will be located at `projectDir/CSPOT/cspotModel/`.


Example:
    ```python

    # High level working directory
    projectDir = '/Users/aj/Documents/cspotExampleData'

    trainingDataPath = projectDir + '/CSPOT/TrainingData'

    cs.csTrain(trainingDataPath=trainingDataPath,
               projectDir=projectDir,
               trainMarkers=None,
               artefactPath=None,
               imSize=64,
               nChannels=1,
               nClasses=2,
               nExtraConvs=0,
               nLayers=3,
               featMapsFact=2,
               downSampFact=2,
               ks=3,
               nOut0=16,
               stdDev0=0.03,
               batchSize=16,
               epochs=1)

    # Same function if the user wants to run it via Command Line Interface
    python csTrain.py \
        --trainingDataPath /Users/aj/Documents/cspotExampleData/CSPOT/TrainingData \
        --projectDir /Users/aj/Documents/cspotExampleData/ \
        --epochs 1

    ```


    """

    # Start here
    # convert to path
    trainingDataPath = pathlib.Path(trainingDataPath)
    # identify all the data folders within the given TrainingData folder
    directories = [x for x in trainingDataPath.iterdir() if x.is_dir()]
    # keep only the folders that the user has requested
    if trainMarkers is not None:
        if isinstance(trainMarkers, str):
            trainMarkers = [trainMarkers]
        directories = [x for x in directories if x.stem in trainMarkers]

    # optional artifacts
    if artefactPath is not None:
        artefactPath = pathlib.Path(artefactPath)
        artefactTrainPath = pathlib.Path(artefactPath / 'training')
        artefactValidPath = pathlib.Path(artefactPath / 'validation')
    else:
        artefactPath = ''
        artefactTrainPath = ''
        artefactValidPath = ''
    # Need to run the training for each marker

    def csTrainInternal(trainingDataPath,
                           projectDir,
                           artefactPath,
                           imSize,
                           nChannels,
                           nClasses,
                           nExtraConvs,
                           nLayers,
                           featMapsFact,
                           downSampFact,
                           ks,
                           nOut0,
                           stdDev0,
                           batchSize,
                           epochs):
        # process the file name
        finalName = trainingDataPath.stem

        # paths for loading data
        trainPath = pathlib.Path(trainingDataPath / 'training')
        validPath = pathlib.Path(trainingDataPath / 'validation')
        testPath = pathlib.Path(trainingDataPath / 'test')

        # Paths for saving data (pathlib joins also work when projectDir is a Path)
        outputRoot = pathlib.Path(projectDir) / 'CSPOT'
        logPath = outputRoot / 'csTrain' / finalName / 'tempTFLogs'
        modelPath = outputRoot / 'cspotModel' / finalName
        pmPath = outputRoot / 'csTrain' / finalName / 'TFprobMaps'

        # set up the model
        UNet2D.setup(imSize=imSize,
                     nClasses=nClasses,
                     nChannels=nChannels,
                     nExtraConvs=nExtraConvs,
                     nDownSampLayers=nLayers,
                     featMapsFact=featMapsFact,
                     downSampFact=downSampFact,
                     kernelSize=ks,
                     nOut0=nOut0,
                     stdDev0=stdDev0,
                     batchSize=batchSize)

        # train the model
        UNet2D.train(trainPath=trainPath,
                     validPath=validPath,
                     testPath=testPath,
                     artTrainPath=artefactTrainPath,
                     artValidPath=artefactValidPath,
                     logPath=logPath,
                     modelPath=modelPath,
                     pmPath=pmPath,
                     restoreVariables=False,
                     nSteps=epochs,
                     gpuIndex=0,
                     testPMIndex=2)

    # Run the training once for every selected marker folder
    for markerDir in directories:
        csTrainInternal(trainingDataPath=markerDir,
                        projectDir=projectDir,
                        artefactPath=artefactPath,
                        imSize=imSize,
                        nChannels=nChannels,
                        nClasses=nClasses,
                        nExtraConvs=nExtraConvs,
                        nLayers=nLayers,
                        featMapsFact=featMapsFact,
                        downSampFact=downSampFact,
                        ks=ks,
                        nOut0=nOut0,
                        stdDev0=stdDev0,
                        batchSize=batchSize,
                        epochs=epochs)

    # Finish job
    if verbose:
        print('CSPOT models have been generated, head over to "' + str(projectDir) + '/CSPOT/cspotModel" to view results')
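
The listing above reads each marker folder through its training, validation and test subfolders. A small pre-flight check along the following lines could report an incomplete marker folder before a long training run starts. This is a sketch only: `missing_subfolders` is a hypothetical helper that is not part of cspot, and the marker name is made up for the demonstration.

```python
import pathlib
import tempfile

# Subfolder names taken from the path construction in the listing above.
EXPECTED_SUBFOLDERS = ("training", "validation", "test")


def missing_subfolders(marker_dir):
    """Return the expected subfolders that are absent from one marker folder."""
    marker_dir = pathlib.Path(marker_dir)
    return [name for name in EXPECTED_SUBFOLDERS
            if not (marker_dir / name).is_dir()]


# Demonstrate on a temporary marker folder that lacks its test split.
with tempfile.TemporaryDirectory() as tmp:
    marker = pathlib.Path(tmp) / "CD3D"
    (marker / "training").mkdir(parents=True)
    (marker / "validation").mkdir()
    missing = missing_subfolders(marker)

print(missing)
```

An empty list means the folder has all three splits; anything else names the splits that still need to be generated.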