Train 2D images

Short Description

Train an autoencoder (a deep learning model) on a dataset for one or more selected markers. The autoencoder learns to compress the dataset images into a lower-dimensional encoding and then reconstruct them from this encoding. To train a model with aeTrain or aeTrainMulti, point the function at the dataset_dir folder.

Function

`aeTrain(dataset_dir, outModelPath, input_dim=256, encoding_dim=64, max_epoch_num=100, batch_size=8, num_workers=10, prefetch_factor=8)`

Parameters:

Name	Type	Description	Default
`dataset_dir`	`str`	The file path leading to the directory that holds the training data.	required
`outModelPath`	`str`	The file path where the trained model will be saved.	required
`input_dim`	`int`	The input dimension of the model. Input images are assumed to be square (size x size), so the default of 256 corresponds to a 16x16-pixel input image.	`256`
`encoding_dim`	`int`	The size of the encoding layer, representing the dimensionality of the feature encoding learned by the autoencoder.	`64`
`max_epoch_num`	`int`	The maximum number of epochs to run during training.	`100`
`batch_size`	`int`	The number of images in each batch produced by the data loader.	`8`
`num_workers`	`int`	The number of worker subprocesses for loading the data. A value of 0 means that the data will be loaded in the main process.	`10`
`prefetch_factor`	`int`	The number of batches to load in advance by each worker.	`8`
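
The `num_workers` and `prefetch_factor` settings interact. As far as PyTorch's `DataLoader` documentation indicates, `prefetch_factor` may only be specified when `num_workers` is greater than 0, and a `ValueError` is raised otherwise. Because the source listing on this page forwards both values to `DataLoader`, a call with `num_workers=0` would likely hit that check. The sketch below shows one way to build consistent loader arguments; `loader_kwargs` is a hypothetical helper for illustration and is not part of spatialae.

```python
def loader_kwargs(batch_size=8, num_workers=10, prefetch_factor=8):
    """Hypothetical helper: keyword arguments intended for torch.utils.data.DataLoader."""
    kwargs = {"batch_size": batch_size, "num_workers": num_workers}
    # Assumption: prefetch_factor is only valid when worker subprocesses are used,
    # so it is left out when loading happens in the main process.
    if num_workers > 0:
        kwargs["prefetch_factor"] = prefetch_factor
    return kwargs


print(loader_kwargs())               # includes prefetch_factor
print(loader_kwargs(num_workers=0))  # omits prefetch_factor
```

The resulting dictionary could be unpacked into a `DataLoader` call, for example `DataLoader(dataset, shuffle=True, **loader_kwargs(num_workers=0))`.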

Returns:

Name	Type	Description
`model`	`model`	Not returned to the caller: the trained model's `state_dict` is saved to `outModelPath`.

Example

import spatialae as sa

# Define the parameters (the values below are the function defaults)
input_dim = 256 # 16*16
encoding_dim = 64
max_epoch_num = 100
batch_size = 8

# Replace with the path to your dataset directory
dataset_dir="/n/scratch/users/r/roh6824/Results/CRC12image_update/SpatialAE/SinglePatch/DNA1/"

# Replace with your desired output model file path
outModelPath = '/n/scratch/users/r/roh6824/Results/CRC12image_update/SpatialAE/ln_autoencoder_DNA_validate_300_model.pth'

# Train the autoencoder
sa.aeTrain(dataset_dir, outModelPath, input_dim, encoding_dim, max_epoch_num, batch_size)
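
The `# 16*16` comment in the example reflects how `input_dim` relates to the patch size: a square patch of side `s` has `s * s` pixels. The following minimal sketch converts between the two and rejects values that do not correspond to a square image. Both function names are hypothetical and are not part of spatialae.

```python
import math


def input_dim_for_patch(side):
    """Number of pixels in a square patch with the given side length."""
    return side * side


def patch_side_for_input_dim(input_dim):
    """Side length of the square patch implied by input_dim."""
    side = math.isqrt(input_dim)
    if side * side != input_dim:
        raise ValueError(f"input_dim={input_dim} is not a perfect square")
    return side


print(input_dim_for_patch(16))        # 256
print(patch_side_for_input_dim(256))  # 16
```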

Source code in spatialae/models/aeTrain.py

def aeTrain(dataset_dir,
            outModelPath,
            input_dim = 256,
            encoding_dim = 64,
            max_epoch_num = 100,
            batch_size = 8,
            num_workers = 10,
            prefetch_factor = 8
            ):
    """
Parameters:
    dataset_dir (str):
        The file path leading to the directory that holds the training data.

    outModelPath (str):
        The file path where the trained model will be saved.

    input_dim (int, optional):
        The input dimension of the model. Input images are assumed to be square (size x size), so the default of 256 corresponds to a 16x16-pixel input image.

    encoding_dim (int, optional):
        The size of the encoding layer, representing the dimensionality of the feature encoding learned by the autoencoder. 

    max_epoch_num (int, optional):
        The maximum number of epochs to run during training. 

    batch_size (int, optional):
        The number of images in each batch produced by the data loader.

    num_workers (int, optional):
        The number of worker subprocesses for loading the data. A value of 0 means that the data will be loaded in the main process.

    prefetch_factor (int, optional):
        The number of batches to load in advance by each worker. 

Returns:
    model (model): 
        Not returned to the caller: the trained model's state_dict is saved to outModelPath.

Example:
    ```python

    import spatialae as sa

    # Define the parameters (the values below are the function defaults)
    input_dim = 256 # 16*16
    encoding_dim = 64
    max_epoch_num = 100
    batch_size = 8

    # Replace with the path to your dataset directory
    dataset_dir="/n/scratch/users/r/roh6824/Results/CRC12image_update/SpatialAE/SinglePatch/DNA1/"

    # Replace with your desired output model file path
    outModelPath = '/n/scratch/users/r/roh6824/Results/CRC12image_update/SpatialAE/ln_autoencoder_DNA_validate_300_model.pth'

    # Train the autoencoder
    sa.aeTrain(dataset_dir, outModelPath, input_dim, encoding_dim, max_epoch_num, batch_size)

    ```

    """
    model = LitAutoEncoder(input_dim, encoding_dim)
    # Load the dataset
    train_dataset = SpatialImageDataset(dataset_dir, get_train = True)
    train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True, num_workers = num_workers, prefetch_factor = prefetch_factor)

    validate_dataset = SpatialImageDataset(dataset_dir, get_validate = True)
    validate_loader = DataLoader(validate_dataset, batch_size=batch_size, shuffle=False, num_workers = num_workers, prefetch_factor = prefetch_factor)
    # train the model
    trainer = pl.Trainer(max_epochs=max_epoch_num)
    trainer.fit(model, train_loader, validate_loader)

    # save the trained model
    torch.save(model.state_dict(), outModelPath)

`aeTrainMulti(dataset_dir, outModelPath, channels, input_dim=256, encoding_dim=64, max_epoch_num=100, batch_size=8, num_workers=10, prefetch_factor=8)`

Parameters:

Name	Type	Description	Default
`dataset_dir`	`str`	The file path to the directory containing the training dataset. Each subdirectory within is expected to represent a marker.	required
`outModelPath`	`str`	The file path where the trained model will be saved.	required
`channels`	`list`	A list of strings representing the markers/channels to be used for model training. If a user wants to train the model on specific channels, they can specify them by name (e.g., ['CD3D', 'CD4']). Only data corresponding to these channels will be used for training.	required
`input_dim`	`int`	The per-channel input dimension of the model. Input images are assumed to be square (size x size), so the default of 256 corresponds to a 16x16-pixel input image. The function multiplies this value by `len(channels)` before building the model.	`256`
`encoding_dim`	`int`	The size of the encoding layer, representing the dimensionality of the feature encoding learned by the autoencoder. Default is 64.	`64`
`max_epoch_num`	`int`	The maximum number of epochs to run during training. Default is 100.	`100`
`batch_size`	`int`	The number of images in each batch produced by the data loader. Default is 8.	`8`
`num_workers`	`int`	The number of worker subprocesses for loading the data. Default is 10. A value of 0 means that the data will be loaded in the main process.	`10`
`prefetch_factor`	`int`	The number of batches to load in advance by each worker. Default is 8.	`8`
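
In the source listing for `aeTrainMulti`, the model is built with `input_dim * len(channels)`, so the value passed as `input_dim` describes a single channel. The sketch below reproduces that arithmetic for the nine channels used in the example on this page; `effective_input_dim` is a hypothetical name used only for illustration.

```python
def effective_input_dim(input_dim, channels):
    """Model input size after per-channel inputs are combined (input_dim * number of channels)."""
    return input_dim * len(channels)


channels = ["DNA1", "CD3", "KERATIN", "CD20", "CD68", "CD8A", "CD163", "ECAD", "CD31"]
print(effective_input_dim(256, channels))  # 2304
```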

Returns:

Name	Type	Description
`model`	`model`	Not returned to the caller: the trained model's `state_dict` is saved to `outModelPath`.

Example

import spatialae as sa

# Define the parameters
channels = ["DNA1", "CD3", "KERATIN", "CD20", "CD68","CD8A", "CD163","ECAD", "CD31"]
input_dim=256
encoding_dim=64
batch_size=32
max_epoch_num=300

# Replace with the path to your dataset directory
dataset_dir = "/n/scratch/users/r/roh6824/Results/CRC12image_update/SpatialAE/SinglePatch/"

# Replace with your desired output model file path
outModelPath = "/n/scratch/users/r/roh6824/Results/CRC12image_update/SpatialAE/ln_autoencoder_multi_validate_300_model_dim32.pth"

# Train the autoencoder
sa.aeTrainMulti(dataset_dir, outModelPath, channels, input_dim, encoding_dim, max_epoch_num, batch_size)
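
Since each subdirectory of `dataset_dir` is expected to represent a marker, it can be useful to confirm that every requested channel has a folder before starting a long training run. The sketch below assumes that subdirectory names match the channel names exactly, which this page does not state explicitly; `missing_channel_dirs` is a hypothetical helper and not part of spatialae.

```python
from pathlib import Path
import tempfile


def missing_channel_dirs(dataset_dir, channels):
    """Hypothetical helper: channels that have no subdirectory inside dataset_dir."""
    root = Path(dataset_dir)
    return [name for name in channels if not (root / name).is_dir()]


# Demonstration on a throwaway directory that contains only two marker folders.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("DNA1", "CD3"):
        (Path(tmp) / name).mkdir()
    missing = missing_channel_dirs(tmp, ["DNA1", "CD3", "CD20"])
    print(missing)  # ['CD20']
```

An empty list means every requested channel has a matching folder under that assumption.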

Source code in spatialae/models/aeTrain.py

def aeTrainMulti(dataset_dir,
                 outModelPath,
                 channels,
                 input_dim = 256,
                 encoding_dim = 64,
                 max_epoch_num = 100,
                 batch_size = 8,
                 num_workers = 10,
                 prefetch_factor = 8
                 ):
    """
Parameters:
    dataset_dir (str):
        The file path to the directory containing the training dataset.
        Each subdirectory within is expected to represent a marker.

    outModelPath (str):
        The file path where the trained model will be saved.

    channels (list):
        A list of strings representing the markers/channels to be used for model training.
        If a user wants to train the model on specific channels, they can specify them
        by name (e.g., ['CD3D', 'CD4']). Only data corresponding to these channels will be
        used for training.

    input_dim (int, optional):
        The per-channel input dimension of the model. Input images are assumed to be square (size x size),
        so the default of 256 corresponds to a 16x16-pixel input image. The value is multiplied by len(channels) before the model is built.

    encoding_dim (int, optional):
        The size of the encoding layer, representing the dimensionality of the 
        feature encoding learned by the autoencoder. Default is 64.

    max_epoch_num (int, optional):
        The maximum number of epochs to run during training. Default is 100.

    batch_size (int, optional):
        The number of images in each batch produced by the data loader. Default is 8.

    num_workers (int, optional):
        The number of worker subprocesses for loading the data. Default is 10.
        A value of 0 means that the data will be loaded in the main process.

    prefetch_factor (int, optional):
        The number of batches to load in advance by each worker. Default is 8.

Returns:
    model (model): 
        Not returned to the caller: the trained model's state_dict is saved to outModelPath.


Example:
    ```python

    import spatialae as sa

    # Define the parameters
    channels = ["DNA1", "CD3", "KERATIN", "CD20", "CD68","CD8A", "CD163","ECAD", "CD31"]
    input_dim=256
    encoding_dim=64
    batch_size=32
    max_epoch_num=300

    # Replace with the path to your dataset directory
    dataset_dir = "/n/scratch/users/r/roh6824/Results/CRC12image_update/SpatialAE/SinglePatch/"

    # Replace with your desired output model file path
    outModelPath = "/n/scratch/users/r/roh6824/Results/CRC12image_update/SpatialAE/ln_autoencoder_multi_validate_300_model_dim32.pth"

    # Train the autoencoder
    sa.aeTrainMulti(dataset_dir, outModelPath, channels, input_dim, encoding_dim, max_epoch_num, batch_size)

    ```
    """
    input_dim = input_dim*len(channels)
    model = LitAutoEncoder(input_dim, encoding_dim)
    # Load the dataset
    train_dataset = MultiChannelImageDataset(dataset_dir, channels, get_train = True)
    train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True, num_workers = num_workers, prefetch_factor = prefetch_factor)

    validate_dataset = MultiChannelImageDataset(dataset_dir, channels, get_validate = True)
    validate_loader = DataLoader(validate_dataset, batch_size=batch_size, shuffle=False, num_workers = num_workers, prefetch_factor = prefetch_factor)
    # train the model
    trainer = pl.Trainer(max_epochs=max_epoch_num)
    trainer.fit(model, train_loader, validate_loader)
    # save the trained model
    torch.save(model.state_dict(), outModelPath)
