Spectral Hyper-parameter Optimization

Starting with version 2.0, NeoPulse® AI Studio makes available a state-of-the-art automatic hyper-parameter optimization algorithm, Spectral Optimization. Users provide a list of candidate values for each hyper-parameter, and the Spectral Optimization algorithm searches over these values during an optimization stage. The model is then trained with the best combination of hyper-parameters for the specified number of epochs.

To use spectral hyper-parameter optimization, users need to define only an architecture construct and a train_opt construct (along with any indirect oracle hints). The architecture construct for spectral optimization is defined the same way as in any other NML file.
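At a high level, an NML file using spectral optimization therefore takes the following shape (a minimal sketch; both construct bodies are elided here, and the train_opt body is described in the rest of this section):

architecture:
    ...

train_opt:
    ...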

The train_opt construct

The train_opt construct combines all the information that other NML files split between the source and train constructs. The main difference is the hyper_opt block, where users provide a list of candidate values for each hyper-parameter to be optimized (see the full example at the end of this section).

NOTE: As of version 2.0, training with multiple GPUs is supported. To train with multiple GPUs, add the key Ngpu with the desired number of GPUs to the train_opt declaration, as shown below.

train_opt Ngpu 2:

The hyper_opt block

The hyper_opt block defines the hyper-parameters that the spectral optimization algorithm will compare. AI Studio currently supports optimization over only the following hyper-parameters: optimizer, learning rate, momentum, decay rate, and batch size. If choices for any of these are not given, the system uses a default list of choices for that hyper-parameter.

To fix any of the five hyper-parameters to a single value, declare the hyper-parameter as a list with one choice, as in the sketch below. To optimize and train with multiple GPUs, users can set "n_gpus" to the desired number of GPUs; the default value of "n_gpus" is 1.
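For example, the following hyper_opt block (the values here are chosen purely for illustration) fixes the momentum, decay rate, and batch size by giving each a single-choice list, while still searching over the optimizer and learning rate:

hyper_opt:
    opt_options = ['sgd', 'adam'],
    lr_options = [0.01, 0.001],
    momentum_options = [0.9],
    decay_options = [0.0],
    batch_options = [64];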

Optimizer

Optimizer choices are declared with the keyword "opt_options". The optimizers currently supported by AI Studio are "sgd", "rmsprop", "adam", and "adamax". The default choice list for opt_options is:

opt_options = ['sgd', 'rmsprop', 'adam', 'adamax'];

Learning rate

Learning rate choices are declared with the keyword "lr_options"; the choices may be any positive numbers. The default choice list for lr_options is:

lr_options = [0.3, 0.1, 0.03, 0.01, 0.003, 0.001, 0.0003, 0.0001];

Momentum

Momentum choices are declared with the keyword "momentum_options"; the choices may be any numbers between 0 and 1. The default choice list for momentum_options is:

momentum_options = [0.99, 0.9, 0];

Decay rate

Decay rate choices are declared with the keyword "decay_options"; the choices may be any non-negative numbers. The default choice list for decay_options is:

decay_options = [1e-4, 0];

Batch size

Batch size choices are declared with the keyword "batch_options"; the choices may be any positive integers. The default choice list for batch_options is:

batch_options = [32, 64, 128, 256];

NOTE: When optimizing over batch size, make sure that every batch size in the list is small enough that an entire batch of data fits in GPU memory.
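For instance, on a GPU with limited memory the search can be restricted to smaller batch sizes (the values below are illustrative); any hyper-parameter left out of the block falls back to its default choice list:

hyper_opt:
    batch_options = [16, 32, 64];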

An example of a train_opt construct is shown below.

train_opt:
    bind = "/Users/Downloads/spectro_test/cifar10_small.csv" ;
    input:
        x ~ from "image"
            -> image: [shape = [32,32], channels = 3]
            -> ImageDataGenerator: [];

    output:
        y ~ from "output" -> flat: [10] -> FlatDataGenerator: [] ;

    hyper_opt:
        opt_options = ['sgd', 'adam', 'adamax'],
        lr_options = [0.03, 0.01, 0.003, 0.001, 0.0003],
        momentum_options = [0.99, 0.9, 0.0],
        decay_options = [0.0];

    params:
        shuffle = True,
        shuffle_init = True;

    compile:
        loss = 'categorical_crossentropy',
        metrics = ['accuracy'] ;

    run:
        epochs = 10;

dashboard: ;
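In this example, the decay rate is fixed to 0.0 by declaring decay_options as a single-choice list, while batch_options is omitted, so the default choice list of [32, 64, 128, 256] is used for the batch size.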