Genetic Algorithm for hyper-parameter optimization of a Convolutional Neural Network

https://blog.coast.ai/lets-evolve-a-neural-network-with-a-genetic-algorithm-code-included-8809bece164

http://nn.cs.utexas.edu/downloads/papers/stanley.ec02.pdf

In this post, I detail my implementation of a genetic algorithm to find the optimal architecture of a neural network for classifying images of dogs and cats.

Genetic algorithms are commonly used to generate high-quality solutions to optimization and search problems by relying on bio-inspired operators: mutation, crossover, and selection. OpenAI published a paper called Evolution Strategies as a Scalable Alternative to Reinforcement Learning in which they showed that evolution strategies, while being less data-efficient than RL, offer many benefits. Abandoning gradient computation allows such algorithms to be evaluated more efficiently, and it is easy to distribute the computation of an ES algorithm to thousands of machines in parallel. By running the algorithm from scratch many times, they also showed that policies discovered with ES tend to be more diverse than policies discovered by RL algorithms.

I would like to point out that the problem of identifying a machine learning model, such as designing a neural net's architecture, is itself one where we cannot directly compute gradients. While RL, evolution strategies, GAs, etc. can be applied to search the space of model architectures, in this post I focus only on applying these ideas to search for the hyper-parameters of a pre-defined model.

The model is composed of a feature detector, a few blocks of convolutional layers similar to VGG16, and a fully connected classifier with a sigmoid activation at the output. The model is built with Keras (TensorFlow backend). The genetic algorithm is applied to the classifier only; the architecture of the feature detector is left unchanged. One additional caveat is that the models are constrained to have fewer than 500,000 trainable parameters. The motivation is two-fold: (1) we narrow down the search space, and (2) we can find the best-performing lightweight model suitable for machines with limited resources such as cellphones, Raspberry Pis, etc.
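To make this concrete, here is a minimal sketch of how such a specimen could be assembled in Keras. The exact convolutional configuration, the input size, and the helper name design_to_model are assumptions made for illustration; only the general structure, a fixed feature detector topped by the evolved classifier (described by a list of per-layer attributes, called cls_design later in the post) under a 500,000-parameter budget, follows the text.

from tensorflow.keras import layers, models

def design_to_model(cls_design, input_shape=(64, 64, 3), max_params=500_000):
    """Fixed VGG-like feature detector topped by the evolved classifier (illustrative)."""
    model = models.Sequential()
    # Feature detector: three small VGG-style conv/pool blocks, kept fixed by the GA
    model.add(layers.Conv2D(16, (3, 3), activation="relu", padding="same", input_shape=input_shape))
    model.add(layers.MaxPooling2D((2, 2)))
    model.add(layers.Conv2D(32, (3, 3), activation="relu", padding="same"))
    model.add(layers.MaxPooling2D((2, 2)))
    model.add(layers.Conv2D(64, (3, 3), activation="relu", padding="same"))
    model.add(layers.MaxPooling2D((2, 2)))
    model.add(layers.Flatten())  # 8 x 8 x 64 = 4096 features with the shapes above
    # Evolved classifier: one Dense (+ optional Dropout) layer per entry of cls_design,
    # the last entry being the single sigmoid unit that outputs the class probability
    for cfg in cls_design:
        model.add(layers.Dense(cfg["units"], activation=cfg["activation"]))
        if cfg["dropout"] is not None:
            model.add(layers.Dropout(1.0 - cfg["dropout"]))  # cls_design stores the keep probability
    # Enforce the light-weight constraint
    assert model.count_params() < max_params, "specimen exceeds the 500,000-parameter budget"
    return model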

[Figure: the architecture of a specimen]

A genetic algorithm is a heuristic search and optimization technique inspired by natural evolution.
The basic idea is to start with a population of specimens (individuals) with random attributes. Through different processes/operators, the specimens are allowed to evolve, i.e., the attributes are tuned so as to achieve higher performance. The performance of a specimen is evaluated with a fitness function.

For our purpose, the attributes are: the number of hidden layers (from 1 to 10), the number of hidden units (constrained so that the total number of trainable parameters stays below 500,000), the activation function (ReLU, eLU, swish), and the dropout value (between 0.4 and 1). Our fitness function is the loss on the validation set.
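As a sketch of what sampling one random set of attributes could look like under these constraints (the helper name, the per-layer unit cap, and the exact way the 500,000-parameter budget is enforced are assumptions, not the actual implementation):

import numpy as np

ACTIVATIONS = ["relu", "elu", "swish"]

def random_classifier_design(n_features, max_params=500_000):
    """Sample a random classifier design; n_features is the flattened feature-detector output size."""
    n_layers = np.random.randint(1, 11)            # 1 to 10 hidden layers
    design, fan_in, n_params = [], n_features, 0
    for _ in range(n_layers):
        # largest layer that still fits in the remaining parameter budget (weights + biases)
        max_units = (max_params - n_params) // (fan_in + 1)
        if max_units < 1:
            break
        units = np.random.randint(1, max_units + 1)
        n_params += fan_in * units + units
        design.append({
            "units": int(units),
            "activation": np.random.choice(ACTIVATIONS),
            "dropout": float(np.random.uniform(0.4, 1.0)),  # fraction of nodes kept active
        })
        fan_in = units
    # output layer: a single sigmoid unit for the cat/dog decision
    design.append({"units": 1, "activation": "sigmoid", "dropout": None})
    return design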

Implementation of the genetic algorithm in 8 steps

1) Initialization:

We build a population of 20 specimens (ConvNets), where the hyper-parameters and architecture of the classifier are randomly generated. Each specimen is a dictionary of the form:

specimen = {
            "model": <keras model>, 
            "cls_design": [{"units": 100, "activation": "relu", "dropout": 0.5}, 
                           {"units": 30, "activation": "relu", "dropout": 0.8},
                           ...
                           ], 
            "score": 0.34
            }

n_population = 20

cls_design (a list of dict()) gives the architecture of the classifier, i.e., the attributes layer by layer: units is the number of hidden units in the layer, activation is the non-linear function applied to the layer's output, and dropout is the fraction of nodes kept active during training. score is the loss on the validation set at the end of training.
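Putting the pieces together, the initialization could look roughly like this, reusing the hypothetical random_classifier_design and design_to_model helpers sketched earlier (in the actual code, the author's design2model method plays the model-building role):

def random_specimen(n_features=4096):
    """Create one specimen with a randomly generated classifier design."""
    cls_design = random_classifier_design(n_features)
    return {
        "model": design_to_model(cls_design),  # Keras model built from the design
        "cls_design": cls_design,
        "score": None,                         # validation loss, filled in after training
    }

# Step 1: a population of n_population = 20 randomly generated ConvNets
population = [random_specimen() for _ in range(n_population)]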

2) Get the score:

Train all the specimens (ConvNets) on a small subset of the dataset (5,000 images) to speed up the search, and record the loss on the validation set; a sketch of train_specimen is given after the snippet below.

train_data, validation_data = build_dataset(5000, 0.25)
for idx, specimen in enumerate(population):
    specimen["score"], val_acc = train_specimen(train_data, validation_data, specimen["model"], batch_sz=128,
                                                n_epochs=n_epochs, lr=0.0001, patience=patience)
    print("specimen {} | score: {} | accuracy: {}".format(idx, specimen["score"], val_acc))

3) Sort the specimens in increasing order of the validation loss. The loss is the binary cross-entropy computed on the validation set:

$$\mathcal{L} = -\frac{1}{N}\sum_{j=1}^{N}\left[\, y_j \log \hat{y}_j + (1 - y_j)\log(1 - \hat{y}_j)\,\right]$$

where $j$ is the sample index, $y_j \in \{0, 1\}$ is the true label, $\hat{y}_j$ is the predicted probability, and $N$ is the number of validation samples. Smaller loss is better.

from operator import itemgetter

def sorted_specimen(self, population):
    # sort by validation loss, best (smallest) first
    return sorted(population, key=itemgetter('score'), reverse=False)

4) Select the top 5 specimens:

n_top = 5
def keep_top_specimen(self, n_top, sorted_population):
    return sorted_population[0:n_top]

5) Randomly select 5 more specimens from the population:

n_random_pick = 5
def keep_simple_specimen(self, n_top, n_random_pick, sorted_population):
    # pick among the specimens that were not kept in step 4), without repetition
    idx = np.arange(n_top, len(sorted_population))
    picks = np.random.choice(idx, n_random_pick, replace=False)
    return [sorted_population[i] for i in picks]

6) Generate 5 ConvNets with the breed operator:

Randomly pick 2 specimens (the parents) and decide which attributes to transfer to their child. The inherited attribute can be an entire layer, or the units, activation, or dropout value from one of the parents (a short usage sketch is given after the function below).

def breed(self, parents):
    # index of the parent with the most hidden layers; with two parents,
    # the other index is simply 1 - deepest_parent
    deepest_parent = np.argmax([len(p["cls_design"]) for p in parents])
    shallowest_parent = 1 - deepest_parent

    # initialize the child
    child_design = []

    # iterate over the hidden layers of the deepest parent (its output layer is skipped)
    for layer_id in range(len(parents[deepest_parent]["cls_design"]) - 1):
        layer = parents[deepest_parent]["cls_design"][layer_id]

        if layer_id > (len(parents[shallowest_parent]["cls_design"]) - 1):
            # the shallowest parent does not have this layer level:
            # randomly decide to inherit the deepest parent's layer, then stop the breed
            if np.random.choice([0, 1]):
                child_design.append(layer)
            break
        else:
            # both parents have this layer level: pick each gene (units,
            # activation, dropout) from one parent or the other at random
            child_layer = dict()
            for key in layer.keys():
                pick_gene_from = np.random.choice([deepest_parent, shallowest_parent])
                child_layer[key] = parents[pick_gene_from]["cls_design"][layer_id][key]
            child_design.append(child_layer)

    # append the output layer: a single sigmoid unit
    child_design.append({"units": 1, "activation": "sigmoid", "dropout": None})
    child = {"model": None, "cls_design": child_design, "score": None}
    # build the Keras model for this design and return the complete specimen
    return self.design2model(child)
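For illustration, the breed step could be driven as follows, where survivors holds the 10 specimens kept in steps 4) and 5) and ga is an instance of the class containing the methods above (both names are hypothetical):

n_breed = 5
children = []
for _ in range(n_breed):
    # pick two distinct parents among the survivors
    i, j = np.random.choice(len(survivors), size=2, replace=False)
    children.append(ga.breed([survivors[i], survivors[j]]))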

7) Generate 5 ConvNets with the mutate operator:

Randomly pick a specimen from the population and change the value of one attribute: the activation function, the number of units, or the dropout value. The change in the number of units or dropout can be $+/- 50\%$ of the original value. Note that we clip the number of units so that it stays $\geq 1$.

import copy

def mutate(self, specimen):
    # pick a layer to mutate; the output layer (last entry) is never mutated
    pick_layer = np.random.choice( len(specimen["cls_design"]) - 1 )
    pick_key = np.random.choice( list( specimen["cls_design"][0].keys() ) )
    # work on a deep copy so the original specimen is left untouched
    new_specimen = copy.deepcopy(specimen)
    # mutation factors between 0.5 and 1.5, excluding 1.0 (allows +/-50% mutations)
    range_mutation = [i / 10 for i in range(5, 16) if i != 10]

    if pick_key == "units":
        mutation_factor = np.random.choice(range_mutation)
        mutation = int(specimen["cls_design"][pick_layer]["units"] * mutation_factor)
        mutation = int(np.clip(mutation, 1, None))  # units cannot drop below 1
        new_specimen["cls_design"][pick_layer]["units"] = mutation

    elif pick_key == "dropout":
        old_value = specimen["cls_design"][pick_layer]["dropout"]
        mutation = old_value
        while mutation == old_value:
            # keep drawing until the (clipped) value actually changes
            mutation_factor = np.random.choice(range_mutation)
            mutation = float(np.clip(old_value * mutation_factor, 0.4, 1))
        new_specimen["cls_design"][pick_layer]["dropout"] = mutation

    else:  # mutate the activation function
        old_gene = specimen["cls_design"][pick_layer]["activation"]
        mutation = old_gene
        while mutation == old_gene:
            mutation = np.random.choice(self.configs.list_activations())
        new_specimen["cls_design"][pick_layer]["activation"] = mutation

    # rebuild the Keras model for the mutated design and reset its score
    return self.design2model(new_specimen)

8) Build a new population:

The new population is made of the specimens produced by the four operators (select best, select random, breed, and mutate) and has the same size as the previous population: 5 + 5 + 5 + 5 = 20 specimens.

We then iterate step 2 through step 8 with the new population.
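For completeness, one full generation (steps 2 through 8) could be wired up roughly as follows; ga, n_generations, and the loop structure are assumptions made for illustration, while the operator methods are the ones listed above:

n_generations = 10  # illustrative number of generations

for generation in range(n_generations):
    # step 2: train every specimen and record its validation loss
    for specimen in population:
        specimen["score"], _ = train_specimen(train_data, validation_data,
                                              specimen["model"], batch_sz=128,
                                              n_epochs=n_epochs, lr=0.0001,
                                              patience=patience)
    # steps 3-5: rank by validation loss, keep the 5 best and 5 random survivors
    ranked = ga.sorted_specimen(population)
    survivors = (ga.keep_top_specimen(n_top, ranked)
                 + ga.keep_simple_specimen(n_top, n_random_pick, ranked))
    # step 6: 5 children bred from randomly chosen pairs of survivors
    children = []
    for _ in range(5):
        i, j = np.random.choice(len(survivors), size=2, replace=False)
        children.append(ga.breed([survivors[i], survivors[j]]))
    # step 7: 5 mutants obtained by mutating randomly chosen survivors
    mutants = [ga.mutate(survivors[np.random.choice(len(survivors))]) for _ in range(5)]
    # step 8: the next generation keeps the same size as the previous one (20)
    population = survivors + children + mutants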

Results

Genetic algorithms help maintain diversity by keeping track of a diverse set of candidate solutions to reproduce the next generation. In practice, however, most of the solutions in the elite surviving population tend to converge to a local optimum over time. There are more sophisticated variations of GA out there, such as CoSyNe, ESP, and NEAT, where the idea is to cluster similar solutions in the population into different species, in order to maintain better diversity over time.


References:

https://blog.coast.ai/lets-evolve-a-neural-network-with-a-genetic-algorithm-code-included-8809bece164
https://www.youtube.com/watch?v=1i8muvzZkPw
https://www.doc.ic.ac.uk/~nd/surprise_96/journal/vol1/hmw/article1.html
https://ac.els-cdn.com/S0377042705000774/1-s2.0-S0377042705000774-main.pdf?_tid=073f74ff-3596-4291-901c-898f01cb75fc&acdnat=1523809306_304c126235c248115154b47b62baaefd
https://lethain.com/genetic-algorithms-cool-name-damn-simple/

ALTERNATIVE

https://machinelearningmastery.com/grid-search-hyperparameters-deep-learning-models-python-keras/

http://blog.otoro.net/2017/10/29/visual-evolution-strategies/