Counting with Recurrent Neural Networks

Version 1.0

(C) 2020 - Umberto Michelucci, Michela Sperti

This notebook is part of the book Applied Deep Learning: a case based approach, 2nd edition from APRESS by U. Michelucci and M. Sperti.

The purpose of this notebook is to give you a very simple example of an application of Recurrent Neural Networks (RNNs): specifically, an RNN that is able to count how many 1s there are in a vector containing both 0s and 1s.

Notebook Learning Goals

At the end of the notebook you are going to know the most basic structure of an RNN. Moreover, you will be able to apply it to other similar problems that you may encounter.

Libraries Import

import numpy as np
import tensorflow as tf
from random import shuffle
from tensorflow import keras
from tensorflow.keras import layers

We will now create \(2^{15} = 32768\) vectors made of 15 elements each, containing only 1 and 0 values.

We want to have all possible combinations of 1 and 0. An easy way to do this is to take all numbers up to \(2^{15}\) and write them in binary format. To understand why, consider the following simpler example, in which we generate every possible combination of four 0s and 1s by considering every number up to \(2^4\).

Explanation of Data Preparation

 ['{0:04b}'.format(i) for i in range(2**4)]
['0000',
 '0001',
 '0010',
 '0011',
 '0100',
 '0101',
 '0110',
 '0111',
 '1000',
 '1001',
 '1010',
 '1011',
 '1100',
 '1101',
 '1110',
 '1111']

The above code simply formats all numbers produced by range(2**4), from 0 to 2**4 - 1, in binary format with {0:04b}, padding the result to 4 digits.

For our example we do the same with 15 digits, which means formatting all numbers up to 2**15.

Data Preparation

nn = 15    # sequence length
ll = 2**nn # number of sequences: 32768
train_input = ['{0:015b}'.format(i) for i in range(ll)] # every number up to 2^15 in binary format
shuffle(train_input) # shuffle the inputs
# transform each binary string into an array of shape (15, 1),
# e.g. '0100...' becomes [[0], [1], [0], [0], ...]
ti = []
for i in train_input:
  temp_list = []
  for j in i:
    temp_list.append([int(j)])
  ti.append(np.array(temp_list))
train_input = ti

The code above simply transforms a string like '0100' into an array of one-element lists, [[0], [1], [0], [0]], and collects these arrays for all the possible combinations.
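As a quick check of this transformation, a single binary string can be converted directly (a minimal sketch, using only NumPy; the variable names here are illustrative, not from the notebook):

```python
import numpy as np

# Convert one binary string into an array of shape (length, 1),
# mirroring what the loop above does for every training vector.
s = '0100'
as_sequence = np.array([[int(c)] for c in s])
print(as_sequence.tolist())  # [[0], [1], [0], [0]]
print(as_sequence.shape)     # (4, 1)
```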

Then we prepare the target variable: a one-hot encoded version of the counts. That means that if an input vector contains four 1s, the target vector will look like [0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0]: a 1 at index 4, in a vector of length 16, since the count can range from 0 to 15.

Targets Preparation

train_output = []
 
for i in train_input:
    count = 0
    for j in i:
        if j[0] == 1:
            count += 1
    temp_list = ([0]*(nn + 1))
    temp_list[count] = 1
    train_output.append(temp_list)
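The same one-hot encoding can be written more compactly with np.eye, whose rows are exactly the one-hot vectors we need. A sketch (an equivalent alternative, not the notebook's original code):

```python
import numpy as np

nn = 15  # sequence length, as above
vector = np.array([0, 1, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0])  # four 1s
count = int(vector.sum())                   # count the 1s: 4
one_hot = np.eye(nn + 1, dtype=int)[count]  # row 4 of the 16x16 identity
print(one_hot.tolist())  # 1 at index 4, zeros elsewhere
```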

Dataset Splitting

NUM_EXAMPLES = ll - 2000
test_input = train_input[NUM_EXAMPLES:]
test_output = train_output[NUM_EXAMPLES:] # the last 2000 examples are kept for testing
 
train_input = train_input[:NUM_EXAMPLES]
train_output = train_output[:NUM_EXAMPLES] # the first 30768 examples are used for training

Network Building

model = keras.Sequential()

# Embed each element of the sequence into a 15-dimensional vector
model.add(layers.Embedding(input_dim = 15, output_dim = 15))

# Add a LSTM layer with 24 internal units
model.add(layers.LSTM(24))

# Add a Dense layer with 16 units (one for each possible count, from 0 to 15)
# and a softmax activation
model.add(layers.Dense(16, activation = 'softmax'))

model.compile(loss = 'categorical_crossentropy', optimizer = 'adam', metrics = ['categorical_accuracy'])

model.summary()
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
embedding (Embedding)        (None, None, 15)          225       
_________________________________________________________________
lstm (LSTM)                  (None, 24)                3840      
_________________________________________________________________
dense (Dense)                (None, 16)                400       
=================================================================
Total params: 4,465
Trainable params: 4,465
Non-trainable params: 0
_________________________________________________________________
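As a sanity check, the parameter counts in the summary can be reproduced by hand from the standard formulas for Embedding, LSTM, and Dense layers:

```python
embedding = 15 * 15               # input_dim * output_dim = 225
# An LSTM has 4 gates; each gate has input weights (24 x 15),
# recurrent weights (24 x 24), and a bias vector (24).
lstm = 4 * (24 * (15 + 24) + 24)  # = 3840
dense = (24 + 1) * 16             # weights + biases = 400
print(embedding + lstm + dense)   # 4465, matching the summary
```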

Network Training

# we need to convert the input and output to numpy array to be used by the network
train_input = np.array(train_input)
train_output = np.array(train_output)

test_input = np.array(test_input)
test_output = np.array(test_output)
model.fit(train_input, train_output, validation_data = (test_input, test_output), epochs = 10, batch_size = 100)
Epoch 1/10
308/308 [==============================] - 4s 9ms/step - loss: 1.9441 - categorical_accuracy: 0.3063 - val_loss: 1.1784 - val_categorical_accuracy: 0.6840
Epoch 2/10
308/308 [==============================] - 2s 7ms/step - loss: 0.7472 - categorical_accuracy: 0.8332 - val_loss: 0.4515 - val_categorical_accuracy: 0.9270
Epoch 3/10
308/308 [==============================] - 2s 7ms/step - loss: 0.3311 - categorical_accuracy: 0.9554 - val_loss: 0.2360 - val_categorical_accuracy: 0.9630
Epoch 4/10
308/308 [==============================] - 2s 7ms/step - loss: 0.1921 - categorical_accuracy: 0.9658 - val_loss: 0.1530 - val_categorical_accuracy: 0.9675
Epoch 5/10
308/308 [==============================] - 2s 7ms/step - loss: 0.1306 - categorical_accuracy: 0.9760 - val_loss: 0.1071 - val_categorical_accuracy: 0.9775
Epoch 6/10
308/308 [==============================] - 2s 7ms/step - loss: 0.0937 - categorical_accuracy: 0.9824 - val_loss: 0.0778 - val_categorical_accuracy: 0.9870
Epoch 7/10
308/308 [==============================] - 2s 7ms/step - loss: 0.0696 - categorical_accuracy: 0.9905 - val_loss: 0.0586 - val_categorical_accuracy: 0.9930
Epoch 8/10
308/308 [==============================] - 2s 7ms/step - loss: 0.0533 - categorical_accuracy: 0.9921 - val_loss: 0.0446 - val_categorical_accuracy: 0.9945
Epoch 9/10
308/308 [==============================] - 2s 7ms/step - loss: 0.0422 - categorical_accuracy: 0.9924 - val_loss: 0.0367 - val_categorical_accuracy: 0.9960
Epoch 10/10
308/308 [==============================] - 2s 7ms/step - loss: 0.0346 - categorical_accuracy: 0.9943 - val_loss: 0.0301 - val_categorical_accuracy: 0.9955
<tensorflow.python.keras.callbacks.History at 0x7f6b7b3bd990>

After just 10 epochs the network is right in more than 99% of the cases. Just let it run for more epochs and you can reach even higher accuracy.
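To read a predicted count off the network, take the argmax of its 16-dimensional softmax output. A minimal sketch, using a hypothetical prediction vector in place of a real model.predict call:

```python
import numpy as np

# Hypothetical softmax output for one test vector; in practice this
# would come from model.predict(test_input[:1])[0].
prediction = np.array([0.01, 0.01, 0.02, 0.05, 0.80, 0.05, 0.02, 0.01,
                       0.01, 0.005, 0.005, 0.005, 0.002, 0.002, 0.002, 0.001])
predicted_count = int(np.argmax(prediction))
print(predicted_count)  # 4: the network believes the vector contains four 1s
```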

Exercises

  1. [Medium Difficulty] Try to train a fully connected network (like the ones we have discussed so far) to count, and compare it with the RNN we have seen in this chapter. You will see that the fully connected network cannot solve this problem.