Eager Execution

Version 1.0

(C) 2020 - Umberto Michelucci, Michela Sperti

This notebook is part of the book Applied Deep Learning: a case based approach, 2nd edition from APRESS by U. Michelucci and M. Sperti.

Notebook Learning Goals

By the end of this notebook you will know how to approach a computational problem with TensorFlow 2.X. You will learn about its most important feature, eager execution, and become familiar with the TensorFlow environment.

TensorFlow Setup

TensorFlow 2.X can be treated as a Python library.

# general libraries
import numpy as np
import matplotlib.pyplot as plt

# tensorflow libraries
import tensorflow as tf

The default version of TensorFlow is 2.X and eager execution is enabled by default.

# check tensorflow version
print(tf.__version__)
2.3.0
# check eager execution
tf.executing_eagerly()
True

What does Eager Execution mean?

Eager execution is the most evident change between TensorFlow 1.X and TensorFlow 2.X.

TensorFlow 1.X was designed with a static computational graph approach, meaning that, if you have to perform an operation, you must first describe what your model should do and only after that you can execute the program (inside a session, separated from the graph’s definition). TensorFlow 2.X is instead based on an imperative programming approach, meaning that you describe how your model has to obtain the result and all operations are executed immediately, returning concrete values as output.

This change made TensorFlow more pythonic, easier to understand and to use. In particular, as stated in its official documentation, eager execution provides:

  • an intuitive interface: you can structure your code naturally and use Python data structures;

  • easier debugging: you can use standard Python debugging tools for immediate error reporting;

  • natural control flow: you can use Python control flow instead of graph control flow, simplifying the specification of dynamic models.

Before giving an example on eager execution, let’s begin with a quick overview of TensorFlow basics.

TensorFlow Basic Usage

Tensors

If you are familiar with NumPy, you know that its basic units are arrays. The basic units of TensorFlow are tensors. A tensor is a multi-dimensional array, and therefore a generalization of vectors and matrices.

Each tensor in TensorFlow is characterized by a type (dtype) and a shape. In TensorFlow 2.X a tf.Tensor is immutable: once it has been created you cannot change its type or its shape in place, but operations such as tf.cast and tf.reshape return a new tensor with a different type or shape.

The number of dimensions (axes) of a tensor is called its rank. For example, the simplest tensor is a scalar, which has rank 0. An array (vector) has rank 1 and a two-dimensional matrix has rank 2. The rank can be calculated as the length of a tensor’s shape.

Let’s create some basic tensors.

# SCALAR TENSOR
# This will be an int32 tensor by default; see "dtypes" below.
# A scalar is a tensor without shape (rank is 0 in this case).
# When printing the tensor, you see its value, its shape and its type.
rank_0_tensor = tf.constant(4)
print(rank_0_tensor)
tf.Tensor(4, shape=(), dtype=int32)
# ARRAY TENSOR
# Let's make this a float tensor.
# An array (a list of elements) is a tensor of rank 1.
rank_1_tensor = tf.constant([2.0, 3.0, 4.0])
print(rank_1_tensor)
tf.Tensor([2. 3. 4.], shape=(3,), dtype=float32)
# MATRIX TENSOR
# If we want to be specific, we can set the dtype (see below) at creation time.
# A bidimensional matrix is a tensor of rank 2.
rank_2_tensor = tf.constant([[1, 2],
                             [3, 4],
                             [5, 6]], dtype=tf.float16)
print(rank_2_tensor)
tf.Tensor(
[[1. 2.]
 [3. 4.]
 [5. 6.]], shape=(3, 2), dtype=float16)
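
The type, shape and rank described earlier can be read directly from a tensor. The following is a minimal sketch (the variable names are chosen here for illustration) that also shows tf.cast and tf.reshape returning new tensors while leaving the original untouched.

```python
import tensorflow as tf

# The same matrix as above: 3 rows, 2 columns, stored as float16.
t = tf.constant([[1, 2],
                 [3, 4],
                 [5, 6]], dtype=tf.float16)

print(t.dtype)             # element type of the tensor
print(t.shape)             # size along each axis
print(len(t.shape))        # rank computed as the length of the shape
print(tf.rank(t).numpy())  # the same rank, computed by TensorFlow

# tf.cast and tf.reshape do not modify t: they return new tensors.
t32 = tf.cast(t, tf.float32)
flat = tf.reshape(t, [6])
print(t32.dtype, flat.shape)
```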

You can go on and create n-dimensional tensors.

Moreover, you can perform all basic mathematical operations with tensors.
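
As a small illustration of both statements, the sketch below builds a rank-3 tensor and applies a few basic operations to two matrices. The values are arbitrary and chosen only for this example.

```python
import tensorflow as tf

# A rank-3 tensor: 2 blocks, each with 3 rows and 4 columns.
rank_3_tensor = tf.zeros([2, 3, 4])
print(rank_3_tensor.shape)

x = tf.constant([[1.0, 2.0],
                 [3.0, 4.0]])
y = tf.constant([[10.0, 20.0],
                 [30.0, 40.0]])

elementwise_sum = x + y       # same as tf.add(x, y)
elementwise_prod = x * y      # same as tf.multiply(x, y)
matrix_prod = x @ y           # same as tf.matmul(x, y)
total = tf.reduce_sum(x)      # sum of all elements of x

print(elementwise_sum)
print(elementwise_prod)
print(matrix_prod)
print(total)
```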

Variables

The three most important types of tensors are:

  • tf.constant

  • tf.Variable

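  • tf.placeholder (TensorFlow 1.X only; in TensorFlow 2.X it is available only as tf.compat.v1.placeholder, for graph-mode code)

The values of tf.constant and tf.placeholder are immutable: once they have a value, it does not change (for a placeholder, within a single session run in TensorFlow 1.X).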

A tf.Variable contains values that can change while the program runs, for example because they are parameters that must be optimized for a specific problem.

# create a variable
a = tf.Variable([2.0, 3.0])
# create another variable b based on the value of a
b = tf.Variable(a)
a.assign([5, 6]) # assign changes the values of a in place
# The two variables do not share the same memory.

# a and b are different
print(a.numpy())
print(b.numpy())

# There are other versions of assign
print(a.assign_add([2,3]).numpy())  # [7. 9.]
print(a.assign_sub([7,9]).numpy())  # [0. 0.]
[5. 6.]
[2. 3.]
[7. 9.]
[0. 0.]
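
The difference between a constant and a variable can be made concrete with a short sketch. It assumes eager execution; the names c, v and constant_changed are used only for this illustration.

```python
import tensorflow as tf

c = tf.constant([2.0, 3.0])
v = tf.Variable([2.0, 3.0])

# A constant tensor is immutable: item assignment is rejected by Python.
try:
    c[0] = 10.0
    constant_changed = True
except TypeError:
    constant_changed = False

# A variable can be updated in place, even one element at a time.
v[0].assign(10.0)

print(constant_changed)
print(c.numpy())
print(v.numpy())
```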

With eager execution, TensorFlow operations are immediately evaluated and return their values to you.

tf.Tensor objects reference concrete values instead of symbolic handles to nodes in a computational graph. Therefore, it’s easy to inspect results using print() or a debugger.

# let's define a constant tensor and print it
a = tf.constant([[1, 2], [3, 4]])
print(a)
tf.Tensor(
[[1 2]
 [3 4]], shape=(2, 2), dtype=int32)

A striking example of how well TensorFlow 2.X integrates with Python is its compatibility with the NumPy library. NumPy operations accept tf.Tensor arguments. The TensorFlow tf.math operations convert Python objects and NumPy arrays to tf.Tensor objects. The tf.Tensor.numpy method returns the object’s value as a NumPy ndarray.

# broadcasting is supported
b = tf.add(a, 1)
# numpy is easily integrated
c = np.multiply(a, b)
print(c)
[[ 2  6]
 [12 20]]

If you don’t know what broadcasting means, have a look at the Further Readings section.
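
The conversions in both directions can be sketched as follows. The array is created with an explicit int32 dtype (an assumption made here) so that it matches the default integer type of tf.constant; TensorFlow does not mix int32 and int64 operands automatically.

```python
import numpy as np
import tensorflow as tf

arr = np.array([[1, 2],
                [3, 4]], dtype=np.int32)

# A TensorFlow operation converts the NumPy array to a tf.Tensor.
t = tf.multiply(arr, 2)

# Broadcasting: the rank-1 tensor is added to every row of t.
row = tf.constant([10, 20])
summed = t + row

# The numpy method converts the tensor back to a NumPy ndarray.
back = summed.numpy()

print(type(t))
print(summed)
print(type(back))
```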

A major benefit of eager execution is that all the functionality of the host language is available while your model is executing. Let’s see an example of dynamic control flow, in which ordinary Python statements decide step by step which operations to execute.
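
Before the Sudoku example, here is a minimal sketch of the idea: a Python while loop and an if statement whose conditions depend on tensor values. The function collatz_steps is a hypothetical helper written for this illustration, and it assumes eager execution, so that each comparison yields a concrete boolean value.

```python
import tensorflow as tf

def collatz_steps(start):
    """Counts how many Collatz steps are needed to go from start to 1."""
    n = tf.constant(start)
    steps = 0
    # Each condition is a scalar boolean tensor that Python evaluates
    # immediately, so plain while/if statements can be used.
    while n != 1:
        if n % 2 == 0:
            n = n // 2
        else:
            n = 3 * n + 1
        steps += 1
    return steps

print(collatz_steps(6))
```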

Dynamic Control Flow in TensorFlow: Solving Sudoku

To practice simple operations between tensors in TensorFlow, we will write a recursive program that solves the famous logic game Sudoku. This is possible thanks to eager execution, which makes it possible to mix TensorFlow code with ordinary Python and to execute operations immediately. Notice that the same problem could also be solved using NumPy, for example.

Sudoku Problem

Given a partially filled tensor of shape (9, 9), the goal is to assign digits (from 1 to 9) to the empty cells so that every row, column, and sub-tensor of shape (3, 3) contains exactly one instance of the digits from 1 to 9.

# input tensor (this is a possible example, you can change values for others)
input = tf.constant([[3, 0, 6, 5, 0, 8, 4, 0, 0], 
                     [5, 2, 0, 0, 0, 0, 0, 0, 0], 
                     [0, 8, 7, 0, 0, 0, 0, 3, 1], 
                     [0, 0, 3, 0, 1, 0, 0, 8, 0], 
                     [9, 0, 0, 8, 6, 3, 0, 0, 5], 
                     [0, 5, 0, 0, 9, 0, 6, 0, 0], 
                     [1, 3, 0, 0, 0, 0, 2, 5, 0], 
                     [0, 0, 0, 0, 0, 0, 0, 7, 4], 
                     [0, 0, 5, 2, 0, 6, 3, 0, 0]])

The simplest approach is to generate all possible sets of numbers between 1 and 9 to fill all empty cells and test them (checking if the final tensor meets the required constraints).

Instead, we will solve the problem with a different approach: backtracking. This technique is used for problems in which several constraints must be met: it tries a candidate, goes back as soon as a choice cannot lead to a solution, and tries the next candidate until a solution is found or all possibilities are exhausted.

We define a function to print a tensor (print_tensor), a function to check if there are empty cells left inside the tensor (find_empty_cell), a function to check if the current assigned number meets the Sudoku’s constraints (check_validity) and finally the recursive function that takes an input tensor and tries to fill it (generate_elements).

def print_tensor(tensor):
  """Prints the tensor given as input (i.e. the sudoku grid)."""
  print(tensor)
def find_empty_cell(tensor):
  """Returns True if the tensor still contains an empty cell
  (a zero), otherwise returns False."""
  pos0 = tf.where(tensor == 0) # find every 0 present inside tensor
  if len(pos0) != 0: # an empty cell has been found
    return True
  else: # no empty cells left
    return False
def check_validity(tensor, i, j, d):
  """Checks whether the digit d can be placed at position (i, j)
  without violating the Sudoku constraints."""
  # a list of all initial and final indices of the sub-tensors,
  # to be quickly identified inside the function
  subtensors = [[0,3,0,3],[0,3,3,6],[0,3,6,9],
                [3,6,0,3],[3,6,3,6],[3,6,6,9],
                [6,9,0,3],[6,9,3,6],[6,9,6,9]]
  # find where the digit d already appears in row i and in column j
  pos_row = tf.where(tensor[i,:] == d)
  pos_col = tf.where(tensor[:,j] == d)
  # the digit must not already be present in the row or in the column
  if len(pos_row) != 0 or len(pos_col) != 0:
    return False
  # find the sub-tensor that contains (i, j) and check it as well
  for st in subtensors:
    if i >= st[0] and i < st[1] and j >= st[2] and j < st[3]:
      pos_sub = tf.where(tensor[st[0]:st[1],st[2]:st[3]] == d)
      if len(pos_sub) != 0:
        return False
  return True # all constraints are satisfied!
def generate_elements(tensor):
  """Takes an input tensor and recursively tries to insert an element,
  checking the tensor's validity."""
  tensor_tmp = tf.Variable(tensor)
  # find an empty cell
  if not find_empty_cell(tensor_tmp):
    # if no empty cells are left, you have successfully filled the Sudoku!
    print_tensor(tensor_tmp) 
    return True
  # take the first empty cell and try to fill it
  pos0 = tf.where(tensor_tmp == 0)
  i, j = pos0[0][0], pos0[0][1]
  # try to fill the empty cell with a number from 1 to 9, checking validity
  for d in range(1, 10):
    # check tensor's validity
    if check_validity(tensor_tmp, i, j, d):
      # if all constraints are satisfied, assign the current digit to
      # the current position
      tensor_tmp = tensor_tmp[i, j].assign(d)
      # recursion: try to fill the remaining empty cells
      if generate_elements(tensor_tmp):
        return True
      # if the recursive call fails, reset the current position to zero
      # (backtrack) and try the next digit
      tensor_tmp = tensor_tmp[i, j].assign(0)
  return False # no digit fits here: the caller must backtrack
generate_elements(input)
<tf.Variable 'Variable:0' shape=(9, 9) dtype=int32, numpy=
array([[3, 1, 6, 5, 7, 8, 4, 9, 2],
       [5, 2, 9, 1, 3, 4, 7, 6, 8],
       [4, 8, 7, 6, 2, 9, 5, 3, 1],
       [2, 6, 3, 4, 1, 5, 9, 8, 7],
       [9, 7, 4, 8, 6, 3, 1, 2, 5],
       [8, 5, 1, 7, 9, 2, 6, 4, 3],
       [1, 3, 8, 9, 4, 7, 2, 5, 6],
       [6, 9, 2, 3, 5, 1, 8, 7, 4],
       [7, 4, 5, 2, 8, 6, 3, 1, 9]], dtype=int32)>
True
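
As a sanity check on the grid printed above, the constraints from the Sudoku Problem section can be verified with a few tensor operations. The sketch below is an addition for illustration and not part of the solver itself; is_valid_solution is a hypothetical helper name, and the solved grid is copied from the output above.

```python
import tensorflow as tf

def is_valid_solution(grid):
    """Returns True if every row, column and (3, 3) sub-tensor
    contains each digit from 1 to 9 exactly once."""
    grid = tf.convert_to_tensor(grid, dtype=tf.int32)
    digits = tf.range(1, 10, dtype=tf.int32)
    # Rearrange the grid so that each row of `blocks` holds one sub-tensor.
    blocks = tf.reshape(
        tf.transpose(tf.reshape(grid, (3, 3, 3, 3)), perm=(0, 2, 1, 3)),
        (9, 9))
    for units in (grid, tf.transpose(grid), blocks):
        # After sorting, a valid unit is exactly 1, 2, ..., 9.
        sorted_units = tf.sort(units, axis=1)
        if not bool(tf.reduce_all(sorted_units == digits)):
            return False
    return True

solution = [[3, 1, 6, 5, 7, 8, 4, 9, 2],
            [5, 2, 9, 1, 3, 4, 7, 6, 8],
            [4, 8, 7, 6, 2, 9, 5, 3, 1],
            [2, 6, 3, 4, 1, 5, 9, 8, 7],
            [9, 7, 4, 8, 6, 3, 1, 2, 5],
            [8, 5, 1, 7, 9, 2, 6, 4, 3],
            [1, 3, 8, 9, 4, 7, 2, 5, 6],
            [6, 9, 2, 3, 5, 1, 8, 7, 4],
            [7, 4, 5, 2, 8, 6, 3, 1, 9]]

print(is_valid_solution(solution))
```

Sorting each unit and comparing it with the digits 1 to 9 checks all three constraints with the same code, at the cost of rearranging the grid once for the sub-tensors.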

Eager Gradients Computation

One of the most important steps in training a neural network is weight optimization, done by finding the minimum of a loss function. The gradients needed for this are computed with the backpropagation algorithm (see the Further Readings section for additional material). Gradient-based methods for minimizing a function require you to be able to compute its gradients.

Now we will discuss how to compute gradients with TensorFlow in eager execution. This is an example of automatic differentiation.

# During eager execution, use tf.GradientTape to trace operations 
# for computing gradients later
w = tf.Variable(3.0) # define a tf variable and assign it the value 3
# define a function using tf.GradientTape
with tf.GradientTape() as tape:
  loss = w * w * w 
# calculate gradient with respect to a specific variable (w in this case)
grad = tape.gradient(loss, w)
print(grad)  # => tf.Tensor(27., shape=(), dtype=float32)
tf.Tensor(27.0, shape=(), dtype=float32)
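
To connect the gradient to the minimization discussed at the start of this section, the following sketch applies plain gradient descent to a simple quadratic loss. The loss function, the learning rate and the number of steps are assumptions chosen for this illustration; a real model would normally use an optimizer from tf.keras.optimizers.

```python
import tensorflow as tf

w = tf.Variable(0.0)   # starting point of the optimization
learning_rate = 0.1    # step size (chosen for this example)

for step in range(100):
    # Record the operations that compute the loss.
    with tf.GradientTape() as tape:
        loss = (w - 2.0) ** 2   # minimum at w = 2
    # Derivative of the loss with respect to w.
    grad = tape.gradient(loss, w)
    # Move w a small step against the gradient.
    w.assign_sub(learning_rate * grad)

print(w.numpy())   # close to 2, the minimizer of the loss
```

A new tape is opened at every iteration because, by default, the resources held by a tf.GradientTape are released after a single call to gradient.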

Even though TensorFlow 1.X has some disadvantages (a steep learning curve, difficult debugging, counter-intuitive semantics and poor integration with Python), it is still a powerful and highly expressive tool. Therefore, a good understanding of computational graphs and of the logic behind TensorFlow 1.X is recommended, since it helps to better understand TensorFlow 2.X.

Exercises

  1. [Easy Difficulty] Create a tensor of rank 5.

  2. [Easy Difficulty] Calculate the derivative of \(f=x^2+y-z^2\) with respect to \(z\), evaluated at \(z=2\).

References

  1. https://www.tensorflow.org/guide/eager (eager execution official documentation)

Further Readings

NumPy package

  1. Full documentation (with many tutorials and examples already implemented): https://numpy.org/doc/stable/

  2. Broadcasting in NumPy (https://numpy.org/doc/stable/user/basics.broadcasting.html)

Backpropagation algorithm in neural networks

  1. Section 6.5 of “Deep Learning” book by Ian Goodfellow, Yoshua Bengio and Aaron Courville (https://www.deeplearningbook.org/contents/mlp.html), freely available online

Automatic differentiation

  1. Baydin, Atılım Güneş, et al. “Automatic differentiation in machine learning: a survey.” The Journal of Machine Learning Research 18.1 (2017): 5595-5637.