(C) 2020 - Umberto Michelucci, Michela Sperti
Notebook Learning Goals¶
By the end of this notebook you will know how to address a computational problem using TensorFlow 2.X. You will learn its most important feature, eager execution, and become familiar with the TensorFlow environment.
TensorFlow 2.X can be treated as a Python library.
# general libraries
import numpy as np
import matplotlib.pyplot as plt

# tensorflow libraries
import tensorflow as tf
The default version of TensorFlow is 2.X and eager execution is enabled by default.
# check tensorflow version
print(tf.__version__)

# check eager execution
tf.executing_eagerly()
What does Eager Execution mean?¶
Eager execution is the most evident change between TensorFlow 1.X and TensorFlow 2.X.
TensorFlow 1.X was designed with a static computational graph approach, meaning that, if you have to perform an operation, you must first describe what your model should do and only after that you can execute the program (inside a session, separated from the graph’s definition). TensorFlow 2.X is instead based on an imperative programming approach, meaning that you describe how your model has to obtain the result and all operations are executed immediately, returning concrete values as output.
This change made TensorFlow more pythonic, easier to understand and to use. In particular, as stated in its official documentation, eager execution provides:
an intuitive interface: you can structure your code naturally and use Python data structures;
easier debugging: you can use standard Python debugging tools for immediate error reporting;
natural control flow: you can use Python control flow instead of graph control flow, simplifying the specification of dynamic models.
Before giving an example of eager execution, let’s begin with a quick overview of TensorFlow basics.
TensorFlow Basic Usage¶
If you are familiar with NumPy, you know that its basic units are arrays. TensorFlow’s basic units are tensors. A tensor is a multi-dimensional array, so it is a generalization of vectors and matrices.
Each tensor in TensorFlow is characterized by a static type (dtype) and a dynamic dimension (shape). This means that, once a tensor has been defined, you cannot change its type, but you can dynamically change its dimensions (for example with tf.reshape, which returns a tensor holding the same values arranged in a new shape).
The number of dimensions of a tensor is usually called its rank. For example, the simplest tensor (of rank 0) is a scalar. An array has rank 1 and a bidimensional matrix has rank 2. Rank can be calculated as the length of a tensor’s shape.
Let’s create some basic tensors.
# SCALAR TENSOR
# This will be an int32 tensor by default; see "dtypes" below.
# A scalar is a tensor without shape (rank is 0 in this case).
# When printing the tensor, you see its value, its shape and its type.
rank_0_tensor = tf.constant(4)
print(rank_0_tensor)
tf.Tensor(4, shape=(), dtype=int32)
# ARRAY TENSOR
# Let's make this a float tensor.
# An array (a list of elements) is a tensor of rank 1.
rank_1_tensor = tf.constant([2.0, 3.0, 4.0])
print(rank_1_tensor)
tf.Tensor([2. 3. 4.], shape=(3,), dtype=float32)
# MATRIX TENSOR
# If we want to be specific, we can set the dtype (see below) at creation time.
# A bidimensional matrix is a tensor of rank 2.
rank_2_tensor = tf.constant([[1, 2],
                             [3, 4],
                             [5, 6]], dtype=tf.float16)
print(rank_2_tensor)

tf.Tensor(
[[1. 2.]
 [3. 4.]
 [5. 6.]], shape=(3, 2), dtype=float16)
You can go on and create n-dimensional tensors.
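As a small illustration of the statement above, the following sketch builds a rank-3 tensor and checks its rank in two ways. The name rank_3_tensor and the chosen shape are only illustrative assumptions, not part of the original notebook.

```python
import tensorflow as tf

# A rank-3 tensor: 2 blocks, each with 3 rows and 4 columns
# (name and shape are illustrative choices).
rank_3_tensor = tf.reshape(tf.range(24), (2, 3, 4))

print(rank_3_tensor.shape)       # (2, 3, 4)
print(len(rank_3_tensor.shape))  # 3: the rank is the length of the shape
print(tf.rank(rank_3_tensor))    # the same rank, returned as a scalar tensor
```

The same pattern extends to higher ranks by adding more entries to the target shape.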
Moreover, you can perform all basic mathematical operations with tensors.
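A minimal sketch of a few such operations is given below. The tensors x and y are hypothetical examples. The last lines also show that, since the dtype of a tensor is fixed, converting to another type is done with tf.cast, which returns a new tensor.

```python
import tensorflow as tf

# two small float32 matrices (illustrative values)
x = tf.constant([[1.0, 2.0], [3.0, 4.0]])
y = tf.constant([[1.0, 1.0], [1.0, 1.0]])

print(tf.add(x, y))       # element-wise sum, same as x + y
print(tf.multiply(x, y))  # element-wise product, same as x * y
print(tf.matmul(x, y))    # matrix product, same as x @ y

# the dtype of x cannot be changed in place: tf.cast builds a new tensor
x_int = tf.cast(x, tf.int32)
print(x.dtype, x_int.dtype)
```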
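The three most important types of tensors are:
tf.constant and tf.placeholder: their values are immutable during a single-session run. Once they have a value, it does not change. Note that tf.placeholder belongs to TensorFlow 1.X; in TensorFlow 2.X it is only available through the tf.compat.v1 compatibility module, since with eager execution values are passed directly to operations;
tf.Variable: it contains values that are going to change while the program runs, because, for example, they must be optimized for a specific problem.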
# create a variable
a = tf.Variable([2.0, 3.0])
# create another variable b based on the value of a
b = tf.Variable(a)
a.assign([5, 6])  # this command changes a values

# Two variables will not share the same memory.
# a and b are different
print(a.numpy())
print(b.numpy())

# There are other versions of assign
print(a.assign_add([2,3]).numpy())  # [7. 9.]
print(a.assign_sub([7,9]).numpy())  # [0. 0.]

[5. 6.]
[2. 3.]
[7. 9.]
[0. 0.]
With eager execution, TensorFlow operations are immediately evaluated and return their values to you.
tf.Tensor objects reference concrete values instead of symbolic handles to nodes in a computational graph. Therefore, it’s easy to inspect results using print() or a debugger.
# let's define a constant tensor and print it
a = tf.constant([[1, 2],
                 [3, 4]])
print(a)

tf.Tensor(
[[1 2]
 [3 4]], shape=(2, 2), dtype=int32)
A striking example of how well TensorFlow 2.X integrates with Python is its compatibility with the NumPy library. NumPy operations accept tf.Tensor arguments. The TensorFlow tf.math operations convert Python objects and NumPy arrays to tf.Tensor objects. The tf.Tensor.numpy method returns the object’s value as a NumPy ndarray.
# broadcasting is supported
b = tf.add(a, 1)

# numpy is easily integrated
c = np.multiply(a, b)
print(c)

[[ 2  6]
 [12 20]]
If you don’t know what broadcasting means, have a look at the Further Readings section.
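For a quick feel of broadcasting before consulting that material, the sketch below adds a tensor of shape (2,) to a tensor of shape (2, 2); the smaller tensor is reused for every row. The names matrix, row, summed, as_array and back are hypothetical, chosen only for this illustration, which also shows the round trip between tf.Tensor and NumPy arrays.

```python
import numpy as np
import tensorflow as tf

matrix = tf.constant([[1, 2], [3, 4]])  # shape (2, 2)
row = tf.constant([10, 20])             # shape (2,)

# row is broadcast: it is added to each row of matrix
summed = matrix + row
print(summed.numpy())  # [[11 22]
                       #  [13 24]]

# tf.Tensor -> NumPy array -> tf.Tensor
as_array = summed.numpy()
back = tf.convert_to_tensor(as_array)
print(type(as_array), back.dtype)
```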
A major benefit of eager execution is that all the functionality of the host language is available while your model is executing. Let’s see an example of dynamic control flow, i.e. control flow in which the next instruction to execute is decided step by step at run time, based on the values computed so far.
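As a warm-up, here is a minimal sketch of Python control flow driven by tensor values. It counts the steps of the Collatz iteration (halve even numbers, map odd n to 3n + 1) until 1 is reached. The function collatz_steps is a hypothetical helper written for this illustration; it relies only on the fact that, in eager mode, a scalar tensor can be used directly in a Python if or while condition.

```python
import tensorflow as tf

def collatz_steps(start):
    """Counts the steps the Collatz iteration needs to go from start to 1."""
    n = tf.constant(start)
    steps = 0
    # the loop condition depends on the current value of the tensor
    while int(n) != 1:
        if n % 2 == 0:      # scalar boolean tensor, evaluated immediately
            n = n // 2
        else:
            n = 3 * n + 1
        steps += 1
    return steps

print(collatz_steps(6))  # 8  (6, 3, 10, 5, 16, 8, 4, 2, 1)
```

The number of iterations is not known in advance: it is decided while the code runs, which is exactly what a static graph would need dedicated control-flow operations for.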
Dynamic Control Flow in TensorFlow: Solving Sudoku¶
To practice with simple operations between tensors using TensorFlow, we will write a recursive program to solve the famous logic game Sudoku. This is possible thanks to eager execution, which makes it possible to integrate TensorFlow code into the Python environment and execute operations immediately. Notice that the same problem could also be solved using NumPy, for example.
Given a partially filled tensor of shape (9, 9), the goal is to assign digits (from 1 to 9) to the empty cells so that every row, column, and sub-tensor of shape (3, 3) contains exactly one instance of the digits from 1 to 9.
# input tensor (this is a possible example, you can change values for others)
input = tf.constant([[3, 0, 6, 5, 0, 8, 4, 0, 0],
                     [5, 2, 0, 0, 0, 0, 0, 0, 0],
                     [0, 8, 7, 0, 0, 0, 0, 3, 1],
                     [0, 0, 3, 0, 1, 0, 0, 8, 0],
                     [9, 0, 0, 8, 6, 3, 0, 0, 5],
                     [0, 5, 0, 0, 9, 0, 6, 0, 0],
                     [1, 3, 0, 0, 0, 0, 2, 5, 0],
                     [0, 0, 0, 0, 0, 0, 0, 7, 4],
                     [0, 0, 5, 2, 0, 6, 3, 0, 0]])
The simplest approach is to generate all possible sets of numbers between 1 and 9 to fill all empty cells and test them (checking if the final tensor meets the required constraints).
Instead, we will solve the problem using another approach: backtracking. This technique is used for problems in which several constraints must be met: it tries a candidate value, moves on while the constraints hold, and goes back to try the next candidate whenever a dead end is reached, until a solution is found or all possibilities are exhausted.
We define a function to print a tensor (print_tensor), a function to check if there are empty cells left inside the tensor (find_empty_cell), a function to check if the current assigned number meets the Sudoku’s constraints (check_validity) and finally the recursive function that takes an input tensor and tries to fill it (generate_elements).
def print_tensor(tensor):
    """Prints the tensor given as input (i.e. the sudoku grid)."""
    print(tensor)
def find_empty_cell(tensor):
    """Returns True if the tensor contains at least one empty cell (a zero), otherwise returns False."""
    pos0 = tf.where(tensor == 0) # find every 0 present inside tensor
    if len(pos0) != 0: # an empty cell has been found
        return True
    else: # no left empty cells
        return False
def check_validity(tensor, i, j, d):
    """Checks, after assigning the current digit, if the tensor meets constraints or not."""
    # a list of all initial and final indices of the sub-tensors
    # (row start, row end, column start, column end),
    # to be quickly identified inside the function
    subtensors = [[0,3,0,3],[0,3,3,6],[0,3,6,9],
                  [3,6,0,3],[3,6,3,6],[3,6,6,9],
                  [6,9,0,3],[6,9,3,6],[6,9,6,9]]
    # check if the current number is already present
    pos_row = tf.where(tensor[i,:] == d)
    pos_col = tf.where(tensor[:,j] == d)
    # check for every row and column
    if len(pos_row) != 0 or len(pos_col) != 0:
        return False
    # check for every sub-tensor
    for st in subtensors:
        if i >= st[0] and i < st[1] and j >= st[2] and j < st[3]:
            pos_sub = tf.where(tensor[st[0]:st[1], st[2]:st[3]] == d)
            if len(pos_sub) != 0:
                return False
    return True # all constraints are satisfied!
def generate_elements(tensor):
    """Takes an input tensor and recursively tries to insert an element and check tensor's validity."""
    tensor_tmp = tf.Variable(tensor)
    # find an empty cell
    if not find_empty_cell(tensor_tmp):
        # if no empty cells are left, you have successfully filled the Sudoku!
        print_tensor(tensor_tmp)
        return True
    # take the first empty cell (row i, column j) and try to fill it
    pos0 = tf.where(tensor_tmp == 0)
    i, j = pos0[0][0], pos0[0][1]
    # try to fill the empty cell with a number from 1 to 9, checking validity
    for d in range(1, 10):
        # check tensor's validity
        if check_validity(tensor_tmp, i, j, d):
            # if all constraints are satisfied, assign the current element to
            # the current position
            tensor_tmp = tensor_tmp[i, j].assign(d)
            # backtracking (recursion): call the function itself on the updated tensor
            if generate_elements(tensor_tmp):
                return True
            # if the recursive call fails, assign a zero to the
            # current position
            tensor_tmp = tensor_tmp[i, j].assign(0)
    return False # continue with backtracking

# solve the example grid defined above
generate_elements(input)
<tf.Variable 'Variable:0' shape=(9, 9) dtype=int32, numpy=
array([[3, 1, 6, 5, 7, 8, 4, 9, 2],
       [5, 2, 9, 1, 3, 4, 7, 6, 8],
       [4, 8, 7, 6, 2, 9, 5, 3, 1],
       [2, 6, 3, 4, 1, 5, 9, 8, 7],
       [9, 7, 4, 8, 6, 3, 1, 2, 5],
       [8, 5, 1, 7, 9, 2, 6, 4, 3],
       [1, 3, 8, 9, 4, 7, 2, 5, 6],
       [6, 9, 2, 3, 5, 1, 8, 7, 4],
       [7, 4, 5, 2, 8, 6, 3, 1, 9]], dtype=int32)>
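One way to double-check a filled grid against the rules stated earlier is sketched below. The helper is_valid_solution is hypothetical (it is not part of the original notebook): it sorts every row, column and (3, 3) sub-tensor and compares the result with the digits from 1 to 9. The grid named solved is copied from the output above.

```python
import tensorflow as tf

def is_valid_solution(grid):
    """Returns True if every row, column and 3x3 block holds the digits 1..9 once."""
    grid = tf.convert_to_tensor(grid)
    digits = tf.range(1, 10, dtype=grid.dtype)
    rows = grid
    cols = tf.transpose(grid)
    # split the grid into 3x3 blocks and lay each block out as one row
    blocks = tf.reshape(
        tf.transpose(tf.reshape(grid, (3, 3, 3, 3)), perm=(0, 2, 1, 3)), (9, 9))
    for group in (rows, cols, blocks):
        # each sorted row of the group must be exactly 1, 2, ..., 9
        if not tf.reduce_all(tf.sort(group, axis=1) == digits):
            return False
    return True

# the solved grid printed above
solved = [[3, 1, 6, 5, 7, 8, 4, 9, 2],
          [5, 2, 9, 1, 3, 4, 7, 6, 8],
          [4, 8, 7, 6, 2, 9, 5, 3, 1],
          [2, 6, 3, 4, 1, 5, 9, 8, 7],
          [9, 7, 4, 8, 6, 3, 1, 2, 5],
          [8, 5, 1, 7, 9, 2, 6, 4, 3],
          [1, 3, 8, 9, 4, 7, 2, 5, 6],
          [6, 9, 2, 3, 5, 1, 8, 7, 4],
          [7, 4, 5, 2, 8, 6, 3, 1, 9]]

print(is_valid_solution(solved))  # True
```

This check is independent of the backtracking code, so it can also be used to test grids obtained in other ways.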
Eager Gradients Computation¶
One of the most important steps in training a neural network is weight optimization, done by finding the minimum of a loss function. This is achieved through the backpropagation algorithm (see the Further Readings section for additional material). To find the minimum of a function with gradient-based methods you need to be able to compute gradients.
Now, we will discuss how you can compute gradients with TensorFlow, especially in eager execution. This is an example of automatic differentiation.
# During eager execution, use tf.GradientTape to trace operations
# for computing gradients later
w = tf.Variable(3.0)  # define a tf variable and assign it the value 3

# define a function using tf.GradientTape
with tf.GradientTape() as tape:
    loss = w * w * w

# calculate gradient with respect to a specific variable (w in this case)
grad = tape.gradient(loss, w)
print(grad)  # => tf.Tensor(27., shape=(), dtype=float32)
tf.Tensor(27.0, shape=(), dtype=float32)
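To connect gradients with the search for a minimum mentioned at the beginning of this section, the following sketch runs plain gradient descent on a toy loss, (w - 2)^2, whose minimum is at w = 2. The loss, the learning rate and the number of steps are assumptions chosen for illustration; this is not the training procedure of any specific model.

```python
import tensorflow as tf

w = tf.Variable(0.0)   # starting point (illustrative)
learning_rate = 0.1    # step size (illustrative)

for step in range(100):
    # record the operations that produce the loss
    with tf.GradientTape() as tape:
        loss = (w - 2.0) ** 2
    # gradient of the loss with respect to w, i.e. 2 * (w - 2)
    grad = tape.gradient(loss, w)
    # move w against the gradient
    w.assign_sub(learning_rate * grad)

print(w.numpy())  # close to 2.0
```

Each update multiplies the distance from the minimum by 0.8, so after 100 steps w is practically equal to 2.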
Even though TensorFlow 1.X has some disadvantages (a steep learning curve, difficult debugging, counter-intuitive semantics and poor integration with Python), it is still a powerful and highly expressive tool. Therefore, it is recommended to have a deep understanding of computational graphs and the logic behind TensorFlow 1.X, since this helps to better understand TensorFlow 2.X.
[Easy Difficulty] Create a tensor of rank 5.
[Easy Difficulty] Calculate the derivative of \(f=x^2+y-z^2\) with respect to \(z\), evaluated at \(z=2\).
https://www.tensorflow.org/guide/eager (eager execution official documentation)
Further Readings ¶
NumPy documentation (with lots of tutorials and examples already implemented): https://numpy.org/doc/stable/
Broadcasting in NumPy (https://numpy.org/doc/stable/user/basics.broadcasting.html)
Backpropagation algorithm in neural networks
Section 6.5 of “Deep Learning” book by Ian Goodfellow, Yoshua Bengio and Aaron Courville (https://www.deeplearningbook.org/contents/mlp.html), freely available online
Baydin, Atılım Günes, et al. “Automatic differentiation in machine learning: a survey.” The Journal of Machine Learning Research 18.1 (2017): 5595-5637.