Deep learning and the switch to Python
The convolutional neural network AlexNet brought about a revolution in the AI community back in 2012 just as deep learning’s importance was dawning upon a broader community. The surge in deep learning (DL) led to the need for proper programming support in the form of libraries.
Earlier solutions included Torch, a library written in Lua, and Caffe (2013), written in C++. Over time, the DL community realized the limitations of Lua and decided to switch to Python.
As a result, Theano – a library for numerical computing – was upgraded to allow deep learning support in 2013. Theano was received warmly, as users were able to appreciate a number of Python features and a NumPy-like style and interface.
Check out Python and machine learning, if you need a primer on the basics.
TensorFlow
Since Theano was maintained by an academic group, industry giants like Google decided to step in just as Keras was introduced in early 2015. Keras was focused on providing a simplified high-level interface for designing deep models. It was soon followed by TensorFlow in the same year.
TensorFlow, based on the internally used DistBelief, was the first major DL library backed by an industry giant. And given the rapidly evolving history of deep learning, one may wonder how it has stood the test of time.
The secret lies in a number of factors, from Google’s continuous support and scalability to some very cool features. We'll cover them in detail shortly.
PyTorch
Meta (then still just Facebook) soon followed suit and released PyTorch, a Python-first successor to the Lua-based Torch, in 2016. It was challenging to introduce a new DL library in the towering presence of TensorFlow. The initial reception was cool and adoption slow, but gradually, PyTorch barged its way in thanks to some of its exciting features. Despite all the front-end support for Keras, PyTorch has continued to make inroads in the DL community. And today, the choice any DL engineer or researcher has to make is the one between TensorFlow and PyTorch.
PyTorch vs. TensorFlow
PyTorch or TensorFlow? This is a pressing question, and the answer requires some analysis of the problem at hand and an in-depth comparison between the two. We'll try to do the topic justice. Let’s begin.
Tensors
In numerical computing, and especially DL, we deal with tensors. Tensors are nothing more than a generalized form of matrices that allow higher dimensions. Unlike traditional Python collections, all the elements in a tensor have the same datatype.
In PyTorch, we can initialize a tensor like so:
<tensor> = torch.tensor(<collection/NumPy Array>)
Similarly, in TensorFlow, we can convert an existing collection or a NumPy array into a tensor as follows:
<tensor> = tf.convert_to_tensor(<collection/NumPy Array>)
Note: For ease of reference, we'll name PyTorch tensors pyTensor... and TensorFlow tensors tfTensor... throughout.
import torch
import numpy as np
import tensorflow as tf
print(tf.__version__)
print(torch.__version__)
2.13.0
2.0.1
A = np.ones((1,10))
pyTensorA = torch.tensor(A)
tfTensorA = tf.convert_to_tensor(A)
print(type(pyTensorA))
print(type(tfTensorA))
<class 'torch.Tensor'>
<class 'tensorflow.python.framework.ops.EagerTensor'>
The class names differ, which is expected: TensorFlow and PyTorch are independent libraries with their own implementation classes. Apart from that, there isn't really any difference at this point.
1. Jagged arrays support
Neither PyTorch nor TensorFlow natively supports jagged arrays in its regular tensors. Every row along a given axis is required to have the same length.
listB = [[2,3],[4,5,6],[12,13,14,15]]
listB
[[2, 3], [4, 5, 6], [12, 13, 14, 15]]
tfTensorB = tf.convert_to_tensor(listB)
tfTensorB
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[5], line 1
----> 1 tfTensorB = tf.convert_to_tensor(listB)
2 tfTensorB
File ~/anaconda3/lib/python3.11/site-packages/tensorflow/python/util/traceback_utils.py:153, in filter_traceback.<locals>.error_handler(*args, **kwargs)
151 except Exception as e:
152 filtered_tb = _process_traceback_frames(e.__traceback__)
--> 153 raise e.with_traceback(filtered_tb) from None
154 finally:
155 del filtered_tb
File ~/anaconda3/lib/python3.11/site-packages/tensorflow/python/framework/constant_op.py:98, in convert_to_eager_tensor(value, ctx, dtype)
96 dtype = dtypes.as_dtype(dtype).as_datatype_enum
97 ctx.ensure_initialized()
---> 98 return ops.EagerTensor(value, ctx.device_name, dtype)
ValueError: Can't convert non-rectangular Python sequence to Tensor.
pyTensorB = torch.tensor(listB)
pyTensorB
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[6], line 1
----> 1 pyTensorB = torch.tensor(listB)
2 pyTensorB
ValueError: expected sequence of length 2 at dim 1 (got 3)
But there's more to it than that. TensorFlow provides RaggedTensor and SparseTensor, and they allow us to have jagged tensors.
raggedTensorB = tf.ragged.constant(listB)
raggedTensorB
<tf.RaggedTensor [[2, 3], [4, 5, 6], [12, 13, 14, 15]]>
PyTorch, in turn, offers a similar container in the form of nested_tensor.
nestedTensorB = torch.nested.nested_tensor(listB)
nestedTensorB
/tmp/ipykernel_4077/337460509.py:1: UserWarning: The PyTorch API of nested tensors is in prototype stage and will change in the near future. (Triggered internally at /croot/pytorch_1686931851744/work/aten/src/ATen/NestedTensorImpl.cpp:177.)
nestedTensorB = torch.nested.nested_tensor(listB)
nested_tensor([
tensor([2, 3]),
tensor([4, 5, 6]),
tensor([12, 13, 14, 15])
])
nested_tensor is still in its prototype phase, as the warning above points out. TensorFlow's RaggedTensor, on the other hand, is much more stable and mature and is recommended over nested_tensor.
2. Sparse arrays
Handling sparse matrices is an important problem in computer science. We have to minimize the storage while also keeping data access efficient. A regular dense tensor stores every zero explicitly:
sparseList = [[0,0,0],[0,1,0],[0,0,1],[0,0,0],[1,0,0]]
tfSparseTensor = tf.convert_to_tensor(sparseList)
tfSparseTensor
<tf.Tensor: shape=(5, 3), dtype=int32, numpy=
array([[0, 0, 0],
[0, 1, 0],
[0, 0, 1],
[0, 0, 0],
[1, 0, 0]], dtype=int32)>
This is OK, but not a scalable solution. A better approach is to use TensorFlow's SparseTensor for this purpose. Its syntax is:
<sparseTensor> = tf.sparse.from_dense(<denseTensor>)
tfOptimizedSparseTensor = tf.sparse.from_dense(tfSparseTensor)
tfOptimizedSparseTensor
SparseTensor(indices=tf.Tensor(
[[1 1]
[2 2]
[4 0]], shape=(3, 2), dtype=int64), values=tf.Tensor([1 1 1], shape=(3,), dtype=int32), dense_shape=tf.Tensor([5 3], shape=(2,), dtype=int64))
We can convert it back into a dense tensor using tf.sparse.to_dense():
tf.sparse.to_dense(tfOptimizedSparseTensor)
<tf.Tensor: shape=(5, 3), dtype=int32, numpy=
array([[0, 0, 0],
[0, 1, 0],
[0, 0, 1],
[0, 0, 0],
[1, 0, 0]], dtype=int32)>
Inspired by this, PyTorch has also released torch.sparse, which is still in beta. We can convert a tensor into a sparse one using this:
<tensor>.to_sparse()
pySparseTensor = torch.tensor(sparseList)
pySparseTensor
tensor([[0, 0, 0],
[0, 1, 0],
[0, 0, 1],
[0, 0, 0],
[1, 0, 0]])
pyOptimizedSparseTensor = pySparseTensor.to_sparse()
pyOptimizedSparseTensor
tensor(indices=tensor([[1, 2, 4],
[1, 2, 0]]),
values=tensor([1, 1, 1]),
size=(5, 3), nnz=3, layout=torch.sparse_coo)
Support for different sparse formats
If we look closely at the tensor's output above, it shows sparse_coo as the layout. The coordinate list (COO) format is the default format used in both PyTorch and TensorFlow. However, this is where PyTorch has a clear edge, as it supports other formats too, like CSR, CSC, BSR, and BSC. Covering the details of these formats is beyond the scope of this tutorial, but curious readers are encouraged to check the details.
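To make the difference between layouts a little more concrete, here is a minimal sketch that converts one matrix to both COO and CSR in PyTorch. It assumes a PyTorch version that ships to_sparse_csr() (PyTorch may warn that CSR support is in beta), and the variable names are our own.

```python
import torch

dense = torch.tensor([[0., 0., 0.],
                      [0., 1., 0.],
                      [0., 0., 1.],
                      [0., 0., 0.],
                      [1., 0., 0.]])

# COO keeps one (row, column) pair per non-zero value.
coo = dense.to_sparse().coalesce()
coo_indices = coo.indices().tolist()

# CSR compresses the row indices into "row pointers":
# row i owns the stored values in positions crow[i]:crow[i + 1].
csr = dense.to_sparse_csr()
crow = csr.crow_indices().tolist()
cols = csr.col_indices().tolist()

print("COO indices:", coo_indices)
print("CSR row pointers:", crow)
print("CSR column indices:", cols)
```

COO needs two index entries per non-zero value, while CSR needs one column index per value plus one pointer per row, which is why CSR tends to be preferred for row-wise operations on larger matrices.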
3. Strings
While TensorFlow allows strings, PyTorch doesn’t.
listC = ["Ali","Bilal"]
tfTensorC = tf.convert_to_tensor(listC)
tfTensorC
<tf.Tensor: shape=(2,), dtype=string, numpy=array([b'Ali', b'Bilal'], dtype=object)>
pyTensorC = torch.tensor(listC)
pyTensorC
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[17], line 1
----> 1 pyTensorC = torch.tensor(listC)
2 pyTensorC
ValueError: too many dimensions 'str'
4. User-defined classes
To avoid complicating things too much, neither library accepts objects of user-defined classes as tensor elements.
class MyClass:
    a = 0
    b = 1
obj1 = MyClass()
obj2 = MyClass()
objectsList = [obj1,obj2]
pyTensorObjects = torch.tensor(objectsList)
pyTensorObjects
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
Cell In[20], line 1
----> 1 pyTensorObjects = torch.tensor(objectsList)
2 pyTensorObjects
RuntimeError: Could not infer dtype of MyClass
tfTensorObjects = tf.convert_to_tensor(objectsList)
tfTensorObjects
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[21], line 1
----> 1 tfTensorObjects = tf.convert_to_tensor(objectsList)
2 tfTensorObjects
File ~/anaconda3/lib/python3.11/site-packages/tensorflow/python/util/traceback_utils.py:153, in filter_traceback.<locals>.error_handler(*args, **kwargs)
151 except Exception as e:
152 filtered_tb = _process_traceback_frames(e.__traceback__)
--> 153 raise e.with_traceback(filtered_tb) from None
154 finally:
155 del filtered_tb
File ~/anaconda3/lib/python3.11/site-packages/tensorflow/python/framework/constant_op.py:98, in convert_to_eager_tensor(value, ctx, dtype)
96 dtype = dtypes.as_dtype(dtype).as_datatype_enum
97 ctx.ensure_initialized()
---> 98 return ops.EagerTensor(value, ctx.device_name, dtype)
ValueError: Attempt to convert a value (<__main__.MyClass object at 0x7ff9586b29d0>) with an unsupported type (<class '__main__.MyClass'>) to a Tensor.
5. Complex numbers
Python has built-in support for complex numbers, and they're supported by both TensorFlow and PyTorch.
c1 = complex(1,3)
c2 = complex(2,-1)
listComplex = [c1,c2]
tfTensorComplex = tf.convert_to_tensor(listComplex)
tfTensorComplex
<tf.Tensor: shape=(2,), dtype=complex128, numpy=array([1.+3.j, 2.-1.j])>
pyTensorComplex = torch.tensor(listComplex)
pyTensorComplex
tensor([1.+3.j, 2.-1.j])
# Nested tuples are converted in the same way as nested lists
tuple1 = ((1,2),(3,4))
tfTensorTuple = tf.convert_to_tensor(tuple1)
pyTensorTuple = torch.tensor(tuple1)
6. Data types
We mentioned above that all the members of a tensor are supposed to be of the same data type. We've also seen that PyTorch allows only numeric types (including complex), while TensorFlow supports string types as well.
Another difference between the two lies in how they expose data types. PyTorch offers fixed-width numeric types (like float32 and int64, for example) that sit close to their Python and NumPy counterparts. TensorFlow offers the same types but wraps them in its own dtype objects, and hence we sometimes need explicit conversions while integrating with Python. PyTorch, on the other hand, feels more seamless.
Data handling
Arguably, the most important part of any ML/DL project is data processing. Data comes in many forms, from unstructured text to video clips, and requires a proper framework.
In TensorFlow, we use the tf.data API for input pipelines, while PyTorch uses the torch.utils.data API for the same purpose.
1. Dataset
A dataset is an abstraction for a collection of data handled in a sequential manner. We can make a dataset from any Python collection (or NumPy array) as follows:
listD = [[1,10,8],[21,32,-3],[34,21,0],[21,5,-2],[12,3,9]]
tfDataSetD = tf.data.Dataset.from_tensor_slices(listD)
tfDataSetD
<_TensorSliceDataset element_spec=TensorSpec(shape=(3,), dtype=tf.int32, name=None)>
PyTorch serves the same purpose using its own implementation. We can use TensorDataset there.
from torch.utils.data import TensorDataset
tensorD = torch.tensor(listD)
pyDataSetD = TensorDataset(tensorD)
pyDataSetD
<torch.utils.data.dataset.TensorDataset at 0x7ff95850c7d0>
Both TensorFlow and PyTorch datasets are iterable.
for x in tfDataSetD:
    print(x)
tf.Tensor([ 1 10 8], shape=(3,), dtype=int32)
tf.Tensor([21 32 -3], shape=(3,), dtype=int32)
tf.Tensor([34 21 0], shape=(3,), dtype=int32)
tf.Tensor([21 5 -2], shape=(3,), dtype=int32)
tf.Tensor([12 3 9], shape=(3,), dtype=int32)
for x in pyDataSetD:
    print(x)
(tensor([ 1, 10, 8]),)
(tensor([21, 32, -3]),)
(tensor([34, 21, 0]),)
(tensor([21, 5, -2]),)
(tensor([12, 3, 9]),)
tfIterator = iter(tfDataSetD)
print(next(tfIterator))
print(next(tfIterator))
tf.Tensor([ 1 10 8], shape=(3,), dtype=int32)
tf.Tensor([21 32 -3], shape=(3,), dtype=int32)
pyIterator = iter(pyDataSetD)
print(next(pyIterator))
print(next(pyIterator))
(tensor([ 1, 10, 8]),)
(tensor([21, 32, -3]),)
Creating a dataset is a basic step. The real magic happens once we take it and perform transformations on it. Let’s learn some of that magic.
2. Batching
Batching is a pretty essential step in which we divide the dataset into batches. The batch size depends on a number of factors, including the available memory, the number of data samples in the dataset, and where we want to sit between full-batch gradient descent and stochastic gradient descent (SGD).
For batching in TensorFlow, we can simply apply batch(<batch size>).
tfBatchedDataset = tfDataSetD.batch(2)
for batch in tfBatchedDataset:
    print(batch)
tf.Tensor(
[[ 1 10 8]
[21 32 -3]], shape=(2, 3), dtype=int32)
tf.Tensor(
[[34 21 0]
[21 5 -2]], shape=(2, 3), dtype=int32)
tf.Tensor([[12 3 9]], shape=(1, 3), dtype=int32)
As you may have noticed, the last batch contains the single data sample that was left over. So don't panic or expect an error if you're dividing a dataset of N samples into batches of size B, even if B is not a factor of N. (Pass drop_remainder=True to batch() if you'd rather discard the incomplete batch.)
In PyTorch, batching is handled by the DataLoader, which takes the batch size as an argument.
from torch.utils.data import DataLoader
pyBatchedDataset = DataLoader(pyDataSetD, batch_size=3, shuffle=True)
for batch in pyBatchedDataset:
    print(batch)
[tensor([[34, 21, 0],
[21, 5, -2],
[21, 32, -3]])]
[tensor([[12, 3, 9],
[ 1, 10, 8]])]
You may also have noticed that PyTorch supports data shuffling through the shuffle argument of its DataLoader (it's off by default), while TensorFlow provides a separate method, shuffle(), for the dataset objects. It takes the buffer size as its argument (something that can be better demonstrated with a bigger dataset).
tfShuffledBatch = tfDataSetD.shuffle(3).batch(2)
for batch in tfShuffledBatch:
    print(batch)
tf.Tensor(
[[ 1 10 8]
[34 21 0]], shape=(2, 3), dtype=int32)
tf.Tensor(
[[12 3 9]
[21 5 -2]], shape=(2, 3), dtype=int32)
tf.Tensor([[21 32 -3]], shape=(1, 3), dtype=int32)
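To get a feel for what the buffer size does, here is a rough pure-Python sketch of buffer-based shuffling. It only illustrates the idea and is not TensorFlow's actual implementation; buffered_shuffle is a hypothetical helper of our own.

```python
import random

def buffered_shuffle(items, buffer_size, seed=None):
    """Yield items in a locally shuffled order using a fixed-size buffer."""
    rng = random.Random(seed)
    buffer = []
    for item in items:
        if len(buffer) < buffer_size:
            # Fill the buffer before emitting anything.
            buffer.append(item)
            continue
        # Emit a random element from the buffer and refill its slot.
        idx = rng.randrange(buffer_size)
        yield buffer[idx]
        buffer[idx] = item
    # The input is exhausted: emit whatever is left in random order.
    rng.shuffle(buffer)
    yield from buffer

print(list(buffered_shuffle(range(10), buffer_size=1)))   # order unchanged
print(list(buffered_shuffle(range(10), buffer_size=3, seed=0)))
print(list(buffered_shuffle(range(10), buffer_size=10, seed=0)))
```

With a buffer of 1 nothing moves, with a small buffer elements only travel a short distance, and a buffer at least as large as the dataset gives a full shuffle. That is why a buffer size close to the dataset size is usually suggested when memory allows.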
3. Padding
There are a number of datasets with data of unequal length. I remember working with RNA sequences of different lengths in the pre-TensorFlow/PyTorch days (the same problem applies to almost every NLP task), and it was agony. Luckily, our modern DL libraries take care of that problem.
In TensorFlow, we can perform padding using padded_batch().
tfPaddedBatch = tfDataSetD.padded_batch(2)
for batch in tfPaddedBatch:
    print(batch)
tf.Tensor(
[[ 1 10 8]
[21 32 -3]], shape=(2, 3), dtype=int32)
tf.Tensor(
[[34 21 0]
[21 5 -2]], shape=(2, 3), dtype=int32)
tf.Tensor([[12 3 9]], shape=(1, 3), dtype=int32)
The output is the same as above because every sample here already has three elements; it will change if we use a dataset with unequal-length samples, such as text strings. We can achieve the same in PyTorch by passing a custom collate_fn to the DataLoader.
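As a minimal sketch of the PyTorch side, the snippet below pads variable-length sequences inside a custom collate function. The toy sequences and the pad_collate helper are our own; pad_sequence comes from torch.nn.utils.rnn.

```python
import torch
from torch.nn.utils.rnn import pad_sequence
from torch.utils.data import DataLoader

# Toy dataset: four sequences of different lengths.
sequences = [
    torch.tensor([1, 2]),
    torch.tensor([3, 4, 5]),
    torch.tensor([6]),
    torch.tensor([7, 8, 9, 10]),
]

def pad_collate(batch):
    """Pad every sequence in the batch to the longest one in that batch."""
    lengths = torch.tensor([len(seq) for seq in batch])
    padded = pad_sequence(batch, batch_first=True, padding_value=0)
    return padded, lengths

loader = DataLoader(sequences, batch_size=2, collate_fn=pad_collate)
batches = list(loader)
for padded, lengths in batches:
    print(padded, lengths)
```

Returning the original lengths alongside the padded batch is a common choice, since downstream code often needs them to mask out the padding.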
Autograd
One of the biggest challenges of any deep learning library is its ability to calculate the gradients (for gradient-based optimizers like SGD). Automatic differentiation has an edge over other methods like manual or symbolic differentiation and hence is the de facto approach in DL libraries. It would be useful to compare the differences between the auto-differentiation in the two libraries.
1. Autograd APIs
PyTorch uses torch.autograd for calculating gradients. What this means is that we can calculate the gradient with respect to any tensor by just setting its requires_grad to True and then calling backward() on the result computed from it.
Let's create an example by calculating the gradient of f(x) = 3x₀² − 5x₁ at x = (2, 3).
def polynomialFunc(x):
    return 3 * x[0]**2 - 5*x[1]
pyTensorX = torch.tensor([2.0,3.0], requires_grad=True)
pyTensory = polynomialFunc(pyTensorX)
pyTensory.backward()  # fills pyTensorX.grad with dy/dx
dy_dx = pyTensorX.grad
dy_dx  # tensor([12., -5.])
TensorFlow, on the other hand, uses the tf.GradientTape API for this purpose.
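For comparison, here is a minimal sketch of the same polynomial differentiated with tf.GradientTape. The variable names are ours, and the final tf.convert_to_tensor call is only a precaution so that the result is a plain dense tensor.

```python
import tensorflow as tf

def polynomial_func(x):
    return 3 * x[0] ** 2 - 5 * x[1]

# Variables are watched by the tape automatically.
tf_x = tf.Variable([2.0, 3.0])

with tf.GradientTape() as tape:
    # Operations executed inside the context are recorded on the tape.
    tf_y = polynomial_func(tf_x)

# Replay the tape backwards to get dy/dx.
dy_dx = tf.convert_to_tensor(tape.gradient(tf_y, tf_x))
print(dy_dx.numpy())
```

The main difference in style is that PyTorch records operations on the tensors themselves, whereas TensorFlow records them on an explicit tape object.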
2. Forward vs. reverse mode
The difference between forward and reverse mode lies in how the Jacobian is calculated.
For a function f : Rⁿ → Rᵐ (n inputs, m outputs), the Jacobian matrix is defined as:
$$ \mathbf J = \begin{bmatrix} \dfrac{\partial \mathbf{f}}{\partial x_1} & \cdots & \dfrac{\partial \mathbf{f}}{\partial x_n} \end{bmatrix} = \begin{bmatrix} \nabla^{\mathrm T} f_1 \\
\vdots \\ \nabla^{\mathrm T} f_m \end{bmatrix} = \begin{bmatrix} \dfrac{\partial f_1}{\partial x_1} & \cdots & \dfrac{\partial f_1}{\partial x_n}\\ \vdots & \ddots & \vdots\\ \dfrac{\partial f_m}{\partial x_1} & \cdots & \dfrac{\partial f_m}{\partial x_n} \end{bmatrix} $$
This matrix may look daunting, but it's just a stack of gradients, one row per output. Forward mode builds it one column (one input) per pass, while reverse mode builds it one row (one output) per pass. The important part is to decide which mode to use, and the answer is pretty simple: if m > n (more outputs than inputs), forward mode is cheaper, and vice versa.
While TensorFlow is built primarily around reverse mode (forward mode is available separately through tf.autodiff.ForwardAccumulator), PyTorch provides us with both forward and reverse modes of differentiation.
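As a small illustration of the two modes in PyTorch, the sketch below computes the same Jacobian with jacfwd (forward mode) and jacrev (reverse mode). It assumes PyTorch 2.0 or later, where both live in torch.func; the function f and the matrix W are made up for the example.

```python
import torch
from torch.func import jacfwd, jacrev

# A toy function from R^2 to R^3 (more outputs than inputs).
W = torch.tensor([[1.0, 2.0],
                  [0.5, -1.0],
                  [3.0, 0.0]])

def f(x):
    return torch.tanh(W @ x)

x = torch.tensor([0.1, -0.2])

jac_forward = jacfwd(f)(x)  # built column by column (one pass per input)
jac_reverse = jacrev(f)(x)  # built row by row (one pass per output)

print(jac_forward.shape)
print(torch.allclose(jac_forward, jac_reverse))
```

Both calls return the same 3 x 2 matrix; only the amount of work differs, which becomes noticeable once the number of inputs and outputs is very unbalanced.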
3. Function Transforms
Similar to JAX (which arguably has a better autograd than both PyTorch and TensorFlow), PyTorch enables us to transform functions (for example, calculating the Hessian as a Jacobian of a Jacobian) and should be preferred here over TensorFlow.
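The "Jacobian of a Jacobian" idea can be sketched in a few lines. This example assumes PyTorch 2.0 or later (torch.func) and uses a toy scalar function g of our own choosing.

```python
import torch
from torch.func import jacfwd, jacrev

def g(x):
    # Scalar-valued function: g(x) = x_0^3 + x_1^3
    return (x ** 3).sum()

x = torch.tensor([1.0, 2.0])

# Reverse mode gives the gradient; forward mode over it gives the Hessian.
hessian = jacfwd(jacrev(g))(x)
print(hessian)

# Analytic Hessian of g for comparison: diag(6 * x)
print(torch.diag(6 * x))
```

Composing forward mode over reverse mode is a common choice for Hessians, although jacrev(jacrev(g)) would give the same result here.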
Functional programming support
Neither TensorFlow nor PyTorch was designed around functional programming per se, but the growing popularity of JAX's functional programming model has inspired PyTorch to add torch.func in version 2.0.
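Here is a brief sketch of that functional style, assuming PyTorch 2.0 or later. grad turns a function into a new function that returns its derivative, and vmap maps that new function over a batch; cube is a toy function of our own.

```python
import torch
from torch.func import grad, vmap

def cube(x):
    return x ** 3

# grad returns a new function instead of mutating any tensor state.
cube_grad = grad(cube)
single = cube_grad(torch.tensor(2.0))

# vmap applies the derivative function to every element of a batch.
batched = vmap(cube_grad)(torch.tensor([1.0, 2.0, 3.0]))

print(single)
print(batched)
```

No requires_grad flags or backward() calls are involved: functions go in and functions come out, which is the model JAX popularized.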
Random number generation
PyTorch uses the Mersenne Twister on the CPU and Philox on CUDA devices, while TensorFlow's generators default to Philox and can also use Threefry. PyTorch also provides us with a CSPRNG as an extension.
Lazy vs. eager execution
Since a deep model is a graph with a number of connected nodes, there can be two ways of executing it:
- Static
- Dynamic
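In static execution, also known as lazy execution, we build the whole graph first and don't get any results until it's completely built and run. TensorFlow used to work this way. This can be annoying and time-consuming, as we often need to check intermediate results. In dynamic execution, also known as eager execution, each operation runs as soon as it's called.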
PyTorch, on the other hand, supports both lazy and eager execution. It uses eager execution as the default mode, enabling us to check the intermediate outputs. Naturally, this is quite helpful for researchers and even ML engineers, as model development undergoes a number of testing phases.
Inspired by PyTorch, TensorFlow introduced eager mode in 2017/18. From TensorFlow 2.0, it uses eager mode by default as well. So, if you're using TensorFlow < 2.0, then you probably need to set eager mode explicitly. Otherwise, it's quite similar in the current versions of both libraries.
In early TensorFlow 1.x releases, we can set it explicitly as:
import tensorflow.contrib.eager as tfe  # TensorFlow 1.x only; tf.contrib was removed in 2.0
tfe.enable_eager_execution()
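In TensorFlow 2.x the situation is reversed: eager execution is on by default, and graph execution is something we opt into with tf.function. A minimal sketch, assuming TensorFlow 2.x and using a toy function of our own:

```python
import tensorflow as tf

# True by default in TensorFlow 2.x.
eager_by_default = tf.executing_eagerly()
print(eager_by_default)

def scaled_sum(a, b):
    return tf.reduce_sum(a * b)

# Wrapping the function traces it into a graph on the first call.
graph_scaled_sum = tf.function(scaled_sum)

a = tf.constant([1.0, 2.0])
b = tf.constant([3.0, 4.0])

eager_result = scaled_sum(a, b)        # runs operation by operation
graph_result = graph_scaled_sum(a, b)  # runs as a compiled graph

print(float(eager_result), float(graph_result))
```

Both calls return the same value; the graph version trades some debuggability for the optimizations a static graph allows.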
Multiple language support
While both TensorFlow and PyTorch are primarily Python-based, they have interfaces for other languages, too.
PyTorch has a pretty strong C++ interface. It can be used by including the <ATen/ATen.h> header.
There can be cases where our own or some 3rd party’s implementation is better than PyTorch’s C++ API. If so, you can use PyTorch's C++ extensions.
TensorFlow, on the other hand, is quite established in its C++, Java, and JavaScript implementations.
Other platforms
TensorFlow is a clear winner here as it has an established presence in the mobile developer community through TF Lite. PyTorch has its own mobile version as well, but it's still in beta and has yet to catch the attention of developers.
Many of you may be familiar with the Neural Network playground by TensorFlow and might have wondered 'How come it runs in the browser?' Yes! There's a JavaScript version of TensorFlow as well. PyTorch, on the other hand, is yet to make its mark on the web. Models can be exported to ONNX and run in the browser from there, but that route is harder to work with.
So TensorFlow is a clear winner here. If you're a web or mobile developer, TensorFlow should be your go-to library.
PyTorch vs. TensorFlow: looking ahead to Keras 3.0
Both TensorFlow and PyTorch are hugely popular in the DL community. Both have their own style, and each has an edge in different features. Luckily, Keras Core has added support for both as backends and will be available as Keras 3.0 this fall. So keep your fingers crossed that Keras will bridge the gap and give developers the best of both without needing to switch. I'll be looking at Keras in detail in my next post.
If you need data for your models, you might be interested in web scraping with Python, or other methods of collecting data for AI.