PyTorch vs. TensorFlow: which is better for deep learning?

Deep learning and the switch to Python

The convolutional neural network AlexNet brought about a revolution in the AI community back in 2012 just as deep learning’s importance was dawning upon a broader community. The surge in deep learning (DL) led to the need for proper programming support in the form of libraries.

Earlier solutions included Torch, a library written in Lua, and Caffe (2013), written in C++. Over time, the DL community realized the limitations of Lua and decided to switch to Python.

As a result, Theano – a library for numerical computing – was upgraded to allow deep learning support in 2013. Theano was received warmly, as users were able to appreciate a number of Python features and a NumPy-like style and interface.

Check out Python and machine learning if you need a primer on the basics.

TensorFlow

Since Theano was maintained by an academic group, industry giants like Google decided to step in just as Keras was introduced in early 2015. Keras was focused on providing a simplified high-level interface for designing deep models. It was soon followed by TensorFlow in the same year.

TensorFlow, the successor to Google's internal DistBelief system, quickly became the most widely used DL library. And given the rapidly evolving history of deep learning, it makes one wonder how it has stood the test of time.

The secret lies in a number of factors, from Google’s continuous support and scalability to some very cool features. We'll cover them in detail shortly.

PyTorch

Meta (then still just Facebook) soon followed suit and released PyTorch in 2016, a Python-first successor to the Lua-based Torch (Facebook's Caffe2 was later merged into PyTorch in 2018). It was challenging to introduce a new DL library in the towering presence of TensorFlow. The initial reception was cool and adoption slow, but gradually, PyTorch barged its way in thanks to some of its exciting features. Despite the popularity of Keras as a front end for TensorFlow, PyTorch has continued to make inroads in the DL community. Today, the choice any DL engineer or researcher faces is not just between TensorFlow and PyTorch; it's also about mastering the programming languages that power these frameworks.

PyTorch vs. TensorFlow

PyTorch or TensorFlow? This is a pressing question, and the answer requires some analysis of the problem at hand and an in-depth comparison between the two. We'll try to do the topic justice. Let’s begin.

Note: Since this comparative analysis includes both theory and practical examples, you'll find it helpful to have a Python IDE or a Jupyter notebook open in an adjacent tab to test and verify the comparisons yourself.

Tensors

In numerical computing, and especially DL, we deal with tensors. Tensors are nothing more than a generalized form of matrices that allow higher dimensions. Unlike traditional Python collections, all the elements in a tensor have the same datatype.

In PyTorch, we can initialize a tensor like so:

<tensor> = torch.tensor(<collection/NumPy Array>)

Similarly, in TensorFlow, we can convert an existing collection or a NumPy array into a tensor as follows:

<tensor> = tf.convert_to_tensor(<collection/NumPy Array>)

Note: For ease of reference, we'll prefix the names of PyTorch tensors with pyTensor and the names of TensorFlow tensors with tfTensor.

import torch
import numpy as np
import tensorflow as tf

Note: We're using TensorFlow 2.13.0 and PyTorch 2.0.1, the latest versions at the time of writing. The key point is that both libraries are at version 2.x.

print(tf.__version__)
print(torch.__version__)

2.13.0
2.0.1

A = np.ones((1,10))
pyTensorA = torch.tensor(A)
tfTensorA = tf.convert_to_tensor(A)
print(type(pyTensorA))
print(type(tfTensorA))

<class 'torch.Tensor'>
<class 'tensorflow.python.framework.ops.EagerTensor'>

Apart from the class names, there isn't any real difference here: the two libraries are independent of each other, so naturally each has its own tensor class.

1. Jagged arrays support

Neither PyTorch nor TensorFlow supports jagged arrays in its regular tensors: every row along a given axis must have the same length.

listB = [[2,3],[4,5,6],[12,13,14,15]]
listB

[[2, 3], [4, 5, 6], [12, 13, 14, 15]]

tfTensorB = tf.convert_to_tensor(listB)
tfTensorB

---------------------------------------------------------------------------

ValueError                                Traceback (most recent call last)

Cell In[5], line 1
----> 1 tfTensorB = tf.convert_to_tensor(listB)
      2 tfTensorB

File ~/anaconda3/lib/python3.11/site-packages/tensorflow/python/util/traceback_utils.py:153, in filter_traceback.<locals>.error_handler(*args, **kwargs)
    151 except Exception as e:
    152   filtered_tb = _process_traceback_frames(e.__traceback__)
--> 153   raise e.with_traceback(filtered_tb) from None
    154 finally:
    155   del filtered_tb

File ~/anaconda3/lib/python3.11/site-packages/tensorflow/python/framework/constant_op.py:98, in convert_to_eager_tensor(value, ctx, dtype)
     96     dtype = dtypes.as_dtype(dtype).as_datatype_enum
     97 ctx.ensure_initialized()
---> 98 return ops.EagerTensor(value, ctx.device_name, dtype)

ValueError: Can't convert non-rectangular Python sequence to Tensor.

pyTensorB = torch.tensor(listB)
pyTensorB

---------------------------------------------------------------------------

ValueError                                Traceback (most recent call last)

Cell In[6], line 1
----> 1 pyTensorB = torch.tensor(listB)
      2 pyTensorB

ValueError: expected sequence of length 2 at dim 1 (got 3)

But there's more to it than that. TensorFlow offers RaggedTensor (and, for some use cases, SparseTensor), which lets us represent jagged data.

raggedTensorB = tf.ragged.constant(listB)
raggedTensorB

<tf.RaggedTensor [[2, 3], [4, 5, 6], [12, 13, 14, 15]]>

PyTorch, for its part, took inspiration from this and implemented nested tensors (torch.nested).

nestedTensorB = torch.nested.nested_tensor(listB)
nestedTensorB

/tmp/ipykernel_4077/337460509.py:1: UserWarning: The PyTorch API of nested tensors is in prototype stage and will change in the near future. (Triggered internally at /croot/pytorch_1686931851744/work/aten/src/ATen/NestedTensorImpl.cpp:177.)
  nestedTensorB = torch.nested.nested_tensor(listB)

nested_tensor([
  tensor([2, 3]),
  tensor([4, 5, 6]),
  tensor([12, 13, 14, 15])
])

Bottom line: as the warning above says, PyTorch's nested tensors are still at the prototype stage. TensorFlow's RaggedTensor is much more stable and mature and is recommended over nested tensors for now.
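RaggedTensor's maturity shows in how much of TensorFlow works on it directly. As a small sketch using the same listB, we can query row lengths, reduce along the ragged dimension without padding, and pad to a regular tensor only when a dense shape is actually required.

```python
import tensorflow as tf

listB = [[2, 3], [4, 5, 6], [12, 13, 14, 15]]
raggedTensorB = tf.ragged.constant(listB)

# Length of each row
print(raggedTensorB.row_lengths().numpy().tolist())  # [2, 3, 4]

# Reductions work across the ragged dimension without any padding
rowSums = tf.reduce_sum(raggedTensorB, axis=1)
print(rowSums.numpy().tolist())  # [5, 15, 54]

# Pad to a regular (dense) tensor, filling the gaps with zeros
paddedB = raggedTensorB.to_tensor(default_value=0)
print(paddedB.numpy().tolist())
# [[2, 3, 0, 0], [4, 5, 6, 0], [12, 13, 14, 15]]
```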

2. Sparse arrays

Sparse matrices and their handling are an important problem in computer science. We want to minimize storage without sacrificing the efficiency of reading and writing the data.
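To see what a sparse format actually stores, here's a minimal pure-Python sketch of the coordinate list (COO) idea: keep only the (row, column) position and value of each non-zero entry, plus the overall shape. The helper to_coo is hypothetical and written only for illustration; the library calls below do this for us.

```python
def to_coo(dense):
    """Hypothetical helper: return (indices, values, shape) of a 2-D list's non-zeros."""
    indices, values = [], []
    for i, row in enumerate(dense):
        for j, v in enumerate(row):
            if v != 0:
                # Store only the position and value of non-zero entries
                indices.append((i, j))
                values.append(v)
    shape = (len(dense), len(dense[0]) if dense else 0)
    return indices, values, shape


sparseList = [[0, 0, 0], [0, 1, 0], [0, 0, 1], [0, 0, 0], [1, 0, 0]]
indices, values, shape = to_coo(sparseList)
print(indices)  # [(1, 1), (2, 2), (4, 0)]
print(values)   # [1, 1, 1]
print(shape)    # (5, 3)
```

Here we store three entries instead of 15; the savings grow with the size and sparsity of the matrix.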

sparseList = [[0,0,0],[0,1,0],[0,0,1],[0,0,0],[1,0,0]]
tfSparseTensor = tf.convert_to_tensor(sparseList)
tfSparseTensor

<tf.Tensor: shape=(5, 3), dtype=int32, numpy=
array([[0, 0, 0],
       [0, 1, 0],
       [0, 0, 1],
       [0, 0, 0],
       [1, 0, 0]], dtype=int32)>

This works, but storing every zero explicitly doesn't scale. A better approach is to use TensorFlow's SparseTensor for this purpose. Its syntax is:

<sparseTensor> = tf.sparse.from_dense(<denseTensor>)

tfOptimizedSparseTensor = tf.sparse.from_dense(tfSparseTensor)
tfOptimizedSparseTensor

SparseTensor(indices=tf.Tensor(
[[1 1]
 [2 2]
 [4 0]], shape=(3, 2), dtype=int64), values=tf.Tensor([1 1 1], shape=(3,), dtype=int32), dense_shape=tf.Tensor([5 3], shape=(2,), dtype=int64))

We can convert it back into a dense tensor using tf.sparse.to_dense():

tf.sparse.to_dense(tfOptimizedSparseTensor)

<tf.Tensor: shape=(5, 3), dtype=int32, numpy=
array([[0, 0, 0],
       [0, 1, 0],
       [0, 0, 1],
       [0, 0, 0],
       [1, 0, 0]], dtype=int32)>

Inspired by this, PyTorch has also released its beta torch.sparse module. We can convert a tensor into a sparse one using:

<tensor>.to_sparse()

pySparseTensor = torch.tensor(sparseList)
pySparseTensor

tensor([[0, 0, 0],
        [0, 1, 0],
        [0, 0, 1],
        [0, 0, 0],
        [1, 0, 0]])

pyOptimizedSparseTensor = pySparseTensor.to_sparse()
pyOptimizedSparseTensor

tensor(indices=tensor([[1, 2, 4],
                       [1, 2, 0]]),
       values=tensor([1, 1, 1]),
       size=(5, 3), nnz=3, layout=torch.sparse_coo)

Support for different sparse formats

If we look closely at the tensor's output above, it shows sparse_coo as the layout. The coordinate list (COO) format is PyTorch's default sparse layout and the one TensorFlow's SparseTensor uses. However, this is where PyTorch has a clear edge, as it supports other formats too, like CSR, CSC, BSR, and BSC. Covering the details of these formats is beyond the scope of this tutorial, but curious readers are encouraged to look them up.
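As a sketch of those extra formats, the snippet below converts the same matrix to PyTorch's CSR and CSC layouts (both still beta features, so expect warnings and possible API changes). CSR compresses the row indices into per-row offsets (crow_indices), while CSC does the same for columns (ccol_indices).

```python
import torch

sparseList = [[0, 0, 0], [0, 1, 0], [0, 0, 1], [0, 0, 0], [1, 0, 0]]
denseT = torch.tensor(sparseList, dtype=torch.float32)

# Compressed sparse row: row offsets + column index per stored value
csr = denseT.to_sparse_csr()
print(csr.layout)                   # torch.sparse_csr
print(csr.crow_indices().tolist())  # [0, 0, 1, 2, 2, 3]
print(csr.col_indices().tolist())   # [1, 2, 0]

# Compressed sparse column: column offsets + row index per stored value
csc = denseT.to_sparse_csc()
print(csc.ccol_indices().tolist())  # [0, 1, 2, 3]
print(csc.row_indices().tolist())   # [4, 1, 2]

# Round trip back to dense
print(torch.equal(csr.to_dense(), denseT))  # True
```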

Conclusion: There isn't much difference between TensorFlow's and PyTorch's sparse implementations, though since torch.sparse is still in beta, deploying it in a commercial application carries a higher risk of running into bugs. On the other hand, if we want to use other formats, then PyTorch is the clear choice.

3. Strings

While TensorFlow allows strings, PyTorch doesn’t.

listC = ["Ali","Bilal"]

tfTensorC = tf.convert_to_tensor(listC)
tfTensorC

<tf.Tensor: shape=(2,), dtype=string, numpy=array([b'Ali', b'Bilal'], dtype=object)>

pyTensorC = torch.tensor(listC)
pyTensorC

---------------------------------------------------------------------------

ValueError                                Traceback (most recent call last)

Cell In[17], line 1
----> 1 pyTensorC = torch.tensor(listC)
      2 pyTensorC

ValueError: too many dimensions 'str'
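A common workaround in PyTorch is to turn strings into numbers before building a tensor. The sketch below uses one of several possible encodings: it maps each string to its UTF-8 byte values and pads the results to equal length with pad_sequence (using 0 as padding assumes the strings contain no null characters). Real NLP pipelines would typically use a tokenizer and vocabulary instead.

```python
import torch
from torch.nn.utils.rnn import pad_sequence

listC = ["Ali", "Bilal"]

# Encode each string as a 1-D tensor of UTF-8 byte values
encoded = [torch.tensor(list(s.encode("utf-8"))) for s in listC]

# Pad to the longest string, using 0 as the padding value
pyTensorC = pad_sequence(encoded, batch_first=True, padding_value=0)
print(pyTensorC.tolist())
# [[65, 108, 105, 0, 0], [66, 105, 108, 97, 108]]

# Decode a row back by dropping the padding
firstRow = pyTensorC[0]
print(bytes(firstRow[firstRow != 0].tolist()).decode("utf-8"))  # Ali
```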

4. User-defined classes

To keep things simple, neither library accepts user-defined types as tensor elements.

class MyClass:
    a = 0
    b = 1

obj1 = MyClass()
obj2 = MyClass()
objectsList = [obj1,obj2]

pyTensorObjects = torch.tensor(objectsList)
pyTensorObjects

---------------------------------------------------------------------------

RuntimeError                              Traceback (most recent call last)

Cell In[20], line 1
----> 1 pyTensorObjects = torch.tensor(objectsList)
      2 pyTensorObjects

RuntimeError: Could not infer dtype of MyClass

tfTensorObjects = tf.convert_to_tensor(objectsList)
tfTensorObjects

---------------------------------------------------------------------------

ValueError                                Traceback (most recent call last)

Cell In[21], line 1
----> 1 tfTensorObjects = tf.convert_to_tensor(objectsList)
      2 tfTensorObjects

File ~/anaconda3/lib/python3.11/site-packages/tensorflow/python/util/traceback_utils.py:153, in filter_traceback.<locals>.error_handler(*args, **kwargs)
    151 except Exception as e:
    152   filtered_tb = _process_traceback_frames(e.__traceback__)
--> 153   raise e.with_traceback(filtered_tb) from None
    154 finally:
    155   del filtered_tb

File ~/anaconda3/lib/python3.11/site-packages/tensorflow/python/framework/constant_op.py:98, in convert_to_eager_tensor(value, ctx, dtype)
     96     dtype = dtypes.as_dtype(dtype).as_datatype_enum
     97 ctx.ensure_initialized()
---> 98 return ops.EagerTensor(value, ctx.device_name, dtype)

ValueError: Attempt to convert a value (<__main__.MyClass object at 0x7ff9586b29d0>) with an unsupported type (<class '__main__.MyClass'>) to a Tensor.

5. Complex numbers

Python has built-in support for complex numbers, and they're supported by both TensorFlow and PyTorch.

c1 = complex(1,3)
c2 = complex(2,-1)
listComplex = [c1,c2]

tfTensorComplex = tf.convert_to_tensor(listComplex)
tfTensorComplex

<tf.Tensor: shape=(2,), dtype=complex128, numpy=array([1.+3.j, 2.-1.j])>

pyTensorComplex = torch.tensor(listComplex)
pyTensorComplex

tensor([1.+3.j, 2.-1.j])

Nested tuples are accepted just like lists:

tuple1 = ((1,2),(3,4))
tfTensorTuple = tf.convert_to_tensor(tuple1)
pyTensorTuple = torch.tensor(tuple1)

6. Data types

We mentioned above that all the members of a tensor must have the same data type. We've also seen that PyTorch allows only numeric types (including complex numbers) and Booleans, while TensorFlow supports strings as well.

Another difference between the two lies in how their data types fit with Python. Both libraries mirror NumPy's fixed-width types (like float32 and int64) rather than Python's built-in int and float, exposing them as torch.float32, tf.int64, and so on. In practice, PyTorch tends to feel more seamless: for instance, it promotes mixed int/float operands automatically, while TensorFlow expects matching dtypes and more often needs explicit conversions such as tf.cast.

Suggestion: If in doubt about integrating with Python’s intrinsic data types, go for PyTorch.

Data handling

Arguably, the most important part of any ML/DL project is data processing. Data comes in many forms, from unstructured text to video clips, and requires a proper framework.

In TensorFlow, we use the tf.data API for input pipelines, while PyTorch uses the torch.utils.data API for the same purpose.

1. Dataset

A dataset is an abstraction for a collection of data handled in a sequential manner. We can make a dataset from any Python collection (or NumPy array) as follows:

listD = [[1,10,8],[21,32,-3],[34,21,0],[21,5,-2],[12,3,9]]
tfDataSetD = tf.data.Dataset.from_tensor_slices(listD)
tfDataSetD

<_TensorSliceDataset element_spec=TensorSpec(shape=(3,), dtype=tf.int32, name=None)>

PyTorch serves the same purpose using its own implementation. We can use TensorDataset there.

from torch.utils.data import TensorDataset
tensorD = torch.tensor(listD)
pyDataSetD = TensorDataset(tensorD)
pyDataSetD

<torch.utils.data.dataset.TensorDataset at 0x7ff95850c7d0>

Both TensorFlow and PyTorch datasets are iterable.

for x in tfDataSetD:
    print(x)

tf.Tensor([ 1 10  8], shape=(3,), dtype=int32)
tf.Tensor([21 32 -3], shape=(3,), dtype=int32)
tf.Tensor([34 21  0], shape=(3,), dtype=int32)
tf.Tensor([21  5 -2], shape=(3,), dtype=int32)
tf.Tensor([12  3  9], shape=(3,), dtype=int32)

for x in pyDataSetD:
    print(x)

(tensor([ 1, 10,  8]),)
(tensor([21, 32, -3]),)
(tensor([34, 21,  0]),)
(tensor([21,  5, -2]),)
(tensor([12,  3,  9]),)

tfIterator = iter(tfDataSetD)
print(next(tfIterator))
print(next(tfIterator))

tf.Tensor([ 1 10  8], shape=(3,), dtype=int32)
tf.Tensor([21 32 -3], shape=(3,), dtype=int32)

pyIterator = iter(pyDataSetD)
print(next(pyIterator))
print(next(pyIterator))

(tensor([ 1, 10,  8]),)
(tensor([21, 32, -3]),)
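TensorDataset is a ready-made option, but in most real projects we subclass torch.utils.data.Dataset and implement __len__ and __getitem__ (a so-called map-style dataset). The SquaresDataset class below is a hypothetical toy example that just shows the shape of that interface; a DataLoader can batch it exactly like pyDataSetD (more on batching next).

```python
import torch
from torch.utils.data import Dataset


class SquaresDataset(Dataset):
    """Hypothetical toy dataset: sample i is the pair (i, i**2)."""

    def __init__(self, n):
        self.n = n

    def __len__(self):
        # Number of samples in the dataset
        return self.n

    def __getitem__(self, idx):
        # Build the (input, target) pair lazily for the requested index
        x = torch.tensor(float(idx))
        return x, x ** 2


squares = SquaresDataset(4)
print(len(squares))  # 4
print(squares[3])    # (tensor(3.), tensor(9.))
```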

Creating a dataset is a basic step. The real magic happens once we take it and perform transformations on it. Let’s learn some of that magic.

2. Batching

Batching is an essential step in which we split the dataset into smaller groups of samples. The batch size depends on a number of factors, including available memory, the number of samples in the dataset, and where we want to sit between full-batch gradient descent and single-sample SGD.

For batching in TensorFlow, we can simply apply batch(<batch size>).

tfBatchedDataset = tfDataSetD.batch(2)
for batch in tfBatchedDataset:
    print(batch)

tf.Tensor(
[[ 1 10  8]
 [21 32 -3]], shape=(2, 3), dtype=int32)
tf.Tensor(
[[34 21  0]
 [21  5 -2]], shape=(2, 3), dtype=int32)
tf.Tensor([[12  3  9]], shape=(1, 3), dtype=int32)

As you may have noticed, the last batch holds just the single sample left over. So don't panic or expect an error if you're splitting a dataset of N samples into batches of size B, even if B doesn't divide N evenly.

In PyTorch, batching is handled by the DataLoader, which wraps a dataset and yields batches from it.

from torch.utils.data import DataLoader
pyBatchedDataset = DataLoader(pyDataSetD, batch_size=3, shuffle=True)
for batch in pyBatchedDataset:
    print(batch)

[tensor([[34, 21,  0],
        [21,  5, -2],
        [21, 32, -3]])]
[tensor([[12,  3,  9],
        [ 1, 10,  8]])]

You may also have noticed that PyTorch's DataLoader has a built-in shuffle argument (off by default and enabled above), while TensorFlow provides a separate shuffle() method on dataset objects. It takes a buffer size as its argument (something that can be better demonstrated with a bigger dataset).

tfShuffledBatch = tfDataSetD.shuffle(3).batch(2)
for batch in tfShuffledBatch:
    print(batch)

tf.Tensor(
[[ 1 10  8]
 [34 21  0]], shape=(2, 3), dtype=int32)
tf.Tensor(
[[12  3  9]
 [21  5 -2]], shape=(2, 3), dtype=int32)
tf.Tensor([[21 32 -3]], shape=(1, 3), dtype=int32)
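The buffer size matters because tf.data's shuffle() doesn't see the whole dataset at once: it keeps a buffer of buffer_size elements, emits a random one, and refills that slot with the next input element. The pure-Python function below is a simplified sketch of that idea, not TensorFlow's actual implementation. With a buffer size of 1 nothing gets shuffled, and a buffer at least as large as the dataset gives a full uniform shuffle.

```python
import random


def buffered_shuffle(items, buffer_size, seed=None):
    """Simplified sketch of buffer-based shuffling, the idea behind tf.data's shuffle()."""
    rng = random.Random(seed)
    buffer = []
    for item in items:
        buffer.append(item)
        if len(buffer) == buffer_size:
            # Buffer is full: emit a randomly chosen element
            yield buffer.pop(rng.randrange(len(buffer)))
    # Input exhausted: drain the remaining elements in random order
    while buffer:
        yield buffer.pop(rng.randrange(len(buffer)))


print(list(buffered_shuffle(range(5), buffer_size=1)))  # [0, 1, 2, 3, 4]
print(sorted(buffered_shuffle(range(10), buffer_size=4, seed=0)) == list(range(10)))  # True
```

Note that with a small buffer, the first element emitted can only come from the first buffer_size inputs, which is why small buffers shuffle only locally.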

3. Padding

Many datasets contain samples of unequal length. I remember working with RNA sequences of different lengths (the same issue comes up in almost every NLP problem) in the pre-TensorFlow/PyTorch days, and it was agony. Luckily, modern DL libraries take care of that problem.

In TensorFlow, we can perform padding using padded_batch().

tfPaddedBatch = tfDataSetD.padded_batch(2)
for batch in tfPaddedBatch:
    print(batch)

tf.Tensor(
[[ 1 10  8]
 [21 32 -3]], shape=(2, 3), dtype=int32)
tf.Tensor(
[[34 21  0]
 [21  5 -2]], shape=(2, 3), dtype=int32)
tf.Tensor([[12  3  9]], shape=(1, 3), dtype=int32)

The output is the same as above because every sample in this dataset already has the same length; padding only kicks in for unequal-length data such as tokenized text. In PyTorch, we can achieve the same with a custom collate_fn passed to the DataLoader.

Autograd

One of the biggest challenges of any deep learning library is its ability to calculate the gradients (for gradient-based optimizers like SGD). Automatic differentiation has an edge over other methods like manual or symbolic differentiation and hence is the de facto approach in DL libraries. It would be useful to compare the differences between the auto-differentiation in the two libraries.

Note: We'll use the terms auto-differentiation and autograd interchangeably.

1. Autograd APIs

PyTorch uses torch.autograd for calculating gradients. In practice, we set requires_grad=True on the input tensors we want gradients for, compute a scalar output, and call <output>.backward(); the gradients then appear in each input tensor's .grad attribute.

Let's create an example by calculating the gradient of f(x₁, x₂) = 3x₁² − 5x₂ at (2, 3). Since ∂f/∂x₁ = 6x₁ and ∂f/∂x₂ = −5, we expect (12, −5).

def polynomialFunc(x):
    # f(x1, x2) = 3*x1**2 - 5*x2
    return 3 * x[0]**2 - 5*x[1]

pyTensorX = torch.tensor([2.0,3.0], requires_grad=True)
pyTensorY = polynomialFunc(pyTensorX)
pyTensorY.backward()
dy_dx = pyTensorX.grad
dy_dx

TensorFlow, on the other hand, uses the tf.GradientTape API for this purpose.
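For comparison, here's a sketch of the same derivative with tf.GradientTape. The tape records the operations executed inside its with block, and tape.gradient() then computes the gradient; trainable tf.Variable objects are watched automatically (for plain tensors, we would call tape.watch() first).

```python
import tensorflow as tf


def polynomialFunc(x):
    # f(x1, x2) = 3*x1**2 - 5*x2
    return 3 * x[0] ** 2 - 5 * x[1]


tfTensorX = tf.Variable([2.0, 3.0])

# Record the forward computation on the tape
with tf.GradientTape() as tape:
    tfTensorY = polynomialFunc(tfTensorX)

# Reverse-mode gradient of y with respect to x
dy_dx = tape.gradient(tfTensorY, tfTensorX)
print(tf.convert_to_tensor(dy_dx).numpy().tolist())  # [12.0, -5.0]
```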

2. Forward vs. reverse mode

The difference between forward and reverse mode lies in the calculation of Jacobians.

For a function f : ℝⁿ → ℝᵐ, the Jacobian matrix is defined as:

$$ \mathbf J = \begin{bmatrix} \dfrac{\partial \mathbf{f}}{\partial x_1} & \cdots & \dfrac{\partial \mathbf{f}}{\partial x_n} \end{bmatrix} = \begin{bmatrix} \nabla^{\mathrm T} f_1 \\ \vdots \\ \nabla^{\mathrm T} f_m \end{bmatrix} = \begin{bmatrix} \dfrac{\partial f_1}{\partial x_1} & \cdots & \dfrac{\partial f_1}{\partial x_n} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial f_m}{\partial x_1} & \cdots & \dfrac{\partial f_m}{\partial x_n} \end{bmatrix} $$

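This matrix may look daunting, but each of its rows is simply the (transposed) gradient of one output fᵢ. The important part is deciding whether to compute it in forward or reverse mode. Forward mode produces one column (the derivatives with respect to a single input xⱼ) per pass, so it needs n passes, while reverse mode produces one row (the gradient of a single output fᵢ) per pass, so it needs m passes. So the answer is pretty simple: if m > n, we'll use forward mode, and vice versa.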

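PyTorch provides both forward and reverse modes of differentiation directly (for example, torch.func.jacfwd and torch.func.jacrev). TensorFlow's main autodiff API, tf.GradientTape, works in reverse mode; forward mode is available through tf.autodiff.ForwardAccumulator but is much less commonly used.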
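In PyTorch, the two modes are easy to compare through torch.func: jacfwd builds the Jacobian from forward-mode products (conceptually one per input) and jacrev from reverse-mode products (one per output). The function f below is a made-up example with two inputs and three outputs, so its Jacobian is 3 × 2, and both modes should agree.

```python
import torch
from torch.func import jacfwd, jacrev


def f(x):
    # Toy function with 2 inputs and 3 outputs
    return torch.stack([x[0] * x[1], x[0] ** 2, torch.sin(x[1])])


x = torch.tensor([2.0, 3.0])
J_fwd = jacfwd(f)(x)  # forward mode
J_rev = jacrev(f)(x)  # reverse mode

print(J_fwd.shape)                   # torch.Size([3, 2])
print(torch.allclose(J_fwd, J_rev))  # True
```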

Note: Since training a neural network means differentiating a single scalar loss (m = 1) with respect to many parameters, reverse mode is the default choice.

3. Function transforms

Similar to JAX (which arguably has a better autograd than both PyTorch and TensorFlow), PyTorch lets us transform functions through torch.func, for example computing the Hessian as the Jacobian of a Jacobian, and should be preferred over TensorFlow here.
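As a small sketch of such a transform, the Hessian of the polynomial f(x₁, x₂) = 3x₁² − 5x₂ can be built by composing jacrev (the gradient) with jacfwd (the Jacobian of that gradient); torch.func.hessian packages the same composition. Since only the x₁² term is non-linear, we expect [[6, 0], [0, 0]].

```python
import torch
from torch.func import hessian, jacfwd, jacrev


def polynomialFunc(x):
    # f(x1, x2) = 3*x1**2 - 5*x2
    return 3 * x[0] ** 2 - 5 * x[1]


x = torch.tensor([2.0, 3.0])

# Hessian as the Jacobian (forward mode) of the gradient (reverse mode)
H = jacfwd(jacrev(polynomialFunc))(x)
print(H)  # expected values: [[6, 0], [0, 0]]

# The built-in helper composes the same transforms
print(torch.allclose(H, hessian(polynomialFunc)(x)))  # True
```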

Functional programming support

Neither TensorFlow nor PyTorch was originally designed around functional programming, but the growing popularity of JAX's functional programming model inspired PyTorch to add torch.func (formerly the separate functorch library) in version 2.0.
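A minimal sketch of that functional style: torch.func.grad turns a scalar-valued function into a function that returns its gradient, and vmap vectorizes it over a batch, giving per-sample gradients without a Python loop. It reuses the polynomial from the autograd example; the batch values are arbitrary.

```python
import torch
from torch.func import grad, vmap


def polynomialFunc(x):
    # f(x1, x2) = 3*x1**2 - 5*x2; returns a scalar for a single sample
    return 3 * x[0] ** 2 - 5 * x[1]


batch = torch.tensor([[2.0, 3.0], [1.0, 0.0], [-1.0, 4.0]])

# grad(f) is itself a function; vmap maps it over the batch dimension
perSampleGrads = vmap(grad(polynomialFunc))(batch)
print(perSampleGrads.tolist())  # [[12.0, -5.0], [6.0, -5.0], [-6.0, -5.0]]
```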

Random number generation

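The two libraries also use different random number generators. PyTorch's default CPU generator is a Mersenne Twister (mt19937) and its CUDA generator is based on Philox, while TensorFlow's tf.random.Generator supports counter-based algorithms such as Philox and ThreeFry. PyTorch also offers a cryptographically secure PRNG (CSPRNG) through a separate extension, torchcsprng.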

Lazy vs. eager execution

Since a deep model is a graph with a number of connected nodes, there can be two ways of executing it:

Static
Dynamic

In static execution, also known as lazy execution, we first build the complete graph and only get results once we run it. TensorFlow 1.x used lazy execution by default. This can be annoying and time-consuming, as we often need to check intermediate results.

PyTorch, on the other hand, uses eager execution by default, which lets us check intermediate outputs as we go, and it can also turn models into graphs when needed (through TorchScript or, since version 2.0, torch.compile). Naturally, this is quite helpful for researchers and even ML engineers, as model development goes through a number of testing phases.

Inspired by PyTorch, TensorFlow introduced eager mode in 2017/18. From TensorFlow 2.0, it uses eager mode by default as well. So, if you're using TensorFlow < 2.0, then you probably need to set eager mode explicitly. Otherwise, it's quite similar in the current versions of both libraries.

In TensorFlow 1.x, we can set it explicitly as:

import tensorflow.contrib.eager as tfe
tfe.enable_eager_execution()

Note: The code snippet above won't work with TensorFlow 2.0+, which removed tf.contrib; there's no need for it anyway, as eager execution is enabled by default.
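In TensorFlow 2.x, graph (lazy) execution hasn't disappeared; it's opt-in through tf.function, which traces a Python function into a graph. The sketch below shows that tf.executing_eagerly() returns False while the decorated function is being traced. (PyTorch 2.0's counterpart for compiling models is torch.compile, not shown here.)

```python
import tensorflow as tf

print(tf.executing_eagerly())  # True: TF 2.x runs eagerly by default


@tf.function
def squareInGraph(x):
    # Python side effects like print() run only while the function is traced
    print("Eager inside tf.function?", tf.executing_eagerly())  # False
    return x * x


result = squareInGraph(tf.constant(3.0))
print(result.numpy())  # 9.0
```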

Multiple language support

While both TensorFlow and PyTorch are primarily Python-based, they have interfaces for other languages, too.

PyTorch has a pretty strong C++ interface, LibTorch, which we can use by including the <torch/torch.h> header (the lower-level ATen tensor library is also available through <ATen/ATen.h>).

There can be cases where we need an operation PyTorch doesn't provide, or where our own or a third party's implementation is faster than PyTorch's. If so, PyTorch's C++ extensions let us compile that code and call it from Python.

TensorFlow, on the other hand, is quite established in its C++, Java, and JavaScript implementations.

Suggestion: For C++, it's more or less a tie with a slight edge in PyTorch’s favor, but TensorFlow is a clear choice for Java development.

Other platforms

TensorFlow is a clear winner here as it has an established presence in the mobile developer community through TF Lite. PyTorch has its own mobile version as well, but it's still in beta and has yet to catch the attention of developers.
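To give a flavor of the TF Lite workflow, the sketch below converts a tiny, untrained Keras model (a made-up architecture, just for illustration) into the TF Lite flat-buffer format with tf.lite.TFLiteConverter. The resulting bytes are what would be bundled into a mobile app, and tf.lite.Interpreter can load them directly.

```python
import tensorflow as tf

# A tiny, untrained model purely for demonstration
model = tf.keras.Sequential([
    tf.keras.Input(shape=(3,)),
    tf.keras.layers.Dense(4, activation="relu"),
    tf.keras.layers.Dense(1),
])

# Convert the Keras model to the TF Lite format
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tfliteModel = converter.convert()
print(type(tfliteModel).__name__, len(tfliteModel) > 0)  # bytes True

# The TF Lite interpreter can load the converted model from memory
interpreter = tf.lite.Interpreter(model_content=tfliteModel)
interpreter.allocate_tensors()
print(interpreter.get_input_details()[0]["shape"])
```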

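Many of you may be familiar with TensorFlow's Neural Network Playground and might have wondered how it runs in the browser. The Playground itself is a lightweight standalone demo, but TensorFlow does have a full JavaScript version, TensorFlow.js, for training and running models in the browser. PyTorch, on the other hand, is yet to make its mark on the web. PyTorch models can be exported to ONNX and run in the browser with other runtimes, but that route is less straightforward.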

So TensorFlow is a clear winner here. If you're a web or mobile developer, TensorFlow should be your go-to library.

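Note: Please don't confuse TorchScript with JavaScript. TorchScript is PyTorch's way of serializing models into a statically typed subset of Python, and it's unrelated to JavaScript.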

PyTorch vs. TensorFlow: looking ahead to Keras 3.0

Both TensorFlow and PyTorch are phenomenal in the DL community. Both have their own style, and each has an edge in different features. Luckily, Keras Core has added support for both frameworks (as well as JAX) as backends and will be available as Keras 3.0 this fall. So keep your fingers crossed that Keras will bridge the gap and give developers the best of both without needing to switch. I'll be looking at Keras in detail in my next post.

If you need data for your models, you might be interested in web scraping with Python, or other methods of collecting data for AI.