
## Deep learning and the switch to Python

The convolutional neural network AlexNet brought about a revolution in the AI community back in 2012 just as deep learning’s importance was dawning upon a broader community. The surge in deep learning (DL) led to the need for proper programming support in the form of libraries.

Earlier solutions included Torch, a library written in Lua, and Caffe (2013), written in C++. Over time, the DL community realized the limitations of Lua and decided to switch to Python.

As a result, Theano – a library for numerical computing – was upgraded to allow deep learning support in 2013. Theano was received warmly, as users were able to appreciate a number of Python features and a NumPy-like style and interface.

Check out Python and machine learning, if you need a primer on the basics.

### TensorFlow

Since Theano was maintained by an academic group, industry giants like Google decided to step in just as Keras was introduced in early 2015. Keras was focused on providing a simplified high-level interface for designing deep models. It was soon followed by TensorFlow in the same year.

TensorFlow, based on the internally used DistBelief, was the first major DL library. And given the rapidly evolving history of deep learning, it makes one wonder how it has stood the test of time.

The secret lies in a number of factors, from Google’s continuous support and scalability to some very cool features. We'll cover them in detail shortly.

### PyTorch

Meta (then still just Facebook) soon followed suit and released PyTorch, a Python successor to the Lua-based Torch, in 2016. It was challenging to introduce a new DL library in the towering presence of TensorFlow. The initial reception was cool and adoption slow, but gradually, PyTorch barged its way in thanks to some of its exciting features. Despite all the front-end support for Keras, PyTorch has continued to make inroads in the DL community. And today, the choice any DL engineer or researcher has to make is the one between TensorFlow and PyTorch.

## PyTorch vs. TensorFlow

PyTorch or TensorFlow? This is a pressing question, and the answer requires some analysis of the problem at hand and an in-depth comparison between the two. We'll try to do the topic justice. Let’s begin.

## Tensors

In numerical computing, and especially DL, we deal with tensors. Tensors are nothing more than a generalized form of matrices that allow higher dimensions. Unlike traditional Python collections, all the elements in a tensor have the same datatype.

In PyTorch, we can initialize a tensor like so:

`<tensor> = torch.tensor(<collection/NumPy Array>)`

Similarly, in TensorFlow, we can convert an existing collection or a NumPy array into a tensor as follows:

`<tensor> = tf.convert_to_tensor(<collection/NumPy Array>)`

**Note:** For ease of reference, we'll use the nomenclature of `pyTensor...` for PyTorch tensors and `tfTensor...` for the TensorFlow ones.

```
import torch
import numpy as np
import tensorflow as tf
```

**Note:** We're using the latest versions (2.13.0 and 2.0.1 for TensorFlow and PyTorch) for the analysis here. The bottom line here is the use of version **2.0** for both libraries.

```
print(tf.__version__)
print(torch.__version__)
```

```
2.13.0
2.0.1
```

```
A = np.ones((1,10))
pyTensorA = torch.tensor(A)
tfTensorA = tf.convert_to_tensor(A)
print(type(pyTensorA))
print(type(tfTensorA))
```

```
<class 'torch.Tensor'>
<class 'tensorflow.python.framework.ops.EagerTensor'>
```

The two classes are different, as expected: TensorFlow and PyTorch are independent of each other, and each has its own tensor implementation. Apart from that, there isn't really any difference at this point.

### 1. Jagged arrays support

Neither PyTorch nor TensorFlow natively supports jagged arrays in its regular tensors. Every row along an axis is required to have the same length.

```
listB = [[2,3],[4,5,6],[12,13,14,15]]
listB
```

```
[[2, 3], [4, 5, 6], [12, 13, 14, 15]]
```

```
tfTensorB = tf.convert_to_tensor(listB)
tfTensorB
```

```
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[5], line 1
----> 1 tfTensorB = tf.convert_to_tensor(listB)
2 tfTensorB
File ~/anaconda3/lib/python3.11/site-packages/tensorflow/python/util/traceback_utils.py:153, in filter_traceback.<locals>.error_handler(*args, **kwargs)
151 except Exception as e:
152 filtered_tb = _process_traceback_frames(e.__traceback__)
--> 153 raise e.with_traceback(filtered_tb) from None
154 finally:
155 del filtered_tb
File ~/anaconda3/lib/python3.11/site-packages/tensorflow/python/framework/constant_op.py:98, in convert_to_eager_tensor(value, ctx, dtype)
96 dtype = dtypes.as_dtype(dtype).as_datatype_enum
97 ctx.ensure_initialized()
---> 98 return ops.EagerTensor(value, ctx.device_name, dtype)
ValueError: Can't convert non-rectangular Python sequence to Tensor.
```

```
pyTensorB = torch.tensor(listB)
pyTensorB
```

```
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[6], line 1
----> 1 pyTensorB = torch.tensor(listB)
2 pyTensorB
ValueError: expected sequence of length 2 at dim 1 (got 3)
```

But there's more to it than that. TensorFlow has the support of `RaggedTensor` and `SparseTensor`, and they allow us to have jagged tensors.

```
raggedTensorB = tf.ragged.constant(listB)
raggedTensorB
```

```
<tf.RaggedTensor [[2, 3], [4, 5, 6], [12, 13, 14, 15]]>
```

Taking inspiration from this, PyTorch implemented the **nested_tensor**.

```
nestedTensorB = torch.nested.nested_tensor(listB)
nestedTensorB
```

```
/tmp/ipykernel_4077/337460509.py:1: UserWarning: The PyTorch API of nested tensors is in prototype stage and will change in the near future. (Triggered internally at /croot/pytorch_1686931851744/work/aten/src/ATen/NestedTensorImpl.cpp:177.)
nestedTensorB = torch.nested.nested_tensor(listB)
nested_tensor([
tensor([2, 3]),
tensor([4, 5, 6]),
tensor([12, 13, 14, 15])
])
```

**Bottom line:** `nested_tensor` is still at the prototype stage, as the warning above shows. TensorFlow’s `RaggedTensor`, on the other hand, is much more stable and mature, and is recommended over `nested_tensor`.

### 2. Sparse arrays

Handling sparse matrices is an important problem in computer science. We have to minimize the storage they take up while keeping data retrieval efficient.

```
sparseList = [[0,0,0],[0,1,0],[0,0,1],[0,0,0],[1,0,0]]
tfSparseTensor = tf.convert_to_tensor(sparseList)
tfSparseTensor
```

```
<tf.Tensor: shape=(5, 3), dtype=int32, numpy=
array([[0, 0, 0],
[0, 1, 0],
[0, 0, 1],
[0, 0, 0],
[1, 0, 0]], dtype=int32)>
```

This is OK, but not a scalable solution. A better approach is to use TensorFlow’s **SparseTensor** for this purpose. Its syntax is:

`<sparseTensor> = tf.sparse.from_dense(<denseTensor>)`

```
tfOptimizedSparseTensor = tf.sparse.from_dense(tfSparseTensor)
tfOptimizedSparseTensor
```

```
SparseTensor(indices=tf.Tensor(
[[1 1]
[2 2]
[4 0]], shape=(3, 2), dtype=int64), values=tf.Tensor([1 1 1], shape=(3,), dtype=int32), dense_shape=tf.Tensor([5 3], shape=(2,), dtype=int64))
```

We can convert it back into a dense tensor using `tf.sparse.to_dense()`:

```
tf.sparse.to_dense(tfOptimizedSparseTensor)
```

```
<tf.Tensor: shape=(5, 3), dtype=int32, numpy=
array([[0, 0, 0],
[0, 1, 0],
[0, 0, 1],
[0, 0, 0],
[1, 0, 0]], dtype=int32)>
```

Inspired by this, PyTorch has also released its beta `torch.sparse`. We can convert a tensor into a sparse one using this:

`<tensor>.to_sparse()`

```
pySparseTensor = torch.tensor(sparseList)
pySparseTensor
```

```
tensor([[0, 0, 0],
[0, 1, 0],
[0, 0, 1],
[0, 0, 0],
[1, 0, 0]])
```

```
pyOptimizedSparseTensor = pySparseTensor.to_sparse()
pyOptimizedSparseTensor
```

```
tensor(indices=tensor([[1, 2, 4],
[1, 2, 0]]),
values=tensor([1, 1, 1]),
size=(5, 3), nnz=3, layout=torch.sparse_coo)
```

### Support for different sparse formats

If we look closely at the tensor’s output above, it shows `sparse_coo` as the layout. The coordinate list (COO) format is the default format used in both PyTorch and TensorFlow. However, this is where PyTorch has a clear edge as it supports other formats too, like CSR, CSC, BSR, and BSC. Covering the details of these formats is beyond the scope of this tutorial, but curious readers are encouraged to check the details.
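As a rough illustration of one of these alternative layouts, the sketch below converts the same matrix to CSR (compressed sparse row) in PyTorch. It assumes PyTorch 2.x, where `to_sparse_csr()` is available and may emit a beta warning. The float dtype and the variable names `dense`, `coo`, and `csr` are choices made for this example only.

```python
import torch

sparseList = [[0, 0, 0], [0, 1, 0], [0, 0, 1], [0, 0, 0], [1, 0, 0]]
dense = torch.tensor(sparseList, dtype=torch.float32)

# COO stores one (row, column) pair per non-zero value
coo = dense.to_sparse()

# CSR compresses the row indices into row pointers
csr = dense.to_sparse_csr()

print(csr.crow_indices())  # row pointers, one more entry than there are rows
print(csr.col_indices())   # column index of each non-zero value
print(csr.values())        # the non-zero values themselves

# Both layouts convert back to the same dense matrix
print(torch.equal(coo.to_dense(), dense))
print(torch.equal(csr.to_dense(), dense))
```

CSR tends to pay off when rows are read as a whole (for example in sparse matrix-vector products), while COO is simpler to build incrementally.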

**Conclusion:** There isn’t much difference between TensorFlow and PyTorch’s sparse implementations, though deploying PyTorch’s implementation in a commercial application can be prone to bugs. On the other hand, if we want to use other formats, then PyTorch is a clear choice.

### 3. Strings

While TensorFlow allows strings, PyTorch doesn’t.

```
listC = ["Ali","Bilal"]
```

```
tfTensorC = tf.convert_to_tensor(listC)
tfTensorC
```

```
<tf.Tensor: shape=(2,), dtype=string, numpy=array([b'Ali', b'Bilal'], dtype=object)>
```

```
pyTensorC = torch.tensor(listC)
pyTensorC
```

```
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[17], line 1
----> 1 pyTensorC = torch.tensor(listC)
2 pyTensorC
ValueError: too many dimensions 'str'
```

### 4. User-defined classes

To avoid complicating things too much, both libraries avoid user-defined types.

```
class MyClass:
    a = 0
    b = 1
```

```
obj1 = MyClass()
obj2 = MyClass()
objectsList = [obj1,obj2]
```

```
pyTensorObjects = torch.tensor(objectsList)
pyTensorObjects
```

```
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
Cell In[20], line 1
----> 1 pyTensorObjects = torch.tensor(objectsList)
2 pyTensorObjects
RuntimeError: Could not infer dtype of MyClass
```

```
tfTensorObjects = tf.convert_to_tensor(objectsList)
tfTensorObjects
```

```
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[21], line 1
----> 1 tfTensorObjects = tf.convert_to_tensor(objectsList)
2 tfTensorObjects
File ~/anaconda3/lib/python3.11/site-packages/tensorflow/python/util/traceback_utils.py:153, in filter_traceback.<locals>.error_handler(*args, **kwargs)
151 except Exception as e:
152 filtered_tb = _process_traceback_frames(e.__traceback__)
--> 153 raise e.with_traceback(filtered_tb) from None
154 finally:
155 del filtered_tb
File ~/anaconda3/lib/python3.11/site-packages/tensorflow/python/framework/constant_op.py:98, in convert_to_eager_tensor(value, ctx, dtype)
96 dtype = dtypes.as_dtype(dtype).as_datatype_enum
97 ctx.ensure_initialized()
---> 98 return ops.EagerTensor(value, ctx.device_name, dtype)
ValueError: Attempt to convert a value (<__main__.MyClass object at 0x7ff9586b29d0>) with an unsupported type (<class '__main__.MyClass'>) to a Tensor.
```

### 5. Complex numbers

Python has built-in support for `complex` numbers, and they're supported by both TensorFlow and PyTorch.

```
c1 = complex(1,3)
c2 = complex(2,-1)
listComplex = [c1,c2]
```

```
tfTensorComplex = tf.convert_to_tensor(listComplex)
tfTensorComplex
```

```
<tf.Tensor: shape=(2,), dtype=complex128, numpy=array([1.+3.j, 2.-1.j])>
```

```
pyTensorComplex = torch.tensor(listComplex)
pyTensorComplex
```

```
tensor([1.+3.j, 2.-1.j])
```

```
tuple1 = ((1,2),(3,4))
tfTensorTuple = tf.convert_to_tensor(tuple1)
pyTensorTuple = torch.tensor(tuple1)
```

### 6. Data types

We mentioned above that all the members of a tensor are supposed to be of the same data type. We've also seen that PyTorch allows only numeric types (including `complex`), while TensorFlow supports string types as well.

Another difference between the two lies in how they expose the data types they support. PyTorch implements NumPy-style numeric types (like `float32` and `int64`, for example) that convert to and from Python’s intrinsic numbers with little ceremony. TensorFlow supports the same types, but it wraps them, and hence we need conversions while integrating with Python. PyTorch, on the other hand, is seamless.
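To make the point about conversions a little more concrete, here is a small sketch that builds the same `float32` tensor in both libraries and pulls a plain Python number back out. The names `values`, `pyValue`, and `tfValue` are made up for this example, and going through `.numpy()` is just one common route on the TensorFlow side, not the only one.

```python
import numpy as np
import torch
import tensorflow as tf

values = np.array([1.5, 2.5], dtype=np.float32)

pyTensor = torch.tensor(values)
tfTensor = tf.convert_to_tensor(values)

# Each library has its own dtype object for the same underlying type
print(pyTensor.dtype)  # torch.float32
print(tfTensor.dtype)  # <dtype: 'float32'>

# Getting a plain Python number back out
pyValue = pyTensor[0].item()          # directly a Python float
tfValue = float(tfTensor[0].numpy())  # NumPy scalar first, then a Python float

print(type(pyValue), type(tfValue))
```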

**Suggestion:** If in doubt about integrating with Python’s intrinsic data types, go for PyTorch.

## Data handling

Arguably, the most important part of any ML/DL project is data processing. Data comes in many forms, from unstructured text to video clips, and requires a proper framework.

In TensorFlow, we use the `tf.data` API for input pipelines, while PyTorch uses the `torch.utils.data` API for the same purpose.

### 1. Dataset

A dataset is an abstraction for a collection of data handled in a sequential manner. We can make a dataset from any Python collection (or NumPy array) as follows:

```
listD = [[1,10,8],[21,32,-3],[34,21,0],[21,5,-2],[12,3,9]]
tfDataSetD = tf.data.Dataset.from_tensor_slices(listD)
tfDataSetD
```

```
<_TensorSliceDataset element_spec=TensorSpec(shape=(3,), dtype=tf.int32, name=None)>
```

PyTorch serves the same purpose using its own implementation. We can use **TensorDataset** there.

```
from torch.utils.data import TensorDataset
tensorD = torch.tensor(listD)
pyDataSetD = TensorDataset(tensorD)
pyDataSetD
```

```
<torch.utils.data.dataset.TensorDataset at 0x7ff95850c7d0>
```

Both TensorFlow and PyTorch datasets are iterable.

```
for x in tfDataSetD:
    print(x)
```

```
tf.Tensor([ 1 10 8], shape=(3,), dtype=int32)
tf.Tensor([21 32 -3], shape=(3,), dtype=int32)
tf.Tensor([34 21 0], shape=(3,), dtype=int32)
tf.Tensor([21 5 -2], shape=(3,), dtype=int32)
tf.Tensor([12 3 9], shape=(3,), dtype=int32)
```

```
for x in pyDataSetD:
    print(x)
```

```
(tensor([ 1, 10, 8]),)
(tensor([21, 32, -3]),)
(tensor([34, 21, 0]),)
(tensor([21, 5, -2]),)
(tensor([12, 3, 9]),)
```

```
tfIterator = iter(tfDataSetD)
print(next(tfIterator))
print(next(tfIterator))
```

```
tf.Tensor([ 1 10 8], shape=(3,), dtype=int32)
tf.Tensor([21 32 -3], shape=(3,), dtype=int32)
```

```
pyIterator = iter(pyDataSetD)
print(next(pyIterator))
print(next(pyIterator))
```

```
(tensor([ 1, 10, 8]),)
(tensor([21, 32, -3]),)
```

Creating a dataset is a basic step. The real magic happens once we take it and perform transformations on it. Let’s learn some of that magic.

### 2. Batching

Batching is an essential step in which we divide the dataset into batches. The batch size depends on a number of factors, including the available memory, the number of data samples in the dataset, and where we want to sit between full-batch gradient descent and SGD.

For batching in TensorFlow, we can simply apply `batch(<batch size>)`.

```
tfBatchedDataset = tfDataSetD.batch(2)
for batch in tfBatchedDataset:
    print(batch)
```

```
tf.Tensor(
[[ 1 10 8]
[21 32 -3]], shape=(2, 3), dtype=int32)
tf.Tensor(
[[34 21 0]
[21 5 -2]], shape=(2, 3), dtype=int32)
tf.Tensor([[12 3 9]], shape=(1, 3), dtype=int32)
```

As you may have noticed, it returns the single data sample left in the last iteration. So don’t panic or expect an error if you're dividing a dataset of *N* samples into batches of size *B*, even if *B* is not a factor of *N*.

Batching in PyTorch is handled by the **DataLoader**, which takes the batch size as an argument.

```
from torch.utils.data import DataLoader
pyBatchedDataset = DataLoader(pyDataSetD, batch_size=3, shuffle=True)
for batch in pyBatchedDataset:
    print(batch)
```

```
[tensor([[34, 21, 0],
[21, 5, -2],
[21, 32, -3]])]
[tensor([[12, 3, 9],
[ 1, 10, 8]])]
```

You may also have noticed that PyTorch supports data shuffling directly in its `DataLoader` (through the `shuffle` argument, which is off by default), while TensorFlow provides a separate function, `shuffle()`, for the dataset objects. It takes buffer size as its argument (something which can be better demonstrated with a bigger dataset).

```
tfShuffledBatch = tfDataSetD.shuffle(3).batch(2)
for batch in tfShuffledBatch:
    print(batch)
```

```
tf.Tensor(
[[ 1 10 8]
[34 21 0]], shape=(2, 3), dtype=int32)
tf.Tensor(
[[12 3 9]
[21 5 -2]], shape=(2, 3), dtype=int32)
tf.Tensor([[21 32 -3]], shape=(1, 3), dtype=int32)
```
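To get a feel for what the buffer size does, the following sketch shuffles a slightly bigger dataset of ten integers with three different buffer sizes. The dataset and the names `unshuffled`, `partlyShuffled`, and `fullyShuffled` are invented for this example. The exact order of the shuffled outputs depends on the seed and the TensorFlow version, so none is shown.

```python
import tensorflow as tf

numbers = tf.data.Dataset.range(10)

# A buffer of one element cannot reorder anything
unshuffled = [int(x) for x in numbers.shuffle(buffer_size=1).as_numpy_iterator()]

# A small buffer only mixes elements that sit close to each other
partlyShuffled = [int(x) for x in numbers.shuffle(buffer_size=3, seed=0).as_numpy_iterator()]

# A buffer as large as the dataset gives a uniform shuffle
fullyShuffled = [int(x) for x in numbers.shuffle(buffer_size=10, seed=0).as_numpy_iterator()]

print(unshuffled)
print(partlyShuffled)
print(fullyShuffled)
```

With a buffer of three, the first element drawn can only be 0, 1, or 2, because only those have been read into the buffer so far. A larger buffer shuffles better at the cost of more memory.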

### 3. Padding

There are a number of datasets with data of inexact/unequal length. I remember working with RNA sequences (applies to almost every NLP problem) of different lengths in the pre-TensorFlow/PyTorch days, and it was agony. Luckily, our modern DL libraries take care of that problem.

In TensorFlow, we can perform padding using **padded_batch()**.

```
tfPaddedBatch = tfDataSetD.padded_batch(2)
for batch in tfPaddedBatch:
    print(batch)
```

```
tf.Tensor(
[[ 1 10 8]
[21 32 -3]], shape=(2, 3), dtype=int32)
tf.Tensor(
[[34 21 0]
[21 5 -2]], shape=(2, 3), dtype=int32)
tf.Tensor([[12 3 9]], shape=(1, 3), dtype=int32)
```

The output will be the same as above but will change if we use some unequal-length dataset like text strings. We can perform the same in PyTorch using `collate_fn`.
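As a minimal sketch of the PyTorch side, the example below pads variable-length sequences inside a custom `collate_fn`. The sequences and the helper name `padCollate` are hypothetical. `pad_sequence` comes from `torch.nn.utils.rnn`, and zero is assumed as the padding value.

```python
import torch
from torch.nn.utils.rnn import pad_sequence
from torch.utils.data import DataLoader

# Hypothetical variable-length sequences (e.g. tokenized sentences)
sequences = [
    torch.tensor([1, 2]),
    torch.tensor([3, 4, 5]),
    torch.tensor([6]),
    torch.tensor([7, 8, 9, 10]),
]

def padCollate(batch):
    # Pad every sequence in the batch with zeros up to the longest one
    return pad_sequence(batch, batch_first=True, padding_value=0)

paddedLoader = DataLoader(sequences, batch_size=2, collate_fn=padCollate)
paddedBatches = list(paddedLoader)
for batch in paddedBatches:
    print(batch)
```

Each batch is padded only up to its own longest sequence, so the two batches here end up with different widths (3 and 4).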

## Autograd

One of the biggest challenges of any deep learning library is its ability to calculate the gradients (for gradient-based optimizers like SGD). Automatic differentiation has an edge over other methods like manual or symbolic differentiation and hence is the *de facto* approach in DL libraries. It would be useful to compare the differences between the auto-differentiation in the two libraries.

**Note:** We'll use the terms auto-differentiation and autograd interchangeably.

### 1. Autograd APIs

PyTorch uses **torch.autograd** for calculating gradients. What this means is that we can calculate the gradient of any tensor by just setting its `requires_grad` to be **True**, followed by `<tensor>.backward()`.

Let's create an example based on the polynomial *f*(*x*) = 3*x*² − 5*x*. The code below feeds a separate input to each term, so it computes the gradient of 3*x*₀² − 5*x*₁.

```
def polynomialFunc(x):
    return 3 * x[0]**2 - 5*x[1]

pyTensorX = torch.tensor([2.0,3.0], requires_grad=True)
pyTensory = polynomialFunc(pyTensorX)
pyTensory.backward()
dy_dx = pyTensorX.grad  # gradient of y with respect to x
dy_dx
```

TensorFlow, on the other hand, uses the `tf.GradientTape` API for this purpose.
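For comparison, here is a sketch of the same calculation with `tf.GradientTape`. It reuses the polynomial from the PyTorch example. The names `tfTensorX`, `tfTensorY`, and `dy_dx` are chosen here for symmetry and aren't part of any API.

```python
import tensorflow as tf

def polynomialFunc(x):
    return 3 * x[0]**2 - 5 * x[1]

# Variables are watched by the tape automatically
tfTensorX = tf.Variable([2.0, 3.0])

# Operations executed inside the context are recorded on the tape
with tf.GradientTape() as tape:
    tfTensorY = polynomialFunc(tfTensorX)

# Gradient of y with respect to x, i.e. [6 * x[0], -5]
dy_dx = tape.gradient(tfTensorY, tfTensorX)
print(dy_dx)
```

The main difference in style is that PyTorch attaches the gradient to the input tensor (`.grad`), while TensorFlow returns it from the tape.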

### 2. Forward vs. reverse mode

The difference between forward and reverse mode lies in the calculation of Jacobians.

For a function *f* : *R*ⁿ → *R*ᵐ (that is, *n* inputs and *m* outputs), the Jacobian matrix is defined as:

$$ \mathbf J = \begin{bmatrix} \dfrac{\partial \mathbf{f}}{\partial x_1} & \cdots & \dfrac{\partial \mathbf{f}}{\partial x_n} \end{bmatrix} = \begin{bmatrix} \nabla^{\mathrm T} f_1 \\ \vdots \\ \nabla^{\mathrm T} f_m \end{bmatrix} = \begin{bmatrix} \dfrac{\partial f_1}{\partial x_1} & \cdots & \dfrac{\partial f_1}{\partial x_n}\\ \vdots & \ddots & \vdots\\ \dfrac{\partial f_m}{\partial x_1} & \cdots & \dfrac{\partial f_m}{\partial x_n} \end{bmatrix} $$

This matrix may sound daunting, but it's just the gradients of the individual outputs stacked row by row. The important part here is to decide whether to calculate it using the forward mode or the reverse one. Forward mode builds the Jacobian one column (one input) at a time, while reverse mode builds it one row (one output) at a time. So the answer is pretty simple: if *m* > *n* (more outputs than inputs), we use forward mode, and vice versa.

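TensorFlow's main API, `tf.GradientTape`, works in reverse mode (forward mode is available separately through `tf.autodiff.ForwardAccumulator`), while PyTorch provides us with both forward and reverse modes of differentiation.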
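The two modes can be sketched with `torch.func` (assuming PyTorch 2.0 or later). The function `f` below is a made-up example with two inputs and three outputs. `jvp` runs forward mode and `vjp` runs reverse mode.

```python
import torch
from torch.func import jvp, vjp

def f(x):
    # Two inputs, three outputs: the Jacobian is a 3 x 2 matrix
    return torch.stack([x[0] * x[1], x[0] ** 2, 3 * x[1]])

x = torch.tensor([2.0, 3.0])

# Forward mode: one pass per input yields one column of the Jacobian
_, firstColumn = jvp(f, (x,), (torch.tensor([1.0, 0.0]),))

# Reverse mode: one pass per output yields one row of the Jacobian
_, vjpFn = vjp(f, x)
(firstRow,) = vjpFn(torch.tensor([1.0, 0.0, 0.0]))

print(firstColumn)  # derivatives of all outputs with respect to x[0]
print(firstRow)     # gradient of the first output
```

Here the full Jacobian needs two forward passes but three reverse passes, which is why forward mode is the cheaper option when outputs outnumber inputs.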

**Note:** Since a neural network usually returns just a single output (the scalar loss), reverse mode is the default choice.

### 3. Function Transforms

Similar to JAX (which arguably has a better autograd than both PyTorch and TensorFlow), PyTorch enables us to transform functions (like calculating Hessian as a Jacobian of a Jacobian) and should be preferred here over TensorFlow.
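As a small sketch of such a transform, the Hessian can be written as a Jacobian of a Jacobian with `torch.func` (assuming PyTorch 2.0 or later). The function `cubicSum` is invented for this example. Its Hessian is a diagonal matrix with 6*x* on the diagonal.

```python
import torch
from torch.func import hessian, jacfwd, jacrev

def cubicSum(x):
    return (x ** 3).sum()

x = torch.tensor([1.0, 2.0, 3.0])

# Hessian as the forward-mode Jacobian of the reverse-mode Jacobian
hessianComposed = jacfwd(jacrev(cubicSum))(x)

# The same result through the convenience wrapper
hessianDirect = hessian(cubicSum)(x)

print(hessianComposed)
print(torch.allclose(hessianComposed, hessianDirect))
```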

## Functional programming support

Both TensorFlow and PyTorch have no functional programming support *per se,* but the growing popularity of JAX’s functional programming model has inspired PyTorch to add `torch.func` in its version 2.0.
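To give a flavor of this style, the sketch below composes two `torch.func` transforms, `grad` and `vmap`. The function `scalarFunc` and the other names are hypothetical. The point is that transforms take functions and return new functions, much like in JAX.

```python
import torch
from torch.func import grad, vmap

def scalarFunc(x):
    # Defined for a single scalar input: x * sin(x)
    return torch.sin(x) * x

# grad returns a new function that computes the derivative
derivative = grad(scalarFunc)

# vmap lifts that function to a whole batch without a Python loop
batchedDerivative = vmap(derivative)

xs = torch.linspace(0.0, 1.0, 5)
print(batchedDerivative(xs))
```

The result should match the analytical derivative, sin(*x*) + *x* cos(*x*), evaluated at each point of `xs`.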

## Random number generation

PyTorch uses PCG, while TensorFlow is based on Threefry. PyTorch also provides us with CSPRNG as an extension.
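Whatever algorithm sits underneath, both libraries let us control the random state through explicit generator objects. The following sketch (with made-up variable names and an arbitrary seed of 42) shows that two generators created from the same seed produce the same numbers in each library.

```python
import torch
import tensorflow as tf

# PyTorch: an explicit generator object carries the random state
pyGenA = torch.Generator().manual_seed(42)
pyGenB = torch.Generator().manual_seed(42)
pyRandomA = torch.rand(3, generator=pyGenA)
pyRandomB = torch.rand(3, generator=pyGenB)

# TensorFlow: tf.random.Generator plays the same role
tfGenA = tf.random.Generator.from_seed(42)
tfGenB = tf.random.Generator.from_seed(42)
tfRandomA = tfGenA.uniform(shape=[3])
tfRandomB = tfGenB.uniform(shape=[3])

print(torch.equal(pyRandomA, pyRandomB))
print(bool(tf.reduce_all(tfRandomA == tfRandomB)))
```

Note that the same seed doesn't give the same numbers across the two libraries, only within each one.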

## Lazy vs. eager execution

Since a deep model is a graph with a number of connected nodes, there can be two ways of executing it:

- **Static**
- **Dynamic**

In static, also known as lazy execution, we build a graph and don’t get the results until it's completely built. TensorFlow used to use lazy execution. This can be annoying and time-consuming as we may need to check the intermediate results.

PyTorch, on the other hand, uses both lazy and eager execution. It uses eager execution as the default mode, enabling us to check the intermediate outputs. Inevitably, it's quite helpful for researchers and even ML engineers as model development undergoes a number of testing phases.

Inspired by PyTorch, TensorFlow introduced eager mode in 2017/18. From TensorFlow 2.0, it uses eager mode by default as well. So, if you're using TensorFlow < 2.0, then you probably need to set eager mode explicitly. Otherwise, it's quite similar in the current versions of both libraries.
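In TensorFlow 2.x, graph execution is still available on demand through `tf.function`. The sketch below is only an illustration: `scaledSum` and `traceLog` are hypothetical names. It relies on the fact that Python side effects inside a `tf.function` run only while the function is being traced into a graph.

```python
import tensorflow as tf

traceLog = []

@tf.function
def scaledSum(x):
    traceLog.append("traced")  # Python side effects only run while tracing
    return tf.reduce_sum(x) * 2.0

# Eager: every operation runs immediately and can be inspected
eagerResult = tf.reduce_sum(tf.constant([1.0, 2.0])) * 2.0

# Graph: the first call traces the function, later calls reuse the graph
graphResultA = scaledSum(tf.constant([1.0, 2.0]))
graphResultB = scaledSum(tf.constant([3.0, 4.0]))

print(eagerResult.numpy(), graphResultA.numpy(), graphResultB.numpy())
print(len(traceLog))
```

Both calls use inputs of the same shape and dtype, so the function is traced once and the second call runs the already-built graph.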

On TensorFlow 1.x, we can enable eager mode explicitly as:

```
import tensorflow.contrib.eager as tfe
tfe.enable_eager_execution()
```

**Note:** The code snippet above won’t work for TensorFlow 2.0+ as it's already enabled by default.

## Multiple language support

While both TensorFlow and PyTorch are primarily Python-based, they have interfaces for other languages, too.

PyTorch has a pretty strong C++ interface. It can be used by including the `<ATen/ATen.h>` header.

There can be cases where our own or some 3rd party’s implementation is better than PyTorch’s C++ API. If so, you can use PyTorch's C++ extensions.

TensorFlow, on the other hand, is quite established in its C++, Java, and JavaScript implementations.

**Suggestion:** For C++, it's more or less a tie with a slight edge in PyTorch’s favor, but TensorFlow is a clear choice for Java development.

## Other platforms

TensorFlow is a clear winner here as it has an established presence in the mobile developer community through TF Lite. PyTorch has its own mobile version as well, but it's still in beta and has yet to catch the attention of developers.

Many of you may be familiar with the Neural Network playground by TensorFlow and might have wondered 'How come it runs in the browser?' Yes! There's a JavaScript version for TensorFlow as well. PyTorch, on the other hand, is yet to make its mark on the web. There's ONNX, but it's harder to understand.

So TensorFlow is a clear winner here. If you're a web or mobile developer, TensorFlow should be your go-to library.

**Note:** Please don’t confuse TorchScript with JavaScript. They're unrelated.

## PyTorch vs. TensorFlow: looking ahead to Keras 3.0

Both TensorFlow and PyTorch are phenomenal in the DL community. Both have their own style, and each has an edge in different features. Luckily, Keras Core has added support for both libraries and will be available as **Keras 3.0** this fall. So keep your fingers crossed that Keras will bridge the gap and give developers the best of both without needing to switch. I'll be looking at Keras in detail in my next post.

If you need data for your models, you might be interested in web scraping with Python, or other methods of collecting data for AI.