PyTorch Tutorial, Part 1: Installation and the Basics
PyTorch is one of the most popular deep-learning frameworks, used by many researchers in the fields of machine learning and deep learning. I think anybody in this field should know this framework and use it in their implementations.
1. Intro
PyTorch is a fully featured framework for building deep learning models, which is a type of machine learning that’s commonly used in applications like image recognition and language processing. Written in Python, it’s relatively easy for most machine learning developers to learn and use. PyTorch is distinctive for its excellent support for GPUs and its use of reverse-mode auto-differentiation, which enables computation graphs to be modified on the fly. This makes it a popular choice for fast experimentation and prototyping.
PyTorch is a machine learning framework based on the Torch library, used for applications such as computer vision and natural language processing, originally developed by Meta AI and now part of the Linux Foundation umbrella. It is free and open-source software released under the modified BSD license.
PyTorch is the work of developers at Facebook AI Research and several other labs. The framework combines the efficient and flexible GPU-accelerated backend libraries from Torch with an intuitive Python frontend that focuses on rapid prototyping, readable code, and support for the widest possible variety of deep learning models. PyTorch lets developers use the familiar imperative programming approach, but still output to graphs. It was released as open source in 2017, and its Python roots have made it a favorite with machine learning developers.
Significantly, PyTorch adopted a Chainer innovation called reverse-mode automatic differentiation. Essentially, it’s like a tape recorder that records completed operations and then replays backward to compute gradients. This makes PyTorch relatively simple to debug and well-adapted to certain applications such as dynamic neural networks. It’s popular for prototyping because every iteration can be different.
PyTorch is especially popular with Python developers because it’s written in Python and uses that language’s imperative, define-by-run eager execution mode, in which operations are executed as they are called from Python. As the popularity of the Python programming language persists, a survey identified a growing focus on AI and machine learning tasks and, with them, greater adoption of PyTorch. This makes PyTorch a good choice for Python developers who are new to deep learning, and a growing library of deep learning courses are based on PyTorch. The API has remained consistent since early releases, meaning that the code is relatively easy for experienced Python developers to understand.
PyTorch’s particular strength is in rapid prototyping and smaller projects. Its ease of use and flexibility also makes it a favorite for academic and research communities.
Facebook developers have been working hard to improve PyTorch’s production capabilities. Recent releases have provided enhancements like support for Google’s TensorBoard visualization tool and just-in-time compilation. PyTorch has also expanded its support for ONNX (Open Neural Network Exchange), which enables developers to mix and match the deep learning frameworks or runtimes that work best for their applications.
2. Installing PyTorch
You can follow this tutorial using an online platform such as Google Colab or Kaggle, which gives you a Python environment via a Jupyter notebook and a proper GPU to meet your needs during the learning process, and even for small projects and homework. If you prefer these platforms, you can skip this section; but if you want to use PyTorch on your local machine with your own GPU, here are the installation steps you should follow.
2.1 Installation steps
You can follow the steps on the PyTorch official website, pytorch.org, to install it locally, or stay with me.
If you haven’t installed Anaconda on your machine, download and install it, then create a conda environment:
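For example (the environment name and Python version here are just placeholders; pick whatever suits you):

```bash
# Create a new conda environment named "pytorch-env"
conda create -n pytorch-env python=3.10
```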
After creating the environment, activate it using:
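With the example name from above:

```bash
conda activate pytorch-env
```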
Then use pip to install PyTorch:
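The exact command depends on your operating system and CUDA version, so it’s best to copy the command generated by the selector on pytorch.org. For a CUDA build it looks something like this (the CUDA version in the index URL is just an example):

```bash
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
```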
It will take some time, but it will install PyTorch and all the GPU requirements on your machine.
To test whether the GPU is supported, open a Python file and run the code below:
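Something like:

```python
import torch

# Prints True if PyTorch can see a CUDA-capable GPU
print(torch.cuda.is_available())

if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # name of the first GPU
```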
Well done, you have installed PyTorch on your computer and are ready to go through this tutorial.
3. Tensor basics
The most basic class in the PyTorch library is the tensor class; almost every variable and operation in PyTorch is represented by a tensor. You can look at a tensor just like a NumPy array or a multi-dimensional Python list. Because machine learning computations are, mathematically, linear-algebra operations, we need such a class to implement and run the calculations in Python.
Tensors can be used on the CPU or the GPU. Using the GPU makes the calculations much faster. To move a tensor to the GPU, you use the `tensor.to('cuda')` or `tensor.to(device)` function.
Creating tensors:
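A few common ways to create tensors, for example:

```python
import torch

x = torch.empty(2, 3)              # uninitialized 2x3 tensor
z = torch.zeros(2, 3)              # all zeros
o = torch.ones(2, 3)               # all ones
r = torch.rand(2, 3)               # uniform random values in [0, 1)
t = torch.tensor([1.0, 2.0, 3.0])  # built from a Python list
print(t.dtype, t.shape)
```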
Moving tensors to GPU:
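For example:

```python
import torch

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

x = torch.ones(2, 3)
x = x.to(device)                     # move an existing tensor to the GPU (if available)
y = torch.rand(2, 3, device=device)  # or create the tensor on the device directly
z = x + y                            # both operands live on the same device
print(z.device)
```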
You can reshape a tensor using the `view` function. This function is very similar to the `reshape` function in NumPy.
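For example:

```python
import torch

x = torch.rand(4, 4)
y = x.view(16)     # flatten into a 1-D tensor with 16 elements
z = x.view(-1, 8)  # -1 lets PyTorch infer the remaining dimension (here: 2)
print(y.shape, z.shape)
```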
3.1 Operations and gradient calculation
In PyTorch, every calculation is represented by a computation graph. For example, the expression y = x + 2 builds the graph below:
Figure-1: Computational graph for Y = X + 2
This is done for the sake of gradient calculation. The gradients are required for optimizing the model weights. These computation graphs are used for computing the gradients based on the chain rule and Jacobian products. The gradient calculation is done automatically using the `backward` function. If you want to compute gradients for a tensor, you have to set its `requires_grad` parameter to true when defining the tensor.
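A minimal sketch of these ideas:

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = x + 2           # builds the computation graph for y = x + 2
z = (y * y).mean()  # a scalar output of the graph

z.backward()        # computes dz/dx via the chain rule
print(x.grad)       # the gradients are stored in x.grad
```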
4. Linear regression
Learning by doing a real project is a perfect way to gain skills, especially in programming. To understand the basics of using the framework, it’s recommended to implement a simple mini project step by step from scratch. We choose linear regression as the training example and will first implement it from scratch, without using PyTorch. Then we will convert the code to use PyTorch and its more advanced functions.
4.1 Problem statement
Simple linear regression is used to estimate the relationship between two quantitative variables. You can use simple linear regression when you want to know:
- How strong the relationship is between two variables (e.g., the relationship between rainfall and soil erosion).
- The value of the dependent variable at a certain value of the independent variable (e.g., the amount of soil erosion at a certain level of rainfall).
Regression models describe the relationship between variables by fitting a line to the observed data. Linear regression models use a straight line, while logistic and nonlinear regression models use a curved line. Regression allows you to estimate how a dependent variable changes as the independent variable(s) change.
The formula for a simple linear regression is:
$$y = \beta_0 + \beta_1 x + \epsilon$$

- y is the predicted value of the dependent variable (y) for any given value of the independent variable (x).
- β0 is the intercept, the predicted value of y when x is 0.
- β1 is the regression coefficient – how much we expect y to change as x increases.
- x is the independent variable (the variable we expect is influencing y).
- ϵ is the error of the estimate, or how much variation there is in our estimate of the regression coefficient.
Linear regression finds the line of best fit through your data by searching for the regression coefficient (β1) that minimizes the total error (ϵ) of the model.
The loss function (or error function) in linear regression is the Mean Squared Error, or MSE:
$$L = \frac{1}{N}\sum_{i=1}^{N}(\hat{y}_i - y_i)^2$$

To minimize the error, we have to update the regression coefficients by computing the gradient of the loss with respect to the model weights, and then update the coefficients as below:
$$w = w - \alpha \cdot \frac{dJ}{dw}$$

where:
$$\frac{dJ}{dw} = \frac{1}{N} \cdot 2x \cdot (\hat{y} - y)$$

For this example we define a simple training set, which is a set of 2D points (x, y) such that y = 2x.
| x | y  |
|---|----|
| 1 | 2  |
| 2 | 4  |
| 3 | 6  |
| 4 | 8  |
| 5 | 10 |
| 6 | 12 |
We pick x=6 as the test point and the rest as training data. Here is the implementation:
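A minimal sketch of the setup (the variable names here are my own choices):

```python
import numpy as np

# Training points (x, y) with y = 2 * x
X = np.array([1, 2, 3, 4, 5], dtype=np.float32)
Y = np.array([2, 4, 6, 8, 10], dtype=np.float32)

x_test = 6.0  # held-out test point; the expected prediction is y = 12
```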
The network will have a single node with a single parameter w, which is randomly initialized.
In PyTorch, calculating the layer outputs is done by calling the forward function, which represents the forward pass of the network. Following that convention, we will call our model’s output function the forward function.
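A sketch of that forward function for our single-parameter model:

```python
# The single model weight, randomly initialized
w = np.random.rand()

# Forward pass: the model output is simply w * x
def forward(x):
    return w * x
```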
Then we have to define the loss function of the network, which is the MSE loss:
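For instance:

```python
# MSE loss: mean of the squared differences between predictions and targets
def loss(y_pred, y):
    return ((y_pred - y) ** 2).mean()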
We also need a function to calculate the gradient with respect to the network coefficients, which in PyTorch is called the backward function.
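Following the dJ/dw formula derived above, a sketch:

```python
# Gradient of the MSE loss with respect to w: dJ/dw = mean(2 * x * (y_pred - y))
def gradient(x, y, y_pred):
    return (2 * x * (y_pred - y)).mean()
```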
Finally, here is the training loop:
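A minimal version (the learning rate is my own choice; 100 iterations matches the convergence reported below):

```python
learning_rate = 0.01
n_iters = 100

for epoch in range(n_iters):
    y_pred = forward(X)          # forward pass
    l = loss(y_pred, Y)          # compute the loss
    dw = gradient(X, Y, y_pred)  # compute dJ/dw
    w -= learning_rate * dw      # update the weight

    if epoch % 10 == 0:
        print(f'epoch {epoch + 1}: w = {w:.3f}, loss = {l:.6f}')

print(f'prediction for x = {x_test}: {forward(x_test):.3f}')
```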
As you can see, the model converges after 100 iterations and successfully predicts the expected value for x = 6, which is y = 12.
4.2 Including PyTorch
We implemented a simple linear regression model from scratch using only NumPy. Now it’s time to bring PyTorch into our code. First, we have to turn every variable (x, y, and w) into a tensor instead of a NumPy array.
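A sketch of the same variables as tensors (the weight is initialized to zero here for simplicity):

```python
import torch

X = torch.tensor([1, 2, 3, 4, 5], dtype=torch.float32)
Y = torch.tensor([2, 4, 6, 8, 10], dtype=torch.float32)

# w must track gradients so that autograd can compute dJ/dw for us
w = torch.tensor(0.0, dtype=torch.float32, requires_grad=True)
```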
In the above code, when defining the weight parameter, we told PyTorch to track gradient calculations for this tensor by setting the `requires_grad` parameter to `True`. If we don’t set this parameter to `True`, calling the backward function will throw an exception, because the gradients are not stored in the computation graph. So, be careful when defining a tensor for which gradients need to be calculated.
Next, we have to define the forward and loss functions for the model.
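These are the same functions as in the NumPy version, now operating on tensors; something like:

```python
def forward(x):
    return w * x

def loss(y_pred, y):
    return ((y_pred - y) ** 2).mean()
```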
As we said in the previous section, gradients can be calculated by calling the backward function, so there is no need to define our own backward function. Each call to the backward function accumulates the calculated gradients in the tensor’s `grad` attribute, and they stay there until you clear them by calling the `tensor.grad.zero_()` function. So, be careful when calling the backward function and make sure you zero out the gradients after each update (you can see this in the code below).
Here is the training loop:
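A sketch, with the gradient zeroing described above:

```python
learning_rate = 0.01
n_iters = 100

for epoch in range(n_iters):
    y_pred = forward(X)
    l = loss(y_pred, Y)

    l.backward()              # autograd computes dJ/dw and stores it in w.grad

    with torch.no_grad():     # the weight update itself must not be tracked
        w -= learning_rate * w.grad

    w.grad.zero_()            # reset the accumulated gradient

    if epoch % 10 == 0:
        print(f'epoch {epoch + 1}: w = {w.item():.3f}, loss = {l.item():.6f}')

print(f'prediction for x = 6: {forward(torch.tensor(6.0)).item():.3f}')
```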
4.3 Using more of PyTorch
Now, let’s use the built-in PyTorch optimizer and loss function, as well as the built-in forward function. The first change we should make is to remove the loss function we were using and use the built-in MSELoss instead. Then, instead of manually updating the model parameters, we can use a built-in optimizer such as Stochastic Gradient Descent (SGD) or Adam.
As we know, this model is a single linear neuron, which can be represented by `torch.nn.Linear(input_size, output_size)`. This layer has its own parameters, which means it’s no longer necessary to define the weight parameter w ourselves.
When using optimizers, calling `optimizer.step()` will automatically update the model parameters, and `optimizer.zero_grad()` will automatically zero out the accumulated gradients.
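Putting the pieces together, a sketch (nn.Linear expects 2-D input of shape (n_samples, n_features), so X and Y are reshaped into column vectors; the hyperparameters are my own choices):

```python
import torch
import torch.nn as nn

X = torch.tensor([[1], [2], [3], [4], [5]], dtype=torch.float32)
Y = torch.tensor([[2], [4], [6], [8], [10]], dtype=torch.float32)

model = nn.Linear(1, 1)                                   # a single linear neuron with its own w and b
criterion = nn.MSELoss()                                  # built-in MSE loss
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)  # built-in SGD optimizer

for epoch in range(100):
    y_pred = model(X)
    l = criterion(y_pred, Y)

    l.backward()           # compute the gradients
    optimizer.step()       # update the model parameters
    optimizer.zero_grad()  # reset the gradients for the next iteration

print(model(torch.tensor([[6.0]])).item())
```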
4.4 Turning model into a Torch module
We can define blocks of layers in PyTorch as modules. To do so, we have to define a class that inherits from the base `torch.nn.Module` class and implements the forward function for that module.
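For our linear regression, such a module could look like this (the class and argument names are my own choices):

```python
import torch.nn as nn

class LinearRegression(nn.Module):
    def __init__(self, input_size, output_size):
        super().__init__()
        # The module owns its layers; here, a single fully connected layer
        self.linear = nn.Linear(input_size, output_size)

    def forward(self, x):
        return self.linear(x)
```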
Now we can instantiate and use this model instead of defining a single fully connected layer as our model.
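For example:

```python
model = LinearRegression(1, 1)  # a drop-in replacement for nn.Linear(1, 1)
y_pred = model(X)               # calling the module runs its forward function
```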
4.5 More realistic example
Now let’s use more realistic data and plot the results with Matplotlib.
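A sketch using a synthetic dataset from scikit-learn (the dataset parameters and hyperparameters are my own choices):

```python
import torch
import torch.nn as nn
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets

# Generate a noisy 1-D regression dataset
X_np, y_np = datasets.make_regression(n_samples=100, n_features=1, noise=20, random_state=1)
X = torch.from_numpy(X_np.astype(np.float32))
y = torch.from_numpy(y_np.astype(np.float32)).view(-1, 1)

model = nn.Linear(1, 1)
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

for epoch in range(200):
    loss = criterion(model(X), y)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

# Plot the data points and the fitted line
predicted = model(X).detach().numpy()
plt.plot(X_np, y_np, 'ro', label='data')
plt.plot(X_np, predicted, 'b', label='fitted line')
plt.legend()
plt.show()
```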
Figure-2: Regression results
5. Logistic regression
Here is a classification example using the breast cancer dataset from the scikit-learn library. To recap the problem statement: we want to classify patients into two classes, having or not having breast cancer, using a single neuron as in the previous examples.
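A sketch of such a model (the preprocessing steps, class names, and hyperparameters are my own choices):

```python
import torch
import torch.nn as nn
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Load and split the breast cancer dataset
bc = datasets.load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(bc.data, bc.target, test_size=0.2, random_state=1)

sc = StandardScaler()  # scale features to zero mean, unit variance
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

X_train = torch.from_numpy(X_train.astype(np.float32))
X_test = torch.from_numpy(X_test.astype(np.float32))
y_train = torch.from_numpy(y_train.astype(np.float32)).view(-1, 1)
y_test = torch.from_numpy(y_test.astype(np.float32)).view(-1, 1)

# A single neuron followed by a sigmoid gives probabilities in (0, 1)
class LogisticRegression(nn.Module):
    def __init__(self, n_features):
        super().__init__()
        self.linear = nn.Linear(n_features, 1)

    def forward(self, x):
        return torch.sigmoid(self.linear(x))

model = LogisticRegression(X_train.shape[1])
criterion = nn.BCELoss()  # binary cross-entropy loss
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

for epoch in range(100):
    loss = criterion(model(X_train), y_train)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

# Evaluate accuracy without tracking gradients
with torch.no_grad():
    y_pred = model(X_test).round()  # threshold probabilities at 0.5
    acc = y_pred.eq(y_test).sum().item() / y_test.shape[0]
    print(f'accuracy = {acc:.4f}')
```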
As you can see in the code, when we compute the accuracy of the model, we don’t need to keep track of the gradients of the calculation. To prevent this tracking from affecting our training and calculations, we have to turn it off. This can be done by calling `torch.no_grad()` in a `with` block, or by making a detached copy of a variable directly via the `tensor.detach()` function, which returns a detached copy of that tensor; you then work with the returned value instead of the original tensor.
6. Conclusion
In this section we learned what PyTorch is, what tensors are, and how the computation graph is defined. Then, we implemented a simple linear regression from scratch using NumPy, which helped us better understand the problem and how to solve it in code. After that, we converted the calculations from NumPy to PyTorch tensors. Finally, we completed the implementation using built-in PyTorch optimizers and loss functions, with some examples.