PyTorch/Tensor

Tensor
The basic object in PyTorch is the tensor. Tensors are similar to numpy arrays, with two important additions: they can live on CUDA GPUs, and they can compute gradients.

Tensors are created and manipulated much like numpy arrays. For example, here is a large matrix multiplication in numpy:

    >>> import time
    >>> import numpy as np
    >>> a = np.random.rand(10000, 10000).astype(np.float32)
    >>> b = np.random.rand(10000, 10000).astype(np.float32)
    >>> t = time.time(); c = np.matmul(a, b); time.time() - t
    7.447854280471802

The same computation in PyTorch takes a similar time on the CPU:

    >>> import torch
    >>> a1 = torch.rand(10000, 10000, dtype=torch.float32)  # note how torch.rand supports dtype
    >>> b1 = torch.rand(10000, 10000, dtype=torch.float32)
    >>> t = time.time(); c1 = torch.matmul(a1, b1); time.time() - t
    7.758733749389648
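To compare the two libraries on identical data rather than on two different random matrices, one could time both on the same inputs and check that the results agree. The sketch below is a hedged illustration: `time_matmul` is a hypothetical helper name, it uses `time.perf_counter()` (a monotonic, high-resolution clock) instead of `time.time()`, a smaller default size so it finishes quickly, and `torch.from_numpy` to reuse the numpy data without copying. Actual timings depend on your CPU and BLAS build.

```python
import time

import numpy as np
import torch


def time_matmul(n: int = 1000) -> tuple[float, float, float]:
    """Hypothetical helper: time one n x n float32 matmul in numpy and in torch (CPU)."""
    a = np.random.rand(n, n).astype(np.float32)
    b = np.random.rand(n, n).astype(np.float32)
    a1, b1 = torch.from_numpy(a), torch.from_numpy(b)  # share memory with a and b

    t = time.perf_counter()
    c = np.matmul(a, b)
    numpy_seconds = time.perf_counter() - t

    t = time.perf_counter()
    c1 = torch.matmul(a1, b1)
    torch_seconds = time.perf_counter() - t

    # Largest absolute difference between the results (only float32 rounding differs)
    max_diff = float(np.abs(c - c1.numpy()).max())
    return numpy_seconds, torch_seconds, max_diff


if __name__ == "__main__":
    np_s, torch_s, diff = time_matmul(1000)
    print(f"numpy: {np_s:.3f} s, torch: {torch_s:.3f} s, max diff: {diff:.2e}")
```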

Functions such as np.ones, np.zeros, np.empty and so on, as well as the other main functions and arithmetic operators, are also present in torch:

    >>> torch.ones(2,2)
    tensor([[1., 1.],
            [1., 1.]])
    >>> torch.ones(2,2, dtype=torch.int32)
    tensor([[1, 1],
            [1, 1]], dtype=torch.int32)
    >>> a = torch.ones(2,2)  # or torch.ones((2,2)), which is the same
    >>> b = a + 1
    >>> c = a * b
    >>> c.reshape(1,4)  # or c.view(1,4), equivalent here because c is contiguous
    tensor([[2., 2., 2., 2.]])
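As in numpy, `*` on tensors is element-wise, while the matrix product uses `@` (or `torch.matmul`), and shapes are broadcast the same way numpy broadcasts them. A short sketch of the difference, using only standard torch functions (`torch.arange` mirrors `np.arange`):

```python
import torch

a = torch.ones(2, 2)
b = a + 1                           # the scalar is broadcast to every element: all 2s
elementwise = a * b                 # element-by-element product, like numpy's *
matrix = a @ b                      # matrix product, same as torch.matmul(a, b)

row = torch.arange(3.0)             # tensor([0., 1., 2.]), like np.arange
col = torch.arange(2.0).reshape(2, 1)
grid = row + col                    # broadcasting (2, 1) with (3,) gives shape (2, 3)

print(elementwise)                  # every element is 1 * 2 = 2
print(matrix)                       # every element is 1*2 + 1*2 = 4
print(grid)                         # rows [0., 1., 2.] and [1., 2., 3.]
```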

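For tensors, size() is a method that returns a torch.Size object, whereas in numpy shape is an attribute holding a plain tuple (tensors also have a shape attribute, which returns the same torch.Size). This is convenient, because torch.Size is a subclass of tuple and defines some additional methods: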

    >>> a = torch.ones(2,3,4)
    >>> a.size()
    torch.Size([2, 3, 4])
    >>> a.size().numel()
    24
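Because torch.Size is a tuple, it can be unpacked, compared and sliced like one. A minimal sketch of these uses (all calls are standard torch API; `size(dim)` returns the length of one dimension):

```python
import torch

a = torch.ones(2, 3, 4)
size = a.size()

assert isinstance(size, tuple)        # torch.Size is a tuple subclass
rows, cols, depth = size              # so it can be unpacked like any tuple
print(size == (2, 3, 4))              # True: compares equal to a plain tuple
print(a.shape == size)                # True: .shape gives the same torch.Size
print(a.size(1))                      # 3: the size of a single dimension
print(size.numel())                   # 24 = 2 * 3 * 4 elements
print(torch.zeros(size[1:]).shape)    # a slice can be passed directly as a new shape
```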

Reductions such as sum() and mean() return not a number but a zero-dimensional tensor. Indexing a single element also gives a zero-dimensional tensor rather than a number:

    >>> a = torch.ones(2,2)
    >>> a.sum()
    tensor(4.)
    >>> a.sum().size()
    torch.Size([])
    >>> a.sum().dim()
    0
    >>> a[0,0]
    tensor(1.)

To convert a zero-dimensional tensor to a Python number, call its item() method explicitly:

    >>> a.sum().item()
    4.0
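Reductions can also run along a single dimension, in which case the result keeps the remaining dimensions. The sketch below shows the `dim` and `keepdim` arguments, `tolist()` for converting a tensor with several elements, and the fact that `item()` only accepts single-element tensors (the exception type caught here is an assumption kept deliberately broad):

```python
import torch

a = torch.tensor([[1.0, 2.0], [3.0, 4.0]])

total = a.sum()                              # zero-dimensional tensor: tensor(10.)
col_sums = a.sum(dim=0)                      # one value per column: tensor([4., 6.])
row_means = a.mean(dim=1, keepdim=True)      # keeps a size-1 dimension: shape (2, 1)

print(total.item())                          # 10.0, a Python float
print(col_sums.tolist())                     # [4.0, 6.0], for tensors with several elements
print(row_means.shape)                       # torch.Size([2, 1])

# item() only works on tensors with exactly one element
try:
    col_sums.item()
except (RuntimeError, ValueError) as err:
    print("item() failed:", err)
```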

Instead of numpy's astype(), torch has the to() method:

    >>> a.to(torch.int16)
    tensor([[1, 1],
            [1, 1]], dtype=torch.int16)

The name is different because to() can do more than change the element type: it can also move data to and from CUDA, and it works on a wide range of torch objects, including neural networks.
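A small sketch of these uses of to(), assuming nothing beyond the standard torch API: converting the dtype, changing device and dtype in one call, and moving a network (here an arbitrary `nn.Linear` chosen for illustration). When the tensor already has the requested dtype and device, to() returns the tensor itself without copying. The device is chosen at runtime so the sketch also runs without CUDA.

```python
import torch
import torch.nn as nn

a = torch.ones(2, 2)

b = a.to(torch.int16)            # new tensor with a different element type
same = a.to(torch.float32)       # dtype already matches: to() returns a itself, no copy
print(b.dtype, same is a)        # torch.int16 True

# Pick a device at runtime so the sketch also runs on machines without CUDA
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
c = a.to(device, torch.float64)  # change device and dtype in one call

# nn.Module.to() moves/converts all parameters of the network and returns the module
net = nn.Linear(2, 3).to(device)
print(c.device, c.dtype, next(net.parameters()).device)
```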

Tensors and numpy arrays
Since tensors and numpy arrays are so similar, it would be nice to convert between them. And indeed we can; it is a piece of cake. To convert a tensor to a numpy array, call its numpy() method. For the opposite direction, call the torch.tensor() constructor:

    >>> a = torch.ones(2,2, dtype=torch.float16)
    >>> a.numpy()
    array([[1., 1.],
           [1., 1.]], dtype=float16)
    >>> b = np.ones((2,2), dtype=np.float16)
    >>> torch.tensor(b)
    tensor([[1., 1.],
            [1., 1.]], dtype=torch.float16)
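One detail worth knowing: torch.tensor() always copies the data, while torch.from_numpy() and the numpy() method of a CPU tensor share memory with the original, so changes made through one are visible through the other. A minimal sketch demonstrating this:

```python
import numpy as np
import torch

arr = np.zeros(3, dtype=np.float32)

shared = torch.from_numpy(arr)   # shares memory with arr (no copy)
copied = torch.tensor(arr)       # always copies the data

arr[0] = 5.0                     # modify the numpy array in place
print(shared)                    # tensor([5., 0., 0.]): sees the change
print(copied)                    # tensor([0., 0., 0.]): independent copy

back = shared.numpy()            # numpy() of a CPU tensor also shares memory
shared[1] = 7.0
print(back)                      # [5. 7. 0.]
```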

CUDA
You can use PyTorch without CUDA, but CUDA typically speeds up computations by a factor of 10-20, depending on the GPU and the workload.

Before using CUDA, check whether it is available:

    >>> torch.cuda.is_available()

If it returns False, you may skip the rest of this section.

You may also check the versions of CUDA and the cuDNN library:

    >>> torch.version.cuda
    '10.0'
    >>> torch.backends.cudnn.version()
    7401
    >>> torch.backends.cudnn.enabled
    True
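A common way to use this check is to pick the device once and write the rest of the code against that device, so the same script runs with or without a GPU. A minimal sketch of this pattern (the variable names are illustrative):

```python
import torch

# Use the GPU when one is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

x = torch.randn(3, 3, device=device)   # created directly on the chosen device
y = torch.ones(3, 3).to(device)         # or created on the CPU and then moved
z = x @ y                               # both operands live on the same device

if device.type == "cuda":
    print("GPU:", torch.cuda.get_device_name(0))
    print("CUDA", torch.version.cuda, "cuDNN", torch.backends.cudnn.version())
print(z.device)
```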

Unlike numpy arrays, tensors can easily be moved to and from CUDA memory, and almost anything you can do with tensors on the CPU can also be done on the GPU. If your computer has an NVIDIA GPU and you have installed the driver (NVIDIA CUDA 10.0 or higher), you can run the following:

    import time
    import torch

    cuda = torch.device('cuda')
    a = torch.randn(10000, 10000, device=cuda)
    b = torch.randn(10000, 10000, device=cuda)
    t = time.time()
    c = torch.matmul(a, b)
    torch.cuda.synchronize()  # CUDA runs asynchronously; wait for the result before reading the clock
    print(time.time() - t)

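On my computer, this took 0.4 seconds. Multiplying two $$n\times n$$ matrices with the standard algorithm takes $$n^3 = 10^{12}$$ multiplications for $$n = 10^4$$, so this is $$2.5\times 10^{12}$$ multiplications per second, roughly 19 times faster than the CPU timings above.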
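Because CUDA kernels are launched asynchronously, timing GPU code without waiting for it can measure only the launch, and the first call also pays one-off setup costs. The sketch below is one hedged way to take such a measurement: `time_matmul` is a hypothetical helper, it does a warm-up call, synchronizes before and after each timed run, and reports the best of a few repeats. It falls back to the CPU when CUDA is unavailable, and uses a smaller matrix than the example above so it finishes quickly.

```python
import time

import torch


def time_matmul(n: int, device: torch.device, repeats: int = 3) -> float:
    """Hypothetical helper: best wall-clock time (seconds) of an n x n matmul on `device`."""
    a = torch.randn(n, n, device=device)
    b = torch.randn(n, n, device=device)
    torch.matmul(a, b)                  # warm-up: the first call pays one-off setup costs
    best = float("inf")
    for _ in range(repeats):
        if device.type == "cuda":
            torch.cuda.synchronize()    # make sure no earlier work is still running
        t = time.perf_counter()
        torch.matmul(a, b)
        if device.type == "cuda":
            torch.cuda.synchronize()    # kernels run asynchronously; wait for them to finish
        best = min(best, time.perf_counter() - t)
    return best


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
n = 1000
seconds = time_matmul(n, device)
print(f"{device}: {seconds:.4f} s, about {n**3 / seconds:.2e} multiplications per second")
```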

You can easily move tensors to and from CUDA memory with the to() method:

    >>> cuda = torch.device('cuda')
    >>> cpu = torch.device('cpu')
    >>> a = torch.ones(5,5)
    >>> b = a.to(cuda)  # move to CUDA
    >>> c = b.to(cpu)  # move back to the CPU
    >>> a.device
    device(type='cpu')
    >>> b.device
    device(type='cuda', index=0)
    >>> c.device
    device(type='cpu')

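You cannot mix CUDA and CPU tensors in your expressions:

    >>> a + b
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    RuntimeError: expected backend CPU and dtype Float but got backend CUDA and dtype Float

Newer PyTorch versions word this error differently, e.g. "Expected all tensors to be on the same device".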
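The fix is to move one operand to the other's device before combining them, for example with `.to(b.device)`. A minimal sketch, which only triggers the error when a GPU is actually present:

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

a = torch.ones(5, 5)                 # on the CPU
b = torch.ones(5, 5, device=device)  # on the GPU if there is one

# Move one operand to the other's device before combining them
s = a.to(b.device) + b
print(s.device)

if device.type == "cuda":
    try:
        a + b                        # CPU + CUDA: raises RuntimeError
    except RuntimeError as err:
        print("mixing devices failed:", err)
```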

Autograd
PyTorch's autograd module makes calculating gradients via backpropagation a piece of cake. You need to set the requires_grad parameter ("requires" with -s, "grad" without) on the input tensors, and call the backward() method on the result:

    >>> a = torch.ones(2,2, requires_grad=True)
    >>> b = torch.eye(2,2, requires_grad=True)
    >>> c = a*a*(b+1)
    >>> d = c.sum()
    >>> d.backward()  # calculate gradients
    >>> a.grad  # gradient of d with respect to a
    tensor([[4., 2.],
            [2., 4.]])
    >>> b.grad  # gradient of d with respect to b
    tensor([[1., 1.],
            [1., 1.]])
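The values above match the analytic gradients of $$d = \sum a^2 (b+1)$$, namely $$\partial d/\partial a = 2a(b+1)$$ and $$\partial d/\partial b = a^2$$. The sketch below checks this, and also shows two behaviours worth knowing: a second backward() adds to .grad rather than overwriting it, so gradients are usually reset with zero_() between computations, and code inside torch.no_grad() is not tracked by autograd. The helper name `total` is purely illustrative.

```python
import torch

a = torch.ones(2, 2, requires_grad=True)
b = torch.eye(2, 2, requires_grad=True)


def total(x, y):
    """Illustrative helper: d = sum(x * x * (y + 1)), the function from the example above."""
    return (x * x * (y + 1)).sum()


total(a, b).backward()

# Analytic gradients: dd/da = 2a(b + 1) and dd/db = a^2
with torch.no_grad():                 # no computation graph is recorded in this block
    expected_a = 2 * a * (b + 1)
    expected_b = a * a
print(torch.allclose(a.grad, expected_a), torch.allclose(b.grad, expected_b))  # True True

# A second backward() adds to .grad instead of overwriting it
total(a, b).backward()
accumulated = a.grad.clone()          # values [[8., 4.], [4., 8.]]

# Reset the gradients before the next independent computation
a.grad.zero_()
b.grad.zero_()
```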