Your graphics card does not support CUDA 9.0.
Since I’ve seen a lot of questions that refer to issues like this, I’m writing a broad answer on how to check whether your system is compatible with CUDA, specifically targeted at using PyTorch with CUDA support. Various circumstance-dependent options for resolving issues are described in the last section of this answer.
The system requirements to use PyTorch with CUDA are as follows:
- Your graphics card must support the required version of CUDA
- Your graphics card driver must support the required version of CUDA
- The PyTorch binaries must be built with support for the compute capability of your graphics card
Note: If you install pre-built binaries (using either pip or conda) then you do not need to install the CUDA toolkit or runtime on your system before installing PyTorch with CUDA support. This is because PyTorch, unless compiled from source, is always delivered with a copy of the CUDA library.
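A quick way to see what your installed binaries bundle (a minimal sketch, assuming PyTorch is already installed; torch.version.cuda prints None for CPU-only builds):

import torch

# CUDA runtime version the PyTorch binaries were built against (None for CPU-only builds)
print(torch.version.cuda)
# True only if the driver and GPU can actually be used
print(torch.cuda.is_available())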
1. How to check if your GPU/graphics card supports a particular CUDA version
First, identify the model of your graphics card.
Before moving forward ensure that you’ve got an NVIDIA graphics card. AMD and Intel graphics cards do not support CUDA.
NVIDIA doesn’t do a great job of providing CUDA compatibility information in a single location. The best resource is probably this section on the CUDA Wikipedia page. To determine which versions of CUDA are supported:
- Locate your graphics card model in the big table and take note of the compute capability version. For example, the GeForce 820M compute capability is 2.1.
- In the bullet list preceding the table, check to see if the required CUDA version is supported by the compute capability of your graphics card. For example, CUDA 9.2 is not supported for compute capability 2.1.
If your card doesn’t support the required CUDA version then see the options in section 4 of this answer.
Note: Compute capability refers to the computational features supported by your graphics card. Newer versions of the CUDA library rely on newer hardware features, which is why we need to determine the compute capability in order to determine the supported versions of CUDA.
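If you already have some PyTorch build installed, you can also query the compute capability programmatically instead of looking it up in the table. This is a hedged sketch; it assumes a working NVIDIA driver and a CUDA-enabled PyTorch build:

import torch

if torch.cuda.is_available():
    # (major, minor) compute capability of GPU 0
    major, minor = torch.cuda.get_device_capability(0)
    print(f"{torch.cuda.get_device_name(0)}: compute capability {major}.{minor}")
else:
    print("PyTorch cannot see a usable CUDA device")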
2. How to check if your GPU/graphics driver supports a particular CUDA version
The graphics driver is the software that allows your operating system to communicate with your graphics card. Since CUDA relies on low-level communication with the graphics card, you need an up-to-date driver in order to use the latest versions of CUDA.
First, make sure you have an NVIDIA graphics driver installed on your system. You can acquire the newest driver for your system from NVIDIA’s website.
If you’ve installed the latest driver version then your graphics driver probably supports every CUDA version compatible with your graphics card (see section 1). To verify, you can check Table 3 in the CUDA release notes. In rare cases I’ve heard of the latest recommended graphics drivers not supporting the latest CUDA releases. You should be able to get around this by installing the CUDA toolkit for the required CUDA version and selecting the option to install compatible drivers, though this usually isn’t required.
If you can’t, or don’t want to, upgrade the graphics driver, then you can check whether your current driver supports a specific CUDA version as follows:
On Windows
- Determine your current graphics driver version (Source https://www.nvidia.com/en-gb/drivers/drivers-faq/)
Right-click on your desktop and select NVIDIA Control Panel. From the NVIDIA Control Panel menu, select Help > System Information. The driver version is listed at the top of the Details window. For more advanced users, you can also get the driver version number from the Windows Device Manager. Right-click on your graphics device under display adapters and then select Properties. Select the Driver tab and read the Driver version. The last 5 digits are the NVIDIA driver version number.
- Visit the CUDA release notes and scroll down to Table 3. Use this table to verify your graphics driver is new enough to support the required version of CUDA.
On Linux/OS X
Run the following command in a terminal window
nvidia-smi
This should result in something like the following
Sat Apr 4 15:31:57 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 435.21 Driver Version: 435.21 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce RTX 206... Off | 00000000:01:00.0 On | N/A |
| 0% 35C P8 16W / 175W | 502MiB / 7974MiB | 1% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 1138 G /usr/lib/xorg/Xorg 300MiB |
| 0 2550 G /usr/bin/compiz 189MiB |
| 0 5735 G /usr/lib/firefox/firefox 5MiB |
| 0 7073 G /usr/lib/firefox/firefox 5MiB |
+-----------------------------------------------------------------------------+
Driver Version: ###.## is your graphics driver version. In the example above the driver version is 435.21.
CUDA Version: ##.# is the latest version of CUDA supported by your graphics driver. In the example above the graphics driver supports CUDA 10.1 as well as all compatible CUDA versions before 10.1.
Note: The CUDA Version displayed in this output does not indicate that the CUDA toolkit or runtime is actually installed on your system. It just indicates the latest version of CUDA your graphics driver is compatible with.
To be extra sure that your driver supports the desired CUDA version you can visit Table 3 on the CUDA release notes page.
3. How to check if a particular version of PyTorch is compatible with your GPU/graphics card compute capability
Even if your graphics card supports the required version of CUDA, it’s possible that the pre-compiled PyTorch binaries were not compiled with support for your compute capability. For example, in PyTorch 0.3.1 support for compute capability <= 5.0 was dropped.
First, verify that your graphics card and driver both support the required CUDA version (see Sections 1 and 2 above), the information in this section assumes that this is the case.
The easiest way to check if PyTorch supports your compute capability is to install the desired version of PyTorch with CUDA support and run the following from a python interpreter
>>> import torch
>>> torch.zeros(1).cuda()
If you get an error message that reads
Found GPU0 XXXXX which is of cuda capability #.#.
PyTorch no longer supports this GPU because it is too old.
then that means PyTorch was not compiled with support for your compute capability. If this runs without issue then you should be good to go.
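As a complementary check that doesn’t rely on triggering the error above, recent PyTorch releases (roughly 1.7 and later) expose the list of compute architectures the binary was built for. Treat the following as a hedged sketch, since torch.cuda.get_arch_list() is not available in older releases:

import torch

# Compute capability of your GPU, e.g. (7, 5) -> sm_75
major, minor = torch.cuda.get_device_capability(0)
print(f"GPU architecture: sm_{major}{minor}")
# Architectures compiled into this PyTorch binary, e.g. ['sm_37', 'sm_50', ...]
print(torch.cuda.get_arch_list())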
Update: If you’re installing an old version of PyTorch on a system with a newer GPU, it’s possible that the old PyTorch release wasn’t compiled with support for your compute capability. Assuming your GPU supports the version of CUDA used by PyTorch, you should be able to rebuild PyTorch from source with the desired CUDA version, or upgrade to a more recent version of PyTorch that was compiled with support for the newer compute capabilities.
4. Conclusion
If your graphics card and driver support the required version of CUDA (section 1 and 2) but the PyTorch binaries don’t support your compute capability (section 3) then your options are
- Compile PyTorch from source with support for your compute capability (see here)
- Install PyTorch without CUDA support (CPU-only)
- Install an older version of the PyTorch binaries that support your compute capability (not recommended as PyTorch 0.3.1 is very outdated at this point). AFAIK compute capability older than 3.X has never been supported in the pre-built binaries
- Upgrade your graphics card
If your graphics card doesn’t support the required version of CUDA (section 1) then your options are
- Install PyTorch without CUDA support (CPU-only)
- Install an older version of PyTorch that supports a CUDA version supported by your graphics card (still may require compiling from source if the binaries don’t support your compute capability)
- Upgrade your graphics card
@SimplyLucKey Please run
python -m torch.utils.collect_env
and post its output here.
Oops, sorry. I think I had uninstalled PyTorch, so I reinstalled it and it worked this time.
However, I have been running into a memory allocation error even though the failed allocations are only a few MiB in size (I have 8 GB of RAM).
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<timed exec> in <module>
<ipython-input-19-334bcbb678d7> in train_epoch(model, data_loader, loss_fn, optimizer, device, scheduler, n_examples)
10 targets = i['targets'].to(device)
11
---> 12 outputs = model(input_ids=input_ids, attention_mask=attention_mask)
13 _, preds = torch.max(outputs, dim=1) # process with the highest probability
14 loss = loss_fn(outputs, targets)
~\anaconda3\lib\site-packages\torch\nn\modules\module.py in _call_impl(self, *input, **kwargs)
725 result = self._slow_forward(*input, **kwargs)
726 else:
--> 727 result = self.forward(*input, **kwargs)
728 for hook in itertools.chain(
729 _global_forward_hooks.values(),
<ipython-input-16-80e63a2794b9> in forward(self, input_ids, attention_mask)
7
8 def forward(self, input_ids, attention_mask):
----> 9 _, pooled_output = self.bert(input_ids=input_ids, attention_mask=attention_mask, return_dict=False)
10 output = self.drop(pooled_output)
11 return self.out(output)
~\anaconda3\lib\site-packages\torch\nn\modules\module.py in _call_impl(self, *input, **kwargs)
725 result = self._slow_forward(*input, **kwargs)
726 else:
--> 727 result = self.forward(*input, **kwargs)
728 for hook in itertools.chain(
729 _global_forward_hooks.values(),
~\anaconda3\lib\site-packages\transformers\models\bert\modeling_bert.py in forward(self, input_ids, attention_mask, token_type_ids, position_ids, head_mask, inputs_embeds, encoder_hidden_states, encoder_attention_mask, past_key_values, use_cache, output_attentions, output_hidden_states, return_dict)
956 past_key_values_length=past_key_values_length,
957 )
--> 958 encoder_outputs = self.encoder(
959 embedding_output,
960 attention_mask=extended_attention_mask,
~\anaconda3\lib\site-packages\torch\nn\modules\module.py in _call_impl(self, *input, **kwargs)
725 result = self._slow_forward(*input, **kwargs)
726 else:
--> 727 result = self.forward(*input, **kwargs)
728 for hook in itertools.chain(
729 _global_forward_hooks.values(),
~\anaconda3\lib\site-packages\transformers\models\bert\modeling_bert.py in forward(self, hidden_states, attention_mask, head_mask, encoder_hidden_states, encoder_attention_mask, past_key_values, use_cache, output_attentions, output_hidden_states, return_dict)
557 )
558 else:
--> 559 layer_outputs = layer_module(
560 hidden_states,
561 attention_mask,
~\anaconda3\lib\site-packages\torch\nn\modules\module.py in _call_impl(self, *input, **kwargs)
725 result = self._slow_forward(*input, **kwargs)
726 else:
--> 727 result = self.forward(*input, **kwargs)
728 for hook in itertools.chain(
729 _global_forward_hooks.values(),
~\anaconda3\lib\site-packages\transformers\models\bert\modeling_bert.py in forward(self, hidden_states, attention_mask, head_mask, encoder_hidden_states, encoder_attention_mask, past_key_value, output_attentions)
493 present_key_value = present_key_value + cross_attn_present_key_value
494
--> 495 layer_output = apply_chunking_to_forward(
496 self.feed_forward_chunk, self.chunk_size_feed_forward, self.seq_len_dim, attention_output
497 )
~\anaconda3\lib\site-packages\transformers\modeling_utils.py in apply_chunking_to_forward(forward_fn, chunk_size, chunk_dim, *input_tensors)
1785 return torch.cat(output_chunks, dim=chunk_dim)
1786
-> 1787 return forward_fn(*input_tensors)
~\anaconda3\lib\site-packages\transformers\models\bert\modeling_bert.py in feed_forward_chunk(self, attention_output)
505
506 def feed_forward_chunk(self, attention_output):
--> 507 intermediate_output = self.intermediate(attention_output)
508 layer_output = self.output(intermediate_output, attention_output)
509 return layer_output
~\anaconda3\lib\site-packages\torch\nn\modules\module.py in _call_impl(self, *input, **kwargs)
725 result = self._slow_forward(*input, **kwargs)
726 else:
--> 727 result = self.forward(*input, **kwargs)
728 for hook in itertools.chain(
729 _global_forward_hooks.values(),
~\anaconda3\lib\site-packages\transformers\models\bert\modeling_bert.py in forward(self, hidden_states)
408
409 def forward(self, hidden_states):
--> 410 hidden_states = self.dense(hidden_states)
411 hidden_states = self.intermediate_act_fn(hidden_states)
412 return hidden_states
~\anaconda3\lib\site-packages\torch\nn\modules\module.py in _call_impl(self, *input, **kwargs)
725 result = self._slow_forward(*input, **kwargs)
726 else:
--> 727 result = self.forward(*input, **kwargs)
728 for hook in itertools.chain(
729 _global_forward_hooks.values(),
~\anaconda3\lib\site-packages\torch\nn\modules\linear.py in forward(self, input)
91
92 def forward(self, input: Tensor) -> Tensor:
---> 93 return F.linear(input, self.weight, self.bias)
94
95 def extra_repr(self) -> str:
~\anaconda3\lib\site-packages\torch\nn\functional.py in linear(input, weight, bias)
1690 ret = torch.addmm(bias, input, weight.t())
1691 else:
-> 1692 output = input.matmul(weight.t())
1693 if bias is not None:
1694 output += bias
RuntimeError: CUDA out of memory. Tried to allocate 30.00 MiB (GPU 0; 4.00 GiB total capacity; 2.66 GiB already allocated; 27.74 MiB free; 2.88 GiB reserved in total by PyTorch)
CUDA (Compute Unified Device Architecture) is a proprietary parallel computing platform and programming model from NVIDIA. Using the CUDA SDK, developers can utilize their NVIDIA GPUs (Graphics Processing Units), enabling them to bring the power of GPU-based parallel processing into their programming workflow instead of the usual CPU-based sequential processing.
With deep learning on the rise in recent years, many of the operations involved in model training, such as matrix multiplication and inversion, can be parallelized to a great extent for better learning performance and faster training cycles. Thus, many deep learning libraries like Pytorch enable their users to take advantage of their GPUs through a set of interfaces and utility functions. This article covers setting up a CUDA environment on any system with a CUDA-enabled GPU and gives a brief introduction to the various CUDA operations available in the Pytorch library using Python.
Installation
First, ensure that your GPU is CUDA-enabled by checking your system’s GPU against the official NVIDIA CUDA compatibility list. Pytorch makes the CUDA installation process very simple by providing a user-friendly interface that lets you choose your operating system and other requirements, as given in the figure below. We’ll be installing according to the specifications given in the figure below.
Refer to Pytorch’s official link and choose the specifications according to your computer’s specifications. We also suggest a complete restart of the system after installation to ensure the proper working of the toolkit.
Screenshot from Pytorch’s installation page
pip3 install torch==1.9.0+cu102 torchvision==0.10.0+cu102 torchaudio===0.9.0 -f https://download.pytorch.org/whl/torch_stable.html
Getting started with CUDA in Pytorch
Once installed, we can use the torch.cuda interface to interact with CUDA using Pytorch. We’ll use the following functions:
Syntax:
- torch.version.cuda: Returns the CUDA version of the currently installed packages (note that this is an attribute, not a function call)
- torch.cuda.is_available(): Returns True if CUDA is supported by your system, else False
- torch.cuda.current_device(): Returns ID of current device
- torch.cuda.get_device_name(device_ID): Returns name of the CUDA device with ID = ‘device_ID’
Code:
Python3
import torch

print(f"Is CUDA supported by this system? {torch.cuda.is_available()}")
print(f"CUDA version: {torch.version.cuda}")

cuda_id = torch.cuda.current_device()
print(f"ID of current CUDA device: {torch.cuda.current_device()}")
print(f"Name of current CUDA device: {torch.cuda.get_device_name(cuda_id)}")
Output:
CUDA version
Handling Tensors with CUDA
For interacting with Pytorch tensors through CUDA, we can use the following utility functions:
Syntax:
- Tensor.device: Returns the device name of ‘Tensor’
- Tensor.to(device_name): Returns new instance of ‘Tensor’ on the device specified by ‘device_name’: ‘cpu’ for CPU and ‘cuda’ for CUDA enabled GPU
- Tensor.cpu(): Transfers ‘Tensor’ to CPU from its current device
To demonstrate the above functions, we’ll create a test tensor and perform the following operations:
Check the current device of the tensor and apply a tensor operation (squaring), transfer the tensor to the GPU and apply the same tensor operation (squaring) there, and compare the results of the two devices.
Code:
Python3
import torch

x = torch.randint(1, 100, (100, 100))
print(x.device)

res_cpu = x ** 2

x = x.to(torch.device('cuda'))
print(x.device)

res_gpu = x ** 2

assert torch.equal(res_cpu, res_gpu.cpu())
Output:
cpu
cuda:0
Handling Machine Learning models with CUDA
A good Pytorch practice is to produce device-agnostic code because some systems might not have access to a GPU and have to rely on the CPU only, or vice versa. Once that’s done, the following function can be used to transfer any machine learning model onto the selected device.
Syntax: Model.to(device_name):
Returns: New instance of Machine Learning ‘Model’ on the device specified by ‘device_name’: ‘cpu’ for CPU and ‘cuda’ for CUDA enabled GPU
In this example, we are importing the pre-trained Resnet-18 model from the torchvision.models utility; the reader can use the same steps for transferring models to their selected device.
Code:
Python3
import torch
import torchvision.models as models

device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = models.resnet18(pretrained=True)
model = model.to(device)
Output:
ML with CUDA
#pytorch #torch
Question:
I have tried several solutions that hinted at what to do when a CUDA GPU is available and CUDA is installed, but torch.cuda.is_available() returns False. They helped, but only temporarily, meaning torch.cuda.is_available() reports True, but after a while it switches back to False. I am using CUDA 9.0.176 and a GTX 1080. What should I do to get a permanent effect?
I tried the following approaches:
https://forums.fast.ai/t/torch-cuda-is-available-returns-false/16721/5
https://github.com/pytorch/pytorch/issues/15612
Note: when torch.cuda.is_available() works fine but then at some point switches to False, I have to reboot the computer, and then it works again (for a while).
Comments:
1. I am facing the same problem, but inside Docker. It is so annoying that I have to restart Docker from time to time. Did you find a solution?
Answer #1:
torch.cuda.is_available() also returned False for me.
But after installing the very latest NVIDIA driver, version 436.48, it shows True. I had previously updated PyTorch to 1.2.0. I am on Windows 10 with Anaconda.
Answer #2:
The reason torch.cuda.is_available() returns False is an incompatibility between the pytorch and cudatoolkit versions.
As of June 2022, the current pytorch version is compatible with cudatoolkit=11.3, whereas the current CUDA toolkit version is 11.7. Source
Solution:
- Uninstall PyTorch for a fresh installation. You cannot install an older version on top of a newer one without forcing the reinstall (with pip install --upgrade --force-reinstall <package_name>).
- Run conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch to install PyTorch.
- Install CUDA version 11.3 from https://developer.nvidia.com/cuda-11.3.0-download-archive.
You're all set. A quick verification sketch follows below.
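A minimal verification sketch after the reinstall (assuming the conda command above succeeded):

import torch

print(torch.__version__, torch.version.cuda)  # the bundled CUDA should match the cudatoolkit you installed
print(torch.cuda.is_available())              # should now stay True across runs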
1. This solved my problem. I tried it with CUDA 11.6 and it works perfectly. (In that case, cuda-11.6.0-download-archive should be used together with conda install pytorch torchvision torchaudio cudatoolkit=11.6 -c pytorch -c conda-forge.)
Answer #3:
Answer #4:
I have seen this problem too. The cause was that the CUDA version used by PyTorch was out of sync with the installed NVIDIA driver. As in Joe's answer, the solution was to update the NVIDIA drivers. Some other important background information to be aware of:
- Each CUDA release requires a minimum NVIDIA driver version (see the compatibility table here).
- You can check your NVIDIA driver version with nvidia-smi.
- PyTorch ships bundled with its own copy of CUDA, which may differ from the version installed on your machine.
- The CUDA version you installed manually is the one shown when you run nvidia-smi. Even if your driver version is compatible with that CUDA version, it may be incompatible with the CUDA version used by PyTorch.
- You can get PyTorch's CUDA version by printing the torch.version.cuda variable in IPython or in a Python program (see the sketch below). This is the version that determines which NVIDIA driver you need.
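A hedged sketch of those checks; it only prints what PyTorch itself reports and pulls the driver's CUDA ceiling out of the nvidia-smi header, so nvidia-smi must be on your PATH:

import subprocess
import torch

print("CUDA used by PyTorch:", torch.version.cuda)
print("CUDA available:", torch.cuda.is_available())

# Driver-side ceiling as reported by nvidia-smi ("CUDA Version: X.Y" in the header)
smi = subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout
print(next(line for line in smi.splitlines() if "CUDA Version" in line))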
Comments:
1.
AttributeError: module 'torch.cuda' has no attribute 'version'
2. Thanks, it should be torch.version.cuda. I have edited the post.
This package adds support for CUDA tensor types, which implement the same functions as CPU tensors but utilize GPUs for computation.
It is lazily initialized, so you can always import it, and use is_available() to determine if your system supports CUDA.
CUDA semantics has more details about working with CUDA.
torch.cuda.current_blas_handle()
    Returns cublasHandle_t pointer to current cuBLAS handle.

torch.cuda.current_device() → int
    Returns the index of a currently selected device.

torch.cuda.current_stream(device: Union[torch.device, str, int, None] = None) → torch.cuda.streams.Stream
    Returns the currently selected Stream for a given device.
    Parameters: device (torch.device or int, optional) – selected device. Returns the currently selected Stream for the current device, given by current_device(), if device is None (default).

torch.cuda.default_stream(device: Union[torch.device, str, int, None] = None) → torch.cuda.streams.Stream
    Returns the default Stream for a given device.
    Parameters: device (torch.device or int, optional) – selected device. Returns the default Stream for the current device, given by current_device(), if device is None (default).

class torch.cuda.device(device)
    Context-manager that changes the selected device.
    Parameters: device (torch.device or int) – device index to select. It's a no-op if this argument is a negative integer or None.

torch.cuda.device_count() → int
    Returns the number of GPUs available.

class torch.cuda.device_of(obj)
    Context-manager that changes the current device to that of a given object. You can use both tensors and storages as arguments. If a given object is not allocated on a GPU, this is a no-op.
    Parameters: obj (Tensor or Storage) – object allocated on the selected device.

torch.cuda.get_device_capability(device: Union[torch.device, str, int, None] = None) → Tuple[int, int]
    Gets the cuda capability of a device.
    Parameters: device (torch.device or int, optional) – device for which to return the device capability. This function is a no-op if this argument is a negative integer. It uses the current device, given by current_device(), if device is None (default).
    Returns: the major and minor cuda capability of the device
    Return type: tuple(int, int)

torch.cuda.get_device_name(device: Union[torch.device, str, int, None] = None) → str
    Gets the name of a device.
    Parameters: device (torch.device or int, optional) – device for which to return the name. This function is a no-op if this argument is a negative integer. It uses the current device, given by current_device(), if device is None (default).

torch.cuda.init()
    Initialize PyTorch's CUDA state. You may need to call this explicitly if you are interacting with PyTorch via its C API, as Python bindings for CUDA functionality will not be available until this initialization takes place. Ordinary users should not need this, as all of PyTorch's CUDA methods automatically initialize CUDA state on-demand.
    Does nothing if the CUDA state is already initialized.

torch.cuda.ipc_collect()
    Force collects GPU memory after it has been released by CUDA IPC.
    Note: Checks if any sent CUDA tensors could be cleaned from the memory. Force closes shared memory file used for reference counting if there is no active counters. Useful when the producer process stopped actively sending tensors and wants to release unused memory.

torch.cuda.is_available() → bool
    Returns a bool indicating if CUDA is currently available.

torch.cuda.is_initialized()
    Returns whether PyTorch's CUDA state has been initialized.

torch.cuda.set_device(device: Union[torch.device, str, int]) → None
    Sets the current device.
    Usage of this function is discouraged in favor of device. In most cases it's better to use the CUDA_VISIBLE_DEVICES environment variable.
    Parameters: device (torch.device or int) – selected device. This function is a no-op if this argument is negative.

torch.cuda.stream(stream)
    Context-manager that selects a given stream. All CUDA kernels queued within its context will be enqueued on a selected stream.
    Parameters: stream (Stream) – selected stream. This manager is a no-op if it's None.
    Note: Streams are per-device. If the selected stream is not on the current device, this function will also change the current device to match the stream.

torch.cuda.synchronize(device: Union[torch.device, str, int] = None) → None
    Waits for all kernels in all streams on a CUDA device to complete.
    Parameters: device (torch.device or int, optional) – device for which to synchronize. It uses the current device, given by current_device(), if device is None (default).
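A brief, hedged usage sketch for the device-management helpers above (it assumes at least one CUDA-capable GPU is visible):

import torch

print(torch.cuda.device_count())             # number of visible GPUs
print(torch.cuda.get_device_name(0))         # name of device 0
print(torch.cuda.get_device_capability(0))   # (major, minor) compute capability

with torch.cuda.device(0):                   # temporarily select device 0
    x = torch.ones(3, device="cuda")         # allocated on the selected device
    print(x.device)

torch.cuda.synchronize()                     # wait for all queued kernels to finish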
Random Number Generator

torch.cuda.get_rng_state(device: Union[int, str, torch.device] = 'cuda') → torch.Tensor
    Returns the random number generator state of the specified GPU as a ByteTensor.
    Parameters: device (torch.device or int, optional) – The device to return the RNG state of. Default: 'cuda' (i.e., torch.device('cuda'), the current CUDA device).
    Warning: This function eagerly initializes CUDA.

torch.cuda.get_rng_state_all() → List[torch.Tensor]
    Returns a list of ByteTensor representing the random number states of all devices.

torch.cuda.set_rng_state(new_state: torch.Tensor, device: Union[int, str, torch.device] = 'cuda') → None
    Sets the random number generator state of the specified GPU.
    Parameters:
    - new_state (torch.ByteTensor) – The desired state
    - device (torch.device or int, optional) – The device to set the RNG state. Default: 'cuda' (i.e., torch.device('cuda'), the current CUDA device).

torch.cuda.set_rng_state_all(new_states: Iterable[torch.Tensor]) → None
    Sets the random number generator state of all devices.
    Parameters: new_states (Iterable of torch.ByteTensor) – The desired state for each device

torch.cuda.manual_seed(seed: int) → None
    Sets the seed for generating random numbers for the current GPU. It's safe to call this function if CUDA is not available; in that case, it is silently ignored.
    Parameters: seed (int) – The desired seed.
    Warning: If you are working with a multi-GPU model, this function is insufficient to get determinism. To seed all GPUs, use manual_seed_all().

torch.cuda.manual_seed_all(seed: int) → None
    Sets the seed for generating random numbers on all GPUs. It's safe to call this function if CUDA is not available; in that case, it is silently ignored.
    Parameters: seed (int) – The desired seed.

torch.cuda.seed() → None
    Sets the seed for generating random numbers to a random number for the current GPU. It's safe to call this function if CUDA is not available; in that case, it is silently ignored.
    Warning: If you are working with a multi-GPU model, this function will only initialize the seed on one GPU. To initialize all GPUs, use seed_all().

torch.cuda.seed_all() → None
    Sets the seed for generating random numbers to a random number on all GPUs. It's safe to call this function if CUDA is not available; in that case, it is silently ignored.

torch.cuda.initial_seed() → int
    Returns the current random seed of the current GPU.
    Warning: This function eagerly initializes CUDA.
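The RNG helpers above are typically combined when seeding for reproducibility. A minimal sketch, safe to run even without a GPU since the CUDA seeding calls are silently ignored in that case:

import torch

torch.manual_seed(42)            # seeds the CPU (and default CUDA) generator
torch.cuda.manual_seed_all(42)   # seeds every GPU; needed for multi-GPU determinism

if torch.cuda.is_available():
    state = torch.cuda.get_rng_state()   # ByteTensor snapshot of the current GPU's RNG
    torch.cuda.set_rng_state(state)      # ...which can be restored later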
Communication collectives

torch.cuda.comm.broadcast(tensor, devices=None, *, out=None)
    Broadcasts a tensor to specified GPU devices.
    Parameters:
    - tensor (Tensor) – tensor to broadcast. Can be on CPU or GPU.
    - devices (Iterable[torch.device, str or int], optional) – an iterable of GPU devices, among which to broadcast.
    - out (Sequence[Tensor], optional, keyword-only) – the GPU tensors to store output results.
    Note: Exactly one of devices and out must be specified.
    Returns:
    - If devices is specified, a tuple containing copies of tensor, placed on devices.
    - If out is specified, a tuple containing out tensors, each containing a copy of tensor.

torch.cuda.comm.broadcast_coalesced(tensors, devices, buffer_size=10485760)
    Broadcasts a sequence of tensors to the specified GPUs. Small tensors are first coalesced into a buffer to reduce the number of synchronizations.
    Parameters:
    - tensors (sequence) – tensors to broadcast. Must be on the same device, either CPU or GPU.
    - devices (Iterable[torch.device, str or int]) – an iterable of GPU devices, among which to broadcast.
    - buffer_size (int) – maximum size of the buffer used for coalescing
    Returns: A tuple containing copies of tensor, placed on devices.

torch.cuda.comm.reduce_add(inputs, destination=None)
    Sums tensors from multiple GPUs. All inputs should have matching shapes, dtype, and layout. The output tensor will be of the same shape, dtype, and layout.
    Parameters:
    - inputs (Iterable[Tensor]) – an iterable of tensors to add.
    - destination (int, optional) – a device on which the output will be placed (default: current device).
    Returns: A tensor containing an elementwise sum of all inputs, placed on the destination device.

torch.cuda.comm.scatter(tensor, devices=None, chunk_sizes=None, dim=0, streams=None, *, out=None)
    Scatters tensor across multiple GPUs.
    Parameters:
    - tensor (Tensor) – tensor to scatter. Can be on CPU or GPU.
    - devices (Iterable[torch.device, str or int], optional) – an iterable of GPU devices, among which to scatter.
    - chunk_sizes (Iterable[int], optional) – sizes of chunks to be placed on each device. It should match devices in length and sum to tensor.size(dim). If not specified, tensor will be divided into equal chunks.
    - dim (int, optional) – A dimension along which to chunk tensor. Default: 0.
    - out (Sequence[Tensor], optional, keyword-only) – the GPU tensors to store output results. Sizes of these tensors must match that of tensor, except for dim, where the total size must sum to tensor.size(dim).
    Note: Exactly one of devices and out must be specified. When out is specified, chunk_sizes must not be specified and will be inferred from sizes of out.
    Returns:
    - If devices is specified, a tuple containing chunks of tensor, placed on devices.
    - If out is specified, a tuple containing out tensors, each containing a chunk of tensor.

torch.cuda.comm.gather(tensors, dim=0, destination=None, *, out=None)
    Gathers tensors from multiple GPU devices.
    Parameters:
    - tensors (Iterable[Tensor]) – an iterable of tensors to gather. Tensor sizes in all dimensions other than dim have to match.
    - dim (int, optional) – a dimension along which the tensors will be concatenated. Default: 0.
    - destination (torch.device, str, or int, optional) – the output device. Can be CPU or CUDA. Default: the current CUDA device.
    - out (Tensor, optional, keyword-only) – the tensor to store the gather result. Its sizes must match those of tensors, except for dim, where the size must equal sum(tensor.size(dim) for tensor in tensors). Can be on CPU or CUDA.
    Note: destination must not be specified when out is specified.
    Returns:
    - If destination is specified, a tensor located on destination device, that is a result of concatenating tensors along dim.
    - If out is specified, the out tensor, now containing results of concatenating tensors along dim.
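A hedged sketch of broadcast and gather; it assumes two visible CUDA devices, which is the situation these collectives are meant for:

import torch
import torch.cuda.comm as comm

t = torch.arange(6.)                        # CPU tensor to distribute
copies = comm.broadcast(t, devices=[0, 1])  # one copy per GPU
print([c.device for c in copies])           # [cuda:0, cuda:1]

gathered = comm.gather(list(copies), dim=0, destination=torch.device("cpu"))
print(gathered.shape)                       # torch.Size([12])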
Streams and events

class torch.cuda.Stream
    Wrapper around a CUDA stream.
    A CUDA stream is a linear sequence of execution that belongs to a specific device, independent from other streams. See CUDA semantics for details.
    Parameters:
    - device (torch.device or int, optional) – a device on which to allocate the stream. If device is None (default) or a negative integer, this will use the current device.
    - priority (int, optional) – priority of the stream. Lower numbers represent higher priorities.

    query()
        Checks if all the work submitted has been completed.
        Returns: A boolean indicating if all kernels in this stream are completed.

    record_event(event=None)
        Records an event.
        Parameters: event (Event, optional) – event to record. If not given, a new one will be allocated.
        Returns: Recorded event.

    synchronize()
        Wait for all the kernels in this stream to complete.

    wait_event(event)
        Makes all future work submitted to the stream wait for an event.
        Parameters: event (Event) – an event to wait for.
        Note: This is a wrapper around cudaStreamWaitEvent(): see CUDA Stream documentation for more info. This function returns without waiting for event: only future operations are affected.

    wait_stream(stream)
        Synchronizes with another stream. All future work submitted to this stream will wait until all kernels submitted to a given stream at the time of call complete.
        Parameters: stream (Stream) – a stream to synchronize.
        Note: This function returns without waiting for currently enqueued kernels in stream: only future operations are affected.

class torch.cuda.Event
    Wrapper around a CUDA event.
    CUDA events are synchronization markers that can be used to monitor the device's progress, to accurately measure timing, and to synchronize CUDA streams.
    The underlying CUDA events are lazily initialized when the event is first recorded or exported to another process. After creation, only streams on the same device may record the event. However, streams on any device can wait on the event.
    Parameters:
    - enable_timing (bool, optional) – indicates if the event should measure time (default: False)
    - blocking (bool, optional) – if True, wait() will be blocking (default: False)
    - interprocess (bool) – if True, the event can be shared between processes (default: False)

    elapsed_time(end_event)
        Returns the time elapsed in milliseconds after the event was recorded and before the end_event was recorded.

    classmethod from_ipc_handle(device, handle)
        Reconstruct an event from an IPC handle on the given device.

    ipc_handle()
        Returns an IPC handle of this event. If not recorded yet, the event will use the current device.

    query()
        Checks if all work currently captured by event has completed.
        Returns: A boolean indicating if all work currently captured by event has completed.

    record(stream=None)
        Records the event in a given stream.
        Uses torch.cuda.current_stream() if no stream is specified. The stream's device must match the event's device.

    synchronize()
        Waits for the event to complete.
        Waits until the completion of all work currently captured in this event. This prevents the CPU thread from proceeding until the event completes.

    wait(stream=None)
        Makes all future work submitted to the given stream wait for this event.
        Uses torch.cuda.current_stream() if no stream is specified.
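The most common use of Event is GPU-side timing. A small hedged sketch, assuming a CUDA device is present:

import torch

start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)

a = torch.randn(4096, 4096, device="cuda")
start.record()                # recorded on the current stream
b = a @ a                     # the work being timed
end.record()
torch.cuda.synchronize()      # wait until both events have actually completed
print(f"matmul took {start.elapsed_time(end):.2f} ms")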
Memory management

torch.cuda.empty_cache() → None
    Releases all unoccupied cached memory currently held by the caching allocator so that it can be used in other GPU applications and is visible in nvidia-smi.
    Note: empty_cache() doesn't increase the amount of GPU memory available for PyTorch. However, it may help reduce fragmentation of GPU memory in certain cases. See Memory management for more details about GPU memory management.

torch.cuda.memory_stats(device: Union[torch.device, str, None, int] = None) → Dict[str, Any]
    Returns a dictionary of CUDA memory allocator statistics for a given device. The return value of this function is a dictionary of statistics, each of which is a non-negative integer.
    Core statistics:
    - "allocated.{all,large_pool,small_pool}.{current,peak,allocated,freed}": number of allocation requests received by the memory allocator.
    - "allocated_bytes.{all,large_pool,small_pool}.{current,peak,allocated,freed}": amount of allocated memory.
    - "segment.{all,large_pool,small_pool}.{current,peak,allocated,freed}": number of reserved segments from cudaMalloc().
    - "reserved_bytes.{all,large_pool,small_pool}.{current,peak,allocated,freed}": amount of reserved memory.
    - "active.{all,large_pool,small_pool}.{current,peak,allocated,freed}": number of active memory blocks.
    - "active_bytes.{all,large_pool,small_pool}.{current,peak,allocated,freed}": amount of active memory.
    - "inactive_split.{all,large_pool,small_pool}.{current,peak,allocated,freed}": number of inactive, non-releasable memory blocks.
    - "inactive_split_bytes.{all,large_pool,small_pool}.{current,peak,allocated,freed}": amount of inactive, non-releasable memory.
    For these core statistics, values are broken down as follows.
    Pool type:
    - all: combined statistics across all memory pools.
    - large_pool: statistics for the large allocation pool (as of October 2019, for size >= 1MB allocations).
    - small_pool: statistics for the small allocation pool (as of October 2019, for size < 1MB allocations).
    Metric type:
    - current: current value of this metric.
    - peak: maximum value of this metric.
    - allocated: historical total increase in this metric.
    - freed: historical total decrease in this metric.
    In addition to the core statistics, we also provide some simple event counters:
    - "num_alloc_retries": number of failed cudaMalloc calls that result in a cache flush and retry.
    - "num_ooms": number of out-of-memory errors thrown.
    Parameters: device (torch.device or int, optional) – selected device. Returns statistics for the current device, given by current_device(), if device is None (default).

torch.cuda.memory_summary(device: Union[torch.device, str, None, int] = None, abbreviated: bool = False) → str
    Returns a human-readable printout of the current memory allocator statistics for a given device. This can be useful to display periodically during training, or when handling out-of-memory exceptions.
    Parameters:
    - device (torch.device or int, optional) – selected device. Returns printout for the current device, given by current_device(), if device is None (default).
    - abbreviated (bool, optional) – whether to return an abbreviated summary (default: False).

torch.cuda.memory_snapshot()
    Returns a snapshot of the CUDA memory allocator state across all devices. Interpreting the output of this function requires familiarity with the memory allocator internals.

torch.cuda.memory_allocated(device: Union[torch.device, str, None, int] = None) → int
    Returns the current GPU memory occupied by tensors in bytes for a given device.
    Parameters: device (torch.device or int, optional) – selected device. Returns statistic for the current device, given by current_device(), if device is None (default).
    Note: This is likely less than the amount shown in nvidia-smi since some unused memory can be held by the caching allocator and some context needs to be created on GPU. See Memory management for more details about GPU memory management.

torch.cuda.max_memory_allocated(device: Union[torch.device, str, None, int] = None) → int
    Returns the maximum GPU memory occupied by tensors in bytes for a given device.
    By default, this returns the peak allocated memory since the beginning of this program. reset_peak_stats() can be used to reset the starting point in tracking this metric. For example, these two functions can measure the peak allocated memory usage of each iteration in a training loop.
    Parameters: device (torch.device or int, optional) – selected device. Returns statistic for the current device, given by current_device(), if device is None (default).

torch.cuda.reset_max_memory_allocated(device: Union[torch.device, str, None, int] = None) → None
    Resets the starting point in tracking maximum GPU memory occupied by tensors for a given device. See max_memory_allocated() for details.
    Parameters: device (torch.device or int, optional) – selected device. Returns statistic for the current device, given by current_device(), if device is None (default).
    Warning: This function now calls reset_peak_memory_stats(), which resets /all/ peak memory stats.

torch.cuda.memory_reserved(device: Union[torch.device, str, None, int] = None) → int
    Returns the current GPU memory managed by the caching allocator in bytes for a given device.
    Parameters: device (torch.device or int, optional) – selected device. Returns statistic for the current device, given by current_device(), if device is None (default).

torch.cuda.max_memory_reserved(device: Union[torch.device, str, None, int] = None) → int
    Returns the maximum GPU memory managed by the caching allocator in bytes for a given device.
    By default, this returns the peak cached memory since the beginning of this program. reset_peak_stats() can be used to reset the starting point in tracking this metric. For example, these two functions can measure the peak cached memory amount of each iteration in a training loop.
    Parameters: device (torch.device or int, optional) – selected device. Returns statistic for the current device, given by current_device(), if device is None (default).

torch.cuda.memory_cached(device: Union[torch.device, str, None, int] = None) → int
    Deprecated; see memory_reserved().

torch.cuda.max_memory_cached(device: Union[torch.device, str, None, int] = None) → int
    Deprecated; see max_memory_reserved().

torch.cuda.reset_max_memory_cached(device: Union[torch.device, str, None, int] = None) → None
    Resets the starting point in tracking maximum GPU memory managed by the caching allocator for a given device. See max_memory_cached() for details.
    Parameters: device (torch.device or int, optional) – selected device. Returns statistic for the current device, given by current_device(), if device is None (default).
    Warning: This function now calls reset_peak_memory_stats(), which resets /all/ peak memory stats.
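A short hedged sketch tying the memory queries above together, for example to track the peak usage of a chunk of GPU work (assumes a CUDA device):

import torch

torch.cuda.reset_peak_memory_stats()   # start a fresh peak measurement
x = torch.randn(1024, 1024, device="cuda")
y = x @ x

print(torch.cuda.memory_allocated() / 1e6, "MB currently held by tensors")
print(torch.cuda.memory_reserved() / 1e6, "MB reserved by the caching allocator")
print(torch.cuda.max_memory_allocated() / 1e6, "MB peak since the reset")
print(torch.cuda.memory_summary(abbreviated=True))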
You can accelerate the training of your model by making use of hardware accelerators such as Graphics Processing Units (GPUs). If you have NVIDIA GPU(s), you can use them for training your Pytorch models through the Compute Unified Device Architecture (CUDA) API. In this chapter of the Pytorch tutorial, you will learn how you can make use of CUDA/GPU to accelerate the training process.
Checking CUDA Availability
Before you start using CUDA, you need to check if you have CUDA available in your environment. You can check it by using the torch.cuda.is_available()
function.
import torch
# Checking if CUDA is available
torch.cuda.is_available()
The function returns True
if CUDA is available, else it will return False
.
Moving to CUDA
If you have CUDA available in your environment, you can move your tensor to GPU. You can do this by calling the to()
method on the variable (tensors and models) and specifying the device
parameter.
Example
In this example, we are moving a tensor from CPU to GPU by creating a copy of the tensor in the GPU with the same variable name, therefore, effectively moving the tensor to GPU.
# Create a tensor
tensor_1 = torch.tensor([1, 2, 3, 4])
# Copy the tensor to GPU with same variable name
tensor_1 = tensor_1.to(device='cuda')
Similarly, you can also move your Neural Network model to GPU.
Example
In this example, we are moving our Neural Network model mynet
to GPU. We don’t need to create another variable, you can just make use of the to()
method.
mynet.to(device='cuda')
You can check the device on which your variable exists.
Example
For a tensor, this can be achieved by checking the device
attribute of your tensor.
tensor_2 = torch.tensor([5, 6, 7, 8])
# Print the device on which tensor is present
print(tensor_2.device)
# Outputs- device(type='cpu')
# Moving the tensor to GPU
tensor_2 = tensor_2.to(device='cuda')
# Print the device on which tensor is present
print(tensor_2.device)
# Outputs- device(type='cuda', index=0)
Example
A model consists of many layers, each of which has its own set of parameters. Therefore you need to check the device on which the parameters of the models exist. In this example, we will see how to check the device on which our Neural Network model mynet
exists by checking the device on which its parameters exist.
# Print the device on which mynet exists
print(next(mynet.parameters()).device)
# Outputs- device(type='cpu')
# Moving the model to GPU
mynet.to(device='cuda')
# Print the device on which mynet exists
print(next(mynet.parameters()).device)
# Outputs- device(type='cuda', index=0)
Notice how Pytorch by default uses the CPU for storing tensors and models. If you want to use the GPU to store your tensors and models, you have to move them there explicitly. However, it can be tedious to check whether CUDA is available in your environment and to move your Pytorch tensors and models to a GPU only if it is. This step can be simplified a bit; take a look at the example below.
Example
In this example, we are setting the value of device
to 'cuda'
if CUDA is available, else we are setting it to 'cpu'
. Next, we can move all the tensors and models to device
. Therefore, if CUDA is available, all the models and tensors will be moved to the GPU; otherwise they will stay on the CPU.
# Assigning the value of device
if torch.cuda.is_available():
device = 'cuda'
else:
device = 'cpu'
# Moving the Neural Network model to the available device
mynet.to(device=device)
# Moving the tensor to the available device
tensor_3 = tensor_3.to(device=device)
Now you can move all your models and parameters to device before training without having to worry about whether CUDA will be available in the environment in which your code runs. This way you will be able to take advantage of a GPU for training whenever one is available.
RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU. How do you solve this error with Pytorch in Python in the context of deep learning?
Introduction
First of all, usually this error is encountered when trying to run your model code on the machine’s CPU instead of the GPU.
lib/python3.6/site-packages/torch/serialization.py", ... in validate_cuda_device raise RuntimeError('Attempting to deserialize object on a CUDA ' RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location='cpu' to map your storages to the CPU.
In this case you encounter a raised RuntimeError exception.
This means that you are trying to deserialize an object that was saved for the GPU while the GPU is unavailable.
The error occurs when you try to load the Pytorch model using the torch.load instruction (see https://pytorch.org/docs/stable/generated/torch.load.html).
Example :
torch.load('mymodel.pt')
So when the reloading is done with the wrong configuration you get the error:
lib/python3.6/site-packages/torch/serialization.py", ... in validate_cuda_device raise RuntimeError('Attempting to deserialize object on a CUDA ' RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location='cpu' to map your storages to the CPU.
Solution
As you can see in the Pytorch documentation (see https://pytorch.org/docs/stable/generated/torch.load.html), there is a map_location parameter (a function, torch.device, string or a dict specifying how to remap storage locations).
This is where we must force the parameter to specify the location to use.
Note that if your model was saved while on the GPU, you will have to specify the GPU here; otherwise you will have to specify the CPU. The objective is to keep this parameter consistent between what was saved and what is being reloaded.
If you want to force CPU usage:
torch.load('mymodel.pt',map_location='cpu')
For CPU usage you can also use :
torch.load('mymodel.pt', map_location=torch.device('cpu'))
# Load onto the second GPU (device index 1)
torch.load('mymodel.pt', map_location=lambda storage, loc: storage.cuda(1))
# Map tensors saved on cuda:1 onto cuda:0
torch.load('mymodel.pt', map_location={'cuda:1':'cuda:0'})
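For completeness, here is a hedged end-to-end sketch of saving on whatever device the model lives on and reloading on a CPU-only machine. The file name mymodel.pt follows the article's example and the Linear model is just a stand-in:

import torch

model = torch.nn.Linear(4, 2)
if torch.cuda.is_available():
    model = model.cuda()

torch.save(model.state_dict(), 'mymodel.pt')   # the state_dict keeps CUDA tensors if the model was on GPU

# On a CPU-only machine, map the stored CUDA tensors to the CPU:
state = torch.load('mymodel.pt', map_location=torch.device('cpu'))
model_cpu = torch.nn.Linear(4, 2)
model_cpu.load_state_dict(state)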
External links :
https://pytorch.org/tutorials/beginner/saving_loading_models.html
https://pytorch.org/tutorials/recipes/recipes/save_load_across_devices.html
https://discuss.pytorch.org/t/saving-and-loading-a-model-in-pytorch/2610
I installed CUDA and NVIDIA driver using the following two commands.
$ sudo ubuntu-drivers install
$ sudo apt install nvidia-cuda-toolkit
However, now cuda is not available from within torch. Do you know how I could fix it?
$ python
Python 3.7.6 (default, Jan 8 2020, 19:59:22)
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.__version__
'1.6.0'
>>> torch.version.cuda
'10.1'
>>> torch.cuda.is_available()
False
I am also not sure why nvidia-smi is still not working after installing the drivers:
$ nvidia-smi
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 20.04.1 LTS
Release: 20.04
Codename: focal
$ lspci | grep -i nvidia
01:00.0 VGA compatible controller: NVIDIA Corporation TU106M [GeForce RTX 2070 Mobile] (rev a1)
01:00.1 Audio device: NVIDIA Corporation TU106 High Definition Audio Controller (rev a1)
01:00.2 USB controller: NVIDIA Corporation TU106 USB 3.1 Host Controller (rev a1)
01:00.3 Serial bus controller [0c80]: NVIDIA Corporation TU106 USB Type-C UCSI Controller (rev a1)
The problem was fixed after I did a reboot:
(base) mona@mona:~$ nvidia-smi
Thu Sep 24 13:04:12 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.51.06 Driver Version: 450.51.06 CUDA Version: 11.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 GeForce RTX 2070 Off | 00000000:01:00.0 Off | N/A |
| N/A 54C P8 8W / N/A | 428MiB / 7982MiB | 3% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 1256 G /usr/lib/xorg/Xorg 275MiB |
| 0 N/A N/A 2422 G /usr/bin/gnome-shell 151MiB |
+-----------------------------------------------------------------------------+
(base) mona@mona:~$ python
Python 3.7.6 (default, Jan 8 2020, 19:59:22)
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.__version__
'1.6.0'
>>> torch.version.cuda
'10.1'
>>> torch.cuda.is_
torch.cuda.is_available( torch.cuda.is_initialized(
>>> torch.cuda.is_available()
True