RuntimeError: CUDA error: invalid device ordinal

Post by **Anonymous** » 10 Apr 2025, 07:02

When I try to run my program, I get an error:

RuntimeError: CUDA error: invalid device ordinal

The full error is below.

I don't have much experience with this sort of thing; moreover, it is neither my own program nor my own machine.

Python 3.8.8 | packaged by conda-forge | (default, Feb 20 2021, 16:22:27)
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.__version__
'1.8.1+cu102'
>>> torch.cuda.is_available()
True
>>> torch.cuda.device_count()
1
>>> torch.cuda.get_device_name()
'GeForce RTX 2080 Ti'
>>>

Unlike in this question, the machine I'm using seems to have access to only a single GPU. A colleague suggested that it might have something to do with self.device yielding the wrong value?
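
One way to test the colleague's suggestion is to compare the index the program asks for with the number of devices PyTorch reports. The sketch below assumes a hypothetical helper, `validate_cuda_index`, which is not part of rlpyt or PyTorch; it relies only on the fact that valid CUDA ordinals run from 0 to `device_count - 1`. In a real session `device_count` would come from `torch.cuda.device_count()`, which returned 1 above.

```python
from typing import Optional


def validate_cuda_index(cuda_idx: Optional[int], device_count: int) -> str:
    """Return a device string, or raise if the index cannot exist.

    Hypothetical helper (an assumption for illustration, not rlpyt or
    PyTorch API). Valid CUDA ordinals run from 0 to device_count - 1.
    """
    if cuda_idx is None:
        return "cpu"  # no GPU requested, so stay on the CPU
    if not 0 <= cuda_idx < device_count:
        raise ValueError(
            f"cuda_idx={cuda_idx} is out of range: "
            f"only {device_count} device(s) visible"
        )
    return f"cuda:{cuda_idx}"


# device_count=1 mirrors the torch.cuda.device_count() result in the post.
print(validate_cuda_index(0, device_count=1))     # index 0 is valid
print(validate_cuda_index(None, device_count=1))  # falls back to the CPU
try:
    validate_cuda_index(1, device_count=1)        # index 1 does not exist
except ValueError as err:
    print(err)
```

If a check like this were placed just before the `self.model.to(self.device)` call, an out-of-range index would show up with its actual value instead of the bare CUDA error.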
Of course, any help is greatly appreciated!

(rlpyt) hbp@aklma-MS-7B24:~$ cd Documents/Bing/Mathieu/learning_to_be_taught/experiments/vmpo_replay_ratio/
(rlpyt) hbp@aklma-MS-7B24:~/Documents/Bing/Mathieu/learning_to_be_taught/experiments/vmpo_replay_ratio$ python vmpo_replay_ratio.py
/home/hbp/anaconda3/envs/rlpyt/lib/python3.8/site-packages/gym/logger.py:30: UserWarning: WARN: Box bound precision lowered by casting to float32
warnings.warn(colorize('%s: %s'%('WARN', msg % args), 'yellow'))
training started with parameters: Namespace(epochs=None, log_dir=None, log_dir_positional=None, name='run', run_id=None, serial_mode=False, slot_affinity_code=None, snapshot_file=None)
exp_dir: /home/hbp/Documents/Bing/Mathieu/learning_to_be_taught/experiments/vmpo_replay_ratio/logs/run_6
using seed 5986
2021-05-27 14:11:40.546471  | run_6 Running 1520 sampler iterations.
2021-05-27 14:11:40.600944  | run_6 Optimizer master CPU affinity: [0].
2021-05-27 14:11:40.626970  | run_6 Initialized async CPU agent model.
2021-05-27 14:11:40.627073  | run_6 WARNING: unequal number of envs per process, from batch_B 6400 and n_worker 7 (possible suboptimal speed).
2021-05-27 14:11:40.627223  | run_6 Total parallel evaluation envs: 21.
2021-05-27 14:11:40.657946  | run_6 Optimizer master Torch threads: 1.
using seed 5987
using seed 5986
using seed 5988
using seed 5989
using seed 5990
using seed 5991
Traceback (most recent call last):
File "vmpo_replay_ratio.py", line 213, in <module>
build_and_train(slot_affinity_code=args.slot_affinity_code,
File "vmpo_replay_ratio.py", line 135, in build_and_train
runner.train()
File "/home/hbp/Documents/Bing/Mathieu/rlpyt/rlpyt/runners/async_rl.py", line 87, in train
throttle_itr, delta_throttle_itr = self.startup()
File "/home/hbp/Documents/Bing/Mathieu/rlpyt/rlpyt/runners/async_rl.py", line 161, in startup
throttle_itr, delta_throttle_itr = self.optim_startup()
File "/home/hbp/Documents/Bing/Mathieu/rlpyt/rlpyt/runners/async_rl.py", line 177, in optim_startup
self.agent.to_device(main_affinity.get("cuda_idx", None))
File "/home/hbp/Documents/Bing/Mathieu/rlpyt/rlpyt/agents/base.py", line 115, in to_device
using seed 5992
self.model.to(self.device)
File "/home/hbp/anaconda3/envs/rlpyt/lib/python3.8/site-packages/torch/nn/modules/module.py", line 673, in to
return self._apply(convert)
File "/home/hbp/anaconda3/envs/rlpyt/lib/python3.8/site-packages/torch/nn/modules/module.py", line 387, in _apply
module._apply(fn)
File "/home/hbp/anaconda3/envs/rlpyt/lib/python3.8/site-packages/torch/nn/modules/module.py", line 387, in _apply
module._apply(fn)
File "/home/hbp/anaconda3/envs/rlpyt/lib/python3.8/site-packages/torch/nn/modules/module.py", line 387, in _apply
module._apply(fn)
File "/home/hbp/anaconda3/envs/rlpyt/lib/python3.8/site-packages/torch/nn/modules/module.py", line 409, in _apply
param_applied = fn(param)
File "/home/hbp/anaconda3/envs/rlpyt/lib/python3.8/site-packages/torch/nn/modules/module.py", line 671, in convert
return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
RuntimeError: CUDA error: invalid device ordinal
2021-05-27 14:11:40.987723  | run_6 Sampler rank 1 initialized, CPU affinity [2], Torch threads 1, Seed 5987
2021-05-27 14:11:40.987714  | run_6 Sampler rank 0 initialized, CPU affinity [1], Torch threads 1, Seed 5986
2021-05-27 14:11:40.988088  | run_6 Sampler rank 2 initialized, CPU affinity [3], Torch threads 1, Seed 5988
2021-05-27 14:11:40.989922  | run_6 Sampler rank 3 initialized, CPU affinity [4], Torch threads 1, Seed 5989
2021-05-27 14:11:40.992058  | run_6 Sampler rank 4 initialized, CPU affinity [5], Torch threads 1, Seed 5990
2021-05-27 14:11:40.995587  | run_6 Sampler rank 5 initialized, CPU affinity [6], Torch threads 1, Seed 5991
2021-05-27 14:11:40.996119  | run_6 Sampler rank 6 initialized, CPU affinity [7], Torch threads 1, Seed 5992
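
The traceback shows the device index being read from `main_affinity.get("cuda_idx", None)`, so it may help to list every `cuda_idx` stored in the affinity object and flag the ones that exceed the visible device count. The following is a rough sketch only: `find_cuda_indices` and `out_of_range` are hypothetical helpers, the nested dict/list layout is an assumption about how the affinity is structured, and the example values are made up rather than taken from this run.

```python
from typing import Any, Iterator, List, Tuple


def find_cuda_indices(obj: Any, path: str = "affinity") -> Iterator[Tuple[str, Any]]:
    """Yield (path, value) for every "cuda_idx" key in nested dicts/lists.

    Hypothetical helper; the nested layout is an assumption.
    """
    if isinstance(obj, dict):
        for key, value in obj.items():
            child = f"{path}[{key!r}]"
            if key == "cuda_idx":
                yield child, value
            else:
                yield from find_cuda_indices(value, child)
    elif isinstance(obj, (list, tuple)):
        for i, item in enumerate(obj):
            yield from find_cuda_indices(item, f"{path}[{i}]")


def out_of_range(affinity: Any, device_count: int) -> List[Tuple[str, int]]:
    """Return the (path, index) pairs that no visible device can satisfy."""
    return [
        (path, idx)
        for path, idx in find_cuda_indices(affinity)
        if idx is not None and not 0 <= idx < device_count
    ]


# Made-up affinity for illustration; these values do not come from the post.
example_affinity = {
    "optimizer": [{"cpus": [0], "cuda_idx": 1}],
    "sampler": {"workers_cpus": [[1], [2]], "cuda_idx": None},
}

for path, idx in out_of_range(example_affinity, device_count=1):
    print(f"{path} = {idx} is not a valid ordinal with 1 visible device")
```

Any entry this reports would point at where the affinity is built (for instance a GPU count passed by the experiment script) rather than at PyTorch itself.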
