Can anyone help me? I want to add new computations to the CAS-ViT algorithm, but I get the following error: "RuntimeError: Expected to have finished reduction in the prior iteration before starting a new one. This error indicates that your module has parameters that were not used in producing loss."
The CAS-ViT code and the paper are available at: https://github.com/Tianfang-Zhang/CAS-ViT
The changes I made to the CAS-ViT code are the following:
I use NUM_MY_TOKENS = 10 and TASK_EMB = 16 (roughly how I carry them around is sketched below).
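(For context, my_config is just a small object that passes these two values down into every block; roughly like the sketch below, although my real config class may be structured differently.)

    # Hypothetical sketch of the config object; my actual my_config may differ.
    from types import SimpleNamespace

    my_config = SimpleNamespace(
        NUM_MY_TOKENS=10,  # number of extra prompt tokens
        TASK_EMB=16,       # dimensionality of the task embedding
    )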
Any suggestions on how to fix this? I am running on my notebook, with a single processor, on the ImageNet-1K dataset. This is the command I use:
python -m torch.distributed.run --nproc_per_node 1 main.py --data_path ../dataset/imagenet --output_dir ./output --model rcvit_xs --lr 6e-3 --batch_size 16 --drop_path 0.1 --model_ema True --use_amp True --multi_scale_sampler
Thanks.
This is the error:
Start training for 30 epochs
task_id_emb: None
Epoch: [0] [ 0/1281167] eta: 78 days, 3:40:41 lr: 0.000000 min_lr: 0.000000 loss: 6.9109 (6.9109) class_acc: 0.0000 (0.0000) weight_decay: 0.0500 (0.0500) time: 5.2705 data: 1.1663 max mem: 981
task_id_emb: None
[rank0]: Traceback (most recent call last):
[rank0]: File "main.py", line 411, in
[rank0]: main(args)
[rank0]: File "main.py", line 407, in main
[rank0]: train(args, cfg)
[rank0]: File "main.py", line 306, in train
[rank0]: train_stats = train_one_epoch(
[rank0]: File "/home/evandro/doutorado/cas_dap/classification/engine.py", line 64, in train_one_epoch
[rank0]: output = model(samples, task_id_emb=task_id_emb)
[rank0]: File "/home/evandro/anaconda3/envs/dap-cas/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: File "/home/evandro/anaconda3/envs/dap-cas/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: File "/home/evandro/anaconda3/envs/dap-cas/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 1632, in forward
[rank0]: inputs, kwargs = self._pre_forward(*inputs, **kwargs)
[rank0]: File "/home/evandro/anaconda3/envs/dap-cas/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 1523, in _pre_forward
[rank0]: if torch.is_grad_enabled() and self.reducer._rebuild_buckets():
[rank0]: RuntimeError: Expected to have finished reduction in the prior iteration before starting a new one. This error indicates that your module has parameters that were not used in producing loss. You can enable unused parameter detection by passing the keyword argument `find_unused_parameters=True` to `torch.nn.parallel.DistributedDataParallel`, and by
[rank0]: making sure all `forward` function outputs participate in calculating loss.
[rank0]: If you already have done the above, then the distributed data parallel module wasn't able to locate the output tensors in the return value of your module's `forward` function. Please include the loss function and the structure of the return value of `forward` of your module when reporting this issue (e.g. list, dict, iterable).
[rank0]: Parameter indices which did not receive grad for rank 0: 41 42 43 44 45 46 47 81 82 83 84 85 86 87 125 126 127 128 129 130 131 165 166 167 168 169 170 171 209 210 211 212 213 214 215 249 250 251 252 253 254 255 289 290 291 292 293 294 295 329 330 331 332 333 334 335 373 374 375 376 377 378 379 413 414 415 416 417 418 419
[rank0]: In addition, you can set the environment variable TORCH_DISTRIBUTED_DEBUG to either INFO or DETAIL to print out information about which particular parameters did not receive gradient on this rank as part of this error
E0116 09:50:36.465787 140226484999680 torch/distributed/elastic/multiprocessing/api.py:833] failed (exitcode: 1) local_rank: 0 (pid: 10243) of binary: /home/evandro/anaconda3/envs/dap-cas/bin/python
Traceback (most recent call last):
File "/home/evandro/anaconda3/envs/dap-cas/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/home/evandro/anaconda3/envs/dap-cas/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/home/evandro/anaconda3/envs/dap-cas/lib/python3.8/site-packages/torch/distributed/run.py", line 905, in
main()
File "/home/evandro/anaconda3/envs/dap-cas/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 348, in wrapper
return f(*args, **kwargs)
File "/home/evandro/anaconda3/envs/dap-cas/lib/python3.8/site-packages/torch/distributed/run.py", line 901, in main
run(args)
File "/home/evandro/anaconda3/envs/dap-cas/lib/python3.8/site-packages/torch/distributed/run.py", line 892, in run
elastic_launch(
File "/home/evandro/anaconda3/envs/dap-cas/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 133, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "/home/evandro/anaconda3/envs/dap-cas/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 264, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
main.py FAILED
------------------------------------------------------------
Failures:
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
time : 2025-01-16_09:50:36
host : nitro5
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 10243)
error_file:
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================
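From what I understand, the message itself suggests two things: passing find_unused_parameters=True when the model is wrapped in DistributedDataParallel, and setting TORCH_DISTRIBUTED_DEBUG to INFO or DETAIL to see which parameters are affected. A minimal sketch of the first option, assuming it goes wherever main.py builds the DDP wrapper (device_ids=[args.gpu] is my guess at how the script stores the device, not something I have confirmed):

    # Sketch only: enable unused-parameter detection where main.py wraps the model.
    # device_ids=[args.gpu] is an assumption about the existing script.
    model = torch.nn.parallel.DistributedDataParallel(
        model,
        device_ids=[args.gpu],
        find_unused_parameters=True,
    )

I would rather understand why my new parameters receive no gradient in the first place, though, instead of only masking the error.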
My implementation:

class AdditiveBlock(nn.Module):
    """
    """
    def __init__(self, dim, mlp_ratio=4., attn_bias=False, drop=0., drop_path=0.,
                 act_layer=nn.ReLU, norm_layer=nn.GELU, my_config=None):
        super().__init__()
        self.local_perception = LocalIntegration(dim, ratio=1, act_layer=act_layer, norm_layer=norm_layer)
        self.norm1 = norm_layer(dim)
        self.attn = AdditiveTokenMixer(dim, attn_bias=attn_bias, proj_drop=drop)
        # NOTE: drop path for stochastic depth, we shall see if this is better than dropout here
        self.drop_path = DropPath(drop_path) if drop_path > 0. else nn.Identity()
        # ... (rest of __init__ omitted here)

    def inject_prompts(self, x, task_id_emb):
        # ... (start of the method omitted here)
        # Apply FiLM modulation
        x = gamma * x + beta
        # print(f"FiLM x shape: {x.shape}")

        # Optionally append my tokens to the input - ERROR
        # down = down.view(B, -1, H, W)
        # x = torch.cat((x, down), dim=1)  # Concatenate along channels
        # print(f"Concatenated x shape: {x.shape}")

        return x

    def forward(self, x, task_id_emb=None):
        """
        """
        # Apply Local Perception
        x = x + self.local_perception(x)

        # Always use the new logic, even if task_id_emb is None
        if task_id_emb is not None:
            x = self.inject_prompts(x, task_id_emb)  # Use task-specific logic
        # else:
        # Use a dummy embedding if task_id_emb is None, to ensure parameters are used
        dummy_task_id_emb = torch.zeros((x.size(0), self.my_config.TASK_EMB), device=x.device)
        x = self.inject_prompts(x, dummy_task_id_emb)

        # Self-Attention with Domain-Adaptive Prompts
        x = x + self.drop_path(self.attn(self.norm1(x)))

        # Feedforward Network (MLP)
        x = x + self.drop_path(self.mlp(self.norm2(x)))

        return x

In the main model class, the relevant part of __init__:

if not fork_feat:
    self.num_classes = num_classes
self.fork_feat = fork_feat

self.patch_embed = stem(3, embed_dims[0])

network = []
for i in range(len(layers)):
    stage = Stage(embed_dims[i], i, layers, mlp_ratio=mlp_ratios, act_layer=act_layer,
                  attn_bias=attn_bias, drop=drop_rate, drop_path_rate=drop_path_rate,
                  my_config=self.my_config)
    network.append(stage)
    if i >= len(layers) - 1:
        break
    if downsamples[i] or embed_dims[i] != embed_dims[i + 1]:
        # downsampling between two stages
        network.append(
            Embedding(
                patch_size=3, stride=2, padding=1,
                in_chans=embed_dims[i], embed_dim=embed_dims[i + 1],
                norm_layer=nn.BatchNorm2d)
        )

self.network = nn.ModuleList(network)

if self.fork_feat:
    # add a norm layer for each output
    self.out_indices = [0, 2, 4, 6]
    for i_emb, i_layer in enumerate(self.out_indices):
        if i_emb == 0 and os.environ.get('FORK_LAST3', None):
            layer = nn.Identity()
        else:
            layer = norm_layer(embed_dims[i_emb])
        layer_name = f'norm{i_layer}'
        self.add_module(layer_name, layer)
else:
    # Classifier head
    self.norm = norm_layer(embed_dims[-1])
    print('Classifier head embed_dims[-1]: ', embed_dims[-1])
    self.head = nn.Linear(
        embed_dims[-1], num_classes) if num_classes > 0 \
        else nn.Identity()
    self.dist = distillation
    if self.dist:
        self.dist_head = nn.Linear(
            embed_dims[-1], num_classes) if num_classes > 0 \
            else nn.Identity()

self.init_cfg = copy.deepcopy(init_cfg)
# load pre-trained model
if self.fork_feat and (
        self.init_cfg is not None or pretrained is not None):
    self.init_weights()

and the modified forward_tokens:

def forward_tokens(self, x, task_id_emb=None):
    outs = []
    for idx, block in enumerate(self.network):
        # x = block(x, task_id_emb=task_id_emb)
        # x = block(x)
        if isinstance(block, AdditiveBlock):
            x = block(x, task_id_emb=task_id_emb)  # Pass task embedding
        else:
            x = block(x)  # For other blocks like downsampling
        if self.fork_feat and idx in self.out_indices:
            norm_layer = getattr(self, f'norm{idx}')
            x_out = norm_layer(x)
            outs.append(x_out)
    if self.fork_feat:
        return outs
    return x
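To map the parameter indices listed in the error back to my new modules, my plan is to run one forward/backward step on a single process, without the DDP wrapper, and print every parameter that still has no gradient. A rough diagnostic sketch (samples, targets and criterion stand for whatever the training loop already provides):

    # Diagnostic sketch, not part of the repo: after one backward pass, any parameter
    # whose .grad is still None never participated in the loss.
    output = model(samples, task_id_emb=None)
    loss = criterion(output, targets)
    loss.backward()
    for name, param in model.named_parameters():
        if param.requires_grad and param.grad is None:
            print("no grad:", name)

Does that sound like a reasonable way to narrow it down, or is there a better approach?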