How do I add new computations to the ViT model?


Post by Guest »

Can anyone help me? I am trying to add new computations to the CAS-ViT algorithm, but I get the following error: "RuntimeError: Expected to have finished reduction in the prior iteration before starting a new one. This error indicates that your module has parameters that were not used in producing loss."
The CAS-ViT code and the paper are available at:
https://github.com/Tianfang-Zhang/CAS-ViT
These are the changes I made to the CAS-ViT code (my implementation is at the end of this post):
  • I use NUM_MY_TOKENS = 10 and TASK_EMB = 16 (see the config sketch just below).
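For reference, these are the only fields of my_config (cfg.MODEL.ALGO in my setup) that the new code reads. The value of NUM_TASKS_FOR_EMB below is just a placeholder, not my real setting:

Code: Select all

# Minimal sketch of the config fields my new code reads.
# NUM_TASKS_FOR_EMB = 1 is a placeholder value for illustration only.
from types import SimpleNamespace

my_config = SimpleNamespace(
    NUM_MY_TOKENS=10,      # number of extra tokens
    TASK_EMB=16,           # task embedding size
    NUM_TASKS_FOR_EMB=1,   # placeholder, set elsewhere in my config
)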
Any suggestions on how to fix this? I am running on my notebook with a single processor and the ImageNet-1K dataset. This is the command I use:
python -m torch.distributed.run --nproc_per_node 1 main.py --data_path ../dataset/imagenet --output_dir ./output --model rcvit_xs --lr 6e-3 --batch_size 16 --drop_path 0.1 --model_ema True --use_amp True --multi_scale_sampler
Thanks.

Code: Select all

This is the error:
Start training for 30 epochs
task_id_emb:  None
Epoch: [0]  [      0/1281167]  eta: 78 days, 3:40:41  lr: 0.000000  min_lr: 0.000000  loss: 6.9109 (6.9109)  class_acc: 0.0000 (0.0000)  weight_decay: 0.0500 (0.0500)  time: 5.2705  data: 1.1663  max mem: 981
task_id_emb:  None
[rank0]: Traceback (most recent call last):
[rank0]:   File "main.py", line 411, in 
[rank0]:     main(args)
[rank0]:   File "main.py", line 407, in main
[rank0]:     train(args, cfg)
[rank0]:   File "main.py", line 306, in train
[rank0]:     train_stats = train_one_epoch(
[rank0]:   File "/home/evandro/doutorado/cas_dap/classification/engine.py", line 64, in train_one_epoch
[rank0]:     output = model(samples, task_id_emb=task_id_emb)
[rank0]:   File "/home/evandro/anaconda3/envs/dap-cas/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:   File "/home/evandro/anaconda3/envs/dap-cas/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
[rank0]:     return forward_call(*args, **kwargs)
[rank0]:   File "/home/evandro/anaconda3/envs/dap-cas/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 1632, in forward
[rank0]:     inputs, kwargs = self._pre_forward(*inputs, **kwargs)
[rank0]:   File "/home/evandro/anaconda3/envs/dap-cas/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 1523, in _pre_forward
[rank0]:     if torch.is_grad_enabled() and self.reducer._rebuild_buckets():
[rank0]: RuntimeError: Expected to have finished reduction in the prior iteration before starting a new one. This error indicates that your module has parameters that were not used in producing loss. You can enable unused parameter detection by passing the keyword argument `find_unused_parameters=True` to `torch.nn.parallel.DistributedDataParallel`, and by
[rank0]: making sure all `forward` function outputs participate in calculating loss.
[rank0]: If you already have done the above, then the distributed data parallel module wasn't able to locate the output tensors in the return value of your module's `forward` function. Please include the loss function and the structure of the return value of `forward` of your module when reporting this issue (e.g.  list, dict, iterable).
[rank0]: Parameter indices which did not receive grad for rank 0: 41 42 43 44 45 46 47 81 82 83 84 85 86 87 125 126 127 128 129 130 131 165 166 167 168 169 170 171 209 210 211 212 213 214 215 249 250 251 252 253 254 255 289 290 291 292 293 294 295 329 330 331 332 333 334 335 373 374 375 376 377 378 379 413 414 415 416 417 418 419
[rank0]:  In addition, you can set the environment variable TORCH_DISTRIBUTED_DEBUG to either INFO or DETAIL to print out information about which particular parameters did not receive gradient on this rank as part of this error
E0116 09:50:36.465787 140226484999680 torch/distributed/elastic/multiprocessing/api.py:833] failed (exitcode: 1) local_rank: 0 (pid: 10243) of binary: /home/evandro/anaconda3/envs/dap-cas/bin/python
Traceback (most recent call last):
File "/home/evandro/anaconda3/envs/dap-cas/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/home/evandro/anaconda3/envs/dap-cas/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/home/evandro/anaconda3/envs/dap-cas/lib/python3.8/site-packages/torch/distributed/run.py", line 905, in 
main()
File "/home/evandro/anaconda3/envs/dap-cas/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 348, in wrapper
return f(*args, **kwargs)
File "/home/evandro/anaconda3/envs/dap-cas/lib/python3.8/site-packages/torch/distributed/run.py", line 901, in main
run(args)
File "/home/evandro/anaconda3/envs/dap-cas/lib/python3.8/site-packages/torch/distributed/run.py", line 892, in run
elastic_launch(
File "/home/evandro/anaconda3/envs/dap-cas/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 133, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "/home/evandro/anaconda3/envs/dap-cas/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 264, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
main.py FAILED
------------------------------------------------------------
Failures:

------------------------------------------------------------
Root Cause (first observed failure):
[0]:
time      : 2025-01-16_09:50:36
host      : nitro5
rank      : 0 (local_rank: 0)
exitcode  : 1 (pid: 10243)
error_file: 
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================
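From the error message I understand that one option is to pass find_unused_parameters=True where main.py wraps the model in DistributedDataParallel. I have not changed that part of main.py yet; the snippet below is only my guess at what such a change would look like (the variable names are assumed), and I am not sure whether it fixes the root cause or only hides it:

Code: Select all

# Sketch only - assumes `model` is the RCViT instance at the point where
# main.py wraps it for distributed training; not the actual CAS-ViT code.
model = torch.nn.parallel.DistributedDataParallel(
    model,
    find_unused_parameters=True,  # let DDP tolerate parameters that receive no gradient
)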
My implementation:

Code: Select all

class AdditiveBlock(nn.Module):
    """
    CAS-ViT additive block, extended with my FiLM-style task-embedding code.
    """
    def __init__(self, dim, mlp_ratio=4., attn_bias=False, drop=0., drop_path=0.,
                 act_layer=nn.ReLU, norm_layer=nn.GELU, my_config=None):
        super().__init__()
        self.local_perception = LocalIntegration(dim, ratio=1, act_layer=act_layer, norm_layer=norm_layer)
        self.norm1 = norm_layer(dim)
        self.attn = AdditiveTokenMixer(dim, attn_bias=attn_bias, proj_drop=drop)
        # NOTE: drop path for stochastic depth, we shall see if this is better than dropout here
        self.drop_path = DropPath(drop_path) if drop_path > 0. else nn.Identity()

        self.norm2 = norm_layer(dim)
        mlp_hidden_dim = int(dim * mlp_ratio)
        self.mlp = Mlp(in_features=dim, hidden_features=mlp_hidden_dim, act_layer=act_layer, drop=drop)

        # MY NEW CODE
        self.my_config = my_config
        self.my_emb = nn.Embedding(
            num_embeddings=my_config.NUM_TASKS_FOR_EMB,  # Number of tasks
            embedding_dim=my_config.TASK_EMB  # Embedding size
        )
        self.my_downsample = nn.Linear(dim, my_config.NUM_MY_TOKENS)
        self.my_film = nn.Linear(my_config.TASK_EMB, dim * 2)  # For gamma and beta
        self.my_norm = nn.BatchNorm2d(dim)

    def inject_prompts(self, x, task_id_emb):
        """
        Apply FiLM modulation (gamma and beta derived from task_id_emb) to x.
        """
        B, C, H, W = x.size()
        print(f"Original x shape: {x.shape}")

        # Normalize input
        x_norm = self.my_norm(x)

        # Downsample for new tokens
        #down = self.my_downsample(x_norm).view(B, -1, H * W)

        # Generate FiLM parameters
        film = self.my_film(task_id_emb)           # Shape: [B, 2 * C]
        gamma, beta = film.chunk(2, dim=-1)
        gamma = gamma.view(B, C, 1, 1)
        beta = beta.view(B, C, 1, 1)

        # Apply FiLM modulation
        x = gamma * x + beta
        #print(f"FiLM x shape: {x.shape}")

        # Optionally append my tokens to the input - ERROR
        #down = down.view(B, -1, H, W)
        #x = torch.cat((x, down), dim=1)  # Concatenate along channels
        #print(f"Concatenated x shape: {x.shape}")

        return x

    def forward(self, x, task_id_emb=None):
        """
        Forward pass with the task-embedding injection added before attention.
        """
        # Apply Local Perception
        x = x + self.local_perception(x)

        # Always use the new logic, even if task_id_emb is None
        if task_id_emb is not None:
            x = self.inject_prompts(x, task_id_emb)  # Use task-specific logic
        # else:
        # Use a dummy embedding if task_id_emb is None, to ensure parameters are used
        dummy_task_id_emb = torch.zeros((x.size(0), self.my_config.TASK_EMB), device=x.device)
        x = self.inject_prompts(x, dummy_task_id_emb)

        # Self-Attention with Domain-Adaptive Prompts
        x = x + self.drop_path(self.attn(self.norm1(x)))

        # Feedforward Network (MLP)
        x = x + self.drop_path(self.mlp(self.norm2(x)))

        return x
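
# --------------------------------------------------------------------------
# Not part of the model: a small standalone check I used to convince myself
# that the FiLM modulation in inject_prompts produces the shapes I expect.
# All dimensions here are made up for illustration.
def _film_shape_check():
    B, C, H, W = 2, 48, 14, 14                           # hypothetical batch/channels/spatial size
    task_emb_dim = 16                                    # TASK_EMB
    x = torch.randn(B, C, H, W)
    task_id_emb = torch.zeros(B, task_emb_dim)
    film = nn.Linear(task_emb_dim, C * 2)(task_id_emb)   # same role as self.my_film
    gamma, beta = film.chunk(2, dim=-1)                  # each [B, C]
    gamma = gamma.view(B, C, 1, 1)
    beta = beta.view(B, C, 1, 1)
    out = gamma * x + beta                               # broadcast over H and W
    assert out.shape == x.shape
# --------------------------------------------------------------------------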

def Stage(dim, index, layers, mlp_ratio=4., act_layer=nn.GELU, attn_bias=False, drop=0., drop_path_rate=0., my_config=None):
    """
    Build one stage as a sequence of AdditiveBlocks.
    """
    blocks = []
    for block_idx in range(layers[index]):
        block_dpr = drop_path_rate * (block_idx + sum(layers[:index])) / (sum(layers) - 1)

        blocks.append(
            AdditiveBlock(
                dim, mlp_ratio=mlp_ratio, attn_bias=attn_bias, drop=drop, drop_path=block_dpr,
                act_layer=act_layer, norm_layer=nn.BatchNorm2d, my_config=my_config)
        )
    blocks = nn.Sequential(*blocks)
    return blocks

class RCViT(nn.Module):
    def __init__(self, layers, embed_dims, mlp_ratios=4, downsamples=[True, True, True, True],
                 norm_layer=nn.BatchNorm2d, attn_bias=False, act_layer=nn.GELU, num_classes=1000,
                 drop_rate=0., drop_path_rate=0., fork_feat=False, init_cfg=None, pretrained=None,
                 distillation=True, cfg=None, **kwargs):
        super().__init__()

        self.my_config = cfg.MODEL.ALGO

        if not fork_feat:
            self.num_classes = num_classes
        self.fork_feat = fork_feat

        self.patch_embed = stem(3, embed_dims[0])

        network = []
        for i in range(len(layers)):
            stage = Stage(embed_dims[i], i, layers, mlp_ratio=mlp_ratios, act_layer=act_layer,
                          attn_bias=attn_bias, drop=drop_rate, drop_path_rate=drop_path_rate, my_config=self.my_config)

            network.append(stage)

            if i >= len(layers) - 1:
                break
            if downsamples[i] or embed_dims[i] != embed_dims[i + 1]:
                # downsampling between two stages
                network.append(
                    Embedding(
                        patch_size=3, stride=2, padding=1, in_chans=embed_dims[i],
                        embed_dim=embed_dims[i+1], norm_layer=nn.BatchNorm2d)
                )

        self.network = nn.ModuleList(network)

        if self.fork_feat:
            # add a norm layer for each output
            self.out_indices = [0, 2, 4, 6]
            for i_emb, i_layer in enumerate(self.out_indices):
                if i_emb == 0 and os.environ.get('FORK_LAST3', None):
                    layer = nn.Identity()
                else:
                    layer = norm_layer(embed_dims[i_emb])
                layer_name = f'norm{i_layer}'
                self.add_module(layer_name, layer)
        else:
            # Classifier head
            self.norm = norm_layer(embed_dims[-1])

            print('Classifier head embed_dims[-1]:  ', embed_dims[-1])
            self.head = nn.Linear(
                embed_dims[-1], num_classes) if num_classes > 0 \
                else nn.Identity()
            self.dist = distillation
            if self.dist:
                self.dist_head = nn.Linear(
                    embed_dims[-1], num_classes) if num_classes > 0 \
                    else nn.Identity()

        # Initialize weights
        self.apply(self.cls_init_weights)

        self.init_cfg = copy.deepcopy(init_cfg)
        # load pre-trained model
        if self.fork_feat and (
                self.init_cfg is not None or pretrained is not None):
            self.init_weights()

    def forward_tokens(self, x, task_id_emb=None):
        outs = []
        for idx, block in enumerate(self.network):
            # x = block(x, task_id_emb=task_id_emb)
            #x = block(x)
            if isinstance(block, AdditiveBlock):
                x = block(x, task_id_emb=task_id_emb)  # Pass task embedding
            else:
                x = block(x)  # For other blocks like downsampling

            if self.fork_feat and idx in self.out_indices:
                norm_layer = getattr(self, f'norm{idx}')
                x_out = norm_layer(x)
                outs.append(x_out)
        if self.fork_feat:
            return outs
        return x
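To see which of my new parameters never receive a gradient, I was also thinking of a quick check like the one below right after loss.backward() in train_one_epoch (the exact place in engine.py is my assumption). The error output also mentions setting TORCH_DISTRIBUTED_DEBUG to INFO or DETAIL for the same purpose:

Code: Select all

# Sketch only: print parameters whose .grad is still None after backward().
# Where exactly this goes in train_one_epoch is an assumption on my part.
for name, param in model.named_parameters():
    if param.requires_grad and param.grad is None:
        print("no grad for:", name)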
