At the moment I am stuck in level 1 at the enemies that throw knives. (Close combat is fine so far; attacks from a distance are the problem.)

I have a training script, and here is the step function I got from an AI. It suggests training with a simple reward first, before switching to more complex reward functions.
import time  # needed for the survival-time calculation below

def step(self, action):
    obs, _, done, info = super().step(action)
    ram = self.env.get_ram()

    # Extract game state information. Values are cast to plain ints so the
    # arithmetic below cannot wrap around if the RAM holds unsigned bytes.
    current_score = int(ram[self.ram_positions['score']]) * 100
    current_scroll = int(ram[self.ram_positions['scroll']])
    current_hp = int(ram[self.ram_positions['hero_hp']])
    current_pos_x = int(ram[self.ram_positions['hero_pos_x']])
    current_stage = int(ram[self.ram_positions['stage']])

    # Calculate deltas
    score_delta = current_score - self.last_score
    # The scroll counter wraps at 256. Any decrease is treated as a wraparound,
    # so scrolling backwards shows up as a large positive delta.
    scroll_delta = (current_scroll - self.last_scroll) % 256
    hp_loss = max(0, self.last_hp - current_hp) if self.last_hp is not None else 0
    pos_delta = current_pos_x - self.last_pos_x

    # Calculate reward components
    reward = 0
    reward += score_delta * 5    # Points gained
    reward += scroll_delta * 10  # Progress through level
    reward += pos_delta * 0.1    # Movement (small reward for moving)
    reward -= hp_loss * 50       # Penalty for losing health

    # Update tracking variables
    self.last_score = current_score
    self.last_scroll = current_scroll
    self.last_hp = current_hp
    self.last_pos_x = current_pos_x

    # Calculate survival time (wall-clock seconds since the episode started)
    survival_time = time.time() - self.episode_start_time if self.episode_start_time else 0

    # Add detailed info to the info dict
    info['score'] = current_score
    info['hp'] = current_hp
    info['scroll'] = current_scroll
    info['pos_x'] = current_pos_x
    info['stage'] = current_stage
    info['survival_time'] = survival_time
    info['score_delta'] = score_delta
    info['scroll_delta'] = scroll_delta
    info['hp_loss'] = hp_loss
    # Note: this dict is written on every step, with the per-step reward and a
    # length of 1. Loggers that follow the Monitor convention expect 'episode'
    # only on the final step of an episode, with the episode totals.
    info['episode'] = {
        'r': reward,
        'l': 1,  # Episode length (steps)
        't': survival_time,
        'score': current_score,
        'scroll': current_scroll,
        'hp': current_hp,
    }
    return obs, reward, done, info
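
One way to sanity-check the reward arithmetic without running the emulator is to pull the delta and reward calculation out into pure functions. This is only a sketch: the names `forward_delta_u8` and `shaped_reward` are made up for illustration, and it assumes that each RAM value is a single unsigned byte (0 to 255). The weights are the ones from the step function above.

```python
def forward_delta_u8(current: int, last: int) -> int:
    """Forward difference between two 8-bit counters.

    Any decrease is interpreted as a wraparound past 255, which is the
    same rule the scroll delta in the step function uses.
    """
    return (int(current) - int(last)) % 256


def shaped_reward(score_delta: int, scroll_delta: int,
                  pos_delta: int, hp_loss: int) -> float:
    """Weighted sum of the reward components, using the weights from step()."""
    return (score_delta * 5       # points gained
            + scroll_delta * 10   # progress through the level
            + pos_delta * 0.1     # small reward for moving
            - hp_loss * 50)       # penalty for losing health


if __name__ == "__main__":
    # Genuine wraparound: the counter went from 250 past 255 to 3.
    print(forward_delta_u8(3, 250))
    # Scroll value dropped by 2: also counted as a wraparound.
    print(forward_delta_u8(10, 12))
    print(shaped_reward(0, forward_delta_u8(10, 12), 0, 0))
```

With these helpers, `forward_delta_u8(3, 250)` gives 9, which is the intended wraparound case, while `forward_delta_u8(10, 12)` gives 254. A scroll value that drops by 2 is therefore rewarded as if it were 254 units of progress (2540 reward with the weight of 10), far more than the penalty of 50 for losing one point of health. Whether the scroll value can actually decrease in this game is something I have not verified.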
Previously I trained all night, for 5_000_000 steps, with more complex reward functions, but the agent still could not get past this point. I am going to train further with the reward above, but I am worried that this will be another waste of time. I need some suggestions on how to get past this.