How to get past the knife-throwing enemy in Kung Fu (NES) with a Vision Transformer, ML


Post by Anonymous » 03 Apr 2025, 07:05

I'm experimenting with machine learning by training a model that tries to beat level 1 of Kung Fu on the NES. Repo
At the moment I'm stuck in level 1 at the enemy that throws knives. (Close combat is fine so far; distance is the problem.)

I have this training script.
Here is the step function I got from an AI. It suggests training with a simple reward first before switching to more complex reward functions.
import time  # module-level import, needed for survival_time below

def step(self, action):
    obs, _, done, info = super().step(action)
    ram = self.env.get_ram()

    # Extract game state information.
    # int() keeps the arithmetic below from wrapping if the RAM values are 8-bit.
    current_score = int(ram[self.ram_positions['score']]) * 100
    current_scroll = int(ram[self.ram_positions['scroll']])
    current_hp = int(ram[self.ram_positions['hero_hp']])
    current_pos_x = int(ram[self.ram_positions['hero_pos_x']])
    current_stage = int(ram[self.ram_positions['stage']])

    # Calculate deltas
    score_delta = current_score - self.last_score
    scroll_delta = current_scroll - self.last_scroll if current_scroll >= self.last_scroll else (current_scroll + 256 - self.last_scroll)
    hp_loss = max(0, int(self.last_hp) - int(current_hp)) if self.last_hp is not None else 0
    pos_delta = current_pos_x - self.last_pos_x

    # Calculate reward components
    reward = 0
    reward += score_delta * 5     # Points gained
    reward += scroll_delta * 10   # Progress through level
    reward += pos_delta * 0.1     # Movement (small reward for moving)
    reward -= hp_loss * 50        # Penalty for losing health

    # Update tracking variables
    self.last_score = current_score
    self.last_scroll = current_scroll
    self.last_hp = current_hp
    self.last_pos_x = current_pos_x

    # Calculate survival time (wall-clock seconds since the episode started)
    survival_time = time.time() - self.episode_start_time if self.episode_start_time else 0

    # Add detailed info to the info dict
    info['score'] = current_score
    info['hp'] = current_hp
    info['scroll'] = current_scroll
    info['pos_x'] = current_pos_x
    info['stage'] = current_stage
    info['survival_time'] = survival_time
    info['score_delta'] = score_delta
    info['scroll_delta'] = scroll_delta
    info['hp_loss'] = hp_loss
    info['episode'] = {
        'r': reward,
        'l': 1,  # Episode length (steps)
        't': survival_time,
        'score': current_score,
        'scroll': current_scroll,
        'hp': current_hp
    }

    return obs, reward, done, info
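One detail in the step function above that may be worth checking: the `scroll_delta` expression treats every decrease of the scroll value as a wraparound, so a decrease of 1 would be counted as +255 progress. The sketch below compares that expression with a signed 8-bit delta. It assumes the scroll value is a single byte that wraps at 256 and changes by fewer than 128 per step; `original_scroll_delta` and `signed_byte_delta` are hypothetical helper names, not part of the repo.

```python
def original_scroll_delta(current: int, last: int) -> int:
    """Same logic as the scroll_delta line in the step function above."""
    return current - last if current >= last else current + 256 - last


def signed_byte_delta(current: int, last: int) -> int:
    """Shortest signed change between two readings of a wrapping 8-bit counter.

    Assumption: the true change per step is smaller than 128 in magnitude.
    """
    diff = (int(current) - int(last)) % 256
    return diff - 256 if diff >= 128 else diff


# Compare both variants on a few hand-picked (last, current) pairs.
for last, current in [(10, 12), (250, 5), (10, 9), (3, 254)]:
    print(last, current,
          original_scroll_delta(current, last),
          signed_byte_delta(current, last))
```

If progress through the level corresponds to a decreasing scroll value, the sign would need to be flipped; which direction applies depends on the RAM address being read.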
Previously I trained overnight for 5_000_000 timesteps with more complex reward functions, but the agent still could not get past this enemy. I will use this reward to train further, but I'm worried it will be another waste of time. I need some suggestions on how to get past this.
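Before spending another night on training, it may help to compare how large each term of the simple reward is. The sketch below reuses the weights from the step function (5, 10, 0.1, 50); the function name `reward_components` and the sample deltas are made up for illustration, and real per-step values would have to come from logging the `info` dict.

```python
def reward_components(score_delta, scroll_delta, pos_delta, hp_loss):
    """Return each weighted reward term separately, using the weights from step()."""
    return {
        "score": score_delta * 5,
        "scroll": scroll_delta * 10,
        "position": pos_delta * 0.1,
        "hp": -hp_loss * 50,
    }


# Illustrative deltas only (not measured from the game):
# one increment of the score byte becomes score_delta = 100 because of the "* 100".
score_tick = reward_components(score_delta=100, scroll_delta=0, pos_delta=0, hp_loss=0)
# losing a single HP unit
hp_hit = reward_components(score_delta=0, scroll_delta=0, pos_delta=0, hp_loss=1)

print("score tick:", score_tick, "total:", sum(score_tick.values()))
print("hp hit:    ", hp_hit, "total:", sum(hp_hit.values()))
```

With these weights, one increment of the score byte is worth 500, while losing one HP unit costs 50. If a knife hit removes only a few HP units, the penalty may be small next to the score and scroll terms, which could be one reason the agent does not learn to avoid knives; this is a hypothesis to verify against logged values.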

[img]https://i.sstatic.net/bxpfamUr.png[/img]

