The HTML shown in Inspect Element differs from what is displayed on screen
Posted: 05 Jan 2025, 05:37
I am trying to scrape the data from this website:
https://www.eurobasket.com/Basketball-B ... 84-Lebanon
The page contains two tables, but the data displayed in the table rows differs from what is in the HTML source (as seen with Inspect Element).
For example, the first player's name is Jean Abdel-Nour, but the source contains SMdRl-XIuQ, zRij instead, and the numbers are scrambled in a similar way.
Can you please help me find a way to extract this data?
This is what the source contains for the first row:
Code: Select all
[url=https://basketball.asia-basket.com/player/Jean-Abdel-Nour/45278]SMdRl-XIuQ, zRij[/url]
45
4-9 (38.7%)
0-9 (96.3%)
5-5 (5%)
5
6
6
1
6
5
5
6
5
8
86
5
5
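A note on the row above: the link target still carries the readable name (Jean-Abdel-Nour), while the link text is scrambled. Comparing the two character by character suggests a simple substitution (R stands for e in both places, for instance). The sketch below assumes that the URL has the form .../player/First-Last/id, that the first name contains no hyphen, and that the scrambled text is in "Last, First" order. The helper names name_from_href, build_mapping and decode are hypothetical. Nothing in this post confirms that the same substitution holds for every cell or every page load, and the mapping says nothing about the digits.

```python
from urllib.parse import urlparse


def name_from_href(href):
    """Return (first, last) taken from a URL like .../player/First-Last/id.

    Assumption: the slug after 'player' is 'First-Last' and the first
    name itself contains no hyphen.
    """
    parts = urlparse(href).path.strip("/").split("/")
    slug = parts[parts.index("player") + 1]
    first, _, last = slug.partition("-")
    return first, last


def build_mapping(scrambled, first, last):
    """Align scrambled 'Last, First' text with the real name, char by char."""
    scr_last, _, scr_first = scrambled.partition(", ")
    mapping = {}
    for scr, real in ((scr_first, first), (scr_last, last)):
        if len(scr) != len(real):
            continue  # cannot align this part, skip it
        for s, r in zip(scr, real):
            # setdefault keeps the first pairing; a different later pairing
            # would mean the text is not a plain substitution
            if mapping.setdefault(s, r) != r:
                raise ValueError(f"inconsistent mapping for {s!r}")
    return mapping


def decode(text, mapping):
    """Apply the (partial) mapping; unknown characters are left unchanged."""
    return "".join(mapping.get(ch, ch) for ch in text)


href = "https://basketball.asia-basket.com/player/Jean-Abdel-Nour/45278"
first, last = name_from_href(href)
mapping = build_mapping("SMdRl-XIuQ, zRij", first, last)
print(decode("zRij", mapping), decode("SMdRl-XIuQ", mapping))
```

If only the names are needed, reading them from the link target may be enough. A mapping built from one player is partial and would have to be extended with further rows before it could decode other cells.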
I tried Selenium, but it did not work:
Code: Select all
import pandas as pd
from bs4 import BeautifulSoup
from selenium import webdriver


def extract_box_score_from_url(url):
    # Fetch the rendered page; close the browser even if loading fails
    driver = webdriver.Chrome()  # Ensure ChromeDriver is installed and in PATH
    try:
        driver.get(url)
        html_content = driver.page_source
    finally:
        driver.quit()
    soup = BeautifulSoup(html_content, 'html.parser')

    # Extract team and opponent names
    name_tables = soup.find_all('table', {'id': 'aannew'})
    team = name_tables[0].find('a').text.strip()
    opponent = name_tables[1].find('a').text.strip()

    # Extract headers
    stats_divs = soup.find_all('div', class_='dvbs')
    header_rows = stats_divs[0].find('thead').find_all('tr')

    # Use the second header row (the subheaders) as column names
    headers = []
    for th in header_rows[1].find_all('th'):
        headers.append(th.get_text(strip=True))

    # Add Team and Opponent columns to headers
    headers += ['Team', 'Opponent']

    # Function to extract stats table for a team
    def extract_team_stats(dvbs):
        rows = dvbs.find('tbody').find_all('tr', class_=['my_pStats1', 'my_pStats2'])
        stats = []
        for row in rows:
            cols = row.find_all('td')
            player_data = [col.get_text(strip=True) for col in cols]
            stats.append(player_data)
        return stats

    # Extract stats for both teams
    team_stats = extract_team_stats(stats_divs[0])
    opponent_stats = extract_team_stats(stats_divs[1])

    # Add Team and Opponent columns; rows whose width does not match are dropped
    num_columns = len(headers)
    team_stats = [row + [team, opponent] for row in team_stats if len(row) + 2 == num_columns]
    opponent_stats = [row + [opponent, team] for row in opponent_stats if len(row) + 2 == num_columns]

    # Combine data
    combined_stats = team_stats + opponent_stats

    # Create dataframe
    return pd.DataFrame(combined_stats, columns=headers)


url = "https://www.eurobasket.com/Basketball-Box-Score.aspx?Game=2009_1211_2563_2684-Lebanon"
df = extract_box_score_from_url(url)
print(df)
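To tell whether a scraped row still holds scrambled numbers, a quick arithmetic check on the shooting cells may help. This is only a sketch: it assumes those cells mean "made-attempted (percentage%)", which the post does not state, and the name shot_cell_is_consistent and the tolerance value are hypothetical. The sample strings are the three cells from the first row quoted earlier.

```python
import re

# Matches cells such as "4-9 (38.7%)": made, attempted, percentage
SHOT_CELL = re.compile(r"^(\d+)-(\d+) \((\d+(?:\.\d+)?)%\)$")


def shot_cell_is_consistent(cell, tolerance=0.1):
    """Return True if the percentage agrees with made/attempted.

    Assumption: the cell format is 'made-attempted (pct%)' with the
    percentage rounded to one decimal place.
    """
    match = SHOT_CELL.match(cell.strip())
    if match is None:
        return False
    made, attempted, pct = int(match[1]), int(match[2]), float(match[3])
    if made > attempted:
        return False
    if attempted == 0:
        return pct == 0.0
    return abs(100.0 * made / attempted - pct) <= tolerance


# Cells taken from the first row as it appears in the page source
for cell in ["4-9 (38.7%)", "0-9 (96.3%)", "5-5 (5%)"]:
    print(cell, shot_cell_is_consistent(cell))
```

All three cells fail the check, which fits the observation that the numbers in the source are not the ones shown on screen. Applied to the scraped DataFrame, a check like this could flag whether a given extraction approach actually returns the displayed values.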