How to extract information from a W2 file with Python [closed]


Post by Guest » 03 Jan 2025, 06:31

I want to extract information from a W2 file saved as a PDF.
The idea is to create a box for every rectangle in the W2 and to use a cleaned W2 as the reference.
[Image: cleaned W2 used as the reference]
This is what I have tried so far:

```python
import cv2
import numpy as np


def get_more_bboxes(image_path):
    image = cv2.imread(image_path)
    if image is None:  # cv2.imread returns None instead of raising
        raise FileNotFoundError(f"Could not read image: {image_path}")
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

    # 1. Apply noise reduction before anything
    gray = cv2.medianBlur(gray, 5)  # or cv2.GaussianBlur

    # 2. Apply thresholding
    # Try different methods, or combinations
    thresh = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                   cv2.THRESH_BINARY_INV, 11, 2)
    # Another option would be cv2.threshold
    # _, thresh = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY_INV)

    # 3. Morphological operations to refine
    kernel = np.ones((3, 3), np.uint8)
    thresh = cv2.erode(thresh, kernel, iterations=1)   # maybe erode first
    thresh = cv2.dilate(thresh, kernel, iterations=2)  # maybe only dilate or open and close

    # or try open and close operations
    # thresh = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, kernel, iterations=1)
    # thresh = cv2.morphologyEx(thresh, cv2.MORPH_CLOSE, kernel, iterations=1)

    # 4. Find contours
    # OpenCV 2.4/4.x return (contours, hierarchy); 3.x returns (image, contours, hierarchy)
    cnts = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    cnts = cnts[0] if len(cnts) == 2 else cnts[1]

    # 5. Filter contours based on properties
    min_area = 50  # example
    filtered_cnts = []
    for c in cnts:
        if cv2.contourArea(c) > min_area:
            filtered_cnts.append(c)

    # 6. Draw bounding boxes (only boxes with w * h > 350 are drawn)
    for c in filtered_cnts:
        x, y, w, h = cv2.boundingRect(c)
        if h * w > 350:
            cv2.rectangle(image, (x, y), (x + w, y + h), (36, 255, 12), 2)

    cv2.imwrite("bbox.png", image)
    # Counts contours above min_area; fewer boxes may actually be drawn
    print(f"Found {len(filtered_cnts)} contours!")
    return image


# Image path
get_more_bboxes("/content/Screenshot 2025-01-02 151226.png")
```
With this method I get inconsistent boxes. Is there a way to get more accurate results?
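One plausible contributor to the inconsistent boxes is step 1. `cv2.medianBlur(gray, 5)` replaces each pixel with the median of its 5×5 neighborhood, so a straight dark line only 1 or 2 pixels thick (at most 10 of the 25 pixels in any window away from intersections) is replaced by the light background and disappears, while lines 3 or more pixels thick survive. Whether the W2's ruled lines are that thin in the screenshot is an assumption. The pure-Python sketch below reimplements a 5×5 median filter on a tiny grayscale grid (replicating edge pixels, as a simplification) to show the effect:

```python
from statistics import median


def median_filter(img, ksize=5):
    """Median-filter a 2-D list of gray values, replicating edge pixels."""
    h, w = len(img), len(img[0])
    r = ksize // 2
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            # Collect the ksize x ksize window, clamping coordinates at the edges.
            window = [
                img[min(max(yy, 0), h - 1)][min(max(xx, 0), w - 1)]
                for yy in range(y - r, y + r + 1)
                for xx in range(x - r, x + r + 1)
            ]
            row.append(median(window))  # odd window size: the middle value
        out.append(row)
    return out


def make_page(line_rows, size=9):
    """White page (255) with dark (0) horizontal lines on the given rows."""
    return [[0 if y in line_rows else 255 for _ in range(size)] for y in range(size)]


thin = median_filter(make_page({4}))         # 1-pixel-thick line
thick = median_filter(make_page({3, 4, 5}))  # 3-pixel-thick line
print(min(min(row) for row in thin))  # 255: the thin line is gone
print(thick[4][4])                    # 0: the thick line survives
```

Note that a 3×3 median removes a 1-pixel line as well (3 of 9 pixels), so shrinking the kernel alone may not help; running the pipeline without the median blur, or with the `cv2.GaussianBlur` alternative named in the code comment, and checking whether the borders survive is worth a try.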

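Step 3 of the code above can have a similar effect. Eroding with a 3×3 kernel keeps a foreground pixel only if its whole 3×3 neighborhood (inside the image) is foreground, so after `THRESH_BINARY_INV` any white stroke thinner than 3 pixels is removed completely, and the following `cv2.dilate` cannot bring back what is already gone. Broken borders then split or merge neighboring boxes differently from one image to the next. The sketch below is a pure-Python binary erosion and dilation with a 3×3 all-ones kernel that ignores out-of-image neighbors (roughly what OpenCV's default border value does for these two operations); it shows a 1-pixel line vanishing after erode-then-dilate, while dilation alone keeps it. Whether dropping the erosion (or using only a closing) gives cleaner boxes on the real W2 is an assumption to test.

```python
def _morph(img, op):
    """3x3 all-ones binary morphology: op=all gives erosion, op=any dilation."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            # Only neighbors inside the image are considered.
            neigh = [
                img[yy][xx]
                for yy in range(max(y - 1, 0), min(y + 2, h))
                for xx in range(max(x - 1, 0), min(x + 2, w))
            ]
            row.append(255 if op(v == 255 for v in neigh) else 0)
        out.append(row)
    return out


def erode(img):
    return _morph(img, all)


def dilate(img, iterations=1):
    for _ in range(iterations):
        img = _morph(img, any)
    return img


# Binary image after THRESH_BINARY_INV: ink (foreground) is 255.
# A 1-pixel-thick horizontal border line on row 3 of a 7x7 image.
line = [[255 if y == 3 else 0 for _ in range(7)] for y in range(7)]

eroded_then_dilated = dilate(erode(line), iterations=2)  # as in the posted code
dilated_only = dilate(line, iterations=2)

print(sum(map(sum, eroded_then_dilated)))  # 0: the line is gone
print(dilated_only[3][3])                  # 255: the line is kept
```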

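Step 4 uses `cv2.RETR_EXTERNAL`, which returns only the outermost contours. On a form whose box borders are connected into one grid, that is mostly the outline of the whole grid: the cells (holes inside the grid) and anything inside them are not returned, and separate boxes only appear where a border happens to be broken, which may explain why the output changes with small differences in blurring and thresholding. One common alternative is `cv2.RETR_CCOMP`, which organizes contours into a two-level hierarchy of outer boundaries and holes, so the cells are the hole contours. `cv2.findContours` returns the hierarchy as an array of shape `(1, N, 4)` whose rows are `[next, previous, first_child, parent]`, with `-1` meaning "none". The sketch below works on plain lists shaped like `hierarchy[0]` and `cv2.boundingRect` tuples; the helper name, the mock data and the size thresholds are hypothetical.

```python
def cell_boxes(hierarchy, boxes, min_w=20, min_h=10):
    """Return bounding boxes of hole contours (cells) from a RETR_CCOMP result.

    hierarchy: list of [next, previous, first_child, parent] rows (hierarchy[0]).
    boxes:     list of (x, y, w, h) tuples, one per contour, in the same order.
    min_w/min_h are hypothetical thresholds that drop small holes such as the
    inside of letters like "O" or "8".
    """
    cells = []
    for (_, _, _, parent), (x, y, w, h) in zip(hierarchy, boxes):
        is_hole = parent != -1  # in RETR_CCOMP, holes form the second level
        if is_hole and w >= min_w and h >= min_h:
            cells.append((x, y, w, h))
    return cells


# Mock data: contour 0 is the outer border of a grid, 1 and 2 are its cells
# (holes), 3 is a letter inside cell 1 (top level again in RETR_CCOMP),
# and 4 is the small hole inside that letter.
hierarchy = [
    [3, -1, 1, -1],
    [2, -1, -1, 0],
    [-1, 1, -1, 0],
    [-1, 0, 4, -1],
    [-1, -1, -1, 3],
]
boxes = [(0, 0, 200, 60), (2, 2, 96, 56), (102, 2, 96, 56), (10, 10, 8, 12), (12, 13, 4, 5)]

print(cell_boxes(hierarchy, boxes))  # [(2, 2, 96, 56), (102, 2, 96, 56)]
```

With real data (OpenCV 4.x) this would be called as `cnts, hierarchy = cv2.findContours(thresh, cv2.RETR_CCOMP, cv2.CHAIN_APPROX_SIMPLE)` followed by `cell_boxes(hierarchy[0].tolist(), [cv2.boundingRect(c) for c in cnts])`; `hierarchy` is `None` when no contours are found.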

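Another way to make the boxes more stable is to look for the form's ruled lines instead of arbitrary blobs. A morphological opening with a long, thin kernel, for example `cv2.morphologyEx(thresh, cv2.MORPH_OPEN, cv2.getStructuringElement(cv2.MORPH_RECT, (40, 1)))` for horizontal lines and `(1, 40)` for vertical ones, removes text (which consists of short strokes) and keeps long lines; the two masks can then be combined with `cv2.bitwise_or` before searching for contours. Away from the image border, an opening with a 1×L kernel keeps exactly the foreground pixels that belong to a horizontal run of at least L pixels, which the pure-Python sketch below computes directly. The kernel length 40 and the toy image are assumptions; a suitable length depends on the resolution of the W2 image.

```python
def long_horizontal_runs(binary, min_len):
    """Keep only foreground (255) pixels in horizontal runs of >= min_len.

    Matches a morphological opening with a 1 x min_len rectangular kernel away
    from the image border; vertical lines work the same way on the transpose.
    """
    out = []
    for row in binary:
        new_row = [0] * len(row)
        x = 0
        while x < len(row):
            if row[x] == 255:
                start = x
                while x < len(row) and row[x] == 255:
                    x += 1
                if x - start >= min_len:  # long enough to be a ruled line
                    new_row[start:x] = [255] * (x - start)
            else:
                x += 1
        out.append(new_row)
    return out


def transpose(img):
    return [list(col) for col in zip(*img)]


# Toy 4x12 binary image: row 0 is a ruled line, row 2 holds a short "text"
# stroke, and column 11 is a vertical border.
img = [
    [255] * 12,
    [0] * 11 + [255],
    [0, 255, 255, 255, 0, 0, 0, 0, 0, 0, 0, 255],
    [0] * 11 + [255],
]
horizontal = long_horizontal_runs(img, min_len=8)
vertical = transpose(long_horizontal_runs(transpose(img), min_len=4))

print(horizontal[0] == [255] * 12, horizontal[2][1])  # True 0
print([row[11] for row in vertical])                   # [255, 255, 255, 255]
```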

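The post's idea of using a cleaned W2 as the reference could be put to work by listing the reference boxes once (a field name plus `(x, y, w, h)` measured on the cleaned form) and assigning each reference field the detected box that overlaps it most, measured by intersection over union (IoU). Detections that match no field well are simply ignored, which filters out spurious boxes. This assumes the processed W2 image has the same size and position as the reference; if scans differ in scale or offset they would first need to be aligned, which the sketch does not do. The field names, coordinates and the 0.5 threshold are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two (x, y, w, h) boxes."""
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    iw = max(0, min(ax2, bx2) - max(a[0], b[0]))
    ih = max(0, min(ay2, by2) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union else 0.0


def match_to_reference(detected, reference, min_iou=0.5):
    """Map each reference field name to its best-overlapping detected box."""
    matches = {}
    for name, ref_box in reference.items():
        best = max(detected, key=lambda d: iou(d, ref_box), default=None)
        if best is not None and iou(best, ref_box) >= min_iou:
            matches[name] = best
    return matches


# Hypothetical reference layout measured on the cleaned W2.
reference = {
    "box_1_wages": (300, 100, 150, 40),
    "box_2_federal_tax": (450, 100, 150, 40),
}
# Hypothetical detections: two slightly shifted boxes and one spurious box.
detected = [(302, 98, 148, 42), (455, 101, 140, 39), (10, 10, 30, 30)]

print(match_to_reference(detected, reference))
# {'box_1_wages': (302, 98, 148, 42), 'box_2_federal_tax': (455, 101, 140, 39)}
```

This greedy version lets one detection serve several fields if they overlap; a strict one-to-one assignment would need extra bookkeeping.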

Similar Topics


How to extract information from a W2 file with Python

by Guest » 03 Jan 2025, 12:08 » in Python

How to extract information from a W2 file with Python

by Guest » 03 Jan 2025, 13:04 » in Python

Extracting specific information from a message using the WhatsApp API in my web system [closed]

by Guest » 27 Jan 2025, 07:03 » in Php

I want to add a feature to my offline web system that reads the size and weight of an item from a message sent by a customer...

Extracting specific information from a message using the WhatsApp API in my web system [closed]

by Guest » 27 Jan 2025, 07:03 » in JavaScript

I want to add a feature to my offline web system that reads the size and weight of an item from a message sent by a customer...

Scraping Amazon seller information with Selenium: unable to extract the business name/phone number [closed]

by Anonymous » 11 Apr 2025, 12:50 » in Python

Problem description:
I am trying to extract the seller's business name and phone number from Amazon product pages using Selenium and BeautifulSoup. My code navigates to the...
