How to avoid surrounding html and head tags in jsoup parse


Post by Anonymous » 23 Feb 2025, 14:23

Using jsoup, I am trying to parse the HTML content shown below. After Jsoup.parse(), the HTML output has html, head and body tags added around the input. I just want to ignore these.

Code:

<b>This <i>is</i></b> <i>my sentence</i> of text.

Java code:

import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

import org.apache.commons.io.FileUtils;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class HTMLParse {

    public static void main(String[] args) throws IOException {
        // Read the HTML fragment from disk (UTF-8 is assumed here;
        // passing a bare null is ambiguous between the String and Charset overloads)
        File input = new File("/ab.html");
        String html = FileUtils.readFileToString(input, StandardCharsets.UTF_8);

        // parseBodyFragment places the fragment inside an html/head/body shell
        Document doc = Jsoup.parseBodyFragment(html);
        doc.outputSettings().prettyPrint(false);

        // Prints the whole document, including the wrapper tags
        System.out.println(doc.html());
    }
}

Actual output:

Code:

<html><head></head><body><b>This <i>is</i></b> <i>my sentence</i> of text.</body></html>

Expected output:

Code:

<b>This <i>is</i></b> <i>my sentence</i> of text.

How can I stop jsoup from adding these tags?
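
One possible approach, sketched here under the assumption that only the markup inside the body is wanted: jsoup always builds a full document, so rather than stopping it from adding the wrapper, ask the body element for its inner HTML. The class name FragmentOutput and the helper innerHtml are made up for this example; Jsoup.parseBodyFragment, Document.body() and Element.html() are the jsoup calls doing the work.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class FragmentOutput {

    // Hypothetical helper: parse a fragment and return only the body's inner HTML.
    static String innerHtml(String fragment) {
        Document doc = Jsoup.parseBodyFragment(fragment);
        // Disable pretty printing so jsoup does not re-indent the markup.
        doc.outputSettings().prettyPrint(false);
        // body() is the <body> element jsoup created; html() on an element
        // returns its inner HTML, so the html/head/body wrapper is left out.
        return doc.body().html();
    }

    public static void main(String[] args) {
        String fragment = "<b>This <i>is</i></b> <i>my sentence</i> of text.";
        System.out.println(innerHtml(fragment));
    }
}
```

In the code from the question, this amounts to printing doc.body().html() instead of doc.html(). If the input is well-formed and the HTML normalisation is not needed at all, parsing with Jsoup.parse(html, "", Parser.xmlParser()) may also be worth trying, since the XML parser does not create the html, head and body elements.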

