Problem splitting sentences delimited by double quotes

Anonymous

I have the following data row for a specific column:

```
"['Say , Jim , how about going for a few beers after dinner ? '
' You know that is tempting but is really not good for our fitness . '
' What do you mean ? It will help us to relax . '
"" Do you really think so ? I don't . It will just make us fat and act silly . Remember last time ? ""
"" I guess you are right.But what shall we do ? I don't feel like sitting at home . ""
' I suggest a walk over to the gym where we can play singsong and meet some of our friends . '
"" That's a good idea . I hear Mary and Sally often go there to play pingpong.Perhaps we can make a foursome with them . ""
' Sounds great to me ! If they are willing , we could ask them to go dancing with us.That is excellent exercise and fun , too . '
"" Good.Let ' s go now . "" ' All right . ']"
```
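The doubled quotes (`""`) in this dump are most likely CSV escaping rather than part of the text: inside a quoted CSV field, `""` stands for a single `"`, and pandas' `read_csv` unescapes them the same way by default (`doublequote=True`). A minimal check with Python's standard `csv` module, using a shortened, hypothetical excerpt of the field:

```python
import csv
import io

# Shortened excerpt of the field as it might appear in the raw CSV file
# (hypothetical sample; the real row is longer).
raw_line = (
    "\"['What do you mean ? It will help us to relax . '\n"
    "\"\" Do you really think so ? I don't . \"\"]\"\n"
)

# csv.reader handles the outer quotes, the embedded newline and the "" escapes.
row = next(csv.reader(io.StringIO(raw_line)))
cell = row[0]
print(repr(cell))
```

After parsing, each `""` has become a single `"`, so the cell mixes sentences wrapped in `'...'` with sentences wrapped in `"..."`.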
From this data I want to build a list of comma-separated sentences (each sentence is delimited by quotes). First, I want to replace the single quotes inside words, e.g. in "don't":

```python
import re

import pandas as pd

inside_word_quote_delimiter = r"[\{1}w\'.\{1}w^]"
delimiters = r"[\'.*\'\".*\"]"  # meant to identify two delimiters: a single quote and a double quote

file_path = "./raw_data/train_data.csv"  # Replace with your file path
dataset = pd.read_csv(file_path, encoding='latin')

# select relevant columns
dataset = dataset[["col1", "col2"]]

# replace commas with ';' (the sentences between quotes should end up as list elements)
dataset["col1"] = dataset.apply(lambda x: x["col1"].replace(',', ';'), axis=1)

# remove single quotes inside words (like "don't")
dataset["col1"] = dataset.apply(lambda x: x["col1"].replace(inside_word_quote_delimiter, ' '), axis=1)

# split into list elements
dataset["col1"] = dataset.apply(lambda x: re.split(delimiters, x["col1"]), axis=1)
# take the first value of each row (i.e. col1)
dataset["col1"] = dataset.apply(lambda x: x.iloc[0], axis=1)
dataset.to_csv("./pre_processed_data/pre_processed_train_data.csv", sep=',', encoding="utf_8", index=False)
```
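One detail worth checking in the code above: `str.replace` treats its first argument as a literal substring, not as a regular expression, so the `inside_word_quote_delimiter` replacement most likely leaves the text unchanged. A small sketch contrasting it with `re.sub`; the lookaround pattern for "an apostrophe between two word characters" is a hypothetical choice for illustration, not taken from the post:

```python
import re

text = "I don't . It will just make us fat and act silly ."

# str.replace searches for the literal characters of the pattern, which never
# occur in the text, so nothing is replaced.
literal = text.replace(r"[\{1}w\'.\{1}w^]", " ")

# re.sub interprets its pattern as a regex. Hypothetical pattern: an apostrophe
# with a word character on both sides (the lookarounds keep those letters).
IN_WORD_APOSTROPHE = r"(?<=\w)'(?=\w)"
regex = re.sub(IN_WORD_APOSTROPHE, "", text)

print(literal)
print(regex)
```

Note that a spaced-out apostrophe such as `Let ' s` in the data would not match this pattern, because it has spaces rather than letters on both sides.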
However, I get something like the following:

```
"['[', 'don', 'tSay ; Jim ; how about going for a few beers after dinner ? ', '\n ',
' You know that is tempting but is really not good for our fitness ', ' ', '\n ',
' What do you mean ? It will help us to relax ', ' ', '\n ', ' Do you really think so ? I don', 't ', ' It will just make us fat and act silly ',
' Remember last time ? ', '\n ', ' I guess you are right', 'But what shall we do ? I don', 't feel like sitting at home ', ' ', '\n ',
' I suggest a walk over to the gym where we can play singsong and meet some of our friends ', ' ', '\n ', ' That', 's a good idea ', '
' I hear Mary and Sally often go there to play pingpong', 'Perhaps we can make a foursome with them ', ' ', '\n ',
' Sounds great to me ! If they are willing ; we could ask them to go dancing with us', 'That is excellent exercise and fun ; too ', ' ', '\n ',
' Good', 'Let ', ' s go now ', ' ', ' ', ' All right ', ' ', ']']"
```
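The cuts in this output are consistent with how the `delimiters` pattern is parsed: inside square brackets, `.` and `*` lose their special meaning, so `[\'.*\'\".*\"]` is a character class that matches any single `'`, `.`, `*` or `"`. That would explain why the text is split at every period (`right.But`) and at every apostrophe (`don't`). A minimal reproduction with two fragments of the data:

```python
import re

# The pattern from the question: a character class, i.e. "any one of ' . * \"".
delimiters = r"[\'.*\'\".*\"]"

# The period in "right.But" acts as a split point.
print(re.split(delimiters, "I guess you are right.But what shall we do ?"))

# So do the apostrophe in "don't" and the final period.
print(re.split(delimiters, "I don't ."))
```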
How can I create comma-separated sentences taken only from the quoted parts of the input data, given that the sentences already contain commas and that each sentence is delimited by either a single or a double quote?
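One possible approach, sketched under the assumption that the parsed cell looks like the printed form of an array of strings: elements are separated by whitespace or newlines rather than commas, and an element is wrapped in double quotes when it contains an apostrophe (the way Python's `repr` quotes such strings). Instead of splitting on delimiter characters, the regex below matches whole quoted elements, either `'...'` or `"..."`, so apostrophes inside double-quoted sentences and commas inside sentences are left alone. (Parsing the cell with `ast.literal_eval` would not help either: without commas between the elements, Python concatenates adjacent string literals into one string.) The function name `extract_sentences` and the sample `cell` are hypothetical. Writing the result with the `csv` module quotes any field that contains a comma, so the commas already present in the sentences stay unambiguous.

```python
import csv
import io
import re

# One quoted element: either '...' (no ' inside) or "..." (no " inside).
# Assumption: this holds for every element, as it does in the sample row.
QUOTED_ELEMENT = re.compile(r"'([^']*)'|\"([^\"]*)\"")


def extract_sentences(cell: str) -> list[str]:
    """Return the quoted elements of `cell`, with surrounding spaces stripped."""
    sentences = []
    for single, double in QUOTED_ELEMENT.findall(cell):
        # Only one of the two groups took part in the match; the other is "".
        sentences.append((single or double).strip())
    return sentences


# Shortened sample of the cell after CSV parsing ("" already unescaped to ").
cell = (
    "['Say , Jim , how about going for a few beers after dinner ? '\n"
    "\" Do you really think so ? I don't . It will just make us fat and act silly . Remember last time ? \"\n"
    "\" Good.Let ' s go now . \" ' All right . ']"
)

sentences = extract_sentences(cell)

# csv.writer (QUOTE_MINIMAL by default) wraps fields containing commas in quotes.
buffer = io.StringIO()
csv.writer(buffer).writerow(sentences)
line = buffer.getvalue()
print(line)
```

With the DataFrame from the question, the same function could be applied per cell, e.g. `dataset["col1"] = dataset["col1"].apply(extract_sentences)`.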
