
How can I extract specific fields from a document?

Posted: 28 Jan 2025, 07:01
by Guest
I used pdfplumber to extract text from a PDF, but the result contains several artifacts such as "\n", "\t", and "\u2019", plus a lot of stray whitespace. I need to pass this text to an LLM to extract specific fields. Any idea how to do this?

Code: Select all

CURRICULUM CURRICULUMVITAEVITAE"\n"examplename"\n"E-mail:exampleemail@gmail.com"\n"\uf03f\uf030+91-examplenumber"\n"CAREER CAREEROBJECTIVE:- OBJECTIVE:-"\n"Toworkindynamicenvironmentwhichprovideamplescopetoenrichmylearning"\n"curvebyutilizingmyprofessionalknowledge.Ultimatelycontributingto"\n"organizationalandpersonaldevelopment."\n"EXPERIENCE:-:-"\n"Workedasan\u201cExecutive\u201dwith\u2018examplecompany2\u2019Pvt.Ltd.UdyogVihar,Phase-4,,"\n"Gurgaon,Haryana.Since24thSept2014to25thJan2016."\n"WorkingasSMELeadwithexamplecompanysinceNov2017totilldate,handlingcredit"\n"cardsoutboundprocess."\n"COMPANY&JOBPROFILE:-:-"\n"(OnePointOne)"\n"OnepointOneSolutionisaglobalbusinessservicedproviderintheareaofexperience"\n"management.Weprovideasuiteofsolutionsforoutclients-fromstrategyanddesignto"\n"implementationandexecutionthathelpglobalbrandsdelivermemorableandcustomer"\n"experiences"\n"Jobprofile:IhaveworkedinoutboundprocessandmyprocessnamewasAirtel"\n"Digital.Inthisprocesswehavetocallthecustomerregardingnewchannelsandnew"\n"offerslaunchbyAirtel.Wealsosolvethecustomerqueriesandcomplaintlike-"\n"deductionofamountandchannelactivation."\n"ACADEMIC ACADEMICPROFILE:- PROFILE:-"\n"BABAfromfromUtkalUtkalUniversity, University,Bhubaneswar, Bhubaneswar,OrissaOrissain2012.in2012."\n"Council CouncilofofHigherHigherSecondary SecondaryEducation, Education,OrissaOrissainin2009.2009."\n"BoardBoardofofSecondary SecondaryEducation, Education,OrissaOrissaBoardBoardinin2006.2006."\n"PERSONAL PERSONALDETAILS DETAILS"\n"NameName :: SameerSameerRanjanRanjanRoutRout"\n"DateDateofofbirthbirth :: 0n0n2222ththMarchMarch19891989"\n"GenderGender :: MaleMale"\n"Address Address :: D-2/1,D-2/1,Chattarpur Chattarpur"\n"NewNewDelhi-110068 Delhi-110068FatherFather\u2019\u2019ssnamename :: Mr.Mr.examplefathername"\n"MaritalMaritalstatusstatus :: Unmarried Unmarried"\n"Nationality Nationality :: IndianIndian"\n"LeisureLeisuredoingdoing :: PlayingPlaying&&Watching WatchingCricketCricket"\n"Language 
Languageknownknown :: EnglishEnglish,,Hindi,Hindi,OriyaOriya"\n"DECLARATION DECLARATION"\n"Ifindmyselfasanenthusiasticandambitiouspersonality.Asmyhardworkingnature"\n"anddeterminationaremybigassets.Managingskillsandcreativemindaremyadded"\n"qualities."\n"Iherebydeclarethattheinformationfurnishedabovebymeistruetothebestof"\n"myknowledge."\n"DateDate:: Signature Signature((\u2026\u2026\u2026\u2026\u2026\u2026\u2026\u2026\u2026 \u2026\u2026\u2026\u2026\u2026\u2026\u2026\u2026\u2026))"\n"PlacePlace::NewNewDelhiDelhi Name:Name:exampleexamplenamename"
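For a dump like the one above, one possible pre-processing step before the text goes to an LLM is to normalize the escape debris. The following is only a minimal sketch using the Python standard library. It assumes the `\n`, `\t`, and `\uXXXX` sequences are literal backslash sequences in the string (as they would be after `repr()` or JSON serialization); the function name `clean_extracted_text` is hypothetical.

```python
import re
import unicodedata

# Literal backslash-u escapes such as \u2019 (six characters in the string).
_UNICODE_ESCAPE = re.compile(r"\\u([0-9a-fA-F]{4})")
# Literal \n or \r escapes, optionally wrapped in double quotes as in the dump.
_NEWLINE_ESCAPE = re.compile(r'"?\\[nr]"?')
# Literal \t escapes, optionally wrapped in double quotes.
_TAB_ESCAPE = re.compile(r'"?\\t"?')
# Unicode categories to drop: private use (icon-font glyphs), controls, lone surrogates.
_DROPPED_CATEGORIES = ("Co", "Cc", "Cs")


def clean_extracted_text(raw: str) -> str:
    """Normalize escape debris and whitespace in text extracted from a PDF."""
    # 1. Turn literal \uXXXX escapes into the characters they stand for.
    text = _UNICODE_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), raw)
    # 2. Turn literal newline/tab escapes into a real newline or a space.
    text = _NEWLINE_ESCAPE.sub("\n", text)
    text = _TAB_ESCAPE.sub(" ", text)
    # 3. Fold compatibility characters (ligatures, the ellipsis character) into plain forms.
    text = unicodedata.normalize("NFKC", text)
    # 4. Drop private-use and control characters, keeping real newlines and tabs.
    text = "".join(
        ch for ch in text
        if ch in "\n\t" or unicodedata.category(ch) not in _DROPPED_CATEGORIES
    )
    # 5. Collapse runs of spaces/tabs, trim each line, and drop empty lines.
    lines = (re.sub(r"[ \t]+", " ", line).strip() for line in text.split("\n"))
    return "\n".join(line for line in lines if line)


if __name__ == "__main__":
    sample = 'examplename"\\n"E-mail:exampleemail@gmail.com"\\n"\\uf03f\\uf030+91-examplenumber'
    print(clean_extracted_text(sample))
```

This sketch does not repair the doubled words ("CURRICULUM CURRICULUM") or the missing spaces between words. Those usually come from the extraction step itself; pdfplumber's `Page.dedupe_chars()` and the `x_tolerance` argument of `extract_text()` may be worth trying for that, though whether they help depends on the PDF.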
I tried passing the entire text to a local LLM and expected structured output, but the LLM obviously started hallucinating.
 ```json
{
"name": "examplename",
"email": "examplenumber@gmail.com",
"phone": "9718127215",
"location": "New Delhi",
"highest_qualification": "BA",
"gender":"Male",
"marital_status": "Unmarried",
"current_company": [
{
"company_name": "companyexample",
"designation": "SME Lead",
"duration": "2017-present"
}
],
"education": ["BA","HSC"],
"skills": [ "Customer Support","Credit Cards"],
"experience": [
{
"position": "Executive",
"company": "companyexample2",
"duration": "2014-2016"
}
]
}
```
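A JSON result like the one above can be checked against the extracted text to flag values that do not occur in the source. The following is a rough sketch of such a check, not an established method. The helper names are hypothetical, and the comparison is a plain substring test on text reduced to lowercase letters and digits.

```python
import json
import re


def _normalize(s: str) -> str:
    """Lowercase and drop everything except letters and digits."""
    return re.sub(r"[\W_]+", "", s.lower())


def _iter_strings(node, path=""):
    """Yield (path, value) for every string leaf in parsed JSON."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from _iter_strings(value, f"{path}.{key}" if path else key)
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from _iter_strings(value, f"{path}[{index}]")
    elif isinstance(node, str):
        yield path, node


def find_ungrounded_fields(llm_json: str, source_text: str) -> list[str]:
    """Return the JSON paths whose string values never occur in the source text."""
    haystack = _normalize(source_text)
    flagged = []
    for path, value in _iter_strings(json.loads(llm_json)):
        needle = _normalize(value)
        # Empty values carry no claim; everything else must appear in the source.
        if needle and needle not in haystack:
            flagged.append(path)
    return flagged


if __name__ == "__main__":
    source = "examplename E-mail:exampleemail@gmail.com +91-examplenumber Gender :: Male"
    answer = '{"name": "examplename", "phone": "9718127215", "gender": "Male"}'
    print(find_ungrounded_fields(answer, source))
```

A value such as a phone number that never occurs in the extracted text would be flagged. Values the model has legitimately rephrased (for example a duration written as "2017-present") would be flagged too, so the result is a list of fields to review, not proof of hallucination.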
I'm new to this, so any help would be appreciated.