How can I extract specific fields from a document?
Posted: 28 Jan 2025, 07:01
I used pdfplumber to extract text from a PDF, but the result contains escape sequences such as "\n", "\t" and "\u2019", plus a lot of stray whitespace. I need to pass this text to an LLM to extract specific fields. Any idea how to do this?
I'm new to this, so any help would be appreciated.
CURRICULUM CURRICULUMVITAEVITAE"\n"examplename"\n"E-mail:exampleemail@gmail.com"\n"\uf03f\uf030+91-examplenumber"\n"CAREER CAREEROBJECTIVE:- OBJECTIVE:-"\n"Toworkindynamicenvironmentwhichprovideamplescopetoenrichmylearning"\n"curvebyutilizingmyprofessionalknowledge.Ultimatelycontributingto"\n"organizationalandpersonaldevelopment."\n"EXPERIENCE:-:-"\n"Workedasan\u201cExecutive\u201dwith\u2018examplecompany2\u2019Pvt.Ltd.UdyogVihar,Phase-4,,"\n"Gurgaon,Haryana.Since24thSept2014to25thJan2016."\n"WorkingasSMELeadwithexamplecompanysinceNov2017totilldate,handlingcredit"\n"cardsoutboundprocess."\n"COMPANY&JOBPROFILE:-:-"\n"(OnePointOne)"\n"OnepointOneSolutionisaglobalbusinessservicedproviderintheareaofexperience"\n"management.Weprovideasuiteofsolutionsforoutclients-fromstrategyanddesignto"\n"implementationandexecutionthathelpglobalbrandsdelivermemorableandcustomer"\n"experiences"\n"Jobprofile:IhaveworkedinoutboundprocessandmyprocessnamewasAirtel"\n"Digital.Inthisprocesswehavetocallthecustomerregardingnewchannelsandnew"\n"offerslaunchbyAirtel.Wealsosolvethecustomerqueriesandcomplaintlike-"\n"deductionofamountandchannelactivation."\n"ACADEMIC ACADEMICPROFILE:- PROFILE:-"\n"BABAfromfromUtkalUtkalUniversity, University,Bhubaneswar, Bhubaneswar,OrissaOrissain2012.in2012."\n"Council CouncilofofHigherHigherSecondary SecondaryEducation, Education,OrissaOrissainin2009.2009."\n"BoardBoardofofSecondary SecondaryEducation, Education,OrissaOrissaBoardBoardinin2006.2006."\n"PERSONAL PERSONALDETAILS DETAILS"\n"NameName :: SameerSameerRanjanRanjanRoutRout"\n"DateDateofofbirthbirth :: 0n0n2222ththMarchMarch19891989"\n"GenderGender :: MaleMale"\n"Address Address :: D-2/1,D-2/1,Chattarpur Chattarpur"\n"NewNewDelhi-110068 Delhi-110068FatherFather\u2019\u2019ssnamename :: Mr.Mr.examplefathername"\n"MaritalMaritalstatusstatus :: Unmarried Unmarried"\n"Nationality Nationality :: IndianIndian"\n"LeisureLeisuredoingdoing :: PlayingPlaying&&Watching WatchingCricketCricket"\n"Language 
Languageknownknown :: EnglishEnglish,,Hindi,Hindi,OriyaOriya"\n"DECLARATION DECLARATION"\n"Ifindmyselfasanenthusiasticandambitiouspersonality.Asmyhardworkingnature"\n"anddeterminationaremybigassets.Managingskillsandcreativemindaremyadded"\n"qualities."\n"Iherebydeclarethattheinformationfurnishedabovebymeistruetothebestof"\n"myknowledge."\n"DateDate:: Signature Signature((\u2026\u2026\u2026\u2026\u2026\u2026\u2026\u2026\u2026 \u2026\u2026\u2026\u2026\u2026\u2026\u2026\u2026\u2026))"\n"PlacePlace::NewNewDelhiDelhi Name:Name:exampleexamplenamename"
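For the escape sequences and stray whitespace, a minimal cleanup sketch using only the Python standard library could look like the following. It assumes the string holds real characters (newlines, tabs, curly quotes) and that the `\n` and `\u2019` forms only show up when the string is printed with `repr`. The function name `clean_extracted_text` is made up for illustration.

```python
import re
import unicodedata


def clean_extracted_text(raw: str) -> str:
    """Normalize text extracted from a PDF before sending it to an LLM."""
    # NFKC folds compatibility characters (ligatures, the ellipsis,
    # non-breaking spaces) into their plain equivalents.
    text = unicodedata.normalize("NFKC", raw)
    # Curly quotes survive NFKC, so map them to ASCII quotes explicitly.
    text = text.translate(str.maketrans({
        "\u2018": "'", "\u2019": "'",
        "\u201c": '"', "\u201d": '"',
    }))
    # Drop private-use code points such as \uf03f (typically icon-font glyphs).
    text = "".join(ch for ch in text if unicodedata.category(ch) != "Co")
    # Tabs and runs of spaces become a single space.
    text = re.sub(r"[ \t]+", " ", text)
    # Trim every line and drop the empty ones.
    lines = [line.strip() for line in text.splitlines()]
    return "\n".join(line for line in lines if line)
```

This only handles character-level debris. It does not restore the missing spaces between words or undo the doubled words visible in the sample. Those come from the extraction step itself, so pdfplumber's `Page.dedupe_chars()` and the `x_tolerance` argument of `extract_text()` may be worth trying there.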
I tried passing the entire text to a local LLM and expected structured output, but the LLM obviously started hallucinating.
```json
{
  "name": "examplename",
  "email": "examplenumber@gmail.com",
  "phone": "9718127215",
  "location": "New Delhi",
  "highest_qualification": "BA",
  "gender": "Male",
  "marital_status": "Unmarried",
  "current_company": [
    {
      "company_name": "companyexample",
      "designation": "SME Lead",
      "duration": "2017-present"
    }
  ],
  "education": ["BA", "HSC"],
  "skills": ["Customer Support", "Credit Cards"],
  "experience": [
    {
      "position": "Executive",
      "company": "companyexample2",
      "duration": "2014-2016"
    }
  ]
}
```
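One cheap way to catch hallucinated values is to check whether each extracted string actually occurs in the source text. The sketch below is only an illustration under stated assumptions: the helper names `_squash` and `ungrounded_fields` are hypothetical, only top-level string fields are checked, and matching ignores case, spaces and punctuation so that run-together extraction output still matches.

```python
import json
import re


def _squash(value: str) -> str:
    """Lowercase and drop everything except ASCII letters and digits."""
    return re.sub(r"[^a-z0-9]", "", value.lower())


def ungrounded_fields(llm_json: str, source_text: str, fields: list[str]) -> list[str]:
    """Return the fields whose string value does not occur in the source text."""
    data = json.loads(llm_json)
    haystack = _squash(source_text)
    missing = []
    for field in fields:
        value = data.get(field)
        # Only plain string fields are checked in this sketch.
        if isinstance(value, str) and _squash(value) not in haystack:
            missing.append(field)
    return missing


# Shortened stand-ins for the extracted text and the LLM reply.
source = "examplename\nE-mail:exampleemail@gmail.com\n+91-examplenumber"
reply = '{"name": "examplename", "email": "examplenumber@gmail.com", "phone": "9718127215"}'
print(ungrounded_fields(reply, source, ["name", "email", "phone"]))
# prints ['email', 'phone']
```

A field flagged this way is not necessarily wrong (the model may have reformatted a value, such as a date range), but it is a reasonable candidate for a retry or manual review.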