Smart extract from PDF to Excel
Posted on 6/19/17 at 2:55 pm
Ok so I have multiple PDF files (scans) whose data someone enters manually into an Excel spreadsheet, and it's time consuming.
Is there a way to extract that data into Excel (after OCR, of course)? What kind of scripts would I need to run?
The data in the original file is "disorganized" and not in a table format, and I would only need to extract certain data.
For example, let's say the original file looks like this, and the Excel file below is the end product.
Shite sounds impossible (or at the very least like a lot of work) to me, but I figured I'd ask.
Posted on 6/19/17 at 3:58 pm to castorinho
How clean is the OCR extraction? If it's okay and consistent, you can search through the file and categorize the data by tags or triggers.
For example, if the OCR always has the text "Date" and then an accurate "Date" after that, you can scan the file for "Date" and then write the following data.
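A minimal Python sketch of that tag/trigger idea, assuming the OCR output keeps each label and its value on the same line (e.g. "Date: 06/19/2017"). The sample text and the function name `extract_after_label` are hypothetical:

```python
import re

def extract_after_label(ocr_text: str, label: str) -> list[str]:
    """Return the non-empty text that follows `label` on each line where it appears."""
    # Match the label as a whole word, skip spaces/tabs/colons, capture the rest of the line.
    pattern = re.compile(rf"\b{re.escape(label)}\b[ \t:]*(.*)")
    values = []
    for line in ocr_text.splitlines():
        match = pattern.search(line)
        if match and match.group(1).strip():
            values.append(match.group(1).strip())
    return values

# Hypothetical OCR output from one scan.
sample = """Invoice 1043
Date: 06/19/2017
Customer Name  Jane Doe
"""
print(extract_after_label(sample, "Date"))  # ['06/19/2017']
```

A label like "Due Date" would also match "Date", so the real labels may need a tighter pattern.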
Posted on 6/19/17 at 4:19 pm to LSUtigerME
I just played with one earlier and the OCR is pretty clean.
quote:
you can scan the file for "Date" and then write the following data.
How do I link it to the Excel file?
This post was edited on 6/19/17 at 4:23 pm
Posted on 6/22/17 at 3:48 pm to castorinho
The OCR result is simply text, so the next step would be to save it and parse the "Factor" values by locating them with something like a regular expression, then grab the values to their right. These could then be written to a .csv or .xls file. There is no utility you can buy that does exactly what you are looking for; it would need to be custom coded.
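A rough Python sketch of that regex-then-CSV approach. The line layout ("Load Factor    1.25"), the sample values and the name `factors_to_csv` are assumptions, since the real scan layout isn't shown here:

```python
import csv
import io
import re

# Hypothetical layout: a number sits to the right of a label containing "Factor"
# on the same line, e.g. "Load Factor    1.25" or "Wind Factor: 0.9".
FACTOR_RE = re.compile(
    r"^(?P<label>.*?\bFactor\b)[ \t]*:?[ \t]*(?P<value>-?\d+(?:\.\d+)?)",
    re.MULTILINE,
)

def factors_to_csv(ocr_text: str, out) -> int:
    """Write (label, value) rows for every "Factor" line to a CSV stream; return the row count."""
    writer = csv.writer(out)
    writer.writerow(["label", "value"])
    rows = 0
    for m in FACTOR_RE.finditer(ocr_text):
        writer.writerow([m.group("label").strip(), m.group("value")])
        rows += 1
    return rows

sample = "Project 12\nLoad Factor    1.25\nWind Factor: 0.9\nNotes: none\n"
buf = io.StringIO()
print(factors_to_csv(sample, buf))  # 2
```

To write a real file, pass `open("factors.csv", "w", newline="")` instead of the `StringIO` buffer; Excel opens .csv files directly. Writing .xlsx instead would need a third-party library such as openpyxl.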
Posted on 6/22/17 at 3:57 pm to castorinho
A-PDF Data Extractor: https://www.a-pdf.com/data-extractor/index.htm
Posted on 6/22/17 at 9:21 pm to Scream4LSU
If you save the OCR as a Word document or .txt, you should be able to use Excel VBA and write a macro to search the text.
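The same text search could also be done outside Excel. As a hedged alternative to a VBA macro, here is a Python sketch that reads every saved OCR .txt file in a folder and writes one row per scan; the field labels in `FIELDS` and the function names are hypothetical placeholders:

```python
import csv
import re
from pathlib import Path

# Hypothetical field labels; replace them with the labels that appear in the real scans.
FIELDS = ["Date", "Customer", "Total"]

def find_field(text: str, label: str) -> str:
    """Return the text after the first `label` on its line, or "" if the label is missing."""
    m = re.search(rf"\b{re.escape(label)}\b[ \t:]*(.*)$", text, re.MULTILINE)
    return m.group(1).strip() if m else ""

def folder_to_csv(txt_dir: Path, out_csv: Path) -> int:
    """Build one spreadsheet row per OCR .txt file in `txt_dir`; return the number of rows."""
    files = sorted(txt_dir.glob("*.txt"))
    with out_csv.open("w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["file"] + FIELDS)
        for path in files:
            text = path.read_text(encoding="utf-8", errors="replace")
            writer.writerow([path.name] + [find_field(text, label) for label in FIELDS])
    return len(files)
```

A missing label leaves an empty cell, so those rows can be spot-checked by hand instead of retyping every scan.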
Posted on 6/23/17 at 8:36 am to LSUtigerME
quote:
If you save the OCR as a Word document or .txt, you should be able to use Excel VBA and write a macro to search the text.
Looks like this is the best option, but in the end it might be just as time consuming as entering it manually.
Thanks for all the replies.