Macro that searches for keywords in a PDF and returns the page number - excel

I want an Excel macro that searches for words in a PDF and returns the page number where each word is found. I have 20 words to search for. I have put the keywords in column A of the Excel spreadsheet and want the page numbers populated in column B. Please note that I am currently using Adobe Reader XI, so the code needs to work with Adobe Reader XI.

This is more of a direction than an answer.
Try searching for command-line tools that export OCR data into a text file. I've looked at them before, and a few gave me the option of looking at a particular page of a PDF. All the ones I evaluated required a purchase (I was trying to OCR a barcode and could not find a free tool for that), but there are some free ones out there for plain text.
Doing this in Excel will make the project harder, though. I would look at using PowerShell or some other scripting language and exporting the results to a CSV file.
Hope this helps.
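As a rough illustration of the "script plus CSV" route, here is a minimal Python sketch. It assumes the text of each page has already been extracted into a list of strings by whatever tool you end up using; the function names `find_keyword_pages` and `write_csv` and the sample data are made up for the example.

```python
import csv
import io


def find_keyword_pages(pages, keywords):
    """Map each keyword to the 1-based page numbers whose text contains it."""
    hits = {}
    for keyword in keywords:
        needle = keyword.casefold()  # case-insensitive comparison
        hits[keyword] = [
            number
            for number, text in enumerate(pages, start=1)
            if needle in text.casefold()
        ]
    return hits


def write_csv(hits, stream):
    """Write one row per keyword: the keyword, then its page numbers."""
    writer = csv.writer(stream)
    writer.writerow(["keyword", "pages"])
    for keyword, numbers in hits.items():
        writer.writerow([keyword, " ".join(str(n) for n in numbers)])


# Hypothetical sample: one string per page of already-extracted text.
sample_pages = ["Invoice total and tax", "Shipping terms", "Tax summary"]
result = find_keyword_pages(sample_pages, ["tax", "shipping", "refund"])

buffer = io.StringIO()  # stands in for a file opened with newline=""
write_csv(result, buffer)
print(buffer.getvalue())
```

The resulting CSV opens directly in Excel, with the keywords in column A and the page numbers in column B.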

Related

How can I extract text from a PDF to Excel

I have a scanned image that I converted to a PDF file. The image contains rows and columns (a table), and I want to extract the text from that table into an Excel file. Any ideas? Is there a good website, tool, or program I can use?
I have tried a lot of websites to extract the text, but none of them worked.
Do you have Microsoft Excel? If you do, first convert the PDF to a JPEG. Then, in Microsoft Excel:
1) Create a new document.
2) Go to the Data tab.
3) Choose "Data From Picture".
4) Choose "Picture From File".
You'll see a few instructions. Follow them to finish converting the picture to a table.
You'll also have the option of correcting any inaccuracies before the data is added to your spreadsheet.
That's all!

Extracting specific data from a pdf into excel

I need to extract selected data from a PDF form into Excel. Eventually, the extracted data will feed an Excel table as part of an additional calculation.
I am hoping to automate this process, so I tried importing the PDF file into Excel using Power Query. Unfortunately, each time I load the PDF, I get the message "Page is blank".
After some initial searching, I found out that this may be because of the way the PDF file was originally built (it was not a table converted to a PDF).
I went back and converted the PDF file into a spreadsheet. Now I can see the data I need in Excel, but it requires a lot of cell formatting and rearranging.
I would really like to know whether there is an alternative way to solve this problem. More importantly, I'd very much appreciate any ideas or recommendations on how best to tackle this task, since I have to repeat the same process 30+ times.
Also, I have very little coding experience or knowledge.
Thank you so much.

Import pdf into specific worksheet

There are lots of topics about this, but I wasn't able to find an answer to my question. I want to select a PDF file and import all the text from it into a specific sheet, let's call it Sheet2. Please note that this is a new PDF file every day, so it cannot be read from a fixed location; the file has to be selected each day.
Any ideas?
If you can waive the restriction of using only VBA, you could use iText (combined with Apache POI). iText is more than capable of extracting the text from a PDF file, and the Apache POI library allows you to generate Office documents (such as Excel workbooks).

Google Script for spreadsheet to extract data from a PDF page linked on the sheet

I have a series of PDF files uploaded to Google Drive (and also stored on my computer), linked from different rows of a Google or Excel spreadsheet. Each row has a distinct PDF file linked to it. What I want to figure out is a way to extract five rows of data (not a table) from the PDF and add them to certain columns on the sheet:
Here's a sample pdf:
https://www.dropbox.com/s/2j7pqeja38jxmzc/Sample.pdf?dl=0
The sheet looks like this:
https://www.dropbox.com/s/40u1n7umacd74kw/Sample%20sheet.xlsx?dl=0
So the process would be: Excel opens the file linked in row 1, extracts the data needed, then adds the data to certain columns in the Excel/Google spreadsheet.
I was just wondering if this is possible. The PDF has lots of pages, but I only need data from a single page in it.
If this doesn't work in an Excel/Google spreadsheet, any suggestions on how I can automate this process?
PS: I'm not asking for the exact way to do it, because I know that's a violation here. I just want to know whether this is possible and can be done in Excel or a Google spreadsheet. If not, any suggestion will help greatly. Thanks!
Yes, it's possible, but it depends a lot on the PDF, which I imagine would be the biggest hurdle. You'll probably find this answer is at least relevant, if not exactly what you're looking for.
Otherwise, if everything is stored in Drive, it's just a question of:
1) Looping through the sheet and opening the doc you want.
2) Getting the content of the PDF (Probably as a string).
3) Finding a consistent way to cut the relevant data from the PDF (This depends a lot on the content of the PDFs).
4) Pasting the data to the sheet.
Number 3 might be your biggest challenge, but once you get started you might find it a lot easier than you'd think.
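To give a feel for step 3, here is a minimal Python sketch rather than Apps Script. It assumes the PDF content is already available as a string and that the wanted values sit on lines shaped like "Label: value"; the function `extract_fields`, the labels, and the sample text are all hypothetical, and a real PDF may need a different pattern.

```python
import re


def extract_fields(text, labels):
    """Return {label: value} for lines shaped like 'Label: value'."""
    fields = {}
    for label in labels:
        # Anchor at the start of a line so "Name" does not match "Username".
        pattern = rf"^\s*{re.escape(label)}\s*:\s*(.+?)\s*$"
        match = re.search(pattern, text, flags=re.MULTILINE | re.IGNORECASE)
        fields[label] = match.group(1) if match else None
    return fields


# Hypothetical text as it might come out of step 2.
sample_text = """Report
Name: Jane Doe
Account : 12345
Balance: 99.50
"""

row = extract_fields(sample_text, ["Name", "Account", "Balance", "Region"])
print(row)
```

Labels that are not found come back as `None`, which makes it easy to spot PDFs that do not follow the expected layout before anything is pasted into the sheet.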

Converting handwritten number to text

I need to convert a pdf form that contains a column of handwritten numbers to text and populate an excel spreadsheet.
Does anyone know of a program or a solution to solve this problem?
Thanks in advance.
Edit:
I have tried programs like pdfcompressor, but it's returning random symbols. I'm assuming numbers should be easier to convert than arbitrary letters.
If you have a version of Microsoft Office from XP to 2007, you can use Microsoft Office Document Imaging. It is a PDF viewer-like program. Once you open your image file, you can use the mouse pointer to select and highlight sections of the image. You can then copy and paste the highlighted section into Excel using the built-in OCR software.
You'd need an OCR program (search for "OCR") to interpret the handwritten text and numbers. But that would only give you a raw text or .doc file, not an Excel sheet. You'd need to move the numbers across manually, which might still be better than keying them in if you're looking at a very large list.
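If the OCR step does produce a raw text file, moving the numbers across does not have to be manual. The following is a minimal Python sketch, assuming the raw text is already in a string and that the values are plain unsigned integers or decimals; the names `numbers_from_ocr_text` and `write_column` and the sample text are invented for the example.

```python
import csv
import io
import re

# Unsigned integers or decimals such as "12" or "7.5" (an assumption about the form).
NUMBER = re.compile(r"\d+(?:\.\d+)?")


def numbers_from_ocr_text(raw_text):
    """Return the numeric tokens found in the text, in reading order."""
    return NUMBER.findall(raw_text)


def write_column(values, stream):
    """Write the values as a single CSV column, one value per row."""
    writer = csv.writer(stream)
    for value in values:
        writer.writerow([value])


# Hypothetical OCR output with stray whitespace and a misread line.
sample = "12\n 7.5\nabc\n 300 \n"
values = numbers_from_ocr_text(sample)

buffer = io.StringIO()  # stands in for a file opened with newline=""
write_column(values, buffer)
print(buffer.getvalue())
```

The CSV can then be opened in Excel as a single column. Misread digits still need a manual check, since the script only filters out what is clearly not a number.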
ABBYY FineReader would be the first place to start. It supports both machine-printed and hand-printed OCR and comes with a nice GUI. You should be able to download a trial version from www.abbyy.com. It can export to all sorts of formats. If you need an SDK, Kadmos from www.rerecognition.com supports hand-print and machine-print OCR.
