Add text to pdf using Excel -VBA - excel

I don't have much knowledge about VBA.
But I have a problem which I think can be solved with VBA.
I have a PDF file of 400 pages. I have an excel with page numbers and some text. Now I want this text to be copy pasted (Add Text under drawing markup in PDF tools) in the PDF.
I can do it manually but it will take 3 to 4 days. so can anybody help me and make my work easier. I wanted to do this in Excel-VBA.
I have 2013 Excel and Acrobat xi Pro.

It depends.
If the pdf has forms in it, you are of course able to fill them in a programmatic way.
If your document does not contain forms you are not going to be able to solve this problem in a trivial manner.
Why, I hear you ask?
PDF documents, despite their reputation are more like containers of instructions than they are a WYSIWYG format
instructions are bundled in groups called "objects"
objects can be compressed (DEFLATE) into streams
objects are indexed so they can be re-used (this is called the xref)
the index uses byte-offsets to get a grip on which object is where in the document
Now what would happen if you wanted to add a single character somewhere in the document
you would need to decode the streams to figure out where you're actually placing content
Once you've found the right stream, and you've inserted your character, you have also screwed up the xref table.
Nothing will work anymore

Related

How can I edit a DXF in node.js?

I'd like to make a custom lasered label from a user's input on a website. I have a template dxf file and I'd like to replace placeholder text with the user input. My problem is the dxf file format is very unreadable in its text format. Is there any way to make sense of the numeric data? If not are there any other formats (svg, etc) that would be easier to work with?
EDIT: The reason I've found it unreadable in terms of text is that the program (Solidworks) converted the text to curves.) At this point I'm trying to figure out how to prevent that.
AutoDesk was nice enough to document DXF syntax in great detail. Spend a couple hours understanding the documentation from the link below, and I think you will find it quite easy to parse and edit using code.
To just replace some placeholder text, it should be just as simple as reading the DXF file into a string (a dxf file is no different than a txt file), performing a text replace operation and saving it back to file. Just make sure that your placeholder text is very unique and is not contained in any of the key words in the document below (otherwise your DXF file will get corrupted). Something like "PlaceHolderText" will do the trick.
http://images.autodesk.com/adsk/files/autocad_2012_pdf_dxf-reference_enu.pdf
Edit: More Info
I do a lot of work with AutoDesk Inventor which is in direct competition with SolidWorks, so they are effectively the same tool. We were faced with a similar problem of needing to place text onto sheet metal flat pattern DXFs that came out of Inventor in order to identify the part, but Inventor simply could not do it (see, exactly the same!). One of our developers had the idea to place a very precise geometry punch onto the flat pattern. After the DXF was generated he wrote some code that parsed the DXF file and replaced the geometry with a text entity. More specifically we used a triangle with sides having each length defined to something like the 7th decimal place. You can then use one of the vertices of the triangle to position the text, including rotation. This process would be automatic, so once you write the code with the help of the document above (which won't take the long), it will just work. If your engraver can handle text the way you want it, I'd say this is a very good solution. We generate hundreds of parts every day using this code. Hope this helps.

CSV to Excel Doc?

I was curious what the community thinks is the easiest way to take a CSV file and 'save as' a Excel document with only a couple formulas pasted in?
I am trying to do this behind the scenes, and not physically navigating. e.g. opening, selecting save As, etc -- even though this is already VERY simple I **need to do this in code (Think automation)
Background: I have a c++ command line program generating the .csv, and a C# GUI starting this process. Either programs could hold the code, but I figure this is easiest in C# (InterOp?) The reasons I don't directly send code into the csv is because of the amount of comma characters that will mess up the csv and because other Excel documents need to reference the sheets so they need to be in .xls format.
=AVERAGE(C2:C999)
=COUNTIFS(C:C,">0",C:C,"<31")
=COUNTIFS(C:C,">31",C:C,"<55")
=COUNTIF(C:C,">55")
Have a look and see whether command-line scipting of openoffice will do the job. It can do quite a lot of conversions very easily. Otherwise there are a lot of Excel-producing libraries, for example PHPExcel, but you'd need to wrap some programming around them.

Extracting a page from a pdf using a keyword

OK so im quite sure i cant do this with excel vb or for free even.
Im writing these macros for work, and one of them needs to be able to chose a pdf based on keywords.
Then go into the pdf, search either the titles of the pages or the text on the pages themselves using a different set of keywords.
When it finds the page that matches one of the keywords in the second set, it will extract the whole page, as is, to a single page pdf.
This can then be attached to email.
This will be only a small part of the purpose of the macro.
From what i understand, im probably going to have to find an SDK, pay for it, and write a separate program in C# or VisualBasic which is run when the macro needs.
I dont even need the code, maybe just a point in the right direction :D
In the end i got a program called pdftk.exe, free, and runs in command line.
With this i can export the Bookmarks listing to txtfile.
Search text file for string/keywords.
Jump down a line or two and grab associated page number.
Then use the same exe to extract that page and save as specific name, then my vba macro can grab that newly created 1 page pdf.
Ive seen code on this site for creating delays while another process is doing its thing, so i will try to implement that also.

add a duplicate (hidden) text layer to a pdf for extra searching

My problem:
I have a pdf with lots of roman characters with complex diacritical marks (e.g., ṣ, ś, ṝ, ǎ, etc.). To make it easier to search within the pdf, I would like to add an additional layer, much as one does with hocr, where the same text is present without the diacritics.
When using full-text search engines I can index multiple terms at the same position (vector) - I would like to achieve the same effect here.
I have read lots about adding a hocr layer to scanned images, but I really just want to duplicate the text layer, pass it through a script that strips the diacritics (straightforward enough) and then adds it back in as a hidden but searchable layer.
Anyone have any suggestions? (Solutions involving any platform, language, library or toolchain will be useful!)
Thanks :)
Edit: please let me know if the question is unclear.
Well I have a (slightly ugly and hackish) solution, so I thought I'd share it.
I'm using PDFMiner to extract the text, along with the co-ordinates. Then I'm using ReportLab to write the normalized versions of the text to a new pdf, in exactly the same position, as hidden text. To make the positions line up properly, I found I had to use exactly the same font, so I've used a combination of FontForge and MuPDF to extract the required font(s) from the original pdf.
Finally, having created the new pdf, I'm using pdftk to merge it with the original.
It works pretty well, but has the downside that copying text out of the pdf results in the normalized text being copied too. But this is acceptable for my present purposes, and I can't see any way around it. The pdf spec. doesn't really support my objective, and so I don't imagine I can do better than this hackish solution.
I have written something similar to add searchable text by OCR'ing images and converting it to PDF in C#. I used QuickPDF from www.quickpdf.com to create hidden white text objects on top of the image and this worked reasonably well.
In your case QuickPDF would allow you to extract the text strings along with bounding boxes and font details. You could then normalize your text and create the invisible text objects using the existing font and position information and then save it out to a new file.
This would basically give you the same PDF as you have now and also give you both the original and normalised text as you are getting now.
QuickPDF is a commercial library. If your solution works well for you then there is no used buying a commercial engine though. The nice thing though is that it only requires 1 SDK and you would look at it if you had a more than a few PDF's to convert.

Converting troublesome delimited PDF to Excel

I'm trying to convert this delimited PDF to an excel (or some other delimited format). Using Adobe Acrobat 9, I attempt to save it and copy it) as Excel but it gives the error message "BAD PDF; error in processing fonts. [348]".
I'm open to any solution that will create a delimited file, ranging from using Adobe Acrobat, to programming to using other apps. The only limitation is that I don't have a budget to buy anything (such as Able2Extract).
The way I was able to export my images and fonts without buying any extra software to do the conversion was this way. go to Advanced, PDG Optimizer, select all of the options you want on the LEFT COLUMN and where it says MAKE COMPATIBLE WITH select Acrobat 8.0 and later, OK....you are in your road to success
Note: not really an answer, but some suggestions.
Sounds to me that Crystal Reports is not following the PDF spec close enough.
I'd make sure CR is fully updated/patched and try genning another file making sure that "tagging" is enabled - tagging defines the layout structure. I don't have a copy of CR handy, but you may have to define a distiller template to use so when you print to PDF you can select that job option.
You can also tell its a bad PDF by using Preflight in Acrobat, it says there is no tag structure and you can do it manually (draw boxes around each item...). Also that there is no language set, and it is somehow compatible with Acrobat 1.3? which isn't supported anymore and should be 4 at the lowest?
Once you have a "good" pdf can export to xml/word and import that to excel. Also, with Acrobat 8+ you can highlight using the select tool, right-click and choose Open As SpreadSheet. You might be able to get away with just highlighting the whole document -- though I'd hope the xml approach would be best.
Able2Extract does some OCRing and tricky fuzzy logic not only to define tags/layout so it is exportable, but also avoids any font, encoding, etc issue - at least to my knowledge.
In the rare case that you can't get a new file, then exporting to plain text/accessible seems to generate a nice flat text file. You could write a vbscript to parse it (adding your delimiter) and import that into excel.

Resources