Make an Excel table from a PDF product catalog (Orange)

I'm new to this forum and to Orange.
I don't really know Python at this point, but I am ready to learn.
However, before going further in this environment, I would like to know whether it can meet my needs.
What I am basically doing is "transforming" PDF product catalogues into Excel files that one program uses to create a database for another program.
I have tile catalogues in PDF just like this one:
and I want to turn them into this type of XLS table: http://imgur.com/BtLBkOS
I basically need it to retrieve the article number, the colour and the size (e.g. 20x20). The G/B parts are filled in manually afterwards.
Not all catalogues have the same layout, so I have handled some of them with pdftotext and regular expressions in Notepad++.
But could this data mining tool handle the job?

Orange does not support reading PDF files. You will have to use specialized utilities or program it yourself.
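As a rough illustration of the "program it yourself" route, combining the pdftotext and regex steps from the question, here is a minimal Python sketch. It assumes poppler's pdftotext is installed and that each product appears on one text line as article number, colour, size (e.g. "4521 White 20x20"); that layout, ROW_PATTERN and the function names are hypothetical, and every catalogue will need its own pattern. It writes CSV, which Excel opens directly, rather than a native .xls file.

```python
import csv
import re
import subprocess

# Hypothetical line layout: "<article number> <colour> <width>x<height>",
# e.g. "4521 White 20x20". Each catalogue will need its own pattern.
ROW_PATTERN = re.compile(
    r"^\s*(?P<article>\d{4,})\s+"          # article number: 4 or more digits
    r"(?P<colour>[A-Za-z][A-Za-z ]*?)\s+"  # colour: one or more ASCII words
    r"(?P<size>\d+\s*[xX]\s*\d+)\s*$"      # size such as 20x20 or 30 x 60
)
FIELDS = ["article", "colour", "size"]


def pdf_to_text(pdf_path):
    """Run poppler's pdftotext (keeping the page layout) and return its output."""
    result = subprocess.run(
        ["pdftotext", "-layout", pdf_path, "-"],  # "-" sends the text to stdout
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def parse_catalog_text(text):
    """Return one dict per line matching ROW_PATTERN; other lines are skipped."""
    rows = []
    for line in text.splitlines():
        match = ROW_PATTERN.match(line)
        if match:
            rows.append({
                "article": match["article"],
                "colour": match["colour"].strip(),
                # normalise "30 X 60" to "30x60"
                "size": re.sub(r"\s+", "", match["size"]).lower(),
            })
    return rows


def write_csv(rows, csv_path):
    """Write the rows to a CSV file, which Excel can open directly."""
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
```

Typical use would be `write_csv(parse_catalog_text(pdf_to_text("catalogue.pdf")), "catalogue.csv")`, with placeholder file names; the G/B columns would still be added by hand.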

Related

Import headshots via CSV

I have a sequence set up in my Premiere project: a headshot slides in from the right while fading in, and the person's name sits under the headshot and follows the same slide-and-fade animation. I have to create a main sequence of 40 of these (each headshot sequence lasts around 3 seconds), which means manually placing the headshots (all the same size) and typing in each person's name.
Is there a way to automate this with a script in Premiere Pro that reads the location of each headshot image and the person's name from a CSV file and builds the sequence for me?
I have tried looking around for such information but haven't had much success.
You can easily achieve this using Dataclay's Templater for Adobe After Effects. Even better than a CSV, you can connect it directly to a cloud-based Google Sheet, so you never have to deal with a file again.
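Whichever tool ends up placing the clips (a Premiere Pro script or Templater), the CSV side of the job is simple. The Python sketch below turns the CSV into a back-to-back schedule of 3-second slots. The column names image and name, and the build_schedule function, are hypothetical; the sketch does not touch Premiere itself.

```python
import csv
import io

CLIP_SECONDS = 3.0  # each headshot sequence lasts around 3 seconds


def build_schedule(csv_text, clip_seconds=CLIP_SECONDS):
    """Return (start, end, image_path, name) per CSV row, placed back to back."""
    reader = csv.DictReader(io.StringIO(csv_text))
    schedule = []
    for index, row in enumerate(reader):
        start = index * clip_seconds
        # "image" and "name" are hypothetical column names
        schedule.append((start, start + clip_seconds,
                         row["image"].strip(), row["name"].strip()))
    return schedule
```

With 40 rows, the last slot ends at 120 seconds, which is the length of the main sequence.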

A study on modifying PDFs in Node.js

Project Environment
We are currently developing on Windows 10 with Node.js 10.16.0 and the Express web framework. The production environment is an Ubuntu Linux server; everything else is the same.
What technology do you want to implement?
What I want to implement is pre-filling a PDF with the information the user entered when signing up for membership. For example, I want the user's name, age, address, phone number, etc. to be put into the PDF's input text boxes automatically, so that the user only needs to fill in the remaining information. (The PDF is shown on some of our web pages.)
Once all the information is entered, the PDF is saved and sent to another vendor, and that is the end of the process.
Current Problems
We spent about four days studying PDFs and tried to create PDFs by following the outline, structure and code described at https://web.archive.org/web/20141010035745/http://gnupdf.org/Introduction_to_PDF
However, most PDFs are not this simple: their streams are compressed with FlateDecode. So I also read "Data extraction from /Filter /FlateDecode PDF stream in PHP" and tried to decompress the file using QPDF.
The decompression worked. I then thought it would be easy to spot the difference by entering "Kim" as the first name and comparing the result with the PDF without it.
However, the two files differ a great deal even though only three characters were added, and the PDF structure itself is too complex to make further progress.
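One reason a three-character change spreads so far is that FlateDecode is zlib/deflate compression: inserting "Kim" changes the compressed bytes from that point onward, and the byte offsets in the cross-reference table shift as well. The Python sketch below uses only the standard library to find streams with a direct /Length and inflate them. It is a simplification: indirect lengths ("/Length 7 0 R"), nested dictionaries after /Length, and other filters are not handled, and the STREAM_HEADER pattern and inflate_streams name are illustrative. QPDF, as used in the question, handles the general case.

```python
import re
import zlib

# Matches a stream dictionary with a direct /Length, e.g.
# "<< /Filter /FlateDecode /Length 42 >>\nstream\n". Indirect lengths
# ("/Length 7 0 R") and nested dictionaries after /Length are not handled.
STREAM_HEADER = re.compile(rb"/Length\s+(\d+)\b(?!\s+\d+\s+R)[^>]*>>\s*stream\r?\n")


def inflate_streams(pdf_bytes):
    """Return the decompressed contents of every Flate stream found."""
    results = []
    for match in STREAM_HEADER.finditer(pdf_bytes):
        length = int(match.group(1))
        data = pdf_bytes[match.end():match.end() + length]
        try:
            results.append(zlib.decompress(data))  # FlateDecode is zlib/deflate
        except zlib.error:
            pass  # another filter (e.g. DCTDecode) or an uncompressed stream
    return results
```

Comparing the inflated content streams of the two files, rather than the raw files, shows the actual text change.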
Note: https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf (the official PDF specification, in English)
Is there a way to solve this problem?
It sounds like you want to create a PDF from scratch and possibly extract data from it and you are finding this a more difficult prospect than you first imagined.
Check out my answer here on why PDF creation and reading is non-trivial and why you should reach for a tool to help you do this:
https://stackoverflow.com/a/53357682/1669243

Generate multiple PDFs from a database

I would like to generate multiple PDFs at once, with each PDF pulling its data from a database. It can be an Excel table or a relational database; it doesn't matter, I can create whatever is needed.
Using Excel and JavaScript in Adobe Acrobat Pro, I managed to pull data into a template PDF, but for every record (row) in the Excel table I have to generate one PDF manually, then another, and so on. There are a lot of records, so I would like to do this automatically if possible.
Is there a way to do that? Any suggestions?
I added an image to better explain it...
Look into the Acrobat SDK, section "Interapplication Communication", to learn how to control Acrobat via VB/VBA and how to work with the JavaScript Object (JSO).
Then have a look at the "Acrobat JavaScript Scripting Reference", specifically the Doc object, with methods such as addField, and the Field object, which you use to set the properties of the fields.
That should do what you want. Reinhard
PS: OpenOffice can save spreadsheets as PDF, and so can newer versions of Excel. Wouldn't that already be enough, or perhaps a mix of the above and this?
Full disclosure: I founded and run Epsillion Software.
mirta, one option is Epsillion Publisher. We built it for your exact use case.
You would need to specify what your template should look like. The Epsillion team will design it for you.
You then specify what your variables are in a Word document (e.g., name, last name, date of birth). The software will process the Word and Excel files and return PDFs for you.
Templates are flexible and flow as needed.
Hope that helps. Good luck!
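For comparison, the per-record loop can also run without Acrobat. This is a swapped-in approach that neither answer describes: the Python sketch below hand-writes a minimal one-page PDF (catalog, page tree, page, font, content stream and cross-reference table) for each CSV row. It does not fill an existing template or its form fields, it only handles plain ASCII text in Helvetica, and the names build_pdf, pdfs_from_csv and the record_NNN.pdf file naming are hypothetical.

```python
import csv
import io


def pdf_escape(text):
    """Escape characters that are special inside a PDF literal string."""
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def build_pdf(lines):
    """Return the bytes of a one-page PDF showing each line in Helvetica 12pt."""
    ops = ["BT", "/F1 12 Tf", "72 720 Td", "14 TL"]  # font, start point, leading
    ops += [f"({pdf_escape(line)}) '" for line in lines]  # ' = next line + show
    ops.append("ET")
    # Plain ASCII only; other characters become "?"
    content = "\n".join(ops).encode("ascii", errors="replace")
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] "
        b"/Resources << /Font << /F1 4 0 R >> >> /Contents 5 0 R >>",
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
        b"<< /Length " + str(len(content)).encode() + b" >>\nstream\n"
        + content + b"\nendstream",
    ]
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of "N 0 obj" for the xref table
        out += f"{number} 0 obj\n".encode() + body + b"\nendobj\n"
    xref_start = len(out)
    out += f"xref\n0 {len(objects) + 1}\n".encode()
    out += b"0000000000 65535 f \n"  # each xref entry is exactly 20 bytes
    for offset in offsets:
        out += f"{offset:010d} 00000 n \n".encode()
    out += (f"trailer\n<< /Size {len(objects) + 1} /Root 1 0 R >>\n"
            f"startxref\n{xref_start}\n%%EOF\n").encode()
    return bytes(out)


def pdfs_from_csv(csv_text):
    """Yield (file_name, pdf_bytes) for every record in the CSV text."""
    for index, row in enumerate(csv.DictReader(io.StringIO(csv_text)), start=1):
        lines = [f"{key}: {value}" for key, value in row.items()]
        yield f"record_{index:03d}.pdf", build_pdf(lines)
```

Each yielded pair can be written with `open(name, "wb").write(data)`. For real templates, a PDF library or the Acrobat route above is the more practical choice; the sketch mainly shows that the batch loop itself is straightforward.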

How to Generate a Flowchart from a Database or Spreadsheet on a Mac?

Is there an easy, economical way to generate a flowchart from either a database (e.g., FileMaker Pro, or MS-Access) or a spreadsheet (e.g., Apple Numbers, or MS-Excel)?
What I'm looking for, on a Mac, is a way to create a database/spreadsheet table of flowchart nodes (with title/text, symbol type and linkage info), "press a button", and have a visually appealing flowchart generated.
Then examine the result, update the table, regenerate, and so on.
Take a look at this post. First, you'll need OmniGraffle. Next, you'd need to convert the FileMaker data into OmniGraffle XML or use AppleScript. http://forums.omnigroup.com/showthread.php?t=1860
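As a swapped-in alternative to OmniGraffle, Graphviz can lay out a flowchart from exactly this kind of node table. The Python sketch below converts a CSV export of the table into DOT text. The column names id, title, shape (a Graphviz shape such as box, diamond or ellipse) and next (semicolon-separated target ids) are hypothetical.

```python
import csv
import io


def table_to_dot(csv_text):
    """Convert a node table (id, title, shape, next) into Graphviz DOT text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    nodes, edges = [], []
    for row in reader:
        node_id = row["id"].strip()  # ids are assumed to contain no quotes
        label = row["title"].replace('"', '\\"')  # escape quotes inside the label
        shape = row["shape"].strip() or "box"  # e.g. box, diamond, ellipse
        nodes.append(f'  "{node_id}" [label="{label}", shape={shape}];')
        for target in row["next"].split(";"):  # links are ";"-separated ids
            if target.strip():
                edges.append(f'  "{node_id}" -> "{target.strip()}";')
    return "\n".join(["digraph flowchart {", *nodes, *edges, "}"])
```

After saving the output as flowchart.dot, `dot -Tpdf flowchart.dot -o flowchart.pdf` renders it, so the edit, regenerate, examine cycle is a single command.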

How to generate application forms/documents programmatically?

At the moment, we use MS Word and MS Excel to mail merge documents that need to be sent to multiple recipients.
For example, say there is a complaint form where the complainant needs to fill in his/her name, address, etc. We have a .doc file with the content and the merge fields set up, and the names and addresses are in an Excel file, from which we can mail merge to generate all or just the necessary forms/documents.
However, I would like to automate this process: a form on a website where the complainant fills in his/her name, address and other details, which we then use to generate the complaint form automatically and offer it for download (preferably as a PDF).
Right now, the only solution that comes to mind is LaTeX, so that I can just replace the needed entities and compile to PDF. However, that depends on whether our web host offers LaTeX.
Is there any other solution? Is there another way to get this done, using something that most web hosting solutions should have no problem offering?
EDIT: I would prefer a non-.NET, or rather non-Microsoft, solution, since the servers run Linux, and while Mono might be capable of getting the job done, none of our devs know any .NET languages. However, if required, we might have to delve into it.
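The LaTeX route from the question comes down to escaping the submitted values and substituting them into a template before running pdflatex. Escaping matters because characters such as &, %, $, # and _ are special in LaTeX and can easily appear in an address. The Python sketch below assumes pdflatex is installed on the host; the template text, the field names name and address, and the helper names are hypothetical.

```python
import string
import subprocess
from pathlib import Path

# Replacements for characters that LaTeX treats specially.
LATEX_SPECIALS = {
    "\\": r"\textbackslash{}", "&": r"\&", "%": r"\%", "$": r"\$",
    "#": r"\#", "_": r"\_", "{": r"\{", "}": r"\}",
    "~": r"\textasciitilde{}", "^": r"\textasciicircum{}",
}

# Hypothetical complaint-form template; $name and $address are placeholders.
TEMPLATE = string.Template(r"""\documentclass{article}
\begin{document}
\section*{Complaint form}
Name: $name\\
Address: $address
\end{document}
""")


def latex_escape(text):
    """Escape user input character by character so it is safe inside LaTeX."""
    return "".join(LATEX_SPECIALS.get(ch, ch) for ch in text)


def render_form(fields):
    """Fill the template with escaped form values (all placeholders required)."""
    return TEMPLATE.substitute({k: latex_escape(v) for k, v in fields.items()})


def compile_pdf(tex_source, workdir):
    """Write form.tex into workdir, run pdflatex there and return the PDF path."""
    tex_path = Path(workdir) / "form.tex"
    tex_path.write_text(tex_source, encoding="utf-8")
    subprocess.run(
        ["pdflatex", "-interaction=nonstopmode",
         f"-output-directory={workdir}", str(tex_path)],
        check=True,
    )
    return tex_path.with_suffix(".pdf")
```

string.Template is a convenient fit here because its `$name` placeholders do not clash with LaTeX's braces and backslashes.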
Generate the PDF using XSL. Check out Apoc XSL-FO.
You will need to create an XML file with the required fields and transform that with this tool.
If you wish to avoid .NET then XSL-FO is worth a look. Try the FOray project.
XSLT has a steep learning curve if you do not already have experience with it. Also, users will not be able to change the templates without asking the XSLT guru to do it.
If your templates are already in MS Word and MS Excel, I would stick with generating MS Office documents on the server. These are now easy to work with from code thanks to OpenXML; check out OfficeOpenXML and OpenXMLDeveloper.
Apache FOP : http://xmlgraphics.apache.org/fop/
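Apache FOP and the other XSL-FO processors mentioned turn an XSL-FO document into a PDF. The answers suggest creating XML and transforming it with XSLT; the Python sketch below skips the XSLT step and builds a minimal FO document directly with xml.etree.ElementTree, which also escapes characters such as & and < in the submitted values. The page size, margins and the build_fo name are assumptions.

```python
import xml.etree.ElementTree as ET

FO_NS = "http://www.w3.org/1999/XSL/Format"
ET.register_namespace("fo", FO_NS)  # serialise elements with the "fo:" prefix


def fo(tag):
    """Qualify a tag name with the XSL-FO namespace."""
    return f"{{{FO_NS}}}{tag}"


def build_fo(fields):
    """Return an XSL-FO document with one block per form field."""
    root = ET.Element(fo("root"))
    masters = ET.SubElement(root, fo("layout-master-set"))
    page = ET.SubElement(masters, fo("simple-page-master"), {
        "master-name": "A4", "page-height": "29.7cm",
        "page-width": "21cm", "margin": "2cm",
    })
    ET.SubElement(page, fo("region-body"))
    sequence = ET.SubElement(root, fo("page-sequence"), {"master-reference": "A4"})
    flow = ET.SubElement(sequence, fo("flow"), {"flow-name": "xsl-region-body"})
    for label, value in fields.items():
        block = ET.SubElement(flow, fo("block"), {"space-after": "6pt"})
        block.text = f"{label}: {value}"  # ElementTree escapes & and < here
    return ET.tostring(root, encoding="unicode")
```

Saved as form.fo, the result can be rendered with `fop -fo form.fo -pdf form.pdf`.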
I suggest generating RTF on the server: it's easy enough to generate automatically using CPAN's RTF::Writer, there are converters that produce good PDF, it can be edited by hand in Word, OpenOffice Writer and TextEdit, it doesn't have any really bad compatibility issues between the main editing applications, and it has decent text and resource extraction tools, with text extraction working rather better than for PDF.
There is some support for moving between RTF and LaTeX, although the best RTF -> LaTeX converter, docx2tex, depends on the System.IO.Packaging .NET module, whose Mono implementation isn't yet rock solid.
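RTF::Writer is a Perl module; to keep to one language here, the Python sketch below hand-writes a minimal RTF document from form fields to show how little RTF needs. It escapes backslashes and braces and writes non-ASCII characters as \uN? control words (signed 16-bit UTF-16 code units, with "?" as the fallback character). The build_rtf layout and function names are hypothetical; in practice the template would come from a document edited in Word.

```python
def rtf_escape(text):
    """Escape text for RTF: \\, { and } get a backslash, non-ASCII uses \\uN?."""
    out = []
    for ch in text:
        if ch in "\\{}":
            out.append("\\" + ch)
        elif ch == "\n":
            out.append("\\par\n")
        elif ord(ch) > 127:
            # \uN takes a signed 16-bit UTF-16 code unit; "?" is the fallback char
            units = ch.encode("utf-16-le")
            for i in range(0, len(units), 2):
                unit = int.from_bytes(units[i:i + 2], "little")
                out.append(f"\\u{unit - 65536 if unit > 32767 else unit}?")
        else:
            out.append(ch)
    return "".join(out)


def build_rtf(fields):
    """Return a minimal RTF document with one bold label and value per line."""
    body = "".join(
        f"\\b {rtf_escape(label)}:\\b0  {rtf_escape(value)}\\par\n"
        for label, value in fields.items()
    )
    return "{\\rtf1\\ansi\\deff0{\\fonttbl{\\f0 Helvetica;}}\n" + body + "}"
```

The output opens in Word, OpenOffice Writer and TextEdit and can then be converted to PDF, as the answer describes.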
Postscript: this is not a recommendation, since it's too much of an unwieldy sledgehammer for this job, but iText will generate the PDF directly from the form data. If you wanted to do fancy things like signed PDFs, that would be the way to go.
Postscript #2: If you break the Word document up into individual files using Word's master document feature, you can clobber one of the parts with hand-generated content. This makes it easy to approximate form filling on Word .doc files using just standard file utilities and some trivial RTF -> DOC tweaking.

Resources