Project Environment
We are currently developing on Windows 10 with Node.js 10.16.0 and the Express web framework. The deployment environment is an Ubuntu Linux server; everything else is the same.
What technology do you want to implement?
I want to pre-fill a PDF form with the information a user entered when signing up for membership. For example, the name, age, address, phone number, etc. should be placed into the PDF's text input boxes automatically, so that the user only needs to fill in the remaining information. (The PDF is shown on some of our web pages.)
Once all the information is entered, the PDF is saved and the document is sent to another vendor, which completes the process.
Current Problems
We spent about four days studying PDFs and tried to create one by hand, following the outline, structure, and code described at https://web.archive.org/web/20141010035745/http://gnupdf.org/Introduction_to_PDF
However, most real PDFs are not this simple; their streams are compressed with FlateDecode. So I also read "Data extraction from /Filter /FlateDecode PDF stream in PHP" and tried decompressing the file using QPDF.
So the file is decompressed for now. I thought that after entering "Kim" as the first name, it would be easy to spot the difference compared with the PDF without it.
However, the two files differ far too much even though only three characters were added, and the PDF structure itself is too difficult and complex for us to proceed this way.
Note: https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf (official PDF specification, in English)
Is there a way to solve the problem now?
It sounds like you want to create a PDF from scratch and possibly extract data from it and you are finding this a more difficult prospect than you first imagined.
Check out my answer here on why PDF creation and reading is non-trivial and why you should reach for a tool to help you do this:
https://stackoverflow.com/a/53357682/1669243
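To illustrate the FlateDecode point from the question: FlateDecode is zlib/deflate compression, so a stream's bytes can be inflated with any zlib library. Below is a minimal sketch in Python (used only for illustration; the two sample content streams and the helper name `inflate_stream` are made up). It hints at why a three-character change does not show up as a three-byte difference: the compressed bytes change, and in a real file the stream lengths and byte offsets recorded elsewhere in the PDF change as well.

```python
import zlib


def inflate_stream(raw: bytes) -> bytes:
    """Inflate the bytes of a stream whose /Filter is /FlateDecode."""
    return zlib.decompress(raw)


# Two hypothetical content streams that differ only by the letters "Kim".
before = b"BT /F1 12 Tf 72 720 Td () Tj ET"
after = b"BT /F1 12 Tf 72 720 Td (Kim) Tj ET"

# This is the form in which the streams would be stored in the file.
packed_before = zlib.compress(before)
packed_after = zlib.compress(after)

print(packed_before.hex())
print(packed_after.hex())

# Inflating recovers the readable content stream.
print(inflate_stream(packed_after).decode("ascii"))
```

This only covers the compression layer. Locating the right stream, updating lengths and the cross-reference table, and handling form fields are the parts a PDF library does for you.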
More specifically, PDF, EPUB or AZW (publishing formats). For example, if an original document was downloaded and then uploaded elsewhere (an unauthorised upload), is there any feasible way to find out where? Ideally I'd like to receive basic data such as URL, server IP and approximate upload date.
I've looked at similar posts where users want to track statistics when a PDF is downloaded and used; however, I'm purely looking to track its online whereabouts in an unobtrusive way.
Thank you!
I definitely recommend using Google dorks, as they are very useful for finding information online. You can specify the filename in the URL, then add the publishing extension and some unique text related to the document. The approximate date of creation is cached by Google itself. You can derive the IP address from the domain name using the command nslookup on Windows/OSX/Linux.
The google search query would look something like this:
inurl:"filename" & filetype:pdf & allintext:"some unique content of the document"
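If you need to run this for many documents, the query can be assembled programmatically. This is only a sketch in Python: the helper names `build_dork` and `resolve_ip` are made up, the query simply reproduces the form shown above, and `resolve_ip` uses the standard library's DNS lookup as a stand-in for running nslookup by hand.

```python
import socket


def build_dork(filename: str, extension: str, unique_text: str) -> str:
    """Assemble the search query in the same form as the example above."""
    return (
        f'inurl:"{filename}" & filetype:{extension} '
        f'& allintext:"{unique_text}"'
    )


def resolve_ip(domain: str) -> str:
    """Look up an IPv4 address for a domain, similar to running nslookup."""
    return socket.gethostbyname(domain)


print(build_dork("filename", "pdf", "some unique content of the document"))
```

`resolve_ip` is not called here because it needs network access; pass it the domain of any search hit you want to check.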
With the newly released Webi, there is no way to manipulate reports with VBA as there was in the DESKI era.
I'd like to know if there's a way for me to click a button with parameters in an Excel sheet and get a report from the server?
I've been thinking of using the RESTful Web-services but it seems that there is a performance problem.
I also considered using a Java app in the middle using the SDK, but it's not really satisfying as it adds another layer.
Do you know if there's another way to download a Webi report from the server into Excel?
For this type of requirement, you'd normally use the OpenDocument feature. There is one thing that it won't do however, at least not for Webi documents, and that is deliver the output in Excel format (HTML and PDF are the two possible formats for Webi). In all fairness, the export to Excel option is only about two or three clicks away, but I can understand that this wouldn't be an ideal solution.
Another option is the Java SDK, which I would not recommend, as the ReBEAN SDK (the part of the Java SDK you need to interface with Webi documents) is deprecated and replaced by the REST SDK.
The REST SDK would be the way to go if the OpenDocument feature is not sufficient. Keep in mind that this would involve quite a few steps, each time sending a command to the WACS server and then decoding the answer. The steps would be:
Authenticate and get a logon token
Refresh the document (if necessary pass prompt values)
Export the document to Excel
Close the document
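As a rough sketch of those four steps, here is how the calls could be laid out in Python. Everything specific in it is an assumption to verify against the REST SDK documentation for your BI4 version: the server address, port 6405, the `/biprws` paths, the `X-SAP-LogonToken` header and the payload shapes. The code only builds the requests, it does not send them, and the prompt payload for the refresh is left for you to supply.

```python
import json
from urllib.request import Request

# All of these values are assumptions; check them against your own landscape.
BASE = "http://bi4server:6405/biprws"  # hypothetical WACS address
RAYLIGHT = f"{BASE}/raylight/v1"
XLSX = "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"


def _request(url, method, token=None, payload=None, accept="application/json"):
    """Build (but do not send) one REST call, using JSON for any body."""
    headers = {"Accept": accept}
    data = None
    if payload is not None:
        headers["Content-Type"] = "application/json"
        data = json.dumps(payload).encode("utf-8")
    if token is not None:
        headers["X-SAP-LogonToken"] = token
    return Request(url, data=data, headers=headers, method=method)


def logon_request(user, password):
    """Step 1: authenticate; the server's answer carries the logon token."""
    payload = {"userName": user, "password": password, "auth": "secEnterprise"}
    return _request(f"{BASE}/logon/long", "POST", payload=payload)


def refresh_request(token, doc_id, prompt_payload):
    """Step 2: refresh the document, passing prompt values if there are any."""
    url = f"{RAYLIGHT}/documents/{doc_id}/parameters"
    return _request(url, "PUT", token, prompt_payload)


def export_request(token, doc_id):
    """Step 3: ask for the document in Excel format via the Accept header."""
    return _request(f"{RAYLIGHT}/documents/{doc_id}", "GET", token, accept=XLSX)


def close_request(token, doc_id):
    """Step 4: close the document by marking it as unused."""
    payload = {"document": {"state": "Unused"}}
    return _request(f"{RAYLIGHT}/documents/{doc_id}", "PUT", token, payload)


for req in (logon_request("user", "secret"), export_request("<token>", 1234)):
    print(req.get_method(), req.full_url)
```

Each request would then be sent with `urllib.request.urlopen` (or the HTTP client of your choice) and the answer decoded before moving on to the next step.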
The REST interface is only supported on the WACS server, which should run on your BI4 server (unless you have a customised landscape). If it's slow, I would suggest looking into the root cause of this performance issue, instead of discarding the SDK altogether.
If you're going to use the REST interface, I would recommend opting for JSON to communicate through REST instead of XML. It's easier to read and parse.
A last option, which I wouldn't recommend, is LiveOffice. This is a separate product which allows you to embed contents from Webi documents into Office documents (most notably Excel). LiveOffice has always had its share of problems and has not received much love from SAP regarding much needed updates.
One final thought: the report will never appear in the same sheet, at least not without an additional amount of coding. Whatever SDK you end up choosing, you will always end up with an Excel file. If you want to show the results in the Excel file you started from, you'll need to code the steps to open the generated file, grab the contents and then copy those to your worksheet.
I'm new to this forum and to Orange.
I don't really know Python at this point but am ready to learn.
However, before going further in this environment I would like to know if it can answer my needs.
What I am basically doing is "transforming" PDF product catalogues into Excel files that can be used by one program to create a database for another program.
I have tile catalogues in PDF just like this one:
and turn them into this type of xls table: http://imgur.com/BtLBkOS
I basically need it to retrieve the article number, the colour, the size (e.g. 20x20). The G/B parts are completed manually after it has been done.
Not all catalogues are the same, so I have processed some of them using pdftotext and regular expressions in Notepad++.
But I would like to know whether this data mining solution could handle it?
Orange does not support reading PDF files. You will have to use specialized utilities or program it yourself.
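If you go the "program it yourself" route, the regular expressions you already use in Notepad++ carry over to Python fairly directly. The following is only a sketch: the sample text stands in for pdftotext output and is invented, as is the line layout the pattern expects, so each catalogue will likely need its own pattern.

```python
import csv
import io
import re

# Hypothetical pdftotext output; real catalogues will look different.
sample = """\
Article 10234 White matt 20x20
Article 10235 Grey 30x60
Page 12
"""

# Article number, then colour (any text), then a size such as 20x20.
LINE = re.compile(
    r"^Article\s+(?P<article>\d+)\s+(?P<colour>.+?)\s+(?P<size>\d+x\d+)$"
)


def extract_rows(text):
    """Return (article, colour, size) tuples for every matching line."""
    rows = []
    for line in text.splitlines():
        match = LINE.match(line.strip())
        if match:
            rows.append((match["article"], match["colour"], match["size"]))
    return rows


rows = extract_rows(sample)

# Write CSV, which Excel can open; lines that do not match are skipped.
out = io.StringIO()
csv.writer(out).writerows([("article", "colour", "size"), *rows])
print(out.getvalue())
```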
I need to use PDF in a way similar to ZIP/RAR: to hold many images (ancient Tibetan Buddhist literature), ideally 60000. But splitting into 10-100 volumes is OK.
Anything can be used for packing, but for unpacking we need Node.js, because the same PDF file must be served on the web. But some users will need to use the whole PDF.
So the question is, which Node module can I use to read any single arbitrary image from a huge PDF? An example would really help.
Every image is a single page. (Or in other words, every page is a single image.)
We have been using https://github.com/mirkokiefer/Node-Magick for this.
But the PNGs we get out are sometimes of fairly low quality.
At the moment, we use MS Word and MS Excel to mail merge documents that need to be sent to multiple recipients.
For example, say there is a complaint form where the complainant needs to fill in his/her name, address, etc. So we have a .doc file set up with the content and the dynamic entities set up for mail merging, with the name and address details put in an excel file, from where we can happily mail merge to generate all or just the necessary forms/documents.
However, I would like to automate this process, like a form in a website where the complainant can fill in his/her name, address and other details, and we could use that to generate the complaint form automatically and offer it to be downloaded (preferably as a PDF).
Now, the only solution that comes to mind is LaTeX, so that I can just replace the needed entities and compile to PDF. However, that would have to be negotiated with the webhost, depending on whether they offer LaTeX or not.
Is there any other solution? Any other way we could get this done, with something that shouldn't be a problem for most webhosting solutions to offer?
EDIT: I would prefer a non-.NET, or rather non-Microsoft, solution, since the servers are running Linux and, while Mono might be capable of getting the job done, none of our devs know any .NET languages. However, if required, we might have to delve into it.
Generating PDF using an XSL. Check the following: Apoc XSL-FO
You will need to create an XML file with the required fields and transform that with this tool.
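As a small sketch of the first half of that (the XML with the required fields), here is one way to serialise submitted form data in Python. The root element name `complaint` and the field names are made up; they would have to match whatever your XSL-FO stylesheet expects, and the transformation step itself is not shown.

```python
import xml.etree.ElementTree as ET


def form_to_xml(fields: dict) -> str:
    """Serialise submitted form fields as a small XML document."""
    root = ET.Element("complaint")  # hypothetical root element name
    for name, value in fields.items():
        # Special characters such as & and < are escaped by ElementTree.
        ET.SubElement(root, name).text = value
    return ET.tostring(root, encoding="unicode")


xml_text = form_to_xml({"name": "A. Example", "address": "1 Main St & Co"})
print(xml_text)
```

Building the XML through a library rather than string concatenation matters here, because the values come straight from a web form.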
If you wish to avoid .NET then XSL-FO is worth a look. Try the FOray project.
XSLT can have a steep learning curve if you do not have experience already. Also, users will not be able to change the templates without asking the XSLT guru to do it.
If your templates are already in MS Word and MS Excel then I would stick with generating MS docs on the server. These are now easy to work with from code since OpenXML - check out OfficeOpenXML and OpenXMLDeveloper
Apache FOP : http://xmlgraphics.apache.org/fop/
I suggest generating rtf on the server: it's easy enough to automatically generate using cpan's RTF::Writer, has converters generating good pdf, can be edited by hand in word, oo-writer & TextEdit, doesn't have any really bad compatibility issues between the main editing applications, and has decent text & resource extraction tools, with text extraction being rather better than pdf.
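To give an idea of how little is needed, here is a hand-rolled sketch in Python rather than RTF::Writer (which is a Perl module). The template text, the field names and the helpers `rtf_escape` and `build_rtf` are made up for illustration; the important part is escaping user input before it goes into the document.

```python
import struct


def rtf_escape(text: str) -> str:
    """Escape RTF control characters; write non-ASCII as \\uN? escapes."""
    out = []
    for ch in text:
        if ch in "\\{}":
            out.append("\\" + ch)
        elif ch == "\n":
            out.append("\\par ")
        elif ord(ch) < 128:
            out.append(ch)
        else:
            # \u takes signed 16-bit values, so go through UTF-16 code units;
            # the trailing "?" is the fallback for readers without Unicode.
            for (unit,) in struct.iter_unpack("<h", ch.encode("utf-16-le")):
                out.append(f"\\u{unit}?")
    return "".join(out)


# Hypothetical template; it contains no RTF special characters itself.
TEMPLATE = "Complaint from {name}, {address}."


def build_rtf(template: str, fields: dict) -> str:
    """Fill the template with escaped values and wrap it as an RTF file."""
    body = template.format(**{k: rtf_escape(v) for k, v in fields.items()})
    header = "{\\rtf1\\ansi\\deff0{\\fonttbl{\\f0 Times New Roman;}}\\f0\\fs24 "
    return header + body + "}"


print(build_rtf(TEMPLATE, {"name": "José {test}", "address": "1 Main St"}))
```

The resulting text can be saved with an .rtf extension and passed to whichever converter you use to produce the PDF.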
There's some support for moving between rtf & latex, although the best rtf -> latex converter, docx2tex, depends on the System.IO.Packaging .net module, whose mono implementation isn't yet rock solid.
Postscript — Not a recommendation: it's too much of an unwieldy sledgehammer for this job, but iText will generate the pdf directly from the form data. If you wanted to do fancy things like signed pdf, that would be the way to go.
Postscript #2 — If you break up the Word document into individual files using word's master document representation, then you can clobber one of the parts with hand-generated content. This makes it easy to do something approximating form-filling on word .doc files using just standard file-utils and some trivial rtf->doc tweaking.