I am reading a PDF file from an AWS S3 bucket and want to generate a new file with additional custom metadata using Node.js in Lambda.
I tried the pdf-lib npm package and was able to generate a new file, but I didn't find a way to add custom metadata; the package documentation only has methods for setting standard properties like title and author.
Please suggest a way to add custom metadata.
The best library for PDF manipulation is pdfmake:
https://www.npmjs.com/package/pdfmake
You can achieve almost anything using this library.
I am trying to create a PDF file that contains images and tables from HTML data in AWS Lambda using Python. I searched a lot on Google and didn't find a good solution. I tried some libraries locally (FPDF, pdfkit), but they don't work on AWS. Is there a simple tool to create a PDF and upload it to an S3 bucket? Thanks in advance.
You can use the reportlab Python module. It is good for all the things you have asked for: you can add images, create tables, etc. There are a lot of styling options available as well. You can find more about it in the user guide: https://www.reportlab.com/docs/reportlab-userguide.pdf
I am using this in production and it works pretty well for my use case, where I have to create an invoice. You can create the invoice in the /tmp directory and then upload it to S3.
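To give an idea, here is a minimal sketch of that flow; the invoice layout, bucket name, and object key below are made up for illustration:

import boto3
from reportlab.lib import colors
from reportlab.lib.pagesizes import A4
from reportlab.lib.styles import getSampleStyleSheet
from reportlab.platypus import Paragraph, SimpleDocTemplate, Table, TableStyle

# Build the PDF in Lambda's writable /tmp directory
doc = SimpleDocTemplate('/tmp/invoice.pdf', pagesize=A4)
styles = getSampleStyleSheet()
rows = [
    ['Item', 'Qty', 'Price'],
    ['Widget', '2', '$10.00'],
    ['Gadget', '1', '$25.00'],
]
table = Table(rows)
table.setStyle(TableStyle([
    ('GRID', (0, 0), (-1, -1), 0.5, colors.grey),
    ('BACKGROUND', (0, 0), (-1, 0), colors.lightgrey),
]))
story = [Paragraph('Invoice', styles['Title']), table]
# Images work the same way, e.g. story.append(reportlab.platypus.Image('logo.png'))
doc.build(story)

# Upload the generated file to S3 (bucket and key are placeholders)
boto3.client('s3').upload_file('/tmp/invoice.pdf', 'my-bucket', 'invoices/invoice.pdf')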
The pdfkit library works with AWS Lambda. pdfkit internally needs the wkhtmltopdf binary installed; you can add it as a Lambda layer. You can download the binaries from https://wkhtmltopdf.org/downloads.html.
Once you add the Lambda layer, you can set the config path as follows:
config = pdfkit.configuration(wkhtmltopdf="/opt/bin/wkhtmltopdf")
pdfkit.from_string(html, "/tmp/output.pdf", configuration=config)  # html = your input string; from_file() works the same way
You can upload the file generated in your Lambda temp location to an S3 bucket using upload_file(). See https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.Client.upload_file for details on uploading to an S3 bucket.
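Putting the pieces together, a minimal handler sketch; the HTML string, bucket name, and key are placeholders, and /opt/bin/wkhtmltopdf assumes your layer unpacks the binary at that path:

import boto3
import pdfkit

def lambda_handler(event, context):
    # wkhtmltopdf binary comes from the Lambda layer, unpacked under /opt
    config = pdfkit.configuration(wkhtmltopdf="/opt/bin/wkhtmltopdf")
    # Render the HTML into Lambda's writable /tmp directory
    pdfkit.from_string("<h1>Report</h1>", "/tmp/report.pdf", configuration=config)
    # Upload the result to S3 (bucket and key are placeholders)
    boto3.client("s3").upload_file("/tmp/report.pdf", "my-bucket", "reports/report.pdf")
    return {"status": "uploaded"}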
I'm trying to retrieve properties like operation names, method types, and request/response parameters from a WSDL file/URL through Node.js. I tried the npm package 'wsdlrdr', which does return the operation names but does not retrieve the parameters the same way as when I import the WSDL file in SoapUI.
Is there a way to retrieve all the elements using Node.js?
Thanks
Please try the soap npm package to retrieve details from a WSDL file/URL. It works well with WSDL files.
I'm trying to use Dropbox as a cloud-based file receptacle for an app/script. The script, written in Python, needs to take PDFs from Dropbox and use the tika-python wrapper to convert them to strings.
I'm able to connect to the Dropbox API and use the files_download_to_file() method to download the PDFs to disk, and then use the tika from_file() method to pull that downloaded file from disk for processing. Example:
# Download ex.pdf from Dropbox to local disk
dbx.files_download_to_file('/my_local_path/ex_on_disk.pdf', '/my_dropbox_path/ex.pdf')
from tika import parser
parsed = parser.from_file('/my_local_path/ex_on_disk.pdf')
The problem is that I'm planning on running this app on something like Heroku, where I don't think I can save anything locally and then access it again. I'm not sure how to get something from the Dropbox API that the tika wrapper can reference directly, the way it does above. I think the PHP SDK has a file_get_contents/file_put_contents pair of methods, but there doesn't appear to be a companion in the Python SDK.
I've tried using shareable links in place of a filename, but that hasn't worked. Any ideas? I know there's also the files_download method, which returns a FileMetadata object, but I have no idea what to do with it and am having trouble finding out more about it.
TL;DR: How can I reference a file on Dropbox with a filename string such as 'example.pdf' in another function that is trying to read a file from disk, without saving that Dropbox file to disk?
I figured it out. I used the files_download method to get the bytes and then used tika's from_buffer method instead:
from tika import parser
# files_download returns a (FileMetadata, requests.Response) tuple
md, response = dbx.files_download(path)
file_contents = response.content  # the raw PDF bytes
parsed = parser.from_buffer(file_contents)
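The parsed result is a plain dict, so the extracted text ends up under the 'content' key:

print(parsed['content'])   # the PDF text as a string
print(parsed['metadata'])  # the document metadata, if needed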
I know that it's possible to upload files to my Cloud Files account in Node.js using the node-cloudfiles module.
But is it also possible to upload a file stream directly?
In my case I am downloading an image from a certain location in Node.js and want to upload it directly to my Cloud Files account without saving the image temporarily on my server.
Of course it is possible: you can read the Rackspace Cloud Files API documentation (http://docs.rackspacecloud.com/files/api/cf-devguide-latest.pdf) and implement the necessary parts yourself.
However, I'd suggest waiting until https://github.com/nodejitsu/node-cloudfiles/pull/11 gets merged into trunk; then the node-cloudfiles library will support uploading from streams, so you won't have to create files before uploading.
I'm using the .NET Package class and would like to store some extra data/metadata for each PackagePart. For now I'm keeping a "shadow" file for each file: for myfile1.dat there is a myfile1.dat.meta. In SharpZipLib, for instance, it is possible to add ExtraData (byte[]) to each ZipEntry, but for various reasons I have chosen Package for this project. Is it possible to add this functionality to (Zip)Package somehow? I would like to be able to attach metadata to each PackagePart without using "shadow" files. Is it possible to use an ADS (Alternate Data Stream) with a ZipPackage? Any other options?