I'm looking for an open-source tool or an npm package that can be run from Node (for example, by spawning a process and calling it from the command line).
The result I need is a PDF file converted/split into images, where each page of the PDF becomes an image file.
I checked
https://npmjs.com/package/pdf-image -- seems to have been last maintained 3 years ago.
same for https://npmjs.com/package/pdf-img-convert
Please advise which package/tool I can use.
Thanks in advance.
Be aware that, generally, https://npmjs.com/package/pdf-img-convert is updated more frequently and is thus the better of the two, but it has 3 pending pull requests, so review whether they impact your usage. (Note that https://npmjs.com/package/pdf-image has a significantly heavier set of dependencies that can break, and also a much longer list of pending pull requests, which supports your assumption about its age.)
However, the current pdf-img-convert 1.0.3 has a breaking dependency that needs a manual correction, due to a Mozilla naming change earlier this year from es5 to legacy.
see https://github.com/olliet88/pdf-img-convert.js/issues/10
For a cross-platform open-source CLI tool I would suggest Artifex MuTool (AGPL, so it is not free for commercial use, but you're getting quality support). It has continuous daily commits, and it can be scripted via mutool run and an ECMAScript file.
Out of the box, a simple convert in.pdf out%4d.png will attempt to fix broken PDFs, but it may reject some that need a more forgiving secondary approach such as the packages above.
Go ahead with the second one: https://npmjs.com/package/pdf-img-convert
Some might view this question as opinion-based, but please consider that I am just asking for information sources.
I am working on a Docker based project, using a Node Alpine image. Right now I'm using the node:16.13-alpine image.
When I start updating images to the latest version, I'm always at a loss as to which version to pick.
In my example, the Node image page https://hub.docker.com/_/node?tab=description&page=1&name=alpine lists the following available image versions:
18-alpine3.15, 18.10-alpine3.15, 18.10.0-alpine3.15, alpine3.15, current-alpine3.15
18-alpine, 18-alpine3.16, 18.10-alpine, 18.10-alpine3.16, 18.10.0-alpine, 18.10.0-alpine3.16, alpine, alpine3.16, current-alpine, current-alpine3.16
16-alpine3.15, 16.17-alpine3.15, 16.17.1-alpine3.15, gallium-alpine3.15, lts-alpine3.15
16-alpine, 16-alpine3.16, 16.17-alpine, 16.17-alpine3.16, 16.17.1-alpine, 16.17.1-alpine3.16, gallium-alpine, gallium-alpine3.16, lts-alpine, lts-alpine3.16
14-alpine3.15, 14.20-alpine3.15, 14.20.1-alpine3.15, fermium-alpine3.15
14-alpine, 14-alpine3.16, 14.20-alpine, 14.20-alpine3.16, 14.20.1-alpine, 14.20.1-alpine3.16, fermium-alpine, fermium-alpine3.16
This list is of course an ever moving target.
Now, when picking a version out of all of these, what elements can I take into consideration (short of reading every single release note for each image)?
Is there a page somewhere offering a high-level view of these images and their known issues? Are some of these images designed to be "safe bets", unlikely to introduce freshly introduced bugs? I run npm audit on the packages used inside my image from time to time, but is there an equivalent tool that might alert me when it is time to update the Node image itself, because a new bug or security breach has been found?
I know this is a pretty wide question, but I am sure there are some good practice guidelines to follow here, any pointer is appreciated.
Thanks!
The two most important things to do here are
Have good integration tests; and
Check your Dockerfile into source control.
If you have both of these things then trying out any of the images you list isn't a huge risk. Update the Dockerfile FROM line, build an image, and run it; if the integration tests pass, check in the change; and if not, revert it. If you can set up your continuous-integration system to run the tests for you then this becomes "open a pull request and wait for a passing build".
The other trade-off is how much you want an image you know works, versus an image that gets regular updates. Most Docker images are built on some underlying Linux distribution. The node:16.13-alpine image you have currently isn't in the list of images you show, which means that, if there is some vulnerability in the underlying Alpine base, that particular image isn't getting rebuilt. But, conversely, your build might automatically update from Node 16.13.0 to 16.13.2 without you being aware of it.
It also helps to understand your language's update and versioning strategy. Node, for example, puts out a major release roughly annually with even major version numbers (14, 16, 18, ...), but Python's annual releases have minor version numbers (3.8, 3.9, 3.10, ...).
I'd suggest:
If you can tolerate not knowing the exact version, then use a release-version image like node:16-alpine or python:3.10. Make sure to docker build --pull to get the updates in the base image.
If you've pinned to an exact version like node:16.13.0-alpine, updating to the most recent patch release node:16.13.2-alpine is most likely safe.
If your base image uses semantic versioning, then upgrading across minor releases (e.g. to node:16.17-alpine) is supposed to be safe.
It is worth reading the release notes for major-version upgrades.
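The pinning trade-off above can be sketched in a Dockerfile FROM line (the tags are illustrative; check Docker Hub for the current list):

```dockerfile
# Release-version pin: picks up new patch and Alpine updates whenever
# you rebuild with `docker build --pull`
FROM node:16-alpine

# Exact pin: fully reproducible builds, but no fixes arrive until you
# bump the tag yourself
# FROM node:16.13.2-alpine
```

With the first form, a periodic rebuild plus your integration tests gives you updates with a safety net; with the second, you control exactly when the version changes.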
Can anyone please give me pointers for creating sparse files (holey files) in the latest Vdbench 50404rc2? It seems this is a recently supported feature.
Link for more info:
https://community.oracle.com/thread/3759500?start=0&tstart=0
The answer was given by Henk, but on the Oracle Vdbench forum, so I'm posting an excerpt from it here.
This is EXPERIMENTAL, so it will work for now, but once I get feedback and decide that this experiment was successful I will change the instructions to activate it.
That means that the '-d86' info below will no longer work.
To activate truncate, add '-d86' as an execution parameter, or, add 'debug=86' at the top of your parameter file.
(For experiments, adding another 'debug=' parameter is much easier than fiddling with the Vdbench parameter parser. If I decide to make this permanently available I'll worry about adding a more 'official' parameter.)
This uses the Unix 'ftruncate()' and similar Windows function during file creation.
This will create ONLY sparse files during the format, not one block of data is written until further Vdbench workloads are run against these files.
The definition file:
debug=86
fsd=fsd1,anchor=/tmp/sparsedir,depth=1,width=1,files=10,size=40k
fwd=fwd1,fsd=fsd1,operation=read,xfersize=4k,fileio=sequential,fileselect=random,threads=1
rd=rd1,fwd=fwd1,fwdrate=max,format=yes,elapsed=10,interval=1
I am using Node.js Tools for Visual Studio.
When I open a project, it takes some time to load because of the Node.js analysis process.
Another problem is that .ntvs_analysis.dat keeps growing larger and larger.
What is it, and do I need it?
To my understanding, the NTVS extension analyzes your code to provide IntelliSense support. The result of the analyzed code is stored in .ntvs_analysis.dat. However, it doesn't only analyze your code but also all installed npm modules and their dependencies (and theirs, and so on). So installing more modules will make your .ntvs_analysis.dat grow really fast.
There is an open issue about this on GitHub: https://github.com/Microsoft/nodejstools/issues/88. The file is getting really big for some people, including myself.
One proposed solution in the discussion is to reduce the depth of scanned folders. Turning off IntelliSense would also help keep the file smaller, according to the discussion.
I have an old system that was written in PHP a long time ago, and I would like to update it to Node.js so I can share code with a more modern system. Unfortunately, one of the main features of the PHP system is a tool that loads an existing PDF file (which happens to be a government form), fills out the user's information, and provides the browser with a PDF that has all of that information present.
I have considered making a PHP script that will just do the PDF customization and using node for everything else, but it seems like something like this should be able to be done without requiring PHP to be installed.
Any idea how I might solve my problem just using node?
After a lot of searching and nearly giving up, I did eventually find that the HummusJS library will do what I want to do!
Update April 2020: In the intervening years since I posted this other options have cropped up which look like they should work. Since this question still gets a lot of attention I thought I'd come back and update with some other options:
pdf-lib - This one is my current favorite; it works great. It may have limitations for extremely large PDFs, but it is constantly improving, and you can do nearly anything with it: if not through the helper API, then through the abstraction they provide, which allows you to use nearly any raw PDF feature (though that requires more knowledge of the PDF file format than most people have).
It's worth noting that pdf-lib doesn't support loading encrypted pdfs, but you can use something like qpdf to strip the encryption before loading it.
https://www.npmjs.com/package/nopodofo - This should be one of the best options out there, but I couldn't get it working myself on a Mac.
https://www.npmjs.com/package/node-pdfsign - Not exactly the same thing, but it can be used with other tools to apply digital signatures to a PDF. I haven't used it yet, but I expect to.
Update Dec 2021: I'm still using pdf-lib and I think it's still the best available library, but there are a lot of new libraries that have come out in the last couple of years for handling PDFs, so it's worth looking around a bit.
I need to implement a memory cache with Node, it looks like there are currently two packages available for doing this:
node-memcached (https://github.com/3rd-Eden/node-memcached)
node-memcache (https://github.com/vanillahsu/node-memcache)
Looking at both GitHub pages, it looks like both projects are under active development with similar features.
Can anyone recommend one over the other? Does anyone know which one is more stable?
At the time of writing, the project 3rd-Eden/node-memcached doesn't seem to be stable, according to the GitHub issue list (e.g. see issue #46). Moreover, I found its code quite hard to read (and thus hard to update), so I wouldn't suggest using it in your projects.
The second project, elbart/node-memcache, seems to work fine, and I feel good about the way its source code is written. So if I had to choose between only these two options, I would prefer elbart/node-memcache.
But as of now, both projects suffer from a problem with storing BLOBs. There's an open issue for the 3rd-Eden/node-memcached project, and elbart/node-memcache simply doesn't support the option. (It would be fair to add that there's a fork of the project that is said to add the option of storing BLOBs, but I haven't tried it.)
So if you need to store BLOBs (e.g. images) in memcached, I suggest the overclocked/mc module. I'm using it now in my project and have had no problems with it. It has nice documentation and is highly customizable, but still easy to use. At the moment it seems to be the only module that handles storing and retrieving BLOBs well.
Since this is an old question/answer (from 2 years ago), and I got here by googling and then researching, I feel I should tell readers that I definitely think 3rd-Eden's memcached package is the one to go with. It seems to work fine, and based on usage by others and recent updates, it is the clear winner: almost 20K downloads for the month, 1,300 just today, and the last update was made 21 hours ago. No other memcache package even comes close. https://npmjs.org/package/memcached
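For completeness, basic usage of the memcached package looks roughly like the sketch below. It assumes npm install memcached and a memcached server listening on localhost:11211; the key prefix and TTL are illustrative.

```javascript
// Sketch: set and get a value with the `memcached` npm package.
// Assumes the package is installed and a server runs on localhost:11211.
const DEFAULT_SERVER = 'localhost:11211';

function createClient(server = DEFAULT_SERVER) {
  // require() is deferred so this file loads even without the package installed
  const Memcached = require('memcached');
  return new Memcached(server);
}

function cacheUser(client, id, user, ttlSeconds, done) {
  // set(key, value, lifetime-in-seconds, callback), then read it back
  client.set(`user:${id}`, user, ttlSeconds, err => {
    if (err) return done(err);
    client.get(`user:${id}`, done);
  });
}

module.exports = { createClient, cacheUser, DEFAULT_SERVER };
```

Usage would be something like cacheUser(createClient(), 42, { name: 'Ann' }, 60, (err, user) => { ... }).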
The best way I know of to see which modules are the most robust is to look at how many projects depend on them. You can find this on npmjs.org's search page. For example:
memcache has 3 dependent projects
memcached has 31 dependent projects
... and in the latter, I see connect-memcached, which would seem to lend some credibility there. Thus, I'd go with the latter, barring any other input or recommendations.