How to import an existing PDF file in node.js

I am working on import routines for node. So far I can import text nodes from a PDF using pdf2json. This works well, but not on PDFs that are image based and contain no text.
So I downloaded pdf2img, but there are plenty of issues with this module. The one I have now is that after running it I get a lot of 0-byte png files with no content, plus this error message:
/docfire/node_modules/gm/lib/command.js:228
proc.stdin.once('error', cb);
^
TypeError: Cannot read property 'once' of undefined
at gm._spawn (/docfire/node_modules/gm/lib/command.js:228:15)
at /docfire/node_modules/gm/lib/command.js:140:19
at series (/docfire/node_modules/array-series/index.js:11:36)
at gm._preprocess (/docfire/node_modules/gm/lib/command.js:177:5)
at gm.stream (/docfire/node_modules/gm/lib/command.js:138:10)
at convertPdf2Img (/docfire/node_modules/pdf2img/lib/pdf2img.js:93:6)
at /docfire/node_modules/pdf2img/lib/pdf2img.js:67:9
at /docfire/node_modules/async/lib/async.js:246:17
at /docfire/node_modules/async/lib/async.js:122:13
at _each (/docfire/node_modules/async/lib/async.js:46:13)
I've tried posting an issue on the Git site for the module, but it looks like quite a few people are having exactly the same problem and there doesn't seem to be any activity on a fix.
What I would ideally like is a way to extract text and images from a PDF in node.
I'm running on an iMac with macOS Sierra v10.12.4, node version 7.8.0, pdf2img 0.2.0 and gm 1.23.0.

You can try the pdf-image npm package:
https://www.npmjs.com/package/pdf-image
Hope this helps.

Related

Trouble importing VTK files into Paraview (Error reading ascii data)

I am very new to using Paraview, and I'm trying to import a few VTK files and view them. However, I'm receiving the following errors:
Generic Warning: In /Users/kitware/dashboards/buildbot-slave/8275bd07/build/superbuild/paraview/src/VTK/IO/Legacy/vtkDataReader.cxx, line 1436
Error reading ascii data. Possible mismatch of datasize with declaration.
ERROR: In /Users/kitware/dashboards/buildbot-slave/8275bd07/build/superbuild/paraview/src/VTK/IO/Legacy/vtkUnstructuredGridReader.cxx, line 346
vtkUnstructuredGridReader (0x7fb15582bd10): Unrecognized keyword: ,
I can't seem to figure out what's wrong. I've tried converting them to other formats, to no avail.
I don't think there's a problem with the files. I can open them with Paraview 5.6. Maybe they were generated with a version of VTK that is more recent than the one used for your version of Paraview. You should install the latest version of Paraview (or at least 5.6).
The big file results in some visible geometry, the smaller one does not. But I have no error message, everything seems ok.

Google Colab - downloads some files, TypeError: Failed to fetch on others

I have a Google Colab notebook with PyTorch code running in it.
At the beginning of the train function, I create, save and download word_to_ix and tag_to_ix dictionaries without a problem, using the following code:
import torch
from google.colab import files
torch.save(tag_to_ix, pos_dict_path)
files.download(pos_dict_path)
torch.save(word_to_ix, word_dict_path)
files.download(word_dict_path)
I train the model, and then try to download it with the code:
torch.save(model.state_dict(), model_path)
files.download(model_path)
Then I get a MessageError: TypeError: Failed to fetch.
Obviously, the problem is not with the third party cookies (as suggested here), because the first files are downloaded without a problem. (I actually also tried adding the link in my Allow section, but, surprise surprise, it made no difference.)
I was originally trying to save the model as is (which, to my understanding, saves it as a Pickle), and I thought maybe Colab files doesn't handle downloading Pickles well. But as you can see above, I'm now trying to save a dict object (which is also what word_to_ix and tag_to_ix are), and it's still not working.
Downloading the file manually with right-click isn't a solution, because sometimes I leave the code running while I do other things, and by the time I get back to it, the runtime has disconnected, and the files are gone.
Any suggestions?

opencv fisherface recognizer

I'm trying to load model files for a FisherFaceRecognizer. The initial problem is that the program was written for an older OpenCV version and it seems some interfaces were changed.
Info about my project:
programming language: Python 3.5
OpenCV Version: 3.3.0
These are the two lines I had a problem with:
model = cv2.face.createFisherFaceRecognizer()
model.load('foo_model.xml')
In the OpenCV documentation I found out that there is a new way to call the create functions and it seems to work. But I could not find the right call for the load function. I have tried to use the read function of the recognizer, but that results in an error.
model = cv2.face.FisherFaceRecognizer_create()
model.read('foo_model.xml')
The error message I get when I try to use read():
File can't be opened for reading! in function read
Can somebody help me with loading the model files? Thank you :)
The problem is with the XML file format. If you open the XML file, you will not find a "my_object" tag. I will not go into the details, but every time I face this problem, it works when I modify the XML file as follows.
<?xml version="1.0"?>
<opencv_storage>
<my_object> <!-- add this -->
.........
.........
.........
</my_object> <!-- and this -->
</opencv_storage>
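If several model files need the same change, the manual edit shown above could be scripted. The following is a minimal sketch in Python using only the standard library. The function name wrap_in_my_object is made up for illustration, and the sketch assumes that the root element is <opencv_storage> and that the file fits in memory. It has not been tried on real FisherFace model files, so treat it as a starting point rather than a verified fix.

```python
import xml.etree.ElementTree as ET


def wrap_in_my_object(xml_text: str, tag: str = "my_object") -> str:
    """Move all children of <opencv_storage> into a single <tag> element.

    Hypothetical helper: the name and the default tag follow the manual
    edit described in the answer, not any OpenCV API.
    """
    root = ET.fromstring(xml_text)
    if root.tag != "opencv_storage":
        raise ValueError("expected <opencv_storage> as the root element")

    # Leave the document alone if it is already wrapped.
    if len(root) == 1 and root[0].tag == tag:
        return xml_text

    wrapper = ET.Element(tag)
    for child in list(root):      # copy the list, since root is mutated
        root.remove(child)
        wrapper.append(child)
    root.append(wrapper)

    # ET.tostring omits the XML declaration for unicode output, so add it back.
    return '<?xml version="1.0"?>\n' + ET.tostring(root, encoding="unicode")
```

To use it on a file, read the text of the model file, pass it through wrap_in_my_object, and write the result to a new file so the original stays untouched.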
The problem seems to be that the XML format in which the models are saved has changed. This seems to be a known issue. I am using OpenCV 3.3.0 and want to load a model from an older OpenCV version, which results in the mentioned error from the read function. In the OpenCV Q&A forum a solution was suggested to me, but in my case it did not work. Nonetheless I will drop the link to my post at OpenCV Q&A here. Maybe someone else with the same problem can benefit from it.

Can't get to exif data of a .JPG image

I'm trying to read the exif data from a .JPG image. I've tried different solutions found here and there (PIL, piexif, exifread...) and none of them worked for this set of images. They worked for images taken with another camera but not for these: all these methods return empty dictionaries. It seems that there is no exif data, but (I apologise for being a newbie) when I right-click + Properties (I use Windows), I do see what looks like exif data to me: date of creation, etc.
Here is one image:
image.JPG
If another of the thousands of anonymous heroes could help me on this one, I would be very grateful...
Alright, so I found a solution, which I share now.
The problem is that the libraries that read metadata do not cover all possible configurations of the image file, so they can handle some files and not others. I finally made it work using exiftool, an executable that I downloaded on my Windows machine from this link:
https://sno.phy.queensu.ca/~phil/exiftool/
Then I pasted the executable into a folder and added exiftool.py to that folder, which I got from:
https://github.com/smarnach/pyexiftool/find/master
Then I used this small piece of code (for example):
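import exiftool

files = ["image.JPG"]  # paths of the images to inspect (example value)

# "exiftool.exe" is the executable placed in the same folder as this script
with exiftool.ExifTool("exiftool.exe") as et:
    metadata = et.get_metadata_batch(files)
for d in metadata:
    print("{:20.20} {:20.20}".format(d["SourceFile"],
                                     d["File:FileCreateDate"]))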
Of course, this is just to show that you can indeed access the metadata; then you can do whatever you want with it. Here is the documentation of the exiftool library: http://smarnach.github.io/pyexiftool/
Cheers, JM
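One way to check whether a file carries an Exif block at all, independent of any third-party library, is to walk the JPEG segment headers and look for an APP1 segment that starts with the Exif identifier. The sketch below does that with the standard library only. The name has_exif_segment is a hypothetical helper, and it only inspects the segments that come before the image data, which is where Exif is normally stored. Note that the File:FileCreateDate tag printed above is reported by ExifTool from the file system rather than from Exif, so a file may show a creation date in Windows while containing no Exif segment.

```python
import struct


def has_exif_segment(data: bytes) -> bool:
    """Return True if JPEG bytes contain an APP1 segment tagged 'Exif'.

    Hypothetical helper for illustration; it stops at the start of the
    image data and does not parse the Exif content itself.
    """
    if data[:2] != b"\xff\xd8":        # every JPEG starts with the SOI marker
        return False
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:          # not at a marker: malformed file
            return False
        marker = data[pos + 1]
        if marker == 0xFF:             # fill byte before a marker, skip it
            pos += 1
            continue
        if marker in (0xDA, 0xD9):     # start of scan or end of image
            return False
        # Segment length is big-endian and includes its own two bytes.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        if marker == 0xE1 and data[pos + 4:pos + 10] == b"Exif\x00\x00":
            return True
        pos += 2 + length
    return False
```

Calling it on the raw bytes of a file (opened in binary mode) tells you whether the empty dictionaries come from a file without Exif or from a library that fails to read an existing block.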

TinyTag import error python 3.3

I have been trying to import tinytag into python to be able to read mp3 tags, but I keep receiving the same error. This is the code I am running:
from tinytag import TinyTag
tag = TinyTag.get('/some/music.mp3')
print(tag.album)
and the error I receive from this is:
ImportError: No module named 'tinytag'
If anyone could give me any information on how to fix this, or suggest another reader that is compatible with python 3, it would be greatly appreciated.
Like you, I'm new to Python and I struggled with this, but I worked it out eventually. I'm sure there is a better way, but this worked (on Windows, with my examples).
I installed a python module called easy_install (bundled with setuptools). You can Google this. If this has worked, you should see an exe file called easy_install in the directory \Python26\Scripts.
Then I downloaded TinyTag to my PC, e.g.
\downloads\tinytag-0.6.1.tar.gz
Then in Notepad I wrote a small text file called myinstall.bat with the contents:
easy_install C:/downloads/tinytag-0.6.1.tar.gz
pause
Then I saved it into \Python26\Scripts and ran it (the pause keeps the window open so you can see it worked).
Subsequently I started using some software called JetBrains to code with (it's commercial but there is a free edition), and that has an install tool built in, which is even easier. I hope this helps.
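The ImportError means that the interpreter running the script cannot find the package, which often happens when the package was installed for a different Python installation than the one executing the code. A small check like the following, using only the standard library, prints which interpreter is running and whether it can see a given module. The name is_installed is a hypothetical helper, and importlib.util.find_spec needs Python 3.4 or later, so this is a sketch that assumes a newer interpreter than the 3.3 mentioned in the question.

```python
import importlib.util
import sys


def is_installed(module_name: str) -> bool:
    """Return True if the running interpreter can locate a top-level module."""
    return importlib.util.find_spec(module_name) is not None


# Show which Python is actually executing this script.
print("Interpreter:", sys.executable)
print("tinytag importable:", is_installed("tinytag"))
```

If the second line prints False, running the interpreter shown in the first line with the arguments -m pip install tinytag installs the package for that exact installation, assuming pip is available there.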
