PDF-to-text conversion fails depending on the PDF file's version (Linux)

I'm working on Ubuntu and trying to parse PDF files to extract text from them. I managed to get this working (using tesseract, for example), BUT when I get a PDF file of version 1.7, the conversion doesn't work (I get a blank page in my 'name.txt' file).
So I was wondering if anyone knows some magic that can solve this PDF version problem...
I looked pretty much everywhere I could on the web without finding similar issues, so I came to y'all.
Hope you'll find a way to help me, because Google hasn't been much of a friend so far...
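
One thing worth checking first is which version the file actually declares. Below is a minimal sketch in Python, assuming the file can be read from disk. The function names pdf_header_version and read_pdf_version are made up for illustration. It only reads the %PDF-x.y marker at the start of the file. A PDF can also override that version in its document catalog, which this sketch does not inspect, so treat the result as a hint rather than proof that the version is what breaks the conversion.

```python
import re
from typing import Optional

# Matches the "%PDF-x.y" marker that opens a PDF file.
_HEADER = re.compile(rb"%PDF-(\d+)\.(\d+)")


def pdf_header_version(data: bytes) -> Optional[str]:
    """Return the version declared in the PDF header (e.g. '1.7'), or None."""
    # The marker normally sits at byte 0, but some files carry a few
    # stray bytes before it, so search the first 1024 bytes.
    match = _HEADER.search(data[:1024])
    if match is None:
        return None
    return f"{int(match.group(1))}.{int(match.group(2))}"


def read_pdf_version(path: str) -> Optional[str]:
    """Read the start of the file at `path` and return its header version."""
    with open(path, "rb") as handle:
        return pdf_header_version(handle.read(1024))
```

Comparing the result for a file that converts fine with one that produces a blank name.txt should show whether the declared version really differs between them.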

Related

Trouble decoding an encoded game cheat file

So I noticed someone was cheating in a game I play, and I wanted the file to analyse how the cheats were written and how they work. I do not intend to use them!
I downloaded the zipped file in Windows Sandbox (as I want to be safe), extracted it, and then opened it in VS Code.
I got this message first
then when I continue, the code looks like this
I am not very experienced, so any help with how I can decode this, or with any errors I made, would be appreciated. Thanks :)

ImageMagick issue on AppEngine Standard (PDFs and NodeJS)

I am using App Engine Standard. Since ImageMagick is available on it, I tried a few PDF manipulation libraries. Basically, what I would like to do is simply convert a PDF into an image.
The issue I am getting is this:
'convert-im6.q16: not authorized /tmp/ygM1sF-Txq00JkGbpal8YWBQ.pdf\' # error/constitute.c/ReadImage/412.\n
convert-im6.q16: no images defined /tmp/ygM1sF-Txq00JkGbpal8YWBQ-0.png\' # error/convert.c/ConvertImageCommand/3258.\n' }
After some research, I found this post: Fix for ImageMagick convert errors with pdf files. Here is what it says:
PDF files on Linux systems are usually handled by ghostscript (via the
terminal command gs). And, ImageMagick (done through the terminal
convert command) uses ghostscript for reading and writing PDF files.
Because the security problems are serious and numerous, ImageMagick’s
access to PDF files is then cut off.
Granted, through these security flaws in PDF someone could craft a
malicious image file that, when converted by ImageMagick into a PDF,
will then do very nasty things to your computer.
But, ghostscript has since been updated once and once again with
security fixes. How about a fix for ImageMagick to get PDF
functionality back? Or, at least an explanation of progress towards
fixing this issue?
I can't change the ImageMagick configuration on App Engine Standard, but I wonder if there is something else I can do. Or maybe the engineers at Google would be able to update ImageMagick instead and remove that limitation?
I really need to convert PDFs into images, so I wonder if it is worth waiting, or if I need to find another solution.
Thanks for your ideas.
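
Since the quoted post says ImageMagick hands PDF work to ghostscript anyway, one option might be to call gs directly and skip convert. Below is a hedged sketch in Python (the same argument list could be spawned from NodeJS). It assumes a gs binary exists and may be executed inside the App Engine Standard runtime, which is not confirmed here. The helper names ghostscript_png_command and pdf_to_png are hypothetical.

```python
import subprocess
from typing import List


def ghostscript_png_command(pdf_path: str, out_pattern: str, dpi: int = 150) -> List[str]:
    """Build a gs command that renders each PDF page to a PNG file."""
    return [
        "gs",
        "-dSAFER",                       # restrict file access while rendering
        "-dBATCH",                       # exit after the last page
        "-dNOPAUSE",                     # do not prompt between pages
        "-sDEVICE=png16m",               # 24-bit colour PNG output
        f"-r{dpi}",                      # render resolution in dpi
        f"-sOutputFile={out_pattern}",   # e.g. /tmp/page-%03d.png
        pdf_path,
    ]


def pdf_to_png(pdf_path: str, out_pattern: str, dpi: int = 150) -> None:
    """Run ghostscript; raises CalledProcessError if gs fails."""
    subprocess.run(ghostscript_png_command(pdf_path, out_pattern, dpi), check=True)
```

Because this never goes through convert, the ImageMagick policy that produces the "not authorized" error is not consulted. Whether the platform allows it is a separate question.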

Is there a way to recover a severely corrupted Excel file?

I'm currently working on my data using MS Excel.
But suddenly the file I was working in broke. When I tried to open it again, I got an error, and the file contains "ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿ" (see picture).
Screenshot: corrupted Excel file
The file is not very big, only around 130 KB. Besides, AutoRecover was turned on and set to every 10 minutes, but I don't know how to use it.
I tried to solve the problem many times using the methods below:
First, I tried checking the "Ignore .... " option in Excel Options.
Second, I tried Open and Repair on the file, but it still can't be opened.
Third, I tried third-party software, using two tools (Stellar Phoenix Excel Repair and Recovery Tools for Excel), but neither worked.
So I wonder if anyone can help me get my data back. Is there any way to recover the file, or a way to get at the data within it?
Thank you for all the suggestions you gave me. Following the last comment, I tried to extract the data using a programmer's editor and using VBA, but neither gave a good result. Basically, when you try to extract the data from the corrupted file, all you get is a single unusual character.
So here is my solution: since I'm using Windows, I tried reinstalling MS Office and recovering my Excel file on a Mac, and boom, the magic worked.
Once again, thanks for your help.

Converting an .ichat file into text format using AppleScript or any other script

I recently received several .ichat archived logs.
I am using Windows 7 and I wanted to know if there was any way to view or convert these files so I would be able to read them on my PC.
I've tried using different formats such as HTML and txt.
I have had no luck seeing the contents of the files. I searched all over the internet without success, so this is my last resort.
Please respond if you know any way that I can open these files.

Converting HTML to odt, doc, docx

Is there an easy way to convert HTML (with CSS styles and embedded images) to ODT, DOCX, or DOC from the command line on a Linux server? I searched a lot but have not found a good option.
I had the same problem with converting to PDF, which I solved with wkhtmltopdf. Perhaps there are ways to convert the resulting PDF documents to other formats?
To convert to odt it's pretty easy after installing pandoc.
After that comes the relatively hard part: from odt (or even html) you can script (Open|Libre)Office via e.g. unoconv.
Or you can do something like:
abiword --to=doc filename.odt
Also see this thread, and this blog post.
HTH
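
The two steps above (pandoc to get odt, then abiword to get doc) can be chained from a script. Below is a rough sketch in Python, assuming both pandoc and abiword are installed and on the PATH. The helper names pandoc_command, abiword_command and html_to_doc are invented for illustration, and how well CSS styles and embedded images survive the conversion is not something this sketch checks.

```python
import subprocess
from typing import List


def pandoc_command(html_path: str, out_path: str) -> List[str]:
    """Build a pandoc call; the output format is inferred from out_path's extension."""
    return ["pandoc", html_path, "-o", out_path]


def abiword_command(odt_path: str, target: str = "doc") -> List[str]:
    """Build the abiword call from the answer above: abiword --to=doc filename.odt"""
    return ["abiword", f"--to={target}", odt_path]


def html_to_doc(html_path: str, odt_path: str) -> None:
    """Convert HTML to odt with pandoc, then odt to doc with abiword."""
    subprocess.run(pandoc_command(html_path, odt_path), check=True)
    subprocess.run(abiword_command(odt_path), check=True)
```

For docx alone the second step should not be needed, since pandoc can write that format directly when the output name ends in .docx.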
If you want to convert HTML into docx you may use a solution like PHPDocX. You need to get the PRO version though because the free one does not include the conversion functionality.
If you're on Ruby, there is a gem based on headless LibreOffice (with pyod/jod converter) and pdf tools.
Post with your issues to the pandoc GoogleGroup, John is very responsive in every way.
You may even find that the latest release, v1.9, fixes your problem, or maybe you just need to get to know the toolset in more detail.
I found a solution: AbiWord in its console variant.

Resources