"Convert" creates a subfolder "_dirname" when converting .doc - linux

I am trying to create thumbnails of every page of a Word document (*.doc) on an Ubuntu server using "convert". However, it creates a subfolder named "_dirname" and puts a PDF generated from the .doc there. Then, when it tries to create the thumbnails from that PDF, it cannot find it, because it looks in /tmp instead of my working directory.
convert /var/www/test/test.doc /var/www/test/test.png
=>
convert /tmp/magick-241835c58j2ZyO2oP -> /var/www/test/_dirname/magick-241835c58j2ZyO2oP.pdf using filter : writer_pdf_Export
mv: Error '/tmp/magick-241835c58j2ZyO2oP.pdf' not found
convert: delegate failed `"soffice" --headless --convert-to pdf --outdir `dirname "%i"` "%i" 2> "%Z"; mv "%i.pdf" "%o"' # error/delegate.c/InvokeDelegate/1310.
convert: unable to open image `/tmp/magick-24183JEbOOHLGHxoL': File not found # error/blob.c/OpenBlob/2712.
convert: unable to open file `/tmp/magick-24183JEbOOHLGHxoL': File not found # error/constitute.c/ReadImage/540.
convert: no images defined `doctest.png' # error/convert.c/ConvertImageCommand/3210.
Any ideas how to fix this or what I am doing wrong?
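One possible workaround is to skip ImageMagick's soffice delegate and run the two steps explicitly: first .doc to PDF with soffice, then PDF to PNG with convert. The sketch below only reuses flags that appear in the delegate line of the error output (--headless, --convert-to pdf, --outdir). The function names and the injectable run parameter are hypothetical, and it is an assumption that running the steps separately avoids the "_dirname" problem on this setup.

```python
import os
import subprocess


def doc_to_pdf_command(doc_path, out_dir):
    # Same soffice flags as in the ImageMagick delegate, but with an explicit output directory.
    return ["soffice", "--headless", "--convert-to", "pdf", "--outdir", out_dir, doc_path]


def pdf_to_png_command(pdf_path, png_path):
    # Plain ImageMagick call on the PDF that soffice produced.
    return ["convert", pdf_path, png_path]


def make_thumbnails(doc_path, run=subprocess.run):
    # Keep every intermediate file next to the source document instead of /tmp.
    out_dir = os.path.dirname(os.path.abspath(doc_path))
    base = os.path.splitext(os.path.basename(doc_path))[0]
    pdf_path = os.path.join(out_dir, base + ".pdf")
    png_path = os.path.join(out_dir, base + ".png")
    run(doc_to_pdf_command(doc_path, out_dir), check=True)
    run(pdf_to_png_command(pdf_path, png_path), check=True)
    return pdf_path, png_path
```

Calling make_thumbnails("/var/www/test/test.doc") would run both commands in /var/www/test and raise subprocess.CalledProcessError if either step fails.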

Related

Why does the -f switch of the Kaggle API download not recognize the file name passed to it as a string?

I want to extract a subset of image files from the Kaggle dataset 'hpa-single-cell-image-classification'. I tried to use the Kaggle API.
Using the command below, when I download an individual image, it downloads fine:
!kaggle competitions download -c hpa-single-cell-image-classification -f /train/5c27f04c-bb99-11e8-b2b9-ac1f6b6435d0_blue.png
But when I try to do the same in a loop (kaggle_img_names.csv contains the names of the images):
with open('kaggle_img_names.csv','r') as fh:
    data=fh.readlines()
data=[item.strip() for item in data]
data=data[1:10]
for file in data:
    print(file)
    !kaggle competitions download -c hpa-single-cell-image-classification -f file
it shows 404 - file not found.
I have also noticed that when the file name is wrapped in quotes, the API says file not found:
!kaggle competitions download -c hpa-single-cell-image-classification -f '/train/5c27f04c-bb99-11e8-b2b9-ac1f6b6435d0_blue.png'
How can I pass the name of the file to the API so that the API processes it? More than downloading the images, I want to know why the -f switch of the API does not recognize the string object (file name) passed to it. What is the type of the object passed to the -f switch? Is it something other than a string?
Thanks in advance !
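For context: in a Jupyter/IPython "!" line, the text is handed to the shell as written, so "-f file" passes the literal word file rather than the value of the Python variable. IPython only substitutes a variable when it is written as {file} or $file. A minimal sketch that avoids the interpolation question by using subprocess with an argument list is shown here. The helper names are hypothetical, and it assumes the kaggle command-line tool is installed and configured.

```python
import subprocess

COMPETITION = "hpa-single-cell-image-classification"


def build_download_command(file_name, competition=COMPETITION):
    # Each list element is passed to the program as one argument, with no shell quoting involved.
    return ["kaggle", "competitions", "download", "-c", competition, "-f", file_name]


def read_file_names(csv_path, start=1, stop=10):
    # Mirrors the loop in the question: strip newlines, skip the header row, take the next nine names.
    with open(csv_path, "r") as fh:
        names = [line.strip() for line in fh]
    return names[start:stop]


def download_all(csv_path):
    for name in read_file_names(csv_path):
        print(name)
        subprocess.run(build_download_command(name), check=False)
```

In a notebook cell, the equivalent one-line change would be to write -f {file} instead of -f file.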

How to define the condition of a corrupted file for audio file in Python

I am using Python 3.6 in a Jupyter notebook connected to a remote machine. I have a large dataset of mp3 files. I use FFmpeg (version 2.8.14-0ubuntu0.16.04.1) to convert mp3 files to wav format.
My code below goes over the file path list and, if a file is an mp3, converts it to wav format and deletes the mp3 file. The code works, but for a few files it stops and gives an error. I opened those files and saw that they have no duration, and each of them has size 600 in the size column of the terminal folder listing, but that might be a coincidence. The error is file not found for 'temp_name.wav'.
I can see that these corrupted files cannot be converted to wav. When I delete them manually and run the code again, it works. But I have large datasets and cannot know beforehand which files are corrupted. Is there a way to make the code check, before converting a file to wav, whether it is corrupted, and if so delete it and continue to the next file? I just don't know how to define the condition of a corrupted file, or of a file that cannot be converted to wav.
import os

# npaths is the list of full file paths
for fpath in npaths:
    if fpath.endswith(".mp3"):
        cdir=os.path.dirname(fpath) # extract the directory of file
        os.chdir(cdir) # change the directory to cdir
        filename=os.path.basename(fpath) # extract the filename from the path
        os.system("ffmpeg -i {0} temp_name.wav".format(filename))
        ofnamepath=os.path.splitext(fpath)[0] # filename without extension
        temp_name=os.path.join(cdir, "temp_name.wav")
        new_name = os.path.join(ofnamepath+'.wav')
        os.rename(temp_name,new_name) # use original filename with wav ext
        old_file = os.path.join(ofnamepath+'.mp3') # find and delete the mp3
        os.remove(old_file)
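One way to define "corrupted" without inspecting the audio is to treat a file as bad whenever ffmpeg exits with a non-zero status or leaves no usable output file. The sketch below takes that approach. The function names and the injectable run parameter are hypothetical, and it is an assumption that a non-zero exit code or an empty wav is a good enough signal for this dataset. It keeps the unconverted mp3 and returns None, so the caller can decide whether to delete or just log it.

```python
import os
import subprocess


def conversion_succeeded(returncode, wav_path):
    # Failed if ffmpeg reported an error, or wrote nothing, or wrote an empty file.
    return returncode == 0 and os.path.isfile(wav_path) and os.path.getsize(wav_path) > 0


def convert_or_skip(mp3_path, run=subprocess.run):
    wav_path = os.path.splitext(mp3_path)[0] + ".wav"
    # -y overwrites an existing output file; a list of arguments avoids problems with spaces in names.
    result = run(["ffmpeg", "-y", "-i", mp3_path, wav_path],
                 stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    if conversion_succeeded(result.returncode, wav_path):
        os.remove(mp3_path)  # delete the source only after a successful conversion
        return wav_path
    if os.path.exists(wav_path):
        os.remove(wav_path)  # drop any partial output
    return None
```

A loop over npaths could then call convert_or_skip(fpath) for every path ending in ".mp3" and collect the paths for which it returns None.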

How to convert PDF to DOCX on linux

I am trying to convert a PDF file to Word, Excel and PowerPoint formats.
I have already tried a lot of commands like these:
soffice -env:UserInstallation=file:///$HOME/.libreoffice-headless/ --convert-to docx:"Microsoft Word 2007/2010/2013 XML" file.pdf
/usr/bin/soffice --headless --invisible --convert-to docx file.pdf
soffice --infilter="writer_pdf_import" --convert-to doc file.pdf
/usr/bin/libreoffice --headless --invisible --convert-to doc file.pdf
/usr/bin/soffice --headless --convert-to docx:"Microsoft Word 2007/2010/2013 XML" file.pdf
abiword --to=doc file.pdf
unoconv -f doc file.pdf
lowriter --invisible --convert-to doc 'file.pdf'
I always get this error message from soffice/libreoffice/unoconv:
:1: parser error : Document is empty
%PDF-1.7
And this one from abiword:
Unable to init server: Could not connect: Connection refused
** (abiword:6477): WARNING **: clutter failed 0, get a life.
Unable to init server: Could not connect: Connection refused
With every command except abiword, I got a doc file with bad characters inside, but never a proper file.
I am trying to create a file converter, so I only want a command-line method. I don't want to use someone else's API.
Thank you
Managed to do it with soffice.
I had to install this package: libreoffice-pdfimport
And don't forget to use --infilter="writer_pdf_import"
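Since the goal is a file converter, the working soffice call can be wrapped in a small helper. This is a sketch under the assumptions stated in this answer: libreoffice-pdfimport is installed and soffice is on the PATH. The function name and the out_dir parameter are hypothetical. The --outdir flag is a standard soffice option, and --headless is taken from the commands in the question.

```python
import subprocess


def pdf_to_docx_command(pdf_path, out_dir="."):
    # --infilter selects the PDF import filter provided by libreoffice-pdfimport.
    return [
        "soffice", "--headless",
        "--infilter=writer_pdf_import",
        "--convert-to", "docx",
        "--outdir", out_dir,
        pdf_path,
    ]


def pdf_to_docx(pdf_path, out_dir="."):
    # Raises subprocess.CalledProcessError if soffice exits with an error.
    subprocess.run(pdf_to_docx_command(pdf_path, out_dir), check=True)
```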
Linux has a few apps that can import a pdf as an image: LibreOffice, Okular, Calibre.
But if you want editable text, then you need to install the pdf toolkit pdftk, then run the conversion utility pdf2txt. The terminal command is:
pdf2txt input.pdf output.txt
Thereafter, import the txt file into a word processor and complete the final editing/formatting.

How to convert BCP exported .xls file to actual Excel format file?

I have a batch file with the statement below. The export works fine and the results are in the file Corp.xls. However, when I try to open this file, I get the warning 'The file you are trying to open is in a different format than that specified by the file extension ..........
When I open the file and try to 'Save As', I find that it is in Text (tab-delimited) format.
Is there any way to convert such a file to Excel without having to open the file, i.e. from the batch file?
Note: The batch file is very complex. Given below is just a modified snippet.
BCP "exec DBname.dbo.sp_abc '201503' " queryout "\\ABC\3_MAR\Corp.xls" -T -c -S SCC-SLDB
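The -c switch makes BCP write character data, which is why the file is tab-delimited text regardless of its .xls extension. Producing a true Excel workbook needs a library or Excel itself, but if a .csv file is acceptable (an assumption, not something the question confirms), the export can be rewritten with Python's standard csv module and then opens in Excel without the format warning. The function name and the encoding default below are hypothetical. The encoding of a real BCP export depends on the code page in use.

```python
import csv


def tab_delimited_to_csv(src_path, dst_path, encoding="utf-8"):
    # Read the BCP output as tab-separated fields and rewrite it as comma-separated values.
    rows_written = 0
    with open(src_path, "r", newline="", encoding=encoding) as src, \
         open(dst_path, "w", newline="", encoding=encoding) as dst:
        reader = csv.reader(src, delimiter="\t", quoting=csv.QUOTE_NONE)
        writer = csv.writer(dst)  # quotes fields that contain commas
        for row in reader:
            writer.writerow(row)
            rows_written += 1
    return rows_written
```

A batch file could call a script containing this function right after the BCP line, passing the exported file and a target name ending in .csv.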

Unoconv and LibreOffice command line .svg conversion error

I'm trying to convert .png to .svg files using unoconv. The command line tool seems to work well with other formats, but is giving me the following error for conversion to svg files specifically:
$ unoconv -f svg ./sample.png
Unable to store document to file:///sample.svg (ErrCode 3088)
I've successfully used the tool with other formats, and the unoconv page even indicates that the .svg output format is supported.
I thought the issue might have something to do with the libreoffice used by unoconv, so I tried using the libreoffice command line tool directly. I used both of the following commands with no success:
./soffice --headless --invisible --convert-to svg --outdir ./result ./sample.png
./soffice --headless --invisible --convert-to svg:"impress_svg_Export" --outdir ./result ./sample.png
Both commands resulted in Error: Please reverify input parameters..., although the first command worked perfectly when jpeg was used rather than svg.
