Convert xva (citrix xenserver) image to .img (vanilla xen) image

Is there a tried and tested way to convert a Citrix XenServer .xva file to a vanilla Xen .img file?
I tried "qemu-img convert -O raw", but that didn't work.
Thanks

To summarize the .xva format:
It is a tar file.
It contains a folder.
The folder contains chunks of 1 MB each.
The chunks can be concatenated, but any blank space between them needs to be filled.
For the full procedure, please follow
http://wiki.sysconfig.org.uk/display/howto/Convert+Citrix+XenServer+images+to+plain+Xen
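
As a rough illustration of the concatenate-and-fill step described above, here is a minimal Python sketch. It is not taken from the linked page. It assumes the .xva has already been unpacked with tar, that the chunk files in the folder have purely numeric names giving their position (for example 00000000, 00000001), and that a missing number means a blank 1 MB block. The function name assemble_chunks and the paths are made up for the example.

```python
import os
import re

CHUNK_SIZE = 1024 * 1024  # 1 MB per chunk, as in the summary above


def assemble_chunks(chunk_dir, out_path, chunk_size=CHUNK_SIZE):
    """Write every numbered chunk at offset index * chunk_size in out_path.

    Assumption: chunk files have purely numeric names; anything else in
    the folder (for example checksum files) is ignored.
    Returns the size of the resulting image in bytes.
    """
    chunks = sorted(
        (int(name), name)
        for name in os.listdir(chunk_dir)
        if re.fullmatch(r"\d+", name)
    )
    if not chunks:
        raise ValueError("no numbered chunk files found in " + chunk_dir)

    with open(out_path, "wb") as out:
        for index, name in chunks:
            with open(os.path.join(chunk_dir, name), "rb") as chunk:
                data = chunk.read()
            # Seeking past the current end leaves a gap that reads back as
            # zeros, which is how the blank space gets filled.
            out.seek(index * chunk_size)
            out.write(data)
    return os.path.getsize(out_path)


if __name__ == "__main__":
    # Hypothetical paths: a folder extracted from the .xva and the target image.
    print(assemble_chunks("extracted/Ref:1", "disk.img"))
```

Note that this sketch only fills gaps between chunks. It does not restore any blank space after the last chunk, so the result may be shorter than the original virtual disk.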

Related

Tesseract Batch Convert Images to Searchable PDF And Multiple Corresponding Text Files

I’m using tesseract to batch convert a list of images to both a searchable PDF as well as a TXT file containing the OCRd text.
tesseract infile outfile -l eng myconfig
infile contains a list of image paths to process
myconfig contains tesseract preferences to specify the output types (tessedit_create_text 1 and tessedit_create_pdf 1)
This leaves me with outfile.pdf and outfile.txt, the latter of which contains page separators for delimiting text between images.
What I’m really looking to do, however, is to output multiple TXT files on a per-image basis, using the same corresponding image name. For example, Image1.jpg.txt, Image2.jpg.txt, Image3.jpg.txt...
Does tesseract have the option to support this behavior natively? I realize that I can loop through the image file list and execute tesseract on a per-image basis, but this is not ideal as I’d also have to run tesseract a second time to generate the merged PDF. Instead, I’d like to run both options at the same time, with less overall execution time.
I also realize that I can split the merged TXT file on the page separator into multiple text files, but then I have to introduce less elegant code to map and rename all of those split files to correspond to their original image names: Rename 0001.txt to Image1.jpg.txt...
I’m working with both Python 3 and Linux commands at my disposal.

You can prepare a batch file that loops through the input images and outputs both txt and pdf at the same time. This is more efficient: a single OCR operation instead of two. You can then split the output .txt file into pages.
tesseract inimagefile outfile txt pdf

Converting multiple images to a single PDF file.
On Linux, you can list all images and then pipe them to tesseract
ls *.jpg | tesseract - yourFileName txt pdf
Where:
yourFileName: is the name of the output file.
txt pdf: are the output formats, you can also use only one of them.
Converting images to individual text files
On Linux, you can use the for loop to go through files and execute an action for every file.
for FILE in *.jpg; do tesseract "$FILE" "${FILE::-4}"; done
Where:
for FILE in *.jpg : loop through all JPG files (you can change the extension based on your format)
"$FILE": is the name of the image file, e.g. 001.jpg (quoted so that file names containing spaces still work)
"${FILE::-4}": is the name of the image but without the extension, e.g. 001.jpg will be 001 because we removed the last 4 characters.
We need this to name the text files to the corresponding names, e.g.
001.jpg will be converted to 001.txt
002.jpg will be converted to 002.txt

Since Tesseract doesn't seem to handle this natively, I've just developed a function to split the merged TXT file on the page separator into multiple text files. Although from my observations, I'm not sure that Tesseract runs any faster by simultaneously converting batch images to both PDF and TXT (versus running it twice - once for PDF, and once for TXT).

Thank you!
BTW, I'm using 4.1.1.
I also discovered another traineddata file for the Spanish language that does a better job than the standard one. It actually recognizes the "o" character well. The only problem is the processing time, but I let the PC work overnight.
Honestly, I don't know how the new traineddata file does the job better. I downloaded it from:
https://github.com/tesseract-ocr/tessdata_best

How to generate pdf file of text and image in linux?

I am generating a log file on one of my servers.
It stores a lot of data, which I then send to my email once a month as a PDF file.
The process I am using is to 'cat' the output of a lot of commands into a text file, then convert it and send it.
Is there any Linux program or some easy way to do something similar and also add an image I have stored on the server to the PDF file?

This answer assumes that you just want to put the image at the end of the PDF.
You could first convert the image to a PDF using ImageMagick like this (this also works with other file types):
convert image.jpg image.pdf
Then, you can use a tool like stapler or pdftk to combine your generated text PDF and the image.pdf (you can add multiple images):
stapler cat text.pdf image.pdf combined.pdf
pdftk text.pdf image.pdf output combined.pdf

Possible EOL issue during unrar?

I have a large CSV file that I compressed with WinRAR on Windows 8.
After uploading the file to CentOS 7 x64 on my VPS, I used unrar. The file is exactly the size it should be after being uncompressed, but when I run my import script exactly half the number of rows are imported.
The same script works fine locally, so my hunch is that rar/unrar might be doing something to the EOLs?
Just wondered if anyone has experienced anything like this before?

Rar was not the issue at all. It was a column I was using in the CSV to index the data that was not always present.
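
A quick way to check for this kind of problem is to count how many rows have an empty value in the column used as the index. The Python sketch below is only an illustration: the column name "id" and the sample data are made up, and the real import script is not shown in this thread.

```python
import csv
import io


def count_missing_key(csv_text, key_column):
    """Return (total rows, rows where key_column is empty or absent)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    total = 0
    missing = 0
    for row in reader:
        total += 1
        if not (row.get(key_column) or "").strip():
            missing += 1
    return total, missing


if __name__ == "__main__":
    # Hypothetical sample: every second row lacks the index value.
    sample = "id,name\n1,a\n,b\n3,c\n,d\n"
    print(count_missing_key(sample, "id"))
```

If the second number is close to the number of rows that failed to import, the data is the likely cause rather than the archive tool.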

how to convert downloaded video file to text file

I have downloaded many video files, and they need to be converted to text files. Can anybody please help me convert a video file into text? This is very important for my project. Please suggest answers as soon as possible.
Thanks in advance

Generally, I recommend converting the file to Base64. On Linux you can use the base64 command-line tool to convert a binary file to a text representation.
base64 file.avi > file.txt
To decode from text file:
base64 -d file.txt > file.avi
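
The same round trip can be done from Python with the standard base64 module. This is a minimal sketch with made-up function names and the file names from the commands above. Like the command-line tool, it only produces a text encoding of the bytes, not a transcript of what is said in the video.

```python
import base64


def encode_file(src_path, dst_path):
    """Write a Base64 text representation of src_path to dst_path."""
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        base64.encode(src, dst)


def decode_file(src_path, dst_path):
    """Restore the original binary file from its Base64 text."""
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        base64.decode(src, dst)


if __name__ == "__main__":
    encode_file("file.avi", "file.txt")
    decode_file("file.txt", "file_restored.avi")
```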

If it's a YouTube video, you can just go to "More" next to "Save" and click "Open transcript". I found it pretty easy. Then you can copy and paste it into a text file.

Corrupt image file after uploading it using ftp command from linux, with type ASCII

I tried to transfer an image file using the ftp command in Linux, from a Linux platform to a Windows platform, like this:
ftp cs.unitbv.ro
ascii
get test.jpg
After this, when I open the image, it says that the file is corrupted. If somebody knows how to repair the image file, let me know.
Thanks

Transfer it as binary; otherwise it will get corrupted, for example because newlines are converted.

Try transferring it using binary mode.
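
In the command-line client, that means typing binary instead of ascii before get. To show why ASCII mode damages an image, here is a small Python sketch. The helper ascii_mode_mangle only simulates one typical newline conversion (LF to CRLF) and is not what any particular server does exactly. download_binary uses ftplib from the standard library; the login details are placeholders.

```python
from ftplib import FTP


def ascii_mode_mangle(data):
    """Simulate a text-mode transfer that rewrites LF as CRLF."""
    return data.replace(b"\n", b"\r\n")


def download_binary(host, remote_name, local_path, user="anonymous", password=""):
    """Fetch a file in binary (image) mode so its bytes arrive unchanged."""
    with FTP(host) as ftp:
        ftp.login(user, password)
        with open(local_path, "wb") as out:
            # retrbinary switches the transfer to binary mode before RETR.
            ftp.retrbinary("RETR " + remote_name, out.write)


if __name__ == "__main__":
    # A JPEG-like byte string that happens to contain the byte 0x0A (LF).
    sample = b"\xff\xd8\xff\xe0\n\x00\x10JFIF"
    print(ascii_mode_mangle(sample) == sample)  # False: the bytes changed
    # Host and file name from the question; uncomment to try a real transfer.
    # download_binary("cs.unitbv.ro", "test.jpg", "test.jpg")
```

Any byte in the image that happens to look like a newline gets rewritten, which is why a file transferred in ASCII mode generally cannot be relied on and should be transferred again in binary mode.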
