I want to convert .mnc files from BrainWeb (https://brainweb.bic.mni.mcgill.ca/brainweb/anatomic_normal_20.html) to .mha file format for use in TumorSim (https://www.nitrc.org/projects/tumorsim/).
I have tried converting the file from .mnc to .nii using nibabel and mnc2nii, and then converting the .nii file to the .mha format.
However, this process leads to a dramatic increase in file size (from 56.9 MB .mha to 56.9~227.5 MB .nii, depending on the output voxel format).
From there, converting the .nii file to the .mha format retains the same file size. The .mha files used in TumorSim are around 4.8 MB.
Objective: I want a one-step solution to convert .mnc files to .mha files.
Code:
import SimpleITK as sitk
inputImageFileName = 'subject04_wm_v.mnc'
outputImageFileName = 'white_matter.mha'
reader = sitk.ImageFileReader()
reader.SetImageIO("MINCImageIO")
reader.SetFileName(inputImageFileName)
image = reader.Execute()
writer = sitk.ImageFileWriter()
writer.SetFileName(outputImageFileName)
writer.Execute(image)
Output:
(py3env) russ#russ-Latitude-E5450:~/Documents/Testing_Space/ITK$ python mncconverter.py
/tmp/SimpleITK-build/ITK/Modules/ThirdParty/MINC/src/libminc/libsrc2/volume.c:1399 (from MINC): Unable to open file 'subject04_wm_v.mnc'
Traceback (most recent call last):
File "mncconverter.py", line 9, in <module>
image = reader.Execute()
File "/home/russ/Documents/freesurfer/psacnn_brain_segmentation/py3env/lib/python3.6/site-packages/SimpleITK/SimpleITK.py", line 8654, in Execute
return _SimpleITK.ImageFileReader_Execute(self)
RuntimeError: Exception thrown in SimpleITK ImageFileReader_Execute: /tmp/SimpleITK-build/ITK/Modules/IO/MINC/src/itkMINCImageIO.cxx:322:
itk::ERROR: MINCImageIO(0x2de7600): Could not open file "subject04_wm_v.mnc".
After trying to load a BrainWeb image myself, I found this web page that describes the problem:
https://www.slicer.org/wiki/How_to_read_.mnc_files_using_ITK
The issue is that BrainWeb images are stored in MINC 1 format, while ITK/SimpleITK read MINC 2.
There is a utility, mincconvert, that converts from MINC 1 to MINC 2, which will allow ITK to read the images:
http://bic-mni.github.io/man-pages/man/mincconvert.html
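As a rough sketch of that first option, assuming the MINC tools are installed and mincconvert is on the PATH, the two steps could be chained from Python. The function names and the intermediate file argument are my own, and compression is switched on only because the question was concerned with file size:

```python
import subprocess


def build_mincconvert_command(src, dst):
    """Return the argument list that asks mincconvert for MINC 2 output."""
    # "-2" requests MINC 2.0 format output (see the man page linked above).
    return ["mincconvert", "-2", str(src), str(dst)]


def mnc1_to_mha(src_mnc, tmp_mnc2, dst_mha):
    """Convert a MINC 1 file to MHA via an intermediate MINC 2 file."""
    # Step 1: MINC 1 -> MINC 2 with the external tool.
    subprocess.run(build_mincconvert_command(src_mnc, tmp_mnc2), check=True)

    # Imported here so the command builder works without SimpleITK installed.
    import SimpleITK as sitk

    # Step 2: read the MINC 2 file the same way the question does.
    reader = sitk.ImageFileReader()
    reader.SetImageIO("MINCImageIO")
    reader.SetFileName(str(tmp_mnc2))
    image = reader.Execute()

    # Step 3: write MHA; the third argument enables compression.
    sitk.WriteImage(image, str(dst_mha), True)
```

Something like mnc1_to_mha('subject04_wm_v.mnc', 'subject04_wm_v2.mnc', 'white_matter.mha') would then do the whole conversion in one call.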
The other option is to download the images from BrainWeb in the raw format and then create an MHA header with the proper dimensions. MHA's header is plain text, so that wouldn't be too hard either.
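For the raw-data route, a minimal sketch of generating a MetaImage header that points at a separate raw file might look like this. The helper name is made up, and the dimensions, spacing, element type and byte order all have to come from what BrainWeb reports for the downloaded volume; nothing here is taken from the BrainWeb files themselves:

```python
def metaimage_header(dim_size, spacing, element_type, raw_filename, msb=False):
    """Build the text of a MetaImage header that refers to a raw data file."""
    lines = [
        "ObjectType = Image",
        f"NDims = {len(dim_size)}",
        "BinaryData = True",
        # Byte order of the raw file; only matters for multi-byte voxel types.
        f"BinaryDataByteOrderMSB = {msb}",
        "DimSize = " + " ".join(str(n) for n in dim_size),
        "ElementSpacing = " + " ".join(str(s) for s in spacing),
        f"ElementType = {element_type}",
        # ElementDataFile has to be the last field in the header.
        f"ElementDataFile = {raw_filename}",
    ]
    return "\n".join(lines) + "\n"
```

Saving the returned text next to the raw file (with a detached header the usual extension is .mhd) gives ITK-based tools something they can open, and the result can then be rewritten as a single .mha file. A call would look like metaimage_header((nx, ny, nz), (sx, sy, sz), "MET_UCHAR", "volume.raw"), with the placeholders filled in from the download page.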
It looks like you might have a permissions or path problem: SimpleITK can't seem to find the file. Try checking the file's permissions and using a full path name.
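A quick way to check both things from Python, as a small sketch (the helper name is my own):

```python
import os
from pathlib import Path


def describe_input_file(file_name):
    """Return the absolute path plus whether the file exists and is readable."""
    path = Path(file_name).expanduser().resolve()
    return {
        "path": str(path),                     # full path the name resolves to
        "exists": path.is_file(),              # False if the name or directory is wrong
        "readable": os.access(path, os.R_OK),  # False on a permissions problem
    }
```

Printing describe_input_file('subject04_wm_v.mnc') from the same directory the script runs in shows which file the relative name actually points at.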
Here's a little test program I wrote to check the MNC IO:
import SimpleITK as sitk
img = sitk.GaussianSource(sitk.sitkFloat32, [64,64,64])
sitk.WriteImage(img, "test.mnc")
img2 = sitk.ReadImage("test.mnc")
print(img2)
sitk.Show(img2)
It worked OK for me.
I am using Keras OCR and PyTesseract and was wondering if it is possible to use PDF files as the image input.
If not, does anyone have a suggestion as to how to convert a very massive PDF file into PNG or another acceptable format?
Thank you!
No, as far as I know PyTesseract works only with images. You'll need to convert your pdf to images first.
By "very massive PDF" I'm assuming you mean a PDF with lots of pages. This is not an issue. You can use the pdf2image library (see its documentation). The convert_from_path function has an output_folder argument that lets you specify the folder where all the generated images will be saved. The documentation describes it as follows:
Output directory for the generated files, should be seen more as a
“working directory” than an output folder. The converted images will
be written there to save system memory.
You can later use them one by one instead of your pdf to work with PyTesseract. If you don't assign the returned list of images from convert_from_path you don't risk filling up your memory.
Otherwise, if you are willing to keep everything in memory you can use the returned pages directly, like so:
pages = convert_from_path(pdf_path)
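To make the on-disk variant concrete, here is a sketch that converts the PDF into image files in a working folder and then OCRs them one at a time. The function names are mine, and the two small wrappers assume pdf2image (with poppler), Pillow and pytesseract are installed:

```python
from pathlib import Path


def ocr_pdf(pdf_path, work_dir, convert, ocr):
    """OCR a PDF page by page.

    `convert(pdf_path, work_dir)` must return a list of image file paths;
    `ocr(image_path)` must return the text found in one image.
    """
    Path(work_dir).mkdir(parents=True, exist_ok=True)
    texts = []
    for image_path in convert(pdf_path, work_dir):
        # Only one page image is handled at a time.
        texts.append(ocr(image_path))
    return texts


def convert_with_pdf2image(pdf_path, work_dir):
    """Write one PNG per page into work_dir and return the file paths."""
    from pdf2image import convert_from_path

    # paths_only=True returns file names instead of loaded PIL images.
    return convert_from_path(pdf_path, output_folder=work_dir, fmt="png", paths_only=True)


def ocr_with_pytesseract(image_path):
    """Run Tesseract on a single image file."""
    import pytesseract
    from PIL import Image

    with Image.open(image_path) as page:
        return pytesseract.image_to_string(page)
```

With those in place, ocr_pdf(pdf_path, "pages", convert_with_pdf2image, ocr_with_pytesseract) returns one string per page ("pages" is just an example folder name).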
For example, here is my code (Python 3.9, macOS Big Sur):
import cv2
import pytesseract
from PIL import Image
from fonctions_images import *
from pdf2image import convert_from_path
path = '/Users/yves/documents_1/'
fichier = path + 'TOUTOU.pdf'
# render every page at 500 dpi, in grayscale
images = convert_from_path(fichier, 500, transparent=True, grayscale=True, poppler_path='/usr/local/Cellar/poppler/21.12.0/bin')
for image in images:
    image.save(path + "image.png", format="png")
    test = path + "image.png"
    img = cv2.imread(test)  # load the page image into memory
    img = del_lines(path, img)  # remove the lines (helper from fonctions_images)
    img = cv2.imread(path + "img_final_bin_1.png")
    pytesseract.pytesseract.tesseract_cmd = "/usr/local/bin/tesseract"
    # custom_config is not defined in this snippet; it is assumed to come from fonctions_images
    d = pytesseract.image_to_data(img[3820:4050, 2340:4000], lang='fra', config=custom_config, output_type='data.frame')
I'm reading a TIFF file using OpenSlide. Because the file is large, I plan to read the image in regions of 4k x 4k using the read_region() function. For each region, I want to run the same processing I had planned for the complete TIFF file. That processing needs the region as an OpenSlide object, so that I can use the OpenSlide parameters.
I tried to open the selected region with OpenSlide again, as follows:
wsi = wsi.read_region((0,0),0,(4000,4000))
wsi = openslide.OpenSlide(wsi)
The issue is that I could not use the parameters I usually get when reading a TIFF file with OpenSlide. Does anyone know a way to solve this?
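One thing worth noting here: read_region() returns a PIL image rather than a file, while openslide.OpenSlide() expects a file name, so the region cannot be reopened that way. A possible approach, sketched below with made-up helper names, is to keep the original slide object under its own name (so its properties stay available) and iterate over 4k tiles at level 0:

```python
def tile_origins(width, height, tile_size=4000):
    """Yield (x, y, w, h) for tiles that cover a width x height image."""
    for y in range(0, height, tile_size):
        for x in range(0, width, tile_size):
            # Tiles on the right and bottom edges may be smaller.
            yield x, y, min(tile_size, width - x), min(tile_size, height - y)


def iter_regions(slide, tile_size=4000):
    """Yield ((x, y), region) pairs read from an open OpenSlide object."""
    width, height = slide.dimensions  # level-0 size
    for x, y, w, h in tile_origins(width, height, tile_size):
        # read_region returns an RGBA PIL image for the requested window.
        yield (x, y), slide.read_region((x, y), 0, (w, h))
```

Inside the loop, the slide's own metadata (for example slide.properties or slide.level_dimensions) can still be read from the original object, while each region is processed as an ordinary image.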
I cannot load a CSV file using the NumPy loadtxt function. There must be something wrong with the file format or something else. I am using an Anaconda notebook on a MacBook.
OSError: Macintosh HD\\Users\\binhao\\Downloads\\Iris_data.csv not found.
np.loadtxt("Macintosh HD\\Users\\binhao\\Downloads\\Iris_data.csv")
I tried a solution I found on Stack Overflow that involved using:
f = open(u"Macintosh HD\\Users\\binhao\\Downloads\\Iris_data.csv")
f = open("Macintosh HD\\Users\\binhao\\Downloads\\Iris_data.csv")
Neither of the above works; both give a "No such file or directory" error.
Most of the time this is due to a non-escaped character. Try using a raw string:
r"Macintosh HD\Users\binhao\Downloads\Iris_data.csv"
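It may also be worth checking the path itself: on macOS, paths start at / and use forward slashes, so a path beginning with "Macintosh HD" and using backslashes is unlikely to resolve however it is quoted. A small sketch that builds the path with pathlib and fails early with a clear message (the helper name and the skip_header_rows parameter are my own):

```python
from pathlib import Path

import numpy as np


def load_numeric_csv(csv_path, skip_header_rows=0):
    """Load a purely numeric, comma-separated file with np.loadtxt."""
    path = Path(csv_path).expanduser()  # expands a leading "~" to the home folder
    if not path.is_file():
        raise FileNotFoundError(f"No such file: {path}")
    return np.loadtxt(path, delimiter=",", skiprows=skip_header_rows)
```

For the file in the question that would be something like load_numeric_csv("~/Downloads/Iris_data.csv"). If the file also contains a text column (Iris data sets often have a species name), loadtxt will not be able to parse it as numbers, and restricting the columns with usecols or switching to np.genfromtxt may be needed.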
I was looping over some files to copy their content into a new file, but after I ran the code, the new file contained lots of symbols rather than the text content of the files I looped over.
First, when I ran the code without the 'encoding' argument in the open() call, it showed an error message like this:
UnicodeEncodeError: 'charmap' codec can't encode character '\x8b' in position 12: character maps to <undefined>
I tried various encodings such as utf-8 and latin1, but nothing worked, and when I added errors='ignore' to the open() call, the result was what I described above.
import os
import glob
folder = os.path.join('R:', os.sep, 'Files')
def notes():
    for doc in glob.glob(folder + r'\*'):
        if doc.endswith('.pdf'):
            with open(doc,'r') as f:
                x = f.readlines()
            with open('doc1.text', 'w+') as f1:
                for line in x:
                    f1.write(line)
notes()
If I understand your example correctly and you're trying to read PDF files, your problem is not one of encoding but of file format. PDF files don't just store your text in some encoding; they use their own format, which you need to be able to parse in order to extract the text. There are a couple of Python libraries that can read PDF files (such as PyPDF2); please refer to this thread for more information: How to extract text from a PDF file?
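As an illustration only, here is how the loop from the question might look with a PDF library doing the reading. This sketch assumes the pypdf package (the successor to PyPDF2) is installed; the function names and the output file name are my own:

```python
from pathlib import Path


def find_pdfs(folder):
    """Return the PDF files directly inside `folder`, sorted by name."""
    return sorted(p for p in Path(folder).iterdir() if p.suffix.lower() == ".pdf")


def pdf_text(pdf_path):
    """Extract the text of every page of one PDF."""
    from pypdf import PdfReader  # imported lazily; assumed to be installed

    reader = PdfReader(str(pdf_path))
    # extract_text() can come back empty for scanned pages without a text layer.
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def notes(folder, out_file="doc1.txt"):
    """Write the text of all PDFs in `folder` into a single text file."""
    with open(out_file, "w", encoding="utf-8") as out:
        for pdf in find_pdfs(folder):
            out.write(pdf_text(pdf))
            out.write("\n")
```

Writing the output with an explicit utf-8 encoding also avoids the 'charmap' error, which comes from the default Windows encoding being unable to represent some characters.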
I generated some plots in pandas and saved them in BytesIO streams. I then want to add them to a PDF page and send the PDF file as an email attachment:
import matplotlib.pyplot as plt
import io
from fpdf import FPDF
fig = plt.figure()
...
buf = io.BytesIO()
fig.savefig(buf, format='png')
pdf = FPDF()
pdf.add_page()
pdf.image(buf.getvalue(), type='PNG')
buf.close()
But this is not working, with the following error reported:
Traceback (most recent call last):
File "XXXX.py", line 166, in send_email
pdf.image(buf.getvalue(), type='PNG')
File "/usr/local/lib/python3.6/site-packages/fpdf/fpdf.py", line 150, in wrapper
return fn(self, *args, **kwargs)
File "/usr/local/lib/python3.6/site-packages/fpdf/fpdf.py", line 971, in image
info=self._parsepng(name)
File "/usr/local/lib/python3.6/site-packages/fpdf/fpdf.py", line 1769, in _parsepng
if name.startswith("http://") or name.startswith("https://"):
TypeError: startswith first arg must be bytes or a tuple of bytes, not str
I want to solve this purely in memory and not to save image files locally. Can anyone help me with this? Thank you so much.
"I want to solve this purely in memory and not to save image files locally."
No, you should not do that, unless you save the PDF in memory on a large RAM-backed file drive. Most devices use the file system as extended memory, so it's a lot easier to use the physical file system than to build a custom one in your very precious memory. Small files may work if your system has a memory-stream file system that supports filenames.
A PDF is a resource hog, since you MUST:
save the image as a natively compressed image object
decompress a representation of it in that precious memory to inject as a PDF image, one way or another
(usually duplicated and re-expanded to be visible on screen)
then encode the image data into a deflated stream object
then write the new deflated image to the PDF as a partial file object with a hardcoded decimal address in the PDF file structure
then catalog and index the object at the end of the physical file, so the WHOLE PDF must also be expanded in memory.
Am I explaining why it's simpler to just save to disk and drip-feed cached file objects, rather than mess about with gigabytes of RAM drives?
Yes, you can use bytes instead of a file ...
In my case I have an MSSQL query that returns images as binary strings, and I want to use them directly, without saving multiple image files. So I was looking for a solution.
What we need:
pip install fpdf2
Then import io and FPDF into your Python 3.x file:
import io
from fpdf import FPDF
A look at image_parsing.py in the fpdf2 repository on GitHub shows that fpdf2 can work with binary objects too. No real image file is needed.
With BytesIO from io we can create a binary object out of our bytes string.
We name the binary object 'picture' and place it as our image into the PDF page.
import io
from fpdf import FPDF
# create a binary stream from the image data
# (the bytes below are only a placeholder; use real image bytes, e.g. PNG data)
picture = io.BytesIO(b"some initial binary data: \x00\x01")
# set page to portrait A4 and positioning in mm
pdf = FPDF("P", "mm", "A4")
# create a page
pdf.add_page()
# insert image
# on position: x=10mm, y=10mm
# size: width=50mm height=auto
pdf.image(picture,10,10,50)
# create PDF file
pdf.output("fpdf_test.pdf")
I hope this is helpful to some people.