Python3 extract contents of an .iso file - python-3.x

I am trying to find a way to extract the contents of an ISO image to a designated file path. This can be done easily with 7-Zip, but I can't find a way to do it in Python. There seems to be a library, isoparser (https://github.com/barneygale/isoparser), but it does not give many examples of how to do this.
Does anyone have experience doing this or can provide some examples?

I'd switch gears and use pycdlib. Check out that library's example of an ISO extraction script. Note: I haven't used either pycdlib or isoparser, but the former looks more friendly.
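For reference, here's a minimal sketch of a full extraction along those lines, using pycdlib's walk and get_file_from_iso calls; the image name, the output directory, and the handling of the ISO9660 ";1" version suffix are assumptions for illustration, not taken from the question:
import os
import pycdlib

iso = pycdlib.PyCdlib()
iso.open('image.iso')  # placeholder image name

for dirname, dirlist, filelist in iso.walk(iso_path='/'):
    for filename in filelist:
        iso_path = dirname.rstrip('/') + '/' + filename
        # drop the ISO9660 ";1" version suffix for the local file name
        local_path = os.path.join('output_dir', iso_path.lstrip('/').split(';')[0])
        os.makedirs(os.path.dirname(local_path), exist_ok=True)
        iso.get_file_from_iso(local_path, iso_path=iso_path)

iso.close()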

Take a look at these lib7zip bindings.
Example usage:
# pip install git+https://github.com/topia/pylib7zip
from lib7zip import Archive, formats

archive = Archive("cd.iso", forcetype="Iso")

# extract all items to the directory; it will be created if it doesn't exist
archive.extract('output_dir')

# iterate over archive contents
for f in archive:
    if f.is_dir:
        continue
    print("; %12s %s %s" % (f.size, f.mtime.strftime("%H:%M.%S %Y-%m-%d"), f.path))

Related

Julia: Using ProtoBuf to read messages from gzipped file

A sensor provides a stream of frames containing object coordinates, which are stored in ProtoBuf format in a gzipped file. I would like to read this file in Julia.
Using protoc, I have generated the Protobuf files for both Python and Julia, coordinate_push.py and coordinate_push.jl
My Python code is as follows:
import gzip

from google.protobuf.internal.decoder import _DecodeVarint32
from src.proto import coordinate_push

frameList = []
with gzip.open(filePath) as f:
    data = f.read()

next_pos, pos = 0, 0
while pos < len(data):
    msg = coordinate_push.CoordinatesFrame()
    next_pos, pos = _DecodeVarint32(data, pos)
    msg.ParseFromString(data[pos:pos + next_pos])
    frameList.append(msg)
    pos += next_pos
I'd like to rewrite the above in Julia, and don't know where to start. Part of the problem is that I haven't fully understood the Python script (IO is not my strong point).
I understand that I need:
to open the gzip file, presumably using using GZip; file = GZip.open(file_path, "r")
to read in the data, along the lines of using ProtoBuf; data = readproto(iob, CoordinatesFrame())
What I don't understand is:
how to define iob, and especially how to link it to file (in the Julia Protobuf manual, we had iob = PipeBuffer(), but here it's a gzip-file that we'd like to read)
how to replicate the while loop in Julia, and in particular the mysterious _DecodeVarint32 (I'm on Windows, in case that's relevant)
whether the file coordinate_push.jl has to be in the same directory as my main file, and if not, how I can properly import it (it is currently in a proto subfolder, and in Python I'd import it using from src.proto import coordinate_push)
Insight on any of the three points would be highly appreciated.
You should open an issue on the Gzip GitHub repo and ask this first part of your question there (I am not a Gzip expert unfortunately).
On the second point, I suggest looking here: https://github.com/JuliaIO/FileIO.jl/blob/master/README.md for lots of examples of FileIO loops, which seems to be exactly what you need to replicate that Python loop. For the second part of this question, your best bet for that function is to try to hunt down its definition on GitHub or in the docs somewhere.
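For what it's worth, _DecodeVarint32 just reads a base-128 varint length prefix from the byte buffer. A rough Python equivalent, shown only to illustrate what it does (this is not the protobuf library's actual implementation), would be:
def decode_varint32(buf, pos):
    """Read a base-128 varint from buf starting at pos.
    Returns (value, new_pos), matching how _DecodeVarint32 is used above."""
    result = 0
    shift = 0
    while True:
        b = buf[pos]            # one byte; the low 7 bits carry data
        pos += 1
        result |= (b & 0x7f) << shift
        if not (b & 0x80):      # high bit clear means this was the last byte
            return result, pos
        shift += 7
So each message in the file is prefixed with its length encoded this way; the loop reads the length, parses that many bytes as a CoordinatesFrame, and repeats until the buffer is exhausted. Reproducing those same two steps (decode a varint length, then parse that many bytes) is what you would need to do in Julia.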
For the third question, coordinate_push.jl does not need to be in the same folder as your "main file" (I am not sure what you mean by this, so perhaps it would help to add context on the structure of your files). To import that file, all you need to do is add include("path/to/coordinate_push.jl") at the top of the file you want to call/run the code from. It's worth noting that the path can be either the absolute path or the relative project path (in some cases).

Using Python to search specific file names in (sub)folders

I'm trying to make a Python-based file-searching program with a GUI.
It's going to be used to search specified directories and subdirectories for files whose filenames are entered in an Entry box.
While I'm fairly new to Python programming, I searched the web and gained some information on the os module.
Then I moved on and tried to write a simple code with os.walk and without the GUI program:
import os

for root, dirs, files in os.walk(r'Path\to\files'):
    for file in files:
        if file.endswith('.doc'):
            print(os.path.join(root, file))
This worked fine; however, file.endswith() only looks at the last part of the filename.
The problem is that the directory tree contains over 1000 .doc files, and I want the code to be able to search parts of the filename, for example "Caliper" in the filename "Hilka_Vernier_Caliper.doc".
So I went on, searched for methods other than file.endswith(), and found something about file.index(). I changed the code to:
import os

for root, dirs, files in os.walk(r'Path\to\files'):
    for file in files:
        if file.index('Caliper'):
            print(os.path.join(root, file))
But that didn't work as planned...
Does someone here have an idea of how I could make this work?
You may use pathlib instead of the old os: https://docs.python.org/3/library/pathlib.html#pathlib.Path.rglob
BTW, file.index raises an exception if the substring is not found, so you need a try/except clause.
Another way is to use if "Caliper" in str(file):
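A minimal sketch combining both suggestions; the search directory is the placeholder path from the question, and "Caliper" is the example pattern:
from pathlib import Path

search_root = Path(r'Path\to\files')    # placeholder path from the question

# rglob walks the directory tree recursively, like os.walk
for path in search_root.rglob('*.doc'):
    if 'Caliper' in path.name:          # substring match on the filename only
        print(path)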

gdcm2vtk usage to convert a stack of images into a vti file

I've managed to compile GDCM with VTK and I have a particular application I would like to use, which is the "gdcm2vtk.exe".
Now, what is the syntax for converting a stack of images into a ".vti" file? So far I have this:
gdcm2vtk Input_Directory file.vti
Now, when I run something like this:
gdcm2vtk "C:/dicom/dicom directory" output.vti
I get the error:
could not find no reader to handle file: "C:/dicom/dicom directory"
Is there anything I'm missing there?
gdcm2vtk does not handle a directory as input, as specified in the documentation.
You may want to convert your DICOM Series into a single DICOM Instance using gdcmimg.
As of GDCM 2.6, gdcm2vtk is able to take a directory as input. Pay attention to sort the files according to the well-known Image Orientation (Patient) & Image Position (Patient) strategy instead of relying on filename ordering to reconstruct your VTK (*.vti) file:
$ gdcm2vtk --ipp-sort input_dir output.vti

how to verify links in a PDF file

I have a PDF file and I want to verify whether the links in it are proper. Proper in the sense that all URLs specified link to web pages and nothing is broken. I am looking for a simple utility or a script which can do this easily.
Example:
$ testlinks my.pdf
There are 2348 links in this pdf.
2322 links are proper.
Remaining broken links and page numbers in which it appears are logged in brokenlinks.txt
I have no idea whether something like that exists, so I googled and searched Stack Overflow as well, but did not find anything useful yet. Does anyone have any ideas?
Updated: to make the question clear.
You can use pdf-link-checker
pdf-link-checker is a simple tool that parses a PDF document and checks for broken hyperlinks. It does this by sending simple HTTP requests to each link found in a given document.
To install it with pip:
pip install pdf-link-checker
Unfortunately, one dependency (pdfminer) is broken. To fix it:
pip uninstall pdfminer
pip install pdfminer==20110515
I suggest first using the Linux command-line utility 'pdftotext'; you can find its man page here:
pdftotext man page
The utility is part of the Xpdf collection of PDF processing tools, available on most Linux distributions. See http://foolabs.com/xpdf/download.html.
Once installed, you could process the PDF file through pdftotext:
pdftotext file.pdf file.txt
Once processed, you could use a simple Perl script that searches the resulting text file for http URLs and retrieves them using LWP::Simple. LWP::Simple->get('http://...') will let you validate the URLs with a code snippet such as:
use LWP::Simple;
$content = get("http://www.sn.no/");
die "Couldn't get it!" unless defined $content;
That would accomplish what you want to do, I think. There are plenty of resources on how to write regular expressions to match http URLs, but a very simple one would look like this:
m/http[^\s]+/i
"http followed by one or more not-space characters" - assuming the URLs are property URL encoded.
There are two lines of enquiry with your question.
Are you looking for regex verification that the link contains key information such as http:// and valid TLD codes? If so, I'm sure a regex expert will drop by, or have a look at regexlib.com, which contains lots of existing regexes for dealing with URLs.
Or are you wanting to verify that a website exists? Then I would recommend Python + Requests, as you could script out checks to see whether websites exist and don't return error codes.
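A minimal sketch of that second approach, assuming the URLs have already been dumped to plain text (for example via pdftotext as suggested above); the file name and the regex are placeholders:
import re

import requests

# assumption: file.txt is the pdftotext output of the PDF
text = open('file.txt', encoding='utf-8').read()
urls = re.findall(r'https?://[^\s]+', text)

for url in urls:
    try:
        # HEAD keeps the check lightweight; follow redirects to the final page
        resp = requests.head(url, allow_redirects=True, timeout=10)
        ok = resp.status_code < 400
    except requests.RequestException:
        ok = False
    print(('OK      ' if ok else 'BROKEN  ') + url)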
It's a task I'm currently undertaking for pretty much the same purpose at work. We have about 54k links to be processed automatically.
Collect links by:
enumerating links using an API, dumping the PDF as text and linkifying the result, or saving as HTML with PDFMiner.
Make requests to check them:
there is a plethora of options depending on your needs.
The advice in https://stackoverflow.com/a/42178474/1587329 was the inspiration for writing this simple tool (see gist):
'''loads the pdf file in sys.argv[1], extracts URLs, tries to load each URL'''
import sys
import urllib.request

import PyPDF2

# credits to stackoverflow.com/questions/27744210
# uses the classic PyPDF2 PdfFileReader/getNumPages API
def extract_urls(filename):
    '''extracts all urls from filename'''
    PDFFile = open(filename, 'rb')
    PDF = PyPDF2.PdfFileReader(PDFFile)
    pages = PDF.getNumPages()

    key = '/Annots'
    uri = '/URI'
    ank = '/A'

    for page in range(pages):
        pageSliced = PDF.getPage(page)
        pageObject = pageSliced.getObject()
        if key in pageObject:
            ann = pageObject[key]
            for a in ann:
                u = a.getObject()
                if uri in u[ank]:
                    yield u[ank][uri]

def check_http_url(url):
    urllib.request.urlopen(url)

if __name__ == "__main__":
    for url in extract_urls(sys.argv[1]):
        check_http_url(url)
Save it as filename.py and run it as python filename.py pdfname.pdf.

Zip Directory with Python

I'm trying to zip a bunch of folders individually. The folders contain files. I wrote a script that seems to work perfectly, except that the resulting zip files are not actually compressed. They're the same size as the original directory!
Here is my code:
import os, zipfile

workspace = "C:\\ziptest"
dirList = os.listdir(workspace)

def zipDir(path, zip):
    for root, dirs, files in os.walk(path):
        for file in files:
            zip.write(os.path.join(root, file))

for item in dirList:
    zip = zipfile.ZipFile('%s.zip' % item, 'w')
    zipDir('C:\\ziptest\\%s' % item, zip)
    zip.close()
I'm not a Python expert, but a quick lookup shows that there is another argument for zip.write such as zipfile.ZIP_DEFLATED. I grabbed that from here. I quote:
The third, optional argument to the write method controls what compression method to use. Or rather, it controls whether data should be compressed at all. The default is zipfile.ZIP_STORED, which stores the data in the archive without any compression at all. If the zlib module is installed, you can also use zipfile.ZIP_DEFLATED, which gives you “deflate” compression.
The reference is here. Look for the constant ZIP_DEFLATED; its definition:
The numeric constant for the usual ZIP compression method. This requires the zlib module. No other compression methods are currently supported.
I suppose that means that only default compression is supported... hope that helps!
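Applied to the script in the question, a minimal sketch would be to pass zipfile.ZIP_DEFLATED when creating each archive (the workspace path is taken from the question; that the zlib module is available on your system is an assumption):
import os
import zipfile

workspace = "C:\\ziptest"

def zipDir(path, zf):
    # walk the folder and add every file to the open archive
    for root, dirs, files in os.walk(path):
        for name in files:
            zf.write(os.path.join(root, name))

for item in os.listdir(workspace):
    # ZIP_DEFLATED enables compression; the default ZIP_STORED only stores
    with zipfile.ZipFile('%s.zip' % item, 'w', zipfile.ZIP_DEFLATED) as zf:
        zipDir(os.path.join(workspace, item), zf)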
Is there any reason you don't just call the shell command, like
import subprocess

def zipDir(path, zip):
    subprocess.Popen('7z a -tzip %s %s' % (path, zip))
