Check ISO is valid or not - iso

Is there any C# way to check an ISO file is valid or not i.e. valid Iso format or any other check possible or not.
The scenario is like, if any text file(or any other format file) is renamed to ISO and given it for further processing. I want to check weather this ISO file is a valid ISO file or not? Is there any way exist programmatically like to check any property of the file or file header or any other things
Thanks for any reply in advance

To quote the wiki gods:
There is no standard definition for ISO image files. ISO disc images
are uncompressed and do not use a particular container format; they
are a sector-by-sector copy of the data on an optical disc, stored
inside a binary file. ISO images are expected to contain the binary
image of an optical media file system (usually ISO 9660 and its
extensions or UDF), including the data in its files in binary format,
copied exactly as they were stored on the disc. The data inside the
ISO image will be structured according to the file system that was
used on the optical disc from which it was created.
reference

So you basically want to detect whether a file is an ISO file or not, and not so much check the file, to see if it's valid (e.g. incomplete, corrupted, ...) ?
There's no easy way to do that and there certainly is not a C# function (that I know of) that can do this.
The best way to approach this is to guess the amount of bytes per block stored in the ISO.
Guess, or simply try all possible situations one by one, unless you have an associated CUE file that actually stores this information. PS. If the ISO is accompanied by a same-name .CUE file then you can be 99.99% sure that it's an ISO file anyway.
Sizes would be 2048 (user data) or 2352 (raw or audio) bytes per block. Other sizes are possible as well !!!! I just mentioned the two most common ones. In case of 2352 bytes per block the user data starts at an offset in this block. Usually 16 or 24 depending on the Mode.
Next I would try to detect the CD/DVD file-systems. Assume that the image starts at sector 0 (although you could for safety implement a scan that assumes -150 to 16 for instance).
You'll need to look into specifics of ISO9660 and UDF for that. Sectors 16, 256 etc. will be interesting sectors to check !!
Bottom line, it's not an easy task to do and you will need to familiarize yourself with optical disc layouts and optical disc file-systems (ISO9660, UDF but possibly also HFS and even FAT on BD).
If you're digging into this I strongly suggest to get IsoBuster (www.isobuster.com) to help you see what the size per block is, what file systems there are, to inspect the different key blocks etc.

In addition to the answers above (and especially #peter's answer): I recently made a very simple Python tool for the detection of truncated/incomplete ISO images. Definitely not validation (which as #Jake1164 correctly points out is impossible), but possibly useful for some scenarios nevertheless. It also supports ISO images that contain Apple (HFS) partitions. For more details see the following blog post:
Detecting broken ISO images: introducing Isolyzer
And the software's Github repo is here:
Isolyzer

You may run md5sum command to check the integrity of an image
For example, here's a list of ISO: http://mirrors.usc.edu/pub/linux/distributions/centos/5.4/isos/x86_64/
You may run:
md5sum CentOS-5.4-x86_64-LiveCD.iso
The output is supposed to be the same as 1805b320aba665db3e8b1fe5bd5a14cc, which you may find from here:
http://mirrors.usc.edu/pub/linux/distributions/centos/5.4/isos/x86_64/md5sum.txt

Related

Simplest format to archive a directory

I have a script that needs to work on multiple platforms and machines. Some of those machines don't have any available archiving software (e.g. zip, tar). I can't download any software onto these machines.
The script creates a directory containing output files. I need to package all those files into a single file so i can download it easily.
What is the simplest possible archiving format to implement, so I can easily roll my own impl in the script. It doesn't have to support compression.
I could make up something ad-hoc, e.g.
file1 base64EncodedContents
dir1/file1 base64EncodedContents
etc.
However if one already exists then that will save me having to roll my own packing and unpacking, only packing, which would be nice. Bonus points if it's zip compatible, so that I can try zipping it with compression if possible, and them impl my own without compression otherwise, and not have to worry about which it is on the other side.
The tar archive format is extremely simple - simple enough that I was able to implement a tar archiver in powershell in a couple of hours.
It consists of a sequence of file header, file data, file header, file data etc.
The header is pure ascii, so doesn't require any bit manipulation - you can literally append strings. Once you've written the header, you then append the file bytes, and pad it with nil chars till it's a multiple of 512 bytes. You then repeat for the next file.
Wikipedia has more details on the exact format: https://en.wikipedia.org/wiki/Tar_(computing).

Convert Open VMS FDL (File Definition Language) to linux

I am working on a project where we are migrating from Open VMS to Unix/Linux.
There's a functionality called "FDL" in open vms, which i want to achieve in Unix.
What FDL actually does is , it defines a certain set of attributes for a file or a record, like fixing some block size for a particular file, file organization as sequential, variable or relative, specifying record size in a file beforehand, specifying carriage return(escape sequence) for record etc.
How can i set these attributes before a file gets created in unix.
FDL is merely a syntax/descriptive method to set/view OpenVMS file attributes (metadata) which has no equivalent in typical Linux file systems. Those attributes are implemented by the (Files-11 / ODS) file system an acted on by RMS (the OpenVMS Record management Services) for which again there is no equivalent in Linux although there are packages (sector7).
So much more than an FDL question , this is an RMS question.
RMS offers 'record' access where a record is a blob of byte defined in the file which can be read sequentially, by number or by key (indexed file). The attributes mentioned in the question are to do with simple sequential access, but there Linux just offers a byte-stream method. The application is supposed to know how much to read / when to stop reading. Possibly a (record) terminator like (frequently) (linefeed) is used but that's about it (fscanf).
Other than using a 'parallel' meta file, or reserving an initial byte stream in your files there is no standard way to store metadata on how to use the bytestream in the file, and making them hard to use by other applications.
All this to say: No Can Do.
Sorry.

How do I seek for holes and data in a sparse file in golang [duplicate]

I want to copy files from one place to another and the problem is I deal with a lot of sparse files.
Is there any (easy) way of copying sparse files without becoming huge at the destination?
My basic code:
out, err := os.Create(bricks[0] + "/" + fileName)
in, err := os.Open(event.Name)
io.Copy(out, in)
Some background theory
Note that io.Copy() pipes raw bytes – which is sort of understandable once you consider that it pipes data from an io.Reader to an io.Writer which provide Read([]byte) and Write([]byte), correspondingly.
As such, io.Copy() is able to deal with absolutely any source providing
bytes and absolutely any sink consuming them.
On the other hand, the location of the holes in a file is a "side-channel" information which "classic" syscalls such as read(2) hide from their users.
io.Copy() is not able to convey such side-channel information in any way.
IOW, initially, file sparseness was an idea to just have efficient storage of the data behind the user's back.
So, no, there's no way io.Copy() could deal with sparse files in itself.
What to do about it
You'd need to go one level deeper and implement all this using the syscall package and some manual tinkering.
To work with holes, you should use the SEEK_HOLE and SEEK_DATA special values for the lseek(2) syscall which are, while formally non-standard, are supported by all major platforms.
Unfortunately, the support for those "whence" positions is not present
neither in the stock syscall package (as of Go 1.8.1)
nor in the golang.org/x/sys tree.
But fear not, there are two easy steps:
First, the stock syscall.Seek() is actually mapped to lseek(2)
on the relevant platforms.
Next, you'd need to figure out the correct values for SEEK_HOLE and
SEEK_DATA for the platforms you need to support.
Note that they are free to be different between different platforms!
Say, on my Linux system I can do simple
$ grep -E 'SEEK_(HOLE|DATA)' </usr/include/unistd.h
# define SEEK_DATA 3 /* Seek to next data. */
# define SEEK_HOLE 4 /* Seek to next hole. */
…to figure out the values for these symbols.
Now, say, you create a Linux-specific file in your package
containing something like
// +build linux
const (
SEEK_DATA = 3
SEEK_HOLE = 4
)
and then use these values with the syscall.Seek().
The file descriptor to pass to syscall.Seek() and friends
can be obtained from an opened file using the Fd() method
of os.File values.
The pattern to use when reading is to detect regions containing data, and read the data from them – see this for one example.
Note that this deals with reading sparse files; but if you'd want to actually transfer them as sparse – that is, with keeping this property of them, – the situation is more complicated: it appears to be even less portable, so some research and experimentation is due.
On Linux, it appears you could try to use fallocate(2) with
FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE to try to punch a hole at the
end of the file you're writing to; if that legitimately fails
(with syscall.EOPNOTSUPP), you just shovel as many zeroed blocks to the destination file as covered by the hole you're reading – in the hope
the OS will do the right thing and will convert them to a hole by itself.
Note that some filesystems do not support holes at all – as a concept.
One example is the filesystems in the FAT family.
What I'm leading you to is that inability of creating a sparse file might
actually be a property of the target filesystem in your case.
You might find Go issue #13548 "archive/tar: add support for writing tar containing sparse files" to be of interest.
One more note: you might also consider checking whether the destination directory to copy a source file resides in the same filesystem as the source file, and if this holds true, use the syscall.Rename() (on POSIX systems)
or os.Rename() to just move the file across different directories w/o
actually copying its data.
You don't need to resort to syscalls.
package main
import "os"
func main() {
f, _ := os.Create("/tmp/sparse.dat")
f.Write([]byte("start"))
f.Seek(1024*1024*10, 0)
f.Write([]byte("end"))
}
Then you'll see:
$ ls -l /tmp/sparse.dat
-rw-rw-r-- 1 soren soren 10485763 Jun 25 14:29 /tmp/sparse.dat
$ du /tmp/sparse.dat
8 /tmp/sparse.dat
It's true you can't use io.Copy as is. Instead you need to implement an alternative to io.Copy which reads a chunk from the src, checks if it's all '\0'. If it is, just dst.Seek(len(chunk), os.SEEK_CUR) to skip past that part in dst. That particular implementation is left as an exercise to the reader :)

How does the bittorrent assemble the missing pieces?

I use BitTorrent and sometimes encounter files that do not have seed(missing pieces).
At that time, we sometimes force the file transfer to end and try to open the incompleted files (for example, an image file).
If we are lucky, may be able to see the downloaded image even if some parts are lost.
I would like to artificially reproduce this situation, and here's how I tried:
1) spliting a bmp image file of about 1 megabyte into 16 kilobytes by the Linux split command,
2) and then make just one of the divided files 0 kilobytes.
3) after that, rejoin all the files with the cat command.
However, in this case, unlike the torrent's "lost pieces" situation, the file becomes completely corrupt and can not be read.
Theoretically it does not seem like anything special, but what's wrong? And how can I achieve what I want?
I would appreciate your help.
Use dd:
dd if=/dev/zero of=image.jpg bs=1 conv=notrunc seek=X count=Y
being X the offset in the file you want to erase and Y the number of bytes.
About the corruption, it depends on the type of file, the piece you are losing and the program you are using to read it.
For instance, JPG files use a variable bit-length encoding, meaning that just losing one bit may corrupt all the file from that point on. But just for that, there can be resyncronization points where the bitstream is reset, so from that point on, the file will look ok. But those resync points are optional when writing the file, and not every reader honor them in case of corruption...
And anyway, losing part of the headers will make the file totally unreadable.

ZIP file format. How to read file properly?

I'm currently working on one Node.js project. I want to have an ability to read, modify and write ZIP file without saving it into FS (we receive it by TCP and send it back after modifications were made), and so far it looks like possible bocause of simple ZIP file structure. Currently I refer to this documentation.
So ZIP file has simple structure:
File header 1
File data 1
File data descriptor 1
File header 2
File data 2
File data descriptor 2
...
[other not important yet]
First we need to read file header, which contains field compressed size, and it could be the perfect way to read file data 1 by it's length. But it's actually not. This field may contain '0' or '0xFFFFFFFF', and those values don't describe its actual length. In that case we have to read file data without information about it's length. But how?..
Compression/Decopression algorithm descriptions looks pretty complex to me, and I plan to use ZLIB for compression itself anyway. So if something useful described there, then I missed the point.
Can someone explain the proper way to read those files?
P.S. Please avoid suggesting npm modules. I do not want to only solve the problem, but also to understand how things work.
Note - I'm assuming you want to read and process the zip file as
it comes off the socket, rather than reading the complete zip file into
memory before processing. Both options are valid.
I'd initially ignore the use cases where the compressed size has a value of '0' or '0xFFFFFFFF'. The former is only present in zip files created in streaming mode, the latter for zip files larger than 4Gig.
Dealing with them adds a lot of complexity - you can add support for them later, if necessary. Whether you ever need to support the 0/0xFFFFFFFF use cases depends on the nature of the zip files you intend to process.
When the compression method is deflated (8), use zlib for compression/decompression. You also need to support compression method stored (0). It gets used for very small files where compression isn't appropriate.

Resources