Decompress LZO indexed files - apache-spark

I need to decompress some LZO indexed files.
First, I tried to decompress the first LZO file with lzop, but the first line of the file contains some extra bytes.
How can I decompress the file correctly?
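For reference, a plain lzop attempt on one of the files would look roughly like this (the file name is only a placeholder, not taken from the question):
# decompress a single .lzo file with lzop; by default it keeps the .lzo and writes part-00000 next to it
lzop -d part-00000.lzo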

Related

Is it possible to get the kernel version from an ELF image file without disassembling it or using grep or strings?

I have a vmlinuz ELF image file. I need to get the kernel version from the image file without disassembling it. Is it possible to get the kernel version from offsets of that compressed image file? The file is an ELF 64-bit MSB executable, statically linked, not stripped.
As previously mentioned, the version number is hardcoded into the compressed image file. How to decompress it depends first of all on the compression algorithm used to compress the content.
Decompressing files on Linux can be challenging because of the combination of compression algorithms and the corresponding tool options (not to forget that newer algorithms need a newer version of tar).
For files with
file extension .tar.gz or .tgz, use e.g. $ tar -xzv -f vmlinuz.tgz
file extension .tar.xz, use e.g. $ tar -xJv -f vmlinuz.tar.xz
file extension .tar.bz2, use e.g. $ tar -xjv -f vmlinuz.tar.bz2
So if you have access to the file utility (it should also run on Windows), run the following to get the version string and additional information for your file named e.g. vmlinuz-4.x.y-z-a.
file vmlinuz-4.x.y-z-a
Another way to reverse-engineer it would be to read all the strings of the binary file vmlinuz-4.x.y-z-a and grep for part of the expected output.
strings vmlinuz-4.x.y-z-a | grep 'linked,'

decompress and compress to another format on the fly

I have a big gzip file that I need to convert to bzip2.
The obvious way is to 1) decompress the file, 2) write it to disk, 3) read the file again, compress it to bzip2 and write it to disk.
Now I'm wondering if it's possible to avoid the middle phase (writing to disk) and instead do the decompression and compression in memory, writing only the final result to disk?
You could decompress to stdout and then pipe to bzip2; something like this should work:
gunzip -c file.gz | bzip2 > file.bz2
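If you want to be careful, you can test the new archive before removing the original (file names as in the example above):
# verify the bzip2 archive's integrity, then delete the gzip original only on success
bzip2 -t file.bz2 && rm file.gz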

Why can't we gzip more than 1 file using 7zip

Recently I noticed that if I try to compress more than one file using 7zip, the gzip format is not present in the Archive Format list. Can anyone explain why?
Can't we have more than 1 file in gzip?
(Screenshot of the Archive Format list)
I'm using 7-Zip v9.20. I also tried with v16, but it's the same there.
Thanks in advance
No. You can't have more than one file in the gzip format.
Instead you would use tar followed by gzip. tar converts a set of directories and files into a stream of bytes, which is then compressed by gzip. You have probably already seen these files, with the suffix .tar.gz.
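A minimal sketch of that workflow, with placeholder file names:
# pack several files into one tar stream and gzip it in a single step
tar -czf archive.tar.gz file1.txt file2.txt
# equivalent two-step form: tar first, then gzip the single archive
tar -cf archive.tar file1.txt file2.txt
gzip archive.tar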

Problems reading .bz2 or .tar.bz2 files as hdf5 in R

I downloaded some files with the extension .tar.bz2. I was able to untar these into folders containing .bz2 files. These should decompress to hdf5 files (the metadata said they were hdf5), but they decompress into files with no extension. I have tried the following, but it didn't work:
untar("File.tar.bz2")
#Read lines of one of the files from the unzipped file
readLines(bzfile("File1.bz2"))
[1] "‰HDF" "\032"
library (rhdf5)
#Explore just as a bzip2 file
bzfile("File1.bz2")
description "File1.bz2"
class "bzfile"
mode "rb"
text "text"
opened "closed"
can read "yes"
can write "yes"
#Try to read as hdf5 using rhdf5 library
h5ls(bzfile("File1.bz2"))
Error in h5checktypeOrOpenLoc(). Argument neither of class H5IdComponent nor a character.
Is there some sort of encoding I need to do? What am I missing? What should I do?
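The "‰HDF" that readLines() printed is the start of the HDF5 magic signature, so one thing worth checking (a sketch from the shell, using the file name from the question; exact output may vary) is what the decompressed file actually contains:
# decompress, keeping the original .bz2, and inspect the result
bunzip2 -k File1.bz2
file File1   # an HDF5 file is typically reported as "Hierarchical Data Format (version 5) data"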

Difference in .tar.gz and first gz and then tar

I made two compressed copies of my folder, first by using the command tar czf dir.tar.gz dir
This gives me an archive of size ~16 kB. Then I tried another method: first I gzipped all the files inside the dir and then used
gzip ./dir/*
tar cf dir.tar dir/*.gz
but the second method gave me a dir.tar of size ~30 kB (almost double). Why is there so much difference in size?
Because compression is generally more efficient on one big input than on many small files. Say you have gzipped 100 files of 1 kB each: each file gets its own compression, plus the overhead of the gzip format.
file1 -> file1.gz (say 30 bytes of headers/footers)
file2 -> file2.gz (say 30 bytes of headers/footers)
...
file100 -> file100.gz (say 30 bytes of headers/footers)
------------------------------
30*100 = 3 kB of overhead.
But if you compress one tar file of 100 kB (which contains your 100 files), the overhead of the gzip format is added only once (instead of 100 times) and the compression itself can be better.
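A quick way to reproduce the comparison yourself (assumes a throwaway test directory named dir; note that gzip replaces the original files with .gz files):
tar -czf dir.tar.gz dir   # method 1: tar first, compress the whole stream once
gzip -r dir               # method 2: gzip every file individually ...
tar -cf dir.tar dir       # ... then tar the resulting .gz files
ls -l dir.tar.gz dir.tar  # method 2 is usually noticeably larger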
The difference comes from the per-file metadata overhead and from suboptimal compression when gzip processes the files individually: gzip never sees the data as a whole, so it compresses each file with its own dictionary, which is reset after every file.
tar cf creates an uncompressed archive, which means your archive should be about the same size as your directory, maybe even a bit bigger.
tar czf runs gzip compression on it.
This can be checked by running man tar at a shell prompt on Linux:
-z, --gzip, --gunzip, --ungzip
filter the archive through gzip
