How do I read multiple ZipEntries through a single InputStream - zip

I have to pass an InputStream as a parameter to a 3rd party library, which will read the complete contents from the InputStream and do its job.
My problem is that some of my files are ZIP files with more than one ZipEntry. From what I understand, you can read one ZipEntry at a time, call zipInputStream.getNextEntry(), read again, and so on. But the 3rd party library doesn't understand this and expects a single InputStream; all the ZipEntries of a zip file should be available through that one InputStream.
Please enlighten me as to how to do this. I cannot use ZipFile, as the file is not stored locally (it is on a different server). I also cannot read all the ZipEntries and build a ByteArrayOutputStream or a String, because the files can be very big and that would spike memory usage.
I want a way to let one InputStream read from multiple zip entries of a single zip file transparently.
thanks in advance,
Prasanna

I'm no expert but I see two ways of doing what you are asking for:
While I wouldn't call it the most efficient way, the fairly foolproof method is to write them all to a buffer and then use that buffer to feed an InputStream. You could use a ByteArrayOutputStream, feed it the contents of your ZipEntries, and then create a ByteArrayInputStream from the same buffer.
Alternatively the SequenceInputStream sounds like it does what you are after, but I've not had a chance to look into it in much detail.
K.Barad

Related

Node.js read and write stream to the same file at the same time

TL;DR
I'm browsing through a number of solutions on npm and github looking for something that would allow me to read and write to the same file in two different places at the same time. So far I'm having trouble actually finding anything like this. Is there a module of some sort that will allow that?
Background
In essence my requirement is that in a large file I need to, in the following order:
read
transform
write
Ideally the usage would be something like:
const fs = require("fs");
const { Transform } = require("stream");

const fd = fs.openSync(file, "r+");
const read = createReadStreamSomehowFrom(fd);
const write = createWriteStreamSomehowFrom(fd);
read
  .pipe(new Transform({ transform(chunk, encoding, callback) { /* ... */ } }))
  .pipe(write);
I could do that with the standard fs.create[Read/Write]Stream, but there's no way to coordinate the flow of the two streams, and if my write position goes beyond the read position then I'm reading something I just wrote...
The use case is the same as perl -p -i -e, read and write to the same file (meaning the same inode) asynchronously and replace the contents without loading everything into memory.
I would expect this to be a real-world use case, yet all the implementations I found actually load the whole file into memory and then save it. Am I missing a known module here, or is there a need to actually write something like this?
Hmm... a tough one it seems. :)
So, for the record: I found no such module, and I actually discussed this with some people responsible for a nice in-file replacing module. Seeing no way to solve this otherwise, I decided to write it from scratch, and here it is:
signicode/rw-stream repo on github
rw-stream at npm
The module works on a simple principle: no byte is written until it has been consumed from the readable stream. It's fairly simple underneath (a couple of fs.read/fs.write operations that keep an eye on the current read and write positions).
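For anyone curious how that principle looks in practice, here's a minimal sketch (not rw-stream's actual code): the write position only ever advances over bytes that have already been read, assuming the transform never produces more bytes than it consumed. transformInPlace, transformChunk and the file name are just illustrative names.

const fs = require("fs");

// Sketch only: transform a file in place, never letting the write position
// overtake the read position. Assumes the transform does not grow the data,
// so writing back at writePos <= readPos can never clobber unread bytes.
function transformInPlace(path, transformChunk, chunkSize = 64 * 1024) {
  const fd = fs.openSync(path, "r+");
  const buf = Buffer.alloc(chunkSize);
  let readPos = 0;
  let writePos = 0;
  while (true) {
    const bytesRead = fs.readSync(fd, buf, 0, chunkSize, readPos);
    if (bytesRead === 0) break;                   // end of file
    readPos += bytesRead;                         // these bytes are now "consumed"
    const out = transformChunk(buf.subarray(0, bytesRead));
    fs.writeSync(fd, out, 0, out.length, writePos);
    writePos += out.length;                       // stays <= readPos by assumption
  }
  fs.ftruncateSync(fd, writePos);                 // drop any leftover tail if the output shrank
  fs.closeSync(fd);
}

// Usage: uppercase plain-ASCII content in place, without loading the whole file.
transformInPlace("data.txt", (chunk) =>
  Buffer.from(chunk.map((b) => (b >= 0x61 && b <= 0x7a ? b - 0x20 : b)))
);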
If you find this useful then I'm happy. :)

ZIP file format. How to read file properly?

I'm currently working on a Node.js project. I want to be able to read, modify and write a ZIP file without saving it to the FS (we receive it over TCP and send it back after modifications are made), and so far it looks possible because of the simple ZIP file structure. Currently I refer to this documentation.
So ZIP file has simple structure:
File header 1
File data 1
File data descriptor 1
File header 2
File data 2
File data descriptor 2
...
[other not important yet]
First we need to read the file header, which contains a compressed size field, and that looks like the perfect way to read file data 1 by its length. But it actually isn't: this field may contain '0' or '0xFFFFFFFF', and those values don't describe the actual length. In that case we have to read the file data without any information about its length. But how?..
The compression/decompression algorithm descriptions look pretty complex to me, and I plan to use zlib for the compression itself anyway. So if something useful is described there, then I missed it.
Can someone explain the proper way to read those files?
P.S. Please avoid suggesting npm modules. I do not want to only solve the problem, but also to understand how things work.
Note - I'm assuming you want to read and process the zip file as it comes off the socket, rather than reading the complete zip file into memory before processing. Both options are valid.
I'd initially ignore the use cases where the compressed size has a value of '0' or '0xFFFFFFFF'. The former is only present in zip files created in streaming mode, the latter in (Zip64) zip files larger than 4 GB.
Dealing with them adds a lot of complexity - you can add support for them later, if necessary. Whether you ever need to support the 0/0xFFFFFFFF use cases depends on the nature of the zip files you intend to process.
When the compression method is deflated (8), use zlib for compression/decompression. You also need to support compression method stored (0). It gets used for very small files where compression isn't appropriate.
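To make the simple case concrete, here's a rough sketch of reading one entry once its local file header and data sit in a Buffer. The offsets follow the local file header layout in the spec; readLocalEntry and its parameters are just illustrative names, and the streaming-mode/Zip64 cases are deliberately left out, as suggested above.

const zlib = require("zlib");

// Rough sketch: parse one entry, assuming the whole local file header and its
// data are already in `buf` and the compressed size field is usable
// (no streaming-mode '0', no Zip64 '0xFFFFFFFF').
function readLocalEntry(buf, offset) {
  if (buf.readUInt32LE(offset) !== 0x04034b50) {
    throw new Error("not a local file header");
  }
  const method = buf.readUInt16LE(offset + 8);           // 0 = stored, 8 = deflate
  const compressedSize = buf.readUInt32LE(offset + 18);
  const nameLength = buf.readUInt16LE(offset + 26);
  const extraLength = buf.readUInt16LE(offset + 28);

  const name = buf.toString("utf8", offset + 30, offset + 30 + nameLength);
  const dataStart = offset + 30 + nameLength + extraLength;
  const compressed = buf.subarray(dataStart, dataStart + compressedSize);

  let data;
  if (method === 8) {
    data = zlib.inflateRawSync(compressed);              // deflate: raw inflate, no zlib wrapper
  } else if (method === 0) {
    data = compressed;                                   // stored: the bytes are the data
  } else {
    throw new Error("unsupported compression method " + method);
  }
  return { name, data, nextOffset: dataStart + compressedSize };
}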

How to synchronously read from a ReadStream in node

I am trying to read UTF-8 text from a file in a memory- and time-efficient way. There are two ways to read directly from a file synchronously:
fs.readFileSync will read the entire file and return a buffer containing the file's entire contents
fs.readSync will read a set amount of bytes from a file and return a buffer containing just those contents
I initially just used fs.readFileSync because it's easiest, but I'd like to be able to efficiently handle potentially large files by only reading in chunks of text at a time. So I started using fs.readSync instead. But then I realized that fs.readSync doesn't handle UTF-8 decoding. UTF-8 is simple, so I could whip up some logic to manually decode it, but Node already has services for that, so I'd like to avoid that if possible.
I noticed fs.createReadStream, which returns a ReadStream that can be used for exactly this purpose, but unfortunately it seems to only be available in an asynchronous mode of operation.
Is there a way to read from a ReadStream in a synchronous way? I have a massive stack built on top of this already, and I'd rather not have to refactor it to be asynchronous.
I discovered the string_decoder module, which handles all that UTF-8 decoding logic I was worried I'd have to write. At this point, it seems like a no-brainer to use this on top of fs.readSync to get the synchronous behavior I was looking for.
You basically just keep feeding bytes to it, and as soon as it can successfully decode characters it returns them, buffering incomplete multi-byte sequences until the rest arrives. The Node documentation describes how it works well enough.
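For reference, here's a minimal sketch of that combination: read fixed-size chunks with fs.readSync and let StringDecoder deal with UTF-8 characters that span chunk boundaries. readUtf8Chunks, its parameters and the file name are just illustrative names.

const fs = require("fs");
const { StringDecoder } = require("string_decoder");

// Sketch only: yield decoded text chunk by chunk, fully synchronously.
function* readUtf8Chunks(path, chunkSize = 16 * 1024) {
  const fd = fs.openSync(path, "r");
  const decoder = new StringDecoder("utf8");
  const buf = Buffer.alloc(chunkSize);
  try {
    while (true) {
      const bytesRead = fs.readSync(fd, buf, 0, chunkSize, null);
      if (bytesRead === 0) break;
      const text = decoder.write(buf.subarray(0, bytesRead));
      if (text.length > 0) yield text;   // incomplete sequences stay buffered
    }
    const tail = decoder.end();          // flush whatever is left
    if (tail.length > 0) yield tail;
  } finally {
    fs.closeSync(fd);
  }
}

// Usage: process the file synchronously, one piece of text at a time.
for (const text of readUtf8Chunks("big-file.txt")) {
  // ...handle this piece of text...
}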

Using sed on a compressed file

I have written a file processing program and now it needs to read from a gzipped file (.gz; the uncompressed file may get as large as 2 TB).
Is there a sed equivalent for compressed files (the way zcat is for cat), or else what would be the best approach to do the following efficiently?
ONE=`zcat filename.gz| sed -n $counts`
$counts: the line address/count passed to sed (reading line by line)
The above method works, but it is quite slow for large files, as I need to read each line and perform matching on certain fields.
Thanks
EDIT
Though not directly helpful, here is a set of z-commands:
http://www.cyberciti.biz/tips/decompress-and-expand-text-files.html
Well, you can either have more speed (i.e. use uncompressed files) or more free space (i.e. use compressed files and the pipe you showed)... sorry. Using compressed files will always have an overhead.
If you understand the internal structure of the compression format it is possible that you could write a pattern matcher that can operate on compressed data without fully decompressing it, but instead by simply determining from the compressed data if the pattern would be present in a given piece of decompressed data.
If the pattern has any complexity at all this sounds like quite a complicated project as you'd have to handle cases where the pattern could be satisfied by the combination of output from two (or more) separate pieces of decompression.

File random access in J2ME

Does J2ME have something similar to the RandomAccessFile class, or is there any way to emulate this particular (random access) functionality?
The problem is this: I have a rather large binary data file (~600 KB) and would like to create a mobile application for using that data. The format of that data is home-made and contains many index blocks and data blocks. Reading the data on other platforms (like PHP or C) usually goes like this:
Read 2 bytes for index key (K), another 2 for index value (V) for the data type needed
Skip V bytes from the start of the file to seek to the file position where the data for index key K starts
Read the data
Profit :)
This happens many times during the program flow.
Um, and I'm investigating the possibility of doing the very same on J2ME, and while I admit I'm quite new to the whole Java thing, I can't seem to find anything beyond the InputStream (DataInputStream) classes, which don't have the basic seek/skip-to-byte/report-position functions I need.
So, what are my chances?
You should do something like this:
try {
    DataInputStream di = new DataInputStream(is);
    di.mark(9999);                  // remember the stream start so we can reset() to it
    short key = di.readShort();     // 2 bytes: index key (K)
    short val = di.readShort();     // 2 bytes: index value (V)
    di.reset();                     // jump back to the marked position
    di.skip(val);                   // skip forward to where the data starts
    byte[] b = new byte[255];
    di.read(b);                     // read the data block
} catch (Exception ex) {
    ex.printStackTrace();
}
I prefer not to use the mark/reset methods; I think it is better to store the offset relative to the val location rather than from the start of the file, so you can skip these methods. I think they have some issues on some devices.
One more note: I don't recommend opening a 600 KB file; it will crash the application on many low-end devices. You should split this file into multiple files.
