Concatenating GZip/Deflate data on node.js request

Hi, I understand that gzip files can be concatenated on the file system,
i.e.
gzip -c a.txt > a.gzip
gzip -c b.txt > b.gzip
Now the following also holds:
cat a.txt b.txt | gzip -c > ab.gzip # decompresses to the same content as
cat a.gzip b.gzip > ab.gzip
On the file system this works, but when I try the same thing in node.js, concatenating a header, main content, sidebar and other widgets that are stored as pre-gzipped binary files, it doesn't work: I only see the text of the first chunk (the header), and the appended content is displayed as random binary symbols.
First I want to understand whether this is possible and, if so, how I can implement fragmented caching.
I just want to see whether compressed fragmented caching is possible; otherwise plan B is to use plain fragmented caching and gzip the content at runtime.
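var fs = require('fs');
// res.write() takes a string or Buffer, not a stream, so pipe the fragments in order.
var rs1 = fs.createReadStream('./node_fs/index/index.txt.gz');
var rs2 = fs.createReadStream('./node_fs/index/content.txt.gz');
rs1.pipe(res, { end: false }); // keep the response open for the second fragment
rs1.on('end', function () { rs2.pipe(res); });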
Additionally, both files were compressed with the gzip.exe command-line tool. If I write only one of them it works fine, but appending the second does not.

Your original gzip example "works" because the gunzip tool is written to handle multiple entries (gzip "members") in a single file. It doesn't work with some browsers because they expect a single gzip member.
See: Concatenate multiple zlib compressed data streams into a single stream efficiently
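
The difference between a decoder that reads every gzip member and one that expects a single member can be reproduced in a few lines. This is a minimal Python sketch rather than the node.js code from the question, and the fragment contents are made up for illustration. gzip.decompress reads all members, while a zlib decompressor set up for one gzip stream (wbits=31) stops at the end of the first member and leaves the rest untouched in unused_data.

```python
import gzip
import zlib

# Two independently compressed fragments (hypothetical page parts).
header = gzip.compress(b"<header>")
content = gzip.compress(b"<content>")

# Byte-level concatenation, equivalent to `cat a.gzip b.gzip`.
combined = header + content

# gzip.decompress understands multi-member data and returns everything.
all_text = gzip.decompress(combined)

# A single-stream decoder stops after the first member.
decoder = zlib.decompressobj(wbits=31)  # 31 = gzip wrapper, 32 KiB window
first_only = decoder.decompress(combined)
leftover = decoder.unused_data  # still-compressed bytes of the second member

print(all_text)
print(first_only)
print(len(leftover) == len(content))
```

A client that behaves like the second decoder shows only the first fragment, which matches the symptom described in the question.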

Related

What command to search for ID in .bz2 file?

I am new to Linux and I'm trying to look for an ID number within a .bz2 file. It seems like a fairly straightforward requirement, but I cannot find the correct command anywhere online. I believe I need to use bzgrep.
I want to look for '123456' in the file Bulk9876.bz2
How would I construct this command?

You probably just need to tell grep that it's okay to parse that data as text:
bzgrep -a 123456 Bulk9876.bz2
If you're trying to search the compressed bytes themselves (rather than decompressing the data and searching the result), just use grep -a ….
Otherwise, it might make sense to verify that the desired string is even present in the file; bunzip2 it and grep -a the decompressed file. If that works, the problem is in your bzgrep instance (which is odd because it should be using the same decompression library as bunzip2).
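
As a cross-check that does not depend on bzgrep, the same search can be sketched in Python with the standard bz2 module. The function name find_in_bz2 is hypothetical; the sketch decompresses the file line by line and compares raw bytes, which roughly corresponds to grep -a on the decompressed data.

```python
import bz2


def find_in_bz2(path, needle):
    """Return (line_number, line) pairs whose bytes contain `needle`."""
    matches = []
    # "rb" keeps the data as bytes, so non-text content cannot cause decode errors.
    with bz2.open(path, "rb") as fh:
        for number, line in enumerate(fh, start=1):
            if needle in line:
                matches.append((number, line.rstrip(b"\n")))
    return matches
```

For the file in the question, the call would be find_in_bz2("Bulk9876.bz2", b"123456").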

Why gzip is not consistent?

Why don't these lines give me identical results?
>>> gzip.compress('same'.encode('ascii'))
b'\x1f\x8b\x08\x00\xe2\x0e0V\x02\xff+N\xccM\x05\x00D\xf1P\xfc\x04\x00\x00\x00'
>>> gzip.compress('same'.encode('ascii'))
b'\x1f\x8b\x08\x00\xe3\x0e0V\x02\xff+N\xccM\x05\x00D\xf1P\xfc\x04\x00\x00\x00'
This is quite annoying for unit testing.

The gzip header contains a modification timestamp (the 4-byte MTIME field), so two calls made at different seconds produce different bytes.
For unit testing, you might be able to get away with skipping the header and comparing the rest.
Something like this:
a = gzip.compress('same'.encode('ascii'))
b = gzip.compress('same'.encode('ascii'))
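a[8:] == b[8:]
The timestamp occupies bytes 4 to 7 of the 10-byte gzip header, so slicing from 8 skips all of it. A smaller offset such as 5 skips only the lowest byte and works only while both calls fall within the same 256-second window.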

As noted, the gzip header contains a timestamp. If you pass the -n or --no-name option to the command-line gzip program, the original file name and timestamp are omitted.
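
In Python itself, a similar effect is available without slicing, assuming Python 3.8 or later, where gzip.compress accepts an mtime keyword. A minimal sketch that pins the timestamp to 0 so repeated calls give identical bytes:

```python
import gzip

data = "same".encode("ascii")

# A fixed mtime makes the header, and therefore the whole output, reproducible.
first = gzip.compress(data, mtime=0)
second = gzip.compress(data, mtime=0)

print(first == second)
print(first[4:8])  # the MTIME field of the header
```

This keeps unit tests free of assumptions about the header layout.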

How to call a large list of paired files to be executed by a program in BASH?

I have a large directory of files (100+) that I'd like to pass through a program via the terminal.
The files are paired and all follow a naming scheme like such:
TS-8_S53_L001_R1_001.fastq
TS-8_S53_L001_R2_001.fastq
RS-9_S54_L001_R1_001.fastq
RS-9_S54_L001_R2_001.fastq
And the program execution looks like:
Seqprogram -i1 Blah_R1_001.fastq -i2 Blah_R2_001.fastq -o Blah_paired.fastq
All of these files are in one directory.
I'd like to be able to run the program on all of the files, with the files paired together in the proper order (R1 files are passed through -i1, R2 files through -i2, and the R1 and R2 files of a pair share the same base name), and with the output file (-o) saved under the base name with some identifier attached ("_paired", etc).
I have an idea of how I'd do this in Python; however, I am trying to get better with BASH.
I'm familiar with how one might pass multiple files to a single command, e.g. uncompressing all .gz files in a particular directory:
gunzip *.gz
But this command has two inputs, and the inputs must be ordered, so the wildcard scheme isn't sufficient.
Thanks

Use a wildcard to get one file of the pair, and then use parameter substitution to get the other corresponding filenames.
for i1 in *_R1_001.fastq; do
    i2=${i1/R1_001/R2_001}
    paired=${i1/R1_001/paired}
    Seqprogram -i1 "$i1" -i2 "$i2" -o "$paired"
done

The easiest way to do this is to match just one of the three filenames with a pattern, and to derive the other two from it.
That is to say:
for r1file in *_R1_*.fastq; do
    r2file=${r1file/_R1_/_R2_}
    pairfile=${r1file%_R1_*}_paired.fastq
    Seqprogram -i1 "$r1file" -i2 "$r2file" -o "$pairfile"
done
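
For comparison with the Python approach the question mentions, the same derivation can be sketched as below. The function name paired_commands is hypothetical, and the sketch only builds the argument lists (it assumes each file name contains _R1_ exactly once); running them, for example with subprocess.run, is left to the caller.

```python
from pathlib import Path


def paired_commands(directory):
    """Build one Seqprogram argument list per R1/R2 pair found in `directory`."""
    commands = []
    for r1 in sorted(Path(directory).glob("*_R1_*.fastq")):
        # Derive the mate and the output name from the R1 file name.
        r2 = r1.with_name(r1.name.replace("_R1_", "_R2_", 1))
        base = r1.name.split("_R1_", 1)[0]
        out = r1.with_name(base + "_paired.fastq")
        commands.append(
            ["Seqprogram", "-i1", str(r1), "-i2", str(r2), "-o", str(out)]
        )
    return commands
```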

Bash to get timestamp from file list and compare it to filename

We are setting up a Git repository for a project and include the DB structure by generating a dump in the post-commit hook on deployment.
What I would like to have is a simple versioning system for the dump file, based on the timestamp of the last change to the table structure.
After finding this post, which suggests checking the dates of the *.frm files in the MySQL data dir, I thought the solution would be to use that last-modified date as part of the generated file name. That is:
Find the latest modification time among the files of the DB (i.e. /var/lib/mysql/databaseX/) via an ls command (something like ls -la *.frm).
Compare that value (the most recently changed file) with the one of a certain file (e.g. /project/dump_2012102620001.sql), where the numbers correspond to the last generated dump.
If the files' timestamp is later than that date, run the mysqldump command; otherwise skip it, so that no dump is generated and committed as a change to Git.
Unfortunately my Linux console/bash skills are not up to this, and I have not found any similar script to use.

You can use [[ file1 -ot file2 ]] to test whether file1 is older than file2.
last=$(ls -tr /path/to/db/files/*.frm | tail -n1)
if [[ dump -ot $last ]] ; then
    create_new_dump
fi
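
The same check can be expressed without parsing ls output. The following is a minimal Python sketch under the assumption that file modification times are a good enough signal; the function name dump_is_stale and both path arguments are placeholders, not part of any existing tool.

```python
import os
from pathlib import Path


def dump_is_stale(db_dir, dump_path):
    """Return True if any *.frm file in db_dir is newer than the dump file."""
    frm_times = [p.stat().st_mtime for p in Path(db_dir).glob("*.frm")]
    if not frm_times:
        return False  # no table definitions found, nothing to dump
    if not os.path.exists(dump_path):
        return True   # no previous dump, so one is needed
    return os.path.getmtime(dump_path) < max(frm_times)
```

A hook script could call this and run mysqldump only when it returns True.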

You can save yourself a lot of grief by just dumping the table structure every time with the appropriate mysqldump command as this is relatively lightweight since it won't include table contents. Strip out the variable timestamp information at the top and compare with the previous file. Store if different.

How to examine .torrent file?

I tried to examine the content of a .torrent file using
$ od -c xyz.torrent
Some of the content of the file is in plain text, such as the tracker information, the creation date, the encoding used, the length and the number of pieces, but the rest is encoded. Can somebody please tell me how I can examine the torrent file so that I can decode everything?

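.torrent files are bencoded dictionaries. The part that looks encoded is mostly the pieces value, which holds the raw 20-byte SHA-1 hash of each piece rather than text.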
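
Bencoding has only four value types (integers, byte strings, lists and dictionaries), so a decoder fits in a short function. This is a minimal Python sketch for inspection purposes, with a hypothetical function name bdecode and no validation beyond what is needed to avoid looping on malformed input.

```python
def bdecode(data, pos=0):
    """Decode one bencoded value at data[pos]; return (value, next_position)."""
    lead = data[pos:pos + 1]
    if lead == b"i":                      # integer: i<digits>e
        end = data.index(b"e", pos)
        return int(data[pos + 1:end]), end + 1
    if lead == b"l":                      # list: l<values>e
        pos += 1
        items = []
        while data[pos:pos + 1] != b"e":
            item, pos = bdecode(data, pos)
            items.append(item)
        return items, pos + 1
    if lead == b"d":                      # dictionary: d<key><value>...e
        pos += 1
        result = {}
        while data[pos:pos + 1] != b"e":
            key, pos = bdecode(data, pos)
            value, pos = bdecode(data, pos)
            result[key] = value
        return result, pos + 1
    if lead.isdigit():                    # byte string: <length>:<bytes>
        colon = data.index(b":", pos)
        length = int(data[pos:colon])
        start = colon + 1
        return data[start:start + length], start + length
    raise ValueError("invalid bencoded data at offset %d" % pos)
```

Reading a file with open("xyz.torrent", "rb").read() and passing the bytes to bdecode returns the top-level dictionary; keys and strings stay as bytes because values such as pieces are not text.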

Use lstor from the pyroscope package: https://code.google.com/p/pyroscope/wiki/CommandLineTools#lstor
It pretty-prints the contents of .torrent files.
