Possible to stream a compressed data stream using Perl - linux

I am trying to send a compressed tarball from a Perl CGI script. Everything is working fine, except that the tarball is only sent after it has been fully created and compressed. In other words, it is not 'streaming' the data live, which is a problem because the data is quite large.
print "Content-Type:application/x-download\n";
print "Content-Disposition:attachment;filename=download.tar.\n\n";
print `tar zc $path/$file`
I have also tried tar zcf - $path/$file, which writes to stdout, and it does the same thing.

As Geo pointed out, with backticks you are waiting for tar to finish. Reading from a pipe should output the data in parallel with its creation:
$| = 1;   # autoflush STDOUT so each chunk is sent as soon as it is read
open my $pipe_fh, '-|', "tar zc $path/$file" or die "Cannot run tar: $!";
binmode $pipe_fh;
binmode STDOUT;
while (<$pipe_fh>) {
    print;
}

Well, in the case of backticks, you are effectively waiting for the process to finish and then sending its output. I would suggest something from the IPC::Open family; IPC::Open3 could do the trick.
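For completeness, here is a rough sketch of that idea with IPC::Open3; the exact tar invocation, the buffer size, and the error handling are my assumptions for illustration, not part of the original answer:
use IPC::Open3;
use Symbol 'gensym';

$| = 1;                                   # stream output as it arrives
my $err = gensym;                         # separate handle for tar's stderr (ignored here)
my $pid = open3(my $to_tar, my $from_tar, $err,
                'tar', 'zcf', '-', "$path/$file");
close $to_tar;                            # tar reads nothing from stdin
binmode $from_tar;
binmode STDOUT;
while (read $from_tar, my $buf, 64 * 1024) {
    print $buf;
}
waitpid $pid, 0;                          # reap the child process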

Related

Node.js delete first N bytes from a file

How do I delete (remove / trim) N bytes from the beginning of a binary file without loading it into memory?
We have fs.ftruncate(fd, len, callback), which cuts bytes off the end of the file (if the file is larger than len).
How can I cut bytes from the beginning, or trim the beginning of a file, in Node.js without reading the file into memory?
I need something like truncateFromBeginning(fd, len, callback) or removeBytes(fd, 0, N, callback).
If it is not possible, what is the fastest way to do it with file streams?
On most filesystems you can't "cut" a part out of the beginning or the middle of a file; you can only truncate it at the end.
With that in mind, I imagine we would have to open an input file stream, seek to just after the Nth byte, and pipe the rest of the bytes to an output file stream.
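As an illustration only (not from the original question or answers), a Node.js sketch of that seek-and-pipe idea might look like this; the temporary file name and the callback-style helper are assumptions:
const fs = require('fs');

// Copy everything after the first n bytes into a temporary file,
// then rename the temporary file over the original.
function removeBytesFromStart(path, n, callback) {
  const tmpPath = path + '.tmp';                          // hypothetical temp name
  const input = fs.createReadStream(path, { start: n });  // skip the first n bytes
  const output = fs.createWriteStream(tmpPath);
  input.on('error', callback);
  output.on('error', callback);
  output.on('finish', () => fs.rename(tmpPath, path, callback));
  input.pipe(output);
}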
You're asking for an OS file-system operation: the ability to remove some bytes from the beginning of a file in place, without rewriting the file. Such an operation does not exist, at least on Linux / FreeBSD / macOS / Windows.
If your program is the only user of the file and it fits in RAM, your best bet is to read the whole thing into RAM, then reopen the file for writing, then write out the part you want to keep.
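For example (a quick sketch, not from the answer; the file name and offset are placeholders), the read-into-RAM approach in Node.js could look like:
const fs = require('fs');
const N = 1024;                           // placeholder: bytes to drop
const data = fs.readFileSync('q');        // whole file in RAM
fs.writeFileSync('q', data.subarray(N));  // rewrite, keeping bytes after N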
Or you can create a new file. Let's say your input file is called q. Then you'd create a file called, maybe new_q with a stream attached. You'd pipe the contents you wanted to the new file. Then you'd unlink (delete) the input file q and rename the output file new_q to q.
Careful: this unlink / rename operation creates a short window during which no file named q exists. So if some other program tries to open it and doesn't find it, it should retry a few times.
If you're building a queueing scheme, you might consider holding your queue data somewhere else entirely. This read / rewrite / unlink / rename sequence has lots of ways to go wrong under heavy load. (Ask me how I know that when you have a couple of hours to spare ;-).) Redis is worth a look.
I decided to solve the problem in bash.
The script truncates the files in a temp folder first, then moves them back to the original folder.
The truncation is done with tail:
tail --bytes="$max_size" "$from_file" > "$to_file"
The full script:
#!/bin/bash

declare -r store="/my/data/store"
declare -r temp="/my/data/temp"
declare -r max_size=$(( 200000 * 24 ))

or_exit() {
    local exit_status=$?
    local message=$*

    if [ $exit_status -gt 0 ]
    then
        echo "$(date '+%F %T') [$(basename "$0" .sh)] [ERROR] $message" >&2
        exit $exit_status
    fi
}

# Checks if there are any files in 'temp'. It should be empty.
! ls "$temp/"* &> '/dev/null'
or_exit 'Temp folder is not empty'

# Loops over all the files in 'store'
for file_path in "$store/"*
do
    # Trim files bigger than 'max_size' from 'store' into 'temp'
    if [ "$( stat --format=%s "$file_path" )" -gt "$max_size" ]
    then
        # Truncates the file into the temp folder
        tail --bytes="$max_size" "$file_path" > "$temp/$(basename "$file_path")"
        or_exit "Cannot tail: $file_path"
    fi
done
unset -v file_path

# If there are files in 'temp', move all of them back to 'store'
if ls "$temp/"* &> '/dev/null'
then
    # Moves all the truncated files back to the store
    mv "$temp/"* "$store/"
    or_exit 'Cannot move files from temp to store'
fi

gzip: unexpected end of file when using gzip

I have to process a file using my Linux machine.
When I try to write my output to a csv file and then gzip it in the same line of the script:
processing > output.csv | gzip -f output.csv
I get an 'unexpected end of file' error. Even when I download the file using the Linux machine I get the same error.
When I do not gzip via terminal (or in a single line) everything works fine.
Why does it fail like this when the commands are all in a single line?
You should remove > output.csv. For the same stream (stdout) you can either use a pipe (|) or redirect to a file (>), but not both at once.
You can also redirect errors from stderr to a file with 2>errors.txt; otherwise they are displayed on screen.
When you redirect a process' IO with the > operator, its output cannot be used by a pipe afterwards (because there's no "output" anymore to be piped). You have two options:
processing > output.csv &&
gzip output.csv
Writes the unprocessed output of your program to the file output.csv and then, in a second step, gzips this file, replacing it with output.csv.gz. Depending on the amount of data, this might not be feasible (storage requirements are the full uncompressed output PLUS the compressed size).
processing | gzip > output.csv.gz
This will compress the output of your process in-line and write it directly to the output file, without storing the uncompressed output in an intermediate file.
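If you want to verify that the resulting archive is complete (this check is not part of the original answer), gzip can test it without extracting:
gzip -t output.csv.gz && echo "archive is intact"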

Splitting large tar file into multiple tar files

I have a tar file which is 3.1 TB (terabytes).
File name - Testfile.tar
I would like to split this tar file into 2 parts - Testfile1.tar and Testfile2.tar
I tried the following so far
split -b 1T Testfile.tar "Testfile.tar"
What I get is Testfile.taraa (what is "aa"?).
I then just stopped my command. I also noticed that the output Testfile.taraa doesn't seem to be a tar file when I do ls in the directory; it looks like a text file. Maybe once the full split is completed it will look like a tar file?
The behavior of split is correct; see the man page online: http://man7.org/linux/man-pages/man1/split.1.html
Output pieces of FILE to PREFIXaa, PREFIXab, ...
Don't stop the command; let it run, and then you can use cat to concatenate (join) all the pieces back together.
Examples can be seen here: https://unix.stackexchange.com/questions/24630/whats-the-best-way-to-join-files-again-after-splitting-them
split -b 100m myImage.iso
# later
cat x* > myImage.iso
UPDATE
Just to clarify, since I believe you have not understood the approach: you split a big file like this in order to transport it, for example; the pieces are not usable on their own. To use the archive again you need to concatenate (join) the pieces back together. If you want usable parts, then you need to decompress the file, split its contents into parts, and compress each part separately. With split you are simply splitting the binary file, and I don't think you can use those parts individually.
You are doing the compression first and the partitioning later.
If you want each part to be a tar file, you should use 'split' first on the original file, and then 'tar' each part.
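A rough illustration of that suggestion (the file names, sizes, and the assumption of a single input file are mine, not from the answer):
split -b 1600G --numeric-suffixes=1 original_data part_   # produces part_01, part_02
tar -cf Testfile1.tar part_01
tar -cf Testfile2.tar part_02
# later, after extracting both archives, rebuild the original data:
cat part_01 part_02 > original_data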

Regarding lz4mt compression and linux buffering issue

I am using lz4mt, a multi-threaded version of lz4. In my workflow I am sending thousands of large files (about 620 MB each) from a client to a server. When a file reaches the server, my rule triggers and compresses the file using lz4mt, then removes the uncompressed file. The problem is that sometimes, when I remove the uncompressed file, the compressed file does not yet have the right size, because lz4mt returns immediately, before its output has been written to disk.
So is there any way to make lz4mt remove the uncompressed file itself after compressing, as bzip2 does?
Input: bzip2 uncompress_file
Output: Compressed file only
whereas
Input: lz4mt uncompress_file
Output: (Uncompressed + Compressed) file
The sync command in the script below also does not seem to be working properly.
The script which executes when my rule triggers is:
script.sh
/bin/lz4mt uncompressed_file output_file
/bin/sync
/bin/rm uncompressed_file
Please tell me how to solve the above issue.
Thanks a lot
Author here. You could try the following methods:
Concatenate the commands with && or ; (see the sketch after this list).
Add the lz4mt command line options -q (suppress prompt) and -f (force overwrite).
Try it with the original lz4.
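For example, the rule script could chain the commands with && so the uncompressed file is only removed once compression and sync have succeeded; this sketch simply rearranges the script from the question and is not part of the answer itself:
#!/bin/bash
# Only remove the input once lz4mt and sync have both succeeded.
/bin/lz4mt -q -f uncompressed_file output_file && \
/bin/sync && \
/bin/rm uncompressed_file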

How to compress a "tail -f" to a compressed (gzipped) file?

I have tried the following, but the resulting file stays at size 0.
tail -f /logs/localhost.log | gzip -c -9 -f > compressed.gz
localhost.log is very active.
Thank you.
logrotate(8) was designed to solve this sort of problem - it rotates and compresses log files.
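A minimal logrotate sketch of that idea (the path and rotation policy here are assumptions, not part of the answer):
/logs/localhost.log {
    daily
    rotate 7
    compress
    missingok
    notifempty
}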
You're just not patient enough. That will work, and it will write to the gzip file. It will take a while to accumulate enough input to write the first compressed block. Especially if the input is highly compressible, e.g. the log entries are very similar to each other.
This has a problem though, in that the gzip file will never be properly terminated, since gzip will never get an end-of-file.
You can't do this with the gzip utility itself, because it doesn't read its input line by line; it expects an EOF.
But you can write your own wrapper in any programming language that has a zlib implementation.
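As a rough sketch of such a wrapper in Perl (the output file name is a placeholder, and this is only one way to do it): read stdin line by line, append each line to a gzip stream, and flush so the output file grows as log lines arrive.
#!/usr/bin/perl
use strict;
use warnings;
use IO::Compress::Gzip qw($GzipError);

my $gz = IO::Compress::Gzip->new('compressed.gz')
    or die "Cannot open gzip stream: $GzipError";
while (my $line = <STDIN>) {
    $gz->print($line);
    $gz->flush;            # push a complete compressed block to disk
}
$gz->close;
It would be used as tail -f /logs/localhost.log | ./gzip_tail.pl (the script name is hypothetical). Note that, as mentioned above, the resulting gzip file is still never properly terminated while tail -f keeps running.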
