Regarding lz4mt compression and Linux buffering issue

I am using lz4mt, a multi-threaded version of lz4. In my workflow I send thousands of large files (about 620 MB each) from a client to a server. When a file reaches the server, my rule triggers and compresses the file using lz4mt, then removes the uncompressed file. The problem is that sometimes, when I remove the uncompressed file, the compressed file does not have the right size; this is because lz4mt returns immediately, before its output has been fully written to disk.
So is there any way lz4mt can remove the uncompressed file itself after compressing, as bzip2 does?
Input: bzip2 uncompressed_file
Output: compressed file only
whereas
Input: lz4mt uncompressed_file
Output: uncompressed + compressed files
I think the sync command in the script below is also not working properly.
The script that executes when my rule triggers is:
script.sh
/bin/lz4mt uncompressed_file output_file
/bin/sync
/bin/rm uncompressed_file
Please tell me how to solve the above issue.
Thanks a lot

Author here. You could try the following methods:
Concatenate commands with && or ; (see the sketch after this list).
Add the lz4mt command-line options -q (suppress prompt) and -f (force overwrite).
Try it with the original lz4.
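A minimal sketch combining the first two suggestions (assuming your lz4mt build accepts -q and -f, and keeping the sync from your script):
/bin/lz4mt -q -f uncompressed_file output_file && /bin/sync && /bin/rm uncompressed_file
With &&, the rm only runs after lz4mt and sync have both succeeded, so the uncompressed file is never removed before a complete compressed file is on disk.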

Related

gzip: unexpected end of file when using gzip

I have to process a file using my Linux machine.
When I try to write my output to a csv file and then gzip it in the same line of the script:
processing > output.csv | gzip -f output.csv
I get an 'unexpected end of file' error. Even when I download the file using the Linux machine I get the same error.
When I do not gzip via terminal (or in a single line) everything works fine.
Why does it fail like this when the commands are all in a single line?
You should remove > output.csv.
For the same stream (stdout) you can either use a pipe (|) or redirect to a file (>), but not both at once.
Errors on stderr are a separate stream: you can redirect them to a file with 2>errors.txt, or they will be displayed on screen.
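For example, keeping stderr separate while stdout goes through the pipe (processing stands for your command, as in the question):
processing 2> errors.txt | gzip > output.csv.gz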
When you redirect a process' IO with the > operator, its output cannot be used by a pipe afterwards (because there's no "output" anymore to be piped). You have two options:
processing > output.csv &&
gzip output.csv
This writes the uncompressed output of your program to the file output.csv and then, as a second step, gzips this file, replacing it with output.csv.gz. Depending on the amount of data, this might not be feasible (storage requirements are the full uncompressed output PLUS the compressed size).
processing | gzip > output.csv.gz
This will compress the output of your process in-line and write it directly to the output file, without storing the uncompressed output in an intermediate file.
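Either way you can verify the result afterwards (gzip -t tests the archive's integrity, zcat prints its decompressed content):
gzip -t output.csv.gz && zcat output.csv.gz | head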

Splitting large tar file into multiple tar files

I have a tar file which is 3.1 TB (terabytes) in size.
File name - Testfile.tar
I would like to split this tar file into 2 parts - Testfile1.tar and Testfile2.tar
I tried the following so far
split -b 1T Testfile.tar "Testfile.tar"
What I get is Testfile.taraa (what is "aa"?).
And I just stopped my command. I also noticed that the output Testfile.taraa doesn't seem to be a tar file when I do ls in the directory; it looks like a text file. Maybe once the full split is completed it will look like a tar file?
The behavior of split is correct; from the man page online: http://man7.org/linux/man-pages/man1/split.1.html
Output pieces of FILE to PREFIXaa, PREFIXab, ...
Don't stop the command; let it run, and then you can use cat to concatenate (join) the pieces back together again.
Examples can be seen here: https://unix.stackexchange.com/questions/24630/whats-the-best-way-to-join-files-again-after-splitting-them
split -b 100m myImage.iso
# later
cat x* > myImage.iso
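Applied to the command from the question, the pieces will be named Testfile.taraa, Testfile.tarab, and so on, so the join step becomes (a sketch; the shell glob expands in sorted order, which matches split's suffix order):
cat Testfile.tar?? > Testfile.tar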
UPDATE
Just as clarification, since I believe you have not understood the approach: you split a big file like this to transport it, for example; the pieces are not usable on their own. To use the archive again you need to concatenate (join) the pieces back together. If you want independently usable parts, then you need to unpack the file, split its contents into parts, and pack each part. With split you are simply splitting the binary file; I don't think you can use those parts by themselves.
You are doing the archiving first and the partitioning later. If you want each part to be a tar file, you should use split first on the original file, and then tar on each part, as sketched below.
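A minimal sketch of that suggestion (names are illustrative; original_file is the un-tarred source):
split -b 1T original_file part_
for p in part_*; do tar -cf "$p.tar" "$p"; done
Each resulting .tar is then a valid archive holding one chunk, though you still need every chunk to rebuild the original file.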

Unzip the archive with more than one entry

I'm trying to decompress an ~8 GB .zip file piped from a curl command. Everything I have tried is interrupted at under 1 GB and returns a message:
... has more than one entry--rest ignored
I've tried funzip, gunzip, gzip -d, zcat, ... also with different arguments; all end with the above message.
The datafile is public, so it's easy to repro the issue:
curl -L https://archive.org/download/nycTaxiTripData2013/faredata2013.zip | funzip > datafile
Are you sure the mentioned file deflates to a single file? If it extracts to multiple files, you unfortunately cannot unzip it on the fly.
Zip is a container as well as a compression format, and a streaming reader doesn't know where the next entry begins. You'll have to download the whole file and unzip it.
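A sketch of that two-step route, using the URL from the question:
curl -L -o faredata2013.zip https://archive.org/download/nycTaxiTripData2013/faredata2013.zip
unzip faredata2013.zip
unzip then extracts every entry in the archive, instead of stopping after the first one as funzip does.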

midi to ogg - pipeline distortion

I am trying to convert midi files to ogg or mp3. Eventually this will happen on a Linux webserver, but currently I am using a Windows 7 machine. I am using timidity to convert the midi to wav, and then either sox or ffmpeg to convert the wav to ogg/mp3.
When I use an intermediate file, the process works fine (in the first line below, timidity creates file.wav):
timidity.exe file.mid -Ow
sox.exe file.wav file.ogg
However, when I try to pipe the timidity output into sox (as below), the resulting ogg file is horribly distorted,
timidity.exe file.mid -Ow -o - | sox.exe -t wav - file.ogg
and I get a warning
sox.exe WARN wav: Premature EOF on .wav input file
I also get the same distortion problem when I replace sox with ffmpeg (and the appropriate command line options), or when I replace ogg with mp3 as the output format.
So what am I doing wrong?
Thanks,
Chris
Regarding the warning itself, you're doing nothing wrong. You may also see a warning from timidity that reads something like
Warning: -: Illegal seek: Can't make valid header
What's happening there is explained in the timidity manual page:
If output is directed to a non-seekable file, or if TiMidity++ is interrupted before closing the file, the file header will contain 0xffffffff in the RIFF and data block length fields.
Note that RIFF is the encoding format commonly called by its file extension, .wav. When timidity writes a RIFF file, it doesn't know how long the file will be, so it writes some placeholder junk in the header and moves on to writing the data. When it finishes with the data, it knows how long the file is, so it goes back to the beginning of the file and writes over that junk in the header. When you write to a pipe, it has no way to go back and rewrite anything: the downstream program has to handle the placeholder junk. Also from the timidity manual page:
The popular sound conversion utility sox is able to read such malformed files, so you can pipe data directly to sox for on-the-fly conversion to other formats.
Thus, the message you mentioned. Sox is informing you that the chef prepared the file wrong BUT SOX IS HAPPY TO EAT IT ANYWAY BECAUSE SOX IS NOT PICKY. Sox is apparently passive-aggressive. Who knew?
You can ignore those warning messages, because now they are telling you something you already know. Or, you can use a raw format and explicitly tell timidity and sox how to play well with one another:
timidity file.midi -Or1Ssl -s44.1 -o- | sox -t raw -b 16 -e signed -r 44.1k -c 2 - file.ogg
As for the distortion, that may be caused in part by quirks in the audio libraries on the Windows system. I note that the pipeline in the question, sans .exe extensions, produces output with no notable distortion on a Linux system. Using a well-defined raw format in the pipeline may also help with that issue.
Note that for Ogg output, you can now get that directly from timidity:
timidity file.midi -o file.ogg -Ov
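If you want mp3 instead, the same raw-format pipeline should work, assuming your sox build was compiled with LAME/mp3 support (the mp3 encoder is an optional component of sox):
timidity file.midi -Or1Ssl -s44.1 -o- | sox -t raw -b 16 -e signed -r 44.1k -c 2 - file.mp3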

How to compress a "tail -f" to a compressed (gziped) file?

I have tried the following, but the resulting file stays at size 0.
tail -f /logs/localhost.log | gzip -c -9 -f > compressed.gz
localhost.log is very active.
Thank you.
logrotate(8) was designed to solve this sort of problem - it rotates and compresses log files.
You're just not patient enough. That will work, and it will write to the gzip file; it will just take a while to accumulate enough input to write the first compressed block, especially if the input is highly compressible, e.g. if the log entries are very similar to each other.
This has a problem though, in that the gzip file will never be properly terminated, since gzip will never get an end-of-file.
You can't do this as-is, because the gzip utility doesn't flush its output line by line; it expects an EOF.
But you can write your own wrapper using any programming language which has a zlib implementation.
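A minimal sketch of such a wrapper, inline in the pipeline (assumes python3 is available; Z_SYNC_FLUSH pushes every line out immediately, at some cost in compression ratio):
tail -f /logs/localhost.log | python3 -c '
import sys, zlib
# wbits = MAX_WBITS | 16 selects the gzip container format
c = zlib.compressobj(9, zlib.DEFLATED, zlib.MAX_WBITS | 16)
for line in sys.stdin.buffer:
    sys.stdout.buffer.write(c.compress(line))
    sys.stdout.buffer.write(c.flush(zlib.Z_SYNC_FLUSH))  # flush this line to the file now
sys.stdout.buffer.write(c.flush())  # gzip trailer; only runs if tail ever exits
' > compressed.gz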
