How to compress a "tail -f" to a compressed (gzipped) file? - linux

I have tried the following, but the resulting file stays at size 0.
tail -f /logs/localhost.log | gzip -c -9 -f > compressed.gz
localhost.log is very active.
Thank you.

logrotate(8) was designed to solve this sort of problem - it rotates and compresses log files.
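For reference, a minimal logrotate configuration for that log could look like this (the path comes from the question; the rotation frequency and retention count are assumptions):
/logs/localhost.log {
    daily
    rotate 7
    compress
    missingok
    notifempty
}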

You're just not patient enough. That will work, and it will write to the gzip file; it just takes a while to accumulate enough input to write the first compressed block, especially if the input is highly compressible, e.g. when the log entries are very similar to each other.
This has a problem though, in that the gzip file will never be properly terminated, since gzip will never get an end-of-file.

You can't do this, because the gzip utility doesn't read its input line by line; it expects an EOF.
But you can write your own wrapper using any programming language that has a zlib implementation.
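A crude shell-level alternative (not a zlib wrapper, just a sketch of the idea) is to append one complete gzip member per line; concatenated gzip members still form a valid gzip file, although compressing each line separately gives a very poor ratio:
tail -f /logs/localhost.log | while IFS= read -r line; do
    printf '%s\n' "$line" | gzip -9 >> compressed.gz
done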

Related

Splitting large tar file into multiple tar files

I have a tar file which is 3.1 TB (terabytes).
File name - Testfile.tar
I would like to split this tar file into 2 parts - Testfile1.tar and Testfile2.tar
I tried the following so far
split -b 1T Testfile.tar "Testfile.tar"
What I get is Testfile.taraa (what is "aa"?)
And I just stopped my command. I also noticed that the output Testfile.taraa doesn't seem to be a tar file when I do ls in the directory; it seems like it is a text file. Maybe once the full split is completed it will look like a tar file?
The behavior of split is correct; from the online man page: http://man7.org/linux/man-pages/man1/split.1.html
Output pieces of FILE to PREFIXaa, PREFIXab, ...
Don't stop the command; let it run, and then you can use cat to concatenate (join) the pieces all back again.
Examples can be seen here: https://unix.stackexchange.com/questions/24630/whats-the-best-way-to-join-files-again-after-splitting-them
split -b 100m myImage.iso
# later
cat x* > myImage.iso
UPDATE
Just as clarification, since I believe you have not understood the approach: you split a big file like this to transport it, for example; the pieces are not usable on their own. To use the archive again you need to concatenate (join) the pieces back together. If you want usable parts, then you need to unpack the archive, split the contents into parts, and pack each part separately. With split you basically cut up the binary file; I don't think you can use those parts individually.
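Applied to the file in the question, something like this should produce exactly two pieces and later restore the original archive (the 1600G piece size is an assumption derived from the 3.1 TB total):
split -b 1600G Testfile.tar "Testfile.tar."
# later, after the pieces have been transported:
cat Testfile.tar.* > Testfile.tar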
You are doing the compression first and the partition later.
If you want each part to be a tar file, you should use 'split' first with the original file, and then 'tar' with each part.
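A literal sketch of that suggestion, with hypothetical part names (note that the pieces would still have to be extracted and concatenated again before the original archive is usable):
split -b 1600G Testfile.tar Testfile.part.
tar -cf Testfile1.tar Testfile.part.aa
tar -cf Testfile2.tar Testfile.part.ab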

Is it possible to display a file's contents and delete that file in the same command?

I'm trying to display the output of an AWS lambda that is being captured in a temporary text file, and I want to remove that file as I display its contents. Right now I'm doing:
... && cat output.json && rm output.json
Is there a clever way to combine those last two commands into one command? My goal is to make the full combined command string as short as possible.
For cases where:
- it is possible to control the name of the temporary text file, and
- the file is not used by other code,
it is possible to pass "/dev/stdout" as the name of the output (see the sketch below).
Regarding portability, see the Stack Exchange question "how portable ... /dev/stdout".
POSIX 7 says they are extensions.
Base Definitions, Section 2.1.1 Requirements:
The system may provide non-standard extensions. These are features not required by POSIX.1-2008 and may include, but are not limited to:
[...]
• Additional character special files with special properties (for example,  /dev/stdin, /dev/stdout,  and  /dev/stderr)
Using the mandatorily supported /dev/tty would instead force output to the "current" terminal, making it impossible to pipe the output of the whole command into a different program (or a log file), or to use the program when there is no connected terminal (cron jobs, or other automation tools).
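A minimal sketch of the /dev/stdout idea, where produce_output is a hypothetical stand-in for whatever command currently writes the temporary file:
produce_output /dev/stdout
# instead of:
produce_output output.json && cat output.json && rm output.json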
No, you cannot easily remove the lines of a file while displaying them. It would be highly inefficient, as it would require removing characters from the beginning of the file each time you read a line. Current filesystems are pretty good at truncating a file at the end, but not at the beginning.
A simple but extremely slow method would look like this:
while [ -s output.json ]
do
    head -1 output.json
    sed -i 1d output.json
done
While this algorithm is plain and simple, you should know that each time you remove the first line with sed -i 1d it will copy the whole content of the file but the first line into a temporary file, resulting in approximately 0.5*n² lines written in total (where n is the number of lines in your file).
In theory you could avoid this by doing something like this:
while [ -s output.json ]
do
    line=$(head -1 output.json)
    printf -- '%s\n' "$line"
    # collapse the first line (plus its newline) out of the beginning of the file
    fallocate -c -o 0 -l $((${#line}+1)) output.json
done
But this does not account for variable newline characters (namely DOS-formatted newlines) and fallocate does not always work on xfs, among other issues.
Since you are trying to consume a file as it is being created, without leaving a trace of its existence on disk, you are essentially asking for pipe functionality. In my opinion you should look into how your output.json file is produced; hopefully you can pipe it into a script of your own.
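For instance, a named pipe gives exactly that behaviour while keeping the output.json name; produce_output is again a hypothetical stand-in for the command that creates the file:
mkfifo output.json              # a pipe, not a regular file: nothing is stored on disk
produce_output > output.json &  # the producer blocks until a reader attaches
cat output.json                 # displays the data as it is produced
rm output.json                  # removes the pipe entry itself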

Is it possible to partially unzip a .vcf file?

I have a ~300 GB zipped vcf file (.vcf.gz) which contains the genomes of about 700 dogs. I am only interested in a few of these dogs and I do not have enough space to unzip the whole file at this time, although I am in the process of getting a computer to do this. Is it possible to unzip only parts of the file to begin testing my scripts?
I am trying to extract a specific SNP at a position in a subset of the samples. I have tried using bcftools to no avail (if anyone can identify what went wrong with that I would also really appreciate it; I created an empty file for the output (722g.990.SNP.INDEL.chrAll.vcf.bgz) but it returns the following error):
bcftools view -f PASS --threads 8 -r chr9:55252802-55252810 -o 722g.990.SNP.INDEL.chrAll.vcf.gz -O z 722g.990.SNP.INDEL.chrAll.vcf.bgz
The output type "722g.990.SNP.INDEL.chrAll.vcf.bgz" not recognised
I am planning on trying awk, but need to unzip the file first. Is it possible to partially unzip it so I can try this?
Double check your command line for bcftools view.
The error message 'The output type "something" is not recognised' is printed by bcftools when you specify an invalid value for the -O (upper-case O) command line option, as in -O something. Based on the error message you are getting, it seems that you might have put the file name there.
Check that you don't have your input and output file names the wrong way around in your command. Note that the -o (lower-case o) command line option specifies the output file name, and the file name at the end of the command line is the input file name.
Also, you write that you created an empty file for the output. You don't need to do that, bcftools will create the output file.
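A sketch of what the corrected command might look like, assuming the 300 GB .vcf.gz is the (bgzip-compressed and indexed) input and subset.vcf.gz is a hypothetical output name; note that -r requires an index, which can be created with bcftools index if one does not exist yet:
bcftools view -f PASS --threads 8 -r chr9:55252802-55252810 \
    -O z -o subset.vcf.gz 722g.990.SNP.INDEL.chrAll.vcf.gz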
I don't have that much experience with bcftools, but generically, if you want to use awk to manipulate a gzipped file you can pipe into it so that the file is only decompressed as needed; you can also pipe the result directly through gzip so the output is compressed too, e.g.
gzip -cd largeFile.vcf.gz | awk '{ <some awk> }' | gzip -c > newfile.txt.gz
Also, zcat is equivalent to gzip -cd; -c writes the result to standard output and -d means decompress.
As a side note, if you are trying to perform operations on just a part of a large file, you may also find the excellent tool less useful: it can view your large file while loading only the needed parts. The -S option is particularly useful for wide formats with many columns, as it stops line wrapping, as is -N for showing line numbers.
less -S largefile.vcf.gz
Quit the view with q; g takes you to the top of the file.

Regarding lz4mt compression and linux buffering issue

I am using lz4mt, a multi-threaded version of lz4, and in my workflow I am sending thousands of large files (620 MB each) from a client to a server. When a file arrives on the server, my rule triggers and compresses the file using lz4mt, then removes the uncompressed file. The problem is that sometimes, when I remove the uncompressed file, the compressed file does not yet have the right size; this is because lz4mt returns immediately, before its output has been written to disk.
So is there any way lz4mt can remove the uncompressed file itself after compressing, as bzip2 does?
Input: bzip2 uncompress_file
Output: Compressed file only
whereas
Input: lz4mt uncompress_file
Output: (Uncompressed + Compressed) file
The sync command in the script below is also not working properly, I think.
The script which execute as my rule triggers is:
script.sh
/bin/lz4mt uncompressed_file output_file
/bin/sync
/bin/rm uncompressed_file
Please tell me how to solve the above issue.
Thanks a lot
Author here. You could try the following methods:
Concatenate commands with && or ; (see the sketch after this list).
Add the lz4mt command line options -q (suppress prompt) and -f (force overwrite).
Try it with the original lz4.
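A sketch combining the first two suggestions, so the uncompressed file is only removed if the compression (and sync) succeeded:
/bin/lz4mt -q -f uncompressed_file output_file && /bin/sync && /bin/rm uncompressed_file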

Possible to stream a compressed data stream using Perl

I am trying to send a compressed tarball from a Perl CGI script. Everything is working fine, except for the fact that the tarball is only being sent after it has been fully compressed and created. In other words, it is not 'streaming' the data live, which is quite problematic because the data is quite large.
print "Content-Type:application/x-download\n";
print "Content-Disposition:attachment;filename=download.tar.\n\n";
print `tar zc $path/$file`
I have also tried doing tar zcf - $path/$file which writes to stdout and it does the same thing.
As Geo pointed out, you are waiting for tar to finish. Reading from a pipe instead should output the data in parallel with its creation:
$| = 1;   # unbuffer STDOUT so data is sent as soon as it is read
open my $pipe_fh, '-|', "tar zc $path/$file" or die "cannot start tar: $!";
binmode $pipe_fh;    # the compressed stream is binary data
binmode STDOUT;
while (<$pipe_fh>) {
    print;
}
Well, in the case of backticks, you are practically waiting for the process to finish, and then you are sending its output. I would suggest something from the IPC::Open family. IPC::Open3 could do the trick.
