Extract 3 smallest files from Tar archive in descending order by size [duplicate] - linux

This question already has an answer here:
Extract top 10 biggest files from tar archive Linux
(1 answer)
Closed 3 years ago.
How can I extract the 3 smallest files from a tar file in Linux, in descending order by size, using the command line?

You can list the file details, sort by size, pick the top 3 files, build the tar x command, and execute it to extract those 3 files:
tar tvf foo.tar |
awk '$0=$3"\x99"$NF' |
sort -n |
awk -F'\x99' 'NR<4{s=s" "$2}END{print "tar xvf foo.tar "s}' |
sh
Note:
The above one-liner assumes that none of the filenames in the tarball contain spaces or other special characters.
The tarball name foo.tar is hardcoded; replace it with your real tarball name.
You can test the command without the last | sh stage; it will only print the generated tar xvf command. If that looks right, add | sh to perform the actual extraction.
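For example, running everything except the final | sh against a hypothetical foo.tar whose three smallest members are a.csv, b.txt and c.log would simply print the command to be executed:
tar tvf foo.tar |
awk '$0=$3"\x99"$NF' |
sort -n |
awk -F'\x99' 'NR<4{s=s" "$2}END{print "tar xvf foo.tar "s}'
# hypothetical output:
#   tar xvf foo.tar a.csv b.txt c.log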

Related

shell script for copying log files into a single compressed file

We have a folder "statuslogs" on our embedded board; it contains logs named in the format daily_status_date_time.log.
We need to get all the files for a particular year into a single file, for fetching from the server.
We did the following in our script
gzip -c statuslogs/daily_status_2017*.log > status_2017.gz
gzip -c statuslogs/daily_status_2018*.log > status_2018.gz
gzip -c statuslogs/daily_status_2019*.log > status_2019.gz
gzip -c statuslogs/daily_status_2020*.log > status_2020.gz
gzip -c statuslogs/daily_status_2021*.log > status_2021.gz
The problem with this logic is that it still creates a status_*.gz file for the years 2019, 2020 and 2021 even when no matching logs exist.
I tried writing the following logic
if [ - f statuslogs/daily_status_2017*.log ] but it fails, maybe because of the glob. Also, I am not using bash; the interpreter is ash.
Can you please help me optimize the script?
Thanks for your time
You have a syntax error. It's -f, not - f. Example:
if [ -f statuslogs/daily_status_2017*.log ]; then
gzip -c statuslogs/daily_status_2017*.log > status_2017.gz
fi
However, with this you will probably run into a "too many arguments" error, which will happen if you have more than one matching file. So this would work better:
if find statuslogs/daily_status_2017*.log -mindepth 0 -maxdepth 0 2>/dev/null | grep -q .; then
gzip -c statuslogs/daily_status_2017*.log > status_2017.gz
fi
It would be better to loop over the years instead of hard-coding each one, and to stop when you reach the current year. For example,
for year in $(seq 2017 $(date +%Y)); do
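Putting the pieces together, here is a minimal sketch of the whole loop (assuming the script runs from the directory that contains statuslogs, and that your busybox build provides seq, date, find and grep, which is usually the case):
#!/bin/sh
# compress the logs of each year, but only for years that actually have logs
for year in $(seq 2017 "$(date +%Y)"); do
    if find statuslogs/daily_status_"$year"*.log -maxdepth 0 2>/dev/null | grep -q .; then
        gzip -c statuslogs/daily_status_"$year"*.log > "status_$year.gz"
    fi
done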
gzip on its own compresses a single data stream, so it cannot keep multiple files separate inside one archive. If you want the files kept separate, you need to do one of the following:
Combine the files using tar:
tar czf status_2017.tar.gz statuslogs/daily_status_2017*.log
Or use zip, which supports multiple files directly:
zip status_2017.zip statuslogs/daily_status_2017*.log
Now, if the problem is just that you want one archive for every year, but only for the years for which files exist, you can handle all the years using a for loop:
for year in `ls statuslogs/daily_status_* | cut -d _ -f 3 | sort | uniq`; do
tar czf status_$year.tar.gz statuslogs/daily_status_$year*.log;
done
If your shell doesn't support that format of calling, you can try this instead
ls statuslogs/daily_status_* | cut -d _ -f 3 | sort | uniq > years
cat years | while read year; do
tar czf status_$year.tar.gz statuslogs/daily_status_$year*.log;
done
If you just want one file for all the logs, you can just forget about the year part completely
tar czf statuslogs.tar.gz statuslogs/daily_status*.log

search and remove specific file using linux command [duplicate]

This question already has answers here:
Delete files with string found in file - Linux cli
(8 answers)
Closed 5 years ago.
I am using this command to search for all files containing this word. I want to remove all files containing this word in a specific directory. The grep command works perfectly; suggest how I can use
rm -rf
with the command below:
grep -l -r -i "Pending" . | grep -n . | wc -l
This could be done by using the -l flag and piping the filenames to xargs:
-l
(The letter ell.) Write only the names of files containing selected
lines to standard output. Pathnames are written once per file searched.
If the standard input is searched, a pathname of (standard input) will
be written, in the POSIX locale. In other locales, standard input may be
replaced by something more appropriate in those locales.
grep -l -r 'Pending' . | xargs rm
The above will delete all files in the current directory containing the word Pending.
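Note that the plain pipe breaks on filenames containing spaces or newlines. If your grep and xargs are the GNU versions, a safer sketch is to NUL-delimit the names with -Z/--null and read them with xargs -0:
# -Z ends each printed filename with a NUL byte instead of a newline,
# and xargs -0 splits on NUL, so whitespace in filenames is handled safely
grep -lrZ 'Pending' . | xargs -0 rm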

Merge some parts of a split tar.gz file Linux command

I have a large tar.gz file (approximately 63 GB) on a linux server. This file has about 1000 compressed csv files. I need to save the data of csv files in a database.
I can't extract the whole file in one go due to limited space on the server. So I split the tar.gz file into 5 parts (4 parts of 15 GB and 1 of 3 GB) but did not merge all of them, as the server wouldn't have any space left once extraction was done. I merged the first two parts to make a new tar.gz file and extracted the csv files from that.
When I tried to merge the last 3 parts, it did not produce a valid tar.gz file and that file could not be extracted. This problem was not because of server space, because I deleted the files that were no longer required after extracting the first two parts.
Is there any way through which the last 3 parts of the split tar.gz file can be merged in a valid tar.gz format and then extracted?
Command used to split :
split -b 15G file.tar.gz parts
Command used to merge :
cat parts* > combined.tar.gz
Command used to extract :
tar zxvf file.tar.gz -C folderwhereextracted
You can use a short shell script:
#!/bin/sh
path='./path'                 # directory that contains the split parts
list=''
i=0
for file in "$path"/*.tar.gz.*
do
    i=$((i+1))
    if [ -f "$path"/*.tar.gz."$i" ]
    then
        echo "file $path/*.tar.gz.$i found."
        list="$list $path/*.tar.gz.$i"
    else
        echo "file $path/*.tar.gz.$i not found!"
    fi
done
cat $list > full.tar.gz
tar zxvf ./full.tar.gz -C "$path"
# rm -f $list
Set the path variable to the directory that contains your split parts; the script assumes the parts carry increasing numeric suffixes (file.tar.gz.1, file.tar.gz.2, ...).
Uncomment the last line to remove the source parts after extraction.
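Note that the later parts are not valid gzip data on their own, because only the first part starts with the gzip header, which is why merging just the last three parts could not produce a valid tar.gz. If disk space is the real constraint, you can also stream the concatenated parts straight into tar so the combined file never has to exist on disk (a sketch, assuming the parts were created as in the question with split -b 15G file.tar.gz parts):
# parts* expands in lexical order (partsaa, partsab, ...), which is the
# order split created them in; tar reads the archive from stdin via -f -
cat parts* | tar zxvf - -C folderwhereextracted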

How to extract first few lines from a csv file inside a tar file without extracting it in linux?

I have a tar file which has lot of csv files in it.
How to get the first few lines of each csv file without extracting it?
I tried:
$(tar -Oxf $tarfile $file | head -n "$NL") >> cdn.log
But got error saying:
time(http:index: command not found
This is some line in one of the csv files. Similar errors are reported for all csv files...
Any idea??
Using -O you can tell tar to extract a file to standard output instead of to a file. (The error in your attempt comes from the surrounding $( ... ), which makes the shell try to execute the pipeline's output as a command.) So you can first use tar tf <YOUR_FILE> to list the files in the archive and filter it with grep to find the CSV files, and then, for each file, use tar xf <YOUR_FILE> <NAME_OF_CSV> -O | head to send the file's beginning to stdout. This may be a bit inefficient, since you unpack the archive as many times as there are CSV files, but it should work.
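A sketch of that approach as a small loop, with a hypothetical archive name archive.tar and a 5-line head, assuming the member names contain no spaces:
# list the members, keep only the .csv ones, then print the first 5 lines of each
tar tf archive.tar | grep '\.csv$' | while read -r f; do
    echo "== $f =="
    tar xf archive.tar "$f" -O | head -n 5
done >> cdn.log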
You can use perl and its Archive::Tar module. Here is a one-liner that extracts the first two lines of each file:
perl -MArchive::Tar -E '
for (Archive::Tar->new(shift)->get_files) {
say (join qq|\n|, (split /\n/, $_->get_content, 3)[0..1])
}
' file.tar
It assumes that the tar file contains only text files and that they are CSV. Otherwise, you will have to grep the list to filter the ones you want.

Split files using tar, gz, zip, or bzip2 [closed]

Closed. This question is off-topic. It is not currently accepting answers.
Closed 13 years ago.
I need to compress a large file of about 17-20 GB. I need to split it into several files of around 1GB per file.
I searched for a solution via Google and found ways using split and cat commands. But they did not work for large files at all. Also, they won't work in Windows; I need to extract it on a Windows machine.
You can use the split command with the -b option:
split -b 1024m file.tar.gz
It can be reassembled on a Windows machine using @Joshua's answer.
copy /b file1 + file2 + file3 + file4 filetogether
Edit: As @Charlie stated in the comment below, you might want to set a prefix explicitly because it will use x otherwise, which can be confusing.
split -b 1024m "file.tar.gz" "file.tar.gz.part-"
# Creates files: file.tar.gz.part-aa, file.tar.gz.part-ab, file.tar.gz.part-ac, ...
Edit: Updating the post because the question is closed and the most effective solution is very close to the content of this answer:
# create archives
$ tar cz my_large_file_1 my_large_file_2 | split -b 1024MiB - myfiles_split.tgz_
# uncompress
$ cat myfiles_split.tgz_* | tar xz
This solution avoids the need for an intermediate large file when (de)compressing. Use the tar -C option to extract the resulting files into a different directory. By the way, if the archive consists of only a single file, tar can be skipped and gzip used on its own:
# create archives
$ gzip -c my_large_file | split -b 1024MiB - myfile_split.gz_
# uncompress
$ cat myfile_split.gz_* | gunzip -c > my_large_file
For Windows, you can download ported versions of the same commands or use Cygwin.
If you are splitting from Linux, you can still reassemble in Windows.
copy /b file1 + file2 + file3 + file4 filetogether
Use tar to split into multiple archives.
There are plenty of programs that will work with tar files on Windows, including Cygwin.
Tested code that first creates a single archive file, then splits it:
gzip -c file.orig > file.gz
CHUNKSIZE=1073741824
PARTCNT=$(( $(stat -c%s file.gz) / CHUNKSIZE ))
# the remainder is taken care of, for example for
# 1 GiB + 1 bytes PARTCNT is 1 and seq 0 $PARTCNT covers
# all of file
for n in `seq 0 $PARTCNT`
do
dd if=file.gz of=part.$n bs=$CHUNKSIZE skip=$n count=1
done
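For completeness, a sketch of reassembling those parts back into the original on the Linux side (file.restored is just a hypothetical output name; the explicit loop keeps the numeric order correct even when there are ten or more parts):
# number of part files minus one (parts are numbered from 0)
PARTCNT=$(( $(ls part.* | wc -l) - 1 ))
for n in $(seq 0 $PARTCNT); do
    cat "part.$n"
done | gunzip -c > file.restored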
This variant omits creating a single archive file and goes straight to creating parts:
gzip -c file.orig |
( CHUNKSIZE=1073741824;
i=0;
while true; do
i=$((i+1));
head -c "$CHUNKSIZE" > "part.$i";
[ "$CHUNKSIZE" -eq $(stat -c%s "part.$i") ] || break;
done; )
In this variant, if the archive's file size is divisible by $CHUNKSIZE, then the last partial file will have file size 0 bytes.
