Rename files produced by split - Linux

I split a huge file, and the output is several files whose names start with the character x.
I want to rename them so that they form a list sorted by name, like below:
part-1.gz
part-2.gz
part-3.gz ...
I tried the commands below:
for (( i = 1; i <= 3; i++ )) ;do for f in `ls -l | awk '{print $9}' | grep '^x'`; do mv $f part-$i.gz ;done ; done;
for f in `ls -l | awk '{print $9}' | grep '^x'`; do for i in 1 .. 3 ; do mv -- "$f" "${part-$i}.gz" ;done ; done;
for i in 1 .. 3 ;do for f in `ls -l | awk '{print $9}' | grep '^x'`; do mv -- "$f" "${part-$i}.gz" ;done ; done;
for f in `ls -l | awk '{print $9}' | grep '^x'`; do mv -- "$f" "${f%}.gz" ;done

Tip: don't do ls -l if you only need the file names. Even better, don't use ls at all, just use the shell's globbing ability. x* expands to all file names starting with x.
Here's a way to do it:
i=1; for f in x*; do mv $f $(printf 'part-%d.gz' $i); ((i++)); done
This initializes i to 1, and then loops over all file names starting with x in alphabetical order, assigning each file name in turn to the variable f. Inside the loop, it renames $f to $(printf 'part-%d.gz' $i), where the printf command replaces %d with the current value of i. You might want something like %02d if you need to prefix the number with zeros. Finally, still inside the loop, it increments i so that the next file receives the next number.
Note that none of this is safe if the input file names contain spaces, but yours don't.
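For instance, a zero-padded variant can be sketched in a throwaway directory (the xaa/xab/xac names mimic split's defaults; the directory itself is made up):

```shell
# run in a scratch directory so nothing real is touched
dir=$(mktemp -d)
cd "$dir"
touch xaa xab xac          # stand-ins for split's output

i=1
for f in x*; do
  mv "$f" "$(printf 'part-%02d.gz' "$i")"   # %02d zero-pads: part-01.gz, part-02.gz, ...
  ((i++))
done
ls
```

Zero-padding keeps the list sorted correctly even past part-09, which plain %d does not.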

Related

Filtering a list by 5 files per directory

So I have a list of files inside a tree of folders:
/home/user/Scripts/example/tmp/folder2/2
/home/user/Scripts/example/tmp/folder2/3
/home/user/Scripts/example/tmp/folder2/4
/home/user/Scripts/example/tmp/folder2/5
/home/user/Scripts/example/tmp/folder2/6
/home/user/Scripts/example/tmp/folder2/7
/home/user/Scripts/example/tmp/folder2/8
/home/user/Scripts/example/tmp/folder2/9
/home/user/Scripts/example/tmp/folder2/10
/home/user/Scripts/example/tmp/other_folder/files/1
/home/user/Scripts/example/tmp/other_folder/files/2
/home/user/Scripts/example/tmp/other_folder/files/3
/home/user/Scripts/example/tmp/other_folder/files/4
/home/user/Scripts/example/tmp/other_folder/files/5
/home/user/Scripts/example/tmp/other_folder/files/6
/home/user/Scripts/example/tmp/other_folder/files/7
/home/user/Scripts/example/tmp/other_folder/files/8
/home/user/Scripts/example/tmp/other_folder/files/9
/home/user/Scripts/example/tmp/other_folder/files/10
/home/user/Scripts/example/tmp/test/example/1
/home/user/Scripts/example/tmp/test/example/2
/home/user/Scripts/example/tmp/test/example/3
/home/user/Scripts/example/tmp/test/example/4
/home/user/Scripts/example/tmp/test/example/5
/home/user/Scripts/example/tmp/test/example/6
/home/user/Scripts/example/tmp/test/example/7
/home/user/Scripts/example/tmp/test/example/8
/home/user/Scripts/example/tmp/test/example/9
/home/user/Scripts/example/tmp/test/example/10
/home/user/Scripts/example/tmp/test/other/1
/home/user/Scripts/example/tmp/test/other/2
/home/user/Scripts/example/tmp/test/other/3
/home/user/Scripts/example/tmp/test/other/4
/home/user/Scripts/example/tmp/test/other/5
/home/user/Scripts/example/tmp/test/other/6
/home/user/Scripts/example/tmp/test/other/7
/home/user/Scripts/example/tmp/test/other/8
/home/user/Scripts/example/tmp/test/other/9
/home/user/Scripts/example/tmp/test/other/10
I want to filter this list so that I only keep the highest 5 numbers for each directory.
Any ideas?
Preferably in bash/shell.
Expected output (small sample size, because SO says there is too much code):
/home/user/Scripts/example/tmp/test/example/6
/home/user/Scripts/example/tmp/test/example/7
/home/user/Scripts/example/tmp/test/example/8
/home/user/Scripts/example/tmp/test/example/9
/home/user/Scripts/example/tmp/test/example/10
/home/user/Scripts/example/tmp/test/other/6
/home/user/Scripts/example/tmp/test/other/7
/home/user/Scripts/example/tmp/test/other/8
/home/user/Scripts/example/tmp/test/other/9
/home/user/Scripts/example/tmp/test/other/10
Thanks
edit - using for i in $(for i in $(dirname $(find $(pwd) -type f -name "*[0-9]*" | sort -V) | uniq) ;do ls $i | sort -V | tail -n 5 ; done) ; do readlink -f $i ; done works for a small sample size. However, expanding said sample appears to produce an argument list that is too long for dirname.
Assuming your input data is sorted.
Try:
awk -F'/[^/]*$' '{if (NR==1 || prev_dir == $1) {i=i+1} else {i=1}; if ( i<=5){ prev_dir=$1 ; print $0}; }'
Explanation:
'/[^/]*$' <-- Sets the field-separator regex so that the directory name becomes the first field
if (NR==1 || prev_dir == $1) {i=i+1} else {i=1}; <-- Checks whether the file is from the same directory; if yes, the counter is incremented by 1, otherwise it is reset.
if ( i<=5){ prev_dir=$1 ; print $0}; }' <-- Prints the first 5 records of the current directory.
Demo:
$awk -F'/[^/]*$' '{if (NR==1 || prev_dir == $1) {i=i+1} else {i=1}; if ( i<=5){ prev_dir=$1 ; print $0 }; }' temp.txt
/home/user/Scripts/example/tmp/folder2/2
/home/user/Scripts/example/tmp/folder2/3
/home/user/Scripts/example/tmp/folder2/4
/home/user/Scripts/example/tmp/folder2/5
/home/user/Scripts/example/tmp/folder2/6
/home/user/Scripts/example/tmp/other_folder/files/1
/home/user/Scripts/example/tmp/other_folder/files/2
/home/user/Scripts/example/tmp/other_folder/files/3
/home/user/Scripts/example/tmp/other_folder/files/4
/home/user/Scripts/example/tmp/other_folder/files/5
$cat temp.txt
/home/user/Scripts/example/tmp/folder2/2
/home/user/Scripts/example/tmp/folder2/3
/home/user/Scripts/example/tmp/folder2/4
/home/user/Scripts/example/tmp/folder2/5
/home/user/Scripts/example/tmp/folder2/6
/home/user/Scripts/example/tmp/folder2/7
/home/user/Scripts/example/tmp/folder2/8
/home/user/Scripts/example/tmp/folder2/9
/home/user/Scripts/example/tmp/folder2/10
/home/user/Scripts/example/tmp/other_folder/files/1
/home/user/Scripts/example/tmp/other_folder/files/2
/home/user/Scripts/example/tmp/other_folder/files/3
/home/user/Scripts/example/tmp/other_folder/files/4
/home/user/Scripts/example/tmp/other_folder/files/5
/home/user/Scripts/example/tmp/other_folder/files/6
/home/user/Scripts/example/tmp/other_folder/files/7
/home/user/Scripts/example/tmp/other_folder/files/8
/home/user/Scripts/example/tmp/other_folder/files/9
/home/user/Scripts/example/tmp/other_folder/files/10
$
Here is an implementation in plain bash:
#!/bin/bash
prevdir=
while read -r line; do
dir=${line%/*}
[[ $dir == "$prevdir" ]] || { n=0; prevdir=$dir; }
((n++ < 5)) && echo "$line"
done
You can use it like:
./script < file.list # If file.list already sorted by a reverse version sort
or,
sort -rV file.list | ./script # If the file.list is not sorted
or,
find /home/user/Scripts -type f | sort -rV | ./script
Also, you may want to append | tac to the pipelines above.
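As a sanity check, the loop can be fed an inline, reverse-sorted list (the paths are made up):

```shell
filtered=$(
  {
    printf '/tmp/a/%d\n' 10 9 8 7 6 5 4 3 2 1
    printf '/tmp/b/%d\n' 4 3 2 1
  } | {
    prevdir=
    while read -r line; do
      dir=${line%/*}
      [[ $dir == "$prevdir" ]] || { n=0; prevdir=$dir; }   # new directory: reset counter
      ((n++ < 5)) && echo "$line"                          # pass through at most 5 per dir
    done
  }
)
echo "$filtered"
```

The first directory is trimmed to its 5 highest entries, while /tmp/b has only 4 entries, so all of them pass through.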

wc with find - error if there is a space in a folder's name

I need to calculate a folder's size in bytes.
If the folder name contains a space, e.g. /folder/with spaces/, then the following command does not work properly:
wc -c `find /folder -type f` | grep total | awk '{print $1}'
with error
wc: /folder/with: No such file or directory
wc: spaces/file2: No such file or directory
How can it be done?
Try this line instead:
find /folder -type f | xargs -I{} wc -c "{}" | awk '{s += $1} END {print s}'
You need the names individually quoted.
$: while IFS= read -r n;   # assign whole row read to $n
do a+=("$n");              # add quoted "$n" to array
done < <( find /folder -type f )   # reads find as a stream
$: wc -c "${a[@]}" |       # pass wc the quoted names
sed -n '${ s/ .*//; p; }'  # ignore all but total, scrub and print
Compressed to a short couple of lines -
$: while IFS= read -r n; do a+=( "$n" ); done < <( find /folder -type f )
$: wc -c "${a[@]}" | sed -n '${ s/ .*//; p; }'
This is because bash (unlike zsh) word-splits the result of the command substitution. You could use an array to collect the file names:
files=()
for entry in *
do
[[ -f $entry ]] && files+=("$entry")
done
wc -c "${files[@]}" | grep .....
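If you don't want to collect names into an array, another space-safe route (my sketch, not from the answers above) is to NUL-delimit the names so nothing ever gets word-split; the scratch tree below stands in for the real /folder:

```shell
# scratch tree with a space in a directory name
root=$(mktemp -d)
mkdir "$root/with spaces"
printf '12345' > "$root/with spaces/file1"   # 5 bytes
printf '123'   > "$root/file2"               # 3 bytes

# NUL-delimited names survive spaces; cat streams every file into one wc
total=$(find "$root" -type f -print0 | xargs -0 cat | wc -c)
echo "$total"
```

Because the single wc sees one combined stream, there is no "total" line to grep for and no per-file output to sum.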

How to replace files' names with numbers starting at a certain number?

I want the files to be named like 177.jpg, 178.jpg and so on, starting with 177.jpg.
I used this to rename them from 1 up to the number of files:
ls | cat -n | while read n f; do mv "$f" "$n.jpg"; done
How do I modify this? A completely new script would also be great.
Bash can do simple math for you:
mv "$f" $(( n + 176 )).jpg
Just hope no filename contains a newline.
There are safer ways than parsing the output of ls, e.g. iterating over an expanded wildcard:
n=177
for f in * ; do
mv "$f" $(( n++ )).jpg
done
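As a quick check of that wildcard loop, run it in a scratch directory (the file names are made up):

```shell
# throwaway directory with three stand-in files
dir=$(mktemp -d)
cd "$dir"
touch a.jpg b.jpg c.jpg

n=177
for f in *; do               # glob expands once, before any rename happens
  mv "$f" $(( n++ )).jpg
done
ls
```

Because the glob is expanded before the loop body runs, the freshly renamed 177.jpg etc. are never picked up a second time.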
This should work.
#!/bin/bash
c=177;
for i in `ls | grep -v '^[0-9]' | grep '\.png$'`; # select only .png files whose names do not already start with a digit
do
mv "$i" "$c".png;
(( c=c+1 ));
done

How to get a result from a background process in a Linux shell script?

For example, let's say I want to count the number of lines of 10 BIG files and print a total.
for f in files
do
#this creates a background process for each file
wc -l $f | awk '{print $1}' &
done
I was trying something like:
for f in files
do
#this does not work :/
n=$( expr $(wc -l $f | awk '{print $1}') + $n ) &
done
echo $n
I finally found a working solution using anonymous pipes and bash:
#!/bin/bash
# this executes a separate shell and opens a new pipe, where the
# reading endpoint is fd 3 in our shell and the writing endpoint
# stdout of the other process. Note that you don't need the
# background operator (&) as exec starts a completely independent process.
exec 3< <(./a.sh 2>&1)
# ... do other stuff
# write the contents of the pipe to a variable. If the other process
# hasn't already terminated, cat will block.
output=$(cat <&3)
You should probably use gnu parallel:
find . -maxdepth 1 -type f | parallel --gnu 'wc -l' | awk 'BEGIN {n=0} {n += $1} END {print n}'
or else xargs in parallel mode:
find . -maxdepth 1 -type f | xargs -n1 -P4 wc -l | awk 'BEGIN {n=0} {n += $1} END {print n}'
Another option, if this doesn't fit your needs, is to write to temp files. If you don't want to write to disk, just write to /dev/shm. This is a ramdisk on most Linux systems.
#!/bin/bash
declare -a temp_files
count=0
for f in *
do
if [[ -f "$f" ]]; then
temp_files[$count]="$(mktemp /dev/shm/${f}-XXXXXX)"
((count++))
fi
done
count=0
for f in *
do
if [[ -f "$f" ]]; then
cat "$f" | wc -l > "${temp_files[$count]}" &
((count++))
fi
done
wait
cat "${temp_files[@]}" | awk 'BEGIN {n=0} {n += $1} END {print n}'
for tf in "${temp_files[@]}"
do
rm "$tf"
done
By the way, this can be thought of as a map-reduce, with wc doing the mapping and awk doing the reduction.
You could write that to a file or, better, listen on a fifo so you get the data as soon as it arrives.
Here is a small example on how they work:
# create the fifo
mkfifo test
# listen to it
while true; do if read -r line < test; then echo "$line"; fi; done
# in another shell
echo 'hi there' > test
# notice 'hi there' being printed in the first shell
So you could
for f in files
do
#this creates a background process for each file
wc -l "$f" | awk '{print $1}' > fifo &
done
and listen on the fifo for sizes.
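Collecting the sizes from the fifo takes a little care, since a plain reader can see EOF between two writers. One way around that (this read-write fd trick is my addition, not part of the answer above) is to keep the fifo open on a file descriptor in the parent shell; the file names below are made up:

```shell
# scratch files with known line counts
workdir=$(mktemp -d)
printf 'a\nb\nc\n' > "$workdir/f1"   # 3 lines
printf 'x\ny\n'    > "$workdir/f2"   # 2 lines

mkfifo "$workdir/sizes.fifo"
exec 3<> "$workdir/sizes.fifo"       # read-write: readers never see a premature EOF

for f in "$workdir/f1" "$workdir/f2"; do
  wc -l < "$f" >&3 &                 # each job writes one short (atomic) line
done

total=0
for _ in 1 2; do                     # one read per background job
  read -r n <&3
  total=$(( total + n ))
done
wait
exec 3>&-
echo "$total"
```

The writes are shorter than PIPE_BUF, so they cannot interleave, and the reader just consumes one line per job in whatever order they finish.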

File name printed twice when using wc

For printing the number of lines in all ".txt" files of the current folder, I am using the following script:
for f in *.txt;
do l="$(wc -l "$f")";
echo "$f" has "$l" lines;
done
But in output I am getting:
lol.txt has 2 lol.txt lines
Why is lol.txt printed twice (especially after the 2)? I guess some sort of stream flush is required, but I don't know how to achieve that in this case. So what changes should I make in the script to get the output as:
lol.txt has 2 lines
You can remove the filename with 'cut':
for f in *.txt;
do l="$(wc -l "$f" | cut -f1 -d' ')";
echo "$f" has "$l" lines;
done
The filename is printed twice because wc -l "$f" also prints the filename after the number of lines. Try changing it to cat "$f" | wc -l.
wc prints the filename, so you could just write the script as:
ls *.txt | while read f; do wc -l "$f"; done
or, if you really want the verbose output, try
ls *.txt | while read f; do wc -l "$f" | awk '{print $2, "has", $1, "lines"}'; done
There is a trick here. Get wc to read stdin and it won't print a file name:
for f in *.txt; do
l=$(wc -l < "$f")
echo "$f" has "$l" lines
done
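The difference is easy to see side by side (the scratch file is made up):

```shell
f=$(mktemp)
printf 'one\ntwo\n' > "$f"

with_name=$(wc -l "$f")     # count followed by the file name
without=$(wc -l < "$f")     # just the count: stdin has no name to print
echo "$without"
```

When wc reads from stdin it has no file name to report, so only the number comes back, and no cut or awk post-processing is needed.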
