Rename files produced by split - Linux

I split a huge file, and the output is several files whose names start with the character x.
I want to rename them so that they form a list sorted by name, like below:
part-1.gz
part-2.gz
part-3.gz ...
I tried the commands below:
for (( i = 1; i <= 3; i++ )) ;do for f in `ls -l | awk '{print $9}' | grep '^x'`; do mv $f part-$i.gz ;done ; done;
for f in `ls -l | awk '{print $9}' | grep '^x'`; do for i in 1 .. 3 ; do mv -- "$f" "${part-$i}.gz" ;done ; done;
for i in 1 .. 3 ;do for f in `ls -l | awk '{print $9}' | grep '^x'`; do mv -- "$f" "${part-$i}.gz" ;done ; done;
for f in `ls -l | awk '{print $9}' | grep '^x'`; do mv -- "$f" "${f%}.gz" ;done

Tip: don't do ls -l if you only need the file names. Even better, don't use ls at all, just use the shell's globbing ability. x* expands to all file names starting with x.
Here's a way to do it:
i=1; for f in x*; do mv $f $(printf 'part-%d.gz' $i); ((i++)); done
This initializes i to 1, and then loops over all file names starting with x in alphabetical order, assigning each file name in turn to the variable f. Inside the loop, it renames $f to $(printf 'part-%d.gz' $i), where the printf command replaces %d with the current value of i. You might want something like %02d if you need to prefix the number with zeros. Finally, still inside the loop, it increments i so that the next file receives the next number.
Note that none of this is safe if the input file names contain spaces, but yours don't.
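For instance, a zero-padded variant can be sketched in a throwaway directory (the xaa/xab/xac names mimic split's defaults; the directory itself is made up):

```shell
# run in a scratch directory so nothing real is touched
dir=$(mktemp -d)
cd "$dir"
touch xaa xab xac          # stand-ins for split's output

i=1
for f in x*; do
  mv "$f" "$(printf 'part-%02d.gz' "$i")"   # %02d zero-pads: part-01.gz, part-02.gz, ...
  ((i++))
done
ls
```

Zero-padding keeps the list sorted correctly even past part-09, which plain %d does not.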

Related

Filtering a list by 5 files per directory

So I have a list of files inside a tree of folders:
/home/user/Scripts/example/tmp/folder2/2
/home/user/Scripts/example/tmp/folder2/3
/home/user/Scripts/example/tmp/folder2/4
/home/user/Scripts/example/tmp/folder2/5
/home/user/Scripts/example/tmp/folder2/6
/home/user/Scripts/example/tmp/folder2/7
/home/user/Scripts/example/tmp/folder2/8
/home/user/Scripts/example/tmp/folder2/9
/home/user/Scripts/example/tmp/folder2/10
/home/user/Scripts/example/tmp/other_folder/files/1
/home/user/Scripts/example/tmp/other_folder/files/2
/home/user/Scripts/example/tmp/other_folder/files/3
/home/user/Scripts/example/tmp/other_folder/files/4
/home/user/Scripts/example/tmp/other_folder/files/5
/home/user/Scripts/example/tmp/other_folder/files/6
/home/user/Scripts/example/tmp/other_folder/files/7
/home/user/Scripts/example/tmp/other_folder/files/8
/home/user/Scripts/example/tmp/other_folder/files/9
/home/user/Scripts/example/tmp/other_folder/files/10
/home/user/Scripts/example/tmp/test/example/1
/home/user/Scripts/example/tmp/test/example/2
/home/user/Scripts/example/tmp/test/example/3
/home/user/Scripts/example/tmp/test/example/4
/home/user/Scripts/example/tmp/test/example/5
/home/user/Scripts/example/tmp/test/example/6
/home/user/Scripts/example/tmp/test/example/7
/home/user/Scripts/example/tmp/test/example/8
/home/user/Scripts/example/tmp/test/example/9
/home/user/Scripts/example/tmp/test/example/10
/home/user/Scripts/example/tmp/test/other/1
/home/user/Scripts/example/tmp/test/other/2
/home/user/Scripts/example/tmp/test/other/3
/home/user/Scripts/example/tmp/test/other/4
/home/user/Scripts/example/tmp/test/other/5
/home/user/Scripts/example/tmp/test/other/6
/home/user/Scripts/example/tmp/test/other/7
/home/user/Scripts/example/tmp/test/other/8
/home/user/Scripts/example/tmp/test/other/9
/home/user/Scripts/example/tmp/test/other/10
I want to filter this list so that I only keep the highest 5 numbers for each directory.
Any ideas?
Preferably in bash/shell.
Expected output (small sample size, because SO says there is too much code):
/home/user/Scripts/example/tmp/test/example/6
/home/user/Scripts/example/tmp/test/example/7
/home/user/Scripts/example/tmp/test/example/8
/home/user/Scripts/example/tmp/test/example/9
/home/user/Scripts/example/tmp/test/example/10
/home/user/Scripts/example/tmp/test/other/6
/home/user/Scripts/example/tmp/test/other/7
/home/user/Scripts/example/tmp/test/other/8
/home/user/Scripts/example/tmp/test/other/9
/home/user/Scripts/example/tmp/test/other/10
Thanks
edit - using for i in $(for i in $(dirname $(find $(pwd) -type f -name "*[0-9]*" | sort -V) | uniq) ;do ls $i | sort -V | tail -n 5 ; done) ; do readlink -f $i ; done works for a small sample size. However, expanding said sample appears to produce an argument list that is too long for dirname.
Assuming your input data is sorted.
Try:
awk -F'/[^/]*$' '{if (NR==1 || prev_dir == $1) {i=i+1} else {i=1}; if ( i<=5){ prev_dir=$1 ; print $0}; }'
Explanation:
'/[^/]*$' <-- Sets the field-separator regex so that the directory name becomes the first field
if (NR==1 || prev_dir == $1) {i=i+1} else {i=1}; <-- Checks whether the file is from the same directory; if yes, the counter is incremented by 1, otherwise it is reset.
if ( i<=5){ prev_dir=$1 ; print $0}; }' <-- Prints the first 5 records of the current directory.
Demo:
$awk -F'/[^/]*$' '{if (NR==1 || prev_dir == $1) {i=i+1} else {i=1}; if ( i<=5){ prev_dir=$1 ; print $0 }; }' temp.txt
/home/user/Scripts/example/tmp/folder2/2
/home/user/Scripts/example/tmp/folder2/3
/home/user/Scripts/example/tmp/folder2/4
/home/user/Scripts/example/tmp/folder2/5
/home/user/Scripts/example/tmp/folder2/6
/home/user/Scripts/example/tmp/other_folder/files/1
/home/user/Scripts/example/tmp/other_folder/files/2
/home/user/Scripts/example/tmp/other_folder/files/3
/home/user/Scripts/example/tmp/other_folder/files/4
/home/user/Scripts/example/tmp/other_folder/files/5
$cat temp.txt
/home/user/Scripts/example/tmp/folder2/2
/home/user/Scripts/example/tmp/folder2/3
/home/user/Scripts/example/tmp/folder2/4
/home/user/Scripts/example/tmp/folder2/5
/home/user/Scripts/example/tmp/folder2/6
/home/user/Scripts/example/tmp/folder2/7
/home/user/Scripts/example/tmp/folder2/8
/home/user/Scripts/example/tmp/folder2/9
/home/user/Scripts/example/tmp/folder2/10
/home/user/Scripts/example/tmp/other_folder/files/1
/home/user/Scripts/example/tmp/other_folder/files/2
/home/user/Scripts/example/tmp/other_folder/files/3
/home/user/Scripts/example/tmp/other_folder/files/4
/home/user/Scripts/example/tmp/other_folder/files/5
/home/user/Scripts/example/tmp/other_folder/files/6
/home/user/Scripts/example/tmp/other_folder/files/7
/home/user/Scripts/example/tmp/other_folder/files/8
/home/user/Scripts/example/tmp/other_folder/files/9
/home/user/Scripts/example/tmp/other_folder/files/10
$
Here is an implementation in plain bash:
#!/bin/bash
prevdir=
while read -r line; do
dir=${line%/*}
[[ $dir == "$prevdir" ]] || { n=0; prevdir=$dir; }
((n++ < 5)) && echo "$line"
done
You can use it like:
./script < file.list # If file.list already sorted by a reverse version sort
or,
sort -rV file.list | ./script # If the file.list is not sorted
or,
find /home/user/Scripts -type f | sort -rV | ./script
Also, you may want to append | tac to the pipelines above.
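As a sanity check, the loop can be fed an inline, reverse-sorted list (the paths are made up):

```shell
filtered=$(
  {
    printf '/tmp/a/%d\n' 10 9 8 7 6 5 4 3 2 1
    printf '/tmp/b/%d\n' 4 3 2 1
  } | {
    prevdir=
    while read -r line; do
      dir=${line%/*}
      [[ $dir == "$prevdir" ]] || { n=0; prevdir=$dir; }   # new directory: reset counter
      ((n++ < 5)) && echo "$line"                          # pass through at most 5 per dir
    done
  }
)
echo "$filtered"
```

The first directory is trimmed to its 5 highest entries, while /tmp/b has only 4 entries, so all of them pass through.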

wc with find - error if there is a space in a folder's name

I need to calculate a folder's size in bytes.
If the folder name contains a space, e.g. /folder/with spaces/, then the following command does not work properly:
wc -c `find /folder -type f` | grep total | awk '{print $1}'
with error
wc: /folder/with: No such file or directory
wc: spaces/file2: No such file or directory
How can it be done?
Try this line instead:
find /folder -type f | xargs -I{} wc -c "{}" | awk '{s += $1} END {print s}'
You need the names individually quoted.
$: while IFS= read -r n;   # assign whole row read to $n
do a+=("$n");              # add quoted "$n" to array
done < <( find /folder -type f )   # reads find as a stream
$: wc -c "${a[@]}" |       # pass wc the quoted names
sed -n '${ s/ .*//; p; }'  # ignore all but total, scrub and print
Compressed to a short couple of lines -
$: while IFS= read -r n; do a+=( "$n" ); done < <( find /folder -type f )
$: wc -c "${a[@]}" | sed -n '${ s/ .*//; p; }'
This is because bash (unlike zsh) word-splits the result of the command substitution. You could use an array to collect the file names:
files=()
for entry in *
do
[[ -f $entry ]] && files+=("$entry")
done
wc -c "${files[@]}" | grep .....
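If you don't want to collect names into an array, another space-safe route (my sketch, not from the answers above) is to NUL-delimit the names so nothing ever gets word-split; the scratch tree below stands in for the real /folder:

```shell
# scratch tree with a space in a directory name
root=$(mktemp -d)
mkdir "$root/with spaces"
printf '12345' > "$root/with spaces/file1"   # 5 bytes
printf '123'   > "$root/file2"               # 3 bytes

# NUL-delimited names survive spaces; cat streams every file into one wc
total=$(find "$root" -type f -print0 | xargs -0 cat | wc -c)
echo "$total"
```

Because the single wc sees one combined stream, there is no "total" line to grep for and no per-file output to sum.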

How to replace files' names with numbers starting at a certain number?

I want the files to be named like 177.jpg, 178.jpg and so on, starting with 177.jpg.
I used this to rename them from 1 up to the number of files:
ls | cat -n | while read n f; do mv "$f" "$n.jpg"; done
How do I modify this? A completely new script would also be great.
Bash can do simple math for you:
mv "$f" $(( n + 176 )).jpg
Just hope no filename contains a newline.
There are safer ways than parsing the output of ls, e.g. iterating over an expanded wildcard:
n=177
for f in * ; do
mv "$f" $(( n++ )).jpg
done
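As a quick check of that wildcard loop, run it in a scratch directory (the file names are made up):

```shell
# throwaway directory with three stand-in files
dir=$(mktemp -d)
cd "$dir"
touch a.jpg b.jpg c.jpg

n=177
for f in *; do               # glob expands once, before any rename happens
  mv "$f" $(( n++ )).jpg
done
ls
```

Because the glob is expanded before the loop body runs, the freshly renamed 177.jpg etc. are never picked up a second time.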
This should work.
#!/bin/bash
c=177;
for i in `ls | grep -v '^[0-9]' | grep '\.png$'`; # select only .png files whose names do not already start with a digit
do
mv "$i" "$c".png;
(( c=c+1 ));
done

How to get a result from a background process in a Linux shell script?

For example, let's say I want to count the number of lines of 10 BIG files and print a total.
for f in files
do
#this creates a background process for each file
wc -l $f | awk '{print $1}' &
done
I was trying something like:
for f in files
do
#this does not work :/
n=$( expr $(wc -l $f | awk '{print $1}') + $n ) &
done
echo $n
I finally found a working solution using anonymous pipes and bash:
#!/bin/bash
# this executes a separate shell and opens a new pipe, where the
# reading endpoint is fd 3 in our shell and the writing endpoint
# stdout of the other process. Note that you don't need the
# background operator (&) as exec starts a completely independent process.
exec 3< <(./a.sh 2>&1)
# ... do other stuff
# write the contents of the pipe to a variable. If the other process
# hasn't already terminated, cat will block.
output=$(cat <&3)
You should probably use gnu parallel:
find . -maxdepth 1 -type f | parallel --gnu 'wc -l' | awk 'BEGIN {n=0} {n += $1} END {print n}'
or else xargs in parallel mode:
find . -maxdepth 1 -type f | xargs -n1 -P4 wc -l | awk 'BEGIN {n=0} {n += $1} END {print n}'
Another option, if this doesn't fit your needs, is to write to temp files. If you don't want to write to disk, just write to /dev/shm. This is a ramdisk on most Linux systems.
#!/bin/bash
declare -a temp_files
count=0
for f in *
do
if [[ -f "$f" ]]; then
temp_files[$count]="$(mktemp /dev/shm/${f}-XXXXXX)"
((count++))
fi
done
count=0
for f in *
do
if [[ -f "$f" ]]; then
cat "$f" | wc -l > "${temp_files[$count]}" &
((count++))
fi
done
wait
cat "${temp_files[@]}" | awk 'BEGIN {n=0} {n += $1} END {print n}'
for tf in "${temp_files[@]}"
do
rm "$tf"
done
By the way, this can be thought of as a map-reduce, with wc doing the mapping and awk doing the reduction.
You could write that to a file or, better, listen on a fifo so you get the data as soon as it arrives.
Here is a small example on how they work:
# create the fifo
mkfifo test
# listen to it
while true; do if read -r line < test; then echo "$line"; fi; done
# in another shell
echo 'hi there' > test
# notice 'hi there' being printed in the first shell
So you could
for f in files
do
#this creates a background process for each file
wc -l "$f" | awk '{print $1}' > fifo &
done
and listen on the fifo for sizes.
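Collecting the sizes from the fifo takes a little care, since a plain reader can see EOF between two writers. One way around that (this read-write fd trick is my addition, not part of the answer above) is to keep the fifo open on a file descriptor in the parent shell; the file names below are made up:

```shell
# scratch files with known line counts
workdir=$(mktemp -d)
printf 'a\nb\nc\n' > "$workdir/f1"   # 3 lines
printf 'x\ny\n'    > "$workdir/f2"   # 2 lines

mkfifo "$workdir/sizes.fifo"
exec 3<> "$workdir/sizes.fifo"       # read-write: readers never see a premature EOF

for f in "$workdir/f1" "$workdir/f2"; do
  wc -l < "$f" >&3 &                 # each job writes one short (atomic) line
done

total=0
for _ in 1 2; do                     # one read per background job
  read -r n <&3
  total=$(( total + n ))
done
wait
exec 3>&-
echo "$total"
```

The writes are shorter than PIPE_BUF, so they cannot interleave, and the reader just consumes one line per job in whatever order they finish.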

File name printed twice when using wc

For printing the number of lines in all ".txt" files of the current folder, I am using the following script:
for f in *.txt;
do l="$(wc -l "$f")";
echo "$f" has "$l" lines;
done
But in output I am getting:
lol.txt has 2 lol.txt lines
Why is lol.txt printed twice (especially after the 2)? I guess some sort of stream flush is required, but I don't know how to achieve that in this case. So what changes should I make in the script to get the output as:
lol.txt has 2 lines
You can remove the filename with 'cut':
for f in *.txt;
do l="$(wc -l "$f" | cut -f1 -d' ')";
echo "$f" has "$l" lines;
done
The filename is printed twice because wc -l "$f" also prints the filename after the number of lines. Try changing it to cat "$f" | wc -l.
wc prints the filename, so you could just write the script as:
ls *.txt | while read f; do wc -l "$f"; done
or, if you really want the verbose output, try
ls *.txt | while read f; do wc -l "$f" | awk '{print $2, "has", $1, "lines"}'; done
There is a trick here. Get wc to read stdin and it won't print a file name:
for f in *.txt; do
l=$(wc -l < "$f")
echo "$f" has "$l" lines
done
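The difference is easy to see side by side (the scratch file is made up):

```shell
f=$(mktemp)
printf 'one\ntwo\n' > "$f"

with_name=$(wc -l "$f")     # count followed by the file name
without=$(wc -l < "$f")     # just the count: stdin has no name to print
echo "$without"
```

When wc reads from stdin it has no file name to report, so only the number comes back, and no cut or awk post-processing is needed.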
