How to get result from background process linux shell script? - linux

For example let's say I want to count the number of lines of 10 BIG files and print a total.
for f in files
do
#this creates a background process for each file
wc -l $f | awk '{print $1}' &
done
I was trying something like:
for f in files
do
#this does not work :/
n=$( expr $(wc -l $f | awk '{print $1}') + $n ) &
done
echo $n
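A likely reason the second attempt fails: the trailing & makes the shell run the whole assignment in a background subshell, so the new value of n never reaches the parent shell. A minimal sketch of that behavior, with made-up values for illustration:

```shell
#!/bin/sh
n=0
n=5 &        # the assignment runs in a background subshell
wait         # wait for the background job to finish
echo "$n"    # the parent's copy of n was never touched
```

This is why the answers below pass the results back through a pipe, a file, or a FIFO instead of a variable.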

I finally found a working solution using anonymous pipes and bash:
#!/bin/bash
# Process substitution, <( ... ), runs ./a.sh asynchronously and connects
# its stdout (and, via 2>&1, its stderr) to a new pipe. exec with only a
# redirection opens the reading end of that pipe as fd 3 in our shell.
# Note that you don't need the background operator (&): the process
# substitution already runs concurrently with the rest of the script.
exec 3< <(./a.sh 2>&1)
# ... do other stuff
# write the contents of the pipe to a variable. If the other process
# hasn't already terminated, cat will block.
output=$(cat <&3)
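Applied to the line-counting question, the same idea might look like the sketch below. The function name count_lines_parallel is hypothetical, and the {fd} syntax assumes bash 4.1 or later; it is one possible arrangement, not the only one.

```shell
#!/bin/bash
# count_lines_parallel is a hypothetical helper; it assumes bash 4.1 or later
# for the {fd} syntax that lets bash pick a free file descriptor.
count_lines_parallel() {
    local fds=() fd f n total=0
    for f in "$@"; do
        # Process substitution starts wc in the background right away;
        # exec stores the reading end of its pipe in a new descriptor.
        exec {fd}< <(wc -l < "$f")
        fds+=("$fd")
    done
    for fd in "${fds[@]}"; do
        read -r n <&"$fd"     # blocks until this wc has written its count
        exec {fd}<&-          # close the descriptor whose number is in $fd
        total=$((total + ${n:-0}))
    done
    echo "$total"
}
```

It could be called as count_lines_parallel file1 file2 ...; all wc processes are started before the first result is read, so they run concurrently.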

You should probably use gnu parallel:
find . -maxdepth 1 -type f | parallel --gnu 'wc -l' | awk 'BEGIN {n=0} {n += $1} END {print n}'
or else xargs in parallel mode:
find . -maxdepth 1 -type f | xargs -n1 -P4 wc -l | awk 'BEGIN {n=0} {n += $1} END {print n}'
Another option, if this doesn't fit your needs, is to write to temp files. If you don't want to write to disk, just write to /dev/shm. This is a ramdisk on most Linux systems.
#!/bin/bash
declare -a temp_files
count=0
for f in *
do
if [[ -f "$f" ]]; then
temp_files[$count]="$(mktemp "/dev/shm/${f}-XXXXXX")"
((count++))
fi
done
count=0
for f in *
do
if [[ -f "$f" ]]; then
cat "$f" | wc -l > "${temp_files[$count]}" &
((count++))
fi
done
wait
cat "${temp_files[@]}" | awk 'BEGIN {n=0} {n += $1} END {print n}'
for tf in "${temp_files[@]}"
do
rm "$tf"
done
By the way, this can be thought of as a map-reduce, with wc doing the mapping and awk doing the reduction.
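A more compact take on the temp-file approach is sketched below. It assumes mktemp -d is available and uses a hypothetical helper name, count_lines_tmp; the results land in one private directory, which keeps cleanup to a single rm. Setting TMPDIR=/dev/shm before calling it should place the directory on the ramdisk with GNU mktemp.

```shell
#!/bin/sh
# count_lines_tmp is a hypothetical helper name, not part of the answer above.
count_lines_tmp() {
    [ "$#" -gt 0 ] || { echo 0; return 0; }
    tmpdir=$(mktemp -d) || return 1
    i=0
    for f in "$@"; do
        i=$((i + 1))
        # map: one background wc per file, each writing its own result file
        wc -l < "$f" > "$tmpdir/$i" &
    done
    wait    # block until every background wc has finished
    # reduce: add up the per-file counts
    cat "$tmpdir"/* | awk '{n += $1} END {print n + 0}'
    rm -rf "$tmpdir"
}
```

Usage would be along the lines of count_lines_tmp *.log, which prints the combined line count.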

You could write the results to a file or, better, read them from a FIFO as soon as data arrives.
Here is a small example of how FIFOs work:
# create the fifo
mkfifo test
# listen to it
while true; do if read line <test; then echo $line; fi done
# in another shell
echo 'hi there' > test
# notice 'hi there' being printed in the first shell
So you could
for f in files
do
#this creates a background process for each file
wc -l $f | awk '{print $1}' > fifo &
done
and listen on the fifo for sizes.
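The listening side could be sketched as follows. The helper name sum_lines_fifo is hypothetical, and the sketch assumes every file is readable and that opening a FIFO read-write does not block, which holds on Linux but is left undefined by POSIX.

```shell
#!/bin/sh
# sum_lines_fifo is a hypothetical helper; the FIFO lives in a private temp dir.
sum_lines_fifo() {
    dir=$(mktemp -d) || return 1
    mkfifo "$dir/fifo" || return 1
    # Open the FIFO read-write on fd 3 so that neither side blocks on open
    # and the reader never sees a premature end-of-file. (Assumption: this
    # works on Linux; POSIX leaves read-write opens of a FIFO undefined.)
    exec 3<> "$dir/fifo"
    for f in "$@"; do
        wc -l < "$f" >&3 &    # each background job writes one short line
    done
    total=0
    i=0
    while [ "$i" -lt "$#" ]; do
        read -r n <&3         # blocks until the next result arrives
        total=$((total + n))
        i=$((i + 1))
    done
    exec 3<&-                 # close our end of the FIFO
    wait
    rm -rf "$dir"
    echo "$total"
}
```

Reading exactly one line per file avoids having to detect end-of-file, which is awkward when several writers open and close the same FIFO at different times.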

Related

wc with find. error if space in folders name

I need to calculate a folder's size in bytes.
If a folder name contains a space (for example /folder/with spaces/), the following command does not work properly:
wc -c `find /folder -type f` | grep total | awk '{print $1}'
It fails with the error:
wc: /folder/with: No such file or directory
wc: spaces/file2: No such file or directory
How can this be done?
Try this line instead:
find /folder -type f | xargs -I{} wc -c "{}" | awk '{n += $1} END {print n}'
You need the names individually quoted.
$: while read n; # assign whole row read to $n
do a+=("$n"); # add quoted "$n" to array
done < <( find /folder -type f ) # reads find as a stream
$: wc -c "${a[@]}" | # pass wc the quoted names
sed -n '${ s/ .*//; p; }' # ignore all but total, scrub and print
Compressed to a couple of short lines:
$: while read n; do a+=( "$n"); done < <( find /folder -type f )
$: wc -c "${a[@]}" | sed -n '${ s/ .*//; p; }'
This is because bash (unlike zsh) word-splits the result of the command substitution. You could use an array to collect the file names:
files=()
for entry in *
do
[[ -f $entry ]] && files+=("$entry")
done
wc -c "${files[@]}" | grep .....
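Another way to sidestep word splitting entirely is to let find hand the names to a command itself. The sketch below uses a hypothetical helper name, dir_bytes, and totals the bytes by concatenating every regular file into wc -c.

```shell
#!/bin/sh
# dir_bytes is a hypothetical helper name.
dir_bytes() {
    # find passes each file name to cat as a separate argument, so spaces
    # (and even newlines) in names are never word-split by the shell.
    # tr strips the padding that some wc implementations print.
    find "$1" -type f -exec cat {} + | wc -c | tr -d ' '
}
```

For example, dir_bytes "/folder/with spaces" prints a single number. The trade-off is that every file is read in full, which may be slow on large trees.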

rename files which produced by split

I split a huge file, and the output is several files whose names start with the character x.
I want to rename them so that they form a list sorted by name, like below:
part-1.gz
part-2.gz
part-3.gz ...
I tried the commands below:
for (( i = 1; i <= 3; i++ )) ;do for f in `ls -l | awk '{print $9}' | grep '^x'`; do mv $f part-$i.gz ;done ; done;
for f in `ls -l | awk '{print $9}' | grep '^x'`; do for i in 1 .. 3 ; do mv -- "$f" "${part-$i}.gz" ;done ; done;
for i in 1 .. 3 ;do for f in `ls -l | awk '{print $9}' | grep '^x'`; do mv -- "$f" "${part-$i}.gz" ;done ; done;
for f in `ls -l | awk '{print $9}' | grep '^x'`; do mv -- "$f" "${f%}.gz" ;done
Tip: don't do ls -l if you only need the file names. Even better, don't use ls at all, just use the shell's globbing ability. x* expands to all file names starting with x.
Here's a way to do it:
i=1; for f in x*; do mv $f $(printf 'part-%d.gz' $i); ((i++)); done
This initializes i to 1, and then loops over all file names starting with x in alphabetical order, assigning each file name in turn to the variable f. Inside the loop, it renames $f to $(printf 'part-%d.gz' $i), where the printf command replaces %d with the current value of i. You might want something like %02d if you need to prefix the number with zeros. Finally, still inside the loop, it increments i so that the next file receives the next number.
Note that none of this is safe if the input file names contain spaces, but yours don't.
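If the names might contain spaces after all, a quoted variant is sketched below. The helper name rename_parts and its optional directory argument are assumptions for illustration, and %02d is used so that the new names also sort correctly past nine parts.

```shell
#!/bin/sh
# rename_parts is a hypothetical helper; it renames DIR/x* to DIR/part-NN.gz.
rename_parts() {
    dir=${1:-.}
    i=1
    for f in "$dir"/x*; do
        [ -e "$f" ] || continue    # the glob matched nothing
        mv -- "$f" "$dir/$(printf 'part-%02d.gz' "$i")"
        i=$((i + 1))
    done
}
```

Calling rename_parts with no argument works on the current directory.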

bash count sequential files

I'm pretty new to bash scripting so some of the syntaxes may not be optimal. Please do point them out if you see one.
I have files in a directory named sequentially.
Example: prob01_01 prob01_03 prob01_07 prob02_01 prob02_03 ....
I am trying to have the script iterate through the current directory and count how many extensions each problem has, then print the pre-extension name followed by the count.
Sample output for above would be:
prob01 3
prob02 2
This is my code:
#!/bin/bash
temp=$(mktemp)
element=''
count=0
for i in *
do
current=${i%_*}
if [[ $current == $element ]]
then
let "count+=1"
else
echo $element $count >> temp
element=$current
count=1
fi
done
echo 'heres the temp:'
cat temp
rm 'temp'
The Problem:
Current output:
prob1 3
Desired output:
prob1 3
prob2 2
The last count isn't appended because it's not seeing a different element after it
My Guess on possible solutions:
Have the last append occur at the end of the for loop?
Your code has 2 problems.
The first problem is unrelated to your question: you create a temporary file whose name is stored in $temp, but then you write to a file with the fixed name temp. You should use "$temp" instead.
The second problem is that you only write a result when you see a new problem name, so the last one is never printed.
Fixing only these two problems results in:
results() {
if (( count == 0 )); then
return
fi
echo $element $count >> "${temp}"
}
temp=$(mktemp)
element=''
count=0
for i in prob*
do
current=${i%_*}
if [[ $current == $element ]]
then
let "count+=1" # Better is using ((count++))
else
results
element=$current
count=1
fi
done
results
echo 'heres the temp:'
cat "${temp}"
rm "${temp}"
You can do without the script with
ls prob* | cut -d"_" -f1 | sort | uniq -c
If you want the output displayed as given, you need one more step:
ls prob* | cut -d"_" -f1 | sort | uniq -c | awk '{print $2 " " $1}'
You may use printf + awk solution:
printf '%s\n' *_* | awk -F_ '{a[$1]++} END{for (i in a) print i, a[i]}'
prob01 3
prob02 2
We use printf to print each file that has at least one _
We use awk to get a count of each file's first element delimited by _ by using an associative array.
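One caveat is that awk visits the keys of an associative array in an unspecified order, so the lines may not come out sorted. A sketch that adds a final sort, wrapped in a hypothetical helper named count_prefixes that takes an optional directory:

```shell
#!/bin/sh
# count_prefixes is a hypothetical helper; DIR defaults to the current directory.
count_prefixes() {
    (
        cd "${1:-.}" || exit 1
        # One name per line, split on "_", tally the first field, then sort
        # because awk's "for (k in a)" visits keys in an unspecified order.
        printf '%s\n' *_* | awk -F_ '{a[$1]++} END {for (k in a) print k, a[k]}' | sort
    )
}
```

Note that if no file name contains an underscore, the unexpanded pattern itself is counted once, so this sketch assumes at least one match.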
I would do it like this:
$ ls | awk -F_ '{print $1}' | sort | uniq -c | awk '{print $2 " " $1}'
prob01 3
prob02 2

Shell script to recursively print full directory tree using ls

Assignment: I have to create a shell script using diff and sort, and a pipeline using ls -l, grep '^d', and awk '{print $9}' to print a full directory tree.
I wrote a C program to display what I am looking for. Here is the output:
ryan@chrx:~/Documents/OS-Projects/Project5_DirectoryTree$ ./a.out
TestRoot/
[Folder1]
[FolderC]
[FolderB]
[FolderA]
[Folder2]
[FolderD]
[FolderF]
[FolderE]
[Folder3]
[FolderI]
[FolderG]
[FolderH]
I wrote this so far:
ls -R -l $1 | grep '^d' | awk '{print $9}'
to print the directory tree but now I need a way to sort it by folder depth and possibly indent but not required. Any suggestions? I can't use find or tree commands.
EDIT: The original assignment and its restrictions were mistaken and were changed at a later date. The current answers are good solutions if you disregard the restrictions, so please leave them for people with similar issues. As for the new assignment, in case anybody was wondering: I was to recursively print all subdirectories, sort them, then compare them with my program's output to make sure the results are similar. Here was my solution:
#!/bin/bash
echo Program:
./a.out $1 | sort
echo Shell Script:
ls -R -l $1 | grep '^d' | awk '{print $9}' | sort
diff <(./a.out $1 | sort) <(ls -R -l $1 | grep '^d' | awk '{print $9}' | sort)
DIFF=$?
if [[ $DIFF -eq 0 ]]
then
echo "The outputs are similar!"
fi
You need neither ls, grep, nor awk to get the tree. A simple recursive bash function is enough, for example:
#!/bin/bash
walk() {
local indent="${2:-0}"
printf "%*s%s\n" $indent '' "$1"
for entry in "$1"/*; do
[[ -d "$entry" ]] && walk "$entry" $((indent+4))
done
}
walk "$1"
If you run it as bash script.sh /etc it will print the dir-tree like:
/etc
    /etc/apache2
        /etc/apache2/extra
        /etc/apache2/original
            /etc/apache2/original/extra
        /etc/apache2/other
        /etc/apache2/users
    /etc/asl
    /etc/cups
        /etc/cups/certs
        /etc/cups/interfaces
        /etc/cups/ppd
    /etc/defaults
    /etc/emond.d
        /etc/emond.d/rules
    /etc/mach_init.d
    /etc/mach_init_per_login_session.d
    /etc/mach_init_per_user.d
    /etc/manpaths.d
    /etc/newsyslog.d
    /etc/openldap
        /etc/openldap/schema
    /etc/pam.d
    /etc/paths.d
    /etc/periodic
        /etc/periodic/daily
        /etc/periodic/monthly
        /etc/periodic/weekly
    /etc/pf.anchors
    /etc/postfix
        /etc/postfix/postfix-files.d
    /etc/ppp
    /etc/racoon
    /etc/security
    /etc/snmp
    /etc/ssh
    /etc/ssl
        /etc/ssl/certs
    /etc/sudoers.d
Borrowing from @jm666's idea of running it on /etc:
$ find /etc -type d -print | awk -F'/' '{printf "%*s[%s]\n", 4*(NF-2), "", $0}'
[/etc]
    [/etc/alternatives]
    [/etc/bash_completion.d]
    [/etc/defaults]
        [/etc/defaults/etc]
            [/etc/defaults/etc/pki]
                [/etc/defaults/etc/pki/ca-trust]
                [/etc/defaults/etc/pki/nssdb]
            [/etc/defaults/etc/profile.d]
            [/etc/defaults/etc/skel]
    [/etc/fonts]
        [/etc/fonts/conf.d]
    [/etc/fstab.d]
    [/etc/ImageMagick]
    [/etc/ImageMagick-6]
    [/etc/pango]
    [/etc/pkcs11]
    [/etc/pki]
        [/etc/pki/ca-trust]
            [/etc/pki/ca-trust/extracted]
                [/etc/pki/ca-trust/extracted/java]
                [/etc/pki/ca-trust/extracted/openssl]
                [/etc/pki/ca-trust/extracted/pem]
            [/etc/pki/ca-trust/source]
                [/etc/pki/ca-trust/source/anchors]
                [/etc/pki/ca-trust/source/blacklist]
        [/etc/pki/nssdb]
        [/etc/pki/tls]
    [/etc/postinstall]
    [/etc/preremove]
    [/etc/profile.d]
    [/etc/sasl2]
    [/etc/setup]
    [/etc/skel]
    [/etc/ssl]
    [/etc/texmf]
        [/etc/texmf/tlmgr]
        [/etc/texmf/web2c]
    [/etc/xml]
Sorry, I couldn't find a sensible way to use the other tools you mentioned so it may not help you but maybe it'll help others with the same question but without the requirement to use specific tools.

File name printed twice when using wc

For printing number of lines in all ".txt" files of current folder, I am using following script:
for f in *.txt;
do l="$(wc -l "$f")";
echo "$f" has "$l" lines;
done
But in output I am getting:
lol.txt has 2 lol.txt lines
Why is lol.txt printed twice (especially after 2)? I guess some sort of stream flush is required, but I don't know how to achieve that in this case. What changes should I make in the script to get the following output?
lol.txt has 2 lines
You can remove the filename with 'cut':
for f in *.txt;
do l="$(wc -l "$f" | cut -f1 -d' ')";
echo "$f" has "$l" lines;
done
The filename is printed twice, because wc -l "$f" also prints the filename after the number of lines. Try changing it to cat "$f" | wc -l.
wc prints the filename, so you could just write the script as:
ls *.txt | while read f; do wc -l "$f"; done
or, if you really want the verbose output, try
ls *.txt | while read f; do wc -l "$f" | awk '{print $2, "has", $1, "lines"}'; done
There is a trick here. Get wc to read stdin and it won't print a file name:
for f in *.txt; do
l=$(wc -l < "$f")
echo "$f" has "$l" lines
done
