bash: set variable inside loop when piping find and grep [duplicate] - linux

i want to compute all *bin files inside a given directory. Initially I was working with a for-loop:
var=0
for i in *ls *bin
do
perform computations on $i ....
var+=1
done
echo $var
However, in some directories there are too many files resulting in an error: Argument list too long
Therefore, I was trying it with a piped while-loop:
var=0
ls *.bin | while read i;
do
perform computations on $i
var+=1
done
echo $var
The problem now is by using the pipe subshells are created. Thus, echo $var returns 0.
How can I deal with this problem?
The original Code:
#!/bin/bash
function entropyImpl {
if [[ -n "$1" ]]
then
if [[ -e "$1" ]]
then
echo "scale = 4; $(gzip -c ${1} | wc -c) / $(cat ${1} | wc -c)" | bc
else
echo "file ($1) not found"
fi
else
datafile="$(mktemp entropy.XXXXX)"
cat - > "$datafile"
entropy "$datafile"
rm "$datafile"
fi
return 1
}
declare acc_entropy=0
declare count=0
ls *.bin | while read i ;
do
echo "Computing $i" | tee -a entropy.txt
curr_entropy=`entropyImpl $i`
curr_entropy=`echo $curr_entropy | bc`
echo -e "\tEntropy: $curr_entropy" | tee -a entropy.txt
acc_entropy=`echo $acc_entropy + $curr_entropy | bc`
let count+=1
done
echo "Out of function: $count | $acc_entropy"
acc_entropy=`echo "scale=4; $acc_entropy / $count" | bc`
echo -e "===================================================\n" | tee -a entropy.txt
echo -e "Accumulated Entropy:\t$acc_entropy ($count files processed)\n" | tee -a entropy.txt

The problem is that the while loop is part of a pipeline. In a bash pipeline, every element of the pipeline is executed in its own subshell [ref]. So after the while loop terminates, the while loop subshell's copy of var is discarded, and the original var of the parent (whose value is unchanged) is echoed.
One way to fix this is by using Process Substitution as shown below:
var=0
while read i;
do
# perform computations on $i
((var++))
done < <(find . -type f -name "*.bin" -maxdepth 1)
Take a look at BashFAQ/024 for other workarounds.
Notice that I have also replaced ls with find because it is not good practice to parse ls.

A POSIX compliant solution would be to use a pipe (p file). This solution is very nice, portable, and POSIX, but writes something on the hard disk.
mkfifo mypipe
find . -type f -name "*.bin" -maxdepth 1 > mypipe &
while read line
do
# action
done < mypipe
rm mypipe
Your pipe is a file on your hard disk. If you want to avoid having useless files, do not forget to remove it.

So researching the generic issue, passing variables from a sub-shelled while loop to the parent. One solution I found, missing here, was to use a here-string. As that was bash-ish, and I preferred a POSIX solution, I found that a here-string is really just a shortcut for a here-document. With that knowledge at hand, I came up with the following, avoiding the subshell; thus allowing variables to be set in the loop.
#!/bin/sh
set -eu
passwd="username,password,uid,gid
root,admin,0,0
john,appleseed,1,1
jane,doe,2,2"
main()
{
while IFS="," read -r _user _pass _uid _gid; do
if [ "${_user}" = "${1:-}" ]; then
password="${_pass}"
fi
done <<-EOT
${passwd}
EOT
if [ -z "${password:-}" ]; then
echo "No password found."
exit 1
fi
echo "The password is '${password}'."
}
main "${#}"
exit 0
One important note to all copy pasters, is that the here-document is setup using the hyphen, indicating that tabs are to be ignored. This is needed to keep the layout somewhat nice. It is important to note, because stackoverflow doesn't render tabs in 'code' and replaces them with spaces. Grmbl. SO, don't mangle my code, just cause you guys favor spaces over tabs, it's irrelevant in this case!
This probably breaks on different editor(settings) and what not. So the alternative would be to have it as:
done <<-EOT
${passwd}
EOT

This could be done with a for loop, too:
var=0;
for file in `find . -type f -name "*.bin" -maxdepth 1`; do
# perform computations on "$i"
((var++))
done
echo $var

Related

Bash Loop with counter gives a count of 1 when no item found. Why?

In the function below my counter works fine as long as an item is found in $DT_FILES. If the find is empty for that folder the counter gives me a count of 1 instead of 0. I am not sure what I am missing.
What the script does here is 1) makes a variable containing all the parent folders. 2) Loop through each folder, cd inside each one and makes a list of all files that contain the string "-DT-". 3) If it finds a file that doesn't not end with ".tif", it then copy the DT files and put a .tif extension to it. Very simple.
I count the number of times the loop did create a new file with the ".tif" extension.
So I am not sure why I am getting a count of 1 at times.
function create_tifs()
{
IFS=$'\n'
# create list of main folders
LIST=$( find . -maxdepth 1 -mindepth 1 -type d )
for f in $LIST
do
echo -e "\n${OG}>>> Folder processed: ${f} ${NONE}"
cd ${f}
DT_FILES=$(find . -type f -name '*-DT-*' | grep -v '.jpg')
if (( ${#DT_FILES} ))
then
count=0
for b in ${DT_FILES}
do
if [[ "${b}" != *".tif" ]]
then
# cp -n "${b}" "${b}.tif"
echo -e "TIF created ${b} as ${b}.tif"
echo
((count++))
else
echo -e "TIF already done ${b}"
fi
done
fi
echo -e "\nCount = ${count}"
}
I can't repro your problem, but your code contains several dubious constructs. Here is a refactoring might coincidentally also remove whatever problem you were experiencing.
#!/bin/bash
# Don't use non-portable function definition syntax
create_tifs() {
# Don't pollute global namespace; don't attempt to parse find output
# See also https://mywiki.wooledge.org/BashFAQ/020
local f
for f in ./*/; do
# prefer printf over echo -e
# print diagnostic messages to standard error >&2
# XXX What are these undeclared global variables?
printf "\n%s>>> Folder processed: %s %s" "$OG" "$f" "$NONE" >&2
# Again, avoid parsing find output
find "$f" -name '*-DT-*' -not -name '*.jpg' -exec sh -c '
for b; do
if [[ "${b}" != *".tif" ]]
then
# cp -n "${b}" "${b}.tif"
printf "TIF created %s as %s.tif\n" "$b" "$b" >&2
# print one line for wc
printf ".\n"
else
# XXX No newline, really??
printf "TIF already done %s" "$b" >&2
fi
done
fi' _ {} +
# Missing done!
done |
# Count lines produced by printf inside tif creation
wc -l |
sed 's/.*/Count = &/'
}
This could be further simplified by using find ./*/ instead of looping over f but then you don't (easily) get to emit a diagnostic message for each folder separately. Similarly, you could add -not -name '*.tif' but then you don't get to print "tif already done" for those.
Tangentially perhaps see also Correct Bash and shell script variable capitalization; use lower case for your private variables.
Printing a newline before your actual message (like in the first printf) is a weird antipattern, especially when you don't do that consequently. The usual arrangement would be to put a newline at the end of each emitted message.
If you've got Bash 4.0 or later you can use globstar instead of (the error-prone) find. Try this Shellcheck-clean code:
#! /bin/bash -p
shopt -s dotglob extglob nullglob globstar
function create_tifs
{
local dir dtfile
local -i count
for dir in */; do
printf '\nFolder processed: %s\n' "$dir" >&2
count=0
for dtfile in "$dir"**/*-DT-!(*.jpg); do
if [[ $dtfile == *.tif ]]; then
printf 'TIF already done %s\n' "$dtfile" >&2
else
cp -v -n -- "$dtfile" "$dtfile".tif
count+=1
fi
done
printf 'Count = %d\n' "$count" >&2
done
return 0
}
shopt -s ... enables some Bash settings that are required by the code:
dotglob enables globs to match files and directories that begin with .. find shows such files by default.
extglob enables "extended globbing" (including patterns like !(*.jpg)). See the extglob section in glob - Greg's Wiki.
nullglob makes globs expand to nothing when nothing matches (otherwise they expand to the glob pattern itself, which is almost never useful in programs).
globstar enables the use of ** to match paths recursively through directory trees.
Note that globstar is potentially dangerous in versions of Bash prior to 4.3 because it follows symlinks, possibly leading to processing the same file or directory multiple times, or getting stuck in a cycle.
The -v option with cp causes it to print details of what it does. You might prefer to drop the option and print a different format of message instead.
See the accepted, and excellent, answer to Why is printf better than echo? for an explanation of why I used printf instead of echo.
I didn't use cd because it often leads to problems in programs.

Bash concurrent jobs gets stuck

I've implemented a way to have concurrent jobs in bash, as seen here.
I'm looping through a file with around 13000 lines. I'm just testing and printing each line, as such:
#!/bin/bash
max_bg_procs(){
if [[ $# -eq 0 ]] ; then
echo "Usage: max_bg_procs NUM_PROCS. Will wait until the number of background (&)"
echo " bash processes (as determined by 'jobs -pr') falls below NUM_PROCS"
return
fi
local max_number=$((0 + ${1:-0}))
while true; do
local current_number=$(jobs -pr | wc -l)
if [[ $current_number -lt $max_number ]]; then
echo "success in if"
break
fi
echo "has to wait"
sleep 4
done
}
download_data(){
echo "link #" $2 "["$1"]"
}
mapfile -t myArray < $1
i=1
for url in "${myArray[#]}"
do
max_bg_procs 6
download_data $url $i &
((i++))
done
echo "finito!"
I've also tried other solutions such as this and this, but my issue is persistent:
At a "random" given step, usually between the 2000th and the 5000th iteration, it simply gets stuck. I've put those various echo in the middle of the code to see where it would get stuck but it the last thing it prints is the $url $i.
I've done the simple test to remove any parallelism and just loop the file contents: all went fine and it looped till the end.
So it makes me think I'm missing some limitation on the parallelism, and I wonder if anyone could help me out figuring it out.
Many thanks!
Here, we have up to 6 parallel bash processes calling download_data, each of which is passed up to 16 URLs per invocation. Adjust per your own tuning.
Note that this expects both bash (for exported function support) and GNU xargs.
#!/usr/bin/env bash
# ^^^^- not /bin/sh
download_data() {
echo "link #$2 [$1]" # TODO: replace this with a job that actually takes some time
}
export -f download_data
<input.txt xargs -d $'\n' -P 6 -n 16 -- bash -c 'for arg; do download_data "$arg"; done' _
Using GNU Parallel it looks like this
cat input.txt | parallel echo link '\#{#} [{}]'
{#} = the job number
{} = the argument
It will spawn one process per CPU. If you instead want 6 in parallel use -j:
cat input.txt | parallel -j6 echo link '\#{#} [{}]'
If you prefer running a function:
download_data(){
echo "link #" $2 "["$1"]"
}
export -f download_data
cat input.txt | parallel -j6 download_data {} {#}

Linux find with symlink recursion and some extras

I'm currently monitoring a number of dirs for log files; specifically those just created. It's been a long time since my Linux and after some trial and error I've hacked together what I need but it takes a full 20secs or more to return. I'm hoping I can have an expert look at it and advise me on something a little more streamlined.
find . -type f -follow -print | xargs ls -ltr 2>/dev/null | grep '2\?10' | tail
So for example find the last 10 files matching the name. Optimally I'd like to turn this into a bash script that accepts one argument and replaces the grep expression but I figure one thing at a time.
Thanks for your help in advance!
I bit the bullet and wrote the script; I'll tinker with it more later.
#!/bin/bash
if [ $# != 2 ]; then
echo findLog Usage: findLog [3 digit cluster] [pick 1: main message service detail soap]
exit 0
fi
if [ "$2" == "service" ]; then
file="$2-time-"
elif [ "$2" == "detail" ]; then
file="$2-time-"
else file="$2-"
fi
cluster="$1"
#store logpaths for readability
a="/pathto/A"
b="/pathto/B"
c="/pathto/C"
d="/pathto/D"
e="/pathto/E"
f="/pathto/F"
g="/pathto/G"
h="/pathto/H"
logpaths=( $a $b $c $d $e $f $g $h )
for i in "${logpaths[#]}"
do
ls -ltr "$i"/*.log | grep "$file"${cluster:0:1}${i: -1}${cluster: -2}
done

Running diff and have it stop on a difference

I have a script running that is checking multiples directories and comparing them to expanded tarballs of the same directories elsewhere.
I am using diff -r -q and what I would like is that when diff finds any difference in the recursive run it will stop running instead of going through more directories in the same run.
All help appreciated!
Thank you
#bazzargh I did try it like you suggested or like this.
for file in $(find $dir1 -type f);
do if [[ $(diff -q $file ${file/#$dir1/$dir2}) ]];
then echo differs: $file > /tmp/$runid.tmp 2>&1; break;
else echo same: $file > /dev/null; fi; done
But this only works with files that exist in both directories. If one file is missing I won't get information about that. Also the directories I am working with have over 300.000 files so it seems to be a bit of overhead to do a find for each file and then diff.
I would like something like this to work, with and elif statement that checks if $runid.tmp contains data and breaks if it does. I added 2> after the first if statement so stderr is sent to the $runid.tmp file.
for file in $(find $dir1 -type f);
do if [[ $(diff -q $file ${file/#$dir1/$dir2}) ]] 2> /tmp/$runid.tmp;
then echo differs: $file > /tmp/$runid.tmp 2>&1; break;
elif [[ -s /tmp/$runid.tmp ]];
then echo differs: $file >> /tmp/$runid.tmp 2>&1; break;
else echo same: $file > /dev/null; fi; done
Would this work?
You can do the loop over files with 'find' and break when they differ. eg for dirs foo, bar:
for file in $(find foo -type f); do if [[ $(diff -q $file ${file/#foo/bar}) ]]; then echo differs: $file; break; else echo same: $file; fi; done
NB this will not detect if 'bar' has directories that do not exist in 'foo'.
Edited to add: I just realised I overlooked the really obvious solution:
diff -rq foo bar | head -n1
It's not 'diff', but with 'awk' you can compare two files (or more) and then exit when they have a different line.
Try something like this (sorry, it's a little rough)
awk '{ h[$0] = ! h[$0] } END { for (k in h) if (h[k]) exit }' file1 file2
Sources are here and here.
edit: to break out of the loop when two files have the same line, you may have to do the loop in awk. See here.
You can try the following:
#!/usr/bin/env bash
# Determine directories to compare
d1='./someDir1'
d2='./someDir2'
# Loop over the file lists and diff corresponding files
while IFS= read -r line; do
# Split the 3-column `comm` output into indiv. variables.
lineNoTabs=${line//$'\t'}
numTabs=$(( ${#line} - ${#lineNoTabs} ))
d1Only='' d2Only='' common=''
case $numTabs in
0)
d1Only=$lineNoTabs
;;
1)
d2Only=$lineNoTabs
;;
*)
common=$lineNoTabs
;;
esac
# If a file exists in both directories, compare them,
# and exit if they differ, continue otherwise
if [[ -n $common ]]; then
diff -q "$d1/$common" "$d2/$common" || {
echo "EXITING: Diff found: '$common'" 1>&2;
exit 1; }
# Deal with files unique to either directory.
elif [[ -n $d1Only ]]; then # fie
echo "File '$d1Only' only in '$d1'."
else # implies: if [[ -n $d2Only ]]; then
echo "File '$d2Only' only in '$d2."
fi
# Note: The `comm` command below is CASE-SENSITIVE, which means:
# - The input directories must be specified case-exact.
# To change that, add `I` after the last `|` in _both_ `sed commands`.
# - The paths and names of the files diffed must match in case too.
# To change that, insert `| tr '[:upper:]' '[:lower:]' before _both_
# `sort commands.
done < <(comm \
<(find "$d1" -type f | sed 's|'"$d1/"'||' | sort) \
<(find "$d2" -type f | sed 's|'"$d2/"'||' | sort))
The approach is based on building a list of files (using find) containing relative paths (using sed to remove the root path) for each input directory, sorting the lists, and comparing them with comm, which produces 3-column, tab-separated output to indicated which lines (and therefore files) are unique to the first list, which are unique to the second list, and which lines they have in common.
Thus, the values in the 3rd column can be diffed and action taken if they're not identical.
Also, the 1st and 2nd-column values can be used to take action based on unique files.
The somewhat complicated splitting of the 3 column values output by comm into individual variables is necessary, because:
read will treat multiple tabs in sequence as a single separator
comm outputs a variable number of tabs; e.g., if there's only a 1st-column value, no tab is output at all.
I got a solution to this thanks to #bazzargh.
I use this code in my script and now it works perfectly.
for file in $(find ${intfolder} -type f);
do if [[ $(diff -q $file ${file/#${intfolder}/${EXPANDEDROOT}/${runid}/$(basename ${intfolder})}) ]] 2> ${resultfile}.tmp;
then echo differs: $file > ${resultfile}.tmp 2>&1; break;
elif [[ -s ${resultfile}.tmp ]];
then echo differs: $file >> ${resultfile}.tmp 2>&1; break;
else echo same: $file > /dev/null;
fi; done
thanks!

Create new file but add number if filename already exists in bash

I found similar questions but not in Linux/Bash
I want my script to create a file with a given name (via user input) but add number at the end if filename already exists.
Example:
$ create somefile
Created "somefile.ext"
$ create somefile
Created "somefile-2.ext"
The following script can help you. You should not be running several copies of the script at the same time to avoid race condition.
name=somefile
if [[ -e $name.ext || -L $name.ext ]] ; then
i=0
while [[ -e $name-$i.ext || -L $name-$i.ext ]] ; do
let i++
done
name=$name-$i
fi
touch -- "$name".ext
Easier:
touch file`ls file* | wc -l`.ext
You'll get:
$ ls file*
file0.ext file1.ext file2.ext file3.ext file4.ext file5.ext file6.ext
To avoid the race conditions:
name=some-file
n=
set -o noclobber
until
file=$name${n:+-$n}.ext
{ command exec 3> "$file"; } 2> /dev/null
do
((n++))
done
printf 'File is "%s"\n' "$file"
echo some text in it >&3
And in addition, you have the file open for writing on fd 3.
With bash-4.4+, you can make it a function like:
create() { # fd base [suffix [max]]]
local fd="$1" base="$2" suffix="${3-}" max="${4-}"
local n= file
local - # ash-style local scoping of options in 4.4+
set -o noclobber
REPLY=
until
file=$base${n:+-$n}$suffix
eval 'command exec '"$fd"'> "$file"' 2> /dev/null
do
((n++))
((max > 0 && n > max)) && return 1
done
REPLY=$file
}
To be used for instance as:
create 3 somefile .ext || exit
printf 'File: "%s"\n' "$REPLY"
echo something >&3
exec 3>&- # close the file
The max value can be used to guard against infinite loops when the files can't be created for other reason than noclobber.
Note that noclobber only applies to the > operator, not >> nor <>.
Remaining race condition
Actually, noclobber does not remove the race condition in all cases. It only prevents clobbering regular files (not other types of files, so that cmd > /dev/null for instance doesn't fail) and has a race condition itself in most shells.
The shell first does a stat(2) on the file to check if it's a regular file or not (fifo, directory, device...). Only if the file doesn't exist (yet) or is a regular file does 3> "$file" use the O_EXCL flag to guarantee not clobbering the file.
So if there's a fifo or device file by that name, it will be used (provided it can be open in write-only), and a regular file may be clobbered if it gets created as a replacement for a fifo/device/directory... in between that stat(2) and open(2) without O_EXCL!
Changing the
{ command exec 3> "$file"; } 2> /dev/null
to
[ ! -e "$file" ] && { command exec 3> "$file"; } 2> /dev/null
Would avoid using an already existing non-regular file, but not address the race condition.
Now, that's only really a concern in the face of a malicious adversary that would want to make you overwrite an arbitrary file on the file system. It does remove the race condition in the normal case of two instances of the same script running at the same time. So, in that, it's better than approaches that only check for file existence beforehand with [ -e "$file" ].
For a working version without race condition at all, you could use the zsh shell instead of bash which has a raw interface to open() as the sysopen builtin in the zsh/system module:
zmodload zsh/system
name=some-file
n=
until
file=$name${n:+-$n}.ext
sysopen -w -o excl -u 3 -- "$file" 2> /dev/null
do
((n++))
done
printf 'File is "%s"\n' "$file"
echo some text in it >&3
Try something like this
name=somefile
path=$(dirname "$name")
filename=$(basename "$name")
extension="${filename##*.}"
filename="${filename%.*}"
if [[ -e $path/$filename.$extension ]] ; then
i=2
while [[ -e $path/$filename-$i.$extension ]] ; do
let i++
done
filename=$filename-$i
fi
target=$path/$filename.$extension
Use touch or whatever you want instead of echo:
echo file$((`ls file* | sed -n 's/file\([0-9]*\)/\1/p' | sort -rh | head -n 1`+1))
Parts of expression explained:
list files by pattern: ls file*
take only number part in each line: sed -n 's/file\([0-9]*\)/\1/p'
apply reverse human sort: sort -rh
take only first line (i.e. max value): head -n 1
combine all in pipe and increment (full expression above)
Try something like this (untested, but you get the idea):
filename=$1
# If file doesn't exist, create it
if [[ ! -f $filename ]]; then
touch $filename
echo "Created \"$filename\""
exit 0
fi
# If file already exists, find a similar filename that is not yet taken
digit=1
while true; do
temp_name=$filename-$digit
if [[ ! -f $temp_name ]]; then
touch $temp_name
echo "Created \"$temp_name\""
exit 0
fi
digit=$(($digit + 1))
done
Depending on what you're doing, replace the calls to touch with whatever code is needed to create the files that you are working with.
This is a much better method I've used for creating directories incrementally.
It could be adjusted for filename too.
LAST_SOLUTION=$(echo $(ls -d SOLUTION_[[:digit:]][[:digit:]][[:digit:]][[:digit:]] 2> /dev/null) | awk '{ print $(NF) }')
if [ -n "$LAST_SOLUTION" ] ; then
mkdir SOLUTION_$(printf "%04d\n" $(expr ${LAST_SOLUTION: -4} + 1))
else
mkdir SOLUTION_0001
fi
A simple repackaging of choroba's answer as a generalized function:
autoincr() {
f="$1"
ext=""
# Extract the file extension (if any), with preceeding '.'
[[ "$f" == *.* ]] && ext=".${f##*.}"
if [[ -e "$f" ]] ; then
i=1
f="${f%.*}";
while [[ -e "${f}_${i}${ext}" ]]; do
let i++
done
f="${f}_${i}${ext}"
fi
echo "$f"
}
touch "$(autoincr "somefile.ext")"
without looping and not use regex or shell expr.
last=$(ls $1* | tail -n1)
last_wo_ext=$($last | basename $last .ext)
n=$(echo $last_wo_ext | rev | cut -d - -f 1 | rev)
if [ x$n = x ]; then
n=2
else
n=$((n + 1))
fi
echo $1-$n.ext
more simple without extension and exception of "-1".
n=$(ls $1* | tail -n1 | rev | cut -d - -f 1 | rev)
n=$((n + 1))
echo $1-$n.ext

Resources