Compare a list of strings in Bash - linux

I have a list of rpm files in a file called rpmlist.txt which I have to compare with another list, newlist.txt, and see if they are the same in Bash. For example, this is my requirement:
Files inside rpmlist.txt:
bash-4.4-9.10.1_x86_64
binutils-2.32-7.8.1_x86_64
bison-3.0.4-1.268_x86_64
Files inside newlist.txt:
bash-5.4-9.10.1_x86_64
binutils-2.32-7.8.1_x86_64
bison-6.0.4-1.268_x86_64
And print whether they match or not. Any help would be appreciated.

Try with:
#!/bin/bash
# Load files into arrays (-t strips the trailing newlines)
readarray -t source_list < rpmlist.txt
readarray -t target_list < newlist.txt
# Compare line counts
source_size=${#source_list[@]}
target_size=${#target_list[@]}
if [ ${source_size} -ne ${target_size} ]; then
echo "File lines count not matching!" >&2
exit 1
fi
# Iterate over both lists in parallel
for (( i=0; i < ${source_size}; i++ )); do
# Get file name
source_file=${source_list[$i]}
target_file=${target_list[$i]}
# Remove trailing CR (in case the lists have DOS line endings)
source_file=$(echo "${source_file}" | sed 's:\r$::')
target_file=$(echo "${target_file}" | sed 's:\r$::')
# Check if files exist
if [ ! -f "${source_file}" ] || [ ! -f "${target_file}" ]; then
echo "Source and/or Target does not exist." >&2
exit 2
fi
# Compare files
diff -q "${source_file}" "${target_file}"
done
PS: I tested it and it works.
Edit (1)
Based on comments, I think you should replace my script with the following simple command:
cat rpmlist.txt | xargs -I "{}" grep "{}" newlist.txt
Edit (2) - Unmatched list
cat rpmlist.txt | xargs -I "{}" grep -v "{}" newlist.txt
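Note that plain grep here treats each list entry as a regular expression and matches substrings, so dots in version numbers act as wildcards. A stricter whole-line, fixed-string variant (standard grep flags, no xargs needed) would be:
grep -Fxf rpmlist.txt newlist.txt   # lines present in both files
grep -Fxvf rpmlist.txt newlist.txt  # lines of newlist.txt not in rpmlist.txt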

This will cross-compare an arbitrary number of such files.
#!/bin/bash
set -e
declare -ar list_names=("$@")
declare -Ai "${list_names[@]}"
for list in "${list_names[@]}"; do
declare -n set="$list"
while IFS= read -r line; do
((++set["$line"]))
done < "${list}.txt"
done
compare_lists() {
local -rn set1="$1"
local -rn set2="$2"
local name
echo "Lines in ${1}, but not in ${2}:"
for name in "${!set1[@]}"; do
((set2["${name}"])) || printf ' %s\n' "$name"
done
}
declare -i idx jdx
for ((idx = 0; idx < ${#list_names[@]}; ++idx)); do
for ((jdx = idx + 1; jdx < ${#list_names[@]}; ++jdx)); do
compare_lists "${list_names[idx]}" "${list_names[jdx]}"
compare_lists "${list_names[jdx]}" "${list_names[idx]}"
done
done
Example (when the above script is called listdiff.sh):
$ ./listdiff.sh rpmlist newlist
Lines in rpmlist, but not in newlist:
bison-3.0.4-1.268_x86_64
bash-4.4-9.10.1_x86_64
Lines in newlist, but not in rpmlist:
bison-6.0.4-1.268_x86_64
bash-5.4-9.10.1_x86_64
And it can take more than 2 arguments.
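For exactly two lists, a shorter alternative (assuming sorting the input is acceptable) is comm, which reports lines unique to each file; -3 suppresses the lines common to both:
comm -3 <(sort rpmlist.txt) <(sort newlist.txt)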

Related

How can I iterate over some file names returned from ls command?

Hello, I am trying to write a simple bash script.
The problem I am having is that I want to iterate through the 4th, 5th, 6th, and 7th files returned from the ls command, check whether these files have write or read permissions, and if they do, copy them to another directory. What I've done so far is check whether they have the needed permissions and, if they do, copy them to the /tmp directory.
My solution:
#!/bin/sh
for f in *
do
if [ -w "$f" ] || [ -r "$f" ]; then
cp $f /tmp
fi
done
The way to get the 4th to 7th file names is through ls | head -7 | tail -4, but how can I iterate specifically through these file names only? Basically, how can I make it so the * list in the for loop can be these 4 file names?
Thank you very much.
Do not try to parse the output of ls. See Why you shouldn't parse the output of ls.
Using globbing, store the filenames into an array and iterate through indices 3 to 6 (both inclusive). Notice that array indices begin at zero in bash. For example, using a C-style for loop:
#!/bin/bash
files=(*)
for ((i = 3; i <= 6; ++i)); do
if [[ -r ${files[i]} || -w ${files[i]} ]]; then
cp "${files[i]}" /tmp
fi
done
or using a traditional for loop with array slices:
files=(*)
for file in "${files[@]:3:4}"; do
if [[ -r $file || -w $file ]]; then
cp "$file" /tmp
fi
done
Alternatively, as mentioned in the comments, a counter can be implemented to skip the first 3 files and break the loop after the 7th iteration, without resorting to an array:
itr=0 # iteration count
for file in *; do
((++itr < 4)) && continue
((itr > 7)) && break
[[ -r $file || -w $file ]] && cp "$file" /tmp
done
If this ls | head -7 | tail -4 works, then iterate over the ls command's result:
ls | head -7 | tail -4 | while read line
do
if [ -w "$line" ] || [ -r "$line" ]; then
cp "$line" /tmp
fi
done

Bash - Count newlines via terminal command

I'm trying to make a bash script that counts the newlines in an input. The first if statement (switch $0) works fine, but the problem I'm having is trying to get it to read the WC of a file given as a terminal argument.
e.g.
~$ ./script.sh
1
2
3
4
(User presses CTRL+D)
display word count here # answer is 5 - works fine
e.g.
~$ ./script1.sh < script1.sh
WC here -(5)
~$ successfully redirects the stdin from a file
but
e.g.
~$ ./script1.sh script1.sh script2.sh
WC displayed here for script1.sh
WC displayed here for script2.sh
NOTHING
~$
The problem, I believe, is the second if statement: instead of processing the file arguments, it waits for user input and never gets to the echo statement.
Any help would be greatly appreciated, since I cannot figure out why it won't work without the < redirection operator.
#!/bin/bash
#!/bin/sh
read filename ## read provided filename
USAGE="Usage: $0 $1 $2..." ## switch statement
if [ "$#" == "0" ]; then
declare -i lines=0 words=0 chars=0
while IFS= read -r line; do
((lines++))
array=($line)
((words += ${#array[#]}))
((chars += ${#line} + 1)) # add 1 for the newline
done < /dev/stdin
fi
echo "$lines $words $chars $filename" ## filename doesn't print, just filler
### problem if statement####
if [ "$#" != "0" ]; then # space between [] IS VERY IMPORTANT
declare -i lines=0 words=0 chars=0
while IFS= read -r line; do
lines=$( grep -c '\n'<"filename") ##should use grep -c to compare only new lines in the filename. assign to variable line
words=$( grep -c '\n'<"filename")
chars=$( grep -c '\n'<"filename")
echo "$lines $words $chars"
#lets user press CTRL+D to end script and count the WC
fi
#!/bin/sh
set -e
if [ -t 0 ]; then
# We are *not* reading stdin from a pipe or a redirection.
# Get the counts from the files specified on the cmdline
if [ "$#" -eq 0 ]; then
echo "no files specified" >&2
exit 1
fi
cat "$#" | wc
else
# stdin is attached to a pipe or redirected from a file
wc
fi | { read lines words chars; echo "lines=$lines words=$words chars=$chars"; }
The variables from the read command only exist within the braces, due to the way the shell (some shells, anyway) uses subshells for commands in a pipeline. Typically, the solution for that is to redirect from a process substitution (bash/ksh).
This can be squashed down to
#!/bin/bash
[[ -t 0 ]] && files=true || files=false
read lines words chars < <({ ! $files && cat || cat "$@"; } | wc)
echo "lines=$lines words=$words chars=$chars"
a very quick demo of cmd | read x versus read x < <(cmd)
$ x=foo; echo bar | read x; echo $x
foo
$ x=foo; read x < <(echo bar); echo $x
bar
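In bash 4.2+ there is also shopt -s lastpipe, which runs the last command of a pipeline in the current shell, so cmd | read x works; note it only takes effect when job control is off (i.e., in non-interactive shells/scripts). A minimal sketch:
#!/bin/bash
shopt -s lastpipe  # bash 4.2+; effective when job control is off
echo bar | read x
echo "$x"          # prints: bar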
Use wc.
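For counting just the newlines, that would be, for example (with a hypothetical file.txt):
wc -l < file.txt   # newline count only
wc file.txt        # lines, words, bytes, and the file name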
Maybe the simplest is to replace the second if block with a for.
$: cat tst
#!/usr/bin/env bash
declare -i lines=0 words=0 chars=0
case "$#" in
0) wc ;;
*) for file in "$@"
do read lines words chars x <<< "$( wc "$file" )"
echo "$lines $words $chars $file"
done ;;
esac
$: cat foo
hello
world
and
goodbye cruel world!
$: tst < foo
6 6 40
$: tst foo tst
6 6 40 foo
9 38 206 tst

How to assign all the files listed in a directory to a filelist using shell script?

I have a requirement to compare files available in a directory, and the below code works fine only when I list the file names (file1 file2 file3 file4) individually in the filelist
#!/bin/bash
filelist=(file1 file2 file3 file4)
# Outer for loop
for (( i=0; i<${#filelist[@]} ; i+=1 )) ; do
# Inner for loop
for (( j=i+1; j<${#filelist[@]} ; j+=1 )) ; do
echo "Unique between ${filelist[i]}" "${filelist[j]}" > unique${filelist[i]}${filelist[j]}.txt
echo -e "Unique in ${filelist[i]}" >> unique${filelist[i]}${filelist[j]}.txt
# Will produce unique lines in 'file i' when comparing 'file i' and 'file j'
join -v 1 <(sort ${filelist[i]}) <(sort ${filelist[j]}) >> unique${filelist[i]}${filelist[j]}.txt
echo -e "Unique in ${filelist[j]}" >> unique${filelist[i]}${filelist[j]}.txt
# Will produce unique lines in 'file j' when comparing 'file i' and 'file j'
join -v 2 <(sort ${filelist[i]}) <(sort ${filelist[j]}) >> unique${filelist[i]}${filelist[j]}.txt
done
done
But what exactly I want is to assign the files that are available in a directory to the filelist directly. Any working solution please?
Try this code:
#!/bin/bash
#set basedir to . if no dir is given
if [ -n "$1" ]; then basedir=$1; else basedir=$(pwd);fi
filelist=($(/bin/ls $basedir))
# Outer for loop
for (( i=0; i<${#filelist[@]} ; i+=1 )) ; do
# Inner for loop
for (( j=i+1; j<${#filelist[@]} ; j+=1 )) ; do
echo "Unique between $basedir/${filelist[i]}" "$basedir${filelist[j]}" > $basedir/unique${filelist[i]}${filelist[j]}.txt
echo -e "Unique in $basedir/${filelist[i]}" >> $basedir/unique${filelist[i]}${filelist[j]}.txt
# Will produce unique lines in 'file i' when comparing 'file i' and 'file j'
join -v 1 <(sort $basedir/${filelist[i]}) <(sort $basedir/${filelist[j]}) >> $basedir/unique${filelist[i]}${filelist[j]}.txt
echo -e "Unique in $basedir/${filelist[j]}" >> $basedir/unique${filelist[i]}${filelist[j]}.txt
# Will produce unique lines in 'file j' when comparing 'file i' and 'file j'
join -v 2 <(sort $basedir/${filelist[i]}) <(sort $basedir/${filelist[j]}) >> $basedir/unique${filelist[i]}${filelist[j]}.txt
done
done
The folder should not contain directories, but it works either way.
The unique files will be created in the folder given to the script.
Assuming your script works on the current directory, use this:
filelist=(*)
It will get all files in the directory via globbing. Don't use ls.
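If the directory may also contain subdirectories, a small sketch (hypothetical variable name) that keeps only regular files:
filelist=()
for f in *; do
[[ -f $f ]] && filelist+=("$f")
done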
All the answers here except mine are wrong (out of the first 3 so far).
Simple test:
mkdir -p /tmp/test && cd /tmp/test && touch List\ s && filelist=($(/bin/ls)) && shuf -e "${filelist[@]}" | wc -l
mkdir -p /tmp/test && cd /tmp/test && touch List\ s && filelist=(*) && shuf -e "${filelist[@]}" | wc -l
It should give 1 in both cases (there is a single file, List s). But it won't: the ls version word-splits the name into two array elements.
If you're running the script from the current directory, you can use the following code:
fileArr=( $(ls) ) # list the files and store them in an array
echo ${#fileArr[@]}
echo "${fileArr[@]}"
for i in "${fileArr[@]}"
do
echo $i # read the file
done
Edited: Without using ls
shopt -s nullglob
fileArr1=(*)
fileArr2=(file*)
fileArr3=(dir/*)
echo ${#fileArr1[@]}
echo "${fileArr1[@]}"
for i in "${fileArr1[@]}"
do
echo $i
done
echo ${#fileArr2[@]}
echo "${fileArr2[@]}"
for i in "${fileArr2[@]}"
do
echo $i
done
echo ${#fileArr3[@]}
echo "${fileArr3[@]}"
for i in "${fileArr3[@]}"
do
echo $i
done

Running diff and have it stop on a difference

I have a script running that checks multiple directories and compares them to expanded tarballs of the same directories elsewhere.
I am using diff -r -q and what I would like is that when diff finds any difference in the recursive run it will stop running instead of going through more directories in the same run.
All help appreciated!
Thank you
@bazzargh I did try it like you suggested, or like this:
for file in $(find $dir1 -type f);
do if [[ $(diff -q $file ${file/#$dir1/$dir2}) ]];
then echo differs: $file > /tmp/$runid.tmp 2>&1; break;
else echo same: $file > /dev/null; fi; done
But this only works with files that exist in both directories. If one file is missing, I won't get information about that. Also, the directories I am working with have over 300,000 files, so it seems to be a bit of overhead to do a find for each file and then diff.
I would like something like this to work, with an elif statement that checks if $runid.tmp contains data and breaks if it does. I added 2> after the first if statement so stderr is sent to the $runid.tmp file.
for file in $(find $dir1 -type f);
do if [[ $(diff -q $file ${file/#$dir1/$dir2}) ]] 2> /tmp/$runid.tmp;
then echo differs: $file > /tmp/$runid.tmp 2>&1; break;
elif [[ -s /tmp/$runid.tmp ]];
then echo differs: $file >> /tmp/$runid.tmp 2>&1; break;
else echo same: $file > /dev/null; fi; done
Would this work?
You can do the loop over files with 'find' and break when they differ. eg for dirs foo, bar:
for file in $(find foo -type f); do if [[ $(diff -q $file ${file/#foo/bar}) ]]; then echo differs: $file; break; else echo same: $file; fi; done
NB this will not detect if 'bar' has directories that do not exist in 'foo'.
Edited to add: I just realised I overlooked the really obvious solution:
diff -rq foo bar | head -n1
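Since head exits after the first line, diff is typically killed by SIGPIPE as soon as it tries to print more, so it really does stop early. To act on the result, something like this sketch (using the same foo and bar):
first=$(diff -rq foo bar | head -n1)
if [ -n "$first" ]; then
echo "stopping on first difference: $first"
fi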
It's not 'diff', but with 'awk' you can compare two files (or more) and then exit when they have a different line.
Try something like this (sorry, it's a little rough)
awk '{ h[$0] = ! h[$0] } END { for (k in h) if (h[k]) exit }' file1 file2
Sources are here and here.
edit: to break out of the loop when two files have the same line, you may have to do the loop in awk. See here.
You can try the following:
#!/usr/bin/env bash
# Determine directories to compare
d1='./someDir1'
d2='./someDir2'
# Loop over the file lists and diff corresponding files
while IFS= read -r line; do
# Split the 3-column `comm` output into indiv. variables.
lineNoTabs=${line//$'\t'}
numTabs=$(( ${#line} - ${#lineNoTabs} ))
d1Only='' d2Only='' common=''
case $numTabs in
0)
d1Only=$lineNoTabs
;;
1)
d2Only=$lineNoTabs
;;
*)
common=$lineNoTabs
;;
esac
# If a file exists in both directories, compare them,
# and exit if they differ, continue otherwise
if [[ -n $common ]]; then
diff -q "$d1/$common" "$d2/$common" || {
echo "EXITING: Diff found: '$common'" 1>&2;
exit 1; }
# Deal with files unique to either directory.
elif [[ -n $d1Only ]]; then
echo "File '$d1Only' only in '$d1'."
else # implies: if [[ -n $d2Only ]]; then
echo "File '$d2Only' only in '$d2."
fi
# Note: The `comm` command below is CASE-SENSITIVE, which means:
# - The input directories must be specified case-exact.
# To change that, add `I` after the last `|` in _both_ `sed` commands.
# - The paths and names of the files diffed must match in case too.
# To change that, insert `| tr '[:upper:]' '[:lower:]'` before _both_
# `sort` commands.
done < <(comm \
<(find "$d1" -type f | sed 's|'"$d1/"'||' | sort) \
<(find "$d2" -type f | sed 's|'"$d2/"'||' | sort))
The approach is based on building a list of files (using find) containing relative paths (using sed to remove the root path) for each input directory, sorting the lists, and comparing them with comm, which produces 3-column, tab-separated output to indicate which lines (and therefore files) are unique to the first list, which are unique to the second list, and which they have in common.
Thus, the values in the 3rd column can be diffed and action taken if they're not identical.
Also, the 1st and 2nd-column values can be used to take action based on unique files.
The somewhat complicated splitting of the 3-column comm output into individual variables is necessary because:
read will treat multiple tabs in sequence as a single separator
comm outputs a variable number of tabs; e.g., if there's only a 1st-column value, no tab is output at all.
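A quick illustration of those variable tab counts (hypothetical two-line inputs; the indentation below stands for literal tabs):
$ comm <(printf 'a\nb\n') <(printf 'b\nc\n')
a
		b
	c
Here a is unique to the first list (no tab), c is unique to the second (one leading tab), and b is common to both (two leading tabs).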
I got a solution to this thanks to @bazzargh.
I use this code in my script and now it works perfectly.
for file in $(find ${intfolder} -type f);
do if [[ $(diff -q $file ${file/#${intfolder}/${EXPANDEDROOT}/${runid}/$(basename ${intfolder})}) ]] 2> ${resultfile}.tmp;
then echo differs: $file > ${resultfile}.tmp 2>&1; break;
elif [[ -s ${resultfile}.tmp ]];
then echo differs: $file >> ${resultfile}.tmp 2>&1; break;
else echo same: $file > /dev/null;
fi; done
thanks!

Create new file but add number if filename already exists in bash

I found similar questions, but not for Linux/Bash.
I want my script to create a file with a given name (via user input) but add number at the end if filename already exists.
Example:
$ create somefile
Created "somefile.ext"
$ create somefile
Created "somefile-2.ext"
The following script can help you. You should not run several copies of the script at the same time, to avoid a race condition.
name=somefile
if [[ -e $name.ext || -L $name.ext ]] ; then
i=0
while [[ -e $name-$i.ext || -L $name-$i.ext ]] ; do
let i++
done
name=$name-$i
fi
touch -- "$name".ext
Easier:
touch file`ls file* | wc -l`.ext
You'll get:
$ ls file*
file0.ext file1.ext file2.ext file3.ext file4.ext file5.ext file6.ext
To avoid the race conditions:
name=some-file
n=
set -o noclobber
until
file=$name${n:+-$n}.ext
{ command exec 3> "$file"; } 2> /dev/null
do
((n++))
done
printf 'File is "%s"\n' "$file"
echo some text in it >&3
And in addition, you have the file open for writing on fd 3.
With bash-4.4+, you can make it a function like:
create() { # fd base [suffix [max]]
local fd="$1" base="$2" suffix="${3-}" max="${4-}"
local n= file
local - # ash-style local scoping of options in 4.4+
set -o noclobber
REPLY=
until
file=$base${n:+-$n}$suffix
eval 'command exec '"$fd"'> "$file"' 2> /dev/null
do
((n++))
((max > 0 && n > max)) && return 1
done
REPLY=$file
}
To be used for instance as:
create 3 somefile .ext || exit
printf 'File: "%s"\n' "$REPLY"
echo something >&3
exec 3>&- # close the file
The max value can be used to guard against infinite loops when the files can't be created for other reason than noclobber.
Note that noclobber only applies to the > operator, not >> nor <>.
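For example, a quick interactive sketch (assuming a pre-existing regular file named existing):
$ set -o noclobber
$ echo hi > existing   # fails: cannot overwrite existing file
$ echo hi >> existing  # appending still works
$ echo hi >| existing  # >| explicitly overrides noclobber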
Remaining race condition
Actually, noclobber does not remove the race condition in all cases. It only prevents clobbering regular files (not other types of files, so that cmd > /dev/null for instance doesn't fail) and has a race condition itself in most shells.
The shell first does a stat(2) on the file to check if it's a regular file or not (fifo, directory, device...). Only if the file doesn't exist (yet) or is a regular file does 3> "$file" use the O_EXCL flag to guarantee not clobbering the file.
So if there's a fifo or device file by that name, it will be used (provided it can be opened write-only), and a regular file may be clobbered if it gets created as a replacement for a fifo/device/directory... in between that stat(2) and open(2) without O_EXCL!
Changing the
{ command exec 3> "$file"; } 2> /dev/null
to
[ ! -e "$file" ] && { command exec 3> "$file"; } 2> /dev/null
Would avoid using an already existing non-regular file, but not address the race condition.
Now, that's only really a concern in the face of a malicious adversary that would want to make you overwrite an arbitrary file on the file system. It does remove the race condition in the normal case of two instances of the same script running at the same time. So, in that, it's better than approaches that only check for file existence beforehand with [ -e "$file" ].
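As an aside: if all that's needed is a unique scratch file rather than a predictable numbered name, mktemp performs the exclusive create atomically for you (sketch with a hypothetical template):
tmpfile=$(mktemp /tmp/somefile-XXXXXX) || exit 1
echo some text in it > "$tmpfile"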
For a working version without race condition at all, you could use the zsh shell instead of bash which has a raw interface to open() as the sysopen builtin in the zsh/system module:
zmodload zsh/system
name=some-file
n=
until
file=$name${n:+-$n}.ext
sysopen -w -o excl -u 3 -- "$file" 2> /dev/null
do
((n++))
done
printf 'File is "%s"\n' "$file"
echo some text in it >&3
Try something like this
name=somefile
path=$(dirname "$name")
filename=$(basename "$name")
extension="${filename##*.}"
filename="${filename%.*}"
if [[ -e $path/$filename.$extension ]] ; then
i=2
while [[ -e $path/$filename-$i.$extension ]] ; do
let i++
done
filename=$filename-$i
fi
target=$path/$filename.$extension
Use touch or whatever you want instead of echo:
echo file$((`ls file* | sed -n 's/file\([0-9]*\).*/\1/p' | sort -rh | head -n 1`+1))
Parts of expression explained:
list files by pattern: ls file*
take only the number part of each line: sed -n 's/file\([0-9]*\).*/\1/p'
apply reverse human sort: sort -rh
take only first line (i.e. max value): head -n 1
combine all in pipe and increment (full expression above)
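For instance, with hypothetical files file1.ext, file2.ext, and file10.ext present:
$ ls file* | sed -n 's/file\([0-9]*\).*/\1/p' | sort -rh | head -n 1
10
so the full expression echoes file11 (the .ext still has to be appended when creating the file).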
Try something like this (untested, but you get the idea):
filename=$1
# If file doesn't exist, create it
if [[ ! -f $filename ]]; then
touch "$filename"
echo "Created \"$filename\""
exit 0
fi
# If file already exists, find a similar filename that is not yet taken
digit=1
while true; do
temp_name=$filename-$digit
if [[ ! -f $temp_name ]]; then
touch "$temp_name"
echo "Created \"$temp_name\""
exit 0
fi
digit=$(($digit + 1))
done
Depending on what you're doing, replace the calls to touch with whatever code is needed to create the files that you are working with.
This is a much better method I've used for creating directories incrementally.
It could be adjusted for filenames too.
LAST_SOLUTION=$(echo $(ls -d SOLUTION_[[:digit:]][[:digit:]][[:digit:]][[:digit:]] 2> /dev/null) | awk '{ print $(NF) }')
if [ -n "$LAST_SOLUTION" ] ; then
mkdir SOLUTION_$(printf "%04d\n" $(expr ${LAST_SOLUTION: -4} + 1))
else
mkdir SOLUTION_0001
fi
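The printf "%04d" is what keeps the fixed four-digit zero padding, e.g.:
$ printf "SOLUTION_%04d\n" $((6 + 1))
SOLUTION_0007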
A simple repackaging of choroba's answer as a generalized function:
autoincr() {
f="$1"
ext=""
# Extract the file extension (if any), with preceding '.'
[[ "$f" == *.* ]] && ext=".${f##*.}"
if [[ -e "$f" ]] ; then
i=1
f="${f%.*}";
while [[ -e "${f}_${i}${ext}" ]]; do
let i++
done
f="${f}_${i}${ext}"
fi
echo "$f"
}
touch "$(autoincr "somefile.ext")"
Without looping, and without using regex or shell expr:
last=$(ls $1* | tail -n1)
last_wo_ext=$(basename "$last" .ext)
n=$(echo "$last_wo_ext" | rev | cut -d - -f 1 | rev)
if [ x$n = x ]; then
n=2
else
n=$((n + 1))
fi
echo $1-$n.ext
Simpler still, without extension handling and without the "-1" special case:
n=$(ls $1* | tail -n1 | rev | cut -d - -f 1 | rev)
n=$((n + 1))
echo $1-$n.ext
