Bash loop to compare files - linux

I'm obviously missing something simple. I know the problem is that the comparison value comes out blank, which is why the check fails, but I haven't isolated why. If someone could shed some light on this it would be great.
Ultimately, I'm trying to compare the md5sums from a list stored in a txt file to those of the files stored on the server. If they differ, I need it to report that. Here's the output:
root#vps [~/testinggrounds]# cat md5.txt | while read a b; do
> md5sum "$b" | read c d
> if [ "$a" != "$c" ] ; then
> echo "md5 of file $b does not match"
> fi
> done
md5 of file file1 does not match
md5 of file file2 does not match
root#vps [~/testinggrounds]# md5sum file*
2a53da1a6fbfc0bafdd96b0a2ea29515 file1
bcb35cddc47f3df844ff26e9e2167c96 file2
root#vps [~/testinggrounds]# cat md5.txt
2a53da1a6fbfc0bafdd96b0a2ea29515 file1
bcb35cddc47f3df844ff26e9e2167c96 file2

Not directly answering your question, but from md5sum(1):
-c, --check
read MD5 sums from the FILEs and check them
Like:
$ ls
1.txt 2.txt md5.txt
$ cat md5.txt
d3b07384d113edec49eaa6238ad5ff00 1.txt
c157a79031e1c40f85931829bc5fc552 2.txt
$ md5sum -c md5.txt
1.txt: OK
2.txt: OK

The problem that you are having is that your inner read is executed in a subshell. In bash, a subshell is created when you pipe a command. Once the subshell exits, the variables $c and $d are gone. You can use process substitution to avoid the subshell:
while read -r -u3 sum filename; do
    read -r cursum _ < <(md5sum "$filename")
    if [[ $sum != $cursum ]]; then
        printf 'md5 of file %s does not match\n' "$filename"
    fi
done 3<md5.txt
The redirection 3<md5.txt causes the file to be opened as file descriptor 3. The -u3 option to read causes it to read from that file descriptor, so the inner read still reads from stdin and is not in a subshell.
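To see the subshell effect in isolation, here is a minimal sketch: the first read runs on the right-hand side of a pipe and its variable vanishes, while the process-substitution form keeps the variable in the current shell.

```shell
#!/bin/bash
# A variable set on the right-hand side of a pipe lives in a subshell
# and is gone once the pipe finishes (default bash behaviour).
echo hello | read word
echo "after pipe: '$word'"              # prints: after pipe: ''

# With process substitution, read runs in the current shell,
# so the variable survives.
read -r word < <(echo hello)
echo "after substitution: '$word'"      # prints: after substitution: 'hello'
```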

I'm not going to argue; I simply try to avoid a double read inside loops.
#! /bin/bash
while read -r sum file; do
    cur_sum=$(md5sum "$file" | awk '{print $1}')
    if [ "$sum" != "$cur_sum" ]; then
        echo "md5 of file $file does not match"
    else
        echo "$file is fine"
    fi
done < md5.txt
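For completeness, GNU md5sum -c with --quiet reports only the mismatches, and the exit status tells you whether anything failed, which covers the "report if errors" requirement without any loop. A sketch with throwaway files in a temp directory:

```shell
#!/bin/bash
cd "$(mktemp -d)" || exit 1
echo data1 > file1
echo data2 > file2
md5sum file1 file2 > md5.txt

echo tampered > file2            # invalidate one checksum

# --quiet suppresses the "OK" lines; a nonzero exit status means
# at least one file failed verification.
if ! md5sum -c --quiet md5.txt; then
    echo "at least one md5 does not match"
fi
```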

Related

Looping through a set of files in Linux based on filepath

I have a directory with lots of compressed data and a couple of file name patterns. There are two file types, als.sumstats.lmm.chr and als.sumstats.meta.chr, each followed by a number 1-22. I want to loop through only the als.sumstats.meta.chr files. However, my code is not working. I keep getting gzip: /ALSsummaryGWAS/Summary_Statistics_GWAS_2016/als.sumstats.meta.chr*.txt.gz: no such file or directory, suggesting my files are not being found by my loop. Can someone help? This is what I have right now.
#!/bin/bash
FILES=/ALSsummaryGWAS/Summary_Statistics_GWAS_2016/als.sumstats.meta.chr*.txt.gz
for f in $FILES;
do
echo "$FILES"
echo "extracting columns 2,1,3,9"
gunzip -c $f | awk '{print $2, $1, $3, $14+$15}' >> ALSGWAS.txt
done
In your script snippet, the wildcard '*' pattern is stored as a literal string in the $FILES variable, which needs to be expanded at some point to get the list of matching files.
One way to evaluate it is eval:
FILES="ls -1 /ALSsummaryGWAS/Summary_Statistics_GWAS_2016/als.sumstats.meta.chr*.txt.gz"
for f in $(eval "$FILES"); do
    echo "processing $f"
    echo "extracting columns 2,1,3,9"
    gunzip -c "$f" | awk '{print $2, $1, $3, $14+$15}' >> ALSGWAS.txt
done
But eval is not a recommended way to do such operations (eval is dangerous), so you can try this instead:
FILES=$(ls -1 /ALSsummaryGWAS/Summary_Statistics_GWAS_2016/als.sumstats.meta.chr*.txt.gz)
for f in $FILES; do
    echo "processing $f"
    echo "extracting columns 2,1,3,9"
    gunzip -c "$f" | awk '{print $2, $1, $3, $14+$15}' >> ALSGWAS.txt
done
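A simpler sketch sidesteps both eval and parsing ls by iterating the glob directly; with nullglob set, the loop body is simply skipped when nothing matches. The temp directory and sample file below are stand-ins for the real path in the question:

```shell
#!/bin/bash
shopt -s nullglob                 # unmatched globs expand to nothing

dir=$(mktemp -d)                  # stand-in for the directory in the question
printf 'a\tb\tc\n' | gzip > "$dir/als.sumstats.meta.chr1.txt.gz"

for f in "$dir"/als.sumstats.meta.chr*.txt.gz; do
    echo "processing $f"
    gunzip -c "$f" | awk '{print $2, $1, $3}' >> "$dir/ALSGWAS.txt"
done
```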

Linux - script to copy a line from one file to multiple other files

I'm stuck with this task:
I have a textfile 1.txt where there is 1 variable in each line. I have a textfile 2.txt where I want to replace line 3 with the variable of 1.txt and save it under a directory which has the same name as the variable. My idea was this:
#!/bin/bash
for i in `cat 1.txt`;
do awk '{ if (NR == 3) print $i; else print $0}' 2.txt > "/$i/2.txt";
done
The last part works - I get the file in the expected folder - but it is always the same file, just copied, not modified.
Any help appreciated
Edit: to make it more clear, my 1.txt. contains data like:
variable1
variable2
variable3
each in one line.
I now want to edit a file 2.txt, insert variable1 in line 3 and save it to /variable1/2.txt
then again open file 2.txt, insert variable2 in line 3 and save it to /variable2/2.txt
and so on....
hope that makes it more clear ;)
Not sure I quite follow what you are doing, but the following code is one way to do what you describe:
#!/bin/bash
while IFS= read -r variable; do
    mkdir "$variable"
    echo -e "\n\n$variable" > "${variable}/2.txt"
done < 1.txt
The output is:
$ cat variable?/2.txt
variable1
variable2
variable3
EDIT: after reading the comment on the solution by @Jetchisel, a similar solution using vim's ex command is:
#!/bin/bash
while IFS= read -r variable; do
    ex "+normal! 2G" "+normal o${variable}" "+wq" "${variable}/2.txt"
done < 1.txt
Here is how I understood your question.
while IFS= read -r lines; do
    [[ ! -d "/$lines" ]] && { mkdir -p "/$lines" || exit; }
    printf '%s\n' '3c' "$lines" . ,p Q | ed -s 2.txt > "/$lines/2.txt"
done < 1.txt
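For the record, the original snippet copies 2.txt unchanged because $i sits inside single quotes, so awk never sees the shell variable; passing it with -v does. A sketch, using a temp directory in place of the absolute /$i paths from the question:

```shell
#!/bin/bash
workdir=$(mktemp -d)              # stand-in for the real / destination
printf 'one\ntwo\nthree\nfour\n' > "$workdir/2.txt"
printf 'variable1\nvariable2\n' > "$workdir/1.txt"

while IFS= read -r i; do
    mkdir -p "$workdir/$i"
    # -v hands the shell variable to awk; inside single quotes,
    # $i would be awk's own (empty) variable, not the shell's.
    awk -v var="$i" 'NR == 3 { print var; next } { print }' \
        "$workdir/2.txt" > "$workdir/$i/2.txt"
done < "$workdir/1.txt"
```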

How to create files with content, from a list

Trying to figure out how to create files(.strm) with specific content using a file list.
Example:
List.txt(content)
American Gangster (2007).strm,http://xx.x.xxx.xxx:8000/movie/xxxx/xxx/2017
Mi movie trailer (2019).strm,http://xx.x.xxx.xxx:8000/movie/xxxx/xxx/123
Peter again (2020).strm,http://xx.x.xxx.xxx:8000/movie/xxxx/xxx/5684
etc.
I was able to create the .strm files using the below script:
#!/bin/bash
filename="path/list.txt"
while IFS= read -r line; do
    # echo "$line"
    touch "$line"
done < "$filename"
but this leaves me with empty files, how can I read and append the content?
Desire output:
Filename: AmericanGangster(2007).strm
Inside the file: http://xx.x.xxx.xxx:8000/movie/xxxx/xxx/2017
Thanks
You need to write the contents into the files you created. Try:
filename=path/list.txt
while IFS= read -r line; do
    if [[ -e $line ]]; then   ##: -e tests whether the file exists.
        echo "${line##*,}" > "${line%,*}"
    fi
done < "$filename"
Since there can't be a '/' in a filename, the -e test is not needed:
filename=path/list.txt
while IFS= read -r line; do
    echo "${line#*,}" > "${line%,*}"
done < "$filename"
It can also be done by setting IFS to a comma:
while IFS=, read -r file contents; do
    echo "$contents" > "$file"
done < "$filename"
... Or awk.
awk -F, '{print $2 > $1}' "$filename"
The ',' must not appear anywhere except as the separator, otherwise all of these solutions will break.
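A self-contained check of the IFS=, variant, with the question's URLs replaced by a placeholder host (an assumption for the demo):

```shell
#!/bin/bash
cd "$(mktemp -d)" || exit 1
printf '%s\n' \
    'American Gangster (2007).strm,http://host.invalid:8000/movie/2017' \
    'Peter again (2020).strm,http://host.invalid:8000/movie/5684' > list.txt

# Everything before the first comma becomes the filename,
# everything after it the file's contents.
while IFS=, read -r file contents; do
    echo "$contents" > "$file"
done < list.txt
```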

MD5 comparison between two text files

I just started learning Linux shell scripting. I have to compare these two files in a Linux shell script for version control. Example:
file1.txt
275caa62391ff4f3096b1e8a4975de40 apple
awd6s54g64h6se4h6se45wahae654j6 ball
e4rby1s6y4653a46h153a41bqwa54tvi cat
r53aghe4354hr35a4hr65a46eeh5j45ro castor
file2.txt
275caa62391ff4f3096b1e8a4975de40 apple
js65fg4a64zgr65f4w65ea465fa65gh7 ball
wroghah4a65ejdtse5z4g6sa7H658aw7 candle
wagjh54hr5ae454zrwrh354aha4564re castor
How can I sort these into newly added (present in file 2 but not in file 1), deleted (present in file 1 but not in file 2), and changed (same name but different checksum)?
I tried diff, bcompare and vimdiff, but I am not getting proper output as a text file.
Thanks in advance
I don't know if such a command exists, but I've taken the liberty of writing a sorting mechanism in Bash. It isn't optimised, so I suggest you recreate it in a language of your own choice.
#! /bin/bash

# Sets the word delimiter to a newline
IFS=$'\n'

# If $1 is empty, default to 'file1.txt'. Same for $2.
FILE1=${1:-file1.txt}
FILE2=${2:-file2.txt}

DELETED=()
ADDED=()
CHANGED=()

# Loop over array $1 and print its content
function array_print {
    # -n creates a "pointer" to an array. This
    # way you can pass large arrays to functions.
    local -n array=$1
    echo "$1: "
    for i in "${array[@]}"; do
        echo "$i"
    done
}

# This function loops over the entries in file_in and checks
# if they exist in file_tst. Unless doubles are found, a
# callback is executed.
function array_sort {
    local file_in="$1"
    local file_tst="$2"
    local callback=${3:-true}
    local -n arr0=$4
    local -n arr1=$5

    while read -r line; do
        tst_hash=$(grep -Eo '^[^ ]+' <<< "$line")
        tst_name=$(grep -Eo '[^ ]+$' <<< "$line")
        hit=$(grep "$tst_name" "$file_tst")

        # If found, skip. Nothing is changed.
        [[ $hit != $line ]] || continue

        # Run callback
        $callback "$hit" "$line" arr0 arr1
    done < "$file_in"
}

# If tst is empty, line will be added to not_found. For file 1 this
# means that the file doesn't exist in file2, thus is deleted.
# Otherwise the file is changed.
function callback_file1 {
    local tst=$1
    local line=$2
    local -n not_found=$3
    local -n found=$4

    if [[ -z $tst ]]; then
        not_found+=("$line")
    else
        found+=("$line")
    fi
}

# If tst is empty, line will be added to not_found. For file 2 this
# means that the file doesn't exist in file1, thus is added. Since the
# callback for file 1 already filled all the changed files, we do
# nothing with the fourth parameter.
function callback_file2 {
    local tst=$1
    local line=$2
    local -n not_found=$3

    if [[ -z $tst ]]; then
        not_found+=("$line")
    fi
}

array_sort "$FILE1" "$FILE2" callback_file1 DELETED CHANGED
array_sort "$FILE2" "$FILE1" callback_file2 ADDED CHANGED

array_print ADDED
array_print DELETED
array_print CHANGED

exit 0
Since it might be hard to understand the code above, I've written it out. I hope it helps :-)
while read -r line; do
    tst_hash=$(grep -Eo '^[^ ]+' <<< "$line")
    tst_name=$(grep -Eo '[^ ]+$' <<< "$line")
    hit=$(grep "$tst_name" "$FILE2")

    # If found, skip. Nothing is changed.
    [[ $hit != $line ]] || continue

    # If the name does not occur, it's deleted (exists in
    # file1, but not in file2)
    if [[ -z $hit ]]; then
        DELETED+=("$line")
    else
        # If the name occurs, it's changed. Otherwise it would
        # not come here due to the previous if-statement.
        CHANGED+=("$line")
    fi
done < "$FILE1"

while read -r line; do
    tst_hash=$(grep -Eo '^[^ ]+' <<< "$line")
    tst_name=$(grep -Eo '[^ ]+$' <<< "$line")
    hit=$(grep "$tst_name" "$FILE1")

    # If found, skip. Nothing is changed.
    [[ $hit != $line ]] || continue

    # If the name does not occur, it's added (exists in
    # file2, but not in file1)
    if [[ -z $hit ]]; then
        ADDED+=("$line")
    fi
done < "$FILE2"
Files which are only in file1.txt:
awk 'NR==FNR{a[$2];next} !($2 in a)' file2.txt file1.txt > only_in_file1.txt
Files which are only in file2.txt:
awk 'NR==FNR{a[$2];next} !($2 in a)' file1.txt file2.txt > only_in_file2.txt
Then something like this answer:
awk compare columns from two files, impute values of another column
e.g:
awk 'FNR==NR{a[$1]=$1;next}{print $0,a[$1]?a[$2]:"NA"}' file2.txt file1.txt | grep NA | awk '{print $1,$2}' > md5sdiffer.txt
You'll need to decide how you want to present these, though.
There might be a more elegant way to handle the final example (as opposed to finding the NA rows and then re-filtering), but it's still enough to go on.
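As another hedged sketch, comm on the sorted name columns yields the added and deleted sets directly, and join pairs up the names present in both files so the changed checksums fall out; the short sums below are placeholders for real md5sums:

```shell
#!/bin/bash
cd "$(mktemp -d)" || exit 1
printf '%s\n' 'aaa apple' 'bbb ball' 'ccc cat'    > file1.txt
printf '%s\n' 'aaa apple' 'ddd ball' 'eee candle' > file2.txt

awk '{print $2}' file1.txt | sort > names1
awk '{print $2}' file2.txt | sort > names2

comm -23 names1 names2 > deleted.txt   # names only in file1
comm -13 names1 names2 > added.txt     # names only in file2

# Join on the name column; each joined row is "name sum1 sum2",
# so differing sums mark a changed file.
join -j 2 <(sort -k2 file1.txt) <(sort -k2 file2.txt) \
    | awk '$2 != $3 { print $1 }' > changed.txt
```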

How to return string count in multiple files in Linux

I have multiple XML files and I want to count occurrences of a string in them.
How can I print the count together with each file name in Linux?
The string I want to count is InvoiceNo.
The result should be:
test.xml InvoiceCount:2
test1.xml InvoiceCount:5
test2.xml InvoiceCount:10
You can probably use the following code:
PATTERN=InvoiceNo
for file in *.xml; do
    count=$(grep -o "$PATTERN" "$file" | wc -l)
    echo "$file InvoiceCount:$count"
done
Output
test.xml InvoiceCount:1
test1.xml InvoiceCount:2
test2.xml InvoiceCount:3
Refered from: https://unix.stackexchange.com/questions/6979/count-total-number-of-occurrences-using-grep
The following awk may help; since you haven't shown any sample input, it isn't tested. An END block is needed so the last file's count is printed too:
awk 'FNR==1{if(count){print value, "InvoiceCount:", count; count=0} value=FILENAME} /InvoiceCount/{count++} END{if(count)print value, "InvoiceCount:", count}' *.xml
Use grep -c to get the count of matching lines (note that this counts lines, not occurrences; a line containing the pattern twice counts once):
for file in *.xml; do
    count=$(grep -c "$PATTERN" "$file")
    if [ "$count" -gt 0 ]; then
        echo "$file $PATTERN: $count"
    fi
done
First the test files:
$ cat foo.xml
InvoiceCount InvoiceCount
InvoiceCount
$ cat bar.xml
InvoiceCount
The GNU awk using gsub for counting:
$ awk '{
c+=gsub(/InvoiceCount/,"InvoiceCount")
}
ENDFILE {
print FILENAME, "InvoiceCount: " c
c=0
}' foo.xml bar.xml
foo.xml InvoiceCount: 3
bar.xml InvoiceCount: 1
A little shell script will do what you want:
#!/bin/bash
for file in "$@"
do
    awk '{count+=gsub("InvoiceNo","")}
    END {print FILENAME, "InvoiceCount:" count}' "$file"
done
Put the code in a file (e.g. counter.sh) and run it like this:
counter.sh text.xml text1.xml text2.xml
