Shell nested for loop and string comparison - linux

I have two files
file1
104.128.225.208:8000
103.27.24.114:80
104.128.225.208:8000
and file2
103.27.24.114:99999999
103.27.24.114:88888888888
104.128.225.208:8000
103.27.24.114:80
104.128.225.208:8000
and in file2 there are two new lines
103.27.24.114:99999999
103.27.24.114:88888888888
So I want to check if there are new lines in file2. My script is:
for i in $(cat $2)
do
    for j in $(cat $1)
    do
        if [ $i = $j ]; then
            echo $i
        fi
    done
done
./program file1 file2
but I don't get the expected output. I think my if statement is not working correctly. What am I doing wrong?

Your problem is that your loop prints the lines that match (the lines common to both files), so the new lines never get printed. Looping over every line in file1 for each line in file2 is also slow.
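For reference, a minimal fix of the nested loop (a sketch, assuming the lines contain no whitespace) is to print a line of file2 only when no line of file1 matches it:
# print each line of $2 that does not appear in $1
for i in $(cat "$2"); do
    found=0
    for j in $(cat "$1"); do
        if [ "$i" = "$j" ]; then
            found=1
            break
        fi
    done
    [ "$found" -eq 0 ] && echo "$i"
done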
The comm utility does what you want, but it assumes both files are sorted.
$ sort file1 -o file1
$ sort file2 -o file2
$ comm -13 file1 file2
103.27.24.114:99999999
103.27.24.114:88888888888
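If you'd rather not sort the files in place, process substitution lets you hand comm sorted copies on the fly (bash-specific, but so is the original script):
comm -13 <(sort file1) <(sort file2)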

This is what diff is for. Example:
$ diff dat/newdat1.txt dat/newdat2.txt
0a1,2
> 103.27.24.114:99999999
> 103.27.24.114:88888888888
Where newdat1.txt and newdat2.txt are:
104.128.225.208:8000
103.27.24.114:80
104.128.225.208:8000
and
103.27.24.114:99999999
103.27.24.114:88888888888
104.128.225.208:8000
103.27.24.114:80
104.128.225.208:8000
You can simply test the return value of diff, with or without output, depending on the options and your needs, e.g.
if diff -q "$file1" "$file2" >/dev/null; then echo same; else echo differ; fi
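If all you want are the added lines themselves, without the 0a1,2 hunk header and the > markers, filtering diff's output is enough (a sketch; it keeps every line diff marks as added in the second file):
diff file1 file2 | sed -n 's/^> //p'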

#!/bin/bash
for n in $(diff file1 file2); do
    if [ -z "$firstLineDiscarded" ]; then
        firstLineDiscarded=TRUE
    elif [ "$n" != ">" ]; then
        echo "$n"
    fi
done
If you're not attached to that particular approach, this seems to work.
Of course it breaks down if the input syntax changes (e.g. the data contains spaces), but for this strict application it may be good enough.
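With GNU diff you can skip the post-processing entirely and have diff print only the added lines; note the --*-line-format options are GNU extensions, so this sketch assumes GNU diffutils:
# print only the lines that appear in file2 but not in file1, verbatim
diff --unchanged-line-format='' --old-line-format='' --new-line-format='%L' file1 file2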

Related

Join 2 files by common column header (without awk/sed)

Basically I want to get all records from file2, but filter out columns whose header doesn't appear in file1
Example:
file1
Name Location
file2
Name Phone_Number Location Email
Jim 032131 xyz xyz@qqq.com
Tim 037903 zzz zzz@qqq.com
Pimp 039141 xxz xxz@qqq.com
Output
Name Location
Jim xyz
Tim zzz
Pimp xxz
Is there a way to do this without awk or sed, but still using coreutils tools? I've tried doing it with join, but couldn't get it working.
ALL_COLUMNS=$(head -n1 file2)
for COLUMN in $(head -n1 file1); do
    JOIN_FORMAT+="2.$(( $(echo ${ALL_COLUMNS%%$COLUMN*} | wc -w)+1 )),"
done
join -a2 -o ${JOIN_FORMAT%?} /dev/null file2
Explanation:
ALL_COLUMNS=$(head -n1 file2)
It saves all of file2's column names, which we will search through next.
for COLUMN in $(head -n1 file1); do
    JOIN_FORMAT+="2.$(( $(echo ${ALL_COLUMNS%%$COLUMN*} | wc -w)+1 )),"
done
For every column in file1, we look for the position of the column with the same name in file2 and append it to JOIN_FORMAT in the form "2.<column_number>,".
join -a2 -o ${JOIN_FORMAT%?} /dev/null file2
Once the option string is complete (2.1,2.3,), we pass it to join, with ${JOIN_FORMAT%?} stripping the trailing ,.
join prints the unpairable lines from the second file provided (-a2 -> file2), but only the columns specified in the -o option.
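Tracing the loop with the sample files makes the arithmetic concrete: ALL_COLUMNS is "Name Phone_Number Location Email"; for Name the stripped prefix ${ALL_COLUMNS%%Name*} is empty (0 words, so column 1), and for Location it is "Name Phone_Number " (2 words, so column 3). So the command that finally runs is:
JOIN_FORMAT="2.1,2.3,"              # as built by the loop
join -a2 -o 2.1,2.3 /dev/null file2 # after ${JOIN_FORMAT%?} strips the trailing comma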
Not very efficient, but works for your example:
#!/bin/bash
read -r -a cols < file1
echo "${cols[@]}"
read -r -a header < <(head -n1 file2)
keep=()
for (( i=0; i<${#header[@]}; i++ )) ; do
    for c in "${cols[@]}" ; do
        if [[ ${header[i]} == "$c" ]] ; then
            keep+=("$i")
        fi
    done
done
while read -r -a data ; do
    for idx in "${keep[@]}" ; do
        printf '%s ' "${data[idx]}"
    done
    printf '\n'
done < <(tail -n+2 file2)
Tools used: head and tail. They aren't essential, though. And bash, of course.
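Saved as, say, filter-cols.sh (the script name is hypothetical) and run in a directory containing the sample file1 and file2, it prints:
$ ./filter-cols.sh
Name Location
Jim xyz
Tim zzz
Pimp xxz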

Comparing two files script and finding the unmatched data

I am having two .txt files with data stored in the format
1.txt
ASF001-AS-ST73U12
ASF001-AS-ST92U14
ASF001-AS-ST105U33
ASF001-AS-ST107U20
and
2.txt
ASF001-AS-ST121U21
ASF001-AS-ST130U14
ASF001-AS-ST73U12
ASF001-AS-ST92U14
I need to find the entries which are in 1.txt but not in 2.txt.
I tried to use
diff -a --suppress-common-lines -y 1.txt 2.txt > finaloutput
but it didn't work
Rather than diff you can use comm here:
comm -23 <(sort 1.txt) <(sort 2.txt)
ASF001-AS-ST105U33
ASF001-AS-ST107U20
Or this awk will also work:
awk 'FNR==NR {a[$1];next} $1 in a{delete a[$1]} END {for (i in a) print i}' 1.txt 2.txt
ASF001-AS-ST107U20
ASF001-AS-ST105U33
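If the one-liner looks opaque, here is the same awk program spread out with comments (the behavior is unchanged):
awk '
    FNR == NR { a[$1]; next }          # 1st file (1.txt): remember each entry as an array key
    $1 in a   { delete a[$1] }         # 2nd file (2.txt): drop entries present in both
    END       { for (i in a) print i } # whatever is left is unique to 1.txt
' 1.txt 2.txt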
A relatively simple bash script can do what you need:
#!/bin/bash
while read line || test -n "$line"; do
    grep -q "$line" "$2" || echo "$line"
done < "$1"
exit 0
output:
$ ./uniquef12.sh dat/1.txt dat/2.txt
ASF001-AS-ST105U33
ASF001-AS-ST107U20
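grep itself can also do the whole comparison in one call, treating 2.txt as a list of fixed-string (-F), whole-line (-x) patterns read from a file (-f) and inverting the match (-v):
grep -Fxvf 2.txt 1.txt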

Insert content of a file to another file on Linux

I have two files. I want to insert the content of the first file (file1) into the second file (file2) between some code (the second file is a script). For example, the second file should look like this:
upcode...
#upcode ends here
file1 content
downcode ...
The upcode, the #upcode ends here marker, and the downcode should never change.
How can this be done?
You can try sed:
sed -e '/file1 content/{r file1' -e 'd}' file2
/pattern/: pattern to match line
r file1: read file1
d: delete line
Note: you can add -i option to change file2 inplace.
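If, as in the question, you want to insert file1 after the #upcode ends here marker while keeping the marker line itself, drop the d command and let r do all the work:
sed -e '/#upcode ends here/r file1' file2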
Here is a script to do that (note that your start tag has to be unique in the file):
#!/bin/bash
start="what you need"
touch file2.tmp
while read line
do
    echo "$line" >> file2.tmp
    if [ "$line" = "$start" ]
    then
        cat file2 >> file2.tmp
    fi
done < file1
#mv file2.tmp file1 -- moves (i.e. renames) file2.tmp as file1.
while IFS= read -r f2line; do
    echo "$f2line"
    [[ "$f2line" = "#upcode ends here" ]] && cat file1
done < file2 > merged_file
or to edit file2 in place
ed file2 <<END
/#upcode ends here/ r file1
w
q
END

insert the contents of a file to another (in a specific line of the file that is sent)-BASH/LINUX

I tried doing it with cat: after the second file I added | head -$line | tail -1, but it doesn't work because cat runs first.
Any ideas? I need to do it with cat or something else.
I'd probably use sed for this job:
line=3
sed -e "${line}r file2" file1
If you're looking to overwrite file1 and you have GNU sed, add the -i option. Otherwise, write to a temporary file and then copy/move the temporary file over the original, cleaning up as necessary (that's the trap stuff below). Note: copying the temporary over the file preserves links; moving does not (but is swifter, especially if the file is big).
line=3
tmp="./sed.$$"
trap "rm -f $tmp; exit 1" 0 1 2 3 13 15
sed -e "${line}r file2" file1 > $tmp
cp $tmp file1
rm -f $tmp
trap 0
Just for fun, and just because we all love ed, the standard editor, here's an ed version. It's very efficient (ed is a genuine text editor)!
ed -s file2 <<< $'3r file1\nw'
If the line number is stored in the variable line then:
ed -s file2 <<< "${line}r file1"$'\nw'
Just to please Zack, here's one version with less bashism, in case you don't like bash (personally, I don't like pipes and subshells, I prefer herestrings, but hey, as I said, that's only to please Zack):
printf "%s\n" "${line}r file1" w | ed -s file2
or (to please Sorpigal):
printf "%dr %s\nw" "$line" file1 | ed -s file2
As Jonathan Leffler mentions in a comment, and if you intend to use this method in a script, use a heredoc (it's usually the most efficient):
ed -s file2 <<EOF
${line}r file1
w
EOF
Hope this helps!
P.S. Don't hesitate to leave a comment if you feel you need to express yourself about the ways to drive ed, the standard editor.
cat file1 >>file2
will append content of file1 to file2.
cat file1 file2
will concatenate file1 and file2 and send output to terminal.
cat file1 file2 >file3
will create or overwrite file3 with the concatenation of file1 and file2.
cat file1 file2 >>file3
will append concatenation of file1 and file2 to end of file3.
Edit:
For truncating file2 before appending file1:
sed -e '11,$d' -i file2 && cat file1 >>file2
or to end up with a 500-line file:
n=$((500-$(wc -l <file1)))
sed -e "1,${n}d" -i file2 && cat file1 >>file2
Lots of ways to do it, but I like to choose a way that involves making tools.
First, setup test environment
rm -rf /tmp/test
mkdir /tmp/test
printf '%s\n' {0..9} > /tmp/test/f1
printf '%s\n' {one,two,three,four,five,six,seven,eight,nine,ten} > /tmp/test/f2
Now let's make the tool, and in this first pass we'll implement it badly.
# insert contents of file $1 into file $2 at line $3
insert_at () {
    insert="$1" ; into="$2" ; at="$3"
    { head -n "$at" "$into" ; ((at++)) ; cat "$insert" ; tail -n "+$at" "$into" ; }
}
Then run the tool to see the amazing results.
$ insert_at /tmp/test/f1 /tmp/test/f2 5
But wait, the result is on stdout! What about overwriting the original? No problem, we can make another tool for that.
insert_at_replace () { tmp=$(mktemp) ; insert_at "$@" > "$tmp" ; mv "$tmp" "$2" ; }
And run it
$ insert_at_replace /tmp/test/f1 /tmp/test/f2 5
$ cat /tmp/test/f2
"Your implementation sucks!"
I know, but that's the beauty of making simple tools. Let's replace insert_at with the sed version.
insert_at () {
    insert="$1" ; into="$2" ; at="$3"
    sed -e "${at}r ${insert}" "$into"
}
And insert_at_replace keeps working (of course). The implementation of insert_at_replace can also be changed to be less buggy, but I'll leave that as an exercise for the reader.
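For the impatient, one possible answer to that exercise (a sketch): check that mktemp succeeds, and copy rather than move so hard links and permissions on the target survive:
insert_at_replace () {
    tmp=$(mktemp) || return 1
    insert_at "$@" > "$tmp" && cp "$tmp" "$2"
    rm -f "$tmp"
}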
I like doing this with head and tail if you don't mind managing a new file:
head -n 16 file1 > file3 &&
cat file2 >> file3 &&
tail -n+56 file1 >> file3
You can collapse this onto one line if you like. Then, if you really need it to overwrite file1, do: mv file3 file1 (optionally include && between commands).
Notes:
head -n 16 file1 means first 16 lines of file1
tail -n+56 file1 means file1 starting from line 56 to the end
Hence, I actually skipped lines 17 through 55 from file1.
Of course, you could change 56 to 17 so that no lines are skipped.
I prefer to mix simple head and tail commands rather than try a magic sed command.
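The same idea, parameterized on a line number so nothing is accidentally skipped (a sketch; $line is the line of file1 after which file2's content should appear):
line=16
head -n "$line" file1 > file3 &&
cat file2 >> file3 &&
tail -n "+$((line+1))" file1 >> file3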

Bash loop to compare files

I'm obviously missing something simple, and I know the problem is that it produces blank output, which is why the comparison fails. However, if someone could shed some light on this it would be great; I haven't isolated it.
Ultimately, I'm trying to compare the md5sums from a list stored in a txt file to those of the files stored on the server. If there are errors, I need it to report them. Here's the output:
root@vps [~/testinggrounds]# cat md5.txt | while read a b; do
> md5sum "$b" | read c d
> if [ "$a" != "$c" ] ; then
> echo "md5 of file $b does not match"
> fi
> done
md5 of file file1 does not match
md5 of file file2 does not match
root@vps [~/testinggrounds]# md5sum file*
2a53da1a6fbfc0bafdd96b0a2ea29515 file1
bcb35cddc47f3df844ff26e9e2167c96 file2
root@vps [~/testinggrounds]# cat md5.txt
2a53da1a6fbfc0bafdd96b0a2ea29515 file1
bcb35cddc47f3df844ff26e9e2167c96 file2
Not directly answering your question, but md5sum(1):
-c, --check
read MD5 sums from the FILEs and check them
Like:
$ ls
1.txt 2.txt md5.txt
$ cat md5.txt
d3b07384d113edec49eaa6238ad5ff00 1.txt
c157a79031e1c40f85931829bc5fc552 2.txt
$ md5sum -c md5.txt
1.txt: OK
2.txt: OK
The problem that you are having is that your inner read is executed in a subshell. In bash, a subshell is created when you pipe a command. Once the subshell exits, the variables $c and $d are gone. You can use process substitution to avoid the subshell:
while read -r -u3 sum filename; do
    read -r cursum _ < <(md5sum "$filename")
    if [[ $sum != $cursum ]]; then
        printf 'md5 of file %s does not match\n' "$filename"
    fi
done 3<md5.txt
The redirection 3<md5.txt causes the file to be opened as file descriptor 3. The -u 3 option to read causes it to read from that file descriptor. The inner read still reads from stdin.
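Another way around the subshell, if you can count on bash 4.2 or newer, is the lastpipe option, which runs the last stage of a pipeline in the current shell (it only takes effect when job control is off, i.e. in scripts, not interactive shells); a sketch:
#!/bin/bash
shopt -s lastpipe                         # bash 4.2+: last pipeline stage runs in this shell
while read -r sum filename; do
    md5sum "$filename" | read -r cursum _ # cursum survives: read is not in a subshell
    [[ $sum != $cursum ]] && printf 'md5 of file %s does not match\n' "$filename"
done < md5.txt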
I'm not going to argue; I simply try to avoid a double read inside loops.
#! /bin/bash
cat md5.txt | while read sum file
do
    prev_sum=$(md5sum "$file" | awk '{print $1}')
    if [ "$sum" != "$prev_sum" ]
    then
        echo "md5 of file $file does not match"
    else
        echo "$file is fine"
    fi
done
