Comparing Two Files For Matching Words in Linux

Let's say we have two files as follows:
File A.txt
Karthick is not so intelligent
He is not lazy
File B.txt
karthick is not so bad either
He is hard worker
So in the two files above, the common words are "karthick is not so" and "He is" in each of the lines. Is there any way to print all such common words with the grep command or some other Linux command?

You want to use the dwdiff utility :).
Example usage:
dwdiff "File A.txt" "File B.txt"
It might take a little while to get used to its output, but check http://linux.die.net/man/1/dwdiff for more details on that.
There are also several visual diff applications out there, but I prefer using it on the command line.
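If dwdiff is not available, something similar can be approximated with standard tools alone. A sketch (the scratch file names a.words and b.words are my own choice) that lists the words occurring in both files, ignoring case:

```shell
# Recreate the two sample files, then list the words common to both,
# ignoring case; a.words/b.words are scratch files for this example.
printf 'Karthick is not so intelligent\nHe is not lazy\n' > 'File A.txt'
printf 'karthick is not so bad either\nHe is hard worker\n' > 'File B.txt'
tr -s ' ' '\n' < 'File A.txt' | tr '[:upper:]' '[:lower:]' | sort -u > a.words
tr -s ' ' '\n' < 'File B.txt' | tr '[:upper:]' '[:lower:]' | sort -u > b.words
comm -12 a.words b.words    # he, is, karthick, not, so
```

Note this reports words common to the files as a whole; dwdiff remains the better choice when you need the matching words shown in context within each line.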

Related

How to split a large file into small files with a prefix in Linux/Bash

I have a file in Linux called test. Now I want to split test into, say, 10 small files.
The test file has more than 1000 table names. I want the small files to have an equal number of lines; the last file may or may not have the same number of table names.
What I want to know is: can we add a prefix to the split files while invoking the split command in the Linux terminal?
Sample:
test_xaa test_xab test_xac and so on...
Is this possible in Linux?
I was able to solve my question with the following statement
split -l $(($(wc -l < test.txt )/10 + 1)) test.txt test_x
With this I was able to get the desired result
I would've sworn split did this on its own, but to my surprise, it does not.
To get your prefix, try something like this:
for x in /path/to/your/x*; do
  mv "$x" "$(dirname "$x")/your_prefix_$(basename "$x")"
done
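A rename loop can be skipped entirely, because split takes the output-name prefix as its last argument. A minimal sketch, using a generated 100-line file as a stand-in for test:

```shell
# Generate a 100-line stand-in file, then split it into roughly 10
# pieces named test_xaa, test_xab, ... via the trailing prefix argument.
seq 100 > test.txt
split -l $(( $(wc -l < test.txt) / 10 + 1 )) test.txt test_x
ls test_x* | wc -l    # 10 files, at most 11 lines each
```

Concatenating the pieces back together (cat test_x*) reproduces the original file, since split never reorders lines.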

Using diff command, ignore character at end of line

I'm not entirely sure what sort of diff command I'd need to do what I want. Basically, I have two different directories full of files that I need to compare and outline the changes in; but in one set of files there is basically a '1' at the end of each line.
An example would be if comparing these two objects
File1/1.txt
I AM IDENTICAL
File2/1.txt
I AM IDENTICAL 1
So I'd just want the diff command to leave out the '1' at the end of the line and show me the files which actually have changes. So far I came up with something like
diff file1/ file2/ -rw -I "$1" | more
but that doesn't work.
Apologies if this is an easy obvious question.
If the number of files and/or their size is not that large, you can eyeball the differences and simply use the vimdiff command to compare two files side by side:
vimdiff File1/1.txt File2/1.txt
Otherwise, as arkascha suggested, first you need to modify your files to eliminate the ending character(s) before comparing them.
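If the extra text really is a literal trailing " 1", one option (assuming bash, since it uses process substitution) is to strip it with sed before diffing; the file names follow the example above:

```shell
# Recreate the example, then diff against a cleaned-up copy of the
# second file; sed deletes a literal " 1" at the end of each line.
mkdir -p File1 File2
printf 'I AM IDENTICAL\n' > File1/1.txt
printf 'I AM IDENTICAL 1\n' > File2/1.txt
diff File1/1.txt <(sed 's/ 1$//' File2/1.txt) && echo "no real changes"
```

For whole directories you would loop over the file names and diff each cleaned-up pair the same way.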

in linux, how to compute the md5 of several files at once, and put the output in a text file?

I have many files in mypath that look like file_01.gz, file_02.gz, etc.
I would like to compute the md5 checksums for each of them, and store the output in a text file with something like
filename md5
file01 fjiroeghreio
Is that possible on Linux?
Many thanks!
md5sum file*.gz > output.txt
The output file is space-separated, without a header.
You can use the shell filename expansion:
md5sum *.gz > file
Linux already has a tool called md5sum, so all you need to do is call it for every file you want. In the approach below you get the default format of the md5sum tool, "SUM NAME", one per line for each file found. By using the double redirect (>>) each call will append to the bottom of the output file, sums.txt
#!/bin/bash
for f in *.gz; do
  md5sum "$f" >> sums.txt
done
The above is illustrative only, you should probably check for the pre-existence of the output file, deal with errors etc.
There's lots of ways of doing this, so it all depends on further requirements. Must the format be of the form you state, must it recurse directories etc.?
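If the "filename md5" layout from the question is wanted, header included, one sketch is to swap md5sum's two columns with awk (output.txt and the sample .gz names are stand-ins; the files created here are plain text, not real gzip archives):

```shell
# md5sum prints "SUM  NAME"; awk swaps the columns to get "NAME SUM",
# appended under a "filename md5" header line.
printf 'hello\n' > file_01.gz
printf 'world\n' > file_02.gz
printf 'filename md5\n' > output.txt
md5sum file_*.gz | awk '{print $2, $1}' >> output.txt
head -1 output.txt    # filename md5
```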

how to use do loop to read several files with similar names in shell script

I have several files named scale1.dat, scale2.dat scale3.dat ... up to scale9.dat.
I want to read these files in do loop one by one and with each file I want to do some manipulation (I want to write the 1st column of each scale*.dat file to scale*.txt).
So my question is: is there a way to read files with similar names? Thanks.
The regular syntax for this is
for file in scale*.dat; do
  awk '{print $1}' "$file" >"${file%.dat}.txt"
done
The asterisk * matches any text or no text; if you want to constrain to just single non-zero digits, you could say for file in scale[1-9].dat instead.
In Bash, there is a non-standard additional glob syntax scale{1..9}.dat but this is Bash-only, and so will not work in #!/bin/sh scripts. (Your question has both sh and bash so it's not clear which you require. Your comment that the Bash syntax is not working for you suggests that you may need a POSIX portable solution.) Furthermore, Bash has something called extended globbing, which allows for quite elaborate pattern matching. See also http://mywiki.wooledge.org/glob
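The difference is easy to see with echo: brace expansion is pure text generation and produces the names whether or not such files exist, whereas a glob like scale[1-9].dat only expands to files actually present.

```shell
# Bash-only: the braces expand before any filesystem lookup happens.
echo scale{1..3}.dat    # scale1.dat scale2.dat scale3.dat
```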
For a simple task like this, you don't really need the shell at all, though.
awk 'FNR==1 { if (f) close (f); f=FILENAME; sub(/\.dat/, ".txt", f); }
{ print $1 >f }' scale[1-9]*.dat
(Okay, maybe that's slightly intimidating for a first-timer. But the basic point is that you will often find that the commands you want to use will happily work on multiple files, and so you don't need shell loops at all in those cases.)
I don't think so. Similar names or not, you will have to iterate through all your files (perhaps with a for loop) and use a nested loop to iterate through lines or words or whatever you plan to read from those files.
Alternatively, you can copy your files into one (say, scale-all.dat) and read that single file.

Compare 2 files with shell script

I was trying to find a way to know whether two files are the same, and found this post:
Parsing result of Diff in Shell Script
I used the code in the first answer, but I think it's not working, or at least I can't get it to work properly...
I even tried making a copy of a file and comparing both (copy and original), and I still get the answer as if they were different, when they shouldn't be.
Could someone give me a hand, or explain what's happening?
Thanks so much;
peixe
Are you trying to compare if two files have the same content, or are you trying to find if they are the same file (two hard links)?
If you are just comparing two files, then try:
diff "$source_file" "$dest_file" # without -q
or
cmp "$source_file" "$dest_file" # without -s
in order to see the supposed differences.
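For a plain yes/no answer in a script, the exit status alone is enough; this sketch keeps the variable names from above, filled with stand-in files:

```shell
# cmp -s prints nothing and just sets the exit status:
# 0 means the two files have identical contents.
source_file=a.txt dest_file=b.txt
printf 'same\n' > "$source_file"
printf 'same\n' > "$dest_file"
if cmp -s "$source_file" "$dest_file"; then
    echo "identical"
else
    echo "different"
fi
```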
You can also try md5sum:
md5sum "$source_file" "$dest_file"
If both files return the same checksum, then they are identical.
comm is a useful tool for comparing files.
The comm utility will read file1 and file2, which should be ordered in the current collating sequence, and produce three text columns as output: lines only in file1; lines only in file2; and lines in both files.
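Note that comm requires sorted input, so unsorted files are usually fed through sort first (the process substitution here assumes bash; the file names are stand-ins):

```shell
# -1 and -2 suppress the "only in file1"/"only in file2" columns,
# leaving just the lines the two files share.
printf 'b\na\nc\n' > file1.txt
printf 'c\nb\nd\n' > file2.txt
comm -12 <(sort file1.txt) <(sort file2.txt)    # b and c
```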
