Diff-ing files with Linux command - linux

Which Linux command lets me check whether all the lines in file A exist in file B? (It's almost like a diff, but not quite.) File A has unique lines, and so does file B.

The comm command compares two sorted files, line by line, and is part of GNU coreutils.

Are you looking for a better diff tool?

So, what if A has
and b has
What would you want the output to be(yes or no)?

Use the diff command.
Here is a useful video covering the complete usage of the diff command in under 3 minutes.

if cat A A B | sort | uniq -c | grep -E '^[[:space:]]*2[[:space:]]' > /dev/null; then
    echo "A has lines that are not in B."
fi
If you do not redirect the output, you will get a list of all the lines that are in A but not in B (except that each line will have a 2 in front of it). This works because a line found only in A appears exactly twice in the combined input, a line found in both files appears three times, and a line found only in B appears once. It relies on the lines in A being unique, and the lines in B being unique.
If they aren't, and you don't care about counting duplicates, it's relatively simple to transform each file into a list of unique lines using sort and uniq.
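For comparison, here is a minimal sketch of the same check built on comm, which another answer mentioned. The function name all_lines_in is made up for illustration, and the sketch assumes mktemp is available. Both files are run through sort -u first, so the uniqueness assumption is not needed here.

```shell
# all_lines_in A B: succeed if every line of file A also appears in file B.
# (Hypothetical helper name, not a standard command.)
all_lines_in() {
    tmp_a=$(mktemp) && tmp_b=$(mktemp) || return 2
    # comm requires sorted input; -u also drops duplicate lines.
    sort -u "$1" > "$tmp_a"
    sort -u "$2" > "$tmp_b"
    # comm -23 prints only the lines unique to the first file.
    # grep -q '^' succeeds if comm printed at least one line.
    if comm -23 "$tmp_a" "$tmp_b" | grep -q '^'; then
        status=1
    else
        status=0
    fi
    rm -f "$tmp_a" "$tmp_b"
    return "$status"
}
```

Used as: if all_lines_in A B; then echo "every line of A is in B"; fi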


What is the usage of sorted command?

I have read most of the examples that come with the sort command. However, I am not sure what the sort command does when used in this style:
sort <word> sorted
That would just be two file names, as in
sort file1 file2 file3...
If you pass multiple file names, sort concatenates them and sorts all of them together.
If you're asking how to sort a string with the sort command:
echo "tatoine" | grep -o . | sort | tr -d "\n"
This works because sort operates on lines, so you have to split the string into multiple lines with one letter on each (grep -o .); after sorting, you delete the newlines with the tr command.
Are those < and > symbols explicit, or do they indicate a parameter that is to be replaced? If the latter, then you're reading from a file called "word", and writing the sorted data to a file called "sorted".
Are you trying to save the content in a sorted order?
Let's say you have a file name.txt with the following content.
You can use the sort command "sort name.txt" and the output goes to the console.
You can save the output using "sort name.txt -o sortedname.txt".
You can find more options with the commands "man sort" and "info sort".
rojomoke was right about the > and < symbols. Those are redirection operators.
Commands usually read their data from standard input (stdin), and their output goes to standard output (stdout), which is the screen by default.
< means get the data from somewhere else, e.g. a file.
> means redirect the output to somewhere else, e.g. a file.
So the command above, "sort name.txt -o sortedname.txt", could also have been written as follows.
sort < name.txt > sortedname.txt
You can read more about the redirection in this wiki entry.
Operators like | and >> will come in handy down the road.
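To make the equivalence concrete, here is a small sketch. The contents of name.txt are invented sample data (the original example content is not shown above), and the files live in a temporary directory created with mktemp.

```shell
workdir=$(mktemp -d)
# Hypothetical sample data standing in for name.txt.
printf '%s\n' carol alice bob > "$workdir/name.txt"

# Same result two ways: the -o option versus < and > redirection.
sort -o "$workdir/by_option.txt" "$workdir/name.txt"
sort < "$workdir/name.txt" > "$workdir/by_redirect.txt"
cmp -s "$workdir/by_option.txt" "$workdir/by_redirect.txt" && echo "identical"

# >> appends to a file instead of overwriting it,
# and | feeds one command's output into the next.
printf '%s\n' dave >> "$workdir/name.txt"
sort "$workdir/name.txt" | head -n 1    # prints: alice
```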

How can I find which lines in a certain file are not started by lines from another file using bash?

I have two text files, A and B:
a start
b stop
c start
e start
How can I find which lines in A do not start with any line from B using a shell (bash...) command? In this case, I want to get this answer:
a start
e start
Can I do this with a single command line?
This should do:
sed '/^$/d;s/^/^/' B | grep -vf - A
The sed command takes all non-empty lines from file B (note the /^$/d command), prepends a caret ^ to each line (to obtain an anchor for grep's regexp), and writes the result to stdout. Then grep, with the -f option (which means take all patterns from a file, which happens to be stdin here, thanks to the - symbol), does an inverted match (thanks to the -v option) on file A. Done.
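One caveat with the regexp approach: characters such as . or * in B are interpreted as regular-expression syntax. If the lines of B should be treated as literal prefixes, a sketch along these lines may help. The function name lines_not_started_by is made up for illustration, and it assumes the two arguments are different files.

```shell
# lines_not_started_by B A: print the lines of A that do not begin with
# any non-empty line of B, comparing literally (no regexp).
# (Hypothetical helper name.)
lines_not_started_by() {
    awk '
        # First file: remember each non-empty line as a prefix.
        FILENAME == ARGV[1] { if ($0 != "") prefixes[$0] = 1; next }
        # Second file: skip the line if any prefix sits at position 1.
        {
            for (p in prefixes)
                if (index($0, p) == 1) next
            print
        }
    ' "$1" "$2"
}
```

Used as: lines_not_started_by B A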
I think this should do it:
sed 's/^/\^/g' B > C.tmp
grep -vEf C.tmp A
rm C.tmp
You can try using a combination of cut, tr, and grep.
Save the first letter of each line of B into FIRSTLETTERLIST. You can do this with cut and tr.
The idea is to treat that list as a blacklist and then match it against the interesting file.
FIRSTLETTERLIST=$(cut -c1 B | tr -d '\n')
grep -v "^[$FIRSTLETTERLIST]" A
This only compares first characters, so I won't guarantee it covers every case, but it should point you in the right direction.

How to find the particular text stored in the file "data.txt" and it occurs only once

The line I seek is stored in the file data.txt and is the only line of text that occurs only once.
How do I go about finding that particular line using linux?
This is a little bit old, but I think you are looking for this...
cat data.txt | sort | uniq -u
This shows the lines that occur only once in the file. I assume you are familiar with OverTheWire if you are asking? If so, this is what you are looking for.
To provide some context (I need more rep to comment) this is a question that features in an online "wargame" called Bandit that involves using the command line to discover passwords on an online Linux server to advance up the levels.
For those who would like to see data.txt in full I've Pastebin'd it here however it looks like this:
One way to do it is to use:
sort data.txt | uniq -u
The sort command is like cat in that it displays the contents of the file; however, it sorts the file lexicographically by lines (it reorders them alphabetically so that matching ones are together).
The | is a pipe that redirects the output from one command into another.
The uniq command reports or omits repeated lines and by passing it the -u argument we tell it to report only unique lines.
Used together like this, the command will sort data.txt lexicographically by each line, find the unique line and print it back in the terminal for you.
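A quick sketch of why the sort step matters: uniq only compares adjacent lines, so on unsorted input it cannot see that two distant lines are the same. The sample data and the helper name lines_occurring_once are invented for illustration.

```shell
# Hypothetical helper: print the lines of a file that occur exactly once.
lines_occurring_once() {
    sort -- "$1" | uniq -u
}

sample=$(mktemp)
# Made-up data: only "fig" occurs once, but no duplicates are adjacent.
printf '%s\n' apple pear apple fig pear > "$sample"

uniq -u "$sample"                # without sort: all five lines come back
lines_occurring_once "$sample"   # with sort: prints fig
```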
sort -u data.txt | while IFS= read -r line; do if [ "$(grep -cxF -- "$line" data.txt)" -eq 1 ]; then echo "$line"; fi; done
was my solution, until I saw the easier one here:
sort data.txt | uniq -u
Add more information to your post.
What does data.txt look like?
Like this:
Or like this:
And do you know the password that is in the file, or are you searching for a string that is not repeated?
If you know the password, use something like this:
cat data.txt | grep 'password'
If you don't know the password and it is the only unique line in the file, you must write a script.
For example in Python
# Print every line of data.txt that contains the substring 'pass'.
with open("data.txt", "r") as file:
    for line in file:
        if 'pass' in line:
            print(line, end="")
Of course replace pass with something else.
For example some slice from line.
And one with only one tool in use, awk:
awk '{a[$0]++}END{for(i in a){if(a[i] == 1){print i} }}' data.txt
sort data.txt | uniq -c | grep '^ *1 '
This prints only the line that occurs exactly once.
Do not forget the space after the 1; it keeps counts such as 10 or 11 from matching.
sort data.txt | uniq -c | grep -E '^[[:space:]]*1[[:space:]]'
You will find the only line that occurs one time.

How to compare two text files for the same exact text using BASH?

Let's say I have two text files that I need to extract data out of. The text of the two files is as follows:
File 1:
1name - randomemail#email.com
2Name - superrandomemail#email.com
3Name - 123random#email.com
4Name - random123#email.com
File 2:
File 2 is File 1's list of domain names, extracted from the email addresses.
These are not the same domain names by any means, and are quite random.
How can I get the results of the domain names that match File 2 from File 1?
Thank you in advance!
Assuming that order does not matter,
grep -F -f FILE2 FILE1
should do the trick. (This works because of a little-known fact: the -F option to grep doesn't just mean "match this fixed string," it means "match any of these newline-separated fixed strings.")
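A small sketch of this on made-up data. The domains below are hypothetical stand-ins, since the real contents of File 2 are not shown in the question. One thing to watch: an empty line in the pattern file is an empty pattern, which matches every line.

```shell
workdir=$(mktemp -d)
# Hypothetical stand-in for File 1 (same "name - user#domain" layout).
printf '%s\n' \
    '1name - randomemail#example.com' \
    '2Name - superrandomemail#example.org' \
    '3Name - 123random#example.net' > "$workdir/file1"
# Hypothetical stand-in for File 2: one fixed string per line.
printf '%s\n' 'example.com' 'example.net' > "$workdir/file2"

# Prints the file1 lines containing any of the strings in file2
# (here the 1name and 3Name lines).
grep -F -f "$workdir/file2" "$workdir/file1"
```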
The recipe:
join <(sed 's/^.*#//' file1|sort -u) <(sort -u file2)
it will output the intersection of all domain names in file1 and file2
See BashFAQ/036 for the list of usual solutions to this type of problem.
Use the vimdiff command; it gives a nice presentation of the differences.
If I got you right, you want to filter for all addresses with the host mentioned in File 2.
You could then just loop over File 2 and grep for #<line>, accumulating the result in a new file or something similar.
sort -u file2 | while IFS= read -r host; do grep -F "#$host" file1; done > filtered

Sort & uniq in Linux shell

What is the difference between the following two commands?
sort -u FILE
sort FILE | uniq
Using sort -u does less I/O than sort | uniq, but the end result is the same. In particular, if the file is big enough that sort has to create intermediate files, there's a decent chance that sort -u will use slightly fewer or slightly smaller intermediate files as it could eliminate duplicates as it is sorting each set. If the data is highly duplicative, this could be beneficial; if there are few duplicates in fact, it won't make much difference (definitely a second order performance effect, compared to the first order effect of the pipe).
Note that there are times when the piping is appropriate. For example:
sort FILE | uniq -c | sort -n
This sorts the file into order of the number of occurrences of each line in the file, with the most repeated lines appearing last. (It wouldn't surprise me to find that this combination, which is idiomatic for Unix or POSIX, can be squished into one complex 'sort' command with GNU sort.)
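As a small illustration of that idiom, with an invented helper name (count_lines) and made-up data:

```shell
# Hypothetical helper: list each distinct line with its count,
# least frequent first and most frequent last.
count_lines() {
    sort -- "$1" | uniq -c | sort -n
}

words=$(mktemp)
# Made-up data: b occurs three times, a twice, c once.
printf '%s\n' b a b c b a > "$words"
count_lines "$words"    # the last line of output is the count for b
```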
There are times when not using the pipe is important. For example:
sort -u -o FILE FILE
This sorts the file 'in situ'; that is, the output file is specified by -o FILE, and this operation is guaranteed safe (the file is read before being overwritten for output).
There is one slight difference: return code.
Unless set -o pipefail is in effect, the return code of a pipeline is the return code of its last command. And uniq returns zero (success) here, because it reads its input from the pipe without error. Try examining the exit code, and you'll see something like this (pipefail is not set here):
pavel#lonely ~ $ sort -u file_that_doesnt_exist ; echo $?
sort: open failed: file_that_doesnt_exist: No such file or directory
pavel#lonely ~ $ sort file_that_doesnt_exist | uniq ; echo $?
sort: open failed: file_that_doesnt_exist: No such file or directory
Other than this, the commands are equivalent.
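A sketch that captures both exit codes so they can be compared side by side. The path is a hypothetical one that is assumed not to exist, and the error messages are sent to /dev/null to keep the output short.

```shell
# Hypothetical path, assumed not to exist.
missing=/nonexistent/file_that_doesnt_exist

# sort -u on its own: the exit code is sort's own failure code.
if sort -u "$missing" 2>/dev/null; then
    single_status=0
else
    single_status=$?
fi

# sort | uniq without pipefail: the exit code is uniq's, i.e. success.
sort "$missing" 2>/dev/null | uniq
pipeline_status=$?

echo "sort -u alone: $single_status"
echo "sort | uniq:   $pipeline_status"
```

In bash, running set -o pipefail first makes the pipeline report sort's failure as well.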
Beware! While it's true that "sort -u" and "sort|uniq" are equivalent, any additional options to sort can break the equivalence. Here's an example from the coreutils manual:
For example, 'sort -n -u' inspects only the value of the initial numeric string when checking for uniqueness, whereas 'sort -n | uniq' inspects the entire line.
Similarly, if you sort on key fields, the uniqueness test used by sort won't necessarily look at the entire line anymore. After being bitten by that bug in the past, these days I tend to use "sort|uniq" when writing Bash scripts. I'd rather have higher I/O overhead than run the risk that someone else in the shop won't know about that particular pitfall when they modify my code to add additional sort parameters.
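The pitfall is easy to reproduce. In this sketch the data is invented: two different lines share the same leading number.

```shell
data=$(mktemp)
# Made-up data: "1 apple" and "1 banana" differ, but both start with 1.
printf '%s\n' '1 apple' '1 banana' '2 cherry' > "$data"

# sort -n -u treats lines with equal numeric value as duplicates.
with_u=$(sort -n -u "$data" | wc -l | tr -d ' ')
# sort -n | uniq compares whole lines, so nothing is dropped.
with_uniq=$(sort -n "$data" | uniq | wc -l | tr -d ' ')

echo "sort -n -u:      $with_u lines"      # 2 lines
echo "sort -n | uniq:  $with_uniq lines"   # 3 lines
```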
sort -u will be slightly faster, because it does not need to pipe the output between two commands
also see my question on the topic: calling uniq and sort in different orders in shell
I have worked on some servers where sort doesn't support the '-u' option. There we have to use
sort xyz | uniq
Nothing, they will produce the same result
