Performing set operations using linux command [duplicate] - linux

This question already has answers here:
difference between the content of two files
(4 answers)
Closed 8 years ago.
I have the following two lists of numbers stored in two different files:
File A:
7
1
2
9
File B:
10
8
4
9
Now I want to compute the set operation A - B (i.e. find only those numbers that are in A but not in B). Is there some way to do this using a Linux command (like sed)? I know it is possible in Python, but I am curious whether the same can be done with standard Linux commands.

The simple, almost-working version is this:
grep -v -f file2 file1
That is, use lines from file2 as patterns; match them in file1 and print the ones not found (i.e. file1 - file2). However, what if file1 contains 10 and file2 contains 1? Then we have a problem because substrings are matched too. We can fix it this way:
grep -v -f <(sed 's/\(.*\)/^\1$/' file2) file1
That is, preprocess file2 to prepend ^ and append $ so the matching occurs against entire lines, not substrings in file1.
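Where GNU grep's flags are available, the same whole-line behaviour can be had without the sed preprocessing: -F treats the patterns as fixed strings and -x anchors each match to the entire line. A small sketch using the sample files from the question:

```shell
# Recreate the sample files from the question.
printf '7\n1\n2\n9\n' > fileA
printf '10\n8\n4\n9\n' > fileB

# -F: patterns are fixed strings (no regex metacharacters)
# -x: a pattern must match a whole line, so "1" no longer matches "10"
# -v: invert the match; -f: read patterns from fileB
# Result: lines in fileA that are not in fileB (A - B).
grep -Fxv -f fileB fileA
# prints 7, 1, 2 (one per line)
```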

diff is the tool for this:
$ diff f1 f2
1,3c1,3
< 7
< 1
< 2
---
> 10
> 8
> 4
f1 has these unique numbers: 7, 1, 2
f2 has these unique numbers: 10, 8, 4
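Not mentioned in the answers above, but worth knowing: the standard comm utility computes set differences directly, provided both inputs are sorted. A sketch with the question's sample files:

```shell
# Recreate the sample files from the question.
printf '7\n1\n2\n9\n' > fileA
printf '10\n8\n4\n9\n' > fileB

# comm requires sorted input.
sort fileA > A.sorted
sort fileB > B.sorted

# Column 1 = unique to A, column 2 = unique to B, column 3 = common.
# -2 and -3 suppress the latter two, leaving A - B (in sort order).
comm -23 A.sorted B.sorted
# prints 1, 2, 7 (one per line, sorted)
```

The trade-off versus grep is that the output comes back in sorted order rather than the original file order.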


merge 2 txt file in linux [duplicate]

This question already has answers here:
How to interleave lines from two text files
(6 answers)
Closed 2 years ago.
file1
1
2
3
4
file2
5
6
7
8
result I want
1
5
2
6
3
7
4
8
That is, file1 alternating with file2, not like this:
1
2
3
4
5
6
7
8
Can I merge the files like this?
As an alternative to the sed method suggested by @Michael, you can also use the paste command, in this case: paste -d '\n' file1 file2
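For reference, a small transcript of the paste approach with the question's sample files: the -d '\n' option makes paste emit a newline between each pair of joined lines, which produces exactly the interleaving asked for.

```shell
# Recreate the sample files from the question.
printf '1\n2\n3\n4\n' > file1
printf '5\n6\n7\n8\n' > file2

# paste pairs up corresponding lines of file1 and file2; with a
# newline delimiter, each pair lands on two consecutive lines.
paste -d '\n' file1 file2
# prints 1, 5, 2, 6, 3, 7, 4, 8 (one per line)
```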
Have you tried the below in the linux command line?
cat file2.txt >> file1.txt
This will append the text in file2 to file1.
Edit: Having seen that your query refers to interleaving the files, I think this article will help you with the sed command: https://unix.stackexchange.com/questions/288521/with-the-linux-cat-command-how-do-i-show-only-certain-lines-by-number

Unix Command Operations [duplicate]

This question already has answers here:
Filtering Rows Based On Number of Columns with AWK
(3 answers)
Closed 6 years ago.
Let's say there is a file in Linux whose lines are space-separated.
e.g.
This is linux file
This is linux text
This is linux file 1
This is linux file 3
Now I want to print only those rows that have a 5th column present. In this example my output should be the 3rd and 4th lines (with 1 and 3 as the 5th column).
What is the best way to do it?
This can be done with awk and its NF (number of fields) variable, as per the following transcript:
pax$ cat inputFile
This is linux file
This is linux text
This is linux file 1
This is linux file 3
pax$ awk 'NF >= 5 {print}' inputFile
This is linux file 1
This is linux file 3
This works because the basic form of an awk command is pattern { action }.
The pattern selects those lines (and sometimes things that aren't lines, such as with BEGIN and END patterns) which meet certain criteria, and the action dictates what to do.
In this case, it selects lines that have five or more fields and simply prints them.
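Since print is awk's default action, the action block can be omitted entirely and the pattern alone does the job:

```shell
# Recreate the sample input from the question.
printf 'This is linux file\nThis is linux text\nThis is linux file 1\nThis is linux file 3\n' > inputFile

# With no { action } given, awk prints every line matching the pattern.
awk 'NF >= 5' inputFile
# prints the two lines with five fields:
#   This is linux file 1
#   This is linux file 3
```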
In addition to awk, you can also do this quite simply in bash (or any of the shells) by reading each line into at least five fields, and then checking to ensure the fifth field is populated. Something like this will work (it reads from the filename given as the first argument, or from stdin if no name is given):
#!/bin/bash
fn="${1:-/dev/stdin}"
while read -r f1 f2 f3 f4 f5; do
[ -n "$f5" ] && printf "%s %s %s %s %s\n" "$f1" "$f2" "$f3" "$f4" "$f5"
done <"$fn"
Example
Using your data, the snippet above produces the following output:
$ bash prn5flds.sh dat/5fields.txt
This is linux file 1
This is linux file 3
(Note: depending on your shell, read may or may not support the -r option. If it doesn't, simply omit it.)

How to sort file names by specific part in linux? [duplicate]

This question already has answers here:
How can I sort file names by version numbers?
(7 answers)
Closed 8 years ago.
I have lots of files in my build folder and I am trying to sort them by using sort command.
The structure of the files are like that:
name - version - 'v' - build date
xxx-2.0.0-SNAPSHOT-V2014-07-10_18-01-05.log
xxx-2.0.0-SNAPSHOT-V2014-07-10_18-02-05.log
xxx-2.0.0-SNAPSHOT-V2014-07-10_18-03-05.log
xxx-2.0.0-SNAPSHOT-V2014-07-10_18-04-05.log
xxx-2.0.0-SNAPSHOT-V2014-07-10_18-05-05.log
If we assume the version string always stays three digits, sorting them is easy. But what if I add different versions like 2.1 or 2.0.0.2? I need a result like this:
xxx-2.1-SNAPSHOT-V2014-07-10_18-05-05.log
xxx-2.0.2-SNAPSHOT-V2014-07-10_18-04-05.log
xxx-2.0.0.2-SNAPSHOT-V2014-07-10_18-03-05.log
xxx-2.0.0.1-SNAPSHOT-V2014-07-10_18-02-05.log
xxx-2.0.-SNAPSHOT-V2014-07-10_18-01-05.log
$ cat file
xxx-2.0.2-SNAPSHOT-V2014-07-10_18-04-05.log
xxx-2.0.0.2-SNAPSHOT-V2014-07-10_18-03-05.log
xxx-2.1-SNAPSHOT-V2014-07-10_18-05-05.log
xxx-2.0.0.1-SNAPSHOT-V2014-07-10_18-02-05.log
xxx-2.0.-SNAPSHOT-V2014-07-10_18-01-05.log
$ sort -V -r -t- -k2,2 < file
xxx-2.1-SNAPSHOT-V2014-07-10_18-05-05.log
xxx-2.0.2-SNAPSHOT-V2014-07-10_18-04-05.log
xxx-2.0.0.2-SNAPSHOT-V2014-07-10_18-03-05.log
xxx-2.0.0.1-SNAPSHOT-V2014-07-10_18-02-05.log
xxx-2.0.-SNAPSHOT-V2014-07-10_18-01-05.log
Note: some implementations of sort do not support the -V option.
Explanation:
-V : Version sort
-t- : Split into columns with delimiter '-'
-k2,2: Sort by field 2 & only 2
-r : Reverse sort (based on your expected output; remove this flag if not required)

How can I use awk to display some certain fields in a text file? [duplicate]

This question already has answers here:
How to print third column to last column?
(19 answers)
Closed 9 years ago.
I have a text file like this:
1 http 2 3 4 5
2 dns 3 4
3 ftp 3 4 5 6 8
I want the output to be like this:
http 2 3 4 5
dns 3 4
ftp 3 4 5 6 8
Note that I just want to omit the first column of the file, and the number of fields per line is not fixed.
Can I accomplish this goal using awk?
You can also use cut: cut -d' ' -f2-.
Edit: If you have to use awk, try awk '{$1=""; print $0}'
something like this maybe?
awk '{$1 =""; print }' file
If you don't care about the field separators remaining in place, you can do this:
awk '{$1=""}1' filename
(Assuming filename is where you stored your data)
Drats, I was going to give you an Awk solution, then recommend cut. It looks like others have beaten me to the punch.
However, I see no sed solution yet!
$ sed -n 's/^[^ ][^ ]*//p' yourfile.txt
sed 's/^..//g' your_file
The above works on the condition that the first field is always a single character.
or in perl:
perl -pe 's/^[^\s]*\s//g' your_file
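One practical difference between the cut and awk answers above, worth noting: blanking $1 in awk rebuilds the record with the output field separator still in front, so the result keeps a leading space, whereas cut drops the first field and its delimiter cleanly. A quick check:

```shell
# A sample line in the question's format.
printf '1 http 2 3 4 5\n' > sample

# cut removes the first field and its delimiter.
cut -d' ' -f2- sample            # -> "http 2 3 4 5"

# awk empties $1 but keeps the separator, leaving a leading space.
awk '{$1=""; print $0}' sample   # -> " http 2 3 4 5"
```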

space/tab/newline insensitive comparison

Suppose I have these two files:
File 1:
1 2 3 4 5 6 7
File 2:
1
2
3
4
5
6
7
Is it possible to use diff to compare these two files so that they compare as equal?
(Or if not, what are other tools should I use? )
Thanks
You could collapse whitespace so file2 looks like file1, with every number on the same line:
$ cat file1
1 2 3 4 5 6 7
$ cat file2
1
2
4
3
5
6
7
$ diff <(echo $(< file1)) <(echo $(< file2))
1c1
< 1 2 3 4 5 6 7
---
> 1 2 4 3 5 6 7
Explanation:
< file # Equivalent to "cat file", but slightly faster since the shell doesn't
# have to fork a new process.
$(< file) # Capture the output of the "< file" command. Can also be written
# with backticks, as in `< file`.
echo $(< file) # Echo each word from the file. This will have the side effect of
# collapsing all of the whitespace.
<(echo $(< file)) # An advanced way of piping the output of one command to another.
# The shell opens an unused file descriptor (say fd 42) and pipes
# the echo command to it. Then it passes the filename /dev/fd/42 to
# diff. The result is that you can pipe two different echo commands
# to diff.
Alternatively, you may want to make file1 look like file2, with each number on a separate line. That will produce more useful diff output.
$ diff -u <(printf '%s\n' $(< file1)) <(printf '%s\n' $(< file2))
--- /dev/fd/63 2012-09-10 23:55:30.000000000 -0400
+++ file2 2012-09-10 23:47:24.000000000 -0400
@@ -1,7 +1,7 @@
1
2
-3
4
+3
5
6
7
This is similar to the first command with echo changed to printf '%s\n' to put a newline after each word.
Note: Both of these commands will fail if the files being diffed are overly long. This is because of the limit on command-line length. If that happens, you will need to work around the limitation, say by storing the output of echo/printf in temporary files.
Some diffs have -b (ignore blanks) and -w (ignore whitespace) options, but since Unix utilities are line-oriented, I don't think their notion of whitespace will include \n characters.
Double-check that your version of diff doesn't have some fancy GNU options, with diff --help | less or man diff.
Is your formatting above correct, with file 1's data all on one line? You could force file2 to match that format with:
awk '{printf"%s ", $0}' file2
Or, as mentioned in the comments, convert file 1:
awk '{for (i=1;i<=NF;i++) printf("%s\n", $i)}' file1
But I'm guessing that your data isn't really that simple. Also there are likely line length limitations that will appear when you can least afford the time to deal with them.
Probably not what you want to hear, but diffing complicated material like source code is not an exact science. So if you still need help, create a slightly more complicated test case and add it to your question.
Finally, you'll need to show us what you'd expect the output of such a diff project to look like. Right now I can't see any meaningful way to display such differences for a non-trivial case.
IHTH
If it turns out the data is simple enough not to run into those limitations, and the only difference between the files is that the first separates by spaces and the second by newlines, you can also use process substitution (as suggested above), but with sed replacing the spaces in the first file with newlines:
diff <(sed 's/ /\n/g' file1) file2
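One caveat: \n in a sed replacement is a GNU extension (BSD sed does not accept it). The same single-character translation can be done portably with tr, which also avoids process substitution by piping into diff:

```shell
# Recreate the sample files from the question.
printf '1 2 3 4 5 6 7\n' > file1
printf '1\n2\n3\n4\n5\n6\n7\n' > file2

# tr turns every space in file1 into a newline, making it
# line-oriented like file2; diff then compares the two directly
# ("-" tells diff to read one side from stdin).
tr ' ' '\n' < file1 | diff - file2 && echo "files match"
# prints: files match
```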
