Compare text between specific lines using diff - linux

I have two Java files saved as text files and I want to compare them only between a specific set of lines. How do I specify the line numbers to compare in the diff command?

Diff can't do that on its own, but you can use head and tail to extract the lines and process substitution to feed the results to diff:
diff <(head -n 100 file1.java | tail -n 20) <(head -n 120 file2.java | tail -n 25)
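The command above compares lines 81-100 of file1.java with lines 96-120 of file2.java. If you need to do this repeatedly, the same head/tail idea can be wrapped in a small bash function; this is only a sketch, and the name diff_lines and its argument order are invented for illustration:

# Usage: diff_lines FILE1 START1 END1 FILE2 START2 END2
diff_lines() {
    diff <(head -n "$3" "$1" | tail -n "$(( $3 - $2 + 1 ))") \
         <(head -n "$6" "$4" | tail -n "$(( $6 - $5 + 1 ))")
}

# e.g. compare lines 81-100 of file1.java with lines 96-120 of file2.java
diff_lines file1.java 81 100 file2.java 96 120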

Related

diff command to get number of different lines only

Can I use the diff command to find out how many lines do two files differ in?
I don't want the contextual difference, just the total number of lines that are different between two files. Best if the result is just a single integer.
diff can do the first part of the job but not the counting; wc -l does the rest:
diff -y --suppress-common-lines file1 file2 | wc -l
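For example, with two small throwaway files (names and contents here are invented purely to illustrate), the pipeline prints a single integer:

printf 'a\nb\nc\n' > old.txt
printf 'a\nX\nc\n' > new.txt
diff -y --suppress-common-lines old.txt new.txt | wc -l
# prints 1, since only the second line differs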
Yes you can, and in true Linux fashion you can use a number of commands piped together to perform the task.
First you need to use the diff command, to get the differences in the files.
diff file1 file2
This will give you an output listing the changes. The ones you're interested in are the lines prefixed with a '>' symbol.
You can use the grep tool to filter these out as follows:
diff file1 file2 | grep "^>"
Finally, once you have a list of the changes you're interested in, simply use the wc command in line-count mode (-l) to count the number of changes.
diff file1 file2 | grep "^>" | wc -l
and you have a perfect example of the philosophy that Linux is all about.
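To see why filtering on '^>' works, here is what diff prints for two small invented example files; the '>' lines are the ones coming from the second file:

printf 'a\nb\nc\n' > file1
printf 'a\nX\nc\nd\n' > file2
diff file1 file2
# 2c2
# < b
# ---
# > X
# 3a4
# > d
diff file1 file2 | grep "^>" | wc -l
# prints 2

Note that this counts lines added or changed in file2; lines that exist only in file1 (prefixed with '<') are not counted.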

Grep only a part of a text file

How can I apply the following command to only a part of a text file? For example, from the beginning to line 5000.
grep "^ A : 11 B : 10" filename | wc -l
I cannot use head and then apply the above command since the text file is huge.
You could try using the sed command, which I believe handles large files better, and pipe it to grep:
sed -n 1,5000p file | grep ...
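Putting that together with the pattern and the count from the question (grep -c counts matching lines, which is equivalent to piping to wc -l here):

sed -n '1,5000p' filename | grep -c "^ A : 11 B : 10"

# a variant that stops reading the file after line 5000 instead of scanning to the end:
sed '5000q' filename | grep -c "^ A : 11 B : 10"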
You can try a combination of -n (prefix each output line with its line number) and -m (limit the number of matching lines). Something like this:
grep -n -m 5000 pattern file.txt | grep -B 5000 "^5000:" | wc -l
The first grep searches for the pattern, adds line numbers, and limits the output to the first 5000 matching lines (worst-case scenario: every line in the range matches the pattern). The second grep matches the line numbered 5000 and prints it together with the matching lines before it.
I don't know if this is a more efficient solution, though.

Compare two files in Linux: ignoring first and last lines

I'd like to compare two files, but I don't want to take into account the first 10 lines or the last 3 lines of either file. I tried to do it with the diff and tail commands, but without success.
How can I do it?
Use GNU tail and head:
To ignore the first 10 lines of a file, use tail like this:
tail -n +11 file
To ignore the last 3 lines of a file, use head like this:
head -n -3 file
You can then construct your diff command using process substitution as follows:
diff <(tail -n +11 file1 | head -n -3) <(tail -n +11 file2 | head -n -3)
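If the number of lines to skip varies, you can parameterize it with a small bash function; this is only a sketch assuming GNU head and tail, and the name strip_edges is invented:

# Print a file ($1) without its first $2 and its last $3 lines
strip_edges() {
    tail -n +"$(( $2 + 1 ))" "$1" | head -n -"$3"
}

diff <(strip_edges file1 10 3) <(strip_edges file2 10 3)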

Find line number in a text file - without opening the file

In a very large file I need to find the position (line number) of a string, then extract the 2 lines above and below that string.
To do this right now, I launch vi, find the string, note its line number, exit vi, then use sed to extract the lines surrounding that string.
Is there a way to streamline this process... ideally without having to run vi at all.
Maybe using grep like this:
grep -n -2 your_searched_for_string your_large_text_file
This will give you almost what you expect:
-n : tells grep to print the line number
-2 : print 2 additional lines (and the wanted string, of course)
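For reference, GNU grep marks matched lines with ':' after the line number and context lines with '-'. With a hypothetical log file and search string (file name, line numbers, and content are invented), the output would look something like:

grep -n -2 "connection refused" server.log
# 41-retrying upstream connection
# 42-opening socket to 10.0.0.5
# 43:connect to 10.0.0.5 failed: connection refused
# 44-backing off for 5 seconds
# 45-giving up after 3 attempts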
You can do
grep -C 2 yourSearch yourFile
To send it in a file, do
grep -C 2 yourSearch yourFile > result.txt
Use grep -n string file to find the line number without opening the file.
You can use cat -n to prepend line numbers, grep for the word, and then use awk to extract the line number:
cat -n FILE | grep WORD | awk '{print $1;}'
Although grep already does what you mention if you give it -C 2 (2 lines above/below the match):
grep -C 2 WORD FILE
You can do it with grep -A and -B options, like this:
grep -B 2 -A 2 "searchstring" file | sed 3d
grep will find the line and show two lines of context before and after it; sed then deletes the third line of that output (the match itself), leaving only the surrounding lines.
If you want to automate this, you can simply write a shell script. You may try the following:
#!/bin/bash
VAL="your_search_keyword"
NUM1=$(grep -n "$VAL" file.txt | cut -f1 -d ':')
echo "$NUM1"                          # show the line number of the matched keyword
MYNUMUP=$((NUM1 - 1))                 # line number above the keyword
MYNUMDOWN=$((NUM1 + 1))               # line number below the keyword
sed -n "${MYNUMUP}p" file.txt         # display the line above the keyword
sed -n "${MYNUMDOWN}p" file.txt       # display the line below the keyword
The advantage of the script is that you can change the keyword in the VAL variable as you like and re-run it to get the output you need.
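For example, with a small hypothetical file.txt (contents invented for illustration) and VAL="gamma", the script would print:

printf 'alpha\nbeta\ngamma\ndelta\n' > file.txt
# running the script then outputs:
# 3        (line number of the match)
# beta     (the line above)
# delta    (the line below)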

How to crop (cut) text files based on starting and ending line numbers in cygwin?

I have a few log files, around 100 MB each.
Personally I find it cumbersome to deal with such big files. I know that the log lines interesting to me are only around 200 to 400 lines in total.
What would be a good way to extract the relevant log lines from these files, i.e. I just want to pipe a range of line numbers to another file?
For example, the inputs are:
filename: MyHugeLogFile.log
Starting line number: 38438
Ending line number: 39276
Is there a command that I can run in cygwin to cat out only that range in that file? I know that if I can somehow display that range in stdout then I can also pipe to an output file.
Note: Adding Linux tag for more visibility, but I need a solution that might work in cygwin. (Usually linux commands do work in cygwin).
Sounds like a job for sed:
sed -n '8,12p' yourfile
...will send lines 8 through 12 of yourfile to standard out.
If you want to prepend the line number, you may wish to use cat -n first:
cat -n yourfile | sed -n '8,12p'
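With the line numbers from the question (the output file name is just an example), that becomes:

sed -n '38438,39276p' MyHugeLogFile.log > extracted.log

# optionally tell sed to stop reading once the range ends, so it doesn't scan the rest of the file:
sed -n '38438,39276p;39276q' MyHugeLogFile.log > extracted.log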
You can use wc -l to figure out the total # of lines.
You can then combine head and tail to get at the range you want. Let's assume the log is 40,000 lines and you want lines 38,438 to 39,276: that means you want the last 1,563 lines (40,000 - 38,438 + 1), and of those you want the first 839 (39,276 - 38,438 + 1). So:
tail -n 1563 MyHugeLogFile.log | head -n 839 | ....
Or there's probably an easier way using sed or awk.
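If you do stick with head and tail, the arithmetic is easier to get right when the shell computes it; with the numbers from the question (output file name again just an example):

START=38438
END=39276
tail -n +"$START" MyHugeLogFile.log | head -n "$(( END - START + 1 ))" > extracted.log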
I saw this thread when I was trying to split a file into files of 100,000 lines each. A better solution than sed for that is:
split -l 100000 database.sql database-
It will give files like:
database-aaa
database-aab
database-aac
...
And if you simply want to cut out part of a file - say from line 26 to 142 - and write it to a new file:
sed -n '26,142p' file-to-cut.txt >> new-file.txt
How about this:
$ seq 1 100000 | tail -n +10000 | head -n 10
10000
10001
10002
10003
10004
10005
10006
10007
10008
10009
It uses tail to output from the 10,000th line and onwards and then head to only keep 10 lines.
The same (almost) result with sed:
$ seq 1 100000 | sed -n '10000,10010p'
10000
10001
10002
10003
10004
10005
10006
10007
10008
10009
10010
This one has the advantage of allowing you to input the line range directly.
If you are interested only in the last X lines, you can use the "tail" command like this.
$ tail -n XXXXX yourlogfile.log >> mycroppedfile.txt
This will write the last XXXXX lines of your log file to "mycroppedfile.txt" (note that >> appends if that file already exists).
This is an old thread but I was surprised nobody mentioned grep. The -A option allows specifying a number of lines to print after a search match and the -B option includes lines before a match. The following command would output 10 lines before and 10 lines after occurrences of "my search string" in the file "mylogfile.log":
grep -A 10 -B 10 "my search string" mylogfile.log
If there are multiple matches within a large file the output can rapidly get unwieldy. Two helpful options are -n which tells grep to include line numbers and --color which highlights the matched text in the output.
If there is more than one file to be searched, grep allows multiple files to be listed, separated by spaces. Wildcards can also be used. Putting it all together:
grep -A 10 -B 10 -n --color "my search string" *.log someOtherFile.txt
