How to highlight the differences between subsequent lines in a file? - linux

I do a lot of urgent analysis of large logfile analysis. Often this will require tailing a log and looking for changes.
I'm keen to have a solution that will highlight these changes to make it easier for the eye to track.
I have investigated tools and there doesn't appear to be anything out there that does what I am looking for. I've written some scripts in Perl that do it roughly, but I would like a more complete solution.
Can anyone recommend a tool for this?

Levenshtein distance
Wikipedia:
Levenshtein distance between two strings is minimum number of operations needed to transform one string into the other, where an operation is an insertion, deletion, or substitution of a single character.
public static int LevenshteinDistance(char[] s1, char[] s2) {
int s1p = s1.length, s2p = s2.length;
int[][] num = new int[s1p + 1][s2p + 1];
// fill arrays
for (int i = 0; i <= s1p; i++)
num[i][0] = i;
for (int i = 0; i <= s2p; i++)
num[0][i] = i;
for (int i = 1; i <= s1p; i++)
for (int j = 1; j <= s2p; j++)
num[i][j] = Math.min(Math.min(num[i - 1][j] + 1,
num[i][j - 1] + 1), num[i - 1][j - 1]
+ (s1[i - 1] == s2[j - 1] ? 0 : 1));
return num[s1p][s2p];
}
Sample App in Java
String Diff
Application uses LCS algorithm to concatenate 2 text inputs into 1. Result will contain minimal set of instructions to make one string for the other. Below the instruction concatenated text is displayed.
Download application:
String Diff.jar
Download source:
Diff.java

I wrote a Python script for this purpose that utilizes difflib.SequenceMatcher:
#!/usr/bin/python3
from difflib import SequenceMatcher
from itertools import tee
from sys import stdin
def pairwise(iterable):
"""s -> (s0,s1), (s1,s2), (s2, s3), ...
https://docs.python.org/3/library/itertools.html#itertools-recipes
"""
a, b = tee(iterable)
next(b, None)
return zip(a, b)
def color(c, s):
"""Wrap string s in color c.
Based on http://stackoverflow.com/a/287944/1916449
"""
try:
lookup = {'r':'\033[91m', 'g':'\033[92m', 'b':'\033[1m'}
return lookup[c] + str(s) + '\033[0m'
except KeyError:
return s
def diff(a, b):
"""Returns a list of paired and colored differences between a and b."""
for tag, i, j, k, l in SequenceMatcher(None, a, b).get_opcodes():
if tag == 'equal': yield 2 * [color('w', a[i:j])]
if tag in ('delete', 'replace'): yield color('r', a[i:j]), ''
if tag in ('insert', 'replace'): yield '', color('g', b[k:l])
if __name__ == '__main__':
for a, b in pairwise(stdin):
print(*map(''.join, zip(*diff(a, b))), sep='')
Example input.txt:
108 finished /tmp/ts-out.5KS8bq 0 435.63/429.00/6.29 ./eval.exe -z 30
107 finished /tmp/ts-out.z0tKmX 0 456.10/448.36/7.26 ./eval.exe -z 30
110 finished /tmp/ts-out.wrYCrk 0 0.00/0.00/0.00 tail -n 1
111 finished /tmp/ts-out.HALY18 0 460.65/456.02/4.47 ./eval.exe -z 30
112 finished /tmp/ts-out.6hdkH5 0 292.26/272.98/19.12 ./eval.exe -z 1000
113 finished /tmp/ts-out.eFBgoG 0 837.49/825.82/11.34 ./eval.exe -z 10
Output of cat input.txt | ./linediff.py:

http://neil.fraser.name/software/diff_match_patch/svn/trunk/demos/demo_diff.html
.. this look promising, will update this with more info when Ive played more..

Related

Tree formation issue by the given custom inputs

You are given a tree of N nodes rooted at 1. Each node of the tree has a color associated with it. Now you are given Q queries. In each query, you are given a node number X and for each query you have to mark the node X as special and all the other nodes in its subtree with the same color also as special. If a node is marked as special in a query then for all the other subsequent queries, it remains marked as special.
For each query, you need to print the total number of special nodes in the tree after you perform the marking operation in the query.
Input
The first line contains an integer N as input denoting the total number of nodes in the tree. Next, N−1 lines contain two integers U and V which denotes there is an edge between the nodes U and V in the tree. 
Next line contains N space separated integers that denotes the color of each node of the tree.
Next line contains an integer Q as input that denotes the count of queries.
Next Q lines contain an integer X that denotes the node whose subtree needs to be marked as special for that query.
Output
For each query, you need to print the count of nodes that are special after this query is performed.
Sample Input:
5
3 1
3 2
2 4
2 5
1 1 2 2 1
4
2
4
5
1
Sample Output:
2
3
3
4
Issue is when I try to change the test case ,the code fails (Runtime error)
5
3 1
2 4
3 2
2 5
1 1 2 2 1
4
2
4
5
1
Expected output must be the same as above
Here is the code we have used :
#the result will be shown here.
special=[]
def remove_1_from_tuple(tup):
if(tup[0]==1):
return tup[1]
else:
return tup[0]
def make_tree(node,hashmap):
latest=node
key=latest.data
if(key in hashmap):
hmap=hashmap[key]
for val in hmap:
latest.children.append(Node(None,val))
for child in latest.children:
make_tree(child,hashmap)
class Node():
def __init__(self, tree, data, parent=None):
self.special=False
self.data = data
self.parent = parent
self.children = []
self.tree = tree
def set_color(self,color):
self.color=color
def set_special(self):
self.special=True
def find(self, x):
if self.data is x: return self
for node in self.children:
n = node.find(x)
if n: return n
return None
def get_same_color_in_sub_tree(self,x):
for node in self.children:
if(x.color==node.color):
return node
node.get_same_color_in_sub_tree(node)
return None
Nodes_length=int(input())
#the tree is a node of 1 (root node)
n = Node(None,1)
node_tuples=[]
while(Nodes_length>1):
data_A,data_B=map(int,input().split(" "))
node_tuples.append((data_A,data_B))
Nodes_length=Nodes_length-1
elem=remove_1_from_tuple(node_tuples[0])
n.children.append(Node(None,elem))
node_tuples=node_tuples[1::]
#create a hashmap..
hashmap={}
for key,value in node_tuples:
if(key in hashmap):
hashmap[key].append(value)
else:
hashmap[key]=[]
hashmap[key].append(value)
#now make the tree suign recursive functon...
make_tree(n.children[0],hashmap)
#Assign color to each node..
colors=list(map(int,input().split(" ")))
for index in range(0,len(colors)):
node_value=index+1
node=n.find(node_value)
node.set_color(colors[index])
#get the count of the operations now..
operations_count=int(input())
#run the operations
while(operations_count>0):
node_value=int(input())
CTR=0
if(len(special)!=0):
CTR=special[-1]
node=n.find(node_value)
if(node.special==False):
node.set_special()
CTR=CTR+1
#traverse throught he sub tree and check if the color is same
same_color_node=node.get_same_color_in_sub_tree(node)
if(same_color_node!=None):
#mark that node as special..
same_color_node.set_special()
#increment the counter by 1
CTR=CTR+1
special.append(CTR)
operations_count=operations_count-1
for obj in special:
print(obj)
Thanks
This is the code:
#include<bits/stdc++.h>
using namespace std;
const int maxn = 1e5;
vector<int> graph[maxn + 5];
bool special[maxn + 5];
int A[maxn + 5];
int res;
int parent[maxn + 5];
unordered_set<int> chalo;
void dfs(int u, int p, int col){
if(col == A[u])
special[u] = true;
if(special[u])
chalo.insert(u);
for (int i: graph[u]){
if(i == p) continue;
dfs(i, u, col);
}
}
void dfs1(int u, int p){
for (int i: graph[u]){
if(i == p) continue;
parent[i] = u;
dfs1(i, u);
}
}
int main(){
int n; cin >> n;
for (int i = 0, x,y; i < n-1; ++i){
cin >> x >> y;
graph[x].push_back(y);
graph[y].push_back(x);
}
for (int i = 1; i <= n; ++i)
cin >> A[i];
int q; cin >> q;
dfs1(1, -1);
while(q--){
int x; cin >> x;
int p = parent[x];
dfs(x, parent[x], A[x]);
cout << chalo.size() << endl;
}
return 0;
}

total substrings with k ones

Given a binary string s, we need to find the number of its substrings, containing exactly k characters that are '1'.
For example: s = "1010" and k = 1, answer = 6.
Now, I solved it using binary search technique over the cumulative sum array.
I also used another approach to solve it. The approach is as follows:
For each position i, find the total substrings that end at i containing
exactly k characters that are '1'.
To find the total substrings that end at i containing exactly k characters that are 1, it can be represented as the set of indices j such that substring j to i contains exactly k '1's. The answer would be the size of the set. Now, to find all such j for the given position i, we can rephrase the problem as finding all j such that
number of ones from [1] to [j - 1] = the total number of ones from 1 to i - [the total number of ones from j to i = k].
i.e. number of ones from [1] to [j - 1] = C[i] - k
which is equal to
C[j - 1] = C[i] - k,
where C is the cumulative sum array, where
C[i] = sum of characters of string from 1 to i.
Now, the problem is easy because, we can find all the possible values of j's using the equation by counting all the prefixes that sum to C[i] - k.
But I found this solution,
int main() {
cin >> k >> S;
C[0] = 1;
for (int i = 0; S[i]; ++i) {
s += S[i] == '1';
++C[s];
}
for (int i = k; i <= s; ++i) {
if (k == 0) {
a += (C[i] - 1) * C[i] / 2;
} else {
a += C[i] * C[i - k];
}
}
cout << a << endl;
return 0;
}
In the code, S is the given string and K as described above, C is the cumulative sum array and a is the answer.
What is the code exactly doing by using multiplication, I don't know.
Could anybody explain the algorithm?
If you see the way C[i] is calculated, C[i] represents the number of characters between ith 1 and i+1st 1.
If you take an example S = 1001000
C[0] = 1
C[1] = 3 // length of 100
C[2] = 4 // length of 1000
So coming to your doubt, Why multiplication
Say your K=1, then you want to find out the substring which have only one 1, now you know that after first 1 there are two zeros since C[1] = 3. So number of of substrings will be 3, because you have to include this 1.
{1,10,100}
But when you come to the second part: C[2] =4
now if you see 1000 and you know that you can make 4 substrings (which is equal to C[2])
{1,10,100,1000}
and also you should notice that there are C[1]-1 zeroes before this 1.
So by including those zeroes you can make more substring, in this case by including 0 once
0{1,10,100,1000}
=> {01,010,0100,01000}
and 00 once
00{1,10,100,1000}
=> {001,0010,00100,001000}
so essentially you are making C[i] substrings starting with 1 and you can append i number of zeroes before this one and make another C[i] * C[i-k]-1 substrings. i varies from 1 to C[i-k]-1 (-1 because we want to leave that last one).
((C[i-k]-1)* C[i]) +C[i]
=> C[i-k]*C[i]

Modified longest common substring

Given two strings what is an efficient algorithm to find the number and length of longest common sub-strings with the sub-strings being called common if :
1) they have at-least x% characters same and at same position.
2) the start and end indexes of the sub-strings being same.
Ex :
String 1 -> abedefkhj
String 2 -> kbfdfjhlo
suppose the x% being asked is 40,then, ans is,
5 1
where 5 is the longest length and 1 is the number of sub-strings in each string satisfying the given property. Sub-String is "abede" in string 1 and "kbfdf" in string 2.
You can use smth like Levenshtein distance without deleting and inserting.
Build the table, where every element [i, j] is error for substring from position [i] to position [j].
foo(string a, string b, int x):
len = min(a.length, b.length)
error[0][0] = 0 if a[0] == b[0] else 1;
for (end: [1 -> len-1]):
for (start: [end -> 0]):
if a[end] == b[end]:
error[start][end] = error[start][end - 1]
else:
error[start][end] = error[start][end - 1] + 1
best_len = 0;
best_pos = 0;
for (i: [0 -> len-1]):
for (j: [i -> 0]):
len = i - j + 1
error_percent = 100 * error[i][j] / len
if (error_percent <= x and len > best_len):
best_len = len
best_pos = j
return (best_len, best_pos)

Finding minimum moves required for making 2 strings equal

This is a question from one of the online coding challenge (which has completed).
I just need some logic for this as to how to approach.
Problem Statement:
We have two strings A and B with the same super set of characters. We need to change these strings to obtain two equal strings. In each move we can perform one of the following operations:
1. swap two consecutive characters of a string
2. swap the first and the last characters of a string
A move can be performed on either string.
What is the minimum number of moves that we need in order to obtain two equal strings?
Input Format and Constraints:
The first and the second line of the input contains two strings A and B. It is guaranteed that the superset their characters are equal.
1 <= length(A) = length(B) <= 2000
All the input characters are between 'a' and 'z'
Output Format:
Print the minimum number of moves to the only line of the output
Sample input:
aab
baa
Sample output:
1
Explanation:
Swap the first and last character of the string aab to convert it to baa. The two strings are now equal.
EDIT : Here is my first try, but I'm getting wrong output. Can someone guide me what is wrong in my approach.
int minStringMoves(char* a, char* b) {
int length, pos, i, j, moves=0;
char *ptr;
length = strlen(a);
for(i=0;i<length;i++) {
// Find the first occurrence of b[i] in a
ptr = strchr(a,b[i]);
pos = ptr - a;
// If its the last element, swap with the first
if(i==0 && pos == length-1) {
swap(&a[0], &a[length-1]);
moves++;
}
// Else swap from current index till pos
else {
for(j=pos;j>i;j--) {
swap(&a[j],&a[j-1]);
moves++;
}
}
// If equal, break
if(strcmp(a,b) == 0)
break;
}
return moves;
}
Take a look at this example:
aaaaaaaaab
abaaaaaaaa
Your solution: 8
aaaaaaaaab -> aaaaaaaaba -> aaaaaaabaa -> aaaaaabaaa -> aaaaabaaaa ->
aaaabaaaaa -> aaabaaaaaa -> aabaaaaaaa -> abaaaaaaaa
Proper solution: 2
aaaaaaaaab -> baaaaaaaaa -> abaaaaaaaa
You should check if swapping in the other direction would give you better result.
But sometimes you will also ruin the previous part of the string. eg:
caaaaaaaab
cbaaaaaaaa
caaaaaaaab -> baaaaaaaac -> abaaaaaaac
You need another swap here to put back the 'c' to the first place.
The proper algorithm is probably even more complex, but you can see now what's wrong in your solution.
The A* algorithm might work for this problem.
The initial node will be the original string.
The goal node will be the target string.
Each child of a node will be all possible transformations of that string.
The current cost g(x) is simply the number of transformations thus far.
The heuristic h(x) is half the number of characters in the wrong position.
Since h(x) is admissible (because a single transformation can't put more than 2 characters in their correct positions), the path to the target string will give the least number of transformations possible.
However, an elementary implementation will likely be too slow. Calculating all possible transformations of a string would be rather expensive.
Note that there's a lot of similarity between a node's siblings (its parent's children) and its children. So you may be able to just calculate all transformations of the original string and, from there, simply copy and recalculate data involving changed characters.
You can use dynamic programming. Go over all swap possibilities while storing all the intermediate results along with the minimal number of steps that took you to get there. Actually, you are going to calculate the minimum number of steps for every possible target string that can be obtained by applying given rules for a number times. Once you calculate it all, you can print the minimum number of steps, which is needed to take you to the target string. Here's the sample code in JavaScript, and its usage for "aab" and "baa" examples:
function swap(str, i, j) {
var s = str.split("");
s[i] = str[j];
s[j] = str[i];
return s.join("");
}
function calcMinimumSteps(current, stepsCount)
{
if (typeof(memory[current]) !== "undefined") {
if (memory[current] > stepsCount) {
memory[current] = stepsCount;
} else if (memory[current] < stepsCount) {
stepsCount = memory[current];
}
} else {
memory[current] = stepsCount;
calcMinimumSteps(swap(current, 0, current.length-1), stepsCount+1);
for (var i = 0; i < current.length - 1; ++i) {
calcMinimumSteps(swap(current, i, i + 1), stepsCount+1);
}
}
}
var memory = {};
calcMinimumSteps("aab", 0);
alert("Minimum steps count: " + memory["baa"]);
Here is the ruby logic for this problem, copy this code in to rb file and execute.
str1 = "education" #Sample first string
str2 = "cnatdeiou" #Sample second string
moves_count = 0
no_swap = 0
count = str1.length - 1
def ends_swap(str1,str2)
str2 = swap_strings(str2,str2.length-1,0)
return str2
end
def swap_strings(str2,cp,np)
current_string = str2[cp]
new_string = str2[np]
str2[cp] = new_string
str2[np] = current_string
return str2
end
def consecutive_swap(str,current_position, target_position)
counter=0
diff = current_position > target_position ? -1 : 1
while current_position!=target_position
new_position = current_position + diff
str = swap_strings(str,current_position,new_position)
# p "-------"
# p "CP: #{current_position} NP: #{new_position} TP: #{target_position} String: #{str}"
current_position+=diff
counter+=1
end
return counter,str
end
while(str1 != str2 && count!=0)
counter = 1
if str1[-1]==str2[0]
# p "cross match"
str2 = ends_swap(str1,str2)
else
# p "No match for #{str2}-- Count: #{count}, TC: #{str1[count]}, CP: #{str2.index(str1[count])}"
str = str2[0..count]
cp = str.rindex(str1[count])
tp = count
counter, str2 = consecutive_swap(str2,cp,tp)
count-=1
end
moves_count+=counter
# p "Step: #{moves_count}"
# p str2
end
p "Total moves: #{moves_count}"
Please feel free to suggest any improvements in this code.
Try this code. Hope this will help you.
public class TwoStringIdentical {
static int lcs(String str1, String str2, int m, int n) {
int L[][] = new int[m + 1][n + 1];
int i, j;
for (i = 0; i <= m; i++) {
for (j = 0; j <= n; j++) {
if (i == 0 || j == 0)
L[i][j] = 0;
else if (str1.charAt(i - 1) == str2.charAt(j - 1))
L[i][j] = L[i - 1][j - 1] + 1;
else
L[i][j] = Math.max(L[i - 1][j], L[i][j - 1]);
}
}
return L[m][n];
}
static void printMinTransformation(String str1, String str2) {
int m = str1.length();
int n = str2.length();
int len = lcs(str1, str2, m, n);
System.out.println((m - len)+(n - len));
}
public static void main(String[] args) {
Scanner scan = new Scanner(System.in);
String str1 = scan.nextLine();
String str2 = scan.nextLine();
printMinTransformation("asdfg", "sdfg");
}
}

Minimum no. of comparisons to find median of 3 numbers

I was implementing quicksort and I wished to set the pivot to be the median or three numbers. The three numbers being the first element, the middle element, and the last element.
Could I possibly find the median in less no. of comparisons?
median(int a[], int p, int r)
{
int m = (p+r)/2;
if(a[p] < a[m])
{
if(a[p] >= a[r])
return a[p];
else if(a[m] < a[r])
return a[m];
}
else
{
if(a[p] < a[r])
return a[p];
else if(a[m] >= a[r])
return a[m];
}
return a[r];
}
If the concern is only comparisons, then this should be used.
int getMedian(int a, int b , int c) {
int x = a-b;
int y = b-c;
int z = a-c;
if(x*y > 0) return b;
if(x*z > 0) return c;
return a;
}
int32_t FindMedian(const int n1, const int n2, const int n3) {
auto _min = min(n1, min(n2, n3));
auto _max = max(n1, max(n2, n3));
return (n1 + n2 + n3) - _min - _max;
}
You can't do it in one, and you're only using two or three, so I'd say you've got the minimum number of comparisons already.
Rather than just computing the median, you might as well put them in place. Then you can get away with just 3 comparisons all the time, and you've got your pivot closer to being in place.
T median(T a[], int low, int high)
{
int middle = ( low + high ) / 2;
if( a[ middle ].compareTo( a[ low ] ) < 0 )
swap( a, low, middle );
if( a[ high ].compareTo( a[ low ] ) < 0 )
swap( a, low, high );
if( a[ high ].compareTo( a[ middle ] ) < 0 )
swap( a, middle, high );
return a[middle];
}
I know that this is an old thread, but I had to solve exactly this problem on a microcontroller that has very little RAM and does not have a h/w multiplication unit (:)). In the end I found the following works well:
static char medianIndex[] = { 1, 1, 2, 0, 0, 2, 1, 1 };
signed short getMedian(const signed short num[])
{
return num[medianIndex[(num[0] > num[1]) << 2 | (num[1] > num[2]) << 1 | (num[0] > num[2])]];
}
If you're not afraid to get your hands a little dirty with compiler intrinsics you can do it with exactly 0 branches.
The same question was discussed before on:
Fastest way of finding the middle value of a triple?
Though, I have to add that in the context of naive implementation of quicksort, with a lot of elements, reducing the amount of branches when finding the median is not so important because the branch predictor will choke either way when you'll start tossing elements around the the pivot. More sophisticated implementations (which don't branch on the partition operation, and avoid WAW hazards) will benefit from this greatly.
remove max and min value from total sum
int med3(int a, int b, int c)
{
int tot_v = a + b + c ;
int max_v = max(a, max(b, c));
int min_v = min(a, min(b, c));
return tot_v - max_v - min_v
}
There is actually a clever way to isolate the median element from three using a careful analysis of the 6 possible permutations (of low, median, high). In python:
def med(a, start, mid, last):
# put the median of a[start], a[mid], a[last] in the a[start] position
SM = a[start] < a[mid]
SL = a[start] < a[last]
if SM != SL:
return
ML = a[mid] < a[last]
m = mid if SM == ML else last
a[start], a[m] = a[m], a[start]
Half the time you have two comparisons otherwise you have 3 (avg 2.5). And you only swap the median element once when needed (2/3 of the time).
Full python quicksort using this at:
https://github.com/mckoss/labs/blob/master/qs.py
You can write up all the permutations:
1 0 2
1 2 0
0 1 2
2 1 0
0 2 1
2 0 1
Then we want to find the position of the 1. We could do this with two comparisons, if our first comparison could split out a group of equal positions, such as the first two lines.
The issue seems to be that the first two lines are different on any comparison we have available: a<b, a<c, b<c. Hence we have to fully identify the permutation, which requires 3 comparisons in the worst case.
Using a Bitwise XOR operator, the median of three numbers can be found.
def median(a,b,c):
m = max(a,b,c)
n = min(a,b,c)
ans = m^n^a^b^c
return ans

Resources