O(N) Simple Diffing Algorithm Implementation -- is this right? - node.js

I just posted this on HN but it doesn't seem to be getting much uptake. I have a question about diffing: I want to know whether an implementation I'm using is alright. It seems a little too simple, and the literature on diffing is dense.
Background: I've been building a rendering engine for a code editor for the past couple of days. Rendering huge chunks of highlighted syntax can get laggy. It's not worth switching to React at this stage, so I wanted to just write a quick diff algorithm that would selectively update only changed lines.
I found this article:
https://blog.jcoglan.com/2017/02/12/the-myers-diff-algorithm-part-1/
It links to this paper, the basis of the initial Git diff implementation:
http://www.xmailserver.org/diff2.pdf
I couldn't find the PDF at first, but I read "edit graph" and immediately thought: why don't I just use a hashtable to store lines from LEFT_TEXT and references to where they are, then iterate over RIGHT_TEXT and return matches one by one, also keeping track of the last match to prevent jumbling?
The algorithm I produced is only a few lines and seems accurate. It runs in O(N) time, whereas the paper above describes an O(ND) algorithm, where D is the minimum edit distance.
function lineDiff (left, right) {
  left = left.split('\n');
  right = right.split('\n');
  let lookup = {};
  // Store line numbers from LEFT in a lookup table
  left.forEach(function (line, i) {
    lookup[line] = lookup[line] || [];
    lookup[line].push(i);
  });
  // Last line we matched
  var minLine = -1;
  return right.map(function (line) {
    lookup[line] = lookup[line] || [];
    var lineNumber = -1;
    if (lookup[line].length) {
      lineNumber = lookup[line].shift();
      // Make sure we're looking ahead
      if (lineNumber > minLine) {
        minLine = lineNumber;
      } else {
        lineNumber = -1;
      }
    }
    return {
      value: line,
      from: lineNumber
    };
  });
}
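For reference, here's a quick usage sketch; the output is hand-traced from the function above:
const result = lineDiff('a\nb\nc', 'a\nc\nb');
// [ { value: 'a', from: 0 },
//   { value: 'c', from: 2 },
//   { value: 'b', from: -1 } ]
// "a" and "c" match left lines 0 and 2; "b" can't match left line 1
// without jumping backwards past minLine, so it's reported as new.
console.log(result);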
RunKit link: https://runkit.com/keithwhor/line-diff
What am I missing? I can't find other references to doing diffing like this. Everything just links back to that one paper.

Related

Why does the word ladder problem work differently on GFG and LeetCode?

In this question we have to print all of the minimum-length sequences that reach the target word. When I solve this question on GFG it runs fine, but not on LeetCode.
Here is my code:
class Solution {
public:
    vector<vector<string>> findSequences(string beginWord, string endWord, vector<string>& wordList) {
        unordered_set<string> st(wordList.begin(), wordList.end());
        queue<vector<string>> p;
        p.push({beginWord});
        vector<string> usedOnLevel;
        usedOnLevel.push_back(beginWord);
        int level = 0;
        vector<vector<string>> ans;
        while (!p.empty()) {
            vector<string> vec = p.front();
            p.pop();
            if (vec.size() > level) {
                level++;
                for (auto it : usedOnLevel) {
                    st.erase(it);
                }
            }
            string word = vec.back();
            if (word == endWord) {
                if (ans.size() == 0) {
                    ans.push_back(vec);
                } else if (ans.size() > 0 && ans[0].size() == vec.size()) {
                    ans.push_back(vec);
                }
            }
            for (int i = 0; i < word.length(); i++) {
                char original = word[i];
                for (char ch = 'a'; ch <= 'z'; ch++) {
                    word[i] = ch;
                    if (st.find(word) != st.end()) {
                        usedOnLevel.push_back(word);
                        vec.push_back(word);
                        p.push(vec);
                        vec.pop_back();
                    }
                }
                word[i] = original;
            }
        }
        return ans;
    }
};
The difference is that LeetCode throws bigger problems at you, so correct code with poor performance is going to break. And your code has poor performance.
Why? Well, for a start, for each word you find a path to, for each possible substitution, you're looking through all words to find yours. So suppose I start with all of the 5-letter words in the official Scrabble dictionary. There are about 9000 of those. For each word you find, you're going to come up with 26*5 = 130 possible new words, then search the entire 9000-word list for each of them. That is 130 * 9000 = 1,170,000 word comparisons per word processed, billions overall, mostly to find nothing. Your algorithm wanted to do more than just that, but it has already timed out.
How could you make that faster? Here is one idea. Create a data structure to answer the following question:
by position of the deleted letter:
    by the resulting string:
        list of words that matched
For the entire Scrabble dictionary this data structure only has around 45,000 entries, and it makes it easy to find all words next to a given word in the word ladder.
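As a sketch of that index in JavaScript (buildNeighborIndex and neighbors are hypothetical names, not from the original answer):
function buildNeighborIndex(wordList) {
  // Pattern "c_t" (letter at position 1 deleted) -> ["cat", "cot", "cut", ...]
  const index = new Map();
  for (const word of wordList) {
    for (let i = 0; i < word.length; i++) {
      const pattern = word.slice(0, i) + '_' + word.slice(i + 1);
      if (!index.has(pattern)) index.set(pattern, []);
      index.get(pattern).push(word);
    }
  }
  return index;
}
function neighbors(word, index) {
  // All words one letter change away, excluding the word itself.
  const result = new Set();
  for (let i = 0; i < word.length; i++) {
    const pattern = word.slice(0, i) + '_' + word.slice(i + 1);
    for (const match of index.get(pattern) || []) {
      if (match !== word) result.add(match);
    }
  }
  return [...result];
}
Each word contributes one entry per letter position, so 9000 five-letter words give roughly the 45,000 entries mentioned above, and every neighbor lookup becomes a handful of hash probes instead of a scan of the whole list.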
OK, great! Is that enough? Well... probably not. You're starting from startWord and finding all chains of words you can reach from there, most of which go nowhere near endWord and represent wasted work. If the minimum-length chain is fairly long, this can easily be an exponential amount of wasted effort. How can we avoid it?
The answer is to do a breadth-first search from endWord to find out how far away each word is from endWord. In this search we can also record, for each word, which words move you closer. Again, even for the whole Scrabble dictionary, this data structure will be of manageable size, and you can break off the search as soon as you've found how to get to startWord.
But now, with this pre-processing, it is easy to start at startWord and recursively find all solutions, because all of the work you'll be doing is enumerating paths that you already know will work.
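Putting the two ideas together, here is a minimal sketch in JavaScript (not the original answer's code; it reuses the hypothetical neighbors/index helpers from above, uses the problem's beginWord/endWord names, and assumes beginWord was included when the index was built, e.g. buildNeighborIndex([beginWord, ...wordList])):
function findLadders(beginWord, endWord, index) {
  // Phase 1: BFS back from endWord, recording each word's distance to it.
  const dist = new Map([[endWord, 0]]);
  const queue = [endWord];
  while (queue.length) {
    const word = queue.shift();
    if (word === beginWord) break; // every word on a shortest path is already labeled
    for (const next of neighbors(word, index)) {
      if (!dist.has(next)) {
        dist.set(next, dist.get(word) + 1);
        queue.push(next);
      }
    }
  }
  // Phase 2: walk forward from beginWord, only stepping to words that are
  // strictly closer to endWord, so every path enumerated is a minimal ladder.
  const ladders = [];
  (function walk(word, path) {
    if (word === endWord) { ladders.push([...path]); return; }
    for (const next of neighbors(word, index)) {
      if (dist.get(next) === dist.get(word) - 1) {
        path.push(next);
        walk(next, path);
        path.pop();
      }
    }
  })(beginWord, [beginWord]);
  return ladders;
}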

Actionscript 3 error 1176 : Comparison between a value with static type Function and a possibly unrelated type int

I want to code the final score display. If someone has answered 10 multiple-choice questions and clicks on the final score button, their final score will appear along with a description. The score falls into a range by category: 1-59 = Under Average, 60-79 = Average, and 80-100 = Above Average.
I've tried coding it but I get error 1176 on lines 7 and 11.
Can you help me fix it?
finalscorebutton.addEventListener(MouseEvent.CLICK, finalscore);
function finalscore(event:MouseEvent):void
{
    multiplechoicefinalscore.text = sumofscores;
    var finalscore:String = finalscore.toString;
    finalscore = multiplechoicefinalscore..text;
    if (finalscore.toString < 60) {
        description.text = "UNDER AVERAGE.";
    }
    else if (finalscore.toString >= 60 && finalscore.toString <= 79) {
        description.text = "AVERAGE.";
    }
    else {
        description.text = "ABOVE AVERAGE.";
    }
}
There are multiple syntax and logic errors.
Something.toString is a reference to a method, you probably mean Something.toString() which calls the said method and returns a text representation of whatever Something is.
You don't need a text representation because you want to compare numbers, you need a numeric representation (which is either int, uint or Number).
There are 2 dots in multiplechoicefinalscore..text, what does it even mean?
There is function finalscore and then you define var finalscore, defining things with the same names is a bad idea in general.
You should keep your script formatted properly, otherwise reading it and understanding would be a pain.
So, I assume you have the user's result in sumofscores. I'm not sure if the script below will actually work as is, but at least it is logically and syntactically correct:
finalscorebutton.addEventListener(MouseEvent.CLICK, onFinal);
function onFinal(e:MouseEvent):void
{
    // Ok, let's keep this one, I think you are putting
    // the score result into some kind of TextField.
    multiplechoicefinalscore.text = sumofscores;
    // Get a definitely numeric representation of the score.
    var aScore:int = int(sumofscores);
    // In terms of logic, putting the complicated condition case
    // under the "else" statement will simplify the program.
    if (aScore < 60)
    {
        description.text = "UNDER AVERAGE.";
    }
    else if (aScore > 79)
    {
        description.text = "ABOVE AVERAGE.";
    }
    else
    {
        description.text = "AVERAGE.";
    }
}

I cannot find out why this code keeps skipping a loop

Some background on what is going on:
We are processing addresses into standardized forms. This is the code that takes addresses scored by how many components were found, and rescores them using a Levenshtein algorithm across similar post codes.
The scores are how many components were found in that address divided by the number missed, to return a ratio
The input data, scoreDict, is a dictionary containing arrays of arrays. The first set of arrays is the scores, so there are 12 arrays because there are 12 scores in this file (it adjusts by file). There are then however many addresses fit that score in their own separate arrays stored in that. Don't ask me why I'm doing it that way, my brain is dead
The code correctly goes through each score array and each one is properly filled with the unique elements that make it up. It is not short by any amount, nothing is duplicated, I have checked
When we hit the score that is -1 (this goes to any address that doesn't fit some rule, so we can't use its post code to find components, and therefore no components are found), the loop specifically ONLY DOES EVERY OTHER ADDRESS IN THIS SCORE ARRAY
It doesn't do this to any other score array, I have checked
I have tried changing the number to something else like 99, same issue except one LESS address got rescored, and the rest stayed at the original failing score of 99
I am going insane; can anyone find where in this loop something may be going wrong to cause it to only do every other line? The index counters line and sc come through in the correct order and do not skip over. I have checked
I am sorry this is not professional, I have been at this one loop for 5 hours
Rescore: function Rescore(scoreDict) {
  let tempInc = 0;
  // Loop through all scores stored in scoreDict
  for (var line in scoreDict) {
    let addUpdate = "";
    // Loop through each line stored by score
    for (var sc in scoreDict[line.toString()]) {
      console.log(scoreDict[line.toString()].length);
      let possCodes = new Array();
      const curLine = scoreDict[line.toString()][sc];
      console.log(sc);
      const curScore = curLine[1].split(',')[curLine[1].split(',').length-1];
      switch (true) {
        case curScore == -1:
          let postCode = (new RegExp('([A-PR-UWYZ][A-HK-Y]?[0-9][A-Z0-9]?[ ]?[0-9][ABD-HJLNP-UW-Z]{2})', 'i')).exec(curLine[1].replace(/\\n/g, ','));
          let areaCode;
          //if (curLine.split(',')[curLine.split(',').length-2].includes("REFERENCE")) {
          if ((postCode = (new RegExp('(([A-Z][A-Z]?[0-9][A-Z0-9]?(?=[ ]?[0-9][A-Z]{2}))|[0-9]{5})', 'i').exec(postCode))) !== null) {
            for (const code in Object.keys(addProper)) {
              leven.LoadWords(postCode[0], Object.keys(addProper)[code]);
              if (leven.distance < 2) {
                // Weight will have adjustment algorithms based on other factors
                let weight = 1;
                // Add all codes that are close to the same to a temp array
                possCodes.push(postCode.input.replace(postCode[0], Object.keys(addProper)[code]).split(',')[0] + "(|W|)" + (leven.distance/weight));
              }
            }
            let highScore = 0;
            let candidates = new Array();
            // Use the component script from cityprocess to rescore
            for (var i = 0; i < possCodes.length; i++) {
              postValid.add([curLine[1].split(',').slice(0,curLine[1].split(',').length-2) + '(|S|)' + possCodes[i].split("(|W|)")[0]]);
              if (postValid.addChunk[0].split('(|S|)')[postValid.addChunk[0].split('(|S|)').length-1] > highScore) {
                candidates = new Array();
                highScore = postValid.addChunk[0].split('(|S|)')[postValid.addChunk[0].split('(|S|)').length-1];
                candidates.push(postValid.addChunk[0]);
              } else if (postValid.addChunk[0].split('(|S|)')[postValid.addChunk[0].split('(|S|)').length-1] == highScore) {
                candidates.push(postValid.addChunk[0]);
              }
            }
            score.Rescore(curLine, sc, candidates[0]);
          }
          //} else if (curLine.split(',')[curLine.split(',').length-2].contains("AREA")) {
          //  leven.LoadWords();
          //}
          break;
        case curScore > 0:
          //console.log("That's a pretty good score mate");
          break;
      }
      //console.log(line + ": " + scoreDict[line].length);
    }
  }
  console.log(tempInc);
  score.ScoreWrite(score.scoreDict);
}
The issue was that I was looping over the array I was editing: as each element got rescored and moved into a separate array, the array got shorter by that element. So when the first element was rescored and removed, and we moved on to the second index, that index now held what had been the third element, because everything had shifted up by one index.
I fixed it by having the code enter an empty value for each removed element, so everything kept its index and the array kept its length, and then clearing out the empty values later in the code.
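For anyone else hitting this, here's a minimal sketch of the bug and the fix with generic arrays (not my actual address data):
// Bug: removing elements from the array you're iterating skips every other one.
// When index 0 is removed, 'b' shifts into index 0, and the loop moves on
// to index 1, which now holds 'c', so 'b' is never visited.
const broken = ['a', 'b', 'c', 'd'];
for (let i = 0; i < broken.length; i++) {
  broken.splice(i, 1); // "rescore" = remove from this array
}
console.log(broken); // [ 'b', 'd' ]

// Fix: overwrite processed slots with a placeholder so indexes stay stable,
// then strip the placeholders afterwards.
const fixed = ['a', 'b', 'c', 'd'];
for (let i = 0; i < fixed.length; i++) {
  // ...rescore fixed[i] into its new score array...
  fixed[i] = ""; // keep length and indexes intact
}
const leftovers = fixed.filter(x => x !== "");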

Traverse a graph in parallel

I'm revising for an exam (still) and have come across a question (posted below) that has me stumped. I think, in summary, the question is asking: "Think of any_old_process that has to traverse a graph and do some work on the objects it finds, including adding more work." My question is: what data structure can be parallelised to achieve the goals set out in the question?
The role of a garbage collector (GC) is to reclaim unused memory. Tracing collectors must identify all live objects by traversing graphs of objects induced by aggregation relationships. In brief, the GC has some work-list of tasks to perform. It repeatedly (a) acquires a task (e.g. an object to inspect), (b) performs the task (e.g. marks the object unless it is already marked), and (c) generates further tasks (e.g. adds the children of an unmarked task to the work-list). It is desirable to parallelise this operation.
In a single-threaded environment, the work-list is usually a single LIFO stack. What would you have to do to make this safe for a parallel GC? Would this be a sensible design for a parallel GC? Discuss designs of data structure to support a parallel GC that would scale better. Explain why you would expect them to scale better.
The natural data structure for a graph is, well, a graph, i.e. a set of graph elements (nodes) which can refer to other elements. Though, for better cache reuse, the elements can be placed/allocated in an array or arrays (generally, vectors) in order to put neighboring elements as close in memory as possible. Generally, each element or group of elements should have a mutex (spin_mutex) to protect access to it; contention means that some other thread is busy working on it, so there is no need to wait. Though, if possible, an atomic operation over the flag/state fields is preferable, to mark the element as visited without a lock. For example, the simplest data structure can be the following:
struct object {
    vector<object*> references;
    atomic<bool> is_visited; // for simplicity, or an epoch counter
                             // if nothing resets it to false
    void inspect(); // processing method
};
vector<object> objects; // also for simplicity, if it can be for real
                        // things like `parallel_for` would be perfect here
Given this data structure and the way the GC's work is described, it fits perfectly with recursive parallelism in the divide-and-conquer pattern:
void object::inspect() {
    if( ! is_visited.exchange(true) ) {
        for( object* o : references ) // alternatively this can be a `parallel_for` in some variants
            cilk_spawn o->inspect(); // for Cilk, or `task_group::run` for TBB or PPL
        // further processing of the object
    }
}
If the question is about how the tasks are organized, I'd recommend a work-stealing scheduler (like TBB's or Cilk's). There are tons of papers on this subject. To put it simply, each worker thread has its own, but shared, deque of tasks, and when its deque is empty, the thread steals tasks from other threads' deques.
The scalability comes from the property that each task can add other tasks, which can then be worked on in parallel.
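To illustrate the discipline, here is a single-threaded JavaScript sketch of the deque ends involved; a real implementation needs atomic operations on a shared deque (e.g. a Chase-Lev deque):
// Each worker owns one deque: the owner pushes and pops at the back (LIFO,
// good locality); idle thieves steal from the front (oldest, largest tasks).
class WorkDeque {
  constructor() { this.tasks = []; }
  push(task) { this.tasks.push(task); }
  pop() { return this.tasks.pop(); }
  steal() { return this.tasks.shift(); }
  get isEmpty() { return this.tasks.length === 0; }
}
function runWorker(own, others) {
  for (;;) {
    let task = own.pop();
    if (task === undefined) {
      const victim = others.find(d => !d.isEmpty);
      if (!victim) return; // no work anywhere: this worker is done
      task = victim.steal();
    }
    task(); // a task may push() further tasks onto `own`
  }
}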
Your questions:
Think of any_old_process that has to traverse a graph and do some work on the objects it finds, including adding more work.
... what data structure can be parallelised to achieve the goals set out in the question?
Quoted questions:
Some stuff about garbage collection.
Since you are specifically interested in parallelizing graph algorithms, I'll give an example of one kind of graph traversal that can be parallelized well.
Executive Summary
Finding local minima ("basins") or maxima ("peaks") is a useful operation in digital image processing. A concrete example is geological watershed analysis. One approach to the problem treats each pixel or small group of pixels in the image as a node and finds non-overlapping minimum spanning trees (MST) with the local minima as the tree roots.
Gory details
Below is a simplistic example. It's a web interview question from Palantir Technologies brought to Programming Puzzles & Code Golf by AnkitSablok. It's simplified by two assumptions (below):
That a pixel/cell only has 4 neighbors instead of the usual eight.
That a cell either has all uphill neighbors (it's a local minimum) or has a unique downhill neighbor. I.e., plains aren't allowed.
Below that is some JavaScript that solves this problem. It violates every reasonable coding standard against use of side-effects, but illustrates where some of the opportunities for parallelization exist.
In the "Create list of sinks (i.e. roots)" loop, note that each cell can be evaluated completely independently for elevation with respect to it's neighbors as long as the elevation data is static. In a sequential program, one thread of execution examines each cell. In a parallel program, the cells are divvied up so that one, and only one, thread reads and writes the local minima state information (sink[] in the program below). If generating the list of minima/roots in parallel, the queuing operations for the stack would have to be synchronized. For a discussion how to do that for stacks and other queues, see "Simple, Fast, and Practical Non-Blocking and Blocking Concurrent Queue Algorithms", Michael & Scott, 1996. For modern updates, follow the citation tree on Google Scholar (no mutex required :).
In the "Each root explores it's basin" loop, note that each basin could explored/enumerated/flooded in parallel.
If you want to dive deeper into parallelizing MSTs, see "Scalable Parallel Minimum Spanning Forest Computation", Nobari, Cao, Karras, Bressan, 2012. The first two pages contain a clear and concise survey of the field.
Simplified example
A group of farmers has some elevation data, and we’re going to help them understand how rainfall flows over their farmland. We’ll represent the land as a two-dimensional array of altitudes and use the following model, based on the idea that water flows downhill:
If a cell’s four neighboring cells all have higher altitudes, we call this cell a sink; water collects in sinks. Otherwise, water will flow to the neighboring cell with the lowest altitude. If a cell is not a sink, you may assume it has a unique lowest neighbor and that this neighbor will be lower than the cell.
Cells that drain into the same sink – directly or indirectly – are said to be part of the same basin.
Your challenge is to partition the map into basins. In particular, given a map of elevations, your code should partition the map into basins and output the sizes of the basins, in descending order.
Assume the elevation maps are square. Input will begin with a line with one integer, S, the height (and width) of the map. The next S lines will each contain a row of the map, each with S integers – the elevations of the S cells in the row. Some farmers have small land plots such as the examples below, while some have larger plots. However, in no case will a farmer have a plot of land larger than S = 5000.
Your code should output a space-separated list of the basin sizes, in descending order. (Trailing spaces are ignored.)
Here's an example:
Input:
5
1 0 2 5 8
2 3 4 7 9
3 5 7 8 9
1 2 5 4 2
3 3 5 2 1
Output: 11 7 7
The basins, labeled with A’s, B’s, and C’s, are:
A A A A A
A A A A A
B B A C C
B B B C C
B B C C C
// lm.js - find the local minima
// Globalization of variables.
/*
The map is a 2 dimensional array. Indices for the elements map as:
[0,0] ... [0,n]
...
[n,0] ... [n,n]
Each element of the array is a structure. The structure for each element is:
Item Purpose Range Comment
---- ------- ----- -------
h Height of cell integers
s Is it a sink? boolean
x X of downhill cell (0..maxIndex) if s is true, x&y point to self
y Y of downhill cell (0..maxIndex)
b Basin name ('A'..'A'+# of basins)
Use a separate array-of-arrays for each structure item. The index range is
0..maxIndex.
*/
var height = [];
var sink = [];
var downhillX = [];
var downhillY = [];
var basin = [];
var maxIndex;
// A list of sinks in the map. Each element is an array of [ x, y ], where
// both x & y are in the range 0..maxIndex.
var basinList = [];
// An unordered list of basin sizes.
var basinSize = [];
// Functions.
function isSink(x,y) {
  var myHeight = height[x][y];
  var imaSink = true;
  var bestDownhillHeight = myHeight;
  var bestDownhillX = x;
  var bestDownhillY = y;
  /*
    Visit the neighbors. If this cell is the lowest, then it's the
    sink. If not, find the steepest downhill direction.
  */
  function visit(deltaX,deltaY) {
    var neighborX = x+deltaX;
    var neighborY = y+deltaY;
    if (myHeight > height[neighborX][neighborY]) {
      imaSink = false;
      if (bestDownhillHeight > height[neighborX][neighborY]) {
        bestDownhillHeight = height[neighborX][neighborY];
        bestDownhillX = neighborX;
        bestDownhillY = neighborY;
      }
    }
  }
  if (x !== 0) {
    // upwards neighbor exists
    visit(-1,0);
  }
  if (x !== maxIndex) {
    // downwards neighbor exists
    visit(1,0);
  }
  if (y !== 0) {
    // left-hand neighbor exists
    visit(0,-1);
  }
  if (y !== maxIndex) {
    // right-hand neighbor exists
    visit(0,1);
  }
  downhillX[x][y] = bestDownhillX;
  downhillY[x][y] = bestDownhillY;
  return imaSink;
}
function exploreBasin(x,y,currentSize,basinName) {
  // This cell is in the basin.
  basin[x][y] = basinName;
  currentSize++;
  /*
    Visit all neighbors that have this cell as the best downhill
    path and add them to the basin.
  */
  function visit(x,deltaX,y,deltaY) {
    if ((downhillX[x+deltaX][y+deltaY] === x) && (downhillY[x+deltaX][y+deltaY] === y)) {
      currentSize = exploreBasin(x+deltaX,y+deltaY,currentSize,basinName);
    }
    return 0;
  }
  if (x !== 0) {
    // upwards neighbor exists
    visit(x,-1,y,0);
  }
  if (x !== maxIndex) {
    // downwards neighbor exists
    visit(x,1,y,0);
  }
  if (y !== 0) {
    // left-hand neighbor exists
    visit(x,0,y,-1);
  }
  if (y !== maxIndex) {
    // right-hand neighbor exists
    visit(x,0,y,1);
  }
  return currentSize;
}
// Read map from file (1st argument).
var lines = $EXEC('cat "' + $ARG[0] + '"').split('\n');
maxIndex = lines.shift() - 1;
for (var i = 0; i <= maxIndex; i++) {
  // Parse to numbers so height comparisons aren't lexicographic.
  height[i] = lines.shift().split(' ').map(Number);
  // Create all other 2D arrays.
  sink[i] = [];
  downhillX[i] = [];
  downhillY[i] = [];
  basin[i] = [];
}
for (var i = 0; i <= maxIndex; i++) { print(height[i]); }
// Everyone decides if they are a sink. Create list of sinks (i.e. roots).
for (var x = 0; x <= maxIndex; x++) {
  for (var y = 0; y <= maxIndex; y++) {
    if (sink[x][y] = isSink(x,y)) { // assignment intended: record and test
      // This node is a root (AKA sink).
      basinList.push([x,y]);
    }
  }
}
//for (var i = 0; i<=maxIndex; i++) { print(sink[i]); }
// Each root explores its basin.
var basinName = 'A';
for (var i = basinList.length - 1; i >= 0; --i) { // i-- makes Closure Compiler sad
  var x = basinList[i][0];
  var y = basinList[i][1];
  basinSize.push(exploreBasin(x, y, 0, basinName));
  basinName = String.fromCharCode(basinName.charCodeAt() + 1);
}
for (var i = 0; i <= maxIndex; i++) { print(basin[i]); }
// Done.
print(basinSize.sort(function(a, b){return b-a}).join(' '));

Iterative deepening search : Is it recursive?

I've searched the internet for the IDS algorithm, and I keep finding examples, but they all use recursion, and as I understand it, iterative is not recursive.
So can you please give me some examples of the IDS algorithm? (An implementation would be great, and without recursion.)
Thanks in advance! You will be a life saver!
The iterative part is not recursive: at the top level it is more or less:
int limit = 0;
Solution sol;
do {
    limit++;
    sol = search(problem, limit);
} while (sol == null);
// do something with the solution.
This said, in most cases searching for a solution is indeed implemented recursively:
Solution search(Problem problem, int limit) {
    return search(problem, 0, limit);
}

Solution search(Problem problem, int price, int limit) {
    if (problem.solved) {
        return problem.getSolution();
    }
    for (int value = 0; value < valuerange; value++) {
        problem.assignVariable(value);
        int newprice = price + problem.price();
        if (price < limit) {
            Solution solution = search(problem, newprice, limit);
            if (solution != null) {
                return solution;
            }
        }
        problem.backtrackVariable();
    }
    return null;
}
But there exists an automatic procedure to turn any recursive program into a non-recursive one.
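For example, a depth-limited search can replace the call stack with an explicit stack of (node, depth) pairs. A minimal sketch in JavaScript, using a simple hypothetical graph (an adjacency object) rather than the Problem API above:
function depthLimitedSearch(graph, start, goal, limit) {
  const stack = [[start, 0, [start]]]; // [node, depth, path so far]
  while (stack.length) {
    const [node, depth, path] = stack.pop();
    if (node === goal) return path;
    if (depth < limit) {
      for (const next of graph[node] || []) {
        stack.push([next, depth + 1, [...path, next]]);
      }
    }
  }
  return null; // no solution within this depth limit
}
function iterativeDeepening(graph, start, goal, maxLimit) {
  for (let limit = 0; limit <= maxLimit; limit++) {
    const solution = depthLimitedSearch(graph, start, goal, limit);
    if (solution !== null) return solution;
  }
  return null;
}
// Example with a small hypothetical graph:
const graph = { a: ['b', 'c'], b: ['d'], c: ['d'], d: [] };
console.log(iterativeDeepening(graph, 'a', 'd', 10)); // [ 'a', 'c', 'd' ]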
If you are thinking in algorithm terms (not just implementation), this would mean applying iteration at all nodes of the search tree, instead of just at the root node.
In the case of chess programs, this does have some benefits. It improves move ordering, even in the case where a branch that was previously pruned by alpha-beta is later included. The cost of the extra search is kept low by using a transposition table.
https://www.chessprogramming.org/Internal_Iterative_Deepening
