I cannot find out why this code keeps skipping a loop - node.js

Some background on what is going on:
We are processing addresses into standardized forms, this is the code to take addresses scored by how many components found and then rescore them using a levenshtein algorithm across similar post codes
The scores are how many components were found in that address divided by the number missed, to return a ratio
The input data, scoreDict, is a dictionary containing arrays of arrays. The first set of arrays is the scores, so there are 12 arrays because there are 12 scores in this file (it adjusts by file). There are then however many addresses fit that score in their own separate arrays stored in that. Don't ask me why I'm doing it that way, my brain is dead
The code correctly goes through each score array and each one is properly filled with the unique elements that make it up. It is not short by any amount, nothing is duplicated, I have checked
When we hit the score that is -1 (this goes to any address where it doesn't fit in some rule so we can't use its post code to find components so no components are found) the loop specifically ONLY DOES EVERY OTHER ADDRESS IN THIS SCORE ARRAY
It doesn't do this to any other score array, I have checked
I have tried changing the number to something else like 99, same issue except one LESS address got rescored, and the rest stayed at the original failing score of 99
I am going insane, can anyone find where in this loop something may be going wrong to cause it to only do every other line. The index counter of line and sc come through in the correct order and do not skip over. I have checked
I am sorry this is not professional, I have been at this one loop for 5 hours
Rescore: function Rescore(scoreDict) {
let tempInc = 0;
//Loop through all scores stored in scoreDict
for (var line in scoreDict) {
let addUpdate = "";
//Loop through each line stored by score
for (var sc in scoreDict[line.toString()]) {
console.log(scoreDict[line.toString()].length);
let possCodes = new Array();
const curLine = scoreDict[line.toString()][sc];
console.log(sc);
const curScore = curLine[1].split(',')[curLine[1].split(',').length-1];
switch (true) {
case curScore == -1:
let postCode = (new RegExp('([A-PR-UWYZ][A-HK-Y]?[0-9][A-Z0-9]?[ ]?[0-9][ABD-HJLNP-UW-Z]{2})', 'i')).exec(curLine[1].replace(/\\n/g, ','));
let areaCode;
//if (curLine.split(',')[curLine.split(',').length-2].includes("REFERENCE")) {
if ((postCode = (new RegExp('(([A-Z][A-Z]?[0-9][A-Z0-9]?(?=[ ]?[0-9][A-Z]{2}))|[0-9]{5})', 'i').exec(postCode))) !== null) {
for (const code in Object.keys(addProper)) {
leven.LoadWords(postCode[0], Object.keys(addProper)[code]);
if (leven.distance < 2) {
//Weight will have adjustment algorithms based on other factors
let weight = 1;
//Add all codes that are close to the same to a temp array
possCodes.push(postCode.input.replace(postCode[0], Object.keys(addProper)[code]).split(',')[0] + "(|W|)" + (leven.distance/weight));
}
}
let highScore = 0;
let candidates = new Array();
//Use the component script from cityprocess to rescore
for (var i=0;i<possCodes.length;i++) {
postValid.add([curLine[1].split(',').slice(0,curLine[1].split(',').length-2) + '(|S|)' + possCodes[i].split("(|W|)")[0]]);
if (postValid.addChunk[0].split('(|S|)')[postValid.addChunk[0].split('(|S|)').length-1] > highScore) {
candidates = new Array();
highScore = postValid.addChunk[0].split('(|S|)')[postValid.addChunk[0].split('(|S|)').length-1];
candidates.push(postValid.addChunk[0]);
} else if (postValid.addChunk[0].split('(|S|)')[postValid.addChunk[0].split('(|S|)').length-1] == highScore) {
candidates.push(postValid.addChunk[0]);
}
}
score.Rescore(curLine, sc, candidates[0]);
}
//} else if (curLine.split(',')[curLine.split(',').length-2].contains("AREA")) {
// leven.LoadWords();
//}
break;
case curScore > 0:
//console.log("That's a pretty good score mate");
break;
}
//console.log(line + ": " + scoreDict[line].length);
}
}
console.log(tempInc)
score.ScoreWrite(score.scoreDict);
}

The issue was that I was calling the loop on the array I was editing, so as each element got removed from the array (rescored and moved into a separate array) it got shorter by that element, resulting in an issue that when the first element was rescored and removed, and then we moved onto the second index which was now the third element, because everything shifted up by 1 index
I fixed it by having it simply enter an empty array for each removed element, so everything kept its index and the array kept its length, and then clear the empty values at a later time in the code

Related

How do I generate a string that contains a keyword?

I'm currently making a program with many functions that utilise Math.rand(). I'm trying to generate a string with a given keyword (in this case, lathe). I want the program to log a string that has "lathe" (or any version of it, with capitals or not), but everything I've tried has the program hit its call stack size limit (I understand exactly why, I want the program to generate a string with the word without it hitting its call stack size).
What I have tried:
function generateStringWithKeyword(randNum: number) {
const chars = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789+/";
let result = "";
for(let i = 0; i < randNum; i++) {
result += chars[Math.floor(Math.random() * chars.length)];
if(result.includes("lathe")) {
continue;
} else {
generateStringWithKeyword(randNum);
}
}
console.log(result);
}
This is what I have now, after doing brief research on stackoverflow I learned that it might have been better to add the if/else block with a continue, rather than using
if(!result.includes("lathe")) return generateStringWithKeyword(randNum);
But both ways I had hit the call stack size limit.
A "correct" version of your algorithm, written as an iterative function instead of as a recursive one so as not to exceed stack depth, would look something like this:
function generateStringWithKeyword(randNum: number) {
const chars = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789+/";
let result = "";
let attemptCnt = 0;
while (!result.toLowerCase().includes("lathe")) {
attemptCnt++;
result = "";
for (let i = 0; i < randNum; i++) {
result += chars[Math.floor(Math.random() * chars.length)];
}
if (attemptCnt > 1e6) {
console.log("I GIVE UP");
return;
}
}
console.log(result);
return result;
}
I don't like when my browser hangs because of a script that won't finish, so I put a maximum attempt count in there. A million chances seems reasonable. When you try it out, this happens:
generateStringWithKeyword(10); // I GIVE UP
Which makes sense; let's perform a rough back-of-the-envelope probability calculation to see how long we might expect this to take. The chance that "lathe" will appear in some case at position 1 of the word is (2/64)×(2/64)×(2×64)×(2/64)×(2/64) ("L" or "l" appears first, followed by "A" or "a", etc) which is approximately 3×10-8. For a word of length 10, "lathe" can appear starting at positions 1, 2, 3, 4, 5, or 6. While this isn't exactly correct, let's think of this as multiplying your chances by 6 of getting the word somewhere, so the actual chance of getting a valid result is somewhere around 1.8×10-7. So we can expect that you'd need to make approximately 1 ÷ 1.8×10-7 = 5.6 million chances to succeed.
Oh, darn, I only gave it a million. Let's up that to 10 million and try again:
generateStringWithKeyword(10); // "lATHELEYSc"
Great! Although, it does sometimes still give up. And really, an algorithm which needs millions of tries before it succeeds is very, very inefficient. You might want to read about bogosort, a sorting algorithm which works by randomly shuffling things and checking to see if they are sorted, and it keeps trying until it works. It's used for educational purposes to highlight how such techniques don't really perform well enough to be practical. Nobody would ever want to use such an algorithm for real.
So how would you do this "the right" way? Well, my suggestion here is to just build your result correctly the first time. If you have 10 characters and 5 of them need to be "lathe" in some case, then you will need 5 truly random characters. So randomly decide how many of those letters should be before "lathe". If you pick 2, for example, then put 2 random characters, plus "lathe" in a random case, plus 3 more random characters.
It could be something like this, where I mostly use your same style of for-loops and += string concatenation:
function generateStringWithKeyword(randNum: number) {
const keyword = "lathe";
if (randNum < keyword.length) throw new Error(
"This is not possible; \"" + keyword + "\" doesn't fit in " + randNum + " characters"
);
const actuallyRandNum = randNum - keyword.length;
const chars = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789+/";
let result = "";
const kwInsertionPoint = Math.floor(Math.random() * (actuallyRandNum + 1));
for (let i = 0; i < kwInsertionPoint; i++) {
result += chars[Math.floor(Math.random() * chars.length)];
}
for (let i = 0; i < keyword.length; i++) {
result += Math.random() < 0.5 ? keyword[i].toLowerCase() : keyword[i].toUpperCase();
}
for (let i = kwInsertionPoint; i < actuallyRandNum; i++) {
result += chars[Math.floor(Math.random() * chars.length)];
}
return result;
}
If you run this, you will see that it is very efficient, and never gives up:
console.log(Array.from({ length: 4 }, () => generateStringWithKeyword(5)).join(" "));
// "lathE LaThe lATHe LatHe"
console.log(Array.from({ length: 4 }, () => generateStringWithKeyword(7)).join(" "));
// "p6lAtHe laThE01 nlaTheK lATHeRJ"
console.log(Array.from({ length: 4 }, () => generateStringWithKeyword(10)).join(" "));
// "giMqzLaTHe 5klAthegBo oVdLatHe0q twNlATheCr"
Playground link to code

O(N) Simple Diffing Algorithm Implementation -- is this right?

I just posted this on HN but it doesn't seem to be getting much uptake, I had a question about diffing -- I wanted to know if an implementation I'm using is alright: it seems a little too simple, and the literature on diffing is dense.
Background: I've been building a rendering engine for a code editor the past couple of days. Rendering huge chunks of highlighted syntax can get laggy. It's not worth switching to React at this stage, so I wanted to just write a quick diff algorithm that would selectively update only changed lines.
I found this article:
https://blog.jcoglan.com/2017/02/12/the-myers-diff-algorithm-part-1/
With a link to this paper, the initial Git diff implementation:
http://www.xmailserver.org/diff2.pdf
I couldn't find the PDF to start with, but read "edit graph" and immediately thought — why don't I just use a hashtable to store lines from LEFT_TEXT and references to where they are, then iterate over RIGHT_TEXT and return matches one by one, also making sure that I keep track of the last match to prevent jumbling?
The algorithm I produced is only a few lines and seems accurate. It's O(N) time complexity, whereas the paper above gives a best case of O(ND) where D is minimum edit distance.
function lineDiff (left, right) {
left = left.split('\n');
right = right.split('\n');
let lookup = {};
// Store line numbers from LEFT in a lookup table
left.forEach(function (line, i) {
lookup[line] = lookup[line] || [];
lookup[line].push(i);
});
// Last line we matched
var minLine = -1;
return right.map(function (line) {
lookup[line] = lookup[line] || [];
var lineNumber = -1;
if (lookup[line].length) {
lineNumber = lookup[line].shift();
// Make sure we're looking ahead
if (lineNumber > minLine) {
minLine = lineNumber;
} else {
lineNumber = -1
}
}
return {
value: line,
from: lineNumber
};
});
}
RunKit link: https://runkit.com/keithwhor/line-diff
What am I missing? I can't find other references to doing diffing like this. Everything just links back to that one paper.

Find position of item in list using Binary Search

The question is:
Given a list of String, find a specific string in the list and return
its index in the ordered list of String sorted by mergesort. There are
two cases:
The string is in the list, return the index it should be in, in the ordered list.
The String is NOT in the list, return the index it is supposed to be in, in the ordered list.
Here is my my code, I assume that the given list is already ordered.
For 2nd case, how do I use mergesort to find the supposed index? I would appreciate some clues.
I was thinking to get a copy of the original list first, sort it, and get the index of the string in the copy list. Here I got stuck... do I use mergesort again to get the index of non-existing string in the copy list?
public static int BSearch(List<String> s, String a) {
int size = s.size();
int half = size / 2;
int index = 0;
// base case?
if (half == 0) {
if (s.get(half) == a) {
return index;
} else {
return index + 1;
}
}
// with String a
if (s.contains(a)) {
// on the right
if (s.indexOf(s) > half) {
List<String> rightHalf = s.subList(half + 1, size);
index += half;
return BSearch(rightHalf, a);
} else {
// one the left
List<String> leftHalf = s.subList(0, half - 1);
index += half;
return BSearch(leftHalf, a);
}
}
return index;
}
When I run this code, the index is not updated. I wonder what is wrong here. I only get 0 or 1 when I test the code even with the string in the list.
Your code only returns 0 or 1 because you don't keep track of your index for each recursive call, instead of resetting to 0 each time. Also, to find where the non-existent element should be, consider the list {0,2,3,5,6}. If we were to run a binary search to look for 4 here, it should stop at the index where element 5 is. Hope that's enough to get you started!

AS3 "Advanced" string manipulation

I'm making an air dictionary and I have a(nother) problem. The main app is ready to go and works perfectly but when I tested it I noticed that it could be better. A bit of context: the language (ancient egyptian) I'm translating from does not use punctuation so a phrase canlooklikethis. Add to that the sheer complexity of the glyph system (6000+ glyphs).
Right know my app works like this :
user choose the glyphs composing his/r word.
app transforms those glyphs to alphanumerical values (A1 - D36 - X1A, etc).
the code compares the code (say : A5AD36) to a list of xml values.
if the word is found (A5AD36 = priestess of Bast), the user gets the translation. if not, s/he gets all the possible words corresponding to the two glyphs (A5A & D36).
If the user knows the string is a word, no problem. But if s/he enters a few words, s/he'll have a few more choices than hoped (exemple : query = A1A5AD36 gets A1 - A5A - D36 - A5AD36).
What I would like to do is this:
query = A1A5AD36 //word/phrase to be translated;
varArray = [A1, A5A, D36] //variables containing the value of the glyphs.
Corresponding possible words from the xml : A1, A5A, D36, A5AD36.
Possible phrases: A1 A5A D36 / A1 A5AD36 / A1A5A D36 / A1A5AD36.
Possible phrases with only legal words: A1 A5A D36 / A1 A5AD36.
I'm not I really clear but to things simple, I'd like to get all the possible phrases containing only legal words and filter out the other ones.
(example with english : TOBREAKFAST. Legal = to break fast / to breakfast. Illegal = tobreak fast.
I've managed to get all the possible words, but not the rest. Right now, when I run my app, I have an array containing A1 - A5A - D36 - A5AD36. But I'm stuck going forward.
Does anyone have an idea ? Thank you :)
function fnSearch(e: Event): void {
var val: int = sp.length; //sp is an array filled with variables containing the code for each used glyph.
for (var i: int = 0; i < val; i++) { //repeat for every glyph use.
var X: String = ""; //variable created to compare with xml dictionary
for (var i2: int = 0; i2 < val; i2++) { // if it's the first time, use the first glyph-code, else the one after last used.
if (X == "") {
X = sp[i];
} else {
X = X + sp[i2 + i];
}
xmlresult = myXML.mot.cd; //xmlresult = alphanumerical codes corresponding to words from XMLList already imported
trad = myXML.mot.td; //same with traductions.
for (var i3: int = 0; i3 < xmlresult.length(); i3++) { //check if element X is in dictionary
var codeElement: XML = xmlresult[i3]; //variable to compare with X
var tradElement: XML = trad[i3]; //variable corresponding to codeElement
if (X == codeElement.toString()) { //if codeElement[i3] is legal, add it to array of legal words.
checkArray.push(codeElement); //checkArray is an array filled with legal words.
}
}
}
}
var iT2: int = 500 //iT2 set to unreachable value for next lines.
for (var iT: int = 0; iT < checkArray.length; iT++) { //check if the word searched by user is in the results.
if (checkArray[iT] == query) {
iT2 = iT
}
}
if (iT2 != 500) { //if complete query is found, put it on top of the array so it appears on top of the results.
var oldFirst: String = checkArray[0];
checkArray[0] = checkArray[iT2];
checkArray[iT2] = oldFirst;
}
results.visible = true; //make result list visible
loadingResults.visible = false; //loading screen
fnPossibleResults(null); //update result list.
}
I end up with an array of variables containing the glyph-codes (sp) and another with all the possible legal words (checkArray). What I don't know how to do is mix those two to make legal phrases that way :
If there was only three glyphs, I could probably find a way, but user can enter 60 glyphs max.

Is it possible to do a Levenshtein distance in Excel without having to resort to Macros?

Let me explain.
I have to do some fuzzy matching for a company, so ATM I use a levenshtein distance calculator, and then calculate the percentage of similarity between the two terms. If the terms are more than 80% similar, Fuzzymatch returns "TRUE".
My problem is that I'm on an internship, and leaving soon. The people who will continue doing this do not know how to use excel with macros, and want me to implement what I did as best I can.
So my question is : however inefficient the function may be, is there ANY way to make a standard function in Excel that will calculate what I did before, without resorting to macros ?
Thanks.
If you came about this googling something like
levenshtein distance google sheets
I threw this together, with the code comment from milot-midia on this gist (https://gist.github.com/andrei-m/982927 - code under MIT license)
From Sheets in the header menu, Tools -> Script Editor
Name the project
The name of the function (not the project) will let you use the func
Paste the following code
function Levenshtein(a, b) {
if(a.length == 0) return b.length;
if(b.length == 0) return a.length;
// swap to save some memory O(min(a,b)) instead of O(a)
if(a.length > b.length) {
var tmp = a;
a = b;
b = tmp;
}
var row = [];
// init the row
for(var i = 0; i <= a.length; i++){
row[i] = i;
}
// fill in the rest
for(var i = 1; i <= b.length; i++){
var prev = i;
for(var j = 1; j <= a.length; j++){
var val;
if(b.charAt(i-1) == a.charAt(j-1)){
val = row[j-1]; // match
} else {
val = Math.min(row[j-1] + 1, // substitution
prev + 1, // insertion
row[j] + 1); // deletion
}
row[j - 1] = prev;
prev = val;
}
row[a.length] = prev;
}
return row[a.length];
}
You should be able to run it from a spreadsheet with
=Levenshtein(cell_1,cell_2)
While it can't be done in a single formula for any reasonably-sized strings, you can use formulas alone to compute the Levenshtein Distance between strings using a worksheet.
Here is an example that can handle strings up to 15 characters, it could be easily expanded for more:
https://docs.google.com/spreadsheet/ccc?key=0AkZy12yffb5YdFNybkNJaE5hTG9VYkNpdW5ZOWowSFE&usp=sharing
This isn't practical for anything other than ad-hoc comparisons, but it does do a decent job of showing how the algorithm works.
looking at the previous answers to calculating Levenshtein distance, I think it would be impossible to create it as a formula.
Take a look at the code here
Actually, I think I just found a workaround. I was adding it in the wrong part of the code...
Adding this line
} else if(b.charAt(i-1)==a.charAt(j) && b.charAt(i)==a.charAt(j-1)){
val = row[j-1]-0.33; //transposition
so it now reads
if(b.charAt(i-1) == a.charAt(j-1)){
val = row[j-1]; // match
} else if(b.charAt(i-1)==a.charAt(j) && b.charAt(i)==a.charAt(j-1)){
val = row[j-1]-0.33; //transposition
} else {
val = Math.min(row[j-1] + 1, // substitution
prev + 1, // insertion
row[j] + 1); // deletion
}
Seems to fix the problem. Now 'biulding' is 92% accurate and 'bilding' is 88%. (whereas with the original formula 'biulding' was only 75%... despite being closer to the correct spelling of building)

Resources