How do I generate a string that contains a keyword? - string

I'm currently making a program with many functions that utilise Math.rand(). I'm trying to generate a string with a given keyword (in this case, lathe). I want the program to log a string that has "lathe" (or any version of it, with capitals or not), but everything I've tried has the program hit its call stack size limit (I understand exactly why, I want the program to generate a string with the word without it hitting its call stack size).
What I have tried:
function generateStringWithKeyword(randNum: number) {
const chars = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789+/";
let result = "";
for(let i = 0; i < randNum; i++) {
result += chars[Math.floor(Math.random() * chars.length)];
if(result.includes("lathe")) {
continue;
} else {
generateStringWithKeyword(randNum);
}
}
console.log(result);
}
This is what I have now, after doing brief research on stackoverflow I learned that it might have been better to add the if/else block with a continue, rather than using
if(!result.includes("lathe")) return generateStringWithKeyword(randNum);
But both ways I had hit the call stack size limit.

A "correct" version of your algorithm, written as an iterative function instead of as a recursive one so as not to exceed stack depth, would look something like this:
function generateStringWithKeyword(randNum: number) {
const chars = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789+/";
let result = "";
let attemptCnt = 0;
while (!result.toLowerCase().includes("lathe")) {
attemptCnt++;
result = "";
for (let i = 0; i < randNum; i++) {
result += chars[Math.floor(Math.random() * chars.length)];
}
if (attemptCnt > 1e6) {
console.log("I GIVE UP");
return;
}
}
console.log(result);
return result;
}
I don't like when my browser hangs because of a script that won't finish, so I put a maximum attempt count in there. A million chances seems reasonable. When you try it out, this happens:
generateStringWithKeyword(10); // I GIVE UP
Which makes sense; let's perform a rough back-of-the-envelope probability calculation to see how long we might expect this to take. The chance that "lathe" will appear in some case at position 1 of the word is (2/64)×(2/64)×(2×64)×(2/64)×(2/64) ("L" or "l" appears first, followed by "A" or "a", etc) which is approximately 3×10-8. For a word of length 10, "lathe" can appear starting at positions 1, 2, 3, 4, 5, or 6. While this isn't exactly correct, let's think of this as multiplying your chances by 6 of getting the word somewhere, so the actual chance of getting a valid result is somewhere around 1.8×10-7. So we can expect that you'd need to make approximately 1 ÷ 1.8×10-7 = 5.6 million chances to succeed.
Oh, darn, I only gave it a million. Let's up that to 10 million and try again:
generateStringWithKeyword(10); // "lATHELEYSc"
Great! Although, it does sometimes still give up. And really, an algorithm which needs millions of tries before it succeeds is very, very inefficient. You might want to read about bogosort, a sorting algorithm which works by randomly shuffling things and checking to see if they are sorted, and it keeps trying until it works. It's used for educational purposes to highlight how such techniques don't really perform well enough to be practical. Nobody would ever want to use such an algorithm for real.
So how would you do this "the right" way? Well, my suggestion here is to just build your result correctly the first time. If you have 10 characters and 5 of them need to be "lathe" in some case, then you will need 5 truly random characters. So randomly decide how many of those letters should be before "lathe". If you pick 2, for example, then put 2 random characters, plus "lathe" in a random case, plus 3 more random characters.
It could be something like this, where I mostly use your same style of for-loops and += string concatenation:
function generateStringWithKeyword(randNum: number) {
const keyword = "lathe";
if (randNum < keyword.length) throw new Error(
"This is not possible; \"" + keyword + "\" doesn't fit in " + randNum + " characters"
);
const actuallyRandNum = randNum - keyword.length;
const chars = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789+/";
let result = "";
const kwInsertionPoint = Math.floor(Math.random() * (actuallyRandNum + 1));
for (let i = 0; i < kwInsertionPoint; i++) {
result += chars[Math.floor(Math.random() * chars.length)];
}
for (let i = 0; i < keyword.length; i++) {
result += Math.random() < 0.5 ? keyword[i].toLowerCase() : keyword[i].toUpperCase();
}
for (let i = kwInsertionPoint; i < actuallyRandNum; i++) {
result += chars[Math.floor(Math.random() * chars.length)];
}
return result;
}
If you run this, you will see that it is very efficient, and never gives up:
console.log(Array.from({ length: 4 }, () => generateStringWithKeyword(5)).join(" "));
// "lathE LaThe lATHe LatHe"
console.log(Array.from({ length: 4 }, () => generateStringWithKeyword(7)).join(" "));
// "p6lAtHe laThE01 nlaTheK lATHeRJ"
console.log(Array.from({ length: 4 }, () => generateStringWithKeyword(10)).join(" "));
// "giMqzLaTHe 5klAthegBo oVdLatHe0q twNlATheCr"
Playground link to code

Related

How can I switch from operator to another operator?

I'm making a program that can understands human words, and so far it's going great.
My current standpoint is understanding math equations. You can add, subtract, multiply and divide very easily with as many numbers as you'd like, but I'm wondering how I can do addition then multiply the result, like this:
const source = require("./source.js")
var main = source.interpret(
"add 4 and 4 multiply by 4",
)
source.output(main)
And it should output:
64
Yes I know that there is an easier way of doing this math equation, however in any calculator of any sort you should be able to this kind of switching in any context.
How can I accomplish this?
Here is the full source code;
index.js:
const source = require("./source.js")
var main = source.interpret(
"add 4 and 4 together then multiply the result by 4",
)
source.output(main)
source.js:
function output(main) {
console.log(main)
}
function interpret(str) {
const dl = str.split(' ');
const operator = dl.shift(x => x.includes("add", "subtract", "multiply", "divide"))
const numbers = dl.filter(x => Number(x))
switch (operator) {
case "add":
return numbers.reduce((a, b) => Number(a) + Number(b));
case "subtract":
return numbers.reduce((a, b) => Number(a) - Number(b));
case "multiply":
return numbers.reduce((a, b) => Number(a) * Number(b));
case "divide":
return numbers.reduce((a, b) => Number(a) / Number(b));
}
}
module.exports = {interpret, output}
The main problem with your interpret function is that after finding a single operator, it will perform that operation on all numbers and immediately return. We can’t simply reduce all the numbers using the first operation we find, because it’s possible that some numbers are related to other operations! In the expression add 2 and 2 multiply by 3, the 3 is related to the multiply operation!
This means that we can't process the entire input like that. An alternative is to iterate over the input, and depending on the operator we find, we perform the related action.
To simplify, let's consider that there's only the add operation. What are we expecting next? It could be [number] and [number], but also, it can be by [number]. The first one just adds the two numbers, but in the second, it should add the new number to the last operation.
A side note: your shift and filter functions are parsing the input, and the switch case is interpreting the parsed structure. Your “human language” is actually a programming language! add 2 and 2 is analogous to 2 + 2 in JavaScript, just different. With that, I will introduce you to some programming language theory terms, it can be easier to search for more help if you deep dive in the topic.
Considering the last paragraph, let's refactor interpret:
// from https://stackoverflow.com/questions/175739/how-can-i-check-if-a-string-is-a-valid-number
function isNumeric(str) {
if (typeof str != "string") return false
return !isNaN(str) && !isNaN(parseFloat(str))
}
function interpret(input) {
const tokens = input.split(' ') // in fancy programming language terms,
// this is a lexical analysis step
// note that we are not supporting things like
// double spaces, something to think about!
let state = 0 // we are keeping the results from our operation here
for (i = 0; i < tokens.length; i++) {
const t = tokens[i] // to keep things shorter
switch (t) {
case "add": // remember: there's two possible uses of this operator
const next = tokens[i + 1]
if (next == "by") {
// we should add the next token (hopefully a number!) to the state
state += parseFloat(tokens[i + 2])
i += 2 // very important! the two tokens we read should be skipped
// by the loop. they were "consumed".
continue // stop processing. we are done with this operation
}
if (isNumeric(next)) {
const a = tokens[i + 2] // this should be the "and"
if (a != "and") {
throw new Error(`expected "and" token, got: ${a}`)
}
const b = parseFloat(tokens[i + 3])
state = parseFloat(next) + b
i += 3 // in this case, we are consuming more tokens
continue
}
throw new Error(`unexpected token: ${next}`)
}
}
return state
}
const input = `add 2 and 2 add by 2 add by 5`
console.log(interpret(input))
There's a lot to improve from this code, but hopefully, you can get an idea or two. One thing to note is that all your operations are "binary operations": they always take two operands. So all that checking and extracting depending if it's by [number] or a [number] and [number] expression is not specific to add, but all operations. There's many ways to write this, you could have a binary_op function, I will go for possibly the least maintainable option:
// from https://stackoverflow.com/questions/175739/how-can-i-check-if-a-string-is-a-valid-number
function isNumeric(str) {
if (typeof str != "string") return false
return !isNaN(str) && !isNaN(parseFloat(str))
}
function isOperand(token) {
const ops = ["add", "multiply"]
if (ops.includes(token)) {
return true
}
return false
}
function interpret(input) {
const tokens = input.split(' ') // in fancy programming language terms,
// this is a lexical analysis step
// note that we are not supporting things like
// double spaces, something to think about!
let state = 0 // we are keeping the results from our operation here
for (i = 0; i < tokens.length; i++) {
const t = tokens[i] // to keep things shorter
if (!isOperand(t)) {
throw new Error(`expected operand token, got: ${t}`)
}
// all operators are binary, so these variables will hold the operands
// they may be two numbers, or a number and the internal state
let a, b;
const next = tokens[i + 1]
if (next == "by") {
// we should add the next token (hopefully a number!) to the state
a = state
b = parseFloat(tokens[i + 2])
i += 2 // very important! the two tokens we read should be skipped
// by the loop. they were "consumed".
}
else if (isNumeric(next)) {
const and = tokens[i + 2] // this should be the "and"
if (and != "and") {
throw new Error(`expected "and" token, got: ${and}`)
}
a = parseFloat(next)
b = parseFloat(tokens[i + 3])
i += 3 // in this case, we are consuming more tokens
} else {
throw new Error(`unexpected token: ${next}`)
}
switch (t) {
case "add":
state = a + b
break;
case "multiply":
state = a * b
}
}
return state
}
const input = `add 2 and 2 add by 1 multiply by 5`
console.log(interpret(input)) // should log 25
There's much more to explore. We are writing a "single-pass" interpreter, where the parsing and the interpreting are tied together. You can split these two, and have a parsing function that turns the input into a structure that you can then interpret. Another point is precedence, we are applying the operation in the order they appear in the expression, but in math, multiplication should be done first than addition. All of these problems are programming language problems.
If you are interested, I deeply recommend the book http://craftinginterpreters.com/ for a gentle introduction on writing programming languages, it will definitely help in your endeavor.

I cannot find out why this code keeps skipping a loop

Some background on what is going on:
We are processing addresses into standardized forms, this is the code to take addresses scored by how many components found and then rescore them using a levenshtein algorithm across similar post codes
The scores are how many components were found in that address divided by the number missed, to return a ratio
The input data, scoreDict, is a dictionary containing arrays of arrays. The first set of arrays is the scores, so there are 12 arrays because there are 12 scores in this file (it adjusts by file). There are then however many addresses fit that score in their own separate arrays stored in that. Don't ask me why I'm doing it that way, my brain is dead
The code correctly goes through each score array and each one is properly filled with the unique elements that make it up. It is not short by any amount, nothing is duplicated, I have checked
When we hit the score that is -1 (this goes to any address where it doesn't fit in some rule so we can't use its post code to find components so no components are found) the loop specifically ONLY DOES EVERY OTHER ADDRESS IN THIS SCORE ARRAY
It doesn't do this to any other score array, I have checked
I have tried changing the number to something else like 99, same issue except one LESS address got rescored, and the rest stayed at the original failing score of 99
I am going insane, can anyone find where in this loop something may be going wrong to cause it to only do every other line. The index counter of line and sc come through in the correct order and do not skip over. I have checked
I am sorry this is not professional, I have been at this one loop for 5 hours
Rescore: function Rescore(scoreDict) {
let tempInc = 0;
//Loop through all scores stored in scoreDict
for (var line in scoreDict) {
let addUpdate = "";
//Loop through each line stored by score
for (var sc in scoreDict[line.toString()]) {
console.log(scoreDict[line.toString()].length);
let possCodes = new Array();
const curLine = scoreDict[line.toString()][sc];
console.log(sc);
const curScore = curLine[1].split(',')[curLine[1].split(',').length-1];
switch (true) {
case curScore == -1:
let postCode = (new RegExp('([A-PR-UWYZ][A-HK-Y]?[0-9][A-Z0-9]?[ ]?[0-9][ABD-HJLNP-UW-Z]{2})', 'i')).exec(curLine[1].replace(/\\n/g, ','));
let areaCode;
//if (curLine.split(',')[curLine.split(',').length-2].includes("REFERENCE")) {
if ((postCode = (new RegExp('(([A-Z][A-Z]?[0-9][A-Z0-9]?(?=[ ]?[0-9][A-Z]{2}))|[0-9]{5})', 'i').exec(postCode))) !== null) {
for (const code in Object.keys(addProper)) {
leven.LoadWords(postCode[0], Object.keys(addProper)[code]);
if (leven.distance < 2) {
//Weight will have adjustment algorithms based on other factors
let weight = 1;
//Add all codes that are close to the same to a temp array
possCodes.push(postCode.input.replace(postCode[0], Object.keys(addProper)[code]).split(',')[0] + "(|W|)" + (leven.distance/weight));
}
}
let highScore = 0;
let candidates = new Array();
//Use the component script from cityprocess to rescore
for (var i=0;i<possCodes.length;i++) {
postValid.add([curLine[1].split(',').slice(0,curLine[1].split(',').length-2) + '(|S|)' + possCodes[i].split("(|W|)")[0]]);
if (postValid.addChunk[0].split('(|S|)')[postValid.addChunk[0].split('(|S|)').length-1] > highScore) {
candidates = new Array();
highScore = postValid.addChunk[0].split('(|S|)')[postValid.addChunk[0].split('(|S|)').length-1];
candidates.push(postValid.addChunk[0]);
} else if (postValid.addChunk[0].split('(|S|)')[postValid.addChunk[0].split('(|S|)').length-1] == highScore) {
candidates.push(postValid.addChunk[0]);
}
}
score.Rescore(curLine, sc, candidates[0]);
}
//} else if (curLine.split(',')[curLine.split(',').length-2].contains("AREA")) {
// leven.LoadWords();
//}
break;
case curScore > 0:
//console.log("That's a pretty good score mate");
break;
}
//console.log(line + ": " + scoreDict[line].length);
}
}
console.log(tempInc)
score.ScoreWrite(score.scoreDict);
}
The issue was that I was calling the loop on the array I was editing, so as each element got removed from the array (rescored and moved into a separate array) it got shorter by that element, resulting in an issue that when the first element was rescored and removed, and then we moved onto the second index which was now the third element, because everything shifted up by 1 index
I fixed it by having it simply enter an empty array for each removed element, so everything kept its index and the array kept its length, and then clear the empty values at a later time in the code

Optimal algorithm for this string decompression

I have been working on an exercise from google's dev tech guide. It is called Compression and Decompression you can check the following link to get the description of the problem Challenge Description.
Here is my code for the solution:
public static String decompressV2 (String string, int start, int times) {
String result = "";
for (int i = 0; i < times; i++) {
inner:
{
for (int j = start; j < string.length(); j++) {
if (isNumeric(string.substring(j, j + 1))) {
String num = string.substring(j, j + 1);
int times2 = Integer.parseInt(num);
String temp = decompressV2(string, j + 2, times2);
result = result + temp;
int next_j = find_next(string, j + 2);
j = next_j;
continue;
}
if (string.substring(j, j + 1).equals("]")) { // Si es un bracket cerrado
break inner;
}
result = result + string.substring(j,j+1);
}
}
}
return result;
}
public static int find_next(String string, int start) {
int count = 0;
for (int i = start; i < string.length(); i++) {
if (string.substring(i, i+1).equals("[")) {
count= count + 1;
}
if (string.substring(i, i +1).equals("]") && count> 0) {
count = count- 1;
continue;
}
if (string.substring(i, i +1).equals("]") && count== 0) {
return i;
}
}
return -111111;
}
I will explain a little bit about the inner workings of my approach. It is a basic solution involves use of simple recursion and loops.
So, let's start from the beggining with a simple decompression:
DevTech.decompressV2("2[3[a]b]", 0, 1);
As you can see, the 0 indicates that it has to iterate over the string at index 0, and the 1 indicates that the string has to be evaluated only once: 1[ 2[3[a]b] ]
The core here is that everytime you encounter a number you call the algorithm again(recursively) and continue where the string insides its brackets ends, that's the find_next function for.
When it finds a close brackets, the inner loop breaks, that's the way I choose to make the stop sign.
I think that would be the main idea behind the algorithm, if you read the code closely you'll get the full picture.
So here are some of my concerns about the way I've written the solution:
I could not find a more clean solution to tell the algorithm were to go next if it finds a number. So I kind of hardcoded it with the find_next function. Is there a way to do this more clean inside the decompress func ?
About performance, It wastes a lot of time by doing the same thing again, when you have a number bigger than 1 at the begging of a bracket.
I am relatively to programming so maybe this code also needs an improvement not in the idea, but in the ways It's written. So would be very grateful to get some suggestions.
This is the approach I figure out but I am sure there are a couple more, I could not think of anyone but It would be great if you could tell your ideas.
In the description it tells you some things that you should be awared of when developing the solutions. They are: handling non-repeated strings, handling repetitions inside, not doing the same job twice, not copying too much. Are these covered by my approach ?
And the last point It's about tets cases, I know that confidence is very important when developing solutions, and the best way to give confidence to an algorithm is test cases. I tried a few and they all worked as expected. But what techniques do you recommend for developing test cases. Are there any softwares?
So that would be all guys, I am new to the community so I am open to suggestions about the how to improve the quality of the question. Cheers!
Your solution involves a lot of string copying that really slows it down. Instead of returning strings that you concatenate, you should pass a StringBuilder into every call and append substrings onto that.
That means you can use your return value to indicate the position to continue scanning from.
You're also parsing repeated parts of the source string more than once.
My solution looks like this:
public static String decompress(String src)
{
StringBuilder dest = new StringBuilder();
_decomp2(dest, src, 0);
return dest.toString();
}
private static int _decomp2(StringBuilder dest, String src, int pos)
{
int num=0;
while(pos < src.length()) {
char c = src.charAt(pos++);
if (c == ']') {
break;
}
if (c>='0' && c<='9') {
num = num*10 + (c-'0');
} else if (c=='[') {
int startlen = dest.length();
pos = _decomp2(dest, src, pos);
if (num<1) {
// 0 repetitions -- delete it
dest.setLength(startlen);
} else {
// copy output num-1 times
int copyEnd = startlen + (num-1) * (dest.length()-startlen);
for (int i=startlen; i<copyEnd; ++i) {
dest.append(dest.charAt(i));
}
}
num=0;
} else {
// regular char
dest.append(c);
num=0;
}
}
return pos;
}
I would try to return a tuple that also contains the next index where decompression should continue from. Then we can have a recursion that concatenates the current part with the rest of the block in the current recursion depth.
Here's JavaScript code. It takes some thought to encapsulate the order of operations that reflects the rules.
function f(s, i=0){
if (i == s.length)
return ['', i];
// We might start with a multiplier
let m = '';
while (!isNaN(s[i]))
m = m + s[i++];
// If we have a multiplier, we'll
// also have a nested expression
if (s[i] == '['){
let result = '';
const [word, nextIdx] = f(s, i + 1);
for (let j=0; j<Number(m); j++)
result = result + word;
const [rest, end] = f(s, nextIdx);
return [result + rest, end]
}
// Otherwise, we may have a word,
let word = '';
while (isNaN(s[i]) && s[i] != ']' && i < s.length)
word = word + s[i++];
// followed by either the end of an expression
// or another multiplier
const [rest, end] = s[i] == ']' ? ['', i + 1] : f(s, i);
return [word + rest, end];
}
var strs = [
'2[3[a]b]',
'10[a]',
'3[abc]4[ab]c',
'2[2[a]g2[r]]'
];
for (const s of strs){
console.log(s);
console.log(JSON.stringify(f(s)));
console.log('');
}

Counter for two binary strings C++

I am trying to count two binary numbers from string. The maximum number of counting digits have to be 253. Short numbers works, but when I add there some longer numbers, the output is wrong. The example of bad result is "10100101010000111111" with "000011010110000101100010010011101010001101011100000000111000000000001000100101101111101000111001000101011010010111000110".
#include <iostream>
#include <stdlib.h>
using namespace std;
bool isBinary(string b1,string b2);
int main()
{
string b1,b2;
long binary1,binary2;
int i = 0, remainder = 0, sum[254];
cout<<"Get two binary numbers:"<<endl;
cin>>b1>>b2;
binary1=atol(b1.c_str());
binary2=atol(b2.c_str());
if(isBinary(b1,b2)==true){
while (binary1 != 0 || binary2 != 0){
sum[i++] =(binary1 % 10 + binary2 % 10 + remainder) % 2;
remainder =(binary1 % 10 + binary2 % 10 + remainder) / 2;
binary1 = binary1 / 10;
binary2 = binary2 / 10;
}
if (remainder != 0){
sum[i++] = remainder;
}
--i;
cout<<"Result: ";
while (i >= 0){
cout<<sum[i--];
}
cout<<endl;
}else cout<<"Wrong input"<<endl;
return 0;
}
bool isBinary(string b1,string b2){
bool rozhodnuti1,rozhodnuti2;
for (int i = 0; i < b1.length();i++) {
if (b1[i]!='0' && b1[i]!='1') {
rozhodnuti1=false;
break;
}else rozhodnuti1=true;
}
for (int k = 0; k < b2.length();k++) {
if (b2[k]!='0' && b2[k]!='1') {
rozhodnuti2=false;
break;
}else rozhodnuti2=true;
}
if(rozhodnuti1==false || rozhodnuti2==false){ return false;}
else{ return true;}
}
One of the problems might be here: sum[i++]
This expression, as it is, first returns the value of i and then increases it by one.
Did you do it on purporse?
Change it to ++i.
It'd help if you could also post the "bad" output, so that we can try to move backward through the code starting from it.
EDIT 2015-11-7_17:10
Just to be sure everything was correct, I've added a cout to check what binary1 and binary2 contain after you assing them the result of the atol function: they contain the integer numbers 547284487 and 18333230, which obviously dont represent the correct binary-to-integer transposition of the two 01 strings you presented in your post.
Probably they somehow exceed the capacity of atol.
Also, the result of your "math" operations bring to an even stranger result, which is 6011111101, which obviously doesnt make any sense.
What do you mean, exactly, when you say you want to count these two numbers? Maybe you want to make a sum? I guess that's it.
But then, again, what you got there is two signed integer numbers and not two binaries, which means those %10 and %2 operations are (probably) misused.
EDIT 2015-11-07_17:20
I've tried to use your program with small binary strings and it actually works; with small binary strings.
It's a fact(?), at this point, that atol cant handle numerical strings that long.
My suggestion: use char arrays instead of strings and replace 0 and 1 characters with numerical values (if (bin1[i]){bin1[i]=1;}else{bin1[i]=0}) with which you'll be able to perform all the math operations you want (you've already written a working sum function, after all).
Once done with the math, you can just convert the char array back to actual characters for 0 and 1 and cout it on the screen.
EDIT 2015-11-07_17:30
Tested atol on my own: it correctly converts only strings that are up to 10 characters long.
Anything beyond the 10th character makes the function go crazy.

Is it possible to do a Levenshtein distance in Excel without having to resort to Macros?

Let me explain.
I have to do some fuzzy matching for a company, so ATM I use a levenshtein distance calculator, and then calculate the percentage of similarity between the two terms. If the terms are more than 80% similar, Fuzzymatch returns "TRUE".
My problem is that I'm on an internship, and leaving soon. The people who will continue doing this do not know how to use excel with macros, and want me to implement what I did as best I can.
So my question is : however inefficient the function may be, is there ANY way to make a standard function in Excel that will calculate what I did before, without resorting to macros ?
Thanks.
If you came about this googling something like
levenshtein distance google sheets
I threw this together, with the code comment from milot-midia on this gist (https://gist.github.com/andrei-m/982927 - code under MIT license)
From Sheets in the header menu, Tools -> Script Editor
Name the project
The name of the function (not the project) will let you use the func
Paste the following code
function Levenshtein(a, b) {
if(a.length == 0) return b.length;
if(b.length == 0) return a.length;
// swap to save some memory O(min(a,b)) instead of O(a)
if(a.length > b.length) {
var tmp = a;
a = b;
b = tmp;
}
var row = [];
// init the row
for(var i = 0; i <= a.length; i++){
row[i] = i;
}
// fill in the rest
for(var i = 1; i <= b.length; i++){
var prev = i;
for(var j = 1; j <= a.length; j++){
var val;
if(b.charAt(i-1) == a.charAt(j-1)){
val = row[j-1]; // match
} else {
val = Math.min(row[j-1] + 1, // substitution
prev + 1, // insertion
row[j] + 1); // deletion
}
row[j - 1] = prev;
prev = val;
}
row[a.length] = prev;
}
return row[a.length];
}
You should be able to run it from a spreadsheet with
=Levenshtein(cell_1,cell_2)
While it can't be done in a single formula for any reasonably-sized strings, you can use formulas alone to compute the Levenshtein Distance between strings using a worksheet.
Here is an example that can handle strings up to 15 characters, it could be easily expanded for more:
https://docs.google.com/spreadsheet/ccc?key=0AkZy12yffb5YdFNybkNJaE5hTG9VYkNpdW5ZOWowSFE&usp=sharing
This isn't practical for anything other than ad-hoc comparisons, but it does do a decent job of showing how the algorithm works.
looking at the previous answers to calculating Levenshtein distance, I think it would be impossible to create it as a formula.
Take a look at the code here
Actually, I think I just found a workaround. I was adding it in the wrong part of the code...
Adding this line
} else if(b.charAt(i-1)==a.charAt(j) && b.charAt(i)==a.charAt(j-1)){
val = row[j-1]-0.33; //transposition
so it now reads
if(b.charAt(i-1) == a.charAt(j-1)){
val = row[j-1]; // match
} else if(b.charAt(i-1)==a.charAt(j) && b.charAt(i)==a.charAt(j-1)){
val = row[j-1]-0.33; //transposition
} else {
val = Math.min(row[j-1] + 1, // substitution
prev + 1, // insertion
row[j] + 1); // deletion
}
Seems to fix the problem. Now 'biulding' is 92% accurate and 'bilding' is 88%. (whereas with the original formula 'biulding' was only 75%... despite being closer to the correct spelling of building)

Resources