AS3 "Advanced" string manipulation - string

I'm making an AIR dictionary and I have a(nother) problem. The main app is ready to go and works perfectly, but when I tested it I noticed that it could be better. A bit of context: the language I'm translating from (Ancient Egyptian) does not use punctuation, so a phrase canlooklikethis. Add to that the sheer complexity of the glyph system (6000+ glyphs).
Right now my app works like this:
The user chooses the glyphs composing his/her word.
The app transforms those glyphs into alphanumerical codes (A1 - D36 - X1A, etc.).
The code compares the resulting string (say: A5AD36) to a list of XML values.
If the word is found (A5AD36 = priestess of Bast), the user gets the translation. If not, s/he gets all the possible words corresponding to the two glyphs (A5A & D36).
If the user knows the string is a word, no problem. But if s/he enters a few words, s/he'll get a few more choices than hoped (example: query = A1A5AD36 gets A1 - A5A - D36 - A5AD36).
What I would like to do is this:
query = A1A5AD36 //word/phrase to be translated;
varArray = [A1, A5A, D36] //variables containing the value of the glyphs.
Corresponding possible words from the xml : A1, A5A, D36, A5AD36.
Possible phrases: A1 A5A D36 / A1 A5AD36 / A1A5A D36 / A1A5AD36.
Possible phrases with only legal words: A1 A5A D36 / A1 A5AD36.
I'm not sure I'm being really clear, but to keep things simple: I'd like to get all the possible phrases containing only legal words and filter out the others.
(Example in English: TOBREAKFAST. Legal = to break fast / to breakfast. Illegal = tobreak fast.)
I've managed to get all the possible words, but not the rest. Right now, when I run my app, I have an array containing A1 - A5A - D36 - A5AD36. But I'm stuck going forward.
Does anyone have an idea ? Thank you :)
function fnSearch(e: Event): void {
    var val: int = sp.length; //sp is an array filled with variables containing the code for each used glyph.
    for (var i: int = 0; i < val; i++) { //repeat for every glyph used.
        var X: String = ""; //variable created to compare with xml dictionary
        for (var i2: int = 0; i2 + i < val; i2++) { //stop at the last glyph so sp[i2 + i] never reads past the end of sp.
            // if it's the first time, use the first glyph-code, else the one after last used.
            if (X == "") {
                X = sp[i];
            } else {
                X = X + sp[i2 + i];
            }
            xmlresult = myXML.mot.cd; //xmlresult = alphanumerical codes corresponding to words from XMLList already imported
            trad = myXML.mot.td; //same with translations.
            for (var i3: int = 0; i3 < xmlresult.length(); i3++) { //check if element X is in dictionary
                var codeElement: XML = xmlresult[i3]; //variable to compare with X
                var tradElement: XML = trad[i3]; //variable corresponding to codeElement
                if (X == codeElement.toString()) { //if codeElement[i3] is legal, add it to array of legal words.
                    checkArray.push(codeElement); //checkArray is an array filled with legal words.
                }
            }
        }
    }
    var iT2: int = 500; //iT2 set to unreachable value for next lines.
    for (var iT: int = 0; iT < checkArray.length; iT++) { //check if the word searched by user is in the results.
        if (checkArray[iT] == query) {
            iT2 = iT;
        }
    }
    if (iT2 != 500) { //if complete query is found, put it on top of the array so it appears on top of the results.
        var oldFirst: String = checkArray[0];
        checkArray[0] = checkArray[iT2];
        checkArray[iT2] = oldFirst;
    }
    results.visible = true; //make result list visible
    loadingResults.visible = false; //loading screen
    fnPossibleResults(null); //update result list.
}
I end up with an array of variables containing the glyph codes (sp) and another with all the possible legal words (checkArray). What I don't know is how to combine the two to build the legal phrases described above.
If there were only three glyphs, I could probably find a way, but the user can enter up to 60 glyphs.
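One way to approach this is the classic "word break" recursion: starting at a given glyph, grow a candidate word one glyph at a time and, whenever it is a legal word, recurse on the remaining glyphs. The sketch below is in Python rather than AS3 and is only an illustration; `legal_phrases`, `split_from` and the sample data are hypothetical names, and `legal_words` stands in for the codes read from the XML.

```python
from functools import lru_cache


def legal_phrases(glyphs, dictionary):
    """Return every way to split `glyphs` into consecutive legal words."""
    n = len(glyphs)

    @lru_cache(maxsize=None)
    def split_from(start):
        # All legal phrases covering glyphs[start:], each as a tuple of words.
        if start == n:
            return ((),)  # exactly one way to split nothing: the empty phrase
        phrases = []
        word = ""
        for end in range(start, n):
            word += glyphs[end]  # grow the candidate word one glyph at a time
            if word in dictionary:
                for rest in split_from(end + 1):
                    phrases.append((word,) + rest)
        return tuple(phrases)

    return [" ".join(phrase) for phrase in split_from(0)]


sp = ["A1", "A5A", "D36"]                      # glyph codes chosen by the user
legal_words = {"A1", "A5A", "D36", "A5AD36"}   # stand-in for the XML codes
print(legal_phrases(sp, legal_words))          # ['A1 A5A D36', 'A1 A5AD36']
```

The memoisation means each starting position is solved once, which matters with 60 glyphs, although the number of phrases returned can still grow quickly if many combinations are legal.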


I cannot find out why this code keeps skipping a loop

Some background on what is going on:
We are processing addresses into standardized forms. This is the code that takes addresses scored by how many components were found and rescores them using a Levenshtein algorithm across similar post codes
Each score is the number of components found in that address divided by the number missed, which gives a ratio
The input data, scoreDict, is a dictionary containing arrays of arrays. The first set of arrays is the scores, so there are 12 arrays because there are 12 scores in this file (it adjusts by file). There are then however many addresses fit that score in their own separate arrays stored in that. Don't ask me why I'm doing it that way, my brain is dead
The code correctly goes through each score array and each one is properly filled with the unique elements that make it up. It is not short by any amount, nothing is duplicated, I have checked
When we hit the score that is -1 (this goes to any address where it doesn't fit in some rule so we can't use its post code to find components so no components are found) the loop specifically ONLY DOES EVERY OTHER ADDRESS IN THIS SCORE ARRAY
It doesn't do this to any other score array, I have checked
I have tried changing the number to something else like 99, same issue except one LESS address got rescored, and the rest stayed at the original failing score of 99
I am going insane, can anyone find where in this loop something may be going wrong to cause it to only do every other line. The index counter of line and sc come through in the correct order and do not skip over. I have checked
I am sorry this is not professional, I have been at this one loop for 5 hours
Rescore: function Rescore(scoreDict) {
    let tempInc = 0;
    //Loop through all scores stored in scoreDict
    for (var line in scoreDict) {
        let addUpdate = "";
        //Loop through each line stored by score
        for (var sc in scoreDict[line.toString()]) {
            console.log(scoreDict[line.toString()].length);
            let possCodes = new Array();
            const curLine = scoreDict[line.toString()][sc];
            console.log(sc);
            const curScore = curLine[1].split(',')[curLine[1].split(',').length-1];
            switch (true) {
                case curScore == -1:
                    let postCode = (new RegExp('([A-PR-UWYZ][A-HK-Y]?[0-9][A-Z0-9]?[ ]?[0-9][ABD-HJLNP-UW-Z]{2})', 'i')).exec(curLine[1].replace(/\\n/g, ','));
                    let areaCode;
                    //if (curLine.split(',')[curLine.split(',').length-2].includes("REFERENCE")) {
                    if ((postCode = (new RegExp('(([A-Z][A-Z]?[0-9][A-Z0-9]?(?=[ ]?[0-9][A-Z]{2}))|[0-9]{5})', 'i').exec(postCode))) !== null) {
                        for (const code in Object.keys(addProper)) {
                            leven.LoadWords(postCode[0], Object.keys(addProper)[code]);
                            if (leven.distance < 2) {
                                //Weight will have adjustment algorithms based on other factors
                                let weight = 1;
                                //Add all codes that are close to the same to a temp array
                                possCodes.push(postCode.input.replace(postCode[0], Object.keys(addProper)[code]).split(',')[0] + "(|W|)" + (leven.distance/weight));
                            }
                        }
                        let highScore = 0;
                        let candidates = new Array();
                        //Use the component script from cityprocess to rescore
                        for (var i=0;i<possCodes.length;i++) {
                            postValid.add([curLine[1].split(',').slice(0,curLine[1].split(',').length-2) + '(|S|)' + possCodes[i].split("(|W|)")[0]]);
                            if (postValid.addChunk[0].split('(|S|)')[postValid.addChunk[0].split('(|S|)').length-1] > highScore) {
                                candidates = new Array();
                                highScore = postValid.addChunk[0].split('(|S|)')[postValid.addChunk[0].split('(|S|)').length-1];
                                candidates.push(postValid.addChunk[0]);
                            } else if (postValid.addChunk[0].split('(|S|)')[postValid.addChunk[0].split('(|S|)').length-1] == highScore) {
                                candidates.push(postValid.addChunk[0]);
                            }
                        }
                        score.Rescore(curLine, sc, candidates[0]);
                    }
                    //} else if (curLine.split(',')[curLine.split(',').length-2].contains("AREA")) {
                    //    leven.LoadWords();
                    //}
                    break;
                case curScore > 0:
                    //console.log("That's a pretty good score mate");
                    break;
            }
            //console.log(line + ": " + scoreDict[line].length);
        }
    }
    console.log(tempInc)
    score.ScoreWrite(score.scoreDict);
}
The issue was that I was looping over the same array I was editing. As each element was rescored, it was removed from that array and moved into a separate one, so the array got shorter and everything after it shifted up by one index. After the first element was rescored and removed, the loop moved on to index 1, which by then held what had originally been the third element, so every other address was skipped
I fixed it by putting an empty array in place of each removed element, so everything kept its index and the array kept its length, and then clearing out the empty values later in the code
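The same effect is easy to reproduce in isolation. The sketch below is in Python rather than JavaScript and uses hypothetical function names; it uses `None` as the placeholder where the fix above uses an empty array, but the idea is the same: keep every index stable while looping and compact the list afterwards.

```python
def rescore_skipping(addresses):
    """Buggy pattern: removes items from the list it is iterating over."""
    processed = []
    for address in addresses:
        processed.append(address)
        addresses.remove(address)  # shifts every later item left by one
    return processed


def rescore_with_placeholders(addresses):
    """Fixed pattern: overwrite with a placeholder, compact afterwards."""
    processed = []
    for index, address in enumerate(addresses):
        processed.append(address)
        addresses[index] = None  # length and indexes stay stable
    # Clear the placeholders once the loop has finished.
    addresses[:] = [a for a in addresses if a is not None]
    return processed


print(rescore_skipping(["a", "b", "c", "d"]))           # ['a', 'c']
print(rescore_with_placeholders(["a", "b", "c", "d"]))  # ['a', 'b', 'c', 'd']
```

Iterating over a copy of the array (or iterating backwards) would avoid the shift as well; the placeholder approach is shown because it is the one described above.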

Is there a faster method to find dictionary keys with wildcards in the middle?

Let's say I have a dictionary with strings of 0s, 1s, and '*' as wildcards for my key value.
For example, my dictionary is structured as such:
{'010*10000':'foo', '100*1*000':'bar'......}
Each dictionary key has a fixed string length; however, there are wildcards within the key, represented as '*' characters. Thus, looking up either '010110000' or '010010000' should return 'foo'.
The problem lies in the size of my dictionary. It has over 500,000 entries, so iterating over every key to check for a match is O(n) and takes far too long.
Ideally, I would like to find a way to just check if a value such as '010110000' is in the dictionary, similar to the .get() method of regular Python dictionaries without wildcards.
I've already tried iterating over my dictionary using fnmatch, as suggested in "Wildcard in dictionary key":
import fnmatch

for k in my_dict.keys():
    if fnmatch.fnmatch(string_of_1s_and_0s, k):
        print(my_dict[k])
        # Do some operation here if we have found the matching key pair... and then break.
        break
However, it's just too slow with O(n) complexity. Is there any way to implement get() but with wildcards?
dicts are hash code based; the hash code, if implemented correctly, will differ wildly for a difference of just one character. There is no way to make a dict do what you want, but what you're doing is probably best done with something other than a dict in the first place. Have you considered a relational database, where the LIKE operator could do something like this? It might still have to scan a large part of the DB, but ideally it could use anchors at one end or the other to at least narrow the search to matching prefixes/suffixes.
Rotate the original pattern left (by taking characters from the start and putting them at the end) while keeping track of the rotate count; like this:
'010*10000' -> '*10000010', rotate_count = 3
'100*1*000' -> '*1*000100', rotate_count = 3
Then split it into a "complex part" and a "simple part", and determine the length of the simple part, like this:
'010*10000' -> '*10000010', rotate_count = 3
complex = '*', simple = '10000010', simple_length = 8
'100*1*000' -> '*1*000100', rotate_count = 3
complex = '*1*', simple = '000100', simple_length = 6
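The rotate-and-split step can be sketched in Python as follows. This is only an illustration of the two examples above; `rotate_and_split` is a hypothetical helper name, and the handling of patterns without any wildcard is an assumption.

```python
def rotate_and_split(pattern):
    """Rotate `pattern` left so it starts at its first wildcard, then split it.

    Returns (rotate_count, complex_part, simple_part).
    """
    first = pattern.find("*")
    if first == -1:
        # Assumption: with no wildcard, nothing is rotated and all of it is "simple".
        return 0, "", pattern
    rotated = pattern[first:] + pattern[:first]
    last = rotated.rfind("*")
    # Everything up to the last wildcard is "complex"; the rest is wildcard-free.
    return first, rotated[: last + 1], rotated[last + 1:]


print(rotate_and_split("010*10000"))  # (3, '*', '10000010')
print(rotate_and_split("100*1*000"))  # (3, '*1*', '000100')
```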
If the fixed length of the strings is 16, then there will be 16 possible values of rotate_count, and for each one there will be 16 - rotate_count possible values of simple_length. This can be described as a nested loop:
for(rotate_count = 0; rotate_count < 16; rotate_count++) {
    for(simple_length = 0; simple_length < 16 - rotate_count; simple_length++) {
    }
}
You can associate an "array of entries" with this, like:
entry_number = 0;
for(rotate_count = 0; rotate_count < 16; rotate_count++) {
    for(simple_length = 0; simple_length < 16 - rotate_count; simple_length++) {
        entry_number++;
    }
}
Then you can use the entry number to find a hash table, like:
entry_number = 0;
for(rotate_count = 0; rotate_count < 16; rotate_count++) {
    for(simple_length = 0; simple_length < 16 - rotate_count; simple_length++) {
        hash_table = array_of_hash_tables[entry_number];
        entry_number++;
    }
}
You can also rotate the string you're looking for by the rotate_count and extract simple_length characters from that, convert those characters into a hash, and use it to find a list of entries from the hash table, like:
entry_number = 0;
for(rotate_count = 0; rotate_count < 16; rotate_count++) {
    rotated_string = rotate_string(original_string, rotate_count);
    for(simple_length = 0; simple_length < 16 - rotate_count; simple_length++) {
        hash_table = array_of_hash_tables[entry_number];
        if(hash_table != NULL) {
            hash = get_simple_hash(rotated_string, simple_length);
            list = hash_table[hash];
            // Use "list" and "original string" to do the hard stuff here...
        }
        entry_number++;
    }
}
This will quickly eliminate lots of entries (where the start and end don't match) and give you a list of "potential matches" where you'd have to check the part containing wild cards against the original string to determine if there is/isn't an actual match.
Note that if the characters are "ones and zeros" this can be improved by converting "strings containing binary digits" into integers.
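As a rough illustration of the integer idea, here is a Python sketch. Note that it is a simpler, different technique from the rotation scheme above: it groups patterns by their wildcard layout (a bit mask) and does one ordinary dict lookup per distinct layout. The names `build_index` and `lookup` are hypothetical, and it only pays off if the number of distinct wildcard layouts is much smaller than the number of entries.

```python
from collections import defaultdict


def build_index(patterns):
    """Group patterns by (length, wildcard mask); map fixed bits -> value."""
    index = defaultdict(dict)
    for pattern, value in patterns.items():
        # Mask has a 1 wherever the pattern fixes a bit, 0 at each wildcard.
        mask = int("".join("0" if c == "*" else "1" for c in pattern), 2)
        bits = int(pattern.replace("*", "0"), 2)
        index[(len(pattern), mask)][bits] = value
    return index


def lookup(index, key, default=None):
    """Return the value of the first pattern matching `key`, else `default`."""
    number = int(key, 2)
    for (length, mask), table in index.items():
        if length == len(key) and (number & mask) in table:
            return table[number & mask]
    return default


my_dict = {"010*10000": "foo", "100*1*000": "bar"}
index = build_index(my_dict)
print(lookup(index, "010110000"))  # foo
print(lookup(index, "100111000"))  # bar
```

The cost of a lookup is proportional to the number of distinct masks rather than the number of keys.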

Grabbing text from webpage and storing as variable

On the webpage
http://services.runescape.com/m=itemdb_rs/Armadyl_chaps/viewitem.ws?obj=19463
It lists prices for a particular item in a game, I wanted to grab the "Current guide price:" of said item, and store it as a variable so I could output it in a google spreadsheet. I only want the number, currently it is "643.8k", but I am not sure how to grab specific text like that.
Since the number is in "k" form, I can't graph it; it would have to be something like 643,800 to be graphable. I have a formula for that, so my second question is whether it's possible to apply a formula to the number pulled and then store the result as the final output.
-EDIT-
This is what I have so far. It's not working and I'm not sure why.
function pullRuneScape() {
    var page = UrlFetchApp.fetch("http://services.runescape.com/m=itemdb_rs/Armadyl_chaps/viewitem.ws?obj=19463").getContentText();
    var number = page.match(/Current guide price:<\/th>\n(\d*)/)[1];
    SpreadsheetApp.getActive().getSheetByName('RuneScape').appendRow([new Date(), number]);
}
Your regex is wrong. I tested this one successfully:
var number = page.match(/Current guide price:<\/th>\s*<td>([^<]*)<\/td>/m)[1];
What it does:
Current guide price:<\/th> find Current guide price: and the closing th tag
\s*<td> allow whitespace between tags, find opening td tag
([^<]*) build a group and match everything except this char <
<\/td> match the closing td tag
/m match multiline
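The same pattern can be tried outside Apps Script. The Python sketch below runs an equivalent regex against a small sample string; the sample markup is an assumption about what the relevant part of the page looks like, and `extract_guide_price` is a hypothetical helper name.

```python
import re

# Same idea as the regex above: label, closing th, optional whitespace, td contents.
PRICE_PATTERN = re.compile(r"Current guide price:</th>\s*<td>([^<]*)</td>")


def extract_guide_price(page):
    """Return the text of the cell following the label, or None if absent."""
    match = PRICE_PATTERN.search(page)
    return match.group(1).strip() if match else None


# Hypothetical fragment standing in for the fetched page.
sample_page = "<tr>\n<th>Current guide price:</th>\n<td>643.8k</td>\n</tr>"
print(extract_guide_price(sample_page))  # 643.8k
```

Returning None instead of indexing the match directly avoids an error when the page layout changes and the pattern no longer matches.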
Use UrlFetchApp to get the page [1]. That returns an HTTPResponse whose body you can read with getContentText() (or getBlob()) [2]. Once you have the text you can use regular expressions. In this case, just search for 'Current guide price:' and then read the next row. To remove the 'k' you can just replace it with a regex like this:
'123k'.replace(/k/g,'')
This returns just '123'.
[1] https://developers.google.com/apps-script/reference/url-fetch/
[2] https://developers.google.com/apps-script/reference/url-fetch/http-response
Obviously, you are not getting anything because the regexp is wrong. I'm no regexp expert but I was able to extract the number using basic string manipulation
var page = UrlFetchApp.fetch("http://services.runescape.com/m=itemdb_rs/Armadyl_chaps/viewitem.ws?obj=19463").getContentText();
var TD = "<td>";
var start = page.indexOf('Current guide price');
start = page.indexOf(TD, start);
var end = page.indexOf('</td>',start);
var number = page.substring (start + TD.length , end);
Logger.log(number);
Then, I wrote a function to convert k,m etc. to the corresponding multiplying factors.
function getMultiplyingFactor(symbol){
    switch(symbol){
        case 'k':
        case 'K':
            return 1000;
        case 'm':
        case 'M':
            return 1000 * 1000;
        case 'g':
        case 'G':
            return 1000 * 1000 * 1000;
        default:
            return 1;
    }
}
Finally, tie the two together
function pullRuneScape() {
    var page = UrlFetchApp.fetch("http://services.runescape.com/m=itemdb_rs/Armadyl_chaps/viewitem.ws?obj=19463").getContentText();
    var TD = "<td>";
    var start = page.indexOf('Current guide price');
    start = page.indexOf(TD, start);
    var end = page.indexOf('</td>',start);
    var number = page.substring (start + TD.length , end);
    Logger.log(number);
    var numericPart = number.substring(0, number.length -1);
    var multiplierSymbol = number.substring(number.length -1 , number.length);
    var multiplier = getMultiplyingFactor(multiplierSymbol);
    var fullNumber = multiplier == 1 ? number : numericPart * multiplier;
    Logger.log(fullNumber);
}
Certainly, not the optimal way of doing things but it works.
Basically, I parse the HTML page as you did (with a corrected regex) and split the string into the numeric part and the multiplier (k = 1000). Finally, I return the extracted number. This function can be used in Google Docs.
function pullRuneScape() {
    var pageContent = UrlFetchApp.fetch("http://services.runescape.com/m=itemdb_rs/Armadyl_chaps/viewitem.ws?obj=19463").getContentText();
    var matched = pageContent.match(/Current guide price:<.th>\n<td>(\d+\.*\d*)([k]{0,1})/);
    var numberAsString = matched[1];
    var multiplier = "";
    if (matched.length == 3) {
        multiplier = matched[2];
    }
    // Declared with var so it does not leak into the global scope.
    var number = convertNumber(numberAsString, multiplier);
    return number;
}
function convertNumber(numberAsString, multiplier) {
    var number = Number(numberAsString);
    if (multiplier == 'k') {
        number *= 1000;
    }
    return number;
}
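For comparison, the suffix handling on its own can be sketched in Python as below. This is a minimal sketch, not a drop-in replacement for the Apps Script functions: `expand_price` is a hypothetical name, the k/m/g factors mirror the switch statement in the earlier answer, and stripping thousands separators is an added assumption. `Decimal` is used so that a value such as 643.8 scales to exactly 643800.

```python
from decimal import Decimal

# Factors mirror the k/m/g switch statement from the earlier answer.
MULTIPLIERS = {"k": 1_000, "m": 1_000_000, "g": 1_000_000_000}


def expand_price(text):
    """Turn a price such as '643.8k' into a plain integer such as 643800."""
    text = text.strip().replace(",", "")  # assumption: commas are thousands separators
    if not text:
        raise ValueError("empty price")
    suffix = text[-1].lower()
    if suffix in MULTIPLIERS:
        return int(Decimal(text[:-1]) * MULTIPLIERS[suffix])
    return int(Decimal(text))  # no suffix: the value is already in full


print(expand_price("643.8k"))  # 643800
print(expand_price("950"))     # 950
```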

Is it possible to do a Levenshtein distance in Excel without having to resort to Macros?

Let me explain.
I have to do some fuzzy matching for a company, so ATM I use a levenshtein distance calculator, and then calculate the percentage of similarity between the two terms. If the terms are more than 80% similar, Fuzzymatch returns "TRUE".
My problem is that I'm on an internship, and leaving soon. The people who will continue doing this do not know how to use excel with macros, and want me to implement what I did as best I can.
So my question is : however inefficient the function may be, is there ANY way to make a standard function in Excel that will calculate what I did before, without resorting to macros ?
Thanks.
If you came about this googling something like
levenshtein distance google sheets
I threw this together, with the code comment from milot-midia on this gist (https://gist.github.com/andrei-m/982927 - code under MIT license)
In Sheets, open Tools -> Script Editor from the header menu
Name the project
The name of the function (not the project) is what you will call from the spreadsheet
Paste the following code
function Levenshtein(a, b) {
    if(a.length == 0) return b.length;
    if(b.length == 0) return a.length;
    // swap to save some memory O(min(a,b)) instead of O(a)
    if(a.length > b.length) {
        var tmp = a;
        a = b;
        b = tmp;
    }
    var row = [];
    // init the row
    for(var i = 0; i <= a.length; i++){
        row[i] = i;
    }
    // fill in the rest
    for(var i = 1; i <= b.length; i++){
        var prev = i;
        for(var j = 1; j <= a.length; j++){
            var val;
            if(b.charAt(i-1) == a.charAt(j-1)){
                val = row[j-1]; // match
            } else {
                val = Math.min(row[j-1] + 1, // substitution
                               prev + 1,     // insertion
                               row[j] + 1);  // deletion
            }
            row[j - 1] = prev;
            prev = val;
        }
        row[a.length] = prev;
    }
    return row[a.length];
}
You should be able to run it from a spreadsheet with
=Levenshtein(cell_1,cell_2)
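To connect this back to the 80% rule in the question, here is a Python sketch of the same single-row algorithm plus a threshold check. The similarity formula (1 minus distance divided by the longer length) is an assumption, since the question does not say how the percentage is computed, and `fuzzy_match` is a hypothetical name.

```python
def levenshtein(a, b):
    """Edit distance using a single row of length min(len(a), len(b)) + 1."""
    if len(a) > len(b):
        a, b = b, a  # keep the row as short as possible
    row = list(range(len(a) + 1))
    for i, char_b in enumerate(b, start=1):
        prev_diag = row[0]  # value diagonally up-left of the current cell
        row[0] = i
        for j, char_a in enumerate(a, start=1):
            cost = 0 if char_a == char_b else 1
            current = min(prev_diag + cost,  # match or substitution
                          row[j - 1] + 1,    # insertion
                          row[j] + 1)        # deletion
            prev_diag = row[j]
            row[j] = current
    return row[len(a)]


def fuzzy_match(a, b, threshold=0.8):
    """True when the two terms are more than `threshold` similar."""
    longest = max(len(a), len(b))
    if longest == 0:
        return True  # two empty strings are identical
    similarity = 1 - levenshtein(a, b) / longest  # assumed definition
    return similarity > threshold


print(levenshtein("kitten", "sitting"))    # 3
print(fuzzy_match("building", "bilding"))  # True
```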
While it can't be done in a single formula for any reasonably-sized strings, you can use formulas alone to compute the Levenshtein Distance between strings using a worksheet.
Here is an example that can handle strings up to 15 characters, it could be easily expanded for more:
https://docs.google.com/spreadsheet/ccc?key=0AkZy12yffb5YdFNybkNJaE5hTG9VYkNpdW5ZOWowSFE&usp=sharing
This isn't practical for anything other than ad-hoc comparisons, but it does do a decent job of showing how the algorithm works.
looking at the previous answers to calculating Levenshtein distance, I think it would be impossible to create it as a formula.
Take a look at the code here
Actually, I think I just found a workaround. I was adding it in the wrong part of the code...
Adding this line
} else if(b.charAt(i-1)==a.charAt(j) && b.charAt(i)==a.charAt(j-1)){
val = row[j-1]-0.33; //transposition
so it now reads
if(b.charAt(i-1) == a.charAt(j-1)){
val = row[j-1]; // match
} else if(b.charAt(i-1)==a.charAt(j) && b.charAt(i)==a.charAt(j-1)){
val = row[j-1]-0.33; //transposition
} else {
val = Math.min(row[j-1] + 1, // substitution
prev + 1, // insertion
row[j] + 1); // deletion
}
Seems to fix the problem. Now 'biulding' is 92% accurate and 'bilding' is 88%. (whereas with the original formula 'biulding' was only 75%... despite being closer to the correct spelling of building)
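If the goal is simply to stop penalising swapped neighbouring letters twice, a more conventional alternative to the 0.33 adjustment is the optimal string alignment (restricted Damerau-Levenshtein) distance, which counts an adjacent transposition as one edit. The Python sketch below shows that different, standard technique, not the workaround above; `osa_distance` is a hypothetical name and it uses a full matrix for clarity rather than a single row.

```python
def osa_distance(a, b):
    """Optimal string alignment distance: Levenshtein plus adjacent swaps."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # match or substitution
            # Adjacent transposition: "ab" versus "ba" counts as one edit.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


print(osa_distance("building", "biulding"))  # 1
print(osa_distance("building", "bilding"))   # 1
```

With this distance both misspellings are one edit away from 'building', so any percentage derived from it treats them the same.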

AutoFit Columns Width using jxl library in java [duplicate]

How to autofit content in cell using jxl api?
I know this is an old question at this point, but I was looking for the solution to this and thought I would post it in case someone else needs it.
CellView Auto-Size
I'm not sure why the FAQ doesn't mention this, because it very clearly exists in the docs.
My code looked like the following:
for(int x=0;x<c;x++)
{
    cell=sheet.getColumnView(x);
    cell.setAutosize(true);
    sheet.setColumnView(x, cell);
}
c stores the number of columns created
cell is just a temporary placeholder for the returned CellView object
sheet is my WritableSheet object
The API warns that this is a processor-intensive function, so it's probably not ideal for large files. But for a small file like mine (<100 rows) it took no noticeable time.
Hope this helps someone.
The method is self explanatory and commented:
// Requires: import jxl.Cell; import jxl.CellView; import jxl.write.WritableSheet;
private void sheetAutoFitColumns(WritableSheet sheet) {
    for (int i = 0; i < sheet.getColumns(); i++) {
        Cell[] cells = sheet.getColumn(i);
        int longestStrLen = -1;

        if (cells.length == 0)
            continue;

        /* Find the widest cell in the column. */
        for (int j = 0; j < cells.length; j++) {
            String str = cells[j].getContents();
            /* Check for null/empty before measuring the string. */
            if (str == null || str.isEmpty())
                continue;
            int len = str.trim().length();
            if (len > longestStrLen)
                longestStrLen = len;
        }

        /* If not found, skip the column. */
        if (longestStrLen == -1)
            continue;

        /* If wider than the max width, crop width */
        if (longestStrLen > 255)
            longestStrLen = 255;

        CellView cv = sheet.getColumnView(i);
        cv.setSize(longestStrLen * 256 + 100); /* Every character is 256 units wide, so scale it. */
        sheet.setColumnView(i, cv);
    }
}
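The width arithmetic in that method is independent of jxl, so it can be checked on its own. The Python sketch below applies the same rule (longest trimmed string, capped at 255 characters, 256 units per character plus 100 units of padding) to plain rows of strings; `column_widths` and the constant names are hypothetical.

```python
MAX_CHARS = 255        # cap taken from the method above
UNITS_PER_CHAR = 256   # one character is 256 width units
PADDING_UNITS = 100    # extra padding added by the method above


def column_widths(rows):
    """Return a width per column, or None where the column has no text."""
    column_count = max((len(row) for row in rows), default=0)
    widths = []
    for col in range(column_count):
        longest = max(
            (len(row[col].strip()) for row in rows if col < len(row)),
            default=0,
        )
        if longest == 0:
            widths.append(None)  # nothing to measure: leave the column alone
            continue
        widths.append(min(longest, MAX_CHARS) * UNITS_PER_CHAR + PADDING_UNITS)
    return widths


print(column_widths([["a", "hello"], ["abc", ""]]))  # [868, 1380]
```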
for(int x=0;x<c;x++)
{
    cell=sheet.getColumnView(x);
    cell.setAutosize(true);
    sheet.setColumnView(x, cell);
}
This works, but instead of scanning all the columns you can pass the column index as a parameter.
void display(int column)
{
    CellView cell = sheet.getColumnView(column);
    cell.setAutosize(true);
    sheet.setColumnView(column, cell);
}
So when you are displaying your text, you can autosize just that particular column. This can be helpful for huge Excel files.
From the JExcelApi FAQ
How do I do the equivilent of Excel's "Format/Column/Auto Fit Selection"?
There is no API function to do this for you. You'll need to write code that scans the cells in each column, calculates the maximum length, and then calls setColumnView() accordingly. This will get you close to what Excel does but not exactly. Since most fonts have variable width characters, to get the exact same value, you would need to use FontMetrics to calculate the maximum width of each string in the column. No one has posted code on how to do this yet. Feel free to post code to the Yahoo! group or send it directly to the FAQ author's listed at the bottom of this page.
FontMetrics presumably refers to java.awt.FontMetrics. You should be able to work something out with the getLineMetrics(String, Graphics) method, I would have thought.
CellView's autosize method doesn't work for me all the time. My way of doing this is to programmatically set the size (width) of the column based on the longest data in the column, plus some padding:
CellView cv = excelSheet.getColumnView(0);
cv.setSize((highest + ((highest/2) + (highest/4))) * 256);
where highest is an int that holds the longest length of data in the column.
setAutosize() method WILL NOT WORK if your cell has over 255 characters. This is related to the Excel 2003 max column width specification: http://office.microsoft.com/en-us/excel-help/excel-specifications-and-limits-HP005199291.aspx
You will need to write your own autosize method to handle this case.
Try this example:
expandColumns(sheet, 3);
workbook.write();
workbook.close();
private void expandColumns(WritableSheet sheet, int amountOfColumns){
    int c = amountOfColumns;
    for(int x=0;x<c;x++)
    {
        CellView cell = sheet.getColumnView(x);
        cell.setAutosize(true);
        sheet.setColumnView(x, cell);
    }
}
Kotlin's implementation
private fun sheetAutoFitColumns(sheet: WritableSheet, columnsIndexesForFit: Array<Int>? = null, startFromRowWithIndex: Int = 0, excludeLastRows: Int = 0) {
    // Column indexes are 0-based, so the default range stops at sheet.columns - 1.
    val columnIndexes: Iterable<Int> = columnsIndexesForFit?.asIterable() ?: (0 until sheet.columns)
    for (columnIndex in columnIndexes) {
        val cells = sheet.getColumn(columnIndex)
        var longestStrLen = -1
        if (cells.isEmpty()) continue
        for (j in startFromRowWithIndex until cells.size - excludeLastRows) {
            val str = cells[j].contents
            // Check for null/empty before measuring the string.
            if (str.isNullOrEmpty()) continue
            val len = str.trim().length
            if (len > longestStrLen) longestStrLen = len
        }
        if (longestStrLen == -1) continue
        val newWidth = if (longestStrLen > 255) 255 else longestStrLen
        sheet.setColumnView(columnIndex, newWidth)
    }
}
example for use
sheetAutoFitColumns(sheet) // fit all columns by all rows
sheetAutoFitColumns(sheet, arrayOf(0, 3))// fit A and D columns by all rows
sheetAutoFitColumns(sheet, arrayOf(0, 3), 5)// fit A and D columns by rows after 5
sheetAutoFitColumns(sheet, arrayOf(0, 3), 5, 2)// fit A and D columns by rows after 5 and ignore two last rows
