Longest prefix+suffix-combination in set of strings

I have a set of strings (less than 30) of length 1 to ~30. I need to find the subset of at least two strings that share the longest possible prefix- + suffix-combination.
For example, let the set be
Foobar
Facar
Faobaron
Gweron
Fzobar
The prefix/suffix F/ar has a combined length of 3 and is shared by Foobar, Facar and Fzobar; the prefix/suffix F/obar has a combined length of 5 and is shared by Foobar and Fzobar. The searched-for prefix/suffix is F/obar.
Note that this is not to be confused with the longest common prefix/suffix, since only two or more strings from the set need to share the same prefix+suffix. Also note that the sum of the lengths of both the prefix and the suffix is what is to be maximized, so both need to be taken into account. The prefix or suffix may be the empty string.
Does anyone know of an efficient method to implement this?

How about this:
maxLen := -1;
for I := 0 to Len(A) - 1 do
  if Len(A[I]) > maxLen then // (1)
    for J := 0 to Len(A[I]) do
      for K := 0 to Len(A[I]) - J do
        if J+K > maxLen then // (2)
        begin
          prf := LeftStr(A[I], J);
          suf := RightStr(A[I], K);
          found := False;
          for m := 0 to Len(sufList) - 1 do
            if (sufList[m] = suf) and (prfList[m] = prf) then
            begin
              maxLen := J+K;
              Result := prf+'/'+suf;
              found := True;
              // (3)
              n := 0;
              while n < Len(sufList) do
                if Len(sufList[n])+Len(prfList[n]) <= maxLen then
                begin
                  sufList.Delete(n);
                  prfList.Delete(n);
                end
                else
                  Inc(n);
              // (end of 3)
              Break;
            end;
          if not found then
          begin
            sufList.Add(suf);
            prfList.Add(prf);
          end;
        end;
In this example, maxLen holds the combined length of the longest shared prefix/suffix found so far. The most important part is the line marked (2): it skips a lot of unnecessary string comparisons. Section (3) removes every stored prefix/suffix pair that is no longer than the newly found one (the one that turned out to be a duplicate), since such pairs can no longer improve the result.
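For comparison, the same idea can be sketched with a hash set instead of the two parallel lists. This is only a minimal Python sketch, not a line-by-line translation of the code above: the name longest_shared_affix is made up for illustration, and it assumes the input strings are distinct (two identical strings would trivially share their whole text).

```python
def longest_shared_affix(strings):
    """Return the (prefix, suffix) pair with the largest combined length
    that is shared by at least two of the strings, or None."""
    seen = set()      # (prefix, suffix) pairs produced by earlier strings
    best = None
    best_len = -1
    for s in strings:
        n = len(s)
        for j in range(n + 1):              # prefix length
            for k in range(n - j + 1):      # suffix length, no overlap with prefix
                if j + k <= best_len:
                    continue                # same pruning as line (2)
                pair = (s[:j], s[n - k:])
                if pair in seen:
                    best, best_len = pair, j + k
                else:
                    seen.add(pair)
    return best


words = ["Foobar", "Facar", "Faobaron", "Gweron", "Fzobar"]
print(longest_shared_affix(words))  # ('F', 'obar')
```

The set lookup replaces the inner scan over sufList/prfList. The cleanup in section (3) is left out because stale entries in the set are harmless here.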

Related

Point in polygon hit test algorithm

I need to test if a point hits a polygon with holes and isles. I'd like to understand how I'm supposed to do this. That's not documented and I can't find any explanation or examples.
What I do is count +1 for every outer polygon hit and -1 for every inner polygon hit. The resulting sum is:
> 0: hit;
<= 0: miss (outside or in a hole).
The HitData class separates paths based on winding number to avoid unnecessary recomputation of orientation. With Clipper.PointInPolygon() applied to every path the sum is easy to compute.
But there are two major drawbacks:
I have to apply Clipper.PointInPolygon() to EVERY path;
I can't leverage the hierarchy of PolyTree.
Can someone who has hands-on experience with Clipper (@angus-johnson?) clear up this confusion?
Again, my question is: how am I supposed to implement this? Am I re-inventing the wheel, while there's an actual solution readily available in the Clipper Library?
Side note: PolyTree still requires to test EVERY path to determine which PolyNode the point is in. There's no Clipper.PointInPolyTree() method and, thus, AFAIK PolyTree doesn't help.
The structure that separates outer and inner polygons:
public class HitData
{
    public List<List<IntPoint>> Outer, Inner;
    public HitData(List<List<IntPoint>> paths)
    {
        Outer = new List<List<IntPoint>>();
        Inner = new List<List<IntPoint>>();
        foreach (List<IntPoint> path in paths)
        {
            if (Clipper.Orientation(path))
            {
                Outer.Add(path);
            } else {
                Inner.Add(path);
            }
        }
    }
}
And this is the algorithm that tests a point:
public static bool IsHit(HitData data, IntPoint point)
{
    int hits;
    hits = 0;
    foreach (List<IntPoint> path in data.Outer)
    {
        if (Clipper.PointInPolygon(point, path) != 0)
        {
            hits++;
        }
    }
    foreach (List<IntPoint> path in data.Inner)
    {
        if (Clipper.PointInPolygon(point, path) != 0)
        {
            hits--;
        }
    }
    return hits > 0;
}

"Can someone who has hands-on experience with Clipper (@angus-johnson?) clear up this confusion?"
It's not clear to me what your confusion is. As you've correctly observed, the Clipper library does not provide a function to determine whether a point is inside multiple paths.
Edit (13 Sept 2019):
OK, I've now created a PointInPaths function (in Delphi Pascal) that determines whether a point is inside multiple paths. Note that this function accommodates the different polygon filling rules.
function CrossProduct(const pt1, pt2, pt3: TPointD): double;
var
  x1,x2,y1,y2: double;
begin
  x1 := pt2.X - pt1.X;
  y1 := pt2.Y - pt1.Y;
  x2 := pt3.X - pt2.X;
  y2 := pt3.Y - pt2.Y;
  result := (x1 * y2 - y1 * x2);
end;
function PointInPathsWindingCount(const pt: TPointD;
  const paths: TArrayOfArrayOfPointD): integer;
var
  i,j, len: integer;
  p: TArrayOfPointD;
  prevPt: TPointD;
  isAbove: Boolean;
  crossProd: double;
begin
  //nb: returns MaxInt ((2^31)-1) when pt is on a line
  Result := 0;
  for i := 0 to High(paths) do
  begin
    j := 0;
    p := paths[i];
    len := Length(p);
    if len < 3 then Continue;
    prevPt := p[len-1];
    while (j < len) and (p[j].Y = prevPt.Y) do inc(j);
    if j = len then continue;
    isAbove := (prevPt.Y < pt.Y);
    while (j < len) do
    begin
      if isAbove then
      begin
        while (j < len) and (p[j].Y < pt.Y) do inc(j);
        if j = len then break
        else if j > 0 then prevPt := p[j -1];
        crossProd := CrossProduct(prevPt, p[j], pt);
        if crossProd = 0 then
        begin
          result := MaxInt;
          Exit;
        end
        else if crossProd < 0 then dec(Result);
      end else
      begin
        while (j < len) and (p[j].Y > pt.Y) do inc(j);
        if j = len then break
        else if j > 0 then prevPt := p[j -1];
        crossProd := CrossProduct(prevPt, p[j], pt);
        if crossProd = 0 then
        begin
          result := MaxInt;
          Exit;
        end
        else if crossProd > 0 then inc(Result);
      end;
      inc(j);
      isAbove := not isAbove;
    end;
  end;
end;
function PointInPaths(const pt: TPointD;
  const paths: TArrayOfArrayOfPointD; fillRule: TFillRule): Boolean;
var
  wc: integer;
begin
  wc := PointInPathsWindingCount(pt, paths);
  case fillRule of
    frEvenOdd: result := Odd(wc);
    frNonZero: result := (wc <> 0);
  end;
end;
With regard to leveraging the PolyTree structure:
The top nodes in PolyTree are outer nodes that together contain every (nested) polygon. So you only need to perform PointInPolygon on these top nodes until a positive result is found. Then repeat PointInPolygon on that node's nested paths (if any), looking for a positive match there. Obviously, when an outer node fails the PointInPolygon test, its nested nodes (polygons) will fail too. Outer nodes increment the winding count and inner holes decrement it.
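A rough Python sketch of that traversal, under stated assumptions: Node is a hypothetical stand-in for a PolyTree node (it is not Clipper's PolyNode API), point_in_polygon is a plain even-odd ray-casting test rather than Clipper.PointInPolygon, and sibling contours are assumed not to overlap.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # Hypothetical tree node: a contour plus the contours nested inside it.
    contour: list                 # list of (x, y) tuples
    is_hole: bool = False
    children: list = field(default_factory=list)


def point_in_polygon(pt, poly):
    """Even-odd ray casting; points exactly on an edge are not handled specially."""
    x, y = pt
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):                       # edge straddles the ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def is_hit(nodes, pt, inside_so_far=False):
    """Descend only into the one sibling that contains pt."""
    for node in nodes:
        if point_in_polygon(pt, node.contour):
            # Inside an outer contour counts as a hit, inside a hole as a miss,
            # unless a deeper nested contour says otherwise.
            return is_hit(node.children, pt, not node.is_hole)
    return inside_so_far


def square(lo, hi):
    return [(lo, lo), (hi, lo), (hi, hi), (lo, hi)]


island = Node(square(4, 6))
hole = Node(square(3, 7), is_hole=True, children=[island])
tree = [Node(square(0, 10), children=[hole])]
print(is_hit(tree, (1, 1)), is_hit(tree, (3.5, 3.5)), is_hit(tree, (5, 5)))
```

Instead of summing +1/-1 over every path, this tracks only whether the deepest contour containing the point is an outer or a hole, so subtrees whose parent misses are never tested.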

How to get the last X Characters of a Golang String?

If I have the string "12121211122" and I want to get the last 3 characters (e.g. "122"), is that possible in Go? I've looked in the string package and didn't see anything like getLastXcharacters.

You can use a slice expression on a string to get the last three bytes.
s := "12121211122"
first3 := s[0:3]
last3 := s[len(s)-3:]
Or if you're using unicode you can do something like:
s := []rune("世界世界世界")
first3 := string(s[0:3])
last3 := string(s[len(s)-3:])
Check Strings, bytes, runes and characters in Go and Slice Tricks.

The answer depends on what you mean by "characters". If you mean bytes then:
s := "12121211122"
lastByByte := s[len(s)-3:]
If you mean runes in a utf-8 encoded string, then:
s := "12121211122"
j := len(s)
for i := 0; i < 3 && j > 0; i++ {
_, size := utf8.DecodeLastRuneInString(s[:j])
j -= size
}
lastByRune := s[j:]
You can also convert the string to a []rune and operate on the rune slice, but that allocates memory.

Strange behaviour when simply adding strings in Lazarus - FreePascal

The program has several "encryption" algorithms. This one should blockwise reverse the input. "He|ll|o " becomes "o |ll|He" (block length of 2).
I add two strings: in this case I append the result string to the current "block" string and make that the new result. When I add the result first and then the block, it works fine and gives me back the original string. But when I try to reverse the order, it just gives me the last "block".
Several other functions that are used for "rotation" are above.
//amount of blocks
function amBl(i1:integer;i2:integer):integer;
begin
  if (i1 mod i2) <> 0 then result := (i1 div i2) else result := (i1 div i2) - 1;
end;
//calculation of block length
function calcBl(keyStr:string):integer;
var i:integer;
begin
  result := 0;
  for i := 1 to Length(keyStr) do
  begin
    result := (result + ord(keyStr[i])) mod 5;
    result := result + 2;
  end;
end;
//desperate try to add strings
function append(s1,s2:string):string;
begin
  insert(s2,s1,Length(s1)+1);
  result := s1;
end;
function rotation(inStr,keyStr:string):string;
var //array of chars -> string
  block,temp:string;
  //position in block variable
  posB:integer;
  //block length and block count variable
  bl, bc:integer;
  //null character as placeholder
  n : ansiChar;
begin
  //calculating block length 2..6
  bl := calcBl(keyStr);
  setLength(block,bl);
  result := '';
  temp := '';
  {n := #00;}
  for bc := 0 to amBl(Length(inStr),bl) do
  begin
    //filling block with chars starting from back of virtual block (in inStr)
    for posB := 1 to bl do
    begin
      block[posB] := inStr[bc * bl + posB];
      {if inStr[bc * bl + posB] = ' ' then block[posB] := n;}
    end;
    //adding the block in front of the existing result string
    temp := result;
    result := block + temp;
    //result := append(block,temp);
    //result := concat(block,temp);
  end;
end;
(full code http://pastebin.com/6Uarerhk)
After all the loops "result" has the right value, but in the last step (between "result := block + temp" and the "end;" of the function) "block" completely replaces the content of "result"; the previous result is no longer added at the end.
And as you can see, I even used a temp variable to try to work around that. It doesn't change anything, though.

I am 99.99% certain that your problem is due to a subtle bug in your code. However, your deliberate efforts to hide the relevant code mean that we're really shooting in the dark. You haven't even been clear about where you're seeing the shortened Result: in a GUI control, in the debugger, or via Writeln?
The irony is that you have all the information at your fingertips to provide a small concise demonstration of your problem - including sample input and expected output.
So without the relevant information, I can only guess; I do think I have a good hunch though.
Try the following code and see if you have a similar experience with S3:
S1 := 'a'#0;
S2 := 'bc';
S3 := S1 + S2;
The reason for my hunch is that #0 is a valid character in a string: but whenever that string needs to be processed as PChar, #0 will be interpreted as a string terminator. This could very well cause the "strange behaviour" you're seeing.
So it's quite probable that you have at least one of the following 2 bugs in your code:
You are always processing 1 too many characters; with the extra character being #0.
When your input string has an odd number of characters: your algorithm (which relies on pairs of characters) adds an extra character with value #0.
Edit
With the additional source code, my hunch is confirmed:
Suppose you have a 5 character string, and key that produces block length 2.
Your inner loop (for posB := 1 to bl do) will read beyond the length of inStr on the last iteration of the outer loop.
So if the next character in memory happens to be #0, you will be doing exactly as described above.
Additional problem. You have the following code:
//calculating block length 2..6
bl := calcBl(keyStr);
Your assumption in the comment is wrong. From the implementation of calcBl, if keyStr is empty, your result will be 0.
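To make both points concrete (never read past the end of the input, and do not assume the block length is always in 2..6), here is a minimal Python sketch. It is not the asker's code: reverse_blocks and calc_block_len are illustrative names, and calc_block_len only mirrors the arithmetic of calcBl.

```python
def calc_block_len(key):
    """Same arithmetic as calcBl: 2..6 for a non-empty key, 0 for an empty one."""
    result = 0
    for ch in key:
        result = (result + ord(ch)) % 5 + 2
    return result


def reverse_blocks(text, block_len):
    """Reverse the order of fixed-size blocks; the last block may be shorter."""
    if block_len < 1:
        raise ValueError("block length must be at least 1")
    # Slicing stops at the end of the string, so no character beyond the
    # input (such as a terminating #0) can end up in a block.
    blocks = [text[i:i + block_len] for i in range(0, len(text), block_len)]
    return "".join(reversed(blocks))


print(reverse_blocks("Hello ", 2))   # o llHe
print(reverse_blocks("Hello", 2))    # ollHe  (5 characters, last block is short)
print(calc_block_len(""))            # 0
```

In the Pascal version the equivalent fix would be to clamp the inner loop to the remaining length of inStr and to reject a block length of 0 before looping.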

How to detect if a character from a string is upper or lower case?

I'm expanding a class of mine for storing generic size strings to allow more flexible values for user input. For example, my prior version of this class was strict and allowed only the format of 2x3 or 9x12. But now I'm making it so it can support values such as 2 x 3 or 9 X 12 and automatically maintain the original user's formatting if the values get changed.
The real question I'm trying to figure out is just how to detect whether one character from a string is upper or lower case, because I have to handle case sensitivity. If the delimiter is 'x' (lowercase), the user inputs 'X' (uppercase) inside the value, and case sensitivity is turned off, I need to be able to find the opposite case as well.
I mean, the Pos() function is case sensitive...

Delphi 7 has UpperCase() and LowerCase() functions for strings. There's also UpCase() for characters.
If I want to search for a substring within another string case insensitively, I do this:
if Pos('needle', LowerCase(hayStack)) > 0 then
You simply use lower case string literals (or constants) and apply the lowercase function on the string before the search. If you'll be doing a lot of searches, it makes sense to convert just once into a temp variable.
Here's your case:
a := '2 x 3'; // Lowercase x
b := '9 X 12'; // Upper case X
x := Pos('x', LowerCase(a)); // x = 3
x := Pos('x', LowerCase(b)); // x = 3
To see if a character is upper or lower, simply compare it against the UpCase version of it:
a := 'A';
b := 'b';
upper := a = UpCase(a); // True
upper := b = UpCase(b); // False
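The two tricks above (lower-casing before the search, and comparing a character with its upper-case form) can be sketched in Python as follows. The function names are made up for illustration, indexes are 0-based here whereas Pos() is 1-based, and the input is assumed to be plain ASCII.

```python
def find_delimiter(value, delimiter="x", case_sensitive=False):
    """Return the 0-based index of the delimiter, or -1 if it is absent."""
    if case_sensitive:
        return value.find(delimiter)
    # Search in a lower-cased copy; the original string keeps the user's formatting.
    return value.lower().find(delimiter.lower())


def looks_upper(ch):
    """Same test as `a = UpCase(a)`: note it is also true for non-letters."""
    return ch == ch.upper()


print(find_delimiter("2 x 3"))                        # 2
print(find_delimiter("9 X 12"))                       # 2
print(find_delimiter("9 X 12", case_sensitive=True))  # -1
print(looks_upper("A"), looks_upper("b"), looks_upper("1"))  # True False True
```

The last line shows the caveat of the comparison trick: a digit or a space equals its own upper-case form, so a separate "is this a letter" check is needed when that matters.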

try using these functions (which are part of the Character unit)
Character.TCharacter.IsUpper
Character.TCharacter.IsLower
IsLower
IsUpper
UPDATE
For ANSI versions of Delphi you can use the GetStringTypeEx function to fill a table with the character-type information of every ANSI character, and then compare each element against $0001 (upper case) or $0002 (lower case).
uses
  Windows,
  SysUtils;
Var
  LAnsiChars: array [AnsiChar] of Word;
procedure FillCharList;
var
  lpSrcStr: AnsiChar;
  lpCharType: Word;
begin
  for lpSrcStr := Low(AnsiChar) to High(AnsiChar) do
  begin
    lpCharType := 0;
    GetStringTypeExA(LOCALE_USER_DEFAULT, CT_CTYPE1, @lpSrcStr, SizeOf(lpSrcStr), lpCharType);
    LAnsiChars[lpSrcStr] := lpCharType;
  end;
end;
function CharIsLower(const C: AnsiChar): Boolean;
const
  C1_LOWER = $0002;
begin
  Result := (LAnsiChars[C] and C1_LOWER) <> 0;
end;
function CharIsUpper(const C: AnsiChar): Boolean;
const
  C1_UPPER = $0001;
begin
  Result := (LAnsiChars[C] and C1_UPPER) <> 0;
end;
begin
  try
    FillCharList;
    Writeln(CharIsUpper('a'));
    Writeln(CharIsUpper('A'));
    Writeln(CharIsLower('a'));
    Writeln(CharIsLower('A'));
  except
    on E:Exception do
      Writeln(E.Classname, ': ', E.Message);
  end;
  Readln;
end.

if myChar in ['A'..'Z'] then
begin
  // uppercase
end
else
if myChar in ['a'..'z'] then
begin
  // lowercase
end
else
begin
  // not an alpha char
end;
..or D2009 on..
if charInSet(myChar,['A'..'Z']) then
begin
  // uppercase
end
else
if charInSet(myChar,['a'..'z']) then
begin
  // lowercase
end
else
begin
  // not an alpha char
end;

The JCL has routines for this in the JclStrings unit, e.g. CharIsUpper and CharIsLower. Should work in Delphi 7.

AnsiPos() is not case-sensitive. You can also force upper or lower case, irrespective of what the user enters using UpperCase() and LowerCase().
Just throwing this out there since you may find it far more simple than the other (very good) answers.

How to find a position of a substring within a string with fuzzy match

I have come across a problem of matching a string in an OCR recognized text and find the position of it considering there can be arbitrary tolerance of wrong, missing or extra characters. The result should be a best match position, possibly (not necessarily) with length of matching substring.
For example:
String: 9912, 1.What is your name?
Substring: 1. What is your name?
Tolerance: 1
Result: match on character 7
String: Where is our caat if any?
Substring: your cat
Tolerance: 2
Result: match on character 10
String: Tolerance is t0o h1gh.
Substring: Tolerance is too high;
Tolerance: 1
Result: no match
I have tried to adapt the Levenshtein algorithm, but it doesn't work properly for substrings and doesn't return a position.
Algorithm in Delphi would be preferred, yet any implementation or pseudo logic would do.

Here's a recursive implementation that works, but might not be fast enough. The worst case is when no match can be found and all but the last char in "What" gets matched at every index in Where. In that case the algorithm will make Length(What)-1 + Tolerance comparisons for each char in Where, plus one recursive call per Tolerance. Since both Tolerance and the length of What are constants, I'd say the algorithm is O(n). Its performance will degrade linearly with the length of both "What" and "Where".
function BrouteFindFirst(What, Where:string; Tolerance:Integer; out AtIndex, OfLength:Integer):Boolean;
var i:Integer;
    aLen:Integer;
    WhatLen, WhereLen:Integer;
  function BrouteCompare(wherePos, whatPos, Tolerance:Integer; out Len:Integer):Boolean;
  var aLen:Integer;
      aRecursiveLen:Integer;
  begin
    // Skip perfect match characters
    aLen := 0;
    while (whatPos <= WhatLen) and (wherePos <= WhereLen) and (What[whatPos] = Where[wherePos]) do
    begin
      Inc(aLen);
      Inc(wherePos);
      Inc(whatPos);
    end;
    // Did we find a match?
    if (whatPos > WhatLen) then
    begin
      Result := True;
      Len := aLen;
    end
    else if Tolerance = 0 then
      Result := False // No match and no more "wild cards"
    else
    begin
      // We'll make a recursive call to BrouteCompare, allowing for some tolerance in the string
      // matching algorithm.
      Dec(Tolerance); // use up one "wildcard"
      Inc(whatPos); // consider the current char matched
      if BrouteCompare(wherePos, whatPos, Tolerance, aRecursiveLen) then
      begin
        Len := aLen + aRecursiveLen;
        Result := True;
      end
      else if BrouteCompare(wherePos + 1, whatPos, Tolerance, aRecursiveLen) then
      begin
        Len := aLen + aRecursiveLen;
        Result := True;
      end
      else
        Result := False; // no luck!
    end;
  end;
begin
  WhatLen := Length(What);
  WhereLen := Length(Where);
  for i:=1 to Length(Where) do
  begin
    if BrouteCompare(i, 1, Tolerance, aLen) then
    begin
      AtIndex := i;
      OfLength := aLen;
      Result := True;
      Exit;
    end;
  end;
  // No match found!
  Result := False;
end;
I've used the following code to test the function:
procedure TForm18.Button1Click(Sender: TObject);
var AtIndex, OfLength:Integer;
begin
  if BrouteFindFirst(Edit2.Text, Edit1.Text, ComboBox1.ItemIndex, AtIndex, OfLength) then
    Label3.Caption := 'Found #' + IntToStr(AtIndex) + ', of length ' + IntToStr(OfLength)
  else
    Label3.Caption := 'Not found';
end;
For case:
String: Where is our caat if any?
Substring: your cat
Tolerance: 2
Result: match on character 10
it shows a match on character 9, of length 6. For the other two examples it gives the expected result.
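For comparison, here is a minimal Python sketch of a different technique: the usual Levenshtein table, changed so that a match may start at any position of the text for free (often attributed to Sellers). It is not the recursive routine above. The name fuzzy_find and the tie-breaking rule (prefer the later start, then the earlier end) are assumptions made for this sketch, and positions are 0-based.

```python
def fuzzy_find(text, pattern, tolerance):
    """Return (start, end, cost) for the best approximate match, or None.

    text[start:end] is the matched substring; cost is its edit distance to
    pattern (wrong, missing and extra characters each cost 1).
    """
    m = len(pattern)
    # prev[i] = (cost, -start) of the best alignment of pattern[:i] with a
    # substring ending at the previous text position. Storing -start makes
    # tuple comparison prefer the later start when costs are equal.
    prev = [(i, 0) for i in range(m + 1)]
    best = (m, 0, 0) if m <= tolerance else None   # degenerate empty match
    for j in range(1, len(text) + 1):
        cur = [(0, -j)]            # a match may begin at position j for free
        for i in range(1, m + 1):
            mismatch = pattern[i - 1] != text[j - 1]
            cur.append(min(
                (prev[i - 1][0] + mismatch, prev[i - 1][1]),  # same or wrong char
                (prev[i][0] + 1, prev[i][1]),                 # extra char in text
                (cur[i - 1][0] + 1, cur[i - 1][1]),           # char missing in text
            ))
        cost, neg_start = cur[m]
        if cost <= tolerance and (best is None or (cost, neg_start) < best[:2]):
            best = (cost, neg_start, j)
        prev = cur
    if best is None:
        return None
    cost, neg_start, end = best
    return (-neg_start, end, cost)


print(fuzzy_find("9912, 1.What is your name?", "1. What is your name?", 1))
print(fuzzy_find("Where is our caat if any?", "your cat", 2))
print(fuzzy_find("Tolerance is t0o h1gh.", "Tolerance is too high;", 1))
```

The running time is proportional to len(text) * len(pattern) regardless of the tolerance. When several substrings have the same cost, as in the second example, which one is reported depends on the tie-breaking rule, so the reported length may differ from what one would pick by eye.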

Here is a complete sample of fuzzy match (approximate search), and you can use/change the algorithm as you wish!
https://github.com/alidehban/FuzzyMatch
