Fastest way to conditionally strip off the right part of a string - string

I need to remove the numeric part at the end of a string. Here are some examples:
"abcd1234" -> "abcd"
"a3bc45" -> "a3bc"
"kj3ih5" -> "kj3ih"
You get the idea.
I implemented a function which works well for this purpose.
Function VarStamm(name As String) As String
Dim i, a As Integer
a = 0
For i = Len(name) To 1 Step -1
If IsNumeric(Mid(name, i, 1)) = False Then
i = i + 1
Exit For
End If
Next i
If i <= Len(name) Then
VarStamm = name.Substring(0, i - 1)
Else
VarStamm = name
End If
End Function
The question is: is there any faster (more efficient in speed) way to do this? The problem is, I call this function within a loop with 3 million iterations and it would be nice to have it be more efficient.
I know about the String.LastIndexOf method, but I don't know how to use it when I need the index of the last connected number within a string.

You can use Array.FindLastIndex and then Substring:
Dim lastNonDigitIndex = Array.FindLastIndex(text.ToCharArray(), Function(c) Not char.IsDigit(c))
If lastNonDigitIndex >= 0
lastNonDigitIndex += 1
Dim part1 = text.Substring(0, lastNonDigitIndex)
Dim part2 = text.Substring(lastNonDigitIndex)
End If

I was skeptical that the Array.FindLastIndex method was actually faster, so I tested it myself. I borrowed the testing code posted by Amessihel, but added a third method:
Function VarStamm3(name As String) As String
Dim i As Integer
For i = name.Length - 1 To 0 Step -1
If Not Char.IsDigit(name(i)) Then
Exit For
End If
Next i
Return name.Substring(0, i + 1)
End Function
It uses your original algorithm, but just swaps out the old VB6-style string methods for newer .NET equivalent ones. Here's the results on my machine:
RunTime :
- VarStamm : 00:00:07.92
- VarStamm2 : 00:00:00.60
- VarStamm3 : 00:00:00.23
As you can see, your original algorithm was already quite well tuned. The problem wasn't the loop. The problem was Mid, IsNumeric, and Len. Since Tim's method didn't use those, it was much faster. But, if you stick with a manual for loop, it's twice as fast as using Array.FindLastIndex, all things being equal

Given your function VarStamm and Tim Schmelter's one named VarStamm2, here is a small test performance I wrote. I typed an arbitrary long String with a huge right part, and ran the functions one million times.
Module StackOverlow
Sub Main()
Dim testStr = "azekzoerjezoriezltjreoitueriou7657678678797897898997897978897898797989797"
Console.WriteLine("RunTime :" + vbNewLine +
" - VarStamm : " + getTimeSpent(AddressOf VarStamm, testStr) + vbNewLine +
" - VarStamm2 : " + getTimeSpent(AddressOf VarStamm2, testStr))
End Sub
Function getTimeSpent(f As Action(Of String), str As String) As String
Dim sw As Stopwatch = New Stopwatch()
Dim ts As TimeSpan
sw.Start()
For i = 1 To 1000000
f(str)
Next
sw.Stop()
ts = sw.Elapsed
Return String.Format("{0:00}:{1:00}:{2:00}.{3:00}",
ts.Hours, ts.Minutes, ts.Seconds,
ts.Milliseconds / 10)
End Function
Function VarStamm(name As String) As String
Dim i, a As Integer
a = 0
For i = Len(name) To 1 Step -1
If IsNumeric(Mid(name, i, 1)) = False Then
i = i + 1
Exit For
End If
Next i
If i <= Len(name) Then
VarStamm = name.Substring(0, i - 1)
Else
VarStamm = name
End If
End Function
Function VarStamm2(name As String) As String
Dim lastNonDigitIndex = Array.FindLastIndex(name.ToCharArray(), Function(c) Not Char.IsDigit(c))
If lastNonDigitIndex >= 0 Then
lastNonDigitIndex += 1
Return name.Substring(0, lastNonDigitIndex)
End If
Return name
End Function
End Module
Here is the output I got:
RunTime :
- VarStamm : 00:00:38.33
- VarStamm2 : 00:00:02.72
So yes, you should choose his answer, his code is both pretty and efficient.

Related

Optimal means of obtaining cell address column letter from column index and column index from column letter

Typically the accepted approach is to do the following
Number to Letter
public function numberToLetter(ByVal i as long) as string
Dim s as string: s = cells(1,i).address(false,false)
numberToLetter = left(s,len(s)-1)
end function
Letter to Number
Public Function letterToNumber(ByVal s As String) As Long
letterToNumber = Range(s & 1).Column
End Function
However neither of these are particular optimal, as in each case we are creating an object, and then calling a property accessor on the object. Is there a faster approach?
Summary
The core thing to realise is that the lettering system used in Excel is also known as Base26. NumberToLetter is encoding to Base26 from decimal, and LetterToNumber is decoding from Base26 to decimal.
Base conversion can be done with simple loops and
Function base26Encode(ByVal iDecimal As Long) As String
if iDecimal <= 0 then Call Err.Raise(5, "base26Encode" ,"Argument cannot be less than 0")
if iDecimal >= 16384 then Call Err.Raise(5, "base26Encode" ,"There are only 16384 columns in a spreadsheet, thus this function is limited to this number.")
Dim s As String: s = ""
Do
Dim v As Long
v = (iDecimal - 1) Mod 26 + 1
iDecimal = (iDecimal - v) / 26
s = Chr(v + 64) & s
Loop Until iDecimal = 0
base26Encode = s
End Function
Function base26Decode(ByVal sBase26 As String) As Long
sBase26 = UCase(sBase26)
Dim sum As Long: sum = 0
Dim iRefLen As Long: iRefLen = Len(sBase26)
For i = iRefLen To 1 Step -1
sum = sum + (Asc((Mid(sBase26, i))) - 64) * 26 ^ (iRefLen - i)
Next
base26Decode = sum
End Function
Performance
I tested the performance of these functions against the original functions. To do this I used the stdPerformance class of stdVBA.
The code used for testing is as follows:
Sub testPerf()
Dim cMax As Long: cMax = 16384
With stdPerformance.Measure("Encode Original")
For i = 1 To cMax
Call numberToLetter(i)
Next
End With
With stdPerformance.Measure("Encode Optimal")
For i = 1 To cMax
Call base26Encode(i)
Next
End With
With stdPerformance.Measure("Decode Original")
For i = 1 To cMax
Call letterToNumber(base26Encode(i))
Next
End With
With stdPerformance.Measure("Decode Optimal")
For i = 1 To cMax
Call base26Decode(base26Encode(i))
Next
End With
End Sub
The results for which are as follows:
Encode Original: 78 ms
Encode Optimal: 31 ms
Decode Original: 172 ms
Decode Optimal: 63 ms
As shown this is a slightly faster approach (2-3x faster). I am fairly surprised that object creation and property access performed so well however.

Truncating 2 Strings and combined into smaller String [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Want to improve this question? Add details and clarify the problem by editing this post.
Closed 5 years ago.
Improve this question
So, I have two strings that are each a max length of 100.
Dim a as String ''has a max length of 100
Dim b as String ''has a max length of 100
These two strings need to be truncated and combined into a new string.
Dim c as String 'has a max length of 100
I need to be able to truncate each string appropriately so that I can get string c as close to 100. I was going to do a bunch of statements by 25 to truncate each one.
if a.length = 100 and b.length =0 then
return a
else if a.length = 100 andalso b.length <= 25 then
return a.truncate(75) & b
else if a.length = 100 andalso b.length <= 50 then
return a.truncate(50) & b
else if....
and so one to hit all the scenarios...
I feel like there is a better way to do this and a more efficient way so that i may not hit scenarios like a.length = 100 and b.length = 51. I would be truncating more characters then needed.
Any suggestions?? Please critique me as needed.
EDIT, This is vb.Net..not C# (I'm between Projects) Sorry!
The reason i do not want to just add them together and truncate them is because if both strings are 100 in length, it will completely truncate off the second string. If they are both 100 then I would want to truncate string a to 50 in length and string b to 50 in length so when they are combined they are 100 total. In other words I need some text from both strings.
If the total length of the strings is greater than the limit then you could take a fraction of each in proportion to their lengths:
Module Module1
Function CombineWithLengthConstraint(a As String, b As String, totalLength As Integer) As String
' trivial case 1:
If totalLength < 1 Then
Return String.Empty
End If
Dim aLen = Len(a)
Dim bLen = Len(b)
' trivial case 2:
If aLen + bLen <= totalLength Then
Return a & b
End If
' impossible-to-satisfy-equably case:
If totalLength = 1 Then
If aLen > 0 Then
Return a.Substring(0, 1)
ElseIf bLen > 0 Then
Return b.Substring(0, 1)
Else
Return String.Empty
End If
End If
' aportion the lengths of the strings to be taken in the ratio of their lengths:
Dim aFrac = CInt(Math.Round(aLen / (aLen + bLen) * totalLength, MidpointRounding.AwayFromZero))
Dim bFrac = CInt(Math.Round(bLen / (aLen + bLen) * totalLength, MidpointRounding.AwayFromZero))
' ensure there is at least one character from each string...
If aFrac = 0 Then
aFrac = 1
bFrac -= 1
End If
If bFrac = 0 Then
bFrac = 1
aFrac -= 1
End If
Dim aPart = a.Substring(0, aFrac)
Dim bPart = b.Substring(0, bFrac)
Return aPart & bPart
End Function
Sub Main()
Dim a = New String("A"c, 10)
Dim b = New String("b"c, 40)
Dim c = CombineWithLengthConstraint(a, b, 10)
Console.WriteLine(c)
Console.WriteLine(Len(c))
Console.ReadLine()
End Sub
End Module
Outputs:
AAbbbbbbbb
10
As you can see, the first string, which was 1/5 of the total number of characters, ended up contributing 1/5 of the result.
The VB.NET Len function gives 0 if its argument is Nothing.
I tested it as working with all lengths from 0 to 100 of both strings being combined into one string of length 100 just in case I had made a mistake with the rounding or anything.
Of course, you could return, say, the ending part of string b instead of the starting part if that made sense in the particular application.
Although not exactly what you asked for, here's another option...
Public Function WeirdConcatinate(a As String, b As String) As String
Dim totalLen = a.Length + b.Length
If totalLen > 100 Then
Dim aLen = 100 * a.Length \ totalLen
Dim bLen = 100 - aLen
Return a.Remove(aLen) & b.Remove(bLen)
Else
Return a & b
End If
End Function
This will give you a number of characters from each string (approximately) proportional to how long they are compared to each other. If both strings are the same length, you'll get 50 from each. If a.Length = 100 and b.Length = 50, you'll end up with 66 from a and 34 from b.
Truncate them after concatenating them, then:
Dim c = a & b
If c.Length > 100 Then c = c.Remove(100)
If you want to preserve as much as possible of the start of each string:
Dim c = ""
If(a.Length > 50 AndAlso b.Length < 50)
c = a.Remove(100 - b.Length) & b
Else If a.Length > 50 AndAlso b.Length > 50
c= a.Remove(50) & b.Remove(50)
Else
c = a & b
End if
If c.Length > 100 Then c = c.Remove(100)
As with some other answers, the algorithm is open to interpretation. My method takes from each string until 100 total characters are taken or the string runs out of characters.
Private Function concat(a As String, b As String, length As Integer) As String
Dim ca As New System.Text.StringBuilder()
Dim cb As New System.Text.StringBuilder()
For i As Integer = 0 To length - 1
ca.Append(If(i >= a.Length, "", a(i)))
cb.Append(If(i >= b.Length, "", b(i)))
If ca.Length + cb.Length >= length Then Exit For
Next
Return (ca.ToString() & cb.ToString() & New String(" "c, 100)).Substring(0, length)
End Function
Sub Main()
Dim a As String = New String("a"c, 0)
Dim b As String = New String("b"c, 5)
Dim c As String = concat(a, b, 100)
Console.WriteLine($"'{c}'")
End Sub
'bbbbb '
(padded to 100 characters, doesn't render in block quote)
Dim a As String = New String("a"c, 30)
Dim b As String = New String("b"c, 90)
'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaabbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb'
Dim a As String = New String("a"c, 72)
Dim b As String = New String("b"c, 64)
'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaabbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb'
(your example in a comment. 72 >> 50, 64 >> 50)

Adding a space in a String after a certain character(VB Code)

Hello everyone.
Dim txt1 As Double = Convert.ToDouble(TextBox1.Text) / 100
Dim txt2 As Double = Convert.ToDouble(TextBox2.Text)
Dim txt3 As Double = Convert.ToDouble(TextBox3.Text)
Dim txtResult As Double = Convert.ToDouble(TextBox4.Text)
Dim result As Double = txt1 * txt2 * txt3
TextBox4.Text = result
As you can see I get my result depending on what the user types in. So I have to add a space after a certain character. Textbox14.text(0) <--- after this do I want my space. It's so that after the value is higher than 999 it should type out 1 000 and not 1000. Thank you very much for any useful help, I've truly looked everywhere, I just can't find anything.
You talking about group separator. Custom Numeric Format Strings
You can use .ToString() method and define group separator in the format.
TextBox4.Text = result.ToString("0,0.000")
Different separators will be used based on the local system's language/region settings.
You can define your custome separator manually
var cultureInfo = new System.Globalization.CultureInfo("en-US");
var numberInfo = cultureInfo.NumberFormat;
numberInfo.NumberGroupSeparator = " ";
TextBox4.Text = result.ToString("0,0.000", numberInfo)
If I get it right, you want every 3 chars a space, right ?
Like 1 000 000 ?
try this :
Dim result As String, str As String, ret As String
Dim i As Integer
Dim arr As Char()
'your text to space
result = "10000000"
'reverte so we start with the end
result = StrReverse(result)
i = 0
ret = ""
' make a char array which each char is an own array element
arr = result.Take(result.Length).ToArray
'iterate through all elements
For Each str In arr
' skip the first element .
' only add a space every 3 elements
If (i <> 0) And (i Mod 3 = 0) Then
ret = ret + " "
End If
ret = ret + str
i = i + 1
Next
' revers again the output
ret = StrReverse(ret)
MsgBox(ret)

My VBA code is aborting unexpectedly upon exiting a 'While' loop. Why?

I am new to VBA coding. I have done some coding in Javascript and C++, so I do understand the concepts. I'm not too familiar with the specifics of VBA, though. This particular code is for Excel 2007. The sort function was copied from elsewhere as pseudocode (documentation is not mine). I've rewritten it as VBA (unsuccessfully).
This code is not working properly. The code is abruptly aborting entirely (not just jumping out of a loop or function, but quitting completely after going through the While loop twice.
To replicate the problem, save this code as a Macro for an Excel sheet, type the number 9853 in B5, and in B6 type "=Kaprekar(B5)". Essentially, run Kaprekar(9853).
Could someone please help me figure out what I'm doing wrong here? Thanks.
By the way, I'm using While-Wend now. I also tried Do While-Loop with the same result.
Here's the code:
Function Sort(A)
limit = UBound(A)
For i = 1 To limit
' A[ i ] is added in the sorted sequence A[0, .. i-1]
' save A[i] to make a hole at index iHole
Item = A(i)
iHole = i
' keep moving the hole to next smaller index until A[iHole - 1] is <= item
While ((iHole > 0) And (A(iHole - 1) > Item))
' move hole to next smaller index
A(iHole) = A(iHole - 1)
iHole = iHole - 1
Wend
' put item in the hole
A(iHole) = Item
Next i
Sort = A
End Function
Function Kaprekar%(Original%)
Dim Ord(0 To 3) As Integer
Ord(0) = Original \ 1000
Ord(1) = (Original - (Ord(0) * 1000)) \ 100
Ord(2) = (Original - (Ord(1) * 100) - (Ord(0) * 1000)) \ 10
Ord(3) = (Original - (Ord(2) * 10) - (Ord(1) * 100) - (Ord(0) * 1000))
If (Ord(0) = Ord(1)) * (Ord(1) = Ord(2)) * (Ord(2) = Ord(3)) * (Ord(3) = Ord(0)) = 1 Then
Kaprekar = -1
Exit Function
End If
Arr = Sort(Ord)
Kaprekar = Ord(3)
End Function
excel evaluates both items in the while statement, so
While ((ihole > 0) And (A(ihole - 1) > item))
when ihole=0, returns false for the first test, and out of bounds for the second test, bombing out of the function with a #Value error.
A quick bubblesort would be something like this:
Option Explicit
Function Sort(A)
Dim iLoop As Long
Dim jLoop As Long
Dim Last As Long
Dim Temp
Last = UBound(A)
For iLoop = 0 To Last - 1
For jLoop = iLoop + 1 To Last
If A(iLoop) > A(jLoop) Then
Temp = A(jLoop)
A(jLoop) = A(iLoop)
A(iLoop) = Temp
End If
Next jLoop
Next iLoop
Sort = A
End Function

SQLite full-text search relevance ranking

I am using the fts4 extension of sqlite3 to enable full-text indexing and searching of text data. This it working great, but I've noticed that the results are not relevance-ranked at all. I guess I am too used to Lucene. I've seen some brief suggestions to write a custom rank method using the matchinfo() results, but it's not clear to me how this is done, or whether there are any sophisticated examples out there. How have others dealt with this?
There's a complete example in the documentation, look at the end of appendix a. You'll need to do slightly more work to get a good relevance ranking as the function provided is good only for getting started. For example, with matchinfo(table,'pcnalx') there's enough information to implement Okapi BM25.
There seems to be a distinct lack of documentation on how to implement Okapi BM25 in C and it seems it is an unspoken thing that the implementation is left as an exercise for the user.
Well I found the bro of a programmer "Radford 'rads' Smith" who chucked this up on GitHub
https://github.com/rads/sqlite-okapi-bm25
It only implements BM25 although I'm looking into BM25F tweaks now....
....and here it is.
https://github.com/neozenith/sqlite-okapi-bm25
For FTS5, according to SQLite FTS5 Extension,
FTS5 has no matchinfo().
FTS5 supports ORDER BY rank
So very simply, something like
SELECT * FROM email WHERE email MATCH 'fts5' ORDER BY rank;
without DESC works.
Here is an implementation of Okapi BM25. Using this in combination with the suggestions at SQLite.org will help you generate a relevance-ranked MATCH query. This was written all in VB.Net and the query was called using System.Data.SQLite functions. The custom SQLiteFunction at the end can be called from the SQL code without issue, as long as the SQL code is called using System.Data.SQLite functions.
Public Class MatchInfo
Property matchablePhrases As Integer
Property userDefinedColumns As Integer
Property totalDocuments As Integer
Private _int32HitData As List(Of Integer)
Private _longestSubsequencePhraseMatches As New List(Of Integer)
Private _tokensInDocument As New List(Of Integer)
Private _averageTokensInDocument As New List(Of Integer)
Private _max_hits_this_row As Integer?
Public ReadOnly Property max_hits_this_row As Integer
Get
If _max_hits_this_row Is Nothing Then
_max_hits_this_row = 0
For p = 0 To matchablePhrases - 1
For c = 0 To userDefinedColumns - 1
Dim myHitsThisRow As Integer = hits_this_row(p, c)
If myHitsThisRow > _max_hits_this_row Then
_max_hits_this_row = myHitsThisRow
End If
Next
Next
End If
Return _max_hits_this_row
End Get
End Property
Private _max_hits_all_rows As Integer?
Public ReadOnly Property max_hits_all_rows As Integer
Get
If _max_hits_all_rows Is Nothing Then
_max_hits_all_rows = 0
For p = 0 To matchablePhrases - 1
For c = 0 To userDefinedColumns - 1
Dim myHitsAllRows As Integer = hits_all_rows(p, c)
If myHitsAllRows > _max_hits_all_rows Then
_max_hits_all_rows = myHitsAllRows
End If
Next
Next
End If
Return _max_hits_all_rows
End Get
End Property
Private _max_docs_with_hits As Integer?
Public ReadOnly Property max_docs_with_hits As Integer
Get
If _max_docs_with_hits Is Nothing Then
_max_docs_with_hits = 0
For p = 0 To matchablePhrases - 1
For c = 0 To userDefinedColumns - 1
Dim myDocsWithHits As Integer = docs_with_hits(p, c)
If myDocsWithHits > _max_docs_with_hits Then
_max_docs_with_hits = myDocsWithHits
End If
Next
Next
End If
Return _max_docs_with_hits
End Get
End Property
Private _BM25Rank As Double?
Public ReadOnly Property BM25Rank As Double
Get
If _BM25Rank Is Nothing Then
_BM25Rank = 0
'calculate BM25 Rank
'http://en.wikipedia.org/wiki/Okapi_BM25
'k1, calibrates the document term frequency scaling. Having k1 as 0 corresponds to a binary model – no term frequency. Increasing k1 will give rare words more boost.
'b, calibrates the scaling by document length, and can take values from 0 to 1, where having 0 means no length normalization and having 1 corresponds to fully scaling the term weight by the document length.
Dim k1 As Double = 1.2
Dim b As Double = 0.75
For column = 0 To userDefinedColumns - 1
For phrase = 0 To matchablePhrases - 1
Dim IDF As Double = Math.Log((totalDocuments - hits_all_rows(phrase, column) + 0.5) / (hits_all_rows(phrase, column) + 0.5))
Dim score As Double = (IDF * ((hits_this_row(phrase, column) * (k1 + 1)) / (hits_this_row(phrase, column) + k1 * (1 - b + b * _tokensInDocument(column) / _averageTokensInDocument(column)))))
If score < 0 Then
score = 0
End If
_BM25Rank += score
Next
Next
End If
Return _BM25Rank
End Get
End Property
Public Sub New(raw_pcnalsx_MatchInfo As Byte())
Dim int32_pcsx_MatchInfo As New List(Of Integer)
For i = 0 To raw_pcnalsx_MatchInfo.Length - 1 Step 4
int32_pcsx_MatchInfo.Add(BitConverter.ToUInt32(raw_pcnalsx_MatchInfo, i))
Next
'take the raw data and parse it out
Me.matchablePhrases = int32_pcsx_MatchInfo(0)
int32_pcsx_MatchInfo.RemoveAt(0)
Me.userDefinedColumns = int32_pcsx_MatchInfo(0)
int32_pcsx_MatchInfo.RemoveAt(0)
Me.totalDocuments = int32_pcsx_MatchInfo(0)
int32_pcsx_MatchInfo.RemoveAt(0)
'remember that the columns are 0-based
For i = 0 To userDefinedColumns - 1
_averageTokensInDocument.Add(int32_pcsx_MatchInfo(0))
int32_pcsx_MatchInfo.RemoveAt(0)
Next
For i = 0 To userDefinedColumns - 1
_tokensInDocument.Add(int32_pcsx_MatchInfo(0))
int32_pcsx_MatchInfo.RemoveAt(0)
Next
For i = 0 To userDefinedColumns - 1
_longestSubsequencePhraseMatches.Add(int32_pcsx_MatchInfo(0))
int32_pcsx_MatchInfo.RemoveAt(0)
Next
_int32HitData = New List(Of Integer)(int32_pcsx_MatchInfo)
End Sub
Public Function hits_this_row(phrase As Integer, column As Integer) As Integer
Return _int32HitData(3 * (column + phrase * userDefinedColumns) + 0)
End Function
Public Function hits_all_rows(phrase As Integer, column As Integer) As Integer
Return _int32HitData(3 * (column + phrase * userDefinedColumns) + 1)
End Function
Public Function docs_with_hits(phrase As Integer, column As Integer) As Integer
Return _int32HitData(3 * (column + phrase * userDefinedColumns) + 2)
End Function
End Class
<SQLiteFunction("Rank", 1, FunctionType.Scalar)>
Public Class Rank
Inherits SQLiteFunction
Public Overrides Function Invoke(args() As Object) As Object
Return New MatchInfo(args(0)).BM25Rank
End Function
End Class

Resources