If I have the string "12121211122" and I want to get the last 3 characters (e.g. "122"), is that possible in Go? I've looked in the string package and didn't see anything like getLastXcharacters.
You can use a slice expression on a string to get the last three bytes.
s := "12121211122"
first3 := s[0:3]
last3 := s[len(s)-3:]
Or, if the string contains multi-byte Unicode characters, you can convert it to a rune slice first:
s := []rune("世界世界世界")
first3 := string(s[0:3])
last3 := string(s[len(s)-3:])
Check Strings, bytes, runes and characters in Go and Slice Tricks.
The answer depends on what you mean by "characters". If you mean bytes then:
s := "12121211122"
lastByByte := s[len(s)-3:]
If you mean runes in a utf-8 encoded string, then:
s := "12121211122"
j := len(s)
for i := 0; i < 3 && j > 0; i++ {
	_, size := utf8.DecodeLastRuneInString(s[:j])
	j -= size
}
lastByRune := s[j:]
You can also convert the string to a []rune and operate on the rune slice, but that allocates memory.
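As a minimal sketch, the rune-walking loop above can be wrapped in a reusable function. The name lastNRunes is a hypothetical helper, not something from the standard library; only utf8.DecodeLastRuneInString is a real library call.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// lastNRunes returns the last n runes of s without converting s to a []rune.
// Hypothetical helper name; it is not part of the standard library.
func lastNRunes(s string, n int) string {
	j := len(s)
	for i := 0; i < n && j > 0; i++ {
		// Step back over one rune. An invalid byte is treated as a
		// one-byte rune, so the loop always makes progress.
		_, size := utf8.DecodeLastRuneInString(s[:j])
		j -= size
	}
	return s[j:]
}

func main() {
	fmt.Println(lastNRunes("12121211122", 3)) // 122
	fmt.Println(lastNRunes("世界世界世界", 3))      // 界世界
	fmt.Println(lastNRunes("ab", 3))           // ab (shorter than n: whole string)
}
```

Because the function only slices the original string, it does not allocate, unlike the []rune conversion.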
Related
I'm wondering if there is an easy way, such as well-known functions for handling code points/runes, to take a chunk out of the middle of a rune slice without messing it up, or whether it all needs to be coded ourselves to get down to a length equal to or less than a maximum number of bytes.
Specifically, what I am looking to do is pass a string to a function, convert it to runes so that I can respect code points and, if the string is longer than some maximum number of bytes, remove enough runes from the center to get the byte count down to what's necessary.
This is simple math if the strings contain only single-byte characters, and can be handled something like:
func shortenStringIDToMaxLength(in string, maxLen int) string {
	if len(in) > maxLen {
		excess := len(in) - maxLen
		start := maxLen/2 - excess/2
		return in[:start] + in[start+excess:]
	}
	return in
}
but with variable-width characters it's either going to take a fair bit more code to loop through the string, or there are nice functions that make this easy. Does anyone have a code sample showing how best to handle this with runes?
The idea here is that the DB field the string will go into has a fixed maximum length in bytes, not code points, so there needs to be some algorithm that goes from runes to a maximum number of bytes. The reason for taking the characters from the middle of the string is just the needs of this particular program.
Thanks!
EDIT:
Once I found out that the range operator respects runes when iterating over strings, this became easy to do with just strings, which I learned thanks to the great answers below. I shouldn't have to worry about the string being well-formed UTF-8 in this case, but if I do, I now know about the unicode/utf8 package. Thanks!
Here's what I ended up with:
package main

import (
	"fmt"
)

func ShortenStringIDToMaxLength(in string, maxLen int) string {
	if maxLen < 1 {
		// Panic/log whatever is your error system of choice.
	}
	bytes := len(in)
	if bytes > maxLen {
		excess := bytes - maxLen
		lPos := bytes/2 - excess/2
		lastPos := 0
		for pos := range in {
			if pos > lPos {
				lPos = lastPos
				break
			}
			lastPos = pos
		}
		rPos := lPos + excess
		for pos := range in[lPos:] {
			if pos >= excess {
				rPos = pos
				break
			}
		}
		return in[:lPos] + in[lPos+rPos:]
	}
	return in
}

func main() {
	out := ShortenStringIDToMaxLength(`123456789 123456789`, 5)
	fmt.Println(out, len(out))
}
https://play.golang.org/p/YLGlj_17A-j
Here is an adaptation of your algorithm, which removes incomplete runes from the end of your prefix and the beginning of your suffix:
// These functions need: import "unicode/utf8"

func TrimLastIncompleteRune(s string) string {
	l := len(s)
	for i := 1; i <= l; i++ {
		suff := s[l-i : l]
		// repeatedly try to decode a rune from the last bytes in string
		r, cnt := utf8.DecodeRuneInString(suff)
		if r == utf8.RuneError {
			continue
		}
		// if success : return the substring which contains
		// this successfully decoded rune
		lgth := l - i + cnt
		return s[:lgth]
	}
	return ""
}

func TrimFirstIncompleteRune(s string) string {
	// repeatedly try to decode a rune from the beginning
	for i := 0; i < len(s); i++ {
		if r, _ := utf8.DecodeRuneInString(s[i:]); r != utf8.RuneError {
			// if success : return
			return s[i:]
		}
	}
	return ""
}

func shortenStringIDToMaxLength(in string, maxLen int) string {
	if len(in) > maxLen {
		firstHalf := maxLen / 2
		secondHalf := len(in) - (maxLen - firstHalf)
		prefix := TrimLastIncompleteRune(in[:firstHalf])
		suffix := TrimFirstIncompleteRune(in[secondHalf:])
		return prefix + suffix
	}
	return in
}
link on play.golang.org
This algorithm only tries to drop more bytes from the selected prefix and suffix.
If it turns out that you need to drop 3 bytes from the suffix to have a valid rune, for example, it does not try to see if it can add 3 more bytes to the prefix, to have an end result closer to maxLen bytes.
You can use simple arithmetic to find start and end such that the string s[:start] + s[end:] is no longer than your byte limit. But you need to make sure that start and end each fall on the first byte of a UTF-8 sequence, so that the result stays valid.
UTF-8 has the property that any given byte is the first byte of a sequence as long as its top two bits aren't 10.
So you can write code something like this (playground: https://play.golang.org/p/xk_Yo_1wTYc)
package main

import (
	"fmt"
)

func truncString(s string, maxLen int) string {
	if len(s) <= maxLen {
		return s
	}
	start := (maxLen + 1) / 2
	for start > 0 && s[start]>>6 == 0b10 {
		start--
	}
	end := len(s) - (maxLen - start)
	for end < len(s) && s[end]>>6 == 0b10 {
		end++
	}
	return s[:start] + s[end:]
}

func main() {
	fmt.Println(truncString("this is a test", 5))
	fmt.Println(truncString("日本語", 7))
}
This code has the desirable property that it takes O(maxLen) time, no matter how long the input string (assuming it's valid utf-8).
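The continuation-byte test can also be expressed with utf8.RuneStart from the standard library, which reports whether a byte could be the first byte of an encoded rune. The sketch below is a variant of the same approach under that substitution; the name truncMiddle and the clamp for a negative maxLen are assumptions added for illustration, not part of the answer above.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// truncMiddle removes bytes from the middle of s so that the result is at
// most maxLen bytes long, cutting only at rune boundaries.
// Hypothetical name; same approach as truncString above.
func truncMiddle(s string, maxLen int) string {
	if maxLen < 0 {
		maxLen = 0 // assumption: treat a negative limit as zero
	}
	if len(s) <= maxLen {
		return s
	}
	// Move start left until it sits on the first byte of a rune.
	start := (maxLen + 1) / 2
	for start > 0 && !utf8.RuneStart(s[start]) {
		start--
	}
	// Move end right until it sits on the first byte of a rune (or the end).
	end := len(s) - (maxLen - start)
	for end < len(s) && !utf8.RuneStart(s[end]) {
		end++
	}
	return s[:start] + s[end:]
}

func main() {
	fmt.Println(truncMiddle("this is a test", 5)) // thist
	fmt.Println(truncMiddle("日本語", 7))            // 日語 (6 bytes)
}
```

Because both indexes only ever move away from the kept bytes, the result can be shorter than maxLen, but never longer.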
I have a set of strings (less than 30) of length 1 to ~30. I need to find the subset of at least two strings that share the longest possible prefix- + suffix-combination.
For example, let the set be
Foobar
Facar
Faobaron
Gweron
Fzobar
The prefix/suffix F/ar has a combined length of 3 and is shared by Foobar, Facar and Fzobar; the prefix/suffix F/obar has a combined length of 5 and is shared by Foobar and Fzobar. The searched-for prefix/suffix is F/obar.
Note that this is not to be confused with the longest common prefix/suffix, since only two or more strings from the set need to share the same prefix+suffix. Also note that the sum of the lengths of both the prefix and the suffix is what is to be maximized, so both need to be taken into account. The prefix or suffix may be the empty string.
Does anyone know of an efficient method to implement this?
How about this:
maxLen := -1;
for I := 0 to Len(A) - 1 do
  if Len(A[I]) > maxLen then // (1)
    for J := 0 to Len(A[I]) do
      for K := 0 to Len(A[I]) - J do
        if J+K > maxLen then // (2)
        begin
          prf := LeftStr(A[I], J);
          suf := RightStr(A[I], K);
          found := False;
          for m := 0 to Len(sufList) - 1 do
            if (sufList[m] = suf) and (prfList[m] = prf) then
            begin
              maxLen := J+K;
              Result := prf+'/'+suf;
              found := True;
              // (3)
              n := 0;
              while n < Len(sufList) do
                if Len(sufList[n])+Len(prfList[n]) <= maxLen then
                begin
                  sufList.Delete(n);
                  prfList.Delete(n);
                end
                else
                  Inc(n);
              // (end of 3)
              Break;
            end;
          if not found then
          begin
            sufList.Add(suf);
            prfList.Add(prf);
          end;
        end;
In this example, maxLen holds the combined length of the longest prefix/suffix found so far. The most important part is the line marked (2): it skips a lot of unnecessary string comparisons. Section (3) eliminates any stored prefix/suffix pair that is not longer than the newly found one (which is duplicated).
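For a set this small (fewer than 30 strings of about 30 characters), a pairwise brute force is another option. This is a different technique from the list-based search above: for every pair of strings it takes the longest common prefix and the longest common suffix, and caps their sum at the length of the shorter string so the two parts cannot overlap. The sketch is in Go, all function names are hypothetical, and it compares bytes, which assumes ASCII input like the example.

```go
package main

import "fmt"

// commonPrefixLen returns how many leading bytes a and b share.
func commonPrefixLen(a, b string) int {
	n := 0
	for n < len(a) && n < len(b) && a[n] == b[n] {
		n++
	}
	return n
}

// commonSuffixLen returns how many trailing bytes a and b share.
func commonSuffixLen(a, b string) int {
	n := 0
	for n < len(a) && n < len(b) && a[len(a)-1-n] == b[len(b)-1-n] {
		n++
	}
	return n
}

// bestPrefixSuffix returns the prefix and suffix with the largest combined
// length that at least two of the words share. Ties keep the first pair found.
func bestPrefixSuffix(words []string) (prefix, suffix string) {
	best := -1
	for i := 0; i < len(words); i++ {
		for j := i + 1; j < len(words); j++ {
			a, b := words[i], words[j]
			p := commonPrefixLen(a, b)
			q := commonSuffixLen(a, b)
			// Prefix and suffix must not overlap inside the shorter word.
			limit := len(a)
			if len(b) < limit {
				limit = len(b)
			}
			if p+q > limit {
				q = limit - p
			}
			if p+q > best {
				best = p + q
				prefix, suffix = a[:p], a[len(a)-q:]
			}
		}
	}
	return prefix, suffix
}

func main() {
	words := []string{"Foobar", "Facar", "Faobaron", "Gweron", "Fzobar"}
	p, s := bestPrefixSuffix(words)
	fmt.Println(p + "/" + s) // F/obar
}
```

With n strings of length at most L this takes O(n² · L) time, which is negligible at the sizes given in the question.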
How do I find the offset (index) of a string in a []rune using Go?
I can do this with the string type:
if i := strings.Index(input[offset:], "}}"); i > 0 {print(i);}
but I need to do it with runes.
I have runes and want to get the offset index.
How can I do this with the rune type in Go?
Here is an example to make clearer what I need:
offset := 0 // start from 0 (this is important for me)
text := "123456783}}56"
if i := strings.Index(text[offset:], "}}"); i > 0 {print(i);}
The output of this example is: 9
but I want to do this with the text variable as a []rune.
Is that possible?
See my current code: https://play.golang.org/p/seImKzVpdh
Thank you.
Edit #2: You have changed the meaning of your question again: you now want to search for a string in a []rune.
Answer: this is not supported directly in the standard library, but it's easy to implement with two for loops:
func search(text []rune, what string) int {
	whatRunes := []rune(what)
	// Stop early enough that text[i+j] never runs past the end of text.
	for i := 0; i+len(whatRunes) <= len(text); i++ {
		found := true
		for j := range whatRunes {
			if text[i+j] != whatRunes[j] {
				found = false
				break
			}
		}
		if found {
			return i
		}
	}
	return -1
}
Testing it:
value := []rune("123}456}}789")
result := search(value, "}}")
fmt.Println(result)
Output (try it on the Go Playground):
7
Edit: You updated the question indicating that you want to search runes in a string.
You may easily convert a []rune to a string using a simple type conversion:
toSearchRunes := []rune{'}', '}'}
toSearch := string(toSearchRunes)
And from there on, you can use strings.Index() as you did in your example:
// strings.Index returns -1 when nothing is found; 0 is a valid match position.
if i := strings.Index(text[offset:], toSearch); i >= 0 {
	print(i)
}
Try it on the Go Playground.
Original answer follows:
string values in Go are stored as UTF-8 encoded bytes. strings.Index() returns you the byte position if the given substring is found.
So basically what you want is to convert this byte-position to rune-position. The unicode/utf8 package contains utility functions for telling the rune-count or rune-length of a string: utf8.RuneCountInString().
So basically you just need to pass the substring to this function:
offset := 0
text := "123456789}}56"
// i is relative to text[offset:], so the slice to count ends at offset+i.
if i := strings.Index(text[offset:], "}}"); i >= 0 {
	fmt.Println("byte-pos:", i, "rune-pos:", utf8.RuneCountInString(text[offset:offset+i]))
}
text = "世界}}世界"
if i := strings.Index(text[offset:], "}}"); i >= 0 {
	fmt.Println("byte-pos:", i, "rune-pos:", utf8.RuneCountInString(text[offset:offset+i]))
}
Output (try it on the Go Playground):
byte-pos: 9 rune-pos: 9
byte-pos: 6 rune-pos: 2
Note: offset must also be a byte position, because when slicing a string like text[offset:], the index is interpreted as byte-index.
If you want to get the index of a rune, use strings.IndexRune() instead of strings.Index().
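The byte-to-rune conversion described above can be sketched as a small function. The name runeIndex is hypothetical; strings.Index, strings.IndexRune and utf8.RuneCountInString are the real standard-library calls it relies on.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// runeIndex returns the position of the first occurrence of sub in s,
// counted in runes rather than bytes, or -1 if sub is not present.
// Hypothetical helper name; it is not part of the standard library.
func runeIndex(s, sub string) int {
	i := strings.Index(s, sub) // byte position, or -1
	if i < 0 {
		return -1
	}
	// Count the runes that come before the match.
	return utf8.RuneCountInString(s[:i])
}

func main() {
	fmt.Println(runeIndex("123456789}}56", "}}")) // 9
	fmt.Println(runeIndex("世界}}世界", "}}"))        // 2
	fmt.Println(runeIndex("}}abc", "}}"))         // 0
	fmt.Println(runeIndex("abc", "}}"))           // -1

	// strings.IndexRune also returns a byte position, not a rune position.
	fmt.Println(strings.IndexRune("世界}}", '}')) // 6
}
```

The result matches the index you would get from searching the equivalent []rune, without allocating the rune slice.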
I have the text inside TEdit box:
'955-986, total = 32'
How would I delete all text after the comma, so that only '955-986' is left?
I tried to limit the TEdit length, but it's not working the way I want.
What should happen if there is no comma: keep the full string, or return an empty string?
Below is your idea of limiting the string length, applied only if a comma is found.
var
  tmpStr: string;
  commaPosition: integer;
begin
  tmpStr := Edit1.Text;
  commaPosition := pos(',', tmpStr);
  if commaPosition > 0 then begin
    SetLength(tmpStr, commaPosition - 1);
    Edit1.Text := tmpStr;
  end;
end;
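You could use this code. Note that if the text contains no comma, pos returns 0 and copy returns an empty string, so the edit box is cleared:
var
  tmpStr: string;
  commaPosition: integer;
begin
  tmpStr := Edit1.Text;
  commaPosition := pos(',', tmpStr);
  tmpStr := copy(tmpStr, 1, commaPosition - 1);
  Edit1.Text := tmpStr;
end;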
I'm not a Delphi-Programmer (any more). However, I guess you get the String from the Text-Property of your TEdit-Box object, search for the first occurrence of , and get the index thereof and replace the Text contained in your TEdit-Box by the substring from the beginning of the current string to the found index.
edit.Text := Copy(edit.Text, 1, Pos(',', edit.Text)-1);
Sources:
http://docwiki.embarcadero.com/Libraries/en/System.Copy
http://docwiki.embarcadero.com/Libraries/en/System.Pos
TEdit.Text is a property and cannot be passed as a var parameter. But once you introduce a temporary variable, you can leave the check of the index returned by Pos to Delete: it handles all cases, because it deletes nothing when the index is 0.
var
  S: string;
begin
  S := Edit1.Text; // try '955-986, total = 32' and '955-986; total = 32'
  Delete(S, Pos(',', S), MaxInt);
  Edit1.Text := S;
end;
I need to determine the total number of characters in a textbox and display the value in a label, but all whitespace needs to be excluded.
Here is the code:
var
sLength : string;
i : integer;
begin
sLength := edtTheText.Text;
slength:= ' ';
i := length(sLength);
//display the length of the string
lblLength.Caption := 'The string is ' + IntToStr(i) + ' characters long';
You can count the non-white space characters like this:
uses
  Character;

function NonWhiteSpaceCharacterCount(const str: string): Integer;
var
  c: Char;
begin
  Result := 0;
  for c in str do
    if not Character.IsWhiteSpace(c) then
      inc(Result);
end;
This uses Character.IsWhiteSpace to determine whether or not a character is whitespace. IsWhiteSpace returns True if and only if the character is classified as being whitespace, according to the Unicode specification. So, tab characters count as whitespace.
If you are using an ANSI version of Delphi, you can also use a lookup table, something like
NotBlanks: Array[0..255] Of Boolean
An entry in the array is True if the matching character is not a blank. Then, in the loop, you simply increment your counter:
Count := 0;
For i := 1 To Length(MyStringToParse) Do
  Inc(Count, Byte(NotBlanks[Ord(MyStringToParse[i])]));
In the same fashion you can use a set:
For i := 1 To Length(MyStringToParse) Do
  If Not (MyStringToParse[i] In [#1,#2{define the blanks in this set}]) Then
    Inc(Count);
Actually you have many ways to solve this.