Is there an efficient way to convert a fixed-size byte slice to a string without adding null characters to the string?
The traditional way to convert a byte slice to a string is the following:
out := string(b[STRIDX:STRIDX+STRLEN])
While this returns a string, its length is always equal to the length of the byte slice. So while the string looks normal in a Print statement, it still contains the trailing null bytes. This has some very odd effects if you append characters to the string.
Right now I scan the byte slice for nulls to limit the byte slice I feed to string(). Not very pretty or efficient.
Example: https://play.golang.org/p/hOoaqCOoFl0
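To make the problem concrete, here is a minimal sketch of the behaviour described above. The 10-byte buffer and the helper name fixedToString are made up for illustration; it only shows that a plain conversion keeps the NUL padding.

```go
package main

import "fmt"

// fixedToString is a hypothetical helper: it converts the whole
// fixed-size field to a string, padding included.
func fixedToString(b []byte) string {
	return string(b)
}

func main() {
	var buf [10]byte
	copy(buf[:], "test")

	s := fixedToString(buf[:])
	fmt.Println(len(s))       // 10: the six NUL bytes are part of the string
	fmt.Printf("%q\n", s+"!") // the "!" lands after the NUL bytes
}
```

Printing with %q rather than %s makes the hidden bytes visible, which helps when debugging this kind of issue.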
Write a simple function:
func CToGoString(b []byte) string {
i := bytes.IndexByte(b, 0)
if i < 0 {
i = len(b)
}
return string(b[:i])
}
For your example,
package main
import (
"bytes"
"fmt"
)
func CToGoString(b []byte) string {
i := bytes.IndexByte(b, 0)
if i < 0 {
i = len(b)
}
return string(b[:i])
}
const (
BUFLEN = 50
STRLEN = 10
STRIDX = 10
)
func main() {
test := "test"
b := [BUFLEN]byte{}
fmt.Printf("Original\n\tString: '%+v' with length '%d'\n", test, len(test))
copy(b[10:], []byte(test))
s := CToGoString(b[STRIDX : STRIDX+STRLEN])
fmt.Printf("Unpacking with []byte()\n\tString: '%+v' with length '%d' Buf:%+v\n", s, len(s), []byte(s))
}
Playground: https://play.golang.org/p/mH3CBdM6eG_l
Output:
Original
String: 'test' with length '4'
Unpacking with []byte()
String: 'test' with length '4' Buf:[116 101 115 116]
Related
I'm wondering if there is an easy way, such as well-known functions for handling code points/runes, to take a chunk out of the middle of a rune slice without messing it up, or if it all needs to be coded by hand to get down to a length equal to or less than a maximum number of bytes.
Specifically, what I am looking to do is pass a string to a function, convert it to runes so that I can respect code points and, if the string is longer than some maximum number of bytes, remove enough runes from the center to get the byte count down to what's necessary.
This is simple math if the strings contain only single-byte characters and can be handled something like this:
func shortenStringIDToMaxLength(in string, maxLen int) string {
if len(in) > maxLen {
excess := len(in) - maxLen
start := maxLen/2 - excess/2
return in[:start] + in[start+excess:]
}
return in
}
but in a variable character width byte string it's either going to be a fair bit more coding looping through or there will be nice functions to make this easy. Does anyone have a code sample of how to best handle such a thing with runes?
The idea here is that the DB field the string will go into has a fixed maximum length in bytes, not code points, so there needs to be some algorithm that goes from runes to a maximum number of bytes. The reason for taking the characters from the middle of the string is just the needs of this particular program.
Thanks!
EDIT:
Once I found out that the range operator respects runes on strings, this became easy to do with just strings, which I learned from the great answers below. I shouldn't have to worry about whether the string is well-formed UTF-8 in this case, but if I do, I now know about the unicode/utf8 package, thanks!
Here's what I ended up with:
package main
import (
"fmt"
)
func ShortenStringIDToMaxLength(in string, maxLen int) string {
if maxLen < 1 {
// Panic/log whatever is your error system of choice.
}
bytes := len(in)
if bytes > maxLen {
excess := bytes - maxLen
lPos := bytes/2 - excess/2
lastPos := 0
for pos, _ := range in {
if pos > lPos {
lPos = lastPos
break
}
lastPos = pos
}
rPos := lPos + excess
for pos, _ := range in[lPos:] {
if pos >= excess {
rPos = pos
break
}
}
return in[:lPos] + in[lPos+rPos:]
}
return in
}
func main() {
out := ShortenStringIDToMaxLength(`123456789 123456789`, 5)
fmt.Println(out, len(out))
}
https://play.golang.org/p/YLGlj_17A-j
Here is an adaptation of your algorithm, which removes incomplete runes from the end of your prefix and the beginning of your suffix:
func TrimLastIncompleteRune(s string) string {
l := len(s)
for i := 1; i <= l; i++ {
suff := s[l-i : l]
// repeatedly try to decode a rune from the last bytes in string
r, cnt := utf8.DecodeRuneInString(suff)
if r == utf8.RuneError {
continue
}
// on success: return the substring that ends with
// this successfully decoded rune
lgth := l - i + cnt
return s[:lgth]
}
return ""
}
func TrimFirstIncompleteRune(s string) string {
// repeatedly try to decode a rune from the beginning
for i := 0; i < len(s); i++ {
if r, _ := utf8.DecodeRuneInString(s[i:]); r != utf8.RuneError {
// if success : return
return s[i:]
}
}
return ""
}
func shortenStringIDToMaxLength(in string, maxLen int) string {
if len(in) > maxLen {
firstHalf := maxLen / 2
secondHalf := len(in) - (maxLen - firstHalf)
prefix := TrimLastIncompleteRune(in[:firstHalf])
suffix := TrimFirstIncompleteRune(in[secondHalf:])
return prefix + suffix
}
return in
}
link on play.golang.org
This algorithm only tries to drop more bytes from the selected prefix and suffix.
If it turns out that you need to drop 3 bytes from the suffix to have a valid rune, for example, it does not try to see if it can add 3 more bytes to the prefix, to have an end result closer to maxLen bytes.
You can use simple arithmetic to find start and end such that the string s[:start] + s[end:] is no longer than your byte limit. But you need to make sure that start and end each point at the first byte of a UTF-8 sequence to keep the result valid.
UTF-8 has the property that any given byte is the first byte of a sequence as long as its top two bits aren't 10.
So you can write code something like this (playground: https://play.golang.org/p/xk_Yo_1wTYc)
package main
import (
"fmt"
)
func truncString(s string, maxLen int) string {
if len(s) <= maxLen {
return s
}
start := (maxLen + 1) / 2
for start > 0 && s[start]>>6 == 0b10 {
start--
}
end := len(s) - (maxLen - start)
for end < len(s) && s[end]>>6 == 0b10 {
end++
}
return s[:start] + s[end:]
}
func main() {
fmt.Println(truncString("this is a test", 5))
fmt.Println(truncString("日本語", 7))
}
This code has the desirable property that it takes O(maxLen) time, no matter how long the input string (assuming it's valid utf-8).
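The top-two-bits test used above is also available in the standard library as utf8.RuneStart. As a sketch only, here is the same truncation written with it; the name truncAtRuneStart is hypothetical, and the code assumes valid UTF-8 input and a non-negative maxLen.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// truncAtRuneStart (hypothetical name) drops bytes from the middle of s so
// that the result is at most maxLen bytes and still valid UTF-8.
// Assumes s is valid UTF-8 and maxLen >= 0.
func truncAtRuneStart(s string, maxLen int) string {
	if len(s) <= maxLen {
		return s
	}
	// Move start left until it sits on the first byte of a sequence.
	start := (maxLen + 1) / 2
	for start > 0 && !utf8.RuneStart(s[start]) {
		start--
	}
	// Move end right until it sits on the first byte of a sequence.
	end := len(s) - (maxLen - start)
	for end < len(s) && !utf8.RuneStart(s[end]) {
		end++
	}
	return s[:start] + s[end:]
}

func main() {
	fmt.Println(truncAtRuneStart("this is a test", 5)) // thist
	fmt.Println(truncAtRuneStart("日本語", 7))            // 日語
}
```

Because both cut points only ever move away from the removed region, the result can be shorter than maxLen bytes but never longer.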
I am using the code below to encrypt and decrypt data. Now I want to encrypt the data in Node.js and decrypt it in Go, but I have not been able to get it working in Go.
var B64XorCipher = {
encode: function(key, data) {
return new Buffer(xorStrings(key, data),'utf8').toString('base64');
},
decode: function(key, data) {
data = new Buffer(data,'base64').toString('utf8');
return xorStrings(key, data);
}
};
function xorStrings(key,input){
var output='';
for(var i=0;i<input.length;i++){
var c = input.charCodeAt(i);
var k = key.charCodeAt(i%key.length);
output += String.fromCharCode(c ^ k);
}
return output;
}
From Go I am trying to decode it as shown below, but I am not able to get it working:
bytes, err := base64.StdEncoding.DecodeString(actualInput)
encryptedText := string(bytes)
fmt.Println(EncryptDecrypt(encryptedText, "XXXXXX"))
func EncryptDecrypt(input, key string) (output string) {
for i := range input {
output += string(input[i] ^ key[i%len(key)])
}
return output
}
Can someone help me to resolve it.
You should decode the string rune by rune with utf8.DecodeRuneInString instead of indexing it byte by byte.
Solution in playground: https://play.golang.org/p/qi_6S1J_dZU
package main
import (
"fmt"
"unicode/utf8"
)
func main() {
fmt.Println("Hello, playground")
k:="1234fd23434"
input:="The 我characterode我 113 is equal to q"
fmt.Println(EncryptDecrypt(input,k))
// expect: "eZV扷ZRFRWEWA[戣[#GRX#^B"
}
func EncryptDecrypt(input, key string) (output string) {
keylen := len(key)
count := len(input)
i := 0
j := 0
for i < count {
c, n := utf8.DecodeRuneInString(input[i:])
i += n
k, m := utf8.DecodeRuneInString(key[j:])
j += m
if j >= keylen {
j = 0
}
output += string(c ^ k)
}
return output
}
Compare this with the result of your JavaScript code:
function xorStrings(key,input){
var output='';
for(var i=0;i<input.length;i++){
var c = input.charCodeAt(i);
var k = key.charCodeAt(i%key.length);
output += String.fromCharCode(c ^ k);
}
return output;
}
console.log(xorStrings('1234fd23434',"The 我characterode我 113 is equal to q"))
// expect: "eZV扷ZRFRWEWA[戣[#GRX#^B"
The test result is the same.
Here is why.
In Go, the index produced by ranging over a string is a byte offset, and input[i] is the single byte at that offset, whereas JavaScript's charCodeAt returns a whole UTF-16 code unit. In UTF-8, a non-ASCII character is 2 to 4 bytes long, so XORing byte by byte gives different output.
Test in playground https://play.golang.org/p/XawI9aR_HDh
package main
import (
"fmt"
"unicode/utf8"
)
var sentence = "The 我quick brown fox jumps over the lazy dog."
var index = 4
func main() {
fmt.Println("slice of string...")
fmt.Printf("The byte at %d is |%s|, |%s| is 3 bytes long.\n",index,sentence[index:index+1],sentence[index:index+3])
fmt.Println("runes of string...")
ru, _ := utf8.DecodeRuneInString(sentence[index:])
i := int(ru)
fmt.Printf("The character code at %d is|%s|%d| \n",index, string(ru), i)
}
The output is
slice of string...
The byte at 4 is |�|, |我| is 3 bytes long.
runes of string...
The character code at 4 is|我|25105|
The charCodeAt() method returns an integer between 0 and 65535
representing the UTF-16 code unit at the given index.
var c = input.charCodeAt(i);
For statements with range clause
For a string value, the "range" clause iterates over the Unicode code
points in the string starting at byte index 0. On successive
iterations, the index value will be the index of the first byte of
successive UTF-8-encoded code points in the string, and the second
value, of type rune, will be the value of the corresponding code
point. If the iteration encounters an invalid UTF-8 sequence, the
second value will be 0xFFFD, the Unicode replacement character, and
the next iteration will advance a single byte in the string.
for i := range input
UTF-16 versus UTF-8?
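If the goal is to mirror the charCodeAt/fromCharCode loop exactly, one option is to XOR UTF-16 code units on the Go side as well. The following is only a sketch under that assumption: xorUTF16 is a hypothetical name, the key is assumed to be non-empty, and results containing unpaired surrogates will not round-trip because utf16.Decode replaces them with U+FFFD.

```go
package main

import (
	"fmt"
	"unicode/utf16"
)

// xorUTF16 (hypothetical name) XORs input with key one UTF-16 code unit
// at a time, similar to a charCodeAt/fromCharCode loop in JavaScript.
// It assumes key is non-empty.
func xorUTF16(key, input string) string {
	k := utf16.Encode([]rune(key))
	in := utf16.Encode([]rune(input))
	out := make([]uint16, len(in))
	for i, c := range in {
		out[i] = c ^ k[i%len(k)]
	}
	// Decode turns unpaired surrogates into U+FFFD.
	return string(utf16.Decode(out))
}

func main() {
	key := "1234fd23434"
	plain := "The 我characterode我 113 is equal to q"
	enc := xorUTF16(key, plain)
	fmt.Println(xorUTF16(key, enc) == plain) // true
}
```

For text inside the Basic Multilingual Plane this behaves like the rune-based version above; the two differ only for characters that need surrogate pairs in UTF-16.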
How do I get the index of a substring in a string, starting the search at a certain position (offset)? For example:
package main
import (
"fmt"
"strings"
)
func main() {
string := "something.value=something=end"
index1 := strings.Index(string, "value=")
fmt.Println(index1) // prints 10
// index2 = ... How do I get the position of the second =, 25?
}
Similar offset in PHP int strpos ( string $haystack , mixed $needle [, int $offset = 0 ] )
The strings package does not provide such a function, but in practice it's rarely needed. Often the strings.Split() function is used to easily split strings into tokens / parts.
But if you do need it: you can simply slice a string, which is efficient (no copy is made, the result shares memory with the original string value).
So effectively the function you're looking for would look like this:
func Index(s, substr string, offset int) int {
if offset < 0 || offset > len(s) {
return -1
}
if idx := strings.Index(s[offset:], substr); idx >= 0 {
return offset + idx
}
return -1
}
Example using it:
s := "something.value=something=end"
index1 := strings.Index(s, "value=")
fmt.Println(index1) // prints 10
index2 := Index(s, "=", index1+len("value="))
fmt.Println(index2) // prints 25
Output (try it on the Go Playground):
10
25
Note that the offset you pass to our Index() function, like any index used when slicing a string, is a byte index, not a rune (character) index. The two are equal as long as all characters have code points below 128, but beyond that the byte index will be greater than the rune index because those code points map to multiple bytes in UTF-8 (which is how Go stores strings in memory). strings.Index() returns a byte index and len(s) returns the length in bytes, so the example works properly with all strings.
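To illustrate the difference between the two kinds of index, here is a small sketch. The helper runeIndex is hypothetical (it is not part of the strings package); it converts the byte index returned by strings.Index into a character index by counting the runes before it.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// runeIndex (hypothetical helper) returns the character index of the first
// occurrence of substr in s, or -1 if substr is not present.
func runeIndex(s, substr string) int {
	byteIdx := strings.Index(s, substr)
	if byteIdx < 0 {
		return -1
	}
	return utf8.RuneCountInString(s[:byteIdx])
}

func main() {
	s := "h\u00e9llo" // "héllo": the second character takes two bytes in UTF-8
	fmt.Println(strings.Index(s, "l")) // 3 (byte index)
	fmt.Println(runeIndex(s, "l"))     // 2 (character index)
}
```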
Your original task using strings.Split() could look like this:
s := "something.value=something=end"
parts := strings.Split(s, "=")
fmt.Println(parts)
Which outputs (try it on the Go Playground):
[something.value something end]
The value you want to "parse" out is in parts[1].
Slicing a string that contains multi-byte UTF-8 characters at an arbitrary byte offset may produce a corrupted string. If index counts characters rather than bytes, convert the string to runes first:
[]rune(videoHtml)[0:index]
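As a rough illustration of the point above, the sketch below cuts a string once by bytes and once by runes. The helper firstRunes is a hypothetical name, and it assumes n is not negative.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// firstRunes (hypothetical helper) returns the first n characters of s,
// or all of s if it has fewer than n. Assumes n >= 0.
func firstRunes(s string, n int) string {
	r := []rune(s)
	if n > len(r) {
		n = len(r)
	}
	return string(r[:n])
}

func main() {
	s := "日本語" // three characters, nine bytes
	fmt.Println(utf8.ValidString(s[:4])) // false: the cut splits the second character
	fmt.Println(firstRunes(s, 2))        // 日本
}
```

Converting to []rune copies the whole string, so for long inputs it can be cheaper to walk the string with range and slice at the byte offset it reports.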
I have the following code, it is supposed to cast a rune into a string and print it. However, I am getting undefined characters when it is printed. I am unable to figure out where the bug is:
package main
import (
"fmt"
"strconv"
"strings"
"text/scanner"
)
func main() {
var b scanner.Scanner
const a = `a`
b.Init(strings.NewReader(a))
c := b.Scan()
fmt.Println(strconv.QuoteRune(c))
}
That's because you used Scanner.Scan() to read a rune, but it does something else. Scanner.Scan() reads the next token or Unicode character, depending on the Scanner.Mode bitmask, and for recognized tokens it returns one of the special constants from the text/scanner package, not the rune that was read.
To read a single rune use Scanner.Next() instead:
c := b.Next()
fmt.Println(c, string(c), strconv.QuoteRune(c))
Output:
97 a 'a'
If you just want to convert a single rune to a string, use a simple type conversion. rune is an alias for int32, and the spec says this about converting integers to strings:
Converting a signed or unsigned integer value to a string type yields a string containing the UTF-8 representation of the integer.
So:
r := rune('a')
fmt.Println(r, string(r))
Outputs:
97 a
Also to loop over the runes of a string value, you can simply use the for ... range construct:
for i, r := range "abc" {
fmt.Printf("%d - %c (%v)\n", i, r, r)
}
Output:
0 - a (97)
1 - b (98)
2 - c (99)
Or you can simply convert a string value to []rune:
fmt.Println([]rune("abc")) // Output: [97 98 99]
There is also utf8.DecodeRuneInString().
Try the examples on the Go Playground.
Note:
Your original code (using Scanner.Scan()) works like this:
You called Scanner.Init() which sets the Mode (b.Mode) to scanner.GoTokens.
Calling Scanner.Scan() on the input (from "a") returns scanner.Ident because "a" is a valid Go identifier:
c := b.Scan()
if c == scanner.Ident {
fmt.Println("Identifier:", b.TokenText())
}
// Output: "Identifier: a"
I know I'm a bit late to the party but here's a []rune to string function:
func runesToString(runes []rune) (outString string) {
// don't need index so _
for _, v := range runes {
outString += string(v)
}
return
}
Yes, it uses a named return value, but I think that's OK in this case as it reduces the number of lines and the function is short.
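For comparison, the built-in conversion string(runes) produces the same result without a loop. The sketch below checks that against an incremental version built with strings.Builder; the function name runesToStringBuilder is hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// runesToStringBuilder (hypothetical name) builds the string one rune at a
// time, which avoids reallocating the string on every append.
func runesToStringBuilder(runes []rune) string {
	var sb strings.Builder
	for _, r := range runes {
		sb.WriteRune(r)
	}
	return sb.String()
}

func main() {
	runes := []rune{'h', '\u00e9', '\u65e5'} // h, é, 日
	fmt.Println(string(runes) == runesToStringBuilder(runes)) // true
	fmt.Println(len(string(runes)))                           // 6 (1 + 2 + 3 bytes)
}
```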
This simple code converts a rune to a string:
s := fmt.Sprintf("%c", r) // r is a rune
Since I came to this question searching for rune, string and char, I thought this might help newbies like me:
// str := "aഐbc"
// testString(str)
func testString(oneString string) {
	// string to byte slice: a plain conversion, which copies the
	// UTF-8 bytes of the string
	var twoByteArr []byte = []byte(oneString)
	// string to rune slice: also a plain conversion, which decodes
	// the UTF-8 bytes into code points
	var threeRuneSlice []rune = []rune(oneString)
	// A string is a read-only sequence of bytes, usually holding UTF-8 text;
	// the conversions here translate between the byte view and the rune view.
	// A rune slice can be converted back to a string the same way
	var thirdString string = string(threeRuneSlice)
	// There is a catch when printing "characters" using a for loop and range
	fmt.Println("Chars in oneString")
	for i, r := range oneString {
		fmt.Printf(" %d %v %c ", i, r, r) // i is a byte offset: 0,1,4,5 here, not 0,1,2,3
		// since range decodes UTF-8 when run over strings https://blog.golang.org/strings
	}
	fmt.Println("\nChars in threeRuneSlice")
	for i, r := range threeRuneSlice {
		fmt.Printf(" %d %v %c ", i, r, r) // i = 0,1,2,3: one index per character
		// a rune is an int32 holding one code point and a byte is a uint8;
		// inside a string each code point is encoded as 1 to 4 bytes
	}
	fmt.Println("\nValues in oneString ")
	for j := 0; j < len(oneString); j++ {
		fmt.Printf(" %d %v ", j, oneString[j]) // you do not get characters if you iterate this way,
		// as you are going over bytes here, not runes
	}
	fmt.Println("\nValues in twoByteArr")
	for j := 0; j < len(twoByteArr); j++ {
		fmt.Printf(" %d=%v ", j, twoByteArr[j]) // same as above
	}
	fmt.Printf("\none - %s, two %s, three %s\n", oneString, twoByteArr, thirdString)
}
And one more demo: https://play.golang.org/p/tagRBVG8k7V
adapted from https://groups.google.com/g/golang-nuts/c/84GCvDBhpbg/m/Tt6089MPFQAJ
to show that the 'characters' are encoded with one to four bytes depending on the Unicode code point.
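The one-to-four-byte claim can be checked directly with utf8.RuneLen. The sketch below uses a hypothetical helper, runeWidths, that reports the encoded size of each character in a string.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// runeWidths (hypothetical helper) returns the number of bytes that each
// character of s occupies in UTF-8.
func runeWidths(s string) []int {
	var widths []int
	for _, r := range s {
		widths = append(widths, utf8.RuneLen(r))
	}
	return widths
}

func main() {
	// U+0061, U+00E9, U+0D10 and U+1F60A
	fmt.Println(runeWidths("a\u00e9\u0d10\U0001F60A")) // [1 2 3 4]
}
```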
Here are some simple examples to quickly see how to do it.
// rune => string
fmt.Printf("%c\n", 65) // A
fmt.Println(string(rune(0x1F60A))) // 😊
fmt.Println(string([]rune{0x1F468, 0x200D, 0x1F9B0})) // 👨🦰
// string => rune
fmt.Println(strconv.FormatUint(uint64([]rune("😊")[0]), 16)) // 1f60a
fmt.Printf("%U\n", '😊') // U+1F60A
fmt.Printf("%U %U %U\n", '👨', '\u200d', '🦰') // U+1F468 U+200D U+1F9B0
go playground
How do you convert a string to its binary representation in Go?
Example:
Input: "A"
Output: "01000001"
In my testing, fmt.Sprintf("%b", 75) only works on integers.
Index the 1-character string to get its first byte, which is its numerical representation.
s := "A"
st := fmt.Sprintf("%08b", byte(s[0]))
fmt.Println(st)
Output: "01000001"
(Bear in mind that "%b" without a width, as opposed to "%08b", drops leading zeros from the output.)
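Extending this to a whole string, a minimal sketch could format every byte with "%08b". The name toBinary and the space separator are choices made for this example, not something the question asks for.

```go
package main

import (
	"fmt"
	"strings"
)

// toBinary (hypothetical name) renders each byte of s as eight binary
// digits, separated by spaces.
func toBinary(s string) string {
	parts := make([]string, len(s))
	for i := 0; i < len(s); i++ {
		parts[i] = fmt.Sprintf("%08b", s[i])
	}
	return strings.Join(parts, " ")
}

func main() {
	fmt.Printf("%b\n", 'A')     // 1000001 (leading zero dropped)
	fmt.Println(toBinary("A"))  // 01000001
	fmt.Println(toBinary("Go")) // 01000111 01101111
}
```

The fixed width matters as soon as there is more than one byte: without it the boundaries between bytes cannot be recovered from the output.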
You have to iterate over the runes of the string:
func toBinaryRunes(s string) string {
	var buffer bytes.Buffer
	for _, runeValue := range s {
		fmt.Fprintf(&buffer, "%b", runeValue)
	}
	return buffer.String()
}
Or over the bytes:
func toBinaryBytes(s string) string {
	var buffer bytes.Buffer
	for i := 0; i < len(s); i++ {
		// %08b pads every byte to 8 digits, so "A" gives "01000001"
		fmt.Fprintf(&buffer, "%08b", s[i])
	}
	return buffer.String()
}
Live playground:
http://play.golang.org/p/MXZ1Y17xWa