Golang converting from rune to string

I have the following code, which is supposed to convert a rune into a string and print it. However, I am getting undefined characters when it is printed, and I am unable to figure out where the bug is:
package main
import (
"fmt"
"strconv"
"strings"
"text/scanner"
)
func main() {
var b scanner.Scanner
const a = `a`
b.Init(strings.NewReader(a))
c := b.Scan()
fmt.Println(strconv.QuoteRune(c))
}

That's because you used Scanner.Scan() to read a rune, but it does something else. Scanner.Scan() reads tokens (or runes of special tokens), as controlled by the Scanner.Mode bitmask, and it returns special constants from the text/scanner package, not the rune it read.
To read a single rune use Scanner.Next() instead:
c := b.Next()
fmt.Println(c, string(c), strconv.QuoteRune(c))
Output:
97 a 'a'
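To read every rune rather than just the first, Next() can be called in a loop until it returns scanner.EOF. A minimal sketch (readRunes is a hypothetical helper name, not part of text/scanner):

```go
package main

import (
	"fmt"
	"strings"
	"text/scanner"
)

// readRunes is a hypothetical helper: it drains a scanner with Next()
// and returns every rune it read, in order.
func readRunes(input string) []rune {
	var s scanner.Scanner
	s.Init(strings.NewReader(input))
	var out []rune
	// Next returns scanner.EOF once the source is exhausted.
	for r := s.Next(); r != scanner.EOF; r = s.Next() {
		out = append(out, r)
	}
	return out
}

func main() {
	for _, r := range readRunes("héllo") {
		fmt.Printf("%d %q\n", r, r)
	}
}
```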
If you just want to convert a single rune to a string, use a simple type conversion. rune is an alias for int32, and the spec says this about converting integer values to string:
Converting a signed or unsigned integer value to a string type yields a string containing the UTF-8 representation of the integer.
So:
r := rune('a')
fmt.Println(r, string(r))
Outputs:
97 a
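It may help to see the conversion next to strconv.Itoa, since the two are easy to mix up: the conversion yields the character, while Itoa yields the decimal digits. A small sketch (runeToString is just an illustrative wrapper):

```go
package main

import (
	"fmt"
	"strconv"
)

// runeToString is an illustrative wrapper around the plain conversion:
// it returns the UTF-8 encoding of the code point r.
func runeToString(r rune) string {
	return string(r)
}

func main() {
	r := rune(97)
	fmt.Println(runeToString(r))      // a
	fmt.Println(strconv.Itoa(int(r))) // 97
	// Values outside the valid Unicode range convert to "\uFFFD",
	// the replacement character.
	fmt.Println(runeToString(-1) == "\uFFFD") // true
}
```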
Also to loop over the runes of a string value, you can simply use the for ... range construct:
for i, r := range "abc" {
fmt.Printf("%d - %c (%v)\n", i, r, r)
}
Output:
0 - a (97)
1 - b (98)
2 - c (99)
Or you can simply convert a string value to []rune:
fmt.Println([]rune("abc")) // Output: [97 98 99]
There is also utf8.DecodeRuneInString().
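A rough sketch of how utf8.DecodeRuneInString() could be used to walk a string by hand, which is useful when you also need the encoded size of each rune (decodeAll is a made-up helper name):

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// decodeAll is a hypothetical helper that walks s one rune at a time and
// records each rune together with the number of bytes it occupied.
func decodeAll(s string) (runes []rune, sizes []int) {
	for len(s) > 0 {
		// For invalid UTF-8 this returns (utf8.RuneError, 1),
		// so the loop always makes progress.
		r, size := utf8.DecodeRuneInString(s)
		runes = append(runes, r)
		sizes = append(sizes, size)
		s = s[size:]
	}
	return runes, sizes
}

func main() {
	runes, sizes := decodeAll("a世")
	for i, r := range runes {
		fmt.Printf("%c uses %d byte(s)\n", r, sizes[i])
	}
}
```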
Try the examples on the Go Playground.
Note:
Your original code (using Scanner.Scan()) works like this:
You called Scanner.Init() which sets the Mode (b.Mode) to scanner.GoTokens.
Calling Scanner.Scan() on the input (from "a") returns scanner.Ident because "a" is a valid Go identifier:
c := b.Scan()
if c == scanner.Ident {
fmt.Println("Identifier:", b.TokenText())
}
// Output: "Identifier: a"
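To see what Scan() reports for different inputs, here is a small sketch that scans a few tokens in the default GoTokens mode and labels each one with scanner.TokenString() (tokenKinds is a hypothetical helper, and the sample input is my own):

```go
package main

import (
	"fmt"
	"strings"
	"text/scanner"
)

// tokenKinds is a hypothetical helper: it scans src with the default mode
// set by Init and returns "kind:text" for every token found.
func tokenKinds(src string) []string {
	var s scanner.Scanner
	s.Init(strings.NewReader(src))
	var out []string
	for tok := s.Scan(); tok != scanner.EOF; tok = s.Scan() {
		out = append(out, scanner.TokenString(tok)+":"+s.TokenText())
	}
	return out
}

func main() {
	for _, t := range tokenKinds(`a 42 "hi" +`) {
		fmt.Println(t)
	}
}
```

The first two entries come out as Ident:a and Int:42, which shows that the value returned by Scan() is a token class, while the text itself comes from TokenText().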

I know I'm a bit late to the party but here's a []rune to string function:
func runesToString(runes []rune) (outString string) {
// don't need index so _
for _, v := range runes {
outString += string(v)
}
return
}
Yes, there is a named return, but I think it's OK in this case, as it reduces the number of lines and the function is short.
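For comparison, a []rune can also be converted directly with string(runes), and strings.Builder avoids re-copying the result on every append if you do want a loop. A sketch of both (the function names are mine):

```go
package main

import (
	"fmt"
	"strings"
)

// runesToStringDirect relies on the built-in []rune to string conversion.
func runesToStringDirect(runes []rune) string {
	return string(runes)
}

// runesToStringBuilder appends rune by rune into a strings.Builder,
// which grows its buffer instead of allocating a new string each time.
func runesToStringBuilder(runes []rune) string {
	var sb strings.Builder
	for _, r := range runes {
		sb.WriteRune(r)
	}
	return sb.String()
}

func main() {
	rs := []rune{'a', '世', 'c'}
	fmt.Println(runesToStringDirect(rs), runesToStringBuilder(rs))
}
```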

This simple code converts a rune to a string:
s := fmt.Sprintf("%c", r) // r is a variable of type rune

Since I came to this question searching for rune, string and char, I thought this might help newbies like me:
// str := "aഐbc"
// testString(str)
func testString(oneString string){
// string to byte slice: no sweat, just convert it.
// The bytes are the string's UTF-8 encoding (the conversion copies them)
var twoByteArr []byte = []byte(oneString)
// string to rune slice: no sweat either.
// The conversion decodes the UTF-8 bytes into code points
var threeRuneSlice []rune = []rune(oneString)
// Hmm! A string seems to have a dual personality: it is stored as bytes, but
// it can also be viewed as a sequence of runes - yeah - read on
// A rune slice can be converted back to a string -
// no sweat - the conversion re-encodes the runes as UTF-8
var thrirdString string = string(threeRuneSlice)
// There is a catch here and that is in printing "characters", using for loop and range
fmt.Println("Chars in oneString")
for i,r := range oneString {
fmt.Printf(" %d %v %c ",i,r,r) //you may not get index 0,1,2,3 here
// since the range runs specially over strings https://blog.golang.org/strings
}
fmt.Println("\nChars in threeRuneSlice")
for i,r := range threeRuneSlice {
fmt.Printf(" %d %v %c ",i,r,r) // i = 0,1,2,3 , perfect!!
// a rune is 4 bytes wide in memory (rune is int32 and byte is uint8),
// while in a string a set of one to four bytes is used to represent a rune,
// which represents a Unicode code point == the REAL CHARACTER
}
fmt.Println("\nValues in oneString ")
for j := 0; j < len(oneString); j++ {
fmt.Printf(" %d %v ",j,oneString[j]) // No, you cannot get characters if you iterate through a string this way
// as you are going over bytes here - not runes
}
fmt.Println("\nValues in twoByteArr")
for j := 0; j < len(twoByteArr); j++ {
fmt.Printf(" %d=%v ",j,twoByteArr[j]) // == same as above
}
fmt.Printf("\none - %s, two %s, three %s\n",oneString,twoByteArr,thrirdString)
}
And one more pointless demo: https://play.golang.org/p/tagRBVG8k7V
adapted from https://groups.google.com/g/golang-nuts/c/84GCvDBhpbg/m/Tt6089MPFQAJ
to show that the 'characters' are encoded with one to four bytes, depending on the Unicode code point
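If you only want the byte count per character, utf8.RuneLen() reports it directly. A quick sketch (byteWidths is a made-up helper, and the sample characters are my own choice):

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// byteWidths is a hypothetical helper: for every rune in s it reports how
// many bytes the UTF-8 encoding of that rune needs.
func byteWidths(s string) []int {
	var out []int
	for _, r := range s {
		out = append(out, utf8.RuneLen(r))
	}
	return out
}

func main() {
	// One sample each for the 1, 2, 3 and 4 byte cases.
	s := "a\u00e9\u0d10\U0001F60A"
	for i, r := range []rune(s) {
		fmt.Printf("%c U+%04X -> %d byte(s)\n", r, r, byteWidths(s)[i])
	}
}
```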

Here are some simple examples that show how to do it quickly.
// rune => string
fmt.Printf("%c\n", 65) // A
fmt.Println(string(rune(0x1F60A))) // 😊
fmt.Println(string([]rune{0x1F468, 0x200D, 0x1F9B0})) // 👨‍🦰
// string => rune
fmt.Println(strconv.FormatUint(uint64([]rune("😊")[0]), 16)) // 1f60a
fmt.Printf("%U\n", '😊') // U+1F60A
fmt.Printf("%U %U %U\n", '👨', '‍', '🦰') // U+1F468 U+200D U+1F9B0
go playground

Related

How can I iterate over each 2 consecutive characters in a string in go?

I have a string like this:
package main
import "fmt"
func main() {
some := "p1k4"
for i, j := range some {
fmt.Println()
}
}
I want to take every two consecutive characters in the string and print them. The output should look like p1, 1k, k4, 4p.
I have tried, but I am still having trouble finding the answer. How should I write the code in Go to get the output I want?
Go stores strings in memory as their UTF-8 encoded byte sequence. This maps ASCII characters one-to-one to bytes, but characters outside of that range map to multiple bytes.
So I would advise to use the for range loop over a string, which ranges over the runes (characters) of the string, properly decoding multi-byte runes. This has the advantage that it does not require allocation (unlike converting the string to []rune). You may also print the pairs using fmt.Printf("%c%c", char1, char2), which also will not require allocation (unlike converting runes back to string and concatenating them).
To learn more about strings, characters and runes in Go, read the blog post: Strings, bytes, runes and characters in Go
Since the loop only returns the "current" rune in the iteration (but not the previous or the next rune), use another variable to store the previous (and first) runes so you have access to them when printing.
Let's write a function that prints the pairs as you want:
func printPairs(s string) {
var first, prev rune
for i, r := range s {
if i == 0 {
first, prev = r, r
continue
}
fmt.Printf("%c%c, ", prev, r)
prev = r
}
// Print last pair: prev is the last rune
fmt.Printf("%c%c\n", prev, first)
}
Testing it with your input and with another string that has multi-byte runes:
printPairs("p1k4")
printPairs("Go-世界")
Output will be (try it on the Go Playground):
p1, 1k, k4, 4p
Go, o-, -世, 世界, 界G
package main
import (
"fmt"
)
func main() {
str := "12345"
for i := 0; i < len(str); i++ {
fmt.Println(string(str[i]) + string(str[(i+1)%len(str)]))
}
}
This is a simple for loop over your string with the first character appended at the back:
package main
import "fmt"
func main() {
some := "p1k4"
ns := some + string(some[0])
for i := 0; i < len(ns)-1; i++ {
fmt.Println(ns[i:i+2])
}
}
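Note that the two loops above index bytes, so they only behave for single-byte (ASCII) input. A sketch of the same modulo idea on a []rune, which costs one allocation but also handles multi-byte characters (pairs is a hypothetical name):

```go
package main

import (
	"fmt"
	"strings"
)

// pairs returns each rune of s joined with its successor, wrapping around
// from the last rune to the first. An empty string yields no pairs.
func pairs(s string) []string {
	rs := []rune(s)
	out := make([]string, 0, len(rs))
	for i := range rs {
		next := rs[(i+1)%len(rs)]
		out = append(out, string([]rune{rs[i], next}))
	}
	return out
}

func main() {
	fmt.Println(strings.Join(pairs("p1k4"), ", "))   // p1, 1k, k4, 4p
	fmt.Println(strings.Join(pairs("Go-世界"), ", ")) // Go, o-, -世, 世界, 界G
}
```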

How to convert a string to rune?

Here is my code snippet:
var converter = map[rune]rune{ /* some data */ }
sample := "⌘こんにちは"
var tmp string
for _, runeValue := range sample {
fmt.Printf("%+q", runeValue)
tmp = fmt.Sprintf("%+q", runeValue)
}
The output of fmt.Printf("%+q", runeValue) is:
'\u2318'
'\u3053'
'\u3093'
'\u306b'
'\u3061'
'\u306f'
These values are rune literals, but since Sprintf returns a string, I cannot use the result with my map, which is map[rune]rune.
I was wondering how I can convert a string to a rune, or in other words, how I can handle this problem.
A string is not a single rune; it may contain multiple runes. You may use a simple type conversion to convert a string to a []rune containing all its runes, like []rune(sample).
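If you really only need the first rune of a string, utf8.DecodeRuneInString() avoids building the whole slice. A minimal sketch of both options (firstRune is a hypothetical helper):

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// firstRune returns the first rune of s. The bool is false when s is
// empty or starts with bytes that are not valid UTF-8.
func firstRune(s string) (rune, bool) {
	r, size := utf8.DecodeRuneInString(s)
	// RuneError with size 0 means empty input, size 1 means invalid bytes.
	if r == utf8.RuneError && size <= 1 {
		return 0, false
	}
	return r, true
}

func main() {
	sample := "⌘こんにちは"
	runes := []rune(sample)
	fmt.Println(len(sample), len(runes)) // 18 6 (bytes vs. runes)

	r, ok := firstRune(sample)
	fmt.Printf("%c %v\n", r, ok)
}
```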
The for range loop iterates over the runes of a string, so in your example runeValue is of type rune and you may use it in your converter map, e.g.:
var converter = map[rune]rune{}
sample := "⌘こんにちは"
for _, runeValue := range sample {
converter[runeValue] = runeValue
}
fmt.Println(converter)
But since rune is an alias for int32, printing the above converter map will print integer numbers, output will be:
map[8984:8984 12371:12371 12385:12385 12395:12395 12399:12399 12435:12435]
If you want to print characters, use the %c verb of fmt.Printf():
fmt.Printf("%c\n", converter)
Which will output:
map[⌘:⌘ こ:こ ち:ち に:に は:は ん:ん]
Try the examples on the Go Playground.
If you want to replace (switch) certain runes in a string, use the strings.Map() function, for example:
sample := "⌘こんにちは"
result := strings.Map(func(r rune) rune {
if r == '⌘' {
return 'a'
}
if r == 'こ' {
return 'b'
}
return r
}, sample)
fmt.Println(result)
Which outputs (try it on the Go Playground):
abんにちは
If you want the replacements defined by a converter map:
var converter = map[rune]rune{
'⌘': 'a',
'こ': 'b',
}
sample := "⌘こんにちは"
result := strings.Map(func(r rune) rune {
if c, ok := converter[r]; ok {
return c
}
return r
}, sample)
fmt.Println(result)
This outputs the same. Try this one on the Go Playground.
Convert a string to a rune slice:
runeArray := []rune("пример")

Splitting a rune correctly in golang

I'm wondering if there is an easy way, such as well-known functions for handling code points/runes, to take a chunk out of the middle of a rune slice without messing it up, or if it all needs to be coded ourselves to get down to something equal to or less than a maximum number of bytes.
Specifically, what I am looking to do is pass a string to a function, convert it to runes so that I can respect code points, and, if the string is longer than some maximum number of bytes, remove enough runes from the center to get the byte count down to what's necessary.
This is simple math if the strings contain only single-byte characters, and can be handled with something like:
func shortenStringIDToMaxLength(in string, maxLen int) string {
if len(in) > maxLen {
excess := len(in) - maxLen
start := maxLen/2 - excess/2
return in[:start] + in[start+excess:]
}
return in
}
but in a string with variable-width characters, either it's going to take a fair bit more code looping through it or there are nice functions that make this easy. Does anyone have a code sample showing how best to handle such a thing with runes?
The idea here is that the DB field the string will go into has a fixed maximum length in bytes, not code points, so there needs to be some algorithm for going from runes to maximum bytes. The reason for taking the characters from the middle of the string is just the needs of this particular program.
Thanks!
EDIT:
Once I found out that the range operator respects runes on strings, this became easy to do with just strings, which I discovered thanks to the great answers below. I shouldn't have to worry about the string being well-formed UTF-8 in this case, but if I do, I now know about the utf8 package, thanks!
Here's what I ended up with:
package main
import (
"fmt"
)
func ShortenStringIDToMaxLength(in string, maxLen int) string {
if maxLen < 1 {
// Panic/log whatever is your error system of choice.
}
bytes := len(in)
if bytes > maxLen {
excess := bytes - maxLen
lPos := bytes/2 - excess/2
lastPos := 0
for pos := range in {
if pos > lPos {
lPos = lastPos
break
}
lastPos = pos
}
rPos := lPos + excess
for pos := range in[lPos:] {
if pos >= excess {
rPos = pos
break
}
}
return in[:lPos] + in[lPos+rPos:]
}
return in
}
func main() {
out := ShortenStringIDToMaxLength(`123456789 123456789`, 5)
fmt.Println(out, len(out))
}
https://play.golang.org/p/YLGlj_17A-j
Here is an adaptation of your algorithm, which removes incomplete runes from the end of your prefix and the beginning of your suffix:
func TrimLastIncompleteRune(s string) string {
l := len(s)
for i := 1; i <= l; i++ {
suff := s[l-i : l]
// repeatedly try to decode a rune from the last bytes in string
r, cnt := utf8.DecodeRuneInString(suff)
if r == utf8.RuneError {
continue
}
// on success: return the substring that ends with
// this successfully decoded rune
lgth := l - i + cnt
return s[:lgth]
}
return ""
}
func TrimFirstIncompleteRune(s string) string {
// repeatedly try to decode a rune from the beginning
for i := 0; i < len(s); i++ {
if r, _ := utf8.DecodeRuneInString(s[i:]); r != utf8.RuneError {
// if success : return
return s[i:]
}
}
return ""
}
func shortenStringIDToMaxLength(in string, maxLen int) string {
if len(in) > maxLen {
firstHalf := maxLen / 2
secondHalf := len(in) - (maxLen - firstHalf)
prefix := TrimLastIncompleteRune(in[:firstHalf])
suffix := TrimFirstIncompleteRune(in[secondHalf:])
return prefix + suffix
}
return in
}
link on play.golang.org
This algorithm only tries to drop more bytes from the selected prefix and suffix.
If it turns out that you need to drop 3 bytes from the suffix to have a valid rune, for example, it does not try to see if it can add 3 more bytes to the prefix, to have an end result closer to maxLen bytes.
You can use simple arithmetic to find start and end such that the string s[:start] + s[end:] is shorter than your byte limit. But you need to make sure that start and end are both the first byte of any utf-8 sequence to keep the sequence valid.
UTF-8 has the property that any given byte is the first byte of a sequence as long as its top two bits aren't 10.
So you can write code something like this (playground: https://play.golang.org/p/xk_Yo_1wTYc)
package main
import (
"fmt"
)
func truncString(s string, maxLen int) string {
if len(s) <= maxLen {
return s
}
start := (maxLen + 1) / 2
for start > 0 && s[start]>>6 == 0b10 {
start--
}
end := len(s) - (maxLen - start)
for end < len(s) && s[end]>>6 == 0b10 {
end++
}
return s[:start] + s[end:]
}
func main() {
fmt.Println(truncString("this is a test", 5))
fmt.Println(truncString("日本語", 7))
}
This code has the desirable property that it takes O(maxLen) time, no matter how long the input string (assuming it's valid utf-8).
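The same boundary test is also available in the standard library as utf8.RuneStart(), which some may find easier to read than the shift. A sketch of the function above rewritten that way (truncMiddle is my name for it, and clamping a negative maxLen to zero is my own addition):

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// truncMiddle removes bytes from the middle of s so that the result is at
// most maxLen bytes long, cutting only at rune boundaries.
// Assumes s is valid UTF-8.
func truncMiddle(s string, maxLen int) string {
	if maxLen < 0 {
		maxLen = 0 // assumption: treat a negative limit as zero
	}
	if len(s) <= maxLen {
		return s
	}
	// Move the end of the prefix back to the start of a rune.
	start := (maxLen + 1) / 2
	for start > 0 && !utf8.RuneStart(s[start]) {
		start--
	}
	// Move the start of the suffix forward to the start of a rune.
	end := len(s) - (maxLen - start)
	for end < len(s) && !utf8.RuneStart(s[end]) {
		end++
	}
	return s[:start] + s[end:]
}

func main() {
	fmt.Println(truncMiddle("this is a test", 5)) // thist
	fmt.Println(truncMiddle("日本語", 7))            // 日語
}
```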

Unable to decrypt the xor-base64 text

I am using the code below to encrypt and decrypt data. Now I want to encrypt the data in Node.js and decrypt it in Go, but I am not able to achieve this in Go.
var B64XorCipher = {
encode: function(key, data) {
return new Buffer(xorStrings(key, data),'utf8').toString('base64');
},
decode: function(key, data) {
data = new Buffer(data,'base64').toString('utf8');
return xorStrings(key, data);
}
};
function xorStrings(key,input){
var output='';
for(var i=0;i<input.length;i++){
var c = input.charCodeAt(i);
var k = key.charCodeAt(i%key.length);
output += String.fromCharCode(c ^ k);
}
return output;
}
From Go I am trying to decode it as shown below, but I am not able to achieve it.
bytes, err := base64.StdEncoding.DecodeString(actualInput)
encryptedText := string(bytes)
fmt.Println(EncryptDecrypt(encryptedText, "XXXXXX"))
func EncryptDecrypt(input, key string) (output string) {
for i := range input {
output += string(input[i] ^ key[i%len(key)])
}
return output
}
Can someone help me resolve it?
You should use DecodeRuneInString instead of indexing the string byte by byte.
Solution in playground: https://play.golang.org/p/qi_6S1J_dZU
package main
import (
"fmt"
"unicode/utf8"
)
func main() {
fmt.Println("Hello, playground")
k:="1234fd23434"
input:="The 我characterode我 113 is equal to q"
fmt.Println(EncryptDecrypt(input,k))
// expect: "eZV扷ZRFRWEWA[戣[#GRX#^B"
}
func EncryptDecrypt(input, key string) (output string) {
keylen := len(key)
count := len(input)
i := 0
j := 0
for i < count {
c, n := utf8.DecodeRuneInString(input[i:])
i += n
k, m := utf8.DecodeRuneInString(key[j:])
j += m
if j >= keylen {
j = 0
}
output += string(c ^ k)
}
return output
}
Compare this to your JS result:
function xorStrings(key,input){
var output='';
for(var i=0;i<input.length;i++){
var c = input.charCodeAt(i);
var k = key.charCodeAt(i%key.length);
output += String.fromCharCode(c ^ k);
}
return output;
}
console.log(xorStrings('1234fd23434',"The 我characterode我 113 is equal to q"))
// expect: "eZV扷ZRFRWEWA[戣[#GRX#^B"
The test result is the same.
Here is why.
In Go, indexing a string (input[i]) yields a single byte, but JavaScript's charCodeAt works on characters (UTF-16 code units), not bytes. In UTF-8, a character may be 1 to 4 bytes long. That is why you got different output.
Test in playground https://play.golang.org/p/XawI9aR_HDh
package main
import (
"fmt"
"unicode/utf8"
)
var sentence = "The 我quick brown fox jumps over the lazy dog."
var index = 4
func main() {
fmt.Println("slice of string...")
fmt.Printf("The byte at %d is |%s|, |%s| is 3 bytes long.\n",index,sentence[index:index+1],sentence[index:index+3])
fmt.Println("runes of string...")
ru, _ := utf8.DecodeRuneInString(sentence[index:])
i := int(ru)
fmt.Printf("The character code at %d is|%s|%d| \n",index, string(ru), i)
}
The output is
slice of string...
The byte at 4 is |�|, |我| is 3 bytes long.
runes of string...
The character code at 4 is|我|25105|
The charCodeAt() method returns an integer between 0 and 65535
representing the UTF-16 code unit at the given index.
var c = input.charCodeAt(i);
For statements with range clause
For a string value, the "range" clause iterates over the Unicode code
points in the string starting at byte index 0. On successive
iterations, the index value will be the index of the first byte of
successive UTF-8-encoded code points in the string, and the second
value, of type rune, will be the value of the corresponding code
point. If the iteration encounters an invalid UTF-8 sequence, the
second value will be 0xFFFD, the Unicode replacement character, and
the next iteration will advance a single byte in the string.
for i := range input
UTF-16 versus UTF-8?
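One way to make the Go side follow the JavaScript semantics exactly is to XOR UTF-16 code units, using the unicode/utf16 package. This is only a sketch: xorUTF16, encode and decode are hypothetical names, and the layout (XOR first, then UTF-8 bytes, then standard base64) is assumed from the question's B64XorCipher.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"unicode/utf16"
)

// xorUTF16 XORs input with key one UTF-16 code unit at a time, which is
// the unit that charCodeAt and fromCharCode operate on in JavaScript.
func xorUTF16(key, input string) string {
	k := utf16.Encode([]rune(key))
	in := utf16.Encode([]rune(input))
	if len(k) == 0 {
		return input // nothing to XOR with
	}
	out := make([]uint16, len(in))
	for i, c := range in {
		out[i] = c ^ k[i%len(k)]
	}
	return string(utf16.Decode(out))
}

// encode mirrors the assumed JS encode: XOR, then base64 of the UTF-8 bytes.
func encode(key, data string) string {
	return base64.StdEncoding.EncodeToString([]byte(xorUTF16(key, data)))
}

// decode mirrors the assumed JS decode: base64 to UTF-8 text, then XOR.
func decode(key, data string) (string, error) {
	raw, err := base64.StdEncoding.DecodeString(data)
	if err != nil {
		return "", err
	}
	return xorUTF16(key, string(raw)), nil
}

func main() {
	key := "1234fd23434"
	enc := encode(key, "The 我 is equal to q")
	dec, err := decode(key, enc)
	fmt.Println(enc, dec, err)
}
```

One caveat: if the XOR happens to produce an unpaired surrogate, it cannot survive the trip through UTF-8 (utf16.Decode substitutes U+FFFD), so such input will not round-trip.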

Convert string to binary in Go

How do you convert a string to its binary representation in Go?
Example:
Input: "A"
Output: "01000001"
In my testing, fmt.Sprintf("%b", 75) only works on integers.
Index the 1-character string to get its first byte, which is its numerical representation.
s := "A"
st := fmt.Sprintf("%08b", s[0]) // s[0] is already a byte
fmt.Println(st)
Output: "01000001"
(Bear in mind that the plain "%b" verb, without a width such as 08, drops leading zeros from the output.)
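For strings longer than one byte, the same %08b verb can be applied to every byte in turn. A small sketch (toBinary is a hypothetical name, and the space separator is my own choice for readability):

```go
package main

import (
	"fmt"
	"strings"
)

// toBinary renders every byte of s as exactly eight binary digits,
// separated by single spaces.
func toBinary(s string) string {
	var sb strings.Builder
	for i := 0; i < len(s); i++ {
		if i > 0 {
			sb.WriteByte(' ')
		}
		fmt.Fprintf(&sb, "%08b", s[i])
	}
	return sb.String()
}

func main() {
	fmt.Println(toBinary("A"))  // 01000001
	fmt.Println(toBinary("AB")) // 01000001 01000010
}
```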
You have to iterate over the runes of the string:
func toBinaryRunes(s string) string {
var buffer bytes.Buffer
for _, runeValue := range s {
fmt.Fprintf(&buffer, "%b", runeValue)
}
return buffer.String()
}
Or over the bytes:
func toBinaryBytes(s string) string {
var buffer bytes.Buffer
for i := 0; i < len(s); i++ {
// %08b pads each byte to eight digits so leading zeros are kept
fmt.Fprintf(&buffer, "%08b", s[i])
}
return buffer.String()
}
Live playground:
http://play.golang.org/p/MXZ1Y17xWa

Resources