Here is my code snippet:
var converter = map[rune]rune{ /* some data */ }
sample := "⌘こんにちは"
var tmp string
for _, runeValue := range sample {
fmt.Printf("%+q", runeValue)
tmp = fmt.Sprintf("%+q", runeValue)
}
The output of fmt.Printf("%+q", runeValue) is:
'\u2318'
'\u3053'
'\u3093'
'\u306b'
'\u3061'
'\u306f'
These values are literally runes, but since the return type of Sprintf is string, I cannot use them in my map, which is a map[rune]rune.
How can I convert a string to a rune, or how else can I handle this problem?
A string is not a single rune; it may contain multiple runes. You may use a simple type conversion to convert a string to a []rune containing all its runes, like []rune(sample).
The for range loop iterates over the runes of a string, so in your example runeValue is already of type rune and you may use it in your converter map, e.g.:
var converter = map[rune]rune{}
sample := "⌘こんにちは"
for _, runeValue := range sample {
converter[runeValue] = runeValue
}
fmt.Println(converter)
But since rune is an alias for int32, printing the above converter map will print integer numbers, output will be:
map[8984:8984 12371:12371 12385:12385 12395:12395 12399:12399 12435:12435]
If you want to print characters, use the %c verb of fmt.Printf():
fmt.Printf("%c\n", converter)
Which will output:
map[⌘:⌘ こ:こ ち:ち に:に は:は ん:ん]
Try the examples on the Go Playground.
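If you really do start from the quoted text that %+q produces (for example '\u2318'), it can be turned back into a rune as well. Here is a minimal sketch, assuming the input is a single-quoted Go character literal; the helper name runeFromQuoted is made up for illustration:

```go
package main

import (
	"fmt"
	"strconv"
	"unicode/utf8"
)

// runeFromQuoted is a hypothetical helper: it takes a single-quoted Go
// character literal such as '\u2318' and returns the rune it denotes.
func runeFromQuoted(q string) (rune, error) {
	// For a single-quoted literal, strconv.Unquote returns a
	// one-character string.
	s, err := strconv.Unquote(q)
	if err != nil {
		return 0, err
	}
	r, _ := utf8.DecodeRuneInString(s)
	return r, nil
}

func main() {
	quoted := fmt.Sprintf("%+q", '⌘')
	r, err := runeFromQuoted(quoted)
	fmt.Printf("%s -> %c (%U) %v\n", quoted, r, r, err)
}
```

In most cases this round trip is unnecessary, since the rune from the range loop can be used directly as shown above.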
If you want to replace (switch) certain runes in a string, use the strings.Map() function, for example:
sample := "⌘こんにちは"
result := strings.Map(func(r rune) rune {
if r == '⌘' {
return 'a'
}
if r == 'こ' {
return 'b'
}
return r
}, sample)
fmt.Println(result)
Which outputs (try it on the Go Playground):
abんにちは
If you want the replacements defined by a converter map:
var converter = map[rune]rune{
'⌘': 'a',
'こ': 'b',
}
sample := "⌘こんにちは"
result := strings.Map(func(r rune) rune {
if c, ok := converter[r]; ok {
return c
}
return r
}, sample)
fmt.Println(result)
This outputs the same. Try this one on the Go Playground.
Convert a string to a rune slice:
runeArray := []rune("пример")
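Keep in mind that the conversion changes what len() counts: bytes for a string, code points for a []rune. A small sketch to show the difference (the counts helper is only an illustrative name):

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// counts returns the number of bytes and the number of runes in s.
func counts(s string) (nBytes, nRunes int) {
	return len(s), utf8.RuneCountInString(s)
}

func main() {
	s := "пример"
	rs := []rune(s)
	b, r := counts(s)
	fmt.Println(b, r, len(rs))   // 12 6 6
	fmt.Println(string(rs) == s) // true
	fmt.Println(string(rs[:3]))  // при
}
```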
I want to replace every character of a string with an asterisk, except the first and the last one.
For example:
handsome -> h******e
한국어 -> 한*어
This is my code:
var final = string([]rune(username)[:1])
for i := 0; i < len([]rune(username)); i++ {
if i >1 {
final = final + "*"
}
}
If you convert the string to []rune, you can modify that slice and convert it back to string in the end:
func blur(s string) string {
rs := []rune(s)
for i := 1; i < len(rs)-1; i++ {
rs[i] = '*'
}
return string(rs)
}
Testing it:
fmt.Println(blur("handsome"))
fmt.Println(blur("한국어"))
Output (try it on the Go Playground):
h******e
한*어
Note that this blur() function also works with strings that have fewer than 3 characters, in which case nothing will be blurred.
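An alternative sketch, assuming the same masking rule as blur(), builds the result with strings.Repeat() instead of overwriting the slice in place. The name blurRepeat is just for illustration:

```go
package main

import (
	"fmt"
	"strings"
)

// blurRepeat keeps the first and last rune of s and replaces everything
// in between with '*'. Strings with fewer than 3 runes are returned as-is.
func blurRepeat(s string) string {
	rs := []rune(s)
	if len(rs) < 3 {
		return s
	}
	return string(rs[0]) + strings.Repeat("*", len(rs)-2) + string(rs[len(rs)-1])
}

func main() {
	for _, s := range []string{"handsome", "한국어", "ab", ""} {
		fmt.Printf("%q -> %q\n", s, blurRepeat(s))
	}
}
```

Both versions convert to []rune first, so multi-byte characters such as Hangul are never split in the middle.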
I just want to find a 3-byte character in Go using regexp, but it panics with:
regexp: Compile(\x{E29AA4}): error parsing regexp: invalid escape sequence: \x{E29AA4
func get_words_from(text string) []string {
words := regexp.MustCompile(`\x{E29AA4}`)
return words.FindAllString(text, -1)
}
func main() {
text := "One,ВАПОЛтлдо⚤two ыаплд⚤ы ыапю.ы./\tавt𒀅hr𓀋ee!"
fmt.Println(get_words_from(text))
}
You can try it on the playground.
Decode the UTF-8 byte sequence E2 9A A4 with e.g. utf8.DecodeRune() and use the resulting rune in the regexp:
func get_words_from(text string) []string {
r, _ := utf8.DecodeRune([]byte{0xE2, 0x9A, 0xA4})
words := regexp.MustCompile(string(r))
return words.FindAllString(text, -1)
}
You may also simply convert the byte slice to string (which interprets it as UTF-8 encoded bytes):
func get_words_from2(text string) []string {
s := string([]byte{0xE2, 0x9A, 0xA4})
words := regexp.MustCompile(s)
return words.FindAllString(text, -1)
}
Or use the equivalent unicode code point (which is 0x26a4) in the regexp string:
func get_words_from3(text string) []string {
words := regexp.MustCompile("\u26a4")
return words.FindAllString(text, -1)
}
Note that "\u26a4" is an interpreted string literal and will be unescaped by the Go compiler (not the regexp package).
All examples return (try the examples on the Go Playground):
[⚤ ⚤]
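The regexp syntax itself also accepts code points written as \x{...}, so a raw string works too, as long as the value is the code point (26A4) and not the UTF-8 byte sequence. \x{E29AA4} is rejected because it is larger than the highest Unicode code point, 0x10FFFF. A minimal sketch (sign and findSign are illustrative names):

```go
package main

import (
	"fmt"
	"regexp"
)

// The pattern is a raw (backquoted) string, so the escape is handled by
// the regexp package here, not by the Go compiler.
var sign = regexp.MustCompile(`\x{26A4}`)

// findSign returns every occurrence of U+26A4 in text.
func findSign(text string) []string {
	return sign.FindAllString(text, -1)
}

func main() {
	fmt.Println(findSign("One⚤two ⚤ three")) // [⚤ ⚤]

	// The UTF-8 bytes written as one code point are out of range.
	_, err := regexp.Compile(`\x{E29AA4}`)
	fmt.Println(err != nil) // true
}
```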
To filter out all runes that have 3 or more bytes in UTF-8, you may use a for range and utf8.RuneLen():
text := "One,ВАПОЛтлдо⚤two ыаплд⚤ы ыапю.ы./\tавt𒀅hr𓀋ee!"
fmt.Println(text)
var out []rune
for _, r := range text {
if utf8.RuneLen(r) < 3 {
out = append(out, r)
}
}
fmt.Println(string(out))
This outputs (try it on the Go Playground):
One,ВАПОЛтлдо⚤two ыаплд⚤ы ыапю.ы./ авt𒀅hr𓀋ee!
One,ВАПОЛтлдоtwo ыаплды ыапю.ы./ авthree!
Or use strings.Map(), where you return -1 for such runes, which then will be left out in the result:
out := strings.Map(func(r rune) rune {
if utf8.RuneLen(r) < 3 {
return r
}
return -1
}, text)
fmt.Println(out)
This outputs the same. Try this one on the Go Playground.
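If the goal is to find (rather than drop) the characters that take exactly 3 bytes, one option is a character class over the code point range that UTF-8 encodes in three bytes, U+0800 to U+FFFF. This is only a sketch under that assumption; threeByte and findThreeByte are illustrative names:

```go
package main

import (
	"fmt"
	"regexp"
)

// Code points U+0800..U+FFFF are the ones UTF-8 encodes with 3 bytes.
var threeByte = regexp.MustCompile(`[\x{0800}-\x{FFFF}]`)

// findThreeByte returns all characters of text that fall in that range.
func findThreeByte(text string) []string {
	return threeByte.FindAllString(text, -1)
}

func main() {
	text := "One,ВАПОЛтлдо⚤two ыаплд⚤ы ыапю.ы./\tавt𒀅hr𓀋ee!"
	fmt.Println(findThreeByte(text)) // [⚤ ⚤]
}
```

One caveat: invalid UTF-8 in the input is decoded as U+FFFD, which lies inside this range, so this sketch is best used on text that is known to be valid UTF-8.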
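I also found that the character ⚤ can be matched with \xE2\x9A\xA4 (written in a double-quoted Go string, where the compiler turns the escapes into the three UTF-8 bytes) instead of the invalid \x{E29AA4}.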
I have a problem with character conversion. It all starts with this string: U+1F618. According to fileformat.info, this string is (almost) in the HTML Entity (hex) notation.
But I need this character converted into the C/C++/Java source code notation. I don't know whether this is the official name for the notation, but I assume the site is correct :).
So basically my question is: instead of outputting the actual emoji, how can I get the value \uD83D\uDE18?
package main
import (
"fmt"
"html"
"strconv"
"strings"
)
func main() {
original := "\\U0001f618"
// Hex String
h := strings.ReplaceAll(original, "\\U", "0x")
// Hex to Int
i, _ := strconv.ParseInt(h, 0, 64)
// Unescape the string (HTML Entity -> String).
str := html.UnescapeString(string(i))
// Display the emoji.
fmt.Println(str)
// but I want something like this: \uD83D\uDE18
}
If you have the input as a string, e.g.
s := "\\U0001f618"
You may use strconv.Unquote() to unquote it. Be sure the string you pass to it is quoted (it must be wrapped with backticks or double quotes):
s2, err := strconv.Unquote(`"` + s + `"`)
fmt.Println(s2, err)
This will give you an s2 string that contains your emoji:
😘 <nil>
Java's string model is a char[] that holds UTF-16 code units. Go's in-memory representation of a string is the UTF-8 encoded byte sequence.
To convert a Go string to UTF-16, you may use the unicode/utf16 package of the standard lib. For example utf16.Encode() encodes a series of runes (unicode codepoints) to UTF-16. You get a series of runes from a Go string with a simple type conversion: []rune("some string").
u16 := utf16.Encode([]rune(s2))
fmt.Printf("%X\n", u16)
The above prints the UTF-16 code units in hexadecimal format:
[D83D DE18]
To get the format you want, use this loop:
buf := &strings.Builder{}
for _, v := range u16 {
// %04X pads to four hex digits, which \u escapes require for values below 0x1000.
fmt.Fprintf(buf, "\\u%04X", v)
}
fmt.Println(buf.String())
Which outputs:
\uD83D\uDE18
Try the examples on the Go Playground.
You can capture this series of conversions in a function:
func convert(s string) (string, error) {
s2, err := strconv.Unquote(`"` + s + `"`)
if err != nil {
return "", err
}
buf := &strings.Builder{}
for _, v := range utf16.Encode([]rune(s2)) {
// %04X pads to four hex digits, which \u escapes require for values below 0x1000.
fmt.Fprintf(buf, "\\u%04X", v)
}
return buf.String(), nil
}
Using it:
fmt.Println(convert("\\U0001f618"))
Which outputs (try it on the Go Playground):
\uD83D\uDE18 <nil>
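If the input is in the U+1F618 form from the question rather than the \U0001f618 form, a similar sketch can parse the hexadecimal code point directly. It assumes the input always carries a U+ prefix, and javaEscape is an illustrative name. The code units are zero-padded to four hex digits so that characters below U+1000 also produce a valid escape:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"unicode/utf16"
)

// javaEscape converts a code point written as "U+XXXX" into \uXXXX escapes
// of its UTF-16 code units (a surrogate pair for code points above U+FFFF).
func javaEscape(s string) (string, error) {
	v, err := strconv.ParseUint(strings.TrimPrefix(s, "U+"), 16, 32)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	for _, u := range utf16.Encode([]rune{rune(v)}) {
		fmt.Fprintf(&sb, "\\u%04X", u)
	}
	return sb.String(), nil
}

func main() {
	fmt.Println(javaEscape("U+1F618")) // \uD83D\uDE18 <nil>
	fmt.Println(javaEscape("U+0041"))  // \u0041 <nil>
}
```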
I have the following code, which is supposed to convert a rune into a string and print it. However, I get undefined characters when it is printed, and I am unable to figure out where the bug is:
package main
import (
"fmt"
"strconv"
"strings"
"text/scanner"
)
func main() {
var b scanner.Scanner
const a = `a`
b.Init(strings.NewReader(a))
c := b.Scan()
fmt.Println(strconv.QuoteRune(c))
}
That's because you used Scanner.Scan() to read a rune, but it does something else. Scanner.Scan() reads tokens (or runes of special tokens) as controlled by the Scanner.Mode bitmask, and it returns special constants from the text/scanner package, not the rune that was read.
To read a single rune use Scanner.Next() instead:
c := b.Next()
fmt.Println(c, string(c), strconv.QuoteRune(c))
Output:
97 a 'a'
If you just want to convert a single rune to string, use a simple type conversion. rune is an alias for int32, and the language specification says the following about converting integer numbers to string:
Converting a signed or unsigned integer value to a string type yields a string containing the UTF-8 representation of the integer.
So:
r := rune('a')
fmt.Println(r, string(r))
Outputs:
97 a
Also to loop over the runes of a string value, you can simply use the for ... range construct:
for i, r := range "abc" {
fmt.Printf("%d - %c (%v)\n", i, r, r)
}
Output:
0 - a (97)
1 - b (98)
2 - c (99)
Or you can simply convert a string value to []rune:
fmt.Println([]rune("abc")) // Output: [97 98 99]
There is also utf8.DecodeRuneInString().
Try the examples on the Go Playground.
Note:
Your original code (using Scanner.Scan()) works like this:
You called Scanner.Init() which sets the Mode (b.Mode) to scanner.GoTokens.
Calling Scanner.Scan() on the input (from "a") returns scanner.Ident because "a" is a valid Go identifier:
c := b.Scan()
if c == scanner.Ident {
fmt.Println("Identifier:", b.TokenText())
}
// Output: "Identifier: a"
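To see what Scan() returns for a longer input, a short sketch can loop until scanner.EOF and print each token class with scanner.TokenString(). The tokens helper is an illustrative name, and the scanner is left in the default mode that Init() sets:

```go
package main

import (
	"fmt"
	"strings"
	"text/scanner"
)

// tokens scans src with the default Go token mode and returns one
// "Class:text" entry per token.
func tokens(src string) []string {
	var s scanner.Scanner
	s.Init(strings.NewReader(src))
	var out []string
	for tok := s.Scan(); tok != scanner.EOF; tok = s.Scan() {
		out = append(out, scanner.TokenString(tok)+":"+s.TokenText())
	}
	return out
}

func main() {
	fmt.Println(tokens(`a 42 "hi"`)) // [Ident:a Int:42 String:"hi"]
}
```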
I know I'm a bit late to the party but here's a []rune to string function:
func runesToString(runes []rune) (outString string) {
// don't need index so _
for _, v := range runes {
outString += string(v)
}
return
}
Yes, there is a named return value, but I think it's OK in this case as it reduces the number of lines and the function is short.
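For what it's worth, a plain string(runes) conversion gives the same result as the loop, and strings.Builder avoids building a new string on every += when the runes do have to be processed one by one. A sketch (runesToStringBuilder is an illustrative name):

```go
package main

import (
	"fmt"
	"strings"
)

// runesToStringBuilder appends each rune to a strings.Builder.
func runesToStringBuilder(runes []rune) string {
	var sb strings.Builder
	for _, r := range runes {
		sb.WriteRune(r)
	}
	return sb.String()
}

func main() {
	rs := []rune{'a', 'ഐ', 'b', 'c'}
	fmt.Println(string(rs))               // aഐbc
	fmt.Println(runesToStringBuilder(rs)) // aഐbc
}
```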
This simple code converts a rune to a string (r is a value of type rune):
s := fmt.Sprintf("%c", r)
Since I came to this question searching for rune, string and char, I thought this might help newbies like me:
// str := "aഐbc"
// testString(str)
func testString(oneString string) {
	// string to byte slice: a plain conversion, which copies the
	// UTF-8 bytes of the string.
	var twoByteArr []byte = []byte(oneString)
	// string to rune slice: also a plain conversion, which decodes the
	// UTF-8 bytes into code points.
	var threeRuneSlice []rune = []rune(oneString)
	// A string is stored as bytes, not as runes, but it converts to and
	// from both slice types.
	// A rune slice can be converted back to a string, which encodes
	// the runes as UTF-8 again.
	var thirdString string = string(threeRuneSlice)
	// The catch is in printing "characters" with a for loop and range.
	fmt.Println("Chars in oneString")
	for i, r := range oneString {
		// i is the byte offset of each rune, so you get 0,1,4,5 here and not 0,1,2,3,
		// because range decodes UTF-8 when run over a string: https://blog.golang.org/strings
		fmt.Printf(" %d %v %c ", i, r, r)
	}
	fmt.Println("\nChars in threeRuneSlice")
	for i, r := range threeRuneSlice {
		// i = 0,1,2,3 here: a rune is an int32 holding one code point, while in the
		// string the same code point takes 1 to 4 bytes (a byte is a uint8).
		fmt.Printf(" %d %v %c ", i, r, r)
	}
	fmt.Println("\nValues in oneString ")
	for j := 0; j < len(oneString); j++ {
		// Indexing a string yields bytes, not runes, so you do not get characters this way.
		fmt.Printf(" %d %v ", j, oneString[j])
	}
	fmt.Println("\nValues in twoByteArr")
	for j := 0; j < len(twoByteArr); j++ {
		fmt.Printf(" %d=%v ", j, twoByteArr[j]) // same as above
	}
	fmt.Printf("\none - %s, two %s, three %s\n", oneString, twoByteArr, thirdString)
}
And one more small demo: https://play.golang.org/p/tagRBVG8k7V
adapted from https://groups.google.com/g/golang-nuts/c/84GCvDBhpbg/m/Tt6089MPFQAJ
to show that the 'characters' are encoded with one to four bytes, depending on the Unicode code point.
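The byte count per character can also be checked directly with utf8.RuneLen(). A small sketch (byteLens is an illustrative name):

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// byteLens returns, for each rune in s, how many bytes its UTF-8
// encoding takes.
func byteLens(s string) []int {
	var out []int
	for _, r := range s {
		out = append(out, utf8.RuneLen(r))
	}
	return out
}

func main() {
	// Latin a, Cyrillic я, Malayalam ഐ and an emoji.
	fmt.Println(byteLens("aяഐ😊")) // [1 2 3 4]
}
```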
Here are some simple examples that show quickly how to do it.
// rune => string
fmt.Printf("%c\n", 65) // A
fmt.Println(string(rune(0x1F60A))) // 😊
fmt.Println(string([]rune{0x1F468, 0x200D, 0x1F9B0})) // 👨🦰
// string => rune
fmt.Println(strconv.FormatUint(uint64([]rune("😊")[0]), 16)) // 1f60a
fmt.Printf("%U\n", '😊') // U+1F60A
fmt.Printf("%U %U %U\n", '👨', '\u200D', '🦰') // U+1F468 U+200D U+1F9B0
How do you convert a string to its binary representation in Go?
Example:
Input: "A"
Output: "01000001"
In my testing, fmt.Sprintf("%b", 75) only works on integers.
Index the 1-character string to get its first byte, which is its numerical representation.
s := "A"
st := fmt.Sprintf("%08b", s[0])
fmt.Println(st)
Output: 01000001
(Bear in mind that "%b" without the 08 width drops leading zeros from the output.)
You have to iterate over the runes of the string:
func toBinaryRunes(s string) string {
var buffer bytes.Buffer
for _, runeValue := range s {
fmt.Fprintf(&buffer, "%b", runeValue)
}
return buffer.String()
}
Or over the bytes:
func toBinaryBytes(s string) string {
var buffer bytes.Buffer
for i := 0; i < len(s); i++ {
fmt.Fprintf(&buffer, "%08b", s[i]) // pad each byte to 8 digits
}
return buffer.String()
}
Live playground:
http://play.golang.org/p/MXZ1Y17xWa
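A sketch that pads every byte to 8 bits, so that the result can be decoded again, could look like this. The names toBinary and fromBinary are illustrative, and fromBinary assumes its input was produced by toBinary:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// toBinary writes each byte of s as exactly 8 binary digits.
func toBinary(s string) string {
	var sb strings.Builder
	for i := 0; i < len(s); i++ {
		fmt.Fprintf(&sb, "%08b", s[i])
	}
	return sb.String()
}

// fromBinary reverses toBinary by parsing 8 digits at a time.
func fromBinary(b string) (string, error) {
	if len(b)%8 != 0 {
		return "", fmt.Errorf("length %d is not a multiple of 8", len(b))
	}
	out := make([]byte, 0, len(b)/8)
	for i := 0; i < len(b); i += 8 {
		v, err := strconv.ParseUint(b[i:i+8], 2, 8)
		if err != nil {
			return "", err
		}
		out = append(out, byte(v))
	}
	return string(out), nil
}

func main() {
	bin := toBinary("A")
	fmt.Println(bin)             // 01000001
	fmt.Println(fromBinary(bin)) // A <nil>
}
```

Working on bytes rather than runes keeps the width fixed, which is what makes the round trip possible for multi-byte characters too.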