Golang convert integer to unicode character - string

Given the following input:
intVal := 2612
strVal := "2612"
What is a mechanism for mapping this to the associated Unicode character as a string?
For example, the following code prints "☒"
fmt.Println("\u2612")
But the following does not work:
fmt.Println("\\u" + strVal)
I researched runes, strconv, and unicode/utf8 but was unable to find a suitable conversion strategy.

2612 is not the integer value of the Unicode rune; the integer value of \u2612 is 9746. The string "2612" is the hex value of the rune, so parse it as a hex number and convert it to a rune.
i, err := strconv.ParseInt(strVal, 16, 32)
if err != nil {
log.Fatal(err)
}
r := rune(i)
fmt.Println(string(r))
https://play.golang.org/p/t_e6AfbKQq

This one works:
fmt.Println("\u2612")
Because an interpreted string literal appears in the source code, and the compiler unquotes (interprets) it. It is not the fmt package that does this unquoting.
This doesn't work:
fmt.Println("\\u" + strVal)
Because, again, an interpreted string literal is used, which resolves to the string value \u; that is then concatenated with the value of the local variable strVal, which is 2612, so the final string value is \u2612. But this is not an interpreted string literal, it is the "final" result, and it won't be processed or unquoted any further.
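A quick way to see the difference is to compare lengths: the compiler turns the literal "\u2612" into a single character (3 bytes in UTF-8), while the concatenation keeps six plain ASCII characters. A small illustrative sketch:

```go
package main

import "fmt"

func main() {
	strVal := "2612"

	// Interpreted by the compiler: one character, 3 bytes in UTF-8.
	a := "\u2612"
	// "\\u" is a backslash followed by 'u'; the concatenation yields
	// 6 plain ASCII characters and nothing interprets them afterwards.
	b := "\\u" + strVal

	fmt.Println(a, len(a)) // ☒ 3
	fmt.Println(b, len(b)) // \u2612 6
}
```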
As an alternative to JimB's answer, you may also use strconv.Unquote(), which does unquoting similar to what the compiler does.
See this example:
// The original that works:
s := "\u2612"
fmt.Println(s, []byte(s))
// Using strconv.Unquote():
strVal := "2612"
s2, err := strconv.Unquote(`"\u` + strVal + `"`)
fmt.Println(s2, []byte(s2), err)
fmt.Println(s == s2)
Output (try it on the Go Playground):
☒ [226 152 146]
☒ [226 152 146] <nil>
true
Something to note here: we want strconv.Unquote() to unquote the \u2612 text, but Unquote() requires the string to be in quotes ("Unquote interprets s as a single-quoted, double-quoted, or backquoted Go string literal..."), which is why we prepended and appended a quotation mark.
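Because strconv.Unquote() interprets a whole literal, the same trick also works for text containing several escapes. A minimal sketch; the helper name unescapeUnicode is hypothetical, and it assumes the input contains no unescaped double quote or newline, since those would make the wrapped literal invalid and Unquote() would return an error.

```go
package main

import (
	"fmt"
	"strconv"
)

// unescapeUnicode (hypothetical helper) interprets Go escape sequences
// such as \u2612 in s by wrapping s in double quotes and unquoting it.
func unescapeUnicode(s string) (string, error) {
	return strconv.Unquote(`"` + s + `"`)
}

func main() {
	fmt.Println(unescapeUnicode(`\u2612 \u2611`)) // ☒ ☑ <nil>

	// \u needs exactly four hex digits, so this returns a syntax error.
	fmt.Println(unescapeUnicode(`\u26`))
}
```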

Related

Why does golang bytes.Buffer behave in such a way?

I recently faced a problem where I'm writing to a bytes.Buffer using a writer. But when I call String() on that bytes.Buffer, I get unexpected output (an extra pair of double quotes is added). Can you please help me understand it?
Here is a code snippet of my problem. I just need help understanding why each word is surrounded by an extra double quote.
package main
import (
"bytes"
"encoding/csv"
"fmt"
)
func main() {
var csvBuffer bytes.Buffer
wr := csv.NewWriter(&csvBuffer)
data := []string{`{"agent":"python-requests/2.19.1","api":"/packing-slip/7123"}`}
err := wr.Write(data)
if err != nil {
fmt.Println("WARNING: unable to write ", err)
}
wr.Flush()
fmt.Println(csvBuffer.String())
}
Output:
"{""agent"":""python-requests/2.19.1"",""api"":""/packing-slip/7123""}"
In CSV double quotes (") are escaped as 2 double quotes. That's what you see.
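You encode a single string value that contains double quotes, so each of them is replaced with 2 double quotes (and because the value contains quotes and commas, the whole field is also enclosed in quotes).
When decoded, the result will of course contain single double quotes again: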
r := csv.NewReader(&csvBuffer)
rec, err := r.Read()
fmt.Println(rec, err)
Outputs (try it on the Go Playground):
[{"agent":"python-requests/2.19.1","api":"/packing-slip/7123"}] <nil>
Quoting from package doc of encoding/csv:
Within a quoted-field a quote character followed by a second quote character is considered a single quote.
"the ""word"" is true","a ""quoted-field"""
results in
{`the "word" is true`, `a "quoted-field"`}
In CSV, the following are equivalent:
one,two
and
"one","two"
If a value contained a bare double quote, that quote would indicate the end of the value. CSV handles this by replacing each double quote with 2 of them (inside a quoted field). The value one"1 is encoded as "one""1" in CSV, e.g.:
"one""1","two""2"
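To see both directions with encoding/csv, the sketch below writes the record from the example above and reads it back. The helper name roundTrip is made up for illustration; only csv.NewWriter, Write, Flush, Error, csv.NewReader and Read from the standard library are used.

```go
package main

import (
	"bytes"
	"encoding/csv"
	"fmt"
	"log"
)

// roundTrip (hypothetical helper) writes one record with encoding/csv
// and reads it back, returning the encoded text and the decoded record.
func roundTrip(record []string) (string, []string, error) {
	var buf bytes.Buffer
	w := csv.NewWriter(&buf)
	if err := w.Write(record); err != nil {
		return "", nil, err
	}
	w.Flush()
	if err := w.Error(); err != nil {
		return "", nil, err
	}
	// Reading consumes buf, so capture the encoded text first.
	encoded := buf.String()

	decoded, err := csv.NewReader(&buf).Read()
	return encoded, decoded, err
}

func main() {
	encoded, decoded, err := roundTrip([]string{`one"1`, `two"2`})
	if err != nil {
		log.Fatal(err)
	}
	fmt.Print(encoded)   // "one""1","two""2"
	fmt.Println(decoded) // [one"1 two"2]
}
```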

Replace a character in a string in golang

I am trying to replace the character at a specific position in one of the strings of an array. Here is what my code looks like:
package main
import (
"fmt"
)
func main() {
str := []string{"test","testing"}
str[0][2] = 'y'
fmt.Println(str)
}
Now, running this gives me the error:
cannot assign to str[0][2]
Any idea how to do this? I have tried using strings.Replace, but AFAIK it replaces all occurrences of the given character, while I want to replace only that specific character. Any help is appreciated. TIA.
Strings in Go are immutable, you can't change their content. To change the value of a string variable, you have to assign a new string value.
An easy way is to first convert the string to a byte or rune slice, do the change and convert back:
s := []byte(str[0])
s[2] = 'y'
str[0] = string(s)
fmt.Println(str)
This will output (try it on the Go Playground):
[teyt testing]
Note: I converted the string to byte slice, because this is what happens when you index a string: it indexes its bytes. A string stores the UTF-8 byte sequence of the text, which may not necessarily map bytes to characters one-to-one.
If you need to replace the character at rune index 2 (the third character) rather than the byte at index 2, use []rune instead:
s := []rune(str[0])
s[2] = 'y'
str[0] = string(s)
fmt.Println(str)
In this example it makes no difference, but in general it may.
Also note that strings.Replace() does not (necessarily) replace all occurrences:
func Replace(s, old, new string, n int) string
The parameter n sets the maximum number of replacements to perform (a negative n means no limit). So the following also works (try it on the Go Playground):
str[0] = strings.Replace(str[0], "s", "y", 1)
Yet another solution is to slice the string up to the character to be replaced and from the character after it, and concatenate the two parts with the new character in between (try this one on the Go Playground):
str[0] = str[0][:2] + "y" + str[0][3:]
Care must be taken here too: the slice indices are byte indices, not character (rune) indices.
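If you want the slicing approach but with a rune index, you first have to translate the rune index into a byte offset. A minimal sketch; the helper name replaceNthRune is hypothetical, and it reports false when the string has too few runes:

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// replaceNthRune (hypothetical helper) replaces the n-th rune of s with r
// using slicing, after translating the rune index n into a byte offset.
// It reports false if s has fewer than n+1 runes.
func replaceNthRune(s string, n int, r rune) (string, bool) {
	count := 0
	for i := range s { // i is the byte offset of each rune
		if count == n {
			// Width in bytes of the rune starting at i (1 for invalid bytes).
			_, size := utf8.DecodeRuneInString(s[i:])
			return s[:i] + string(r) + s[i+size:], true
		}
		count++
	}
	return s, false
}

func main() {
	fmt.Println(replaceNthRune("test", 2, 'y'))       // teyt true
	fmt.Println(replaceNthRune("h\u00e9llo", 2, 'y')) // héylo true
	fmt.Println(replaceNthRune("test", 9, 'y'))       // test false
}
```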
See related question: Immutable string and pointer address
Here's a function that will do that for you. It takes care of converting the string that you want to modify into a []rune, and then back out to string.
If your intention is to replace bytes rather than runes, you can copy this function's code, rename it from runeSub to byteSub, change the r rune parameter to b byte, and convert the string to []byte instead of []rune.
Also available on repl.it
package main
import "fmt"
// runeSub - given a slice of strings (ss), replace the
// (ri)th rune (character) in the (si)th string
// of (ss) with the rune (r)
//
// ss - the slice of strings
// si - the index of the string in ss that you want to modify
// ri - the index of the rune in ss[si] that you want to replace
// r - the rune you want to insert
//
// NOTE: this function has no panic protection against things like
// out-of-bound index values
func runeSub(ss []string, si, ri int, r rune) {
rr := []rune(ss[si])
rr[ri] = r
ss[si] = string(rr)
}
func main() {
ss := []string{"test","testing"}
runeSub(ss, 0, 2, 'y')
fmt.Println(ss)
}

Go's equivalent of JavaScript's charCodeAt() method

The charCodeAt() method in JavaScript returns the numeric Unicode value of the character at the given index, e.g.
"s".charCodeAt(0) // returns 115
How would I go about getting the numeric Unicode value of the same string/letter in Go?
The character type in Go is rune, which is an alias for int32, so it is already a number; just print it.
You still need a way to get the character at the specified position. Simplest way is to convert the string to a []rune which you can index. To convert a string to runes, simply use the type conversion []rune("some string"):
fmt.Println([]rune("s")[0])
Prints:
115
If you want it printed as a character, use the %c format string:
fmt.Println([]rune("absdef")[2]) // Also prints 115
fmt.Printf("%c", []rune("absdef")[2]) // Prints s
Also note that a for range loop over a string iterates over its runes, so you can use that too. It is more efficient than converting the whole string to []rune:
i := 0
for _, r := range "absdef" {
if i == 2 {
fmt.Println(r)
break
}
i++
}
Note that the counter i must be a separate counter; it cannot be the loop's index variable, because for range yields the byte position and not the rune index (the two differ if the string contains characters that are multi-byte in UTF-8).
Wrapping it into a function:
func charCodeAt(s string, n int) rune {
i := 0
for _, r := range s {
if i == n {
return r
}
i++
}
return 0
}
Try these on the Go Playground.
Also note that strings in Go are stored in memory as the UTF-8 encoded byte sequence of the text (read the blog post Strings, bytes, runes and characters in Go for more info). If you are guaranteed that the string only contains characters whose code is less than 128 (i.e. ASCII), you can simply work with bytes. That is, indexing a string in Go indexes its bytes, so for example "s"[0] is the byte value of 's', which is 115.
fmt.Println("s"[0]) // Prints 115
fmt.Println("absdef"[2]) // Prints 115
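One caveat when matching JavaScript exactly: JavaScript strings are UTF-16, and charCodeAt() returns a UTF-16 code unit, so for characters outside the Basic Multilingual Plane (e.g. emoji) it returns half of a surrogate pair, while the rune approaches above return the whole code point. A minimal sketch using the standard unicode/utf16 package; the helper name charCodeAtUTF16 is hypothetical:

```go
package main

import (
	"fmt"
	"unicode/utf16"
)

// charCodeAtUTF16 (hypothetical helper) mimics JavaScript's charCodeAt by
// indexing the UTF-16 encoding of s. It reports false if n is out of range.
func charCodeAtUTF16(s string, n int) (uint16, bool) {
	units := utf16.Encode([]rune(s))
	if n < 0 || n >= len(units) {
		return 0, false
	}
	return units[n], true
}

func main() {
	fmt.Println(charCodeAtUTF16("s", 0))          // 115 true
	fmt.Println(charCodeAtUTF16("\U0001F600", 0)) // 55357 true (high surrogate 0xD83D)
	fmt.Println([]rune("\U0001F600")[0])          // 128512 (the full code point)
}
```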
Internally, a string in Go is a read-only sequence of 8-bit bytes, so for ASCII text every byte is the character's ASCII value.
str := "abc"
byteValue := str[0]
intValue := int(byteValue)
fmt.Println(byteValue)//97
fmt.Println(intValue)//97
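This only holds for ASCII text. For a non-ASCII character, indexing the string returns the first UTF-8 byte, not the code point. A small illustrative sketch with "é" (U+00E9):

```go
package main

import "fmt"

func main() {
	str := "\u00e9" // "é": one character, two bytes in UTF-8 (0xC3 0xA9)

	fmt.Println(len(str))       // 2
	fmt.Println(str[0])         // 195 (0xC3, the first UTF-8 byte)
	fmt.Println([]rune(str)[0]) // 233 (U+00E9, the code point)
}
```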

How to get a single Unicode character from string

I wonder how I can get a Unicode character from a string. For example, if the string is "你好", how can I get the first character "你"?
Elsewhere I found one way:
var str = "你好"
runes := []rune(str)
fmt.Println(string(runes[0]))
It does work.
But I still have some questions:
Is there another way to do it?
Why does str[0] in Go not return a Unicode character from the string, but byte data instead?
First, you may want to read https://blog.golang.org/strings
It will answer part of your questions.
A string in Go can contain arbitrary bytes. When you write str[i], the result is a byte, and the index is always a number of bytes.
Most of the time, strings are encoded in UTF-8 though. You have multiple ways to deal with UTF-8 encoding in a string.
For instance, you can use the for...range statement to iterate over a string rune by rune.
var first rune
for _,c := range str {
first = c
break
}
// first now contains the first rune of the string
You can also leverage the unicode/utf8 package. For instance:
r, size := utf8.DecodeRuneInString(str)
// r contains the first rune of the string
// size is the size of the rune in bytes
If the string is encoded in UTF-8, there is no direct way to access the nth rune of the string, because the size of the runes (in bytes) is not constant. If you need this feature, you can easily write your own helper function to do it (with for...range, or with the unicode/utf8 package).
You can use the utf8string package:
package main
import "golang.org/x/exp/utf8string"
func main() {
s := utf8string.NewString("ÄÅàâäåçèéêëìîïü")
// example 1
r := s.At(1)
println(r == 'Å')
// example 2
t := s.Slice(1, 3)
println(t == "Åà")
}
https://pkg.go.dev/golang.org/x/exp/utf8string
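You can also do this, as long as the characters before the one you want are single-byte (ASCII), because i is a byte offset, not a rune index:
package main
import "fmt"
func main() {
str := "cat"
var s rune
for i, c := range str {
if i == 2 {
s = c
break
}
}
fmt.Println(string(s))
}
s is now equal to 't'.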

How to check if there's only numbers in string

How can I check whether a string contains only numbers?
I want to skip some code with goto if the string contains only numbers.
Thanks
try
i := StrToInt( str );
except
{ str is NOT an integer }
end;
A simple Google search: Pascal Help
StrToInt
Convert a string to an integer value.
Declaration
Source position: sysstrh.inc line 113
function StrToInt( const s: string ):Integer;
Description
StrToInt will convert the string S to an integer. If the string
contains invalid characters or has an invalid format, then an
EConvertError is raised.
To be successfully converted, a string can contain a combination of
numerical characters, possibly preceded by a minus sign (-). Spaces
are not allowed.
The string S can contain a number in decimal, hexadecimal, binary or
octal format, as described in the language reference. For enumerated
values, the string must be the name of the enumerated value. The name
is searched case insensitively.
For hexadecimal values, the prefix '0x' or 'x' (case insensitive) may
be used as
