How do I convert a string of escape sequences to bytes? - string

Creating a TCP server that needs to process some data. I have a net.Conn instance "connection"from which I will read said data. The lower part of the snippet brings about an error noting that it cannot use the 'esc' value as a byte value.
const(
esc = "\a\n"
)
....
c := bufio.NewReader(connection)
data, err := c.ReadBytes(esc)
Clearly, some conversion is needed but when I try
const(
esc = "\a\n"
)
....
c := bufio.NewReader(connection)
data, err := c.ReadBytes(byte(esc))
The compiler makes note that I cannot convert esc to byte. Is it due to the fact that I declared "\a\n" as a const value on the package level? Or is there something else overall associated with how I'm framing the bytes to be read?

You can't convert esc to byte because you can't convert strings into single bytes. You can convert a string into a byte slice ([]byte).
The bufio.Reader only supports single byte delimiters, you should use a bufio.Scanner with a custom split function instead for multi-byte delimiters.
Perhaps a modified version of https://stackoverflow.com/a/37531472/1205448

Related

How to detect when bytes can't be converted to string in Go?

There are invalid byte sequences that can't be converted to Unicode strings. How do I detect that when converting []byte to string in Go?
You can, as Tim Cooper noted, test UTF-8 validity with utf8.Valid.
But! You might be thinking that converting non-UTF-8 bytes to a Go string is impossible. In fact, "In Go, a string is in effect a read-only slice of bytes"; it can contain bytes that aren't valid UTF-8 which you can print, access via indexing, pass to WriteString methods, or even round-trip back to a []byte (to Write, say).
There are two places in the language that Go does do UTF-8 decoding of strings for you.
when you do for i, r := range s the r is a Unicode code point as a value of type rune
when you do the conversion []rune(s), Go decodes the whole string to runes.
(Note that rune is an alias for int32, not a completely different type.)
In both these instances invalid UTF-8 is replaced with U+FFFD, the replacement character reserved for uses like this. More is in the spec sections on for statements and conversions between strings and other types. These conversions never crash, so you only need to actively check for UTF-8 validity if it's relevant to your application, like if you can't accept the U+FFFD replacement and need to throw an error on mis-encoded input.
Since that behavior's baked into the language, you can expect it from libraries, too. U+FFFD is utf8.RuneError and returned by functions in utf8.
Here's a sample program showing what Go does with a []byte holding invalid UTF-8:
package main
import "fmt"
func main() {
a := []byte{0xff}
s := string(a)
fmt.Println(s)
for _, r := range s {
fmt.Println(r)
}
rs := []rune(s)
fmt.Println(rs)
}
Output will look different in different environments, but in the Playground it looks like
�
65533
[65533]

What is the difference between the string and []byte in Go?

s := "some string"
b := []byte(s) // convert string -> []byte
s2 := string(b) // convert []byte -> string
what is the difference between the string and []byte in Go?
When to use "he" or "she"?
Why?
bb := []byte{'h','e','l','l','o',127}
ss := string(bb)
fmt.Println(ss)
hello
The output is just "hello", and lack of 127, sometimes I feel that it's weird.
string and []byte are different types, but they can be converted to one another:
3 . Converting a slice of bytes to a string type yields a string whose successive bytes are the elements of the slice.
4 . Converting a value of a string type to a slice of bytes type yields a slice whose successive elements are the bytes of the string.
Blog: Arrays, slices (and strings): The mechanics of 'append':
Strings are actually very simple: they are just read-only slices of bytes with a bit of extra syntactic support from the language.
Also read: Strings, bytes, runes and characters in Go
When to use one over the other?
Depends on what you need. Strings are immutable, so they can be shared and you have guarantee they won't get modified.
Byte slices can be modified (meaning the content of the backing array).
Also if you need to frequently convert a string to a []byte (e.g. because you need to write it into an io.Writer()), you should consider storing it as a []byte in the first place.
Also note that you can have string constants but there are no slice constants. This may be a small optimization. Also note that:
The expression len(s) is constant if s is a string constant.
Also if you are using code already written (either standard library, 3rd party packages or your own), in most of the cases it is given what parameters and values you have to pass or are returned. E.g. if you read data from an io.Reader, you need to have a []byte which you have to pass to receive the read bytes, you can't use a string for that.
This example:
bb := []byte{'h','e','l','l','o',127}
What happens here is that you used a composite literal (slice literal) to create and initialize a new slice of type []byte (using Short variable declaration). You specified constants to list the initial elements of the slice. You also used a byte value 127 which - depending on the platform / console - may or may not have a visual representation.
Late but i hope this could help.
In simple words
Bit: 0 and 1 is how machines represents all the information
Byte: 8 bits that represents UTF-8 encodings i.e. characters
[ ]type: slice of a given data type. Slices are dynamic size arrays.
[ ]byte: this is a byte slice i.e. a dynamic size array that contains bytes i.e. each element is a UTF-8 character.
String: read-only slices of bytes i.e. immutable
With all this in mind:
s := "Go"
bs := []byte(s)
fmt.Printf("%s", bs) // Output: Go
fmt.Printf("%d", bs) // Output: [71 111]
or
bs := []byte{71, 111}
fmt.Printf("%s", bs) // Output: Go
%s converts byte slice to string
%d gets UTF-8 decimal value of bytes
IMPORTANT:
As strings are immutable, they cannot be changed within memory, each time you add or remove something from a string, GO creates a new string in memory. On the other hand, byte slices are mutable so when you update a byte slice you are not recreating new stuffs in memory.
So choosing the right structure could make a difference in your app performance.

How do you make a function detect whether a string is binary safe or not

How does one detect if a string is binary safe or not in Go?
A function like:
IsBinarySafe(str) //returns true if its safe and false if its not.
Any comment after this are just things I have thought or attempted to solve this:
I assumed that there must exist a library that already does this but had a tough time finding it. If there isn't one, how do you implement this?
I was thinking of some solution but wasn't really convinced they were good solutions.
One of them was to iterate over the bytes, and have a hash map of all the illegal byte sequences.
I also thought of maybe writing a regex with all the illegal strings but wasn't sure if that was a good solution.
I also was not sure if a sequence of bytes from other languages counted as binary safe. Say the typical golang example:
世界
Would:
IsBinarySafe(世界) //true or false?
Would it return true or false? I was assuming that all binary safe string should only use 1 byte. So iterating over it in the following way:
const nihongo = "日本語abc日本語"
for i, w := 0, 0; i < len(nihongo); i += w {
runeValue, width := utf8.DecodeRuneInString(nihongo[i:])
fmt.Printf("%#U starts at byte position %d\n", runeValue, i)
w = width
}
and returning false whenever the width was great than 1. These are just some ideas I had just in case there wasn't a library for something like this already but I wasn't sure.
Binary safety has nothing to do with how wide a character is, it's mainly to check for non-printable characters more or less, like null bytes and such.
From Wikipedia:
Binary-safe is a computer programming term mainly used in connection
with string manipulating functions. A binary-safe function is
essentially one that treats its input as a raw stream of data without
any specific format. It should thus work with all 256 possible values
that a character can take (assuming 8-bit characters).
I'm not sure what your goal is, almost all languages handle utf8/16 just fine now, however for your specific question there's a rather simple solution:
// checks if s is ascii and printable, aka doesn't include tab, backspace, etc.
func IsAsciiPrintable(s string) bool {
for _, r := range s {
if r > unicode.MaxASCII || !unicode.IsPrint(r) {
return false
}
}
return true
}
func main() {
fmt.Printf("len([]rune(s)) = %d, len([]byte(s)) = %d\n", len([]rune(s)), len([]byte(s)))
fmt.Println(IsAsciiPrintable(s), IsAsciiPrintable("test"))
}
playground
From unicode.IsPrint:
IsPrint reports whether the rune is defined as printable by Go. Such
characters include letters, marks, numbers, punctuation, symbols, and
the ASCII space character, from categories L, M, N, P, S and the ASCII
space character. This categorization is the same as IsGraphic except
that the only spacing character is ASCII space, U+0020.

How to convert strings to array of byte and back

4I must write strings to a binary MIDI file. The standard requires one to know the length of the string in bytes. As I want to write for mobile as well I cannot use AnsiString, which was a good way to ensure that the string was a one-byte string. That simplified things. I tested the following code:
TByte = array of Byte;
function TForm3.convertSB (arg: string): TByte;
var
i: Int32;
begin
Label1.Text := (SizeOf (Char));
for i := Low (arg) to High (arg) do
begin
label1.Text := label1.Text + ' ' + IntToStr (Ord (arg [i]));
end;
end; // convert SB //
convertSB ('MThd');
It returns 2 77 84 104 100 (as label text) in Windows as well as Android. Does this mean that Delphi treats strings by default as UTF-8? This would greatly simplify things but I couldn't find it in the help. And what is the best way to convert this to an array of bytes? Read each character and test whether it is 1, 2 or 4 bytes and allocate this space in the array? For converting back to a character: just read the array of bytes until a byte is encountered < 128?
Delphi strings are encoded internally as UTF-16. There was a big clue in the fact that SizeOf(Char) is 2.
The reason that all your characters had ordinal in the ASCII range is that UTF-16 extends ASCII in the sense that characters 0 to 127, in the ASCII range, have the same ordinal value in UTF-16. And all your characters are ASCII characters.
That said, you do not need to worry about the internal storage. You simply convert between string and byte array using the TEncoding class. For instance, to convert to UTF-8 you write:
bytes := TEncoding.UTF8.GetBytes(str);
And in the opposite direction:
str := TEncoding.UTF8.GetString(bytes);
The class supports many other encodings, as described in the documentation. It's not clear from the question which encoding you are need to use. Hopefully you can work the rest out from here.

Go - Comparing strings/byte slices input by the user

I am getting input from the user, however when I try to compare it later on to a string literal it does not work. That is just a test though.
I would like to set it up so that when a blank line is entered (just hitting the enter/return key) the program exits. I don't understand why the strings are not comparing because when I print it, it comes out identical.
in := bufio.NewReader(os.Stdin);
input, err := in.ReadBytes('\n');
if err != nil {
fmt.Println("Error: ", err)
}
if string(input) == "example" {
os.Exit(0)
}
string vs []byte
string definition:
string is the set of all strings of 8-bit bytes, conventionally but not necessarily representing UTF-8-encoded text. A string may be empty, but not nil. Values of string type are immutable.
byte definition:
byte is an alias for uint8 and is equivalent to uint8 in all ways. It is used, by convention, to distinguish byte values from 8-bit unsigned integer values.
What does it mean?
[]byte is a byte slice. slice can be empty.
string elements are unicode characters, which can have more then 1 byte.
string elements keep a meaning of data (encoding), []bytes not.
equality operator is defined for string type but not for slice type.
As you see they are two different types with different properties.
There is a great blog post explaining different string related types [1]
Regards the issue you have in your code snippet.
Bear in mind that in.ReadBytes(char) returns a byte slice with char inclusively. So in your code input ends with '\n'. If you want your code to work in desired way then try this:
if string(input) == "example\n" { // or "example\r\n" when on windows
os.Exit(0)
}
Also make sure that your terminal code page is the same as your .go source file. Be aware about different end-line styles (Windows uses "\r\n"), Standard go compiler uses utf8 internally.
[1] Comparison of Go data types for string processing.

Resources