Go - Comparing strings/byte slices input by the user

I am getting input from the user, but when I later compare it to a string literal, the comparison fails. (The literal is just a test.)
I would like to set it up so that when a blank line is entered (just hitting the Enter/Return key), the program exits. I don't understand why the strings don't compare equal, because when I print the input, it looks identical.
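package main
import (
	"bufio"
	"fmt"
	"os"
)
func main() {
	in := bufio.NewReader(os.Stdin)
	input, err := in.ReadBytes('\n')
	if err != nil {
		fmt.Println("Error: ", err)
	}
	if string(input) == "example" { // never matches: input still ends with '\n'
		os.Exit(0)
	}
}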

string vs []byte
string definition:
string is the set of all strings of 8-bit bytes, conventionally but not necessarily representing UTF-8-encoded text. A string may be empty, but not nil. Values of string type are immutable.
byte definition:
byte is an alias for uint8 and is equivalent to uint8 in all ways. It is used, by convention, to distinguish byte values from 8-bit unsigned integer values.
What does it mean?
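[]byte is a byte slice. A slice can be nil as well as empty, whereas a string cannot be nil.
A string is conventionally UTF-8 text, in which one Unicode character can take more than 1 byte (ranging over a string yields runes, although indexing it yields bytes).
A string conventionally carries a meaning (its encoding); a []byte is just raw bytes.
The equality operator is defined for the string type but not for slice types (a slice can only be compared to nil).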
As you see they are two different types with different properties.
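To make the equality point concrete, here is a small sketch. `matches` is a hypothetical helper name, not a library function; since `input == []byte(want)` would not compile, it uses `bytes.Equal` from the standard library, which also treats a nil slice and an empty slice as equal:

```go
package main

import (
	"bytes"
	"fmt"
)

// matches is a hypothetical helper: it compares raw input bytes against a
// string literal. Slices can only be compared to nil with ==, so
// bytes.Equal does the element-by-element comparison instead.
func matches(input []byte, want string) bool {
	return bytes.Equal(input, []byte(want))
}

func main() {
	fmt.Println(matches([]byte("example"), "example"))   // true
	fmt.Println(matches([]byte("example\n"), "example")) // false: trailing newline
	fmt.Println(string([]byte("example")) == "example")  // true: == works on strings
}
```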
There is a great blog post explaining different string related types [1]
Regarding the issue in your code snippet:
Bear in mind that in.ReadBytes(delim) returns a byte slice that includes the delimiter. So in your code, input ends with '\n'. If you want your code to work the desired way, try this:
if string(input) == "example\n" { // or "example\r\n" when on windows
os.Exit(0)
}
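Since the stated goal is to exit on a blank line, another option is to strip the line ending before comparing, so the same check works with both "\n" and "\r\n". This is a minimal sketch, assuming input is read line by line with bufio.Reader as in the question; `trimLineEnding` is a hypothetical helper name:

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// trimLineEnding is a hypothetical helper: it removes any trailing '\r' and
// '\n' characters, so "\n" and "\r\n" line endings are handled the same way.
func trimLineEnding(line []byte) string {
	return strings.TrimRight(string(line), "\r\n")
}

func main() {
	in := bufio.NewReader(os.Stdin)
	for {
		line, err := in.ReadBytes('\n')
		text := trimLineEnding(line)
		if text == "" { // blank line (or EOF with no data): exit
			os.Exit(0)
		}
		fmt.Println("you typed:", text)
		if err != nil { // e.g. io.EOF after a final line without '\n'
			return
		}
	}
}
```

Comparing against the trimmed string also makes the "example" test from the question work without hard-coding the line ending.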
Also make sure that your terminal's code page matches the encoding of your .go source file; the standard Go compiler expects UTF-8 source. Be aware of different line-ending styles as well (Windows uses "\r\n").
[1] Comparison of Go data types for string processing.

Related

How to detect when bytes can't be converted to string in Go?

There are invalid byte sequences that can't be converted to Unicode strings. How do I detect that when converting []byte to string in Go?
You can, as Tim Cooper noted, test UTF-8 validity with utf8.Valid.
But! You might be thinking that converting non-UTF-8 bytes to a Go string is impossible. In fact, "In Go, a string is in effect a read-only slice of bytes"; it can contain bytes that aren't valid UTF-8 which you can print, access via indexing, pass to WriteString methods, or even round-trip back to a []byte (to Write, say).
There are two places in the language that Go does do UTF-8 decoding of strings for you.
when you write for i, r := range s, r is a Unicode code point, as a value of type rune;
when you do the conversion []rune(s), Go decodes the whole string to runes.
(Note that rune is an alias for int32, not a completely different type.)
In both these instances invalid UTF-8 is replaced with U+FFFD, the replacement character reserved for uses like this. More is in the spec sections on for statements and conversions between strings and other types. These conversions never crash, so you only need to actively check for UTF-8 validity if it's relevant to your application, like if you can't accept the U+FFFD replacement and need to throw an error on mis-encoded input.
Since that behavior is baked into the language, you can expect it from libraries, too. U+FFFD is utf8.RuneError, which is what functions in the utf8 package return for invalid input.
Here's a sample program showing what Go does with a []byte holding invalid UTF-8:
package main
import "fmt"
func main() {
	a := []byte{0xff}
	s := string(a)
	fmt.Println(s)
	for _, r := range s {
		fmt.Println(r)
	}
	rs := []rune(s)
	fmt.Println(rs)
}
Output will look different in different environments, but in the Playground it looks like this:
�
65533
[65533]

What is the difference between the string and []byte in Go?

s := "some string"
b := []byte(s) // convert string -> []byte
s2 := string(b) // convert []byte -> string
what is the difference between the string and []byte in Go?
When should I use one rather than the other?
Why?
bb := []byte{'h','e','l','l','o',127}
ss := string(bb)
fmt.Println(ss)
hello
The output is just "hello", without the 127, which sometimes seems weird to me.
string and []byte are different types, but they can be converted to one another:
3. Converting a slice of bytes to a string type yields a string whose successive bytes are the elements of the slice.
4. Converting a value of a string type to a slice of bytes type yields a slice whose successive elements are the bytes of the string.
Blog: Arrays, slices (and strings): The mechanics of 'append':
Strings are actually very simple: they are just read-only slices of bytes with a bit of extra syntactic support from the language.
Also read: Strings, bytes, runes and characters in Go
When to use one over the other?
Depends on what you need. Strings are immutable, so they can be shared, and you have a guarantee they won't get modified.
Byte slices can be modified (meaning the content of the backing array).
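A short illustration of the difference: converting a string to []byte copies the bytes, so modifying the slice leaves the original string untouched. `copyAndModify` is a hypothetical helper used only for this sketch:

```go
package main

import "fmt"

// copyAndModify is a hypothetical helper showing that []byte(s) copies:
// changing the slice does not affect the original string.
func copyAndModify(s string) (orig, modified string) {
	b := []byte(s) // the conversion copies the bytes of s
	if len(b) > 0 {
		b[0] = 'j' // allowed: byte slices are mutable
	}
	// s[0] = 'j' would not compile: strings are immutable.
	return s, string(b)
}

func main() {
	o, m := copyAndModify("hello")
	fmt.Println(o, m) // hello jello
}
```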
Also, if you need to frequently convert a string to a []byte (e.g. because you need to write it to an io.Writer), you should consider storing it as a []byte in the first place.
Also note that you can have string constants but there are no slice constants. This may be a small optimization. Also note that:
The expression len(s) is constant if s is a string constant.
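Because len of a string constant is itself a constant, it can be used where the compiler requires a constant, for example as an array length. A minimal sketch; `greeting` and `buf` are names chosen for illustration:

```go
package main

import "fmt"

const greeting = "hello"

// len(greeting) is a compile-time constant, so it can size an array type.
// With a []byte or a string variable this would not compile.
var buf [len(greeting)]byte

func main() {
	copy(buf[:], greeting) // copy accepts a string as its source
	fmt.Println(len(buf), string(buf[:])) // 5 hello
}
```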
Also, if you are using existing code (the standard library, third-party packages, or your own), in most cases the types of the parameters you pass and of the values returned are already given. E.g. if you read data from an io.Reader, you have to pass a []byte that receives the read bytes; you can't use a string for that.
This example:
bb := []byte{'h','e','l','l','o',127}
What happens here is that you used a composite literal (a slice literal) to create and initialize a new slice of type []byte (using a short variable declaration). You specified constants to list the initial elements of the slice. You also used the byte value 127 (the ASCII DEL control character) which, depending on the platform or console, may or may not have a visual representation. The byte is still part of the string; it is just not visible when printed.
Late, but I hope this helps.
In simple words
Bit: 0 or 1, the way machines represent all information
Byte: 8 bits; in UTF-8, a character is encoded as 1 to 4 bytes
[]type: a slice of a given data type. Slices are dynamically sized views into arrays.
[]byte: a byte slice, i.e. a dynamically sized sequence of bytes; each element is a single byte, not necessarily a whole UTF-8 character
String: a read-only slice of bytes, i.e. immutable
With all this in mind:
s := "Go"
bs := []byte(s)
fmt.Printf("%s", bs) // Output: Go
fmt.Printf("%d", bs) // Output: [71 111]
or
bs := []byte{71, 111}
fmt.Printf("%s", bs) // Output: Go
%s prints the byte slice as a string
%d prints the decimal value of each byte
IMPORTANT:
As strings are immutable, they cannot be changed in memory; each time you add or remove something from a string, Go creates a new string. Byte slices, on the other hand, are mutable, so updating elements of a byte slice in place does not allocate new memory (although append allocates a new backing array when the capacity runs out).
So choosing the right structure could make a difference in your app performance.

How do you make a function detect whether a string is binary safe or not

How does one detect if a string is binary safe or not in Go?
A function like:
IsBinarySafe(str) // returns true if it's safe and false if it's not.
Everything after this is just things I have thought of or attempted in order to solve this:
I assumed that there must exist a library that already does this but had a tough time finding it. If there isn't one, how do you implement this?
I was thinking of some solution but wasn't really convinced they were good solutions.
One of them was to iterate over the bytes, and have a hash map of all the illegal byte sequences.
I also thought of maybe writing a regex with all the illegal strings but wasn't sure if that was a good solution.
I also was not sure if a sequence of bytes from other languages counted as binary safe. Say the typical golang example:
世界
Would:
IsBinarySafe("世界") // true or false?
Would it return true or false? I was assuming that all binary-safe strings should only use 1 byte per character, so I would iterate over them in the following way:
const nihongo = "日本語abc日本語"
for i, w := 0, 0; i < len(nihongo); i += w {
	runeValue, width := utf8.DecodeRuneInString(nihongo[i:])
	fmt.Printf("%#U starts at byte position %d\n", runeValue, i)
	w = width
}
and return false whenever the width was greater than 1. These are just some ideas I had in case there wasn't already a library for this, but I wasn't sure.
Binary safety has nothing to do with how wide a character is; it's more or less about non-printable characters, like null bytes and such.
From Wikipedia:
Binary-safe is a computer programming term mainly used in connection with string manipulating functions. A binary-safe function is essentially one that treats its input as a raw stream of data without any specific format. It should thus work with all 256 possible values that a character can take (assuming 8-bit characters).
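In Go terms, a string itself can hold all 256 byte values, NUL included, since it is just a read-only sequence of bytes. A sketch that builds such a string and round-trips it through []byte; `allBytes` is a hypothetical helper:

```go
package main

import "fmt"

// allBytes is a hypothetical helper that builds a string containing every
// byte value 0..255 exactly once, in order.
func allBytes() string {
	b := make([]byte, 256)
	for i := range b {
		b[i] = byte(i)
	}
	return string(b)
}

func main() {
	s := allBytes()
	back := []byte(s) // round-trips with no loss, NUL bytes included
	fmt.Println(len(s), s[0], s[255], len(back)) // 256 0 255 256
}
```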
I'm not sure what your goal is; almost all languages handle UTF-8/UTF-16 just fine now. However, for your specific question there's a rather simple solution:
package main

import (
	"fmt"
	"unicode"
)

// IsAsciiPrintable checks if s is ASCII and printable, i.e. it doesn't
// include tab, backspace, etc.
func IsAsciiPrintable(s string) bool {
	for _, r := range s {
		if r > unicode.MaxASCII || !unicode.IsPrint(r) {
			return false
		}
	}
	return true
}

func main() {
	s := "世界" // example input (assumed)
	fmt.Printf("len([]rune(s)) = %d, len([]byte(s)) = %d\n", len([]rune(s)), len([]byte(s)))
	fmt.Println(IsAsciiPrintable(s), IsAsciiPrintable("test"))
}
From unicode.IsPrint:
IsPrint reports whether the rune is defined as printable by Go. Such characters include letters, marks, numbers, punctuation, symbols, and the ASCII space character, from categories L, M, N, P, S and the ASCII space character. This categorization is the same as IsGraphic except that the only spacing character is ASCII space, U+0020.
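The difference from IsGraphic shows up with spacing characters other than ASCII space, such as the no-break space U+00A0. A sketch; `classify` is a hypothetical helper:

```go
package main

import (
	"fmt"
	"unicode"
)

// classify is a hypothetical helper reporting both predicates for r.
func classify(r rune) (print, graphic bool) {
	return unicode.IsPrint(r), unicode.IsGraphic(r)
}

func main() {
	for _, r := range []rune{' ', '\t', '\u00a0', 'A', '世'} {
		p, g := classify(r)
		fmt.Printf("%U IsPrint=%v IsGraphic=%v\n", r, p, g)
	}
}
```

Only U+00A0 differs here: it is graphic (category Zs) but not printable under Go's definition.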

Ada Return Concatenated String of Strings

I am trying to finish up a homework assignment, and am down to the last part. First, I'll show you the type that I am dealing with:
TYPE Book_Collection IS
   RECORD
      Books    : Book_Collection_Array;
      Max_Size : Integer;
      Size     : Integer;
   END RECORD;
TYPE Book_Type IS
   RECORD
      Title,
      Author,
      Publisher : Title_Str;
      Year      : Year_Type;
      Edition   : Natural;
      Isbn      : Isbn_Type;
      Price     : Dollars;
      Stock     : Natural;
      Format    : Format_Type;
   END RECORD;
Book_Collection_Array is an array of book_type. These are private types, so the array is bounded (1..200).
There is a function called ToString in a separate package that was provided to us, which takes a book_type as input and returns a string of all the elements of book_type. What I need to create is a function that takes a book_collection as a parameter and returns a string concatenating all of the strings returned by the provided ToString function for the book_types that exist in that book_collection. I have made multiple attempts, but I keep getting range check failures. Can anyone point me in the right direction?
Edit:
Thank you both for your help. I went the route of using an unbounded string and appending each string to it, then declaring the output as a constant string equal to the To_String of the unbounded string.
I'll give you a hint.
Ada strings ideally aren't treated or handled much like C or Java strings at all. C strings count on a trailing nul (0) character to designate the end of the data in a buffer. Java strings keep track of their own length, and a Java string variable can be freely reassigned to a string of any length. So typical string-handling idioms in those languages think nothing of progressively modifying a string variable.
Ada strings instead are expected to be perfectly-sized when created. Most routines will assume that every element in a string array contains valid character data, and any destination string you assign data into will be perfectly sized to hold it. If that isn't the case, usually an exception is raised (and most likely your program crashes).
There are several ways to deal with this when you are building a string. One way is to create a really big string object as a buffer, and keep a separate length variable to tell your code how much data is really in there at all times. Then when you call Ada string routines, you can feed them just the slice of data from the string that is valid, e.g.: Put_line (My_New_String(1..My_String_Length));
A better way is to just deal with perfectly-sized constant strings. For example, if you want to tack String1 and String2 together, the safe Ada way to do this is:
My_New_String : constant String := String1 & String2;
Then if you later want a string that is this string with String3 tacked on:
My_New_New_String : constant String := My_New_String & String3;

How do int-to-string casts work in Go?

I only started Go today, so this may be obvious but I couldn't find anything on it.
What does var x uint64 = 0x12345678; y := string(x) give y?
I know var x uint8 = 65; y := string(x) would give y the byte 65, character A, and common sense would suggest (since types larger than uint8 are allowed to be cast to strings) that they would simply be packed into native byte order (i.e. little-endian) and assigned to the variable.
This does not seem to be the case:
hex.EncodeToString([]byte(y)) ==> "efbfbd"
First thought says this is an address with the last byte being left off because of some weird null terminator thingy, but if I allocate two x and y variables with two different values and print them out I get the same result.
var x, x2 uint64 = 0x10000000, 0x20000000
y, y2 := string(x), string(x2)
fmt.Println(hex.EncodeToString([]byte(y))) // "efbfbd"
fmt.Println(hex.EncodeToString([]byte(y2))) // "efbfbd"
Maddeningly I can't find the implementation for the string type anywhere although I probably haven't looked hard enough.
This is covered in the Spec: Conversions: Conversions to and from a string type:
Converting a signed or unsigned integer value to a string type yields a string containing the UTF-8 representation of the integer. Values outside the range of valid Unicode code points are converted to "\uFFFD".
So effectively when you convert a numeric value to string, it can only yield a string having one rune (character). And since Go stores strings as the UTF-8 encoded byte sequences in memory, that is what you will see if you convert your string to []byte:
Converting a value of a string type to a slice of bytes type yields a slice whose successive elements are the bytes of the string.
When you try to convert the 0x12345678, 0x10000000 and 0x20000000 values to string, since they are outside the range of valid Unicode code points, as per the spec they are converted to "\uFFFD", which in UTF-8 encoding is []byte{239, 191, 189}; when encoded to a hex string:
fmt.Println(hex.EncodeToString([]byte("\uFFFD"))) // Output: efbfbd
Or simply:
fmt.Printf("%x", "\uFFFD") // Output: efbfbd
Read the blog post Strings, bytes, runes and characters in Go for more details about string internals.
And btw since Go 1.5 the Go runtime is implemented (mostly) in Go, so these conversions are now implemented in Go and can be found in the runtime package: runtime/string.go, look for the intstring() function.

Resources