Golang: what is the optimal way to append and remove a character in a string?

In Go, we have strings.Builder to append characters, which is better than using s = s + string(character). But is there an optimal way to remove the last character, other than s = s[:len(s)-sizeOfLastCharacter]?

Slicing is a very efficient operation, as detailed in slice internals:
Slicing does not copy the slice's data. It creates a new slice value that points to the original array. This makes slice operations as efficient as manipulating array indices.
Effectively, removing the last element of a slice means creating a new slice descriptor pointing to the same array but with a smaller length. Short of direct access to the internals of a slice, you won't find a more efficient solution.
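A minimal sketch of that idea for a []byte, showing that dropping the last element only shrinks the length while the capacity and the backing array stay the same. The helper name dropLast is hypothetical, used here only for illustration.

```go
package main

import "fmt"

// dropLast is a hypothetical helper: it returns b without its last element.
// Nothing is copied; the result shares b's backing array.
func dropLast(b []byte) []byte {
	if len(b) == 0 {
		return b
	}
	return b[:len(b)-1]
}

func main() {
	b := []byte("hello")
	c := dropLast(b)

	fmt.Println(string(c))                // hell
	fmt.Println(len(c), cap(c) == cap(b)) // 4 true
	fmt.Println(&c[0] == &b[0])           // true: same backing array
}
```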

Use utf8.DecodeLastRuneInString() to find out how many bytes the last rune "occupies", and slice the original string based on that. Slicing a string results in a string value that shares the backing array with the original, so the string content is not copied; only a new string header is created, which is just 2 word-sized values, a pointer and a length (see reflect.StringHeader).
For example:
s := "Hello, 世界"
r, size := utf8.DecodeLastRuneInString(s)
if r != utf8.RuneError {
	s = s[:len(s)-size]
}
fmt.Println(s)
Output:
Hello, 世
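If you need both operations on the same buffer, one option is to keep a plain []byte instead of a strings.Builder (which has no way to shrink). The sketch below is a hypothetical type, not a standard library API; it assumes Go 1.18+ for utf8.AppendRune and uses utf8.DecodeLastRune, the []byte counterpart of DecodeLastRuneInString. It checks size rather than RuneError so that a string legitimately ending in U+FFFD is still trimmed.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// runeBuf is a hypothetical type: a byte slice that supports
// appending a rune and removing the last rune.
type runeBuf struct {
	b []byte
}

// AppendRune UTF-8 encodes r onto the end of the buffer.
func (rb *runeBuf) AppendRune(r rune) {
	rb.b = utf8.AppendRune(rb.b, r)
}

// RemoveLast drops the last rune by reslicing; no bytes are copied.
// It reports false if the buffer is empty.
func (rb *runeBuf) RemoveLast() bool {
	_, size := utf8.DecodeLastRune(rb.b)
	if size == 0 {
		return false
	}
	rb.b = rb.b[:len(rb.b)-size]
	return true
}

// String copies the current bytes into a new string.
func (rb *runeBuf) String() string { return string(rb.b) }

func main() {
	var rb runeBuf
	for _, r := range "Hello, 世界" {
		rb.AppendRune(r)
	}
	rb.RemoveLast()
	fmt.Println(rb.String()) // Hello, 世
}
```

The trade-off: strings.Builder's String() avoids a copy, while this String() copies, but removal here is a constant-time reslice.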

Related

String decompression : Reduce time and space complexity

N rounds of compression are run on a string, where each round replaces some character pattern with one special character (using a dictionary).
Given this compressed string and the dictionary used for compression, we need to find the original string.
For example:
Dictionary used for compression:
b12k -> ?
a?l -> #
#mn -> !
So, the string ab12klmn is compressed as !
What data structure suits best to store this dictionary such that the decompression is O(n) operation with least possible extra space used?
What I've tried:
This was an interview question. I stored the dictionary in a map, with the special character (the target of each compression rule) as the key and the fully decompressed string as the value.
Then I traversed the given string, replacing each special character with its expansion.
For example:
! -> ab12klmn
# -> ab12kl
? -> b12k
Then, to reduce the duplication of string patterns, I organized this dictionary as a tree-like structure, but the interviewer wasn't satisfied.
Where can I improve this solution?
I understand that we need to get back the original string from the given compressed string.
The best data structure you can use here is a vector (dynamic array) of strings indexed by character code. I will try to explain why it is a good fit for this problem.
With a tree-based map, each key lookup costs O(log n). With a vector, if you know the position of the entry you are looking for, the lookup is O(1).
A vector of strings does not waste extra memory blocks, and neither does a map. A fixed-size 2-D character array, however, would waste memory on unused cells.
Since there are only 256 possible characters, we will store the dictionary as follows: a vector of strings (effectively a 2-D vector of characters) with at most 256 rows, one per character. For this example
b12k -> ?
a?l -> #
#mn -> !
So we will store "b12k" at v[63], as the ASCII value of '?' is 63. Similarly, we will store "a?l" at v[35], as the ASCII value of '#' is 35, and so on.
Now, how to find the original string:
1. Start from the compressed string.
2. Initialize the string that will store the final answer; call it origString = "".
3. Traverse the string. If the current character is not a special character, append it to origString.
4. If it is a special character, go to the entry at that character's ASCII value in the vector.
5. Process that entry's string the same way, i.e. go back to step 3 with it.
The pseudo-code for this is
origString = ""
func getOriginalFromCompressed(string s)
    for i = [0 : s.length()-1]
        if (v[s[i]].length() > 0) getOriginalFromCompressed(v[s[i]])  // special character: expand it recursively
        else origString = stringConcat(origString, s[i])  // ordinary character: add it to the final answer
    end for
end func
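origString now holds the original string.
Each lookup is O(1), so the running time is linear in the length of the decompressed string (every character appended to origString is produced once), and the dictionary takes O(n) space,
where n = the sum of the lengths of all the strings in the given dictionary.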

Does slice of string perform copy of underlying data?

I am trying to efficiently count runes from a utf-8 string using the utf8 library. Is this example optimal in that it does not copy the underlying data?
https://golang.org/pkg/unicode/utf8/#example_DecodeRuneInString
package main

import (
	"fmt"
	"unicode/utf8"
)

func main() {
	str := "Hello, 世界" // let's assume a runtime-provided string
	for len(str) > 0 {
		r, size := utf8.DecodeRuneInString(str)
		fmt.Printf("%c %v\n", r, size)
		str = str[size:] // performs copy?
	}
}
I found StringHeader in the (unsafe) reflect library. Is this the exact structure of a string in Go? If so, it is conceivable that slicing a string merely updates Data or allocates a new StringHeader altogether.
type StringHeader struct {
	Data uintptr
	Len  int
}
Bonus: where can I find the code that performs string slicing so that I could look it up myself? Any of these?
https://golang.org/src/runtime/slice.go
https://golang.org/src/runtime/string.go
This related SO answer suggests that runtime-strings incur a copy when converted from string to []byte.
Slicing Strings
does slice of string perform copy of underlying data?
No it does not. See this post by Russ Cox:
A string is represented in memory as a 2-word structure containing a pointer to the string data and a length. Because the string is immutable, it is safe for multiple strings to share the same storage, so slicing s results in a new 2-word structure with a potentially different pointer and length that still refers to the same byte sequence. This means that slicing can be done without allocation or copying, making string slices as efficient as passing around explicit indexes.
-- Go Data Structures
Slices, Performance, and Iterating Over Runes
A slice is basically three things: a length, a capacity, and a pointer to a location in an underlying array.
As such, slices themselves are not very large: ints and a pointer (possibly some other small things in implementation detail). So the allocation required to make a copy of a slice is very small, and doesn't depend on the size of the underlying array. And no new allocation is required when you simply update the length, capacity, and pointer location, such as on line 2 of:
foo := []int{3, 4, 5, 6}
foo = foo[1:]
Rather, it's when a new underlying array has to be allocated that a performance impact is felt.
Strings in Go are immutable. So to change a string you need to make a new string. However, strings are closely related to byte slices, e.g. you can create a byte slice from a string with
foo := `here's my string`
fooBytes := []byte(foo)
I believe that will allocate a new array of bytes, because:
a string is in effect a read-only slice of bytes
according to the Go Blog (see Strings, bytes, runes and characters in Go). In general you can use a slice to change the contents of an underlying array, so to produce a usable byte slice from a string you would have to make a copy to keep the user from changing what's supposed to be immutable.
You could use performance profiling and benchmarking to gain further insight into the performance of your program.
Once you have your slice of bytes, fooBytes, reslicing it does not allocate a new array, it just allocates a new slice, which is small. This appears to be what slicing a string does as well.
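A short sketch of both points, reusing the answer's foo and fooBytes names (demo is a hypothetical helper): writing to the converted slice leaves the string untouched, while writing through a reslice is visible in the original slice.

```go
package main

import "fmt"

// demo shows that []byte(s) copies, while reslicing the result does not.
func demo() (orig, modified string) {
	foo := `here's my string`
	fooBytes := []byte(foo) // the conversion copies the bytes

	fooBytes[0] = 'H' // modifies the copy only; foo is unchanged

	tail := fooBytes[7:] // reslicing: new slice header, same backing array
	tail[0] = 'M'        // so this write is visible through fooBytes

	return foo, string(fooBytes)
}

func main() {
	orig, modified := demo()
	fmt.Println(orig)     // here's my string
	fmt.Println(modified) // Here's My string
}
```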
Note that you don't need to use the utf8 package to count runes in a UTF-8 string, though you may proceed that way if you like. Go handles UTF-8 natively. However, if you want to iterate over characters, you can't treat the string as a slice of bytes, because you could have multibyte characters. Instead, you can represent it as a slice of runes:
foo := `here's my string`
fooRunes := []rune(foo)
This operation of converting a string to a slice of runes is fast in my experience (trivial in benchmarks I've done, but there may be an allocation). Now you can iterate across fooRunes to count runes, no utf8 package required. Alternatively, you can skip the explicit []rune(foo) conversion and do it implicitly by using a for ... range loop on the string, because those are special:
A for range loop, by contrast, decodes one UTF-8-encoded rune on each iteration. Each time around the loop, the index of the loop is the starting position of the current rune, measured in bytes, and the code point is its value.
-- Strings, bytes, runes and characters in Go

[]byte(string) vs []byte(*string)

I'm curious as to why Go doesn't provide a []byte(*string) method. From a performance perspective, wouldn't []byte(string) make a copy of the input argument and add more cost (though this seems odd: since strings are immutable, why copy them)?
[]byte("something") is not a function (or method) call, it's a type conversion.
The type conversion "itself" does not copy the value. Converting a string to a []byte, however, does, and it needs to: the resulting byte slice is mutable, so if no copy were made, you could modify the string value (the content of the string), which must stay immutable, as the Spec: String types section dictates:
Strings are immutable: once created, it is impossible to change the contents of a string.
Note that there are a few cases when a string <=> []byte conversion does not make a copy, because it is optimized "away" by the compiler. These are rare, "hard-coded" cases where there is proof that the immutable string cannot / will not end up modified.
One such example is looking up a value in a map whose key type is string, when you index the map with a []byte converted to string (source):
key := []byte("some key")
var m map[string]T
// ...
v, ok := m[string(key)] // Copying key here is optimized away
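A runnable version of that fragment, with int standing in for the answer's placeholder T and a hypothetical helper named lookup. The copy elision described above is a gc compiler optimization; the code is correct either way, it may just be cheaper.

```go
package main

import "fmt"

// lookup indexes a map[string]int with a []byte key. The gc compiler
// recognizes m[string(key)] and can avoid copying key.
func lookup(m map[string]int, key []byte) (int, bool) {
	v, ok := m[string(key)]
	return v, ok
}

func main() {
	m := map[string]int{"some key": 42}
	fmt.Println(lookup(m, []byte("some key"))) // 42 true
	fmt.Println(lookup(m, []byte("other")))    // 0 false
}
```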
Another optimization is when ranging over the bytes of a string that is explicitly converted to a byte slice:
s := "something"
for i, v := range []byte(s) { // Copying s is optimized away
	// ...
}
(Note that without the conversion the for range would iterate over the runes of the string and not over its UTF8-encoded bytes.)
I'm curious as why Golang doesn't provide a []byte(*string) method.
Because it doesn't make sense.
A pointer (to any type) cannot be represented (in any obviously meaningful way) as a []byte.
From a performance perspective, wouldn't []byte(string) make a copy of the input argument and add more cost (though this seems odd since strings are immutable, why copy them)?
Converting from []byte to string (and vice versa) does involve a copy, because strings are immutable, but byte slices are not.
However, using a pointer wouldn't solve that problem.

What is the difference between the string and []byte in Go?

s := "some string"
b := []byte(s) // convert string -> []byte
s2 := string(b) // convert []byte -> string
what is the difference between the string and []byte in Go?
When should I use one or the other?
Why?
bb := []byte{'h','e','l','l','o',127}
ss := string(bb)
fmt.Println(ss)
hello
The output is just "hello", without the 127, which seems weird to me.
string and []byte are different types, but they can be converted to one another:
3. Converting a slice of bytes to a string type yields a string whose successive bytes are the elements of the slice.
4. Converting a value of a string type to a slice of bytes type yields a slice whose successive elements are the bytes of the string.
Blog: Arrays, slices (and strings): The mechanics of 'append':
Strings are actually very simple: they are just read-only slices of bytes with a bit of extra syntactic support from the language.
Also read: Strings, bytes, runes and characters in Go
When to use one over the other?
Depends on what you need. Strings are immutable, so they can be shared, and you have a guarantee they won't get modified.
Byte slices can be modified (meaning the content of the backing array).
Also, if you need to frequently convert a string to a []byte (e.g. because you need to write it into an io.Writer), you should consider storing it as a []byte in the first place.
Also note that you can have string constants but there are no slice constants. This may be a small optimization. Also note that:
The expression len(s) is constant if s is a string constant.
Also, if you are using code already written (either the standard library, 3rd-party packages or your own), in most cases it is given what parameters and values you have to pass or are returned. For example, if you read data from an io.Reader, you need to pass a []byte to receive the bytes read; you can't use a string for that.
This example:
bb := []byte{'h','e','l','l','o',127}
What happens here is that you used a composite literal (slice literal) to create and initialize a new slice of type []byte (using Short variable declaration). You specified constants to list the initial elements of the slice. You also used a byte value 127 which - depending on the platform / console - may or may not have a visual representation.
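The byte 127 (the ASCII DEL control character) is still in the string; it just has no visible representation when printed. A small sketch that makes it visible with the %q verb, which escapes non-printable bytes (describe is a hypothetical helper):

```go
package main

import "fmt"

// describe returns the length of string(b) and a %q-quoted form of it.
func describe(b []byte) (int, string) {
	s := string(b)
	return len(s), fmt.Sprintf("%q", s)
}

func main() {
	bb := []byte{'h', 'e', 'l', 'l', 'o', 127}
	n, q := describe(bb)
	fmt.Println(n, q) // 6 "hello\x7f"
}
```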
Late, but I hope this helps.
In simple words:
Bit: 0 and 1 is how machines represent all information.
Byte: 8 bits. UTF-8 encodes each character as 1 to 4 bytes.
[]type: a slice of a given data type. Slices are dynamically sized arrays.
[]byte: a byte slice, i.e. a dynamically sized array of bytes. Each element is a single byte, so one UTF-8 character may occupy several elements.
String: a read-only slice of bytes, i.e. immutable.
With all this in mind:
s := "Go"
bs := []byte(s)
fmt.Printf("%s", bs) // Output: Go
fmt.Printf("%d", bs) // Output: [71 111]
or
bs := []byte{71, 111}
fmt.Printf("%s", bs) // Output: Go
%s formats the byte slice as a string
%d prints the decimal value of each byte
IMPORTANT:
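As strings are immutable, they cannot be changed in memory: each time you add something to a string (e.g. with +), Go creates a new string in memory. (Slicing a string to drop characters, however, only creates a new string header and copies no data.) On the other hand, byte slices are mutable, so you can update their elements in place without creating anything new in memory (append allocates only when the capacity is exceeded).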
So choosing the right structure could make a difference in your app performance.

Iterating over go string and making string from chars in go

I started learning Go and I want to implement some algorithm. I can iterate over strings and get chars, but these chars are Unicode numbers.
How do I concatenate chars into strings in Go? Do you have some reference? I was unable to find anything about primitives on the official page.
Iterating over strings using range gives you Unicode characters, while iterating over a string using an index gives you bytes. See the spec for runes and strings as well as their conversions.
As The New Idiot mentioned, strings can be concatenated using the + operator.
The conversion from character to string is two-fold. You can convert a byte (or byte sequence) to a string:
string(byte('A'))
or you can convert a rune (or rune sequence) to a string:
string(rune('µ'))
The difference is that runes represent Unicode characters, while bytes represent 8-bit values.
But all of this is mentioned in the respective sections of the spec I linked above.
It's quite easy to understand, you should definitely read it.
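A sketch putting those conversions side by side (conversions is a hypothetical helper). It assumes 'µ' is U+00B5 MICRO SIGN, written as '\u00b5' to be unambiguous, which UTF-8 encodes as the two bytes 0xC2 0xB5.

```go
package main

import "fmt"

// conversions shows byte, rune, byte-sequence and rune-sequence
// conversions to string.
func conversions() (a, mu, fromBytes, fromRunes string) {
	a = string(byte('A'))                     // byte -> string
	mu = string(rune('\u00b5'))               // rune -> string
	fromBytes = string([]byte{0xC2, 0xB5})    // byte sequence -> string
	fromRunes = string([]rune{'a', '\u00b5'}) // rune sequence -> string
	return
}

func main() {
	a, mu, fromBytes, fromRunes := conversions()
	fmt.Println(a, mu, len(mu), fromBytes == mu, fromRunes) // A µ 2 true aµ

	// Indexing a string yields bytes; range yields runes (i is the byte offset).
	for i := 0; i < len(fromRunes); i++ {
		fmt.Printf("byte %d: %#x\n", i, fromRunes[i])
	}
	for i, r := range fromRunes {
		fmt.Printf("rune at byte %d: %c\n", i, r)
	}
}
```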
You can convert a []rune to a string directly:
string([]rune{'h', 'e', 'l', 'l', 'o', '☃'})
http://play.golang.org/p/P9vKXlo47c
As for reference, it's in the Conversions section of the Go spec, in the subsection titled "Conversions to and from a string type":
http://golang.org/ref/spec#Conversions
As for concatenation, you probably don't want to concatenate every single character with the + operator, since that will perform a lot of copying under the hood. If you're getting runes one at a time and you're not building an intermediate slice of runes, you most likely want to use a bytes.Buffer, which has a WriteRune method for this sort of thing. http://golang.org/pkg/bytes/#Buffer.WriteRune
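A minimal sketch of building a string one rune at a time with bytes.Buffer, alongside strings.Builder (Go 1.10+), which has the same WriteRune method and avoids the final copy in String(). The helper names are hypothetical.

```go
package main

import (
	"bytes"
	"fmt"
	"strings"
)

// buildBuilder collects runes with strings.Builder.
func buildBuilder(runes []rune) string {
	var sb strings.Builder
	for _, r := range runes {
		sb.WriteRune(r)
	}
	return sb.String()
}

// buildBuffer collects runes with bytes.Buffer.
func buildBuffer(runes []rune) string {
	var buf bytes.Buffer
	for _, r := range runes {
		buf.WriteRune(r)
	}
	return buf.String()
}

func main() {
	runes := []rune{'h', 'e', 'l', 'l', 'o', '☃'}
	fmt.Println(buildBuilder(runes)) // hello☃
	fmt.Println(buildBuffer(runes))  // hello☃
}
```

Both grow an internal byte slice instead of copying the whole string on every step, which is what makes them cheaper than repeated +.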
Use +:
str = str + "a"
You can try something like this:
string1 := "abc"
character1 := byte('A')
string1 += string(character1)
Even this answer might be of help.
Definitely worth reading @nemo's post.
