I have a server that has a function to return the registration date of a user. Any user can see any non-hidden user (in this example only user2 is hidden). The server gets a dateRequest from a client and uses id from that to find the corresponding user's date in the user file:
package main
import (
"bytes"
"fmt"
)
const datelen = 10
// format: `name 0xfe date age 0xfd password`; users are separated by 0xff
var userfile = []byte("admin\xfe2014-01-0140\xfdadminpassword\xffuser1\xfe2014-03-0423\xfduser1password\xffuser2\xfe2014-09-2736\xfduser2password")
func main() {
c := clientInfo{0, 0, 0}
fmt.Println(string(getDate(dateRequest{c, user{0, "admin"}})))
fmt.Println(string(getDate(dateRequest{c, user{0, "admin______________"}})))
fmt.Println(string(getDate(dateRequest{c, user{1, "user1"}})))
//fmt.Println(string(getDate(dateRequest{c,user{2,"user2"}}))) // panic
fmt.Println(string(getDate(dateRequest{c, user{1, "user1_________________________________________________"}})))
}
func getDate(r dateRequest) []byte {
if r.id == 2 {
panic("hidden user")
}
user := bytes.Split(userfile, []byte{0xff})[r.id]
publicSection := bytes.Split(user, []byte{0xfd})[0]
return publicSection[len(r.username)+1 : len(r.username)+1+datelen]
}
type dateRequest struct {
clientInfo
user
}
type clientInfo struct {
reqTime uint64
ip uint32
ver uint32
}
type user struct {
id int
username string
}
$ go run a.go
2014-01-01
dminpasswo
2014-03-04
r2password
As you can see, it correctly returns the users' registration dates, but if the username in the request carries extra bytes, part of the user's password is returned instead. Not only that, but if you keep adding bytes, it returns data from user2, which is supposed to be hidden. Why?
When executing main's third line of code, user and publicSection in getDate are "admin\xfe2014-01-0140\xfdadminpassword" and "admin\xfe2014-01-0140". But then it returns "dminpasswo". How is slicing publicSection ("admin\xfe2014-01-0140") returning "dminpasswo"? It looks like a buffer overrun, but that shouldn't happen because Go is memory-safe. I even tried reading past the end of publicSection by printing publicSection[len(publicSection)], but that panics as expected.
I also tried replacing all the []byte with string, and that fixes the issue for some reason.
Slice expressions check the bounds of the upper index against the slice capacity, not just the length of the slice. In effect, you can slice past the length of the slice. You cannot however slice outside the bounds of the underlying array to access uninitialized memory.
http://play.golang.org/p/oIxXLG-YEV
s := make([]int, 5, 10)
copy(s, []int{1, 2, 3, 4, 5})
fmt.Printf("len:%d cap:%d\n", len(s), cap(s))
// > len:5 cap:10
fmt.Printf("raw slice: %+v\n", s)
// > raw slice: [1 2 3 4 5]
fmt.Printf("sliced past length: %+v\n", s[:10])
// > sliced past length: [1 2 3 4 5 0 0 0 0 0]
// panics
_ = s[:11]
// > panic: runtime error: slice bounds out of range
If you really want to prevent slicing past the length of a slice, in Go 1.2 or later you can set the capacity as the third index when slicing.
// set the capacity to 5
s = s[:5:5]
// now this will panic
_ = s[:6]
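To see the same effect in isolation, here is a minimal sketch. The record layout and the helper trySlice are made up for illustration and are not taken from the question. A subslice keeps the capacity of the rest of the backing array, so slicing it past its length reaches the neighbouring bytes; capping it with a three-index slice makes the same expression panic. As far as I can tell from the Go 1.10 release notes, bytes.Split now returns subslices whose capacity equals their length, so the original program may well panic instead of leaking on newer releases.

```go
package main

import "fmt"

// trySlice (hypothetical helper) returns b[lo:hi] and reports whether the
// slice expression succeeded, recovering from the bounds-check panic.
func trySlice(b []byte, lo, hi int) (out []byte, ok bool) {
	defer func() {
		if recover() != nil {
			out, ok = nil, false
		}
	}()
	return b[lo:hi], true
}

func main() {
	record := []byte("admin|secretpw")

	public := record[:5]   // len 5, capacity reaches the end of record
	capped := record[:5:5] // len 5, capacity limited to 5

	leak, ok := trySlice(public, 6, 14)
	fmt.Printf("%q %v\n", leak, ok) // "secretpw" true

	_, ok = trySlice(capped, 6, 14)
	fmt.Println(ok) // false: the slice expression panicked
}
```

Capping the capacity where the public part is produced means a later out-of-range slice fails loudly instead of leaking adjacent data.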
I've got this small code snippet to test two ways of converting a byte slice to a string: one function allocates a new string, the other uses unsafe pointer casts to construct the string in place without allocating new memory:
package main
import (
"fmt"
"reflect"
"unsafe"
)
func byteToString(b []byte) string {
return string(b)
}
func byteToStringNoAlloc(b []byte) string {
if len(b) == 0 {
return ""
}
sh := reflect.StringHeader{uintptr(unsafe.Pointer(&b[0])), len(b)}
return *(*string)(unsafe.Pointer(&sh))
}
func main() {
b := []byte("hello")
fmt.Printf("1st element of slice: %v\n", &b[0])
str := byteToString(b)
sh := (*reflect.StringHeader)(unsafe.Pointer(&str))
fmt.Printf("New alloc: %v\n", sh)
toStr := byteToStringNoAlloc(b)
shNoAlloc := (*reflect.StringHeader)(unsafe.Pointer(&toStr))
fmt.Printf("No alloc: %v\n", shNoAlloc) // why different from &b[0]
}
I run this program under go 1.13:
1st element of slice: 0xc000076068
New alloc: &{824634204304 5}
No alloc: &{824634204264 5}
I expected the "1st element of slice" line to print the same address as "No alloc", but actually they look very different. Where did I go wrong?
First of all, a type conversion calls an internal runtime function; in this case it is slicebytetostring.
https://golang.org/src/runtime/string.go?h=slicebytetostring#L75
It copies the slice's content into newly allocated memory.
In the second case you build a new string header by hand and point it at the slice's content, so the string becomes a second, unofficial holder of that memory.
The problem is that the garbage collector does not handle this case: the Data field of the header is a plain uintptr, so the header is treated as a standalone structure with no relation to the slice that actually holds the content. The resulting string is therefore only valid while the real holders of the content are alive (the string header itself does not count).
So once the garbage collector sweeps the actual content, your string still points to the same address, which is now freed memory, and touching it causes a panic or undefined behavior.
By the way, there is no need for the reflect package and its headers, because a direct cast already produces a new header:
*(*string)(unsafe.Pointer(&byte_slice))
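One more observation on the printed values: 0xc000076068 in hexadecimal is 824634204264 in decimal, so the "1st element of slice" and "No alloc" lines in the question show the same address, just formatted differently (%v prints a pointer in hex but a uintptr field in decimal). The sketch below checks that conversion and, as an assumption on my part about what a current toolchain offers, uses unsafe.String and unsafe.StringData (Go 1.20 or later) instead of a hand-built header. hexAddr and viewAsString are illustrative names.

```go
package main

import (
	"fmt"
	"unsafe"
)

// hexAddr formats a numeric address the way %v prints a pointer.
func hexAddr(addr uint64) string {
	return fmt.Sprintf("%#x", addr)
}

// viewAsString (hypothetical helper) returns a string sharing b's memory.
// It relies on unsafe.String, which needs Go 1.20 or later. The caller
// must not modify b while the string is in use.
func viewAsString(b []byte) string {
	if len(b) == 0 {
		return ""
	}
	return unsafe.String(&b[0], len(b))
}

func main() {
	// The "No alloc" Data value from the question, printed in hex.
	fmt.Println(hexAddr(824634204264)) // 0xc000076068

	b := []byte("hello")
	s := viewAsString(b)
	// The string's data pointer is the address of the slice's first byte.
	fmt.Println(unsafe.StringData(s) == &b[0]) // true
}
```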
In my MPI code in C, I'm receiving a word from each of my slave processes. I want to add all these words to a char array on the master side (part of the code is below). I can print these words but not collect them into a single char array.
(I assume a maximum word length of 10, and slavenumber is the number of slaves.)
char* word = (char*)malloc(sizeof(char)*10);
char words[slavenumber*10];
for (int p = 0; p<slavenumber; p++){
MPI_Recv(word, 10, MPI_CHAR, p, 0,MPI_COMM_WORLD, MPI_STATUS_IGNORE);
printf("Word: %s\n", word); //it works fine
words[p*10] = *word; //This does not work, i think there is a problem here.
}
printf(words); //This does not work correctly, it gives something like: ��>;&�>W�
Can anybody help me on this?
Let's break it down line by line
// allocate a buffer large enough to hold 10 elements of type `char`
char* word = (char*)malloc(sizeof(char)*10);
// define a variable-length-array large enough to
// hold 10*slavenumber elements of `char`
char words[slavenumber*10];
for (int p = 0; p<slavenumber; p++){
// dereference `word` which is exactly the same as writing
// `word[0]` assigning it to `words[p*10]`
words[p*10] = *word;
// words[p*10+1] to words[p*10+9] are unchanged,
// i.e. uninitialized
}
// printing from an array. For this to work properly all
// accessed elements must be initialized and the buffer
// terminated by a null byte. You have neither
printf(words);
Because you left elements uninitialized and didn't null-terminate the buffer, you're invoking undefined behavior. Be happy that you didn't get demons crawling out of your nose.
In all seriousness though, in C you cannot copy strings by mere assignment. Your use case calls for strncpy.
for (int p = 0; p<slavenumber; p++){
strncpy(&words[p*10], word, 10);
}
I was optimizing code that used a map[string]string where each value was only ever "A" or "B". So I thought a map[string]bool would obviously be much better, as the map holds around 50 million elements.
var a = "a"
var a2 = "Why This ultra long string take the same amount of space in memory as 'a'"
var b = true
var c = map[string]string{} // must be initialized: assigning to a nil map panics
var d = map[string]bool{}
c["t"] = "A"
d["t"] = true
fmt.Printf("a: %T, %d\n", a, unsafe.Sizeof(a))
fmt.Printf("a2: %T, %d\n", a2, unsafe.Sizeof(a2))
fmt.Printf("b: %T, %d\n", b, unsafe.Sizeof(b))
fmt.Printf("c: %T, %d\n", c, unsafe.Sizeof(c))
fmt.Printf("d: %T, %d\n", d, unsafe.Sizeof(d))
fmt.Printf("c2: %T, %d\n", c, unsafe.Sizeof(c["t"]))
fmt.Printf("d2: %T, %d\n", d, unsafe.Sizeof(d["t"]))
And the result was:
a: string, 8
a2: string, 8
b: bool, 1
c: map[string]string, 4
d: map[string]bool, 4
c2: map[string]string, 8
d2: map[string]bool, 1
While testing I found something weird: why does a2, with a really long string, use 8 bytes, the same as a, which has only one letter?
unsafe.Sizeof() does not recursively go into data structures, it just reports the "shallow" size of the value passed. Quoting from its doc:
The size does not include any memory possibly referenced by x. For instance, if x is a slice, Sizeof returns the size of the slice descriptor, not the size of the memory referenced by the slice.
Maps in Go are implemented as pointers, so unsafe.Sizeof(somemap) will report the size of that pointer.
Strings in Go are just headers containing a pointer and a length. See reflect.StringHeader:
type StringHeader struct {
Data uintptr
Len int
}
So unsafe.Sizeof(somestring) will report the size of the above struct, which is independent of the length of the string value (which is the value of the Len field).
To get the actual memory requirement of a map ("deeply"), see How much memory do golang maps reserve? and also How to get memory size of variable in Go?
Go stores the UTF-8 encoded byte sequences of string values in memory. The builtin function len() reports the byte-length of a string, so basically the memory required to store a string value in memory is:
var str string = "some string"
stringSize := len(str) + int(unsafe.Sizeof(str))
Also don't forget that a string value may be constructed by slicing another, bigger string, and thus even if the original string is no longer referenced (and thus no longer needed), the bigger backing array will still be required to be kept in memory for the smaller string slice.
For example:
s := "some loooooooong string"
s2 := s[:2]
Here, even though the memory requirement for s2 would be len(s2) + unsafe.Sizeof(s2) = 2 + unsafe.Sizeof(s2), the whole backing array of s will still be retained.
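The shallow versus deep distinction can be checked directly. This is only a sketch: deepStringSize is a name I made up, and it simply applies the header-plus-bytes estimate from above, ignoring any sharing of backing arrays between strings.

```go
package main

import (
	"fmt"
	"unsafe"
)

// deepStringSize (hypothetical helper) estimates the memory behind a
// string value: the fixed-size header plus the bytes it points to.
func deepStringSize(s string) uintptr {
	return unsafe.Sizeof(s) + uintptr(len(s))
}

func main() {
	short := "a"
	long := "Why does this ultra long string take the same space as 'a'?"

	// The header size depends only on the type, never on the content.
	fmt.Println(unsafe.Sizeof(short) == unsafe.Sizeof(long)) // true

	// The deep estimate grows with the length of the string.
	fmt.Println(deepStringSize(short), deepStringSize(long))
}
```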
Let's take the following scenario:
a := make([]int, 10000)
a = a[len(a):]
As we know from "Go Slices: Usage and Internals" there's a "possible gotcha" in downslicing. For any slice a if you do a[start:end] it still points to the original memory, so if you don't copy, a small downslice could potentially keep a very large array in memory for a long time.
However, this case is chosen to result in a slice that should not only have zero length, but zero capacity. A similar question could be asked for the construct a = a[0:0:0].
Does the current implementation still maintain a pointer to the underlying memory, preventing it from being garbage collected, or does it recognize that a slice with no len or cap could not possibly reference anything, and thus garbage collect the original backing array during the next GC pause (assuming no other references exist)?
Edit: Playing with reflect and unsafe on the Playground reveals that the pointer is non-zero:
func main() {
a := make([]int, 10000)
a = a[len(a):]
aHeader := *(*reflect.SliceHeader)((unsafe.Pointer(&a)))
fmt.Println(aHeader.Data)
a = make([]int, 0, 0)
aHeader = *(*reflect.SliceHeader)((unsafe.Pointer(&a)))
fmt.Println(aHeader.Data)
}
http://play.golang.org/p/L0tuzN4ULn
However, this doesn't necessarily answer the question because the second slice that NEVER had anything in it also has a non-zero pointer as the data field. Even so, the pointer could simply be uintptr(&a[len(a)-1]) + sizeof(int) which would be outside the block of backing memory and thus not trigger actual garbage collection, though this seems unlikely since that would prevent garbage collection of other things. The non-zero value could also conceivably just be Playground weirdness.
As seen in your example, re-slicing copies the slice header, including the data pointer to the new slice, so I put together a small test to try and force the runtime to reuse the memory if possible.
I'd like this to be more deterministic, but at least with go1.3 on x86_64, it shows that the memory used by the original array is eventually reused (it does not work in the playground in this form).
package main
import (
"fmt"
"unsafe"
)
func check(i uintptr) {
fmt.Printf("Value at %d: %d\n", i, *(*int64)(unsafe.Pointer(i)))
}
func garbage() string {
s := ""
for i := 0; i < 100000; i++ {
s += "x"
}
return s
}
func main() {
s := make([]int64, 100000)
s[0] = 42
p := uintptr(unsafe.Pointer(&s[0]))
check(p)
z := s[0:0:0]
s = nil
fmt.Println(z)
garbage()
check(p)
}
I'm having a look at Go, which looks quite promising.
I am trying to figure out how to get the size of a Go struct, for example something like
type Coord3d struct {
X, Y, Z int64
}
Of course I know that it's 24 bytes, but I'd like to know it programmatically.
Do you have any ideas how to do this?
Roger already showed how to use the Sizeof function from the unsafe package. Make sure you read this before relying on the value returned by the function:
The size does not include any memory possibly referenced by x. For
instance, if x is a slice, Sizeof returns the size of the slice
descriptor, not the size of the memory referenced by the slice.
In addition to this I wanted to explain how you can easily calculate the size of any struct using a couple of simple rules. And then how to verify your intuition using a helpful service.
The size depends on the types it consists of and the order of the fields in the struct (because different padding will be used). This means that two structs with the same fields can have different size.
For example this struct will have a size of 32
struct {
a bool
b string
c bool
}
and a slight modification will have a size of 24 (a 25% difference just due to a more compact ordering of fields)
struct {
a bool
c bool
b string
}
In the second example we removed one of the paddings by moving a field into the space the previous padding occupied. An alignment can be 1, 2, 4, or 8 bytes. Padding is the space inserted so that the next field starts at a multiple of its alignment (basically wasted space).
Knowing this rule and remembering that (on a 64-bit platform):
bool, int8, uint8 - 1 byte
int16, uint16 - 2 bytes
int32, uint32, float32 - 4 bytes
int64, uint64, float64, pointer - 8 bytes
string - 16 bytes (2 words of 8 bytes)
any slice - 24 bytes (3 words of 8 bytes), so []bool and [][][]string have the same size (do not forget to reread the citation I added in the beginning)
an array of length n - n times the size of its element type
Armed with the knowledge of padding, alignment and sizes in bytes, you can quickly figure out how to improve your struct (but still it makes sense to verify your intuition using the service).
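If you would rather check the two example structs locally than reason about them by hand, a minimal sketch follows. The type names padded and compact and the two helper functions are mine. On a typical 64-bit platform I would expect it to report 32 and 24; the exact numbers depend on the target architecture.

```go
package main

import (
	"fmt"
	"unsafe"
)

// padded mirrors the first example: each bool is followed by padding.
type padded struct {
	a bool
	b string
	c bool
}

// compact mirrors the second example: both bools share one padded slot.
type compact struct {
	a bool
	c bool
	b string
}

func paddedSize() uintptr  { return unsafe.Sizeof(padded{}) }
func compactSize() uintptr { return unsafe.Sizeof(compact{}) }

func main() {
	fmt.Println("padded: ", paddedSize())
	fmt.Println("compact:", compactSize())

	// Offsetof shows where the padding sits: b starts at its alignment
	// boundary, not right after the one-byte bool.
	var p padded
	fmt.Println("offset of b:", unsafe.Offsetof(p.b))
	fmt.Println("alignment of b:", unsafe.Alignof(p.b))
}
```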
import unsafe "unsafe"
/* Structure describing an inotify event. */
type INotifyInfo struct {
Wd int32 // Watch descriptor
Mask uint32 // Watch mask
Cookie uint32 // Cookie to synchronize two events
Len uint32 // Length (including NULs) of name
}
func doSomething() {
var info INotifyInfo
const infoSize = unsafe.Sizeof(info)
...
}
NOTE: The OP is mistaken. The unsafe.Sizeof does return 24 on the example Coord3d struct. See comment below.
binary.Size from encoding/binary is also an option (it replaced the older binary.TotalSize in Go 1), but note there's a slight difference in behavior between that and unsafe.Sizeof: binary.Size reports the number of bytes the encoded value would occupy, without padding and including the contents of a top-level slice of fixed-size values, while unsafe.Sizeof only returns the in-memory size of the top-level descriptor. Here's an example of how to use binary.Size.
package main

import (
	"encoding/binary"
	"fmt"
)

type T struct {
	a uint32
	b int8
}

func main() {
	var t T
	// Size takes the value directly; no reflect.ValueOf is needed.
	s := binary.Size(t)
	fmt.Println(s)
}
This is subject to change, but last I looked there was an outstanding compiler bug (bug260.go) related to structure alignment. The end result is that packing a structure might not give the expected results. That was for compiler 6g version 5383, release 2010-04-27. It may not be affecting your results, but it's something to be aware of.
UPDATE: The only bug left in go test suite is bug260.go, mentioned above, as of release 2010-05-04.
Hotei
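To avoid the overhead of initializing a structure, you can use a nil pointer to Coord3d instead. unsafe.Sizeof only looks at the type of its argument and is evaluated at compile time, so *dummy is never actually dereferenced: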
package main
import (
"fmt"
"unsafe"
)
type Coord3d struct {
X, Y, Z int64
}
func main() {
var dummy *Coord3d
fmt.Printf("sizeof(Coord3d) = %d\n", unsafe.Sizeof(*dummy))
}
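import (
	"bytes"
	"encoding/gob"
)

// getRealSizeOf returns the number of bytes v occupies once gob-encoded.
// This is the serialized size (including gob type information), not the
// in-memory size reported by unsafe.Sizeof.
func getRealSizeOf(v interface{}) (int, error) {
	b := new(bytes.Buffer)
	if err := gob.NewEncoder(b).Encode(v); err != nil {
		return 0, err
	}
	return b.Len(), nil
}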