String memory usage in Golang - string

I was optimizing a code using a map[string]string where the value of the map was only either "A" or "B". So I thought Obviously a map[string]bool was way better as the map hold around 50 millions elements.
var a = "a"
var a2 = "Why This ultra long string take the same amount of space in memory as 'a'"
var b = true
var c map[string]string
var d map[string]bool
c["t"] = "A"
d["t"] = true
fmt.Printf("a: %T, %d\n", a, unsafe.Sizeof(a))
fmt.Printf("a2: %T, %d\n", a2, unsafe.Sizeof(a2))
fmt.Printf("b: %T, %d\n", b, unsafe.Sizeof(b))
fmt.Printf("c: %T, %d\n", c, unsafe.Sizeof(c))
fmt.Printf("d: %T, %d\n", d, unsafe.Sizeof(d))
fmt.Printf("c: %T, %d\n", c, unsafe.Sizeof(c["t"]))
fmt.Printf("d: %T, %d\n", d, unsafe.Sizeof(d["t"]))
And the result was:
a: string, 8
a2: string, 8
b: bool, 1
c: map[string]string, 4
d: map[string]bool, 4
c2: map[string]string, 8
d2: map[string]bool, 1
While testing I found something weird, why a2 with a really long string use 8 bytes, same as a which has only one letter?

unsafe.Sizeof() does not recursively go into data structures, it just reports the "shallow" size of the value passed. Quoting from its doc:
The size does not include any memory possibly referenced by x. For instance, if x is a slice, Sizeof returns the size of the slice descriptor, not the size of the memory referenced by the slice.
Maps in Go are implemented as pointers, so unsafe.Sizeof(somemap) will report the size of that pointer.
Strings in Go are just headers containing a pointer and a length. See reflect.StringHeader:
type StringHeader struct {
Data uintptr
Len int
}
So unsafe.Sizeof(somestring) will report the size of the above struct, which is independent of the length of the string value (which is the value of the Len field).
To get the actual memory requirement of a map ("deeply"), see How much memory do golang maps reserve? and also How to get memory size of variable in Go?
Go stores the UTF-8 encoded byte sequences of string values in memory. The builtin function len() reports the byte-length of a string, so
basically the memory required to store a string value in memory is:
var str string = "some string"
stringSize := len(str) + int(unsafe.Sizeof(str))
Also don't forget that a string value may be constructed by slicing another, bigger string, and thus even if the original string is no longer referenced (and thus no longer needed), the bigger backing array will still be required to be kept in memory for the smaller string slice.
For example:
s := "some loooooooong string"
s2 := s[:2]
Here, even though memory requirement for s2 would be len(s2) + unsafe.Sizeof(str) = 2 + unsafe.Sizeof(str), still, the whole backing array of s will be retained.

Related

Nim: work with read-only memory mapped files

I've only just started with Nim, hence it possibly is a simple question. We need to do many lookups into data that are stored in a file. Some of these files are too large to load into memory, hence the mmapped approach. I'm able to mmap the file by means of memfiles and either have a pointer or MemSlice at my hand. The file and the memory region are read-only, and hence have a fixed size. I was hoping that I'm able to access the data as immutable fixed size byte and char arrays without copying them, leveraging all the existing functionalities available to seqs, arrays, strings etc.. All the MemSlice / string methods copy the data, which is fair, but not what I want (and in my use case don't need).
I understand array, strings etc. types have a pointer to the data and a len field. But couldn't find a way to create them with a pointer and len. I assume it has something to do with ownership and refs to mem that may outlive my slice.
let mm = memfiles.open(...)
let myImmutableFixesSizeArr = ?? # cast[ptr array[fsize, char]](mm.mem) doesn't compile as fsize needs to be const. Neither could I find something like let x: [char] = array_from(mm.mem, fsize)
let myImmutableFixedSizeString = mm[20, 30].to_fixed_size_immutable_string # Create something that is string like so that I can use all the existing string methods.
UPDATE: I did find https://forum.nim-lang.org/t/4680#29226 which explains how to use OpenArray, but OpenArray is only allowed as function argument, and you - if I'm not mistaken - it is doesn't behave like a normal array.
Thanks for your help
It is not possible to convert a raw char array in memory (ptr UncheckedArray[char]) to a string without copying, only to an openArray[char] (or cstring)
So it won't be possible to use procs that expect a string, only those that accept openArray[T] or openArray[char]
Happily an openArray[T] behaves exactly like a seq[T] when sent to a proc.
({.experimental:"views".} does let you assign an openArray[T] to a local variable, but it's not anywhere near ready for production)
you can use the memSlices iterator to loop over delimited chunks in a memFile without copying:
import memfiles
template toOpenArray(ms: MemSlice, T: typedesc = byte): openArray[T] =
##template because openArray isn't a valid return type yet
toOpenArray(cast[ptr UncheckedArray[T]](ms.data),0,(ms.size div sizeof(T))-1)
func process(slice:openArray[char]) =
## your code here but e.g.
## count number of A's
var nA: int
for ch in slice.items:
if ch == 'A': inc nA
debugEcho nA
let mm = memfiles.open("file.txt")
for slice in mm.memSlices:
process slice.toOpenArray(char)
Or, to work with some char array represented in the middle of the file, you can use pointer arithmetic.
import memfiles
template extractImpl(typ,pntr,offset) =
cast[typ](cast[ByteAddress](pntr)+offset)
template checkFileLen(memfile,len,offset) =
if offset + len > memfile.size:
raise newException(IndexDefect,"file too short")
func extract*(mm: MemFile,T:typedesc, offset:Natural): ptr T =
checkFileLen(mm,T,offset)
result = extractImpl(ptr T,mm.mem,offset)
func extract*[U](mm: MemFile,T: typedesc[ptr U], offset: Natural): T =
extractImpl(T,mm.mem,offset)
let mm = memfiles.open("file.txt")
#to extract a compile-time known length string:
let mystring_offset = 3
const mystring_len = 10
type MyStringT = array[mystring_len,char]
let myString:ptr MyStringT = mm.extract(MyStringT,mystring_offset)
process myString[]
#to extract a dynamic length string:
let size_offset = 14
let string_offset = 18
let sz:ptr int32 = mm.extract(int32,size_offset)
let str:ptr UncheckedArray[char] = mm.extract(ptr UncheckedArray[char], string_offset)
checkFileLen(mm,sz[],string_offset)
process str.toOpenArray(0,sz[]-1)

GoLang - memory allocation - []byte vs string

In the below code:
c := "fool"
d := []byte("fool")
fmt.Printf("c: %T, %d\n", c, unsafe.Sizeof(c)) // 16 bytes
fmt.Printf("d: %T, %d\n", d, unsafe.Sizeof(d)) // 24 bytes
To decide the datatype needed to receive JSON data from CloudFoundry, am testing above sample code to understand the memory allocation for []byte vs string type.
Expected size of string type variable c is 1 byte x 4 ascii encoded letter = 4 bytes, but the size shows 16 bytes.
For byte type variable d, GO embeds the string in the executable program as a string literal. It converts the string literal to a byte slice at runtime using the runtime.stringtoslicebyte function. Something like... []byte{102, 111, 111, 108}
Expected size of byte type variable d is again 1 byte x 4 ascii values = 4 bytes but the size of variable d shows 24 bytes as it's underlying array capacity.
Why the size of both variables is not 4 bytes?
Both slices and strings in Go are struct-like headers:
reflect.SliceHeader:
type SliceHeader struct {
Data uintptr
Len int
Cap int
}
reflect.StringHeader:
type StringHeader struct {
Data uintptr
Len int
}
The sizes reported by unsafe.Sizeof() are the sizes of these headers, exluding the size of the pointed arrays:
Sizeof takes an expression x of any type and returns the size in bytes of a hypothetical variable v as if v was declared via var v = x. The size does not include any memory possibly referenced by x. For instance, if x is a slice, Sizeof returns the size of the slice descriptor, not the size of the memory referenced by the slice.
To get the actual ("recursive") size of some arbitrary value, use Go's builtin testing and benchmarking framework. For details, see How to get memory size of variable in Go?
For strings specifically, see String memory usage in Golang. The complete memory required by a string value can be computed like this:
var str string = "some string"
stringSize := len(str) + int(unsafe.Sizeof(str))

Can a zero-length and zero-cap slice still point to an underlying array and prevent garbage collection?

Let's take the following scenario:
a := make([]int, 10000)
a = a[len(a):]
As we know from "Go Slices: Usage and Internals" there's a "possible gotcha" in downslicing. For any slice a if you do a[start:end] it still points to the original memory, so if you don't copy, a small downslice could potentially keep a very large array in memory for a long time.
However, this case is chosen to result in a slice that should not only have zero length, but zero capacity. A similar question could be asked for the construct a = a[0:0:0].
Does the current implementation still maintain a pointer to the underlying memory, preventing it from being garbage collected, or does it recognize that a slice with no len or cap could not possibly reference anything, and thus garbage collect the original backing array during the next GC pause (assuming no other references exist)?
Edit: Playing with reflect and unsafe on the Playground reveals that the pointer is non-zero:
func main() {
a := make([]int, 10000)
a = a[len(a):]
aHeader := *(*reflect.SliceHeader)((unsafe.Pointer(&a)))
fmt.Println(aHeader.Data)
a = make([]int, 0, 0)
aHeader = *(*reflect.SliceHeader)((unsafe.Pointer(&a)))
fmt.Println(aHeader.Data)
}
http://play.golang.org/p/L0tuzN4ULn
However, this doesn't necessarily answer the question because the second slice that NEVER had anything in it also has a non-zero pointer as the data field. Even so, the pointer could simply be uintptr(&a[len(a)-1]) + sizeof(int) which would be outside the block of backing memory and thus not trigger actual garbage collection, though this seems unlikely since that would prevent garbage collection of other things. The non-zero value could also conceivably just be Playground weirdness.
As seen in your example, re-slicing copies the slice header, including the data pointer to the new slice, so I put together a small test to try and force the runtime to reuse the memory if possible.
I'd like this to be more deterministic, but at least with go1.3 on x86_64, it shows that the memory used by the original array is eventually reused (it does not work in the playground in this form).
package main
import (
"fmt"
"unsafe"
)
func check(i uintptr) {
fmt.Printf("Value at %d: %d\n", i, *(*int64)(unsafe.Pointer(i)))
}
func garbage() string {
s := ""
for i := 0; i < 100000; i++ {
s += "x"
}
return s
}
func main() {
s := make([]int64, 100000)
s[0] = 42
p := uintptr(unsafe.Pointer(&s[0]))
check(p)
z := s[0:0:0]
s = nil
fmt.Println(z)
garbage()
check(p)
}

Converting an int or String to a char array on Arduino

I am getting an int value from one of the analog pins on my Arduino. How do I concatenate this to a String and then convert the String to a char[]?
It was suggested that I try char msg[] = myString.getChars();, but I am receiving a message that getChars does not exist.
To convert and append an integer, use operator += (or member function concat):
String stringOne = "A long integer: ";
stringOne += 123456789;
To get the string as type char[], use toCharArray():
char charBuf[50];
stringOne.toCharArray(charBuf, 50)
In the example, there is only space for 49 characters (presuming it is terminated by null). You may want to make the size dynamic.
Overhead
The cost of bringing in String (it is not included if not used anywhere in the sketch), is approximately 1212 bytes of program memory (flash) and 48 bytes RAM.
This was measured using Arduino IDE version 1.8.10 (2019-09-13) for an Arduino Leonardo sketch.
Risk
There must be sufficient free RAM available. Otherwise, the result may be lockup/freeze of the application or other strange behaviour (UB).
Just as a reference, below is an example of how to convert between String and char[] with a dynamic length -
// Define
String str = "This is my string";
// Length (with one extra character for the null terminator)
int str_len = str.length() + 1;
// Prepare the character array (the buffer)
char char_array[str_len];
// Copy it over
str.toCharArray(char_array, str_len);
Yes, this is painfully obtuse for something as simple as a type conversion, but somehow it's the easiest way.
You can convert it to char* if you don't need a modifiable string by using:
(char*) yourString.c_str();
This would be very useful when you want to publish a String variable via MQTT in arduino.
None of that stuff worked. Here's a much simpler way .. the label str is the pointer to what IS an array...
String str = String(yourNumber, DEC); // Obviously .. get your int or byte into the string
str = str + '\r' + '\n'; // Add the required carriage return, optional line feed
byte str_len = str.length();
// Get the length of the whole lot .. C will kindly
// place a null at the end of the string which makes
// it by default an array[].
// The [0] element is the highest digit... so we
// have a separate place counter for the array...
byte arrayPointer = 0;
while (str_len)
{
// I was outputting the digits to the TX buffer
if ((UCSR0A & (1<<UDRE0))) // Is the TX buffer empty?
{
UDR0 = str[arrayPointer];
--str_len;
++arrayPointer;
}
}
With all the answers here, I'm surprised no one has brought up using itoa already built in.
It inserts the string representation of the integer into the given pointer.
int a = 4625;
char cStr[5]; // number of digits + 1 for null terminator
itoa(a, cStr, 10); // int value, pointer to string, base number
Or if you're unsure of the length of the string:
int b = 80085;
int len = String(b).length();
char cStr[len + 1]; // String.length() does not include the null terminator
itoa(b, cStr, 10); // or you could use String(b).toCharArray(cStr, len);

Sizeof struct in Go

I'm having a look at Go, which looks quite promising.
I am trying to figure out how to get the size of a go struct, for
example something like
type Coord3d struct {
X, Y, Z int64
}
Of course I know that it's 24 bytes, but I'd like to know it programmatically..
Do you have any ideas how to do this ?
Roger already showed how to use SizeOf method from the unsafe package. Make sure you read this before relying on the value returned by the function:
The size does not include any memory possibly referenced by x. For
instance, if x is a slice, Sizeof returns the size of the slice
descriptor, not the size of the memory referenced by the slice.
In addition to this I wanted to explain how you can easily calculate the size of any struct using a couple of simple rules. And then how to verify your intuition using a helpful service.
The size depends on the types it consists of and the order of the fields in the struct (because different padding will be used). This means that two structs with the same fields can have different size.
For example this struct will have a size of 32
struct {
a bool
b string
c bool
}
and a slight modification will have a size of 24 (a 25% difference just due to a more compact ordering of fields)
struct {
a bool
c bool
b string
}
As you see from the pictures, in the second example we removed one of the paddings and moved a field to take advantage of the previous padding. An alignment can be 1, 2, 4, or 8. A padding is the space that was used to fill in the variable to fill the alignment (basically wasted space).
Knowing this rule and remembering that:
bool, int8/uint8 take 1 byte
int16, uint16 - 2 bytes
int32, uint32, float32 - 4 bytes
int64, uint64, float64, pointer - 8 bytes
string - 16 bytes (2 alignments of 8 bytes)
any slice takes 24 bytes (3 alignments of 8 bytes). So []bool, [][][]string are the same (do not forget to reread the citation I added in the beginning)
array of length n takes n * type it takes of bytes.
Armed with the knowledge of padding, alignment and sizes in bytes, you can quickly figure out how to improve your struct (but still it makes sense to verify your intuition using the service).
import unsafe "unsafe"
/* Structure describing an inotify event. */
type INotifyInfo struct {
Wd int32 // Watch descriptor
Mask uint32 // Watch mask
Cookie uint32 // Cookie to synchronize two events
Len uint32 // Length (including NULs) of name
}
func doSomething() {
var info INotifyInfo
const infoSize = unsafe.Sizeof(info)
...
}
NOTE: The OP is mistaken. The unsafe.Sizeof does return 24 on the example Coord3d struct. See comment below.
binary.TotalSize is also an option, but note there's a slight difference in behavior between that and unsafe.Sizeof: binary.TotalSize includes the size of the contents of slices, while unsafe.Sizeof only returns the size of the top level descriptor. Here's an example of how to use TotalSize.
package main
import (
"encoding/binary"
"fmt"
"reflect"
)
type T struct {
a uint32
b int8
}
func main() {
var t T
r := reflect.ValueOf(t)
s := binary.TotalSize(r)
fmt.Println(s)
}
This is subject to change but last I looked there is an outstanding compiler bug (bug260.go) related to structure alignment. The end result is that packing a structure might not give the expected results. That was for compiler 6g version 5383 release.2010-04-27 release. It may not be affecting your results, but it's something to be aware of.
UPDATE: The only bug left in go test suite is bug260.go, mentioned above, as of release 2010-05-04.
Hotei
In order to not to incur the overhead of initializing a structure, it would be faster to use a pointer to Coord3d:
package main
import (
"fmt"
"unsafe"
)
type Coord3d struct {
X, Y, Z int64
}
func main() {
var dummy *Coord3d
fmt.Printf("sizeof(Coord3d) = %d\n", unsafe.Sizeof(*dummy))
}
/*
returns the size of any type of object in bytes
*/
func getRealSizeOf(v interface{}) (int, error) {
b := new(bytes.Buffer)
if err := gob.NewEncoder(b).Encode(v); err != nil {
return 0, err
}
return b.Len(), nil
}

Resources