Using pointers to return results in x86 Assembly

Using pointers to return results in x86 Assembly - linux

We've been given the following function to try and implement in C as part of a CS course. We are programming on x86 Linux.
function(float x, float y, float *z);
For a function such as example(int x, int y) I understand that the x value resides at [ebp+8] and y at [ebp+12] on the stack, is the same convention used when pushing floats?
We also have to perform some masking and calculations on the float numbers. Do these float numbers behave the same as 32-bit integers just in IEEE-754 format?

here is a simple function and it's asm code :
function(float x, float y, float *z){
float sum = x + y;
float neg = sum - *z;
}
the asm of the above function will be like this:
function:
pushl %ebp
movl %esp,%ebp
subl $8,%esp
pushl %ebx
flds 8(%ebp)
fadds 12(%ebp)
fstps -4(%ebp)
movl 16(%ebp),%ebx
flds -4(%ebp)
fsubs (%ebx)
fstps -8(%ebp)
leal -12(%ebp),%esp
popl %ebx
leave
ret
as you can see from asm above the reference to ebp+x in this case x will be 8/12/16 to get the parameter from the stack,
so as fuz point out it in the comments it is indeed stored on the stack

Related

How to use unsafe get a byte slice from a string without memory copy

I have read about "https://github.com/golang/go/issues/25484" about no-copy conversion from []byte to string.
I am wondering if there is a way to convert a string to a byte slice without memory copy?
I am writing a program which processes terra-bytes data, if every string is copied twice in memory, it will slow down the progress. And I do not care about mutable/unsafe, only internal usage, I just need the speed as fast as possible.
Example:
var s string
// some processing on s, for some reasons, I must use string here
// ...
// then output to a writer
gzipWriter.Write([]byte(s)) // !!! Here I want to avoid the memory copy, no WriteString
So the question is: is there a way to prevent from the memory copying? I know maybe I need the unsafe package, but I do not know how. I have searched a while, no answer till now, neither the SO showed related answers works.

Getting the content of a string as a []byte without copying in general is only possible using unsafe, because strings in Go are immutable, and without a copy it would be possible to modify the contents of the string (by changing the elements of the byte slice).
So using unsafe, this is how it could look like (corrected, working solution):
func unsafeGetBytes(s string) []byte {
return (*[0x7fff0000]byte)(unsafe.Pointer(
(*reflect.StringHeader)(unsafe.Pointer(&s)).Data),
)[:len(s):len(s)]
}
This solution is from Ian Lance Taylor.
One thing to note here: the empty string "" has no bytes as its length is zero. This means there is no guarantee what the Data field may be, it may be zero or an arbitrary address shared among the zero-size variables. If an empty string may be passed, that must be checked explicitly (although there's no need to get the bytes of an empty string without copying...):
func unsafeGetBytes(s string) []byte {
if s == "" {
return nil // or []byte{}
}
return (*[0x7fff0000]byte)(unsafe.Pointer(
(*reflect.StringHeader)(unsafe.Pointer(&s)).Data),
)[:len(s):len(s)]
}
Original, wrong solution was:
func unsafeGetBytesWRONG(s string) []byte {
return *(*[]byte)(unsafe.Pointer(&s)) // WRONG!!!!
}
See Nuno Cruces's answer below for reasoning.
Testing it:
s := "hi"
data := unsafeGetBytes(s)
fmt.Println(data, string(data))
data = unsafeGetBytes("gopher")
fmt.Println(data, string(data))
Output (try it on the Go Playground):
[104 105] hi
[103 111 112 104 101 114] gopher
BUT: You wrote you want this because you need performance. You also mentioned you want to compress the data. Please know that compressing data (using gzip) requires a lot more computation than just copying a few bytes! You will not see any noticeable performance gain by using this!
Instead when you want to write strings to an io.Writer, it's recommended to do it via io.WriteString() function which if possible will do so without making a copy of the string (by checking and calling WriteString() method which if exists is most likely does it better than copying the string). For details, see What's the difference between ResponseWriter.Write and io.WriteString?
There are also ways to access the contents of a string without converting it to []byte, such as indexing, or using a loop where the compiler optimizes away the copy:
s := "something"
for i, v := range []byte(s) { // Copying s is optimized away
// ...
}
Also see related questions:
[]byte(string) vs []byte(*string)
What are the possible consequences of using unsafe conversion from []byte to string in go?
What is the difference between the string and []byte in Go?
Does conversion between alias types in Go create copies?
How does type conversion internally work? What is the memory utilization for the same?

After some extensive investigation, I believe I've discovered the most efficient way of getting a []byte from a string as of Go 1.17 (this is for i386/x86_64 gc; I haven't tested other architectures.) The trade-off of being efficient code here is being inefficient to code, though.
Before I say anything else, it should be made clear that the differences are ultimately very small and probably inconsequential -- the info below is for fun/educational purposes only.
Summary
With some minor alterations, the accepted answer illustrating the technique of slicing a pointer to array is the most efficient way. That being said, I wouldn't be surprised if unsafe.Slice becomes the (decisively) better choice in the future.
unsafe.Slice
unsafe.Slice currently has the advantage of being slightly more readable, but I'm skeptical about it's performance. It looks like it makes a call to runtime.unsafeslice. The following is the gc amd64 1.17 assembly of the function provided in Atamiri's answer (FUNCDATA omitted). Note the stack check (lack of NOSPLIT):
unsafeGetBytes_pc0:
TEXT "".unsafeGetBytes(SB), ABIInternal, $48-16
CMPQ SP, 16(R14)
PCDATA $0, $-2
JLS unsafeGetBytes_pc86
PCDATA $0, $-1
SUBQ $48, SP
MOVQ BP, 40(SP)
LEAQ 40(SP), BP
PCDATA $0, $-2
MOVQ BX, ""..autotmp_4+24(SP)
MOVQ AX, "".s+56(SP)
MOVQ BX, "".s+64(SP)
MOVQ "".s+56(SP), DX
PCDATA $0, $-1
MOVQ DX, ""..autotmp_5+32(SP)
LEAQ type.uint8(SB), AX
MOVQ BX, CX
MOVQ DX, BX
PCDATA $1, $1
CALL runtime.unsafeslice(SB)
MOVQ ""..autotmp_5+32(SP), AX
MOVQ ""..autotmp_4+24(SP), BX
MOVQ BX, CX
MOVQ 40(SP), BP
ADDQ $48, SP
RET
unsafeGetBytes_pc86:
NOP
PCDATA $1, $-1
PCDATA $0, $-2
MOVQ AX, 8(SP)
MOVQ BX, 16(SP)
CALL runtime.morestack_noctxt(SB)
MOVQ 8(SP), AX
MOVQ 16(SP), BX
PCDATA $0, $-1
JMP unsafeGetBytes_pc0
Other unimportant fun facts about the above (easily subject to change): compiled size of 3326B; has an inline cost of 7; correct escape analysis: s leaks to ~r1 with derefs=0.
Carefully Modifying *reflect.SliceHeader
This method has the advantage/disadvantage of letting one modify the internal state of a slice directly. Unfortunately, due it's multiline nature and use of uintptr, the GC can easily mess things up if one is not careful about keeping a reference to the original string. (Here I avoided creating temporary pointers to reduce inline cost and to avoid needing to add runtime.KeepAlive):
func unsafeGetBytes(s string) (b []byte) {
(*reflect.SliceHeader)(unsafe.Pointer(&b)).Data = (*reflect.StringHeader)(unsafe.Pointer(&s)).Data
(*reflect.SliceHeader)(unsafe.Pointer(&b)).Cap = len(s)
(*reflect.SliceHeader)(unsafe.Pointer(&b)).Len = len(s)
return
}
The corresponding assembly on amd64 (FUNCDATA omitted):
TEXT "".unsafeGetBytes(SB), NOSPLIT|ABIInternal, $32-16
SUBQ $32, SP
MOVQ BP, 24(SP)
LEAQ 24(SP), BP
MOVQ AX, "".s+40(SP)
MOVQ BX, "".s+48(SP)
MOVQ $0, "".b(SP)
MOVUPS X15, "".b+8(SP)
MOVQ "".s+40(SP), DX
MOVQ DX, "".b(SP)
MOVQ "".s+48(SP), CX
MOVQ CX, "".b+16(SP)
MOVQ "".s+48(SP), BX
MOVQ BX, "".b+8(SP)
MOVQ "".b(SP), AX
MOVQ 24(SP), BP
ADDQ $32, SP
RET
Other unimportant fun facts about the above (easily subject to change): compiled size of 3700B; has an inline cost of 20; subpar escape analysis: s leaks to {heap} with derefs=0.
Unsafer version of modifying SliceHeader
Adapted from Nuno Cruces' answer. This relies on the inherent structural similarity between StringHeader and SliceHeader, so in a sense it breaks "more easily". Additionally, it temporarily creates an illegal state where cap(b) (being 0) is less than len(b).
func unsafeGetBytes(s string) (b []byte) {
*(*string)(unsafe.Pointer(&b)) = s
(*reflect.SliceHeader)(unsafe.Pointer(&b)).Cap = len(s)
return
}
Corresponding assembly (FUNCDATA omitted):
TEXT "".unsafeGetBytes(SB), NOSPLIT|ABIInternal, $32-16
SUBQ $32, SP
MOVQ BP, 24(SP)
LEAQ 24(SP), BP
MOVQ AX, "".s+40(FP)
MOVQ $0, "".b(SP)
MOVUPS X15, "".b+8(SP)
MOVQ AX, "".b(SP)
MOVQ BX, "".b+8(SP)
MOVQ BX, "".b+16(SP)
MOVQ "".b(SP), AX
MOVQ BX, CX
MOVQ 24(SP), BP
ADDQ $32, SP
NOP
RET
Other unimportant details: compiled size 3636B, inline cost of 11, with subpar escape analysis: s leaks to {heap} with derefs=0.
Slicing a pointer to array
This is the accepted answer (shown here for comparison) -- its primary disadvantage is its ugliness (viz. magic number 0x7fff0000). There's also the tiniest possibility of getting a string bigger than the array, and an unavoidable bounds check.
func unsafeGetBytes(s string) []byte {
return (*[0x7fff0000]byte)(unsafe.Pointer(
(*reflect.StringHeader)(unsafe.Pointer(&s)).Data),
)[:len(s):len(s)]
}
Corresponding assembly (FUNCDATA removed).
TEXT "".unsafeGetBytes(SB), NOSPLIT|ABIInternal, $24-16
SUBQ $24, SP
MOVQ BP, 16(SP)
LEAQ 16(SP), BP
PCDATA $0, $-2
MOVQ AX, "".s+32(SP)
MOVQ BX, "".s+40(SP)
MOVQ "".s+32(SP), AX
PCDATA $0, $-1
TESTB AL, (AX)
NOP
CMPQ BX, $2147418112
JHI unsafeGetBytes_pc54
MOVQ BX, CX
MOVQ 16(SP), BP
ADDQ $24, SP
RET
unsafeGetBytes_pc54:
MOVQ BX, DX
MOVL $2147418112, BX
PCDATA $1, $1
NOP
CALL runtime.panicSlice3Alen(SB)
XCHGL AX, AX
Other unimportant details: compiled size 3142B, inline cost of 9, with correct escape analysis: s leaks to ~r1 with derefs=0
Note the runtime.panicSlice3Alen -- this is bounds check that checks that len(s) is within 0x7fff0000.
Improved slicing pointer to array
This is what I've concluded to be the most efficient method as of Go 1.17. I basically modified the accepted answer to eliminate the bounds check, and found a "more meaningful" constant (math.MaxInt32) to use than 0x7fff0000. Using MaxInt32 preserves 32-bit compatibility.
func unsafeGetBytes(s string) []byte {
const MaxInt32 = 1<<31 - 1
return (*[MaxInt32]byte)(unsafe.Pointer((*reflect.StringHeader)(
unsafe.Pointer(&s)).Data))[:len(s)&MaxInt32:len(s)&MaxInt32]
}
Corresponding assembly (FUNCDATA removed):
TEXT "".unsafeGetBytes(SB), NOSPLIT|ABIInternal, $0-16
PCDATA $0, $-2
MOVQ AX, "".s+8(SP)
MOVQ BX, "".s+16(SP)
MOVQ "".s+8(SP), AX
PCDATA $0, $-1
TESTB AL, (AX)
ANDQ $2147483647, BX
MOVQ BX, CX
RET
Other unimportant details: compiled size 3188B, inline cost of 13, and correct escape analysis: s leaks to ~r1 with derefs=0

In go 1.17, I'd recommend unsafe.Slice as more readable:
unsafe.Slice((*byte)(unsafe.Pointer((*reflect.StringHeader)(unsafe.Pointer(&s)).Data)), len(s))
I think that this also works (doesn't violate any unsafe.Pointer rules), with the benefit that it works for a const s:
*(*[]byte)(unsafe.Pointer(&struct{string; int}{s, len(s)}))
Commentary bellow is regarding the accepted answer as it originally stood. The accepted answer now mentions an (authoritative) solution from Ian Lance Taylor. Keeping it as it points out a common error.
The accepted answer is wrong, and may produce the panic #RFC mentioned in the comments. The explanation by #icza about GC and keep alive is misguided.
The reason capacity is zero (or even an arbitrary value) is more prosaic.
A slice is:
type SliceHeader struct {
Data uintptr
Len int
Cap int
}
A string is:
type StringHeader struct {
Data uintptr
Len int
}
Converting a byte slice to a string can be "safely" done as the strings.Builder does it:
func (b *Builder) String() string {
return *(*string)(unsafe.Pointer(&b.buf))
}
This will copy the Data pointer and Len from the slice to the string.
The opposite conversion is not "safe" because Cap doesn't get set to the correct value.
The following (originally by me) is also wrong because it violates unsafe.Pointer rule #1.
This is the correct code, that fixes the panic:
var buf = *(*[]byte)(unsafe.Pointer(&str))
(*reflect.SliceHeader)(unsafe.Pointer(&buf)).Cap = len(str)
Or perhaps:
var buf []byte
*(*string)(unsafe.Pointer(&buf)) = str
(*reflect.SliceHeader)(unsafe.Pointer(&buf)).Cap = len(str)
I should add that all these conversions are unsafe in the sense that strings are expected to be immutable, and byte arrays/slices mutable.
But if you know for sure that the byte slice won't be mutated, you won't get bounds (or GC) issues with the above conversions.

In Go 1.17, one can now use unsafe.Slice, so the accepted answer can be rewritten as follows:
func unsafeGetBytes(s string) []byte {
return unsafe.Slice((*byte)(unsafe.Pointer((*reflect.StringHeader)(unsafe.Pointer(&s)).Data)), len(s))
}

I managed to get the goal by this:
func TestString(t *testing.T) {
b := []byte{'a', 'b', 'c', '1', '2', '3', '4'}
s := *(*string)(unsafe.Pointer(&b))
sb := *(*[]byte)(unsafe.Pointer(&s))
addr1 := unsafe.Pointer(&b)
addr2 := unsafe.Pointer(&s)
addr3 := unsafe.Pointer(&sb)
fmt.Print("&b=", addr1, "\n&s=", addr2, "\n&sb=", addr3, "\n")
hdr1 := (*reflect.StringHeader)(unsafe.Pointer(&b))
hdr2 := (*reflect.SliceHeader)(unsafe.Pointer(&s))
hdr3 := (*reflect.SliceHeader)(unsafe.Pointer(&sb))
fmt.Print("b.data=", hdr1.Data, "\ns.data=", hdr2.Data, "\nsb.data=", hdr3.Data, "\n")
b[0] = 'X'
sb[1] = 'Y' // if sb is from a string directly, this will cause nil panic
fmt.Print("s=", s, "\nsb=")
for _, c := range sb {
fmt.Printf("%c", c)
}
fmt.Println()
}
Output:
=== RUN TestString
&b=0xc000218000
&s=0xc00021a000
&sb=0xc000218020
b.data=824635867152
s.data=824635867152
sb.data=824635867152
s=XYc1234
sb=XYc1234
These variables all share the same memory.

Go 1.20 (February 2023)
You can use unsafe.StringData to greatly simplify YenForYang's answer:
StringData returns a pointer to the underlying bytes of str. For an empty string the return value is unspecified, and may be nil.
Since Go strings are immutable, the bytes returned by StringData must not be modified.
func main() {
str := "foobar"
d := unsafe.StringData(str)
b := unsafe.Slice(d, len(str))
fmt.Printf("%T, %s\n", b, b) // []uint8, foobar (byte is alias of uint8)
}
Go tip playground: https://go.dev/play/p/FIXe0rb8YHE?v=gotip
Remember that you can't assign to b[n]. The memory is still read-only.

Simple, no reflect, and I think it is portable. s is your string and b is your bytes slice
var b []byte
bb:=(*[3]uintptr)(unsafe.Pointer(&b))[:]
copy(bb, (*[2]uintptr)(unsafe.Pointer(&s))[:])
bb[2] = bb[1]
// use b
Remember, bytes value should not be modified (will panic). re-slicing is ok (for example: bytes.split(b, []byte{','} )

VC++ SSE code generation - is this a compiler bug?

A very particular code sequence in VC++ generated the following instruction (for Win32):
unpcklpd xmm0,xmmword ptr [ebp-40h]
2 questions arise:
(1) As far as I understand the intel manual, unpcklpd accepts as 2nd argument a 128-aligned memory address. If the address is relative to a stack frame alignment cannot be forced. Is this really a compiler bug?
(2) Exceptions are thrown from at the execution of this instruction only when run from the debugger, and even then not always. Even attaching to the process and executing this code does not throw. How can this be??
The particular exception thrown is access violation at 0xFFFFFFFF, but AFAIK that's just a code for misalignment.
[Edit:]
Here's some source that demonstrates the bad code generation - but typically doesn't cause a crash. (that's mostly what I'm wondering about)
[Edit 2:]
The code sample now reproduces the actual crash. This one also crashes outside the debugger - I suspect the difference occurs because the debugger launches the program at different typical base addresses.
// mock.cpp
#include <stdio.h>
struct mockVect2d
{
double x, y;
mockVect2d() {}
mockVect2d(double a, double b) : x(a), y(b) {}
mockVect2d operator + (const mockVect2d& u) {
return mockVect2d(x + u.x, y + u.y);
}
};
struct MockPoly
{
MockPoly() {}
mockVect2d* m_Vrts;
double m_Area;
int m_Convex;
bool m_ParClear;
void ClearPar() { m_Area = -1.; m_Convex = 0; m_ParClear = true; }
MockPoly(int len) { m_Vrts = new mockVect2d[len]; }
mockVect2d& Vrt(int i) {
if (!m_ParClear) ClearPar();
return m_Vrts[i];
}
const mockVect2d& GetCenter() { return m_Vrts[0]; }
};
struct MockItem
{
MockItem() : Contour(1) {}
MockPoly Contour;
};
struct Mock
{
Mock() {}
MockItem m_item;
virtual int GetCount() { return 2; }
virtual mockVect2d GetCenter() { return mockVect2d(1.0, 2.0); }
virtual MockItem GetItem(int i) { return m_item; }
};
void testInner(int a)
{
int c = 8;
printf("%d", c);
Mock* pMock = new Mock;
int Flag = true;
int nlr = pMock->GetCount();
if (nlr == 0)
return;
int flr = 1;
if (flr == nlr)
return;
if (Flag)
{
if (flr < nlr && flr>0) {
int c = 8;
printf("%d", c);
MockPoly pol(2);
mockVect2d ctr = pMock->GetItem(0).Contour.GetCenter();
// The mess happens here:
// ; 74 : pol.Vrt(1) = ctr + mockVect2d(0., 1.0);
//
// call ? Vrt#MockPoly##QAEAAUmockVect2d##H#Z; MockPoly::Vrt
// movdqa xmm0, XMMWORD PTR $T4[ebp]
// unpcklpd xmm0, QWORD PTR tv190[ebp] **** crash!
// movdqu XMMWORD PTR[eax], xmm0
pol.Vrt(0) = ctr + mockVect2d(1.0, 0.);
pol.Vrt(1) = ctr + mockVect2d(0., 1.0);
}
}
}
void main()
{
testInner(2);
return;
}
If you prefer, download a ready vcxproj with all the switches set from here. This includes the complete ASM too.

Update: this is now a confirmed VC++ compiler bug, hopefully to be resolved in VS2015 RTM.
Edit: The connect report, like many others, is now garbage. However the compiler bug seems to be resolved in VS2017 - not in 2015 update 3.

Since no one else has stepped up, I'm going to take a shot.
1) If the address is relative to a stack frame alignment cannot be forced. Is this really a compiler bug?
I'm not sure it is true that you cannot force alignment for stack variables. Consider this code:
struct foo
{
char a;
int b;
unsigned long long c;
};
int wmain(int argc, wchar_t* argv[])
{
foo moo;
moo.a = 1;
moo.b = 2;
moo.c = 3;
}
Looking at the startup code for main, we see:
00E31AB0 push ebp
00E31AB1 mov ebp,esp
00E31AB3 sub esp,0DCh
00E31AB9 push ebx
00E31ABA push esi
00E31ABB push edi
00E31ABC lea edi,[ebp-0DCh]
00E31AC2 mov ecx,37h
00E31AC7 mov eax,0CCCCCCCCh
00E31ACC rep stos dword ptr es:[edi]
00E31ACE mov eax,dword ptr [___security_cookie (0E440CCh)]
00E31AD3 xor eax,ebp
00E31AD5 mov dword ptr [ebp-4],eax
Adding __declspec(align(16)) to moo gives
01291AB0 push ebx
01291AB1 mov ebx,esp
01291AB3 sub esp,8
01291AB6 and esp,0FFFFFFF0h <------------------------
01291AB9 add esp,4
01291ABC push ebp
01291ABD mov ebp,dword ptr [ebx+4]
01291AC0 mov dword ptr [esp+4],ebp
01291AC4 mov ebp,esp
01291AC6 sub esp,0E8h
01291ACC push esi
01291ACD push edi
01291ACE lea edi,[ebp-0E8h]
01291AD4 mov ecx,3Ah
01291AD9 mov eax,0CCCCCCCCh
01291ADE rep stos dword ptr es:[edi]
01291AE0 mov eax,dword ptr [___security_cookie (12A40CCh)]
01291AE5 xor eax,ebp
01291AE7 mov dword ptr [ebp-4],eax
Apparently the compiler (VS2010 compiled debug for Win32), recognizing that we will need specific alignments for the code, takes steps to ensure it can provide that.
2) Exceptions are thrown from at the execution of this instruction only when run from the debugger, and even then not always. Even attaching to the process and executing this code does not throw. How can this be??
So, a couple of thoughts:
"and even then not always" - Not standing over your shoulder when you run this, I can't say for certain. However it seems plausible that just by random chance, stacks could get created with the alignment you need. By default, x86 uses 4byte stack alignment. If you need 16 byte alignment, you've got a 1 in 4 shot.
As for the rest (from https://msdn.microsoft.com/en-us/library/aa290049%28v=vs.71%29.aspx#ia64alignment_topic4):
On the x86 architecture, the operating system does not make the alignment fault visible to the application. ...you will also suffer performance degradation on the alignment fault, but it will be significantly less severe than on the Itanium, because the hardware will make the multiple accesses of memory to retrieve the unaligned data.
TLDR: Using __declspec(align(16)) should give you the alignment you want, even for stack variables. For unaligned accesses, the OS will catch the exception and handle it for you (at a cost of performance).
Edit1: Responding to the first 2 comments below:
Based on MS's docs, you are correct about the alignment of stack parameters, but they propose a solution as well:
You cannot specify alignment for function parameters. When data that
has an alignment attribute is passed by value on the stack, its
alignment is controlled by the calling convention. If data alignment
is important in the called function, copy the parameter into correctly
aligned memory before use.
Neither your sample on Microsoft connect nor the code about produce the same code for me (I'm only on vs2010), so I can't test this. But given this code from your sample:
struct mockVect2d
{
double x, y;
mockVect2d(double a, double b) : x(a), y(b) {}
It would seem that aligning either mockVect2d or the 2 doubles might help.

Ownership and conditionally executed code

I read the rust book over the weekend and I have a question about the concept of ownership. The impression I got is that ownership is used to statically determine where a resource can be deallocated. Now, suppose that we have the following:
{ // 1
let x; // 2
{ // 3
let y = Box::new(1); // 4
x = if flip_coin() {y} else {Box::new(2)} // 5
} // 6
} // 7
I was surprised to see that the compiler accepts this program. By inserting println!s and implementing the Drop trait for the boxed value, I saw that the box containing the value 1 will be deallocated at either line 6 or 7 depending on the return value of flip_coin. How does the compiler know when to deallocate that box? Is this decided at run-time using some run-time information (like a flag to indicate if the box is still in use)?

After some research I found out that Rust currently adds a flag to every type that implements the Drop trait so that it knows whether the value has been dropped or not, which of course incurs a run-time cost. There have been proposals to avoid that cost by using static drops or eager drops but those solutions had problems with their semantics, namely that drops could occur at places that you wouldn't expect (e.g. in the middle of a code block), especially if you are used to C++ style RAII. There is now consensus that the best compromise is a different solution where the flags are removed from the types. Instead flags will be added to the stack, but only when the compiler cannot figure out when to do the drop statically (while having the same semantics as C++) which specifically happens when there are conditional moves like the example given in this question. For all other cases there will be no run-time cost. It appears though, that this proposal will not be implemented in time for 1.0.
Note that C++ has similar run-time costs associated with unique_ptr. When the new Drop is implemented, Rust will be strictly better than C++ in that respect.
I hope this is a correct summary of the situation. Credit goes to u/dyoll1013, u/pcwalton, u/!!kibwen, u/Kimundi on reddit, and Chris Morgan here on SO.

In non-optimized code, Rust uses dynamic checks, but it's likely that they will be eliminated in optimized code.
I looked at the behavior of the following code:
#[derive(Debug)]
struct A {
s: String
}
impl Drop for A {
fn drop(&mut self) {
println!("Dropping {:?}", &self);
}
}
fn flip_coin() -> bool { false }
#[allow(unused_variables)]
pub fn test() {
let x;
{
let y1 = A { s: "y1".to_string() };
let y2 = A { s: "y2".to_string() };
x = if flip_coin() { y1 } else { y2 };
println!("leaving inner scope");
}
println!("leaving middle scope");
}
Consistent with your comment on the other answer, the call to drop for the String that was left alone occurs after the "leaving inner scope" println. That does seem consistent with one's expectation that the y's scopes extend to the end of their block.
Looking at the assembly language, compiled without optimization, it seems that the if statement not only copies either y1 or y2 to x, but also zeroes out whichever variable provided the source for the move. Here's the test:
.LBB14_8:
movb -437(%rbp), %al
andb $1, %al
movb %al, -177(%rbp)
testb $1, -177(%rbp)
jne .LBB14_11
jmp .LBB14_12
Here's the 'then' branch, which moves the "y1" String to x. Note especially the call to memset, which is zeroing out y1 after the move:
.LBB14_11:
xorl %esi, %esi
movl $32, %eax
movl %eax, %edx
leaq -64(%rbp), %rcx
movq -64(%rbp), %rdi
movq %rdi, -176(%rbp)
movq -56(%rbp), %rdi
movq %rdi, -168(%rbp)
movq -48(%rbp), %rdi
movq %rdi, -160(%rbp)
movq -40(%rbp), %rdi
movq %rdi, -152(%rbp)
movq %rcx, %rdi
callq memset#PLT
jmp .LBB14_13
(It looks horrible until you realize that all those movq instructions are just copying 32 bytes from %rbp-64, which is y1, to %rbp-176, which is x, or at least some temporary that'll eventually be x.) Note that it copies 32 bytes, not the 24 you'd expect for a Vec (one pointer plus two usizes). This is because Rust adds a hidden "drop flag" to the structure, indicating whether the value is live or not, following the three visible fields.
And here's the 'else' branch, doing exactly the same for y2:
.LBB14_12:
xorl %esi, %esi
movl $32, %eax
movl %eax, %edx
leaq -128(%rbp), %rcx
movq -128(%rbp), %rdi
movq %rdi, -176(%rbp)
movq -120(%rbp), %rdi
movq %rdi, -168(%rbp)
movq -112(%rbp), %rdi
movq %rdi, -160(%rbp)
movq -104(%rbp), %rdi
movq %rdi, -152(%rbp)
movq %rcx, %rdi
callq memset#PLT
.LBB14_13:
This is followed by the code for the "leaving inner scope" println, which is painful to behold, so I won't include it here.
We then call a "glue_drop" routine on both y1 and y2. This seems to be a compiler-generated function that takes an A, checks its String's Vec's drop flag, and if that's set, invokes A's drop routine, followed by the drop routine for the String it contains.
If I'm reading this right, it's pretty clever: even though it's the A that has the drop method we need to call first, Rust knows that it can use ... inhale ... the drop flag of the Vec inside the String inside the A as the flag that indicates whether the A needs to be dropped.
Now, when compiled with optimization, inlining and flow analysis should recognize situations where the drop definitely will happen (and omit the run-time check), or definitely will not happen (and omit the drop altogether). And I believe I have heard of optimizations that duplicate the code following a then/else clause into both paths, and then specialize them. This would eliminate all run-time checks from this code (but duplicate the println! call).
As the original poster points out, there's an RFC proposal to move drop flags out of the values and instead associate them with the stack slots holding the values.
So it's plausible that the optimized code might not have any run-time checks at all. I can't bring myself to read the optimized code, though. Why not give it a try yourself?

Can i use rust instead of c++ in OS Development

I want to know if rust complied code have OS dependent code in it or not.(not talking about print like stuff)
for example
let x = (4i,2i,3i)
let y = (3i,4i,4i)
now if compare x == y is it using some of its library and if yes is platform dependent.
Edited:
Like in C++ we should not use new, try catch, or any standard lib.
what are the things we should be avoid while writing in rust.

You can see the code that the rust compiler will generate for a snippet like that yourself, without having to even install Rust locally.
Just visit the web-based playpen, and type your snippet in there. You can run the program (and thus observe what it does via print statements), or, more usefully in this case, you can compile the program down to the generated assembly and then inspect it to see if it has calls to underlying system routines.
If you go to this link: http://is.gd/Be6YVJ I have already put such a program into the playpen. (See bottom of this post for the actual program text.)
If you hit the asm button, you can then see the assembly for each routine. (I have added inline(never) attributes to the relevant functions to ensure that they do not get optimized away by the compiler.)
Here is the generated assembly for bar below, a function that calls out to a higher-order function to get a pair of 3-tuples, and then compares them for equality:
.section .text._ZN3bar20h2bb2fd5b9c9e987beaaE,"ax",#progbits
.align 16, 0x90
.type _ZN3bar20h2bb2fd5b9c9e987beaaE,#function
_ZN3bar20h2bb2fd5b9c9e987beaaE:
.cfi_startproc
cmpq %fs:112, %rsp
ja .LBB0_2
movabsq $56, %r10
movabsq $0, %r11
callq __morestack
retq
.LBB0_2:
subq $56, %rsp
.Ltmp0:
.cfi_def_cfa_offset 64
movq %rdi, %rax
leaq 8(%rsp), %rdi
callq *%rax
movq 8(%rsp), %rcx
xorl %eax, %eax
cmpq 32(%rsp), %rcx
jne .LBB0_5
movq 40(%rsp), %rcx
cmpq %rcx, 16(%rsp)
jne .LBB0_5
movq 48(%rsp), %rax
cmpq %rax, 24(%rsp)
sete %al
.LBB0_5:
addq $56, %rsp
retq
.Ltmp1:
.size _ZN3bar20h2bb2fd5b9c9e987beaaE, .Ltmp1-_ZN3bar20h2bb2fd5b9c9e987beaaE
.cfi_endproc
So you can see that the only thing it is calling out to is a helper routine, __morestack, that checks for stack-overflow (or allocate more stack, in systems with segmented stack support). (So for an example like this, that is the only core functionality you will need to provide yourself; note that you could just have it halt the kernel.)
Here is the program I put into the playpen:
#[inline(never)]
fn bar(f: fn() -> ((int, int, int), (int, int, int))) -> bool {
let (x, y) = f();
x == y
}
#[inline(never)]
fn foo_1() -> ((int,int,int), (int,int,int)) {
let x = (4i,2i,3i);
let y = (3i,4i,4i);
(x, y)
}
#[inline(never)]
fn foo_2() -> ((int,int,int), (int,int,int)) {
let x = (4i,2i,3i);
(x, x)
}
fn main() {
println!("bar(foo_1): {}", bar(foo_1));
println!("bar(foo_2): {}", bar(foo_2));
}

Rust had been designed to allow one to implement an operating system kernel, drivers or an application that does not even have an operating systems and runs on bare-metal hardware.
Currently Rust's standard runtime can be disable with #![no_std] attribute in the code. You can still use some libraries, such as libcore. One of the things that you will not get without runtime is format! and println! macros, the sprintf() and printf() equivalents.
For an example of something you can do today, take a look at Zinc project.

Understanding GHC assembly output

When compiling a haskell source file using the -S option in GHC the assembly code generated is not clear. There's no clear distinction between which parts of the assembly code belong to which parts of the haskell code. Unlike GCC were each label is named according to the function it corresponds to.
Is there a certain convention in these names produced by GHC? How can I relate certain parts in the generated assembly code to their corresponding parts in the haskell code?

For top level declarations, it's not too hard. Local definitions can be harder to recognize as their names get mangled and they are likely to get inlined.
Let's see what happens when we compile this simple module.
module Example where
add :: Int -> Int -> Int
add x y = x + y
.data
.align 8
.globl Example_add_closure
.type Example_add_closure, #object
Example_add_closure:
.quad Example_add_info
.text
.align 8
.quad 8589934604
.quad 0
.quad 15
.globl Example_add_info
.type Example_add_info, #object
Example_add_info:
.LckX:
jmp base_GHCziBase_plusInt_info
.data
.align 8
_module_registered:
.quad 0
.text
.align 8
.globl __stginit_Example_
.type __stginit_Example_, #object
__stginit_Example_:
.Lcl7:
cmpq $0,_module_registered
jne .Lcl8
.Lcl9:
movq $1,_module_registered
addq $-8,%rbp
movq $__stginit_base_Prelude_,(%rbp)
.Lcl8:
addq $8,%rbp
jmp *-8(%rbp)
.text
.align 8
.globl __stginit_Example
.type __stginit_Example, #object
__stginit_Example:
.Lcld:
jmp __stginit_Example_
.section .note.GNU-stack,"",#progbits
.ident "GHC 7.0.2"
You can see that our function Example.add resulted in the generation of Example_add_closure and Example_add_info. The _closure part, as the name suggests, has to do with closures. The _info part contains the actual instructions of the function. In this case, this is simply a jump to the built-in function GHC.Base.plusInt.
Note that assembly generated from Haskell code looks quite different from what you might get from other languages. The calling conventions are different, and things can get reordered a lot.
In most cases you don't want to jump straight to assembly. It is usually much easier to understand core, a simplified version of Haskell. (Simpler to compile, not necessarily to read). To get at the core, compile with the -ddump-simpl option.
Example.add :: GHC.Types.Int -> GHC.Types.Int -> GHC.Types.Int
[GblId, Arity=2]
Example.add =
\ (x_abt :: GHC.Types.Int) (y_abu :: GHC.Types.Int) ->
GHC.Num.+ # GHC.Types.Int GHC.Num.$fNumInt x_abt y_abu
For some good resources on how to read core, see this question.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

Using pointers to return results in x86 Assembly - linux

Related

How to use unsafe get a byte slice from a string without memory copy

VC++ SSE code generation - is this a compiler bug?

Ownership and conditionally executed code

Can i use rust instead of c++ in OS Development

Understanding GHC assembly output

Categories

Resources