x64 memset core, is passed buffer address truncated?


1. Problem Background
Recently a core dump occurred on one of our on-line search servers. The core happens in memset() because of an attempt to write to an invalid address, which raised the SIGSEGV signal. The following information is from dmesg:
is_searcher_ser[17405]: segfault at 000000002c32a668 rip 0000003da0a7b006 rsp 0000000053abc790 error 6
The environment of our on-line servers is as follows:
OS: RHEL 5.3
Kernel: 2.6.18-131.el5.custom, x86_64 (64-bit)
GCC: 4.1.2 20080704 (Red Hat 4.1.2-44)
Glibc: glibc-2.5-49.6
The following is the relevant code snippet:
CHashMap<…>::CHashMap(…)
{
    …
    typedef HashEntry *HashEntryPtr;
    m_ppEntry = new HashEntryPtr[m_nHashSize]; // m_nHashSize is 389 when core
    assert(m_ppEntry != NULL);
    memset(m_ppEntry, 0x0, m_nHashSize*sizeof(HashEntryPtr)); // Core in this memset() invocation
    …
}
The corresponding assembly code is:
…
0x000000000091fe9e <+110>: callq 0x502638 <_Znam@plt> // new HashEntryPtr[m_nHashSize]
0x000000000091fea3 <+115>: mov 0xc(%rbx),%edx // Get the value of m_nHashSize
0x000000000091fea6 <+118>: mov %rax,%rdi // Put m_ppEntry pointer to %rdi for later memset invocation
0x000000000091fea9 <+121>: mov %rax,0x20(%rbx) // Store the pointer to m_ppEntry member variable (%rbx holds the this pointer)
0x000000000091fead <+125>: xor %esi,%esi // Generate 0
0x000000000091feaf <+127>: shl $0x3,%rdx // m_nHashSize*sizeof(HashEntryPtr)
0x000000000091feb3 <+131>: callq 0x502b38 <memset@plt> // Call the memset() function
…
In the core dump, the assembly of memset@plt is:
(gdb) disassemble 0x502b38
Dump of assembler code for function memset@plt:
0x0000000000502b38 <+0>: jmpq *0x771b92(%rip) # 0xc746d0 <memset@got.plt>
0x0000000000502b3e <+6>: pushq $0x53
0x0000000000502b43 <+11>: jmpq 0x5025f8
End of assembler dump.
(gdb) x/ag 0x0000000000502b3e+0x771b92
0xc746d0 <memset@got.plt>: 0x3da0a7acb0 <memset>
(gdb) disassemble 0x3da0a7acb0
Dump of assembler code for function memset:
0x0000003da0a7acb0 <+0>: cmp $0x1,%rdx
0x0000003da0a7acb4 <+4>: mov %rdi,%rax
…
From the above GDB analysis, we know that the address of memset() has already been resolved in the GOT slot used by the PLT stub. That is to say, the first jmpq *0x771b92(%rip) jumps directly to the first instruction of memset(). Besides, since the program had been running on-line for nearly a day, the address of memset() should have been resolved long before.
2. Weird phenomenon
The core was triggered at the instruction => 0x0000003da0a7b006 <+854>: mov %rdx,-0x8(%rdi) in memset(). This is the instruction in memset() that writes the 0 at the very beginning of the buffer passed as memset()'s first parameter.
At the time of the core, in frame 0, $rdi is 0x2c32a670 and $rax is 0x2c32a668. From the assembly analysis (memset() copies %rdi into %rax with its second instruction, mov %rdi,%rax) and offline tests, $rax should hold the destination buffer of the memset, i.e., the first parameter of memset().
So in our example, $rax should equal m_ppEntry, whose value is stored into the object (the this pointer is held in %rbx) just before memset zeroes the buffer. However, the value of m_ppEntry is 0x2ab02c32a668.
Checking with the GDB command info files, the address 0x2c32a668 is indeed invalid (not mapped), while 0x2ab02c32a668 is a valid address.
3. Why is it weird?
What is weird about this core: if the real address of memset had already been resolved (very, very probably), then only a few instructions separate storing the pointer value into m_ppEntry and the attempt to memset it. And the register $rax (holding the passed buffer address) is not changed at all by these instructions. So how can m_ppEntry not be equal to $rax?
Even weirder: at the time of the core, the value of $rax (0x2c32a668) is exactly the lower 4 bytes of m_ppEntry (0x2ab02c32a668). If the two values are indeed related, is the m_ppEntry argument passed to memset being truncated? However, the few instructions involved all use %rax rather than %eax. By the way, I cannot reproduce this issue offline.
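The relationship between the two values can be checked with a few lines of C. This is a minimal sketch that only reproduces the arithmetic. It assumes the truncation behaves like a 32-bit register write, which on x86-64 zero-extends into the upper half. It says nothing about where such a write could have come from, and the helper names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: keep only the low 32 bits of a 64-bit address,
   the way a write to a 32-bit register such as %edi would. */
static uint64_t truncate_to_32(uint64_t addr)
{
    return (uint64_t)(uint32_t)addr;
}

/* Returns nonzero if `faulting` is exactly the low half of `expected`. */
static int looks_truncated(uint64_t expected, uint64_t faulting)
{
    assert(expected > UINT32_MAX);  /* only meaningful for addresses above 4 GiB */
    return truncate_to_32(expected) == faulting;
}
```

With the values from the core, looks_truncated(0x2ab02c32a668, 0x2c32a668) is true, which is exactly the coincidence described above.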
So,
1) Which address is valid? If 0x2c32a668 is valid, was the heap corrupted just between these few instructions? And then how do we explain that the value of m_ppEntry is 0x2ab02c32a668, and why are the low 4 bytes of the two values the same?
2) If 0x2ab02c32a668 is valid, why is the address truncated when passed into the 64-bit memset()? Under what conditions can this error occur? I cannot reproduce it offline. Is this a known bug? I didn't find it through Google.
3) Or is it due to some hardware or power issue that zeroed the upper 4 bytes of the %rdi passed to memset? (I'm very, very reluctant to believe this.)
Finally, any comments on this core are appreciated.
Thanks,
Gary Hu

I'm assuming most of the time this code works fine, given your mention of one day's running.
I agree signals are worth inspecting; it does look suspiciously like pointer truncation is happening somewhere else.
The only other thing I can think of is an issue with the new. Is there any possibility that you could occasionally end up calling an overloaded operator new?
Also, for completeness, what is the declaration of m_ppEntry?
I'm assuming you're using a nothrow new; otherwise the assert(m_ppEntry != NULL); would be meaningless.

Related

debugging: I don't see callback in disassembled caller as either a callq or inlined asm (Linux x86_64)

I am debugging a kernel oops (vmcore) using the crash utility on CentOS 7.9 and I have a function foo, which calls a callback, but when I disassemble foo I don't see a callq instruction that references the callback, nor do I see the assembly for the callback in the caller (suggesting it isn't inlined there).
However, the kernel stack shows that the RIP was on offset 33 of the callback function. What gives?
The last instruction for foo shows:
callq 0xffffffff91c9af10 <__stack_chk_fail>
Does this perhaps mean that the callback smashed the stack and glibc replaced it with this stack_chk thingamajig?
// signature for foo
foo(some_t *some, size_t off, size_t size,
my_callback_t *func, void *private)
// callback gets called in foo like:
ret = func(args)
Update
I do also see a callq to:
__x86_indirect_thunk_rax
Which I have no idea about. Perhaps that is somehow the call? Looking into it, it has something to do with a return trampoline, which sounds fun! XD

Normally you should see call *%rcx or something similar (the function pointer could be in a different register, or in stack memory, but some kind of indirect call). Or, with an optimized tail call, jmp *%rcx if your code can be optimized into return func(args). Again, possibly after moving the function pointer to a different register, and after cleaning up the stack.
(The 4th integer/pointer function arg arrives in RCX, but it could be in any other register when eventually used).
call __stack_chk_fail is just part of the -fstack-protector-strong machinery.
Kernel code uses gcc -mindirect-branch=thunk to enable GCC's Spectre mitigation for indirect calls, so yes, indirect calls will go through __x86_indirect_thunk_rax. (Or a different one taking the pointer in any other register.)
User-space code can of course be compiled with this option, too, although I don't think most distros enable it by default.
typedef int (*my_callback_t)(int);
int foo(int a, int b, int c, my_callback_t func)
{
    int ret = func(a);
    return ret;
}
compiles like this, with -O3 -Wall -mindirect-branch=thunk (GCC10.2 on Godbolt, with -mcmodel=kernel as well for good measure)
# your function
foo:
jmp __x86_indirect_thunk_rcx # like jmp *%rcx tailcall
# extra code that will be present once in the whole kernel, deduplicated by the linker
.section .text.__x86_indirect_thunk_rcx,"axG",@progbits,__x86_indirect_thunk_rcx,comdat
__x86_indirect_thunk_rcx:
call .LIND1
.LIND0:
pause
lfence # block speculation along this never-executed return path that return prediction will jump to.
jmp .LIND0 # this seems unnecessary after lfence in this unreachable code.
.LIND1:
mov %rcx, (%rsp) # overwrite the return address with your func ptr
ret # and pop it into RIP
This is a retpoline.
Without -mindirect-branch=thunk, you of course get the expected jmp *%rcx.

How to comprehend the flow of this assembly code

I can't understand how this works.
Here's part of the main() function, disassembled by objdump in Intel syntax:
0000000000000530 <main>:
530: lea rdx,[rip+0x37d] # 8b4 <_IO_stdin_used+0x4>
537: mov DWORD PTR [rsp-0xc],0x0
53f: movabs r10,0xedd5a792ef95fa9e
549: mov r9d,0xffffffcc
54f: nop
550: mov eax,DWORD PTR [rsp-0xc]
554: cmp eax,0xd
557: ja 57c <main+0x4c>
559: movsxd rax,DWORD PTR [rdx+rax*4]
55d: add rax,rdx
560: jmp rax
The rodata section dump:
.rodata
08b0 01000200 ecfdffff d4fdffff bcfdffff ................
08c0 9cfdffff 7cfdffff 6cfdffff 4cfdffff ....|...l...L...
08d0 3cfdffff 2cfdffff 0cfdffff ecfcffff <...,...........
08e0 d4fcffff b4fcffff 0cfeffff ............
At 530, rip is [537], so [rdx] = [537 + 37d] = 8b4.
My first question: how large is the value at rdx? Is the value ec, or ecfdffff, or something else? If it were declared as a DWORD, I could understand it being 'ecfdffff' (or is even this wrong? :( ), but this program don't declare it. How can I judge the value?
Then the program continues.
In 559, rax is first appeared.
My second question: is this rax related to eax (is eax a part of it), and at this point is rax = 0? If rax is 0, then 559 means rax = DWORD[rdx], so rax becomes ecfdffff; then 55d does rax += rdx, and I don't think the result can be a valid jump target. There must be something wrong, so please tell me where I went wrong.

I think I'll diverge from what Peter discussed (he provides good information) and get to the heart of some issues I think are causing you problems. When I first glanced at this question I assumed that the code was likely compiler generated and the jmp rax was likely the result of some control flow statement. The most likely way to generate such a code sequence is via a C switch. It isn't uncommon for a switch statement to be made of a jump table to say what code should execute depending on the control variable. As an example: the control variable for switch(a) is a.
This all made sense to me, and I wrote up a number of comments (now deleted) that ultimately resulted in bizarre memory addresses that jmp rax would go to. I had errands to run but when I returned I had the aha moment that you may have had the same confusion I did. This output from objdump using the -s option appeared as:
.rodata
08b0 01000200 ecfdffff d4fdffff bcfdffff ................
08c0 9cfdffff 7cfdffff 6cfdffff 4cfdffff ....|...l...L...
08d0 3cfdffff 2cfdffff 0cfdffff ecfcffff <...,...........
08e0 d4fcffff b4fcffff 0cfeffff ............
One of your questions seems to be about what values get loaded here. I never used the -s option to look at data in the sections and was unaware that although the dump splits the data out in groups of 4 bytes (32-bit values) they are shown in byte order as it appears in memory. I had at first assumed the output was displaying these values from Most Significant Byte to Least significant byte and objdump -s had done the conversion. That is not the case.
You have to manually reverse the bytes of each group of 4 bytes to get the real value that would be read from memory into a register.
ecfdffff in the output actually means ec fd ff ff. As a DWORD value (32-bit) you need to reverse the bytes to get the HEX value as you would expect when loaded from memory. ec fd ff ff reversed would be ff ff fd ec or the 32-bit value 0xfffffdec. Once you realize that then this makes a lot more sense. If you make this same adjustment for all the data in that table you'd get:
.rodata
08b0: 0x00020001 0xfffffdec 0xfffffdd4 0xfffffdbc
08c0: 0xfffffd9c 0xfffffd7c 0xfffffd6c 0xfffffd4c
08d0: 0xfffffd3c 0xfffffd2c 0xfffffd0c 0xfffffcec
08e0: 0xfffffcd4 0xfffffcb4 0xfffffe0c
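The byte reversal can be written out in C. This is a sketch that assumes the bytes are given in the order objdump -s prints them. It assembles each group of four bytes least-significant byte first, which is how an x86 load reads them. The name load_le32 is chosen for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Build a 32-bit value from 4 bytes in memory order, least significant
   byte first (x86 is little-endian). Gives the same result on any host. */
static uint32_t load_le32(const unsigned char b[4])
{
    return (uint32_t)b[0]
         | (uint32_t)b[1] << 8
         | (uint32_t)b[2] << 16
         | (uint32_t)b[3] << 24;
}
```

Feeding it the bytes ec fd ff ff gives 0xfffffdec, and 01 00 02 00 gives 0x00020001, matching the first row above.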
Now if we look at the code you have it starts with:
530: lea rdx,[rip+0x37d] # 8b4 <_IO_stdin_used+0x4>
This doesn't load data from memory, it is computing the effective address of some data and places the address in RDX. The disassembly from OBJDUMP is displaying the code and data with the view that it is loaded in memory starting at 0x000000000000. When it is loaded into memory it may be placed at some other address. GCC in this case is producing position independent code (PIC). It is generated in such a way that the first byte of the program can start at an arbitrary address in memory.
The # 8b4 comment is the part we are concerned about (you can ignore the information after that). The disassembly is saying if the program was loaded at 0x0000000000000000 then the value loaded into RDX would be 0x8b4. How was that arrived at? This instruction starts at 0x530 but with RIP relative addressing the RIP (instruction pointer) is relative to the address just after the current instruction. The address the disassembler used was 0x537 (the byte after the current instruction is the address of the first byte of the next instruction). The instruction adds 0x37d to RIP and gets 0x537+0x37d=0x8b4. The address 0x8b4 happens to be in the .rodata section which you are given a dump of (as discussed above).
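The RIP-relative arithmetic can be sketched the same way. The helper is hypothetical and takes the instruction's address and length from the disassembly; the length is the next instruction's address minus this one's. The displacement is a signed 32-bit value, so it can also point backwards.

```c
#include <assert.h>
#include <stdint.h>

/* Effective address of a RIP-relative operand: RIP already points at the
   next instruction (insn_addr + insn_len), and the signed 32-bit
   displacement is added to that. */
static uint64_t rip_relative(uint64_t insn_addr, uint32_t insn_len, int32_t disp)
{
    return insn_addr + insn_len + (uint64_t)(int64_t)disp;
}
```

For the lea at 0x530 (7 bytes long, next instruction at 0x537) with displacement 0x37d, it returns 0x8b4. The same rule explains the gdb x/ag 0x0000000000502b3e+0x771b92 command in the memset question above, which yields 0xc746d0.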
We now know that RDX contains the base of some data. The jmp rax suggests this is likely going to be a table of 32-bit values that are used to determine what memory location to jump to depending on the value in the control variable of a switch statement.
This statement appears to be storing the value 0 as a 32-bit value on the stack.
537: mov DWORD PTR [rsp-0xc],0x0
These appear to be variables that the compiler chose to store in registers (rather than memory).
53f: movabs r10,0xedd5a792ef95fa9e
549: mov r9d,0xffffffcc
R10 is being loaded with the 64-bit value 0xedd5a792ef95fa9e. R9D is the lower 32 bits of the 64-bit R9 register. The value 0xffffffcc is being loaded into the lower 32 bits of R9, but something else is also happening. In 64-bit mode, if the destination of an instruction is a 32-bit register, the CPU automatically zero-extends the value into the upper 32 bits of the register. The CPU guarantees that the upper 32 bits are zeroed.
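The register-write rule can be modelled in C by treating a 64-bit register as a uint64_t. This is a sketch with made-up helper names. A 32-bit write replaces the whole register with the zero-extended value. For contrast, a 16-bit write such as one to AX leaves bits 63..16 untouched.

```c
#include <assert.h>
#include <stdint.h>

/* e.g. mov r9d, 0xffffffcc: bits 63..32 are cleared, whatever they held. */
static uint64_t write_r32(uint64_t old_r64, uint32_t value)
{
    (void)old_r64;           /* the old upper half is discarded */
    return (uint64_t)value;  /* zero-extension */
}

/* e.g. mov ax, imm16: only bits 15..0 change, the rest are preserved. */
static uint64_t write_r16(uint64_t old_r64, uint16_t value)
{
    return (old_r64 & ~(uint64_t)0xffff) | value;
}
```

This is why, after mov r9d,0xffffffcc, R9 holds 0x00000000ffffffcc no matter what it held before.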
This is a NOP and doesn't do anything except align the next instruction to memory address 0x550. 0x550 is a value that is 16-byte aligned. This has some value and may hint that the instruction at 0x550 may be the first instruction at the top of a loop. An optimizer may place NOPs into the code to align the first instruction at the top of a loop to a 16-byte aligned address in memory for performance reasons:
54f: nop
Earlier the 32-bit stack based variable at rsp-0xc was set to zero. This reads the value 0 from memory as a 32-bit value and stores it in EAX. Since EAX is a 32-bit register being used as the destination for the instruction the CPU automatically filled the upper 32-bits of RAX to 0. So all of RAX is zero.
550: mov eax,DWORD PTR [rsp-0xc]
EAX is now being compared to 0xd. If it is above (ja) it goes to the instruction at 0x57c.
554: cmp eax,0xd
557: ja 57c <main+0x4c>
We then have this instruction:
559: movsxd rax,DWORD PTR [rdx+rax*4]
The movsxd instruction takes a 32-bit source operand (in this case the 32-bit value at memory address RDX+RAX*4), loads it into the bottom 32 bits of RAX, and then sign-extends the value into the upper 32 bits of RAX. Effectively, if the 32-bit value is negative (the most significant bit is 1), the upper 32 bits of RAX will all be set to 1. If the 32-bit value is not negative, the upper 32 bits of RAX will be set to 0.
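The sign extension can be expressed in portable C by copying bit 31 into the upper half explicitly. This is an illustrative sketch; movsxd_model is a hypothetical name, and the explicit form avoids relying on implementation-defined signed conversions.

```c
#include <assert.h>
#include <stdint.h>

/* Model of movsxd: widen a 32-bit pattern to 64 bits, filling bits 63..32
   with copies of bit 31. */
static uint64_t movsxd_model(uint32_t src)
{
    if (src & 0x80000000u)                           /* negative as int32 */
        return (uint64_t)src | 0xffffffff00000000ULL;
    return (uint64_t)src;                            /* non-negative: zero upper half */
}
```

For the first table entry, movsxd_model(0xfffffdec) gives 0xfffffffffffffdec, which is -0x214 as a signed 64-bit value.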
When this code is first encountered, RDX contains the base of some table at 0x8b4 from the beginning of the program loaded in memory. RAX is set to 0. Effectively the first 32 bits in the table are copied to RAX and sign-extended. As seen earlier, the value at offset 0x8b4 is 0xfffffdec. That 32-bit value is negative, so RAX contains 0xfffffffffffffdec.
Now to the meat of the situation:
55d: add rax,rdx
560: jmp rax
RDX still holds the address of the beginning of a table in memory. RAX is added to that value and the result is stored back in RAX (RAX = RAX+RDX). We then JMP to the address stored in RAX. So this code all suggests we have a jump table with 32-bit values that determine where we should go. So then the obvious question: what are the 32-bit values in the table? They are the difference between the beginning of the table and the address of the instruction we want to jump to.
We know the table is 0x8b4 from the location our program is loaded in memory. The C compiler told the linker to compute the difference between 0x8b4 and the address where the instruction we want to execute resides. If the program had been loaded into memory at 0x0000000000000000 (hypothetically), RAX = RAX+RDX would have resulted in RAX being 0xfffffffffffffdec + 0x8b4 = 0x00000000000006a0. We then use jmp rax to jump to 0x6a0. You didn't show the entire dump of memory but there is going to be code at 0x6a0 that will execute when the value passed to the switch statement is 0. Each 32-bit value in the JUMP table will be a similar offset to the code that will execute depending on the control variable in the switch statement. If we add 0x8b4 to all the entries in the table we get:
08b0: 0x000006a0 0x00000688 0x00000670
08c0: 0x00000650 0x00000630 0x00000620 0x00000600
08d0: 0x000005f0 0x000005e0 0x000005c0 0x000005a0
08e0: 0x00000588 0x00000568 0x000006c0
You should find that in the code you haven't provided us that these addresses coincide with code that appears after the jmp rax.
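The whole dispatch sequence can be replayed in C to reproduce the targets above. This is a sketch, not the program's actual source. The 14 offsets are the byte-swapped values from the .rodata table, and jump_target is a hypothetical helper. Returning 0 for an out-of-range index stands in for the ja 57c path.

```c
#include <assert.h>
#include <stdint.h>

/* The 14 entries of the table at 0x8b4, already converted to integers. */
static const uint32_t jump_offsets[14] = {
    0xfffffdec, 0xfffffdd4, 0xfffffdbc, 0xfffffd9c, 0xfffffd7c,
    0xfffffd6c, 0xfffffd4c, 0xfffffd3c, 0xfffffd2c, 0xfffffd0c,
    0xfffffcec, 0xfffffcd4, 0xfffffcb4, 0xfffffe0c,
};

/* Replays: cmp eax,0xd / ja ; movsxd rax,[rdx+rax*4] ; add rax,rdx.
   base is the runtime address of the table (0x8b4 if mapped at 0). */
static uint64_t jump_target(uint64_t base, uint32_t idx)
{
    if (idx > 0xd)                        /* unsigned compare, like ja */
        return 0;                         /* stand-in for the default path */
    uint32_t raw = jump_offsets[idx];
    uint64_t off = (raw & 0x80000000u)    /* movsxd: sign-extend 32 -> 64 */
                 ? (uint64_t)raw | 0xffffffff00000000ULL
                 : (uint64_t)raw;
    return base + off;                    /* wraps mod 2^64, like add rax,rdx */
}
```

Because each entry is relative to the table itself, the targets move with the load address. This is the property that keeps .rodata position-independent.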
Given that the memory address 0x550 was aligned, I have a hunch that this switch statement is inside a loop that keeps executing as some kind of state machine until the proper conditions are met for it to exit. Likely the value of the control variable used for the switch statement is changed by the code in the switch statement itself. Each time the switch statement is run the control variable has a different value and will do something different.
The control variable for the switch statement was originally checked for the value being above 0x0d (13). The table starting at 0x8b4 in the .rodata section has 14 entries. One can assume the switch statement probably has 14 different states (cases).

but this program don't declare it
You're looking at disassembly of machine code + data. It's all just bytes in memory. Any labels the disassembler does manage to show are ones that got left in the executable's symbol table. They're irrelevant to how the CPU runs the machine code.
(The ELF program headers tell the OS's program loader how to map it into memory, and where to jump to as an entry point. This has nothing to do with symbols, unless a shared library references some globals or functions defined in the executable.)
You can single-step the code in GDB and watch register values change.
In 559, rax is first appeared.
EAX is the low 32 bits of RAX. Writing to EAX zero-extends into RAX implicitly. From mov DWORD PTR [rsp-0xc],0x0 and the later reload, we know that RAX=0.
This must have been un-optimized compiler output (or volatile int idx = 0; to defeat constant propagation), otherwise it would know at compile time that RAX=0 and could optimize away everything else.
lea rdx,[rip+0x37d] # 8b4
A RIP-relative LEA puts the address of static data into a register. It's not a load from memory. (That happens later when movsxd with an indexed addressing mode uses RDX as the base address.)
The disassembler worked out the address for you; it's RDX = 0x8b4. (That is relative to the start of the file; when actually running, the program would be mapped at a virtual address like 0x55555...000.)
554: cmp eax,0xd
557: ja 57c <main+0x4c>
559: movsxd rax,DWORD PTR [rdx+rax*4]
55d: add rax,rdx
560: jmp rax
This is a jump table. First it checks for an out-of-bounds index with cmp eax,0xd, then it indexes a table of 32-bit signed offsets using EAX (movsxd with an addressing mode that scales RAX by 4), and adds that to the base address of the table to get a jump target.
GCC could just make a jump table of 64-bit absolute pointers, but chooses not to so that .rodata is position-independent as well and doesn't need load-time fixups in a PIE executable. (Even though Linux does support doing that.) See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84011 where this is discussed (although the main focus of that bug is that gcc -fPIE can't turn a switch into a table lookup of string addresses, and actually still uses a jump table)
The jump-offset table address is in RDX, this is what was set up with the earlier LEA.

Why does Linux save %ebp when doing a context switch?

When doing a context switch, x86 Linux (very cleverly) avoids saving and restoring EAX, EBX, ECX, EDX, ESI, and EDI. Of course, the userland values are saved on the kernel stack when switching into kernel mode. But the values in the kernel code are not saved -- instead, GCC directives are used which tell the compiler not to keep any values which are needed in those registers at the point where the switch happens.
Naturally, ESP has to be saved and restored. But this is what I don't understand: before ESP is switched, EBP is pushed on the kernel stack. I would think that EBP was being used as a frame pointer, but in my kernel debugger, the values sure don't look like it:
(gdb) print $esp
$22 = (void *) 0xc0025ec0
(gdb) print $ebp
$23 = (void *) 0xcf827f3c
The difference is way too big for EBP to be a frame pointer here. A comment in the code says that "EBP is saved/restored explicitly for wchan access", but I'm searching the code and can't figure out how that is so. Google isn't helping either. Can some kernel wizard step in and help here?

The difference is way too big for EBP to be a frame pointer here.
Presumably you have compiled your kernel without frame pointers enabled. See the relevant config option:
config SCHED_OMIT_FRAME_POINTER
	def_bool y
	prompt "Single-depth WCHAN output"
	depends on X86
	---help---
	  Calculate simpler /proc/<PID>/wchan values. If this option
	  is disabled then wchan values will recurse back to the
	  caller function. This provides more accurate wchan values,
	  at the expense of slightly more scheduling overhead.
	  If in doubt, say "Y".
The function get_wchan will do a sanity check on the ebp value, and only use it if it seems to be a frame pointer.
I think it would be better to use the above config flag in both places, so that ebp would not be saved unnecessarily if it isn't a frame pointer, and also the get_wchan would not bother if we knew there wouldn't be a frame pointer. That said, saving/restoring ebp only adds a very little overhead, so it's not tragic.

I have figured it out. EBP is a frame pointer, but at the point that I checked its value, ESP had already been switched to the new process' kernel stack, but EBP had not yet been restored (so it still had the value from the previous process). Sorry!!
The reason for storing the frame pointer is so that others can determine where in the kernel code a process went to sleep. Among other things, this is used by /proc/PID/wchan, which prints the name of the kernel function which made a process sleep.
The code which checks this is as follows (details removed for brevity):
unsigned long get_wchan(struct task_struct *p)
{
    unsigned long sp, bp, ip;
    int count = 0;

    sp = p->thread.sp;
    bp = *(unsigned long *) sp;          /* saved frame pointer at top of stack */
    do {
        ip = *(unsigned long *) (bp+4);  /* return address just above saved bp */
        if (!in_sched_functions(ip))
            return ip;
        bp = *(unsigned long *) bp;      /* follow the saved-bp chain */
    } while (count++ < 16);
    return 0;
}
Since EBP is pushed right before switching kernel stacks, the stack pointer of a sleeping process will point to the saved EBP (frame pointer) value. That frame pointer points to the caller's saved frame pointer, which points to the previous caller's, which points to the previous caller's... in other words, the saved frame pointers form a linked list going back up the call stack.
The frame pointer is saved immediately on function entry, so the value just above it (4 bytes up) is the return address to the calling function.
The loop in get_wchan walks that "linked list" (bp = *bp), checking the return address above each saved frame pointer, until it finds one that is not inside a scheduler function, i.e. an address within a function like ep_poll or futex_wait_queue_me.
get_wchan just returns an address inside a function; for display in /proc, lookup_symbol_name is used to convert that address into a function name.
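The frame-pointer walk can be sketched in user-space C by building a fake chain of frames. This is an illustration, not kernel code. walk_frames and in_fake_sched are hypothetical stand-ins for get_wchan and in_sched_functions. Each frame is a two-word array laid out as [saved frame pointer, return address], and the "scheduler" is just an address range passed in by the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for in_sched_functions(): "scheduler" code lives in [lo, hi). */
static int in_fake_sched(uintptr_t ip, uintptr_t lo, uintptr_t hi)
{
    return ip >= lo && ip < hi;
}

/* Walk saved frame pointers as get_wchan() does: bp[0] is the caller's
   saved bp, bp[1] is the return address one word above it. Returns the
   first return address outside the scheduler range, or 0. */
static uintptr_t walk_frames(const uintptr_t *bp, uintptr_t lo, uintptr_t hi)
{
    for (int count = 0; bp != NULL && count < 16; count++) {
        uintptr_t ip = bp[1];
        if (!in_fake_sched(ip, lo, hi))
            return ip;
        bp = (const uintptr_t *)bp[0];    /* bp = *bp */
    }
    return 0;
}
```

A chain whose first two return addresses fall in the scheduler range and whose third does not yields that third address. In the kernel, lookup_symbol_name would then turn that address into a name like ep_poll.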

What is %gs in Assembly

void return_input (void)
{
    char array[30];
    gets (array);
    printf("%s\n", array);
}
After compiling it in gcc, this function is converted to the following Assembly code:
push %ebp
mov %esp,%ebp
sub $0x28,%esp
mov %gs:0x14,%eax
mov %eax,-0x4(%ebp)
xor %eax,%eax
lea -0x22(%ebp),%eax
mov %eax,(%esp)
call 0x8048374
lea -0x22(%ebp),%eax
mov %eax,(%esp)
call 0x80483a4
mov -0x4(%ebp),%eax
xor %gs:0x14,%eax
je 0x80484ac
call 0x8048394
leave
ret
I don't understand these two lines:
mov %gs:0x14,%eax
xor %gs:0x14,%eax
What is %gs, and what exactly do these two lines do?
This is the compilation command:
cc -c -mpreferred-stack-boundary=2 -ggdb file.c

GS is a segment register; its use in Linux can be read up on here (it's basically used for per-thread data).
mov %gs:0x14,%eax
xor %gs:0x14,%eax
This code is used to verify that the stack hasn't been smashed or corrupted, using a canary value stored at GS+0x14; see this.
gcc -fstack-protector-strong is on by default in many modern distros; you can use gcc -fno-stack-protector to omit those checks. (On x86, thread-local storage is cheap, so GCC keeps the randomized canary value there, making it somewhat harder to leak.)

In the AT&T style assembly languages, the percent sigil generally indicates a register. In x86 family processors from 386 onwards, GS is one of the so-called segment registers. However, in protected mode environments segment registers work as selector registers.
A virtual memory selector represents its own mapping of virtual address space together with its own access regime. In practical terms, %gs:0x14 can be thought of as a reference into an array whose origin is held in %gs (albeit the CPU does a bit of extra dereferencing). On modern GNU/Linux systems, %gs is usually used to point at the thread-local storage region. In the code you're asking about, however, only one item of the TLS matters: the stack canary.
The idea is to detect a buffer overflow by placing a random but constant value on the stack before gets() is called, above its stack frame, and checking whether it is still there after gets() has returned. The value is called a stack canary, after the canaries coal miners used to carry, whose death signalled rising levels of poisonous gas. gets() has no business overwriting this part of the stack: it is outside its own stack frame, and it is not given a pointer to it. So if the stack canary has died, something has gone wrong in a dangerous way. (C as a programming environment happens to be particularly prone to this kind of mistake, and security researchers have learnt to exploit many of them over the last twenty years or so. Also, gets() is inherently prone to overflowing its target buffer.) You have not offered addresses with your code, but 0x80484ac is likely the address of leave. The call 0x8048394 executes on a mismatch; on a match, je 0x80484ac jumps over it. It is probably a call to __stack_chk_fail(), provided by libc to handle the stack corruption by fleeing the metaphorical poisonous mine.
The reason the canonical value of the stack canary is kept in the thread-local storage is that this way, every thread can have its own stack canary. Stacks themselves are normally not shared between threads, so it is natural to also not share the canary value.
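To make the check concrete, here is a user-space simulation in C, written as a sketch under stated assumptions. The fake frame, guard_value and the helper names are hypothetical, and the canary is a fixed constant rather than the random per-thread value at %gs:0x14. The layout mirrors the disassembly: the array starts at -0x22(%ebp) and the saved canary sits at -0x4(%ebp), directly after the 30-byte buffer. Everything lives inside one C object, so the simulated overflow stays well-defined.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BUF_SIZE 30   /* char array[30] in the question */

/* Simulated frame: the local buffer followed immediately by the canary slot. */
struct fake_frame {
    unsigned char bytes[BUF_SIZE + sizeof(uint32_t)];
};

/* Hypothetical stand-in for the per-thread value at %gs:0x14. */
static const uint32_t guard_value = 0xdeadbeefu;

/* mov %gs:0x14,%eax ; mov %eax,-0x4(%ebp) */
static void prologue(struct fake_frame *f)
{
    memcpy(f->bytes + BUF_SIZE, &guard_value, sizeof guard_value);
}

/* Copies len bytes into the buffer with no bounds check against BUF_SIZE,
   like gets(); the assert only keeps the simulation inside the struct. */
static void unchecked_copy(struct fake_frame *f, const char *src, size_t len)
{
    assert(len <= sizeof f->bytes);
    memcpy(f->bytes, src, len);
}

/* mov -0x4(%ebp),%eax ; xor %gs:0x14,%eax ; je ...
   Returns 1 if the canary survived, 0 if __stack_chk_fail would be called. */
static int epilogue_ok(const struct fake_frame *f)
{
    uint32_t saved;
    memcpy(&saved, f->bytes + BUF_SIZE, sizeof saved);
    return (saved ^ guard_value) == 0;
}
```

A short input such as "hello" leaves the canary intact. Writing 34 bytes overwrites the 4 canary bytes, so epilogue_ok returns 0, which is the case where the real code would call __stack_chk_fail().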

ES, FS, GS: Extra Segment Registers
Can be used as extra segment registers; also used in special instructions that span segments (like string copies).
Taken from here:
http://www.hep.wisc.edu/~pinghc/x86AssmTutorial.htm
Hope it helps.

Can anybody explain some simple assembly code?

I have just started to learn assembly. This is the dump from gdb for a simple program which prints hello ranjit.
Dump of assembler code for function main:
0x080483b4 <+0>: push %ebp
0x080483b5 <+1>: mov %esp,%ebp
0x080483b7 <+3>: sub $0x4,%esp
=> 0x080483ba <+6>: movl $0x8048490,(%esp)
0x080483c1 <+13>: call 0x80482f0 <puts@plt>
0x080483c6 <+18>: leave
0x080483c7 <+19>: ret
My questions are:
1) Why is ebp pushed onto the stack at the start of every program? What is in ebp that is necessary to run this program?
2) In the second line, why is ebp copied to esp?
3) I can't get the third line at all. What I know about SUB syntax is "sub dest,source", but here how can esp be subtracted from 4 and stored in 4?
4) What is this value "$0x8048490"? Why is it moved to esp, and why is esp enclosed in parentheses this time? Does that denote something different from esp without them?
5) The next line is a call to a function, but what is this "0x80482f0"?
6) What are leave and ret? (Maybe ret means returning to libc?)
Operating system: Ubuntu 10, compiler: gcc

ebp is used as a frame pointer in Intel processors (assuming you're using a calling convention that uses frames).
It provides a known point of reference for locating passed-in parameters (on one side) and local variables (on the other) no matter what you do with the stack pointer while your function is active.
The sequence:
push %ebp        # save caller's frame pointer
mov %esp,%ebp    # create a new frame pointer
sub $N,%esp      # make space for locals
saves the frame pointer for the previous stack frame (the caller), loads up a new frame pointer, then adjusts the stack to hold things for the current "stack level".
Since parameters would have been pushed before setting up the frame, they can be accessed with [bp+N] where N is a suitable offset.
Similarly, because locals are created "under" the frame pointer, they can be accessed with [bp-N].
The leave instruction is a single instruction that undoes that stack frame. You used to have to do it manually, but Intel introduced a faster way of getting it done. It's functionally equivalent to:
mov %ebp, %esp   # restore the old stack pointer
pop %ebp         # and frame pointer
(the old, manual way).
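The prologue/epilogue pair can be simulated on a toy stack to check that leave exactly undoes it. This is a sketch with a hypothetical struct cpu. Its esp and ebp are word indices into an array rather than real addresses, so sub $0x4,%esp from the question corresponds to reserving one word.

```c
#include <assert.h>
#include <stdint.h>

/* Toy 32-bit machine: a downward-growing stack of words, with esp/ebp
   holding indices into it; esp indexes the most recently pushed word. */
enum { STACK_WORDS = 64 };
struct cpu {
    uint32_t stack[STACK_WORDS];
    uint32_t esp, ebp;
};

static void push(struct cpu *c, uint32_t v) { c->stack[--c->esp] = v; }
static uint32_t pop(struct cpu *c)          { return c->stack[c->esp++]; }

/* push %ebp ; mov %esp,%ebp ; sub $n,%esp  (n in words here) */
static void prologue(struct cpu *c, uint32_t n_words)
{
    push(c, c->ebp);      /* save caller's frame pointer */
    c->ebp = c->esp;      /* new frame pointer = current top of stack */
    c->esp -= n_words;    /* reserve space for locals */
}

/* leave == mov %ebp,%esp ; pop %ebp */
static void leave(struct cpu *c)
{
    c->esp = c->ebp;
    c->ebp = pop(c);
}
```

Starting from esp at the end of the array, prologue(&c, 1) leaves ebp pointing at the saved old ebp and esp one word below it. leave(&c) then restores both registers to their original values.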
Answering the questions one by one in case I've missed something:
1) To start a new frame. See above.
2) It isn't. esp is copied to ebp. This is AT&T notation (the %reg is a dead giveaway) where (among other things) source and destination operands are swapped relative to Intel notation.
3) See the answer to (2) above. You're subtracting 4 from esp, not the other way around.
4) It's a parameter being passed to the function at 0x80482f0. It's not being loaded into esp but into the memory pointed at by esp. In other words, it's being stored at the top of the stack (in the slot reserved by the sub), which has the same effect as pushing it. Since the function being called is puts (see (5) below), it will be the address of the string you want putsed.
5) The function name is in the <> after the address. It's calling the puts function (probably the one in the standard library, though that's not guaranteed). For a description of what the PLT is, see here.
6) I've already explained leave above as unwinding the current stack frame before exiting. The ret simply returns from the current function. If the current function is main, it's going back to the C startup code.

In my career I learned several assembly languages. You didn't mention which one this is, but it appears to be Intel x86 (segmented memory model, as PaxDiablo pointed out). However, I have not used assembly since last century (lucky me!). Here are some of your answers:
1) The EBP register is pushed onto the stack at the beginning because we need it further along in other operations of the routine. You don't want to just discard its original value, thus corrupting the integrity of the rest of the application.
2) If I remember correctly (I may be wrong, it's been a long time), it is the other way around: we are moving %esp INTO %ebp. Remember we saved %ebp in the previous line? Now we are storing a new value in it without destroying the original one.
3) Actually they are SUBtracting the value four (4) FROM the contents of the %esp register. The result is not stored in "four" but in %esp. If %esp held 0xFFF8 before the SUB, it will contain 0xFFF4 after it. I think this is called an "immediate" operand, if my memory serves me. What is happening here (I reckon) is the computation of a memory address (4 bytes less).
4) The value $0x8048490 I don't know. However, it is NOT being moved INTO %esp but rather INTO THE ADDRESS POINTED TO BY THE CONTENTS OF %esp. That is why the notation is (%esp) rather than %esp. This is a common notation in all the assembly languages I came across in my career. If, on the other hand, the right operand were simply %esp, the value would be moved INTO the %esp register. Basically the %esp register's contents are being used for addressing.
5) It is a fixed value, and the string on the right makes me think that this value is the address through which the puts() (Put String) library routine is called (its PLT entry).
6) "leave" is an instruction that is the equivalent of "mov %ebp,%esp" followed by "pop %ebp". Remember we saved the contents of %ebp at the beginning; now that we are done with the routine, we restore it into the register so that the caller gets back its context. The "ret" instruction is the final instruction of the routine; it "returns" to the caller.
