Using linked lists with Bison - malloc

Suppose my YYSTYPE is a struct with a pointer to the next struct. Can I point that pointer at the YYSTYPE values of other grammar symbols, or are their YYSTYPE values local, so that they disappear after the derivation ends and cause a segfault later on?

YYSTYPE is the type of the values on Yacc's semantic stack.
If the pointers you create point to items actually stored on Yacc's stack, which would typically happen by referencing $1 (or &$1 to get a pointer), then you are indeed pointing at data that will be released and reused, and you are in for a world of hurt: segmentation faults if you're lucky, confusion and mishandled information if you're unlucky.
If the pointers you create are to items of type YYSTYPE that you manage yourself, then of course there is no problem: you control their lifetime, not Yacc.
Copy anything you need from things like $1 into your own storage.
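A minimal sketch of that "copy into your own storage" approach. The node type and the prepend helper are illustrative, not from the question; the assumption is that each item's semantic value is a string reached through $n in a Bison action:
#include <stdlib.h>
#include <string.h>

/* A list node we own; its lifetime is managed by us, not by Yacc. */
struct node {
    char *text;          /* our own deep copy, not a pointer into $n */
    struct node *next;
};

/* Called from a Bison action, e.g.:
 *   list: list item   { $$ = prepend($1, $2); }
 * where $2 is a char * living on Yacc's stack. */
struct node *prepend(struct node *head, const char *item_text)
{
    struct node *n = malloc(sizeof *n);
    if (n == NULL)
        return head;              /* real code should report the failure */
    n->text = strdup(item_text);  /* copy before Yacc reuses its slot */
    n->next = head;
    return n;
}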

Related

Allocated structure for value_p to be used with VPI vpi_put_value()

I'm implementing Verilog "force" and "release" using VPI so that they can be called from C routines. To force a value to a vector net, I need to create an array of s_vpi_vecval for value_p.
I allocated a storage for the array and populated it with the value I want.
Then I used vpi_put_value() to force the vector value to the net.
The standard, IEEE 1800, clearly says that the calling routine needs to allocate space for such a structure for value_p, but it doesn't say when it is safe to free that storage.
Is it safe to free the storage just after calling vpi_put_value()?
I assume vpi_put_value() will keep a copy of the force value on its side.
Any insights to this are greatly appreciated.
I think it is safe to assume you can free the memory after making the VPI call - this is how all other VPI routines work.
You can easily test this by making two calls to vpi_put_value() using the same struct pointer.
I've put in a request to have the IEEE standard clarified.
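As a sketch of that pattern (hedged: net is assumed to be a valid vpiHandle for a 64-bit vector net obtained elsewhere, e.g. via vpi_handle_by_name, and the forced value is illustrative):
#include <stdlib.h>
#include <vpi_user.h>

/* Force a 64-bit value onto a vector net, then free the value_p
   storage immediately, per the assumption above that the simulator
   consumes the data during the call. */
static void force_net_value(vpiHandle net)
{
    s_vpi_value val;
    int nwords = 2;  /* 64 bits -> two 32-bit aval/bval chunks */
    p_vpi_vecval vec = calloc(nwords, sizeof(s_vpi_vecval));
    if (vec == NULL)
        return;

    vec[0].aval = 0xDEADBEEF;  /* bits 31..0 */
    vec[0].bval = 0;           /* bval = 0: no x/z bits */
    vec[1].aval = 0x00000001;  /* bits 63..32 */
    vec[1].bval = 0;

    val.format = vpiVectorVal;
    val.value.vector = vec;
    vpi_put_value(net, &val, NULL, vpiForceFlag);

    free(vec);  /* freed right after the call, as discussed above */
}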

Explicitly removing sensitive data from memory?

The recent leak from Wikileaks has the CIA doing the following:
DO explicitly remove sensitive data (encryption keys, raw collection
data, shellcode, uploaded modules, etc) from memory as soon as the
data is no longer needed in plain-text form.
DO NOT RELY ON THE OPERATING SYSTEM TO DO THIS UPON TERMINATION OF
EXECUTION.
Me being a developer in the *nix world, I see this as merely changing the value of a variable (making sure I pass by reference, not by value): if it's a string that's 100 characters, write zeros over all 101 bytes, including the terminating null. Is it really this simple? If not, why, and what should be done instead?
Note: There are similar questions that asked this, but they are in the C# and Windows world, so I do not consider this question a duplicate.
It should be this simple. The devil is in the details.
- Memory allocation functions such as realloc are not guaranteed to leave memory alone (you should not rely on their doing it one way or the other; see also this question). If you allocate 1K of memory, then realloc it to 10K, your original 1K might still be there somewhere else, containing its sensitive payload. It might then be handed out to another, non-sensitive variable or buffer, and through that new variable it might be possible to read part or all of the old content, much as happened with slack space on some filesystems.
- Manually zeroing memory (and, with most compilers, bzero and memset count as manual zeroing) might be blithely optimized out, especially if you're zeroing a local variable that is never read again (a "bug" that is actually a feature, with a workaround).
- Some functions might leave "traces" in local buffers or in memory they allocate and deallocate.
- In some languages and frameworks, whole portions of data can end up being moved around (e.g. during so-called "garbage collection", as noted by @gene). You may be able to tell the GC not to process your sensitive area, or to "pin" it in place; if so, you must do it. Otherwise the data might end up in multiple partial copies.
- Information might have come through and left traces you're not aware of (trivial example: a password received over the network might linger in the network library's read buffer).
- Live memory might be swapped out to disk.
Example of realloc doing its thing. Memory gets partly rewritten, and with some libraries this will only "work" if a is not the only allocated area (you may need to allocate something else immediately after a, so that a is not the last object and therefore free to grow in place):
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

int main() {
    char *a;
    char *b;
    a = malloc(1024);
    strcpy(a, "Hello");
    strcpy(a + 200, "world");
    printf("a at %08ld is %s...%s\n", (long)(intptr_t)a, a, a + 200);
    b = realloc(a, 10240);
    strcpy(b, "Hey!");
    /* Reading through a after realloc is undefined behavior; that is
       exactly the point: the old block may still hold its old bytes. */
    printf("a at %08ld is %s...%s, b at %08ld is %s\n",
           (long)(intptr_t)a, a, a + 200, (long)(intptr_t)b, b);
    return 0;
}
Output:
a at 19828752 is Hello...world
a at 19828752 is 8????...world, b at 19830832 is Hey!
So the memory at address a was partly rewritten - "Hello" is lost, "world" is still there (as well as at b + 200).
So you need to handle reallocations of sensitive areas yourself; better yet, pre-allocate it all at program startup. Then tell the OS that the sensitive area must never be swapped to disk (on POSIX systems, mlock does this). Then you need to zero the area in such a way that the compiler can't optimize the zeroing away. And you need to use a language low-level enough that you can be sure it doesn't do things behind your back: a simple string concatenation could spawn two or three copies of the data. I'm fairly certain it happened in PHP 5.2.
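A sketch of those steps on a POSIX system. mlock() is standard POSIX; explicit_bzero() is the glibc (2.25+) and BSD spelling of "zeroing the compiler can't remove", so treat its availability as an assumption about your platform (elsewhere, a loop through a volatile pointer is the usual fallback):
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

#define SECRET_LEN 64

int main(void) {
    /* Pre-allocated at startup; never realloc'd, so no stray copies. */
    static unsigned char secret[SECRET_LEN];

    /* Keep the pages holding the secret out of swap. */
    if (mlock(secret, sizeof secret) != 0) {
        perror("mlock");
        return 1;
    }

    /* ... obtain and use the secret here ... */

    /* Zero it in a way the compiler is not allowed to optimize away. */
    explicit_bzero(secret, sizeof secret);

    munlock(secret, sizeof secret);
    return 0;
}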
Ages ago, before valgrind existed, I wrote myself a small library inspired by Steve Maguire's Writing Solid Code; apart from overriding the various memory and string functions, I ended up overwriting memory and then calculating a checksum of the overwritten buffer. This was not for security; I used it to track buffer over/underflows, double frees, use of freed memory, that kind of thing.
And then you need to ensure your failsafes work - for example, what happens if the program aborts? Might it be possible to make it abort on purpose?
You need to implement defense in depth, and always look at ways to keep as little information around as possible - for example clearing the intermediate buffers during a calculation rather than waiting and freeing the whole lot in one fell swoop at the very end, or just when exiting the program; keeping hashes instead of passwords when at all possible; and so on.
Of course all this depends on how sensitive the information is and what the attack surface is likely to be (mandatory xkcd reference: here). Rebooting the PC with a memtest86 image could be a viable alternative. Think of a dual-boot computer with memtest86 set to test memory and power down the PC as default boot option. When you want to turn off the system... you reboot it instead. The PC will reboot, enter memtest86 by default, and before powering off for good, it'll start filling all available RAM with marching troops of zeros and ones. Good luck freeze-booting information from that.
Zeroing out secrets (passwords, keys, etc) immediately after you are done with them is fairly standard practice. The difficulty is in dealing with language and platform features that can get in your way.
For example, C++ compilers can optimize out calls to memset if the compiler determines that the data is not read after the write. Or the operating system may have paged the memory out to disk, potentially leaving the data recoverable there.

What is the poisoned NUL byte, in the 1998 and 2014 editions?

I have to give a 10-minute presentation about the "poisoned null-byte (glibc)".
I have searched a lot but found nothing, and I need help, because Linux internals and memory/process management aren't my thing.
Here is the original article, and here is an old article about an earlier version of the same problem.
What I want is a short and simple explanation of the old and new versions of the problem, and/or enough references that I can read up on this security threat.
To even begin to understand how this attack works, you will need at least a basic understanding of how a CPU works, how memory works, what the "heap" and "stack" of a process are, what pointers are, what libc is, what linked lists are, how function calls are implemented at the machine level (including calls to function pointers), what the malloc and free functions from the C library do, and so on. Hopefully you at least have some basic knowledge of C programming? (If not, you will probably not be able to complete this assignment in time.)
If you have a couple "gaps" in your knowledge of the basic topics mentioned above, hit the books and fill them in as quickly as you can. Talk to others if you need to, to make sure you understand them. Then read the following very carefully. This will not explain everything in the article you linked to, but will give you a good start. OK, ready? Let's start...
C strings are "null-terminated". That means the end of a string is marked by a zero byte. So for example, the string "abc" is represented in memory as (hex): 0x61 0x62 0x63 0x00. Notice, that 3-character string actually takes 4 bytes, due to the terminating null.
Now if you do something like this:
char *buffer = malloc(3); // not checking for error, this is just an example
strcpy(buffer, "abc");
...then that terminating null (zero byte) will go past the end of the buffer and overwrite something. We allocated a 3-byte buffer, but copied 4 bytes into it. So whatever was stored in the byte right after the end of the buffer will be replaced by a zero byte.
That was what happened in __gconv_translit_find. They had a buffer, which had been allocated with enough space to append ".so", including the terminating null byte, onto the end of a string. But they copied ".so" in starting from the wrong position. They started the copy operation one byte too far to the "right", so the terminating null byte went past the end of the buffer and overwrote something.
Now, when you call malloc to get back a dynamically allocated buffer, most implementations of malloc actually store some housekeeping data right before the buffer. For example, they might store the size of the buffer. Later, when you pass that buffer to free to release the memory, so it can be reused for something else, it will find that "hidden" data right before the beginning of the buffer, and will know how many bytes of memory you are actually freeing. malloc may also "hide" other housekeeping data in the same location. (In the 2014 article you referred to, the implementation of malloc used also stored some "flag" bits there.)
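As a rough picture of that housekeeping data, here is a simplified, glibc-style chunk header; the field and flag names follow glibc's malloc, but the real struct malloc_chunk has more to it:
#include <stddef.h>

/* Simplified view: this header sits immediately before the pointer
   malloc returns, and the low bits of 'size' are flag bits. */
struct chunk_header {
    size_t prev_size;   /* size of the previous chunk, if it is free */
    size_t size;        /* this chunk's size; low 3 bits are flags */
};

#define PREV_INUSE     0x1  /* previous chunk is in use */
#define IS_MMAPPED     0x2  /* chunk was allocated via mmap */
#define NON_MAIN_ARENA 0x4  /* chunk belongs to a non-main arena */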
The attack described in the article passed carefully crafted arguments to a command-line program, designed to trigger the buffer overflow error in __gconv_translit_find, in such a way that the terminating null byte would wipe out the "flag" bits stored by malloc -- not the flag bits for the buffer which overflowed, but those for another buffer which was allocated right after the one which overflowed. (Since malloc stores that extra housekeeping data before the beginning of an allocated buffer, and we are overrunning the previous buffer. You follow?)
The article shows a diagram, where 0x00000201 is stored right after the buffer which overflows. The overflowing null byte wipes out the bottom 1 and changes that into 0x00000200. That might not make sense at first, until you remember that x86 CPUs are little-endian -- if you don't understand what "little-endian" and "big-endian" CPUs are, look it up.
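You can watch that single-byte wipe happen with a few lines of C (the value 0x00000201 is taken from the article's diagram):
#include <stdio.h>
#include <stdint.h>

int main(void) {
    /* A chunk size field: size 0x200 with the PREV_INUSE flag set. */
    uint32_t size_field = 0x00000201;

    /* On a little-endian CPU the least significant byte is stored
       first, so one overflowing NUL byte lands exactly on the flag. */
    unsigned char *p = (unsigned char *)&size_field;
    p[0] = 0x00;  /* the poisoned NUL byte */

    printf("0x%08x\n", size_field);  /* prints 0x00000200 on x86 */
    return 0;
}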
Later, the buffer whose flag bit was wiped out is passed to free. As it turns out, wiping out that one flag bit "confuses" free and makes it, in turn, also overwrite some other memory. (You will have to understand the implementation of malloc and free which are used by GNU libc, in order to understand why this is so.)
By carefully choosing the input arguments to the original program, you can set things up so that the memory overwritten by the "confused" free is that used for something called tls_dtor_list. This is a linked list maintained by GNU libc, which holds pointers to certain functions which it must call when the main program is exiting.
So tls_dtor_list is overwritten. The attacker has set things up just right, so that the function pointers in the overwritten tls_dtor_list will point to some code which they want to run. When the main program is exiting, some code in libc iterates over that list and calls each of the function pointers. Result: the attacker's code is executed!
Now, in this case, the attacker already has access to the target system. If all they can do is run some code with the privilege level of their own account, that doesn't get them anywhere. They want to run code with root (administrator) privileges. How is that possible? It is possible because the buggy program is a setuid program, owned by root. If you don't know what "setuid" programs in Unix are, look it up and make sure you understand it, because that is also a key to the whole exploit.
This is all about the 2014 article -- I didn't look at the one from 1998. Good luck!

Legacy Fortran Code Compiling Problems

I have legacy Fortran code ranging in date from the 60s to 90s that I need to be able to compile.
The code works as it is written, even if it uses some old practices that are no longer standard.
It was successfully built on the Intel Visual Fortran 2011 Compiler and Visual Studio 2008. I am now on Visual Studio 2012 and Intel Visual Fortran 2013. I can't seem to find the right options to flip to allow it to build.
The major problem is that huge EQUIVALENCE'd arrays are used, and often, instead of passing an array or an actual pointer to a subroutine, the code passes a single element of those equivalenced arrays, with the callee implicitly treating it as the start of a sequence of values. The main errors are:
the type of actual argument differs from the type of dummy argument
if the actual argument is scalar, the dummy argument shall be scalar unless the actual argument is of type character or is an element of an array that is not assumed shape, pointer, or polymorphic
Once again: I know that the code does work as built. Any helpful suggestions will be appreciated.
The answer for my particular problem was to go to project properties -> diagnostics -> Language Usage Warnings -> Check Routine Interfaces and set it to "No".
It is about a year since your query, but I just had a similar problem involving Fortran pointers and dynamic memory allocation that I was able to fix.
A key problem was that memory address values in the legacy code fit an INTEGER*4 datatype, whereas on the new operating system they were INTEGER*8. The way dynamic memory allocation worked was that the location of a dummy array HEAP(ii) was used as an anchor point, relative to which the absolute address returned by "malloc" could be referenced. If the HEAP base address (i.e. LOC(HEAP(1))) was 11111, if the absolute address returned by "malloc" was 1731111111111111, and if HEAP had been declared INTEGER*4 (or equivalently REAL*4), then the first location returned by "malloc" was equivalent to HEAP((1731111111111111 - 11111)/4 + 1). Using HEAP(iaddress) as an argument in subroutine calls, with iaddress at these insanely large index values, allowed the appropriate portions of HEAP to be used as the array argument in the called subroutine.
Places with problems included:
POINT 1.) The absolute address of the allocated memory, as returned by "malloc", had been stored at a position in HEAP() just after the stored calculation-data values, so that it could be used later to free the memory. If this INTEGER*8 address was stored straddling the INTEGER*8 alignment boundaries of the new operating system/compiler, it would give a "HEXADECIMALS of DOOM" style crash when you tried to read this stored INTEGER*8 memory address back through the index of the INTEGER*4 HEAP. This could be avoided by adding one INTEGER*4 location of padding before the storage spot for the address, needed whenever the preceding data segment held an odd number of *4 data values. The stored address could then be read back as an INTEGER*8, when needed, by EQUIVALENCE-ing INTEGER*4 HEAP() to INTEGER*8 LONGHEAP() and using an index in LONGHEAP() of about half the insanely large index used in HEAP() (with -1's and +1's to account for Fortran array indexing starting at 1).
POINT 2.) All variables that stored memory address locations needed to be tracked down and switched from INTEGER*4 to INTEGER*8. This was tricky in subroutines that called functions that called functions. One construct to look for is the swapping of memory locations (TEMP = Iaddress; Iaddress = Jaddress; Jaddress = TEMP): in such swaps, make sure TEMP is INTEGER*8. Other memory-address arithmetic and temporary storage variables can cause problems as well. Also, once you have made this switch, you need to check whether these new INTEGER*8 variables are used as arguments in calls to subroutines and functions. If so, modify the called functions/subroutines appropriately, and make sure the datatypes are consistent in all other calls to the same function/subroutine.
It might have been possible to just switch everything to *8 and check the statements involving indexing arithmetic, but I wasn't sure whether the new arrays of LOGICAL*8 values and some of the fancy stuff with the old INTEGER*2 arrays would have worked.

Linux kernel struct file pointer

Is it guaranteed that a struct file pointer won't be deallocated and reallocated somewhere else in memory during its open to close lifecycle?
I want to uniquely identify file structs that are passed to a device driver (through read/write/open etc) and was wondering if I could just use the pointer to the file struct for identification. The only other alternative I see would be to store a unique identifier in private_data, if it is not guaranteed that the struct file pointer will not change.
Nothing will happen to the pointer: the struct file is allocated at open and stays at the same address until the last reference to it is dropped at close, so it is stable for the open-to-close lifetime. But if this pointer is ever passed across the kernel-user boundary (or a computer network), you have to check that the pointer you get back is one of the valid pointers, and perhaps the appropriate one (the one expected from this particular caller, if you can identify them). Otherwise you will have a huge security hole.
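A minimal sketch of the private_data alternative, for contrast. The mydev names are hypothetical, not from the question; only kzalloc, kfree, the atomic64 helpers, and the standard open/release signatures are real kernel API:
#include <linux/atomic.h>
#include <linux/fs.h>
#include <linux/slab.h>

/* Hypothetical per-open state, one instance per open file. */
struct mydev_session {
    u64 id;  /* unique identifier for this open-to-close lifetime */
};

static atomic64_t mydev_next_id = ATOMIC64_INIT(0);

static int mydev_open(struct inode *inode, struct file *filp)
{
    struct mydev_session *s = kzalloc(sizeof(*s), GFP_KERNEL);
    if (!s)
        return -ENOMEM;
    s->id = atomic64_inc_return(&mydev_next_id);
    filp->private_data = s;  /* available in read/write/ioctl/release */
    return 0;
}

static int mydev_release(struct inode *inode, struct file *filp)
{
    kfree(filp->private_data);
    return 0;
}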
