I am looking into implementing a FIFO that can hopefully be reused in multiple devices.
The FIFO template shall expose these methods: push, pop, and len. I want the size of the FIFO to be defined as a parameter, and I want the FIFO to hold any of the standard integer types (uint8, uint16, uint32, uint64, int8, int16, int32, and int64). Pushing a 64-bit integer into an 8-bit FIFO shall cause truncation. The FIFO shall also support checkpointing.
I started with this code but it does not compile:
template fifo {
    param fifo_size;
    saved uint64 buf[fifo_size];
}
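To pin down the semantics I'm after, here is a rough circular-buffer sketch in plain C (not DML; FIFO_SIZE stands in for the fifo_size parameter, and the fixed 8-bit element type is just to make the truncation behaviour visible):

#include <stdint.h>

#define FIFO_SIZE 16            /* stands in for the fifo_size parameter */

typedef struct {
    uint8_t buf[FIFO_SIZE];     /* an 8-bit FIFO in this example */
    unsigned head, count;
} fifo8_t;

/* Pushing a wider integer truncates to the element type. The full/empty
   policies are placeholders; the question does not specify them. */
static void push(fifo8_t *f, uint64_t v)
{
    if (f->count == FIFO_SIZE)
        return;                                           /* full: silently drop */
    f->buf[(f->head + f->count++) % FIFO_SIZE] = (uint8_t)v;  /* truncation */
}

static uint64_t pop(fifo8_t *f)
{
    uint8_t v = f->buf[f->head];                          /* caller checks len() > 0 */
    f->head = (f->head + 1) % FIFO_SIZE;
    f->count--;
    return v;
}

static unsigned len(const fifo8_t *f)
{
    return f->count;
}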
I am working on a project where data is read from memory. Some of this data consists of integers, and there was a problem accessing them at unaligned addresses. My idea would be to use memcpy for that, i.e.
uint32_t readU32(const void* ptr)
{
uint32_t n;
memcpy(&n, ptr, sizeof(n));
return n;
}
The solution from the project source I found is similar to this code:
uint32_t readU32(const uint32_t* ptr)
{
union {
uint32_t n;
char data[4];
} tmp;
const char* cp=(const char*)ptr;
tmp.data[0] = *cp++;
tmp.data[1] = *cp++;
tmp.data[2] = *cp++;
tmp.data[3] = *cp;
return tmp.n;
}
So my questions:
Isn't the second version undefined behaviour? The C standard says in 6.3.2.3 Pointers, at paragraph 7:
A pointer to an object or incomplete type may be converted to a pointer to a different object or incomplete type. If the resulting pointer is not correctly aligned for the pointed-to type, the behavior is undefined.
As the calling code has, at some point, used a char* to handle the memory, there must be some conversion from char* to uint32_t*. Isn't the result of that undefined behaviour, then, if the uint32_t* is not correctly aligned? And if it is correctly aligned, there is no point to the function, as you could simply write *(uint32_t*)ptr to fetch the memory. Additionally, I think I read somewhere that the compiler may assume an int* is aligned correctly, so any unaligned int* would mean undefined behaviour as well, and the generated code for this function might take shortcuts because it may expect the function argument to be aligned properly.
The original code has volatile on the argument and all variables because the memory contents could change (it's a data buffer, not registers, inside a driver). Maybe that's why it does not use memcpy, since memcpy won't work on volatile data. But in which world would that make sense? If the underlying data can change at any time, all bets are off. The data could even change between those byte copy operations. So you would have to have some kind of mutex to synchronize access to this data. But if you have such synchronization, why would you need volatile?
Is there a canonical/accepted/better solution to this memory access problem? After some searching, I have come to the conclusion that you need a mutex, do not need volatile, and can use memcpy.
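In other words, something like this sketch (pthreads, with illustrative names, not the actual project code):

#include <pthread.h>
#include <stdint.h>
#include <string.h>

/* A mutex guards the shared buffer, and memcpy handles the (possibly
   unaligned) read; no volatile needed. */
static pthread_mutex_t buf_lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned char shared_buf[256];

uint32_t read_u32_at(size_t offset)
{
    uint32_t n;
    pthread_mutex_lock(&buf_lock);
    memcpy(&n, shared_buf + offset, sizeof n);  /* fine at any alignment */
    pthread_mutex_unlock(&buf_lock);
    return n;
}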
P.S.:
# cat /proc/cpuinfo
processor : 0
model name : ARMv7 Processor rev 10 (v7l)
BogoMIPS : 1581.05
Features : swp half thumb fastmult vfp edsp neon vfpv3 tls
CPU implementer : 0x41
CPU architecture: 7
CPU variant : 0x2
CPU part : 0xc09
CPU revision : 10
This code
uint32_t readU32(const uint32_t* ptr)
{
union {
uint32_t n;
char data[4];
} tmp;
const char* cp=(const char*)ptr;
tmp.data[0] = *cp++;
tmp.data[1] = *cp++;
tmp.data[2] = *cp++;
tmp.data[3] = *cp;
return tmp.n;
}
passes the pointer as a uint32_t *. If it's not actually a uint32_t, that's UB. The argument should probably be a const void *.
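A sketch with that change applied (still reading byte-wise, so the caller never has to form a misaligned uint32_t * at all):

#include <stdint.h>

uint32_t readU32(const void* ptr)
{
    union {
        uint32_t n;
        unsigned char data[sizeof(uint32_t)];
    } tmp;
    const unsigned char* cp = ptr;  /* converting to a character type is fine */
    tmp.data[0] = cp[0];
    tmp.data[1] = cp[1];
    tmp.data[2] = cp[2];
    tmp.data[3] = cp[3];
    return tmp.n;
}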
The use of a const char * in the conversion itself is not undefined behavior. Per 6.3.2.3 Pointers, paragraph 7 of the C Standard:
A pointer to an object type may be converted to a pointer to a different object type. If the resulting pointer is not correctly aligned for the referenced type, the behavior is undefined. Otherwise, when converted back again, the result shall compare equal to the original pointer. When a pointer to an object is converted to a pointer to a character type, the result points to the lowest addressed byte of the object. Successive increments of the result, up to the size of the object, yield pointers to the remaining bytes of the object.
As for volatile and the correct way to access memory/registers directly on your particular hardware: there is no canonical/accepted/best solution. Any solution would be specific to your system and beyond the scope of standard C.
Implementations are allowed to define behaviors in cases where the Standard does not, and some implementations may specify that all pointer types have the same representation and may be freely cast among each other regardless of alignment, provided that pointers which are actually used to access things are suitably aligned.
Unfortunately, because some obtuse compilers compel the use of "memcpy" as an escape valve for aliasing issues even when pointers are known to be aligned, the only way compilers can efficiently process code which needs to make type-agnostic accesses to aligned storage is to assume that any pointer of a type requiring alignment will always be aligned suitably for that type. As a result, your instinct that the approach using uint32_t* is dangerous is spot on. It may be desirable to have compile-time checking to ensure that a function is passed either a void* or a uint32_t*, and not something like a uint16_t* or a double*, but there's no way to declare a function that way without allowing a compiler to "optimize" the function by consolidating the byte accesses into a 32-bit load that will fail if the pointer isn't aligned.
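As an illustration of that trade-off (my own sketch, not from the answers above): reading through a pointer to volatile unsigned char keeps the compiler from consolidating the four byte loads into one word load, at the cost of giving up the type checking just discussed. Note this assembles the value little-endian, matching the v7l (little-endian ARMv7) CPU shown earlier:

#include <stdint.h>

uint32_t readU32_bytes(const volatile void *ptr)
{
    const volatile unsigned char *cp = ptr;
    /* volatile forces four separate byte loads; little-endian assembly */
    return (uint32_t)cp[0]
         | ((uint32_t)cp[1] << 8)
         | ((uint32_t)cp[2] << 16)
         | ((uint32_t)cp[3] << 24);
}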
With reference to this link http://stackoverflow.com/questions/8922102/adding-new-ioctls-into-kernel-number-range I came to know that it is mandatory to encode the direction into ioctl numbers if copy_to_user/copy_from_user is not used.
Could someone please explain how to arrive at new ioctl numbers with the direction encoded?
You need to use the _IO() family of macros and the guidelines documented in the official ioctl-number documentation. The _IO macros are declared in ioctl.h. Most take an 8-bit int to represent the type, an 8-bit int to represent the ioctl number, and the data type of the data you intend to pass into the ioctl call. Ideally, the type is unique to your driver; however, most numbers have already been assigned, so this is difficult to do. The ioctl number is just there to differentiate it from the other numbers and may be assigned sequentially.
You can get more info from Chapter 6 of the LDD3.
Edit: Your comment leads me to believe you need a concrete example. You shouldn't refer to an ioctl number by its hex value. Instead, use the _IO() macros like so:
// The type for all of my IOCTL calls.
// This number is from 0 to 255.
// Does not conflict with any number assignments in ioctl-number.txt.
#define MYIOC_TYPE 0xA4
// This ioctl takes no arguments. It does something in the driver
// without passing data back and forth. The ioctl number is from 0 to 255.
#define MYIOC_DOFOO _IO(MYIOC_TYPE, 0x00)
// This ioctl reads an integer value from the driver.
#define MYIOC_GETFOO _IOR(MYIOC_TYPE, 0x01, int)
// This ioctl writes an integer value to the driver.
#define MYIOC_SETFOO _IOW(MYIOC_TYPE, 0x02, int)
// This ioctl is confusing and is probably best avoided.
// It writes a value to the driver and at the same time
// retrieves a value through the same pointer.
#define MYIOC_SETANDGETFOO _IOWR(MYIOC_TYPE, 0x03, int)
The macros encode the direction, the data size, the type, and the number into the final ioctl number. So instead of referring to a single hex number, it is much more appropriate to refer to an ioctl by its type and number. These macros have the added benefit that they document what direction data goes to/from and what the type of that data is.
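For completeness, a sketch of how these would be used from userspace (assumes the MYIOC_* definitions above are in scope; the device node path is hypothetical and error checking is omitted):

#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/dev/mydevice", O_RDWR);  /* hypothetical device node */
    int foo = 0;

    ioctl(fd, MYIOC_DOFOO);           /* no data transferred */
    ioctl(fd, MYIOC_GETFOO, &foo);    /* driver fills in foo */
    foo = 42;
    ioctl(fd, MYIOC_SETFOO, &foo);    /* driver reads foo */

    printf("foo = %d\n", foo);
    close(fd);
    return 0;
}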
This question has been asked before, but I still don't understand it fully, so here it goes.
If I have a class with a property (a non-nullable double or int) - can I read and write the property with multiple threads?
I have read somewhere that since doubles are 64 bits, it is possible to read a double property on one thread while it is being written on a different one. This can result in the reading thread returning a value that is neither the original value nor the newly written value.
When could this happen? Is it possible with ints as well? Does it happen in both 64-bit and 32-bit applications?
I haven't been able to replicate this situation in a console application.
If I have a class with a property (a non-nullable double or int) - can I read and write the property with multiple threads?
I assume you mean "without any synchronization".
double and long are both 64 bits (8 bytes) in size, and are not guaranteed to be written atomically. So if you were moving from a value with byte pattern ABCD EFGH to a value with byte pattern MNOP QRST, you could potentially end up seeing (from a different thread) ABCD QRST or MNOP EFGH.
With properly aligned values of size 32 bits or lower, atomicity is guaranteed. (I don't remember seeing any guarantees that values will be properly aligned, but I believe they are by default unless you force a particular layout via attributes.) The C# 4 spec doesn't even mention alignment in section 5.5 which deals with atomicity:
Reads and writes of the following data types are atomic: bool, char, byte, sbyte, short, ushort, uint, int, float, and reference types. In addition, reads and writes of enum types with an underlying type in the previous list are also atomic. Reads and writes of other types, including long, ulong, double, and decimal, as well as user-defined types, are not guaranteed to be atomic. Aside from the library functions designed for that purpose, there is no guarantee of atomic read-modify-write, such as in the case of increment or decrement.
Additionally, atomicity isn't the same as volatility - so without any extra care being taken, a read from one thread may not "see" a write from a different thread.
These operations are not atomic; that's why the Interlocked class exists in the first place, with methods like Increment(Int32) and Increment(Int64).
To ensure thread safety, you should use at least this class, if not more complex locking (with ReaderWriterLockSlim, for example, in case you want to synchronize access to groups of properties).
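For comparison, the same idea sketched in C11, since the other code in this document is C (<stdatomic.h> playing the role that Interlocked plays in .NET):

#include <stdatomic.h>
#include <stdint.h>

/* An _Atomic 64-bit counter: loads, stores, and increments are atomic
   even on 32-bit targets, so readers never see a torn value. */
static _Atomic uint64_t counter;

void bump(void)
{
    atomic_fetch_add(&counter, 1);   /* analogous to Interlocked.Increment */
}

uint64_t read_counter(void)
{
    return atomic_load(&counter);
}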
I have recently been reading about a family of automatic memory management techniques that rely on storing information in the pointer returned by the allocator, i.e. a few bits of header, e.g. to differentiate between pointer types or to store thread-related information (note that I'm not talking about limited-field reference counting here, only immutable information).
I'd like to toy with these techniques. Now, to implement them, I need to be able to return pointers with a specific shape from my allocator. I suppose I could play with the least significant bits, but this would require padding that looks extremely memory-consuming, so I believe that I should play with the most significant bits. However, I have no good idea of how to do this. Is there a way for me to call malloc or malloc_create_zone or some related function and request a pointer that always starts with the given bits?
Thanks everyone!
The amount of information you can actually store in a pointer is pretty limited (typically one or two bits per pointer). And every attempt to dereference the pointer has to first mask out the magic information. The technique is often called tagging, BTW.
#include <stdint.h>
#include <stdlib.h>

#define TAG_MASK   0x3
#define CONS_TAG   0x1
#define STRING_TAG 0x2
#define NUMBER_TAG 0x3

typedef uintptr_t value_t;

typedef struct cons {
    value_t car;
    value_t cdr;
} cons_t;

value_t
create_cons(value_t t1, value_t t2)
{
    cons_t* pair = malloc(sizeof(cons_t));
    value_t addr = (value_t)pair;   /* malloc'ed memory is aligned, so the
                                       two low bits of addr are zero */
    pair->car = t1;
    pair->cdr = t2;
    return addr | CONS_TAG;         /* tag the pointer */
}

value_t
car_of_cons(value_t v)
{
    /* error() is assumed to be defined elsewhere */
    if ((v & TAG_MASK) != CONS_TAG) error("wrong type of argument");
    return ((cons_t*) (v & ~TAG_MASK))->car;  /* untag before dereferencing */
}
One advantage of this technique is that you can directly infer the type of the object from the pointer itself. You don't need to dereference it (say, in order to read a special type field or similar). Many language implementations using this scheme also have a special tag combination for "immediate" numbers and other small values, which can be represented directly using the "pointer".
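For example (a sketch reusing the value_t typedef and NUMBER_TAG from above; make_number and number_value are illustrative names):

/* Immediate ("fixnum") integers live directly in the tagged word:
   shift left by 2 to make room for the two tag bits. */
value_t make_number(intptr_t n) { return ((value_t)n << 2) | NUMBER_TAG; }

intptr_t number_value(value_t v)
{
    /* arithmetic right shift restores the sign on common compilers,
       though this is implementation-defined for negative values */
    return (intptr_t)v >> 2;
}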
The disadvantage is that the amount of information which can be stored is pretty limited. Also, as the example code shows, you have to be aware of the tagging on every access to the object, and need to "untag" the pointer before actually using it.
The use of the least significant bits for tagging stems from the observation that, on most platforms, pointers to malloc'ed memory are aligned to a multi-byte boundary (usually 8 bytes), so the least significant bits are always zero.
As the RingBuffer up-front allocates objects of a given type, how can you use a single ring buffer to process messages of various different types?
You can't create new object instances to insert into the RingBuffer, as that would defeat the purpose of up-front allocation.
So you could have 3 messages in an async messaging pattern:
NewOrderRequest
NewOrderCreated
NewOrderRejected
So my question is: how are you meant to use the Disruptor pattern for real-world messaging systems?
Thanks
Links:
http://code.google.com/p/disruptor-net/wiki/CodeExamples
http://code.google.com/p/disruptor-net
http://code.google.com/p/disruptor
One approach (our most common pattern) is to store the message in its marshalled form, i.e. as a byte array. For incoming requests, e.g. FIX messages, the binary message is quickly pulled off the network and placed in the ring buffer. The unmarshalling and dispatch of the different types of messages are handled by EventProcessors (Consumers) on that ring buffer. For outbound requests, the message is serialised into the preallocated byte array that forms the entry in the ring buffer.
If you are using a fixed-size byte array as the preallocated entry, some additional logic is required to handle overflow for larger messages, i.e. pick a reasonable default size, and if it is exceeded, allocate a temporary array that is bigger. Then discard it when the entry is reused or consumed (depending on your use case), reverting back to the original preallocated byte array.
If you have different consumers for different message types you could quickly identify if your consumer is interested in the specific message either by knowing an offset into the byte array that carries the type information or by passing a discriminator value through on the entry.
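A rough sketch of such a preallocated entry in C (the field names and default size are assumptions for illustration; the actual Disruptor entries are Java/.NET objects):

#include <stdint.h>

#define ENTRY_DEFAULT_SIZE 512  /* a "reasonable default" payload size */

/* One preallocated ring-buffer entry: a type discriminator for consumers,
   the marshalled message length, a fixed inline buffer, and an overflow
   pointer that is only set when a message exceeds the default size and
   is discarded again when the entry is reused or consumed. */
typedef struct entry {
    uint16_t type;
    uint32_t length;
    uint8_t  payload[ENTRY_DEFAULT_SIZE];
    uint8_t *overflow;
} entry_t;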
Also, there is no rule against creating object instances and passing references (we do this in a couple of places too). You do lose the benefits of object preallocation; however, one of the design goals of the Disruptor was to allow the user the choice of the most appropriate form of storage.
There is a library called Javolution (http://javolution.org/) that lets you define objects as structs with fixed-length fields like string[40] etc. that rely on byte buffers internally instead of variable-size objects... that allows the ring buffer to be initialized with fixed-size objects and thus (hopefully) contiguous blocks of memory that let the cache work more efficiently.
We are using that for passing events / messages and use standard strings etc. for our business logic.
Back to object pools.
The following is a hypothesis.
If you have 3 types of messages (A, B, C), you can make 3 pre-allocated arrays, one per type. That will create 3 memory zones: A, B, and C.
It's not like there is only one cache line; there are many, and they don't have to be contiguous. Some cache lines will refer to something in zone A, others to zone B, and others to zone C.
So the ring buffer entry can have 1 reference to a common ancestor or interface of A & B & C.
The problem is how to select the instance in the pools; the simplest approach is to have the same array length as the ring buffer length. This implies a lot of wasted pooled objects, since only one of the 3 is ever used at any entry, e.g. ring buffer entry 1234 might be using message B[1234], but A[1234] and C[1234] are not used and unusable by anyone.
You could also make a super-entry with all 3 A+B+C instances inlined and indicate the type with some byte or enum. Just as wasteful on memory size, but it looks a bit worse because of the fatness of the entry. For example, a reader only working on C messages will have less cache locality.
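A sketch of that super-entry (the message bodies are hypothetical stand-ins for A, B, and C):

#include <stdint.h>

/* Hypothetical bodies for the three message types. */
typedef struct { uint64_t order_id; }                   msg_a_t; /* NewOrderRequest  */
typedef struct { uint64_t order_id; uint64_t created; } msg_b_t; /* NewOrderCreated  */
typedef struct { uint64_t order_id; int32_t reason; }   msg_c_t; /* NewOrderRejected */

/* The super-entry: all three instances inlined, plus an enum saying
   which one is live. Wasteful, as only one body is used per entry. */
typedef struct super_entry {
    enum { MSG_A, MSG_B, MSG_C } type;
    msg_a_t a;
    msg_b_t b;
    msg_c_t c;
} super_entry_t;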
I hope I'm not too wrong with this hypothesis.