Why use a hash table with a single hash bucket in the Linux kernel?

I am trying to understand the implementation of the Linux kernel's hash table. What I don't understand is that I find code initializing a hash table with only a single hash bucket, and I don't know why the code does that.
This hash table usage makes sense to me:
In kernel/pid.c:
void __init pidhash_init(void)
{
    unsigned int i, pidhash_size;

    pid_hash = alloc_large_system_hash("PID", sizeof(*pid_hash), 0, 18,
                                       HASH_EARLY | HASH_SMALL,
                                       &pidhash_shift, NULL,
                                       0, 4096);
    pidhash_size = 1U << pidhash_shift;

    for (i = 0; i < pidhash_size; i++)
        INIT_HLIST_HEAD(&pid_hash[i]);
}
pid_hash is an array of struct hlist_head, so each entry in the array represents a hash bucket.
However this usage doesn't make sense to me:
In drivers/android/binder.c of goldfish branch:
static HLIST_HEAD(binder_dead_nodes);
It expands to
struct hlist_head name = { .first = NULL }
Basically it is a hash table with only one hlist_head, that is, a hash table with only one hash bucket. So it is actually just a doubly linked list. Why would people want to create a hash table with a single hash bucket like this?

An hlist is just a doubly linked list whose head holds a single pointer instead of two.
The difference between list and hlist is that hlist trades O(1) access to the tail of the list for a 50% memory reduction for empty lists (one pointer per head instead of two). This is perfect for hash tables, which have lots of empty lists and never need to walk a list in reverse or start from the tail.
However, it's also great for regular linked lists.
By using hlist they saved a few bytes over list, and gave us a strong signal that the list is used to collect an unknown number of items in an order that doesn't matter.
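To make the size difference concrete, here is a minimal userspace sketch modelled on the kernel's list_head, hlist_head and hlist_node structs. The two helper functions (hlist_add_head_sketch, hlist_del_sketch) are hypothetical names of my own and leave out the kernel's memory-ordering annotations, so treat this as an illustration and not the kernel's code.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace copies of the kernel's list types. */
struct list_head  { struct list_head *next, *prev; };     /* two pointers per head */
struct hlist_node { struct hlist_node *next, **pprev; };  /* pprev points at the pointer that points to us */
struct hlist_head { struct hlist_node *first; };          /* one pointer per head */

/* Insert n at the front of the list h (hypothetical helper name). */
void hlist_add_head_sketch(struct hlist_node *n, struct hlist_head *h)
{
    struct hlist_node *first;

    assert(n != NULL && h != NULL);
    first = h->first;
    n->next = first;
    if (first)
        first->pprev = &n->next;
    h->first = n;
    n->pprev = &h->first;
}

/* Unlink n in O(1) without knowing which head it hangs off. */
void hlist_del_sketch(struct hlist_node *n)
{
    struct hlist_node *next;

    assert(n != NULL);
    next = n->next;
    *n->pprev = next;
    if (next)
        next->pprev = n->pprev;
}
```

Because pprev points at whatever pointer currently refers to the node, deleting the first node needs no special case, and the head still gets away with a single pointer.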

Related

How can I get element by the Index from the OrderedTable in Nim?

Nim has OrderedTable. How can I get an element by its index rather than by its key?
If it's possible, is this an efficient operation, like O(log n) or better?
import tables
let map = {"a": 1, "b": 2, "c": 3}.toOrderedTable
Something like
map.getByIndex(1) # No such method
P.S.
I'm currently using both seq and Table to provide both key and indexed access and wonder if it could be replaced by OrderedTable
type IndexedMap = ref object
  list*: seq[float]
  map*: Table[string, float]
There is no direct index access to ordered tables because of their internal structure. The typical way to access the elements in order is:
import tables
let map = {"a": 1, "b": 2, "c": 3}.toOrderedTable
for f in map.keys:
  echo $f
Basically, accessing the keys iterator. If you click through the source link in the documentation, you reach the actual iterator code:
let L = len(t)
forAllOrderedPairs:
  yield t.data[h].key
And if you follow the implementation of the forAllOrderedPairs template (an editor with jump-to-implementation support makes inspecting such code easier):
if t.counter > 0:
  var h = t.first
  while h >= 0:
    var nxt = t.data[h].next
    if isFilled(t.data[h].hcode):
      yieldStmt
    h = nxt
I haven't measured the performance, but reaching the i-th element this way means walking the chain from the start, so it is O(n) and won't be as fast as indexing a simple list/array. The internal structure of OrderedTable contains a hidden data field with the actual keys and values, and each step needs an extra conditional check to verify that the entry is actually in use. This implementation detail is probably a compromise to avoid reshuffling the whole list after a single item deletion.
If your accesses are infrequent, using the iterator to find the value might be enough. If benchmarking shows it's a bottleneck, you could try freezing the keys/values iterator into a local list and using that instead, as long as you don't need to mutate the OrderedTable any further.
Or return to your original idea of keeping a separate list.

What is the !$ (bang dollar) operator in Nim?

In the example of defining a custom hash function on page 114 of Nim in Action, the !$ operator is used to "finalize the computed hash".
import tables, hashes
type
  Dog = object
    name: string
proc hash(x: Dog): Hash =
  result = x.name.hash
  result = !$result
var dogOwners = initTable[Dog, string]()
dogOwners[Dog(name: "Charlie")] = "John"
And in the paragraph below:
The !$ operator finalizes the computed hash, which is necessary when writing a custom hash procedure. The use of the !$ operator ensures that the computed hash is unique.
I am having trouble understanding this. What does it mean to "finalize" something? And what does it mean to ensure that something is unique in this context?
Your questions might be answered if, instead of reading only the description of the !$ operator, you take a look at the beginning of the hashes module documentation. As you can see there, primitive data types have a hash() proc which returns their own hash. But if you have a complex object with many fields, you might want to create a single hash for the object itself, and how do you do that? Without going into hash theory, and treating hashes like black boxes, you need two kinds of procs to produce a valid hash: the addition/concatenation operator and the finalization operator. So you end up using !& to keep adding (or mixing) individual hashes into a temporary value, and then use !$ to finalize that temporary value into a final hash. The Nim in Action example might have been easier to understand if the Dog object had more than a single field, thus requiring the use of both operators:
import tables, hashes, sequtils
type
  Dog = object
    name: string
    age: int
proc hash(x: Dog): Hash =
  result = x.name.hash !& x.age.hash
  result = !$result
var dogOwners = initTable[Dog, string]()
dogOwners[Dog(name: "Charlie", age: 2)] = "John"
dogOwners[Dog(name: "Charlie", age: 5)] = "Martha"
echo toSeq(dogOwners.keys)
for key, value in dogOwners:
  echo "Key ", key.hash, " for ", key, " points at ", value
As for why hash values are first mixed and then finalized, that depends on which hashing algorithm the Nim developers chose. You can see from the source code that both mixing and finalization are mostly bit shifting. Unfortunately the source code doesn't explain, or point to any reference that explains, why this is done or why this specific hashing algorithm was selected over others. You could try asking on the Nim forum, and maybe improve the documentation/source code with your findings.
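For illustration, here is what a mix-then-finalize pair can look like in C. This is a sketch of the well-known Jenkins one-at-a-time scheme, which the bit shifting in the hashes module appears to resemble; I have not verified that the constants match Nim's, and the names hash_mix and hash_finish are hypothetical stand-ins for the roles of !& and !$.

```c
#include <stdint.h>

/* Mixing step (the role !& plays): fold one more value into the running hash. */
uint32_t hash_mix(uint32_t h, uint32_t val)
{
    h += val;
    h += h << 10;
    h ^= h >> 6;
    return h;
}

/* Finalization step (the role !$ plays): scramble the accumulated bits once
 * more so that the low bits, which a table uses to pick a bucket, depend on
 * every value that was mixed in. */
uint32_t hash_finish(uint32_t h)
{
    h += h << 3;
    h ^= h >> 11;
    h += h << 15;
    return h;
}
```

A two-field object would be hashed as hash_finish(hash_mix(hash_mix(0, field1_hash), field2_hash)). Note that nothing here makes the result unique in the strict sense: finalization only spreads the bits better, and two different objects can still collide.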

Are numbers, bools or nils garbage collected in Lua?

This article implies that all types besides numbers, bools and nil are garbage collected.
The field gc is used for the other values (strings, tables, functions, heavy userdata, and threads), which are those subject to garbage collection.
Would this mean under certain circumstances that overusing these non-gc types might result in memory leaks?
In Lua there are two kinds of types: ones that are always passed by value, and ones that are passed by reference (see section 2.1 of the Lua manual).
The ones you cite are all of the pass-by-value kind, so they are stored directly in the variable.
Once the variable goes away, the value is gone with it.
So they will not leak memory unless, of course, you keep generating new variables containing new values. But in that case it's your own fault ;).
In the article you linked to they write down the C code that shows how values are represented:
/* You can also find this in lobject.h in the Lua source */
/* I paraphrased a bit to remove some macro magic */
/* unions in C store one of the values at a time */
union Value {
    GCObject *gc;    /* collectable objects */
    void *p;         /* light userdata */
    int b;           /* booleans */
    lua_CFunction f; /* light C functions */
    numfield         /* numbers */
};
typedef union Value Value;
/* the _tt tag tells what kind of value is actually stored in the union */
struct lua_TObject {
    int _tt;
    Value value_;
};
As you can see in here, booleans and numbers are stored directly in the TObject struct. Since they are not "heap-allocated" it means that they can never "leak" and therefore garbage collecting them would have made no sense.
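To see the inline-versus-heap distinction in isolation, here is a small self-contained model of a tagged value. It is not Lua's code: the type and function names are made up for the example, and the string is freed by hand here, whereas in Lua it would be a GCObject owned by the collector.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

enum tag { T_NIL, T_BOOLEAN, T_NUMBER, T_STRING };

struct value {
    enum tag tt;      /* which union member is valid */
    union {
        int b;        /* boolean: stored inline */
        double n;     /* number: stored inline */
        char *s;      /* string: pointer to separately allocated memory */
    } u;
};

/* A number lives entirely inside the struct; nothing is allocated. */
struct value make_number(double n)
{
    struct value v;
    v.tt = T_NUMBER;
    v.u.n = n;
    return v;
}

/* A string needs a heap copy, so somebody has to reclaim it later. */
struct value make_string(const char *s)
{
    struct value v;
    size_t len;

    assert(s != NULL);
    len = strlen(s) + 1;
    v.tt = T_STRING;
    v.u.s = malloc(len);
    if (v.u.s != NULL)
        memcpy(v.u.s, s, len);
    else
        v.tt = T_NIL; /* allocation failed: fall back to nil */
    return v;
}

/* Only values that point at the heap have anything to reclaim. */
int owns_heap_memory(const struct value *v)
{
    return v->tt == T_STRING;
}

void release(struct value *v)
{
    if (owns_heap_memory(v))
        free(v->u.s);
    v->tt = T_NIL;
}
```

Overwriting a number or boolean simply overwrites the bytes in place; only the string case leaves memory behind that has to be tracked, which is exactly the job the collector does for the gc field.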
One interesting thing to note, however, is that the garbage collector does not manage what light userdata and light C functions refer to on the C side. Those resources need to be managed manually from C, but that is to be expected, since in that case you are writing C instead of Lua.

Is there any generic Hashable typeclass in Haskell? (a.k.a. "deriving (Hashable)")

Has anyone written a generic function so that hash functions can be generated automatically for custom data types (using the deriving mechanism)? A few times, I've written the following kind of boilerplate,
data LeafExpr = Var Name | Star deriving (Eq, Show)
instance Hashable LeafExpr where
  hash (Var name) = 476743 * hash name
  hash Star = 152857
This could be generated automatically: the basic idea is that whenever adding data, you multiply by a prime, for example with lists,
hash (x:xs) = hash x + 193847 * hash xs
Essentially, what I'd like to write is
data LeafExpr = ... deriving (Hashable)
Edit 1
Thanks for all of the very helpful responses, everyone. I'll try to add a generic method as an exercise when I have time. For now (perhaps what sclv was referring to?), I realized I could write the slightly better code,
instance Hashable LeafExpr where
  hash (Var name) = hash ("Leaf-Var", name)
  hash Star = hash "Leaf-Star"
Edit 2
Using ghc, multiplying by random primes works much better than tupling in edit 1. Conflicts with Data.HashTable went from something like 95% (very bad) to 36%. Code is here: [ http://pastebin.com/WD0Xp0T1 ] [ http://pastebin.com/Nd6cBy6G ].
How much speed do you need? You could use one of the packages that use Template Haskell to generate serialization code, convert the value to binary, and then hash the resulting byte array with hashable.

Design a data structure for reporting resource conflicts

I have a memory address pool with 1024 addresses. There are 16 threads running inside a program which access these memory locations, doing either read or write operations. The output of this program is a series of quadruples defined like this:
Quadruple q1 : (Thread no, Memory address, read/write , time)
e.g q1 = (12,578,r,2t), q2= (16,578,w,6t)
I want to design a program which takes the stream of quadruples as input and reports all the conflicts which occur if more than 2 threads try to access the same memory resource inside an interval of 5t secs with at least one write operation.
I have several solutions in mind but I am not sure if they are the best ones to address this problem. I am looking for a solution from a design and data structure perspective.
So the basic problem here is collision detection. I would generally look for a solution where elements are added to some kind of associative collection: as a new element is about to be added, you need to be able to tell whether the collection already contains a similar element, indicating a collision. Here you would seem to need a collection type that allows duplicate elements, such as the STL multimap. The quadruple would be the value type in the associative collection, and the key type would contain the data necessary to determine whether two elements represent a collision, i.e. memory address and time.
To use a standard associative collection like STL multimap, you need to define an ordering on the keys by defining operator< for the key type (I'm assuming C++ here, since you didn't specify). You define a collision as two elements where the memory location is identical and the time values differ by less than some threshold amount. The ordering of the key type has to be such that two keys that represent a collision come out as equivalent under the ordering. Equivalence under the < operator means that a < b is false and b < a is false as well, so the ordering might be defined by this operator:
bool operator<( Key const& a, Key const& b ) {
    if ( a.address == b.address ) {
        if ( abs(a.time - b.time) < threshold ) {
            return false;
        }
        return a.time < b.time;
    }
    return a.address < b.address;
}
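There is a problem with this design: two keys may be equivalent under < without being equal, so two different but similar quadruples, i.e. two values that collide with one another, would be stored under the same key in the collection. The equivalence is also not transitive (a can be within the threshold of b, and b of c, without a being within the threshold of c), so this operator is not a strict weak ordering, which the standard ordered containers require. You could use a simpler definition of the ordering: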
bool operator<( Key const& a, Key const& b ) {
    if ( a.address == b.address ) {
        return a.time < b.time;
    }
    return a.address < b.address;
}
Under this ordering definition, colliding elements end up adjacent in an ordered associative container (but under different keys), so you'd be able to find them easily in a post-processing step after they have all been added to the collection.
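The sort-then-scan idea can be sketched in plain C with an array instead of a multimap. The assumptions here are mine: struct access and count_conflicts are hypothetical names, times are plain integers in units of t, and a conflict is simplified to a pair of accesses by different threads to the same address, less than window apart, with at least one write. Reporting groups of more than two threads would need a further pass over the same sorted data.

```c
#include <assert.h>
#include <stdlib.h>

struct access {
    int thread;
    int address;
    char op;      /* 'r' or 'w' */
    long time;    /* in units of t */
};

/* Same ordering as the simpler operator<: by address, then by time. */
static int cmp_access(const void *pa, const void *pb)
{
    const struct access *a = pa;
    const struct access *b = pb;

    if (a->address != b->address)
        return a->address < b->address ? -1 : 1;
    if (a->time != b->time)
        return a->time < b->time ? -1 : 1;
    return 0;
}

/* Sorts v in place and returns the number of conflicting pairs. */
size_t count_conflicts(struct access *v, size_t n, long window)
{
    size_t conflicts = 0;
    size_t i, j;

    assert(v != NULL || n == 0);
    qsort(v, n, sizeof *v, cmp_access);

    for (i = 0; i < n; i++) {
        /* After sorting, candidates for v[i] are the entries right after it. */
        for (j = i + 1; j < n; j++) {
            if (v[j].address != v[i].address ||
                v[j].time - v[i].time >= window)
                break; /* later entries are even further away */
            if (v[j].thread != v[i].thread &&
                (v[i].op == 'w' || v[j].op == 'w'))
                conflicts++;
        }
    }
    return conflicts;
}
```

With the question's example, (12,578,r,2t) and (16,578,w,6t) are 4t apart on the same address and one of them is a write, so a window of 5 reports them as one conflicting pair.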

Resources