Functionality of the BPF hash of maps structure - hashmap

I'm writing a BPF tool that maintains a per-process trace, which means I'll need a 2D data structure of some sort. To bypass the 512-byte stack limitation, I've decided that going with the supported structures (maps) is my best bet. After glancing at the iovisor docs, it seemed like BPF_HASH_OF_MAPS was the structure I needed. I presumed this structure offered a key-value pairing wherein the "value" was another BPF map. However, based on the parameters it takes, the "outer" structure seems to be an array (personally, from the GitHub documentation, I can't tell the difference between BPF_HASH_OF_MAPS and BPF_ARRAY_OF_MAPS). Have I been misled: does the BPF_HASH_OF_MAPS structure provide hashmap-like functionality?

You're correct. bcc doesn't make a difference between BPF_HASH_OF_MAPS and BPF_ARRAY_OF_MAPS, at least in how they're exposed. In the kernel, they are actually two different data structures, and a BPF hash of maps can have keys of various sizes.
bcc defines a hardcoded int as the key type (first int below) for the BPF_HASH_OF_MAPS maps:
#define BPF_HASH_OF_MAPS(_name, _inner_map_name, _max_entries) \
BPF_TABLE("hash_of_maps$" _inner_map_name, int, int, _name, _max_entries)
To use different key types, you can simply use BPF_TABLE in place of BPF_HASH_OF_MAPS.

Related

Create a map using type as key

I need a HashMap<K,V> where V is a trait object (it will likely be a Box or an Rc or something, that's not important), and I need to ensure that the map stores at most one of a given struct, and more importantly, that I can query the presence of (and retrieve/insert) items by their type. K can be anything that is unique to each type (a uint would be nice, but a String or even some large struct holding type information would be sufficient as long as it can be Eq and Hash).
This is occurring in a library, so I cannot use an enum or such since new types can be added by external code.
I looked into std::any::TypeId but besides not working for non-'static types, it seems they aren't even unique (and allegedly collisions were achieved accidentally with a rather small number of types) so I'd prefer to avoid them if feasible since the number of types I'll have may be very large. (hence this is not a duplicate of this IMO)
I'd like something along the lines of a macro to ensure uniqueness but I can't figure out how to have some kind of global compile time counter. I could use a proper UUID, but it'd be nice to have guaranteed uniqueness since this is, in theory at least, statically determinable.
It is safe to assume that all relevant types are defined either in this lib or in a singular crate that directly depends on it, if that allows for a solution that might be otherwise impossible.
e.g. my thoughts are to generate ids for types in the lib, and also export a constant of the counter, which can be used by the consumer of the lib in the same macro (or a very similar one) but I don't see a way to have such a const value modified by const code in multiple places.
Is this possible or do I need some kind of build script that provides values before compile time?

Why do some struct types let us set members that can only be a certain value?

I was reading up on some Vulkan struct types. There are many examples, but the one I will use is VkInstanceCreateInfo. The documentation states:
The VkInstanceCreateInfo structure is defined as:
typedef struct VkInstanceCreateInfo {
    VkStructureType             sType;
    const void*                 pNext;
    VkInstanceCreateFlags       flags;
    const VkApplicationInfo*    pApplicationInfo;
    uint32_t                    enabledLayerCount;
    const char* const*          ppEnabledLayerNames;
    uint32_t                    enabledExtensionCount;
    const char* const*          ppEnabledExtensionNames;
} VkInstanceCreateInfo;
Then below in the options we see:
sType is the type of this structure
sType must be VK_STRUCTURE_TYPE_INSTANCE_CREATE_INFO
If we don't have any options anyway, why is this parameter not just set implicitly upon creation of the type?
Note: I realise this is not something specific to the Vulkan API.
Update: I'm not just talking specifically about Vulkan, but about all parameters that can only have one particular value.
The design allows structures to be chained together so that extensions can create additional parameters to existing calls without interfering with the original API structures and without interfering with each other.
Nearly every struct in Vulkan has sType as its first member and pNext as its second member. That means that if you have a void* and all you know is that it points to some kind of Vulkan API struct, you can safely read the first 32 bits as a VkStructureType, and then read the pointer that follows (32 or 64 bits, depending on the platform) to find out whether there's another structure in the chain.
So for instance, there's a VkMemoryAllocateInfo structure for allocating memory that has (aside from sType and pNext) the size of the allocation and the index of the memory type it should come from. But what if I want to use the "dedicated allocation" extension? Then I also need to fill out a VkMemoryDedicatedAllocateInfo structure with extra information. But I still need to call the same vkAllocateMemory function that only takes a VkMemoryAllocateInfo... so where do I put the VkMemoryDedicatedAllocateInfo structure I filled out? I put a pointer to it in the pNext field of VkMemoryAllocateInfo.
Maybe I also want to share this memory with some OpenGL code. There's an extension that lets you do that, but you need to fill out a VkExportMemoryAllocateInfo structure and pass it in during the allocation as well. Well, I can do that by putting it in the pNext field of my VkMemoryDedicatedAllocateInfo structure. I can create a chain of structures like that as long as I want.
Here's the really important part. Since all structures have sType as their first field, an extension can navigate along this chain of structures and find the ones it cares about without knowing anything about the structures other than that they always start with sType and pNext.
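To make the chain walk concrete, here is a minimal sketch. It deliberately does not include the Vulkan headers: StructureType, AllocateInfo, DedicatedInfo, ExportInfo, ChainHeader and find_in_chain are all hypothetical stand-ins invented for illustration, and the only assumption the walker makes is the one described above, namely that every struct begins with sType followed by pNext.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical stand-ins; real code would use the types from <vulkan/vulkan.h>.
enum StructureType : std::uint32_t {
    STRUCTURE_TYPE_ALLOCATE_INFO = 1,
    STRUCTURE_TYPE_DEDICATED_INFO = 2,
    STRUCTURE_TYPE_EXPORT_INFO = 3,
};

struct AllocateInfo  { StructureType sType; const void* pNext; std::uint64_t size; };
struct DedicatedInfo { StructureType sType; const void* pNext; std::uint64_t image; };
struct ExportInfo    { StructureType sType; const void* pNext; std::uint32_t handleTypes; };

// The only layout knowledge the walker needs: every struct starts like this.
struct ChainHeader { StructureType sType; const void* pNext; };

// Return the first struct in the chain whose sType matches, or nullptr.
inline const void* find_in_chain(const void* head, StructureType wanted) {
    const void* p = head;
    while (p != nullptr) {
        ChainHeader header;
        std::memcpy(&header, p, sizeof header);  // read only the common prefix
        if (header.sType == wanted) return p;
        p = header.pNext;
    }
    return nullptr;
}

// Usage: allocate info -> dedicated info -> export info.
inline void demo_chain() {
    ExportInfo exported{STRUCTURE_TYPE_EXPORT_INFO, nullptr, 0};
    DedicatedInfo dedicated{STRUCTURE_TYPE_DEDICATED_INFO, &exported, 0};
    AllocateInfo alloc{STRUCTURE_TYPE_ALLOCATE_INFO, &dedicated, 4096};
    assert(find_in_chain(&alloc, STRUCTURE_TYPE_EXPORT_INFO) == &exported);
}
```

Note that find_in_chain never needs to know the full definition of any struct it steps over; it only copies out the two leading fields, which is why an extension can skip structures it has never heard of.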
All of this means that Vulkan can be extended in ways that alter the behavior of existing functions, but without changing the function itself, or the structures that are passed to it.
You might ask why all of the core structures have sType and pNext, even though you're passing them to functions with typed pointers, rather than void pointers. The reason is consistency, and because you never know when an existing structure might be needed as part of the chain for some new extension.
If we don't have any options anyway, why is this parameter not just set implicitly upon creation of the type?
Because C isn't C++. There's no way to declare a structure in C and say that this portion of the structure will always have this value. In C++ you can, by declaring something as const and providing the initial default value. In fact, one of the things I like about the Vulkan C++ bindings is that you can basically forget about sType forever. If you're using extensions you still need to populate pNext as appropriate.
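As a rough illustration of the C++ side of that point, the sketch below uses a const member with a default member initializer. DemoStructureType and InstanceCreateInfo are hypothetical names made up here; this is not the actual definition used by the Vulkan C++ bindings, only the general language feature they can rely on.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical enum and struct, for illustration only.
enum DemoStructureType : std::uint32_t {
    DEMO_STRUCTURE_TYPE_INSTANCE_CREATE_INFO = 1,
};

struct InstanceCreateInfo {
    // Fixed when the object is created; the caller never has to set it.
    const DemoStructureType sType = DEMO_STRUCTURE_TYPE_INSTANCE_CREATE_INFO;
    const void* pNext = nullptr;          // still filled in by hand when extensions are used
    std::uint32_t flags = 0;
    std::uint32_t enabledLayerCount = 0;
};

inline void demo_default_stype() {
    InstanceCreateInfo info;  // no explicit sType assignment
    assert(info.sType == DEMO_STRUCTURE_TYPE_INSTANCE_CREATE_INFO);
}
```

Plain C has no equivalent of the default member initializer, so a C API has to ask the caller to write the value.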

How to map a structure from a buffer like in C with a pointer and cast

In C, I can define many structures and structure of structures.
From a buffer, I can just set the pointer at the beginning of this structure to say this buffer represents this structure.
Of course, I do not want to copy anything, just map the structure onto the buffer; otherwise I lose the benefit of the speed.
Is it possible in Node.js? How can I do it? How can I be sure it's a mapping, and not the creation of a new object with the information copied into it?
Example:
struct House = {
uint8 door,
uint16BE kitchen,
etc...
}
var mybuff = Buffer.allocate(10, 0)
var MyHouse = new House(mybuff) // same as `House* MyHouse = (House*) mybuff`
console.log(MyHouse.door) // will display the value of door
console.log(MyHouse.kitchen) // will display the value of kitchen with BE function.
This is wrong, but it explains well what I am looking for.
All this without copying anything.
And if I do MyHouse.door=56, mybuff now contains the 56. I consider mybuff to be a pointer.
Edit after question update below
As opposed to C/C++, JavaScript uses pointers by default, so you don't have to do anything. It's the other way around, actually: you have to put some effort in if you want a copy of the current object.
In C, a struct is nothing more than a compile-time reference to different parts of data in the struct. So:
struct X {
    int foo;
    int bar;
};
is nothing more than saying: if you want bar from a variable with type X, just add the length of foo (length of int) to the base pointer.
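A small sketch of that idea, written as C++ rather than C so it can use std::memcpy: read_bar_by_offset and demo_offsets are hypothetical helper names, and offsetof is used instead of literally adding sizeof(int) because a compiler is allowed to insert padding between members.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

struct X {
    int foo;
    int bar;
};

// Fetch bar "by hand": take the address of the struct and add bar's offset.
inline int read_bar_by_offset(const X& x) {
    const unsigned char* base = reinterpret_cast<const unsigned char*>(&x);
    int value;
    std::memcpy(&value, base + offsetof(X, bar), sizeof value);
    return value;
}

inline void demo_offsets() {
    X x{1, 2};
    // Same result as the compiler-generated member access.
    assert(read_bar_by_offset(x) == x.bar);
}
```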
In JavaScript, we do not even have such a type. We can just say:
var x = {
foo: 1,
bar: 2
}
The lookup of bar will automatically be a pointer lookup (we call them references in JavaScript). Because JavaScript does not have static types, you can view an object as a map/dictionary with pointers to values of mixed types.
If you, for any reason, want to create a copy of a data structure, you have to iterate through the entire data structure (recursively) and create the copy manually. The basic types are not pointer-based. These include number (JavaScript automatically differentiates between int and float under the hood), string and boolean.
Edit after question update
Although I am not an expert in this area, I do not think it is possible. The problem is that the underlying data representation (as in how the data is represented as bytes in memory) is different, because JavaScript does not have compile-time information about data structures. As I said before, JavaScript doesn't have structs with a fixed memory layout, just objects with fields, which basically behave like (and may be implemented as) maps/dictionaries.
There are, however, some third party libraries to cope with these problems. There are two general approaches:
Unpack everything to javascript objects. The data will be copied, but you can work with it as normal javascript objects. You should use this if you read/write the data intensively, because the performance increase you get when working with normal javascript objects outweighs the advantage of not having to unpack the data. Link to example library
Leave all data in the buffer. When you need some of the data, compute the location of the data in the buffer at runtime, and read/write at this location accordingly. Because the struct data location computations are done at runtime, you should use this only when you have loads of data and only a few reads/writes to it. In this case the cost of unpacking all the data outweighs the cost of the few runtime computations that have to be done. Link to example library
As a side-note, if the amount of data you have to process isn't that much, I'd recommend to just unpack the data. It saves you the headache of having to use the library as interface to your data. Computers are fast enough nowadays to copy/process some amount of data in memory. Also, these third party libraries are just some examples. I recommend you do a little more research for libraries to decide which one suits your needs.

What are the differences between hashtable and hashmap? (Not specific to Java)

During my most recent job interview for a software engineer position, I was asked this question: what are the differences between hashtable and hashmap? I asked the interviewer if he was asking specifically about Java, since in Java hashtable is synchronized and hashmap is not (and there is actually tons of information comparing hashtable vs hashmap in Java after googling, so that's not the answer I am looking for), but he said no and wanted me to explain the difference between these two in general.
I was really puzzled and shocked (actually still puzzled now) by this question. IMO, hashtable or hashmap is simply a matter of terminology. Actually only Java has both terms, and other languages like C++ don't even have the term hashtable. During the interview, I just explained the principle of hashing and said that hashmap and hashtable should both be implemented based on this principle, and that I don't know if there is any difference between these two. The interviewer was definitely not convinced and was looking for other answers, and of course I was rejected after that round.
So back to the topic, what could possibly be the differences between hashmap and hashtable in general (not specific to Java) if there is any?
In Computer Science there is a difference due to the wording.
A HashTable is some kind of lookup table using key hashes to look up the corresponding value in a table-like data structure. That's only one kind of key-value mapping. There are different implementations, as you are probably aware: different hashes, hash collision resolution strategies, table growing strategies and more under the hood. It's only interesting if you need to make your own hash table for whatever reason.
A HashMap is some kind of mapping of key-value pairs with a hashed key. Mapping is abstract as such and it may not be a table. Balanced trees or tries or other data structures/mappings are possible too.
You could simplify and say that a HashTable is the underlying data structure and the HashMap may be utilizing a HashTable.
A Dictionary is yet another abstraction level since it may not use hashes at all - for example with full text binary search lookups or other ways for compares. This is all you can get out of the words without considering certain programming languages.
--
Before thinking too much about it: can you say, with certainty, that your interviewer had a clue about what he/she was talking about? Did you discuss technical details, or did they just listen/ask and sometimes comment? Sometimes interviewers just come up with the most ridiculous answers to problems they don't really understand in the first place.
Like you wrote yourself, in general it's just terminology. Software developers often use the terms interchangeably, except perhaps in languages where the two really differ, as in Java.
The interviewer may have been looking for the insight that...
a hash table is a lower-level concept that doesn't imply or necessarily support any distinction or separation of keys and values (i.e. you can implement a hash set of values using a hash table), while
a hash map must support distinct keys and values, as there's to be a mapping/association from keys to values; the two are distinct, even if in some implementations they're always stored side by side in memory, e.g. members of the same structure / std::pair<>.
Example: a (bad) hash table implementation preventing use as a hash map.
Consider:
template <typename T>
class Hash_Table
{
    ...
    bool insert(const T& t)
    {
        // work out which bucket t hashes to...
        size_t bucket = hash_bytes((void*)&t, sizeof t) % num_buckets_;
        // see if t is already stored in the bucket...
        if (memcmp((void*)&t, (void*)&buckets_[bucket], sizeof t) == 0)
            ...
        ... handle collisions etc. ...
    }
    ...
};
Above, the hard-coded calls to a hash function that treats the value being inserted as a binary blob, and memcmp of the entire t, mean you can't make T say a std::pair<int, std::string> and use the hash table as a hash map from ints to strings. So, it's an example of a hash table that's not usable as a hash map.
You might or might not also consider a hash table that simply doesn't provide any convenience features for use as a hash map not to be a hash map. For example, if the API was designed as if dealing only in values - h.insert(t); h.erase(t); auto i = h.find(t); - but it allowed the caller to specify arbitrary custom comparison and hashing functions that could restrict their operations to only the key part of t, then the hash table could be (ab)used as a functional hash map.
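The "(ab)use" case can be sketched with a standard container. Below, std::unordered_set plays the role of a values-only hash table, and the caller-supplied KeyHash and KeyEqual look only at the key half of each element. Entry, KeyHash, KeyEqual, PseudoMap, lookup and demo_pseudo_map are hypothetical names chosen for this example, not part of any library.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_set>
#include <utility>

using Entry = std::pair<int, std::string>;  // first = key, second = value

// Hash only the key part of the element.
struct KeyHash {
    std::size_t operator()(const Entry& e) const { return std::hash<int>{}(e.first); }
};

// Compare only the key part of the element.
struct KeyEqual {
    bool operator()(const Entry& a, const Entry& b) const { return a.first == b.first; }
};

// A hash table of values that behaves like a hash map from int to string.
using PseudoMap = std::unordered_set<Entry, KeyHash, KeyEqual>;

// Look a key up by probing with a dummy element; the value part is ignored.
inline const std::string* lookup(const PseudoMap& m, int key) {
    auto it = m.find(Entry{key, std::string{}});
    return it == m.end() ? nullptr : &it->second;
}

inline void demo_pseudo_map() {
    PseudoMap m;
    m.insert(Entry{1, "one"});
    m.insert(Entry{1, "uno"});  // rejected: an element with key 1 already exists
    assert(m.size() == 1);
    assert(*lookup(m, 1) == "one");
}
```

The awkward dummy element in lookup is exactly the kind of missing convenience the paragraph above refers to: the container works as a map, but its API was not designed for it.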
To clarify how this relates to makadev's existing answer, I disagree with:
"A HashTable [uses] key hashes to lookup the corresponding value"; wrong because it assumes a key->value mapping.
"A HashMap [...]. Mapping is abstract as such and it may not be a table. Balanced trees or tries or other data structures/mappings are possible too."; wrong because the primary mechanism of a hash map is still hashing of the key to a bucket (index) in the table/array: some hash tables/maps may use other data structures (arrays, linked lists, trees...) to store elements that collide at the same bucket, but that's a different issue and not part of the difference between hash tables and hash maps.
Actually, in Java, Hashtable is considered obsolete and HashMap is usually the better choice, because Hashtable is synchronized. If a thread-safe implementation is not needed, it is recommended to use HashMap in place of Hashtable. If a thread-safe, highly concurrent implementation is desired, then it is recommended to use java.util.concurrent.ConcurrentHashMap in place of Hashtable.
A second difference is that HashMap implements the Map interface, whereas Hashtable extends the legacy Dictionary class (and also implements Map).

How to delete elements by value in a map structure restricted with having one key

The main problem is that I'm working in a functional language with immutable types, so things like pointers and deletion are a bit harder. I would prefer if this was implementable primarily in Haskell.
Let's imagine we have a single dimensional field
[x,x,x,x,x,x,x,x,x]
So I have a map with keys being SIZES and values being ADDRESSES because each entry starts from a certain ADDRESS and has a certain SIZE.
[(x,x,x),x,x,(x,x,x,x)]
I want to be able to add an element by SIZE to a map and then check if the entries are touching so that I can merge them.
Since my map is by SIZEs I have to iterate through the whole map to find the ones with the bordering ADDRESSes.
Do I really have to choose between implementing a two-key map and O(n) for a merge?
Welp, in essence, this looks like computer memory. Do you want it to be efficient? Because you know, "things like pointers" exist and work in Haskell perfectly well.
Since my map is by SIZEs I have to iterate through the whole map to find the ones with the bordering ADDRESSes.
No, not if you store the ranges in a separate data structure. I think for such non-overlapping subsets, there was something called an interval tree (or, as suggested by @Daniel, IntervalMap), but I'm not exactly an expert on those. Otherwise, why don't you simply hold memory blocks like this?
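import Data.Word (Word8)
type Byte = Word8
data Block = Block { start :: Int, contents :: [Byte] }  -- 'data' is a reserved word, so the field is named 'contents'
type Memory = [Block]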
You could cache the block length or use a data structure where length is O(1), to make merges O(nBlocks).
Sure, that doesn't make it obvious at the type level that they won't ever overlap, but that's an invariant you can keep for yourself.

Resources