Whatever insertion order I use here:
let mut tm = TreeMap::new();
tm.insert("aaa".to_string(), "val1".to_json());
tm.insert("zzz".to_string(), "val2".to_json());
// or, in the opposite order:
// tm.insert("zzz".to_string(), "val2".to_json());
// tm.insert("aaa".to_string(), "val1".to_json());
let a = json::Object(tm);
println!("Json is {}", a);
The resulting JSON is always the same:
Json is {"aaa":"val1","zzz":"val2"}
But I want the keys to appear in the same order as the insert operations. How can I do that?
Generally it's a very bad idea to rely on the order of keys in JSON. Usually the underlying data structure is a hash table, which does not preserve order (the standard does not require it, and a hash map turns out to be the most efficient way of implementing such an unordered map). Note that your output is not insertion order either: TreeMap keeps its keys sorted, which is why "aaa" always comes before "zzz". There are some implementations of JSON parsers/generators that preserve order (and some even allow duplicate keys), but you can never rely on this behavior.
So the best way to achieve the result you want is to use an array of pairs (where a pair can itself be either an array or a map). The order of elements within an array is preserved.
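To illustrate the array-of-pairs approach, here is a sketch using the serde_json crate (the crate choice is an assumption; the question uses the older rustc-serialize API):

use serde_json::json;

fn main() {
    // A JSON array of [key, value] pairs keeps exactly the order in
    // which it is built, unlike an object.
    let a = json!([["zzz", "val2"], ["aaa", "val1"]]);
    println!("Json is {}", a); // Json is [["zzz","val2"],["aaa","val1"]]
}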
I want to map a timestamp t and an identifier id to a certain state of an object. I can do so by mapping a tuple (t,id) -> state_of_id_in_t. I can use this mapping to access one specific (t,id) combination.
However, sometimes I want to know all states (with matching timestamps t) of a specific id (i.e. id -> a set of (t, state_of_id_in_t)), and sometimes all states (with matching identifiers id) of a specific timestamp t (i.e. t -> a set of (id, state_of_id_in_t)). The problem is that I can't just put all of these in a single large matrix and do a linear search based on what I want: the number of (t,id) tuples for which I have states is very large (1M+) and very sparse (some timestamps have many states, others none, etc.). How can I build such a dict, one that can access its contents by partial keys?
I created two distinct dicts, dict_by_time and dict_by_id, which are dicts of dicts. dict_by_time maps a timestamp t to a dict of ids, which each point to a state. Similarly, dict_by_id maps an id to a dict of timestamps, which each point to a state. This way I can access a state or a set of states however I like. Notice that the 'leaves' of both dicts (dict_by_time and dict_by_id) point to the same objects, so it's just the way I access the states that differs; the states themselves are the same Python objects.
dict_by_time = {'t_1': {'id_1': 'some_state_object_1',
                        'id_2': 'some_state_object_2'},
                't_2': {'id_1': 'some_state_object_3',
                        'id_2': 'some_state_object_4'}}

dict_by_id = {'id_1': {'t_1': 'some_state_object_1',
                       't_2': 'some_state_object_3'},
              'id_2': {'t_1': 'some_state_object_2',
                       't_2': 'some_state_object_4'}}
Again, notice that the leaves are shared across both dicts.
I don't think it is good to do this with two dicts, simply because maintaining both of them when adding new timestamps or identifiers results in double work and could easily lead to inconsistencies if I make a mistake. Is there a better way to solve this? Complexity is very important, which is why I can't just do manual searching and need some sort of HashMap magic.
You can always trade insertion complexity for lookup complexity. Instead of using a single dict, you can create a class with an add method and a lookup method. Internally, you keep track of the data using three different dictionaries: one keyed by the (t,id) tuple, one keyed by t, and one keyed by id. Depending on the arguments given to lookup, you return the result from the appropriate dictionary.
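For concreteness, here is a minimal sketch of that idea (in Rust; all type and method names are placeholders). Wrapping the states in Rc keeps the leaf objects shared between all three maps, just like the question's Python version:

use std::collections::HashMap;
use std::rc::Rc;

#[derive(Default)]
struct StateStore {
    by_pair: HashMap<(u64, u64), Rc<String>>,        // (t, id) -> state
    by_time: HashMap<u64, HashMap<u64, Rc<String>>>, // t -> {id: state}
    by_id: HashMap<u64, HashMap<u64, Rc<String>>>,   // id -> {t: state}
}

impl StateStore {
    // One add does three cheap inserts; every lookup pattern below
    // is then a single O(1) hash lookup on average.
    fn add(&mut self, t: u64, id: u64, state: String) {
        let state = Rc::new(state);
        self.by_pair.insert((t, id), Rc::clone(&state));
        self.by_time.entry(t).or_default().insert(id, Rc::clone(&state));
        self.by_id.entry(id).or_default().insert(t, state);
    }

    fn state(&self, t: u64, id: u64) -> Option<&Rc<String>> {
        self.by_pair.get(&(t, id))
    }
    fn states_at(&self, t: u64) -> Option<&HashMap<u64, Rc<String>>> {
        self.by_time.get(&t)
    }
    fn states_of(&self, id: u64) -> Option<&HashMap<u64, Rc<String>>> {
        self.by_id.get(&id)
    }
}

Keeping all three maps behind a single add method is what removes the double-work and inconsistency risk the question worries about.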
Is it possible to insert values into and get values from a HashMap directly with a precomputed hash, so that I can cache the hashes?
I want to do something like this:
use std::collections::HashMap;
use std::hash::{BuildHasher, Hash, Hasher};

let mut map = HashMap::new();
let key = "key";
map.insert(key, "value");
// Hash the key once, with the map's own hasher:
let hashed_key = {
    let mut hasher = map.hasher().build_hasher();
    key.hash(&mut hasher);
    hasher.finish()
};
// `get_by_hash` is the method I wish existed:
assert_eq!(map.get(key).unwrap(), map.get_by_hash(hashed_key).unwrap());
No.
This is fundamentally impossible at the algorithmic level.
By design, a hash operation is not injective: multiple elements may hash to the same value. Therefore, any HashMap implementation can only use the hash as a hint, and must then use a full equality comparison to check that the element found via the hint really is the right one.
At best, a get_by_hash method could return an Iterator over all the elements matching the given hash.
For a degenerate case, consider a hashing algorithm that always returns 4 (obtained by the roll of a fair die). Which element would you expect it to return?
Work-around
If caching is what you are after, the trick in languages with no BuildHasher is to pre-hash (and cache) the hash inside the key itself.
It requires caching the full key (because of the equality check), but hashing is then a very simple operation (return the cached value).
It does not, however, speed up the equality check, which depending on the value may be quite expensive.
You could adapt the pattern to Rust, although you would lose the advantage of using a BuildHasher.
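For instance, a minimal sketch of such a pre-hashed key in Rust (names are placeholders; the fixed DefaultHasher is an assumption, and it is exactly where the BuildHasher advantage is lost):

use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// A key that is hashed once, up front; every later `hash` call just
// replays the cached value. Equality still compares the full key.
#[derive(PartialEq, Eq)]
struct PreHashed<K: Eq> {
    key: K,
    hash: u64,
}

impl<K: Hash + Eq> PreHashed<K> {
    fn new(key: K) -> Self {
        let mut h = DefaultHasher::new();
        key.hash(&mut h);
        let hash = h.finish();
        PreHashed { key, hash }
    }
}

impl<K: Eq> Hash for PreHashed<K> {
    fn hash<H: Hasher>(&self, state: &mut H) {
        state.write_u64(self.hash); // "return the cached value"
    }
}

A HashMap<PreHashed<String>, V> then only feeds a single u64 to its hasher on every operation, while lookups still fall back to the full-key equality check.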
Problem we are trying to solve:
Given a list of keys, what is the best way to get the values from an IMap holding around 500K entries?
We also need to filter the values based on their fields.
Here is the example map we are trying to read from.
Given IMap[String, Object]
We are using protobuf to serialize the objects. An object can be, say:
message Test {
  required mac_address eth_mac = 1;
  … // size can be around 300 bytes
}
You can use IMap.getAll(keySet) if you know the keys beforehand. It is much better than single gets, since a bulk operation makes far fewer network round trips.
For filtering, you can use predicates with IMap.values(predicate), IMap.entrySet(predicate), or IMap.keySet(predicate), depending on what you want back.
See more: http://docs.hazelcast.org/docs/latest/manual/html-single/index.html#distributed-query
I am trying to filter out columns after using a WholeRowIterator to filter rows. This is to remove columns that were useful in determining which row to keep, but not useful in the data returned by the scan.
The WholeRowIterator does not appear to play nice as the source of another iterator such as a RegExFilter. I know the keys/values are encoded by the WholeRowIterator.
Are there any possible solutions to get this iterator stack to work?
Thanks.
Usually, the WholeRowIterator is the last iterator in the "stack" as it involves serializing the row (many key-values) into a single key-value. You probably don't want to do it more than once. But, let's assume you want to do that:
You would want to write an iterator which deserializes each key-value pair back into a SortedMap using the WholeRowIterator's decodeRow method, modifies the SortedMap, reserializes it into a single key-value pair, and then returns it. This iterator would need to be assigned a priority higher than the one given to the WholeRowIterator.
Alternatively, you could extend the WholeRowIterator and override the encodeRow(List<Key>,List<Value>) method so that your unwanted columns are never serialized in the first place. This saves the extra serialization and deserialization that the first approach requires.
Let's assume that we have some objects (strings, for example). It is well known that working with indices (i.e. with numbers 1,2,3...) is much more convenient than with arbitrary objects.
Is there any common way of assigning an index for each object? One can create a hash_map and store an index in the value, but that will be memory-expensive when the number of objects is too high to be placed into the memory.
Thanks.
You can store the string objects in a sorted file. This way, you are not storing the objects in memory. Your mapping function can then binary-search the sorted file for the required object, and the object's position in the file serves as its index. You can additionally create a hash map to optimize the search for frequently requested objects.
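A sketch of that scheme in Rust (the fixed record width and space padding are assumptions, made so the file supports random access): since the file is sorted, the rank of a record doubles as the object's index.

use std::fs::File;
use std::io::{Read, Seek, SeekFrom};

const RECORD: u64 = 32; // assumed fixed record width, in bytes

// Binary-search a file of sorted, fixed-width, space-padded records;
// keys are assumed to fit in one record.
fn index_of(path: &str, key: &str) -> std::io::Result<Option<u64>> {
    let mut target = vec![b' '; RECORD as usize];
    target[..key.len()].copy_from_slice(key.as_bytes());

    let mut f = File::open(path)?;
    let n = f.metadata()?.len() / RECORD; // number of records
    let mut buf = vec![0u8; RECORD as usize];
    let (mut lo, mut hi) = (0u64, n);
    while lo < hi {
        let mid = (lo + hi) / 2;
        f.seek(SeekFrom::Start(mid * RECORD))?;
        f.read_exact(&mut buf)?;
        if buf < target { lo = mid + 1 } else { hi = mid }
    }
    if lo < n {
        f.seek(SeekFrom::Start(lo * RECORD))?;
        f.read_exact(&mut buf)?;
        if buf == target {
            return Ok(Some(lo)); // the index is the rank in the file
        }
    }
    Ok(None)
}

Each lookup costs O(log n) disk reads; a small in-memory hash map caching the indices of hot objects is the search optimization mentioned above.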