Need to rearrange HashMap key/value pairs - hashmap

I have a set of key/value paired data in the form of a HashMap that I am required to manipulate.
Here is the signature of the object I am looking at:
Map<Consumer, ArrayList<EventMsg>> consumerMsgListMap =
new HashMap<Consumer, ArrayList<EventMsg>>();
The result of System.out.println(consumerMsgListMap.toString()) is the following:
ConsumerA=[msg1, msg3], ConsumerB=[msg1, msg2, msg4], ConsumerC=[msg2, msg3]
As you can see each value() is a list rather than an individual value.
I need to figure out a way to rearrange the data such that for each unique EventMsg entry there is an associated Consumer. For example:
msg1 needs to be associated with [ConsumerA, ConsumerB]
msg2 needs to be associated with [ConsumerB, ConsumerC]
msg3 needs to be associated with [ConsumerA, ConsumerC]
msg4 needs to be associated with [ConsumerB]
This isn't a matter of simply reversing the K,V pairs.
I think the proper approach is to get uniqueness among all values by building up a separate HashSet but I can't figure out a way of getting at the values as Individual entities
(e.g. msg1, msg3, msg1, msg2, msg4)
rather than as Groups of entities
(e.g. [msg1, msg3], [msg1, msg2, msg4]).
This is probably obvious to the seasoned pro but at my stage of development I'm stumped. Hope I've stated the problem clearly. Thanks in advance if anyone has any ideas.
Here's the initial setup. Consumer could just as easily be String and EventMsg could just as easily be Integer:
Map<Consumer, ArrayList<EventMsg>> consumerMsgListMap = new HashMap<Consumer, ArrayList<EventMsg>>();
Consumer c1 = new Consumer("ConsumerA");
Consumer c2 = new Consumer("ConsumerB");
Consumer c3 = new Consumer("ConsumerC");
ArrayList<EventMsg> msgListA = new ArrayList<EventMsg>();
msgListA.add(new EventMsg("msg1"));
msgListA.add(new EventMsg("msg3"));
ArrayList<EventMsg> msgListB = new ArrayList<EventMsg>();
msgListB.add(new EventMsg("msg1"));
msgListB.add(new EventMsg("msg2"));
msgListB.add(new EventMsg("msg4"));
ArrayList<EventMsg> msgListC = new ArrayList<EventMsg>();
msgListC.add(new EventMsg("msg2"));
msgListC.add(new EventMsg("msg3"));
consumerMsgListMap.put(c1, msgListA);
consumerMsgListMap.put(c2, msgListB);
consumerMsgListMap.put(c3, msgListC);

It seems fairly straightforward to me. Pseudocode for a doubly-nested loop:
Allocate a new Map<EventMsg, Set<Consumer>> result
Iterate over consumerMsgListMap.entrySet(), which gives you Map.Entry<Consumer, ArrayList<EventMsg>> objects, one at a time.
Set key = entry.getKey(), value = entry.getValue()
For each EventMsg e in value,
    if e is not a key in result, then result.put(e, new HashSet<Consumer>());
    result.get(e).add(key)
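The pseudocode above could be sketched in Java roughly as follows. This is a minimal sketch, not a definitive implementation: the class name InvertMap and the method name invert are made up for illustration, and String stands in for both Consumer and EventMsg, as the question allows. Note that with the real classes, EventMsg would need equals() and hashCode() overridden, because the setup code creates two separate new EventMsg("msg1") objects and a HashMap would otherwise treat them as different keys.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper class; the names are illustrative only.
class InvertMap {

    // Builds a map from each distinct list element to the set of keys whose list contains it.
    static <K, V> Map<V, Set<K>> invert(Map<K, ? extends List<V>> source) {
        // LinkedHashMap/LinkedHashSet are used only to keep a predictable iteration order.
        Map<V, Set<K>> result = new LinkedHashMap<>();
        for (Map.Entry<K, ? extends List<V>> entry : source.entrySet()) {
            K key = entry.getKey();
            for (V value : entry.getValue()) {
                // Create the set the first time a value is seen, then add the key to it.
                result.computeIfAbsent(value, v -> new LinkedHashSet<>()).add(key);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumption: String stands in for Consumer and EventMsg.
        Map<String, List<String>> consumerMsgListMap = new LinkedHashMap<>();
        consumerMsgListMap.put("ConsumerA", Arrays.asList("msg1", "msg3"));
        consumerMsgListMap.put("ConsumerB", Arrays.asList("msg1", "msg2", "msg4"));
        consumerMsgListMap.put("ConsumerC", Arrays.asList("msg2", "msg3"));

        System.out.println(invert(consumerMsgListMap));
    }
}
```

computeIfAbsent (Java 8+) replaces the explicit "if e is not a key in result" check from the pseudocode; on older Java versions the containsKey/put form works the same way.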

Is there a more efficient way to interact with ItemPaged objects from azure-data-tables SDK function query_entities?

The quickest method I have found is to just convert the ItemPaged object to a list using list() and then I'm able to manipulate/extract using a Pandas DataFrame. However, if I have millions of results, the process can be quite time-consuming, especially if I only want every nth result over a certain time-frame, for instance. Typically, I would have to query the entire time-frame and then re-loop to only obtain every nth element. Does anyone know a more efficient way to use query_entities OR how to more efficiently return every nth item from ItemPaged or more explicitly from table.query_entities? Portion of my code below:
connection_string = "connection string here"
service = TableServiceClient.from_connection_string(conn_str=connection_string)
table_string = ""
table = service.get_table_client(table_string)
entities = table.query_entities(filter, select, etc.)
results = pd.DataFrame(list(entities))
"Does anyone know a more efficient way to use query_entities OR how to more efficiently return every nth item from ItemPaged or more explicitly from table.query_entities?"
After reproducing this on my end, one way to meet your requirement is to use get_entity() instead of query_entities(). Note that get_entity() returns a single entity identified by its partition key and row key, so it only helps when you already know which entities you need. Below is the code that worked for me.
entity = tableClient.get_entity(partition_key='<PARTITION_KEY>', row_key='<ROW_KEY>')
print("Results using get_entity :")
print(format(entity))

Lua weak tables memory leak

I don't often use weak tables. However, I now need to manage certain attributes for my objects, which should be stored somewhere else. That's where weak tables come in handy. My issue is that they don't work as expected. I need weak keys, so that the entire key/value pair is removed when the key is no longer referenced, and I need strong values, since what is stored are tables with meta information which is only used inside that table. Those value tables also hold a reference to the key, and somehow those pairs are never collected.
Code example:
local key = { }
local value = {
    ref = key,
    somevalue = "Still exists"
}
local tab = setmetatable({}, { __mode = "k" })
tab[key] = value
function printtab()
    for k, v in pairs(tab) do
        print(v.somevalue)
    end
end
printtab()
key = nil
value = nil
print("Delete values")
collectgarbage()
printtab()
Expected output:
Still exists
Delete values
Got:
Still exists
Delete values
Still exists
Why is the key/value pair not deleted? The only reference to value is effectively a weak reference inside tab, and the reference inside value is not relevant, since the value itself is not used anywhere.
Ephemeron tables are supported since Lua 5.2.
The Lua 5.2 manual says:
A table with weak keys and strong values is also called an ephemeron table. In an ephemeron table, a value is considered reachable only if its key is reachable. In particular, if the only reference to a key comes through its value, the pair is removed.
Lua 5.1 does not support ephemeron tables correctly.
You are making too many assumptions about the garbage collector. Your data will be collected eventually. In this particular example it should work if you call collectgarbage() twice, but if you have some loops in your weak table it might take even longer.
EDIT: this actually only matters when you're waiting for the __gc event
I went over your code in more detail and noticed you have another problem.
Your value is referencing the key as well, creating a loop that is probably just too much for the GC of your Lua version to handle. In PUC Lua 5.3 this works as expected, but in LuaJIT the loop seems to keep the value from being collected.
This actually makes a lot of sense if you think about it; from what I can tell, the whole thing works by first removing weak elements from a table when they're not referenced anywhere else, thus leaving them to be collected normally the next time the GC runs.
However, when this step runs, the key is still in the table, so the (not weak) value is a valid reference in the GC's eyes, as it is accessible from the code. So the GC kind of deadlocks itself into not being able to remove the key-value pair.
Possible solutions would be:
Don't save a reference to the key in the value
Make the value a weak table as well so it doesn't count as a reference either
Upgrade to another Lua version
Wrap the reference in a weak-valued single-element array
You can change the code like this, and then you will get the expected output. Tip: do not reference the key from the value when you want the key to be weak.
local key = { }
local value = {
    -- ref = key,
    somevalue = "Still exists"
}
local tab = setmetatable({}, { __mode = "k" })
tab[key] = value
function printtab()
    for k, v in pairs(tab) do
        print(v.somevalue)
    end
end
printtab()
key = nil
value = nil
print("Delete values")
collectgarbage()
printtab()

Creating Node.js enum in code to match list of values in database

I have a list of valid values that I am storing in a data store. This list is about 20 items long now and will likely grow to around 100, maybe more.
I feel there are a variety of reasons it makes sense to store this in a data store rather than just storing in code. I want to be able to maintain the list and its metadata and make it accessible to other services, so it seems like a micro-service data store.
But in code, we want to make sure only values from the list are passed, and they can typically be hardcoded. So we would like to create an enum that can be used in code to ensure that valid values are passed.
I have created a simple Node.js script that can generate a JS file with the enum right from the data store. This could be regenerated anytime the file changes or maybe on a schedule. But sharing the enum file with any Node.js applications that use it would not be trivial.
Has anyone done anything like this? Any reason why this would be a bad approach? Any feedback is welcome.
Piggy-backing off of this answer, which describes a way of creating an "enum" in JavaScript: you can grab the list of constants from your server (via an HTTP call) and then generate the enum in code, without the need for creating and loading a JavaScript source file.
Given that you have loaded your enumConstants from the back-end (here I hard-coded them):
const enumConstants = [
  'FIRST',
  'SECOND',
  'THIRD'
];
const temp = {};
for (const constant of enumConstants) {
  temp[constant] = constant;
}
const PlaceEnum = Object.freeze(temp);
console.log(PlaceEnum.FIRST);
// Or, in one line
const PlaceEnum2 = Object.freeze(enumConstants.reduce((o, c) => { o[c] = c; return o; }, {}));
console.log(PlaceEnum2.FIRST);
It is not ideal for code analysis or when using a smart editor, because the object is not explicitly defined and the editor will complain, but it will work.
Another approach is just to use an array and look for its members.
const members = ['first', 'second', 'third'...]
// then test for the members
members.indexOf('first') // 0
members.indexOf('third') // 2
members.indexOf('zero') // -1
members.indexOf('your_variable_to_test') // does it exist in the "enum"?
Any value that is >=0 will be a member of the list. -1 will not be a member. This doesn't "lock" the object like freeze (above) but I find it suffices for most of my similar scenarios.

Modelling Time Series data with tags

I'm currently working on a poc to model time series data.
The initial datapoint structure:
- the name of a sensor: 192.168.1.1:readCount
- a timestamp
- a value
I use the sensor name as the row id and the timestamp as the column id. This approach works very well.
However I want to add tags to add additional data.
public class Datapoint {
    public String metricName;
    public long timestampMs;
    public long value;
    public Map<String, String> tags = new HashMap<String, String>();
}
Datapoint datapoint = new Datapoint();
datapoint.metricName = "IMap.readCount";
datapoint.value = 10;
datapoint.timestampMs = System.currentTimeMillis();
datapoint.tags.put("cluster", "dev");
datapoint.tags.put("member", "192.168.1.1:5701");
datapoint.tags.put("id", "map1");
datapoint.tags.put("company", "Foobar");
I want to use it to say:
- aggregate all metrics for all different machines with the same id. E.g. if machine 1 has 10 writes for mapx, and machine 2 did 20 writes for mapx, I want to know the total of 30.
- aggregate metrics for all maps: if machine 1 did 20 writes on mapx and 30 writes on mapy, I want to know the total of 50.
The question is how I should model this.
I know that a composite can be used for the column id. So in theory I could add each tag as an element in that composite. But can a column be efficiently searched for when it has a variable number of elements in the composite?
I know my question is a bit foggy, but I think this reflects my understanding of Cassandra since I just started with it.
@pveentjer
"I know that a composite can be used for the column id. So in theory I could add each tag as an element in that composite. But can a column be efficiently searched for when it has a variable number of elements in the composite?"
There are some rules and restrictions when using multiple composites, read here and here
For CQL3, there are further limitations, read here
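Independent of how the data ends up being stored in Cassandra, the two aggregations described in the question both amount to grouping datapoints by one tag and summing their values. Here is a minimal in-memory sketch of that idea, assuming the datapoints have already been read back into a list. The class TagAggregation and the helpers sumByTag and point are hypothetical names introduced only for illustration, and nothing here uses a Cassandra API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration class; not part of any Cassandra client library.
class TagAggregation {

    // Same shape as the Datapoint class from the question.
    static class Datapoint {
        String metricName;
        long timestampMs;
        long value;
        Map<String, String> tags = new HashMap<>();
    }

    // Sums the values of all datapoints that share the same value for the given tag.
    static Map<String, Long> sumByTag(List<Datapoint> points, String tagName) {
        Map<String, Long> totals = new TreeMap<>();
        for (Datapoint p : points) {
            String tagValue = p.tags.get(tagName);
            if (tagValue == null) {
                continue; // datapoints without this tag are skipped
            }
            totals.merge(tagValue, p.value, Long::sum);
        }
        return totals;
    }

    // Hypothetical convenience factory for building sample data.
    static Datapoint point(String member, String id, long value) {
        Datapoint p = new Datapoint();
        p.metricName = "IMap.writeCount";
        p.timestampMs = System.currentTimeMillis();
        p.value = value;
        p.tags.put("member", member);
        p.tags.put("id", id);
        return p;
    }

    public static void main(String[] args) {
        List<Datapoint> points = new ArrayList<>();
        points.add(point("machine1", "mapx", 20));
        points.add(point("machine1", "mapy", 30));
        points.add(point("machine2", "mapx", 10));

        System.out.println(sumByTag(points, "id"));     // totals per map across machines
        System.out.println(sumByTag(points, "member")); // totals per machine across maps
    }
}
```

Whether this grouping can be pushed down into the storage layer depends on the composite-column rules linked above; the sketch only shows what result the model needs to be able to produce.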

How to maintain counters with LinqToObjects?

I have the following c# code:
private XElement BuildXmlBlob(string id, Part part, out int counter)
{
    // return some unique xml particular to the parameters passed
    // remember to increment the counter also before returning.
}
Which is called by:
var counter = 0;
result.AddRange(from rec in listOfRecordings
                from par in rec.Parts
                let id = GetId("mods", rec.CKey + par.UniqueId)
                select BuildXmlBlob(id, par, counter));
Above code samples are symbolic of what I am trying to achieve.
According to Eric Lippert, the out keyword and LINQ do not mix. OK, fair enough, but can someone help me refactor the above so it does work? A colleague at work mentioned accumulator and aggregate functions, but I am a novice with LINQ and my Google searches weren't bearing any real fruit, so I thought I would ask here :).
To Clarify:
I am counting the number of parts I might have which could be any number of them each time the code is called. So every time the BuildXmlBlob() method is called, the resulting xml produced will have a unique element in there denoting the 'partNumber'.
So if the counter is currently on 7, that means we are processing 7th part so far!! That means XML returned from BuildXmlBlob() will have the counter value embedded in there somewhere. That's why I need it somehow to be passed and incremented every time the BuildXmlBlob() is called per run through.
If you want to keep this purely in LINQ and you need to maintain a running count for use within your queries, the cleanest way to do so would be to use the Select() overload that passes the current index to the selector.
In this case, it would be cleaner to do a query which collects the inputs first, then use the overload to do the projection.
var inputs =
    from recording in listOfRecordings
    from part in recording.Parts
    select new
    {
        Id = GetId("mods", recording.CKey + part.UniqueId),
        Part = part,
    };
result.AddRange(inputs.Select((x, i) => BuildXmlBlob(x.Id, x.Part, i)));
Then you wouldn't need to use the out/ref parameter.
XElement BuildXmlBlob(string id, Part part, int counter)
{
    // implementation
}
Below is what I managed to figure out on my own:
result.AddRange(listOfRecordings.SelectMany(rec => rec.Parts, (rec, par) => new {rec, par})
    .Select(@t => new
    {
        @t,
        Id = GetStructMapItemId("mods", @t.rec.CKey + @t.par.UniqueId)
    })
    .Select((@t, i) => BuildPartsDmdSec(@t.Id, @t.@t.par, i)));
I used ReSharper to convert the query into a method chain, which gave me the basics of what I needed, and then I simply tacked the Select call on right at the end.
