Implement aggregation in Cooja

Does anyone have an idea how to implement aggregation in udp-client.c with Cooja, please? I have a linear network with one server (node 1) and 4 clients running Contiki: node 1 is connected to node 2, which is connected to node 3, which is connected to node 4, which is connected to node 5. My goal is that node 2 waits for the data from node 3 before sending its data to node 1, node 3 waits for the data from node 4, and node 4 waits for the data from node 5 before sending to node 3.
I am using Cooja.
Thanks!!!

To define the links between the different nodes, use Cooja's Directed Graph Radio Medium (DGRM) to specify the link quality between node 1 and node 2, node 2 and node 3, and so on, in order to create a linear topology for your network.
You can then modify the code executed by your nodes (udp-client.c) to send specific messages.
When a message is received, the receiving node checks that it came from its child (the next node down the line, farther from node 1); it can then aggregate that data with its own and forward the result towards node 1.
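To make this concrete, here is a minimal sketch of what the receive-and-aggregate logic could look like in udp-client.c, assuming Contiki's simple-udp API; parent_addr, is_my_child() and my_sensor_value() are placeholders you would replace with your own topology setup and sensor reading, and the port and payload format are arbitrary:

/* Sketch only: aggregation for an intermediate node (2, 3 or 4).
 * Header paths may differ slightly between Contiki versions. */
#include "contiki.h"
#include "contiki-net.h"
#include "simple-udp.h"
#include "sys/node-id.h"
#include <stdio.h>
#include <string.h>

#define UDP_PORT 1234
#define MAX_PAYLOAD 64

static struct simple_udp_connection conn;
static uip_ipaddr_t parent_addr;   /* next hop towards the server (node 1) */

static int is_my_child(const uip_ipaddr_t *sender_addr)
{
  /* Placeholder: compare sender_addr with the known address of your child. */
  return 1;
}

static unsigned my_sensor_value(void)
{
  return 42;  /* placeholder sensor reading */
}

/* Called for every incoming packet: if it comes from the child, append our
 * own reading to the child's payload and forward the aggregate upstream. */
static void receiver(struct simple_udp_connection *c,
                     const uip_ipaddr_t *sender_addr, uint16_t sender_port,
                     const uip_ipaddr_t *receiver_addr, uint16_t receiver_port,
                     const uint8_t *data, uint16_t datalen)
{
  char aggregate[MAX_PAYLOAD];

  if(!is_my_child(sender_addr)) {
    return;
  }
  snprintf(aggregate, sizeof(aggregate), "%.*s|%u:%u",
           (int)datalen, (const char *)data, (unsigned)node_id, my_sensor_value());
  simple_udp_sendto(&conn, aggregate, strlen(aggregate), &parent_addr);
}

PROCESS(udp_client_process, "aggregating udp client");
AUTOSTART_PROCESSES(&udp_client_process);

PROCESS_THREAD(udp_client_process, ev, data)
{
  PROCESS_BEGIN();

  /* TODO: set parent_addr to the next hop towards node 1 (hard-coded or from RPL). */
  simple_udp_register(&conn, UDP_PORT, NULL, UDP_PORT, receiver);

  /* Node 5 has no child, so it would instead send its own reading on a periodic
   * etimer; nodes 2-4 only transmit from the receiver() callback above. */
  while(1) {
    PROCESS_WAIT_EVENT();
  }

  PROCESS_END();
}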

Related

Circle topology of Nodes for distributed system with WebSockets - How to mark all Nodes when topology healthy?

Let's assume a circular topology of Nodes where each Node is connected to the next (right/clockwise) Node by a bidirectional WebSocket tunnel. I will speak about 5 Nodes, but the minimum is 3.
Nodes will end up in this configuration after the initial setup:
leftNode and rightNode are references to the live WebSocket tunnel between two given Nodes.
Spawning of Nodes works like this:
A Node is spawned with constructor(id, leftNodeId, rightNodeId)
Its WebSocket server is started
The Node keeps trying endlessly to connect to the next Node by the given id.
When the connection is made, the Node saves a reference to the tunnel into rightNode and the right node saves it into leftNode
Nodes can spawn randomly...
How do I set the circleHealthy variable on all Nodes when the circle is complete?
I would have every node transmit its identifier to its right, and forward such messages from its left, adding its own identifier (it might be worth trying to consolidate such messages, using some sort of timeout, but with a maximum of 5 nodes, this is not a big deal). When you see your own identifier coming back from your left, you know that it has passed through every other node.
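The per-message logic is small and independent of the WebSocket transport; here is a rough sketch in C (the identifier list is assumed to travel as a comma-separated string, and handle_ring_token() is a made-up helper name), which you can translate into whatever language your Nodes run:

/* Transport-agnostic sketch of the health-check logic described above. */
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define MAX_PAYLOAD 256

/* Returns true if the circle is complete (our own id came back around);
 * otherwise writes the payload to forward to the right neighbour into out. */
static bool handle_ring_token(const char *incoming, const char *own_id,
                              char *out, size_t out_len)
{
  char needle[64];
  char padded[MAX_PAYLOAD];

  snprintf(needle, sizeof(needle), ",%s,", own_id);
  snprintf(padded, sizeof(padded), ",%s,", incoming);

  if(strstr(padded, needle) != NULL) {
    return true;                       /* set circleHealthy = true here */
  }
  snprintf(out, out_len, "%s,%s", incoming, own_id);  /* append own id and forward */
  return false;
}

int main(void)
{
  char forward[MAX_PAYLOAD];

  /* Node "C" receives a token that already contains its id: circle is healthy. */
  printf("%d\n", handle_ring_token("A,B,C,D,E", "C", forward, sizeof(forward)));

  /* Node "C" receives a token without its id: append it and pass it to the right. */
  if(!handle_ring_token("A,B", "C", forward, sizeof(forward))) {
    printf("forward: %s\n", forward);
  }
  return 0;
}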

How to connect to specific cassandra node only

I have two Docker Cassandra container nodes acting as node1 and node2 in the same data center.
My aim is that my Java application always connects to node1 and that my ad-hoc manual queries are served by node2 only (there should not be any inter-node communication for data).
Normally I can execute read/write queries against container1 or container2 using cqlsh. If I fire some queries against container1 using cqlsh, will it always return the data from the same container (node1), or may it also route to another node internally?
I also know the coordinator node will talk to a peer node for a data request. What will happen in the case of RF=2 and a 2-node cluster: will the coordinator node itself be able to serve the data?
Here, RF=2, nodes=2, consistency=ONE.
I have set up clusters before to separate OLTP from OLAP. The way to do it is to separate your nodes into different logical data centers.
So node1 should have its local data center in cassandra-rackdc.properties set to "dc1":
dc=dc1
rack=r1
Likewise, node2 should be put into its own data center, "dc2":
dc=dc2
rack=ra
Then your keyspace definition will look something like this:
CREATE KEYSPACE stackoverflow
WITH REPLICATION={'class':'NetworkTopologyStrategy','dc1':'1','dc2':'1'};
My aim is that my Java application always connects to node1
In your Java code, you should specify "dc1" as your default data center, as I do in this example:
String dataCenter = "dc1";
// 'nodes' (contact points) and 'options' (PoolingOptions) are defined elsewhere.
Builder builder = Cluster.builder()
    .addContactPoints(nodes)
    .withQueryOptions(new QueryOptions().setConsistencyLevel(ConsistencyLevel.LOCAL_ONE))
    // Token-aware policy wrapping a DC-aware policy pinned to "dc1".
    .withLoadBalancingPolicy(new TokenAwarePolicy(
        new DCAwareRoundRobinPolicy.Builder()
            .withLocalDc(dataCenter).build()))
    .withPoolingOptions(options);
That will make your Java app "sticky" to all nodes in data center "dc1," or just node1 in this case. Likewise, when you cqlsh into node2, your ad-hoc queries should be "sticky" to all nodes in "dc2."
Note: In this configuration, you do not have high-availability. If node1 goes down, your app will not jump over to node2.

Cross Datacenter Replication Path in Cassandra Timeout

Consider 2 data centers in different geographical locations; correct me if any statement I have made is incorrect.
DC1's coordinator receives a write request from a client, which it forwards to its local nodes containing the replicas. For a successful write to happen (LOCAL_QUORUM consistency level), the local nodes should have acknowledged the write to the coordinator within the write_request_timeout_in_ms = 5000 ms period.
When DC1's coordinator receives the write request, it also sends the write request to a coordinator in DC2, which passes on the request to the local nodes containing the replica data. These DC2 nodes are not required to acknowledge back within the 5000 ms for the write mutation to be a success; only DC1 nodes need to respond within that 5000 ms for a successful write.
So my question is: what timers govern the write requests happening at DC2? The DC2 coordinator has to respond back to the DC1 coordinator within a specific timeout period, otherwise the DC1 coordinator stores a hint (hinted handoff) for DC2, so what is that specific timeout period called in the cassandra.yaml file?

Cassandra - write with CL.ALL with multiple data centers

I have two Data Centers, each one with replication factor 3.
Will a write with CL.ALL block until the data is stored in both DCs (6 nodes, or 3 + 1)?
I would assume that it blocks until all 3 replicas in the local DC have acknowledged a successful write.
I would like to have something like CL.ALL_LOCAL, which stores data on all replicas in a single DC, so I can read with CL.ONE. The idea is that the write blocks until all replicas in a single DC have persisted the data, and a following read will have a high probability of reading fresh data.
There isn't currently a consistency level that provides what you are describing. The closest is LOCAL_QUORUM which will return after a quorum of nodes in the local datacenter respond.
You can file a ticket on jira to add this functionality if you would like.
https://issues.apache.org/jira/browse/CASSANDRA
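For illustration, the suggested pattern (write at LOCAL_QUORUM, then read at CL.ONE as in the question) is set per statement; this sketch assumes the DataStax C/C++ driver, and the contact point, keyspace and table names are made up:

/* Sketch assuming the DataStax C/C++ driver; ks.tbl and 127.0.0.1 are placeholders. */
#include <cassandra.h>
#include <stdio.h>

static void run(CassSession *session, const char *query, CassConsistency cl)
{
  CassStatement *stmt = cass_statement_new(query, 0);
  cass_statement_set_consistency(stmt, cl);          /* per-statement consistency */
  CassFuture *future = cass_session_execute(session, stmt);
  if(cass_future_error_code(future) != CASS_OK) {
    fprintf(stderr, "query failed: %s\n", query);
  }
  cass_future_free(future);
  cass_statement_free(stmt);
}

int main(void)
{
  CassCluster *cluster = cass_cluster_new();
  CassSession *session = cass_session_new();
  cass_cluster_set_contact_points(cluster, "127.0.0.1");

  CassFuture *connect = cass_session_connect(session, cluster);
  if(cass_future_error_code(connect) == CASS_OK) {
    /* Write: blocks until a quorum of replicas in the local DC acknowledge. */
    run(session, "INSERT INTO ks.tbl (id, val) VALUES (1, 'x')",
        CASS_CONSISTENCY_LOCAL_QUORUM);
    /* Read: a single replica answers; likely (but not guaranteed) to be fresh. */
    run(session, "SELECT val FROM ks.tbl WHERE id = 1",
        CASS_CONSISTENCY_ONE);
  }
  cass_future_free(connect);

  cass_session_free(session);
  cass_cluster_free(cluster);
  return 0;
}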
I've checked the Cassandra 1.1 code and noticed interesting behavior when writing with CL.ALL in a multi-DC deployment. Perhaps I've interpreted the code wrong... anyway:
At the beginning, the coordinator collects the IP addresses of the nodes that will receive the row mutation; this is independent of the consistency level provided by the client. In 1.0 it was all nodes from all DCs; from 1.1 it is all nodes from the local DC plus one node from each remote DC (the remaining nodes are listed as "forward to" in the message). Each mutation is sent by a separate thread, so the requests can run in parallel. Each such mutation is handled as a message by the messaging service. When a node in a remote DC receives the message, it forwards it to the remaining nodes provided in "forward to".
The consistency level provided by the client defines the number of nodes that must acknowledge the received message. In the case of CL.ALL this number is determined by the replication factor, and now it gets interesting: since we've sent the message to all nodes in the local DC and to nodes in the remote DCs, we will also get acknowledgements from those remote nodes. Yes, the required count is still defined by the replication factor, but depending on network latency we cannot be sure which nodes have confirmed receipt of the message: it could be a mix of nodes from the local and remote DCs, but it could also be only nodes from the local DC. In the worst case, it could happen that none of the local nodes got the message and the confirmations came from remote DCs (if you have many). This means that writing with CL.ALL does not guarantee that you can immediately read the data from your local DC.

Cassandra seed nodes and clients connecting to nodes

I'm a little confused about Cassandra seed nodes and how clients are meant to connect to the cluster. I can't seem to find this bit of information in the documentation.
Do clients only contain a list of the seed nodes, with each node delegating a new host for the client to connect to? Are seed nodes really only for node-to-node discovery, rather than being special nodes for clients?
Should each client use a small sample of random nodes in the DC to connect to?
Or, should each client use all the nodes in the DC?
Answering my own question:
Seeds
From the FAQ:
Seeds are used during startup to discover the cluster.
Also from the DataStax documentation on "Gossip":
The seed node designation has no purpose other than bootstrapping the gossip process
for new nodes joining the cluster. Seed nodes are not a single
point of failure, nor do they have any other special purpose in
cluster operations beyond the bootstrapping of nodes.
From these details it seems that a seed is nothing special to clients.
Clients
From the DataStax documentation on client requests:
All nodes in Cassandra are peers. A client read or write request can
go to any node in the cluster. When a client connects to a node and
issues a read or write request, that node serves as the coordinator
for that particular client operation.
The job of the coordinator is to act as a proxy between the client
application and the nodes (or replicas) that own the data being
requested. The coordinator determines which nodes in the ring should
get the request based on the cluster configured partitioner and
replica placement strategy.
I gather that the pool of nodes that a client connects to can just be a handful of (random?) nodes in the DC to allow for potential failures.
Seed nodes serve two purposes.
First, they act as a place for new nodes to announce themselves to a cluster. So, without at least one live seed node, no new nodes can join the cluster, because they have no idea how to contact non-seed nodes to get the cluster status.
Second, seed nodes act as gossip hot spots. Since nodes gossip more often with seeds than non-seeds, the seeds tend to have more current information, and therefore the whole cluster has more current information. This is the reason you should not make all nodes seeds. Similarly, this is also why all nodes in a given data center should have the same list of seed nodes in their cassandra.yaml file. Typically, 3 seed nodes per data center is ideal.
The Cassandra client contact points simply provide the cluster topology to the client, after which the client may connect to any node in the cluster. As such, they are similar to seed nodes, and it makes sense to use the same nodes for both seeds and client contacts. However, you can safely configure as many Cassandra client contact points as you like. The only other consideration is that the first node a client contacts sets its data center affinity, so you may wish to order your contact points to prefer a given data center.
For more details about contact points, see this question: Cassandra Java driver: how many contact points is reasonable?
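For example, contact points are just a list supplied at cluster-configuration time; this small sketch assumes the DataStax C/C++ driver, with made-up hostnames, and lists only nodes of the data center you want the client to treat as local:

/* Sketch with the DataStax C/C++ driver: any number of contact points can be given;
 * they are only used to discover the topology, after which the client may talk to
 * any node. Hostnames are placeholders. */
#include <cassandra.h>

int main(void)
{
  CassCluster *cluster = cass_cluster_new();
  /* Comma-separated list; per the answer above, prefer nodes of the data
   * center you want the client to consider local. */
  cass_cluster_set_contact_points(cluster, "node1.dc1,node2.dc1,node3.dc1");

  CassSession *session = cass_session_new();
  CassFuture *connect = cass_session_connect(session, cluster);
  cass_future_wait(connect);
  cass_future_free(connect);

  cass_session_free(session);
  cass_cluster_free(cluster);
  return 0;
}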
Your answer is right. The only thing I would add is that it's recommended to use the same seed list (i.e. in your cassandra.yaml) across the cluster, as a "best practices" sort of thing. It helps gossip traffic propagate at nice, regular rates, since seeds are treated (very minimally) differently by the gossip code (see https://cwiki.apache.org/confluence/display/CASSANDRA2/ArchitectureGossip).
