Cassandra data modeling for server time series metrics

I am going to collect server metrics, such as OS metrics, Java process info, etc., every second.
For example, as JSON:
{
"localhost": {
"os": {
"cpu": 4,
"memory": 16
},
"java": {
"jvm": {
"vendor": "Oracle"
},
"heap": 4,
"version": 1.8
}
}
}
What is the best data model for this kind of data?
Should I store every type of metric in a separate table, or all of them in one?

One option would be to translate each individual metric into a dotted string, so that your JSON:
{
"localhost": {
"os": {
"cpu": 4,
"memory": 16
},
"java": {
"jvm": {
"vendor": "Oracle"
},
"heap": 4,
"version": 1.8
}
}
}
Turns into this:
Host       Key              Value
localhost  os.cpu           4
localhost  os.memory        16
localhost  java.jvm.vendor  Oracle
localhost  java.heap        4
localhost  java.version     1.8
Not shown above is a timestamp column. The primary key would be host+key+timestamp. If you don't need to be able to query by individual host, you could lump the hostname into the key, i.e. key=localhost.os.cpu.
The precise details of your querying needs weigh heavily into whether this is the right choice.
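If it helps to make that concrete, here is a minimal sketch of such a table and a read/write against it using the Node.js cassandra-driver (the keyspace, table, and column names are assumptions for illustration, not a prescription):

// Sketch only: assumes a local Cassandra node and an existing keyspace "metrics".
const cassandra = require('cassandra-driver');
const client = new cassandra.Client({
  contactPoints: ['127.0.0.1'],
  localDataCenter: 'datacenter1',
  keyspace: 'metrics'
});

// One partition per (host, key); samples clustered by time, newest first.
const createTable = `
  CREATE TABLE IF NOT EXISTS server_metrics (
    host  text,
    key   text,
    ts    timestamp,
    value text,
    PRIMARY KEY ((host, key), ts)
  ) WITH CLUSTERING ORDER BY (ts DESC)`;

async function run() {
  await client.execute(createTable);

  // Write one dotted-string metric sample.
  await client.execute(
    'INSERT INTO server_metrics (host, key, ts, value) VALUES (?, ?, ?, ?)',
    ['localhost', 'os.cpu', new Date(), '4'],
    { prepare: true }
  );

  // Read the last hour of a single metric for a single host.
  const result = await client.execute(
    'SELECT ts, value FROM server_metrics WHERE host = ? AND key = ? AND ts > ?',
    ['localhost', 'os.cpu', new Date(Date.now() - 3600 * 1000)],
    { prepare: true }
  );
  console.log(result.rows);
  await client.shutdown();
}

run().catch(console.error);

Partitioning on (host, key) keeps each time series in its own partition, so "latest N samples of metric X on host Y" is a single-partition read; if you also need cross-host or cross-metric queries, that usually means additional tables keyed for those access patterns.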

Related

LLRP for Zebra FX7500 with llrpjs doesn't read tags

Using the llrpjs library for Node.js, we are attempting to read tags from the Zebra FX7500 (Motorola?). This discussion points to the RFID Reader Software Interface Control Guide pages 142-144, but does not indicate potential values to set up the device.
From what we can gather, we should issue a SET_READER_CONFIG with a custom parameter (MotoDefaultSpec = VendorIdentifier: 161, ParameterSubtype: 102, UseDefaultSpecForAutoMode: true). Do we need to include the ROSpec and/or AccessSpec values as well (are they required)? After sending the SET_READER_CONFIG message, do we still need to send the regular LLRP messages (ADD_ROSPEC, ENABLE_ROSPEC, START_ROSPEC)? Without the MotoDefaultSpec, even after sending the regular LLRP messages, sending a GET_REPORT does not retrieve tags nor does a custom message with MOTO_GET_TAG_EVENT_REPORT. They both trigger a RO_ACCESS_REPORT event message, but the tagReportData is null.
The README file for llrpjs lists "Vendor definitions support" as a TODO item. While that is somewhat vague, is it possible that the library just hasn't implemented custom LLRP extension (messages/parameters) support, which is why none of our attempts are working? The MotoDefaultSpec parameter and MOTO_GET_TAG_EVENT_REPORT are custom to the vendor/chipset. The MOTO_GET_TAG_EVENT_REPORT custom message seems to trigger a RO_ACCESS_REPORT similar to the base LLRP GET_REPORT message, so we assume that part is working.
It is worth noting that Zebra's 123RFID Desktop setup and optimization tool connects and reads tags as expected, so the device and antenna are working (reading tags).
Could these issues be related to the ROSPEC file we are using (see below)?
{
"$schema": "https://llrpjs.github.io/schema/core/encoding/json/1.0/llrp-1x0.schema.json",
"id": 1,
"type": "ADD_ROSPEC",
"data": {
"ROSpec": {
"ROSpecID": 123,
"Priority": 1,
"CurrentState": "Disabled",
"ROBoundarySpec": {
"ROSpecStartTrigger": {
"ROSpecStartTriggerType": "Immediate"
},
"ROSpecStopTrigger": {
"ROSpecStopTriggerType": "Null",
"DurationTriggerValue": 0
}
},
"AISpec": {
"AntennaIDs": [1, 2, 3, 4],
"AISpecStopTrigger": {
"AISpecStopTriggerType": "Null",
"DurationTrigger": 0
},
"InventoryParameterSpec": {
"InventoryParameterSpecID": 1234,
"ProtocolID": "EPCGlobalClass1Gen2"
}
},
"ROReportSpec": {
"ROReportTrigger": "Upon_N_Tags_Or_End_Of_ROSpec",
"N": 1,
"TagReportContentSelector": {
"EnableROSpecID": true,
"EnableAntennaID": true,
"EnableFirstSeenTimestamp": true,
"EnableLastSeenTimestamp": true,
"EnableSpecIndex": false,
"EnableInventoryParameterSpecID": false,
"EnableChannelIndex": false,
"EnablePeakRSSI": false,
"EnableTagSeenCount": true,
"EnableAccessSpecID": false
}
}
}
}
}
For anyone having a similar issue, we found that attempting to configure more antennas than the Zebra device has connected caused the entire spec to fail. In our case, we had two antennas connected, so including antennas 3 and 4 in the spec was causing the problem.
See below for the working ROSPEC. Removing the extra antennas from the data.AISpec.AntennaIDs property allowed our application to connect and read tags.
We are still having some issues with llrpjs when trying to STOP_ROSPEC because it sends an RO_ACCESS_REPORT response without a resName value. See the issue on GitHub for more information.
That said, our application works without sending the STOP_ROSPEC command.
{
"$schema": "https://llrpjs.github.io/schema/core/encoding/json/1.0/llrp-1x0.schema.json",
"id": 1,
"type": "ADD_ROSPEC",
"data": {
"ROSpec": {
"ROSpecID": 123,
"Priority": 1,
"CurrentState": "Disabled",
"ROBoundarySpec": {
"ROSpecStartTrigger": {
"ROSpecStartTriggerType": "Null"
},
"ROSpecStopTrigger": {
"ROSpecStopTriggerType": "Null",
"DurationTriggerValue": 0
}
},
"AISpec": {
"AntennaIDs": [1, 2],
"AISpecStopTrigger": {
"AISpecStopTriggerType": "Null",
"DurationTrigger": 0
},
"InventoryParameterSpec": {
"InventoryParameterSpecID": 1234,
"ProtocolID": "EPCGlobalClass1Gen2",
"AntennaConfiguration": {
"AntennaID": 1,
"RFReceiver": {
"ReceiverSensitivity": 0
},
"RFTransmitter": {
"HopTableID": 1,
"ChannelIndex": 1,
"TransmitPower": 170
},
"C1G2InventoryCommand": {
"TagInventoryStateAware": false,
"C1G2RFControl": {
"ModeIndex": 23,
"Tari": 0
},
"C1G2SingulationControl": {
"Session": 1,
"TagPopulation": 32,
"TagTransitTime": 0,
"C1G2TagInventoryStateAwareSingulationAction": {
"I": "State_A",
"S": "SL"
}
}
}
}
}
},
"ROReportSpec": {
"ROReportTrigger": "Upon_N_Tags_Or_End_Of_AISpec",
"N": 1,
"TagReportContentSelector": {
"EnableROSpecID": true,
"EnableAntennaID": true,
"EnableFirstSeenTimestamp": true,
"EnableLastSeenTimestamp": true,
"EnableTagSeenCount": true,
"EnableSpecIndex": false,
"EnableInventoryParameterSpecID": false,
"EnableChannelIndex": false,
"EnablePeakRSSI": false,
"EnableAccessSpecID": false
}
}
}
}
}

Elasticsearch query causing NodeJS heap out of memory

What's happening now?
Recently I built an Elasticsearch query. Its main purpose is to get data counts per hour going back 12 weeks.
When the query gets called over and over again, NodeJS memory grows from 20 MB to 1024 MB. Surprisingly, the memory does not hit the top immediately: it stays stable under 25 MB (for several minutes) and then suddenly starts growing (25 MB, 46 MB, 125 MB, 350 MB... up to 1024 MB), finally causing the NodeJS heap out-of-memory error. Whether I call this query or not, the memory keeps growing and is never released. This scenario only happens on the remote server (running in Docker); the local Docker environment is totally fine (memory usage stays flat).
How am I querying?
Like below.
const query = {
"size": 0,
"query": {
"bool": {
"must": [
{ terms: { '_id.keyword': array_id } },
{
"range": {
"date_created": {
"gte": start_timestamp - timestamp_twelve_weeks,
"lt": start_timestamp
}
}
}
]
}
},
"aggs": {
"shortcode_log": {
"date_histogram": {
"field": "date_created",
"interval": "3600ms"
}
}
}
}
What's the return value?
Like below (total query time is around 2 seconds).
{
"aggs_res": {
"shortcode_log": {
"buckets": [
{
"key": 1594710000,
"doc_count": 2268
},
{
"key": 1594713600,
"doc_count": 3602
},
{//.....total item count 2016
]
}
}
}
If your histogram interval is really 3600ms (shouldn't it be 3600s?), that's a very short period over which to aggregate 12 weeks of data.
It means 0.06 minutes per bucket:
24,000 periods per day
168,000 per week
2,016,000 for 12 weeks
That can explain:
why your script waits a long time before doing anything, and
why your memory explodes when you try to loop over the buckets.
In your example, you get only 2016 buckets back, so I think there is a small difference between your two tests.
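If hourly buckets were the intent, the interval should be a full hour rather than 3600 milliseconds. A sketch of the corrected aggregation part, reusing the field name from the question ("1h" could equivalently be written "3600s"):

// Only the aggregation changes: "1h" yields ~2016 hourly buckets over
// 12 weeks instead of ~2,000,000 buckets with "3600ms".
const aggs = {
  "shortcode_log": {
    "date_histogram": {
      "field": "date_created",
      "interval": "1h"
    }
  }
}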
Update: the issue is solved. The project has a layer between the server and the DB, and the code in that layer was preventing the query's memory from being released.

Stream Analytics - Passing multiple, individual records within a window into a UDF

I want to pass multiple, individual records within a set window (it can be tumbling, hopping, or sliding), without any aggregation, into a JavaScript UDF like so:
Input data is:
{ "device":"A", "temp":20.0, "humidity":0.9, "param1": 83}
{ "device":"A", "temp":22.0, "humidity":0.9, "param1": 63}
{ "device":"B", "temp":15.0, "humidity":0.5, "param1": 13}
{ "device":"A", "temp":22.0, "humidity":0.5, "param1": 88}
{ "device":"A", "temp":22.0, "humidity":0.5, "param1": 88}
Pass records within a specified window as an object array:
function process_records(record_array) {
// access individual records
record_one_device = record_array[0].device
record_two_device = record_array[1].device
record_three_device = record_array[2].device
...
}
Thanks for any help!
Based on your requirement, I assume you could leverage the Collect aggregate function from Azure Stream Analytics. Here is my test; you can refer to it:
Input
[
{
"Make": "Honda",
"Time": "2015-01-01T00:00:01.0000000Z",
"Weight": 1000
},
{
"Make": "Honda",
"Time": "2015-01-01T00:00:03.0000000Z",
"Weight": 3000
},
{
"Make": "Honda",
"Time": "2015-01-01T00:00:12.0000000Z",
"Weight": 2000
},
{
"Make": "Honda",
"Time": "2015-01-01T00:00:52.0000000Z",
"Weight": 1000
}
]
With the following query, I could retrieve the data contained in temporal windows as follows:
SELECT
Make,
System.TimeStamp AS Time,
Collect() AS records
FROM
Input TIMESTAMP BY Time
GROUP BY
Make,
HoppingWindow(second, 10,10)
Then, you could call UDF.processRecords(Collect()) in your query. For more details, you can refer to Common Stream Analytics usage patterns, Azure Stream Analytics UDFs, and Stream Analytics window functions.
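For completeness, here is a minimal sketch of what the UDF body might look like when it receives the array produced by Collect(). The alias processRecords and the property names (device, temp) are assumptions taken from the question, and the function follows the usual single-function shape of an ASA JavaScript UDF:

// Sketch: "records" is the array produced by Collect(); each element
// carries the columns of one event from the window.
function main(records) {
    if (!records || records.length === 0) {
        return null;
    }
    // Access individual records...
    var firstDevice = records[0].device;
    // ...or compute something simple across the whole window.
    var maxTemp = records[0].temp;
    for (var i = 1; i < records.length; i++) {
        if (records[i].temp > maxTemp) {
            maxTemp = records[i].temp;
        }
    }
    return { firstDevice: firstDevice, count: records.length, maxTemp: maxTemp };
}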

How can I get Google Fit data

Hi, how can I get activity data from Google Fit? I have almost everything done. After getting the access token, how can I get the Google Fit data? Using the code below, I can only get this kind of response:
code:
gFit.listExistingSessions(req.query.token,function(status,data){
// console.log('Sessions',data);
res.render('results', { resp: data });
});
response:
"session": [
{
"id": "3116a82009dd6cd7:activemode:running:1456064572752",
"startTimeMillis": "1456064572752",
"endTimeMillis": "1456114372880",
"modifiedTimeMillis": "1456745578987",
"application": {
"packageName": "com.google.android.apps.fitness"
},
"activityType": 8
},
{
"id": "3116a82009dd6cd7:activemode:running:1456064572752",
"name": "Evening running",
"startTimeMillis": "1456064572752",
"endTimeMillis": "1456114370411",
"modifiedTimeMillis": "1456745578992",
"application": {
"packageName": "com.google.android.apps.fitness"
},
"activityType": 8
},
{
"id": "3116a82009dd6cd7:activemode:biking:1456742139081",
"startTimeMillis": "1456742139081",
"endTimeMillis": "1456742187907",
"modifiedTimeMillis": "1456745578998",
"application": {
"packageName": "com.google.android.apps.fitness"
},
"activityType": 1
}
]
Can anyone tell me how to get the calories burned, step count, miles, and minutes for each of the sessions?
According to their documentation, sessions are only a means of organizing workouts; they do not provide specific workout data. For that, you have to query datasets that overlap with the time interval of the session.
For more information on how to query specific datasets, see Working with datasets.
This is the example request for querying datasets:
https://www.googleapis.com/fitness/v1/users/me/dataSources/derived:com.google.step_count.delta:1234567890:Example%20Manufacturer:ExampleTablet:1000001/datasets/1397513334728708316-1397515179728708316
There, you have to replace com.google.step_count.delta with whatever data type you require, and also replace the timestamps at the end of the query with the ones that match your session start and end times.
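As a rough illustration, here is a Node.js sketch of such a dataset request (it assumes Node 18+ with a global fetch and that accessToken holds your OAuth token; the data source is the example one from the URL above and should be replaced with your own, and the session times in milliseconds are converted to the nanosecond range the dataset ID expects):

// Sketch: fetch a dataset overlapping one session's time range.
const accessToken = '<YOUR_ACCESS_TOKEN>'; // placeholder

// Session times from the sessions response are in milliseconds;
// dataset IDs are "<startTimeNanos>-<endTimeNanos>".
const startNanos = BigInt('1456064572752') * 1000000n;
const endNanos   = BigInt('1456114372880') * 1000000n;

// Example data source from above; substitute the data type and source you need.
const dataSource =
  'derived:com.google.step_count.delta:1234567890:Example%20Manufacturer:ExampleTablet:1000001';

const url = `https://www.googleapis.com/fitness/v1/users/me/dataSources/${dataSource}/datasets/${startNanos}-${endNanos}`;

fetch(url, { headers: { Authorization: `Bearer ${accessToken}` } })
  .then(res => res.json())
  .then(dataset => {
    // Points carry intVal (e.g. steps) or fpVal (e.g. calories, distance) values.
    console.log(JSON.stringify(dataset, null, 2));
  })
  .catch(console.error);

Repeating the call with the corresponding data types (for example com.google.calories.expended or com.google.distance.delta) should cover the other per-session numbers you listed.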

Query all unique values of a field with Elasticsearch

How do I search for all unique values of a given field with Elasticsearch?
I have such a kind of query like select full_name from authors, so I can display the list to the users on a form.
You could make a terms facet on your 'full_name' field. But in order to do that properly, you need to make sure you're not tokenizing it while indexing; otherwise, every entry in the facet will be a different term that is part of the field content. You most likely need to configure it as 'not_analyzed' in your mapping. If you are also searching on it and you still want to tokenize it, you can index it in two different ways using a multi-field.
You also need to take into account that depending on the number of unique terms that are part of the full_name field, this operation can be expensive and require quite some memory.
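For illustration, a mapping along those lines might look like this (a sketch using the string / not_analyzed syntax of older Elasticsearch versions; the index name and the raw sub-field name are assumptions):

// Sketch: "full_name" stays analyzed for full-text search, while
// "full_name.raw" is not analyzed and can be used for facets/aggregations.
const mapping = {
  "mappings": {
    "authors": {
      "properties": {
        "full_name": {
          "type": "string",
          "fields": {
            "raw": { "type": "string", "index": "not_analyzed" }
          }
        }
      }
    }
  }
};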
For Elasticsearch 1.0 and later, you can leverage the terms aggregation to do this.
Query DSL:
{
"aggs": {
"NAME": {
"terms": {
"field": "",
"size": 10
}
}
}
}
A real example:
{
"aggs": {
"full_name": {
"terms": {
"field": "authors",
"size": 0
}
}
}
}
Then you can get all unique values of the authors field.
size=0 means no limit on the number of terms (this requires ES 1.1.0 or later).
Response:
{
...
"aggregations" : {
"full_name" : {
"buckets" : [
{
"key" : "Ken",
"doc_count" : 10
},
{
"key" : "Jim Gray",
"doc_count" : 10
},
]
}
}
}
See Elasticsearch terms aggregations.
Intuition:
In SQL parlance:
Select distinct full_name from authors;
is equivalent to
Select full_name from authors group by full_name;
So, we can use the grouping/aggregate syntax in ElasticSearch to find distinct entries.
Assume the following is the structure stored in Elasticsearch:
[{
"author": "Brian Kernighan"
},
{
"author": "Charles Dickens"
}]
What did not work: Plain aggregation
{
"aggs": {
"full_name": {
"terms": {
"field": "author"
}
}
}
}
I got the following error:
{
"error": {
"root_cause": [
{
"reason": "Fielddata is disabled on text fields by default...",
"type": "illegal_argument_exception"
}
]
}
}
What worked like a charm: appending .keyword to the field
{
"aggs": {
"full_name": {
"terms": {
"field": "author.keyword"
}
}
}
}
And the sample output could be:
{
"aggregations": {
"full_name": {
"buckets": [
{
"doc_count": 372,
"key": "Charles Dickens"
},
{
"doc_count": 283,
"key": "Brian Kernighan"
}
],
"doc_count": 1000
}
}
}
Bonus tip:
Let us assume the field in question is nested as follows:
[{
"authors": [{
"details": [{
"name": "Brian Kernighan"
}]
}]
},
{
"authors": [{
"details": [{
"name": "Charles Dickens"
}]
}]
}
]
Now the correct query becomes:
{
"aggregations": {
"full_name": {
"aggregations": {
"author_details": {
"terms": {
"field": "authors.details.name"
}
}
},
"nested": {
"path": "authors.details"
}
}
},
"size": 0
}
Working for Elasticsearch 5.2.2
curl -XGET http://localhost:9200/articles/_search?pretty -d '
{
"aggs" : {
"whatever" : {
"terms" : { "field" : "yourfield", "size":10000 }
}
},
"size" : 0
}'
The "size":10000 means get (at most) 10000 unique values. Without this, if you have more than 10 unique values, only 10 values are returned.
The "size":0 means that in result, "hits" will contain no documents. By default, 10 documents are returned, which we don't need.
Reference: bucket terms aggregation
Also note, according to this page, facets have been replaced by aggregations in Elasticsearch 1.0, which are a superset of facets.
The existing answers did not work for me in Elasticsearch 5.X, for the following reasons:
I needed to tokenize my input while indexing.
"size": 0 failed to parse because "[size] must be greater than 0."
"Fielddata is disabled on text fields by default." This means you cannot aggregate on the full_name text field by default; however, an unanalyzed keyword sub-field can be used for aggregations.
Solution 1: use the Scroll API. It works by keeping a search context and making multiple requests, each time returning subsequent batches of results. If you are using Python, the elasticsearch module has the scan() helper function to handle scrolling for you and return all results.
Solution 2: use the Search After API. It is similar to Scroll, but provides a live cursor instead of keeping a search context. Thus it is more efficient for real-time requests.
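For reference, a rough Node.js sketch of Solution 1 (the legacy elasticsearch client, the index name, and the field name are assumptions; the deduplication here happens client-side):

// Sketch: page through all documents with the Scroll API and collect
// the unique full_name values in application code.
const elasticsearch = require('elasticsearch');
const client = new elasticsearch.Client({ host: 'localhost:9200' });

async function uniqueFullNames() {
  const unique = new Set();

  let response = await client.search({
    index: 'authors',
    scroll: '30s',
    size: 1000,
    body: { query: { match_all: {} }, _source: ['full_name'] }
  });

  while (response.hits.hits.length > 0) {
    for (const hit of response.hits.hits) {
      unique.add(hit._source.full_name);
    }
    // Fetch the next batch using the scroll cursor.
    response = await client.scroll({ scrollId: response._scroll_id, scroll: '30s' });
  }

  await client.clearScroll({ scrollId: response._scroll_id });
  return [...unique];
}

uniqueFullNames().then(names => console.log(names)).catch(console.error);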
