How does data retrieval happen in KairosDB? (Cassandra)

I have data like below:
data = [
{
"name": "test4",
"datapoints": [
[currentTimestamp, count]
],
"tags": {
"name" : "MyName",
"dept" : "Engineering",
"city" : "Delhi",
"state": "Delhi",
"country" : "India"
}
}
]
And I am sending the data to the KairosDB server using a Python script like this:
response = requests.post("http://localhost:8080" + "/api/v1/datapoints", data=json.dumps(data))
I know this data will be stored in three different tables:
1. string_index
2. row_keys
3. data_points
And my query is:
{
"metrics": [
{
"tags": {},
"name": "test4",
"aggregators": [
{
"name": "sum",
"sampling": {
"value": "1",
"unit": "milliseconds"
}
}
]
}
],
"plugins": [],
"cache_time": 0,
"start_absolute": 1529346600000
}
Now I want to know how the data gets fetched from those three tables, i.e. what the flow of data retrieval from Cassandra is.
Thanks in advance.
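As far as I understand KairosDB's Cassandra schema (this is an outline, not from the thread): a query first consults the row_keys table to find the row keys whose metric name, time range and tags match, then reads those partitions from data_points; string_index mainly serves the metric-name and tag listing endpoints rather than this query path. The query itself is POSTed to the /api/v1/datapoints/query endpoint. A minimal sketch of issuing it from Python (run_query is a hypothetical helper name):

```python
import json

# The same query from the question, built as a Python dict.
query = {
    "metrics": [
        {
            "tags": {},
            "name": "test4",
            "aggregators": [
                {"name": "sum",
                 "sampling": {"value": "1", "unit": "milliseconds"}}
            ],
        }
    ],
    "plugins": [],
    "cache_time": 0,
    "start_absolute": 1529346600000,
}

def run_query(base_url, query):
    """POST the query to KairosDB's query endpoint and return parsed JSON."""
    import requests  # third-party; same library the question already uses
    response = requests.post(base_url + "/api/v1/datapoints/query",
                             data=json.dumps(query))
    response.raise_for_status()
    return response.json()

# e.g. run_query("http://localhost:8080", query)
payload = json.dumps(query)
```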

Related

API Request within another API request (Same API) in Python

I have made a Python program that requests JSON data from an API. The thing is, this JSON actually contains further request URLs used to get extra data for each object.
import os
import json
import requests

response = requests.get("http://api.gipod.vlaanderen.be/ws/v1/workassignment", params={"CRS": "Lambert72"})
print(response.status_code)
text = json.dumps(response.json(), sort_keys=True, indent=4)
print(text)
with open("text.json", "wt") as f:
    f.write(text)
print(os.getcwd())
JSON from the request; the other request URLs, including parameters, are in the detail field.
[
{
"gipodId": 103246,
"owner": "Eandis Leuven",
"description": ", , ZAVELSTRAAT: E Nieuw distributienet (1214m)",
"startDateTime": "2007-12-03T06:00:00",
"endDateTime": "2014-01-06T19:00:00",
"importantHindrance": false,
"coordinate": {
"coordinates": [
4.697028256276443,
50.896894135898485
],
"type": "Point",
"crs": {
"type": "name",
"properties": {
"name": "urn:ogc:def:crs:OGC:1.3:CRS84"
}
}
},
"detail": "http://api.gipod.vlaanderen.be/ws/v1/workassignment/103246?crs=4326",
"cities": ["Leuven"]
}
],
"latestUpdate": "2016-11-16T11:32:39.253"
}
The first request just gets the points (each unique with a certain id), while the second request gets the "details data", which also has polygon and multiline data.
Get Url:
http://api.gipod.vlaanderen.be/ws/v1/workassignment/[id]
{ "comment" : null,
"contactDetails" : { "city" : "Leuven",
"country" : "België",
"email" : null,
"extraAddressInfo" : null,
"firstName" : null,
"lastName" : null,
"number" : "58",
"organisation" : "Eandis Leuven",
"phoneNumber1" : "078/35.35.34",
"phoneNumber2" : null,
"postalCode" : "3012",
"street" : "Aarschotsesteenweg"
},
"contractor" : null,
"mainContractor" : null,
"description" : ", , ZAVELSTRAAT: E Nieuw distributienet (1214m)",
"diversions" : [
{
"gipodId": 1348152,
"reference": "IOW-TERRAS-2013-01-Z",
"description": "Horecaterras op parkeerstrook (Lierbaan 12)",
"comment": null,
"geometry": {
"geometries": [
{
"coordinates": [[[3.212947654779088, 51.175784679668915],
[3.2151308569159482, 51.17366647833133],
[3.216112818368467, 51.17328051591839],
[3.2186926906668876, 51.173044950954456],
[3.2204789191276944, 51.173098278776514],
[3.221602856602255, 51.173333934695286]]],
"type": "MultiLineString",
"crs": null
}
],
"type": "GeometryCollection",
"crs": {
"type": "name",
"properties": {
"name": "urn:ogc:def:crs:OGC:1.3:CRS84"
}
}
},
"periods": [{"startDateTime": "2013-04-09T00:00:00","endDateTime": "2013-10-31T00:00:00"}],
"recurrencePattern": null,
"latestUpdate": "2014-01-24T10:23:08.917",
"streets": null,
"diversionTypes": null,
"diversionDirection":
{
"type": 0,
"description": "Beide"
},
"status": "Vergund",
"contactDetails": {
"organisation": "Café Real",
"lastName": "Vets",
"firstName": "Peggy",
"phoneNumber1": null,
"phoneNumber2": null,
"email": "peggy.vets#skynet.be",
"street": "Lierbaan",
"number": "12",
"postalCode": "2580",
"city": "Putte",
"country": "België",
"extraAddressInfo": null
},
"url": null
}
],
"endDateTime" : "2014-01-06T19:00:00",
"gipodId" : 103246,
"hindrance" : { "description" : null,
"direction" : null,
"effects" : [ "Fietsers hebben doorgang",
"Handelaars bereikbaar",
"Verminderde doorstroming in 1 richting",
"Voetgangers op de rijweg",
"Voetgangers hebben doorgang"
],
"important" : false,
"locations" : [ "Voetpad" ]
},
"latestUpdate" : "2013-06-18T03:43:28.17",
"location" : { "cities" : [ "Leuven" ],
"coordinate" : { "coordinates" : [ 4.697028256276443,
50.896894135898485
],
"crs" : { "properties" : { "name" : "urn:ogc:def:crs:OGC:1.3:CRS84" },
"type" : "name"
},
"type" : "Point"
},
"geometry" : { "coordinates" : [ [ [ [ 4.699934331336474,
50.90431808607037
],
[ 4.699948535632464,
50.90431829749237
],
[ 4.699938837004092,
50.90458139231922
],
[ 4.6999246328435396,
50.90458118062111
],
[ 4.699934331336474,
50.90431808607037
]
] ]
],
"crs" : { "properties" : { "name" : "urn:ogc:def:crs:OGC:1.3:CRS84" },
"type" : "name"
},
"type" : "MultiPolygon"
}
},
"owner" : "Eandis Leuven",
"reference" : "171577",
"startDateTime" : "2007-12-03T06:00:00",
"state" : "In uitvoering",
"type" : "Werken aan nutsleiding",
"url" : "http://www.eandis.be"
}
Now here is the deal: this request has to be repeated for each object I get from the first API request, and this can be over one hundred objects. So logic dictates this has to happen in a loop, though how to start is a bit troublesome.
You can make use of functions in this case.
Your first function can fetch the list of points; your second function can fetch the detail data.
import requests

def fetch_details(url: str):
    """Makes a request call to get the detail data."""
    response = requests.get(url)
    # any other processing of the detail JSON goes here
    return response.json()

def fetch_points(url: str):
    response = requests.get(url)
    for obj in response.json():
        fetch_details(obj.get("detail"))

api_url = "http://api.gipod.vlaanderen.be/ws/v1/workassignment"
fetch_points(api_url)

DynamoDB: fetching data from an inner array

Here is my data in DynamoDB:
a: {
"datetime": 'a',
"description": "Ford Car",
"offers":
[
{
"countries": [
{
"code": "ALL",
"name": "Global"
}
],
[
{
"code": "As",
"name": "Private"
}
]
}
]
}
I want to fetch all the offers that contain "code": "ALL".
I tried several things and went through several solutions on Stack Overflow.
Code I tried:
FilterExpression = "contains(offers.countries, :code)";
ExpressionAttributeValues = {
    ":code": {
        "countries": {
            "code": "ALL",
            "name": "Global"
        }
    }
};
This is returning an empty array.
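One likely reason for the empty result: offers is a list, so the document path offers.countries never resolves (DynamoDB paths into lists need an index, e.g. offers[0].countries), and contains() on a list only matches a complete element exactly. A common workaround is to filter client-side after fetching the item; a minimal sketch in plain Python, assuming an item shaped like the one above (with the stray inner list folded into "countries" so the structure is valid):

```python
# Item shaped like the one in the question, with a valid structure.
item = {
    "datetime": "a",
    "description": "Ford Car",
    "offers": [
        {
            "countries": [
                {"code": "ALL", "name": "Global"},
                {"code": "As", "name": "Private"},
            ]
        }
    ],
}

def offers_with_code(item, code):
    """Return every offer whose countries list contains the given code."""
    return [
        offer
        for offer in item.get("offers", [])
        if any(c.get("code") == code for c in offer.get("countries", []))
    ]

matches = offers_with_code(item, "ALL")
```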

MongoDB create product summary collection

Say I have a product collection like this:
{
"_id": "5a74784a8145fa1368905373",
"name": "This is my first product",
"description": "This is the description of my first product",
"category": "34/73/80",
"condition": "New",
"images": [
{
"length": 1000,
"width": 1000,
"src": "products/images/firstproduct_image1.jpg"
},
...
],
"attributes": [
{
"name": "Material",
"value": "Synthetic"
},
...
],
"variation": {
"attributes": [
{
"name": "Color",
"values": ["Black", "White"]
},
{
"name": "Size",
"values": ["S", "M", "L"]
}
]
}
}
and a variation collection like this:
{
"_id": "5a748766f5eef50e10bc98a8",
"name": "color:black,size:s",
"productID": "5a74784a8145fa1368905373",
"condition": "New",
"price": 1000,
"sale": null,
"image": [
{
"length": 1000,
"width": 1000,
"src": "products/images/firstvariation_image1.jpg"
}
],
"attributes": [
{
"name": "Color",
"value": "Black"
},
{
"name": "Size",
"value": "S"
}
]
}
I want to keep the documents separate. For the purpose of easy browsing, searching and faceted-search implementation, I want to fetch all the data in a single query, but I don't want to do the join in my application code.
I know it's achievable using a third collection called summary that might look like this:
{
"_id": "5a74875fa1368905373",
"name": "This is my first product",
"category": "34/73/80",
"condition": "New",
"price": 1000,
"sale": null,
"description": "This is the description of my first product",
"images": [
{
"length": 1000,
"width": 1000,
"src": "products/images/firstproduct_image1.jpg"
},
...
],
"attributes": [
{
"name": "Material",
"value": "Synthetic"
},
...
],
"variations": [
{
"condition": "New",
"price": 1000,
"sale": null,
"image": [
{
"length": 1000,
"width": 1000,
"src": "products/images/firstvariation_image.jpg"
}
],
"attributes": [
"color=black",
"size=s"
]
},
...
]
}
The problem is, I don't know how to keep the summary collection in sync with the product and variation collections. I know it can be done using mongo-connector, but I'm not sure how to implement it.
Please help me; I'm still a beginner programmer.
You don't actually need to maintain a summary collection; it's redundant to store the product and variation summary in another collection.
Instead, you can use an aggregation pipeline with $lookup to outer-join product and variation using productID.
Aggregation pipeline:
db.products.aggregate(
[
{
$lookup : {
from : "variation",
localField : "_id",
foreignField : "productID",
as : "variations"
}
}
]
).pretty()
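For reference, the join that $lookup performs can be sketched in plain Python (an in-memory simulation with abbreviated documents, not pymongo):

```python
# Abbreviated documents from the question.
products = [{"_id": "5a74784a8145fa1368905373",
             "name": "This is my first product"}]
variations = [{"_id": "5a748766f5eef50e10bc98a8",
               "productID": "5a74784a8145fa1368905373",
               "price": 1000}]

def lookup(products, variations):
    """Mimic $lookup: left outer join variations onto products,
    matching localField _id against foreignField productID."""
    joined = []
    for product in products:
        doc = dict(product)
        doc["variations"] = [v for v in variations
                             if v["productID"] == product["_id"]]
        joined.append(doc)
    return joined

summary = lookup(products, variations)
```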

How to create a query with geolocation in Mongoose (for searching nearby places)

I'm quite new to MongoDB and Mongoose. I don't know if my query is working, but when I add some GeoJSON to my code it returns null.
My goal is to filter my data using city, state and country, and also to search nearby places. It would be a really great help if someone could help me. Thanks.
var query = {
$and : [
{city : new RegExp('^'+req.body.city+'$', "i") },
{state : req.body.state},
{country : req.body.country},
{
loc : {
$nearSphere : {
$geometry : {
type : "Point",
coordinates : [-117.16108380000003,32.715738]
},
$maxDistance : 100
}
}
}
]
};
Business.find(query).populate('deal_id').sort({business_type : -1,deal_id : -1})
.exec(function(err,businesses){
res.json(businesses)
return
})
I don't know if I'm doing it right; here's my sample data:
[
{
"_id": "5a0b1f489929442c36fd5c83",
"business_row": 29160,
"created_at": "2017-11-14T16:52:10.130Z",
"owner_name": "David Lui",
"company_website": "",
"phone_number": "604-273-3288",
"contact_name": "David Lui",
"zip_postal": "V6X 3Z9",
"state": "British Columbia",
"country": "Canada",
"city": "Richmond",
"address": "3779 Sexsmith Rd # 2172 Richmond British Columbia",
"company_name": "Aem Seafood",
"__v": 1,
"slug": "Aem-Seafood&Richmond",
"loc": {
"coordinates": [
"-123.129488",
"49.185359"
],
"type": "Point"
},
"deal_id": [],
"is_favorite": false,
"is_draft": false,
"has_featured": false,
"owner_id": [
"5a0adcf9f7205f0004535def"
],
"files": [],
"operations": [],
"sub_category": [],
"category_options": [
{
"value": "5a0b186b9f3a4a2710075654",
"sub_cat": {
"value": "59f6d13d00086a6e645c50a4",
"label": "Meat And Fish Markets"
}
}
],
"category_id": [
"5a0b186b9f3a4a2710075654"
],
"business_type_name": "Free",
"business_type": "0",
"user_id": [
"5a0adcf9f7205f0004535def"
]
}
]
Turns out I didn't need to query the city, state and country for it, and used $geoWithin instead.
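Two things worth noting about the sample data: the loc.coordinates values are strings, and MongoDB's 2dsphere queries need numbers, so the $nearSphere clause alone could make the query return nothing; also, with a GeoJSON $geometry, $maxDistance is in meters. What the nearby-places filter computes can be sketched with a haversine distance in plain Python (illustrative only, not the server's exact algorithm):

```python
import math

def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two lon/lat points."""
    r = 6371000  # mean Earth radius, meters
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

# Coordinates from the sample document are strings; they must be
# converted to floats (and stored as numbers for Mongo geo queries).
business = {"loc": {"coordinates": ["-123.129488", "49.185359"]}}
lon, lat = (float(c) for c in business["loc"]["coordinates"])

# Query point from the question; $maxDistance of 100 means 100 meters.
near = haversine_m(-117.16108380000003, 32.715738, lon, lat) <= 100
```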

Speeding up Cloudant query for type text index

We have a table with this type of structure:
{_id: 15_0, createdAt: 1/1/1, task_id: [16_0, 17_0, 18_0], table: "details", a: b, c: d, more}
We created indexes using
{
"index": {},
"name": "paginationQueryIndex",
"type": "text"
}
It auto-created:
{
"ddoc": "_design/28e8db44a5a0862xxx",
"name": "paginationQueryIndex",
"type": "text",
"def": {
"default_analyzer": "keyword",
"default_field": {
},
"selector": {
},
"fields": [
],
"index_array_lengths": true
}
}
We are using the following query
{
"selector": {
"createdAt": { "$gt": 0 },
"task_id": { "$in": [ "18_0" ] },
"table": "details"
},
"sort": [ { "createdAt": "desc" } ],
"limit": 20
}
It takes 700-800 ms the first time; after that it decreases to 500-600 ms.
Why does it take longer the first time?
Any way to speed up the query?
Any way to add indexes to specific fields if the type is "text"? (instead of indexing all the fields in these records)
You could try creating the index more explicitly, defining the type of each field you wish to index, e.g.:
{
"index": {
"fields": [
{
"name": "createdAt",
"type": "string"
},
{
"name": "task_id",
"type": "string"
},
{
"name": "table",
"type": "string"
}
]
},
"name": "myindex",
"type": "text"
}
Then your query becomes:
{
"selector": {
"createdAt": { "$gt": "1970/01/01" },
"task_id": { "$in": [ "18_0" ] },
"table": "details"
},
"sort": [ { "createdAt": "desc" } ],
"limit": 20
}
Notice that I used strings where the data type is a string.
If you're interested in performance, try removing clauses from your query one at a time to see if one is causing the performance problem. You can also look at the explanation of your query to see if it is using your index correctly.
Documentation on creating an explicit text query index is in the Cloudant Query documentation.
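For completeness, Cloudant Query indexes are created by POSTing to the database's /_index endpoint, and queries to /_find. A minimal sketch building both payloads in Python (the db_url placeholder and the actual network call, e.g. requests.post(..., json=...), are left out):

```python
import json

db_url = "https://<account>.cloudant.com/<db>"  # placeholder

# Index definition from the answer: index only the three queried fields.
index_doc = {
    "index": {
        "fields": [
            {"name": "createdAt", "type": "string"},
            {"name": "task_id", "type": "string"},
            {"name": "table", "type": "string"},
        ]
    },
    "name": "myindex",
    "type": "text",
}

# The adjusted query: string comparison for createdAt, limit 20.
query = {
    "selector": {
        "createdAt": {"$gt": "1970/01/01"},
        "task_id": {"$in": ["18_0"]},
        "table": "details",
    },
    "sort": [{"createdAt": "desc"}],
    "limit": 20,
}

# POST index_doc to db_url + "/_index" and query to db_url + "/_find".
index_payload = json.dumps(index_doc)
query_payload = json.dumps(query)
```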
