Cassandra indexes vs materialized view - cassandra

I have next Cassandra table structure:
CREATE TABLE ringostat.hits (
hitId uuid,
clientId VARCHAR,
session MAP<VARCHAR, TEXT>,
traffic MAP<VARCHAR, TEXT>,
PRIMARY KEY (hitId, clientId)
);
INSERT INTO ringostat.hits (hitId, clientId, session, traffic)
VALUES('550e8400-e29b-41d4-a716-446655440000'. 'clientId', {'id': '1', 'number': '1', 'startTime': '1460023732', 'endTime': '1460023762'}, {'referralPath': '/example_path_for_example', 'campaign': '(not set)', 'source': 'www.google.com', 'medium': 'referal', 'keyword': '(not set)', 'adContent': '(not set)', 'campaignId': '', 'gclid': '', 'yclid': ''});
INSERT INTO ringostat.hits (hitId, clientId, session, traffic)
VALUES('650e8400-e29b-41d4-a716-446655440000'. 'clientId', {'id': '1', 'number': '1', 'startTime': '1460023732', 'endTime': '1460023762'}, {'referralPath': '/example_path_for_example', 'campaign': '(not set)', 'source': 'www.google.com', 'medium': 'cpc', 'keyword': '(not set)', 'adContent': '(not set)', 'campaignId': '', 'gclid': '', 'yclid': ''});
INSERT INTO ringostat.hits (hitId, clientId, session, traffic)
VALUES('750e8400-e29b-41d4-a716-446655440000'. 'clientId', {'id': '1', 'number': '1', 'startTime': '1460023732', 'endTime': '1460023762'}, {'referralPath': '/example_path_for_example', 'campaign': '(not set)', 'source': 'www.google.com', 'medium': 'referal', 'keyword': '(not set)', 'adContent': '(not set)', 'campaignId': '', 'gclid': '', 'yclid': ''});
I want to select all rows where source='www.google.com' AND medium='referal'.
SELECT * FROM hits WHERE traffic['source'] = 'www.google.com' AND traffic['medium'] = 'referal' ALLOW FILTERING;
Without add ALLOW FILTERING I got error: No supported secondary index found for the non primary key columns restrictions.
That's why I see two options:
Create index on traffic column.
Create materialized view.
Create another table and set INDEX for traffic column.
Which is the best option ? Also, I have many fields with MAP type on which I will need to filter. What issues can be if on every field I will add INDEX ?
Thank You.

From When to use an index.
Do not use an index in these situations:
On high-cardinality columns because you then query a huge volume of records for a small number of results. [...] Conversely, creating an index on an extremely low-cardinality column, such as a boolean column, does not make sense.
In tables that use a counter column
On a frequently updated or deleted column.
To look for a row in a large partition unless narrowly queried.
If your planned usage meets one or more of these criteria, it is probably better to use a materialized view.

Related

Unable to get correct delivery fee in Foodpanda via scrapy

I am getting delivery fee but for few restaurants, its slightly different than the one on website, say its $2.99 on website but spider is getting $3.09. I have checked latitude and longitude but its same in the url and spider. I have changed multiple values in headers and it changed the delivery fee for few restaurants but not for all. This is the sample url: https://www.foodpanda.sg/restaurants/new?lat=1.43035&lng=103.83684&vertical=restaurants and below is the code snippet:
api_url = 'https://disco.deliveryhero.io/listing/api/v1/pandora/vendors'
form_data = {
'country': 'sg',
'latitude': str(lat),
'longitude': str(lng),
'sort': '',
'language_id': str(self.language_id),
'vertical': 'restaurants',
'include': 'characteristics',
'dynamic_pricing': '0',
'configuration': 'Variant1',
'use_free_delivery_label': 'true',
'limit': '50',
'offset': '0'
}
headers = {
'X-FP-API-KEY': 'volo',
'dps-session-id': dps_session_id,
'x-disco-client-id': 'web',
}
return scrapy.FormRequest(
api_url,
method='GET',
headers=headers,
formdata=form_data,
meta={
'page': page,
'lat': lat,
'lng': lng,
'item_meta': item_meta,
'start_position': start_position
},
callback=self.parse_search
)
Any clue or insight is appreciated.

Unique nested dictionary from the 'for' loop Python3

I've got the host which executes commands via 'subprocess' and gets output list of several parameters. The problem is that output can not be correctly modified to be translated to the dictionary, whether it's yaml or json. After list is received Regexp takes part to match valuable information and to perform grouping. I am interested in getting a unique dictionary, where crossing keys are put into nested dictionary.
Here is the code and the example of output list:
from re import compile,match
# Output can differ from request to request, the "keys" from the #
# list_of_values can dublicate or appear more than two times. The values #
# mapped to the keys can differ too. #
list_of_values = [
"paramId: '11'", "valueId*: '11'",
"elementId: '010_541'", 'mappingType: Both',
"startRng: ''", "finishRng: ''",
'DbType: sql', "activeSt: 'false'",
'profile: TestPr1', "specificHost: ''",
'hostGroup: tstGroup10', 'balance: all',
"paramId: '194'", "valueId*: '194'",
"elementId: '010_541'", 'mappingType: Both',
"startRng: '1020304050'", "finishRng: '1020304050'",
'DbType: sql', "activeSt: 'true'",
'profile: TestPr1', "specificHost: ''",
'hostGroup: tstGroup10', 'balance: all']
re_compile_valueId = compile(
"valueId\*:\s.(?P<valueId>\d{1,5})"
"|elementId:\s.(?P<elementId>\d{3}_\d{2,3})"
"|startRng:\s.(?P<startRng>\d{1,10})"
"|finishRng:\s.(?P<finishRng>\d{1,10})"
"|DbType:\s(?P<DbType>nosql|sql)"
"|activeSt:\s.(?P<activeSt>true|false)"
"|profile:\s(?P<profile>[A-z0-9]+)"
"|hostGroup:\s(?P<hostGroup>[A-z0-9]+)"
"|balance:\s(?P<balance>none|all|priority group)"
)
iterator_loop = 0
uniq_dict = dict()
next_dict = dict()
for element in list_of_values:
match_result = match(re_compile_valueId,element)
if match_result:
temp_dict = match_result.groupdict()
for key, value in temp_dict.items():
if value:
if key == 'valueId':
uniq_dict['valueId'+str(iterator_loop)] = ''
iterator_loop +=1
next_dict.update({key: value})
else:
next_dict.update({key: value})
uniq_dict['valueId'+str(iterator_loop-1)] = next_dict
print(uniq_dict)
This code right here responses with:
{
'valueId0':
{
'valueId': '194',
'elementId': '010_541',
'DbType': 'sql',
'activeSt': 'true',
'profile': 'TestPr1',
'hostGroup': 'tstGroup10',
'balance': 'all',
'startRng': '1020304050',
'finishRng': '1020304050'
},
'valueId1':
{
'valueId': '194',
'elementId': '010_541',
'DbType': 'sql',
'activeSt': 'true',
'profile': 'TestPr1',
'hostGroup': 'tstGroup10',
'balance': 'all',
'startRng': '1020304050',
'finishRng': '1020304050'
}
}
And I was waiting for something like:
{
'valueId0':
{
'valueId': '11',
'elementId': '010_541',
'DbType': 'sql',
'activeSt': 'false',
'profile': 'TestPr1',
'hostGroup': 'tstGroup10',
'balance': 'all',
'startRng': '',
'finishRng': ''
},
'valueId1':
{
'valueId': '194',
'elementId': '010_541',
'DbType': 'sql',
'activeSt': 'true',
'profile': 'TestPr1',
'hostGroup': 'tstGroup10',
'balance': 'all',
'startRng': '1020304050',
'finishRng': '1020304050'
}
}
I've also got another code below, which runs and puts values as expected. But the structure breaks the idea of having this all looped around, because each dictionary result key has its own order number mapped. The example below. The list_of_values and re_compile_valueId can be used from previous example.
for element in list_of_values:
match_result = match(re_compile_valueId,element)
if match_result:
temp_dict = match_result.groupdict()
for key, value in temp_dict.items():
if value:
if key == 'balance':
key = key+str(iterator_loop)
uniq_dict.update({key: value})
iterator_loop +=1
else:
key = key+str(iterator_loop)
uniq_dict.update({key: value})
print(uniq_dict)
The output will look like:
{
'valueId1': '11', 'elementId1': '010_541',
'DbType1': 'sql', 'activeSt1': 'false',
'profile1': 'TestPr1', 'hostGroup1': 'tstGroup10',
'balance1': 'all', 'valueId2': '194',
'elementId2': '010_541', 'startRng2': '1020304050',
'finishRng2': '1020304050', 'DbType2': 'sql',
'activeSt2': 'true', 'profile2': 'TestPr1',
'hostGroup2': 'tstGroup10', 'balance2': 'all'
}
Would appreciate any help! Thanks!
It appeared that some documentation reading needs to be performed :D
The copy() of the next_dict under else statement needs to be applied. Thanks to thread:
Why does updating one dictionary object affect other?
Many thanks to answer's author #thefourtheye (https://stackoverflow.com/users/1903116/thefourtheye)
The final code:
for element in list_of_values:
match_result = match(re_compile_valueId,element)
if match_result:
temp_dict = match_result.groupdict()
for key, value in temp_dict.items():
if value:
if key == 'valueId':
uniq_dict['valueId'+str(iterator_loop)] = ''
iterator_loop +=1
next_dict.update({key: value})
else:
next_dict.update({key: value})
uniq_dict['valueId'+str(iterator_loop-1)] = next_dict.copy()
Thanks for the involvement to everyone.

Pulling item properties from Microsoft Sharepoint document library with Microsoft Graph API

I'm able to successfully pull file metadata from my SharePoint library with the Microsoft Graph API, but am having trouble pulling the properties of an item:
I can get a partial list of properties using this endpoint:
https://graph.microsoft.com/v1.0/sites/{site-id}/drives/{}/items/{}/children?$expand=listItem($expand=fields)
But the list that comes from this endpoint doesn't match the list of properties that exists on the item.
For example, below is a list of fields that come from that endpoint - you can see that '.Push Too Salsify.' (one of the fields I need) is not present. There are also other fields that exist but don't appear in the item properties:
{'ParentLeafNameLookupId': '466', 'CLIPPING_x0020_STATUS': 'Not Started', 'Edit': '0', 'EditorLookupId': '67', '_ComplianceTagWrittenTime': '', 'RequiredField': 'teams/WORKFLOWDEMO/Shared Documents/1062CQP6.Phase4/1062CQP-Phase4-Size.tif', 'PM_x0020_SIGN_x0020_OFF': 'No', 'QA_x0020_APPROVED': 'No', 'ImageWidth': 3648, 'PM_x0020_Approval_x0020_Status': '-', 'AuthorLookupId': '6', 'SelectedFlag': '0', 'NameOrTitle': '1062CQP-Phase4-Size.tif', 'ItemChildCount': '0', 'FolderChildCount': '0', 'LinkFilename': '1062CQP-Phase4-Size.tif', 'ParentVersionStringLookupId': '466', 'PHOTOSTATUS': 'Not Started', '#odata.etag': '"c4b7516e-64df-46d2-b916-a1ee6f29d24a,8"', 'Thumbnail': '3648', '_x002e_Approval_x0020_Status_x002e_': 'Approved', 'Date_x0020_Created': '2019-10-09T04:25:40Z', '_CommentCount': '', 'Created': '2019-10-09T04:25:33Z', 'PreviewOnForm': '0', '_ComplianceTag': '', 'FileLeafRef': '1062CQP-Phase4-Size.tif', 'ImageHeight': 3648, 'LinkFilenameNoMenu': '1062CQP-Phase4-Size.tif', '_ComplianceFlags': '', 'ContentType': 'Document', 'Preview': '3648', 'ImageSize': '3648', 'Product_x0020_Category': 'Baseball', 'DATE_x0020_ASSIGNED': '2019-10-09T04:25:40Z', 'DateCreated': '2019-10-09T04:25:40Z', 'WORKFLOW_x0020_SELECTION': ['Select'], 'Predecessors': [], 'FileType': 'tif', 'LEGAL_x0020_APPROVED': 'No', 'PUSH_x0020_READY': False, 'FileSizeDisplay': '74966432', 'id': '466', '_LikeCount': '', '_ComplianceTagUserId': '', 'Modified': '2019-10-09T14:41:25Z', 'DocIcon': 'tif', '_UIVersionString': '0.7', '_CheckinComment': ''}
Any help would be greatly appreciated. I've scoured the documentation and can't seem to find the correct endpoint to pull item properties from a Sharepoint DriveItem.

Python nested json to csv

I can't convert this Json to csv. I have been trying with different solutions posted here using panda or other parser but non solved this.
This is a small extract of the big json
{'data': {'items': [{'category': 'cat',
'coupon_code': 'cupon 1',
'coupon_name': '$829.99/€705.79 ',
'coupon_url': 'link3',
'end_time': '2017-12-31 00:00:00',
'language': 'sp',
'start_time': '2017-12-19 00:00:00'},
{'category': 'LED ',
'coupon_code': 'code',
'coupon_name': 'text',
'coupon_url': 'link',
'end_time': '2018-01-31 00:00:00',
'language': 'sp',
'start_time': '2017-10-07 00:00:00'}],
'total_pages': 1,
'total_results': 137},
'error_no': 0,
'msg': '',
'request': 'GET api/ #2017-12-26 04:50:02'}
I'd like to get an output like this with the columns:
category, coupon_code, coupon_name, coupon_url, end_time, language, start_time
I'm running python 3.6 with no restrictions.

Cassandra query getting error

Friends I am getting this error when executing the following query : syntax error near unexpected token `('
Help me...
INSERT INTO table_name(account_sid, transcription_id, audio_url, call_sid, callback_method, callback_url, cost, duration, recording_sid, status, transcription_change_date, transcription_date, transcription_sid, transcription_text, type) VALUES ('534534534534535', now(), '', '', '', '','', '', '', '', '', '', '', '', '');
Type is a keyword in CQL, hence it is failing.
You can use double quotes to escape.
INSERT INTO table_name(account_sid, transcription_id, audio_url, call_sid, callback_method, callback_url, cost, duration, recording_sid, status, transcription_change_date, transcription_date, transcription_sid, transcription_text, "type") VALUES ('534534534534535', now(), '', '', '', '','', '', '', '', '', '', '', '', '');
List of keywords in CQL

Resources