Updated data still persists in CQL table - Cassandra

I created a table with a SET column using CQL:
CREATE TABLE z_test.activity_follow (
    activity_type_id text,
    error_message text,
    error_type text,
    file text,
    line_no text,
    project_api_key text,
    project_name text,
    project_party_id text,
    release_stage_id text,
    stage_name text,
    project_type_name text,
    activity_type_name text,
    account_id text,
    created_at text,
    secure_url text,
    error_count text,
    user_id set<text>,
    PRIMARY KEY (activity_type_id, error_message, error_type, file, line_no, project_api_key, project_name, project_party_id, release_stage_id, stage_name, project_type_name, activity_type_name, account_id, created_at, secure_url)
);
Where z_test is my keyspace.
Then I inserted values into the table using the following queries:
UPDATE z_test.activity_follow SET user_id = user_id + {'46'} , error_count = '4'
WHERE activity_type_id = '1'
AND error_message = '1'
AND error_type = '1'
AND FILE = '1'
AND line_no = '1'
AND project_api_key = '1'
AND project_name = '1'
AND project_party_id = '1'
AND release_stage_id = '1'
AND stage_name = '1'
AND project_type_name = '1'
AND activity_type_name = '1'
AND account_id = '1'
AND secure_url = '1'
AND created_at = '1';
UPDATE z_test.activity_follow SET user_id = user_id + {'464'} , error_count = '4'
WHERE activity_type_id = '1'
AND error_message = '1'
AND error_type = '1'
AND FILE = '1'
AND line_no = '1'
AND project_api_key = '1'
AND project_name = '1'
AND project_party_id = '1'
AND release_stage_id = '1'
AND stage_name = '1'
AND project_type_name = '1'
AND activity_type_name = '1'
AND account_id = '1'
AND secure_url = '1'
AND created_at = '1';
The values were inserted successfully, and I used the following select statement:
SELECT * FROM z_test.activity_follow WHERE user_id CONTAINS '46';
And I got the following result:
activity_type_id | error_message | error_type | file | line_no | project_api_key | project_name | project_party_id | release_stage_id | stage_name | project_type_name | activity_type_name | account_id | created_at | secure_url | error_count | user_id
------------------+------------------------------------+----------------+--------------------------------------------------------------------+---------+--------------------------------------+--------------------------+------------------+------------------+-------------+-------------------+--------------------+------------+---------------------+-------------------------------------------------+-------------+---------
1 | alebvevcbvghhgrt123 is not defined | ReferenceError | http://localhost/ems-sdk/netspective_ems_js/example/automatic.html | 19 | 8aec5ce3-e924-3090-9bfe-57a440feba5f | Prescribewell-citrus-123 | 48 | 4 | Development | Php | exception | 47 | 2015-03-03 04:04:23 | PRE-EX-429c3daae9c108dffec32f113b9ca9cff1bb0468 | 1 | {'464'}
Then I removed one value from the set using:
UPDATE z_test.activity_follow SET user_id = user_id - {'46'} , error_count = '4'
WHERE activity_type_id = '1'
AND error_message = '1'
AND error_type = '1'
AND FILE = '1'
AND line_no = '1'
AND project_api_key = '1'
AND project_name = '1'
AND project_party_id = '1'
AND release_stage_id = '1'
AND stage_name = '1'
AND project_type_name = '1'
AND activity_type_name = '1'
AND account_id = '1'
AND secure_url = '1'
AND created_at = '1';
Now when I run the following query,
SELECT * FROM z_test.activity_follow WHERE user_id CONTAINS '46';
And it still returns the row,
activity_type_id | error_message | error_type | file | line_no | project_api_key | project_name | project_party_id | release_stage_id | stage_name | project_type_name | activity_type_name | account_id | created_at | secure_url | error_count | user_id
------------------+------------------------------------+----------------+--------------------------------------------------------------------+---------+--------------------------------------+--------------------------+------------------+------------------+-------------+-------------------+--------------------+------------+---------------------+-------------------------------------------------+-------------+---------
1 | alebvevcbvghhgrt123 is not defined | ReferenceError | http://localhost/ems-sdk/netspective_ems_js/example/automatic.html | 19 | 8aec5ce3-e924-3090-9bfe-57a440feba5f | Prescribewell-citrus-123 | 48 | 4 | Development | Php | exception | 47 | 2015-03-03 04:04:23 | PRE-EX-429c3daae9c108dffec32f113b9ca9cff1bb0468 | 1 | {'464'}
Why am I getting this behavior? Is it expected in CQL? If I can remove this, how? I have given every value as '1' for testing; I have also tried it with other values.

What client are you using to perform your CQL statements? Is this all done in cqlsh or something else?
This is just a shot-in-the-dark guess, but if you run two CQL statements matching the same primary key quickly after one another, it's possible that they are given the same writetime in Cassandra, which means one of the mutations will be ignored.
See: Cassandra: Writes after setting a column to null are lost randomly. Is this a bug, or am I doing something wrong?
If you are running Cassandra 2.1.2+, Cassandra will now break ties if there are writes/updates at the same millisecond (CASSANDRA-6123).
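If the same-writetime tie is the cause, one workaround is to supply your own timestamps. Below is a minimal sketch with the DataStax Python driver; the driver, the contact point, and the helper name are assumptions, not from the question. It attaches an explicit, increasing USING TIMESTAMP to each mutation so two updates issued within the same millisecond cannot end up with the same write time.
import time
from cassandra.cluster import Cluster  # assumed: DataStax Python driver

cluster = Cluster(['127.0.0.1'])    # assumed contact point
session = cluster.connect('z_test')

def remove_user(value):
    # Microseconds since epoch; successive calls normally get increasing
    # values, so the removal is not shadowed by an addition written in the
    # same millisecond.
    ts = int(time.time() * 1000000)
    session.execute(
        "UPDATE activity_follow USING TIMESTAMP %d "
        "SET user_id = user_id - {'%s'} "
        "WHERE activity_type_id = '1' AND error_message = '1' AND error_type = '1' "
        "AND file = '1' AND line_no = '1' AND project_api_key = '1' "
        "AND project_name = '1' AND project_party_id = '1' AND release_stage_id = '1' "
        "AND stage_name = '1' AND project_type_name = '1' AND activity_type_name = '1' "
        "AND account_id = '1' AND created_at = '1' AND secure_url = '1'"
        % (ts, value))

remove_user('46')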

Related

Trying to insert data into a table in PostgreSQL. Everything is working fine with no error, but there is no row visible in the actual table

I am trying to insert data into a table. It gives a "Failed to insert record unsupported format character 'd' (0x64)" error and no row is getting inserted into the table.
import psycopg2
try:
    i=0
    tempcol0="a"
    tempcol1="b"
    tempcol2="c"
    tempcol3="d"
    tempcol4="e"
    tempcol5="f"
    tempcol6="g"
    tempcol7="h"
    tempcol8="i"
    tempcol9="j"
    tempcol10="k"
    tempcol11="l"
    conn = psycopg2.connect(connection details)
    cur = conn.cursor()
    postgres_insert_query = '''INSERT INTO desktop1(id, column0, column1, column2, column3, column4,
    column5, column6, column7, column8, column9, column10, column11) VALUES(%d,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s);'''
    record_to_insert = (i, tempcol0, tempcol1, tempcol2, tempcol3, tempcol4, tempcol5, tempcol6, tempcol7, tempcol8, tempcol9, tempcol10, tempcol11)
    cur.execute(postgres_insert_query, record_to_insert)
    conn.commit()
    count = cur.rowcount
    print(count, " row inserted into table")
except (Exception, psycopg2.Error) as error:
    if conn:
        print("Failed to insert record ", error)
finally:
    if conn:
        cur.close()
        conn.close()
        print("PostgreSQL connection is closed")
This is a string interpolation issue. You are attempting to pass a numeric value with a '%d' placeholder, but psycopg2 only supports '%s' placeholders and handles the type conversion itself. Change it to '%s' and it will work.
import psycopg2
i=0
tempcol0="a"
tempcol1="b"
tempcol2="c"
tempcol3="c"
tempcol4="e"
tempcol5="f"
tempcol6="g"
tempcol7="h"
tempcol8="i"
tempcol9="j"
tempcol10="k"
tempcol11="l"
conn = psycopg2.connect('dbname=postgres user=postgres')
cur = conn.cursor()
postgres_insert_query = '''INSERT INTO test(id, tempcol0, tempcol1,tempcol2, tempcol3, tempcol4,tempcol5, tempcol6, tempcol7, tempcol8, tempcol9,tempcol10,tempcol11) VALUES(%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s)'''
record_to_insert = (i, tempcol0, tempcol1, tempcol2, tempcol3, tempcol4, tempcol5, tempcol6, tempcol7, tempcol8, tempcol9, tempcol10, tempcol11)
cur.execute(postgres_insert_query, record_to_insert)
conn.commit()
-bash-4.2$ psql
psql (12.3)
Type "help" for help.
postgres=# select * from test ;
id | tempcol0 | tempcol1 | tempcol2 | tempcol3 | tempcol4 | tempcol5 | tempcol6 | tempcol7 | tempcol8 | tempcol9 | tempcol10 | tempcol11
----+----------+----------+----------+----------+----------+----------+----------+----------+----------+----------+-----------+-----------
0 | a | b | c | c | e | f | g | h | i | j | k | l
(1 row)
postgres=#

Azure Application Insights query to get success rate

I have a question about how I could show a success rate on an Azure Dashboard.
If I have a single telemetry event that indicates success or failure, it's quite simple:
customEvents
| where name == "ResponseEvent" and customDimensions.Condition == "test"
| summarize count() by tostring(customDimensions.State) //State could be Success|Failure
| render piechart
But in my case I have two events, RequestEvent and SuccessResponseEvent, and from those two I want to get the success rate, something like successRate = 100*successCount/requestCount.
I ended up with this join:
customEvents
| where name == "RequestEvent" and customDimensions.Condition == "test"
| summarize requestCount = count()
| extend joinField = "1"
| join ( customEvents
| where name == "SuccessResponseEvent" and customDimensions.Condition == "test"
| summarize successCount = count()
| extend joinField = "1")
on joinField
| extend successRate = (100 * successCount / requestCount)
//////| extend failureRate = 100 - successRate
| project successRate
| render table
I got the value I need, but I only managed to display it as a table, while I need a piechart.
I thought about adding a union:
let success = view () { print x=toint(80) };
let failure = view () { print x=toint(20) };
union withsource=TableName success, failure
| render piechart
But I don't see how to do this in my query.
Or I could create variables with a let statement, try to calculate everything, and join using materialize(createRequestRecieved), but that causes quite a lot of errors and I hope some simpler way exists.
The question is: could somebody point me to how I could achieve this, i.e. calculate one value, maybe display it as two values (success and 100-success), and arrange them in a format valid for the "render piechart" operator?
And a second, less important question: could I join them by some existing field? When I try to use joinField = tostring(customDimensions.MappingField) I get an error: Ensure that expression: customDimensions.MappingField is indeed a simple name
If you are going for a piechart, it requires a string legend field and a value on each row for that legend, so a union of two results should work:
requests
| summarize Success = sumif(itemCount, success == true)
| project Legend = "Success", Value = Success
| union
(requests
| summarize Failed = sumif(itemCount, success == false)
| project Legend = "Failed", Value = Failed )
| render piechart
Going for a barchart would allow using both aggregations in one query without a join/union and may speed up performance:
requests
| summarize Success = sumif(itemCount, success == true), Failed = sumif(itemCount, success == false)
| project Legend = "Status", Success, Failed
| render barchart
Similarly, to calculate the rate in the same query:
requests
| summarize Success = sumif(itemCount, success == true), Failed = sumif(itemCount, success == false)
| extend SuccessRate = Success * 1.0 / (Success + Failed)
I'm quite sure it's not the best option and I'm missing something in this query language's capabilities, but I could put my query in a variable, apply some caching, and reuse it twice, I suppose:
let dataSource = customEvents
| where name == "RequestEvent" and customDimensions.Condition == "test"
| summarize requestCount = count()
| extend joinField = "1"
| join ( customEvents
| where name == "SuccessResponseEvent" and customDimensions.Condition == "test"
| summarize successCount = count()
| extend joinField = "1")
on joinField
| extend successRate = (100 * successCount / requestCount)
| extend failureRate = 100 - successRate;
let cacheddataSource = materialize(dataSource);
cacheddataSource
| project Legend = "Success", Value = successRate
| union (
cacheddataSource
|project Legend = "Failure", Value = failureRate
)
| render piechart
So, let and materialize more or less help; maybe some tweaks will be necessary to display the actual numbers of successes and failures.

How to count distinct elements over multiple columns and a rolling window in PySpark [duplicate]

This question already has answers here:
pyspark: count distinct over a window
(2 answers)
Closed 1 year ago.
Let's imagine we have the following dataframe :
port | flag | timestamp
---------------------------------------
20 | S | 2009-04-24T17:13:14+00:00
30 | R | 2009-04-24T17:14:14+00:00
32 | S | 2009-04-24T17:15:14+00:00
21 | R | 2009-04-24T17:16:14+00:00
54 | R | 2009-04-24T17:17:14+00:00
24 | R | 2009-04-24T17:18:14+00:00
I would like to calculate the number of distinct (port, flag) pairs over the previous 3 hours in PySpark.
The result will be something like:
port | flag | timestamp | distinct_port_flag_overs_3h
---------------------------------------
20 | S | 2009-04-24T17:13:14+00:00 | 1
30 | R | 2009-04-24T17:14:14+00:00 | 1
32 | S | 2009-04-24T17:15:14+00:00 | 2
21 | R | 2009-04-24T17:16:14+00:00 | 2
54 | R | 2009-04-24T17:17:14+00:00 | 2
24 | R | 2009-04-24T17:18:14+00:00 | 3
The SQL query looks like:
SELECT
COUNT(DISTINCT port) OVER my_window AS distinct_port_flag_overs_3h
FROM my_table
WINDOW my_window AS (
PARTITION BY flag
ORDER BY CAST(timestamp AS timestamp)
RANGE BETWEEN INTERVAL 3 HOUR PRECEDING AND CURRENT ROW
)
I found this topic that solves the problem, but only if we want to count distinct elements over a single field.
Does someone have an idea of how to achieve that in:
python 3.7
pyspark 2.4.4
Just collect a set of structs (port, flag) and get its size. Something like this:
w = Window.partitionBy("flag").orderBy("timestamp").rangeBetween(-10800, Window.currentRow)
df.withColumn("timestamp", to_timestamp("timestamp").cast("long"))\
.withColumn("distinct_port_flag_overs_3h", size(collect_set(struct("port", "flag")).over(w)))\
.orderBy(col("timestamp"))\
.show()
I've also coded something like this that works:
import re
import traceback

import pyspark.sql.functions as F
from pyspark.sql.window import Window

def hive_time(time: str) -> int:
    """
    Convert a string duration to a number of seconds.
    time : str : must be in the format <number><unit>,
    for example 1hour, 4day, 3month
    """
    match = re.match(r"([0-9]+)([a-z]+)", time, re.I)
    if match:
        items = match.groups()
        nb, kind = items[0], items[1]
        try:
            nb = int(nb)
        except ValueError as e:
            print(e, traceback.format_exc())
            print("The format of {}, which is your time aggregation, is not recognized. Please read the doc".format(time))
        if kind == "second":
            return nb
        if kind == "minute":
            return 60 * nb
        if kind == "hour":
            return 3600 * nb
        if kind == "day":
            return 24 * 3600 * nb
    assert False, "The format of {}, which is your time aggregation, is not recognized. \
    Please read the doc".format(time)

# Rolling window in spark
def distinct_count_over(data, window_size: str, out_column: str, *input_columns, time_column: str = 'timestamp'):
    """
    data : pyspark dataframe
    window_size : size of the rolling window, see hive_time for format information
    out_column : name of the column where you want to store the result
    input_columns : the columns over which you want to count distinct values
    time_column : name of the column where the time field is stored (must be ISO 8601)
    return : a new dataframe with the stored result
    """
    concatenated_columns = F.concat(*input_columns)
    w = (Window.orderBy(F.col("timestampGMT").cast('long'))
               .rangeBetween(-hive_time(window_size), 0))
    return data \
        .withColumn('timestampGMT', data[time_column].cast('timestamp')) \
        .withColumn(out_column, F.size(F.collect_set(concatenated_columns).over(w)))
It works well; I haven't checked the performance yet.
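For instance, a hypothetical call on the dataframe from the question (assuming it is named df and its ISO 8601 time column is named timestamp):
# Hypothetical usage: count distinct (port, flag) pairs over a rolling 3-hour window.
result = distinct_count_over(df, "3hour", "distinct_port_flag_overs_3h",
                             "port", "flag", time_column="timestamp")
result.orderBy("timestampGMT").show(truncate=False)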

Azure Analytics Kusto queries: how to group by 2 conditions?

I am using Azure Analytics for a mobile app. I have custom events for the main app pages that I can find inside the customEvents table.
I am very new to Kusto, so using the samples I found, I put together the following query:
let start = startofday(ago(28d));
let events = union customEvents, pageViews
| where timestamp >= start
| where name in ('*') or '*' in ('*') or ('%' in ('*') and itemType == 'pageView') or ('#' in ('*')
and itemType == 'customEvent')
| extend Dim1 = tostring(name);
let overall = events | summarize Users = dcount(user_Id);
let allUsers = toscalar(overall);
events
| summarize Users = dcount(user_Id), Sessions = dcount(session_Id), Instances = count() by Dim1
| extend DisplayDim = strcat(' ', Dim1)
| order by Users desc
| project Dim1, DisplayDim, Users, Sessions, Instances
| project ['Activities'] = DisplayDim, Values = Dim1, ['Active Users'] = Users, ['Unique Sessions'] = Sessions, ['Total Instances'] = Instances
The query is working well, but I want to have all the page events grouped by client_CountryOrRegion.
Is there any way I can do this split by client_CountryOrRegion?
Not sure if this is what you are looking for, but if you want to have the result split by client_CountryOrRegion, you can just summarize by that column as well:
let start = startofday(ago(28d));
let events = union customEvents, pageViews
| where timestamp >= start
| where name in ('*') or '*' in ('*') or ('%' in ('*') and itemType == 'pageView') or ('#' in ('*')
and itemType == 'customEvent')
| extend Dim1 = tostring(name);
let overall = events | summarize Users = dcount(user_Id);
let allUsers = toscalar(overall);
events
| summarize Users = dcount(user_Id), Sessions = dcount(session_Id), Instances = count() by Dim1, client_CountryOrRegion
| extend DisplayDim = strcat(' ', Dim1)
| order by Users desc
| project Dim1, DisplayDim, Users, Sessions, Instances
| project ['Activities'] = DisplayDim, Values = Dim1, ['Active Users'] = Users, ['Unique Sessions'] = Sessions, ['Total Instances'] = Instances, client_CountryOrRegion
The change is here:
summarize Users = dcount(user_Id), Sessions = dcount(session_Id), Instances = count() by Dim1 , client_CountryOrRegion

Django: I have defined a model, but some of the model's fields don't work

I am using Django 2.2 and MySQL 8.0. When I define a model, I create some class attributes and their fields. After saving, PyCharm did not report an error.
from django.db import models
from django_mysql.models import JSONField

class Activity(models.Model):
    sponsor = models.IntegerField
    certificateOrNot = models.BooleanField
    sponsorWay = models.SmallIntegerField
    activityName = models.CharField(max_length=60)
    activityPhoto = models.URLField(max_length=255)
    prizeInfo = JsonField
    activityDetails = models.IntegerField
    startTime = models.DateField
    endTime = models.DateField
    conditionType = models.SmallIntegerField
    conditionInfo = models.SmallIntegerField
    sponsorPhoneNumber = models.CharField(max_length=20)
    sponsorNickName = models.CharField(max_length=40)
    sponsorWechatNumber = models.CharField(max_length=255)
    participantAttention = models.BooleanField
    shareJurisdiction = models.BooleanField
    allowQuitOrNot = models.BooleanField
    inviateFriends = models.BooleanField
    inputCommandOrNot = models.BooleanField
    participateWay = models.BooleanField
    winnerList = models.BooleanField
    participantDrawNumber = models.SmallIntegerField

    def __str__(self):
        return self.activityName
Then I synchronize the database.
python3 manage.py makemigrations luckyDraw_1
python3 manage.py migrate
At this point MySQL can successfully create the table, but the fields in the table do not meet my expectations.
+---------------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+---------------------+--------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| activityName | varchar(60) | NO | | NULL | |
| activityPhoto | varchar(255) | NO | | NULL | |
| sponsorPhoneNumber | varchar(20) | NO | | NULL | |
| sponsorNickName | varchar(40) | NO | | NULL | |
| sponsorWechatNumber | varchar(255) | NO | | NULL | |
+---------------------+--------------+------+-----+---------+----------------+
So, what's the problem?
Some fields are missing in your database table because you are missing the parentheses on some field definitions. For example, the first field should be sponsor = models.IntegerField(), not sponsor = models.IntegerField. See the sketch below.
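As a minimal sketch, here is the corrected model with only a few of the fields from the question shown; note that the class imported from django_mysql is named JSONField, so the prizeInfo field must also call JSONField() rather than referencing JsonField:
from django.db import models
from django_mysql.models import JSONField

class Activity(models.Model):
    # Every field must be instantiated with parentheses so Django actually
    # creates a database column for it.
    sponsor = models.IntegerField()
    certificateOrNot = models.BooleanField()
    sponsorWay = models.SmallIntegerField()
    activityName = models.CharField(max_length=60)
    activityPhoto = models.URLField(max_length=255)
    prizeInfo = JSONField()
    startTime = models.DateField()
    endTime = models.DateField()

    def __str__(self):
        return self.activityName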
