I'm developing a server-side app for a mobile game, backed by PostgreSQL, and I'm using pg with Knex ("pg": "6.1.2" and "knex": "0.12.6"). Not so long ago I ran into a problem with SELECT performance degradation. Latest PostgreSQL and Node v4.7.2.
I benchmarked my app with 40 simulated concurrent users and measured the time between the Query constructor and Query.prototype.handleReadyForQuery. Here is what I got:
{"select count(*) from \"player_battles\" where (\"attacking_player\" = $1 or \"defending_player\" = $2) and attacking_player <> defending_player": [27, 74, 92, 156, 170, 203, 217, 230, 243, 251, 261, 269, 288, 303, 313, 328, 342, 352, 361, 384, 395, 407, 420, 428, 440, 448, 460, 471, 483, 494, 507, 515, 537, 538, 539, 30, 40, 60, 1564, 2273, 2287, 2291, 2320, 2327, 2346, 2354, 2370, 2380, 2388, 2402, 2411, 2419, 2429, 2436, 2444, 4014, 4412, 4421, 4421, 4422, 4423, 4423, 4424, 4425, 4426, 4427, 4427, 4428, 4429, 4429, 18, 35, 60, 78, 113, 125, 151, 161, 170, 178, 185, 197, 611, 1972, 1987, 1988, 1988, 1989, 1991, 1992, 1993, 1993, 1994, 1995, 1996, 1996, 1997, 1997, 1999, 1999, 2000, 2001, 2002, 2002, 2002]}
(The array of numbers represents the time, in ms, taken by each run of that query.)
4429 ms, Carl! An unpleasant surprise.
The obvious suspects are unoptimized SQL or bad indexing, but analyzing the logs with pgBadger showed me this: https://gyazo.com/4855a0eceac8669ab5c21564a392b357
Any ideas?
Related GitHub issue: https://github.com/brianc/node-postgres/issues/1243
Previous question: Knex with PostgreSQL select query extremely performance degradation on multiple parallel requests
UPD: Here is a screenshot of "General activity": https://gyazo.com/b2781069c87f88a2d4034345d58f91a3
UPD2: Example query:
SELECT COUNT(*)
FROM "player_battles"
WHERE ("attacking_player" = $1
OR "defending_player" = $2)
AND attacking_player <> defending_player
(where $1 and $2 are 1 and 2)
Target table DDL:
CREATE TABLE public.player_battles
(
id integer NOT NULL DEFAULT nextval('player_battles_id_seq'::regclass),
attacking_player integer NOT NULL,
defending_player integer NOT NULL,
is_attacking_won boolean NOT NULL,
received_honor integer NOT NULL,
lost_honor integer NOT NULL,
gold_stolen bigint,
created_at bigint NOT NULL DEFAULT '0'::bigint,
attacking_level_atm integer NOT NULL DEFAULT 0,
defending_level_atm integer NOT NULL DEFAULT 0,
attacking_honor_atm integer NOT NULL DEFAULT 0,
defending_honor_atm integer NOT NULL DEFAULT 0,
no_winner boolean NOT NULL DEFAULT false,
auto_defeat boolean,
CONSTRAINT player_battles_pkey PRIMARY KEY (id),
CONSTRAINT player_battles_attacking_player_foreign FOREIGN KEY (attacking_player)
REFERENCES public.player_profile (id) MATCH SIMPLE
ON UPDATE NO ACTION
ON DELETE NO ACTION,
CONSTRAINT player_battles_defending_player_foreign FOREIGN KEY (defending_player)
REFERENCES public.player_profile (id) MATCH SIMPLE
ON UPDATE NO ACTION
ON DELETE NO ACTION
)
WITH (
OIDS = FALSE
)
TABLESPACE pg_default;
And the indexes:
CREATE INDEX player_battles_attacking_player_index
ON public.player_battles USING btree
(attacking_player)
TABLESPACE pg_default;
CREATE INDEX player_battles_created_at_index
ON public.player_battles USING btree
(created_at DESC)
TABLESPACE pg_default;
CREATE INDEX player_battles_defending_player_index
ON public.player_battles USING btree
(defending_player)
TABLESPACE pg_default;
EXPLAIN ANALYZE output:
Aggregate (cost=19.40..19.41 rows=1 width=8) (actual time=0.053..0.053 rows=1 loops=1)
-> Bitmap Heap Scan on player_battles (cost=8.33..19.39 rows=4 width=0) (actual time=0.030..0.047 rows=4 loops=1)
Recheck Cond: ((attacking_player = 1) OR (defending_player = 2))
Filter: (attacking_player <> defending_player)
Heap Blocks: exact=4
-> BitmapOr (cost=8.33..8.33 rows=4 width=0) (actual time=0.021..0.021 rows=0 loops=1)
-> Bitmap Index Scan on player_battles_attacking_player_index (cost=0.00..4.16 rows=2 width=0) (actual time=0.016..0.016 rows=2 loops=1)
Index Cond: (attacking_player = 1)
-> Bitmap Index Scan on player_battles_defending_player_index (cost=0.00..4.16 rows=2 width=0) (actual time=0.003..0.003 rows=2 loops=1)
Index Cond: (defending_player = 2)
Planning time: 0.907 ms
Execution time: 0.160 ms
Extra data
There are only about 160 players and 310 rows in player_battles, and the number of rows does not correlate with the problem. System: DigitalOcean, 2 CPU / 2 GB RAM. pgBadger output and response times from node-postgres: http://ge.tt/3DWk8Dj2
My config file may also help: http://ge.tt/7cw69Dj2
Related
I have a dictionary with year-month combinations as keys and counts as values. I used OrderedDict to sort the dictionary and got the result below. In my expected result, "2021-2" should come right after "2021-1", but "2021-10" comes in between.
{
"2020-11": 25,
"2020-12": 861,
"2021-1": 935,
"2021-10": 1,
"2021-2": 4878,
"2021-3": 6058,
"2021-4": 3380,
"2021-5": 4017,
"2021-6": 1163,
"2021-7": 620,
"2021-8": 300,
"2021-9": 7
}
My expected result is below: the dictionary sorted from the earliest date to the latest date.
{
"2020-11": 25,
"2020-12": 861,
"2021-1": 935,
"2021-2": 4878,
"2021-3": 6058,
"2021-4": 3380,
"2021-5": 4017,
"2021-6": 1163,
"2021-7": 620,
"2021-8": 300,
"2021-9": 7,
"2021-10": 1
}
I'd appreciate any help.
If you want to customize how the sorting is done, use sorted with the key parameter:
from collections import OrderedDict
data = {
"2020-11": 25,
"2020-12": 861,
"2021-1": 935,
"2021-10": 1,
"2021-2": 4878,
"2021-3": 6058,
"2021-4": 3380,
"2021-5": 4017,
"2021-6": 1163,
"2021-7": 620,
"2021-8": 300,
"2021-9": 7
}
def year_plus_month(item):
    # Split "YYYY-M" and compare (year, month) as integers, so that
    # month 10 sorts after month 9 instead of next to month 1.
    year, month = item[0].split("-")
    return (int(year), int(month))
data_ordered = OrderedDict(sorted(data.items(), key=year_plus_month))
print(data_ordered)
Using an (int, int) tuple as the key avoids the trap of a numeric key such as float or Decimal, where 2021.10 compares equal to 2021.1 and "2021-10" would still land next to "2021-1".
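For reference, a minimal sketch of the same idea on Python 3.7+, where a plain dict already preserves insertion order, so OrderedDict is optional (trimmed sample data from the question):

data = {"2020-11": 25, "2020-12": 861, "2021-1": 935,
        "2021-10": 1, "2021-2": 4878, "2021-9": 7}  # trimmed sample

def year_plus_month(item):
    year, month = item[0].split("-")
    return (int(year), int(month))

print(dict(sorted(data.items(), key=year_plus_month)))
# {'2020-11': 25, '2020-12': 861, '2021-1': 935, '2021-2': 4878, '2021-9': 7, '2021-10': 1}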
In models.py I have a model called FlashNews that can be tied to either a Race, a League, or a FantasyTeam. I'm using Django's CheckConstraint (Django 2.2.1). There's also a type column to ensure consistency.
I want to guarantee that exactly one of race, fteam and league is not null, consistent with the type value.
The sample code is as follows:
FlashNewsTypes = [
(1, 'Race'),
(2, 'League'),
(3, 'Fteam')
]
class FlashNews(models.Model):
race = models.ForeignKey(Race, on_delete=models.CASCADE, null=True, blank=True)
league = models.ForeignKey(League, on_delete=models.CASCADE, null=True, blank=True)
fteam = models.ForeignKey(FantasyTeam, on_delete=models.CASCADE, null=True, blank=True)
type = models.IntegerField(choices=FlashNewsTypes, default=FlashNewsTypes[0][0])
class Meta:
constraints = [
models.CheckConstraint(
name="unique_notnull_field",
check=(
models.Q(
type=FlashNewsTypes[0][0],
race__isnull=False,
league__isnull=True,
fteam__isnull=True,
) | models.Q(
type=FlashNewsTypes[1][0],
race__isnull=True,
league__isnull=False,
fteam__isnull=True,
) | models.Q(
type=FlashNewsTypes[2][0],
race__isnull=True,
league__isnull=True,
fteam__isnull=False,
)
),
)
]
The automatically created migration:
class Migration(migrations.Migration):
dependencies = [
('myapp', '0055_flashnews'),
]
operations = [
migrations.AddConstraint(
model_name='flashnews',
constraint=models.CheckConstraint(check=models.Q(models.Q(('fteam__isnull', True), ('league__isnull', True), ('race__isnull', False), ('type', 1)), models.Q(('fteam__isnull', False), ('league__isnull', True), ('race__isnull', True), ('type', 2)), models.Q(('fteam__isnull', True), ('league__isnull', False), ('race__isnull', True), ('type', 3)), _connector='OR'), name='%(app_label)s_%(class)s_value_matches_type'),
),
]
This migration succeeds in my dev environment, which uses the SQLite backend. But when executed against the Postgres backend, it raises this error:
Applying myapp.0056_auto_20200502_1754...Traceback (most recent call last):
File "manage.py", line 22, in <module>
execute_from_command_line(sys.argv)
File "/home/src/myapp/env/lib/python3.6/site-packages/django/core/management/__init__.py", line 381, in execute_from_command_line
utility.execute()
File "/home/src/myapp/env/lib/python3.6/site-packages/django/core/management/__init__.py", line 375, in execute
self.fetch_command(subcommand).run_from_argv(self.argv)
File "/home/src/myapp/env/lib/python3.6/site-packages/django/core/management/base.py", line 323, in run_from_argv
self.execute(*args, **cmd_options)
File "/home/src/myapp/env/lib/python3.6/site-packages/django/core/management/base.py", line 364, in execute
output = self.handle(*args, **options)
File "/home/src/myapp/env/lib/python3.6/site-packages/django/core/management/base.py", line 83, in wrapped
res = handle_func(*args, **kwargs)
File "/home/src/myapp/env/lib/python3.6/site-packages/django/core/management/commands/migrate.py", line 234, in handle
fake_initial=fake_initial,
File "/home/src/myapp/env/lib/python3.6/site-packages/django/db/migrations/executor.py", line 117, in migrate
state = self._migrate_all_forwards(state, plan, full_plan, fake=fake, fake_initial=fake_initial)
File "/home/src/myapp/env/lib/python3.6/site-packages/django/db/migrations/executor.py", line 147, in _migrate_all_forwards
state = self.apply_migration(state, migration, fake=fake, fake_initial=fake_initial)
File "/home/src/myapp/env/lib/python3.6/site-packages/django/db/migrations/executor.py", line 245, in apply_migration
state = migration.apply(state, schema_editor)
File "/home/src/myapp/env/lib/python3.6/site-packages/django/db/migrations/migration.py", line 124, in apply
operation.database_forwards(self.app_label, schema_editor, old_state, project_state)
File "/home/src/myapp/env/lib/python3.6/site-packages/django/db/migrations/operations/models.py", line 827, in database_forwards
schema_editor.add_constraint(model, self.constraint)
File "/home/src/myapp/env/lib/python3.6/site-packages/django/db/backends/base/schema.py", line 345, in add_constraint
self.execute(sql)
File "/home/src/myapp/env/lib/python3.6/site-packages/django/db/backends/base/schema.py", line 137, in execute
cursor.execute(sql, params)
File "/home/src/myapp/env/lib/python3.6/site-packages/django/db/backends/utils.py", line 67, in execute
return self._execute_with_wrappers(sql, params, many=False, executor=self._execute)
File "/home/src/myapp/env/lib/python3.6/site-packages/django/db/backends/utils.py", line 76, in _execute_with_wrappers
return executor(sql, params, many, context)
File "/home/src/myapp/env/lib/python3.6/site-packages/django/db/backends/utils.py", line 84, in _execute
return self.cursor.execute(sql, params)
TypeError: tuple indices must be integers or slices, not str
Are there some DB backend-specific notions I'm missing here regarding Django's constraints?
Thanks for your help!
As it turns out, I had simplified my question for the sake of clarity by changing the constraint name; in my actual code it was name='%(app_label)s_%(class)s_value_matches_type'.
It turns out that this name is the cause of the TypeError; replacing it with a plain string fixes the migration.
You are using Django 2.2, but interpolation of %(app_label)s and %(class)s in constraint names was only added in Django 3.0.
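For illustration, a minimal sketch of that workaround on Django 2.2: keep the check from the question, but give the constraint a literal name (the name below is just an example). This Meta goes inside the FlashNews model:

class Meta:
    constraints = [
        models.CheckConstraint(
            # Literal name: the %(app_label)s / %(class)s placeholders are
            # only interpolated from Django 3.0 onwards, so avoid them on 2.2.
            name="flashnews_value_matches_type",
            check=(
                models.Q(type=1, race__isnull=False, league__isnull=True, fteam__isnull=True)
                | models.Q(type=2, race__isnull=True, league__isnull=False, fteam__isnull=True)
                | models.Q(type=3, race__isnull=True, league__isnull=True, fteam__isnull=False)
            ),
        ),
    ]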
I have a data file of wrong country names, and another data file that has all the correct country names.
To match the two files, I am using the code below:
import pandas as pd
from fuzzywuzzy import process
names_array=[]
ratio_array=[]
def match_names(wrong_names, correct_names):
    for row in wrong_names:
        x = process.extractOne(row, correct_names)
        names_array.append(x[0])
        ratio_array.append(x[1])
    return names_array, ratio_array
fields = ['name']
#Wrong country names dataset
df=pd.read_csv("wrong-country-names.csv",encoding="ISO-8859-1",sep=';', skipinitialspace=True, usecols= fields )
print(df.dtypes)
wrong_names=df.dropna().values
#Correct country names dataset
choices_df=pd.read_csv("country-names.csv",encoding="ISO-8859-1",sep='\t', skipinitialspace=True)
correct_names=choices_df.values
name_match,ratio_match=match_names(wrong_names,correct_names)
df['correct_country_name']=pd.Series(name_match)
df['country_names_ratio']=pd.Series(ratio_match)
df.to_csv("string_matched_country_names.csv")
print(df[['name','correct_country_name','country_names_ratio']].head(10))
I get the below error:
name object
dtype: object
Traceback (most recent call last):
File "<ipython-input-221-a1fd87d9f661>", line 1, in <module>
runfile('C:/Users/Drashti Bhatt/Desktop/untitled0.py', wdir='C:/Users/Drashti Bhatt/Desktop')
File "C:\Users\Drashti Bhatt\Anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 827, in runfile
execfile(filename, namespace)
File "C:\Users\Drashti Bhatt\Anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 110, in execfile
exec(compile(f.read(), filename, 'exec'), namespace)
File "C:/Users/Drashti Bhatt/Desktop/untitled0.py", line 27, in <module>
name_match,ratio_match=match_names(wrong_names,correct_names)
File "C:/Users/Drashti Bhatt/Desktop/untitled0.py", line 9, in match_names
x=process.extractOne(row, correct_names)
File "C:\Users\Drashti Bhatt\Anaconda3\lib\site-packages\fuzzywuzzy\process.py", line 220, in extractOne
return max(best_list, key=lambda i: i[1])
File "C:\Users\Drashti Bhatt\Anaconda3\lib\site-packages\fuzzywuzzy\process.py", line 78, in extractWithoutOrder
processed_query = processor(query)
File "C:\Users\Drashti Bhatt\Anaconda3\lib\site-packages\fuzzywuzzy\utils.py", line 95, in full_process
string_out = StringProcessor.replace_non_letters_non_numbers_with_whitespace(s)
File "C:\Users\Drashti Bhatt\Anaconda3\lib\site-packages\fuzzywuzzy\string_processing.py", line 26, in replace_non_letters_non_numbers_with_whitespace
return cls.regex.sub(" ", a_string)
TypeError: expected string or bytes-like object
I tried the .decode option, but it did not work. What am I doing wrong?
Any help will be much appreciated. Thanks!
The code below works; you can spot the differences, though I'm not sure it's the solution you are looking for. I tried it on sample files that I created manually, and I removed fields from pd.read_csv.
(... = same as your code)
...
def match_names(wrong_names, correct_names):
    for row in wrong_names:
        print('row=', row)
        ...
    return names_array, ratio_array
fields = ['name']
#Wrong country names dataset
df=pd.read_csv("fuzzy.csv",encoding="ISO-8859-1", skipinitialspace=True)
print(df.dtypes)
wrong_names=df.dropna().values
print(wrong_names)
#Correct country names dataset
choices_df=pd.read_csv("country.csv",encoding="ISO-8859-1",sep='\t', skipinitialspace=True)
correct_names=choices_df.values
print(correct_names)
...
print(df[['correct_country_name','country_names_ratio']].head(10))
Output
Country object
alpha-2 object
alpha-3 object
country-code int64
iso_3166-2 object
region object
sub-region object
region-co int64
sub-region.1 int64
dtype: object
[[u'elbenie' u'AL' u'ALB' 8 u'ISO 3166-2:AL' u'Europe' u'Southern Europe'
150 39]
[u'enforre' u'AD' u'AND' 20 u'ISO 3166-2:AD' u'Europe' u'Southern Europe'
150 39]
[u'Belerus' u'AT' u'AUT' 40 u'ISO 3166-2:AT' u'Europe' u'Western Europe'
150 155]]
[[u'elbenie']
[u'enforre']
[u'Belerus']]
('row=', array([u'elbenie', u'AL', u'ALB', 8, u'ISO 3166-2:AL', u'Europe',
u'Southern Europe', 150, 39], dtype=object))
('row=', array([u'enforre', u'AD', u'AND', 20, u'ISO 3166-2:AD', u'Europe',
u'Southern Europe', 150, 39], dtype=object))
('row=', array([u'Belerus', u'AT', u'AUT', 40, u'ISO 3166-2:AT', u'Europe',
u'Western Europe', 150, 155], dtype=object))
correct_country_name country_names_ratio
0 [elbenie] 60
1 [enforre] 60
2 [Belerus] 60
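For what it's worth, my reading of the traceback is that process.extractOne is being handed whole NumPy rows (from df.dropna().values) instead of strings, which is what trips fuzzywuzzy's full_process. Below is a minimal sketch that feeds it plain strings instead; the file names, separators and the name column come from the question, while treating the first column of country-names.csv as the country name is my assumption:

import pandas as pd
from fuzzywuzzy import process

# Wrong country names: keep only the 'name' column and convert to plain strings.
df = pd.read_csv("wrong-country-names.csv", encoding="ISO-8859-1", sep=";",
                 skipinitialspace=True, usecols=["name"]).dropna()
wrong_names = df["name"].astype(str).tolist()

# Correct country names: assume the country name is the first column.
choices_df = pd.read_csv("country-names.csv", encoding="ISO-8859-1", sep="\t",
                         skipinitialspace=True)
correct_names = choices_df.iloc[:, 0].astype(str).tolist()

# extractOne returns a (best_match, score) tuple for each query string.
matches = [process.extractOne(name, correct_names) for name in wrong_names]
df["correct_country_name"] = [m[0] for m in matches]
df["country_names_ratio"] = [m[1] for m in matches]

df.to_csv("string_matched_country_names.csv")
print(df.head(10))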
I'm trying to parse the datetime so that I can later group by certain hours in Structured Streaming.
Currently I have code like this:
distinct_table = service_table\
.select(psf.col('crime_id'),
psf.col('original_crime_type_name'),
psf.to_timestamp(psf.col('call_date_time')).alias('call_datetime'),
psf.col('address'),
psf.col('disposition'))
Which gives this output in the console:
+---------+------------------------+-------------------+--------------------+------------+
| crime_id|original_crime_type_name| call_datetime| address| disposition|
+---------+------------------------+-------------------+--------------------+------------+
|183652852| Burglary|2018-12-31 18:52:00|600 Block Of Mont...| HAN|
|183652839| Passing Call|2018-12-31 18:51:00|500 Block Of Clem...| HAN|
|183652841| 22500e|2018-12-31 18:51:00|2600 Block Of Ale...| CIT|
When I try to apply this UDF to convert the timestamp (the call_datetime column):
import pyspark.sql.functions as psf
from pyspark.sql.types import StringType
from dateutil.parser import parse as parse_date

@psf.udf(StringType())
def udf_convert_time(timestamp):
    d = parse_date(timestamp)
    return str(d.strftime('%y%m%d%H'))
I get a NoneType error:
File "/Users/dev/spark-2.3.0-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/worker.py", line 229, in main
process()
File "/Users/dev/spark-2.3.0-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/worker.py", line 224, in process
serializer.dump_stream(func(split_index, iterator), outfile)
File "/Users/dev/spark-2.3.0-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/worker.py", line 149, in <lambda>
func = lambda _, it: map(mapper, it)
File "<string>", line 1, in <lambda>
File "/Users/dev/spark-2.3.0-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/worker.py", line 74, in <lambda>
return lambda *a: f(*a)
File "/Users/PycharmProjects/data-streaming-project/solution/streaming/data_stream.py", line 29, in udf_convert_time
d = parse_date(timestamp)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/dateutil/parser.py", line 697, in parse
return DEFAULTPARSER.parse(timestr, **kwargs)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/dateutil/parser.py", line 301, in parse
res = self._parse(timestr, **kwargs)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/dateutil/parser.py", line 349, in _parse
l = _timelex.split(timestr)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/dateutil/parser.py", line 143, in split
return list(cls(s))
File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/dateutil/parser.py", line 137, in next
token = self.get_token()
File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/dateutil/parser.py", line 68, in get_token
nextchar = self.instream.read(1)
AttributeError: 'NoneType' object has no attribute 'read'
This is the query plan:
pyspark.sql.utils.StreamingQueryException: u'Writing job aborted.\n=== Streaming Query ===\nIdentifier: [id = 958a6a46-f718-49c4-999a-661fea2dc564, runId = fc9a7a78-c311-42b7-bbed-7718b4cc1150]\nCurrent Committed Offsets: {}\nCurrent Available Offsets: {KafkaSource[Subscribe[service-calls]]: {"service-calls":{"0":200}}}\n\nCurrent State: ACTIVE\nThread State: RUNNABLE\n\nLogical Plan:\nProject [crime_id#25, original_crime_type_name#26, call_datetime#53, address#33, disposition#32, udf_convert_time(call_datetime#53) AS parsed_time#59]\n+- Project [crime_id#25, original_crime_type_name#26, to_timestamp(\'call_date_time, None) AS call_datetime#53, address#33, disposition#32]\n +- Project [SERVICE_CALLS#23.crime_id AS crime_id#25, SERVICE_CALLS#23.original_crime_type_name AS original_crime_type_name#26, SERVICE_CALLS#23.report_date AS report_date#27, SERVICE_CALLS#23.call_date AS call_date#28, SERVICE_CALLS#23.offense_date AS offense_date#29, SERVICE_CALLS#23.call_time AS call_time#30, SERVICE_CALLS#23.call_date_time AS call_date_time#31, SERVICE_CALLS#23.disposition AS disposition#32, SERVICE_CALLS#23.address AS address#33, SERVICE_CALLS#23.city AS city#34, SERVICE_CALLS#23.state AS state#35, SERVICE_CALLS#23.agency_id AS agency_id#36, SERVICE_CALLS#23.address_type AS address_type#37, SERVICE_CALLS#23.common_location AS common_location#38]\n +- Project [jsontostructs(StructField(crime_id,StringType,true), StructField(original_crime_type_name,StringType,true), StructField(report_date,StringType,true), StructField(call_date,StringType,true), StructField(offense_date,StringType,true), StructField(call_time,StringType,true), StructField(call_date_time,StringType,true), StructField(disposition,StringType,true), StructField(address,StringType,true), StructField(city,StringType,true), StructField(state,StringType,true), StructField(agency_id,StringType,true), StructField(address_type,StringType,true), StructField(common_location,StringType,true), value#21, Some(America/Los_Angeles)) AS SERVICE_CALLS#23]\n +- Project [cast(value#8 as string) AS value#21]\n +- StreamingExecutionRelation KafkaSource[Subscribe[service-calls]], [key#7, value#8, topic#9, partition#10, offset#11L, timestamp#12, timestampType#13]\n'
I'm using StringType for all columns and using to_timestamp for timestamp columns (which seems to work).
I verified that all the data I'm using (only about 100 rows) has values. Any idea how to debug this?
EDIT
Input is coming from Kafka; the schema is shown above in the query plan (all StringType()).
It's best not to use a UDF here, because UDFs bypass Spark's Catalyst optimizer, especially when the pyspark.sql.functions module already provides what you need. This code will transform your timestamp.
import pyspark.sql.functions as F
import pyspark.sql.types as T
rawData = [(183652852, "Burglary", "2018-12-31 18:52:00", "600 Block Of Mont", "HAN"),
(183652839, "Passing Call", "2018-12-31 18:51:00", "500 Block Of Clem", "HAN"),
(183652841, "22500e", "2018-12-31 18:51:00", "2600 Block Of Ale", "CIT")]
df = spark.createDataFrame(rawData).toDF("crime_id",\
"original_crime_type_name",\
"call_datetime",\
"address",\
"disposition")
date_format_source="yyyy-MM-dd HH:mm:ss"
date_format_target="yyyy-MM-dd HH"
df.select("*")\
.withColumn("new_time_format",\
F.from_unixtime(F.unix_timestamp(F.col("call_datetime"),\
date_format_source),\
date_format_target)\
.cast(T.TimestampType()))\
.withColumn("time_string", F.date_format(F.col("new_time_format"), "yyyyMMddHH"))\
.select("call_datetime", "new_time_format", "time_string")\
.show(truncate=True)
+-------------------+-------------------+-----------+
| call_datetime| new_time_format|time_string|
+-------------------+-------------------+-----------+
|2018-12-31 18:52:00|2018-12-31 18:00:00| 2018123118|
|2018-12-31 18:51:00|2018-12-31 18:00:00| 2018123118|
|2018-12-31 18:51:00|2018-12-31 18:00:00| 2018123118|
+-------------------+-------------------+-----------+
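Applied to the streaming DataFrame from the question, the same no-UDF approach would look roughly like this (a sketch: distinct_table and the column names come from the question, while hour_bucket and the choice of group-by columns are mine):

import pyspark.sql.functions as psf

# call_datetime is already a timestamp (via to_timestamp in the question), so
# date_format can produce the hour bucket directly, with no UDF involved.
hourly_counts = distinct_table \
    .withColumn("hour_bucket", psf.date_format(psf.col("call_datetime"), "yyyyMMddHH")) \
    .groupBy("hour_bucket", "original_crime_type_name") \
    .count()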
I have a simple column family, 'Users':
System.out.println(" read User");
long timestamp = System.currentTimeMillis();
clientA.insert(ByteBuffer.wrap("mike".getBytes()), new ColumnParent("Users"),
new Column(ByteBuffer.wrap("email".getBytes())).setValue(ByteBuffer.wrap("mike#gmail.com".getBytes())).setTimestamp(timestamp)
, ConsistencyLevel.ONE);
clientA.insert(ByteBuffer.wrap("mike".getBytes()), new ColumnParent("Users"),
new Column(ByteBuffer.wrap("totalPosts".getBytes())).setValue(ByteBuffer.allocate(4).putInt(27).array()).setTimestamp(timestamp)
, ConsistencyLevel.ONE);
SlicePredicate predicate = new SlicePredicate();
predicate.setSlice_range(new SliceRange(ByteBuffer.wrap(new byte[0]), ByteBuffer.wrap(new byte[0]), false, 10));
ColumnParent parent = new ColumnParent("Users");
List<ColumnOrSuperColumn> results = clientA.get_slice(ByteBuffer.wrap("mike".getBytes()),
parent,
predicate,
ConsistencyLevel.ONE);
for (ColumnOrSuperColumn result : results) {
Column column = result.column;
System.out.println(new String(column.getName()) + " -> "+ new String(column.getValue())+", "+column.getValue().length);
}
it returns
read User
email -> mike@gmail.com, 14
totalPosts ->
So it looks like totalPosts can't be read with the Thrift client, whereas with cassandra-cli:
[default#test] get Users['mike']['totalPosts'];
=> (column=totalPosts, value=0000001b, timestamp=1336493080621)
Elapsed time: 10 msec(s).
How can I retrieve this integer value with the Java Thrift client?
Using Cassandra 1.1.
Edit: it seems to be due to this part:
for (ColumnOrSuperColumn result : results) {
Column column = result.column;
System.out.println(new String(column.getName()) + " -> "+Arrays.toString(column.getValue()));
}
returns
email -> [109, 105, 107, 101, 64, 103, 109, 97, 105, 108, 46, 99, 111, 109]
totalPosts -> [0, 0, 0, 27]
Your column values are being returned as bytes, the same way you're inserting them. So the value of the 'email' column is [109, 105, 107, 101, 64, 103, 109, 97, 105, 108, 46, 99, 111, 109], which interpreted as ASCII is mike@gmail.com. And the number 27, which you write into the ByteBuffer as [0, 0, 0, 27] (via the putInt call), comes out the same way.
If you absolutely have to use the raw Thrift interface, you'll probably want to retrieve your totalPosts int with something like ByteBuffer.wrap(column.getValue()).getInt(). But if at all possible, I recommend using a library that wraps the ugliness of the Thrift interface and takes care of value serialization issues like this. Look at Hector, or skip the old interface entirely and go straight to CQL with Cassandra-JDBC.
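Purely to illustrate the byte layout (this is not the Java client code): the four bytes written by putInt form a big-endian 32-bit integer, which is what getInt reads back. For example, in Python:

import struct

raw = bytes([0, 0, 0, 27])          # the stored totalPosts value
print(struct.unpack(">i", raw)[0])  # big-endian signed 32-bit int -> 27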