MemSQL: a long SQL query gets truncated; changing 'max_allowed_packet' has no effect - singlestore

I have a very long SQL query,
for example: "select customerID from customer where customerID in (1,2,3,4,5,......50000000)".
When I executed it, an exception was thrown.
I found that MemSQL truncated my query, but I have already changed the global variables "max_allowed_packet=1049999360, load_data_read_size=1049999360, load_data_write_size=1049999360" and restarted the MemSQL cluster; however, the truncation problem remains. Please help, thanks.

You may be running into the limit on the number of constants in a query. It’s 1 million. Before MemSQL 6.5 this would generate a syntax error instead of a more specific error referring to the limit. That limit isn’t configurable.
As others have noted, this may not be the most efficient way to run this query. You may want to try building a temporary table and doing an in (select custid from temp) instead.
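A minimal sketch of that temp-table approach, using an in-memory SQLite database to stand in for MemSQL (the table, column, and ID values here are made up for illustration; the SQL pattern is what carries over):

```python
import sqlite3

# In-memory SQLite stands in for MemSQL here; the rewrite pattern is the same.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (customerID INTEGER, name TEXT)")
conn.executemany("INSERT INTO customer VALUES (?, ?)",
                 [(i, f"cust{i}") for i in range(1, 11)])

# Instead of one giant IN (1, 2, 3, ...) literal list, load the IDs into a
# temporary table (in batches, for a real workload)...
wanted_ids = [2, 3, 5, 7]
conn.execute("CREATE TEMP TABLE temp_ids (custid INTEGER)")
conn.executemany("INSERT INTO temp_ids VALUES (?)", [(i,) for i in wanted_ids])

# ...and rewrite the query as IN (SELECT ...), which sidesteps the
# per-query constant limit entirely.
rows = conn.execute(
    "SELECT customerID FROM customer "
    "WHERE customerID IN (SELECT custid FROM temp_ids) "
    "ORDER BY customerID"
).fetchall()
print([r[0] for r in rows])  # -> [2, 3, 5, 7]
```

The subquery form also tends to plan better than a multi-million-constant literal list, since the engine can hash-join against the temp table.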

Related

ScyllaDB count(*) returns different results

I have a question about query in scylladb. I want to count the rows in a table with:
SELECT COUNT(*)
FROM tabledata;
The first run returns a result of 5732 rows.
The second run returns a result of 5432 rows.
The result is different every time.
Any suggestions on how to count rows in scylla?
Consistency level?
(You can find a very funny picture on the internet about eventual consistency.)
If you have RF=3
and you wrote all your rows with LOCAL_QUORUM,
then I'd set CONSISTENCY LOCAL_QUORUM
and rerun the count.
If you are not sure whether all your writes were properly done, use CL ALL.
Another option is to run a full repair and rerun the count.
ALSO, your table might have a TTL; in that case getting a different count every time is expected (if you are writing, the count may get bigger; if you are only reading, it will get smaller).
For an efficient count, look at https://github.com/scylladb/scylla-code-samples/tree/master/efficient_full_table_scan_example_code - but the same applies regarding consistency level (and of course this script will tell you with a timeout error that a token range couldn't be queried, which means that node/shard was overloaded with other traffic; by default it doesn't retry, it's a simple script)
The problem you're running into is inherent in any distributed row store (Cassandra or Scylla). In order for that to work, a coordinator node needs to contact all other nodes, query them, and assemble the result set. That causes a lot of contention which may prevent some replicas from reporting properly.
I recommend downloading and using DSBulk for this type of operation. It has a count feature designed for just this purpose.
dsbulk count -k ks1 -t table1 -h '10.200.1.3,10.200.1.4'

what is the impact of limit in cassandra cql

When executing a cqlsh query like select * from table limit 10, would Cassandra scan the entire table and just return the first 10 records, or can it precisely locate the first 10 records across the whole datacenter without scanning the entire table?
The LIMIT option puts an upper-bound on the maximum number of rows returned by a query but it doesn't prevent the query from performing a full table scan.
Cassandra has internal mechanisms, such as request timeouts, which prevent bad queries from crashing the cluster, so queries are more likely to time out than to overload the cluster with scans on all nodes/replicas.
As a side note, the LIMIT option is irrelevant when used with SELECT COUNT() since the count function returns just 1 row (by design). COUNT() needs to do a full table scan regardless of the limit set. I've explained it in a bit more detail in this post -- https://community.datastax.com/questions/6897/. Cheers!
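The distinction between "rows returned" and "rows examined" can be illustrated with a toy scan executor in Python (this is a conceptual model, not Cassandra's actual read path; the function and predicate are made up):

```python
def scan_with_limit(rows, predicate, limit):
    """Naive full-scan executor: examines rows one by one until it has
    `limit` matches or the table is exhausted. With a selective predicate
    it may still touch most of the table even though only `limit` rows
    come back to the client."""
    examined = 0
    out = []
    for row in rows:
        examined += 1
        if predicate(row):
            out.append(row)
            if len(out) == limit:
                break
    return out, examined

table = list(range(1, 10_001))
# Only the last 10 rows match, so LIMIT 10 still examines all 10,000 rows:
# the limit bounds the result size, not the amount of work done to find it.
matched, examined = scan_with_limit(table, lambda r: r > 9_990, 10)
print(len(matched), examined)  # -> 10 10000
```

If the matching rows happened to sit at the start of the scan, the executor would stop early; the point is that LIMIT gives no such guarantee.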

Cassandra unpredictable failure depending on WHERE clause

I am attempting to execute a SELECT statement against a large Cassandra table (10m rows) with various WHERE clauses. I am issuing these from the Datastax DevCenter application. The columns I am using in the where clause have secondary indexes.
The where clause looks like WHERE fileid = 18000 or alternatively WHERE fileid < 18000. In this example, the second where clause results in the error Unable to execute CQL script on 'connection1': Cassandra failure during read query at consistency ONE (1 responses were required but only 0 replica responded, 1 failed)
I have no idea why it is failing in this unpredictable manner. Any ideas?
NOTE: I am aware that this is a terrible idea, and Cassandra is not meant to be used in this way. I am issuing these queries and timing them to prove to others how inefficient Cassandra is for our use case compared to other solutions.
Your query is probably failing because of a read timeout (the timeout on waiting to read data). You could try updating cassandra.yaml with a larger read timeout, e.g. read_request_timeout_in_ms: 200000 (for 200 s), to get an output rather than an error. However, if you're trying to prove the inefficiency of Cassandra for your use case, this error seems like a pretty good way to do it.

Cassandra Table Count Timeout

I am trying to get the number of rows in a table, but Cassandra returns a timeout for the query select count(*) from events;.
I think my table is too big; even when I give a timeout value for my query with cqlsh --request-timeout=200000, it still times out.
The table size is 1.3 TB. Is there any way to learn how many rows are in this table?
Do not use count(*) to get the number of rows. You can use the following link and download the jar file to get the count:
https://github.com/brianmhess/cassandra-count
Another solution that can help you find the total number of rows is paginating the result.
Please refer to the doc below:
https://docs.datastax.com/en/developer/java-driver/3.2/manual/paging/
Note: you can also try ALLOW FILTERING for development! But its use should be avoided, as it is a very expensive query that might impact the performance of Cassandra.

Select All Records From Cassandra

I am trying to select all records from one Cassandra table (~10M records), which should be distributed over 4 nodes, using CQL shell, but every time I do that it limits the output to 1K records max. So my question is: is it possible to select all records at once, as I am trying to see how much time it takes Cassandra to retrieve all records?
When you write "SELECT * from CF", the CQL client will never select everything at once - that's just a bad idea for large data. Instead it will load only the first page and give you an iterator. Cassandra supports automatic query paging since version 2.0. So you should run your select-all query and iterate over the pages to load the full column family. See an example for the Python client. There is no way to load everything in one action in CQL now, and there shouldn't be.
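The iteration pattern the driver performs for you can be sketched with a stand-in pager in plain Python (the fetch_page/iterate_all names and the row counts are made up for illustration; real drivers expose this via a fetch size option and a result-set iterator):

```python
def fetch_page(table, page_state, fetch_size):
    """Stand-in for one client/server round trip: returns one page of rows
    plus the state needed to resume, or None when the table is exhausted."""
    page = table[page_state:page_state + fetch_size]
    end = page_state + len(page)
    next_state = end if end < len(table) else None
    return page, next_state

def iterate_all(table, fetch_size=1000):
    """What the driver's result iterator does under the hood: keep
    requesting pages until the server reports no more paging state."""
    state = 0
    while state is not None:
        page, state = fetch_page(table, state, fetch_size)
        yield from page

table = list(range(10_500))              # ~10.5K rows, i.e. more than one page
rows = list(iterate_all(table, fetch_size=1000))
print(len(rows))  # -> 10500
```

The client only ever holds one page (here, 1000 rows) in flight at a time, which is why cqlsh shows output in chunks rather than all at once.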
While it was already pointed out that it's a bad idea to try and load all data in cqlsh, what you're trying to do is still somewhat possible. You just need to set a limit and probably increase the timeout for cqlsh.
user@host:~$ cqlsh --request-timeout=600
This will start the shell with a request timeout of 10 minutes.
select * from some_table limit 10000000;
Please do not use this in a production environment, as it might have terrible implications for performance and cluster availability!
