[Question posted by a user on YugabyteDB Community Slack]
We are using the Go client driver (the database/sql package, pkg.go.dev) and YugabyteDB 2.8.3.
There is a variable ysql_num_shards_per_tserver configured with every TSERVER.
We would like to determine hash ranges of each tablet based on:
tablets= ysql_num_shards_per_tserver * num_tservers
But, before running the SELECT query, we need to get the number of tablets for a table.
Example:
For a table with 48 tablets,
first hash range is [0, 1365)
second hash range is [1365, 2730)
etc.
What is the Go API in the client driver to retrieve the value of ysql_num_shards_per_tserver?
Currently, you can’t query the ysql_num_shards_per_tserver gflag from the client layer. However, since it is fixed at configuration time when yb-tserver starts, you can hardcode the same value in your application’s configuration.
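As a rough illustration of the arithmetic in the question, here is a minimal Python sketch that derives the tablet count from the two hardcoded values and splits the 16-bit hash space into ranges. The function name and the step of 65535 // num_tablets are assumptions inferred from the example ranges in the question and from the tablet boundaries that yb-admin prints for a 4-tablet table; the server's actual splitting logic may differ, and a table created with an explicit SPLIT INTO clause does not follow the gflag at all.

```python
HASH_MAX = 0xFFFF  # hash codes are 16-bit values, 0..65535


def tablet_hash_ranges(shards_per_tserver, num_tservers):
    """Return [start, end) hash ranges, one per tablet (hypothetical helper)."""
    num_tablets = shards_per_tserver * num_tservers
    # Assumption: boundaries are multiples of 65535 // num_tablets.
    step = HASH_MAX // num_tablets
    ranges = []
    for i in range(num_tablets):
        start = i * step
        # The last tablet absorbs the remainder of the hash space.
        end = (i + 1) * step if i < num_tablets - 1 else HASH_MAX + 1
        ranges.append((start, end))
    return ranges


if __name__ == "__main__":
    # Example: 8 shards per tserver on 6 tservers gives 48 tablets.
    for start, end in tablet_hash_ranges(8, 6)[:2]:
        print(f"[{start}, {end})")
```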
Alternatively, the tablets and their (encoded) ranges can be listed with yb-admin:
yugabyte=# create table t (id int primary key) split into 4 tablets;
yugabyte=# insert into t values (1);
yugabyte=# insert into t values (2);
yugabyte=# insert into t values (3);
yugabyte=# insert into t values (4);
[vagrant@yb-1 ~]$ yb-admin list_tablets ysql.yugabyte t
Tablet-UUID Range Leader-IP Leader-UUID
d3ab946a8fdb438ea7686470e55eee53 partition_key_start: "" partition_key_end: "?\377" yb-1.local:9100 7922bc6371ef4d2d8082a2882a1dfafa
d1b63e3ac36d46128d028572135ad621 partition_key_start: "?\377" partition_key_end: "\177\376" yb-1.local:9100 7922bc6371ef4d2d8082a2882a1dfafa
8f1229413df84560bd531d8ee8ff88e0 partition_key_start: "\177\376" partition_key_end: "\277\375" yb-1.local:9100 7922bc6371ef4d2d8082a2882a1dfafa
457d357288134020b1e5c297be86a5ca partition_key_start: "\277\375" partition_key_end: "" yb-1.local:9100 7922bc6371ef4d2d8082a2882a1dfafa
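The partition_key_start and partition_key_end values above are escaped byte strings. The following Python sketch shows one way to turn them back into hash values. It is a hypothetical helper, not part of any YugabyteDB tool, and it assumes the key consists only of literal printable characters and three-digit octal escapes, as in the output shown here; other escape forms are not handled.

```python
import re


def decode_partition_key(escaped):
    """Decode an escaped key such as r'?\\377' into an integer hash value.

    Returns None for the empty string, which marks an open-ended bound.
    """
    if escaped == "":
        return None
    # Replace each three-digit octal escape (e.g. \377) with its byte value.
    raw = re.sub(r"\\([0-7]{3})", lambda m: chr(int(m.group(1), 8)), escaped)
    # Remaining characters are literal bytes; read the result as big-endian.
    return int.from_bytes(raw.encode("latin-1"), "big")


if __name__ == "__main__":
    for key in [r"?\377", r"\177\376", r"\277\375"]:
        print(key, "->", decode_partition_key(key))
```

For the four tablets listed, the three inner boundaries decode to 16383, 32766 and 49149.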
[Question posted by a user on YugabyteDB Community Slack]
I am running YugabyteDB 2.12 single node and would like to know if it is possible to create a temporary table such that it is automatically dropped upon committing the transaction in which it was created.
In “vanilla” PostgreSQL it is possible to specify ON COMMIT DROP option when creating a temporary table. In the YugabyteDB documentation for CREATE TABLE no such option is mentioned, however, when I tried it from ysqlsh it did not complain about the syntax. Here is what I tried from within ysqlsh:
yugabyte=# begin;
BEGIN
yugabyte=# create temp table foo (x int) on commit drop;
CREATE TABLE
yugabyte=# insert into foo (x) values (1);
INSERT 0 1
yugabyte=# select * from foo;
x
---
1
(1 row)
yugabyte=# commit;
ERROR: Illegal state: Transaction for catalog table write operation 'pg_type' not found
The CREATE TABLE documentation for YugabyteDB mentions the following for temporary tables:
Temporary tables are only visible in the current client session or transaction in which they are created and are automatically dropped at the end of the session or transaction.
When I create a temporary table (without the ON COMMIT DROP option), indeed the table is automatically dropped at the end of the session, but it is not automatically dropped upon commit of the transaction. Is there any way that this can be accomplished (apart from manually dropping the table just before the transaction is committed)?
Your input is greatly appreciated.
Thank you
See these two GitHub issues:
#12221: The create table doc section doesn’t mention the ON COMMIT clause for a temp table
and
#7926 CREATE TEMP … ON COMMIT DROP writes data into catalog table outside the DDL transaction
You cannot (yet, through YB-2.13.0.1) use the ON COMMIT DROP feature. But why not use ON COMMIT DELETE ROWS and simply let the temp table remain in place until the session ends?
Saying this raises a question: how do you create the temp table in the first place? Your stated goal implies that you’d need to create it before every use. But why? You could, instead, have dedicated initialization code to create the ON COMMIT DELETE ROWS temp table that you call from the client for this purpose at (but only at) the start of a session.
If you don’t want to have this, then (back to a variant of your present thinking) you could just do this before every intended use of the table:
drop table if exists t;
create temp table t(k int) on commit delete rows;
After all, how else (without dedicated initialization code) would you know whether or not the temp table exists yet?
If you prefer, you could use this logic instead:
do $body$
begin
if not
(
select exists
(
select 1 from information_schema.tables
where
table_type='LOCAL TEMPORARY' and
table_name='t'
)
)
then
create temp table t(k int) on commit delete rows;
end if;
end;
$body$;
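If the check is driven from the client instead, the same logic could look roughly like the Python sketch below. It assumes a DB-API cursor that uses the %s parameter style (as psycopg2 does); the function name ensure_temp_table is hypothetical, and the table definition is the one from the example above.

```python
def ensure_temp_table(cur, name="t"):
    """Create the temp table if this session does not have it yet.

    `cur` is assumed to be a DB-API cursor (for example from psycopg2).
    Returns True if the table was created, False if it already existed.
    """
    if not name.isidentifier():
        # The name is interpolated into DDL below, so only accept plain identifiers.
        raise ValueError(f"unsafe table name: {name!r}")
    cur.execute(
        "select exists (select 1 from information_schema.tables "
        "where table_type = 'LOCAL TEMPORARY' and table_name = %s)",
        (name,),
    )
    (exists,) = cur.fetchone()
    if not exists:
        cur.execute(f"create temp table {name}(k int) on commit delete rows")
    return not exists
```

Calling this once at the start of each session keeps the creation logic in one place, which is the design the answer recommends.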
I'm trying to create a schema that will enable me to access rows with only part of the row_key.
For example the key is of the form user_id:machine_os:machine_arch
An example of a row key: 12242:"windows2000":"x86"
From the documentation I could not understand whether this will enable me to query all rows that have userid=12242 or query all rows that have "windows2000"
Is there any feasible way to achieve this ?
Thanks,
Yadid
Alright, here is what is happening: based on your schema, you are effectively creating a column family with a composite primary key or a composite rowkey. What this means is, you will need to restrict each component of the composite key except the last one with a strict equality relation. The last component of the composite key can use inequality and the IN relation, but not the 1st and 2nd components.
Additionally, you must specify all three parts if you want to utilize any kind of filtering. This is necessary because without all parts of the partition key, the coordinator node will have no idea on which node in the cluster the data exists (remember, Cassandra uses the partition key to determine replicas and data placement).
Effectively, this means you can't do any of these:
select * from datacf where user_id = 100012; -- missing 2nd and 3rd key components
select * from datacf where user_id = 100012 and machine_arch = 'x86'; -- missing 2nd key component
select * from datacf where machine_arch = 'x86'; -- you have to specify the 1st and 2nd
select * from datacf where user_id = 100012 and machine_arch in ('x86', 'x64'); -- nope, the 2nd is still missing
However, you will be able to run queries like this:
select * from datacf where user_id = 100012 and machine_arch = 'x86'
and machine_os = 'windows2000'; -- yes! all 3 parts are there
select * from datacf where user_id = 100012 and machine_os = 'windows2000'
and machine_arch in ('x86', 'x64'); -- the last part of the key can use 'IN' or inequality relations
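The rule described in this answer can be written down as a small check. The Python sketch below only models the rule as stated here (every key component present, equality on all but the last); it is not Cassandra's actual query validation, and the names KEY_COMPONENTS and query_allowed are made up for illustration.

```python
# Key components in declaration order, as in the question.
KEY_COMPONENTS = ("user_id", "machine_os", "machine_arch")


def query_allowed(restrictions):
    """restrictions maps a column name to the operator used on it, e.g. '=' or 'in'."""
    *leading, last = KEY_COMPONENTS
    # Every component except the last must be restricted with strict equality.
    if any(restrictions.get(col) != "=" for col in leading):
        return False
    # The last component must be present, but may use IN or an inequality.
    return last in restrictions


if __name__ == "__main__":
    print(query_allowed({"user_id": "="}))  # False
    print(query_allowed({"user_id": "=", "machine_os": "=", "machine_arch": "in"}))  # True
```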
To answer your initial question: with your existing data model, you will not be able to query all rows with userid = 12242, nor all rows that have "windows2000" as the machine_os.
If you can tell me exactly what kind of query you will be running, I can probably help in trying to design the table accordingly. Cassandra data models usually work better when looked at from the data retrieval perspective. Long story short: use only user_id as your primary key and use secondary indexes on the other columns you want to query on.
We have a CQL table that looks something like this:
CREATE table data (
occurday text,
seqnumber int,
occurtimems bigint,
unique bigint,
fields map<text, text>,
primary key ((occurday, seqnumber), occurtimems, unique)
)
I can query this table from cqlsh like this:
select * from data where seqnumber = 10 AND occurday = '2013-10-01';
This query works and returns the expected data.
If I execute this query as part of a LOAD from within Pig, however, things don't work.
-- Need to URL encode the query
data = LOAD 'cql://ks/data?where_clause=seqnumber%3D10%20AND%20occurday%3D%272013-10-01%27' USING CqlStorage();
gives
InvalidRequestException(why:seqnumber cannot be restricted by more than one relation if it includes an Equal)
at org.apache.cassandra.thrift.Cassandra$prepare_cql3_query_result.read(Cassandra.java:39567)
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
at org.apache.cassandra.thrift.Cassandra$Client.recv_prepare_cql3_query(Cassandra.java:1625)
at org.apache.cassandra.thrift.Cassandra$Client.prepare_cql3_query(Cassandra.java:1611)
at org.apache.cassandra.hadoop.cql3.CqlPagingRecordReader$RowIterator.prepareQuery(CqlPagingRecordReader.java:591)
at org.apache.cassandra.hadoop.cql3.CqlPagingRecordReader$RowIterator.executeQuery(CqlPagingRecordReader.java:621)
Shouldn't these behave the same? Why is the version through Pig failing where the straight cqlsh command works?
Hadoop is using CqlPagingRecordReader to try to load your data. This is leading to queries that are not identical to what you have entered. The paging record reader is trying to obtain small slices of Cassandra data at a time to avoid timeouts.
This means that your query is executed as
SELECT * FROM "data" WHERE token("occurday","seqnumber") > ? AND
token("occurday","seqnumber") <= ? AND occurday='A Great Day'
AND seqnumber=1 LIMIT 1000 ALLOW FILTERING
And this is why you are seeing your repeated key error. I'll submit a bug to the Cassandra Project.
Jira:
https://issues.apache.org/jira/browse/CASSANDRA-6151
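As a side note on the "Need to URL encode the query" comment in the question, the encoded where_clause can be produced with the Python standard library instead of by hand. This is a minimal sketch; the helper name cql_storage_url is hypothetical and simply assembles the same cql:// URL shape used in the LOAD statement.

```python
from urllib.parse import quote


def cql_storage_url(keyspace, table, where_clause):
    """Build a cql:// URL with a percent-encoded where_clause parameter."""
    # safe='' forces '=' and quotes to be encoded as well as spaces.
    encoded = quote(where_clause, safe="")
    return f"cql://{keyspace}/{table}?where_clause={encoded}"


if __name__ == "__main__":
    print(cql_storage_url("ks", "data", "seqnumber=10 AND occurday='2013-10-01'"))
```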
Creating tables fails with inet, multiple primary key columns, and collections, even though the syntax is correct.
The error messages for the primary key don't make sense (unmatched parens). After removing that, I learned that inet won't work except in some cases.
Anything I'm doing wrong, or not understanding about using CQL3 (interfaces or syntax)?
CREATE TABLE session (
'user_id' bigint,
'admin_id' bigint,
'session_id' varchar,
'cache' text ,
'created' timestamp ,
'hits' list<timestamp>,
'ip' inet ,
PRIMARY KEY ( 'session_id' , 'user_id' )
);
The following also fails
CREATE TABLE 'session' (
'user_id' bigint,
'session_id' varchar,
PRIMARY KEY ( 'session_id' , 'user_id' )
);
This works
CREATE TABLE 'session' (
'user_id' bigint,
'session_id' varchar PRIMARY KEY
);
The clue
>help TYPES
CQL types recognized by this version of cqlsh:
ascii
bigint
blob
boolean
counter
decimal
double
float
int
text
timestamp
uuid
varchar
varint
DSE 3.0.x
[EDIT] - turns out DSE has Cassandra 1.1.x installed.
TL;DR: Collections (part of CQL3) not available yet in DSE 3.0.x
Also worth noting, but unrelated to my issue:
Even in Datastax community edition - one needs to activate CQL3. The documentation says it should be activated by default in cqlsh
http://www.datastax.com/docs/1.2/cql_cli/using_cql
"Activating CQL 3
You activate the CQL mode in one of these ways:
Use the DataStax Java Driver to activate CQL through the native/binary protocol.
Start cqlsh, a Python-based command-line client.
Use the set_cql_version Thrift method.
Specify the desired CQL mode in the connect() call to the Python driver:
connection = cql.connect('localhost:9160', cql_version='3.0')"
The documentation there was also incorrect; it should be:
con = cql.connect('localhost', cql_version='3.0.0')
Also, Enterprise Opcenter doesn't yet support CQL 3 in DSE.
cqlsh --cqlversion=3
Is there an easy way to check if table (column family) is defined in Cassandra using CQL (or API perhaps, using com.datastax.driver)?
Right now I am leaning towards executing SELECT 1 FROM table and checking for exception but maybe there is a better way?
As of 1.1 you should be able to query the system keyspace, schema_columnfamilies column family. If you know which keyspace you want to check, this CQL should list all column families in a keyspace:
SELECT columnfamily_name
FROM system.schema_columnfamilies WHERE keyspace_name='myKeyspaceName';
The report describing this functionality is here: https://issues.apache.org/jira/browse/CASSANDRA-2477
Although, they do note that some of the system column names have changed between 1.1 and 1.2. So you might have to mess around with it a little to get your desired results.
Edit 20160523 - Cassandra 3.x Update:
Note that for Cassandra 3.0 and up, you'll need to make a few adjustments to the above query:
SELECT table_name
FROM system_schema.tables WHERE keyspace_name='myKeyspaceName';
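From application code, the Cassandra 3.x query can be narrowed to a single table and wrapped in a helper. The sketch below is in Python and assumes a session object that behaves like the DataStax Python driver's Session (an execute(query, parameters) method whose result has one()); the function name table_exists is hypothetical.

```python
def table_exists(session, keyspace, table):
    """Return True if system_schema.tables has a row for keyspace.table.

    Unquoted identifiers are stored in lower case, so pass the names
    the way they appear in the schema tables.
    """
    row = session.execute(
        "SELECT table_name FROM system_schema.tables "
        "WHERE keyspace_name = %s AND table_name = %s",
        (keyspace, table),
    ).one()
    return row is not None
```

This avoids relying on an exception from a probing SELECT, which is the approach the question was trying to get away from.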
The Java driver (since you mentioned it in your question) also maintains a local representation of the schema.
Driver 3.x and below:
KeyspaceMetadata ks = cluster.getMetadata().getKeyspace("myKeyspace");
TableMetadata table = ks.getTable("myTable");
boolean tableExists = (table != null);
Driver 4.x and above:
Metadata metadata = session.getMetadata();
boolean tableExists =
metadata.getKeyspace("myKeyspace")
.flatMap(ks -> ks.getTable("myTable"))
.isPresent();
I just needed to manually check for the existence of a table using cqlsh.
Possibly useful general info.
describe keyspace_name.table_name
If it doesn't exist you'll get 'table_name' not found in keyspace 'keyspace'
If it does exist you'll get a description of the table.
For the .NET driver CassandraCSharpDriver version 3.17.1 the following code creates a table if it doesn't exist yet:
var ks = _cassandraSession.Cluster.Metadata.GetKeyspace(keyspaceName);
var tableNames = ks.GetTablesNames();
if(!tableNames.Contains(tableName.ToLowerInvariant()))
{
var stmt = new SimpleStatement($"CREATE TABLE {tableName} (id text PRIMARY KEY, name text, price decimal, volume int, time timestamp)");
_cassandraSession.Execute(stmt);
}
You will need to adapt the list of table columns to your needs. This can also be awaited by using await _cassandraSession.ExecuteAsync(stmt).ConfigureAwait(false) in an async method.
Also, I want to mention that I'm using Cassandra version 4.0.1.