Cannot read from read-replica in YugabyteDB YSQL

[Question posted by a user on YugabyteDB Community Slack]
I set these variables in SQL:
SET default_transaction_read_only = TRUE;
SET yb_read_from_followers = true;
SET yb_follower_read_staleness_ms = 30000;
Yet my read query still ran on the primary cluster nodes instead.
The behavior does not seem consistent, though. For some queries, the read directly from the table does happen on the read node (verified by looking at I/O stats on the nodes), but queries that use an index seem to go to the main cluster. I'm using yugabyte-2.8.1.0-b37.

That version does not support reading from replicas when an index is used. A fix is included in https://github.com/yugabyte/yugabyte-db/commit/2ba51212d874bb3e68a676b59f7f69635223892c and will ship in the next minor release, 2.8.2.
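Until 2.8.2 is out, you can check which plan a statement uses by running EXPLAIN in the same follower-read session. A minimal sketch, assuming a hypothetical orders table and column names:

-- Session settings for follower reads (same as in the question).
SET default_transaction_read_only = TRUE;
SET yb_read_from_followers = true;
SET yb_follower_read_staleness_ms = 30000;

-- Hypothetical table and predicate: on 2.8.1, a plan showing "Index Scan"
-- is still served by the primary cluster, while a "Seq Scan" can be
-- served from the replica.
EXPLAIN SELECT * FROM orders WHERE customer_id = 42;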

Related

Splitting tablets in YugabyteDB over multiple directories using fs_data_dirs

[Question posted by a user on YugabyteDB Community Slack]
I did some reading in the docs and found fs_data_dirs. Does YugabyteDB automatically split the tablets evenly across the data dirs?
The fs_data_dirs flag sets the directory or directories where the tablet server or master stores its data on the filesystem. It should be specified as a comma-separated list.
That data consists of logs, metadata, and table data. The first directory gets the logs; all directories get the WAL and RocksDB databases. Tablets, which are the storage foundation of a table or index, are distributed over the directories in a round-robin fashion. This indeed happens completely automatically.
It might be confusing to talk about splitting, because when a YSQL table or secondary index is created, the CREATE statement lets you explicitly define how many tablets the object is split into, and those tablets are what gets distributed over the specified directories.
At the risk of adding confusion, there is another feature called automatic tablet splitting, enabled by the --enable_automatic_tablet_splitting flag on the masters. It makes YugabyteDB split tablets automatically when it deems them too big, so you can start with a single tablet that is then split automatically. A sketch of both is shown below.
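For illustration, a minimal sketch with made-up paths and a hypothetical table; other required flags are omitted:

# Comma-separated data directories: logs go to the first one,
# WAL and RocksDB data to all of them.
./bin/yb-tserver --fs_data_dirs=/mnt/d0,/mnt/d1,/mnt/d2 ...

# Automatic tablet splitting is enabled on the masters.
./bin/yb-master --enable_automatic_tablet_splitting=true ...

# Explicit pre-splitting at creation time in YSQL, here into 8 tablets
# that are then spread round-robin over the data directories.
./bin/ysqlsh -c "CREATE TABLE t (k BIGINT PRIMARY KEY, v TEXT) SPLIT INTO 8 TABLETS;"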

Changing replication factor of an already running cluster in YugabyteDB

[Question posted by a user on YugabyteDB Community Slack]
I've been googling a bit, and it seems that changing the RF of an already running cluster is not recommended, but is it possible?
We support changing RF at the database layer, at the very least. One way this can be accomplished is the modify_placement_info command in yb-admin [see the yb-admin docs]. For example, if you have 3 nodes, one in cloud c1, region r1, zone z1, one in c2.r2.z2, and one in c3.r3.z3, you can use the command below to increase the RF to 3.
bin/yb-admin --master_addresses {list of master IPs:Ports} modify_placement_info c1.r1.z1,c2.r2.z2,c3.r3.z3 3
Another way to control RF is to use tablespaces https://docs.yugabyte.com/preview/explore/ysql-language-features/going-beyond-sql/tablespaces/
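For illustration, a hedged sketch of RF control through a tablespace, reusing the cloud/region/zone names from the example above (table and tablespace names are made up; see the linked docs for the exact replica_placement format):

CREATE TABLESPACE rf3_ts WITH (replica_placement = '{
  "num_replicas": 3,
  "placement_blocks": [
    {"cloud": "c1", "region": "r1", "zone": "z1", "min_num_replicas": 1},
    {"cloud": "c2", "region": "r2", "zone": "z2", "min_num_replicas": 1},
    {"cloud": "c3", "region": "r3", "zone": "z3", "min_num_replicas": 1}
  ]
}');

CREATE TABLE orders (id BIGINT PRIMARY KEY, info TEXT) TABLESPACE rf3_ts;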

DataStax DSBulk - Difference between query / table unload

I'm using DSBulk to extract some data from our Cassandra cluster, and I'm seeing some odd behavior. I'm trying to understand whether this is expected.
If I perform an unload by specifying keyspace and table, I see different (fewer) results than if I perform a query unload specifying select * from table.
I assumed this might be a consistency issue within the cluster, but I've tried various consistency levels, and the results are the same at all levels between ONE and ALL.
Does anyone know if this is expected behavior? The direct table extract is about 2x faster, so I would prefer that if at all possible.
You are certainly hitting DAT-295, a bug that has since been fixed. Please upgrade to the latest DSBulk version (1.2.0 at the moment; 1.3.0 is due in a few weeks).
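For reference, the two kinds of unload being compared look roughly like this (keyspace, table, and output paths are placeholders; -k, -t, -query, -url, and -cl are the usual DSBulk shortcuts):

# Table unload: keyspace + table.
dsbulk unload -k my_keyspace -t my_table -url /tmp/unload_table

# Query unload: explicit SELECT.
dsbulk unload -query "SELECT * FROM my_keyspace.my_table" -url /tmp/unload_query

# Raising the consistency level for comparison.
dsbulk unload -k my_keyspace -t my_table -cl ALL -url /tmp/unload_table_cl_all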

Ready-made tables to import into a Cassandra keyspace for testing

I want to test my cluster a little: how data replicates, etc.
I have a Cassandra cluster of 5 machines (CentOS 7 and Cassandra 3.4 on them).
Are there ready-made tables for testing anywhere that I can import into my database in some keyspace?
If so, please be kind enough to explain how to import them into a keyspace and where to get them.
You can use cassandra-stress. It is great for creating data for your own style of table, and it also comes with some default tables.
http://docs.datastax.com/en/cassandra_win/3.0/cassandra/tools/toolsCStress.html
I highly recommend it.
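For example, a couple of starter runs (node address and row counts are placeholders; the tool creates its own keyspace1.standard1 schema for the default workload):

# Write 1M rows using the default schema.
cassandra-stress write n=1000000 -rate threads=50 -node 10.0.0.1

# Read some of them back.
cassandra-stress read n=200000 -node 10.0.0.1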
Actually, there is a lot of data on the internet that can be used for testing,
e.g.
https://www.quora.com/Where-can-I-find-large-datasets-open-to-the-public
http://bigdata-madesimple.com/70-websites-to-get-large-data-repositories-for-free/
Cassandra provides the cqlsh tool for executing CQL commands, such as COPY, for importing CSV data into the database.
P.S. Pay attention to the fact that cqlsh has some restrictions related to timeouts. That is why it could be better to use a Cassandra connector to make this process more efficient.
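For illustration, a minimal COPY sketch to run inside cqlsh (keyspace, table, columns, and file path are placeholders; the target table must already exist):

COPY my_keyspace.my_table (id, name, value) FROM '/path/to/data.csv' WITH HEADER = TRUE;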

What is the default consistency level in spring-data-cassandra?

If I don't set any read/write consistency level at all in my spring-data-cassandra project, what will be my consistency level for reads? What about writes?
(I asked this question here, but the Google Group is now locked)
The default consistency level used by the driver, if not set, is ONE. Since spring-data-cassandra is, as they claim:
Based on the latest DataStax Enterprise CQL Java Driver
the default CL is ONE.
See https://jira.spring.io/browse/DATACASS-145 & https://jira.spring.io/browse/DATACASS-146. I am working on this in branch https://github.com/spring-projects/spring-data-cassandra/tree/DATACASS-145. When DATACASS-145 is done, I can move on to DATACASS-146.
I don't think it's going to make it into 1.1.0; 1.1.1 is likely.
