[Question posted by a user on YugabyteDB Community Slack]
Is there any way we can check the disk size consumed by each database and table? I tried the queries below, but no luck:
- \l+
- select pg_database_size('databaseName');
- select t1.datname AS db_name,
pg_size_pretty(pg_database_size(t1.datname)) as db_size
from pg_database t1
order by pg_database_size(t1.datname) desc;
YugabyteDB stores table data in its own storage layer, DocDB (built on RocksDB), so the PostgreSQL catalog tables will not have any sizing details. You can access individual table-level details directly in the YugabyteDB Tablet Server UI at http://<ipaddress>:9000/tables, which shows an on-disk space column for each table.
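If you want to pull that page programmatically rather than open it in a browser, a minimal TypeScript sketch could look like the following (assuming Node.js 18+ for the built-in fetch; the address is a placeholder):

// Minimal sketch: fetch the Tablet Server "/tables" page, which lists each
// table together with its on-disk space.
const TSERVER_UI = "http://127.0.0.1:9000"; // replace with a real tablet server address

async function dumpTablesPage(): Promise<void> {
  const res = await fetch(`${TSERVER_UI}/tables`);
  if (!res.ok) throw new Error(`Unexpected HTTP status ${res.status}`);
  // The endpoint serves HTML; print it (or parse out the on-disk space column).
  console.log(await res.text());
}

dumpTablesPage().catch(console.error);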
[Question posted by a user on YugabyteDB Community Slack]
When I'm changing my placement info by adding a new datacenter, my data should start moving to satisfy my "rules". How can I track that movement?
Should get_load_move_completion give me some info?
The get_load_move_completion command in yb-admin only tracks load movement when a node is decommissioned (blacklisted). Essentially, it returns 1 - (replicas still present on the blacklisted nodes) / (initial replica count on the blacklisted nodes), expressed as a percentage.
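In other words, the reported value works out to something like the following (a hypothetical helper just to make the arithmetic explicit; the real computation happens inside yb-master):

// Hypothetical illustration of the get_load_move_completion percentage.
// initialReplicas = replica count on the blacklisted nodes when the blacklist
// was applied; remainingReplicas = replicas still sitting on those nodes now.
function loadMoveCompletionPercent(initialReplicas: number, remainingReplicas: number): number {
  if (initialReplicas === 0) return 100; // nothing to move
  return (1 - remainingReplicas / initialReplicas) * 100;
}

// e.g. 120 replicas to move, 30 still on the blacklisted nodes -> 75
console.log(loadMoveCompletionPercent(120, 30));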
The http://<yb-master-ip>:7000/tasks endpoint in the master admin UI is where you would be able to see all the adds and removals.
[Question posted by a user on YugabyteDB Community Slack]
I think I have still misunderstood something about read replicas. It seems that if I run a query that does a sequential scan on a read-only replica, the actual read is done on the main cluster, even though the read replica seems to have the whole dataset.
When I ran a simple select count(*) query on the read replica, I expected it to do a local read from its own data.
However, as can be seen from the picture, the main nodes actually started doing the reads while the read replica waited almost idle until it got the response from the main node. Where did I go wrong? (using YugabyteDB 2.6)
Note that reading from followers is only available starting in 2.11, which was recently released: https://blog.yugabyte.com/announcing-yugabytedb-2-11/
Another thing to remember: even on 2.11, the default behavior, no matter which node you connect to, is to redirect reads to the corresponding leader tablets. You'll have to enable reading from followers in the current session, like below:
SET yb_read_from_followers = true;
START TRANSACTION READ ONLY;
SELECT * from t WHERE k='k1'; --> follower read
k | v
----+----
k1 | v1
(1 row)
COMMIT;
This will let YugabyteDB know that it is ok to read from a follower tablet (that is in a read-replica cluster).
Also, even in the read-replica cluster, the node receiving the request might not have the data. For example, you could have a 5-node read-replica cluster with RF=2 on that cluster, so the node that initially receives the request might not hold the data the request is interested in. Where the request is routed depends on the session/statement setting: with the default, reads go to the leader tablets; with read-from-followers enabled, the request is routed to a follower tablet in the same region.
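For completeness, here is a rough application-side sketch of the same session using the node-postgres (pg) client; the connection string is a placeholder, and the SQL statements are the ones shown above:

import { Client } from "pg"; // YSQL speaks the PostgreSQL wire protocol

async function followerRead(): Promise<void> {
  // Placeholder connection string -- point it at a node in the read-replica cluster.
  const client = new Client({ connectionString: "postgresql://yugabyte@replica-node:5433/yugabyte" });
  await client.connect();
  try {
    await client.query("SET yb_read_from_followers = true"); // allow reads from follower tablets
    await client.query("START TRANSACTION READ ONLY");       // follower reads require a read-only transaction
    const { rows } = await client.query("SELECT * FROM t WHERE k = 'k1'");
    console.log(rows); // served by a follower tablet in the same region
    await client.query("COMMIT");
  } finally {
    await client.end();
  }
}

followerRead().catch(console.error);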
I have a monolith application backend, written in Node.js with Sequelize and PostgreSQL as the database, which serves millions of requests a day. Since ours is a tenant-based application, I am planning to shard my database so that I have x thousand tenants in one shard, x in another shard, etc. I use AWS RDS (PostgreSQL) as the database server.
On the infrastructure side it's pretty much straightforward to create a new shard: just creating a new RDS database server with the same configuration as my primary database would be sufficient.
The main problem I am facing now is how to manage the shards.
For example: I have the following requirement -
All my queries of tenant_id < 10000 should go to meta_database
All my queries of tenant_id > 10000 and < 30000 should go to shard_1
All my queries of tenant_id > 30000 and < 60000 should go to shard_2
I tried with the following tools:
Sequelize -
It seems nearly impossible to do this with Sequelize, since it still does not support sharding. I can create multiple Sequelize connections for all the shards and map each tenant_id to a particular shard manually in code, but that requires getting the models each time by passing in the tenant's tenant_id, which is not a clean or readable approach.
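For illustration only, the manual mapping described above might look roughly like this in TypeScript (the connection URIs are placeholders; the tenant_id ranges come from the requirement listed earlier):

import { Sequelize } from "sequelize";

// One Sequelize connection per shard (URIs are placeholders).
const metaDb = new Sequelize("postgres://user:pass@meta-db:5432/app");
const shard1 = new Sequelize("postgres://user:pass@shard-1:5432/app");
const shard2 = new Sequelize("postgres://user:pass@shard-2:5432/app");

// Route a tenant to its shard using the ranges from the requirement above.
function dbForTenant(tenantId: number): Sequelize {
  if (tenantId < 10000) return metaDb;
  if (tenantId < 30000) return shard1;
  if (tenantId < 60000) return shard2;
  throw new Error(`No shard configured for tenant ${tenantId}`);
}

// Usage: resolve the connection per request, then define/fetch models on it.
// const db = dbForTenant(tenantId);
// const User = db.define("user", { /* attributes */ });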
pg_bouncer_rr -
I tried pg_bouncer_rr and dropped it, since I found that having logic at the query-routing level to extract the tenant_id from the query and check its value using a regex is not a good approach and can also cause some unexpected errors.
Pg_fdw - Foreign Data Wrapper
I was able to create an FDW server and route my queries to the foreign server by following a few articles. But the problem is that it still inserts all the records into my primary meta database tables. It seems I was only able to route reads through the data wrappers, while the data still resides on the coordinator database. In addition, I can partition my table and keep a few partitions on the foreign servers, but when a record is inserted it still gets written to the main database table and only then gets reflected in my foreign tables. How can I have my foreign server handle all my write and read calls completely independently of the meta database (the meta database should only do the routing and should not persist any data)?
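For what it's worth, with PostgreSQL 11+ declarative partitioning a foreign table can itself be a partition, and an INSERT on the parent is routed straight to that partition, so the rows end up on the shard rather than on the coordinator. A rough sketch of that setup, driven through node-postgres to match the rest of the stack; the server, table, and column names are hypothetical, and it assumes the postgres_fdw servers and user mappings already exist:

import { Client } from "pg";

// Placeholder coordinator connection; the DDL below is plain PostgreSQL 11+.
const coordinator = new Client({ connectionString: "postgresql://user:pass@meta-db:5432/app" });

async function setUpForeignPartition(): Promise<void> {
  await coordinator.connect();
  try {
    // Parent table lives on the coordinator but holds no rows itself.
    await coordinator.query(`
      CREATE TABLE IF NOT EXISTS tenants_data (
        tenant_id int NOT NULL,
        payload   jsonb
      ) PARTITION BY RANGE (tenant_id)`);

    // A foreign-table partition: rows in this range are stored on shard_1, and
    // INSERTs on tenants_data are routed there by the planner. Assumes
    // shard_1_server was created earlier with CREATE SERVER ... FOREIGN DATA
    // WRAPPER postgres_fdw, and that the backing table exists on shard_1.
    await coordinator.query(`
      CREATE FOREIGN TABLE IF NOT EXISTS tenants_data_shard_1
        PARTITION OF tenants_data FOR VALUES FROM (10000) TO (30000)
        SERVER shard_1_server`);
  } finally {
    await coordinator.end();
  }
}

setUpForeignPartition().catch(console.error);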
pl/proxy -
I read a few articles on pl/proxy; it requires me to write a function for every read and insert. I guess it's more useful for managing table partitions than for managing shards.
I am not sure how to proceed with the tenant-based sharding. If anyone has achieved sharding with Node.js, Postgres, and Sequelize, kindly help!
I am even okay with having a proxy in front of the database that takes care of the query routing based on tenant_id. I tried Citus for this purpose, to use as a proxy, but it recently dropped its support for AWS.
From the documentation it is clear that in Cassandra 4.0 virtual tables are read-only and no writes are allowed.
Currently there are two virtual keyspaces available, i.e. system_views and system_virtual_schema, which together contain 17 tables.
These contain data like clients, caches, settings, etc.
Where exactly does the data in these vtables come from?
Here are all vtables: https://github.com/apache/cassandra/tree/64b338cbbce6bba70bda696250f3ccf4931b2808/src/java/org/apache/cassandra/db/virtual
PS: I have gone through cassandra.yaml
Reference: https://cassandra.apache.org/doc/latest/new/virtualtables.html
The virtual tables expose metrics and metadata that were previously only available via JMX but are now also available via CQL.
For example, the system_views.clients table tracks metadata on client connections, including (but not limited to):
the remote IP address of the client
logged in user (if auth is enabled)
protocol version
driver name & version
whether SSL is used, etc
This data is available via JMX and nodetool clientstats, and is now retrievable via CQL (I wrote about this in https://community.datastax.com/questions/6113/). Cheers!
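For example, a quick sketch of reading that table over CQL from Node.js (assuming the DataStax cassandra-driver package and a local contact point):

import { Client } from "cassandra-driver";

// Placeholder contact point / data center; adjust for your cluster.
const client = new Client({
  contactPoints: ["127.0.0.1"],
  localDataCenter: "datacenter1",
});

async function showClients(): Promise<void> {
  await client.connect();
  // system_views.clients is one of the Cassandra 4.0 virtual tables.
  const result = await client.execute("SELECT * FROM system_views.clients");
  for (const row of result.rows) {
    console.log(row);
  }
  await client.shutdown();
}

showClients().catch(console.error);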
When using the token-aware policy as the load balancing policy in Cassandra, are all queries automatically sent to the correct node (the one which contains the replica, e.g. select * from Table where partitionkey = something will automatically compute the hash and go to the correct replica), or do I have to use the token() function with all my queries?
That is correct. The TokenAwarePolicy will allow the driver to prefer a replica for the given partition key as the coordinator for the request if possible.
Additional information about load balancing with the Java driver is available on the LoadBalancingPolicy API page.
Specifically, the API documentation for TokenAwarePolicy is available here.
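As an illustration of the same idea using the DataStax Node.js driver rather than the Java driver the links above refer to (an assumption made only for consistency with the other sketches here; contact point, keyspace, and table names are placeholders):

import { Client, policies } from "cassandra-driver";

// Token-aware policy wrapping a DC-aware child policy, mirroring the Java setup.
const loadBalancing = new policies.loadBalancing.TokenAwarePolicy(
  new policies.loadBalancing.DCAwareRoundRobinPolicy()
);

const client = new Client({
  contactPoints: ["127.0.0.1"],
  localDataCenter: "datacenter1",
  policies: { loadBalancing },
});

// No token() needed in the query: with a prepared statement the driver knows
// the partition key, hashes it, and prefers a replica for it as coordinator.
async function run(): Promise<void> {
  await client.connect();
  const rs = await client.execute(
    "SELECT * FROM my_keyspace.my_table WHERE partition_key = ?",
    ["something"],
    { prepare: true }
  );
  console.log(rs.rows);
  await client.shutdown();
}

run().catch(console.error);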