[Question posted by a user on YugabyteDB Community Slack]
What happens to tables allocated to a tablespace when a tablespace is removed? As in, when drop tablespace is executed? From what I observed, a tablespace in YugabyteDB is a logical construct only taking effect when a table is created or altered with a tablespace.
You shouldn't be able to drop a tablespace when other tables are using it. It's possible that this was not fixed in an older release but it should not happen in a release where the feature is production-ready.
A tablespace is not only a logical construct; it is a construct for grouping objects with the idea of putting them in a specific place. This is true for both PostgreSQL and YugabyteDB.
Logically, if you drop the parent object of children, the children are deleted. But as stated, as long as there are objects in a tablespace, it can't be dropped.
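As a minimal sketch of that behavior in YSQL (the tablespace name, placement JSON, and table below are made-up examples):

-- hypothetical placement; adjust cloud/region/zone to your cluster
CREATE TABLESPACE demo_ts WITH (replica_placement=
  '{"num_replicas": 1, "placement_blocks": [{"cloud":"cloud1","region":"region1","zone":"zone1","min_num_replicas":1}]}');
CREATE TABLE demo_table (id int PRIMARY KEY) TABLESPACE demo_ts;
DROP TABLESPACE demo_ts;   -- expected to fail: the tablespace still contains demo_table
DROP TABLE demo_table;
DROP TABLESPACE demo_ts;   -- succeeds once the tablespace is empty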
[Question posted by a user on YugabyteDB Community Slack]
I did some reading in the docs and found fs_data_dirs. Does yugabyte-db automatically split the tablets evenly in the data dirs?
The flag fs_data_dirs sets the directory or directories for the tablet server or master where it will store data on the filesystem. This should be specified as a comma-separated list.
This data consists of logging, metadata, and the actual data. The first directory gets the logging; all of the directories get the WAL and RocksDB databases. The tablets, which are the storage foundation of a table or index, are distributed over the directories in a round-robin fashion. This indeed happens completely automatically.
It might be confusing to talk about splitting here, because when a YSQL table or secondary index is created, the create statement allows you to explicitly define how many tablets the object is split into, and those tablets are what gets distributed over the specified directories.
At the risk of adding to the confusion, there is another feature called automatic tablet splitting, enabled by the --enable_automatic_tablet_splitting flag on the masters. It makes YugabyteDB split tablets automatically when it deems them too big, and thus allows you to start with a single tablet, which will then be split automatically.
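A short sketch of both mechanisms (the directory paths, table name, and tablet count are arbitrary examples):

-- server flag: tablet data is spread round-robin over these directories
--fs_data_dirs=/mnt/d0,/mnt/d1,/mnt/d2

-- explicit pre-splitting at creation time (YSQL)
CREATE TABLE orders (id bigint, payload text, PRIMARY KEY (id HASH)) SPLIT INTO 8 TABLETS;

-- automatic tablet splitting, set on the yb-master processes
--enable_automatic_tablet_splitting=true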
[Question posted by a user on YugabyteDB Community Slack]
I want someone to remove my confusion; please correct me if I am wrong:
I have 3 nodes (3 tables)
Table structure:
ID (Hash of Account/Site/TS)
Account
Site
Timestamp
I have a pattern of accounts inside multiple sites. Should I partition by account, or is it better by site? (Is a small partition size better, or a large partition size?)
Reads happen by all three columns. Which is the better choice of partition?
YugabyteDB doesn't need declarative partitioning to distribute data (this is done by sharding on the key). Partitioning is used to isolate data (like cold data to archive later, or for geo-distribution placement).
If you define PRIMARY KEY((Account, Site, Timestamp) HASH) you will get the distribution (YugabyteDB uses a hash function on the columns to distribute rows to tablets) and no skew (because the timestamp is part of it). Sharding is automatic; you don't have to define an additional column for that: https://docs.yugabyte.com/preview/architecture/docdb-sharding/sharding/
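For illustration (the column types are assumptions, they are not given in the question), the table could be declared like this:

CREATE TABLE readings (
    account text,
    site    text,
    ts      timestamptz,
    PRIMARY KEY ((account, site, ts) HASH)
);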
[Question posted by a user on YugabyteDB Community Slack]
Wanted to check if we should do any optimization on the db side for tables that have frequent inserts and deletes, like re-indexing or vacuuming, etc.
Workload:
300,000 row inserts per hour.
Out of these, most of the time 90% will get deleted within the hour, and the remaining rows will be cleaned up at the end of the day.
YugabyteDB uses RocksDB, which is an LSM-tree implementation. Any change, including a delete, is an addition to the memtable.
Unlike PostgreSQL, where changes introduce row versions that must be cleaned up by vacuum, YugabyteDB performs this cleanup automatically.
When a memtable reaches a certain size, it is persisted as an SST file, and once the number of SST files reaches a certain threshold, a background thread reads the SST files and merges them. Any obsolete versions that are old enough (>15 minutes by default) are removed because they have expired. This principle resembles PostgreSQL vacuuming.
If you do batch DML, and especially deletions with time-series data, partitioning allows you to work on a single logical table, whilst heavy operations, such as deleting a day, can be performed by removing a daily partition instead of deleting row by row.
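A minimal sketch of daily range partitioning (table and partition names are made up); dropping or detaching a partition replaces the row-by-row delete:

CREATE TABLE events (account text, site text, ts timestamptz)
    PARTITION BY RANGE (ts);

CREATE TABLE events_2024_06_01 PARTITION OF events
    FOR VALUES FROM ('2024-06-01') TO ('2024-06-02');

-- removing the whole day is a metadata operation instead of millions of row deletes
DROP TABLE events_2024_06_01;
-- or keep the data around: ALTER TABLE events DETACH PARTITION events_2024_06_01;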
[Question posted by a user on YugabyteDB Community Slack]
Using YugabyteDB 2.11, and having a simple table like below:
create table my_table (id bigserial primary key, a text);
But while inserting data, the sequence has big gaps. E.g., while calling insert services, the 'id' column gets values such as 1, 2, 3, 101, 201, 202, 301, ...
This happens because of the --ysql_sequence_cache_minval flag: https://docs.yugabyte.com/latest/reference/configuration/yb-tserver/#ysql-sequence-cache-minval.
All sequences are cached by 100 in each yb-tserver so that they can scale without querying the 'single point of truth' each time. Sequences are part of the postgres catalog, and are therefore stored on the yb-master. A sequence with a cache of 1 (no cache) will have to reach out to the master for each value; a cache of 100 allows a backend to obtain a starting number from the master and hand out the next 100 values from there. The next session then picks its own range of 100, and so on.
Note that even if you remove the cache, there will be gaps in sequences because they are not transactional by nature.
The problem of gaps in sequences has existed for as long as database sequences have. It's a computer science problem, not a YB (or implementation) problem. If you want no gaps, you need a serial (not scalable) process. That is not worth the trade-off, especially if you adopt YB for scalability.
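For illustration, both knobs are shown below; as its name suggests, the flag acts as a minimum, so a smaller per-sequence CACHE setting only helps if the flag is lowered as well, and even with a cache of 1 gaps remain possible:

-- yb-tserver flag: minimum cache size for every sequence (requires restart)
--ysql_sequence_cache_minval=1

-- per-sequence cache for the bigserial column of my_table
ALTER SEQUENCE my_table_id_seq CACHE 1;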
Help me understand how a global temporary table works.
I have a process which is going to be threaded and requires data visible only to that thread's session. So we opted for a global temporary table.
Is it better to leave the global temporary table in place after all threads are completed, or is it wise to drop the table? Calls to this process can happen once or twice a day.
Around 4 tables are required.
Oracle Temp tables are NOT like SQL Server #temp tables. I can't see any reason to continuously drop/create the tables. The data is gone on a per-session basis anyway once the transaction or session is completed (depending on the table creation options). If you have multiple threads using the same db session, they will see each other's data. If you have one session per thread, then the data is limited in scope as you mentioned. See example here.
If you drop a global temporary table and recreate it, it does not impact database activity or server disk I/O, because global temporary tables are created in the temp tablespace, where no archive logs are generated and no checkpoint updates the tempfile headers. The purpose of a temporary table is still accurately served in this case.
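A minimal sketch of what is described above (the table name, columns, and ON COMMIT option are illustrative):

CREATE GLOBAL TEMPORARY TABLE staging_rows (
    id   number,
    data varchar2(100)
) ON COMMIT DELETE ROWS;   -- rows vanish at commit; ON COMMIT PRESERVE ROWS keeps them for the session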