How can I delete the data in TiKV directly? - tidb

I used tikvTxn to write key-value data into TiKV directly, bypassing TiDB:
db, err := driver.Open("tikv://127.0.0.1:2379?disableGC=true")
if err != nil {
    panic(err)
}
txn, _ := db.Begin()
txn.Set(key, value)
txn.Commit(context.Background())
...
I can't clean the data in TiKV by dropping tables in TiDB.
How can I delete all the data that I inserted into TiKV?

To delete the data inserted via the txnkv API, you can:
db, _ := driver.Open("tikv://127.0.0.1:2379?disableGC=false")
txn, _ := db.Begin()
txn.Delete(key)
txn.Commit(context.Background())
...
txnkv is based on MVCC, so Delete will not reclaim disk space. Instead, it writes a special version that marks the key as deleted.
If there is a TiDB server in your cluster with GC enabled, the key will be physically deleted automatically after the GC interval.
Otherwise, you need to run a GC job yourself to delete it from disk:
import "github.com/pingcap/tidb/store/tikv/gcworker"
gcworker.RunGCJob(ctx context.Context, s tikv.Storage, safePoint uint64, identifier string, concurrency int)

Related

Delete lots of rows from Cassandra Table

I have a table Foo with 5 columns: A, B, C, D, E.
The partition key is A.
The clustering key is B, C, D.
What I'd like to do: delete rows with specific partition keys.
Why I'd like to do that: to reclaim the storage.
How I'd like to do that: using datastax/python-driver.
I'd like these rows deleted with minimal disruption or risk. I'm worried about the effect on read/write requests during deletion. And I want to reclaim the storage ASAP, but I don't know how to deal with tombstones.
Deletes in Cassandra add data to the tables instead of modifying existing data. The disk space is reclaimed during the compaction process, which creates new SSTables from the existing ones and removes outdated or deleted data, so you'll need enough free disk space to compact the SSTables. You can tune compaction properties, such as min_threshold, to make compaction happen sooner, or even run nodetool compact -s after the deletion to force a rewrite of the whole table.
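As a sketch of that tuning (the keyspace and table names are hypothetical; nodetool compact itself runs from the shell, not the driver), min_threshold can be lowered with an ALTER TABLE issued from Python:
from cassandra.cluster import Cluster

cluster = Cluster(['127.0.0.1'])  # assumed contact point
session = cluster.connect()

# Lower min_threshold so size-tiered compaction is triggered with
# fewer SSTables (2 is the smallest value Cassandra accepts).
session.execute(
    "ALTER TABLE my_keyspace.foo WITH compaction = "
    "{'class': 'SizeTieredCompactionStrategy', 'min_threshold': 2}"
)

cluster.shutdown()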
Deleting the data from Python is straightforward: just prepare a query like this
DELETE FROM <table> WHERE pk = ?;
and then iterate over your list of keys to delete, calling session.execute(prepared_statement, [key_to_delete]).
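A minimal sketch of that loop with the cassandra-driver, using the Foo table from the question (partition key A); the contact point, keyspace name, and key values are assumptions:
from cassandra.cluster import Cluster

cluster = Cluster(['127.0.0.1'])  # assumed contact point
session = cluster.connect('my_keyspace')  # assumed keyspace

# Prepare once, execute many times: the server parses the statement
# only once, which keeps the delete loop cheap.
prepared = session.prepare('DELETE FROM foo WHERE a = ?')

keys_to_delete = [1, 2, 3]  # hypothetical partition key values
for key in keys_to_delete:
    session.execute(prepared, [key])

cluster.shutdown()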

attach an in-memory database to a database on disk

Good evening. I would like to attach a database created in memory to a database created and saved on disk. I managed the first part (creating the DB in memory), but I am having difficulties attaching it to the DB on disk.
import sqlite3
# set up a database in memory
c = sqlite3.connect(':memory:')
c.execute('CREATE TABLE my_table (id int, name text);')
c.execute("INSERT INTO my_table VALUES (1, 'bruce'), (2, 'wayne'), (3, 'bat');")
c.commit()
I tried the code below, but it doesn't work:
ATTACH DATABASE 'file::memory:?cache=shared' AS db_disk
In Python, there is no direct way to copy the contents of an in-memory database to disk (Python 3.7 later added sqlite3.Connection.backup() for this).
But SQLite forces writes to disk only when a transaction commits, so you can get the same speed by using a disk database and writing everything in a single transaction, i.e., by not calling commit() until you are finished.
(But you might want to increase the cache size.)
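A minimal sketch of that approach (the file name is hypothetical): open the database on disk, enlarge the cache, do all the writes, and commit exactly once at the end:
import sqlite3

# Open (or create) the database file on disk instead of in memory.
c = sqlite3.connect('my_table.db')

# Optional: enlarge the page cache (a negative value is a size in KiB).
c.execute('PRAGMA cache_size = -64000;')

c.execute('CREATE TABLE my_table (id int, name text);')
c.execute("INSERT INTO my_table VALUES (1, 'bruce'), (2, 'wayne'), (3, 'bat');")
# ... all remaining writes go here, still inside the same transaction ...

# SQLite forces data to disk only on commit, so committing once at the
# end keeps the speed close to the in-memory version.
c.commit()
c.close()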

How to release memory when using JetUpdate to insert record in Extensible Storage Engine Database?

I need to insert millions of records. Right now I'm in a very tight loop where, for every record, I:
a) start a transaction (JetBeginTransaction)
b) prepare an update (JetPrepareUpdate)
c) add the row (JetSetColumns)
d) update (JetUpdate)
e) commit the transaction (JetCommitTransaction)
But more and more memory is occupied as records are inserted by executing JetUpdate. Even when I stop inserting records, or after all records have been inserted, the memory is not released.
How can I limit the memory growth?
Why does JetCommitTransaction not release the memory?
How can I release the memory promptly?
The database cache is likely growing. Confirm with the Perf Counter: Database -> Database Cache Size (MB).
You can cap the size with JET_paramCacheSizeMax (see https://learn.microsoft.com/en-us/windows/desktop/extensible-storage-engine/database-cache-parameters ).
I also agree with egray's comment that you should perform more than one insert per transaction, or at least use lazy commit (a flag to JetCommitTransaction). Otherwise you will be writing to your transaction log file much too frequently, and performance will suffer greatly.

SSTables are never deleted on disk if table gets dropped

I had a table whose tombstone count was >100000, due to which my read queries were throwing tombstone errors. I then dropped the table, but this didn't delete the SSTable files. I re-created the table, ran my select queries, and saw the tombstone error again. I don't understand why the old tombstone error has come up again.
Also, when do the SSTables ever get deleted on disk?
Truncating a table will not remove the SSTable(s) on disk; you need to run nodetool cleanup.
Tombstones will disappear through compaction, but only once gc_grace_seconds has passed. The default is 10 days. Why so long? It's designed to be a bit longer than a week, providing enough time to run repair on a cluster before deletes are discarded. This maximizes the opportunity for consistency across the nodes.
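If you do decide to shorten that window (accepting the repair caveat above), gc_grace_seconds can be changed per table. A hedged sketch from Python, with a hypothetical table and a one-day value:
from cassandra.cluster import Cluster

cluster = Cluster(['127.0.0.1'])  # assumed contact point
session = cluster.connect('my_keyspace')  # assumed keyspace

# 86400 seconds = 1 day; only safe if repairs run more often than this.
session.execute('ALTER TABLE mytable WITH gc_grace_seconds = 86400')

cluster.shutdown()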
In order to have your tables deleted from disk, you need to make sure that no hard links are still pointing at them. By default, a DROP command will create a snapshot of the column family. You need to set the auto_snapshot property to false in the YAML file:
# Whether or not a snapshot is taken of the data before keyspace truncation
# or dropping of column families. The STRONGLY advised default of true
# should be used to provide data safety. If you set this flag to false, you will
# lose data on truncation or drop.
auto_snapshot: false
If you want to err on the safe side (and have a general procedure for recreating your keyspace), you could go for:
DROP TABLE IF EXISTS mytable
CREATE TABLE mytable (....)
TRUNCATE mytable
I never had a single problem with this so far.
The truncate operation is safer than drop-and-recreate. Truncate may throw a timeout exception; if it does, run it again until it completes.
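For completeness, a sketch of running that procedure from Python with the cassandra-driver (the keyspace name and column definitions are hypothetical):
from cassandra.cluster import Cluster

cluster = Cluster(['127.0.0.1'])  # assumed contact point
session = cluster.connect('my_keyspace')  # assumed keyspace

# Drop, recreate, then truncate, per the procedure above.
session.execute('DROP TABLE IF EXISTS mytable')
session.execute('CREATE TABLE mytable (id int PRIMARY KEY, name text)')  # hypothetical schema
session.execute('TRUNCATE mytable')

cluster.shutdown()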

cannot delete row key

I'm having an issue while deleting a row key in Cassandra. Whenever I delete a row key, all the columns contained by that row key are deleted, but the row key itself is not. Can anybody tell me how to remove a row key once it has been inserted into a column family?
I'd like to do this via the Thrift client.
This is a side effect of how distributed deletes work in Cassandra. From the Cassandra wiki page on distributed deletes:
[A] delete operation can't just wipe out all traces of the data being removed immediately: if we did, and a replica did not receive the delete operation, when it becomes available again it will treat the replicas that did receive the delete as having missed a write update, and repair them! So, instead of wiping out data on delete, Cassandra replaces it with a special value called a tombstone. The tombstone can then be propagated to replicas that missed the initial remove request.
Also take a look at this question on the FAQ: Why do deleted keys show up during range scans?
