MemSQL: how to free space from plancachedir - SingleStore

I have a very small MemSQL instance with about 200 tables and 200 MB of data in total. The plancachedir kept filling up the file system (25 GB+). I tried shutting down the databases and deleting the files under plancachedir, but after restarting the database all the files came back. SHOW PLANCACHE shows 0 entries, so there are no plans to delete.
Could anyone let me know the best way to manage plancachedir space consumption?
Thanks in advance.

So, if you are comfortable turning off your machine and deleting the plancache directory, try just running SNAPSHOT <db_name> on each database before turning off the server and deleting the plancache.
Otherwise, during recovery the server will recompile the plans for every write query and ALTER TABLE you ran since the last snapshot.
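For example, a minimal sketch (db_name is a placeholder for each of your databases):

SNAPSHOT db_name;
-- repeat for every user database, then shut the node down
-- and remove the contents of plancachedir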
25 Gigs is a lot though...
To be honest, MemSQL is not optimized for the case of many 1-meg tables...
Depending on your use case, it might be worth investigating our JSON datatype, or rethinking your schemas.

Related

Azure SQL virtual machine performance - Inserts very slow

I'm trying out different pricing tiers on SQL Server.
I'm inserting 4000 rows distributed over 4 tables in 10 seconds.
My problem: I don't see any performance improvement going from a small D2S_V3 to a D8S_V3.
My application needs to insert many rows (bulking is not an option), and this kind of performance is not acceptable.
I wonder why I don't see any improvement.
So my noob question: Do I need to configure something to see improvements? My naive thinking says I should see some difference :-)
What am I doing wrong?
Without knowing much about your schema, it looks like you are storage bound or network bound.
Storage:
Try moving the database to the local (temporary) disk and see if you notice any difference; if it is faster, then your bottleneck is the mounted disk.
Network bound:
Where is the client that's inserting these transactions? On the same machine? On Azure?
I suggest you set up a client in the same region and run the tests.
"Inserting 4000 rows distributed over 4 tables in 10 seconds. I don't see any performance improvements from a small D2S_V3 to D8S_V3."
I would approach this problem using wait stats rather than throwing hardware at it without first knowing the problem.
For example, running the insert below

insert into #t
select orderid
from orders o
join Customers c
    on c.custid = o.custid

showed me the following wait stats:
<Wait WaitType="SOS_SCHEDULER_YIELD" WaitTimeMs="1" WaitCount="167" />
<Wait WaitType="PAGEIOLATCH_SH" WaitTimeMs="12" WaitCount="3" />
<Wait WaitType="MEMORY_ALLOCATION_EXT" WaitTimeMs="21" WaitCount="4975" />
Most of its time, the query spent waiting on:
PAGEIOLATCH_SH: getting data from disk into memory
MEMORY_ALLOCATION_EXT: allocating memory for the query to run
Based on this, I would troubleshoot by checking whether there is memory pressure on the system, since this query has to read its data from disk.
This is just one example, but hopefully it gives you an idea.
Further, I would check whether the SELECT part on its own returns data fast.
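If you want to see the waits for your own session while the insert runs, one option is the session-level wait stats DMV (a minimal sketch; sys.dm_exec_session_wait_stats requires SQL Server 2016 or later, and the @@SPID filter just scopes it to the current session):

-- run this in the same session right after the insert finishes
SELECT wait_type, waiting_tasks_count, wait_time_ms
FROM sys.dm_exec_session_wait_stats
WHERE session_id = @@SPID
ORDER BY wait_time_ms DESC;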
Performance can be directly linked to your hardware or configuration, but it's more likely that it has to do with the structures and the queries. Take a look at the execution plan for the INSERT operation to see how it is being resolved by the optimizer. Also, capture the query metrics using extended events to see how many resources are being used by the operation. These are more likely to lead to a resolution on why the query is performing slowly and enable you to scale the hardware to best serve the query.
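For the extended events part, here is a minimal sketch of a session that captures completed statements with their CPU, duration, reads and writes (the session name insert_metrics and the file name are placeholders; add predicates to narrow it to your insert workload):

-- create and start the capture session
CREATE EVENT SESSION insert_metrics ON SERVER
ADD EVENT sqlserver.sql_statement_completed
    (ACTION (sqlserver.sql_text))
ADD TARGET package0.event_file (SET filename = N'insert_metrics.xel');

ALTER EVENT SESSION insert_metrics ON SERVER STATE = START;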

Cassandra read operation error using datastax cassandra

Sorry if this is an existing question, but none of the existing ones resolved my problem.
I've installed Cassandra as a single node. I don't have a large application right now, but I think that may change soon, and I will need more and more nodes.
Well, I'm saving data from a stream to Cassandra, and this was going well, but suddenly, when I tried to read data, I started to receive this error:
"Not enough replica available for query at consistency ONE (1 required but only 0 alive)"
My keyspace was built using SimpleStrategy with replication_factor = 1. I'm saving data separated by a field called "catchId", so most of my queries look like "select * from data where catchId='xxx'". catchId is a partition key.
I'm using the cassandra-driver-core version 3.0.0-rc1.
The thing is that I don't have that much data right now, and I'm wondering whether it would be better to use an RDBMS for now and migrate to Cassandra only when I have a better infrastructure.
Thanks :)
It seems that your node is unable to respond when you try to make your read (in general this error appears with more than one node). If you do not have lots of data, that is very strange, so this is probably a bad design or configuration choice somewhere. It can stem from several things, so you have to do a few investigations:
Study your logs! In particular system.log.
You can change the read_request_timeout_in_ms parameter in cassandra.yaml. Although it's not a good idea in production, it will tell you whether it's just a temporary problem (your request succeeds after a little more time) or a bigger one; see the sketch after this list.
Study your CPU and memory behavior while you are making requests.
If you are very motivated, you can install OpsCenter, which will give you more valuable information.
How, and how many, write requests are you doing? They can overwhelm Cassandra (even though it's designed for heavy writes). I recommend making async requests to avoid problems.
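As a quick sketch of the timeout point above (assuming a default install where nodetool is on the PATH and cassandra.yaml is in the usual conf directory):

# check that the single node is actually up and answering
nodetool status

# in cassandra.yaml, temporarily raise the read timeout, e.g.
# read_request_timeout_in_ms: 10000    (the default is 5000)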

Local cassandra for testing purposes getting slower over time

I do know that it's a cassandra anti-pattern to delete rows (and more so – doing it frequently), but in my simple use case I have a local cassandra (single instance, replication factor set to 1) that I use for unit tests, which drop all tables before running, naturally to perform the tests with a clean slate.
Over time, the performance of this Cassandra instance degraded extremely. It surprised me a bit that dropping the keyspaces altogether didn't help at all. Only by manually deleting everything in the Cassandra data directory did I manage to recover the performance.
This solution is quite fine for me as I don't care about the test data I delete over and over again, but it certainly feels a bit weird to have to delete these things manually on the file system. Is there a better way to deal with such a situation? Or am I going about this whole case completely wrong?
Based on the little information provided, here is some info:
First, deleting data creates tombstones in cassandra. The default behavior is to keep these tombstones for 10 days, set by the variable gc_grace_seconds.
Given you only have 1 node and don't care about the data once you delete it, you could set gc_grace_seconds to zero. You also could make sure to run compaction after you do a lot of deletes.
Documentation here:
http://docs.datastax.com/en/cql/3.1/cql/cql_reference/tabProp.html
http://docs.datastax.com/en/cassandra/2.0/cassandra/tools/toolsCompact.html
Lastly, there is a feature known as TTL, Time To Live. You could use that instead of deleting and let the database do the "deletes" once the data expires. If you go this route, I would still set gc_grace_seconds to zero and run compactions (via an hourly cron job, since it's a dev environment).
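A minimal sketch of both ideas (ks.events is a placeholder keyspace/table):

-- stop keeping tombstones around on this single-node test instance
ALTER TABLE ks.events WITH gc_grace_seconds = 0;

-- or let rows expire on their own instead of deleting them explicitly
INSERT INTO ks.events (id, payload) VALUES (1, 'test data') USING TTL 3600;

-- and compact manually after heavy deletes: nodetool compact ks events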

How to recover a dropped database on MySQL

I accidentally dropped a database on MySQL yog ultimate. Also, I found that the IT guy uninstalled MySQL yog from the machine.
Now I am working on two machines, including the one from which the database was dropped and MySQL yog was uninstalled.
Is there a way to recover the dropped database?
You said in a comment that you have a backup from a couple of hours prior to the data loss.
If you also have binary logs, you can restore the backup, and then reapply changes from the binary logs.
Here is documentation on this operation: http://dev.mysql.com/doc/refman/5.6/en/point-in-time-recovery.html
You can even filter the binary logs to reapply changes for just one database (mysqlbinlog --database name). For example you may have other databases that were not dropped on the same instance, and you wouldn't want to reapply changes to those other databases.
Recovering two hours' worth of binary logs won't take "a very long amount of time." The trickiest part is figuring out the start point to begin replaying the binary logs. If you were lucky enough to include the binary log position with the backup, this will be simpler and very precise. If you have to go by timestamp, it's less precise and you probably cannot hope to do an exact recovery.
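A minimal sketch of that flow (the file names, database name, and timestamps are placeholders; a dump-based backup is assumed):

# restore the last backup
mysql < backup.sql

# replay only the dropped database's changes, stopping just before the DROP
mysqlbinlog --database=mydb \
    --start-datetime="2016-01-01 10:00:00" \
    --stop-datetime="2016-01-01 12:00:00" \
    mysql-bin.000123 | mysql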
If you didn't have binary logs enabled on this instance since you backed up the database, it's a lot trickier to do a data recovery of lost files. You might be able to use a filesystem undelete tool like the EaseUS Data Recovery Wizard (though I can't say I have experience using that tool).
Reconstructing the files you recover is not for the faint of heart, and it's too much to get into here. You might want to get help from a professional MySQL consulting firm. I work for one such firm, Percona, who offers data recovery services.
There's really only one word: Backups.
After MySQL drops a database, the data is still on the media for a while, so you can fetch the records and rebuild the database with DBRECOVER.
mysql> drop database employees;
Query OK, 14 rows affected (0.16 sec)
#sync
#sync
In DBRECOVER, select "drop database recovery".
Select the MySQL version you used; the page size should be left at 16k.
Click "select directory" and enter the @@datadir directory.
Caution: you must point the tool at the original @@datadir here. Do not copy the @@datadir directory to another filesystem or mount point and use the copy; the software needs to scan the original filesystem or mount point, otherwise it can't work. You'd better remount the @@datadir filesystem read-only to avoid any further disk writes, and don't place the DBRECOVER software package on the same filesystem.
https://youtu.be/ao7OY8IbZQE

Replicating CouchDB to local couch reduces size - why?

I recently started using Couch for a large app I'm working on.
I have a database with 7907 documents and wanted to rename it. I poked around for a bit but couldn't figure out how to rename a database, so I figured I would just replicate it to a local database with the name I wanted.
The first time I tried, the replication failed; I believe the error was a timeout. I tried again, and it worked very quickly, which was a little disconcerting.
After the replication, I'm showing that the new database has the correct amount of records, but the database size is about 1/3 of the original.
Also a little odd: if I refresh Futon, the size of the original fluctuates between 94.6 and 95.5 MB.
This leaves me with a few questions:
Is the 2nd database storing references to the first? If so, can I delete the first without causing harm?
Why would the size be so different? Had the original built indexes that the new one eventually will?
Why is the size fluctuating?
edit:
A few things that might be helpful:
This is on a Cloudant CouchDB install.
I checked the first and last record of the new db, and they match, so I don't believe Futon is underreporting.
Replicating to a new database is similar to compaction. Both involve certain side-effects (incidentally, and intentionally, respectively) which reduce the size of the new .couch file.
The b-tree indexes get balanced
Data from old document revisions is discarded.
Metadata from previous updates to the DB is discarded.
Replications store checkpoints on both the source and the target, so if you re-replicate from the same source to the same target (i.e. re-run a replication that timed out), it will pick up where it left off.
Answers:
Replication does not create a reference to another database. You can delete the first without causing harm.
Replicating (and compacting) generally reduces disk usage. If you have any views in any design documents, those will re-build when you first query them. View indexes use their own .view file which also consumes space.
I am not sure why the size is fluctuating. Browser and proxy caches are the bane of CouchDB (and web) development. But perhaps it is also a result of internal Cloudant behavior (for example, different nodes in the cluster reporting slightly different sizes).
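For reference, a minimal sketch of triggering the replication and a manual compaction by hand (olddb and newdb are placeholders; this assumes a local CouchDB on port 5984, since on Cloudant compaction is handled by the service):

# replicate into a new database, creating it if it doesn't exist
curl -X POST http://localhost:5984/_replicate \
    -H 'Content-Type: application/json' \
    -d '{"source": "olddb", "target": "newdb", "create_target": true}'

# or compact the original in place to reclaim space from old revisions
curl -X POST http://localhost:5984/olddb/_compact \
    -H 'Content-Type: application/json'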
