What are the logfiles in
arango_instance_database/journals/logfile-xxxxxx.db
for? Can I delete them? How can I reduce their size?
I set
database.maximal-journal-size = 1048576
but those files are still 32M large.
Can I set some directory for them, like /var/log/...?
You're referencing the Write Ahead Logfiles (WAL), which are, at least temporarily, the files your data is kept in.
So it's a very bad idea to remove them on your own, as long as you'd like your data to stay intact.
The files are used so that documents can be written to disk in a continuous fashion. Once the system is idle, the aggregator job picks the documents out of them and moves them over into your database files.
You can find interesting documentation of situations where others didn't choose such an architectural approach and wrote data directly into their data files on disk, and of what that does to your system.
Once all documents in a WAL logfile have been moved into database files, ArangoDB will reclaim the allocated space.
Thank you a lot for the reply :-)
So in the case of ArangoDB deployed as a "single instance", I can set:
--wal.suppress-shape-information true
--wal.historic-logfiles 0
Anything else?
How about
--wal.logfile-size
What are best/common practices for determining its size?
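For what it's worth, in the 2.x line the WAL logfiles are sized by wal.logfile-size (32 MB by default), which is a separate setting from database.maximal-journal-size; that would explain why the journal files stay at 32 MB. A config-file sketch with the options discussed above (option names as documented for 2.x, so double-check them against your version) could look like this:

[wal]
# smaller logfiles are collected and removed sooner, at the cost of having more of them
logfile-size = 8388608
# keep no already-collected logfiles around
historic-logfiles = 0
# only advisable on a single instance, as noted above
suppress-shape-information = true
# the 2.x options also list wal.directory for relocating the logfiles; they hold
# data rather than logs, though, so /var/log is not a good home for them
# directory = /path/to/wal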
Looking for a solid way to limit the size of Kismet's database files (*.kismet) through the conf files located in /etc/kismet/. The version of Kismet I'm currently using is 2021-08-R1.
The end state would be to limit the file size (10 MB, for example) or to have the database written to and closed after X minutes of logging. Then a new database is created, connected, and starts getting written to. This process would continue until Kismet is killed. This way, rather than having one large database, there will be multiple smaller ones.
In the kismet_logging.conf file there are some timeout options, but that's for expunging old entries in the logs. I want to preserve everything that's being captured, but break the logs into segments as the capture process is being performed.
I'd appreciate anyone's input on how to do this either through configuration settings (some that perhaps don't exist natively in the conf files by default?) or through plugins, or anything else. Thanks in advance!
Two interesting ways:
One could let the old entries be expunged, but reach in with SQL first and extract what you want as a time-bound query.
A second way would be to automate restarting Kismet, which is a little less elegant but seems to work; a sketch of that approach follows below.
https://magazine.odroid.com/article/home-assistant-tracking-people-with-wi-fi-using-kismet/
If you read that article carefully, there are lots of bits of interesting information in it.
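To illustrate the restart approach, here is a minimal shell sketch; it assumes your capture source is wlan1 and that the default log naming (which includes the start time) is in place, so every run produces its own .kismet file:

#!/bin/sh
# run Kismet in fixed-length segments; each run writes a fresh .kismet file
SEGMENT=600                        # seconds of capture per database file
while true; do
    # SIGINT lets Kismet shut down and close its current log cleanly
    timeout --signal=INT "$SEGMENT" kismet -c wlan1
    sleep 2
done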
I have a very small MemSQL instance which has about 200 tables and 200 MB of data in total. The plancache directory kept filling up the file system (25 GB+). I tried shutting down the databases and deleting the files under the plancache directory, but after restarting the database all the files came back. "show plancache" shows 0 entries, so there are no plans left to delete.
Could anyone let me know the best way to manage the plancache directory's space consumption?
Thanks in advance.
So, if you are comfortable turning off your machine and deleting the plancache directory, try just running SNAPSHOT <db_name> on each database before turning off the server and deleting the plancache.
Otherwise, every write query and ALTER TABLE you ran will have to be recompiled during recovery.
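A rough shell sketch of that, going through the MySQL-compatible client (host, port, and credentials are placeholders, and the exact SNAPSHOT syntax is worth checking against your MemSQL version):

# snapshot every user database so recovery doesn't have to replay (and recompile) old queries
for db in $(mysql -h 127.0.0.1 -P 3306 -u root -N -e "SHOW DATABASES"); do
    case "$db" in
        information_schema|memsql) continue ;;   # skip system databases
    esac
    mysql -h 127.0.0.1 -P 3306 -u root -e "SNAPSHOT $db"
done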
25 Gigs is a lot though...
To be honest, MemSQL is not optimized for the case of many 1-meg tables...
Depending on your use case, it might be worth investigating our JSON datatype, or rethinking your schemas.
I am a new user of ArangoDB and I am currently evaluating it for my project.
Can someone please tell me what the maximum number of databases you can create in ArangoDB is?
Thanks.
As far as I know, there are virtually no limits to the number of databases in ArangoDB.
The only thing you have to keep in mind is the resources that are needed for databases and their collections.
Some of those resources, for each Database / Collection, are:
Files on disk: each database/collection needs disk space and file descriptors.
Memory: each database/collection also takes up memory when its collections are loaded.
For a collection, the number of file descriptors needed at any time depends on the journal size defined for it. If the journal size is big, fewer files are needed, and therefore fewer file descriptors (and their associated resources).
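As an illustration against the 2.x REST API (the collection name and size are made up, and newer storage engines may not use journals at all), a collection with a smaller journal could be created like this:

curl -X POST http://localhost:8529/_api/collection \
     -d '{"name": "smallJournalCollection", "journalSize": 4194304}'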
There is also a nice blog post on disk space usage (linked below). It is a bit older and might not be accurate any more, but it should give you a general idea.
https://www.arangodb.com/2012/07/collection-disk-usage-arangodb/
Regarding journal-sizes and performance, you should also look at this:
https://www.arangodb.com/2012/09/performance-different-journal-sizes/
Although I checked Datastax's documentation about snapshots, I am still confused about what a snapshot in Cassandra is. What is the function or main purpose of a snapshot?
Under the snapshot folder, I find some subfolder named in convention of this:
1426256545571-tablename
What does the number at the very beginning mean? Anyway, I just need an easy way to understand what a snapshot is.
The number is the number of milliseconds since the epoch (a timestamp). A snapshot is just a local backup. One is taken automatically for some types of operations, like truncate (in case it was done by accident and you want to undo it).
They are very fast and don't cost any extra disk space up front, since a snapshot is just hard links to the immutable data files. Eventually you will want to clean them up to reclaim disk space as compactions occur. You can disable the auto_snapshot option in cassandra.yaml if you don't want them any more. You will likely still see them while doing repairs, though.
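Some housekeeping commands that go with this (keyspace and tag names are placeholders, and flag support varies a little between Cassandra versions):

# the leading number of a snapshot directory is milliseconds since the epoch
date -d @$((1426256545571 / 1000))        # resolves to a date in March 2015

# see which snapshots exist and how much space they pin down
nodetool listsnapshots

# take a snapshot by hand, then clear it later to give the disk space back
nodetool snapshot -t before_upgrade my_keyspace
nodetool clearsnapshot -t before_upgrade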
I recently encountered a situation where my CouchDB instance used all available disk space on a 20GB VM instance.
Upon investigation I discovered that a directory in /usr/local/var/lib/couchdb/ contained a bunch of .view files, the largest of which was 16GB. I was able to remove the *.view files to restore normal operation. I'm not sure why the .view files grew so large or how CouchDB manages .view files.
A bit more information. I have a VM running Ubuntu 9.10 (karmic) with 512MB and CouchDB 0.10. The VM has a cron job which invokes a Python script which queries a view. The cron job runs once every five minutes. Every time the view is queried the size of a .view file increases. I've written a job to monitor this on an hourly basis and after a few days I don't see the file rolling over or otherwise decreasing in size.
Does anyone have any insights into this issue? Is there a piece of documentation I've missed? I haven't been able to find anything on the subject but that may be due to looking in the wrong places or my search terms.
CouchDB is very disk hungry, trading disk space for performance. Views will increase in size as items are added to them. You can recover disk space that is no longer needed with cleanup and compaction.
Every time you create, update, or delete a document, the view indexes will be updated with the relevant changes to the documents. The update to the view happens when it is queried, so if you are making lots of document changes you should expect your index to grow, and it will need to be managed with compaction and cleanup.
If your views are very large for a given set of documents then you may have poorly designed views. Alternatively your design may just require large views and you will need to manage that as you would any other resource.
It would be easier to tell what is happening if you could describe what document updates (including creates and deletes) are happening and what your view functions are emitting, especially for the large view.
That your .view files grow each time you access a view is because CouchDB updates views on access. CouchDB views need compaction just like databases do. If you have frequent changes to your documents, resulting in changes to your view, you should run view compaction from time to time. See http://wiki.apache.org/couchdb/HTTP_view_API#View_Compaction
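For reference, view compaction and cleanup can be triggered over HTTP roughly like this (database and design-document names are placeholders):

curl -X POST -H "Content-Type: application/json" http://127.0.0.1:5984/yourdb/_compact/yourdesigndoc
curl -X POST -H "Content-Type: application/json" http://127.0.0.1:5984/yourdb/_view_cleanup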
To reduce the size of your views, have a look at the data you are emitting. When you emit(foo, doc), the entire document is copied into the view so that it is instantly available when you query the view. A map function like function(doc) { emit(doc.title, doc); } will therefore produce a view as big as the database itself. You could instead emit(doc.title, null); and use the include_docs option to let CouchDB fetch the document from the database when you access the view (which comes with a slight performance penalty). See http://wiki.apache.org/couchdb/HTTP_view_API#Querying_Options
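With the slim view, a query that still returns full documents might look like this (the view name by_title is made up):

curl 'http://127.0.0.1:5984/yourdb/_design/yourdesigndoc/_view/by_title?include_docs=true'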
Use sequential or monotonic IDs for documents instead of random ones.
Yes, CouchDB is very disk hungry, and it needs regular compaction. But there is another thing that can help reduce this disk usage, especially where the extra usage is unnecessary.
CouchDB uses B+ trees for storing data/documents, which is a very good data structure for retrieval performance. However, that performance is traded for disk space. With completely random IDs, the B+ tree fans out quickly: since the minimum fill rate is 1/2 for every internal node, the nodes are mostly filled up to about 1/2 (as the data spreads evenly due to its randomness), generating more internal nodes. New insertions can also cause a rewrite of the full tree. That's what randomness can cause ;)
Using sequential or monotonic IDs instead avoids all of this.
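On CouchDB 1.x you can ask the server itself to hand out sequential IDs, roughly like this (credentials are placeholders, and older or newer releases keep this setting in a different place):

curl -X PUT -d '"sequential"' http://dbuser:dbpassword@127.0.0.1:5984/_config/uuids/algorithm

Then POST new documents without an _id of your own and CouchDB will assign monotonically growing ones.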
I've had this problem too, trying out CouchDB for a browser-based game.
We had about 100,000 unexpected visitors on the first day of a site launch, and within 2 days the CouchDB database was taking up about 40 GB of space. This made the server crash because the disk was completely full.
Compaction brought that back to about 50 MB. I also set _revs_limit (which defaults to 1000) to 10, since we didn't care about revision history, and it has been running perfectly since. After almost 1M users, the database size is usually about 2-3 GB. When I run compaction it drops to about 500 MB.
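Triggering compaction itself is just another HTTP call, roughly like this (database name and credentials are placeholders):

curl -X POST -H "Content-Type: application/json" http://dbuser:dbpassword@127.0.0.1:5984/yourdb/_compact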
Setting document revision limit to 10:
curl -X PUT -d "10" http://dbuser:dbpassword#127.0.0.1:5984/yourdb/_revs_limit
Or without user:password (not recommended):
curl -X PUT -d "10" http://127.0.0.1:5984/yourdb/_revs_limit