more than 1 MB of data per key in memcached with nodejs - node.js

I am using the memcached package with Node.js. Since the default maximum size of data per key is 1 MB, I run into problems when the data for a particular key is larger than 1 MB.
One workaround would be to raise the default maximum item size above 1 MB in memcached.conf using
-I 2M
and in code setting the maxValue
var memcached = new Memcached('localhost:11211', {maxValue: 2097152});
What would be the proper way to stay within the 1 MB limit? I have read suggestions about splitting the data into multiple keys. How can I split JSON data across multiple keys with the memcached package?

Options available:
1/ Make sure you are using compression when storing values in memcached; your Node.js memcached driver should support gzip compression.
2/ Split the data into multiple keys (see the sketch after this list).
3/ Increase the maximum object size beyond 1 MB (but that may increase fragmentation and decrease performance, depending on your cache usage).
4/ Use Redis as the cache instead of memcached if your objects are usually large. The Redis string data type supports values up to 512 MB, and it is exposed as a direct get/set interface in any standard Node.js Redis driver.
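For option 2 (combined with option 1), here is a minimal sketch of how splitting could look with the memcached package. The chunk size, the key layout (key:meta, key:0, key:1, ...) and the gzip+base64 encoding are assumptions of this sketch, not something the package provides out of the box:
var Memcached = require('memcached');
var zlib = require('zlib');

var memcached = new Memcached('localhost:11211');
var CHUNK_SIZE = 512 * 1024; // characters per chunk, well under the 1 MB item limit
var TTL = 3600;              // lifetime in seconds

// Store: gzip + base64 the JSON, then spread it over key:0, key:1, ... with a
// small manifest under key:meta telling readers how many chunks to fetch.
function setLargeJSON(key, obj, callback) {
  var payload = zlib.gzipSync(JSON.stringify(obj)).toString('base64');
  var chunks = Math.ceil(payload.length / CHUNK_SIZE);
  var pending = chunks + 1;
  var failed = false;
  function done(err) {
    if (failed) return;
    if (err) { failed = true; return callback(err); }
    if (--pending === 0) callback(null);
  }
  memcached.set(key + ':meta', { chunks: chunks }, TTL, done);
  for (var i = 0; i < chunks; i++) {
    memcached.set(key + ':' + i, payload.slice(i * CHUNK_SIZE, (i + 1) * CHUNK_SIZE), TTL, done);
  }
}

// Read: fetch the manifest, then all chunks in one getMulti, and reassemble.
function getLargeJSON(key, callback) {
  memcached.get(key + ':meta', function (err, meta) {
    if (err || !meta) return callback(err || null, undefined); // treat as a miss
    var keys = [];
    for (var i = 0; i < meta.chunks; i++) keys.push(key + ':' + i);
    memcached.getMulti(keys, function (err, data) {
      if (err) return callback(err);
      var parts = keys.map(function (k) { return data[k]; });
      if (parts.indexOf(undefined) !== -1) return callback(null, undefined); // partial miss
      var json = zlib.gunzipSync(Buffer.from(parts.join(''), 'base64')).toString();
      callback(null, JSON.parse(json));
    });
  });
}
Note that memcached gives no atomicity across keys, so a reader may see a partially written or partially evicted object; the sketch simply treats that as a cache miss.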

Related

Cassandra maximum realistic blob size

I'm trying to evaluate a few distributed storage platforms and Cassandra is one of them.
Our requirement is to save files between 1 MB and 50 MB in size, and according to Cassandra's documentation (http://docs.datastax.com/en/cql/3.3/cql/cql_reference/blob_r.html): "The maximum theoretical size for a blob is 2 GB. The practical limit on blob size, however, is less than 1 MB."
Does anyone have experience storing files in Cassandra as blobs? Any luck with it? Is the performance really bad with bigger file sizes?
Any other suggestion would also be appreciated!
Cassandra was not built for this type of job.
In Cassandra a single column value can be up to 2 GB (1 MB is recommended). So if you want to use Cassandra as object storage, split the big object into multiple small chunks and store them with the object id as the partition key and a bucket id as the clustering key, as sketched below.
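A minimal sketch of that chunked layout, assuming the DataStax Node.js driver (cassandra-driver); the keyspace, table and column names are made up for illustration:
const cassandra = require('cassandra-driver');

// Assumes a keyspace named "objstore" already exists; all names are illustrative.
const client = new cassandra.Client({
  contactPoints: ['127.0.0.1'],
  localDataCenter: 'datacenter1',
  keyspace: 'objstore'
});

// object_id is the partition key and bucket_id the clustering key, so all
// chunks of one object live in one partition and come back in order.
const CREATE_TABLE =
  'CREATE TABLE IF NOT EXISTS chunks (' +
  '  object_id text, bucket_id int, data blob,' +
  '  PRIMARY KEY (object_id, bucket_id))';

const CHUNK_SIZE = 512 * 1024; // keep each blob well under the ~1 MB practical limit

async function init() {
  await client.connect();
  await client.execute(CREATE_TABLE);
}

async function putObject(objectId, buffer) {
  const insert = 'INSERT INTO chunks (object_id, bucket_id, data) VALUES (?, ?, ?)';
  for (let i = 0, bucket = 0; i < buffer.length; i += CHUNK_SIZE, bucket++) {
    await client.execute(insert,
      [objectId, bucket, buffer.slice(i, i + CHUNK_SIZE)], { prepare: true });
  }
}

async function getObject(objectId) {
  const result = await client.execute(
    'SELECT data FROM chunks WHERE object_id = ?', [objectId], { prepare: true });
  return Buffer.concat(result.rows.map(row => row.data)); // rows are ordered by bucket_id
}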
It is best to use a distributed object storage system like OpenStack Object Storage ("Swift").
The OpenStack Object Store project, known as Swift, offers cloud storage software so that you can store and retrieve lots of data with a simple API. It's built for scale and optimized for durability, availability, and concurrency across the entire data set. Swift is ideal for storing unstructured data that can grow without bound.

Fastest DB for audio data?

What is the fastest free DB among Redis, MongoDB and MySQL (or another if justified) to use with Node.js for storing and querying audio data?
The audio will be stored in the wav format, and should not exceed 1 MB in size.
I expect the number of concurrent requests to be around 50 per second.
My constraints are free DBs, and speed.
Is there any comparison between the different DBs on this?

Estimating Cassandra KeyCache

We are currently in the process of deploying a larger Cassandra cluster and are looking for ways to estimate the best size for the key cache, or more precisely, a way of finding out the size of one row in the key cache.
I have tried tying into the integrated metrics system using Graphite, but I wasn't able to get any clear answer. I also tried putting my own debugging code into org.cassandra.io.sstable, but this did not yield any concrete results either.
We are using Cassandra 1.20.10. Are there any foolproof ways of getting the size of one row in the key cache?
With best regards,
Ben
Check out jamm. It's a library used for measuring the size of an object in memory.
You need to add -javaagent:"/path/to/jamm.jar" to your startup parameters, but Cassandra is already configured to start with jamm, so if you are changing internal Cassandra code this is already done for you.
To get the size of an object (in bytes):
import org.github.jamm.MemoryMeter;
MemoryMeter meter = new MemoryMeter();
long bytes = meter.measureDeep(object); // deep size of the whole object graph, in bytes
measureDeep is more costly, but gives a much more accurate measurement of an object's memory size.
For an estimation of the key cache size, let's assume you intend to store 1 million keys in the cache, each key 60 bytes long on average. There will be some overhead to store each key; let's say it is 40 bytes, which gives a key cache size per row of 100 bytes.
Since we need to cache 1 million keys:
total key cache = 1,000,000 * 100 bytes = 100 MB
Perform this calculation for each CF in your keyspace.
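The same back-of-the-envelope calculation as a small helper; the 40-byte per-entry overhead is the rough assumption from above, not a measured value:
// Back-of-the-envelope key cache sizing:
// (average key length + assumed per-entry overhead) * number of cached keys.
function estimateKeyCacheBytes(numKeys, avgKeyBytes, overheadBytes) {
  return numKeys * (avgKeyBytes + overheadBytes);
}

// 1 million keys, 60-byte keys, ~40 bytes of assumed overhead -> 100 MB
console.log(estimateKeyCacheBytes(1e6, 60, 40) / 1e6 + ' MB');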

cassandra read performance for large number of keys

Here is the situation:
I am trying to fetch around 10k keys from a CF.
Size of cluster : 10 nodes
Data on node : 250 GB
Heap allotted : 12 GB
Snitch used : property snitch with 2 racks in same Data center.
no. of sstables for cf per node : around 8 to 10
I am using a supercolumn approach. Each row contains around 300 supercolumns, which in turn contain 5-10 columns each. I am firing a multiget with 10k row keys and 1 supercolumn.
When I fire the call for the first time it takes around 30 to 50 seconds to return the result. After that Cassandra serves the data from the key cache and returns the result in 2-4 seconds.
So Cassandra read performance is hampering our project. I am using phpcassa. Is there any way I can tweak the Cassandra servers so that I can get results faster?
Is super column approach affects the read performance?
Use of super columns is best suited to use cases where the number of sub-columns is relatively small. Read more here:
http://www.datastax.com/docs/0.8/ddl/column_family
Just in case you haven't done this already: since you're using the phpcassa library, make sure that you've compiled the Thrift C extension. Per the "INSTALLING" text file in the phpcassa library folder:
Using the C Extension
The C extension is crucial for phpcassa's performance.
You need to configure and make to be able to use the C extension.
cd thrift/ext/thrift_protocol
phpize
./configure
make
sudo make install
Add the following line to your php.ini file:
extension=thrift_protocol.so
After doing a lot of R&D on this, we figured out that there is no way to get this working optimally.
When Cassandra fetches these 10k rows for the first time it is going to take time, and there is no way to optimize that.
1) However, in practice the probability of people accessing the same records is higher, so we take maximum advantage of the key cache. The default setting for the key cache is 2 MB, so we can afford to increase it to 128 MB without any memory problems.
After loading the data, run the expected queries to warm up the key cache.
2) The JVM works optimally with an 8-10 GB heap (I don't have numbers to prove it, just observation).
3) Most important: if you are using physical machines (not cloud or virtual machines), check which disk scheduler you are using. Set it to NOOP, which is good for Cassandra as it reads all keys from one section, reducing disk head movement.
The above changes helped bring the query time down to acceptable limits.
Along with the above changes, if you have CFs which are small in size but frequently accessed, enable row caching for them.
Hope the above info is useful.

sqlite database design with millions of 'url' strings - slow bulk import from csv

I'm trying to create an sqlite database by importing a csv file with urls. The file has about 6 million strings. Here are the commands I've used
create table urltable (url text primary key);
.import csvfile urldatabase
After about 3 million urls the speed slows down a lot and my hard disk keeps spinning continuously. I've tried splitting the csv file into 1/4th chunks but I run into the same problem.
I read similar posts on stackoverflow and tried using BEGIN...COMMIT blocks and PRAGMA synchronous=OFF but none of them helped. The only way I was able to create the database was by removing the primary key constraint from url. But then, when I run a select command to find a particular url, it takes 2-3 seconds which won't work for my application.
With the primary key set on url, the select is instantaneous. Please advise me on what I am doing wrong.
[Edit]
Summary of suggestions that helped :
Reduce the number of transactions
Increase page size & cache size
Add the index later
Remove redundancy from url
Still, with a primary index, the database size is more than double the original csv file that I was trying to import. Any way to reduce that?
Increase your cache size to something large enough to contain all of the data in memory. The default values for page size and cache size are relatively small and if this is a desktop application then you can easily increase the cache size many times.
PRAGMA page_size = 4096;
PRAGMA cache_size = 72500;
This will give you a cache size of just under 300 MB (72,500 pages of 4,096 bytes each). Remember that the page size must be set before the database is created. The default page size is 1024 and the default cache size is 2000 pages.
Alternatively (or almost equivalently really) you can create the database entirely in an in-memory database and then use the backup API to move it to an on-disk database.
A PRIMARY KEY or UNIQUE constraint will automatically generate an index. An index will dramatically speed up SELECTs, at the expense of slowing down INSERTs.
Try importing your data into a non-indexed table, and then explicitly CREATE UNIQUE INDEX _index_name ON urltable(url). It may be faster to build the index all at once than one row at a time.
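For reference, here is a sketch of the whole recipe (bigger page and cache size, a single transaction, the unique index built after the load), assuming you drive the import from Node.js with the better-sqlite3 package instead of the sqlite3 shell; the file names and one-URL-per-line parsing are illustrative only:
const fs = require('fs');
const Database = require('better-sqlite3');

const db = new Database('urls.db');
db.pragma('page_size = 4096');   // only takes effect on a fresh database (see note above)
db.pragma('cache_size = 72500'); // 72,500 pages * 4,096 bytes, just under 300 MB of cache
db.pragma('synchronous = OFF');
db.pragma('journal_mode = MEMORY');

// Load into a plain, non-indexed table first; the index is built afterwards.
db.exec('CREATE TABLE IF NOT EXISTS urltable (url TEXT)');

const insert = db.prepare('INSERT INTO urltable (url) VALUES (?)');
const insertAll = db.transaction(function (urls) { // one transaction for the whole batch
  for (const url of urls) insert.run(url);
});

// Assumes one URL per line; adjust the parsing for quoted or multi-column CSV files.
const urls = fs.readFileSync('csvfile', 'utf8').split('\n').filter(Boolean);
insertAll(urls);

// Building the unique index once, after the load, is usually much faster than
// maintaining it row by row during the import.
db.exec('CREATE UNIQUE INDEX IF NOT EXISTS urltable_url_idx ON urltable(url)');
db.close();
For a six-million-row file you would stream the CSV rather than read it into memory in one go, but the single-transaction and index-after-load pattern stays the same.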
