Can I store a hashmap in Couchbase? If I can, how do I retrieve the data through N1QL?
Couchbase at its heart is a key-value store, which behaves very much like a hash map. You can use N1QL to insert or upsert items, e.g.,
INSERT INTO my_bucket_name (KEY, VALUE) VALUES ( "key1", "raw value");
or
INSERT INTO my_bucket_name (KEY, VALUE) VALUES ( "key2", 12345);
or
INSERT INTO my_bucket_name (KEY, VALUE) VALUES ( "key3", {"field1": "value"});
To retrieve any of these documents you would use an N1QL query like:
SELECT * FROM my_bucket_name USE KEYS "key1";
Couchbase supports a number of data structures, including a Java Map. The stored data is backed by JSON, so you can also use it with N1QL. See the documentation on data structures for more info.
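For instance, a hash map can simply be stored as a JSON document. Here is a minimal sketch with the Node.js SDK (assuming the 2.x API; the bucket name and key follow the examples above): a plain object plays the role of the hash map, and its fields are immediately visible to N1QL.
const couchbase = require('couchbase');
const cluster = new couchbase.Cluster('couchbase://localhost');
const bucket = cluster.openBucket('my_bucket_name');

// The object is serialized to JSON, so every field is queryable with N1QL.
const myMap = { field1: 'value', field2: 42 };
bucket.upsert('key3', myMap, (err, result) => {
  if (err) throw err;
  // Both of these now work:
  //   SELECT * FROM my_bucket_name USE KEYS "key3";
  //   SELECT field1 FROM my_bucket_name USE KEYS "key3";
});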
Related
From what I have seen so far, it seems impossible with Cassandra. But I thought I'd give it a shot:
How can I select the value of a JSON property, parsed from a JSON object string, and use it as part of an update/insert statement in Cassandra?
For example, I'm given the JSON object:
{
    "id": 123,
    "some_string": "hello there",
    "mytimestamp": "2019-09-02T22:02:24.355Z"
}
And this is the table definition:
CREATE TABLE IF NOT EXISTS myspace.mytable (
    id text,
    data blob,
    PRIMARY KEY (id)
);
Now, the thing to know at this point is that, for reasons outside the scope of this question, the data field is set to the JSON string. In other words, there is no 1:1 mapping between the given JSON and the table columns; the data field contains the whole JSON object as a kind of blob value.
... Is it possible to parse the timestamp value of the given JSON object as part of an insert statement?
Pseudo-code example of what I mean, which obviously doesn't work ($myJson is a placeholder for the JSON object string above):
INSERT INTO myspace.mytable (id, data)
VALUES (123, $myJson)
USING timestamp toTimeStamp($myJson.mytimestamp)
The quick answer is no, it's not possible to do that with CQL.
The norm is to parse the elements of the JSON object within your application to extract the corresponding values to construct the CQL statement.
As a side note, I would discourage using the CQL blob type due to possible performance issues should the blob size exceed 1 MB. If it's JSON, consider storing it as the CQL text type instead. Cheers!
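For example, the client-side step could look like this (a sketch in plain JavaScript; $myJson from the pseudo-code becomes an ordinary string variable):
// Parse the JSON in the application and extract what the statement needs.
const payload = JSON.parse(myJson);

// USING TIMESTAMP expects microseconds since the epoch.
const writetimeMicros = Date.parse(payload.mytimestamp) * 1000;
The extracted values are then bound to a regular CQL statement, as the answers below demonstrate.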
Worth mentioning that CQL can do a limited amount of JSON parsing on its own, albeit not as detailed as you're asking for here (e.g., USING timestamp).
But something like this works:
> CREATE TABLE myjsontable (
... id TEXT,
... some_string TEXT,
... PRIMARY KEY (id));
> INSERT INTO myjsontable JSON '{"id":"123","some_string":"hello there"}';
> SELECT * FROM myjsontable WHERE id='123';
id | some_string
-----+-------------
123 | hello there
(1 rows)
In your case you'd either have to redesign the table or the JSON payload so that they match. But as Erick and Cédrick have mentioned, the USING timestamp part would have to happen client-side.
What you detailed is doable with Cassandra.
Timestamp
To insert a timestamp in a query, it should be formatted as an ISO 8601 string (e.g., "2019-09-02T22:02:24.355Z"). Sample examples can be found here. In your code, you may have to convert your incoming value to the expected type and format.
Blob
A blob is meant to store binary data, so a string cannot be dropped into a CQL query ad hoc (you can use the TEXT type instead if you are willing to base64-encode the data).
When you need to insert binary data, you also need to provide the proper type. For instance, if you are working with JavaScript, you need to provide a Buffer, as described in the documentation. Then, when you execute your query, you externalize your parameters:
const sampleId = '123';                                // id is TEXT in the table
const sampleData = Buffer.from('hello world', 'utf8'); // blob maps to a Buffer
const sampleTimestamp = Date.now() * 1000;             // USING TIMESTAMP takes epoch microseconds

client.execute('INSERT INTO myspace.mytable (id, data) VALUES (?, ?) USING TIMESTAMP ?',
    [sampleId, sampleData, sampleTimestamp], { prepare: true });
I am shifting my database from MongoDB to DynamoDB. I have a problem with the delete function on a table where labName is the partition key, serialNumber is the sort key, and there is an attribute feedId. I want to delete all the records from the table where labName is given and feedId is NOT IN (an array of ids).
In MongoDB I am doing it with the code below.
Is there a way with BatchWriteItem where I can add a condition on feedId without the sort key?
let dbHandle = await getMongoDbHandle(dbName);
let query = {
    feedid: { $nin: feedObjectIds }
};
let output = await dbModule.removePromisify(dbHandle,
    dbModule.collectionNames.feeds, query);
While working with DynamoDB, you can perform a conditional retrieval (GET) or deletion (DELETE) of records if and only if you provide all of the attributes of the primary key. For example:
For a simple primary key, you only need to provide a value for the partition key.
For a composite primary key, you must provide values for both the partition key and the sort key.
So a delete cannot be conditioned on feedId alone; the usual workaround is sketched below.
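Here is a sketch of that workaround, assuming the AWS SDK for JavaScript v2 DocumentClient (the table name 'feeds' and the fixed two-id filter are illustrative placeholders, not from the question): query the partition by labName, filter on feedId, then delete each match by its full primary key.
const AWS = require('aws-sdk');
const doc = new AWS.DynamoDB.DocumentClient();

async function deleteOtherFeeds(labName, keepFeedIds) {
  // 1. Query the partition; feedId can only appear in a filter, not in the key condition.
  const res = await doc.query({
    TableName: 'feeds',
    KeyConditionExpression: 'labName = :lab',
    FilterExpression: 'NOT (feedId IN (:f0, :f1))', // one placeholder per kept id
    ExpressionAttributeValues: {
      ':lab': labName, ':f0': keepFeedIds[0], ':f1': keepFeedIds[1]
    }
  }).promise();

  // 2. Delete each match with its full primary key (partition key + sort key).
  for (const item of res.Items) {
    await doc.delete({
      TableName: 'feeds',
      Key: { labName: item.labName, serialNumber: item.serialNumber }
    }).promise();
  }
}
BatchWriteItem can replace the per-item delete calls to reduce round trips, but each delete request inside the batch still has to carry the full key and accepts no conditions.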
In my Cassandra table I have a map collection, and I have also indexed the map keys.
CREATE TABLE IF NOT EXISTS test.collection_test (
    name text,
    year text,
    attributeMap map<text, text>,
    PRIMARY KEY ((name, year))
);
CREATE INDEX ON collection_test (attributeMap);
The QueryBuilder syntax is as below:
select().all().from("test", "collection_test")
.where(eq("name", name)).and(eq("year", year));
How should I put a WHERE condition on attributeMap?
First of all, you will need to create an index on the keys in your map. By default, an index created on a map indexes the values of the map, not the keys. There is special syntax to index the keys:
CREATE INDEX attributeKeyIndex ON collection_test (KEYS(attributeMap));
Next, to SELECT from a map with indexed keys, you'll need the CONTAINS KEY keyword. That functionality is not currently available in the QueryBuilder API, but there is an open ticket to support it: JAVA-677
Currently, to accomplish this with the Java Driver, you'll need to build your own query or use a prepared statement:
PreparedStatement statement = _session.prepare("SELECT * " +
"FROM test.collection_test " +
"WHERE attributeMap CONTAINS KEY ?");
BoundStatement boundStatement = statement.bind(yourKeyValue);
ResultSet results = _session.execute(boundStatement);
Finally, you should read through the DataStax doc on When To Use An Index. Secondary indexes are known not to perform well, and I can't imagine that a secondary index on a collection would be any different.
I am trying to optimize my Spark job by avoiding shuffling as much as possible.
I am using cassandraTable to create the RDD.
The column family's column names are dynamic, thus the table is defined as follows:
CREATE TABLE "Profile" (
key text,
column1 text,
value blob,
PRIMARY KEY (key, column1)
) WITH COMPACT STORAGE AND
bloom_filter_fp_chance=0.010000 AND
caching='ALL' AND
...
This definition results in CassandraRow RDD elements in the following format:
CassandraRow <key, column1, value>
key - the RowKey
column1 - the value of column1 is the name of the dynamic column
value - the value of the dynamic column
So if I have RK='profile1', with columns name='George' and age='34', the resulting RDD will be:
CassandraRow<key=profile1, column1=name, value=George>
CassandraRow<key=profile1, column1=age, value=34>
Then I need to group elements that share the same key together to get a PairRdd:
PairRdd<String, Iterable<CassandraRow>>
It is important to note that all the elements I need to group are on the same Cassandra node (they share the same row key), so I expect the connector to preserve data locality.
The problem is that using groupBy or groupByKey causes shuffling. I would rather group them locally, because all the data is on the same node:
JavaPairRDD<String, Iterable<CassandraRow>> rdd = javaFunctions(context)
    .cassandraTable(ks, "Profile")
    .groupBy(new Function<CassandraRow, String>() {
        @Override
        public String call(CassandraRow row) throws Exception {
            return row.getString("key");
        }
    });
My questions are:
Will using keyBy on the RDD cause shuffling, or will it keep the data local?
Is there a way to group the elements by key without shuffling? I read about mapPartitions, but didn't quite understand how to use it.
Thanks,
Shai
I think you are looking for spanByKey, a Cassandra-connector-specific operation that takes advantage of the ordering provided by Cassandra to allow grouping of elements without incurring a shuffle stage. Since rows that share a partition key arrive consecutively within a Spark partition, spanByKey can group them locally; keyBy itself is a narrow transformation and does not shuffle either.
In your case, it should look like:
sc.cassandraTable("keyspace", "Profile")
.keyBy(row => (row.getString("key")))
.spanByKey
Read more in the docs:
https://github.com/datastax/spark-cassandra-connector/blob/master/doc/3_selection.md#grouping-rows-by-partition-key
In Redis, using a hash, I need to store a hash key with multiple fields and values.
I tried as below:
client.hmset("Table1", "Id", "9324324", "ReqNo", "23432", redis.print);
client.hmset("Table1", "Id", "9324325", "ReqNo", "23432", redis.print);
var arrrep = new Array();
client.hgetall("Table1", function (err, rep) {
    console.log(rep);
});
Output is: { Id: '9324325', ReqNo: '23432' }
I am getting only one value. How do I get all fields and values in the hash key? Kindly point out where I went wrong. Thanks.
You are getting one value because you overwrite the previous value.
client.hmset("Table1", "Id", "9324324", "ReqNo", "23432", redis.print);
This adds Id, ReqNo to the Table1 hash object.
client.hmset("Table1", "Id", "9324325", "ReqNo", "23432", redis.print);
This overwrites Id and ReqNo for the Table1 hash object. At this point, you still have only two fields in the hash.
Actually, your problem comes from the fact that you are trying to map a relational database model onto Redis. You should not. With Redis, it is better to think in terms of data structures and access paths.
You need to store one hash object per record. For instance:
HMSET Id:9324324 ReqNo 23432 ... and some other properties ...
HMSET Id:9324325 ReqNo 23432 ... and some other properties ...
Then, you can use a set to store the IDs:
SADD Table1 9324324 9324325
Finally, to retrieve the ReqNo data associated with the Table1 collection:
SORT Table1 BY NOSORT GET # GET Id:*->ReqNo
If you also want to search for all the IDs which are associated with a given ReqNo, then you need another structure to support this access path:
SADD ReqNo:23432 9324324 9324325
So you can get the list of IDs for ReqNo 23432 by using:
SMEMBERS ReqNo:23432
In other words, do not try to transpose a relational model: just create your own data structures supporting your use cases.
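Put together with node_redis (a minimal sketch mirroring the commands above, reusing the client and redis.print style from the question):
// One hash per record (plus any other properties), keyed by Id.
client.hmset("Id:9324324", "ReqNo", "23432", redis.print);
client.hmset("Id:9324325", "ReqNo", "23432", redis.print);

// Sets provide the access paths: all IDs of the collection, and IDs per ReqNo.
client.sadd("Table1", "9324324", "9324325");
client.sadd("ReqNo:23432", "9324324", "9324325");

// The SORT trick returns a flat [id1, reqNo1, id2, reqNo2, ...] array.
client.sort("Table1", "BY", "nosort", "GET", "#", "GET", "Id:*->ReqNo",
    function (err, reply) {
        console.log(reply);
    });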