Reading Cassandra Map in Node.js

I have a table created using a map in Cassandra. Now I am trying to read the table from Node.js, and it returns an object for the map. Can I get the item count in the map and loop through it to get the items?
Table script:
create table workingteam (teamid bigint primary key, status map)

You did not post many details. First, you will need to study the object Cassandra sends you. A good way to start is to convert it to JSON and dump it to the output through the log:
console.log("Cassandra sent: %j", object);
I'm guessing that in this object you will find attributes like connection parameters, host, client, etc., but also something iterable that contains the keys and values.
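A minimal sketch, assuming the DataStax `cassandra-driver` package: by default that driver decodes a CQL map column into a plain JavaScript object (map keys become property names), and it can be configured through the client's `encoding.map` option to return an ES6 `Map` instead. The table script above lost the map's key and value types, so the text keys and values here are assumptions, and `row` is a hard-coded, hypothetical stand-in for a row returned by `client.execute(...)`.

```javascript
// Count and iterate the entries of a CQL map column.
// Handles both a plain object (the driver's default decoding)
// and an ES6 Map (when the client is created with encoding: { map: Map }).
function mapEntries(value) {
  if (value == null) {
    return []; // an empty or unset map column may come back as null
  }
  if (value instanceof Map) {
    return Array.from(value.entries());
  }
  return Object.entries(value);
}

// Build printable lines: the entry count first, then one line per entry.
function describeStatus(row) {
  const entries = mapEntries(row.status);
  const lines = [`team ${row.teamid}: ${entries.length} status entries`];
  for (const [key, val] of entries) {
    lines.push(`  ${key} = ${val}`);
  }
  return lines;
}

// Hypothetical row, shaped like a result of
// SELECT teamid, status FROM workingteam
const row = { teamid: 42, status: { alice: 'active', bob: 'on leave' } };

for (const line of describeStatus(row)) {
  console.log(line);
}
```

Going through `Object.entries` / `Map.prototype.entries` keeps the loop identical for both decodings, so switching the driver's map encoding later does not require touching this code.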

Related

Converting JSON Data from S3 upload, and using Lambda function to push to DynamoDB

I've been working on an assignment recently and I feel like I'm very close to solving the problem I'm having, but I just can't seem to find anything that would help online.
As the title states, I've got some JSON data being uploaded from a webpage into an S3 bucket. When a new S3 item is created, I want to take that data and store it in a DynamoDB table.
I'm using a Lambda function and testing with some data I've already stored in my S3 bucket. I've got the data in its key-value pairs in my console.logs but I just can't work out why it isn't actually storing the data.
I have the data broken down into key-value pairs, e.g. "artist": "Elvis Presley", using JSON.parse(JSON.stringify(data)).
What I'm wondering, is how to push this data into the table.
var params = {
Item: JSON.parse(JSON.stringify(data)),
ReturnConsumedCapacity: "TOTAL",
TableName: "s3-to-dynamo-s00187306"
};
dynamo.putItem(params, dynamoResultCallback);
The above code is what I've been trying to use, but it's giving me a timeout error. If I bump up the allowed time, I receive a different error about a missing partition key in the item, even though my partition key matches one of the keys in every item.
Really stumped here, any advice is appreciated, thanks in advance.
[edit]
So I used what someone suggested below, the dynamo-db converter, and have some logs which provide some insight into what's going on.
I've now got the data in the correct format for dynamo-db, and each item is parsed correctly as far as I can tell.
As for what dynamo represents, I'm not 100% sure, so I'm going to add a screenshot of its declaration at the top of my code. I think it's the doc client?
[edit 2]
So my "_class" values are all exactly the same; maybe I should try changing the partition key to title instead? (Never mind, this didn't work.)
JSON.stringify(data) returns JSON that does not match the DynamoDB format. DynamoDB expects a format like this:
Item: {
'CUSTOMER_ID' : {N: '001'},
'CUSTOMER_NAME' : {S: 'Richard Roe'}
}
As you can see, the syntax is not the same. I think you need to use another library, maybe dynamo-converters, or look at the Node.js AWS SDK; maybe there is a method that can do this.
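The AWS SDK for JavaScript already ships converters for this: in v2, `AWS.DynamoDB.Converter.marshall(obj)` turns a plain object into the attribute-value format, and in v3 the `marshall` function from `@aws-sdk/util-dynamodb` does the same. Alternatively, `AWS.DynamoDB.DocumentClient` accepts plain objects in its `put` call, so no conversion is needed there. The sketch below is a simplified, hand-rolled version of the conversion, shown only to illustrate the target shape; the `data` record is hypothetical, and the code ignores sets, binary values and `undefined`, so prefer the SDK converters in real code.

```javascript
// Simplified illustration of converting a plain JS value into DynamoDB's
// AttributeValue format. Not a replacement for the SDK converters: it does
// not handle sets, binary values, or undefined attributes.
function toAttributeValue(value) {
  if (value === null) return { NULL: true };
  if (typeof value === 'string') return { S: value };
  if (typeof value === 'number') return { N: String(value) }; // numbers are sent as strings
  if (typeof value === 'boolean') return { BOOL: value };
  if (Array.isArray(value)) return { L: value.map(toAttributeValue) };
  if (typeof value === 'object') return { M: marshallItem(value) };
  throw new TypeError(`Unsupported type: ${typeof value}`);
}

// Convert every top-level attribute of an item.
function marshallItem(obj) {
  const item = {};
  for (const [key, value] of Object.entries(obj)) {
    item[key] = toAttributeValue(value);
  }
  return item;
}

// Hypothetical record parsed from the uploaded S3 JSON.
const data = { artist: 'Elvis Presley', title: 'Hound Dog', year: 1956 };

const params = {
  Item: marshallItem(data),
  ReturnConsumedCapacity: 'TOTAL',
  TableName: 's3-to-dynamo-s00187306',
};
console.log(JSON.stringify(params.Item));
```

Whichever converter is used, the table's partition key attribute must be present in the item with the type the table declares (string vs. number), otherwise DynamoDB rejects the write with a missing-key error.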

Return the item number X in DynamoDB

I would like to provide one piece of content per day, storing all items in DynamoDB. I will add new content from time to time, but only one piece of content needs to be read per day.
It seems it's not recommended to have an incremental ID as the primary key in DynamoDB.
Here is what I have at the moment:
content_table
id, content_title, content_body, content_author, view_count
1b657df9-8582-4990-8250-f00f2194abe9, title_1, body_1, author_1, view_count_1
810162c7-d954-43ff-84bf-c86741d594ee, title_2, body_2, author_2, view_count_2
4fdac916-0644-4237-8124-e3c5fb97b142, title_3, body_3, author_3, view_count_3
New items will be added at a low rate, as I will add new content manually myself.
How can I get item number XX in Node.js without querying the whole database?
Should I switch back to a MySQL database?
Should I use a homemade auto-increment even if it's an anti-pattern?
Should I use a time-based UUID and do a query like: get all IDs, sort them, and take number X in the array?
Should I use a tool like http://www.stateful.co/?
Thanks for your help
I would make the date your hash key, you can then simply get the content from any particular day using GetItem.
date, content_title, content_body, content_author, view_count
20180208, title_1, body_1, author_1, view_count_1
20180207, title_2, body_2, author_2, view_count_2
20180206, title_3, body_3, author_3, view_count_3
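A minimal sketch of building the GetItem request for one day with the date as the hash key. The table name `content_table` comes from the question; storing the date as a string (`S`) rather than a number, and computing the key in UTC, are assumptions.

```javascript
// Build a YYYYMMDD key for a given day. UTC is used so that every
// server agrees on where one day ends and the next begins (an assumption;
// use a fixed local time zone instead if the content is region-specific).
function dayKey(date) {
  const yyyy = String(date.getUTCFullYear());
  const mm = String(date.getUTCMonth() + 1).padStart(2, '0');
  const dd = String(date.getUTCDate()).padStart(2, '0');
  return yyyy + mm + dd;
}

// GetItem request for the content of one day. The date is stored as a
// string here; a number (N) attribute would also work.
function getContentForDay(date) {
  return {
    TableName: 'content_table',
    Key: { date: { S: dayKey(date) } },
  };
}

const request = getContentForDay(new Date(Date.UTC(2018, 1, 8))); // 8 Feb 2018
console.log(JSON.stringify(request));
// With the v2 SDK, this object could be passed to
// new AWS.DynamoDB().getItem(request, callback).
```

Because the key is derived from the calendar date, reading "today's" item is a single key lookup and never needs a scan or a count of existing items.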
If you think you might have more than one piece of content for any one day in the future, you could add a datetime attribute and make it the range key:
date, datetime, content_title, content_body, content_author, view_count
20180208, 20180208101010, title_1, body_1, author_1, view_count_1
20180208, 20180208111111, title_2, body_2, author_2, view_count_2
20180206, 20180206101010, title_3, body_3, author_3, view_count_3
It's then still very fast and simple to execute a Query to get the content for a particular day.
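A minimal sketch of the Query request for the second layout, assuming `date` is the hash key and `datetime` the range key, both stored as strings. As far as I know, `date` is on DynamoDB's reserved-word list, so it is referenced through an expression attribute name placeholder (`#d`) rather than used directly in the key condition.

```javascript
// Query request for every piece of content published on one day.
// Assumes hash key `date` and range key `datetime`, both strings.
function queryContentForDay(day) {
  return {
    TableName: 'content_table',
    // `date` is referenced via the #d placeholder because it is a reserved word.
    KeyConditionExpression: '#d = :day',
    ExpressionAttributeNames: { '#d': 'date' },
    ExpressionAttributeValues: { ':day': { S: day } },
    ScanIndexForward: true, // return items in ascending datetime order
  };
}

const query = queryContentForDay('20180208');
console.log(JSON.stringify(query));
```

The Query touches only the items under one hash key, so its cost depends on how much content that day has, not on the size of the whole table.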
Note that due to the way DynamoDB distributes throughput, if you choose the second option, you might want to archive old content into another table.

Spark DataFrame Filter using Binary (Array[Bytes]) data

I have a DataFrame from a JDBC table hitting MySQL, and I need to filter it using a UUID. The data is stored in MySQL as binary(16), and when queried in Spark it is converted to Array[Byte] as expected.
I'm new to Spark and have been trying various ways to pass a variable of type UUID into the DataFrame's filter method.
I've tried statements like:
val id: UUID = // other logic that looks this up
df.filter(s"id = $id")
df.filter("id = " convertToByteArray(id))
df.filter("id = " convertToHexString(id))
All of these error with different messages.
I just need to somehow pass in Binary types but can't seem to put my finger on how to do so properly.
Any help is greatly appreciated.
After reviewing even more sources online, I found a way to accomplish this without using the filter method.
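When I'm reading from my sparkSession, I just use an ad hoc subquery instead of a table name. Since id is a java.util.UUID, its dashes have to be stripped to form a valid MySQL hex literal:
sparkSession.read.jdbc(connectionString, s"(SELECT id, <other columns omitted> FROM MyTable WHERE id = 0x${id.toString.replace("-", "")}) AS MyTable", props)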
This pre-filters the results for me and then I just work with the data frame as I need.
If anyone knows of a solution using filter, I'd still love to know it as that would be useful in some cases.

Create a Couchbase Document without Specifying an ID

Is it possible to insert a new document into a Couchbase bucket without specifying the document's ID? I would like to use Couchbase's Java SDK to create a document and have Couchbase determine the document's UUID, with Groovy code similar to the following:
import com.couchbase.client.java.CouchbaseCluster
import com.couchbase.client.java.Cluster
import com.couchbase.client.java.Bucket
import com.couchbase.client.java.document.JsonDocument
import com.couchbase.client.java.document.json.JsonObject
// Connect to localhost
CouchbaseCluster myCluster = CouchbaseCluster.create()
// Connect to a specific bucket
Bucket myBucket = myCluster.openBucket("default")
// Build the document
JsonObject person = JsonObject.empty()
.put("firstname", "Stephen")
.put("lastname", "Curry")
.put("twitterHandle", "#StephenCurry30")
.put("title", "First Unanimous NBA MVP")
// Create the document
JsonDocument stored = myBucket.upsert(JsonDocument.create(person));
No, Couchbase documents have to have a key, that's the whole point of a key-value store, after all. However, if you don't care what the key is, for example, because you retrieve documents through queries rather than by key, you can just use a uuid or any other unique value when creating the document.
It seems there is no way to have Couchbase generate the document IDs for me. At the suggestion of another developer, I am using UUID.randomUUID() to generate the document IDs in my application. The approach is working well for me so far.
Reference: https://forums.couchbase.com/t/create-a-couchbase-document-without-specifying-an-id/8243/4
As you already found out, generating a UUID is one approach.
If you want to generate a more meaningful ID, for instance a "foo" prefix followed by a sequence number, you can make use of atomic counters in Couchbase.
The atomic counter is a document that contains a long, on which the SDK relies to guarantee a unique, incremented value each time you call bucket.counter("counterKey", 1, 2). This code would take the value of the counter document "counterKey", increment it by 1 atomically and return the incremented value. If the counter doesn't exist, it is created with the initial value 2, which is the value returned.
This is not automatic, but a Couchbase way of creating sequences / IDs.

Astyanax key range query

I'm trying to write a query that paginates through all rows in a column family using the Astyanax client and RowSliceQuery.
keyspace.prepareQuery(COLUMN_FAMILY).getKeyRange(null, null, null, null, 100);
I've done this successfully using Hector, where the first call is made with null start and end keys. After retrieving the first page, I use the last key from the result to query the second page, and so on. This is the code for the first page using Hector:
HFactory.createRangeSlicesQuery(keyspace,
LongSerializer.get(), new CompositeSerializer(),
BytesArraySerializer.get())
.setColumnFamily(COLUMN_FAMILY)
.setRange(null, null, false, 100).setRowCount(100);
Now, when I try to do this with Astyanax, I get errors about null and non-null keys and tokens. I'm not sure what tokens do in this query. I am also able to use allRows(), but I would like to do this with a key range query because it gives me more flexibility.
Does anybody have an example of a key range query using Astyanax? I cannot find an example either in the "Getting Started" documentation or anywhere else on the net.
Thanks!
Anton
What you are referring to is the getRowRange method:
keyspace.prepareQuery(CF_STANDARD1)
.getRowRange(startKey, endKey, startToken, endToken, count)
Note, however, that this works only when the ByteOrderedPartitioner is used. Since Cassandra uses the Murmur3Partitioner by default, this will usually not work; using an index instead is recommended. Astyanax also provides the reverse index search recipe, which uses a second column family that stores your keys as columns to allow efficient range searches on the original data.
Check this sample code; I hope it helps you with the paging.
IndexQuery<String, String> query = keyspace
.prepareQuery(CF_STANDARD1).searchWithIndex()
.setRowLimit(10).autoPaginateRows(true).addExpression()
.whereColumn("Index2").equals().value(42);
Best,
