How can I update the expiration of a document in Couchbase using Python 3?

We have a lot of docs in Couchbase with expiration = 0, which means that documents stay in Couchbase forever. I am aware that INSERT/UPDATE/DELETE isn't supported by N1QL.
We have 500,000,000 such docs and I would like to do this in parallel using chunks/bulks. How can I update the expiration field using Python 3?
I am trying this:
bucket.touch_multi(('000c4894abc23031eed1e8dda9e3b120', '000f311ea801638b5aba8c8405faea47'), ttl=10)
However I am getting an error like:
_NotFoundError_0xD (generated, catch NotFoundError): <Key=u'000c4894abc23031eed1e8dda9e3b120'

I just tried this:
from couchbase.cluster import Cluster
from couchbase.cluster import PasswordAuthenticator
cluster = Cluster('couchbase://localhost')
authenticator = PasswordAuthenticator('Administrator', 'password')
cluster.authenticate(authenticator)
cb = cluster.open_bucket('default')
keys = []
for i in range(10):
    keys.append("key_{}".format(i))
# Make sure every key exists before touching it
for key in keys:
    cb.upsert(key, {"some":"thing"})
print(cb.touch_multi(keys, ttl=5))
and I get no errors, just a dictionary of keys and OperationResults. And they do in fact expire soon thereafter. I'd guess some of your keys are not there.
However, maybe you'd really rather set a bucket expiry? That will make all the documents expire in that time, regardless of what the expiry on the individual documents is. In addition to the above answer that mentions that, check out this for more details.

You can use the Bucket.touch() method of the Couchbase Python SDK (the other SDKs have an equivalent), described here: https://docs.couchbase.com/python-sdk/current/document-operations.html#modifying-expiraton
If you don't know the document keys, you can use a covering N1QL index to fetch the keys asynchronously from your Python code, and then call the touch API above to set the expiration.
CREATE INDEX ix1 ON bucket(META().id) WHERE META().expiration = 0;
SELECT RAW META().id
FROM bucket WHERE META().expiration = 0 AND META().id LIKE "a%";
You can issue different SELECTs for different key ranges and run them in parallel.
There is no ready-made update operation for this; you need to write one. As you get each key, call bucket.touch() instead of doing an update. It changes only the document's expiration without modifying the document itself, which saves a get/put of the whole document (https://docs.couchbase.com/python-sdk/current/core-operations.html#setting-document-expiration).
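The chunked, parallel approach described above can be sketched as follows. This is only a sketch under assumptions: touch_batch is a hypothetical callable supplied by the caller (with the SDK from the question it could wrap bucket.touch_multi(batch, ttl=...)), and chunk_size and workers are illustrative values. Whether one bucket object may be shared between threads depends on the SDK version and its lock mode, so a connection per worker may be needed.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def chunked(keys, size):
    """Yield lists of at most `size` keys from any iterable."""
    iterator = iter(keys)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def touch_in_chunks(keys, touch_batch, chunk_size=1000, workers=8):
    """Run touch_batch(batch) for every chunk of keys on a thread pool.

    touch_batch is a hypothetical, caller-supplied callable, for example
        lambda batch: bucket.touch_multi(batch, ttl=3600)
    Returns the number of keys handed to touch_batch.
    """
    # All chunks are materialized here, so pass one key range at a time
    # (for example the result of one SELECT) rather than every key at once.
    batches = list(chunked(keys, chunk_size))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() drains the results so an exception in any batch surfaces here.
        list(pool.map(touch_batch, batches))
    return sum(len(batch) for batch in batches)
```

Keys that no longer exist make the touch fail, as in the NotFoundError from the question, so touch_batch is the natural place to catch and log that error.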

Related

Google Cloud Datastore Cursor with google.cloud.ndb

I am working with Google Cloud Datastore using the latest google.cloud.ndb library
I am trying to implement pagination using a Cursor with the following code.
It is not fetching the data correctly.
[1] To Fetch Data:
query_01 = MyModel.query()
f = query_01.fetch_page_async(limit=5)
This code works fine and fetches 5 entities from MyModel
I want to implement pagination that can be integrated with a web frontend
[2] To Fetch Next Set of Data
from google.cloud.ndb._datastore_query import Cursor
nextpage_value = "2"
nextcursor = Cursor(cursor=nextpage_value.encode()) # Converts to bytes
query_01 = MyModel.query()
f = query_01.fetch_page_async(limit=5, start_cursor= nextcursor)
[3] To Fetch Previous Set of Data
previouspage_value = "1"
prevcursor = Cursor(cursor=previouspage_value.encode())
query_01 = MyModel.query()
f = query_01.fetch_page_async(limit=5, start_cursor=prevcursor)
Code blocks [2] and [3] do not fetch paginated data; they return the same results as code block [1].
Please note I'm working with Python 3 and using the
latest "google.cloud.ndb" Client library to interact with Datastore
I have referred to the following link https://github.com/googleapis/python-ndb
I am new to Google Cloud, and appreciate all the help I can get.
Firstly, it seems to me like you are expecting to use the wrong kind of pagination. You are trying to use numeric values, whereas the datastore cursor is providing cursor-based pagination.
Instead of passing in byte-encoded integer values (like 1 or 2), the datastore is expecting tokens that look similar to this: 'CjsSNWoIb3Z5LXRlc3RyKQsSBFVzZXIYgICAgICAgAoMCxIIQ3ljbGVEYXkiCjIwMjAtMTAtMTYMGAAgAA=='
You can obtain such a cursor from the first call to the fetch_page() method, which returns a tuple:
(results, cursor, more) where results is a list of query results, cursor is a cursor pointing just after the last result returned, and more indicates whether there are (likely) more results after that
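Secondly, you should be calling fetch_page() instead of fetch_page_async() here. fetch_page_async() returns a future rather than the (results, cursor, more) tuple, so the cursor you need for pagination is only available once that future has been resolved. Internally, fetch_page() calls fetch_page_async() to get your query results and returns the resolved tuple.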
Thirdly and lastly, I am not entirely sure whether the "previous page" use-case is doable using the datastore-provided pagination. It may be that you need to implement that yourself manually, by storing some of the cursors.
I hope that helps and good luck!
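Storing cursors yourself to support a "previous page" button could look roughly like the sketch below. The Pager class and the fetch_page callable are hypothetical names used for illustration; fetch_page(limit, start_cursor) is assumed to return the same (results, cursor, more) tuple described above. With google.cloud.ndb it could wrap query.fetch_page(limit, start_cursor=start_cursor). The in-memory fake only stands in for a real query so the sketch can run without Datastore.

```python
import base64


class Pager:
    """Remembers the start cursor of each visited page (hypothetical helper)."""

    def __init__(self, fetch_page, limit):
        self._fetch_page = fetch_page
        self._limit = limit
        self._starts = [None]  # start cursor per page; None means "from the top"
        self._index = -1       # index of the page currently shown
        self.more = True

    def _load(self, index):
        results, cursor, more = self._fetch_page(self._limit, self._starts[index])
        self._index = index
        self.more = more
        # Record where the following page starts, if that is not known yet.
        if more and index + 1 == len(self._starts):
            self._starts.append(cursor)
        return results

    def next_page(self):
        if self._index + 1 >= len(self._starts):
            return []  # no further page is known
        return self._load(self._index + 1)

    def previous_page(self):
        if self._index <= 0:
            return []  # already on the first page
        return self._load(self._index - 1)


def make_fake_fetch_page(items):
    """In-memory stand-in for a query; its cursors are opaque tokens."""
    def fetch_page(limit, start_cursor=None):
        offset = 0
        if start_cursor:
            offset = int(base64.urlsafe_b64decode(start_cursor).decode())
        results = items[offset:offset + limit]
        end = offset + len(results)
        cursor = base64.urlsafe_b64encode(str(end).encode())
        return results, cursor, end < len(items)
    return fetch_page


pager = Pager(make_fake_fetch_page(list(range(12))), limit=5)
print(pager.next_page())      # first page
print(pager.next_page())      # second page
print(pager.previous_page())  # back to the first page
```

In a web frontend the cursor list would have to live in the session or be passed back and forth as URL-safe strings rather than in a Python object.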

Generating a unique key for dynamodb within a lambda function

DynamoDB does not have the option to automatically generate a unique key for you.
In examples I see people creating a uid out of a combination of fields, but is there a way to create a unique ID for data which does not have any combination of values that can act as a unique identifier? My question is specifically aimed at Lambda functions.
One option I see is to create a uuid based on the timestamp with a counter at the end, insert it (or check if it exists) and in case of duplication retry with an increment until success. But, this would mean that I could potentially run over the execution time limit of the lambda function without creating an entry.
If you are using Node.js 8.x, you can use uuid module.
var AWS = require('aws-sdk'),
    uuid = require('uuid'),
    documentClient = new AWS.DynamoDB.DocumentClient();
[...]
Item:{
    "id":uuid.v1(),
    "Name":"MyName"
},
If you are using Node.js 10.x, you can use awsRequestId without uuid module.
var AWS = require('aws-sdk'),
    documentClient = new AWS.DynamoDB.DocumentClient();
[...]
Item:{
    "id":context.awsRequestId,
    "Name":"MyName"
},
The UUID package available on NPM does exactly that.
https://www.npmjs.com/package/uuid
You can choose between 4 different generation algorithms:
V1 Timestamp
V3 Namespace (MD5 hash)
V4 Random
V5 Namespace (SHA-1 hash)
This will give you:
"A UUID [that] is 128 bits long, and can guarantee uniqueness across
space and time." - RFC4122
The generated UUID will look like this: 1b671a64-40d5-491e-99b0-da01ff1f3341
If it's too long, you can always encode its 16 raw bytes in Base64 to get G2caZEDVSR6ZsNoB/x8zQQ, but you'll lose the ability to manipulate your data through the timing and namespace information contained in the raw UUID (which might not matter to you).
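As a rough illustration of the Base64 shortening, here is a small Python sketch using only the standard library. The helper names shorten and expand are made up for this example. Standard Base64 can contain "/" and "+", so base64.urlsafe_b64encode may be a better fit if the ID ends up in URLs.

```python
import base64
import uuid


def shorten(value):
    """Encode a UUID's 16 raw bytes as Base64 and drop the '==' padding."""
    return base64.b64encode(value.bytes).rstrip(b"=").decode("ascii")


def expand(short):
    """Restore the padding and rebuild the original UUID."""
    return uuid.UUID(bytes=base64.b64decode(short + "=="))


original = uuid.UUID("1b671a64-40d5-491e-99b0-da01ff1f3341")
short = shorten(original)
print(len(str(original)), len(short))  # 36 22
print(expand(short) == original)       # True
```

The encoding is reversible, so nothing is lost as long as the consumer knows to decode the value before inspecting it.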
awsRequestId appears to be a version 4 (random) UUID. The snippet below logs it:
exports.handler = function(event, context, callback) {
    console.log('remaining time =', context.getRemainingTimeInMillis());
    console.log('functionName =', context.functionName);
    console.log('AWSrequestID =', context.awsRequestId);
    callback(null, context.functionName);
};
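For readers on the Python runtime, the same idea can be sketched as below. The Lambda context object in Python exposes the request ID as aws_request_id; the helper new_item_id, the fallback to uuid4, and the item layout are assumptions made for this example, and the actual DynamoDB write is left out.

```python
import uuid
from types import SimpleNamespace


def new_item_id(context=None):
    """Use the invocation's request ID if present, else a random UUID."""
    request_id = getattr(context, "aws_request_id", None)
    return request_id or str(uuid.uuid4())


def handler(event, context):
    item = {"id": new_item_id(context), "Name": event.get("Name", "MyName")}
    # A real function would write `item` to DynamoDB at this point.
    return item


# Local check with a stand-in context object (Lambda supplies the real one).
fake_context = SimpleNamespace(aws_request_id="example-request-id")
print(handler({"Name": "MyName"}, fake_context))
```

The request ID yields only one value per invocation, so a function that writes several items in one call still needs uuid4 (or similar) for the others.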
In case you want to generate this yourself, you can still use https://www.npmjs.com/package/uuid or Ulide (slightly better in performance) to generate different versions of UUID based on RFC-4122
For Go developers, you can use these packages from Google's UUID, Pborman, or Satori. Pborman is better in performance, check these articles and benchmarks for more details.
More Info about Universal Unique Identifier Specification could be found here.
We use the idgen npm package to create IDs. You can increase or decrease the ID length depending on how many IDs you expect to generate.
https://www.npmjs.com/package/idgen
We prefer this over UUIDs or GUIDs since those use only hexadecimal digits. DynamoDB stores a GUID/UUID as a string of characters anyway, and with idgen you can create more IDs with fewer collisions using fewer characters, since each character covers a larger range of values.
Hope it helps.
EDIT1:
Note! As of idgen 1.2.0, IDs of 16+ characters will include a 7-character prefix based on the current millisecond time, to reduce likelihood of collisions.
If you are using the Node.js runtime, you can use this:
const crypto = require("crypto")
const uuid = crypto.randomUUID()
or
import { randomUUID } from 'crypto'
const uuid = randomUUID()
Here is a better solution.
This logic can be built without any library, because importing a Lambda function layer can get difficult sometimes. Below you can find the link for the code, which generates the unique ID and saves it in an SQS queue rather than in a DB, which would incur the cost of writing, fetching, and deleting the IDs.
There is also a cloudformation template provided, which you can go and deploy in your account, and it will setup the whole application. A detailed explanation is provided in the link.
Please refer to the link below.
https://github.com/tanishk97/UniqueIdGeneration_AWS_CFT/wiki

Create a Couchbase Document without Specifying an ID

Is it possible to insert a new document into a Couchbase bucket without specifying the document's ID? I would like to use Couchbase's Java SDK to create a document and have Couchbase determine the document's UUID, with Groovy code similar to the following:
import com.couchbase.client.java.CouchbaseCluster
import com.couchbase.client.java.Cluster
import com.couchbase.client.java.Bucket
import com.couchbase.client.java.document.JsonDocument
import com.couchbase.client.java.document.json.JsonObject
// Connect to localhost
CouchbaseCluster myCluster = CouchbaseCluster.create()
// Connect to a specific bucket
Bucket myBucket = myCluster.openBucket("default")
// Build the document
JsonObject person = JsonObject.empty()
    .put("firstname", "Stephen")
    .put("lastname", "Curry")
    .put("twitterHandle", "#StephenCurry30")
    .put("title", "First Unanimous NBA MVP")
// Create the document
JsonDocument stored = myBucket.upsert(JsonDocument.create(person));
No, Couchbase documents have to have a key, that's the whole point of a key-value store, after all. However, if you don't care what the key is, for example, because you retrieve documents through queries rather than by key, you can just use a uuid or any other unique value when creating the document.
It seems there is no way to have Couchbase generate the document IDs for me. At the suggestion of another developer, I am using UUID.randomUUID() to generate the document IDs in my application. The approach is working well for me so far.
Reference: https://forums.couchbase.com/t/create-a-couchbase-document-without-specifying-an-id/8243/4
As you already found out, generating a UUID is one approach.
If you want to generate a more meaningful ID, for instance a "foo" prefix followed by a sequence number, you can make use of atomic counters in Couchbase.
The atomic counter is a document that contains a long, on which the SDK relies to guarantee a unique, incremented value each time you call bucket.counter("counterKey", 1, 2). This code would take the value of the counter document "counterKey", increment it by 1 atomically and return the incremented value. If the counter doesn't exist, it is created with the initial value 2, which is the value returned.
This is not automatic, but a Couchbase way of creating sequences / IDs.
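The counter-based ID scheme can be illustrated with the Python sketch below. InMemoryCounter is a stand-in written for this example, not a Couchbase API; it only mimics the semantics described above (create with the initial value if missing, otherwise increment atomically). The "::" separator and the function name next_document_id are likewise assumptions. In a real application the store argument would be the bucket, and the SDK returns a result object from which the numeric value has to be read.

```python
import threading


class InMemoryCounter:
    """Stand-in for an atomic counter document; NOT a Couchbase API."""

    def __init__(self):
        self._values = {}
        self._lock = threading.Lock()

    def counter(self, key, delta, initial):
        with self._lock:
            if key not in self._values:
                # First use: create the counter and return the initial value.
                self._values[key] = initial
            else:
                self._values[key] += delta
            return self._values[key]


def next_document_id(store, prefix, counter_key="counterKey"):
    """Build an ID such as 'foo::2' from the next counter value."""
    sequence = store.counter(counter_key, 1, 2)
    return "{}::{}".format(prefix, sequence)


store = InMemoryCounter()
print(next_document_id(store, "foo"))  # foo::2
print(next_document_id(store, "foo"))  # foo::3
```

Every new ID costs one extra round trip to the counter document, which is the trade-off against generating a UUID locally.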

Dropped and Recreated ArangoDB Databases Retain Collections

A deployment script creates and configures databases, collections, etc. The script includes code to drop databases before beginning so testing them can proceed normally. After dropping the database and re-adding it:
var graphmodule = require("org/arangodb/general-graph");
var graphList = graphmodule._list();
var dbList = db._listDatabases();
for (var j = 0; j < dbList.length; j++) {
    if (dbList[j] == 'myapp')
        db._dropDatabase('myapp');
}
db._createDatabase('myapp');
db._useDatabase('myapp');
db._create('appcoll'); // Collection already exists error occurs here
The collections that had previously been added to mydb remain in mydb, but they are empty. This isn't exactly a problem for my particular use case since the collections are empty and I had planned to rebuild them anyway, but I'd prefer to have a clean slate for testing and this behavior seems odd.
I've tried closing the shell and restarting the database between the drop and the add, but that didn't resolve the issue.
Is there a way to cleanly remove and re-add a database?
The collections should be dropped when db._dropDatabase() is called.
However, if you run db._dropDatabase('mydb'); directly followed by db._createDatabase('mydb'); and then retrieve the list of collections via db._collections(), this will show the collections from the current database (which is likely the _system database if you were able to run the commands).
That means you are probably looking at the collections in the _system database all the time unless you change the database via db._useDatabase(name);. Does this explain it?
ArangoDB stores additional information for managed graphs. Therefore, when working with named graphs, you should use the graph management functions to delete graphs, to make sure nothing remains in the system:
var graph_module = require("org/arangodb/general-graph");
graph_module._drop("social", true);
The current implementation of the graph viewer in the management interface stores your view preferences (like the attribute that should become the label of a graph) in your browser's local storage, so that is out of reach of these functions.

Neo4j-php retrieve node

I have been using only the Cypher queries of this client for Neo4j because there is no out-of-the-box way of doing many things. One of those is getting nodes: there is no way to retrieve them without knowing their id, which is very low level. Any idea how to run something like
$client->findOne('property','value');
?
It should be straightforward, but the documentation doesn't make it clear.
Create indexes on the properties you want to search. For a newly created $personNode:
$personIndex = new \Everyman\Neo4j\NodeIndex($client, 'person');
$personIndex->add($personNode, 'name', $personNode->name);
Later, to search, a new PHP object $personIndex created the same way will reference the same populated index as above.
$personIndex = new \Everyman\Neo4j\NodeIndex($client, 'person');
$match = $personIndex->findOne('name', 'edoceo');
