Using set as Primary Key in DynamoDB Index - python-3.x

I have created a DynamoDB model where one attribute is a set built from a combination of three separate IDs and another attribute holds a timestamp. The idea was to create a GSI on these two, with the set attribute as the partition key and the timestamp as the sort key. When I use the equality operator in the KeyConditionExpression, I am unable to fetch the data, and I'm not sure what the issue is. Can somebody tell me whether I am following the right approach or missing something?
Below is a sample value of the set attribute:
{ "291447cb-f7a5-4627-9a7e-ac7b4adf9xce", "21", "d2e5723a-437a-4517-9f4b-1a62575224d6" }

DynamoDB can only use keys of scalar types (a single string, number or binary value). What you could do is concatenate the values into a string for your key (e.g. "291447cb-f7a5-4627-9a7e-ac7b4adf9xce:21:d2e5723a-437a-4517-9f4b-1a62575224d6").
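A minimal Python sketch of the concatenation idea, assuming the three IDs are available as separate values. Because a DynamoDB set has no defined element order, the key is built from the IDs in a fixed order rather than by iterating over the set. The index name `ids-timestamp-index`, the attribute names `ids_key` and `created_at`, and the use of a numeric epoch timestamp are all hypothetical. The function only builds the keyword arguments for boto3's low-level `client.query`, so it runs without AWS access.

```python
SEPARATOR = ":"  # assumption: none of the IDs ever contains this character


def make_ids_key(first_id: str, second_id: str, third_id: str) -> str:
    """Concatenate the three IDs, in a fixed order, into one scalar string key."""
    parts = [first_id, second_id, third_id]
    for part in parts:
        if SEPARATOR in part:
            # A separator inside an ID would make two different triples collide.
            raise ValueError(f"ID {part!r} contains the separator {SEPARATOR!r}")
    return SEPARATOR.join(parts)


def build_gsi_query(table_name: str, ids_key: str, start_ts: int, end_ts: int) -> dict:
    """Keyword arguments for boto3's client.query() against the (hypothetical) GSI."""
    return {
        "TableName": table_name,
        "IndexName": "ids-timestamp-index",
        # Equality on the partition key, range condition on the sort key.
        "KeyConditionExpression": "#pk = :pk AND #sk BETWEEN :start AND :end",
        "ExpressionAttributeNames": {"#pk": "ids_key", "#sk": "created_at"},
        "ExpressionAttributeValues": {
            ":pk": {"S": ids_key},
            ":start": {"N": str(start_ts)},
            ":end": {"N": str(end_ts)},
        },
    }
```

With credentials configured, the result would be passed straight through, e.g. `boto3.client("dynamodb").query(**build_gsi_query("my-table", key, 0, 1700000000))`, where `my-table` is a placeholder. The equality condition only matches if the stored key string is byte-for-byte identical, which is why the ordering and separator are fixed in one function.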
Don't forget that you'd need to store this concatenated key as its own attribute on each item in your table so the GSI can use it, and you'd need to make sure it is updated and kept in sync with the set, as your requirements dictate.
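One way to stop the two attributes from drifting apart, sketched in Python with the same hypothetical attribute names (`ids`, `ids_key`, `created_at`): build every item through a single function that derives both the set and the concatenated key from the same three IDs. It returns the low-level attribute-value format that boto3's `client.put_item(TableName=..., Item=...)` expects.

```python
from typing import Dict

SEPARATOR = ":"  # assumption: none of the IDs contains this character


def build_item(first_id: str, second_id: str, third_id: str, created_at: int) -> Dict[str, dict]:
    """Build a DynamoDB item in which the string set and the derived GSI key
    are always produced together from the same input values."""
    ids = [first_id, second_id, third_id]
    return {
        # Original set attribute; DynamoDB sets may not contain duplicates,
        # so the values are de-duplicated (and sorted for a stable payload).
        "ids": {"SS": sorted(set(ids))},
        # Scalar GSI partition key, derived from the same values in a fixed order.
        "ids_key": {"S": SEPARATOR.join(ids)},
        # GSI sort key, assumed here to be an epoch timestamp stored as a number.
        "created_at": {"N": str(created_at)},
    }
```

Any code path that changes one of the IDs should go through the same function (or through an update that sets both attributes in one UpdateExpression), so the set and the GSI key are never written separately.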

Related

Find unique constraint on a column using jOOQ

I have a service which consumes the database metadata. It consumes the table names along with their respective column names. I can see there is a method called nullable() which can be used to check whether a column is nullable. Similarly, I want to check whether a column allows only unique values.
Version used: 3.14.15
How to read database meta data with jOOQ
The API to use for this kind of schema introspection is org.jooq.Meta, which you can access via DSLContext.meta(). By default, it is backed by the JDBC DatabaseMetaData API, but you can replace that default to work with any of these instead:
Generated code
Interpreted DDL
XML
You can do that by passing a MetaProvider to your Configuration.
Once you have a Table, check these methods:
Table.getKeys() (primary keys and unique keys)
Table.getUniqueKeys() (unique keys only)
Table.getPrimaryKey() (primary key only)

Azure Cosmos DB - Can I use a JSON field that doesn't exist for all documents as my partition key?

I am trying to set up a new Cosmos DB and it's asking me to set a partition key. I think I understand the concept: I should select a JSON field that can group my documents efficiently.
Is it possible to configure the collection to use a JSON field that may not exist in every incoming document?
For example:
{
"name" : "Robin",
"DOB" : "01/01/1969",
"scans" : {
"bloodType" : "O"
}
}
{
"name" : "Bill",
"DOB" : "01/01/1969"
}
Can I use /scans.bloodType as the partition key? For documents that don't have a scans JSON field, I still want to keep that data, as I may update those documents later.
You can, indeed, specify a partition key that might not exist in every document. When you save a document that's missing the property specified by your partition key, it will result in assigning an "undefined" value for its partition key.
In the future, if you wanted to provide a value for the partition key of such a document, you'd have to delete, and then re-add, the document. You cannot modify a property's value when it happens to be the partition key within that document's container (nor can you add the property to an existing document that doesn't explicitly have that property defined, since it's already been assigned the "undefined" value).
See this answer for more details on undefined partition key properties.
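To make the "undefined" behaviour concrete, here is a small Python model (not SDK code) of how a partition key value is resolved from a document. It assumes the nested path is written with slash separators, which is how Cosmos DB expresses nested partition key paths (e.g. /scans/bloodType); `UNDEFINED` is a hypothetical stand-in for the service's undefined value.

```python
from typing import Any

UNDEFINED = object()  # hypothetical stand-in for Cosmos DB's "undefined" partition key value


def resolve_partition_key(document: dict, path: str) -> Any:
    """Walk a slash-separated partition key path such as "/scans/bloodType"."""
    current: Any = document
    for segment in path.strip("/").split("/"):
        if not isinstance(current, dict) or segment not in current:
            # Property missing: the document is stored under the "undefined" value.
            return UNDEFINED
        current = current[segment]
    # An explicit JSON null comes back as None, which is a different value from undefined.
    return current
```

For the two sample documents, Robin resolves to "O" and Bill resolves to `UNDEFINED`. Since the value is fixed when the document is written, giving Bill a blood type later means deleting his document and creating it again, at which point it resolves to the new value.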
Unfortunately you can't do that.
According to the official docs, neither the partition key path (/scans.bloodType) nor the partition key value can be changed:
Be a property that has a value which does not change. If a property is your partition key, you can't update that property's value.
In terms of solutions, you could either try to find another partition key property path and ensure it has a value at the time of creation, or use a secondary collection to store your incomplete documents and use the change feed to "move" them to the final collection once all the data becomes available.
Something like blood type wouldn't be a good partition key: assuming it refers to A/B/O types and so on, there are only a small number of possible values. What you can generally always fall back on is the unique id property of the item, which Cosmos creates automatically. It's fine to have one item per partition if nothing else is a better candidate. Per the docs:
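The staging-collection idea could look roughly like the Python sketch below. It is an assumption-laden outline, not a definitive implementation: `staging` and `final` are expected to behave like azure-cosmos `ContainerProxy` objects (only `upsert_item` and `delete_item` are used), the staging container is assumed to be partitioned on /id, and `changes` is whatever batch of documents your change feed consumer hands you (for example from the SDK's change feed query or an Azure Functions Cosmos DB trigger).

```python
from typing import Any, Iterable, List

_MISSING = object()
# Service-managed system properties that should not be copied into the new document.
_SYSTEM_PROPERTIES = {"_rid", "_self", "_etag", "_attachments", "_ts", "_lsn"}


def _resolve(document: dict, path: str) -> Any:
    """Return the value at a slash-separated path, or _MISSING if absent."""
    current: Any = document
    for segment in path.strip("/").split("/"):
        if not isinstance(current, dict) or segment not in current:
            return _MISSING
        current = current[segment]
    return current


def move_complete_documents(changes: Iterable[dict], staging: Any, final: Any,
                            pk_path: str = "/scans/bloodType") -> List[str]:
    """Copy staged documents that now have a partition key value into the final
    container, then delete them from staging. Returns the ids that were moved."""
    moved = []
    for doc in changes:
        if _resolve(doc, pk_path) is _MISSING:
            continue  # still incomplete; leave it in staging
        body = {k: v for k, v in doc.items() if k not in _SYSTEM_PROPERTIES}
        # Write first, delete second: if the delete fails, the next run simply
        # upserts the same document again instead of losing it.
        final.upsert_item(body=body)
        staging.delete_item(item=doc["id"], partition_key=doc["id"])
        moved.append(doc["id"])
    return moved
```

Upserting rather than creating keeps the move idempotent, which matters because a change feed consumer may see the same document more than once.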
If your container has a property that has a wide range of possible
values, it is likely a great partition key choice. One possible
example of such a property is the item ID. For small read-heavy
containers or write-heavy containers of any size, the item ID is
naturally a great choice for the partition key.

Name the primary key of a container in Cosmos DB SQL API something other than "id"

I am working on Azure and I have created a database with the Cosmos DB SQL API. After creating a container, I see that the primary key is always named "id". Is there any way to create a container whose PK has a name other than "id"?
Every document has a unique id called id. This cannot be altered. If you don't set a value, it is assigned a GUID.
When using methods such as ReadDocument() (or equivalent, based on the SDK) for direct reads instead of queries, a document's id property must be specified.
Now, as far as partitioning goes, you can choose any property you want to use as the partition key.
And if you have additional domain-specific identifiers (maybe a part number), you can always store those in their own properties as well. In this example though, just remember that, while you can query for documents containing a specific part number, you can only do a direct-read via id. If direct reads will be the majority of your read operations, then it's worth considering the use of id to store such a value.
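The difference between a direct read and a query can be sketched with the azure-cosmos Python SDK's `ContainerProxy.read_item` and `ContainerProxy.query_items`. The container is passed in as a parameter so the sketch stays self-contained; the property name `partNumber` and the idea of also storing the part number in `id` are hypothetical, following the example in the paragraph above.

```python
from typing import Any, Dict, List


def read_part_by_id(container: Any, part_number: str, partition_key: Any) -> Dict[str, Any]:
    """Direct (point) read: only possible when the part number is stored in `id`.
    Both the id and the document's partition key value must be supplied."""
    return container.read_item(item=part_number, partition_key=partition_key)


def query_parts_by_number(container: Any, part_number: str) -> List[Dict[str, Any]]:
    """Query on an ordinary property (hypothetical name `partNumber`)."""
    return list(container.query_items(
        query="SELECT * FROM c WHERE c.partNumber = @partNumber",
        parameters=[{"name": "@partNumber", "value": part_number}],
        # Needed when partNumber is not the partition key, since any partition may match.
        enable_cross_partition_query=True,
    ))
```

Point reads are generally the cheapest way to fetch a single document, which is the reason for considering `id` for the value you look up most often.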
PK = primary key, by the way ;) It's the "main" key for the record, it is not private :)
And no, as far as I know you cannot change the primary key name. It is always "id". Specifically it must be in lowercase, "Id" is not accepted.

DocumentDB REST API: x-ms-documentdb-partitionkey is invalid

I am attempting to get a document from DocumentDB using the REST API. I am using a partitioned collection and therefore need to add the "x-ms-documentdb-partitionkey" header. If I add it, I get "Partition key abc is invalid". I can't find anywhere in the documentation that says the key must be in a specific format, but simply supplying the expected string value does not work. Does anyone know the expected format?
Partition key must be specified as an array (with a single element). For example:
x-ms-documentdb-partitionkey: [ "abc" ]
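In Python, the safest way to produce that header value is to JSON-encode a one-element list, which also quotes and escapes string values correctly. This is a small sketch; the other headers a real request needs (authorization, x-ms-date, x-ms-version) are not shown.

```python
import json
from typing import Dict, Union


def partition_key_header(value: Union[str, int, float, bool, None]) -> Dict[str, str]:
    """Header for a request against a partitioned collection: the partition
    key value is sent as a JSON array containing a single element."""
    return {"x-ms-documentdb-partitionkey": json.dumps([value])}
```

For example, `partition_key_header("abc")` yields the header value `["abc"]`, and a numeric key such as 5 yields `[5]`.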
The partition key for a partitioned collection is actually the path to a property in DocumentDB. Thus you would need to specify it in the following format:
/{path to property name} e.g. /department
From Partitioning and scaling in Azure DocumentDB:
You must pick a JSON property name that has a wide range of values and
is likely to have evenly distributed access patterns. The partition
key is specified as a JSON path, e.g. /department represents the
property department.
More examples are listed in the link as well.

RethinkDB: How do I create a custom duplicate check on insert

I want to bulk insert an array of data using NodeJS and RethinkDB, but I don't want to insert existing records (where a record with the same name and value already exists; I don't want to check for duplicates on the primary key id).
[
{name:"Robert", value:"1337"},
{name:"Martin", value:"0"},
{name:"Oskar", value:"1"}
]
If any of the above values already exist, don't insert, but update "value".
My current working solution is to loop through the array and first check whether each record exists using a filter; if it doesn't, I insert it. But it's very slow on 10,000 records.
I don't think RethinkDB has that kind of concept. I read through the docs again: to insert a new document, use insert; to update a field, use update; to replace a whole document with a new one, use replace (the primary key won't change). So I don't think it's possible in RethinkDB.
Here are some ways you can make it run faster:
Create a compound index containing those two fields: name and value
Then use that index to check for existence instead of using filter
Generate your own id field instead of letting RethinkDB generate it. That way you know the primary key and can look up a document with get, which is very fast.
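Generating your own id can be as simple as hashing the fields that define a duplicate. The Python sketch below uses `uuid.uuid5`, so the same (name, value) pair always maps to the same id; the namespace UUID is a hypothetical application constant, and the field values are assumed to be strings. If only `name` should identify a duplicate (so that `value` gets overwritten), pass `key_fields=("name",)`.

```python
import uuid
from typing import Dict, Iterable, List, Tuple

# Hypothetical, fixed namespace for this application's ids; any constant UUID works.
RECORD_NAMESPACE = uuid.UUID("6f1c2a52-1d3e-4b8a-9d55-0c1f9b7e4a10")


def with_deterministic_ids(records: Iterable[Dict[str, str]],
                           key_fields: Tuple[str, ...] = ("name", "value")) -> List[Dict[str, str]]:
    """Return copies of the records with an `id` derived from key_fields, so two
    records with the same key_fields values get the same primary key."""
    result = []
    for record in records:
        # Join with the ASCII unit separator so ("ab", "c") and ("a", "bc") differ.
        key = "\x1f".join(record[field] for field in key_fields)
        result.append({**record, "id": str(uuid.uuid5(RECORD_NAMESPACE, key))})
    return result
```

With ids like these, a duplicate becomes a primary-key collision, and RethinkDB's insert accepts a conflict option for exactly that case, e.g. `r.table("records").insert(docs, conflict="update")` in the Python driver or `.insert(docs, {conflict: "update"})` in JavaScript (the table name is a placeholder). The whole batch then goes to the server in one call.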
I had a similar requirement in a RethinkDB project, but in that case the primary key was being checked for duplicates, and it was also custom instead of being auto-generated.
What you could do is run an async.series or async.waterfall two-step check. First pick a single object from your array, then filter the database for the name-value pair of that object. If the result is empty, the object is unique; if not, there is a pre-existing record with the same details.
Depending on the result, you can then pass control to the next step, which will either insert the new document or update the existing one. It will be simpler if you use a flag for this in async.waterfall.
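The same two-step check can also be done for the whole batch instead of one record at a time, which avoids 10,000 round trips: fetch all matching existing records in one query (for example through a compound index on name and value), then split the batch locally. The Python function below is a sketch of the split step only, written in Python rather than Node; the lookup that produces `existing` is assumed.

```python
from typing import Dict, Iterable, List, Tuple

Record = Dict[str, str]


def split_batch(batch: Iterable[Record], existing: Iterable[Record],
                key_fields: Tuple[str, ...] = ("name", "value")) -> Tuple[List[Record], List[Record]]:
    """Split incoming records into (to_insert, to_update).

    `existing` holds stored documents (each with an `id`) returned by one lookup.
    Records whose key is already stored go to to_update, carrying the stored id;
    the rest go to to_insert.
    """
    def key_of(record: Record) -> Tuple[str, ...]:
        return tuple(record[field] for field in key_fields)

    existing_ids = {key_of(doc): doc["id"] for doc in existing}
    latest: Dict[Tuple[str, ...], Record] = {}
    for record in batch:
        # Duplicates inside the batch: the last occurrence wins, while the
        # output order follows each key's first appearance.
        latest[key_of(record)] = record

    to_insert: List[Record] = []
    to_update: List[Record] = []
    for key, record in latest.items():
        if key in existing_ids:
            to_update.append({**record, "id": existing_ids[key]})
        else:
            to_insert.append(record)
    return to_insert, to_update
```

In the Python driver, `existing` could come from something like `r.table("records").get_all(*keys, index="name_value").run(conn)`, where `keys` is the list of `[name, value]` pairs from the batch and `name_value` is a placeholder for the compound index name. `to_insert` can then be written with a single insert call.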
