Azure Cosmos DB SQL API - read/query document with unknown data structure

I have a Cosmos DB with a container that contains documents with varying structure.
I am using the Java SQL API for reading the documents from this container.
The issue I am having is that the API methods for querying/reading the container expect a model class as an input parameter and return instances of that model class. Because my container contains documents with varying fields and depth, it is not possible for me to create a single model class to represent them.
I need to be able to read/query the documents and then parse them myself to extract the values I am looking for.
Any ideas? I have used Object as the type in the API methods, e.g. queryItems, and then it returns a LinkedHashMap that I can parse myself. Is this the way to do it? It feels a bit "raw", but I have not found a better way.
Below is a typical example from the SDK documentation. I cannot create a "Family" model class in my code, because the structure can vary from document to document, both in which fields are stored and in their depth.
private void queryItems() {
    CosmosQueryRequestOptions queryOptions = new CosmosQueryRequestOptions();
    queryOptions.setQueryMetricsEnabled(true);

    CosmosPagedIterable<Family> familiesPagedIterable = container.queryItems(
        "SELECT * FROM Family WHERE Family.lastName IN ('Andersen', 'Wakefield', 'Johnson')",
        queryOptions, Family.class);

    familiesPagedIterable.iterableByPage(10).forEach(cosmosItemPropertiesFeedResponse -> {
        logger.info("Got a page of query result with {} items(s) and request charge of {}",
            cosmosItemPropertiesFeedResponse.getResults().size(),
            cosmosItemPropertiesFeedResponse.getRequestCharge());
        logger.info("Item Ids {}", cosmosItemPropertiesFeedResponse
            .getResults()
            .stream()
            .map(Family::getId)
            .collect(Collectors.toList()));
    });
}
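For reference, here is a minimal sketch of the Object/LinkedHashMap approach mentioned above (the query, the field names and the logger are illustrative, and java.util.Map is assumed to be imported):

private void queryItemsAsMaps() {
    CosmosQueryRequestOptions queryOptions = new CosmosQueryRequestOptions();

    CosmosPagedIterable<Object> items = container.queryItems(
        "SELECT * FROM c", queryOptions, Object.class);

    items.forEach(item -> {
        // Without a model class, Jackson deserializes each JSON object into a LinkedHashMap
        @SuppressWarnings("unchecked")
        Map<String, Object> doc = (Map<String, Object>) item;
        logger.info("id = {}, lastName = {}", doc.get("id"), doc.get("lastName"));
    });
}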

Per my understanding, this is determined by the SDK method's input parameters and return type, and indeed the sample code for both Java and Spring depends on a data model class. So using Object in your code is a reasonable approach, given the varying documents.
It's true that we can't design one data model to contain all the properties across the documents, but I think it's also a good idea to define a model that contains only the properties a given query requires; a property that a query does not need should simply be excluded from that query's model.
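For example, if a particular query only needs the id and lastName, a query-scoped model could look like this (a sketch only; the class and property names are illustrative):

public class FamilyNameResult {
    private String id;
    private String lastName;

    public String getId() { return id; }
    public void setId(String id) { this.id = id; }
    public String getLastName() { return lastName; }
    public void setLastName(String lastName) { this.lastName = lastName; }
}

// Project only the properties this query needs and map them onto the small model
CosmosPagedIterable<FamilyNameResult> results = container.queryItems(
    "SELECT c.id, c.lastName FROM c", new CosmosQueryRequestOptions(), FamilyNameResult.class);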

I think I found the proper solution:
Create a model class and define the members with unknown depth and structure as JsonNode (Jackson).
The model class can then be used as usual, and the values inside the JsonNode members can be accessed through its convenient accessor methods.
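A minimal sketch of that approach, assuming an illustrative wrapper class and documents that have a stable id plus an arbitrarily nested payload section (the SDK uses Jackson, so a com.fasterxml.jackson.databind.JsonNode member simply receives the corresponding JSON subtree):

public class FlexibleDocument {
    private String id;          // known, stable property
    private JsonNode payload;   // unknown fields and depth kept as a raw JSON tree

    public String getId() { return id; }
    public void setId(String id) { this.id = id; }
    public JsonNode getPayload() { return payload; }
    public void setPayload(JsonNode payload) { this.payload = payload; }
}

CosmosPagedIterable<FlexibleDocument> docs = container.queryItems(
    "SELECT * FROM c", new CosmosQueryRequestOptions(), FlexibleDocument.class);

docs.forEach(doc -> {
    // Navigate the unknown part with JsonNode's accessors
    JsonNode street = doc.getPayload().path("address").path("street");
    if (!street.isMissingNode()) {
        logger.info("street = {}", street.asText());
    }
});

If even the top-level structure varies, querying straight into JsonNode (passing JsonNode.class as the target type) should also work and avoids the wrapper class entirely.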

Related

create dynamic object in GO operator controller

I am following the blog below, which explains how to create an operator and import another CR into an existing one.
http://heidloff.net/article/accessing-third-party-custom-resources-go-operators/
Here, https://github.com/nheidloff/operator-sample-go/blob/aa9fd15605a54f712e1233423236bd152940f238/operator-application/controllers/application_controller.go#L276, the spec is created with hardcoded properties.
I want to import the spark operator types in my operator.
https://github.com/GoogleCloudPlatform/spark-on-k8s-operator/blob/master/pkg/apis/sparkoperator.k8s.io/v1beta2/types.go
This Spark operator has, say, 100+ types/properties. By following the above blog I could create the Go object, but it would be hardcoded. I want to create the object dynamically based on user-provided values in the CR YAML, e.g. a customer might provide 25 attributes, sometimes 50, for a Spark app. I need a dynamic object created based on the user's YAML. Can anybody please help me out?
If you set the spec type to be a JSON object, the Spec can contain arbitrary JSON/YAML. You don't have to have a strongly typed Spec object; your operator can decode it and do whatever you want with it during your reconcile operation, as long as you can serialize and deserialize it from JSON. You should be able to set it to json.RawMessage, I think.
What do you mean by hardcoded properties?
If I understood it correctly, you want to define an API for a resource which uses both types from an external operator and your own custom types. You can extend your API using the external types for specific properties, such as ScheduledSparkApplicationSpec from the Spark operator. Here is an example API definition in Go:
type MyKindSpec struct {
    // using external third party api (you need to import it)
    SparkAppTemplate v1beta2.ScheduledSparkApplicationSpec `json:"sparkAppTemplate,omitempty"`

    // using kubernetes core api (you need to import it)
    Container v1.Container `json:"container,omitempty"`

    // using custom types
    MyCustomType MyCustomType `json:"myCustomType,omitempty"`
}

type MyCustomType struct {
    FirstField  string `json:"firstField,omitempty"`
    SecondField []int  `json:"secondField,omitempty"`
}

How to update a Cosmos DB item from a Node JS app

I have a Cosmos Document DB with many items in it and I'd like to update a single item.
I've tried doing it like this:
const { resources: updatedItem } = await container.item(existingType).replace(newItem)
But I get this error:
Illegal characters ['/', '\', '?', '#'] cannot be used in resourceId
Upon further research, it's because the _rid is being read from the existingType and contains patterns like AAAAA==/
What is the best practice for dealing with this?
According to the API documentation:
Read the item's definition. Any provided type, T, is not necessarily enforced by the SDK. You may get more or less properties and it's up to your logic to enforce it. If the type, T, is a class, it won't pass typeof comparisons, because it won't have a match prototype. It's recommended to only use interfaces.
There is no set schema for JSON items. They may contain any number of custom properties.
You can create an entity class so that you can strip out the system properties (such as _rid) before replacing the item.

How to define a function to get a field element of a Marshmallow Schema while still serialising through a nested Schema

So I have a nested many Schema (e.g. Users) inside another Schema (e.g. Computer). My input object to be serialised by the Schema is complex and does not allow for assignment, and modifying it to allow for assignment is impractical.
The input object (e.g. ComputerObject) itself does not contain a value called "Users", but nested a few objects down is a function that can get the users (e.g. ComputerObject.OS.Accounts.getUsers()), and I want the output of that function to be used as the value for that field in the schema.
Two possible solutions exist that I know of: I could either define the field as fields.Method(<call the function here>), or I could use a @post_dump hook to call the function and add the result to the final output JSON, since it provides both the initial object and the output JSON.
The issue with both of these is that the result is then not serialised through the nested Users schema (which contains more nested schemas, and so on); the field would just be set to the return value of getUsers, which I don't want.
I have tried to define it in a pre-dump so that it can then be serialised in the dump (note: this schema is used only for dumping and not for loading), but as that takes in the initial object I cannot assign to it.
Basically, I have something I am trying to do and a bunch of hacky workarounds that could make it work, but not without breaking other things or missing out on the validation altogether, and no actual solution it seems. Does anybody know how to do this properly?
For further info, the input object is a complex Django model, which might give me some avenues I'm not aware of; my Django experience is somewhat lacking.
So I figured this out myself eventually:
Instead of managing the data-getting in the main schema, you can define the method in the sub-schema using pre_dump with pass_many=True; the following code then works correctly:
class User(Schema):
    id = fields.UUID()

    @pre_dump(pass_many=True)
    def get_data(self, data, **kwargs):
        data = data.Accounts.getUsers()
        return data

class Computer(Schema):
    # The field needs to be called "OS" so that it looks in the "OS" attribute for further data
    OS = fields.Nested(User, many=True, data_key="users")

groovy domain objects in Db4O database

I'm using db4o with Groovy (actually Griffon). I'm saving dozens of objects into a db4o ObjectSet and see that the .yarv file size is about 11 MB. I checked its content and found that it stores the metaClass with all nested fields for every object, which is a waste of space.
I'm looking for a way to avoid storing the metaClass and thereby reduce the size of the resulting .yarv file, since I'm going to use db4o to store millions of entities.
Should I try the callConstructors(true) db4o configuration? Do you think it would help?
Any help would be highly appreciated.
As an alternative, you can just store Groovy-bean instances. Those are compiled down to regular Java-style classes with no Groovy-specific code attached to them.
Just like this:
class Customer {
    // properties
    Integer id
    String name
    Address address
}

class Address {
    String street
}

def customer = new Customer(id:1, name:"Gromit", address:new Address(street:"Fun"))
I don't know Groovy, but based on your description every Groovy object carries a metaClass object and you want to skip storing those objects.
If that is the case, installing a "null translator" (the TNull class) will cause the "translated" objects not to be stored.
PS: the callConstructors configuration has no effect on what gets stored in the db; it only affects how objects are instantiated when reading from the db.
Hope this helps

How to Inner Join a UDF with parameters using SubSonic

I need to query a table using FreeTextTable (because I need ranking) with SubSonic. AFAIK, SubSonic doesn't support full-text search, so I ended up creating a simple UDF (a table-valued function) which takes 2 params (the keywords to search for and the max number of results).
Now, how can I inner join the main table with this FreeTextTable?
InlineQuery is not an option.
Example:
table ARTICLE with fields Id, ArticleName, Author, ArticleStatus.
The search can be done by one or more of the following fields: ArticleName (full-text), Author (another full-text search, but with different search keywords), and ArticleStatus (an int).
Actually the query is far more complex and has other joins (depending on user choice).
If SubSonic cannot handle this situation, probably the best solution is good old plain SQL (then there would be no need to create a UDF either).
Thanks for your help
PS: will SubSonic 3.0 handle this situation?
3.0 can do this for you, but you'd need to make a template for it since we don't handle functions (yet) out of the box. I'll be working on this in the coming weeks; for now I don't think 2.2 will do this for you.
I realize your question is more complex than this, but you can get results from a table-valued function via SubSonic 2.2 with a little massaging.
Copy the .cs file from one of your generated views into a safe folder, and then change all the properties to match the columns returned by your UDF.
Then, on your collection, add a constructor method with your parameters and have it execute an InlineQuery.
public partial class UDFSearchCollection
{
    public UDFSearchCollection() { }

    public UDFSearchCollection(string keyword, int maxResults)
    {
        UDFSearchCollection coll = new InlineQuery().ExecuteAsCollection<UDFSearchCollection>(
            "select resultID, resultColumn from dbo.udfSearch(@keyword, @maxResults)",
            keyword, maxResults);
        coll.CopyTo(this);
        coll = null;
    }
}

public partial class UDFSearch : ReadOnlyRecord<UDFSearch>, IReadOnlyRecord
{
    // all the methods for a read-only record go here
    ...
}
An inner join would be a little more difficult because the table object doesn't have its own parameters collection. But it could...

Resources