Does ArangoDB "know" what attributes exist in a collection? (shapes data)

There's a recipe for sampling documents and determining their structure:
https://docs.arangodb.com/cookbook/AccessingShapesData.html
It states that you can't query the internal shapes data. But examining a sample of documents only approximates which attribute keys are in use; otherwise the entire collection must be scanned.
So my question is: does the database store somewhere internally which attributes exist, at least the common ones?
If yes, why isn't it possible to query that data? It would be far more efficient than a user-defined function that outputs roughly the same information.
It would be great if one could discover schemas "for free":
http://som-research.uoc.edu/tools/jsonDiscoverer/#/

Whenever an attribute name is first used in a collection, ArangoDB stores it internally, so it does keep track of which attributes have been used in a collection. There are a few issues, however:
Attribute names are stored globally, but nested attribute names are stored as separate parts (e.g. user.name is stored as user and name). From the separate name parts alone, ArangoDB cannot tell in which combinations they occur in the data.
Attribute names are stored when they are first used in a collection, but ArangoDB currently does not track when an attribute is no longer used. Such an attribute name stays in the list of attributes even after no document uses it anymore.
Under these restrictions the list of attributes could be made available, but I am not sure how useful it would be.
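To make the first restriction concrete, here is a minimal sketch of the sampling approach from the question, assuming the sampled documents have already been fetched as plain Python dicts (it does not use or mirror ArangoDB's internal shapes format). It collects full dotted attribute paths per document and counts how often each occurs, then shows that two differently shaped samples reduce to the same set of separate name parts, which is exactly the information a per-name registry loses. The helper names and sample data are hypothetical.

```python
from collections import Counter


def attribute_paths(doc: dict, prefix: str = "") -> set:
    """Return the full dotted paths of all attributes in a (possibly nested) document."""
    paths = set()
    for key, value in doc.items():
        path = f"{prefix}{key}"
        paths.add(path)
        if isinstance(value, dict):
            # Recurse into sub-objects so user.name is recorded as one path.
            paths |= attribute_paths(value, path + ".")
    return paths


def path_frequencies(docs) -> Counter:
    """Count in how many sampled documents each attribute path appears."""
    counts = Counter()
    for d in docs:
        counts.update(attribute_paths(d))
    return counts


def name_parts(paths: set) -> set:
    """What a per-name registry would see: every path split into separate parts."""
    return {part for p in paths for part in p.split(".")}


# Hypothetical sampled documents with the same names in different combinations.
docs_a = [{"user": {"name": "x"}, "address": {"city": "y"}}]
docs_b = [{"user": {"city": "y"}, "address": {"name": "x"}}]

paths_a = set().union(*(attribute_paths(d) for d in docs_a))
paths_b = set().union(*(attribute_paths(d) for d in docs_b))
print(sorted(paths_a))  # ['address', 'address.city', 'user', 'user.name']
print(name_parts(paths_a) == name_parts(paths_b))  # True: the parts can't tell them apart
```

Because the counts come from a sample, a path seen in zero sampled documents may still exist elsewhere in the collection, which is the approximation the question complains about.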

Related

Removing element IDs whose element has gone missing in Revit API using C#

We have a model whose central model has become corrupt due to too many missing elements. After following the procedure outlined in this documentation, we are unable to find an instance of the model in which these elements exist. There are thousands of sequentially numbered ids, and the cause is unknown. Perhaps someone copied elements into the model and immediately removed them, leaving the ids behind? We don't know.
Is there a way to programmatically remove the ids of elements that no longer exist in the file? I don't know what kind of collection they would belong to.
This is potentially a very costly problem for us, and we perceive it to be induced by Revit.
As far as I know, you cannot have an element id without an element associated with it.
Regardless, you normally delete elements by passing just an element id, or a collection of ids, to the Document.Delete method:
http://www.revitapidocs.com/2017/dd023de2-cf2b-03ca-6f45-89b5e867fe92.htm
So if you know which element ids you want to remove, all should be fine.
I have no idea how the method will behave if the elements really are not there, as you say...

DocumentDB: get all documents of same entity type

I'm storing documents of several different types (entity types?) in a single collection. What would be the best way to get all documents of a certain type (like you would with select * from a table)?
Options I see so far:
Include the type as a property. But that would mean looking into every document when retrieving the documents, right?
Prepend the type name to the document id and try searching by id with typename*.
Is there a better way to do this?
There's no built-in entity-type property, but you can certainly create your own and make sure it's indexed. At that point it's as straightforward as adding a WHERE clause:
SELECT * FROM docs WHERE docs.docType = "SomeType"
Assuming it's a hash-based index, this should provide efficient lookups and filter out unwanted document types.
While you can embed the type into a property (such as document id), you'd then have to do partial string matches, which won't be as efficient as an indexed-property comparison.
If you're curious what this query costs you, the RU value is displayed both in the portal and in the x-ms-request-charge response header.
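A small sketch of the indexed-property approach, written as a parameterized query rather than by pasting the type name into the SQL string, which avoids quoting problems. It assumes the query-plus-parameters body shape (`query` and a list of `name`/`value` parameters) that the DocumentDB SQL API accepts; `docType` is the self-defined property from the answer, and the function name is hypothetical. It only builds the body; sending it is left to whichever SDK or REST client you use.

```python
import json


def docs_of_type_query(doc_type: str) -> dict:
    """Build a parameterized query body selecting every document of one entity type.

    Assumes each document carries a self-defined, indexed docType property.
    """
    return {
        "query": "SELECT * FROM docs WHERE docs.docType = @docType",
        "parameters": [{"name": "@docType", "value": doc_type}],
    }


body = docs_of_type_query("SomeType")
print(json.dumps(body))
```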
I agree with David's answer, and using a single docType field is what I did when I first started using DocumentDB. However, there is another option that I started using after some experiments: create an is<Type> field and set its value to true. This is slightly more efficient for queries than a single string field, because the indexes themselves are smaller partial indexes, but it could take up slightly more storage space.
Another advantage of this approach is that it supports inheritance and mixins. For example, certain entities have both isLookup=true and isState=true, and I also have other lookup types. In my application code, some behaviors are common to all lookups, while others apply only to the State type.
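The mixin point can be illustrated with a local stand-in for the query. In this sketch the sample documents are hypothetical and the filtering runs in Python rather than on the server; only the is<Type> naming convention comes from the answer. One document carries both isLookup and isState, so it is returned for both types, something a single docType string cannot express.

```python
from typing import Iterable

# Hypothetical documents using the is<Type> convention.
DOCS = [
    {"id": "1", "isLookup": True, "isState": True, "name": "WA"},
    {"id": "2", "isLookup": True, "name": "Red"},
    {"id": "3", "name": "Order 42"},
]


def flag_for(type_name: str) -> str:
    """Map a type name to its boolean flag property, e.g. State -> isState."""
    return "is" + type_name


def query_for_type(type_name: str) -> str:
    # type_name is a fixed name from application code, not user input.
    return f"SELECT * FROM docs WHERE docs.{flag_for(type_name)} = true"


def of_type(docs: Iterable[dict], type_name: str) -> list:
    """Local equivalent of the WHERE clause: a missing flag means 'not this type'."""
    flag = flag_for(type_name)
    return [d for d in docs if d.get(flag) is True]


print([d["id"] for d in of_type(DOCS, "Lookup")])  # ['1', '2']
print([d["id"] for d in of_type(DOCS, "State")])   # ['1']
print(query_for_type("State"))
```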
If you index the type property on the collection, it will not be a complete scan.

How to set a field containing unique key

I want to save data in CouchDB documents the way I am used to doing it in an RDBMS. I want to create a field that can only contain a value that is unique in the database. If I save a document and a document with that unique key already exists, I expect an error from CouchDB.
I guess I can use the document ID and replace the auto-generated doc id with my value, but is there a way to make another field hold the unique key? Any best practices regarding unique keys?
As you said, the generated _id is enforced as unique. That is the only real unique constraint in CouchDB, and some people use it as such for their own applications.
However, this only applies to a single CouchDB instance. Once you introduce replication and other instances, you can run into conflicts if the same _id is generated on more than one node (depending on how you generate your _ids, this may or may not be a problem).
As Dominic said, the _id is the only attribute that is almost guaranteed to be unique. One thing is certain: you have to design your "database" differently. Keep in mind that the _id is database-wide, so you can have only one document with a given _id.
The _id must be a string, which means you can't have an array or a number or anything else.
If you want to make access public, you'll have to think about generating your ids in a way that won't mess with your system.
I came up with ids that looked like this:
"email:email#example.com"
It worked well in my particular case to prevent people from creating multiple accounts with the same email. But as Dominic said, if you have multiple masters, you'll have to think about possible conflicts.
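The "email:..." idea can be sketched with CouchDB's HTTP API, where `PUT /{db}/{docid}` creates a document and returns 409 Conflict if a document with that _id already exists. This is a minimal sketch using only the standard library; the server URL, database name, and helper names are hypothetical, and lower-casing the address is an assumption that treats addresses differing only in case as the same user. As the answers note, the guarantee holds only within a single instance, not across replicas.

```python
import json
import urllib.error
import urllib.parse
import urllib.request


def unique_id(kind: str, value: str) -> str:
    # Prefix with the kind of key so different unique fields can't collide.
    # Lower-casing is an assumption, not a CouchDB requirement.
    return f"{kind}:{value.strip().lower()}"


def create_request(base_url: str, db: str, doc: dict) -> urllib.request.Request:
    # PUT /{db}/{docid}; the id is percent-encoded because it contains ':' and '@'.
    doc_id = urllib.parse.quote(doc["_id"], safe="")
    body = json.dumps(doc).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/{db}/{doc_id}",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def try_create(req: urllib.request.Request) -> bool:
    """Send the request; return False if the unique key is already taken."""
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.status in (201, 202)
    except urllib.error.HTTPError as err:
        if err.code == 409:  # Conflict: a document with this _id already exists
            return False
        raise


doc = {"_id": unique_id("email", " Alice@Example.com "), "name": "Alice"}
req = create_request("http://localhost:5984", "users", doc)
print(req.get_method(), req.full_url)
```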

Using and populating (real) DBRef arrays with Mongoose / mongoose-dbref

Mongoose doesn't appear to support Mongo DBRefs. Apparently they released "DBRef" support but it was actually just plain references (no ability to reference documents from different collections). I've finally managed to craft a schema that allows me to hold an array of ObjectID references and populate them, which is great for certain parts of my schema, but it would be extremely convenient if I could use proper DBRefs to create an array that lets me refer to documents from a number of collections.
Luckily(?) there's a module that can monkey patch DBRef support into mongoose: https://github.com/goulash1971/mongoose-dbref
Unluckily, I can't make any sense of the documentation. The best I can tell is that DBRefs can't be used in an array (there is a 'fetch' method to dereference, but it takes a single DBRef); 'populate' doesn't seem to be patched to fill in DBRefs; and I can't tell how I'm supposed to assign a DBRef given a source document [collection.items.push(?????)].
From searching online, it appears that I can assign an object of the form { $id: document._id, $ref: 'Collection' }. When I log the result, it appears to have "taken" as a DBRef data type, but I am unsure whether this is correct, since I can't seem to do anything useful with it (such as turning the ref back into a document).
What I really want is a way to represent an ordered list of items from multiple collections; any solution to this is fine by me, but so far DBRefs are the best I've got. Help?
A DBRef (as explained in detail here) is a tuple containing the ObjectId, the collection name, and optionally the name of the database containing a referenced object in another collection.
Internally in the MongoDB server these serve no purpose and are just data within a document. The point is for use in some drivers and ODM implementations to allow for some sort of automatic expansion by issuing additional queries to the server in order to have the data that is elsewhere appear to be an ordinary sub-document part of the referencing document. This can be automatic or a lazy load depending on the implementation, but is always done over the wire and processed on the client side. The server will do nothing to traverse or join this data.
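To show what that client-side expansion involves for an ordered array of references into several collections (the question's actual goal), here is a sketch in Python. The `DB` dict is a hypothetical in-memory stand-in for the server, and `find_by_ids` stands in for one `find({"_id": {"$in": ids}})` query per collection; this is not Mongoose's or mongoose-dbref's API. The references are grouped by collection, each collection is queried once, and the results are put back in the original order.

```python
from collections import defaultdict

# Hypothetical stand-in for two collections on the server.
DB = {
    "videos": {1: {"_id": 1, "title": "Intro"}},
    "articles": {7: {"_id": 7, "title": "Setup"}, 8: {"_id": 8, "title": "FAQ"}},
}


def find_by_ids(collection: str, ids: list) -> list:
    """Stand-in for db[collection].find({"_id": {"$in": ids}})."""
    coll = DB[collection]
    return [coll[i] for i in ids if i in coll]


def resolve_refs(refs: list) -> list:
    """Expand an ordered list of {"$ref", "$id"} pairs on the client side."""
    ids_by_coll = defaultdict(list)
    for ref in refs:
        ids_by_coll[ref["$ref"]].append(ref["$id"])
    found = {}
    for coll, ids in ids_by_coll.items():  # one round trip per collection
        for doc in find_by_ids(coll, ids):
            found[(coll, doc["_id"])] = doc
    # Keep the array's order; dangling references resolve to None.
    return [found.get((r["$ref"], r["$id"])) for r in refs]


items = [
    {"$ref": "articles", "$id": 8},
    {"$ref": "videos", "$id": 1},
    {"$ref": "articles", "$id": 7},
]
print([d["title"] for d in resolve_refs(items)])  # ['FAQ', 'Intro', 'Setup']
```

Even grouped this way, every referenced collection costs an extra round trip, which is why the answer recommends embedding first.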
Additionally, MongoDB collections are schemaless, so nothing, in the relational sense, requires all documents in a collection to have the same structure.
In the case of Mongoose, there are built-in functions (such as populate) that do this sort of loading for you as a convenience. They are not strictly DBRefs, but using them with documents of different schemas in the same collection achieves the same thing: storing the documents outside the referencing document.
It is important to consider the data access patterns of your application and not to simply opt for the same sort of relational design you are used to. Keeping in mind that you are only ever reading from one collection at a time, it is most desirable to get at the data you need in a single read or write, without multiple operations over the wire, which will slow things down considerably.
In short, you should always consider embedding sub-documents first, and use external references, in whatever form is best supported, only when you absolutely have to. Your application users will thank you in the end.

Core Data: am I on the right track? Setting up data model for data that contains multiple arrays, eg. accelerometer data

I am working on a project that involves a lot of data. At first I was doing it all in plists, but I realized it was getting out of hand and that I would have to learn Core Data. I'm still not entirely sure whether I can do what I want in Core Data, but I think it should work out. I've set up a data model, but I'm not sure if it's the right way to do it. Please read on if you think you can help, and let me know if I'm on the right track. Please bear with me, because I am trying to explain it as thoroughly as I can.
I've got the basic object with attributes set up at the root level; say a person with attributes like a name, date of birth, etc. Pretty simple. You set up one entity like this "Person" in your model, and you can save as many of them as you want in your data and retrieve them as an array, right? It could be sorted based on an attribute in the Person, such as the date they were added to the database.
Now where I get a bit more confused is when I want to store several different collections of data with each person. For example a list of courses and associated test marks. In a plist I would have stored an array of dictionaries that stored this, sorted by the date assessed. The way I set this up in my data model was that I added an entity called "Tests" and a "to-many" relationship from Person to Tests, and then when I pull that I get an NSSet that I can order by a timestamp again? Is there a better way to do this?
Similarly, the Person may have a set of arrays of numerical data (the kind you could graph over time, e.g. Nike+ stores your running data like distance vs. time, and a person would have multiple runs associated with them, hence a set of arrays, each with its own date of collection). I set this up a little differently: a "Runs" entity with just a timestamp attribute, connected from Person via a to-many relationship with the inverse "forPerson". The Runs entity is then connected via a to-many relationship to another entity with attributes that store the numerical data and the time. Once again, I would use a time/order attribute to sort them.
So the main question I have is whether using an internal attribute like a timestamp to sort a set is the right way to load an "array" from Core Data. Searching forums and Stack Overflow for how to store NSArrays in Core Data turns up approaches that seem overly complicated compared to this, which gives me the sense that I'm misunderstanding something.
Thanks for your help. Sorry for all the text, but I'm new to Core Data and I figure setting up the data model properly is essential before starting to code methods for getting/saving data. If necessary, I can set up a sample model to demonstrate this and post a picture of it.
Core Data gives you NSSets by default. These can be converted to arrays by calling allObjects, or sortedArrayUsingDescriptors: if you want a sorted array. The "ordered" property on the relationship description gives you an NSOrderedSet in the managed object instead. Hashed sets provide quicker adds, access, and membership checks, at the cost (relative to ordered sets) of having to sort when you need an order.
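The pattern the question describes (each child stores its own timestamp or order attribute, the to-many relationship is an unordered set, and order is imposed when reading) can be illustrated in plain Python. This is an analogy, not Core Data API: `sorted(..., key=...)` plays the role of sortedArrayUsingDescriptors:, and the class and attribute names are hypothetical stand-ins for the Person, Runs, and sample entities.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass(frozen=True)
class Sample:
    t: float          # seconds since the start of the run (the sort key)
    distance: float


@dataclass(eq=False)  # identity-based hashing, like managed objects
class Run:
    timestamp: datetime
    samples: set = field(default_factory=set)  # unordered to-many relationship

    def ordered_samples(self) -> list:
        # Analogous to sorting the NSSet with a sort descriptor on "t".
        return sorted(self.samples, key=lambda s: s.t)


@dataclass(eq=False)
class Person:
    name: str
    runs: set = field(default_factory=set)

    def runs_by_date(self) -> list:
        return sorted(self.runs, key=lambda r: r.timestamp)


alice = Person("Alice")
r1 = Run(datetime(2012, 5, 2))
r2 = Run(datetime(2012, 4, 30))
r1.samples |= {Sample(60.0, 0.21), Sample(0.0, 0.0), Sample(30.0, 0.1)}
alice.runs |= {r1, r2}
print([r.timestamp.day for r in alice.runs_by_date()])  # [30, 2]
print([s.t for s in r1.ordered_samples()])               # [0.0, 30.0, 60.0]
```

Sorting on read keeps inserts cheap; marking the relationship as ordered moves that cost to write time instead.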
