Apache Chemistry CMIS: session.createDocument vs folder.createDocument

I would like someone to explain the difference between the Session.createDocument() and Folder.createDocument() methods.
Also, within this context, is there a sample of how I could use the Document.appendContentStream() method? I was struggling to find an example online. I have a requirement where document sizes can be up to 300-350 MB, and I was keen to learn more about appendContentStream() after it was recommended at the Nuxeo webinar by Jeff Potts, though he mentioned sizes around 1 GB.

Session.createDocument() creates a document and returns the document ID. Folder.createDocument() creates a document and returns a complete Document object. To do that, Folder.createDocument() needs one more round-trip to the server. If you just want to create a document and you are not interested in the document properties, or the document permissions, or the document renditions, etc., use the Session variant. It's faster.
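For illustration, here is a minimal sketch of the two variants using the OpenCMIS client API. The folder path "/my-folder", the file name, and the surrounding class are made up, and the VersioningState you need depends on your repository and type definition:

    import java.io.ByteArrayInputStream;
    import java.util.HashMap;
    import java.util.Map;

    import org.apache.chemistry.opencmis.client.api.Document;
    import org.apache.chemistry.opencmis.client.api.Folder;
    import org.apache.chemistry.opencmis.client.api.ObjectId;
    import org.apache.chemistry.opencmis.client.api.Session;
    import org.apache.chemistry.opencmis.commons.PropertyIds;
    import org.apache.chemistry.opencmis.commons.data.ContentStream;
    import org.apache.chemistry.opencmis.commons.enums.VersioningState;

    public class CreateDocumentSketch {

        public static void create(Session session) {
            // Hypothetical target folder
            Folder parent = (Folder) session.getObjectByPath("/my-folder");

            Map<String, Object> properties = new HashMap<>();
            properties.put(PropertyIds.OBJECT_TYPE_ID, "cmis:document");
            properties.put(PropertyIds.NAME, "example.txt");

            byte[] bytes = "hello".getBytes();
            ContentStream content = session.getObjectFactory().createContentStream(
                    "example.txt", bytes.length, "text/plain",
                    new ByteArrayInputStream(bytes));

            // Session variant: one round-trip, returns only the new object's ID.
            ObjectId id = session.createDocument(properties, parent, content, VersioningState.NONE);

            // Folder variant: extra round-trip, returns a fully populated Document object.
            Document doc = parent.createDocument(properties, content, VersioningState.NONE);

            System.out.println("Created " + id.getId() + " and " + doc.getName());
        }
    }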
The CMIS specification does not limit the document size. Some repositories support uploading a document of several GBs in one go; if such an upload fails, for example because of a connection problem, you have to repeat the complete upload. appendContentStream() allows uploading a document in chunks. If uploading a chunk fails, you only have to repeat the upload of that one chunk. Whether that makes sense depends on your application, your repository, and your network.
There is an appendContentStream() code example (maybe not a good one) in the OpenCMIS TCK:
https://svn.apache.org/viewvc/chemistry/opencmis/trunk/chemistry-opencmis-test/chemistry-opencmis-test-tck/src/main/java/org/apache/chemistry/opencmis/tck/tests/crud/SetAndDeleteContentTest.java?view=markup
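Beyond the TCK test, here is a rough, untested sketch of a chunked upload with appendContentStream(), assuming a CMIS 1.1 repository that supports appending. The 16 MB chunk size, the file path parameter, and the helper class are arbitrary choices for illustration:

    import java.io.BufferedInputStream;
    import java.io.ByteArrayInputStream;
    import java.io.FileInputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.util.Arrays;
    import java.util.HashMap;
    import java.util.Map;

    import org.apache.chemistry.opencmis.client.api.Document;
    import org.apache.chemistry.opencmis.client.api.Folder;
    import org.apache.chemistry.opencmis.client.api.Session;
    import org.apache.chemistry.opencmis.commons.PropertyIds;
    import org.apache.chemistry.opencmis.commons.data.ContentStream;
    import org.apache.chemistry.opencmis.commons.enums.VersioningState;

    public class ChunkedUploadSketch {

        private static final int CHUNK_SIZE = 16 * 1024 * 1024; // 16 MB per chunk, tune as needed

        public static void upload(Session session, Folder parent, String name, String path)
                throws IOException {
            Map<String, Object> properties = new HashMap<>();
            properties.put(PropertyIds.OBJECT_TYPE_ID, "cmis:document");
            properties.put(PropertyIds.NAME, name);

            // Create the document without content first; some repositories may instead
            // require an initial (possibly empty) content stream here.
            Document doc = parent.createDocument(properties, null, VersioningState.NONE);

            try (InputStream in = new BufferedInputStream(new FileInputStream(path))) {
                byte[] buffer = new byte[CHUNK_SIZE];
                int read = in.read(buffer);
                while (read > 0) {
                    byte[] current = Arrays.copyOf(buffer, read);
                    int next = in.read(buffer);        // read ahead to detect the last chunk
                    boolean isLastChunk = (next <= 0);

                    ContentStream chunk = session.getObjectFactory().createContentStream(
                            name, current.length, "application/octet-stream",
                            new ByteArrayInputStream(current));

                    // If a chunk fails, only this call has to be repeated, not the whole upload.
                    Document updated = doc.appendContentStream(chunk, isLastChunk);
                    if (updated != null) {
                        doc = updated; // the repository may have returned a new/updated object
                    }
                    read = next;
                }
            }
        }
    }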

Related

Are there cons to using GridFS as a default with MongoDB?

I'm creating a RESTful API with Node, Express, and MongoDB, and the book I'm using as a reference recommends using GridFS (namely gridfs-stream) for cases where one needs to handle files larger than the MongoDB document size limit (16 MB).
I'm not sure if my app will ever need to handle files that size, but I'm wondering if there are cons to using it anyways in case I may need that feature later.
Are there any cons (i.e. significant unnecessary performance penalties, stability issues) that I should be aware of to help make this decision?
I'm also open to suggestions for alternate file management solutions that you may have.
Thanks!
Don't use GridFS for small binary data.
GridFS requires two queries: one to fetch a file's metadata and one to fetch its contents. Therefore, if you use GridFS to store small files, you are doubling the number of queries that your application has to do. GridFS is basically a way of breaking up large binary objects for storage in the database.
GridFS is for storing big data, larger than will fit in a single document. As a rule of best practice, anything that is too big to load all at once on the client is probably not something you want to load all at once on the server. Therefore, anything you're going to stream to a client is a good candidate for GridFS. Things that will be loaded all at once on the client, such as images, sounds, or even small video clips, should generally just be embedded in your main document.
Furthermore, if your files are all smaller than the 16 MB BSON document size limit, consider storing each file manually within a single document instead of using GridFS. You can use the BinData type to store the binary data; see your driver's documentation for details on using BinData.
See https://docs.mongodb.com/manual/core/gridfs/
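The question is about Node and gridfs-stream, but the trade-off looks the same from any driver. Here is a rough sketch with the MongoDB Java driver (the database name, collection name, and file names are made up): embed small binaries in a normal document, and push anything large through GridFS.

    import java.io.InputStream;
    import java.nio.file.Files;
    import java.nio.file.Paths;

    import org.bson.Document;
    import org.bson.types.Binary;
    import org.bson.types.ObjectId;

    import com.mongodb.client.MongoClient;
    import com.mongodb.client.MongoClients;
    import com.mongodb.client.MongoDatabase;
    import com.mongodb.client.gridfs.GridFSBucket;
    import com.mongodb.client.gridfs.GridFSBuckets;

    public class FileStorageSketch {

        public static void main(String[] args) throws Exception {
            try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
                MongoDatabase db = client.getDatabase("demo");

                // Small file (well under 16 MB): embed it directly in a normal document,
                // so reading it back is a single query.
                byte[] logo = Files.readAllBytes(Paths.get("logo.png"));
                db.getCollection("images").insertOne(
                        new Document("name", "logo.png").append("data", new Binary(logo)));

                // Large file: GridFS splits it into chunk documents plus a metadata document,
                // so reading it back needs at least two queries, but it can exceed 16 MB
                // and be streamed.
                GridFSBucket bucket = GridFSBuckets.create(db);
                try (InputStream in = Files.newInputStream(Paths.get("video.mp4"))) {
                    ObjectId fileId = bucket.uploadFromStream("video.mp4", in);
                    System.out.println("Stored GridFS file " + fileId);
                }
            }
        }
    }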

CouchDB, how to get document changes only

Using /_changes?filter=_design I can get all the changes for design documents.
How do I get all the changes for documents only?
Is there such a thing as /_changes?filter=_docs_only?
There is no built in filter for this. You will need to write your own filter function (http://couchdb.readthedocs.org/en/latest/couchapp/ddocs.html#filterfun) that excludes design documents (check the doc's _id for "_design/", etc.) from the feed. You then reference this filter function when you query the changes feed (http://couchdb.readthedocs.org/en/latest/api/database/changes.html?highlight=changes). However, most applications don't run into this too often since design documents are typically only updated when there is an application change.
It would probably be more efficient to implement this filter on the client side instead of streaming all your changes through the couchjs process (which is always relatively inefficient). As your application loops through the changes, simply check whether each change refers to a design document and skip it.
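As a rough sketch of the first approach (assuming a recent CouchDB, a hypothetical database "mydb", and Java 11's built-in HTTP client), you store the JavaScript filter function in a design document and then reference it from the changes feed:

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    public class DocsOnlyChangesSketch {

        public static void main(String[] args) throws Exception {
            String base = "http://localhost:5984/mydb"; // hypothetical database
            HttpClient http = HttpClient.newHttpClient();

            // 1. Store a filter function (JavaScript, shipped inside a design document)
            //    that drops anything whose _id starts with "_design/". If the design
            //    document already exists, the PUT also needs its current _rev.
            String designDoc = "{ \"filters\": { \"docs_only\":"
                    + " \"function(doc, req) { return doc._id.indexOf('_design/') !== 0; }\" } }";
            HttpRequest put = HttpRequest.newBuilder(URI.create(base + "/_design/app"))
                    .header("Content-Type", "application/json")
                    .PUT(HttpRequest.BodyPublishers.ofString(designDoc))
                    .build();
            System.out.println(http.send(put, HttpResponse.BodyHandlers.ofString()).body());

            // 2. Ask the changes feed to apply that filter: ddoc-name/filter-name.
            HttpRequest changes = HttpRequest.newBuilder(
                            URI.create(base + "/_changes?filter=app/docs_only"))
                    .GET()
                    .build();
            System.out.println(http.send(changes, HttpResponse.BodyHandlers.ofString()).body());
        }
    }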
Cheers.

SharePoint - document libraries, lists, views and number of elements

I have a lot of documents I want to store in a document library in SharePoint 2010. We're talking about 50k+ documents. I've worked with document libraries many times, but not of this size and I find myself getting confused about some definitions when it comes to how these should be stored and the number of elements allowed.
Looking here: http://technet.microsoft.com/en-us/library/cc262787%28v=office.14%29.aspx#ListLibrary it says that a document library can hold up to 30 million documents. Nice! 50k is nowhere near 30 million. However, can I just dump all of the documents into a library without grouping them in views or subfolders? Because a view can only hold 5k elements, I would otherwise have to create multiple views and spread the documents across them in order not to exceed this limit.
Now, the documents, and the library, will most likely never be browsed by going to the library directly. Each document will be linked from another place, and even that will not happen often. Therefore I am hoping I can just dump all the documents into one big library. I have read that if the number of elements in a list exceeds 5k, SharePoint will not run the query to return everything, but will instead replace it with some default query. In my case this is fine, but are there other concerns about dumping this many files into one library in SharePoint 2010? Is there anything else I may not have thought about?
Also, a quick question at the end: I am planning on scripting the upload with PowerShell, but I have heard from others that uploading documents this way to SharePoint can take a lot of time because it uploads one document at a time. Is it possible to "bulk upload" documents through PowerShell or another approach?
The key here is to understand that SharePoint can STORE up to 30 million documents, but can only display 5,000 at a time. The easiest way to stay under that threshold would be to dump the documents into separate folders with no more than 5,000 documents in each folder. It's easy to do that, but I'm not a big fan of folders since they impose a single organizational structure on a set of documents. Applying metadata and then filtering views is more efficient in the long run, but much harder to do when dumping documents into a library. I would suggest looking at some of the third-party migration software that can do this kind of bulk upload and still maintain appropriate metadata. One I've used (there are others) is Metalogix Content Matrix.

Should I use NSFileWrappers in UIManagedDocument?

I am trying to store a plist and several binary files (let's say images) as part of a UIManagedDocument. The names of the binary files are stored as an attribute in Core Data, and I don't need to enumerate them, just access the right one when showing the related entity.
The file structure that I want to have is:
- <File yyyyMMdd-HHmmss>.extdoc
  - StoreContent
    - persistentStore
  - AdditionalContent
    - ListStatus.plist (used to store per-document defaults)
    - Images
      - uuid1.png
      - uuid2.png
      - ...
      - uuidn.png
So far, I have successfully followed the instructions in How do I save additional content into my UIManagedDocument file packages?, but when I try to add the binary files there are some things that I don't know how to do.
Should I treat the URL /the/path/File yyyyMMdd-HHmmss.extdoc/AdditionalContent (the default one provided by readAdditionalContentFromURL:error:) as an NSFileWrapper? Are there any advantages/disadvantages versus just using the URLs? I find it more complicated to use the file wrapper, since the plist has to be read using the file wrapper accessors and NSCoder (I guess), and for the files, I have to store the file wrapper for the Images directory and then obtain the corresponding node with objectForKey (I assume). But Apple's Document-Based Apps Programming Guide for iOS, regarding custom formats instead of NSData or NSFileWrapper, states: "Keep in mind that your code will have to duplicate what UIDocument does for you, and so you must deal with greater complexity and a greater possibility of error." Am I misunderstanding this?
Per-document defaults are declared as properties: the setter modifies the NSDictionary that maps the plist and marks the document as updated, and the getter accesses the dictionary with the proper key. How do I expose the ability to read/write the binary files? Should I add methods to my subclass of UIManagedDocument, such as - (void)writeImage:(NSString *)uuid; and - (UIImage *)readImage:(NSString *)uuid;? And should I keep this data in memory until the document is saved? How?
Assuming that NSFileWrapper is the way to go, if I plan to use this document with iCloud should I use file coordinators with the file wrapper? If so, how?
Any source code for each question will be greatly appreciated. Thank you.
P.S.: I know that I could save some binary data inside Core Data, but I don't feel comfortable with that solution. Among other reasons, I'd rather store the PNG data for image files than a serialized version of UIImage that won't be compatible with NSImage if I want to create a desktop app.
I'd like to say that, in general, I rather like UIManagedDocument. It has a few advantages over raw Core Data. For example, it sets up the entire Core Data stack for you automatically. It also sets up nested managed object contexts for you, so you get free background saving. None of that is particularly earth-shattering, but it's a lot of functionality from a tiny amount of code.
I haven't played around with saving additional information...but here are my thoughts.
First, you shouldn't need to treat the new URL as a file wrapper. You should just be able to do regular file operations on the provided URL. Just make sure you have everything implemented properly in additionalContentForURL:error:, writeAdditionalContent:toURL:originalContentsURL:error:, and readAdditionalContentFromURL:error:. The read and write operations need to be symmetric. And you should probably snapshot your data in additionalContentForURL:error: so that everything will be saved in a known, good state (since the save operations are asynchronous).
As an alternative, have you considered using the Store in External Record File flag in your data model instead of saving it manually? This should force Core Data to (depending on the size of the binary data) automatically store them externally. I looked at the release notes, and I didn't see anything saying you couldn't use this feature with iCloud. That might be the easiest fix.
Attacking a side point for the moment (as I have not had ANY good experience with UIManagedDocument).
You can save the binary data inside Core Data for an iOS 5.0+ application using the external file reference option. Then you can save the PNG of the image to Core Data directly and not need to worry about a UIManagedDocument or about bloating the SQLite file.
There is nothing stopping you from storing the PNG instead of a UIImage.
One other thought. You may need to use an NSFileCoordinator for the read and write operations. Technically, any read or write operations in the iCloud container need to use a file coordinator (to coordinate with the iCloud sync service--this prevents accidentally corrupting a file by reading it while another process is writing to it).
I know that UIDocument wraps most of its input and output methods in file coordination automatically. I'd guess that these methods are similarly wrapped (since they give you a URL to use); however, the docs aren't very clear.

Storing lots of attachments in single CouchDB document

tl;dr: Should I store directories in CouchDB as a list of attachments, or as a single tarball?
I've been using CouchDB to store project documents. I just create documents via Futon and upload them directly from there. I've also written a script to bulk-upload directories. I am using it like a basic content repository. I replicate it, so other people on my team have a copy of the repository.
I noticed that saving directories as a series of files seems to have a lot of storage overhead, so instead I upload a .tar.gz file containing the directory. This does significantly reduce the size of the document but now any change to the directory requires replicating the entire tarball.
I am looking for thoughts or perspective on the matter.
It really depends on what you want to achieve. I will try to provide some options for you to consider.
Storing one tar.gz will save you space, but it does make the contents harder to work with. If you are simply archiving, it may work for you.
Storing all the attachments on one document works well for couchapps. The workflow is that you mess around with attachments until you are ready to release the application; then there is not a lot of replication overhead, because it is usually a one-time event. It is nice that they are on one document because they all move/replicate as one bundle. The downsides of using this approach for a content management system are that you can accumulate a lot of history baggage that you have to compact on your local couch, and that you will get a lot of conflicts during replication between couches, which CouchDB will keep around for you to resolve. Therefore, if you choose this model, you should compact frequently to reduce disk size.
For a content management system, I would recommend using one document per attachment. That gives you fewer conflicts. There is a slight overhead, as each doc has some space allocated for the doc itself, but you save by not having to do frequent compaction and/or conflict resolution.
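The upload script mentioned in the question could do this in any language; as a rough sketch over plain HTTP with Java 11's HttpClient (the database "repo", the document ID, and the attachment name are made up), each file becomes its own document with a single attachment:

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;
    import java.nio.file.Path;

    public class OneDocPerAttachmentSketch {

        public static void main(String[] args) throws Exception {
            String base = "http://localhost:5984/repo"; // hypothetical database
            HttpClient http = HttpClient.newHttpClient();

            // 1. Create a small metadata document per file ("/" in the ID is URL-encoded).
            HttpRequest createDoc = HttpRequest.newBuilder(URI.create(base + "/docs%2Freadme.md"))
                    .header("Content-Type", "application/json")
                    .PUT(HttpRequest.BodyPublishers.ofString("{\"path\": \"docs/readme.md\"}"))
                    .build();
            String response = http.send(createDoc, HttpResponse.BodyHandlers.ofString()).body();

            // 2. Attach the file content to that document. The current revision is required;
            //    here it is pulled naively out of the JSON response for brevity.
            String rev = response.replaceAll(".*\"rev\":\"([^\"]+)\".*", "$1");
            HttpRequest attach = HttpRequest.newBuilder(
                            URI.create(base + "/docs%2Freadme.md/content?rev=" + rev))
                    .header("Content-Type", "text/markdown")
                    .PUT(HttpRequest.BodyPublishers.ofFile(Path.of("docs/readme.md")))
                    .build();
            System.out.println(http.send(attach, HttpResponse.BodyHandlers.ofString()).body());
        }
    }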
Hope that gives you some options to weigh out.
