Where are the docs (docx, xls, etc.) physically stored when using onlyoffice? - onlyoffice

I assume they are not on the 'document-server'? Is it true that all documents are stored as database entities in a DB?

Some information is stored in the DB, but the documentserver also keeps information such as change history, changes and current output under documentserver/server/App_data.

Related

How to store metadata with Ceph?

I want to store user files in CephFS. The problem is that I also need to store some metadata for these files (download date and verification status, for example), and I need to be able to sort by date or assign a number. If I use MongoDB for the metadata, I have a synchronization problem (a file can be in the database but not in CephFS, or vice versa).
The file structure in CephFS is as follows:
/{user.id}/{Media collection name}/{media.id}
The media.id is uuidv4.
The idea I have:
Create a "meta" folder that holds the metadata of the files by their id, but without the date the file was uploaded to CephFS. To get the date, use the data from CephFS itself (it stores the date the file was created and changed, just like any other file system(?)).
I didn't find anything in the Ceph documentation saying that it stores this metadata as well, so I'm not sure whether this option would work.

How to prevent saving same named file on CouchDB?

I am using CouchDB with Divan, a C# interfacing library for CouchDB.
The same file can be uploaded to CouchDB many times. Each time the file is uploaded the "id" changes, but the "rev" remains the same.
This happens even if all custom attributes defined for the file being uploaded are the same as those of an existing file on CouchDB with the same name.
Is there any way to avoid uploading a file with the same name if all custom attributes are the same? Fetching all files and checking them for a repeated file name could work, but it is definitely not preferable because of the time it takes.
Thank you.
Let's say you have 3 attributes for a file:
name
size in bytes
date of modification
I see two main possibilities to avoid duplicates in your database.
Client approach
Query a view to check whether a document with the same attributes already exists. If it does not exist, create it.
User defined id
You could generate an id from the attributes, as this library does.
For example, if my document has these attributes:
"name":"test.txt",
"size":"512",
"lastModified":"2016-11-08T15:44:29.563Z"
You could build a unique id like this:
"_id":"test.txt/2016-11-08T15:44:29.563Z/512"
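A minimal sketch of the user-defined id approach, assuming the three attributes above are enough to identify a file. The function names `build_doc_id` and `doc_url_path` and the database name used in the usage note are hypothetical. Because the id contains slashes and colons, it has to be percent-encoded when it is placed in a request URL.

```python
from urllib.parse import quote


def build_doc_id(name: str, last_modified: str, size: int) -> str:
    """Join the attributes in a fixed order so equal files map to equal ids."""
    return f"{name}/{last_modified}/{size}"


def doc_url_path(db_name: str, doc_id: str) -> str:
    """Percent-encode the id so its slashes are not read as path separators."""
    return f"/{quote(db_name, safe='')}/{quote(doc_id, safe='')}"


doc_id = build_doc_id("test.txt", "2016-11-08T15:44:29.563Z", 512)
print(doc_id)
print(doc_url_path("files", doc_id))
```

With ids built this way, a second upload of the same file targets an id that already exists, so CouchDB should reject a PUT that carries no matching revision with a conflict instead of creating a duplicate.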

CouchDB document replication (updating specific attributes of a document)

I have a replication issue and need your help. During CouchDB replication, I want to reset or update some specific attributes of a document, and have the edited documents saved in the replicated DB without affecting the originals. For example:
A document named Student with the attributes id, name, class, etc.
I want to replicate this document in such a way that its name and class are reset or updated.
Could you please tell me how I can achieve this?
Thanks.
You can't update docs during replication.
But you can exclude docs from being replicated with the help of a CouchDB filter (e.g. preventing all docs with a revision higher than 1 from being replicated).
If you want to have multiple versions of the same dataset (e.g. dataset revisions), you have to store them as separate docs that each have a unique id and a reference property like original: "UUID_of_the_original". I use the term "dataset" instead of "doc" to make clear that CouchDB's internal doc revision handling is not involved.
You can't use the CouchDB doc revision handling for that purpose (that's what many people assume when they see the _rev property in the docs).
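A minimal sketch of the "separate docs" idea, assuming the edited copy is built in application code before it is written to the target DB. The function name `make_dataset_version` is hypothetical; the `original` reference property is the one suggested above.

```python
import uuid


def make_dataset_version(original_id: str, fields: dict) -> dict:
    """Build a new doc that is a separate version of an existing dataset."""
    # Copy the fields but drop CouchDB's own bookkeeping properties,
    # so the new doc does not collide with the original one.
    doc = {k: v for k, v in fields.items() if k not in ("_id", "_rev")}
    doc["_id"] = uuid.uuid4().hex      # every version gets its own unique id
    doc["original"] = original_id      # reference back to the source dataset
    return doc


student = {"_id": "student-1", "_rev": "3-abc", "name": "Alice", "class": "5B"}
version = make_dataset_version(student["_id"], {**student, "name": "", "class": ""})
print(version)
```

The original document is left untouched, and all versions of a dataset can later be found through their shared `original` value.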

Keeping elasticsearch in sync with key or versioning

So I have a situation where I receive a lot of large XML files and I want that data synchronised to elasticsearch.
Current way
Have index_1
When data is updated create blank index_2
Load all of latest data into index_2
Alias to index_2 and delete index_1
Proposed way
Have a synced.xml file which has been synchronised with elasticsearch
When a new time-dated xml file is available, compare it against synced.xml
If anything is new in the timedated xml file, add just that to ES
Rename time-dated xml to synced.xml
This means that out of 500,000 items I only have to add, for example, the 5,000 items that have changed, instead of duplicating all 500,000 items.
Question
In a scenario like this, how do I ensure they are synchronised? For example, what happens if elasticsearch gets wiped? How can I tell my program that it needs to add the whole lot again? Is there a way to use some sort of synchronisation key on elasticsearch, or perhaps a better approach?
Here is what I recommend...
Add a stored field to your type to store a hash like MD5
Use Scan/Scroll to export the ID and Hash from ES
In your backing dataset export ID and Hash
Use something like MapReduce to "join" on exported ids from each set
Where there are differences via comparing the hash or finding missing keys, index/update
The hash is only useful if you want to detect document changes. This also assumes that you either persist ES's IDs back to your backing store or assign the IDs yourself.
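The comparison step can be sketched in a few lines, assuming both exports fit in memory as plain id-to-hash dictionaries (for larger sets a MapReduce-style join does the same job). The function names `content_hash` and `diff_by_hash` are hypothetical, and hashing a JSON dump with sorted keys is only one way to get a stable hash.

```python
import hashlib
import json


def content_hash(doc: dict) -> str:
    """MD5 over a canonical JSON form, so key order does not change the hash."""
    canonical = json.dumps(doc, sort_keys=True, separators=(",", ":"))
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()


def diff_by_hash(source_hashes: dict, es_hashes: dict):
    """Return (missing, changed) ids that need to be indexed or updated."""
    missing = sorted(k for k in source_hashes if k not in es_hashes)
    changed = sorted(
        k for k, h in source_hashes.items()
        if k in es_hashes and es_hashes[k] != h
    )
    return missing, changed


source = {"1": content_hash({"title": "a"}), "2": content_hash({"title": "b"})}
exported = {"1": content_hash({"title": "a"}), "2": content_hash({"title": "old"})}
print(diff_by_hash(source, exported))
```

If the index has been wiped, the export from ES is empty and every id ends up in `missing`, so the same routine also covers the full reload case.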

How can I "undelete" a set of documents in CouchDB?

I have a large set of documents in a CouchDB database that were just accidentally bulk deleted using _deleted:true. I also have a backup for this set of data that includes their last known good revision and metadata. I need to maintain the same _id, so simple restore with a new _id is not an option.
Compaction has not been run and I can access any of these documents via the &rev= url parameter as well as their attachments (which are needed).
What I need to do is "restore" these documents to the revision I have on file. Surprisingly, I have come up empty with any queries on how to achieve this. Tips or hacks appreciated.
If you just PUT the whole document, including the attachment stub, back into the DB with the rev of the deleted revision but without the _deleted:true property, then all will be well.
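A minimal sketch of building that request body, assuming the backup holds the last good document (including its `_attachments` stubs) and that the rev of the current deleted revision has already been looked up. The function name `build_restore_body` and the sample values are hypothetical.

```python
def build_restore_body(backup_doc: dict, deleted_rev: str) -> dict:
    """Return the body to PUT back under the same _id."""
    # Keep everything from the backup, including the attachment stubs,
    # but make sure no _deleted flag is carried over.
    body = {k: v for k, v in backup_doc.items() if k != "_deleted"}
    # The update must be based on the current (deleted) revision,
    # not on the older revision stored in the backup.
    body["_rev"] = deleted_rev
    return body


backup = {
    "_id": "doc1",
    "_rev": "2-aaa",
    "title": "report",
    "_attachments": {"file.txt": {"stub": True, "content_type": "text/plain"}},
}
print(build_restore_body(backup, "3-bbb"))
```

The resulting body would then be sent with a PUT to the document's URL, once per document.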
