GridFS: Clean out all unreferenced files - node.js

I have just moved to storing things in GridFS in MongoDB. During testing, I noticed that many files are being created but not deleted properly. I have a collection users, which has a field avatar; it contains the ObjectId of the file.
Now I'd like to have some command I could use to remove all the files and chunks that are not referenced there. Is it possible to do that with one query? Maybe 'map-reduce'?
Also, I am not sure how to properly delete GridFS files in node-mongodb-native.

Now I'd like to have some command I could use to remove all the files and chunks that are not referenced there.
The key term here is "referenced". MongoDB does not have joins, and therefore it has no concept of "references".
Maybe 'map-reduce'?
Map/Reduce is a query tool, not a data modification tool. The same is true of the newer Aggregation Framework.
What you will have to do is loop through your files and check the references for each one individually. You will then be able to delete those files.
Take a look at some documented examples on how to issue those deletions.
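For example, here is a minimal sketch of that loop using the node-mongodb-native driver's GridFSBucket; the users collection and its avatar field come from the question, while the connection handling and the default fs bucket are assumptions:

const { MongoClient, GridFSBucket } = require('mongodb');

async function removeOrphanedFiles(uri) {
  const client = await MongoClient.connect(uri);
  try {
    const db = client.db();
    const bucket = new GridFSBucket(db); // default "fs" bucket

    // Collect every file id that is still referenced by a user's avatar.
    const referenced = await db.collection('users').distinct('avatar');

    // Find all GridFS files whose _id is not referenced...
    const orphans = await db.collection('fs.files')
      .find({ _id: { $nin: referenced } }, { projection: { _id: 1 } })
      .toArray();

    // ...and delete them one by one. bucket.delete() removes both the
    // fs.files document and all of its fs.chunks entries.
    for (const { _id } of orphans) {
      await bucket.delete(_id);
    }
  } finally {
    await client.close();
  }
}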

Related

yaml library which supports writing multiple documents

I have a small NodeJS app which generates two YAML files.
I want to merge them into one file so that I have one file with two Document nodes. Something like:
---
yaml1
---
yaml2
I tried using the npm package yaml but to no avail.
Browsing through the docs of js-yaml, I cannot find how to achieve this.
Any help is appreciated.
YAML has been designed so that it is easy to merge multiple documents in a stream. Quoting the spec:
Concatenating two YAML streams requires both to use the same character encoding. In addition, it is necessary to separate the last document of the first stream and the first document of the second stream. This is easily ensured by inserting a document end marker between the two streams. Note that this is safe regardless of the content of either stream. In particular, either or both may be empty, and the first stream may or may not already contain such a marker.
The document end marker is ... (followed by a newline). Joining the contents of both files with this marker will do the trick. This works since YAML allows a document to be ended by multiple document end markers. On the other hand, the directives end marker (---) you use always starts a document, so it is not safe to join the documents with it since the second document may already start with one, leading to the creation of an empty document in between.
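In Node.js, the joining step could look like this minimal sketch (the file names are hypothetical):

const fs = require('fs');

const doc1 = fs.readFileSync('first.yaml', 'utf8');
const doc2 = fs.readFileSync('second.yaml', 'utf8');

// Ensure the first document ends with a newline, then insert the document
// end marker. This is safe even if the second file already starts with "---".
const merged = doc1.replace(/\n?$/, '\n') + '...\n' + doc2;

fs.writeFileSync('merged.yaml', merged);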

getting info from an MS SQL .bak file

I am writing an Electron app that, among many other things, restores an unknown .bak file to an MS SQL server and then extracts more information. In order to do this successfully, I need to extract some info from that .bak file programmatically (so SSMS cannot be used). I will be using sqlcmd, since that can be run by Electron's Node.js backend. Unfortunately, I have a bit of a chicken-and-egg problem: it seems I cannot restore a .bak file without knowing the paths for the .mdf files specified within it, and those paths cannot be found without first restoring it. There is a RESTORE WITH MOVE option, though this also seems to require knowledge of the paths inside the .bak, which cannot be determined from the .bak itself. How might I get this information, or is it impossible?
Read about RESTORE FILELISTONLY.
The documentation for RESTORE also describes further statements one can use with it to fetch metadata.
The result set returned by FILELISTONLY will give you the LogicalName, the file's type (data or log), information about the file group, and much more.
The other statements provide other meta data. Just check it out...
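As a rough sketch, the Electron backend could run it through sqlcmd like this (the server, the authentication flags, and the .bak path are placeholders):

const { execFile } = require('child_process');

const bakPath = 'C:\\backups\\unknown.bak'; // hypothetical path
const query = `RESTORE FILELISTONLY FROM DISK = N'${bakPath}'`;

execFile('sqlcmd', [
  '-S', 'localhost', // target server
  '-E',              // integrated security; use -U/-P for SQL logins
  '-Q', query,
  '-s', '|',         // pipe-separated columns are easier to parse
  '-W',              // trim trailing whitespace
], (err, stdout) => {
  if (err) throw err;
  // Each row contains LogicalName, PhysicalName, Type (D = data, L = log),
  // the file group and more - enough to build a RESTORE ... WITH MOVE statement.
  console.log(stdout);
});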

How to store separate versions of document for a blog using mongodb with nodejs?

I am creating a blog site, and I would like to add a history feature for blog posts: which changes were made between previous versions and the latest version. Creating a separate history collection would be one approach, but what should its documents contain: a full copy of the old revision, or just the JSON diff? If we store only the diff of each field, what is the base object from which to reconstruct the up-to-date document for a given revision, if we wish to query a specific revision (say __version: 15) when the current version is somewhere around 25?
I'd go with your first option, using a history collection, since it is easier to fetch an entire document than to keep a lot of diffs and then have to merge them all together. A separate collection would also be the faster solution, since you wouldn't need to fetch the different versions, assemble them, and display the result.
On the other hand, if you just want to show minor changes, like on Facebook when you edit a comment, I think the diff approach would be the best option, although you'd have to keep references to where and what has changed.
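A minimal sketch of the full-copy approach (the collection names and the __version field are assumptions based on the question):

async function updatePost(db, postId, changes) {
  const posts = db.collection('posts');
  const history = db.collection('posts_history');

  // Snapshot the current revision into the history collection first.
  const current = await posts.findOne({ _id: postId });
  if (current) {
    const { _id, ...snapshot } = current;
    await history.insertOne({ ...snapshot, postId: _id });
  }

  // Then apply the change and bump the version counter.
  await posts.updateOne(
    { _id: postId },
    { $set: changes, $inc: { __version: 1 } }
  );
}

// Fetching a specific revision is then a single query:
// db.collection('posts_history').findOne({ postId, __version: 15 })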

How can I store elasticsearch settings+mappings in one file (like schema.xml for Solr)

How can I store elasticsearch settings+mappings in one file (like schema.xml for Solr)? Currently, when I want to make a change to my mapping, I have to delete my index settings and start again. Am I missing something?
I don't have a large data set as of now. But in preparation for a large amount of data that will be indexed, I'd like to be able to modify the settings and somehow reindex without starting completely fresh each time. Is this possible and if so, how?
These are really multiple questions disguised as one. Nevertheless:
How can I store elasticsearch settings+mappings in one file (like schema.xml for Solr)?
First, note that you don't have to specify a mapping for many field types, such as dates, integers, or even strings (when the default analyzer is OK for you).
You can store settings and mappings in various ways in ElasticSearch < 1.7 (a sketch of the separate-file option follows this list):
In the main elasticsearch.yml file
In an index template file
In a separate file with mappings
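For the separate-file option, here is a minimal sketch of keeping settings and mappings together in one JSON file (schema.json is a hypothetical name) and applying it at index creation. It assumes Node 18+'s global fetch and a local cluster:

const fs = require('fs');

// schema.json holds both blocks, e.g.:
// { "settings": { "number_of_shards": 1 },
//   "mappings": { "post": { "properties": { "title": { "type": "string" } } } } }
const schema = JSON.parse(fs.readFileSync('schema.json', 'utf8'));

fetch('http://localhost:9200/my_index', {
  method: 'PUT',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify(schema),
}).then(res => res.json()).then(console.log);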
Currently, when I want to make a change to my mapping, I have to delete my index settings and start again. Am I missing something?
You have to re-index your data when you change the mapping for an existing field: documents indexed under the old mapping must be indexed again before the new mapping applies to them.
Note that in specific cases, such as number_of_replicas, you can update index settings "on the fly".
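For instance, a sketch of updating that setting on a live index (the index name is assumed):

fetch('http://localhost:9200/my_index/_settings', {
  method: 'PUT',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ index: { number_of_replicas: 2 } }),
}).then(res => res.json()).then(console.log);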
I'd like to be able to modify the settings and some how reindex without starting completely fresh each time. Is this possible and if so, how?
As said: you must reindex your documents, if you want to use a completely new mapping for them.
If you are adding a new field rather than changing an existing mapping, you can update the mapping, and new documents will pick it up as they are indexed.
Since Elasticsearch 2.0:
It is no longer possible to specify mappings in files in the config directory.
(This is documented in the Elasticsearch 2.0 breaking changes notes.)
It's also no longer possible to store index templates in the templates directory under the config location (path.conf).
The path.conf location (/etc/default/elasticsearch by default on Ubuntu) now stores only environment variables, such as the heap size and file descriptor limits.
You need to create your templates with curl.
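The same call can be made from Node.js instead of curl; here is a rough sketch for the Elasticsearch 2.x template API (the template name, index pattern, and mapping are placeholders):

const template = {
  template: 'logs-*', // index name pattern this template applies to
  settings: { number_of_shards: 1 },
  mappings: {
    _default_: {
      properties: { timestamp: { type: 'date' } }
    }
  }
};

fetch('http://localhost:9200/_template/logs_template', {
  method: 'PUT',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify(template),
}).then(res => res.json()).then(console.log);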
If you are really desperate, you could create your indexes and then backup your data directory and then use this one as your "template" for new Elasticsearch clusters.

How to make SubSonic 3.0 generate .cs files for each class/table instead of a single ActiveRecord.cs

I have used SubSonic 2 on several projects before, and I have implemented the new SubSonic 3 in two projects. However, my question has always been whether I can change the output T4 template to generate a class file for each table instead of a single ActiveRecord.cs file. I want to use it in a very large project, and I can see how it is not practical to have 80+ tables in a single file. I prefer to have separate class files.
Would I need to change SubSonic.Core?
If it's not possible, please let me know.
Thanks
Why does it matter how many files there are if the code is entirely generated? What practical difference is there?
You can change the templates to output multiple files. No changes would be required to the SubSonic dll, just the T4 Templates.
However, I fail to see how it is worth even just the time to post the question here, much less the time required to actually make those changes.
There is a way to do this if you rewrite the T4s to emit one output file per table. However, I think there is an issue that may arise when you drop a table: the previously created .cs file for that table will not be removed. I think you would have to further edit the T4 to start by deleting all of its previously generated files.
