Initial DB structure / data for MongoDB + NodeJS web application - node.js

I'm developing a web application in Node.js with MongoDB as the back end. What I wanted to know is: what is the generally accepted procedure, if any exists, for creating initial collections and populating them with initial data, such as a whitelist of names or lists of predefined constants?
From what I have seen, MongoDB creates collections implicitly any time data is inserted and the collection being inserted into doesn't already exist. Is it standard to let these implicit insertions take care of collection creation, or do people using MongoDB have scripts set up which build the main structure and insert any required initial data? (For example, with MySQL I'd have a .sql script which I can run to dump and rebuild/repopulate the database from scratch.)
Thank you for any help.
MHY

If you have data, this post on SO might be interesting for you. But since Mongo understands JavaScript, you can easily write a script that prepares the data for you (a minimal sketch follows below).
It's in the nature of Mongo to create everything that does not exist. This allows very flexible and agile development, since you are not constrained to types and don't need to check whether table x already exists before working on it. If you need to create collections dynamically, just get them from the database and work with them, no matter whether they exist yet or not.
If you are looking for a certain object, be sure to check it (that it is not null, or that a certain key exists), because working with null objects can break your code.
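For illustration, a seed script along these lines could be run whenever the database needs to be rebuilt from scratch. This is only a minimal sketch using the official mongodb Node.js driver; the database name, collection names, and documents are placeholders for your own data:

    // seed.js - rebuild initial collections and data from scratch.
    // Assumes the official 'mongodb' driver (v4+); all names below are hypothetical.
    const { MongoClient } = require('mongodb');

    async function seed() {
      const client = new MongoClient('mongodb://localhost:27017');
      await client.connect();
      const db = client.db('myapp');

      // Start from a known state so the script can be re-run safely.
      await db.collection('name_whitelist').deleteMany({});
      await db.collection('name_whitelist').insertMany([
        { name: 'alice' },
        { name: 'bob' },
      ]);

      await db.collection('constants').deleteMany({});
      await db.collection('constants').insertMany([
        { key: 'max_upload_mb', value: 16 },
        { key: 'default_locale', value: 'en' },
      ]);

      await client.close();
    }

    seed().catch((err) => {
      console.error(err);
      process.exit(1);
    });

Run it with node seed.js; because it clears the collections first, it plays roughly the same role as a MySQL .sql rebuild script.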

There is absolutely no reason to use setup scripts merely to make collections and databases appear. Both DB and collection creation are done lazily.
Remember that MongoDB is a completely schema-free document store, so there's no way to even set up a specific schema in advance.
Tools to dump and restore database content (mongodump and mongorestore) are supplied with MongoDB.
Now, if your application needs initial data (like configuration parameters or whitelists, as you suggest), it's usually best practice to have your application components set up their own data as needed and offer data migration paths as well.
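One way to read "components set up their own data as needed" is idempotent seeding at application startup: every component upserts its defaults, so a fresh database gets populated and an existing one is left alone. A sketch, again with the mongodb Node.js driver and hypothetical collection names:

    // Called once at application startup; safe to run on every start.
    async function ensureDefaults(db) {
      const constants = [
        { key: 'max_upload_mb', value: 16 },
        { key: 'default_locale', value: 'en' },
      ];

      for (const c of constants) {
        // Insert the constant only if it is missing; never overwrite a value
        // that has since been tuned in production.
        await db.collection('constants').updateOne(
          { key: c.key },
          { $setOnInsert: { value: c.value } },
          { upsert: true }
        );
      }

      // createIndex is idempotent, so it can double as a lightweight structural guarantee.
      await db.collection('name_whitelist').createIndex({ name: 1 }, { unique: true });
    }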

Related

How do I create in-memory search indexes in Elixir

I am currently working on an Elixir/Phoenix project and I was wondering what a good way would be to create a quick in-memory search index.
The index would be created on request and destroyed when the request is over; currently the data comes from a database via Ecto. Also, I would like to query it by different keys, not just by :id but by others such as :user_id, so a flat key-value store may not be enough.
Are there any tools that would be helpful? I looked a bit into mnesia but when using it with ecto3_mnesia, a local file/folder was created and I would prefer if everything was in memory.
Thanks
I have no idea about ecto3_mnesia, but I am pretty sure raw :mnesia without any redundant wrapper is a good fit here (or, even, :ets if you don’t need a clustered solution.)
:mnesia.create_table/2 accepts many options; the two you might be interested in are disc_copies and ram_copies. Simply initialize the former with an empty node list and the latter with your complete node list, and you are all set: no disc copies are created and everything stays in memory.

Combine CouchDB databases with replication while recording source db

I’m just starting out with CouchDB (2.1), and I’m planning to use it to replicate confidential per-user data from a mobile app up to my server. I’ve read that per-user databases are the best way to do this, and I’ve set that up. Each database has a mix of user-created documents of types Foo and Bar.
Now, I’d also like to be able to collect multi-user slices of that data together into one database and build views on it for admin reporting. Say I want a database which contains all the Foos from all users. So far so good, an entry in _replicator with a filter from each user database to one target does the job.
But looking at the combined database, I can’t tell which user a given Foo came from. I could write the user id into each document within the per-user database but that seems redundant and adds the complexity of validation. Is there any other way?
CouchDB's replicator simply tries to reproduce the exact state of a given document in the target database, and if it can't, it stores more or less the exact source contents anyway (as a conflicting version).
Furthermore the _rev field of a document, which the replication system uses to check if a document needs to be updated, is actually based on (a hash over) the other document fields.
So unfortunately you can't add metadata during replication. This would indeed be handy for this and other per-user vs. shared replication situations, but it's not something CouchDB currently supports, and it would break some optimizations to add support for it.
I could write the user id into each document within the per-user database but that seems redundant and adds the complexity of validation. Is there any other way?
Including something like a .user field in each document is the right solution.
As far as being redundant, I wouldn't think of it that way, or at least not as a bad thing. You'll find with CouchDB (as with other NoSQL stores) that there's a tendency to "denormalize" data to begin with. Especially given the things replication lets me do operationally and architecturally, I'd much rather have a self-contained document than one that relies on metadata derived from a database name.
I'm not sure exactly how an extra field would make validation more complex in your case, so I can't fully speak to that. You do want to make sure the user writing the document has set it "honestly", so yes, there is a bit more complication, but it's usually not too burdensome.
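To make the ".user field" suggestion concrete, here is a sketch of what the documents and an admin view over the combined database might look like. The field and view names are assumptions, not anything CouchDB requires:

    // A Foo document in a per-user database, carrying its owner explicitly.
    {
      "_id": "foo:abc123",
      "type": "Foo",
      "user": "alice",
      "payload": { "example": true }
    }

    // A design document in the combined database: keying the view by the user
    // field lets admin reports slice the replicated data per user.
    {
      "_id": "_design/admin",
      "views": {
        "foos_by_user": {
          "map": "function (doc) { if (doc.type === 'Foo' && doc.user) { emit(doc.user, null); } }"
        }
      }
    }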

nodejs unit testing strategy: check if a table exists and contains the data you expect

I have managed to unit test all the functions that use data from my database.
The problem starts when I want to check the data itself: what happens if the schema of my DB changes? All the other unit tests use DB stubs rather than the real data.
How can I check the schema of the DB? Here I must not mock it, because I want to check the real schema.
Edit: It's important to note that the aforementioned DB is a third-party one. That is, I have checked all the functionality with mocks, and now I want to check the actual schema of this DB, just to make sure nobody changed it without mentioning it.
You will ideally write an integration test that round-trips the data to/from your database. You should use a local copy of the database in a clean state, not a production, development, or shared database.
If you're interested, I wrote an article on this a while back. It's Java-focused, but the theory holds true for pretty much any language.
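For a rough idea of what such a test could look like in Node.js, here is a sketch that assumes mocha and, purely for illustration, a MongoDB database; the collection and field names stand in for whatever the third-party schema actually contains:

    // integration.test.js - verify the third-party schema still looks the way we expect.
    // Assumes mocha and the official 'mongodb' driver; names are placeholders.
    const assert = require('assert');
    const { MongoClient } = require('mongodb');

    describe('third-party database schema', function () {
      let client;
      let db;

      before(async function () {
        client = new MongoClient(process.env.TEST_DB_URI || 'mongodb://localhost:27017');
        await client.connect();
        db = client.db('thirdparty');
      });

      after(async function () {
        await client.close();
      });

      it('still contains the collections we depend on', async function () {
        const names = (await db.listCollections().toArray()).map((c) => c.name);
        assert.ok(names.includes('customers'), 'customers collection is missing');
      });

      it('still exposes the fields we depend on', async function () {
        // Round-trip one real document and check the shape the application relies on.
        const doc = await db.collection('customers').findOne({});
        assert.ok(doc, 'expected at least one customer document');
        assert.ok('email' in doc, 'customers.email field is missing');
      });
    });

The point is the same as in the article: the test talks to a real (but local and disposable) instance of the database instead of a mock.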

A better option than storing the DB model in a txt file in a PHP shared hosting environment

This is more of a conceptual question rather than a programming question.
I have developed a system where I use a DB layer which is responsible for generating queries as well as running them.
To avoid creating queries which can't run, I have a simplified database model covering every table, with all the respective columns of the persistence layer. In each record I provide the name of a table, and for each column in the table I provide the name, type and length. This way I can catch naming problems as well as invalid inputs.
The model has no knowledge of data stored in the tables.
The model is stored in a txt file in the filesystem of the server. I am concerned about the security of that solution, as typing in the URL of the db_model txt file would expose the entire persistence data model of the application.
How can I do a better job with this?
I'm thinking about a few options.
1. Encrypt the txt file and then, for each session, decrypt it and store it as a session variable, since I need the model on every page load, often several times per page load.
2. Move it up in the filesystem hierarchy, above the web server's document root, and read it through an FTP connection. That would look bad when packaging the system as a product, though, so I don't think that option is viable.
Are any of these options a good idea, or should I do something completely different?
best regards
Rythmic
Simple answer:
Don't keep track of it yourself. Your RDBMS (which one are you using, by the way?) will have an internal mechanism to keep track of this. It also has its own mechanisms for ensuring that the queries you pass to it are acceptable. That's why we pay it the big bucks - let it do its job the way it's trained to.
Relying on the RDBMS is definitely an option to consider strongly. Another option is to query the DB itself if you feel the need to validate input: either store the data from your text file in the DB itself and read it back, or, even better, read the schema directly from your DB system (for example, from information_schema), which guarantees that the version you're checking input against exactly matches your DB schema.

Drupal: Avoid database when dealing with node type info?

I'm writing a Drupal module that deals with creating new nodes from CSV files. The way I've been doing it currently, the user provides a node type, and my module goes to the database to find the fields for that node.
After the user matches the node fields to the CSV fields, I want to validate the data. This requires finding out the types of the node fields. I'm not entirely sure how to do that. (Maybe look at the content_node_field table?)
Then, I have to create the nodes. Currently, the module creates a new StdClass object, populates it with the necessary data, and saves it.
But what if I could abstract away from the database entirely and avoid dealing with it directly? What if I asked the user for an existing node of this type? I could node_load() that node and use it to determine the node fields. When it comes time to save the new nodes, I could use the "seed" node to figure out what their structure needs to be.
One downside: this requires at least one node of this type to exist before the module can function.
Also, would this be slower than accessing the db directly?
I fear that over time, db names could change, and content types could be defined across multiple tables. By working only from a pre-existing node, I could get around many of these issues. Right?
Surely node_load will be hitting the database anyway? The node fields are stored in the database so if you need to get them, at some point you have to talk to the database. Given that some page loads on Drupal invoke hundreds (or even thousands!) of database queries I really wouldn't worry about one or two!
Table names are unlikely to change and the schema should stay fixed between point versions of Drupal at least. It would be better practice to use the API to get the data you want if it is possible though, and this would give better protection against change. I don't know if that's possible.
