Is using an Audit Table in Postgres to create triggers for NOTIFY/LISTEN a good idea? - node.js

So I have a Postgres database into which I have installed an audit table - source: https://wiki.postgresql.org/wiki/Audit_trigger_91plus
Now my question is as follows:
I have been wanting to create a sort of stream that notifies me of any changes made by any application that has access to my DB. Now, I know that I can create a trigger and pub/sub via pg, but that adds overhead to every write, and that cost can become significant as the DB scales.
So instead of slowing down the actual DB, I was wondering: what if I set up the same NOTIFY/LISTEN functionality I would have put on the main tables, but installed it on the audit tables instead?
Has anyone ever done this? If so, what have you experienced - pros? cons? Or, if anyone knows why I should or should not do this, please let me know.
Thanks

Via NOTIFY/LISTEN, the pros:
Light communication with the server; no need to poll for data changes.
Via NOTIFY/LISTEN, the cons:
Practice shows that it is not sufficient just to set it up and listen for the events, because every so often the channel goes down due to various communication problems. For a serious system you would need to put an additional monitoring service in place that verifies your listeners are still operating, and if not, destroys the existing ones and creates new ones. This can be tricky, and you probably won't find a good example of doing it (a sketch of such a listener-plus-watchdog follows this comparison).
Via scheduled data pulls, the pros:
Simplicity - you just check for data changes according to the schedule;
Reliability - there is nothing to break there; once the pull implementation is working, it keeps working.
Via scheduled data pulls, the cons:
Additional traffic for the server, depending on how quickly you need to see the data change, and on how that would interfere (if at all) with other requests to the server.
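To make the trade-off concrete, here is a minimal sketch of the LISTEN side with the watchdog mentioned above, written for node.js with the 'pg' npm package. The channel name audit_changes and the connection string are made up for illustration; it assumes a trigger on the audit table calls pg_notify('audit_changes', ...) for each row it writes - adjust to your schema.

// Minimal sketch: LISTEN on a channel fed by the audit table, with a watchdog
// that recreates the listener if the connection drops. Channel name and
// credentials are placeholders.
var pg = require('pg');

var conString = 'postgres://user:pass@localhost/mydb';

function startListener() {
  var client = new pg.Client(conString);

  client.connect(function (err) {
    if (err) {
      console.error('listener connect failed, retrying in 5s', err);
      return setTimeout(startListener, 5000);
    }
    client.query('LISTEN audit_changes');
  });

  // Payload arrives here whenever the trigger calls pg_notify().
  client.on('notification', function (msg) {
    console.log('change on channel %s: %s', msg.channel, msg.payload);
  });

  // The "monitoring" part: if the connection breaks, throw the old client
  // away and create a fresh listener instead of silently going deaf.
  client.on('error', function (err) {
    console.error('listener connection lost, recreating', err);
    client.end();
    setTimeout(startListener, 5000);
  });
}

startListener();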

Related

Make Node/MEANjs Highly Available

I'm probably opening up a can of worms with regard to how many hundreds of directions can be taken with this - but I want high availability / disaster recovery with my MEANjs servers.
Right now, I have 3 servers:
MongoDB
App (Grunt'ing the main application; this is the front-end server)
A third server for other processing on the back end
So at the moment, if I reboot my MongoDB server (or more realistically, it crashes for some reason), I suddenly see this in my App server terminal:
MongoDB connection error: Error: failed to connect to [172.30.3.30:27017]
[nodemon] app crashed - waiting for file changes before starting...
After MongoDB is back online, nothing happens on the app server until I re-grunt.
What's the best practice for this situation? You can see in the error that I'm using nodemon to monitor changes to the app. I bet that on init I could get my MongoDB server to update a file on the app server within nodemon's view to force a restart? Or is there some other tool I can use for this? Or should I be handling my connections to the DB server more gracefully so the app doesn't "crash"?
Is there a way to redirect to a secondary MongoDB in case the primary isn't available? That would be more in line with HA/DR-type stuff.
I would like to start with a side note: Given the description in the question and the comments to it, I am not convinced that using AWS is a wise option. A PaaS provider like Heroku, OpenShift or AppFog seems to be more suitable, especially when combined with a MongoDB service provider. Running MongoDB on EBS can be quite a challenge when you are new to MongoDB. And pretty expensive, too, as soon as you need provisioned IOPS.
Note: In the following paragraphs, I have simplified a few things for the sake of comprehensibility.
If you insist on running it on your own, however, you have an option. MongoDB itself comes with means of automatic, transparent failover, called a replica set.
A minimal replica set consists of two data-bearing nodes and a so-called arbiter. Write operations go only to the node currently elected "primary", and reads do, too, unless you explicitly allow or request reads to be performed on the current "secondary". The secondary constantly syncs to the primary. If the current primary goes down for some reason, the former secondary is elected primary.
The arbiter is there so that there is always a quorum (qualified majority would be an equivalent term) of members to elect the current secondary to be the new primary. This quorum is mainly important for edge cases, but since you cannot rule out these edge cases, an uneven number of members is a hard requirement for a MongoDB replica set (setting aside some special cases).
The beauty of this is that almost all drivers, and the node.js driver for sure, are replica-set aware and deal with the failover procedure pretty gracefully. They simply send the reads and writes to the new primary, without any change needed anywhere else.
You only need to deal with some cases during the failover process. Without going into much detail, you basically check for certain errors in the corresponding callbacks and redo the operation if you encounter one of those errors and redoing it is feasible.
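As a rough illustration of both points - the replica-set-aware connection string and the "handle connection problems without crashing" part the question asks about - here is a minimal sketch using Mongoose (which MEAN.js uses under the hood). The host names, database name and replica set name rs0 are placeholders.

// Minimal sketch: connect to a replica set and survive a primary failover
// without letting the process crash. Hosts, db name and 'rs0' are placeholders.
var mongoose = require('mongoose');

var uri = 'mongodb://172.30.3.30:27017,172.30.3.31:27017/mean?replicaSet=rs0';

mongoose.connect(uri, function (err) {
  if (err) {
    // Log instead of throwing, so nodemon does not see a crash.
    console.error('Initial MongoDB connection failed:', err);
  }
});

// Connection-level errors after startup: log them and let the driver retry.
mongoose.connection.on('error', function (err) {
  console.error('MongoDB connection error:', err);
});

mongoose.connection.on('disconnected', function () {
  console.warn('MongoDB disconnected - waiting for the driver to reconnect...');
});

mongoose.connection.on('reconnected', function () {
  console.info('MongoDB reconnected (possibly to a new primary).');
});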
As you might have noticed, the third member, the arbiter, does not hold any data. It is a very lightweight process and can basically run on the cheapest instance you can find.
So you get data replication and automatic, transparent failover with relative ease, at the cost of the cheapest VM you can find - since you would need two data-bearing nodes anyway with any other approach.

Firebird DB - monitoring table

I have recently just started working with firebird DB v2.1 on a Linux Redhawk 5.4.11 system. I am trying to create a monitor script that gets kicked off via a cron job. However I am running into a few issues and I was hoping for some advice...
First off, I have read through most of the documentation that comes with the Firebird DB and a lot of the documentation provided on their site. I have tried using the gstat tool which is supplied, but that didn't seem to give me the kind of information I was looking for. I then ran across the README.monitoring_tables file, which seemed to be exactly what I wanted to monitor. Yet this is where I started to hit a snag in my progress...
After logging into the DB via isql, I ran SELECT MON$PAGE_READS, MON$PAGE_WRITES FROM MON$IO_STATS; and got some numbers which seemed okay. However, upon running the command again it appeared the data was stale, because the numbers were not updating. I waited 1 minute, 5 minutes, 15 minutes, and the data was the same each time. Only once I logged off and back on to run the command again did the data change. It appears that the data refreshes only on a relog, and even then I am not sure it is correct.
My question now is: am I even doing this correctly? Are these commands truly monitoring my DB, or are they just monitoring the command itself? Also, why does it take a relog to refresh the statistics? One thing I was worried about was inconsistency in my data. In other words, my system was running, yet each time I logged on the reads/writes were not increasing linearly; they would vary from 10k to 500 to 2k. Any advice or help would be appreciated!
When you query a monitoring table, a snapshot of the monitoring information is created so the contents of the monitoring tables are stable for the rest of the transaction. You need to commit and start a new transaction if you want fresh information. Firebird always uses a transaction (and isql implicitly starts a transaction if none was started explicitly).
This is also documented in doc/README.monitoring_tables (at least in the Firebird 2.5 version):
A snapshot is created the first time any of the monitoring tables is being selected from in the given transaction and it's preserved until the transaction ends, so multiple queries (e.g. master-detail ones) will always return the consistent view of the data. In other words, the monitoring tables always behave like a snapshot (aka consistency) transaction, even if the host transaction has been started with another isolation level. To refresh the snapshot, the current transaction should be finished and the monitoring tables should be queried in the new transaction context.
(emphasis mine)
Note that depending on your monitoring needs, you should also look at the trace functionality that was introduced in Firebird 2.5.
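If the cron-driven monitor ends up being written in node rather than driven through isql, the same rule applies: each poll needs its own transaction so the snapshot is rebuilt. A rough sketch, assuming the node-firebird npm package (check its README for the exact option names and API) and placeholder connection details:

// Rough sketch only - assumes the 'node-firebird' package; connection details
// are placeholders.
var Firebird = require('node-firebird');

var options = {
  host: '127.0.0.1',
  port: 3050,
  database: '/path/to/your.fdb',
  user: 'SYSDBA',
  password: 'masterkey'
};

function pollIoStats() {
  Firebird.attach(options, function (err, db) {
    if (err) return console.error('attach failed', err);

    // Each attach/query/detach cycle runs in a fresh transaction context,
    // so the monitoring snapshot is rebuilt and the numbers are current.
    db.query('SELECT MON$PAGE_READS, MON$PAGE_WRITES FROM MON$IO_STATS', [],
      function (err, rows) {
        if (err) console.error('query failed', err);
        else console.log(new Date().toISOString(), rows);
        db.detach();
      });
  });
}

// Poll once a minute instead of relying on a single long-lived isql session.
setInterval(pollIoStats, 60 * 1000);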

How do I share a cache across Node workers with Redis?

Forgive me if this is a really dumb question. I have been googling for the past hour and can't seem to find it answered anywhere.
Our application needs to query our CMS database every hour or so to update all of its non-user-specific CMS content. I would like to store that data in one place and have all the workers have access to it - without each worker having to call the API every hour. Also, I would like this cache to persist in the event of a node worker crash. Since we're pretty new to node here, I predict we might have some of those.
I will handle all the cache expiration logic. I just want a store that can be shared between users, can handle worker crashing and restarting, and is at the application level - not the user level. So user sessions are no good for this.
Is Redis even what I'm looking for? Sadly it may be too late to install mongo on our web layer for this release anyway. Pub/sub looks promising but really seems like it's made for messaging - not a shared cache. Maybe I am reading that wrong though.
Thank you so much stack overflow! I promise to be a good citizen now that I have registered.
Redis is a great solution for your problem. I'm not sure why you are considering pub/sub, though. It doesn't sound like the workers need to be notified when the cache is updated; they just need to be able to read the latest value written to the cache. You can use a simple string value in Redis for this, stored under a consistent key.
In summary, you'd have a process that would update a redis key (say, cms-cache-stuff) every hour. Each worker which needs that data will just GET cms-cache-stuff from redis every time it needs that cached info.
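A minimal sketch of that pattern with the classic node_redis callback API - the key name cms-cache-stuff comes from above, and fetchFromCms() is a stand-in for your real CMS/API call:

// Minimal sketch using node_redis ('redis' on npm).
var redis = require('redis');
var client = redis.createClient();

client.on('error', function (err) {
  console.error('Redis error:', err);
});

// Stand-in for the real CMS/API call.
function fetchFromCms(callback) {
  callback(null, { pages: [], updatedAt: new Date().toISOString() });
}

// Refresh process: run this every hour (cron, setInterval, a dedicated worker...).
function refreshCache() {
  fetchFromCms(function (err, cmsData) {
    if (err) return console.error('CMS fetch failed, keeping stale cache', err);
    client.set('cms-cache-stuff', JSON.stringify(cmsData), function (err) {
      if (err) console.error('Failed to write cache', err);
    });
  });
}

// Any worker: read the latest cached value whenever it needs it.
function getCmsContent(callback) {
  client.get('cms-cache-stuff', function (err, reply) {
    if (err || !reply) return callback(err || new Error('cache empty'));
    callback(null, JSON.parse(reply));
  });
}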
This solution will survive both the cache refresh process crashing or workers crashing, since the key in redis will always have data in it (though that data will be stale if the refresh process doesn't come back up).
If for some wild reason you don't want the workers continually reading from Redis (why not? it's plenty fast enough), you could still store the latest cached data in cms-cache-stuff and then publish a message through pub/sub to your workers letting them know the cache has been updated, so they can read cms-cache-stuff again. This gives you durability and recovery, since crashed workers can just read cms-cache-stuff again at startup and then start listening on the pub/sub channel for additional updates.
Pub/sub alone is pretty useless for caching since it provides no durability. If a worker is crashed and not listening on the channel, the messages are simply discarded.
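If you do go the cache-plus-notification route described above, it might look roughly like this (again the classic node_redis API; the channel name cms-cache-updated is made up). Note that the subscriber has to be a separate client, because a connection in subscribe mode cannot issue regular commands:

// Sketch of the cache + pub/sub notification variant.
var redis = require('redis');

var client = redis.createClient();     // normal commands (GET/SET)
var subscriber = redis.createClient(); // dedicated connection for SUBSCRIBE

var latestCms = null;

function loadCache() {
  client.get('cms-cache-stuff', function (err, reply) {
    if (!err && reply) latestCms = JSON.parse(reply);
  });
}

// On startup (or after a crash/restart): read whatever is already cached...
loadCache();

// ...then listen for refresh notifications.
subscriber.subscribe('cms-cache-updated');
subscriber.on('message', function (channel, message) {
  if (channel === 'cms-cache-updated') loadCache();
});

// The refresh process, after writing the new value, would publish:
//   client.publish('cms-cache-updated', 'refreshed');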
Well, as I suspected, my problem was a super-basic noob mistake that's hard to even explain well enough to get the "duh" answer. I was using the connect-redis package, which is really designed for sessions, not a cache. Once someone pointed me to the node_redis client, I was able to pretty easily get it set up and do what I wanted to do.
Thanks a lot - hopefully this helps some redis noob in the future!

SQL Azure distributing heavy read queries for reporting

We are using SQL Azure for our application and need some input on how to handle queries that scan a lot of data for reporting. Our application is both read- and write-intensive, so we don't want the report queries to block the rest of the operations.
To avoid connection-pooling issues caused by long-running queries, we put the code that queries the DB for reporting onto a worker role. This still does not prevent the database from getting hit with a bunch of read-only queries.
Is there something we are missing here - could we set up a read-only replica that all the reporting calls hit?
Any suggestions would be greatly appreciated.
Have a look at SQL Azure Data Sync. It will allow you to incrementally update your reporting database.
here are a couple of links to get you started
http://msdn.microsoft.com/en-us/library/hh667301.aspx
http://social.technet.microsoft.com/wiki/contents/articles/1821.sql-data-sync-overview.aspx
I think it is still in CTP though.
How about this:
Create a separate connection string for reporting; for example, use a different Application Name
For your reporting queries, use SET TRANSACTION ISOLATION LEVEL SNAPSHOT
This should prevent your long-running queries from blocking your operational queries. It will also allow your reports to get a consistent read.
Since you're talking about reporting I'm assuming you don't need real time data. In that case, you can consider creating a copy of your production database at a regular interval (every 12 hours for example).
In SQL Azure it's very easy to create a copy:
-- Execute on the master database.
-- Start copying.
CREATE DATABASE Database1B AS COPY OF Database1A;
Your reporting would happen on Database1B without impacting the actual production database (Database1A).
You say you have a lot of read-only queries... any possibility of caching them? (Perfect, since they are read-only.)
What reporting tool are you using? You can output cache the queries as well if needed.

How does azure websites with EF migrations ensure integrity when updating

The scenario is simple: using EF code-first migrations, with multiple Azure website instances, a decent-size DB of say 100 GB (assuming Azure SQL), and lots of active concurrent users - say 20k for the heck of it.
Goal: push out update, with active users, keep integrity while upgrading.
I've sifted through all the docs I can find. However, the core details seem to be missing, or I'm blatantly overlooking them. When Azure receives an update request via FTP/git/TFS, how does it process the update? What does it do with active users? For example, does it freeze incoming requests to all instances, let items already processing finish, upgrade/replace each instance, let EF migrations run, then let traffic start again? If it upgrades/refreshes all instances simultaneously, how does it ensure EF migrations run only once? If it refreshes instances live in a rolling-upgrade process (upgrading one at a time with no inbound traffic freeze), how could it ensure integrity, since instances in the older state would/could potentially break?
The main question, what is the real process after it receives the request to update? What are the recommendations for updating a live website?
To put it simply, it doesn't.
EF Migrations and Azure deployment are two very different beasts. Azure deployment gives you a number of options, including update and staging slots. You've probably seen Deploy a web app in Azure App Service; for other readers, this is a good starting point.
In general, the Azure deployment model is concerned with the active connections to the IIS/website stack: an update ensures uninterrupted user access by taking the instance being deployed out of the load-balancer pool and redirecting traffic to the other instances. It then cycles through the instances, updating them one by one.
This means that at any point in time, during an update deployment there will be multiple versions of your code running at the same time.
If your EF model has not changed between code versions, then Azure deployment works like a charm; users won't even know it is happening. But if you need to apply a migration as part of the deployment, BEWARE.
In general, EF will only load the model if the code and DB versions match. It is very hard to use EF Migrations and support multiple code versions of the model at the same time.
EF Migrations are largely controlled by the Database Initializer.
See Upgrade the database using migrations for details.
As a developer, you get to choose how and when the database will be upgraded, but know that if you are using Migrations and deployment updates:
New code will not easily run against the old data schema.
If the old code/app restarts, many default initialization strategies will attempt to roll the schema back; if this happens, refer to point 1. ;)
If you get past the EF model loading against the wrong version of the schema, you will experience exceptions and general failures when the code tries to use schema elements that are not there.
The simplest way to manage an EF migration on a live site is to take all instances of the site down for deployments that include an EF migration - you can use a maintenance page or a redirect; that's up to you.
If you are going to this trouble, it is probably best to manually apply the DB update, then if it fails you can easily abort the deployment, because it hasn't started yet!
Otherwise, deploy the update and the first instance to spin up will run the migration, if the initializer has been configured to do so...
If you absolutely must have continuous deployment of both site code/content and model updates then EF migrations might not be the best tool to get started with as you will find it very restrictive OOTB for this scenario.
I was watching a "Fundamentals" course on Pluralsight and this was touched upon.
If you have 3 instances, Azure will take one offline and upgrade it, then restart it when it's ready. At that point, the other 2 instances get taken offline and your upgraded instance starts, thus running your schema changes.
When those 2 come back, the EF migrations will already have been run, so your sites are back.
In theory, then, it all sounds like it should work, although depending on how much work the EF migrations have to do, requests may be delayed.
However, the comment from the author was that in this scenario (i.e. making schema changes) you should consider if your website can run in this situation. The suggestion being that you either need to make your code work with both old and new schemas, or show a "maintenance system down page".
The summary seems to be that depending on what you are actually upgrading, this will impact and affect your choices and method of deployment.
Generally speaking, if you want to support active upgrades, you need to support multiple versions of your application simultaneously. This is really the only way to reliably stay active while you migrate/upgrade. Also consider feature switches to roll out your conversion in a controlled manner.
