Need explanation for DocPad persistence - node.js

I am pretty confused by the architecture behind how data is persisted in DocPad. From blogs and forums, I gather that an in-memory database (and/or the out directory) is used for generated content. But one of the selling points of DocPad is that it is "completely file based". From the sound of it, hosting it on Heroku or any ephemeral file system doesn't seem logical. Can anyone give some explanation/clarification?

DocPad is pitched as a next generation web architecture. This mindmap showcases perfectly why we call it that:
DocPad Architecture Vision http://d.pr/i/jmmZ+
The workflow goes like so:
Importers bring data in from any source, be it the local file system, Tumblr, or a MongoDB database.
These get injected into the DocPad in-memory database
At generation time, DocPad will then render what needs to be rendered, and output static content into the out directory
Dynamic documents (documents that re-render on each request) and dynamic abilities (server extensions) are now able to make use of the in-memory database and perform advanced cool stuff like file uploads, contact forms, search pages, whatever (a rough sketch of a server extension follows this list)
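As an illustration of that last point, here is a minimal sketch of what a server extension reading from the in-memory database might look like. It assumes DocPad's serverExtend event and getCollection API as documented; the /api/posts route is a made-up name and error handling is omitted.

    // docpad.js -- minimal configuration sketch (untested illustration)
    module.exports = {
        events: {
            // serverExtend hands us the underlying Express server
            serverExtend: function (opts) {
                var docpad = this.docpad;
                var server = opts.server;

                // Hypothetical JSON endpoint backed by the in-memory database
                server.get('/api/posts', function (req, res) {
                    var posts = docpad.getCollection('documents').models.map(function (doc) {
                        return { title: doc.get('title'), url: doc.get('url') };
                    });
                    res.json(posts);
                });
            }
        }
    };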
In that sense, DocPad is a next generation web architecture that has static site generation abilities as well as dynamic site generation abilities. What separates DocPad from traditional web architectures is that traditional web architectures consider content and templating separate beings, whereas DocPad considers them the same thing, just separated by their extensions (for example, a file named about.html.md is Markdown content that renders to HTML, with the extensions processed right to left). Traditional web architectures are also dynamic by default, with static site generation accomplished via caching, rather than the other way round of being static by default.
Because of this load-everything-into-the-in-memory-database situation, we are suffering from some growing pains with performance during generation and post-generation. Discussion here. However, there is nothing there that can't be fixed with enough time and resources. Regardless, DocPad will still be faster than your traditional web architecture due to its static nature (faster requests) as well as its asynchronous nature (faster generations).
In terms of how you would handle file uploads:
If you are doing a static website with DocPad, you would have a backend API server somewhere else that you would do the upload to, and load the data into the page single-page-application style.
If you are doing a dynamic website with DocPad, you would host DocPad on a server like Heroku, and extend the server to handle the file upload to a destination like Amazon S3, Dropbox, or MongoDB. You can then choose to expose the file via templateData as a link, or inject the file into the DocPad in-memory database as a file. Which one you choose depends on whether you just want to reference the upload or treat it as a first-class citizen in the DocPad universe (it gets its own URL and page). A rough sketch of the upload handling follows.
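For the dynamic-site case, the upload handling could look roughly like the sketch below. It assumes the same serverExtend event plus the formidable and aws-sdk packages; the route, form field, and bucket name are placeholders, and error handling is kept to a minimum.

    // docpad.js (continued) -- hypothetical upload route pushing files to S3
    var fs = require('fs');
    var formidable = require('formidable');
    var AWS = require('aws-sdk');
    var s3 = new AWS.S3();

    module.exports = {
        events: {
            serverExtend: function (opts) {
                var server = opts.server;

                server.post('/upload', function (req, res) {
                    // Parse the multipart request body
                    var form = new formidable.IncomingForm();
                    form.parse(req, function (err, fields, files) {
                        var file = files.upload; // "upload" is the form field name
                        s3.upload({
                            Bucket: 'my-uploads-bucket', // placeholder bucket
                            Key: file.name,
                            Body: fs.createReadStream(file.path)
                        }, function (uploadErr) {
                            res.send(uploadErr ? 'Upload failed' : 'Uploaded');
                        });
                    });
                });
            }
        }
    };

From there you could drop the resulting S3 URL into templateData, or inject the file into the in-memory database if it should be a first-class document.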
For dynamic sites, I would really just go with the static site + single page application approach. You get benefits like responsive design, offline support, and a really fast UX, which you would struggle to accomplish with the dynamic site approach, regardless of which web architecture you build it on.

Well, I can't top Benjamin's excellent explanation, but if you want a TL;DR explanation:
DocPad is used (in its biggest use case) to generate STATIC websites, à la GitHub Pages or old websites of the 1990s. You can write your pages in whatever you like (Jade, Eco, CoffeeScript, etc.) and it will compile the pages and output HTML files. Think of it as a "compile once, serve forever" thing.
On the other hand, if you want dynamic content on your site, you'd use Node.js for pulling in the dynamic data from other sites, or generating it on the fly.
As for your concern about Heroku's ephemeral file system (I don't know exactly how that works), you can use Amazon's S3 for storage. Check out this

Related

What is a way to serve up a sizeable amount of static content from a Next.JS app deployed on Heroku without a custom server?

I have a webapp that serves up a significant but not ridiculous amount of static data (~3 GB of smallish files).
With Next.JS and my home computer this is no issue. I dump the content in /public and it serves it right up. Runs fantastic.
But when I deploy to Heroku, the slug has size limits, git has size limits, etc. Their processes are more oriented towards CDNs, which are great but I still need to have the Next.JS server offer up the content to be CDN'd. Most of the documentation pushes the user to use S3 for this sort of storage, but then I have to write a custom server for Next.JS, link up to S3, etc.
Doable, but feels like a bit more work than necessary. Hoping someone might have a simpler method to suggest?

Is express.static middleware optimal for streaming videos?

I want to stream videos from a server to a web page with an html video tag. I am using node, and plan to stick with it (no nginx).
For the moment, I am using the express.static middleware, i.e. serve-static, but since it is made for serving assets, HTML pages, etc., I am wondering whether it is suitable for streaming big videos.
I took a peek at the code, and it seems that it does things properly: support for the Accept-Ranges header, etc. But I lack experience and knowledge about this specific topic, so I can't figure out whether things are as optimal as they could be.
Any suggestion of a better Express middleware, or Node server, for this purpose?
EDIT
I do not need to do anything fancy such as adaptive bitrate; I simply want to make sure that, within the Node realm, this setup is optimal for serving a video, since my server is installed on an embedded system with very little RAM available.
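For reference, the setup described above is essentially this minimal sketch (the directory and port are placeholders); serve-static streams the file from disk and honours Range headers, which is what lets the video tag seek:

    // server.js -- serving video files with express.static
    var express = require('express');
    var path = require('path');

    var app = express();

    // Everything under ./public/videos becomes available at /videos/...
    app.use('/videos', express.static(path.join(__dirname, 'public', 'videos')));

    app.listen(3000);

    // In the page: <video src="/videos/clip.mp4" controls></video>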
The best solution is to use a proper optimized web server, such as Nginx.
express.static is for utility purposes. Node.js as a whole is useful for building your application server. If you want to serve static files, use a web server. Otherwise you have the extra overhead of JavaScript for no benefit.
This goes for any static files, not just video. The size of the static content really has no bearing here on what's best, as all the servers stream large resources from disk.

Isomorphism in an SPA consuming a REST service

I'm currently in the planning stages of a new project which is composed of a storefront, a highly reactive user dashboard, and the individual products being offered via the storefront being highly interactive mini-apps. We're trying to get away with making the entire platform an SPA and designing the entire thing on a Flux architecture with React for the front-end views.
One issue, as with most SPAs, is SEO. I've prototyped an isomorphic solution based on the este.js dev stack. One issue is that our app consumes pretty much all of its data from a RESTful server, which is separate from the web server serving up the SPA. This means that the web server would need to fetch a considerable amount of data from the RESTful server, to isomorphically generate the HTML snapshot.
I've considered having a separate crawler process of my own crawl the entire storefront periodically and isomorphically generate HTML snapshots of the pages that could be served up when the web server encounters a search engine crawler. I'm not sure if this is a good approach though, as it would likely introduce additional maintenance and, frankly, seems a bit fragile. I could just have the web server isomorphically generate the HTML on the fly, but I fear bogging the server down for ordinary users as the server would be pulling considerable data from the REST API...
Is there a better way to handle such a case?
Check out Yahoo's Fetchr, an open-source library that allows you to isomorphically hit your API. It ties into Facebook's Flux architecture, so you need to have Stores, but at the very least you can glean some code and concepts from it. If you're in the planning phase, you might even consider going with Flux or Fluxible.
http://fluxible.io/guides/data-services.html
https://github.com/yahoo/fetchr
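Whether you use Fetchr or roll something smaller yourself, the underlying idea is a single data-access module whose transport differs per environment: on the server it talks to the REST service directly, in the browser it goes through the web server. A very rough sketch of that concept (this is not Fetchr's actual API; the URLs and names are made up):

    // products.js -- one module, shared by the server render and the browser bundle
    var superagent = require('superagent');

    // Server-side code can reach the REST service directly;
    // the browser goes through the web server, which proxies /api.
    // Both base URLs are placeholders.
    var BASE_URL = (typeof window === 'undefined')
        ? 'http://internal-rest-service.example.com'
        : '/api';

    module.exports = {
        fetchProducts: function (callback) {
            superagent
                .get(BASE_URL + '/products')
                .end(function (err, res) {
                    callback(err, res && res.body);
                });
        }
    };

The server render would call fetchProducts before rendering the components to a string, and the client would call the same function after mount, so both environments share one code path.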

Single page app + node.js backend (REST) + CMS - best concepts/practices

We are going to build a big social web app. We have to implement 2 big modules:
FrontEnd - single page app (Backbone.js)
CMS - a system to manage the contents of the FrontEnd (daily content, sponsors, banners, links, special offers, media uploads, etc.)
The FrontEnd will use a Node.js-powered REST API which will use a DB in the cloud (PG or Mongo - we haven't decided yet).
My question is: should the CMS also use the same REST API as the FrontEnd? Or should we make a separate app (not necessarily Node.js) for the CMS that would talk to the DB in the cloud directly? My question arises because on a previous project we had this issue:
A single REST API for the FrontEnd and the CMS.
When we wanted new functionality in the CMS we had to implement it in the REST API - and then we had to restart the whole app (REST API), which was problematic in production...
So:
Implement 2 REST APIs - one for the FrontEnd and one for the BackEnd?
Implement 1 REST API for the FrontEnd and implement the CMS as a separate app talking directly to the database?
How do you do it?
Our goal is to implement a super-fast FrontEnd and a big/heavy CMS (it is going to be bigger than the FrontEnd). So we are thinking of completely separating the CMS module from the FrontEnd module. Any eventual need for communication between the modules would be implemented through Redis pub/sub, for example (a rough sketch follows) - what do you think?
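To make that last point concrete, the Redis pub/sub plumbing between the two modules might look like this rough sketch (classic node_redis API; the channel name and payload are made up):

    // cms process -- publish a notification whenever content changes
    var redis = require('redis');
    var pub = redis.createClient();

    pub.publish('content-updated', JSON.stringify({ id: 42, type: 'banner' }));

    // frontend/api process -- react to CMS changes, e.g. invalidate a cache
    var sub = redis.createClient();
    sub.subscribe('content-updated');
    sub.on('message', function (channel, message) {
        var payload = JSON.parse(message);
        console.log('CMS updated %s #%d, refreshing cache', payload.type, payload.id);
    });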
Software architecture decisions are always very contextual - the people most qualified to make the call are you and your team, since you know way more than we do. That being said, based on the info you've shared, here are some things to consider:
Content Management as a problem space is pretty mature. Unless part of your revenue model involves innovations in how you handle Content Management, you would be unwise to build your own CMS. There are fantastic CMSes, both open source and commercial, ranging in price from hundreds of dollars to hundreds of thousands. I cannot caution you strongly enough against the common developer fallacy of discounting the value of our own time. Even if you spend an entire engineer's salary-worth on a CMS, you'll almost certainly come out ahead.
An architecture that uses a CMS should reflect the reality of #1: CMSes are mature and stable. You want a strong and well-defined interface boundary between the parts of your system that are unique to you and specific to your revenue model, and the parts that are interchangeable with COTS (commercial off-the-shelf) software - even if you do end up building the CMS yourself (which, again, I strongly caution against). If you design something as if it's bespoke when it's not (or vice versa), you'll run into impedance-mismatch problems that are very hard to get out of and that create friction for new feature delivery across the entire system.

Building a website backend in c#, compiled to a binary

I am creating a novel website that integrates web feeds from around the internet. I want to build a backend that does CPU-intensive analysis of the web data on a regular basis and continuously adds the results to a database.
This database will be accessible to the website through a normal ASP.NET backend that will serve the page up to the client.
Is it advisable, and best practice, to build the complex CPU operations as C# binaries that run continuously on the server?
Sounds like you want a .NET executable that either runs on a schedule (cron-job style) or that schedules itself. In any case it's wise to keep it completely separate from your website process. It sounds like data generation and data serving are separate concerns, so they should be kept separate. This also means that you can move it off the web-serving machine if load becomes an issue. If you're updating a live database, remember to take transactions into account.
