Version queries in URL to bypass browser cache - web

I'm writing a web application which will likely be updated frequently, including changes to css and js files which are typically cached aggressively by the browser.
In order for changes to become instantly visible to users, without affecting cache performance, I've come up with the following system:
Every resource file has a version. This version is appended with a ? sign, e.g. main.css becomes main.css?v=147. If I change a file, I increment the version in all references. (In practice I would probably just have a script to increment the version for all resources, every time I deploy an update.)
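For illustration, a minimal sketch of what such a bump script could look like (TypeScript/Node chosen arbitrarily here; the template list and the v= format are just placeholders):

```
// bump-asset-versions.ts -- rewrite ?v=N query strings in HTML templates.
// Sketch only: assumes references look like href="main.css?v=147".
import { readFileSync, writeFileSync } from "node:fs";

const newVersion = process.argv[2] ?? "148";    // e.g. passed in by the deploy script

for (const file of ["index.html"]) {            // placeholder list of templates
  const html = readFileSync(file, "utf8");
  // Bump the version query on .css and .js references.
  const updated = html.replace(/(\.(?:css|js))\?v=\d+/g, `$1?v=${newVersion}`);
  writeFileSync(file, updated);
}
```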
My questions are:
Is this a reasonable approach for production code? Am I missing something? (Or is there a better way?)
Does the question mark introduce additional overhead? I could incorporate the version number into the filename, if that is more efficient.

The approach sounds reasonable. Here are some points to consider:
If you have many resource files with different version numbers, it can be significant overhead for developers to manage them all correctly and increment them in the right situations.
You might need to implement a policy for your team, or write a CI check to verify that developers did it correctly.
You could use one version number for all files, for example the version number of the whole app.
It makes "managing" the versions for developers a no-op.
It changes the links on every deploy, even for files that didn't change, so caches are invalidated unnecessarily.
Depending on the number of resource files, the frequency of deploys versus the frequency of deploys that actually change a resource file, and the number of requests for those files, one solution or the other may be more performant. It's a trade-off.
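If you do end up folding the version into the filename (as the question also considers), a common variant is to fingerprint each file with a hash of its contents, so a URL changes only when the file itself changes. A minimal sketch, with placeholder file names and TypeScript/Node again purely for illustration:

```
// fingerprint.ts -- copy main.css to main.<hash>.css and print the new name.
// Sketch only; wiring the mapping into your templates is left out.
import { createHash } from "node:crypto";
import { copyFileSync, readFileSync } from "node:fs";

function fingerprint(path: string): string {
  const hash = createHash("md5").update(readFileSync(path)).digest("hex").slice(0, 8);
  const out = path.replace(/(\.[^.]+)$/, `.${hash}$1`);   // main.css -> main.1a2b3c4d.css
  copyFileSync(path, out);
  return out;
}

console.log(fingerprint("main.css"));
```

Fingerprinted filenames can be cached indefinitely (no query string needed), at the cost of a build step that rewrites the references.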

Related

Semantic versioning - major version for a traditional web application

I have a Rails app which is a traditional web application (HTTP requests are processed and HTML pages are rendered). As of now, it does not have any APIs exposed to other apps.
I want to use semantic versioning for versioning the application. Currently it is at '0.0.0'.
Quoting from the documentation:
MAJOR version when you make incompatible API changes,
MINOR version when you add functionality in a backwards-compatible manner, and
PATCH version when you make backwards-compatible bug fixes.
From what I understand, because there are no applications dependent on mine, the major version will never change. Only the minor and patch versions will change; the major version will always remain 0.
I want to know if my understanding is correct. Is there any scenario in which my major version will change?
Since you're not developing and releasing a software package, semantic versioning is not directly applicable. It sounds like a single "release" number could be enough for your use case, since what you need is to track when a code change will be in test and in prod. Assuming code must go through test before going to prod, you would update the number whenever you update the test environment with code from the development branch. This way, at a given moment development would have release N, test would have N-1, and prod N-2.
API versioning is a different problem, independent of release numbering. In my experience API users only care about breaking changes, so those need to be versioned. Also, since users are slow to update their apps you must be prepared to keep old versions around indefinitely.
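For URL-based API versioning, the usual pattern is to keep the old routes mounted alongside the new ones. A minimal sketch (shown with Express rather than Rails, purely as an illustration; the route and payloads are made up):

```
import express from "express";

const app = express();

// v1 stays available for existing clients, even after v2 ships.
app.get("/api/v1/orders", (_req, res) => res.json({ orders: [] }));

// v2 carries the breaking change (say, a different response shape).
app.get("/api/v2/orders", (_req, res) => res.json({ data: { orders: [] } }));

app.listen(3000);
```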
One way you could think about this is to use the user's flow through the application as the basis for versioning. If the user's flow changes in a way that makes the old route impossible, that could be considered a breaking change and a major version increase. If you're adding new functionality that hasn't existed before (i.e. the user has access to a new feature or sees something new on the website that they can interact with), that could be considered a minor version increase. If you're deploying minor fixes to things like text, that could be considered a patch-level change.
The problem with this approach, though, is that you need to understand a user's workflow through the application to be able to correctly increment the major version, and as software developers we're still pretty terrible at doing that properly.
Ref: https://christianlydemann.com/versioning-your-angular-app-automatically-with-standard-version

Node.JS: test code vs. production code organization

From a cursory look in several node.js projects on Github I noticed that the common convention is to put test files under a ./spec directory (exact name may vary: ./tests, ./specs, etc.). Let's call this the "classic" project organization.
On the other hand, there is also (at least theoretically) the "localizing" organization: each test file is in the same directory as the production file it tests (e.g., under ./controllers we will have login_controller.js as well as login_controller.spec.js).
In order to avoid theological battles on this clearly subjective topic I will ask concrete questions:
Has anyone seen major modules/apps using the localizing organization?
Are there hard drawbacks/limitations to the localizing organization? By "hard" I mean something along the lines of "well, Heroku does not include the specs/ directory in its deployment bundle (a.k.a. slug), so the classic organization has a smaller footprint on the server".
Are there testing frameworks (Mocha, jasmine-node, and co.) that somehow impose the "classic" scheme?
No, but that is up to your organizational preference. I personally would prefer a test directory.
Nope. Heroku includes everything in your slug that it receives; the only things excluded are those excluded from git via your .gitignore file.
Not 100% sure, but generally no, test frameworks do not impose a structure on your code. They simply provide the tools for you to write your tests using the structure you want.
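For instance, Jest (used here purely as one concrete example; Mocha and others have equivalent options) can be pointed at colocated spec files with a glob in its config:

```
// jest.config.ts -- a minimal sketch; the glob patterns are illustrative.
import type { Config } from "jest";

const config: Config = {
  // Pick up *.spec.js / *.spec.ts anywhere in the source tree,
  // i.e. colocated next to the production files they test.
  testMatch: ["**/*.spec.[jt]s"],
  // Keep node_modules out of the scan.
  testPathIgnorePatterns: ["/node_modules/"],
};

export default config;
```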
Consistency here is more important than the direction you decide to go. There are a few relatively minor issues that I see going with test files next to source files.
Potential code navigation issues. I could see you sometimes opening the wrong file by accident: opening the test file when you mean to open the source file, and so on. This would happen more often if those files are side by side and the only difference is .spec in the filename.
Potential issues with the unit test runner. Most unit test runners seem to prefer a folder of tests by default. I'm sure you could configure them to look across the entire project, but it depends on the test runner.
Potentially slower unit test automation. Since your test files are mixed across your entire project, the test runner has to scan the whole project for test files instead of a dedicated directory. For a large code base, this could mean your test suite takes longer to complete, although the difference in speed is probably quite small.
As I said, these are minor issues and you can definitely work around them, but it does add some friction. You have to weigh the potential downsides against what you see as the benefits of having your test files reside next to your source files.

Partial packages in Continuous Delivery

Currently we are running a C# (built on SharePoint) project and have implemented a series of automated processes to help delivery; here are the details.
Continuous Integration. A typical CI system for frequent compilation and deployment in the DEV environment.
Partial Package. Every week, a list of defects and accompanying fixes is identified, and the corresponding assemblies are fetched from the full package to form a partial package. The partial package is deployed and tested in subsequent environments.
In this pipeline, two packages are going through verification. Extra effort is needed to build up a new system (web site, scripts, process, etc.) for partial packages. However, some factors hinder improvement.
Build and deploy time is too long. On developers' machines, every single modification to assemblies triggers a 5 to 10 minute redeployment in IIS. In addition, it takes 15 minutes (or even more) to rebuild the whole solution. (The most painful part of this project.)
Geographical difference. Every final package is delivered to another office, so manual operation is inevitable and a small package size is preferred.
I would be really grateful for your opinions on how to push the Continuous Delivery practices forward. Thanks!
I imagine the reason that this question has no answers is because its scope is too large. There are far too many variables that need to be eliminated, but I'll try to help. I'm not sure of your skill level either so my apologies in advance for the basics, but I think they'll help improve and better focus your question.
Scope your problem as narrowly as possible
"Too long" is a very subjective term. I know of some larger projects that would love to see 15 minute build times. Given your question there's no way to know if you are experiencing a configuration problem or an infrastructure problem. An example of a configuration issue would be, are your projects taking full advantage of multiple cores by being built parallel /m switch? An example of an infrastructure issue would be if you're trying to move large amounts of data over a slow line using ineffective or defective hardware. It sounds like you are seeing the same times across different machines so you may want to focus on configuration.
Break down your build into "tasks" and each task into the most concise steps possible
This will do the most to help you tune your configuration and understand what you need to better orchestrate. If you are building a solution using a CI server, you are probably running a command like msbuild.exe OurProduct.sln, which is the right way to get something up and running fast so there IS some feedback. But in order to optimize, this solution will need to be broken down into independent projects. If you find one project that's causing the bulk of your time sink, it may indicate other issues or may just be the core project that everything else depends on. How you handle your build job dependencies depends on your CI server and solution. Doing it this way creates more orchestration on your end, but gives faster feedback if that's what's required, since you're only building the project that changed, not the complete solution.
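As a rough illustration of building only what changed, a sketch along these lines (TypeScript/Node with made-up conventions; mapping changed files to projects is the part you would have to design for your own solution):

```
// build-changed.ts -- build only the projects touched since the last commit.
// Sketch only: assumes one .csproj per top-level directory, which is an assumption.
import { execSync } from "node:child_process";

const changed = execSync("git diff --name-only HEAD~1", { encoding: "utf8" })
  .split("\n")
  .filter(Boolean);

// Map each changed file to its top-level project directory (placeholder convention).
const projects = new Set(changed.map((f) => f.split("/")[0]));

for (const p of projects) {
  // /m lets MSBuild build in parallel across cores.
  execSync(`msbuild ${p}/${p}.csproj /m`, { stdio: "inherit" });
}
```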
I'm not sure what you mean about the "geographical difference" thing. Is this a "push" to the office or a "pull" from the offices? This is a whole other question. HOW are you getting the files there? And why would that require a manual step?
Narrow your scope and ask multiple questions, and you will probably get better (not to mention shorter and more concise) answers.
Best!
I'm not a C# developer, but the principles remain the same.
To speed up your builds, it will be necessary to break your application up into smaller chunks if possible. If that's not possible, then you've got bigger problems to attack right now. Remember the principles of APIs, components and separation of concerns. If you're not familiar with these principles, it's definitely worth the time to learn about them.
In terms of deployment: great that you've automated it, but it sounds like you are doing a big-bang deployment. Can you think of a way to deploy only deltas to the server(s), or do you deploy a single compressed file? Break it up if possible.
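On the delta idea: something as simple as rsync over SSH already transfers only changed files. A sketch of driving it from a script (the host and paths are placeholders):

```
// deploy-deltas.ts -- push only changed files to the target server via rsync.
// Sketch only: assumes rsync and SSH access to the (made-up) host below.
import { execSync } from "node:child_process";

const target = "deploy@remote-office.example:/srv/app/";   // placeholder
// -a preserves attributes, -z compresses over the wire, --delete drops removed files.
execSync(`rsync -az --delete ./build/ ${target}`, { stdio: "inherit" });
```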

What are the main reasons against the Windows Registry?

If I want to develop a registry-like system for Linux, which Windows Registry design failures should I avoid?
Which features would be absolutely necessary?
What are the main concerns (security, ease-of-configuration, ...)?
I think the Windows Registry was not a bad idea, just the implementation didn't fulfill the promises. A common place for configuration, including for example the Apache config, database config or mail server config, wouldn't be a bad idea and might improve maintainability, especially if it has options for (protected) remote access.
I once worked on a kernel based solution but stopped because others said that registries are useless (because the windows registry is)... what do you think?
I once worked on a kernel based solution but stopped because others said that registries are useless (because the windows registry is)... what do you think?
A kernel-based registry? Why? Why? A thousand times, why? Might as well ask for a kernel-based musical postcard or inetd, for all the point there is in putting it in there. If it doesn't need to be in the kernel, it shouldn't be in the kernel. There are many other ways to implement a privileged process that don't require deep hackery like that...
If I want to develop a registry-like system for Linux, which Windows Registry design failures should I avoid?
Make sure that applications can change many entries at once in an atomic fashion (see the sketch below).
Make sure that there are simple command-line tools to manipulate it.
Make sure that no critical part of the system needs it, so that it's always possible to boot to a point where you can fix things.
Make sure that backup programs back it up correctly!
Don't let chunks of executable data be stored in your registry.
If you must have a single repository, at least use a proper database, so you have tools to back up, restore and recover it, and you can interact with it without needing a new set of custom APIs.
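On the atomicity point above, the standard trick for a text-based store is to write the whole file to a temporary path and rename it into place, so readers never see a half-written set of entries. A minimal sketch (file names are placeholders; rename is atomic on POSIX filesystems when source and destination are on the same filesystem):

```
// atomic-config-write.ts -- apply many config changes in one atomic step.
import { readFileSync, writeFileSync, renameSync } from "node:fs";

function updateEntries(path: string, updates: Record<string, string>): void {
  let entries: Record<string, string> = {};
  try {
    entries = JSON.parse(readFileSync(path, "utf8"));
  } catch {
    // missing or unreadable file: start from an empty config
  }
  Object.assign(entries, updates);            // apply all changes together
  const tmp = `${path}.tmp`;
  writeFileSync(tmp, JSON.stringify(entries, null, 2));
  renameSync(tmp, path);                      // readers see either the old or the new file
}

updateEntries("app.conf.json", { "log.level": "debug", "log.path": "/var/log/app" });
```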
The first one that comes to my mind is that you somehow need to avoid orphaned registry entries. At the moment, when you delete a program you also delete its configuration files, which live under some directory; with a registry system you need to make sure that when a program is deleted, its configuration in the registry is deleted as well.
IMHO, the main problems with the Windows Registry are:
Binary format. This loses you the availability of a huge variety of very useful tools. In a binary format, tools like diff, search, version control etc. have to be specially implemented, rather than using the best-of-breed tools which are capable of operating on the common substrate of text. Text also offers the advantage of trivially embedded documentation / comments (also greppable), and easy programmatic creation and parsing by external tools. It's also more flexible: sometimes configuration is better expressed with a full Turing-complete language than by trying to shoehorn it into a structure of keys and subkeys.
Monolithic. It's a big advantage to have everything for application X contained in one place. Move to a new computer and want to keep your settings for it? Just copy the file. While this is theoretically possible with the registry, so long as everything is under a single key, in practice it's a non-starter. Settings tend to be diffused in various places, and it is generally difficult to find where. This is usually given as a strength of the registry, but "everything in one place" generally devolves to "Everything put somewhere in one huge place".
Too broad. It's easy to think of it as just a place for user settings, but in fact the registry becomes a dumping ground for everything. 90% of what's there is not designed for users to read or modify; it is in fact a database of the serialised form of various structures used by programs that want to persist information. This includes things like the entire COM registration system, installed apps, etc. Now, this is stuff that needs to be stored, but the fact that it's mixed in with things like user-configurable settings and stuff you might want to read dramatically lowers its value.

Distributing a bundle of files across an extranet

I want to be able to distribute bundles of files, about 500 MB per bundle, to all machines on a corporate "extranet" (which is basically a few LANs connected using various private mechanisms, including leased lines and VPN).
The total number of hosts is roughly 100, and the goal is to get a copy of the bundle from one host onto all the other hosts reliably, quickly, and efficiently. One important issue is that some hosts are grouped together on single fast LANs in which case the network I/O should be done once from one group to the next and then within each group between all the peers. This is as opposed to a strict central server system where multiple hosts might each fetch the same bundle over a slow link, rather than once via the slow link and then between each other quickly.
A new bundle will be produced every few days, and occasionally old bundles will be deleted (but that problem can be solved separately).
The machines in question happen to run recent Linuxes, but bonus points will go to solutions which are at least somewhat cross-platform (in which case the bundle might differ per platform but maybe the same mechanism can be used).
That's pretty much it. I'm not opposed to writing some code to handle this, but it would be preferable if it were one of bash, Python, Ruby, Lua, C, or C++.
I think all these problems have been solved by modern research into p2p networking and packaged up in nice forms. A bit of scripting and BitTorrent should solve these problems. Torrent clients exist for all modern OSs; a script on each machine can check a location for a new torrent file, start the download, and delete the old bundle once the download has finished.
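A sketch of the "watch for a new torrent and fetch it" part (the drop directory layout and the use of aria2c as the command-line client are assumptions; any torrent client with a CLI would do, and in practice you would keep the client running so each host continues seeding to its LAN peers):

```
// fetch-latest-bundle.ts -- poll a shared drop directory for new .torrent files.
import { readdirSync, existsSync, mkdirSync, copyFileSync } from "node:fs";
import { execSync } from "node:child_process";

const dropDir = "/srv/bundles/torrents";   // placeholder shared location
const dataDir = "/srv/bundles/data";       // placeholder download target
const seenDir = "/srv/bundles/seen";       // markers for torrents already fetched

for (const d of [dataDir, seenDir]) if (!existsSync(d)) mkdirSync(d, { recursive: true });

for (const name of readdirSync(dropDir)) {
  if (!name.endsWith(".torrent") || existsSync(`${seenDir}/${name}`)) continue;
  // Download the bundle via the torrent client.
  execSync(`aria2c --dir=${dataDir} ${dropDir}/${name}`, { stdio: "inherit" });
  copyFileSync(`${dropDir}/${name}`, `${seenDir}/${name}`);   // mark as fetched
}
```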
What about rsync?
I'm going to suggest you use compie's idea of rsync to copy the files, in which case you can use a scripting language of your choice.
On the propagating system you will need a script containing some representation of the hosts and a matrix of link speeds between them. You then need to calculate a minimum spanning tree from that information. From that, you can send messages to the systems to which you intend to propagate, detailing the MST and the bundle to fetch, whereupon each script/daemon begins the transfer. That host then contacts the hosts over the fastest links...
You could implement it in bash, though Python might be better, or a custom C daemon.
When you update the network you'll need to update the matrix based on the latest information.
See: Prim's Algorithm.
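For the minimum spanning tree itself, a naive Prim's implementation over a host/cost matrix is only a few lines. A sketch (hypothetical host names; the cost function here just says "slow between LANs, fast within a LAN"):

```
// prim-mst.ts -- compute the propagation tree over weighted host links.
function primMST(hosts: string[], cost: (a: string, b: string) => number): [string, string][] {
  const inTree = new Set<string>([hosts[0]]);     // start from the origin host
  const edges: [string, string][] = [];
  while (inTree.size < hosts.length) {
    let best: [string, string] | null = null;
    let bestCost = Infinity;
    for (const a of inTree) {
      for (const b of hosts) {
        if (!inTree.has(b) && cost(a, b) < bestCost) {
          bestCost = cost(a, b);
          best = [a, b];
        }
      }
    }
    if (!best) break;             // disconnected graph
    inTree.add(best[1]);
    edges.push(best);             // best[0] will send the bundle to best[1]
  }
  return edges;
}

// Example: two LANs joined by one slow link; hosts on a LAN share a name prefix.
const hosts = ["a1", "a2", "b1", "b2"];
const cost = (x: string, y: string) => (x[0] === y[0] ? 1 : 100);
console.log(primMST(hosts, cost));
// The slow inter-LAN link appears only once in the resulting tree.
```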
