Varnish - WebTrends statistics

We currently get web analytics for a WordPress site using WebTrends.
If we use a caching mechanism like Varnish, I would assume WebTrends would suddenly report a dramatic reduction in traffic.
Is this correct and, if so, can you avoid this problem and get the correct statistics reported by WebTrends?

In my experience, acceleration caches shouldn't interfere with your Analytics data capture, because the cached content should include all of the on-page data points (such as meta tags) as well as the WT base tag file, which the user's browser will then execute and which will then make the call to the WT data collection server.
By way of a disclaimer, I should add that I haven't got any specific experience with Varnish, but a cache that acts as a barrier to on-page JavaScript executing is basically broken, and I've personally never had a problem with one preventing analytics software from running.
The only conceivable problem I could foresee is if a cache went to the extent of scanning pages for linked resources (such as the "no javascript" image in the noscript tag), acquiring those resources in advance, and then rewriting the page being served to pull those resources from the cache rather than the third-party servers. In that case you might end up with spurious "no javascript" records in your data.

Just make sure that your Varnish config is not removing any WebTrends cookies and it should be perfectly OK. By default it does not, but if you use a ready-made WordPress VCL you may need to exclude these cookies along with the WordPress-specific ones in the configuration.

Related

URL Fingerprinting / Aggressive Caching with NGINX + Express

What is the recommended technique for handling aggressive caching and URL fingerprinting in an NGINX (proxy) and Node / Express stack?
Google recommends to "use fingerprinting to dynamically enable caching" in their best-practice guidelines, and this is exactly what I'm trying to achieve.
I've looked at quite a few different approaches to fingerprinting, but I'm struggling to understand under what scenario each will actually generate a new fingerprint and where in the development pipeline it is best placed. I had previously assumed that if 'Last-Modified' changes on the file then the server will generate another fingerprint, but that doesn't seem to be the case yet (unless I've misconfigured something).
Here are a few different approaches:
Runtime Fingerprinting
dactyloscope
static-asset
Build Fingerprinting
asset-rack
node-version-assets
CI Fingerprinting
grunt-fingerprint
grunt-asset-versioning
So a couple of questions I hope someone can answer:
Is fingerprinting even a requirement with ETags in place or are there too many holes in cross-browser support?
Assets should arguably sit on a CDN so is this problem largely deferred to a CDN provider (if so how do you update references without manual involvement)?
How does a new fingerprint get generated without manual cache clear?
What is the suggestion on where this fingerprinting will sit in the developer pipeline? I want to avoid a dependency on the likes of Grunt.js
I feel like I'm missing something blindingly obvious so if you can answer just one of these questions I'd be really grateful.
Fingerprinting and Etags are separate features for reducing load times.
Etags avoid having to resend an asset if the browser has cached it and the asset has not changed. But, a separate HTTP roundtrip is still required for the browser to send an If-None-Match and get back 304 Not Modified.
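To make that round trip concrete, here is a minimal sketch (the route name, asset content, and hash choice are illustrative, not from the question) of a server comparing If-None-Match against a content hash; note that Express already does something equivalent automatically for normal responses:

```typescript
// Minimal sketch of the ETag round trip described above (hypothetical route and asset).
import express from "express";
import { createHash } from "crypto";

const app = express();
const cssBundle = "body { margin: 0; }"; // stand-in for a real asset

app.get("/styles.css", (req, res) => {
  const etag = `"${createHash("sha1").update(cssBundle).digest("hex")}"`;
  if (req.headers["if-none-match"] === etag) {
    // Asset unchanged: no body is sent, but the round trip still happened.
    res.status(304).end();
    return;
  }
  res.setHeader("ETag", etag);
  res.type("text/css").send(cssBundle);
});

app.listen(3000);
```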
The best way of speeding up an HTTP roundtrip is to avoid making one at all. When the second page of a website uses the same assets as the first page, and those assets have far future cache expires headers, then there is no need to even make a single round trip for those assets when they are requested after the first time.
Fingerprinting is the technique of giving each asset a unique name that is derived from its content. Then, when even one bit in an asset (such as a CSS bundle) changes, its name changes, and so a browser will GET the updated asset. And, because fingerprinting uses a cryptographic hash of the contents, the unique name is calculated the same across multiple servers, as long as the asset is identical. Caches everywhere (CDNs, at ISPs, in networking equipment, or in web browsers) are able to keep a copy of each asset, but since the HTML references the unique name of each asset, only the correct version of that asset will ever be served from a cache.
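As a minimal sketch of that idea (the file paths and hash length are arbitrary, and this is not any particular library's API), a fingerprint is just a content hash baked into the file name:

```typescript
// Sketch: derive an asset's name from a hash of its content.
import { createHash } from "crypto";
import { readFileSync, copyFileSync } from "fs";

function fingerprint(path: string): string {
  const hash = createHash("md5").update(readFileSync(path)).digest("hex").slice(0, 12);
  const fingerprinted = path.replace(/(\.\w+)$/, `-${hash}$1`); // e.g. app.css -> app-3f6c1b2a9d0e.css
  copyFileSync(path, fingerprinted);
  return fingerprinted; // substitute this name into the HTML at build or render time
}

// The same content always yields the same name on any server; any change yields a new one.
console.log(fingerprint("public/app.css"));
```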
Both Etags and fingerprinting are supported by every browser.
Fingerprinting is not required, it is an optimization. If you are using technologies like Stylus, Browserify, and Angular template caches that already require a build step, then adding fingerprints is essentially free.
Your HTML pages keep names like /aboutus instead of the /aboutus-sfghjs3646dhs73shwsbby3 they would get with fingerprinting. All the solutions you link to support fingerprinting of JavaScript, CSS, and images, and a way to dynamically substitute the fingerprinted name into the HTML. So the HTML will reference /css-hs6hd73ydhs7d7shsh7w until you change a byte in the CSS, and then it will reference /css-37r7dhsh373hd73 (a different file).
Fingerprints only need to be generated when a file is modified, which generally happens at build time or on server restart.
I recommend Asset Rack, which supports lots of asset types, and can serve the fingerprinted assets from RAM or push them to a CDN. It generates all fingerprints each time Express is started up.

I need to speed up my site and reduce the number of file calls

My web host is asking me to speed up my site and reduce the number of file calls.
OK, let me explain a little. My website is used about 95% of the time as a bridge between my database (on the same hosting) and my Android applications (I have around 30 that need information from my DB). The information only goes one way (for now): the app calls a JSON string like this one on the site:
http://www.guiasitio.com/mantenimiento/applinks/prlinks.php
and this webpage is shown in a WebView as a welcome message:
http://www.guiasitio.com/movilapp/test.php
This page has some images and jQuery, so I think these are what cause most of the load. They have told me to use some code to make the visitor's browser cache those files (that is all Greek to me since I don't understand it). Can someone give me an idea or point me to a tutorial on how to get this done? Can the WebView in an Android app keep a cache of these files?
All your help is highly appreciated. Thanks
Using a CDN or content delivery network would be an easy solution if it worked well for you. Essentially you are off-loading the work of storing and serving static files (mainly images and CSS files) to another server. In addition to reducing the load on your current server, it will speed up your site because files will be served from a location closest to each site visitor.
There are many good CDN choices. Amazon CloudFront is one popular option, though in my opinion the prize for the easiest service to set up goes to CloudFlare ... they offer a free plan; simply fill in the details, change the DNS settings on your domain to point to CloudFlare, and you will be up and running.
With some fine-tuning, you can expect to reduce the requests on your server by up to 80%
I use both Amazon and CloudFlare, with good results. I have found that the main thing to be cautious of is to carefully check all the scripts on your site and make sure they are working as expected. CloudFlare has a simple setting where you can specify the cache settings as well, so there's another detail on your list covered.
Good luck!
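Since the question asks what that browser-caching "code" looks like: the work is done by ordinary HTTP caching headers (Cache-Control / Expires / ETag). The sketch below uses Node/Express purely as an illustration - the site in the question is PHP, where the same headers can be set with header() or in the web server config - and the paths are placeholders:

```typescript
// Sketch only: the response headers that make a browser (or an Android WebView) cache static files.
import express from "express";

const app = express();

// Serve images/CSS/JS with a long max-age so repeat visits hit the browser cache.
app.use(
  "/static",
  express.static("public", {
    maxAge: "30d",      // sends Cache-Control: public, max-age=2592000
    etag: true,         // allows cheap 304 revalidation once the cache expires
    lastModified: true,
  })
);

app.listen(8080);
```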

How much does a single request to the server cost

I was wondering how much you gain by putting all of your CSS, scripts, and other downloadable assets into one file.
I know that you gain a lot by using sprites, but at some point it might actually hurt to do that.
For example, my website uses a lot of small icons and most of the pages have different icons. After combining all those icons together I might get over 500 KB in total, but if I make one sprite per page it is reduced to almost 50 KB per page, so that's cool.
But what about JS/CSS scripts: how much would I gain by making a script for each page of just over ~100 lines? Or maybe I wouldn't gain anything at all?
Basically, I want to know how much a single request to download a file costs, and whether it is really bad to have many script/image files with today's modern browsers and high-speed connections.
EDIT
Thank you all for your answers; it was hard to choose just one because every answer did address my question. I chose to reward the one that, in my opinion, answered my question about request cost most directly. I will not mark any single answer as correct because all of them were.
Multiple requests means more latency, so that will often make a difference. Exactly how costly that is will depend on the size of the response, the performance of the server, where in the world it's hosted, whether it's been cached, etc... To get real measurements you should experiment with your real world examples.
I often use PageSpeed, and generally follow the documented best practices: https://developers.google.com/speed/docs/insights/about.
To try answering your final question directly: additional requests will cost more. It's not necessarily "really bad" to have many files, but it's generally a good idea to combine content into a single file when you can.
Your question isn't really answerable in a generic way.
There are a few reasons to combine scripts and stylesheets.
Browsers using HTTP/1.1 will open multiple connections, typically 2-4 for every host. Because almost every site has the actual HTML file and at least one other resource like a stylesheet, script or image, these connections are created right when you load the initial URL like index.html.
TCP connections are costly. That's why browsers open multiple connections right away, ahead of time.
Connections are usually limited to a small number and each connection can only transfer one file at a time.
That said, you could split your files across multiple hosts (e.g. an additional static.example.com), which increases the number of hosts / connections and can speed up the download. On the other hand, this brings additional overhead, because of more connections and additional DNS lookups.
On the other hand, there are valid reasons to leave your files split.
The most important one is HTTP/2. HTTP/2 uses only a single connection and multiplexes all file downloads over that connection. There are multiple demos online that demonstrate this, e.g. http://www.http2demo.io/
If you leave your files split, they can also be cached separately. If you have just small parts changing, the browser could just reload the changed file and all others would be answered using 304 Not Modified. You should have appropriate caching headers in place of course.
That said, if you have the resources, you could serve all your files separately using HTTP/2 for clients that support it. If you have a lot of older clients, you could fallback to combined files for them when they make requests using HTTP/1.1.
Tricky question :)
Of course, the trivial answer is that more requests take more time, but it is not necessarily that simple.
Browsers open multiple HTTP connections to the same host, see http://sgdev-blog.blogspot.hu/2014/01/maximum-concurrent-connection-to-same.html. Because of that, downloading one huge file rather than using parallel downloads is considered a performance bottleneck by http://www.sitepoint.com/seven-mistakes-that-make-websites-slow/
Web servers should use gzip content-encoding whenever possible, so text resources such as HTML, JS, and CSS compress down considerably.
Most of those assets are static content, so a standard web server should use ETag caching on them. That means the next download will be something like 26 bytes, since the server says "not changed" instead of sending the 32 KB of JavaScript over again.
Because of the ETag cache, the whole web site should be cacheable (I assume you're programming a game or something like that, not some old-school J2EE servlet page).
I would suggest making 2-4 big files and downloading those, if you really want to go for big files.
So to put it together:
If you have only static content, then it is all the same, because ETag caching will short-circuit any real download from the server; the server returns a 304 Not Modified answer.
If you have some generated dynamic content (such as servlet pages), keep the JS and CSS separate, as they can be ETag-cached separately and only the servlet page needs to be downloaded.
Check that your server supports gzip content encoding for compression; this helps a lot :)
If you have multiple pieces of dynamic content (such as multiple dynamically changing images), it makes sense to split them into 2-4 separate images to use the parallel HTTP connections for download (although I can hardly imagine this use case in real life).
Please ensure that you're not serving static content dynamically. I.e. try to load the image in a web browser, open the network traffic view, reload with F5, and check that you get 304 Not Modified from the server instead of 200 OK and real traffic.
The biggest performance optimization is that you don't pull anything from the server, and it comes out of the box if used properly :)
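To make that last check reproducible outside the browser, here is a small sketch (the URL is a placeholder) that issues the same conditional request a browser with a warm cache would send:

```typescript
// Sketch: verify that a static asset is revalidated with 304 Not Modified.
// The URL is a placeholder; point it at one of your own images (Node 18+ for global fetch).
async function checkRevalidation(url: string): Promise<void> {
  const first = await fetch(url);
  const etag = first.headers.get("etag");
  const lastModified = first.headers.get("last-modified");
  console.log(`first request: ${first.status}, etag=${etag}, last-modified=${lastModified}`);

  // Repeat the request the way a browser with a primed cache would.
  const headers: Record<string, string> = {};
  if (etag) headers["If-None-Match"] = etag;
  if (lastModified) headers["If-Modified-Since"] = lastModified;

  const second = await fetch(url, { headers });
  console.log(`second request: ${second.status}`); // expect 304, not 200
}

checkRevalidation("https://example.com/images/logo.png");
```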
I think #DigitalDan has the best answer.
But the question hides the real one: how do I make my page load faster? Or at least, APPEAR to load faster...
I would add something about "above the fold": basically you want to inline as much as will allow your page to render the main visible content on the first round trip, since that is what the user perceives as fastest, and make sure nothing else on the page blocks that...
Archibald explains it well:
https://www.youtube.com/watch?v=EVEiIlJSx_Y
How much you gain from any of these approaches might vary based on your specific needs, but I will talk about my case: in my web application we don't combine all files. Instead, we have two types of files: common files that are needed globally across our application, and per-page files that are used only where needed. Here is why.
Above is a request-analysis chart for my web application; what you need to consider is this:
DNS lookup happens only once, as it is cached after that (and the DNS name might already be cached anyway).
On each request we have:
request start + initial connection + SSL negotiation + time to first byte + content download
The main factor that takes the majority of the request time in most cases is the content download size. So if I have multiple files that are all needed on every page, I combine them into one file to save the connection overhead; on the other hand, if I have files needed only on specific pages, I keep them separate to save the content download time on the other pages.
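A rough sketch of that split, with made-up file names: concatenate the globally needed files into a single bundle at build time and leave the page-specific files alone:

```typescript
// Sketch: build-time concatenation of the "common" files into a single bundle,
// while page-specific files stay separate. File names here are made up.
import { readFileSync, writeFileSync } from "fs";

const commonFiles = ["js/jquery.js", "js/analytics.js", "js/layout.js"]; // needed on every page
const bundle = commonFiles.map((f) => readFileSync(f, "utf8")).join("\n;\n");

writeFileSync("dist/common.bundle.js", bundle); // one request instead of three
// Page-specific scripts (e.g. js/checkout.js) are left as-is, so other pages never download them.
```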
This is actually a very relevant question (topic) that many web developers face.
I would also like to add my answer alongside the other contributors to this question.
Introduction before going to the answer
High-performance web sites depend on different factors; here are some considerations:
Website size
Content type of the website (primarily text, images, video, or a mixture)
Traffic on your website (how many people visit your website on average)
Web-host location vs. your primary visitor location (within your country, region, or worldwide); it matters a lot if you have a website for Europe and your host is in the US.
Web-host server (hardware) technology; I prefer SSD disks.
How the web server (software) is set up and optimized
Is it a dynamic or static web site
If dynamic, how your code and database are structured and designed
By defining your need you might be able to find the proper strategy.
Regarding your question in general
As regards your website, I recommend you look at Steve Souders' 14 recommendations in his book High Performance Web Sites.
Steve Souders' 14 rules:
Make fewer HTTP requests
Use a Content Delivery Network (CDN)
Add an Expires Header
Gzip Components
Put Style-sheets at the Top
Put Scripts at the Bottom
Avoid CSS Expressions
Make JavaScript and CSS External if possible
Reduce DNS Lookups
Minify JavaScript
Avoid Redirects
Remove Duplicate Scripts
Configure ETags
Make Ajax Cacheable
Regarding your question
So if we take JS/CSS into consideration, the following will help a lot:
It is better to have different code in different files.
Example: you might have page1, page2, page3 and page4.
Page1 and page2 use js1 and js2
Page3 uses only js3
Page4 uses all of js1, js2 and js3
So it will be a good idea to have the JavaScript in 3 files. You are not interested in including everything you have on pages that do not use it.
CSS Sprites
CSS at top and JS at the end
Minifying JavaScript
Put your JavaScript and CSS in external files
CDN: in case you use jQuery, for example, do not host it on your website; just use the recommended CDN address.
Conclusion
I am pretty sure there are more details to write, and not all the advice is necessary to implement, but it is important to be aware of it. As I mentioned before, I suggest reading this tiny book; it gives you more details. And finally, there is no perfect final solution. You need to start somewhere, do your best, and improve it. Nothing is permanent.
Good luck.
The answer to your question is: it really depends.
The ultimate goal of page-load optimization is to make your users feel that your page loads fast.
Some suggestions:
Do not merge common library JS/CSS files like jQuery, because they might already be cached by the browser from visits to other sites, so you don't even need to download them;
Merge resources, but at least separate the resources required for the first screen from the others, because the earlier the user can see something meaningful, the faster your page feels;
If several of your pages share some resources, separate the merged files into shared resources and page-specific resources, so that when a user visits the second page the shared ones may already be cached by the browser and the page loads faster;
A user might be on a phone with a slow or inconsistent 3G/4G network, so even 50 KB of data or 2 extra requests makes a very noticeable difference;
It is really bad to have a lot of 100-line files, and it is also really bad to have just one or two big files, for each type (CSS/JS/markup).
Desktops mostly have high-speed connections, while mobile also suffers from high latency.
Taking all the theory on this topic into account, I think the best approach should be more practical, less exact, and based on actual connection speeds and device types from a statistical point of view.
For example, I think this is the best way to go today:
1) Put all the stuff needed to show the first page/functionality to the user in one file, which should be under 100 KB - this is absolutely a requirement.
2) After that, split or group the files into sizes such that the latency is no longer noticeable alongside the download time.
To make it simple and concrete, if we assume a time to first byte of around ~200 ms, the size of each file should be between ~120 KB and ~200 KB, which is good for most connections today, on average.

Understanding Azure Caching Service

By caching we basically mean replicating data for faster access. For example:
Store frequently used data from the DB in memory.
Store static contents of a web page in the client browser.
Cloud hosting already uses the closest data center (CDN) to serve content to the user. My question is, how does the Caching Service make it faster?
CDN is used to improve the delivery performance between your service datacenter and your customer, by introducing a transparent proxy datacenter that is nearer your customer. The CDN typically is set up to cache - such that requests from different customers can be serviced by the same "CDN answer" without calling the origin service datacenter. This configuration is predominantly used to offload requests for shared assets such as jpegs, javascript etc.
Azure Caching Service is employed behind your service, within your service datacenter. Unlike the built-in ASP.NET cache, Azure Cache runs as a separate service, and can be shared between servers/services. Generally your service would use this to store cross-session or expensive-to-create information - e.g. query results from a database. You're trading:
value of memory to cache the item (time/money)
cost (time/money) of creation of the item
number of times you'd expect to reuse the item.
"freshness" of information
For example you might use the memory cache to reduce the number of times that you query Azure Table, because you expect to reuse the same information multiple times, the latency to perform the query is high, and you can live with information potentially being "stale". Doing so would save you money and improve the overall performance of your system.
You'd typically "layer" the out-of-process Azure Cache with an on-machine/in-process cache, such that for frequent queries you pull information as follows (see the sketch after this list):
best - look first in local/on-box cache
better - look in off-box Azure Service Cache, then load local cache with result
good - make a call/query to expensive resource, load Azure Cache and local cache with result
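A minimal sketch of that layering is below; the shared-cache interface and the query function are stand-ins, not the real Azure Cache API:

```typescript
// Sketch of the "local cache -> shared cache -> expensive query" layering above.
// `SharedCache` stands in for a distributed cache client; its signatures are assumptions.
interface SharedCache {
  get(key: string): Promise<string | null>;
  set(key: string, value: string, ttlSeconds: number): Promise<void>;
}

const localCache = new Map<string, string>(); // in-process, per-machine

async function getWithLayeredCache(
  key: string,
  sharedCache: SharedCache,
  queryExpensiveResource: (key: string) => Promise<string>
): Promise<string> {
  // Best: on-box cache, no network hop at all.
  const local = localCache.get(key);
  if (local !== undefined) return local;

  // Better: off-box shared cache, then prime the local cache.
  const shared = await sharedCache.get(key);
  if (shared !== null) {
    localCache.set(key, shared);
    return shared;
  }

  // Good: hit the expensive resource, then prime both caches.
  const fresh = await queryExpensiveResource(key);
  await sharedCache.set(key, fresh, 300); // tolerate up to 5 minutes of staleness
  localCache.set(key, fresh);
  return fresh;
}
```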
Before saying anything I wanted to point you to this (very similar discussion):
Is it better to use Cache or CDN?
Having said that, this is how CDN and Caching can improve your website's performance.
CDN: This service helps you stay "close" to your end user. With a CDN, your website's content will be spread over a system of servers, each in its own location. Every server will hold a redundant copy of your site. When accessed by a visitor, the CDN system will identify his/her location and serve the content from the closest server (also called a POP or proxy).
For example: when visited from Australia you'll be served by an Australian server; when visited from the US you'll be served by a US server, etc...
A CDN will be most useful if your website operates outside of its immediate locale.
(i.e. a CDN will not help you if your website promotes a local locksmith service that only has visitors from your city, as long as your original servers are sitting nearby...)
Also, the overall coverage is unimportant.
You just need to make sure that the network covers all locations relevant to your day-to-day operations.
Cache: Provides faster access to your static and/or commonly used content objects. For example, if you have an image on your home page, and that image is downloaded again and again (and again) by all visitors, you should cache it, so that a returning visitor will already have it stored on his/her PC (in the browser cache). This will save time, because local resources load faster, and it also saves you bandwidth - because the image will load from the visitor's computer and not from your server.
CDN and caching are often combined, because this setup allows you to store the cache on the CDN network.
This dual setup can also help improve caching efficiency - for example, it can help with dynamic caching by introducing smart algorithms into the "top" CDN layer.
Here is more information about Dynamic Caching (also good introduction to HTTP Caching directives)
As you might already know from reading the above-mentioned post, neither method alone is better; they are at their best when combined.
Hope this answers it
GL

My Windows Azure MVC3 application is slow. How can I see what's wrong?

I have deployed my Windows Azure application to the cloud. Now that it's running it seems to be slow. Some pages take up to three seconds to return, and all the lookups are to table storage with direct key lookups.
It may not be very significant, but when I check with Fiddler I see that all of my web requests result in status code 200, even those for the CSS. Is this expected? I thought the CSS would be cached.
Getting back to the original question: when performance is slow, is there a way I can work out why? I have already set the solution configuration to "Release". What more can I do?
Any tips / help would be much appreciated.
For investigating the problems in production, you could try using StackOverflow's profiler to work out where the slowness is occurring - http://code.google.com/p/mvc-mini-profiler/
For looking at how to encourage browsers to use cached content for css, js and images, I think you can just use web.config files in subfolders - see IIS7 Cache-Control - and you should also be able to setup gzip compression.
You can try http://getglimpse.com
Seems promising
Put your files in the Azure storage and provide cache instructions:
Add Cache-Control and Expires headers to Azure Storage Blobs
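For reference, a small sketch of setting that header with the current JavaScript storage SDK (@azure/storage-blob); the container/blob names and connection string are placeholders, and this SDK is newer than the article linked above:

```typescript
// Sketch: set Cache-Control on an existing blob with @azure/storage-blob.
import { BlobServiceClient } from "@azure/storage-blob";

async function setBlobCaching(): Promise<void> {
  const service = BlobServiceClient.fromConnectionString(
    process.env.AZURE_STORAGE_CONNECTION_STRING!
  );
  const blob = service.getContainerClient("static").getBlockBlobClient("css/site.css");

  // setHTTPHeaders replaces the blob's HTTP headers, so re-state the content type too.
  await blob.setHTTPHeaders({
    blobCacheControl: "public, max-age=31536000",
    blobContentType: "text/css",
  });
}

setBlobCaching().catch(console.error);
```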
If you want to do it from IIS, provide the proper HTTP caching instructions to the browser.
Best practices for speeding up your website.
Anyway, you have to provide more details about what you are doing. Are you using Session? How many queries does each page launch?
The fact that on your computer, with just one client (you), it goes fast doesn't mean the application is fast. You have to test with lots of users in order to ensure there are no bottlenecks, contention locks, busy resources, etc.
