Storing quick analytics using Redis and Node.js

I am new to Redis and would like to store web analytics for my site, both globally and per user activity.
Below is what I have so far.
var redis = require('redis');
var client = redis.createClient();
// `ip` below is the visitor's address for the current request
// to get all unique IPs
client.sadd('visitors', ip);
// to record hits per IP
client.hincrby('hits', ip, 1);
The above works fine so far: I get the number of distinct IPs and a hit counter per IP.
The problem comes when storing the activities made by each IP, i.e. the links they clicked and the searches they ran, along with a datetime.
Can someone please shed some light on how best to manage this?
Thanks

The problem comes when storing the activities made by each IP
You will need a separate structure for storing these.
The simplest rational structure is to have a "list of actions by session". Take a look at the sorted set commands, which provide a basic framework for building a list of actions within a session.
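For example (the key names here are just illustrative), a sorted set per IP, scored by timestamp, gives you a time-ordered activity trail you can range over later:
// record one activity for this IP, scored by timestamp so it stays in time order
var entry = JSON.stringify({ type: 'click', url: '/some/link', at: Date.now() });
client.zadd('activity:' + ip, Date.now(), entry);

// later: fetch everything this IP did between two timestamps (milliseconds)
client.zrangebyscore('activity:' + ip, startMs, endMs, function (err, actions) {
  // actions is an array of JSON strings, oldest first
});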
This will get you something quickly. However, it is probably not what you really want; in fact, Redis is probably not useful for this at all.
If you want to re-trace an entire site visit you really want to connect to some sort of true analytics framework. There are dozens of website tracking tools that provide this type of functionality, so it's not really clear that building one is very efficient.

Related

Hybrid App Development, Database-Driven Content

I've been doing a lot of research, and perhaps just need a few dots connected.
I have an idea for a mobile app/website that contains lists of local eating/drinking establishments along with the deals/specials they offer each day. The idea is to create an app that people can refer to in order to save money on a night out.
I'm familiar enough with HTML/CSS/JS to create a functioning website, but when it comes to backend I'm a little confused. Editing the markup in order to reflect changes (e.g. a new deal starts or new establishment opens up) is a bit cumbersome. Now I know I want a database with my information ready to be displayed on my page. Does this mean that I need to develop my own API for everything, and then make sure it integrates with the hosting website that I end up choosing?
I feel like I'm missing something that should make it obvious what the next step is. Can anyone offer any advice?
The short answer is yes, you are exactly right.
The long answer is that that is definitely one way to do it. But for large projects, just using plain JS on the client end can get quite cumbersome. Usually the first level would be using something like AJAX. It's a great way to start and you can go a long way with just AJAX; this is actually where most people "start" when using plain JavaScript to make API calls. The next level would be to use a framework like Angular. That will of course do more for you than just help handle API calls, and it requires a larger investment in learning.
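As a rough illustration (the endpoint and field names below are made up), a plain AJAX request from your page could look like this:
// ask your (hypothetical) API for today's deals and log them
var xhr = new XMLHttpRequest();
xhr.open('GET', '/api/deals?day=friday');
xhr.onload = function () {
  var deals = JSON.parse(xhr.responseText);
  deals.forEach(function (deal) {
    console.log(deal.venue + ': ' + deal.description);
  });
};
xhr.send();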
So that is all client side...
Now for the server-side part... When you publish a website you are dealing with "server-side" content. Your static content is served up from the server, but it is always the same static content; it only becomes dynamic on the client once all the JavaScript gets parsed.
The API is another server-side component. But instead of being static like your pages (a bunch of files just sitting there), it is an actual application running on the server. It takes a command via an API request, does its thinking, and then dynamically spits out a response object to the requester, which in this case will be the JS on your site.
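A minimal sketch of that server-side piece, using Node.js purely as an example backend and nothing but its built-in http module (the route and the data are placeholders; a real app would read from your database):
// tiny API sketch: answers /api/deals with a JSON array
var http = require('http');

var deals = [{ venue: 'Example Pub', description: '2-for-1 burgers on Fridays' }]; // placeholder data

http.createServer(function (req, res) {
  if (req.url.indexOf('/api/deals') === 0) {
    res.writeHead(200, { 'Content-Type': 'application/json' });
    res.end(JSON.stringify(deals));
  } else {
    res.writeHead(404);
    res.end();
  }
}).listen(3000);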
Now, if you don't like the idea of learning to make your own API, there are services out there that will host an API for you and give you a GUI to build it. I can't recommend one because I have never used one, but I do work with businesses that do, and they love the fact that they don't have to hire a dev to make their APIs. The downside is that they are tied to that service and limited to the functionality it offers. It's not a big limitation, as these services are quite powerful, but if you are going to be managing complex data sets it would probably be better to learn to make your own API.
Hope that clears things up a bit for you!

What server should I use if I am expecting over 20k visits/day?

I recently launched a fantasy football online game for the English Premier League called Myfpl11.com, and I want to know which server I should choose if I am expecting 20k visits a day. My visits are going up and I want the site to keep performing smoothly. Please help.
This is probably the wrong area of StackExchange to ask this sort of question. However ...
The first thing you should do is get prepared to scale horizontally instead of vertically. If you keep growing you will soon grow out of any single server that you purchase.
Instead, what you need to do is start looking at ways to modify your website to work across multiple systems. If you're experiencing load issues on the server you currently have, spin up another instance of the exact same size and move the database onto it. You will then have two servers: one dedicated to the database (which will really help it do its job) and one dedicated to serving traffic.
From there, look at how you can scale out to multiple web processes and databases, and add caching layers.
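As a small sketch of the "multiple web processes" idea, assuming your stack happens to be Node.js: the built-in cluster module forks one worker per CPU, and reading the database host from an environment variable (the variable name is just an example) is what lets the database move to its own box without code changes:
// fork one web worker per CPU; keep the DB address out of the code
var cluster = require('cluster');
var http = require('http');
var os = require('os');

var dbHost = process.env.DB_HOST || 'localhost'; // move the DB by changing the env var, not the code

if (cluster.isMaster) {
  os.cpus().forEach(function () { cluster.fork(); });
} else {
  http.createServer(function (req, res) {
    res.end('served by worker ' + process.pid + ', db at ' + dbHost);
  }).listen(8080);
}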
You can add cloudflare.com as your DNS service, which will help you by caching your assets better; most importantly, they will serve a technical-issues page to your users if your site does fall over at any stage. This is really helpful for saving face, because visitors get an actual page with a message instead of a white page that never finishes loading.
Look at using services like digitalocean.com or linode.com (both very affordable, with great staff), where you can easily add or remove resources as you need them.

Creating session mechanism with core nodejs

I am trying to create complete session management in Node.js for logins, chat sessions, etc.
I googled a lot, and every solution I found relied on some framework/module. I don't want to use any module/framework; I would rather build my own solution for this.
So this is the plan:
I will set a session cookie on the client machine (yet to figure out how).
For each cookie, I will maintain a unique ID in the database instead of in files, as is the case with PHP (I am using MongoDB).
When a user opens the application, a cookie will be set, an entry will be made in the database, and the corresponding information will be fetched from the DB (roughly like the sketch below).
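Something like this sketch is what I have in mind, using only the core http/crypto modules plus the mongodb driver (the collection and cookie names are placeholders, and the driver's connect callback differs between versions):
var http = require('http');
var crypto = require('crypto');
var MongoClient = require('mongodb').MongoClient;

MongoClient.connect('mongodb://localhost:27017/myapp', function (err, db) {
  http.createServer(function (req, res) {
    // naive cookie parsing, just enough for the sketch
    var cookies = {};
    (req.headers.cookie || '').split(';').forEach(function (pair) {
      var parts = pair.split('=');
      cookies[parts[0].trim()] = parts[1];
    });

    var sid = cookies.sid;
    if (!sid) {
      // no cookie yet: mint a random session id, set the cookie, record it in MongoDB
      sid = crypto.randomBytes(16).toString('hex');
      res.setHeader('Set-Cookie', 'sid=' + sid + '; HttpOnly');
      db.collection('sessions').insertOne({ _id: sid, createdAt: new Date() });
      res.end('new session ' + sid);
    } else {
      // cookie present: look the session up and use whatever is stored against it
      db.collection('sessions').findOne({ _id: sid }, function (err, session) {
        res.end(session ? 'welcome back, session ' + session._id : 'unknown session');
      });
    }
  }).listen(3000);
});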
I am yet to lay out a concrete plan for this. I wanted to know whether doing it this way is a good idea? I read somewhere... "Real men don't use any framework. They make everything on their own" :P
Please correct me if I am headed in the wrong direction. I'm just starting out with these things...
I'm not aware of any node.js frameworks that are closed-source. Just pick one that seems to do what you want to do, download it, and study the source code to see how the developer implemented it. Then come up with your (perceived) improvement on how they did it. You'll probably find that implementing session management involves a whole bunch of nitpicky details that were never obvious to you.
Ignore all the above advice if this is a school assignment where you're not allowed to look at related code. If that's the case, I pity you because you have an incompetent teacher.

Automate the export of Facebook Insights data

I'm looking for a way of programmatically exporting Facebook insights data for my pages, in a way that I can automate it. Specifically, I'd like to create a scheduled task that runs daily, and that can save a CSV or Excel file of a page's insights data using a Facebook API. I would then have an ETL job that puts that data into a database.
I checked out the OData service for Excel, which appears to be broken. Does anyone know of a way to programmatically automate the export of insights data for Facebook pages?
It's possible and not too complicated once you know how to access the insights.
Here is how I proceed:
Log the user in with the offline_access and read_insights permissions.
read_insights allows me to access the insights for all the pages and applications the user is an admin of.
offline_access gives me a permanent token that I can use to update the insights without having to wait for the user to log in.
Retrieve the list of pages and applications the user is an admin of, and store those in the database.
When I want to get the insights for a page or application, I don't query FQL, I query the Graph API: First I calculate how many queries to graph.facebook.com/[object_id]/insights are necessary, according to the date range chosen. Then I generate a query to use with the Batch API (http://developers.facebook.com/docs/reference/api/batch/). That allows me to get all the data for all the available insights, for all the days in the date range, in only one query.
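As an illustration of that batch step (the page id, token, and dates are placeholders, and the exact insights metrics returned are up to Facebook), a single batch request could look roughly like this:
var https = require('https');
var querystring = require('querystring');

// each batch entry is one Graph API call; here, one day of page insights per entry
var batch = [
  { method: 'GET', relative_url: 'PAGE_ID/insights?since=1388534400&until=1388620800' },
  { method: 'GET', relative_url: 'PAGE_ID/insights?since=1388620800&until=1388707200' }
];

var body = querystring.stringify({
  access_token: 'LONG_LIVED_TOKEN',
  batch: JSON.stringify(batch)
});

var req = https.request({
  hostname: 'graph.facebook.com',
  path: '/',
  method: 'POST',
  headers: { 'Content-Type': 'application/x-www-form-urlencoded' }
}, function (res) {
  var data = '';
  res.on('data', function (chunk) { data += chunk; });
  res.on('end', function () {
    // the response is an array with one entry per batch item, each wrapping that day's JSON
    console.log(JSON.parse(data).length + ' batch responses received');
  });
});
req.write(body);
req.end();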
I parse the rather large JSON object obtained (which can weigh a few MB, be aware of that) and store everything in the database.
Now that you have all the insights parsed and stored in database, you're just a few SQL queries away from manipulating the data the way you want, like displaying charts, or exporting in CSV or Excel format.
I have the code already made (and published as a temporarily free tool on www.social-insights.net), so exporting to excel would be quite fast and easy.
Let me know if I can help you with that.
It can be done before the weekend.
You would need to write something that uses the Insights part of the Facebook Graph API. I haven't seen something already written for this.
Check out http://megalytic.com. This is a service that exports FB Insights (along with Google Analytics, Twitter, and some others) to Excel.
A new tool is available: the Analytics Edge add-ins now have a Facebook connector that makes downloads a snap.
http://www.analyticsedge.com/facebook-connector/
There are a number of ways that you could do this. I would suggest your choice depends on two factors:
What is your level of coding skill?
How much data are you looking to move?
I can't answer 1 for you, but in your case you aren't moving that much data (in relative terms). I will still share three options of many.
HARD CODE IT
This would require a script that accesses Facebook's Graph API, plus a computer/server that runs that script automatically.
I primarily use AWS and would suggest launching an EC2 instance and scheduling it to run your script at set times. I haven't used AWS Data Pipeline, but I do know it is designed so that it can run a script automatically as well, supposedly with a little less server know-how.
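If you go that route, the moving parts are small: a script that turns the insights JSON into a CSV, plus a cron entry (or scheduled task) that runs it daily. A rough sketch, with the Graph API call omitted and the field names invented purely for illustration:
// daily_insights.js -- run from cron, e.g.  0 2 * * * node /path/to/daily_insights.js
var fs = require('fs');

// `rows` would come from your Graph API call; these fields are purely illustrative
var rows = [
  { date: '2014-01-01', metric: 'page_impressions', value: 1234 }
];

var csv = 'date,metric,value\n' + rows.map(function (r) {
  return [r.date, r.metric, r.value].join(',');
}).join('\n');

fs.writeFileSync('insights_' + rows[0].date + '.csv', csv);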
USE THIRD PARTY ADD-ON
There are a lot of people who have similar data needs, which has led to a number of easy-to-use tools. I use Supermetrics Free to run occasional audits and make sure that our tools are running properly. Supermetrics is fast and has a really easy interface for accessing Facebook's APIs and several others. I believe you can also schedule refreshes and updates with it.
USE THIRD PARTY FULL-SERVICE ETL
There are also several services and freelancers that can set this up for you with little to no work on your part, depending on where you want the data to end up. Stitch is a service I have worked with for FB ads. There might be better services, but it has fulfilled our needs so far.
MY SUGGESTION
You would probably be best served by using a third-party add-on like Supermetrics. It's fast and easy to use. The other methods might be more worth looking into if you had a lot more data to move, or needed it to be refreshed more often than daily.

Reverse Engineering data from Lucene/Solr Indexes

I am investigating whether it is feasible to deploy search servers to the cloud, and one of the questions I have revolves around data security. Currently all of our fields (except a few used for faceting) are indexed but not stored (except for the ID, which we use to retrieve the document after the search has completed).
If for some reason the servers in the cloud were compromised, would it be possible for that person to reverse engineer our data from the indexes, even without the fields being stored?
Depends on the security level you need and the sensitivity of the document content...
With the configuration you describe it wouldn't be possible to rebuild the original as a "clone"... BUT it would be possible to reverse enough information to gain a lot of knowledge about the content... depending on the context, this could be damaging...
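For example, if the TermsComponent happens to be enabled on a compromised Solr server, someone can enumerate the indexed terms of any field with a single request, no stored fields required (the host, core, and field names here are made up, and Node.js is used only to keep the examples in one language):
// list the most frequent indexed terms of a field via Solr's TermsComponent (if enabled)
var http = require('http');

http.get('http://solr-host:8983/solr/mycore/terms?terms.fl=body_text&terms.limit=1000&wt=json',
  function (res) {
    var data = '';
    res.on('data', function (chunk) { data += chunk; });
    res.on('end', function () {
      // the response lists term/frequency pairs pulled straight out of the index
      console.log(data);
    });
  });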
An important point:
If you use the cloud-based servers to build the index and they get compromised, THEN there would be no need for "reversing", depending on your configuration: at least for any document you index after the servers are compromised, because when building the index the document is sent over as-is (for example when using http://wiki.apache.org/solr/ExtractingRequestHandler)...
As Yahia says, it's possible to get some information. If you're really concerned about this, use an encrypted file system, as Amazon suggests.
