I have the snippet in the index.html, but do I need to include it in all other pages? I've tried to work around it, but nothing seems to work other than including it in the header of each HTML file.
For this to work, the Segment analytics.js snippet must be included on every page (not necessarily in the header, though that is the easiest place). The snippet uses cookies to manage the user's session and state, so if you want a unified picture of the user's actions across your site, it must be loaded on every page.
More details here:
https://segment.com/docs/libraries/analytics.js/quickstart/
In a system I have to maintain (didn't build it, just inherited it) we have a Foursquare implementation that hasn't been used in quite a while. Trying to revive it failed, because our page is now loaded via HTTPS, which it didn't used to be.
We are using the "Save to Foursquare" button as well as the API request to retrieve the number of Check-ins. I already switched all the JS includes and intent links from http to https and at least now it shows the number and the button correctly.
However, I can't click the button and checking the browser's console I found that it added a script tag to the head of this page which tries to access http://platform.foursquare.com/js/modules/widgets.asyncbundle.js. The browser obviously blocks this, because it's not using HTTPS.
The file we are explicitly loading is https://platform.foursquare.com/js/widgets.js. It seems to me like this script is not reacting correctly to HTTP vs. HTTPS. There is probably a very simple solution to this, so what am I missing?
I don't know if you've tried it yet but the foursquare website says this on the matter:
Change the source of the JavaScript file to https://platform-s.foursquare.com/js/widgets.js
Add {"secure":true} to the global configuration block (window.___fourSq)
The same link (see below) has all the different ways to call the Save To Foursquare function using its .saveTo() function.
https://developer.foursquare.com/overview/widgets
I hope this information and these links help! Cheers.
I'd like to know if it's possible to detect which website a user has come from and serve them different content based on that.
So if they've come from any other website on the internet and landed on my page, they will see my normal html and css page, but if they come from a specific website (this specific website would have also been developed by me so I have control over the code server-side and client-side) then I want them to see something slightly different.
It's a very small difference that I want them to see, and that's why I don't want to consider taking them to a different version of the website or a different page.
I'm also not sure whether this solution would be placed on the page they are coming from or on the page they are arriving on.
Hope that's clear. Thanks!
I would add a URL parameter like http://example.com?source=othersite. This way you can easily adjust the parameter and can use javascript to detect this and slightly alter your landing page.
Otherwise, you can use the HTTP referrer sent via the browser to detect where they came from, but you would need to tell us your back end technology to get an example of that, as it differs a bit.
In javascript, you can do something as easy as
// indexOf returns -1 when the substring is not found
if (window.location.href.indexOf('source=othersite') !== -1)
{
// alter DOM here
}
Or you can use a URL Parameter parser as suggested here: How to get the value from the GET parameters?
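As a rough sketch of the parameter approach, the standard URL API can read the value exactly instead of relying on a substring match. The parameter name source and the value othersite come from the example above; the helper name getSourceParam is made up for illustration.

```javascript
// Returns the value of the "source" query parameter, or null if it is absent.
// getSourceParam is a hypothetical helper name, not part of any library.
function getSourceParam(href) {
  return new URL(href).searchParams.get('source');
}

// In the browser you would pass window.location.href instead of a literal.
const source = getSourceParam('http://example.com/?source=othersite');
if (source === 'othersite') {
  // alter DOM here
}
```

An exact lookup avoids false positives such as ?resource=othersite, which a substring test would also match.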
What you want is the Referer: HTTP header. It will give the URL of the page which the user came from. Bear in mind that the Referer can easily be spoofed, so don't take it as a guarantee if security is an issue.
Browsers may disable the referer, though. Why not just use a URL parameter?
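If you do want to try the referrer on the client side, a minimal sketch could look like the following. It assumes the browser exposes the referrer through document.referrer; the helper name cameFromHost and the host othersite.example are placeholders.

```javascript
// Returns true when the referrer URL belongs to the given host.
// An empty or malformed referrer (for example when the browser
// withholds it) cannot be parsed and yields false.
function cameFromHost(referrer, expectedHost) {
  try {
    return new URL(referrer).hostname === expectedHost;
  } catch (err) {
    return false;
  }
}

// In the browser: cameFromHost(document.referrer, 'othersite.example')
```

Because the referrer can be missing or spoofed, this is only suitable for cosmetic changes like the one described in the question.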
I'm trying to write a userscript/Chrome extension to capture JSON data being sent while using a web service, so that I can reformat it and display a selected portion on the page. Currently the JSON is sent as the application loads (as I've observed from watching traffic with Fiddler 2). Is my only option to request the JSON again, or is capture possible? As I'm not providing a code example, even some guidance on what method or topic to research, or on whether I'm barking up the wrong tree, would be a welcome answer.
No easy way.
If it is for a specific site, you might look into intercepting and overwriting the part of the code that sends the request. For example, if it is sent on a button click, you can replace the existing click handler with your own implementation.
You can also try to make a proxy for XMLHttpRequest. Not sure if this is even possible; I have never seen a working example. You can look at some attempts here.
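One way such a proxy might be sketched is by wrapping the open and send methods on the prototype. This is an untested-in-production illustration, not a known working solution for any particular site; the function name interceptXhr and the _interceptedUrl property are invented here, and the constructor is passed in as a parameter so the idea can be tried with a stand-in class.

```javascript
// Wraps an XMLHttpRequest-like constructor so that every completed
// response is also handed to onResponse(url, responseText).
function interceptXhr(XhrCtor, onResponse) {
  const originalOpen = XhrCtor.prototype.open;
  const originalSend = XhrCtor.prototype.send;

  XhrCtor.prototype.open = function (method, url, ...rest) {
    this._interceptedUrl = url; // remember the URL for the callback
    // Forward only the arguments that were actually supplied.
    return originalOpen.call(this, method, url, ...rest);
  };

  XhrCtor.prototype.send = function (...args) {
    this.addEventListener('load', () => {
      onResponse(this._interceptedUrl, this.responseText);
    });
    return originalSend.apply(this, args);
  };
}

// In the page: interceptXhr(window.XMLHttpRequest, (url, body) => { ... });
```

Requests made through other mechanisms, such as fetch, would not pass through this wrapper.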
For all of these tasks you would probably need to run your JavaScript outside the sandboxed content script so that it can access the page's own variables, which means injecting a <script> tag containing your code directly into the page from the content script.
How can I check whether a certain page is being accessed by a crawler or by a script that fires continuous requests?
I need to make sure that the site is only being accessed from a web browser.
Thanks.
This question is a great place to start:
Detecting 'stealth' web-crawlers
Original post:
This would take a bit to engineer a solution.
I can think of three things to look for right off the bat:
One, the user agent. If the spider is Google or Bing or any other legitimate crawler, it will identify itself.
Two, if the spider is malicious, it will most likely emulate the headers of a normal browser. Fingerprint it: if it claims to be IE, use JavaScript to check for an ActiveX object.
Three, take note of what it's accessing and how regularly. If the content takes the average human X seconds to view, you can use that as a starting point when trying to determine whether it's humanly possible to consume the data that fast. This is tricky: since an IP can be shared by multiple users, you'll most likely have to rely on cookies.
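The third check could be sketched roughly as follows, assuming you already record request timestamps per visitor (keyed by a cookie, for instance). The function name looksAutomated, the five-request minimum, and the threshold value are illustrative assumptions, not recommended settings.

```javascript
// Flags a visitor whose average gap between requests is shorter than
// what a human could plausibly manage.
// timestampsMs: request times in milliseconds for ONE visitor.
// minHumanIntervalMs: the "X seconds" from the text, in milliseconds.
function looksAutomated(timestampsMs, minHumanIntervalMs, minSamples = 5) {
  if (timestampsMs.length < minSamples) {
    return false; // too little data to judge
  }
  const sorted = [...timestampsMs].sort((a, b) => a - b);
  let totalGap = 0;
  for (let i = 1; i < sorted.length; i++) {
    totalGap += sorted[i] - sorted[i - 1];
  }
  const averageGap = totalGap / (sorted.length - 1);
  return averageGap < minHumanIntervalMs;
}
```

An average is easy to fool with occasional long pauses, so in practice you would probably combine it with the other two signals.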
You can use the robots.txt file to block access to crawlers, or you can use JavaScript to detect the browser agent and switch based on that. If I understood correctly, the first option is more appropriate, so:
User-agent: *
Disallow: /
Save that as robots.txt at the site root, and well-behaved crawlers should stay away from your site. Keep in mind that robots.txt is only advisory, so a malicious bot can simply ignore it.
I had a similar issue in my web application because I created some bulky data in the database for each user that browsed into the site and the crawlers were provoking loads of useless data being created. However I didn't want to deny access to crawlers because I wanted my site indexed and found; I just wanted to avoid creating useless data and reduce the time taken to crawl.
I solved the problem the following ways:
First, I used the HttpBrowserCapabilities.Crawler property from the .NET Framework (available since 2.0), which indicates whether the browser is a search engine Web crawler. You can access it from anywhere in the code:
ASP.NET C# code behind:
bool isCrawler = HttpContext.Current.Request.Browser.Crawler;
ASP.NET HTML:
Is crawler? = <%=HttpContext.Current.Request.Browser.Crawler %>
ASP.NET Javascript:
<script type="text/javascript">
var isCrawler = <%=HttpContext.Current.Request.Browser.Crawler.ToString().ToLower() %>
</script>
The problem with this approach is that it is not 100% reliable against unidentified or masked crawlers, but it may be useful in your case.
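On a stack without HttpBrowserCapabilities, a comparable check can be approximated by matching the User-Agent string yourself. The sketch below is in JavaScript; the pattern list and the name isLikelyCrawler are illustrative assumptions, and it shares the same weakness against crawlers that disguise their user agent.

```javascript
// Very rough User-Agent test; the pattern list is illustrative, not exhaustive.
const CRAWLER_PATTERN = /bot|crawler|spider|slurp/i;

// Returns true when the User-Agent string contains a typical crawler token.
function isLikelyCrawler(userAgent) {
  return CRAWLER_PATTERN.test(userAgent || '');
}
```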
After that, I had to find a way to distinguish between automated robots (crawlers, screen scrapers, etc.) and humans, and I realised that the solution required some kind of interactivity, such as clicking a button. Some crawlers do process JavaScript, and they would obviously trigger the onclick event of a button element, but probably not that of a non-interactive element such as a div. The following is the HTML/JavaScript code I used in my web application www.so-much-to-do.com to implement this feature:
<div
class="all rndCorner"
style="cursor:pointer;border:3;border-style:groove;text-align:center;font-size:medium;font-weight:bold"
onclick="$TodoApp.$AddSampleTree()">
Please click here to create your own set of sample tasks to do
</div>
This approach has been working impeccably until now, although crawlers could be changed to be even more clever, maybe after reading this article :D
The new Facebook JavaScript SDK lets any website log in as a Facebook user and fetch that user's data...
So it will be, www.example.com including some Javascript from Facebook, but as I recall, that script is considered to be of the origin of www.example.com and cannot fetch data from facebook.com, because it is a violation of the "same origin policy". Isn't that correct? If so, how does the script fetch data?
From here: https://developer.mozilla.org/en/Same_origin_policy_for_JavaScript
The same origin policy prevents a document or script loaded from one origin from getting or setting properties of a document from another origin. This policy dates all the way back to Netscape Navigator 2.0.
and explained slightly differently here: http://docs.sun.com/source/816-6409-10/sec.htm
The same origin policy works as follows: when loading a document from one origin, a script loaded from a different origin cannot get or set specific properties of specific browser and HTML objects in a window or frame (see Table 14.2).
The Facebook script is not attempting to interact with script from your domain or to read DOM objects. It's just going to do its own post to Facebook. It gets your site name not by interacting with your page or with script from your site, but from the code itself, which is generated when you fill out the form to get the "like" button. I registered a site named "http://www.bogussite.com" and got the code to put on my website. The first thing in this code was
iframe src="http://www.facebook.com/plugins/like.php?href=http%3A%2F%2Fwww.bogussite.com&
so the script is clearly getting your site info from hard-coded URL parameters in the iframe's src.
Facebook is far from alone in having you use scripts hosted on their servers. There are plenty of other scripts that work this way. All of the Google APIs, for example, including Google Gears, Google Analytics, etc., require you to use a script hosted on their server. Just last week, while I was trying to figure out how to do geolocation for our store finder for a mobile-friendly web app, I found a whole slew of geolocation services that had you use scripts hosted on their servers, rather than copying the script to your server.
I think, but am not sure, that they use the iframe method. At least the cross domain receiver and xfbml stuff for canvas apps uses that. Basically the javascript on your page creates an iframe within the facebook.com domain. That iframe then has permission to do whatever it needs with facebook. Communication back with the parent can be done with one of several methods, for example the url hash. But I'm not sure which if any method they use for that part.
If I recall, they use script tag insertion. So when a JS SDK call needs to call out to Facebook, it inserts a <script src="http://graph.facebook.com/whatever?params...&callback=some_function"> tag into the current document. Then Facebook returns the data in JSON format as some_function({...}), where the actual data takes the place of the ... . This results in the function some_function being called in the origin of example.com using data from graph.facebook.com.
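The script tag insertion technique described above (commonly called JSONP) can be sketched in general terms like this. It is not Facebook's actual SDK code; the helper names and the api.example.com URL are placeholders, and fakeJsonpResponse only imitates what a cooperating server would send back.

```javascript
// Builds the script URL, appending the name of the callback the
// server should wrap its JSON in.
function buildJsonpUrl(baseUrl, params, callbackName) {
  const url = new URL(baseUrl);
  for (const [key, value] of Object.entries(params)) {
    url.searchParams.set(key, value);
  }
  url.searchParams.set('callback', callbackName);
  return url.toString();
}

// Inserts the <script> tag; the browser fetches and executes the
// response regardless of its origin. doc is the page's document.
function loadJsonp(doc, url) {
  const script = doc.createElement('script');
  script.src = url;
  doc.head.appendChild(script);
}

// Imitates the server side: the body is JavaScript that calls the callback.
function fakeJsonpResponse(callbackName, data) {
  return `${callbackName}(${JSON.stringify(data)})`;
}
```

Because the response is executed as a script in the page, the callback runs in the including site's origin, which is why the same origin policy does not block it.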