I read about website tracking cookies at "http://www.newfangled.com/unlimited_vs_limited_web_tracking" and am wondering how they are implemented.
On page 2 of the article, the author writes, "third-party trackers using beacon technology can match the data they collect about you in real time with other databases containing geolocation, financial, and medical information in order to expand your profile to predict your age, gender, zip code, income, marital status, parenthood, home ownership, as well as unique interests."
I've thought of a few ways trackers could be implemented and am hoping answers to the following questions will help me get some clarity about how trackers work.
When you visit a website, do all of your cookies become available to the website? E.g., if I visited StackO.com, would the site be able to access my Facebook/Google/other cookies?
To track your visits from site to site, do various websites share information in a database, i.e. when you visit FB, Google, CNN... do they log your activity in a shared database that's accessible by companies in the group?
When you visit a website, do all of your cookies become available to the website?
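No, not directly. A site can only read the cookies set for its own domain. However, when a page embeds content from another domain (a script, an image, a Like button), your browser sends that other domain's cookies along with the request, so the third party can recognize you.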
To track your visits from site to site, do various websites share information in a database, i.e. when you visit FB, Google, CNN... do they log your activity in a shared database that's accessible by companies in the group?
Yes
In general, yes. If you look at the cookies set by CNN.com, for example, there are cookies set for the scorecardresearch.com domain:
http://webcookies.info/cookies/cnn.com/11993/
The page then includes some JavaScript code or a 1x1 image that makes a request to the scorecardresearch.com servers. This way, Scorecard Research can track you as you move from CNN to other websites that embed the same code, and they will definitely aggregate the information from those websites using their technology.
Profiling is just making use of this aggregated behavioral data.
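To make the pixel mechanism concrete, here is a minimal sketch of what a tracker's server might do when its 1x1 image is requested. The names (`handle_pixel_request`, the `vid` cookie, the in-memory `visits` store) are hypothetical, and a real tracker would run this behind a web server and a database. The key point is that the browser sends the tracker's own cookie plus a `Referer` header naming the embedding page, so every site that embeds the pixel contributes to one per-visitor history.

```python
import uuid
from collections import defaultdict
from http.cookies import SimpleCookie

# Hypothetical tracker-side store: visitor ID -> pages on which the pixel loaded.
visits: dict[str, list[str]] = defaultdict(list)

# A 43-byte transparent 1x1 GIF, the classic "tracking pixel" payload.
PIXEL = (b"GIF89a\x01\x00\x01\x00\x80\x00\x00\x00\x00\x00\xff\xff\xff"
         b"!\xf9\x04\x01\x00\x00\x00\x00,\x00\x00\x00\x00\x01\x00\x01\x00"
         b"\x00\x02\x02D\x01\x00;")


def handle_pixel_request(headers: dict[str, str]) -> tuple[dict[str, str], bytes]:
    """Handle a request for the tracker's pixel, embedded on many unrelated sites."""
    cookie = SimpleCookie(headers.get("Cookie", ""))
    # Reuse the visitor ID if this browser has seen the tracker before.
    vid = cookie["vid"].value if "vid" in cookie else uuid.uuid4().hex
    referer = headers.get("Referer")
    if referer:
        visits[vid].append(referer)  # aggregate browsing history per visitor
    response_headers = {
        "Content-Type": "image/gif",
        # Long-lived third-party cookie; current browsers only send cookies in a
        # cross-site context when they are marked SameSite=None; Secure.
        "Set-Cookie": f"vid={vid}; Max-Age=31536000; Path=/; SameSite=None; Secure",
    }
    return response_headers, PIXEL
```

Note that many browsers now restrict or block third-party cookies by default, which is one reason trackers increasingly rely on other identifiers.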
I am setting up a website as a volunteer for a scout charity. As part of the web functionality we will be storing email addresses and names in a database with password hashes and some other info such as creation date and site roles.
Is this something that would be covered by GDPR? I have tried to do some reading online but can't seem to find something definitive that covers this use case.
Yes, Norman, it is personal data, so processing it (which includes storing it) falls within the scope of the GDPR. You may want to mention it in your privacy notice, stating the high-level purpose it is used for, and, to reassure users, let them know the data is not passed on to any third parties.
The IAPP is a good, authoritative, and credible site for all such queries. You may want to read this series:
https://iapp.org/news/a/business-impacts-of-the-general-data-protection-regulation-part-three/
I am trying to stop spam accounts from being created on my website. I run a social media website with approximately 50-80k pageviews per month; users sign up and communicate with one another for free. We've been battling spam lately, even though we have implemented multiple security measures to counteract bots. I'd welcome any further tips and tricks I can try, and also some help figuring out whether these are people coming from click farms, etc. (i.e. real people or computers).
Problem:
Signup forms are being completed and users are posting spam in their profile information. A spammer signs up for the website by completing the signup form, activates their account via email, logs into their account, and then completes their profile, putting spam in the description box with a link to the website they are advertising (everything from ##$%S enlargement to random blogs to web developer websites, etc.). If they were all posting the same link, we could detect it and ban them, but they are not: they come from multiple IPs, post various links, use addresses from multiple email providers to activate the accounts, register with information from multiple countries, and create about 10-30 accounts per day. Before implementing many of our security measures we were getting around 100-200 fake accounts per day, and now we're down to 10-30. So we've seen some improvement, but the issue is still annoying me. I'm half thinking that the security measures are helping quite a bit, but that humans may still be targeting our website, perhaps getting paid per signup or something similar. Even if so, is there any way I could confirm whether they are humans or bots?
Security measures:
I won't get into all of the details here (for security reasons), but I'll just indicate what we've done to counteract the spambots:
Created honeypots at various areas of our website which automatically ban based on IP
IP banning - based on known botter/spammer ip addresses
Duration detection from signup form page load to form submission -- if it takes less than 5 seconds to complete our signup form, we treat the submitter as a bot and prevent the signup
Hidden checkbox in signup form -- there is a hidden checkbox in the signup form that is invisible to regular users (if a bot checks it we are automatically detecting and preventing the signup)
Google reCAPTCHA - We've enabled Google reCAPTCHA in our signup form as well
Email activation link - We send our users an activation email with a link that they have to click on to complete signup -- they are not able to sign into our website until they've activated their account.
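Two of the measures above (the hidden checkbox and the 5-second minimum) can be sketched as a single server-side check. This is a minimal illustration, not the poster's actual code: the field name `remember_me_2` and the function name are hypothetical, and it assumes the time the form was rendered is kept in server-side state (such as the session) so a bot cannot simply rewrite it.

```python
import time
from typing import Optional

MIN_FILL_SECONDS = 5.0           # threshold from the list above
HONEYPOT_FIELD = "remember_me_2"  # hypothetical name of the hidden checkbox


def signup_rejection_reason(form: dict[str, str], rendered_at: float,
                            now: Optional[float] = None) -> Optional[str]:
    """Return why a signup looks automated, or None if it passes both checks.

    `rendered_at` should come from server-side state (e.g. the session),
    not from a form field, otherwise a bot can forge it.
    """
    now = time.time() if now is None else now
    # Hidden checkbox: invisible to humans, so any value means a bot filled it in.
    if form.get(HONEYPOT_FIELD):
        return "honeypot"
    # Too fast: real people need a few seconds to fill in the form.
    if now - rendered_at < MIN_FILL_SECONDS:
        return "too_fast"
    return None
```

Returning a reason rather than a bare boolean makes it easier to log which check is catching what, which helps with the spam analysis suggested in the answer.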
Future actions include:
Detecting what users are posting in their descriptions in their profiles and banning based on that -- string detection for banned words, etc.
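A minimal sketch of such string detection, assuming a simple additive score: a hypothetical banned-phrase list, a count of links, and a penalty when a description contains a lot of HTML markup. The phrases, weights, and 50-character threshold are placeholders to be tuned against the spam you actually receive.

```python
import re

# Hypothetical phrase list; build yours from the spam you actually receive.
BANNED_PHRASES = ["enlargement", "cheap pills", "seo services"]
URL_RE = re.compile(r"https?://|www\.", re.IGNORECASE)
TAG_RE = re.compile(r"<[^>]+>")


def spam_score(description: str) -> int:
    """Crude additive score for a profile description; higher is more suspicious."""
    text = description.lower()
    score = sum(3 for phrase in BANNED_PHRASES if phrase in text)
    score += 2 * len(URL_RE.findall(description))  # each link adds weight
    # Lots of markup relative to visible text is a common spam signature.
    stripped = TAG_RE.sub("", description)
    if len(description) - len(stripped) > 50:
        score += 5
    return score
```

Scoring instead of banning outright lets you send borderline profiles to manual review while auto-banning only high scores.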
Any other suggestions or tips or tricks? In all honesty, if spam bots are getting through all of those security measures above --
do you think they are just that intelligent?
do you think we're being targeted?
Also, any way I can determine if they are bots or real humans? Suggestions?
This is a perennial problem; over the years I've found that as I add more anti-spam measures, the spammers continually get better at circumventing my measures.
I recommend doing an analysis of your spam to figure out how you can detect it. The spam itself contains the key to outsmarting it. Look at the patterns and the structure, and decide what information is most useful and what the easiest way to filter it out is. Your spam detection doesn't need to be perfect, but generally you want to catch as much as possible while producing as few false positives as possible.
Also, to answer your one question: even if you make your bot detection perfect, there will always be humans submitting spam. Humans are tough to outsmart, and you may always need some manual attention to deal with them.
You are already implementing a lot of measures. Here are some more I would suggest:
When a signup form is generated, put a hidden field with a unique hash generated from the user's browser info, including the user's IP, HTTP user agent, and the date. Then, when the form is submitted, check the hash. This one method eliminated a surprising amount of spam.
If you want to take the previous method even further, use a custom, time-sensitive hash in the URL of your signup form, and have the link to this form be dynamically generated. This way, if a spammer stores the form's URL, it won't work, but the link will work for every legitimate user of the site.
Make it so newly created, non-trusted users cannot display any public profile information, such as URLs or even text. With a site as small as yours you could require manual approval of each user, and if your user base got bigger, you could use an automated reputation system, a lot like the one Stack Overflow and the other Stack Exchange sites use. This removes the incentive for spam. Also, I found that an overwhelming majority of spammers only ever logged onto the site once. If you wait to approve users manually until they have logged on twice, or have even returned to the site on another day (tracked with a persistent cookie), you will filter out the vast majority of spammers and will only have to do a small amount of manual approval work. Then have the system delete unvalidated/inactive accounts after a certain amount of time.
Check for certain keywords or structures of information. I found that an overwhelming majority of my spammers used certain words or phrases that were never used by my legitimate users. Another pattern was entering a phone number in the profile, which was common among spammers and which no legitimate user ever did. Also look for signs of foul play like XSS attacks. A huge portion of spammers will, at some point, submit something with a ton of HTML tags in it. You can either use the tags themselves to filter them out, or strip the HTML tags, compare the string lengths, and ban them if the difference is more than a small amount (e.g. allow someone to use a few simple <em></em> or <strong></strong> tags). Usually, if there are HTML tags in the entry, there are a ton of them. Also look for material with weird encodings or characters that don't make sense. This is often an attempt at sophisticated SQL injection, XSS, or other types of hacking.
Use external IP blacklists. AbuseIPDB is one example; it has an API that you can use to check new IPs before storing them in your temporary database. Their free plan allows checking up to 1000 IPs a day, and you can pay for more than that. It won't catch all the manual spam, but I find it catches a ton of the automated spam.
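The hidden-field token from the first suggestion might look something like this minimal sketch. It uses an HMAC with a server-side secret instead of a plain hash (so a spammer cannot recompute the token from public data), binds it to the visitor's IP, user agent, and date, and accepts tokens from today or yesterday to cover forms opened just before midnight. `SECRET_KEY` and both function names are hypothetical; the same approach works for the time-sensitive form URL in the second suggestion.

```python
import hashlib
import hmac
from datetime import date, timedelta
from typing import Optional

SECRET_KEY = b"replace-with-a-long-random-secret"  # hypothetical; keep server-side


def form_token(ip: str, user_agent: str, day: date) -> str:
    """Token for the hidden form field, bound to this visitor and this day."""
    msg = f"{ip}|{user_agent}|{day.isoformat()}".encode()
    return hmac.new(SECRET_KEY, msg, hashlib.sha256).hexdigest()


def token_is_valid(token: str, ip: str, user_agent: str,
                   today: Optional[date] = None) -> bool:
    """Accept tokens issued today or yesterday; compare in constant time."""
    today = today or date.today()
    return any(
        hmac.compare_digest(token, form_token(ip, user_agent, d))
        for d in (today, today - timedelta(days=1))
    )
```

Binding the token to the IP can reject legitimate mobile users whose address changes mid-session, so it may be worth dropping the IP and keeping only the user agent and date if you see false positives.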
Are they targeting you? Yes. They are targeting everyone. But any site with 50k+ pageviews a month has high enough volume to be an attractive target, and the more traffic you get, the more attractive a target you will be. Even some of my tiny sites have been targeted with surprisingly sophisticated attacks these days. Everyone needs to be on guard.
Good luck. I wish this weren't so much of a problem, but it is.
I run a web proxy site to help people in China view webpages blocked by the government through a modified URL, like http://www.google.com.mywebproxy.domain/.
But some proxied sites have been reported as phishing sites on Google and elsewhere. I need to know the rules search engines use to detect phishing sites, so that I can block pages I should never proxy.
For example, how can I detect whether a website has a form for entering credit card information?
how can I detect whether a website has a form for entering credit card information?
I know it's somewhat simplistic, but what about a simple rule that looks for credit-card-related terms on the page or in the form?
Think about it: for a phishing attack to work, the attacker needs to convince the visitor, in one way or another, to provide his/her CC info. So you can make a list of payment-related terms like "credit card", "payment", "CC", "billing" and so on, and use them to determine the page's intent.
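The keyword rule could be sketched with Python's standard `html.parser`, looking only inside `<form>` elements. Besides the hypothetical term list, it also checks the standard HTML `autocomplete` tokens for card fields (`cc-number`, `cc-csc`, `cc-exp`, `cc-name`), which legitimate payment forms often use. This is a heuristic sketch, not a description of how any search engine does it; extend the term list with translations for the languages you proxy.

```python
from html.parser import HTMLParser

# Hypothetical term list; add translations for the languages you proxy.
PAYMENT_TERMS = ("credit card", "card number", "cvv", "cvc", "expiry", "billing")
CC_AUTOCOMPLETE = {"cc-number", "cc-csc", "cc-exp", "cc-name"}


class PaymentFormDetector(HTMLParser):
    """Collects payment-related signals that appear inside <form> elements."""

    def __init__(self) -> None:
        super().__init__()
        self.in_form = False
        self.hits: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "form":
            self.in_form = True
        elif tag == "input" and self.in_form:
            a = {k: (v or "").lower() for k, v in attrs}
            if a.get("autocomplete") in CC_AUTOCOMPLETE:
                self.hits.append(a["autocomplete"])
            # Field names like "billing_zip" become "billing zip" before matching.
            label = " ".join(a.get(k, "") for k in ("name", "id", "placeholder"))
            label = label.replace("_", " ").replace("-", " ")
            self.hits.extend(t for t in PAYMENT_TERMS if t in label)

    def handle_endtag(self, tag):
        if tag == "form":
            self.in_form = False

    def handle_data(self, data):
        # Visible text inside a form, e.g. <label>Card number</label>.
        if self.in_form:
            text = data.lower()
            self.hits.extend(t for t in PAYMENT_TERMS if t in text)


def looks_like_payment_form(html: str) -> bool:
    detector = PaymentFormDetector()
    detector.feed(html)
    detector.close()
    return bool(detector.hits)
```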
Having said that:
a: images/flash will provide a loophole
b: you'll need to cover different translations of all terms
c: as in your case, some "legit" sites will be blocked
This of course does not describe the workings of Google's (or others') filtering algorithms, which use a more complex set of rules based on multiple verification vectors and existing data pools for cross-reference.
An exact mix of those is a closely guarded secret and I agree with Rob, contacting someone for a manual check-up is probably the best solution.
I am asking a question that's somewhat related to these:
Secure way of serving videos
secure streaming of videos
However, no one provided an answer that seems relevant to my situation.
My situation is as follows:
I'm building a very simple Learning Management System. Students have access to video lessons if they have paid for them. I would like to prevent:
bots/spiders from finding these videos and downloading them
people from simply viewing the source, copying the URL of the video, and sharing it with other people
I doubt very much people will try to hack the site to steal the videos.
What is the best way to secure these videos from being shared? Do I have to store the videos on my web server? Can I leverage video platforms like YouTube or Vimeo?
Long story short, there is no simple solution.
I will say straight up that if there was a way to stop people from downloading videos, every video website would be doing it.
I have thought of a few ways, listed below, to make downloading the videos not worthwhile for the student/viewer.
obscure the URL
change the URL frequently
restrict the number of downloads per IP address/subnet
make them view it in a custom-built "custom-served" video player
use a video streaming service already available
Each is discussed in greater detail below.
Obscuring the URL
You could obscure the URLs like so:
http://mylearningmanagementsystem.com.au/e12d8cd38f00f204e9801998ecc8427e/video.flv
You could calculate a hash of the name of the file itself (or salt and hash, the above is just an example) and use that in a URL.
This could be achieved in such a way that they would be obscure enough, but still bookmarkable and user-friendly for the viewers.
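A minimal sketch of the salted-hash idea, assuming a secret salt kept on the server. It uses SHA-256 truncated to 32 hex characters (the same length as the example URL) rather than any specific algorithm the example may have used, and `video_slug`, `obscured_video_url`, and `build_lookup` are hypothetical names. Because the slug is deterministic the URL stays bookmarkable, and the server keeps a lookup table to map each slug back to the real file.

```python
import hashlib

URL_SALT = "replace-with-a-secret-salt"  # hypothetical secret salt, kept server-side
BASE_URL = "http://mylearningmanagementsystem.com.au"


def video_slug(filename: str) -> str:
    """32 hex characters derived from a salted SHA-256 of the file name."""
    return hashlib.sha256((URL_SALT + filename).encode()).hexdigest()[:32]


def obscured_video_url(filename: str) -> str:
    # Deterministic, so the URL stays the same and can be bookmarked.
    return f"{BASE_URL}/{video_slug(filename)}/video.flv"


def build_lookup(filenames: list[str]) -> dict[str, str]:
    """Map each slug back to the real file so the server can resolve requests."""
    return {video_slug(name): name for name in filenames}
```

Without the salt, anyone who guessed the file names could compute the same hashes, so the salt is what makes the URLs hard to guess. Once a URL is shared, though, it works for whoever has it.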
If you wanted to go one step further, you could have video broken up into parts - this is discussed in the custom built section.
Change the URL frequently
With some code, you could set the videos to change URLs every Sunday night at 11.59pm for your timezone. However, any page that you link to would have to be either automatically or manually updated, and that is a hassle in itself (how do you test the code/what if it falls over and you don't realise/things like that).
Even if you get all of that working, any user that bookmarked the page would suffer from link rot.
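One way to avoid updating links by hand is to derive the URL from the current week instead of storing it. This sketch (hypothetical names and secret) computes an HMAC over the file name and the ISO year-week, so the URL changes automatically at the start of each ISO week (midnight between Sunday and Monday in whatever timezone `when` is expressed in), and the pages that link to the video call the same function. It does nothing about the link-rot problem: last week's bookmarks simply stop working.

```python
import hashlib
import hmac
from datetime import datetime

ROTATION_SECRET = b"replace-with-a-secret"  # hypothetical; keep server-side


def weekly_token(filename: str, when: datetime) -> str:
    """Token that is stable within one ISO week and changes with the next."""
    year, week, _ = when.isocalendar()
    msg = f"{filename}|{year}-W{week:02d}".encode()
    return hmac.new(ROTATION_SECRET, msg, hashlib.sha256).hexdigest()[:16]


def current_video_path(filename: str, when: datetime) -> str:
    # Pages that link to the video call this, so links update themselves.
    return f"/{weekly_token(filename, when)}/{filename}"


def is_current(token: str, filename: str, when: datetime) -> bool:
    # Server-side check when a request for the video comes in.
    return hmac.compare_digest(token, weekly_token(filename, when))
```

Since nothing is stored or rewritten, there is no scheduled job that can silently fail; the rotation is just arithmetic on the date.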
Restricting the number of downloads per IP address/subnet
With some funky server-side code, you could limit the number of times a video can be downloaded from an IP address (or, depending on the use case, a subnet of the IP).
This is not my strong point, but you could look at articles on Dynamic IP Restrictions. The following is an excerpt from the website:
Dynamically blocking of requests from IP address based on either of the following criteria:
The number of concurrent requests.
The number of requests over a period of time.
There is also the possibility of doing the same with Drupal.
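If you would rather see the idea in application code, here is a minimal in-memory sketch of "the number of requests over a period of time" per IP, using a sliding window. The class and function names are hypothetical, and state is lost on restart and not shared between processes, so a server-level module is more robust in production. `subnet_key` shows how to group requests by subnet instead (a /24 for IPv4 and a /64 for IPv6 are assumed here).

```python
import ipaddress
import time
from collections import defaultdict, deque
from typing import Optional


class PerIpLimiter:
    """Allow at most `max_requests` per key within a sliding `window` (seconds)."""

    def __init__(self, max_requests: int, window: float) -> None:
        self.max_requests = max_requests
        self.window = window
        self._hits: dict[str, deque] = defaultdict(deque)

    def allow(self, key: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        hits = self._hits[key]
        # Forget requests that have slid out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True


def subnet_key(ip: str) -> str:
    """Group addresses by subnet instead of by individual IP."""
    prefix = 24 if ipaddress.ip_address(ip).version == 4 else 64
    return str(ipaddress.ip_network(f"{ip}/{prefix}", strict=False))
```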
Make them view it in a custom-built "custom-served" video player
You can go the extra mile and make your own video-management system (which it seems like you are doing) and serve the videos from your own server (which is what I meant by custom-served). However, some programs that have attempted this were flawed, like Sony's CD management software, or punished honest users, like Apple iTunes' FairPlay DRM software.
If you do end up going the route of giving users a program/web service to watch videos and restricting them with a password/encryption key, you could annoy the customers who paid for your content in good faith. This is essentially what all copyright protection systems have tried and utterly failed at, because either the program wasn't secured well enough or people simply stopped using it because it was awkward to work with.
When you serve the videos to the users, you could break them up and separate them by chapters, as in the first chapter is one video, the second is another, and so on (like below):
http://mylearningmanagementsystem.com.au/video_title/chapter_01/video.flv
http://mylearningmanagementsystem.com.au/video_title/chapter_02/video.flv
http://mylearningmanagementsystem.com.au/video_title/chapter_03/video.flv
... and you could combine that with the hashing idea in the first section (Obscuring the URL):
http://mylearningmanagementsystem.com.au/e12d8cd38f00f204/8fd3611c40e74c3d/video.flv
http://mylearningmanagementsystem.com.au/e12d8cd38f00f204/92d7f54d09c80436/video.flv
http://mylearningmanagementsystem.com.au/e12d8cd38f00f204/27bd98792bea3103/video.flv
This could have its downsides though:
users on slow connections who pause the video at the start to let it load will experience issues (a less common problem now, as the internet is much faster and easier to access)
if one video is missing, the whole video will be unplayable
how will you manage each link? Will each video name have the same hash or a different hash?
will you have to manually break up each video?
The key point here is that this does make a lot of unnecessary work for you. The next option would be to use a video streaming service that is already available.
Use a video streaming service already available
There are plenty of options out there to host and share your video. YouTube and Vimeo are two of these options. I will explain why I prefer the latter.
Password protection
If you want to share the videos only with a specific group of paying people, you can protect your videos with a password on Vimeo. AFAIK, YouTube does not offer this service - it only allows you to select members to view the video.
Not only that, but you can add a bunch of videos to an album (in Vimeo), and password-protect the album, so you only have to change the password for the album.
Keep in mind that you may run into increased support messages like "But is this the current password or the one for last week?"
Set embed settings
You can prevent the video from being embedded on any page, so that users would have to go to Vimeo directly, type in the password (if you set one above), and view it inside their web browser. AFAIK, you can embed any video from YouTube that you can view.
Keep in mind, though, that a quick Google search reveals heaps of online sites that let you download videos from these video-hosting websites. There are even browser add-ons for Firefox and Chrome.
If your business depends on your videos for monetisation and you want to go one step further, there are paid streaming services that specialise in content distribution with proper access-rights management and content protection. One of these services is Brightcove. Excerpts from Brightcove follow:
Brightcove Video Cloud securely delivers the highest quality on-demand and live video experiences to reach your audience—no matter where they are. We simplify delivery to an increasingly complex ecosystem of devices and standards across the web, mobile and connected TVs
... and ...
Protect your valuable content
Ensure your video is safe. Use RTMPe stream encryption and SWF verification to prevent video stream ripping and content theft and ensure that your video stream plays back only in your authorized players.
Fine-grained Access Control
Pinpoint exactly when and where your content is displayed to comply with content licensing restrictions, global launch roll-out schedules or secure behind-the-firewall delivery. The user-friendly graphical interface allows you to restrict access by date, domain, geography, player or IP address. For even greater control restrict access to sensitive materials by IP address range and ensure content is accessible only from within approved networks.
At the end of the day...
If you can view it, you can download it, no matter how much you obscure it.
If there was a way to stop people from downloading videos, every video website would be doing it.
If you had unlimited resources, you could combine all of the techniques listed above to make it not worth anyone's time. But, after all the effort you put in, a viewer could always set up one of many screen capture programs to record all the videos onto their hard drive.
It's up to you, and how vigilant you want to be with your videos. Remember that the more effort and time you spend making a video harder to rip, the harder you also make it for regular paying customers to get and use the content.
More information:
How can I make a video not downloadable?
Vimeo privacy settings
Video streaming service | Online Streaming Video | Brightcove
Maybe it's a bit too late, but I'm putting this here so that it would help others.
As others have stated, there's no way to secure contents once they reach someone's computer. But we can prevent uncontrolled sharing of the content by putting some barriers in place.
One such method, which I've noticed many websites including LinkedIn, Pluralsight, and many others use, is a resource URL carrying authorization information secured with a hash. Such tokens include enough information to identify the content to be served and a time frame during which the URL is valid.
Suppose the video you want to secure is :
example.com/videos/1234.mp4
Here's an example of how you'd generate a token on the first request for the resource (after you've authenticated the user and done any other verifications):
validFrom = unixTimestamp
validTo = unixTimestamp
video = 1234.mp4
privateKey = yourSecretKey
token = HASH(validFrom.validTo.video.privateKey)
Now, create a URL with all the above information excluding the private key. Your final URL would be something like this:
example.com/video?validfrom=1566831998&validto=1566839198&path=1234.mp4&hash=HhgcWmRViYeQLn4AZoQvkVXotPU
Now, whenever a request is made for a video at the path /video, you take all the parameters from the URL (excluding the hash) and compute a hash from them and your private key, in the same order as before. The URL is valid and untampered if the hash you just generated matches the one included in the URL; you should then also check that the current time lies between validfrom and validto. A similar technique is used to sign JWTs, and it is very efficient: you don't have to store or retrieve anything from a database, which makes it quick and easy to implement.
Once you've validated the token, you can return the FileStream to the media that was requested in the url.
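Here is a minimal Python sketch of the scheme described above. It swaps the generic HASH for HMAC-SHA256 (a keyed hash avoids the length-extension weaknesses of hashing `data + key` with some hash functions), joins the fields with dots as in the pseudocode, and encodes the signature as unpadded URL-safe base64. The function names and the `https://example.com/video` endpoint are assumptions for illustration; `PRIVATE_KEY` reuses the placeholder from the example.

```python
import base64
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse

PRIVATE_KEY = b"yourSecretKey"  # placeholder from above; use a long random value


def _signature(valid_from: int, valid_to: int, path: str) -> str:
    msg = f"{valid_from}.{valid_to}.{path}".encode()
    digest = hmac.new(PRIVATE_KEY, msg, hashlib.sha256).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode()


def signed_video_url(path: str, lifetime: int = 7200, now: Optional[int] = None) -> str:
    """Issue a URL for `path` valid for `lifetime` seconds from `now`."""
    valid_from = int(time.time()) if now is None else now
    valid_to = valid_from + lifetime
    query = urlencode({"validfrom": valid_from, "validto": valid_to, "path": path,
                       "hash": _signature(valid_from, valid_to, path)})
    return f"https://example.com/video?{query}"


def verify_video_url(url: str, now: Optional[int] = None) -> Optional[str]:
    """Return the video path if the URL is authentic and unexpired, else None."""
    params = {k: v[0] for k, v in parse_qs(urlparse(url).query).items()}
    try:
        valid_from, valid_to = int(params["validfrom"]), int(params["validto"])
        path, given = params["path"], params["hash"]
    except (KeyError, ValueError):
        return None
    # Constant-time comparison so the signature can't be guessed byte by byte.
    if not hmac.compare_digest(given, _signature(valid_from, valid_to, path)):
        return None  # tampered with
    now = int(time.time()) if now is None else now
    if not valid_from <= now <= valid_to:
        return None  # outside the time window
    return path
```

The handler for /video would call `verify_video_url` and, if it returns a path, stream that file; otherwise respond with 403.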
If it is a small and not too dynamic group, then YouTube or Vimeo might be an option. But it is not scalable.
If you have a dynamic audience where members may join and leave at different times then you need to have the videos encrypted on your own server.
The biggest challenge then is key distribution. You need a key scheme in which each user has a unique key, but the key used to encrypt the video is the same for everyone.
Here is one possible method: https://sparrow.ece.cmu.edu/group/pub/old-pubs/elk.pdf
Other schemes you might want to look at include MARKS and LKH.
I've noticed that some email services (like Gmail or my school's webmail) redirect (or used to redirect) links in the email body. So when I put "www.google.com" in the body of my email and then view that email in Gmail or something similar, the link says something like "gmail.com/redirect?www.google.com".
This was very confusing for me and the people I emailed (like my parents, who are not familiar with computers). I always clicked on the link anyway, but why is this service used? (I'm also worried that maybe my information was being sent somewhere... Do I have anything to worry about? Is something being stored before the redirect?)
Sorry if this is unwarranted paranoia. I am just curious about why some things work the way they do.
Wikipedia has a good article on URL redirection. From the article:
Logging outgoing links
The access logs of most web servers keep detailed information about where visitors came from and how they browsed the hosted site. They do not, however, log which links visitors left by. This is because the visitor's browser has no need to communicate with the original server when the visitor clicks on an outgoing link. This information can be captured in several ways. One way involves URL redirection. Instead of sending the visitor straight to the other site, links on the site can direct to a URL on the original website's domain that automatically redirects to the real target. This technique bears the downside of the delay caused by the additional request to the original website's server. As this added request will leave a trace in the server log, revealing exactly which link was followed, it can also be a privacy issue.[1] The same technique is also used by some corporate websites to implement a statement that the subsequent content is at another site, and therefore not necessarily affiliated with the corporation. In such scenarios, displaying the warning causes an additional delay.
So, yes, Google (and Facebook and Twitter do this too) is logging where its services are taking you. This is important for a variety of reasons: it lets them know how their service is being used, shows trends in the data, allows links to be monetized, etc.
As far as your concerns, my personal opinion is that, if you're on the internet, you're being tracked. All the time. If this is concerning to you, I would recommend communicating differently. However, for the most part, I think it's not worth worrying about.
This redirection is also used as a dereferrer: it avoids disclosing the original page's URL to third-party sites in the HTTP Referer header, since that URL can contain sensitive data such as a session ID.
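Both purposes, logging the click and hiding the referrer, can be seen in a minimal sketch of a redirect endpoint. The `url` query parameter, `click_log`, and `handle_redirect` are hypothetical and not how Gmail actually formats its links. The `Referrer-Policy: no-referrer` header on the redirect response asks browsers that honor it on redirects not to send a Referer to the destination. A real service would also need to guard against being abused as an open redirect.

```python
import time
from urllib.parse import parse_qs, urlparse

# Hypothetical click log: (timestamp, user id, destination).
click_log: list[tuple[float, str, str]] = []


def handle_redirect(request_url: str, user_id: str) -> tuple[int, dict[str, str]]:
    """Log an outgoing click, then send the browser on to the real target."""
    target = parse_qs(urlparse(request_url).query).get("url", [""])[0]
    if not target.startswith(("http://", "https://")):
        return 400, {}  # refuse anything that isn't a plain web link
    click_log.append((time.time(), user_id, target))  # this is the "trace"
    return 302, {
        "Location": target,
        # Ask the browser not to send a Referer to the destination site, so the
        # original page URL (e.g. one containing a session ID) is not leaked.
        "Referrer-Policy": "no-referrer",
    }
```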