How does the "mark as read" system on webforums work? - forum

I've wondered about this for some time now: how do web forums implement the option to highlight something you haven't read? How does the forum know?
Since most webforums have a function to show you all posts since your last visit, they must save the last time you visited one of their pages in your userdata in the database.
But that doesn't explain how individual topics are still highlighted after you've read just one.

A many to many table connecting a user to a topic/post with flags for read/favorite etc.

Many web forums store a huge list of the last time you looked at each topic you've looked at.
This gets out of hand quickly, but there are mitigations. See Determining unread items in a forum

Keeping track of which posts a visitor has read is not that big a deal, since the number of posts a visitor has read is very likely to be much smaller than the number of posts not read. So, if you know which posts a visitor has read, you also know which posts this visitor hasn't read. To make this less computationally intensive, you'd normally do this only over a certain period of time, say the last two weeks. Everything before that time is considered read.
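A minimal sketch of the relation-table idea combined with the two-week cutoff, using an in-memory SQLite database. The table and column names (`topics`, `topic_reads`, `last_post_at`, `read_at`) and the helper functions are made up for illustration; a real forum schema will look different.

```python
import sqlite3
from datetime import datetime, timedelta

READ_WINDOW = timedelta(days=14)  # assumption: the "two weeks" mentioned above


def stamp(moment):
    """Store timestamps as ISO strings so they compare correctly as text."""
    return moment.isoformat(timespec="seconds")


def open_db():
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE topics (id INTEGER PRIMARY KEY, last_post_at TEXT NOT NULL);
        CREATE TABLE topic_reads (
            user_id  INTEGER NOT NULL,
            topic_id INTEGER NOT NULL REFERENCES topics(id),
            read_at  TEXT NOT NULL,
            PRIMARY KEY (user_id, topic_id)
        );
    """)
    return db


def mark_read(db, user_id, topic_id, now):
    # One row per (user, topic); re-reading a topic just refreshes the timestamp.
    db.execute(
        "INSERT OR REPLACE INTO topic_reads (user_id, topic_id, read_at) VALUES (?, ?, ?)",
        (user_id, topic_id, stamp(now)),
    )


def unread_topic_ids(db, user_id, now):
    # Topics older than the window count as read; inside the window a topic is
    # unread if the user never opened it or it has a post newer than the last read.
    cutoff = stamp(now - READ_WINDOW)
    rows = db.execute(
        """
        SELECT t.id FROM topics t
        LEFT JOIN topic_reads r ON r.topic_id = t.id AND r.user_id = ?
        WHERE t.last_post_at > ?
          AND (r.read_at IS NULL OR r.read_at < t.last_post_at)
        ORDER BY t.id
        """,
        (user_id, cutoff),
    ).fetchall()
    return [row[0] for row in rows]
```

Rows in `topic_reads` older than the window can be deleted periodically, which is what keeps the table from growing without bound.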

Usually, this list of "unread" items only shows changes that have been made since the last time you logged out.
Use the user's last activity date/time to mark items as "unread" (any activity in a topic after that time is marked "unread"). Then store, in a session variable, a list of the topic IDs that the user has viewed since the last login. Combining these two gives you a relatively accurate list of unread topics.
Of course this data is lost on log-out or session expiry, and the cycle starts again, but it avoids spending an unnecessary number of SQL queries.
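Roughly, the combination described above could look like this. The `Session` class and its field names are placeholders standing in for whatever session object your framework provides.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Session:
    # Last activity time recorded before this login (hypothetical field name).
    previous_activity: datetime
    # Topic IDs opened during the current session.
    viewed_topic_ids: set = field(default_factory=set)


def is_unread(topic_id, topic_last_post_at, session):
    # Nothing new since the previous activity: treat the topic as read.
    if topic_last_post_at <= session.previous_activity:
        return False
    # New activity, but the user already opened the topic in this session.
    return topic_id not in session.viewed_topic_ids
```

One limitation of this sketch: a reply posted after the user opened the topic in the same session will not mark it unread again, which is part of why the result is only "relatively accurate".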

On the custom forum I used to work with, we used a combination of your last visit time (updated every time you viewed another page - usually cookied), and a "mark read" button on each topic that added a date/time value to a SQL table containing your UserID, the TopicID and the Date/Time.
Thus to view new topics we would look at your last visit date and anything created after that point in time was a new topic.
Once you entered a topic that you had clicked "mark read" on, it would show only the initial post and then any replies with a date/time after you clicked the mark read button. If you have fewer viewers and performance to spare, you could set it up to add an entry to the table for every topic the user clicks on, at the time they click on it.

Another option you have, and I have actually seen this done before in a vBulletin installation, is to store a comma separated list of viewed topic ids client-side in a cookie.
Server-side, the only thing stored was the time of the user's previous visit. The forum system used this in conjunction with the information in the user's cookie to show 'as read' any topic where either:
the last modified date (i.e. the last post) was older than the user's previous visit, or
the topic ID was found in the user's cookie as a topic the user had visited this session.
I'm not saying it's a good idea, but I thought I'd mention it as an alternative - the obvious way to do it has already been stated in other answers, ie store it server-side as a relation table (many to many table).
I guess it does have the advantage of putting less burden on the server of keeping that information.
The downsides are that it ties it to the session, so once a new session is started everything that occurred before the last session is considered 'already read'. Another downside is that a cookie can only hold so much information, and a user may view hundreds of topics in a session, so it approaches the storage limit of the cookie.
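A sketch of the cookie variant, assuming the cookie value is a plain comma-separated string of topic IDs. The function names are invented, and the 4000-byte cap is an assumption meant to stay under the roughly 4 KB per-cookie limit that browsers commonly enforce; this is not how vBulletin actually implements it.

```python
from datetime import datetime

MAX_COOKIE_BYTES = 4000  # assumption: keep below the common ~4 KB cookie limit


def parse_viewed(cookie_value):
    """Turn '12,7,301' into [12, 7, 301], silently skipping junk entries."""
    return [int(part) for part in cookie_value.split(",") if part.strip().isdigit()]


def add_viewed(cookie_value, topic_id):
    """Return the new cookie value with topic_id recorded as most recently viewed."""
    ids = [i for i in parse_viewed(cookie_value) if i != topic_id]
    ids.append(topic_id)
    # Drop the oldest entries once the cookie would get too large.
    while ids and len(",".join(map(str, ids))) > MAX_COOKIE_BYTES:
        ids.pop(0)
    return ",".join(map(str, ids))


def shown_as_read(topic_id, last_post_at, previous_visit, cookie_value):
    # Read if nothing was posted since the previous visit, or the topic was
    # opened during this session according to the cookie.
    return last_post_at < previous_visit or topic_id in parse_viewed(cookie_value)
```

Because the cookie comes from the client, the server should treat it as untrusted input, which is why the parser ignores anything that is not a plain number.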

One more approach:
Make sure your stylesheet shows a clear difference between visited and non-visited links, taking advantage of the fact that browsers remember visited pages persistently.
For this to work, however, you'd need to have consistent URLs for topics, and most forum systems don't tend to do this. Another downside is that users may clear their history or use more than one browser. This puts the measure in the 'not highly reliable' category; you would probably only use it to augment whatever other measure you are using to track viewed topics.

Related

Need ideas for rewarding the users of a wiki

I need ideas on how best to reward users of a wiki so that they stay motivated to keep contributing in a constructive manner. Articles can be upvoted, so the thought is to reward contributors based on how much they have contributed to a specific article as well as how many upvotes it has received. The idea so far is to award points to those who wrote the article, plus points based on the number of bytes the user has generated by editing articles.
The immediate problem I see is how to correctly assess when to give points in situations such as a user editing parts of an article that have already been edited before. Another is whether an edit that corrects a misspelling (for example, a user who edits a single word) should give points, as I don't see how the backend would distinguish between a user correcting a spelling mistake and one farming points by making small changes here and there.
There is also the issue of how the byte contribution point system should handle a situation where a user's contribution has been overwritten by a later edit: should they keep their points for contributing bytes now that their original piece of text is gone?
The intention is to make the user feel rewarded for their work without making the reward system too competitive(making them focus more on generating points rather than producing content of value).
Given that the major concern appears to be avoiding low-value edits, you could cap the points awarded per day and the rewarded edits per article. For example, instead of a user being able to apply multiple edits to a page one word at a time, you make it so they are only rewarded for editing a page once a day. Additional edits would give them no additional points but would still be accepted. It doesn't have to be a page either; you could use a paragraph or whatever level of granularity works for the content. The most important thing is to track all of this over time and do spot checks on whether the top users are indeed the ones that have contributed value according to whatever metric you decide is important.
Users always try to game any points system, so whatever you choose, I would make sure to track enough information that you can change your algorithm in the future and understand how the change will play out.
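As a rough illustration of the "one rewarded edit per page per day" cap plus keeping a full log for later re-scoring, here is a small in-memory sketch. The class name, the flat 10-point reward, and the tuple layout of the log are all invented for the example.

```python
from datetime import date

POINTS_PER_REWARDED_EDIT = 10  # hypothetical value


class EditLedger:
    def __init__(self):
        self.rewarded = set()  # (user_id, article_id, day) combinations already paid out
        self.log = []          # every edit, kept so the scoring rule can be changed later

    def record_edit(self, user_id, article_id, day, bytes_changed):
        """Accept the edit, but only award points for the first one per article per day."""
        key = (user_id, article_id, day)
        points = 0 if key in self.rewarded else POINTS_PER_REWARDED_EDIT
        self.rewarded.add(key)
        self.log.append((user_id, article_id, day, bytes_changed, points))
        return points
```

Since the log keeps the byte counts even for unrewarded edits, a different formula can be replayed over the history before switching to it.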

Instagram synced many images from a tag with Real time, what to do with deleted images

Using the Instagram API, I subscribed to a tag with the Real time feature. I sync media that match my project's criteria, then save those to DB. When users visit my website, I display these images from my DB (and not from instagram API).
From time to time, I see broken links showing up in the images. I identified that the source of the problem is that those images have now been deleted.
What's a good way to handle this?
Probably not attempting to duplicate the Instagram DB (or part thereof) would be the best option. Depending on the usage of your project and what sort of tags you're subscribing to, that could get pretty large pretty quickly.
Short of that, doing a quick HTTPRequest to the image URL (and checking the response code) before deciding whether to display it would do the job.
@Steve Crawford is on the right track.
The problem with your solution is that you are duplicating volatile data that you:
a) can't control
b) don't receive notifications on.
I would think the better method would be to track the metadata of the images you are interested in (like the author, URL, date, etc.) and then display them if they are still available.
If you are going to cache data, you also need a way to invalidate your cache. So another option would be to duplicate the data as you already are, but also have a background job that checks that the data is still valid and removes the entries that aren't.
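The background job could be sketched like this. It assumes the cache is a simple dict of media ID to image URL, that the image host answers HEAD requests, and that a 404 or 410 means the image was deleted; none of that is confirmed by the Instagram API, and the function names are made up.

```python
import urllib.error
import urllib.request


def http_status(url, timeout=5.0):
    """Return the HTTP status of a HEAD request, or None if the request failed."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # the server answered, just not with a success status
    except (urllib.error.URLError, OSError):
        return None      # network trouble: we learned nothing about the image


def prune_deleted(cached, fetch_status=http_status):
    """Remove entries whose image is gone; return the IDs that were removed."""
    removed = []
    for media_id, url in list(cached.items()):
        if fetch_status(url) in (404, 410):
            del cached[media_id]
            removed.append(media_id)
    return removed
```

Entries are only dropped on an explicit "gone" answer, so a temporary network failure does not wipe the cache. The `fetch_status` parameter exists so the job can be tested without touching the network.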

Preventing bots from doing form submissions

At my site, I present a form for visitor input. No login is required. I cannot require a login. So anyone browsing the site can submit the form. It also opens up the form to bots. I need to prevent the bots. I had asked the question on the following thread.
Unwanted garbage input from bots?
I did get some useful responses. I read about a few solutions to this (captcha and non-captcha).
Mine is not a site where I get significant traffic. My users are not terribly computer savvy. So I was thinking of doing something like this. I am not a very accomplished programmer and what I am saying here may be very stupid. But I am simply trying to learn, so please bear with me.
Every time I present the form, I generate a unique key (unix time + remote host IP). I store the key in a db table and I send out the form with the key being a hidden field on the form. When a form is submitted, I check to see if the value for the key is in the db table. If it is, I remove the key from the db table and I process the form. If the key is not in the db table, I discard the form and ask the user to do the operation again.
With every submission I also remove stale entries (where the users did not submit the form within a stipulated time). I will also need some mechanism to stop bots from requesting the form over and over. For example, if I have n pending requests from a particular host, I ask people to request the form again after a few moments.
Will something like this work?
The bots will be able to request the hidden field and submit it anyway. Try a non-reCAPTCHA library so that your users don't get overwhelmed (reCAPTCHA is overwhelming due to its extra goal of hijacking your users to do OCR of pretty illegible text).
However, since you ask for a non-captcha solution, I would propose that you measure the time between form request and form submission (with the hidden key). A bot would submit the form within a couple of seconds of the request, but a human would not.
If you find that this simple approach does not work for your site, then you can try something more complex.
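One way to sketch the timing check without a database table is to put a signed timestamp in the hidden field. This swaps the stored key from the question for an HMAC-signed token; the secret, the 3-second minimum and the one-hour maximum are assumptions to tune, not recommended values.

```python
import hashlib
import hmac
import time

SECRET = b"replace-with-a-real-secret"  # hypothetical server-side secret
MIN_SECONDS = 3      # assumption: faster than this looks automated
MAX_SECONDS = 3600   # assumption: older forms are treated as stale


def issue_token(now=None):
    """Value for the hidden field: issue time plus a signature over it."""
    issued = int(time.time() if now is None else now)
    signature = hmac.new(SECRET, str(issued).encode(), hashlib.sha256).hexdigest()
    return f"{issued}:{signature}"


def looks_human(token, now=None):
    """True if the token is genuine and the form was open for a plausible time."""
    now = time.time() if now is None else now
    issued_text, _, signature = token.partition(":")
    expected = hmac.new(SECRET, issued_text.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(signature.encode(), expected.encode()):
        return False  # forged or tampered timestamp
    elapsed = now - int(issued_text)
    return MIN_SECONDS <= elapsed <= MAX_SECONDS
```

Unlike the database key in the question, this token is not single-use, so a bot that waits a few seconds can still replay it until it expires. Combining both ideas would close that gap.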
You could also hide the form and then a user would have to click on a button to reveal it. Much like how twitter does it when you log in.
I wouldn't worry too much about bots submitting your form. It's not gonna happen. If you're terribly fearful then instead of a captcha ask a stupid question like "what is 1+1?" before a submission.
It all depends on how desperately the spammers want to submit junk to your form. Your method will work for the most stupid of bots, but as agks mehx pointed out it's trivial for a bot to load up the form and extract the field if someone bothers to take a minute or so to tweak their bot.
At the other end of the spectrum, there's little you can do to automatically stop the "pay people in certain countries the equivalent of 10¢/hr to spam every board they can find" tactic without locking things down to an extent that also prevents the general public from posting useful comments.
What about hashing form field names so the name is different each time? Use hash(original field name + time stamp + secret salt) and then just pass the time stamp with the form. It will take ages for the bot to figure it out, especially if the salt is different per user and changes every couple of hours/days. Just an idea I had. Was wondering if you think it would stop bots?
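For what it's worth, the hashed field name idea might look like the sketch below. It uses HMAC-SHA256 rather than a bare hash of concatenated strings, and the salt, the `f_` prefix and the 16-character truncation are arbitrary choices made for the example.

```python
import hashlib
import hmac

SECRET_SALT = b"replace-with-a-real-secret"  # hypothetical; could differ per user


def field_name(original, timestamp, salt=SECRET_SALT):
    """Derive the name to put in the HTML form for a given field and time stamp."""
    message = f"{original}|{timestamp}".encode()
    return "f_" + hmac.new(salt, message, hashlib.sha256).hexdigest()[:16]


def decode_form(posted, timestamp, expected_fields, salt=SECRET_SALT):
    """Map hashed names back to the originals; None if any expected field is missing."""
    decoded = {}
    for original in expected_fields:
        hashed = field_name(original, timestamp, salt)
        if hashed not in posted:
            return None
        decoded[original] = posted[hashed]
    return decoded
```

This only stops bots that replay a previously captured form or guess field names. As noted earlier in the thread, a bot that fetches the form fresh each time gets the current names along with the time stamp, so the server would still want to reject old time stamps.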

How can I prevent bulk vulnerability scanning without using a CAPTCHA component?

How can I prevent forms from being scanned by bulk vulnerability scanners such as XSSME and SQLinjectMe (those two are free Firefox add-ons), Acunetix Web Scanner and others?
These "web vulnerability scanners" work by capturing a copy of a form with all its fields and sending thousands of tests in minutes, injecting all kinds of malicious strings into the fields.
Even if you sanitize your input very well, the server's response slows down, and if the form sends e-mail, you will receive thousands of emails in the recipient's mailbox. I know that one way to reduce this problem is to use a CAPTCHA component, but sometimes this kind of component is too much for some types of forms and slows the user down (a login/password form, for example).
Any suggestion?
Thanks in advance and sorry for my English!
Hmm, if this is a major problem you could add a server-side submission-rate limiter. When someone submits a form, store some information in a database about their IP address and what time they submitted the form. Then whenever someone submits the form, check the database to see if it's been "long enough" since the last time that IP address submitted the form. Even a fairly short wait like 10 seconds would seriously slow down this sort of automated probing. This database could be automatically cleared out every day/hour/whatever, you don't need to keep the data around for long.
Of course someone with access to a botnet could avoid this limiter, but if your site is under attack by a large botnet you probably have larger problems than this.
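A minimal sketch of such a limiter, using an in-memory dict in place of the database table to keep the example short. The class name and the 10-second default are illustrative only.

```python
import time


class SubmissionLimiter:
    def __init__(self, min_interval=10.0):
        self.min_interval = min_interval  # seconds required between accepted submissions
        self.last_seen = {}               # ip -> time of the last accepted submission

    def allow(self, ip, now=None):
        """True if this IP may submit now; records the submission when allowed."""
        now = time.time() if now is None else now
        last = self.last_seen.get(ip)
        if last is not None and now - last < self.min_interval:
            return False
        self.last_seen[ip] = now
        return True

    def purge(self, now=None):
        """Forget IPs whose waiting period is over; returns how many were dropped."""
        now = time.time() if now is None else now
        stale = [ip for ip, seen in self.last_seen.items() if now - seen >= self.min_interval]
        for ip in stale:
            del self.last_seen[ip]
        return len(stale)
```

In this version a rejected attempt does not restart the timer. Restarting it on every attempt would punish scanners harder, but it would also lock out an impatient human who double-clicks.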
On top of the rate-limiting solutions that others have offered, you may also want to implement some logging or auditing on sensitive pages and forms to make sure that your rate limiting actually works. It could be something simple like just logging request counts per IP. Then you can send yourself an hourly or daily digest to keep an eye on things without having to repeatedly check your site.
There's only so much you can do... "Where there's a will, there's a way": anything that you want the user to do can be automated and abused. You need to find a middle ground when developing, and toss in a few things that make abuse harder.
One thing you can do is sign the form with a hash, for example if the form is there for sending a message to another user you can do this:
hash = md5(userid + action + salt)
then when you actually process the response you would do
if (hash == md5(userid + action + salt))
This prevents the abuser from injecting thousands of user IDs and easily spamming your system. It's just another hoop for the attacker to jump through.
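In Python, the same signing idea could be sketched as follows. Note the swap: this uses HMAC-SHA256 from the standard library instead of md5 over concatenated strings, and the function names and salt are placeholders.

```python
import hashlib
import hmac

SALT = b"replace-with-a-real-secret"  # hypothetical; must never be sent to the client


def sign_action(user_id, action, salt=SALT):
    """Signature to embed in the form as a hidden field."""
    message = f"{user_id}:{action}".encode()
    return hmac.new(salt, message, hashlib.sha256).hexdigest()


def verify_action(user_id, action, signature, salt=SALT):
    """Recompute the signature from the submitted values and compare in constant time."""
    expected = sign_action(user_id, action, salt)
    return hmac.compare_digest(signature.encode(), expected.encode())
```

The separator between `user_id` and `action` keeps pairs like (1, "23x") and (12, "3x") from producing the same message, which plain concatenation would allow.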
I'd love to hear other people's techniques. CAPTCHAs should be used on entry points like registration, and the method above should be used on specific actions (messaging, voting, ...).
You could also create a flagging system: anything the user does X times in X amount of time that looks fishy would flag the user and make them complete a CAPTCHA (once they solve it, they are no longer flagged).
This question is not exactly like the other questions about captchas but I think reading them if you haven't already would be worthwhile. "Honey Pot Captcha" sounds like it might work for you.
Practical non-image based CAPTCHA approaches?
What can be done to prevent spam in forum-like apps?
After reviewing all the answers, I put together a solution customized for my case, with a little bit of each one:
I checked the behavior of the known vulnerability scanners again. They load the page once and, with the information gathered, start submitting it repeatedly, changing the content of the fields to malicious scripts in order to test for certain types of vulnerabilities.
But what if we sign the form? How? By creating a hidden field with random content stored in the Session object. If the value is submitted more than n times, we just create it again. We only have to check whether it matches, and if it doesn't, take whatever action we want.
But we can do even better: instead of changing the value of the field, why not change the name of the field randomly? Changing the field name randomly and storing it in the session object is perhaps a trickier solution, because the form is always different and the vulnerability scanners load it only once. If we don't get input for a field with the stored name, we simply don't process the form.
I think this can save a lot of CPU cycles. I ran some tests with the vulnerability scanners mentioned in the question and it works perfectly!
Well, thanks a lot to all of you. As I said before, this solution was made with a little bit of each answer.
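The random field name idea could be sketched like this, with a plain dict standing in for the session object. The key names, the `f_` prefix and the limit of 3 uses (the "n" above) are assumptions for the example.

```python
import secrets

MAX_USES = 3  # hypothetical "n": rotate the field name after this many submissions


def issue_field_name(session):
    """Pick a fresh random field name, remember it in the session, and return it."""
    session["form_field"] = "f_" + secrets.token_hex(8)
    session["form_uses"] = 0
    return session["form_field"]


def accept_submission(session, posted):
    """Return the submitted value, or None if the expected field name is absent."""
    name = session.get("form_field")
    if name is None or name not in posted:
        return None  # stale or scripted form: do not process it
    session["form_uses"] = session.get("form_uses", 0) + 1
    value = posted[name]
    if session["form_uses"] >= MAX_USES:
        issue_field_name(session)  # the next render of the form gets a new name
    return value
```

A scanner that captured the form once keeps posting the old name, so after the rotation every one of its requests is rejected before any real processing happens.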

How to encourage a user to fill in long application forms?

What I can think of is pre-populating certain form input elements based on the user's geographical information.
What other ways can you think of to speed up user input on long application forms?
Or at least keep them focused on completing the application form?
If you have a long form, try to prune it down. Don't ask them to fill in fields that you don't really need.
If the form spans several pages, give the user some feedback as to how many more pages there are. We users hate clicking on the continue button wondering if this will be the last page.
Never lose a field that they filled in, no matter what they do. This could have security implications if passwords are involved.
Use dropdowns to provide the user with options unless there are a lot of options that the user would have to scroll through or if the terms in the dropdown aren't widely accepted (e.g. dropdown filled with Systems Engineer, Solution Developer, IT Application... I just want Programmer.).
Provide help for fields that might be hard to fill in (or provide examples).
If it is possible in your case, just collect the bare minimum up front and then allow the user to use the basic features of your service.
For the user to upgrade to a better level of service, they will need to fill in the 2nd form with more detail.
How important is it to you to collect ALL that information up front? Is it worth losing customers by demanding too much from them? Why not ask for it later, at a time more convenient to the user?
Create a multi-step wizard offering only a small number of input fields per step. Ensure that users are aware of how far they have progressed in the sequence.
The psychology is that once a user is 'invested' in a task, they are more likely to continue. If you present the whole list of input fields at once, you scare them off.
Offering musings at each step (cartoon, humor, sayings etc) makes them move to the next step out of curiosity.
Users won't mind filling in long forms if and only if they feel that the questions that you ask are important: otherwise they will be discouraged, and become impatient with it.
Remember, in a web application people have very, very short attention spans. When the user starts feeling that you are asking too much, they're usually right.
Keep required information to a minimum: other info should only be optional, and you have to give something in return to the user to compel them to complete that information.
However you implement it, please please please use some kind of Ajax heartbeat to store their progress server side and repopulate it if it's lost. There is nothing more infuriating to a user than working through a long form and having a browser or network hiccup lose their entire submission.
Whenever it happens to me I generally never give it a second shot, because at that point recreating my submission isn't worth whatever I was signing up for.
Checklist:
Explain clearly the purpose of the form. (What's in it for them?)
Prune, prune, prune, and keep questions clearly relevant!
Give the user feedback on his/her progress (if the form is split over multiple pages)
Ask for as little as you can up-front and leave the rest for later.
Clearly mark required fields
Group fields logically.
Keep labels/headings brief and easy to understand.
Prefill as much as possible - but not too much.
Spread super long forms over multiple pages and allow backtracking.
Cleverly placed "Back", "Save" and "Cancel" buttons put people's minds at ease - even when redundant.
Provide friendly (but clear!) validation error messages, in a timely manner.
Allow the user to reclaim half-filled in forms - don't lose their data!
No matter what you do, do not include a reset button. :-)
Finally:
Explicitly tell the user when the process is finished. ("Thank you! Your application has been sent.")
Tell the user what will happen next. ("A confirmation e-mail has been sent to your e-mail address, and we'll process your application within two working days.")
Use Ajax to populate and update the controls asynchronously. It will speed up the filling of long application forms.
Split it up into multiple pages - there's nothing quite so discouraging as seeing that you have another 100 questions to go.
Put validation on each input and check it onblur(). If they get to the end of the page and then it says "question #2 was incorrect", chances are they've forgotten what that one was anyway and it'll be more difficult to return to it. Plus, if they answer a series of similar inputs in a particular, incorrect way, you should let them know straight away (e.g. entering dates as mm/dd/yyyy when you want dd/mm/yyyy).
Split the form into several steps. It's like how someone is much more likely to read five 3-sentence paragraphs than one big 15-sentence paragraph of the same length.
I agree with tim; just let them fill in the bare minimum information and then leave the rest to profile updates. If any data is necessary for the service offered on your site, ask for it when they try to avail of the service (and no earlier).
That said, I wouldn't advocate the kind of forcing function that adam suggests. It pays to give your users the warm, fuzzy feeling that they are privileged and can use ALL of the services on your site. Although, if you look at it hard enough, adam's and my suggestions are pretty much the same.
If the application needs to include a lot of information, then make sure the user can save at any point, and log off, and log in later to complete the form. This would make more sense if some of the information is not necessarily easily available. Tax returns are an obvious example, where some of the data may need to be calculated, or the user must find the relevant documentation.
In some cases the user might use the same information in multiple applications. In that case it might make sense for the user to register their details (Name, Address, Telephone numbers, etc), which are automatically filled in on each application. For example, if you had a website for a recruitment agency, they may allow users to register their details, and then to apply for a particular job, they can just include a personal statement that applies to that job in particular.
As another consideration, if some information may be incorrect (particularly if this is not always clear, such as a CAPTCHA, or a user name that must be unique), either separate it from the rest of the data, or otherwise make it so a mistake doesn't mean the rest of the information must be reentered.
These are basically ways of avoiding the user having to enter the same information twice.
