Yahoo! BOSS: Include duplicate results - search

I can easily get the results I want from Yahoo! BOSS. However, for the particular data I'm trying to get, it's important that "duplicate" results be included. I know Yahoo! has them, since when I search for the query manually, it offers me a link to see these similar results.
Is there any way to request these deeper results with the Yahoo! BOSS API?

After some research, it would seem that the answer is, at present, no.
I will continue to pursue the subject here.

Related

How to search content in getstream

How can I filter activities in GetStream? For example, if I want to filter activities based on some content, it should list all the activities that contain that content. Does GetStream provide any such functionality?
Could you please suggest an approach?
I'm looking for the same thing. If we assume GetStream is like Facebook: Facebook doesn't really let you search its feed, and in general I think it's a tricky thing to do. What Facebook DOES do is offer that SLOW search at the top (you can tell it takes a long time, and half the time it doesn't even return good results). If we accept that our search will be slow, we can paginate through a list of all the objects in our feed from GetStream and then search through them on our own site. Not a good solution, but I haven't really seen any other ones.
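A rough sketch of that paginate-then-search workaround, in Python. It does not use the GetStream client: `fetch_page(offset, limit)` is a hypothetical callable that you would wrap around whatever call returns one page of feed activities as dictionaries, and the `message` field name is an assumption too.

```python
from typing import Callable, Iterator

# Hypothetical signature: fetch_page(offset, limit) -> list of activity dicts.
FetchPage = Callable[[int, int], list]


def iter_activities(fetch_page: FetchPage, page_size: int = 100) -> Iterator[dict]:
    """Yield every activity by walking the feed one page at a time."""
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        if not page:  # an empty page means the end of the feed
            return
        yield from page
        offset += len(page)


def search_activities(fetch_page: FetchPage, term: str, field: str = "message") -> list:
    """Return activities whose `field` contains `term`, ignoring case."""
    needle = term.lower()
    return [
        activity
        for activity in iter_activities(fetch_page)
        if needle in str(activity.get(field, "")).lower()
    ]


# In-memory stand-in for a real feed, only to show the call shape.
FAKE_FEED = [
    {"id": 1, "message": "Uploaded a photo of Munich"},
    {"id": 2, "message": "Liked a post"},
    {"id": 3, "message": "Commented on the munich photo"},
]


def fake_fetch_page(offset: int, limit: int) -> list:
    return FAKE_FEED[offset:offset + limit]


matches = search_activities(fake_fetch_page, "munich")
```

This scans the whole feed on every search, so it is as slow as described above; caching the fetched activities locally would be the obvious next step.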

Related posts functionality

I am trying to build related-posts functionality like what we see here on Stack Overflow in the bottom right corner. My main difficulty is that the related posts have to be determined at runtime when loading a particular post.
I am thinking of looking for posts with a similar or equal set of tags, similar titles, and possibly similar sets of keywords in the main post content. The problem is that the more signals you look for, the slower the database will be to return data.
I have also thought about using Google Site Search (we have an account), as Google understands the relationship between posts very well, but unfortunately the related: parameter is broken, according to Google.
I am looking for ideas on how best to achieve this. Have any of you ever done something like this? What's the best way to approach it?
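One possible starting point for the tag signal mentioned above is Jaccard similarity between tag sets. This is only a sketch under assumptions: `tags_by_post` is a hypothetical in-memory mapping from post id to its tags, standing in for whatever the database returns, and titles and keywords are ignored.

```python
def jaccard(a: set, b: set) -> float:
    """Size of the intersection divided by size of the union (0.0 if both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def related_posts(target_id, tags_by_post: dict, limit: int = 5) -> list:
    """Return ids of the posts whose tag sets overlap most with the target post."""
    target_tags = tags_by_post[target_id]
    scored = []
    for post_id, tags in tags_by_post.items():
        if post_id == target_id:
            continue
        score = jaccard(target_tags, tags)
        if score > 0:  # skip posts that share no tag at all
            scored.append((score, post_id))
    # Highest score first; ties are broken by post id to keep the order stable.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [post_id for _, post_id in scored[:limit]]


# Hypothetical sample data.
posts = {
    1: {"python", "search"},
    2: {"python", "search", "api"},
    3: {"python"},
    4: {"css"},
}
related = related_posts(1, posts)
```

Because the score depends only on tags, it could be precomputed whenever a post is saved rather than at page load, which sidesteps the runtime cost concern.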

Wikipedia API: How do I find information about a city

I want to send a query to Wikipedia. The response should contain information about the city I asked for.
An example:
I want information about Munich:
http://de.wikipedia.org/w/api.php?action=query&prop=revisions&titles=M%C3%BCnchen&rvprop=content&format=xml
This query sends me the desired response.
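For reference, the same request URL can be assembled from its parameters with Python's standard library. The function name and the `lang` argument are made up for this sketch; the parameters are exactly the ones in the URL above.

```python
from urllib.parse import urlencode


def build_revision_query(title: str, lang: str = "de") -> str:
    """Build the api.php URL that requests the current wikitext of one article."""
    params = {
        "action": "query",
        "prop": "revisions",
        "titles": title,
        "rvprop": "content",
        "format": "xml",
    }
    # urlencode percent-encodes non-ASCII characters as UTF-8 (ü becomes %C3%BC).
    return f"http://{lang}.wikipedia.org/w/api.php?" + urlencode(params)


munich_url = build_revision_query("München")
```

Building the URL this way takes care of the percent-encoding of umlauts in city names.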
But there are other cases in which Wikipedia doesn't know what I mean: if I search for "Neustadt" on de.wikipedia.org, I get a list of different "Neustadts", because there are many of them.
How can I get the desired article? In my database, all the cities have coordinates, zip codes and phone codes. But I can't search Wikipedia by these, can I?
//EDIT: I am looking for the URL of the article.
The problem with the Wikipedia data is that it's largely unstructured. You might have better luck looking at something like DBpedia, which is an effort to pull structured information from Wikipedia and make it searchable using SPARQL.

Using GMail as an interface to my database

What if I chose to use GMail's awesome mail archive search capabilities on my database? What if, for every transaction that my database is responsible for, I emailed details of that transaction to a GMail address that exists for the sole purpose of searching and retrieving transactions?
Anyone logged into that account could search by labels, invoice numbers, customer names, whatever, using Google's search engine. The results are presented as 'email messages'.
Imagine a user working from the standard (web-based) GMail account who searches for an invoice number via GMail's search box: he's returned all instances where the db did anything that included that unique number. Opening any of these 'email messages' would show the static text included at the time of the transaction (historical and tracking gold), but could also carry a Gadget that could transform the 'message' into an editor so as to execute a new transaction on that invoice.
Imagine further that I wasn't the first one to think of this (because surely I'm not), and even if I were, I'm not smart enough to execute the idea alone.
Are you aware of efforts similar to this?
thx
[?belongs on superuser instead?]
An interesting idea; however, given your search parameters, it might be unreliable. Although Gmail's search is great, I have found issues when searching for partial terms. Case in point: I had an email whose subject line was "stuffas". When I searched for "stuffa" I got no results; when I searched for "stuffas" I got the email in the search results. Additionally, I had an email with an 8-digit number in the body. When I searched for 7 of the 8 digits, I got no results, but when I entered all 8 digits, the email appeared in the results. So search in Gmail may not be as powerful a solution as you think. Again, this is my experience; I'd love to hear if someone is able to search for partial numbers in Gmail.
I just had the same idea, 4 years after you. It still doesn't look like this has 'been done before' in any production sense. But now, in 2014, I really don't see why not. Python packages for interfacing with Gmail already exist and are dead simple to use. It does not take a whole lot of abstraction to turn this into a generalized key-value store.
It's probably not exactly the fastest database, and not the best solution for everything; but as an easy-to-use, easy-to-search, trivial-to-configure, 100% uptime, cloud-stored and backed-up, free-as-in-beer database, it's pretty epic as far as I can see.
Has anyone else seen examples of this having been done before?
Edit: having thought about it some more, there are several reasons why this is a bad idea:
Gmail does not permit random access from different locations; it will block your account. Quite a showstopper.
Amazon SimpleDB also gives you a simple key-value store with the same characteristics (plus good Python support), and isn't THAT big of a pain to set up if you are willing to spend a day wrapping your head around it. It is also effectively free for the kind of traffic that you'd be able to cram into a Gmail account.

effective way to determine if a message is spam?

Is there a way to determine whether a given message is spam? For example, messages from people who post on forums to advertise their own sites for various products.
There are many ways, none of which are foolproof.
The current technology that achieves the best results is Bayesian inference for statistical analysis.
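To make the Bayesian approach concrete, here is a minimal naive Bayes word-count classifier in Python. It is an illustration only, assuming whitespace tokenization, add-one smoothing and a handful of made-up training messages; real filters use far more data and better tokenizers.

```python
import math
from collections import Counter


def tokenize(text: str) -> list:
    """Lowercase the text and split it on whitespace."""
    return text.lower().split()


class NaiveBayesSpamFilter:
    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    def train(self, text: str, label: str) -> None:
        """Record the words of one message labelled 'spam' or 'ham'."""
        self.word_counts[label].update(tokenize(text))
        self.message_counts[label] += 1

    def _log_score(self, words: list, label: str) -> float:
        counts = self.word_counts[label]
        total_words = sum(counts.values())
        vocabulary = max(1, len(set(self.word_counts["spam"]) | set(self.word_counts["ham"])))
        total_messages = sum(self.message_counts.values())
        # Smoothed log prior for the class, then one smoothed log likelihood per word.
        score = math.log((self.message_counts[label] + 1) / (total_messages + 2))
        for word in words:
            score += math.log((counts[word] + 1) / (total_words + vocabulary))
        return score

    def is_spam(self, text: str) -> bool:
        words = tokenize(text)
        return self._log_score(words, "spam") > self._log_score(words, "ham")


# Made-up training data, only to show the call shape.
spam_filter = NaiveBayesSpamFilter()
spam_filter.train("buy cheap pills now", "spam")
spam_filter.train("cheap pills online buy", "spam")
spam_filter.train("how do I parse xml in python", "ham")
spam_filter.train("python list sorting question", "ham")
```

Working in log space avoids underflow when many small word probabilities are multiplied together.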
One way is to look at where it came from. Most spam comes from certain ISPs and there are lists of those available.
Another way is to check where the post's links lead, i.e. if the links in the post lead to a completely unrelated site or a known spammer haven, it's probably spam.
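The link check can be sketched as follows. The blocklist contents and the `.example` domains are placeholders, and the URL regex is deliberately simple; a real deployment would load one of the published lists instead.

```python
import re
from urllib.parse import urlparse

# Deliberately simple pattern: anything starting with http:// or https://.
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+")


def link_domains(text: str) -> set:
    """Return the host names of all links found in the text, without a leading 'www.'."""
    domains = set()
    for url in URL_PATTERN.findall(text):
        # Drop sentence punctuation that the pattern may have swallowed.
        host = urlparse(url.rstrip(".,;:!?)")).hostname
        if not host:
            continue
        if host.startswith("www."):
            host = host[4:]
        domains.add(host)
    return domains


def has_blocklisted_link(text: str, blocklist: set) -> bool:
    """True if any link points at a blocklisted domain or one of its subdomains."""
    return any(
        domain == blocked or domain.endswith("." + blocked)
        for domain in link_domains(text)
        for blocked in blocklist
    )


# Placeholder post and blocklist.
post = "Great deals at http://www.cheap-pills.example/buy and https://shop.cheap-pills.example/x."
flagged = has_blocklisted_link(post, {"cheap-pills.example"})
```

Judging whether a linked site is "completely unrelated" is harder and is not attempted here; this only covers the known-spammer-domain half of the idea.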
