How to search newsgroups in gnus - search

I have gnus working for multiple email addresses with searching via
(nnir-search-engine imap)
I have newsgroup reading set up and working fine too; however, I have never been able to get searching in newsgroups working, even though I have
(setq gnus-select-method '(nntp "news.gmane.org"
(nnir-search-engine gmane)))
With the latter, with my cursor on a gmane newsgroup, I expect to be able to press G G, enter a search, and have it return a list of hits as it does with imap search. Instead, I get the message
Contacting host: search.gmane.org:80
open-network-stream: search.gmane.org/80 nodename nor servname provided, or not known
in my minibuffer and *Messages* buffer.
Any idea what is going on and how to rectify this?
One thought I had was that perhaps I needed to use gnus-agent and an agent category to allow me to download messages via J s (all of which I did set up, but I haven't fully understood where it is saving them, etc.).
Everything else works great in gnus, I just want to search newsgroups too in gnus.
p.s. I have downloaded Unison, which is quite nice and free now, and it can do what I need, but I still hope to do it in gnus.

The gmane search engine does not work because gmane has undergone some changes: gmane search has been down for the last two years or so (?), ever since Lars decided that he was not going to continue with gmane. Although the people who took over brought the nntp service back, search is still missing.
There are other search engines, however: the gnus manual lists swish++, swish-e, namazu, notmuch and hyrex (obsolete). I have no idea how well each works, but I do know that they require configuration, whereas imap search and gmane search (before it broke) worked right out of the box.
The doc has very few details on the rest, but it does describe how to set up namazu: it requires that you create and maintain index files, presumably indexing a set of local files. The doc's emphasis is on indexing local email, but presumably it would work similarly for downloaded local news articles.

Related

Verbatim search in azure/cognitive/bing web search (API, not website)

I cannot find any option to achieve a verbatim azure/cognitive/bing Web search.
In my case the difference is trying to sift through tens of millions of irrelevant search results to find the 10 results that actually match my query literally.
Even though I am a paying customer, there is no support available. And the API documentation did not help either.
I would think it should be super easy to provide a verbatim search option. Is there one that I did not see?
I checked further, and it seems that for the Bing Search APIs, +"phrase" works and returns documents containing this phrase at the top. Just add + in front of the quoted phrase you have been trying. Support link is here: https://azure.microsoft.com/en-us/support/plans/.
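As a rough illustration of the +"phrase" form suggested above, the following Python sketch builds the query string and URL-encodes it. The endpoint URL is a placeholder and the helper names are hypothetical; only the +"phrase" syntax comes from the answer, so check the request format against the API documentation for your subscription.

```python
from urllib.parse import urlencode


def verbatim_query(phrase: str) -> str:
    """Wrap a phrase as +"phrase" so it is sent as one quoted, required unit."""
    # Embedded double quotes would end the quoted phrase early, so drop them
    # and collapse any repeated whitespace.
    cleaned = " ".join(phrase.replace('"', " ").split())
    return f'+"{cleaned}"'


def search_url(endpoint: str, phrase: str) -> str:
    """Build a GET URL with the verbatim query in the q parameter."""
    # `endpoint` is a placeholder supplied by the caller, not a real API URL.
    return endpoint + "?" + urlencode({"q": verbatim_query(phrase)})


if __name__ == "__main__":
    print(search_url("https://api.example.com/search", "exact phrase here"))
```

Note that the leading + must be percent-encoded (as %2B) in the URL; sent raw, it would be decoded as a space.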

Solr behind Drupal returns too many results for specific query

We've got Solr sat behind one of our client's Drupal 7 websites, and while it's working well, it returns too many results for what should be quite specific queries. (It also has relevance/weighting problems; but I'm hoping that solving this problem will remove the - literally - irrelevant results.)
For example, searching for the phrase 'particular phrase in london' should return the node with that as its title, quite high up; I don't even think that any other content should be returned. But I find that it's returning lots of content, purely on the fact that it mentions "London"!
Frivolously, searching for the ridiculous phrase 'piecrusts in london' returns a lot of results too, apparently just because they mention London. No content on the site mentions actual piecrusts.
When I search for 'particular phrase in london', here are the parameters that end up in the catalina.out log on the server (whitespace added for clarity):
{spellcheck=false&facet=true&f.im_field_health_topic.facet.mincount=1
&facet.mincount=1&f.ds_created.facet.date.gap=%2B1YEAR
&spellcheck.q=particular+phrase+in+london
&qf=taxonomy_names^2.0&qf=path_alias^5.0&qf=content^40&qf=label^21.0
&qf=tos_content_extra^1.0&qf=ts_comments^20&qf=tm_vid_3_names^200
&facet.date=ds_created
&f.ds_created.facet.date.start=1970-01-01T00:00:00Z/YEAR
&f.bundle.facet.mincount=1&hl.fl=content,ts_comments
&json.nl=map&wt=json&rows=10&fl=id,entity_id,entity_type,bundle,bundle_name,
label,is_comment_count,ds_created,ds_changed,score,path,url,is_uid,
tos_name,tm_node,zs_entity
&start=0&facet.sort=count&f.bundle.facet.limit=50&q=particular+phrase+in+london
&f.ds_created.facet.date.end=2012-01-01T00:00:00Z%2B1YEAR/YEAR
&bf=recip(ms(NOW,ds_created),3.16e-11,1,1)^150.0
&facet.field=im_field_health_topic&facet.field=bundle
&f.im_field_health_topic.facet.limit=50&f.ds_created.facet.limit=50}
hits=1998 status=0 QTime=14
Note that these parameters have been built by Drupal's Apache Solr module; I don't believe we've got any particular custom code of our own that's doing anything to it.
This corresponds to the following URL, if entered directly in the browser:
http://example.com:8081/solr/CLIENT/select?spellcheck=false&facet=true&f.im_field_health_topic.facet.mincount=1&facet.mincount=1&f.ds_created.facet.date.gap=%2B1YEAR&spellcheck.q=particular+phrase+in+London&qf=taxonomy_names^2.0&qf=path_alias^5.0&qf=content^40&qf=label^21.0&qf=tos_content_extra^1.0&qf=ts_comments^20&qf=tm_vid_3_names^200&facet.date=ds_created&f.ds_created.facet.date.start=1970-01-01T00:00:00Z/YEAR&f.bundle.facet.mincount=1&hl.fl=content,ts_comments&json.nl=map&wt=json&rows=10&fl=id,entity_id,entity_type,bundle,bundle_name,label,is_comment_count,ds_created,ds_changed,score,path,url,is_uid,tos_name,tm_node,zs_entity&start=0&facet.sort=count&f.bundle.facet.limit=50&q=particular+phrase+in+London&f.ds_created.facet.date.end=2012-01-01T00:00:00Z%2B1YEAR/YEAR&bf=recip(ms(NOW,ds_created),3.16e-11,1,1)^150.0&facet.field=im_field_health_topic&facet.field=bundle&f.im_field_health_topic.facet.limit=50&f.ds_created.facet.limit=50
This URL returns nearly 2000 results - that's most of the content on the site! I've experimented with removing one query parameter at a time, and the only ones that seem to make any difference are qf and q: if I remove qf, I get zero results; if I remove q, I get more results back!
I guess there are two questions here:
1. Is there anything in these parameters that tells Solr "don't worry if 'particular phrase' or 'piecrusts' appears: just collate the results for 'london'" and then order by relevancy? I would add that I think 'in' is listed in the stopwords file, so we can probably ignore its effect (?)
2. Or is this something in the (standard Drupal) schema that I need to change?
I appreciate that sometimes search is better for the visitor if it's inclusive; Google does return results even if it doesn't find perfect matches. But, stopwords and stemming aside, the client does require that searches return only results where all words appear in the content.
As mentioned in the post at http://drupal.org/node/1783454, the Apache Solr Search Integration module makes use of the mm (minimum should match) param, which is more or less configured to affect rankings by how closely the keywords match the dataset. Looking through the docs, there are other ways you can use the parameter to affect rankings as well. Therefore the results produced by Apache Solr Search Integration are weighted more closely to the AND operator, even though it will return more results as you add more keywords. The benefit of this param is that in cases where the user enters keywords that are too restrictive, results will still be returned. Displaying no results is a really quick way to guide people away from your site.
How are you displaying the search?
Maybe you could use Solr Views to limit the search range?
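To make the effect of a minimum-match threshold concrete, here is a toy Python model. It is not Solr's implementation: the stopword set, the sample documents, the rounding rule and the function names are all assumptions made for illustration. It only shows why a low threshold lets documents that mention "london" alone through, while requiring every term filters them out.

```python
STOPWORDS = {"in"}  # assumption: mirrors the 'in' stopword mentioned in the question


def query_terms(query):
    """Lower-case the query and drop stopwords."""
    return [t for t in query.lower().split() if t not in STOPWORDS]


def matched_count(doc, terms):
    """Count how many query terms occur as whole words in the document."""
    words = set(doc.lower().split())
    return sum(1 for t in terms if t in words)


def search(docs, query, mm_fraction):
    """Return docs matching at least mm_fraction of the terms, best match first."""
    terms = query_terms(query)
    # Toy rounding rule: round down, but always require at least one term.
    needed = max(1, int(mm_fraction * len(terms)))
    scored = [(matched_count(d, terms), d) for d in docs]
    kept = [(n, d) for n, d in scored if n >= needed]
    return [d for n, d in sorted(kept, key=lambda hit: -hit[0])]


if __name__ == "__main__":
    docs = [
        "a particular phrase in london",
        "health clinics in london",
        "walking in the countryside",
    ]
    print(search(docs, "piecrusts in london", 0.5))  # london-only docs get through
    print(search(docs, "piecrusts in london", 1.0))  # every term required: no hits
```

In Solr's dismax-style handlers the corresponding request parameter is mm, and a value such as 100% asks for every term to match; whether and where the Drupal module lets you override it is worth verifying for the version in use.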
http://drupal.org/project/apachesolr_views
thanks
Nick

can you have "variables" in text in google sites?

Sorry, this is a bad question. I don't even know what the title should be. I'm a total noob at making websites so this might be easy to find but I just don't know the terminology to search for. I cannot find anything about how to do this...
What I want to do is have something like references/variables that I can use in a block of text and that will automatically get replaced with whatever value should be there. The best way I can think of to describe it: if I were using the site as a design doc for a game or something, I would be able to type in [Title] or something similar on any page, and when it loads that text would be replaced with whatever my Title is. That way, if I ever change titles, names, classes, races, places, items, etc., they would only have to be changed in one place and the change would be reflected everywhere.
I notice if I add a link to a page it will automatically use the Title of that page as the text of the link. That is almost exactly what I want. Except when I change the Title of the other page the text of the link remains as the original text. It doesn't get updated to the new Title and that is not at all what I want.
Also, I want to do this in Google Sites and as simply as possible. I don't really want to use a database. I was hoping Google Sites would have some kind of functionality for this.
I don't believe this is possible (on Google Sites) and likely you need to consider a hosted solution.
Quoting the answer from this relevant post:
You should consider hosting your solution using Google's App Engine
instead of Google Sites. You can set it up so it uses PHP (see link
below), you can configure it to use your domain name and you get
enough CPU, disk and bandwidth allowance to serve around five million
page views for free each month, if you are serving more than that,
their prices are extremely competitive.
Google App Engine: http://code.google.com/appengine/docs/whatisgoogleappengine.html
How to set up PHP using Google App Engine: http://blog.caucho.com/?p=187
Also I'm not sure how your PHP skills are but if you're unfamiliar with it then this should help to get you started.
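The substitution the question asks for becomes straightforward once pages are rendered by your own code rather than by Google Sites. The sketch below uses Python instead of PHP purely for illustration; the placeholder syntax [Name] is taken from the question, while the values dictionary and function name are hypothetical.

```python
import re

# Matches placeholders such as [Title] or [Main_Race].
PLACEHOLDER = re.compile(r"\[([A-Za-z_][A-Za-z0-9_]*)\]")


def render(text, values):
    """Replace each [Name] with values['Name']; unknown names are left as-is."""
    def substitute(match):
        name = match.group(1)
        return values.get(name, match.group(0))
    return PLACEHOLDER.sub(substitute, text)


if __name__ == "__main__":
    # Hypothetical shared values: edit them here once, and every page
    # rendered through render() picks up the change.
    values = {"Title": "Working Title", "Race": "Elves"}
    print(render("Welcome to [Title], home of the [Race] and the [Class].", values))
```

Leaving unknown placeholders untouched makes a misspelled name visible on the page instead of silently disappearing.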

Expression Engine search problem

We are running sites with EE 1.6.8... Not fun, but my boss likes it...
So we implemented a search. Everything is fine but the search url is like this:
/search/results/0374c6c40f159934bc6795f031c4e52f10/
instead of
/search/results/keyword
The developers said that we can only put the keyword in the URL with a paid plugin.
OMG.
Is it true?
And another question: after a few hours the search URL gives no results back. It seems that the session or the cookie has expired, or something like that.
I have two ideas:
1. Our developers want to fool me
2. EE is just like that: it's not a CMS, just a CMS-like thing...
You are correct, the EE Search module uses session-based URLs for results. The reason being that search results are cached for performance, so those results need to expire after a short period of time (as new results might need to appear).
I assume what you want is bookmarkable search results. In this case, I suggest Super Search, or on the free, Google-powered end, the Google Search Results plugin.
Not 100% sure if it would work but in theory you could have www.example.com/search/results/keyword .
In your EE code you would put {exp:weblog:entries search:body="{segment_3}"}title:{title} etc..{/exp:weblog:entries} as shown on http://expressionengine.com/legacy_docs/modules/weblog/parameters.html#par_search
The problem arises when the keyword contains characters outside [a-z][0-9], which is worth considering.
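One way to handle such characters is to normalise the keyword before it is placed in the URL segment. The Python sketch below illustrates the idea only; it is not ExpressionEngine code, the function names are hypothetical, and the mapping is lossy (punctuation and non-ASCII letters are discarded).

```python
import re


def keyword_to_segment(keyword):
    """Reduce a search keyword to lower-case a-z, 0-9 and single hyphens."""
    lowered = keyword.lower()
    # Collapse every run of other characters into one hyphen, then trim the ends.
    return re.sub(r"[^a-z0-9]+", "-", lowered).strip("-")


def segment_to_keyword(segment):
    """Turn a URL segment back into space-separated search words."""
    return segment.replace("-", " ")


if __name__ == "__main__":
    segment = keyword_to_segment("C++ & Rock'n'Roll!")
    print(segment)
    print(segment_to_keyword(segment))
```

Because the original punctuation cannot be recovered from the segment, a search such as "C++" ends up searching for "c", which may or may not be acceptable for your content.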
We offer EvoPost on our website for free http://www.eevolution.co.uk/index.php/addons/evopost which will enable you to capture the keywords from a HTTP POST variable e.g. search:body="{ep_txtboxname}"
Feel free to contact us through our website if you need any assistance with the product.
Thanks
Tim
EEvolution Developer

Why and how does the googlebot use my website's search engine?

Looking through my search logs from time to time, I notice that by far the biggest user of my search engine is the google-bot. What gives? Is it looking for content that might not be directly accessible through navigation? If so, how does it know which words and phrases to look for (they're surprisingly relevant). Does it check the most popular keywords on the site? I know I seem to be answering my own question here, but this is really only working it out from first principles. I'd like to hear from someone who knows what they're talking about (i.e. not me).
If your search form's method is get instead of post, each search has its own url, and people might be posting those urls elsewhere. Or if you have a (possibly inadvertently) publicly accessible webstats page that listed those urls, that's another common way for search engines to stumble upon your internal search urls. A third way I've seen is sites that list recent searches on their pages, but this is more intentional. "MySQL Performance Blog" does this to an annoying extent, so any search of their site from google yields hundreds of pages of similar searches, even if none of them found what they were looking for.
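To illustrate the point about GET forms, this short Python sketch shows how a GET search produces one distinct, linkable URL per query, which is exactly what a crawler can follow. The base URL and the q parameter name are assumptions for the example.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def get_search_url(base, query):
    """Build the URL a browser would request for a method="get" search form."""
    return f"{base}?{urlencode({'q': query})}"


def query_from_url(url):
    """Recover the search words from such a URL, e.g. when reading access logs."""
    return parse_qs(urlsplit(url).query).get("q", [""])[0]


if __name__ == "__main__":
    url = get_search_url("http://example.com/search", "slow queries")
    print(url)
    print(query_from_url(url))
```

A POST form, by contrast, sends the query in the request body, so the results page has no URL of its own that could be posted elsewhere or listed on a stats page.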
Edit: Looks like it does on occasion, but only GET forms:
http://googlewebmastercentral.blogspot.com/2008/04/crawling-through-html-forms.html
Google will use words that occur on your site in search boxes to try to find pages that it can't otherwise.
Google says that for the past few months, it has been filling in forms
on a "small number" of "high-quality" web sites to get back
information. What words has it been entering into those forms? Words
automatically selected that occur on the site, with check boxes and
drop-down menus also being selected.
http://searchengineland.com/google-now-fills-out-forms-crawls-results-13760

Resources