I am looking to use Google (or another search engine) to find websites relating an individual's name, in its various forms, to a number of different job titles.
For example:
Bob Williams or
Bobby Williams or
Bert Williams or
Barbara Williams or
Bettie Williams
and so on
plus one of the following job types: estate agent / banker / author / publisher, etc.
I have tried "B * " + Williams + Banker and ("B * " + "Williams") + Lawyer, but unfortunately have not been able to get the results I require. Can anyone assist with the correct syntax, and advise which search engine is best to use?
AFAIK Google does not support this, but it does something similar automatically when you search: if you search for "Bob", the engine effectively searches for *bob*.
So I guess you cannot do exactly what you are asking for.
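One workaround, since Google has no per-word wildcard that expands names like this, is to enumerate the name variants explicitly with Google's OR operator. A minimal sketch in Python that builds such queries; the variant and job lists are illustrative, not exhaustive:

# Build one Google query per job title: quoted name variants joined
# with OR, plus the quoted job title.
names = ["Bob Williams", "Bobby Williams", "Bert Williams",
         "Barbara Williams", "Bettie Williams"]
jobs = ["estate agent", "banker", "author", "publisher"]

for job in jobs:
    name_part = " OR ".join(f'"{n}"' for n in names)
    print(f'({name_part}) "{job}"')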
Performing wildcarding on a wrongly spelled term prevents autocorrection/DYM (did-you-mean) from being calculated for the non-wildcarded term.
Example:
Searching for iphont will be autocorrected to iphone and return results.
Searching for iphont* will not get corrected and will not return any results or suggestions.
I understand there is a processing order, but is there an out-of-the-box (OOB) way to make this work instead of doing two queries (a wildcarded query first, then a regular query if there are no results), as sketched below?
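For reference, a minimal sketch in Python of that two-query fallback; run_query is a hypothetical stand-in for whatever search API is in use:

def run_query(q):
    # Placeholder: call the real search engine here and return a result list.
    return []

def search_with_fallback(term):
    results = run_query(term + "*")  # wildcarded query first
    if not results:
        # Fall back to the plain term so autocorrection/DYM can apply.
        results = run_query(term)
    return results

print(search_with_fallback("iphont"))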
According to the documentation, wildcarded searches don't support several search features, such as autocorrection, DYM, phrase search, thesaurus, etc.
My question is not about parsing.
I have been looking through the Wikipedia API. I need to search for companies and get a one-sentence summary. It's working well; the only problem I have is when I need to disambiguate. It's hard for my code to know whether "Dropbox (service)" or "Dropbox (band)" is the Dropbox company my user is looking for.
I tried to put the word "company" in the query, expecting it to work like a Google search, but unfortunately it didn't.
So my question is: is there an easy way to disambiguate the results I get by telling Wikipedia it is a "company" that I want?
If you're looking for companies only, consider using their full names instead of short forms. In the case of Dropbox, the name of the company is Dropbox, Inc. If you search for Dropbox, Inc. on Wikipedia, you will be redirected to the page Dropbox (service), which I believe is the page you're looking for.
If you don't have the resources to get the company names into that exact format, consider using Category:Companies to refine your results further.
When you get to the page, you can mine the extract for the company by using the MediaWiki API as follows:
https://en.wikipedia.org/w/api.php?format=json&action=query&prop=extracts&exintro=&explaintext=&titles=Dropbox%20(service)
Note: The extract is called section0 in MediaWiki
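A minimal sketch of the same call from Python using the requests library; the JSON path follows the standard query/pages response shape:

import requests

params = {
    "format": "json",
    "action": "query",
    "prop": "extracts",
    "exintro": "",      # only the intro section (section0)
    "explaintext": "",  # plain text rather than HTML
    "titles": "Dropbox (service)",
}
resp = requests.get("https://en.wikipedia.org/w/api.php", params=params)
for page in resp.json()["query"]["pages"].values():
    print(page["extract"])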
I recommend trying Wikidata. Wikidata is a multilingual factual database of everything, and it has a query interface at query.wikidata.org. The query language it uses is called SPARQL. For instance, if you're interested in a list of well-known cats, https://w.wiki/W4W is your query. More details can be found at https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service.
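For the company case, a sketch of how this could look from Python: the SPARQL query asks for items labelled "Dropbox" that are instances of (a subclass of) business. Filtering on class Q4830453 (business) is my assumption of a reasonable filter; adjust it as needed:

import requests

query = """
SELECT ?item ?itemLabel WHERE {
  ?item rdfs:label "Dropbox"@en ;
        wdt:P31/wdt:P279* wd:Q4830453 .  # instance of (a subclass of) business
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
"""
resp = requests.get(
    "https://query.wikidata.org/sparql",
    params={"query": query, "format": "json"},
)
for row in resp.json()["results"]["bindings"]:
    print(row["item"]["value"], row["itemLabel"]["value"])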
# Uses the third-party "wikipedia" package (pip install wikipedia).
import wikipedia
print(wikipedia.summary("COMPANY_NAME"))
Try to filter out the companies by categories; there is a list provided at the end of each page:
xx = wikipedia.page("Dropbox")
print(xx.title)
print(xx.categories)  # list of category names to filter on
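Building on that, a minimal sketch of the disambiguation itself: fetch each candidate page and keep the one whose categories mention companies. The candidate titles are the two from the question:

import wikipedia

candidates = ["Dropbox (service)", "Dropbox (band)"]
for title in candidates:
    page = wikipedia.page(title)
    # Keep the page whose categories suggest it is a company.
    if any("companies" in c.lower() for c in page.categories):
        print(page.title, "->", wikipedia.summary(title, sentences=1))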
Suppose I am searching using one of the cts:query APIs. I am looking for documents containing the phrase "John and Jane". Some of my documents have "John & Jane" (actually John &amp; Jane) in them. I want them to be returned as well. Also consider the reverse situation.
Does MarkLogic provide any options to do that?
Queries expressed as cts:query items or XML are easy to rewrite with XQuery typeswitch expressions. The discussion list thread at http://markmail.org/message/6hxmuqnpnfm73j4n has an example of something similar.
Mike gives a good suggestion, but it might be worth taking a step back and looking at your problem first. From your comment on Mike's answer, I take it that you are looking for something like thesaurus expansion, but for 'and' and '&' instead of other words.
I may be wrong, but to my knowledge MarkLogic doesn't provide features to take care of something like that automatically. Functions like search:search and search:parse are powerful, but they don't go that far. You are on your own to take a search string like yours, break it into parts manually to wrap it in a cts:query (or use something like search:parse for that), and then pull tricks like Mike's to walk through your query tree and expand any particular search query node the way you would like.
The MarkMail thread Mike points to gives an example of how to walk a query tree and manipulate it. That is a little heavy for this particular case, but there is also a thesaurus module that can help in various general cases. The following chapter of the Search Developer's Guide explains its features and ends with a small example of how to apply it:
http://docs.marklogic.com/guide/search-dev/thesaurus#chapter
HTH!
Assume the term you want to search for is "John & Jane".
In order to search for that term, you can use the following lines (note the &amp; escape required inside the XQuery string literal):
let $inputSearchDetails := "John &amp; Jane"
let $InputXML := xdmp:unquote($inputSearchDetails, "", ("format-xml", "repair-full"))
return cts:search(fn:doc(), cts:word-query(fn:string($InputXML)))
I'm trying to find out how to use Bing to get lat/long. All the tutorials I find are for plotting points, but they don't actually show how to extract the coordinates. Does anyone have experience with this?
I think the new way to do that is by using their REST service documented here: http://msdn.microsoft.com/en-us/library/ff701715.aspx.
For example:
http://dev.virtualearth.net/REST/v1/Locations?key=BingMapsKey&query=White House
or
http://dev.virtualearth.net/REST/v1/Locations?key=BingMapsKey&query=1600 Pennsylvania Ave NW Washington, DC
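A minimal sketch of calling that endpoint from Python and pulling out the coordinates; BING_MAPS_KEY is a placeholder for your own key, and the JSON path follows the Locations API response shape:

import requests

resp = requests.get(
    "http://dev.virtualearth.net/REST/v1/Locations",
    params={
        "key": "BING_MAPS_KEY",  # placeholder: use your own Bing Maps key
        "query": "1600 Pennsylvania Ave NW Washington, DC",
    },
)
# Latitude and longitude of the first matched location.
resource = resp.json()["resourceSets"][0]["resources"][0]
lat, lng = resource["point"]["coordinates"]
print(lat, lng)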
We use the VEMap.Geocode method and it works just great.
See Bing Maps AJAX Control v6.3.
As the title says, let's say I want to get the number of .de domains.
Googling:
inurl:www.*.de
retrieves the correct results, but a lot of them are from the same domain.
Is there another way to do this?
The better search query would be: site:de
But even so, Google's result count is just a very, very blurry page estimate (a.k.a. completely wrong, and not what you are looking for).
Google is the wrong source for this.
But via Google I found this:
http://www.denic.de/hintergrund/geschichte-der-denic-eg.html
August 2009: 13 million domains registered under .de, among them 463,000 IDNs.