I want to send a query to Wikipedia. The response should contain information about a city that I asked for.
For example, to get information about Munich:
http://de.wikipedia.org/w/api.php?action=query&prop=revisions&titles=M%C3%BCnchen&rvprop=content&format=xml
This query returns the desired response.
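For reference, here is a minimal sketch of how the request URL above could be built in code rather than typed by hand. The function name `build_revision_query` is only an illustration; the parameters are exactly the ones from the URL in the question.

```python
from urllib.parse import urlencode

# Endpoint taken from the URL in the question.
API_ENDPOINT = "http://de.wikipedia.org/w/api.php"


def build_revision_query(title):
    """Return the API URL that asks for the current content of `title`."""
    params = {
        "action": "query",
        "prop": "revisions",
        "titles": title,
        "rvprop": "content",
        "format": "xml",
    }
    # urlencode percent-encodes non-ASCII characters as UTF-8 (ü -> %C3%BC).
    return API_ENDPOINT + "?" + urlencode(params)


url = build_revision_query("München")
print(url)
```

Building the URL this way takes care of the percent-encoding of umlauts, which is easy to get wrong when writing the query string manually.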
But there are other cases in which Wikipedia doesn't know what I mean. If I search for "Neustadt" on de.wikipedia.org, I get a list of different "Neustadts", because there are many of them.
How can I find the desired article? In my database, all the cities have coordinates, zip codes and phone codes, but I can't search Wikipedia by these, can I?
EDIT: I am looking for the URL of the article.
The problem with the Wikipedia data is that it's largely unstructured. You might have better luck with something like DBpedia, which is an effort to pull structured information from Wikipedia and make it searchable using SPARQL.
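As a rough sketch of what such a SPARQL lookup might look like: the code below only builds the query text and a request URL, it does not send anything. The endpoint address, the `format` parameter, and the choice of `dbo:Settlement`, `geo:lat`, `geo:long` and `foaf:isPrimaryTopicOf` are assumptions about how DBpedia models places; check them against the live dataset before relying on them. The helper names are hypothetical.

```python
from urllib.parse import urlencode

# Assumed public endpoint; a language-specific DBpedia chapter may use another one.
SPARQL_ENDPOINT = "https://dbpedia.org/sparql"


def build_city_query(name, lang="de"):
    """Build a SPARQL query for settlements whose label starts with `name`."""
    # Escape backslashes and double quotes so `name` is a safe string literal.
    safe = name.replace("\\", "\\\\").replace('"', '\\"')
    return f'''
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?place ?label ?lat ?long ?article WHERE {{
  ?place a dbo:Settlement ;
         rdfs:label ?label ;
         geo:lat ?lat ;
         geo:long ?long ;
         foaf:isPrimaryTopicOf ?article .
  FILTER(LANG(?label) = "{lang}" && STRSTARTS(STR(?label), "{safe}"))
}}
LIMIT 50
'''.strip()


def build_request_url(query):
    """Return a GET URL for the endpoint (the `format` value is an assumption)."""
    return SPARQL_ENDPOINT + "?" + urlencode(
        {"query": query, "format": "application/sparql-results+json"}
    )


query = build_city_query("Neustadt")
print(build_request_url(query))
```

The idea is that each candidate comes back with coordinates, so the right "Neustadt" could be picked by comparing `?lat` and `?long` with the coordinates already stored in the database.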
Related
As the question says: "Is there a way to get all complete sentences that a search engine (e.g. Google) has indexed that contain two search terms?"
I would like to use the search syntax of the engine (e.g. Google): BMW AND Toyota. (This is just an example.)
I would then like to get back all sentences that mention both BMW and Toyota. Both terms must appear in a single (ideally short) sentence.
Is that possible?
Many thanks!
PS.: Sorry - I have difficulties finding the right tags for my question... Please feel free to suggest more appropriate ones and I will update the question.
PPS.: Let me rephrase my question: If it is not readily possible with an existing search engine, are there any programmatic ways to do that? Would one have to write a crawler for that purpose?
No, this may not be possible, because Google stores this information based on keywords and other algorithms.
For any given keyword or set of keywords, Google must be maintaining a reference to one or many matching (some accurate, some not so accurate) titles.
I do not work for Google, but that could be one way they maintain their search results.
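If the text has already been collected by other means (for example by a crawler, as the question suggests), the sentence-level filter itself is simple. The following is a minimal sketch, assuming a naive sentence splitter that breaks on ".", "!" or "?"; the function name and the sample text are made up for illustration, and real text would need a proper sentence tokenizer.

```python
import re

# Naive splitter: break after ., ! or ? when followed by whitespace.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def sentences_with_all_terms(text, terms, max_words=None):
    """Return the sentences of `text` that contain every term in `terms`."""
    # Whole-word, case-insensitive match for each search term.
    patterns = [
        re.compile(r"\b" + re.escape(t) + r"\b", re.IGNORECASE) for t in terms
    ]
    hits = []
    for sentence in _SENTENCE_END.split(text.strip()):
        # Optionally skip long sentences (word count by whitespace).
        if max_words is not None and len(sentence.split()) > max_words:
            continue
        if all(p.search(sentence) for p in patterns):
            hits.append(sentence)
    return hits


sample = (
    "BMW and Toyota announced a partnership. Toyota sells hybrids. "
    "BMW builds cars in Munich! Is BMW larger than Toyota?"
)
print(sentences_with_all_terms(sample, ["BMW", "Toyota"]))
print(sentences_with_all_terms(sample, ["BMW", "Toyota"], max_words=5))
```

The `max_words` argument covers the "ideally short" requirement from the question.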
I am looking into using Elasticsearch as a search engine for one of the projects I am working on.
There is still one thing I need to find an answer for, and I hope someone here can help.
The customer wants to be able to see search statistics, similar to Google Analytics: most searched words, new search words, and so on.
Is there a way to easily set up this type of search statistics? My idea is that Elasticsearch stores a history of the search requests made to the REST API. Then my customer can use Kibana or some other visual tool to monitor that search history.
I hope someone can help me with an answer for this.
Regards
Jacob
You could lower the slow log threshold so that it captures all requests; however, this will produce large log files that require maintenance. Alternatively, you could write an application that handles all of your ES requests: it takes the search phrase, indexes it in a separate index (i.e. your search history index), and then handles the actual request as normal, returning the response to the user.
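The second approach could be sketched roughly as follows. To keep the example runnable without a cluster, it uses a small in-memory stand-in (`InMemoryClient`) whose `index(index=..., document=...)` and `search(index=..., query=...)` methods are assumed to mirror the keyword arguments of the official Python client; the index names and the `content` field are hypothetical.

```python
from datetime import datetime, timezone


class InMemoryClient:
    """Stand-in for a real Elasticsearch client so the sketch runs offline."""

    def __init__(self):
        self.docs = {}  # index name -> list of stored documents

    def index(self, index, document):
        self.docs.setdefault(index, []).append(document)
        return {"result": "created"}

    def search(self, index, query):
        # Only understands {"match": {"content": phrase}}, as a substring test.
        phrase = query["match"]["content"].lower()
        hits = [
            d for d in self.docs.get(index, [])
            if phrase in d.get("content", "").lower()
        ]
        return {"hits": {"hits": [{"_source": d} for d in hits]}}


def search_with_history(client, index, phrase, history_index="search_history"):
    """Record the search phrase in a history index, then run the real search."""
    client.index(
        index=history_index,
        document={
            "phrase": phrase,
            "timestamp": datetime.now(timezone.utc).isoformat(),
        },
    )
    return client.search(index=index, query={"match": {"content": phrase}})


client = InMemoryClient()
client.index(index="articles", document={"content": "Elasticsearch and Kibana"})
result = search_with_history(client, "articles", "kibana")
print(len(result["hits"]["hits"]), client.docs["search_history"][0]["phrase"])
```

With the phrases stored in their own index, a visualization tool such as Kibana could then aggregate on the `phrase` field to show the most frequent searches.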
I found a number of similar questions on SO, but they are all either 2+ years old or aren't exactly what I am looking for.
All I would like to do is obtain a list of Twitter users whose bio/profile contains certain terms (scientist, democrat, 'dog lover', etc.).
I've considered using a Google site search, but so far the results are incredibly noisy.
Any suggestions would be much appreciated!
CS
The Twitter API supports a People Search similar to the website's "Find on Twitter" search feature. Although you cannot search on profile descriptions alone, it appears that the description content is used as part of the search space. If you narrow down the results further by searching the returned users' descriptions yourself, you should be able to do what you're looking for. Check out the Twitter API documentation for more info.
Example:
Try searching for "husband father of three", and you get these results, which obviously are returned because of the profile descriptions.
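The narrowing-down step could look something like this minimal sketch. It assumes the people search has already returned a list of user objects as dictionaries with `screen_name` and `description` keys; the sample users and the helper names are invented for illustration.

```python
def bio_matches(description, terms, require_all=False):
    """Check whether a profile description contains the given terms."""
    text = (description or "").lower()  # description may be missing or None
    found = [term.lower() in text for term in terms]
    return all(found) if require_all else any(found)


def filter_users_by_bio(users, terms, require_all=False):
    """Keep only the users whose description matches the terms."""
    return [
        u for u in users
        if bio_matches(u.get("description"), terms, require_all)
    ]


# Hypothetical search results, shaped like API user objects.
users = [
    {"screen_name": "alice", "description": "Scientist and dog lover"},
    {"screen_name": "bob", "description": "Husband, father of three"},
    {"screen_name": "carol", "description": None},
]

matches = filter_users_by_bio(users, ["scientist", "dog lover"], require_all=True)
print([u["screen_name"] for u in matches])
```

Plain substring matching is used here on purpose so that multi-word terms such as 'dog lover' work without any tokenization.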
I have used a tool to search Twitter profiles using keywords and many advanced filters. I like the information provided by the FollowerSearch tool. It was very specific, which helps me analyze public Twitter profiles.
One of the best tools for quickly searching among the 800 million public Twitter accounts in the database is FollowerSearch.
With FollowerSearch, you can quickly conduct searches for Twitter influencers and Twitter bios across its massive database of more than 800 million Twitter profiles. You may look for Twitter profiles based on information like their location, line of work, number of followers, etc.
Twitter Influencer Profile Search
A Twitter bios search will assist you in simplifying the process, whether you're looking for influencers or new talent. You can discover Twitter folks who share your interests. Find out exact information on all the accounts whose bios contain your search term.
Identify key accounts and Twitter influencers that have required terms in their Twitter bios.
Look up new and budding talent.
Find Twitter users with similar interests.
Search Twitter profiles or Twitter bios for any desired term.
I created a tool that does exactly what you're looking for. Find70 lets you search for Twitter profiles by their Twitter bio. In fact, you can set up as many search filters as you want and define your own weighting for each filter. In your example above, you could search for: scientist, democrat, 'dog lover' and it would return all the accounts that have those in the bio. This can be combined with other filters too. Here it is: http://www.find70.com/?t=stack
I came across a site called Social Mention and am curious about how applications like this work. Hopefully somebody can offer some glimpses or suggestions on this.
1. Looking at the search results, I realize that they grab results from Facebook, Twitter, Google... I suppose this is done on the fly, probably through some REST API exposed by those services?
2. If what I mention in point 1 is true, does that mean sentiment analysis on the returned documents/links is done on the fly too? Wouldn't that be too computationally intensive? I am curious because, other than sentiment, they also return the top keywords in the document set.
3. They have something called "trends". They look like the trending topics on Twitter, but they seem to also include phrases more than 3 words long. Is this related to NLP entity extraction, or more to keyphrase extraction? Are there APIs other than Twitter's that provide this? Are "trends" generally computed from search queries submitted by users, or does the system actually process the pages?
A curious man.
Sentiment analysis can be fast and done on the fly if, for example, it is rule-based and the dictionaries are held in memory. Curious? Get in touch.
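To illustrate why a rule-based approach is cheap, here is a toy sketch: two in-memory word sets plus one negation rule, so scoring a document is a single pass over its tokens. The word lists are tiny placeholders chosen for this example, not a real sentiment lexicon, and this is not a claim about how any particular service implements it.

```python
import re

# Placeholder dictionaries, held in memory as sets for O(1) lookups.
POSITIVE = {"good", "great", "love", "excellent", "happy"}
NEGATIVE = {"bad", "terrible", "hate", "awful", "sad"}
NEGATORS = {"not", "no", "never"}


def sentiment_score(text):
    """Return a signed score: > 0 positive, < 0 negative, 0 neutral."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0
    for i, token in enumerate(tokens):
        # +1 for a positive word, -1 for a negative word, 0 otherwise.
        polarity = (token in POSITIVE) - (token in NEGATIVE)
        # Rule: a negator directly before a sentiment word flips its polarity.
        if polarity and i > 0 and tokens[i - 1] in NEGATORS:
            polarity = -polarity
        score += polarity
    return score


for line in ["I love this great phone", "This is not good", "It arrived on Tuesday"]:
    print(line, "->", sentiment_score(line))
```

Because the work per document is linear in its length and needs no model inference, scoring a page of search results at request time is plausible.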
I can easily get the results I want from Yahoo! BOSS. However, for the particular data I'm trying to get, it's important that "duplicate" results be included. I know Yahoo! has them, since when I search for the query manually, it offers me a link to see these similar results.
Is there any way to request these deeper results with the Yahoo! BOSS API?
After some research, it would seem that the answer is, at present, no.
I will continue to pursue the subject here.