Elasticsearch: Penalize documents that have similar neighbors to de-cluster - search

I have an Elasticsearch instance filled with job listings, and when a user searches, it aims to return relevant results. However, we have many copies of each job but at different locations, and since those are all very relevant to the search, they turn up at the top of the list, cluttering results.
Here is an example:
Search: facebook
Results:
- Facebook Engineering Internship
Atlanta, Georgia
- Facebook Engineering Internship
Madison, Wisconsin
- Facebook Engineering Internship
Palo Alto, California
What I would like to do here is de-cluster results that are too similar, effectively penalizing a particular document result based on the document that is returned right before it. That would enable the top results to have more variety, looking something more like this:
Search: facebook
Results:
- Facebook Engineering Internship
Atlanta, Georgia
- Facebook Marketing Trainee
Palo Alto, California
- Social Media Expert: Facebook (Verizon)
Chicago, Illinois
- Facebook Engineering Internship
Madison, Wisconsin
How can I do this? If I'm conceptualizing this the wrong way with penalizing documents based on neighbors, please let me know that as well.
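One way to think about this, sketched below, is to re-rank the hits after Elasticsearch returns them: each hit's score is multiplied by a penalty for every higher-ranked hit that shares its title. This is only a minimal sketch under stated assumptions, not a built-in Elasticsearch feature. The `title`, `location`, and `score` keys, the `decluster` function, and the `penalty` value are all hypothetical names chosen for illustration. If hiding duplicates entirely is acceptable, the search API's `collapse` parameter (field collapsing) is an alternative; the `title.keyword` field in the request body below is an assumption about the mapping.

```python
from collections import defaultdict

# Alternative: ask Elasticsearch to return one hit per title (field collapsing).
# "title" and "title.keyword" are assumed field names, not from the original post.
COLLAPSE_BODY = {
    "query": {"match": {"title": "facebook"}},
    "collapse": {"field": "title.keyword"},
}


def decluster(hits, key="title", penalty=0.5):
    """Re-rank hits so that repeated values of `key` are pushed down.

    A hit's score is multiplied by penalty ** n, where n is the number of
    higher-scored hits that share the same value for `key`.
    """
    seen = defaultdict(int)
    rescored = []
    for hit in sorted(hits, key=lambda h: h["score"], reverse=True):
        group = hit[key]
        adjusted = hit["score"] * (penalty ** seen[group])
        seen[group] += 1
        rescored.append({**hit, "adjusted_score": adjusted})
    return sorted(rescored, key=lambda h: h["adjusted_score"], reverse=True)


if __name__ == "__main__":
    # Made-up scores, used only to illustrate the re-ranking.
    hits = [
        {"title": "Facebook Engineering Internship", "location": "Atlanta, Georgia", "score": 10.0},
        {"title": "Facebook Engineering Internship", "location": "Madison, Wisconsin", "score": 9.9},
        {"title": "Facebook Engineering Internship", "location": "Palo Alto, California", "score": 9.8},
        {"title": "Facebook Marketing Trainee", "location": "Palo Alto, California", "score": 8.0},
    ]
    for hit in decluster(hits):
        print(hit["title"], "|", hit["location"])
```

With a penalty below 1, the second and third copies of a listing still appear, just further down, which matches the ordering asked for above rather than removing the duplicates outright.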

Related

openai.error.InvalidRequestError: does not have access to the answers endpoint

When I try to implement a QA system with GPT-3, the following error occurs:
openai.error.InvalidRequestError: Org org-Ilv48EJDyLWiTc2SJWjOnRaM does not have access to the answers endpoint. Reach out to deprecation@openai.com if you have any questions
My code is:
import openai
openai.api_key = "my-openai-key"
document_list = ["Google was founded in 1998 by Larry Page and Sergey Brin while they were Ph.D. students at Stanford University in California. Together they own about 14 percent of its shares and control 56 percent of the stockholder voting power through supervoting stock. They incorporated Google as a privately held company on September 4, 1998. An initial public offering (IPO) took place on August 19, 2004, and Google moved to its headquarters in Mountain View, California, nicknamed the Googleplex. In August 2015, Google announced plans to reorganize its various interests as a conglomerate called Alphabet Inc. Google is Alphabet's leading subsidiary and will continue to be the umbrella company for Alphabet's Internet interests. Sundar Pichai was appointed CEO of Google, replacing Larry Page who became the CEO of Alphabet.",
"Amazon is an American multinational technology company based in Seattle, Washington, which focuses on e-commerce, cloud computing, digital streaming, and artificial intelligence. It is one of the Big Five companies in the U.S. information technology industry, along with Google, Apple, Microsoft, and Facebook. The company has been referred to as 'one of the most influential economic and cultural forces in the world', as well as the world's most valuable brand. Jeff Bezos founded Amazon from his garage in Bellevue, Washington on July 5, 1994. It started as an online marketplace for books but expanded to sell electronics, software, video games, apparel, furniture, food, toys, and jewelry. In 2015, Amazon surpassed Walmart as the most valuable retailer in the United States by market capitalization."]
response = openai.Answer.create(
    search_model="ada",
    model="curie",
    question="when was google founded?",
    documents=document_list,
    examples_context="In 2017, U.S. life expectancy was 78.6 years.",
    examples=[["What is human life expectancy in the United States?", "78 years."]],
    max_tokens=10,
    stop=["\n", "<|endoftext|>"],
)
print(response)
Here "my-openai-key" stands for the secret key issued on OpenAI's website.

How do I use Transactions API for non paid table booking google action?

I built a Google Action that lets a user book a table. When I submitted it for review, it was rejected, and the reviewers said my action has to use the Transactions API for non-paid table bookings. How do I do that?
Good morning, Jigar! If your action doesn't require or accept payment, you could reframe the marketing into a "reservations" app, where your action asks the following:
"What kind of food are you in the mood for?" (User says, "Mexican food")
"Ok, I can make a reservation at a nearby Mexican restaurant. To do that, I'll need to know your approximate location. Is that alright?" (see Permissions for implementation details)
(Hit the Yelp API or Google Maps API to find a list of top rated Mexican restaurants near the user's location.)
"Ok, Yelp's top rated Mexican restaurant nearby is Super Awesome Mexican Food on Sixth Street. Would you like to make a reservation?" (user says yes)
"How many people are in your party?" (User says, "2")
"Ok. A table for two is available at [list like 7 pm, 7:30 pm, 8:00 pm, etc]. Which time would you like to reserve?" (User says, "Seven thirty")
"Ok, I'll need an email to finalize your reservation. Which address should I use?"
"Ok, your table is reserved, and an email has been sent to your inbox and the restaurant. Please show the email to the host when you arrive. Bon appetit!"
Since you said your app is non-paid, this seems to deliver the same features without dealing with transactions or money. I'm guessing Google's review team had a problem with the term "booking", which implies money changing hands.

Text mining algorithm similar text

Hi, I am writing a small app using Facebook to group people by social networks. The main problem I face is grouping similar texts together. Some people list their education as "Anna University, Guindy", while others put it as "Anna University". How do I group these together? What algorithm or term should I search for?
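A term that may be worth searching for here is "fuzzy string matching" (or "record linkage"). As a rough illustration only, the sketch below groups strings whose word sets overlap enough, measured by Jaccard similarity. The function names and the 0.6 threshold are assumptions made for this example, and a real dataset would likely need tuning or a dedicated library.

```python
import re


def tokens(text):
    """Lower-case the text and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity of the word sets of two strings (0.0 to 1.0)."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def group_similar(values, threshold=0.6):
    """Greedily put each value into the first group whose first member is similar enough."""
    groups = []
    for value in values:
        for group in groups:
            if jaccard(value, group[0]) >= threshold:
                group.append(value)
                break
        else:
            # No existing group matched, so start a new one.
            groups.append([value])
    return groups


if __name__ == "__main__":
    print(group_similar(["Anna University, Guindy", "Anna University", "Stanford University"]))
```

Comparing whole words rather than characters keeps "Stanford University" out of the "Anna University" group even though both end in the same word.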

Searching user profiles on Twitter

I found a number of similar questions on SO, but they are all either 2+ years old or aren't exactly what I am looking for.
All I would like to do is obtain a list of twitter users whose bio/profile contains certain terms (scientist, democrat, 'dog lover', etc.).
I've considered using a google site search but so far the results are incredibly noisy.
Any suggestions would be much appreciated!
CS
The Twitter API supports a People Search similar to the website's "Find on Twitter" search feature. Although you can not directly search using only profile descriptions, it appears that the description content is used as part of the search space. If you can think of a way to narrow down your results even further by directly searching the returned users' descriptions, you should be able to do what you're looking for. Check out the Twitter API documentation for more info.
Example:
Try searching for "husband father of three", and you get these results, which obviously are returned because of the profile descriptions.
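The second step described above, narrowing the returned users by their descriptions, could be sketched roughly like this. It assumes you already have the search results as a list of dictionaries with a `description` field; the `filter_users` and `bio_matches` helpers and the sample data are hypothetical and make no API calls.

```python
def bio_matches(description, terms):
    """Return True if every term appears in the description (case-insensitive)."""
    text = (description or "").lower()
    return all(term.lower() in text for term in terms)


def filter_users(users, terms):
    """Keep only the users whose profile description contains all of the terms."""
    return [user for user in users if bio_matches(user.get("description"), terms)]


if __name__ == "__main__":
    # Made-up sample data standing in for people-search results.
    users = [
        {"screen_name": "example_a", "description": "Scientist and dog lover"},
        {"screen_name": "example_b", "description": "Husband, father of three"},
        {"screen_name": "example_c", "description": None},
    ]
    for user in filter_users(users, ["scientist", "dog lover"]):
        print(user["screen_name"])
```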
I have used a tool to search Twitter profiles using keywords and many advanced filters. I love the information provided by the FollowerSearch tool. It was very specific, which helps me analyze public Twitter profiles.
One of the best tools for quickly searching among the 800 million public Twitter accounts in the database is FollowerSearch.
With FollowerSearch, you can quickly conduct searches for Twitter influencers and Twitter bios across its massive database of more than 800 million Twitter profiles. You may look for Twitter profiles based on information like their location, line of work, number of followers, etc.
Twitter Influencer Profile Search
A Twitter bios search will assist you in simplifying the process, whether you're looking for influencers or new talent. You can discover Twitter folks who share your interests. Find out exact information on all the accounts whose bios contain your search term.
- Identify key accounts and Twitter influencers that have required terms in their Twitter bios.
- Look up new and budding talent.
- Find Twitter users with similar interests.
- Search Twitter profiles or Twitter bios for any desired term.
I created a tool that does exactly what you're looking for. Find70 lets you search for Twitter profiles by their Twitter bio. In fact, you can set up as many search filters as you want and define your own weighting for each filter. In your example above, you could search for: scientist, democrat, 'dog lover' and it would return all the accounts that have those in the bio. This can be combined with other filters too. Here it is: http://www.find70.com/?t=stack

How to perform website benchmarking?

I am trying to do a competitive analysis of online trends in the real estate domain at the state level within a country. I have to create a report that is not biased towards any particular company but compares, or simply shows, how the companies are performing against a list of trends. I will use clickstream analysis parameters to show how each company's website performs. In my opinion, trend-specific performance can be depicted with sentiment analysis. If there is a more effective way to do this, I would welcome any such approach.
So far, I have not been able to find any trends that the companies have in common.
How can I find general trends that are common to all real estate companies?
I tried using Google Trends. It provides graphical and demographic information about a particular search term and lists related search terms, but I have no idea how to use those. And as I drill down from country to state, the amount of data becomes very small.
Once I have the trends, I have to find out how people are reacting to them. Sentiment analysis is what will give me this information.
But even if I get the trends, how will I get trend-specific data from which I can calculate polarity?
Twitter and other social media sites can provide some data on which sentiment analysis can be performed. I used this site which gives the positive, negative and neutral behaviour related to some term on twitter. I need something analogous to this but the dataset on which this analysis can be performed should not be limited to social media only.
Are there any other entities I can add to this competitive analysis report?
The report will be generated on a monthly basis, and I want as much automation as possible in the tasks above. I am also thinking of using web scraping to collect data of a similar format. I would also like to know what data I should scrape and what data I should extract manually.
