Searching for information about the Go language

I was trying to find information about generic database drivers for Google's Go language when I noticed I was having a hard time getting any results.
"Go SQL" returns nothing related to the Go language, and "golang SQL" only returns useful results from the mailing list (and not, say, from GitHub).
Is there a smarter way to search for information about the Go language?
One of the creators said that search engines would learn to recognize the context of the overloaded word "go", which would eliminate my problem, but I say: why bother? go issue9!

Take a look at the dashboard: http://godashboard.appspot.com/project. It's the best overview of current Go-related projects. Another one is http://go-lang.cat-v.org/.
mue

Related

Smart search for acronyms in Salesforce

In Salesforce's Service Cloud you can enable the out-of-the-box search function, where the user enters a term and the system searches all parts of the database for a match. I would like to enable smart searching of acronyms, so that if I spell out an organization's name, the search will also look for the associated acronyms in the database. For example, if I type in "American Automobile Association", I would also get results that contain both "American Automobile Association" and "AAA".
I imagine such a script would check whether the search term contains one or more spaces or periods, then take the first letter of the first word and concatenate it with the first letters following each subsequent space or period, something like the sketch below.
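A minimal sketch of that derivation (plain Java for illustration; Apex string handling is similar, but treat this as a guess at the logic, not working Salesforce code):

// Sketch: derive an acronym from a multi-word name by taking the
// first letter of each word, splitting on spaces and periods.
public class AcronymUtil {
    public static String acronymOf(String name) {
        StringBuilder sb = new StringBuilder();
        for (String word : name.split("[ .]+")) {
            if (!word.isEmpty()) {
                sb.append(Character.toUpperCase(word.charAt(0)));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Prints "AAA"
        System.out.println(acronymOf("American Automobile Association"));
    }
}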
I have unsuccessfully tried to find scripts for this or articles on enabling this functionality in Salesforce. Any guidance would be appreciated.
Interesting question! I don't think there's a straightforward answer, but as it's standard search functionality and not 100% programming related, you might want to cross-post it to salesforce.stackexchange.com
Let's start with searchable fields list: https://help.salesforce.com/articleView?id=search_fields_business_accounts.htm&type=0
In Setup there's standard functionality for Synonyms, which is quite easy to use. It's not a silver bullet though; it applies only to certain objects, like the Knowledge Base (if you use it). Still, it claims to work on Cases too, so if there's "AAA" in a Case description it might be good enough?
You could also check out the trick of marking a text field as indexed and/or an external ID and putting all your variations / acronyms in it: https://success.salesforce.com/ideaView?id=08730000000H6m2 It's more work to prepare and sanitize your data upfront, but it's not a bad idea.
A similar idea would be to use Tags, although that could explode in size very quickly; creating a tag for every single company isn't practical.
You can do some really smart things with data deduplication rules. It's too much to write up here; check out the Trailhead module: https://trailhead.salesforce.com/en/modules/sales_admin_duplicate_management/units/sales_admin_duplicate_management_unit_2 No idea whether it affects search, though.
If you suffer from bad address data, there are State & Country picklists: no more mess with CA / California / SoCal... https://resources.docs.salesforce.com/204/latest/en-us/sfdc/pdf/state_country_picklists_impl_guide.pdf It might not help with the Name problem, though...
Data.com cleanup might help. It's a paid service, I think, and I have no idea whether it affects search either. But if enabling it can bring these common abbreviations into your org, it might be better than reinventing the wheel.

Better or not to combine a search engine and a recommender system?

In our project we use a search engine, but the results need to be ranked based on each user's interests, similar to making recommendations from the user's keywords.
If we keep the two systems separate, it will cost a lot of time.
Is there a better way to combine a search engine and a recommender system?
Or is there a simple way to customize my ranking strategy to achieve this?
This is what we were trying to do in our project as well. There are two forces at play in this problem: relevancy versus personalization. You should look at how much the personalization is hurting the relevancy of the query. For example, if I'm suggesting news, then it makes sense to suggest based on location. I hope you have already analyzed the use cases.
The way I did it was: after getting the search results, re-rank them to give personal suggestions. For example, if I was searching for a specific algorithm to code, then getting the result set and re-ranking it on my preferences, say for Java (based on my previous history), would make sense. In any case relevancy is of utmost importance; only then do we fit in the user's preferences.
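As a minimal sketch of that two-stage idea (the Result and UserProfile types and the blending weight alpha are placeholders I'm making up, not any particular engine's API):

import java.util.Comparator;
import java.util.List;

// Hypothetical types: adapt to whatever your search engine returns.
interface Result { double relevance(); }
interface UserProfile { double preference(Result r); }

class Reranker {
    // Re-rank relevance-ordered hits by blending in a per-user score.
    // alpha = 1.0 keeps pure relevance; lower values mix in more
    // personalization.
    static void rerank(List<Result> hits, UserProfile user, double alpha) {
        hits.sort(Comparator.comparingDouble((Result r) ->
                alpha * r.relevance() + (1 - alpha) * user.preference(r)
        ).reversed());
    }
}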
Again, the use case is important: if this were a news search, then directly querying and filtering on location would be the best way to do it.

What algorithm does Freebase use to match by name?

I'm trying to build a local version of the Freebase search API using their quad dumps. I'm wondering what algorithm they use to match names. As an example, if you go to freebase.com and type in "Hiking", you get
"Apo Hiking Society"
"Hiking"
"Hiking Georgia"
"Hiking Virginia's national forests"
"Hiking trail"
Wow, a lot of guesses! I hope I don't muddy the waters too much by not guessing too.
The auto-complete box is basically powered by Freebase Suggest which is powered, in turn, by the Freebase Search service. Strings which are indexed by the search service for matching include: 1) the name, 2) all aliases in the given language, 3) link anchor text from the associated Wikipedia articles and 4) identifiers (called keys by Freebase), which includes things like Wikipedia article titles (and redirects).
How the various things are weighted/boosted hasn't been disclosed, but you can get a feel for it by playing with the service for a while. As you can see from the API, there's also the ability to do filtering/weighting by types and other criteria, and this can come into play depending on the context. For example, if you're adding a record label to an album, topics which are typed as record labels will get a boost relative to things which aren't (but you can still get to things of other types, to allow for the use case where your target topic hasn't had the appropriate type applied yet).
So that gives you a little insight into how their service works, but why not build a search service that does what you need since you're starting from scratch anyway?
BTW, pre-Google, the Metaweb search implementation was built on top of Lucene, so you could definitely do worse than using that as your starting point. You can read some of the details in the mailing list archive.
Probably they use an inverted index over selected fields, such as the English name, the aliases and the displayed Wikipedia snippet. In your application you can achieve that using something like Lucene.
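For example, with the Lucene 3.x-era API, indexing those guessed fields might look roughly like this (the field names and the topics input are my assumptions, not Freebase's actual schema):

import java.io.File;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

// Sketch: one Lucene document per topic, indexing the fields guessed
// at above (name, aliases, displayed Wikipedia snippet).
public class TopicIndexer {
    public static void index(Iterable<String[]> topics) throws Exception {
        Directory dir = FSDirectory.open(new File("topic-index"));
        IndexWriterConfig cfg = new IndexWriterConfig(
                Version.LUCENE_36, new StandardAnalyzer(Version.LUCENE_36));
        IndexWriter writer = new IndexWriter(dir, cfg);
        for (String[] t : topics) { // t = {name, aliases, snippet}
            Document doc = new Document();
            doc.add(new Field("name", t[0], Field.Store.YES, Field.Index.ANALYZED));
            doc.add(new Field("alias", t[1], Field.Store.YES, Field.Index.ANALYZED));
            doc.add(new Field("snippet", t[2], Field.Store.NO, Field.Index.ANALYZED));
            writer.addDocument(doc);
        }
        writer.close();
    }
}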
For the algorithm side, I find the following paper a good overview: Zobel and Moffat (2006), "Inverted Files for Text Search Engines".
Most likely it's a trie with lexicographical order.
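For what it's worth, you don't need a hand-rolled trie to get lexicographically ordered prefix matches; a sorted map gives the same lookup behaviour. A rough sketch (assuming no key contains the sentinel character):

import java.util.NavigableMap;
import java.util.TreeMap;

// Sketch: prefix matching with a sorted map, a simple stand-in for a
// trie. All keys starting with a prefix form one contiguous range in
// lexicographic order: [prefix, prefix + '\uffff').
public class PrefixMatcher {
    private final TreeMap<String, String> names = new TreeMap<String, String>();

    public void add(String name) {
        names.put(name.toLowerCase(), name);
    }

    public Iterable<String> match(String prefix) {
        String lo = prefix.toLowerCase();
        NavigableMap<String, String> range =
                names.subMap(lo, true, lo + '\uffff', false);
        return range.values();
    }
}

A real trie saves memory on shared prefixes, but the matching semantics are the same.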
There are a number of algorithms available: Boyer-Moore, Smith-Waterman-Gotoh, Knuth-Morris-Pratt, etc. You might also want to read up on edit distance algorithms such as Levenshtein. You will need to play around to see which best suits your purpose.
An implementation of such algorithms is the Simmetrics library by the University of Sheffield.
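For reference, Levenshtein distance itself is a short dynamic-programming routine; a textbook sketch (my own, not taken from Simmetrics):

// Sketch: classic two-row dynamic-programming Levenshtein distance.
public class EditDistance {
    public static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp;
        }
        return prev[b.length()];
    }
}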

Open source projects for email scrubbing: generating structured data from an unstructured source?

I don't know where to start on this one, so hopefully you can clear up my question. I have a project where email will be searched for specific words/patterns and the results stored in a structured manner, something like what TripIt does.
The article states that they developed a DataMapper:
The DataMapper is responsible for taking inbound email messages addressed to plans [at] tripit.com and transforming them from the semi-structured format you see in your mail reader into a highly structured XML document.
There is a comment that also states:
If you're looking to build this yourself, reading a little bit about Wrappers and Wrapper Induction might be helpful.
I googled and read about wrapper induction, but the definition was just too broad and didn't help me understand how one would go about solving such a problem.
Is there some open source project out there that does similar things?
There are a couple of different things you can do to accomplish this.
The first part, getting access to the email content, I won't answer here. Basically, I'll assume that you have access to the text of the emails; if you don't, there are libraries such as Camel (http://camel.apache.org/mail.html) that let you connect Java to a mailbox.
So now you've got the email; then what?
A handy thing that could help is that LingPipe (http://alias-i.com/lingpipe/) has an entity recognizer that you can populate with your own terms. Specifically, look at some of their extraction tutorials and their dictionary extractor (http://alias-i.com/lingpipe/demos/tutorial/ne/read-me.html). Inside the LingPipe dictionary extractor (http://alias-i.com/lingpipe/docs/api/com/aliasi/dict/ExactDictionaryChunker.html) you'd simply import the terms you're interested in and use it to associate labels with an email.
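As a rough sketch of how that looks, following the shape of the LingPipe tutorial (the terms and labels here are made up, and constructor details may differ between LingPipe versions):

import com.aliasi.chunk.Chunk;
import com.aliasi.chunk.Chunking;
import com.aliasi.dict.DictionaryEntry;
import com.aliasi.dict.ExactDictionaryChunker;
import com.aliasi.dict.MapDictionary;
import com.aliasi.tokenizer.IndoEuropeanTokenizerFactory;

// Sketch: tag known terms in an email body with labels.
public class EmailTagger {
    public static void main(String[] args) {
        MapDictionary<String> dict = new MapDictionary<String>();
        dict.addEntry(new DictionaryEntry<String>("United Airlines", "AIRLINE"));
        dict.addEntry(new DictionaryEntry<String>("confirmation number", "FIELD"));

        ExactDictionaryChunker chunker = new ExactDictionaryChunker(
                dict, IndoEuropeanTokenizerFactory.INSTANCE,
                true /* return all matches */, false /* case sensitive */);

        String email = "Your United Airlines confirmation number is below.";
        Chunking chunking = chunker.chunk(email);
        for (Chunk c : chunking.chunkSet()) {
            System.out.println(email.substring(c.start(), c.end())
                    + " -> " + c.type());
        }
    }
}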
You might also find the following question helpful: Dictionary-Based Named Entity Recognition with zero edit distance: LingPipe, Lucene or what?
Really a very broad question, but I can try to give you some general ideas, which might be enough to get started. Basically, it sounds like you're talking about an elaborate parsing problem: scanning through the text and looking to apply meaning to specific chunks. Depending on what exactly you're looking for, you might get good mileage out of a few regular expressions to start; things like phone numbers, email addresses, and dates have fairly standard structures that should be matchable. Other data points might benefit from indicator words: the phrase "departing from" might signal that what follows is an address. The natural language processing community also has a large tool set available for text processing; check out things like part-of-speech taggers and semantic analyzers if they're appropriate to what you're trying to do.
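To make the regular-expression starting point concrete, here's a minimal sketch (the patterns are deliberately simplified examples, not production-grade):

import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: scan an email body for data points with standard shapes.
public class EmailScrubber {
    private static final Pattern EMAIL =
            Pattern.compile("[\\w.+-]+@[\\w-]+\\.[\\w.]+");
    private static final Pattern PHONE =
            Pattern.compile("\\(?\\d{3}\\)?[ .-]?\\d{3}[ .-]?\\d{4}");
    private static final Pattern DATE =
            Pattern.compile("\\d{1,2}/\\d{1,2}/\\d{2,4}");

    public static void scan(String body) {
        for (Pattern p : new Pattern[] { EMAIL, PHONE, DATE }) {
            Matcher m = p.matcher(body);
            while (m.find()) {
                System.out.println(p.pattern() + " -> " + m.group());
            }
        }
    }
}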
Armed with those techniques, you can follow a basic iterative development process: for each data point in your expected output structure, define some simple rules for how to capture it. Then run the application over a batch of test data and see which samples didn't capture that datum. Look at those samples and revise your rules to catch them. Repeat until the extractor reaches an acceptable level of accuracy.
Depending on the specifics of your problem, there may be machine learning techniques that can automate much of that process for you.

"Did you mean?" feature in Lucene.net

Can someone please let me know how to implement a "Did you mean?" feature in Lucene.net?
Thanks!
You should look into the SpellChecker module in the contrib directory. It's a port of Java Lucene's SpellChecker module, so its documentation should be helpful.
(From the javadocs:)
Example Usage:
import java.io.File;
import org.apache.lucene.search.spell.LuceneDictionary;
import org.apache.lucene.search.spell.PlainTextDictionary;
import org.apache.lucene.search.spell.SpellChecker;

SpellChecker spellchecker = new SpellChecker(spellIndexDirectory);
// To index a field of a user index:
spellchecker.indexDictionary(new LuceneDictionary(my_lucene_reader, a_field));
// To index a file containing words:
spellchecker.indexDictionary(new PlainTextDictionary(new File("myfile.txt")));
String[] suggestions = spellchecker.suggestSimilar("misspelt", 5);
AFAIK Lucene supports fuzzy search, meaning that a query like:
field:stirng~0.5
(that's a tilde sign)
will match "string". The float sets how tolerant the match is, where 1.0 is an exact match and 0.0 matches just about everything.
Different query parsers will, however, implement this differently.
A fuzzy search is much slower than a prefix search (stri*), so use it with caution. In your case, one would assume that if a regular search finds no matches, you try a fuzzy search to see what turns up, and present a "did you mean" based on the results somehow.
It might be useful to cache this sort of lookup for very common misspellings, for performance reasons.
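With the Lucene 3.x-era API, that fallback could look roughly like this (the 0.5f mirrors the ~0.5 query syntax; the field name is a placeholder):

import org.apache.lucene.index.Term;
import org.apache.lucene.search.FuzzyQuery;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TopDocs;

// Sketch: if the exact search returned nothing, retry with a fuzzy
// query and build the "did you mean" suggestion from its top hits.
public class FuzzyFallback {
    public static TopDocs suggest(IndexSearcher searcher,
                                  String field, String text) throws Exception {
        FuzzyQuery fuzzy = new FuzzyQuery(new Term(field, text), 0.5f);
        return searcher.search(fuzzy, 10);
    }
}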
Google's "Did you mean?" is (probably; they're secretive, of course) implemented by consulting their query logs: they look at whether people who searched for the query you're processing searched for something very similar soon after. If so, it indicates they made a mistake and realized what they ought to be searching for.
Since you probably don't have a huge query log, you could approximate it. Take the query, split up the terms, and see if there are any similar terms in the database (by edit distance, or whatever); replace your terms with those nearby terms and rerun the query. If you get more hits, that was probably a better query, so suggest it to the user. (And since you've already got the hits, and most people only look at the top two results, show them those.)
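A rough sketch of that approximation, reusing the SpellChecker from the answer above (re-running the rewritten query and comparing hit counts is left to the caller):

import org.apache.lucene.search.spell.SpellChecker;

// Sketch: replace each query term with its nearest indexed neighbour.
public class QueryRewriter {
    public static String rewrite(SpellChecker spell, String query) throws Exception {
        StringBuilder sb = new StringBuilder();
        for (String term : query.split("\\s+")) {
            String[] nearby = spell.suggestSimilar(term, 1);
            sb.append(nearby.length > 0 ? nearby[0] : term).append(' ');
        }
        return sb.toString().trim();
    }
}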
Take a look at the Google Code project called semanticvectors.
There's a decent amount of discussion on the Lucene mailing lists about using it for functionality like what you're after; however, it is written in Java.
You will probably have to parse your search logs and run some machine learning algorithms over them to build a feature like this!
