Hierarchical Autosuggest

I am designing an autosuggest feature for a quick-search box. Suggestions will include small icons, multiline text, etc. The application handles orders. The search field will recognize a variety of meaningful terms, e.g. customer surname, order ID, etc. But when an order ID is entered, I want users to have the opportunity to view either the order or the person. I was thinking that I would like a hierarchy within the list: if I type 1234 and it matches 5 orders for 3 different people, the 3 people are returned at the top level, with their 5 orders underneath the respective customer.
Quick mockup:
Has anyone seen something like this implemented elsewhere? Don't want to re-invent the wheel. Also interested in any other feedback.
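To make the idea concrete, the grouping described above could be sketched roughly as follows. This is only an illustration: the sample orders, the customer names, and the `suggest` function are made up for the example, and the substring match stands in for whatever lookup the real application uses.

```python
# Hypothetical sample data: (order_id, customer_name) pairs.
ORDERS = [
    ("12345", "Bob Jones"),
    ("81234", "Paul Smith"),
    ("12340", "Bob Jones"),
    ("51234", "Ann Lee"),
    ("99999", "Ann Lee"),
    ("11234", "Paul Smith"),
]

def suggest(query, orders=ORDERS):
    """Return matching orders grouped under their customer.

    Customers appear in the order of their first matching order
    (dicts keep insertion order in Python 3.7+).
    """
    groups = {}
    for order_id, customer in orders:
        if query in order_id:  # assumed matching rule: plain substring
            groups.setdefault(customer, []).append(order_id)
    return list(groups.items())

if __name__ == "__main__":
    for customer, order_ids in suggest("1234"):
        print(customer)                   # top level: the person
        for order_id in order_ids:
            print("    Order " + order_id)  # second level: their orders
```

With this sample data, typing 1234 yields three customers at the top level and five orders beneath them.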

Answer to your question: No, haven't seen this elsewhere.
Feedback on your mockup:
I would say that it is a pretty creative autosuggest solution.
However, I think it is overkill. If I just want to navigate quickly to the Order page by searching for a specific order ID (expecting only one result in the autosuggest), but the autosuggest shows five order items under three people (as in your mockup), I think that is way too much, performance aside.
My idea:
Each autosuggest item contains one Primary Line that can clearly identify the item and additional Details Line(s) that provide more description about the item, similar to Google's search result page and Facebook search autosuggest.
For example, the autosuggest shows up each item like this when users search for an order:
(Order Icon) 23-34534
Loaf of Bread, Soda and more.
By Bob Jones, Paul Smith and others.
You can make each order item (Loaf of Bread, Soda, more) link to the respective order item line in the Order page, and each person name to the respective person page. This method is more concise and takes less space than your mockup while still providing the functionality that you want.
Sometimes, simple is better, less is more. Remember the KISS principle. Think of Apple iPod and iPhone as examples.

Related

The user needs to pick an option from a 100-item list. What is the best way to make this happen in Dialogflow?

I have a bot where the user needs to choose a specific item from a big list of over 100 items. How can I do this in Dialogflow? I used a Webview for this on FB Messenger, but in Dialogflow this is not an option.
Any thoughts?

This may be a bit harsh, but my first thought is that you need to rethink your design. I can think of very few cases where I would be presented with a list of 100 items, need to pick a specific one from them, and would find that a pleasant experience.
Imagine walking into a restaurant and the waiter, instead of handing you a menu, started reciting every item that is available. Even menus are broken into sections for a reason.
If the users have an idea what they need to choose, give them the option to type that in at any point, of course. (For example, if they need to pick a country from a list.)
But otherwise, help them narrow down their choices to only what applies, in the same way a waiter might help you narrow down choices by asking if you wanted fish, beef, or something else.

What person and mood should I use in Gherkin/Specflow Given/When/Then statements?

I am a bit confused with the way people write statements in the Gherkin language to describe various actions performed for acceptance testing.
In some articles people use "I" and in some articles people use "User".
The same is the case for reaction (Then) statements:
Case 1 --> xyz page should be displayed
Case 2 --> xyz page is displayed
Ex 1:
Given statement abc
When user performs action A
Then screen xyz should be displayed
Ex 2:
Given statement abc
When I perform action A
Then screen xyz is displayed
Is it better to write "user" or "I", and is it better to write "should be" or "is", so that my BDD scenarios are presentable and correct as per standards?
References to any article would also be a great help. Thanks in advance.

Both are correct, and have different benefits.
Dan North, who invented BDD, says he prefers 1st person ("I"), as it allows him to put himself in the user's shoes. However, he's often used 3rd person ("he / she / the customer") as he does in his introductory article.
The first-person use can help to make a scenario fit with the standard story template:
As <a stakeholder>
I want <something>
So that <goal>.
If the stakeholder is the user, then it makes sense to use "I" again in the scenario.
However, sometimes scenarios' outcomes aren't really for the benefit of the user.
As the moderator of the site
I want users to prove that they're human
So that I can limit spam.
In this case, it would be odd to put the scenario in the perspective of the user, because the user doesn't really want to be filling in that captcha box. We'd probably use 3rd person here.
Given an odd-looking number "31" on a door frame
When the user identifies the number as "31"
Then the system should authenticate them as being human.
You may also find that you have more than one stakeholder whose outcomes are important. In that case, putting the scenario in the 3rd person can help to spot any other outcomes or important stakeholders that might not have been included.
Given Suzanne searches for a taxi for 4pm to take her to hospital
And the estimated price is $23
When she books the taxi
Then she should get a confirmation email
And the driver should be notified of the trip
And she should be charged $23.
Because Suzanne, the driver, and Uber are all involved in this scenario, it makes more sense to put them in the 3rd person.
I tend to prefer the 3rd person, especially for large products with a lot of scenarios, as I find it confusing to have to switch 1st person roles, and it allows for consistency. It also means you can give the actors in the scenarios memorable names and talk about them more easily ("The one where Clarence Clumsy types his number in wrong", for example).
However, remember that when you're talking to your stakeholders to get hold of these scenarios, the most important thing is the conversation. Write down their words as closely as you can, and only compromise the language afterwards when you come to rephrase it using Gherkin.

How to determine if a piece of text mentions a product

I'm new to natural language processing, so I apologize if my question is unclear. I have read a book or two on the subject and done general research on various libraries to figure out how I should be doing this, but I'm not yet confident that I know what to do.
I'm playing with an idea for an application, and part of it is trying to find product mentions in unstructured text (e.g. tweets, Facebook posts, emails, websites, etc.) in real time. I won't go into what the products are, but it can be assumed that they are known (stored in a file or database). Some examples:
"starting tomorrow, we have 5 boxes of #hersheys snickers available for $5 each - limit 1 pp" (snickers is the product from the hershey company [mentioned as "#hersheys"])
"Big news: 12-oz. bottles of Coke and Pepsi on sale starting Fri." (coca-cola is the product [aliased as "coke"] from coca-cola company and Pepsi is the product from the PepsiCo company)
"#OMG, i just bought my dream car. a mustang!!!!" (mustang is the product from Ford)
So basically, given a piece of text, query the text to see if it mentions a product and receive some indication (boolean or confidence number) that it does mention the product.
Some concerns I have are:
Missing products because of misspellings. I thought maybe I could use a string similarity check to catch these.
Product names that are also English words or things would get caught, like mustang the horse versus mustang the car.
Needing to keep a list of alternative names for products (e.g. "coke" for "coca-cola", etc.)
I don't really know where to start with this, but any help would be appreciated. I've already looked at NLTK and SciKit and didn't really glean how to do this from there. If you know of examples or papers that explain this, links would be helpful. I'm not tied to any language at this point. Java preferably, but Python and Scala are acceptable.

The answer that you chose is not really answering your question.
The best approach you can take is to use a Named Entity Recognizer (NER) and a POS tagger (grab NNP/NNPS, i.e. proper nouns). The database there might be missing some new brands like Lyft (Uber's rival), but without developing your own prop database, the Stanford tagger will solve half of your immediate needs.
If you have time, I would build a dictionary that has every brand name and simply match it against the tweet strings.
http://www.namedevelopment.com/brand-names.html
If you know how to crawl, it's not a hard problem to solve.
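As a rough sketch of the dictionary approach, here is one way it might look in Python. The brand table, its aliases, and the function names are invented for illustration; a real dictionary would be built from a crawled source such as the list linked above.

```python
import re

# Hypothetical brand dictionary: canonical name -> known aliases (lowercase).
BRANDS = {
    "Coca-Cola": ["coca-cola", "coke"],
    "Pepsi": ["pepsi"],
    "Snickers": ["snickers"],
    "Mustang": ["mustang"],
}

# Reverse lookup from alias to canonical brand.
ALIAS_TO_BRAND = {alias: brand for brand, names in BRANDS.items() for alias in names}

def build_pattern(aliases):
    """Compile one regex matching any alias as a whole word, case-insensitively."""
    # Longest aliases first so a longer alias wins over a shorter prefix.
    ordered = sorted(aliases, key=len, reverse=True)
    body = "|".join(re.escape(a) for a in ordered)
    return re.compile(r"(?<!\w)(" + body + r")(?!\w)", re.IGNORECASE)

PATTERN = build_pattern(ALIAS_TO_BRAND)

def find_brands(text):
    """Return the canonical brands mentioned in text, in order of first appearance."""
    found = []
    for match in PATTERN.finditer(text):
        brand = ALIAS_TO_BRAND[match.group(1).lower()]
        if brand not in found:
            found.append(brand)
    return found

if __name__ == "__main__":
    print(find_brands("Big news: 12-oz. bottles of Coke and Pepsi on sale starting Fri."))
```

Note that plain dictionary matching cannot tell mustang the car from mustang the horse; it only reports that the string occurs.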

It looks like your goal is to classify linguistic forms in a given text as references to semantic entities (which can be referred to by many different linguistic forms). You describe a number of subtasks which should be done in order to get good results, but they nevertheless are still independent tasks.
Misspellings
In order to deal with potential misspellings of words, you need to associate these possible misspellings to their canonical (i.e. correct) form.
Phonetic similarity: A common cause of "misspellings" is opacity in the relationship between a word's phonetic form (i.e. how it sounds) and its orthographic form (i.e. how it's spelled). A good way to address this is therefore to index terms phonetically so that e.g. innovashun is associated with innovation.
Form similarity: Additionally, you could do a string similarity check, but you may introduce a lot of noise into your results which you would have to address because many distinct words are in fact very similar (e.g. chic vs. chick). You could make this a bit smarter by first morphologically analyzing the word and then using a tree kernel instead.
Hand-made mappings: You can also simply make a list of common misspelling → canonical_form mappings. This would work well for "exceptions" not handled by the above methods.
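The three techniques above can be combined in a small lookup, sketched here in Python. Everything in it is an assumption made for illustration: the rewrite rules are a toy stand-in for an established phonetic algorithm (Soundex, Metaphone, etc.), the exception table and threshold are arbitrary, and `canonicalize` is a hypothetical helper. String similarity uses `difflib.SequenceMatcher` from the standard library.

```python
import re
from difflib import SequenceMatcher

# Toy phonetic rewrite rules, chosen only for this example.
RULES = [("tion", "shun"), ("ph", "f"), ("ck", "k"), ("c", "k"), ("z", "s")]

# Hypothetical hand-made mapping: misspelling -> canonical form.
EXCEPTIONS = {"coco-cola": "coca-cola"}

def phonetic_key(word):
    """Reduce a word to a crude sound-based key."""
    key = word.lower()
    for old, new in RULES:
        key = key.replace(old, new)
    key = re.sub(r"(.)\1+", r"\1", key)  # collapse repeated letters
    if not key:
        return ""
    return key[0] + re.sub(r"[aeiouy]", "", key[1:])  # drop non-initial vowels

def similarity(a, b):
    return SequenceMatcher(None, a, b).ratio()

def canonicalize(word, vocabulary, threshold=0.85):
    """Map word to a vocabulary entry, or None if nothing is close enough."""
    word = word.lower()
    if word in EXCEPTIONS:                      # hand-made mappings first
        return EXCEPTIONS[word]
    if word in vocabulary:
        return word
    for entry in vocabulary:                    # then phonetic similarity
        if phonetic_key(entry) == phonetic_key(word):
            return entry
    best = max(vocabulary, key=lambda v: similarity(word, v), default=None)
    if best is not None and similarity(word, best) >= threshold:
        return best                             # finally form similarity
    return None

if __name__ == "__main__":
    vocab = ["innovation", "snickers"]
    print(canonicalize("innovashun", vocab))
    print(canonicalize("snickets", vocab))
```

The noise mentioned above shows up here too: chic and chick get the same toy phonetic key and a high similarity ratio, so a real system would need context to keep them apart.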
Word-sense disambiguation
Mustang the car and Mustang the horse are the same form but refer to entirely different entities (or rather classes of entities, if you want to be pedantic). In fact, we ourselves as humans can't tell which one is meant unless we also know the word's context. One widely-used way of modelling this context is distributional lexical semantics: Defining a word's semantic similarity to another as the similarity of their lexical contexts, i.e. the words preceding and succeeding them in text.
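A very small sketch of this idea in Python: represent each occurrence of a word by the words around it, and compare that context with example contexts for each sense. The example sentences, the window size, and the function names are all invented for illustration; real systems learn these context profiles from large corpora.

```python
import math
import re
from collections import Counter

def context_vector(text, target, window=3):
    """Count the words within `window` positions of each occurrence of target."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for i, token in enumerate(tokens):
        if token == target:
            counts.update(tokens[max(0, i - window):i])   # preceding words
            counts.update(tokens[i + 1:i + 1 + window])   # succeeding words
    return counts

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical example text for each sense of "mustang".
SENSES = {
    "car": "i bought a ford mustang with a new engine . the mustang car drives fast",
    "horse": "a wild mustang horse ran across the plain . the herd followed the mustang",
}

def guess_sense(text, target="mustang"):
    """Pick the sense whose example contexts are most similar to this one."""
    vec = context_vector(text, target)
    profiles = {sense: context_vector(example, target) for sense, example in SENSES.items()}
    return max(profiles, key=lambda sense: cosine(vec, profiles[sense]))

if __name__ == "__main__":
    print(guess_sense("#OMG, i just bought my dream car. a mustang!!!!"))
    print(guess_sense("the wild mustang ran across the field"))
```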
Linguistic aliases (synonyms)
As stated above, any given semantic entity can be referred to in a number of different ways: bathroom, washroom, restroom, toilet, water closet, WC, loo, little boys'/girls' room, throne room, etc. Simple forms like these, which refer to generic entities, can often be treated as variant spellings in the same way that "common misspellings" are, and can be mapped to a "canonical" form with a list. For ambiguous references such as throne room, other metrics (such as lexical-distributional methods) can also be included in order to disambiguate the meaning, so that you don't relate e.g. I'm in the throne room just now! to The throne room of Buckingham Palace is beautiful.
Conclusion
You have a lot of work to do in order to get where you want to go, but it's all interesting stuff and there are already good libraries available for doing most of these tasks.

Accurate algorithm for normalizing taxonomy terms?

I'm developing a shopping comparison website, and the project is at a very advanced stage. We index 50 million products daily using merchant feeds from various affiliate networks. Most of the problems I had are already solved, including the majority of the performance bottlenecks.
What is my problem: First of all, we are using Apache Solr with Drupal, BUT this problem IS NOT specific to Drupal or Solr; if you have no knowledge of them, it doesn't matter.
We receive product feeds from over 2000 different merchants, and those feeds are a mess. They have no specific pattern; each merchant sends the feeds the way they want. We have already solved many problems regarding this, but one remains: normalizing the taxonomy terms for the faceted browsing functionality.
Suppose that I have a "Narrow by Brands" browsing facet on my website. Now suppose that 100 merchants offer products from Microsoft. Now comes the problem. Some merchants put "Microsoft" in the "Brands" column of the data feed, others "Microsoft, Inc.", others "Microsoft Corporation", others "Products from Microsoft", etc. There is no specific pattern between merchants and, worse, some individual merchants are so sloppy that they have different strings for the same brand IN THE SAME DATA FEED.
We do not want all those different brands appearing in the navigation. We have a manual solution to the problem, where we map the imported brands by hand to the "good" brands table ("Microsoft Corporation" -> "Microsoft", "Products from Microsoft" -> "Microsoft", etc.). We have something like 10,000 brands in the database, and this is doable. The problem comes with bigger things like "Authors". When we import books into the system, there are over 800,000 authors; we have the same problem, and mapping them by hand is not doable. The problem is the same: "Tom Mike Apostol", "Tom M. Apostol", "Apostol, Tom M.", etc.
Does anybody know a good way to automatically solve this problem with an acceptable degree of accuracy (85%-95% accuracy)?
Thank you for the help!

Here is an idea that comes to mind, although it's just a loose thought:
1. Convert names to initials (in your example: TMA). Treat '-' as a space, so e.g. Antoine de Saint-Exupéry would be ADSE. The problem is how to treat ",", although its common usage is to put the surname before the forenames, so just swapping positions should work (A,TM becomes TM,A; drop the comma to get TMA).
2. Filter the authors in the database by those initials.
3. For each initial, if you have the whole name (Tom, Apostol), check whether it matches; otherwise (M.) consider it a match automatically.
4. If you want some tolerance, you can compare names with Levenshtein distance and tolerate some differences (here you have Oracle implementation).
5. Treat names that match as the same author. To find the whole name, for each initial (T, M, A) look through your filtered authors (after step 2) and try to find one with the whole name (Mike) rather than just the initial (M.); if you can't find one, use the initial. Each of the examples you gave would therefore be converted to the same value, the full name (Tom Mike Apostol).
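The initials-based matching described above could be sketched in Python along these lines. It is a minimal illustration under several assumptions: the function names are made up, a name contains at most one comma, the Levenshtein tolerance is left out, and each name is compared only against the first member of a group.

```python
import re
from collections import defaultdict

def tokens(name):
    """Split a name into parts, moving a leading 'Surname,' to the end."""
    if "," in name:
        surname, forenames = name.split(",", 1)
        name = forenames + " " + surname
    parts = re.split(r"[\s\-]+", name.strip())   # '-' is treated as a space
    return [p.strip(".") for p in parts if p.strip(".")]

def initials(name):
    return "".join(t[0].upper() for t in tokens(name))

def compatible(a, b):
    """A bare initial matches any name starting with it; full names must be equal."""
    if len(a) == 1 or len(b) == 1:
        return a[0].lower() == b[0].lower()
    return a.lower() == b.lower()

def same_author(x, y):
    tx, ty = tokens(x), tokens(y)
    return len(tx) == len(ty) and all(compatible(a, b) for a, b in zip(tx, ty))

def canonical(names):
    """Build the fullest spelling: the longest part seen at each position."""
    columns = zip(*(tokens(n) for n in names))
    return " ".join(max(column, key=len) for column in columns)

def group_authors(names):
    buckets = defaultdict(list)
    for name in names:                      # filter candidates by initials
        buckets[initials(name)].append(name)
    result = []
    for bucket in buckets.values():
        clusters = []
        for name in bucket:                 # compare part by part
            for cluster in clusters:
                if same_author(name, cluster[0]):
                    cluster.append(name)
                    break
            else:
                clusters.append([name])
        result.extend(canonical(c) for c in clusters)
    return result

if __name__ == "__main__":
    print(group_authors(["Tom Mike Apostol", "Tom M. Apostol", "Apostol, Tom M."]))
```

Because a bare initial matches automatically, two different authors who share initials and whose records show only initials will be merged; that is the accuracy trade-off of this approach.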
Things worth thinking about:
Include mappings for name synonyms (most likely a hundred records at most, like Thomas <-> Tom).
With this approach it is crucial that the initials are valid (no M instead of N, etc.).
Edit: I coded something like this some time ago, when I had to identify a person by their signature, ignoring scanning problems. People sometimes sign as Name S. Surname, as N.S., or just as Name Surname (which is another thing you should maybe consider in the solution: allowing the algorithm to ignore the second name, although in your situation I guess it would be rather rare for someone's second name to be omitted).

Organizing Lots of Data in Search Results

I'm working on a pretty basic web app (not much more than CRUD stuff). However, the requirements call for a bunch of data to be displayed with each item in the search results - IDs, dates, email addresses, long descriptions... too much to fit neatly into a simple grid, and too dissimilar to make them flow together (like the natural language example from this article.)
Is there a design pattern for attractively displaying many descriptive fields with each search result?
(Please don't tell me to just remove some fields from the results; that's not an option for this project.)

Obviously there are many ways you can handle this, and to a degree it's a factor of your information design abilities and preferences.
Natural Data Groupings
What I would do is try to organize your data into a small number of "buckets." You state that the data are too dissimilar to be arranged into a sentence, but it's likely you can create a few logical groups. Since we can't see all your data, I'll guess that you have information about a person (email, name, ID?), about some sort of event (dates? type?), or maybe about some kind of object related to the person (orders? classes?). Whatever they are, some of the data will be more closely related to each other than others.
Designing in Chunks
Take each loose "bucket" and design a kind of "plate" -- a grouping just for the information in that bucket. The design problem within this constrained chunk is easier to tackle: maybe it's a little table-like layout, maybe it's something non-tabular, like the stackoverflow user "nameplate". Maybe long textual data have their own plates, or maybe they're grouped into a single plate, but with a preview/detail click-for-more arrangement.
Using a Grid
Now that you have a small number of "plates," go back to a grid-like approach for your overall search result row design. Arrange the plates as units within the row, and be sure to keep them aligned. Following an overall grid (HTML table or otherwise) for the plates will avoid an "information soup" problem. You'll have clean columns that scan well, and a readable, natural information hierarchy. The natural language example you cite would indeed be difficult to parse if it were one of many rows displayed in a search results grid.
Consistency
Be sure to use a common "design vocabulary" when you're working on the chunks -- consistent styling of labels, consistent spacing... so when everything's displayed, despite the bulk of information, it all feels like it's part of the same family.
It's an interesting design exercise. Many comps, lots of iteration, and some brainstorming should get you where you need to be.

It probably depends on the content you're displaying. Look at the StackOverflow layout for this question. It has Votes, Title, Description, Tags, Author, etc. The content wouldn't work well in a grid for sure, nor does it flow nicely on its own.
I think it's time to get creative ;)

No one ever thinks about what this is going to look like on their screen, do they?
One thing you can do is truncate the displayed text, and then display the expanded version in a tooltip on hover, or after the user clicks on it.
For example, display only the two-letter state abbreviation but show the full state name on hover.
Or, to save even more space, only display the state abbreviation, and put the entire address in the tooltip.
For long descriptions, you can display only the first few characters, followed by an ellipsis or the word "More". Then, show the full text either on hover or on click.
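On the server side, the truncation could be a small helper like the one below. This is only a sketch: the `preview` name, the default limit, and the "..." suffix are arbitrary choices, and how the full text is attached to a tooltip depends on your UI framework.

```python
def preview(text, limit=40, suffix="..."):
    """Return (short, full): short goes in the cell, full goes in the tooltip."""
    full = " ".join(text.split())      # normalize whitespace
    if len(full) <= limit:
        return full, full
    # Cut at the limit, then back up to the last space so no word is split.
    cut = full[:limit].rsplit(" ", 1)[0]
    return cut + suffix, full

if __name__ == "__main__":
    short, full = preview("The quick brown fox jumps over the lazy dog", limit=12)
    print(short)   # shown in the results row
    print(full)    # shown on hover or click
```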
One disadvantage of the hover approach is that you can't sort the column on that text. There's nothing for the user to click to request the sort.
