Problem creating a hispanic person in dialogflow - dialogflow-es

So,
Due to cultural differences, people in Hispanic countries have quite a number of surnames.
Taking someone else's surname isn't the norm; in most cases you just combine your surnames:
1st husband, 1st bride, 2nd husband, 2nd bride, 3rd husband, 3rd bride, 4th husband, 4th bride.
You have to add a second surname to get Spanish nationality, and some people just repeat their last name because they refuse to understand how culturally important this is in Spain.
Athletic Bilbao can get away with saying all of their players have Basque origins if they start tracing back the multiple surnames, and they have been known to do so, or to approach foreign players with Basque surnames among the never-ending list to ask if they would be interested in joining.
This can be quite problematic in some cases but it makes it easy to differentiate people:
There can be a large number of Thomas Smiths in your city, but there are hardly ever two Thomas Smith matchingCommonSecondSurname in the same area.
Because of this, people in Hispanic countries are used to using at least two of their surnames unless their name is unique enough.
On to my issue:
My dialogflow agent asks someone to identify themselves in order to provide some extra information to the business.
I have added multiple examples with several surnames. They are identified correctly by the training process, but the agent struggles with them in actual conversation: it picks either the second surname or the person's first surname as the entity, never the full name.
Neither option is valid in a Hispanic country, which is where I would be using this solution.
Anything I can do to improve this?
Creating a custom entity for a person seems like an arduous task to me.
It is not vital and I could do without the extra tidbit as I am storing their email already. It just seems like a basic thing that should be doable and I am struggling to believe I am the first person to face this issue.

Related

Confusion with entities and aggregate roots for patients, dentists, treatments and medical history

I am new to DDD and decided to practice it with a dental clinic system I'm developing, but I'm struggling with modeling the domain so an extra pair of eyes will be greatly appreciated.
For this dentistry system, the domain expert told me that a patient holds only one medical history. The medical history must have a Record Number which is unique in the system. The medical history holds dental treatments the patient could have (like planned treatments) as well as treatments that the patient already had. Every treatment has a price, and so the medical history contains a Total price (based on planned/applied treatments). Whenever a patient gets a treatment done, he/she will have to pay at least 50% of that treatment's price, meaning he/she will eventually pay the rest of it at future appointments (if no treatment plan exists, he/she will have to pay 100% of the price). Finally, this dentistry clinic gives patients the option to pay in different currencies, because sometimes a patient who comes in for the day has only euros, but then decides he wants a plan and will pay in pounds at future appointments.
Based on all this, and my beginner knowledge of DDD, my first thinking is that I have these entities:
Patient
Treatment
Dentist
I will have several value objects, but the most important ones might be:
Money (for prices and currency)
Signature (for applied treatments)
Tooth or Teeth (used on Treatment entity)
And I can only find one aggregate which is Medical history since it puts together patient info, as well as treatments (planned and applied). But this will mean that whenever I update a Medical History, I will have to update patient info and treatments, even if one of those never changed. Patients could change their personal information, which will be reflected in medical history, but it doesn't affect treatments.
I am a bit confused on how to model this. Please help!
Remember that Aggregates, and by extension Bounded Contexts (BC), are a grouping of data and business logic that belong together (and most likely things that need to change transactionally). The data that an aggregate contains is there because the business logic needs it, not because some application screen needs it. This is very important to clear up some confusion and to free you of some constraints in order to design your aggregates.
For example, when you display the Medical History to the user, you might want to show the Patient's name, address, age and so on, and also the treatments prices, but if you think about it, you don't need any of this to manage the Medical History. From what you say, the Medical History has a Record Number, a PatientId, and a list of TreatmentIds with maybe the Dates when they were done.
When you want to display the Medical History to the user, you can use UI Composition. So, you get the Medical History (which is mostly a bunch of Ids and dates). Then from the Medical History's PatientId, you can get the Patients's information from the BC that owns it. From the TreatmentIds, you can get the Treatment descriptions from some BC that owns that and their prices from the BC they belong to.
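As a rough illustration of the UI Composition idea above, here is a minimal Python sketch. The class names, the three dictionaries standing in for the owning contexts, and the sample data are all assumptions made for the example; a real system would call whatever service or repository each BC exposes.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class TreatmentEntry:
    treatment_id: str
    performed_on: date


@dataclass
class MedicalHistory:
    # Mostly a bunch of ids and dates, as described above.
    record_number: str
    patient_id: str
    entries: list[TreatmentEntry]


# Hypothetical stand-ins for the contexts that own each piece of data.
PATIENTS = {"p-1": {"name": "Ana Pérez"}}
TREATMENT_DESCRIPTIONS = {"t-9": "Root canal"}
TREATMENT_PRICES = {"t-9": "120.00 EUR"}


def compose_history_view(history: MedicalHistory) -> dict:
    """Build the screen model by asking each owner for its own data."""
    return {
        "record_number": history.record_number,
        "patient_name": PATIENTS[history.patient_id]["name"],
        "treatments": [
            {
                "description": TREATMENT_DESCRIPTIONS[entry.treatment_id],
                "price": TREATMENT_PRICES[entry.treatment_id],
                "date": entry.performed_on.isoformat(),
            }
            for entry in history.entries
        ],
    }
```

The point of the sketch is that MedicalHistory itself never stores the patient's name or a price; those are looked up only when a screen needs them.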
So, based on that, you can build your aggregates not based on the "relevant names" on your domain like Patient, Treatment or Dentist, but by the business logic they implement.
This is just wild guessing, but I can think of:
BC Marketing (for lack of a better name): Contains the descriptions of all treatments, information about the Dentists, Information about the rooms and materials, etc. So, texts, pictures and other details.
BC Finances: Contains information about the prices of each treatment, payment records of each payment, credits and debits of each patient, etc. In charge of keeping track of all these things. For example, it could know when a treatment starts/ends and depending on the Patient's record, require 50 or 100% payment. There's no need of direct relation to the Medical History here, it only needs to know if it's the first treatment or not.
BC Scheduling: In charge of scheduling new treatments and keep track of when they start and finish. This could contain the History, or it could potentially be somewhere else if necessary.
BC Medical: In charge of keeping all the medical records, allergies, medical details of the status of the teeth, etc.
BC Patients Care: In charge of tracking patients' information, name, nationality, contact details, etc.
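To make the payment rule mentioned for BC Finances a bit more concrete, here is a small Python sketch. It follows the rule as stated in the question (at least 50% when a treatment plan exists, 100% otherwise); the function name and the `has_treatment_plan` flag are assumptions for illustration, and the real condition may well be different.

```python
from decimal import Decimal


def minimum_payment(treatment_price: Decimal, has_treatment_plan: bool) -> Decimal:
    """Return the smallest amount due when a treatment is performed."""
    # Assumed rule: half up front with a plan, full price without one.
    required_fraction = Decimal("0.5") if has_treatment_plan else Decimal("1")
    return (treatment_price * required_fraction).quantize(Decimal("0.01"))
```

Note that the function needs only a price and one boolean fact about the patient; it does not need the Medical History itself.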
Once you have an idea of the Bounded Context you can define the aggregates. There can be one or more per BC. Also, some things might not be an aggregate. For example, the Medical History might not require an actual aggregate if it's basically a record of treatment Ids and the dates they were made and there's no business logic associated (the history is not going to deny a treatment, have opinions on when a treatment should happen and so on, it's just a history).
Don't take this as a recommended design, but just as a thought process to come up with your own solution.
Entities have an Id, whereas Value Objects have structural equality, which means that if two value objects have the same values then they are the same.
In case of Money, there is no difference between two $5 bills, so it can be a value object.
You have not described the role and attributes of Tooth and Signature.
In case of Tooth, does it matter whose tooth it is? Can you replace a patient's tooth with any other tooth which has the same attributes? If it does matter, then Tooth requires an Id and is therefore an entity.
In case of Signature, how are you going to compare two signatures? Do you have an image recognition software that can compare the look of two Signatures and decide that they are the same? You might have two patients with similar looking signatures, should their signature be treated as the same?
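The difference between the two kinds of equality can be sketched in Python as follows. This assumes Tooth turns out to need an identity; the field names (`position`, `tooth_id`) are made up for the example.

```python
import uuid
from dataclasses import dataclass, field
from decimal import Decimal


@dataclass(frozen=True)
class Money:
    """Value object: equality is decided by the values alone."""
    amount: Decimal
    currency: str


@dataclass(eq=False)
class Tooth:
    """Entity: two teeth with equal attributes are still different teeth."""
    position: str
    tooth_id: str = field(default_factory=lambda: str(uuid.uuid4()))

    def __eq__(self, other: object) -> bool:
        # Identity comparison: only the id matters, not the attributes.
        return isinstance(other, Tooth) and self.tooth_id == other.tooth_id

    def __hash__(self) -> int:
        return hash(self.tooth_id)
```

With these definitions, two `Money(Decimal("5"), "USD")` instances compare equal, while two `Tooth("upper-left-6")` instances do not.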
If you choose Medical history to be an Aggregate, then you should treat it as one object. Do you want to load the entire Medical history, in order to add a new Treatment to it? Can a Treatment be associated with another Entity, such as Dentist? If you can use a portion of Medical history (such as Treatment) individually then it is not an aggregate.
Some good tutorials:
Entity vs Value Object by Vladimir Khorikov
Entities, Value Objects, Aggregates and Roots by Jimmy Bogard

What person and mood should I use in Gherkin/Specflow Given/When/Then statements?

I am a bit confused with the way people write statements in the Gherkin language to describe various actions performed for acceptance testing.
In some articles people use "I" and in some articles people use "User".
The same is the case for reaction (Then) statements:
Case 1 --> xyz page should be displayed
Case 2 --> xyz page is displayed
Ex 1:
Given statement abc
When user performs action A
Then screen xyz should be displayed
Ex 2:
Given statement abc
When I perform action A
Then screen xyz is displayed
Is it better to write "user" or "I", and is it better to write "should be" or "is", so that my BDD scenarios are presentable and correct as per standards?
References to any article would also be a great help. Thanks in advance.
Both are correct, and have different benefits.
Dan North, who invented BDD, says he prefers 1st person ("I"), as it allows him to put himself in the user's shoes. However, he's often used 3rd person ("he / she / the customer") as he does in his introductory article.
The first-person use can help to make a scenario fit with the standard story template:
As <a stakeholder>
I want <something>
So that <goal>.
If the stakeholder is the user, then it makes sense to use "I" again in the scenario.
However, sometimes scenarios' outcomes aren't really for the benefit of the user.
As the moderator of the site
I want users to prove that they're human
So that I can limit spam.
In this case, it would be odd to put the scenario in the perspective of the user, because the user doesn't really want to be filling in that captcha box. We'd probably use 3rd person here.
Given an odd-looking number "31" on a door frame
When the user identifies the number as "31"
Then the system should authenticate them as being human.
You may also find that you have more than one stakeholder whose outcomes are important. In that case, putting the scenario in the 3rd person can help to spot any other outcomes or important stakeholders that might not have been included.
Given Suzanne searches for a taxi for 4pm to take her to hospital
And the estimated price is $23
When she books the taxi
Then she should get a confirmation email
And the driver should be notified of the trip
And she should be charged $23.
Because both Suzanne, and the driver, and Uber, are all involved in this scenario, it makes more sense to put them in the 3rd person.
I tend to prefer the 3rd person, especially for large products with a lot of scenarios, as I find it confusing to have to switch 1st person roles, and it allows for consistency. It also means you can give the actors in the scenarios memorable names and talk about them more easily ("The one where Clarence Clumsy types his number in wrong", for example).
However, remember that when you're talking to your stakeholders to get hold of these scenarios, the most important thing is the conversation. Write down their words as closely as you can, and only compromise the language afterwards when you come to rephrase it using Gherkin.

Heuristic to predict Name or Company

Problem
We are receiving strings and they may either represent a company name or a person's name. We need a heuristic to determine this.
Initial thoughts
Use an XML doc with either a <Commercial>String</Commercial> or a <Personal>String</Personal> node and score matching strings +1 (sorry, don't know how to format XML in SO)
Can't just check for proper nouns, e.g. Bob's Company is a company whereas Bob Compton is a name
Need to return confidence level in some format. I can't think of how to do it as a percentage, all I can think to do is if it finds a match use an integer
Possible Commercial (all will be converted to lower case): co, co., inc, inc., etc (verbose versions of each)
I can get an English name list online
Question
Has anyone run into this kind of domain problem before? What methods did you use? Any flashy way of solving this?
Thank You.
I haven't done this before, but some other thoughts:
Check for non-proper nouns (e.g. "and", "the", "piping"). In fact, if you have an English dictionary and a names list, any word that is not a name could be a good pointer to a company name.
A big problem is that some companies are just named after a person(s). "Fred Meyer", "J.C. Penney", and "Lockheed Martin" are examples of companies that look just like human names. There's likely no really good way around this (probably nothing easy anyway). If you can categorize first and last names, a double last name or last name only might be a good reason to lower the certainty.
I would agree with your integer idea. Unless you can do some very broad and very thorough testing, your percentages would probably be meaningless. I would probably run all the tests (returning name, company, or unknown) and compare the results, adding up an integer based on consistency in results.
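A minimal Python sketch of the integer-scoring approach discussed above might look like this. The word lists are tiny placeholders (a real version would load a names list and a dictionary), and the weights are arbitrary assumptions rather than tuned values.

```python
import re

# Markers taken from the question; everything is compared in lower case.
COMPANY_MARKERS = {"co", "inc", "company", "incorporated"}
# Tiny illustrative lists; real ones would come from a names list and a dictionary.
KNOWN_FIRST_NAMES = {"bob", "fred", "mary", "john"}
COMMON_WORDS = {"and", "the", "piping"}


def classify(text: str) -> tuple[str, int]:
    """Return a label ('company', 'person' or 'unknown') and an integer confidence."""
    tokens = [t for t in re.split(r"[^a-z']+", text.lower()) if t]
    score = 0  # positive leans company, negative leans person
    for token in tokens:
        if token in COMPANY_MARKERS:
            score += 2
        elif token in COMMON_WORDS:
            score += 1
        elif token.endswith("'s"):
            score += 1  # possessive, as in "Bob's Company"
    if tokens and tokens[0] in KNOWN_FIRST_NAMES:
        score -= 1
    if score > 0:
        return "company", score
    if score < 0:
        return "person", -score
    return "unknown", 0
```

A string such as "Lockheed Martin" comes back as unknown here, which matches the caveat above: companies named after people cannot be told apart by word lists alone.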
Can you compare to a database of known company names?
E.g. in the UK: http://wck2.companieshouse.gov.uk
Of course, this doesn't help if it's actually someone's name, but there's a company with the same name.

Hierarchical Autosuggest

I am designing an autosuggest feature on a quick-search box. Suggestions will include small icons, multiline text, etc. The application is handling orders. The search field will recognize a variety of different meaningful terms, e.g. customer surname, order id, etc. But when an order ID is input, I want users to get an opportunity to view either the order or the person. I was thinking that I would like a hierarchy within the list, so if I type 1234 and it matches 5 orders for 3 different people, the 3 people are returned at the top level, with their 5 orders underneath the respective customer.
Quick mockup:
Has anyone seen something like this implemented elsewhere? Don't want to re-invent the wheel. Also interested in any other feedback.
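For reference, the grouping described in the question (customers at the top level, their matching orders underneath) could be sketched in Python like this. The `Order` shape and the substring match are assumptions for illustration only.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Order:
    order_id: str
    customer: str


def suggest(query: str, orders: list[Order]) -> list[tuple[str, list[str]]]:
    """Group the order ids that contain the query under their customer."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for order in orders:
        if query in order.order_id:
            grouped[order.customer].append(order.order_id)
    # Top level: customers in alphabetical order; second level: their matching orders.
    return [(customer, sorted(ids)) for customer, ids in sorted(grouped.items())]
```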
Answer to your question: No, haven't seen this elsewhere.
Feedback on your mockup:
I would say that it is a pretty creative autosuggest solution.
However, I think it is overkill. If I just want to quickly navigate to the Order page by searching for a specific Order ID (expecting only one result in the autosuggest), but the autosuggest shows five order items under three people (as in your mockup), I think that is way too much, performance aside.
My idea:
Each autosuggest item contains one Primary Line that can clearly identify the item and additional Details Line(s) that provide more description about the item, similar to Google's search result page and Facebook search autosuggest.
For example, the autosuggest shows up each item like this when users search for an order:
(Order Icon) 23-34534
Loaf of Bread, Soda and more.
By Bob Jones, Paul Smith and others.
You can make each order item (Loaf of Bread, Soda, more) link to the respective order item line in the Order page, and each person name to the respective person page. This method is more concise and takes less space than your mockup while still providing the functionality that you want.
Sometimes, simple is better, less is more. Remember the KISS principle. Think of Apple iPod and iPhone as examples.

Accurate algorithm for normalizing taxonomy terms?

I'm developing a shopping comparison website, and the project is in a very advanced stage. We index 50 million products daily using merchant feeds from various affiliate networks. Most of the problems I had are already solved, including the majority of the performance bottlenecks.
My problem: first of all, we are using Apache Solr with Drupal, BUT this problem IS NOT specific to Drupal or Solr; if you have no knowledge of them, it doesn't matter.
We receive product feeds from over 2000 different merchants, and those feeds are a mess. They have no specific pattern; each merchant sends the feeds the way they want. We have already solved many problems regarding this, but one remains: normalizing the taxonomy terms for the faceted browsing functionality.
Suppose that I have a "Narrow by Brands" browsing facet on my website. Now suppose that 100 merchants offer products from Microsoft. Now comes the problem. Some merchants put "Microsoft" in the "Brands" column of the data feed, others "Microsoft, Inc.", others "Microsoft Corporation", others "Products from Microsoft", etc. There is no specific pattern between merchants and, worse, some individual merchants are so sloppy that they have different strings for the same brand IN THE SAME DATA FEED.
We do not want all those different brands appearing in the navigation. We have a manual solution to the problem where we manually map the imported brands to the "good" brands table ("Microsoft Corporation" -> "Microsoft", "Products from Microsoft" -> "Microsoft", etc..). We have something like 10,000 brands in the database and this is doable. The problem is when it comes with bigger things like "Authors". When we import books into the system, there are over 800,000 authors and we have the same problem and this is not doable by hand mapping. The problem is the same: "Tom Mike Apostol", "Tom M. Apostol", "Apostol, Tom M.", etc...
Does anybody know a good way to automatically solve this problem with an acceptable degree of accuracy (85%-95% accuracy)?
Thank you for the help!
Here is an idea that comes to mind, although it's just a loose thought:
1. Convert names to initials (in your example: TMA). Treat '-' as a space, so e.g. Antoine de Saint-Exupéry would be ADSE. The problem here is how to treat ",", although its common usage is to put the surname before the forename, so just swapping positions should work (A,TM becomes TM,A; drop the comma to get TMA).
2. Filter the authors in the database by those initials.
3. For each initial, if you have the whole name (Tom, Apostol), check whether it matches; otherwise (M.) consider it a match automatically.
4. If you want some tolerance, you can compare names with Levenshtein distance and tolerate some differences (here you have Oracle implementation).
5. Treat names that match as the same author. To find the whole name, for each initial (T, M, A) look through your filtered authors (from step 2) and try to find one that has the whole name (Mike) rather than just the initial (M.); if you can't find one, use the initial. Each of the examples you gave would therefore be converted to the same value, the full name (Tom Mike Apostol).
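The steps above can be sketched roughly in Python as follows. This is only an illustration under simplifying assumptions: function names are made up, the database filter is replaced by plain function calls, the Levenshtein tolerance is left out, and all variants are assumed to have the same number of name parts.

```python
def name_parts(raw: str) -> list[str]:
    """Split an author string into lowercase parts in forename-first order."""
    raw = raw.replace("-", " ")
    if "," in raw:
        # "Surname, Forenames" becomes "Forenames Surname".
        surname, _, forenames = raw.partition(",")
        raw = f"{forenames} {surname}"
    return [p.strip(".").lower() for p in raw.split() if p.strip(".")]


def initials(raw: str) -> str:
    """Key used to pre-filter candidate authors."""
    return "".join(p[0] for p in name_parts(raw)).upper()


def same_author(a: str, b: str) -> bool:
    """Whole names must agree; a bare initial matches any name starting with it."""
    parts_a, parts_b = name_parts(a), name_parts(b)
    if len(parts_a) != len(parts_b):
        return False
    for x, y in zip(parts_a, parts_b):
        if x[0] != y[0]:
            return False
        if len(x) > 1 and len(y) > 1 and x != y:
            return False
    return True


def fullest_form(variants: list[str]) -> str:
    """For each position, prefer the longest spelling seen among matching variants."""
    split = [name_parts(v) for v in variants]
    best = [max(column, key=len) for column in zip(*split)]
    return " ".join(p.capitalize() for p in best)
```

With these definitions the three example strings all share the key TMA, match each other pairwise, and collapse to "Tom Mike Apostol".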
Things worth thinking about:
Include mappings for name synonyms (most likely a hundred records at most, like Thomas <-> Tom).
With this approach it is crucial to have valid initials (no M instead of N, etc.).
edit: I coded something like this some time ago, when I had to identify a person by their signature. Ignoring scanning problems, people sometimes sign as Name S. Surname, as N.S., or just as Name Surname (which is another thing you should maybe consider in the solution: allowing the algorithm to ignore the second name, although in your situation I guess it would be rather rare for someone's second name to be omitted).

Resources