Matching Based on Arbitrary Categories and Similarity Measures


I have a customer database in which each customer has certain attributes and a customer type. The collection of attributes can vary (though the attributes come from a finite set), and when I look at a new customer of unknown type with given attributes, I would like to determine which type s/he belongs to. For example, say I already have these customers in the DB:
Customer  Type  Attributes
--------------------------------
1         A     44, 32, 5, 'X'
2         A     3, 32, 66, 'A'
3         B     6, 32, 'A', 'B'
4         C     47, 31, 2, 'H'
5         C     14, 32, 2, 'O'
6         C     2, 'C'
7         A     44
When I receive a new customer with attributes such as 3, 32, 2, I would like to determine which type this customer belongs to, and the code should report its confidence in this match (as a percentage).
What is the best method to use here? Something statistical, a method based on some kind of affinity matrix, or a recommendation-engine-style approach based on Pearson correlation coefficients? Sample pseudocode would be most welcome, but any and all ideas are fine.
Thanks,

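The way to solve this problem is to use a Naive Bayes classifier. Treat each attribute as a feature and estimate the prior P(type) and the likelihoods P(attribute | type) from the existing customers. Then report the normalized posterior probability P(type | attributes) of the best type as the confidence.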
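Below is a minimal sketch of that idea in Python. It assumes a multinomial Naive Bayes model with Laplace (add-one) smoothing over the attributes. The smoothing choice and the helper names `train` and `classify` are my own assumptions, not part of the answer. The attribute 'A' is treated as a separate token from type A. The "confidence" is the normalized posterior. Naive Bayes posteriors tend to be poorly calibrated, so treat the percentage as a relative score rather than a true probability.

```python
import math
from collections import Counter, defaultdict

# Training data from the question: (customer type, set of attributes).
CUSTOMERS = [
    ("A", {44, 32, 5, "X"}),
    ("A", {3, 32, 66, "A"}),
    ("B", {6, 32, "A", "B"}),
    ("C", {47, 31, 2, "H"}),
    ("C", {14, 32, 2, "O"}),
    ("C", {2, "C"}),
    ("A", {44}),
]


def train(customers):
    """Count type frequencies and per-type attribute frequencies."""
    type_counts = Counter()
    attr_counts = defaultdict(Counter)
    vocabulary = set()
    for ctype, attrs in customers:
        type_counts[ctype] += 1
        attr_counts[ctype].update(attrs)
        vocabulary |= attrs
    return type_counts, attr_counts, vocabulary


def classify(attrs, type_counts, attr_counts, vocabulary, alpha=1.0):
    """Return {type: posterior probability} using multinomial Naive Bayes
    with add-alpha smoothing. Attributes never seen in training are ignored
    (assumption: they carry no evidence for any type)."""
    n_customers = sum(type_counts.values())
    v = len(vocabulary)
    log_scores = {}
    for ctype, n in type_counts.items():
        total = sum(attr_counts[ctype].values())
        score = math.log(n / n_customers)  # log prior P(type)
        for a in attrs:
            if a in vocabulary:
                # log P(attribute | type), smoothed so unseen pairs are not zero
                score += math.log((attr_counts[ctype][a] + alpha) / (total + alpha * v))
        log_scores[ctype] = score
    # Normalize in log space (log-sum-exp) so the probabilities sum to 1.
    m = max(log_scores.values())
    exp_scores = {t: math.exp(s - m) for t, s in log_scores.items()}
    z = sum(exp_scores.values())
    return {t: s / z for t, s in exp_scores.items()}


model = train(CUSTOMERS)
probs = classify({3, 32, 2}, *model)
best = max(probs, key=probs.get)
print(best, f"{probs[best]:.0%}")
```

With the seven customers from the question, this prints `C 49%`. Type A follows closely at about 42% and B at about 9%, so with this little data the ranking is fragile.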

Related

Formal UML representation of reshaping a data frame

To document the restructuring of a data table from a "wide" layout, with a criterion column for each score, to a "long" layout, with one score column and one criterion column, my first reaction was to use a UML class diagram.
I am aware that by changing the structure of the data table, the class attributes have not changed.
My first question is whether the wide or the long version is the more correct representation of the data table?
My second question is whether it would make sense to relate the two representations - and if so, by which relationship?
My third question is whether something other than a UML class diagram would be more suitable for documenting the reshaping (data preprocessing before showing the distribution as a box plot in R).

You jumped a little too fast from the table to the UML. This makes your question very confusing, because what is wide as a table is represented as a long class, and vice versa.
Reformulating your problem, it appears that you are refactoring some tables. The wide table shows several values for the same student in the same row. This means that the maximum number of exercises is fixed by the table structure:
ID Ex1 Ex2 Ex3 .... Ex N
-----------------------------
111 A A A ... A
119 A C - ... D
127 B F B ... F
The long table has fewer columns, and each row shows only 1 specific score of 1 specific student:
ID # Score
---------------
111 1 A
111 2 A
111 3 A
...
111 N A
119 1 A
119 2 C
...
You can model this structure in a UML class diagram. But in UML, the table layout doesn't matter: that is a concern of the ORM mapping. You could perfectly well have one class model (with an attribute or an association with multiplicity 1..N) that could be implemented using either the wide or the long version. If the multiplicity were 1..*, only the long option would work.
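To make the relation between the two layouts concrete, here is a small Python sketch using plain dicts and lists (no pandas). It converts the wide table into the long one and back. It uses only the first three exercise columns of the example, and the function names are hypothetical. A missing score ("-") becomes `None` in the wide form and simply has no row in the long form. Going back to wide requires the number of exercise columns up front, which is exactly the limit the wide structure imposes.

```python
# Wide layout: one row per student, one column per exercise (None stands for "-").
WIDE = {
    111: ["A", "A", "A"],
    119: ["A", "C", None],
    127: ["B", "F", "B"],
}


def wide_to_long(wide):
    """Turn {student_id: [score_1, ..., score_n]} into (student_id, exercise_no, score)
    rows; exercise numbers start at 1 and missing scores produce no row."""
    return [
        (student, number, score)
        for student, scores in wide.items()
        for number, score in enumerate(scores, start=1)
        if score is not None
    ]


def long_to_wide(rows, n_exercises):
    """Inverse transformation; the wide layout needs n_exercises fixed up front."""
    wide = {}
    for student, number, score in rows:
        wide.setdefault(student, [None] * n_exercises)[number - 1] = score
    return wide


LONG = wide_to_long(WIDE)
for row in LONG:
    print(*row)
```

`long_to_wide(LONG, 3)` gives back `WIDE`. A student with no scores at all would have no rows in the long layout and would therefore not come back.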
Now to your questions:
Both representations are correct; they just have different characteristics. The wide one is inflexible, since the maximum number of scores is fixed by the table structure. Also, adding a new score in fact requires updating an existing record, so the two models do not handle concurrent updates in the same way. The long one is a little more complex to use if you want to show a student's score history in a single row.
Yes, it makes sense to relate both, especially if you are documenting the transformation of the first into the second.
UML would not necessarily add value here. If you really only care about tables and values, you could just as well use an Entity/Relationship diagram. But UML has the advantage of allowing database modelling as well, and it lets you add behavioral aspects, if not now, then later. You could consider using the non-standard «table» stereotype to make clear that you are modelling a table (so a low-level view of your design).

Handle Multiple Intents in single Utterance in LUIS

I need to handle multiple intents in a single utterance in LUIS. For example, there is an intent called "Order", and I have configured it with the utterance below.
I want 2 pizza from Dominos and 2 bucket chicken from abc and xyz.
The utterance above contains 2 different orders that I need to track:
1) 2 pizza from Dominos
where the entities are:
Quantity - 2,
Dish - Pizza,
Store - Dominos
2) 2 bucket chicken from abc and xyz
where the entities are:
Quantity - 2,
Dish - bucket chicken,
Store - abc and xyz
"abc and xyz" is the name of a single store (the store name itself contains "and", like Larsen & Toubro).
How can I handle this in LUIS? How can the same entities be handled multiple times in a single utterance? Does any other NLP service support this?
Can someone guide me on this?

With the example you provided, it looks like there is only one intent, and that is Order. The user might order different items from different stores, so your utterances basically follow patterns like these:
Buy x from b
Buy x,y,z from store a
Buy x from a and y from b
You can train your LUIS app with possible patterns to achieve more accuracy.
In the response from the API endpoint, you can check the entities as follows:
If there is only one entity of type item and one entity of type store, then it's type 1.
If there are multiple entities of type item and only one entity of type store, then it's type 2.
If there are multiple entities of type item and multiple entities of type store, then it's type 3.
It's fairly easy to get the store and the items to order in the first two types. For the third type, you can use the startIndex and endIndex properties of the entities returned by LUIS. Group each item entity with a store entity by checking the index values. For example, in "Buy x from a and y from b", x is the first item encountered and a is the first store, so map item x to store a.
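Here is a minimal Python sketch of that index-based grouping. It assumes entity dictionaries shaped like a LUIS v2 response, with `entity`, `type`, `startIndex` and an inclusive `endIndex`; check this against your app's actual JSON. The entity type names Quantity, Dish and Store are hypothetical, taken from the question. Each dish is attached to the next store that follows it in the utterance, which covers all three patterns above.

```python
UTTERANCE = "I want 2 pizza from Dominos and 2 bucket chicken from abc and xyz."

# Hand-written entities in the assumed LUIS v2 shape (endIndex inclusive);
# the list is deliberately unsorted, as the API gives no ordering guarantee here.
ENTITIES = [
    {"entity": "abc and xyz", "type": "Store", "startIndex": 54, "endIndex": 64},
    {"entity": "2", "type": "Quantity", "startIndex": 7, "endIndex": 7},
    {"entity": "pizza", "type": "Dish", "startIndex": 9, "endIndex": 13},
    {"entity": "Dominos", "type": "Store", "startIndex": 20, "endIndex": 26},
    {"entity": "2", "type": "Quantity", "startIndex": 32, "endIndex": 32},
    {"entity": "bucket chicken", "type": "Dish", "startIndex": 34, "endIndex": 47},
]


def group_orders(entities):
    """Attach each Dish (plus the Quantity just before it, if any) to the
    next Store that follows it in the utterance. Dishes with no store after
    them are dropped."""
    orders, pending, quantity = [], [], None
    for ent in sorted(entities, key=lambda e: e["startIndex"]):
        if ent["type"] == "Quantity":
            quantity = ent["entity"]
        elif ent["type"] == "Dish":
            pending.append({"quantity": quantity, "dish": ent["entity"]})
            quantity = None
        elif ent["type"] == "Store":
            orders.extend({**item, "store": ent["entity"]} for item in pending)
            pending = []
    return orders


for order in group_orders(ENTITIES):
    print(order)
```

The sketch relies on LUIS having already recognized "abc and xyz" as one Store entity, which is a matter of training the app. The grouping rule itself does not change for "Buy x,y,z from store a", where all three pending dishes go to store a.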
PS: This isn't the best approach; I'll update this answer if I find a better way.

How should I use singleton in DDD?

I am implementing a school-related app for the local government. I have classes, and I want to compute their grade using the following formula: grade = academicYear - commencingYear + 1.
I can model this in two ways:
Both commencingYear and academicYear have a Year type, and academicYear is some kind of singleton, so: AcademicYear.getInstance() - this.commencingYear + 1
Alternatively, commencingYear has the type AcademicYear, and a Calendar class gives me the current academic year, so: Calendar.getAcademicYear() - this.commencingYear + 1
Still, neither feels right to me. I am not sure whether I should inject the year into the model or keep it inside the model. Another problem is that the academic year has to be changed more or less manually, not least because it starts on a different date each year. When the academic year advances and the grade becomes greater than 8, the class is finished, so children from that class should no longer be on the current student list. What do you think is the best way to model this?

AcademicYear can of course be a value type, as you've done. But isn't that overcomplicating things?
If you have an entity type that contains both commencingYear and academicYear, you can easily control the values of those fields. Thus, if someone tries to enter a date that is out of bounds, you can just throw an exception.
The calculation sounds like a business rule, and therefore it should be wrapped in a method on the entity or in a domain service.
That is, it's a rule, but a rule for a specific entity in the domain. Thus it should not be wrapped in a static method somewhere but implemented in the correct entity class.
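Here is a minimal Python sketch of that suggestion: an entity that holds both years, guards them, and owns the grade rule. The names `SchoolClass`, `advance_to` and `MAX_GRADE` are hypothetical. Years are plain integers, skipping the AcademicYear value type the answer calls possibly overcomplicated. The limit of 8 comes from the question.

```python
from dataclasses import dataclass

MAX_GRADE = 8  # from the question: grade > 8 means the class has finished


@dataclass
class SchoolClass:
    """Entity holding both years, so it can guard them and own the grade rule."""
    name: str
    commencing_year: int
    academic_year: int

    def __post_init__(self):
        self._check(self.academic_year)

    def _check(self, year):
        # Reject out-of-bounds values instead of computing a meaningless grade.
        if year < self.commencing_year:
            raise ValueError(
                f"academic year {year} precedes commencing year {self.commencing_year}"
            )

    def advance_to(self, academic_year):
        """Called by the application, e.g. when staff open a new academic year."""
        self._check(academic_year)
        self.academic_year = academic_year

    def grade(self):
        # grade = academicYear - commencingYear + 1
        return self.academic_year - self.commencing_year + 1

    def is_finished(self):
        return self.grade() > MAX_GRADE


c = SchoolClass("1a", commencing_year=2015, academic_year=2023)
print(c.grade(), c.is_finished())  # 9 True
```

The current academic year is passed in from outside, for example by an application service, rather than read from a singleton. The manual, date-dependent switch to a new year therefore stays outside the entity, while the rule stays inside it.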
Writing a more specific answer is hard, as I have no knowledge of your domain or of how you have implemented it.

how to avoid redundant association with composite relationship

I have a composite relationship between two classes, A and B (A is composed of many Bs). Now another class, C, has a one-to-many association with class B. I would like to be able to retrieve all instances of class A from class C.
How do I do this without creating redundant associations? Since C basically has a list of Bs, I can't just iterate over all of them, asking each one for its A, and eventually return a list of As to C.
I really hope someone out there understands this and doesn't find it completely confusing!
Thanks
Update:
A Dataset has a list of defined variables. An activity can select a subset of variables from each dataset and give them some attributes, hence an association class is used. Now, if I want to retrieve, from an Activity instance, the datasets it is registered with, how do I achieve this in UML and in the object implementation?

As you have stated the task, it is IMPOSSIBLE to get all Bs from all Cs, because nothing states that every B belongs to some C.
On the other hand, A has a composition of Bs, and every B MUST belong to some A object, so you can easily get all Bs from all As. (Note that A IS NOT a composition; A HAS a composition of Bs, because A can have plenty of other things too.) Just collect the Bs into a set so that you don't get duplicate values.
But even if the B-A association includes B->A navigation, you cannot get all As from the Bs, because some As can be EMPTY. You'll never reach them from the Bs.
So you cannot get all As from the Cs, for TWO important reasons. And NO redundant association will help.
As for the question added in the "Update":
To get everything from the variables, use
Dataset <---- Variable ---> Activity // This variant is the easiest for adding associations.
For getting connected datasets from an activity,
Dataset <--- Variable <----- Activity
But please note that this is not an update; it is a DIFFERENT question.

I assume your diagram looks something like this: A is composed of many Bs, and C has a one-to-many association to B.
If C has a reference to B, and B has a reference to A, then it should be no problem navigating to A from C. There is no need for any additional redundant relationships.
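Here is a small Python sketch of that navigation, using the classes from the "Update": Activity plays the role of C, Variable the role of B, and Dataset the role of A. `Selection` is a hypothetical name for the association class that carries the activity-specific attributes. The datasets are derived by following existing references, so no redundant Activity-Dataset association is stored, and collecting them into a set removes duplicates.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # identity-based equality/hash so instances can go in a set
class Dataset:  # role of A: composed of Variables
    name: str
    variables: list = field(default_factory=list)

    def add_variable(self, name):
        var = Variable(name, self)
        self.variables.append(var)
        return var


@dataclass(eq=False)
class Variable:  # role of B: always owned by exactly one Dataset
    name: str
    dataset: Dataset = field(repr=False)  # back-reference B -> A


@dataclass
class Selection:  # association class between Activity and Variable
    variable: Variable
    attributes: dict = field(default_factory=dict)


@dataclass
class Activity:  # role of C: one-to-many to Variable via Selection
    name: str
    selections: list = field(default_factory=list)

    def datasets(self):
        # Navigate C -> B -> A through existing references; no extra association.
        # A set ensures a Dataset reached via several variables appears once.
        return {sel.variable.dataset for sel in self.selections}
```

Consistent with the other answer, a Dataset none of whose variables is selected by the activity is simply never reached.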

self-referencing core data model for specific use case

I have seen other posts saying that a self-referencing Core Data entity is possible, but I feel my use case is a little different, and I am having a hard time understanding how to wire things up.
I have a Person entity and I want to track 2 things:
- an array of Person entities whose profile the user "visited"
- an array of Person entities who have viewed "this" users profile
The inverse logic is making it hard to understand.
I have User A, User B.
If user A visits user B, the following relationships should be set up:
- User A's visited profiles shows User B.
- User B should see that user A visited him.
This is a To-Many relationship as things are "interesting" only when you know who you followed and who's following you... :-)
Am I making this more complex than it is? :-(
What I tried:
Person Entity
-visitedProfiles : inverse is viewedProfiles (To-Many relationship)
-viewedProfiles : inverse is visitedProfiles (To-Many relationship)
Result:
User A --> User B (user A visits user B)
User A sees User B in BOTH (visitedProfiles and viewedProfiles) relationship.
Side-effect:
Also, regardless of how many profiles I visit, "visitedProfiles" and "viewedProfiles" always contain only 1 item (i.e. the last profile I visited).

It's not an especially complicated case. I personally find your choice of words a bit confusing, though. "viewedProfiles" and "visitedProfiles" don't sound like inverses of each other to me. How about "viewedProfiles" and "viewers" instead?
Regardless of the word choice, though, set up the relationships as you have described. If you add B to "viewers of A", then B's "viewedProfiles" will also be updated.
Your side effect of knowing the most recent view/visit takes a little extra work. You could use an ordered relationship for the viewers/viewees; that feels like the simplest thing to do. Or you could add a new entity, a Visit, which notes the viewer, the viewee, and the time/date of the visit. But that second approach is indeed more complicated.
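Core Data maintains inverse relationships for you. The following plain-Python sketch is not Core Data API, and its class and method names are hypothetical. It only illustrates the bookkeeping Core Data does for a self-referencing to-many relationship, using the "viewedProfiles"/"viewers" naming suggested above. When A visits B, only A's viewed profiles and B's viewers change, so B does not also show up among A's viewers. Lists keep visit order, in the spirit of an ordered relationship.

```python
class Person:
    """Plain-Python model of a self-referencing to-many relationship with an
    inverse, mimicking the bookkeeping Core Data does automatically."""

    def __init__(self, name):
        self.name = name
        self.viewed_profiles = []  # people this person has visited, in visit order
        self.viewers = []          # inverse: people who have visited this person

    def visit(self, other):
        # A relationship holds each person at most once, so repeat visits are ignored.
        if other not in self.viewed_profiles:
            self.viewed_profiles.append(other)
            other.viewers.append(self)  # keep the inverse side in sync


a, b, c = Person("A"), Person("B"), Person("C")
a.visit(b)
a.visit(c)
print([p.name for p in a.viewed_profiles], [p.name for p in b.viewers])
```

Visiting several profiles grows `viewed_profiles` instead of replacing it.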

Your definition of the relationships looks OK. You can call either
[a addVisitedProfilesObject:b];
or
[b addViewedProfilesObject:a];
to add b to a.visitedProfiles and a to b.viewedProfiles.
