Is Myrrix a good choice for content-based recommendations? - myrrix

I understand Myrrix's support for User > Item-based collaborative filtering-style, which will work well for me; but I also need to support content-based recommendations for Items, using a custom similarity algorithm. So if a user selects item X, they will also be able to see the n-most similar items, irrespective of any ratings.
That algorithm will compare Items based upon their intrinsic characteristics and attributes, and I can easily supply that algorithm in Java, but is this supported in Myrrix?

In an indirect way, yes. You can add 'tags' as if they are users and items -- that is, a user tag is like an item that the user interacts with. This provides a way to inject information like user attributes, and vice versa for items. Under the hood, these work just like actual users and items in the algorithm. See the setTag() method and API methods.

Related

Mapping DialogFlow Extracted Parameters to Database column values

What is the suggested solution for handling the mapping of extracted parameters from an intents training phrases to a user-defined value in a database specific to that user.
A practical example I think would be a shopping list app.
Through a Web UI, the user adds catchup to a shopping list which is stored in the database as item.
Then through that agent (i.e. Google Assistant), the utterance results in ketchup being extracted as the item parameter. I wouldn't have a way to know how to map the extracted parameter from the utterance to the user defined value in the daabase
So just to be clear
// in the database added by the user from a web UI
"catchup"
// extracted from voice utterance
"ketchup"
How should I accomplish making sure that the extracted parameters can be matched up to the free form values they have added to the list?
Also, I am inexperienced in this area and have looked through the docs quite a bit and may just be missing this. Wasn't sure if Developer entities, or Session Entities was the solution for this or not.
Either Developer or Session Entities may be useful here. It depends.
If you can enumerate all the possible things that a user can say, and possibly create aliases for some of them, then you should use a Developer Entity. This is easiest and works the best - the ML system has a better chance of matching words when they are pre-defined as part of the training model.
If you can't do that, and its ok that you just want to match things that they have already added to a database, then a Session Entity will work well. This really is best for things that you already have about the user, or which may change dramatically based on context.
You may even wish to offer a combination - define as many entities as you can (to get the most common replies), allow free-form replies, and incorporate these free-form replies as Session Entities.

Search with multiple languages on Azure Search

I have a database which holds keywords for items, and also their localizations in different languages (supporting around 30 different languages right now), if there are any for that item. I want to be able to search these items using Azure Search. However, I'm not sure about how to set up the index architecture. Two solutions come to my mind in this scenario:
Either I will
1) have a different index for each language, and use that language's analyzer for that index. Later on, when I want to search using this index, I will also need to detect the query language coming from the user, and then search on the index corresponding to that language.
or
2) have a single index with a lot of fields that correspond to the different localizations of the item. Azure Search has support on having language priorities when searching, so knowing the user's language may come in handy, but is not necessarily a must.
I'm kind of new to this stuff, so any pointers, links, ideas etc. will be of tremendous help, even if it doesn't answer the question directly.
Option 2 is what we recommend (having a single index with one field per language). You can set some static priorities by assigning field weights using a scoring profile. If you are able to detect the language used in a query, you can scope the search to just that language using the searchFields option.

Complex Finds in Domain Driven Design

I'm looking into converting part of an large existing VB6 system, into .net. I'm trying to use domain driven design, but I'm having a hard time getting my head around some things.
One thing that I'm completely stumped on is how I should handle complex find statements. For example, we currently have a screen that displays a list of saved documents, that the user can select and print off, email, edit or delete. I have a SavedDocument object that does the trick for all the actions, but it only has the properties relevant to it, and I need to display the client name that the document is for and their email address if they have one. I also need to show the policy reference that this document may have come from. The Client and Policy are linked to the SavedDocument but are their own aggregate roots, so are not loaded at the same time the SavedDocuments are.
The user is also allowed to specify several filters to reduce the list down. These to can be from properties that are stored on the SavedDocument or the Client and Policy.
I'm not sure how to handle this from a Domain driven design point of view.
Do I have a function on a repository that takes the filters and returns me a list of SavedDocuments, that I then have to turn into a different object or DTO, and fill with the additional client and policy information? That seem a little slow as I have to load all the details using multiple calls.
Do I have a function on a repository that takes the filters and returns me a list of SavedDocumentsForList objects that contain just the information I want? This seems the quickest but doesn't feel like I'm using DDD.
Do I load everything from their objects and do all the filtering and column selection in a service? This seems the slowest, but also appears to be very domain orientated.
I'm just really confused how to handle these situations, and I've not really seeing any other people asking questions about it, which masks me feel that I'm missing something.
Queries can be handled in a few ways in DDD. Sometimes you can use the domain entities themselves to serve queries. This approach can become cumbersome in scenarios such as yours when queries require projections of multiple aggregates. In this case, it is easier to use objects explicitly designed for the respective queries - effectively DTOs. These DTOs will be read-only and won't have any behavior. This can be referred to as the read-model pattern.

Using Lucene to index private data, should I have a separate index for each user or a single index

I am developing an Azure based website and I want to provide search capabilities using Lucene. (structured json objects would be indexed and stored in Lucene and other content such as Word documents, etc. would be indexed in lucene but stored in blob storage) I want the search to be secure, such that one user would never see a document belonging to another user. I want to allow ad-hoc searches as typed by the user. Lastly, I want to query programmatically to return predefined sets of data, such as "all notes for user X". I think I understand how to add properties to each document to achieve these 3 objectives. (I am listing them here so if anyone is kind enough to answer, they will have better idea of what I am trying to do)
My questions revolve around performance and security.
Can I improve document security by having a separate index for each user, or is including the user's ID as a parameter in each search sufficient?
Can I improve indexing speed and total throughput of the system by having a separate index for each user? My thinking is that having separate indexes would allow me to scale the system by having multiple index writers (perhaps even on different server instances) working at the same time, each on their own index.
Any insight would be greatly appreciated.
Regards,
Nate
Of course, one index.
You can do even better than what you suggested by using ManifoldCF (Apache product that knows how to handle Solr) to manage security.
And one off topic, uninformed suggestion: I'd rather use CloudBees or Heroku (or Amazon) instead of Azure.
Until you will use several machines for indexing I guess it's more convenient to use single index. Lucene community done a lot of work to make indexing process as efficient as it can. So unless you intentionally want to implement distributed indexing I doesn't recommend you to split indexes.
However there are several reasons why you would want to split indexes:
if your machine have several IO devices which could be utilized in parallel. In this case, if you are IO bound, splitting indexes is good idea.
splitting document fields between indexes (this is what ParallelReader is supposed for). This is more exotic form of splitting, but it may be a good idea if search is performed using different groups of fields. Suppose, we have two search query types: the first is using field name and type, and the second is using fields price and discount. If those fields are updated at different rate (I guess, name updates are far more rarely than price updates), updating only part of index would require less IO resources. This will give more overall throughput to the system.

Optional or boolean elements to specify characteristics in XML schema?

I'm trying to create an XML schema to describe some aspects of hospitals. A hospital may have 24 hour coverage on: emergency services, operating room, pharmacist, etc. The entire list is relatively short - around 10. The coverage may be on more than one of these services.
My question is how best to represent this. I'm thinking along the lines of:
<coverage>
<emergencyServices/>
<operatingRoom/>
</coverage>
Basically, the services are optional and, if they exist, the coverage is offered by the hospital.
Alternatively, I could have:
<coverage>
<emergencyServices>true</emergencyServices>
<operatingRoom>true</operatingRoom>
<pharmacist>false</pharmacist>
</coverage>
In this case, I require all the elements, but a value of false means that the coverage isn't offered.
There are probably other approaches.
What's the best practice for something like this? And, if I use the first option, what type should the elements be in the schema?
Best practice here depends really on the consumer.
The short and simple rule is that markup is for structure, and content is for data. So having them contain xs:boolean values is generally the best course.
Now, on to the options:
Having separate untyped elements is simple and clear; sometimes processing systems may have difficulty reading them, because some XML-relational mappers may not see any data in the elements to put in relational tables. But if they had values, like <emergencyServices>true</emergencyServices>, then the relational table would have a value to hold.
Again, if you have fixed element names, it means if your consumer is using a system that maps the XML to a database, every time you add a service, a schema change will have to be made.
There are several other ways; each has trade-offs:
Using a <xs:string> with an enumeration, and allow multiple copies. Then you could have <coverage>emergencyServices</coverage><coverage>operatingRoom</coverage>. It makes adding to the list simpler, but allows duplicates. This scheme does not require schema changes in the database for the consumer.
You could use attributes on the <coverage> element. They would have a xs:boolean type, but still require a schema change. Of course, this evokes the attribute vs. element argument.
One good resource is Chapter 11 of Effective XML. At least this should be read before making a final decision.

Resources