Can I use Singular Value Decomposition (SVD) with implicit feedback? What I have is historical purchase data (user-ID, item-ID, timestamp, quantity).
The Surprise library says it is not possible, but I have read studies where it has been used. What are your thoughts?
I am currently working on a project where a multi-criteria decision-making (MCDM) algorithm is needed in order to evaluate several alternatives for a given goal. After long research, I decided to use the AHP method for my case study. The problem is that the alternatives taken into account for the given goal contain incomplete data.
For example, I am interested in buying a house and I have three alternatives to consider. One criterion for comparing them is the size of the house. Let’s assume that I know the sizes of some of the rooms of these houses, but I do not have information about the actual sizes of the entire houses.
My questions are:
Can we apply AHP (or any MCDM method) when we are dealing with incomplete data?
What are the consequences?
And, how can we minimize the presence of missing data in MCDM?
I would really appreciate some advice or help! Thanks!
If you are still looking for answers, let me try to address your questions.
Before going into detail, I should say that I cannot answer with a technical approach in a programming language.
First, we can handle uncertain data in MCDM and AHP with statistical methods.
To reduce the loss from missing data, you can use information-theoretic measures such as entropy; one common formulation is the entropy weight method (see the sketch at the end of this answer).
The reliability of the result will depend on the accuracy of the probabilistic approach.
For the example you gave, you could estimate a house's total size from other houses that share the same criteria. Accuracy will depend on the number of criteria and the reliability of the inference.
To get a complete answer to your problem, you may need optimization, linear algebra, calculus, and statistics above an intermediate level.
I am a student in a management major, and I will help as much as I can. I hope you get what you want.
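A minimal sketch of the entropy weight method, one common way to apply the entropy idea mentioned above; the decision matrix here is made up for the house example:

```python
import numpy as np

# Decision matrix: rows are alternatives (houses), columns are criteria
# (size in m^2, number of rooms, price). All values are illustrative.
X = np.array([[120.0, 3.0, 250000.0],
              [ 95.0, 2.0, 180000.0],
              [140.0, 4.0, 310000.0]])

# Normalize each criterion column so it sums to 1
P = X / X.sum(axis=0)

# Entropy per criterion; k scales it into [0, 1]
k = 1.0 / np.log(X.shape[0])
entropy = -k * (P * np.log(P)).sum(axis=0)

# Criteria with lower entropy (more dispersion across alternatives)
# carry more information and receive larger weights
weights = (1.0 - entropy) / (1.0 - entropy).sum()
print(weights)
```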
In the Analytics Vidhya post below, an ANOVA test has been performed on COVID data to check whether the difference in positive cases between denser regions is statistically significant.
I believe an ANOVA test can't be performed on this COVID time-series data, at least not in the way it has been done in this post.
Sample data has been drawn randomly from the different groups (denser1, denser2, …, denser4). The data is a time series, so the numbers of positive cases in a group's random sample are likely to come from different points in time.
It might be the case that denser1's random sample comes from the early COVID period while another region's sample comes from another point in time. If that is the case, the F-statistic will almost certainly be high.
If you have a different opinion, can you explain it?
https://www.analyticsvidhya.com/blog/2020/06/introduction-anova-statistics-data-science-covid-python/
ANOVA should not be applied to time-series data, as the independence assumption is violated. The issue with independence is that days tend to correlate very highly. For example, if you know that today you have 1400 positive cases, you would expect tomorrow to have a similar number of positive cases, regardless of any underlying trends.
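To make the violation concrete, here is a minimal sketch with simulated (not real) case counts, showing the kind of lag-1 autocorrelation that breaks ANOVA's independence assumption:

```python
import numpy as np

# Simulate a daily case-count series where each day drifts from the
# previous day's value, mimicking the persistence of an epidemic curve
rng = np.random.default_rng(0)
cases = [1400.0]
for _ in range(99):
    cases.append(max(0.0, cases[-1] + rng.normal(0, 30)))
cases = np.array(cases)

# Lag-1 autocorrelation is close to 1: today's count strongly predicts
# tomorrow's, so the daily observations are far from independent
lag1 = np.corrcoef(cases[:-1], cases[1:])[0, 1]
print(f"lag-1 autocorrelation: {lag1:.3f}")
```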
It sounds like you're trying to determine the causality of different treatments (i.e. mask mandates or other restrictions) and their effects on positive cases. The best way to infer causality is usually to perform A/B testing, but obviously in this case it would not be reasonable to give different populations different treatments. One method that is well suited to going back and retroactively inferring causality is called "synthetic control".
https://economics.mit.edu/files/17847
Linked above is a basic paper on the methodology. The hard part of this analysis will be constructing synthetic counterfactuals, or "controls", to test your actual population against.
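As a rough illustration of the idea (a toy sketch, not the full method from the paper), the following fits non-negative donor weights that sum to one on the pre-intervention period; all data are placeholders:

```python
import numpy as np
from scipy.optimize import minimize

# Placeholder data: 60 pre-intervention days for 5 untreated donor regions,
# plus the treated region's series over the same window
rng = np.random.default_rng(1)
Y_donors_pre = rng.random((60, 5))
y_treated_pre = Y_donors_pre @ np.array([0.5, 0.3, 0.2, 0.0, 0.0])

n_donors = Y_donors_pre.shape[1]

def loss(w):
    # Squared pre-period gap between treated series and weighted donor pool
    return np.sum((y_treated_pre - Y_donors_pre @ w) ** 2)

# Standard synthetic-control constraints: weights are non-negative and sum to 1
result = minimize(loss, np.full(n_donors, 1.0 / n_donors),
                  bounds=[(0.0, 1.0)] * n_donors,
                  constraints={"type": "eq", "fun": lambda w: w.sum() - 1.0})

# Applying result.x to the donors' post-period series yields the
# counterfactual ("synthetic") treated region
print(result.x)
```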
If this is not what you're looking for, please reply with a clarifying question, but I think this should be an appropriate method that is well-suited to studying time-series data.
I want to use spark ALS for multi-behavior implicit feedback recommendation. There are several kinds of implicit user behavior data, such as browses, carts, deals etc.
I have checked numerous online sources for ALS implicit-feedback recommendation, but almost all of them utilized only a single source of data; in the shopping case, the deal data.
I am wondering whether only the deal data is needed, or whether utilizing all kinds of data gives better results?
There is no general purpose, principled way to use ALS with multiple behaviors. Sometimes different behaviors are used to vary implicit ratings -- for example, viewing an item might be worth 0.1, viewing in multiple sessions might be worth 0.3, putting it in a cart 0.5, and a purchase 1.0. But this feels a bit hacky, and doesn't readily provide a way to exploit all the data you might have.
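For what it's worth, here is a minimal PySpark sketch of that weighting scheme, assuming an events table with userId, itemId, and behavior columns; the column names and weight values are illustrative:

```python
from pyspark.sql import SparkSession, functions as F
from pyspark.ml.recommendation import ALS

spark = SparkSession.builder.getOrCreate()

# Hypothetical events table: one row per (userId, itemId, behavior) event
events = spark.read.parquet("events.parquet")

# Map each behavior to an implicit-confidence weight (values are illustrative)
weight = (F.when(F.col("behavior") == "deal", 1.0)
           .when(F.col("behavior") == "cart", 0.5)
           .otherwise(0.1))  # e.g. a plain view

# Sum the weights per user-item pair so repeated interactions raise confidence
ratings = (events
           .withColumn("weight", weight)
           .groupBy("userId", "itemId")
           .agg(F.sum("weight").alias("rating")))

# implicitPrefs=True makes ALS treat the ratings as confidence, not preference
als = ALS(userCol="userId", itemCol="itemId", ratingCol="rating",
          implicitPrefs=True, rank=32, regParam=0.1, alpha=40.0,
          coldStartStrategy="drop")
model = als.fit(ratings)
```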
For a more principled approach that scales to handling lots of different features, I would take a look at the Universal Recommender. Disclaimer: I've never used it, I just think it sounds promising.
Yes, you had better use all the deal and user data. You use ALS to obtain a user vector and a deal vector, then compute the similarity of deal and user; if the user or deal has no vector, we can't get the similarity for the next recommendation.
I ran a test with ALS and used the user-deal similarity for training my model, and it gave me a big surprise; the AUC was as follows:
2018-06-05 21:25:28,138 INFO [21:25:28] [58] test-auc:0.968764 train-auc:0.972966
2018-06-05 21:25:28,442 INFO [21:25:28] [59] test-auc:0.968865 train-auc:0.973075
This is because I used all the deal and user information to train the model. But the RMSE is 3.6, so maybe I should tune my parameters.
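A minimal sketch of the similarity step described above, assuming a fitted pyspark.ml ALS model named model (as in the sketch earlier in this thread) and hypothetical user/item IDs:

```python
import numpy as np

# model.userFactors / model.itemFactors are DataFrames of learned latent
# vectors with columns (id, features)
u = np.array(model.userFactors.filter("id = 42").first()["features"])
v = np.array(model.itemFactors.filter("id = 7").first()["features"])

# Cosine similarity between the user and deal vectors, usable as a
# feature for a downstream ranking model
score = float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))
print(f"user-deal similarity: {score:.3f}")
```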
I wanted to understand the purpose of using SNOMED-CT for normalization of clinical terms.
Let's say I have a criterion/statement like
Gender is Male
My question is whether SNOMED-CT is used for normalizing both
Gender and Male, or just one of them, as in
Sex is M OR
Gender is M
I'm not sure I quite follow the question, but this might help. SNOMED CT can represent the same information in multiple ways. For example, a left-sided hip scan can be represented using a single concept (426100003 | Ultrasound scan of left hip |) or by gluing a laterality of left onto the concept for ultrasound of the hip (the actual expression is a little complex here; I can post it if you need).
However, when doing some operations, e.g. subsumption tests, the form needs to be consistent. Thus there are standardised forms and standard algorithms to get to them; I nearly always use the Long Normal Form.
So, in short, the normal form of an expression is a standard representation of that expression, which other representations can be transformed into.
More information can be found by searching for "Normal form" in the technical reference guide: http://ihtsdo.org/fileadmin/user_upload/doc/en_gb/tig.html
Both. It includes terms for the abstract concept of "Gender", the notion of a "Finding of biological sex", and the concept of a specific finding like "Male":
http://browser.ihtsdotools.org/?perspective=full&conceptId1=365873007
http://browser.ihtsdotools.org/?perspective=full&conceptId1=429019009
http://browser.ihtsdotools.org/?perspective=full&conceptId1=248153007
However, please note that the concept of Gender is different from Sex.
Supporting the answer above, but from a different perspective: normalization using SNOMED CT allows computers to
- Define a single set of representations (i.e. you don't have to map from M or F) that can be used for information exchange and understood in all healthcare settings, irrespective of the geographic or healthcare domain.
- Use these representations as rules for queries in clinical decision support (for example). Where these rules are developed by a professional body (e.g. pharmacists), they can be shared irrespective of your legacy system and used consistently across all products. At least that is the intention.
This supports safe clinical practice.
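A minimal sketch of the kind of normalization described above, reusing the Male concept ID from the browser links in the previous answer; the local codes and the mapping table itself are illustrative assumptions:

```python
# SNOMED CT concept ID taken from the browser links above
SNOMED_MALE = "248153007"  # Male (finding)

# Hypothetical mapping from legacy attribute/value pairs to SNOMED CT concepts
LOCAL_TO_SNOMED = {
    ("gender", "m"): SNOMED_MALE,
    ("gender", "male"): SNOMED_MALE,
    ("sex", "m"): SNOMED_MALE,
}

def normalize(attribute, value):
    """Map a legacy attribute/value pair to a single SNOMED CT concept ID."""
    return LOCAL_TO_SNOMED.get((attribute.lower(), value.lower()))

print(normalize("Sex", "M"))        # 248153007
print(normalize("Gender", "Male"))  # 248153007
```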
I am looking for a resource similar to WordNet. However, I want to be able to look up the positive/negative connotation of a word. For example:
bribe - negative
offer - positive
I'm curious as to whether anyone has run across any tool like this in AI/NLP research, or even in linguistics.
UPDATE:
For the curious, the accepted answer below put me on the right track towards what I needed. Wikipedia listed several different resources. The two I would recommend (because of ease of use/free use for a small number of API calls) are AlchemyAPI and Lymbix. I decided to go with AlchemyAPI, since people affiliated with academic institutions (like myself) and non-profits can get even more API calls per day if they just email the company.
Start looking up topics on 'sentiment analysis': http://en.wikipedia.org/wiki/Sentiment_analysis
There are some vocabulary compilations regarding affect, aka dictionaries of affect, such as the Affective Norms for English Words (ANEW) or the Dictionary of Affect in Language (DAL). They provide a dimensional representation of affect (valence, activation, and control) that may be of use in a sentiment analysis scenario (detection of positive/negative connotation). In this sense, EmoLib works with the former by default, but may easily be extended with a more specific lexicon to tackle particular needs (for example, EmoLib provides an additional neutral label that is more appropriate than the positive/negative tag set alone in a text-to-speech synthesis setting).
There is also SentiWordNet, which gives you positive, negative and objective scores for each WordNet synset.
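If it helps, here is a minimal sketch using NLTK's SentiWordNet interface, assuming the sentiwordnet and wordnet corpora can be downloaded:

```python
import nltk
nltk.download("sentiwordnet", quiet=True)  # one-time corpus downloads
nltk.download("wordnet", quiet=True)
from nltk.corpus import sentiwordnet as swn

# Positive / negative / objective scores for each synset of a word
for s in swn.senti_synsets("bribe"):
    print(s.synset.name(), s.pos_score(), s.neg_score(), s.obj_score())
```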
However, you should be aware that the positive and negative connotation of a term often depends on the context in which it is used. A great introduction to this topic is the book Opinion mining and sentiment analysis by Bo Pang and Lillian Lee, which is available online for free.