Predicting Trending Product - python-3.x

I am a beginner in Machine Learning. I want to build a model for predicting
trending products. Can you please tell me what layout and which features
my dataset needs? Let's say I want to predict a certain product from a certain category, so I will be collecting a dataset for that category from various e-commerce sites, e.g. eBay, Amazon, etc.
Please explain in detail.

You will need a dataset with features like:
Number of sales
Ratings
Recommendations
and so on.
This is a classification problem: you need to classify the products as trending or not trending. Each example will also need a label marking it as trending or not trending.
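To make the layout concrete, here is a minimal sketch of what such a dataset and a trivial baseline classifier could look like. The feature names, values, and the sales cutoff are all hypothetical illustrations, not prescriptions; a real model (e.g. logistic regression or a decision tree) would learn its decision rule from the labelled rows instead of using a hand-picked threshold.

```python
from collections import namedtuple

# Hypothetical dataset layout: one row per product, with a
# "trending" label (1 = trending, 0 = not trending).
Product = namedtuple("Product", ["sales", "rating", "recommendations", "trending"])

dataset = [
    Product(sales=500, rating=4.8, recommendations=120, trending=1),
    Product(sales=20,  rating=3.1, recommendations=4,   trending=0),
    Product(sales=350, rating=4.2, recommendations=90,  trending=1),
    Product(sales=15,  rating=2.9, recommendations=2,   trending=0),
]

def predict(p, sales_cutoff=100):
    # Trivial baseline: call a product trending if sales exceed a cutoff.
    # A trained classifier would replace this hand-picked rule.
    return 1 if p.sales > sales_cutoff else 0

# Fraction of labelled rows the baseline gets right.
accuracy = sum(predict(p) == p.trending for p in dataset) / len(dataset)
```

Collecting the labels is the hard part in practice: you have to decide (or observe over time) which products actually counted as trending before you can train anything.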

Related

Label Dutch reviews on specific customer categories for language classification

I am looking for a classification package that can classify reviews into custom categories. This needs to be done specifically for Dutch reviews.
Does anyone have an idea which package would be most suitable for such a project?
Thank you in advance.
Kind regards

Should the dataset be domain specific when it comes to Named Entity Recognition?

For my final year undergraduate project, I intend to use named entity recognition to classify fiction summaries by LOCATION, PERSON, and so on. When I looked for datasets, I couldn't find any labelled dataset of fiction summaries.
My doubt is whether the training dataset for NER should be specific to the domain (in my case, fiction). If not, can I use a dataset like 'conll2003', which is from the news domain, even though I'm developing a model for fiction?
I would love replies, as I'm stuck and unable to proceed with my project.
Thanks in advance :)
I tried labelling an unlabelled fiction-summary dataset manually, but it would take far more time than I can afford. That's why I want to know whether I can use labelled datasets that are not specific to my domain.

What is an appropriate training set size for text classification (Sentiment analysis)

I just wanted to understand (from your experience) what a good training data size would be if I have to create a sentiment analysis classification model (using NLTK). For instance, if my training data consists of tweets that I intend to classify as positive, negative, or neutral, how many tweets per category should I ideally have to get a reasonable model working?
I understand that there are many factors, like data quality, but what might be a good number to get started with?
That's a really hard question to answer for people who are not familiar with the exact data, its labelling and the application you want to use it for. But as a ballpark estimate, I would say start with 1,000 examples of each and go from there.

Multitask learning

Can anybody please explain multitask learning in a simple and intuitive way? Maybe a real-world problem would be useful. These days I mostly see people using it for natural language processing tasks.
Let's say you've built a sentiment classifier for a few different domains. Say, movies, music DVDs, and electronics. These are easy to build high quality classifiers for, because there is tons of training data that you've scraped from Amazon. Along with each classifier, you also build a similarity detector that will tell you for a given piece of text, how similar it was to the dataset each of the classifiers was trained on.
Now you want to find the sentiment of some text from an unknown domain or one in which there isn't such a great dataset to train on. Well, how about we take a similarity weighted combination of the classifications from the three high quality classifiers we already have. If we are trying to classify a dish washer review (there is no giant corpus of dish washer reviews, unfortunately), it's probably most similar to electronics, and so the electronics classifier will be given the most weight. On the other hand, if we are trying to classify a review of a TV show, probably the movies classifier will do the best job.
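The similarity-weighted combination described above can be sketched in a few lines. Everything here is a toy stand-in: the three "classifiers" are hard-coded scoring functions rather than trained models, and domain similarity is approximated by vocabulary overlap, which is only one of many possible similarity measures.

```python
# Toy domain "classifiers": each returns a sentiment score in [-1, 1].
# In practice these would be models trained on large per-domain corpora.
def movies_clf(text):      return 0.8 if "great" in text else -0.5
def music_clf(text):       return 0.6 if "great" in text else -0.4
def electronics_clf(text): return 0.9 if "great" in text else -0.7

# Hypothetical per-domain vocabularies standing in for the training data
# each classifier saw.
domain_vocab = {
    movies_clf:      {"plot", "actor", "film", "great"},
    music_clf:       {"album", "song", "band", "great"},
    electronics_clf: {"battery", "screen", "device", "great"},
}

def similarity(text, vocab):
    # Crude similarity: fraction of the text's words found in the vocabulary.
    words = set(text.lower().split())
    return len(words & vocab) / max(len(words), 1)

def ensemble_sentiment(text):
    # Weight each domain classifier by how similar the text is to its domain,
    # then return the weighted average of their scores.
    weights = {clf: similarity(text, vocab) for clf, vocab in domain_vocab.items()}
    total = sum(weights.values()) or 1.0
    return sum(w * clf(text) for clf, w in weights.items()) / total
```

For a dishwasher-like review mentioning batteries and screens, the electronics classifier dominates the weighting, exactly as in the description above; for a review full of plot and actor talk, the movies classifier would.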

Need training data for categories like Sports, Entertainment, Health etc and all the sub categories

I am experimenting with classification algorithms in ML and am looking for a corpus to train my model to distinguish among different categories like sports, weather, technology, football, cricket, etc.
I need some pointers on where I can find datasets with these categories.
Another option is to crawl Wikipedia to get data for the 30+ categories, but I wanted some brainstorming and opinions on whether there is a better way to do this.
Edit
Train: build the model using the bag-of-words approach for these categories.
Test: classify new/unknown websites into these predefined categories based on the content of the webpage.
The UCI machine learning repository contains a searchable archive of datasets for supervised learning.
You might get better answers if you provide more specific information about what inputs and outputs your ideal dataset would have.
Edit:
It looks like dmoz has a dump that you can download.
A dataset of newsgroup messages, classified by subject
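The train/test plan above can be sketched end to end with a bag-of-words model. The four-document corpus below is a toy stand-in for a real dataset such as the newsgroup messages mentioned above, and the overlap-count scoring is a deliberately crude substitute for a proper classifier like naive Bayes.

```python
from collections import Counter

# Tiny stand-in corpus; in practice you would load something like the
# newsgroup dataset or a crawl of category pages.
train = [
    ("the striker scored a goal in the final match", "sports"),
    ("the team won the cricket match by ten runs", "sports"),
    ("the new phone has a faster chip and better camera", "technology"),
    ("the laptop software update improves battery life", "technology"),
]

# Bag of words per category: aggregate word counts over the training docs.
category_counts = {}
for text, label in train:
    category_counts.setdefault(label, Counter()).update(text.lower().split())

def classify(text):
    # Score each category by how often the document's words occur in the
    # category's bag of words, and pick the highest-scoring category.
    words = text.lower().split()
    scores = {
        label: sum(counts[w] for w in words)
        for label, counts in category_counts.items()
    }
    return max(scores, key=scores.get)
```

With real data you would also remove stopwords and weight terms (e.g. with TF-IDF), since common words like "the" otherwise dominate the scores.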
