Are there algorithms to generate similar tweets? - text

I have a study task. I have some tweets. Are there any common algorithms to generate similar tweets?

Related

Text Classification - what can you do vs. what are your capabilities?

Text classification basically works from the input training sentences. It copes with a small number of variations in those sentences, but not with a scenario like:
What can you do <<==>> What are your capabilities
This scenario does not work well with regular classification or bot-building platforms.
Are there any classification approaches that would help me achieve this?
What you are trying to solve is called Semantic Textual Similarity, a known and well-studied field.
There are many ways to solve it, whether or not your data is labeled.
For example, Google has published the Universal Sentence Encoder (code example), which is intended to tell whether two sentences are similar, as in your case.
Another example is any solution from the Quora Question Pairs Kaggle competition.
There are also datasets for this problem: for example, SemEval STS (STS stands for Semantic Textual Similarity) or the PAWS dataset.
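As a rough illustration of the embedding-based approach, the sketch below picks the known example sentence closest to a query by cosine similarity. The `embed` argument is a placeholder for a real sentence encoder such as the Universal Sentence Encoder. The `toy_embed` function and its tiny synonym table are invented stand-ins so the example runs without any downloads; they are not part of any library and would not generalize.

```python
import math
from collections import Counter

# Hypothetical synonym table, used only so the toy encoder can relate
# "capabilities" to "do"; a real sentence encoder learns this from data.
TOY_SYNONYMS = {"capabilities": "do", "abilities": "do"}


def toy_embed(sentence):
    """Stand-in encoder: a bag of normalised words (NOT a real embedding)."""
    words = sentence.lower().replace("?", "").split()
    return Counter(TOY_SYNONYMS.get(w, w) for w in words)


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def closest_example(query, examples, embed=toy_embed):
    """Return (best_example, score) for the example most similar to query."""
    q = embed(query)
    score, best = max((cosine(q, embed(e)), e) for e in examples)
    return best, score


examples = ["What can you do", "What is the weather today"]
best, score = closest_example("What are your capabilities", examples)
print(best, round(score, 2))
```

With a real encoder passed as `embed`, the same nearest-example lookup can map an unseen phrasing onto the intent of its closest training sentence.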

categorize non-functional requirements

I am developing a machine learning project that analyzes requirement specifications and sorts the non-functional requirements into categories such as database, web socket, backend technology, etc. From my research, Naive Bayes is the better way to categorize, but for lack of a dataset I plan to use Seed LDA for topic modeling. Would it be okay to use LDA, or should I use something else?
You can try either LDA or clustering.
In my experience, k-means clustering gives you a better visualization of what you are doing and what is happening.
LDA could also work well. You can try it first, since k-means takes much more time.
I implemented an issue tracking system using k-means, if you would like to take a look: issue tracker

Calculating grammar similarity between two sentences

I'm making a program that provides English sentences for the user to learn.
For example:
First, I give the user the sentence "I have to go school today".
Then, if the user wants to learn more sentences like that, I find sentences that have high grammar similarity with it.
I think the only way to provide such sentences is to calculate similarity.
Is there a way to calculate grammar similarity between two sentences?
Or is there a better way to design that algorithm?
Any advice or suggestions would be appreciated. Thank you.
My approach would be to run part-of-speech tagging with a tool like NLTK and compare the tree structure of your sentence with those in your database.
Alternatively, if you already have a training dataset, use WEKA to take a machine learning approach to matching the sentences.
You can parse your sentence as either a constituent or dependency tree and use these representations to formulate some form of query that you can use to find candidate sentences with similar structures.
You can check this available tool from Stanford NLP:
Tregex is a utility for matching patterns in trees, based on tree relationships and regular expression matches on nodes (the name is short for "tree regular expressions"). Tregex comes with Tsurgeon, a tree transformation language. Also included from version 2.0 on is a similar package which operates on dependency graphs (class SemanticGraph), called semgrex.
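A much simpler baseline than tree matching is to compare the part-of-speech tag sequences of two sentences directly. The sketch below does this with the standard library's `difflib.SequenceMatcher`. The tags are hand-written Penn Treebank-style labels assumed for illustration; in practice they would come from a tagger such as NLTK's. This is a flat sequence comparison, not Tregex and not a parse-tree comparison.

```python
from difflib import SequenceMatcher


def tag_similarity(tags_a, tags_b):
    """Ratio in [0, 1] of how closely two POS-tag sequences match."""
    return SequenceMatcher(None, tags_a, tags_b).ratio()


# Hand-written tags (assumed, not produced by a tagger).
# Target sentence: "I have to go school today"
target = ["PRP", "VBP", "TO", "VB", "NN", "NN"]

# Hypothetical candidate sentences with their assumed tag sequences.
candidates = {
    "We have to finish homework tonight": ["PRP", "VBP", "TO", "VB", "NN", "NN"],
    "The cat sat on the mat": ["DT", "NN", "VBD", "IN", "DT", "NN"],
}

# Rank candidates from most to least structurally similar to the target.
ranked = sorted(
    candidates,
    key=lambda s: tag_similarity(target, candidates[s]),
    reverse=True,
)
print(ranked[0])
```

Sentences that share the target's tag order rank first, so the top results can be offered to the user as further practice for the same pattern.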

Using pre-defined topics in Mallet

I'm looking to use Mallet to classify different documents by topics that I have defined. I know that Mallet will first determine the topics, then classify the documents but I want to skip the first step because I already have a list of topics with words associated with them. Is there any way to use pre-defined topic lists that I have created to classify documents with Mallet?
Any guidance is appreciated. Thanks!
If you're doing unsupervised learning (without training examples, i.e. docs for each topic), you cannot trivially just set the topics. The point is that the training algorithm does not know anything about the docs in advance. It just tries to separate/distribute them, based on the features you provide.
If you're doing supervised learning, topics are actually classes and you have documents for each class. The algorithm then tries to learn which features are significant for each class. In Mallet you should use the Classification module.
There are probably some fancy topic modelling ideas, which incorporate / skew the topic distributions according to specific keywords, but I don't think that's possible with Mallet.
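Outside Mallet, one naive baseline for using pre-defined word lists is to score each document by how many of its words appear in each list and pick the highest-scoring topic. The sketch below shows that idea only; it is plain keyword counting, not topic modelling and not Mallet's API, and the topic names and word lists are invented for illustration.

```python
import re
from collections import Counter

# Hypothetical topic word lists, invented for illustration.
TOPICS = {
    "sports": {"game", "team", "score", "player"},
    "finance": {"stock", "market", "bank", "price"},
}


def classify(document, topics=TOPICS):
    """Return the topic whose word list overlaps the document most, or None."""
    counts = Counter(re.findall(r"[a-z]+", document.lower()))
    # Score of a topic = total occurrences of its words in the document.
    scores = {name: sum(counts[w] for w in words) for name, words in topics.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


print(classify("The team won the game with a late score"))
print(classify("Bank shares fell as the stock market opened"))
```

Documents labeled this way could also serve as rough training examples for a supervised classifier, though that is an assumption about workflow rather than something the answer above prescribes.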

How to rate tweet comment as positive or negative using Word Sentiment

I want to categorize comments as positive or negative based on the content.
This is an NLP (Natural Language Processing) problem, and I am having difficulty implementing it.
Check out this blog post. The author describes how to build a Twitter Sentiment Classifier with Python and NLTK. Looks like a good start, as sentiment analysis is no easy task with lots of active research going on in the field.
Also search SO for Sentiment Analysis, I believe there already are many useful answers about this topic on the site.
Here is a combination of a semi-supervised co-occurrence-based classifier and an unsupervised WSD-based classifier. It's in Python, though, and you need NLTK, WordNet, SentiWordNet, and the movie review corpus that comes with NLTK.
https://github.com/kevincobain2000/sentiment_classifier
The problem is quite complex. Anyway, I love Pattern: http://www.clips.ua.ac.be/pages/pattern-examples-elections
If you are not categorizing a lot of comments, you may wish to try the Chatterbox API.
Otherwise you can use LingPipe, but you will have to train your models.
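To make the word-sentiment idea from the question concrete, here is a minimal lexicon-based sketch: each word carries a score, and the sum decides the label. The lexicon and negation list are tiny hand-made assumptions; a real project would load a resource such as SentiWordNet or train a classifier as the answers above suggest. The negation handling is deliberately crude.

```python
import re

# Tiny hand-made lexicon (assumed for illustration only).
LEXICON = {"good": 1, "great": 1, "love": 1, "bad": -1, "awful": -1, "hate": -1}
NEGATIONS = {"not", "never", "no"}


def sentiment(comment):
    """Label a comment 'positive', 'negative', or 'neutral' by summed word scores."""
    total = 0
    negate = False
    for word in re.findall(r"[a-z']+", comment.lower()):
        if word in NEGATIONS:
            # Flip the polarity of the next sentiment-bearing word.
            negate = True
            continue
        score = LEXICON.get(word, 0)
        if score:
            total += -score if negate else score
            negate = False
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "neutral"


print(sentiment("I love this phone"))
print(sentiment("This is not good"))
```

A word-level score like this misses sarcasm and context, which is one reason the answers above point to trained classifiers.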
