What is the difference between the different GloVe models? [closed] - nlp

https://nlp.stanford.edu/projects/glove/
I'm trying to use GloVe for summarizing music reviews, but I'm wondering which version is best for my project. Will "glove.840B.300d.zip" give me a more accurate text summarization since it was trained on far more tokens? Or is Wikipedia 2014 + Gigaword 5 perhaps more representative than Common Crawl? Thanks!

Unfortunately I don't think anyone can give you a better answer for this than:
"try several options, and see which one works the best"
I've seen work that uses the Wikipedia 2014 + Gigaword 100d vectors that produced SOTA results for reading comprehension. Without experimentation, it's difficult to say conclusively which corpus is closer to your music review set, or what the impact of larger dimensional word embeddings will be.
This is just random advice, but I guess I would suggest trying in this order:
100d from Wikipedia+Gigaword
300d from Wikipedia+Gigaword
300d from Common Crawl
You might as well start with the smaller dimensional embeddings while prototyping, and then you could experiment with larger embeddings to see if you get a performance enhancement.
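If it helps while prototyping, here is a minimal sketch of loading one of the downloads into a plain dict (the file name assumes you have unzipped glove.6B.zip, the Wikipedia 2014 + Gigaword 5 package, into the working directory):

```python
import numpy as np

def load_glove(path):
    # Each line of a GloVe .txt file is: word v1 v2 ... vN
    vectors = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            vectors[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
    return vectors

glove = load_glove("glove.6B.100d.txt")
print(glove["music"].shape)  # (100,)
```

Swapping in the 300d or Common Crawl files is then just a matter of changing the path.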
And in the spirit of promoting other groups' work, I would definitely say you should look at these ELMo vectors from AllenNLP:
http://allennlp.org/elmo
They look very promising!

Related

How do I avoid creating long and complicated sequence diagrams? [closed]

I'm very new to creating sequence diagrams. I've watched countless tutorial videos to make sure I understand the concept; however, the trouble starts when I have to model a specific dice game between 2 players as a sequence diagram without making it too long and complicated. I say specific dice game since there are custom rules added to it, which leads to many alternative scenarios and loops. Do I only include the most important parts of the dice game?
Yes, only show what's needed, for example the parts where many objects communicate. You never show the full execution path (which is impossible anyway in almost all cases). Sequence diagrams are not a form of graphical programming; they are meant to convey the idea behind certain collaborations. So create as many SDs as needed, starting with just the basics. When questions come up, you can use additional SDs to clarify them. Also relate the SDs to the collaborations (which are realizations of use cases). Depending on the tool you use, there are different ways to do that (packaging would be one of them).

Suggestions for question answering system NLP [closed]

I am trying to build a question answering system where I have a set of predefined questions and their answers. For any given question from the user, I have to find whether a similar question already exists among the predefined questions and send its answer. If it doesn't, the system has to reply with a generic response. Any ideas on how to implement this using NLP would be really helpful.
Thanks in advance!!
As you have already mentioned in the question, this calls for a solution that computes text similarity, in this case question-question similarity. You have a set of questions, and for an incoming query/question a similarity score has to be computed against every available question. From a previous answer of mine, simple sentence similarity comes down to:

1. Convert the sentences into a suitable representation.
2. Compute some distance metric between the two representations and figure out the closest match.

To achieve step 1, you can convert every word in a sentence to a corresponding vector. There are libraries/algorithms like fastText that provide such vector mappings. A vector representation of the entire sentence is obtained by averaging over all word vectors. For step 2, use cosine similarity to compute a score between the query and each question in the available list, as in the sketch below.
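A minimal sketch of that pipeline (the word-vector source, the helper names, and the 0.8 threshold are my assumptions, not from the answer; any pre-trained vectors such as fastText or GloVe would do):

```python
import numpy as np

def sentence_vector(sentence, vectors):
    # Average the vectors of all in-vocabulary words; None if none are known.
    words = [w for w in sentence.lower().split() if w in vectors]
    if not words:
        return None
    return np.mean([vectors[w] for w in words], axis=0)

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def best_match(query, predefined, vectors, threshold=0.8):
    # Return the most similar predefined question, or None so the caller
    # can fall back to the generic response. The threshold is arbitrary: tune it.
    qv = sentence_vector(query, vectors)
    if qv is None:
        return None
    best_score, best_q = -1.0, None
    for q in predefined:
        pv = sentence_vector(q, vectors)
        if pv is not None:
            score = cosine(qv, pv)
            if score > best_score:
                best_score, best_q = score, q
    return best_q if best_score >= threshold else None
```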

measuring precision and recall [closed]

We are building a text search solution and want a way to measure the precision and recall of the system every time we add new document types. From reading some of the posts here, it sounds like a machine-learning-based solution is the way to go. Can an expert comment on this? We will then look to add machine learning folks to our team.
The only way to get an F1-score requires knowing the correct class and rank of all samples returned by your evaluation queries, and you need those evaluation queries in the first place.
Any machine learning approach will need a large amount of manual work to provide those samples and/or queries. So large that it won't save you any time.
Another bad aspect of this kind of evaluation is the intrinsic error introduced by the learning itself. It grows with the size of the search engine's index and the number of examples required, so you never get a good evaluation.
Forget machine learning for evaluating a search engine.
Build your test queries and samples by hand; over time the set will become big and reliable. The bookkeeping itself is simple, as in the sketch below.
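For the record, a minimal sketch of that bookkeeping for a single hand-judged query (the sets-of-document-IDs format is my assumption):

```python
def precision_recall_f1(retrieved, relevant):
    # Compare the engine's results against the hand-made relevance judgments.
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Engine returned docs 1, 2, 5; the judgments say 2, 3, 5, 7 are relevant.
print(precision_recall_f1([1, 2, 5], [2, 3, 5, 7]))  # -> (0.667, 0.5, 0.571)
```

Averaging these over all hand-built queries gives the system-level numbers to track as new document types are added.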
If you really want machine learning in your system, you should look at query pre-processing instead. Getting meta-information about the query by other means (you mention SVM, why not?) is generally good for performance, and since it doesn't change the results, you can reuse the same samples for an end-to-end evaluation.
That's what I did a few years ago, though with a naive Bayes classifier for natural language analysis.

Need an SVM implementation or a Java library [closed]

I have a data set with 2400 samples and 10,000 features. All the data is binary (+1 or -1). I need to run it through an SVM so I can compare my algorithm against it. However, I don't know much about SVMs or which package to use. I tried reading up on them so I could implement one myself, but it's way over my head; all I need to get out of it is the weight vector. I'm a Windows user and my implementation is in Java. I could export my data into a text file of 1s and 0s. I have access to MATLAB, but something tells me it will be extremely slow and won't run fast enough on my 1.6 GHz, 2 GB RAM laptop (and I need it to), since I have to run the algorithm a couple hundred times to get accurate results.
I'm really just looking for a quick and easy to understand library or SVM implementation that I can use in my case.
Thank you all. Feel free to ask any additional questions to assist me better.
I ended up using a JNI wrapper for SVMlight that can be found here: http://www.mpi-inf.mpg.de/~mtb/svmlight/
Didn't take long to figure out how to use it and it's surprisingly fast (seconds).
I don't think that there is any path to a 'quick and easy understanding of an SVM.' The math is hard and trying to train one without a good understanding is a very quick trip to shooting yourself in the foot.
OpenSVM on SourceForge is certainly an option. It shouldn't matter to you that it's in Java; just download a JDK.
I can't answer your question as to the likely performance of an SVM training procedure in MATLAB; perhaps someone else can.
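If switching tools is an option, here is a minimal sketch (my suggestion, not from the thread) using scikit-learn's LinearSVC to train a linear SVM on ±1 data and read off the weight vector:

```python
import numpy as np
from sklearn.svm import LinearSVC

# Random stand-in for the real data: 2400 samples, 10,000 binary (+1/-1) features.
rng = np.random.default_rng(0)
X = rng.choice(np.array([-1.0, 1.0], dtype=np.float32), size=(2400, 10_000))
y = rng.choice([-1, 1], size=2400)

clf = LinearSVC(C=1.0, max_iter=5000)  # linear kernel only, but fast at this scale
clf.fit(X, y)

w = clf.coef_.ravel()   # the weight vector the question asks for
b = clf.intercept_[0]   # the bias term
print(w.shape)          # (10000,)
```

On data of this size a linear SVM typically trains in seconds, so running it a couple hundred times is feasible.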

Genetic Algorithms for computer security [closed]

I am in the process of choosing a project for uni, and I am really interested in combining genetic algorithms and computer security.
Hence my question: is it possible to use GAs on any aspect of computer security? For example, I was thinking of something like an evolutionary firewall/anti-virus that would be able to protect itself and inhibit threats. Is such a thing plausible?
I really appreciate your input, advice, and comments.
First of all, the whole idea of genetic algorithms is still being debated, i.e. whether genetic algorithms are in any way better suited to solving optimization problems than other methods (which are either better proven, easier to use, or provide other advantages).
That being said, yes, I know of a security-related application of genetic algorithms, mainly used in fuzzing to optimize code paths and therefore code coverage. There is a paper called "Vulnerability analysis for x86 executables using genetic algorithm and fuzzing" and a BlackHat presentation predating that paper by two years called "Sidewinder: An Evolutionary Guidance System for Malicious Input Crafting".
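To make the fuzzing idea concrete, here is a toy sketch of the GA loop (entirely illustrative; the fitness function below is a stand-in for real branch-coverage feedback from an instrumented target):

```python
import random

POP, GENS, LEN = 50, 100, 16

def fitness(candidate: bytes) -> int:
    # Stand-in for coverage feedback: counts how many branch conditions
    # of a hypothetical file parser this input satisfies.
    score = 0
    if candidate[:4] == b"RIFF":
        score += 1
    if candidate[8] == 0xFF:
        score += 1
    if b"\x00\x00" in candidate:
        score += 1
    return score

def mutate(candidate: bytes) -> bytes:
    # Flip one random byte.
    buf = bytearray(candidate)
    buf[random.randrange(len(buf))] = random.randrange(256)
    return bytes(buf)

population = [bytes(random.randrange(256) for _ in range(LEN)) for _ in range(POP)]
for _ in range(GENS):
    population.sort(key=fitness, reverse=True)
    survivors = population[: POP // 2]             # selection: keep the fitter half
    offspring = [mutate(random.choice(survivors))  # variation: mutated copies
                 for _ in range(POP - len(survivors))]
    population = survivors + offspring

print(fitness(population[0]))  # the fittest input found
```

A real fuzzer would replace `fitness` with coverage measured via instrumentation, which is essentially what the Sidewinder presentation and the paper above describe.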
I briefly looked into this before, and there seem to be quite a few resources on using genetic algorithms for network intrusion detection. Hope it helps.
Take a look at Stephanie Forrest's group's work on computer immune systems.
It's not traditional GAs, but it's very close. Hope that helps.
