I want to create a sentiment analysis tool from scratch. It doesn't have to be highly complicated. I want to understand the techniques (even if they are quite basic and don't include the latest and greatest) and be able to code them without using libraries. Can anyone please point me to the right sources?
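As a concrete starting point, the most basic technique is a dictionary (lexicon) lookup with simple negation handling. The sketch below is purely illustrative: the word lists are tiny invented stand-ins, and a real tool would load a published lexicon such as AFINN or the Bing Liu opinion lexicon.

```python
# A minimal lexicon-based sentiment scorer: no libraries, no training.
# The word lists are tiny invented stand-ins; a real tool would load a
# published lexicon such as AFINN or the Bing Liu opinion lexicon.

POSITIVE = {"good", "great", "excellent", "love", "happy", "wonderful"}
NEGATIVE = {"bad", "terrible", "awful", "hate", "sad", "poor"}
NEGATORS = {"not", "no", "never"}

def sentiment(text):
    """Return a score > 0 for positive text, < 0 for negative, 0 for neutral."""
    words = text.lower().replace(",", " ").replace(".", " ").split()
    score = 0
    for i, word in enumerate(words):
        polarity = (word in POSITIVE) - (word in NEGATIVE)
        if polarity:
            # Flip polarity when the preceding word negates it ("not good").
            if i > 0 and words[i - 1] in NEGATORS:
                polarity = -polarity
            score += polarity
    return score

print(sentiment("The movie was not good, in fact it was terrible"))  # -2
```

From here you can grow the same skeleton: weighted lexicon entries, intensifiers ("very"), and so on, which is roughly the path tools like SentiStrength took.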
Related
I'm interested in vulnerability detection, but I don't know much about where to begin.
I'm currently studying static analysis. Static analysis can be performed on source code or on object files.
I'd like to know the difference between source code analysis and object file analysis, and the pros and cons of each. A link to a paper or blog post would also be welcome.
Thank you!
For introductions to static source code analysis, I'll immodestly suggest the references in my Dr. Dobb's article:
http://www.drdobbs.com/testing/deploying-static-analysis/240003801. For an example of why binary analysis, though much harder, is also necessary see
https://threatpost.com/new-linux-flaw-enables-null-pointer-exploits-071709/72889/, where a technically correct but unfriendly compiler optimization led to a vulnerability not in the source. (Some of the debate on Slashdot may actually be worth reading: https://it.slashdot.org/story/09/07/18/0136224/new-linux-kernel-flaw-allows-null-pointer-exploits.)
I would like to develop a decision support system for disease diagnosis. I am a newbie at programming. Can anyone suggest which programming language is most suitable?
That really depends on what you want to do exactly (do you want to use prebuilt libraries?).
I see you added weka as a tag. Java is a good option, since it's versatile, powerful, fast, and fairly easy to use, can be deployed as a web service, and you can use the Weka library for quickly building trees.
But really, any other programming language (C++, Python, MATLAB) will have the power to build a tree.
You have to design a mathematical decision model. To do that, you need a background in statistics, data mining, or even fuzzy logic.
Then you code the decision-making algorithm to get the final results. I suggest adding a validation phase to ensure your program is useful and correct.
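To make the idea concrete, here is a toy sketch in Python of a hand-built decision procedure. The symptoms, thresholds, and labels are entirely invented for illustration and are not medical advice; a real system would learn the tree from data (for example with Weka's tree learners) and then go through the validation phase mentioned above.

```python
# A toy decision procedure for illustration only: the symptoms, thresholds,
# and diagnoses below are invented, not medical advice. A real system would
# learn a tree from validated data rather than hard-code one.

def diagnose(temperature_c, has_cough, has_rash):
    """Walk a tiny hand-built decision tree and return a label."""
    if temperature_c >= 38.0:          # fever branch
        if has_cough:
            return "possible flu"
        return "fever of unknown origin"
    if has_rash:                       # no fever
        return "possible allergy"
    return "no diagnosis"

print(diagnose(38.5, True, False))   # possible flu
```

The point of the sketch is only the structure: a decision support system is, at its core, a model (here a tree of if/else tests) plus an algorithm that walks it, and any of the languages mentioned above can express this.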
I would like to develop some visualisations for various string matching algorithms. Ideally, once the visualisation has been developed, I should be able to interact with it, for instance, by experimenting with different inputs to see how it affects the algorithm. Can anyone suggest what would be the best tool to use to create these visualisations?
I've been told that Mathematica is a tool that could be used for visualising algorithms; has anyone had much experience doing this? How well suited would Mathematica be to visualising a string matching algorithm?
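Before committing to a tool like Mathematica, one cheap way to prototype the idea is a plain text-mode trace. The sketch below (in Python, purely illustrative) prints every alignment the naive string matching algorithm tries; you can "interact" with it simply by rerunning it with different inputs.

```python
# Text-mode visualisation of naive string matching: print the pattern at
# every shift the algorithm tries, annotated with where comparison stopped.

def trace_naive_match(text, pattern):
    """Print each alignment tried; return the list of match positions."""
    matches = []
    print(text)
    for shift in range(len(text) - len(pattern) + 1):
        j = 0
        while j < len(pattern) and text[shift + j] == pattern[j]:
            j += 1
        status = "match" if j == len(pattern) else f"mismatch at offset {j}"
        print(" " * shift + pattern + "   <- " + status)
        if j == len(pattern):
            matches.append(shift)
    return matches

trace_naive_match("ababcab", "abc")
```

Once the trace looks right, the same per-step state (shift, comparison offset, match/mismatch) is exactly what you would feed to a graphical front end.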
If you can code in javascript, d3.js is an amazing data visualization library.
Here's an example of a visualization of an algorithm to generate Hamiltonian graphs. It was built using d3.
Here's another example visualizing min-heap generation.
You can find a lot of visualizations here:
http://www.comp.nus.edu.sg/~stevenha/visualization/
source: Competitive Programming 3 by Steven Halim and Felix Halim
I am looking for a tool that can analyze the emotion of short texts. I searched for a week and I couldn't find a good one that is publicly available. The ideal tool is one that takes a short text as input and guesses the emotion. It is preferably a standalone application or library.
I don't need tools that must be trained on texts. And although similar questions have been asked before, no satisfactory answers were given.
I searched the Internet and read some papers, but I couldn't find the kind of tool I want. Currently I have found SentiStrength, but its accuracy is not good. I am using emotional dictionaries right now. I feel that some syntax parsing may be necessary, but it's too complex for me to build myself. Furthermore, it has already been researched by others, and I don't want to reinvent the wheel. Does anyone know of such publicly available software or research prototypes? I need a tool that doesn't need training before use.
Thanks in advance.
I think that you will not find a more accurate program than SentiStrength (or SoCal) for this task - other than machine learning methods in a specific narrow domain. If you have a lot (>1000) of hand-coded data for a specific domain then you might like to try a generic machine learning approach based on your data. If not, then I would stop looking for anything better ;)
Identifying entities and extracting precise information, let alone sentiment, is a very challenging problem, especially with short texts, because of the lack of context. However, there are a few unsupervised approaches to extracting sentiment from text, mainly proposed by Turney (2002). Look at that, and maybe you can adapt the method of extracting sentiment based on the adjectives in the short text for your use case. It is, however, important to note that this might require you to POS-tag your short text efficiently.
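To make Turney's idea concrete, here is a toy Python sketch of his semantic orientation measure. The original method estimates PMI from web search hit counts over huge corpora; the four-document "corpus" below is invented, so the numbers only illustrate the formula SO(word) = PMI(word, "excellent") - PMI(word, "poor").

```python
import math

# Toy sketch of Turney-style semantic orientation. The corpus is invented;
# the real method uses web-scale co-occurrence counts.

corpus = [
    "excellent food and superb service",
    "superb view excellent location",
    "poor service dreadful food",
    "dreadful experience poor value",
]

def pmi(word, seed):
    """Document-level PMI(word, seed) with a small smoothing constant."""
    n = len(corpus)
    both = sum(1 for d in corpus if word in d.split() and seed in d.split())
    w = sum(1 for d in corpus if word in d.split())
    s = sum(1 for d in corpus if seed in d.split())
    return math.log2((both * n + 0.01) / (w * s + 0.01))

def semantic_orientation(word):
    """Positive result suggests positive polarity, negative suggests negative."""
    return pmi(word, "excellent") - pmi(word, "poor")

print(semantic_orientation("superb"))    # > 0
print(semantic_orientation("dreadful"))  # < 0
```

The appeal of the approach for the asker's situation is that it is unsupervised: no hand-labelled training data is needed, only co-occurrence statistics with the two seed words.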
Maybe EmoLib could be of help.
Next year will be my graduation year in informatics engineering, and I am trying to find ideas for my junior project. Actually, I have an idea of making an expert system engine. I have worked with CLIPS and Prolog, and I liked CLIPS very much, but it seems to be an old engine. Can anyone advise me about this idea or give me sources for papers or other topics that may help? I am thinking of using the C language to obtain high performance and to build a robust data structure. I am also thinking about an idea (I don't know if it could be done) of writing facts and rules (as in CLIPS) and then generating optimal C++ code from those rules, so that I can obtain the speed of the machine and use an exe file.
I need help making this idea clearer and figuring out how it can be done, especially because I have read about fuzzy logic and neural networks and heard about the new generation of expert systems, so I don't know how those relate to this topic.
For your junior project, I would recommend against writing it in C. Your problem sounds like it needs correctness more than it needs speed. Writing it in C will take longer because you will need to implement a lot of primitives that are not included in the language or any standard library. Also, since C is relatively low-level, there are a lot of opportunities to make low-level mistakes. Write it in a higher level language that is closer to the problem domain. You will have more time to focus on your actual problem because you will spend less time getting the framework set up. If you already know Prolog, it would be good to stick with that. Perhaps you might consider Mercury. It is similar to Prolog, but also designed for speed.
JBoss Rules (also known as Drools) offers the best approach to rule-processing. It's written in Java. It allows you to integrate program components in the rules, and rule-bases into your program components. You can even build or modify rule-bases on the fly.
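To make the facts-and-rules idea concrete, here is a minimal forward-chaining engine sketched in Python. It is illustrative only: rules are plain (antecedents, consequent) pairs over string facts, and the engine naively re-checks every rule until nothing new is derived, whereas real engines like CLIPS and Drools use the Rete algorithm to avoid exactly that re-checking.

```python
# A minimal forward-chaining inference engine in the spirit of CLIPS.
# Rules are (antecedents, consequent) pairs over plain string facts.

def forward_chain(facts, rules):
    """Fire rules until a fixpoint; return the full set of derived facts."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for antecedents, consequent in rules:
            # Fire a rule only if all antecedents hold and it adds something new.
            if consequent not in facts and set(antecedents) <= facts:
                facts.add(consequent)
                changed = True
    return facts

rules = [
    (["has_fever", "has_cough"], "suspect_flu"),
    (["suspect_flu"], "recommend_rest"),
]
print(forward_chain(["has_fever", "has_cough"], rules))
```

Even at this toy scale you can see the chaining: deriving "suspect_flu" on one pass enables "recommend_rest" on the next, which is the core behaviour the asker's rules-to-C++ compiler would have to preserve.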
I've heard that Java is still catching up in its ability to do heavy numerical work, but outside of that, you have nothing to fear performance-wise.