Is there a way to use French in Stanford CoreNLP sentiment analysis?

I am aware that only the English model is available for sentiment analysis, but I found edu/stanford/nlp/models/lexparser/frenchFactored.ser.gz in stanford-parser-3.5.2-models.jar. I'm actually looking at https://github.com/stanfordnlp/CoreNLP. Is it possible to use this model instead of englishPCFG.ser.gz with CoreNLP, and if so, how?

CoreNLP does not include sentiment models for languages other than English. While we do ship French parser models, there is no available French sentiment model to use with the parser.
You may be able to find French sentiment analysis training data and train your own sentiment model on it. There is plenty of information available about how to do this if you're interested; see e.g. this SO post.

Related

How to find which languages Sentence Transformer models support?

I want to compute sentence embeddings to measure sentence similarity in my NLP project. Since I am working with a low-resource language (Sinhala), I want to know whether any sentence_transformer model supports it. However, I was unable to find which languages those models were pre-trained on. How can I find that out?
If those models were not trained on this language, how can I implement a sentence embedding model myself?

Which models are best for Named Entity Recognition in Gujarati text?

I am looking for the best-performing models for Named Entity Recognition in Gujarati text. I know of only one, the IndicBERT model on Hugging Face. Can anyone suggest other models with documentation or code available for Named Entity Recognition in Gujarati?
I found only the IndicBERT model on Hugging Face. I would like to know of other models, or any link where code is available for Named Entity Recognition.
The recent work by Joshi [1] offers L3Cube-GujaratiBERT, available on HuggingFace here. You'll have to fine-tune the model on your specific downstream task (i.e. Named Entity Recognition in Gujarati). There is a list of Indic NER datasets here. Of particular relevance to your problem is the AI4Bharat Naamapadam dataset, which has Gujarati as one of the 11 available Indic languages.
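One step that fine-tuning a BERT-style model for NER usually needs is mapping word-level labels onto subword tokens. The following is a minimal sketch of that step only, not the author's method. It assumes the list `word_ids` looks like what a Hugging Face fast tokenizer's `word_ids()` returns (one word index per token, `None` for special tokens), and that continuation subwords are masked with -100, the default ignore index of PyTorch's cross-entropy loss. The function name, label ids and example data are hypothetical.

```python
from typing import List, Optional

IGNORE_INDEX = -100  # assumption: label value skipped by the loss function


def align_labels(word_ids: List[Optional[int]], word_labels: List[int]) -> List[int]:
    """Spread word-level NER labels over subword tokens.

    word_ids[i] is the index of the word that token i came from,
    or None for special tokens such as [CLS] and [SEP].
    Only the first subword of each word keeps the word's label.
    """
    aligned = []
    previous = None
    for word_id in word_ids:
        if word_id is None:
            aligned.append(IGNORE_INDEX)          # special token
        elif word_id != previous:
            aligned.append(word_labels[word_id])  # first subword of a word
        else:
            aligned.append(IGNORE_INDEX)          # continuation subword
        previous = word_id
    return aligned


# Hypothetical example: three words, the second split into two subwords.
# Illustrative label ids: 0 = O, 1 = B-PER, 2 = I-PER.
example_word_ids = [None, 0, 1, 1, 2, None]
example_labels = [1, 2, 0]
print(align_labels(example_word_ids, example_labels))
# prints [-100, 1, 2, -100, 0, -100]
```

The aligned label list has one entry per token, so it can be stored next to the token ids when preparing a dataset such as Naamapadam for training.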
Additional Info
In [1], Joshi initially created the L3Cube-HindBERT and L3Cube-DevBERT models pre-trained on Hindi and Devanagari script (Hindi + Marathi) monolingual corpora, respectively. These offered a modest improvement in performance over the alternative MuRIL, IndicBERT and XLM-R multi-lingual offerings. Given the improvement, the author released other Indic language-based models, namely: Kannada, Telugu, Malayalam, Tamil, Gujarati, Assamese, Odia, Bengali, and Punjabi (all can be found at https://huggingface.co/l3cube-pune).
References
[1] Joshi, R., 2022. L3Cube-HindBERT and DevBERT: Pre-Trained BERT Transformer models for Devanagari based Hindi and Marathi Languages. arXiv preprint arXiv:2211.11418.

Can CMU Sphinx support multiple languages in a sentence?

I know CMU Sphinx has many language models, dictionaries, and acoustic models.
I want to recognize a sentence which may contain several languages, for example, English and Mandarin.
Can it be done?
Yes, but it is better to try a more modern framework like Kaldi. You will also have to train the models from data yourself, since there are no pretrained models for this.

Part of speech tagging in OpenNLP vs. StanfordNLP

I'm new to part-of-speech (POS) tagging and I'm doing POS tagging on a text document. I'm considering using either OpenNLP or StanfordNLP for this. For StanfordNLP I'm using a MaxentTagger and I use english-left3words-distsim.tagger to train it. In OpenNLP I'm using POSModel and train it using en-pos-maxent.bin. How do these two taggers (MaxentTagger and POSTagger) and the training sets (english-left3words-distsim.tagger and en-pos-maxent.bin) differ, and which one usually gives better results?
Both POS taggers are based on Maximum Entropy machine learning. They differ in the parameters/features used to determine POS tags. For example, the StanfordNLP POS tagger uses: "(i) more extensive treatment of capitalization for unknown words; (ii) features for the disambiguation of the tense forms of verbs; (iii) features for disambiguating particles from prepositions and adverbs" (read more in the paper). The features used by OpenNLP are documented elsewhere, but I don't currently know where.
The models are probably trained on different corpora.
In general, it is really hard to tell which NLP tool performs better in terms of quality. This depends heavily on your domain, so you need to test the tools on your own data. See the following papers for more information:
Is Part-of-Speech Tagging a Solved Task?
Large Dataset for Keyphrases Extraction
In order to address this problem practically, I'm developing a Maven plugin and an annotation tool to create domain-specific NLP models more effectively.
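The advice above to test both taggers on your own domain could be sketched as follows. This is a minimal illustration, assuming you have a small hand-tagged sample from your domain and that both taggers' outputs have been exported as (token, tag) pairs with the same tokenization and tag set. The function name and the sample data are hypothetical, not real tagger output.

```python
from typing import List, Tuple

TaggedSentence = List[Tuple[str, str]]  # [(token, tag), ...]


def tagging_accuracy(gold: List[TaggedSentence], predicted: List[TaggedSentence]) -> float:
    """Token-level accuracy of predicted tags against a gold standard."""
    if len(gold) != len(predicted):
        raise ValueError("different number of sentences")
    correct = 0
    total = 0
    for gold_sent, pred_sent in zip(gold, predicted):
        if len(gold_sent) != len(pred_sent):
            raise ValueError("sentences are tokenized differently")
        for (_, gold_tag), (_, pred_tag) in zip(gold_sent, pred_sent):
            correct += gold_tag == pred_tag
            total += 1
    return correct / total if total else 0.0


# Made-up sample: one hand-tagged sentence and two hypothetical tagger outputs.
gold = [[("The", "DT"), ("hotel", "NN"), ("rocks", "VBZ")]]
tagger_a = [[("The", "DT"), ("hotel", "NN"), ("rocks", "NNS")]]
tagger_b = [[("The", "DT"), ("hotel", "NN"), ("rocks", "VBZ")]]

print("tagger A:", tagging_accuracy(gold, tagger_a))
print("tagger B:", tagging_accuracy(gold, tagger_b))
```

A few hundred hand-tagged sentences from the target domain usually say more about which tagger suits you than published benchmark figures do.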

What are some good tools/practices for aspect-level sentiment analysis?

I am planning to get some review data from TripAdvisor, and I want to extract hotel-related aspects, assign a polarity to each, and classify them as negative or positive.
What tools can I use for this purpose, and how and where do I start? I know there are tools like GATE, Stanford NLP, OpenNLP etc., but would I be able to perform these specific tasks with them? If so, please suggest an approach. I am planning to use Java as the programming language and would like to use some APIs.
Also, should I go with a rule-based approach, an ML approach that uses a trained corpus of reviews, or some other approach completely?
P.S.: I am new to NLP and I need some help to go forward.
Stanford CoreNLP has a lot of features in one package:
POS Tagger
NER Model
Sentiment Analysis
Parser
The Apache OpenNLP package, in contrast, consists of:
Sentence Detector
POS tagger
NER
Chunker
OpenNLP has no built-in feature for determining sentiment polarity, so you have to pass your tagged words to another resource such as SentiWordNet to find the polarity.
I have used both OpenNLP and Stanford CoreNLP. For both, you need to adapt the sentiment corpus to the restaurant domain.
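The idea of passing tagged words to a polarity lexicon could be sketched like this. It is a rough rule-based illustration, not a complete aspect-level system: the tiny lexicon stands in for a resource such as SentiWordNet, its words and scores are made up, the aspect list is hypothetical, and the input is assumed to be Penn Treebank style (token, tag) pairs produced by any POS tagger.

```python
from typing import Dict, List, Tuple

# Toy polarity lexicon standing in for a resource such as SentiWordNet;
# the words and scores are made up for illustration.
TOY_LEXICON: Dict[str, float] = {
    "clean": 0.6, "friendly": 0.7, "noisy": -0.5, "dirty": -0.7,
}

# Hypothetical hotel aspect terms.
ASPECTS = {"room", "staff", "breakfast"}


def aspect_polarity(tagged: List[Tuple[str, str]], window: int = 2) -> Dict[str, str]:
    """Label each aspect noun by the summed score of adjectives within `window` tokens."""
    result = {}
    for i, (token, tag) in enumerate(tagged):
        word = token.lower()
        if word not in ASPECTS or not tag.startswith("NN"):
            continue
        nearby = tagged[max(0, i - window): i + window + 1]
        score = sum(TOY_LEXICON.get(w.lower(), 0.0)
                    for w, t in nearby if t.startswith("JJ"))
        if score > 0:
            result[word] = "positive"
        elif score < 0:
            result[word] = "negative"
        else:
            result[word] = "neutral"
    return result


review = [("The", "DT"), ("room", "NN"), ("was", "VBD"), ("dirty", "JJ"),
          ("but", "CC"), ("the", "DT"), ("staff", "NN"), ("were", "VBD"),
          ("friendly", "JJ")]
print(aspect_polarity(review))
# prints {'room': 'negative', 'staff': 'positive'}
```

A fixed token window is the simplest possible way to link an opinion word to an aspect. It ignores negation and long-distance dependencies, which is where a parser or a trained model would come in.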
You can try ConceptNet (http://conceptnet5.media.mit.edu/). See, for instance, the bottom of https://github.com/commonsense/conceptnet5/wiki/API for how to "see 20 things in English with the most positive affect".
