Java library to get the different inflections of a word (NLP?) - nlp

For a simple project in Java, I need a library that, given a word, returns a list of its inflections and related forms (plural, singular, adjective, etc.).
For example:
"Photo" -> "Photo", "Photograph", "Photography"
"Walks" -> "Walk", "Walking" ...
I looked at libraries like CoreNLP, but I cannot figure out how to achieve this. The documentation is also not great, and I can hardly find any good code examples.
Could someone help with this?

Related

How to get a sort of inverse lemmatization for every language?

I found the spaCy library, which lets me apply lemmatization to words (blacks -> black in English, bianchi -> bianco in Italian). My work is to analyze entities, not verbs or adjectives.
I'm looking for something that gives me all the possible word forms starting from the canonical form.
For example, from "black" to "blacks" in English, or from "bianco" in Italian to "bianca", "bianchi", "bianche", etc. Is there any library that does this?
I'm not clear on exactly what you're looking for, but if a list of English lemmas is all you need, you can extract that easily enough from a GitHub library I have. Take a look at LemmInflect. It first tries a dictionary approach to lemmatization, and the repository contains a .csv file with all the different lemmas and their inflections: LemmInflect/lemminflect/resources/infl_lu.csv.gz. You'll have to extract the lemmas from it. Something like...
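import gzip

# Assumes each row starts with the lemma followed by its part-of-speech tag.
# 'rt' opens the gzipped file in text mode, so each line is a str, not bytes.
with gzip.open('LemmInflect/lemminflect/resources/infl_lu.csv.gz', 'rt') as f:
    for line in f:
        parts = line.strip().split(',')
        lemma = parts[0]
        pos = parts[1]
        print(lemma, pos)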
Alternatively, if you need a system to inflect words, this is what LemmInflect is designed to do. You can use it as a stand-alone library or as an extension to spaCy. There are examples of how to use it in the README.md and in the ReadTheDocs documentation.
I should note that this is for English only. I haven't seen much code for inflecting words, and you may have difficulty finding this for other languages.
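For languages without a ready-made inflector, one language-independent way to approximate "inverse lemmatization" is to invert a lemmatizer's output. A minimal sketch follows. It assumes you can collect (surface form, lemma) pairs, for example by running spaCy's lemmatizer over a large corpus in the target language. The pairs below are hypothetical sample data, and the result only covers forms that actually occur in your data.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def invert_lemma_table(pairs: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Map each lemma to every surface form seen with it (first-seen order)."""
    forms_by_lemma: Dict[str, List[str]] = defaultdict(list)
    for form, lemma in pairs:
        # Skip duplicates so each form is listed once per lemma
        if form not in forms_by_lemma[lemma]:
            forms_by_lemma[lemma].append(form)
    return dict(forms_by_lemma)


# Hypothetical (form, lemma) pairs, e.g. gathered by lemmatizing a corpus
pairs = [
    ("bianco", "bianco"), ("bianca", "bianco"),
    ("bianchi", "bianco"), ("bianche", "bianco"),
    ("blacks", "black"), ("black", "black"),
]
table = invert_lemma_table(pairs)
print(table["bianco"])  # ['bianco', 'bianca', 'bianchi', 'bianche']
```

The trade-off is coverage. Unlike a rule-based inflector, this never invents a wrong form, but it cannot produce a form it has never seen.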

How to place latex-like mathematical formulas (e.g. via mathjax) in Haskell Diagrams?

I am attempting to place latex-style math formulas into a haskell diagram.
The documentation pages
http://projects.haskell.org/diagrams/doc/manual.html#essential-concepts
and
http://projects.haskell.org/diagrams/doc/tutorials.html
suggest that one can use something called 'mathjax' to achieve this.
Is there an explanation or example somewhere of how to actually code this?
Following the documentation at those links, my best guess would be something like:
mathDiagram :: Diagram B
mathDiagram = stroke $ textSVG "`2 + \sqrt{\pi}`:math:" 1
But this of course gives an error:
induction.hs:13:35: error:
lexical error in string/character literal at character 's'
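You can do this using the diagrams-pgf backend. Just use the text function and put dollar signs around your text, e.g. text "$2 + \\sqrt{\\pi}$". Note that backslashes must be doubled inside a Haskell string literal: the unescaped \s in "\sqrt" is what causes the lexical error in the question. Also, see here for an explanation of how to include diagrams in a LaTeX document: http://projects.haskell.org/diagrams/doc/latex.html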

Make Menhir find all alternatives?

I would like to change the behavior of Menhir's output in the following way:
I want it to look up all grammatical alternatives, if it finds any, put them in a list, and return this ambiguous interpretation. It should not resolve conflicts, just store them.
Looking at Menhir's source code, it seems to me that I have to look in "Engine.ml". The resulting syntactically determined value comes as the variant item "Accepted v", a state of a checkpoint of the grammatical automaton. This content is produced beforehand by the function "accept env prod", which is part of a group of mutually recursive functions that change the states.
Do you have a tip on how I could change these functions to put all the possible results in the list and proceed as if nothing happened? Or do you think this won't work anyway?
Thanks.
What you are looking for is a GLR parser generator (the G stands for generalized). Menhir is not such a tool, and I doubt you could easily modify it to do what you want.
However, there is another tool that does exactly what you want: dypgen.
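To illustrate the kind of output asked for, here is a minimal Python sketch that returns every parse instead of resolving the conflict. It is not GLR and not Menhir or dypgen code. It assumes a hypothetical toy ambiguous grammar E -> E '-' E | NUM and enumerates every parse tree for each token span, memoizing per span so left recursion cannot loop.

```python
from functools import lru_cache
from typing import List, Sequence


def all_parses(tokens: Sequence[str]) -> List[object]:
    """Return every parse tree of `tokens` under E -> E '-' E | NUM."""
    toks = tuple(tokens)

    @lru_cache(maxsize=None)
    def parse(i: int, j: int) -> tuple:
        # All ways to derive E from toks[i:j]
        results = []
        if j - i == 1 and toks[i].isdigit():
            results.append(toks[i])  # E -> NUM
        for k in range(i + 1, j - 1):
            if toks[k] == '-':
                # E -> E '-' E, splitting at every '-' (the ambiguity)
                for left in parse(i, k):
                    for right in parse(k + 1, j):
                        results.append(('-', left, right))
        return tuple(results)

    return list(parse(0, len(toks)))


print(all_parses(['1', '-', '2', '-', '3']))
# [('-', '1', ('-', '2', '3')), ('-', ('-', '1', '2'), '3')]
```

With n operands the number of trees grows as the Catalan number C(n-1). This is why generalized parsers usually share common subtrees in a packed parse forest rather than returning a flat list.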

Capybara: Should I get rid of extracted constants or keep them?

I was wondering about best practices for extracting selectors into constants. As a general rule, it is usually recommended to extract magic numbers and string literals into constants so they can be reused, but I am not sure this is really a good approach when dealing with selectors in Capybara.
At the moment, I have a file called "selectors.rb" which contains the selectors that I use. Here is part of it:
SELECTORS = {
  checkout: {
    checkbox_agreement: 'input#agreement-1',
    input_billing_city: 'input#billing\:city',
    input_billing_company: 'input#billing\:company',
    input_billing_country: 'input#billing\:country_id',
    input_billing_firstname: 'input#billing\:firstname',
    input_billing_lastname: 'input#billing\:lastname',
    input_billing_postcode: 'input#billing\:postcode',
    input_billing_region: 'input#billing\:region_id',
    input_billing_street1: 'input#billing\:street1',
    ....
  }
}
The idea is that I put my selectors in this file and then do something like this:
find(SELECTORS[:checkout][:input_billing_city]).click
There are several problems with this:
If I want to know which selector is actually used, I have to look it up.
If I rename a key in selectors.rb, I could forget to update it somewhere else, which results in find(nil).click.
With the example above, I can't use this selector with fill_in(SELECTORS[:checkout][:input_billing_city]), because fill_in expects an ID, name, or label rather than a CSS selector.
There are probably a few more problems with this, so I am considering getting rid of the constants. Has anyone been in a similar spot? What is a good way to deal with this situation?
Someone mentioned the SitePrism gem to me: https://github.com/natritmeyer/site_prism
A Page Object Model DSL for Capybara
SitePrism gives you a simple, clean and semantic DSL for describing
your site using the Page Object Model pattern, for use with Capybara
in automated acceptance testing.
It is very helpful in that regard and I have adjusted my code accordingly.

What does "SEM1:3ENCE_B:NW:NG102:EECT300:120:0900:2" mean?

In my project I am working on teachers and their timetables. I was provided with a text file that contains the teacher timetable from my uni. They were unable to tell me the syntax or code language it uses, so I don't know how to read it and use it in my iPhone app. Can you help me identify what sort of code this is and how I can read it?
Sample:
SEM1:3ENCE_B:NW:NG102:EECT300:120:0900:2
SEM1:3ENCE_B,3ENCE_C:TW:NLG107:EEEL300:120:0900:1
19:3ENCE_A,3ENCE_B,3ENCE_C:TW:CLG.01:EEEL305_L:120:1100:1
19:3ENCE_A,3ENCE_B,3ENCE_C:TW:NLG107:EEEL305:120:0900:1
SEM1:3ENCE_A,3ENCE_B:TW::EEEL300:120:1100:4
SEM1&2:3ENCE_A,3ENCE_B,3ENCE_C,3ENCE_D:SK:CLG.06:EEEL315_L:120:1400:4
SEM1:3CS_A,3CS_B,3CS_C,3CS_D,3ENCE_A,3ENCE_B,3ENCE_C,3ENCE_D:DHE:CLLT:EICG301_L:120:0900:5
SEM1:3CS_A,3CS_B:ABO,DHE:N5.114:EICG301:120:1100:5
SEM1:3CS_A,3CS_B,3CS_C,3CS_D,3ENCE_A,3ENCE_B,3ENCE_C,3ENCE_D:NW:LTS205:EECT300_L:120:1600:2
27:3ENCE_A,3ENCE_B,3ENCE_C,3ENCE_CS::NG100:EEEL320:120:1100:2
SEM1:3CS_A,3CS_B,3CS_C,3CS_D:NW:C2.14:ECSC302_L:120:0900:3
SEM1:3CS_A:NW:NG100:EECT300:120:1400:2
It's not code, it's data. The best way to interpret it is to compare this representation with another one (think Rosetta Stone).
Obviously, the colon is used to separate the fields, and each line probably represents a single timetable item. Each line appears to have 8 fields.
One field looks like a course ID: EECT300
Another looks like a time: 0900
As for the rest, you'll have to work it out...
University of Westminster, maybe...?
It is not a code language.
It is just a plain-text file that contains data, using colons (:) as the separator.
I guess you have to parse it and retrieve the information for each column. You need to know what each column means (if not, ask your uni).
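Since every sample line has 8 colon-separated fields, some of which are comma-separated lists, a minimal parsing sketch could look like the following. Only the course ID (e.g. EECT300) and the time (e.g. 0900) are identified in the answers above. Every other field name here (period, groups, staff, room, duration, day) is a hypothetical guess to confirm with your university. The same split-on-":" then split-on-"," logic ports directly to Swift for the iPhone app.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TimetableEntry:
    # Hypothetical field names; only `course` and `start` are suggested above.
    period: str          # e.g. "SEM1", "SEM1&2", "19" (guess)
    groups: List[str]    # e.g. ["3ENCE_A", "3ENCE_B"] (guess)
    staff: List[str]     # e.g. ["ABO", "DHE"], may be empty (guess)
    room: str            # e.g. "NG102", may be empty (guess)
    course: str          # e.g. "EECT300"
    duration: int        # e.g. 120 (minutes? guess)
    start: str           # e.g. "0900"
    day: int             # e.g. 2 (weekday? guess)


def split_list(field: str) -> List[str]:
    """Split a comma-separated field, treating an empty field as []."""
    return [item for item in field.split(',') if item]


def parse_line(line: str) -> TimetableEntry:
    fields = line.strip().split(':')
    if len(fields) != 8:
        raise ValueError(f"expected 8 fields, got {len(fields)}: {line!r}")
    period, groups, staff, room, course, duration, start, day = fields
    return TimetableEntry(period, split_list(groups), split_list(staff),
                          room, course, int(duration), start, int(day))


entry = parse_line("SEM1:3CS_A,3CS_B:ABO,DHE:N5.114:EICG301:120:1100:5")
print(entry.course, entry.start, entry.staff)  # EICG301 1100 ['ABO', 'DHE']
```

The sketch raises an error on lines that do not have exactly 8 fields rather than guessing. This makes it easy to spot records that don't fit the assumed layout.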

Resources