I am trying to develop software that assigns suitable attributes to entity names depending on the entity type.
For example, if I have entities such as doctor, nurse, employee, customer, patient, lecturer, donor, user, developer, designer, driver, passenger and technician, they will all have attributes such as name, sex, date of birth, email address, home address and telephone number, because all of them are people.
As a second example, words such as university, college, hospital, hotel and supermarket can share attributes such as name, address and telephone number, because all of them could be organizations.
Are there any Natural Language Processing tools or software that could help me achieve this goal?
I need to identify the entity type as person or organization, and then attach suitable attributes according to that type.
I have looked at Named Entity Recognition (NER) tools such as the Stanford Named Entity Recognizer, which can extract entities such as Person, Location, Organization, Money, Time, Date and Percent, but it was not really useful.
I could do it by building my own gazetteer; however, I would prefer not to take that option unless I fail to do it automatically.
Any help, suggestions and ideas will be appreciated.
If I understand correctly, you are mainly interested in knowing if a given word can be mapped to a general category of Human, Organization, etc.
You should use WordNet, which provides a complete hierarchy of the general English lexicon. Try it a bit in the user interface to get a feel for how it works.
WordNet encodes relations between words. One of these relations is hypernymy, a fancy word for a general-to-particular relation.
Some examples:
Vehicle is a hypernym of boat.
Vehicle is a hypernym of car.
Human is a hypernym of worker which is a hypernym of plumber.
Hyponymy is the inverse relation of hypernymy:
Boat is a hyponym of vehicle.
Car is a hyponym of vehicle.
Plumber is a hyponym of worker, itself a hyponym of human.
These relations are transitive, so in my last example plumber is also a hyponym of human. This gives you the solution to your problem: any word that has human as a hypernym should be mapped to Human and given people attributes.
There are libraries to access WordNet from Java and Python, as well as from many other languages. Here is the documentation for using WordNet with the NLTK Python module.
A short example to determine whether a word is a hyponym of "human":
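from nltk.corpus import wordnet as wn  # needs the WordNet corpus: nltk.download('wordnet')

# 'person.n.01' is the WordNet synset for a human being
human = wn.synset('person.n.01')
# closure() follows the hyponym relation transitively
hyponyms_of_human = set(human.closure(lambda s: s.hyponyms()))
# a word can have several senses, so look at all of its synsets
fireman = wn.synsets('fireman')
salad = wn.synsets('salad')
print(any(x in hyponyms_of_human for x in fireman)) # outputs True
print(any(x in hyponyms_of_human for x in salad)) # outputs False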
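To tie this back to the original goal, the sketch below attaches attribute lists to words based on their hypernyms. It uses a tiny hand-written hypernym table (TOY_HYPERNYMS) standing in for WordNet so that it runs without any corpus download; the table, the function names and the way attributes are looked up are illustrative assumptions, not part of NLTK. The attribute lists are the ones given in the question.

```python
# Toy stand-in for WordNet: each word maps to its direct hypernym.
# (Hypothetical data for illustration; real WordNet works on synsets, not bare words.)
TOY_HYPERNYMS = {
    "plumber": "worker",
    "doctor": "worker",
    "worker": "person",
    "patient": "person",
    "hospital": "organization",
    "university": "organization",
    "salad": "food",
}

# Attributes to attach per general category (taken from the question's examples).
ATTRIBUTES = {
    "person": ["name", "sex", "date of birth", "email address",
               "home address", "telephone number"],
    "organization": ["name", "address", "telephone number"],
}


def hypernym_chain(word):
    """Return all hypernyms of `word`, following the relation transitively."""
    chain = []
    seen = {word}
    while word in TOY_HYPERNYMS:
        word = TOY_HYPERNYMS[word]
        if word in seen:  # guard against accidental cycles in the table
            break
        seen.add(word)
        chain.append(word)
    return chain


def attributes_for(word):
    """Return the attribute list of the first known category among the hypernyms."""
    for candidate in [word] + hypernym_chain(word):
        if candidate in ATTRIBUTES:
            return ATTRIBUTES[candidate]
    return []  # no known category: leave the entity without default attributes


print(hypernym_chain("plumber"))   # ['worker', 'person']
print(attributes_for("hospital"))  # ['name', 'address', 'telephone number']
```

With real WordNet, the chain would come from the hypernyms of each synset of the word, and a word with several senses could end up in more than one category.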
How do I train a Watson Knowledge Studio machine learning annotator to identify education info that is not part of a proper sentence, for example, two bullet points? How do I form a type system that will identify entities without breaking them all apart?
I've considered using relation annotations, but according to the official documentation, relation types should only be annotated if the sentence specifically mentions the relation. For instance, "Mary works for IBM" is an example of the employedBy relation type (Mary employedBy IBM). However, their own videos show them annotating "Ford F-150" with a manufacturedBy relation even though the sentence doesn't specifically state the relation, as in "The Ford F-150 struck a light pole." (F-150 manufacturedBy Ford)
This is the kind of text I'm working with:
B.A., City University of New York, 1995
M.A., New York University, 1997
Ph.D, Columbia University, 1999
I could annotate these with degree, school, and graduationYear entities, but I'll end up getting back "1995", "1997", "1999", "B.A.", "City University of New York", "Columbia University", "M.A.", "New York University", "Ph.D"; a jumble that I can't work with, because I can no longer tell which degree belongs with which school and which graduation year.
As for expressions like your two bullet points, it may be possible to improve the accuracy of sentence detection, so that WKS can work with them, by using the dictionary-based tokenizer:
https://console.bluemix.net/docs/services/knowledge-studio/create-project.html#wks_tokenizer
I imported your example text into WKS and checked the result of tokenization: the text was split into 3 sentences.
In this case you can annotate relations among degree, school and graduation year.
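Whichever way the annotations are produced, the underlying idea is to keep the entities of one line together instead of pooling them by type. A minimal sketch of that grouping on the example text, in plain Python: it does not use any Watson Knowledge Studio API, and the EducationRecord type and the parse_education function are hypothetical names that assume every line follows the "degree, school, year" pattern.

```python
from typing import List, NamedTuple


class EducationRecord(NamedTuple):
    degree: str
    school: str
    graduation_year: int


def parse_education(text: str) -> List[EducationRecord]:
    """Turn 'degree, school, year' lines into one record per line."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split from both ends so a comma inside the school name stays in the school.
        degree, rest = line.split(",", 1)
        school, year = rest.rsplit(",", 1)
        records.append(EducationRecord(degree.strip(), school.strip(), int(year)))
    return records


sample = """B.A., City University of New York, 1995
M.A., New York University, 1997
Ph.D, Columbia University, 1999"""

for record in parse_education(sample):
    print(record.degree, "|", record.school, "|", record.graduation_year)
```

A line that does not contain at least two commas raises a ValueError here, which is acceptable for a sketch but would need handling on real data.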
I have to complete a program that implements a car park system.
I started with the UML diagram, as I think the program will be easier to write after that, but I am a bit stuck.
The scenario is:
Design and implement a class Vehicle (abstract) and the subclasses Car, Van, Motorbike. The classes should include appropriate methods and hold information about the ID plate of the vehicle, the brand of the vehicle and the entry time/date in the parking.
In particular:
• The Car class should also include appropriate methods and hold information about the number of doors of the car and the color.
• The Van class should also include methods and information about the cargo volume of the van.
• The Motorbike class should also have methods and information about the engine size of the motorbike.
You should implement a class DateTime to represent the time/date of the entrance of the vehicle in the parking. Do not use any predefined library.
Design and implement a class called MyCarParkManager, which extends the interface CarParkManager. MyCarParkManager maintains the list of the vehicles currently in the parking.
The class should display in the console a menu from which the user can select among the following management actions:
• Add a new vehicle in the parking if there are free lots (considering that the max number of lots is 20) and return the number of the free lots remaining. Consider that a Van occupies 2 lots. Display a message with the number of free lots or informing that there are no lots available.
• Delete a vehicle, selecting the ID plate, from the list when the vehicle leaves the car park and return the vehicle instance. Display the type of the vehicle leaving the parking (if it is a car, a van or a motorbike).
• Print the list of the vehicles currently parked. For each vehicle print the ID plate, the entry time and the type of vehicle (if it is a car, a van or a motorbike). The list should be ordered chronologically, displaying the last vehicle entered in the parking as the first in the list.
And this is what I've got so far. My Solution
Since the class Vehicle is abstract and cannot be instantiated, what should I use to create the different vehicle objects? Might it be an array? And how should the prompts change depending on the input: if it is a car, ask for the color as well, and if it is a van, ask for the cargo volume?
Thanks a lot in advance to whoever takes the time to read this and check whether the UML seems right.
Analyze the statement
An important skill that you will start to develop in this module is analyzing a problem statement in order to identify the details needed to develop a solution. In this assignment the first task you should perform is a careful analysis of the problem statement in order to make sure you have all the information to elaborate a solution. Do not make assumptions about what is needed! If you are not sure about the information provided, ask questions.
Design a solution:
The design of your system should be consistent with the Object Oriented principles and easy to understand by an independent programmer.
Source: 5COSC001W Object Oriented Programming - Assignment 1
I suggest:
a class "VehicleCardInfo" for storing each car's information and status
a class RulesForCarPark for validating all the data in "VehicleCardInfo"
a vocabulary for car types, etc.
CarParkManager, as an actor, uses the use case "Managing Cars" for CRUD operations on "VehicleCardInfo" objects.
Maybe we also need some rules for how people log in and use this application.
The UML diagram for the above may be simple (use dependencies rather than associations).
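On the question of how to create the different vehicle objects: an abstract class cannot be instantiated, but its concrete subclasses can, and a single list can hold all of them. Below is a minimal sketch of the scenario, written in Python for brevity rather than in the assignment's language. The method names (lots_needed, describe, add_vehicle, remove_vehicle, free_lots) are my own assumptions, the entry time is a plain string instead of the required DateTime class, and Motorbike is left out because it would follow the same pattern as Car.

```python
from abc import ABC, abstractmethod


class Vehicle(ABC):
    """Abstract base: holds what every vehicle has in common."""

    def __init__(self, plate, brand, entry_time):
        self.plate = plate
        self.brand = brand
        self.entry_time = entry_time  # placeholder for the custom DateTime class

    def lots_needed(self):
        return 1  # default; subclasses may override

    @abstractmethod
    def describe(self):
        """Return the type-specific details as text."""


class Car(Vehicle):
    def __init__(self, plate, brand, entry_time, doors, color):
        super().__init__(plate, brand, entry_time)
        self.doors = doors
        self.color = color

    def describe(self):
        return f"Car, {self.doors} doors, {self.color}"


class Van(Vehicle):
    def __init__(self, plate, brand, entry_time, cargo_volume):
        super().__init__(plate, brand, entry_time)
        self.cargo_volume = cargo_volume

    def lots_needed(self):
        return 2  # a van occupies two lots

    def describe(self):
        return f"Van, cargo volume {self.cargo_volume}"


class MyCarParkManager:
    MAX_LOTS = 20

    def __init__(self):
        self.vehicles = []  # one list holds every Vehicle subclass

    def free_lots(self):
        return self.MAX_LOTS - sum(v.lots_needed() for v in self.vehicles)

    def add_vehicle(self, vehicle):
        """Park the vehicle if it fits; return the free lots left, or None if full."""
        if vehicle.lots_needed() > self.free_lots():
            return None
        self.vehicles.append(vehicle)
        return self.free_lots()

    def remove_vehicle(self, plate):
        """Remove and return the vehicle with this plate, or None if not found."""
        for i, vehicle in enumerate(self.vehicles):
            if vehicle.plate == plate:
                return self.vehicles.pop(i)
        return None


manager = MyCarParkManager()
print(manager.add_vehicle(Car("AB12 CDE", "Ford", "2019-11-01 09:00", 4, "red")))  # 19
print(manager.add_vehicle(Van("XY34 ZZZ", "Fiat", "2019-11-01 09:05", 12.5)))      # 17
print(manager.remove_vehicle("XY34 ZZZ").describe())  # Van, cargo volume 12.5
```

The menu code then only has to decide which constructor to call (asking for doors and color for a car, cargo volume for a van); everything after that works on the common Vehicle type.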
I am using WordNet to find synonyms of ontology concepts. How can I choose the appropriate sense for my ontology concept? E.g. there is an ontology concept "conference", which has the following synsets in WordNet:
The noun conference has 3 senses (first 3 from tagged texts)
(12) conference -- (a prearranged meeting for consultation or exchange of information or discussion (especially one with a formal agenda))
(2) league, conference -- (an association of sports teams that organizes matches for its members)
(2) conference, group discussion -- (a discussion among participants who have an agreed (serious) topic)
Now, the 1st and 3rd synsets have the appropriate sense for my ontology concept. How can I choose only these two from WordNet?
The technology you're looking for is in the direction of semantic disambiguation / representation.
The most "traditional approach" is Word Sense Disambiguation (WSD), take a look at
https://en.wikipedia.org/wiki/Word-sense_disambiguation
https://stackoverflow.com/questions/tagged/word-sense-disambiguation
Anyone know of some good Word Sense Disambiguation software?
Then comes the next generation of Word Sense induction / Topic modelling / Knowledge representation:
https://en.wikipedia.org/wiki/Word-sense_induction
https://en.wikipedia.org/wiki/Topic_model
https://en.wikipedia.org/wiki/Knowledge_representation_and_reasoning
Then comes the most recent hype:
Word embeddings, vector space models, neural nets
Sometimes people skip the semantic representation and go directly to text similarity, comparing pairs of sentences for their differences/similarities before getting to the ultimate aim of the text processing.
Take a look at Normalize ranking score with weights for a list of STS-related work.
In the other direction, there's
ontology creation (Cyc, Yago, Freebase, etc.)
semantic web (https://en.wikipedia.org/wiki/Semantic_Web)
semantic lexical resources (WordNet, Open Multilingual WordNet, etc.)
Knowledge base population (http://www.nist.gov/tac/2014/KBP/)
There are also recent tasks on ontology induction / expansion:
http://alt.qcri.org/semeval2015/task17/
http://alt.qcri.org/semeval2016/task13/
http://alt.qcri.org/semeval2016/task14/
Depending on the ultimate task, any of the above technologies might help.
You can also try Babelfy, which provides Word Sense Disambiguation and Named Entity Disambiguation.
Demo:
http://babelfy.org/
API:
http://babelfy.org/guide
Take a look at this list: 100 Best GitHub: Word-sense Disambiguation
and search it for WordNet; there are several appropriate libraries.
I haven't used any of them, but this one seems promising, because it is based on a classic yet effective idea (namely, the Lesk algorithm) upgraded with modern word-embedding methods. Actually, before finding it, I was going to suggest trying almost the same ideas.
Note also that all these methods try to find the meaning (the WordNet synset, in your case) that is most similar to the context of the current word/collocation, so it is crucial to have context for the words you're trying to disambiguate. For example, the words can come from some text, and most libraries rely on that.
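To make the role of context concrete, here is a minimal sketch of the simplified Lesk idea: choose the sense whose gloss shares the most words with the context. The glosses are the three WordNet glosses of "conference" quoted in the question; the stop-word list, the sense labels and the function names are my own assumptions, and real libraries add stemming, related glosses or embeddings on top of this.

```python
import re

# The three WordNet glosses for the noun "conference", as quoted in the question.
SENSES = {
    "conference#1": "a prearranged meeting for consultation or exchange of "
                    "information or discussion (especially one with a formal agenda)",
    "conference#2": "an association of sports teams that organizes matches "
                    "for its members",
    "conference#3": "a discussion among participants who have an agreed "
                    "(serious) topic",
}

# A small, hand-picked stop-word list (an assumption; use a fuller one in practice).
STOP_WORDS = {"a", "an", "the", "of", "for", "or", "with", "who", "that",
              "its", "one", "have", "among", "in", "to", "and", "is", "on"}


def content_words(text):
    """Lower-case the text and return its set of non-stop-word tokens."""
    return set(re.findall(r"[a-z]+", text.lower())) - STOP_WORDS


def simplified_lesk(context, senses):
    """Return (best_sense, overlap_size): the sense whose gloss overlaps most."""
    context_words = content_words(context)
    scores = {name: len(context_words & content_words(gloss))
              for name, gloss in senses.items()}
    best = max(scores, key=scores.get)  # ties go to the sense listed first
    return best, scores[best]


context = "Our sports teams play their matches in the eastern conference"
print(simplified_lesk(context, SENSES))  # ('conference#2', 3)
```

For an ontology concept, the "context" could be built from the concept's label, its description and the labels of neighbouring concepts; without any such context, this kind of scoring has nothing to compare the glosses against.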
In WordNet, words are classified into separate noun, adjective, adverb and verb files. How can we get the domain of some words, or the words in a particular domain, using WordNet?
For example, suppose I have some words like (bark, dog, cat), and all these terms are related to animals. How can we find this out through WordNet? Is there any mechanism for this?
You cannot relate verbs like "bark" to the "animal" cluster directly based on WordNet. You can, however, relate dog, cat, etc. as being different kinds of animals by searching the hypernyms of these terms. WordNet has a tree-structure where any word is-a member of a category. Traveling up this category-tree from any word will eventually lead you to the root of this tree called entity.
Therefore, you can use the notion of the lowest common ancestor (LCA) of two words in this category-tree. If the LCA of two words is animal or a hyponym of animal, then both are related. So, if you start with some prior knowledge (say, "dog is an animal"), then you can add other animals to this cluster by following this algorithm.
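A minimal sketch of that lowest-common-ancestor test, using a small hand-written is-a tree in place of WordNet; the PARENT table and the function names are illustrative assumptions. In NLTK, the corresponding operation on synsets is the lowest_common_hypernyms method.

```python
# Toy is-a tree: child -> parent (hypothetical data standing in for WordNet).
PARENT = {
    "dog": "canine",
    "canine": "carnivore",
    "cat": "feline",
    "feline": "carnivore",
    "carnivore": "animal",
    "animal": "organism",
    "oak": "tree",
    "tree": "plant",
    "plant": "organism",
    "organism": "entity",
}


def path_to_root(word):
    """Return [word, parent, grandparent, ..., root]."""
    path = [word]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def lowest_common_ancestor(a, b):
    """Return the deepest node that lies on both paths to the root."""
    ancestors_of_a = set(path_to_root(a))
    for node in path_to_root(b):  # walks upward, so the first hit is the lowest
        if node in ancestors_of_a:
            return node
    return None


def related_via(a, b, category):
    """True if the LCA of a and b is `category` or one of its hyponyms."""
    lca = lowest_common_ancestor(a, b)
    return lca is not None and category in path_to_root(lca)


print(lowest_common_ancestor("dog", "cat"))  # carnivore
print(related_via("dog", "cat", "animal"))   # True
print(related_via("dog", "oak", "animal"))   # False
```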
To also include terms like "bark", "moo", etc., you will need to employ more complex distance measures. These are metrics that look into different types of tree-based relationships (e.g. the path score or the Wu-Palmer score) or the extent of overlap between the dictionary definitions of the words (e.g. LESK).
For example, the LESK score between "dog" and "bark" is 158, while between "dog" and "catapult" is 39. A high score thus indicates that the words belong to the same (or similar) category.
A good software package (in Java) where such distance measures are provided is the WS4J package. They have an online demo here.
1. Which of the following types of written design documents do we normally use on DDD projects?
a. Requirements specifications document
b. Document explaining the meaning of core elements
c. Document giving the bird's eye view of an application structure
d. Document explaining the meaning behind the terms used by Ubiquitous language
e. Document listing the vocabulary of Ubiquitous language
f. Informal UML diagrams
anything else?
2. Which document types should be created as standalone documents, and which should be combined within a single document (for example, a document containing diagrams surrounded by text)?
3. And what are requirements specifications? A list of use cases, a list of tasks the program is able to perform, or a combination of both?
thanks
Consider the following:
A statement of the purpose of your application in 25 words or less
A representation of your model in both code and UML
A list of features corresponding to the current or desired model
A list of constraints (business rules) on the model
Where applicable, a sequence diagram for each feature
A statement of non-functional requirements
An architectural overview for team members (including model boundaries and contexts)
Team instructions and procedures
Note: use cases or user stories can inform your list of features. However, I recommend that a feature be the unit of work.
I recommend that the initial model be created (discovered) in a modeling workshop attended by both domain experts (business) and developers. It must be led by someone proficient in domain modeling.
Business rules are constraints on the model of two types: Property and Collaboration. By way of example, business rules prevent an elevator from moving with the doors open, a perishable item being placed in a non-refrigerated bin, or a cancelled purchase being shipped.
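As a rough illustration of a business rule expressed as a constraint in the model code, here is a minimal Python sketch of the elevator example. The Elevator class, its methods and the BusinessRuleViolation exception are hypothetical names chosen for illustration, not part of any framework.

```python
class BusinessRuleViolation(Exception):
    """Raised when an operation would break a constraint on the model."""


class Elevator:
    def __init__(self):
        self.doors_open = True
        self.floor = 0

    def close_doors(self):
        self.doors_open = False

    def open_doors(self):
        self.doors_open = True

    def move_to(self, floor):
        # Business rule: the elevator must not move with the doors open.
        if self.doors_open:
            raise BusinessRuleViolation("cannot move while the doors are open")
        self.floor = floor


elevator = Elevator()
try:
    elevator.move_to(3)
except BusinessRuleViolation as error:
    print("rejected:", error)

elevator.close_doors()
elevator.move_to(3)
print(elevator.floor)  # 3
```

Keeping the check inside the model object means every caller is subject to the rule, which is why a written list of constraints maps naturally onto code like this.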
I think Event Storming might be a good solution. A photo of the workshop should be enough. If not, you can transfer the same artifacts into a digital document.