Extracting Entity-Verb relations from open knowledge bases like Freebase and DBPedia - nlp

Is there any way that we can extract entity-verb relations from already existing online KBs like Freebase, DBPedia, Wikidata or Wordnet, I checked and only found that these sources concentrate on entities.
My aim is to derive relations like "A Person can Eat", "A Car can move", "A Man can play football".

You might want to look into resources like these:
OpenCyc
VerbNet
FrameNet
ConceptNet (somewhat of a successor to Open Mind Common Sense)

ConceptNet is one of the source, this link shows all the concepts or facts related to football. These facts can be accessed by the API provided by conceptNet. Here is the link. It provides three APIs - Lookup, search and association. Its a good source which is easy to use. You can configure it locally too.
Opencyc is also good resource. It can be set up locally too. They provide its download, onto this link. From both of these, I found ConceptNet more well maintained.
there is one more resource called Probase. it can be used as common sense knowledge base, but I have not tried to configure it. I hope it helps :)

Related

Tool for protocol or interface description

We have to develop a protocol as a interface description between different systems in different companies. The implementations will be made in different languages (not known) by the developers in each company.
However, we want to develop the protocol on a textual description base together. I will have the base copy of the current version and want to send it out to all for comments.
What is a good tool to do so?
At the moment we are using MS Word what leads to several problems:
We need a lot of time for text formatting things.
Its not possible to reference to a datatype in the methods description.
The wording differs from chapter to chapter (different authors) and is hard to align.
Perfect would be:
A tool with a glossary and auto-completion.
References to other items (methods, data types, ...) with active links.
Automatic generation of a human-readable (PDF-) document.
Do you know such a tool?
PS: I did not get Sparx Systems Enterprise Architect doing the job. Maybe also hints for this one?
This is a very big question, since there are many possible aspects you may (should) wish to document in a protocol specification. The two most important ones would be data structures and message sequences, then there's error management, authentication, timing, etc etc.
UML can certainly be used to describe these things, and Enterprise Architect is eminently able to generate versatile documentation from a model - it will definitely help with your reference issues. But first you will have to determine quite strictly how each aspect of the protocol is to be modelled, and from that you will need to construct the necessary EA configuration / adaptation.
In order to get good quality documents out of EA, I recommend generating the documents from an Add-In using the Object Model's DocumentGenerator class as this gives you more flexibility than the traditional RTF generator - for one thing, you can access Word's API in addition to EA's and thus do far more with the document than is possible using EA's API alone.
Without knowing the size or complexity of your protocol, I'd say this would require at the very least a few weeks' work for someone who is experienced in writing EA adaptations. But if the scope of your project is such that there are several companies involved, it is likely to be worth the investment.

Anybody out there having success with the NWorkspace pattern?

I've just begun to delve into my first experiments with Domain Drive Design and I'm taking advantage of the NWorkspace pattern. This pattern seems to make a lot of sense however I haven't been able to find very many examples of places this pattern has been successfully used or even publicly documented. Before I get to far into my implementation I would like to know if anybody has had success using this pattern or whether somebody could point me to any references where NWorkspace has been used in any open source project that I could learn from. Also are there better or more well known alternatives to this pattern that I should know about?
Brief background on NWorkspace
For those who may not be familiar with NWorkspace, it is a pattern introduced by Jimmy Nisson's which abstracts query and persistance responsibilities. In his book Applying Domain-Driven Design and Patterns, Jimmy Nilsson shows how NWorkspace can be used to abstract the infrastructure portions of a DDD Repository as well as provide a mechanism to perform cross Repository atomicity with regard to persistence.
It seems like he's recommending separate interfaces for read and write repositories.
I don't have experience with the pattern described, but I would recommend not having cross-repository transactions. Instead, I would suggest a few solutions popular amongst the DDD community (Eric Evans, Udi Dahan, Greg Young) that have really helped me:
Always use eager loading on your aggregate roots. Then you don't need cross-repository atomicity, and figuring out what's changed when you persist the object is a lot easier.
Use separate classes for writing (i.e. domain classes) and reading (i.e. your viewmodel). Create view model repositories that retrieve viewmodels directly from the database (instead of mapping domain objects to view model classes).
See if implementing the above 2 things simplifies your design.

DDD what all terms mean for Joe the plumber who can't afford to read books few times?

I am on a tight schedule with my project so don't have time to read books to understand it.
Just like anything else we can put it in few lines after reading books for few times. So here i need some description about each terms in DDD practices guideline so I can apply them bit at a piece to my project.
I already know terms in general but can't put it in terms with C# Project.
Below are the terms i have so far known out of reading some brief description in relation with C# project. Like What is the purpose of it in C# project.
Services
Factories
Repository
Aggregates
DomainObjects
Infrastructure
I am really confused about Infrastructure, Repository and Services
When to use Services and when to use Repository?
Please let me know if anyway i can make this question more clear
I recommend that you read through the Domain-Driven Design Quickly book from infoq, it is short, free in pdf form that you can download right away and does its' best to summarize the concepts presented in Eric Evan's Blue Bible
You didn't specify which language/framework the project you are currently working on is in, if it is a .NET project then take a look at the source code for CodeCampServer for a good example.
There is also a fairly more complicated example named Fohjin.DDD that you can look at (it has a focus on CQRS concepts that may be more than you are looking for)
Steve Bohlen has also given a presentation to an alt.net crowd on DDD, you can find the videos from links off of his blog post
I've just posted a blog post which lists these and some other resources as well.
Hopefully some of these resources will help you get started quickly.
This is my understanding and I did NOT read any DDD book, even the holy bible of it.
Services - stateless classes that usually operate on different layer objects, thus helping to decouple them; also to avoid code duplication
Factories - classes that knows how to create objects, thus decouple invoking code from knowing implementation details, making it easier to switch implementations; many factories also help to auto-resolve object dependencies (IoC containers); factories are infrastructure
Repository - interfaces (and corresponding implementations) that narrows data access to the bare minimum that clients should know about
Aggregates - classes that unifies access to several related entities via single interfaces (e.g. order and line items)
Domain Objects - classes that operate purely on domain/business logic, and do not care about persistence, presentation, or other concerns
Infrastructure - classes/layers that glue different objects or layers together; contains the actual implementation details that are not important to real application/user at all (e.g. how data is written to database, how HTTP form is mapped to view models).
Repository provides access to a very specific, usually single, kind of domain object. They emulate collection of objects, to some extent. Services usually operate on very different types of objects, usually accessed via static methods (do not have state), and can perform any operation (e.g. send email, prepare report), while repositories concentrate on CRUD methods.
DDD what all terms mean for Joe the plumber who can’t afford to read books few times?
I would say - not much. Not enough for sure.
I think you're being quite ambitious in trying to apply a new technique to a project that's under such tight deadlines that you can't take the time to study the technique in detail.
At a high level DDD is about decomposing your solution into layers and allocating responsibilities cleanly. If you attempt just to do that in your application you're likely to get some benefit. Later, when you have more time to study, you may discover that you didn't quite follow all the details of the DDD approach - I don't see that as a problem, you proabably already got some benefit of thoughtful structure even if you deviated from some of the DDD guidance.
To specifically answer your question in detail would just mean reiterating material that's already out there: Seems to me that this document nicely summarises the terms you're asking about.
They say about Services:
Some concepts from the domain aren’t
natural to model as objects. Forcing
the required domain functionality to
be the responsibility of an ENTITY or
VALUE either distorts the definition
of a model-based object or adds
meaningless artificial objects.
Therefore: When a significant process
or transformation in the domain is not
a natural responsibility of an ENTITY
or VALUE OBJECT, add an operation to
the model as a standalone interface
declared as a SERVICE.
Now the thing about this kind of wisdom is that to apply it you need to be able to spot when you are "distorting the definition". And I suspect that only with experience (or guidance from someone who is experienced) do you gain the insight to spot such things.
You must expect to experiment with ideas, get it a bit wrong sometimes, then reflect on why your decisions hurt or work. Your goal should not be to do DDD for its own sake, but to produce good software. When you find it cumbersome to implement something, or difficult to maintain something think about why this is, then examine what you did in the light of DDD advice. At that point you may say "Oh, if I had made that a Service, the Model would be so nmuch cleaner", or whatever.
You may find it helpful to read an example.:

Standard database neutral XSD to describe a relational database schema

Is anyone aware of a vendor neutral XSD to describe a relational database schema? Our system needs to grab information about the structure of a database:
Tables
Columns and types
Primary and Foreign Keys Constraints
Indexes
etc
in a vendor independent manner and store it in an XML file for later processing.
Before we do what we typically do and roll our own. I wanted to do some research and see if there was an existing XSD that people are standardizing on for what I assume is not an uncommon requirement for modeling tools and such. I did not find anything on Google that was not database vendor specific. If you know of an existing public standard I would very much appreciate a link.
Thanks in advance,
Terence
This isn't exactly what you're looking for, but the PostgresSQL Wiki has an interesting section on XML exports, that describes how they are supporting SQL and XML together. It displays a section on how a table would be exported as as XML and the XSD that would support it, which looks rather generic. It could serve as a model for you to create your own.
The Wiki talks about reference to a ISO/IEC 9075-14:2006 standard, that appears to be adopted by a few big vendors as a baseline. I quick browse on the ISO site says that :2006 was updated to 2008. I'm sure you can find a covered spec of this that you don't have to pay for to download.
The article also points to a SQL/XML standard definition that is a bit outdated, but could serve your needs if you're looking for some basics.
Interesting problem - I am not aware of any standard or tool to achieve this.
You would almost have to have some kind of a "neutralized" version with adapters for each individual database system you want to target - even just to map all the various data types (VARCHAR and NVARCHAR in SQL Server, VARCHAR2 in Oracle and so on).
You might just use the types defined in the SQL:2003 standard - but even then you'd probably still have to have some kind of a vendor-specific mapping / adaption of sorts. Not to mention some kind of support for vendor-specific implementation details (like IDENTITY columns in SQL Server vs. SEQUENCE in Oracle and others).
Very interesting question! I hope others will be able to shed more light on the issue and maybe recommend an existing tool.
If not, and you decide to roll your own - consider making it open-source on CodePlex or Google Code! I'm sure a lot of folks would be most interested!
MArc
The UML Information Management Metamodel (IMM) Specification from OMG may worth a try.

Why can't I model my domain using ERD?

I am fairly new to this discussion but I HAVE to ask this question even at the risk of sounding 'ignorant'. Why is it that we now stress so much on 'DDD'. The more I look into 'DDD' the more complex it seems to make my application. Whereas modeling my domain with the database helps keep my application consistent across layers. Then I can use DAL Helpers such as SubSonic or L2S to easily access that model. What is so bad about this? Even in enterprise applications?
Why do we strive to create a new way of modeling our domain when we have a tried and tested one?
I am willing to hear from the purists here.
You can't sell an old methodology, because too many projects failed and too many people know the old methodology anyway. There has to be a new one to market.
If you're doing fine with the old way then use what works. Do pay attention to new stuff, as some really nice ideas come along. But that doesn't mean everything old is bad and stupid. Usually you can incorporate new ideas into the old models to a large degree.
There does come a time to make a move. Like I wouldn't do OOP with structures and function pointers. ;-)
This is actually a really excellent question, and the short answer is "you can." We used to do it that way, and there was a whole area of enterprise (data) modeling. In fact, the common OOD notations evolved from ERD.
What we discovered, however, was that data-driven designs like that had some difficulties, the biggest of them being that the natural structure for a data base doesn't necessarily match well to the natural structure for code.
OOD, to a great extent, derives from the desire to make it easier to find a code structure that has a couple of desirable properties:
it should be easy to think out the design
it should be robust under changes.
The ease to think out design comes originally from Simula, which used what we now think of as "objects" for simulation specifically; it was convenient in simulation to have software entities that correspond to the things you're simulating. It was only later that Alan kay et al at Xerox saw that as a more general structuring method.
The part about robustness under changes had many parents, but one of the most important ones among them was Dave Parnas, you wrote several papers that identified a basic rule for modularization, which I call Parnas' Law: every module should keep a secret, and that secret is a requirements that is likely to change.
It turns out that by combining Parnas' Law with the Simula idea of a "object" as corresponding to something that can be identified with the real world, you tend to get system designs that are more robust under requirements changes than the old way we did things. (Not always, and sometime you have to be crafty, as with the Command pattern. Most objects are nouns, thing that have persistent existence. In the Command pattern, the ideal objects are verbs, things you do.)
However, it also turns out that that structure isn't necessarily a good way to represent the underlying data in a relational database, so we end up with the "object relational impedance mismatch" problem: how to represent the transformation from objectland to database-land.
Short answer: if all you need is a CRUD system that allows users to edit data, just build an Access front-end to your back-end database (or use a scaffolding framework like you mentioned) and call it a day. You should be able to lop off 70% of your budget vs. a domain-driven system.
Long answer: with a data-driven design, what does the implementation of the business model look like? Usually after a couple years of building on new features to your application, you'll find that it's all over the place: tables, views, stored procedures, various application services, code-behind files, presenters/ViewModels, etc. with duplication everywhere. When you're having a conversation with the domain expert about a new feature they are requesting, you are constantly trying to translate from the business language into the language around your implementation, and it just does not translate.
What typically ends up happening is that you are forced to communicate with the business in terms of the implementation of the system, and the implementation becomes the "ubiquitous language" that the business and developers are forced to use when communicating. This has a wide range of consequences. The domain experts in the business start believing that they are experts in the implementation domain, and they start demanding features in terms of implementation rather than the business need they are trying to solve.
Also, you'll find that most data-driven implementations do not follow the "conceptual contours" of the domain, and the components of the system aren't very flexible in how they can be combined together to solve the problem, because they don't map one-to-one with concepts in the business model. When code isn't cohesive, changes and new features may require modifications all over your implementation.
Domain Driven-Design provides tools for making your implementation so closely resemble the business model that it's easy for everyone to speak the language of the business. It allows you to write "executable specifications" that test your implementation, but can actually be understood by your domain experts.

Resources