Are there RDF schemas or ontologies for software libraries? Let's say I'm a Lisp programmer familiar with Quicklisp, or a C programmer using numerical packages, or an MS programmer using DLLs. Have any software libraries like these been "semantic webbed" with RDF triples? If not, where might I start? I'm guessing there exist XML- or JSON-based data-management schemes for libraries, correct?
Not sure I fully get your question, though: are you looking for a catalogue of ontologies with their type (subject matter) given? If so, try https://lov.linkeddata.es/ for a list of vocabularies (ontologies) that you can search through. The search looks at individual terms within the vocabs, so any hits for terms you enter might indicate a vocab of interest.
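For the "where might I start" part: one existing RDF vocabulary aimed at exactly this is DOAP ("Description of a Project"), used to describe software projects as RDF triples. Below is a minimal sketch using Python's rdflib; the subject URI and the library details are made up for illustration.

```python
# Minimal sketch: describing a software library as RDF triples with the
# DOAP vocabulary (http://usefulinc.com/ns/doap#) via rdflib.
# The subject URI and library details are illustrative only.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF

DOAP = Namespace("http://usefulinc.com/ns/doap#")

g = Graph()
g.bind("doap", DOAP)

lib = URIRef("http://example.org/libraries/alexandria")  # hypothetical URI
g.add((lib, RDF.type, DOAP.Project))
g.add((lib, DOAP.name, Literal("alexandria")))
g.add((lib, DOAP.description,
       Literal("A collection of portable Common Lisp utilities.")))
g.add((lib, DOAP["programming-language"], Literal("Common Lisp")))

# Serialize as Turtle; rdflib also handles RDF/XML, N-Triples, JSON-LD, etc.
print(g.serialize(format="turtle"))
```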
I'm looking for a way to accomplish data mining tasks in Common Lisp; does anything exist that would make this possible? I found Incanter for Clojure, but I have to stick to Common Lisp for the task at hand.
These are libraries I use often and find helpful:
GSLL: GNU Scientific Library for Lisp
LLA: Lisp Linear Algebra (BLAS and LAPACK bindings)
Gabor Melis's ML libraries (SVM, SVD, statistics, etc.)
There are a lot more listed on cliki that I haven't had a chance to evaluate.
Your question is a bit vague; data mining is a huge field.
For statistics, I would also check out:
Tamas Papp's other libraries as well as LLA. In particular, cl-random, cl-slice, and cl-num-utils have useful stuff.
Mirko Vukovic has a nice implementation of data tables.
For the moment I would not worry too much about common-lisp-stat. To say that it's pre-alpha would be an understatement. However, that will change Real Soon Now, as I intend another round of development.
For data munging: Alain Picard's CSV library (or the many variants thereof, or Pascal Bourguignon's implementation).
Check out the http://www.cliki.net/database page for various database clients.
I am looking for a natural-language tool that can automatically de-identify English text. For example, every email address should be renamed or obscured. Proper names should be de-identified too, as should addresses and the like.
There is the MITRE Identification Scrubber Toolkit, but I don't know how well it works.
My questions:
Are there any other tools out there?
Does anyone have experience with the MITRE tool? How well does it work?
Thanks.
De-identification (perhaps more often referred to as anonymization) is a very active research area as its success is obviously a requirement for the use of authentic text corpora in such fields as NLP for healthcare, medicine and the like. I recommend that you look at the tools listed in the answer to this question on CrossValidated. If you follow the links further, you will find research papers describing how these tools work with further references and results evaluations.
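To make the task concrete, here is a minimal sketch of the pattern-substitution core that scrubbers build on. The regexes and placeholder tags are my own illustrative choices; note that this only catches rigidly formatted identifiers, which is exactly why tools like the MITRE toolkit use trained models for names and addresses.

```python
import re

# Minimal de-identification sketch. The patterns and placeholder tags
# below are illustrative; real scrubbers use trained models for names,
# addresses, and other free-form identifiers.
PATTERNS = [
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"), "[EMAIL]"),
    (re.compile(r"\b(?:\d{3}[-.\s]?){2}\d{4}\b"), "[PHONE]"),  # US-style
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN]"),
]

def scrub(text: str) -> str:
    """Replace each matched identifier with a category placeholder."""
    for pattern, placeholder in PATTERNS:
        text = pattern.sub(placeholder, text)
    return text

print(scrub("Reach John Doe at john.doe@example.com or 555-123-4567."))
# -> "Reach John Doe at [EMAIL] or [PHONE]."
# Note that "John Doe" slips through: names need statistical models.
```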
I want to know what the strategies are for creating a source-to-source translator, i.e., for translating one high-level language into another. The two ways that come to my mind are:
1- Transforming the syntax tree of one language into the syntax tree of the other
2- Converting to an intermediate language and then converting that into the other high-level language
My question is: is it possible to do the conversion using both strategies, and which is more feasible? Can anyone give some references to theory, or to an implementation of a converter using either of the above methods? Also, is there any standard XML-based intermediate language? I know that XMLVM uses XML as an intermediate language, but it does not provide a proper specification of that language.
Any compiler is, roughly, a source-to-source translator. The target language can be an assembly language (or directly a binary machine-code language), or C, or whatever high-level language you fancy. So the general theory of compilers is applicable.
And just as a word of advice: one intermediate language is normally not nearly enough. Use more. Use dozens of intermediate languages, each differing from the previous one in just one tiny aspect. That way, each individual translation step becomes trivial.
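To illustrate the micro-pass idea, here is a hedged sketch using Python's own ast module (nothing language-specific about the idea itself; ast.unparse needs Python 3.9+). Each pass rewrites exactly one construct, so no single step is ever complicated.

```python
import ast

class DesugarAugAssign(ast.NodeTransformer):
    """Pass 1: rewrite `x += e` into `x = x + e` (simple names only)."""
    def visit_AugAssign(self, node):
        if not isinstance(node.target, ast.Name):
            return node  # this tiny pass deliberately handles one case
        return ast.Assign(
            targets=[ast.Name(id=node.target.id, ctx=ast.Store())],
            value=ast.BinOp(left=ast.Name(id=node.target.id, ctx=ast.Load()),
                            op=node.op, right=node.value))

class FoldAddConstants(ast.NodeTransformer):
    """Pass 2: fold `<literal> + <literal>` into a single literal."""
    def visit_BinOp(self, node):
        self.generic_visit(node)  # fold innermost expressions first
        if (isinstance(node.op, ast.Add)
                and isinstance(node.left, ast.Constant)
                and isinstance(node.right, ast.Constant)):
            return ast.Constant(value=node.left.value + node.right.value)
        return node

tree = ast.parse("total += 2 + 3")
for micro_pass in (DesugarAugAssign(), FoldAddConstants()):
    tree = micro_pass.visit(tree)
    ast.fix_missing_locations(tree)
print(ast.unparse(tree))  # -> total = total + 5
```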
Another word of advice (anticipating downvotes here) - stay away from XML, especially as a representation for ASTs.
I would look at LLVM, which can do source to source. Although the output isn't pretty, it might provide some good ideas.
Converters are usually based on constructing the semantic tree of one program and then re-shaping it to the target language. As an example, take a look at a C#-to-Java converter.
The second approach is also possible, but the organization of your code may change completely after conversion. So it is better to keep the common intermediate structure (IL, syntax tree, etc.) as high-level as possible.
Try Clang! It is powerful for source-to-source translation. As of now it fully supports C, C++, Objective-C, and Objective-C++.
You may also want to look at the ROSE compiler infrastructure.
I have planned to develop a tool that converts a program written in one programming language (e.g., Java) into a common markup language (e.g., XML), and then converts that markup into another language (e.g., C#).
In simple words, it is a programming-language converter that translates a program written in one language into another language.
I think it is possible, but I don't know where to start. I want to know the possibilities for doing so, and about any existing systems.
What you are trying to do is extremely hard, but if you want to know what you are in for, I've listed the steps you need to follow below:
First the hard bit:
1- Obtain or derive an operational semantics for your source and target languages.
2- Enhance the semantics to capture your source and target memory models.
3- Unify the two enhanced semantics within a common operational model.
4- Define a mapping from your source language onto the common operational model.
5- Define a mapping from the common operational model onto your target language.
Step 4, as you pointed out in your question, is trivial.
Step 1 is difficult, as most languages do not have sufficiently formal semantics specified; but I recommend checking out http://lucacardelli.name/TheoryOfObjects.html as this is the best starting point for building a traditional OO semantics.
Step 2 is almost certainly impossible in general, but may be merely obscenely difficult if you are willing to sacrifice some efficiency.
Step 3 will depend on how clean the result of step 1 turned out, but is going to be anything from delicate and tricky to impossible.
Step 5 is not going to be trivial; it is effectively writing a compiler.
Ultimately, what you propose to do is impossible in general, due to the difficulties inherent in steps 1 and 2. However, it should be difficult but doable if you are willing to: severely restrict the source-language constructs supported; pretty much forget about handling threads correctly; and pick two languages with sufficiently similar semantics (i.e., Java and C# are OK, but C++ and anything else is not).
It depends on what languages you want to support, but in general this is a huge & difficult task unless you plan to only support a very small subset of each language.
The real problem is that each programming language has different features (with some areas that overlap and others that don't) and different ways of solving the same problems, and it's pretty tricky to detect the problem the programmer is trying to solve and convert it to a new idiom. :) And think about the differences between GUIs created in different languages...
See http://xmlvm.org/ as an example (a project aimed at converting between the source code of many different languages, with an XML middle-point). The site covers in some depth the challenges they are tackling and the compromises they make, and (if you still have any interest in this kind of project...) you can ask more specific follow-up questions.
Notice specifically what the output source code looks like -- it's not at all readable, maintainable, efficient, etc..
It is "technically easy" to produce XML for any single language: build a parser, construct an abstract syntax tree, and dump out that tree as XML. (I build tools that do this off-the-shelf for many languages.) By technically easy, I mean that the community knows how to do this (see any compiler textbook, e.g., the Aho & Ullman Dragon book). I do not mean this is a trivial exercise in terms of effort, because real languages are complicated and messy; there have been many attempts to build C++ parsers and few successes. (I have one of the successes, and it was expensive to get right.)
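As an illustration of the "technically easy" part, here is a sketch that parses Python with its own standard-library parser and dumps the AST as XML. The element names simply mirror Python's AST node classes; they follow no standard schema, which is precisely the problem described next.

```python
import ast
import xml.etree.ElementTree as ET

def ast_to_xml(node: ast.AST) -> ET.Element:
    """One XML element per AST node; leaf fields become attributes."""
    elem = ET.Element(type(node).__name__)
    for field, value in ast.iter_fields(node):
        if isinstance(value, ast.AST):
            child = ast_to_xml(value)
            child.set("field", field)
            elem.append(child)
        elif isinstance(value, list):
            for item in value:
                if isinstance(item, ast.AST):
                    child = ast_to_xml(item)
                    child.set("field", field)
                    elem.append(child)
        elif value is not None:
            elem.set(field, str(value))
    return elem

root = ast_to_xml(ast.parse("x = 1 + 2"))
ET.indent(root)  # pretty-printing; ET.indent needs Python 3.9+
print(ET.tostring(root, encoding="unicode"))
```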
What is really hard (and what I don't try to do) is to produce XML according to a single schema in which the language semantics are exposed. And without that, it will be essentially impossible to write a translator from generic XML to an arbitrary target language. This is known as the UNCOL problem, and people have been looking for the answer since 1958. I note that the Wikipedia article seems to indicate the problem is solved, but you can't find many references to UNCOL in the literature since 1961.
The closest attempt I've seen to this is the OMG's "ASTM" model (http://www.omg.org/spec/ASTM/1.0/Beta1/); it exports XMI, which is XML. But the ASTM model has lots of escapes built into it to allow languages that it doesn't model perfectly (AFAIK, that means every language) to extend the XMI in arbitrary ways so that language-specific information can be encoded. Consequently, each language parser produces a custom version of the XMI, and thus each reader has to pretty much know about the extensions, and full generality vanishes.
I am looking for some resources pertaining to the parsing and understanding of English (or just human language in general). While this is obviously a fairly complicated and wide field of study, I was wondering if anyone had any book or internet recommendations for study of the subject. I am aware of the basics, such as searching for copulas to draw word relationships, but anything you guys recommend I will be sure to thoroughly read.
Thanks.
Check out WordNet.
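If you end up experimenting from Python at any point, WordNet ships as one of NLTK's data packages. A minimal sketch, assuming nltk is installed and the wordnet corpus has been downloaded:

```python
import nltk
nltk.download("wordnet")  # one-time corpus download
from nltk.corpus import wordnet

# Look up the senses ("synsets") of a word and their glosses.
for synset in wordnet.synsets("bank")[:3]:
    print(synset.name(), "-", synset.definition())

# Hypernyms give is-a relationships, useful for word-relation work.
print(wordnet.synsets("dog")[0].hypernyms())
```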
You probably want a book like "Representation and Inference for Natural Language - A First Course in Computational Semantics"
http://homepages.inf.ed.ac.uk/jbos/comsem/book1.html
Another way is to look at existing tools that already do the job, built on the basis of research papers: http://nlp.stanford.edu/index.shtml
I've used this tool once, and it's very nice. There's even an online version that lets you parse English and draws dependency trees and so on.
So you can start taking a look at their papers or the code itself.
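The Stanford tools are Java programs; if you would rather poke at the basics from Python first, NLTK is a common entry point. A minimal sketch, assuming nltk is installed and the named data packages are downloaded (the chunk grammar is an illustrative toy, not a serious English grammar):

```python
import nltk

# One-time data downloads (tokenizer models and POS tagger).
nltk.download("punkt")
nltk.download("averaged_perceptron_tagger")

sentence = "The quick brown fox jumps over the lazy dog."
tokens = nltk.word_tokenize(sentence)
tagged = nltk.pos_tag(tokens)  # e.g. ('fox', 'NN'), ('jumps', 'VBZ')
print(tagged)

# A toy chunk grammar: optional determiner, any adjectives, then nouns.
grammar = "NP: {<DT>?<JJ>*<NN.*>+}"
chunker = nltk.RegexpParser(grammar)
print(chunker.parse(tagged))  # prints a shallow parse tree
```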
Anyway, take into consideration that in any field, what you get from such generic tools is almost always not what you want, in the sense that the semantics attributed by such tools are not what you would expect. In most cases, given a specific constrained domain, it's preferable to roll your own parser and do your best to avoid any ambiguities beforehand.
The process that you describe is called natural language understanding. There are various algorithms and software tools that have been developed for this purpose.