Is there any open-source tool for converting an XML schema to a database schema on Linux?

Is there any open-source tool for Linux that converts an XML schema to a database schema? All I need is something that reads an XML schema, generates the corresponding database schema, and creates the tables. I tried Google and all I could find is xsd2db, but it is written in C# and so of no use to me. I am using CentOS and my database is PostgreSQL. Any help is appreciated. Thanks in advance.

Native support appears to be on the way, but I can't find anything native yet, nor any decent third-party tools to do the job.
So, I thought this would be a neat weekend project to learn a bit more about XSD. I created xsd2pgsql to handle this. It's still pretty rough around the edges, so I'd like you to try it out and let me know of any problems you have. Or fork it if you'd like to help.
XML isn't the greatest format to map onto a database, since XML is hierarchical while a relational database is essentially flat tables. So this script makes some assumptions: the element children of the root become the primary table, and any complexType after that becomes its own table. That said, this should work on most XML Schemas (or at least the few I've tested).
You can get all the options with the -h flag. Basically, you provide the XSD file(s) as arguments, and the options let you tweak the behaviour slightly or run the SQL directly against your DB. On a production system I'd recommend not connecting directly to the DB; instead, check whether the SQL output is good to go and make any adjustments first.
Here's an example usage with the sample files in the repository: python xsd2pgsql.py -f sample-2.xsd sample.xsd
NOTE: Currently this doesn't handle any relations/references between tables/XML complex types. You'll have to add those and any indexes you want after the fact. Custom namespaces aren't yet supported either.
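To make the mapping concrete, here is a minimal, self-contained sketch (not the actual xsd2pgsql code) of the same idea: walk an XSD with Python's standard library and emit one CREATE TABLE per complexType. The "xs:" type prefix, the type map, and the fallback table name are assumptions; xsd2pgsql itself handles more cases, including building the primary table from the root's element children.

# Illustration only -- not the real xsd2pgsql. Walks an XSD and prints one
# CREATE TABLE statement per xs:complexType, mapping a few built-in XSD types
# to PostgreSQL column types. Assumes the schema uses the "xs:" prefix.
import sys
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"
TYPE_MAP = {                      # very rough XSD -> PostgreSQL type mapping
    "xs:string": "TEXT",
    "xs:integer": "INTEGER",
    "xs:int": "INTEGER",
    "xs:decimal": "NUMERIC",
    "xs:boolean": "BOOLEAN",
    "xs:date": "DATE",
    "xs:dateTime": "TIMESTAMP",
}

def tables_from_xsd(path):
    root = ET.parse(path).getroot()
    for ctype in root.iter(XS + "complexType"):
        table = ctype.get("name") or "root_table"   # anonymous-type fallback
        cols = ["id SERIAL PRIMARY KEY"]
        for elem in ctype.iter(XS + "element"):
            cols.append('"%s" %s' % (elem.get("name"),
                                     TYPE_MAP.get(elem.get("type", ""), "TEXT")))
        yield 'CREATE TABLE "%s" (\n    %s\n);' % (table, ",\n    ".join(cols))

if __name__ == "__main__":
    for xsd in sys.argv[1:]:
        for ddl in tables_from_xsd(xsd):
            print(ddl)

Running something like this over a schema and reviewing the DDL before feeding it to psql mirrors the "check the SQL output first" workflow recommended above.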
Hope this helps.

Related

Bulk load XML files into Cassandra

I'm looking into using Cassandra to store 50M+ documents that I currently have in XML format. I've been hunting around but I can't seem to find anything I can really follow on how to bulk load this data into Cassandra without needing to write some Java (not high on my list of language skills!).
I can happily write a script to convert this data into any format if it would make the loading easier, although CSV might be tricky given that the body of the document could contain just about anything!
Any suggestions welcome.
Thanks
Si
If you're willing to convert the XML to a delimited format of some kind (e.g. CSV), then here are a couple of options:
The COPY command in cqlsh. This actually got a big performance boost in a recent version of Cassandra.
The cassandra-loader utility. This is a lot more flexible and has a bunch of different options you can tweak depending on the file format.
If you're willing to write code other than Java (for example, Python), there are Cassandra drivers available for a bunch of programming languages. No need to learn Java if you've got another language you're better with.
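If the CSV route sounds workable, here is a rough Python sketch of the conversion plus the cqlsh COPY step. The element name doc_id, the file layout, and the keyspace/table names are placeholders; adjust them to match your documents and table definition.

# Rough sketch: flatten XML documents into a CSV that cqlsh's COPY can load.
# The element name "doc_id" and the directory layout are assumptions.
import csv
import glob
import xml.etree.ElementTree as ET

with open("documents.csv", "w", newline="") as out:
    writer = csv.writer(out, quoting=csv.QUOTE_ALL)   # quote every field so
    writer.writerow(["id", "body"])                    # commas/newlines inside
    for path in glob.glob("xml/*.xml"):                # the body stay put
        root = ET.parse(path).getroot()
        doc_id = root.findtext("doc_id")
        body = ET.tostring(root, encoding="unicode")   # or just the text you need
        writer.writerow([doc_id, body])

# Then, in cqlsh (assuming the keyspace and table already exist):
#   COPY mykeyspace.documents (id, body) FROM 'documents.csv' WITH HEADER = TRUE;

Quoting every field is what makes arbitrary document bodies safe to carry through CSV.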

Is anybody aware of an XML to SWI-Prolog binding tool, similar to JAXB for Java or XSDE for C++?

Is it possible to create Prolog-specific bindings of XML schema files? If so, can anybody point me in the right direction?
I have a schema which I use in Java to create a JAXB binding to serialize and de-serialize XML files. These files are created by an SWI-Prolog application which is still evolving and is developed by a geographically distant team. I want to make sure that when the XML format changes, the change is tied to a change in the schema file, rather than the team treating the schema as a nice documentation tool that often lags behind their actual XML content.
Any suggestions are appreciated.
Sounds like the general consensus is to use Prolog's JPL package to wrap a JAXB binding in Java. This is the closest one seems to get to an actual binding implementation.
There is an SGML package which many people use, but it is not a binding. As far as I understand, the best one can hope for is to validate one's XML against the schema, and a quick look at the SGML documentation did not give me the impression that even this functionality is fully implemented.
So, if one needs a schema-binding-style XML management framework, the recommendation is to access a JAXB binding of the schema via JPL. Some folks expressed stability concerns about JPL: they reported frequent JVM crashes when using it, but I have not independently verified these claims and thus cannot confirm them. If anybody has more input on this topic, I would love to hear it.
Thanks

Custom log processing/parsing

I have a log in the following format:
[26830431.7966868][4][0.013590574264526367][30398][api][1374829886.320353][init]
GET /foo
{"controller"=>"foo", "action"=>"index"}
[26830431.7966868][666][2.1876697540283203][30398][api][1374829888.4944339][request_end]
200 OK
Each entry is constructed using the following pattern:
[request_id][user_id][time_from_request_started][process_id][app][timestamp][tagline]
payload
During a request there are many points where I log something; the app basically has complex behaviour, and this helps me a lot when debugging user behaviour.
I would like to parse the log into a directory structure like this:
req_id
|
|----[time_from_request_started][process_id][timestamp][tagline]
|
etc
Basically, each directory will be named after a req_id and will contain files whose names are the rest of the tag line. These files will contain the payload.
There will also be another directory, keyed by user id, containing symlinks to the requests made by that user.
First question: Is this structure sensible? In my opinion it will make log access easy and fast. The reason I want to use directories and files is that I like the Unix approach and want to try it out (and experience its drawbacks and advantages for myself).
Second question: I would have no problem using Ruby to build this, but I would like to learn a new tool that is better suited to the job. I am thinking about using plain Unix tools (pipes, awk, etc.), or writing the parser in Go, which I am learning right now (I even have time to implement a simple map-reduce). Which tool is best suited for this?
I would not store logs in a directory to see how the users behave.
Depending on what behaviour you want to keep track of you could use different tools. One of these could be mixpanel or keen.io.
Instead of logging what the user did to a log file, you would send an event to either of those (they are pretty similar; pick the one you think has the better docs/libraries), and then graph those events to better understand the behaviour of your users. I've done this a lot recently; to display the data in a nice way I've used Rickshaw.
The key point of this suggestion is that if you go the file route you will still have to find a way to make sense of your data, and graphs help a lot with that. Also, visualization is something keen.io does by default; you may still want to build your own graphs, but it's a good start.
Hope this helped.
Is this structure correct?
Only you can know that; it depends directly on how the data needs to be accessed and used.
What tool is best suited for this?
You could probably use Unix tools to achieve this, but it would also be a good exercise to practice your Go skills by writing it yourself. A program would also be more extensible.
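For what it's worth, here is a rough sketch of the parsing logic (in Python only to keep it short; the same structure carries over to Go or to Unix tools). The input path and output directory are placeholders, and the seven-bracketed-field header comes from the format described in the question.

# Rough sketch: split the log into per-request directories with one file per
# entry, plus per-user symlinks back to those request directories.
# "app.log" and "out" are placeholder paths.
import os
import re

HEADER = re.compile(r"^" + r"\[([^\]]*)\]" * 7 + r"$")   # seven [field] groups
LOG_FILE, OUT_DIR = "app.log", "out"

def flush(fields, payload):
    req_id, user_id, t_req, pid, app, ts, tag = fields
    req_dir = os.path.join(OUT_DIR, "requests", req_id)
    os.makedirs(req_dir, exist_ok=True)
    name = "[%s][%s][%s][%s]" % (t_req, pid, ts, tag)
    with open(os.path.join(req_dir, name), "w") as f:
        f.write("".join(payload))
    user_dir = os.path.join(OUT_DIR, "users", user_id)
    os.makedirs(user_dir, exist_ok=True)
    link = os.path.join(user_dir, req_id)
    if not os.path.islink(link):
        os.symlink(os.path.relpath(req_dir, user_dir), link)

fields, payload = None, []
with open(LOG_FILE) as log:
    for line in log:
        m = HEADER.match(line.strip())
        if m:                        # a new entry starts: flush the previous one
            if fields:
                flush(fields, payload)
            fields, payload = m.groups(), []
        elif fields:
            payload.append(line)     # everything until the next header is payload
if fields:
    flush(fields, payload)

In Go the same shape works: a compiled regexp for the header, os.MkdirAll for the directories, and os.Symlink for the per-user links.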

Using XText to create a DSL for describing proprietary XML-formats

At the moment, I have to work with XACML. As there doesn't seem to be an editor that fits my needs, and as writing documents in it is a real pain, I wonder whether I could create some sort of DSL to make creating documents easier (and less error-prone). Is this possible with Xtext? I have a feeling it's possible but quite hard to do (especially for someone who doesn't know Xtext ;-)).
Getting rid of manually edited XML files is a typical use case for Xtext. The tedious part is the syntax definition itself. As soon as you have an idea of how your files should look, it's usually straightforward to get a working prototype with Xtext. What sort of concerns do you have?

How to manage application resources?

We are developing a web application which is available in 3 languages.
We have key-value pairs to translate everything. At the moment we use an Excel file (key, German, French, English) for this, but that does not work well: if more than one person edits the file, there is no way to automatically merge the different versions.
Is there a good (and free) tool which can handle this job?
--- additional information ---
(This is a Struts application.) But the question is how to manage this kind of information in general, or at least in a convenient way that also supports multiple users editing the same file ("mergeable" file types).
Why not use gettext and manage separate .po files? See that blog entry.
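For illustration, a minimal sketch of what consuming gettext catalogs looks like (shown in Python here; the idea is the same on other platforms). The "messages" domain and the locale directory layout are assumptions; translators edit per-language .po files, which merge far better under version control than a spreadsheet does.

# Minimal sketch of consuming gettext catalogs. Assumed layout:
#   locale/<lang>/LC_MESSAGES/messages.mo  (compiled from per-language .po files)
import gettext

de = gettext.translation("messages", localedir="locale", languages=["de"],
                         fallback=True)   # fall back to the msgid if no catalog
_ = de.gettext
print(_("Save"))   # -> "Speichern" once the German .po file provides it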
If you can store this information in plain text then you will be able to use a version control system like subversion to help you with merging changes. Subversion is free.
The free guide (the "Red Book") to subversion gives a fairly good explanation of how this kind of merging works.
http://svnbook.red-bean.com/en/1.5/svn.basic.vsn-models.html#svn.basic.vsn-models.copy-merge
EDIT: Another thought - if you really want to stay using a spreadsheet - Google Docs supports simultaneous editing of a spreadsheet. You could import your existing spreadsheet and get your multi-user merging wishes for free with very little change to how you work.
Good Question.
There are some "Best Practice" depending on what you actually code in (java, ms-windows c#).
I solved this (but I think there must be a better way) by using a SQL db instead of excel file, and a wrote a plug for VS (VB6,........,..., emacs) that was able to insert new keys into the db without going to round trip with version control. The keys are the developers name of what they think is a best guess for a label. (key => save, sv => "spara", no => "", en => "save").
This db can then be generated as a module, class, obj, txt, to appropriate code(platform)
and can be accessed, depending on the ide, so in c#, bt,label = corelang.save;
Someone else can then do all the language stuff, and then we just update the db and rerun the generation to the platform resources.
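As a rough sketch of that generation step (the table and column names here are hypothetical; the real setup targeted several formats through an IDE plugin), reading the keys out of a small SQL database and emitting one Java .properties file per language could look like this:

# Rough sketch: generate one .properties file per language from a translations
# table. The table and column names (translations: msg_key, lang, translation)
# are hypothetical. Pre-Java 9 property files expect ISO-8859-1 with escapes;
# adjust the encoding/escaping for your platform.
import sqlite3
from collections import defaultdict

conn = sqlite3.connect("translations.db")
per_lang = defaultdict(dict)
query = "SELECT msg_key, lang, translation FROM translations"
for msg_key, lang, translation in conn.execute(query):
    per_lang[lang][msg_key] = translation

for lang, entries in per_lang.items():
    with open("messages_%s.properties" % lang, "w", encoding="utf-8") as f:
        for msg_key in sorted(entries):
            f.write("%s=%s\n" % (msg_key, entries[msg_key]))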
After years of seeing localization done, including at large companies like Sony, I can only say the "standard" is Excel :)
There are tons of good ideas around, and probably many better ways to do it, but in real life Excel seems to be the most cost-effective solution that doesn't require training or building complex new tools to get the job done.
I found out that IntelliJ IDEA (at least in versions 7 and 8) has an editor for application resources. But it is not free at all, and it does not scale to bigger resource files with more than 1,000 keys.
Another good choice would be Google's spreadsheets ... for those who don't know it, it is like an online Excel web application. It can handle concurrent access from multiple users. Yay! But sadly, it comes from Google, which makes it impossible for us to use in commercial projects.
So,
still searching...
cheers,
mana
