How to compare 2 xsd schema files for equivalent functionality - xsd

I would like to compare 2 XSD schemas A and B to determine that all instance documents valid to schema A would also be valid to schema B. I hope to use this to prove that even though schema A and B are "different" they are effectively the same. Examples of differences this would not trigger would be Schema A uses types and Schema B declares all of it's elements inline.
I have found lots of people talking about "smart" diff type tools but these would claim the two files are different because they have different text but the resulting structure is the same. I found some references to XSOM but I'm not sure if that will help or not.
Any thoughts on how to proceed?

Membrane SOA Model - Java API for WSDL and XML Schema
package sample.schema;
import java.util.List;
import com.predic8.schema.Schema;
import com.predic8.schema.SchemaParser;
import com.predic8.schema.diff.SchemaDiffGenerator;
import com.predic8.soamodel.Difference;
public class CompareSchema {
public static void main(String[] args) {
compare();
}
private static void compare(){
SchemaParser parser = new SchemaParser();
Schema schema1 = parser.parse("resources/diff/1/common.xsd");
Schema schema2 = parser.parse("resources/diff/2/common.xsd");
SchemaDiffGenerator diffGen = new SchemaDiffGenerator(schema1, schema2);
List<Difference> lst = diffGen.compare();
for (Difference diff : lst) {
dumpDiff(diff, "");
}
}
private static void dumpDiff(Difference diff, String level) {
System.out.println(level + diff.getDescription());
for (Difference localDiff : diff.getDiffs()){
dumpDiff(localDiff, level + " ");
}
}
}
After executing you get the output shown in listing 2. It is a List of
differences between the two Schema documents.
ComplexType PersonType has changed: Sequence has changed:
Element id has changed:
The type of element id has changed from xsd:string to tns:IdentifierType.
http://www.service-repository.com/ offers an online XML Schema Version Comparator tool that displays a report of the differences between two XSD that appears to be produced from the Membrane SOA Model.

My approach to this was to canonicalize the representation of the XML Schema.
Unfortunately, I can also tell you that, unlike canonicalization of XML documents (used, as an example, to calculate a digital signature), it is not that simple or even standardized.
So basically, you have to transform both XML Schemas to a "canonical form" - whatever the tool you build or use thinks that form is, and then do the compare.
My approach was to create an XML Schema set (could be more than one file if you have more namespaces) for each root element I needed, since I found it easier to compare XSDs authored using the Russian Doll style, starting from the PSVI model.
I then used options such as auto matching substitution group members coupled with replacement of substitution groups with a choice; removal of "superfluous" XML Schema sequences, collapsing of single option choices or moving minOccurs/maxOccurs around for single item compositors, etc.
Depending on what your XSD-aware comparison tool's features are, or you settle to build, you might also have to rearrange particles under compositors such as xsd:choice or xsd:all; etc.
Anyway, what I learned after all of it was that it is extremely hard to build a tool that would work nice for all "cool" XSD features out there... One test case I remember fondly was to deal with various xsd:any content.
I do wonder though if things have changed since...

Related

JOOQ generator: use existing enums

I am using nu.studer.jooq gradle plugin to generate pojos, tables and records for a PostgreSQL database with tables that have fields of type ENUM.
We already have the enums in the application, so I would like that the generator uses those enums instead of generating new ones.
I defined in build.gradle for the generator: udts = false, so it doesn't generate the enums, and I wrote a custom generator strategy that sets the package for the enums to be the one of the already existing enums.
I have an issue in the generated table fields, the SQLDataType.VARCHAR.asEnumDataType(mypackage.ExistingEnum) doesn't work because the mypackage.ExistingEnum does not implement org.jooq.EnumType.
public enum ExistingEnum {
VAL1, VAL2
}
Generated table record:
public class EntryTable extends TableImpl<EntryRecord> {
public final TableField<EntryRecord, ExistingEnum> MY_FIELD = createField(DSL.name("my_field"), SQLDataType.VARCHAR.asEnumDataType(mypackage.ExistingEnum.class), this, "");
}
Is there something I can do to fix this issue? Also we have a lot of enums, so writing a converter for each of them is not suitable.
The point of having custom enum types is that they are individual types, independent of whatever you encode with your database enum types. As such, the jOOQ code generator cannot make any automated assumptions related to how to map the generated types to the custom types. You'll have to implement Converter types of some sort.
If you're not relying on the jOOQ provided EnumType types, you could use the <enumConverter/> configuration, or write implementations based on org.jooq.impl.EnumConverter, which help reduce boilerplate code.
If you have some conventions or rules how to map things a bit more automatically (just because jOOQ doesn't know your convention doesn't mean you don't know it either), you could implement a programmatic code generation configuration, where you query your dictionary views (e.g. PG_CATALOG.PG_ENUM) to generate ForcedType objects. You can even use jOOQ-meta for that purpose.

MPS way of attaching additional attributes to concept's properties/references

I've a set of concepts that represent types of entities
Hrrr.
Sample concepts:
Loop with children loopCount: IntegerProperty[1]
HttpRequest with children url: StringProperty[1], hostName: StringProperty[1]
Both concepts extend AbstractTestElement concept (it defines common properties like name, comment, etc).
I want Loop and HttpRequest to be generated to baseLanguage as follows:
Loop:
Loop e = new Loop();
e.setProperty(new IntegerProperty("loopCount", node.loopCount));
HttpRequest:
HttpRequest e = new HttpRequest();
e.setProperty(new StringProperty("url", node.url));
e.setProperty(new IntegerProperty("host", node.hostName));
What I want is to have some common generator template that covers this common logic for setProperty so it is not repeated for different kinds of test elements.
Well, there are properties that require specific-to-test-element treatment, however there are often cases when properties are one-to-one translated, thus
Here's the question: how can I attach metadata to the Loop/HttpRequest concept configuration?
What is MPS-idiomatic way of doing that?
1) While I could use "names of properties" as names put into the new XXXProperty, however ideally I would use HttpRequest.HOST_PROPERTY_NAME kind of references, thus "names of properties" is not sufficient.
2) I might probably invent annotations and annotate properties of my concepts, it looks like MPS itself does not use that approach.
3) (ab)using concept's behaviors to return <quotation new StringProperty("url", node.url) > looks even more awkward.
I would rather not use 2. and 3. because both approaches add generator behavior into aspects of your languages which aren't aware of the fact how things will be generated. It basically tight couples you structure with your generator.
If you go for 1, you can still use that static class approach. By creating a new rootnode in the generator which is a java class and contains all your fields. And then have generic generator template that reduces the IntegerProperty and so on ... If they have a common super concept it should be fairly easy to do. You just have to make sure that the property is generated before the containing concept. That way you can still access the role of it in the parent and use that information to generate the field access.

Can one change/influence JAXB's code generation?

I was wondering whether one can influence the "style" of the code that JAXB generates from XML schema (.xsd) fles. E.g. I would like to:
emit a comment inside newly generated classes, specifically if the class is empty, since that triggers warnings in my environment.
change all setter-methods to return the object instead of "void", so one can do call-chaining like:
X someMethod() {
return new X().setFoo(5).setBar("something");
}
instead of the tedious:
X someMethod() {
X x = new (X);
x.setFoo(5);
x.setBar("something");
return x;
}
Is there some "template" anywhere that JAXB uses and that one could tweak, to achieve such things? Or is that all hard-coded?
M.
There is no template for modifying the generated code easily.
There is, however, a number of plugins. For instance: https://java.net/projects/jaxb2-commons/pages/Fluent-api which is just what you want according to your 2nd bullet.
There are other plugins, e.g. for annotations suppressing warnings - that may help against the 1st bullet.
As an extra, I'd like to mention that not generating Java classes from an XML schema but writing them by hand (plus annotations, of course) is a plausible alternative, provided the XML schema isn't too complex. It may have other advantages besides solving #1 and #2.

Partial objects with JAXB?

I'm working to create some services with JAX-RS, and am relatively new to JAXB (actually XML in general) so please don't assume I know the pre-requisites that I probably should know! Here's the questions: I want to send and receive "partial" objects in XML. That is, imagine one has an object (Java form, obviously) with:
class Thing { int x, String y, Customer z }
I want to be able to send an XML output that contains (dynamically chosen, so I can't use XmlTransient) just x, or just z, or x and y, but not z, or any other combination that suits my client. The point, obviously, is that sometimes the client doesn't need everything, so I can save some bandwidth (particularly with lists of deep, complex objects, which this example clearly doesn't illustrate!).
Also, for input, the same bandwidth argument applies; I would like to be able to have the client send just the particular fields that should be updated in, say, a PUT operation, and ignore the rest, then have the server "merge" those new values onto existing objects and leave the un-mentioned fields unchanged.
This seems to be supported in the Jackson JSON libraries (though I'm still working on it), but I'm having trouble finding it in JAXB. Any ideas?
One thought that I was pondering is whether one can do this in some way via Maps. If I created a Map (potentially nested Maps, for nested coplex objects) of what I want to send, could JAXB send that with a plausible structure? And if it could create such a map on input, I guess I could work through it to make the updates. Not perfect, but maybe?
And yes, I know that the "documents" that will be flying around will probably fail to comply with schemas, having missing fields and all that, but I'm ok with that, provided the infrastructure can be made to work.
Oh, and I know I could do this "manually" with SAX, StAX, or DOM parsing, but I'm hoping there's a rather more automatic way, particularly since JAXB handles the whole objects so effortlessly.
Cheers,
Toby
Note: I'm the EclipseLink JAXB (MOXy) lead and a member of the JAXB (JSR-222) expert group.
EclipseLink JAXB (MOXy) offerst this support through its object graph extension. Object graphs allow you to specify a subset of properties for the purposes of marshalling an unmarshalling. They may be created at runtime programatically:
// Create the Object Graph
ObjectGraph contactInfo = JAXBHelper.getJAXBContext(jc).createObjectGraph(Customer.class);
contactInfo.addAttributeNodes("name");
Subgraph location = contactInfo.addSubgraph("billingAddress");
location.addAttributeNodes("city", "province");
Subgraph simple = contactInfo.addSubgraph("phoneNumbers");
simple.addAttributeNodes("value");
// Output XML - Based on Object Graph
marshaller.setProperty(MarshallerProperties.OBJECT_GRAPH, contactInfo);
marshaller.marshal(customer, System.out);
or statically on the class through annotations:
#XmlNamedObjectGraph(
name="contact info",
attributeNodes={
#XmlNamedAttributeNode("name"),
#XmlNamedAttributeNode(value="billingAddress", subgraph="location"),
#XmlNamedAttributeNode(value="phoneNumbers", subgraph="simple")
},
subgraphs={
#XmlNamedSubgraph(
name="location",
attributeNodes = {
#XmlNamedAttributeNode("city"),
#XmlNamedAttributeNode("province")
}
)
}
)
#XmlRootElement
#XmlAccessorType(XmlAccessType.FIELD)
public class Customer {
For More Information
http://blog.bdoughan.com/2013/03/moxys-object-graphs-partial-models-on.html
http://blog.bdoughan.com/2013/03/moxys-object-graphs-inputoutput-partial.html
http://blog.bdoughan.com/2011/05/specifying-eclipselink-moxy-as-your.html

ServiceStack: RESTful Resource Versioning

I've taken a read to the Advantages of message based web services article and am wondering if there is there a recommended style/practice to versioning Restful resources in ServiceStack? The different versions could render different responses or have different input parameters in the Request DTO.
I'm leaning toward a URL type versioning (i.e /v1/movies/{Id}), but I have seen other practices that set the version in the HTTP headers (i.e Content-Type: application/vnd.company.myapp-v2).
I'm hoping a way that works with the metadata page but not so much a requirement as I've noticed simply using folder structure/ namespacing works fine when rendering routes.
For example (this doesn't render right in the metadata page but performs properly if you know the direct route/url)
/v1/movies/{id}
/v1.1/movies/{id}
Code
namespace Samples.Movies.Operations.v1_1
{
[Route("/v1.1/Movies", "GET")]
public class Movies
{
...
}
}
namespace Samples.Movies.Operations.v1
{
[Route("/v1/Movies", "GET")]
public class Movies
{
...
}
}
and corresponding services...
public class MovieService: ServiceBase<Samples.Movies.Operations.v1.Movies>
{
protected override object Run(Samples.Movies.Operations.v1.Movies request)
{
...
}
}
public class MovieService: ServiceBase<Samples.Movies.Operations.v1_1.Movies>
{
protected override object Run(Samples.Movies.Operations.v1_1.Movies request)
{
...
}
}
Try to evolve (not re-implement) existing services
For versioning, you are going to be in for a world of hurt if you try to maintain different static types for different version endpoints. We initially started down this route but as soon as you start to support your first version the development effort to maintain multiple versions of the same service explodes as you will need to either maintain manual mapping of different types which easily leaks out into having to maintain multiple parallel implementations, each coupled to a different versions type - a massive violation of DRY. This is less of an issue for dynamic languages where the same models can easily be re-used by different versions.
Take advantage of built-in versioning in serializers
My recommendation is not to explicitly version but take advantage of the versioning capabilities inside the serialization formats.
E.g: you generally don't need to worry about versioning with JSON clients as the versioning capabilities of the JSON and JSV Serializers are much more resilient.
Enhance your existing services defensively
With XML and DataContract's you can freely add and remove fields without making a breaking change. If you add IExtensibleDataObject to your response DTO's you also have a potential to access data that's not defined on the DTO. My approach to versioning is to program defensively so not to introduce a breaking change, you can verify this is the case with Integration tests using old DTOs. Here are some tips I follow:
Never change the type of an existing property - If you need it to be a different type add another property and use the old/existing one to determine the version
Program defensively realize what properties don't exist with older clients so don't make them mandatory.
Keep a single global namespace (only relevant for XML/SOAP endpoints)
I do this by using the [assembly] attribute in the AssemblyInfo.cs of each of your DTO projects:
[assembly: ContractNamespace("http://schemas.servicestack.net/types",
ClrNamespace = "MyServiceModel.DtoTypes")]
The assembly attribute saves you from manually specifying explicit namespaces on each DTO, i.e:
namespace MyServiceModel.DtoTypes {
[DataContract(Namespace="http://schemas.servicestack.net/types")]
public class Foo { .. }
}
If you want to use a different XML namespace than the default above you need to register it with:
SetConfig(new EndpointHostConfig {
WsdlServiceNamespace = "http://schemas.my.org/types"
});
Embedding Versioning in DTOs
Most of the time, if you program defensively and evolve your services gracefully you wont need to know exactly what version a specific client is using as you can infer it from the data that is populated. But in the rare cases your services needs to tweak the behavior based on the specific version of the client, you can embed version information in your DTOs.
With the first release of your DTOs you publish, you can happily create them without any thought of versioning.
class Foo {
string Name;
}
But maybe for some reason the Form/UI was changed and you no longer wanted the Client to use the ambiguous Name variable and you also wanted to track the specific version the client was using:
class Foo {
Foo() {
Version = 1;
}
int Version;
string Name;
string DisplayName;
int Age;
}
Later it was discussed in a Team meeting, DisplayName wasn't good enough and you should split them out into different fields:
class Foo {
Foo() {
Version = 2;
}
int Version;
string Name;
string DisplayName;
string FirstName;
string LastName;
DateTime? DateOfBirth;
}
So the current state is that you have 3 different client versions out, with existing calls that look like:
v1 Release:
client.Post(new Foo { Name = "Foo Bar" });
v2 Release:
client.Post(new Foo { Name="Bar", DisplayName="Foo Bar", Age=18 });
v3 Release:
client.Post(new Foo { FirstName = "Foo", LastName = "Bar",
DateOfBirth = new DateTime(1994, 01, 01) });
You can continue to handle these different versions in the same implementation (which will be using the latest v3 version of the DTOs) e.g:
class FooService : Service {
public object Post(Foo request) {
//v1:
request.Version == 0
request.Name == "Foo"
request.DisplayName == null
request.Age = 0
request.DateOfBirth = null
//v2:
request.Version == 2
request.Name == null
request.DisplayName == "Foo Bar"
request.Age = 18
request.DateOfBirth = null
//v3:
request.Version == 3
request.Name == null
request.DisplayName == null
request.FirstName == "Foo"
request.LastName == "Bar"
request.Age = 0
request.DateOfBirth = new DateTime(1994, 01, 01)
}
}
Framing the Problem
The API is the part of your system that exposes its expression. It defines the concepts and the semantics of communicating in your domain. The problem comes when you want to change what can be expressed or how it can be expressed.
There can be differences in both the method of expression and what is being expressed. The first problem tends to be differences in tokens (first and last name instead of name). The second problem is expressing different things (the ability to rename oneself).
A long-term versioning solution will need to solve both of these challenges.
Evolving an API
Evolving a service by changing the resource types is a type of implicit versioning. It uses the construction of the object to determine behavior. Its works best when there are only minor changes to the method of expression (like the names). It does not work well for more complex changes to the method of expression or changes to the change of expressiveness. Code tends to be scatter throughout.
Specific Versioning
When changes become more complex it is important to keep the logic for each version separate. Even in mythz example, he segregated the code for each version. However, the code is still mixed together in the same methods. It is very easy for code for the different versions to start collapsing on each other and it is likely to spread out. Getting rid of support for a previous version can be difficult.
Additionally, you will need to keep your old code in sync to any changes in its dependencies. If a database changes, the code supporting the old model will also need to change.
A Better Way
The best way I've found is to tackle the expression problem directly. Each time a new version of the API is released, it will be implemented on top of the new layer. This is generally easy because changes are small.
It really shines in two ways: first all the code to handle the mapping is in one spot so it is easy to understand or remove later and second it doesn't require maintenance as new APIs are developed (the Russian doll model).
The problem is when the new API is less expressive than the old API. This is a problem that will need to be solved no matter what the solution is for keeping the old version around. It just becomes clear that there is a problem and what the solution for that problem is.
The example from mythz's example in this style is:
namespace APIv3 {
class FooService : RestServiceBase<Foo> {
public object OnPost(Foo request) {
var data = repository.getData()
request.FirstName == data.firstName
request.LastName == data.lastName
request.DateOfBirth = data.dateOfBirth
}
}
}
namespace APIv2 {
class FooService : RestServiceBase<Foo> {
public object OnPost(Foo request) {
var v3Request = APIv3.FooService.OnPost(request)
request.DisplayName == v3Request.FirstName + " " + v3Request.LastName
request.Age = (new DateTime() - v3Request.DateOfBirth).years
}
}
}
namespace APIv1 {
class FooService : RestServiceBase<Foo> {
public object OnPost(Foo request) {
var v2Request = APIv2.FooService.OnPost(request)
request.Name == v2Request.DisplayName
}
}
}
Each exposed object is clear. The same mapping code still needs to be written in both styles, but in the separated style, only the mapping relevant to a type needs to be written. There is no need to explicitly map code that doesn't apply (which is just another potential source of error). The dependency of previous APIs is static when you add future APIs or change the dependency of the API layer. For example, if the data source changes then only the most recent API (version 3) needs to change in this style. In the combined style, you would need to code the changes for each of the APIs supported.
One concern in the comments was the addition of types to the code base. This is not a problem because these types are exposed externally. Providing the types explicitly in the code base makes them easy to discover and isolate in testing. It is much better for maintainability to be clear. Another benefit is that this method does not produce additional logic, but only adds additional types.
I am also trying to come with a solution for this and was thinking of doing something like the below. (Based on a lot of Googlling and StackOverflow querying so this is built on the shoulders of many others.)
First up, I don’t want to debate if the version should be in the URI or Request Header. There are pros/cons for both approaches so I think each of us need to use what meets our requirements best.
This is about how to design/architecture the Java Message Objects and the Resource Implementation classes.
So let’s get to it.
I would approach this in two steps. Minor Changes (e.g. 1.0 to 1.1) and Major Changes (e.g 1.1 to 2.0)
Approach for minor changes
So let’s say we go by the same example classes used by #mythz
Initially we have
class Foo { string Name; }
We provide access to this resource as /V1.0/fooresource/{id}
In my use case, I use JAX-RS,
#Path("/{versionid}/fooresource")
public class FooResource {
#GET
#Path( "/{id}" )
public Foo getFoo (#PathParam("versionid") String versionid, (#PathParam("id") String fooId)
{
Foo foo = new Foo();
//setters, load data from persistence, handle business logic etc
Return foo;
}
}
Now let’s say we add 2 additional properties to Foo.
class Foo {
string Name;
string DisplayName;
int Age;
}
What I do at this point is annotate the properties with a #Version annotation
class Foo {
#Version(“V1.0")string Name;
#Version(“V1.1")string DisplayName;
#Version(“V1.1")int Age;
}
Then I have a response filter that will based on the requested version, return back to the user only the properties that match that version. Note that for convenience, if there are properties that should be returned for all versions, then you just don’t annotate it and the filter will return it irrespective of the requested version
This is sort of like a mediation layer. What I have explained is a simplistic version and it can get very complicated but hope you get the idea.
Approach for Major Version
Now this can get quite complicated when there is a lot of changes been done from one version to another. That is when we need to move to 2nd option.
Option 2 is essentially to branch off the codebase and then do the changes on that code base and host both versions on different contexts. At this point we might have to refactor the code base a bit to remove version mediation complexity introduced in Approach one (i.e. make the code cleaner) This might mainly be in the filters.
Note that this is just want I am thinking and haven’t implemented it as yet and wonder if this is a good idea.
Also I was wondering if there are good mediation engines/ESB’s that could do this type of transformation without having to use filters but haven’t seen any that is as simple as using a filter. Maybe I haven’t searched enough.
Interested in knowing thoughts of others and if this solution will address the original question.

Resources