JAXB Marshaller.Listener doesn't play nice with OutputStream - jaxb

I want to use Marshaller.Listener to insert XML comments, for example, inside the first element.
This works great when marshalling to an XMLStreamWriter.
However, if I marshal to an OutputStream (which I'd prefer to do for various reasons), the comment text is written into the middle of the namespace declarations on the root element!
This happens with both the Sun/Oracle and MOXy (2.5.2) implementations.
In the Sun/Oracle implementation, it happens because the marshaller writes content to a buffer called octetBuffer and only copies octetBuffer to the OutputStream periodically. See UTF8XmlOutput.
In MOXy (2.5.2), it looks like OutputStreamRecord is using a byte[] buffer in a similar fashion.
I didn't notice any documentation warning me not to touch my OutputStream from my listener, but it's clear that doing so is problematic unless those internal buffers can first be flushed.
Any ideas for workarounds?
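One possible workaround, sketched under assumptions: since marshalling to an XMLStreamWriter is reported to work, the OutputStream can be wrapped in an XMLStreamWriter and the listener can write its comments through that same writer instead of touching the raw stream. The sketch below uses only javax.xml.stream to show that ordering is preserved when everything goes through one writer. The class and method names are hypothetical, and the element calls stand in for what marshaller.marshal(obj, writer) and Listener.beforeMarshal would do in real code.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical demo class: all output goes through a single XMLStreamWriter.
class SharedWriterSketch {

    // Writes a root element, a namespace declaration and a comment inside the root.
    static void writeDocument(OutputStream out) {
        try {
            XMLStreamWriter writer =
                    XMLOutputFactory.newInstance().createXMLStreamWriter(out, "UTF-8");
            // In real code these element events would come from marshaller.marshal(obj, writer).
            writer.writeStartElement("root");
            writer.writeNamespace("a", "urn:example");
            // In real code this call would live in Marshaller.Listener.beforeMarshal(...).
            writer.writeComment(" inserted by listener ");
            writer.writeStartElement("child");
            writer.writeEndElement();
            writer.writeEndElement();
            writer.flush();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    // Convenience helper that renders the document into a String.
    static String render() {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeDocument(out);
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(render());
    }
}
```

Because the writer closes the pending start tag before emitting the comment, the comment lands after the namespace declarations rather than in the middle of them.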

Related

MapUtils with Logger

I am using MapUtils.verbosePrint(System.out, "", map) to dump the contents of a map in Java. They (management) do not like us using System.out.println().
We are using log4j. They made the logger into a variable "l" so we can say something like l.debug("This is going to the logfile in debug mode").
I would like to get the output buffer(s) from l so I could pass it into verbosePrint() instead of System.out. I looked at all the methods and members of the logger and did things like getAppenders() and tried all those elements but I could not find anything that helped.
Has anyone else done this? I know the logger may write to > 1 output.
You can use Log4j IOStreams to create PrintStreams that will send everything to a logger. This is mostly useful to log debug output from legacy APIs like JDBC or Java Mail that do not have a proper logging system. I wouldn't advise it in other cases, since your messages might be merged or split into several log messages.
I would rather use one of these approaches:
simply log the map using Logger#debug(Object). This will lazily create an ObjectMessage (only if debug is enabled), which is usually formatted using the map's toString() method. Some layouts might format it differently (like the JSON Template Layout).
eagerly create a MapMessage or StringMapMessage:
if (l.isDebugEnabled()) {
l.debug(new MapMessage(map));
}
This gives you more formatting options. For example the layout pattern %m{JSON} will format your message as JSON.
if you are set on the format provided by MapUtils#verbosePrint, you can extend ObjectMessage and override its getFormattedMessage() and formatTo() methods.
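@Override
public String getFormattedMessage() {
    // Render the wrapped map with MapUtils into an in-memory buffer.
    final ByteArrayOutputStream os = new ByteArrayOutputStream();
    final PrintStream ps = new PrintStream(os);
    // Assumption: the map was passed to the ObjectMessage constructor, so getParameter() returns it.
    MapUtils.verbosePrint(ps, "", (Map<?, ?>) getParameter());
    ps.flush();
    return new String(os.toByteArray());
}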
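The capture step used above can also be pulled out into a small helper, which may be handy if only a String is needed for l.debug(...). This is a minimal sketch using only the JDK; the class and method names are made up for illustration, and the charset is fixed to UTF-8 as an assumption.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;
import java.util.function.Consumer;

// Hypothetical helper: runs any PrintStream-based dump and returns what it printed.
class PrintStreamCapture {

    static String capture(Consumer<PrintStream> printer) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        // try-with-resources flushes and closes the PrintStream before the buffer is read.
        try (PrintStream out = new PrintStream(buffer, true, "UTF-8")) {
            printer.accept(out);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
        return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(capture(ps -> ps.print("a=1")));
    }
}
```

With that in place, something like l.debug(PrintStreamCapture.capture(ps -> MapUtils.verbosePrint(ps, "", map))) inside an isDebugEnabled() check should produce a single log message in the verbosePrint format.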

Why CXF / JAXB read whole InputStream into memory before marshalling to SOAP message

INFO - Sample code
I've set up sample code (SSCCE) for you to help track the problem:
https://github.com/ljader/test-cxf-base64-marshall
The problem
I'm integrating with 3rd party JAX-WS service, so I cannot change the WSDL.
The 3rd party webservice expects Base64 encoded bytes to perform some operation on them - they expect that client sends whole bytes in SOAP message.
They don't want to change to MTOM / XOP, so I'm stuck with current requirements.
I decided to use CXF to easily set up sample client, and it worked ok for small files.
But when I try to send BIG data, i.e. 200MB, the CXF/JAXB throws an exception:
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at com.sun.xml.bind.v2.util.ByteArrayOutputStreamEx.readFrom(ByteArrayOutputStreamEx.java:75)
at com.sun.xml.bind.v2.runtime.unmarshaller.Base64Data.get(Base64Data.java:196)
at com.sun.xml.bind.v2.runtime.unmarshaller.Base64Data.writeTo(Base64Data.java:312)
at com.sun.xml.bind.v2.runtime.output.UTF8XmlOutput.text(UTF8XmlOutput.java:312)
at com.sun.xml.bind.v2.runtime.XMLSerializer.leafElement(XMLSerializer.java:356)
at com.sun.xml.bind.v2.model.impl.RuntimeBuiltinLeafInfoImpl$PcdataImpl.writeLeafElement(RuntimeBuiltinLeafInfoImpl.java:191)
at com.sun.xml.bind.v2.runtime.MimeTypedTransducer.writeLeafElement(MimeTypedTransducer.java:96)
at com.sun.xml.bind.v2.runtime.reflect.TransducedAccessor$CompositeTransducedAccessorImpl.writeLeafElement(TransducedAccessor.java:254)
at com.sun.xml.bind.v2.runtime.property.SingleElementLeafProperty.serializeBody(SingleElementLeafProperty.java:130)
at com.sun.xml.bind.v2.runtime.ClassBeanInfoImpl.serializeBody(ClassBeanInfoImpl.java:360)
at com.sun.xml.bind.v2.runtime.XMLSerializer.childAsXsiType(XMLSerializer.java:696)
at com.sun.xml.bind.v2.runtime.ElementBeanInfoImpl$1.serializeBody(ElementBeanInfoImpl.java:155)
at com.sun.xml.bind.v2.runtime.ElementBeanInfoImpl$1.serializeBody(ElementBeanInfoImpl.java:130)
at com.sun.xml.bind.v2.runtime.ElementBeanInfoImpl.serializeBody(ElementBeanInfoImpl.java:332)
at com.sun.xml.bind.v2.runtime.ElementBeanInfoImpl.serializeRoot(ElementBeanInfoImpl.java:339)
at com.sun.xml.bind.v2.runtime.ElementBeanInfoImpl.serializeRoot(ElementBeanInfoImpl.java:75)
at com.sun.xml.bind.v2.runtime.XMLSerializer.childAsRoot(XMLSerializer.java:494)
at com.sun.xml.bind.v2.runtime.MarshallerImpl.write(MarshallerImpl.java:323)
at com.sun.xml.bind.v2.runtime.MarshallerImpl.marshal(MarshallerImpl.java:251)
at javax.xml.bind.helpers.AbstractMarshallerImpl.marshal(AbstractMarshallerImpl.java:95)
at org.apache.cxf.jaxb.JAXBEncoderDecoder.writeObject(JAXBEncoderDecoder.java:617)
at org.apache.cxf.jaxb.JAXBEncoderDecoder.marshall(JAXBEncoderDecoder.java:241)
at org.apache.cxf.jaxb.io.DataWriterImpl.write(DataWriterImpl.java:237)
at org.apache.cxf.interceptor.AbstractOutDatabindingInterceptor.writeParts(AbstractOutDatabindingInterceptor.java:117)
at org.apache.cxf.wsdl.interceptors.BareOutInterceptor.handleMessage(BareOutInterceptor.java:68)
at org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:308)
at org.apache.cxf.endpoint.ClientImpl.doInvoke(ClientImpl.java:514)
at org.apache.cxf.endpoint.ClientImpl.invoke(ClientImpl.java:423)
at org.apache.cxf.endpoint.ClientImpl.invoke(ClientImpl.java:324)
at org.apache.cxf.endpoint.ClientImpl.invoke(ClientImpl.java:277)
at org.apache.cxf.frontend.ClientProxy.invokeSync(ClientProxy.java:96)
at org.apache.cxf.jaxws.JaxWsClientProxy.invoke(JaxWsClientProxy.java:139)
My findings
I've tracked the problem down: based on the XSD type "base64Binary", com.sun.xml.bind.v2.model.impl.RuntimeBuiltinLeafInfoImpl decides that com.sun.xml.bind.v2.runtime.unmarshaller.Base64Data should handle marshalling of data from javax.activation.DataHandler.
During marshalling, the WHOLE content of the underlying InputStream is read (see http://grepcode.com/file/repo1.maven.org/maven2/com.sun.xml.bind/jaxb-impl/2.2.11/com/sun/xml/bind/v2/runtime/unmarshaller/Base64Data.java/#311), which causes the OutOfMemoryError.
Problem
CXF uses JAXB to marshal Java objects into SOAP messages. When marshalling an InputStream, the WHOLE input stream is read into memory before being converted into Base64 binary.
So I want to send ("stream") data from client to server in chunks (since the OutputStream in the marshaller is wrapped directly around the HttpURLConnection), so that my client can handle sending any amount of data.
Especially when many threads would be using my client, the streaming is IMHO very desirable.
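To illustrate the kind of chunked encoding I mean, here is a minimal sketch that uses only java.util.Base64 from the JDK. It is not something CXF or JAXB does for me, and the class and method names are made up; it just shows Base64 being produced from a small fixed-size buffer instead of from the whole payload.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical demo class: Base64-encodes a stream chunk by chunk.
class Base64StreamingSketch {

    // Copies `in` to `out` as Base64, holding at most one 8 KB chunk in memory.
    static void encode(InputStream in, OutputStream out) {
        // wrap() encodes bytes as they are written; closing the wrapper
        // writes the final padding and also closes `out`.
        try (OutputStream encoder = Base64.getEncoder().wrap(out)) {
            byte[] chunk = new byte[8192];
            int n;
            while ((n = in.read(chunk)) != -1) {
                encoder.write(chunk, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Test helper: encodes an in-memory array through the streaming path.
    static String encodeToString(byte[] data) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        encode(new ByteArrayInputStream(data), out);
        return new String(out.toByteArray(), StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        System.out.println(encodeToString("hello world".getBytes(StandardCharsets.UTF_8)));
    }
}
```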
I don't have good JAX-WS/CXF/JAXB knowledge, hence the question.
The only materials which I found that may be useful are:
Can JAXB parse large XML files in chunks
http://rezarahim.blogspot.com/2010/05/chunking-out-big-xml-with-stax-and-jaxb.html
The questions
Why does CXF/JAXB load the whole InputStream into memory? Isn't the purpose of DataHandler to prevent exactly that?
Do you know any way to change JAXB behaviour to differently marshall InputStream?
Do you know different marshallers, which can handle such big data marshalling?
As a last resort, maybe you have links to some materials, how to create custom marshaller which would stream the data directly to the server?
You don't need any custom marshallers or change JAXB behaviour to achieve what you need - DataHandler is your friend here.
Answering your first question: JAXB needs to keep all data in memory because it has to resolve references.
I know you can't change the WSDL references, etc. But you still have your client's WSDL in your project in order to generate client classes, don't you? So what you can do (I haven't tested this with a third party's WSDL, but it might be worth trying) is add xmime:expectedContentTypes="application/octet-stream" to the response XSD element which returns Base64 encoded data. For example:
<xsd:element name="generateBigDataResponse">
<xsd:complexType>
<xsd:sequence>
<xsd:element name="result"
type="xsd:base64Binary"
minOccurs="0"
maxOccurs="1"
xmime:expectedContentTypes="application/octet-stream"/>
</xsd:sequence>
</xsd:complexType>
</xsd:element>
Also do not forget to add namespace: xmlns:xmime="http://www.w3.org/2005/05/xmlmime" in the xsd:schema element.
What you are doing here is not changing any WSDL references; you are just telling JAXB to generate a DataHandler instead of a byte[]. So this is what happens when you generate your client classes like that:
@Override
public DataHandler generateBigData() {
    try {
        final PipedOutputStream pipedOutputStream = new PipedOutputStream();
        PipedInputStream pipedInputStream = new PipedInputStream(pipedOutputStream);
        InputStreamDataSource dataSource = new InputStreamDataSource(pipedInputStream, "application/octet-stream");
        executor.execute(new Runnable() {
            @Override
            public void run() {
                //write your stuff here into pipedOutputStream
            }
        });
        return new DataHandler(dataSource);
    } catch (IOException e) {
        //handle exception if any; rethrown here so the method compiles
        throw new IllegalStateException(e);
    }
}
You get DataHandler as a response type thanks to xmime. I suggest you use PipedOutputStream, but make sure to do the writing in a different thread:
A piped output stream can be connected to a piped input stream to
create a communications pipe. The piped output stream is the sending
end of the pipe. Typically, data is written to a PipedOutputStream
object by one thread and data is read from the connected
PipedInputStream by some other thread. Attempting to use both objects
from a single thread is not recommended as it may deadlock the thread.
The pipe is said to be broken if a thread that was reading data bytes
from the connected piped input stream is no longer alive.
You then connect it to a PipedInputStream, pass that instance to the constructor of InputStreamDataSource, pass the data source to a DataHandler, and return the DataHandler instance. This way your file will be written in chunks and you won't get that exception; moreover, the client will never get a timeout.
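The piped-stream part on its own can be sketched with just the JDK, leaving out DataHandler and InputStreamDataSource. The class and method names below are hypothetical, and the 8 KB pipe size is an arbitrary choice; the point is only that the producer runs in its own thread and closes its end when done.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.io.UncheckedIOException;

// Hypothetical demo class: a producer thread feeds a pipe that another thread reads.
class PipedProducerSketch {

    // Returns the reading end of a pipe; a background thread writes `chunk` `repeats` times.
    static InputStream produce(byte[] chunk, int repeats) {
        try {
            PipedOutputStream sink = new PipedOutputStream();
            PipedInputStream source = new PipedInputStream(sink, 8192);
            Thread producer = new Thread(() -> {
                // Closing the sink signals end-of-stream to the reader.
                try (PipedOutputStream out = sink) {
                    for (int i = 0; i < repeats; i++) {
                        out.write(chunk);
                    }
                } catch (IOException e) {
                    // The reader went away or the pipe broke; nothing more to write.
                }
            }, "piped-producer");
            producer.setDaemon(true);
            producer.start();
            return source;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads the stream to the end and returns the number of bytes seen.
    static long drain(InputStream in) {
        try (InputStream source = in) {
            byte[] buffer = new byte[4096];
            long total = 0;
            int n;
            while ((n = source.read(buffer)) != -1) {
                total += n;
            }
            return total;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(drain(produce(new byte[] {1, 2, 3}, 10000)));
    }
}
```

In the service method, the stream returned by produce(...) would take the place of pipedInputStream when building the InputStreamDataSource.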
Hope this helps.

Can one change/influence JAXB's code generation?

I was wondering whether one can influence the "style" of the code that JAXB generates from XML schema (.xsd) files. E.g. I would like to:
emit a comment inside newly generated classes, specifically if the class is empty, since that triggers warnings in my environment.
change all setter-methods to return the object instead of "void", so one can do call-chaining like:
X someMethod() {
return new X().setFoo(5).setBar("something");
}
instead of the tedious:
X someMethod() {
X x = new X();
x.setFoo(5);
x.setBar("something");
return x;
}
Is there some "template" anywhere that JAXB uses and that one could tweak, to achieve such things? Or is that all hard-coded?
M.
There is no template for modifying the generated code easily.
There is, however, a number of plugins. For instance: https://java.net/projects/jaxb2-commons/pages/Fluent-api which is just what you want according to your 2nd bullet.
There are other plugins, e.g. for annotations suppressing warnings - that may help against the 1st bullet.
As an extra, I'd like to mention that not generating Java classes from an XML schema but writing them by hand (plus annotations, of course) is a plausible alternative, provided the XML schema isn't too complex. It may have other advantages besides solving #1 and #2.
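For the hand-written route, a fluent class could look roughly like the sketch below. The class X and its fields come from the question; the with* method names are an assumption (as far as I recall, the fluent-api plugin also adds with* methods next to the ordinary setters rather than changing the setters themselves). JAXB annotations are left out so the sketch compiles with the JDK alone.

```java
// Hand-written bean with chainable with* methods next to the usual setters.
// In a real JAXB class, annotations such as @XmlRootElement would be added.
class X {
    private int foo;
    private String bar;

    public int getFoo() {
        return foo;
    }

    public void setFoo(int foo) {
        this.foo = foo;
    }

    public String getBar() {
        return bar;
    }

    public void setBar(String bar) {
        this.bar = bar;
    }

    // Chainable variants: delegate to the setter and return this.
    public X withFoo(int foo) {
        setFoo(foo);
        return this;
    }

    public X withBar(String bar) {
        setBar(bar);
        return this;
    }

    public static void main(String[] args) {
        X x = new X().withFoo(5).withBar("something");
        System.out.println(x.getFoo() + " " + x.getBar());
    }
}
```

Keeping the void setters untouched means tools that expect standard bean setters should keep working, while callers get the call chaining.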

Unmarshalling XML Fragments at multiple levels of a hierarchy with JAXB

I need to un-marshal multiple objects out of an XML Structure that looks like this:
<Control>
<TotalCompanies>2</TotalCompanies>
<TotalSales>100</TotalSales>
<Company>
<Name>ACME Ca</Name>
<TotalSales>70</TotalSales>
<TotalSalesPeople>2</TotalSalesPeople>
<SalesPeople>
<SalesPerson>
<Name>John</Name>
<Sales>40</Sales>
</SalesPerson>
<SalesPerson>
<Name>Joe</Name>
<Sales>30</Sales>
</SalesPerson>
</SalesPeople>
</Company>
<Company>
<Name>ACME Va</Name>
<TotalSales>30</TotalSales>
<TotalSalesPeople>1</TotalSalesPeople>
<SalesPeople>
<SalesPerson>
<Name>Janet</Name>
<Sales>30</Sales>
</SalesPerson>
</SalesPeople>
</Company>
</Control>
I need to be able to separately unmarshal a Control object that contains just the totals and not its children, and similarly I need to do the same thing at the other levels of the hierarchy. So ideally, my beans would look something like this:
class Control {
int totalCompanies;
int totalSales;
}
class Company {
String name;
int totalSales;
int totalSalesPeople;
}
class SalesPerson {
String name;
int sales;
}
I'm doing this in the context of Spring Batch, but I am pretty sure that doesn't matter. If I restructure the XML some, then I can get it to work (I am pretty sure I won't be allowed to restructure the XML, though). That is, if the objects aren't nested, then it is fine. Similarly, I can get all the SalesPeople out pretty easily.
I can also get the entire tree as an object, and that might work in some cases. However, the real incoming file could be larger than the available memory, so that won't work in practice.
Is there any way to get JAXB, or some other out-of-the-box unmarshaller to do this or do I just need to roll my own based on SAX or STAX?
EDIT:
The system is using Spring Batch to read in large incoming files. The files are not as described above (domain is different), but the structure is the same. The architectural direction is to attempt to use out-of-the-box readers (StaxEventItemReader, e.g.) and unmarshallers (Jaxb2Marshaller, e.g.).
The system will operate in environments where we cannot absolutely guarantee there is sufficient memory to hold the entire file in memory.
I have approaches (custom Stax reader/pre-processing the file/requesting an XSD change) that work, but I wanted to make sure I wasn't missing a feature in the standard reader / unmarshaller implementations that could make this work easily out of the box.
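For reference, the custom StAX approach can be sketched roughly as follows. This is only an illustration with the JDK's javax.xml.stream, not a Spring Batch reader; the class name, the record format and the hard-coded set of record elements are assumptions. It keeps only the currently open records in memory and hands each flat record to a callback as soon as its end tag is seen, so a Control record is emitted last even though its totals were read first.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Consumer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical reader: emits one flat record per Control, Company and SalesPerson element.
class FlatRecordReader {
    private static final Set<String> RECORDS =
            new HashSet<>(Arrays.asList("Control", "Company", "SalesPerson"));

    // A record element that is still open, with the leaf values collected so far.
    private static final class Frame {
        final String name;
        final Map<String, String> fields = new LinkedHashMap<>();
        Frame(String name) { this.name = name; }
    }

    static void read(Reader xml, Consumer<String> sink) {
        Deque<Frame> open = new ArrayDeque<>();
        String leaf = null;                        // simple element currently being read
        StringBuilder text = new StringBuilder();
        try {
            XMLStreamReader r = XMLInputFactory.newInstance().createXMLStreamReader(xml);
            while (r.hasNext()) {
                switch (r.next()) {
                    case XMLStreamConstants.START_ELEMENT:
                        if (RECORDS.contains(r.getLocalName())) {
                            open.push(new Frame(r.getLocalName()));
                            leaf = null;           // a wrapper such as SalesPeople is not a leaf
                        } else {
                            leaf = r.getLocalName();
                            text.setLength(0);
                        }
                        break;
                    case XMLStreamConstants.CHARACTERS:
                    case XMLStreamConstants.CDATA:
                        if (leaf != null) {
                            text.append(r.getText());
                        }
                        break;
                    case XMLStreamConstants.END_ELEMENT:
                        if (RECORDS.contains(r.getLocalName())) {
                            Frame done = open.pop();
                            sink.accept(done.name + done.fields);
                        } else if (r.getLocalName().equals(leaf) && !open.isEmpty()) {
                            open.peek().fields.put(leaf, text.toString().trim());
                            leaf = null;
                        }
                        break;
                    default:
                        break;
                }
            }
            r.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    // Convenience for small inputs and tests; real use would pass a streaming sink.
    static List<String> readAll(String xml) {
        List<String> records = new ArrayList<>();
        read(new StringReader(xml), records::add);
        return records;
    }
}
```

Each emitted string could just as well be a Control, Company or SalesPerson bean; strings are used here only to keep the sketch short.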

C# get line number with extracted comment block

I want to extract the line number with the comment line from the source code with the codes below
foreach (Match match in re.Matches(FileText))
{
StackFrame CallStack = new StackFrame(0, true);
sb.Append(match.ToString() + CallStack.GetFileLineNumber() + System.Environment.NewLine);
}
return sb.ToString();
How do I capture each comment with line number e.g. /* Test Comment */ Line: 50
There is no "docs" in the assembly. The docs are generated in form of an XML file and should be distributed along with the assembly. From the call stack, you can get the names of the classes and methods. If you know where the XML files are sitting, then you can for example refer to http://jimblackler.net/blog/?p=49 they have the reading of docs per a method mostly done.
However, this is NOT the way the .Net does such things. For having extra compile-time annotations that will survive the compilation and that will be present during the runtime, the .Net allows you to introduce CustomAttributes that can be applied over methods, classes, fields, properties, enums, (....). See that link, look at the example of "Author" attribute and consider changing the magic comment into an attribute. This is the normal way of doing it in whole .Net, not only C#.
Once you read the MethodInfo from the callstack, you can invoke GetCustomAttributes on it, and read the data that you have written in them, see http://msdn.microsoft.com/en-us/library/system.reflection.methodinfo.aspx

Resources