Handle BOM in JAXB unmarshaller - jaxb

How can JAXB handle a string (of XML) that starts with a BOM? Is there a property to set or some configuration to skip the BOM when unmarshalling? Are there other JAXB implementations that could do that besides Oracle's implementation?

If you're referring to a UTF-8 file with a BOM, then you will have to skip it yourself. It's fairly simple to come up with an InputStream class that wraps another InputStream, checks the first three bytes for the UTF-8 BOM (EF BB BF), and skips them if present. This has been documented in this SO answer and open source code for this purpose is available from GitHub.
If you're referring to some other encoding like UTF-16, the JRE should read the BOM from a UTF-16 stream and discard it itself.
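A minimal sketch of the wrapping approach described above, assuming plain JDK classes only. The class name BomSkipper and its method names are hypothetical, chosen for illustration; they are not part of JAXB or of the code linked in the answer.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of JAXB: removes a leading UTF-8 BOM.
final class BomSkipper {

    /** Returns a stream positioned just after a leading UTF-8 BOM (EF BB BF), if one is present. */
    static InputStream skipUtf8Bom(InputStream in) throws IOException {
        PushbackInputStream pushback = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];
        int n = 0;
        // Read up to three bytes; a single read() call may return fewer.
        while (n < 3) {
            int r = pushback.read(head, n, 3 - n);
            if (r < 0) {
                break;
            }
            n += r;
        }
        boolean isBom = n == 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF;
        if (!isBom && n > 0) {
            // Not a BOM: put the bytes back so the caller sees the full content.
            pushback.unread(head, 0, n);
        }
        return pushback;
    }

    /** Strips a leading U+FEFF from XML that is already held as a String. */
    static String stripBom(String xml) {
        return xml.startsWith("\uFEFF") ? xml.substring(1) : xml;
    }

    public static void main(String[] args) throws IOException {
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, '<', 'a', '/', '>'};
        InputStream clean = skipUtf8Bom(new ByteArrayInputStream(withBom));
        byte[] rest = new byte[4];
        int read = clean.read(rest);
        System.out.println(new String(rest, 0, read, StandardCharsets.UTF_8));
    }
}
```

The stream returned by skipUtf8Bom can then be passed to Unmarshaller.unmarshal(InputStream). If the XML is already a String, as in the question, stripBom can be applied before wrapping it in a StringReader.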

Related

How to successfully parse a UTF-16LE encoded JSON file with Haxe on the PHP target?

I have to use a third-party application that generates UTF-16LE encoded JSON files.
When I put these manually on my server and try to parse them with Haxe-generated PHP, it refuses. It seems it can't detect the encoding and convert it to one that Haxe PHP accepts.
I don't know where to start. Converting the files on the client is impossible: there are too many of them and they need to be parsed too frequently. So I have to use PHP. It would be nice if Haxe had a way to convert them to an encoding it accepts. I have tried RTFM, but so far I haven't found anything that says Haxe can convert them. Before I start reinventing the wheel, I'd rather make sure there isn't some obvious way to do it with Haxe.
I am using Haxe version 4.2.1+bf9ff69
What am I overlooking? Is haxe php able to solve this, or is going native php the only option?
== SOLVED ==
As these JSON files do not need emoticon support or any characters for non-English languages, my solution was to strip everything except basic printable ASCII characters.
import sys.io.File;
import php.Syntax;
// some function body
return Syntax.code('preg_replace( "/[^[:print:]]/", "",{0})',File.getContent(_path));
I couldn't share these files on the web because of privacy concerns. I also discovered these files had (wait for it) double BOMs hacked into them. The BOM detector I threw in reported the first BOM it found, which happened to be UTF-16LE.
An enterprise spaghetti monster is probably the reason. I thought I had seen it all, but with that, one can probably never have seen it all. The wonders of human ingenuity.
Just a blunt strip instead of writing my own ludicrous code to unfudge that stuff, and justice is served. Hurrah.

Instrumenting java class using soot

I'm trying to instrument byte code by converting it to jimple, adding code lines to jimple and compiling the jimple back to byte code. The problem is that I can't compile the jimple code back to byte code at all, is it possible?
What is the problem you are facing? Normally Soot automatically outputs the corresponding .class files.

How to get plain text files in Doxygen documentation?

I cannot include any text file in my Doxygen documentation. The only exception is a README.md file that I set as the main page.
In particular, I would like to see the Changelog.txt file in the documentation. I tried to add it explicitly in the INPUT field and in the FILE_PATTERNS field, without success. In the generated HTML documentation, I cannot find it either in the file list or by searching.
The only trace is in Doxygen's log file:
Preprocessing C:/Source/Changelog.txt...
Parsing file C:/Source/Changelog.txt...
...
Parsing code for file Changelog.txt...
If I change the extension of the file from txt to md, the file is added to the documentation.
You need EXTENSION_MAPPING=txt=md; otherwise the .txt file is handled as a C / C++ source file, and because it contains no comment markers, it produces no output.
From the documentation:
EXTENSION_MAPPING Doxygen selects the parser to use depending on the
extension of the files it parses. With this tag you can assign which
parser to use for a given extension. Doxygen has a built-in mapping,
but you can override or extend it using this tag. The format is
ext=language, where ext is a file extension, and language is one of
the parsers supported by doxygen: IDL, Java, Javascript, C#, C, C++,
D, PHP, Objective-C, Python, Fortran (fixed format Fortran:
FortranFixed, free formatted Fortran: FortranFree, unknown formatted
Fortran: Fortran. In the later case the parser tries to guess whether
the code is fixed or free formatted code, this is the default for
Fortran type files), VHDL. For instance to make doxygen treat .inc
files as Fortran files (default is PHP), and .f files as C (default is
Fortran), use: inc=Fortran f=C. Note: For files without extension you
can use no_extension as a placeholder. Note that for custom extensions
you also need to set FILE_PATTERNS otherwise the files are not read by
doxygen.
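Putting the answer and the quoted documentation together, a Doxyfile fragment might look like the sketch below. The INPUT entries and the source-file patterns are assumptions based on the file names mentioned in the question; adjust them to the actual project layout.

```
# Assumed inputs: the main page plus the changelog from the question
INPUT                  = README.md Changelog.txt
USE_MDFILE_AS_MAINPAGE = README.md

# Make sure .txt files are read at all (the source patterns are assumptions)
FILE_PATTERNS          = *.c *.cpp *.h *.md *.txt

# Parse .txt files with the Markdown parser instead of the C/C++ parser
EXTENSION_MAPPING      = txt=md
```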

System String to std string without marshal

Is there any way to convert System::String to std::string if I am not allowed to use msclr/marshal_cppstd.h?
The reason is that I need to use cryptlib.h in the same project and I get an error when I include both:
cryptlib.h and wincrypt.h can't both be used at the same time due to conflicting type names
The error is telling you that cryptlib.h and wincrypt.h cannot be #included into the same source file. I think the text of the message is whatever follows the #error directive, which is up to the header's author - see here.
So, if you can't include them in the same source file, you could partition your code differently and include them in different source files. Marshalling the string is not what the error message is complaining about.

complex jaxb scenario on generation of java objects

I have a project that does JAXB generation with framework.xsd. This generates a jar with the xsd, the JAXB objects, and other classes built around them.
Then other groups (two different groups) will be extending framework.xsd, using XML Schema extension to extend the types defined in framework.xsd. They also want to generate JAXB objects, BUT they want their SomeClass.java to extend my Framework.java and don't want to end up with a whole new hierarchy.
Is this even possible?
How can something like this be done? The solution would need to:
tell the JAXB compiler that the namespace yy has already been generated, so it should not generate it again
tell the JAXB compiler that it needs to refer to the classes in the package zzzzzz, or to look at the xjb file from the framework jar file, or something similar.
Is this possible?
thanks,
Dean
You want to use an episode file (http://weblogs.java.net/blog/kohsuke/archive/2006/09/separate_compil.html) when generating the JAXB classes for your first schema:
$ xjc -episode framework.episode framework.xsd
Then the other group that consumes your framework.jar should:
1) import your schema in their own schema e.g.:
<xsd:import namespace="http://www.myorg.com/framework" schemaLocation="framework.xsd"/>
2) generate their JAXB classes
$ xjc extend.xsd -b framework.episode
(they'll need a copy of your xsd and episode file at xjc time, as well as the framework.jar in the classpath)
Note that according to the blog post above, you can also place the framework.episode file inside your jar (e.g. /META-INF/sun-jaxb.episode for JAXB RI at least - other JAXB implementations may have other ways of accomplishing the same thing), so that the -b framework.episode option can be omitted. I personally find it a bit impractical, since you still need the XSD anyway.
