XML Schema Validation done right

Posted by Mike Haller on Wednesday, March 4. 2009 at 13:06 in Java
To validate an XML document against an XML Schema (XSD), Java 5 offers standard functionality in the built-in JAXP API. First, create the Schema object as follows:
URL resource = getClass().getResource( SCHEMA_FILE );
SchemaFactory factory = SchemaFactory.newInstance( XMLConstants.W3C_XML_SCHEMA_NS_URI );
Schema schema = factory.newSchema( resource );

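The next step is to create a DocumentBuilderFactory that builds the DOM document with namespace awareness enabled and validation disabled. setValidating() essentially controls only DTD validation, which we do not want because the Schema object takes over. So, if you use setSchema(), you need to switch it off using setValidating(false). The Javadoc lists the two legacy JAXP 1.2 schema attributes (schemaLanguage and schemaSource) as the exception to this rule, and it states that combining them with a Schema object on the same factory is an error: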
DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance();
builderFactory.setNamespaceAware( true );
// TODO: Workaround for BEA WebLogic
builderFactory.setSchema( schema );
builderFactory.setIgnoringElementContentWhitespace( true );
builderFactory.setIgnoringComments( true );
builderFactory.setValidating( false ); // XML Schema validation explicitly in next step


Note: ignore the line with the TODO marker for now; I'll cover that later.
Now, with the builder factory finally configured, we can create a document builder and parse the XML document:

DocumentBuilder documentBuilder = builderFactory.newDocumentBuilder();
ErrorHandler eh = new StrictErrorHandler();
documentBuilder.setErrorHandler( eh );
InputSource is = new InputSource( reader );
is.setPublicId( NAMESPACE_URI );
is.setSystemId( NAMESPACE_URI );
Document document = documentBuilder.parse( is );
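
For readers who want to try the steps so far in isolation, here is a minimal, self-contained sketch. The inline schema, the class name SchemaParseSketch and the isValid() helper are assumptions made for illustration and are not part of the original code; an anonymous handler that rethrows everything stands in for StrictErrorHandler.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

class SchemaParseSketch {
    // Hypothetical schema for illustration: a single <note> element holding a string.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='note' type='xs:string'/>"
        + "</xs:schema>";

    /** Returns true if the XML is well-formed and conforms to the schema. */
    static boolean isValid(String xml) {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));

            DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance();
            builderFactory.setNamespaceAware(true);
            builderFactory.setSchema(schema);
            builderFactory.setValidating(false); // no DTD validation, the Schema does the work

            DocumentBuilder builder = builderFactory.newDocumentBuilder();
            // Rethrow everything so that schema violations abort the parse.
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) throws SAXException { throw e; }
                public void error(SAXParseException e) throws SAXException { throw e; }
                public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // malformed XML or schema violation
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<note>hello</note>"));
        System.out.println(isValid("<memo>hello</memo>"));
    }
}
```

The first document matches the schema; the second uses an element the schema does not declare, so the handler rethrows the validation error and the parse is aborted.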


This works well in unit tests and in ordinary environments. In enterprise environments such as J2EE application servers, however, vendors often think they need to reimplement the standard implementation, and sometimes their version does not cover the spec and simply does it the wrong way. If you are using BEA WebLogic, for example, you need an alternative way of specifying the Schema source. If you don't, you will end up with something like this:
java.lang.IllegalArgumentException: Schema
at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderFactoryImpl.setAttribute(DocumentBuilderFactoryImpl.java:140)
at weblogic.xml.jaxp.WebLogicDocumentBuilderFactory.setAttribute(WebLogicDocumentBuilderFactory.java:146)
at weblogic.xml.jaxp.RegistryDocumentBuilder.setupDocumentBuilderFactory(RegistryDocumentBuilder.java:329)
at weblogic.xml.jaxp.RegistryDocumentBuilder.getDefaultDocumentBuilderFactory(RegistryDocumentBuilder.java:286)
at weblogic.xml.jaxp.RegistryDocumentBuilder.getDocumentBuilder(RegistryDocumentBuilder.java:222)
at weblogic.xml.jaxp.RegistryDocumentBuilder.parse(RegistryDocumentBuilder.java:147)


There's a workaround for this problem, and it's pretty easy. Replace the line marked with TODO and the setSchema() call with the following snippet:

try
{
   builderFactory.setAttribute( JAXP_SCHEMA_LANGUAGE, XMLConstants.W3C_XML_SCHEMA_NS_URI );
   builderFactory.setAttribute( JAXP_SCHEMA_SOURCE, new InputSource( resource.openStream() ) );
}
catch( IllegalArgumentException e )
{
   builderFactory.setSchema( schema );
}


Both constants have to be defined by hand, because I could not find them in XMLConstants or in any other standard class:
private static final String JAXP_SCHEMA_LANGUAGE = "http://java.sun.com/xml/jaxp/properties/schemaLanguage";
private static final String JAXP_SCHEMA_SOURCE = "http://java.sun.com/xml/jaxp/properties/schemaSource";
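
The attribute-based configuration can be tried on its own with a sketch like the following. It rests on one assumption worth flagging: as far as I can tell from the Javadoc and the JDK's bundled parser, these legacy JAXP 1.2 attributes only take effect on a validating parser, so the sketch calls setValidating(true) and does not use setSchema() at all. Other parser implementations may behave differently. The inline schema, the class name and the validationErrors() helper are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

class JaxpAttributeSketch {
    static final String JAXP_SCHEMA_LANGUAGE = "http://java.sun.com/xml/jaxp/properties/schemaLanguage";
    static final String JAXP_SCHEMA_SOURCE = "http://java.sun.com/xml/jaxp/properties/schemaSource";

    // Hypothetical schema for illustration: a single <note> element holding a string.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='note' type='xs:string'/>"
        + "</xs:schema>";

    /** Parses the XML and returns the messages of all reported validation errors. */
    static List<String> validationErrors(String xml) {
        final List<String> errors = new ArrayList<String>();
        try {
            DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance();
            builderFactory.setNamespaceAware(true);
            // Assumption: the legacy attributes are honoured only by a validating parser.
            builderFactory.setValidating(true);
            // The schema language has to be set before the schema source.
            builderFactory.setAttribute(JAXP_SCHEMA_LANGUAGE, XMLConstants.W3C_XML_SCHEMA_NS_URI);
            builderFactory.setAttribute(JAXP_SCHEMA_SOURCE,
                new InputSource(new ByteArrayInputStream(XSD.getBytes(StandardCharsets.UTF_8))));

            DocumentBuilder builder = builderFactory.newDocumentBuilder();
            // Collect recoverable errors, abort on fatal ones.
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { /* ignored in this sketch */ }
                public void error(SAXParseException e) { errors.add(e.getMessage()); }
                public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(xml)));
        } catch (SAXException e) {
            errors.add(e.getMessage());
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return errors;
    }

    public static void main(String[] args) {
        System.out.println(validationErrors("<note>hello</note>"));
        System.out.println(validationErrors("<memo>hello</memo>"));
    }
}
```

Because the schema is handed over as a stream, it can be read only once; a fresh stream is opened for every factory in this sketch.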


Interestingly, the following code will work fine on WebLogic too:
URL resource = getClass().getResource( SCHEMA_FILE );
SchemaFactory factory = SchemaFactory.newInstance( XMLConstants.W3C_XML_SCHEMA_NS_URI );
Schema schema = factory.newSchema( resource );
Source source = new DOMSource( aDocument );
Validator validator = schema.newValidator();
DefaultHandler errorHandler = new StrictErrorHandler();
validator.setErrorHandler( errorHandler );
validator.validate( source );
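
To see the Validator approach working end to end, here is a small sketch that first parses without any schema and then validates the resulting DOM tree. The inline schema (a single <count> element of type xs:int), the class name and the isValid() helper are assumptions for illustration. No error handler is set here; according to the Validator Javadoc, the default behaviour is then to throw on errors and fatal errors and to ignore warnings.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class DomValidatorSketch {
    // Hypothetical schema for illustration: a single <count> element holding an int.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='count' type='xs:int'/>"
        + "</xs:schema>";

    /** Parses the XML without a schema, then validates the DOM tree in a second step. */
    static boolean isValid(String xml) {
        try {
            // Step 1: plain namespace-aware parse, no validation of any kind.
            DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance();
            builderFactory.setNamespaceAware(true);
            Document document = builderFactory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

            // Step 2: validate the finished DOM tree against the schema.
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
            Validator validator = schema.newValidator();
            validator.validate(new DOMSource(document));
            return true;
        } catch (SAXException e) {
            return false; // schema violation (or malformed XML in step 1)
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<count>42</count>"));
        System.out.println(isValid("<count>many</count>"));
    }
}
```

Keeping parsing and validation separate means the DocumentBuilderFactory never sees the schema, which is presumably why this variant sidesteps the WebLogic problem.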


The StrictErrorHandler rethrows all exceptions instead of only printing messages to System.err.
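
The class itself is not shown in this post. A minimal sketch of what such a handler could look like, assuming it extends DefaultHandler (as the assignment to a DefaultHandler variable suggests) and treats warnings as strictly as errors, is:

```java
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Guess at the handler used in this post: every reported problem,
// including warnings, is rethrown so that parsing or validation stops.
class StrictErrorHandler extends DefaultHandler {
    @Override
    public void warning(SAXParseException e) throws SAXException {
        throw e;
    }

    @Override
    public void error(SAXParseException e) throws SAXException {
        throw e;
    }

    @Override
    public void fatalError(SAXParseException e) throws SAXException {
        throw e;
    }

    public static void main(String[] args) {
        try {
            // Simulate a parser reporting a recoverable error.
            new StrictErrorHandler().error(new SAXParseException("simulated problem", null));
        } catch (SAXException e) {
            System.out.println("Rethrown: " + e.getMessage());
        }
    }
}
```

Since DefaultHandler implements ErrorHandler, the same instance can be passed to DocumentBuilder.setErrorHandler() and to Validator.setErrorHandler().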


