XML Schema Validation done right
Posted by Mike Haller
on Wednesday, March 4. 2009
at 13:06
in Java
To validate an XML document against an XSD Schema, Java 5 offers standard functionality with the built-in JAXP API. To do that, first create the Schema object as follows:
URL resource = getClass().getResource( SCHEMA_FILE );
SchemaFactory factory = SchemaFactory.newInstance( XMLConstants.W3C_XML_SCHEMA_NS_URI );
Schema schema = factory.newSchema( resource );
The next step is to create a DocumentBuilderFactory that builds the DOM document with namespace awareness enabled and validation disabled. In JAXP, setValidating(true) turns on DTD validation, which we don't want because the Schema does the validating instead. So, if you use setSchema() or the JAXP_SCHEMA attributes, disable it using setValidating(false). (One caveat for the attributes: Oracle's JAXP tutorial pairs them with setValidating(true), so check how your parser treats them.)
DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance();
builderFactory.setNamespaceAware( true );
// TODO: Workaround for BEA WebLogic
builderFactory.setSchema( schema );
builderFactory.setIgnoringElementContentWhitespace( true );
builderFactory.setIgnoringComments( true );
builderFactory.setValidating( false ); // XML Schema validation explicitly in next step
Note: ignore the line with the TODO marker for now; I'll cover that later.
Now, with the builder factory finally configured, we can create a document builder and parse the XML document:
DocumentBuilder documentBuilder = builderFactory.newDocumentBuilder();
ErrorHandler eh = new StrictErrorHandler();
documentBuilder.setErrorHandler( eh );
InputSource is = new InputSource( reader );
is.setPublicId( NAMESPACE_URI );
is.setSystemId( NAMESPACE_URI );
Document document = documentBuilder.parse( is );
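To try the steps above outside of a larger project, here is a minimal self-contained sketch. It assumes a tiny inline schema (a single greeting element) instead of the SCHEMA_FILE resource, and it uses an anonymous error handler in place of StrictErrorHandler. The class name SchemaParseSketch and the isValid helper are made up for illustration and are not part of the original code.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.w3c.dom.Document;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class SchemaParseSketch {

    // Hypothetical inline schema: one <greeting> element holding a string.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='greeting' type='xs:string'/>"
      + "</xs:schema>";

    /** Parses the XML with schema validation; any violation is thrown as a SAXException. */
    public static Document parse(String xml)
            throws SAXException, ParserConfigurationException, IOException {
        SchemaFactory schemaFactory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = schemaFactory.newSchema(new StreamSource(new StringReader(XSD)));

        DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance();
        builderFactory.setNamespaceAware(true);
        builderFactory.setSchema(schema);
        builderFactory.setValidating(false); // no DTD validation; the Schema object validates

        DocumentBuilder builder = builderFactory.newDocumentBuilder();
        // Stand-in for StrictErrorHandler: rethrow instead of printing to System.err.
        builder.setErrorHandler(new ErrorHandler() {
            public void warning(SAXParseException e) throws SAXException { throw e; }
            public void error(SAXParseException e) throws SAXException { throw e; }
            public void fatalError(SAXParseException e) throws SAXException { throw e; }
        });
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    /** Convenience wrapper: true if the document parses and matches the schema. */
    public static boolean isValid(String xml) {
        try {
            parse(xml);
            return true;
        } catch (SAXException e) {
            return false; // malformed XML or schema violation
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<greeting>hello</greeting>"));
        System.out.println(isValid("<farewell>bye</farewell>"));
    }
}
```

Without the error handler, the default behaviour of the JDK parser is to report schema violations on System.err and keep parsing, which is why the handler matters here.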
This works great in unit tests and in normal environments. However, in enterprise environments such as J2EE application servers, vendors often think they need to reimplement standard functionality. And sometimes they don't cover the spec and just do it the wrong way. If you're using BEA WebLogic, for example, you need an alternative way of specifying the Schema source. If you don't, you will end up with something like this:
java.lang.IllegalArgumentException: Schema
    at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderFactoryImpl.setAttribute(DocumentBuilderFactoryImpl.java:140)
    at weblogic.xml.jaxp.WebLogicDocumentBuilderFactory.setAttribute(WebLogicDocumentBuilderFactory.java:146)
    at weblogic.xml.jaxp.RegistryDocumentBuilder.setupDocumentBuilderFactory(RegistryDocumentBuilder.java:329)
    at weblogic.xml.jaxp.RegistryDocumentBuilder.getDefaultDocumentBuilderFactory(RegistryDocumentBuilder.java:286)
    at weblogic.xml.jaxp.RegistryDocumentBuilder.getDocumentBuilder(RegistryDocumentBuilder.java:222)
    at weblogic.xml.jaxp.RegistryDocumentBuilder.parse(RegistryDocumentBuilder.java:147)
There's a workaround for this problem, and it's pretty easy. Replace the line marked with TODO and the setSchema() call with the following snippet:
try
{
    builderFactory.setAttribute( JAXP_SCHEMA_LANGUAGE, XMLConstants.W3C_XML_SCHEMA_NS_URI );
    builderFactory.setAttribute( JAXP_SCHEMA_SOURCE, new InputSource( resource.openStream() ) );
}
catch( IllegalArgumentException e )
{
    builderFactory.setSchema( schema );
}
Both constants are defined as follows (I could not find them predefined anywhere in the JDK, the way the schema namespace is in XMLConstants):
private static final String JAXP_SCHEMA_LANGUAGE = "http://java.sun.com/xml/jaxp/properties/schemaLanguage";
private static final String JAXP_SCHEMA_SOURCE = "http://java.sun.com/xml/jaxp/properties/schemaSource";
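For reference, here is a small standalone sketch of the attribute-based approach on a stock JDK. It rests on a few assumptions that are not in the original code: the schema is a made-up inline one (a single count element of type xs:int) passed as an InputStream, the class name JaxpAttributeSketch is hypothetical, and validation is switched on with setValidating(true), because that is how Oracle's JAXP tutorial uses these two attributes. That last point differs from the factory setup shown earlier, so treat it as something to verify against your own parser rather than as the definitive configuration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class JaxpAttributeSketch {

    static final String JAXP_SCHEMA_LANGUAGE =
        "http://java.sun.com/xml/jaxp/properties/schemaLanguage";
    static final String JAXP_SCHEMA_SOURCE =
        "http://java.sun.com/xml/jaxp/properties/schemaSource";

    // Hypothetical inline schema: one <count> element holding an integer.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='count' type='xs:int'/>"
      + "</xs:schema>";

    /** True if the XML is well-formed and matches the inline schema. */
    public static boolean isValid(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            // Assumption: the attributes are paired with a validating parser,
            // as in the JAXP tutorial.
            factory.setValidating(true);
            // Set the language before the source, as in the workaround above.
            factory.setAttribute(JAXP_SCHEMA_LANGUAGE, XMLConstants.W3C_XML_SCHEMA_NS_URI);
            factory.setAttribute(JAXP_SCHEMA_SOURCE,
                new ByteArrayInputStream(XSD.getBytes(StandardCharsets.UTF_8)));

            DocumentBuilder builder = factory.newDocumentBuilder();
            // Rethrow validation errors; DefaultHandler already throws on fatal errors.
            builder.setErrorHandler(new DefaultHandler() {
                @Override
                public void error(SAXParseException e) throws SAXException {
                    throw e;
                }
            });
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // malformed XML or schema violation
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<count>42</count>"));
        System.out.println(isValid("<count>many</count>"));
    }
}
```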
Interestingly, the following code will work fine on WebLogic too:
URL resource = getClass().getResource( SCHEMA );
SchemaFactory factory = SchemaFactory.newInstance( XMLConstants.W3C_XML_SCHEMA_NS_URI );
Schema schema = factory.newSchema( resource );
Source source = new DOMSource( aDocument );
Validator validator = schema.newValidator();
DefaultHandler errorHandler = new StrictErrorHandler();
validator.setErrorHandler( errorHandler );
validator.validate( source );
The StrictErrorHandler rethrows all exceptions instead of only printing messages to System.err.
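The post does not show the source of StrictErrorHandler, so the following is only a guess at what such a handler could look like, combined with the Validator approach. It assumes the handler extends DefaultHandler (the code above assigns it to a DefaultHandler variable) and uses a made-up inline schema and a hypothetical wrapper class StrictValidationSketch so the example runs on its own.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class StrictValidationSketch {

    /** Assumed shape of the handler: rethrow every reported problem. */
    public static class StrictErrorHandler extends DefaultHandler {
        @Override
        public void warning(SAXParseException e) throws SAXException { throw e; }
        @Override
        public void error(SAXParseException e) throws SAXException { throw e; }
        @Override
        public void fatalError(SAXParseException e) throws SAXException { throw e; }
    }

    // Hypothetical inline schema: one <count> element holding an integer.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='count' type='xs:int'/>"
      + "</xs:schema>";

    /** Parses without validation first, then validates the finished DOM tree. */
    public static boolean isValid(String xml) {
        try {
            DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance();
            builderFactory.setNamespaceAware(true); // needed for schema validation of a DOM
            Document document = builderFactory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

            Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)));
            Validator validator = schema.newValidator();
            validator.setErrorHandler(new StrictErrorHandler());
            validator.validate(new DOMSource(document));
            return true;
        } catch (SAXException e) {
            return false; // malformed XML or schema violation
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<count>42</count>"));
        System.out.println(isValid("<count>many</count>"));
    }
}
```

Because parsing and validation are separate steps here, the DocumentBuilderFactory needs no schema configuration at all, which may be why this variant sidesteps the WebLogic problem.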
