This package provides the core SAX APIs. Some SAX1 APIs are deprecated to encourage integration of namespace-awareness into designs of new applications and into maintenance of existing infrastructure.
See http://www.saxproject.org for more information about SAX.
One of the essential characteristics of SAX2 is that it added feature flags which can be used to examine and perhaps modify parser modes, in particular modes such as validation. Since features are identified by (absolute) URIs, anyone can define such features. Currently defined standard feature URIs have the prefix http://xml.org/sax/features/ before an identifier such as validation. Turn features on or off using setFeature. Those standard identifiers are:
Feature ID | Default | Description |
---|---|---|
external-general-entities | unspecified | Reports whether this parser processes external general entities; always true if validating |
external-parameter-entities | unspecified | Reports whether this parser processes external parameter entities; always true if validating |
is-standalone | none | May be examined only during a parse, after the startDocument() callback has been completed; read-only. The value is true if the document specified the "standalone" flag in its XML declaration, and otherwise is false. |
lexical-handler/parameter-entities | unspecified | true indicates that the LexicalHandler will report the beginning and end of parameter entities |
namespaces | true | true indicates namespace URIs and unprefixed local names for element and attribute names will be available |
namespace-prefixes | false | true indicates XML qualified names (with prefixes) and attributes (including xmlns* attributes) will be available |
resolve-dtd-uris | true | A value of "true" indicates that system IDs in declarations will be absolutized (relative to their base URIs) before reporting. (That is the default behavior for all SAX2 XML parsers.) A value of "false" indicates those IDs will not be absolutized; parsers will provide the base URI from Locator.getSystemId(). This applies to system IDs passed in |
string-interning | unspecified | true if all XML names (for elements, prefixes, attributes, entities, notations, and local names), as well as Namespace URIs, will have been interned using java.lang.String.intern. This supports fast testing of equality/inequality against string constants, rather than forcing slower calls to String.equals(). |
use-attributes2 | unspecified | Returns true if the Attributes objects passed by this parser in ContentHandler.startElement() implement the org.xml.sax.ext.Attributes2 interface. That interface exposes additional DTD-related information, such as whether the attribute was specified in the source text rather than defaulted. |
use-locator2 | unspecified | Returns true if the Locator objects passed by this parser in ContentHandler.setDocumentLocator() implement the org.xml.sax.ext.Locator2 interface. That interface exposes additional entity information, such as the character encoding and XML version used. |
use-entity-resolver2 | true (when recognized) | Returns true if, when setEntityResolver is given an object implementing the org.xml.sax.ext.EntityResolver2 interface, those new methods will be used. Returns false to indicate that those methods will not be used. |
validation | unspecified | Controls whether the parser is reporting all validity errors; if true, all external entities will be read. |
xmlns-uris | false | Controls whether, when the namespace-prefixes feature is set, the parser treats namespace declaration attributes as being in the http://www.w3.org/2000/xmlns/ namespace. By default, SAX2 conforms to the original "Namespaces in XML" Recommendation, which explicitly states that such attributes are not in any namespace. Setting this optional flag to true makes the SAX2 events conform to a later backwards-incompatible revision of that recommendation, placing those attributes in a namespace. |
Support for the default values of the namespaces and namespace-prefixes feature flags is required. Support for any other feature flags is entirely optional.
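As a minimal sketch of what these two flags control, the following Java code (assuming the JDK's built-in JAXP parser; the class and method names are illustrative only) parses a prefixed element with namespaces on and toggles namespace-prefixes. Only the default of false for namespace-prefixes is guaranteed, so another XMLReader may reject true with a SAXNotSupportedException.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class NamespaceFlagsDemo {
    static final String FEATURES = "http://xml.org/sax/features/";

    /** Returns one "{uri}localName attrs=[qNames...]" entry per element. */
    static List<String> describeElements(String xml, boolean namespacePrefixes) {
        List<String> out = new ArrayList<>();
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // JAXP factories are not namespace-aware by default
            XMLReader reader = factory.newSAXParser().getXMLReader();
            reader.setFeature(FEATURES + "namespaces", true);
            // Setting this to true is optional in SAX2; some readers may refuse it.
            reader.setFeature(FEATURES + "namespace-prefixes", namespacePrefixes);
            reader.setContentHandler(new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName, String qName, Attributes atts) {
                    List<String> attNames = new ArrayList<>();
                    for (int i = 0; i < atts.getLength(); i++) {
                        attNames.add(atts.getQName(i));
                    }
                    out.add("{" + uri + "}" + localName + " attrs=" + attNames);
                }
            });
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        String doc = "<a:root xmlns:a='urn:x'/>";
        System.out.println(describeElements(doc, false)); // xmlns:a not reported as an attribute
        System.out.println(describeElements(doc, true));  // xmlns:a reported as an attribute
    }
}
```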
For default values not specified by SAX2, each XMLReader implementation specifies its default, or may choose not to expose the feature flag. Unless otherwise specified here, implementations may support changing current values of these standard feature flags, but not while parsing.
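Because a given XMLReader may not expose a flag at all, or may refuse a particular value, portable code tends to probe rather than assume. A sketch under that assumption, using a reader from the JDK's SAXParserFactory (FeatureProbe, Result, and newReader are hypothetical names):

```java
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;
import org.xml.sax.XMLReader;

public class FeatureProbe {
    static final String FEATURES = "http://xml.org/sax/features/";

    /** Outcome of one attempt to set a standard feature flag. */
    enum Result { SET, UNRECOGNIZED, UNSUPPORTED }

    static Result trySetFeature(XMLReader reader, String name, boolean value) {
        try {
            reader.setFeature(FEATURES + name, value);
            return Result.SET;
        } catch (SAXNotRecognizedException e) {
            // The reader does not expose this feature flag at all.
            return Result.UNRECOGNIZED;
        } catch (SAXNotSupportedException e) {
            // The flag is known, but this value cannot be set (at least not now).
            return Result.UNSUPPORTED;
        }
    }

    static XMLReader newReader() {
        try {
            return SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        } catch (ParserConfigurationException | SAXException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        XMLReader reader = newReader();
        for (String name : new String[] {"namespaces", "validation", "xmlns-uris", "no-such-feature"}) {
            System.out.println(name + " -> " + trySetFeature(reader, name, true));
        }
    }
}
```

Keeping the two exceptions apart is the point of the design: SAXNotRecognizedException means the reader does not know the flag, while SAXNotSupportedException means it knows the flag but will not accept that value, for example while a parse is in progress.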
For parser interface characteristics that are described as objects, a separate namespace is defined. The objects in this namespace are again identified by URI, and the standard property URIs have the prefix http://xml.org/sax/properties/ before an identifier such as lexical-handler or dom-node. Manage those properties using setProperty(). Those identifiers are:
Property ID | Description |
---|---|
declaration-handler | Used to see most DTD declarations except those treated as lexical ("document element name is ...") or which are mandatory for all SAX parsers (DTDHandler). The Object must implement org.xml.sax.ext.DeclHandler. |
dom-node | For "DOM Walker" style parsers, which ignore their parser.parse() parameters, this is used to specify the DOM (sub)tree being walked by the parser. The Object must implement the org.w3c.dom.Node interface. |
lexical-handler | Used to see some syntax events that are essential in some applications: comments, CDATA delimiters, selected general entity inclusions, and the start and end of the DTD (and declaration of document element name). The Object must implement org.xml.sax.ext.LexicalHandler. |
xml-string | Readable only during a parser callback, this exposes a chunk of characters responsible for the current event (the exact extent of that chunk is still to be specified). |
All of these standard properties are optional; XMLReader implementations need not support them.
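Since every standard property is optional, a robust client can treat a refused setProperty() as "no such events" rather than as a failure. The sketch below (hypothetical class name; it relies on org.xml.sax.ext.DefaultHandler2, which implements LexicalHandler) collects comments through the lexical-handler property and still parses if the property is not supported:

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

public class CommentCollector {
    static final String LEXICAL_HANDLER = "http://xml.org/sax/properties/lexical-handler";

    /** Returns the text of every comment, without the <!-- --> delimiters. */
    static List<String> collectComments(String xml) {
        List<String> comments = new ArrayList<>();
        DefaultHandler2 handler = new DefaultHandler2() {
            @Override
            public void comment(char[] ch, int start, int length) {
                comments.add(new String(ch, start, length));
            }
        };
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setContentHandler(handler);
            try {
                reader.setProperty(LEXICAL_HANDLER, handler);
            } catch (SAXNotRecognizedException | SAXNotSupportedException e) {
                // Optional property: parse anyway, just without comment events.
            }
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
        return comments;
    }

    public static void main(String[] args) {
        System.out.println(collectComments("<r><!-- hi --><!--x--></r>"));
    }
}
```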
SAX 2.1 defines a standard SAXParseException.getExceptionId() method to identify which kind of error is being reported, since any diagnostic message will vary between parsers. Those identifiers are URIs, which are used in much the same way that they are used for feature and property IDs. Systems can define nonstandard IDs when needed, by using a different base URI.
Moreover, for the XML (and related) standards relied on by the SAX specification itself (including XML and Namespaces in XML), SAX also standardizes the IDs used to identify those errors. The identifiers all start with the exception base URI http://xml.org/sax/exception/ which is then combined with additional information describing the error encountered.
Not all parsers will choose to provide all these IDs, but those that provide any must only use the exception IDs defined by SAX. Any errors defined by those specifications which are not yet addressed by the SAX specification must not include any exception ID (but see below: more identifiers can be defined).
Parsers that correctly use exception IDs thus allow application software to reason, in a parser-independent manner, about all the basic XML errors that may be reported by an XMLReader. For example, they could assemble (and translate) catalogs of messages that make more sense to their users, because they can use application context to supplement a parser's textually oriented diagnostic. In some cases they can write code that uses knowledge of the errors to decide how to proceed most effectively. For example, some validity errors might be of no concern, while others might need to be treated as fatal.
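A sketch of the message-catalog and selective-tolerance ideas, assuming the application already holds the exception ID as a String. The org.xml.sax package bundled with the JDK does not, as far as we know, provide getExceptionId(), so how the ID is obtained is left open here; the catalog entries and the tolerated-error policy are hypothetical.

```java
import java.util.Map;
import java.util.Set;

public class ErrorCatalog {
    static final String BASE = "http://xml.org/sax/exception/";

    // Hypothetical application-specific messages, keyed by exception ID.
    static final Map<String, String> MESSAGES = Map.of(
        BASE + "xml/rule-66", "A character reference (such as &#x20;) is malformed.",
        BASE + "xml/vc-roottype", "The root element does not match the name in the DOCTYPE.");

    // Hypothetical policy: validity errors this application chooses to tolerate.
    static final Set<String> TOLERATED = Set.of(BASE + "xml/vc-one-id-per-el");

    /** Prefers the catalog message; falls back to the parser's own text. */
    static String describe(String exceptionId, String parserMessage) {
        if (exceptionId == null) {
            return parserMessage; // the parser supplied no ID, so there is nothing to look up
        }
        return MESSAGES.getOrDefault(exceptionId, parserMessage);
    }

    /** True if the application has decided this error need not stop processing. */
    static boolean isTolerated(String exceptionId) {
        return exceptionId != null && TOLERATED.contains(exceptionId);
    }

    public static void main(String[] args) {
        System.out.println(describe(BASE + "xml/rule-66", "parser text"));
        System.out.println(isTolerated(BASE + "xml/vc-one-id-per-el"));
    }
}
```

The null checks matter because Map.of and Set.of reject null lookups, and parsers are not required to supply IDs at all.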
The core SAX identifiers are described here, and a more current version might be available through http://www.saxproject.org. Parser writers should work with the SAX project to define standard IDs for any error cases that are identified in the relevant specifications, but for which no IDs are defined here.
These IDs are derived from the current XML 1.0 (2nd edition) recommendation. The IDs start with the exception base URI, then append xml/ and an additional string that more specifically identifies the rule being violated. Those additional strings are defined as follows:
- Grammar rule violations use rule- followed by the grammar rule number. For example, a violation of rule 42 (end tag syntax) would be reported using the additional string rule-42. (These are all fatal errors.)
- Well-formedness constraint (WFC) violations use wfc- followed by the XML ID of the WFC being violated, as found in the source to the XML Recommendation. (Any leading wfc- or wf- is first removed; that source does not have a consistent naming convention for these identifiers.) For example, a violation of the PEs in Internal Subset WFC would be reported using the additional string wfc-PEInInternalSubset. (These are all fatal errors.)
- Validity constraint (VC) violations use vc- followed by the XML ID of the VC being violated, as found in the source to the XML Recommendation. (Any leading vc- is first removed; that source does not have a consistent naming convention for these identifiers.) For example, a violation of the Root Element Type VC would be reported using the additional string vc-roottype. (These are all validity errors.)
So, for example, http://xml.org/sax/exception/xml/rule-66 indicates a violation of grammar rule 66 (a malformed character reference), which is a fatal error. And http://xml.org/sax/exception/xml/vc-one-id-per-el indicates a violation of the One ID per Element Type validity constraint, which is a validity error reflecting a bug in the document's DTD.
That list is subject to evolution for several reasons, such as ambiguity about which rule-* code applies to a given syntax error. Reports of other errors, such as violations of VCs or WFCs, have no corresponding ambiguity issues.
However, potential changes to the XML source of the XML Recommendation which change those identifiers will not change that list. Such cases would be addressed by applying these rules to a base revision of that specification, and assigning IDs manually for such problem cases.
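The naming rules for XML 1.0 errors are mechanical enough to capture in a few helpers. This sketch (hypothetical class and method names; the caller supplies the XML ID exactly as it appears in the Recommendation's source, which is not checked here) builds IDs and reports whether an ID denotes a fatal error:

```java
public class XmlExceptionIds {
    static final String XML_BASE = "http://xml.org/sax/exception/xml/";

    /** Grammar rule violations: rule- plus the production number (fatal). */
    static String rule(int ruleNumber) {
        return XML_BASE + "rule-" + ruleNumber;
    }

    /** WFC violations: any leading "wfc-" or "wf-" is removed, then "wfc-" is prepended (fatal). */
    static String wfc(String specId) {
        String id = specId.startsWith("wfc-") ? specId.substring("wfc-".length())
                  : specId.startsWith("wf-") ? specId.substring("wf-".length())
                  : specId;
        return XML_BASE + "wfc-" + id;
    }

    /** VC violations: any leading "vc-" is removed, then "vc-" is prepended (validity error). */
    static String vc(String specId) {
        String id = specId.startsWith("vc-") ? specId.substring("vc-".length()) : specId;
        return XML_BASE + "vc-" + id;
    }

    /** Grammar rule and WFC violations are fatal; VC violations are not. */
    static boolean isFatal(String exceptionId) {
        return exceptionId.startsWith(XML_BASE + "rule-") || exceptionId.startsWith(XML_BASE + "wfc-");
    }

    public static void main(String[] args) {
        System.out.println(rule(66));
        System.out.println(vc("one-id-per-el") + " fatal? " + isFatal(vc("one-id-per-el")));
    }
}
```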
These IDs are derived from the current XML Namespaces recommendation. The IDs start with the exception base URI, then append xmlns/ and an additional string that more specifically identifies the rule being violated. Those additional strings are defined as follows:
- Namespace constraint (NSC) violations use nsc- followed by the XML ID of the NSC being violated, as found in the source to the Namespaces in XML Recommendation. (Any leading nsc- is first removed.) For example, a violation of the Prefix Declared NSC would be reported using the additional string nsc-NSDeclared.
- Errors in qualified-name syntax use qname.

Because these are (necessarily) not defined by the XML Recommendation, much less as fatal errors, and they are clearly not just warnings, SAX parsers report these at the same level of severity they use for reporting validity errors.

So, for example, http://xml.org/sax/exception/xmlns/qname might indicate a local name that begins with a digit, which is a nonfatal error.
That list is subject to evolution, although it has substantially less need to evolve than the corresponding list for XML 1.0 errors, since there are only two NSCs and one other conformance constraint.
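Taken together, these rules let a client map an exception ID to the severity SAX assigns it: rule- and wfc- IDs are fatal, while vc- IDs and all xmlns/ IDs are reported at validity-error severity. A sketch with hypothetical names; anything outside these prefixes (a nonstandard base URI, or an ID defined later) is reported as UNKNOWN rather than guessed:

```java
public class SaxExceptionSeverity {
    static final String BASE = "http://xml.org/sax/exception/";
    static final String XMLNS_BASE = BASE + "xmlns/";

    enum Severity { FATAL, ERROR, UNKNOWN }

    /** nsc- followed by the NSC's XML ID, after removing any leading "nsc-". */
    static String nsc(String specId) {
        String id = specId.startsWith("nsc-") ? specId.substring("nsc-".length()) : specId;
        return XMLNS_BASE + "nsc-" + id;
    }

    static String qname() {
        return XMLNS_BASE + "qname";
    }

    static Severity severityOf(String exceptionId) {
        if (exceptionId.startsWith(BASE + "xml/rule-") || exceptionId.startsWith(BASE + "xml/wfc-")) {
            return Severity.FATAL; // grammar rules and WFCs
        }
        if (exceptionId.startsWith(BASE + "xml/vc-") || exceptionId.startsWith(XMLNS_BASE)) {
            return Severity.ERROR; // validity errors, and namespace errors at the same level
        }
        return Severity.UNKNOWN; // not covered by the lists above
    }

    public static void main(String[] args) {
        System.out.println(nsc("nsc-NSDeclared") + " -> " + severityOf(nsc("NSDeclared")));
        System.out.println(qname() + " -> " + severityOf(qname()));
    }
}
```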