What is USFX?
What is different between USFM and USFX?
Why create yet another Bible file format?
What is the USFX philosophy?
How do USFM, USFX, and OSIS differ?
What are the USFX tags?
Where is the USFX schema?
USFX Schema documentation
Copyright and Permissions
Who should I contact with comments on USFX?
Unified Scripture Format XML (USFX) is derived from USFM, a file format for publishing and interchanging basic Scripture texts in multiple languages. It is XML, and has an XML schema. It is not intended to be used by itself for all aspects of Bible layout and publishing, but just for representation of the Scripture text itself, and a small amount of accompanying material. USFX exists primarily to bring the advantages of XML to USFM, with minimal additional changes. USFX can be quickly and easily converted to and from USFM. Like USFM, USFX is not designed to be used for general books, theological works, dictionaries, or any other type of data. USFX is not intended to totally replace other Bible formats, but since it is XML, it can be converted to other formats with tools like XSLT.
USFX is derived from USFM, but it is not USFM. It can, however, encode everything in a USFM document.
The most obvious difference between USFM and USFX is that USFM is based on backslash codes, but USFX is XML. Another difference is that elements representing things like words of Jesus Christ and Old Testament quotes in the New Testament must be properly closed with their own corresponding XML closing tags. Furthermore, these elements may be nested in any way that can actually happen in Scripture, provided that the resulting XML is well-formed. Such character styles are not assumed to be closed when a verse marker is encountered or when another style starts, unlike the way Paratext interprets USFM.
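As a rough illustration of the closing and nesting rule, the sketch below checks two small fragments for well-formedness with Python's standard XML parser. The element names (p, v, wj, qt) are assumed here to mirror the corresponding USFM markers; consult the USFX schema for the authoritative names and attributes.

```python
import xml.etree.ElementTree as ET

# Hypothetical USFX-style fragment. The element names are assumptions
# modeled on USFM markers (\p, \v, \wj, \qt), not copied from the schema.
fragment = (
    '<p><v id="4"/>But he answered, <wj>It is written, '
    '<qt>Man shall not live by bread alone.</qt></wj></p>'
)

# The same content with the closing tags crossed: not well-formed XML.
bad = (
    '<p><wj>It is written, '
    '<qt>Man shall not live by bread alone.</wj></qt></p>'
)


def is_well_formed(xml_text):
    """Return True if xml_text parses as XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


def element_text(xml_text, tag):
    """Return all text inside the first child element named tag."""
    root = ET.fromstring(xml_text)
    return ''.join(root.find(tag).itertext())


if __name__ == '__main__':
    print(is_well_formed(fragment))
    print(is_well_formed(bad))
    print(element_text(fragment, 'wj'))
```

Because every style has an explicit closing tag, a generic XML parser can recover the full extent of the quoted words, including the nested quotation, without any Scripture-specific rules.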
There is another related format called USX, which was devised after USFX. It is used by Paratext and by the ETEN Digital Bible Library. That format is similar, but not exactly the same. USX can encode everything that USFM can encode and vice versa. USX retains some of the quirks of USFM, like the need for special syntax for nested character styles. It lacks some of the extensions that USFX has, like the ref tag. Haiola can read any of USFM, USX, and USFX, and create both USFM and USFX output. Paratext can create USX from USFM. Therefore, properly encoded Bibles in any one of these three formats can be converted to the other two formats with no loss of Scripture information or formatting, as long as the USFX extensions not found in the other two formats are not used. (Even then, only the extended formatting is lost.)
“Necessity is the mother of invention.” The need for USFX was first felt in the process of converting Scriptures from one “standard” format to another, and in editing some Scriptures in the process of Bible translation work. The first application is to embed a simple XML schema in a Microsoft Word 2003 (or later) XML document that is both easy to work with in Microsoft Word and easy to convert back to USFM. There are several other XML Bible schemas in existence that I'm aware of, but these don’t map very cleanly to USFM. USFX is very easy to convert to and from USFM, because it is based on USFM. It is also much simpler to embed in WordML than complex schemas like OSIS, saving me a great deal of time, and making some applications possible that would otherwise be impossible.
More on the philosophy of USFX and how it compares with other Scripture file formats is available here.
USFM is an attempt to unify the many variations in usage of backslash (\) codes to mark Scripture texts. It is not XML. There are many Scriptures encoded in some form near to this format, mostly for minority languages. USFM is preferable to the many similar, but slightly different, implementations of SFM codes used by different organizations to represent Scriptures, because it is well thought-out, and because it is easier to support one standard way to mark Scripture files with backslash codes than many ways. This makes these files more portable among organizations and branches, and makes software support for these files easier, less error-prone, and less costly. USFM is currently the format that I recommend for practical Bible translation work.
USFX is primarily an expression of what USFM would look like as proper XML instead of a set of backslash codes. Every USFM backslash code has a corresponding USFX XML tag. USFX is more verbose than USFM, as that is the nature of XML, but it is easier to parse with XML software libraries and XSL transformations. Because USFX and USFM are so similar, it is very easy to convert between the two.
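To give a feel for how directly a backslash code can map to an XML tag, here is a toy sketch that rewrites paired USFM character-style markers such as \wj ... \wj* as XML elements of the same name. It is not the conversion used by Haiola or any other real tool: the function name is hypothetical, it assumes each USFX element shares its name with the USFM marker, and it ignores paragraph markers, verse markers, footnotes, and the special USFM syntax for nested styles.

```python
import re
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def char_styles_to_xml(usfm_text):
    """Rewrite paired USFM character styles as same-named XML elements.

    Toy example only: assumes well-paired markers and handles nothing else.
    """
    # Escape &, <, and > so literal text cannot be mistaken for markup.
    text = escape(usfm_text)
    # Closing markers first: \wj* becomes </wj>.
    text = re.sub(r'\\(\w+)\*', r'</\1>', text)
    # Opening markers: \wj followed by a space becomes <wj>.
    text = re.sub(r'\\(\w+) ', r'<\1>', text)
    return text


if __name__ == '__main__':
    usfm = r'He said, \wj Follow me.\wj* And they went.'
    xml_body = char_styles_to_xml(usfm)
    print(xml_body)
    # Wrap in a paragraph element so the result has a single root,
    # then read the styled text back with a generic XML parser.
    root = ET.fromstring('<p>' + xml_body + '</p>')
    print(root.find('wj').text)
```

Once the text is XML, ordinary XML libraries can read the styled span back without any knowledge of backslash codes.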
OSIS is another proposed XML Scripture interchange standard. The OSIS XML schema and documentation view Scriptures differently from USFM and USFX, so a fully automatic and lossless transformation between OSIS and either of those formats is currently not possible. Not only are the metadata sections of OSIS different, but to be fully compliant with the OSIS standard, some punctuation in the Bible text itself must be converted to markup in such a way that it cannot be recovered without language- and style-dependent processing. This conversion is language-dependent and labor-intensive. Because of differences in the kinds of things that are encoded and the ways they are encoded, the current version of OSIS is not suitable for many applications that USFM works well for. However, an improper subset of OSIS that I call Modified OSIS comes close. It is generated by Haiola software from USFX source.
The alias http://eBible.org/usfx.xsd points to the latest schema, which is documented here.
The USFX Schema is copyright © 2005-2014 SIL International, EBT, eBible.org and Michael Paul Johnson. It is released under the GNU Lesser General Public License or the Common Public License, as explained in LICENSING.txt.
Comments on USFX should be directed to Kahunapule Michael Johnson. You may use his secure web contact form or standard web contact form.