Opf Files

An EPUB 3 archive has a skeleton – the files which are mandatory to structure the content – and some flesh – the ebook content.

  • Apr 15, 2018. Step 3: Create an OPF file (your-book.opf). The Open Packaging Format (OPF) file is an XML file that contains metadata describing your book. Copy the example OPF content below, and save it in the same directory as your HTML content file. The part of the file name must be the same as your HTML content file.
  • There are five parts of the OPF file, all of which are required for validation. The XML doc type is the header of the document. The first line of the doc type tells the rendering engine reading the file that it is XML, and the second tells it where to find the rules for this particular file type.
  • The .opf file extension covers more than one file type. First, it can be an Open Packaging Format file, i.e. a file formatted according to the Open Packaging Format standard.

As explained in the introduction, the flesh is a set of (X)HTML 5 and CSS resources, JavaScript code, images, and audio and video assets. Let’s focus on the skeleton then, which is specified as the EPUB Open Container Format (OCF).

Mime type

Note that the .opf extension is not specific to EPUB. It is also associated with the Flip Album file type from E-Book Systems, which needs suitable software such as Flip Album to be opened. Without such software, the operating system shows a message like 'How do you want to open this file?' (Windows 10) or 'Windows cannot open this file' (Windows 7), or a similar alert on Mac, iPhone or Android.

The first file you may spot in the zip archive is the mimetype file, which states that this archive is *really* an EPUB publication.

The content of this text file must be “application/epub+zip” and nothing else. This is how an EPUB reading system is assured that it can process the ebook.

The mimetype file must be the first file in the zip archive. This may seem a weird constraint, but it originates in the OpenDocument file format, which was the source of the EPUB container format. The rationale is that an application reading the first bytes of the archive will always find the same “magic numbers” (see here for OCF magic numbers). For this to work, the mimetype file must also be stored uncompressed. This signature can then be used to detect the format when the .epub file extension is missing or unreliable.
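The check on the first bytes can be sketched in Python as follows. This is a minimal illustration, not a validator: the function name is hypothetical, and the byte offsets assume the usual 30-byte zip local file header with no extra field, followed by the 8-byte entry name and the uncompressed content.

```python
EPUB_MIMETYPE = b"application/epub+zip"  # 20 bytes


def looks_like_epub(first_bytes: bytes) -> bool:
    """Return True if the leading bytes carry the OCF signature.

    Hypothetical helper: it only inspects the first 58 bytes and does
    not validate the rest of the archive.
    """
    return (
        first_bytes[0:4] == b"PK\x03\x04"        # zip local file header
        and first_bytes[30:38] == b"mimetype"    # name of the first entry
        and first_bytes[38:58] == EPUB_MIMETYPE  # its uncompressed content
    )
```

A caller would typically read the first 58 bytes of a file opened in binary mode and pass them to this function, whatever the file extension says.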

A practical issue with this requirement is that a proper EPUB file cannot be created by simply zipping a folder: generic zip tools do not guarantee by default that the mimetype file will come first in the archive, or that it will be stored uncompressed.
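When the archive is built programmatically, the order of entries can be controlled. Here is a minimal sketch using Python's standard zipfile module. The function name and the shape of the resources argument are assumptions made for illustration, and the result is only a container skeleton, not a validated EPUB.

```python
import zipfile


def write_epub(target, resources):
    """Write an EPUB-style zip archive with the mimetype entry first.

    target: a file path or a writable binary file object.
    resources: dict mapping archive names to bytes (hypothetical shape).
    """
    with zipfile.ZipFile(target, "w") as zf:
        # The mimetype entry is written first and stored without
        # compression, so its content appears verbatim at a fixed offset.
        zf.writestr("mimetype", b"application/epub+zip",
                    compress_type=zipfile.ZIP_STORED)
        for name, data in resources.items():
            if name == "mimetype":
                continue  # already written above
            zf.writestr(name, data, compress_type=zipfile.ZIP_DEFLATED)
```

The remaining entries may be compressed freely, since only the first one takes part in format detection.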

container.xml

This small XML file, found in the mandatory META-INF directory, is a bootstrapping item. It simply contains the relative location of the .opf file (a.k.a. package document), which is the brain of the publication and will be described shortly.

If the content.opf file is for instance in the folder content in the archive, then the location will be content/content.opf.

Using this information, the reading system is able to open the .opf file and learn more about the structure of the publication.

EPUB 3 supports multiple renditions of an EPUB publication. You may find for instance a fixed-layout rendition and a reflowable rendition packaged in the same EPUB file. In such a case, container.xml will reference several package documents usually placed in several directories.
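The lookup described above can be sketched as follows. The sample container.xml is hypothetical: it declares two renditions, with content/content.opf taken from the example above and fixed/content.opf invented for illustration. The function name is also an assumption.

```python
import xml.etree.ElementTree as ET

# Namespace of the OCF container document.
CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}

# Hypothetical container.xml declaring two renditions.
SAMPLE_CONTAINER = b"""<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="content/content.opf"
              media-type="application/oebps-package+xml"/>
    <rootfile full-path="fixed/content.opf"
              media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""


def package_document_paths(container_xml: bytes) -> list:
    """Return the package document locations, in document order."""
    root = ET.fromstring(container_xml)
    rootfiles = root.findall("c:rootfiles/c:rootfile", CONTAINER_NS)
    return [rootfile.get("full-path") for rootfile in rootfiles]


paths = package_document_paths(SAMPLE_CONTAINER)
default_package = paths[0]  # the first rootfile is the default rendition
```

In a real reading system, the bytes would come from the META-INF/container.xml entry of the archive rather than from a literal.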

Apart from container.xml, you may find other files in the META-INF directory, like signatures.xml, which holds digital signatures of the container and its contents, metadata.xml and manifest.xml, which may contain information about the publication itself (i.e. the container; this is useful in the multiple renditions use case), or proprietary files like com.apple.ibooks.display-options.xml. Their presence is rather exceptional, so we won’t describe them in detail here.

The .opf file, a.k.a. the package document

This XML file carries bibliographic and structural metadata about an EPUB publication (or an EPUB rendition), and is thus the primary source of information about how to process and display that publication.

In this file, the reading system will find:

Some metadata

i.e. information about the publication (or rendition) content. Diverse sets of metadata (e.g. ONIX) can be expressed as XML elements from different schemes. The only required items in EPUB 3.0.1 are title, identifier and language, from the Dublin Core element set, and the last modification date, expressed with the dcterms:modified property. A fixed-layout publication must be flagged by a specific metadata item (the rendition:layout property). Other metadata can be expressed inline, using a generic meta element, or as an external resource via a link element.

A manifest

i.e. the exhaustive list of all publication (or rendition) resources, including (X)HTML text chapters, images, video and audio files, fonts, scripts and CSS files. The reading system will only process the files it finds in this list, and knows from their media type (a.k.a. MIME type) whether it can process them. From the properties declared on each item, the reading system also knows the item's role, e.g. whether the file is the navigation document or the cover image, or whether it contains vector graphics or scripts. If the reading system cannot process a resource because its format is a bit exotic, it will find here the fallback resource it can process instead.

A spine

As its name indicates, this is a “backbone” where the reading system finds the default reading order of all publication “chapters”. As these sections of a publication may not really represent book chapters, each item of the sequence is called … a spine item. Each spine item contains a reference to a manifest item. Spine items can be declared as “non-linear”, meaning that they are not displayed in the normal flow but can be reached from another spine item as supplementary content (e.g. popup content).

Below is a simple package document with metadata, a manifest and a spine.
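As a sketch of what such a file can look like, the short Python script below embeds a minimal package document and reads back its title and reading order. The title, identifier, date and file names are all hypothetical, as are the function names.

```python
import xml.etree.ElementTree as ET

OPF_NS = {
    "opf": "http://www.idpf.org/2007/opf",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical package document: every value below is made up.
SAMPLE_OPF = b"""<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="3.0"
         unique-identifier="pub-id">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:identifier id="pub-id">urn:uuid:00000000-0000-0000-0000-000000000000</dc:identifier>
    <dc:title>Sample Book</dc:title>
    <dc:language>en</dc:language>
    <meta property="dcterms:modified">2018-01-01T00:00:00Z</meta>
  </metadata>
  <manifest>
    <item id="nav" href="nav.xhtml" media-type="application/xhtml+xml" properties="nav"/>
    <item id="cover" href="images/cover.jpg" media-type="image/jpeg" properties="cover-image"/>
    <item id="ch1" href="chapter1.xhtml" media-type="application/xhtml+xml"/>
    <item id="notes" href="notes.xhtml" media-type="application/xhtml+xml"/>
    <item id="css" href="style.css" media-type="text/css"/>
  </manifest>
  <spine>
    <itemref idref="ch1"/>
    <itemref idref="notes" linear="no"/>
  </spine>
</package>
"""


def publication_title(opf: bytes):
    """Return the text of the first dc:title element, or None."""
    root = ET.fromstring(opf)
    return root.findtext("opf:metadata/dc:title", namespaces=OPF_NS)


def reading_order(opf: bytes) -> list:
    """Resolve each spine item to (href, is_linear) through the manifest."""
    root = ET.fromstring(opf)
    manifest = {
        item.get("id"): item
        for item in root.findall("opf:manifest/opf:item", OPF_NS)
    }
    order = []
    for itemref in root.findall("opf:spine/opf:itemref", OPF_NS):
        item = manifest[itemref.get("idref")]
        # The linear attribute defaults to "yes" when absent.
        order.append((item.get("href"), itemref.get("linear", "yes") != "no"))
    return order
```

The hrefs returned here are relative to the location of the package document, so a reading system would resolve them against the path found in container.xml.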

The legacy .ncx file

A quick word about this file, sometimes found in EPUB 3 containers: this is the deprecated EPUB 2 way of declaring a navigation document. Some EPUB 3 authors still prefer to include it so that EPUB 2 reading systems can process the publication. An EPUB 3 reading system will ignore it, so we won’t bother describing its content.

A simple diagram to summarize this

Here is a diagram illustrating the complete structure of an EPUB file.

In this example, we find the .opf file and all content files in a directory named “OEBPS”. Why such a strange name? This is simply historical: Open eBook Publication Structure was the name of a legacy ebook format which has been superseded by the EPUB format. The acronym found its way into the publishing vocabulary and is still used by some EPUB authoring tools when they structure an EPUB publication, so that the .opf file and content files are not stored in the root of the EPUB archive (which would be harmless anyway).

Readium

The Readium projects provide rock-solid, performant building blocks and applications for processing EPUB 3 publications. EDRLab participates in the maintenance and evolution of the Readium codebase.

Accessibility

Support for people with print disabilities is a key part of our mission. We collaborate with European publishers and major inclusion organizations on the creation of a born-accessible ebook market. We also make sure that Readium projects take into account the assistive technologies used by visually impaired users.




