API¶
- class EPUB(
- file: IO[bytes] | str | Path | None = None,
- generator_tag: bool = True,
Bases: object
The main class for reading, writing, and manipulating EPUB files.
Typical usage example:
>>> with EPUB() as book:
...     book.metadata.title = "New EPUB"
...     print(book)
...     book.write(folder / "new.epub")
EPUB(title='New EPUB')
- Parameters:
file – A file-like object, path to a zip file, or path to a folder representing an ‘unzipped’ EPUB. If None, creates a new empty EPUB.
generator_tag – Whether to add a generator tag to the metadata, recording that epublib was used to create or edit the EPUB. Defaults to True.
- Raises:
NotEPUBError – If the provided file is not a zip file, a file object, or a valid path to a zip file or folder.
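As a rough illustration of the kinds of input listed under Parameters, the sketch below tells them apart using only the standard library. It is not epublib's implementation: `classify_source` is a hypothetical helper, and the real constructor may apply further checks before raising NotEPUBError.

```python
from __future__ import annotations

import io
import tempfile
import zipfile
from pathlib import Path
from typing import IO


def classify_source(file: IO[bytes] | str | Path | None) -> str:
    """Hypothetical helper: name the kind of input the constructor was given."""
    if file is None:
        return "new"  # no input: start from an empty book
    if isinstance(file, (str, Path)):
        path = Path(file)
        if path.is_dir():
            return "folder"  # an 'unzipped' EPUB
        if path.is_file() and zipfile.is_zipfile(path):
            return "zip"
        return "invalid"  # the situation in which NotEPUBError is documented
    # Anything else is treated as a binary file-like object.
    return "zip" if zipfile.is_zipfile(file) else "invalid"


with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    archive = folder / "book.epub"
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr("mimetype", "application/epub+zip")

    print(classify_source(None))
    print(classify_source(folder))
    print(classify_source(archive))
    print(classify_source(io.BytesIO(b"not a zip")))
```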
- property manifest: BookManifest¶
The manifest of this EPUB. An alias to package_document.manifest.
>>> with EPUB(sample) as book:
...     item = book.manifest.items[0]
...     item = book.manifest["Text/chapter1.xhtml"]
...     item = book.manifest.get(EPUBId("chapter1"))
...     print(item.id, item.filename, item.media_type)
chapter1 Text/chapter1.xhtml application/xhtml+xml
- property metadata: BookMetadata¶
The metadata of this EPUB. An alias to package_document.metadata.
>>> book = EPUB(sample)
>>> book.title = "A sample EPUB"
>>> book.title
'A sample EPUB'
>>> book.language = "es"
>>> book.language
'es'
>>> from datetime import datetime
>>> book.metadata.modified = datetime(2025, 7, 10, 0, 0, 0)
>>> book.metadata.get("dcterms:modified")
GenericMetadataItem(name='dcterms:modified', value='2025-07-10...', ...)
>>> book.close()
- property spine: BookSpine¶
The spine of this EPUB. An alias to package_document.spine.
>>> with EPUB(sample) as book:
...     item = book.spine.items[0]
...     item = book.spine["chapter1"]
...     print(item.idref)
chapter1
- property guide: BookGuide | None¶
The guide (a legacy feature of EPUB 2 files) of this EPUB, or None if the EPUB has no guide. An alias to package_document.guide.
The navigation document of this EPUB. Equivalent to book.resources.get(book.manifest.nav).
- Raises:
EPUBError – If no navigation document is found in the EPUB.
- write_to_sink(out: SinkProtocol) None¶
Write this epub to a sink (any object implementing the SinkProtocol).
- Parameters:
out – The sink to write the EPUB to.
- Raises:
ClosedEPUBError – If the EPUB is already closed.
- write(output_file: IO[bytes] | str | Path) None¶
Write this epub to a file or to a file object.
- Parameters:
output_file – The path to the output file, or a file-like object to write the EPUB to.
- Raises:
ClosedEPUBError – If the EPUB is already closed.
- write_to_folder(folder: str | Path) None¶
Write this epub to a folder (creating an ‘unzipped’ EPUB).
- Parameters:
folder – The path to the output folder.
- Raises:
ClosedEPUBError – If the EPUB is already closed.
- property documents: ContentDocumentManager¶
Manage all content documents (XHTML or SVG) in this EPUB.
- property images: ImagesManager¶
Manage all image resources in this EPUB.
- property scripts: ScriptsManager¶
Manage all JavaScript resources in this EPUB.
- property styles: StylesManager¶
Manage all CSS resources in this EPUB.
- property fonts: FontsManager¶
Manage all font resources in this EPUB.
- property audios: AudioManager¶
Manage all audio resources in this EPUB.
- property videos: VideoManager¶
Manage all video resources in this EPUB.
- property publication_resources: PublicationResourceManager¶
Manage all publication resources in this EPUB.
- rename_id( ) None¶
Rename a manifest identifier, updating references to it in the spine items, the cover-image metadata tag, and the toc attribute of the spine element.
Caution is advised, as there may be other references to the old id that will become outdated.
- Parameters:
old – The old identifier, or the resource whose identifier to rename.
new – The new identifier.
- Raises:
EPUBError – If the old identifier does not exist, or if the new identifier already exists.
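To make the three kinds of reference concrete, here is a minimal sketch that performs the same updates on a small package document with `xml.etree.ElementTree`. It is an illustration, not epublib's code: `rename_manifest_id` is a hypothetical function, the sample package is invented, and the sketch assumes the cover-image metadata tag is the `<meta name="cover" content="..."/>` form.

```python
import xml.etree.ElementTree as ET

OPF_NS = "http://www.idpf.org/2007/opf"
NS = {"opf": OPF_NS}

# Invented sample package document, used for illustration only.
PACKAGE = f"""\
<package xmlns="{OPF_NS}" version="3.0">
  <metadata>
    <meta name="cover" content="img1"/>
  </metadata>
  <manifest>
    <item id="img1" href="Images/cover.jpg" media-type="image/jpeg"/>
    <item id="ncx" href="toc.ncx" media-type="application/x-dtbncx+xml"/>
    <item id="chapter1" href="Text/chapter1.xhtml" media-type="application/xhtml+xml"/>
  </manifest>
  <spine toc="ncx">
    <itemref idref="chapter1"/>
  </spine>
</package>
"""


def rename_manifest_id(root: ET.Element, old: str, new: str) -> None:
    """Hypothetical stand-in for the renaming steps described above."""
    items = root.findall("opf:manifest/opf:item", NS)
    ids = {item.get("id") for item in items}
    if old not in ids or new in ids:
        raise ValueError(f"cannot rename {old!r} to {new!r}")
    for item in items:
        if item.get("id") == old:
            item.set("id", new)
    # Spine items refer to manifest items through their idref attribute.
    for itemref in root.findall("opf:spine/opf:itemref", NS):
        if itemref.get("idref") == old:
            itemref.set("idref", new)
    # Assumed form of the cover tag: <meta name="cover" content="ID"/>.
    for meta in root.findall("opf:metadata/opf:meta", NS):
        if meta.get("name") == "cover" and meta.get("content") == old:
            meta.set("content", new)
    # The toc attribute of the spine element points at the NCX manifest item.
    spine = root.find("opf:spine", NS)
    if spine is not None and spine.get("toc") == old:
        spine.set("toc", new)


root = ET.fromstring(PACKAGE)
rename_manifest_id(root, "chapter1", "intro")
rename_manifest_id(root, "ncx", "toc")
rename_manifest_id(root, "img1", "cover")
print(ET.tostring(root, encoding="unicode"))
```

References outside the package document, such as fragment links inside content documents, are not touched by a pass like this, which is why the caution above applies.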
- get_spine_item(
- resource: Resource | ResourceIdentifier,
Get spine item associated with a resource or filename.
>>> with EPUB(sample) as book:
...     item = book.get_spine_item("Text/chapter1.xhtml")
...     print(item.idref)
chapter1
- Parameters:
resource – The resource, its filename, its id or its manifest item to look for in the spine.
- Returns:
The spine item if found, None otherwise.
- get_spine_position(
- resource: Resource | ResourceIdentifier,
Get the 0-indexed position of a resource in the spine.
>>> with EPUB(sample) as book:
...     position = book.get_spine_position("Text/chapter1.xhtml")
...     print(position)
0
- Parameters:
resource – The resource (or its filename, its id or its manifest item) whose position in the spine is to be found.
- Returns:
The 0-indexed position of the resource in the spine, or None if the resource is not in the spine.
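The lookup goes from a filename to a manifest id and from that id to a spine position. The sketch below mimics this with plain Python containers. The `manifest` and `spine` variables are hypothetical stand-ins, not epublib objects.

```python
from __future__ import annotations

# Hypothetical stand-ins: a manifest mapping filenames to ids, and a spine
# listing manifest ids in reading order.
manifest = {
    "Text/cover.xhtml": "cover",
    "Text/chapter1.xhtml": "chapter1",
    "Text/chapter2.xhtml": "chapter2",
    "Styles/main.css": "css",
}
spine = ["cover", "chapter1", "chapter2"]


def spine_position(filename: str) -> int | None:
    """Return the 0-indexed spine position of a file, or None if absent."""
    item_id = manifest.get(filename)
    if item_id is None or item_id not in spine:
        return None
    return spine.index(item_id)


print(spine_position("Text/chapter1.xhtml"))  # 1
print(spine_position("Styles/main.css"))      # None
```

Note that a resource can be in the manifest without being in the spine, as the stylesheet is here, so callers should handle the None case.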
- update_manifest_properties() None¶
Update manifest properties by detecting them from the resources. See https://www.w3.org/TR/epub-33/#sec-item-resource-properties.
- reset_toc(
- targets_selector: str | None = 'h1, h2, h3, h4, h5, h6',
- include_filenames: bool = False,
- spine_only: bool = True,
- reset_ncx: bool | None = None,
- resource_class: type[Resource] = ContentDocument,
- title: str | None = None,
Reset the table of contents in the navigation document by detecting targets in content documents. May replace any existing TOC.
- Parameters:
targets_selector – A CSS selector to detect targets in content documents. If None, all headings will be used.
include_filenames – Whether to include filenames in the TOC.
spine_only – Whether to only include documents in the spine. This ensures the TOC is in reading order.
reset_ncx – Whether to also reset the NCX file. If None (default), will reset the NCX only if an NCX already exists.
resource_class – The class of resources to consider when searching for references for the TOC. Defaults to ContentDocument, which includes XHTML and SVG documents.
title – The title to use for the TOC. If None, the existing title is kept if there is one; otherwise the title is left empty. Caution is advised, as an empty TOC title is not conformant with the EPUB spec.
- Raises:
EPUBError – If reset_ncx is True but the book has no NCX file.
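The heading detection described above can be pictured with a small standard-library sketch. It only shows the idea of scanning documents in reading order and collecting headings as TOC candidates. `HeadingCollector` and the `documents` mapping are hypothetical, and a fixed set of heading tag names stands in for a real CSS selector engine.

```python
from __future__ import annotations

from html.parser import HTMLParser

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class HeadingCollector(HTMLParser):
    """Collect (level, id, text) for every heading in a document."""

    def __init__(self) -> None:
        super().__init__()
        self.headings: list[tuple[int, str | None, str]] = []
        self._current: tuple[int, str | None] | None = None
        self._text: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in HEADINGS:
            self._current = (int(tag[1]), dict(attrs).get("id"))
            self._text = []

    def handle_data(self, data):
        if self._current is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag in HEADINGS and self._current is not None:
            level, anchor = self._current
            self.headings.append((level, anchor, "".join(self._text).strip()))
            self._current = None


# Hypothetical content documents, listed in spine (reading) order.
documents = {
    "Text/chapter1.xhtml": '<h1 id="c1">Chapter 1</h1><p>...</p><h2 id="c1-s1">A section</h2>',
    "Text/chapter2.xhtml": '<h1 id="c2">Chapter 2</h1>',
}

toc = []
for filename, markup in documents.items():  # dicts keep insertion order
    collector = HeadingCollector()
    collector.feed(markup)
    collector.close()
    for level, anchor, text in collector.headings:
        href = f"{filename}#{anchor}" if anchor else filename
        toc.append((level, href, text))

for entry in toc:
    print(entry)
```

Iterating in spine order is what keeps the resulting entries in reading order, which is the reason given for the spine_only default.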
- reset_page_list(
- id_format: str = 'page_{page}',
- label_format: str = '{page}',
- pagebreak_selector: str = '[role="doc-pagebreak"], [epub|type="pagebreak"]',
- reset_ncx: bool | None = None,
Reset the page list in the navigation document by detecting pagebreaks in content documents. Will replace any existing page list.
- Parameters:
id_format – A format string to generate the id of each pagebreak. The string must contain a ‘{page}’ placeholder, which will be replaced with the page number (starting at 1).
label_format – A format string to generate the label of each pagebreak. The string must contain a ‘{page}’ placeholder, which will be replaced with the page number (starting at 1).
pagebreak_selector – A CSS selector to detect pagebreaks in content documents. Defaults to '[role="doc-pagebreak"], [epub|type="pagebreak"]'.
reset_ncx – Whether to also reset the NCX file. If None (default), will reset the NCX only if an NCX file already exists.
- Raises:
EPUBError – If reset_ncx is True but the book has no NCX file.
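The two format strings behave like ordinary Python format strings with a single `{page}` field. The sketch below shows the ids and labels that the default values would produce for three pagebreaks. That the substitution is done with `str.format` is an assumption made for illustration.

```python
# Default values taken from the signature above.
id_format = "page_{page}"
label_format = "{page}"

# Suppose three pagebreak elements were found, in reading order.
# Page numbers start at 1.
entries = [
    (id_format.format(page=n), label_format.format(page=n))
    for n in range(1, 4)
]
print(entries)  # [('page_1', '1'), ('page_2', '2'), ('page_3', '3')]

# A custom label format only needs to keep the {page} placeholder.
custom_label = "p. {page}".format(page=2)
print(custom_label)  # p. 2
```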
- create_page_list(
- id_format: str = 'page_{page}',
- label_format: str = '{page}',
- pagebreak_selector: str = '[role="doc-pagebreak"], [epub|type="pagebreak"]',
- reset_ncx: bool | None = None,
Create new page list in the navigation document by detecting pagebreaks in content documents. Will raise an error if a page list already exists.
- Parameters:
id_format – A format string to generate the id of each pagebreak. The string must contain a ‘{page}’ placeholder, which will be replaced with the page number (starting at 1).
label_format – A format string to generate the label of each pagebreak. The string must contain a ‘{page}’ placeholder, which will be replaced with the page number (starting at 1).
pagebreak_selector – A CSS selector to detect pagebreaks in content documents. Defaults to '[role="doc-pagebreak"], [epub|type="pagebreak"]'.
reset_ncx – Whether to also reset the NCX file. If None (default), will reset the NCX only if an NCX file already exists.
- Raises:
EPUBError – If a page list already exists.
- reset_landmarks(
- include_toc: bool = True,
- targets_selector: str | None = None,
- default_epub_type: str = 'chapter',
Reset the landmarks in the navigation document by detecting targets in content documents, and optionally including the TOC. Will replace existing landmarks.
- Parameters:
include_toc – Whether to include the TOC in the landmarks.
targets_selector – A CSS selector to detect targets in resources.
- create_landmarks(
- include_toc: bool = True,
- targets_selector: str | None = None,
- default_epub_type: str = 'chapter',
Create landmarks in the navigation document by detecting targets in content documents, and optionally including the TOC. Will raise an error if landmarks already exist.
- Parameters:
include_toc – Whether to include the TOC in the landmarks.
targets_selector – A CSS selector to detect targets in resources.
- generate_ncx(
- filename: str | Path | None = None,
Generate a new NCX file based on the book metadata and navigation document, and add it to the EPUB. Will raise an error if an NCX file already exists.
- Parameters:
filename – The filename to use for the NCX file. If None, will use ‘toc.ncx’ in the same directory as the package document.
- Raises:
EPUBError – If an NCX file already exists (try reset_ncx instead), or if the book metadata does not contain a title.
- reset_ncx() NCXFile¶
Reset the contents of the NCX file based on the book metadata and navigation document. If no NCX file exists, will generate a new one named ‘toc.ncx’ in the same directory as the package document.
- Raises:
EPUBError – If the book metadata does not contain a title.
- select(
- selector: str,
Select elements matching a CSS selector in all content documents.
- Parameters:
selector – A CSS selector to match elements.
- Yields:
Tuples (resource, tag), where tag is the matched element and resource is the content document containing it.
- select_one(
- selector: str,
Select the first element matching a CSS selector across all content documents.
- Parameters:
selector – A CSS selector to match elements.
- Returns:
A tuple (resource, tag), where tag is the matched element and resource is the content document containing it. If no element matches the selector, returns (None, None).
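The difference between the generator form and the first-match form can be sketched with standard-library tools. This is an illustration of the return contracts only: the module-level `select` and `select_one` functions below are hypothetical, they key documents by filename rather than by resource object, and a plain tag-name match via `xml.etree.ElementTree` is swapped in for CSS selectors.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from collections.abc import Iterator

# Hypothetical content documents, keyed by filename.
documents = {
    "Text/chapter1.xhtml": ET.fromstring("<body><h1>One</h1><p>a</p><p>b</p></body>"),
    "Text/chapter2.xhtml": ET.fromstring("<body><h1>Two</h1><p>c</p></body>"),
}


def select(tag_name: str) -> Iterator[tuple[str, ET.Element]]:
    """Lazily yield (filename, element) pairs across all documents."""
    for filename, root in documents.items():
        for element in root.iter(tag_name):
            yield filename, element


def select_one(tag_name: str) -> tuple[str, ET.Element] | tuple[None, None]:
    """Return the first match, or (None, None) when nothing matches."""
    return next(select(tag_name), (None, None))


for filename, element in select("p"):
    print(filename, element.text)

resource, tag = select_one("table")
print(resource, tag)  # None None
```

Returning a pair of None values, rather than a single None, lets callers always unpack the result into two names.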
- select_tag(
- selector: str,
Select elements matching a CSS selector in all content documents.
- Parameters:
selector – A CSS selector to match elements.
- Yields:
The matched tags.
- select_one_tag(selector: str) Tag | None¶
Select the first element matching a CSS selector across all content documents.
- Parameters:
selector – A CSS selector to match elements.
- Returns:
The first matched tag. If no element matches the selector, returns None.
- property base_dir: Path¶
Returns the base directory for the resources in this EPUB. The spec does not define this notion, and an EPUB may have more than one base directory. The one returned here is the directory containing the package document.
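In an EPUB container, the location of the package document is recorded in META-INF/container.xml, so the directory containing it can be derived as in the sketch below. This is an illustration with an invented container file. Whether epublib computes base_dir this way is an assumption.

```python
import xml.etree.ElementTree as ET
from pathlib import PurePosixPath

# Invented sample of META-INF/container.xml.
CONTAINER = """\
<?xml version="1.0"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""

NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}

root = ET.fromstring(CONTAINER)
rootfile = root.find("c:rootfiles/c:rootfile", NS)

# Paths inside the container always use forward slashes.
package_path = PurePosixPath(rootfile.get("full-path"))
base_dir = package_path.parent
print(base_dir)  # OEBPS
```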
- add_generator_tag() None¶
Add a generator meta tag to the metadata, containing the epublib version used to edit or generate this EPUB. If such a tag already exists and its version is up to date, does nothing.
- remove_generator_tag() None¶
Remove the epublib generator tag from the metadata, if any.
- close() None¶
Close the EPUB and its underlying resources.
- property closed: bool¶
Check if the EPUB is closed.
Subpackages¶
- epublib.nav package
- epublib.ncx package
- epublib.package package
- epublib.resources package
Submodules¶
- epublib.create module
- epublib.css module
- epublib.exceptions module
- epublib.identifier module
- epublib.media_type module
- epublib.parse module
- epublib.soup module
- epublib.source module
- epublib.types module
- epublib.util module
  - normalize_path()
  - get_absolute_href()
  - get_relative_href()
  - parse_int()
  - tag_ids()
  - new_id()
  - new_id_in_tag()
  - split_fragment()
  - strip_fragment()
  - get_fragment()
  - slugify()
  - ResolutionType
  - attr_to_str()
  - get_actual_tag_position()
  - get_attributes()
  - datetime_to_str()
  - get_epublib_version()
  - strip_type_parameters()
  - remove_optional_type()
- epublib.xml_element module