epublib.xml_element module

class SyncType(*values)

Bases: Enum

class XMLAttribute(
init_name: str | None = None,
sync: SyncType = SyncType.ATTR,
get: str | Callable[[Tag], Tag | None] | None = None,
create: str | Callable[[BeautifulSoup, Tag], Tag] | None = None,
prefix: str = '',
)

Bases: object

Represents the relation between the attribute of an XML tag and its representation in an object.

This class is used as metadata for dataclass fields, in combination with typing.Annotated.

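>>> from dataclasses import dataclass
>>> from typing import Annotated
>>> from epublib.xml_element import SyncType, XMLAttribute, XMLElement
>>> @dataclass(kw_only=True)
... class MyElement(XMLElement):
...     my_attr: Annotated[str, XMLAttribute(init_name="my-attr", sync=SyncType.ATTR)] = ""

Parameters: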
  • init_name – Name of the attribute in the XML. If None, the name of the dataclass field is used, with underscores replaced by hyphens.

  • sync – How to sync this attribute with the XML tag. One of SyncType.ATTR (sync with a tag attribute), SyncType.STRING (sync with the tag string), or SyncType.NAME (sync with the tag name).

  • get – A tag name or callable to get the relevant tag from the parent tag. If None, the parent tag is used.

  • create – A tag name or callable to create the relevant tag if it does not exist. If None, no tag is created.

  • prefix – The namespace prefix to use when creating a new tag. Only used if create is SyncType.NAME.

class BaseElement(soup: S, tag: ~bs4.element.Tag = <sentinel/>)

Bases: ABC, Generic

Abstract base class for an XML element. Responsible for creating the tag if it does not exist.

Parameters:
  • soup – The BeautifulSoup this object is part of.

  • tag – The existing tag to use. If not provided, a new tag is created.

create_tag() None

Create a new tag for this element.

get_tag_name() str

Return the tag name for this element.

class XMLElement(soup: S, tag: ~bs4.element.Tag = <sentinel/>)

Bases: BaseElement, ABC, Generic

Abstract base class for an XML element. Responsible for syncing object and tag, and exposing important tag attributes as convenient instance attributes.

This class uses dataclass fields annotated with typing.Annotated and XMLAttribute metadata to determine which attributes to sync.

create_tag() None

Create a new tag for this element.

update_tag(name: str, value: AttributeValue | None) None

Update the tag to reflect the current value of the attribute.

Parameters:
  • name – The name of the attribute to update.

  • value – The current value of the attribute.

classmethod from_tag(
soup: S,
tag: Tag,
**kwargs: AttributeValue,
) Self

Create this XMLElement from an existing tag.

Any attributes that are not represented in the tag are passed as keyword arguments.

Parameters:
  • soup – The BeautifulSoup this element is part of.

  • tag – The existing tag to use.

  • **kwargs – Any attributes that are not represented in the tag.

Returns:

An instance of this XMLElement.

attribute_to_str(name: str, value: AttributeValue) str

Convert an attribute of this object to a string suitable for XML serialization.

Parameters:
  • name – The name of the attribute to convert.

  • value – The value of the attribute to convert.

Returns:

The string representation of the attribute.

classmethod str_to_attribute(
value: str | None,
typ: type[AttributeValue],
) AttributeValue | None

Convert a string from an XML attribute to an attribute of this object.

Parameters:
  • value – The string value to convert.

  • typ – The type to convert the string to.

Returns:

An instance of the specified type.

class HrefElement(
soup: S,
tag: ~bs4.element.Tag = <sentinel/>,
*,
filename: str,
href: ~typing.Annotated[str,
~epublib.xml_element.XMLAttribute(init_name=None,
sync=<SyncType.ATTR: 1>,
get=None,
create=None,
prefix='')] = '',
own_filename: str,
)

Bases: XMLElement, ABC, Generic

XMLElement with a reference to a file. This class handles the logic of syncing the ‘href’ (relative filename) and ‘filename’ (absolute filename).

Parameters:
  • soup – The BeautifulSoup object this element belongs to.

  • filename – The absolute filename this element refers to. If not provided, it is derived from href and own_filename. One of href or filename must be provided.

  • href – The relative filename this element refers to. If not provided, it is derived from filename and own_filename. One of href or filename must be provided.

  • own_filename – The absolute filename of the file this element is part of.
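
The relation between href, filename and own_filename can be illustrated with posixpath. This is a minimal sketch that assumes "absolute" means relative to the root of the EPUB archive and that hrefs resolve against the directory of own_filename; href_to_filename and filename_to_href are hypothetical helpers, not epublib functions.

```python
import posixpath


def href_to_filename(href: str, own_filename: str) -> str:
    """Resolve a relative href against the directory of the containing file."""
    base_dir = posixpath.dirname(own_filename)
    return posixpath.normpath(posixpath.join(base_dir, href))


def filename_to_href(filename: str, own_filename: str) -> str:
    """Express an archive-rooted filename relative to the containing file."""
    base_dir = posixpath.dirname(own_filename) or "."
    return posixpath.relpath(filename, start=base_dir)


filename = href_to_filename("../images/cover.png", "OEBPS/text/ch1.xhtml")
href = filename_to_href("OEBPS/images/cover.png", "OEBPS/text/ch1.xhtml")
```

Under these assumptions the two helpers are inverses of each other, which is why providing either href or filename is enough.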

classmethod from_tag(
soup: S,
tag: Tag,
own_filename: str,
**kwargs: AttributeValue,
) Self

Create this XMLElement from an existing tag.

Any attributes that are not represented in the tag are passed as keyword arguments.

Parameters:
  • soup – The BeautifulSoup this element is part of.

  • tag – The existing tag to use.

  • own_filename – The absolute filename of the file this element is part of.

  • **kwargs – Any attributes that are not represented in the tag.

Returns:

An instance of this XMLElement.

class XMLChildProtocol(*args, **kwargs)

Bases: Protocol

property pk: str

A primary key that uniquely identifies this element. Used by parent to find elements.

class XMLParent(soup: S, tag: ~bs4.element.Tag = <sentinel/>)

Bases: BaseElement, ABC, Generic

Abstract base class for an XML element that contains other XML elements.

Parameters:
  • soup – The BeautifulSoup this object is part of.

  • tag – The existing tag to use. If not provided, a new tag is created.

get_child_tags() Iterable[Tag]

Return the tags of the children of this element.

parse_items() Sequence

Parse child items from self.tag and return their representations in a list.

Returns:

A sequence of child items.

property parent_tag: Tag | None

Return the parent tag of this element (i.e. the one whose direct descendants are the children of this element) or None if it does not exist.

create_parent_tag() Tag

Return the parent tag of this element (i.e. the one whose direct descendants are the children of this element), creating it if it does not exist.

add_item(item: I) I

Add an item to this element.

Parameters:

item – The item to add.

Returns:

The added item.

insert_item(position: int | None, item: I) I

Insert an item at the specified position.

Parameters:
  • position – The position to insert the item at. If None, the item is added at the end.

  • item – The item to insert.

Returns:

The inserted item.

remove_item(item: I) None

Remove an item from this element.

Parameters:

item – The item to remove.

insert(
position: int | None,
**kwargs: AttributeValue | None,
) I

Create and insert a child item at the specified position.

Parameters:
  • position – The position to insert the item at. If None, the item is added at the end.

  • **kwargs – Attributes to pass to the child item constructor.

Returns:

The newly created item.

add(**kwargs: AttributeValue | None) I

Create and add a child item.

Parameters:

**kwargs – Attributes to pass to the child item constructor.

Returns:

The newly created item.

remove(pk: str) None

Remove an item from this element, if it exists.

Parameters:

pk – The primary key of the item to remove.

get_new_id(
base: Path | str | EPUBId,
) EPUBId

Generate a new unique ID for this element based on the given base.

class HrefChildProtocol(*args, **kwargs)

Bases: XMLChildProtocol, Protocol

class ParentOfHref(
soup: S,
tag: ~bs4.element.Tag = <sentinel/>,
*,
own_filename: str,
)

Bases: XMLParent, ABC, Generic

An XML element that contains other XML elements that have hrefs.

remove(
filename: str | Path,
ignore_fragment: bool = True,
) None

Remove an item from this element, if it exists.

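Parameters:
  • filename – The filename referenced by the item to remove.

  • ignore_fragment – Whether to ignore the fragment part of the filename when matching.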

class ParentProtocol(*args, **kwargs)

Bases: Protocol

class RecursiveChildProtocol(*args, **kwargs)

Bases: XMLChildProtocol, Protocol

class RecursiveParent(soup: S, tag: ~bs4.element.Tag = <sentinel/>)

Bases: XMLParent, ABC, Generic

An XML element whose child type is recursive (can contain itself as elements).

class RecursiveHrefChildProtocol(*args, **kwargs)

Bases: RecursiveChildProtocol, HrefChildProtocol, Protocol

An XML element whose child type is recursive and has hrefs.

class HrefRoot(
soup: S,
tag: ~bs4.element.Tag = <sentinel/>,
*,
own_filename: str,
)

Bases: RecursiveParent, ParentOfHref, ABC, Generic

Root of a tree of HrefElements.

items_referencing(
filename: str,
ignore_fragment: bool = False,
) Generator[Self | I]

Yield all items in this element that reference the given filename.

Parameters:
  • filename – The filename to search for.

  • ignore_fragment – Whether to ignore the fragment part of the searched filenames.

Yields:

Items that reference the given filename.
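
The effect of ignore_fragment can be shown with urllib.parse.urldefrag, which splits off the part after "#". This is a sketch of the comparison only; references is a hypothetical helper, and epublib may compare filenames differently.

```python
from urllib.parse import urldefrag


def references(item_filename: str, searched: str, ignore_fragment: bool = False) -> bool:
    """Return True if item_filename points at the searched filename."""
    if ignore_fragment:
        # Compare only the part before "#".
        return urldefrag(item_filename).url == urldefrag(searched).url
    return item_filename == searched


exact = references("OEBPS/ch1.xhtml#sec2", "OEBPS/ch1.xhtml")
loose = references("OEBPS/ch1.xhtml#sec2", "OEBPS/ch1.xhtml", ignore_fragment=True)
```

With the default ignore_fragment=False, a link to a section inside a file does not match a search for the bare file.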

property nodes: Generator[I | Self]

Yields all nodes in the tree (not including the root).

remove_nodes(
filename: Path | str,
ignore_fragments: bool = True,
) None

Remove all nodes in the tree that reference the given filename. If a parent node is removed but not its children, they are added to the parent of the removed node.

Parameters:
  • filename – The filename to search for.

  • ignore_fragments – Whether to ignore the fragment part of the searched filenames.
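
The re-parenting rule described above can be sketched on a plain tree. Node and this remove_nodes function are hypothetical stand-ins that match on the exact filename and leave out fragments and tag handling.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    filename: str
    children: list["Node"] = field(default_factory=list)


def remove_nodes(parent: Node, filename: str) -> None:
    """Remove matching descendants, promoting their surviving children."""
    kept: list[Node] = []
    for child in parent.children:
        remove_nodes(child, filename)  # prune the subtree first
        if child.filename == filename:
            # The child goes away; its remaining children take its place.
            kept.extend(child.children)
        else:
            kept.append(child)
    parent.children = kept


root = Node("root", [
    Node("a.xhtml", [Node("b.xhtml"), Node("a.xhtml", [Node("c.xhtml")])]),
    Node("d.xhtml"),
])
remove_nodes(root, "a.xhtml")
remaining = [child.filename for child in root.children]
```

Both a.xhtml nodes are removed, and b.xhtml and c.xhtml end up as direct children of the root, ahead of d.xhtml.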

class HrefRecursiveElement(
soup: S,
tag: ~bs4.element.Tag = <sentinel/>,
*,
filename: str,
href: ~typing.Annotated[str,
~epublib.xml_element.XMLAttribute(init_name=None,
sync=<SyncType.ATTR: 1>,
get=None,
create=None,
prefix='')] = '',
own_filename: str,
parent: ~epublib.xml_element.ParentProtocol | None = None,
)

Bases: HrefRoot, HrefElement, ABC, Generic

Node of a tree of HrefElements.

property nodes: Generator[I | Self]

Yields all nodes in the tree.

items_referencing(
filename: str,
ignore_fragment: bool = False,
) Generator[Self | I]

Yield all items in this element (including the element itself) that reference the given filename.

Parameters:
  • filename – The filename to search for.

  • ignore_fragment – Whether to ignore the fragment part of the searched filenames.

Yields:

Items that reference the given filename.

add_item_after_self(item: I) I

Add an item after this one in the parent’s items.

Parameters:

item – The item to add.

Returns:

The added item.

Raises:

EPUBError – If this element has no parent, or if this element is not found in the parent’s items.
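
The behaviour and both error cases can be sketched on a minimal tree. The EPUBError and Node classes below are local stand-ins defined for this example, not the epublib classes.

```python
class EPUBError(Exception):
    """Local stand-in for epublib's EPUBError."""


class Node:
    def __init__(self, name: str) -> None:
        self.name = name
        self.parent: "Node | None" = None
        self.items: list["Node"] = []

    def add_item(self, item: "Node") -> "Node":
        item.parent = self
        self.items.append(item)
        return item

    def add_item_after_self(self, item: "Node") -> "Node":
        if self.parent is None:
            raise EPUBError("element has no parent")
        siblings = self.parent.items
        # Identity comparison: find this exact object among the siblings.
        for index, sibling in enumerate(siblings):
            if sibling is self:
                item.parent = self.parent
                siblings.insert(index + 1, item)
                return item
        raise EPUBError("element not found in its parent's items")


root = Node("root")
a = root.add_item(Node("a"))
root.add_item(Node("c"))
a.add_item_after_self(Node("b"))
order = [item.name for item in root.items]
```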

add_after_self(
**kwargs: AttributeValue | None,
) I

Create an item and add it after this one in the parent’s items.

Parameters:

**kwargs – Attributes to pass to the child item constructor.

Returns:

The newly created item.

Raises:

EPUBError – If this element has no parent, or if this element is not found in the parent’s items.