epublib.util module

normalize_path(path: T) → T

Normalize a path by resolving its .. (parent directory) segments.

>>> normalize_path("a/b/../c")
'a/c'

Parameters:

path – The path to normalize.

Returns:

The normalized path.

get_absolute_href(origin_href: str | Path, href: T) → T

Resolve href, which is relative to the file origin_href, into an absolute href.

>>> get_absolute_href("OEBPS/chapter1.xhtml", "../images/pic.png")
'images/pic.png'

Parameters:
  • origin_href – The href of the file that href is relative to.

  • href – The relative href.

Returns:

The absolute href.
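As the example shows, the relative href is resolved against the directory that contains origin_href. A minimal str-only sketch of that idea, assuming plain POSIX-style hrefs with no fragment or percent-encoding (get_absolute_href_sketch is a hypothetical name, not epublib's API):

```python
import posixpath


def get_absolute_href_sketch(origin_href: str, href: str) -> str:
    # Hypothetical sketch, not epublib's implementation.
    # Relative hrefs are resolved against the directory holding the origin file.
    base_dir = posixpath.dirname(origin_href)
    # Join, then collapse any ".." segments the join produced.
    return posixpath.normpath(posixpath.join(base_dir, href))


print(get_absolute_href_sketch("OEBPS/chapter1.xhtml", "../images/pic.png"))
```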

get_relative_href(
relative_to: str | Path,
absolute_href: T,
) → T

Express absolute_href as an href relative to the base href relative_to.

>>> get_relative_href("OEBPS/chapter1.xhtml", "OEBPS/images/pic.png")
'images/pic.png'

Parameters:
  • relative_to – The base href.

  • absolute_href – The absolute href.

Returns:

The relative href.
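This is the inverse of get_absolute_href. A minimal sketch using posixpath.relpath against the directory of the base href, under the same assumptions as above (plain POSIX hrefs, str only; get_relative_href_sketch is hypothetical):

```python
import posixpath


def get_relative_href_sketch(relative_to: str, absolute_href: str) -> str:
    # Hypothetical sketch, not epublib's implementation.
    # The result is relative to the directory containing relative_to;
    # an empty dirname (file at the root) is treated as ".".
    base_dir = posixpath.dirname(relative_to) or "."
    return posixpath.relpath(absolute_href, start=base_dir)


print(get_relative_href_sketch("OEBPS/chapter1.xhtml", "OEBPS/images/pic.png"))
```

When the target lies outside the base directory, relpath emits .. segments, so "OEBPS/text/ch1.xhtml" and "OEBPS/images/pic.png" give "../images/pic.png".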

parse_int(value: str) → int | None
parse_int(value: None) → None

Leniently parse an integer, ignoring leading whitespace and any text after the number.

>>> parse_int("42")
42
>>> parse_int("  42  xxx")
42
>>> parse_int("xxx") is None
True
>>> parse_int(None) is None
True

Parameters:

value – The value to parse.

Returns:

The parsed integer, or None if parsing failed.

tag_ids(tag: Tag) → set[str]

Return the set of all id attribute values found in tag.

Parameters:

tag – The tag to search.

new_id(
base: str | Path,
gone: set[str],
add_to_gone: bool = True,
) → EPUBId

Generate a new id, based on base, that is not already in gone.

Parameters:
  • base – The base id to use.

  • gone – The set of already used ids.

  • add_to_gone – Whether to add the new id to gone.

Returns:

The new unique id.

Raises:

EPUBError – If no unique id could be generated.
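The new_id_in_tag example below ("section" becomes "section-1") suggests a numeric suffix scheme. The following sketch assumes that scheme: try base first, then base-1, base-2, and so on, giving up after a bounded number of attempts. The max_attempts parameter, IdGenerationError (standing in for EPUBError) and new_id_sketch are all hypothetical, and the sketch returns a plain str rather than EPUBId.

```python
class IdGenerationError(Exception):
    """Hypothetical stand-in for epublib's EPUBError in this sketch."""


def new_id_sketch(
    base: str,
    gone: set[str],
    add_to_gone: bool = True,
    max_attempts: int = 10_000,  # hypothetical bound, not part of epublib's API
) -> str:
    # Candidates: "base", "base-1", "base-2", ... (assumed suffix scheme).
    for i in range(max_attempts):
        candidate = base if i == 0 else f"{base}-{i}"
        if candidate not in gone:
            if add_to_gone:
                # Record the id so later calls with the same set skip it.
                gone.add(candidate)
            return candidate
    raise IdGenerationError(f"no unused id found for base {base!r}")


used = {"section"}
print(new_id_sketch("section", used))  # section-1
```

Mutating gone in place (when add_to_gone is true) lets a caller generate several ids in a row from one set without collisions.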

new_id_in_tag(
base: str | Path,
tag: Tag,
) → EPUBId

Generate a new id, based on base, that is not already used in tag.

>>> new_id_in_tag("section", bs4.BeautifulSoup('<div id="section"></div>', "lxml"))
'section-1'

Parameters:
  • base – The base id to use.

  • tag – The tag to search for existing ids.

Returns:

The new unique id.

Raises:

EPUBError – If no unique id could be generated.

split_fragment(href: T) → tuple[T, str | None]

Given an href, split it into the part before the fragment identifier (#…) and the fragment identifier itself.

>>> split_fragment("chapter1.xhtml#section2")
('chapter1.xhtml', 'section2')
>>> split_fragment("chapter1.xhtml")
('chapter1.xhtml', None)
>>> split_fragment("#")
('', '')

Parameters:

href – The href to split.

Returns:

A tuple (name, fragment) of the part before the fragment and the fragment itself (or None if the href has no #).

strip_fragment(href: T) → T

Given an href, return the part before the fragment identifier (#…).

>>> strip_fragment("chapter1.xhtml#section2")
'chapter1.xhtml'
>>> strip_fragment("chapter1.xhtml")
'chapter1.xhtml'
>>> strip_fragment("#section2")
''

Parameters:

href – The href to strip.

Returns:

The part before the fragment.

get_fragment(href: str | Path) → str | None

Given an href, return the fragment identifier (#…) or None if there is none.

>>> get_fragment("chapter1.xhtml#section2")
'section2'
>>> get_fragment("chapter1.xhtml") is None
True
>>> get_fragment("#")
''

Parameters:

href – The href to get the fragment from.

Returns:

The fragment or None.

slugify(value: str) → str

Convert to ASCII. Convert spaces or repeated dashes to single dashes. Remove characters that aren’t alphanumerics, underscores, or hyphens. Convert to lowercase. Also strip leading and trailing whitespace, dashes, and underscores.

Adapted from Django's django.utils.text.slugify.

>>> slugify("Hello, World!")
'hello-world'

Parameters:

value – The value to slugify.

Returns:

The slugified value.
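The steps described above correspond to the ASCII branch of Django's slugify. The sketch below follows that recipe; epublib's adaptation may differ in details, and slugify_sketch is a hypothetical name. Note that whitespace is first turned into dashes and only then stripped from the ends.

```python
import re
import unicodedata


def slugify_sketch(value: str) -> str:
    # Decompose accented characters, then drop anything that is not ASCII.
    value = unicodedata.normalize("NFKD", value).encode("ascii", "ignore").decode("ascii")
    # Lowercase and remove characters other than word chars, whitespace, hyphens.
    value = re.sub(r"[^\w\s-]", "", value.lower())
    # Collapse runs of whitespace/dashes into one dash, then trim dashes/underscores.
    return re.sub(r"[-\s]+", "-", value).strip("-_")


print(slugify_sketch("Hello, World!"))  # hello-world
```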

class ResolutionType(*values)

Bases: Enum

Strategy for converting a list of BeautifulSoup attribute values into a single string.

attr_to_str(
value: str | list[str],
resolution_type: ResolutionType = ResolutionType.JOIN,
) → str
attr_to_str(
value: str | list[str] | None,
resolution_type: ResolutionType = ResolutionType.JOIN,
) → str | None

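Resolve a BeautifulSoup attribute value into a single string. BeautifulSoup returns multi-valued attributes such as class as a list of strings; resolution_type determines how such a list is turned into one string.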

Parameters:
  • value – The attribute value to resolve.

  • resolution_type – The strategy to use for resolving lists.

get_actual_tag_position(
tag: Tag,
position: int,
name: str | None = None,
) → int

Given a tag and a position, return the index, among all children of tag, of the position-th child that is a Tag (i.e. NavigableString children are not counted). If name is given, count only child tags with that name. If position is out of bounds, return the index of the last child plus one.

Parameters:
  • tag – The tag to search.

  • position – The position of the child to find.

  • name – The name of the child tags to consider.

Returns:

The index of the position-th child tag.

get_attributes(
parent: Tag,
attributes: Iterable[str],
) → Generator[tuple[Tag, str, str]]

Given a parent tag and a list of attribute names, yield tuples (child, attr, value) where:

  • child is a child of parent that has at least one of the attributes;

  • attr is the name of the attribute; and

  • value is the value of that attribute in this child.

If a child has more than one attribute in the given attributes, yield one tuple per attribute.

Parameters:
  • parent – The parent.

  • attributes – The attribute names to look for.

datetime_to_str(dt: datetime) → str

Convert a datetime to an ISO 8601 string in UTC, using a trailing Z instead of +00:00.

Parameters:

dt – The datetime to convert.

Returns:

The ISO8601 string representation of the datetime.
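A minimal sketch of the conversion: shift to UTC, format, and swap the +00:00 suffix for Z. Two choices here are assumptions, not documented epublib behavior. Fractional seconds are dropped (EPUB 3's dcterms:modified uses the CCYY-MM-DDThh:mm:ssZ form), and naive datetimes are interpreted as local time, which is what datetime.astimezone does. datetime_to_str_sketch is a hypothetical name.

```python
from datetime import datetime, timezone


def datetime_to_str_sketch(dt: datetime) -> str:
    # Hypothetical sketch; naive datetimes are treated as local time by astimezone.
    utc = dt.astimezone(timezone.utc)
    # timespec="seconds" drops microseconds (an assumption of this sketch).
    return utc.isoformat(timespec="seconds").removesuffix("+00:00") + "Z"


print(datetime_to_str_sketch(datetime(2024, 1, 2, 3, 4, 5, tzinfo=timezone.utc)))
```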

get_epublib_version() → str | None

Return the version of epublib if it is installed as a package, or None if it is not found.

strip_type_parameters(typ: type[T]) → type[T]

Strip the parameters from a type hint (e.g. list[int] becomes list), making it suitable for isinstance and issubclass checks. If the type is a Literal, return the types of its values combined into a UnionType.

>>> strip_type_parameters(list[int])
<class 'list'>
>>> strip_type_parameters(typing.Literal["a", 1])
str | int

Parameters:

typ – The type to strip.

Returns:

The stripped type.

remove_optional_type(typ: T) → T

Return the first member of a union type that is not NoneType. This makes the result usable as the first argument of issubclass.

>>> remove_optional_type(int | None)
<class 'int'>
>>> remove_optional_type(None | str)
<class 'str'>

Parameters:

typ – The union type from which to remove None.

Returns:

The first member of the union that is not None.