API Reference

Database Classes

TopicDatabase(…)

A hierarchical soft-clustering database for thematic search.

SoftClusterTree(…)

A hierarchical soft clustering structure storing inclusion strengths as uint8 sparse matrices (0-255, divide by 255 to recover floats).
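The uint8 encoding described above can be sketched as follows. This is only an illustration: it assumes strengths are floats in [0, 1] and that quantization rounds to the nearest integer. The rounding rule and both helper names are assumptions, not taken from the library, and dense arrays stand in for the sparse matrices.

```python
import numpy as np


def quantize_strengths(strengths: np.ndarray) -> np.ndarray:
    """Map floats in [0, 1] to uint8 values in 0-255 (hypothetical helper)."""
    # Rounding to nearest is an assumption; the library may truncate instead.
    return np.rint(np.clip(strengths, 0.0, 1.0) * 255).astype(np.uint8)


def recover_strengths(stored: np.ndarray) -> np.ndarray:
    """Divide by 255 to recover approximate float strengths."""
    return stored.astype(np.float64) / 255.0


strengths = np.array([[0.0, 0.5, 1.0], [0.25, 0.0, 0.75]])
stored = quantize_strengths(strengths)
recovered = recover_strengths(stored)
```

Under these assumptions the round trip is lossy: each recovered value can differ from the original by up to 0.5/255.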

Query Classes

RootQuery(…)

Entry point for all queries on a TopicDatabase.

TopicQuery(…)

A query object carrying a set of topic indices.

SampleQuery(…)

A query object carrying a set of document indices.

FuzzyQuery(…)

A query object operating on the entire soft cluster matrix at once.

Utility Functions

thematic_search.utilities.cluster_layers_from_leaf_matrix(cluster_tree: dict[tuple, list[tuple]], matrix: ndarray) → list[ndarray]

Given a cluster_tree and a matrix of inclusion strengths for the 0th layer, compute a set of cluster layers by summing over the children of each node.
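The summation over children can be sketched as below. This is a hypothetical re-implementation for illustration, not the library's code. It assumes that nodes are (layer, cluster_number) tuples, that rows are documents, that column j of layer k belongs to node (k, j), and that cluster numbers within a layer run from 0 without gaps.

```python
import numpy as np


def sum_children_layers(cluster_tree, leaf_matrix):
    """Build one matrix per layer by summing the columns of each node's children.

    Hypothetical sketch: keys of cluster_tree are the non-leaf nodes, and
    column j of the layer-k matrix holds the strengths for node (k, j).
    """
    leaf_matrix = np.asarray(leaf_matrix)
    layers = [leaf_matrix]
    max_layer = max(layer for layer, _ in cluster_tree)
    for k in range(1, max_layer + 1):
        nodes = sorted(node for node in cluster_tree if node[0] == k)
        # Same dtype as the input; a uint8 input could overflow when summed.
        layer = np.zeros((leaf_matrix.shape[0], len(nodes)), dtype=leaf_matrix.dtype)
        for _, j in nodes:
            for child_layer, child_j in cluster_tree[(k, j)]:
                layer[:, j] += layers[child_layer][:, child_j]
        layers.append(layer)
    return layers


cluster_tree = {
    (1, 0): [(0, 0), (0, 1)],
    (1, 1): [(0, 2)],
    (2, 0): [(1, 0), (1, 1)],
}
leaf = np.array([[0.2, 0.3, 0.5], [0.6, 0.0, 0.4]])
layers = sum_children_layers(cluster_tree, leaf)
```

Here layers[0] is the leaf matrix itself, and each later entry has one column per node in that layer.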

thematic_search.utilities.convert_tree(tree: dict, layers: dict[any, int] = {}) → dict[tuple, list[tuple]]

Given a tree in the form of a dictionary containing vertex:[list of children], convert it to a cluster_tree.

Parameters:
  • tree (dict) – The tree to convert. Must consist of pairs of the form vertex:[list of children].

  • layers (dict) – Custom layer assignment dictionary of the form vertex:layer. If not specified, leaves are assigned layer=0 and every other node is assigned layer=max_layer_of_children+1.
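The default layer rule (leaves at 0, every other node one above its highest child) can be sketched as below. The helper name assign_layers is hypothetical, and the sketch covers only the layer assignment, not the conversion to a cluster_tree. It assumes a leaf is any vertex that has an empty child list or does not appear as a key.

```python
def assign_layers(tree):
    """Return {vertex: layer}: leaves get 0, others max child layer + 1.

    Hypothetical helper illustrating the documented default rule.
    """
    layers = {}

    def layer_of(vertex):
        if vertex not in layers:
            children = tree.get(vertex, [])
            # A vertex with no children is treated as a leaf.
            layers[vertex] = 1 + max(map(layer_of, children)) if children else 0
        return layers[vertex]

    for vertex in tree:
        layer_of(vertex)
    return layers


tree = {"root": ["a", "b"], "a": ["x", "y"], "b": []}
layers = assign_layers(tree)
```

In this example "b" is a leaf directly under the root, so it lands in layer 0 while its sibling "a" lands in layer 1 and the root in layer 2.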

thematic_search.utilities.print_subtree(node: tuple, cluster_tree: dict[tuple, list[tuple]], cluster_labels: dict[tuple, str] = {}, depth: int = 0)

Print the subtree of a node in a cluster_tree.

Parameters:
  • node (tuple) – A key in cluster_tree whose subtree is printed.

  • cluster_tree (dict[tuple, list[tuple]]) – The cluster tree to print.

  • cluster_labels (dict[tuple, str], optional, default={}) – A dictionary mapping nodes to display names.
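A recursive traversal of this kind can be sketched as follows. The helper format_subtree is hypothetical and returns lines instead of printing them. The two-space indentation and the fallback to the node tuple when no label is given are assumptions, not the library's actual output format.

```python
def format_subtree(node, cluster_tree, cluster_labels=None, depth=0):
    """Return the indented lines for node and its descendants (hypothetical helper)."""
    cluster_labels = cluster_labels or {}
    # Fall back to the tuple itself when no display name is given.
    lines = ["  " * depth + cluster_labels.get(node, str(node))]
    for child in cluster_tree.get(node, []):
        lines.extend(format_subtree(child, cluster_tree, cluster_labels, depth + 1))
    return lines


cluster_tree = {(1, 0): [(0, 0), (0, 1)]}
cluster_labels = {(1, 0): "animals", (0, 0): "cats"}
lines = format_subtree((1, 0), cluster_tree, cluster_labels)
print("\n".join(lines))
```

Passing None as the default and replacing it inside the function avoids sharing one mutable dictionary between calls.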

thematic_search.utilities.print_tree(cluster_tree: dict[tuple, list[tuple]], cluster_labels={})

Print the cluster tree to the console.

Parameters:
  • cluster_tree (dict[tuple, list[tuple]]) – The cluster tree to print.

  • cluster_labels (dict[tuple, str], optional, default={}) – A dictionary mapping nodes to display names.

thematic_search.utilities.topic_uid(tup: tuple) → str

Given a tuple (layer, cluster_number), returns its UID string.

thematic_search.utilities.uid_to_ints(s: str) → tuple

Given a UID s, returns (layer, cluster_number).