Saving and Loading

A thematic_search.TopicDatabase object supports saving and loading from disk. Suppose topicdb is a TopicDatabase. Then there are two options for saving to disk:

topicdb.to_file("my-topicdatabase.tm.zip")

topicdb.to_lance("my-topicdatabase")

The to_file() method saves the metadata as parquet files and the vector arrays as .npz files, then writes a metadata JSON file and zips everything for portability.

The to_lance() method saves everything to a LanceDB folder.

If you have a saved tm.zip file or LanceDB folder, you can load it using the appropriate class method:

topicdb = TopicDatabase.from_file("my-topicdatabase.tm.zip")

topicdb = TopicDatabase.from_lance("my-topicdatabase")

Note that the embedding model of a TopicDatabase is not saved. You will have to reload your embedding model separately, and then set topicdb.embedding_model manually. For example:

import thematic_search as ts
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/all-mpnet-base-v2")

topicdb = ts.TopicDatabase.from_file("docs/source/20ng-topicdb.tm.zip")
topicdb.embedding_model = model