  • quaterion_models.encoders.extras.fasttext_encoder module

class FasttextEncoder(model_path: str, on_disk: bool, aggregations: Optional[List[str]] = None)[source]

Bases: Encoder

Creates a fastText encoder, which generates a vector for a list of tokens based on the given fastText model.

  • model_path – Path to model to load

  • on_disk – If True, use mmap to keep embeddings out of RAM

  • aggregations – Types of aggregation used to combine multiple token vectors into one. If several aggregations are specified, their results are concatenated into a single vector.

classmethod aggregate(embeddings: Tensor, operation: str) Tensor[source]

Apply an aggregation operation to the embeddings along the first dimension


Tensor – aggregated embeddings
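As a rough illustration of what aggregating "along the first dimension" means, the sketch below reduces a (num_tokens, dim) nested list column by column. It uses plain Python lists instead of a Tensor, and the name `aggregate_rows` is hypothetical; it is not the library's implementation.

```python
from typing import List


def aggregate_rows(embeddings: List[List[float]], operation: str) -> List[float]:
    """Reduce a (num_tokens, dim) nested list along its first dimension.

    Hypothetical stand-in for an aggregation over token vectors.
    """
    if not embeddings:
        raise ValueError("need at least one token vector")
    # zip(*rows) yields one tuple per embedding dimension (column).
    columns = list(zip(*embeddings))
    if operation == "min":
        return [min(column) for column in columns]
    if operation == "max":
        return [max(column) for column in columns]
    if operation == "avg":
        return [sum(column) / len(column) for column in columns]
    raise ValueError(f"unknown operation: {operation}")


# Two token vectors of dimension 2.
token_vectors = [[1.0, 4.0], [3.0, 2.0]]
minimum = aggregate_rows(token_vectors, "min")
maximum = aggregate_rows(token_vectors, "max")
average = aggregate_rows(token_vectors, "avg")
```

Whatever the number of tokens, the result has one value per embedding dimension.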

forward(batch: List[List[str]]) Tensor[source]

Run inference with the encoder: convert the input batch to embeddings


batch – processed batch


embeddings – shape: (batch_size, embedding_size)
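The relation between a batch of token lists and the (batch_size, embedding_size) output can be sketched as follows. This is a plain-Python approximation under stated assumptions: `TOY_VECTORS` is a made-up lookup table standing in for a fastText model, `encode_batch` is a hypothetical helper, and the result width is assumed to be the vector dimension times the number of aggregations because the aggregated vectors are concatenated.

```python
from typing import Dict, List

# Hypothetical 2-dimensional token vectors standing in for a fastText model.
TOY_VECTORS: Dict[str, List[float]] = {
    "red": [1.0, 0.0],
    "apple": [0.0, 1.0],
    "pie": [2.0, 2.0],
}

# One reducer per aggregation name; each receives a single column of values.
REDUCERS = {
    "min": min,
    "max": max,
    "avg": lambda column: sum(column) / len(column),
}


def encode_batch(batch: List[List[str]], aggregations: List[str]) -> List[List[float]]:
    """Return one concatenated vector per token list in the batch."""
    rows = []
    for tokens in batch:
        vectors = [TOY_VECTORS[token] for token in tokens]
        columns = list(zip(*vectors))  # one tuple per embedding dimension
        row: List[float] = []
        for name in aggregations:
            # Append this aggregation's result after the previous ones.
            row.extend(REDUCERS[name](column) for column in columns)
        rows.append(row)
    return rows


encoded = encode_batch([["red", "apple"], ["pie"]], ["min", "max"])
```

Here the batch holds two token lists and two aggregations are applied to 2-dimensional vectors, so `encoded` has 2 rows of 4 values each.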

get_collate_fn() CollateFnType[source]

Provides a function that converts a raw data batch into suitable model input


CollateFnType – model’s collate function

classmethod get_tokens(batch: List[Any]) List[List[str]][source]
classmethod load(input_path: str) Encoder[source]

Instantiate encoder from saved state.

If no state is required, just call create instead


input_path – path to load from


Encoder – loaded encoder

save(output_path: str)[source]

Persist current state to the provided directory


output_path – path to save model
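The save and load pair follows a common round-trip pattern: write the state to a directory, then rebuild the object from that directory. The sketch below shows the pattern with a hypothetical `ToyEncoder` class and an assumed `config.json` file name; the real encoder's on-disk layout is not documented here and may differ.

```python
import json
import os
import tempfile
from typing import List


class ToyEncoder:
    """Hypothetical stand-in, not the real FasttextEncoder."""

    CONFIG_NAME = "config.json"  # assumed file name, for illustration only

    def __init__(self, aggregations: List[str]):
        self.aggregations = aggregations

    def save(self, output_path: str) -> None:
        """Persist the current state to the given directory."""
        os.makedirs(output_path, exist_ok=True)
        with open(os.path.join(output_path, self.CONFIG_NAME), "w") as handle:
            json.dump({"aggregations": self.aggregations}, handle)

    @classmethod
    def load(cls, input_path: str) -> "ToyEncoder":
        """Instantiate an encoder from a previously saved directory."""
        with open(os.path.join(input_path, cls.CONFIG_NAME)) as handle:
            config = json.load(handle)
        return cls(**config)


# Round trip through a temporary directory.
with tempfile.TemporaryDirectory() as directory:
    ToyEncoder(["avg", "max"]).save(directory)
    restored = ToyEncoder.load(directory)
```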

aggregation_options = ['min', 'max', 'avg']
property embedding_size: int

Size of resulting embedding

property trainable: bool

Defines whether the encoder is trainable.

This flag affects caching and checkpoint saving of the encoder.

training: bool
load_fasttext_model(path: str) Union[FastText, KeyedVectors][source]

Load a fastText model in a universal way

Tries to find a working way of loading the FastText model and uses it to load the model


path – path to FastText model or vectors


FastText or KeyedVectors – loaded model
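One way to picture "finding a possible way of loading" is a fallback loop that tries several loaders in turn and returns the first result that succeeds. The sketch below is an assumption about the general pattern only: `load_with_fallback`, `load_full_model`, and `load_vectors_only` are hypothetical names, and the loaders and their order in the actual function are not stated in this reference.

```python
from typing import Any, Callable, List, Sequence


def load_with_fallback(path: str, loaders: Sequence[Callable[[str], Any]]) -> Any:
    """Try each loader in order and return the first successful result."""
    errors: List[Exception] = []
    for loader in loaders:
        try:
            return loader(path)
        except Exception as error:  # keep the failure and try the next loader
            errors.append(error)
    raise RuntimeError(f"could not load {path!r}; errors: {errors}")


# Hypothetical loaders standing in for different model file formats.
def load_full_model(path: str) -> Any:
    raise ValueError("not a full model file")


def load_vectors_only(path: str) -> Any:
    return {"kind": "vectors", "path": path}


loaded = load_with_fallback("vectors.kv", [load_full_model, load_vectors_only])
```

Collecting the individual errors makes the final failure message more useful when none of the formats match.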


