quaterion_models.encoders.extras.fasttext_encoder module
- class FasttextEncoder(model_path: str, on_disk: bool, aggregations: Optional[List[str]] = None)
Bases: Encoder
Creates a fastText encoder, which generates a vector for a list of tokens based on the given fastText model.
- Parameters:
model_path – Path to model to load
on_disk – If True, use mmap to keep embeddings out of RAM
aggregations – Which aggregations to use to combine multiple vectors into one. If multiple aggregations are specified, their results are concatenated into a single vector.
- classmethod aggregate(embeddings: Tensor, operation: str) → Tensor
Apply aggregation operation to embeddings along the first dimension
- Parameters:
embeddings – embeddings to aggregate
operation – one of aggregation_options
- Returns:
Tensor – aggregated embeddings
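The following is a minimal sketch of aggregating along the first dimension and of concatenating several aggregations, not the library's implementation. Plain Python lists stand in for tensors so that it runs without torch. The operation names "avg", "max" and "min" and the helper aggregate_many are assumptions made for illustration; the actual names are those listed in aggregation_options.

```python
from typing import Callable, Dict, List

Vector = List[float]

# Assumed operation names; the real list is FasttextEncoder.aggregation_options.
AGGREGATIONS: Dict[str, Callable[[List[float]], float]] = {
    "avg": lambda column: sum(column) / len(column),
    "max": max,
    "min": min,
}


def aggregate(embeddings: List[Vector], operation: str) -> Vector:
    """Reduce a (num_tokens, dim) list of vectors to one (dim,) vector."""
    if operation not in AGGREGATIONS:
        raise ValueError(f"Unknown operation: {operation}")
    reduce_column = AGGREGATIONS[operation]
    # zip(*embeddings) yields columns, so the reduction runs along the first dimension.
    return [reduce_column(list(column)) for column in zip(*embeddings)]


def aggregate_many(embeddings: List[Vector], operations: List[str]) -> Vector:
    """Hypothetical helper: concatenate the results of several aggregations."""
    result: Vector = []
    for operation in operations:
        result.extend(aggregate(embeddings, operation))
    return result


token_vectors = [[1.0, 4.0], [3.0, 2.0]]
print(aggregate(token_vectors, "avg"))                # [2.0, 3.0]
print(aggregate_many(token_vectors, ["min", "max"]))  # [1.0, 2.0, 3.0, 4.0]
```

With two aggregations the output is twice as long as a single embedding, which is worth keeping in mind when sizing any layer that consumes the encoder output.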
- forward(batch: List[List[str]]) → Tensor
Infer encoder - convert input batch to embeddings
- Parameters:
batch – processed batch
- Returns:
embeddings – shape: (batch_size, embedding_size)
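As a rough illustration of the input and output shapes, the sketch below maps a batch of token lists to one vector per record. The dictionary TOY_VECTORS is a hypothetical stand-in for a loaded fastText model, and averaging is assumed as the only aggregation; nested lists are used in place of a Tensor.

```python
from typing import Dict, List

# Hypothetical 2-dimensional vectors standing in for a loaded fastText model.
TOY_VECTORS: Dict[str, List[float]] = {
    "red": [1.0, 0.0],
    "apple": [0.0, 1.0],
    "pie": [2.0, 2.0],
}


def mean_vector(vectors: List[List[float]]) -> List[float]:
    """Average token vectors component-wise."""
    return [sum(column) / len(column) for column in zip(*vectors)]


def forward(batch: List[List[str]]) -> List[List[float]]:
    """Map each token list to one vector: (batch_size, embedding_size)."""
    return [mean_vector([TOY_VECTORS[token] for token in record]) for record in batch]


embeddings = forward([["red", "apple"], ["pie"]])
print(len(embeddings), len(embeddings[0]))  # 2 2
print(embeddings)                           # [[0.5, 0.5], [2.0, 2.0]]
```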
- get_collate_fn() → CollateFnType
Provides a function that converts a raw data batch into suitable model input
- Returns:
CollateFnType – model’s collate function
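Since forward expects List[List[str]], a collate function for this encoder has to turn raw records into token lists. The sketch below assumes raw records are plain strings and uses a hypothetical lowercase whitespace tokenizer; the names whitespace_tokens and CollateFn are illustrative and are not part of the library.

```python
from typing import Callable, List

# Illustrative alias; the library's own type is CollateFnType.
CollateFn = Callable[[List[str]], List[List[str]]]


def whitespace_tokens(batch: List[str]) -> List[List[str]]:
    """Lowercase each raw string and split it on whitespace."""
    return [text.lower().split() for text in batch]


def get_collate_fn() -> CollateFn:
    """Return the function a data loader would call on each raw batch."""
    return whitespace_tokens


collate = get_collate_fn()
print(collate(["Red apple", "Pie"]))  # [['red', 'apple'], ['pie']]
```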
- classmethod get_tokens(batch: List[Any]) → List[List[str]]
- classmethod load(input_path: str) → Encoder
Instantiate encoder from saved state.
If no state is required, just call create instead.
- Parameters:
input_path – path to load from
- Returns:
Encoder – loaded encoder
- save(output_path: str)
Persist current state to the provided directory
- Parameters:
output_path – path to save model
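The save and load pair can be pictured as a round trip through a directory. The sketch below persists only the constructor settings and is not the library's implementation: the class ConfigOnlyEncoder and the file name config.json are assumptions made for illustration, and the real encoder also has to store the fastText model itself.

```python
import json
import os
import tempfile
from typing import List


class ConfigOnlyEncoder:
    """Hypothetical stand-in that persists only its configuration."""

    CONFIG_FILE = "config.json"  # assumed file name

    def __init__(self, on_disk: bool, aggregations: List[str]):
        self.on_disk = on_disk
        self.aggregations = aggregations

    def save(self, output_path: str) -> None:
        """Write the configuration into the given directory."""
        os.makedirs(output_path, exist_ok=True)
        config = {"on_disk": self.on_disk, "aggregations": self.aggregations}
        with open(os.path.join(output_path, self.CONFIG_FILE), "w") as file:
            json.dump(config, file)

    @classmethod
    def load(cls, input_path: str) -> "ConfigOnlyEncoder":
        """Rebuild an instance from a directory written by save."""
        with open(os.path.join(input_path, cls.CONFIG_FILE)) as file:
            config = json.load(file)
        return cls(**config)


with tempfile.TemporaryDirectory() as directory:
    ConfigOnlyEncoder(on_disk=True, aggregations=["avg"]).save(directory)
    restored = ConfigOnlyEncoder.load(directory)

print(restored.on_disk, restored.aggregations)  # True ['avg']
```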
- load_fasttext_model(path: str) → Union[FastText, KeyedVectors]
Load a fastText model in a universal way.
Tries to find a way of loading the FastText model that works for the given file and loads it.
- Parameters:
path – path to FastText model or vectors
- Returns:
FastText or KeyedVectors – loaded model
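One common way to implement this kind of "universal" loading is to try several loaders in turn and return the first result that succeeds. The sketch below shows that pattern only; which loaders the library actually tries, and in what order, is not stated here. The function load_with_fallbacks and the two toy loaders are hypothetical. With gensim installed, candidate loaders could include FastText.load, load_facebook_model and KeyedVectors.load.

```python
from typing import Any, Callable, List, Sequence

Loader = Callable[[str], Any]


def load_with_fallbacks(path: str, loaders: Sequence[Loader]) -> Any:
    """Return the result of the first loader that accepts the file."""
    errors: List[str] = []
    for loader in loaders:
        try:
            return loader(path)
        except Exception as error:  # each format fails in its own way
            errors.append(f"{getattr(loader, '__name__', loader)}: {error}")
    raise ValueError(f"Could not load {path!r}. Tried: " + "; ".join(errors))


# Toy loaders standing in for real format-specific loaders.
def reject(path: str) -> Any:
    raise IOError("unsupported format")


def accept(path: str) -> Any:
    return {"loaded_from": path}


print(load_with_fallbacks("model.bin", [reject, accept]))  # {'loaded_from': 'model.bin'}
```

Collecting the individual error messages makes the final failure easier to diagnose than re-raising only the last exception.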