class WideningHead(input_embedding_size: int, expansion_factor: float = 4.0, activation_fn: str = 'relu', dropout: float = 0.0, **kwargs)[source]

Bases: StackedProjectionHead

Implements a narrow-wide-narrow architecture.

Widens the dimensionality by a factor of expansion_factor, then narrows it back down to input_embedding_size.

  • input_embedding_size – Dimensionality of the input to this head layer.

  • expansion_factor – Widen the dimensionality by this factor in the intermediate layer.

  • activation_fn – Name of the activation function to apply after the intermediate layer. Must be an attribute of torch.nn.functional; defaults to relu.

  • dropout – Dropout probability. If dropout > 0, a dropout layer is applied to the embeddings before the head layer transformations.
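
The effect of expansion_factor on layer widths can be sketched in plain Python. This is only an illustration: widening_layer_shapes is a hypothetical helper, not part of the library, and it assumes the intermediate width is computed as int(input_embedding_size * expansion_factor).

```python
from typing import List, Tuple


def widening_layer_shapes(
    input_embedding_size: int, expansion_factor: float = 4.0
) -> List[Tuple[int, int]]:
    """Return (in_features, out_features) for the two linear layers of a
    narrow-wide-narrow head. Hypothetical helper for illustration only."""
    # Assumption: the intermediate width is truncated to an integer.
    hidden_size = int(input_embedding_size * expansion_factor)
    return [
        (input_embedding_size, hidden_size),  # widen
        (hidden_size, input_embedding_size),  # narrow back to the input size
    ]


shapes = widening_layer_shapes(128, expansion_factor=4.0)
print(shapes)  # [(128, 512), (512, 128)]
```

Under this assumption the output embedding has the same size as the input, so the head can be swapped in without changing downstream dimensions.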

get_config_dict() → Dict[str, Any][source]

Constructs a savable params dict.

Returns: Serializable parameters for __init__ of the Module.
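
The idea of a config dict that can be fed back into __init__ can be illustrated with a stand-in class. ToyHead below is hypothetical and does not reproduce the library's implementation; the keys it returns are an assumption that simply mirrors the constructor arguments listed above.

```python
import json
from typing import Any, Dict


class ToyHead:
    """Hypothetical stand-in for a head module; not the library's class."""

    def __init__(
        self,
        input_embedding_size: int,
        expansion_factor: float = 4.0,
        activation_fn: str = "relu",
        dropout: float = 0.0,
    ):
        self.input_embedding_size = input_embedding_size
        self.expansion_factor = expansion_factor
        self.activation_fn = activation_fn
        self.dropout = dropout

    def get_config_dict(self) -> Dict[str, Any]:
        # Only plain, JSON-serializable values, keyed by __init__ argument name.
        return {
            "input_embedding_size": self.input_embedding_size,
            "expansion_factor": self.expansion_factor,
            "activation_fn": self.activation_fn,
            "dropout": self.dropout,
        }


head = ToyHead(128, dropout=0.1)
# Round trip through JSON, then rebuild the object from the saved config.
config = json.loads(json.dumps(head.get_config_dict()))
restored = ToyHead(**config)
```

Keeping the dict limited to constructor arguments means the saved file is enough to rebuild an equivalent object; weights would be stored separately.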

training: bool


