secmlt.models.hugging_face package#
Submodules#
secmlt.models.hugging_face.base_hf_lm module#
Wrapper for Hugging Face causal language models.
- class secmlt.models.hugging_face.base_hf_lm.HFCausalLM(model_path: str, device: device | None = None, dtype: dtype | None = None, tokenizer_kwargs: dict | None = None, model_kwargs: dict | None = None)[source]#
Bases:
BaseLanguageModelWrapper for Hugging Face causal language models.
- decode(ids: LongTensor, **kwargs) list[str][source]#
Decode a batch of token IDs into text.
- Parameters:
ids (torch.LongTensor) – Tensor of token IDs.
- Returns:
Decoded text.
- Return type:
list of str
- encode(texts: list[str], **kwargs) LongTensor[source]#
Tokenize a batch of text prompts.
- Parameters:
texts (list of str) – Batch of input prompts.
- Returns:
Tensor of token IDs.
- Return type:
torch.LongTensor
- generate(prompts: list[list[dict]], **kwargs) list[str][source]#
Generate text completions from chat-style prompts.
- Parameters:
prompts (list of list of dict) – Batch of chat messages, each formatted as a list of {“role”: str, “content”: str}.
**kwargs – Additional parameters for model.generate().
- Returns:
Generated text completions.
- Return type:
list of str
- Raises:
ValueError – If tokenizer does not define apply_chat_template.
Return hidden states of the model.
- Parameters:
input_ids (torch.LongTensor) – Tensor of token IDs.
- Returns:
Hidden representations per layer.
- Return type:
list of torch.Tensor
- logprobs(prompts: list[str], targets: list[str], **kwargs) list[Tensor][source]#
Compute log-probabilities for each token in the target continuation.
- Parameters:
prompts (list of str) – Conditioning prompts.
targets (list of str) – Target continuations.
- Returns:
List of log-probabilities for each target token. Each tensor has shape [target_len_i].
- Return type:
list of torch.Tensor
- property model: AutoModelForCausalLM#
Get the wrapped Hugging Face model.
- Returns:
Wrapped Hugging Face model.
- Return type:
AutoModelForCausalLM
- predict(input_ids: LongTensor, **kwargs) Tensor[source]#
Compute next-token logits for each sequence in the batch.
- Parameters:
input_ids (torch.LongTensor) – Tensor of token IDs.
- Returns:
Logits for the next token.
- Return type:
torch.Tensor
- property tokenizer: AutoTokenizer#
Get the wrapped Hugging Face tokenizer.
- Returns:
Wrapped Hugging Face tokenizer.
- Return type:
AutoTokenizer
Module contents#
Hugging Face model wrappers.