gabm.io.llm.apertus module
This module demonstrates how to use the Apertus models via the Hugging Face Transformers library: it loads a model and tokenizer, prepares a prompt, generates a response, and prints the output.
- gabm.io.llm.apertus.download_apertus_model(model_name: str) → None
Downloads and caches the specified Apertus model and tokenizer using Hugging Face Transformers.
- Args:
model_name (str): The Hugging Face model name to download (e.g., 'swiss-ai/apertus-70b-instruct').
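The download step is typically idempotent: if the model is already in the local cache, nothing is re-fetched. A minimal stdlib-only sketch of that pattern (the helper name and cache layout below are illustrative assumptions, not the module's actual implementation):

```python
import os
import tempfile

def download_if_missing(model_name: str, cache_dir: str) -> str:
    """Hypothetical sketch: materialize the model directory only once."""
    # Hugging Face-style repo names contain '/', so flatten them for the filesystem.
    target = os.path.join(cache_dir, model_name.replace("/", "--"))
    if os.path.isdir(target):
        return target       # already cached; skip the download entirely
    os.makedirs(target)     # placeholder for fetching model and tokenizer files
    return target

cache_dir = tempfile.mkdtemp()
first = download_if_missing("swiss-ai/apertus-70b-instruct", cache_dir)
second = download_if_missing("swiss-ai/apertus-70b-instruct", cache_dir)
```

Calling the helper twice with the same model name returns the same cached path, which is the behavior a caller of download_apertus_model can rely on.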
- gabm.io.llm.apertus.local_apertus_infer(model_name: str, prompt: str, device: str = 'cpu', cache: Dict[Any, Any] | None = None, cache_path: str | None = None, max_new_tokens: int = 32768, logger: Any | None = None) → str
Run local inference with an Apertus model, using shared cache and logging utilities.
- Args:
model_name (str): The Hugging Face model name to use (e.g., 'swiss-ai/apertus-70b-instruct').
prompt (str): The input prompt to send to the model.
device (str): The device to run inference on ('cpu' or 'cuda').
cache (dict, optional): An optional cache dictionary to use for caching responses.
cache_path (str, optional): An optional path to the cache file. If not provided, uses default paths.
max_new_tokens (int): The maximum number of new tokens to generate.
logger: Optional logger for logging messages.
- Returns:
str: The generated response from the model.
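The cache and cache_path parameters suggest that responses are memoized so a repeated (model, prompt) pair never triggers a second expensive generation. A stdlib-only sketch of that caching behavior, assuming a hash of model name and prompt as the cache key (the key scheme and helper names are assumptions for illustration, not the module's actual internals):

```python
import hashlib
import json
import os
import tempfile

def cached_infer(model_name, prompt, cache=None, cache_path=None, infer=None):
    """Hypothetical sketch: serve repeated prompts from a shared cache."""
    cache = {} if cache is None else cache
    # Key the cache on both model and prompt so different models don't collide.
    key = hashlib.sha256(f"{model_name}\x00{prompt}".encode()).hexdigest()
    if key in cache:
        return cache[key]           # cache hit: no model call at all
    response = infer(prompt)        # stand-in for the real generation step
    cache[key] = response
    if cache_path is not None:      # persist so later runs can reuse the cache
        with open(cache_path, "w") as fh:
            json.dump(cache, fh)
    return response

calls = []
def fake_infer(prompt):
    calls.append(prompt)            # count how often the "model" actually runs
    return f"echo: {prompt}"

shared_cache = {}
path = os.path.join(tempfile.mkdtemp(), "cache.json")
a = cached_infer("swiss-ai/apertus-70b-instruct", "hi", shared_cache, path, fake_infer)
b = cached_infer("swiss-ai/apertus-70b-instruct", "hi", shared_cache, path, fake_infer)
```

Here the second call returns the cached string without invoking the inference function again, and the cache survives on disk at cache_path for later processes.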