# Hugging Face Model Usage Guide

## Table of Contents

- [Overview](#overview)
- [Authentication](#authentication)
- [Model Setup](#model-setup)
  - [Using a Model from Hugging Face](#using-a-model-from-hugging-face)
  - [Using a Downloaded Model from Local Cache](#using-a-downloaded-model-from-local-cache)
- [Comparing Local and Remote Outputs](#comparing-local-and-remote-outputs)
- [Troubleshooting](#troubleshooting)

## Overview

This guide explains how to set up and use Hugging Face-hosted models (including Apertus LLMs) with GABM, both locally and via remote APIs.

## Authentication

Some models require authentication to download from Hugging Face.

- **Recommended:** Store your Hugging Face token in `data/api_key.csv` as described in [API_KEYS.md](API_KEYS.md). GABM setup scripts automatically set the `HF_TOKEN` environment variable from this file if the file is present and the variable is not already set.

## Model Setup

### Using a Model from Hugging Face

1. Visit the [Apertus LLM collection](https://huggingface.co/collections/swiss-ai/apertus-llm) and choose a model.
2. Install the required packages (the example below uses PyTorch tensors, so `torch` is needed as well):

   ```bash
   pip install transformers torch
   ```

3. Load and use the model in Python:

   ```python
   from transformers import AutoModelForCausalLM, AutoTokenizer

   # Example model ID; check the collection for the IDs currently available.
   model_name = "swiss-ai/Apertus-8B-2509"

   tokenizer = AutoTokenizer.from_pretrained(model_name)
   model = AutoModelForCausalLM.from_pretrained(model_name)

   prompt = "Give me a brief explanation of gravity in simple terms."
   inputs = tokenizer(prompt, return_tensors="pt")

   # Generate up to 256 new tokens, then decode them back to text.
   outputs = model.generate(**inputs, max_new_tokens=256)
   output_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
   print(output_text)
   ```

For chat-style prompting, see the [Hugging Face chat templating docs](https://huggingface.co/docs/transformers/main/en/chat_templating) or the model card.

### Using a Downloaded Model from Local Cache

If you have already downloaded a model (e.g., `swiss-ai/Apertus-8B-2509`), you can load it directly from your Hugging Face cache directory.
This avoids re-downloading and works even if the model is no longer public.

Example (all platforms):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "swiss-ai/Apertus-8B-2509"

# local_files_only=True resolves the model from the local Hugging Face cache
# and never contacts the Hub.
tokenizer = AutoTokenizer.from_pretrained(model_id, local_files_only=True)
model = AutoModelForCausalLM.from_pretrained(model_id, local_files_only=True)
```

The default cache location is `~/.cache/huggingface/hub`; on Windows, it is typically `%USERPROFILE%\.cache\huggingface\hub`. If your cache lives elsewhere, pass `cache_dir=` to `from_pretrained`. Passing the `models--swiss-ai--Apertus-8B-2509` folder itself as the model path does not work, because the model files sit in a `snapshots/<revision>` subfolder inside it.

## Comparing Local and Remote Outputs

You can compare the output of a local model with the output of a model accessed via a service API (e.g., PublicAI) to check consistency.

Example comparison script:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import requests

prompt = "Give me a brief explanation of gravity in simple terms."

# Local inference, loading the model from the local Hugging Face cache
model_id = "swiss-ai/Apertus-8B-2509"
tokenizer = AutoTokenizer.from_pretrained(model_id, local_files_only=True)
model = AutoModelForCausalLM.from_pretrained(model_id, local_files_only=True)
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=256)
local_result = tokenizer.decode(outputs[0], skip_special_tokens=True)
print("Local output:", local_result)

# Remote (PublicAI API example)
api_key = "YOUR_PUBLICAI_KEY"  # Replace with your key
url = "https://api.publicai.co/v1/chat/completions"
headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {api_key}",
    "User-Agent": "GABM/1.0",
}
data = {
    "model": "swiss-ai/apertus-8b-instruct",  # or 70b-instruct
    "messages": [{"role": "user", "content": prompt}],
}
response = requests.post(url, headers=headers, json=data, timeout=60)
response.raise_for_status()  # Fail early on HTTP errors such as 401
remote_result = response.json()  # Full JSON response, not just the generated text
print("Remote output:", remote_result)
```

This lets you check whether your local and remote outputs are similar and spot differences. Expect some variation: the script above runs a base checkpoint locally and an instruct model remotely, and generation settings may differ between the two. For a closer comparison, load the matching instruct checkpoint from the collection locally.
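To put a number on "similar", the comparison can be sketched with the standard library alone. This is a minimal illustration, not part of GABM: the helper names `extract_chat_text` and `similarity` are hypothetical, and `extract_chat_text` assumes the remote service returns an OpenAI-style chat completion payload, which you should confirm against the actual response.

```python
from difflib import SequenceMatcher


def extract_chat_text(response_json: dict) -> str:
    """Return the assistant text from a chat completion payload.

    Assumption: the payload has the OpenAI-style shape
    {"choices": [{"message": {"content": "..."}}]}.
    """
    return response_json["choices"][0]["message"]["content"]


def similarity(a: str, b: str) -> float:
    """Return a ratio in [0, 1]; 1.0 means the normalized texts are identical."""
    # Normalize case and whitespace so trivial formatting differences are ignored.
    norm_a = " ".join(a.lower().split())
    norm_b = " ".join(b.lower().split())
    return SequenceMatcher(None, norm_a, norm_b).ratio()


if __name__ == "__main__":
    # Stand-in values; in practice use local_result and remote_result.
    local_text = "Gravity pulls objects toward each other."
    remote_json = {
        "choices": [
            {"message": {"content": "Gravity pulls objects toward one another."}}
        ]
    }
    score = similarity(local_text, extract_chat_text(remote_json))
    print(f"Similarity: {score:.2f}")
```

A character-level ratio like this only measures surface overlap, so two correct answers with different wording can score low. Treat it as a quick sanity check rather than a measure of answer quality.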

## Troubleshooting

- If you see "401 Unauthorized", check your Hugging Face authentication and token.
- **Tip:** If you want to use only locally downloaded models, you do not need to set `HF_TOKEN`.
- For more details, see [API_KEYS.md](API_KEYS.md) and the Hugging Face documentation.
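
When diagnosing authentication or cache problems, it can help to print what the Hugging Face libraries will see in the environment. The sketch below is an illustration only: `describe_hf_environment` is a hypothetical helper, not a GABM or Hugging Face function. It reads the `HF_TOKEN`, `HF_HUB_OFFLINE`, `HF_HOME`, and `HF_HUB_CACHE` environment variables and assumes the default cache location `~/.cache/huggingface/hub` when none of the overrides is set.

```python
import os
from pathlib import Path


def describe_hf_environment(env=None) -> dict:
    """Summarize Hugging Face related settings from an environment mapping."""
    env = os.environ if env is None else env

    # HF_HOME relocates the whole Hugging Face directory; HF_HUB_CACHE
    # overrides only the hub cache inside it.
    hf_home = env.get("HF_HOME") or str(Path.home() / ".cache" / "huggingface")
    hub_cache = env.get("HF_HUB_CACHE") or str(Path(hf_home) / "hub")

    offline_flag = env.get("HF_HUB_OFFLINE", "").strip().upper()
    return {
        "token_set": bool(env.get("HF_TOKEN")),  # never print the token itself
        "offline": offline_flag in {"1", "ON", "YES", "TRUE"},
        "hub_cache": hub_cache,
        "cache_exists": Path(hub_cache).is_dir(),
    }


if __name__ == "__main__":
    for key, value in describe_hf_environment().items():
        print(f"{key}: {value}")
```

If `token_set` is false and you are downloading a gated model, a 401 error is expected. If `cache_exists` is false, local-only loading cannot succeed until the model has been downloaded once.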