Hugging Face Model Usage Guide
Overview
This guide explains how to set up and use Hugging Face-hosted models (including Apertus LLMs) with GABM, both locally and via remote APIs.
Authentication
Some models require authentication to download from Hugging Face.
Recommended: store your Hugging Face token in `data/api_key.csv`, as described in API_KEYS.md. The GABM setup scripts will automatically set the `HF_TOKEN` environment variable from this file if it is present and the variable is not already set.
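As a sketch of what the setup scripts do (the exact file layout is described in API_KEYS.md; a two-column `service,key` CSV format is assumed here for illustration):

```python
import csv
import os

def load_hf_token(path="data/api_key.csv"):
    """Return the Hugging Face token from a service,key CSV, or None.

    Assumes a simple two-column layout; see API_KEYS.md for the real format.
    """
    if not os.path.exists(path):
        return None
    with open(path, newline="") as f:
        for row in csv.reader(f):
            if row and row[0].strip().lower() == "huggingface":
                return row[1].strip()
    return None

# Mirror the setup scripts: only set HF_TOKEN if it is not already set.
token = load_hf_token()
if token and not os.environ.get("HF_TOKEN"):
    os.environ["HF_TOKEN"] = token
```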
Model Setup
Using a Model from Hugging Face
Visit the Apertus LLM collection and choose a model.
Install the required packages (`transformers` needs a backend such as PyTorch for inference):

```shell
pip install transformers torch
```
Load and use the model in Python:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "swiss-ai/apertus-llm-7b"  # Example
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = "Give me a brief explanation of gravity in simple terms."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=256)
output_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(output_text)
```
For chat-style prompting, see the Hugging Face chat templating docs or the model card.
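As a brief sketch, chat-style prompting replaces the raw string with a list of role/content messages, which the tokenizer renders into the model's expected format via `apply_chat_template`. The call itself is shown as a comment because it requires a chat-capable tokenizer and model to be loaded as above:

```python
# Chat models expect a list of role/content messages rather than a raw string.
messages = [
    {"role": "system", "content": "You are a concise assistant."},
    {"role": "user", "content": "Give me a brief explanation of gravity in simple terms."},
]

# With tokenizer and model loaded (see above), render and generate with:
# prompt_ids = tokenizer.apply_chat_template(
#     messages, add_generation_prompt=True, return_tensors="pt"
# )
# outputs = model.generate(prompt_ids, max_new_tokens=256)
```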
Using a Downloaded Model from Local Cache
If you have already downloaded a model (e.g., swiss-ai/Apertus-8B-2509), you can load it directly from your Hugging Face cache directory. This avoids re-downloading and works even if the model is no longer public.
Example (all platforms):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Resolve the model from the local cache only; no network access is attempted.
model_name = "swiss-ai/Apertus-8B-2509"
tokenizer = AutoTokenizer.from_pretrained(model_name, local_files_only=True)
model = AutoModelForCausalLM.from_pretrained(model_name, local_files_only=True)
```

Note that passing the top-level `models--swiss-ai--Apertus-8B-2509` cache folder to `from_pretrained` does not work, because the config and weights live in a `snapshots/<revision>` subdirectory; if you want an explicit path, point at that subdirectory instead. By default the cache is in `~/.cache/huggingface/hub`; on Windows, it is typically `%USERPROFILE%\.cache\huggingface\hub`.
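To check which models are already cached without loading them, you can list the cache directory directly. This is a sketch assuming the standard hub cache layout, where each repo lives in a `models--<org>--<name>` folder (it may misparse the rare repo id that itself contains `--`):

```python
import os

def list_cached_models(cache_dir=None):
    """List model repo ids present in the local Hugging Face hub cache."""
    cache_dir = cache_dir or os.path.expanduser("~/.cache/huggingface/hub")
    if not os.path.isdir(cache_dir):
        return []
    repos = []
    for name in sorted(os.listdir(cache_dir)):
        if name.startswith("models--"):
            # Directory names encode the repo id as models--<org>--<name>.
            repos.append(name[len("models--"):].replace("--", "/"))
    return repos

print(list_cached_models())
```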
Comparing Local and Remote Outputs
You can compare the output of a local model and the same model accessed via a service API (e.g., PublicAI) to ensure consistency.
Example comparison script:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import requests

prompt = "Give me a brief explanation of gravity in simple terms."

# Local inference (model resolved from the local Hugging Face cache)
model_name = "swiss-ai/Apertus-8B-2509"
tokenizer = AutoTokenizer.from_pretrained(model_name, local_files_only=True)
model = AutoModelForCausalLM.from_pretrained(model_name, local_files_only=True)
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=256)
local_result = tokenizer.decode(outputs[0], skip_special_tokens=True)
print("Local output:", local_result)

# Remote (PublicAI API example)
api_key = "YOUR_PUBLICAI_KEY"  # Replace with your key
url = "https://api.publicai.co/v1/chat/completions"
headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {api_key}",
    "User-Agent": "GABM/1.0",
}
data = {
    "model": "swiss-ai/apertus-8b-instruct",  # or 70b-instruct
    "messages": [{"role": "user", "content": prompt}],
}
response = requests.post(url, headers=headers, json=data, timeout=60)
response.raise_for_status()
# Extract the assistant message from the OpenAI-style response body.
remote_result = response.json()["choices"][0]["message"]["content"]
print("Remote output:", remote_result)
```
This lets you verify that your local and remote model outputs are similar or spot differences.
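For a rough automated comparison, a simple lexical similarity score can flag large divergences. Keep in mind that sampling-based generation means even the same model can produce different text from run to run, so treat the score as a hint rather than a pass/fail test:

```python
import difflib

def similarity(a: str, b: str) -> float:
    """Rough lexical similarity between two generated outputs, in [0, 1]."""
    return difflib.SequenceMatcher(None, a.strip(), b.strip()).ratio()

# Example usage with local_result and remote_result from the script above:
# print(f"Similarity: {similarity(local_result, remote_result):.2f}")
```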
Troubleshooting
If you see “401 Unauthorized”, check your Hugging Face authentication and token.
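A quick sanity check before digging further (the `hf_` prefix is the usual shape of Hugging Face user access tokens, and `huggingface-cli whoami` verifies a token against the Hub):

```python
import os

def check_hf_token() -> str:
    """Report whether HF_TOKEN is set and looks like a Hugging Face token."""
    token = os.environ.get("HF_TOKEN")
    if not token:
        return "HF_TOKEN is not set; gated or private models will return 401."
    if not token.startswith("hf_"):
        return "HF_TOKEN is set but does not look like a Hugging Face token."
    return "HF_TOKEN looks plausible; verify it with `huggingface-cli whoami`."

print(check_hf_token())
```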
Tip: If you want to use only locally downloaded models, you do not need to set HF_TOKEN.
For more details, see API_KEYS.md and the Hugging Face documentation.