LLM Fine-Tuning
- Mohammed Jassim Jasmin

- Mar 18
- 2 min read
Updated: Mar 25
This guide is for machines with limited resources (in particular, no dedicated GPU).
Machine configuration
Processor: Intel Core i5-11400 @ 2.60 GHz × 12
Graphics: RKL GT1
OS: Ubuntu 22.04
OS Type: 64-bit
RAM: 32 GiB
1. Choose a Lightweight LLM
Since your machine has limited resources (especially without a dedicated GPU), you should opt for smaller, more efficient models. Some good options include:
GPT-2 (small or medium): The smaller variants of OpenAI's GPT-2.
DistilGPT-2: A distilled version of GPT-2, which is smaller and faster.
LLaMA (7B or smaller): Meta's LLaMA models are efficient, but even the 7B-parameter model might be challenging to fine-tune on your machine.
GPT-Neo (125M or 350M): Smaller versions of GPT-Neo from EleutherAI.
Alpaca-LoRA: A fine-tuned version of LLaMA using Low-Rank Adaptation (LoRA), which is lightweight and efficient.
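Before committing to a model, you can load a candidate and count its parameters to gauge whether it will fit comfortably in RAM. A minimal sketch (distilgpt2 here is just one of the options above):

from transformers import AutoModelForCausalLM

# Download the candidate model and count its weights
model = AutoModelForCausalLM.from_pretrained("distilgpt2")
n_params = sum(p.numel() for p in model.parameters())
print(f"distilgpt2: {n_params / 1e6:.0f}M parameters")  # roughly 82M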
2. Install Required Libraries
You'll need Python and some machine learning libraries. Install the following:
pip install torch transformers datasets accelerate

PyTorch: For model training and inference.
Transformers: Hugging Face's library for working with pre-trained models.
Datasets: For loading and preprocessing datasets.
Accelerate: To optimize training for CPU/GPU.
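After installing, a quick sanity check confirms the stack imports cleanly and shows that no GPU is visible:

import torch
import transformers

print("PyTorch:", torch.__version__)
print("Transformers:", transformers.__version__)
print("CUDA available:", torch.cuda.is_available())  # expect False on this machine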
3. Load a Pre-trained Model
You can load a pre-trained model using Hugging Face's transformers library. For example:
from transformers import GPT2LMHeadModel, GPT2Tokenizer
# Load a small GPT-2 model
model_name = "gpt2" # or "distilgpt2" for a smaller version
model = GPT2LMHeadModel.from_pretrained(model_name)
tokenizer = GPT2Tokenizer.from_pretrained(model_name)
# GPT-2 has no padding token by default; reuse the end-of-text token
tokenizer.pad_token = tokenizer.eos_token

4. Prepare Your Dataset
Fine-tuning requires a dataset in a format the model can understand. You can use Hugging Face's datasets library to load or create a dataset:
from datasets import load_dataset
# Load a dataset (or use your own)
dataset = load_dataset("wikitext", "wikitext-2-raw-v1")  # Example dataset

Tokenize the dataset:
def tokenize_function(examples):
    return tokenizer(examples["text"], padding="max_length", truncation=True)

tokenized_dataset = dataset.map(tokenize_function, batched=True)

5. Fine-Tune the Model
Fine-tuning on a CPU will be slow, but it's possible. Use Hugging Face's Trainer API:
from transformers import Trainer, TrainingArguments, DataCollatorForLanguageModeling

training_args = TrainingArguments(
    output_dir="./results",
    per_device_train_batch_size=1,  # Reduce batch size for CPU
    num_train_epochs=1,  # Start with 1 epoch
    save_steps=10_000,
    save_total_limit=2,
    logging_dir="./logs",
    logging_steps=200,
)
# The collator pads each batch and copies input_ids into labels,
# which the model needs to compute the causal LM loss
data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_dataset["train"],
    eval_dataset=tokenized_dataset["validation"],
    data_collator=data_collator,
)
trainer.train()

6. Optimise for CPU
You can use the accelerate library to optimise training for CPU:
accelerate config

Follow the prompts to configure your environment for CPU training. Then, use:

accelerate launch your_training_script.py
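If you prefer a hand-written loop over the Trainer, accelerate can drive it directly. Below is a minimal sketch, assuming the model, tokenized_dataset, and data_collator objects from the earlier steps; the optimizer choice and learning rate are illustrative:

from accelerate import Accelerator
from torch.optim import AdamW
from torch.utils.data import DataLoader

accelerator = Accelerator(cpu=True)  # force CPU even if a GPU is detected
optimizer = AdamW(model.parameters(), lr=5e-5)
train_data = tokenized_dataset["train"].remove_columns(["text"])  # keep only tensor columns
loader = DataLoader(train_data, batch_size=1, collate_fn=data_collator)

model, optimizer, loader = accelerator.prepare(model, optimizer, loader)
model.train()
for batch in loader:
    loss = model(**batch).loss
    accelerator.backward(loss)  # replaces loss.backward()
    optimizer.step()
    optimizer.zero_grad()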
7. Consider Using LoRA or Adapters
If fine-tuning the entire model is too resource-intensive, consider using Low-Rank Adaptation (LoRA) or Adapters. These techniques allow you to fine-tune only a small part of the model, significantly reducing memory and compute requirements.
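As a minimal sketch with the peft library (pip install peft; the target_modules value assumes GPT-2, whose attention uses a fused c_attn projection):

from peft import LoraConfig, TaskType, get_peft_model
from transformers import GPT2LMHeadModel

model = GPT2LMHeadModel.from_pretrained("gpt2")
lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,                        # rank of the low-rank update matrices
    lora_alpha=16,              # scaling factor for the updates
    lora_dropout=0.05,
    target_modules=["c_attn"],  # GPT-2's combined query/key/value projection
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of all weights

The wrapped model drops into the same Trainer setup from step 5, so only the LoRA parameters are updated.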
8. Monitor Performance
Keep an eye on your system's memory and CPU usage. If your machine runs out of RAM, you may need to:
Reduce the batch size.
Use a smaller model.
Use gradient checkpointing (enabled in TrainingArguments with gradient_checkpointing=True); see the sketch below.
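For example, a memory-leaner variant of the training arguments from step 5 (gradient_accumulation_steps is an addition here, simulating a larger batch without the memory cost):

training_args = TrainingArguments(
    output_dir="./results",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,  # effective batch size of 8
    gradient_checkpointing=True,    # recompute activations instead of storing them
    num_train_epochs=1,
)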
9. Inference
After fine-tuning, you can use the model for inference:
input_text = "Once upon a time"
input_ids = tokenizer.encode(input_text, return_tensors="pt")
output = model.generate(input_ids, max_length=50)
print(tokenizer.decode(output[0], skip_special_tokens=True))

10. Limitations
Speed: Training on a CPU will be slow. Fine-tuning even a small model like GPT-2 (125M) could take hours or days.
Memory: Your machine has limited RAM, so you may need to use smaller models or techniques like LoRA.
Scalability: For larger models or datasets, consider using cloud services like Google Colab, AWS, or Azure.
Alternative: Use a Pre-trained API
If fine-tuning locally is too resource-intensive, consider APIs like OpenAI's GPT-3.5/4 or Hugging Face's Inference API, which let you fine-tune or query models in the cloud without any local compute. For example:
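A minimal sketch of querying a hosted model with the huggingface_hub client (the model name is just a placeholder):

from huggingface_hub import InferenceClient

client = InferenceClient(model="gpt2")  # hosted model; runs on Hugging Face's servers
print(client.text_generation("Once upon a time", max_new_tokens=50))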