LLM Fine-Tuning
- Mohammed Jassim Jasmin

- Mar 18
- 2 min read
Updated: Mar 25
This guide is for machines with limited resources (in particular, no dedicated GPU).
Machine configuration
Processor: Intel Core i5-11400 @ 2.60 GHz × 12
Graphics: RKL GT1
OS: Ubuntu 22.04
OS Type: 64-bit
RAM: 32 GiB
1. Choose a Lightweight LLM
Since your machine has limited resources (especially without a dedicated GPU), you should opt for smaller, more efficient models. Some good options include:
GPT-2 (small or medium): The smaller variants of OpenAI's GPT-2.
DistilGPT-2: A distilled version of GPT-2, which is smaller and faster.
LLaMA (7B or smaller): Meta's LLaMA models are efficient, but even the 7B-parameter model might be challenging to fine-tune on your machine.
GPT-Neo (125M or 350M): Smaller versions of GPT-Neo from EleutherAI.
Alpaca-LoRA: A fine-tuned version of LLaMA using Low-Rank Adaptation (LoRA), which is lightweight and efficient.
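Before committing to a model, you can load a candidate and count its parameters to gauge whether it will fit comfortably in RAM. A minimal sketch (distilgpt2 here is just one of the options above):

from transformers import AutoModelForCausalLM

# Download the candidate model and count its weights
model = AutoModelForCausalLM.from_pretrained("distilgpt2")
n_params = sum(p.numel() for p in model.parameters())
print(f"distilgpt2: {n_params / 1e6:.0f}M parameters")  # roughly 82M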
2. Install Required Libraries
You'll need Python and some machine learning libraries. Install the following:
pip install torch transformers datasets accelerate

PyTorch: For model training and inference.
Transformers: Hugging Face's library for working with pre-trained models.
Datasets: For loading and preprocessing datasets.
Accelerate: To optimize training for CPU/GPU.
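After installing, a quick sanity check confirms the stack imports cleanly and shows that no GPU is visible:

import torch
import transformers

print("PyTorch:", torch.__version__)
print("Transformers:", transformers.__version__)
print("CUDA available:", torch.cuda.is_available())  # expect False on this machine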
3. Load a Pre-trained Model
You can load a pre-trained model using Hugging Face's transformers library. For example:
from transformers import GPT2LMHeadModel, GPT2Tokenizer
# Load a small GPT-2 model
model_name = "gpt2" # or "distilgpt2" for a smaller version
model = GPT2LMHeadModel.from_pretrained(model_name)
tokenizer = GPT2Tokenizer.from_pretrained(model_name)
# GPT-2 has no padding token by default; reuse the end-of-text token
tokenizer.pad_token = tokenizer.eos_token

4. Prepare Your Dataset
Fine-tuning requires a dataset in a format the model can understand. You can use Hugging Face's datasets library to load or create a dataset:
from datasets import load_dataset
# Load a dataset (or use your own)
dataset = load_dataset("wikitext", "wikitext-2-raw-v1")  # Example dataset

Tokenize the dataset:
def tokenize_function(examples):
    return tokenizer(examples["text"], padding="max_length", truncation=True)

tokenized_dataset = dataset.map(tokenize_function, batched=True)

5. Fine-Tune the Model
Fine-tuning on a CPU will be slow, but it's possible. Use Hugging Face's Trainer API:
from transformers import Trainer, TrainingArguments, DataCollatorForLanguageModeling

training_args = TrainingArguments(
    output_dir="./results",
    per_device_train_batch_size=1,  # Reduce batch size for CPU
    num_train_epochs=1,  # Start with 1 epoch
    save_steps=10_000,
    save_total_limit=2,
    logging_dir="./logs",
    logging_steps=200,
)
# The collator pads each batch and copies input_ids into labels,
# which the model needs to compute the causal LM loss
data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_dataset["train"],
    eval_dataset=tokenized_dataset["validation"],
    data_collator=data_collator,
)
trainer.train()

6. Optimise for CPU
You can use the accelerate library to optimise training for CPU:
accelerate config

Follow the prompts to configure your environment for CPU training. Then, use:

accelerate launch your_training_script.py
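If you prefer a hand-written loop over the Trainer, accelerate can drive it directly. Below is a minimal sketch, assuming the model, tokenized_dataset, and data_collator objects from the earlier steps; the optimizer choice and learning rate are illustrative:

from accelerate import Accelerator
from torch.optim import AdamW
from torch.utils.data import DataLoader

accelerator = Accelerator(cpu=True)  # force CPU even if a GPU is detected
optimizer = AdamW(model.parameters(), lr=5e-5)
train_data = tokenized_dataset["train"].remove_columns(["text"])  # keep only tensor columns
loader = DataLoader(train_data, batch_size=1, collate_fn=data_collator)

model, optimizer, loader = accelerator.prepare(model, optimizer, loader)
model.train()
for batch in loader:
    loss = model(**batch).loss
    accelerator.backward(loss)  # replaces loss.backward()
    optimizer.step()
    optimizer.zero_grad()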
7. Consider Using LoRA or Adapters
If fine-tuning the entire model is too resource-intensive, consider using Low-Rank Adaptation (LoRA) or Adapters. These techniques allow you to fine-tune only a small part of the model, significantly reducing memory and compute requirements.
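As a minimal sketch with the peft library (pip install peft; the target_modules value assumes GPT-2, whose attention uses a fused c_attn projection):

from peft import LoraConfig, TaskType, get_peft_model
from transformers import GPT2LMHeadModel

model = GPT2LMHeadModel.from_pretrained("gpt2")
lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,                        # rank of the low-rank update matrices
    lora_alpha=16,              # scaling factor for the updates
    lora_dropout=0.05,
    target_modules=["c_attn"],  # GPT-2's combined query/key/value projection
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of all weights

The wrapped model drops into the same Trainer setup from step 5, so only the LoRA parameters are updated.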
8. Monitor Performance
Keep an eye on your system's memory and CPU usage. If your machine runs out of RAM, you may need to:
Reduce the batch size.
Use a smaller model.
Use gradient checkpointing (enabled in TrainingArguments with gradient_checkpointing=True); see the sketch below.
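For example, a memory-leaner variant of the training arguments from step 5 (gradient_accumulation_steps is an addition here, simulating a larger batch without the memory cost):

training_args = TrainingArguments(
    output_dir="./results",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,  # effective batch size of 8
    gradient_checkpointing=True,    # recompute activations instead of storing them
    num_train_epochs=1,
)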
9. Inference
After fine-tuning, you can use the model for inference:
input_text = "Once upon a time"
input_ids = tokenizer.encode(input_text, return_tensors="pt")
output = model.generate(input_ids, max_length=50)
print(tokenizer.decode(output[0], skip_special_tokens=True))

10. Limitations
Speed: Training on a CPU will be slow. Fine-tuning even a small model like GPT-2 (125M) could take hours or days.
Memory: Your machine has limited RAM, so you may need to use smaller models or techniques like LoRA.
Scalability: For larger models or datasets, consider using cloud services like Google Colab, AWS, or Azure.
Alternative: Use a Pre-trained API
If fine-tuning locally is too resource-intensive, consider APIs like OpenAI's GPT-3.5/4 or Hugging Face's Inference API, which let you fine-tune or query models in the cloud without any local compute. For example:
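A minimal sketch of querying a hosted model with the huggingface_hub client (the model name is just a placeholder):

from huggingface_hub import InferenceClient

client = InferenceClient(model="gpt2")  # hosted model; runs on Hugging Face's servers
print(client.text_generation("Once upon a time", max_new_tokens=50))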