anakin87 (Stefano Fiorucci)

AI & ML interests

Contributing to Haystack, the LLM Framework 🏗️. NLP / LLMs.

Posts

šŒš² šŸš¢š«š¬š­ šœšØš¦š¦š®š§š¢š­š² ššš«š­š¢šœš„šž! š’šžš„šžšœš­š¢šÆšž šŸš¢š§šž-š­š®š§š¢š§š  š°š¢š­š” š’š©šžšœš­š«š®š¦ šŸŽÆ

Full walkthrough on how to get started with Spectrum and TRL for efficient fine-tuning.
📔 👣 https://huggingface.co/blog/anakin87/spectrum

---

Looking to fine-tune Language Models efficiently and save on computational resources?

One popular method is QLoRA, which quantizes the original model and trains low-rank adapters on top.
It's quite effective and uses much less GPU memory than full fine-tuning.
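For reference, a minimal QLoRA-style setup with Hugging Face transformers, bitsandbytes, and peft might look like the sketch below (the model name and hyperparameters are illustrative, not the exact recipe from the article):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# 1) Load the base model quantized to 4-bit (the "Q" in QLoRA)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3.5-mini-instruct",   # illustrative model choice
    quantization_config=bnb_config,
    device_map="auto",
)

# 2) Attach low-rank adapters uniformly to the linear layers (the "LoRA" part)
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules="all-linear",         # adapters on every linear layer
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```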

However, QLoRA applies Low-Rank Adaptation uniformly across the entire model.

What if we could identify the most informative layers and only fine-tune those? 🤔

This is exactly what Spectrum does! 👇

🔬 Spectrum analyzes the weight matrices of all layers in a Language Model and calculates a Signal-to-Noise Ratio (SNR) for each one.
(It uses Random Matrix Theory and the Marchenko-Pastur distribution to distinguish signal from noise.)
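To make the idea concrete, here is a rough PyTorch sketch of a per-layer SNR score. It only illustrates the concept and is not Spectrum's exact implementation; in particular, the noise-variance estimate is a crude assumption:

```python
import torch

def layer_snr(weight: torch.Tensor) -> float:
    """Crude per-layer signal-to-noise score.
    Eigenvalues above the Marchenko-Pastur bulk edge count as signal,
    the rest as noise. Simplified illustration, not Spectrum's exact code."""
    W = weight.detach().float()
    m, n = W.shape
    s = torch.linalg.svdvals(W)                  # singular values of W
    eigs = (s ** 2) / n                          # eigenvalues of (1/n) W W^T
    sigma2 = eigs.median()                       # rough noise-variance estimate (assumption)
    gamma = m / n
    mp_edge = sigma2 * (1 + gamma ** 0.5) ** 2   # Marchenko-Pastur upper edge
    signal = eigs[eigs > mp_edge].sum()
    noise = eigs[eigs <= mp_edge].sum()
    return (signal / (noise + 1e-12)).item()
```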

🎯 Based on a chosen percentage (say, 25%), Spectrum selects the most informative layers of each type (mlp.down_proj, self_attn.o_proj, etc.).

You can then ❄️ freeze the rest of the model and focus your 🏋️‍♂️ training on the chosen layers.
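Selection and freezing could then be sketched like this, assuming standard Hugging Face parameter names such as model.layers.5.mlp.down_proj.weight and the layer_snr helper from the sketch above:

```python
from collections import defaultdict

def spectrum_freeze(model, top_fraction=0.25):
    """Keep only the highest-SNR fraction of each layer type trainable."""
    groups = defaultdict(list)
    for name, param in model.named_parameters():
        param.requires_grad = False                       # freeze everything first
        if param.ndim == 2 and ".layers." in name:
            # "model.layers.5.mlp.down_proj.weight" -> "mlp.down_proj"
            layer_type = ".".join(name.split(".")[3:-1])
            groups[layer_type].append((name, layer_snr(param)))
    trainable = set()
    for layer_type, entries in groups.items():
        entries.sort(key=lambda e: e[1], reverse=True)    # highest SNR first
        k = max(1, int(len(entries) * top_fraction))
        trainable.update(n for n, _ in entries[:k])
    for name, param in model.named_parameters():
        if name in trainable:
            param.requires_grad = True
```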


🏆 Results/Evaluation
- Spectrum is competitive with full fine-tuning and beats QLoRA on benchmarks.
- While QLoRA is more memory-efficient on a single GPU, Spectrum shines in distributed training setups.
- Great models trained with Spectrum: Dolphin models, Llama 3.1 Storm, numerous models by VAGO Solutions...

---

For a practical guide, check out the article above.

💬 🇮🇹 Phi 3.5 mini ITA: a Small Language Model for Italian

Lately, I've spent some time fine-tuning language models.

Now I am happy to release Phi 3.5 mini ITA: a fine-tuned version of Phi-3.5-mini-instruct, optimized for the Italian language.

🔹 Small (3.82B parameters) but capable model
🔹 128k context length

Chat with it on 🤗 Spaces: anakin87/Phi-3.5-mini-ITA
Model card: anakin87/Phi-3.5-mini-ITA

🗃️ Data
Supervised fine-tuning using a good mix of English and Italian data:
- mlabonne/FineTome-100k by @mlabonne
- efederici/capybara-claude-15k-ita by @efederici
🙏 Thanks to the authors for the datasets.
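A sketch of how such a mix could be assembled with the datasets library (this assumes the two datasets expose compatible conversation columns; any column renaming or filtering is omitted):

```python
from datasets import load_dataset, concatenate_datasets

english = load_dataset("mlabonne/FineTome-100k", split="train")
italian = load_dataset("efederici/capybara-claude-15k-ita", split="train")

# Assumes both datasets share a compatible conversation column;
# in practice you may need to map them to a common chat format first.
mixed = concatenate_datasets([english, italian]).shuffle(seed=42)
```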


🎯 Targeted training with Spectrum
I used Spectrum, a relatively new technique for parameter-efficient learning.
The idea is to train only the layers of the model with high Signal-to-Noise Ratio (SNR) and ❄️ freeze the rest.
I trained the top 30% of layers, ranked by SNR.

📝 Spectrum paper: https://arxiv.org/abs/2406.06623
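In practice, Spectrum-style training amounts to freezing everything except the selected high-SNR modules and then running a regular SFT loop. A hedged sketch with TRL follows; the module-name patterns are hypothetical examples of what an SNR scan might select, and model / mixed stand for a loaded model and the dataset mix from the sketch above:

```python
import re
from trl import SFTConfig, SFTTrainer

# Hypothetical patterns for high-SNR modules kept trainable
# (in a real run these come from the SNR analysis); everything else stays frozen.
unfrozen_patterns = [
    r"model\.layers\.\d+\.mlp\.down_proj",
    r"model\.layers\.\d+\.self_attn\.o_proj",
]
for name, param in model.named_parameters():
    param.requires_grad = any(re.search(p, name) for p in unfrozen_patterns)

trainer = SFTTrainer(
    model=model,
    train_dataset=mixed,                                 # English + Italian mix
    args=SFTConfig(output_dir="phi-3.5-mini-ita-sft", num_train_epochs=1),
)
trainer.train()
```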


📊 Vibe check and performance on Italian benchmarks seem encouraging.