Supercharging My RTX 3060 with the Solar 10.7B Local Large Language Model

Executive Summary

At Djere Services Group, we love all aspects of technology, especially Free Software and open source. I was recently pleasantly surprised to discover an open-source large language model that was completely new to me: SOLAR.

In this article, I document my successful effort to significantly boost the performance of my mid-range Nvidia RTX 3060 graphics card for running local Large Language Models (LLMs) without a hardware upgrade. Initially using 8-billion-parameter models, I sought a more powerful solution but was constrained by my card's 12GB VRAM limit, which caused "out of memory" errors when I attempted to run larger 14B models. My objective was to find a "sweet spot" model that offered a substantial increase in capability while operating comfortably within my hardware's memory budget.

I found the ideal solution in SOLAR 10.7B, an efficient and powerful open-source model from the South Korean AI startup Upstage. The model is notable for its innovative creation through a "Depth Up-Scaling" technique, which allowed it to be developed from a Mistral-7B base. This efficient method resulted in a model that consistently outperforms larger competitors, establishing it as a top performer on public leaderboards. Its FOSS nature and superior performance-to-size ratio made it an ideal candidate for my project.

Upon implementation, the SOLAR 10.7B model proved to be a transformative upgrade. It delivered responses just as quickly as the previous 8B models while dramatically improving factual accuracy. A test query about jazz musician Charlie Parker, which previously yielded erroneous information, produced a comprehensive, well-structured, and entirely accurate biography. The quantized model's modest ~6.2 GB VRAM footprint makes it a perfect match for the 12GB RTX 3060, leaving ample memory for long conversational contexts. I conclude that SOLAR 10.7B acts as a powerful software-based supercharger for mid-range GPUs, proving that intelligent model design can provide a greater performance leap than simply increasing parameter size.

Keywords: SOLAR 10.7B, Nvidia RTX 3060, Local LLM, Large Language Model, Artificial Intelligence, AI, GPU, 12GB VRAM, Upstage AI, Open Source, FOSS, LLM performance, VRAM limit, Ollama, Pop!_OS, LLM tutorial, Supercharge GPU, Quantized Model, Mistral-7B, Depth Up-Scaling, Mid-range GPU


```
Article Definitions
├─ AI: Artificial Intelligence
├─ DUS: Depth Up-Scaling
├─ FOSS: Free and Open-Source Software
├─ GPU: Graphics Processing Unit
├─ LLM: Large Language Model
├─ VRAM: Video Random Access Memory
├─ SOLAR 10.7B: An open-source 10.7 billion parameter LLM from Upstage.
├─ Nvidia RTX 3060: The GPU used in the article, featuring 12GB of VRAM.
├─ Mistral-7B: The 7 billion parameter base model used for creating SOLAR.
├─ Upstage: The South Korean AI startup that created the SOLAR model.
└─ Quantization: Reducing the numeric precision of a model's weights to shrink its memory footprint.
```

Introduction

For any technology enthusiast, the world of local Large Language Models (LLMs) is a fascinating frontier. The ability to run a powerful Artificial Intelligence (AI) on your own machine, completely offline and with total privacy, is a game-changer. My journey started like many others: with a capable, but not top-of-the-line, piece of hardware. My Nvidia RTX 3060, with its 12GB of VRAM, proved to be an excellent entry point, handling popular 8-billion-parameter models like Llama 3 with both speed and ease. The experience was amazing, but it left me wondering: could I get more power without a pricey hardware upgrade?

The search for a more powerful model quickly led to the infamous "VRAM wall". Larger, more capable models often demand more video memory than mid-range cards can provide. While my initial foray into 14B models was met with "out of memory" errors, I knew there had to be a sweet spot: a model that could deliver a significant leap in performance while still fitting comfortably within my 12GB VRAM budget.
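
To make the wall concrete, some back-of-the-envelope arithmetic helps. The bytes-per-parameter figures below are common rules of thumb, not measurements from my card, and real usage adds context and runtime overhead on top of the weights:

```
# Rough VRAM needed just to hold a model's weights, at common precisions.
# Bytes-per-parameter values are rules of thumb (4-bit quants vary ~0.5-0.6).
BYTES_PER_PARAM = {"fp16": 2.0, "q8_0": 1.0, "q4_0": 0.55}

def weight_vram_gb(params_billions, precision):
    """Approximate weight memory in GB for a given parameter count."""
    return params_billions * 1e9 * BYTES_PER_PARAM[precision] / 1024**3

for precision in BYTES_PER_PARAM:
    print(f"14B @ {precision}: ~{weight_vram_gb(14, precision):.1f} GB")
# 14B @ fp16: ~26.1 GB  -> far over a 12GB budget
# 14B @ q8_0: ~13.0 GB  -> still over
# 14B @ q4_0: ~7.2 GB   -> fits on paper, until context and overhead pile on
```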

That's when I discovered SOLAR 10.7B.

Discovering the Perfect Fit: SOLAR 10.7B

SOLAR 10.7B is a model that fits neatly into my Free Software/open-source (FOSS) ethos, and it's a masterpiece of LLM efficiency. Created by the South Korean AI startup Upstage, it made huge waves in the FOSS community when it was released in December 2023. It shot to the top of the Hugging Face Open LLM Leaderboard, outperforming models that were significantly larger and more complex.

What makes SOLAR so special is its history. Instead of training a 10.7B model from scratch, the team at Upstage pioneered a brilliant method they call Depth Up-Scaling (DUS). They started with the already excellent Mistral-7B model and intelligently "scaled it up" by adding layers that were initialized with the weights of the original ones. This clever technique allowed them to build on a strong foundation, creating a more powerful model in a fraction of the time and cost.
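
As a conceptual illustration only, the layer arithmetic looks roughly like this. The numbers (32 base layers, 8 trimmed at the seam of each copy) come from my reading of Upstage's DUS paper, and real up-scaling is followed by continued pretraining:

```
# Conceptual sketch of Depth Up-Scaling (DUS): stack two copies of the
# base model's layers, trimming a band at the seam. Not Upstage's code.
def depth_up_scale(base_layers, trim=8):
    """Concatenate two copies of the base, dropping layers at the seam."""
    top = base_layers[:-trim]     # first copy, minus its last `trim` layers
    bottom = base_layers[trim:]   # second copy, minus its first `trim` layers
    return top + bottom

mistral_7b = [f"layer_{i}" for i in range(32)]  # Mistral-7B has 32 layers
solar = depth_up_scale(mistral_7b)
print(len(solar))  # 48 layers -> roughly 10.7B parameters
```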

This innovative approach resulted in a model renowned for "punching well above its weight." It's a true general-purpose LLM that excels at instruction following, reasoning, and conversational chat.

The Experience: Speed and Quality

After a few networking hiccups, installing SOLAR was as simple as running a single command in my terminal. Once I had the model loaded, I began to test it. It was fast, just as fast as my previous 8-billion-parameter model, and it gave me much more accurate answers. For example, when I asked my old model about Charlie Parker, the output was filled with factual errors. The output of the SOLAR model was 100% accurate, as I detail below.
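
For anyone who wants to reproduce the test, here is roughly what it looks like. I used Ollama; the sketch below assumes the Ollama Python client (pip install ollama) and that "solar" is the model's tag in the Ollama library:

```
# Minimal sketch using the Ollama Python client. Assumes the model has
# already been fetched, e.g. with `ollama pull solar` in a terminal.
import ollama

response = ollama.chat(
    model="solar",
    messages=[{"role": "user", "content": "Tell me about Charlie Parker."}],
)
print(response["message"]["content"])
```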

To test SOLAR's knowledge and coherence, I gave it a simple prompt: "Tell me about Charlie Parker." The response was not just good; it was exceptionally good:

"Charles Christopher 'Yardbird' or 'Bird' Parker, Jr., better known as Charlie Parker (August 29, 1920 – March 12, 1955), was a highly influential American jazz saxophonist and composer. He is considered one of the founding fathers and most important figures in bebop, a form of jazz music that originated in the mid-1940s. Parker's virtuosity on the alto saxophone, his innovative approach to harmony and melodic construction, and his contributions as a composer have placed him at the forefront of jazz history... Despite his immense talent, Parker's personal life was often turbulent due to struggles with drug addiction, particularly heroin... Charlie Parker died on March 12, 1955, at the age of 34... His untimely death left an indelible mark on jazz music and has been commemorated through numerous awards and accolades..."

The output was well-structured, accurate, and comprehensive. It captured the key details of Parker's life and legacy, demonstrating a depth of knowledge and a command of language that was a clear step up from the 8B models I was used to.

The Sweet Spot for the RTX 3060

The quantized version of SOLAR 10.7B uses about 6.2 GB of VRAM. For a 12GB card like the RTX 3060, this is the perfect balance. It leaves nearly half the VRAM free, which is crucial for handling the "context" of a conversation: your prompts, the chat history, and the generated replies. This generous headroom means that you can have long, detailed conversations without ever worrying about hitting the VRAM limit.
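
To put a rough number on that headroom, here is a back-of-the-envelope estimate of how many tokens of conversation the leftover VRAM could hold. The architectural figures (48 layers, 8 grouped-query KV heads of dimension 128, inherited from Mistral-7B) are my assumptions from the published model configs, not measurements:

```
# Back-of-the-envelope: how much chat context fits in the leftover VRAM?
# Assumed architecture: 48 layers, 8 grouped-query KV heads, head dim 128,
# with the key/value cache stored in fp16 (2 bytes per value).
LAYERS, KV_HEADS, HEAD_DIM, FP16_BYTES = 48, 8, 128, 2

# Both K and V are cached, per layer, per token.
kv_bytes_per_token = 2 * LAYERS * KV_HEADS * HEAD_DIM * FP16_BYTES  # ~192 KiB

free_vram_bytes = (12 - 6.2) * 1024**3  # ~5.8 GB left after the weights
print(int(free_vram_bytes // kv_bytes_per_token))  # ~31,000 tokens of context
```

Even if those assumptions are off by a factor of two, the headroom comfortably covers very long conversations.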

Conclusions

For anyone running a local LLM on a mid-range GPU, the SOLAR 10.7B model is a must-try. It represents a true "supercharged" moment: a free software upgrade that unlocks a new level of performance from your existing hardware. It proves that in the world of AI, the smartest design can often beat sheer size.

If you want to learn more about the minds behind this model, you can visit their website at https://upstage.ai/.

```
How I Supercharged My Nvidia RTX 3060 with SOLAR 10.7B
 │
 ├─ The Challenge
 │  ├─ My Hardware: Nvidia RTX 3060 (12GB VRAM)
 │  ├─ Initial State: Running 8B LLMs, seeking more power.
 │  └─ The Obstacle: The "VRAM Wall" causing errors with 14B models.
 │
 ├─ The Solution: Discovering SOLAR 10.7B
 │  ├─ Origin: Created by South Korean AI startup Upstage.
 │  ├─ Key Attributes
 │  │  ├─ Ethos: Fits Free and Open-Source Software (FOSS) principles.
 │  │  └─ Performance: Outperforms larger models ("punches above its weight").
 │  └─ Core Technology: Depth Up-Scaling (DUS)
 │     └─ Foundation: Built upon the Mistral-7B model.
 │
 ├─ The Experience & Results
 │  ├─ Performance Metrics
 │  │  ├─ Speed: As fast as previous 8B models.
 │  │  └─ Quality: Significantly higher factual accuracy.
 │  └─ Case Study: Generated a detailed, accurate biography of Charlie Parker.
 │
 ├─ The Perfect Fit: Technical Analysis
 │  ├─ VRAM Usage: The quantized model uses only ~6.2 GB.
 │  └─ Benefit for RTX 3060: Leaves nearly half of the 12GB VRAM free for context.
 │
 └─ Conclusions
    ├─ Final Verdict: A must-try "software upgrade" for mid-range GPUs.
    └─ Core Message: Intelligent model design can be more effective than sheer size.
```
