How to Run AI Locally with LM Studio


You don't need an OpenAI subscription to run a capable language model. With LM Studio, you can download and run models like Qwen, Mistral, or Llama directly on your Mac or PC — no internet required, no API costs, full privacy.

Why Run AI Locally?

  • Free — no API costs, no subscriptions
  • Private — your data never leaves your machine
  • Fast — no network latency once the model is loaded
  • Offline — works without internet

What You Need

  • A machine with at least 8GB of RAM (16GB+ recommended)
  • LM Studio installed
  • A model downloaded from the built-in model browser

Getting Started

LM Studio gives you a clean UI to browse, download, and chat with models. It also exposes a local OpenAI-compatible API at http://localhost:1234 — meaning any tool that supports OpenAI can point to your local model instead.

Download a Model

Open LM Studio, head to the Discover tab, and search for a model. For example, Qwen2.5-Coder-7B is a great coding model that runs well on 16GB machines.

Start the Local Server

Once your model is loaded, click Local Server in the sidebar and hit Start Server. You'll see:

```bash
Server running at http://localhost:1234
```
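To confirm the server is actually up before wiring in a client, you can query its `/v1/models` endpoint. A minimal sketch using only the Python standard library (the port and path are LM Studio's defaults; the function name is mine):

```python
import json
import urllib.error
import urllib.request

def list_local_models(base_url="http://localhost:1234/v1"):
    """Return the model IDs the local server reports, or [] if it's not running."""
    try:
        with urllib.request.urlopen(f"{base_url}/models", timeout=2) as resp:
            data = json.load(resp)
        return [m["id"] for m in data.get("data", [])]
    except (urllib.error.URLError, OSError):
        return []

print(list_local_models())
```

If the list comes back empty, check that the server is started and the model is loaded.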

Use It with Python

You can now call your local model with the OpenAI SDK — just point it to localhost:

local_ai.py

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:1234/v1",
    api_key="not-needed"
)

response = client.chat.completions.create(
    model="qwen2.5-coder-7b",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain async/await in Python"}
    ]
)

print(response.choices[0].message.content)
```

Use It with JavaScript

Same idea — works with any OpenAI-compatible library:

local_ai.js

```javascript
import OpenAI from "openai"

const client = new OpenAI({
  baseURL: "http://localhost:1234/v1",
  apiKey: "not-needed",
})

const response = await client.chat.completions.create({
  model: "qwen2.5-coder-7b",
  messages: [
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: "Write a React hook for dark mode" },
  ],
})

console.log(response.choices[0].message.content)
```

Tip

Any tool that supports the OpenAI API can use your local model — just change the base URL to http://localhost:1234/v1.
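In many cases you don't even need to edit code: the official OpenAI SDKs read their endpoint and key from environment variables. A small sketch (the variable names are the SDK's; "not-needed" is just a placeholder, since LM Studio doesn't validate keys):

```python
import os

# Point any OpenAI-SDK-based tool at the local server via environment
# variables instead of editing its source. LM Studio ignores the API key,
# but most SDKs refuse to start without one set.
os.environ["OPENAI_BASE_URL"] = "http://localhost:1234/v1"
os.environ["OPENAI_API_KEY"] = "not-needed"

print(os.environ["OPENAI_BASE_URL"])
```

Set these in your shell profile instead if you want the redirect to apply to every tool you launch.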

Configuration

You can tweak model settings in LM Studio's UI. Here's a quick reference for the most useful parameters:

config.json

```json
{
  "temperature": 0.7,
  "max_tokens": 2048,
  "top_p": 0.9,
  "stream": true
}
```

Memory Usage

Larger models need more RAM. A 7B parameter model typically needs ~8GB, while 13B models need ~16GB. If your machine starts swapping, try a smaller quantization (Q4 instead of Q8).
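Those figures follow from a rough rule of thumb: weight memory is approximately parameter count × bits per weight ÷ 8, plus overhead for the KV cache and runtime. A back-of-the-envelope sketch (the 1.2× overhead factor is my assumption, and real usage varies with context length):

```python
def approx_model_ram_gb(params_billion, bits_per_weight, overhead=1.2):
    """Very rough estimate of the RAM needed to load a quantized model."""
    weight_bytes = params_billion * 1e9 * bits_per_weight / 8
    return weight_bytes * overhead / 1e9

# A 7B model: Q8 (~8 bits/weight) vs a Q4-style quant (~4.5 bits/weight)
print(f"Q8: ~{approx_model_ram_gb(7, 8):.1f} GB")
print(f"Q4: ~{approx_model_ram_gb(7, 4.5):.1f} GB")
```

This is why dropping from Q8 to Q4 roughly halves memory use, at some cost in output quality.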

You're all set!

You now have a fully local AI running on your machine. No API keys, no costs, no data leaving your computer.

Summary

  • LM Studio makes it dead simple to run open-source LLMs locally
  • The local server is OpenAI-compatible — works with existing SDKs
  • Great for development, testing, and privacy-sensitive workflows

Enjoyed this post?

Follow me for more articles on web development, software engineering, and my personal projects. I share weekly insights on building modern web applications.