Local AI on Windows series · Part 1 of 3 · Next: Part 2: Hermes AI agent on Ubuntu via Hyper-V →
Running AI models in the cloud means your prompts, documents, and conversations pass through someone else’s servers. A local AI stack keeps everything on your own machine — no API costs, no data leaving your network, and no internet connection required once models are downloaded.
This guide walks you through setting up LM Studio as a local model server and AnythingLLM as a chat and RAG interface on top of it. The setup works on a mid-range Windows 11 machine — you don’t need a high-end GPU. A 6 GB VRAM card runs 3B parameter models comfortably.
What you need
- Windows 11 (64-bit, Home or Pro)
- GPU with at least 4 GB VRAM — 6 GB or more recommended. Models will fall back to CPU if VRAM is insufficient, which works but is much slower.
- 16 GB RAM — 8 GB is the minimum, but 16 GB gives you headroom to run the model alongside other apps
- 10–30 GB free disk space depending on which models you download
- LM Studio (free) and AnythingLLM Desktop (free)
Step 1: Install LM Studio
- Go to lmstudio.ai and download the Windows installer
- Run the installer — the default installation path is fine
- Open LM Studio
LM Studio is both a model manager and a local OpenAI-compatible API server. You use it to download models from Hugging Face and serve them over a local HTTP endpoint that other tools can talk to.
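"OpenAI-compatible" means the server accepts the same JSON request shape as OpenAI's chat completions API. As a rough sketch (the model name below is a placeholder — use whatever name LM Studio shows for your loaded model):

```python
import json

def build_chat_request(model, user_message, system_prompt=None):
    """Build an OpenAI-style /v1/chat/completions payload.

    LM Studio's local server accepts the same JSON shape as the
    OpenAI chat API: a model name plus a list of role/content
    messages, optionally led by a system prompt.
    """
    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": user_message})
    return {"model": model, "messages": messages, "temperature": 0.7}

payload = build_chat_request("qwen2.5-3b-instruct", "What is 2 + 2?")
print(json.dumps(payload, indent=2))
```

Any tool that can POST this payload to the endpoint — AnythingLLM included — can use the local model as a drop-in replacement for a cloud API.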
Step 2: Download a model
- In LM Studio, go to the Discover tab (magnifying glass icon)
- Search for Qwen2.5-3B — this is a capable 3B parameter model that fits comfortably in 6 GB VRAM
- Look for a Q4_K_M quantized version — this is the best balance of quality and size for consumer GPUs
- Click Download and wait for it to complete (~2 GB)
If you have more VRAM (8 GB+), you can go up to a 7B model. For 4 GB VRAM, stick to 3B or smaller. If you’re unsure, start with Qwen2.5-3B — it handles English and Dutch reasonably well for a model of its size.
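You can sanity-check these recommendations with a back-of-the-envelope calculation. Q4_K_M quantization stores roughly 4.5 bits per weight on average (an approximation — actual GGUF files vary), and you should budget an extra 1–2 GB of VRAM for the KV cache and runtime overhead:

```python
def approx_model_size_gb(params_billion, bits_per_weight=4.5):
    """Rough size of a quantized model in decimal GB.

    Assumes ~4.5 bits per weight for Q4_K_M (approximate; real
    GGUF files vary by model architecture and quant details).
    """
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9

print(f"3B @ Q4_K_M ~ {approx_model_size_gb(3):.1f} GB")
print(f"7B @ Q4_K_M ~ {approx_model_size_gb(7):.1f} GB")
```

A 3B model at ~1.7 GB plus overhead fits easily in 6 GB of VRAM; a 7B model at ~3.9 GB plus overhead is why 8 GB+ is the sensible threshold for stepping up.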
Step 3: Start the local server
- In LM Studio, go to the Developer tab (the </> icon in the left sidebar)
- Select your downloaded model from the dropdown at the top
- Click Start Server
- You should see: “Server listening on http://localhost:1234”
Leave LM Studio open and the server running — AnythingLLM connects to it over this local port.
Step 4: Install AnythingLLM Desktop
- Go to anythingllm.com/desktop and download the Windows installer
- When prompted, install for Current User only — not “All Users”. Installing to Program Files causes a known spawn error that breaks the app on startup.
- Open AnythingLLM Desktop
Step 5: Connect AnythingLLM to LM Studio
- In AnythingLLM, open Settings (gear icon, bottom left)
- Go to LLM Preference
- Select LM Studio as the provider
- Set the base URL to http://localhost:1234
- Click Save changes
- Go to Embedding Model in the same settings panel and set it to AnythingLLM built-in (or LM Studio if you prefer)
You can verify the connection by returning to the LLM settings — AnythingLLM will show a green indicator if it can reach the server.
Step 6: Create a workspace
- Back on the main screen, click New Workspace
- Give it a name — for example Personal assistant or Project notes
- Optionally set a system prompt: “You are a helpful assistant. Always respond in English.”
- Click Save
Workspaces are isolated from each other — documents uploaded to one workspace are not visible in another. This lets you keep separate contexts for different projects or topics.
Step 7: Add your own documents (RAG)
This is where the local setup becomes genuinely useful. You can upload your own notes, reports, or reference documents and ask questions about them — all processed locally.
- Inside your workspace, click the paperclip icon or drag and drop a file into the chat area
- Supported formats: .txt, .pdf, .md, .docx, .csv
- AnythingLLM splits the document into chunks and creates embeddings locally — this may take a few seconds depending on file size
- Once indexed, you’ll see the document listed under the workspace
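The splitting step above works on the idea of overlapping chunks: the document is cut into fixed-size pieces that share some text at the boundaries, so a sentence straddling a cut remains retrievable from either side. A simplified character-based sketch (AnythingLLM's actual splitter uses different sizes and token-aware boundaries):

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into overlapping character chunks.

    A simplified stand-in for a RAG document splitter: each chunk
    shares `overlap` characters with its predecessor so content at
    a boundary appears in both neighboring chunks.
    """
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
        if start + chunk_size >= len(text):
            break
    return chunks

doc = "word " * 100  # a 500-character stand-in document
pieces = chunk_text(doc)
print(len(pieces), "chunks")
```

Each chunk is then run through the embedding model to produce a vector, which is what gets stored in the workspace's index.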
Now ask a question directly related to the document content. The model retrieves the relevant chunks and answers using your document as context rather than its training data alone.
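Retrieval itself boils down to comparing the embedding of your question against the stored chunk embeddings and keeping the closest matches. A toy illustration with 3-dimensional vectors (real embeddings have hundreds of dimensions, and production stores use approximate nearest-neighbor indexes rather than a full sort):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_chunks(query_vec, chunk_vecs, k=2):
    """Return indices of the k chunk vectors most similar to the query."""
    scored = sorted(enumerate(chunk_vecs),
                    key=lambda iv: cosine(query_vec, iv[1]),
                    reverse=True)
    return [i for i, _ in scored[:k]]

# Toy 3-dimensional "embeddings" standing in for real chunk vectors.
chunks = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.9, 0.1, 0.0]]
print(top_chunks([1.0, 0.0, 0.0], chunks))
```

The text of the top-ranked chunks is pasted into the model's context window alongside your question — that is the "retrieval" in retrieval-augmented generation.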
Step 8: Test the full stack
With LM Studio running the model server and AnythingLLM connected, try these prompts in order to confirm everything works:
- “What is 2 + 2?” — confirms the LLM connection is working
- Upload a short text file with a few sentences about a topic you choose
- “Based on the document I uploaded, what does it say about [topic]?” — confirms RAG is working
If the model responds but ignores the document, check that the document is actually indexed (green indicator next to the filename in the workspace sidebar) and that the embedding model is active in settings.
Troubleshooting
AnythingLLM won’t start — spawn error on launch
This happens when AnythingLLM is installed under Program Files (All Users). Uninstall and reinstall using the “Current User” option. The app stores its data in AppData\Roaming\anythingllm-desktop, which is only accessible per-user.
LM Studio server starts but AnythingLLM can’t connect
Confirm the base URL in AnythingLLM settings is exactly http://localhost:1234 — no trailing slash. Also confirm the LM Studio server is running (green dot in the Developer tab) and a model is loaded.
Responses are very slow
The model is probably running on CPU instead of GPU. In LM Studio’s Developer tab, check the GPU offload slider — set it to maximum to push as many layers as possible to VRAM. With 6 GB VRAM and a 3B Q4 model you should get 10–20 tokens per second.
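To put those numbers in perspective, a quick calculation shows what generation speed means for response latency (300 tokens is roughly a couple of paragraphs):

```python
def generation_time_s(tokens, tokens_per_second):
    """Seconds needed to generate a response of a given token length."""
    return tokens / tokens_per_second

# A ~300-token answer at the speeds quoted above:
for tps in (10, 20):
    print(f"{tps} tok/s -> {generation_time_s(300, tps):.0f} s")
```

Anything well below 10 tokens per second on a 3B Q4 model is a strong hint that layers are spilling to CPU.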
Model output quality is poor or responses are incoherent
Try a different model. Qwen2.5-3B-Instruct (not the base version) and Llama-3.2-3B-Instruct are good starting points. Avoid base models — you want the instruction-tuned variant for chat use.
Conclusion
LM Studio handles the model layer — download, GPU offload, OpenAI-compatible API. AnythingLLM handles the application layer — workspaces, RAG, document indexing, system prompts. Together they give you a local AI stack that runs entirely on your own hardware, works offline, and keeps your documents private.
The setup takes about 20 minutes. Once it’s running, adding new documents or switching models is a matter of drag-and-drop or a single dropdown selection. From here you can expand the stack with n8n for workflow automation, or explore voice interfaces that connect to the same LM Studio server endpoint.
Featured image by Ales Nesetril on Unsplash.