#SemanticKernel: Local LLMs Unleashed on #RaspberryPi 5
Hi!
Welcome to the exciting world of local Large Language Models (LLMs), where we’re pushing the boundaries of what’s possible with AI.
Today let’s talk about a cool topic: running models locally, especially on devices like the Raspberry Pi 5. Let’s dive into the future of AI, right in our own backyards.
How to Set Up a Local Ollama Inference Server on a Raspberry Pi 5
I’ve already written a couple of times about my own version of the first-time setup for a Raspberry Pi (link). Once the device is ready, setting up Ollama on a Raspberry Pi 5 (or an older model) is a straightforward process. Here’s a quick guide to get you started:
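The exact steps may evolve over time, but assuming a recent 64-bit Raspberry Pi OS, the setup boils down to the official install script plus pulling a model. The model name below is just an example; pick one that fits the Pi’s RAM:

```bash
# Install Ollama with the official install script
curl -fsSL https://ollama.com/install.sh | sh

# Pull a model small enough for the Pi's RAM (llama3 shown as an example)
ollama pull llama3

# Quick local test
ollama run llama3 "Hello from a Raspberry Pi 5!"

# The server listens on localhost:11434 by default. To reach it from other
# machines, bind to all interfaces (or, if the installer registered a systemd
# service, add Environment="OLLAMA_HOST=0.0.0.0" via: sudo systemctl edit ollama)
OLLAMA_HOST=0.0.0.0 ollama serve
```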
How to Use Semantic Kernel to Call a Chat Generation from a Remote Server
Let’s switch gears and write some code. This is a “Hello World” sample using Semantic Kernel and Azure OpenAI Services.
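A minimal sketch of such a sample looks like the following; the deployment name, endpoint, and API key are placeholders for your own Azure OpenAI resource:

```csharp
using Microsoft.SemanticKernel;

// Placeholders: use your own Azure OpenAI deployment name, endpoint, and key.
var builder = Kernel.CreateBuilder();
builder.AddAzureOpenAIChatCompletion(
    deploymentName: "gpt-35-turbo",
    endpoint: "https://<your-resource>.openai.azure.com/",
    apiKey: "<your-api-key>");
var kernel = builder.Build();

// Ask a simple question and print the model's reply to the console
var response = await kernel.InvokePromptAsync("Write a joke about a Raspberry Pi.");
Console.WriteLine(response);
```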
Now, to use a remote LLM like Llama 3 running on a Raspberry Pi, we can register a service on the builder that uses the OpenAI API specification. In the next sample, the only change is the single line that registers the chat completion service.
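As a sketch: recent Semantic Kernel builds let the OpenAI connector target a custom endpoint as an experimental feature (hence the SKEXP0010 pragma below). The host name and model id are placeholders, and depending on your Semantic Kernel version you may need to append /v1 to the endpoint:

```csharp
using Microsoft.SemanticKernel;

#pragma warning disable SKEXP0010 // custom OpenAI-compatible endpoints are experimental

var builder = Kernel.CreateBuilder();

// Register a chat completion service that speaks the OpenAI API spec,
// pointed at the Ollama server on the Raspberry Pi (default port 11434).
// "raspberrypi.local" and "llama3" are placeholders; Ollama ignores the API key.
builder.AddOpenAIChatCompletion(
    modelId: "llama3",
    endpoint: new Uri("http://raspberrypi.local:11434"),
    apiKey: "ollama");

var kernel = builder.Build();

var response = await kernel.InvokePromptAsync("Write a joke about a Raspberry Pi.");
Console.WriteLine(response);
```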
And that does the trick, with just a single line changed!
We also have the question of performance. Adding a Stopwatch gives us a sense of the time elapsed for the call; for this simple prompt, the response arrives in around 30-50 seconds.
Not bad at all for a small device!
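As a sketch, reusing the kernel from the previous sample, the timing can be captured like this:

```csharp
using System.Diagnostics;

// Measure a single chat call end to end
var stopwatch = Stopwatch.StartNew();
var response = await kernel.InvokePromptAsync("Write a joke about a Raspberry Pi.");
stopwatch.Stop();

Console.WriteLine(response);
Console.WriteLine($"Elapsed: {stopwatch.Elapsed.TotalSeconds:F1} seconds");
```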
Conclusion
The advent of local LLM runtimes like Ollama is revolutionizing the way we approach AI, offering unprecedented opportunities for innovation and privacy. Whether you’re a seasoned developer or just starting out, the potential of local AI is immense and waiting for you to explore.
This blog post was generated using information from various online resources, including cheatsheet.md, anakin.ai, and techcommunity.microsoft.com, to provide a comprehensive guide on local LLMs and Ollama.