How to run dozens of AI models on your Mac or PC: no third-party cloud needed

With rapid advancements in artificial intelligence (AI), running sophisticated models like Meta's Llama 3.1 locally on personal computers is becoming increasingly popular. Running a large language model (LLM) on your own PC or Mac gives you a sandbox for experimentation and development that preserves data privacy and allows more flexibility in how you use the model.

Here is a quick guide to help you set up and run Llama 3.1, as well as many other models such as Google's Gemma 2, on Mac, Linux, and Windows. I'll also discuss the benefits of privately hosted models.

Why develop and test against different open-source models?

[Image: Llama 3.1 8B running on Ollama/Open WebUI. Credit: Jason Perlow]

Developing and testing against various open-source models that you host and run privately offers several advantages over relying solely on publicly hosted large language models (LLMs) from providers such as OpenAI, Microsoft Copilot, Meta AI, and Google Gemini.

Data privacy: Publicly hosted LLMs require sending data over the internet, which can raise privacy and security concerns. Running models locally ensures that sensitive data remains on your own hardware.

Customization: Open-source models allow for greater customization. Developers can fine-tune models, adjust hyperparameters, and modify the architecture to better suit specific use cases.

Cost control: Cloud-based AI services can be costly, especially for large-scale applications. Hosting models locally can significantly reduce ongoing API usage and data transfer expenses.

Offline capability: Local models work without an internet connection. This is essential for applications that require high availability and for use in areas with unreliable internet access.

Flexibility and experimentation: Hosting your own models enables you to experiment with different algorithms and configurations, leading to innovative solutions and a deeper understanding of AI technologies.

Freedom from usage policies: Running LLMs locally means you are not bound by the hosted-service usage policies of companies like OpenAI, Microsoft, Meta, and Google. You can use whatever prompts you want and run modified LLMs with restrictions lifted, including models trained on data that these services might restrict.

Introduction to Ollama

Ollama is a versatile, MIT-licensed open-source platform that helps developers and researchers download, run, and manage large language models and other AI models locally on their own hardware. It was developed by a team of AI enthusiasts and engineers who aim to provide tools that ensure data privacy, flexibility, and control over AI applications. Ollama supports a wide range of open models, including the Llama, Gemma, Phi, and Mistral families. This makes it a valuable resource for anyone who wants to explore AI technologies without relying on third-party cloud services.

Here are some example models that can be downloaded:

Model	Parameters	Download size	Command
Llama 3.1	8B	4.7GB	ollama run llama3.1
Llama 3.1	70B	40GB	ollama run llama3.1:70b
Llama 3.1	405B	231GB	ollama run llama3.1:405b
Phi 3 Mini	3.8B	2.3GB	ollama run phi3
Phi 3 Medium	14B	7.9GB	ollama run phi3:medium
Gemma 2	2B	1.6GB	ollama run gemma2:2b
Gemma 2	9B	5.5GB	ollama run gemma2
Gemma 2	27B	16GB	ollama run gemma2:27b
Mistral	7B	4.1GB	ollama run mistral
Moondream 2	1.4B	829MB	ollama run moondream
Neural Chat	7B	4.1GB	ollama run neural-chat
Starling	7B	4.1GB	ollama run starling-lm
Code Llama	7B	3.8GB	ollama run codellama
Llama 2 Uncensored	7B	3.8GB	ollama run llama2-uncensored
LLaVA	7B	4.5GB	ollama run llava
Solar	10.7B	6.1GB	ollama run solar

Per Ollama's GitHub page, you should have at least 8 GB of RAM available to run the 7B models, 16 GB to run the 13B models, and 32 GB to run the 33B models.
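You can apply Ollama's rule of thumb mechanically. The minimal Python sketch below encodes only the three RAM figures quoted above, so treat its answer as a rough starting point. It ignores quantization level, context length, and whatever else is running on the machine. The function name largest_class_for_ram is hypothetical and is not part of Ollama.

```python
from typing import Optional

# Minimum RAM (GB) per model size class, as quoted from Ollama's GitHub page.
# Keys are parameter counts in billions.
RAM_GUIDELINES_GB = {7: 8, 13: 16, 33: 32}


def largest_class_for_ram(ram_gb: float) -> Optional[int]:
    """Return the largest size class (billions of parameters) that fits in ram_gb, or None."""
    fitting = [params for params, needed in RAM_GUIDELINES_GB.items() if ram_gb >= needed]
    return max(fitting) if fitting else None


if __name__ == "__main__":
    for ram in (8, 16, 32, 64):
        print(f"{ram} GB RAM -> models up to about {largest_class_for_ram(ram)}B parameters")
```

The guideline stops at 33B, so a 64GB machine gets the same answer as a 32GB one. The rule simply says nothing about 70B-class models.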

Our test systems

I tested Ollama on M1 Pro and M1 Ultra Macs with 32GB and 64GB of RAM, which are a few generations behind current Apple silicon Macs. Despite this, and without a discrete GPU, I successfully ran the 8B to 10B parameter models of Meta's Llama 3.1 and Google's Gemma 2, as well as various fine-tuned variants from Ollama's library, with better-than-acceptable performance.

However, I experienced significant performance issues with the 70B parameter variant on these systems. I'm confident that more recent hardware can handle these models more efficiently, especially Linux PCs equipped with Nvidia or AMD GPUs.

Step-by-step setup

Download and install Ollama

Go to Ollama's download page and download the installer suitable for your operating system (macOS, Linux, Windows).
Follow the provided installation instructions for your specific operating system.

Load the 8B-parameter Llama 3.1 model

[Image: The Ollama command-line interface with chat functionality. Screenshot by Jason Perlow]

Go to the Llama 3.1 library page on Ollama and copy the command for loading the 8B Llama 3.1 model: ollama run llama3.1:8b
Open a terminal (macOS, Linux) or Command Prompt/PowerShell (Windows), paste the above command, and press Enter.
The first time you run it, Ollama downloads the model (about 4.7GB) and then starts it. You can then type chat queries in the terminal to test the model's functionality. Type /bye to end the session.
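The interactive chat is handy for testing. The same command also accepts a one-off prompt as an argument (ollama run MODEL "prompt"), which answers once and exits, so it is easy to script. Below is a minimal Python sketch. It assumes the ollama binary is on your PATH, and the helper names build_run_command and ask are hypothetical.

```python
from __future__ import annotations

import shutil
import subprocess


def build_run_command(model: str, prompt: str) -> list[str]:
    # Passing a prompt after the model name makes `ollama run` answer once and exit
    # instead of opening the interactive chat.
    return ["ollama", "run", model, prompt]


def ask(model: str, prompt: str) -> str:
    """Send a single prompt to a local model through the Ollama CLI and return its reply."""
    if shutil.which("ollama") is None:
        raise RuntimeError("The ollama command was not found on PATH; install Ollama first.")
    result = subprocess.run(
        build_run_command(model, prompt),
        capture_output=True,  # collect the reply instead of printing it
        text=True,            # decode output as text rather than bytes
        check=True,           # raise CalledProcessError if ollama exits with an error
    )
    return result.stdout.strip()
```

For example, ask("llama3.1:8b", "In one sentence, what is a context window?") returns the model's answer as a string. The first call can take a while if the model still has to be downloaded.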

Manage installed models

List models: Use the command ollama list to see all models installed on your system.
Remove models: To remove a model, use the command ollama rm <model_name>. For example, to remove the 8B parameter Llama 3.1, you would use ollama rm llama3.1:8b.
Add new models: To add a new model, browse the Ollama library. Then use the appropriate ollama run <model_name> command to load it into your system, or ollama pull <model_name> to download it without starting a chat.
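If you experiment with many models, disk space fills up quickly. The Python sketch below wraps ollama list and ollama rm so you can clean up in bulk. The parser assumes the current text layout of ollama list: a header row (NAME, ID, SIZE, MODIFIED) followed by one row per model, with the name in the first column. Check that against your version before relying on it. All helper names are hypothetical.

```python
from __future__ import annotations

import subprocess


def parse_ollama_list(output: str) -> list[str]:
    """Extract model names from the text printed by `ollama list`.

    Assumes a header row followed by one row per model, with the model
    name as the first whitespace-separated field.
    """
    rows = [line for line in output.splitlines() if line.strip()]
    return [row.split()[0] for row in rows[1:]]  # skip the header row


def installed_models() -> list[str]:
    """Run `ollama list` and return the installed model names."""
    out = subprocess.run(["ollama", "list"], capture_output=True, text=True, check=True).stdout
    return parse_ollama_list(out)


def remove_model(name: str) -> None:
    """Delete one model, equivalent to typing `ollama rm <name>`."""
    subprocess.run(["ollama", "rm", name], check=True)


def prune(keep: set[str]) -> list[str]:
    """Remove every installed model not listed in keep; return the names removed."""
    removed = [name for name in installed_models() if name not in keep]
    for name in removed:
        remove_model(name)
    return removed
```

prune({"llama3.1:8b"}) would delete every other installed model, so double-check the keep set before running it.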

Adding a WebUI

Install Docker Desktop

Visit Docker's Get Started page and download Docker Desktop for your operating system (macOS, Linux, Windows).
Follow the installation instructions for your specific operating system, and start Docker after installation.

Install Open WebUI

Open a terminal (macOS, Linux) or Command Prompt/PowerShell (Windows) and run the following command to install Open WebUI:
docker run -d -p 3000:8080 --add-host=host.docker.internal:host-gateway -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:main
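Each flag in that command has a specific job. The sketch below rebuilds the same command as a Python argument list with a comment per flag. This also makes the host port easy to change if 3000 is already in use on your machine. It assumes Docker is installed, and open_webui_docker_args is a hypothetical helper.

```python
from __future__ import annotations


def open_webui_docker_args(host_port: int = 3000) -> list[str]:
    """Build the `docker run` command for Open WebUI as an argument list."""
    return [
        "docker", "run",
        "-d",                                            # run the container in the background
        "-p", f"{host_port}:8080",                       # host port -> port 8080 inside the container
        "--add-host=host.docker.internal:host-gateway",  # lets the container reach Ollama running on the host
        "-v", "open-webui:/app/backend/data",            # named volume keeps accounts and chats across restarts
        "--name", "open-webui",                          # fixed name, so the container is easy to find
        "--restart", "always",                           # start it again if it stops or Docker restarts
        "ghcr.io/open-webui/open-webui:main",            # image to run (the "main" tag)
    ]
```

Passing the list to subprocess.run(open_webui_docker_args(3100), check=True) would start the same container with the UI on port 3100 instead.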

Access the Open WebUI

[Image: Open WebUI running on Docker Desktop. Screenshot by Jason Perlow]

Open Docker Desktop and go to the dashboard.
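Find the Open WebUI container and click the link under Port to open the WebUI in your browser. Because the command above maps host port 3000 to the container, you can also browse directly to http://localhost:3000.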
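Open WebUI can take a little while to start after the container is created, so the page may not load right away. The Python sketch below polls a URL until a web server answers or a timeout expires. It assumes the default port 3000 mapping, and wait_for_url is a hypothetical helper.

```python
import time
import urllib.error
import urllib.request


def wait_for_url(url: str, timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Poll url until a web server answers or timeout seconds pass."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval):
                return True  # any successful response means the server is up
        except urllib.error.HTTPError as err:
            if err.code < 500:
                return True  # a 4xx reply still comes from a running server
        except OSError:
            pass  # connection refused or timed out: not ready yet
        time.sleep(interval)
    return False
```

For example, wait_for_url("http://localhost:3000") returns True once Open WebUI starts answering. It returns False if nothing responds within a minute.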

Create and log in to your Open WebUI account

[Image: Selecting a model in Open WebUI. Screenshot by Jason Perlow]

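If you don't already have an Open WebUI account, create one. The account lives in your local Open WebUI instance, not in a cloud service.
Log in to your account through the WebUI, then choose a downloaded model, such as llama3.1:8b, from the model selector at the top of the chat window.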

Integration with IDEs and APIs

Ollama exposes a local REST API, by default at http://localhost:11434. This lets you integrate it into various Integrated Development Environments (IDEs) and other tools, so you can work with AI models directly from your development workflow. One powerful tool for this integration is Continue, an open-source code assistant that uses the Ollama API.
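Under the hood, tools like Continue talk to Ollama over HTTP. The Python sketch below sends a single, non-streaming request to Ollama's /api/generate endpoint using only the standard library. It assumes Ollama is running on its default address and that llama3.1:8b has already been pulled. The helper names are hypothetical.

```python
from __future__ import annotations

import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # Ollama's default local API address


def build_generate_request(model: str, prompt: str, base_url: str = OLLAMA_URL) -> urllib.request.Request:
    """Prepare a POST to /api/generate asking for one complete, non-streamed reply."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str, base_url: str = OLLAMA_URL, timeout: float = 300.0) -> str:
    """Send the prompt to a local model and return the generated text."""
    request = build_generate_request(model, prompt, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read())
    return body["response"]  # with stream=False the full answer is in the "response" field
```

For example, generate("llama3.1:8b", "Write a Python one-liner that reverses a string.") returns the reply text. Setting "stream" to False gives up incremental output in exchange for a single JSON response, which is usually simpler for short scripts.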

Using Continue for IDE integration

Ensure that Ollama is running and accessible.
Follow the instructions in Ollama's blog post about Continue to install Continue in your preferred IDE.
With Continue and the Ollama API, you can directly leverage AI-powered features like code suggestions, completions, and debugging assistance within your development environment.
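A quick way to confirm that Ollama is running and accessible is to ask it for its installed models. A GET request to /api/tags returns them as JSON. The Python sketch below assumes the default address. The helper names are hypothetical, and local_models returns None when nothing answers.

```python
from __future__ import annotations

import json
import urllib.request
from typing import Optional


def model_names_from_tags(tags: dict) -> list[str]:
    """Pull model names out of an /api/tags response body."""
    return [model["name"] for model in tags.get("models", [])]


def local_models(base_url: str = "http://localhost:11434", timeout: float = 2.0) -> Optional[list[str]]:
    """Return installed model names, or None if the Ollama server does not answer."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            return model_names_from_tags(json.loads(resp.read()))
    except OSError:
        return None  # connection refused, timeout, or HTTP error: treat as not reachable
```

If local_models() returns None, start the Ollama app (or run ollama serve) before configuring Continue.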

Scaling up with powerful GPUs

For more demanding applications, especially those requiring larger models such as the 70B and 405B parameter Llama 3.1 models, running Ollama on a Linux-based system equipped with powerful GPUs is recommended. This setup can handle the computational load and provide faster response times, making it suitable for enterprise-level AI applications.
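Whether a model fits in GPU memory comes down to simple arithmetic on the download sizes in the table above. The Python sketch below estimates how many GPUs of a given memory size a model needs in order to stay entirely in VRAM. The 20% allowance for the context cache and runtime buffers is an assumption, not an Ollama figure. The function name is hypothetical. The sketch also ignores that Ollama can keep part of a model in system RAM and run that part, more slowly, on the CPU.

```python
import math


def gpus_needed(model_size_gb: float, vram_per_gpu_gb: float, overhead: float = 0.20) -> int:
    """Estimate how many GPUs of a given memory size are needed to hold a model in VRAM.

    overhead is an assumed extra fraction for the context (KV) cache and runtime buffers.
    """
    return math.ceil(model_size_gb * (1 + overhead) / vram_per_gpu_gb)


if __name__ == "__main__":
    for name, size_gb in [("llama3.1:8b", 4.7), ("llama3.1:70b", 40), ("llama3.1:405b", 231)]:
        print(f"{name}: {gpus_needed(size_gb, 24)} x 24 GB or {gpus_needed(size_gb, 80)} x 80 GB GPU(s)")
```

By this estimate, the 405B model needs several data-center GPUs, which is why it is out of reach for desktop machines.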

To use GPUs for running Ollama, follow these steps:

For NVIDIA GPUs:

Follow the NVIDIA CUDA documentation instructions to install CUDA and cuDNN on your system.
After installing CUDA and cuDNN, confirm that the driver sees your GPU (nvidia-smi should list it), then run the model as usual. Ollama detects supported GPUs automatically, so no extra flag is needed:
ollama run llama3.1:70b

For AMD GPUs:

Follow the instructions on the ROCm documentation to install ROCm on your system.
After installing ROCm, ensure your environment is configured correctly, then run the model as usual. Ollama uses a supported AMD GPU automatically, with no extra flag:
ollama run llama3.1:70b

With the drivers in place, Ollama loads as much of the model as fits into GPU memory and runs any remainder on the CPU, which is slower. After loading a model, run ollama ps to see how it was split between CPU and GPU. For more detailed instructions, refer to the Ollama GPU documentation.
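The PROCESSOR column of ollama ps reports where a loaded model is running: for example 100% GPU, 100% CPU, or a split such as 48%/52% CPU/GPU. The Python sketch below parses that value from a line of ollama ps output, which is useful in a script that warns when a large model spills onto the CPU. The output format is assumed from current Ollama releases and may change. The function name and the sample lines are illustrative.

```python
import re
from typing import Optional


def gpu_percent(ps_line: str) -> Optional[int]:
    """Return the share of a model held on the GPU, parsed from a line of `ollama ps` output.

    Assumes the PROCESSOR column reads "100% GPU", "100% CPU", or "48%/52% CPU/GPU".
    Returns None if no PROCESSOR value is found (for example, on the header line).
    """
    # Check the split form first: it also contains text like "52% CPU" that the
    # single-processor pattern would otherwise match.
    split = re.search(r"(\d+)%/(\d+)% CPU/GPU", ps_line)
    if split:
        return int(split.group(2))
    single = re.search(r"(\d+)% (CPU|GPU)\b", ps_line)
    if single:
        percent = int(single.group(1))
        return percent if single.group(2) == "GPU" else 100 - percent
    return None
```

To check every loaded model, feed it each line of subprocess.run(["ollama", "ps"], capture_output=True, text=True).stdout.splitlines().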

Running Ollama in a Docker container

You can still leverage GPU support if you prefer running Ollama in a container. Please note that these instructions apply only to Linux for now.

For NVIDIA GPUs with Docker

As per the previous section, install CUDA and cuDNN on your system. Then, follow the instructions in the NVIDIA Container Toolkit documentation to install the NVIDIA Container Toolkit on your system.
Use the following command to run Ollama with NVIDIA GPU support in a Docker container:
docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama

For AMD GPUs with Docker

Follow the instructions on the ROCm documentation to install ROCm on your system.
Use the following command to run Ollama with ROCm support in a Docker container:
docker run -d --device /dev/kfd --device /dev/dri -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama:rocm

These commands give the Docker container access to all available GPUs on your system, providing the computational power needed to run large models. Once the container is running, start a model inside it with docker exec -it ollama ollama run llama3.1:70b. For more information on using GPUs with Docker and Ollama, refer to the Docker page on using GPUs with Ollama.

Conclusion

Running AI models such as Meta's Llama 3.1 locally on your Mac or PC provides numerous benefits, including improved data privacy, greater customization, and cost savings. By following the steps in this guide, you can use advanced AI models and test different configurations to meet your requirements. Whether you are a developer, researcher, or AI enthusiast, the ability to run complex models locally opens up many opportunities.
