Beginner Tutorial: Using GPUStack to Aggregate GPUs and Run LLMs


What is GPUStack?

GPUStack is an open-source GPU cluster manager for running large language models (LLMs). It lets you build a unified cluster from GPUs of any brand across Apple MacBooks, Windows PCs, and Linux servers. Administrators can deploy LLMs from popular repositories such as Hugging Face, and developers can then access those LLMs as easily as they would public LLM services from vendors like OpenAI or Microsoft Azure.

For more details about GPUStack, visit:

Introducing GPUStack: https://gpustack.ai/introducing-gpustack

GitHub repo: https://github.com/gpustack/gpustack

User guide: https://docs.gpustack.ai

 

Getting Started with GPUStack

GPUStack requires Python 3.10 or later.

Installation

Linux or macOS

GPUStack provides a script to install it as a service on systemd- or launchd-based systems. To install GPUStack using this method, execute:
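A sketch of the install command, based on GPUStack's documented script install (verify the URL and flags against the current docs before running):

```shell
# Download and run the GPUStack install script;
# it sets up GPUStack as a systemd (Linux) or launchd (macOS) service.
curl -sfL https://get.gpustack.ai | sh -s -
```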

Now you have deployed and started the GPUStack server, which also serves as the first worker node. You can access the GPUStack UI at http://myserver (replace myserver with the IP address or domain name of the host where you installed GPUStack).

Log in to GPUStack with username admin and the default password. You can run the following command to get the password for the default setup:
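For example, assuming the default data directory used by the install script:

```shell
# Print the auto-generated admin password created during installation
cat /var/lib/gpustack/initial_admin_password
```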

To add additional worker nodes and form a GPUStack cluster, please run the following command on each worker node:
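A sketch of the worker-install command, assuming the same install script with server URL and token flags as described in the GPUStack docs:

```shell
# Install GPUStack on this node as a worker and register it with the server
curl -sfL https://get.gpustack.ai | sh -s - --server-url http://myserver --token mytoken
```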

Replace http://myserver with your GPUStack server URL and mytoken with your secret token for adding workers. To retrieve the token in the default setup from the GPUStack server, use the following command:
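For example, assuming the default data directory on the server:

```shell
# Print the token that workers use to register with this server
cat /var/lib/gpustack/token
```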

Or follow the instructions on GPUStack to add workers:


 

Windows

Run PowerShell as administrator, then run the following command to install GPUStack:
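A sketch of the PowerShell install command, based on GPUStack's documented script install (verify against the current docs):

```powershell
# Download and run the GPUStack install script (requires an administrator shell);
# it sets up GPUStack as a Windows service.
Invoke-Expression (Invoke-WebRequest -Uri "https://get.gpustack.ai" -UseBasicParsing).Content
```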

You can access the GPUStack UI at http://myserver (replace myserver with the IP address or domain name of the host where you installed GPUStack).

Log in to GPUStack with username admin and the default password. You can run the following command to get the password for the default setup:
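For example, assuming the default data directory used by the Windows install script:

```powershell
# Print the auto-generated admin password created during installation
Get-Content -Path "$env:APPDATA\gpustack\initial_admin_password" -Raw
```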

Optionally, you can add extra workers to form a GPUStack cluster by running the following command on other nodes:
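A sketch of the Windows worker-install command, assuming the install script accepts the same server URL and token flags as on Linux/macOS:

```powershell
# Install GPUStack on this node as a worker and register it with the server
Invoke-Expression "& { $((Invoke-WebRequest -Uri 'https://get.gpustack.ai' -UseBasicParsing).Content) } --server-url http://myserver --token mytoken"
```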

In the default setup, you can run the following to get the token used for adding workers:
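For example, assuming the default data directory on the server:

```powershell
# Print the token that workers use to register with this server
Get-Content -Path "$env:APPDATA\gpustack\token" -Raw
```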

For other installation scenarios, please refer to our installation documentation at: https://docs.gpustack.ai/docs/quickstart

 

Serving LLMs

As an LLM administrator, you can log in to GPUStack as the default system admin, navigate to Resources to monitor your GPU status and capacity, and then go to Models to deploy any open-source LLM into the GPUStack cluster. This lets you provide these LLMs to regular users for integration into their applications, helping you efficiently utilize your existing resources and deliver stable LLM services for various needs and scenarios.

  1. Access GPUStack to deploy the LLMs you need. Choose models from Hugging Face (only GGUF format is currently supported) or Ollama Library, download them to your local environment, and run the LLMs:


 

  2. GPUStack will automatically schedule the model to run on the appropriate worker:


 

  3. You can manage and maintain LLMs by checking API requests, token consumption, token throughput, resource utilization status, and more. This helps you decide whether to scale up or upgrade LLMs to ensure service stability.


 

Integrating with your applications

As an AI application developer, you can log in to GPUStack as a regular user and navigate to Playground from the menu. Here, you can interact with the LLM using the UI playground.


 

Next, visit API Keys to generate and save your API key. Return to Playground to customize your LLM by adjusting the system prompt, adding few-shot learning examples, or tuning inference parameters. When you're done, click View Code and select your preferred code format (curl, Python, Node.js) along with the API key. Use this code in your applications to enable communication with your private LLMs.

You can now access the OpenAI-compatible API. For example, using curl:
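A hypothetical request against GPUStack's OpenAI-compatible endpoint; the model name llama3 is a placeholder for whatever you deployed, and YOUR_GPUSTACK_API_KEY stands in for the key generated above (confirm the endpoint path in the View Code dialog):

```shell
# Chat completion request to the OpenAI-compatible API served by GPUStack
curl http://myserver/v1-openai/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_GPUSTACK_API_KEY" \
  -d '{
    "model": "llama3",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Hello!"}
    ],
    "stream": true
  }'
```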

 

Manage GPUStack

For macOS

In macOS, GPUStack runs as a launchd service. Use launchctl to manage the GPUStack service:

  • View the configuration
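A likely form, assuming the install script registers the service as a launchd daemon plist (verify the filename on your system):

```shell
# Show the launchd service definition for GPUStack
cat /Library/LaunchDaemons/ai.gpustack.plist
```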

 

  • Stop service
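Assuming the launchd service label ai.gpustack used by the install script:

```shell
# Unload (stop) the GPUStack launchd service
sudo launchctl bootout system/ai.gpustack
```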

 

  • Start service
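Assuming the same plist path as above:

```shell
# Load (start) the GPUStack launchd service
sudo launchctl bootstrap system /Library/LaunchDaemons/ai.gpustack.plist
```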

 

  • Edit configuration and restart service
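A sketch, assuming the launchd plist path and service label above; substitute your preferred editor:

```shell
# Edit the service definition, then reload it for the change to take effect
sudo vim /Library/LaunchDaemons/ai.gpustack.plist
sudo launchctl bootout system/ai.gpustack
sudo launchctl bootstrap system /Library/LaunchDaemons/ai.gpustack.plist
```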

 

  • View logs

You can view GPUStack logs using the following path and command:
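Assuming the default log location used by the install script:

```shell
# Follow the GPUStack service log
tail -200f /var/log/gpustack.log
```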

 

  • Uninstall

Run the following command to uninstall GPUStack:
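The install script drops an uninstall script in the data directory; assuming the default path:

```shell
# Remove the GPUStack service and its data
sudo /var/lib/gpustack/uninstall.sh
```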

 

For Linux

In Linux, GPUStack runs as a systemd service. Use systemctl to manage the GPUStack service:

  • View the configuration
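Assuming the service unit is named gpustack, as created by the install script:

```shell
# Show the systemd unit file(s) for GPUStack
sudo systemctl cat gpustack
```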

 

  • Stop service
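Assuming the service name gpustack:

```shell
# Stop the GPUStack systemd service
sudo systemctl stop gpustack
```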

 

  • Start service
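Assuming the service name gpustack:

```shell
# Start the GPUStack systemd service
sudo systemctl start gpustack
```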

 

  • Edit configuration and restart service
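A sketch, assuming the unit file lives at the default path created by the install script; substitute your preferred editor:

```shell
# Edit the unit file, reload systemd, and restart the service
sudo vim /etc/systemd/system/gpustack.service
sudo systemctl daemon-reload
sudo systemctl restart gpustack
```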

 

  • View logs

You can view GPUStack logs using the following path and command:
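Assuming the default log location used by the install script:

```shell
# Follow the GPUStack service log
tail -200f /var/log/gpustack.log
```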

 

  • Uninstall

Run the following command to uninstall GPUStack:
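The install script drops an uninstall script in the data directory; assuming the default path:

```shell
# Remove the GPUStack service and its data
sudo /var/lib/gpustack/uninstall.sh
```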

 

For Windows

In Windows, you can use PowerShell to manage the GPUStack service:

  • View the configuration
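Since the Windows service is managed with nssm (as the edit step below indicates), a likely form, assuming the service name GPUStack:

```powershell
# Dump the nssm service configuration for GPUStack
nssm dump GPUStack
```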

 

  • Stop service
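Assuming the Windows service name GPUStack:

```powershell
# Stop the GPUStack Windows service
Stop-Service -Name "GPUStack"
```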

 

  • Start service
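Assuming the Windows service name GPUStack:

```powershell
# Start the GPUStack Windows service
Start-Service -Name "GPUStack"
```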

 

  • Edit the configuration using nssm and restart the service

Restart the service after editing the configuration:
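A sketch, assuming the service name GPUStack; nssm edit opens an interactive dialog for the service settings:

```powershell
# Open the nssm configuration editor, then restart the service
nssm edit GPUStack
Restart-Service -Name "GPUStack"
```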

 

  • View logs

You can view GPUStack logs using the following path and command:
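Assuming the default log location under the GPUStack data directory (verify the path on your system):

```powershell
# Follow the GPUStack service log
Get-Content -Path "$env:APPDATA\gpustack\log\gpustack.log" -Tail 200 -Wait
```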

 

  • Uninstall

Run the following PowerShell command to uninstall GPUStack:
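The install script drops an uninstall script in the data directory; a hypothetical invocation, assuming the default path (verify the actual location in your install):

```powershell
# Remove the GPUStack service and its data
& "$env:APPDATA\gpustack\uninstall.ps1"
```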

 

Join Our Community

Please find more information about GPUStack at: https://gpustack.ai.

If you encounter any issues or have suggestions for GPUStack, feel free to join our Community for support from the GPUStack team and to connect with fellow users globally.

We are actively enhancing the GPUStack project and plan to introduce new features in the near future, including support for multimodal models, additional accelerators like AMD ROCm or Intel oneAPI, and more inference engines. Before getting started, we encourage you to follow and star our project on GitHub at gpustack/gpustack to receive instant notifications about all future releases. We welcome your contributions to the project.

 

About Us

GPUStack is brought to you by Seal, Inc., a team dedicated to enabling AI access for all. Our mission is to enable enterprises to use AI to conduct their business, and GPUStack is a significant step towards achieving that goal.

Quickly build your own LLMaaS platform with GPUStack! Start experiencing the ease of creating GPU clusters locally, running and using LLMs, and integrating them into your applications.
