Get Models for BuddyGenAI
BuddyGenAI uses AI models for chat, image generation, text-to-speech (TTS), and Speech-to-Text. TTS and Speech-to-Text are optional, but a chat model and an image model are required.
Compatibility
You'll need to determine which models your GPU can run. The less VRAM your GPU has, the smaller the model you'll need to use. You may be able to run a larger model than your VRAM comfortably allows, but it will be slower and may crash.
First, check how much VRAM your GPU has. You can do this by opening Task Manager, going to the Performance tab, and clicking on GPU. You'll see a graph of your GPU usage and the amount of VRAM your GPU has. Note how much VRAM you have available.
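If you'd rather check from a script, a minimal sketch like the one below queries VRAM on NVIDIA GPUs. It assumes the `nvidia-smi` utility (installed with the NVIDIA driver) is on your PATH:

```python
import subprocess

# Query GPU name plus total and free VRAM via nvidia-smi (NVIDIA GPUs only).
# Assumes nvidia-smi is on PATH; it ships with the NVIDIA driver.
result = subprocess.run(
    ["nvidia-smi", "--query-gpu=name,memory.total,memory.free",
     "--format=csv,noheader"],
    capture_output=True, text=True, check=True,
)
for line in result.stdout.strip().splitlines():
    name, total, free = (part.strip() for part in line.split(","))
    print(f"{name}: {total} total, {free} free")
```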
The following sections provide VRAM estimates for each model. The list is not exhaustive; it covers the models I've used with the app (on an RTX 3060 12GB).
The app expects the following file extensions for models (a quick validation sketch follows the list):

- Chat Model: `.gguf`
- Image Model: `.safetensors`
- TTS Voice: `.onnx` and an adjacent `.json`
- Speech-to-Text Model: `.bin`
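As a sanity check before launching the app, here's a minimal sketch (the `models` folder path is hypothetical; point it at wherever you keep your files) that flags files with extensions the app doesn't expect:

```python
from pathlib import Path

# Hypothetical models folder; adjust to your own download location.
MODELS_DIR = Path("models")

# File extensions the app expects, per the list above.
EXPECTED = {".gguf", ".safetensors", ".onnx", ".json", ".bin"}

for path in sorted(MODELS_DIR.iterdir()):
    if path.is_file() and path.suffix.lower() not in EXPECTED:
        print(f"Unexpected file type: {path.name}")
```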
Chat Models (Required)
My preferred chat model has been Meta's Llama 3, quantized by bartowski. You can find links to download quantized/smaller versions of the model below.
(The higher the quantization number, e.g. Q8 vs. Q2, the better the output quality, but the more VRAM it will require.)
| Model | File Size | Est. VRAM Usage | Link |
|---|---|---|---|
| Q2_K | 3.17GB | 4.5GB | Download |
| Q4_K_M | 4.92GB | 5.9GB | Download |
| Q8_0 | 8.54GB | 9.1GB | Download |
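If you'd rather script the download, the `huggingface_hub` package (`pip install huggingface_hub`) can fetch a quant directly. The repo id and filename below are assumptions based on bartowski's Llama 3 8B Instruct quants; verify them on the model's Hugging Face page before running:

```python
from huggingface_hub import hf_hub_download

# Repo id and filename are assumptions; confirm them on the
# model's Hugging Face page before running.
path = hf_hub_download(
    repo_id="bartowski/Meta-Llama-3-8B-Instruct-GGUF",
    filename="Meta-Llama-3-8B-Instruct-Q4_K_M.gguf",
    local_dir="models",
)
print(f"Saved to {path}")
```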
Image Model (Required)
I've been using Toonify SD 1.5.
| Model | File Size | Est. VRAM Usage | Link |
|---|---|---|---|
| Toonify SD 1.5 | 1.98GB | 3GB | Download |
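After downloading, you can confirm the `.safetensors` file is intact before pointing the app at it. This sketch uses the `safetensors` package (`pip install safetensors`); the file path is hypothetical:

```python
from safetensors import safe_open

# Hypothetical path; adjust to wherever you saved the model.
MODEL_PATH = "models/toonify_sd15.safetensors"

# safe_open parses the file header, so a truncated or corrupt
# download fails here instead of at image-generation time.
with safe_open(MODEL_PATH, framework="pt") as f:
    print(f"{len(f.keys())} tensors found; header looks valid")
```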
TTS Voices (Optional)
The app uses Piper TTS, which runs on the CPU and is not very resource-intensive. Voices can be found in the Piper project's published voice collections (there are preview samples on the Piper site and downloads on Hugging Face).
When importing a voice, you'll need to download and select both a `.onnx` and a `.json` file. The `.json` file's name should include the full name of the `.onnx` file (e.g. `en_US-lessac-medium.onnx` and `en_US-lessac-medium.onnx.json`).
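To make the naming rule concrete, here's a minimal sketch that checks whether a voice's two files are paired the way the app expects (the file paths are hypothetical):

```python
from pathlib import Path

# Hypothetical voice files; substitute the ones you downloaded.
onnx_path = Path("voices/en_US-lessac-medium.onnx")
json_path = Path("voices/en_US-lessac-medium.onnx.json")

# The .json file's name must include the full .onnx file name,
# e.g. "model.onnx" pairs with "model.onnx.json".
if onnx_path.name in json_path.name:
    print("Voice files are paired correctly")
else:
    print(f"Mismatch: {json_path.name} does not reference {onnx_path.name}")
```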
Speech-to-Text Models (Optional)
I've been using the Distil-Whisper large-v3 model for faster transcription and lower VRAM usage.
To download it, go to the link below, open the "Files and versions" tab, click the file "ggml-distil-large-v3.bin", and then click Download.
| Model | File Size | Est. VRAM Usage |
|---|---|---|
| Distil-Whisper large-v3 | 1.52GB | ??GB |
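The same `huggingface_hub` approach from the chat-model section works here as well. The repo id below is an assumption (ggml builds of Distil-Whisper have been published under the distil-whisper organization); verify it against the "Files and versions" page mentioned above:

```python
from huggingface_hub import hf_hub_download

# Repo id is an assumption; confirm it on the "Files and versions"
# page mentioned above before running.
path = hf_hub_download(
    repo_id="distil-whisper/distil-large-v3-ggml",
    filename="ggml-distil-large-v3.bin",
    local_dir="models",
)
print(f"Saved to {path}")
```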