BitNet LLM - Instructions
Getting Started
BitNet is a CPU-based large language model inference engine. This package provides SSH access so you can interact with the model directly.
Requirements
- CPU with AVX2 support (Intel Haswell or AMD Excavator or newer)
- Check support: SSH into your StartOS server and run
grep -o 'avx2' /proc/cpuinfo
If the CPU supports AVX2, this prints avx2 (once per occurrence in the flags lines); no output means the CPU lacks AVX2.
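A minimal one-liner that turns the same check into a yes/no verdict (a sketch assuming a standard Linux /proc filesystem):
# grep -q's exit status tells us whether the avx2 flag is present
if grep -q avx2 /proc/cpuinfo; then echo "AVX2 supported"; else echo "AVX2 NOT supported"; fi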
Configuration
- After installation, go to the Config tab
- Paste your SSH public key into the "SSH Authorized Keys" field
- Save the configuration
- Restart the service
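If you don't have a key pair yet, one common way to generate one on your client machine looks like this (the filename and comment are arbitrary choices, not requirements of this package):
# Generate an ed25519 key pair; the contents of the .pub file go into "SSH Authorized Keys"
ssh-keygen -t ed25519 -f ~/.ssh/bitnet -C "bitnet-access"
cat ~/.ssh/bitnet.pub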
Connecting via SSH
Get your Tor address from the Interfaces tab, then:
ssh -o ProxyCommand="nc -x localhost:9050 %h %p" root@your-bitnet-address.onion
Or via LAN (if configured):
ssh root@your-server-ip -p <bitnet-ssh-port>
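To avoid retyping the ProxyCommand, you can add a host alias to ~/.ssh/config. This sketch assumes Tor's SOCKS proxy listens on localhost:9050 and reuses the key file from the step above:
# ~/.ssh/config
Host bitnet
    HostName your-bitnet-address.onion
    User root
    IdentityFile ~/.ssh/bitnet
    ProxyCommand nc -x localhost:9050 %h %p
After that, connecting is just: ssh bitnet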
Using BitNet
Once connected via SSH:
# Run inference with default model
python3 /BitNet/run_inference.py -m ggml-model-i2_s.gguf -p "Your prompt here"
# See all options
python3 /BitNet/run_inference.py --help
# Use custom model (mount in /root/models)
python3 /BitNet/run_inference.py -m /root/models/your-model.gguf -p "Your prompt"
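As an illustration only: upstream's run_inference.py forwards llama.cpp-style options, so flags such as -n (tokens to generate) and -t (threads) are commonly available, but confirm against --help on your installed version:
# Assumed flags (-n, -t); verify with run_inference.py --help
python3 /BitNet/run_inference.py -m ggml-model-i2_s.gguf -p "Your prompt here" -n 128 -t 4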
Custom Models
- Download GGUF models from Hugging Face
- Upload them via File Browser or SCP to /root/models/
- Reference them in your inference commands
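For example, uploading a model over Tor with SCP could look like this (it reuses the nc ProxyCommand from the SSH step; the model filename is a placeholder):
# Copy a GGUF model into /root/models/ over Tor
scp -o ProxyCommand="nc -x localhost:9050 %h %p" your-model.gguf root@your-bitnet-address.onion:/root/models/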
Performance
This runs entirely on CPU using AVX2 instructions. Performance depends on your server's CPU speed and core count. The default 2B parameter model is lightweight and should run reasonably well on modern hardware.
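Once connected, a quick way to size the thread count is to match it to the available cores (the -t flag is the same assumption noted above):
# Run with one thread per available core
python3 /BitNet/run_inference.py -m ggml-model-i2_s.gguf -p "Hello" -t "$(nproc)"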
Troubleshooting
- Can't connect via SSH: Verify your public key is correctly configured
- "Illegal instruction" errors: Your CPU doesn't support AVX2
- Slow inference: Normal for CPU-based inference; consider a smaller model or faster CPU
Support
- Upstream: https://github.com/microsoft/BitNet
- Container: https://github.com/kth8/bitnet