# BitNet LLM - Instructions

## Getting Started

BitNet is a CPU-based large language model inference engine. This package provides SSH access so you can interact with the model directly.

### Requirements

- **CPU with AVX2 support** (Intel Haswell, AMD Excavator, or newer)
- Check support: SSH into your StartOS server and run `grep -o 'avx2' /proc/cpuinfo` (or use the quick check below)
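If you'd rather get a plain yes/no verdict than raw `grep` output, a one-liner like the following works. This is just a convenience sketch around the same `/proc/cpuinfo` check; nothing in it is specific to BitNet:

```bash
# Report AVX2 support in plain terms (same /proc/cpuinfo source as above)
if grep -qm1 avx2 /proc/cpuinfo; then
  echo "AVX2 supported: BitNet should run"
else
  echo "No AVX2: expect 'Illegal instruction' errors"
fi
```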

### Configuration

1. After installation, go to the **Config** tab
2. Paste your SSH public key into the "SSH Authorized Keys" field (if you don't have a key pair yet, see the example after this list)
3. Save the configuration
4. Restart the service
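If you don't already have an SSH key pair on your client machine, the standard OpenSSH tooling generates one. The comment string below is arbitrary, and any modern key type works:

```bash
# Generate an ed25519 key pair (press Enter to accept the default path)
ssh-keygen -t ed25519 -C "bitnet-access"

# Print the public half; paste this single line into "SSH Authorized Keys"
cat ~/.ssh/id_ed25519.pub
```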

### Connecting via SSH

Get your Tor address from the **Interfaces** tab, then:

```bash
ssh -o ProxyCommand="nc -x localhost:9050 %h %p" root@your-bitnet-address.onion
```

Or via LAN (if configured), substituting the SSH port the service exposes:

```bash
ssh root@your-server-ip -p <port>
```
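To avoid retyping the Tor proxy options, you can add a host alias to `~/.ssh/config` on your client. The alias name `bitnet` is arbitrary, and the `.onion` address is the same placeholder as above:

```
Host bitnet
    HostName your-bitnet-address.onion
    User root
    ProxyCommand nc -x localhost:9050 %h %p
```

After that, `ssh bitnet` connects through Tor without extra flags.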

### Using BitNet

Once connected via SSH:

```bash
# Run inference with default model
python3 /BitNet/run_inference.py -m ggml-model-i2_s.gguf -p "Your prompt here"

# See all options
python3 /BitNet/run_inference.py --help

# Use custom model (mount in /root/models)
python3 /BitNet/run_inference.py -m /root/models/your-model.gguf -p "Your prompt"
```
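For repeated prompts, a small wrapper script saves retyping the model path. This is a sketch, not part of the package: the script name and the `MODEL` default are assumptions, and `-m`/`-p` are the flags shown above (check `--help` for the rest):

```bash
#!/bin/sh
# /root/ask.sh - tiny convenience wrapper around the bundled inference script.
# Usage: ./ask.sh "Your prompt here"
# MODEL defaults to the bundled model; override it to use a custom one.
MODEL="${MODEL:-ggml-model-i2_s.gguf}"
exec python3 /BitNet/run_inference.py -m "$MODEL" -p "$1"
```

Make it executable with `chmod +x /root/ask.sh`, then run something like `MODEL=/root/models/your-model.gguf ./ask.sh "Hello"`.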

### Custom Models

1. Download GGUF models from Hugging Face
2. Upload them via File Browser or SCP to `/root/models/` (an SCP example follows this list)
3. Reference them in your inference commands
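Step 2 over Tor can reuse the same proxy trick as the SSH command above; the filename and `.onion` address are placeholders:

```bash
# Copy a downloaded GGUF model into /root/models/ over Tor
scp -o ProxyCommand="nc -x localhost:9050 %h %p" \
    your-model.gguf root@your-bitnet-address.onion:/root/models/
```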

### Performance

This runs entirely on CPU using AVX2 instructions. Performance depends on your server's CPU speed and core count. The default 2B-parameter model is lightweight and should run reasonably well on modern hardware.
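To get a rough feel for your hardware before trying longer prompts, you can time a short run with the shell's `time` builtin; treat the result as a ballpark, since output speed varies widely with core count:

```bash
# Rough single-run benchmark; wall-clock time dominates on CPU
time python3 /BitNet/run_inference.py -m ggml-model-i2_s.gguf -p "Hello"
```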

## Troubleshooting

- **Can't connect via SSH**: Verify your public key is correctly configured (verbose client output helps; see below)
- **"Illegal instruction" errors**: Your CPU doesn't support AVX2
- **Slow inference**: Normal for CPU-based inference; consider a smaller model or faster CPU
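For the first item, running the SSH client in verbose mode shows whether your key is being offered and why the server rejects it:

```bash
# -v prints the key exchange and authentication attempts step by step
ssh -v -o ProxyCommand="nc -x localhost:9050 %h %p" root@your-bitnet-address.onion
```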

## Support

- Upstream: https://github.com/microsoft/BitNet
- Container: https://github.com/kth8/bitnet