Technical Specifications
8C1
A 13B decoder-only model tuned for predictable capability and low-latency inference across internal tools and customer-facing products.
Architecture
- Parameters: 13 billion
- Architecture: Decoder-only Transformer
- Layers: 40
- Hidden Size: 5,120
- Attention Heads: 40
- Context Window: 16K tokens
- Vocabulary: 32K BPE tokens
- Training Tokens: 2.1 trillion
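As a rough sanity check, the figures above can be tied together with standard back-of-the-envelope formulas. The sketch below is illustrative only and not part of the published specification. It assumes a 4x MLP expansion, tied input/output embeddings, standard multi-head attention with a 16-bit KV cache, and reads "32K" as 32,000 and "16K" as 16,384. The names `ModelSpec`, `head_dim`, `approx_params`, and `kv_cache_bytes` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelSpec:
    """Figures from the Architecture list; the two marked values are assumptions."""
    layers: int = 40
    hidden_size: int = 5120
    attention_heads: int = 40
    vocab_size: int = 32_000      # assumption: "32K" read as 32,000
    context_tokens: int = 16_384  # assumption: "16K" read as 16,384


def head_dim(spec: ModelSpec) -> int:
    """Per-head dimension: hidden size split evenly across attention heads."""
    return spec.hidden_size // spec.attention_heads


def approx_params(spec: ModelSpec) -> int:
    """Approximate parameter count, ignoring biases and normalization weights.

    Per layer: 4*d^2 for the attention projections plus 8*d^2 for an MLP
    with an assumed 4x expansion. Embeddings are assumed tied (counted once).
    """
    per_layer = 12 * spec.hidden_size ** 2
    embedding = spec.vocab_size * spec.hidden_size
    return spec.layers * per_layer + embedding


def kv_cache_bytes(spec: ModelSpec, bytes_per_value: int = 2) -> int:
    """KV cache size for one full-context sequence.

    Assumes standard multi-head attention (no grouped-query attention) and
    16-bit cache entries: one key and one value vector per layer per token.
    """
    per_token = 2 * spec.layers * spec.hidden_size * bytes_per_value
    return per_token * spec.context_tokens


if __name__ == "__main__":
    spec = ModelSpec()
    print(f"head dim:  {head_dim(spec)}")
    print(f"params:    {approx_params(spec) / 1e9:.2f} B")
    print(f"KV cache:  {kv_cache_bytes(spec) / 2**30:.1f} GiB per full-context sequence")
```

Under these assumptions the estimate comes to about 12.75 billion parameters, which is consistent with the 13 billion listed, and a head dimension of 128. The KV cache figure (about 12.5 GiB per full 16K sequence) would be smaller if the model uses grouped-query attention or a quantized cache, neither of which is stated here.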
Performance
- Time to First Token: 150 to 300 ms*
- Throughput: 50 to 100 tokens/sec
- Infrastructure: A100/H100 GPU clusters
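The time-to-first-token and throughput ranges can be combined into a rough end-to-end response time for a given output length. The sketch below is an estimate under simplifying assumptions, not a measured result. It treats throughput as a per-request decode rate and simply adds time to first token to the generation time. The function names are hypothetical.

```python
def response_time_s(output_tokens: int, ttft_ms: float, tokens_per_sec: float) -> float:
    """Estimated wall-clock seconds: time to first token plus decode time.

    Assumes tokens_per_sec is the decode rate seen by a single request.
    """
    if output_tokens < 0:
        raise ValueError("output_tokens must be non-negative")
    if tokens_per_sec <= 0:
        raise ValueError("tokens_per_sec must be positive")
    return ttft_ms / 1000.0 + output_tokens / tokens_per_sec


def response_time_range_s(output_tokens: int) -> tuple[float, float]:
    """Best and worst case using the published ranges (150-300 ms, 50-100 tokens/sec)."""
    best = response_time_s(output_tokens, ttft_ms=150, tokens_per_sec=100)
    worst = response_time_s(output_tokens, ttft_ms=300, tokens_per_sec=50)
    return best, worst


if __name__ == "__main__":
    best, worst = response_time_range_s(200)
    print(f"200 output tokens: {best:.2f} s to {worst:.2f} s")
```

For a 200-token reply this gives roughly 2.15 to 4.30 seconds. Streaming clients would see the first token after the time-to-first-token interval and the rest arriving incrementally.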
Benchmarks
- MMLU: 49.5%*
- HumanEval: 41.2%*
- GSM8K Math: 43.9%*
- TruthfulQA: 38.5%*
- Code Generation: 44.9%*
*Performance metrics were measured on server infrastructure. Latency includes the network round trip and varies with location and connection quality. Benchmark scores are projected estimates based on model architecture and training approach.