Discover our cutting-edge research in efficient AI models and language processing
Spectra-1
Surprising Effectiveness of Pretraining Ternary Language Models at Scale
Spectra introduces the first open suite of low-bitwidth LLMs, including TriLMs, QuantLMs, and FloatLMs, from 99M to 3.9B parameters. TriLMs are pretrained ternary models that outperform traditional quantized and floating-point models at scale. The 3.9B TriLM matches the performance of its FloatLM counterpart with far fewer bits, enabling efficient inference. This work pushes the frontier of memory-efficient, scalable language models.
54 multilingual models from 99M to 3.9B parameters
TriLMs: compact, fast, high-performing.
Powerful AI on low-resource devices.
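To make the ternary idea concrete, here is a minimal Python/NumPy sketch of mapping a floating-point weight matrix to values in {-1, 0, +1} with a single scale. The absmean-style threshold and the `ternarize`/`dequantize` helpers are illustrative assumptions, not the exact procedure used to train TriLMs.

```python
# Minimal sketch of ternary weights: every weight is one of {-1, 0, +1},
# plus a single floating-point scale per matrix. The absmean-style rule
# below is illustrative only, not the exact TriLM training recipe.
import numpy as np

def ternarize(w: np.ndarray) -> tuple[np.ndarray, float]:
    """Map a float weight matrix to ternary codes and one scale."""
    scale = float(np.abs(w).mean())                   # one scale per matrix
    q = np.clip(np.round(w / (scale + 1e-8)), -1, 1)  # values in {-1, 0, +1}
    return q.astype(np.int8), scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Reconstruct an approximate float matrix from ternary codes."""
    return q.astype(np.float32) * scale

w = np.random.randn(4, 4).astype(np.float32)
q, s = ternarize(w)
print(q)                                    # entries in {-1, 0, +1}
print(np.abs(w - dequantize(q, s)).mean())  # mean reconstruction error
```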
Hi-NOLIN
Bridging the English-Hindi Language Gap in Open-Source AI
Extends capabilities to a new language while boosting English and code performance.
Spectra-1.1
Scaling Laws and Efficient Inference for Ternary Language Models
Spectra-1.1 introduces a suite of ternary language models (TriLMs) trained on up to 1.2 trillion tokens, demonstrating that ternary models offer superior scaling behavior and providing valuable insights into efficient low-bitwidth language models. These models use quantization-aware training and novel bit-packing schemes to dramatically cut memory use.
1.2T token-trained TriLMs
Up to 5× faster inference with TriRun
Novel 1.6- and 2-bit packing schemes
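As an illustration of how ternary values can be stored below 8 bits each, the sketch below packs four values per byte (2 bits each) and five values per byte (1.6 bits each, since 3^5 = 243 fits in one byte). The `pack_2bit` and `pack_1p6bit` helpers are assumptions for illustration; they are not the actual TriRun kernels or the paper's exact layouts.

```python
# Illustrative packing of ternary weights below 8 bits per value.
# 2-bit: four ternary values per byte (base-4 digits).
# ~1.6-bit: five ternary values per byte (base-3, since 3**5 = 243 <= 256).
# Both assume the input length is divisible by 4 or 5 (pad in practice).
import numpy as np

def pack_2bit(t: np.ndarray) -> np.ndarray:
    """Pack ternary values (-1/0/+1) as 2-bit codes, four per byte."""
    c = (t + 1).astype(np.uint8).reshape(-1, 4)   # {-1,0,1} -> {0,1,2}
    return (c[:, 0] | (c[:, 1] << 2) | (c[:, 2] << 4) | (c[:, 3] << 6)).astype(np.uint8)

def pack_1p6bit(t: np.ndarray) -> np.ndarray:
    """Pack five ternary values per byte as a base-3 number (1.6 bits/value)."""
    c = (t + 1).astype(np.uint8).reshape(-1, 5)
    weights = np.array([1, 3, 9, 27, 81], dtype=np.uint8)
    return (c * weights).sum(axis=1).astype(np.uint8)   # max 2*121 = 242 < 256

t = np.random.randint(-1, 2, size=20)                   # 20 ternary weights
print(pack_2bit(t).nbytes, "bytes at 2 bits/value")     # 5 bytes
print(pack_1p6bit(t).nbytes, "bytes at 1.6 bits/value") # 4 bytes
```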
LoRD: Low Rank Decomposition of Monolingual Code LLMs for One-Shot Compression
This paper demonstrates efficient LLM compression using Low Rank Decomposition, allowing code LLMs to be compressed by up to 39.58% with minimal performance loss. This method provides an effective approach to reducing model size while maintaining code generation capabilities.
Sep 2023
Researchers: Vaidhya, Kaushal, Rish
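To illustrate the low-rank idea behind LoRD, here is a minimal NumPy sketch that replaces an m×n weight matrix with two rank-r factors via truncated SVD, cutting parameters from m·n to r·(m+n). The rank choice and the `low_rank_factors` helper are assumptions for illustration, not the paper's exact layer-wise compression procedure.

```python
# Minimal sketch of low-rank weight compression via truncated SVD:
# an (m x n) matrix is replaced by thin factors A (m x r) and B (r x n),
# so parameters drop from m*n to r*(m + n). Illustrative only.
import numpy as np

def low_rank_factors(w: np.ndarray, rank: int) -> tuple[np.ndarray, np.ndarray]:
    """Return factors A and B with A @ B approximating w."""
    u, s, vt = np.linalg.svd(w, full_matrices=False)
    a = u[:, :rank] * s[:rank]    # fold singular values into the left factor
    b = vt[:rank, :]
    return a, b

w = np.random.randn(512, 512).astype(np.float32)
a, b = low_rank_factors(w, rank=128)
orig, compressed = w.size, a.size + b.size
print(f"params: {orig} -> {compressed} ({1 - compressed / orig:.1%} fewer)")
```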
Ternary LLMs are more Performant than Quantized FP16 LLMs
We introduce TriLM, a family of pretrained ternary language models that are both compact and high-performing. TriLMs outperform their quantized counterparts and rival full-precision models at larger scales. Our findings show that ternary models not only offer superior efficiency in terms of bit-level size but also maintain strong performance on knowledge benchmarks—establishing TriLM as a compelling choice for efficient LLM deployment.
Jul 2024
Researchers: Kaushal, Vaidhya, Pandey, Bhagat, Rish
Lag-Llama: Towards Foundation Models for Probabilistic Time Series Forecasting
Lag-Llama is a general-purpose foundation model for univariate probabilistic time series forecasting, built on a decoder-only transformer using lagged values as covariates. Pretrained on a diverse corpus of time series data, it shows strong zero-shot generalization and achieves state-of-the-art performance when fine-tuned on small amounts of unseen data. Lag-Llama sets a new benchmark for foundation models in time series forecasting.
Oct 2023
Researchers: Rasul, Ashok, Williams, Ghonia, Bhagwatkar, Khorasani, Bayazi, Adamopoulos, Riachi, Hassen, Biloš, Garg, Schneider, Chapados, Drouin, Zantedeschi, Nevmyvaka, Rish
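The sketch below illustrates the lagged-covariate idea in isolation: each time step is paired with the series values at a fixed set of past offsets, which a decoder-only model can then consume as input features. The lag set and the `build_lag_features` helper are made-up examples, not Lag-Llama's actual configuration.

```python
# Sketch of "lagged values as covariates": represent each time step by the
# values of the series at a fixed set of past offsets. Illustrative lag set.
import numpy as np

LAGS = [1, 2, 3, 7, 14, 28]   # hypothetical daily/weekly-style lag offsets

def build_lag_features(series: np.ndarray) -> np.ndarray:
    """Return an array of shape (T - max_lag, len(LAGS)) of lagged covariates."""
    max_lag = max(LAGS)
    rows = [[series[t - lag] for lag in LAGS] for t in range(max_lag, len(series))]
    return np.asarray(rows)

y = np.sin(np.arange(100) / 5.0)   # toy univariate series
x = build_lag_features(y)
print(x.shape)                     # (72, 6): one row of lag covariates per step
```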
What do tokens know about their characters and how do they know it?
Jun 2022
Researchers: Kaushal, Mahowald
Efficient Encoders for Streaming Sequence Tagging
Jan 2023
Researchers: Kaushal, Gupta, Upadhyay, Faruqui
© 2025 Nolano AI. All rights reserved.