Latest Research
Discover our cutting-edge research in efficient AI models and language processing
Spectra-1
Surprising Effectiveness of Pretraining Ternary Language Models at Scale
Spectra introduces the first open suite of low-bitwidth LLMs, spanning TriLMs, QuantLMs, and FloatLMs from 99M to 3.9B parameters. TriLMs are pretrained ternary models that outperform traditional quantized and floating-point models at scale. The 3.9B TriLM matches the performance of its FloatLM counterpart with far fewer bits, enabling efficient inference. This work pushes the frontier of memory-efficient, scalable language models.
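Pretraining a ternary model means constraining each weight to {-1, 0, +1}. As an illustration only, one common ternarization rule is absmean rounding (popularized by BitNet b1.58); Spectra's exact training recipe may differ, and the function name below is hypothetical:

```python
import numpy as np

def ternarize(w: np.ndarray):
    """Map float weights to ternary codes {-1, 0, +1} plus a scale.

    Illustrative absmean rule (an assumption, not necessarily
    Spectra's): scale by the mean absolute weight, round, clip.
    """
    scale = np.abs(w).mean()
    q = np.clip(np.round(w / scale), -1, 1)
    return q.astype(np.int8), scale
```

The dequantized approximation is simply `q * scale`, so a ternary layer stores one float scale plus ~1.58 bits of information per weight.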
54 multilingual models from 99M to 3.9B parameters
TriLMs: compact, fast, high-performing
Powerful AI on low-resource devices
Researchers
Tejas Vaidhya, Ayush Kaushal, Arnab Kumar Mondal, Tejas Pandey, Aaryan Bhagat, Irina Rish
Hi-NOLIN
Bridging the English-Hindi Language Gap in Open-Source AI
The best open-source Hindi-English LLM of its size
Extends the model's capabilities to a new language while boosting English and code performance.
Researchers
Tejas Vaidhya, Ayush Kaushal, Irina Rish
Spectra-1.1
Scaling Laws and Efficient Inference for Ternary Language Models
Spectra-1.1 introduces a suite of ternary language models (TriLMs) trained on up to 1.2 trillion tokens. These models use quantization-aware training and novel bit-packing schemes to dramatically cut memory use. The results demonstrate that ternary language models offer superior scaling behavior, providing valuable insights into efficient low-bitwidth language models.
TriLMs trained on 1.2T tokens
Up to 5× faster inference with TriRun
Novel 1.6- and 2-bit packing schemes
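Since a ternary weight carries only three states, it fits in two bits, letting four weights share one byte. The sketch below shows a minimal 2-bit packing scheme; the layout and function names are assumptions for illustration, not the packing actually used in Spectra-1.1:

```python
import numpy as np

# Hypothetical 2-bit packing for ternary weights {-1, 0, +1}:
# four weights per byte, little-endian within the byte.

def pack_ternary(weights: np.ndarray) -> np.ndarray:
    """Pack a flat int array of ternary weights (length a multiple
    of 4) into one uint8 per four weights."""
    codes = (weights + 1).astype(np.uint8)           # {-1,0,1} -> {0,1,2}
    codes = codes.reshape(-1, 4)
    shifts = np.array([0, 2, 4, 6], dtype=np.uint8)  # bit position per slot
    return (codes << shifts).sum(axis=1).astype(np.uint8)

def unpack_ternary(packed: np.ndarray) -> np.ndarray:
    """Invert pack_ternary, recovering int8 weights in {-1, 0, +1}."""
    shifts = np.array([0, 2, 4, 6], dtype=np.uint8)
    codes = (packed[:, None] >> shifts) & 0b11
    return codes.reshape(-1).astype(np.int8) - 1
```

Against 16-bit floats this is an 8× reduction in weight memory; tighter 1.6-bit schemes exploit the fact that a ternary value needs only log2(3) ≈ 1.58 bits.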
Researchers
Tejas Vaidhya, Ayush Kaushal, Arnab Kumar Mondal, Tejas Pandey, Aaryan Bhagat, Irina Rish
More Publications
LoRD: Low Rank Decomposition of Monolingual Code LLMs for One-Shot Compression
September 2023
Researchers
Vaidhya, Kaushal, Rish
Ternary LLMs are more Performant than Quantized FP16 LLMs
September 2023
Researchers
Kaushal, Vaidhya, Pandey, Bhagat, Rish
Lag-Llama: Towards Foundation Models for Probabilistic Time Series Forecasting
September 2024
Lag-Llama is a general-purpose foundation model for univariate probabilistic time series forecasting, built on a decoder-only transformer using lagged values as covariates. Pretrained on a diverse corpus of time series data, it shows strong zero-shot generalization and achieves state-of-the-art performance when fine-tuned on small amounts of unseen data. Lag-Llama sets a new benchmark for foundation models in time series forecasting.
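Because Lag-Llama conditions on lagged values of the series as covariates, each input token bundles the current value with values from a fixed set of past offsets. A minimal sketch of building such lag features (the lag set here is an arbitrary assumption for illustration, not the one used by the actual model):

```python
import numpy as np

def lag_features(series: np.ndarray, lags=(1, 2, 3, 7)) -> np.ndarray:
    """Build lagged-value covariates for a univariate series.

    Returns an array of shape (T - max(lags), len(lags)) where row t
    holds the value at absolute time (max(lags) + t) - lag for each lag.
    The first max(lags) steps are dropped since their lags are undefined.
    """
    m = max(lags)
    return np.stack(
        [series[m - lag : len(series) - lag] for lag in lags], axis=1
    )
```

In practice such lag sets are chosen to cover seasonalities of the data's frequency (e.g. daily data might include lags at 1, 7, and 365 steps).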
Researchers
Rasul, Ashok, Williams, Ghonia, Bhagwatkar, Khorasani, Bayazi, Adamopoulos, Riachi, Hassen, Biloš, Garg, Schneider, Chapados, Drouin, Zantedeschi, Nevmyvaka, Rish
What do tokens know about their characters and how do they know it?
September 2023
Researchers
Kaushal, Mahowald
Efficient Encoders for Streaming Sequence Tagging
2023
Researchers
Kaushal, Gupta, Upadhyay, Faruqui
Efficient Encoders for Incremental Sequence Tagging
2023
Researchers
Gupta, Kaushal, Faruqui, Upadhyay
© 2025 Nolano AI. All rights reserved.