Latest Research

Discover our cutting-edge research in efficient AI models and language processing

Spectra-1

ICLR 2025 Spotlight

Surprising Effectiveness of Pretraining Ternary Language Models at Scale

Spectra introduces the first open suite of low-bitwidth LLMs, including TriLMs, QuantLMs, and FloatLMs, spanning 99M to 3.9B parameters. TriLMs are pretrained ternary models that outperform traditional quantized and floating-point models at scale. The 3.9B TriLM matches the performance of its FloatLM counterpart with far fewer bits, enabling efficient inference. This work pushes the frontier of memory-efficient, scalable language models.
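
The core idea behind TriLMs is pretraining with weights constrained to three values. The minimal sketch below ternarizes a weight matrix with an absmean-style scheme (weights mapped to {-1, 0, +1} times a single scale); the scaling rule and clipping here are illustrative assumptions, not the exact Spectra training recipe.

```python
import numpy as np

def ternarize(w: np.ndarray):
    """Map a float weight matrix to codes in {-1, 0, +1} plus one scale.

    Absmean-style scheme (an assumption for illustration, not the exact
    Spectra recipe): scale by the mean absolute weight, then round and clip.
    """
    scale = np.abs(w).mean() + 1e-8            # per-matrix scale factor
    codes = np.clip(np.round(w / scale), -1, 1)
    return codes.astype(np.int8), scale

def dequantize(codes: np.ndarray, scale: float) -> np.ndarray:
    """Reconstruct an approximate float matrix from ternary codes."""
    return codes.astype(np.float32) * scale

# Toy usage: each weight carries ~1.6 bits of information instead of 16.
w = np.random.randn(256, 256).astype(np.float32)
codes, scale = ternarize(w)
w_hat = dequantize(codes, scale)
print("unique codes:", np.unique(codes))            # [-1  0  1]
print("mean abs reconstruction error:", np.abs(w - w_hat).mean())
```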



  • 54 multilingual models from 99M to 3.9B parameters

  • TriLMs: compact, fast, high-performing

  • Powerful AI on low-resource devices


Researchers

Tejas Vaidhya Ayush Kaushal Arnab Kumar Mondal Tejas Pandey Aaryan Bhagat Irina Rish

Hi-NOLIN

Bridging the English-Hindi Language Gap in Open-Source AI

Hi-NOLIN is the first open-source English-Hindi bilingual large language model (LLM), built upon the Pythia architecture and expanded to 9B parameters. Trained on a 300B-token corpus spanning English, code, and Hindi, it uses continual pretraining to improve performance across multiple domains. Remarkably, Hi-NOLIN outperforms larger models such as Pythia 12B and multilingual BLOOM on standard benchmarks.
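
In practice, continual pretraining on a mixed corpus means resuming training from an existing checkpoint while sampling batches from several data streams. The sketch below shows only the mixture-sampling step; the sources and mixing weights are hypothetical placeholders, not the actual Hi-NOLIN data recipe.

```python
import random
from typing import Dict, Iterator

def mixture_stream(streams: Dict[str, Iterator[str]],
                   weights: Dict[str, float],
                   seed: int = 0) -> Iterator[str]:
    """Yield documents by sampling a source according to mixture weights."""
    rng = random.Random(seed)
    names = list(streams)
    probs = [weights[n] for n in names]
    while True:
        name = rng.choices(names, weights=probs, k=1)[0]
        try:
            yield next(streams[name])
        except StopIteration:
            return  # a source ran dry; a real pipeline would cycle or reweight

# Hypothetical sources and weights, for illustration only.
streams = {
    "english": iter(["an English document ..."] * 4),
    "code":    iter(["def f(x): return x  # a code snippet"] * 2),
    "hindi":   iter(["एक हिंदी दस्तावेज़ ..."] * 3),
}
weights = {"english": 0.5, "code": 0.2, "hindi": 0.3}

for doc in mixture_stream(streams, weights):
    print(doc[:40])
```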

  • The best open-source Hindi-English LLM of its size

  • Extends the model to a new language while boosting English and code performance


Researchers

Tejas Vaidhya Ayush Kaushal Irina Rish

Spectra-1.1

ACL 2025

Scaling Laws and Efficient Inference for Ternary Language Models

This research demonstrates that ternary language models offer superior scaling behavior, providing valuable insights into efficient low-bitwidth language models. Spectra-1.1 introduces a suite of ternary language models (TriLMs) trained on up to 1.2 trillion tokens. These models use quantization-aware training and novel bit-packing schemes to dramatically cut memory use.
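
The memory savings come from storing ternary weights densely. The sketch below shows plain 2-bit packing (four ternary values per byte) and a denser base-3 packing that fits five ternary values per byte, i.e. 1.6 bits per weight; it illustrates the arithmetic only and does not reproduce the TriRun kernels or their layouts.

```python
import numpy as np

def pack_2bit(trits: np.ndarray) -> np.ndarray:
    """Pack ternary values in {-1, 0, +1} at 2 bits each (4 per byte)."""
    codes = (trits + 1).astype(np.uint8)                 # {-1,0,1} -> {0,1,2}
    codes = np.pad(codes, (0, (-len(codes)) % 4)).reshape(-1, 4)
    packed = (codes[:, 0] | codes[:, 1] << 2 |
              codes[:, 2] << 4 | codes[:, 3] << 6)
    return packed.astype(np.uint8)

def pack_base3(trits: np.ndarray) -> np.ndarray:
    """Pack 5 ternary values per byte as base-3 digits (3**5 = 243 <= 256),
    i.e. 8 / 5 = 1.6 bits per weight."""
    codes = (trits + 1).astype(np.uint16)
    codes = np.pad(codes, (0, (-len(codes)) % 5)).reshape(-1, 5)
    powers = 3 ** np.arange(5, dtype=np.uint16)          # [1, 3, 9, 27, 81]
    return (codes @ powers).astype(np.uint8)             # max value is 242

trits = np.random.randint(-1, 2, size=1000)
print("2-bit packing:  ", pack_2bit(trits).nbytes, "bytes")   # 250
print("1.6-bit packing:", pack_base3(trits).nbytes, "bytes")  # 200
```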

  • TriLMs trained on 1.2T tokens

  • Up to 5× faster inference with TriRun

  • Novel 1.6-bit and 2-bit packing schemes


Researchers

Tejas Vaidhya Ayush Kaushal Arnab Kumar Mondal Tejas Pandey Aaryan Bhagat Irina Rish

More Publications

LoRD: Low Rank Decomposition of Monolingual Code LLMs for One-Shot Compression

September 2023

This paper demonstrates efficient LLM compression using Low Rank Decomposition, allowing code LLMs to be compressed by up to 39.58% with minimal performance loss. This method provides an effective approach to reducing model size while maintaining code generation capabilities.
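
The operation at the heart of this kind of compression is replacing a dense weight matrix with a truncated SVD factorization, so one large linear layer becomes two thin ones. A minimal sketch with a plain NumPy matrix and an illustrative rank; the paper's rank selection and calibration details are not reproduced here.

```python
import numpy as np

def low_rank_factor(w: np.ndarray, rank: int):
    """Factor W (out x in) into A (out x r) and B (r x in) with W ~= A @ B."""
    u, s, vt = np.linalg.svd(w, full_matrices=False)
    a = u[:, :rank] * s[:rank]      # absorb singular values into A
    b = vt[:rank, :]
    return a, b

w = np.random.randn(1024, 1024).astype(np.float32)
rank = 256                          # illustrative choice, not from the paper
a, b = low_rank_factor(w, rank)

orig_params = w.size
new_params = a.size + b.size        # two thin matrices replace one dense one
print(f"parameter reduction: {1 - new_params / orig_params:.1%}")   # 50.0%
print("relative error:", np.linalg.norm(w - a @ b) / np.linalg.norm(w))
```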



Researchers

Vaidhya Kaushal Rish

Ternary LLMs are more Performant than Quantized FP16 LLMs

September 2023

We introduce TriLM, a family of pretrained ternary language models that are both compact and high-performing. TriLMs outperform their quantized counterparts and rival full-precision models at larger scales. Our findings show that ternary models not only offer superior efficiency in terms of bit-level size but also maintain strong performance on knowledge benchmarks—establishing TriLM as a compelling choice for efficient LLM deployment.
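
The bit-level size claim is easy to make concrete. The back-of-the-envelope sketch below compares raw weight storage for a 3.9B-parameter model at FP16 against a ternary encoding at roughly 1.58 bits per weight (log2 3); it covers weights only and ignores scales, embeddings, and activations kept at higher precision.

```python
import math

params = 3.9e9                       # largest model size in the suite
fp16_bits = 16
ternary_bits = math.log2(3)          # ~1.585 information bits per ternary weight

fp16_gib = params * fp16_bits / 8 / 2**30
ternary_gib = params * ternary_bits / 8 / 2**30
print(f"FP16 weights:    {fp16_gib:5.2f} GiB")     # ~7.27 GiB
print(f"Ternary weights: {ternary_gib:5.2f} GiB")  # ~0.72 GiB
print(f"roughly {fp16_bits / ternary_bits:.1f}x smaller")
```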




Researchers

Kaushal Vaidhya Pandey Bhagat Rish

Lag-Llama: Towards Foundation Models for Probabilistic Time Series Forecasting

September 2024

Lag-Llama is a general-purpose foundation model for univariate probabilistic time series forecasting, built on a decoder-only transformer using lagged values as covariates. Pretrained on a diverse corpus of time series data, it shows strong zero-shot generalization and achieves state-of-the-art performance when fine-tuned on small amounts of unseen data. Lag-Llama sets a new benchmark for foundation models in time series forecasting.
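
Lag-Llama's inputs are lagged values of the target series used as covariates. The sketch below builds such lag features for a univariate series with a hypothetical set of lag indices; the model's actual lag set, scaling, and distribution head are not reproduced here.

```python
import numpy as np

def lag_features(series: np.ndarray, lags: list) -> np.ndarray:
    """Return an array of shape (T, len(lags)) where column j holds
    series[t - lags[j]] (NaN where the lag reaches before the start)."""
    t = len(series)
    out = np.full((t, len(lags)), np.nan)
    for j, lag in enumerate(lags):
        out[lag:, j] = series[: t - lag]
    return out

# Toy weekly-seasonal series with hypothetical lags (previous day, week, month).
series = np.sin(np.arange(200) * 2 * np.pi / 7) + 0.1 * np.random.randn(200)
covariates = lag_features(series, lags=[1, 7, 30])
print(covariates.shape)   # (200, 3)
print(covariates[31])     # values observed at t-1, t-7 and t-30
```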





Researchers

Rasul Ashok Williams Ghonia Bhagwatkar Khorasani Bayazi Adamopoulos Riachi Hassen Biloš Garg Schneider Chapados Drouin Zantedeschi Nevmyvaka Rish

What do tokens know about their characters and how do they know it?

September 2023

This work investigates how pretrained language models (PLMs) encode character-level information despite using subword tokenization. By probing embeddings from models like GPT-J, BERT, and RoBERTa, the study finds that PLMs reliably capture whether specific characters appear in a token—even across non-Latin scripts. Larger models generally encode this information more robustly. The analysis suggests this ability arises from patterns in tokenization, character–POS correlations, and language variability.
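
The probing setup in this line of work is a simple classifier trained on frozen token embeddings to predict a character-level property. The sketch below keeps the probe but substitutes random vectors for real PLM embeddings (loading GPT-J or BERT weights is out of scope here), so the reported accuracy is meaningless; with actual embedding matrices the same loop measures how much character information they encode.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Stand-in vocabulary and "embeddings"; a real probe would use the embedding
# matrix of a pretrained model such as GPT-J, BERT, or RoBERTa.
vocab = ["cat", "dog", "apple", "tree", "stone", "river", "cloud",
         "train", "mouse", "table", "chair", "grape", "lemon", "brick"] * 50
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(len(vocab), 64))

# Binary label: does the character "a" occur anywhere in the token string?
labels = np.array([int("a" in tok) for tok in vocab])

x_tr, x_te, y_tr, y_te = train_test_split(
    embeddings, labels, test_size=0.2, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(x_tr, y_tr)
print("probe accuracy:", probe.score(x_te, y_te))
# near the majority-class baseline on random vectors, as expected
```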




Researchers

Kaushal Mahowald

Efficient Encoders for Streaming Sequence Tagging

2023

This work presents HEAR, a Hybrid Encoder with Adaptive Restart, designed for efficient and accurate streaming sequence tagging. Unlike naive bidirectional encoders, HEAR reduces redundant computation and label instability by reusing prior context and selectively restarting bidirectional layers. HEAR maintains strong offline performance while achieving up to 71.1% FLOP savings and +10% improvements in streaming exact match across four tasks.
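
To see where the savings come from, compare a crude cost model: a naive streaming tagger re-encodes the whole prefix bidirectionally at every new token, while a hybrid encoder processes each token once and only occasionally re-runs the bidirectional layers. The fixed restart schedule and unit costs below are illustrative assumptions, not HEAR's adaptive policy.

```python
def naive_cost(seq_len: int) -> int:
    """Re-encode the full prefix at every step: 1 + 2 + ... + T token passes."""
    return sum(range(1, seq_len + 1))

def hybrid_cost(seq_len: int, restart_every: int) -> int:
    """Process each new token once, plus a full re-encode at each restart
    (a fixed schedule here; HEAR decides restarts adaptively)."""
    cost = seq_len                                    # incremental passes
    for t in range(restart_every, seq_len + 1, restart_every):
        cost += t                                     # occasional full restart
    return cost

T = 128
for k in (8, 16, 32):
    saving = 1 - hybrid_cost(T, k) / naive_cost(T)
    print(f"restart every {k:>2} tokens: ~{saving:.0%} fewer token encodings")
```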





Researchers

Kaushal Gupta Upadhyay Faruqui

Efficient Encoders for Incremental Sequence Tagging

2023

This work addresses the inefficiency of re-running bidirectional models like BERT for every new token in streaming NLU settings. The proposed approach reduces FLOP count and improves generalization on partial inputs using a hybrid partially bidirectional encoder and an adaptive restart mechanism. It retains comparable performance on full sequences while improving efficiency and streaming accuracy across four sequence tagging datasets.




Researchers

Gupta Kaushal Faruqui Upadhyay

Stay up to date

Get the latest updates on our research and product developments.


© 2025 Nolano AI. All rights reserved.