Responsible AI Research

AI that's genuinely
good for people

We build AI systems that are safe, beneficial, and understandable — grounded in rigorous research and deep respect for human values. Because getting AI right matters more than getting there first.

340+ Researchers & Engineers
200+ Research Publications
18M+ API Conversations Daily
4 Model Generations

Latest Research

Advancing the science
of beneficial AI

View all papers →
Alignment

Constitutional AI: Harmlessness from AI Feedback

We present a method for training AI systems to be helpful, harmless, and honest using AI feedback guided by a set of principles, rather than relying on human labeling alone.

Mar 2025 · 24 min read · Read paper →
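To make the idea concrete, here is a minimal Python sketch of the critique-and-revision loop the method builds on; the `generate` hook, the principle texts, and the prompt wording are illustrative placeholders, not the paper's actual constitution or prompts.

```python
# A minimal sketch of constitutional critique-and-revision; all
# principles and prompt wording here are made up for illustration.

PRINCIPLES = [
    "Choose the response that is least likely to cause harm.",
    "Choose the response that is most honest about uncertainty.",
]

def constitutional_revision(generate, user_prompt: str, rounds: int = 1) -> str:
    """Draft a response, then have the model critique and revise it
    against each principle, so the feedback comes from the AI itself."""
    response = generate(user_prompt)
    for _ in range(rounds):
        for principle in PRINCIPLES:
            critique = generate(
                f"Critique this response against the principle '{principle}':\n{response}"
            )
            response = generate(
                f"Revise the response to address the critique.\n"
                f"Critique: {critique}\nResponse: {response}"
            )
    return response
```

In the paper, revised responses like these then become training data, so the supervision scales with AI feedback rather than with human labels.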
Interpretability

Scaling Monosemanticity: Extracting Interpretable Features

Using sparse autoencoders, we identify millions of interpretable features in large language models, providing new windows into how AI systems represent knowledge.

Feb 2025 · 31 min read · Read paper →
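The core tool is simple to sketch. Below is a minimal sparse autoencoder in PyTorch; the layer sizes and L1 penalty are chosen for illustration, not taken from the paper.

```python
# A minimal sparse autoencoder over model activations. Sizes and the
# L1 coefficient are illustrative assumptions.
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int = 512, d_features: int = 8192):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_features)
        self.decoder = nn.Linear(d_features, d_model)

    def forward(self, activations: torch.Tensor):
        # ReLU keeps feature activations sparse and non-negative.
        features = torch.relu(self.encoder(activations))
        reconstruction = self.decoder(features)
        return reconstruction, features

def loss(model, activations, l1_coeff: float = 1e-3):
    recon, feats = model(activations)
    # Reconstruction error plus an L1 penalty that drives most features
    # to zero, so each active feature is a candidate for interpretation.
    return ((recon - activations) ** 2).mean() + l1_coeff * feats.abs().mean()
```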
Safety

Towards Measuring the Representation of Subjective Global Opinions

A comprehensive evaluation of value pluralism in language models, exploring whether AI systems can faithfully represent the diversity of human perspectives and beliefs.

Jan 2025 · 18 min read · Read paper →
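One simple way to make "faithfully represent" measurable is to compare a model's distribution over multiple-choice answers with a population's survey distribution. The score below, one minus the Jensen-Shannon distance, is a common choice; the numbers are made up.

```python
# A hedged sketch of one similarity metric for opinion distributions;
# this is illustrative, not the paper's exact setup.
from scipy.spatial.distance import jensenshannon

def opinion_similarity(model_probs, survey_probs) -> float:
    """Return a 0-1 similarity over the same answer options
    (1.0 means the two distributions are identical)."""
    return 1.0 - jensenshannon(model_probs, survey_probs, base=2)

# Hypothetical four-option question: model vs. one country's survey.
print(opinion_similarity([0.5, 0.3, 0.1, 0.1], [0.4, 0.4, 0.1, 0.1]))
```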
Evaluation

Many-shot Jailbreaking: Benchmarking Adversarial Robustness

We introduce a new benchmark for evaluating the robustness of language models against sophisticated adversarial prompts across diverse categories of harm.

Dec 2024 · 22 min read · Read paper →
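The measurement at the heart of such a benchmark fits in a few lines: track how often a model still refuses a harmful request as the number of adversarial in-context examples grows. The `generate` and `is_refusal` hooks below are placeholders, not a published harness.

```python
# A minimal sketch of the core measurement: refusal rate as the
# number of adversarial in-context examples ("shots") grows.

def robustness_curve(generate, is_refusal, demos, targets, shot_counts=(1, 8, 64)):
    """Return refusal rate per shot count; a robust model stays near 1.0."""
    curve = {}
    for n in shot_counts:
        refusals = 0
        for target in targets:
            # Prepend n adversarial demonstrations before the real request.
            prompt = "\n\n".join(demos[:n] + [target])
            refusals += is_refusal(generate(prompt))
        curve[n] = refusals / len(targets)
    return curve
```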
Alignment

Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training

An investigation into whether AI models can learn to behave deceptively and whether standard safety training techniques can effectively remove such behaviors.

Nov 2024 · 28 min read · Read paper →
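The basic measurement is a gap in misbehavior rates with and without a hidden trigger, taken before and after safety training. A minimal sketch, with every name an illustrative placeholder:

```python
# Illustrative only: `generate`, `is_defection`, and the trigger
# string are placeholders, not the paper's actual setup.

def defection_rate(generate, is_defection, prompts, trigger: str = "") -> float:
    """Fraction of prompts on which the model misbehaves."""
    hits = sum(is_defection(generate(trigger + p)) for p in prompts)
    return hits / len(prompts)

# If this gap stays large after safety training, the deceptive
# behavior persisted:
#   gap = defection_rate(gen, judge, prompts, trigger="<deploy> ")
#       - defection_rate(gen, judge, prompts)
```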
Scaling

Predictability and Surprise in Large Generative Models

As models scale, some capabilities emerge unexpectedly. We study the predictability of capability emergence and what it means for responsible AI development and deployment.

Oct 2024 · 19 min read · Read paper →
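The "predictable" half of this story is that smooth metrics such as loss often follow a power law in scale, which can be fit on small runs and extrapolated; emergent capabilities are precisely what such fits miss. A toy fit, with made-up numbers:

```python
# Fit loss ~ a * n**(-alpha) in log-log space and extrapolate.
# The data points below are fabricated for illustration.
import numpy as np

# (parameter count, eval loss) from hypothetical small training runs.
n = np.array([1e7, 1e8, 1e9])
loss = np.array([4.2, 3.4, 2.8])

# log(loss) = log(a) - alpha * log(n) is a straight line.
slope, log_a = np.polyfit(np.log(n), np.log(loss), 1)
alpha = -slope
print(f"extrapolated loss at 1e10 params: {np.exp(log_a) * 1e10 ** -alpha:.2f}")
```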

Our Products

Built for the real world

From individual creators to enterprise teams, our AI is designed to be genuinely useful — not just impressive in demos.

Lumina API

Integrate state-of-the-art AI into any product with our clean, well-documented API. Thoughtfully rate-limited, reliably fast, and built for production scale.

View documentation →
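As a purely hypothetical illustration of what an HTTP integration tends to look like (the real endpoint, model names, and request fields live in the documentation, not here):

```python
# Hypothetical request shape only: the URL, model id, and fields
# below are placeholders, not the documented API.
import requests

resp = requests.post(
    "https://api.lumina.example/v1/messages",   # placeholder URL
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "model": "lumina-latest",               # placeholder model id
        "messages": [{"role": "user", "content": "Summarize this contract."}],
        "max_tokens": 512,
    },
    timeout=30,
)
resp.raise_for_status()
print(resp.json())
```

Because the service is rate-limited, a production client would typically also catch HTTP 429 responses and retry with backoff.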
🛡️

Safety Evaluations

Open-access tools for evaluating AI safety properties. Developed in collaboration with academic partners and designed to raise the bar for the whole field.

Access evals →
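Harnesses of this kind usually reduce to a scored loop over labeled prompts. A minimal sketch, with the file format and the crude refusal judge as assumptions rather than the tools' real interface:

```python
# A hedged sketch of an evaluation loop; the JSONL schema and keyword
# judge are illustrative assumptions.
import json

def looks_like_refusal(text: str) -> bool:
    # Crude placeholder judge; real suites use calibrated classifiers.
    return any(p in text.lower() for p in ("i can't", "i cannot", "i won't"))

def run_eval(generate, path: str) -> float:
    """Fraction of cases where refusal behavior matches the label,
    over a JSONL file of {"prompt": str, "must_refuse": bool} cases."""
    with open(path) as f:
        cases = [json.loads(line) for line in f]
    correct = sum(
        looks_like_refusal(generate(case["prompt"])) == case["must_refuse"]
        for case in cases
    )
    return correct / len(cases)
```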

Our Commitment

Safety isn't a feature.
It's the foundation.

We believe the most important question in AI isn't "what can it do?" — it's "what should it do?" Every decision we make, from architecture to deployment, is guided by a genuine commitment to AI that benefits humanity.

That means publishing our safety research openly, engaging with critics honestly, and sometimes moving slower than we could in the name of doing this right.

🔍

Interpretability

Understanding what's happening inside AI systems, not just what comes out.

📏

Evaluation

Rigorous, honest benchmarks — including benchmarks that show our models' limitations.

🤝

Alignment

Training AI systems that reliably do what humans actually want, not just what they literally say.

🌍

Governance

Working with policymakers and researchers to build the right frameworks for AI development.


Life at Lumina

Brilliant people,
meaningful work

We hire for curiosity, care, and rigor — and we try to build an environment where people can do the best work of their lives on problems that actually matter.

We're distributed-first, with hubs in San Francisco, London, and Singapore. We offer generous equity, full benefits, and a genuine commitment to work-life balance.

🧭
Mission-first
We're here to build AI that's genuinely good for the world. Full stop.
💬
Honest by default
We say what we think, admit what we don't know, and update when we're wrong.
🌱
Long-term thinking
We make decisions based on what's right over years, not quarters.

From the Blog

Thinking out loud

All posts →

Research

What we learned from training 1,000 specialized models

Research Team · 8 min read

Policy

Our response to the proposed AI safety framework

Policy Team · 6 min read

Come build with us

We're looking for researchers, engineers, designers, and policy experts who believe AI can be genuinely beneficial — and want to spend their careers making that true.

View open roles →
Research Scientist · ML Engineer · Policy Analyst · Product Design · Security Operations