πŸ† #1 on Artificial Analysis Arena Β· Elo 1333

HappyHorse-1.0 β€” The #1 Open Source AI Video Model

15B-parameter unified Transformer. Joint video & audio generation. 1080p in ~38s. Fully open source.

15B
Parameters
1333
T2V Elo Score
~38s
1080p on H100
7
Lip-sync Languages

What Makes HappyHorse-1.0 Different

A unified architecture that changes what's possible in open-source AI video.

Joint Audio-Video Generation

One inference pass generates video, dialogue, ambient sound, and Foley effects simultaneously β€” no post-production needed.

Single-Stream Transformer

40-layer pure self-attention architecture. Text, video, and audio tokens denoised in a single unified sequence β€” no cross-attention.

DMD-2 Distillation

Only 8 denoising steps required. Combined with MagiCompiler inference acceleration, it outpaces all comparable open-source models.

Multilingual Lip-Sync

Supports English, Mandarin, Cantonese, Japanese, Korean, German, and French with industry-leading WER accuracy.

Fully Open Source

Complete release: base model, distilled model, super-resolution model, and inference code β€” commercial use included.

1080p Native Output

Native 1080p output with support for 16:9, 9:16, 4:3, 21:9, and 1:1 aspect ratios straight from the model.

Leaderboard Snapshot

Artificial Analysis Arena Β· Last updated: Apr 8, 2026

Rank    Model            T2V Elo   I2V Elo   Audio
🥇 1    HappyHorse-1.0   1333      1392      ✓
🥈 2    Seedance 2.0     ~1273     ~1355     ✓
🥉 3    Kling 3.0 Pro    ~1240     ~1260     —
4       SkyReels V4      ~1210     ~1230     —
5       WAN 2.6          1189      —         —

Data sourced from Artificial Analysis Arena

What Is HappyHorse-1.0?

HappyHorse-1.0 is a groundbreaking open-source AI video generation model that stunned the research community in early April 2026 by claiming the top position on the Artificial Analysis Video Generation Arena β€” surpassing every commercial closed-source model in head-to-head blind evaluation. The HappyHorse-1.0 model achieved a Text-to-Video (T2V) Elo score of 1333 and an Image-to-Video (I2V) Elo score of 1392, beating established commercial systems from ByteDance, Kuaishou, and other major AI labs.

What makes HappyHorse-1.0 particularly remarkable is its open-source commitment. While competing models like Seedance 2.0 and Kling 3.0 Pro remain proprietary, HappyHorse-1.0 plans to release its full model weights, inference code, and training methodology β€” including a distilled version and super-resolution model β€” under a license that permits commercial use.

The name HappyHorse carries cultural significance: 2026 is the Year of the Horse in the Chinese lunar calendar, and the model's emergence as an unexpected champion from the open-source community β€” overtaking billion-dollar commercial labs β€” embodies the spirit of the underdog. In Mandarin AI circles, HappyHorse-1.0 has been dubbed "the dark horse that became the lead horse."

HappyHorse-1.0 Technical Architecture

At the core of the HappyHorse-1.0 architecture is a 15-billion-parameter single-stream Transformer that processes text, video frames, and audio tokens as a single unified sequence β€” an approach that fundamentally differs from most competing models, which use separate encoders and decoders for each modality. The HappyHorse single-stream design enables true joint denoising: text prompts, video latents, and audio spectrograms are denoised together in a single forward pass, producing synchronized audio-visual content without any post-processing step.
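The single-stream idea can be sketched in a few lines: concatenate all modality tokens into one sequence and run plain self-attention over it, so every token can attend to every other. The token counts and dimensions below are toy values for illustration, not the model's actual configuration, which has not been published.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, wq, wk, wv):
    """Plain single-head self-attention over one unified sequence."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = softmax(q @ k.T / np.sqrt(k.shape[-1]))
    return scores @ v

rng = np.random.default_rng(0)
d = 32
# Toy token counts: 8 text, 64 video, 16 audio tokens in ONE sequence --
# no separate encoders, no cross-attention between streams.
text = rng.normal(size=(8, d))
video = rng.normal(size=(64, d))
audio = rng.normal(size=(16, d))
x = np.concatenate([text, video, audio], axis=0)   # (88, d) unified stream
wq, wk, wv = (rng.normal(size=(d, d)) * 0.1 for _ in range(3))
out = self_attention(x, wq, wk, wv)
print(out.shape)  # (88, 32)
```

Because text, video, and audio positions sit in the same attention matrix, a spoken word can directly attend to the video frames it should synchronize with, which is what makes single-pass joint generation possible.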

The HappyHorse-1.0 Transformer consists of 40 layers. The first four and last four layers use modality-specific projections to handle the different representations of text, video, and audio. The central 32 layers share parameters across all modalities, enabling efficient cross-modal learning. A per-head gating mechanism in each attention head controls how strongly different modalities influence one another during training β€” a critical stabilization technique for joint audio-video generation.
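A minimal sketch of what per-head gating could look like: each attention head's output is scaled by a learned sigmoid gate before the heads are merged, letting training dampen heads whose cross-modal mixing destabilizes optimization. The exact gate placement and parameterization in HappyHorse-1.0 are not published; this is an illustrative form only.

```python
import numpy as np

def gated_heads(head_outputs, gate_logits):
    """Scale each attention head's output by a sigmoid gate in (0, 1),
    then concatenate the heads as usual (hypothetical formulation)."""
    gates = 1.0 / (1.0 + np.exp(-gate_logits))
    return np.concatenate([g * h for g, h in zip(gates, head_outputs)],
                          axis=-1)

rng = np.random.default_rng(1)
heads = [rng.normal(size=(10, 16)) for _ in range(4)]  # 4 heads, 10 tokens
gate_logits = np.array([-4.0, 0.0, 0.0, 4.0])          # learned per head
merged = gated_heads(heads, gate_logits)
print(merged.shape)  # (10, 64)
```

A strongly negative logit drives a head's gate toward zero, effectively muting that head's cross-modal contribution early in training while still allowing it to open up as optimization stabilizes.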

Speed is a defining feature of HappyHorse-1.0. Through DMD-2 (Distribution Matching Distillation v2), the model requires only 8 denoising steps β€” compared to 50 or more steps in standard diffusion models β€” without requiring Classifier-Free Guidance (CFG). Combined with MagiCompiler full-graph compilation that fuses operators across Transformer layers, HappyHorse-1.0 generates a 5-second 1080p clip in approximately 38 seconds on an NVIDIA H100 GPU β€” among the fastest in its class.
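The speed math follows from the sampling loop itself: 8 steps with no CFG means 8 model calls total, versus 100+ for a 50-step sampler that runs a second CFG pass per step. The skeleton below shows only the loop structure; `denoise` is a stand-in for the distilled network, not the released inference code.

```python
import numpy as np

def denoise(x, t):
    # Placeholder for the distilled model: pulls the sample toward the
    # data manifold a little more at each (lower) timestep.
    return x * (1.0 - 1.0 / t)

def sample(shape, steps=8, seed=0):
    """Few-step sampling: start from pure noise and take `steps` fixed
    denoising steps -- one forward pass each, with no extra CFG pass."""
    x = np.random.default_rng(seed).normal(size=shape)
    calls = 0
    for t in range(steps, 0, -1):      # t = 8, 7, ..., 1
        x = denoise(x, t + 1)
        calls += 1
    return x, calls

clip_latent, model_calls = sample((4, 8, 8))   # toy latent video tensor
print(clip_latent.shape, model_calls)  # (4, 8, 8) 8
```

With 8 calls instead of ~100, the per-step cost of the 15B model is paid an order of magnitude fewer times, which is where most of the ~38-second wall-clock figure comes from; operator fusion via compilation then reduces the cost of each individual call.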

HappyHorse-1.0 vs. Commercial AI Video Models

The Artificial Analysis Video Generation Arena uses human preference voting to rank AI video models through blind A/B comparisons β€” evaluators see two videos generated from the same prompt and vote for the better one without knowing which model produced it. HappyHorse-1.0 topped this leaderboard across both T2V (text-to-video) and I2V (image-to-video) categories on its first appearance, a result that sent shockwaves through the AI community.

In the T2V no-audio category, HappyHorse-1.0 achieved an Elo of 1333 — approximately 60 Elo points ahead of Seedance 2.0 (~1273) and 93 points ahead of Kling 3.0 Pro (~1240). In I2V, the gap was even larger: HappyHorse-1.0 scored 1392 versus Seedance 2.0's ~1355. Under the Elo rating system, margins of this size translate into a consistent expected head-to-head preference advantage.
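These Elo gaps have a concrete interpretation via the standard Elo expectation formula, P(A beats B) = 1 / (1 + 10^(−(eloA − eloB)/400)):

```python
def win_prob(elo_a: float, elo_b: float) -> float:
    """Expected probability that model A is preferred over model B,
    per the standard Elo formula."""
    return 1.0 / (1.0 + 10 ** (-(elo_a - elo_b) / 400.0))

# T2V: HappyHorse-1.0 (1333) vs Seedance 2.0 (~1273), Kling 3.0 Pro (~1240)
print(round(win_prob(1333, 1273), 3))  # ~0.585
print(round(win_prob(1333, 1240), 3))  # ~0.631
# I2V: HappyHorse-1.0 (1392) vs Seedance 2.0 (~1355)
print(round(win_prob(1392, 1355), 3))  # ~0.553
```

In other words, a 60-point lead means evaluators are expected to prefer HappyHorse-1.0 in roughly 58–59% of blind A/B matchups against Seedance 2.0, and a 93-point lead pushes that to about 63% against Kling 3.0 Pro.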

The one area where HappyHorse-1.0 does not hold the top position is the combined audio evaluation category, where Seedance 2.0 has an edge β€” partly because Seedance uses a dedicated audio generation pipeline optimized separately from video. The HappyHorse-1.0 joint audio-video approach trades some audio specialization for the significant advantage of synchronized generation in a single pass.

For developers and researchers evaluating AI video tools, HappyHorse-1.0 represents a compelling option: it delivers best-in-class visual quality while remaining open-source and commercially usable β€” a combination no other top-tier model currently offers.

HappyHorse-1.0 Multilingual Capabilities

One of the standout features of HappyHorse-1.0 is its multilingual lip-sync capability. The model supports accurate lip synchronization for spoken dialogue in seven languages: English, Mandarin Chinese, Cantonese, Japanese, Korean, German, and French. Independent evaluations show HappyHorse-1.0 achieving industry-leading Word Error Rate (WER) scores in lip-sync accuracy across these languages.
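For context on the metric: WER compares a reference transcript against what a speech recognizer hears in the generated video, counting substitutions, insertions, and deletions via word-level edit distance and dividing by the reference length. The snippet below is a generic WER implementation, not the evaluation harness used for these scores.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate = word-level edit distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Levenshtein distance over words via dynamic programming.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[-1][-1] / len(ref)

print(wer("the horse runs fast", "the horse ran fast"))  # 0.25
```

One substituted word out of four yields a WER of 0.25; a lower WER means the recognizer transcribed the generated speech more faithfully, which for lip-synced video also implies the mouth movements match the audio well enough for the speech to remain intelligible.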

This makes HappyHorse-1.0 particularly valuable for global content creation β€” generating talking-head videos, dubbed animations, and multilingual marketing content without the need for separate post-production dubbing. The joint audio-video architecture of HappyHorse means the model generates speech-synchronized video natively, rather than applying audio as a post-processing step that can introduce timing mismatches.

When Will HappyHorse-1.0 Weights Be Released?

As of April 2026, the HappyHorse-1.0 model weights have not been publicly released. The model appeared on the Artificial Analysis leaderboard through a submission from a pseudonymous team, and while the official HappyHorse-1.0 site promises a full open-source release β€” including base model, distilled weights, super-resolution model, and inference code β€” no release date has been confirmed.

The open-source AI community is eagerly awaiting the HappyHorse-1.0 weight release. When available, the model is expected to require an NVIDIA H100 or A100 GPU with at least 48GB VRAM for full-precision inference. An FP8-quantized version of HappyHorse is expected to reduce memory requirements significantly, enabling deployment on 40GB A100 GPUs.
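A back-of-envelope estimate shows why those VRAM figures are plausible. Weights alone for 15B parameters take about 28 GB at 16-bit precision and half that at FP8; activations, the KV cache, and latent video tensors add a workload-dependent overhead on top, which is what pushes the full-precision requirement toward 48 GB.

```python
PARAMS = 15e9  # 15 billion parameters

def weight_gb(bytes_per_param: float) -> float:
    """Weight memory only -- excludes activations and caches."""
    return PARAMS * bytes_per_param / 1024**3

print(round(weight_gb(2), 1))  # BF16/FP16 weights: ~27.9 GB
print(round(weight_gb(1), 1))  # FP8 weights:       ~14.0 GB
```

At ~14 GB of FP8 weights, a 40 GB A100 leaves roughly 26 GB of headroom for activations and latents, consistent with the expected deployment target.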

Subscribe to our notification list above to be among the first to know when HappyHorse-1.0 weights become available. In the meantime, researchers interested in the underlying architecture can explore daVinci-MagiHuman β€” the open-source model from GAIR Lab and Sand.ai that is most closely linked to the HappyHorse-1.0 architecture.

Get notified when HappyHorse weights go live

Be first to access the model when weights and API are publicly released.

No spam. Unsubscribe anytime.

HappyHorse-1.0 β€” The #1 Open Source AI Video Model