Question 1

What input formats does LongCat Video Avatar 1.5 support?

Accepted Answer

LongCat Video Avatar 1.5 accepts audio, text prompts, reference images, and existing video clips. You can generate videos using three modes: Audio-Text-to-Video (AT2V), Audio-Text-Image-to-Video (ATI2V), or Video Continuation.
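The relationship between inputs and modes can be sketched as a small selection function. This is an illustration only: `select_mode` and its parameters are hypothetical names, not part of any LongCat API. It assumes that every mode takes audio plus a text prompt, that supplying a reference image selects ATI2V, and that supplying an existing clip selects Video Continuation.

```python
from typing import Optional


def select_mode(audio: str, prompt: str,
                image: Optional[str] = None,
                video: Optional[str] = None) -> str:
    """Pick a mode name from the inputs supplied (hypothetical helper)."""
    # Assumption: audio and a text prompt are needed in every mode.
    if not audio or not prompt:
        raise ValueError("audio and a text prompt are assumed to be required")
    # Assumption: an existing clip takes priority over a reference image.
    if video is not None:
        return "Video Continuation"
    if image is not None:
        return "ATI2V"
    return "AT2V"


print(select_mode("speech.wav", "A presenter reading the news"))
print(select_mode("speech.wav", "A presenter reading the news", image="face.png"))
```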

Question 2

What video resolutions does it support?

Accepted Answer

The current release supports 480P and 720P output. Native 1080P is not available in version 1.5.

Question 3

Does it work with anime and stylized characters?

Accepted Answer

Yes. Beyond photorealistic humans, it handles anime characters, illustrated portraits, stylized 3D avatars, and animal characters.

Question 4

Can I use the generated videos commercially?

Accepted Answer

Yes. The model ships under the MIT license, which permits commercial use. You're responsible for ensuring you hold the rights to any images, audio, and likenesses used as inputs.

Question 5

How is it different from HeyGen or Kling Avatar 2.0?

Accepted Answer

LongCat Video Avatar 1.5 is the only open-source option in this group: it is MIT licensed, self-hostable, and has no per-video fee when you run it yourself. HeyGen and Kling Avatar 2.0 are closed commercial APIs with limited deployment flexibility and no access to the model for customization.

Question 6

What makes a good reference image?

Accepted Answer

Use a clear, front-facing portrait with even lighting and nothing covering the face. Detailed text prompts help too: include appearance, action, and scene context (e.g., "A young woman in a white blouse speaking in a bright café"). More detail generally produces better output.
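One way to keep prompts consistently detailed is to assemble them from the three parts named above. The helper below is a minimal sketch; `build_prompt` is a hypothetical name, and the "appearance, action, in scene" word order is an assumption taken from the example prompt, not a format the model requires.

```python
def build_prompt(appearance: str, action: str, scene: str) -> str:
    """Join appearance, action, and scene into one prompt (hypothetical helper)."""
    parts = [p.strip() for p in (appearance, action, scene)]
    if not all(parts):
        raise ValueError("appearance, action, and scene must all be non-empty")
    # Assumed word order, modeled on the example prompt in the answer above.
    return f"{parts[0]} {parts[1]} in {parts[2]}"


print(build_prompt("A young woman in a white blouse", "speaking", "a bright café"))
```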

Question 7

How accurate is the lip-sync?

Accepted Answer

Version 1.5 replaces the Wav2Vec2 audio encoder used in v1.0 with Whisper-Large-v3, which delivers tighter phoneme-to-viseme mapping. The official evaluation confirmed Audio-Visual Harmony improvements across 508 image-audio test pairs.

Question 8

Do I need to install software or have a local GPU?

Accepted Answer

No installation or local GPU is needed to use the online demo: just sign up and start generating. Local deployment requires a CUDA-compatible GPU (24 GB VRAM minimum), Python 3.10, and a conda environment.
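The local requirements above can be turned into a quick pre-flight check. This is a sketch under stated assumptions: `meets_local_requirements` is a hypothetical helper, the VRAM figure is passed in by the caller rather than detected, and "Python 3.10" is read as an exact minor-version match.

```python
import sys

MIN_VRAM_GB = 24            # minimum VRAM stated in the answer above
REQUIRED_PYTHON = (3, 10)   # assumption: exactly Python 3.10 is expected


def meets_local_requirements(vram_gb: float, python_version=None) -> bool:
    """Return True if the given VRAM and Python version satisfy the stated minimums."""
    if python_version is None:
        # Default to the interpreter running this script.
        python_version = sys.version_info[:2]
    return vram_gb >= MIN_VRAM_GB and tuple(python_version[:2]) == REQUIRED_PYTHON


print(meets_local_requirements(24, (3, 10)))
print(meets_local_requirements(16, (3, 10)))
```

On a machine with PyTorch installed, the VRAM value could be read from `torch.cuda.get_device_properties(0).total_memory` (in bytes) instead of being typed in by hand.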

Question 9

Is this platform free?

Accepted Answer

New users get a free credit on sign-up to generate one video. Additional credits are available for purchase — see the pricing page for plan details.

Question 10

Does it support multiple languages?

Accepted Answer

Yes, with caveats. The Whisper-Large-v3 encoder performs best on English and Chinese audio for lip-sync alignment and speech feature extraction. Other languages may work but aren't officially supported.

Question 11

Can it generate video in real time?

Accepted Answer

No. This is an offline generation model. Even with 8-step inference, each video requires meaningful GPU compute time. It's not designed for live-streaming or real-time avatar applications.

Question 12

Does it support multi-person scenes?

Accepted Answer

Yes. Version 1.5 adds dual-audio support for multi-person avatar scenes via Merge and Concatenation modes.

Question 13

How do I generate a two-person video?

Accepted Answer

Switch to Multi Avatar mode in the online tool and upload two separate audio tracks. Merge mode runs both tracks simultaneously and requires equal-length clips. Concatenation mode plays them one after the other, so the clips need not be equal in length; silence pads any gaps.
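The difference between the two modes can be illustrated with plain lists standing in for audio samples. This is a conceptual sketch, not the tool's implementation: `merge_tracks` and `concatenate_tracks` are hypothetical names, and the sketch assumes that Concatenation mode gives each speaker a full-length stream in which silence covers the other speaker's turn.

```python
from typing import List, Tuple

Track = List[float]  # mono samples; a real pipeline would use audio arrays


def merge_tracks(a: Track, b: Track) -> Tuple[Track, Track]:
    """Merge mode: both speakers run simultaneously, so lengths must match."""
    if len(a) != len(b):
        raise ValueError("Merge mode requires equal-length clips")
    return a, b


def concatenate_tracks(a: Track, b: Track) -> Tuple[Track, Track]:
    """Concatenation mode: speaker A first, then speaker B, padded with silence."""
    silence_for_a = [0.0] * len(a)  # B is silent while A speaks (assumption)
    silence_for_b = [0.0] * len(b)  # A is silent while B speaks (assumption)
    return a + silence_for_b, silence_for_a + b


first, second = concatenate_tracks([0.1, 0.2], [0.3])
print(first)   # [0.1, 0.2, 0.0]
print(second)  # [0.0, 0.0, 0.3]
```

Both output streams end up the same length, which is why Concatenation mode does not need equal-length inputs while Merge mode does.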

Question 14

How are credits calculated?

Accepted Answer

Credit usage depends on video length, resolution, and generation mode. Higher resolution and longer duration consume more credits per generation.

What Changed	v1.0	v1.5
Audio encoder	Wav2Vec2	Whisper-Large-v3
Lip-sync quality	Functional	Significantly smoother, more natural
Inference steps	Full diffusion	8 steps via DMD2 distillation
VRAM options	Standard	INT8 quantization available
Stylized domains	Limited	Anime, animals, complex scenes
Multi-person support	Single stream	Single + multi-stream audio
Long video stability	Variable	Production-grade temporal consistency

LongCat Video Avatar 1.5
Generate AI Avatar Videos Online


What Is LongCat Video Avatar 1.5?

AT2V

ATI2V

Video Continuation

What's New in Version 1.5 vs 1.0?

LongCat Video Avatar 1.0 vs 1.5

Commercial Model Comparison


Stability and Consistency

Long-Form Talking

Singing and Performance

Animation

Multi-Person Interaction

Key Features of LongCat Video Avatar 1.5

Whisper-Large-v3 Audio Encoder

Production-Grade Stability

Stylized Domain Support

8-Step Fast Inference with INT8 Quantization

Multi-GPU Context Parallelism

Who Is LongCat Video Avatar 1.5 For?

News Broadcasting & Education

Singing & Performance

Animation & Stylized Characters

Multi-Person Conversations

E-Commerce Marketing Videos

Animal & Non-Human Characters

Frequently Asked Questions

Ready to Generate Your First Avatar Video?
