Jiangjiang's Study Notes

Qwen3VL Inference Explained

Model architecture diagram

Qwen3VL architecture diagram

Input data

{
    "model": "qwen3vl",
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "video",
                    "video_url": {
                        "url": "https://www.w3schools.com/html/movie.mp4"
                    }
                },
                {
                    "type": "text",
                    "text": "What happens in the video?"
                }
            ]
        }
    ]
}
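
Before the chat template and tokenizer run, the request has to be split into its video part and its text part. The following is a minimal sketch of that split in plain Rust, using only the standard library. The type and function names (`ChatRequest`, `Message`, `ContentPart`, `video_urls`, `text_prompt`, `sample_request`) are hypothetical and chosen for illustration; they are not taken from the linked repository, which may model the request differently.

```rust
// Hypothetical types mirroring the sample request; illustrative only.
#[derive(Debug, Clone, PartialEq)]
enum ContentPart {
    // Corresponds to {"type": "video", "video_url": {"url": ...}}
    Video { url: String },
    // Corresponds to {"type": "text", "text": ...}
    Text { text: String },
}

#[derive(Debug, Clone, PartialEq)]
struct Message {
    role: String,
    content: Vec<ContentPart>,
}

#[derive(Debug, Clone, PartialEq)]
struct ChatRequest {
    model: String,
    messages: Vec<Message>,
}

impl ChatRequest {
    // Collect every video URL, in the order the parts appear.
    fn video_urls(&self) -> Vec<&str> {
        self.messages
            .iter()
            .flat_map(|m| m.content.iter())
            .filter_map(|part| match part {
                ContentPart::Video { url } => Some(url.as_str()),
                ContentPart::Text { .. } => None,
            })
            .collect()
    }

    // Join all text parts with newlines to form the user prompt.
    fn text_prompt(&self) -> String {
        self.messages
            .iter()
            .flat_map(|m| m.content.iter())
            .filter_map(|part| match part {
                ContentPart::Text { text } => Some(text.as_str()),
                ContentPart::Video { .. } => None,
            })
            .collect::<Vec<_>>()
            .join("\n")
    }
}

// Build the same request as the JSON sample, by hand.
fn sample_request() -> ChatRequest {
    ChatRequest {
        model: "qwen3vl".to_string(),
        messages: vec![Message {
            role: "user".to_string(),
            content: vec![
                ContentPart::Video {
                    url: "https://www.w3schools.com/html/movie.mp4".to_string(),
                },
                ContentPart::Text {
                    text: "What happens in the video?".to_string(),
                },
            ],
        }],
    }
}

fn main() {
    let req = sample_request();
    println!("model: {}, role: {}", req.model, req.messages[0].role);
    println!("videos: {:?}", req.video_urls());
    println!("prompt: {}", req.text_prompt());
}
```

In a real service the request would more likely be deserialized from JSON than built by hand; the structs are constructed manually here only to keep the sketch free of external crates.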

chat_template

tokenize

data preprocess

Qwen3VL model

vision encoder

patch embed

pos embed

2dRoPE

chunk attention

merge

vision model

LM Dense Decoder

MRoPE

language model

Understanding the LM MoE Decoder

Rust inference code

https://github.com/jhqxxx/aha/tree/main/src/models/qwen3vl