Jiang Jiang's Study Notes

DeepSeek-OCR Inference Explained

Model Architecture Diagram

DeepSeek-OCR architecture diagram. Image source: DeepSeek-OCR: Contexts Optical Compression

Input Data

{
    "model": "deepseek-ocr",
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "image_url": {
                        "url": "file://./assets/img/ocr_test1.png"
                    }
                },
                {
                    "type": "text",
                    "text": "<image>\n<|grounding|>Convert the document to markdown. "
                }
            ]
        },
        {
            "role": "assistant",
            "content": ""
        }
    ],
    "metadata": {"base_size": "640", "image_size": "640", "crop_mode": "false"}
}
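Note that every value under "metadata" in the request above is a string, so code consuming the request would need to convert them before use. The following is a minimal sketch of that step, not the actual DeepSeek-OCR preprocessing code: `OcrRequest` and `parse_request` are hypothetical names introduced here for illustration, and the sketch assumes the payload has exactly the shape shown above.

```python
from dataclasses import dataclass


@dataclass
class OcrRequest:
    """Hypothetical container for the fields used in the sample request."""
    image_url: str
    prompt: str
    base_size: int
    image_size: int
    crop_mode: bool


def parse_request(payload: dict) -> OcrRequest:
    """Pull the image URL, prompt, and typed metadata out of a request dict."""
    # Use the first message whose role is "user".
    user_msg = next(m for m in payload["messages"] if m["role"] == "user")

    image_url = ""
    prompt = ""
    for part in user_msg["content"]:
        if part["type"] == "image":
            image_url = part["image_url"]["url"]
        elif part["type"] == "text":
            prompt = part["text"]

    # Metadata values arrive as strings ("640", "false"); convert them here.
    meta = payload["metadata"]
    return OcrRequest(
        image_url=image_url,
        prompt=prompt,
        base_size=int(meta["base_size"]),
        image_size=int(meta["image_size"]),
        crop_mode=meta["crop_mode"].lower() == "true",
    )


# The same request as above, written as a Python dict.
payload = {
    "model": "deepseek-ocr",
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "image_url": {"url": "file://./assets/img/ocr_test1.png"},
                },
                {
                    "type": "text",
                    "text": "<image>\n<|grounding|>Convert the document to markdown. ",
                },
            ],
        },
        {"role": "assistant", "content": ""},
    ],
    "metadata": {"base_size": "640", "image_size": "640", "crop_mode": "false"},
}

request = parse_request(payload)
print(request)
```

Comparing the string against "true" rather than calling `bool()` matters here, because `bool("false")` is `True` in Python.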

preprocess

DeepEncoder

SAM

CLIP

Dynamic Resolution

DeepSeek3B-MoE-A570M