Hardware: MLU370-X4 (1 card)
Command:
vllm serve /home/project2/DeepSeek-R1-Distill-Qwen-7B --served-model-name deepseek-r1:7b --max-model-len 4096 --use-v2-block-manager --tensor-parallel-size 1 --max-num-seqs 32 --max-num-batched-tokens 4096 --enable-chunked-prefill --dtype half --enforce_eager --gpu_memory_utilization 0.7 --port 6006
Error output:
INFO 04-15 09:19:36 mlu_hijack.py:35] Run vLLM in unpaged mode, Apply MLU optimization
INFO 04-15 09:19:36 dump_info.py:65] Cannot get device info
Apply vllm_mlu success, running in performance version !
INFO 04-15 09:20:04 api_server.py:585] vLLM API server version 0.6.4.post1
ERROR 04-15 09:20:12 engine.py:366] Not support chunked_prefill in unpaged mode.
I would like to extend the context length beyond 2048. Is there any way to do this?