vllm/vllm-openai:latest
. You can use this image address directly.
vllm/vllm-openai:latest
as an example for configuration:
vllm/vllm-openai:latest
.8080
in this case.--model meta-llama/Llama-3.1-8B-Instruct --max-model-len 4096
.200
, such as /health
.HUGGING_FACE_HUB_TOKEN={Your Hugging Face Access Token (with read permission)}
.curl
command to access the service like below: