Test code:

from modelscope import AutoModelForCausalLM, AutoTokenizer
import torch
torch.manual_seed(0)

path = 'OpenBMB/MiniCPM4-0.5B'
device = "cuda"
tokenizer = AutoTokenizer.from_pretrained(path)
model = AutoModelForCausalLM.from_pretrained(path, torch_dtype=torch.bfloat16, device_map=device, trust_remote_code=True)

# User can directly use the chat interface
response, history = model.chat(tokenizer, "Write an article about Artificial Intelligence.", temperature=0.7, top_p=0.7)
print(response)
The code above follows the example at: https://modelscope.cn/models/OpenBMB/MiniCPM4-0.5B
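Besides an external monitor, PyTorch itself can report GPU memory from inside the script. A minimal sketch (assumes a CUDA device is present; run it right after loading the model to see what the weights occupy):

```python
import torch

if torch.cuda.is_available():
    # Bytes held by live tensors vs. bytes reserved by the caching allocator;
    # reserved is usually what nvtop-style tools come closest to showing.
    allocated = torch.cuda.memory_allocated() / 2**30
    reserved = torch.cuda.memory_reserved() / 2**30
    print(f"allocated: {allocated:.3f} GiB, reserved: {reserved:.3f} GiB")
```

Note that the total reported by nvtop will still be higher than `memory_reserved()`, since the CUDA context itself takes a few hundred MiB outside the PyTorch allocator.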

Use the nvtop tool to observe GPU memory usage:

 Device 0 [Tesla T4] PCIe GEN 3@16x RX: 0.000 KiB/s TX: 0.000 KiB/s
 GPU 585MHz  MEM 5000MHz TEMP  47°C FAN N/A% POW  28 /  70 W
 GPU[                         0%] MEM[|||         1.815Gi/15.000Gi]

So the model uses about 1.815 GiB of GPU memory.
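That figure is roughly what a 0.5B-parameter model in bfloat16 should need: the weights alone account for about 1 GiB, with the remainder going to the CUDA context and runtime buffers. A back-of-the-envelope check (the 0.5B parameter count is assumed from the model name, not measured):

```python
# Rough VRAM estimate for the model weights of MiniCPM4-0.5B.
num_params = 0.5e9        # assumed from the "0.5B" in the model name
bytes_per_param = 2       # bfloat16 stores each parameter in 2 bytes
weight_gib = num_params * bytes_per_param / 2**30
print(f"weights alone: ~{weight_gib:.2f} GiB")  # ~0.93 GiB
# The gap up to the observed 1.815 GiB is CUDA context plus
# allocator overhead and activation/KV-cache buffers.
```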
