Article Preview
This article takes a deep dive into the Qwen model's inference code and walks through its internal implementation. First, a quick read of the official inference snippet:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "qwen/Qwen2___5-7B-Instruct"
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "Give me a short introduction to large language model."
messages = [
    {"role": "system", "content": "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."},
    {"role": "user", "content": prompt},
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512,
)
generated_ids
```
………………………………
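The preview cuts off mid-snippet. For context, `model.generate` returns sequences that contain the prompt tokens followed by the newly generated tokens, so a post-processing step typically slices the prompt off before decoding. A minimal sketch of that slicing pattern, using hypothetical plain-list token ids so it runs without downloading the model:

```python
# Hypothetical token ids standing in for model_inputs.input_ids
# and the output of model.generate (prompt tokens + new tokens).
prompt_ids = [[101, 2009, 2003]]
full_output = [[101, 2009, 2003, 7592, 2088, 102]]

# Drop the prompt prefix from each output sequence, keeping
# only the newly generated tokens for decoding.
new_tokens = [
    out[len(inp):] for inp, out in zip(prompt_ids, full_output)
]
print(new_tokens)  # [[7592, 2088, 102]]
```

With real tensors, the same comprehension is applied to `zip(model_inputs.input_ids, generated_ids)`, and the result is passed to `tokenizer.batch_decode(..., skip_special_tokens=True)` to obtain the response text.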