
As the figure above shows, a side branch is added next to the pretrained model: a first Linear layer A projects the input from d dimensions down to r, and a second Linear layer B projects it from r back up to d dimensions. The outputs of the two branches are then summed to produce the final hidden_state.
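To make the figure concrete, here is a minimal sketch of that structure (the class name LoRALinear is hypothetical, for illustration only, not the implementation of any particular library):

import torch

class LoRALinear(torch.nn.Module):
    """Wraps a frozen Linear layer and adds the low-rank branch B(A(x))."""
    def __init__(self, base: torch.nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base                                      # pretrained weights
        for p in self.base.parameters():
            p.requires_grad_(False)                           # frozen left branch
        d_in, d_out = base.in_features, base.out_features
        self.lora_A = torch.nn.Linear(d_in, r, bias=False)    # project d -> r
        self.lora_B = torch.nn.Linear(r, d_out, bias=False)   # project r -> d
        torch.nn.init.zeros_(self.lora_B.weight)              # branch starts at zero
        self.scaling = alpha / r

    def forward(self, x):
        # frozen pretrained branch + trainable low-rank branch
        return self.base(x) + self.scaling * self.lora_B(self.lora_A(x))

Wrapping, say, LoRALinear(torch.nn.Linear(4096, 4096), r=8) leaves the pretrained weight untouched and trains only the two small matrices.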
AdaLoRA performs adaptive budget allocation for parameter-efficient fine-tuning. Adding more parameters to less important weight matrices yields little benefit and can even hurt model performance, so to avoid spending computation on unimportant weights, AdaLoRA adjusts the rank of each incremental matrix to control how many parameters take part in the update.
How the rank of the update matrices is adjusted
LoRA parameterizes the weight update as $\Delta W = BA$ and lets the model learn the two matrices $B \in \mathbb{R}^{d \times r}$ and $A \in \mathbb{R}^{r \times k}$, whose product approximates an SVD-style decomposition of $\Delta W$; the rank of $\Delta W$ is fixed to the same value $r$ for every module.
AdaLoRA instead parameterizes the update directly in SVD form, $\Delta W = P \Lambda Q$, and learns the three weights $P$, $\Lambda$, $Q$, where $\Lambda$ is a diagonal matrix of singular values; pruning entries of $\Lambda$ lowers the rank of that module's update, which is how the parameter budget is reallocated.
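A minimal sketch of this parameterization, under illustrative dimensions and not the official AdaLoRA implementation: the update is kept in SVD-like form, and rank is reduced by zeroing out entries of the diagonal $\Lambda$ according to an importance score.

import torch

d, k, r = 1024, 1024, 8                              # illustrative sizes and initial rank
P   = torch.nn.Parameter(torch.randn(d, r) * 0.01)   # left singular directions
lam = torch.nn.Parameter(torch.zeros(r))             # diagonal of Lambda, starts at zero
Q   = torch.nn.Parameter(torch.randn(r, k) * 0.01)   # right singular directions

def delta_w():
    # Incremental update in SVD form; pruning a singular value lam[i] to zero
    # removes one rank from this module, which is how the budget is reallocated.
    return P @ torch.diag(lam) @ Q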
The following walks through the code step by step and explains the relevant parameters.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments, BitsAndBytesConfig

# modelpath="meta-llama/Llama-2-7b-hf"
modelpath = "meta-llama/Meta-Llama-3-8B"

# Load 4-bit quantized model
model = AutoModelForCausalLM.from_pretrained(
    modelpath,
    device_map="auto",
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_use_double_quant=True,
        bnb_4bit_compute_dtype=torch.bfloat16,
    ),
    torch_dtype=torch.bfloat16,
)
model.config.use_cache = False

tokenizer = AutoTokenizer.from_pretrained(modelpath)
tokenizer.pad_token = tokenizer.eos_token
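Before attaching adapters, it can be worth confirming that the NF4-quantized weights fit the expected memory budget. A quick check using get_memory_footprint() from transformers (the exact figure depends on your transformers/bitsandbytes versions):

# Rough sanity check of model size after 4-bit quantization, in GB
print(f"{model.get_memory_footprint() / 1e9:.2f} GB")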
Configuring LoRA
LoRAConfig() specifies where LoRA layers are inserted. Here, LoRA layers are added to all linear projections of the self-attention module (attn_matrices=["q", "k", "v"]) as well as to the intermediate and output linear layers.
import adapters
from adapters import LoRAConfig

adapters.init(model)

config = LoRAConfig(
    selfattn_lora=True,
    intermediate_lora=True,
    output_lora=True,
    attn_matrices=["q", "k", "v"],
    alpha=16, r=64, dropout=0.1
)
model.add_adapter("assistant_adapter", config=config)
model.train_adapter("assistant_adapter")
Check which parameters will be updated
print(model.adapter_summary())  # inspect how many parameters will be fine-tuned

for param in model.parameters():
    if param.ndim == 1:
        # cast the small parameters (e.g. layernorm) to fp32 for stability
        param.data = param.data.to(torch.float32)

# Enable gradient checkpointing to reduce required memory if needed
# model.gradient_checkpointing_enable()
# model.enable_input_require_grads()

class CastOutputToFloat(torch.nn.Sequential):
    # cast the lm_head output back to fp32 so the loss is computed in full precision
    def forward(self, x): return super().forward(x).to(torch.float32)

model.lm_head = CastOutputToFloat(model.lm_head)
model  # display the model to see where the LoRA modules were inserted
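As a cross-check against adapter_summary(), the trainable fraction can also be computed directly from the parameters (a quick sketch, not part of the original pipeline):

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total     = sum(p.numel() for p in model.parameters())
print(f"trainable: {trainable:,} / {total:,} ({100 * trainable / total:.2f}%)")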
Data preparation
import os

from datasets import load_dataset

dataset = load_dataset("timdettmers/openassistant-guanaco")

def tokenize(element):
    return tokenizer(
        element["text"],
        truncation=True,
        max_length=512,  # can set to longer values such as 2048
        add_special_tokens=False,
    )

dataset_tokenized = dataset.map(
    tokenize,
    batched=True,
    num_proc=os.cpu_count(),    # parallelize across CPU cores
    remove_columns=["text"]     # don't need this anymore, we have tokens from here on
)
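It is worth decoding one tokenized sample to confirm that truncation and the prompt format look as expected (a quick check, not part of the original pipeline):

print(dataset_tokenized)  # splits and number of rows
print(tokenizer.decode(dataset_tokenized["train"][0]["input_ids"])[:300])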
Training
from adapters import AdapterTrainer
from transformers import DataCollatorForLanguageModeling

args = TrainingArguments(
    output_dir="output/llama_qlora",
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    evaluation_strategy="steps",
    logging_steps=10,
    save_steps=500,
    eval_steps=187,
    save_total_limit=3,
    gradient_accumulation_steps=16,
    max_steps=1875,
    lr_scheduler_type="constant",
    optim="paged_adamw_32bit",
    learning_rate=0.0002,
    group_by_length=True,
    bf16=True,
    warmup_ratio=0.03,
    max_grad_norm=0.3,
)

trainer = AdapterTrainer(
    model=model,
    tokenizer=tokenizer,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
    train_dataset=dataset_tokenized["train"],
    eval_dataset=dataset_tokenized["test"],
    args=args,
)
trainer.train()
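With per_device_train_batch_size=1 and gradient_accumulation_steps=16, each optimizer step sees an effective batch of 16 sequences, so 1875 steps cover roughly 30k samples. After training, only the small adapter weights need to be saved rather than the full 8B base model; save_adapter comes from the adapters library, and the output path below is arbitrary:

# Persist only the LoRA/adapter weights for later reuse
model.save_adapter("output/llama_qlora/assistant_adapter", "assistant_adapter")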
Inference
from transformers import logging
logging.set_verbosity(logging.CRITICAL)

def prompt_model(model, text: str):
    batch = tokenizer(f"### Human: {text}\n### Assistant:", return_tensors="pt")
    batch = batch.to(model.device)

    model.eval()
    with torch.inference_mode(), torch.cuda.amp.autocast():
        output_tokens = model.generate(**batch, max_new_tokens=50)
    return tokenizer.decode(output_tokens[0], skip_special_tokens=True)

print(prompt_model(model, "Explain Calculus to a primary school student"))
Merging the weights
model.merge_adapter("assistant_adapter")
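Merging folds the LoRA weights into the base weights so inference no longer pays for the extra branch. In the adapters library the merge can, to the best of my understanding, be undone with reset_adapter() (check the docs for your installed version):

# Undo the merge if you want to keep the adapter weights separate again
model.reset_adapter()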
https://arxiv.org/pdf/2106.09685
https://github.com/microsoft/LoRA
https://arxiv.org/pdf/2303.10512
https://github.com/QingruZhang/AdaLoRA
Article reposted from the WeChat public account @CourseAI.