I have been trying to fine-tune the Llama-2-7B model on the FreedomIntelligence/medical-o1-reasoning-SFT dataset in an instruction-response format. For now I am intentionally overfitting on the dataset just to observe the training flow. However, when I later load the LoRA-adapted model for testing, the output differs from what I saw before. Specifically, the outputs of the test_model function (right after training) and the test_merged_model function (after loading the model later) do not match.

First output (test_model, directly after training):
> [INST] You are a helpful assistant. user A
> patient with psoriasis was treated with systemic steroids, and upon
> discontinuation of the treatment, developed generalized pustules all
> over the body. What is the most likely cause of this condition?
> assistant [/INST] think I'm thinking about this patient with
> psoriasis who's been on systemic steroids. Okay, so psoriasis is this
> chronic skin condition that can be pretty stubborn and flares up now
> and then. They stopped their steroid treatment, and then suddenly
> there are these generalized pustules appearing all over the place.
> What's going on here?
>
> Alright, pustules are like little bumps that are filled with pus—they
> can be a sign of different skin issues, but in someone with psoriasis,
> a major change like stopping treatment can really mess with things.
> Steroids are powerful at tamping down inflammation and managing the
> immune system. When you take someone off steroids, particularly
> suddenly, the body can react in unexpected ways. It's like the immune
> system gets a bit confused and overreacts—a rebound effect.
>
> Oh, right, I remember reading that systemic steroids can sometimes
> mask the severity of psoriasis, keeping it under control. But pulling
> those steroids too fast can allow the psoriasis to come back stronger,
> even shifting into another form. This sounds familiar—there's a type
> called pustular psoriasis. If I recall, that's when the psoriasis
> turns into this more extreme version with pustules.
>
> So connecting the dots: it's likely that stopping the steroids led to
> a rebound and we might be seeing a case of generalized pustular
> psoriasis. That makes sense given the scenario—I've seen this
> mentioned before where physicians need to be cautious about how they
> taper steroids off in psoriasis patients to prevent exactly this kind
> of escalation. It fits pretty well with what I'd expect based on what
> I know about steroid withdrawal and psoriasis flare-ups.
> answer The development of generalized pustules in a patient
> with psoriasis following the discontinuation of systemic steroid
> treatment is most likely due to a condition called generalized
> pustular psoriasis (GPP). This condition can occur as a rebound effect
> when systemic steroids are suddenly stopped or rapidly tapered down.
> In patients with psoriasis, steroids can suppress the immune system
> and mitigate symptoms. When these drugs are withdrawn too quickly, the
> immune system can overreact, leading to more severe symptoms,
> including the transformation of psoriasis into a pustular form.
> Careful management and slow tapering of steroids are critical in
> psoriasis patients to prevent such exacerbations.

Second output (after training, loading the LoRA model):
> [INST]You are a helpful assistant. user A patient
> with psoriasis was treated with systemic steroids, and upon
> discontinuation of the treatment, developed generalized pustules all
> over the body. What is the most likely cause of this condition?
> assistant [/INST]thinkink I'thinkisinkis this patient with
> psoriasis who's been on systemic steroids going on systemic steroid
> treatment, and then. What are pretty cause. They stopped their steroid
> treatment, and then. What's going on now and then. What are these
> generalized pustules are like little bumps that can be a sign of these
> generalized pustules that can be a sign, and then suddenly there are
> these are filled pustules that can be a sign of different skin issues,
> but in someone with psoriasis, a major change like stopping treatment
> can really mess with the immune system. Steroids are powerful at
> tamping down the immune system and keeping it under control. When you
> take someone off steroids, particularly suddenly, the body can react
> in unexpected ways. It's like the immune system gets a bit confused
> and overreacts, a rebound effect. Alright, so connecting the dots:
> it's likely that stopping the steroids led to a rebound effect where
> the psoriasis got worse. This makes sense given the scenario—I've seen
> this mentioned before where physicians need to be cautious about how
> they taper steroids off in psoriasis patients to prevent exactly this
> kind of escalation. It fits pretty well with what I'd expect based on
> the available information. The development of generalized
> pustules in a patient with psoriasis following the discontinuation of
> systemic steroid treatment is most likely due to a condition called
> generalized pustular psoriasis (GPP). This condition can occur as a
> rebound effect when systemic steroids are suddenly stopped or rapidly
> tapered down. In patients with psoriasis, steroids can suppress the
> immune system and mitigate symptoms. When these drugs are withdrawn
> too quickly, the immune system can overreact, leading to more severe
> symptoms, including the transformation of psoriasis into a pustular
> form. Careful management and slow tapering of steroids are critical in
> psoriasis patients to prevent the escalation of symptoms.

I am not sure why the behavior is inconsistent between the two.
"""
Practical Introduction to Llama 2 Fine-Tuning with the Medical-O1-Reasoning Dataset using the Standard Trainer
Fine-tune a 7B parameter Llama 2 model using QLoRA on a T4 GPU with limited VRAM.
This script uses parameter-efficient fine-tuning techniques to enable training
on consumer-grade hardware, using a medical reasoning dataset.
"""
import torch
from datasets import load_dataset
from transformers import (
AutoModelForCausalLM,
AutoTokenizer,
BitsAndBytesConfig,
TrainingArguments,
Trainer,
pipeline,
logging,
DataCollatorForLanguageModeling,
LlamaTokenizer
)
from peft import LoraConfig, PeftModel, get_peft_model, prepare_model_for_kbit_training
from tqdm import tqdm
# Configuration
# Model and dataset settings
MODEL_NAME = "meta-llama/Llama-2-7b-hf"
# MODEL_NAME = "NousResearch/Llama-2-7b-hf"
DATASET_NAME = "FreedomIntelligence/medical-o1-reasoning-SFT"
NEW_MODEL_NAME = "llama-2-7b-medical-reasoning"
OUTPUT_DIR = "./results-medical-reasoning"
# QLoRA parameters
LORA_R = 64
LORA_ALPHA = 16
LORA_DROPOUT = 0.1
# Quantization parameters
USE_4BIT = True
BNB_4BIT_COMPUTE_DTYPE = "float16"
BNB_4BIT_QUANT_TYPE = "nf4"
USE_NESTED_QUANT = False
# Training parameters
NUM_TRAIN_EPOCHS = 100
FP16 = True
BF16 = False
PER_DEVICE_TRAIN_BATCH_SIZE = 4
PER_DEVICE_EVAL_BATCH_SIZE = 4
GRADIENT_ACCUMULATION_STEPS = 4
GRADIENT_CHECKPOINTING = True
MAX_GRAD_NORM = 0.3
LEARNING_RATE = 2e-4
WEIGHT_DECAY = 0.001
OPTIM = "paged_adamw_32bit"
LR_SCHEDULER_TYPE = "constant"
MAX_STEPS = -1 # Override epochs if positive
WARMUP_RATIO = 0.03
GROUP_BY_LENGTH = True
SAVE_STEPS = 25
LOGGING_STEPS = 25
# Sequence parameters
MAX_SEQ_LENGTH = 10000 # Upper bound for tokenization (note: Llama-2's native context is 4096 tokens)
PACKING = False
DEVICE_MAP = {"": 0} # Load on GPU 0
# Generation parameters - consistent across all model tests
GENERATION_CONFIG = {
"max_length": 2000,
"do_sample": False,
"temperature": 0.0,
"num_beams": 1,
"top_p": 1.0,
"top_k": 50,
"repetition_penalty": 1.0,
}
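# Note: with do_sample=False and num_beams=1, decoding is greedy and deterministic,
# so two models holding identical weights should produce identical outputs for the
# same prompt; any divergence points at a difference in weights, tokenizer, or prompt.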
# Prompt formatting functions
def generate_prompt_llama(question, think, answer):
formatted_prompt = "[INST]You are a helpful assistant.\n"
formatted_prompt += f"user\n{question}\nassistant\n[/INST]"
formatted_prompt += f"think\n{think}\nanswer\n{answer}"
return formatted_prompt
def generate_prompt_llama_answer(question, think, answer):
formatted_prompt = f"think\n{think}\nanswer\n{answer}"
return formatted_prompt
def generate_prompt_llama_system(question):
formatted_prompt = "[INST]You are a helpful assistant.\n"
formatted_prompt += f"user\n{question}\nassistant\n[/INST]"
return formatted_prompt
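# For reference, the assembled prompts look roughly like this (illustrative only):
#   training prompt (generate_prompt_llama):
#     [INST]You are a helpful assistant.
#     user
#     {question}
#     assistant
#     [/INST]think
#     {think}
#     answer
#     {answer}
#   inference prompt (generate_prompt_llama_system) stops after [/INST], so the model
#   has to produce the think/answer continuation on its own.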
def main():
"""Main function to run the fine-tuning process."""
print("Starting Llama 2 fine-tuning process with medical reasoning dataset")
# 1. Load dataset
print(f"Loading {DATASET_NAME} dataset...")
dataset = load_dataset(DATASET_NAME, 'en')
dataset = dataset['train'].select(range(3,4)) # For testing purposes
print(f"Dataset loaded: {dataset}")
print(f"Dataset length: {len(dataset)}")
print(f"Dataset structure: {dataset.features}")
# 2. Configure quantization
print("Configuring BitsAndBytes for 4-bit quantization...")
compute_dtype = getattr(torch, BNB_4BIT_COMPUTE_DTYPE)
bnb_config = BitsAndBytesConfig(
load_in_4bit=USE_4BIT,
bnb_4bit_quant_type=BNB_4BIT_QUANT_TYPE,
bnb_4bit_compute_dtype=compute_dtype,
bnb_4bit_use_double_quant=USE_NESTED_QUANT,
)
# Check GPU compatibility with bfloat16
if compute_dtype == torch.float16 and USE_4BIT:
if torch.cuda.is_available():
major, _ = torch.cuda.get_device_capability()
if major >= 8:
print("=" * 80)
print("Your GPU supports bfloat16: accelerate training with bf16=True")
print("=" * 80)
# 3. Load model and tokenizer
print(f"Loading {MODEL_NAME} in 4-bit precision...")
model = AutoModelForCausalLM.from_pretrained(
MODEL_NAME,
quantization_config=bnb_config,
device_map=DEVICE_MAP,
trust_remote_code=True,
use_cache=False # Set here directly
)
# Configure model settings for training
model.config.pretraining_tp = 1
print("Loading tokenizer...")
tokenizer = LlamaTokenizer.from_pretrained(MODEL_NAME, trust_remote_code=True)
num_added_toks = tokenizer.add_tokens(['[INST]', '[/INST]', '', ''], special_tokens=True)
# Use distinct end-of-text and padding tokens
special_tokens_dict = {
'eos_token': '',
'pad_token': ''
}
# Add special tokens and resize model embeddings
num_added_tokens = tokenizer.add_special_tokens(special_tokens_dict)
print(f"Added {num_added_tokens} special tokens to the tokenizer")
# Important: Resize model embeddings to match new tokenizer size
model.resize_token_embeddings(len(tokenizer))
# Set padding side for the tokenizer
tokenizer.padding_side = "right" # Fix overflow issues with fp16 training
# Update model config with token IDs
model.config.pad_token_id = tokenizer.pad_token_id
model.config.eos_token_id = tokenizer.eos_token_id
# Print token information for debugging
print(f"Pad token: {tokenizer.pad_token}, ID: {tokenizer.pad_token_id}")
print(f"EOS token: {tokenizer.eos_token}, ID: {tokenizer.eos_token_id}")
# Prepare model for k-bit training - CRITICAL STEP!
model = prepare_model_for_kbit_training(model)
# 4. Configure LoRA
print("Configuring LoRA...")
peft_config = LoraConfig(
lora_alpha=LORA_ALPHA,
lora_dropout=LORA_DROPOUT,
r=LORA_R,
bias="none",
task_type="CAUSAL_LM",
target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "down_proj", "up_proj"],
)
# Apply LoRA to the model
model = get_peft_model(model, peft_config)
model.print_trainable_parameters()
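# Note: embed_tokens and lm_head are not listed in target_modules (and modules_to_save
# is not set), so the embedding rows added by resize_token_embeddings stay frozen and
# are not stored in the adapter checkpoint.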
# 5. Process dataset
print("Processing dataset...")
def preprocess_function(examples):
"""Process a batch of examples."""
full_prompts = [
generate_prompt_llama(q, traj, att)
for q, traj, att in zip(
examples["Question"],
examples["Complex_CoT"],
examples["Response"]
)
]
print("Full prompt example:", full_prompts[0])
# Tokenize inputs with proper padding and truncation
tokenized = tokenizer(
full_prompts,
truncation=True,
max_length=MAX_SEQ_LENGTH,
padding=False, # DataCollator will handle padding
return_tensors=None, # Return python lists, not tensors
)
# For causal language modeling, labels are the same as input_ids
tokenized["labels"] = tokenized["input_ids"].copy()
return tokenized
# Apply preprocessing to dataset
tokenized_dataset = dataset.map(
preprocess_function,
batched=True,
remove_columns=dataset.column_names,
desc="Tokenizing dataset",
)
print("Decoded sample:", tokenizer.decode(tokenized_dataset[0]["input_ids"], skip_special_tokens=False))
print(f"Sample input_ids shape: {len(tokenized_dataset[0]['input_ids'])}")
# 6. Data collator - critical for properly batching sequences
data_collator = DataCollatorForLanguageModeling(
tokenizer=tokenizer,
mlm=False, # We're doing causal language modeling
)
# 7. Set up training arguments
print("Setting up training arguments...")
training_args = TrainingArguments(
output_dir=OUTPUT_DIR,
num_train_epochs=NUM_TRAIN_EPOCHS,
per_device_train_batch_size=PER_DEVICE_TRAIN_BATCH_SIZE,
per_device_eval_batch_size=PER_DEVICE_EVAL_BATCH_SIZE,
gradient_accumulation_steps=GRADIENT_ACCUMULATION_STEPS,
optim=OPTIM,
save_steps=SAVE_STEPS,
logging_steps=LOGGING_STEPS,
learning_rate=LEARNING_RATE,
weight_decay=WEIGHT_DECAY,
fp16=FP16,
bf16=BF16,
max_grad_norm=MAX_GRAD_NORM,
max_steps=MAX_STEPS,
warmup_ratio=WARMUP_RATIO,
group_by_length=GROUP_BY_LENGTH,
lr_scheduler_type=LR_SCHEDULER_TYPE,
report_to="tensorboard",
gradient_checkpointing=GRADIENT_CHECKPOINTING,
remove_unused_columns=False, # Important for LoRA fine-tuning
)
# 8. Initialize standard Trainer
print("Initializing standard Trainer...")
trainer = Trainer(
model=model,
args=training_args,
train_dataset=tokenized_dataset,
data_collator=data_collator,
)
# 9. Train model
print("Starting training...")
trainer.train()
# 10. Save trained model
print(f"Saving model to {NEW_MODEL_NAME}...")
trainer.save_model(NEW_MODEL_NAME)
tokenizer.save_pretrained(NEW_MODEL_NAME)
print("Fine-tuning complete!")
# 11. Test model
test_model(model, tokenizer)
# 12. Merge weights (requires restarting with fresh VRAM)
print("====================================================================:")
print("Note: To merge LoRA weights with the base model:")
print("1. Restart your environment to clear VRAM")
print("2. Run the merge_weights() function")
merge_weights()
def test_model(model, tokenizer):
"""Test the model with a sample question."""
logging.set_verbosity(logging.CRITICAL)
prompt = 'A patient with psoriasis was treated with systemic steroids, and upon discontinuation of the treatment, developed generalized pustules all over the body. What is the most likely cause of this condition?'
# Format for inference
formatted_prompt = generate_prompt_llama_system(prompt)
print("\nTesting model with prompt:", prompt)
# Testing using model.generate()
input_ids = tokenizer(formatted_prompt, return_tensors="pt").input_ids.to(model.device)
model.eval()
with torch.no_grad():
# Use the global generation config
output_ids = model.generate(
input_ids=input_ids,
max_length=GENERATION_CONFIG["max_length"],
do_sample=GENERATION_CONFIG["do_sample"],
temperature=GENERATION_CONFIG["temperature"],
num_beams=GENERATION_CONFIG["num_beams"],
top_p=GENERATION_CONFIG["top_p"],
top_k=GENERATION_CONFIG["top_k"],
repetition_penalty=GENERATION_CONFIG["repetition_penalty"],
pad_token_id=tokenizer.pad_token_id,
eos_token_id=tokenizer.eos_token_id,
)
print("\nGenerated output:")
print(tokenizer.decode(output_ids[0], skip_special_tokens=False))
def merge_weights():
"""Merge LoRA weights with base model (run after restarting environment)."""
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
print("Loading tokenizer first...")
tokenizer = AutoTokenizer.from_pretrained(NEW_MODEL_NAME, trust_remote_code=True)
# Calculate the new vocabulary size
new_vocab_size = len(tokenizer)
print(f"New vocabulary size: {new_vocab_size}")
print("Loading base model in FP16...")
base_model = AutoModelForCausalLM.from_pretrained(
MODEL_NAME,
low_cpu_mem_usage=True,
return_dict=True,
torch_dtype=torch.float16,
device_map={"": 0},
)
# Resize model embeddings BEFORE loading LoRA weights
base_model.resize_token_embeddings(len(tokenizer))
print(f"Resized base model embeddings to {len(tokenizer)}")
print(f"Loading LoRA weights from {NEW_MODEL_NAME}...")
model = PeftModel.from_pretrained(base_model, NEW_MODEL_NAME)
print("Merging weights...")
model = model.merge_and_unload()
# Set padding configuration
tokenizer.padding_side = "right"
# Print token info
print(f"Pad token: {tokenizer.pad_token}, ID: {tokenizer.pad_token_id}")
print(f"EOS token: {tokenizer.eos_token}, ID: {tokenizer.eos_token_id}")
# Save merged model
merged_model_name = f"{NEW_MODEL_NAME}-merged"
print(f"Saving merged model to {merged_model_name}...")
model.save_pretrained(merged_model_name)
tokenizer.save_pretrained(merged_model_name)
print("Model weights successfully merged!")
# Test the merged model
test_merged_model(model, tokenizer)
return model, tokenizer
def test_merged_model(model, tokenizer):
"""Test the merged model."""
# Use the exact same test code and parameters as test_model
logging.set_verbosity(logging.CRITICAL)
prompt = 'A patient with psoriasis was treated with systemic steroids, and upon discontinuation of the treatment, developed generalized pustules all over the body. What is the most likely cause of this condition?'
# Format for inference
formatted_prompt = generate_prompt_llama_system(prompt)
print("\nTesting merged model with prompt:", prompt)
# Testing using model.generate()
input_ids = tokenizer(formatted_prompt, return_tensors="pt").input_ids.to(model.device)
model.eval()
with torch.no_grad():
# Use the identical generation config as the original test
output_ids = model.generate(
input_ids=input_ids,
max_length=GENERATION_CONFIG["max_length"],
do_sample=GENERATION_CONFIG["do_sample"],
temperature=GENERATION_CONFIG["temperature"],
num_beams=GENERATION_CONFIG["num_beams"],
top_p=GENERATION_CONFIG["top_p"],
top_k=GENERATION_CONFIG["top_k"],
repetition_penalty=GENERATION_CONFIG["repetition_penalty"],
pad_token_id=tokenizer.pad_token_id,
eos_token_id=tokenizer.eos_token_id,
)
print("\nGenerated output:")
print(tokenizer.decode(output_ids[0], skip_special_tokens=False))
if __name__ == "__main__":
main()
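For completeness, here is the minimal sanity check I plan to run next. This is only a sketch, not part of the script above, and it assumes the constants and helpers defined there (MODEL_NAME, NEW_MODEL_NAME, GENERATION_CONFIG, generate_prompt_llama_system). It re-attaches the saved adapter to a freshly loaded 4-bit base, exactly as during training, and generates with the same greedy settings, to see whether the divergence already shows up at this stage or only after merging into the fp16 base.

import torch
from transformers import AutoModelForCausalLM, LlamaTokenizer, BitsAndBytesConfig
from peft import PeftModel

# Tokenizer as saved after training (includes the added special tokens)
tokenizer = LlamaTokenizer.from_pretrained(NEW_MODEL_NAME)

# Base model quantized the same way as during training
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
base = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    quantization_config=bnb_config,
    device_map={"": 0},
)
base.resize_token_embeddings(len(tokenizer))

# Attach the saved LoRA adapter without merging
model = PeftModel.from_pretrained(base, NEW_MODEL_NAME)
model.eval()

prompt = generate_prompt_llama_system(
    "A patient with psoriasis was treated with systemic steroids, and upon "
    "discontinuation of the treatment, developed generalized pustules all over "
    "the body. What is the most likely cause of this condition?"
)
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)
with torch.no_grad():
    out = model.generate(
        input_ids=input_ids,
        max_length=GENERATION_CONFIG["max_length"],
        do_sample=False,
        pad_token_id=tokenizer.pad_token_id,
        eos_token_id=tokenizer.eos_token_id,
    )
print(tokenizer.decode(out[0], skip_special_tokens=False))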