"Sie haben eine Version von" Bitsandbytes ", die nicht mit 4 -Bit -Inferenz und Training kompatibel ist"Python


Post by Guest »

I am currently trying to fine-tune a Llama3 model.

Code: Select all

from unsloth import FastLanguageModel
Then I load the Llama3 model.

Code: Select all
# max_seq_length is set earlier in my notebook
model, tokenizer = FastLanguageModel.from_pretrained(
model_name = "unsloth/llama-3-8b-bnb-4bit",
max_seq_length = max_seq_length,
dtype = None,
load_in_4bit = True,
)
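As far as I understand, unsloth builds on transformers' 4-bit loading, so for reference here is the same load through plain transformers and bitsandbytes (a minimal sketch of my own, not from my notebook; the BitsAndBytesConfig values and the explicit device_map are assumptions):

Code: Select all

from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import torch

# Assumed 4-bit settings; bfloat16 matches "Bfloat16 = TRUE" reported below.
bnb_config = BitsAndBytesConfig(
    load_in_4bit = True,
    bnb_4bit_compute_dtype = torch.bfloat16,
)

# device_map = {"": 0} pins every module to GPU 0, so nothing gets
# dispatched to the CPU or disk.
model = AutoModelForCausalLM.from_pretrained(
    "unsloth/llama-3-8b-bnb-4bit",
    quantization_config = bnb_config,
    device_map = {"": 0},
)
tokenizer = AutoTokenizer.from_pretrained("unsloth/llama-3-8b-bnb-4bit")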
I run my script from VS Code, and both my Python environment and the script are on WSL. My system information is shown below:

Code: Select all
==((====))==  Unsloth: Fast Llama patching release 2024.5
\\   /|    GPU: NVIDIA GeForce RTX 4090. Max memory: 23.988 GB. Platform = Linux.
O^O/ \_/ \    Pytorch: 2.1.0+cu121. CUDA = 8.9. CUDA Toolkit = 12.1.
\        /    Bfloat16 = TRUE. Xformers = 0.0.22.post7. FA = False.
"-____-"     Free Apache license: http://github.com/unslothai/unsloth
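Before loading, I can also verify the pieces the 4-bit path depends on (a quick sanity-check sketch; nothing here is specific to unsloth):

Code: Select all

import torch
import bitsandbytes

# bitsandbytes must be recent enough for 4-bit (NF4) support.
print("bitsandbytes:", bitsandbytes.__version__)
print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())

# Free vs. total GPU memory, to confirm the 4-bit model actually fits.
free, total = torch.cuda.mem_get_info()
print(f"GPU memory: {free / 1e9:.2f} GB free of {total / 1e9:.2f} GB")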
Now I run into this error:

Code: Select all
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[4], line 2
1 # 2. Load Llama3 model
----> 2 model, tokenizer = FastLanguageModel.from_pretrained(
3     model_name = "unsloth/llama-3-8b-bnb-4bit",
4     max_seq_length = max_seq_length,
5     dtype = None,
6     load_in_4bit = True,
7 )

File ~/miniconda/envs/llama3/lib/python3.9/site-packages/unsloth/models/loader.py:142, in FastLanguageModel.from_pretrained(model_name, max_seq_length, dtype, load_in_4bit, token, device_map, rope_scaling, fix_tokenizer, trust_remote_code, use_gradient_checkpointing, resize_model_vocab, *args, **kwargs)
139     tokenizer_name = None
140 pass
--> 142 model, tokenizer = dispatch_model.from_pretrained(
143     model_name     = model_name,
144     max_seq_length = max_seq_length,
145     dtype          = dtype,
146     load_in_4bit   = load_in_4bit,
147     token          = token,
148     device_map     = device_map,
149     rope_scaling   = rope_scaling,
150     fix_tokenizer  = fix_tokenizer,
151     model_patcher  = dispatch_model,
152     tokenizer_name = tokenizer_name,
...
96         "You have a version of `bitsandbytes` that is not compatible with 4bit inference and training"
97         " make sure you have the latest version of `bitsandbytes` installed"
98     )

ValueError: Some modules are dispatched on the CPU or the disk. Make sure you have enough GPU RAM to fit the quantized model. If you want to dispatch the model on the CPU or the disk while keeping these modules in 32-bit, you need to set `load_in_8bit_fp32_cpu_offload=True` and pass a custom `device_map` to `from_pretrained`. Check https://huggingface.co/docs/transformers/main/en/main_classes/quantization#offload-between-cpu-and-gpu for more details.
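The ValueError says accelerate decided some modules would not fit on the GPU and dispatched them to the CPU or disk, which the 4-bit path refuses. To see what accelerate would actually plan for this checkpoint, one can inspect the inferred device map (a sketch using accelerate's meta-device loading; my own addition, not from the notebook):

Code: Select all

from accelerate import init_empty_weights, infer_auto_device_map
from transformers import AutoConfig, AutoModelForCausalLM

# Build the model structure on the meta device (no weights allocated).
config = AutoConfig.from_pretrained("unsloth/llama-3-8b-bnb-4bit")
with init_empty_weights():
    empty_model = AutoModelForCausalLM.from_config(config)

# Shows which modules accelerate would place on GPU 0 vs. "cpu"/"disk";
# any non-GPU entry explains the error above.
device_map = infer_auto_device_map(empty_model)
print(device_map)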
Would someone please help?
