Code: Select all
from unsloth import FastLanguageModel

Then I load the Llama 3 model:

Code: Select all
# max_seq_length is defined earlier in the script (not shown here)
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/llama-3-8b-bnb-4bit",
    max_seq_length = max_seq_length,
    dtype = None,
    load_in_4bit = True,
)

I run my script from VS Code, and my Python environment and the script itself are on WSL. My system information is shown below:

Code: Select all
==((====))== Unsloth: Fast Llama patching release 2024.5
\\ /| GPU: NVIDIA GeForce RTX 4090. Max memory: 23.988 GB. Platform = Linux.
O^O/ \_/ \ Pytorch: 2.1.0+cu121. CUDA = 8.9. CUDA Toolkit = 12.1.
\ / Bfloat16 = TRUE. Xformers = 0.0.22.post7. FA = False.
"-____-" Free Apache license: http://github.com/unslothai/unsloth

Now I run into this error:

Code: Select all
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[4], line 2
1 # 2. Load Llama3 model
----> 2 model, tokenizer = FastLanguageModel.from_pretrained(
3 model_name = "unsloth/llama-3-8b-bnb-4bit",
4 max_seq_length = max_seq_length,
5 dtype = None,
6 load_in_4bit = True,
7 )
File ~/miniconda/envs/llama3/lib/python3.9/site-packages/unsloth/models/loader.py:142, in FastLanguageModel.from_pretrained(model_name, max_seq_length, dtype, load_in_4bit, token, device_map, rope_scaling, fix_tokenizer, trust_remote_code, use_gradient_checkpointing, resize_model_vocab, *args, **kwargs)
139 tokenizer_name = None
140 pass
--> 142 model, tokenizer = dispatch_model.from_pretrained(
143 model_name = model_name,
144 max_seq_length = max_seq_length,
145 dtype = dtype,
146 load_in_4bit = load_in_4bit,
147 token = token,
148 device_map = device_map,
149 rope_scaling = rope_scaling,
150 fix_tokenizer = fix_tokenizer,
151 model_patcher = dispatch_model,
152 tokenizer_name = tokenizer_name,
...
96 "You have a version of `bitsandbytes` that is not compatible with 4bit inference and training"
97 " make sure you have the latest version of `bitsandbytes` installed"
98 )
ValueError: Some modules are dispatched on the CPU or the disk. Make sure you have enough GPU RAM to fit the quantized model. If you want to dispatch the model on the CPU or the disk while keeping these modules in 32-bit, you need to set `load_in_8bit_fp32_cpu_offload=True` and pass a custom `device_map` to `from_pretrained`. Check https://huggingface.co/docs/transformers/main/en/main_classes/quantization#offload-between-cpu-and-gpu for more details.
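
From the message I understand that either every module has to stay on the GPU via a custom device_map, or CPU offload has to be enabled as in the linked Transformers docs. Is something like the following what it means? This is only a rough sketch: device_map does appear in the from_pretrained signature above, but the {"": 0} mapping (force everything onto GPU 0) is my guess, and I am not sure unsloth forwards the offload flag at all.

Code: Select all
# Sketch only: force all modules onto GPU 0 so nothing is dispatched to CPU/disk.
# device_map is a from_pretrained parameter (see the traceback above);
# the {"": 0} mapping comes from the accelerate/transformers docs, not from unsloth.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/llama-3-8b-bnb-4bit",
    max_seq_length = max_seq_length,
    dtype = None,
    load_in_4bit = True,
    device_map = {"": 0},
)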

Would someone please help?