SemanticKernel GetStreamingChatMessageContentsAsync empty, but GetChatMessageContentAsync works fine (C#)


Post by Guest »

I have just started out with Semantic Kernel against a local LLM.

Code: Select all

var chat = app.Services.GetRequiredService<IChatCompletionService>();

ChatMessageContent response = await chat.GetChatMessageContentAsync(chatHistory);
var items = response.Items;
var firstitem = items.FirstOrDefault();
var textContent = firstitem as TextContent;
Console.WriteLine(textContent?.Text);
This works as expected and produces a "Hello! How can I assist you today? 😊" reply.
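(For reference, chatHistory above just holds the single user message "Hi" that also shows up in the request body further down; a minimal sketch of what I mean:)

Code: Select all

using Microsoft.SemanticKernel.ChatCompletion;

// a single user turn, matching the request body the server logs below
var chatHistory = new ChatHistory();
chatHistory.AddUserMessage("Hi");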
However, I want to do this with streaming, like everybody else.

Code: Select all

await foreach (StreamingChatMessageContent stream in chat.GetStreamingChatMessageContentsAsync("Hi"))
{
    // await Task.Yield(); // tried this to see if it would help but it didn't
    Console.WriteLine(stream.Content);
}
But this returns 12 "empty" results, which if serialised
{"Content":null,"Role":{"Label":"Assistant"},"ChoiceIndex":0,"ModelId":"deepseek-r1-distill-llama-8b","Metadata":{"CompletionId":"chatcmpl-m086eaeve495763ls6arwj","CreatedAt":"2025-02-13T10:22:51+00:00","SystemFingerprint":"deepseek-r1-distill-llama-8b","RefusalUpdate":null,"Usage":null,"FinishReason":null}}
< /code>
followed by a "stop"
{"Content":null,"Role":null,"ChoiceIndex":0,"ModelId":"deepseek-r1-distill-llama-8b","Metadata":{"CompletionId":"chatcmpl-m086eaeve495763ls6arwj","CreatedAt":"2025-02-13T10:22:51+00:00","SystemFingerprint":"deepseek-r1-distill-llama-8b","RefusalUpdate":null,"Usage":null,"FinishReason":"Stop"}}
< /code>
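(The serialised chunks above came from dumping each streamed result as-is; roughly like this, with System.Text.Json as my guess for the serialiser:)

Code: Select all

using System.Text.Json;

await foreach (StreamingChatMessageContent stream in chat.GetStreamingChatMessageContentsAsync("Hi"))
{
    // dump the whole chunk so Role, Metadata and FinishReason are visible too
    Console.WriteLine(JsonSerializer.Serialize(stream));
}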
So I know the server is running, as the direct approach works fine, but I cannot get streaming to work properly.
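One thing I still want to rule out is that the streamed text is carried in the chunk's Items rather than in Content, the same way the non-streaming reply exposed its text as a TextContent item. A sketch of that check (I haven't confirmed it makes any difference):

Code: Select all

await foreach (StreamingChatMessageContent stream in chat.GetStreamingChatMessageContentsAsync("Hi"))
{
    // look for streamed text items instead of relying on stream.Content
    foreach (var item in stream.Items)
    {
        if (item is StreamingTextContent text)
            Console.Write(text.Text);
    }
}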
For the direct message without streaming, here is the server log for the request:

Code: Select all

2025-02-13 12:28:42 [DEBUG]
Received request: POST to /v1/chat/completions with body  {
"messages": [
{
"role": "user",
"content": "Hi"
}
],
"model": "deepseek-r1-distill-llama-8b"
}
2025-02-13 12:28:42  [INFO]
[LM STUDIO SERVER] Running chat completion on conversation with 1 messages.
2025-02-13 12:28:42 [DEBUG]
Sampling params:    repeat_last_n = 64, repeat_penalty = 1.100, frequency_penalty = 0.000, presence_penalty = 0.000
dry_multiplier = 0.000, dry_base = 1.750, dry_allowed_length = 2, dry_penalty_last_n = -1
top_k = 40, top_p = 0.950, min_p = 0.050, xtc_probability = 0.000, xtc_threshold = 0.100, typical_p = 1.000, temp = 0.800
mirostat = 0, mirostat_lr = 0.100, mirostat_ent = 5.000
2025-02-13 12:28:42 [DEBUG]
sampling:
logits -> logit-bias -> penalties -> dry -> top-k -> typical -> top-p -> min-p -> xtc -> temp-ext -> dist
generate: n_ctx = 4096, n_batch = 512, n_predict = -1, n_keep = 12
BeginProcessingPrompt
2025-02-13 12:28:42 [DEBUG]
FinishedProcessingPrompt. Progress: 100
2025-02-13 12:28:42  [INFO]
[LM STUDIO SERVER] Accumulating tokens ...  (stream = false)
2025-02-13 12:28:42 [DEBUG]
[deepseek-r1-distill-llama-8b] Accumulated 1 tokens 
2025-02-13 12:28:42 [DEBUG]
[deepseek-r1-distill-llama-8b] Accumulated 2 tokens \n\n
2025-02-13 12:28:42 [DEBUG]
[deepseek-r1-distill-llama-8b] Accumulated 3 tokens \n\n
2025-02-13 12:28:42 [DEBUG]
[deepseek-r1-distill-llama-8b] Accumulated 4 tokens \n\n\n\n
2025-02-13 12:28:42 [DEBUG]
[deepseek-r1-distill-llama-8b] Accumulated 5 tokens \n\n\n\nHello
2025-02-13 12:28:42 [DEBUG]
[deepseek-r1-distill-llama-8b] Accumulated 6 tokens \n\n\n\nHello!
2025-02-13 12:28:42 [DEBUG]
[deepseek-r1-distill-llama-8b] Accumulated 7 tokens \n\n\n\nHello! How
2025-02-13 12:28:42 [DEBUG]
[deepseek-r1-distill-llama-8b] Accumulated 8 tokens \n\n\n\nHello! How can
2025-02-13 12:28:42 [DEBUG]
[deepseek-r1-distill-llama-8b] Accumulated 9 tokens \n\n\n\nHello! How can I
2025-02-13 12:28:42 [DEBUG]
[deepseek-r1-distill-llama-8b] Accumulated 10 tokens \n\n\n\nHello! How can I assist
2025-02-13 12:28:42 [DEBUG]
[deepseek-r1-distill-llama-8b] Accumulated 11 tokens \n\n\n\nHello! How can I assist you
2025-02-13 12:28:42 [DEBUG]
[deepseek-r1-distill-llama-8b] Accumulated 12 tokens \n\n\n\nHello! How can I assist you today
2025-02-13 12:28:42 [DEBUG]
[deepseek-r1-distill-llama-8b] Accumulated 13 tokens \n\n\n\nHello! How can I assist you today?
2025-02-13 12:28:42 [DEBUG]
Incomplete UTF-8 character.  Waiting for next token (skip)
2025-02-13 12:28:42 [DEBUG]
[deepseek-r1-distill-llama-8b] Accumulated 14 tokens \n\n\n\nHello! How can I assist you today? 😊
2025-02-13 12:28:42 [DEBUG]
target model llama_perf stats:
llama_perf_context_print:        load time =    4657.69 ms
llama_perf_context_print: prompt eval time =       0.00 ms /     1 tokens (    0.00 ms per token,      inf tokens per second)
llama_perf_context_print:        eval time =     350.76 ms /    16 runs   (   21.92 ms per token,    45.62 tokens per second)
llama_perf_context_print:       total time =     361.22 ms /    17 tokens
2025-02-13 12:28:42  [INFO]
[LM STUDIO SERVER] [deepseek-r1-distill-llama-8b] Generated prediction:  {
"id": "chatcmpl-qv5tc01fntgw2bsm3091wk",
"object": "chat.completion",
"created": 1739442522,
"model": "deepseek-r1-distill-llama-8b",
"choices": [
{
"index": 0,
"logprobs": null,
"finish_reason": "stop",
"message": {
"role": "assistant",
"content": "\n\n\n\nHello! How can I assist you today? 😊"
}
}
],
"usage": {
"prompt_tokens": 4,
"completion_tokens": 14,
"total_tokens": 18
},
"system_fingerprint": "deepseek-r1-distill-llama-8b"
}
And here is the server log for the streaming request, which doesn't return any content:

Code: Select all
2025-02-13 12:30:18 [DEBUG]
Received request: POST to /v1/chat/completions with body  {
"messages": [
{
"role": "user",
"content": "Hi"
}
],
"model": "deepseek-r1-distill-llama-8b",
"stream": true,
"stream_options": {
"include_usage": true
}
}
2025-02-13 12:30:18  [INFO]
[LM STUDIO SERVER] Running chat completion on conversation with 1 messages.
2025-02-13 12:30:18  [INFO]
[LM STUDIO SERVER] Streaming response...
2025-02-13 12:30:18 [DEBUG]
Sampling params:    repeat_last_n = 64, repeat_penalty = 1.100, frequency_penalty = 0.000, presence_penalty = 0.000
dry_multiplier = 0.000, dry_base = 1.750, dry_allowed_length = 2, dry_penalty_last_n = -1
top_k = 40, top_p = 0.950, min_p = 0.050, xtc_probability = 0.000, xtc_threshold = 0.100, typical_p = 1.000, temp = 0.800
mirostat = 0, mirostat_lr = 0.100, mirostat_ent = 5.000
sampling:
logits -> logit-bias -> penalties -> dry -> top-k -> typical -> top-p -> min-p -> xtc -> temp-ext -> dist
generate: n_ctx = 4096, n_batch = 512, n_predict = -1, n_keep = 12
BeginProcessingPrompt
2025-02-13 12:30:18 [DEBUG]
FinishedProcessingPrompt. Progress: 100
2025-02-13 12:30:18  [INFO]
[LM STUDIO SERVER] First token generated. Continuing to stream response..
2025-02-13 12:30:18  [INFO]
[LM STUDIO SERVER] Received  - START
2025-02-13 12:30:18 [DEBUG]
Incomplete UTF-8 character. Waiting for next token (skip)
2025-02-13 12:30:18 [DEBUG]
target model llama_perf stats:
llama_perf_context_print:        load time =    4657.69 ms
llama_perf_context_print: prompt eval time =       0.00 ms /     1 tokens (    0.00 ms per token,      inf tokens per second)
llama_perf_context_print:        eval time =     354.33 ms /    16 runs   (   22.15 ms per token,    45.16 tokens per second)
llama_perf_context_print:       total time =     365.10 ms /    17 tokens
2025-02-13 12:30:18  [INFO]
Finished streaming response
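To figure out whether the deltas on the wire actually contain the text (or whether the client side drops it), my next step is to read the raw SSE stream directly, outside Semantic Kernel. A sketch, assuming LM Studio's default endpoint at http://localhost:1234:

Code: Select all

using System.IO;
using System.Net.Http;
using System.Text;

using var http = new HttpClient();
var request = new HttpRequestMessage(HttpMethod.Post, "http://localhost:1234/v1/chat/completions")
{
    Content = new StringContent(
        """{"model":"deepseek-r1-distill-llama-8b","stream":true,"messages":[{"role":"user","content":"Hi"}]}""",
        Encoding.UTF8,
        "application/json")
};

// ResponseHeadersRead so the body can be read while it is still streaming
using var response = await http.SendAsync(request, HttpCompletionOption.ResponseHeadersRead);
using var reader = new StreamReader(await response.Content.ReadAsStreamAsync());

string? line;
while ((line = await reader.ReadLineAsync()) is not null)
{
    if (line.StartsWith("data: "))
        Console.WriteLine(line); // each "data:" line is one raw streamed chunk
}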
