Snowflake SPCS Multi Model Training – 403 errors after some time


Post by Anonymous »

I am trying to run a long-running training job using Snowflake's distributed multi-model training features. On a sample of my training data (~100,000 rows), the job completes without any problems. When I scale up to the full training set (~150 million rows), however, I hit 403 errors in the partitions of my dataset that are processed later. These errors appear after roughly 6 hours of training; partitions trained earlier show no issues.

The following exceptions are raised, and they are identical for every failed partition:

Code: Select all

An exception was raised from a task of operator "ReadResultSetDataSource->SplitBlocks(2)". Dataset execution will now abort.  To ignore this exception and continue, set DataContext.max_errored_blocks.
Traceback (most recent call last):
File "/opt/python/cpython-3.10-linux-x86_64-gnu/lib/python3.10/site-packages/ray/data/_internal/execution/streaming_executor_state.py", line 472, in process_completed_tasks
bytes_read = task.on_data_ready(
File "/opt/python/cpython-3.10-linux-x86_64-gnu/lib/python3.10/site-packages/ray/data/_internal/execution/interfaces/physical_operator.py", line 138, in on_data_ready
raise ex from None
File "/opt/python/cpython-3.10-linux-x86_64-gnu/lib/python3.10/site-packages/ray/data/_internal/execution/interfaces/physical_operator.py", line 134, in on_data_ready
ray.get(block_ref)
File "/opt/python/cpython-3.10-linux-x86_64-gnu/lib/python3.10/site-packages/ray/_private/auto_init_hook.py", line 22, in auto_init_wrapper
return fn(*args, **kwargs)
File "/opt/python/cpython-3.10-linux-x86_64-gnu/lib/python3.10/site-packages/ray/_private/client_mode_hook.py", line 104, in wrapper
return func(*args, **kwargs)
File "/opt/python/cpython-3.10-linux-x86_64-gnu/lib/python3.10/site-packages/ray/_private/worker.py", line 2849, in get
values, debugger_breakpoint = worker.get_objects(object_refs, timeout=timeout)
File "/opt/python/cpython-3.10-linux-x86_64-gnu/lib/python3.10/site-packages/ray/_private/worker.py", line 937, in get_objects
raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(RuntimeError): ray::ReadResultSetDataSource->SplitBlocks(2)() (pid=58742, ip=**.***.**.**)
snowflake.connector.network.RetryRequest: 290403: HTTP 403: Forbidden

Code: Select all

ray::ReadResultSetDataSource->SplitBlocks(2)() (pid=58742, ip=**.***.**.**)
File "/opt/python/cpython-3.10-linux-x86_64-gnu/lib/python3.10/site-packages/ray/data/_internal/execution/operators/map_operator.py", line 552, in _map_task
for b_out in map_transformer.apply_transform(iter(blocks), ctx):
File "/opt/python/cpython-3.10-linux-x86_64-gnu/lib/python3.10/site-packages/ray/data/_internal/execution/operators/map_transformer.py", line 601, in __call__
for block in blocks:
File "/opt/python/cpython-3.10-linux-x86_64-gnu/lib/python3.10/site-packages/ray/data/_internal/execution/operators/map_transformer.py", line 532, in __call__
for data in iter:
File "/opt/python/cpython-3.10-linux-x86_64-gnu/lib/python3.10/site-packages/ray/data/_internal/execution/operators/map_transformer.py", line 377, in __call__
yield from self._block_fn(input, ctx)
File "/opt/python/cpython-3.10-linux-x86_64-gnu/lib/python3.10/site-packages/ray/data/_internal/planner/plan_read_op.py", line 106, in do_read
yield from read_task()
File "/opt/python/cpython-3.10-linux-x86_64-gnu/lib/python3.10/site-packages/ray/data/datasource/datasource.py", line 168, in __call__
yield from result
File "/opt/python/cpython-3.10-linux-x86_64-gnu/lib/python3.10/site-packages/implementations/ray_data_ingester.py", line 255, in _read_fn
raise RuntimeError(
RuntimeError: Unable to refresh batches: invalid query_id
I have tried setting session keep-alive to TRUE:

Code: Select all

SESSION.sql("ALTER SESSION SET CLIENT_SESSION_KEEP_ALIVE = TRUE").collect()
as well as passing a reference to a table and an in-memory dataframe to the training run, without success.
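For completeness, I am also considering an explicit periodic heartbeat instead of (or in addition to) CLIENT_SESSION_KEEP_ALIVE. This is only a sketch: the `run_query` callable is an assumption standing in for something like `lambda: SESSION.sql("SELECT 1").collect()`, not part of any Snowflake API.

```python
import threading


def start_heartbeat(run_query, interval_s=600.0):
    """Invoke run_query every interval_s seconds on a daemon thread.

    run_query is a zero-argument callable, e.g. a lightweight
    'SELECT 1' against the Snowpark session (assumed, not shown here).
    Returns a stop() function that ends the heartbeat cleanly.
    """
    stop_event = threading.Event()

    def _loop():
        # wait() returns False on timeout, True once stop() is called
        while not stop_event.wait(interval_s):
            run_query()

    thread = threading.Thread(target=_loop, daemon=True)
    thread.start()

    def stop():
        stop_event.set()
        thread.join()

    return stop
```

In the real job this would wrap the Snowpark session, e.g. `stop = start_heartbeat(lambda: SESSION.sql("SELECT 1").collect())`, and `stop()` would be called after training finishes.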

Code: Select all

trainer = ManyModelTraining(
    train_func=train_model,
    stage_name=MMT_STAGE,
    serde=ModelOnlySerde(),
)

run = trainer.run(
    partition_by="PARTITION_ID",
    # snowpark_dataframe=SESSION.table(
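Since the failing traceback ends in `RuntimeError: Unable to refresh batches: invalid query_id`, my working assumption is that the result set behind the long-running read expires before the later partitions get consumed. One workaround I am considering is splitting the partition IDs into smaller batches and launching a separate run per batch, so that each result set is read well within its lifetime. A minimal sketch of just the batching helper (the per-batch `trainer.run(...)` calls are omitted):

```python
def batch_partitions(partition_ids, batch_size):
    """Split a list of partition IDs into consecutive batches of at
    most batch_size elements, preserving the original order."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return [
        partition_ids[i:i + batch_size]
        for i in range(0, len(partition_ids), batch_size)
    ]
```

Each batch would then get its own training run filtered to those PARTITION_ID values, keeping every read comfortably under the ~6-hour mark where the 403s start. Has anyone seen this pattern with SPCS multi-model training, or found a way to keep the result set valid for the full duration?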
