I have a dataset containing semantic vectors of length 384. I group them into windows of 100 vectors each and feed them in batches of size 32. However, I get an error when fitting the model. My feeling is that it might be because the 1515 windows are split into 48 batches of size 32, which leaves the last batch incomplete, since 1515/32 = 47.34375; but I don't know whether this should really cause a problem, as I have never run into it before.
model.fit(training_data, epochs=50, validation_data=validation_data)
Note that I explicitly do not set steps_per_epoch, which, as I understand it, should simply mean that TensorFlow runs through the entire training data in each epoch. If the incomplete last batch really is the cause, can you give an example of how to drop it? (A sketch of what I think that would look like follows the log output below.)

Log output:

2025-04-03 12:20:03 [INFO]: Training (151508, 2)
2025-04-03 12:20:03 [INFO]: Validation (26514, 2)
2025-04-03 12:20:03 [INFO]: Test (22728, 2)
I0000 00:00:1743675604.435011 6880 cuda_executor.cc:1015] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2025-04-03 12:20:04.435109: W tensorflow/core/common_runtime/gpu/gpu_device.cc:2343] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2025-04-03 12:20:04.838231: I tensorflow/core/framework/local_rendezvous.cc:404] Local rendezvous is aborting with status: OUT_OF_RANGE: End of sequence
Number of windows created: 1515
Input shape: (32, 100, 384)
Label shape: (32, 100, 384)
2025-04-03 12:20:07.074530: I tensorflow/core/framework/local_rendezvous.cc:404] Local rendezvous is aborting with status: OUT_OF_RANGE: End of sequence
Counted training batches: 48
Epoch 1/50
2025-04-03 12:20:09.798596: E tensorflow/core/util/util.cc:131] oneDNN supports DT_INT32 only on platforms with AVX-512. Falling back to the default Eigen-based implementation if present.
47/Unknown 5s 48ms/step - loss: 0.0017
2025-04-03 12:20:12.186355: I tensorflow/core/framework/local_rendezvous.cc:404] Local rendezvous is aborting with status: OUT_OF_RANGE: End of sequence
[[{{node IteratorGetNext}}]]
/home/xaver/Documents/Repos/logAnomalyDetection/venv/lib/python3.12/site-packages/keras/src/trainers/epoch_iterator.py:151: UserWarning: Your input ran out of data; interrupting training. Make sure that your dataset or generator can generate at least `steps_per_epoch * epochs` batches. You may need to use the `.repeat()` function when building your dataset.
self._interrupted_warning()
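To make the question concrete: as far as I understand it, these would be the two usual ways to "drop" or work around the incomplete batch. This is only a minimal sketch reusing the variables from my script below (training_data before batching, num_windows, BATCH_SIZE, model, validation_data); I have not verified that either variant actually fixes the error.

# Option A: discard the incomplete final batch (11 windows) outright
training_data = training_data.batch(BATCH_SIZE, drop_remainder=True)

# Option B: repeat the dataset and pin the epoch length, as the
# UserWarning above suggests
steps_per_epoch = num_windows // BATCH_SIZE  # 1515 // 32 = 47 full batches
training_data = training_data.repeat().batch(BATCH_SIZE)
model.fit(training_data, epochs=50, steps_per_epoch=steps_per_epoch,
          validation_data=validation_data)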
Code
"""
Semi-supervised log anomaly detection using sentence vector embeddings and a sliding window
"""
import logging
import pickle
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
import tensorflow as tf
from tensorflow.keras import layers, models, Input
from sklearn.metrics import (
confusion_matrix, matthews_corrcoef, precision_score, recall_score,
f1_score, roc_auc_score, average_precision_score
)
from tensorflow.python.client import device_lib
device_lib.list_local_devices()
WINDOW_SIZE = 100
BATCH_SIZE = 32
# Configure logging
logging.basicConfig(
format="%(asctime)s [%(levelname)s]: %(message)s",
level=logging.INFO,
datefmt="%Y-%m-%d %H:%M:%S"
)
# Step 1: Load and prepare data
input_pickle = "outputs/parsed_openstack_logs_with_embeddings_final.pickle"
with open(input_pickle, "rb") as handle:
    df = pickle.load(handle)
logging.info("DataFrame loaded successfully.")
# embeddings are both input and target, since the autoencoder tries to reconstruct its input from the latent space
# label contains whether an entry is anomalous (1) or normal (0).
logging.debug(f"{df.columns}")
logging.debug(f"\n{df.head()}")
# split the data so that training and validation contain no anomalies, and the test set contains an equal number of normal and abnormal entries
embedding_length = len(df["embeddings"][0])
print("Length of embedding vector:", embedding_length)
normal_data = df[df['label'] == 0]
test_data_abnormal = df[df['label'] == 1]
abnormal_normal_ratio = len(test_data_abnormal) / len(normal_data)
logging.info(f"N normal: {len(normal_data)}; N abnormal: {len(test_data_abnormal)} -- Ratio: {abnormal_normal_ratio}")
training_data, rest_data = train_test_split(normal_data, train_size=0.8, shuffle=False)
validation_data, test_data_normal = train_test_split(rest_data, test_size=0.3, shuffle=False)
test_data_normal = test_data_normal.head(len(test_data_abnormal))
test_data_abnormal = test_data_abnormal.head(len(test_data_normal))
test_data = pd.concat([test_data_normal, test_data_abnormal])
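# After the two head() calls both test subsets have the same length:
# whichever of the two is longer gets trimmed down to the other's size.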
def create_window_tf_dataset(dataset):
    # transforms the embeddings into a windowed dataset
    embeddings_list = dataset["embeddings"].tolist()
    embeddings_array = np.array(embeddings_list)
    embedding_tensor = tf.convert_to_tensor(embeddings_array)
    tensor_dataset = tf.data.Dataset.from_tensor_slices(embedding_tensor)
    windowed_dataset = tensor_dataset.window(WINDOW_SIZE, shift=WINDOW_SIZE, drop_remainder=True)
    windowed_dataset = windowed_dataset.flat_map(lambda window: window.batch(WINDOW_SIZE))
    return windowed_dataset
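# Note: with shift=WINDOW_SIZE and drop_remainder=True the windows do not
# overlap, so 151508 training rows yield floor(151508 / 100) = 1515 windows
# and the trailing 8 rows are discarded.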
logging.info(f"Training {training_data.shape}")
logging.info(f"Validation {validation_data.shape}")
logging.info(f"Test {test_data.shape}")
num_windows = sum(1 for _ in create_window_tf_dataset(training_data))
print(f"Number of windows created: {num_windows}") # 1515
training_data = create_window_tf_dataset(training_data).map(lambda window: (window, window)) # training data is normal
validation_data = create_window_tf_dataset(validation_data).map(lambda window: (window, window)) # validation data is also all normal
test_data_normal = create_window_tf_dataset(test_data_normal).map(lambda window: (window, window)) # normal test data
test_data_abnormal = create_window_tf_dataset(test_data_abnormal).map(lambda window: (window, window)) # abnormal test data
# group normal and abnormal data
test_data = test_data_normal.concatenate(test_data_abnormal)
training_data = training_data.batch(BATCH_SIZE)
validation_data = validation_data.batch(BATCH_SIZE)
test_data = test_data.batch(BATCH_SIZE)
for x, y in training_data.take(1):
    print("Input shape:", x.shape) # (32, 100, 384)
    print("Label shape:", y.shape) # (32, 100, 384)
train_batches = sum(1 for _ in training_data)
print(f"Counted training batches: {train_batches}") # 48
# Build the Autoencoder model
model = models.Sequential([
Input(shape=(WINDOW_SIZE, embedding_length)),
layers.LSTM(64, return_sequences=True),
layers.LSTM(32, return_sequences=False),
layers.RepeatVector(WINDOW_SIZE),
layers.LSTM(32, return_sequences=True),
layers.LSTM(64, return_sequences=True),
layers.TimeDistributed(layers.Dense(embedding_length))
])
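# The encoder compresses each (100, 384) window into a 32-dim latent vector
# (the second LSTM returns only its last output); RepeatVector copies that
# vector 100 times so the decoder LSTMs can unroll it back into a sequence,
# and the TimeDistributed Dense maps each step back to 384 dimensions.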
# Compile the model
model.compile(optimizer='adam', loss='mse')
# Train the model
model.fit(training_data, epochs=50, validation_data=validation_data) # error occurs here
# [... evaluation ...]
Additional information
My TensorFlow version is 2.17.0.