I have a dataset containing semantic vectors of length 384. I group them into windows of 100 vectors each and feed them in batches of size 32. However, I get an error when fitting the model. My feeling is that it might be because the 1515 windows are split into 48 batches of size 32, which leaves the last batch incomplete, since 1515/32 = 47.34375; but I don't know whether this should really cause a problem, as I have never run into it before.
model.fit(training_data, epochs=50, validation_data=validation_data)
Note that I explicitly do not set steps_per_epoch, which, as I understand it, should simply mean that TensorFlow runs through the entire training data in each epoch. If the incomplete last batch really is the cause, can you give an example of how to drop it? (A sketch of what I think that would look like follows the log output below.)

Log output:

2025-04-03 12:20:03 [INFO]: Training (151508, 2)
2025-04-03 12:20:03 [INFO]: Validation (26514, 2)
2025-04-03 12:20:03 [INFO]: Test (22728, 2)
I0000 00:00:1743675604.435011 6880 cuda_executor.cc:1015] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2025-04-03 12:20:04.435109: W tensorflow/core/common_runtime/gpu/gpu_device.cc:2343] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2025-04-03 12:20:04.838231: I tensorflow/core/framework/local_rendezvous.cc:404] Local rendezvous is aborting with status: OUT_OF_RANGE: End of sequence
Number of windows created: 1515
Input shape: (32, 100, 384)
Label shape: (32, 100, 384)
2025-04-03 12:20:07.074530: I tensorflow/core/framework/local_rendezvous.cc:404] Local rendezvous is aborting with status: OUT_OF_RANGE: End of sequence
Counted training batches: 48
Epoch 1/50
2025-04-03 12:20:09.798596: E tensorflow/core/util/util.cc:131] oneDNN supports DT_INT32 only on platforms with AVX-512. Falling back to the default Eigen-based implementation if present.
47/Unknown 5s 48ms/step - loss: 0.0017
2025-04-03 12:20:12.186355: I tensorflow/core/framework/local_rendezvous.cc:404] Local rendezvous is aborting with status: OUT_OF_RANGE: End of sequence
[[{{node IteratorGetNext}}]]
/home/xaver/Documents/Repos/logAnomalyDetection/venv/lib/python3.12/site-packages/keras/src/trainers/epoch_iterator.py:151: UserWarning: Your input ran out of data; interrupting training. Make sure that your dataset or generator can generate at least `steps_per_epoch * epochs` batches. You may need to use the `.repeat()` function when building your dataset.
self._interrupted_warning()
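To make the question concrete: as far as I understand it, these would be the two usual ways to "drop" or work around the incomplete batch. This is only a minimal sketch reusing the variables from my script below (training_data before batching, num_windows, BATCH_SIZE, model, validation_data); I have not verified that either variant actually fixes the error.

# Option A: discard the incomplete final batch (11 windows) outright
training_data = training_data.batch(BATCH_SIZE, drop_remainder=True)

# Option B: repeat the dataset and pin the epoch length, as the
# UserWarning above suggests
steps_per_epoch = num_windows // BATCH_SIZE  # 1515 // 32 = 47 full batches
training_data = training_data.repeat().batch(BATCH_SIZE)
model.fit(training_data, epochs=50, steps_per_epoch=steps_per_epoch,
          validation_data=validation_data)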
Code
"""
Semi-supervised log anomaly detection using sentence vector embeddings and a sliding window
"""
import logging
import pickle
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
import tensorflow as tf
from tensorflow.keras import layers, models, Input
from sklearn.metrics import (
confusion_matrix, matthews_corrcoef, precision_score, recall_score,
f1_score, roc_auc_score, average_precision_score
)
from tensorflow.python.client import device_lib
device_lib.list_local_devices()
WINDOW_SIZE = 100
BATCH_SIZE = 32
# Configure logging
logging.basicConfig(
format="%(asctime)s [%(levelname)s]: %(message)s",
level=logging.INFO,
datefmt="%Y-%m-%d %H:%M:%S"
)
# Step 1: Load and prepare data
input_pickle = "outputs/parsed_openstack_logs_with_embeddings_final.pickle"
with open(input_pickle, "rb") as handle:
    df = pickle.load(handle)
logging.info("DataFrame loaded successfully.")
# embeddings are both input and target, since the autoencoder tries to reconstruct its input from the latent space
# label contains whether an entry is anomalous (1) or normal (0).
logging.debug(f"{df.columns}")
logging.debug(f"\n{df.head()}")
# split the data so that training and validation contain no anomalies, and the test set contains an equal number of normal and abnormal entries
embedding_length = len(df["embeddings"][0])
print("Length of embedding vector:", embedding_length)
normal_data = df[df['label'] == 0]
test_data_abnormal = df[df['label'] == 1]
abnormal_normal_ratio = len(test_data_abnormal) / len(normal_data)
logging.info(f"N normal: {len(normal_data)}; N abnormal: {len(test_data_abnormal)} -- Ratio: {abnormal_normal_ratio}")
training_data, rest_data = train_test_split(normal_data, train_size=0.8, shuffle=False)
validation_data, test_data_normal = train_test_split(rest_data, test_size=0.3, shuffle=False)
test_data_normal = test_data_normal.head(len(test_data_abnormal))
test_data_abnormal = test_data_abnormal.head(len(test_data_normal))
test_data = pd.concat([test_data_normal, test_data_abnormal])
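# After the two head() calls both test subsets have the same length:
# whichever of the two is longer gets trimmed down to the other's size.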
def create_window_tf_dataset(dataset):
    # transforms the embeddings into a windowed dataset
    embeddings_list = dataset["embeddings"].tolist()
    embeddings_array = np.array(embeddings_list)
    embedding_tensor = tf.convert_to_tensor(embeddings_array)
    tensor_dataset = tf.data.Dataset.from_tensor_slices(embedding_tensor)
    windowed_dataset = tensor_dataset.window(WINDOW_SIZE, shift=WINDOW_SIZE, drop_remainder=True)
    windowed_dataset = windowed_dataset.flat_map(lambda window: window.batch(WINDOW_SIZE))
    return windowed_dataset
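# Note: with shift=WINDOW_SIZE and drop_remainder=True the windows do not
# overlap, so 151508 training rows yield floor(151508 / 100) = 1515 windows
# and the trailing 8 rows are discarded.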
logging.info(f"Training {training_data.shape}")
logging.info(f"Validation {validation_data.shape}")
logging.info(f"Test {test_data.shape}")
num_windows = sum(1 for _ in create_window_tf_dataset(training_data))
print(f"Number of windows created: {num_windows}") # 1515
training_data = create_window_tf_dataset(training_data).map(lambda window: (window, window)) # training data is normal
validation_data = create_window_tf_dataset(validation_data).map(lambda window: (window, window)) # validation data is also all normal
test_data_normal = create_window_tf_dataset(test_data_normal).map(lambda window: (window, window)) # normal test data
test_data_abnormal = create_window_tf_dataset(test_data_abnormal).map(lambda window: (window, window)) # abnormal test data
# group normal and abnormal data
test_data = test_data_normal.concatenate(test_data_abnormal)
training_data = training_data.batch(BATCH_SIZE)
validation_data = validation_data.batch(BATCH_SIZE)
test_data = test_data.batch(BATCH_SIZE)
for x, y in training_data.take(1):
    print("Input shape:", x.shape) # (32, 100, 384)
    print("Label shape:", y.shape) # (32, 100, 384)
train_batches = sum(1 for _ in training_data)
print(f"Counted training batches: {train_batches}") # 48
# Build the Autoencoder model
model = models.Sequential([
Input(shape=(WINDOW_SIZE, embedding_length)),
layers.LSTM(64, return_sequences=True),
layers.LSTM(32, return_sequences=False),
layers.RepeatVector(WINDOW_SIZE),
layers.LSTM(32, return_sequences=True),
layers.LSTM(64, return_sequences=True),
layers.TimeDistributed(layers.Dense(embedding_length))
])
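# The encoder compresses each (100, 384) window into a 32-dim latent vector
# (the second LSTM returns only its last output); RepeatVector copies that
# vector 100 times so the decoder LSTMs can unroll it back into a sequence,
# and the TimeDistributed Dense maps each step back to 384 dimensions.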
# Compile the model
model.compile(optimizer='adam', loss='mse')
# Train the model
model.fit(training_data, epochs=50, validation_data=validation_data) # error occurs here
# [... evaluation ...]
Additional information
My TensorFlow version is 2.17.0.