Severe RAM leak when running OpenVINO inference on Raspberry Pi 5 (ARM), even with only infer()

Post by **Anonymous** » 17 Jan 2026, 04:07

I am developing an object detection project that runs on a Raspberry Pi 5.

On the camera side (Picamera2 / libcamera) everything works fine: when the camera runs on its own, RAM usage is completely stable.
However, as soon as I move on to the object detection stage with YOLO, a serious problem appears. Even after removing Ultralytics YOLO completely and running inference directly with OpenVINO Runtime, the problem persists.
The key observation is this:

Even when no image preprocessing is performed
Even when no camera frames are used
Even when I repeatedly call only infer() on a static input
RAM usage grows by tens of megabytes per second
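To put a number on "tens of megabytes per second", the growth rate can be estimated from the periodic RSS readings with a least-squares slope. This is only a sketch: the function name `rss_growth_mb_per_s` and the sample values are made up for illustration and are not real measurements.

```python
from typing import Sequence, Tuple


def rss_growth_mb_per_s(samples: Sequence[Tuple[float, float]]) -> float:
    """Least-squares slope of RSS (MB) over elapsed time (s)."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two (time, rss) samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_r = sum(r for _, r in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        raise ValueError("all samples share the same timestamp")
    cov = sum((t - mean_t) * (r - mean_r) for t, r in samples)
    return cov / var_t


# Hypothetical readings: (seconds since start, RSS in MB)
example = [(0.0, 300.0), (1.0, 320.0), (2.0, 340.0), (3.0, 360.0)]
print(f"{rss_growth_mb_per_s(example):.1f} MB/s")  # prints "20.0 MB/s"
```

A slope that stays near zero after warm-up would point to one-time allocations rather than a leak; a steady positive slope is what I am seeing.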

These observations show that the problem is not caused by:

Color conversion
Image resizing
Camera buffers
NumPy allocations
OpenCV preprocessing

Instead, the problem appears to come from OpenVINO on ARM (Raspberry Pi) not releasing memory properly.
This behavior is especially reproducible with:

OpenVINO 2025.4.1
Python 3.13
Raspberry Pi 5 (ARM64)
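For anyone comparing setups, the versions above can be collected with a few lines of standard-library Python. This is a sketch: the helper name `environment_report` is made up, and the package names in the default tuple are assumptions (OpenCV, for example, may be installed under a different distribution name such as a headless or system package).

```python
import platform
import sys
from importlib import metadata


def environment_report(packages=("openvino", "numpy", "opencv-python", "psutil")):
    """Return a dict with interpreter, platform and package versions."""
    report = {
        "python": sys.version.split()[0],
        "machine": platform.machine(),  # typically "aarch64" on 64-bit Pi OS
        "platform": platform.platform(),
    }
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Not installed under this distribution name
            report[name] = "not installed"
    return report


for key, value in environment_report().items():
    print(f"{key}: {value}")
```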

On a desktop PC the same logic appears to work "fine", but the problem is probably masked by the large amount of available RAM.
To narrow the problem down, I wrote a minimal test in which:

the model is loaded once
tensors are reused
no new NumPy arrays are allocated
inference runs in a tight loop

Even in this isolation mode, RSS memory increases continuously.
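One way to check whether the growth lives in Python objects or in native code is to compare RSS against what `tracemalloc` reports. `tracemalloc` only sees allocations made through Python's allocators, so a flat traced heap combined with a rising RSS would point at native allocations. The sketch below uses stand-in step functions (`leaky_step` and `clean_step` are hypothetical) in place of the real `yolo.infer_isolation` call from the script further down.

```python
import tracemalloc


def python_heap_growth(step, iterations=1000):
    """Run step() repeatedly and return the growth in Python-traced bytes."""
    tracemalloc.start()
    try:
        step()  # warm-up, so one-time allocations are not counted
        before, _ = tracemalloc.get_traced_memory()
        for _ in range(iterations):
            step()
        after, _ = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return after - before


# Stand-ins for the real loop body (hypothetical).
_kept = []


def leaky_step():
    _kept.append(bytearray(1024))  # retains 1 KiB per call


def clean_step():
    bytearray(1024)  # released immediately


print("leaky:", python_heap_growth(leaky_step), "bytes")
print("clean:", python_heap_growth(clean_step), "bytes")
```

If the inference step behaves like `clean_step` here while RSS still climbs, the Python side is unlikely to be the source.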
At this point I am considering:

downgrading OpenVINO
downgrading Python
or abandoning OpenVINO entirely and switching to TFLite or NCNN

Before I do that, I would like to understand:

Is this a known OpenVINO memory leak on ARM?
Is it related to the Python 3.13 bindings?
Is there a recommended workaround or configuration to force memory reuse?
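One diagnostic that could be tried here, not a confirmed fix: on glibc-based systems, `malloc_trim(0)` asks the allocator to hand free heap pages back to the OS. If RSS drops sharply after calling it, the growth is more likely allocator retention or fragmentation than a true leak. The sketch below assumes a glibc system (as on Raspberry Pi OS); the helper name `try_malloc_trim` is made up, and it returns `None` where `malloc_trim` is unavailable.

```python
import ctypes
import ctypes.util


def try_malloc_trim():
    """Ask glibc to return free heap pages to the OS.

    Returns 1 if memory was released, 0 if not, and None when
    malloc_trim is not available (non-glibc platforms).
    """
    libc_name = ctypes.util.find_library("c")
    if libc_name is None:
        return None
    try:
        libc = ctypes.CDLL(libc_name)
        trim = libc.malloc_trim
    except (OSError, AttributeError):
        return None
    trim.argtypes = [ctypes.c_size_t]
    trim.restype = ctypes.c_int
    return trim(0)


print("malloc_trim result:", try_malloc_trim())
```

It could be called every few hundred frames in the loop, next to the existing `gc.collect()`. Setting the glibc environment variable `MALLOC_ARENA_MAX` to a small value before starting Python is another knob that is sometimes tried for this kind of RSS growth; whether either helps with OpenVINO on ARM is untested.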

Below is a minimal reproducible example that demonstrates the problem.
Example RAM leak code:

Code: Select all

import sys
import os
import glob
import time
import argparse
import gc
import psutil
import cv2
import numpy as np
import openvino.runtime as ov

# --- CONFIGURATION ---
MODEL_DIR = "yolo11n_openvino_model"
CONF_THRESHOLD = 0.50
INPUT_W, INPUT_H = 640, 640  # Model Input Dimensions
CAM_W, CAM_H = 640, 480      # Camera Dimensions

def get_rss_mb():
    """Return the resident set size of this process in MB."""
    process = psutil.Process(os.getpid())
    return process.memory_info().rss / 1024 / 1024

class YoloZeroAlloc:
    def __init__(self, model_dir):
        self.core = ov.Core()

        # Load Model
        xml_files = glob.glob(os.path.join(model_dir, "*.xml"))
        if not xml_files:
            raise FileNotFoundError(f"No .xml in {model_dir}")

        print(f"Loading: {xml_files[0]}")
        model = self.core.read_model(xml_files[0])

        # Force Static Shape [1, 3, 640, 640]
        print(f"Forcing Shape: [1, 3, {INPUT_H}, {INPUT_W}]")
        model.reshape([1, 3, INPUT_H, INPUT_W])

        self.compiled_model = self.core.compile_model(model, "CPU")
        self.infer_request = self.compiled_model.create_infer_request()

        # --- MEMORY POOLS (The Fix) ---
        # 1. Input Tensor (Float32, NCHW)
        self.input_tensor = self.infer_request.get_input_tensor()
        self.input_data_buffer = self.input_tensor.data

        # 2. Resize Buffer (Uint8, HWC)
        # We calculate the target size once based on aspect ratio
        scale = min(INPUT_W / CAM_W, INPUT_H / CAM_H)
        self.new_w = int(CAM_W * scale)
        self.new_h = int(CAM_H * scale)
        self.resize_buffer = np.zeros((self.new_h, self.new_w, 3), dtype=np.uint8)

        # 3. Canvas Buffer (Uint8, HWC) - Full 640x640
        self.canvas_buffer = np.full((INPUT_H, INPUT_W, 3), 114, dtype=np.uint8)

        # Calculate padding offsets once
        self.dw = (INPUT_W - self.new_w) // 2
        self.dh = (INPUT_H - self.new_h) // 2

        print("Buffers Allocated. Memory Pools Ready.")

    def preprocess_zero_alloc(self, img_rgb):
        """
        Resizes and pads WITHOUT allocating new numpy arrays.
        Uses cv2.resize(dst=...) and in-place assignments.
        """
        # 1. Resize directly into pre-allocated buffer
        # This prevents creating a new 1.2MB array
        cv2.resize(img_rgb, (self.new_w, self.new_h), dst=self.resize_buffer)

        # 2. Reset Canvas (Fill with gray 114)
        # Faster than np.full, we just assign the value
        self.canvas_buffer[:] = 114

        # 3. Copy resized image into canvas
        # Numpy handles this heavily optimized
        self.canvas_buffer[self.dh:self.dh + self.new_h, self.dw:self.dw + self.new_w] = self.resize_buffer

        # 4. Normalize and Transpose directly to Tensor
        # HWC -> CHW happens via transpose view (cheap)
        # np.divide writes result directly to OpenVINO memory (no intermediate float array)

        # Create a temporary view of the canvas for transposing
        # (Views do not allocate data memory)
        canvas_chw = self.canvas_buffer.transpose((2, 0, 1))

        # Normalize 0-255 -> 0-1 directly into input_data_buffer
        np.divide(canvas_chw, 255.0, out=self.input_data_buffer[0], casting='unsafe')

    def infer_isolation(self):
        """Run inference ONLY. No preprocessing. Just math."""
        self.infer_request.infer()
        # Retrieve result to ensure pipeline completes
        _ = self.infer_request.get_output_tensor().data[0, 0, 0]

    def infer_pipeline(self, img_rgb):
        """Run full zero-alloc pipeline."""
        self.preprocess_zero_alloc(img_rgb)
        self.infer_request.infer()
        return self.infer_request.get_output_tensor().data

# --- TEST MODES ---

def run_isolation_test():
    """
    MODE 1: Isolation
    If this leaks, the OpenVINO driver is broken.
    If this is stable, the leak was in the Python Preprocessing.
    """
    print("\n--- MODE: ISOLATION (No Preprocessing) ---")
    yolo = YoloZeroAlloc(MODEL_DIR)

    print("Starting Inference Loop on Static Data...")
    frames = 0
    start = time.time()

    while True:
        try:
            # PURE INFERENCE
            yolo.infer_isolation()

            frames += 1
            if frames % 30 == 0:
                rss = get_rss_mb()
                elapsed = time.time() - start
                fps = frames / elapsed
                print(f"ISO | T:{elapsed:.0f}s | FPS:{fps:.1f} | RAM:{rss:.1f}MB")

            # Manual GC every 100 frames just to be sure
            if frames % 100 == 0:
                gc.collect()

        except KeyboardInterrupt:
            break

def run_fixed_test():
    """
    MODE 2: Production Fix
    Uses strict buffer reuse to stop the 19MB/s leak.
    """
    print("\n--- MODE: FIXED ZERO-ALLOC PIPELINE ---")
    yolo = YoloZeroAlloc(MODEL_DIR)

    # Static dummy frame
    frame_rgb = np.zeros((CAM_H, CAM_W, 3), dtype=np.uint8)
    cv2.randu(frame_rgb, 0, 255)

    print("Starting Optimized Pipeline...")
    frames = 0
    start = time.time()

    while True:
        try:
            # Full Pipeline with Zero Alloc Preprocess
            _ = yolo.infer_pipeline(frame_rgb)

            frames += 1
            if frames % 30 == 0:
                rss = get_rss_mb()
                elapsed = time.time() - start
                fps = frames / elapsed
                print(f"FIX | T:{elapsed:.0f}s | FPS:{fps:.1f} | RAM:{rss:.1f}MB")

            if frames % 60 == 0:
                gc.collect()  # Helper sweep

        except KeyboardInterrupt:
            break

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--mode", choices=["isolation", "fixed"], required=True)
    args = parser.parse_args()

    if args.mode == "isolation":
        run_isolation_test()
    elif args.mode == "fixed":
        run_fixed_test()
