When I try to load them in Google Colab, the process is extremely slow (only ~2 files per second).
Here is my code:
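import glob

import torch
from tqdm import tqdm

# Glob pattern for the per-entry embedding files on the mounted Google Drive
folder = "drive/MyDrive/train_embeddings_35M/*.pt"
id_to_emb = {}

files = glob.glob(folder)
print("Total files:", len(files))

for file in tqdm(files, desc="Loading embeddings"):
    # Each file is read individually from Drive and mapped onto the GPU
    data = torch.load(file, map_location="cuda")
    raw_id = data["entry_id"]
    formatted_id = raw_id.split("|")[1]  # second field of the "|"-separated ID
    # Take the first layer key stored under "mean_representations"
    layer = list(data["mean_representations"].keys())[0]
    emb = data["mean_representations"][layer]
    id_to_emb[formatted_id] = emb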
Output:
Total files: 46181
Loading embeddings: 3%|▎ | 1380/46181 [07:50
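
To put the progress line above in perspective, here is a rough back-of-the-envelope estimate. It is only a sketch: the helper name `estimate_remaining` is made up for illustration, and it assumes the loading rate stays constant for the rest of the run.

```python
def estimate_remaining(done: int, total: int, elapsed_s: float) -> tuple[float, float]:
    """Return (files per second, estimated seconds remaining).

    Assumes the observed rate holds for the remaining files.
    """
    rate = done / elapsed_s
    remaining_s = (total - done) / rate
    return rate, remaining_s


# Numbers taken from the tqdm line: 1380 of 46181 files after 07:50 (470 s)
rate, remaining_s = estimate_remaining(done=1380, total=46181, elapsed_s=7 * 60 + 50)
print(f"{rate:.1f} files/s, about {remaining_s / 3600:.1f} h remaining")
```

At that pace the rate works out to a bit under 3 files per second, and the remaining files would take a little over 4 hours.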