BM25Retriever + ChromaDB: Hybrid Search Optimization with LangChain


Post by Anonymous »

For those who have integrated the ChromaDB client into the LangChain framework, I suggest the following approach to implementing hybrid search (vector search + BM25Retriever):

Code:

import chromadb
from chromadb.config import Settings
from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings
from langchain_community.retrievers import BM25Retriever
from langchain.retrievers import EnsembleRetriever
from langchain_core.documents import Document
from langgraph.graph import START, StateGraph
from typing_extensions import TypedDict

# Example setup: a persistent Chroma client wrapped by LangChain's Chroma vector store.
# If you already have a Chroma instance, skip this part and adapt the names below.
persistent_client = chromadb.PersistentClient(path="./test", settings=Settings(allow_reset=True))
collection = persistent_client.get_or_create_collection(
    name="example",
    metadata={
        "hnsw:space": "cosine",
        # you can add other HNSW parameters if you want
    },
)

chroma = Chroma(
    client=persistent_client,
    collection_name=collection.name,
    embedding_function=OpenAIEmbeddings(model="text-embedding-3-large"),
)

def hybrid_search(query: str, k: int = 5) -> list[Document]:
    """Perform a hybrid search (vector similarity + BM25) over the collection."""
    # Get all raw documents from ChromaDB
    raw_docs = chroma.get(include=["documents", "metadatas"])
    # Convert them into Document objects (a missing metadata entry becomes an empty dict)
    documents = [
        Document(page_content=doc, metadata=meta or {})
        for doc, meta in zip(raw_docs["documents"], raw_docs["metadatas"])
    ]
    # Create a BM25Retriever from the documents
    bm25_retriever = BM25Retriever.from_documents(documents=documents, k=k)
    # Create a vector search retriever from the Chroma instance
    similarity_search_retriever = chroma.as_retriever(
        search_type="similarity",
        search_kwargs={"k": k},
    )
    # Combine both retrievers with LangChain's EnsembleRetriever
    ensemble_retriever = EnsembleRetriever(
        retrievers=[similarity_search_retriever, bm25_retriever],
        weights=[0.5, 0.5],
    )
    # The ensemble can return more than k documents (up to k per retriever), so keep the top k.
    # If needed, use ainvoke(query) instead to retrieve the documents asynchronously.
    return ensemble_retriever.invoke(query)[:k]


# Call hybrid_search() from a graph node (graph state approach)
class State(TypedDict):
    question: str
    context: list[Document]
    answer: str


# --- Define graph nodes (retrieve, generate, etc.) ---
def retrieve(state: State) -> dict:
    retrieved_docs = hybrid_search(state["question"], k=3)
    return {"context": retrieved_docs}

Note: The code above is only a fragment. It contains the retrieval component alone, which is then integrated into the wider application structure and the RAG flow.
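
To make the role of the weights=[0.5, 0.5] argument more concrete: as far as I know, EnsembleRetriever merges the two result lists with weighted Reciprocal Rank Fusion, where each document collects weight / (rank + c) from every list it appears in, with c defaulting to 60. The following is a minimal pure-Python sketch of that idea, not LangChain's actual code. The function name weighted_rrf and the document ids are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def weighted_rrf(
    rankings: Sequence[Sequence[str]],
    weights: Sequence[float],
    c: int = 60,
) -> List[str]:
    """Fuse several ranked lists of document ids with weighted Reciprocal Rank Fusion."""
    if len(rankings) != len(weights):
        raise ValueError("one weight per ranking is required")
    scores: Dict[str, float] = defaultdict(float)
    for ranking, weight in zip(rankings, weights):
        # Ranks start at 1; a higher position contributes a larger score.
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += weight / (rank + c)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical result lists from the two retrievers (ids only)
vector_hits = ["doc_a", "doc_b", "doc_c"]
bm25_hits = ["doc_b", "doc_d", "doc_a"]

fused = weighted_rrf([vector_hits, bm25_hits], weights=[0.5, 0.5])
print(fused)  # ['doc_b', 'doc_a', 'doc_d', 'doc_c']
```

Documents found by both retrievers (doc_b, doc_a) end up ahead of those found by only one. Shifting the weights, for example to [0.7, 0.3], would favor the vector search ranking over BM25.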
