Agentic RAG stack (2/5) - Augment retrieval results by reranking with Sentence Transformers

Community article published February 5, 2025

davidberenstein1957

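This is part 2 of a blog series on agentic RAG, which is part of the AI-blueprint! Read part 1 or continue with part 3.

A blueprint for AI development, focusing on applied examples of RAG, information extraction, and more in the age of LLMs and agents. It is a practical approach that takes some of the more theoretical learnings from the smol-course and applies them to end-to-end, real-world examples.

🚀 Web apps and microservices included!

Each notebook shows how to deploy your AI as a web app on Hugging Face Spaces with Gradio, which you can then use directly as a microservice through the Gradio Python Client. All the code and demos can be used in a private or public setting. Deployed on the Hub!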

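Introduction

Retrieval gives a fast estimate of the documents most relevant to a query, which works well for a first pass over millions of documents, but we can improve this relevance by reranking the retrieved documents. We will build a reranker that can be used in a RAG pipeline together with the retrieval microservice from the notebook "Index and retrieve documents for vector search using Sentence Transformers and DuckDB". At the end, we will deploy a microservice that can be used to rerank documents based on a query.

Dependencies and imports

Let's install the necessary dependencies.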

!pip install gradio gradio-client pandas sentence-transformers -q

Now, let's import the necessary libraries.

import gradio as gr
import pandas as pd

from gradio_client import Client
from sentence_transformers import CrossEncoder

Hugging Face as a vector search backend

As a quick recap of the previous notebook, we use Hugging Face as a vector search backend and can call it as a REST API through the Gradio Python Client.

gradio_client = Client("https://ai-blueprint-rag-retrieve.hf.space/")


def similarity_search(query: str, k: int = 5) -> pd.DataFrame:
    results = gradio_client.predict(api_name="/similarity_search", query=query, k=k)
    return pd.DataFrame(data=results["data"], columns=results["headers"])


similarity_search("What is the future of AI?", k=5)

Loaded as API: https://ai-blueprint-rag-retrieve.hf.space/ ✔

	url	text	distance
0	https://www.bbc.com/news/technology-51064369	The last decade was a big one for artificial intelligence...	0.281200
1	http://www.bbc.co.uk/news/technology-25000756	Singularity: The robots are coming to steal our...	0.365842
2	http://www.bbc.com/news/technology-25000756	Singularity: The robots are coming to steal our...	0.365842
3	https://www.bbc.co.uk/news/technology-37494863	Google, Facebook, Amazon join forces on future...	0.380820
4	https://www.bbc.co.uk/news/technology-37494863	Google, Facebook, Amazon join forces on future...	0.380820
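
Rows 1 and 2, and rows 3 and 4, carry the same text, so part of this top 5 is redundant. A minimal sketch of order-preserving deduplication is shown here, using plain Python dictionaries in place of DataFrame rows. The `dedupe_by_text` helper and the shortened sample texts are illustrative assumptions, not part of the original notebook.

```python
def dedupe_by_text(rows):
    """Keep the first row for each distinct text, preserving retrieval order."""
    seen = set()
    unique = []
    for row in rows:
        if row["text"] not in seen:
            seen.add(row["text"])
            unique.append(row)
    return unique


# Illustrative rows shaped like the retrieval output (texts shortened by hand).
rows = [
    {"url": "https://www.bbc.com/news/technology-51064369", "text": "decade", "distance": 0.281200},
    {"url": "http://www.bbc.co.uk/news/technology-25000756", "text": "singularity", "distance": 0.365842},
    {"url": "http://www.bbc.com/news/technology-25000756", "text": "singularity", "distance": 0.365842},
]
unique_rows = dedupe_by_text(rows)
print(len(unique_rows))
```

With a pandas DataFrame, `drop_duplicates("text")` has the same effect, and that is what the `rerank` function in this post relies on.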

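Reranking retrieved documents

Whenever we retrieve documents from the vector search backend, we can improve the quality of the documents passed to the LLM by ranking them by their relevance to the query. We will use the sentence-transformers library for this. The MTEB leaderboard can help you find the best models for the task.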
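
As a rough sketch of the idea, reranking scores every (query, document) pair and sorts the documents by that score. The snippet keeps the scorer pluggable so the sorting logic can be checked without downloading a model. `rerank_texts`, `overlap_score` and `load_cross_encoder_scorer` are hypothetical helper names, and `cross-encoder/ms-marco-MiniLM-L-6-v2` is an assumed example of a checkpoint trained for pairwise scoring, not the model used in this post.

```python
from typing import Callable, List, Sequence, Tuple

Scorer = Callable[[List[Tuple[str, str]]], Sequence[float]]


def rerank_texts(query: str, texts: List[str], scorer: Scorer, top_k: int = 5) -> List[Tuple[str, float]]:
    """Score each (query, text) pair and return the top_k texts, best first."""
    pairs = [(query, text) for text in texts]
    scores = scorer(pairs)
    ranked = sorted(
        zip(texts, (float(s) for s in scores)),
        key=lambda item: item[1],
        reverse=True,
    )
    return ranked[:top_k]


def overlap_score(pairs):
    """Toy stand-in scorer: fraction of query words that also appear in the text."""
    scores = []
    for query, text in pairs:
        query_words = set(query.lower().split())
        text_words = set(text.lower().split())
        scores.append(len(query_words & text_words) / max(len(query_words), 1))
    return scores


def load_cross_encoder_scorer(model_name: str = "cross-encoder/ms-marco-MiniLM-L-6-v2") -> Scorer:
    """Assumption: wraps a trained cross-encoder. Imported lazily; downloads weights when called."""
    from sentence_transformers import CrossEncoder

    model = CrossEncoder(model_name)
    return lambda pairs: model.predict([list(pair) for pair in pairs])


texts = ["cooking pasta at home", "the future of AI research", "AI at work"]
top = rerank_texts("future of AI", texts, scorer=overlap_score, top_k=2)
print(top)
```

Swapping `overlap_score` for `load_cross_encoder_scorer()` would apply a real model to the same sorting logic, at the cost of a model download.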

We first retrieve 50 documents, then rerank them with sentence-transformers/all-MiniLM-L12-v2 and return the top 5.

reranker = CrossEncoder("sentence-transformers/all-MiniLM-L12-v2")


def rerank(query: str, documents: pd.DataFrame) -> pd.DataFrame:
    documents = documents.copy()
    documents = documents.drop_duplicates("text")
    documents["rank"] = reranker.predict([[query, hit] for hit in documents["text"]])
    documents = documents.sort_values(by="rank", ascending=False)
    return documents


query = "What is the future of AI?"
documents = similarity_search(query, k=50)
reranked_documents = rerank(query=query, documents=documents)
reranked_documents[:5]

Some weights of BertForSequenceClassification were not initialized from the model checkpoint at sentence-transformers/all-MiniLM-L12-v2 and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.

	url	text	distance	rank
8	http://www.bbc.com/news/world-us-canada-39425862	Vector Institute is just the latest in Canada's...	0.424994	0.508780
12	http://www.bbc.com/news/business-34266425	Google's Demis Hassabis – misuse of artificial intelligence...	0.442649	0.508423
19	http://news.bbc.co.uk/2/hi/uk_news/england/wea...	A group of scientists in north-east England...	0.484410	0.508336
21	https://www.bbc.com/news/technology-47668476	How Pope Francis could shape the future of the robot world...	0.494108	0.508200
42	http://news.bbc.co.uk/2/hi/technology/6583893.stm	Scientists have expressed concern about the use of...	0.530431	0.507771

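The order of the returned documents has changed, which shows that the reranker is being applied. Keep in mind the warning printed above, though: the classifier weights of this checkpoint were newly initialized, so its scores should be treated with caution.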
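
One way to sanity-check a reranker is to look at how far apart its scores are. The sketch below uses the five `rank` values from the table above. `score_spread` is a hypothetical helper, and the 0.01 threshold is an arbitrary assumption chosen for illustration.

```python
def score_spread(scores):
    """Return the gap between the highest and lowest score."""
    return max(scores) - min(scores)


# `rank` values copied from the reranked top 5 shown above.
top5_scores = [0.508780, 0.508423, 0.508336, 0.508200, 0.507771]
spread = score_spread(top5_scores)

# Assumed, arbitrary threshold: flag scores that barely differ.
if spread < 0.01:
    print(f"spread={spread:.6f}: scores are nearly identical")
```

A spread of roughly 0.001 gives little separation between documents. A checkpoint trained for pairwise scoring would be expected to spread them further apart, although that is worth verifying on your own data.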

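Creating a web app and microservice for reranking

We will use Gradio as the web application tool to create a demo interface for our reranker. We can develop it locally and then easily deploy it to Hugging Face Spaces. Finally, we can use the Gradio client as an SDK to interact with our reranking microservice directly.

Creating the web app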

with gr.Blocks() as demo:
    gr.Markdown("""# RAG - Augment 
                
                Applies reranking to the retrieved documents using [sentence-transformers/all-MiniLM-L12-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L12-v2)
                
                Part of [AI blueprint](https://github.com/davidberenstein1957/ai-blueprint) - a blueprint for AI development, focusing on applied examples of RAG, information extraction, analysis and fine-tuning in the age of LLMs and agents.""")

    query_input = gr.Textbox(
        label="Query", placeholder="Enter your question here...", lines=3
    )
    documents_input = gr.Dataframe(
        label="Documents", headers=["text"], wrap=True, interactive=True
    )

    submit_btn = gr.Button("Submit")
    documents_output = gr.Dataframe(
        label="Documents", headers=["text", "rank"], wrap=True
    )

    submit_btn.click(
        fn=rerank,  # the reranking function defined earlier
        inputs=[query_input, documents_input],
        outputs=[documents_output],
    )

demo.launch(share=False) # share=True is used to share the app with the public

* Running on local URL:  http://127.0.0.1:7862

To create a public link, set `share=True` in `launch()`.

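Deploying the web app to Hugging Face

We can now deploy our Gradio application to Hugging Face Spaces.

Click on the "Create Space" button.
Copy the code from the Gradio interface and paste it into an app.py file. Don't forget to copy the rerank function, along with the code that loads the reranker.
Create a requirements.txt file listing gradio-client and sentence-transformers.
If you use the Inference API, set your Hugging Face API token as the HF_TOKEN secret in the Space settings.

After waiting a couple of minutes for the application to deploy, we have a public reranking interface!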

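Using the web app as a microservice

We can now use the Gradio client as an SDK to interact with our reranking service directly. Each Gradio app has API documentation that describes the available endpoints and their parameters, which you can open from the button at the bottom of the Gradio app's Space page.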

client = Client("https://ai-blueprint-rag-augment.hf.space/")

df = similarity_search("What is the future of AI?", k=10)
data = client.predict(
    query="What is the future of AI?",
    documents={"headers": df.columns.tolist(), "data": df.values.tolist(), "metadata": None},
    api_name="/rerank",
)
pd.DataFrame(data=data["data"], columns=data["headers"])

Loaded as API: https://ai-blueprint-rag-augment.hf.space/ ✔

	url	text	distance	rank
0	https://www.bbc.co.uk/news/business-48139212	Artificial intelligence (AI) is one of the most...	0.407243	0.511831
1	http://www.bbc.com/news/technology-39657505	Ted 2017: The robot that wants to go to university...	0.424357	0.509631
2	http://www.bbc.com/news/world-us-canada-39425862	Vector Institute is just the latest in Canada's...	0.424994	0.508584
3	https://www.bbc.co.uk/news/technology-37494863	Google, Facebook, Amazon join forces on future...	0.380820	0.507728
4	https://www.bbc.com/news/technology-51064369	The last decade was a big one for artificial intelligence...	0.281200	0.506788
5	http://www.bbc.co.uk/news/technology-25000756	Singularity: The robots are coming to steal our...	0.365842	0.506259
6	https://www.bbc.com/news/technology-52415775	UK spies will need to use artificial intelligence...	0.414651	0.505149
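
The `documents` argument in the call above is a dictionary with `headers`, `data` and `metadata` keys, and the response is read back through the same `headers` and `data` keys. As a sketch of that round trip without pandas, the hypothetical helpers `rows_to_payload` and `payload_to_rows` convert between a list of row dictionaries and that layout.

```python
def rows_to_payload(rows, headers):
    """Build a {"headers", "data", "metadata"} dict from a list of row dicts."""
    return {
        "headers": list(headers),
        "data": [[row[name] for name in headers] for row in rows],
        "metadata": None,
    }


def payload_to_rows(payload):
    """Turn a payload with "headers" and "data" back into a list of row dicts."""
    return [dict(zip(payload["headers"], values)) for values in payload["data"]]


# Two rows taken from the output above, limited to the url and distance columns.
rows = [
    {"url": "https://www.bbc.co.uk/news/business-48139212", "distance": 0.407243},
    {"url": "http://www.bbc.com/news/technology-39657505", "distance": 0.424357},
]
payload = rows_to_payload(rows, headers=["url", "distance"])
restored = payload_to_rows(payload)
print(restored == rows)
```

Keeping the column order in `headers` explicit avoids depending on dictionary ordering when the payload is rebuilt on the other side.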

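Conclusion

We have seen how to create a reranker with the sentence-transformers library and how to deploy it as a microservice on Hugging Face Spaces. The next step is to create a model that generates a response to the query.

Next steps

Continue - with part 3.
Contribute - missing something? PRs are welcome.
Learn - the theory behind the approach in the Hugging Face courses or the smol-course.
Explore - notebooks using similar techniques in the Hugging Face Cookbook.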
