Building a RAG System with Gemma, Elasticsearch and Hugging Face Models

Authored by: lloydmeta

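This notebook walks you through building a Retrieval-Augmented Generation (RAG) system powered by Elasticsearch (ES) and Hugging Face models, letting you toggle between ES-vectorising (your ES cluster vectorises for you when ingesting and querying) and self-vectorising (you vectorise all your data before sending it to ES).

What should you use for your use case? It depends 🤷‍♂️. ES-vectorising means your clients don't have to implement it, so that's the default here; however, if you don't have any ML nodes, or your own embedding setup is better/faster, feel free to set USE_ELASTICSEARCH_VECTORISATION to False in the "Choose data and query vectorisation options" section below!

This notebook has been tested with ES 8.13.x and 8.14.x.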

Step 1: Installing Libraries

!pip install elasticsearch sentence_transformers transformers eland==8.12.1 # accelerate # uncomment if using GPU
!pip install datasets==2.19.2 # Remove version lock if https://github.com/huggingface/datasets/pull/6978 has been released

Step 2: Setup

Hugging Face

This allows you to authenticate with Hugging Face to download models and datasets.

from huggingface_hub import notebook_login

notebook_login()

Elasticsearch deployment

Let's make sure that you can access your Elasticsearch deployment. If you don't have one, create one at Elastic Cloud.

Ensure you have CLOUD_ID and ELASTIC_DEPL_API_KEY saved as Colab secrets.

[Image: how to set up secrets using Google Colab]

from google.colab import userdata

# https://elastic.ac.cn/search-labs/tutorials/install-elasticsearch/elastic-cloud#finding-your-cloud-id
CLOUD_ID = userdata.get("CLOUD_ID")  # or "<YOUR CLOUD_ID>"

# https://elastic.ac.cn/search-labs/tutorials/install-elasticsearch/elastic-cloud#creating-an-api-key
ELASTIC_API_KEY = userdata.get("ELASTIC_DEPL_API_KEY")  # or "<YOUR API KEY>"
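
Outside Colab, google.colab.userdata is not available. The following is a minimal sketch, assuming you export the same two names as environment variables instead. load_credentials is a hypothetical helper written for this illustration, not part of any library.

```python
import os


def load_credentials(env=None):
    """Return (cloud_id, api_key) read from environment variables.

    Hypothetical helper: assumes CLOUD_ID and ELASTIC_DEPL_API_KEY are
    exported in the shell, mirroring the Colab secret names used above.
    """
    env = os.environ if env is None else env
    names = ("CLOUD_ID", "ELASTIC_DEPL_API_KEY")
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return env["CLOUD_ID"], env["ELASTIC_DEPL_API_KEY"]
```

Failing early with a clear message is usually easier to debug than an authentication error from the client later on.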

Set up the client and make sure the credentials work.

from elasticsearch import Elasticsearch, helpers

# Create the client instance
client = Elasticsearch(cloud_id=CLOUD_ID, api_key=ELASTIC_API_KEY)

# Successful response!
client.info()

Step 3: Data sourcing and preparation

The data utilised in this tutorial is sourced from Hugging Face datasets, specifically the MongoDB/embedded_movies dataset.

# Load Dataset
from datasets import load_dataset

# https://huggingface.co/datasets/MongoDB/embedded_movies
dataset = load_dataset("MongoDB/embedded_movies")

dataset

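The operations in the following code snippet focus on enforcing data integrity and quality.

The first step ensures that each data point's fullplot attribute is not empty, as this is the primary data we use in the embedding process.
The second step removes the plot_embedding attribute from all data points, as it will be replaced by new embeddings created with a different embedding model (thenlper/gte-small by default, chosen in Step 4).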

# Data Preparation

# Remove data points where the fullplot column is missing
dataset = dataset.filter(lambda x: x["fullplot"] is not None)

if "plot_embedding" in sum(dataset.column_names.values(), []):
    # Remove the plot_embedding from each data point in the dataset as we are going to create new embeddings with an open source embedding model from Hugging Face
    dataset = dataset.remove_columns("plot_embedding")

dataset["train"]
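
To see what the two preparation steps do without downloading the dataset, here is a small sketch on plain Python dictionaries. The two records are made up for illustration; the real dataset has many more columns.

```python
# Stand-in records (invented for this sketch, not taken from the dataset).
records = [
    {"title": "A", "fullplot": "Some plot.", "plot_embedding": [0.1, 0.2]},
    {"title": "B", "fullplot": None, "plot_embedding": [0.3, 0.4]},
]

# Step 1: keep only records that have a fullplot.
kept = [r for r in records if r["fullplot"] is not None]

# Step 2: drop the old plot_embedding column from every remaining record.
cleaned = [{k: v for k, v in r.items() if k != "plot_embedding"} for r in kept]

print(cleaned)  # [{'title': 'A', 'fullplot': 'Some plot.'}]
```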

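Step 4: Load Elasticsearch with vectorised data

Choose data and query vectorisation options

Here, you need to make a decision: do you want Elasticsearch to vectorise your data and queries, or do you want to do it yourself?

Setting USE_ELASTICSEARCH_VECTORISATION to True will make the rest of this notebook set up and use ES-hosted vectorisation for your data and your querying, but note that this requires your ES deployment to have at least 1 ML node (I would recommend setting autoscaling to true on your Cloud deployment in case the model you choose is too big).

If USE_ELASTICSEARCH_VECTORISATION is False, this notebook will set up and use the provided model "locally" for data and query vectorisation.

Here, I've picked the thenlper/gte-small model for no reason other than that it was used in another cookbook and it worked well enough for me. Please feel free to try others if you'd like; the only important thing is that you update EMBEDDING_DIMENSIONS according to the model.

Note: if you change these values, you'll likely need to re-run the notebook from this step.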

USE_ELASTICSEARCH_VECTORISATION = True

EMBEDDING_MODEL_ID = "thenlper/gte-small"
# https://huggingface.co/thenlper/gte-small's page shows the dimensions of the model
# If you use the `gte-base` or `gte-large` embedding models, EMBEDDING_DIMENSIONS
# must be set to 768 or 1024, respectively.
EMBEDDING_DIMENSIONS = 384
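
A mismatch between the model and EMBEDDING_DIMENSIONS only shows up later, when documents are indexed. As a rough sketch, a small lookup table can catch it earlier. KNOWN_DIMENSIONS and check_dimensions are hypothetical names introduced here; the sizes are the ones quoted in the comment above.

```python
# Output sizes for the gte family, as quoted in the comment above.
KNOWN_DIMENSIONS = {
    "thenlper/gte-small": 384,
    "thenlper/gte-base": 768,
    "thenlper/gte-large": 1024,
}


def check_dimensions(model_id, dimensions):
    """Raise if dimensions disagree with the table; unknown models pass through."""
    expected = KNOWN_DIMENSIONS.get(model_id)
    if expected is not None and expected != dimensions:
        raise ValueError(
            f"{model_id} produces {expected}-dimensional vectors, not {dimensions}"
        )
    return dimensions


check_dimensions("thenlper/gte-small", 384)  # passes silently
```

For a model that is not in the table, the model card remains the place to look up the right value.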

Load Hugging Face model into Elasticsearch if needed

This step loads and deploys the Hugging Face model into Elasticsearch using Eland if USE_ELASTICSEARCH_VECTORISATION is True. This allows Elasticsearch to vectorise your queries and data in later steps.

import locale
locale.getpreferredencoding = lambda: "UTF-8"
!(if [ "True" == $USE_ELASTICSEARCH_VECTORISATION ]; then \
  eland_import_hub_model --cloud-id $CLOUD_ID --hub-model-id $EMBEDDING_MODEL_ID --task-type text_embedding --es-api-key $ELASTIC_API_KEY --start --clear-previous; \
fi)

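This step adds functions for creating text embeddings locally and enriches the dataset with embeddings, so that the data can be ingested into Elasticsearch as vectors. It does not run if USE_ELASTICSEARCH_VECTORISATION is True.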

from sentence_transformers import SentenceTransformer

if not USE_ELASTICSEARCH_VECTORISATION:
    embedding_model = SentenceTransformer(EMBEDDING_MODEL_ID)


def get_embedding(text: str) -> list[float]:
    if USE_ELASTICSEARCH_VECTORISATION:
        raise Exception(f"Disabled when USE_ELASTICSEARCH_VECTORISATION is [{USE_ELASTICSEARCH_VECTORISATION}]")
    else:
        if not text.strip():
            print("Attempted to get embedding for empty text.")
            return []

        embedding = embedding_model.encode(text)
        return embedding.tolist()


def add_fullplot_embedding(x):
    if USE_ELASTICSEARCH_VECTORISATION:
        raise Exception(f"Disabled when USE_ELASTICSEARCH_VECTORISATION is [{USE_ELASTICSEARCH_VECTORISATION}]")
    else:
        full_plots = x["fullplot"]
        return {"embedding": [get_embedding(full_plot) for full_plot in full_plots]}


if not USE_ELASTICSEARCH_VECTORISATION:
    dataset = dataset.map(add_fullplot_embedding, batched=True)
    dataset["train"]
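
With batched=True, the function passed to map receives a dictionary mapping each column name to a list of values, and returns a dictionary of new columns with one entry per row. The sketch below shows that shape with a stand-in embedder. fake_embedding is invented for this illustration and does not produce a meaningful embedding.

```python
def fake_embedding(text):
    """Stand-in for embedding_model.encode: NOT a real embedding."""
    return [float(len(text)), float(text.count(" "))]


def add_embedding(batch):
    # batch maps each column name to a list of values, one per row.
    return {"embedding": [fake_embedding(t) for t in batch["fullplot"]]}


batch = {"title": ["A", "B"], "fullplot": ["a short plot", "another plot line"]}
new_columns = add_embedding(batch)

# One embedding per input row, in the same order.
print(len(new_columns["embedding"]))  # 2
```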

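Step 5: Create a Search Index with vector search mappings

At this point, we create an index in Elasticsearch with the right index mappings to handle vector searches.

You can read more about Elasticsearch vector capabilities in the Elasticsearch documentation.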

>>> # Needs to match the id returned from Eland
>>> # in general for Hugging Face models, you just replace the forward slash with
>>> # double underscore
>>> model_id = EMBEDDING_MODEL_ID.replace("/", "__")

>>> index_name = "movies"

>>> index_mapping = {
...     "properties": {
...         "fullplot": {"type": "text"},
...         "plot": {"type": "text"},
...         "title": {"type": "text"},
...     }
... }
>>> # define index mapping
>>> if USE_ELASTICSEARCH_VECTORISATION:
...     index_mapping["properties"]["embedding"] = {
...         "properties": {
...             "is_truncated": {"type": "boolean"},
...             "model_id": {
...                 "type": "text",
...                 "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
...             },
...             "predicted_value": {
...                 "type": "dense_vector",
...                 "dims": EMBEDDING_DIMENSIONS,
...                 "index": True,
...                 "similarity": "cosine",
...             },
...         }
...     }
>>> else:
...     index_mapping["properties"]["embedding"] = {
...         "type": "dense_vector",
...         "dims": EMBEDDING_DIMENSIONS,
...         "index": True,
...         "similarity": "cosine",
...     }

>>> # flag to check if index has to be deleted before creating
>>> should_delete_index = True

>>> # check if we want to delete index before creating the index
>>> if should_delete_index:
...     if client.indices.exists(index=index_name):
...         print("Deleting existing %s" % index_name)
...         client.options(ignore_status=[400, 404]).indices.delete(index=index_name)

>>> print("Creating index %s" % index_name)


>>> # ingest pipeline definition
>>> if USE_ELASTICSEARCH_VECTORISATION:
...     pipeline_id = "vectorize_fullplots"

...     client.ingest.put_pipeline(
...         id=pipeline_id,
...         processors=[
...             {
...                 "inference": {
...                     "model_id": model_id,
...                     "target_field": "embedding",
...                     "field_map": {"fullplot": "text_field"},
...                 }
...             }
...         ],
...     )

...     index_settings = {
...         "index": {
...             "default_pipeline": pipeline_id,
...         }
...     }
>>> else:
...     index_settings = {}

>>> client.options(ignore_status=[400, 404]).indices.create(
...     index=index_name, mappings=index_mapping, settings=index_settings
... )

Creating index movies
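
The two modes differ mainly in where the vector lives inside each document. The sketch below restates the mapping logic as a pure function so the difference is easy to inspect. build_index_mapping and vector_field_path are hypothetical helpers, and the ES-vectorised branch is trimmed to the vector field only (the is_truncated and model_id sub-fields from the cell above are left out for brevity).

```python
def build_index_mapping(use_es_vectorisation, dims):
    """Return a mappings dict for either vectorisation mode (trimmed sketch)."""
    vector_field = {
        "type": "dense_vector",
        "dims": dims,
        "index": True,
        "similarity": "cosine",
    }
    properties = {
        "fullplot": {"type": "text"},
        "plot": {"type": "text"},
        "title": {"type": "text"},
    }
    if use_es_vectorisation:
        # The ingest pipeline writes an object under "embedding";
        # the vector itself sits in its predicted_value sub-field.
        properties["embedding"] = {"properties": {"predicted_value": vector_field}}
    else:
        # Locally computed vectors are stored directly in "embedding".
        properties["embedding"] = vector_field
    return {"properties": properties}


def vector_field_path(use_es_vectorisation):
    """Field name to search against for each mode."""
    return "embedding.predicted_value" if use_es_vectorisation else "embedding"
```

Whichever mode is used, the field path queried at search time has to match the path that holds the vector in the mapping.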

Ingesting data into Elasticsearch is best done in batches. Luckily, helpers offers an easy way to do this.

>>> from elasticsearch.helpers import BulkIndexError


>>> def batch_to_bulk_actions(batch):
...     for record in batch:
...         action = {
...             "_index": "movies",
...             "_source": {
...                 "title": record["title"],
...                 "fullplot": record["fullplot"],
...                 "plot": record["plot"],
...             },
...         }
...         if not USE_ELASTICSEARCH_VECTORISATION:
...             action["_source"]["embedding"] = record["embedding"]
...         yield action


>>> def bulk_index(ds):
...     start = 0
...     end = len(ds)
...     batch_size = 100
...     if USE_ELASTICSEARCH_VECTORISATION:
...         # If using auto-embedding, bulk requests can take a lot longer,
...         # so pass a longer request_timeout here (defaults to 10s), otherwise
...         # we could get Connection timeouts
...         batch_client = client.options(request_timeout=600)
...     else:
...         batch_client = client
...     for batch_start in range(start, end, batch_size):
...         batch_end = min(batch_start + batch_size, end)
...         print(f"batch: start [{batch_start}], end [{batch_end}]")
...         batch = ds.select(range(batch_start, batch_end))
...         actions = batch_to_bulk_actions(batch)
...         helpers.bulk(batch_client, actions)


>>> try:
...     bulk_index(dataset["train"])
... except BulkIndexError as e:
...     print(f"{e.errors}")

>>> print("Data ingestion into Elasticsearch complete!")

batch: start [0], end [100]
batch: start [100], end [200]
batch: start [200], end [300]
batch: start [300], end [400]
batch: start [400], end [500]
batch: start [500], end [600]
batch: start [600], end [700]
batch: start [700], end [800]
batch: start [800], end [900]
batch: start [900], end [1000]
batch: start [1000], end [1100]
batch: start [1100], end [1200]
batch: start [1200], end [1300]
batch: start [1300], end [1400]
batch: start [1400], end [1452]
Data ingestion into Elasticsearch complete!
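
The batch boundaries in the log above come from stepping through the dataset in fixed-size slices, with the last slice cut short. As a sketch, the same arithmetic in isolation (batch_ranges is a hypothetical helper; 1452 and 100 are the values seen in the output above):

```python
def batch_ranges(total, batch_size):
    """Yield (start, end) index pairs covering range(total) in order."""
    for start in range(0, total, batch_size):
        yield start, min(start + batch_size, total)


ranges = list(batch_ranges(1452, 100))
print(len(ranges), ranges[0], ranges[-1])  # 15 (0, 100) (1400, 1452)
```

Smaller batches keep each bulk request short, which matters most when Elasticsearch runs inference on every document during ingestion.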

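Step 6: Perform Vector Search on User Queries

The following step implements a function that returns a vector search result.

If USE_ELASTICSEARCH_VECTORISATION is True, the text query is sent directly to ES, where the uploaded model is first used to vectorise it before the vector search runs. If USE_ELASTICSEARCH_VECTORISATION is False, we vectorise the query locally and then send its vector form in the search request.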

def vector_search(plot_query):
    if USE_ELASTICSEARCH_VECTORISATION:
        knn = {
            "field": "embedding.predicted_value",
            "k": 10,
            "query_vector_builder": {
                "text_embedding": {
                    "model_id": model_id,
                    "model_text": plot_query,
                }
            },
            "num_candidates": 150,
        }
    else:
        question_embedding = get_embedding(plot_query)
        knn = {
            "field": "embedding",
            "query_vector": question_embedding,
            "k": 10,
            "num_candidates": 150,
        }

    response = client.search(index="movies", knn=knn, size=5)
    results = []
    for hit in response["hits"]["hits"]:
        id = hit["_id"]
        score = hit["_score"]
        title = hit["_source"]["title"]
        plot = hit["_source"]["plot"]
        fullplot = hit["_source"]["fullplot"]
        result = {
            "id": id,
            "_score": score,
            "title": title,
            "plot": plot,
            "fullplot": fullplot,
        }
        results.append(result)
    return results


def pretty_search(query):

    get_knowledge = vector_search(query)

    search_result = ""
    for result in get_knowledge:
        search_result += f"Title: {result.get('title', 'N/A')}, Plot: {result.get('fullplot', 'N/A')}\n"

    return search_result
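
Conceptually, the kNN search ranks documents by how close their vectors are to the query vector under the similarity chosen in the mapping (cosine here). Elasticsearch does this approximately and at scale; the sketch below is an exact, brute-force version on toy data, meant only to illustrate the ranking idea. The two-dimensional vectors and titles are invented.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def brute_force_knn(query_vector, docs, k):
    """Return the k (score, title) pairs most similar to query_vector, best first."""
    scored = [(cosine(query_vector, d["embedding"]), d["title"]) for d in docs]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Toy documents with made-up 2-d "embeddings".
docs = [
    {"title": "romance", "embedding": [1.0, 0.0]},
    {"title": "war", "embedding": [0.0, 1.0]},
    {"title": "romantic comedy", "embedding": [0.9, 0.1]},
]
top = brute_force_knn([1.0, 0.05], docs, k=2)
print([title for _, title in top])  # ['romance', 'romantic comedy']
```

Note that vector_search above asks for k=10 neighbours but passes size=5, so only the five best hits are returned to the caller.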

Step 7: Handling user queries and loading Gemma

>>> # Conduct query with retrieval of sources, combining results into something that
>>> # we can feed to Gemma
>>> def combined_query(query):
...     source_information = pretty_search(query)
...     return f"Query: {query}\nContinue to answer the query by using these Search Results:\n{source_information}."


>>> query = "What is the best romantic movie to watch and why?"
>>> combined_results = combined_query(query)

>>> print(combined_results)

Query: What is the best romantic movie to watch and why?
Continue to answer the query by using these Search Results:
Title: Shut Up and Kiss Me!, Plot: Ryan and Pete are 27-year old best friends in Miami, born on the same day and each searching for the perfect woman. Ryan is a rookie stockbroker living with his psychic Mom. Pete is a slick surfer dude yet to find commitment. Each meets the women of their dreams on the same day. Ryan knocks heads in an elevator with the gorgeous Jessica, passing out before getting her number. Pete falls for the insatiable Tiara, but Tiara's uncle is mob boss Vincent Bublione, charged with her protection. This high-energy romantic comedy asks to what extent will you go for true love?
Title: Titanic, Plot: The plot focuses on the romances of two couples upon the doomed ship's maiden voyage. Isabella Paradine (Catherine Zeta-Jones) is a wealthy woman mourning the loss of her aunt, who reignites a romance with former flame Wynn Park (Peter Gallagher). Meanwhile, a charming ne'er-do-well named Jamie Perse (Mike Doyle) steals a ticket for the ship, and falls for a sweet innocent Irish girl on board. But their romance is threatened by the villainous Simon Doonan (Tim Curry), who has discovered about the ticket and makes Jamie his unwilling accomplice, as well as having sinister plans for the girl.
Title: Dark Blue World, Plot: March 15, 1939: Germany invades Czechoslovakia. Czech and Slovak pilots flee to England, joining the RAF. After the war, back home, they are put in labor camps, suspected of anti-Communist ideas. This film cuts between a post-war camp where Franta is a prisoner and England during the war, where Franta is like a big brother to Karel, a very young pilot. On maneuvers, Karel crash lands by the rural home of Susan, an English woman whose husband is MIA. She spends one night with Karel, and he thinks he's found the love of his life. It's complicated by Susan's attraction to Franta. How will the three handle innocence, Eros, friendship, and the heat of battle? When war ends, what then?
Title: Dark Blue World, Plot: March 15, 1939: Germany invades Czechoslovakia. Czech and Slovak pilots flee to England, joining the RAF. After the war, back home, they are put in labor camps, suspected of anti-Communist ideas. This film cuts between a post-war camp where Franta is a prisoner and England during the war, where Franta is like a big brother to Karel, a very young pilot. On maneuvers, Karel crash lands by the rural home of Susan, an English woman whose husband is MIA. She spends one night with Karel, and he thinks he's found the love of his life. It's complicated by Susan's attraction to Franta. How will the three handle innocence, Eros, friendship, and the heat of battle? When war ends, what then?
Title: No Good Deed, Plot: About a police detective, Jack, who, while doing a friend a favor and searching for a runaway teenager on Turk Street, stumbles upon a bizarre band of criminals about to pull off a bank robbery. Jack finds himself being held hostage while the criminals decide what to do with him, and the leader's beautiful girlfriend, Erin, is left alone to watch Jack. Erin, who we discover is a master manipulator of the men in the gang, reveals another side to Jack - a melancholy romantic who could have been a classical cellist. She finds Jack's captivity an irresistible turn-on and he can't figure out if she's for real, or manipulating him, too. Before the gang returns, Jack and Erin's connection intensifies and who ends up with the money is anyone's guess.
.
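
The output above contains the same film twice, which spends prompt space without adding information. One possible refinement, sketched here under the assumption that a title identifies a film, is to drop repeated hits before building the prompt. dedupe_by_title and build_prompt are hypothetical helpers; note that different films sharing a title would also be collapsed by this approach.

```python
def dedupe_by_title(results):
    """Keep the first hit for each title, preserving the original order."""
    seen = set()
    unique = []
    for result in results:
        title = result.get("title")
        if title not in seen:
            seen.add(title)
            unique.append(result)
    return unique


def build_prompt(query, results):
    """Format the query and search hits in the same style as combined_query."""
    lines = [
        f"Title: {r.get('title', 'N/A')}, Plot: {r.get('fullplot', 'N/A')}"
        for r in results
    ]
    source_information = "\n".join(lines)
    return (
        f"Query: {query}\n"
        f"Continue to answer the query by using these Search Results:\n"
        f"{source_information}"
    )


# Made-up hits standing in for vector_search results.
hits = [
    {"title": "Film X", "fullplot": "Plot of X."},
    {"title": "Film X", "fullplot": "Plot of X."},
    {"title": "Film Y", "fullplot": "Plot of Y."},
]
prompt = build_prompt("What should I watch?", dedupe_by_title(hits))
```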

Load our LLM (here we use google/gemma-2b-it)

from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b-it")
# CPU (default): the line below is already active 👇🏽
model = AutoModelForCausalLM.from_pretrained("google/gemma-2b-it")
# GPU: comment out the line above and uncomment the line below 👇🏽
# model = AutoModelForCausalLM.from_pretrained("google/gemma-2b-it", device_map="auto")

Define a method that takes the formatted results from the vectorised search in ES and feeds them to the LLM to get our answer.

>>> def rag_query(query):
...     combined_information = combined_query(query)

...     # Tokenise the prompt; append .to("cuda") to move the tensors when using a GPU
...     input_ids = tokenizer(combined_information, return_tensors="pt")  # .to("cuda") # Add if using GPU
...     response = model.generate(**input_ids, max_new_tokens=700)

...     return tokenizer.decode(response[0], skip_special_tokens=True)


>>> print(rag_query("What's a romantic movie that I can watch with my wife?"))

Query: What's a romantic movie that I can watch with my wife?
Continue to answer the query by using these Search Results:
Title: King Solomon's Mines, Plot: Guide Allan Quatermain helps a young lady (Beth) find her lost husband somewhere in Africa. It's a spectacular adventure story with romance, because while they fight with wild animals and cannibals, they fall in love. Will they find the lost husband and finish the nice connection?
Title: Shut Up and Kiss Me!, Plot: Ryan and Pete are 27-year old best friends in Miami, born on the same day and each searching for the perfect woman. Ryan is a rookie stockbroker living with his psychic Mom. Pete is a slick surfer dude yet to find commitment. Each meets the women of their dreams on the same day. Ryan knocks heads in an elevator with the gorgeous Jessica, passing out before getting her number. Pete falls for the insatiable Tiara, but Tiara's uncle is mob boss Vincent Bublione, charged with her protection. This high-energy romantic comedy asks to what extent will you go for true love?
Title: Titanic, Plot: The plot focuses on the romances of two couples upon the doomed ship's maiden voyage. Isabella Paradine (Catherine Zeta-Jones) is a wealthy woman mourning the loss of her aunt, who reignites a romance with former flame Wynn Park (Peter Gallagher). Meanwhile, a charming ne'er-do-well named Jamie Perse (Mike Doyle) steals a ticket for the ship, and falls for a sweet innocent Irish girl on board. But their romance is threatened by the villainous Simon Doonan (Tim Curry), who has discovered about the ticket and makes Jamie his unwilling accomplice, as well as having sinister plans for the girl.
Title: Fortress, Plot: A futuristic prison movie. Protagonist and wife are nabbed at a future US emigration point with an illegal baby during population control. The resulting prison experience is the subject of the movie. The prison is a futuristic one run by a private corporation bent on mind control in various ways.
Title: Varalaaru, Plot: Relationships become entangled in an emotional web.
.

Which movie would you recommend for a romantic evening with your wife?

From the provided titles, the movie that would be recommended for a romantic evening with your wife is **King Solomon's Mines**. It's a romantic adventure story with romance, and it's a great choice for a date night.
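
The printed result repeats the whole prompt before the answer because, for a causal language model, the generated sequence starts with the prompt tokens followed by the new ones. A sketch of keeping only the continuation, with plain lists standing in for tensors and made-up token ids; new_tokens_only is a hypothetical helper:

```python
def new_tokens_only(output_ids, prompt_length):
    """Drop the echoed prompt from a generated token sequence."""
    return output_ids[prompt_length:]


prompt_ids = [2, 101, 102, 103]          # made-up token ids for the prompt
output_ids = prompt_ids + [201, 202, 1]  # prompt followed by the continuation
answer_ids = new_tokens_only(output_ids, len(prompt_ids))
print(answer_ids)  # [201, 202, 1]
```

In rag_query this would presumably mean slicing response[0] at the length of the tokenised prompt before calling tokenizer.decode; that mapping is an assumption and is not exercised here.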

Credits

This notebook was adapted from:

MongoDB's RAG cookbook
OpenAI's ES RAG cookbook
Elasticsearch-labs's loading-model-from-hugging-face cookbook
