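In an era of purpose-built AI databases, how does a traditional database like MariaDB reinvent itself to stay relevant? Find out in this article.
As a solutions architect with more than two decades of experience in relational database systems, I recently started exploring MariaDB's new Vector Edition to see whether it could address some of the AI data challenges we face. At first glance it looked quite convincing, especially in how it brings AI capabilities directly into a regular database setup. Still, I wanted to test it with a simple use case to see how it performs in practice.
In this article, I share my hands-on experience and observations of MariaDB's vector features by running a simple use case. Specifically, I load sample customer reviews into MariaDB and run a fast similarity search to find related reviews.
My experiment began with setting up a Docker container using the latest MariaDB release (11.6), which includes vector functionality.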
# Pull the latest release
docker pull quay.io/mariadb-foundation/mariadb-devel:11.6-vector-preview
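# Start the container; replace the placeholder with your own root password.
# -p 3306:3306 publishes the port so a client on the host can connect via localhost.
docker run -d --name mariadb_vector -p 3306:3306 -e MYSQL_ROOT_PASSWORD=<replace_password> quay.io/mariadb-foundation/mariadb-devel:11.6-vector-preview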
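Now, create a table and load sample customer reviews, including a sentiment score and an embedding for each review. To generate the text embeddings, I use SentenceTransformer, which lets you use pretrained models. Specifically, I chose a model named paraphrase-MiniLM-L6-v2, which takes our customer reviews and maps them to a 384-dimensional space.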
import mysql.connector
import numpy as np
from sentence_transformers import SentenceTransformer
model = SentenceTransformer('paraphrase-MiniLM-L6-v2')
# I already have a database created with a name vectordb
connection = mysql.connector.connect(
    host="localhost",
    user="root",
    password="<password>",  # Replace me
    database="vectordb"
)
cursor = connection.cursor()
# Create a table to store customer reviews with sentiment score and embeddings.
# Column names match the INSERT and SELECT statements used later.
# The vector preview indexes a NOT NULL BLOB column with VECTOR INDEX.
cursor.execute("""
    CREATE TABLE IF NOT EXISTS customer_reviews (
        id INT PRIMARY KEY AUTO_INCREMENT,
        product_id INT,
        review_text TEXT,
        sentiment_score FLOAT,
        review_embedding BLOB NOT NULL,
        VECTOR INDEX (review_embedding)
    ) ENGINE=InnoDB;
""")
# Sample reviews
reviews = [
    (1, "This product exceeded my expectations. Highly recommended!", 0.9),
    (1, "Decent quality, but pricey.", 0.6),
    (2, "Terrible experience. The product does not work.", 0.1),
    (2, "Average product, ok ok", 0.5),
    (3, "Absolutely love it! Best purchase I have made this year.", 1.0)
]
# Load sample reviews into vector DB
for product_id, review_text, sentiment_score in reviews:
    # Encode the review text into a 384-dimensional float32 vector
    embedding = model.encode(review_text)
    cursor.execute(
        "INSERT INTO customer_reviews (product_id, review_text, sentiment_score, review_embedding) VALUES (%s, %s, %s, %s)",
        (product_id, review_text, sentiment_score, embedding.tobytes()))
connection.commit()
connection.close()
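The load step hands each embedding to the database as raw bytes through embedding.tobytes(). To make that step less opaque, here is a minimal standard-library sketch of the same serialization. It assumes 32-bit little-endian floats, which is what a NumPy float32 array produces on common hardware. The helper names pack_vector and unpack_vector and the placeholder vector are hypothetical and only serve the illustration.

```python
import struct

DIM = 384  # embedding size of paraphrase-MiniLM-L6-v2, as stated in the text


def pack_vector(values):
    """Serialize floats as little-endian 32-bit floats (4 bytes per dimension)."""
    return struct.pack(f"<{len(values)}f", *values)


def unpack_vector(blob):
    """Inverse of pack_vector: recover the list of floats from raw bytes."""
    return list(struct.unpack(f"<{len(blob) // 4}f", blob))


# Hypothetical stand-in for a real embedding; 0.5 is exactly representable in float32.
fake_embedding = [0.5] * DIM
blob = pack_vector(fake_embedding)

print(len(blob))  # 384 dimensions * 4 bytes each
```

Under these assumptions, each stored review embedding occupies 1,536 bytes in the BLOB column, and the round trip through unpack_vector returns the original values.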
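Now, let's use MariaDB's vector features to find similar reviews. This is more like asking, "Have other customers said something similar?" In the example below, I find the top 2 reviews most similar to the customer comment "I am super satisfied!". To do so, I use one of the vector functions available in the latest release, VEC_Distance_Euclidean.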
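# Reconnect, since the connection was closed after loading the reviews
connection = mysql.connector.connect(
    host="localhost", user="root", password="<password>", database="vectordb"
)
cursor = connection.cursor()
# Convert the target customer review into a vector
target_review_embedding = model.encode("I am super satisfied!")
# Find the top 2 similar reviews using MariaDB's VEC_Distance_Euclidean function.
# The function returns a distance, so the smallest values are the closest matches.
cursor.execute("""
    SELECT review_text, sentiment_score,
           VEC_Distance_Euclidean(review_embedding, %s) AS distance
    FROM customer_reviews
    ORDER BY distance
    LIMIT %s
""", (target_review_embedding.tobytes(), 2))
similar_reviews = cursor.fetchall()
connection.close()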
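Conceptually, the query ranks rows by Euclidean distance to the target vector and keeps the closest ones. The sketch below reproduces that ranking in plain Python so the ordering logic is easy to follow. The three-dimensional toy vectors and the helper names euclidean_distance and top_k are hypothetical, and this is a brute-force scan for illustration, not a description of how MariaDB's vector index works internally.

```python
import math


def euclidean_distance(a, b):
    """Straight-line (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k(rows, target, k):
    """Return the k (text, distance) pairs closest to target, nearest first."""
    scored = [(text, euclidean_distance(vec, target)) for text, vec in rows]
    return sorted(scored, key=lambda pair: pair[1])[:k]


# Hypothetical toy "embeddings"; real ones would have 384 dimensions.
rows = [
    ("love it", [1.0, 0.0, 0.0]),
    ("terrible", [-1.0, 0.0, 0.0]),
    ("pretty good", [0.0, 1.0, 0.0]),
]
target = [0.9, 0.1, 0.0]

nearest = top_k(rows, target, 2)
for text, distance in nearest:
    print(f"{text}: {distance:.3f}")
```

Because the result is a distance rather than a similarity score, sorting in ascending order puts the best matches first, which is why the SQL query orders by the computed value without DESC.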
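Overall, I am impressed! MariaDB's Vector Edition will simplify certain AI-driven architectures. It bridges the gap between the traditional database world and the evolving demands of AI tools. Over the coming months, I look forward to seeing how this technology matures and how the community adopts it in real-world applications.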