In [ ]:
%pip install ollama scikit-learn
Inputs¶
- User prompt
- Expected answer
- Some unrelated text
In [ ]:
_prompt = 'Why is the sky blue?'
_answer = "The sky is blue because of Rayleigh scattering."
_unrelated_prompt = "I like Ice Cream."
Calculate Embeddings¶
The following block uses the Mistral 7B model (served locally via Ollama) to calculate an embedding for each sentence.
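The mistral model must already be available locally before it can be queried. As a minimal convenience sketch (assuming your installed ollama Python client exposes pull and ResponseError, as current releases do), the following cell downloads the model on first use:
In [ ]:
import ollama

try:
    # Cheap probe to see whether the model is already available locally
    ollama.embeddings(model='mistral', prompt='ping')
except ollama.ResponseError:
    # Model not present yet; pull it first (a multi-gigabyte download, so this may take a while)
    ollama.pull('mistral')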
In [ ]:
import ollama

# Embed each sentence with the local Mistral model; each call returns a dict with an 'embedding' list
embed_prompt = ollama.embeddings(model='mistral', prompt=_prompt)
embed_ans = ollama.embeddings(model='mistral', prompt=_answer)
embed_unrelated = ollama.embeddings(model='mistral', prompt=_unrelated_prompt)
Calculate Similarity¶
Embeddings map text into a common vector space, so the similarity of two pieces of text can be measured as the similarity of their vectors. There are a variety of distance/similarity measures for n-dimensional space; for this example we will use cosine similarity, which measures the cosine of the angle between two vectors.
In [ ]:
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np
# Reshape the embeddings to 2D arrays
embed_prompt_reshaped = np.array(embed_prompt['embedding']).reshape(1, -1)
embed_ans_reshaped = np.array(embed_ans['embedding']).reshape(1, -1)
embed_unrelated_reshaped = np.array(embed_unrelated['embedding']).reshape(1, -1)
# Calculate the cosine similarity
similarity = cosine_similarity(embed_prompt_reshaped, embed_ans_reshaped)
similarity_unrelated = cosine_similarity(embed_prompt_reshaped, embed_unrelated_reshaped)
self_similarity = cosine_similarity(embed_ans_reshaped, embed_ans_reshaped)
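For intuition, cosine similarity is just the dot product of the two vectors divided by the product of their norms: cos(θ) = (a · b) / (‖a‖ ‖b‖). The following is a minimal sketch (the helper name is illustrative, not part of the notebook above) that reproduces sklearn's result with plain NumPy:
In [ ]:
import numpy as np

def manual_cosine_similarity(a, b):
    # Illustrative helper: cos(theta) = (a . b) / (||a|| * ||b||)
    a, b = np.asarray(a), np.asarray(b)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Should agree with cosine_similarity(embed_prompt_reshaped, embed_ans_reshaped) above
manual_cosine_similarity(embed_prompt['embedding'], embed_ans['embedding'])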
Results¶
Cosine similarity lies in the range [-1, 1]; the nearer to 1, the more similar the vectors are. A vector compared with itself always scores exactly 1, which the self-similarity check below confirms.
In [ ]:
print(f"Embeddings Length: {len(embed_prompt['embedding'])}")
print(f"Prompt: {_prompt}")
print(f"Cosine Self Similarity: {self_similarity}")
print(f"Embeddings Vector: {embed_prompt['embedding'][:3]} ... \
{len(embed_prompt['embedding'])} total elements.")
print(f"Answer: {_answer}")
print(f"Cosine Similarity to prompt: {similarity}")
print(f"Embeddings Vector: {embed_ans['embedding'][:3]} ... \
{len(embed_ans['embedding'])} total elements.")
print(f"Unrelated Text: {_unrelated_prompt}")
print(f"Cosine Similarity to prompt: {similarity_unrelated}")
print(f"Embeddings Vector: {embed_unrelated['embedding'][:3]} ... \
{len(embed_unrelated['embedding'])} total elements.")
Embeddings Length: 4096
Prompt: Why is the sky blue?
Cosine Self Similarity: [[1.]]
Embeddings Vector: [2.397336006164551, 3.377761125564575, 1.2996881008148193] ... 4096 total elements.
Answer: The sky is blue because of Rayleigh scattering.
Cosine Similarity to prompt: [[0.76728607]]
Embeddings Vector: [2.19769287109375, 0.10650518536567688, 3.478220224380493] ... 4096 total elements.
Unrelated Text: I like Ice Cream.
Cosine Similarity to prompt: [[0.65319078]]
Embeddings Vector: [3.750589609146118, 0.14632035791873932, -0.8303617238998413] ... 4096 total elements.
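One practical application of these scores is a rough semantic check: compare a generated answer against an expected answer and accept it only when the similarity clears a cutoff. A minimal sketch follows; the helper name and the 0.7 threshold are illustrative choices, not calibrated values, and would need tuning per model and dataset:
In [ ]:
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

def is_semantically_close(embedding_a, embedding_b, threshold=0.7):
    # Illustrative cutoff: with the scores above, the answer passes (0.767) and the unrelated text fails (0.653)
    a = np.array(embedding_a).reshape(1, -1)
    b = np.array(embedding_b).reshape(1, -1)
    return cosine_similarity(a, b)[0][0] >= threshold

print(is_semantically_close(embed_prompt['embedding'], embed_ans['embedding']))
print(is_semantically_close(embed_prompt['embedding'], embed_unrelated['embedding']))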