Query always provides answers regardless of whether document context is found. #1430

Open
Freffles opened this issue May 17, 2024 · 5 comments

Comments

@Freffles commented May 17, 2024

Using the Ollama LLM (phi3) with chat mode set to "Query". Query mode should provide answers only if document context is found.

Using ChromaDB with Ollama embeddings (nomic-embed-text)

When I ask a question that is obviously outside the document context (e.g. "What is a prime number?"), rather than getting the specified query-mode refusal response, I get an answer to my question with citations that point to document context that is clearly incorrect.

So, these are two issues: 1. I should be getting a query-mode refusal, and 2. the citations given have no relationship to the answer provided.

(screenshot attached)

FWIW, I get exactly the same behavior using the System Default LLM with Ollama embeddings.

@timothycarambat
Member

You should likely modify the Document Similarity Threshold, since that is probably why you are getting results for a search even though the prompt is irrelevant. What score is reported for those chunks when you view the citations? The score is shown below each chunk.
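
For illustration, the threshold acts roughly like the filter in the sketch below. This is a minimal sketch, not AnythingLLM's actual code; the names (`Chunk`, `searchVectorDb`, `answerWithContext`) and the numeric values are assumptions made for the example.

```ts
interface Chunk {
  text: string;
  source: string;
  score: number; // similarity score from the vector DB, assumed to be 0..1
}

const REFUSAL =
  "There is no relevant information in this workspace to answer your query.";

// Hypothetical signature for a vector DB search; not the real API.
type VectorSearch = (query: string, topK: number) => Promise<Chunk[]>;

async function queryWorkspace(
  prompt: string,
  searchVectorDb: VectorSearch,
  similarityThreshold = 0.25 // hypothetical numeric equivalent of a preset
): Promise<string> {
  const candidates = await searchVectorDb(prompt, 10);

  // Drop chunks whose score falls below the threshold. With a very low
  // threshold, loosely related chunks slip through and the LLM happily
  // answers off-topic questions "from" them, citing those chunks.
  const relevant = candidates.filter((c) => c.score >= similarityThreshold);

  if (relevant.length === 0) {
    // Query mode: refuse instead of answering from the model's own knowledge.
    return REFUSAL;
  }

  const context = relevant.map((c) => c.text).join("\n\n");
  return answerWithContext(prompt, context);
}

// Placeholder for the LLM call so the sketch is self-contained.
async function answerWithContext(prompt: string, context: string): Promise<string> {
  return `Answer to "${prompt}" grounded in:\n${context}`;
}
```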

@Freffles
Author

I've deleted all of that and am going back to basics. I'm just waiting for llama2 to download, and then I will give it a try again using the "System Default" LLM with AnythingLLM embeddings and LanceDB. I have Max Context set to 10 and Document Similarity Threshold set to High for this run. I will post an update when I have run it with this setup.

@Freffles
Author

I did get it to work properly with all the defaults (LLM, Embeddings and VectorDB) but I had to reload my data (from the git repo) because deleting the vectors wasn't enough.

If I may make a suggestion, it would be nice to be able to enter a similarity threshold in numeric form as well as have the low, medium, and high options (something along the lines of the sketch below).
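
A rough sketch of what that could look like, keeping the presets but also accepting a raw number. The preset values (0.25 / 0.50 / 0.75) are invented for illustration and are not AnythingLLM's actual mappings.

```ts
type ThresholdSetting = "low" | "medium" | "high" | number;

function resolveThreshold(setting: ThresholdSetting): number {
  if (typeof setting === "number") {
    // Custom numeric threshold, clamped to a sensible 0..1 range.
    return Math.min(Math.max(setting, 0), 1);
  }
  const presets = { low: 0.25, medium: 0.5, high: 0.75 }; // assumed values
  return presets[setting];
}

// resolveThreshold("high") -> 0.75; resolveThreshold(0.62) -> 0.62
```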

@timothycarambat
Member

@Freffles I agree, there should be some custom option. The reason we have those "pre-defined" stops is that they prevent people from playing with it too much, since it's a very black-box kind of toggle. It can be useful, but I was initially worried about people accidentally foot-gunning themselves with bad configs.

@Freffles
Author

I should have added this bit before. Regarding the score, I found that things that should have been a direct hit were returning scores of less than 60. I could not get anything returned when I set the threshold to High. Maybe I need to play around with the chunk size.
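
For context, a score like those sub-60 values is commonly the cosine similarity between the query embedding and a chunk embedding, shown as a percentage. A minimal, generic sketch of that computation (not AnythingLLM's implementation, and the vectors and cutoff are made up):

```ts
// Cosine similarity between two embedding vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Toy vectors purely for demonstration.
const queryEmbedding = [0.12, 0.88, 0.47];
const chunkEmbedding = [0.3, 0.6, 0.74];
const score = Math.round(cosineSimilarity(queryEmbedding, chunkEmbedding) * 100);
console.log(score); // a chunk scoring 58 would fall below a hypothetical High cutoff of 75
```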
