llm-evaluation
Here are 62 public repositories matching this topic...
- A compilation of referenced benchmark metrics to evaluate different aspects of knowledge for Large Language Models. (Updated May 18, 2024)
- TypeScript SDK for Prompt Foundry, a tool for prompt engineering, prompt management, and prompt testing. (Updated Jun 11, 2024, TypeScript)
- Evaluate LLMs with custom functions for reasoning and RAG over datasets, using LangChain. (Updated Apr 21, 2024, Jupyter Notebook)
- Calibration Game: a game for getting better at identifying hallucinations in LLMs. (Updated Feb 4, 2024, CSS)
- A familiarization and learning project using the LangChain framework, LangSmith for tracing, OpenAI LLM models, and a Pinecone serverless vector DB, built with Jupyter Notebook and Python. (Updated Mar 29, 2024, Jupyter Notebook)
- Visualize LLM evaluations for OpenAI Assistants. (Updated Mar 27, 2024, TypeScript)
- A framework for automatically manipulating and evaluating the political ideology of LLMs with two ideology tests: Wahl-O-Mat and the Political Compass Test. (Updated May 1, 2024, Python)
- Summary evaluation tool. (Updated Jun 3, 2024, Python)
- FactScoreLite is an implementation of the FactScore metric, designed for detailed factual-accuracy assessment in text generation. The package builds on the framework of the original FactScore repository, which is no longer maintained and contains outdated functions. (Updated Apr 25, 2024, Python)
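At its core, FactScore decomposes a generation into atomic facts and reports the fraction judged supported by a knowledge source. A minimal sketch of that scoring step, assuming a caller-supplied `is_supported` judge (a hypothetical stand-in for the retrieval- or LLM-based judge a real implementation would use):

```python
def factscore(atomic_facts, is_supported):
    """Fraction of atomic facts judged supported by a knowledge source.

    atomic_facts: list of short factual claims extracted from a generation.
    is_supported: callable claim -> bool; hypothetical stand-in for the
    judge (retrieval- or LLM-based) used by real FactScore implementations.
    """
    if not atomic_facts:
        return 0.0
    supported = sum(1 for fact in atomic_facts if is_supported(fact))
    return supported / len(atomic_facts)

# Toy usage with a hard-coded "knowledge base" acting as the judge.
knowledge = {"Paris is the capital of France"}
facts = ["Paris is the capital of France", "The Moon is made of cheese"]
print(factscore(facts, knowledge.__contains__))  # 0.5
```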
- EnsembleX uses the Knapsack algorithm to optimize Large Language Model (LLM) ensembles for quality-cost trade-offs, offering tailored suggestions across various domains through a Streamlit dashboard visualization. (Updated May 5, 2024, Python)
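Framing ensemble selection as a knapsack problem means picking the subset of models that maximizes total quality under a cost budget. A minimal 0/1-knapsack sketch under assumed `(name, cost, quality)` tuples; this is an illustrative schema, not EnsembleX's actual API:

```python
def select_models(models, budget):
    """0/1 knapsack: choose a subset of models maximizing summed quality
    subject to a total cost budget (integer cost units).

    models: list of (name, cost, quality) tuples -- hypothetical schema
    for illustration, not the tool's real interface.
    """
    # dp[c] = (best total quality, chosen names) achievable at cost <= c.
    # The shared empty list is never mutated, only replaced, so aliasing is safe.
    dp = [(0.0, [])] * (budget + 1)
    for name, cost, quality in models:
        # Iterate costs downward so each model is used at most once.
        for c in range(budget, cost - 1, -1):
            candidate = dp[c - cost][0] + quality
            if candidate > dp[c][0]:
                dp[c] = (candidate, dp[c - cost][1] + [name])
    return dp[budget]

models = [("small", 1, 0.6), ("medium", 2, 0.75), ("large", 4, 0.9)]
print(select_models(models, 5))  # best-quality subset within cost 5
```

With a budget of 5 the best subset is "small" plus "large" (cost 5, quality 1.5), beating "small" plus "medium" (cost 3, quality 1.35).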
- LLM evaluation. (Updated May 16, 2024, Jupyter Notebook)
- Exploring the depths of LLMs 🚀 (Updated Dec 7, 2023, Jupyter Notebook)
- [Personalize@EACL 2024] LLM Agents in Interaction: Measuring Personality Consistency and Linguistic Alignment in Interacting Populations of Large Language Models. (Updated Apr 8, 2024, Python)
- Upload, score, and visually compare multiple LLM-graded summaries simultaneously! (Updated Mar 8, 2024, JavaScript)