llm-evaluation
Here are 62 public repositories matching this topic...
- A compilation of referenced benchmark metrics to evaluate different aspects of knowledge for Large Language Models. (Updated May 18, 2024)
- TypeScript SDK for Prompt Foundry, a tool for prompt engineering, prompt management, and prompt testing. (Updated Jun 11, 2024, TypeScript)
- Evaluate LLMs with custom functions for reasoning and RAG over datasets, using LangChain. (Updated Apr 21, 2024, Jupyter Notebook)
- Calibration Game: a game for getting better at identifying hallucinations in LLMs. (Updated Feb 4, 2024, CSS)
- A familiarization and learning project using the LangChain framework, LangSmith for tracing, OpenAI LLM models, and a Pinecone serverless vector DB, built with Jupyter Notebook and Python. (Updated Mar 29, 2024, Jupyter Notebook)
- Visualize LLM evaluations for OpenAI Assistants. (Updated Mar 27, 2024, TypeScript)
- A framework for automatically manipulating and evaluating the political ideology of LLMs with two ideology tests: Wahl-O-Mat and the Political Compass Test. (Updated May 1, 2024, Python)
- Summary evaluation tool. (Updated Jun 3, 2024, Python)
- FactScoreLite is an implementation of the FactScore metric, designed for detailed factual-accuracy assessment in text generation. The package builds on the framework of the original FactScore repository, which is no longer maintained and contains outdated functions. (Updated Apr 25, 2024, Python)
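At its core, FactScore decomposes a generation into atomic facts and reports the fraction judged supported by a knowledge source. A minimal sketch of that scoring step, assuming a caller-supplied `is_supported` judge (a hypothetical stand-in for the retrieval- or LLM-based judge a real implementation would use):

```python
def factscore(atomic_facts, is_supported):
    """Fraction of atomic facts judged supported by a knowledge source.

    atomic_facts: list of short factual claims extracted from a generation.
    is_supported: callable claim -> bool; hypothetical stand-in for the
    judge (retrieval- or LLM-based) used by real FactScore implementations.
    """
    if not atomic_facts:
        return 0.0
    supported = sum(1 for fact in atomic_facts if is_supported(fact))
    return supported / len(atomic_facts)

# Toy usage with a hard-coded "knowledge base" acting as the judge.
knowledge = {"Paris is the capital of France"}
facts = ["Paris is the capital of France", "The Moon is made of cheese"]
print(factscore(facts, knowledge.__contains__))  # 0.5
```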
- EnsembleX uses the Knapsack algorithm to optimize Large Language Model (LLM) ensembles for quality-cost trade-offs, offering tailored suggestions across various domains through a Streamlit dashboard visualization. (Updated May 5, 2024, Python)
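Framing ensemble selection as a knapsack problem means picking the subset of models that maximizes total quality under a cost budget. A minimal 0/1-knapsack sketch under assumed `(name, cost, quality)` tuples; this is an illustrative schema, not EnsembleX's actual API:

```python
def select_models(models, budget):
    """0/1 knapsack: choose a subset of models maximizing summed quality
    subject to a total cost budget (integer cost units).

    models: list of (name, cost, quality) tuples -- hypothetical schema
    for illustration, not the tool's real interface.
    """
    # dp[c] = (best total quality, chosen names) achievable at cost <= c.
    # The shared empty list is never mutated, only replaced, so aliasing is safe.
    dp = [(0.0, [])] * (budget + 1)
    for name, cost, quality in models:
        # Iterate costs downward so each model is used at most once.
        for c in range(budget, cost - 1, -1):
            candidate = dp[c - cost][0] + quality
            if candidate > dp[c][0]:
                dp[c] = (candidate, dp[c - cost][1] + [name])
    return dp[budget]

models = [("small", 1, 0.6), ("medium", 2, 0.75), ("large", 4, 0.9)]
print(select_models(models, 5))  # best-quality subset within cost 5
```

With a budget of 5 the best subset is "small" plus "large" (cost 5, quality 1.5), beating "small" plus "medium" (cost 3, quality 1.35).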
- LLM evaluation. (Updated May 16, 2024, Jupyter Notebook)
- Exploring the depths of LLMs 🚀 (Updated Dec 7, 2023, Jupyter Notebook)
- [Personalize@EACL 2024] LLM Agents in Interaction: Measuring Personality Consistency and Linguistic Alignment in Interacting Populations of Large Language Models. (Updated Apr 8, 2024, Python)
- Upload, score, and visually compare multiple LLM-graded summaries simultaneously! (Updated Mar 8, 2024, JavaScript)