KwaiRec: A Fully-observed Dataset for Recommender Systems.
Updated Jun 2, 2024 - Jupyter Notebook
🪢 Open source LLM engineering platform: Observability, metrics, evals, prompt management, playground, datasets. Integrates with LlamaIndex, Langchain, OpenAI SDK, LiteLLM, and more. 🍊YC W23
The production toolkit for LLMs. Observability, prompt management and evaluations.
📰 Must-read papers and blogs on LLM based Long Context Modeling 🔥
🤖 Build AI applications with confidence ✅ DSPy Visualizer ✅ Understand how your users are using your LLM-app ✅ Get a full picture of the quality performance of your LLM-app ✅ Collaborate with your stakeholders in ONE platform ✅ Iterate towards the most valuable & reliable LLM-app.
Python SDK for running evaluations on LLM generated responses
The official evaluation suite and dynamic data release for MixEval.
[ICML 2024] TrustLLM: Trustworthiness in Large Language Models
Test your prompts, models, and RAGs. Catch regressions and improve prompt quality. LLM evals for OpenAI, Azure, Anthropic, Gemini, Mistral, Llama, Bedrock, Ollama, and other local & private models with CI/CD integration.
An automatic evaluator for instruction-following language models. Human-validated, high-quality, cheap, and fast.
A task generation and model evaluation system.
Large Language Model Feedback Analysis and Optimization (LLMFAO)
Programming Language Selector based on language metadata and user-specified values.
Interpreter for a programming language with basic features
Evaluation of Named Entity Recognition Models for Russian News Texts in the Cultural Domain
Comparing naive and advanced RAG against an LLM with full-context injection on a question-answering (QA) dataset based on a narrative text. Evaluated for accuracy, latency, and cost.
AI powered Spendenraid evaluation.
[ACL 2024] CPsyCoun: A Report-based Multi-turn Dialogue Reconstruction and Evaluation Framework for Chinese Psychological Counseling