![](https://crypto4nerd.com/wp-content/uploads/2024/04/0Zioo3uB-jJJL7Ktd.png)
We spend a lot of time speaking to AI engineers every week. Some of the pain points we hear most frequently include:
- “I don’t have a clear view of model performance over time”
- “I don’t know why the model is not performing well”
- “I can’t identify or prevent model quality regressions”
- “I don’t know which prompt / model is better”
We built a new Analytics dashboard to give users new ways to slice and dice metrics. Here’s what you can do with it 👇
1. Comparison reports of evaluation metrics and usage metrics
Compare latency, cost, feedback, and evaluation metrics across different prompts, models, customer IDs, topics, and date ranges. Here’s a product video:
https://athina.ai/videos/compare-view.mp4
2. Analytics Dashboard
- Percentile distributions graph for every evaluation metric
- Graph of any evaluation metric over time
- New views and filters for token usage, cost, latency, user feedback
Here’s a 30-second product demo:
https://athina.ai/videos/eval-analytics.mp4
Without measurement, it’s very difficult to improve model performance.
Athina’s new LLM Analytics platform can help you:
- Identify regressions in models (across any criteria)
- Deep dive into metrics segmented by prompt, customer, topic, or model to discover gaps in model performance.
- Explore the performance of models over time
- Existing users: Sign in to Athina and access the Analytics and Compare dashboards.
- New users: Sign up, and log inferences in 5 mins. Then access the Analytics and Compare dashboards.
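The regression identification described above boils down to comparing a metric across two windows. A minimal illustrative sketch (not the Athina API; the scores and the 0.05 tolerance are hypothetical):

```python
def has_regressed(baseline_scores, current_scores, tolerance=0.05):
    """Flag a regression when the mean score drops by more than `tolerance`."""
    baseline = sum(baseline_scores) / len(baseline_scores)
    current = sum(current_scores) / len(current_scores)
    return (baseline - current) > tolerance

# Hypothetical weekly evaluation scores for one prompt version.
last_week = [0.91, 0.88, 0.93, 0.90]
this_week = [0.78, 0.81, 0.75, 0.80]
print(has_regressed(last_week, this_week))  # prints True: mean fell ~0.12
```

The same check can be run per segment (prompt, customer, topic, model) to localize where quality dropped.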
As always, if you have any questions, feel free to reply to this email or get on a call with us!