AI Dynamics

Global AI News Aggregator

Open LLM Leaderboard benchmarks collection launched

By @sambanovaai – 12 September 2023 23h05

If you find this work interesting, please join our Discord channel to ask questions and discuss at https://discord.gg/8z2Pe7cpRv. Also, thank you to @Thom_Wolf for help setting up the leaderboard, which is listed in The Big Benchmarks Collection: https://huggingface.co/collections/open-llm-leaderboard/the-big-benchmarks-collection-64faca6335a7fc7d4ffe974a… (10/10)

→ View original post on X: @sambanovaai, 12 September 2023

Open-Source LLMs Become Effective Tool Manipulators

By @sambanovaai – 12 September 2023 23h05

Find the paper on @arxiv at https://arxiv.org/abs/2305.16504 and more details on our blog at https://sambanova.ai/blog/enabling-open-source-llms-to-become-effective-tool-manipulators/… (9/10)

Optimizing OSS Models with System Prompts, RAG, and Fine-tuning

By @sambanovaai – 12 September 2023 23h05

We use established, fast, and simple techniques to improve OSS model performance: system prompts to generate less verbose answers, retrieval-augmented generation (RAG) to reduce hallucinations, and fine-tuning to improve accuracy. (7/10)
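The system-prompt and retrieval steps can be sketched in Python as below. This is a minimal illustration under stated assumptions, not SambaNova's implementation: the system prompt wording, the `DEMOS` pool, and the helpers `retrieve` and `build_prompt` are all hypothetical. Retrieval here picks in-context demonstrations by bag-of-words cosine similarity, a deliberately simple stand-in for an embedding model. Fine-tuning is not shown.

```python
import math
import re
from collections import Counter

# Hypothetical system prompt nudging the model toward short, code-only answers.
SYSTEM_PROMPT = (
    "You are an API-calling assistant. Answer with a single line of code "
    "and no explanation."
)

# Hypothetical pool of (goal, code) demonstrations to retrieve from.
DEMOS = [
    ("What is the weather in Paris?", 'get_weather(city="Paris")'),
    ("Book a table for two at 7pm", 'book_table(party_size=2, time="19:00")'),
    ("What is the weather in Tokyo tomorrow?", 'get_weather(city="Tokyo", day="tomorrow")'),
]


def bag_of_words(text: str) -> Counter:
    """Lowercased word counts; a crude stand-in for a learned embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(goal: str, k: int = 2) -> list:
    """Return the k demonstrations whose goals are most similar to `goal`."""
    q = bag_of_words(goal)
    ranked = sorted(DEMOS, key=lambda d: cosine(q, bag_of_words(d[0])), reverse=True)
    return ranked[:k]


def build_prompt(goal: str, k: int = 2) -> str:
    """System prompt, then retrieved demonstrations, then the new goal."""
    parts = [SYSTEM_PROMPT, ""]
    for demo_goal, demo_code in retrieve(goal, k):
        parts.append(f"Goal: {demo_goal}\nCode: {demo_code}\n")
    parts.append(f"Goal: {goal}\nCode:")
    return "\n".join(parts)


print(build_prompt("What is the weather in Berlin?"))
```

Grounding the prompt in close examples is one way retrieval can steer the model toward real API names instead of invented ones; the short system prompt keeps the answer to a single line of code.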

Open-Source LLMs Struggle With API Selection and Code Generation

By @sambanovaai – 12 September 2023 23h05

According to our error analysis, open-source LLMs often have difficulty with:
1. API selection,
2. API argument population, and
3. generating legitimate, executable code. (6/10)
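To make these failure categories concrete, here is a hedged Python sketch that labels a single generated API call. The registry, the `get_weather` function, and the label names are hypothetical. This is not the error-analysis procedure used in the work. The checks rely only on the standard `ast` and `inspect` modules.

```python
import ast
import inspect


# Hypothetical API standing in for a real tool.
def get_weather(city: str, day: str = "today") -> str:
    return f"sunny in {city} {day}"


API_REGISTRY = {"get_weather": get_weather}


def classify(generated: str) -> str:
    """Return the first failure category a generated call hits, or "ok"."""
    # Legitimate code: must parse as a single function-call expression.
    try:
        tree = ast.parse(generated, mode="eval")
    except SyntaxError:
        return "invalid_code"
    call = tree.body
    if not isinstance(call, ast.Call) or not isinstance(call.func, ast.Name):
        return "invalid_code"
    # API selection: the called name must exist in the registry.
    func = API_REGISTRY.get(call.func.id)
    if func is None:
        return "wrong_api"
    # Argument population: literal arguments that bind to the signature.
    try:
        args = [ast.literal_eval(a) for a in call.args]
        kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in call.keywords}
        inspect.signature(func).bind(*args, **kwargs)
    except (ValueError, TypeError):
        return "bad_arguments"
    # Executable code: the call must run without raising.
    try:
        func(*args, **kwargs)
    except Exception:
        return "runtime_error"
    return "ok"


for snippet in ['get_weather(city="Paris")', 'get_wether(city="Paris")',
                'get_weather(town="Paris")', 'get_weather(city="Paris"']:
    print(snippet, "->", classify(snippet))
```

Checking the cheapest property first (does it parse?) before the more specific ones means each snippet gets exactly one label, which makes per-category error counts easy to tally.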

OSS Models Close the Gap With Proprietary Models via These Techniques

By @sambanovaai – 12 September 2023 23h05

As seen on the @huggingface leaderboard, these techniques significantly reduce the gap between proprietary and OSS models and make OSS models useful for tool manipulation: https://huggingface.co/spaces/qiantong-xu/toolbench-leaderboard… (8/10)

Open Source Models Progress With API Integration Capabilities

By @sambanovaai – 12 September 2023 23h05

These capabilities allow models to access knowledge beyond their training data. OSS models have made progress with releases from @AiEleuther, @Meta, @BigscienceW, @StabilityAI, @TIIuae, @salesforce, @BigCodeProject, and @databricks; however, their ability to use software APIs is unclear. (3/10)

ToolBench: Comprehensive AI Agent Orchestration Benchmark Suite

By @sambanovaai – 12 September 2023 23h05

We created a comprehensive benchmark suite covering a range of use cases, from using a single API call to do something useful to orchestrating and planning multiple API calls to solve a complex problem. The benchmark is called ToolBench and is available at https://github.com/sambanova/toolbench… (4/10)
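The single-call versus multi-step distinction can be illustrated with a toy execution-based harness. Everything here is hypothetical: the flight APIs, the `Task` fields, and the scripted stand-in for an LLM. This is not ToolBench's code. It only shows the general idea of running generated code against tools and scoring success per task type. Note that `exec` with restricted builtins is not a security sandbox.

```python
from dataclasses import dataclass
from typing import Callable


# Hypothetical toy APIs; a real benchmark would wrap actual software tools.
def search_flights(origin: str, dest: str) -> list:
    return [{"id": "F1", "price": 300}, {"id": "F2", "price": 250}]


def book_flight(flight_id: str) -> str:
    return f"booked {flight_id}"


TOOLS = {"search_flights": search_flights, "book_flight": book_flight}


@dataclass
class Task:
    goal: str
    expected: object
    multi_step: bool  # True when solving it needs several coordinated API calls


TASKS = [
    Task("Find flights from SFO to JFK", search_flights("SFO", "JFK"), False),
    Task("Book the cheapest flight from SFO to JFK", "booked F2", True),
]


def run(code: str) -> object:
    """Execute generated code with only the tools (and min) in scope.

    The code is expected to assign its answer to `result`.
    NOTE: restricted builtins do not make exec() safe for untrusted code.
    """
    scope = {"__builtins__": {"min": min}, **TOOLS}
    exec(code, scope)
    return scope.get("result")


def evaluate(model: Callable[[str], str], tasks: list) -> dict:
    """Success rate, split into single-call and multi-step tasks."""
    totals = {"single": [0, 0], "multi": [0, 0]}  # [successes, attempts]
    for task in tasks:
        bucket = totals["multi" if task.multi_step else "single"]
        bucket[1] += 1
        try:
            if run(model(task.goal)) == task.expected:
                bucket[0] += 1
        except Exception:
            pass  # any crash in the generated code counts as a failure
    return {k: (ok / n if n else 0.0) for k, (ok, n) in totals.items()}


def scripted_model(goal: str) -> str:
    """A fixed stand-in for an LLM so the harness runs end to end."""
    if goal.startswith("Book"):
        return (
            'flights = search_flights("SFO", "JFK")\n'
            'cheapest = min(flights, key=lambda f: f["price"])\n'
            'result = book_flight(cheapest["id"])'
        )
    return 'result = search_flights("SFO", "JFK")'


print(evaluate(scripted_model, TASKS))  # {'single': 1.0, 'multi': 1.0}
```

Scoring by executing the generated code, rather than by string-matching it, lets differently worded but equivalent solutions count as correct.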

Open Source vs Proprietary AI Models: Performance Gap Analysis

By @sambanovaai – 12 September 2023 23h05

We benchmark a wide variety of OSS and proprietary models from @AiEleuther, @Meta, @BigscienceW, @StabilityAI, @OpenAI, @TIIuae, @salesforce, @BigCodeProject, @databricks, and others. We see a huge gap in tool-manipulation performance between proprietary models and OSS models. (5/10)

Teaching LLMs to Use Tools Over Direct Computation

By @sambanovaai – 12 September 2023 23h05

For example, it's better to teach an LLM to use a calculator than to teach it to do complex math. (2/10)
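A minimal Python sketch of the calculator idea follows. The model emits a tool call instead of computing, and a small evaluator does the arithmetic. The `<calc>...</calc>` tag convention and the `calculator` and `fill_tool_calls` names are hypothetical. The evaluator walks the expression with the standard `ast` module rather than calling `eval()`.

```python
import ast
import operator
import re

# Arithmetic operators the hypothetical calculator tool supports.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,  # no guard against huge exponents in this sketch
    ast.USub: operator.neg,
}


def calculator(expression: str) -> float:
    """Evaluate +, -, *, /, ** on numeric literals without calling eval()."""

    def _eval(node: ast.AST):
        if (isinstance(node, ast.Constant)
                and isinstance(node.value, (int, float))
                and not isinstance(node.value, bool)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.operand))
        raise ValueError(f"unsupported expression: {ast.dump(node)}")

    return _eval(ast.parse(expression, mode="eval").body)


def fill_tool_calls(model_output: str) -> str:
    """Replace each hypothetical <calc>expr</calc> span with the tool's result."""
    return re.sub(r"<calc>(.*?)</calc>",
                  lambda m: str(calculator(m.group(1))),
                  model_output)


print(fill_tool_calls("The total is <calc>1234 * 5678</calc> units."))
```

Running it prints `The total is 7006652 units.` The model only has to learn when to emit a `<calc>` span and what to put in it; the exact product comes from the tool.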

Open Source LLMs Become Effective Tool Manipulators

By @sambanovaai – 12 September 2023 23h05

TECHNICAL UPDATE: We are excited to share our work on how to enable OSS LLMs to become effective tool manipulators: https://sambanova.ai/blog/enabling-open-source-llms-to-become-effective-tool-manipulators/… The benchmarks, leaderboard, and tuned models are available on @huggingface. (1/10)
