If you find this work interesting, please join our Discord channel to ask questions and discuss: https://discord.gg/8z2Pe7cpRv
Also, thank you to @Thom_Wolf for help setting up the leaderboard at The Big Benchmarks Collection: https://huggingface.co/collections/open-llm-leaderboard/the-big-benchmarks-collection-64faca6335a7fc7d4ffe974a (10/10)
@sambanovaai
-
Open LLM Leaderboard benchmarks collection launched
-
Open-Source LLMs Become Effective Tool Manipulators
Find the paper on @arxiv at https://arxiv.org/abs/2305.16504 and more details on our blog at https://sambanova.ai/blog/enabling-open-source-llms-to-become-effective-tool-manipulators/ (9/10)
-
Optimizing OSS Models with System Prompts, RAG, and Fine-tuning
We use established, fast, and simple techniques to improve OSS model performance: system prompts to generate less verbose answers, RAG to reduce hallucinations, and fine-tuning to improve accuracy. (7/10)
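As a rough illustration of how the first two techniques combine at inference time, here is a minimal sketch that prepends a system prompt and a retrieved demonstration to the user's request. Everything in it is an assumption made for illustration: the prompt wording, the `DEMOS` store, the API names inside it, and the word-overlap retriever are hypothetical and are not taken from the paper or from ToolBench. Fine-tuning is not shown.

```python
# Hypothetical system prompt that asks for a terse, code-only answer.
SYSTEM_PROMPT = "Reply with a single API call and no explanation."

# Hypothetical demonstration store: (task description, API call) pairs.
DEMOS = [
    ("get the current weather in Paris", "weather.get(city='Paris')"),
    ("book a table for two at 7pm", "booking.create(guests=2, time='19:00')"),
    ("turn on the living room lights", "lights.on(room='living room')"),
]


def retrieve(query, demos, k=1):
    """Rank demonstrations by word overlap with the query (toy retriever)."""
    query_words = set(query.lower().split())
    ranked = sorted(
        demos,
        key=lambda demo: len(query_words & set(demo[0].lower().split())),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query, k=1):
    """Assemble system prompt + retrieved demonstrations + the new task."""
    parts = [SYSTEM_PROMPT]
    for task, call in retrieve(query, DEMOS, k):
        parts.append(f"Task: {task}\nCall: {call}")
    parts.append(f"Task: {query}\nCall:")
    return "\n\n".join(parts)


if __name__ == "__main__":
    print(build_prompt("what is the weather in Berlin"))
```

A real system would likely swap the word-overlap ranking for an embedding-based retriever; the prompt layout stays the same.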
-
Open-Source LLMs Struggle With API Selection and Code Generation
According to our error analysis, open-source LLMs often have difficulty with:
1. API selection,
2. API argument population,
3. generating legitimate and executable code. (6/10)
-
OSS Models Close Gap With Proprietary Models via These Techniques
As seen on the @huggingface leaderboard, these techniques significantly reduce the gap between proprietary and OSS models and make OSS models useful for tool manipulation. https://huggingface.co/spaces/qiantong-xu/toolbench-leaderboard (8/10)
-
Open Source Models Progress With API Integration Capabilities
These capabilities allow models to access knowledge beyond their training data. OSS models have made progress with releases from @AiEleuther @Meta @BigscienceW @StabilityAI @TIIuae @salesforce @BigCodeProject @databricks; however, their ability to use software APIs is unclear. (3/10)
-
ToolBench: Comprehensive AI Agent Orchestration Benchmark Suite
We create a comprehensive benchmark suite that covers a range of use cases, from using a single API call to do something useful to orchestrating and planning multiple API calls to solve a complex problem. The benchmark is called ToolBench, available at https://github.com/sambanova/toolbench (4/10)
-
Open Source vs Proprietary AI Models: Performance Gap Analysis
We benchmark a wide variety of OSS and proprietary models from @AiEleuther @Meta @BigscienceW @StabilityAI @OpenAI @TIIuae @salesforce @BigCodeProject @databricks and others. We see a huge gap between proprietary models and OSS models. (5/10)
-
Teaching LLMs to Use Tools Over Direct Computation
For example, it's better to teach an LLM to use a calculator than to teach it to do complex math. (2/10)
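To make the calculator example concrete, here is a minimal sketch of the tool side of that arrangement: the model only has to emit a call string, and ordinary code does the arithmetic. The `CALC(...)` call format and the function names are hypothetical, chosen for illustration; they are not the format used in the paper or in ToolBench.

```python
import ast
import operator

# Arithmetic operators the hypothetical calculator tool accepts.
_BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY_OPS = {ast.USub: operator.neg, ast.UAdd: operator.pos}


def calculator(expression):
    """Evaluate a plain arithmetic expression without calling eval()."""

    def _eval(node):
        if (
            isinstance(node, ast.Constant)
            and isinstance(node.value, (int, float))
            and not isinstance(node.value, bool)
        ):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
            return _BIN_OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](_eval(node.operand))
        # Names, function calls, attribute access, etc. are rejected.
        raise ValueError("unsupported expression")

    return _eval(ast.parse(expression, mode="eval").body)


def run_tool_call(model_output):
    """Dispatch a hypothetical 'CALC(<expr>)' string emitted by a model."""
    prefix, suffix = "CALC(", ")"
    if not (model_output.startswith(prefix) and model_output.endswith(suffix)):
        raise ValueError("not a calculator call")
    return calculator(model_output[len(prefix):-len(suffix)])


if __name__ == "__main__":
    # The model emits the call; the tool computes the exact product.
    print(run_tool_call("CALC(12345 * 6789)"))
```

Walking the parsed syntax tree instead of calling `eval()` keeps the tool limited to arithmetic, so a malformed or malicious model output raises an error rather than executing arbitrary code.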
-
Open Source LLMs Become Effective Tool Manipulators
TECHNICAL UPDATE: We are excited to share our work on how to enable OSS LLMs to become effective tool manipulators. https://sambanova.ai/blog/enabling-open-source-llms-to-become-effective-tool-manipulators/ The benchmarks, leaderboard, and tuned models are available on @huggingface. (1/10)