Orca 2: Enhancing Reasoning in Smaller Language Models - BigBench-Hard Subtask Metrics

Written by textmodels | Published 2024/05/29
Tech Story Tags: language-models | orca-2 | reasoning-techniques | machine-learning | small-models | imitation-learning | ai-benchmarks | model-training

TLDR

Teaching Orca 2 to be a Cautious Reasoner is available on arxiv.org under the CC-BY 4.0 license. The paper is available online at: http://hackernoon.com/preview/jrEFMCvsvN9Qu9mXULgU

Authors:

(1) Arindam Mitra;

(2) Luciano Del Corro, work done while at Microsoft;

(3) Shweti Mahajan, work done while at Microsoft;

(4) Andres Codas, equal contribution;

(5) Clarisse Simoes, equal contribution;

(6) Sahaj Agarwal;

(7) Xuxi Chen, work done while at Microsoft;

(8) Anastasia Razdaibiedina, work done while at Microsoft;

(9) Erik Jones, work done while at Microsoft;

(10) Kriti Aggarwal, work done while at Microsoft;

(11) Hamid Palangi;

(12) Guoqing Zheng;

(13) Corby Rosset;

(14) Hamed Khanpour;

(15) Ahmed Awadallah.

Table of Links

Abstract and Introduction

Teaching Orca 2 to be a Cautious Reasoner

Technical Details

Experimental Setup

Evaluation Results

Conclusions and References

A. AGIEval Subtask Metrics

B. BigBench-Hard Subtask Metrics

C. Evaluation of Grounding in Abstractive Summarization

D. Evaluation of Safety

E. Prompts used in Evaluation

F. Illustrative Example from Evaluation Benchmarks and Corresponding Model Output

B. BigBench-Hard Subtask Metrics

Tables 7, 8, 9, and 10 report the zero-shot performance of Orca 2 and the baseline models on each BBH multiple-choice (MCQ) reasoning task. Accuracy is the evaluation metric.
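As a rough illustration of how per-subtask accuracy on multiple-choice tasks can be computed, here is a minimal Python sketch. It is not the authors' evaluation code: the function name `accuracy_by_subtask`, the record layout, the subtask names, and the toy records are all hypothetical. It also assumes that the chosen option has already been extracted from each model response.

```python
from collections import defaultdict


def accuracy_by_subtask(records):
    """Return {subtask: fraction of records whose prediction matches the gold option}.

    `records` is an iterable of (subtask, predicted_option, gold_option) tuples.
    This layout is an assumption made for illustration only.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for subtask, prediction, answer in records:
        total[subtask] += 1
        # Compare option labels case-insensitively, ignoring surrounding whitespace.
        if prediction.strip().upper() == answer.strip().upper():
            correct[subtask] += 1
    return {name: correct[name] / total[name] for name in total}


# Hypothetical toy records; these are NOT results from the paper.
toy_records = [
    ("task_a", "(A)", "(A)"),
    ("task_a", "(B)", "(C)"),
    ("task_b", "(B)", "(B)"),
    ("task_b", "(a)", "(A)"),
]

scores = accuracy_by_subtask(toy_records)
for name, score in sorted(scores.items()):
    print(f"{name}: {score:.2%}")
```

Each subtask is scored independently, so a table of per-task accuracies like the ones referenced above would contain one such value per model and task.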

This paper is available on arXiv under a CC 4.0 license.


Published by HackerNoon on 2024/05/29