Orca 2: Enhancing Reasoning in Smaller Language Models - Prompts used in Evaluation

Written by textmodels | Published 2024/05/29
Tech Story Tags: language-models | orca-2 | reasoning-techniques | machine-learning | small-models | imitation-learning | ai-benchmarks | model-training

TLDRThis paper is available on arxiv(https://arxiv.org/pdf/2311.11045.pdf) under CC 4.0 license. Authors: Arindam Mitra, Luciano Del Corro, Shweti Mahajan, Andres Codas, Guoqing Zheng, Corby Rosset, Hamed Khanpour, Ahmed Awadall.via the TL;DR App

Authors:

(1) Arindam Mitra;

(2) Luciano Del Corro, work done while at Microsoft;

(3) Shweti Mahajan, work done while at Microsoft;

(4) Andres Codas, denote equal contributions;

(5) Clarisse Simoes, denote equal contributions;

(6) Sahaj Agarwal;

(7) Xuxi Chen, work done while at Microsoft;;

(8) Anastasia Razdaibiedina, work done while at Microsoft;

(9) Erik Jones, work done while at Microsoft;

(10) Kriti Aggarwal, work done while at Microsoft;

(11) Hamid Palangi;

(12) Guoqing Zheng;

(13) Corby Rosset;

(14) Hamed Khanpour;

(15) Ahmed Awadall.

Table of Links

Abstract and Introduction

Preliminaries

Teaching Orca 2 to be a Cautious Reasoner

Technical Details

Experimental Setup

Evaluation Results

Limitations

Conclusions and References

A. AGIEval Subtask Metrics

B. BigBench-Hard Subtask Metrics

C. Evaluation of Grounding in Abstractive Summarization

D. Evaluation of Safety

E. Prompts used in Evaluation

F. Illustrative Example from Evaluation Benchmarks and Corresponding Model Outpu

E Prompts used in Evaluation

We provide a list of prompts used for evaluation below:

This paper is available on arxiv under CC 4.0 license.


Written by textmodels | We publish the best academic papers on rule-based techniques, LLMs, & the generation of text that resembles human text.
Published by HackerNoon on 2024/05/29