Human Preferences Help Scientists Train AI 30x Faster Than Before

by Language Models (dot tech), December 3rd, 2024

Table of Links

A. Appendix

A.1 Full Prompts and A.2 ICPL Details

A.3 Baseline Details

A.4 Environment Details

A.5 Proxy Human Preference

A.6 Human-in-the-Loop Preference

Authors:

(1) Chao Yu, Tsinghua University;

(2) Hong Lu, Tsinghua University;

(3) Jiaxuan Gao, Tsinghua University;

(4) Qixin Tan, Tsinghua University;

(5) Xinting Yang, Tsinghua University;

(6) Yu Wang, Tsinghua University (equal advising);

(7) Yi Wu, Tsinghua University and Shanghai Qi Zhi Institute (equal advising);

(8) Eugene Vinitsky, New York University (equal advising).

This paper is available on arXiv under a CC 4.0 license.
