rlhf

Argilla is a collaboration platform for AI engineers and domain experts that require high-quality outputs, full data ownership, and overall efficiency.

nlp machine-learning natural-language-processing ai weak-supervision developer-tools active-learning annotation-tool text-annotation weakly-supervised-learning human-in-the-loop mlops text-labeling gpt-4 llm langchain rlhf

Updated Mar 15, 2024
Python

Docta-ai / docta

Star

A Doctor for your data

data language-model data-curation data-centric-ai data-diagnosis data-centric-machine-learning rlhf

Updated Jan 12, 2024
Python

opendilab / awesome-RLHF

Star

A curated list of reinforcement learning with human feedback resources (continually updated)

reinforcement-learning deep-learning deep-reinforcement-learning large-language-models human-feedback rlhf

Updated Mar 4, 2024

THUDM / WebGLM

Star

WebGLM: An Efficient Web-enhanced Question Answering System (KDD 2023)

llm chatgpt rlhf webglm

Updated Jul 29, 2023
Python

PKU-Alignment / safe-rlhf

Star

Safe RLHF: Constrained Value Alignment via Safe Reinforcement Learning from Human Feedback

Updated Mar 16, 2024
Python

tatsu-lab / alpaca_eval

Star

An automatic evaluator for instruction-following language models. Human-validated, high-quality, cheap, and fast.

nlp deep-learning leaderboard evaluation instruction-following foundation-models large-language-models rlhf

Updated Mar 17, 2024
Jupyter Notebook

THUDM / ImageReward

Star

[NeurIPS 2023] ImageReward: Learning and Evaluating Human Preferences for Text-to-image Generation

generative-model diffusion-models human-preferences rlhf

Updated Sep 25, 2023
Python

xtreme1-io / xtreme1

Star

Xtreme1 is an all-in-one data labeling and annotation platform for multimodal data training and supports 3D LiDAR point cloud, image, and LLM.

computer-vision image-annotation annotation point-cloud image-classification annotation-tool 3d-annotation labeling-tool multimodal image-labelling-tool rlhf

Updated Mar 15, 2024
TypeScript

argilla-io / distilabel

Star

Distilabel is a framework for synthetic data and AI feedback for AI engineers that require high-quality outputs, full data ownership, and overall efficiency

python ai openai huggingface llms rlhf rlaif

Updated Mar 16, 2024
Python

GaryYufei / AlignLLMHumanSurvey

Star

Aligning Large Language Models with Human: A Survey

awesome survey llama gpt-4 large-language-models llms chatgpt rlhf supervised-finetuning llama2 chinese-llama

Updated Sep 11, 2023

voidful / TextRL

Sponsor

Star

Implementation of ChatGPT RLHF (Reinforcement Learning with Human Feedback) on any generation model in huggingface's transformer (blommz-176B/bloom/gpt/bart/T5/MetaICL)

nlp reinforcement-learning pytorch nlg language-model gpt-2 gpt-3 controlled-nlg chatgpt rlhf

Updated Aug 6, 2023
Python

jerry1993-tech / Cornucopia-LLaMA-Fin-Chinese

Star

聚宝盆(Cornucopia): 中文金融系列开源可商用大模型，并提供一套高效的金融垂直领域LLM训练框架(Pretraining、SFT、RLHF、Quantize等)

nlp finance qa transformers text-generation chinese llama sft large-language-models rlhf

Updated Jun 30, 2023
Python

ContextualAI / HALOs

Star

A library with extensible implementations of DPO, KTO, PPO, and other human-aware loss functions (HALOs).

alignment ppo halos dpo kto rlhf

Updated Feb 16, 2024
Python

Improve this page

Add a description, image, and links to the rlhf topic page so that developers can more easily learn about it.

Curate this topic

Add this topic to your repo

To associate your repository with the rlhf topic, visit your repo's landing page and select "manage topics."

Learn more

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

rlhf

Here are 97 public repositories matching this topic...

LAION-AI / Open-Assistant

hiyouga / LLaMA-Factory

RUCAIBox / LLMSurvey

ymcui / Chinese-LLaMA-Alpaca-2

InternLM / InternLM

hiyouga / ChatGLM-Efficient-Tuning

huggingface / alignment-handbook

argilla-io / argilla

Docta-ai / docta

opendilab / awesome-RLHF

THUDM / WebGLM

PKU-Alignment / safe-rlhf

tatsu-lab / alpaca_eval

THUDM / ImageReward

xtreme1-io / xtreme1

argilla-io / distilabel

GaryYufei / AlignLLMHumanSurvey

voidful / TextRL

jerry1993-tech / Cornucopia-LLaMA-Fin-Chinese

ContextualAI / HALOs

Improve this page

Add this topic to your repo