Would DPO help in this case? #4

sandys · 2024-01-27T08:21:35Z

Hi,
I had a question after reading your paper: do you think DPO/RLHF would help here? Your approach of using chain-of-thought to generate structured data is very innovative.
Chain-of-thought is also very responsive to alignment tuning, so I'm wondering whether DPO/RLHF is something that could be explored.
