ChatGPT, OpenAI’s newest model, is a GPT-3 variant that has been fine-tuned using Reinforcement Learning from Human Feedback (RLHF), and it is taking the world by storm!
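For context, the reward-model step at the heart of RLHF trains on pairwise human preferences: labelers pick the better of two model responses, and the reward model is pushed to score the preferred one higher. A minimal sketch of that pairwise loss in plain Python (the scores are hypothetical stand-ins for reward-model outputs, not OpenAI's actual implementation):

```python
import math

def preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Pairwise preference loss used to train an RLHF reward model:
    -log(sigmoid(r_chosen - r_rejected)).
    Small when the human-preferred response already scores higher;
    large when the reward model ranks the pair the wrong way around."""
    diff = reward_chosen - reward_rejected
    return -math.log(1.0 / (1.0 + math.exp(-diff)))

# A correct ranking (preferred response scores higher) gives a small loss;
# a reversed ranking gives a large one, driving the model to fix it.
print(round(preference_loss(2.0, 0.0), 4))  # → 0.1269
print(round(preference_loss(0.0, 2.0), 4))  # → 2.1269
```

The trained reward model then supplies the scalar reward that a policy-gradient method such as PPO optimizes during fine-tuning.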
Sponsor: Weights & Biases.
https://wandb.me/yannic
OUTLINE:
0:00 — Intro.
0:40 — Sponsor: Weights & Biases.
3:20 — ChatGPT: How does it work?
5:20 — Reinforcement Learning from Human Feedback.
7:10 — ChatGPT Origins: The GPT-3.5 Series.
8:20 — OpenAI’s strategy: Iterative Refinement.
9:10 — ChatGPT’s amazing capabilities.
14:10 — Internals: What we know so far.
16:10 — Building a virtual machine in ChatGPT’s imagination (insane).
20:15 — Jailbreaks: Circumventing the safety mechanisms.
29:25 — How OpenAI sees the future.
References:
https://openai.com/blog/chatgpt/
https://openai.com/blog/language-model-safety-and-misuse/
https://beta.openai.com/docs/model-index-for-researchers
https://scale.com/blog/gpt-3-davinci-003-comparison#Conclusion
New post: What the delay in launching text-davinci-003 tells us about RLHF via PPO and instruction tuning more generally. https://t.co/Q3FUekFERk
— John McDonnell (@johnvmcdonnell) December 2, 2022
https://twitter.com/blennon_/status/1597374826305318912
Ran one of our essay questions through @OpenAI’s new chatbot. Essays are dead.
Back to hand-written exams I guess. Sigh. pic.twitter.com/nzzhRwGp05
— Tim Kietzmann (@TimKietzmann) December 1, 2022