Why Can GPT Learn In-Context? Language Models Perform Gradient Descent

35 points by xaedes 2 years ago | 0 comments