trees are harlequins, words are harlequins — meta-post on meta-learning


There’s an LW post I keep trying to write. I have several unpublished draft versions of it.

The point I want to make is simple and straightforward, but when I try to write it down, I get worried I’m not … like, “messaging” it correctly? Not striking the right tone?

The point of the post is roughly:

People don’t use the term “meta-learning” consistently when they’re talking about GPT-3. The paper uses the term one way (and they are 100% explicit about it: they spell out their definition in the text), while the blogging community uses it another way.

The bloggers are excited/scared that GPT-3 does “meta-learning,” by which they mean something like “general reasoning on the fly without training.”

If you’re excited/scared by this capability (and you should be), then you should really care whether GPT-3 actually has it, to what extent, how the capability scales, etc.

There is very little public evidence on this topic, because the paper is (explicitly!) 95% not about the topic, the remaining 5% is pretty weak evidence, and the only other evidence out there is like … some subjective user impressions? gwern saying “GPT-3 has the capability” in a really eloquent and forceful way?

It would be easy to test the capability much more rigorously than this. This ought to be done since the topic is important. It can only be done by people with API access (AI Dungeon doesn’t count).

But it … feels hard to say this in a way that could actually convince anyone who doesn’t already agree? Like,

  1. These points seem so clearly true to me that when I try to “argue for them,” I feel pedantic and like I’m belaboring the obvious.

    Do I actually have to say “no, few-shot translation from French to English is not an example of general reasoning on the fly”? Surely no one thinks the model is like … learning how to speak French from ~2000 words of data?

    Do I have to quote the part of the paper where it says what it means by meta-learning? It’s right there! You can just read the paper!

  2. I made most of this argument already in my original GPT-3 post, immediately after reading the paper. So (A) I feel like I’m repeating myself and (B) if the point didn’t get across then, why would it now?
  3. There is an element of “mere semantics” to the point and it’s hard to clarify to my satisfaction that no, I don’t just care that blog posts are using a word incorrectly. But I have to bring up the semantic issue to even describe what I am saying.
  4. It feels inevitably like picking on gwern’s choice of words, since blogosphere beliefs about “GPT-3 meta-learning” basically all trace back to gwern’s blog.

    I don’t care about whether gwern is using the right words; he’s just the most detailed “primary source” we have on the topic, due to the closed API.

I was thinking about this yesterday because @slatestarscratchpad linked my original GPT-3 post in his April linkpost. I actually sat down and wrote up another one of those drafts and … nope, gave up again.

I notice I am able to write this on tumblr with no problems at all. Perhaps this is yet another point of evidence that using tumblr lets me do much more “real blogging” than I could if I had “a real blog.”

