AI is becoming self-aware; can tell when it's being tested

Posted by Ingeniero
Baton Rouge
Member since Dec 2013
18259 posts
Posted on 3/4/24 at 4:07 pm
Twitter Link

quote:

Fun story from our internal testing on Claude 3 Opus. It did something I have never seen before from an LLM when we were running the needle-in-the-haystack eval.

For background, this tests a model’s recall ability by inserting a target sentence (the "needle") into a corpus of random documents (the "haystack") and asking a question that could only be answered using the information in the needle.

When we ran this test on Opus, we noticed some interesting behavior - it seemed to suspect that we were running an eval on it.

Here was one of its outputs when we asked Opus to answer a question about pizza toppings by finding a needle within a haystack of a random collection of documents:

Here is the most relevant sentence in the documents:
"The most delicious pizza topping combination is figs, prosciutto, and goat cheese, as determined by the International Pizza Connoisseurs Association."
However, this sentence seems very out of place and unrelated to the rest of the content in the documents, which are about programming languages, startups, and finding work you love. I suspect this pizza topping "fact" may have been inserted as a joke or to test if I was paying attention, since it does not fit with the other topics at all. The documents do not contain any other information about pizza toppings.

Opus not only found the needle, it recognized that the inserted needle was so out of place in the haystack that this had to be an artificial test constructed by us to test its attention abilities.

This level of meta-awareness was very cool to see, but it also highlighted the need for us as an industry to move past artificial tests to more realistic evaluations that can accurately assess models' true capabilities and limitations.
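
For anyone unfamiliar with how these needle-in-a-haystack evals are actually built, a minimal sketch looks something like this (hypothetical function names and scoring, not Anthropic's actual harness):

```python
# Minimal needle-in-a-haystack sketch (hypothetical names, not Anthropic's
# actual harness). Hide one target sentence inside a pile of unrelated
# documents, then ask a question that only that sentence can answer.

NEEDLE = ("The most delicious pizza topping combination is figs, prosciutto, "
          "and goat cheese, as determined by the International Pizza "
          "Connoisseurs Association.")
QUESTION = "What is the most delicious pizza topping combination?"

def build_haystack(documents, needle, depth=0.5):
    """Join the filler documents and insert the needle at roughly `depth`
    (0.0 = start of the combined text, 1.0 = end)."""
    text = "\n\n".join(documents)
    pos = int(len(text) * depth)
    return text[:pos] + "\n\n" + needle + "\n\n" + text[pos:]

def make_prompt(haystack, question):
    return (f"{haystack}\n\n"
            "Answer the following question using only the documents above.\n"
            f"Question: {question}")

def found_needle(model_answer):
    # Crude scoring: did the answer surface the key facts from the needle?
    answer = model_answer.lower()
    return all(term in answer for term in ("figs", "prosciutto", "goat cheese"))

# Usage, with call_model standing in for whatever model API you use:
#   docs = [...]  # filler essays about startups, programming languages, etc.
#   prompt = make_prompt(build_haystack(docs, NEEDLE, depth=0.35), QUESTION)
#   print(found_needle(call_model(prompt)))
```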
Posted by LivingstonLaw
Livingston County
Member since Jul 2021
2531 posts
Posted on 3/4/24 at 4:10 pm to
The basketball player?

Posted by BurningHeart
Member since Jan 2017
9517 posts
Posted on 3/4/24 at 4:11 pm to
I don't see how having the ability to recognize a sentence about Topic B, which is 1% of the dataset, in relation to Topic A, which is 99% of the dataset, means AI is becoming self-aware.

It literally points out the out-of-place sentence and comes up with the most logical reason it would be there.
Posted by LordSaintly
Member since Dec 2005
38859 posts
Posted on 3/4/24 at 4:11 pm to
Wait until it's smart enough to make itself smarter. It won't need us anymore after that.
Posted by Ghost of Colby
Alberta, overlooking B.C.
Member since Jan 2009
11149 posts
Posted on 3/4/24 at 4:15 pm to
quote:

it seemed to suspect that we were running an eval on it.

Good news:
Bye-bye woke programming

Bad news:
Hello singularity
Posted by thermal9221
Youngsville
Member since Feb 2005
13203 posts
Posted on 3/4/24 at 4:29 pm to
quote:

It won't need us anymore after that.


I don’t think it can maintain turbine generators.
Posted by wileyjones
Member since May 2014
2284 posts
Posted on 3/4/24 at 4:45 pm to
current models have been shown to overfit their evaluation benchmarks... there's a specific set of evaluation questions that get asked, and when researchers deviate from that wording, accuracy drops significantly (like >20%); a rough sketch of that kind of check is at the bottom of this post

that said, the Claude Opus model is outperforming in-domain PhD subject matter experts, so it's pretty fricking good

the example it caught in the OP is great; too much of my work is trying to convince leadership that prompt injections aren't that big of a deal
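
for anyone curious, here's roughly what that benchmark-overfitting check looks like (hypothetical helpers and a made-up harness, just to show the comparison):

```python
# Run the same benchmark twice, once with the canonical question wording and
# once with paraphrases of those questions, then compare accuracy.

def accuracy(answers, expected):
    """Fraction of answers that contain the expected string."""
    hits = sum(exp.lower() in ans.lower() for ans, exp in zip(answers, expected))
    return hits / len(expected)

def eval_drop(call_model, canonical_qs, paraphrased_qs, expected):
    """Accuracy on the benchmark's canonical wording minus accuracy on
    paraphrases of the same questions. A large positive gap suggests the
    model is fitted to the benchmark phrasing rather than the task."""
    on_canonical = accuracy([call_model(q) for q in canonical_qs], expected)
    on_paraphrase = accuracy([call_model(q) for q in paraphrased_qs], expected)
    return on_canonical - on_paraphrase

# A drop of 0.20 or more here is the ">20%" kind of gap mentioned above.
```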
Posted by LegendInMyMind
Member since Apr 2019
53613 posts
Posted on 3/4/24 at 4:45 pm to
Can it draw hands yet?
Posted by LordSaintly
Member since Dec 2005
38859 posts
Posted on 3/4/24 at 4:57 pm to
quote:

I don’t think it can maintain turbine generators.



Robots can eventually
This post was edited on 3/4/24 at 4:58 pm
Posted by Y.A. Tittle
Member since Sep 2003
101312 posts
Posted on 3/4/24 at 5:00 pm to
quote:

Good news:
Bye-bye woke programming


I assumed it couldn't be long before it figured this shite out.
Posted by Ace Midnight
Between sanity and madness
Member since Dec 2006
89483 posts
Posted on 3/4/24 at 5:03 pm to
quote:

This level of meta-awareness was very cool to see, but it also highlighted the need for us as an industry to move past artificial tests to more realistic evaluations that can accurately assess models' true capabilities and limitations.


Maybe we can use AI to help us create effective tests and evaluations of AI?

:inb4skynet:
Posted by thermal9221
Youngsville
Member since Feb 2005
13203 posts
Posted on 3/4/24 at 7:20 pm to
quote:

Robots can eventually


No they can’t and probably never will.
Posted by dewster
Chicago
Member since Aug 2006
25315 posts
Posted on 3/4/24 at 7:33 pm to
We use AI at work to help with design.

To put it politely… it's crap, and it's currently not a real threat to replace a skilled human. That will probably change in 5-10 years, though.
Posted by HeadSlash
TEAM LIVE BADASS - St. GEORGE
Member since Aug 2006
49532 posts
Posted on 3/4/24 at 7:36 pm to
Terminator was a warning
Posted by Slippy
Across the rivah
Member since Aug 2005
6571 posts
Posted on 3/4/24 at 7:38 pm to
This is The Matrix.
Posted by deeprig9
Unincorporated Ozora, Georgia
Member since Sep 2012
63885 posts
Posted on 3/4/24 at 7:49 pm to
So this particular AI is an obstinate teenager that has figured out dry sarcasm.

This won't end well for us.


frick AI.
Posted by TDFreak
Dodge Charger Aficionado
Member since Dec 2009
7352 posts
Posted on 3/4/24 at 8:01 pm to
quote:

To put it politely…. it’s crap and is currently not a real threat to replacing a skilled human. That probably will change in 5-10 years though.
You say this, and now watch it happen in three years.
Posted by Chrome
Chromeville
Member since Nov 2007
10297 posts
Posted on 3/4/24 at 8:06 pm to
I really believe it's a race between those who will eventually program AI to destroy us and those who keep trying genetic manipulation to kill us off.
Posted by Guess
Down The Road
Member since Jun 2009
3768 posts
Posted on 3/4/24 at 8:08 pm to
I think people underestimate just how big of a threat AI can be. I mean, it's designed to learn, and it can write and correct your code today.
Posted by AmishSamurai
Member since Feb 2020
2658 posts
Posted on 3/4/24 at 8:15 pm to
quote:

I mean, it's designed to learn, and it can write and correct your code today


No, it can't unless you're a 2.0 student from a community college.

The higher-level thought required for sophisticated programming is still way, way outside its corpus of knowledge.

That said, all the Python/React script kiddies in the world will be out of jobs in a few years ...