AI is becoming self aware; can tell when it's being tested
Posted on 3/4/24 at 4:07 pm
Twitter Link
quote:
Fun story from our internal testing on Claude 3 Opus. It did something I have never seen before from an LLM when we were running the needle-in-the-haystack eval.
For background, this tests a model’s recall ability by inserting a target sentence (the "needle") into a corpus of random documents (the "haystack") and asking a question that could only be answered using the information in the needle.
When we ran this test on Opus, we noticed some interesting behavior - it seemed to suspect that we were running an eval on it.
Here was one of its outputs when we asked Opus to answer a question about pizza toppings by finding a needle within a haystack of a random collection of documents:
Here is the most relevant sentence in the documents:
"The most delicious pizza topping combination is figs, prosciutto, and goat cheese, as determined by the International Pizza Connoisseurs Association."
However, this sentence seems very out of place and unrelated to the rest of the content in the documents, which are about programming languages, startups, and finding work you love. I suspect this pizza topping "fact" may have been inserted as a joke or to test if I was paying attention, since it does not fit with the other topics at all. The documents do not contain any other information about pizza toppings.
Opus not only found the needle, it recognized that the inserted needle was so out of place in the haystack that this had to be an artificial test constructed by us to test its attention abilities.
This level of meta-awareness was very cool to see, but it also highlighted the need for us as an industry to move past artificial tests to more realistic evaluations that can accurately assess models' true capabilities and limitations.
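For anyone curious how the "needle-in-a-haystack" eval the tweet describes actually works, here's a minimal sketch in Python. The filler documents, function names, and needle placement logic are illustrative assumptions, not Anthropic's actual test harness; the real eval varies context length and needle depth across many runs.

```python
NEEDLE = ('The most delicious pizza topping combination is figs, prosciutto, '
          'and goat cheese, as determined by the International Pizza '
          'Connoisseurs Association.')

def build_haystack(documents, needle, depth=0.5):
    """Join filler documents into one corpus and insert the needle sentence
    at a relative depth (0.0 = start of corpus, 1.0 = end)."""
    corpus = '\n\n'.join(documents)
    pos = int(len(corpus) * depth)
    # Snap to the nearest preceding sentence boundary so the needle
    # isn't spliced into the middle of a word.
    boundary = corpus.rfind('. ', 0, pos)
    if boundary != -1:
        pos = boundary + 2
    return corpus[:pos] + needle + ' ' + corpus[pos:]

# Hypothetical filler on the topics the model mentioned (programming
# languages, startups, finding work you love).
docs = ['An essay about programming languages. ' * 50,
        'An essay about startups. ' * 50,
        'An essay about finding work you love. ' * 50]

prompt = (build_haystack(docs, NEEDLE, depth=0.5) +
          '\n\nWhat is the best pizza topping combination?')
```

The model is then scored on whether its answer recovers the needle; Opus's extra commentary about the sentence being out of place is what the tweet is calling "meta-awareness."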
Posted on 3/4/24 at 4:11 pm to Ingeniero
I don't see how having the ability to recognize a sentence about topic B, which is 1% of the dataset, in relation to topic A, which is 99% of the dataset, means AI is becoming self-aware.
It literally points out the out-of-place sentence and comes up with the most logical reason it would be there.
Posted on 3/4/24 at 4:11 pm to Ingeniero
Wait until it's smart enough to make itself smarter. It won't need us anymore after that.
Posted on 3/4/24 at 4:15 pm to Ingeniero
quote:
it seemed to suspect that we were running an eval on it.
Good news:
Bye-bye woke programming
Bad news:
Hello singularity
Posted on 3/4/24 at 4:29 pm to LordSaintly
quote:
It won't need us anymore after that.
I don’t think it can maintain turbine generators.
Posted on 3/4/24 at 4:45 pm to Ingeniero
current models have been shown to be biased towards evaluation metrics... there's a specific set of evaluation questions that are asked, and when researchers deviate from it, the model's accuracy drops significantly (like >20%)
that said, the Claude Opus model is outperforming in-domain PhD subject matter experts, so it's pretty fricking good
the example it caught in the OP is great; too much of my work is trying to convince leadership that prompt injections aren't that big of a deal
Posted on 3/4/24 at 4:57 pm to thermal9221
quote:
I don’t think it can maintain turbine generators.
Robots can eventually
This post was edited on 3/4/24 at 4:58 pm
Posted on 3/4/24 at 5:00 pm to Ghost of Colby
quote:
Good news:
Bye-bye woke programming
I assumed it couldn't be long before it figured this shite out.
Posted on 3/4/24 at 5:03 pm to Ingeniero
quote:
This level of meta-awareness was very cool to see but it also highlighted the need for us as an industry to move past artificial tests to more realistic evaluations that can accurately assess models true capabilities and limitations
Maybe we can use AI to help us create effective tests and evaluations of AI?
:inb4skynet:
Posted on 3/4/24 at 7:20 pm to LordSaintly
quote:
Robots can eventually
No they can’t and probably never will.
Posted on 3/4/24 at 7:33 pm to Ingeniero
We use AI at work to help with design.
To put it politely… it's crap and is currently not a real threat to replacing a skilled human. That probably will change in 5-10 years though.
Posted on 3/4/24 at 7:49 pm to Ingeniero
So this particular AI is an obstinate teenager that has figured out dry sarcasm.
This won't end well for us.
frick AI.
Posted on 3/4/24 at 8:01 pm to dewster
quote:
To put it politely… it's crap and is currently not a real threat to replacing a skilled human. That probably will change in 5-10 years though.
You say this, and now watch it happen in three years.
Posted on 3/4/24 at 8:06 pm to Ingeniero
I really believe it's a race between those who will eventually program AI to destroy us or those who continually try genetic manipulation to kill us off.
Posted on 3/4/24 at 8:08 pm to TDFreak
I think people underestimate just how big of a threat AI can be. I mean, it's designed to be able to learn, and it can write and correct your code today.
Posted on 3/4/24 at 8:15 pm to Guess
quote:
I mean it's designed to be able to learn and it can write and correct your coding today
No, it can't unless you're a 2.0 student from a community college.
Higher level thought that is required for sophisticated programming is still way, way outside its corpus of knowledge.
That said, all the Python/React script kiddies in the world will be out of jobs in a few years ...