Question for Statistics Gurus
Posted on 7/13/14 at 6:49 pm
(Repost from tech board after thinking that might not be the best one for this)
So for work I am developing a new analytical method to serve as an alternative to current practices.
One of the data outputs of the model is the difference between the predicted value and the value obtained by the reference method.
For instance:
Is there a way to statistically draw a line in the distribution to say that we have a model with an acceptable error for the application?
I was thinking of computing the 95% confidence interval and checking whether the data matched it (that is, whether 95% of the results actually fell within the interval), but I wasn't sure if that rests on an invalid assumption.
Any thoughts?
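One way to formalize this: under approximate normality, about 95% of the prediction-minus-reference differences should fall within the mean +/- 1.96 standard deviations, and the empirical coverage can be checked directly. A minimal sketch, with synthetic residuals standing in for the real model output:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical residuals: predicted minus reference values.
# Replace with the actual model-vs-reference differences.
residuals = rng.normal(loc=0.0, scale=1.5, size=200)

# Under approximate normality, ~95% of residuals should fall
# within mean +/- 1.96 standard deviations.
mean, sd = residuals.mean(), residuals.std(ddof=1)
lower, upper = mean - 1.96 * sd, mean + 1.96 * sd

coverage = np.mean((residuals >= lower) & (residuals <= upper))
print(f"95% interval: [{lower:.2f}, {upper:.2f}]")
print(f"fraction of residuals inside: {coverage:.1%}")
```

If far more than 5% of points land outside the band, either the normal-error assumption or the model itself is suspect.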
Posted on 7/13/14 at 6:54 pm to Volvagia
I forgot 95% of the statistics I took in grad school 5 minutes after the final.
If you don't get legit responses today, bump this thread on Monday so you at least get the bored-at-work crowd.
Posted on 7/13/14 at 6:58 pm to Volvagia
iPhone>droid
hope that helps
Posted on 7/13/14 at 6:59 pm to Volvagia
From your statistical test you should have an alpha value built in; draw those on either side of your line to show the confidence interval and, basically, a correlation window.
You can use a simple percent error equation.
Or you can do +/- 0.05 if your confidence interval is 0.95 (95%).
Is there a reason your red line is at 0 and the blue line is just above it, or is that your regression line?
This post was edited on 7/13/14 at 7:02 pm
Posted on 7/13/14 at 7:00 pm to biglego
Yeah, that's about why I posted on the tech board first.
Posted on 7/13/14 at 7:08 pm to Volvagia
quote:
Is there a way to statistically draw a line in the distribution to say that we have a model with an acceptable error for application?
Yes, plot your data and then draw the two 95% CL lines with the regression line in the middle. This is assuming your data is normally distributed.
Posted on 7/13/14 at 7:23 pm to Pectus
Red line is ideal (which is not always 0, but typically is), blue is regression.
I am not seeking to simply compute error... the system already does that with the RMSECV.
I was looking for a way to draw the line in the sand for what an acceptable error, statistically speaking, would be.
Here is another graph, from one of the messier models, to illustrate what I mean:
The vast majority of samples are centered around zero. The overall error is also fairly low, +/- 3%.
But there are some that are far outside that in spite of not being outliers, with an error closer to 30%.
My question is whether there is a statistical method by which I can draw a line of acceptable error. Like if a certain number of samples is allowed outside of a range, but no more?
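The "allow a certain number of samples outside a range, but no more" idea is closer to a tolerance interval than a confidence interval: an interval constructed to contain a chosen proportion of the population (say 95%) with a chosen confidence (say 95%). A sketch using Howe's approximation to the two-sided normal tolerance factor, with synthetic residuals standing in for the real data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Hypothetical prediction-minus-reference residuals.
residuals = rng.normal(loc=0.0, scale=1.5, size=100)

n = len(residuals)
p = 0.95      # proportion of the population the interval should cover
gamma = 0.95  # confidence that the interval achieves that coverage

# Howe's approximation to the two-sided normal tolerance factor.
z = stats.norm.ppf((1 + p) / 2)
chi2 = stats.chi2.ppf(1 - gamma, n - 1)
k = np.sqrt((n - 1) * (1 + 1 / n) * z**2 / chi2)

mean, sd = residuals.mean(), residuals.std(ddof=1)
print(f"tolerance factor k = {k:.3f} (vs. 1.96 for a plain normal interval)")
print(f"95%/95% tolerance interval: [{mean - k * sd:.2f}, {mean + k * sd:.2f}]")
```

Samples falling outside this band would then be the ones flagged as exceeding the acceptable error.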
Posted on 7/13/14 at 7:24 pm to Winkface
Need to know a little bit about the design of the experiment first.
I find that in most of my consulting work, people misspecify the model and their results are completely wrong.
CRD, RBD, Latin square?
It looks like you are comparing something to a control; thus, if it was a designed experiment and you are looking to test the differences with the control, you would use what is called Dunnett's post hoc test.
This post was edited on 7/13/14 at 7:25 pm
Posted on 7/13/14 at 7:27 pm to Winkface
quote:
This is assuming your data is normally distributed.
It passes normality tests. At least enough to apply the central limit theorem.
quote:
yes, plot your data and then draw the two 95% cl lines with the regression line in the middle.
That's what I was thinking.
So am I accurate in saying that the model meets a 95% confidence interval if only 5% of n is outside the 95% range?
Or do they all have to be in the interval?
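As an aside, "passes normality tests" can be made concrete with, e.g., the Shapiro-Wilk test; a sketch with synthetic data standing in for the actual residuals:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
# Hypothetical residuals; substitute the real predicted-minus-reference values.
residuals = rng.normal(loc=0.0, scale=1.0, size=150)

# Shapiro-Wilk: the null hypothesis is that the data are normal,
# so a large p-value means no evidence against normality.
stat, p_value = stats.shapiro(residuals)
print(f"W = {stat:.3f}, p = {p_value:.3f}")
```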
Posted on 7/13/14 at 7:34 pm to Volvagia
Anything outside the CL is traditionally an outlier.
Looks like you have residuals plotted here. You can do an upper and lower bound for that, but for your circumstance, I'd just do the CL on the raw data.
This post was edited on 7/13/14 at 7:35 pm
Posted on 7/13/14 at 7:38 pm to Winkface
quote:
Anything outside the cl are outliers, traditionally.
Looks like you have residuals plotted here. You can do an upper and lower bound for that but for your circumstance,
Right, if you are looking for outliers I wouldn't use just regular residuals.
In regression it is better to use the r-studentized residuals to check for outliers; usually anything >= 2.5 is considered an outlier.
But you only want to remove data that is both an outlier and influential.
Posted on 7/13/14 at 7:41 pm to gaetti15
quote:
need to know a little bit about the design of the experiment first.
This is using FT-NIR spectroscopy as a quantitative technique. You take a collection of various samples and collect their absorbance spectra. Then you obtain the attribute values from a different reference method. You input these reference values into the computer, and it looks for a correlative function via PLS regression between the reference value and the integrated spectrum area, based on the parameters you put in (wavelength regions, mathematical preprocessing of them, etc.)
Now you have a function correlating spectral signal to reference value; all that remains is to test it for accuracy. The first is a cross validation test, where each spectrum in the calibration set is excluded in turn and predicted using a calibration built from the remaining spectra, repeated for all calibration samples.
That is a preliminary test.
The final test is showing results of the model to spectra not contained in the calibration spectra at all.
All graphics I have shown prior to this point have been of the difference between predicted values and actual values. While the model data itself isn't normally distributed, the residuals are.
This post was edited on 7/13/14 at 7:49 pm
Posted on 7/13/14 at 7:47 pm to Winkface
quote:
Anything outside the cl are outliers, traditionally.
Unfortunately the underlying chemistry here is such that you can expect to see SOME outliers due to unknown factors mucking up the univariate calibration.
Part of the expertise of doing this is separating the "valid" outliers from the ones that should be excluded from the model calibration. The ones that remain are not separate enough from the rest of the group to legitimately exclude them, regardless of confidence interval.
Posted on 7/13/14 at 7:51 pm to Volvagia
As a FWIW, here is the cross validation plot of the two models of residuals I already posted:
Posted on 7/13/14 at 7:54 pm to Volvagia
The process you are doing is correct.
Cross-validation is definitely the way to go with a regression problem like this.
If you are concerned with trying to find the difference between a true outlier and something that would be considered wrong because of the process, I would look at the r-studentized residuals.
ETA: If you want I can give you a reference to a professional statistician I know who loves this kind of stuff. Actually works with professors in Food Science on similar issues to yours.
These types of residuals are similar to z-scores.
If you have rstudent values beyond roughly +/- 2.5, that means the value the regression predicted had only about a P(Z >= 2.5) ≈ 0.006 chance of being replicated again.
This post was edited on 7/13/14 at 7:57 pm
Posted on 7/13/14 at 8:16 pm to Volvagia
The solution is right in front of you. If you want my help pm me and I'll tell you where to send the money.