
Question for Statistics Gurus

Posted by Volvagia
Fort Worth
Member since Mar 2006
51896 posts
Posted on 7/13/14 at 6:49 pm
(Repost from tech board after thinking that might not be the best one for this)

So for work I am developing a new analytical method to serve as an alternative to current practices.

One of the data outputs of the model is the difference between the predicted value and the value obtained by the reference method.

For instance:

[residual plot omitted: red line at the ideal value, blue regression line]
Is there a way to statistically draw a line in the distribution to say that we have a model with an acceptable error for application?


I was thinking something along the lines of computing the 95% confidence interval and seeing whether the data matched it (that is, whether 95% of the results actually fell within the interval), but I wasn't sure if I was making an invalid assumption by doing that.
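A minimal sketch of that coverage check in Python (the residuals array below is simulated stand-in data, not from the actual model):

```python
import numpy as np

# Simulated stand-in for the model's residuals (predicted - reference)
rng = np.random.default_rng(0)
residuals = rng.normal(loc=0.0, scale=1.5, size=100)

# 95% interval estimated from the residuals, assuming rough normality
mean, sd = residuals.mean(), residuals.std(ddof=1)
lower, upper = mean - 1.96 * sd, mean + 1.96 * sd

# Fraction of residuals that actually fall inside the interval
coverage = np.mean((residuals >= lower) & (residuals <= upper))
print(f"interval: [{lower:.2f}, {upper:.2f}], empirical coverage: {coverage:.1%}")
```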

Any thoughts?

Posted by Jones
Member since Oct 2005
90449 posts
Posted on 7/13/14 at 6:54 pm to
i forgot 95% of the statistics i took in grad school 5 minutes after the final


if you dont get legit responses today, bump this thread on monday so you at least get the bored at work crowd
Posted by djangochained
Gardere
Member since Jul 2013
19054 posts
Posted on 7/13/14 at 6:55 pm to
Get a real job nerd
Posted by biglego
Ask your mom where I been
Member since Nov 2007
76220 posts
Posted on 7/13/14 at 6:58 pm to
iPhone>droid

hope that helps
Posted by Pectus
Internet
Member since Apr 2010
67302 posts
Posted on 7/13/14 at 6:59 pm to
From your statistical test you should have an alpha value built in; draw those bounds on either side of your line to show the confidence interval and, essentially, a correlation window.

You can use a simple percent error equation, or you can do +/- 0.05 if your confidence interval is 0.95 (95%).
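As a sketch, the percent error computation mentioned above (the arrays are made-up placeholders):

```python
import numpy as np

# Made-up predicted and reference values for illustration
predicted = np.array([10.2, 9.8, 10.5, 9.9])
reference = np.array([10.0, 10.0, 10.0, 10.0])

# Simple percent error for each sample
percent_error = (predicted - reference) / reference * 100
print(percent_error)  # [ 2. -2.  5. -1.]
```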

Is there a reason your red line is at 0 and the blue line is just above it, or is that your regression line?
This post was edited on 7/13/14 at 7:02 pm
Posted by Volvagia
Fort Worth
Member since Mar 2006
51896 posts
Posted on 7/13/14 at 7:00 pm to
Yeah, that's about why I posted on the tech board first.

Posted by Winkface
Member since Jul 2010
34377 posts
Posted on 7/13/14 at 7:08 pm to
quote:

Is there a way to statistically draw a line in the distribution to say that we have a model with an acceptable error for application?
yes, plot your data and then draw the two 95% CL (confidence limit) lines with the regression line in the middle. This is assuming your data is normally distributed.
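A minimal sketch of that plot in Python, assuming numpy and matplotlib and normally distributed residuals (the data below is simulated):

```python
import numpy as np
import matplotlib.pyplot as plt

# Simulated reference (x) and predicted (y) values
rng = np.random.default_rng(1)
x = np.linspace(0, 10, 80)
y = 1.02 * x + rng.normal(0, 0.3, x.size)

# Least-squares regression line
slope, intercept = np.polyfit(x, y, 1)
fit = slope * x + intercept

# 95% limits from the residual standard deviation (normality assumed)
sigma = np.std(y - fit, ddof=2)

plt.scatter(x, y, s=10)
plt.plot(x, fit, "b-", label="regression")
plt.plot(x, fit + 1.96 * sigma, "g--", label="95% CL")
plt.plot(x, fit - 1.96 * sigma, "g--")
plt.legend()
plt.show()
```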
Posted by Volvagia
Fort Worth
Member since Mar 2006
51896 posts
Posted on 7/13/14 at 7:23 pm to
Red line is ideal (which is not always 0, but typically is), blue is regression.

I am not seeking to simply compute error... the system already does that with the RMSECV (root mean square error of cross-validation).


I was looking for a way to draw a line in the sand for what, statistically speaking, an acceptable error would be.


Here is another graph, from one of the messier models, to illustrate what I mean:

[residual plot from a messier model omitted]
The vast majority of samples are centered around zero. The overall error is also fairly low, +/- 3%.

But some samples fall far outside that despite not being outliers, with errors closer to 30%.


My question is whether there is a statistical method by which I can draw a line of acceptable error. Something like a certain number of samples being allowed outside a range, but no more?
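One hedged way to formalize that "allowed number outside the range" idea is a binomial test on the exceedance count: fix an acceptable-error band and a tolerated exceedance rate up front, then test whether the observed count is consistent with it. A sketch with SciPy (the counts below are made up):

```python
from scipy.stats import binomtest

# Made-up counts: 120 validation samples, 10 with |error| beyond the band
n_samples = 120
n_outside = 10

# Null: at most 5% of samples exceed the band; test whether we see more
result = binomtest(n_outside, n_samples, p=0.05, alternative="greater")
print(f"p-value: {result.pvalue:.3f}")
# A small p-value means more samples exceed the band than a 5% allowance permits
```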
Posted by gaetti15
AK
Member since Apr 2013
13361 posts
Posted on 7/13/14 at 7:24 pm to
Need to know a little bit about the design of the experiment first.

I find that in most of my consulting work, people misspecify the model and their results are completely wrong.

CRD (completely randomized design), RBD (randomized block design), Latin square?

It looks like you are comparing something to a control, so if it was a designed experiment and you are looking to test differences against the control, you would use what is called Dunnett's post hoc test.
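For what it's worth, recent SciPy (1.11+) ships Dunnett's test directly; a minimal sketch with made-up data:

```python
import numpy as np
from scipy.stats import dunnett  # requires SciPy >= 1.11

# Made-up measurements: two treatments compared against a control
rng = np.random.default_rng(2)
control = rng.normal(10.0, 1.0, 20)
method_a = rng.normal(10.3, 1.0, 20)
method_b = rng.normal(11.0, 1.0, 20)

res = dunnett(method_a, method_b, control=control)
print(res.pvalue)  # one p-value per treatment vs. the control
```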



This post was edited on 7/13/14 at 7:25 pm
Posted by Volvagia
Fort Worth
Member since Mar 2006
51896 posts
Posted on 7/13/14 at 7:27 pm to
quote:

This is assuming your data is normally distributed.


It passes normality tests. At least enough to apply the central limit theorem.

quote:

yes, plot your data and then draw the two 95% CL (confidence limit) lines with the regression line in the middle.


That's what I was thinking.


So am I accurate in saying that the model performs at a 95% confidence interval if only 5% of n falls outside the 95% range?

Or do they all have to be in the interval?
Posted by Winkface
Member since Jul 2010
34377 posts
Posted on 7/13/14 at 7:34 pm to
Anything outside the CL is an outlier, traditionally.

Looks like you have residuals plotted here. You can do an upper and lower bound for that, but for your circumstance I'd just do the CL on the raw data.
This post was edited on 7/13/14 at 7:35 pm
Posted by DevilDogTiger
RTWFY!
Member since Nov 2007
6364 posts
Posted on 7/13/14 at 7:35 pm to
Soccer board
Posted by gaetti15
AK
Member since Apr 2013
13361 posts
Posted on 7/13/14 at 7:38 pm to
quote:

Anything outside the CL is an outlier, traditionally.

Looks like you have residuals plotted here. You can do an upper and lower bound for that, but for your circumstance


Right, if you are looking for outliers I wouldn't use just regular residuals.

In regression it is better to use the R-studentized residuals to check for outliers; usually anything >= 2.5 in absolute value is considered an outlier.

But you only want to remove data that is both an outlier and influential.
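A sketch of that screen with statsmodels, flagging points that are both outlying (|rstudent| >= 2.5) and influential (Cook's distance above a common rule-of-thumb cutoff); the data is simulated:

```python
import numpy as np
import statsmodels.api as sm

# Simulated data with one planted gross outlier
rng = np.random.default_rng(3)
x = rng.uniform(0, 10, 50)
y = 2.0 * x + rng.normal(0, 1.0, 50)
y[5] += 8

X = sm.add_constant(x)
fit = sm.OLS(y, X).fit()
infl = fit.get_influence()

rstudent = infl.resid_studentized_external  # externally (R-)studentized residuals
cooks_d, _ = infl.cooks_distance

outlying = np.abs(rstudent) >= 2.5
influential = cooks_d > 4 / len(y)  # common rule-of-thumb cutoff
print(np.where(outlying & influential)[0])  # candidates for removal
```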
Posted by Volvagia
Fort Worth
Member since Mar 2006
51896 posts
Posted on 7/13/14 at 7:41 pm to
quote:

need to know a little bit about the design of the experiment first.



This is using FT-NIR spectroscopy as a quantitative technique. You take a collection of various samples and collect their absorbance spectra. Then you obtain the attribute values from a different reference method. You input these reference values into the computer, and it looks for a correlative function via PLS regression between the reference value and the integrated spectrum area, based on the parameters you put in (wavelength regions, mathematical preprocessing of them, etc.).

Now you have a function correlating spectral signal to reference value; all that remains is to test it for accuracy. The first is a cross-validation test, where one of the calibration spectra is excluded and predicted from a calibration built on the remaining spectra, repeated for all calibration samples.

That is a preliminary test.

The final test is applying the model to spectra not contained in the calibration set at all.

All graphics I have shown prior to this point have been of the difference between predicted and actual values. While the model data itself isn't normally distributed, the residuals are.
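For illustration, a minimal sketch of that leave-one-out cross-validation using scikit-learn's PLS regression (the spectral matrix X and reference values y below are simulated stand-ins):

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import LeaveOneOut, cross_val_predict

# Simulated absorbance spectra (rows = samples, cols = wavelengths)
rng = np.random.default_rng(4)
X = rng.normal(size=(40, 200))
y = X[:, :10].sum(axis=1) + rng.normal(0, 0.1, 40)  # fake reference values

# Each sample is predicted from a model calibrated on the other samples
pls = PLSRegression(n_components=5)
y_cv = cross_val_predict(pls, X, y, cv=LeaveOneOut()).ravel()

rmsecv = np.sqrt(np.mean((y - y_cv) ** 2))
print(f"RMSECV: {rmsecv:.3f}")
```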
This post was edited on 7/13/14 at 7:49 pm
Posted by Volvagia
Fort Worth
Member since Mar 2006
51896 posts
Posted on 7/13/14 at 7:47 pm to
quote:

Anything outside the CL is an outlier, traditionally.



Unfortunately the underlying chemistry here is such that you can expect to see SOME outliers due to unknown factors mucking up the univariate calibration.

Part of the expertise of doing this is separating the "valid" outliers from the ones that should be excluded from the model calibration. The ones that remain are not separated enough from the rest of the group to legitimately exclude them, regardless of confidence interval.
Posted by Volvagia
Fort Worth
Member since Mar 2006
51896 posts
Posted on 7/13/14 at 7:51 pm to
As a FWIW, here is the cross-validation plot of the two models whose residuals I already posted:

[cross-validation plots omitted]
Posted by gaetti15
AK
Member since Apr 2013
13361 posts
Posted on 7/13/14 at 7:54 pm to
The process you are doing is correct.

Cross-validation is definitely the way to go with a regression problem like this.

If you are concerned with trying to separate a true outlier from a value that only looks wrong because of the process, I would look at the R-studentized residuals. These types of residuals are similar to z-scores.

If you have rstudent values beyond roughly +/- 2.5, that means the value the regression predicted would rarely be reproduced: under a standard normal, P(Z >= 2.5) is only about 0.006.

ETA: If you want I can give you a reference to a professional statistician I know who loves this kind of stuff. He actually works with professors in Food Science on issues similar to yours.
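As a quick check of that tail probability:

```python
from scipy.stats import norm

# Upper-tail probability of a standard normal at 2.5
print(norm.sf(2.5))  # ~0.0062
```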

This post was edited on 7/13/14 at 7:57 pm
Posted by LT
The City of St. George
Member since May 2008
5151 posts
Posted on 7/13/14 at 8:16 pm to
The solution is right in front of you. If you want my help pm me and I'll tell you where to send the money.