Question for Statistics Gurus
Posted on 7/13/14 at 6:49 pm
(Repost from tech board after thinking that might not be the best one for this)
So for work I am developing a new analytical method to serve as an alternative to current practices.
One of the data outputs of the model is the difference between the predicted value and the value obtained by the reference method.
For instance:
Is there a way to statistically draw a line in the distribution to say that we have a model with an acceptable error for the application?
I was thinking of computing the 95% confidence interval and checking whether the data matched it (that is, whether 95% of the results actually fell within the interval), but I wasn't sure if that rests on an invalid assumption.
Any thoughts?
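One way to formalize this: under approximate normality, about 95% of the prediction-minus-reference differences should fall within the mean +/- 1.96 standard deviations, and the empirical coverage can be checked directly. A minimal sketch, with synthetic residuals standing in for the real model output:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical residuals: predicted minus reference values.
# Replace with the actual model-vs-reference differences.
residuals = rng.normal(loc=0.0, scale=1.5, size=200)

# Under approximate normality, ~95% of residuals should fall
# within mean +/- 1.96 standard deviations.
mean, sd = residuals.mean(), residuals.std(ddof=1)
lower, upper = mean - 1.96 * sd, mean + 1.96 * sd

coverage = np.mean((residuals >= lower) & (residuals <= upper))
print(f"95% interval: [{lower:.2f}, {upper:.2f}]")
print(f"fraction of residuals inside: {coverage:.1%}")
```

If far more than 5% of points land outside the band, either the normal-error assumption or the model itself is suspect.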
Posted on 7/13/14 at 6:54 pm to Volvagia
I forgot 95% of the statistics I took in grad school 5 minutes after the final.
If you don't get legit responses today, bump this thread on Monday so you at least get the bored-at-work crowd.
Posted on 7/13/14 at 6:58 pm to Volvagia
iPhone>droid
hope that helps
Posted on 7/13/14 at 6:59 pm to Volvagia
From your statistical test you should have an alpha value built in; draw those on either side of your line to show the confidence interval and, basically, a correlation window.
You can use a simple percent error equation.
Or you can do +/- 0.05 if your confidence interval is 0.95 (95%).
Is there a reason your red line is at 0 and the blue line is just above it, or is that your regression line?
This post was edited on 7/13/14 at 7:02 pm
Posted on 7/13/14 at 7:00 pm to biglego
Yeah, that's about why I posted on the tech board first.
Posted on 7/13/14 at 7:08 pm to Volvagia
quote:
Is there a way to statistically draw a line in the distribution to say that we have a model with an acceptable error for application?
Yes, plot your data and then draw the two 95% CL lines with the regression line in the middle. This is assuming your data is normally distributed.
Posted on 7/13/14 at 7:23 pm to Pectus
Red line is ideal (which is not always 0, but typically is), blue is regression.
I am not seeking to simply compute error... the system already does that with the RMSECV.
I was looking for a way to draw the line in the sand for what an acceptable error, statistically speaking, would be.
Here is another graph, from one of the messier models, to illustrate what I mean:
The vast majority of samples are centered around zero. The overall error is also fairly low, +/- 3%.
But there are some that are far outside that in spite of not being outliers, with an error closer to 30%.
My question is whether there is a statistical method by which I can draw a line of acceptable error. Like if a certain number of samples is allowed outside of a range, but no more?
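The "allow a certain number of samples outside a range, but no more" idea is closer to a tolerance interval than a confidence interval: an interval constructed to contain a chosen proportion of the population (say 95%) with a chosen confidence (say 95%). A sketch using Howe's approximation to the two-sided normal tolerance factor, with synthetic residuals standing in for the real data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Hypothetical prediction-minus-reference residuals.
residuals = rng.normal(loc=0.0, scale=1.5, size=100)

n = len(residuals)
p = 0.95      # proportion of the population the interval should cover
gamma = 0.95  # confidence that the interval achieves that coverage

# Howe's approximation to the two-sided normal tolerance factor.
z = stats.norm.ppf((1 + p) / 2)
chi2 = stats.chi2.ppf(1 - gamma, n - 1)
k = np.sqrt((n - 1) * (1 + 1 / n) * z**2 / chi2)

mean, sd = residuals.mean(), residuals.std(ddof=1)
print(f"tolerance factor k = {k:.3f} (vs. 1.96 for a plain normal interval)")
print(f"95%/95% tolerance interval: [{mean - k * sd:.2f}, {mean + k * sd:.2f}]")
```

Samples falling outside this band would then be the ones flagged as exceeding the acceptable error.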
Posted on 7/13/14 at 7:24 pm to Winkface
Need to know a little bit about the design of the experiment first.
I find that in most of my consulting work, people misspecify the model and their results are completely wrong.
CRD, RBD, Latin square?
It looks like you are comparing something to a control; thus, if it was a designed experiment and you are looking to test the differences with the control, you would use what is called Dunnett's post hoc test.
This post was edited on 7/13/14 at 7:25 pm
Posted on 7/13/14 at 7:27 pm to Winkface
quote:
This is assuming your data is normally distributed.
It passes normality tests. At least enough to apply the central limit theorem.
quote:
yes, plot your data and then draw the two 95% cl lines with the regression line in the middle.
That's what I was thinking.
So am I accurate in saying that the model meets a 95% confidence interval if only 5% of n is outside the 95% range?
Or do they all have to be in the interval?
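As an aside, "passes normality tests" can be made concrete with, e.g., the Shapiro-Wilk test; a sketch with synthetic data standing in for the actual residuals:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
# Hypothetical residuals; substitute the real predicted-minus-reference values.
residuals = rng.normal(loc=0.0, scale=1.0, size=150)

# Shapiro-Wilk: the null hypothesis is that the data are normal,
# so a large p-value means no evidence against normality.
stat, p_value = stats.shapiro(residuals)
print(f"W = {stat:.3f}, p = {p_value:.3f}")
```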
Posted on 7/13/14 at 7:34 pm to Volvagia
Anything outside the CL is traditionally an outlier.
Looks like you have residuals plotted here. You can do an upper and lower bound for that, but for your circumstance, I'd just do the CL on the raw data.
This post was edited on 7/13/14 at 7:35 pm
Posted on 7/13/14 at 7:38 pm to Winkface
quote:
Anything outside the cl are outliers, traditionally.
Looks like you have residuals plotted here. You can do an upper and lower bound for that but for your circumstance,
Right, if you are looking for outliers I wouldn't use just regular residuals.
In regression it is better to use the r-studentized residuals to check for outliers; usually anything >= 2.5 is considered an outlier.
But you only want to remove data that is both an outlier and influential.
Posted on 7/13/14 at 7:41 pm to gaetti15
quote:
need to know a little bit about the design of the experiment first.
This is using FT-NIR spectroscopy as a quantitative technique. You take a collection of various samples and collect their absorbance spectra. Then you obtain the attribute values from a different reference method. You input these reference values into the computer, and it looks for a correlative function via PLS regression between the reference value and the integrated spectrum area, based on the parameters you put in (wavelength regions, mathematical preprocessing of them, etc.)
Now you have a function correlating spectral signal to reference value; all that remains is to test it for accuracy. The first is a cross validation test, where each spectrum in the calibration set is excluded in turn and predicted using a calibration built from the remaining spectra, repeated for all calibration samples.
That is a preliminary test.
The final test is showing results of the model to spectra not contained in the calibration spectra at all.
All graphics I have shown prior to this point have been of the difference between predicted values and actual values. While the model data itself isn't normally distributed, the residuals are.
This post was edited on 7/13/14 at 7:49 pm
Posted on 7/13/14 at 7:47 pm to Winkface
quote:
Anything outside the cl are outliers, traditionally.
Unfortunately the underlying chemistry here is such that you can expect to see SOME outliers due to unknown factors mucking up the univariate calibration.
Part of the expertise of doing this is separating the "valid" outliers from the ones that should be excluded from the model calibration. The ones that remain are not separate enough from the rest of the group to legitimately exclude them, regardless of confidence interval.
Posted on 7/13/14 at 7:51 pm to Volvagia
As a FWIW, here is the cross validation plot of the two models of residuals I already posted:
Posted on 7/13/14 at 7:54 pm to Volvagia
The process you are doing is correct.
Cross-validation is definitely the way to go with a regression problem like this.
If you are concerned with trying to find the difference between a true outlier and something that would be considered wrong because of the process, I would look at the r-studentized residuals.
ETA: If you want I can give you a reference to a professional statistician I know who loves this kind of stuff. Actually works with professors in Food Science on similar issues to yours.
These types of residuals are similar to z-scores.
If you have rstudent values beyond roughly +/- 2.5, that means the value the regression predicted had only about a P(Z >= 2.5) ≈ 0.006 chance of being replicated again.
This post was edited on 7/13/14 at 7:57 pm
Posted on 7/13/14 at 8:16 pm to Volvagia
The solution is right in front of you. If you want my help pm me and I'll tell you where to send the money.