- 07/02/11 19:53: Comment on Spreading the Warmth Around by RegEM Impact on Peninsula Correlations « Climate Audit
- 07/23/11 08:06: Comment on Combining Stations (Plan C) by Roman M’s anomaly combination incorporated into R « the Air Vent
- 10/17/11 10:08: Comment on Comparing Single and Monthly Offsets by The Blackboard » Another land temp reconstruction joins the fray
- 11/11/11 11:39: Comment on EIV/TLS Regression – Why Use It? by Hu McCulloch
- 11/30/11 22:47: Comment on GHCN and Adjustment Trends by P. Solar
- 12/01/11 06:01: Comment on GHCN and Adjustment Trends by RomanM
- 12/01/11 07:20: Comment on GHCN and Adjustment Trends by KevinUK
- 12/04/11 10:15: Comment on 2010 Spring Arctic Sea Ice Extent by Ruhroh
- 12/27/11 10:41: Comment on GHCN and Adjustment Trends by Layman Lurker
- 12/27/11 10:45: Comment on GHCN and Adjustment Trends by Layman Lurker
[...] Pole and how its temperatures correlate with the rest of the grid points. These can be found at my statpad site. The R script can be found in a Word document here. This entry was written by RomanM, [...]
[...] long time readers know, I’m a fan of Roman’s temperature combination method which doesn’t require a base period window to offset individual station anomalies in global [...]
[...] method is similar to that of Nick Stokes and Jeff Id/Roman M, in that they all used the Tamino and Roman method of computing a monthly offset for each station such that the sum of the squared differences [...]
Here is some discussion of TLS from the thread "Un-Muddying the Waters" (11/7/11) over on CA. It's more on topic over here.
Hu McCulloch
Posted Nov 10, 2011 at 3:55 PM
Roman —
I can’t say that I have ever actually used TLS, but from reading up some on it after discussions here on CA, it sounds to me like a very reasonable way to handle the errors-in-variables problem.
However, the elementary treatments I have seen just assume that the variances of all the errors are equal. In fact, they ordinarily wouldn’t be equal, and in order for the method to be identified, you have to know what their relative size is (or what the absolute size is on one side). Then you can rescale the variables so that the errors are equal, and use the elementary method. This gives you “y on x” in the limit when you know that x has no measurement error, and “x on y” in the limit where you know that x has measurement error but there are no regression errors.
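A minimal R sketch of that rescaling idea, assuming the ratio delta = var(error in y) / var(error in x) is known. The function names `tls_slope` and `tls_slope_scaled` and the synthetic data are illustrative only. It multiplies x by sqrt(delta) so both error variances match, fits elementary TLS as the first principal axis, and converts the slope back to the original units; a very large delta should reproduce "y on x" and a very small delta "x on y".

```r
# Elementary TLS slope (equal error variances): direction of the first
# principal axis of the 2 x 2 sample covariance matrix (cov() centers the data).
tls_slope = function(x, y) {
  ev = eigen(cov(cbind(x, y)), symmetric = TRUE)$vectors[, 1]
  ev[2] / ev[1]
}

# Assumed known: delta = var(error in y) / var(error in x).
# Scaling x by s = sqrt(delta) equalizes the error variances; the slope in
# the scaled units is b / s, so multiply by s to return to the original units.
tls_slope_scaled = function(x, y, delta) {
  s = sqrt(delta)
  tls_slope(s * x, y) * s
}

# Synthetic example data
set.seed(1)
n = 200
xtrue = rnorm(n)
x = xtrue + rnorm(n, sd = 0.3)       # x observed with measurement error
y = 2 * xtrue + rnorm(n, sd = 0.5)   # y with its own error

ols_yx = cov(x, y) / var(x)   # "y on x" slope
ols_xy = var(y) / cov(x, y)   # "x on y", rewritten as dy/dx
round(c(y.on.x = ols_yx,
        big.delta = tls_slope_scaled(x, y, 1e6),
        small.delta = tls_slope_scaled(x, y, 1e-6),
        x.on.y = ols_xy), 4)
```

Intermediate values of delta give slopes between the two regression limits.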
Is there a standard way to compute standard errors or CIs for the coefficients in TLS? I haven’t seen that. (Since the measurement-error-only case corresponds to the calibration problem, the CIs may be nonstandard.)
Another issue that is glossed over is the intercept term — most regressions include an intercept, which is the coefficient on a unit “regressor”. However, there is never any measurement error on unity, so it has to be handled differently. A quick fix-up is just to subtract the means from all variables, which forces the regression through the origin, and then to shift it to pass through the variable means instead. But then we still need an estimate of the uncertainty of the restored intercept term.
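A minimal R sketch of that fix-up, plus one possible (not a standard or endorsed) way to put an uncertainty on the restored intercept: a nonparametric bootstrap that resamples (x, y) pairs and refits. The name `tls_fit`, the synthetic data, equal error variances, and B = 999 replicates are all assumptions for illustration.

```r
# Center, fit TLS through the origin, then shift the line back through the means.
tls_fit = function(x, y) {
  xc = x - mean(x); yc = y - mean(y)          # forces the fit through the origin
  v = svd(cbind(xc, yc))$v[, 1]               # direction of the fitted line
  slope = v[2] / v[1]
  c(intercept = mean(y) - slope * mean(x),    # restore the intercept
    slope = slope)
}

# Synthetic data with equal error variances in x and y (assumption)
set.seed(42)
n = 100
xtrue = runif(n, 0, 10)
x = xtrue + rnorm(n, sd = 0.5)
y = 1 + 0.8 * xtrue + rnorm(n, sd = 0.5)

fit = tls_fit(x, y)

# Bootstrap: resample observation pairs and refit (2 x B matrix of estimates)
boot = replicate(999, {
  i = sample.int(n, replace = TRUE)
  tls_fit(x[i], y[i])
})
boot_se = apply(boot, 1, sd)                                  # rough SEs
boot_ci = apply(boot, 1, quantile, probs = c(0.025, 0.975))   # percentile CIs
fit; boot_se; boot_ci
```

The percentile interval is the crudest bootstrap interval; it is used here only because it is short to write.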
The widely used econometrics package EViews doesn’t seem to have TLS. Ivo Petras has contributed a TLS package to the Matlab File Exchange. It may work, but as such it has no Matlab endorsement. (I recently found a very helpful program to solve Nash equilibria there, but it only worked after I corrected two bugs.)
Do you approve of TLS?
RomanM
Posted Nov 10, 2011 at 5:41 PM
Hu, basically you have it right.
The usual approach is to center all of the variables involved at zero (the line can be moved by de-centering the result later) and, as Steve says, to apply an SVD procedure to estimate the coefficients. In the case of two variables, there is an explicit solution. Inference on the result would be difficult, and resampling or asymptotic large-sample results would likely be the only type of inferential methodology available. I have not bothered to research what these methods might be.
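For two variables, one form of the explicit solution (equal error variances, sample variances s_xx and s_yy, covariance s_xy != 0) is b = (s_yy - s_xx + sqrt((s_yy - s_xx)^2 + 4 s_xy^2)) / (2 s_xy). The R sketch below, with illustrative function names and synthetic data, checks it against the SVD route, where the right singular vector for the smallest singular value is the normal to the line.

```r
# Closed-form TLS slope for two variables (equal error variances assumed)
tls_slope_closed = function(x, y) {
  sxx = var(x); syy = var(y); sxy = cov(x, y)
  stopifnot(sxy != 0)                  # formula is undefined when sxy = 0
  (syy - sxx + sqrt((syy - sxx)^2 + 4 * sxy^2)) / (2 * sxy)
}

# SVD route on centered data: column 2 of V is the unit normal to the line,
# so v[1,2]*x + v[2,2]*y = 0 gives y = -(v[1,2]/v[2,2]) * x
tls_slope_svd = function(x, y) {
  v = svd(cbind(x - mean(x), y - mean(y)))$v
  -v[1, 2] / v[2, 2]
}

set.seed(11)
x = rnorm(60, sd = 2)
y = -1.5 * x + rnorm(60)               # negative slope, to check the sign
c(closed = tls_slope_closed(x, y), svd = tls_slope_svd(x, y))
```

The SVD expression is the one-predictor, one-response case of the `-V12 %*% solve(V22)` formula used in `reg.orth`.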
I have some real concerns that the methods don’t really properly take into account the individual “errors” for the predictors or the responses. For each observation, there is in effect a single “residual value” which is apportioned to each of the variables (in exactly the same proportions for each observation) where the apportioning is determined by the fitted line (or plane as the case may be). I wrote up some criticism of the procedure at my blog a year ago.
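The apportioning point can be checked numerically. In this R sketch (synthetic data, illustrative variable names), each observation's orthogonal residual r is split into an implied x-error r*n1 and y-error r*n2, where (n1, n2) is the unit normal to the fitted line. The ratio of the two parts is therefore identical for every observation, namely -1/slope.

```r
set.seed(7)
x = rnorm(50)
y = 0.5 * x + rnorm(50, sd = 0.3)
xc = x - mean(x); yc = y - mean(y)

v = svd(cbind(xc, yc))$v
d = v[, 1]                        # unit direction of the fitted line
nrm = v[, 2]                      # unit normal to the line
r = drop(cbind(xc, yc) %*% nrm)   # signed orthogonal residual, one per observation

ex = r * nrm[1]                   # share of the residual assigned to x
ey = r * nrm[2]                   # share assigned to y
ratio = ey / ex                   # the same for every observation
slope = d[2] / d[1]
range(ratio); -1 / slope
```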
If you want a simple R function to do TLS, you can try this:
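reg.orth = function(ymat, xmat) {
  # total least squares via the SVD of the combined (already centered) data
  xmat = as.matrix(xmat); ymat = as.matrix(ymat)
  nx = NCOL(xmat); ny = NCOL(ymat)
  tmat = cbind(xmat, ymat)
  mat.svd = svd(tmat)
  # partition V: predictor rows 1:nx and response rows (nx+1):(nx+ny),
  # using the last ny columns (those for the smallest singular values)
  vxy = mat.svd$v[1:nx, (nx+1):(nx+ny), drop = FALSE]
  vyy = mat.svd$v[(nx+1):(nx+ny), (nx+1):(nx+ny), drop = FALSE]
  coes = -vxy %*% solve(vyy)
  pred = xmat %*% coes
  list(coefs = coes, pred = pred)
}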
ymat and xmat are matrices containing the response variables and the predictor variables, respectively. They should be centered (have their column means subtracted) before use. The output is the regression coefficients and the predicted values for ymat. The residuals can be calculated by subtracting the latter from ymat. In the case of a single variable for each, the coefficient is the slope and the predicted values form a straight line.
Do I “approve” of the procedure? I guess there are probably some uses, but IMHO, I don’t see that it really handles the problem that all of the variables can contain uncertainty in any reasonable fashion.
Hu McCulloch
Posted Nov 11, 2011 at 11:49 AM
Thanks for the link to your blog page, Roman. Since this is getting OT here, we should move this discussion there.
But as you note in one of your comments on your page, the variables must be scaled so that their errors have equal variance before running standard TLS. If this is not done, measuring temperature in F rather than C changes the results, and equalizing the variances of the variables themselves can give absurd results.
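A small R illustration of the units problem, using synthetic data and an assumed known error sd of 0.5 in both x and the Celsius response. Unscaled TLS does not give a Fahrenheit slope equal to 1.8 times the Celsius slope. Dividing each variable by its error sd first, so that the errors have equal variance, restores that relationship.

```r
# Elementary TLS slope via the SVD of the centered data
tls_slope = function(x, y) {
  v = svd(cbind(x - mean(x), y - mean(y)))$v
  -v[1, 2] / v[2, 2]
}

set.seed(3)
xtrue = rnorm(100, 10, 3)
x  = xtrue + rnorm(100, sd = 0.5)          # predictor with error sd 0.5
yC = 0.6 * xtrue + rnorm(100, sd = 0.5)    # response in degrees C, error sd 0.5
yF = 32 + 1.8 * yC                         # the same response in degrees F

# Unscaled TLS: the answer depends on the units of y
bC = tls_slope(x, yC)
bF = tls_slope(x, yF)
c(bF = bF, `1.8*bC` = 1.8 * bC)

# Scale each variable by its (assumed known) error sd, fit, then unscale
sx = 0.5; syC = 0.5; syF = 1.8 * 0.5
bC_s = tls_slope(x / sx, yC / syC) * syC / sx
bF_s = tls_slope(x / sx, yF / syF) * syF / sx
c(bF_s = bF_s, `1.8*bC_s` = 1.8 * bC_s)    # these agree
```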
More over there later…
Using the word sourcecode in square brackets should allow posting this sort of code. Here goes:
[sourcecode]
#####################
# read data from two files which have been downloaded from
# http://www1.ncdc.noaa.gov/pub/data/ghcn/v2/
# and decompressed by an external program
#v2.mean.Z
#v2.mean_adj.Z
v2.mean = readLines("v2.mean")
v2.madj = readLines("v2.mean_adj")
length(v2.mean) # 595759
length(v2.madj) # 422373
#last ten lines of adjusted file are identical and contain no information
#remove 9 of them
v2.madj = v2.madj[1:422364]
#identify matching station and year lines in both sets
#extract identifying info
idv2 = substr(v2.mean,1,16)
idv2adj = substr(v2.madj,1,16)
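#check to see if both sets are in alphabetical order
#if so the pairing process is faster
sum(idv2[-length(idv2)] > idv2[-1]) #0
sum(idv2adj[-length(idv2adj)] > idv2adj[-1]) #0
#function to pair lines: for each element of dat2, return the position of
#the identical element in dat1 (NA if none); both inputs must be sorted
reconcile = function(dat1, dat2) {
  leng1 = length(dat1)
  leng2 = length(dat2)
  id.pos = rep(NA, leng2)
  curr = 1
  for (i in seq_len(leng2)) {
    j = curr
    # advance through dat1 past dat2[i], without running off the end
    while (j <= leng1 && dat2[i] >= dat1[j]) j = j + 1
    if (j > 1 && dat2[i] == dat1[j - 1]) {
      id.pos[i] = j - 1
      curr = j
    }
  }
  id.pos
}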
inds = reconcile(idv2,idv2adj)
#check to see if there are adjusted lines without originals in the raw data
#remove if necessary
sum(is.na(inds)) #31
#logical index (x[-which(...)] would return nothing if there were no NAs)
keep = !is.na(inds)
v2.madjx = v2.madj[keep]
indsx = inds[keep]
v2.meanx = v2.mean[indsx]
idv2x = idv2[indsx]
idv2adjx = idv2adj[keep]
identical(idv2x,idv2adjx) # TRUE
#function to calculate individual monthly differences
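diff.calc = function(dat1, dat2) {
  len = length(dat1)
  outmat = matrix(NA, len, 13)
  # the 12 monthly values occupy 5-character fields starting at column 17
  st = 17 + (5*(0:11))
  en = st + 4
  x1 = x2 = rep(NA, 12)
  for (i in seq_len(len)) {
    chx1 = dat1[i]
    chx2 = dat2[i]
    # column 1 of the output is the year (characters 13-16); lines must agree
    outmat[i,1] = as.numeric(substr(chx1, 13, 16))
    if (outmat[i,1] != as.numeric(substr(chx2, 13, 16)))
      stop("year mismatch at line ", i)
    for (j in 1:12) {
      x1[j] = as.numeric(substr(chx1, st[j], en[j]))
      x2[j] = as.numeric(substr(chx2, st[j], en[j]))
    }
    # -9999 marks a missing value
    x1[x1 == -9999] = NA
    x2[x2 == -9999] = NA
    # values are stored in tenths of a degree, so divide by 10
    outmat[i,2:13] = (x2 - x1)/10
  }
  outmat
}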
#adjustment = adjusted - unadjusted
adjs = diff.calc(v2.meanx,v2.madjx)
#some statistics
12*422342 # 5068104 total number of monthly values
sum(is.na(adjs[,-1])) # 205985 (4.06%) NAs
sum(adjs[,-1] == 0, na.rm = TRUE) # 1631153 (32.18%) unadjusted values
#calculate annual average for each station in a given year
year = adjs[,1]
ann.mean = rowMeans(adjs[,2:13], na.rm = TRUE)
#calculate average of all adjustments in a given year
aveadj = tapply(ann.mean, year, mean)
plot(year,ann.mean,cex=.25,main = "Annual Averages for Individual Stations",
xlab="Year", ylab="Degrees (C)" )
plot(as.numeric(names(aveadj)),aveadj, main = "Mean Annual GHCN Adjustment",
xlab = "Year",ylab = "Degrees (C)")
[/sourcecode]
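The pairing done by `reconcile()` can also be sketched with base R's `match()`, which does not need sorted input and returns NA where an adjusted line has no raw partner. This is an alternative technique, not the one used in the script above. The three IDs are made-up 16-character strings standing in for `substr(line, 1, 16)`, and `match()` assumes each ID occurs only once in the raw data (it returns the first hit).

```r
# Hypothetical station+year IDs (not real GHCN records)
idv2    = c("1010000000011950", "1010000000011951", "1020000000011950")
idv2adj = c("1010000000011951", "1020000000011950", "1990000000011950")

inds = match(idv2adj, idv2)   # position in the raw IDs, NA if absent
keep = !is.na(inds)           # drop adjusted lines with no raw partner
idv2adjx = idv2adj[keep]
idv2x    = idv2[inds[keep]]
identical(idv2x, idv2adjx)
```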
P. Solar: Thanks for the information.
Actually, this post is almost two years old, and I have used the "sourcecode" tag in some later threads, e.g. here: http://statpad.wordpress.com/2010/03/29/will-the-real-rapid-city-please-stand-up/
It does make it easier for the reader to copy the code, because a mouseover produces a floating window in the upper right portion of the text and this allows the reader to copy all of the code with a single click without having to select that code first.
However, long scripts can be somewhat bulky and interfere with the "flow" of a post so that sometimes it might be preferable to put them in a separate file.
P. Solar,
May I ask how you came across this thread? As RomanM says, it's almost two years old, and it is IMO a seminal thread, as it subsequently sparked off a lot of activity by the 'Blackboard crew', as I call them (zeke h, nick s, r broberg, moshpit, the ccc guys, etc.), attempting to refute Roman's analysis here.
Have you read this thread? If not, please do so. See, for example, http://statpad.wordpress.com/2009/12/12/ghcn-and-adjustment-trends/#comment-195, and hopefully you'll agree that, despite the fact that a further two years have passed, the GHCN database is still a mess. BEST haven't improved the situation in any real way and in fact, if anything, they've WORST it.
Now whatever happened to Giorgio Gilestro? Most of the people contributing to this thread are still around and still post regularly on various blogs (particularly CSIRO Mannian hockey-stick apologist Prof. Nick Stokes BSc, MSc, PhD). GG is conspicuous by his absence.
KevinUK
Dear Sir;
Am curious about your 'statistical opinion' of the method described by Briffa in Tranche II email 3436;
http://www.ecowho.com/foia.php?file=3468.txt&search=score+of+1
"we are having trouble to express the real message of the reconstructions - being
scientifically sound in representing uncertainty , while still getting the crux of the
information across clearly. It is not right to ignore uncertainty, but expressing this
merely in an arbitrary way (and as a total range as before) allows the uncertainty to swamp
the magnitude of the changes through time . We have settled on this version (attached) of
the Figure which we hoe you will agree gets the message over but with the rigor required
for such an important document.
We have added a box to show the "probability surface" for the most likely estimate of past
temperatures based on all published data. By overlapping all reconstructions and giving a
score of 2 to all areas within the 1 standard error range of the estimates for each
reconstruction , and a score of 1 for the area between 1 and 2 standard errors, you build
up a composite picture of the most likely or "concensus" path that temperatures took over
the last 1200 years (note - now with a linear time axis). This still shows the outlier
ranges , preserving all the information, but you see the central most likely area well ,
and the comparison of past and recent temperature levels is not as influenced by the
outlier estimates. What do you think? We have experimented with different versions of the
shading and this one shows up quite well - but we may have to use some all grey version as
the background to the overlay of the model results."
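A rough R sketch of the scoring rule as the email describes it: for each year, a temperature bin within 1 standard error of a reconstruction scores 2, a bin between 1 and 2 standard errors scores 1, and the scores are summed over reconstructions. The email does not say how bins, boundaries, or any smoothing were handled, so the grid, the `<=` boundary convention, and the name `score_surface` are assumptions; the inputs are hypothetical.

```r
# means, ses: matrices (years x reconstructions); grid: temperature bin centres
score_surface = function(means, ses, grid) {
  out = matrix(0, nrow(means), length(grid))
  for (k in seq_len(ncol(means))) {
    # distance of each bin from reconstruction k, in units of its SE (per year)
    z = abs(outer(means[, k], grid, "-")) / ses[, k]
    out = out + ifelse(z <= 1, 2, ifelse(z <= 2, 1, 0))
  }
  out   # summed scores, years x bins
}

# Toy check: one year, two hypothetical reconstructions
s = score_surface(means = matrix(c(0, 1), nrow = 1),
                  ses   = matrix(c(1, 0.6), nrow = 1),
                  grid  = c(0, 1.4, 3.2))
s   # scores 3, 3, 0 across the three bins
```

With many years and a fine grid, `image()` on the result would give a shaded "probability surface" of the kind described.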
Probably it would be a better use of your life force to consider that entry into the 'reconstruction derby' you hinted at elsewhere.
Best,
RR
Roman, I just left this comment at Lucia's citing your "Mean Annual GHCN Adjustments" graph from this post. It falls into the category of "things that make you say hmmm".
http://rankexploits.com/musings/2011/climategate-investigation-tallblokegreg-laden-laframboise/#comment-87938