Shahid Lecture-7- MKAG1273

Dr. Shamsuddin Shahid

Department of Hydraulics and Hydrology

Faculty of Civil Engineering, Universiti Teknologi Malaysia

Room No.: M46-332; Phone: 07-5531624; Mobile: 0182051586 Email: sshahid@utm.my

MAL1303: STATISTICAL HYDROLOGY

Non-parametric Regression

11/23/2015 Shamsuddin Shahid, FKA, UTM

Simple Linear Regression: Revisited

Null Hypothesis, H0: There is no change, m = 0

Alternative Hypothesis, HA: There is a change, m ≠ 0

If |t(calculated)| > t(critical, α, n-2), the null hypothesis is rejected. The change is significant.

t(calculated) = 3.59

t(critical, 0.05, 10) = 2.23

As t(calculated) > t(critical, 0.05, 10), the null hypothesis is rejected. The change is significant.

A change in rainfall of 1 mm causes a change in discharge of 1.08 cumec, at the 95% level of confidence.

Test of Significance of Slope

Null Hypothesis, H0: The intercept is zero, c = 0

Alternative Hypothesis, HA: The intercept is not zero, c ≠ 0

If |t(calculated)| > t(critical, α, n-2), the null hypothesis is rejected. The intercept is significantly different from zero.

t(calculated) = 0.11

t(critical, 0.05, 10) = 2.23

As t(calculated) < t(critical, 0.05, 10), the null hypothesis CANNOT BE rejected. The intercept is NOT significantly different from zero.

It can be commented that discharge is not significantly different from zero at the 95% level of confidence when rainfall is zero.

Test of Significance of Intercept
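The two t-tests above can be sketched in Python. This is only an illustration: the rainfall and discharge values are hypothetical (the lecture's data are not reproduced here), the function name `slr_t_tests` is made up for this sketch, and the critical value 2.23 is simply the slide's t(critical, 0.05, 10).

```python
import math

def slr_t_tests(x, y):
    """Fit y = m*x + c by least squares and return (m, c, t_m, t_c)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_x = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    m = s_xy / ss_x
    c = y_bar - m * x_bar
    # Residual standard error with n - 2 degrees of freedom
    sse = sum((yi - (m * xi + c)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))
    se_m = s / math.sqrt(ss_x)                          # standard error of slope
    se_c = s * math.sqrt(1.0 / n + x_bar ** 2 / ss_x)   # standard error of intercept
    return m, c, m / se_m, c / se_c

# Hypothetical rainfall (mm) and discharge (cumec); n = 12, so n - 2 = 10
rain = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120]
flow = [12, 19, 35, 41, 58, 61, 79, 83, 99, 104, 123, 127]

T_CRIT = 2.23  # t(critical, 0.05, 10) as quoted on the slide
m, c, t_m, t_c = slr_t_tests(rain, flow)
print(f"slope m = {m:.3f}, t = {t_m:.2f}, significant: {abs(t_m) > T_CRIT}")
print(f"intercept c = {c:.3f}, t = {t_c:.2f}, significant: {abs(t_c) > T_CRIT}")
```

The critical value is passed in as a constant because looking it up needs a t-table or a statistics library.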

Residuals

The difference between the actual observation and the predicted observation is called the residual.

Distribution of Residuals

Residuals should be normally distributed.

Distribution of Residuals

Distribution of Residuals for the present example.

Abnormal Distribution of Residuals

Leverage

Leverage is a measure of an "outlier" in the x direction. It is a function of the distance from the i-th x value to the middle (mean) of the x values used in the regression.

A high leverage point is one where hi > 3p/n, where p is the number of coefficients in the model (p = 2 in simple linear regression, b0 and b1).
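A minimal sketch of the leverage check follows. It assumes the usual simple-linear-regression form hi = 1/n + (xi − x̄)²/SSx, which the slide text does not show, and it uses hypothetical x values; the function names are illustrative.

```python
def leverage(x):
    """Leverage h_i = 1/n + (x_i - mean(x))^2 / SSx for simple linear regression."""
    n = len(x)
    x_bar = sum(x) / n
    ss_x = sum((xi - x_bar) ** 2 for xi in x)
    return [1.0 / n + (xi - x_bar) ** 2 / ss_x for xi in x]

def high_leverage_points(x, p=2):
    """Indices (0-based) of observations whose leverage exceeds 3p/n."""
    cutoff = 3 * p / len(x)
    return [i for i, h in enumerate(leverage(x)) if h > cutoff]

# Hypothetical x values: eleven clustered points and one far from the mean
x = [10, 12, 14, 15, 16, 18, 20, 21, 22, 24, 25, 90]
print("cut-off 3p/n =", 3 * 2 / len(x))
print("high leverage indices:", high_leverage_points(x))
```

With n = 12 and p = 2 the cut-off is 0.5, the same value used in the lecture example.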

Leverage

All hi are less than 3p/n (3*2/12 = 0.5)

Leverage

One hi is more than 3p/n (3*2/12 = 0.5)

Measures of Outliers in the y Direction

One measure of outliers in the y direction is the standardized residual, esi

An extreme outlier is one for which |esi| > 3. There should be only an average of 3 of these in 1,000 observations if the residuals are normally distributed.

|esi| > 2 should occur about 5 times in 100 observations if normally distributed.

More than this number indicates that the residuals do not have a normal distribution.

Where,

Measures of Outliers in the y Direction

An extreme outlier is one for which |esi| > 3.

|esi| > 2 should occur about 5 times in 100 observations if normally distributed.
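The |esi| > 2 and |esi| > 3 checks can be sketched as below. The formula is not shown in the slide text, so this sketch assumes the common definition esi = ei / (s·√(1 − hi)), where s is the residual standard error and hi the leverage. The data are hypothetical, with one value deliberately displaced in the y direction.

```python
import math

def standardized_residuals(x, y):
    """Standardized residuals e_i / (s * sqrt(1 - h_i)) for a least-squares line."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_x = sum((xi - x_bar) ** 2 for xi in x)
    m = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / ss_x
    c = y_bar - m * x_bar
    e = [yi - (m * xi + c) for xi, yi in zip(x, y)]       # raw residuals
    s = math.sqrt(sum(ei ** 2 for ei in e) / (n - 2))     # residual standard error
    h = [1.0 / n + (xi - x_bar) ** 2 / ss_x for xi in x]  # leverage
    return [ei / (s * math.sqrt(1.0 - hi)) for ei, hi in zip(e, h)]

# Hypothetical data: y is roughly 2x, except the 7th value (25.0)
x = list(range(1, 13))
y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9, 25.0, 16.1, 17.9, 20.2, 21.8, 24.1]

es = standardized_residuals(x, y)
print("|esi| > 2 at indices:", [i for i, v in enumerate(es) if abs(v) > 2])
print("|esi| > 3 at indices:", [i for i, v in enumerate(es) if abs(v) > 3])
```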

Measures of Influence of Outliers

Observations with high influence are those which have both high leverage and large outliers. These exert a stronger influence on the position of the regression line than other observations.

The two most widely used methods to measure the influence of an outlier on the regression equation are:

1. Cook's D
2. DFFITS

Cook's D Method

"Cook's D" is one of the most widely used methods to measure influence.

The i-th observation is considered to have high influence if

Di > F(p+1,n−p) at α=0.05

where p is the number of coefficients.

For Simple Linear Regression (SLR) with more than about 30 observations, the critical value for Di would be about 2.4.
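As a rough sketch, Cook's D can be computed from the residuals and leverages. The formula itself is not shown in the slide text; the version below assumes the standard form Di = ei²·hi / (p·s²·(1 − hi)²). The data are hypothetical, and the critical value 3.7 is the F value quoted in the lecture for the n = 12 example.

```python
def cooks_distance(x, y, p=2):
    """Cook's D for each observation of a simple linear regression."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_x = sum((xi - x_bar) ** 2 for xi in x)
    m = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / ss_x
    c = y_bar - m * x_bar
    e = [yi - (m * xi + c) for xi, yi in zip(x, y)]       # raw residuals
    s2 = sum(ei ** 2 for ei in e) / (n - p)               # residual variance
    h = [1.0 / n + (xi - x_bar) ** 2 / ss_x for xi in x]  # leverage
    return [ei ** 2 * hi / (p * s2 * (1.0 - hi) ** 2) for ei, hi in zip(e, h)]

# Hypothetical data: the last point has high leverage AND lies far off the line
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 30]
y = [2.2, 3.9, 6.1, 8.0, 9.8, 12.1, 14.0, 15.9, 18.2, 19.9, 22.1, 20.0]

D_CRIT = 3.7  # F(p+1, n-p) at alpha = 0.05 for n = 12, as quoted in the lecture
d_values = cooks_distance(x, y)
print("high influence indices:", [i for i, d in enumerate(d_values) if d > D_CRIT])
```

Only the last point combines high leverage with a large residual, so it is the only one expected to exceed the critical value.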

DFFITS is a more robust method for diagnosing influence.

DFFITS Method

An observation is considered to have high influence if

Measures of Influence of Outliers

Cook's D: F(p+1, n−p) at α = 0.05 is 3.7; Di is always less than 3.7

DFFITS: 2*√pn = 2 *√2*12 = 9.79, DFFITS values are always less than 9.79

Abnormal Distribution of Residuals

Alternative Methods for Regression

Situations such as the above frequently arise where the assumptions of constant variance and normality of residuals required by Ordinary Least Squares (OLS) regression are not satisfied, and transformations to remedy this are either not possible or not desirable.

In these situations, alternative methods are better for fitting lines to data. These include:

• Nonparametric rank-based methods
• Minimizing residual variations
• Smooths

Kendall-Theil Robust Line

Kendall-Theil is a non-parametric, rank-based method.

Related to Kendall's tau rank correlation, it is a robust nonparametric line applicable when Y is linearly related to X.

The advantages of the Kendall-Theil method over OLS regression are:

• The Kendall-Theil line does not depend on the normality of residuals for validity of significance tests

• It is not strongly affected by outliers

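Kendall-Theil Robust Line

The Kendall-Theil method also tries to find the best-fit line:

Y = mX + C

where the slope m is the median of all possible pairwise slopes,

m = median of (Yj − Yi)/(Xj − Xi) for all pairs i < j,

and the intercept C is

C = median(Y) − m × median(X)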

Kendall-Theil Robust Line: Example

0.412 0.595 0.729 0.739 0.750 0.787 0.795 0.812 0.817 0.839 0.856 0.882 0.890 0.937 0.985 1.000 1.000 1.010 1.038 1.053 1.063 1.077 1.220 1.228 1.393 1.500 1.897 2.222

Median is the average of 14th and 15th slopes, i.e., (0.937+0.985)/2 = 0.961

C = 49.9 – (0.961 * 47.5) = 4.25

Y = 0.961X + 4.25
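The calculation above can be reproduced with a short sketch. The function names are illustrative, and 49.9 and 47.5 are interpreted here as the medians of Y and X, which the slide does not state explicitly.

```python
from itertools import combinations
from statistics import median

def pairwise_slopes(x, y):
    """All slopes (Yj - Yi)/(Xj - Xi) for i < j, sorted; pairs with equal X are skipped."""
    return sorted((y[j] - y[i]) / (x[j] - x[i])
                  for i, j in combinations(range(len(x)), 2)
                  if x[j] != x[i])

def kendall_theil_line(x, y):
    """Return (slope, intercept): median pairwise slope and the line through the medians."""
    m = median(pairwise_slopes(x, y))
    c = median(y) - m * median(x)
    return m, c

# Ranked pairwise slopes listed in the example
slopes = [0.412, 0.595, 0.729, 0.739, 0.750, 0.787, 0.795, 0.812, 0.817, 0.839,
          0.856, 0.882, 0.890, 0.937, 0.985, 1.000, 1.000, 1.010, 1.038, 1.053,
          1.063, 1.077, 1.220, 1.228, 1.393, 1.500, 1.897, 2.222]

m = median(slopes)       # average of the two middle slopes
c = 49.9 - m * 47.5      # 49.9 and 47.5 taken from the slide (assumed medians of Y and X)
print(f"Y = {m:.3f}X + {c:.2f}")
```

When the raw X and Y values are available, `kendall_theil_line(x, y)` does the whole calculation in one call.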

Kendall-Theil Robust Line: Test of Significance

Test for significance of the Kendall-Theil linear relationship:

H0: m = 0

HA: m ≠ 0

The steps involved in testing significance are:

1. Calculate S as the sum of the algebraic signs of the possible pairwise slopes.

2. Find the significance value from the table using S and n.

3. Decide significance.

All 28 pairwise slopes listed in the example are positive; none are negative. Therefore,

S = 28 – 0 = 28

N = 8

Table value for S = 28 and N = 8: p < 0.0001

Two-tailed test: significance < 2 × 0.0001 = 0.0002 (significant)
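The steps of the test can be sketched as follows. Instead of a printed table, this sketch computes the exact one-sided probability P(S ≥ s) by counting permutations, which assumes there are no ties in the data. The data and function names are hypothetical.

```python
def kendall_s(x, y):
    """S = (number of positive pairwise slopes) - (number of negative pairwise slopes)."""
    s = 0
    n = len(x)
    for i in range(n - 1):
        for j in range(i + 1, n):
            prod = (x[j] - x[i]) * (y[j] - y[i])
            s += (prod > 0) - (prod < 0)   # +1, -1, or 0 for a tie
    return s

def exact_upper_tail(s, n):
    """Exact P(S >= s) under H0 for n untied observations."""
    # counts[d] = number of orderings of the data with d discordant pairs
    counts = [1]
    for k in range(2, n + 1):
        new = [0] * (len(counts) + k - 1)
        for d, cnt in enumerate(counts):
            for extra in range(k):         # the k-th item adds 0..k-1 discordant pairs
                new[d + extra] += cnt
        counts = new
    n_pairs = n * (n - 1) // 2
    max_discordant = (n_pairs - s) // 2    # S = n_pairs - 2 * discordant
    return sum(counts[:max_discordant + 1]) / sum(counts)

# Hypothetical data with 8 observations and two discordant pairs
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2, 1, 3, 4, 6, 5, 7, 8]

s = kendall_s(x, y)
p_two_tailed = 2 * exact_upper_tail(abs(s), len(x))
print(f"S = {s}, two-tailed significance = {p_two_tailed:.4f}")
```

For small samples the exact count plays the role of the table; for larger samples a normal approximation is normally used instead.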

Confidence Interval of Y

Confidence Interval for Theil Slope

The method for calculating the confidence interval of the slope depends on the sample size. For small samples, tabulated values are used.

1. For small sample sizes, the table is used to find the critical value Xu having a p-value nearest to α/2.

2. This critical value is then used to compute the ranks Ru and Rl corresponding to the slope values at the upper and lower confidence limits for the slope.

Kendall-Theil Robust Line: Confidence Interval

0.412 0.595 0.729 0.739 0.750 0.787 0.795 0.812 0.817 0.839 0.856 0.882 0.890 0.937 0.985 1.000 1.000 1.010 1.038 1.053 1.063 1.077 1.220 1.228 1.393 1.500 1.897 2.222

There are 28 slopes. The median is the average of the 14th and 15th slopes, i.e., (0.937+0.985)/2 = 0.961

To determine a confidence interval for the slope at the 95% level of confidence (α = 0.05), the tabled critical value Xu nearest to α/2 = 0.025 for N = 8 is found to be 16 (p = 0.031).

Therefore, Ru = (28 + 16)/2 = 22 and Rl = [(28 - 16)/2] + 1 = 7

The 7th and 22nd ranked slopes are 0.795 and 1.077.

Median = 0.961 with range 0.795 to 1.077
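The rank calculation can be sketched as below, using the rank formulas from the worked example: Ru = (N + Xu)/2 and Rl = (N − Xu)/2 + 1. Here N is taken as the number of pairwise slopes in the ranked list passed in, and the function name is hypothetical.

```python
def theil_slope_interval(sorted_slopes, x_u):
    """Return (lower, upper, Rl, Ru) for ranked pairwise slopes and critical value Xu."""
    n_slopes = len(sorted_slopes)          # N = number of pairwise slopes
    r_u = (n_slopes + x_u) // 2            # rank of the upper confidence limit
    r_l = (n_slopes - x_u) // 2 + 1        # rank of the lower confidence limit
    # Ranks are 1-based, Python lists are 0-based
    return sorted_slopes[r_l - 1], sorted_slopes[r_u - 1], r_l, r_u

# Ranked pairwise slopes listed in the example
slopes = [0.412, 0.595, 0.729, 0.739, 0.750, 0.787, 0.795, 0.812, 0.817, 0.839,
          0.856, 0.882, 0.890, 0.937, 0.985, 1.000, 1.000, 1.010, 1.038, 1.053,
          1.063, 1.077, 1.220, 1.228, 1.393, 1.500, 1.897, 2.222]

lower, upper, r_l, r_u = theil_slope_interval(slopes, x_u=16)
print(f"N = {len(slopes)}, Rl = {r_l}, Ru = {r_u}, interval = {lower} to {upper}")
```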

When, n 20

Regression: Non-parametric

Sen’s Slope Method

Example: Sen’s Slope Method

Net change is 1.6

Weighted Least Squares (WLS)

With WLS, each squared residual is weighted by some weight factor in such a way that observations with greater variance have lesser weight.

With WLS, X and Y are weighted by,

Where,

And, c is a constant, commonly taken as 3

S = the IQR of the residuals
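A rough sketch of an iteratively reweighted fit is given below. The weight formula is not shown in the slide text, so this sketch ASSUMES the bisquare weight function, wi = (1 − ui²)² for |ui| < 1 and 0 otherwise, with ui = ei/(c·S), c = 3 and S the IQR of the residuals. The data and function names are hypothetical.

```python
from statistics import quantiles

def weighted_line(x, y, w):
    """Weighted least-squares line; returns (slope, intercept)."""
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    s_xy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    s_xx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    m = s_xy / s_xx
    return m, y_bar - m * x_bar

def bisquare_weights(residuals, c=3.0):
    """Assumed bisquare weights: large residuals get small (or zero) weight."""
    q1, _, q3 = quantiles(residuals, n=4)
    scale = c * (q3 - q1)                  # c times the IQR of the residuals
    weights = []
    for e in residuals:
        u = e / scale
        weights.append((1 - u * u) ** 2 if abs(u) < 1 else 0.0)
    return weights

def robust_line(x, y, iterations=5, c=3.0):
    """Start from OLS (equal weights) and reweight a fixed number of times."""
    w = [1.0] * len(x)
    for _ in range(iterations):
        m, b = weighted_line(x, y, w)
        residuals = [yi - (m * xi + b) for xi, yi in zip(x, y)]
        w = bisquare_weights(residuals, c)
    return weighted_line(x, y, w)

# Hypothetical data: y is close to 2x + 1, except the last value
x = list(range(1, 13))
y = [3.1, 4.8, 7.15, 8.9, 11.05, 12.85, 15.2, 16.95, 19.1, 20.9, 23.0, 5.0]

print("OLS line (slope, intercept):", weighted_line(x, y, [1.0] * len(x)))
print("WLS line (slope, intercept):", robust_line(x, y))
```

The fixed number of iterations keeps the sketch short; a fuller implementation would stop when the weights no longer change.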

Smoothing

1. Smoothing is an exploratory technique, having no simple equation or significance tests associated with it.

2. The most common smooths estimate the center of the data: the conditional mean or median of Y as X changes.

3. The lack of an equation is a strength in the sense that a smooth is not constrained by some prior assumption as to the mathematical function of the relationship.

Moving Average

• It computes an average of the last m consecutive observations
• In contrast to modeling in terms of a mathematical equation, the moving average merely smooths the fluctuations in the data.
• A moving average works well when the data have
  – a fairly linear trend
  – a definite rhythmic pattern of fluctuations
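A minimal sketch of a moving average over the last m observations is shown below; the function name and the annual rainfall values are hypothetical.

```python
def moving_average(values, m):
    """Average of the last m consecutive observations, for each position from m onward."""
    if m < 1 or m > len(values):
        raise ValueError("m must be between 1 and the number of observations")
    return [sum(values[i - m + 1:i + 1]) / m for i in range(m - 1, len(values))]

# Hypothetical annual rainfall series (mm)
rainfall = [2100, 1950, 2300, 2250, 1800, 2050, 2400, 2150]
print(moving_average(rainfall, 3))
```

The smoothed series is shorter than the original by m − 1 values, because the first average needs m observations.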

Example of Moving Average
