Step 6: Make a decision regarding the hypothesis. Because the calculated test statistic falls to the left of the lower critical value, we reject the null hypothesis. We conclude that the announcement period abnormal returns are different for horizontal and vertical mergers.
r: Identify the test statistic for a hypothesis test about the mean difference for two normal distributions (paired comparisons test).
Frequently, we are interested in the difference of paired observations. If questions about the differences can be expressed as a hypothesis, we can test the validity of the hypothesis. When paired observations are compared, the test becomes a test of paired differences. The hypothesis becomes:
H0: µd = µd0
HA: µd ≠ µd0
Where µd is the population mean of differences and µd0 is the hypothesized difference in means (often zero). The alternative may be one-sided:
HA: µd > µd0
HA: µd < µd0
For paired differences, the test statistic is:
t = (d bar - µd0) / s(d bar)
Where d bar is the sample mean of the differences, s(d bar) = sd / √n is the standard error of the mean difference, and the statistic has n - 1 degrees of freedom.
s: Formulate a null and an alternative hypothesis about the mean difference of two normal populations (paired comparisons test) and determine whether the null hypothesis is rejected at a given level of significance.
Joe Andrews is examining changes in estimated betas for the common stock of companies in the telecommunications industry before and after deregulation. Joe believes that the betas may decline because of deregulation, since companies are no longer subject to the uncertainties of rate regulation, or may increase because there is less certainty regarding competition in the industry.
Mean of differences in betas = 0.23
Sample standard deviation of differences = 0.14
Sample size = 39
Step 1: State the hypothesis. There is reason to believe that the mean differences may be positive or negative, so a two-sided alternative hypothesis is in order here.
H0: µd = 0
HA: µd ≠ 0
Step 2: Select the appropriate test statistic. Use the test statistic formula for paired differences (from LOS 1.B.r).
Step 3: Specify the level of significance. We will use the common significance level of 5%.
Step 4: State the decision rule regarding the hypothesis. There are 39 - 1 = 38 degrees of freedom. Using the t-distribution table, the two-tailed critical values for a 5% level of significance are ±2.024, so we reject the null hypothesis if the calculated t-statistic falls outside that range.
Step 5: Collect the sample and calculate the sample statistics. Inputting our data into the formula:
t = (0.23 - 0) / (0.14 / √39) = 0.23 / 0.022418 = 10.2596
Step 6: Make a decision regarding the hypothesis. We reject the null hypothesis of no difference, concluding that there is a difference in betas from before to after deregulation.
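As a check on the arithmetic, here is a minimal Python sketch (assuming scipy is available) that reproduces the test statistic and critical value from the summary statistics above:

```python
import math
from scipy import stats

# Summary statistics from the example
d_bar, s_d, n = 0.23, 0.14, 39

# Paired-differences test: t = (d bar - µd0) / (sd / √n), with n - 1 df
se = s_d / math.sqrt(n)        # standard error of the mean difference
t_stat = (d_bar - 0) / se      # hypothesized mean difference µd0 = 0

# Two-tailed critical value at the 5% significance level
t_crit = stats.t.ppf(0.975, df=n - 1)

print(f"t = {t_stat:.4f}, critical values = ±{t_crit:.3f}")
# t = 10.2596, critical values = ±2.024 -> reject H0
```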
t: Discuss the choice between tests of differences between means and tests of mean differences in relation to the independence of samples.
The test of the differences in means is used when there are two independent samples.
The test of the mean of the difference is used when the samples are not independent, but in fact allow paired comparisons.
u: Identify the test statistic for a hypothesis test about the variance of a normally distributed population.
Given that many financial market observers measure risk as variance, the difference in variance is a common focus of statistical analysis.
A test of the population variance requires the use of a Chi-squared distributed test statistic.
The hypotheses tested are:
H0: σ² = σ₀²
HA: σ² ≠ σ₀²
The alternative hypothesis may also be one-sided.
The Chi-squared distribution is asymmetrical and approaches the normal distribution in shape as the degrees of freedom increase.
The Chi-squared test statistic is:
χ² = (n - 1)s² / σ₀²
Where s² is the sample variance, n is the sample size, and the statistic has n - 1 degrees of freedom.
v: Formulate a null and an alternative hypothesis about the variance of a normally distributed population and determine whether the null hypothesis is rejected at a given level of significance.
In the past, High-Return Equity Fund has advertised that the standard deviation of monthly returns on the fund has been 4%. This was based on estimates from the 1990-1998 period. High-Return wants to verify whether that figure still adequately describes the standard deviation of the fund's returns. It has collected monthly returns for the period 1998-2000 and determined that the standard deviation of monthly returns is 3.8% over these 24 months. Is the more recent standard deviation different from the advertised standard deviation?
Step 1: State the hypothesis. The null hypothesis is that the variance of monthly returns is (4%)², or 0.0016. Because the question asks whether the standard deviation is different from 4% (not specifically higher or lower), this is a two-sided test.
H0: σ² = 0.0016
HA: σ² ≠ 0.0016
Step 2: Select the appropriate test statistic. Use the test statistic formula for Chi squared (from LOS 1.B.u).
Step 3: Specify the level of significance. Choosing a 5% level of significance means that there will be a 2.5% probability in each tail of the Chi-square distribution.
Step 4: State the decision rule regarding the hypothesis. There are 23 degrees of freedom. Using the Chi-square values for 23 degrees of freedom and probabilities of 0.975 and 0.025, the critical values are 11.689 and 38.076.
Step 5: Collect the sample and calculate the sample statistics. Inputting our data into the formula:
χ² = (23)(0.038²) / 0.0016 = (23)(0.001444) / 0.0016 = 0.033212 / 0.0016 = 20.7575
Step 6: Make a decision regarding the hypothesis. Because the computed statistic of 20.7575 lies between the two critical values, we fail to reject the null hypothesis that the variance is (4%)², or 0.0016.
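The same six steps can be verified with a short Python sketch (again assuming scipy is available):

```python
from scipy import stats

s, sigma0, n = 0.038, 0.04, 24   # sample std dev, hypothesized std dev, months

# Chi-squared statistic: (n - 1)s² / σ0², with n - 1 df
chi2_stat = (n - 1) * s**2 / sigma0**2

# Two-tailed critical values at the 5% significance level
lower = stats.chi2.ppf(0.025, df=n - 1)
upper = stats.chi2.ppf(0.975, df=n - 1)

print(f"chi2 = {chi2_stat:.4f}, critical values = ({lower:.3f}, {upper:.3f})")
# chi2 = 20.7575, critical values = (11.689, 38.076) -> fail to reject H0
```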
w: Identify the test statistic for a hypothesis test about the equality of the variances of two normally distributed populations, based on two independent random samples.
The equality of variances of two populations can be tested with an F-distributed test statistic. The hypotheses tested are:
H0: σ₁² = σ₂²
HA: σ₁² ≠ σ₂²
One-sided alternative tests are also permissible. It is assumed that the populations from which the samples are drawn are normally distributed. The test statistic is F-distributed:
F = s₁² / s₂²
With n₁ - 1 and n₂ - 1 degrees of freedom.
The F-distribution is right-skewed and is truncated at zero on the left-hand side. The shape of the F-distribution is determined by two degrees of freedom (one pertaining to the numerator, one pertaining to the denominator). The rejection region is always in the right side tail of the distribution. Therefore, when constructing this test statistic, always put the larger variance in the numerator.
x: Formulate a null and an alternative hypothesis about the equality of the variances of two normal populations and, given the test statistic, determine whether the null hypothesis is rejected at a given level of significance.
Annie Cower is examining the earnings for two different industries. Cower has noticed that the earnings of the textile industry seem to be more divergent than those of the paper industry. To confirm this, Cower looked at a sample of 31 textile manufacturers and a sample of 41 paper companies. She calculated the standard deviation of earnings across the textile industry as $4.30 and that of the paper companies as $3.80. Are the earnings of textile manufacturers more divergent than those of the paper companies?
Step 1: State the hypothesis. The null hypothesis is that the variances of earnings in the two industries are equal. Because Cower suspects that textile earnings are more divergent than paper earnings, a one-sided alternative hypothesis is in order.
H0: σ₁² = σ₂²
HA: σ₁² > σ₂²
Where σ₁² is the variance of earnings of the textile manufacturers and σ₂² is the variance of earnings of the paper companies.
Step 2: Select the appropriate test statistic. Use the test statistic formula for F-distributed (from LOS 1.B.w).
Step 3: Specify the level of significance. This means that the calculated F-value's p-value must be less than 5% in order to reject the null hypothesis.
Step 4: State the decision rule regarding the hypothesis. The appropriate critical F-value is taken from the F-distribution table for a 5% level of significance for 30 and 40 degrees of freedom. If the calculated statistic is greater than the critical value of 1.74, we reject the null hypothesis of equal variances.
Step 5: Collect the sample and calculate the sample statistics. Inputting our data into the formula:
F = 4.30² / 3.80² = 18.49 / 14.44 = 1.2805
Step 6: Make a decision regarding the hypothesis. Because the calculated F-statistic of 1.2805 is less than the critical F-statistic of 1.74, we fail to reject the null hypothesis. We cannot conclude that the earnings of textile manufacturers are more divergent than those of the paper companies.
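A minimal Python sketch of the F-test (assuming scipy is available):

```python
from scipy import stats

s1, n1 = 4.30, 31   # textile: std dev of earnings, sample size
s2, n2 = 3.80, 41   # paper: std dev of earnings, sample size

# F statistic with the larger sample variance in the numerator
f_stat = s1**2 / s2**2

# One-tailed critical value at the 5% level, (n1 - 1, n2 - 1) df
f_crit = stats.f.ppf(0.95, dfn=n1 - 1, dfd=n2 - 1)

print(f"F = {f_stat:.4f}, critical value = {f_crit:.2f}")
# F = 1.2805, critical value = 1.74 -> fail to reject H0
```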
y: Distinguish between parametric and nonparametric tests.
Parametric tests rely on assumptions regarding the distribution of the population and are specific to parameters.
Nonparametric tests either do not consider a parameter or have few assumptions about the population that is sampled.
Often nonparametric tests are used along with parametric tests. In this way, the nonparametric test is a backup in case the assumptions underlying the parametric test do not hold.
1.C: Correlation and Regression
a: Define and interpret a scatter plot.
A scatter plot is an illustration of the relationship between two variables. In the scatter plot of two variables, X and Y, each point on the plot is an X - Y pair. A scatter plot allows for visual inspection of the data.
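Producing one takes only a few lines of Python; the sketch below uses matplotlib with hypothetical data:

```python
import matplotlib.pyplot as plt

# Hypothetical paired observations on two variables
x = [2, 4, 5, 7, 8, 10, 11, 13]
y = [10, 12, 11, 15, 14, 18, 17, 21]

plt.scatter(x, y)          # each plotted point is an (X, Y) pair
plt.xlabel("X")
plt.ylabel("Y")
plt.title("Scatter plot of Y against X")
plt.show()
```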
b: Define and calculate the covariance between two random variables.
The covariance between two random variables is a statistical measure of the degree to which the two variables move together. The covariance captures how one variable changes when the other variable changes. A positive covariance indicates that the variables tend to move together; a negative covariance indicates that the variables tend to move in opposite directions. The covariance is calculated as:
Cov(X,Y) = Σi=1 to n (Xi - X bar)(Yi - Y bar) / (n - 1)
Where n is the sample size, Xi is the ith observation on variable X, X bar is the mean of the variable X observations, Yi is the ith observation on variable Y, and Y bar is the mean of the variable Y observations.
The actual value of the covariance is not meaningful, because it is affected by the scale of the two variables. That is why we calculate the correlation coefficient - to make something interpretable from the covariance information.
c: Define, calculate, and interpret a correlation coefficient.
The correlation coefficient, r (also denoted ρ), is a measure of the strength of the relationship between variables. Correlation is a unitless measure of the tendency of two variables to move together. The correlation coefficient is bounded by +1 (the variables move together perfectly) and -1 (the variables move exactly opposite each other).
r = covariance of X and Y / (standard deviation of X)(standard deviation of Y)
Assume there are 10 observations and you are given the data below:
    | X   | Y   | X - X bar | (X - X bar)² | Y - Y bar | (Y - Y bar)² | (X - X bar)(Y - Y bar)
Sum | 135 | 416 | 0.00      | 374.50       | 0.00      | 2,342.40     | 445.00
(Only the column totals for the 10 observations are shown.)
X bar = 135 / 10 = 13.5
Y bar = 416 / 10 = 41.6
sX² = 374.5 / 9 = 41.611
sY² = 2,342.4 / 9 = 260.267
r = (445 / 9) / (√41.611 × √260.267) = 49.444 / (6.451 × 16.133) = 0.475
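The same calculation can be scripted directly from the column sums in the table (standard library only):

```python
import math

n = 10
sum_sq_x = 374.50    # sum of (Xi - X bar)²
sum_sq_y = 2342.40   # sum of (Yi - Y bar)²
sum_xy = 445.00      # sum of (Xi - X bar)(Yi - Y bar)

cov_xy = sum_xy / (n - 1)              # sample covariance
s_x = math.sqrt(sum_sq_x / (n - 1))    # sample standard deviation of X
s_y = math.sqrt(sum_sq_y / (n - 1))    # sample standard deviation of Y

r = cov_xy / (s_x * s_y)
print(f"r = {r:.3f}")                  # r = 0.475
```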
r = +1: perfect positive correlation
+1 > r > 0: positive relationship
r = 0: no relationship
0 > r > -1: negative relationship
r = -1: perfect negative correlation
d: Describe how correlation analysis is used to measure the strength of a relationship between variables.
The correlation coefficient is bounded by -1 and +1. The closer the coefficient is to those bounds, the stronger the correlation. However, with the exception of the extremes (that is, r = +1.0 or r = -1.0), we cannot really speak about the strength of the relationship indicated by the correlation coefficient without a statistical test of significance.
Using our previous example in LOS 1.C.c of r = 0.475 and n = 10, the test statistic is:
t = r√(n - 2) / √(1 - r²) = (0.475 × √8) / √(1 - 0.475²) = 1.3435 / 0.88 = 1.5267
To make a decision, compare the calculated t-statistic with the critical t-statistic for the appropriate degrees of freedom and level of significance. Hence, at a 5% level of significance, the correlation is not significantly different from zero (critical t = 2.3060, two-tailed test - look in the 8df row and match that with the .05, two-tailed column or .025 one-tailed column).
e: Formulate a test of the hypothesis that the population correlation coefficient equals zero and determine whether the hypothesis is rejected at a given level of significance.
Example: Suppose the correlation coefficient is 0.2, and the number of observations is 32. What is the calculated test statistic? Is this correlation significantly different from zero using a 5% level of significance?
The hypotheses are:
H0: ρ = 0
HA: ρ ≠ 0
The calculated t-statistic is:
t = (0.2 × √(32 - 2)) / √(1 - 0.2²) = 0.2√30 / √0.96 = 1.11803
Degrees of freedom = 32 - 2 = 30. Hence, the critical t-value for a 5% level of significance and 30 df is 2.042. Therefore, there is no significant correlation (1.11803 falls between the two critical values of -2.042 and +2.042).
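A minimal Python sketch of this test (assuming scipy is available):

```python
import math
from scipy import stats

r, n = 0.2, 32

# t = r√(n - 2) / √(1 - r²), with n - 2 df
t_stat = r * math.sqrt(n - 2) / math.sqrt(1 - r**2)
t_crit = stats.t.ppf(0.975, df=n - 2)   # two-tailed, 5% level

print(f"t = {t_stat:.5f}, critical values = ±{t_crit:.3f}")
# t = 1.11803, critical values = ±2.042 -> fail to reject H0
```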
f: Define an outlier and explain how outliers can affect correlations.
An outlier is an extreme value of a variable. The outlier may be quite large or small. An outlier may affect the sample statistics, such as a correlation coefficient. Two things to note about outliers:
1. Outliers can cause us to conclude that there is a significant relationship when, in fact, there is none, or to conclude that there is no relationship when, in fact, there is one.
2. The researcher must exercise judgment (and caution) when deciding whether to include or exclude an observation.
g: Explain the nature of a spurious correlation.
Spurious correlation is the appearance of a relationship when, in fact, there is no relation. Certain data items may be highly correlated, but not necessarily as a result of a causal relationship. A good example of a spurious correlation is snowfall and stock prices in January. If you regress historical stock prices on snowfall totals in Minnesota, you might find a statistically significant relationship, especially for the month of January. Since there is no economic reason for this relationship, it would be an example of spurious correlation.
h: Explain the difference between dependent and independent variables in a linear regression.
The variables in a regression consist of dependent and independent variables. The dependent variable is the variable whose variation is being explained by the other variables; it is also referred to as the explained variable, the endogenous variable, or the predicted variable.
The independent variable is the variable whose variation is used to explain that of the dependent variable; it is also referred to as the explanatory variable, the exogenous variable, or the predicting variable.
i: Distinguish between the slope and the intercept terms in a regression equation.
The parameters in a simple regression equation are the slope (b1) and the intercept (b0):
Yi = b0 + b1 Xi + εi
Where:
Yi = the ith observation on the dependent variable
Xi = the ith observation on the independent variable
b0 = the intercept
b1 = the slope coefficient
εi = the residual for the ith observation
The slope, b1, is the change in Y for a given one-unit change in X. The slope can be positive, negative, or zero.
The intercept, b0, is the line's intersection with the Y-axis at X = 0. The intercept can be positive, negative, or zero.
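To see the slope and intercept estimated from data, here is a minimal Python sketch using scipy's linregress on hypothetical observations:

```python
from scipy import stats

# Hypothetical paired observations
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2]

fit = stats.linregress(x, y)
print(f"intercept b0 = {fit.intercept:.3f}")  # Y value where the line crosses X = 0
print(f"slope b1 = {fit.slope:.3f}")          # change in Y per one-unit change in X
```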
j: List the assumptions underlying linear regression.
Major assumptions:
1. A linear relationship exists between the dependent and independent variables.
2. The independent variable is uncorrelated with the residuals.
3. The expected value of the disturbance term is zero; that is, the mean of the residuals is zero.
4. There is a constant variance of the disturbance term. In other words, the disturbance terms are homoskedastic.
5. The residuals are independently distributed; that is, the residual or disturbance for one observation is not correlated with that of another observation.
6. The disturbance term (residual, or error term) is normally distributed.
k: Define and calculate the standard error of the estimate.
The standard error of the estimate (SEE - also referred to as the standard error of the residual or standard error of the regression, and often indicated as se) is the standard deviation of the predicted dependent variable values about the estimated regression line.
The SEE is easy to calculate. Recall that regression minimizes the sum of the squared vertical distances between the predicted value and the actual value for each observation. The sum of the squared prediction errors is called the sum of squared errors (SSE - not to be confused with SEE). If the relationship between the variables in the regression is very strong, then the prediction errors (and the SSE) will be small. Hence, the standard error of the estimate is a function of the SSE.
Standard error of the estimate (SEE) = √se² = √(SSE / (n - 2))
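The following Python sketch (hypothetical data, scipy for the fit) computes the SSE and the SEE from the residuals of a fitted line:

```python
import math
from scipy import stats

# Hypothetical data: fit a line, then measure the dispersion of the
# actual Y values around the fitted line.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2]

fit = stats.linregress(x, y)
residuals = [yi - (fit.intercept + fit.slope * xi) for xi, yi in zip(x, y)]

sse = sum(e**2 for e in residuals)    # sum of squared errors
see = math.sqrt(sse / (len(x) - 2))   # standard error of the estimate

print(f"SSE = {sse:.4f}, SEE = {see:.4f}")
```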
l: Define and calculate the coefficient of determination.
The coefficient of determination is another way to measure the relationship between the X and Y variables. The coefficient of determination tells you what proportion of the total variation of the dependent variable (Y) is explained or accounted for by the variation in the independent variable (X). The coefficient of determination is called R² because mathematically it turns out to be just the square of the correlation coefficient (r). Assuming a correlation coefficient of 0.86, the R² of the index and stock returns is (0.86)² ≈ 0.74.
An R² of 0.74 tells us that 74% of the variation in the stock's returns is explained by the variation of the return in the index. R² also describes the systematic relationship between the movement of two variables. In investment finance terms, you would say the movement of the market explains 74% of the stock's total risk (defined as the stock's variability). So, 26% of the stock's volatility is not explained and is unique to the company. Total risk = systematic risk + unsystematic risk.
m: Calculate a confidence interval for a regression coefficient.
A confidence interval is the range of values within which a regression coefficient is believed to lie, for a given point estimate of the coefficient and a given level of probability. The confidence interval for a regression coefficient b1 is calculated as:
b1 ± tc × sb1
Where tc is the critical t-value for the selected confidence level and sb1 is the standard error of the estimated coefficient. Although this looks slightly different from what we've seen before, it is precisely the same: all confidence intervals take the point estimate, then add and subtract the critical value times the standard error of the statistic.
The interpretation of the confidence interval is that this is an interval that we believe will include the true parameter with the specified level of confidence. As the standard error of the estimate rises, the confidence interval widens. In other words, the more variable the data, the less confident you will be when you're using the regression model to estimate the coefficient.
n: Identify the test statistic for a hypothesis test about the population value of a regression coefficient.
Hypothesis testing for a regression coefficient uses the test statistic t = (b1 hat - hypothesized b1) / sb1, where b1 hat is the estimated coefficient and sb1 is its standard error; the statistic has n - 2 degrees of freedom. Equivalently, the test involves placing a band around the estimated coefficient: the true parameter is believed to lie somewhere in the band, and a high standard error for the coefficient will cause the confidence interval to be wider.
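A minimal Python sketch of such a test, using hypothetical values for the estimated coefficient and its standard error:

```python
from scipy import stats

# Hypothetical regression output
b1_hat, s_b1, n = 0.76, 0.33, 32   # estimated slope, its standard error, sample size

# Test H0: b1 = 0 using t = (b1_hat - 0) / s_b1, with n - 2 df
t_stat = (b1_hat - 0) / s_b1
t_crit = stats.t.ppf(0.975, df=n - 2)   # two-tailed, 5% level

print(f"t = {t_stat:.3f}, critical values = ±{t_crit:.3f}")
# |t| > critical value -> reject H0: the slope differs significantly from zero
```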