ERRORS IN STATISTICAL DATA

Introduction
It is important for a researcher to be aware of these errors, in particular non-sampling error, so that they can be either minimised or eliminated from the survey. An introduction to measuring sampling error and the effects of non-sampling error is provided in the following sections.

Sampling Error

Factors Affecting Sampling Error

The population variability also affects the sampling error. More variable populations give rise to larger errors, as the samples, or the estimates calculated from different samples, are more likely to have greater variation. The effect of the variability within the population can be reduced by increasing the sample size to make it more representative of the survey population.

Various sample design options also affect the size of the sampling error. For example, stratification reduces sampling error whereas cluster sampling tends to increase it (these designs are discussed in Sample Design).

Standard Error

The standard error is a measure of the spread of estimates around the "true value". In practice, only one estimate is available, so the standard error cannot be calculated directly. However, if the population variance is known, the standard error can be derived mathematically. Even if the population variance is unknown, as happens in practice, the standard error can be estimated by using the variance of the sample units. Any estimate derived from a probability-based sample survey has a standard error associated with it (called the standard error of the estimate, written se(y), where y is the estimate of the variable of interest).
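As a rough illustration of estimating the standard error from the variance of the sample units, the sketch below treats y as a sample mean. It assumes simple random sampling from a large population, so no finite population correction is applied; the function name and the five sample values are hypothetical and not taken from any survey.

```python
import math
import statistics


def estimated_standard_error(sample):
    """Estimate se(y) for a sample mean y from a single sample.

    Assumption of this sketch: simple random sampling from a large
    population, so se(y) is approximated by sqrt(s^2 / n).
    """
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two sample units")
    sample_variance = statistics.variance(sample)  # uses the n - 1 divisor
    return math.sqrt(sample_variance / n)


# Hypothetical sample of five units.
sample = [48, 52, 47, 55, 50]
y = statistics.mean(sample)
se_y = estimated_standard_error(sample)
print(f"estimate y = {y}, se(y) = {se_y:.3f}")
```

Other sample designs, such as stratified or cluster samples, need different variance formulas, so this applies only to the simple case assumed here.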
For more information on how to calculate estimates and their standard errors please refer to Analysis.

Variance

Relative Standard Error

RSE(y) = 100 * {se(y) / y}

Confidence Interval

Normal Curve

95% CI(y) = [y - {2*se(y)} , y + {2*se(y)}]

This is expressed: "We are 95% confident that the true value of the variable of interest lies within the interval [y - {2*se(y)} , y + {2*se(y)}]".

Other confidence intervals are the 68% confidence interval (which extends one standard error on either side of the estimate and has a 68% chance of containing the "true value") and the 99% confidence interval (which extends three standard errors on either side of the survey estimate and has a 99% chance of containing the "true value"). These percentages are approximations.

For example, suppose a survey estimate is 50 with a standard error of 10. The confidence interval 40 to 60 has a 68% chance of containing the "true value", the interval 30 to 70 has a 95% chance of containing the "true value" and the interval 20 to 80 has a 99% chance of containing the "true value".

NON-SAMPLING ERROR

Non-sampling errors can occur at any stage of the process. They can happen in both censuses and sample surveys. Non-sampling errors can be grouped into two main types: systematic and variable.

Systematic error (called bias) makes survey results unrepresentative of the target population by distorting the survey estimates in one direction. For example, if the target population is the population of Australia but the survey population is just males, then the survey results will not be representative of the target population due to systematic bias in the survey frame.

Variable error can distort the results on any given occasion but tends to balance out on average.
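The difference between systematic and variable error can be seen in a small simulation. This is only a sketch: the true value, the bias of 3, and the noise level are hypothetical numbers chosen for illustration, and the errors are assumed to be normally distributed.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

TRUE_VALUE = 100.0  # hypothetical true value being measured
N_REPORTS = 10_000  # hypothetical number of reported values
BIAS = 3.0          # hypothetical systematic shift in one direction

# Variable error: zero-mean noise that moves each report up or down at random.
variable = [TRUE_VALUE + random.gauss(0, 5) for _ in range(N_REPORTS)]

# Systematic error: the same noise plus a constant shift in one direction.
systematic = [TRUE_VALUE + BIAS + random.gauss(0, 5) for _ in range(N_REPORTS)]

variable_mean_error = statistics.mean(variable) - TRUE_VALUE
systematic_mean_error = statistics.mean(systematic) - TRUE_VALUE

print(f"mean error, variable error only: {variable_mean_error:+.2f}")
print(f"mean error, with systematic bias: {systematic_mean_error:+.2f}")
```

Under these assumptions the variable errors largely cancel in the average, while the average of the biased reports stays close to the bias no matter how many reports are collected.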
Some of the types of non-sampling error are outlined below:

Failure to Identify Target Population / Inadequate Survey Population

Non-Response Bias

Questionnaire Problems

It is essential that questionnaires are tested on a sample of respondents before they are finalised to identify questionnaire flow and question wording problems, and allow sufficient time for improvements to be made to the questionnaire. The questionnaire should then be re-tested to ensure changes made do not introduce other problems. This is discussed in more detail in Questionnaire Design.

Respondent Bias

Processing Errors

Misinterpretation of Results

Time Period Bias

Minimising Non-Sampling Error
RESPONDENT BIAS

Sensitivity

Fatigue

NON-RESPONSE

Partial Non-Response
Total Non-Response

When conducting surveys it is important to collect information on why a respondent has not responded. For example, when evaluating a program, a respondent may indicate they were not happy with the program and therefore do not wish to be part of the survey. Another respondent may indicate that they simply don't have the time to complete the interview or survey form. If a large number of those not responding indicate dissatisfaction with the program, and this is not indicated in the final report, an obvious bias would be introduced in the results.

Minimising Non-Response

Following are some hints on how to minimise refusals in a personal or phone contact:

Find out the reasons for refusal and try to talk through them.
Other measures that can improve respondent cooperation and maximise response include:
In the case of a mail survey, most of the points above can be stated in an introductory letter or through a publicity campaign.

Allowing for Non-Response

The main aim of imputation is to produce consistent data without going back to the respondent for the correct values, thus reducing both respondent burden and the costs associated with the survey. Broadly speaking, the imputation methods fall into three groups:
When deciding on the method of imputation it is desirable to know what effect imputation will have on the final estimates. If a large amount of imputation is performed the results can be misleading, particularly if the imputation used distorts the distribution of the data.

If at the planning stage it is believed that there is likely to be a high non-response rate, then the sample size could be increased to allow for this. However, non-response bias will not be overcome just by increasing the sample size, particularly if the non-responding units have different characteristics to the responding units. Post-stratification and imputation also fail to totally eliminate non-response bias from the results.

Example: Effect of Non-Response
After two follow-up reminders there was still only a 37% response rate. From other information it was known that the overall average was 329. The result based on this survey would have been:
If results had been published without any follow-up then the average number of trees would have been too high, as farms with a greater number of trees appeared to have responded more readily. With follow-up, more smaller farms sent back survey forms and the estimate became closer to the true value.

What are parameter values?
In mathematics, a parameter is a quantity that is passed into an equation. The term means something different in statistics: a parameter is a value that tells you something about a whole population, in contrast to a statistic, which tells you something about a small part of the population.
How do you define a population parameter?
In statistics, a population parameter is a quantity or statistical measure that, for a given population, is fixed and that is used as the value of a variable in some general distribution or frequency function to make it descriptive of that population.
Which symbol is used for the mean or average of the population?
In statistics, Greek symbols usually represent population parameters, such as μ (mu) for the mean and σ (sigma) for the standard deviation.
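The distinction between a population parameter and a sample statistic can be sketched in a few lines. The population below is randomly generated and entirely hypothetical; the variable names mu, sigma, x_bar and s follow the usual symbols and are not taken from any particular library.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical population of 1,000 values (for example, trees per farm).
population = [random.randint(100, 500) for _ in range(1000)]

# Population parameters: fixed values that describe the whole population.
mu = statistics.fmean(population)
sigma = statistics.pstdev(population)  # population standard deviation

# Sample statistics: computed from part of the population, so they
# change from one sample to the next.
sample = random.sample(population, 50)
x_bar = statistics.fmean(sample)
s = statistics.stdev(sample)  # sample standard deviation (n - 1 divisor)

print(f"parameters: mu = {mu:.1f}, sigma = {sigma:.1f}")
print(f"statistics: x_bar = {x_bar:.1f}, s = {s:.1f}")
```

Drawing a different sample changes x_bar and s, while mu and sigma stay the same for this population.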
What do we call the range of values that may contain a population parameter?
For both continuous and dichotomous variables, the confidence interval (CI) estimate is a range of likely values for the population parameter based on: the point estimate, e.g., the sample mean; and the investigator's desired level of confidence (most commonly 95%, but any level between 0% and 100% can be selected).
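A confidence interval for a chosen level can be sketched as follows, together with the relative standard error. The sketch assumes the estimate is approximately normally distributed; the function names are hypothetical, and the multiplier comes from Python's standard-library `statistics.NormalDist`, which gives about 1.96 for the 95% level in place of the rounded value of 2.

```python
from statistics import NormalDist


def confidence_interval(estimate, standard_error, level=0.95):
    """Return (lower, upper) for the given confidence level.

    Assumption of this sketch: the estimate is approximately normal,
    so the interval is estimate +/- z * standard_error.
    """
    if not 0 < level < 1:
        raise ValueError("level must be strictly between 0 and 1")
    z = NormalDist().inv_cdf(0.5 + level / 2)  # two-sided multiplier
    margin = z * standard_error
    return estimate - margin, estimate + margin


def relative_standard_error(estimate, standard_error):
    """Standard error expressed as a percentage of the estimate."""
    return 100 * standard_error / estimate


# Survey estimate of 50 with a standard error of 10.
lower, upper = confidence_interval(50, 10, level=0.95)
print(f"95% CI: {lower:.1f} to {upper:.1f}")
print(f"RSE: {relative_standard_error(50, 10):.1f}%")
```

With these inputs the 95% interval is slightly narrower than the 30 to 70 obtained from the two-standard-error rule, because the exact multiplier is a little below 2.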