Uncertainty = Accuracy + Precision + Ambiguity + Vagueness + Logical Fallacies
Arises from our inability to measure phenomena perfectly and flaws in our conceptual models.
There is no standardized measure of data quality in GIS.
When baking, mistakes are obvious.
Often in GIS, they are not.
Data must be assessed on a case-by-case basis.
The terms are related, but the distinction is very important.
Accuracy: The degree to which a set of measurements correctly matches the real-world values.
Precision: The degree of agreement between multiple measurements of the same real-world phenomenon.
Accuracy and precision can be quantified.
Statistical methods can be used to quantify error.
We can quantify any offset from the truth (bias) and the spread of the measurements (precision).
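A minimal sketch of the distinction in code, assuming NumPy and two invented sets of measurements of a quantity whose true value is 100.0: one set is precise but biased, the other is accurate on average but noisy.

```python
import numpy as np

true_value = 100.0
precise_but_biased = np.array([103.1, 103.0, 102.9, 103.2, 103.0])  # tightly clustered, offset from truth
accurate_but_noisy = np.array([97.0, 104.5, 99.0, 102.0, 98.5])     # scattered, centred near truth

for name, x in [("precise but biased", precise_but_biased),
                ("accurate but noisy", accurate_but_noisy)]:
    bias = np.mean(x - true_value)  # offset from the true value (relates to accuracy)
    spread = np.std(x)              # agreement between repeated measurements (relates to precision)
    print(f"{name}: bias = {bias:+.2f}, spread = {spread:.2f}")
```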
Mean Absolute Error (MAE):
$MAE = \frac{\sum_{i=1}^N \lvert{x_i-t_i}\rvert}{N}$
$x_i$ = the ith sample value
$t_i$ = the ith true value
$N$ = the total number of samples
Mean Squared Error (MSE):
$MSE = \frac{\sum_{i=1}^N \left({x_i-t_i}\right)^2}{N}$
$x_i$ = the ith sample value
$t_i$ = the ith true value
$N$ = the total number of samples
Root Mean Squared Error (RMSE):
$RMSE = \sqrt{\frac{\sum_{i=1}^N \left({x_i-t_i}\right)^2}{N}}$
$x_i$ = the ith sample value
$t_i$ = the ith true value
$N$ = the total number of samples
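A minimal sketch of all three metrics with NumPy; the sample and true values below are hypothetical, invented only for illustration.

```python
import numpy as np

x = np.array([101.2, 99.8, 100.5, 102.1, 98.9])   # sample values, x_i
t = np.array([100.0, 100.3, 99.9, 100.1, 100.0])  # true values, t_i

mae = np.mean(np.abs(x - t))   # Mean Absolute Error
mse = np.mean((x - t) ** 2)    # Mean Squared Error
rmse = np.sqrt(mse)            # Root Mean Squared Error (same units as the data)

print(f"MAE = {mae:.3f}, MSE = {mse:.3f}, RMSE = {rmse:.3f}")
```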
Standard Deviation ($\sigma$):
$\sigma=\sqrt{\frac{\sum_{i=1}^N \left({x_i-\overline{X}}\right)^2}{N}}$
$x_i$ = the ith sample value
$\overline{X}$ = the mean of all samples
$N$ = the total number of samples
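The same idea in code, assuming the population form of the formula above (dividing by $N$, i.e. `ddof=0` in NumPy) and the same hypothetical samples.

```python
import numpy as np

x = np.array([101.2, 99.8, 100.5, 102.1, 98.9])  # hypothetical samples

sigma = np.std(x, ddof=0)  # divide by N, matching the formula above
print(f"sigma = {sigma:.3f}")
```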
Confidence Intervals (CI):
$CI = \frac{\sigma}{\sqrt{N}} z$
$\sigma$ = the standard deviation
$N$ = the total number of samples
$z$ = a z-score
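A sketch of the formula for a 95% confidence level (z ≈ 1.96), reusing the hypothetical samples from above; the interval is reported as mean ± margin.

```python
import numpy as np

x = np.array([101.2, 99.8, 100.5, 102.1, 98.9])  # hypothetical samples

sigma = np.std(x, ddof=0)
N = len(x)
z = 1.96  # z-score for a 95% confidence level

margin = (sigma / np.sqrt(N)) * z
print(f"95% CI: {np.mean(x):.2f} ± {margin:.2f}")
```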
Inter Quartile Range (IQR): The range between the 25th and 75th percentiles of the samples; a measure of spread that is robust to outliers.
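A sketch using `np.percentile`; the samples are hypothetical, with one outlier added to show that the IQR barely moves while the standard deviation is inflated.

```python
import numpy as np

x = np.array([101.2, 99.8, 100.5, 102.1, 98.9, 115.0])  # hypothetical samples, the last one an outlier

q1, q3 = np.percentile(x, [25, 75])  # 25th and 75th percentiles
print(f"IQR = {q3 - q1:.3f} (std = {np.std(x):.3f} by comparison)")
```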
The terms are related, but the distinction is very important.
Vagueness: When something is not clearly stated or defined.
Ambiguity: When something can reasonably be interpreted in multiple ways.
Ambiguity and vagueness are difficult to quantify numerically, but they still must be addressed whenever possible.
The key with these issues:
Where does uncertainty come from and what can we do to minimize it?
Some sources of error are out of our control. The instruments we use to collect data can only be so precise.
The concentration of samples in space and time dictates the level of accuracy & precision you can attain.
Things we do have some control over:
Errors that arise when creating vector features:
Since geographic phenomena often don’t have clear, natural units, we are often forced to assign zones and labels in our work (e.g. Census data).
Much of the data we use to learn about society is collected in aggregate. We take average values for many individuals within a group or area (e.g. Census data).
Even with "perfect" data, GIS operations can add uncertainty:
Logical Fallacy: A flaw in our reasoning that undermines the logic of our argument.
Taking data collected/presented in aggregate for a group/region and applying it to an individual or specific place (e.g. assuming a particular household earns its county's average income).
Occurs when we take aggregated data and aggregate it again at a higher level.
The US Electoral College is an example of this in practice:
Modifiable, arbitrary boundaries can have a significant impact on descriptive statistics for areas.
Data collected at a finer level of detail is combined into larger areas of lower detail, and those areas can be manipulated.
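A toy sketch of both ideas at once, using invented precinct-level vote counts: candidate A wins the overall total, but grouping the same precincts into districts two different ways (winner-take-all, as in the Electoral College example) hands the district majority to different candidates.

```python
import numpy as np

# Hypothetical votes for candidates A and B in six precincts (the fine-grained data).
votes_a = np.array([90, 80, 40, 45, 42, 48])
votes_b = np.array([10, 20, 60, 55, 58, 52])

def district_winners(districts):
    """Aggregate precincts into districts and report each district's winner."""
    winners = []
    for precincts in districts:
        a, b = votes_a[precincts].sum(), votes_b[precincts].sum()
        winners.append("A" if a > b else "B")
    return winners

# Two equally arbitrary ways of grouping the six precincts into three districts.
zoning_1 = [[0, 1], [2, 3], [4, 5]]
zoning_2 = [[0, 2], [1, 4], [3, 5]]

print("Overall totals: A =", votes_a.sum(), " B =", votes_b.sum())  # A wins overall (345 vs 255)
print("Zoning 1 winners:", district_winners(zoning_1))  # B carries 2 of 3 districts
print("Zoning 2 winners:", district_winners(zoning_2))  # A carries 2 of 3 districts
```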
Errors are cumulative:
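One way to see this, assuming each processing step contributes an independent positional error: a common rule of thumb is to combine independent errors in quadrature, so the total grows with every step. The per-step errors below are hypothetical.

```python
import numpy as np

step_rmse = [0.5, 1.2, 0.8, 2.0]  # hypothetical RMSE (metres) introduced by each operation

cumulative = 0.0
for i, e in enumerate(step_rmse, start=1):
    cumulative = np.sqrt(cumulative**2 + e**2)  # independent errors combine in quadrature
    print(f"After step {i}: cumulative RMSE ≈ {cumulative:.2f} m")
```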