- Dr Bryan Hall - pre 2010

Structure of Information & Constraints to Analysis

This information abstracted from: Hall, Bryan
Environmental Mapping Systems - Locationally Linked Databases
University of Western Sydney Masters Thesis - Submitted 1994

Purpose of Inductive Statistics - Evidence From Multiple Observations

Analytical experimental work is frequently carried out under conditions that are well controlled and consequently observations can be made accurately. However, in many situations the material is inherently variable and consequently the results have to be treated statistically. The theory of Mathematical Statistics is concerned with obtaining all and only the conclusions for which multiple observations are evidence. Mathematical statistics is not merely the handling of facts stated in numerical terms (Kaplan; 1961).

The purpose of inductive statistics is to provide methods for making statistical inferences about a population based on a collection of sampled individuals. Because these inferences are made on a probability basis, they are probabilistic rather than certain.
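A minimal sketch of this idea, using an illustrative synthetic population (the population size, sample size, and distribution are assumptions for demonstration, not values from the thesis):

```python
import random

random.seed(42)

# Hypothetical population of 100,000 measurements of some characteristic.
population = [random.gauss(50.0, 10.0) for _ in range(100_000)]
population_mean = sum(population) / len(population)

# A simple random sample supports a probabilistic inference about the
# population mean: the estimate is close with high probability, not exact.
sample = random.sample(population, 1_000)
estimate = sum(sample) / len(sample)
```

The estimate differs from the population mean by sampling error, which is exactly why the resulting inference must be stated probabilistically.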

The testing of an appropriate hypothesis relating to measurable characteristics is central to statistical decision making; consequently, when applying statistical methods it is essential to define the problem to be solved carefully and precisely. Inductive statistics is based on the mathematics of probability. Before any inductive statistical analysis can be conducted on a population or other data structure it is necessary to establish exactly how a random experiment could be conducted on the data structure and the outcome that would be expected from a long series of trials of the random experiment.

Changing a Casual Observation Into Useful Information

To change a casual observation into useful information or data requires the detailed reporting of at least the following attributes of the observation:

Any observation that does not record explicitly or implicitly all of the above attributes is inadequate in its information content. Data for which no record of precision or reliability exists should be suspect. Once such data are entered into a computer it is assumed that they are accurate to the stated level of precision (Sinton; 1978).

At least three important types of data generalisation commonly take place:

These procedures of abstraction and generalisation significantly affect the utility of data for analytic purposes. Certain types of detail present in the original data may be lost. It is important to establish the extent and characteristics of the detail lost in the process of generalisation as this affects the nature of the thematic content of the information (Sinton; 1978).

Randomness, Independence And Experiments

An event occurs "at random" when the circumstances required to bring it about lie "in the hands of chance". In such cases no explicit cause can be identified, and consequently it is not possible to attribute the event directly to a cause or collection of contributory processes.

Events are said to be independent when the occurrence of any one of them, or of any combination of them, has no bearing on the occurrence of any of the other events or any combination of the other events. It is instructive to define an experiment as a process having the following properties:

A single performance of an experiment is referred to as a trial and the result as the outcome of the trial. It is a matter of common observation that most random experiments exhibit statistical regularity. That is to say, the relative frequency of an event E selected from a finite sample space in a long sequence of trials approaches some constant value. Because of this convergence a number P(E) is postulated and defined as the probability of the event E from the random experiment. The statement E has probability P(E) should be understood to mean that over a long series of trials it is to be expected that the relative frequency of E will converge to P(E) (Kreyszig; 1988).
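The convergence of relative frequency can be illustrated with a simulated random experiment (the die-rolling experiment and trial count are illustrative assumptions):

```python
import random

random.seed(7)

# Random experiment: roll a fair six-sided die.
# Event E: the outcome is a six, so P(E) is postulated as 1/6.
def relative_frequency(trials: int) -> float:
    """Relative frequency of E over a sequence of independent trials."""
    hits = sum(1 for _ in range(trials) if random.randint(1, 6) == 6)
    return hits / trials

# Statistical regularity: over a long sequence of trials the relative
# frequency of E settles near the postulated probability P(E) = 1/6.
freq = relative_frequency(100_000)
```

Over short runs the relative frequency fluctuates widely; only the long-run behaviour justifies assigning the number P(E).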

Scales Of Measurement - Stating Facts In Numerical Terms

To formulate information in a scientific manner it is essential to have basic facts stated in numerical terms. However, it is not necessary to enumerate each unit in the universe in order to arrive at an acceptable estimate for the total; a carefully designed sample may provide the necessary information (Raj; 1968). Note, though, that it is possible to enumerate only those characteristics of entities for which there exists a correspondence between the characteristic and the concept of Cardinality.

Scales of measurement based on elements of the Real Number Line are possible only because there exists an isomorphism between what can be done with measurable properties of objects and what can be done with numbers (Stevens; 1946). When measuring characteristics of objects, experimental operations are performed for classifying (determining equality), for rank-ordering, and for determining when differences and when ratios between the aspects of objects are equal. The empirical operations performed and the characteristics of the property being measured determine the type of measuring scale attained (Stevens; 1946).

Table 2.1 describes the group structure and permissible statistics for each of the four scales of measurement that can be erected when using a representation based on elements of the Real Number Line. The mathematical group structure of a scale is determined by the collection of algebraic functions which leave the scale form invariant. For a statistic to have any meaning on a particular scale, the statistic must be invariant under all the transformations permissible for that scale's listed mathematical group structure.
Scale: Nominal
  Basic empirical operation: determination of equality
  Mathematical group structure: permutation group, x' = f(x), where f(x) is any one-to-one substitution
  Permissible statistics (invariantive):
  • Number of cases
  • Mode
  • Contingency correlation

Scale: Ordinal
  Basic empirical operation: determination of greater or less
  Mathematical group structure: isotonic group, x' = f(x), where f(x) is any monotonic increasing function
  Permissible statistics (invariantive):
  • Median
  • Percentiles

Scale: Interval
  Basic empirical operation: determination of equality of intervals or differences
  Mathematical group structure: general linear group, x' = ax + b
  Permissible statistics (invariantive):
  • Mean
  • Standard deviation
  • Rank-order correlation
  • Product-moment correlation

Scale: Ratio
  Basic empirical operation: determination of equality of ratios
  Mathematical group structure: similarity group, x' = ax
  Permissible statistics (invariantive):
  • Coefficient of variation

Table 2.1: The scales of measurement (Birkhoff in Stevens; 1946).

The permutation group includes the isotonic, general linear, and similarity groups as subgroups. Therefore, any statistic applicable when using the nominal scale is automatically applicable when using an ordinal, interval or ratio scale. Similarly, any statistic applicable when using an ordinal scale can be applied when using the interval or ratio scale and any statistic applicable when using the interval scale is applicable when using the ratio scale.
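This invariance requirement can be sketched with a small example (the data values are illustrative assumptions): a permissible interval-scale transformation x' = ax + b leaves an ordinal statistic such as the median meaningful, but changes the coefficient of variation, which is why the latter is permissible only on a ratio scale.

```python
import statistics

# Illustrative interval-scale data (e.g. temperatures in degrees Celsius).
data = [12.0, 15.0, 15.0, 20.0, 38.0]

# A permissible interval-scale transformation: x' = a*x + b with a > 0
# (here, Celsius to Fahrenheit).
transformed = [1.8 * x + 32.0 for x in data]

# The median survives in the required sense: the median of the transformed
# data equals the transform of the original median.
median_ok = statistics.median(transformed) == 1.8 * statistics.median(data) + 32.0

# The coefficient of variation does not survive: the shift b changes it,
# so it is invariant only under x' = ax (the similarity group).
cv = statistics.stdev(data) / statistics.mean(data)
cv_transformed = statistics.stdev(transformed) / statistics.mean(transformed)
```

The mean and standard deviation individually transform consistently under x' = ax + b, but their ratio does not, matching the table's placement of the coefficient of variation under the ratio scale only.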

In order to take ratios of measurable characteristics of objects in any self-consistent manner it is essential that the ratio scale be used. This scale requires an absolute zero. Measurement of temperature in degrees Celsius serves as a specific example. Given a temperature x °C it is not meaningful to regard 2x °C as "twice" that temperature: if x is 10 °C then 2x corresponds to 20 °C, but if x is -10 °C then 2x corresponds to a temperature of -20 °C! The problem arises because measurements of temperature in degrees Celsius conform only to the interval scale: the zero point is arbitrary.
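A short worked sketch of this example: converting to kelvin, a ratio scale for temperature with a true zero, shows that the apparent Celsius ratio of 2 disappears.

```python
# Ratios are meaningful only on a scale with an absolute zero.
# Kelvin is such a scale for temperature; Celsius is not.
def celsius_to_kelvin(c: float) -> float:
    return c + 273.15

naive_ratio = 20.0 / 10.0  # 2.0 on the Celsius scale, but meaningless
true_ratio = celsius_to_kelvin(20.0) / celsius_to_kelvin(10.0)
# true_ratio is about 1.035: 20 °C is nowhere near "twice" 10 °C.
```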

Mean Of A Population Does Not Approach That Of A Sample

Attempts to broaden the use of descriptive statistics from the sample on which they were based to cover the entire sample space are subject to sampling error. In addition, sample statistics cannot be used to draw conclusions about the part from the whole. Inductive statistics can be used only to ascertain properties of the whole from those of a sample.

The application of Bayes' rule to limited survey information, which may not be representative, to infer the distribution of an entire population requires the assumption that the mean of the population tends to (or is distributed about) that of the sample. One of the most common errors when using Bayesian statistics is exactly this assumption: that the mean of a population approaches that of a sample (Goode; 1962). Such an assumption is false and is the converse of the true situation: subject to certain conditions, the mean of a collection of sampled properties tends to that of the population.
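The correct direction of convergence can be sketched as follows (the population and sample sizes are illustrative assumptions): the population mean stays fixed, while the means of progressively larger samples approach it.

```python
import random

random.seed(1)

# Fixed population with a fixed (in practice unknown) mean.
population = [random.uniform(0.0, 100.0) for _ in range(50_000)]
mu = sum(population) / len(population)

# The population mean does not move toward any sample; rather, the means
# of larger and larger samples tend toward the fixed population mean.
sample_means = {n: sum(random.sample(population, n)) / n
                for n in (10, 1_000, 20_000)}
```

A small sample's mean may sit far from mu, yet mu itself never moves; only the estimates do.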

Supplement Glossary

The meaning of terms defined by authors varies; an explanation of some common terms may therefore serve to prevent confusion. The Collins Dictionary of Mathematics presents generally accepted standard definitions.

Miscellaneous Application Specific Notes

Exact Indexing

To prevent multiple indexing of identical electronic information, instructions to indexing robots can be inserted dynamically. For example, inserting the following instructions in every copy other than an archive copy will ensure that each document is indexed only once.
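The instructions themselves are not reproduced in this copy. Assuming the documents are HTML, one conventional way to express such an instruction is the standard robots meta element (a hedged illustration, not necessarily the exact directive originally used):

```html
<!-- Hypothetical example: placed in every non-archive copy, this tells
     indexing robots to skip the copy, so only the archive copy is indexed. -->
<meta name="robots" content="noindex, nofollow">
```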


ABS Standardised Statistical Classifications

Australian Bureau of Statistics

Australian Spatial Data Standards

National Map Accuracy Standards

Document Type Definition (DTD) for geospatial metadata in Australia

Note: validating XML parsers are required to check that data sets are structurally valid at run time. In short, if the structure of the information is not valid, the parser MUST report a fatal error and reject the document.

Controlled Vocabularies and Thesaurus

BOUNDARIES Administrative
BOUNDARIES Biophysical


