precisioninfo.com - Dr Bryan Hall - pre 2010

# Structure of Information & Constraints to Analysis

This information abstracted from: Hall, Bryan
Environmental Mapping Systems - Locationally Linked Databases
University of Western Sydney Masters Thesis - Submitted 1994

## Purpose of Inductive Statistics - Evidence From Multiple Observations

Analytical experimental work is frequently carried out under conditions that are well controlled and consequently observations can be made accurately. However, in many situations the material is inherently variable and consequently the results have to be treated statistically. The theory of Mathematical Statistics is concerned with obtaining all and only the conclusions for which multiple observations are evidence. Mathematical statistics is not merely the handling of facts stated in numerical terms (Kaplan; 1961).

The purpose of inductive statistics is to provide methods for making statistical inferences about a population based on a collection of sampled individuals. Because inductive statistical inferences are determined on a probability basis the inferences are probabilistic.

The testing of an appropriate hypothesis relating to measurable characteristics is central to statistical decision making and consequently when applying statistical methods, it is essential to carefully and precisely define the problem to be solved. Inductive statistics is based on the mathematics of probability. Before any inductive statistical analysis can be conducted on a population or other datastructure it is necessary to establish exactly how a random experiment could be conducted on the datastructure and the outcome that would be expected from a long series of trials of the random experiment.

## Changing a Casual Observation Into Useful Information

To change a casual observation into useful information or data requires the detailed reporting of at least the following attributes of the observation:

• Theme - The phenomenon or object being observed or measured must be recorded in some measurable or defined units;
• Location - The simple observation of a phenomenon without a record of the location of the phenomenon rarely generates useful information; and
• Time - A record of observation for which there is no concurrent record of time has minimal information content (Sinton; 1978).

Any observation that does not record explicitly or implicitly all of the above attributes is inadequate in its information content. Data for which no record of precision or reliability exists should be suspect. Once such data are entered into a computer it is assumed that they are accurate to the stated level of precision (Sinton; 1978).
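As a minimal sketch of the three attributes listed above, an observation can be modelled as a record whose theme, location and time are checked before it is treated as data. The class name, field names and sample values are illustrative assumptions and do not come from the thesis.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional, Tuple


@dataclass
class Observation:
    """One observation; all field names here are hypothetical."""
    theme: Optional[str] = None                     # what was measured, with units
    value: Optional[float] = None
    location: Optional[Tuple[float, float]] = None  # e.g. (latitude, longitude)
    time: Optional[datetime] = None
    precision: Optional[float] = None               # stated precision of value


def missing_attributes(obs: Observation) -> List[str]:
    """Return the names of the required attributes that were not recorded."""
    required = ("theme", "location", "time")
    return [name for name in required if getattr(obs, name) is None]


casual = Observation(theme="air temperature (deg C)", value=21.5)
complete = Observation(
    theme="air temperature (deg C)",
    value=21.5,
    location=(-33.6, 150.75),
    time=datetime(1994, 3, 1, 9, 0),
    precision=0.5,
)

print(missing_attributes(casual))    # ['location', 'time']
print(missing_attributes(complete))  # []
```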

At least three important types of data generalisation commonly take place:

• Aggregation - This usually involves the definition of spatial location and the counting of thematic characteristics for that location. Generated data is interval in nature and may be manipulable by mathematical or statistical techniques;
• Classification - In this case observations are matched and categorised with other observations exhibiting like characteristics; and
• Induction - This is the process by which a series of sample measurements are generalised to include a much larger group of phenomena or locations assumed to have the same characteristics as the sample measurements (Sinton; 1978).

These procedures of abstraction and generalisation significantly affect the utility of data for analytic purposes. Certain types of detail present in the original data may be lost. It is important to establish the extent and characteristics of the detail lost in the process of generalisation as this affects the nature of the thematic content of the information (Sinton; 1978).
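The loss of detail under aggregation can be illustrated with a small sketch. The observations, cell names and species below are invented for illustration only. Counting thematic occurrences per location yields numbers that can be manipulated arithmetically, but the time attribute of each original observation is no longer recoverable from the counts.

```python
from collections import Counter

# Hypothetical raw observations: (theme, location cell, hour of day).
observations = [
    ("kangaroo", "cell_A", 6),
    ("kangaroo", "cell_A", 18),
    ("wombat", "cell_A", 22),
    ("kangaroo", "cell_B", 7),
]

# Aggregation: count thematic occurrences for each location.
# The hour of each sighting is deliberately dropped here.
counts = Counter((location, theme) for theme, location, _hour in observations)

print(counts[("cell_A", "kangaroo")])  # 2
print(counts[("cell_B", "kangaroo")])  # 1
```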

## Randomness, Independence And Experiments

An event occurs "at random" in the case where the circumstances required to bring about the event exist "in the hands of chance". In such cases no explicit cause exists and consequently it is not possible to directly attribute a cause or collection of contributory processes.

Events are said to be independent in the case where the occurrence of any one of the events, or occurrence of any combination of the events, has no bearing on the occurrence of any of the other events, or any combination of any of the other events. It is instructive to define an experiment as a process having the following properties:

• It is performed according to a set of rules that determines the performance completely;
• It can be repeated arbitrarily often; and
• The results of each performance depend on "chance"; that is, on influences that cannot be controlled and whose effect therefore cannot be uniquely predicted (Kreyszig; 1988).

A single performance of an experiment is referred to as a trial and the result as the outcome of the trial. It is a matter of common observation that most random experiments exhibit statistical regularity. That is to say, the relative frequency of an event E selected from a finite sample space in a long sequence of trials approaches some constant value. Because of this convergence a number P(E) is postulated and defined as the probability of the event E from the random experiment. The statement E has probability P(E) should be understood to mean that over a long series of trials it is to be expected that the relative frequency of E will converge to P(E) (Kreyszig; 1988).
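Statistical regularity can be observed with a simple simulation. The sketch below assumes a fair six-sided die and takes E to be the event "the die shows a six", so that P(E) = 1/6. The function name and the seed are illustrative choices.

```python
import random


def relative_frequency(trials: int, seed: int = 0) -> float:
    """Relative frequency of E = 'die shows a six' over `trials` rolls."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(trials) if rng.randint(1, 6) == 6)
    return hits / trials


# Longer sequences of trials should give values closer to 1/6.
for n in (10, 1_000, 100_000):
    print(n, relative_frequency(n))
```

The exact figures depend on the random number generator, but for long sequences of trials the printed relative frequency is expected to settle near 1/6.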

## Scales Of Measurement - Stating Facts In Numerical Terms

To formulate information in a scientific manner it is essential to have basic facts stated in numerical terms. However, it is not necessary to enumerate each unit in the universe in order to arrive at an acceptable estimate for the total. A carefully designed sample may provide the necessary information (Raj; 1968). However, it is possible to enumerate only those characteristics of entities for which there exists a correspondence between the characteristic and the concept of Cardinality.

Scales of measurement based on elements of the Real Number Line are possible only because there exists an isomorphism between what can be done with measurable properties of objects and what can be done with numbers (Stevens; 1946). When measuring characteristics of objects, experimental operations are performed for classifying (determining equality), for rank-ordering, and for determining when differences and when ratios between the aspects of objects are equal. The empirical operations performed and the characteristics of the property being measured determine the type of measuring scale attained (Stevens; 1946).

Table 2.1 describes the group structure and permissible statistics for each of 4 scales of measurement that can be erected when using a representation based on elements of the Real Number Line. The mathematical group structure of a scale is determined by the collection of algebraic functions which leave the scale form invariant. For a statistic to have any meaning when applying any particular scale, the statistic must be invariant under all the transformations permitted by that scale's mathematical group structure.

| Scale | Basic Empirical Operations | Mathematical Group Structure | Permissible Statistics (invariantive) |
|---|---|---|---|
| Nominal | Determination of equality | Permutation group: x' = f(x), where f(x) means any one-to-one substitution | Number of cases; Mode; Contingency correlation |
| Ordinal | Determination of greater or less | Isotonic group: x' = f(x), where f(x) means any monotonic increasing function | Median; Percentiles |
| Interval | Determination of equality of intervals or differences | General linear group: x' = ax + b | Mean; Standard deviation; Rank-order correlation; Product-moment correlation |
| Ratio | Determination of equality of ratios | Similarity group: x' = ax | Coefficient of variation |

Table 2.1: The scales of measurement (Birkhoff in Stevens; 1946).

The permutation group includes the isotonic, general linear, and similarity groups as subgroups. Therefore, any statistic applicable when using the nominal scale is automatically applicable when using an ordinal, interval or ratio scale. Similarly, any statistic applicable when using an ordinal scale can be applied when using the interval or ratio scale and any statistic applicable when using the interval scale is applicable when using the ratio scale.
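The invariance requirement can be checked numerically. The sketch below uses invented data and three example transformations (a cube, a linear map and a pure rescaling), all of which are assumptions chosen for illustration. A statistic is treated here as surviving a transformation when computing it on the transformed data gives the transformed statistic, or the same value in the case of the dimensionless coefficient of variation.

```python
import math
import statistics

data = [1.0, 2.0, 4.0, 8.0, 16.0]


def cube(x: float) -> float:
    """A monotonic increasing function (permitted on an ordinal scale)."""
    return x ** 3


def linear(x: float) -> float:
    """A map of the form ax + b (permitted on an interval scale)."""
    return 1.8 * x + 32.0


def scale(x: float) -> float:
    """A map of the form ax (permitted on a ratio scale)."""
    return 1.8 * x


def coefficient_of_variation(xs) -> float:
    return statistics.pstdev(xs) / statistics.mean(xs)


median_survives_monotonic = (
    statistics.median([cube(x) for x in data]) == cube(statistics.median(data))
)
mean_survives_monotonic = math.isclose(
    statistics.mean([cube(x) for x in data]), cube(statistics.mean(data))
)
mean_survives_linear = math.isclose(
    statistics.mean([linear(x) for x in data]), linear(statistics.mean(data))
)
cv_survives_scaling = math.isclose(
    coefficient_of_variation([scale(x) for x in data]),
    coefficient_of_variation(data),
)
cv_survives_linear = math.isclose(
    coefficient_of_variation([linear(x) for x in data]),
    coefficient_of_variation(data),
)

print(median_survives_monotonic)  # True
print(mean_survives_monotonic)    # False
print(mean_survives_linear)       # True
print(cv_survives_scaling)        # True
print(cv_survives_linear)         # False
```

The median survives any monotonic increasing transformation, the mean survives only linear ones, and the coefficient of variation survives only pure rescaling, which matches the placement of each statistic in Table 2.1.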

In order to take ratios of measurable characteristics of objects in any self-consistent manner it is essential that the ratio scale be used. This scale requires an absolute zero. Measurement of temperature in degrees Celsius serves as a specific example. Given a temperature x °C it is not reasonable to consider a temperature 2*x °C; for if x be 10 °C then 2*x corresponds to 20 °C, but if x be -10 °C then 2*x °C corresponds to a temperature of -20 °C! The problem arises since measurements of temperature in degrees Celsius conform to only the interval scale.
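A complementary way to see the same point is to compare ratios across units. In the sketch below the conversion functions are the standard ones, while the two sample temperatures are arbitrary. The ratio of two temperatures changes when an interval scale is re-expressed in other units, but is preserved between two scales that share an absolute zero (kelvin and degrees Rankine).

```python
import math


def celsius_to_fahrenheit(c: float) -> float:
    return 1.8 * c + 32.0


def celsius_to_kelvin(c: float) -> float:
    return c + 273.15


def kelvin_to_rankine(k: float) -> float:
    return 1.8 * k


warm_c, cool_c = 20.0, 10.0

# Interval scales: the "ratio" depends on the arbitrary choice of zero.
ratio_celsius = warm_c / cool_c
ratio_fahrenheit = celsius_to_fahrenheit(warm_c) / celsius_to_fahrenheit(cool_c)

# Ratio scales: both share an absolute zero, so the ratio is preserved.
warm_k, cool_k = celsius_to_kelvin(warm_c), celsius_to_kelvin(cool_c)
ratio_kelvin = warm_k / cool_k
ratio_rankine = kelvin_to_rankine(warm_k) / kelvin_to_rankine(cool_k)

print(ratio_celsius, ratio_fahrenheit)  # the two values differ
print(ratio_kelvin, ratio_rankine)      # the two values agree
```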

## Mean Of A Population Does Not Approach That Of A Sample

Attempts to broaden the use of descriptive statistics from the sample on which they were based to cover the entire sample space are subject to sampling error. In addition, sample statistics can not be used to draw conclusions about the part from the whole. Inductive statistics can be used only to ascertain properties of the whole from that of a sample.

The application of Bayes' rule to limited survey information, which may not be representative, in order to infer the distribution of an entire population requires the assumption that the mean of the population tends to (or is distributed about) that of the sample. One of the most common errors when using Bayesian Statistics is this assumption: that the mean of a population approaches that of a sample (Goode; 1962). Such an assumption is false and is the converse of the true situation: subject to certain conditions the mean of a collection of sampled properties tends to that of the population.
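The true direction of convergence can be sketched with a simulation. The population below is synthetic (normally distributed values with an assumed mean of 50 and standard deviation of 10) and the sampling is simple random sampling without replacement; both are illustrative assumptions. The population mean is a fixed quantity, and it is the sample mean that moves toward it as the sample grows.

```python
import random
import statistics

# Hypothetical finite population; its mean is fixed once generated.
_population_rng = random.Random(42)
population = [_population_rng.gauss(50.0, 10.0) for _ in range(100_000)]
population_mean = statistics.fmean(population)


def sample_mean_error(sample_size: int, seed: int = 1) -> float:
    """Absolute gap between one random sample's mean and the population mean."""
    sample_rng = random.Random(seed)
    sample = sample_rng.sample(population, sample_size)
    return abs(statistics.fmean(sample) - population_mean)


for n in (10, 1_000, 50_000):
    print(n, sample_mean_error(n))
```

Individual runs fluctuate, so the gap need not shrink at every step, but large random samples are expected to give means close to the population mean. A small or unrepresentative sample gives no such assurance, which is why the population cannot be assumed to resemble it.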

## Supplement Glossary

The meaning of terms defined by authors varies, therefore explanation of some common terms may serve to prevent confusion. The Collins Dictionary of Mathematics presents generally accepted standard definitions.

• Bijection: A bijection is an operation that associates two sets in such a way that each member of the codomain T is paired with exactly one member of the domain D, and each member of D with exactly one member of T. A bijection is both injective and surjective and has an inverse.
• Cardinality and Equivalent Sets: Two sets are equivalent if an isomorphism is possible between their respective members. The cardinal number of a set is the number of elements in the set. For example, the Cardinality of the set {dog, cat, mouse, house} is 4. The cardinal number of a set is a characteristic that one set has in common with all equivalent sets.
• Domain: A domain is the set of input values for a relation or function.
• Function: A function f: X → Y is a relation from X to Y such that:
1. For each x ∈ X, there exists some y ∈ Y with (x, y) ∈ f.
2. For each x ∈ X, the above y is unique.
The element y ∈ Y is the image of x under f, and is ordinarily written f(x). With such notation, a function f is the subset of X × Y consisting of all pairs (x, f(x)). If 2 does not hold then f is only a relation (Rotman; 1965).
• Group: A group is a set on which a binary operation, written here as a multiplication (*), is defined. A non-empty set X on which such an operation is defined is called a group if, and only if, the following 4 rules are satisfied (Schmidt; 1966):
1. The product of any two elements of X itself belongs to X.
2. The symbolic multiplication is associative; that is to say, for any three elements A, B, C belonging to X, (A*B)*C = A*(B*C).
3. There exists an element I such that A*I = A for all members A of the set X.
4. There is an element I satisfying requirement 3 such that for each A in X there exists a B in X with A*B = I.
• Injective Mapping: An injective mapping is an operation that associates two sets in such a way that different members of the domain D are paired with different members of the codomain T. For an injective mapping, there is no requirement that all members of the codomain T be paired with members of the domain D. That is to say, the image of the mapping operation may be a proper subset of the codomain.
• Isomorphism: An isomorphism is a one-to-one correspondence between the elements of two or more sets that preserves the structural properties of the domain.
• Mapping: A mapping is defined from a domain to a codomain, whenever a rule is established by which an entity e, a member of the domain of the map, is associated with an image e', a member of the codomain of the map.
• Magnitude: Magnitude is a scalar quantity and consists of a number assigned to a quantity enabling comparisons to be made on the basis of the ratio of any two such quantities.
• Surjective Mapping: A surjective mapping is an operation that associates two sets in such a way that every member of the codomain T is the image of at least one member of the domain D. Hence, the range of any surjective mapping is the entire codomain T.
• Well-Posed Problem: A problem that is well formulated and for which, under appropriate conditions, a unique solution can be shown to exist.
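For finite sets, several of the glossary definitions can be checked by brute force. The following is a minimal sketch: a mapping is represented as a Python dict from domain members to codomain members, and the group check tests the four rules given under Group directly. All function names and the example sets are illustrative assumptions.

```python
from itertools import product


def is_injective(mapping: dict) -> bool:
    """Different domain members must have different images."""
    images = list(mapping.values())
    return len(set(images)) == len(images)


def is_surjective(mapping: dict, codomain: set) -> bool:
    """Every codomain member must be the image of some domain member."""
    return set(mapping.values()) == codomain


def is_bijective(mapping: dict, codomain: set) -> bool:
    return is_injective(mapping) and is_surjective(mapping, codomain)


def is_group(elements: set, op) -> bool:
    """Check closure, associativity, identity and inverses on a finite set."""
    if not elements:
        return False
    if not all(op(a, b) in elements for a, b in product(elements, repeat=2)):
        return False
    if not all(
        op(op(a, b), c) == op(a, op(b, c))
        for a, b, c in product(elements, repeat=3)
    ):
        return False
    identities = [e for e in elements if all(op(a, e) == a for a in elements)]
    return any(
        all(any(op(a, b) == e for b in elements) for a in elements)
        for e in identities
    )


codomain = {"a", "b", "c", "d"}
partial = {1: "a", 2: "b", 3: "c"}         # injective, not surjective
full = {1: "a", 2: "b", 3: "c", 4: "d"}    # bijective

print(is_injective(partial), is_surjective(partial, codomain))  # True False
print(is_bijective(full, codomain))                             # True

residues = {0, 1, 2, 3}
print(is_group(residues, lambda a, b: (a + b) % 4))  # True
print(is_group(residues, lambda a, b: (a * b) % 4))  # False
```

The last case fails rule 4: under multiplication modulo 4 the element 0 has no inverse.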

## Miscellaneous Application Specific Notes

• Exact Indexing
• ABS Standardised Statistical Classifications
• Australian Spatial Data Standards

### Exact Indexing

To prevent multiple indexing of identical electronic information you can dynamically insert instructions to indexing robots. For example, inserting the following instructions in every copy other than the archive copy will ensure that each document is uniquely indexed.

<META NAME="ROBOTS" CONTENT="NOARCHIVE">
<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
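One way the dynamic insertion might be arranged is sketched below. The function name and the `is_archive_copy` flag are hypothetical; how a site decides which copy is the archive copy is not specified in the original note.

```python
from typing import List


def robots_meta_tags(is_archive_copy: bool) -> List[str]:
    """Hypothetical helper: emit robots directives for non-archive copies only."""
    if is_archive_copy:
        # The archive copy carries no restrictions, so it is the one indexed.
        return []
    return [
        '<META NAME="ROBOTS" CONTENT="NOARCHIVE">',
        '<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">',
    ]


print("\n".join(robots_meta_tags(is_archive_copy=False)))
```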

### ABS Standardised Statistical Classifications

• Australian Standard Research Classification
• Australian Standard Geographical Classification
• Australian Standard Classification of Drugs of Concern
• National Health Surveys

### Australian Spatial Data Standards

Note: XML parsers are required to check that data sets are structurally valid at run time. In short, if the structure of the information is not valid then the parser MUST reject it rather than attempt to continue.
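The behaviour described in the note can be observed with Python's standard library parser. This is a sketch only: the element names are invented, and `xml.etree.ElementTree` checks well-formedness (correctly nested and closed tags), not validity against a DTD or schema.

```python
import xml.etree.ElementTree as ET


def parse_or_reject(xml_text: str):
    """Return the parsed root element, or None if the parser rejects the input."""
    try:
        return ET.fromstring(xml_text)
    except ET.ParseError as error:
        print("rejected:", error)
        return None


# Hypothetical data set fragments; the second has an unclosed <theme> tag.
well_formed = "<dataset><theme>BOUNDARIES</theme></dataset>"
malformed = "<dataset><theme>BOUNDARIES</dataset>"

print(parse_or_reject(well_formed) is not None)  # True
print(parse_or_reject(malformed) is None)        # True
```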

BOUNDARIES
BOUNDARIES Biophysical
BOUNDARIES Cultural

HUMAN ENVIRONMENT
HUMAN ENVIRONMENT Economics
HUMAN ENVIRONMENT Housing .....

## References

• Collins Dictionary of Mathematics. EJ Borowski & JM Borwein. Published by Harper Collins; London; 1989.
• Goode HH. Deferred Decision Theory. In Recent Developments in Information and Decision Processing. Edited by RE Machol & P Gray. Published by Macmillan; New York; 1962.
• Kaplan A. Sociology Learns the Language of Mathematics. In The World of Mathematics. Edited by JR Newman. Published by Allen and Unwin; Britain; 1961.