SAMPLE and TEST: Two FORTRAN IV programs for analysis of discrete-state, time-varying data using first-order, Markov-chain techniques

Robert B. Arundale
1984 Behavior Research Methods, Instruments, & Computers
The author is affiliated with the Department of Speech and Drama, University of Alaska, Fairbanks, AK 99701.

Discrete-state Markov-chain analysis provides a means of analyzing change over time in any behavior that can be characterized as occupying one of two or more discrete states at any one point in time. Time-referenced sequences of discrete behavior states arise in observational and experimental studies of both individuals and social groups. A first-order, discrete-state Markov chain is a sequence of behavior states in which the probability of the organism or system being in a given state at time t₂ depends on the state the entity was in at the immediately preceding point in time, t₁. Time measurement may be either ordinal or interval.

The central descriptive device in Markov-chain analysis is an m × m matrix showing the probabilities of transitions from each of the m possible states at point t₁ to each of the m states at t₂. Normally, each individual or group under study will exhibit its own unique sequence of behavior states. The transition probability matrix extracts from this unique sequence the general pattern of transitions among the states. That pattern may be examined in its own right or may be compared with the patterns of other entities behaving under similar or different conditions. Anderson and Goodman (1957) provided goodness-of-fit and likelihood-ratio statistics both for assessing whether or not a given sequence of behavior states meets the assumptions underlying Markov-chain analysis (see also Hewes, 1980) and for comparing pairs or sets of transition probability matrices.

Valid application of first-order Markov-chain techniques requires that the behavior sequence be a first-order chain, as opposed to a zero-order chain (independence among the states) or a higher-order chain (dependence on the two or more preceding states). Markov-chain analysis generalizes to higher-order chains, but the number of time points required to obtain stable transition probabilities often becomes excessive. Valid application also requires that the behavior sequence represented in a single transition probability matrix be stationary. That is, the probabilities for each transition must not vary significantly over time within the sequence. Again, Markov analysis generalizes to nonstationary sequences, but the techniques for dealing with them are not well developed.

Program SAMPLE allows one to assess whether the order and stationarity assumptions have been met. If the assumptions are valid, program TEST allows one to test both for significant differences between pairs of transition probability matrices and for homogeneity among a group of matrices.

Program SAMPLE

Program description. Program SAMPLE computes both the transition frequency and transition probability matrices for a sequence of behavior states and performs either or both of two types of sampling from the sequence, as specified by the user. Arundale (1977) found that if the behavior of an entity is indexed with high frequency, the probabilities of transitions to the same state may be artificially inflated, resulting in a distorted view of the pattern of changes among the states. Appropriate sampling from the original sequence can correct the artificial inflation. If specified by the user, the program also (1) finds the point of convergence or "steady state" of the process (see Hewes, 1975, for information), (2) constructs a distribution of lengths of time that the sequence remains in each state and fits this distribution to an exponential curve (see Arundale, 1977, pp. 263-267, for information), and (3) calculates both goodness-of-fit and likelihood-ratio statistics for testing stationarity and/or the first-order Markov property.

Input. The program is designed for batch-mode operation. The user provides the total number of time points in the sequence, the number of distinct states in the sequence and their codes (any machine-readable code is acceptable), a value for convergence tests, a descriptive title for identification, and the format of the data, all on three initializing control cards or lines. These are followed by cards or lines containing the sequence itself.
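As a rough illustration of the matrices described for program SAMPLE, the following Python sketch tallies a transition frequency matrix from a coded sequence and normalizes its rows into a transition probability matrix. It is not the FORTRAN IV program itself. The function names, the toy sequence, and the every-k-th sampling rule are all assumptions made for this example, and that rule may not correspond to either of the two sampling types the program offers.

```python
def transition_matrices(sequence, states):
    """Return (frequency, probability) matrices as nested lists.

    Rows index the state at t1, columns the state at t2.
    Hypothetical helper for illustration only.
    """
    index = {state: i for i, state in enumerate(states)}
    m = len(states)
    freq = [[0] * m for _ in range(m)]
    # Count each adjacent pair (state at t1, state at t2).
    for prev, curr in zip(sequence, sequence[1:]):
        freq[index[prev]][index[curr]] += 1
    # Normalize each row; a row with no observations stays all zeros.
    prob = []
    for row in freq:
        total = sum(row)
        prob.append([count / total if total else 0.0 for count in row])
    return freq, prob


def sample_every(sequence, step, start=0):
    """Keep every `step`-th observation (an assumed sampling rule)."""
    return sequence[start::step]


# Toy sequence indexed at high frequency: long runs in the same state.
seq = list("AAAABBBBAAAABBBB")
states = ["A", "B"]

freq, prob = transition_matrices(seq, states)
sampled = sample_every(seq, 2)
sampled_freq, sampled_prob = transition_matrices(sampled, states)

print(freq)          # [[6, 2], [1, 6]]
print(prob[0])       # [0.75, 0.25]
print(sampled_freq)  # [[2, 2], [1, 2]]
print(sampled_prob[0])  # [0.5, 0.5]
```

In this toy case the A-to-A probability falls from 0.75 in the full sequence to 0.5 in the sampled one, which is the kind of reduction in same-state transitions that sampling is meant to produce.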
On subsequent control cards or lines (which are unlimited in number), the user specifies the desired processing options: full sequence or sampling (with sampling parameters), convergence testing and optional printing of matrices, distribution construction, punching of matrices for subsequent testing, order statistics, and/or stationarity statistics (and parameters).

Output. The output includes all descriptive information, a listing of the entire sequence, and the transition frequency and probability matrices. Subsequent output depends on the processing options selected. If sampling is performed, the sampled sequence is printed, together with its length and associated matrices. Convergence testing may show either all powers of the matrix or only the converged matrix, as specified. Distribution construction includes, for each state, a listing of time lengths and their frequencies, plus mean length, standard deviation, and R² for the fit to an exponential distribution. If matrices are punched, they are provided in the format required for input to program TEST, as described below. If order statistics are requested, the output ...
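The convergence testing mentioned above, which examines successive powers of the transition probability matrix, can be sketched in Python as follows. This is only one plausible reading. The stopping rule (largest change between successive powers falling below a tolerance), the tolerance value, the function names, and the example matrix are all assumptions for illustration and are not taken from the program.

```python
def mat_mul(a, b):
    """Multiply two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def converge(p, tol=1e-6, max_power=1000):
    """Raise p to successive powers until entries stop changing.

    Returns (power, matrix) for the first power whose largest entry-wise
    change from the previous power is below `tol` (assumed criterion).
    """
    n = len(p)
    current = p
    for power in range(2, max_power + 1):
        nxt = mat_mul(current, p)
        diff = max(abs(nxt[i][j] - current[i][j])
                   for i in range(n) for j in range(n))
        current = nxt
        if diff < tol:
            return power, current
    raise ValueError("no convergence within max_power powers")


# Hypothetical two-state transition probability matrix.
P = [[0.9, 0.1],
     [0.5, 0.5]]

power, steady = converge(P)
print(power)
for row in steady:
    print([round(x, 4) for x in row])
```

For this example matrix every row of the converged power approaches the same vector, roughly 5/6 and 1/6, which is the long-run share of time points spent in each state.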
doi:10.3758/bf03202418 fatcat:hnq3ncqfkbhl3dk6z3lyg6po44