3 Data formats

3.1 BIOSYS-1

BIOSYS-1 is a FORTRAN-77 program for the analysis of electrophoretically detectable allelic variation in population and biochemical systematics. The program computes allele frequencies and genetic variability measures, Hardy-Weinberg expectations, F-statistics, heterogeneity chi square analyses, and genetic distances/similarities.

Figure 1: File structure of the genetics "database"

Inputs to BIOSYS-1 are ASCII files consisting of two basic components:

  1. A brief header section, including a job title and locus labels;

  2. One or more calls to "STEP" routines that give BIOSYS-1 information regarding the form of the data and the analyses that the program is to perform.

Below is a sample BIOSYS-1 input file, including header and STEP data, for a Single-Individual Genotype input:

SINGLE INDIVIDUAL GENOTYPE INPUT (ALPHABETIC ALLELIC DESIGNATIONS)

NOTU=1, NLOC=15,NALL=5,CRT;

(12(1X,A5)/3(1X,A5))

LDH-1 LDH-2 MDH-1 MDH-2 IDH-1 IDH-2 GPD-1 PGM-1 PGI-1 PGI-2 SOD-1 LAP-1 EST-1 EST-2 PEP-1

STEP DATA:

DATYP=1,ALPHA;

(A4,7X,15(1X,A1,A1))

DS1

CHATHAM RISE




1


1







0001

DS1

AA

AA

AA

AA

AA

AA

AA

AA

AA

AA

AA

AA

AA

AA

AA

0002

DS1

AA

AA

AB

AA

AA

AA

AA

AA

AA

AA

AA

AA

AA

AC

AA

0013

DS1

AA

AA

AB

AA

AA

AA

AA

AB

AA

AA

AA

AA

AA

AB

AA

0014

DS1

AA

AA

AA

AA

AA

AA

AA

AC

AA

AA

AA

AA

AA

AA

AA

0015

DS1

AA

AA

AB

AA

AA

AA

AA

AA

AA

AA

AA

AA

AA

AB

AA

0016

DS1

AA

AA

AA

AA

AA

AA

AA

AA

AA

AA

AA

AA

AA

AB

AA

0036

DS1

AA

AA

AA

AA

AA

AA

AA

AB

AA

AA

AA

AA

AA

BB

AA

0037

DS1

AB

AA

BB

AA

AA

AA

AA

AA

AA

AA

AA

AA

AA

AD

AA

0038

DS1

AA

AA

AB

AA

AA

AA

AA

AB

AA

AA

AA

AA

AA

AB

AA

0039

DS1

AA

AA

AA

AA

AA

AA

AA

AA

AA

AA

AA

AA

AA

AA

AA

0040

DS1

AA

AA

AA

AA

AA

AA

AA

BB

AA

AA

AA

AA

AA

AC

AA

NEXT

















END;



















3.2 REAP

The Restriction Enzyme Analysis Package (REAP) is a suite of nine programs (written in TURBO PASCAL 5.0 and TURBO C 1.0) designed to alleviate some of the difficulties inherent in restriction data manipulation, as well as to carry out some common phylogenetic analyses of restriction fragment or restriction site DNA data.

The user creates an ASCII file of composite haplotypes and a corresponding file of restriction enzyme profiles; from this REAP will generate a binary matrix, remove uninformative characters or Operational Taxanomic Units (OTUs), and compute estimates of evolutionary distance ( SE d " ) for site or fragment data. In addition, there are programs to estimate haplotype and nucleotide divergence among populations, to assess geographic heterogeneity in haplotype frequency distributions through Monte Carlo simulation, and to estimate genetic distance from DNA sequence data.

Each of the nine programs can run independently, as part of a batch process, or as a module in the integrated environment. Most of the programs can handle an unlimited number of OTUs and a maximum of 30,000 characters per out.

3.3 GENEPOP

GENEPOP is a population genetic software package for haploid or diploid data that is able to perform two major tasks. First, it computes exact tests or their unbiased estimation for Hardy-Weinberg equilibrium, population differentiation, and two-locus genotypic disequilibrium. Second, it converts the input GENEPOP file to formats used by other popular programs like BIOSYS, thereby allowing communication between them (ecumenicism). GENEPOP is written in QUICK BASIC and TURBOPASCAL.

GENEPOP requires an ASCII input file. All kinds of missing data can be handled.

Only two numbers code each allele, so that no more than 99 alleles can be considered.

The number of populations or loci is not limiting for most options. After checking the input file, GENEPOP displays a general menu for a variety of analyses.

The main test carried out is the Hardy-Weinberg (HW) test. The HW test is performed for each locus in each population. If there are four alleles or less, the exact HW test is performed. If more than four alleles are present, an unbiased estimation of the exact HW probability is performed using the Markov chain method. In both cases, GENEPOP provides the probability of error when rejecting Ho (i.e., HW equilibrium) and, if the Markov chain method has been used, the standard error of the estimate.

Other classical parameters are also automatically computed: expected genotypic proportions, allele frequencies, observed and expected numbers of homozygotes and hetrozygotes, and so on.


Updated : 16 November 2007