# DistMiss

## Authors

- M. San Cristobal (contact author), INRA
- C. Chevalet, INRA
- G. Laval, UP Human Evolutionary Genetics, Institut Pasteur, Paris, France

## Description

The program DistMiss computes a matrix of genetic distances between populations on the basis of their allele frequencies at L loci.

This kind of calculation is available in several population genetics packages, in particular PHYLIP. **Our program allows some loci to be completely missing in some populations.**

A maximum of 80 populations, 150 loci and 50 alleles per locus are allowed (these limits can be changed in the source code).

## Input of DistMiss

- file containing allele frequencies in PHYLIP format (freqall)
- file containing the number of genes typed in each population x locus combination, with a minimal sample size to be accepted in the subsequent calculations (sample size 10)
- the number of bootstraps (0 if original data)
- an indicator of the required distance method (see Details below)
- file containing a (sub)list of populations for the current analysis (listpop)
- file containing indicators for markers chosen for the current analysis (listmark)

### Allele frequency file

The first line contains the number of populations and the number of loci (3 and 4 in the example below).

The second line contains the number of alleles per locus.

Then, for each population, the file gives the name of the population followed by its allele frequencies.

Each population name is a character string of length 10.

The sum of allele frequencies is checked for each population x locus combination. Rounding errors are allowed (e.g. sum = 1 +/- 0.001). If the sum is equal to 0, the locus is considered missing in the population, as for the last locus in the first population in the following example.

    3 4
    2 2 3 4
    Pop1       0.3000 0.7000 0.5000 0.5000 0.2000 0.2000 0.6000 0.0000 0.0000 0.0000 0.0000
    Pop2       0.4000 0.6000 0.1000 0.9000 0.3000 0.4000 0.3000 0.2500 0.2500 0.2500 0.2500
    Pop3       0.1000 0.9000 0.2000 0.8000 0.3333 0.3333 0.3333 0.2000 0.3000 0.2000 0.3000
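
The checks described above can be sketched in Python. This is only an illustration, not the DistMiss source: the function name `parse_freq_file`, the `tol` parameter, and the assumption that each population occupies one line with its name in the first 10 characters are all hypothetical. The first line is read as in the example (3 populations, 4 loci).

```python
def parse_freq_file(text, tol=1e-3):
    """Return (names, freqs, missing) parsed from an allele frequency file.

    freqs[i][j] is the list of allele frequencies of population i at locus j;
    missing[i][j] is True when those frequencies sum to 0.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    n_pop, n_loci = (int(x) for x in lines[0].split())
    n_alleles = [int(x) for x in lines[1].split()]
    if len(n_alleles) != n_loci:
        raise ValueError("second line must give one allele count per locus")

    names, freqs, missing = [], [], []
    for line in lines[2:2 + n_pop]:
        name = line[:10].strip()  # assumption: name fills the first 10 characters
        values = [float(x) for x in line[10:].split()]
        if len(values) != sum(n_alleles):
            raise ValueError(f"{name}: wrong number of frequencies")
        pop_freqs, pop_missing, start = [], [], 0
        for k in n_alleles:
            locus = values[start:start + k]
            start += k
            total = sum(locus)
            if total == 0:
                pop_missing.append(True)    # locus missing in this population
            elif abs(total - 1.0) <= tol:
                pop_missing.append(False)   # sum is 1 up to rounding error
            else:
                raise ValueError(f"{name}: frequencies sum to {total}")
            pop_freqs.append(locus)
        names.append(name)
        freqs.append(pop_freqs)
        missing.append(pop_missing)
    return names, freqs, missing


example = "\n".join([
    "3 4",
    "2 2 3 4",
    f"{'Pop1':<10} 0.3000 0.7000 0.5000 0.5000 0.2000 0.2000 0.6000 0.0000 0.0000 0.0000 0.0000",
    f"{'Pop2':<10} 0.4000 0.6000 0.1000 0.9000 0.3000 0.4000 0.3000 0.2500 0.2500 0.2500 0.2500",
    f"{'Pop3':<10} 0.1000 0.9000 0.2000 0.8000 0.3333 0.3333 0.3333 0.2000 0.3000 0.2000 0.3000",
])
names, freqs, missing = parse_freq_file(example)
```

With the example data, `missing[0]` flags only the last locus of Pop1.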

### Sample size file

The first two lines are identical to the allele frequency file.

Then, for each population, a first line contains the name of the population and the number of individuals, and a second line gives the number of haplotypes that were used to calculate the allele frequencies at each locus.

In the example below, 80 haplotypes were typed at the first locus in the first population, out of 2*50 = 100 possible. The last locus has no genotypes in the first population.

    3 4
    2 2 3 4
    Pop1 50
    80 100 90 0
    Pop2 30
    60 60 60 60
    Pop3 60
    120 110 100 120
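
As a rough illustration, the sample size file can be read token by token in Python. The function names `parse_sample_sizes` and `usable_loci` are hypothetical, population names are assumed to contain no whitespace, and the rule "accepted if the number of genes is at least the minimal sample size" is an assumption about how the threshold is applied.

```python
def parse_sample_sizes(text):
    """Return {population name: (number of individuals, genes typed per locus)}."""
    tokens = text.split()
    n_pop, n_loci = int(tokens[0]), int(tokens[1])
    pos = 2 + n_loci  # skip the line giving the number of alleles per locus
    sizes = {}
    for _ in range(n_pop):
        name, n_ind = tokens[pos], int(tokens[pos + 1])
        genes = [int(t) for t in tokens[pos + 2:pos + 2 + n_loci]]
        if len(genes) != n_loci:
            raise ValueError(f"{name}: expected {n_loci} sample sizes")
        sizes[name] = (n_ind, genes)
        pos += 2 + n_loci
    return sizes


def usable_loci(genes, min_size=10):
    """Flag loci whose sample size reaches the minimal size (assumed rule: >=)."""
    return [g >= min_size for g in genes]


example = """3 4
2 2 3 4
Pop1 50
80 100 90 0
Pop2 30
60 60 60 60
Pop3 60
120 110 100 120
"""
sizes = parse_sample_sizes(example)
pop1_usable = usable_loci(sizes["Pop1"][1])
```

Here `pop1_usable` marks the last locus of Pop1 as unusable, consistent with its frequencies summing to 0 in the allele frequency file.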

### Indicator of distance method

- 1 = NEIMB = Nei minimum
- 2 = NEIMC = Nei minimum corrected for sample size
- 3 = NEISB = Nei standard
- 4 = NEISC = Nei standard corrected for sample size
- 5 = REYNB = Reynolds et al. (1983)
- 6 = REYNC = Reynolds et al. (1983) corrected for sample size (Laval et al. 2000)
- 7 = MORTB = Morton
- 8 = MORTC = Morton corrected for sample size
- 9 = NEI87 = Nei (1987)
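
To make the first and third options concrete, here is a minimal Python sketch of the uncorrected Nei minimum and Nei standard distances in their usual textbook form. The function name `nei_distances` is hypothetical, and averaging only over loci present in both populations is an assumption about how missing loci are handled; the sample-size corrections (codes 2 and 4) are not shown.

```python
import math


def nei_distances(p, q):
    """Return (Nei minimum, Nei standard) between two populations.

    p and q are lists with one entry per locus: a list of allele
    frequencies, or None when the locus is missing in that population.
    """
    # Assumption: only loci present in both populations contribute.
    shared = [(a, b) for a, b in zip(p, q) if a is not None and b is not None]
    if not shared:
        raise ValueError("no locus is present in both populations")
    n = len(shared)
    jx = sum(sum(x * x for x in a) for a, _ in shared) / n
    jy = sum(sum(y * y for y in b) for _, b in shared) / n
    jxy = sum(sum(x * y for x, y in zip(a, b)) for a, b in shared) / n
    d_min = (jx + jy) / 2 - jxy                    # Nei minimum distance
    d_std = -math.log(jxy / math.sqrt(jx * jy))    # Nei standard distance
    return d_min, d_std


# Pop1 and Pop2 from the allele frequency example; Pop1 lacks the last locus.
pop1 = [[0.3, 0.7], [0.5, 0.5], [0.2, 0.2, 0.6], None]
pop2 = [[0.4, 0.6], [0.1, 0.9], [0.3, 0.4, 0.3], [0.25, 0.25, 0.25, 0.25]]
d_min, d_std = nei_distances(pop1, pop2)
```

In this sketch the fourth locus is ignored for the Pop1/Pop2 pair but would still be used for the Pop2/Pop3 pair, so each pairwise distance may rest on a different set of loci.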

### List of populations

The file lists the number of each chosen population (its position in the allele frequency file) and its name. In the example, population 2 is absent from the current analysis.

    1 Pop1
    3 Pop3
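
A population list of this kind could be applied as in the Python sketch below. The function names are hypothetical, names are assumed to contain no whitespace, and the consistency check between number and name is an addition of this sketch rather than documented DistMiss behaviour.

```python
def parse_listpop(text):
    """Return [(number, name), ...] read from a population list."""
    tokens = text.split()
    if len(tokens) % 2:
        raise ValueError("expected pairs of population number and name")
    return [(int(tokens[i]), tokens[i + 1]) for i in range(0, len(tokens), 2)]


def select_populations(all_names, chosen):
    """Return 0-based row indices of the chosen populations."""
    rows = []
    for number, name in chosen:
        # Numbers follow the order of the allele frequency file, starting at 1.
        if all_names[number - 1] != name:
            raise ValueError(f"population {number} is not named {name}")
        rows.append(number - 1)
    return rows


chosen = parse_listpop("1 Pop1\n3 Pop3\n")
rows = select_populations(["Pop1", "Pop2", "Pop3"], chosen)
```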

### Marker file

It contains a first line with the names of the markers (optional) and a second line with 1 if the marker is used in the current analysis and 0 otherwise. In the example, marker Mark3 will not be taken into account in the calculations, even though its information is still present in the allele frequency file.

    Mark1 Mark2 Mark3 Mark4
    1 1 0 1
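
The marker file can be turned into a per-locus mask along the following lines. This Python sketch is an assumption-laden illustration: `parse_listmark` is a hypothetical name, and the placeholder names `locus1`, `locus2`, ... used when the optional name line is absent are invented for the example.

```python
def parse_listmark(text):
    """Return [(marker name, used?), ...] read from a marker file."""
    lines = [ln.split() for ln in text.splitlines() if ln.strip()]
    flags = lines[-1]  # the indicator line is always the last one
    if any(f not in ("0", "1") for f in flags):
        raise ValueError("indicators must be 0 or 1")
    if len(lines) == 2:
        names = lines[0]
    else:
        # Name line omitted: fall back to hypothetical placeholder names.
        names = [f"locus{i + 1}" for i in range(len(flags))]
    if len(names) != len(flags):
        raise ValueError("one indicator is needed per marker name")
    return [(name, flag == "1") for name, flag in zip(names, flags)]


markers = parse_listmark("Mark1 Mark2 Mark3 Mark4\n1 1 0 1\n")
used = [name for name, keep in markers if keep]
```

Loci flagged 0 stay in the allele frequency and sample size files; the mask only excludes them from the distance calculation.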

## Output

- summary file (text format), named code_of_distance.res
- file with the distance matrix (PHYLIP format), named code_of_distance.ngh

## Source code

You can download the source code of the software: DistMiss_1.0.zip