# Quantitative genetics

Quantitative genetics is the branch of population genetics that deals with phenotypes that vary continuously (such as height or mass), rather than with phenotypes and gene-products that are discretely identifiable (such as eye-colour, or the presence of a particular biochemical). Both employ the frequencies of different alleles of a gene in breeding populations (gamodemes), and combine them with concepts arising from simple Mendelian inheritance in order to analyze inheritance patterns across generations and descendant lines. While population genetics can focus on particular genes and their subsequent metabolic products, quantitative genetics focuses more on the outward phenotypes, and makes summaries only of the underlying genetics. This, however, can be viewed as its strength, because it facilitates an interface with the biological macrocosm, including micro-evolution and artificial selection in plant and animal breeding. Both branches share some common history and some mathematics: for example, they use expansion of the quadratic equation to represent the fertilization of gametes to form the zygote. However, because of the continuous distribution of phenotypic values, quantitative genetics also needs to employ many other statistical methods (such as the effect, the mean and the variance) in order to link the phenotype to underlying genetic principles. Some phenotypes (attributes) may be analyzed either as discrete categories or as continuous phenotypes, depending on the definition of cut-off points, or on the metric used to quantify them.[1]:27–69 Mendel himself had to discuss this matter in his famous paper,[2] especially with respect to his pea attribute tall/dwarf, which actually was "length of stem".[3][4] Analysis of quantitative trait loci, or QTL,[5] is a more recent addition to quantitative genetics, linking it more directly to molecular genetics.

## Basic principles

### Gene effects

In diploid organisms, the average genotypic "value" (locus value) may be defined by the allele "effect" together with a dominance effect, and also by how genes interact with genes at other loci (epistasis). The founder of quantitative genetics - Sir Ronald Fisher - perceived much of this when he proposed the first mathematics of this branch of genetics.[6]

Figure: Gene effects and phenotype values.

Being a statistician, he defined the gene effects as deviations from a central value, thereby enabling the use of statistical concepts such as the mean and variance, which utilize this idea.[7] The central value he chose for the gene was the midpoint between the two opposing homozygotes at the one locus. The deviation from there to the "greater" homozygous genotype can be named "+a"; and therefore it is "-a" from that same midpoint to the "lesser" homozygous genotype. This is the "allele" effect mentioned above. The heterozygote deviation from the same midpoint can be named "d", this being the "dominance" effect referred to above.[8] The diagram depicts the idea. However, in reality we measure phenotypes, and the figure also shows how observed phenotypes relate to the gene effects. Formal definitions of these effects recognize this phenotypic focus.[9][10] Epistasis has been approached statistically as interaction (i.e., "inconsistencies"),[11] but epigenetics suggests a new approach may be needed.

If 0<d<a, the dominance is regarded as partial or "incomplete"; while d=a indicates full or "classical" dominance. Previously, d>a was known as "over-dominance".[12]

Mendel's pea attribute "length of stem" provides us with a good example.[3] Mendel stated that the tall true-breeding parents ranged from 6 to 7 feet in stem length (183–213 cm), giving a median of 198 cm (= P1). The short parents ranged from 0.75 to 1.5 feet in stem length (23–46 cm), with a rounded median of 34 cm (= P2). Their hybrid ranged from 6 to 7.5 feet in length (183–229 cm), with a median of 206 cm (= F1). The mean of P1 and P2 is 116 cm, this being the phenotypic value of the homozygotes midpoint (mp). The allele effect (a) is [P1 - mp] = 82 cm = -[P2 - mp]. The dominance effect (d) is [F1 - mp] = 90 cm.[13] This historical example illustrates clearly how phenotype values and gene effects are linked.
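The arithmetic of this example can be sketched in a few lines of Python. The function name `gene_effects` and its parameter names are illustrative assumptions, not part of the source; the input values are the medians quoted above.

```python
def gene_effects(p1, p2, f1):
    """Return (mp, a, d) from the two homozygote values and the hybrid value.

    p1: phenotype of the "greater" homozygote, p2: the "lesser" homozygote,
    f1: phenotype of the heterozygote (all in the same units).
    """
    mp = (p1 + p2) / 2   # homozygote midpoint
    a = p1 - mp          # allele effect: deviation of the greater homozygote
    d = f1 - mp          # dominance effect: deviation of the heterozygote
    return mp, a, d


# Mendel's stem-length medians in cm, as quoted in the text
mp, a, d = gene_effects(p1=198, p2=34, f1=206)
print(mp, a, d)  # 116.0 82.0 90.0
```

Here d exceeds a, which places this attribute in the d > a category described above.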

### Allele and Genotype frequencies

To obtain means, variances and other statistics, both quantities and their occurrences are required. The gene effects (above) provide the framework for quantities: and the frequencies of the contrasting alleles in the fertilization gamete-pool provide the information on occurrences.

Figure: Analysis of sexual reproduction.

Commonly, the frequency of the allele causing "more" in the phenotype (including dominance) is given the symbol p, while the frequency of the contrasting allele is q. An initial assumption made when establishing the algebra was that the parental population was infinite and mated at random, an assumption made simply to facilitate the derivation. The subsequent mathematical development also implied that the frequency distribution within the effective gamete-pool was uniform: there were no local perturbations where p and q varied. Looking at the diagrammatic analysis of sexual reproduction, this is the same as declaring that $p_P = p_g = p$; and similarly for q.[12] This mating system, dependent upon these assumptions, became known as "panmixia".

Panmixia rarely actually occurs in nature,[14]:152–180[15][16] as gamete distribution may be limited, for example by dispersal restrictions or by behaviour, or by chance sampling (those local perturbations mentioned above). It is well-known that there is a huge wastage of gametes in Nature, which is why the diagram depicts a potential gamete-pool separately to the actual gamete-pool. Only the latter sets the definitive frequencies for the zygotes: this is the true "gamodeme" ("gamo" refers to the gametes, and "deme" derives from Greek for "population"). But, under Fisher's assumptions, the gamodeme can be effectively extended back to the potential gamete-pool, and even back to the parental base-population (the "source" population). The random sampling arising when small "actual" gamete-pools are sampled from a large "potential" gamete-pool is known as genetic drift, and will be considered subsequently.

While panmixia may not be widely extant, the potential for it does occur, although it may be only ephemeral because of those local perturbations. It has been shown, for example, that the F2 derived from random fertilization of F1 individuals (an allogamous F2), following hybridization, is an origin of a new potentially panmictic population.[17][18] It has also been shown that if panmictic random fertilization did occur continually, it would maintain the same allele and genotype frequencies across each successive panmictic sexual generation - this being the Hardy Weinberg equilibrium.[11]:34–39[19][20][21][22] However, as soon as genetic drift was initiated by local random sampling of gametes, the equilibrium would cease.

#### Random Mating

Male and female gametes within the actual fertilizing pool are usually considered to have the same frequencies for their corresponding alleles. (Exceptions have been considered.) This means that when p male gametes carrying the A allele randomly fertilize p female gametes carrying that same allele, the resulting zygote will have genotype AA, and, under random fertilization, the combination will occur with a frequency of $p \times p$ $(= p^2)$. Similarly, the zygote aa will occur with a frequency of $q^2$. Heterozygotes (Aa) can arise in two ways: when p male (A allele) gametes randomly fertilize q female (a allele) gametes, and vice versa. The resulting frequency for the heterozygous zygotes is thus $2pq$.[11]:32 Notice that such a population is never more than half heterozygous, this maximum occurring when p = q = 0.5.

In summary then, under random fertilization, the zygote (genotype) frequencies are the quadratic expansion of the gametic (allelic) frequencies: $(p + q)^2 = p^2 + 2pq + q^2 = 1$. (The "= 1" states that the frequencies are in fraction form, not percentages; and that there are no omissions within the framework proposed.) Also recall that "random fertilization" and "panmixia" are not synonyms, as discussed in the previous section.

#### Mendel's Research Cross - a Contrast

Mendel's pea experiments were constructed by establishing true-breeding parents with "opposite" phenotypes for each attribute.[3] This meant that each opposite parent was homozygous for its respective allele only. In our example, "tall vs dwarf", the tall parent would be genotype TT with p = 1 (and q = 0); while the dwarf parent would be genotype tt with q = 1 (and p = 0). After controlled crossing, their hybrid will be Tt, with p = q = ½. However, the frequency of this heterozygote = 1, because this is the F1 of an artificial cross: it has not arisen through random fertilization.[23] The F2 generation was produced by natural self-pollination of the F1 (with monitoring against insect contamination), resulting in p = q = ½ being maintained. Such an F2 is said to be "autogamous". However, the genotype frequencies (0.25 TT, 0.5 Tt, 0.25 tt) have arisen through a mating system very different from random fertilization, and therefore the use of the quadratic expansion has been avoided. The numerical values obtained were the same as those for random fertilization only because this is the special case of having originally crossed homozygous opposite parents.[24] We can notice that, because of the dominance of T- [frequency (0.25 + 0.5)] over tt [frequency 0.25], the 3:1 ratio is still obtained.

A cross such as Mendel's, where true-breeding (largely homozygous) opposite parents are crossed in a controlled way to produce an F1, is a special case of hybrid structure. The F1 is often regarded as being "entirely heterozygous" for the gene under consideration. However, this is an over-simplification which does not apply generally: for example when individual parents are not homozygous, or when populations inter-hybridise to form "hybrid swarms".[23] The general properties of intra-species hybrids (F1) and F2 (both "autogamous" and "allogamous") will be considered in a later section.

#### Self Fertilization - an Alternative

Having noticed that the pea is naturally self-pollinated, we cannot continue to use it as an example for illustrating random fertilization properties. Self-fertilization ("selfing") is a major alternative to random fertilization, especially within Plants. Most of the Earth's cereals are naturally self-pollinated (rice, wheat, barley, for example), as well as the pulses. Considering the millions of individuals of each of these on Earth at any time, it's obvious that self-fertilization is at least as significant as random fertilization! Self-fertilization is the most intensive form of inbreeding, which arises whenever there is restricted independence in the genetical origins of gametes. Such reduction in independence arises if parents are already related, and/or from genetic drift or other spatial restrictions on gamete dispersal. Path analysis demonstrates that these are tantamount to the same thing.[25][26] Arising from this background, the inbreeding coefficient (often symbolized as F or f) quantifies the effect of inbreeding from whatever cause. There are several formal definitions of f, and some of these will be considered in later sections. For the present, note that for a long-term self-fertilized species f = 1. Natural self-fertilized populations are not single " pure lines ", however, but mixtures of such lines. This becomes particularly obvious when considering more than one gene at a time. Therefore, allele frequencies (p and q) other than 1 or 0 are still relevant in these cases (refer back to the Mendel Cross section). The genotype frequencies take a different form, however.

In general, the genotype frequencies become $[p^2(1-f) + pf]$ for AA, $[2pq(1-f)]$ for Aa, and $[q^2(1-f) + qf]$ for aa.[11]:65 Notice that the frequency of the heterozygote declines in proportion to f. When f = 1, these three frequencies become respectively p, 0 and q. Conversely, when f = 0, they reduce to the random-fertilization quadratic expansion shown previously.
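As a minimal sketch, the three expressions can be evaluated directly to check the two limiting cases. The function name `genotype_frequencies` is an illustrative assumption; the value p = 0.6 is simply an example frequency.

```python
def genotype_frequencies(p, f=0.0):
    """Genotype frequencies for allele frequency p and inbreeding coefficient f."""
    q = 1 - p
    return {
        "AA": p * p * (1 - f) + p * f,
        "Aa": 2 * p * q * (1 - f),
        "aa": q * q * (1 - f) + q * f,
    }


for f in (0.0, 0.25, 1.0):
    freqs = genotype_frequencies(0.6, f)
    # The three frequencies always sum to 1; the heterozygote shrinks as f rises
    print(f, {g: round(v, 4) for g, v in freqs.items()})
```

With f = 0 the output is the quadratic expansion (0.36, 0.48, 0.16), and with f = 1 it collapses to (p, 0, q).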

### Population Mean

The population mean shifts the central reference point from the homozygote midpoint (mp) to the mean of a sexually reproduced population. This is important not only to relocate the focus into the natural world, but also to use a measure of central tendency utilized by Statistics/Biometrics. In particular, the square of this mean is the Correction Factor, which is used to obtain the Genotypic Variances later.[7]

Figure: Population mean across all values of p, for various d effects.

For each genotype in turn, its allele effect is multiplied by its genotype frequency; and the products are accumulated across all genotypes in the model. Some algebraic simplification usually follows to reach a succinct result.

#### The Mean after random fertilization

The contribution of AA is $p^2(+a)$, that of Aa is $2pq\,d$, and that of aa is $q^2(-a)$. Gathering together the two a terms and accumulating over all, the result is $a(p^2 - q^2) + 2pqd$. Simplification is achieved by noting that $(p^2 - q^2) = (p-q)(p+q)$, and by recalling that $(p+q) = 1$, thereby reducing the left-hand factor to $(p-q)$. The succinct result is therefore $G = a(p-q) + 2pqd$.[12]:110 This defines the population mean as an "offset" from the homozygote midpoint (recall a and d are defined as deviations from that midpoint). The Figure depicts G across all values of p for several values of d, including one case of slight over-dominance. Notice that G is often negative, thereby emphasizing that it is itself a deviation (from mp). Finally, to obtain the actual Population Mean in "phenotypic space", the midpoint value is added to this offset: $P = G + mp$.

An example arises from data on ear length in maize.[27]:103 Assuming for now that one gene only is represented, a = 5.45 cm, d = 0.12 cm [virtually "0", really], mp = 12.05 cm. Further assuming that p = 0.6 and q = 0.4 in this example population, then:-

G = 5.45 (0.6 - 0.4) + (0.48)0.12 = 1.15 cm (rounded); and

P = 1.15 + 12.05 = 13.20 cm (rounded).
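The same calculation can be sketched in Python, using the maize values quoted above. The function name `population_mean_offset` is an illustrative assumption.

```python
def population_mean_offset(p, a, d):
    """G = a(p - q) + 2pqd: the population mean as an offset from the midpoint."""
    q = 1 - p
    return a * (p - q) + 2 * p * q * d


a, d, mp = 5.45, 0.12, 12.05           # maize ear-length effects (cm), from the text
G = population_mean_offset(0.6, a, d)  # offset from the homozygote midpoint
P = G + mp                             # mean in "phenotypic space"
print(round(G, 2), round(P, 2))        # 1.15 13.2

# G becomes negative once the "less" allele is the common one
print(population_mean_offset(0.2, a, d) < 0)  # True
```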

#### The Mean after self fertilization

The contribution of AA is $p(+a)$, while that of aa is $q(-a)$. (See above for the frequencies.) Gathering these two a terms together leads to an immediately very simple final result: $G_S = a(p-q)$. P is obtained as above.

Mendel's peas can provide us with the allele effects and midpoint (see previously); and a mixed self-pollinated population with p = 0.6 and q = 0.4 provides example frequencies. Thus:-

$G_S = 82\,(0.6 - 0.4) = 16.4$ cm; and

$P_S = 16.4 + 116 = 132.4$ cm.

#### The Mean - Generalized fertilization

A general formula incorporates the inbreeding coefficient f, and can then accommodate any situation. The procedure is exactly the same as before, using the weighted genotype frequencies given earlier. After translation into our symbols, and further rearrangement:[11] :77–78

$G_f = a(p-q) + [2pqd - f\,2pqd] = a(p-q) + (1-f)\,2pqd = G - f\,2pqd$

Supposing that the maize example (earlier) had been constrained on a holme (a narrow riparian meadow), and had partial inbreeding to the extent of f = 0.25, then, using the third version (above) of Gf:-

$G_{0.25} = 1.15 - 0.25\,(0.48)(0.12) = 1.136$ cm (rounded), with P = 13.186 cm (rounded).

There is hardly any effect from inbreeding in this example, which arises because there was virtually no dominance in this attribute (d → 0). Examination of all three versions of Gf reveals that this would lead to trivial change in the Population mean. Where dominance was notable, however, there would be considerable change.
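A short sketch makes the dependence on d visible. It uses the form $G_f = G - f\,2pqd$ with the maize values from the earlier example; the function name `inbred_mean_offset` and the alternative value d = 3.0 are illustrative assumptions. Because the unrounded G (about 1.1476) is carried through, the f = 0.25 result is about 1.133, marginally below the figure obtained from the rounded 1.15.

```python
def inbred_mean_offset(p, a, d, f=0.0):
    """G_f = G - f * 2pqd, where G = a(p - q) + 2pqd is the random-fertilization offset."""
    q = 1 - p
    g_random = a * (p - q) + 2 * p * q * d
    return g_random - f * 2 * p * q * d


# Maize values from the text: almost no dominance, so f barely matters
for f in (0.0, 0.25, 1.0):
    print(f, round(inbred_mean_offset(0.6, 5.45, 0.12, f), 4))

# Hypothetical attribute with notable dominance (d = 3.0 is an assumed value):
# the same f = 0.25 now lowers the mean by 0.25 * 0.48 * 3.0 = 0.36
drop = inbred_mean_offset(0.6, 5.45, 3.0, 0.0) - inbred_mean_offset(0.6, 5.45, 3.0, 0.25)
print(round(drop, 4))
```

At f = 1 the dominance term vanishes entirely and the result reduces to a(p - q), the self-fertilization mean.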

### Genetic drift

Genetic drift was introduced when discussing the likelihood of panmixia being widely extant as a natural fertilization pattern. [See section on Allele and Genotype frequencies.] Here the sampling of gametes from the potential gamodeme is discussed in more detail. The sampling involves random fertilization between pairs of random gametes, each of which may contain either an A or an a allele. The sampling is therefore Binomial sampling.[11]:382–395 [12]:49–63 [28]:35 [29]:55 Each sampling "packet" involves 2N alleles, and produces N zygotes (a "progeny" or a "line") as a result. During the course of the reproductive period, this sampling is repeated over and over, so that the final result is a mixture of sample progenies. These events, and the overall end-result, are examined here with an illustrative example.

The "base" allele frequencies of the example are those of the potential gamodeme: the frequency of A is $p_g = 0.75$, while the frequency of a is $q_g = 0.25$. [White label "1" in the Diagram.] Five example actual gamodemes are Binomially sampled out of this base (s = the number of samples = 5), and each sample is designated with an "index" k, with $k = 1, \dots, s$ sequentially. (These are the sampling "packets" referred to in the previous paragraph.) The number of gametes involved in fertilization varies from sample to sample, and is given as $2N_k$ [at white label "2" in the Diagram]. The total (Σ) number of gametes sampled overall is 52 [white label "3" in the Diagram]. Because each sample has its own size, weights are needed to obtain averages (and other statistics) when obtaining the overall results. These are $\omega_k = 2N_k / \sum_k 2N_k$, and are given at white label "4" in the Diagram.

Figure: Genetic drift example analysis.

#### The sample gamodemes - Genetic drift

Following completion of these five Binomial sampling events, the resultant actual gamodemes each contained different allele frequencies ($p_k$ and $q_k$). [These are given at white label "5" in the Diagram.] This outcome is actually the Genetic Drift itself. Notice that two samples (k = 1 and 5) happen to have the same frequencies as the base (potential) gamodeme. Another (k = 3) happens to have the p and q "reversed". Sample (k = 2) happens to be an "extreme" case, with $p_k = 0.9$ and $q_k = 0.1$; while the remaining sample (k = 4) is "middle of the range" in its allele frequencies. All of these results have arisen only by "chance", through Binomial sampling. Having occurred, however, they set in place all the downstream properties of the progenies.

Because sampling involves "chance", the Probabilities of obtaining each of these samples (one for each k) become of interest. These Binomial Probabilities depend on the starting frequencies ($p_g$ and $q_g$) and the sample size ($2N_k$). They are tedious to obtain,[11]:382–395 [29]:55 but are of considerable interest. [See white label "6" in the Diagram.] The two samples (k = 1, 5), with the allele frequencies the same as in the potential gamodeme, had higher "chances" of occurring than the other samples. Their Binomial Probabilities did differ, however, because of their different sample sizes ($2N_k$). The "reversal" sample (k = 3) had a very low Probability of occurring, confirming perhaps what might be expected. The "extreme" allele frequency gamodeme (k = 2) was not "rare", however; and the "middle of the range" sample (k = 4) was rare. These same Probabilities will apply also to the progenies arising from these fertilizations.

It is here that some summarizing can begin. The overall allele frequencies in the progenies bulk are supplied by weighted averages of the appropriate frequencies of the individual samples. That is, $p_\bullet = \sum_k \omega_k p_k$ and $q_\bullet = \sum_k \omega_k q_k$. (Notice that the subscript k is replaced by • for the overall result - a common practice.)[7] The results for the example are $p_\bullet = 0.631$ and $q_\bullet = 0.369$ [black label "5" in the Diagram]. These values are quite different to the starting ones ($p_g$ and $q_g$) [white label "1"]. The sample allele frequencies also have variance as well as an average. This has been obtained using the "Sum of Squares method" (SS),[30] [see to the right of black label "5" in the Diagram]. [Further discussion on this variance occurs in the section below on Extensive genetic drift.]
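The weighted averaging can be sketched as follows. Since the Diagram is not reproduced here, the per-sample inputs are reconstructions and should be read as assumptions: the allele frequencies follow the verbal descriptions of the five samples, and the gamodeme sizes are taken as the reciprocals of the 1/(2N) values quoted in the later paragraph on the inbreeding coefficient. They do reproduce the quoted total of 52 gametes and the quoted overall frequencies.

```python
# Assumed reconstruction of the example (the Diagram itself is not available):
two_n = [10, 12, 10, 12, 8]          # gametes per sample, 2N_k
p_k = [0.75, 0.9, 0.25, 0.5, 0.75]   # frequency of A in each sample

total = sum(two_n)                   # total gametes sampled overall
weights = [n / total for n in two_n] # omega_k = 2N_k / sum of 2N_k

# Weighted averages give the overall allele frequencies in the progenies bulk
p_bar = sum(w * p for w, p in zip(weights, p_k))
q_bar = sum(w * (1 - p) for w, p in zip(weights, p_k))

print(total, round(p_bar, 3), round(q_bar, 3))  # 52 0.631 0.369
```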

#### The progeny lines - Dispersion

The genotype frequencies of the five sample progenies are obtained from the usual quadratic expansion of their respective allele frequencies (random fertilization). The results are given at the Diagram's white label "7" for the homozygotes, and at white label "8" for the heterozygotes. Re-arrangement in this manner prepares the way for monitoring inbreeding levels. This can be done either by examining the level of total homozygosis $[(p_k^2 + q_k^2) = (1 - 2p_kq_k)]$, or by examining the level of heterozygosis $(2p_kq_k)$, as they are complementary.[31] Notice that samples k = 1, 3, 5 all had the same level of heterozygosis, despite one being the "mirror image" of the others with respect to allele frequencies. The "extreme" allele-frequency case (k = 2) had the most homozygosis (least heterozygosis) of any sample. The "middle of the range" case (k = 4) had the least homozygosity (most heterozygosity): they were each equal at 0.50, in fact.

The overall summary can continue by obtaining the weighted average of the respective genotype frequencies for the progeny bulk. Thus, for AA, it is $(p^2)_\bullet = \sum_k \omega_k\, p_k^2$; for Aa, it is $(2pq)_\bullet = \sum_k \omega_k\, 2p_kq_k$; and for aa, it is $(q^2)_\bullet = \sum_k \omega_k\, q_k^2$. The example results are given at black label "7" for the homozygotes, and at black label "8" for the heterozygote. Note that the heterozygosity mean is 0.3588, which will be used in the next section to examine the inbreeding resulting from this Genetic Drift.

The next focus of interest is the Dispersion itself, which refers to the "spreading apart" of the progenies' population means. These are obtained as $G_k = a(p_k - q_k) + 2p_kq_k\,d$ [see section on the Population mean], for each sample progeny in turn, using the example gene effects given at white label "9" in the Diagram. Then, each $P_k = G_k + mp$ is obtained also. The latter are given at white label "10" in the Diagram. Notice that the "best" line (k = 2) had the highest allele frequency for the "more" allele (A) (it also had the highest level of homozygosity). The worst progeny (k = 3) had the highest frequency for the "less" allele (a), which accounted for its poor performance. This "poor" line was less homozygous than the "best" line; and it shared the same level of homozygosity, in fact, as the two second-best lines (k = 1, 5). The progeny line with both the "more" and the "less" alleles present in equal frequency (k = 4) had a mean below the overall average (see next paragraph), and had the lowest level of homozygosity. These results reveal the fact that it is which alleles are most prevalent in the "gene-pool" (also called the "germplasm") that determines performance, not the level of homozygosity per se. Recall that it is Binomial sampling alone which has effected this Dispersion.

The overall summary can now be concluded by obtaining $G_\bullet = \sum_k \omega_k G_k$ and $P_\bullet = \sum_k \omega_k P_k$. The example result for $P_\bullet$ is 36.94 (black label "10" in the Diagram). This will be used later to quantify inbreeding depression arising overall from the gamete sampling. However, recall that some very "non-depressed" progeny means have been identified already (k = 1, 2, 5)! This is an enigma of inbreeding - while there may be "depression" overall, there will usually be superior lines amongst the gamodeme samplings.

#### The equivalent post-dispersion panmictic - Inbreeding

Included in the overall summary were the average allele frequencies in the mixture of progeny lines ($p_\bullet$ and $q_\bullet$). These can now be used to construct a hypothetical panmictic equivalent.[11]:382–395 [12]:49–63 [28]:35 This can be regarded as a "reference" to assess the changes wrought by the gamete sampling. The example appends such a panmictic to the right of the Diagram. The frequency of AA is therefore $(p_\bullet)^2 = 0.3979$. This is less than that found in the dispersed bulk (0.4513 at black label "7"). Similarly, for aa, $(q_\bullet)^2 = 0.1363$ - again less than the equivalent in the progenies bulk (0.1898). Clearly, genetic drift has increased the overall level of homozygosis by the amount (0.6411 - 0.5342) = 0.1069. In a complementary approach, the heterozygosity could be used instead. The panmictic equivalent for Aa is $2 p_\bullet q_\bullet = 0.4658$, which is higher than that in the Binomially sampled bulk (0.3588) [black label "8"]. The sampling has caused the heterozygosity to decrease by 0.1070, which differs trivially from the earlier estimate because of rounding errors.

The inbreeding coefficient (f) was introduced in the early section on Self Fertilization. Here, a formal definition of it is considered:- f is the probability that two "same" alleles (that is A and A, or a and a) which fertilize together are of common ancestral origin - or (more formally) f is the probability that two homologous alleles are autozygous.[12][26] Consider any random gamete in the potential gamodeme which has its syngamy partner restricted by Binomial sampling. The probability that that second gamete is homologous autozygous to the first is 1/(2N), the reciprocal of the gamodeme size. For the five example progenies, these quantities are 0.1, 0.0833, 0.1, 0.0833 and 0.125 respectively, and their weighted average is 0.0961. This is the inbreeding coefficient of the example progenies bulk, provided it is unbiased with respect to the full Binomial Distribution. An example based upon s = 5 is likely to be biased, however, when compared to an appropriate entire Binomial Distribution based upon s → ∞. Another derived definition of f for the full Distribution is that f also equals the rise in homozygosity which equals the fall in heterozygosity.[32] For the example, these frequency changes are 0.1069 and 0.1070, respectively: which result is different to the above, indicating that bias with respect to the full underlying Distribution is present in the example. For the example itself, these latter values are the better ones to use, namely f = 0.10695.
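The comparison between the dispersed bulk and its panmictic equivalent can be sketched as below. The per-sample inputs are assumptions reconstructed from the values quoted in the text (the Diagram is not reproduced): allele frequencies from the sample descriptions, and gamodeme sizes as reciprocals of the five 1/(2N) quantities listed above. The helper name `wmean` is illustrative.

```python
# Assumed reconstruction of the five samples (not read from the Diagram):
two_n = [10, 12, 10, 12, 8]          # 2N_k; reciprocals are 0.1, 0.0833, ...
p_k = [0.75, 0.9, 0.25, 0.5, 0.75]   # frequency of A in each sample
total = sum(two_n)
weights = [n / total for n in two_n]


def wmean(values):
    """Weighted average over the samples, using omega_k as weights."""
    return sum(w * v for w, v in zip(weights, values))


p_bar = wmean(p_k)
q_bar = 1 - p_bar

# Genotype frequencies in the dispersed bulk (weighted over progeny lines)
bulk = {
    "AA": wmean([p * p for p in p_k]),
    "Aa": wmean([2 * p * (1 - p) for p in p_k]),
    "aa": wmean([(1 - p) ** 2 for p in p_k]),
}
# Hypothetical panmictic equivalent built from the average allele frequencies
panmictic = {"AA": p_bar ** 2, "Aa": 2 * p_bar * q_bar, "aa": q_bar ** 2}

rise_in_homozygosity = (bulk["AA"] + bulk["aa"]) - (panmictic["AA"] + panmictic["aa"])
fall_in_heterozygosity = panmictic["Aa"] - bulk["Aa"]
f_from_sampling = wmean([1 / n for n in two_n])  # weighted average of 1/(2N_k)

print({g: round(v, 4) for g, v in bulk.items()})
print({g: round(v, 4) for g, v in panmictic.items()})
print(round(rise_in_homozygosity, 5), round(fall_in_heterozygosity, 5))
print(round(f_from_sampling, 4))
```

Computed without intermediate rounding, the rise in homozygosity and the fall in heterozygosity coincide, which is consistent with the remark that the 0.1069 versus 0.1070 discrepancy is a rounding effect.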

The population mean of the equivalent panmictic is found as $[a(p_\bullet - q_\bullet) + 2 p_\bullet q_\bullet\, d] + mp$. Using the example gene effects (white label "9" in the Diagram), this mean is 37.87. The equivalent mean in the dispersed bulk is 36.94 (black label "10"), which is depressed by the amount 0.93. This is the inbreeding depression arising from this Genetic Drift. However, as noted previously, three progenies were not depressed (k = 1, 2, 5), and had means even greater than that of the panmictic equivalent. These are the lines the Plant Breeder would be looking for in his "Line Selection" programme.[33]

#### Extensive Binomial sampling - is Panmixia restored?

If the number of Binomial samplings is very large ($s \to \infty$), then $p_\bullet \to p_g$ and $q_\bullet \to q_g$. It might be queried whether panmixia would effectively re-appear under these circumstances. However, the sampling of allele frequencies has still occurred, with the result that $\sigma^2_{p,q} \neq 0$. In fact, as $s \to \infty$, $\sigma^2_{p,q} \to p_g q_g / (2N)$, which is the variance of the whole Binomial Distribution.[11]:382–395 [12]:49–63 Furthermore, the "Wahlund equations" show that the progeny-bulk homozygote frequencies can be obtained as the sums of their respective average values ($p_\bullet^2$ or $q_\bullet^2$) and $\sigma^2_{p,q}$.[11]:382–395 Likewise, the bulk heterozygote frequency is $2 p_\bullet q_\bullet$ minus twice the $\sigma^2_{p,q}$. The variance arising from the Binomial sampling is conspicuously present indeed! Thus, even when $s \to \infty$, the progeny-bulk genotype frequencies will still reveal increased homozygosis and decreased heterozygosis, there will still be dispersion of progeny means, and there will still be inbreeding and inbreeding depression. That is, panmixia is not re-attained once it is lost. But a new potential panmixia can be initiated via an allogamous F2 following hybridization,[34] as noted earlier.
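These limiting statements can be checked numerically by enumerating the entire Binomial Distribution for one gamodeme size, rather than drawing random samples. The sketch below is illustrative only: the function name `binomial_bulk` and the choice of 2N = 10 are assumptions, while $p_g = 0.75$ is the base frequency of the example.

```python
from math import comb


def binomial_bulk(p_g, two_n):
    """Expected bulk statistics over the full Binomial Distribution of samples."""
    q_g = 1 - p_g
    # Probability of drawing x copies of A among 2N gametes, for every x
    probs = [comb(two_n, x) * p_g ** x * q_g ** (two_n - x) for x in range(two_n + 1)]
    freqs = [x / two_n for x in range(two_n + 1)]  # sample allele frequency p_k
    mean_p = sum(pr * p for pr, p in zip(probs, freqs))
    var_p = sum(pr * (p - mean_p) ** 2 for pr, p in zip(probs, freqs))
    return {
        "mean_p": mean_p,
        "var_p": var_p,
        "AA": sum(pr * p * p for pr, p in zip(probs, freqs)),
        "Aa": sum(pr * 2 * p * (1 - p) for pr, p in zip(probs, freqs)),
        "aa": sum(pr * (1 - p) ** 2 for pr, p in zip(probs, freqs)),
    }


r = binomial_bulk(0.75, 10)
print(round(r["mean_p"], 6), round(r["var_p"], 6))   # mean is p_g; variance is p_g*q_g/(2N)
print(round(r["AA"], 6), round(0.75 ** 2 + r["var_p"], 6))            # Wahlund, homozygote
print(round(r["Aa"], 6), round(2 * 0.75 * 0.25 - 2 * r["var_p"], 6))  # Wahlund, heterozygote
```

The average allele frequency returns to $p_g$, yet the bulk still carries excess homozygotes and a heterozygote deficit of exactly twice the variance.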

#### Continued genetic drift - increased Dispersion and Inbreeding

In the previous discussions on genetic drift, just one cycle (generation) of the process was examined. When the sampling continues over successive generations, conspicuous changes occur in σ2p, q and f. Another "index" is needed to keep track of "time": t = 1 .... y where y = the number of "years" (generations) considered. The approach often is to add the "current Binomial increment" (Δ = "de novo") to what has occurred previously.[11] The entire Binomial Distribution is examined here: there is no further benefit from an abbreviated example.

Consider firstly Dispersion via the $\sigma^2_{p,q}$. Earlier this variance was seen to be $(p_g q_g)/(2N) = p_g q_g\,[1/(2N)] = p_g q_g\, f$. [Notice particularly this last version, where $f = 1/(2N) = \Delta f$ as the recurrences are followed.] With the extension over time, this is the result of the first cycle, so this can be designated as $\sigma^2_1$ (for brevity). At cycle 2, this variance is generated again, becoming this time the "de novo" variance, and is added to that which was present already. The de novo is given the weight of 1, while the "carry-over" is given the weight of $1 - [1/(2N)]$, which is the same as $1 - \Delta f$. Thus, $\sigma^2_2 = (1)\,\Delta\sigma^2 + (1 - \Delta f)\,\sigma^2_1$. After gathering terms, simplifying and recalling previous symbols, this becomes $\sigma^2_2 = \sigma^2_1\,\{1 + [1 - \Delta f]\}$. The extension to generalize to any time t, after considerable simplification, becomes $\sigma^2_t = p_g q_g\,\{1 - [1 - \Delta f]^t\}$.[11]:328 Recall that it was the variation in allele frequencies which caused the "spreading apart" of the progenies' means (Dispersion). Therefore, this $\sigma^2_t$ can be used to indicate the extent of Dispersion over the generations.
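The step from the recurrence to the general formula can be checked numerically. In this sketch the function names, and the example values $p_g = 0.75$ and 2N = 10, are illustrative assumptions; the de novo increment is taken as $p_g q_g\,\Delta f$, as stated above.

```python
def drift_variance_series(p_g, two_n, generations):
    """Accumulate variance cycle by cycle: de novo (weight 1) + carry-over (weight 1 - delta_f)."""
    delta_f = 1 / two_n
    de_novo = p_g * (1 - p_g) * delta_f   # variance generated afresh in each cycle
    variance = 0.0
    series = []
    for _ in range(generations):
        variance = de_novo + (1 - delta_f) * variance
        series.append(variance)
    return series


def drift_variance_closed(p_g, two_n, t):
    """General formula: p_g * q_g * {1 - (1 - delta_f)^t}."""
    return p_g * (1 - p_g) * (1 - (1 - 1 / two_n) ** t)


series = drift_variance_series(0.75, 10, 20)
for t in (1, 2, 20):
    print(t, round(series[t - 1], 6), round(drift_variance_closed(0.75, 10, t), 6))
```

The two columns agree at every generation, and the variance climbs towards its ceiling of $p_g q_g$ as t grows.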

Figure: Inbreeding resulting from genetic drift in random fertilization.

The method for examining Inbreeding is very similar. The same weights as before are used respectively for de novo f ($\Delta f$) [recall this is $1/(2N)$] and carry-over f. Therefore, $f_2 = (1)\,\Delta f + (1 - \Delta f)\, f_1$. In general, $f_t = \Delta f + (1 - \Delta f)\, f_{t-1} = \Delta f\,(1 - f_{t-1}) + f_{t-1}$ after rearrangement.[11] The accompanying graphs show levels of inbreeding over twenty generations arising from genetic drift (Binomial sampling of gamodemes) for various actual gamodeme sizes (2N).

Still further rearrangement of this general equation reveals some interesting relationships. After some simplification, $(f_t - f_{t-1}) = \Delta f\,(1 - f_{t-1})$. The left-hand side is the difference between the current and previous levels of inbreeding: the change in inbreeding ($\delta f_t$). Notice that it is not $\Delta f$, unless $f_{t-1}$ is 0. Another item of note is the $(1 - f_{t-1})$ - an index of non-inbreeding - the panmictic index ($\pi_{t-1}$). Further rearrangements reveal that $\Delta f = \delta f_t / \pi_{t-1} = 1 - [\pi_t / \pi_{t-1}]$, all useful relationships.

Figure: Random fertilization compared to cross-fertilization.

One further rearrangement gives $f_t = 1 - (1 - \Delta f)^t\,(1 - f_0)$; which, assuming that $f_0 = 0$, forms the section within the braces of the last equation above for the $\sigma^2_t$. That is, $\sigma^2_t = p_g q_g\, f_t$. Rearranged, this also reveals that $f_t = \sigma^2_t / (p_g q_g)$. The two principal threads of Binomial gamete sampling are thus cemented together, and are directly inter-changeable.
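A brief sketch ties these relationships together for one gamodeme size. The function names and the choice 2N = 10 over twenty generations are illustrative assumptions; the equations are those given above.

```python
def inbreeding_series(two_n, generations, f0=0.0):
    """f_t = delta_f + (1 - delta_f) * f_(t-1), returned as [f_0, f_1, ..., f_generations]."""
    delta_f = 1 / two_n
    f = [f0]
    for _ in range(generations):
        f.append(delta_f + (1 - delta_f) * f[-1])
    return f


def inbreeding_closed(two_n, t, f0=0.0):
    """f_t = 1 - (1 - delta_f)^t * (1 - f_0)."""
    return 1 - (1 - 1 / two_n) ** t * (1 - f0)


f = inbreeding_series(10, 20)
panmictic_index = [1 - x for x in f]   # pi_t = 1 - f_t

# delta_f recovered from successive panmictic indices: 1 - pi_t / pi_(t-1)
recovered = [1 - panmictic_index[t] / panmictic_index[t - 1] for t in range(1, len(f))]

print(round(f[1], 4), round(f[2], 4), round(f[20], 4))
print(round(inbreeding_closed(10, 20), 4))
print(round(min(recovered), 6), round(max(recovered), 6))  # both equal delta_f = 0.1
```

Multiplying any $f_t$ from this series by $p_g q_g$ gives the corresponding $\sigma^2_t$, which is the inter-changeability noted in the text.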

##### Selfing within Random fertilization

It is easy to overlook that self fertilization is intrinsically included as a part of random fertilization. Sewall Wright showed that a proportion 1/N of random fertilizations is actually self fertilization, with the remainder (N-1)/N being cross fertilization. Following path analysis and simplification, the divided random fertilization inbreeding was found to be $f_t = \text{selfing}_t + \text{crossing}_t = \Delta f\,(1 + f_{t-1}) + [(N-1)/N]\, f_{t-1}$.[26][35] Upon further rearrangement, the earlier results from the Binomial sampling were confirmed, along with some new arrangements. Two of these were potentially very useful, namely $f_t = \Delta f\,[1 + f_{t-1}(2N - 1)]$; and further $f_t = \Delta f\,(1 - f_{t-1}) + f_{t-1}$.

The insight provided by this leads to some issues about the use of the inbreeding coefficient for Binomial sampling random fertilization. Clearly, then, it is inappropriate for any species incapable of self fertilization, which includes plants with self-incompatibility mechanisms, dioecious plants, and bisexual animals (including, of course, mammals). The method developed by Wright was modified to develop a random fertilization inbreeding equation which involved only cross fertilization without any self fertilization. The proportion 1/N formerly due to selfing now defined the carry-over inbreeding arising from the cycle previous to the current cycle. The "inbreeding for cross fertilization only" final result was $f_t = f_{t-1} + \Delta f\,(1 + f_{t-2} - 2 f_{t-1})$.[11]:166 The accompanying graphs (random fertilization compared to cross-fertilization) depict the differences between standard random fertilization (based on Binomial sampling, which includes selfing), RF, and Binomial sampling adjusted for "cross fertilization alone", CF. As can be seen, the issue is non-trivial for small gamodeme sample sizes.
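The two recurrences can be compared side by side in a short sketch. The function names are illustrative, and the starting values (no inbreeding before the first cycle, so the two earliest terms of the CF recurrence are both 0) are an assumption made here for illustration; the source does not state the initial conditions behind its graphs.

```python
def rf_inbreeding(two_n, generations):
    """Random fertilization (selfing included): f_t = delta_f + (1 - delta_f) * f_(t-1)."""
    delta_f = 1 / two_n
    f = [0.0]
    for _ in range(generations):
        f.append(delta_f + (1 - delta_f) * f[-1])
    return f[1:]


def cf_inbreeding(two_n, generations):
    """Cross fertilization only: f_t = f_(t-1) + delta_f * (1 + f_(t-2) - 2 * f_(t-1)).

    Assumed start: f_(t-1) = f_(t-2) = 0 before the first cycle.
    """
    delta_f = 1 / two_n
    f = [0.0, 0.0]
    for _ in range(generations):
        f.append(f[-1] + delta_f * (1 + f[-2] - 2 * f[-1]))
    return f[2:]


for two_n in (10, 100):
    rf = rf_inbreeding(two_n, 20)
    cf = cf_inbreeding(two_n, 20)
    # Gap between the two methods after twenty generations
    print(two_n, round(rf[-1], 4), round(cf[-1], 4), round(rf[-1] - cf[-1], 4))
```

Under these assumptions the CF values lag behind the RF values from the second generation onward, and the gap is larger for the smaller gamodeme.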

It now is necessary to note that not only is "panmixia" not a synonym for "random fertilization", but also that "random fertilization" is not a synonym for "cross fertilization".

#### Homozygosity and Heterozygosity

In the section on the "Sample gamodemes - Genetic drift", a series of gamete samplings was followed. An important result of this sampling was that homozygosity rose at the expense of heterozygosity. This is one view of the genotype frequencies following the inbreeding attendant upon the sampling. Another view is linked to the definition of the inbreeding coefficient, and examines homozygotes according to whether they arose as allozygotes or autozygotes. Recall that autozygous alleles have the same allelic origin, the likelihood (frequency) of which is the inbreeding coefficient (f) by definition. The proportion arising allozygously is therefore (1-f). For the A-bearing gametes, which are present with a general frequency of p, the overall frequency of those which are autozygous is therefore (f p). Similarly, for a-bearing gametes, the autozygous frequency is (f q). [Remember that the issue of auto/allo -zygosity can arise only for homologous alleles (that is A and A, or a and a), and not for non-homologous alleles (A and a), which cannot possibly have the same allelic origin.] These two viewpoints regarding genotype frequencies need to be connected in order to establish consistency.

Following firstly the auto/allo-zygosity viewpoint, consider the allozygous component. This occurs with the frequency of $(1-f)$, and the alleles unite according to the random fertilization quadratic expansion. Thus:- $(1-f)\,(p_0 + q_0)^2 = (1-f)\,(p_0^2 + q_0^2) + (1-f)\,(2 p_0 q_0)$. Consider next the autozygous component. As these alleles are autozygous (same allelic origin), they are effectively selfings, and produce either AA or aa genotypes, but no heterozygotes. They therefore produce $f p_0$ and $f q_0$ genotypes, for AA and aa respectively. Adding the two components together results in:- $(1-f)\,p_0^2 + f p_0$ for the AA homozygote; $(1-f)\,q_0^2 + f q_0$ for the aa homozygote; and $(1-f)\,2 p_0 q_0$ for the Aa heterozygote.[11]:65[12] This is the same equation presented earlier in the section on "Self fertilization - an alternative". The reason for the decline in heterozygosity is made clear here. Heterozygotes can arise only from the allozygous component, and its frequency in the mix is just $(1-f)$ - the same as the frequency used for the heterozygotes.

Secondly, the sampling rise/fall viewpoint is followed. The decline in heterozygotes, $f\,(2 p_0 q_0)$, is distributed equally towards each homozygote and added to their basic random fertilization expectations. Therefore, the frequencies are:- $p_0^2 + f p_0 q_0$ for the AA homozygote; $q_0^2 + f p_0 q_0$ for the aa homozygote; and $2 p_0 q_0 - f\,(2 p_0 q_0)$ for the heterozygote.

Thirdly, the consistency proof begins with the AA homozygote's final expression in the auto/allo-zygosity paragraph above, $(1-f)\,p_0^2 + f p_0$. Expand the parentheses and gather the two terms containing the common factor $f$, giving $p_0^2 - f\,(p_0^2 - p_0)$. Next, substitute $(1-q_0)$ for one of the $p_0$ in the $p_0^2$ inside that parenthesis, so that it becomes $(1-q_0)\,p_0$; the parenthesis then simplifies to $p_0 - p_0 q_0 - p_0 = -p_0 q_0$. The end result is $p_0^2 + f p_0 q_0$, which is exactly the result for AA in the sampling rise/fall paragraph. The two viewpoints are therefore consistent for the AA homozygote. In a like manner, the consistency of the aa viewpoints can also be proven. For the heterozygote, the two viewpoints are in agreement from the beginning.
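
The consistency of the two viewpoints can also be confirmed numerically. The sketch below is a minimal illustration in Python; the function names are hypothetical, and p stands for the base allele frequency written as p0 in the text.

```python
def freqs_auto_allo(p, f):
    """Genotype frequencies (AA, Aa, aa) from the auto/allo-zygosity viewpoint."""
    q = 1 - p
    return ((1 - f) * p * p + f * p,
            (1 - f) * 2 * p * q,
            (1 - f) * q * q + f * q)


def freqs_rise_fall(p, f):
    """Genotype frequencies (AA, Aa, aa) from the sampling rise/fall viewpoint."""
    q = 1 - p
    return (p * p + f * p * q,
            2 * p * q - f * 2 * p * q,
            q * q + f * p * q)


if __name__ == "__main__":
    for p in (0.1, 0.3, 0.5, 0.8):
        for f in (0.0, 0.25, 1.0):
            print(p, f, freqs_auto_allo(p, f), freqs_rise_fall(p, f))
```

For every combination of p and f the two functions should return the same three frequencies, and those frequencies should sum to one.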

### Allele shuffling - allele Substitution

The gene-model examines the heredity pathway from the point of view of "inputs" (alleles/gametes) and "outputs" (genotypes/zygotes), with fertilization being the "process" converting one to the other. An alternative viewpoint concentrates on the "process" itself, and considers the zygote genotypes as arising from allele shuffling. In particular, it regards the results as if one allele had "substituted" for the other during the shuffle, together with a residual which deviates from this view. This formed an integral part of Fisher's method,[6] in addition to his use of frequencies and effects to generate his genetical statistics.[12]

[Figure: Analysis of Allele Substitution.]

A discursive derivation of the Allele Substitution alternative follows.[12]:113 Suppose that the usual random fertilization of gametes in a "base" gamodeme - consisting of p gametes (A) and q gametes (a) - is replaced by fertilization with a "flood" of gametes all containing a single allele (A or a, but not both). The zygotic results can be interpreted in terms of the "flood" allele having "substituted for" the alternative allele in the underlying "base" gamodeme. The diagram assists in following this viewpoint: the upper part pictures an A substitution, while the lower part shows an a substitution. (The diagram's "RF allele" is the allele in the "base" gamodeme.)

Consider the upper part firstly. Because base A is present with a frequency of p, the substitute A will fertilize it with a frequency of p, resulting in a zygote AA with an allele effect of a. Its contribution to the outcome will therefore be the product $pa$. Similarly, when the substitute fertilizes base a (resulting in Aa with a frequency of q and heterozygote effect of d), the contribution will be $qd$. The overall result of substitution by A will therefore be $pa + qd$. This is now oriented towards the population mean (see earlier section) by expressing it as a deviate from that mean: $(pa + qd) - G$. After some algebraic simplification, this becomes $\alpha_A = q\,[a + (q-p)d]$, the substitution effect of A.

A parallel reasoning can be applied to the lower part of the diagram, taking care with the differences in frequencies and gene effects. The result is the substitution effect of a, which is $\alpha_a = -p\,[a + (q-p)d]$.

The common factor inside the brackets is known as the average allele substitution effect, and is usually given as α = a + (q-p)d.[12]:113 It can be derived also in a more direct way, but the result is the same.
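
The "flood" argument can be traced step by step in a short Python sketch. It takes the random-fertilization population mean from the earlier section to be G = a(p-q) + 2pqd; the function names and the example values of p, a and d are illustrative assumptions.

```python
def population_mean(p, a, d):
    """Random-fertilization mean, taken here as G = a(p - q) + 2pqd."""
    q = 1 - p
    return a * (p - q) + 2 * p * q * d


def substitution_effects(p, a, d):
    """Return (alpha_A, alpha_a, alpha) built directly from the flood argument."""
    q = 1 - p
    g = population_mean(p, a, d)
    # Flood of A: meets base A (freq p, zygote AA, effect a)
    #             and base a (freq q, zygote Aa, effect d).
    alpha_big_a = (p * a + q * d) - g
    # Flood of a: meets base A (freq p, zygote Aa, effect d)
    #             and base a (freq q, zygote aa, effect -a).
    alpha_small_a = (p * d + q * (-a)) - g
    alpha = a + (q - p) * d  # average allele substitution effect
    return alpha_big_a, alpha_small_a, alpha


if __name__ == "__main__":
    p, a, d = 0.3, 2.0, 1.0  # illustrative values only
    alpha_big_a, alpha_small_a, alpha = substitution_effects(p, a, d)
    print(alpha_big_a, (1 - p) * alpha)   # both should equal q * alpha
    print(alpha_small_a, -p * alpha)      # both should equal -p * alpha
```

The difference between the two substitution effects returned by the function equals α itself, which is one way of reading the name "average allele substitution effect".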

In subsequent sections, these substitution effects will be used to define the gene-model genotypes as consisting of a partition predicted by these new effects (substitution expectations), and a deviation between these expectations and the previous gene-model effect. The expectations are also known as the breeding value, and the deviations are also known as the dominance deviations. Ultimately, the variance arising from the substitution expectations will become the Additive Genetic variance ($\sigma^2_A$)[12] (also the Genic variance[36]); while that arising from the deviations becomes the Dominance variance ($\sigma^2_D$).

## Extended principles

### Gene effects redefined

The gene-model effects (a, d and -a) will shortly be needed in the derivation of the deviations from substitution expectations (δ), which were first discussed in the previous Allele Substitution section. However, they need to be redefined before they become useful in that exercise. Firstly, they need to be re-centralized around the population mean (G); secondly, they need to be re-arranged as functions of α, the average allele substitution effect.

The re-centralized effect for AA, therefore, is a′ = a - G which, after simplification, becomes a′ = 2q(a-pd). The similar effect for Aa is d′ = d - G = a(q-p) + d(1-2pq), after simplification. Finally, the re-centralized effect for aa is (-a)′ = -2p(a+qd).[12]:116–119

These re-centralized effects eventually will have the genotype substitution expectations (see next section) subtracted from them in order to define subsequently the genotype substitution deviations. These expectations each are a function of the average allele substitution effect, and the present re-centralized effects have to be re-arranged still further to accommodate this last subtraction. Recalling that α = [a +(q-p)d], rearrangement gives a = [α -(q-p)d]. After substituting this for a in a′ and simplifying, the final version becomes a′′ = 2q(α-qd). Similarly, d′ becomes d′′ = α(q-p) + 2pqd; and (-a)′ becomes (-a)′′ = -2p(α+pd).[12]:118

### Genotype substitution - Expectations and Deviations

The zygote genotypes are the target of all this preparation. The homozygous genotype AA is a union of two substitution effects of A, one coming from each sex. Its substitution expectation is therefore $\alpha_{AA} = 2\alpha_A = 2q\alpha$ (see previous sections). Similarly, the substitution expectation of Aa is $\alpha_{Aa} = \alpha_A + \alpha_a = (q-p)\alpha$; and for aa, $\alpha_{aa} = 2\alpha_a = -2p\alpha$. These substitution expectations of the genotypes are known also as "Breeding values".[12]:114–116

The substitution deviations are the differences between these expectations and the gene effects after their two-stage redefinition in the previous section. Therefore, $\delta_{AA} = a'' - \alpha_{AA} = -2q^2 d$ after simplification. Similarly, $\delta_{Aa} = d'' - \alpha_{Aa} = 2pq\,d$ after simplification. Finally, $\delta_{aa} = (-a)'' - \alpha_{aa} = -2p^2 d$ after simplification.[12]:116–119
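
The expectations and deviations can be reproduced numerically from the re-centralized gene effects. The following Python sketch is an illustration only: the function name and dictionary keys are hypothetical, and the population mean is again taken as G = a(p-q) + 2pqd.

```python
def expectations_and_deviations(p, a, d):
    """Breeding values and dominance deviations for genotypes AA, Aa and aa."""
    q = 1 - p
    g = a * (p - q) + 2 * p * q * d   # population mean G
    alpha = a + (q - p) * d           # average allele substitution effect
    # Gene effects re-centralized around the population mean.
    centred = {"AA": a - g, "Aa": d - g, "aa": -a - g}
    # Genotype substitution expectations (breeding values).
    expectation = {"AA": 2 * q * alpha,
                   "Aa": (q - p) * alpha,
                   "aa": -2 * p * alpha}
    # Deviation = re-centralized effect minus substitution expectation.
    deviation = {k: centred[k] - expectation[k] for k in centred}
    return expectation, deviation


if __name__ == "__main__":
    p, a, d = 0.3, 2.0, 1.0  # illustrative values only
    expectation, deviation = expectations_and_deviations(p, a, d)
    print(expectation)
    print(deviation)
```

The deviations returned should match the closed forms $-2q^2 d$, $2pq\,d$ and $-2p^2 d$, and both sets of values should average to zero when weighted by the genotype frequencies $p^2$, $2pq$ and $q^2$.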

The genotype substitution expectations will give rise ultimately to $\sigma^2_A$, and the genotype substitution deviations will give rise to $\sigma^2_D$.

### Genotypic variance

There are two major approaches to defining and partitioning the Genotypic variance: one is based on the gene-model effects,[36] while the other is based on the genotype substitution effects.[12] They are algebraically inter-convertible with each other.[34] In this section, the basic random fertilization derivation is considered, with the effects of inbreeding and dispersion set aside. This will be dealt with later in order to arrive at a more general solution. Until this mono-genic treatment is replaced by a multi-genic one, and until epistasis is resolved in the light of the findings of epigenetics, the Genotypic variance will have only the components considered here.

#### Gene-model approach - Mather Jinks Hayman

[Figure: Components of Genotypic variance using the gene-model effects.]

It is convenient to follow the Biometrical approach : which is based on correcting the unadjusted sum of squares (USS) by subtracting the correction factor (CF). Because all of our effects have been examined through frequencies, the USS can be obtained as the sum of the products of each genotype's frequency and the square of its gene-effect. The CF in this particular case is the mean squared. The result is the sum of squares (SS), which, again because of the use of frequencies, is also immediately the variance.[7]

The USS = $p^2 a^2 + 2pq\,d^2 + q^2 (-a)^2$, and the CF = $G^2$. The SS = USS - CF = $\sigma^2_G$.

After partial simplification,

$\sigma^2_G = 2pq\,a^2 + (q-p)\,4pq\,ad + 2pq\,d^2 - (2pq)^2 d^2 = \sigma^2_a + (\text{weighted\_covariance})_{ad} + \sigma^2_d - \sigma^2_D = \tfrac{1}{2}D + \tfrac{1}{2}F' + \tfrac{1}{2}H_1 - \tfrac{1}{4}H_2$ in Mather's terminology.[36]:212 [37]

Here, $\sigma^2_a$ represents the homozygote or allelic variance, and $\sigma^2_d$ represents the heterozygote or gene-model dominance variance. The random-fertilization dominance variance ($\sigma^2_D$) is present also. These components are plotted across all values of p in the accompanying Figure. Notice that the (weighted_covariance)ad[38] (hereafter abbreviated to $\text{cov}_{ad}$) is negative for $p > 0.5$.

Further gathering of terms leads to $\tfrac{1}{2}D + \tfrac{1}{2}F' + \tfrac{1}{2}H_3 + \tfrac{1}{4}H_2$, where $\tfrac{1}{2}H_3 = (q-p)^2\,\tfrac{1}{2}H_1 = (q-p)^2\,2pq\,d^2$. It will be useful later in Diallel analysis, which is an experimental design for estimating these genetical statistics.[39]

If, following the last-given rearrangements, the first three terms are amalgamated together, rearranged further and simplified, the result is the variance of the Fisherian substitution expectation. That is: $\sigma^2_A = \sigma^2_a + \text{cov}_{ad} + (q-p)^2 \sigma^2_d$, where the last term is the $\tfrac{1}{2}H_3$ of the previous paragraph: a revealing insight indeed. Notice particularly that $\sigma^2_A$ is not $\sigma^2_a$.[40] Notice also that $\sigma^2_D = 2pq\,\sigma^2_d$. From the Figure, this can be visualized as accumulating $\sigma^2_a$, the weighted $\sigma^2_d$ and $\text{cov}_{ad}$ to obtain $\sigma^2_A$, while leaving the $\sigma^2_D$ still separated. It is clear also in the Figure that $\sigma^2_D < \sigma^2_d$, as expected from the equations.

[Figure: Components of Genotypic variance using the allele-substitution effects.]

The overall result is $\sigma^2_G = 2pq\,[a+(q-p)d]^2 + (2pq)^2 d^2 = \sigma^2_A + \sigma^2_D$.

However, its derivation via the substitution effects themselves will be given also, in the next section.

#### Allele-substitution approach - Fisher

Reference to the several earlier sections on allele substitution reveals that the two ultimate effects are genotype substitution expectations and genotype substitution deviations. Notice that these are each defined already as deviations from the random fertilization population mean (G). For each genotype in turn, the product of the frequency and the square of the relevant effect is obtained, and these are accumulated to obtain directly a SS and σ2. Details follow.

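$\sigma^2_A = p^2 \alpha_{AA}^2 + 2pq\,\alpha_{Aa}^2 + q^2 \alpha_{aa}^2$, which simplifies to $\sigma^2_A = 2pq\,\alpha^2$.

$\sigma^2_D = p^2 \delta_{AA}^2 + 2pq\,\delta_{Aa}^2 + q^2 \delta_{aa}^2$, which simplifies to $\sigma^2_D = (2pq)^2 d^2$.

Once again, $\sigma^2_G = \sigma^2_A + \sigma^2_D$.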
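
The partition can be checked against a direct calculation of the variance from genotype frequencies and gene effects (USS minus CF). This Python sketch is a minimal illustration; the function name and the example values are assumptions made for the demonstration.

```python
def genotypic_variance_components(p, a, d):
    """Return (var_G, var_A, var_D) for one locus under random fertilization."""
    q = 1 - p
    freqs = (p * p, 2 * p * q, q * q)   # AA, Aa, aa
    effects = (a, d, -a)                # gene-model effects
    mean = sum(fr * e for fr, e in zip(freqs, effects))
    uss = sum(fr * e * e for fr, e in zip(freqs, effects))  # unadjusted SS
    var_g = uss - mean ** 2             # USS - CF, with CF = mean squared
    alpha = a + (q - p) * d             # average allele substitution effect
    var_a = 2 * p * q * alpha ** 2      # additive (genic) variance
    var_d = (2 * p * q * d) ** 2        # dominance variance
    return var_g, var_a, var_d


if __name__ == "__main__":
    for p in (0.1, 0.3, 0.5, 0.7, 0.9):
        var_g, var_a, var_d = genotypic_variance_components(p, 1.0, 1.0)
        print(p, round(var_g, 6), round(var_a + var_d, 6))
```

For any allele frequency the directly computed variance should equal the sum of the two components; with p = 0.5 and a = d = 1, for instance, the total is 0.75, made up of 0.5 additive and 0.25 dominance variance.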

Note that this allele-substitution approach defined the components separately, and then totaled them to obtain the final Genotypic variance. Conversely, the gene-model approach derived the whole situation (components and total) as one exercise. Bonuses arising from this were (a) the revelations about the real structure of σ2A, and (b) the relative sizes of σ2d and σ2D (see previous sub-section). It is also apparent that a "Mather" analysis is more informative, and that a "Fisher" analysis can always be constructed from it. The opposite conversion is not possible, however, because information about covad would be missing.

### Other fertilization patterns

[Figure: Spatial fertilization patterns.]

In previous sections, dispersive random fertilization (genetic drift) has been considered comprehensively, and self-fertilization and hybridizing have been examined to varying degrees. The diagram to the left depicts the first two of these, along with another "spatially based" pattern: Islands. This is a pattern of random fertilization featuring dispersed gamodemes, with the addition of "overlaps" in which non-dispersive random fertilization occurs. With the Islands pattern, individual gamodeme sizes (2N) are observable, and overlaps (m) are minimal. This is one of Sewall Wright's array of possibilities.[35] In addition to "spatially" based patterns of fertilization, there are others based on either "phenotypic" or "relationship" criteria. The phenotypic bases include assortative fertilization (between similar phenotypes) and disassortative fertilization (between opposite phenotypes). The relationship patterns include sib crossing, cousin crossing and backcrossing, and are considered in a separate section. Self fertilization may be considered both from a spatial and from a relationship point of view.

[Figure: "Islands" random fertilization.]

#### "Islands" random fertilization

The breeding population consists of s small dispersed random fertilization gamodemes of sample size $2N_k$ ($k = 1 \ldots s$) with "overlaps" of proportion $m_k$ in which non-dispersive random fertilization occurs. The dispersive proportion is thus $(1 - m_k)$. The bulk population consists of weighted averages of sample sizes, allele and genotype frequencies and progeny means, as was done for genetic drift in an earlier section. However, each gamete sample size is reduced to allow for the overlaps, thus finding a $2N_k$ effective for $(1 - m_k)$.

For brevity, the argument will be followed further with the subscripts omitted. Recall that $1/(2N)$ is $\Delta f$. Therefore, ${}^{\text{Islands}}\Delta f = (1-m)^2 / [2N - (2N-1)\,m^2]$. Notice that when $m = 0$ this reduces to the previous $\Delta f$. This is substituted into the inbreeding coefficient to obtain ${}^{\text{Islands}}f_t = {}^{\text{Islands}}\Delta f_t + (1 - {}^{\text{Islands}}\Delta f_t)\;{}^{\text{Islands}}f_{t-1}$, where t is the index over generations, as before. The effective overlap proportion can be obtained also, as $m_t = 1 - [\,2N\;{}^{\text{Islands}}\Delta f_t \,/\, ((2N-1)\;{}^{\text{Islands}}\Delta f_t + 1)\,]^{1/2}$. Here, the 2N refers to the un-reduced sample size, not the Islands adjustment.

The graphs to the right show the inbreeding for a gamodeme size of 2N = 50 for ordinary dispersed random fertilization (RF) (m=0), and for four overlap levels (m = 0.0625, 0.125, 0.25, 0.5) of Islands random fertilization. There has indeed been reduction in the inbreeding resulting from the non-dispersed random fertilization in the overlaps: it is particularly notable as m approaches 0.50. Sewall Wright suggested that this value should be the limit for the use of this approach.

### Dispersion and the Genotypic Variance

In the section on Genetic drift , and in other sections where Inbreeding has been discussed, a major outcome resulting from the allele frequency sampling has been the dispersion of progeny means. This collection of means has its own average, and also will have a variance - the amongst-line variance. (This is a variance of the attribute itself, not of allele frequencies.) As dispersion develops further over succeeding generations, this amongst-line variance would be expected to increase. Conversely, as homozygosity rises, the variance within-lines would be expected to decrease. The question arises therefore as to whether the total variance is changing, and, if so, in what direction. These issues are examined for both the genic (σ2A ) and the dominance ( σ2D ) variances.

[Figure: Dispersion and components of the Genotypic variance.]

The crucial overview equation comes from Sewall Wright,[11] :99 & 130[35] and is the outline of the inbred genotypic variance based on a weighted average of its extremes, the weights being quadratic with respect to the inbreeding coefficient ( f ). This equation is:-

$\sigma^2_{G(f)} = (1-f)\,\sigma^2_{G(0)} + f\,\sigma^2_{G(1)} + f\,(1-f)\,[G_0 - G_1]^2$,

where f is the inbreeding coefficient, $\sigma^2_{G(0)}$ is the genotypic variance at $f=0$, $\sigma^2_{G(1)}$ is the genotypic variance at $f=1$, $G_0$ is the population mean at $f=0$, and $G_1$ is the population mean at $f=1$. The $(1-f)$ component concerns the reduction of variance within progeny lines; the $f$ component addresses the increase in variance amongst progeny lines; while the $f\,(1-f)$ component will be seen (in the next line) to address a part of the dominance variance.[11] :99 & 130 These components can be expanded to reveal further insight.

Thus:- $\sigma^2_{G(f)} = (1-f)\,[\sigma^2_{A(0)} + \sigma^2_{D(0)}] + f\,(4pq\,a^2) + f\,(1-f)\,(2pq\,d)^2$.

In the first component, $\sigma^2_{G(0)}$ has been expanded to show its two variance components as previously defined. The $\sigma^2_{G(1)}$ in the second component is $4pq\,a^2$ [which, recall, equals $2\sigma^2_a$] and will be derived shortly. The third component's substitution is the result of the subtraction between the two "inbreeding extremes" of the population means (see section on the "Population Mean").[34]

Summarising therefore gives:- the within-line components are $(1-f)\,\sigma^2_{A(0)}$ and $(1-f)\,\sigma^2_{D(0)}$; and the amongst-line components are $2f\,\sigma^2_{a(0)}$ and $(f - f^2)\,\sigma^2_{D(0)}$.[34]

The total Genic variance (Additive Genetic variance) is thus $(1-f)\,\sigma^2_{A(0)} + 2f\,\sigma^2_{a(0)} = (1+f)\,\sigma^2_{A(f)}$, where the $\sigma^2_{A(f)}$ will be discussed shortly in a sub-section. Similarly, the total Dominance variance is thus $(1-f)\,\sigma^2_{D(0)} + (f - f^2)\,\sigma^2_{D(0)} = (1-f^2)\,\sigma^2_{D(0)}$. Graphs to the left show these three Genic variances, together with the three Dominance variances, across all values of f, for p = 0.5 (at which the dominance variance is at a maximum). Graphs to the right show the Genotypic variance partitions (being the sums of the respective genic and dominance partitions) changing over ten generations with an example f = 0.10.

[Figure: Development of Variance Dispersion.]

Answering, firstly, the questions posed at the beginning about the total variances [the Σ in the graphs]: the Genic variance rises linearly with the inbreeding coefficient, maximizing at twice its starting level; the Dominance variance declines at the rate of $(1 - f^2)$ [and therefore declines only slowly at low levels of inbreeding] until it finishes at zero. Now, notice the other trends. It is probably intuitive that the within line variances decline to zero with continued inbreeding, and this is seen to be the case [both at the same linear rate $(1-f)$]. The amongst line variances both increase with inbreeding up to $f = 0.5$, the Genic variance at the rate of $2f$, and the Dominance variance at the rate of $(f - f^2)$. For $f > 0.5$, however, the trends change. The amongst line Genic variance continues its linear increase until it equals the total Genic variance. But the amongst line Dominance variance now declines towards zero, because $(f - f^2)$ also declines for $f > 0.5$.[34]
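
The within-line and amongst-line partitions can be tabulated against Sewall Wright's overview equation with a brief Python sketch. The function name, the dictionary keys and the example values are illustrative assumptions; the formulas are those given in the text above, with the means taken as G0 = a(p-q) + 2pqd and G1 = a(p-q).

```python
def dispersed_variances(p, a, d, f):
    """Partition the inbred genotypic variance at inbreeding coefficient f."""
    q = 1 - p
    alpha = a + (q - p) * d
    var_a0 = 2 * p * q * a * a            # allelic variance (gene model)
    var_A0 = 2 * p * q * alpha ** 2       # additive variance at f = 0
    var_D0 = (2 * p * q * d) ** 2         # dominance variance at f = 0
    g0 = a * (p - q) + 2 * p * q * d      # mean at f = 0
    g1 = a * (p - q)                      # mean at f = 1
    var_G1 = 4 * p * q * a * a            # variance amongst fully inbred lines
    wright = ((1 - f) * (var_A0 + var_D0) + f * var_G1
              + f * (1 - f) * (g0 - g1) ** 2)
    return {
        "wright_total": wright,
        "within_genic": (1 - f) * var_A0,
        "within_dominance": (1 - f) * var_D0,
        "amongst_genic": 2 * f * var_a0,
        "amongst_dominance": (f - f * f) * var_D0,
        "total_genic": (1 - f) * var_A0 + 2 * f * var_a0,
        "total_dominance": (1 - f * f) * var_D0,
    }


if __name__ == "__main__":
    for f in (0.0, 0.25, 0.5, 0.75, 1.0):
        v = dispersed_variances(0.5, 1.0, 1.0, f)  # illustrative values
        print(f, {k: round(x, 4) for k, x in v.items()})
```

At every f the four partitions should add up to the total from Wright's equation, and the amongst-line dominance term should peak at f = 0.5 before returning to zero.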

#### Derivation of σ2G(1)

Recall that when $f=1$, heterozygosity is zero, within-line variance is zero, and all genotypic variance is thus amongst-line variance and devoid of dominance variance. In other words, $\sigma^2_{G(1)}$ is the variance amongst fully inbred line means. Recall further (from "The mean after self-fertilization" section) that such means ($G_1$'s, in fact) are $G = a(p-q)$. Substituting $(1-q)$ for the $p$ gives $G_1 = a(1-2q) = a - 2aq$.[12]:265 Therefore, $\sigma^2_{G(1)}$ is actually $\sigma^2_{(a-2aq)}$. Now, in general, the variance of a difference $(x-y)$ is $\sigma^2_x + \sigma^2_y - 2\,\text{cov}_{xy}$.[41]:100[42] :232 Therefore, $\sigma^2_{G(1)} = \sigma^2_a + \sigma^2_{2aq} - 2\,\text{cov}_{(a,\,2aq)}$. But a (an allele effect) and q (an allele frequency) are independent, so this covariance is zero. Furthermore, a is a constant from one line to the next, so $\sigma^2_a$ is zero also (within this sentence, $\sigma^2_a$ denotes the variance of the constant a itself, not the allelic variance defined earlier). And still further, 2a is another constant (k), so the $\sigma^2_{2aq}$ is of the type $\sigma^2_{kX}$. In general, the variance $\sigma^2_{kX}$ is equal to $k^2 \sigma^2_X$.[42]:232 Putting all this together reveals that $\sigma^2_{(a-2aq)} = (2a)^2 \sigma^2_q$. Recall (from the section on "Continued genetic drift") that $\sigma^2_q = pq\,f$. With $f=1$ here within this present derivation, this becomes $pq$, and it is substituted into the previous expression.

The final result is:- $\sigma^2_{G(1)} = \sigma^2_{(a-2aq)} = 4a^2 pq = 2\,(2pq\,a^2) = 2\,\sigma^2_a$.

It follows immediately that $f\,\sigma^2_{G(1)} = 2f\,\sigma^2_a$. [This last f comes from the initial Sewall Wright equation: it is not the f just set to "1" in the derivation concluded two lines above.]

#### Total dispersed Genic variance - σ2A(f) and αf

From the previous sections, it was found that the within line genic variance was based upon the Allele Substitution "Additive genetic variance" ($\sigma^2_A$); but the amongst line genic variance was based upon the Gene Model "Allele variance" ($\sigma^2_a$). These two cannot simply be added together to obtain the total Genic variance. [This was not a difficulty for the Dominance variances because all components referred to the same base, $\sigma^2_D$.] One approach to avoiding this problem was to re-visit the derivation of the "average allele substitution effect", and to construct a version ($\alpha_f$) which would incorporate the effects of the dispersion. Crow and Kimura achieved this[11] :130–131 using the re-centered allele effects (a′, d′, (-a)′) discussed previously ["Gene effects redefined"]. However, this was found subsequently to under-estimate slightly the total Genic variance, and a new variance-based derivation led to a refined version.[34]

The refined version is:- $\alpha_f = \{\, a^2 + [(1-f)/(1+f)]\,2(q-p)\,ad + [(1-f)/(1+f)]\,(q-p)^2 d^2 \,\}^{1/2}$

Consequently, $\sigma^2_{A(f)} = (1+f)\,2pq\,\alpha_f^2$ does now agree with $(1-f)\,\sigma^2_{A(0)} + 2f\,\sigma^2_{a(0)}$ exactly.
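
The stated agreement can be verified numerically. The Python sketch below is a minimal check under illustrative parameter values; the function names are hypothetical, and the formulas are the refined $\alpha_f$ and the two genic-variance expressions quoted above.

```python
def alpha_dispersed(p, a, d, f):
    """Refined substitution effect alpha_f incorporating dispersion."""
    q = 1 - p
    w = (1 - f) / (1 + f)
    return (a * a + w * 2 * (q - p) * a * d + w * (q - p) ** 2 * d * d) ** 0.5


def total_genic_two_ways(p, a, d, f):
    """Return the total genic variance via alpha_f and via the partitions."""
    q = 1 - p
    alpha0 = a + (q - p) * d
    via_alpha_f = (1 + f) * 2 * p * q * alpha_dispersed(p, a, d, f) ** 2
    via_partitions = ((1 - f) * 2 * p * q * alpha0 ** 2   # (1-f) * var_A(0)
                      + 2 * f * 2 * p * q * a * a)        # 2f * var_a(0)
    return via_alpha_f, via_partitions


if __name__ == "__main__":
    for f in (0.0, 0.1, 0.5, 1.0):
        print(f, total_genic_two_ways(0.3, 1.0, 0.5, f))  # illustrative values
```

At f = 0 the refined effect reduces to the magnitude of the ordinary α, and at f = 1 it reduces to the magnitude of a.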

#### Total and partitioned dispersed Dominance variances

The total genic variance is of intrinsic interest in its own right. But it had another important use as well: it was subtracted from the inbred Genotypic variance equation of Sewall Wright[35] in order to provide an estimator for the (total) Dominance variance. An anomaly appeared, however, because the total dominance variance appeared to rise early in inbreeding despite the decline in heterozygosity.[12] :128 :266 Consequently, the de novo derivation [referred to above] refined the equation for $\alpha_f$.[34] At the same time, a direct solution for the total Dominance variance was obtained, thus avoiding the need for the "subtraction" method of previous times. Furthermore, by incorporating the expanded Sewall Wright equation, direct solutions for the dispersion partitions of the Dominance variance were obtained also.

### Environmental variance

The environmental variance is phenotypic variability which cannot be ascribed to genetics. This sounds simple, but the experimental design needed to separate the two needs very careful planning. Even the "external" environment can be divided into spatial and temporal components, as well as partitions such as "litter" or "family" and "culture" or "history". Where does epigenetic variance get placed? Is it embedded within epistasis: or is it "internal environment"? These components are very dependent upon the actual experimental model used to do the research. Such issues are very important when doing the research itself, but in this article on quantitative genetics this overview may suffice.

It is an appropriate place, however, for a summary:

Phenotypic variance = genotypic variances + environmental variances + genotype-environment interaction + experimental "error" variance

i.e. σ²P = σ²G + σ²E + σ²GE + σ²

or σ²P = σ²A + σ²D + σ²I + σ²E + σ²GE + σ²

after partitioning the genotypic variance (G) into the components of "additive" (A), "dominance" (D), and "epistatic" (I) variance mentioned above.[43]

### Heritability and repeatability

The heritability of a trait is the proportion of the total (phenotypic) variance (σ²P) that is explained by the total genotypic variance (σ²G). This is known as the "broad sense" heritability (H²).[44] If only additive genetic variance (σ²A) is used in the numerator, the heritability is called "narrow sense" (h²).

The broad sense heritability indicates the proportion of the phenotypic variance due to the whole genotypical variance. In colloquial terms, it indicates the extent of "nature" while (1-H²) indicates the extent of "nurture". Narrow sense heritability indicates the proportion of the phenotypic variance attributable to the "additive" genetic variance, discussed above. It was pointed out there that this variance arises through substitution (i.e. phenotypic change) following fertilization. Fisher proposed that this narrow-sense heritability might be appropriate in considering the results of natural selection, focusing as it does on change-ability, and hence adaptation.[28] It has been used also for predicting generally the results of artificial selection. In the latter case, however, the broad sense heritability may be more appropriate, as the whole attribute is being altered: not just adaptive capacity. Generally, advance from selection is more rapid with higher heritability. In animals, heritability of reproductive traits is typically low, while heritability of disease resistance and production are moderately low to moderate, and heritability of body conformation is high.

Repeatability (r²) is the proportion of phenotypic variance attributable to differences in repeated measures of the same subject, arising from later records. It is used particularly for long-lived species. This value can only be determined for traits that manifest multiple times in the organism's lifetime, such as adult body mass, metabolic rate or litter size. Individual birth mass, for example, would not have a repeatability value: but it would have a heritability value. Generally, but not always, repeatability indicates the upper level of the heritability.[45]

r² = (σ²G + σ²PE)/σ²P

where σ²PE = phenotype-environment interaction ≡ repeatability.

The above concept of repeatability is, however, problematic for traits that necessarily change greatly between measurements. For example, body mass increases greatly in many organisms between birth and adult-hood. Nonetheless, within a given age range (or life-cycle stage), repeated measures could be done, and repeatability would be meaningful within that stage.
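
The ratios defined in this section can be sketched in a few lines of Python. The variance components used in the example are hypothetical numbers chosen only to make the arithmetic easy to follow; they are not estimates from any data set, and the function and argument names are likewise assumptions.

```python
def heritabilities(var_a, var_d, var_i, var_e, var_ge, var_error):
    """Broad-sense (H2) and narrow-sense (h2) heritability from components."""
    var_g = var_a + var_d + var_i                 # genotypic variance
    var_p = var_g + var_e + var_ge + var_error    # phenotypic variance
    return {"H2": var_g / var_p, "h2": var_a / var_p, "var_P": var_p}


def repeatability(var_g, var_pe, var_p):
    """r2 = (var_G + var_PE) / var_P, following the formula in the text."""
    return (var_g + var_pe) / var_p


if __name__ == "__main__":
    # Hypothetical components: A=4, D=1, I=0, E=4, GE=0, error=1.
    result = heritabilities(4.0, 1.0, 0.0, 4.0, 0.0, 1.0)
    print(result)
    print(repeatability(5.0, 2.0, result["var_P"]))
```

With these made-up components the phenotypic variance is 10, so the broad-sense heritability is 0.5 and the narrow-sense heritability is 0.4.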

## Relationship

[Figure: Connection between the inbreeding and co-ancestry coefficients.]

From the heredity perspective, relatives are individuals which have inherited genes from one or more common ancestors. Therefore, their "relationship" can be quantified on the basis of the probability that they each have inherited a copy of an allele which had its origin in the common ancestor. In earlier sections, the Inbreeding coefficient has been defined as "the probability that two same alleles (A and A, or a and a) have a common origin"; or, more formally, "the probability that two homologous alleles are autozygous". Previously, the emphasis was on an individual's likelihood of having two such alleles, and the coefficient was framed accordingly. It is obvious, however, that this probability of autozygosity for an individual must also be the probability that each of its two parents had this autozygous allele. In this re-focused form, the probability is called the co-ancestry coefficient for the two individuals i and j ($f_{ij}$). In this form, it can be used to quantify relationship between two individuals, and may also be known as the "coefficient of kinship" or the "consanguinity coefficient".[11]:132–143 [12]:82–92

### Pedigree analysis

[Figure: Illustrative pedigree.]

Pedigrees are diagrams which show the familial connections between individuals and their ancestors, and possibly between other members of the group sharing genetical inheritance with them. They are "relationship maps". A pedigree can be analyzed, therefore, to reveal coefficients of inbreeding and co-ancestry. Such pedigrees actually are informal depictions of path diagrams as used in path analysis, which was invented by Sewall Wright when he formulated his studies on inbreeding.[46]:266–298 Using the diagram to the left, the probability that individuals "B" and "C" have received autozygous alleles from ancestor "A" is 1/2 (one out of the two diploid alleles). This is the "de novo" inbreeding ($\Delta f_{Ped}$) at this step. However, the "other" allele may have had "carry-over" autozygosity from previous generations, so the probability of this occurring is (de novo complement multiplied by the inbreeding of ancestor A), that is $(1 - \Delta f_{Ped})\,f_A = (1/2)\,f_A$. Therefore, the total probability of autozygosity in B and C, following the bi-furcation of the pedigree, is the sum of these two components, namely $(1/2) + (1/2)\,f_A = (1/2)\,(1+f_A)$. This can be viewed as the probability that two random gametes from ancestor A bear autozygous alleles, and in that context is called the "coefficient of parentage" $= f_{AA}$.[11]:132–143 [12]:82–92 It will re-appear often in the following paragraphs.

Following the "B" path, the probability that any autozygous allele is "passed on" to each successive parent is again (1/2) at each step (including the last one to the "target" X). The overall probability of transfer down the "B path" is therefore $(1/2)^3$. The power that (1/2) is raised to can be viewed as "the number of intermediates in the path between A and X", $n_B = 3$. Similarly, for the "C path", $n_C = 2$, and the "transfer probability" is $(1/2)^2$. The combined probability of autozygous transfer from A to X is therefore $f_{AA}\,(1/2)^{n_B}\,(1/2)^{n_C}$. Recalling that $f_{AA} = (1/2)\,(1+f_A)$, $f_X = f_{PQ} = (1/2)^{(n_B + n_C + 1)}\,(1 + f_A)$. In this example, assuming that $f_A = 0$, $f_X = 0.0156$ (rounded) $= f_{PQ}$, one measure of the "relatedness" between P and Q.
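
The path calculation can be written as a one-line function. The sketch below is illustrative: the function and argument names are assumptions, and it covers only a single common ancestor reached by two paths, as in the example pedigree.

```python
def path_inbreeding(n_b, n_c, f_ancestor=0.0):
    """f_X = (1/2)^(n_B + n_C + 1) * (1 + f_A) for one common ancestor A.

    n_b and n_c are the numbers of intermediates on the two paths
    between ancestor A and the target individual X.
    """
    return 0.5 ** (n_b + n_c + 1) * (1 + f_ancestor)


if __name__ == "__main__":
    # The example pedigree in the text: n_B = 3, n_C = 2, f_A = 0.
    print(path_inbreeding(3, 2))        # 0.015625, i.e. 0.0156 rounded
    # The same pedigree with a fully inbred common ancestor (f_A = 1).
    print(path_inbreeding(3, 2, 1.0))   # 0.03125
```

An inbred common ancestor raises the result in proportion to $(1 + f_A)$, so a fully inbred ancestor doubles it.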

[Figure: Cross-multiplication rules.]

### Cross-multiplication rules

In the following sections on sib-crossing, and similar, a number of "averaging rules" will be found useful. These are derived from Path Analysis.[46] The rules show that any co-ancestry coefficient can be obtained as the average of "cross-over co-ancestries" between appropriate grand-parental and parental combinations. Thus, referring to the diagram to the right, Cross-Multiplier 1 is that $f_{PQ}$ = average of ($f_{AC}$, $f_{AD}$, $f_{BC}$, $f_{BD}$) $= (1/4)\,[f_{AC} + f_{AD} + f_{BC} + f_{BD}] = f_Y$ also. In a similar fashion, Cross-Multiplier 2 states that $f_{PC} = (1/2)\,[f_{AC} + f_{BC}]$; while Cross-Multiplier 3 states that $f_{PD} = (1/2)\,[f_{AD} + f_{BD}]$. Returning to the first Multiplier, it can now be seen also to be $f_{PQ} = (1/2)\,[f_{PC} + f_{PD}]$, which, after substituting Multipliers 2 and 3, resumes its original form.

In much of the following, the grand-parental generation will be referred to as (t-2) , the parent generation as (t-1) , and the "target" generation will be t.

### Full-Sib crossing (FS)

File:Inbreeding- Sibs.jpg
Inbreeding in sibling relationships
The diagram to the right shows that Full Sib crossing is a direct application of Cross-Multiplier 1 , with the slight modification that parents A and B are repeated (in lieu of C and D) in order to indicate that individuals P1 and P2 have both of their parents in common - that is they are full siblings . Individual Y is the result of the crossing of two full siblings. Therefore, fY = fP1,P2 = (1/4) [ fAA + 2 fAB + fBB ] . Recall that fAA and fBB were defined earlier (in Pedigree analysis) as coefficients of parentage , being equal to (1/2)[1+fA ] and (1/2)[1+fB ] respectively, in the present context. Recognize that, in this guise, the grandparents A and B represent generation (t-2) . Thus, assuming that in any one generation all levels of inbreeding are the same, these two coefficients of parentage each represent (1/2) [1 + f(t-2) ] .
File:Crossing Inbreeding.jpg
Inbreeding from Full-sib and Half-sib crossing, and from Selfing.
Now, examine $f_{AB}$. Recall that this also is $f_{P1}$ or $f_{P2}$, and so represents their generation, $f_{(t-1)}$. Putting it all together, $f_t = (1/4)[2 f_{AA} + 2 f_{AB}] = (1/4)[1 + f_{(t-2)} + 2 f_{(t-1)}]$. That is the inbreeding coefficient for Full-Sib crossing.[11]:132–143 [12]:82–92 The graph to the left shows the rate of this inbreeding over twenty repetitive generations. The "repetition" means that the progeny after cycle t become the crossing parents that generate cycle (t+1), and so on successively. The graphs also show the inbreeding for random fertilization 2N=20 for comparison. Recall that this inbreeding coefficient for progeny Y is also the co-ancestry coefficient for its parents, and so is a measure of the relatedness of the two Full siblings.
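The repeated full-sib recurrence lends itself to a short iteration. The following Python sketch assumes that inbreeding is zero in the two generations before the first cycle; the function name `full_sib_inbreeding` is illustrative only.

```python
def full_sib_inbreeding(generations: int, f_start: float = 0.0) -> list[float]:
    """Iterate f_t = (1/4) * (1 + f_(t-2) + 2 * f_(t-1)) for repeated full-sib crossing."""
    f_prev2 = f_prev1 = f_start  # f_(t-2) and f_(t-1) before the first cycle (assumed equal)
    series = []
    for _ in range(generations):
        f_t = 0.25 * (1.0 + f_prev2 + 2.0 * f_prev1)
        series.append(f_t)
        # Shift the generation window forward by one cycle.
        f_prev2, f_prev1 = f_prev1, f_t
    return series


for cycle, f in enumerate(full_sib_inbreeding(5), start=1):
    print(cycle, f)
```

Under these assumptions the first three cycles give 0.25, 0.375 and 0.5.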

### Half-Sib crossing (HS)

Derivation of the Half Sib crossing takes a slightly different path to that for Full sibs. In the diagram to the right, the two half-sibs at generation (t-1) have only one parent in common - parent "A" at generation (t-2). The cross-multiplier 1 is utilized again, giving fY = f(P1,P2) = (1/4) [ fAA + fAC + fBA + fBC ] . There is just one coefficient of parentage this time, but three co-ancestry coefficients at the (t-2) level (one of them - fBC - being a "dummy" and not representing an actual individual in the (t-1) generation). As before, the coefficient of parentage is (1/2)[1+fA ] , and the three co-ancestries each represent f(t-1) . Recalling that fA represents f(t-2) , the final gathering and simplifying of terms gives fY = ft = (1/8) [ 1 + f(t-2) + 6 f(t-1) ] .[11]:132–143 [12]:82–92 The graphs at left include this half-sib (HS) inbreeding over twenty successive generations.
File:Inbreeding- Selfing.jpg
Self fertilization inbreeding
As before, this also quantifies the relatedness of the two half-sibs at generation (t-1) in its alternative form of f(P1, P2) .

### Self fertilization (SF)

A pedigree diagram for selfing is on the right. It is so straightforward that it does not require any cross-multiplication rules. It employs just the basic juxtaposition of the inbreeding coefficient and its alternative, the co-ancestry coefficient, followed by recognizing that, in this case, the latter is also a coefficient of parentage. Thus, $f_Y = f_{(P1,P1)} = f_t = (1/2)[1 + f_{(t-1)}]$.[11]:132–143 [12]:82–92 This is the fastest rate of inbreeding of all types, as can be seen in the graphs above. The selfing curve is, in fact, a graph of the coefficient of parentage.
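As a worked illustration, the selfing recurrence can be solved in closed form. The starting value $f_0$ is an assumption introduced here, standing for the inbreeding before selfing begins.

```latex
\begin{aligned}
1 - f_t &= 1 - \tfrac{1}{2}\left(1 + f_{(t-1)}\right) = \tfrac{1}{2}\left(1 - f_{(t-1)}\right) \\
\Rightarrow\quad 1 - f_t &= \left(\tfrac{1}{2}\right)^{t}\left(1 - f_0\right) \\
\text{with } f_0 = 0:\quad f_t &= 1 - \left(\tfrac{1}{2}\right)^{t}
  \quad\Rightarrow\quad f_1 = 0.5,\; f_2 = 0.75,\; f_3 = 0.875 .
\end{aligned}
```

The non-inbred fraction therefore halves with every generation of selfing.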

### Cousins crossings

File:Inbreeding- Cousins First.jpg
Pedigree analysis First cousins

These are derived with methods similar to those for siblings.[11]:132–143 [12]:82–92 As before, the co-ancestry viewpoint of the inbreeding coefficient provides a measure of "relatedness" between the parents P1 and P2 in these cousin expressions.

The pedigree for First Cousins (FC) is given to the right. The prime equation is fY = ft = fP1,P2 = (1/4) [ f1D + f12 + fCD + fC2 ]. After substitution with corresponding inbreeding coefficients, gathering of terms and simplifying, this becomes ft = (1/4) [ 3 f(t-1) + (1/4) [2 f(t-2) + f(t-3) + 1 ]] , which is a version for iteration - useful for observing the general pattern, and for computer programming. A "final" version is ft = (1/16) [ 12 f(t-1) + 2 f(t-2) + f(t-3) + 1 ] .
File:Inbreeding- Cousins Second.jpg
Pedigree analysis Second cousins
The Second Cousins (SC) pedigree is on the left. Parents in the pedigree which are not related to the Common Ancestor are indicated by numerals instead of letters. Here, the prime equation is fY = ft = fP1,P2 = (1/4) [ f3F + f34 + fEF + fE4 ]. After working through the appropriate algebra, this becomes ft = (1/4) [ 3 f(t-1) + (1/4) [3 f(t-2) + (1/4) [2 f(t-3) + f(t-4) + 1 ]]] , which is the iteration version. A "final" version is ft = (1/64) [ 48 f(t-1) + 12 f(t-2) + 2 f(t-3) + f(t-4) + 1 ] .
File:Cousin Inbreeding.jpg
Inbreeding from several levels of cousin crossing.

In order to visualize the pattern in Full Cousin equations, start the series with the full-sib equation re-written in iteration form: $f_t = (1/4)[2 f_{(t-1)} + f_{(t-2)} + 1]$. Notice that this is the "essential plan" of the last term in each of the cousin iterative forms, with the small difference that the generation indices increment by 1 at each cousin "level". Now, define the cousin level as k = 1 (for First cousins), = 2 (for Second cousins), = 3 (for Third cousins), and so on; and = 0 (for Full Sibs, which are "zero level cousins"). The last term can now be written as $(1/4)[2 f_{(t-(1+k))} + f_{(t-(2+k))} + 1]$. Stacked in front of this last term are one or more iteration increments of the form $(1/4)[3 f_{(t-j)} + \ldots$, where j is the iteration index and takes values from 1 to k over the successive iterations as needed. Putting all this together provides a general formula for all levels of Full cousin possible, including Full Sibs. For kth-level Full cousins, $f^{\{k\}}_t = (1/4)[3 f_{(t-1)} + (1/4)[3 f_{(t-2)} + \cdots + (1/4)[3 f_{(t-k)} + (1/4)[2 f_{(t-(1+k))} + f_{(t-(2+k))} + 1]] \cdots ]]$, with one increment for each j = 1 to k (and none when k = 0). At the commencement of iteration, all $f_{(t-x)}$ are set at 0, and each has its value substituted as it is calculated through the generations. The graphs to the right show the successive inbreeding for several levels of Full Cousins.
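The nested iteration described above can be sketched as a small Python routine. It builds the innermost "last term" first and then wraps the k increments around it; the function name `full_cousin_inbreeding` and the `history` list are illustrative choices, and all earlier inbreeding is assumed to start at zero, as stated in the text.

```python
def full_cousin_inbreeding(k: int, generations: int) -> list[float]:
    """Inbreeding under repeated k-th level full-cousin crossing (k = 0 is full sibs)."""
    # history[m] holds f_(t-(m+1)); all values start at 0.
    history = [0.0] * (k + 2)
    series = []
    for _ in range(generations):
        # Last term: (1/4) [ 2 f_(t-(1+k)) + f_(t-(2+k)) + 1 ]
        value = 0.25 * (2.0 * history[k] + history[k + 1] + 1.0)
        # Wrap the increments (1/4) [ 3 f_(t-j) + ... ] for j = k down to 1.
        for j in range(k, 0, -1):
            value = 0.25 * (3.0 * history[j - 1] + value)
        series.append(value)
        # The newest value becomes f_(t-1) for the next cycle.
        history = [value] + history[:-1]
    return series


for level in range(3):
    print(level, full_cousin_inbreeding(level, 4))
```

With k = 0 the routine reproduces the full-sib series, and the first cycle for first and second cousins gives 1/16 and 1/64 respectively, matching the "+ 1" terms of the "final" versions.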

File:Inbreeding- Csns Half.jpg
Pedigree analysis Half cousins

For First Half-cousins (FHC), the pedigree is to the left. Notice that there is just one Common Ancestor (individual A). Also, as for Second cousins, parents not related to the Common Ancestor are indicated by numerals. Here, the prime equation is fY = ft = fP1,P2 = (1/4) [ f3D + f34 + fCD + fC4 ]. After working through the appropriate algebra, this becomes ft = (1/4) [ 3 f(t-1) + (1/8) [6 f(t-2) + f(t-3) + 1 ]] , which is the iteration version. A "final" version is ft = (1/32) [ 24 f(t-1) + 6 f(t-2) + f(t-3) + 1 ] . The iteration algorithm is similar to that of Full Cousins, except that the last term is (1/8) [ 6 f(t-(1+k)) + f(t-(2+k)) + 1 ] . Notice that this last term is basically similar to the Half Sib equation, in parallel to the pattern found for Full Cousins and Full Sibs. In other words, Half Sibs are "zero level" Half Cousins.

There is a tendency to regard cousin crossing with a human-oriented point of view, possibly because of a wide interest in Genealogy. The use of pedigrees to derive the inbreeding perhaps reinforces this "Family History" view. However, such kinds of inter-crossing occur also in natural populations, especially those which are sedentary, or have a "breeding area" which is re-visited from season to season. The progeny-group of a harem with a dominant male, for example, may contain elements of sib-crossing, cousin crossing, and backcrossing, as well as genetic drift, especially of the "island" type. In addition to all of that, the occasional "outcross" will add an element of hybridization to the mix. It certainly is not panmixia !

### Backcrossing (BC)

File:Inbreeding- Backcross.jpg
Pedigree analysis - Backcrossing.
File:Backcrossing 1.jpg
Backcrossing- basic inbreeding levels

Following the hybridizing between A and R, the F1 (individual B) is crossed back (*BC1*) to an original parent (R) to produce the BC1 generation (individual C). [It is usual to use the same label for the act of making the back-cross and for the generation produced by it. The act of back-crossing is here in italics.] Parent R is the Recurrent parent. Two successive backcrosses are depicted, with individual D being the BC2 generation. These generations have been given t indices also, as indicated. As before, $f_D = f_t = f_{CR} = (1/2)[f_{RB} + f_{RR}]$, using cross-multiplier 2 previously given. The $f_{RB}$ just defined is the one that involves generation (t-1) with (t-2). However, there is another such $f_{RB}$ contained wholly within generation (t-2) as well, and it is this one that is used now, as the co-ancestry of the parents of individual C in generation (t-1). As such, it is also the inbreeding coefficient of C, and hence is $f_{(t-1)}$. The remaining $f_{RR}$ is the coefficient of parentage of the recurrent parent, and so is $(1/2)[1 + f_R]$. Putting all this together: $f_t = (1/2)[(1/2)[1 + f_R] + f_{(t-1)}] = (1/4)[1 + f_R + 2 f_{(t-1)}]$. The graphs at right illustrate Backcross inbreeding over twenty backcrosses for three different levels of (fixed) inbreeding in the Recurrent parent.
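A minimal sketch of the backcross recurrence follows, assuming a fixed inbreeding level in the recurrent parent and a non-inbred F1 (that is, A and R unrelated, so the starting value is 0). The function name `backcross_inbreeding` is hypothetical.

```python
def backcross_inbreeding(f_r: float, backcrosses: int, f_start: float = 0.0) -> list[float]:
    """Iterate f_t = (1/4) * (1 + f_R + 2 * f_(t-1)) with a fixed recurrent-parent f_R."""
    f_prev = f_start  # inbreeding of the F1; 0 is assumed when A and R are unrelated
    series = []
    for _ in range(backcrosses):
        f_prev = 0.25 * (1.0 + f_r + 2.0 * f_prev)
        series.append(f_prev)
    return series


# Three illustrative (assumed) levels of inbreeding in the recurrent parent.
for f_r in (0.0, 0.5, 1.0):
    print(f_r, backcross_inbreeding(f_r, 20)[-1])
```

Setting $f_t = f_{(t-1)}$ in the recurrence shows that the series approaches $(1/2)(1 + f_R)$, which is the coefficient of parentage of the recurrent parent.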

This routine is commonly used in Animal and Plant Breeding programmes. Often after making the hybrid (especially if individuals are short-lived), the recurrent parent needs separate "line breeding" for its maintenance as a future recurrent parent in the backcrossing. This maintenance may be through selfing, or through full-sib or half-sib crossing, or through restricted randomly-fertilized populations, depending on the species' reproductive possibilities. Of course, this incremental rise in fR carries-over into the ft of the backcrossing. The result is a more gradual curve rising to the asymptotes than shown in the present graphs, because the fR is not at a fixed level from the outset.

### Relatedness between relatives

Central in estimating the variances for the various components is the principle of relatedness. A child has a father and a mother. Consequently, the child and father share 50% of their alleles, as do the child and the mother. However, the mother and father normally do not share alleles as a result of shared ancestors. Similarly, two full siblings share also on average 50% of the alleles with each other, while half siblings share only 25% of their alleles. This variation in relatedness can be used to estimate which proportion of the total phenotypic variance (σ²P) is explained by the above-mentioned components.

The principle of relationship (R) is central to understanding the resemblances within families and can be useful when calculating inbreeding. Relationship has two definitions that can be applied:

- The probable portion of genes that are the same for two individuals due to common ancestry, exceeding that of the base population.
- Additive (numerator) relationship: the relationship coefficient ($R_{XY}$) is twice the probability of two genes at loci in different individuals being identical by descent.

$R_{XY}$ values can range from 0 to 1. Relationship can be calculated in several ways: from the known relationships of the individual, from bracket pedigrees, and from pedigree path diagrams.

#### Calculating relationship from known relationships

| Relationship | Relationship coefficient |
|---|---|
| Individual and itself | 1.00 |
| Individual and a monozygotic twin | 1.00 |
| Individual and parent | 0.50 |
| Full siblings | 0.50 |
| Half siblings | 0.25 |
| Individual and grandparent | 0.25 |
| Son of sire and daughter of sire | 0.125 |
| Grandson and granddaughter of sire | 0.0625 |

Note: if the common ancestor is inbred, multiply the relationship by (1 + inbreeding coefficient).

#### Calculating relationship from pathway diagrams

$R_{XY} = \sum (1/2)^{n} (1 + F_{CA})$

where n is the number of segregations between X and Y through their common ancestor, and $F_{CA}$ is the inbreeding coefficient of the common ancestor.

Example: calculating $R_{AE}$ and $R_{BE}$.

Note: valid pathways only go through ancestors (they only go against the direction of the arrow). For example, to calculate the relationship of A and B, the pathway A-D-B would be acceptable, whereas the pathway A-X-B would not. The reason is that having progeny together does not make two individuals related.

$R_{AE}$: there are two possible pathways from A to E.

- A-D-F-E: $(1/2)^3 = 0.125$
- A-D-E: $(1/2)^2 = 0.25$

Total: 0.375

$R_{BE}$: there are four possible pathways from B to E.

- B-D-E: $(1/2)^2 = 0.25$
- B-D-F-E: $(1/2)^3 = 0.125$
- B-C-D-E: $(1/2)^3 = 0.125$
- B-C-D-F-E: $(1/2)^4 = 0.0625$

Total: 0.5625
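The pathway sums in this example can be reproduced with a short Python sketch. Each pathway is supplied as a pair (n, F_CA); the function name `relationship` is illustrative, and the common ancestors are assumed to be non-inbred (F_CA = 0), as the worked totals imply.

```python
def relationship(paths: list[tuple[int, float]]) -> float:
    """R_XY = sum over pathways of (1/2)^n * (1 + F_CA); each pathway is (n, F_CA)."""
    return sum(0.5 ** n * (1.0 + f_ca) for n, f_ca in paths)


# Pathway lengths from the worked example; F_CA = 0 is an assumption.
r_ae = relationship([(3, 0.0), (2, 0.0)])
r_be = relationship([(2, 0.0), (3, 0.0), (3, 0.0), (4, 0.0)])
print(r_ae, r_be)  # 0.375 0.5625
```

An inbred common ancestor is handled by passing its inbreeding coefficient as the second element of the corresponding pair.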

The square root of h^2 equals the correlation between additive genotype and expressed phenotype, as shown through the general procedures of Path Analysis.[46]:214–298

## Resemblances amongst relatives

These, in like manner to the Genotypic variances, can be derived through either the gene-model ("Mather") approach or the allele-substitution ("Fisher") approach. Here, each method will be demonstrated for alternate cases.

### Parent-offspring covariance

These can be viewed either as the covariance between any offspring and any one of its parents (PO), or as the covariance between any offspring and the "mid-parent" value of both its parents (MPO).

#### One-parent and offspring (PO)

This can be derived as the sum of cross-products between parent gene-effects and one-half of the progeny expectations using the allele-substitution approach. The one-half of the progeny expectation accounts for the fact that only one of the two parents is being considered. The appropriate parental gene-effects are therefore the second-stage redefined gene effects used to define the Genotypic variances earlier, that is:- a′′ = 2q(α - qd) and d′′ = (q-p)α + 2pqd and also (-a)′′ = -2p(α + pd) [see section "Gene effects redefined"]. Similarly, the appropriate progeny effects, for allele-substitution expectations are one-half of the earlier breeding values, the latter being:- αAA = 2qα, and αAa = (q-p)α and also αaa = -2pα [see section on "Genotype substitution - Expectations and Deviations"].

Because all of these effects are defined already as deviates from the Genotypic mean, the cross-product sum using {genotype-frequency * parental gene-effect * half-breeding-value} will provide immediately the allele-substitution-expectation covariance between any one parent and its offspring. After careful gathering of terms and simplification, this becomes $\mathrm{cov}(PO)_A = pq\,\alpha^2 = \tfrac{1}{2}\sigma^2_A$.[11]:132–141 [12]:134–147

Unfortunately, the allele-substitution-deviations are usually overlooked, but they have not "ceased to exist" nonetheless! Recall that these deviations are $\delta_{AA} = -2q^2 d$, $\delta_{Aa} = 2pq\,d$ and $\delta_{aa} = -2p^2 d$ [see section on "Genotype substitution - Expectations and Deviations"]. Consequently, the cross-product sum using {genotype-frequency * parental gene-effect * half-substitution-deviations} also will provide immediately the allele-substitution-deviations covariance between any one parent and its offspring. Once more, after careful gathering of terms and simplification, this becomes $\mathrm{cov}(PO)_D = 2p^2q^2d^2 = \tfrac{1}{2}\sigma^2_D$.

It follows therefore that $\mathrm{cov}(PO) = \mathrm{cov}(PO)_A + \mathrm{cov}(PO)_D = \tfrac{1}{2}\sigma^2_A + \tfrac{1}{2}\sigma^2_D$, when dominance is not overlooked!
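The two cross-product sums described above can be checked numerically. The sketch below follows the text's recipe literally for one locus with two alleles, taking α = a + (q-p)d as in the mid-parent result; the function name and the example values of p, a and d are hypothetical.

```python
def parent_offspring_cov(p: float, a: float, d: float) -> tuple[float, float]:
    """Return (cov(PO)_A, cov(PO)_D) from the cross-product sums described in the text."""
    q = 1.0 - p
    alpha = a + (q - p) * d                      # average allele-substitution effect
    freqs = [p * p, 2 * p * q, q * q]            # genotype frequencies: AA, Aa, aa
    parent = [2 * q * (alpha - q * d),           # a''   (redefined gene effects)
              (q - p) * alpha + 2 * p * q * d,   # d''
              -2 * p * (alpha + p * d)]          # (-a)''
    breeding = [2 * q * alpha, (q - p) * alpha, -2 * p * alpha]
    deviation = [-2 * q * q * d, 2 * p * q * d, -2 * p * p * d]
    # {genotype-frequency * parental gene-effect * half of the progeny effect}
    cov_a = sum(f * g * 0.5 * b for f, g, b in zip(freqs, parent, breeding))
    cov_d = sum(f * g * 0.5 * x for f, g, x in zip(freqs, parent, deviation))
    return cov_a, cov_d


p, a, d = 0.3, 2.0, 1.0                          # illustrative values only
q = 1.0 - p
alpha = a + (q - p) * d
cov_a, cov_d = parent_offspring_cov(p, a, d)
print(cov_a, p * q * alpha ** 2)                 # both approximate pq * alpha^2
print(cov_d, 2 * p ** 2 * q ** 2 * d ** 2)       # both approximate 2 p^2 q^2 d^2
```

The sums agree with the closed forms because the cross terms in αd cancel when the three genotypes are combined.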

#### Mid-parent and offspring (MPO)

Because there are many combinations of parental genotypes, there are many different mid-parents and offspring means to consider, together with the varying frequencies of obtaining each parental pairing. The gene-model approach is the most expedient in this case. Therefore, an unadjusted sum of cross-products (USCP) - using all products { parent-pair-frequency * mid-parent-gene-effect * offspring-genotype-mean } - is adjusted by subtracting the {overall genotypic mean}$^2$ as correction factor (CF). After multiplying out all the various combinations, carefully gathering terms, simplifying, factoring and cancelling-out where applicable, this becomes:

$\mathrm{cov}(MPO) = pq\,[a + (q-p)d]^2 = pq\,\alpha^2 = \tfrac{1}{2}\sigma^2_A$, with no dominance having been overlooked in this case, as it had been "used-up" in defining the α.[11]:132–141 [12]:134–147

#### Applications (Parent-Offspring)

The most obvious application is an experiment containing all parents and their offspring, with or without reciprocal crosses, preferably replicated without bias, enabling estimation of all appropriate means, variances and covariances, together with their standard errors. These estimated statistics can then be used to estimate the genetical variances. Twice the difference between the estimates of the two forms of (corrected) parent-offspring covariance provides an estimate of σ2D; and twice the cov(MPO) estimates σ2A. With appropriate experimental design and analysis,[7][41][42] standard errors can be obtained for these genetical statistics as well. This is the basic core of an experiment known as Diallel analysis, the Mather, Jinks and Hayman version of which will be discussed in another section.

A second application involves the use of regression analysis, which estimates from statistics the ordinate (Y-estimate), derivative (regression coefficient) and constant (Y-intercept) of calculus.[7][41][47][48] The regression coefficient estimates the rate of change of the function predicting Y from X, based on minimizing the residuals between the fitted curve and the observed data (MINRES). No alternative method of estimating such a function satisfies this basic requirement of MINRES. In general, the regression coefficient is estimated as the ratio of the covariance(XY) to the variance of the determinator (X). In practice, the sample size is usually the same for both X and Y, so that this can be written as SCP(XY) / SS(X), where all terms have been defined previously.[7][47][48] In the present context, the parents are viewed as the "determinative variable" (X), and the offspring as the "determined variable" (Y), and the regression coefficient as the "functional relationship" ($\beta_{PO}$) between the two. Taking $\mathrm{cov}(MPO) = \tfrac{1}{2}\sigma^2_A$ as cov(XY), and $\sigma^2_P / 2$ (the variance of the mean of two parents - the mid-parent) as $\sigma^2_X$, it can be seen that $\beta_{MPO} = [\tfrac{1}{2}\sigma^2_A] / [\tfrac{1}{2}\sigma^2_P] = h^2$.[49] Next, utilizing $\mathrm{cov}(PO) = [\tfrac{1}{2}\sigma^2_A + \tfrac{1}{2}\sigma^2_D]$ as cov(XY), and $\sigma^2_P$ as $\sigma^2_X$, it is seen that $2\beta_{PO} = [2(\tfrac{1}{2}\sigma^2_A + \tfrac{1}{2}\sigma^2_D)] / \sigma^2_P = H^2$.
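The regression coefficient described above, SCP(XY) / SS(X), can be computed directly. The following Python sketch uses made-up mid-parent and offspring values purely for illustration; they are not data from any cited study, and the function name is hypothetical.

```python
def regression_coefficient(x: list[float], y: list[float]) -> float:
    """Slope of Y on X, computed as SCP(XY) / SS(X)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    scp_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    return scp_xy / ss_x


# Hypothetical mid-parent values (X) and offspring means (Y).
mid_parent = [10.0, 12.0, 14.0, 16.0, 18.0]
offspring = [11.0, 12.5, 13.5, 15.0, 16.0]

# Under the text's argument, the slope on mid-parent estimates h^2.
h2_estimate = regression_coefficient(mid_parent, offspring)
print(h2_estimate)  # approximately 0.625 for these invented values
```

If a single parent were used as X instead, the text's argument would take twice the slope as the estimate of the broad-sense heritability.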

Analysis of epistasis has previously been attempted via an interaction variance approach of the type σ2AA , and σ2AD and also σ2DD. This has been integrated with these present covariances in an effort to provide estimators for the epistasis variances. However, the findings of epigenetics suggest that this may not be an appropriate way to define epistasis.

### Siblings covariances

Covariance between half-sibs (HS) is defined easily using allele-substitution methods; but, once again, the dominance contribution has historically been omitted. However, as with the mid-parent/offspring covariance, the covariance between full-sibs (FS) requires a "parent-combination" approach, thereby necessitating the use of the gene-model corrected-cross-product method; and the dominance contribution has not historically been overlooked. The superiority of the gene-model derivations is as evident here as it was for the Genotypic variances.

#### Half-sibs of the same common-parent (HS)

The sum of the cross-products { common-parent frequency * half-breeding-value of one half-sib * half-breeding-value of any other half-sib in that same common-parent-group } immediately provides one of the required covariances, because the effects used [breeding values - representing the allele-substitution expectations] are already defined as deviates from the genotypic mean [see section on "Allele substitution - Expectations and deviations"]. After simplification, this becomes $\mathrm{cov}(HS)_A = \tfrac{1}{2}pq\,\alpha^2 = \tfrac{1}{4}\sigma^2_A$.[11]:132–141 [12]:134–147 However, the substitution deviations also exist, defining the sum of the cross-products { common-parent frequency * half-substitution-deviation of one half-sib * half-substitution-deviation of any other half-sib in that same common-parent-group }, which ultimately leads to $\mathrm{cov}(HS)_D = p^2q^2d^2 = \tfrac{1}{4}\sigma^2_D$. Adding the two components gives:

$\mathrm{cov}(HS) = \mathrm{cov}(HS)_A + \mathrm{cov}(HS)_D = \tfrac{1}{4}\sigma^2_A + \tfrac{1}{4}\sigma^2_D$.

#### Full-sibs (FS)

As explained in the introduction, a method similar to that used for mid-parent/progeny covariance is used. Therefore, an unadjusted sum of cross-products (USCP) - using all products { parent-pair-frequency * the square of the offspring-genotype-mean } - is adjusted by subtracting the {overall genotypic mean}$^2$ as correction factor (CF). In this case, the multiplying out of all the various combinations, carefully gathering terms, simplifying, factoring and cancelling-out is very protracted. It eventually becomes:

$\mathrm{cov}(FS) = pq\,\alpha^2 + p^2q^2d^2 = \tfrac{1}{2}\sigma^2_A + \tfrac{1}{4}\sigma^2_D$, with no dominance having been overlooked.[11]:132–141 [12]:134–147

#### Applications (Siblings)

The most useful application here for genetical statistics is the correlation between half-sibs. Recall that the correlation coefficient (r) is the ratio of the covariance to the variance [see section on "Associated attributes" for example]. Therefore, $r_{HS} = \mathrm{cov}(HS) / \sigma^2_{\text{all HS together}} = [\tfrac{1}{4}\sigma^2_A + \tfrac{1}{4}\sigma^2_D] / \sigma^2_P = \tfrac{1}{4}H^2$.[50] The correlation between full-sibs is of little utility, being $r_{FS} = \mathrm{cov}(FS) / \sigma^2_{\text{all FS together}} = [\tfrac{1}{2}\sigma^2_A + \tfrac{1}{4}\sigma^2_D] / \sigma^2_P$. The suggestion that it "approximates" ($\tfrac{1}{2}h^2$) is poor advice.

Of course, the correlations between siblings are of intrinsic interest in their own right, quite apart from any utility they may have for estimating heritabilities or genotypic variances.

It may be worth noting that $\mathrm{cov}(FS) - \mathrm{cov}(HS) = \tfrac{1}{4}\sigma^2_A$. Experiments consisting of FS and HS families could utilize this by using intra-class correlation to equate experiment variance components to these covariances [see section on "Coefficient of relationship as an intra-class correlation" for the rationale behind this].
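The difference quoted above follows directly from the two sibling covariances as given in this section. The dominance terms cancel, which is what makes the contrast useful as an estimator:

```latex
\begin{aligned}
\mathrm{cov}(FS) - \mathrm{cov}(HS)
  &= \left(\tfrac{1}{2}\sigma^2_A + \tfrac{1}{4}\sigma^2_D\right)
   - \left(\tfrac{1}{4}\sigma^2_A + \tfrac{1}{4}\sigma^2_D\right)
   = \tfrac{1}{4}\sigma^2_A \\
\Rightarrow\quad \sigma^2_A &= 4\left[\mathrm{cov}(FS) - \mathrm{cov}(HS)\right]
\end{aligned}
```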

The earlier comments regarding epistasis apply again here [see section on "Applications (Parent-Offspring)"].

## Selection

### Basic principles

File:SelctnPresur.jpg
Genetic advance and Selection pressure repeated.

Selection operates on the attribute (phenotype), such that individuals which equal or exceed a selection threshold (zP) become effective parents to create the next generation. The proportion they represent of the base population is the selection pressure (Prob.). The smaller the proportion, the stronger is the pressure! The mean of the selected group (Ps) is superior to the base-population mean (P0) by the difference called the selection differential (S). All of these quantities are phenotypic. In order to "link" to the underlying genes, a heritability (h2) is used, fulfilling the role of a coefficient of determination in the Biometrical sense. The expected genetical change, still expressed in phenotypic units of measurement, is called the genetic advance (ΔG), and is obtained by the product of the selection differential (S) and its coefficient of determination (h2). The expected mean of the progeny (P1) can be found by adding the genetic advance (ΔG) to the base mean (P0). The graphs to the right show how the (initial) genetic advance is greater with stronger selection pressure (smaller Probability). They also show how progress from successive cycles of selection (even at the same selection pressure) steadily declines, because the Phenotypic variance and the Heritability are being diminished by the selection itself. This is discussed further shortly.

Thus $\Delta G = S\,h^2$.[12]:1710-181

and $P_1 = P_0 + \Delta G$.[12]:1710-181

The narrow-sense heritability ($h^2$) is usually used, thereby linking to the additive genetic variance ($\sigma^2_A$). However, if appropriate, use of the broad-sense heritability ($H^2$) would connect to the genotypic variance ($\sigma^2_G$); and even possibly an allelic heritability [$\hbar^2 = \sigma^2_a / \sigma^2_P$] might be contemplated, connecting to $\sigma^2_a$.

In order to utilize these concepts before selection actually takes place, and so predict the outcome of alternatives (such as choice of selection threshold, for example), these phenotypic statistics are re-considered against the properties of the Normal Distribution, especially those concerning truncation of the superior tail of the Distribution. In such consideration, the standardized selection differential (i) and the standardized selection threshold (z) are used instead of the previous "phenotypic" versions. The phenotypic standard deviate ($\sigma_{P(0)}$) also is needed. All of this will be elaborated in a subsequent section.

Therefore, $\Delta G = (i\,\sigma_P)\,h^2$, where $(i\,\sigma_{P(0)}) = S$ previously.[12]:1710-181
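As a small illustration of these two relations, the following Python sketch computes the genetic advance and the expected progeny mean. The numerical inputs are invented for the example and the function names are hypothetical.

```python
def genetic_advance(i: float, sigma_p: float, h2: float) -> float:
    """Delta G = (i * sigma_P) * h^2, where S = i * sigma_P is the selection differential."""
    return i * sigma_p * h2


def progeny_mean(p0: float, delta_g: float) -> float:
    """P1 = P0 + Delta G."""
    return p0 + delta_g


# Invented values: intensity 2.0, phenotypic s.d. 5.0, heritability 0.25, base mean 100.
dg = genetic_advance(i=2.0, sigma_p=5.0, h2=0.25)
p1 = progeny_mean(p0=100.0, delta_g=dg)
print(dg, p1)  # 2.5 102.5
```

Here the selection differential is S = 10 phenotypic units, of which one quarter is expected to be transmitted to the progeny.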

File:SelctnRptd.jpg
Changes arising from repeated selection

It was noted above that successive ΔG will decline because the "input" [the phenotypic variance ( σ2P )] will be reduced by the previous selection.[12]:1710-181 The heritability also will be reduced. The graphs to the left show these declines over ten cycles of repeated selection during which the same selection pressure is asserted. The accumulated genetic advance (ΣΔG) has virtually reached its asymptote by generation 6 in this example. This reduction depends partly upon truncation properties of the Normal Distribution, and partly upon the heritability together with meiosis determination ( b2 ). The last two items quantify the extent to which the truncation is "offset" by new variation arising from segregation and assortment during meiosis.[12] :1710-181 [26] This will be discussed soon, but here note the simplified result for undispersed random fertilization (f = 0).

Thus: $\sigma^2_{P(1)} = \sigma^2_{P(0)}\,[1 - i(i-z)\,\tfrac{1}{2}h^2]$, where $i(i-z) = K$ = truncation coefficient, and $\tfrac{1}{2}h^2 = R$ = reproduction coefficient.[12]:1710-181 [26] This can be written also as $\sigma^2_{P(1)} = \sigma^2_{P(0)}\,[1 - KR]$, which facilitates more detailed analysis of selection problems.

Here, i and z have already been defined, ½ is the meiosis determination ($b^2$) for f = 0, and the remaining symbol is the heritability. These all will be discussed further in following sections. Also notice that, more generally, $R = b^2 h^2$. If the general meiosis determination ($b^2$) is used, the results of prior inbreeding can be incorporated into the selection. The phenotypic variance equation then becomes:

$\sigma^2_{P(1)} = \sigma^2_{P(0)}\,[1 - i(i-z)\,b^2 h^2]$.

The Phenotypic variance truncated by the selected group ($\sigma^2_{P(S)}$) is simply $\sigma^2_{P(0)}\,[1 - K]$, and its contained genic variance is $h^2_0\,\sigma^2_{P(S)}$. Assuming that selection has not altered the environmental variance, the genic variance for the progeny can be approximated by $\sigma^2_{A(1)} = \sigma^2_{P(1)} - \sigma^2_E$. From this, $h^2_1 = \sigma^2_{A(1)} / \sigma^2_{P(1)}$. Similar estimates could be made for $\sigma^2_{G(1)}$ and $H^2_1$, or for $\sigma^2_{a(1)}$ and $\hbar^2_1$ if required.
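The decline of variance and heritability over successive cycles can be sketched as below. This is a simplified illustration: it assumes the phenotypic variance consists only of genic and environmental parts (so the environmental variance is the initial variance times one minus the heritability), that the same i and z apply in every cycle, and that the meiosis determination stays fixed. The function name is hypothetical.

```python
def repeated_selection(var_p: float, h2: float, i: float, z: float,
                       cycles: int, b2: float = 0.5) -> list[tuple[float, float]]:
    """Apply sigma2_P(1) = sigma2_P(0) * [1 - K * R] over several cycles.

    K = i * (i - z) is the truncation coefficient and R = b2 * h2 the
    reproduction coefficient. Returns (phenotypic variance, heritability) per cycle.
    """
    var_e = var_p * (1.0 - h2)   # assumed constant; assumes sigma2_P = sigma2_A + sigma2_E
    k = i * (i - z)
    results = []
    for _ in range(cycles):
        r = b2 * h2
        var_p = var_p * (1.0 - k * r)
        h2 = (var_p - var_e) / var_p   # sigma2_A(1) / sigma2_P(1)
        results.append((var_p, h2))
    return results


# i = 2.063 and z = 1.645 correspond to selecting roughly the top 5 percent.
for cycle, (vp, h) in enumerate(repeated_selection(100.0, 0.5, 2.063, 1.645, 5), start=1):
    print(cycle, round(vp, 3), round(h, 3))
```

Both quantities fall with each cycle, and the falls become smaller as the heritability itself shrinks.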

#### Alternative ΔG

A rearrangement follows which is useful for considering selection on multiple attributes (characters). It starts by expanding the heritability into its variance components. ΔG = i σP ( σ2A / σ2P ) . The σP and σ2P partially cancel, leaving a solo σP. Next, the σ2A inside the heritability can be expanded as (σA * σA), which leads to :

File:SelctnDifrntl.jpg
Selection differential and the Normal Distribution

ΔG = i σA ( σA / σP ) = i σA h . Corresponding re-arrangements could be made using the alternative heritabilities, giving ΔG = i σG H or ΔG = i σa ħ.

### Background

#### Standardized selection - the Normal distribution

The entire base population is outlined by the Normal curve [48]:78–89 to the right. Along the Z axis is every value of the attribute from least to greatest, and the height from this axis to the curve itself is the frequency of the value at the axis below. The equation for finding these frequencies for the "Normal" curve (the curve of "common experience") is given in the ellipse: notice it includes the mean (μ) and the variance (σ2). Moving infinitesimally along the z-axis, the frequencies of neighbouring values can be "stacked" beside the previous, thereby accumulating an "area representing the probability " of obtaining all the values within the "stack". [That's integration from calculus!] Selection is focused on such a Probability area, being the shaded-in one from the selection threshold (z) to the end of the superior tail of the curve. This " Prob. " is the selection pressure. The selected group (the effective parents of the next generation) include all phenotype values from z to the "end" of the "tail".[51] The mean of the selected group is μs, and the difference between it and the base mean (μ) represents the selection differential (S). By taking partial integrations over curve-sections of interest, and some rearranging of the algebra, it can be shown that the "selection differential" is S = [ y (σ / Prob.)] , where y is the frequency of the value at the "selection threshold" z (the ordinate of z).[11]:226–230 Rearranging this relationship gives S / σ = y / Prob., the left-hand side of which is, in fact, the selection differential divided by the standard deviation - that is the standardized selection differential (i). The right-side of the relationship provides an "estimator" for i - the ordinate of the selection threshold divided by the selection pressure. 
Tables of the Normal Distribution [41] :547–548 can be used, but tabulations of i itself are available also.[52]:123–124 The latter reference also gives values of i adjusted for small populations (400 and less),[52]:111–122 where "quasi-infinity" cannot be assumed (but was presumed in the "Normal Distribution" outline above). The standardized selection differential (i) is known also as the intensity of selection.[12]:174 ; 186
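In place of printed tables, the estimator i = y / Prob. can be evaluated with the standard library. This sketch assumes a quasi-infinite Normal population, so the small-population adjustments mentioned above are not applied; the function name is illustrative.

```python
from statistics import NormalDist


def selection_intensity(prob: float) -> tuple[float, float, float]:
    """Return (z, y, i) for a selected proportion `prob` of a standard Normal population."""
    std_normal = NormalDist(mu=0.0, sigma=1.0)
    z = std_normal.inv_cdf(1.0 - prob)  # standardized selection threshold
    y = std_normal.pdf(z)               # ordinate of the curve at z
    i = y / prob                        # standardized selection differential
    return z, y, i


for prob in (0.5, 0.2, 0.05, 0.01):
    z, y, i = selection_intensity(prob)
    print(prob, round(z, 3), round(y, 4), round(i, 3))
```

For a selection pressure of 0.05, this gives z of about 1.645 and i of about 2.063.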

Finally, a cross-link with the differing terminology in the previous sub-section may be useful:- μ (here) = "P0" (there), μS = "PS" and σ2 = "σ2P".

#### Meiosis determination - Reproductive path analysis

File:ReproDetmntn.jpg
Reproductive coefficients of determination and Inbreeding
File:ReproPaths.jpg
Path analysis of sexual reproduction.

The meiosis determination (b2) is the coefficient of determination of meiosis to the process whereby parents generate gametes. Following the principles of standardized partial regression, of which path analysis is a pictorially-oriented version, Sewall Wright analyzed the paths of gene-flow during sexual reproduction, and established the "strengths of contribution" (coefficients of determination) of various components to the overall result.[26][35] Path analysis includes partial correlations as well as partial regression coefficients (the latter are the path coefficients). Lines with a single arrow-head are directional "determinative paths", and lines with double arrow-heads are "correlation connections". By tracing various routes according to path analysis rules, the algebra of standardized partial regression is emulated.[46]

The path diagram to the left is a representation of this analysis of sexual reproduction. It has many items of interest within it, but the one of importance in the selection context is that of meiosis: for it is there that segregation and assortment occur - the processes which partially ameliorate the truncation of the phenotypic variance arising from selection. The path coefficients b are the meiosis paths; those labeled a are the fertilization paths. The correlation between gametes from the same parent (g) is the "meiotic correlation"; that between parents within the same generation is rA; and that between gametes from different parents (f) became known subsequently as the "inbreeding coefficient".[11]:64 The primes ( ' ) indicate generation (t-1), and the unprimed indicate generation t. Here, some important results of the present analysis are given. Sewall Wright interpreted many in terms of inbreeding coefficients.[26][35]

The meiosis determination ($b^2$) is $\tfrac{1}{2}(1+g)$ and equals $\tfrac{1}{2}(1 + f_{(t-1)})$, implying that $g = f_{(t-1)}$. [Notice that this $b^2$ is the coefficient of parentage ($f_{AA}$) of Pedigree analysis re-written with a "generation level" instead of an "A" inside the parentheses.] With non-dispersed random fertilization, $f_{(t-1)} = 0$, giving $b^2 = \tfrac{1}{2}$, as used in the selection section above. However, being aware of its background, other fertilization patterns can be utilized as required. Another determination also involves inbreeding: the fertilization determination ($a^2$) equals $1 / [2(1 + f_t)]$. Another correlation is also an inbreeding indicator: $r_A = 2 f_t / (1 + f_{(t-1)})$, also known as the coefficient of relationship [not to be confused with the coefficient of kinship which is an alternative name for the co-ancestry coefficient - see introduction to "Relationship" section]. This $r_A$ will re-occur in the sub-section on "Dispersion and Selection".
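The three quantities just defined depend only on the inbreeding coefficients of the current and previous generations, so they are easy to tabulate. A minimal Python sketch follows; the function and key names are hypothetical, and the second example reuses the values 0.25 and 0.375 from the first two cycles of full-sib crossing.

```python
def reproduction_determinations(f_t: float, f_prev: float) -> dict[str, float]:
    """Meiosis and fertilization determinations, and the coefficient of relationship."""
    b2 = 0.5 * (1.0 + f_prev)          # meiosis determination, (1/2)(1 + f_(t-1))
    a2 = 1.0 / (2.0 * (1.0 + f_t))     # fertilization determination, 1 / [2(1 + f_t)]
    r_a = 2.0 * f_t / (1.0 + f_prev)   # coefficient of relationship, 2 f_t / (1 + f_(t-1))
    return {"b2": b2, "a2": a2, "r_A": r_a}


# Non-dispersed random fertilization: no inbreeding in either generation.
print(reproduction_determinations(f_t=0.0, f_prev=0.0))
# Second cycle of full-sib crossing: f_(t-1) = 0.25 and f_t = 0.375.
print(reproduction_determinations(f_t=0.375, f_prev=0.25))
```

With no inbreeding both determinations equal one half; in the second case the meiosis determination rises to 0.625 while the fertilization determination falls.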

These links with inbreeding reveal interesting facets about sexual reproduction which are not immediately apparent. The graphs to the right plot the meiosis and syngamy (fertilization) coefficients of determination against the inbreeding coefficient. There it is revealed that as inbreeding increases, meiosis becomes more important (the coefficient increases), while syngamy becomes less important. The overall role of reproduction [the product of the previous two coefficients - $r^2$] remains the same.[53] This increase in $b^2$ is particularly relevant for selection because it means that the selection truncation of the phenotypic variance is offset to a lesser extent during a sequence of selections when accompanied by inbreeding (which is frequently the case).

### Dispersion and Selection

Figure: Genic intra-class correlation compared with coefficient of relationship.

Selection basics concern the selection for future parenthood of individuals belonging to a population, on the basis of their phenotypic values. Gamete sampling results in that population becoming dispersed into progeny lines with variable allele and genotype frequencies, and with variable means. This necessitated the partitioning of the genotypic variance into within-line and amongst-line components. It remains to superimpose this dispersion structure onto the basics of selection. Sewall Wright's analysis of reproduction (see previous sub-section) provides a convenient key to enable this. This key is the coefficient of relationship (rA), which is the genic correlation amongst individuals within the (t-1) generation - that is, it is an intra-class correlation. This is a special class of correlations which can be defined in terms of variance components - in particular, they are generally the ratio of the inter-class variance to the total of both inter- and intra-class variances.[7]:282–284 [41]:294–296 This connection is demonstrated in the following sub-section. It is important to affirm this idea because the method of connecting selection and dispersion directly utilizes intra-class correlations.

#### Coefficient of relationship as an Intra-class correlation

It is sometimes "assumed" that the approximations $\sigma^2_{A(AL)} = 2 f \sigma^2_{A(0)}$, $\sigma^2_{A(WL)} = (1 - f) \sigma^2_{A(0)}$ and $\sigma^2_{A(\Sigma)} = (1 + f) \sigma^2_{A(0)}$ will "suffice" [see section on Dispersion and the Genotypic variance]; this assumption is temporarily useful here (it will be corrected subsequently). Furthermore, recall that the inbreeding coefficient in the t-th generation ($f_t$) is the same as the co-ancestry coefficient ($f_{jj'}$) in the (t-1) generation [see section on Pedigree analysis, et seq]. Putting all of this together,

$r_{A(jj')} = \frac{\sigma^2_{A(AL)}}{\sigma^2_{A(\Sigma)}} = \frac{2 f_{jj'} \sigma^2_{A(0)}}{(1 + f_{(t-1)}) \sigma^2_{A(0)}} = \frac{2 f_{jj'}}{1 + f_{(t-1)}} = \frac{2 f_t}{1 + f_{(t-1)}}$ , after cancelling the variances and substituting $f_{jj'} = f_t$. This is Sewall Wright's result.

Thus, the coefficient of relationship is indeed an intra-class correlation in principle.[12] :208–218 The simplifying assumptions now need addressing.

Referring to the section on "Dispersion and the Genotypic variance", note that actually $\sigma^2_{A(AL)} = 2 f \sigma^2_a$ and $\sigma^2_{A(\Sigma)} = (1 + f) \sigma^2_{A(f)}$. After substituting these into the $r_A$ equation, and simplifying, the corrected genic intra-class correlation ($c_A$) is:-

$c_A = r_A \left( a^2 / \alpha^2_{f(t-1)} \right)$ .

The difference between the two is important if dominance is non-trivial. See the graphs to the right.

#### Phenotypic intra-class correlation

This is the other prominent intra-class correlation relating selection to dispersion. It is based on partitioning the total phenotypic variance into amongst-line and within-line components, which have an underlying genetical cause. Its derivation serves to affirm that a correlation can be constructed with variance components. The following biometrical model forms the skeleton:-

$X_{ij} = \mu + \lambda_i + \omega_{ij}$ , where $X_{ij}$ is the phenotype of the j-th individual within the i-th line; i = 1 ... g, where g is the number of lines; j = 1 ... n, where n is the number of individuals within a line; $\mu$ is the grand mean; $\lambda_i$ is the line effect of the i-th line [in expectation, the deviation of the mean of the individuals within the i-th line from the grand mean]; and $\omega_{ij}$ is the deviation of the j-th individual within line i from the mean of all the individuals within line i.[7][48]

The variance components associated with this model are:- $\sigma^2_{X(ij)} = \sigma^2_{\lambda} + \sigma^2_{\omega} = \sigma^2_{AL} + \sigma^2_{WL} = \sigma^2_{P(\Sigma)}$ .

Recalling that the correlation (succinctly stated) is the ratio of the covariance to the variance (or to the geometric mean of two variances if necessary) [see section on "correlated traits"], the covariance $\mathrm{cov}[X_{ij}, X_{ij'}]$ needs defining, where j ≠ j′ denote separate individuals within the same line. This covariance can be found as the Expectation of the cross-product of the model-components defining each individual,[48] as follows:-

$\mathrm{cov}[X_{ij}, X_{ij'}] = E\{(\lambda_i + \omega_{ij})(\lambda_i + \omega_{ij'})\}$ , where E is the "Expectation", that is, the mean under infinite sampling of all background.[7][48] Continuing:-

$\mathrm{cov}[X_{ij}, X_{ij'}] = E\{\lambda_i^2\} + E\{\lambda_i \omega_{ij'}\} + E\{\lambda_i \omega_{ij}\} + E\{\omega_{ij} \omega_{ij'}\} = \sigma^2_{AL} + 0 + 0 + 0$ , the first term by definition of the variance, and the rest by the fundamental assumption of the independence of effects in the model.[7][48]

Thus, $r_P = c_P = \mathrm{cov}[X_{ij}, X_{ij'}] / \sigma^2_{X(ij)} = \sigma^2_{AL} / [\sigma^2_{AL} + \sigma^2_{WL}]$ .

This satisfies the biometrical definition of any intra-class correlation, but it ignores the genetical origin of the matter. The very gamete sampling which has given rise to dispersion has also made it likely that the uniting gametes were not independent, with the result that $E\{\omega_{ij} \omega_{ij'}\} \neq 0$ after all. Sewall Wright's reproductive paths identify two correlations which would give rise to this lack of independence: g (the meiosis correlation, $f_{(t-1)}$) and f (the gamete correlation of separate parents, $f_t$) [refer back to that section]. The combined effect of these can be defined as $r_g = f_t + f_{(t-1)}$, which has a striking resemblance to de novo inbreeding plus carry-over inbreeding of previous sections. Now, define $g^2$ as the coefficient of determination which quantifies the genotypic component of $\sigma^2_{WL}$, giving consequently $\sigma^2_{WL(g)} = g^2 \sigma^2_{WL}$. Finally, recalling that covariance equals correlation multiplied by variance, the genotypic covariance amongst individuals within the same dispersion line is $\mathrm{cov}_{WL(g)} = r_g \sigma^2_{WL(g)} = (f_t + f_{(t-1)}) [g^2 \sigma^2_{WL}]$. That is to say:-

$E\{\omega_{ij} \omega_{ij'}\} = r_g \sigma^2_{WL(g)}$ [instead of 0], giving:-

$k_P = [\sigma^2_{AL} + r_g \sigma^2_{WL(g)}] / \sigma^2_{P(\Sigma)} = r_P + [r_g \sigma^2_{WL(g)}] / \sigma^2_{P(\Sigma)}$ , after including the effects of inbreeding. All symbols are defined within the derivation. The implications of this are that if $\sigma^2_{AL}$ and $\sigma^2_{WL}$ were estimated from the Mean-squares of an actual analysis-of-variance of a "nursery", the "apparent" $\sigma^2_{AL}$ would be biased upward by the amount $[r_g \sigma^2_{WL(g)}] / \sigma^2_{P(\Sigma)}$, for which adjustment would be needed. Unbiased $r_P$ ($c_P$) could then be calculated.
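To make the size of the inbreeding adjustment concrete, the two correlations can be computed side by side. This is a sketch under stated assumptions: the function name `phenotypic_intraclass_correlations`, its argument names and the numerical inputs are all hypothetical, and the arithmetic follows only the expressions given in the derivation above.

```python
def phenotypic_intraclass_correlations(var_al, var_wl, g2, f_t, f_prev):
    """Return (r_P, k_P) from the variance components of the line model.

    var_al : amongst-line phenotypic variance
    var_wl : within-line phenotypic variance
    g2     : coefficient of determination for the genotypic part of var_wl
    f_t, f_prev : inbreeding coefficients in generations t and t-1
    (All names are illustrative; none come from the source.)
    """
    var_total = var_al + var_wl            # total phenotypic variance
    r_p = var_al / var_total               # simple intra-class correlation
    r_g = f_t + f_prev                     # combined gamete correlation
    var_wl_g = g2 * var_wl                 # genotypic part of within-line variance
    k_p = (var_al + r_g * var_wl_g) / var_total
    return r_p, k_p


if __name__ == "__main__":
    # Hypothetical components: 2 units amongst lines, 8 within lines.
    r_p, k_p = phenotypic_intraclass_correlations(2.0, 8.0, 0.5, 0.25, 0.125)
    print(f"r_P = {r_p:.3f}, k_P = {k_p:.3f}, difference = {k_p - r_p:.3f}")
```

The difference between the two returned values is the adjustment term discussed in the text.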

#### Relating intra-class correlations to Dispersion and to Heritabilities

##### Fundamentals

Both the phenotypic and genic intra-class correlations have been shown to be the ratios of their respective amongst-line variances to their respective total variances, as defined within the simple Dispersion model. Therefore:- [12] :208–218

$\sigma^2_{P(AL)} = c_P \sigma^2_{P(\Sigma)}$ , and $\sigma^2_{P(WL)} = (1 - c_P) \sigma^2_{P(\Sigma)}$ .

Similarly, $\sigma^2_{A(AL)} = c_A \sigma^2_{A(\Sigma)}$ , and $\sigma^2_{A(WL)} = (1 - c_A) \sigma^2_{A(\Sigma)}$ .

These are without error provided unbiased intra-class correlations (cP and cA) are used [see the sections above]. The amongst-line phenotypic variance is the variance of dispersed progeny means arising from genetic drift and/or gamete relationship. It is therefore a "genotypic variance" (of dispersed progenies). The within-line phenotypic variance is the variance of individuals within progeny lines, being their "genotypic variance" confounded probably with their "environmental variance". The "genotypic" component of this was [g2 σ2P(WL) ] in the amendment giving rise to kP in the section above. The genic variances are the previously-discussed dispersion genic variances.

It is possible also to define "Genotypic" (cG) and "Dominance" (cD) intra-class correlations which can be used in parallel ways to relate to their respective "amongst-line" and "within-line" variance components.

##### Corollaries

Recalling the comment above that $\sigma^2_{P(AL)} = \sigma^2_{G(AL)}$, it is often therefore equated to $\sigma^2_{A(AL)} = c_A \sigma^2_{A(\Sigma)}$. Now, recalling that $r_P = \sigma^2_{P(AL)} / \sigma^2_{P(\Sigma)}$, this new substitution leads to $r_P = c_A \sigma^2_{A(\Sigma)} / \sigma^2_{P(\Sigma)}$. Further, recall that $h^2_{Indiv} = \sigma^2_{A(\Sigma)} / \sigma^2_{P(\Sigma)}$. It can thus be seen that $r_P = c_A h^2_{Indiv}$. That is, it is an "overview heritability" of line dispersion. [Notice that this equality applies to $r_P$, not to $k_P$.] A simple "within-line" heritability { $[(1 - c_A) / (1 - c_P)] h^2_{Indiv}$ } and "amongst-line" heritability { $[c_A / c_P] h^2_{Indiv}$ } can also be constructed, but more utilitarian versions will follow.[12]:208–218 Notice that this simple "within-line" heritability is not the same as $g^2$ of the previous section on the "phenotypic intra-class correlation", being a "narrow-sense" heritability instead of a "broad-sense" one. The $g^2$ would be $[(1 - c_G) / (1 - c_P)] H^2_{Indiv}$ instead.
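A short numerical illustration of these corollaries may help. The sketch below is not from the source: the function name `simple_line_heritabilities` and the input values are assumptions, and the phenotypic correlation in the example is obtained from the equality of the phenotypic correlation with the product of the genic correlation and the individual heritability, as stated above.

```python
def simple_line_heritabilities(h2_indiv, c_a, c_p):
    """Simple heritabilities built from the two intra-class correlations.

    h2_indiv : narrow-sense heritability of individuals
    c_a      : genic intra-class correlation
    c_p      : phenotypic intra-class correlation
    (Names are hypothetical, for illustration only.)
    """
    return {
        "overview": c_a * h2_indiv,                        # equals r_P in the text
        "within_line": (1.0 - c_a) / (1.0 - c_p) * h2_indiv,
        "amongst_line": c_a / c_p * h2_indiv,
    }


if __name__ == "__main__":
    h2 = 0.4           # assumed individual heritability
    c_a = 0.5          # assumed genic intra-class correlation
    c_p = c_a * h2     # phenotypic correlation implied by the corollary
    for name, value in simple_line_heritabilities(h2, c_a, c_p).items():
        print(f"{name}: {value:.3f}")
```

When the phenotypic correlation is derived in this way, the simple amongst-line heritability evaluates to one, which reflects the premise that the amongst-line phenotypic variance has been equated to the amongst-line genic variance.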

##### Amongst-line Selection

Within a natural dispersed bulk of progeny lines, it is probably impossible to identify the lines and their respective members: but, within the "nursery" of a controlled selection programme, parents and offspring are managed beforehand, and both progeny-lines and their members are certainly identifiable. It is thus possible to analyse the nursery as an "experiment", and to conduct a simple "single hypothesis" (one-way) analysis-of-variance (ANOVA) on it, extracting estimates of means and variance-components together with their standard errors.[7][41][48] If this were possible for the natural bulk, this ANOVA approach could also be utilized.
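The extraction of variance components from a one-way ANOVA can be sketched as follows. This is a minimal illustration for a balanced nursery (equal numbers per line) using the usual method-of-moments estimators; the function name `one_way_variance_components` and the toy data are assumptions, standard errors are not computed, and no adjustment is made for the inbreeding bias discussed earlier.

```python
from statistics import mean


def one_way_variance_components(lines):
    """Estimate amongst-line and within-line variance from balanced data.

    lines : list of progeny lines, each a list of phenotypic values of
            equal length (names and layout are hypothetical).
    Returns (var_al, var_wl, intraclass_correlation).
    """
    g = len(lines)
    n = len(lines[0])
    if g < 2 or n < 2 or any(len(line) != n for line in lines):
        raise ValueError("need at least 2 lines of equal size >= 2")

    line_means = [mean(line) for line in lines]
    grand_mean = mean(line_means)  # valid because the design is balanced

    # Mean squares of the one-way analysis of variance.
    ms_among = n * sum((m - grand_mean) ** 2 for m in line_means) / (g - 1)
    ms_within = sum(
        (x - m) ** 2 for line, m in zip(lines, line_means) for x in line
    ) / (g * (n - 1))

    var_wl = ms_within
    var_al = (ms_among - ms_within) / n  # may be negative in small samples
    return var_al, var_wl, var_al / (var_al + var_wl)


if __name__ == "__main__":
    toy_nursery = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]  # invented values
    print(one_way_variance_components(toy_nursery))
```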

As a result, amongst-line selection can actually be effected: "observe" the line means and apply (to these) amongst-line selection as derived from the selection basics. Because observed means are usable, the variance of an estimated mean has to be added to the $\sigma^2_{(AL)}$, whether it be phenotypic or genic. In general, the variance of an estimated mean is $\sigma^2_{\bar{y}} = \sigma^2_y / n$.[7][41][48] Similarly, the amongst-line "narrow-sense" heritability has to be re-defined in this new light. Finally, all these modifications have to be amalgamated into one compound-coefficient ($K_{AL'}$) so as to convert at once the $\Delta G_{Indiv}$ into $\Delta G_{AL'}$. The one remaining modification is to realize that selection pressure (Prob) now applies to the proportion of means which are selected, and is not focused upon the proportion of individuals. Consequently, the selection threshold (z), its ordinate (y), and the intensity of selection (i) are all focused in this same way.

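Therefore,[12]:208–218 $\sigma^2_{P(AL')} = \sigma^2_{P(AL)} + \sigma^2_{P(\bar{y})} = \sigma^2_{P(AL)} + \sigma^2_{P(WL)} / n_{WL}$ . That is:- $\sigma^2_{P(AL')} = c_P \sigma^2_{P(\Sigma)} + [(1 - c_P) \sigma^2_{P(\Sigma)}] / n_{WL}$ .

After gathering of terms, and simplifying, this becomes:- $\sigma^2_{P(AL')} = \sigma^2_{P(\Sigma)} \left[ \frac{c_P (n_{WL} - 1) + 1}{n_{WL}} \right]$ .

In a like manner, $\sigma^2_{A(AL')} = \sigma^2_{A(\Sigma)} \left[ \frac{c_A (n_{WL} - 1) + 1}{n_{WL}} \right]$ .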
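The gathering of terms can be written out explicitly. As a brief worked step, using only the symbols already defined, the total variance is factored out and the numerator is placed over the common denominator:

$\sigma^2_{P(AL')} = c_P \sigma^2_{P(\Sigma)} + \frac{(1 - c_P) \sigma^2_{P(\Sigma)}}{n_{WL}} = \sigma^2_{P(\Sigma)} \left[ \frac{n_{WL} c_P + 1 - c_P}{n_{WL}} \right] = \sigma^2_{P(\Sigma)} \left[ \frac{c_P (n_{WL} - 1) + 1}{n_{WL}} \right]$ .

The genic version follows by the same steps with $c_A$ and $\sigma^2_{A(\Sigma)}$ in place of $c_P$ and $\sigma^2_{P(\Sigma)}$.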

The ratio of the latter to the former furnishes the appropriate "narrow-sense" heritability, which, after simplifying, becomes:- $h^2_{AL'} = h^2_{Indiv} \left[ \frac{c_A (n_{WL} - 1) + 1}{c_P (n_{WL} - 1) + 1} \right]$ .

Lastly, $\Delta G_{AL'} = i \, \sigma_{P(AL')} h^2_{AL'} = [i \, \sigma_{P(\Sigma)} h^2_{Indiv}] \frac{c_A (n_{WL} - 1) + 1}{\sqrt{n_{WL} [c_P (n_{WL} - 1) + 1]}} = \Delta G_{Indiv} K_{AL'}$ , where $K_{AL'} = \frac{c_A (n_{WL} - 1) + 1}{\sqrt{n_{WL} [c_P (n_{WL} - 1) + 1]}}$ .[12]:208–218
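The amongst-line heritability and conversion coefficient are simple to evaluate numerically. The sketch below assumes hypothetical function names (`h2_amongst_line`, `k_amongst_line`) and invented inputs; it implements only the two closed forms given above.

```python
from math import sqrt


def h2_amongst_line(h2_indiv, c_a, c_p, n_wl):
    """Narrow-sense heritability of observed line means (illustrative name)."""
    return h2_indiv * (c_a * (n_wl - 1) + 1) / (c_p * (n_wl - 1) + 1)


def k_amongst_line(c_a, c_p, n_wl):
    """Coefficient converting individual genetic advance to amongst-line advance."""
    return (c_a * (n_wl - 1) + 1) / sqrt(n_wl * (c_p * (n_wl - 1) + 1))


if __name__ == "__main__":
    # Hypothetical inputs: c_A = 0.5, c_P = 0.25, h2 = 0.4.
    for n_wl in (1, 2, 10, 50):
        print(n_wl,
              round(h2_amongst_line(0.4, 0.5, 0.25, n_wl), 4),
              round(k_amongst_line(0.5, 0.25, n_wl), 4))
```

With a single individual per line the coefficient equals one, so the expression reduces to ordinary individual selection, which is a convenient sanity check on any implementation.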

##### Within-line Selection (and Overall Individual selection)

An alternative to amongst-line selection is to select individuals from within their progeny line context. This means essentially to select them with respect to their immediate peers, and it is called within-line selection. Notice its distinction with respect to "straight" individual selection: the latter selects individuals without any reference to their immediate peers within their progeny-line. That is, with "individual selection", the best individuals in the whole nursery are selected with the dispersion subdivisions being "ignored" altogether. In that case, the basic ΔG is used, together with the total phenotypic and genic variances, the straightforward h2Indiv, and the "obvious" selection pressure for overall "non-dispersed" individuals.

However, in within-line selection, the within-line partitions of variances (both phenotypic and genic) are used, and furthermore, the variance of the line mean has to be removed because it is unusable when selecting individuals from amongst their immediate peers. Once again, the within-line "narrow-sense" heritability has to be re-defined in this new light. Finally, as before, all these modifications have to be amalgamated into one compound-coefficient ($K_{WL'}$) so as to convert at once the $\Delta G_{Indiv}$ algebra into $\Delta G_{WL'}$. Also as before, recall that selection pressure (Prob) now applies to the proportion of individuals within a progeny line which are selected, and is not focused upon the proportion of individuals overall in the nursery. Consequently, the selection threshold (z), its ordinate (y), and the intensity of selection (i) are all focused in this alternative way.

Therefore,[12]:208–218 $\sigma^2_{P(WL')} = \sigma^2_{P(WL)} - \sigma^2_{P(\bar{y})} = \sigma^2_{P(WL)} - \sigma^2_{P(WL)} / n_{WL}$ . That is:- $\sigma^2_{P(WL')} = (1 - c_P) \sigma^2_{P(\Sigma)} - [(1 - c_P) \sigma^2_{P(\Sigma)}] / n_{WL}$ .

After gathering of terms, and simplifying, this becomes:- $\sigma^2_{P(WL')} = \sigma^2_{P(\Sigma)} \left[ \frac{(1 - c_P)(n_{WL} - 1)}{n_{WL}} \right]$ .

In a like manner, $\sigma^2_{A(WL')} = \sigma^2_{A(\Sigma)} \left[ \frac{(1 - c_A)(n_{WL} - 1)}{n_{WL}} \right]$ .

The ratio of the latter to the former again furnishes the appropriate "narrow-sense" heritability, which, after simplifying, becomes:- $h^2_{WL'} = h^2_{Indiv} \left[ \frac{1 - c_A}{1 - c_P} \right]$ .

Finally, $\Delta G_{WL'} = i \, \sigma_{P(WL')} h^2_{WL'} = [i \, \sigma_{P(\Sigma)} h^2_{Indiv}] \, (1 - c_A) \sqrt{\frac{n_{WL} - 1}{n_{WL} (1 - c_P)}} = \Delta G_{Indiv} K_{WL'}$ , where $K_{WL'} = (1 - c_A) \sqrt{\frac{n_{WL} - 1}{n_{WL} (1 - c_P)}}$ .[12]:208–218
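The within-line quantities can be evaluated in the same way. Again this is only a sketch: the names `h2_within_line` and `k_within_line` and the inputs are hypothetical, and the code mirrors the two closed forms above.

```python
from math import sqrt


def h2_within_line(h2_indiv, c_a, c_p):
    """Narrow-sense heritability of within-line deviations (illustrative name)."""
    return h2_indiv * (1.0 - c_a) / (1.0 - c_p)


def k_within_line(c_a, c_p, n_wl):
    """Coefficient converting individual genetic advance to within-line advance."""
    return (1.0 - c_a) * sqrt((n_wl - 1) / (n_wl * (1.0 - c_p)))


if __name__ == "__main__":
    # Hypothetical inputs: c_A = 0.5, c_P = 0.25, h2 = 0.4.
    print("h2 within line:", round(h2_within_line(0.4, 0.5, 0.25), 4))
    for n_wl in (1, 2, 10, 50):
        print(n_wl, round(k_within_line(0.5, 0.25, n_wl), 4))
```

With only one individual per line the coefficient is zero, since there are no peers to select amongst.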

##### Selecting best Individuals from best Lines - Combined selection

Combining amongst-line selection and within-line selection necessitates two stages, each of which may have different selection pressures applied. The final overall selection pressure is the product of those of the separate stages:- $Prob_{Cbtn} = Prob_{AL} \, Prob_{WL}$ . For example, a $Prob_{Cbtn}$ of 0.10 can be achieved by several combinations of $Prob_{AL}$ and $Prob_{WL}$, such as (0.316 × 0.316), or (0.5 × 0.2), or (0.2 × 0.5), and many other combinations. In the first of these examples, each stage is given equal "weight", as each is selected with the same selection pressure of 0.316 [this being the approximate square-root of $Prob_{Cbtn}$]. Recalling that i = y/Prob [see section on "standardized selection and the Normal curve"], it is clear that the result for the combination will be the same irrespective of the weights given to each stage, provided that the "final combination selection pressure" achieved is actually that which was planned. Using this $Prob_{Cbtn}$ will lead to the appropriate i, z and y for the combined selection.
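The threshold, ordinate and intensity that go with a given overall selection pressure can be obtained from the standard Normal curve. The sketch below assumes truncation selection on a Normally distributed phenotype and uses Python's standard-library `statistics.NormalDist`; the function names `combined_pressure` and `selection_intensity` and the stage pressures are illustrative choices, not taken from the source.

```python
from statistics import NormalDist


def combined_pressure(prob_al, prob_wl):
    """Overall selection pressure of a two-stage (combined) selection."""
    return prob_al * prob_wl


def selection_intensity(prob):
    """Return (z, y, i) for a proportion `prob` selected from the upper tail.

    z : truncation point on the standard Normal scale
    y : ordinate (density) of the standard Normal curve at z
    i : intensity of selection, y / prob
    """
    std_normal = NormalDist()           # mean 0, standard deviation 1
    z = std_normal.inv_cdf(1.0 - prob)  # upper-tail truncation point
    y = std_normal.pdf(z)
    return z, y, y / prob


if __name__ == "__main__":
    # Several hypothetical stage pressures that multiply to about 0.10.
    for prob_al, prob_wl in [(0.316, 0.316), (0.5, 0.2), (0.25, 0.4)]:
        prob = combined_pressure(prob_al, prob_wl)
        z, y, i = selection_intensity(prob)
        print(f"Prob={prob:.4f}  z={z:.4f}  y={y:.4f}  i={i:.4f}")
```

Each pair of stage pressures gives essentially the same overall pressure, and hence the same intensity, which is the point made in the paragraph above.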

Figure: Relative efficiencies of selection strategies under half-sib crossing.

The amalgamation of the two ΔG coefficients from each stage will complete the algebra for Combined selection. The two coefficients combine through their squares, $K^2_{Cbtn} = K^2_{AL'} + K^2_{WL'}$ , which, after simplification, becomes:-

$K_{Cbtn} = \sqrt{1 + \frac{(c_A - c_P)^2}{1 - c_P} \cdot \frac{n_{WL} - 1}{1 + (n_{WL} - 1) c_P}}$ , where all symbols have been defined previously.

Finally, $\Delta G_{Cbtn} = [i \, \sigma_{P(\Sigma)} h^2_{Indiv}] K_{Cbtn} = \Delta G_{Indiv} K_{Cbtn}$ , remembering to use the appropriate i as discussed above.
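The combined coefficient can be computed alongside the amongst-line and within-line coefficients defined in the preceding sub-sections. This is a sketch with hypothetical function names and invented inputs. For the inputs tried here, the closed form for combined selection coincides with the square root of the sum of the squared stage coefficients, which can serve as a check on an implementation.

```python
from math import sqrt


def k_amongst_line(c_a, c_p, n_wl):
    """Amongst-line conversion coefficient (illustrative name)."""
    return (c_a * (n_wl - 1) + 1) / sqrt(n_wl * (c_p * (n_wl - 1) + 1))


def k_within_line(c_a, c_p, n_wl):
    """Within-line conversion coefficient (illustrative name)."""
    return (1.0 - c_a) * sqrt((n_wl - 1) / (n_wl * (1.0 - c_p)))


def k_combined(c_a, c_p, n_wl):
    """Combined-selection conversion coefficient (closed form from the text)."""
    first = (c_a - c_p) ** 2 / (1.0 - c_p)
    second = (n_wl - 1) / (1.0 + (n_wl - 1) * c_p)
    return sqrt(1.0 + first * second)


if __name__ == "__main__":
    c_a, c_p = 0.5, 0.25  # hypothetical intra-class correlations
    for n_wl in (2, 3, 10):
        k_al = k_amongst_line(c_a, c_p, n_wl)
        k_wl = k_within_line(c_a, c_p, n_wl)
        k_c = k_combined(c_a, c_p, n_wl)
        # Ratios to the combined coefficient, with individual selection as 1.
        print(n_wl, round(k_al / k_c, 4), round(k_wl / k_c, 4), round(1.0 / k_c, 4))
```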

#### Relative efficiencies of Selection strategies

Comparison of the values of the various ΔG conversion coefficients ($K_x$) will provide an immediate measure of the relative merits of the four strategies over time (generations) under various inbreeding regimes (dispersion strengths). For this purpose, the $K_{Indiv}$ can be set at 1. Using the $K_x$ criteria, combined selection is found to be the best under every inbreeding regime. Therefore, the simplest method to visualize the relative efficiencies is to obtain the ratios of each $K_x(t)$ to $K_{Cbtn}(t)$. Graphs to the right show these K-ratios over ten cycles of selection for successive half-sib inbreeding [see section on half-sib crossing]. It is immediately apparent that, for this inbreeding regime, within-line selection has no value for making genetic advance (ΔG). Its only purpose might be for purification of breeding-stocks.

[An application note:-] However, even though combined selection is most efficient for ΔG, it is a two-stage selection activity, and therefore is more costly in time and money. It may not be so highly desirable when these new criteria are considered. For that reason, the graphs include a "0.9 efficiency judgment" line as well as the Κ-ratios themselves. This could be used in the following way. If a plant breeder was executing a "line selection" programme under such pollen management,[33] he might decide to utilize cheaper "one-pass" selection strategies that fell well above the 0.9 cut-off line in lieu of combined selection. On this basis, using these graphs, he might choose to use overall individual selection (that is "ignore" his dispersion structure in the nursery) for the first two cycles, then use combined selection for four further cycles, and finish with four cycles of amongst-line selection. Of course, this is only one possibility. He would not "abandon" his dispersion maintenance, however, and would continue with the rigorous pollen management which the half-sib crossing demands.

### Genetic drift and Selection

In the foregoing sections, dispersion has been considered as an "assistant" to selection, and it became apparent that the two can work well together. In quantitative genetics, selection is usually examined in this "biometrical" fashion, but the changes in the means (as monitored by ΔG) reflect the changes in allele and genotype frequencies which lie beneath this surface. Reference to the section on "Genetic drift" brings to mind that it also effects changes in allele and genotype frequencies, and associated means; and that this is the companion aspect to the Dispersion considered here ("the other side of the same coin"). However, these two forces of frequency change will seldom be "in concert", and may often act contrary to each other. One (selection) is "directional", being driven by selection pressure acting on the phenotype: the other (genetic drift) is driven by "chance" at fertilization (Binomial probabilities of gamete samples). If the two are tending towards the same allele frequency, their "coincidence" will be the Probability of obtaining that frequency sample in the genetic drift: the likelihood of their being "in conflict", however, is the sum of Probabilities of all the alternative frequency samples. In extreme cases, a single syngamy sampling can undo what selection has achieved, and the probabilities of it happening are available. It is important to keep this in mind. However, genetic drift resulting in sample frequencies similar to those of the "selection target" will not lead to so drastic an outcome, leading rather to what may be seen as "slowness in reaching selection goals".
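The "coincidence" and "conflict" probabilities can be illustrated with the Binomial distribution of gamete samples. The following is a sketch under simple assumptions: a single locus with two alleles, a fixed number of sampled gametes, and hypothetical names (`drift_sample_probabilities`, `coincidence_and_conflict`) and inputs that do not come from the source.

```python
from math import comb


def drift_sample_probabilities(n_gametes, p):
    """Binomial probability of each possible allele count in a gamete sample.

    n_gametes : number of gametes sampled (2N for N diploid individuals)
    p         : allele frequency in the parental gamodeme
    Returns a list indexed by the count k of the allele in the sample.
    """
    q = 1.0 - p
    return [comb(n_gametes, k) * p ** k * q ** (n_gametes - k)
            for k in range(n_gametes + 1)]


def coincidence_and_conflict(n_gametes, p, target_count):
    """Probability that drift lands on the selection target, and its complement."""
    probs = drift_sample_probabilities(n_gametes, p)
    coincide = probs[target_count]
    conflict = sum(pr for k, pr in enumerate(probs) if k != target_count)
    return coincide, conflict


if __name__ == "__main__":
    # Ten gametes sampled at p = 0.5; selection "aims" at a frequency of 0.6.
    coincide, conflict = coincidence_and_conflict(10, 0.5, 6)
    print(f"coincidence = {coincide:.4f}, conflict = {conflict:.4f}")
```

Even in this small invented example, the probability that drift delivers exactly the frequency favoured by selection is much smaller than the summed probability of all the alternatives.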

## Correlated attributes

Figure: Sources of attribute correlation.

Although some genes have an effect on only a single trait, many genes have an effect on various traits, which is termed pleiotropy. Because of this, a change in a single gene will have an effect on all those traits. This is calculated using covariances, and the phenotypic covariance (covP) between two traits can be partitioned in the same way as the variances described above [e.g. genic (covA), dominance (covD), environment (covE)]. In general, the correlation coefficient is the ratio of the covariance to the geometric mean of the two variances of the traits.[48]:196–198 Various correlation coefficients can be obtained, using the appropriate partitions of variances and covariances. Of course, the phenotypic correlation is the "usual" correlation of Statistics/Biometrics.

$\mbox{Phenotypic correlation} = \frac{\mathrm{cov}(P_{1}, P_{2})}{\sqrt{{V_{P_1}*V_{P_2}}}}$ ...and... $\mbox{Genotypic correlation} = \frac{\mathrm{cov}(G_{1}, G_{2})}{\sqrt{{V_{G_1}*V_{G_2}}}}$ ...and also... $\mbox{Environmental correlation} = \frac{\mathrm{cov}(E_{1}, E_{2})}{\sqrt{{V_{E_1}*V_{E_2}}}}$ .

The genic correlation {genetic correlation} is of particular interest, especially in quantifying the correlated effects of selection. It is as follows:-

$\mbox{Genic correlation} = \frac{\mathrm{cov}(A_{1}, A_{2})}{\sqrt{{V_{A_1}*V_{A_2}}}}$
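All of these correlations share one computation, the covariance divided by the geometric mean of the two variances. A minimal sketch follows; the function name `correlation_from_components` and the component values are invented for illustration and are not estimates from any data set.

```python
from math import sqrt


def correlation_from_components(cov_12, var_1, var_2):
    """Correlation as covariance over the geometric mean of two variances."""
    if var_1 <= 0.0 or var_2 <= 0.0:
        raise ValueError("variances must be positive")
    return cov_12 / sqrt(var_1 * var_2)


if __name__ == "__main__":
    # Hypothetical (covariance, variance of trait 1, variance of trait 2).
    components = {
        "phenotypic": (2.0, 4.0, 9.0),
        "genic": (1.2, 1.6, 3.6),
        "environmental": (0.5, 2.0, 4.5),
    }
    for name, (cov_12, var_1, var_2) in components.items():
        print(name, round(correlation_from_components(cov_12, var_1, var_2), 4))
```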

## Footnotes and references

1. Anderberg, Michael R. (1973). Cluster analysis for applications. New York: Academic Press.
2. Mendel, Gregor (1866). "Versuche über Pflanzen Hybriden". Verhandlungen naturforschender Verein in Brünn iv.
3. Mendel, Gregor; Bateson, William [translator] (1891). "Experiments in plant hybridisation". J. Roy. hort. Soc. (London) xxv: 54–78.
4. The Mendel G.; Bateson W. (1891) paper, with additional comments by Bateson, is reprinted in:- Sinnott E.W.; Dunn L.C.; Dobzhansky T. (1958). "Principles of genetics"; New York, McGraw-Hill: 419-443. Footnote 3, page 422 identifies Bateson as the original translator, and provides the reference for that translation.
5. A QTL is a region in the DNA genome that affects, or is associated with, quantitative phenotypic traits.
6. Fisher, R. A. (1918). "The correlation between relatives on the supposition of Mendelian inheritance". Trans. Roy. Soc. (Edinburgh) 52: 399–433. doi:10.1017/s0080456800012163.
7. Steel, R. G. D.; Torrie, J. H. (1980). Principles and procedures of statistics. (2 ed.). New York: McGraw-Hill. ISBN 0 07 060926 8.
8. Other symbols are sometimes used, but these are common.
9. The allele effect is the average phenotypic deviation of the homozygote from the mid-point of the two contrasting homozygote phenotypes at one locus, when observed over the infinity of all background genotypes and environments. In practice, estimates from large unbiased samples substitute for the parameter.
10. The dominance effect is the average phenotypic deviation of the heterozygote from the mid-point of the two homozygotes at one locus, when observed over the infinity of all background genotypes and environments. In practice, estimates from large unbiased samples substitute for the parameter.
11. Crow, J. F.; Kimura, M. (1970). An introduction to population genetics theory. New York: Harper & Row.
12. Falconer, D. S.; Mackay, Trudy F. C. (1996). Introduction to quantitative genetics (Fourth ed.). Harlow: Longman. ISBN 978-0582-24302-6. Lay summary: Genetics (journal) (24 August 2014).
13. Mendel commented on this particular tendency for F1 > P1, i.e. evidence of hybrid vigour in stem length. However, the difference may not be sufficient to be judged significant. (The relationship between the range and the standard deviation is known [Steel and Torrie (1980): 576], permitting an approximate significance test to be made for this present difference.)
14. Richards, A. J. (1986). Plant breeding systems. Boston: George Allen & Unwin. ISBN 0 04 581020 6.
15. Jane Goodall Institute. "Social structure of chimpanzees". Chimp Central. Retrieved 20 August 2014.
16. Wikipedia. "Animal mating systems". English Wikipedia. Retrieved 21 August 2014.
17. Gordon, Ian L. (2000). "Quantitative genetics of allogamous F2: an origin of randomly fertilized populations". Heredity 85: 43–52. PMID 10971690. doi:10.1046/j.1365-2540.2000.00716.x.
18. An F2 derived by self fertilizing F1 individuals (an autogamous F2), however, is not an origin of a randomly fertilized population structure. See Gordon (2001).
19. Castle, W. E. (1903). "The law of heredity of Galton and Mendel and some laws governing race improvement by selection". Proc. Amer. Acad. Sci. 39: 233–242. doi:10.2307/20021870.
20. Hardy, G. H. (1908). "Mendelian proportions in a mixed population". Science 28 (706): 49–50. PMID 17779291. doi:10.1126/science.28.706.49.
21. Weinberg, W. (1908). "Über den Nachweis der Vererbung beim Menschen". Jahresh. Verein f. vaterl. Naturk. Württem. 64: 368–382.
22. Usually in science ethics, a discovery is named after the earliest person to propose it. Castle, however, seems to have been overlooked: and later when re-found, the title "Hardy Weinberg" was so ubiquitous it seemed too late to update it. Perhaps the "Castle Hardy Weinberg" equilibrium would be a good compromise?
23. Gordon, Ian L. (1999). "Quantitative genetics of intraspecies hybrids". Heredity 83: 757–764. doi:10.1046/j.1365-2540.1999.00634.x.
24. Gordon, Ian L. (2001). "Quantitative genetics of autogamous F2". Hereditas 134 (3): 255–262. PMID 11833289. doi:10.1111/j.1601-5223.2001.00255.x.
25. Wright, S. (1917). "The average correlation within subgroups of a population". J. Wash. Acad. Sci. 7: 532–535.
26. Wright, S. (1921). "Systems of mating. I. The biometric relations between parent and offspring". Genetics 6: 111–123.
27. Sinnott, Edmund W.; Dunn, L. C.; Dobzhansky, Theodosius (1958). Principles of genetics. New York: McGraw-Hill.
28. Fisher, R. A. (1999). The genetical theory of natural selection. ("variorum" ed.). Oxford: Oxford University Press. ISBN 0 19 850440 3.
29. Cochran, William G. (1977). Sampling techniques. (Third ed.). New York: John Wiley & Sons.
30. This is outlined subsequently in the Genotypic Variances section.
31. Both are used commonly.
32. See the earlier citations.
33. Allard, R. W. (1960). Principles of plant breeding. New York: John Wiley & Sons.
34. Gordon, I.L. (2003). "Refinements to the partitioning of the inbred genotypic variance". Heredity 91 (1): 85–89. PMID 12815457. doi:10.1038/sj.hdy.6800284.
35. Wright, Sewall (1951). "The genetical structure of populations". Annals of Eugenics 15: 323–354.
36. Mather, Kenneth; Jinks, John L. (1971). Biometrical genetics (2 ed.). London: Chapman & Hall. ISBN 0 412 10220 X.
37. These have been translated from Mather's symbols into Fisherian ones to facilitate the comparison.
38. Covariance is the co-variability between two sets of data - in this case the a and the d. Similarly to the variance, it is based on a sum of cross-products (SCP) instead of a SS. From this, it is clear therefore that the variance is but a special form of the covariance!
39. Hayman, B. I. (1960). "The theory and analysis of the diallel cross. III". Genetics 45: 155–172.
40. It has been observed that when p = q, or when d = 0, α [= a+(q-p)d] "reduces" to a. In such circumstances, σ2A = σ2a - but only numerically. They still have not become the one and the same identity.
41. Snedecor, George W.; Cochran, William G. (1967). Statistical methods. (Sixth ed.). Ames: Iowa State University Press. ISBN 0 8138 1560 6.
42. Kendall, M. G.; Stuart, A. (1958). The advanced theory of statistics. Volume 1. (2nd ed.). London: Charles Griffin.
43. It is common practice not to have a subscript on the experimental "error" variance.
44. This type of variance-ratio is an example of a coefficient of determination. It is used particularly in regression analysis. A standardized version of regression analysis is path analysis. Standardizing here means that the data were first divided by their own experimental standard errors, in order to unify the scales for all attributes.
45. Dohm, M. R. (2002). "Repeatability estimates do not always set an upper limit to heritability". Functional Ecology 16: 273–280.
46. Li, Ching Chun (1977). Path analysis - a Primer (Second printing with Corrections ed.). Pacific Grove: Boxwood Press. ISBN 0 910286 40 X.
47. Draper, Norman R.; Smith, Harry (1981). Applied regression analysis. (Second ed.). New York: John Wiley & Sons. ISBN 0 471 02995 5.
48. Balaam, L. N. (1972). Fundamentals of biometry. London: George Allen & Unwin. ISBN 0 04 519008 9.
49. In the past, both forms of parent-offspring covariance have been applied to this task of estimating h2, but, as noted in the sub-section above, only one of them (cov(MPO)) is actually appropriate. The cov(PO) is useful, however, for estimating H2 as seen in the main text following.
50. Note that texts which ignore the dominance component of cov(HS) erroneously suggest that rHS "approximates" ( ¼ h2 ).
51. Theoretically, the tail is infinite, but in practice there is a quasi-end.
52. Becker, Walter A. (1967). Manual of procedures in quantitative genetics. (Second ed.). Pullman: Washington State University.
53. There is a small "wobble" arising from the fact that b2 alters one generation behind a2 - examine their inbreeding equations.