Posts by Category

statistics

Spatial Analysis with AR1 × AR1 Model: Theory & Complete R Analysis

19 minute read

Updated:

Field experiments are routinely affected by spatial heterogeneity — systematic variation in soil fertility, moisture, drainage, pH, and microclimate that creates patches of high and low performance across the trial. When this variation is ignored, it inflates the residual variance, reduces heritability estimates, biases treatment comparisons, and misranks genotypes. The AR1 × AR1 model (first-order autoregressive process in both row and column directions) is the gold standard for capturing and removing this spatial structure from field trial data.
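One way to read "first-order autoregressive in both row and column directions": under a separable AR1 × AR1 structure, the correlation between two plots is the row autocorrelation raised to their row separation, multiplied by the column autocorrelation raised to their column separation. The sketch below is only an illustration in Python; the function name and the parameter values are hypothetical and are not taken from the post's R analysis.

```python
import math

def ar1xar1_correlation(plot_a, plot_b, rho_row, rho_col):
    """Correlation between two plots under a separable AR1 x AR1 structure.

    plot_a and plot_b are (row, column) grid positions; rho_row and rho_col
    are the autocorrelations between adjacent rows and adjacent columns.
    """
    row_lag = abs(plot_a[0] - plot_b[0])
    col_lag = abs(plot_a[1] - plot_b[1])
    return (rho_row ** row_lag) * (rho_col ** col_lag)

# Illustrative values only (assumed, not estimated from any trial).
rho_row, rho_col = 0.5, 0.8

same_plot = ar1xar1_correlation((0, 0), (0, 0), rho_row, rho_col)
next_row = ar1xar1_correlation((0, 0), (1, 0), rho_row, rho_col)
diagonal = ar1xar1_correlation((0, 0), (1, 2), rho_row, rho_col)

print(same_plot, next_row, round(diagonal, 4))
```

The correlation decays with distance in each direction, which is how the model represents patches of similar performance among neighbouring plots.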

Partially Replicated (p-rep) Design: Theory & Complete R Analysis

16 minute read

Updated:

The Partially Replicated (p-rep) design — formally developed by Cullis, Smith & Coombes (2006) — is the modern standard for Stage 1 multi-environment plant breeding trials. It overcomes the key limitation of the augmented design (zero replication of test entries) by replicating a controlled fraction (typically 20–30 %) of test entries twice, while the remainder appear only once. This provides direct within-trial error estimation for all genotypes and enables powerful spatial modelling of field heterogeneity.

Augmented Design: Theory & Complete R Analysis

14 minute read

Updated:

The Augmented Design — proposed by Federer (1956) — is a field experimental design specifically developed for early-generation plant breeding trials, where a large number of new (unreplicated) test entries are evaluated alongside a small set of replicated check (standard) varieties. It allows breeders to screen hundreds of genotypes within a single trial without the cost of fully replicating every entry, while still enabling valid statistical inference through the checks.

Alpha (α) Lattice Design: Theory, Layout & Complete R Analysis

12 minute read

Updated:

The Alpha (α) Lattice Design — introduced by Patterson & Williams (1976) — is an incomplete block design built for large-scale experiments where the number of treatments exceeds the practical block size. It is the standard design for plant breeding trials evaluating hundreds of genotypes, offering superior error control over RCBD while remaining flexible in treatment and block size combinations.

Latin Square Design (LSD): Theory & Complete R Analysis

15 minute read

Updated:

The Latin Square Design (LSD) extends blocking to two simultaneous directions of environmental variation. By controlling both a row gradient and a column gradient, it achieves greater error reduction than RCBD while using the same number of experimental units. It is the design of choice when two orthogonal sources of heterogeneity are known in advance — such as row (fertility) and column (irrigation) gradients in a field, or row (day) and column (technician) effects in a laboratory.

Randomized Complete Block Design (RCBD): Theory & Complete R Analysis

13 minute read

Updated:

The Randomized Complete Block Design (RCBD) is the most widely used experimental design in agricultural, biological, and environmental research. It extends the CRD by introducing blocks — groups of homogeneous experimental units — to account for a single known source of environmental variation (soil fertility gradient, slope, irrigation, temperature, etc.). Every treatment appears exactly once in every block, making blocks complete.

Completely Randomized Design (CRD): Theory & Complete R Analysis

10 minute read

Updated:

The Completely Randomized Design (CRD) is the simplest experimental design. Treatments are assigned to experimental units purely at random, with no restrictions. It is the starting point for understanding all other designs (RCBD, Latin Square, Alpha-lattice) and remains widely used in controlled laboratory and greenhouse experiments.

F-Test: Theory, Variants & Complete R Analysis

10 minute read

Updated:

The F-test is a family of statistical tests built on the F-distribution, the distribution of the ratio of two independent chi-squared variables, each divided by its own degrees of freedom. It answers three fundamental questions in applied statistics:

  1. Are two population variances equal? (Variance ratio test)
  2. Do several group means differ? (One-way ANOVA)
  3. Does a regression model explain significant variation? (Overall F in regression)
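As a rough illustration of the first question, the variance ratio statistic is the larger sample variance divided by the smaller one, with each sample's size minus one as its degrees of freedom. This is a minimal Python sketch using only the standard library; the function name and the two samples are made up for illustration, and turning the statistic into a p-value would additionally need the F-distribution, which is not shown here.

```python
from statistics import variance

def variance_ratio_f(sample_a, sample_b):
    """Return (F, df_numerator, df_denominator) for two samples.

    The larger sample variance goes in the numerator, so F >= 1.
    """
    var_a, var_b = variance(sample_a), variance(sample_b)
    if var_a >= var_b:
        return var_a / var_b, len(sample_a) - 1, len(sample_b) - 1
    return var_b / var_a, len(sample_b) - 1, len(sample_a) - 1

# Hypothetical measurements, chosen only to make the arithmetic easy to follow.
sample_a = [10.0, 12.0, 14.0, 16.0, 18.0]   # sample variance 10.0
sample_b = [13.0, 14.0, 15.0, 14.0, 14.0]   # sample variance 0.5

f_stat, df_num, df_den = variance_ratio_f(sample_a, sample_b)
print(f_stat, df_num, df_den)
```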

Z-Test and t-Test: Theory, Hypotheses & Complete R Analysis

9 minute read

Updated:

Hypothesis testing is the formal procedure for deciding whether sample data provide sufficient evidence to reject a claim about a population parameter. The Z-test and t-test are the two workhorses for testing means. This post covers the theory, assumptions, null (H0) / alternative hypotheses (H1), test statistics, and full R walkthroughs with real-style datasets.

R

Spatial Analysis with AR1 × AR1 Model: Theory & Complete R Analysis

19 minute read

Updated:

Field experiments are routinely affected by spatial heterogeneity — systematic variation in soil fertility, moisture, drainage, pH, and microclimate that creates patches of high and low performance across the trial. When this variation is ignored, it inflates the residual variance, reduces heritability estimates, biases treatment comparisons, and misranks genotypes. The AR1 × AR1 model (first-order autoregressive process in both row and column directions) is the gold standard for capturing and removing this spatial structure from field trial data.

Partially Replicated (p-rep) Design: Theory & Complete R Analysis

16 minute read

Updated:

The Partially Replicated (p-rep) design — formally developed by Cullis, Smith & Coombes (2006) — is the modern standard for Stage 1 multi-environment plant breeding trials. It overcomes the key limitation of the augmented design (zero replication of test entries) by replicating a controlled fraction (typically 20–30 %) of test entries twice, while the remainder appear only once. This provides direct within-trial error estimation for all genotypes and enables powerful spatial modelling of field heterogeneity.

Augmented Design: Theory & Complete R Analysis

14 minute read

Updated:

The Augmented Design — proposed by Federer (1956) — is a field experimental design specifically developed for early-generation plant breeding trials, where a large number of new (unreplicated) test entries are evaluated alongside a small set of replicated check (standard) varieties. It allows breeders to screen hundreds of genotypes within a single trial without the cost of fully replicating every entry, while still enabling valid statistical inference through the checks.

Alpha (α) Lattice Design: Theory, Layout & Complete R Analysis

12 minute read

Updated:

The Alpha (α) Lattice Design — introduced by Patterson & Williams (1976) — is an incomplete block design built for large-scale experiments where the number of treatments exceeds the practical block size. It is the standard design for plant breeding trials evaluating hundreds of genotypes, offering superior error control over RCBD while remaining flexible in treatment and block size combinations.

Latin Square Design (LSD): Theory & Complete R Analysis

15 minute read

Updated:

The Latin Square Design (LSD) extends blocking to two simultaneous directions of environmental variation. By controlling both a row gradient and a column gradient, it achieves greater error reduction than RCBD while using the same number of experimental units. It is the design of choice when two orthogonal sources of heterogeneity are known in advance — such as row (fertility) and column (irrigation) gradients in a field, or row (day) and column (technician) effects in a laboratory.

Randomized Complete Block Design (RCBD): Theory & Complete R Analysis

13 minute read

Updated:

The Randomized Complete Block Design (RCBD) is the most widely used experimental design in agricultural, biological, and environmental research. It extends the CRD by introducing blocks — groups of homogeneous experimental units — to account for a single known source of environmental variation (soil fertility gradient, slope, irrigation, temperature, etc.). Every treatment appears exactly once in every block, making blocks complete.

Completely Randomized Design (CRD): Theory & Complete R Analysis

10 minute read

Updated:

The Completely Randomized Design (CRD) is the simplest experimental design. Treatments are assigned to experimental units purely at random, with no restrictions. It is the starting point for understanding all other designs (RCBD, Latin Square, Alpha-lattice) and remains widely used in controlled laboratory and greenhouse experiments.

F-Test: Theory, Variants & Complete R Analysis

10 minute read

Updated:

The F-test is a family of statistical tests built on the F-distribution, the distribution of the ratio of two independent chi-squared variables, each divided by its own degrees of freedom. It answers three fundamental questions in applied statistics:

  1. Are two population variances equal? (Variance ratio test)
  2. Do several group means differ? (One-way ANOVA)
  3. Does a regression model explain significant variation? (Overall F in regression)

Z-Test and t-Test: Theory, Hypotheses & Complete R Analysis

9 minute read

Updated:

Hypothesis testing is the formal procedure for deciding whether sample data provide sufficient evidence to reject a claim about a population parameter. The Z-test and t-test are the two workhorses for testing means. This post covers the theory, assumptions, null (H0) / alternative hypotheses (H1), test statistics, and full R walkthroughs with real-style datasets.

field-experiments

Spatial Analysis with AR1 × AR1 Model: Theory & Complete R Analysis

19 minute read

Updated:

Field experiments are routinely affected by spatial heterogeneity — systematic variation in soil fertility, moisture, drainage, pH, and microclimate that creates patches of high and low performance across the trial. When this variation is ignored, it inflates the residual variance, reduces heritability estimates, biases treatment comparisons, and misranks genotypes. The AR1 × AR1 model (first-order autoregressive process in both row and column directions) is the gold standard for capturing and removing this spatial structure from field trial data.

Partially Replicated (p-rep) Design: Theory & Complete R Analysis

16 minute read

Updated:

The Partially Replicated (p-rep) design — formally developed by Cullis, Smith & Coombes (2006) — is the modern standard for Stage 1 multi-environment plant breeding trials. It overcomes the key limitation of the augmented design (zero replication of test entries) by replicating a controlled fraction (typically 20–30 %) of test entries twice, while the remainder appear only once. This provides direct within-trial error estimation for all genotypes and enables powerful spatial modelling of field heterogeneity.

Augmented Design: Theory & Complete R Analysis

14 minute read

Updated:

The Augmented Design — proposed by Federer (1956) — is a field experimental design specifically developed for early-generation plant breeding trials, where a large number of new (unreplicated) test entries are evaluated alongside a small set of replicated check (standard) varieties. It allows breeders to screen hundreds of genotypes within a single trial without the cost of fully replicating every entry, while still enabling valid statistical inference through the checks.

Alpha (α) Lattice Design: Theory, Layout & Complete R Analysis

12 minute read

Updated:

The Alpha (α) Lattice Design — introduced by Patterson & Williams (1976) — is an incomplete block design built for large-scale experiments where the number of treatments exceeds the practical block size. It is the standard design for plant breeding trials evaluating hundreds of genotypes, offering superior error control over RCBD while remaining flexible in treatment and block size combinations.

Latin Square Design (LSD): Theory & Complete R Analysis

15 minute read

Updated:

The Latin Square Design (LSD) extends blocking to two simultaneous directions of environmental variation. By controlling both a row gradient and a column gradient, it achieves greater error reduction than RCBD while using the same number of experimental units. It is the design of choice when two orthogonal sources of heterogeneity are known in advance — such as row (fertility) and column (irrigation) gradients in a field, or row (day) and column (technician) effects in a laboratory.

Randomized Complete Block Design (RCBD): Theory & Complete R Analysis

13 minute read

Updated:

The Randomized Complete Block Design (RCBD) is the most widely used experimental design in agricultural, biological, and environmental research. It extends the CRD by introducing blocks — groups of homogeneous experimental units — to account for a single known source of environmental variation (soil fertility gradient, slope, irrigation, temperature, etc.). Every treatment appears exactly once in every block, making blocks complete.

Completely Randomized Design (CRD): Theory & Complete R Analysis

10 minute read

Updated:

The Completely Randomized Design (CRD) is the simplest experimental design. Treatments are assigned to experimental units purely at random, with no restrictions. It is the starting point for understanding all other designs (RCBD, Latin Square, Alpha-lattice) and remains widely used in controlled laboratory and greenhouse experiments.

website

Adding Content to an Academic Website

9 minute read

Updated:

One thing I haven’t covered in my previous posts on creating and customizing an academic website is how to actually add content to your site. You know, the stuff that’s the reason why people go to your website in the first place? If you’ve followed those guides, your website should be professional looking and already feeling a little bit different from the stock template. However, adding new pages or tweaking the existing pages can be a little intimidating, and I realized I should probably walk through how to do so. Luckily Jekyll’s use of Markdown makes it really easy to add new content!

Customizing an Academic Website

8 minute read

Updated:

This is a followup to my previous post on creating an academic website. If you’ve followed that guide, you should have a website that’s professional-looking and informative, but it’s probably lacking something to really make it feel like your own. There are an infinite number of ways you could customize the academicpages template (many of them far, far beyond my abilities) but I’m going to walk you through the process I used to start tweaking my website. The goal here isn’t to tell you how you should personalize your website, but to give you the tools to learn how to implement whatever changes you want to make.

Building an Academic Website

23 minute read

Updated:

If you’re an academic, you need a website. Obviously I agree with this since you’re reading this on my website, but if you don’t have one, you should get one. Most universities these days provide a free option, usually powered by WordPress (both WashU and UNC use WordPress for their respective offerings). While these sites are quick to set up and come with the prestige of a .edu URL, they have several drawbacks that have been extensively written on.

PDF

Combining PDF Documents the Smarter Way

4 minute read

Updated:

My previous post on combining multiple PDF files had an important caveat: things could end up in the wrong order. If you had files with leading ID numbers that started at 1 and ended at 12, you’d end up with PDFs combined in the order 1, 10, 11, 12, 2, 3, …, 9.
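The ordering problem comes from sorting file names as text rather than as numbers. The sketch below shows one common fix, a "natural" sort key that compares runs of digits as integers. It is written in Python purely as an illustration; the file names and the `natural_key` helper are hypothetical and are not the approach used in the post itself.

```python
import re

def natural_key(filename):
    """Split a name into text and integer chunks so that 2 sorts before 10."""
    return [int(part) if part.isdigit() else part.lower()
            for part in re.split(r"(\d+)", filename)]

# Hypothetical file names with leading ID numbers from 1 to 12.
files = [f"{i}.pdf" for i in range(12, 0, -1)]

lexicographic = sorted(files)                # plain text order
natural = sorted(files, key=natural_key)     # numeric-aware order

print(lexicographic)
print(natural)
```

Plain sorting compares names character by character, so "10.pdf" lands before "2.pdf"; the key function avoids that without renaming any files.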

Combining PDF Documents

2 minute read

Updated:

How many times have you found that your institution has access to a digital version of a book you need only to discover that it comes in 15 different PDF files?

software

A tutorial on Genome-Wide Association Studies (GWAS) in Tassel (GUI)

3 minute read

Updated:

Genome-wide association studies (GWAS) are increasingly popular in the medical, biological, and social sciences for identifying associations between single nucleotide polymorphisms and phenotypic traits. This tutorial aims to provide guidelines for conducting genome-wide analysis in Tassel.

A tutorial on investigating genetic admixture using STRUCTURE software

3 minute read

Updated:

STRUCTURE is a freely available software package for rigorous investigation of admixed individuals, identification of points of hybridization and migrants, and estimation of the overall structure of a population using commonly used genetic markers such as SNPs and SSRs. This software was developed by the Pritchard Lab at Stanford University and can be downloaded at this link.

QTL

Plot Genetic Linkage Maps using MapChart software

5 minute read

Updated:

MapChart is free software for plotting publication-quality genetic linkage maps and QTLs. This software was developed at Wageningen University by Roeland E. Voorrips and can be downloaded at this link.

maps

Plot Genetic Linkage Maps using MapChart software

5 minute read

Updated:

MapChart is free software for plotting publication-quality genetic linkage maps and QTLs. This software was developed at Wageningen University by Roeland E. Voorrips and can be downloaded at this link.

molecular

Plot Genetic Linkage Maps using MapChart software

5 minute read

Updated:

MapChart is free software for plotting publication-quality genetic linkage maps and QTLs. This software was developed at Wageningen University by Roeland E. Voorrips and can be downloaded at this link.

Analysis

Plot Genetic Linkage Maps using MapChart software

5 minute read

Updated:

MapChart is free software for plotting publication-quality genetic linkage maps and QTLs. This software was developed at Wageningen University by Roeland E. Voorrips and can be downloaded at this link.

Structure

A tutorial on investigating genetic admixture using STRUCTURE software

3 minute read

Updated:

STRUCTURE is a freely available software package for rigorous investigation of admixed individuals, identification of points of hybridization and migrants, and estimation of the overall structure of a population using commonly used genetic markers such as SNPs and SSRs. This software was developed by the Pritchard Lab at Stanford University and can be downloaded at this link.

TASSEL

A tutorial on Genome-Wide Association Studies (GWAS) in Tassel (GUI)

3 minute read

Updated:

Genome-wide association studies (GWAS) are increasingly popular in the medical, biological, and social sciences for identifying associations between single nucleotide polymorphisms and phenotypic traits. This tutorial aims to provide guidelines for conducting genome-wide analysis in Tassel.

r

Honeycomb Design Analysis in R

2 minute read

Updated:

The Honeycomb (HC) design, developed by Fasoulas (1988) and later extended by Kyriakou and Fasoulas, is a field layout method used in plant breeding to improve the efficiency of mass selection under field variability. In this design, plants are arranged in a triangular (hexagonal) grid, so that each plant is surrounded by exactly six nearest neighbours at equal distances. This uniform spatial arrangement ensures that every plant experiences a similar level of competition, reducing environmental bias caused by uneven spacing or directional field effects.
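The geometric claim, that each plant on a triangular grid has exactly six nearest neighbours at equal distances, can be checked with a small sketch. This is a Python illustration under assumed conventions (unit plant spacing, alternate rows shifted by half a spacing); the function names are hypothetical and do not come from the post's R code.

```python
import math

def triangular_grid(rows, cols, spacing=1.0):
    """Plant coordinates on a triangular lattice; odd rows are shifted by half a spacing."""
    row_height = spacing * math.sqrt(3) / 2
    return [((c + 0.5 * (r % 2)) * spacing, r * row_height)
            for r in range(rows) for c in range(cols)]

def nearest_neighbours(points, index, spacing=1.0, tol=1e-9):
    """Return all points lying exactly one spacing away from points[index]."""
    x0, y0 = points[index]
    return [p for i, p in enumerate(points)
            if i != index
            and abs(math.hypot(p[0] - x0, p[1] - y0) - spacing) < tol]

rows, cols = 5, 5
points = triangular_grid(rows, cols)
centre = 2 * cols + 2            # plant in row 2, column 2 (an interior position)
neighbours = nearest_neighbours(points, centre)

print(len(neighbours))
```

Plants on the border of the layout have fewer than six neighbours, so the equal-competition property applies to interior plants only.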

geometry

Honeycomb Design Analysis in R

2 minute read

Updated:

The Honeycomb (HC) design, developed by Fasoulas (1988) and later extended by Kyriakou and Fasoulas, is a field layout method used in plant breeding to improve the efficiency of mass selection under field variability. In this design, plants are arranged in a triangular (hexagonal) grid, so that each plant is surrounded by exactly six nearest neighbours at equal distances. This uniform spatial arrangement ensures that every plant experiences a similar level of competition, reducing environmental bias caused by uneven spacing or directional field effects.

structural-analysis

Honeycomb Design Analysis in R

2 minute read

Updated:

The Honeycomb (HC) design, developed by Fasoulas (1988) and later extended by Kyriakou and Fasoulas, is a field layout method used in plant breeding to improve the efficiency of mass selection under field variability. In this design, plants are arranged in a triangular (hexagonal) grid, so that each plant is surrounded by exactly six nearest neighbours at equal distances. This uniform spatial arrangement ensures that every plant experiences a similar level of competition, reducing environmental bias caused by uneven spacing or directional field effects.

data-visualization

Honeycomb Design Analysis in R

2 minute read

Updated:

The Honeycomb (HC) design, developed by Fasoulas (1988) and later extended by Kyriakou and Fasoulas, is a field layout method used in plant breeding to improve the efficiency of mass selection under field variability. In this design, plants are arranged in a triangular (hexagonal) grid, so that each plant is surrounded by exactly six nearest neighbours at equal distances. This uniform spatial arrangement ensures that every plant experiences a similar level of competition, reducing environmental bias caused by uneven spacing or directional field effects.
