Parameterizing Soybean-BioCro • BioCroValidation

Overview

This article shows how to create an objective function that can be used to parameterize BioCro’s soybean model (Matthews et al. 2021; Lochocki et al. 2022).

Since the original publication of Soybean-BioCro, the BioCro module library has undergone several changes, and the model has been re-parameterized several times. These parameterizations did not use BioCroValidation, since they were performed before BioCroValidation was created.

However, BioCroValidation is able to re-create the objective functions that were used for these parameterizations. Here, we re-create the objective function that was used for the parameterization included in version 3.2.0 of the BioCro R package.

In the commands below, we will use functions from several libraries, so we will load them now:

# Load required libraries
library(BioCroValidation)
library(BioCro)
library(dfoptim)
library(lattice)

Building the Objective Function

In this section, we will use the objective_function function from BioCroValidation package to create an objective function that can be used to parameterize Soybean-BioCro. For more details about this, please see the help page for objective_function by typing ?objective_function from an R terminal.

The Base Model Definition

We first need a base model definition that includes the necessary modules, initial values, parameters, and differential equation solver specifications. For this example, we will simply use Soybean-BioCro as the base model, with just one small change: we will use an Euler solver rather than the default solver, which will help make the optimization run faster. For reasonable sets of parameter values, the Euler solver does not seem to cause any substantial errors when running Soybean-BioCro.

# Specify the base model definition
base_model_definition            <- soybean
base_model_definition$ode_solver <- default_ode_solvers[['homemade_euler']]

The Observed Data

The observed data needed to parameterize Soybean-BioCro is included in the BioCroValidation package as the soyface_biomass data set, which consists of two years (2002 and 2005) of biomass data and associated standard deviations, included in four separate tables. However, each table requires some pre-processing to get it ready.

One issue is that the data set specifies the doy of year (DOY) for each harvest, but we need to specify the time using BioCro’s convention (the number of hours since the start of the year).

Another issue is that the data set includes pod and seed values, but Soybean-BioCro calculates shell and seed masses, where the shell and seed together comprise the pod.

Although the observations do not include root biomass, it is nevertheless important to constrain the predicted root mass to reasonable values. To do this, it is assumed that the maximum root mass is seventeen percent of the maximum aboveground biomass, and that it is achieved at the same time as maximum above-ground biomass, based on observations reported in Ordóñez et al. (2020). In the observed data, the sum of stem and leaf mass is largest at the fifth time point in both years. So, root mass is estimated at this single time point and added to the observed values.

In previous parameterizations, a standard deviation for the root mass was not explicitly estimated; instead, the standard-deviation-based weight factor was implicitly set to 1. Because the 'logarithm' method with \(\epsilon = 10^{-5}\) was used, a weight factor of 1 implies a standard deviation of \(1 / e - 10^{-5} \approx 0.3678694\). See the documentation page (?objective_function) for more information about this weighting method.

Finally, the data set includes some values that are not needed for the parameterization. This includes the leaf litter accumulated between each harvest, as well as the DOY and Rep_Mg_per_ha columns that have been superseded by other columns defined above.

Here we will define a helping function that can accomplish the required modifications described above; note that some operations are different depending on whether the table represents biomass values or standard deviations:

# Define a helping function for processing data tables
process_table <- function(data_table, type) {
  # Define new `time` column
  data_table$time <- (data_table$DOY - 1) * 24.0

  # Define new `Shell_Mg_per_ha` column
  data_table$Shell_Mg_per_ha <- if (type == 'biomass') {
    # The shell is all parts of the pod other than the seed
    data_table$Rep_Mg_per_ha - data_table$Seed_Mg_per_ha
  } else {
    # Add uncertainties in quadrature, a simple approach to error propagation
    sqrt(data_table$Rep_Mg_per_ha^2 + data_table$Seed_Mg_per_ha^2)
  }

  # Define new `Root_Mg_per_ha` column, which has just one non-NA value
  row_to_use <- 5                 # Choose row to use
  data_table$Root_Mg_per_ha <- NA # Initialize all values to NA

  if (type == 'biomass') {
    # Estimate a mass at one time point
    col_to_add <- c(
      'Leaf_Mg_per_ha',
      'Stem_Mg_per_ha',
      'Rep_Mg_per_ha'
    )

    data_table[row_to_use, 'Root_Mg_per_ha'] <-
      0.17 * sum(data_table[row_to_use, col_to_add])
  } else {
    # Estimate standard deviation at one time point
    data_table[row_to_use, 'Root_Mg_per_ha'] <- 1 / exp(1) - 1e-5
  }

  # Remove columns by setting them to NULL
  data_table$DOY              = NULL
  data_table$Rep_Mg_per_ha    = NULL
  data_table$Litter_Mg_per_ha = NULL

  # Return the processed table
  data_table
}

The Data-Driver Pairs

The BioCro R package includes weather data for the years in the soyface_biomass data set. So now we are ready to define the data-driver pairs, which includes the weather, the observed biomass, the standard deviation of the observed biomass, and the weight to assign to each year:

# Define the data-driver pairs
data_driver_pairs <- list(
  ambient_2002 = list(
    data       = process_table(soyface_biomass[['ambient_2002']],     'biomass'),
    data_stdev = process_table(soyface_biomass[['ambient_2002_std']], 'stdev'),
    drivers    = BioCro::soybean_weather[['2002']],
    weight     = 1
  ),
  ambient_2005 = list(
    data       = process_table(soyface_biomass[['ambient_2005']],     'biomass'),
    data_stdev = process_table(soyface_biomass[['ambient_2005_std']], 'stdev'),
    drivers    = BioCro::soybean_weather[['2005']],
    weight     = 1
  )
)

Here we have chosen equal weights for the two years.

The Post-Processing Function

The observed data includes values of the total litter, which is comprised of both leaf and stem litter. However, the model does not calculate this quntity; instead, it returns separate values of leaf and stem litter. To address this issue, we can provide a “post-processing function.” This is an (optional) function that is applied to each simulation result and can be used to add new columns. Here we define such a function, which adds a new column for the total litter:

# Define the post-processing function
post_process_function <- function(sim_res) {
  # Calculate the total litter as the sum of leaf and stem litter
  within(sim_res, {TotalLitter = LeafLitter + StemLitter})
}

The Data Definitions

The data sets above have columns whose names do not match the corresponding model outputs. For example, the Leaf_Mg_per_ha column of the observed data must be compared to the Leaf column of the model output, since both represent the leaf mass per unit ground area. To handle this mismatch, we can provide a set of “data definitions” that specify which columns should be compared:

# Define the data definition list, where the element names are columns in the
# observed data tables, and the element values are the corresponding column
# names in the model outputs
data_definitions <- list(
# Observed               Simulated
  CumLitter_Mg_per_ha = 'TotalLitter',
  Leaf_Mg_per_ha      = 'Leaf',
  Root_Mg_per_ha      = 'Root',
  Seed_Mg_per_ha      = 'Grain',
  Shell_Mg_per_ha     = 'Shell',
  Stem_Mg_per_ha      = 'Stem'
)

The Arguments to Vary

Here we wish to vary several parameters related to carbon partitioning for growth, senescence, maintenance respiration, and growth respiration:

For each growing tissue, there are two parameters (\(\alpha\) and \(\beta\)) that influence the parbon partitioning coefficients. Here we will vary these for the leaf, stem, and shell (6 parameters in total).
For each senescing tissue, there are three parameters (\(\alpha_{sen}\), \(\beta_{sen}\), and rate) that influence when senescence begins and the overall rate of scenescence. Here we will vary these for the leaf and stem (6 parameters in total).
For each growing tissue, there is one parameter (grc) that influences the rate of carbon use for growth respiration. Here we will vary these for the stem and root (2 parameters in total).
For each tissue, there is one parameter (mrc) that influences the rate of carbon use for maintenance respiration. Here we will vary these for the leaf, stem, and root (3 parameters in total).

Together, this is 17 arguments to vary. Typically, an optimization problem requires more time for each free parameter involved, so it is helpful to vary the smallest possible set. One way to reduce the number of free parameters is to treat some as being “dependent.” In other words, to calculate the values of some parameters from the values of others, so that only some of them are truly free or “independent.” Here we will do this by fixing the value of mrc_stem to the value of mrc_leaf. Thus, we can think of this is a single maintenance respiration coefficient for the entire shoot; this reduces the number of independent parameters by one (to 16).

The independent arguments must be specified as a list of named numeric elements, where the name is the argument name and the value is an initial guess for that argument. Here we will use the default Soybean-BioCro values as our initial guesses:

# Define a list of independent arguments and their initial values
independent_arg_names <- c(
  # Partitioning for leaf, stem, and shell
  'alphaLeaf',
  'betaLeaf',
  'alphaStem',
  'betaStem',
  'alphaShell',
  'betaShell',

  # Senescence for leaf and stem
  'alphaSeneLeaf',
  'betaSeneLeaf',
  'rateSeneLeaf',
  'alphaSeneStem',
  'betaSeneStem',
  'rateSeneStem',

  # Growth respiration for stem and root
  'grc_stem',
  'grc_root',

  # Maintenance respiration for leaf and root
  'mrc_leaf',
  'mrc_root'
)

independent_args <- soybean$parameters[independent_arg_names]

The dependent arguments must be specified as a function that takes a list of independent arguments as its input, and returns a list of dependent arguments as its output:

# Define a function that sets `mrc_stem` to the value of `mrc_leaf`
dependent_arg_function <- function(ind_args) {
  list(mrc_stem = ind_args[['mrc_leaf']])
}

The Quantity Weights

When determining the error metric value, we wish to assign different weights to each type of observed value. This can be handled via the quantity_weights, which must be a list of named numeric elements, where the name of each element is an output from the simulation, and its value is the weight.

# Specify the quantity weights; there is no systematic way to determine these,
# but the following weights have worked well in the past for Soybean-BioCro
quantity_weights <- list(
  Grain       = 1.0,
  Leaf        = 1.0,
  Root        = 0.1,
  Shell       = 0.5,
  Stem        = 1.0,
  TotalLitter = 0.1
)

The Extra Penalty Function

Sometimes an optimizer may choose parameter values that produce close agreement with the observed data but are nevertheless unreasonable from a biological perspective.

To prevent these unreasonable parameters from being chosen, “extra penalties” can be added to the error metric. These penalties can be specified using an extra_penalty_function, which must take the result from a BioCro simulation as its input and return a numeric error penalty value, which generally should be zero (when no issues are found) or a large positive number (if an issue has been found).

For Soybean-BioCro parameterization, three common issues are that:

Carbon is never partitioned to one or more key tissues.
Carbon partitioning to the stem and leaf starts at different times.
Carbon partitioning to the leaves begins too early or too late.

The function below will return a large value when any of these situations occurs, and will otherwise return a value of zero.

# Define an extra penalty function
extra_penalty_function <- function(sim_res) {
  # Set the penalty value
  PENALTY <- 9999

  # Get the first times when each partitioning coefficient becomes non-zero
  k_thresh <- 0.01 # Threshold k value to decide when growth has started
  hpd      <- 24.0 # Hours per day

  time <- sim_res[['time']]

  time_grain <- time[sim_res[['kGrain']] > k_thresh][1]
  time_leaf  <- time[sim_res[['kLeaf']]  > k_thresh][1]
  time_shell <- time[sim_res[['kShell']] > k_thresh][1]
  time_stem  <- time[sim_res[['kStem']]  > k_thresh][1]

  # Return a penalty if necessary
  if (is.na(time_grain) | is.na(time_leaf) | is.na(time_shell) | is.na(time_stem)) {
    # One or more tissues is not growing
    return(PENALTY)
  } else if (abs(time_leaf - time_stem) > 5 * hpd) {
    # The starts of leaf and stem growth are more than 5 days apart
    return(PENALTY)
  } else if (time_leaf - time[1] > 20 * hpd | time_leaf - time[1] < 10 * hpd) {
    # The start of leaf growth is too late (more than 20 days after sowing) or
    # too early (fewer than 10 days after sowing)
    return(PENALTY)
  } else {
    # No problems were detected
    return(0.0)
  }
}

The Objective Function

Now we are just about ready to build the objective function. There are a few more details to discuss:

Soybean-BioCro has always used the 'mean_max' method for determining normalization factors; see Equations 14-16 of Matthews et al. (2021) for more details.
Soybean-BioCro has always used the 'logarithm' method for determining weights from standard deviations with \(\epsilon = 10^{-5}\); see Equation 17 of Matthews et al. (2021) for more details.
Soybean-BioCro has not used any regularization.

With this, it is possible to build the function. Note that some useful information is printed out when the function is created, such as the full list of observed values and their corresponding weights.

# Create the objective function
obj_fun <- objective_function(
  base_model_definition,
  data_driver_pairs,
  independent_args,
  quantity_weights,
  data_definitions       = data_definitions,
  normalization_method   = 'mean_max',
  stdev_weight_method    = 'logarithm',
  stdev_weight_param     = 1e-5,
  regularization_method  = 'none',
  dependent_arg_function = dependent_arg_function,
  post_process_function  = post_process_function,
  extra_penalty_function = extra_penalty_function
)
#> 
#> The independent arguments and their initial values:
#> 
#> List of 16
#>  $ alphaLeaf    : num 23.9
#>  $ betaLeaf     : num -18.1
#>  $ alphaStem    : num 24.8
#>  $ betaStem     : num -18.5
#>  $ alphaShell   : num 13.6
#>  $ betaShell    : num -9.98
#>  $ alphaSeneLeaf: num 48.3
#>  $ betaSeneLeaf : num -29.5
#>  $ rateSeneLeaf : num 0.0122
#>  $ alphaSeneStem: num 4.19
#>  $ betaSeneStem : num -3.31
#>  $ rateSeneStem : num 0.000117
#>  $ grc_stem     : num 0.0283
#>  $ grc_root     : num 0.0027
#>  $ mrc_leaf     : num 0.000488
#>  $ mrc_root     : num 1e-05
#> 
#> The dependent arguments and their initial values:
#> 
#> List of 1
#>  $ mrc_stem: num 0.000488
#> 
#> The full data definitions:
#> 
#> List of 6
#>  $ Leaf_Mg_per_ha     : chr "Leaf"
#>  $ Stem_Mg_per_ha     : chr "Stem"
#>  $ Seed_Mg_per_ha     : chr "Grain"
#>  $ CumLitter_Mg_per_ha: chr "TotalLitter"
#>  $ Shell_Mg_per_ha    : chr "Shell"
#>  $ Root_Mg_per_ha     : chr "Root"
#> 
#> The user-supplied data in long form:
#> 
#> $ambient_2002
#>    time quantity_name quantity_value quantity_stdev time_index expected_npts
#> 1  4272          Leaf   0.1802843394   0.0408155501        649          3288
#> 2  4512          Leaf   0.5544619422   0.1638632739        889          3288
#> 3  4848          Leaf   1.3265529308   0.1337744335       1225          3288
#> 4  5184          Leaf   1.6979440069   0.2283266576       1561          3288
#> 5  5520          Leaf   1.8077427820   0.2024754215       1897          3288
#> 6  5880          Leaf   1.5788136482   0.0754751654       2257          3288
#> 7  6192          Leaf   0.9475377733   0.3445500325       2569          3288
#> 8  6888          Leaf   0.0000000000   0.0000000000       3265          3288
#> 9  4272          Stem   0.0852449694   0.0170797372        649          3288
#> 10 4512          Stem   0.4188538932   0.1384490248        889          3288
#> 11 4848          Stem   1.7110673664   0.1837107594       1225          3288
#> 12 5184          Stem   2.8928258965   0.4487440652       1561          3288
#> 13 5520          Stem   3.6859142604   0.4534474707       1897          3288
#> 14 5880          Stem   3.7452607171   0.2753213561       2257          3288
#> 15 6192          Stem   3.6184015745   0.1510453777       2569          3288
#> 16 6888          Stem   2.3057012247   0.1483892609       3265          3288
#> 17 4272         Grain   0.0000000000   0.0000000000        649          3288
#> 18 4512         Grain   0.0000000000   0.0000000000        889          3288
#> 19 4848         Grain   0.0000000000   0.0000000000       1225          3288
#> 20 5184         Grain   0.0000000000   0.0000000000       1561          3288
#> 21 5520         Grain   0.0000000000   0.0000000000       1897          3288
#> 22 5880         Grain   2.4803149604   0.3660583625       2257          3288
#> 23 6192         Grain   4.8468941378   0.5144215602       2569          3288
#> 24 6888         Grain   5.4133858263   0.1152608235       3265          3288
#> 25 4272   TotalLitter   0.0000000000   0.0000000000        649          3288
#> 26 4512   TotalLitter   0.0000000000   0.0000000000        889          3288
#> 27 4848   TotalLitter   0.0000000000   0.0000000000       1225          3288
#> 28 5184   TotalLitter   0.0000000000   0.0000000000       1561          3288
#> 29 5520   TotalLitter   0.3818897637   0.0456182275       1897          3288
#> 30 5880   TotalLitter   0.6633127734   0.0676627660       2257          3288
#> 31 6192   TotalLitter   0.9786377952   0.0320523860       2569          3288
#> 32 6888   TotalLitter   2.5831766621   0.2678295161       3265          3288
#> 33 4272         Shell   0.0000000000   0.0000000000        649          3288
#> 34 4512         Shell   0.0000000000   0.0000000000        889          3288
#> 35 4848         Shell   0.0003171479   0.0005493162       1225          3288
#> 36 5184         Shell   0.0793963255   0.0309899985       1561          3288
#> 37 5520         Shell   1.5545713035   0.2184025435       1897          3288
#> 38 5880         Shell   1.4956986001   0.6804094230       2257          3288
#> 39 6192         Shell   1.6977565178   0.9040650516       2569          3288
#> 40 6888         Shell   1.5955818021   0.1827578015       3265          3288
#> 45 5520          Root   1.1981988188   0.3678694412       1897          3288
#>          norm      w_var
#> 1   52.286943  3.1984472
#> 2   52.286943  1.8086619
#> 3   52.286943  2.0115255
#> 4   52.286943  1.4769342
#> 5   52.286943  1.5970874
#> 6   52.286943  2.5838191
#> 7   52.286943  1.0654869
#> 8   52.286943 11.5129255
#> 9  224.431645  4.0692772
#> 10 224.431645  1.9771808
#> 11 224.431645  1.6943383
#> 12 224.431645  0.8012803
#> 13 224.431645  0.7908538
#> 14 224.431645  1.2897800
#> 15 224.431645  1.8901088
#> 16 224.431645  1.9078489
#> 17 468.875938 11.5129255
#> 18 468.875938 11.5129255
#> 19 468.875938 11.5129255
#> 20 468.875938 11.5129255
#> 21 468.875938 11.5129255
#> 22 468.875938  1.0049352
#> 23 468.875938  0.6646928
#> 24 468.875938  2.1604709
#> 25 106.764827 11.5129255
#> 26 106.764827 11.5129255
#> 27 106.764827 11.5129255
#> 28 106.764827 11.5129255
#> 29 106.764827  3.0872287
#> 30 106.764827  2.6930715
#> 31 106.764827  3.4400717
#> 32 106.764827  1.3173673
#> 33  46.118035 11.5129255
#> 34  46.118035 11.5129255
#> 35  46.118035  7.4887956
#> 36  46.118035  3.4737681
#> 37  46.118035  1.5213696
#> 38  46.118035  0.3850459
#> 39  46.118035  0.1008429
#> 40  46.118035  1.6995388
#> 45   2.871361  1.0000000
#> 
#> $ambient_2005
#>    time quantity_name quantity_value quantity_stdev time_index expected_npts
#> 1  4104          Leaf     0.22227188     0.03289659        577          2952
#> 2  4440          Leaf     0.84603750     0.14679830        913          2952
#> 3  4776          Leaf     1.18446563     0.33807429       1249          2952
#> 4  5112          Leaf     2.21805938     0.15217591       1585          2952
#> 5  5448          Leaf     2.14744687     0.11907759       1921          2952
#> 6  5784          Leaf     1.51948125     0.51280870       2257          2952
#> 7  6120          Leaf     0.06575625     0.06168624       2593          2952
#> 8  6456          Leaf     0.00000000     0.00000000       2929          2952
#> 9  4104          Stem     0.18880312     0.01431814        577          2952
#> 10 4440          Stem     0.85220625     0.19883006        913          2952
#> 11 4776          Stem     1.61896875     0.60528625       1249          2952
#> 12 5112          Stem     4.04361563     0.55987405       1585          2952
#> 13 5448          Stem     4.47772500     0.30674464       1921          2952
#> 14 5784          Stem     3.89208750     0.37910849       2257          2952
#> 15 6120          Stem     2.89905000     0.22082398       2593          2952
#> 16 6456          Stem     2.17560000     0.24325473       2929          2952
#> 17 4104         Grain     0.00000000     0.00000000        577          2952
#> 18 4440         Grain     0.00000000     0.00000000        913          2952
#> 19 4776         Grain     0.00000000     0.00000000       1249          2952
#> 20 5112         Grain     0.00000000     0.00000000       1585          2952
#> 21 5448         Grain     0.00000000     0.00000000       1921          2952
#> 22 5784         Grain     3.02249063     0.34171478       2257          2952
#> 23 6120         Grain     3.99820312     0.39895675       2593          2952
#> 24 6456         Grain     4.96564688     0.50722409       2929          2952
#> 25 4104   TotalLitter     0.00000000     0.00000000        577          2952
#> 26 4440   TotalLitter     0.00000000     0.00000000        913          2952
#> 27 4776   TotalLitter     0.00000000     0.00000000       1249          2952
#> 28 5112   TotalLitter     0.06654375     0.06370846       1585          2952
#> 29 5448   TotalLitter     0.18230625     0.05624687       1921          2952
#> 30 5784   TotalLitter     0.33593438     0.07334289       2257          2952
#> 31 6120   TotalLitter     0.86697187     0.21417663       2593          2952
#> 32 6456   TotalLitter     1.14843750     0.24626746       2929          2952
#> 33 4104         Shell     0.00000000     0.00000000        577          2952
#> 34 4440         Shell     0.00000000     0.00000000        913          2952
#> 35 4776         Shell     0.00000000     0.00000000       1249          2952
#> 36 5112         Shell     0.29925000     0.16427520       1585          2952
#> 37 5448         Shell     2.30455312     0.43414807       1921          2952
#> 38 5784         Shell     2.51028750     0.68049551       2257          2952
#> 39 6120         Shell     1.37287500     0.65544843       2593          2952
#> 40 6456         Shell     1.40660625     0.81122149       2929          2952
#> 45 5448          Root     1.51805325     0.36786944       1921          2952
#>          norm      w_var
#> 1   78.716598  3.4140824
#> 2   78.716598  1.9186276
#> 3   78.716598  1.0844600
#> 4   78.716598  1.8826524
#> 5   78.716598  2.1278960
#> 6   78.716598  0.6678329
#> 7   78.716598  2.7855322
#> 8   78.716598 11.5129255
#> 9  320.800339  4.2455301
#> 10 320.800339  1.6152545
#> 11 320.800339  0.5020373
#> 12 320.800339  0.5800256
#> 13 320.800339  1.1817071
#> 14 320.800339  0.9699065
#> 15 320.800339  1.5103441
#> 16 320.800339  1.4136050
#> 17 394.522382 11.5129255
#> 18 394.522382 11.5129255
#> 19 394.522382 11.5129255
#> 20 394.522382 11.5129255
#> 21 394.522382 11.5129255
#> 22 394.522382  1.0737496
#> 23 394.522382  0.9188772
#> 24 394.522382  0.6787827
#> 25  21.102539 11.5129255
#> 26  21.102539 11.5129255
#> 27  21.102539 11.5129255
#> 28  21.102539  2.7532810
#> 29  21.102539  2.8778272
#> 30  21.102539  2.6124734
#> 31  21.102539  1.5409076
#> 32  21.102539  1.4012965
#> 33 100.824693 11.5129255
#> 34 100.824693 11.5129255
#> 35 100.824693 11.5129255
#> 36 100.824693  1.8061514
#> 37 100.824693  0.8343466
#> 38 100.824693  0.3849194
#> 39 100.824693  0.4224204
#> 40 100.824693  0.2092018
#> 45   4.608971  1.0000000
#> 
#> The user-supplied quantity weights:
#> 
#> List of 6
#>  $ Grain      : num [1:2] 1 1
#>  $ Leaf       : num [1:2] 1 1
#>  $ Root       : num [1:2] 0.1 0.1
#>  $ Shell      : num [1:2] 0.5 0.5
#>  $ Stem       : num [1:2] 1 1
#>  $ TotalLitter: num [1:2] 0.1 0.1
#> 
#> The user-supplied data-driver pair weights:
#> 
#> List of 2
#>  $ ambient_2002: num 1
#>  $ ambient_2005: num 1
#> 
#> The initial error metric terms:
#> 
#> List of 2
#>  $ terms_from_data_driver_pairs:List of 2
#>   ..$ ambient_2002:List of 2
#>   .. ..$ least_squares_terms:List of 6
#>   .. .. ..$ Grain      : num 0.00371
#>   .. .. ..$ Leaf       : num 0.0214
#>   .. .. ..$ Root       : num 0.00353
#>   .. .. ..$ Shell      : num 0.0115
#>   .. .. ..$ Stem       : num 0.0119
#>   .. .. ..$ TotalLitter: num 0.00264
#>   .. ..$ extra_penalty      : num 0
#>   ..$ ambient_2005:List of 2
#>   .. ..$ least_squares_terms:List of 6
#>   .. .. ..$ Grain      : num 0.00744
#>   .. .. ..$ Leaf       : num 0.00742
#>   .. .. ..$ Root       : num 0.0053
#>   .. .. ..$ Shell      : num 0.0118
#>   .. .. ..$ Stem       : num 0.00345
#>   .. .. ..$ TotalLitter: num 0.000304
#>   .. ..$ extra_penalty      : num 0
#>  $ regularization_penalty      : num 0
#> 
#> The initial error metric value:
#> 
#> [1] 0.09029496

Optimizing the Parameter Values

The objective function is designed to be passed to a minimization algorithm, which will determine the argument values that produce the best agreement between the model predictions and the observations.

Soybean-BioCro has already been parameterized, so there is already good agreement between the model and the data. This can be seen by examining the value of the error metric when using the default Soybean-BioCro values:

# Evaluate the error function with default Soybean-BioCro parameters
default_error <- obj_fun(as.numeric(independent_args))

This evaluates to 0.090295. This is a low value for a Soybean-BioCro parameterization, indicating that good agreement has already been found.

Here, as an example, we will intentionally change each parameter value by a small random amount, and then use an optimizer to improve the parameter values; in an ideal world, the optimizer will eventually pick parameter values close to the original Soybean-BioCro values.

There are many optimizers available in R. Base R includes the optim function, and others are available from the dfoptim and DEoptim packages. Here we will use the nmkb optimizer from the dfoptim library, which requires upper and lower bounds for each parameter and an initial guess.

Choosing an Initial Guess

As mentioned above, we will intentionally choose a “bad” initial guess by tweaking each parameter value by a small random amount. Note that we set a seed to ensure the same result is obtained every time this is performed. Also note that the initial guess must be a numeric vector, where the elements are ordered as they are in independent_args.

# Set a seed
set.seed(1234)

# Make an initial guess by perturbing the default values by a small amount
rel_size <- 0.02

initial_guess <- as.numeric(independent_args) *
  (1.0 + runif(length(independent_args), -rel_size, rel_size))

Even though the changes to parameter values are small, there is still a substantial change in the value of the error metric:

# Evaluate the error function with default Soybean-BioCro parameters
initial_error <- obj_fun(initial_guess)

This evaluates to 0.1847518, which is about 51 percent larger than with the default parameter values.

Choosing Lower and Upper Bounds

There is not always a systematic approach to choosing lower and upper bounds for parameter values, but the following bounds have worked well for Soybean-BioCro in the past:

The \(\alpha\) parameters used in partitioning and senescence calculations are confined to the interval [0, 50].
The \(\beta\) parameters used in partitioning and senescence calculations are confined to the interval [-50, 0].
The senescence rates have a lower bound of zero, but have different upper bounds for each tissue.
The maintenance respiration coefficients are confined to the interval [1e-6, 1e-2].
The growth respiration coefficients must be positive and non-zero, but have different bounds for each tissue.

There are many possible ways to specify the bounds in R, but ultimately they must be expressed as numeric vectors, where the elements are ordered as they are in independent_args. Here we use the bounds_table function from BioCroValidation to create a data frame where the lower and upper bounds are stored as columns. Later, the columns can be passed to the optimizer. The bounds_table function will also check the initial guess to ensure it lies within the bounds; for more information about this function, see its help page by typing ?bounds_table from an R terminal.

# Specify some bounds
aul <- 50   # Upper limit for alpha parameters
bll <- -50  # Lower limit for beta parameters
mll <- 1e-6 # Lower limit for mrc parameters
mul <- 1e-2 # Upper limit for mrc parameters

# Define a table with the bounds in the same order as `independent_args`
bounds <- bounds_table(
  independent_args,
  list(
    alphaLeaf     = c(0,      aul),
    alphaStem     = c(0,      aul),
    alphaShell    = c(0,      aul),
    alphaSeneLeaf = c(0,      aul),
    alphaSeneStem = c(0,      aul),
    betaLeaf      = c(bll,    0),
    betaStem      = c(bll,    0),
    betaShell     = c(bll,    0),
    betaSeneLeaf  = c(bll,    0),
    betaSeneStem  = c(bll,    0),
    rateSeneLeaf  = c(0,      0.0125),
    rateSeneStem  = c(0,      0.005),
    mrc_leaf      = c(mll,    mul),
    mrc_root      = c(mll,    mul),
    grc_stem      = c(8e-4,   0.08),
    grc_root      = c(0.0025, 0.075)
  ),
  initial_guess
)

Running the Optimizer

Now we will use an optimizer to improve on the initial guess. As mentioned above, we will use the nmkb optimizer from the dfoptim package. This is a good choice when a decent starting guess is known. If a broader search is necessary, DEoptim from the DEoptim package may be more appropriate, although it generally needs more time to run.

To make sure this example does not take too much time, we will use a loose tolerance; a more realistic example would probably use 1e-4 or 1e-5.

# Run the optimizer
optim_res <- nmkb(
  initial_guess,
  obj_fun,
  bounds[['lower']],
  bounds[['upper']],
  control = list(
    tol = 1e-2,
    restarts.max = 10
  ),
  debug_mode = FALSE # Passed to obj_fun; set to TRUE to enable debug mode
)

When this document was generated, running the optimizer required the following amount of time:

#>    user  system elapsed 
#>  82.471   0.298  82.778

The total time was about 1.38 minutes. For a real parameterization problem, it can be many times longer, and may even need days to run on a personal laptop computer.

The optimizer also reports how many times the objective function was called, among other details:

str(optim_res)
#> List of 6
#>  $ par        : num [1:16] 23.9 -18.2 24.8 -18.4 13.5 ...
#>  $ value      : num 0.0921
#>  $ feval      : num 207
#>  $ restarts   : num 5
#>  $ convergence: num 0
#>  $ message    : chr "Successful convergence"

The value of feval is 207, so on average, each call of the objective function required approximately 0.4 seconds.

Comparing Parameter Values

Let’s see whether the optimized parameters are closer to the default parameters than the initial guess was.

# Create a table of the various independent argument values
ind_arg_table <- data.frame(
  arg_name      = independent_arg_names,
  defaults      = as.numeric(independent_args),
  initial_guess = initial_guess,
  optimized     = optim_res[['par']],
  stringsAsFactors = FALSE
)

# Add differences
ind_arg_table <- within(ind_arg_table, {
  initial_diff   = abs(initial_guess - defaults)
  optimized_diff = abs(optimized - defaults)
  improved       = optimized_diff < initial_diff
})

# View results
print(ind_arg_table)
#>         arg_name     defaults initial_guess     optimized improved
#> 1      alphaLeaf  23.88950000  2.352036e+01  2.386565e+01     TRUE
#> 2       betaLeaf -18.08510000 -1.817357e+01 -1.817064e+01     TRUE
#> 3      alphaStem  24.82480000  2.493331e+01  2.479432e+01     TRUE
#> 4       betaStem -18.52140000 -1.861281e+01 -1.844565e+01     TRUE
#> 5     alphaShell  13.59550000  1.379177e+01  1.347693e+01     TRUE
#> 6      betaShell  -9.98090000 -1.003692e+01 -1.000498e+01     TRUE
#> 7  alphaSeneLeaf  48.32000000  4.737195e+01  4.986793e+01    FALSE
#> 8   betaSeneLeaf -29.53930000 -2.922329e+01 -3.023728e+01    FALSE
#> 9   rateSeneLeaf   0.01219800  1.227904e-02  1.244641e-02    FALSE
#> 10 alphaSeneStem   4.19370000  4.196091e+00  3.889660e+00    FALSE
#> 11  betaSeneStem  -3.31240000 -3.338050e+00 -3.387799e+00    FALSE
#> 12  rateSeneStem   0.00011700  1.172105e-04  1.274687e-04    FALSE
#> 13      grc_stem   0.02828000  2.803423e-02  2.844730e-02     TRUE
#> 14      grc_root   0.00270000  2.745731e-03  2.763255e-03    FALSE
#> 15      mrc_leaf   0.00048836  4.843030e-04  5.059738e-04    FALSE
#> 16      mrc_root   0.00001000  1.013492e-05  1.072817e-05    FALSE
#>    optimized_diff initial_diff
#> 1    2.385246e-02 3.691373e-01
#> 2    8.554244e-02 8.847188e-02
#> 3    3.048061e-02 1.085089e-01
#> 4    7.574948e-02 9.140640e-02
#> 5    1.185710e-01 1.962730e-01
#> 6    2.408361e-02 5.601704e-02
#> 7    1.547928e+00 9.480466e-01
#> 8    6.979834e-01 3.160108e-01
#> 9    2.484111e-04 8.103559e-05
#> 10   3.040402e-01 2.390600e-03
#> 11   7.539867e-02 2.565007e-02
#> 12   1.046873e-05 2.104822e-07
#> 13   1.673041e-04 2.457718e-04
#> 14   6.325510e-05 4.573082e-05
#> 15   1.761377e-05 4.056985e-06
#> 16   7.281688e-07 1.349183e-07

In this table, when the improved column is TRUE, this means that the optimized parameter value is closer to the default value than the initial guess was; in other words, it means that the optimizer truly improved on the initial guess. In this example, 7 out of 16 parameter estimates were improved (44 percent).

We can also compare the error metric to its original value. As shown above, it is now 0.0920539, which is only 1.9 percent larger than with the default parameter values.

The optimized parameter values could likely be improved by using a more stringent tolerance for the optimizer, but this would require more time to run.

Comparing Model Outputs

Another way to evaluate the results of the optimization is to compare simulations using the default, perturbed, and re-optimized versions of the model.

Following the re-parameterization, we now have new values of the independent arguments, but this is not enough to actually run the new version of the model. Thus, the next step is to form a new model definition by updating the values of the default soybean model. This can be accomplished using the update_model function from BioCroValidation:

# Get model definition lists for the perturbed and re-parameterized versions of
# the soybean model
soybean_perturbed <- update_model(
  BioCro::soybean,
  independent_args,
  initial_guess,
  dependent_arg_function = dependent_arg_function
)

soybean_reparam <- update_model(
  BioCro::soybean,
  independent_args,
  optim_res[['par']],
  dependent_arg_function = dependent_arg_function
)

We can check that the three models have different values of key parameters, such as the “dependent” argument mrc_stem:

print(BioCro::soybean$parameters$mrc_stem)
#> [1] 0.00048836

print(soybean_perturbed$parameters$mrc_stem)
#> [1] 0.000484303

print(soybean_reparam$parameters$mrc_stem)
#> [1] 0.0005059738

Now we can run each version of the model for a single year and visually compare their outputs:

# Define a helper function that runs a single model for the year 2005
run_2005 <- function(model_definition) {
  with(model_definition, {run_biocro(
    initial_values,
    parameters,
    soybean_weather[['2005']],
    direct_modules,
    differential_modules,
    ode_solver
  )})
}

# Run each model and combine the results
full_res <- rbind(
  within(run_2005(BioCro::soybean),   {model = 'Default Soybean-BioCro'}),
  within(run_2005(soybean_perturbed), {model = 'Perturbed Soybean-BioCro'}),
  within(run_2005(soybean_reparam),   {model = 'Re-parameterized Soybean-BioCro'})
)

# Plot the results
xyplot(
  Leaf + Stem + Root + Grain ~ fractional_doy,
  group = model,
  data = full_res,
  type = 'l',
  auto.key = list(space = 'top'),
  xlab = 'Day of year (2005)',
  ylab = 'Biomass (Mg / ha)'
)

Here we can see that while the simulated values for some tissues do not differ much between the models, there are large differences in other tissues; for these cases, the default and re-optimized versions are similar and the perturbed version is much different.

Saving the New Model Definition

A realistic parameterization takes a long time to complete, so it is important to save the results for later use. One approach is to save the model definition list using the save or saveRDS functions from base R. However, these options create binary files that are not human-readable, and they cannot be easily tracked using git. As an alternative, the BioCroValidation includes a function called write_model that forms a string representing an R command that can be called to re-create a model definition. This command string can be written to a text file, making it easy to read and to track with git.

Here we apply write_model to the re-optimized soybean model:

# Convert the re-parameterized soybean model to an R command string
r_cmd_string <- with(soybean_reparam, write_model(
  'soybean_reparam',
  direct_modules,
  differential_modules,
  initial_values,
  parameters,
  ode_solver
))

We can view the resulting R command string:

writeLines(r_cmd_string)
#> soybean_reparam <- list(
#>     direct_modules = list(
#>         "BioCro:format_time",
#>         "BioCro:stomata_water_stress_linear",
#>         "BioCro:sla_linear",
#>         "BioCro:parameter_calculator",
#>         "BioCro:soybean_development_rate_calculator",
#>         "BioCro:leaf_water_stress_exponential",
#>         "BioCro:partitioning_coefficient_logistic",
#>         "BioCro:soil_evaporation",
#>         "BioCro:solar_position_michalsky",
#>         "BioCro:shortwave_atmospheric_scattering",
#>         "BioCro:incident_shortwave_from_ground_par",
#>         "BioCro:height_from_lai",
#>         "BioCro:canopy_gbw_thornley",
#>         "BioCro:stefan_boltzmann_longwave",
#>         "BioCro:ten_layer_canopy_properties",
#>         "BioCro:ten_layer_c3_canopy",
#>         "BioCro:ten_layer_canopy_integrator",
#>         "BioCro:no_leaf_resp_neg_assim_partitioning_growth_calculator",
#>         "BioCro:senescence_coefficient_logistic",
#>         "BioCro:carbon_assimilation_to_biomass"
#>     ),
#>     differential_modules = list(
#>         "BioCro:senescence_logistic",
#>         "BioCro:maintenance_respiration",
#>         "BioCro:partitioning_growth",
#>         "BioCro:two_layer_soil_profile",
#>         "BioCro:development_index",
#>         "BioCro:thermal_time_linear"
#>     ),
#>     ode_solver = list(
#>         type = "boost_rkck54",
#>         output_step_size = 1.000000,
#>         adaptive_rel_error_tol = 1.000000e-04,
#>         adaptive_abs_error_tol = 1.000000e-04,
#>         adaptive_max_steps = 200
#>     ),
#>     initial_values = list(
#>         "Leaf" = 0.06312,
#>         "Stem" = 0.00789,
#>         "Root" = 0.00789,
#>         "Grain" = 1e-05,
#>         "Shell" = 1e-05,
#>         "LeafLitter" = 0,
#>         "RootLitter" = 0,
#>         "StemLitter" = 0,
#>         "soil_water_content" = 0.32,
#>         "cws1" = 0.32,
#>         "cws2" = 0.32,
#>         "DVI" = -1,
#>         "TTc" = 0,
#>         "Rhizome" = 1e-07,
#>         "RhizomeLitter" = 0
#>     ),
#>     parameters = list(
#>         "soil_air_entry" = -2.6,
#>         "soil_b_coefficient" = 5.2,
#>         "soil_bulk_density" = 1.35,
#>         "soil_clay_content" = 0.34,
#>         "soil_field_capacity" = 0.32,
#>         "soil_sand_content" = 0.32,
#>         "soil_saturated_conductivity" = 6.4e-05,
#>         "soil_saturation_capacity" = 0.52,
#>         "soil_silt_content" = 0.34,
#>         "soil_wilting_point" = 0.2,
#>         "iSp" = 3.5,
#>         "Sp_thermal_time_decay" = 0,
#>         "LeafN" = 2,
#>         "LeafN_0" = 2,
#>         "vmax_n_intercept" = 0,
#>         "vmax1" = 110,
#>         "alphab1" = 0,
#>         "alpha1" = 0,
#>         "maturity_group" = 3,
#>         "Tbase_emr" = 10,
#>         "TTemr_threshold" = 60,
#>         "Rmax_emrV0" = 0.199,
#>         "Tmin_emrV0" = 5,
#>         "Topt_emrV0" = 31.5,
#>         "Tmax_emrV0" = 45,
#>         "Tmin_R0R1" = 5,
#>         "Topt_R0R1" = 31.5,
#>         "Tmax_R0R1" = 45,
#>         "Tmin_R1R7" = 0,
#>         "Topt_R1R7" = 21.5,
#>         "Tmax_R1R7" = 38.7,
#>         "sowing_fractional_doy" = 0,
#>         "alphaLeaf" = 23.8656475358454,
#>         "alphaStem" = 24.7943193862222,
#>         "betaLeaf" = -18.1706424424747,
#>         "betaStem" = -18.4456505196764,
#>         "alphaRoot" = 36.967,
#>         "betaRoot" = -40.1915,
#>         "alphaShell" = 13.4769290411408,
#>         "betaShell" = -10.0049836091908,
#>         "kRhizome_emr" = 0,
#>         "rsec" = 0.2,
#>         "soil_clod_size" = 0.04,
#>         "soil_reflectance" = 0.2,
#>         "soil_transmission" = 0.01,
#>         "specific_heat_of_air" = 1010,
#>         "lat" = 40,
#>         "longitude" = -88,
#>         "atmospheric_pressure" = 101325,
#>         "atmospheric_transmittance" = 0.6,
#>         "atmospheric_scattering" = 0.3,
#>         "par_energy_fraction" = 0.5,
#>         "par_energy_content" = 0.219,
#>         "heightf" = 6,
#>         "min_gbw_canopy" = 0.005,
#>         "dry_biomass_per_carbon" = 30.026,
#>         "emissivity_sky" = 1,
#>         "chil" = 0.81,
#>         "k_diffuse" = 0.7,
#>         "kpLN" = 0,
#>         "leaf_reflectance_nir" = 0.42,
#>         "leaf_reflectance_par" = 0.1,
#>         "leaf_transmittance_nir" = 0.42,
#>         "leaf_transmittance_par" = 0.05,
#>         "lnfun" = 0,
#>         "Gstar_c" = 19.02,
#>         "Gstar_Ea" = 37830,
#>         "Jmax_c" = 17.57,
#>         "Jmax_Ea" = 43540,
#>         "Kc_c" = 38.05,
#>         "Kc_Ea" = 79430,
#>         "Ko_c" = 20.3,
#>         "Ko_Ea" = 36380,
#>         "phi_PSII_0" = 0.352,
#>         "phi_PSII_1" = 0.022,
#>         "phi_PSII_2" = -0.00034,
#>         "Rd_c" = 18.72,
#>         "Rd_Ea" = 46390,
#>         "theta_0" = 0.76,
#>         "theta_1" = 0.018,
#>         "theta_2" = -0.00037,
#>         "Tp_c" = 19.77399,
#>         "Tp_Ha" = 62990,
#>         "Tp_Hd" = 182140,
#>         "Tp_S" = 588,
#>         "Vcmax_c" = 26.35,
#>         "Vcmax_Ea" = 65330,
#>         "jmax" = 195,
#>         "jmax_mature" = 195,
#>         "sf_jmax" = 0.2,
#>         "electrons_per_carboxylation" = 4.5,
#>         "electrons_per_oxygenation" = 5.25,
#>         "tpu_rate_max" = 13,
#>         "Rd" = 1.28,
#>         "Catm" = 372.59,
#>         "O2" = 210,
#>         "b0" = 0.008,
#>         "b1" = 10.6,
#>         "Gs_min" = 0.001,
#>         "windspeed_height" = 5,
#>         "beta_PSII" = 0.5,
#>         "leafwidth" = 0.1,
#>         "growth_respiration_fraction" = 0,
#>         "grc_stem" = 0.0284473040567858,
#>         "grc_root" = 0.00276325510436536,
#>         "mrc_leaf" = 0.000505973771827042,
#>         "mrc_stem" = 0.000505973771827042,
#>         "mrc_root" = 1.07281688473608e-05,
#>         "mrc_grain" = 1e-05,
#>         "retrans" = 0.9,
#>         "retrans_rhizome" = 1,
#>         "rateSeneLeaf" = 0.0124464110871913,
#>         "rateSeneStem" = 0.000127468732965756,
#>         "rateSeneRoot" = 0,
#>         "rateSeneRhizome" = 0,
#>         "alphaSeneLeaf" = 49.8679276606362,
#>         "alphaSeneStem" = 3.88965982514793,
#>         "alphaSeneRoot" = 10,
#>         "alphaSeneRhizome" = 10,
#>         "betaSeneLeaf" = -30.2372834250869,
#>         "betaSeneStem" = -3.3877986715034,
#>         "betaSeneRoot" = -10,
#>         "betaSeneRhizome" = -10,
#>         "remobilization_fraction" = 0.6,
#>         "soil_depth1" = 0,
#>         "soil_depth2" = 2.5,
#>         "soil_depth3" = 10,
#>         "wsFun" = 2,
#>         "hydrDist" = 0,
#>         "rfl" = 0.2,
#>         "rsdf" = 0.44,
#>         "phi1" = 0.01,
#>         "phi2" = 1.5,
#>         "tbase" = 10,
#>         "km_leaf_litter" = 2,
#>         "timestep" = 1
#>     )
#> )

It can also be written to a text file:

# Save the model definition as an R file in the current working directory
writeLines(r_cmd_string, './soybean_reparam.R')

Commands From This Document

###
### Preliminaries
###

# Load required libraries
library(BioCroValidation)
library(BioCro)
library(dfoptim)
library(lattice)

###
### Prepare inputs for `objective_function`
###

# Specify the base model definition
base_model_definition            <- soybean
base_model_definition$ode_solver <- default_ode_solvers[['homemade_euler']]

# Define a helping function for processing data tables
process_table <- function(data_table, type) {
  # Define new `time` column
  data_table$time <- (data_table$DOY - 1) * 24.0

  # Define new `Shell_Mg_per_ha` column
  data_table$Shell_Mg_per_ha <- if (type == 'biomass') {
    # The shell is all parts of the pod other than the seed
    data_table$Rep_Mg_per_ha - data_table$Seed_Mg_per_ha
  } else {
    # Add uncertainties in quadrature, a simple approach to error propagation
    sqrt(data_table$Rep_Mg_per_ha^2 + data_table$Seed_Mg_per_ha^2)
  }

  # Define new `Root_Mg_per_ha` column, which has just one non-NA value
  row_to_use <- 5                 # Choose row to use
  data_table$Root_Mg_per_ha <- NA # Initialize all values to NA

  if (type == 'biomass') {
    # Estimate a mass at one time point
    col_to_add <- c(
      'Leaf_Mg_per_ha',
      'Stem_Mg_per_ha',
      'Rep_Mg_per_ha'
    )

    data_table[row_to_use, 'Root_Mg_per_ha'] <-
      0.17 * sum(data_table[row_to_use, col_to_add])
  } else {
    # Estimate standard deviation at one time point
    data_table[row_to_use, 'Root_Mg_per_ha'] <- 1 / exp(1) - 1e-5
  }

  # Remove columns by setting them to NULL
  data_table$DOY              = NULL
  data_table$Rep_Mg_per_ha    = NULL
  data_table$Litter_Mg_per_ha = NULL

  # Return the processed table
  data_table
}

# Define the data-driver pairs
data_driver_pairs <- list(
  ambient_2002 = list(
    data       = process_table(soyface_biomass[['ambient_2002']],     'biomass'),
    data_stdev = process_table(soyface_biomass[['ambient_2002_std']], 'stdev'),
    drivers    = BioCro::soybean_weather[['2002']],
    weight     = 1
  ),
  ambient_2005 = list(
    data       = process_table(soyface_biomass[['ambient_2005']],     'biomass'),
    data_stdev = process_table(soyface_biomass[['ambient_2005_std']], 'stdev'),
    drivers    = BioCro::soybean_weather[['2005']],
    weight     = 1
  )
)

# Define the post-processing function
post_process_function <- function(sim_res) {
  # Calculate the total litter as the sum of leaf and stem litter
  within(sim_res, {TotalLitter = LeafLitter + StemLitter})
}

# Define the data definition list, where the element names are columns in the
# observed data tables, and the element values are the corresponding column
# names in the model outputs
data_definitions <- list(
# Observed               Simulated
  CumLitter_Mg_per_ha = 'TotalLitter',
  Leaf_Mg_per_ha      = 'Leaf',
  Root_Mg_per_ha      = 'Root',
  Seed_Mg_per_ha      = 'Grain',
  Shell_Mg_per_ha     = 'Shell',
  Stem_Mg_per_ha      = 'Stem'
)

# Define a list of independent arguments and their initial values
independent_arg_names <- c(
  # Partitioning for leaf, stem, and shell
  'alphaLeaf',
  'betaLeaf',
  'alphaStem',
  'betaStem',
  'alphaShell',
  'betaShell',

  # Senescence for leaf and stem
  'alphaSeneLeaf',
  'betaSeneLeaf',
  'rateSeneLeaf',
  'alphaSeneStem',
  'betaSeneStem',
  'rateSeneStem',

  # Growth respiration for stem and root
  'grc_stem',
  'grc_root',

  # Maintenance respiration for leaf and root
  'mrc_leaf',
  'mrc_root'
)

independent_args <- soybean$parameters[independent_arg_names]

# Define a function that sets `mrc_stem` to the value of `mrc_leaf`
dependent_arg_function <- function(ind_args) {
  list(mrc_stem = ind_args[['mrc_leaf']])
}

# Specify the quantity weights; there is no systematic way to determine these,
# but the following weights have worked well in the past for Soybean-BioCro
quantity_weights <- list(
  Grain       = 1.0,
  Leaf        = 1.0,
  Root        = 0.1,
  Shell       = 0.5,
  Stem        = 1.0,
  TotalLitter = 0.1
)

# Define an extra penalty function
extra_penalty_function <- function(sim_res) {
  # Set the penalty value
  PENALTY <- 9999

  # Get the first times when each partitioning coefficient becomes non-zero
  k_thresh <- 0.01 # Threshold k value to decide when growth has started
  hpd      <- 24.0 # Hours per day

  time <- sim_res[['time']]

  time_grain <- time[sim_res[['kGrain']] > k_thresh][1]
  time_leaf  <- time[sim_res[['kLeaf']]  > k_thresh][1]
  time_shell <- time[sim_res[['kShell']] > k_thresh][1]
  time_stem  <- time[sim_res[['kStem']]  > k_thresh][1]

  # Return a penalty if necessary
  if (is.na(time_grain) | is.na(time_leaf) | is.na(time_shell) | is.na(time_stem)) {
    # One or more tissues is not growing
    return(PENALTY)
  } else if (abs(time_leaf - time_stem) > 5 * hpd) {
    # The starts of leaf and stem growth are more than 5 days apart
    return(PENALTY)
  } else if (time_leaf - time[1] > 20 * hpd | time_leaf - time[1] < 10 * hpd) {
    # The start of leaf growth is too late (more than 20 days after sowing) or
    # too early (fewer than 10 days after sowing)
    return(PENALTY)
  } else {
    # No problems were detected
    return(0.0)
  }
}

###
### Create the objective function
###

# Create the objective function
obj_fun <- objective_function(
  base_model_definition,
  data_driver_pairs,
  independent_args,
  quantity_weights,
  data_definitions       = data_definitions,
  normalization_method   = 'mean_max',
  stdev_weight_method    = 'logarithm',
  stdev_weight_param     = 1e-5,
  regularization_method  = 'none',
  dependent_arg_function = dependent_arg_function,
  post_process_function  = post_process_function,
  extra_penalty_function = extra_penalty_function
)

###
### Use an optimizer to choose parameter values
###

# Set a seed
set.seed(1234)

# Make an initial guess by perturbing the default values by a small amount
rel_size <- 0.02

initial_guess <- as.numeric(independent_args) *
  (1.0 + runif(length(independent_args), -rel_size, rel_size))

# Specify some bounds
aul <- 50   # Upper limit for alpha parameters
bll <- -50  # Lower limit for beta parameters
mll <- 1e-6 # Lower limit for mrc parameters
mul <- 1e-2 # Upper limit for mrc parameters

# Define a table with the bounds in the same order as `independent_args`
bounds <- bounds_table(
  independent_args,
  list(
    alphaLeaf     = c(0,      aul),
    alphaStem     = c(0,      aul),
    alphaShell    = c(0,      aul),
    alphaSeneLeaf = c(0,      aul),
    alphaSeneStem = c(0,      aul),
    betaLeaf      = c(bll,    0),
    betaStem      = c(bll,    0),
    betaShell     = c(bll,    0),
    betaSeneLeaf  = c(bll,    0),
    betaSeneStem  = c(bll,    0),
    rateSeneLeaf  = c(0,      0.0125),
    rateSeneStem  = c(0,      0.005),
    mrc_leaf      = c(mll,    mul),
    mrc_root      = c(mll,    mul),
    grc_stem      = c(8e-4,   0.08),
    grc_root      = c(0.0025, 0.075)
  ),
  initial_guess
)

# Run the optimizer
optim_res <- nmkb(
  initial_guess,
  obj_fun,
  bounds[['lower']],
  bounds[['upper']],
  control = list(
    tol = 1e-2,
    restarts.max = 10
  ),
  debug_mode = FALSE # Passed to obj_fun; set to TRUE to enable debug mode
)

###
### Check and record the new values
###

# Create a table of the various independent argument values
ind_arg_table <- data.frame(
  arg_name      = independent_arg_names,
  defaults      = as.numeric(independent_args),
  initial_guess = initial_guess,
  optimized     = optim_res[['par']],
  stringsAsFactors = FALSE
)

# Add differences
ind_arg_table <- within(ind_arg_table, {
  initial_diff   = abs(initial_guess - defaults)
  optimized_diff = abs(optimized - defaults)
  improved       = optimized_diff < initial_diff
})

# View results
print(ind_arg_table)

# Get model definition lists for the perturbed and re-parameterized versions of
# the soybean model
soybean_perturbed <- update_model(
  BioCro::soybean,
  independent_args,
  initial_guess,
  dependent_arg_function = dependent_arg_function
)

soybean_reparam <- update_model(
  BioCro::soybean,
  independent_args,
  optim_res[['par']],
  dependent_arg_function = dependent_arg_function
)

# Define a helper function that runs a single model for the year 2005
run_2005 <- function(model_definition) {
  with(model_definition, {run_biocro(
    initial_values,
    parameters,
    soybean_weather[['2005']],
    direct_modules,
    differential_modules,
    ode_solver
  )})
}

# Run each model and combine the results
full_res <- rbind(
  within(run_2005(BioCro::soybean),   {model = 'Default Soybean-BioCro'}),
  within(run_2005(soybean_perturbed), {model = 'Perturbed Soybean-BioCro'}),
  within(run_2005(soybean_reparam),   {model = 'Re-parameterized Soybean-BioCro'})
)

# Plot the results
xyplot(
  Leaf + Stem + Root + Grain ~ fractional_doy,
  group = model,
  data = full_res,
  type = 'l',
  auto.key = list(space = 'top'),
  xlab = 'Day of year (2005)',
  ylab = 'Biomass (Mg / ha)'
)

# Convert the re-parameterized soybean model to an R command string
r_cmd_string <- with(soybean_reparam, write_model(
  'soybean_reparam',
  direct_modules,
  differential_modules,
  initial_values,
  parameters,
  ode_solver
))

# Save the model definition as an R file in the current working directory
writeLines(r_cmd_string, './soybean_reparam.R')

References

Lochocki, Edward B, Scott Rohde, Deepak Jaiswal, Megan L Matthews, Fernando Miguez, Stephen P Long, and Justin M McGrath. 2022. “BioCro II: A Software Package for Modular Crop Growth Simulations.” In Silico Plants 4 (1): diac003. https://doi.org/10.1093/insilicoplants/diac003.

Matthews, Megan L, Amy Marshall-Colón, Justin M McGrath, Edward B Lochocki, and Stephen P Long. 2021. “Soybean-BioCro: A Semi-Mechanistic Model of Soybean Growth.” In Silico Plants 4 (1): diab032. https://doi.org/10.1093/insilicoplants/diab032.

Ordóñez, Raziel A., Sotirios V. Archontoulis, Rafael Martinez-Feria, Jerry L. Hatfield, Emily E. Wright, and Michael J. Castellano. 2020. “Root to Shoot and Carbon to Nitrogen Ratios of Maize and Soybean Crops in the US Midwest.” European Journal of Agronomy 120 (October): 126130. https://doi.org/10.1016/j.eja.2020.126130.