Project 2

Variability and Function Learning

Published

January 9, 2024

Introduction

In project 1, we applied model-based techniques to quantify and control for the similarity between training and testing experience, which in turn enabled us to account for the difference between varied and constant training via an extended version of a similarity-based generalization model. In project 2, we will go a step further, implementing a full process model capable of both 1) producing novel responses and 2) modeling behavior in both the learning and testing stages of the experiment. Project 2 also places a greater emphasis on extrapolation performance following training, as varied training has often been purported to be particularly beneficial in such situations. Extrapolation has long been a focus of the literature on function learning (Brehmer, 1974; Carroll, 1963). Central questions of the function learning literature have included the relative difficulties of learning various functional forms (e.g. linear vs. bilinear vs. quadratic), and the relative effectiveness of rule-based vs. association-based exemplar models vs. various hybrid models (Bott & Heit, 2004; DeLosh et al., 1997; Jones et al., 2018; Kalish et al., 2004; McDaniel et al., 2009; McDaniel & Busemeyer, 2005). However, the issue of training variation has received surprisingly little attention in this area.

Methods

Participants

Data was collected from 647 participants (after exclusions). The results shown below consider data from subjects in our initial experiment, which consisted of 196 participants (106 constant, 90 varied). The follow-up experiments entailed minor manipulations: 1) reversing which velocity bands were trained vs. novel during testing; and 2) providing ordinal rather than numerical feedback during training (e.g. correct, too low, too high). The data from these subsequent experiments are largely consistent with the initial results shown below.

Task

We developed a novel visuomotor extrapolation task, termed the “Hit The Wall” (HTW) task, wherein participants learned to launch a projectile such that it hit a rectangle at the far end of the screen with an appropriate amount of force. Although the projectile had both x and y velocity components, only the x-dimension was relevant for the task.  Link to task demo

Design

  1. 90 training trials, split evenly between velocity bands. Varied training used 3 velocity bands; constant training used 1 band.

  2. No-feedback testing from 3 novel extrapolation bands. 15 trials each.  

  3. No-feedback testing from the 3 bands used during the training phase (2 of which were novel for the constant group). 9 trials each.

  4. Feedback testing for each of the 3 extrapolation bands. 10 trials each.
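
As a concrete illustration of this design, the sketch below builds a single participant's trial schedule in R. The band values follow those reported in the Results section; the data-frame representation, column names, and the make_schedule function are assumptions of this sketch rather than the actual experiment code.

Code
library(dplyr)   # also provides tibble() and bind_rows()

# Velocity bands used across the experiment (low/high edges), per the Results section.
bands <- tibble(
  band = c("100-300", "350-550", "600-800", "800-1000", "1000-1200", "1200-1400"),
  low  = c(100, 350, 600, 800, 1000, 1200),
  high = c(300, 550, 800, 1000, 1200, 1400)
)

make_schedule <- function(condit = c("constant", "varied")) {
  condit <- match.arg(condit)
  train_bands  <- if (condit == "varied") c("800-1000", "1000-1200", "1200-1400") else "800-1000"
  extrap_bands <- c("100-300", "350-550", "600-800")
  bind_rows(
    # 1) 90 training trials, split evenly across the group's training band(s)
    tibble(phase = "train",         band = rep(train_bands, each = 90 / length(train_bands)), feedback = TRUE),
    # 2) no-feedback testing: 15 trials per novel extrapolation band
    tibble(phase = "test_extrap",   band = rep(extrap_bands, each = 15), feedback = FALSE),
    # 3) no-feedback testing: 9 trials per band used during the training phase
    tibble(phase = "test_train",    band = rep(c("800-1000", "1000-1200", "1200-1400"), each = 9), feedback = FALSE),
    # 4) feedback testing: 10 trials per extrapolation band
    tibble(phase = "test_feedback", band = rep(extrap_bands, each = 10), feedback = TRUE)
  ) %>%
    left_join(bands, by = "band") %>%
    mutate(trial = row_number())
}

nrow(make_schedule("constant"))   # 90 + 45 + 27 + 30 = 192 trials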

Experiment Procedure

Results

Training

Training performance is shown in Results Figure 2A. Both groups show improvement in each of their training velocity bands (i.e. decreasing average distance from the target band). In the velocity band trained by both groups (800-1000), the constant group maintains a superior level of performance from the early through the final stages of training. This difference is unsurprising given that the constant group had 3x more practice trials in that band.

Code
knitr::include_graphics(here::here("Assets/Training-1.png"))


Testing

For evaluating testing performance, we consider 3 separate metrics: 1) the average absolute deviation from the correct velocity, 2) the percentage of throws in which the wall was hit with the correct velocity, and 3) the average x velocity produced.
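
For concreteness, the snippet below computes these three metrics with dplyr. The data frame test_data and its columns (condit, band_low, band_high, vx) are hypothetical placeholders for trial-level test data, and treating deviation as the distance to the nearest edge of the assigned band is one reasonable operationalization rather than necessarily the exact definition used in our analyses.

Code
library(dplyr)

# Hypothetical trial-level test data: one row per throw, with condition (condit),
# the assigned velocity band (band_low, band_high), and the produced x velocity (vx).
test_summary <- test_data %>%
  mutate(
    dev = case_when(                 # absolute deviation from the assigned band
      vx < band_low  ~ band_low - vx,
      vx > band_high ~ vx - band_high,
      TRUE           ~ 0
    ),
    hit = vx >= band_low & vx <= band_high   # throw produced a correct velocity
  ) %>%
  group_by(condit, band_low, band_high) %>%
  summarise(
    mean_dev = mean(dev),            # 1) average absolute deviation
    pct_hit  = 100 * mean(hit),      # 2) % of throws with the correct velocity
    mean_vx  = mean(vx),             # 3) average produced x velocity
    .groups  = "drop"
  )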

Results Figure 2B shows the average velocity produced for all 6 bands that were tested. At least at the aggregate level, both conditions were able to differentiate all 6 bands in the correct order, despite having received training feedback for only 1 of the 6 bands (constant) or 3 of the 6 bands (varied). Participants in both groups also showed a bias towards greatly overestimating the correct velocity for band 100-300, for which both groups produced average velocities greater than 500.

Code
knitr::include_graphics(here::here("Assets/Testing_Vx-1.png"))

As is reflected in Results Figure 2C, the constant group performed significantly better than the varied group at the 3 testing bands of greatest interest. Both groups tended to perform worse for testing bands further away from their training conditions. The varied group had a slight advantage for bands 1000-1200 and 1200-1400, which were repeats from training for the varied participants, but novel for the constant participants.

Code
knitr::include_graphics(here::here("Assets/Test_Performance-1.png"))

Figure 2C: Bands 100-300, 350-550 and 600-800 are novel extrapolations for both groups. Band 800-1000 was a training band for both groups. Bands 1000-1200 and 1200-1400 were trained for the varied group, and novel for the constant group. The right-side panel displays mean deviation from the correct velocity band (lower values correspond to better performance). The bottom-left panel displays the average % of trials where participants hit the wall with the correct velocity (higher values correspond to better performance). Error bars indicate standard error of the mean.

Modeling

As described in the introduction, the goal of project 2 is to implement a full process model capable of both 1) producing novel responses and 2) modeling behavior in both the learning and testing stages of the experiment. For this purpose, we will apply the associative learning model (ALM) and the EXAM model of function learning (DeLosh et al., 1997). ALM is a simple connectionist learning model which closely resembles Kruschke's ALCOVE model (Kruschke, 1992), with modifications to allow for the generation of continuous responses.

ALM & Exam Description

DeLosh et al. (1997) introduced the associative learning model (ALM), a connectionist model within the popular class of radial-basis networks. ALM was inspired by, and closely resembles, Kruschke's influential ALCOVE model of categorization (Kruschke, 1992).

ALM is a localist neural network model, with each input node corresponding to a particular stimulus value and each output node corresponding to a particular response value. The units in the input layer activate as a function of their Gaussian similarity to the presented stimulus; as was the case with the exemplar-based models, similarity is an exponentially decaying function of (squared) distance. For example, an input stimulus of value 55 would maximally activate the input unit tuned to 55, and, depending on the value of the generalization parameter, nearby units (e.g. 54 and 56; 53 and 57) may also activate to some degree. The input layer is fully connected to the output layer, and the activation of any particular output node is simply the sum of the input activations weighted by the connection weights between the input units and that node. The network then produces a response by taking the average of the output unit values, weighted by the output activations (recall that each output unit has a value corresponding to a particular response). During training, the network receives a feedback signal that activates each output unit as a function of the similarity between that unit's response value and the correct response. The connection weights between input and output units are then updated via the standard delta learning rule, with the magnitude of the weight changes controlled by a learning rate parameter.

See Table 1 for a full specification of the equations that define ALM and EXAM.

Model Equations

Code
library(knitr)       # kable
library(kableExtra)  # cell_spec, kable_styling, column_spec, pack_rows, %>%

text_tbl <- data.frame(
    'Step'=c("Input Activation","Output Activation","Output Probability","Mean Output","Feedback Activation","Update Weights","Extrapolation",""),
    'Equation' = c("$a_i(X) = \\frac{e^{-c \\cdot (X-X_i)^2}}{ \\sum_{k=1}^Me^{-c \\cdot (X-X_k)^2}}$", 
                   '$O_j(X) = \\sum_{i=1}^Mw_{ji} \\cdot a_i(X)$',
                   '$P[Y_j \\mid X] = \\frac{O_j(X)}{\\sum_{k=1}^LO_k(X)}$',
                   "$m(X) = \\sum_{j=1}^LY_j \\cdot \\bigg[\\frac{O_j(X)}{\\sum_{k=1}^LO_k(X)}\\bigg]$",
                   "$f_j(Z)=e^{-c\\cdot(Z-Y_j)^2}$",
                   "$w_{ji}(t+1)=w_{ji}(t)+\\alpha \\cdot \\big[f_j(Z(t))-O_j(X(t))\\big] \\cdot a_i(X(t))$",
                   "$P[X_i \\mid X] = \\frac{a_i(X)}{\\sum_{k=1}^Ma_k(X)}$",
                   "$E[Y \\mid X_i]=m(X_i) + \\bigg[\\frac{m(X_{i+1})-m(X_{i-1})}{X_{i+1} - X_{i-1}} \\bigg] \\cdot[X-X_i]$"),
    
    'Description'= c(
            "Activation of each input node, $X_i$, is a function of the Gaussian similarity between the node value and stimulus X.",
            "Activation of each output unit $O_j$ is the weighted sum of the input activations and association weights",
            "Each output node has an associated response value, $Y_j$. The probability of response $Y_j$ is determined by the ratio of output activations",
            "The response to stimulus X is the average of the response values $Y_j$, weighted by the output probabilities",
            "After responding, feedback signal Z is presented, activating each output node via its Gaussian similarity to the feedback value",
            "Delta rule to update weights. Magnitude of weight changes controlled by learning rate parameter alpha.",
            "Novel test stimulus X activates input nodes associated with trained stimuli",
            "Slope computed from the nearest training instances is added to the response associated with the nearest training instance, $m(X_i)$")
)
text_tbl$Step=cell_spec(text_tbl$Step,font_size=12)
text_tbl$Equation=cell_spec(text_tbl$Equation,font_size=18)
almTable=kable(text_tbl, 'html', 
  booktabs=T, escape = F, align='l',
  caption = '<span style = "color:black;"><center><strong>Table 1: ALM & EXAM Equations</strong></center></span>',
  col.names=c("","Equation","Description")) %>%
  kable_styling(position="left",bootstrap_options = c("hover")) %>%
  column_spec(1, bold = F,border_right=T) %>%
  column_spec(2, width = '10cm')%>%
  column_spec(3, width = '15cm') %>%
  pack_rows("ALM Activation & Response",1,4,bold=FALSE,italic=TRUE) %>%
  pack_rows("ALM Learning",5,6,bold=FALSE,italic=TRUE) %>%
  pack_rows("EXAM",7,8,bold=FALSE,italic=TRUE)

cat(almTable)
Table 1: ALM & EXAM Equations

| | Equation | Description |
|---|---|---|
| ALM Activation & Response | | |
| Input Activation | \(a_i(X) = \frac{e^{-c \cdot (X-X_i)^2}}{\sum_{k=1}^M e^{-c \cdot (X-X_k)^2}}\) | Activation of each input node, \(X_i\), is a function of the Gaussian similarity between the node value and stimulus X. |
| Output Activation | \(O_j(X) = \sum_{i=1}^M w_{ji} \cdot a_i(X)\) | Activation of each output unit \(O_j\) is the weighted sum of the input activations and association weights |
| Output Probability | \(P[Y_j \mid X] = \frac{O_j(X)}{\sum_{k=1}^L O_k(X)}\) | Each output node has an associated response value, \(Y_j\). The probability of response \(Y_j\) is determined by the ratio of output activations |
| Mean Output | \(m(X) = \sum_{j=1}^L Y_j \cdot \bigg[\frac{O_j(X)}{\sum_{k=1}^L O_k(X)}\bigg]\) | The response to stimulus X is the average of the response values \(Y_j\), weighted by the output probabilities |
| ALM Learning | | |
| Feedback Activation | \(f_j(Z) = e^{-c \cdot (Z-Y_j)^2}\) | After responding, feedback signal Z is presented, activating each output node via its Gaussian similarity to the feedback value |
| Update Weights | \(w_{ji}(t+1) = w_{ji}(t) + \alpha \cdot \big[f_j(Z(t)) - O_j(X(t))\big] \cdot a_i(X(t))\) | Delta rule to update weights. Magnitude of weight changes controlled by learning rate parameter \(\alpha\). |
| EXAM | | |
| Extrapolation | \(P[X_i \mid X] = \frac{a_i(X)}{\sum_{k=1}^M a_k(X)}\) | Novel test stimulus X activates input nodes associated with trained stimuli |
| | \(E[Y \mid X_i] = m(X_i) + \bigg[\frac{m(X_{i+1}) - m(X_{i-1})}{X_{i+1} - X_{i-1}}\bigg] \cdot [X - X_i]\) | Slope computed from the nearest training instances is added to the response associated with the nearest training instance, \(m(X_i)\) |
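
To make the model mechanics concrete, the following sketch implements the ALM response and learning rules and the EXAM extrapolation rule from Table 1 in R. The node grids, parameter values (c, learning rate), trained stimulus values, and the identity feedback mapping are purely illustrative placeholders rather than the values used for the HTW task or in our fits.

Code
# Minimal ALM/EXAM sketch following the equations in Table 1.
# All numeric values below are illustrative, not fitted parameters.
input_nodes  <- seq(100, 1400, by = 50)   # X_i: input-layer node values
output_nodes <- seq(100, 1400, by = 50)   # Y_j: output-layer response values

input_activation <- function(X, c) {
  a <- exp(-c * (X - input_nodes)^2)      # Gaussian similarity to each input node
  a / sum(a)                              # normalized activations a_i(X)
}

alm_response <- function(X, c, w) {
  a <- input_activation(X, c)
  O <- as.vector(w %*% a)                 # output activations O_j(X)
  if (sum(O) <= 0) return(mean(output_nodes))  # fallback for an untrained network (sketch-only choice)
  sum(output_nodes * O / sum(O))          # mean output m(X)
}

alm_update <- function(X, Z, c, lr, w) {
  a <- input_activation(X, c)
  O <- as.vector(w %*% a)
  f <- exp(-c * (Z - output_nodes)^2)     # feedback activation f_j(Z)
  w + lr * outer(f - O, a)                # delta rule applied to the full weight matrix
}

exam_response <- function(X, c, w, trained) {
  # EXAM: respond from the nearest trained stimulus, plus a locally estimated slope.
  i   <- which.min(abs(X - trained))
  m_i <- alm_response(trained[i], c, w)
  lo  <- max(i - 1, 1); hi <- min(i + 1, length(trained))
  if (lo == hi) return(m_i)               # a single trained value provides no slope
  slope <- (alm_response(trained[hi], c, w) - alm_response(trained[lo], c, w)) /
           (trained[hi] - trained[lo])
  m_i + slope * (X - trained[i])
}

# Example: train on three stimulus values with identity feedback (Z = X), then
# probe a novel value. ALM stays near the trained range; EXAM extrapolates linearly.
w <- matrix(0, nrow = length(output_nodes), ncol = length(input_nodes))
train_x <- c(900, 1100, 1300)
for (iter in 1:30) for (X in train_x) w <- alm_update(X, Z = X, c = 0.001, lr = 0.2, w = w)
alm_response(400, c = 0.001, w)                        # remains near the trained responses
exam_response(400, c = 0.001, w, trained = train_x)    # extrapolates to approximately 400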

Model Fitting and Comparison

Following the procedure used by McDaniel et al. (2009), we will assess the ability of both ALM and EXAM to account for the empirical data when fitting the models to 1) only the training data, and 2) both training and testing data. Models will be fit directly to the trial-by-trial data of each individual participant, both by minimizing the root mean squared error (RMSE) and by maximizing log likelihood. Because ALM has been shown to do poorly at accounting for human patterns of extrapolation (DeLosh et al., 1997), we will also fit the extended EXAM version of the model, which operates identically to ALM during training, but includes a linear extrapolation mechanism for generating novel responses during testing.
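
A minimal sketch of one variant of this fitting procedure (fitting ALM to all trials of a single participant by minimizing RMSE with optim) is shown below. It assumes the alm_response/alm_update functions and node grids defined in the previous code block, and a hypothetical per-participant data frame subj_data with columns x (stimulus value), y (produced velocity), and z (feedback value, NA on no-feedback trials); the actual analyses may use different data structures, optimizers, and starting values, and will additionally fit EXAM and maximize log likelihood.

Code
# Fit ALM's generalization (c) and learning-rate (lr) parameters for one participant
# by minimizing root mean squared error between model and observed responses.
alm_rmse <- function(par, dat) {
  c  <- exp(par[1])                       # optimize on the log scale so both parameters stay positive
  lr <- exp(par[2])
  w  <- matrix(0, nrow = length(output_nodes), ncol = length(input_nodes))
  pred <- numeric(nrow(dat))
  for (t in seq_len(nrow(dat))) {
    pred[t] <- alm_response(dat$x[t], c, w)          # model response for this trial
    if (!is.na(dat$z[t])) {                          # learning only occurs on feedback trials
      w <- alm_update(dat$x[t], dat$z[t], c, lr, w)
    }
  }
  sqrt(mean((pred - dat$y)^2))                       # RMSE across trials
}

fit <- optim(par = log(c(0.001, 0.2)), fn = alm_rmse, dat = subj_data)
exp(fit$par)    # best-fitting c and learning rate
fit$value       # RMSE at the best-fitting parameters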








References

Bott, L., & Heit, E. (2004). Nonmonotonic Extrapolation in Function Learning. Journal of Experimental Psychology: Learning, Memory, and Cognition, 30, 38–50.
Brehmer, B. (1974). Hypotheses about relations between scaled variables in the learning of probabilistic inference tasks. Organizational Behavior and Human Performance, 11(1), 1–27. https://doi.org/10.1016/0030-5073(74)90002-6
Carroll, J. D. (1963). Functional Learning: The Learning of Continuous Functional Mappings Relating Stimulus and Response Continua. ETS Research Bulletin Series, 1963(2), i–144. https://doi.org/10.1002/j.2333-8504.1963.tb00958.x
DeLosh, E. L., McDaniel, M. A., & Busemeyer, J. R. (1997). Extrapolation: The Sine Qua Non for Abstraction in Function Learning. Journal of Experimental Psychology: Learning, Memory, and Cognition, 19.
Jones, A., Schulz, E., Meder, B., & Ruggeri, A. (2018). Active Function Learning [Preprint]. Animal Behavior and Cognition. https://doi.org/10.1101/262394
Kalish, M. L., Lewandowsky, S., & Kruschke, J. K. (2004). Population of Linear Experts: Knowledge Partitioning and Function Learning. Psychological Review, 111(4), 1072–1099. https://doi.org/10.1037/0033-295X.111.4.1072
McDaniel, M. A., & Busemeyer, J. R. (2005). The conceptual basis of function learning and extrapolation: Comparison of rule-based and associative-based models. Psychonomic Bulletin & Review, 12(1), 24–42. https://doi.org/10.3758/BF03196347
McDaniel, M., Dimperio, E., Griego, J., & Busemeyer, J. (2009). Predicting transfer performance: A comparison of competing function learning models. Journal of Experimental Psychology: Learning, Memory, and Cognition, 35, 173–195. https://doi.org/10.1037/a0013982