# Dissertation Proposal Presentation

### The Influence of Variability on Learning and Generalization

.large[Thomas Gorman | September 20th, 2022 | PY 230]

--

<br><br>

[Jump to Project 1](#Project_1)

[Jump to Project 2](#Project_2)

[Jump to Project 3](#Project_3)

<a href="https://tegorman13.github.io/DP/index.html" target="_blank">Link to Proposal Site</a>

---

## Variability, Learning, and Generalization

- Variation during training is linked to improved transfer in numerous domains
  - e.g. concept, motor, and category learning

- Benefits are often argued to arise from variation promoting abstraction of some schema/prototype/rule

- A particularly interesting finding (Kerr & Booth, 1978): varied training can result in better performance than constant training, even when testing from the position the constant group trained at, which was novel to the varied group

--

- But also plenty of contradictory results and complications
  - Cases where varied training makes no difference
  - Cases where more training variation results in worse outcomes
  - Cases where the influence of variation interacts with some other factor
    - difficulty
    - prior knowledge
    - frequency effects, or the amount of training/learning before testing

---

## General theme of the dissertation

- Similarity-based accounts of the influence of variability on learning
- Two empirical studies
- One retrospective study with a very large dataset

---
name: Project_1

## Project 1

### An Instance-Based Model Account of the Benefits of Varied Practice in Motor Learning

- Can we replicate previous findings of a varied-training benefit in a projectile-throwing task, and can such results be accounted for with a quantitative process model?

- Compare a constant group trained from 1 location to a varied group trained from 2 locations. Then test both groups from the training positions and a novel position.

---

### Project 1 - Task and Design

.pull-left[<img src="Assets/igasMethods.png" width="430" height="300" />]

--

.pull-right[
- **Training Stage**
  - 120 training trials. The constant group throws from a single position; the varied group completes 60 trials from 2 positions.
- **Transfer Stage**
  - All subjects are tested from the positions they trained at, as well as the positions trained by the other group
- **Data recorded**
  - For every throw, the X velocity and Y velocity of the ball at release
]

--

.pull-left[
### Experiment 1
- Train a varied group from 2 positions and a constant group from 1. Then test both groups from the training positions and a novel position.
]

.pull-right[
### Experiment 2
- 6 constant conditions, each trained from a unique location, and a varied condition trained from 2 locations
]

---

## Results - Experiment 1

.center[<img src="Assets/igasE1Test.png" width="650" height="500" />]

---

## Results - Experiment 2

.center[<img src="Assets/igasE2Test.png" width="650" height="500" />]

---

### Similarity between training throws and solutions

<img src="Assets/igasSpaces.png" width="600" height="300" />

- Euclidean distance: `$$\sqrt{(x_{Train_i}-x_{Solution_j})^2 + (y_{Train_i}-y_{Solution_j})^2}$$`

- Gaussian similarity: `$$e^{-c\cdot d^p_{i,j}}$$`

- The c parameter dictates how quickly similarity decreases as a function of distance
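---

### Project 1 - Similarity Computation (Sketch)

A minimal sketch of the similarity measure on the previous slide: the Gaussian similarity between a single training throw and a single solution point in velocity space. The example velocity values and the settings of the free parameters c and p are hypothetical, chosen only for illustration.

```python
import numpy as np

def gaussian_similarity(train_xy, solution_xy, c=0.5, p=2):
    """exp(-c * d^p), where d is the Euclidean distance between a
    training throw and a solution point in (x, y) velocity space."""
    d = np.sqrt(np.sum((np.asarray(train_xy) - np.asarray(solution_xy)) ** 2))
    return np.exp(-c * d ** p)

# Hypothetical release velocities for one training throw and one solution point
train_throw = (2.1, 5.3)
solution_point = (2.4, 5.9)
print(gaussian_similarity(train_throw, solution_point))
```

A subject-level predictor can then be formed by aggregating these similarities over all of a subject's training throws and the solution points for a given test position.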
---

### Project 1 - Model of Similarity between training and solutions

- For each subject, compute the similarity between their training throws and each of the 6 testing locations

- Test whether similarity explains the difference in performance between the Constant and Varied conditions
  - Assuming equivalent generalization (one c for both groups): similarity improves model fit, but does not explain the group difference
  - Assuming the training condition influences the generalization gradient (c fit separately per group): now similarity does explain the difference between Constant and Varied

--

**Limitations**

- Theoretically motivated similarity computation, but not a full cognitive process model
  - missing the link between similarity and the generation of the response
- Only considers generalization - no account of the learning process
- Limited to group-level patterns - obscuring a large degree of individual differences

---
name: Project_2

## Project 2 - Variability and Extrapolation in a Function Learning Task

---

## Project 2 - Questions and Goals

**Empirical**

- Design a task space large enough to assess multiple degrees of extrapolation
- Compare varied and constant generalization at several distinct distances from the nearest training condition
- Account for various secondary concerns, e.g. item difficulty; testing with vs. without feedback; ordinal vs. continuous feedback

**Model-based**

- If variation does influence extrapolation, can a similarity-based model provide a good account?
- Can our modelling framework simultaneously account for both training and testing data?
- Accounting for the full distribution of responses

---

## Project 2 - Design

.center[<img src="Assets/htwDesign.png" width="650" height="500" />]

???
https://pcl.sitehost.iu.edu/tg/HTW/HTW_Index.html?sonaid=

---

## Project 2 - Training

.center[<img src="Assets/htwTraining.png" width="650" height="500" />]

---

## Project 2 - Testing Vx Distributions

.center[<img src="Assets/htwTestVx.png" width="650" height="500" />]

---

## Project 2 - Testing Deviation

.center[<img src="Assets/htwTestDev.png" width="650" height="500" />]

---

## Project 2 - Modelling Proposal - ALM & EXAM

**Model Structure**

- Localist connectionist model
- Input nodes cover the range of stimulus values
  - Nodes activate as a function of their similarity to the stimulus
- Output nodes cover the range of valid response values
- Fully connected - weights updated via the delta rule

<br><br>

**Free parameters**

- Learning rate parameter
- Generalization/sensitivity parameter (c parameter)

???
DeLosh et al. (1997) introduced the associative learning model (ALM), a connectionist model within the popular class of radial-basis networks. ALM was inspired by, and closely resembles, Kruschke's influential ALCOVE model of categorization (Kruschke, 1992).

ALM is a localist neural network model, with each input node corresponding to a particular stimulus value and each output node corresponding to a particular response value. The units in the input layer activate as a function of their Gaussian similarity to the presented stimulus. So, for example, an input stimulus of value 55 would induce maximal activation of the input unit tuned to 55; depending on the value of the generalization parameter, nearby units (e.g. 54 and 56; 53 and 57) may also activate to some degree. As was the case with the exemplar-based models, similarity in ALM is an exponentially decaying function of distance. The input layer is fully connected to the output layer, and the activation of any particular output node is the sum of the connection weights between that node and the input units, weighted by the input activations. The network then produces a response by taking the weighted average of the output unit values (recall that each output unit corresponds to a particular response value). During training, the network receives feedback which activates each output unit as a function of its distance from the correct response. The connection weights between input and output units are then updated via the standard delta learning rule, where the magnitude of the weight changes is controlled by a learning rate parameter.
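---

### ALM - Illustrative Sketch

A minimal sketch of the ALM mechanics described in the notes for the previous slide. The node positions, parameter values, and the single training example are hypothetical placeholders; this is meant only to illustrate the Gaussian input activation, weighted-average response rule, and delta-rule update, not the exact implementation that will be used.

```python
import numpy as np

input_vals = np.arange(0, 101, 5.0)     # hypothetical input-node positions (stimulus space)
output_vals = np.arange(0, 251, 10.0)   # hypothetical output-node positions (response space)
w = np.full((len(output_vals), len(input_vals)), 0.01)  # input -> output weights

def input_activation(stim, c=0.1):
    # Gaussian (exponentially decaying) similarity of each input node to the stimulus
    return np.exp(-c * (stim - input_vals) ** 2)

def alm_response(stim, c=0.1):
    a_in = input_activation(stim, c)
    a_out = np.maximum(w @ a_in, 0.0)            # weighted sum at each output node
    resp = np.sum(output_vals * a_out) / (np.sum(a_out) + 1e-9)  # weighted-average response
    return resp, a_in, a_out

def train_trial(stim, correct, lr=0.2, c=0.1):
    global w
    _, a_in, a_out = alm_response(stim, c)
    teacher = np.exp(-c * (correct - output_vals) ** 2)   # feedback activation at output nodes
    w += lr * np.outer(teacher - a_out, a_in)             # delta-rule weight update

for _ in range(50):
    train_trial(stim=55.0, correct=120.0)
print(alm_response(55.0)[0])   # response should have moved toward 120
```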
---

.center[<img src="Assets/ALM_Equations.png" width="650" height="500" />]

???
Following the procedure used by McDaniel & Busemeyer (2009), we will assess the ability of both ALM and EXAM to account for the empirical data when fitting the models to 1) only the training data, and 2) both training and testing data. Models will be fit directly to the trial-by-trial data of each individual participant, both by minimizing the root mean squared error (RMSE) and by maximizing the log likelihood. Because ALM has been shown to do poorly at accounting for human patterns of extrapolation [@deloshExtrapolationSineQua1997], we will also fit the extended EXAM version of the model, which operates identically to ALM during training, but includes a linear extrapolation mechanism for generating novel responses during testing.
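---

### EXAM and Model Fitting - Illustrative Sketch

A rough sketch of the two components described in the notes above: a linear-extrapolation response rule layered on top of ALM, and the RMSE / log-likelihood objectives used for fitting. The `alm_response` helper, the specific slope rule, and the Gaussian-noise likelihood shown here are simplifying assumptions for illustration; the actual formulation will follow DeLosh et al. (1997) and McDaniel & Busemeyer (2009).

```python
import numpy as np

def exam_response(stim, trained_stims, alm_response):
    """EXAM-style response: ALM's output at the nearest trained stimulus,
    extrapolated linearly using the slope implied by ALM's outputs at the
    two nearest trained stimuli (assumes >= 2 trained stimuli)."""
    trained = np.sort(np.asarray(trained_stims, dtype=float))
    nearest = trained[np.argmin(np.abs(trained - stim))]
    others = trained[trained != nearest]
    ref = others[np.argmin(np.abs(others - nearest))]
    slope = (alm_response(nearest) - alm_response(ref)) / (nearest - ref)
    return alm_response(nearest) + slope * (stim - nearest)

def rmse(pred, obs):
    pred, obs = np.asarray(pred, float), np.asarray(obs, float)
    return np.sqrt(np.mean((pred - obs) ** 2))

def gaussian_loglik(pred, obs, sigma=1.0):
    pred, obs = np.asarray(pred, float), np.asarray(obs, float)
    return np.sum(-0.5 * np.log(2 * np.pi * sigma ** 2) - (obs - pred) ** 2 / (2 * sigma ** 2))

# Hypothetical usage: a toy "ALM" that was trained to map 30 -> 60 and 50 -> 100
toy_alm = lambda s: {30.0: 60.0, 50.0: 100.0}[s]
print(exam_response(70.0, [30.0, 50.0], toy_alm))   # extrapolates beyond the trained range
```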
---
name: Project_3

## Project 3

.center[<img src="Assets/lbdScreen.png" width="500" height="400" />]

---

## Lost In Migration

<br><br>
...
<br><br>

---

## Lost In Migration Gameplay Demo

<div id="muteYouTubeVideoPlayer"></div>

<script async src="https://www.youtube.com/iframe_api"></script>
<script>
function onYouTubeIframeAPIReady() {
  var player;
  player = new YT.Player('muteYouTubeVideoPlayer', {
    videoId: 'Hf3QQtsOaIE', // YouTube Video ID
    width: 600,             // Player width (in px)
    height: 400,            // Player height (in px)
    playerVars: {
      autoplay: 1,          // Auto-play the video on load
      controls: 1,          // Show pause/play buttons in player
      showinfo: 0,          // Hide the video title
      modestbranding: 1,    // Hide the Youtube Logo
      loop: 1,              // Run the video in a loop
      fs: 0,                // Hide the full screen button
      cc_load_policy: 0,    // Hide closed captions
      iv_load_policy: 3,    // Hide the Video Annotations
      autohide: 0,          // Hide video controls when playing
      start: 66
    },
    events: {
      onReady: function (e) {
        e.target.mute();
      },
    },
  });
}
// Written by @labnol
</script>

???
https://www.youtube.com/watch?v=Hf3QQtsOaIE&t=139s&ab_channel=DF

---

## Project 3 - Spatial Layouts

.center[<img src="Assets/lbdLayouts.png" width="650" height="500" />]

---

## Project 3 - Learning Curves

.center[<img src="Assets/lbdLearnCurve.png" width="650" height="500" />]

---

## Project 3 - Split Test Dimensions

.pull-left[<img src="Assets/lbdSplitTest.png" width="400" height="400" />]

.pull-right[
**Expanded dimensions of the split test**

- Rotation and orientation version
  - A continuous range of discrepancies between target and flanker birds on incongruent trials (compared to the fixed 90, 180, or 270 degrees in the base version)
- Distance between birds version
  - A continuous 0-20 px distance between target and flankers (constant value of 41 px in the base version)
- Size of birds version
  - The size of the target bird and the size of the flanker birds (the same size for all 4 flankers) are independently drawn from a range of 35-60 px (both fixed at 41 px in the base version)
]

---

## Project 3 - Split Test Game Sequence Counts

.center[<img src="Assets/lbdModeCounts.png" width="650" height="500" />]

???
Frequency of users in each possible sequence of experiencing the baseline and split-test versions. Base users are those that experience purely the baseline version. Split indicates the number of users who have thus far only completed the split-test version, and split_base indicates users who started out in the split test, but have since switched into the baseline version. Base_split users are those that start with the baseline version, are then randomly assigned to a single game with the split test, and then switch back to the baseline version (thus each of the green bars reflects a distinct set of users).

---

## Project 3 - Variability via Randomization

.center[<img src="Assets/lbdSequence.png" width="650" height="500" />]

???
As mentioned above, the trial selection process is not entirely random, due to the constraint that congruent and incongruent trials occur in approximately equal proportions. However, all other aspects of the trial generation process are random (i.e. no dependence on previous states, user performance, or number of gameplays). A simple consequence of such randomization is that some users will experience a wider range of trial-states, particularly in the early stages of the game. Figure XXX illustrates a toy case wherein two users receive four trials with discrepant levels of variation in spatial layout, the XY coordinates of the birds on the screen, and bird direction.

---

## Project 3 - Variability via Randomization

.center[<img src="Assets/lbdRvDist.png" width="650" height="500" />]

???
To quantify the amount of variability experienced by a user, we may start by simply taking the number of unique trial-states that the user has encountered after a given number of trials with the game. Each trial of the game can be defined along many different dimensions, both discrete (e.g. bird direction, layout) and continuous (X and Y coordinates on the screen). For simplicity, consider only the 3 categorical/ordinal trial dimensions, which consist of 4 values for target direction, 4 for flanker direction, and 7 spatial layouts, resulting in 112 ($4*4*7$) distinct trial-states. Figure \@ref(fig:rvDist) demonstrates how users who have completed the same total number of trials will still differ in the number of unique trial-states experienced. Although nice for its simplicity, quantifying variability as the number of unique states experienced is a fairly coarse and limited metric. One limitation comes from the lack of spatial relations between trial-states; for example, in \@ref(fig:sequencePlot) the lower-variation example in panel B consists of four unique trial-states, but nevertheless covers a far narrower region of the full state space compared to the high-variation example in panel A. Another issue, reflected in \@ref(fig:rvDist), is that given enough trials, the differences between users will eventually become negligible as they each approach the maximum of 112 unique states encountered. A more suitable measure may therefore be the uniformity of the frequency distribution of trial types (e.g. a measure of entropy).
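---

### Project 3 - Quantifying Experienced Variability (Sketch)

A small sketch of the two candidate variability measures discussed in the notes above: the number of unique trial-states a user has encountered, and the entropy of the frequency distribution over trial-states. The example trial encodings and histories are hypothetical.

```python
import numpy as np
from collections import Counter

def unique_state_count(trials):
    """Number of distinct (target direction, flanker direction, layout) states."""
    return len(set(trials))

def state_entropy(trials):
    """Shannon entropy (in bits) of the frequency distribution over trial-states;
    higher values indicate a more uniform (more variable) experienced history."""
    counts = np.array(list(Counter(trials).values()), dtype=float)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

# Hypothetical histories for two users who have completed the same number of trials
user_a = [("N", "S", 1), ("E", "W", 3), ("S", "S", 5), ("W", "N", 2)]  # more varied
user_b = [("N", "S", 1), ("N", "S", 1), ("N", "S", 1), ("E", "S", 1)]  # less varied
for user in (user_a, user_b):
    print(unique_state_count(user), round(state_entropy(user), 2))
```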
---

## Project 3 - Learning Model

**Modelling performance as a non-linear function of the number of trials completed**

Exponential Model: `\(y_t = u - ae^{-ct}\)`

Power Model: `\(y_t = u - at^{-c}\)`

---

### Project 3 - Similarity Between Trials

- Predicting performance on the current trial-state as a function of the previously experienced trial-states
  - benefits of repeating an exact trial-state vs. repetitions of particular dimensional values
  - How much variance does experience with particular types of trials explain, after controlling for the total number of trials completed?

<br>

- Explaining any effect of variability via randomization, and of variability via the split tests

<br>

- Challenge of determining similarity between trial-states in the early stages of learning
- Controlling for other factors that have a strong influence on performance, such as age and congruent vs. incongruent trials

<br>

???
A central challenge of project 3 will be to establish an appropriate measure of distance between the current trial `\(trial_n\)` and prior experience `\(trial_{1:n-1}\)`. Similarity could be defined as the simple dimensional/featural overlap between two trials, or by modelling a trial-state as a point in a multidimensional space, using some distance metric (e.g., Euclidean distance) to compute the distance between trials, and then transforming that distance into psychological space with an exponential or Gaussian function. As described above, each trial can be defined along a number of dimensions. However, the dimensions are unlikely to be equal in their influence, and may thus need to be differentially weighted. Determining the identities of the dimensions themselves may also be nontrivial. Once a similarity measure between trials has been established, we can use the individual trial histories of each user to compute how similar a given trial is to the totality of their prior experience in the game (i.e., the summed similarity of all previous trials), or to a subset of their more recent trials. We can then begin to perform inferential statistics assessing the extent to which users matched in total experience with the game may perform differently on individual trials as a function of the similarity between their prior experience and that particular trial. Additionally, our similarity metric can be used to attempt to explain differences in performance between baseline, split-test, and split-test -> baseline users.
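---

### Project 3 - Trial Similarity (Sketch)

A sketch of one possible trial-similarity measure along the lines described in the notes above: encode each trial-state as a weighted feature vector, transform the distance between trials into similarity with an exponential/Gaussian function, and summarize prior experience as the summed similarity of all previous trials to the current one. The feature encoding, dimension weights, and parameter values are placeholders.

```python
import numpy as np

def trial_distance(t1, t2, weights):
    """Weighted Euclidean distance between two trial-state feature vectors."""
    t1, t2, w = np.asarray(t1, float), np.asarray(t2, float), np.asarray(weights, float)
    return np.sqrt(np.sum(w * (t1 - t2) ** 2))

def trial_similarity(t1, t2, weights, c=1.0, p=2):
    # exponential/Gaussian transform of distance into psychological similarity
    return np.exp(-c * trial_distance(t1, t2, weights) ** p)

def summed_prior_similarity(current, history, weights, c=1.0, p=2):
    """Summed similarity of the current trial to all previously experienced trials."""
    return sum(trial_similarity(current, past, weights, c, p) for past in history)

# Hypothetical encoding: (target direction, flanker direction, layout, x, y), with
# arbitrary dimension weights emphasizing the categorical dimensions
weights = [1.0, 1.0, 0.5, 0.01, 0.01]
history = [(0, 1, 3, 120, 200), (2, 2, 5, 300, 150), (0, 1, 3, 118, 210)]
current = (0, 1, 3, 125, 205)
print(summed_prior_similarity(current, history, weights, c=0.5))
```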
---

## Thank You

<br><br>

---

## Extra

[Jump to Project 1](#Project_1)

[Jump to Project 2](#Project_2)

[Jump to Project 3](#Project_3)

<a href="https://tegorman13.github.io/DP/index.html" target="_blank">Link to Proposal Site</a>

<a href="https://pcl.sitehost.iu.edu/tg/demos/igas_expt1_demo.html" target="_blank">Link to Project 1 Task</a>

<a href="https://pcl.sitehost.iu.edu/tg/HTW/HTW_Index.html?sonaid=" target="_blank">Link to Project 2 Task</a>

<a href="https://www.youtube.com/watch?v=Hf3QQtsOaIE&t=139s&ab_channel=DF" target="_blank">Link to LIM Video</a>

---
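### Extra - Learning Curve Fitting (Sketch)

The learning model proposed for Project 3 (exponential vs. power functions of the number of trials completed) could be fit per user with standard non-linear least squares. A minimal sketch, assuming simulated data standing in for one user's game scores and SciPy's `curve_fit`:

```python
import numpy as np
from scipy.optimize import curve_fit

def exponential(t, u, a, c):
    return u - a * np.exp(-c * t)      # y_t = u - a * e^(-c t)

def power(t, u, a, c):
    return u - a * t ** (-c)           # y_t = u - a * t^(-c)

# Simulated learning curve (placeholder data, not real gameplay scores)
rng = np.random.default_rng(1)
t = np.arange(1, 101, dtype=float)
y = exponential(t, u=1000, a=600, c=0.05) + rng.normal(0, 25, t.size)

for name, f in [("exponential", exponential), ("power", power)]:
    params, _ = curve_fit(f, t, y, p0=[y.max(), y.max() - y.min(), 0.1], maxfev=10000)
    resid = y - f(t, *params)
    print(name, np.round(params, 3), "RMSE:", round(float(np.sqrt(np.mean(resid ** 2))), 1))
```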