Download the source file `nonlinear_0.1.tar.gz`

and install the packages by the following commands in R:

```
download.file("http://www.sfs.uni-tuebingen.de/~jvanrij/packages/nonlinear_0.1.tar.gz", "nonlinear_0.1.tar.gz")
install.packages("nonlinear_0.1.tar.gz", type="source", repos=NULL)
```

If the package is installed succesfully, you could remove the source file (delete it).

Run the demo about autocorrelation as follows:

```
library(nonlinear)
demoAR()
```

Load packages:

```
library(mgcv)
library(itsadug)
```

Check R version and package versions:

`R.version.string`

`## [1] "R version 3.2.1 (2015-06-18)"`

`packageVersion("mgcv")`

`## [1] '1.8.6'`

`packageVersion("itsadug")`

`## [1] '1.0.2'`

If these versions do not agree with the ones included in the Lab Assignment file, please let us know.

The data discussed in this lab session is collected during two consequent courses (2014, 2015) at the CCP Spring Training in Experimental Psycholinguistics (STEP), in Edmonton, Canada. The data is collected in a noisy environment with participants who had already seen the experiment. The experiment is simplified version of an eye-tracking experiment of Vince Poretta (in prep; 2015).

Download the data to your working directory and load the data in R (see more info about loading data below)

```
download.file("http://www.sfs.uni-tuebingen.de/~jvanrij/LSA2015/Data/gazedata.rda", "countdat.rda")
load('countdat.rda')
```

Inspect the data with the follwoing two commands. Pay attention to the column `InterestAreaLabel`

.

```
head(countdat)
str(countdat)
```

In this assignment we are interested in the looks to the Target (e.g., “shop”) in comparison with the looks to the Onset Competitor (e.g., “shut”) and RhymeCompetitor (e.g., “drop”), and with the looks to a distractor (e.g., “fill”).

First, we plot the data. The data plots provide a sanity check for the GAMMs: if the pattern of the smooths are completely off (different range of values, or a completely different pattern), you may want to see what causes these differences.

Calculating the grand averages of the proportions of looks to the different areas of interest per timebin.

```
# IMPORTANT: note the spaces in the column names:
tail(colnames(countdat), 15)
```

```
## [1] "RhymecompReg" "OnsetcompFreq"
## [3] "OnsetcompReg" "DistractorFreq"
## [5] "DistractorReg" "Vowel"
## [7] "Stimtype" "Datafile"
## [9] "NativeEnlish" "StartTarget"
## [11] "InterestAreaLabel.OnsetComp_IA " "InterestAreaLabel.Target_IA "
## [13] "InterestAreaLabel.RhymeComp_IA " "InterestAreaLabel.Distract_IA "
## [15] "InterestAreaLabel.NA"
```

```
# removing spaces from column names, because these will give trouble:
colnames(countdat) <- gsub("\\ ","", colnames(countdat))
```

```
# calculating proportions:
countdat$TotalCount <- with(countdat, InterestAreaLabel.Target_IA
+ InterestAreaLabel.OnsetComp_IA
+ InterestAreaLabel.RhymeComp_IA
+ InterestAreaLabel.Distract_IA
+ InterestAreaLabel.NA )
countdat$propTarget <- with(countdat, InterestAreaLabel.Target_IA / TotalCount )
countdat$propOnset <- with(countdat, InterestAreaLabel.OnsetComp_IA / TotalCount )
countdat$propRhyme <- with(countdat, InterestAreaLabel.RhymeComp_IA / TotalCount )
countdat$propDistr <- with(countdat, InterestAreaLabel.Distract_IA / TotalCount )
```

Plot the grand averages per timebin. Use your preferred plotting functions, or use the code below.

```
# Setup empty plot window:
emptyPlot(range(countdat$Timebin), c(0,1),
h=.5,
main='Looks to interest areas', xlab='Time (ms)', ylab="Proportion")
# Distractor
avg <- aggregate(propDistr ~ Timebin, data=countdat, mean, na.rm=TRUE)
lines(avg$Timebin, avg$propDistr, col="black", lty=2)
# Onset competitor
avg <- aggregate(propOnset ~ Timebin, data=countdat, mean, na.rm=TRUE)
lines(avg$Timebin, avg$propOnset, col="indianred", lwd=2)
# Rhyme competitor
avg <- aggregate(propRhyme ~ Timebin, data=countdat, mean, na.rm=TRUE)
lines(avg$Timebin, avg$propRhyme, col="steelblue", lwd=2)
# Target
avg <- aggregate(propTarget ~ Timebin, data=countdat, mean, na.rm=TRUE)
lines(avg$Timebin, avg$propTarget, col="green", lwd=2)
```

As most people seem to have found the target before 2000 ms after word onset, we reduce the data by excluding all data points recorded after 2000 ms:

`countdat <- countdat[countdat$Timebin < 2000,]`

Create a new column that contains a matrix with looks to the target (success) and looks to the onset competitor (failures).

`countdat$TargetLooks <- cbind(countdat$InterestAreaLabel.Target_IA, countdat$InterestAreaLabel.OnsetComp_IA)`

```
head(countdat)
str(countdat)
```

cbind() data

If you are working in R Studio, the command `View(countdat)`

will show something different than `head(countdat)`

for the column `TargetLooks`

.

The cause for these differences is the use of the function `cbind`

: `cbind`

combines vectors into different columns of a matrix. Try for example:

```
test1 <- 1:10
test2 <- 2:11
mat <- cbind(test1,test2)
class(mat)
```

`## [1] "matrix"`

We have stored a matrix with two columns in the data frame `countdat`

under the name `TargetLooks`

. So each cell of the column `countdat$TargetLooks`

has two values:

`head(countdat$TargetLooks)`

```
## [,1] [,2]
## [1,] 0 0
## [2,] 14 0
## [3,] 20 0
## [4,] 20 0
## [5,] 20 0
## [6,] 20 0
```

The first column indicates the counts of successes (i.e., looks to the target), and the second column indicates the counts of failures (i.e., looks to the distractor).

Regression functions (e.g., `lm`

, `lmer`

, `bam`

) will recognize this structure as binomial data: counts of successes and failures.

As there are more than 10.000 rows in the data, we will use the function `bam()`

:

`nrow(countdat)`

`## [1] 20958`

Run a GAM model that includes the following nonlinear effects:

a main effect of

`Timebin`

a main effect of

`TrialIndex`

a

*random smooth*for subjects over`Timebin`

(include`k=5`

to reduce the running time); the subjects are indicated by the column`RecordingSessionLabel`

, but we change this first into`Subject`

(saves typing).

Because logistic models take more time to run, you can leave out other random effects (it will take too long to complete this lab session).

```
# order the data with function start_event:
countdat <- start_event(countdat, column="Timebin", event="Event")
# make RecordingSessionLabel as factor, and store it ito a column with shorter name:
countdat$Subject <- as.factor(countdat$RecordingSessionLabel)
m1 <- bam(TargetLooks ~ s(Timebin) + s(TrialIndex)
+ s(Timebin, Subject, bs='fs', m=1, k=5),
data=countdat, family="binomial")
```

```
## Warning in gam.side(sm, X, tol = .Machine$double.eps^0.5): model has
## repeated 1-d smooths of same variable.
```

Inspect the summary of the model.

`summary(m1)`

```
##
## Family: binomial
## Link function: logit
##
## Formula:
## TargetLooks ~ s(Timebin) + s(TrialIndex) + s(Timebin, Subject,
## bs = "fs", m = 1, k = 5)
##
## Parametric coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 3.4412 0.5366 6.413 1.43e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
## edf Ref.df Chi.sq p-value
## s(Timebin) 7.455 7.981 217.7 <2e-16 ***
## s(TrialIndex) 6.201 7.355 318.2 <2e-16 ***
## s(Timebin,Subject) 112.074 124.000 6470.6 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.421 Deviance explained = 22.3%
## fREML = 1.1538e+05 Scale est. = 1 n = 20958
```

- What does the intercept represent in this model? What does the intercept tell: do people look on average more to the target or to the onset competitor?

The intercept is the log odds ratio between looks to the target and looks to the onset competitor. As the intercept is far higher than 0, the intercept is higher than a proportion of 0.5, rather close to a proportion of 1. This means overall more looks to target than to onset competitor.

- What does the smooth over
`Timebin`

represent?

The partial effect of how the log odds for looks to the target versus looks to the onset competitor change over time.

More info on logits (binomial count data)

The regression model uses the counts of target and onset competitor looks to calculate a *logit*, which is the log odds ratio between looks to the target and looks to the onset competitor: \(log(\frac{N_{target.looks}}{N_{onsetcomp.looks}})\) (or following the logit definition: \(log(\frac{N_{target.looks}}{N - N_{target.looks}})\) with\(N=N_{target.looks}+N_{onsetcomp.looks}\)).

If \(N_{target.looks}\) equals \(N_{onsetcomp.looks}\), the ratio is 1, and the logit is 0 (

`log(1)=`

0).If \(N_{target.looks}\) is a larger count than the \(N_{onsetcomp.looks}\), the ratio is higher than 1, and the logit is a

*positive value*, e.g. (`log(2)=`

0.6931472).If \(N_{target.looks}\) is a smaller count than the \(N_{onsetcomp.looks}\), the ratio is lower than 1, and the logit is a

*negative value*, e.g. (`log(.5)=`

-0.6931472).

Illustration of the relation between logits and proportions:

Code for plotting the logits:

```
total <- 500
successes <- 1:total
proportions <- successes / total
logits <- log( proportions / (1-proportions))
plot(proportions, logits)
```

Back transformation of logits to proportions:

`plot(logits, plogis(logits))`

Visualizing the trend over time:

```
par(mfrow=c(1,2))
plot(m1, select=1, scale=0, shade=TRUE)
abline(h=0)
plot_smooth(m1, view="Timebin", rm.ranef=TRUE)
```

Interpretation of the plots. - Why is the shape of the smooth different from the target looks as we saw in the data plots?

In the models we compare the ratio of target looks and onset competitor looks. The earlier plot visualized the *proportions* of looks to target, onset competitor, rhyme competitor and distractor looks.

- What do these plot tell about the looks to the target versus the looks to the onset competitor?
- Left plot: partial effect. Over time the log odds increase, so over time the looks to the target increase and the looks to the onset competitor decreases.
- Right plot: summed effects, fitted values, intercept is included. Here we see that the log odds ratio is estimated around zero at the start of the trial. This means equal amount of looks to target and distractor. Gradually the looks to the target increase. Around 700 ms after sound onset, the participants look significantly more to target than to the distractor, because the smooth is significantly different from zero. Below is an illustration how we could use the plot to determine at which timebin people can distinguish between target and onset competitor:

```
out <- plot_smooth(m1, view="Timebin", rm.ranef=TRUE)
(diff <- find_difference(out$fv$fit, out$fv$CI, xVals=out$fv$Timebin))
```

```
## $start
## [1] 708.6207
##
## $end
## [1] 1950
##
## $xVals
## [1] TRUE
```

```
addInterval(0, lowVals=diff$start, highVals = diff$end, col='red', lwd=2)
abline(v=c(diff$start, diff$end), lty=3, col='red')
text(mean(c(diff$start, diff$end)), 1, "sign. more looks to target", col='red', font=3)
```

For interpretation of the results, we could tranform the summed effects back to proportions with the function `plot_smooth`

:

```
par(mfrow=c(1,1))
plot_smooth(m1, view="Timebin", rm.ranef=TRUE,
transform=plogis, ylim=c(0,1))
abline(h=.5, v=700, col='red', lty=2)
```

- When do people start to look significantly more to the target than to the onset competitor?

As we already derived from the logit plot, we see that around 700 ms after sound onset the proportion of taregt looks increases .5 . significantly.

Run a new GAMM model that includes a nonlinear interaction between `Timebin`

and `TrialIndex`

, and a *random smooth* for subjects (`Subject`

) over `Timebin`

(include `k=5`

for the random effect to reduce the running time).

```
m2 <- bam(TargetLooks ~ te(Timebin, TrialIndex)
+ s(Timebin, Subject, bs='fs', m=1, k=5),
data=countdat, family="binomial")
```

Inspect the summary of model `m2`

.

`summary(m2)`

```
##
## Family: binomial
## Link function: logit
##
## Formula:
## TargetLooks ~ te(Timebin, TrialIndex) + s(Timebin, Subject, bs = "fs",
## m = 1, k = 5)
##
## Parametric coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 3.4212 0.5741 5.959 2.53e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
## edf Ref.df Chi.sq p-value
## te(Timebin,TrialIndex) 21.82 22.62 1594 <2e-16 ***
## s(Timebin,Subject) 113.72 124.00 6511 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.415 Deviance explained = 21.1%
## fREML = 1.2794e+05 Scale est. = 1 n = 20958
```

- What does the intercept represent in this model?

The log odds ratio between looks to the target and looks to the onset competitor, and a constant value that needs to be added to the interaction surface and the random effects.

- What does it mean that the smooth term
`te(Timebin, TrialIndex)`

is significant in the summary?

The interaction surface is at some point significantly different from a flat plane through zero. However, this does not mean that the interaction itself is significant: it could be that the two main effects included in the interaction cause the surface to be significantly different from a flat zero surface.

Compare this model with model `m1`

with the function `compareML`

.

`compareML(m1,m2)`

```
## m1: TargetLooks ~ s(Timebin) + s(TrialIndex) + s(Timebin, Subject,
## bs = "fs", m = 1, k = 5)
##
## m2: TargetLooks ~ te(Timebin, TrialIndex) + s(Timebin, Subject, bs = "fs",
## m = 1, k = 5)
##
## Model m1 preferred: lower fREML score (12561.705), and lower df (1.000).
## -----
## Model Score Edf Difference Df
## 1 m2 127943.4 8
## 2 m1 115381.6 7 12561.705 -1.000
##
## AIC difference: -345.21, model m1 has lower AIC.
```

- Could you conclude that the interaction between
`Timebin`

and`TrialIndex`

is significant?

Actually the model without interaction is significant, because it’s fREML score is much lower, and its also has less degrees of freedom to model the data. The model that includes only the two main effects is preferred, so the nonlinear trend over time is changing nonlinearly for different trials, but only with a constant value.

Visualize the interaction:

```
par(mfrow=c(1,2), cex=1.1)
# plot the partial effect:
plot(m2, select=1, rug=FALSE, main="Interaction",
cex.axis=2, cex.lab=2) # parameters for increasing font size
# Now with colors:
pvisgam(m2, select=1, view=c("Timebin", "TrialIndex"), zlim=c(-5,14), main="")
```

`## [1] "Tensor(s) to be plotted: te(Timebin,TrialIndex)"`