Download the source file `nonlinear_0.1.tar.gz`

and install the packages by the following commands in R:

```
download.file("http://www.sfs.uni-tuebingen.de/~jvanrij/packages/nonlinear_0.1.tar.gz", "nonlinear_0.1.tar.gz")
install.packages("nonlinear_0.1.tar.gz", type="source", repos=NULL)
```

If the package is installed succesfully, you could remove the source file (delete it).

Run the demo about autocorrelation as follows:

```
library(nonlinear)
demoAR()
```

Load packages:

```
library(mgcv)
library(itsadug)
```

Check R version and package versions:

`R.version.string`

`## [1] "R version 3.2.1 (2015-06-18)"`

`packageVersion("mgcv")`

`## [1] '1.8.6'`

`packageVersion("itsadug")`

`## [1] '1.0.1'`

If these versions do not agree with the ones included in the Lab Assignment file, please let us know.

The data discussed in this lab session is collected during two consequent courses (2014, 2015) at the CCP Spring Training in Experimental Psycholinguistics (STEP), in Edmonton, Canada. The data is collected in a noisy environment with participants who had already seen the experiment. The experiment is simplified version of an eye-tracking experiment of Vince Poretta (in prep; 2015).

Download the data to your working directory and load the data in R (see more info about loading data below)

```
download.file("http://www.sfs.uni-tuebingen.de/~jvanrij/LSA2015/Data/gazedata.rda", "countdat.rda")
load('countdat.rda')
```

Inspect the data with the follwoing two commands. Pay attention to the column `InterestAreaLabel`

.

```
head(countdat)
str(countdat)
```

In this assignment we are interested in the looks to the Target (e.g., “shop”) in comparison with the looks to the Onset Competitor (e.g., “shut”) and RhymeCompetitor (e.g., “drop”), and with the looks to a distractor (e.g., “fill”).

First, we plot the data. The data plots provide a sanity check for the GAMMs: if the pattern of the smooths are completely off (different range of values, or a completely different pattern), you may want to see what causes these differences.

Calculating the grand averages of the proportions of looks to the different areas of interest per timebin.

```
# IMPORTANT: note the spaces in the column names:
colnames(countdat)
# removing spaces from column names, because these will give trouble:
colnames(countdat) <- gsub("\\ ","", colnames(countdat))
# calculating proportions:
countdat$TotalCount <- with(countdat, InterestAreaLabel.Target_IA
+ InterestAreaLabel.OnsetComp_IA
+ InterestAreaLabel.RhymeComp_IA
+ InterestAreaLabel.Distract_IA
+ InterestAreaLabel.NA )
countdat$propTarget <- with(countdat, InterestAreaLabel.Target_IA / TotalCount )
countdat$propOnset <- with(countdat, InterestAreaLabel.OnsetComp_IA / TotalCount )
countdat$propRhyme <- with(countdat, InterestAreaLabel.RhymeComp_IA / TotalCount )
countdat$propDistr <- with(countdat, InterestAreaLabel.Distract_IA / TotalCount )
```

Plot the grand averages per timebin. Use your preferred plotting functions, or use the code below.

```
# Setup empty plot window:
emptyPlot(range(countdat$Timebin), c(0,1),
h=.5,
main='Looks to interest areas', xlab='Time (ms)', ylab="Proportion")
# Distractor
avg <- aggregate(propDistr ~ Timebin, data=countdat, mean, na.rm=TRUE)
lines(avg$Timebin, avg$propDistr, col="black", lty=2)
# Onset competitor
avg <- aggregate(propOnset ~ Timebin, data=countdat, mean, na.rm=TRUE)
lines(avg$Timebin, avg$propOnset, col="indianred", lwd=2)
# Rhyme competitor
avg <- aggregate(propRhyme ~ Timebin, data=countdat, mean, na.rm=TRUE)
lines(avg$Timebin, avg$propRhyme, col="steelblue", lwd=2)
# Target
avg <- aggregate(propTarget ~ Timebin, data=countdat, mean, na.rm=TRUE)
lines(avg$Timebin, avg$propTarget, col="green", lwd=2)
```

As most people seem to have found the target before 2000 ms after word onset, we reduce the data by excluding all data points recorded after 2000 ms:

`countdat <- countdat[countdat$Timebin < 2000,]`

Create a new column that contains a matrix with looks to the target (success) and looks to the onset competitor (failures).

```
countdat$TargetLooks <- cbind(countdat$InterestAreaLabel.Target_IA, countdat$InterestAreaLabel.OnsetComp_IA)
head(countdat)
str(countdat)
```

cbind() data

If you are working in R Studio, the command `View(countdat)`

will show something different than `head(countdat)`

for the column `TargetLooks`

.

The cause for these differences is the use of the function `cbind`

: `cbind`

combines vectors into different columns of a matrix. Try for example:

```
test1 <- 1:10
test2 <- 2:11
mat <- cbind(test1,test2)
class(mat)
```

We have stored a matrix with two columns in the data frame `countdat`

under the name `TargetLooks`

. So each cell of the column `countdat$TargetLooks`

has two values:

`head(countdat$TargetLooks)`

The first column indicates the counts of successes (i.e., looks to the target), and the second column indicates the counts of failures (i.e., looks to the distractor).

Regression functions (e.g., `lm`

, `lmer`

, `bam`

) will recognize this structure as binomial data: counts of successes and failures.

As there are more than 10.000 rows in the data, we will use the function `bam()`

:

`nrow(countdat)`

Run a GAM model that includes the following nonlinear effects:

a main effect of

`Timebin`

a main effect of

`TrialIndex`

a

*random smooth*for subjects over`Timebin`

(include`k=5`

to reduce the running time); the subjects are indicated by the column`RecordingSessionLabel`

, but we change this first into`Subject`

(saves typing).

Because logistic models take more time to run, you can leave out other random effects (it will take too long to complete this lab session).

```
# first order the data:
countdat <- start_event(countdat, column="Timebin", event="Event")
# make RecordingSessionLabel as factor, and store it ito a column with shorter name:
countdat$Subject <- as.factor(countdat$RecordingSessionLabel)
m1 <- bam(TargetLooks ~ ... ,
data=countdat, family="binomial")
```

Inspect the summary of the model.

What does the intercept represent in this model? What does the intercept tell: do people look on average more to the target or to the onset competitor?

What does the smooth over

`Timebin`

represent?

Visualizing the trend over time:

```
par(mfrow=c(1,2))
plot(m1, select=1, scale=0, shade=TRUE)
abline(h=0)
plot_smooth(m1, view="Timebin", rm.ranef=TRUE)
```

Interpretation of the plots. - Why is the shape of the smooth different from the target looks as we saw in the data plots?

- What do these plot tell about the looks to the target versus the looks to the onset competitor?

For interpretation of the results, we could tranform the summed effects back to proportions with the function `plot_smooth`

:

```
par(mfrow=c(1,1))
plot_smooth(m1, view="Timebin", rm.ranef=TRUE,
transform=plogis, ylim=c(0,1))
```

- When do people start to look significantly more to the target than to the onset competitor?

Run a new GAMM model that includes a nonlinear interaction between `Timebin`

and `TrialIndex`

, and a *random smooth* for subjects (`Subject`

) over `Timebin`

(include `k=5`

for the random effect to reduce the running time).

```
m2 <- bam(TargetLooks ~ ... ,
data=countdat, family="binomial")
```

Inspect the summary of model `m2`

.

What does the intercept represent in this model?

What does it mean that the smooth term

`te(Timebin, TrialIndex)`

is significant in the summary?

Compare this model with model `m1`

with the function `compareML`

.

- Could you conclude that the interaction between
`Timebin`

and`TrialIndex`

is significant?

Visualize the interaction:

```
par(mfrow=c(1,2), cex=1.1)
# plot the partial effect:
plot(m2, rug=FALSE, main="Interaction",
cex.axis=2, cex.lab=2) # parameters for increasing font size
# Now with colors:
pvisgam(m2, select=1, view=c("Timebin", "TrialIndex"), zlim=c(-4,14), main="")
```

Which of the two predictor seems to have a stronger effect on the gaze pattern:

`Timebin`

or`TrialIndex`

?How would the

*summed*effect plot look like? What would change?- How do the random effects change the interaction surface?

Plot the summed effects (note: `pvisgam`

is for **P**artial effects, `fvisgam`

for **F**itted values):

`fvisgam(m2, view=c("Timebin", "TrialIndex"), zlim=c(-2,12), main="Interaction", rm.ranef=TRUE)`

- What do the negative and the positive values in the surface represent?

To plot the summed effects in proportion space, add `transform=plogis`

:

```
fvisgam(m2, view=c("Timebin", "TrialIndex"), zlim=c(0,1),
main="Interaction", rm.ranef=TRUE, transform=plogis)
```

To test if native English participants differ from nonnative English participants, we would like to include the factor `NativeEnglish`

.

Run a new GAMM model that splits the nonlinear interaction between `Timebin`

and `TrialIndex`

by `NativeEnglish`

. Don’t forget to include an intercept term (parametric) for `NativeEnglish`

. In addition add a *random smooth* for subjects (`Subject`

) over `Timebin`

(include `k=5`

for the random effect to reduce the running time).

```
countdat$NativeEnglish <- as.factor(countdat$NativeEnglish)
m3 <- bam(TargetLooks ~ ... ,
data=countdat, family="binomial")
```

Inspect the summary of model `m3`

.

What does the summary reveal about the interaction between

`Timebin`

and`Trial`

for the two groups?Can you conclude that the two surfaces are different from each other?

Run a model comparison using the function `compareML`

for the models `m2`

and `m3`

.

Which model is preferred?

- What does this result indicate about the difference between native and nonnative English participants?

Plot the two surfaces and the difference between the two surfaces:

```
par(mfrow=c(1,3))
fvisgam(m3, view=c("Timebin","TrialIndex"),
cond=list(NativeEnglish="Native"),
main="Native", rm.ranef=TRUE, zlim=c(-3,20))
fvisgam(m3, view=c("Timebin","TrialIndex"),
cond=list(NativeEnglish="Nonnative"),
main="Nonnative", rm.ranef=TRUE, zlim=c(-3,20))
plot_diff2(m3, view=c("Timebin","TrialIndex"), plotCI=TRUE,
comp=list(NativeEnglish=c("Native","Nonnative")),
main="Difference", rm.ranef=TRUE, zlim=c(-2,8))
```

Do these plots support your earlier conclusion about the difference between native and nonnative English participants?

Decompose the nonlinear interaction between `Timebin`

and `Time`

, as included in model `m2`

in two main effects and an interaction surface (hint: use `ti()`

). Also include a *random smooth* for subjects (`Subject`

) over `Timebin`

(include `k=5`

for the random effect to reduce the running time).

```
m4 <- bam(TargetLooks ~ ... ,
data=countdat, family="binomial")
```

Inspect the summary of model `m4`

.

What does the interaction term represent?

- What could you conclude on the basis of this summary with respect to the interaction between
`Timebin`

and`TrialIndex`

?

Plot the partial effect of the interaction in model `m4`

and compare these with the plots of model `m2`

.

```
par(mfrow=c(1,2))
pvisgam(m4, view=c("Timebin","TrialIndex"),
select=3,
main="model m4",
zlim=c(-2,2))
pvisgam(m2, view=c("Timebin","TrialIndex"),
select=1,
main="model m2",
zlim=c(-8,8))
```

- Why are these two surfaces so different? Note that the zlimit ranges are different, but also the pattern.

Step by step add interactions with `NativeEnglish`

to model `m4`

to see if the main effects and interaction are influenced by native language. Use model comparisons to determine the best fitting model.

Hand in your best-fitting model. Is there a difference in native and nonnative speakers with respect to looks to the target?

Plot the distribution and the ACF plot of the final model, as indicated below. Do you trust the model, or did you find something that needs to improve? (If so, what did you find?) Do you expect the residuals to follow a normal distribution?

```
par(mfrow=c(1,2))
# check distribution:
qqnorm(resid(m4))
qqline(resid(m4))
# check autocorrelation:
acf(resid(m4))
```

What is the difference between

`te()`

and`ti()`

? Name an advantage and a disadvantage of using`ti()`

.We would like to compare how the Target and the Rhyme competitor are different from each other in the following analysis. Which counts do we need to combine with

`cbind`

for analyzing this difference? (For this assignment you do not have to perform that analysis.)

Hand in the R code, R output, the plots, and the answers on the questions in the section ‘d. Model criticism’.

Deadline: before next class.

Format: html document (R Markdown)

An easy way to include everything in a single document is by using R markdown.

Quick instructions for R Markdown in R Studio: www.sfs.uni-tuebingen.de/~jvanrij/Rinfo.html)

When using *NOT* using R Studio, try the following steps:

```
download.file('http://www.let.rug.nl/~wieling/statscourse/lecture4/lab/answers/lab-including-answers.Rmd', 'analysis.Rmd')
library(rmarkdown)
render('analysis.Rmd') # generates html file with results
browseURL(paste('file://', file.path(getwd(),'analysis.html'),
sep='')) # shows result
```

Alternatively you could hand in a pdf that contains all the information.

The assignments can be send by mail to vanrij.jacolien ‘AT’ gmail.com, with “LSA lab 2” in the subject line.Run the demo about nonlinear three-way interactions ( `te("Longitude", "Latitude", "WordFrequency")`

) as follows:

```
library(nonlinear)
demoInteraction()
```

Note that the data for this lab session is saved in a different format as the data of previous lab session. Here are some examples for loading frequently used data types in R:

`.Rdata`

or`.rda`

files contain one or more R objects with names.

```
# load some (not existing) file in R:
load('file.rda')
# use ls() to see which objects are loaded in your workspace:
ls()
```

`.rds`

files contain one R object without the name. Therefore, when read in R the object must be stored into a variable.

```
# load some (not existing) file in R:
newdat <- readRDS('file.rds')
# use str() to see what object is loaded:
str(newdat)
```

- text files

```
# for tab delimited text with header (non-existing file):
dat <- read.table('file.txt', header=TRUE, sep='\t', dec='.')
# for cvs file, exported from MS Excel:
dat <- read.table('file.cvs', header=TRUE, sep=',', dec='.')
# see help file for other options to set:
help(read.table)
```