---
title: "Vignette 1: Example analysis with GSPCR"
output:
  rmarkdown::html_vignette:
    css: github-markdown.css
    toc: true
    number_sections: true
vignette: >
  %\VignetteIndexEntry{Vignette 1: Example analysis with GSPCR}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

Supervised principal component regression (SPCR) regresses a dependent variable onto a few supervised principal components (PCs) computed from a large set of predictors.
The **steps** followed by SPCR are the following:

1. Regress the dependent variable onto each column of a data set of *p* possible predictors via *p* simple linear regressions. This results in *p* bivariate association measures.
2. Define a subset of the original *p* variables by discarding all variables whose bivariate association measure is below a chosen threshold.
3. Use the subset of the original data to estimate *q* PCs.
4. Regress the dependent variable onto the *q* PCs.

A key aspect of the method is that both the number of PCs and the threshold value used in step 2 can be determined by cross-validation.
GSPCR **extends** SPCR by allowing the dependent variable to be of any measurement level (i.e., ratio, interval, ordinal, nominal) by introducing likelihood-based association measures (or threshold types) in step 1.
Furthermore, GSPCR allows the predictors to be of any type by combining the PCAmix framework (Kiers, 1991; Chavent et al., 2017) with SPCR in step 3.

The `gspcr` R package allows you to:

- **tune** the threshold value and the number of PCs in a GSPCR model;
- **plot** the cross-validation trends used to tune the threshold value and the number of PCs to compute;
- **estimate** the GSPCR model on a data set;
- **predict** observations for both the training data and new, previously unseen data.

Before we do anything else, let us **load the packages** we will need for this vignette.
If you don't have these packages, please install them using `install.packages()`.

```r
# Load R packages
library(gspcr)
```

# Parameter tuning

We start this vignette by estimating a GSPCR model in a very simple scenario with a continuous dependent variable and a set of continuous predictors.
First, we store the **example dataset** `GSPCRexdata` (see the help file `?GSPCRexdata` for details) in two separate objects:

```r
# Store the continuous predictors and the continuous dependent variable
X <- GSPCRexdata$X$cont
y <- GSPCRexdata$y$cont
```

Then, we randomly select a **subset of the data** to use as a training set.
We use 90% of the data as training data.

```r
# Set a seed
set.seed(20230415)

# Sample a subset of the data
train <- sample(x = 1:nrow(X), size = nrow(X) * .9)
```

Now we are ready to **use the** `cv_gspcr()` **function** to cross-validate the threshold value and the number of PCs to be used.

```r
# Train the GSPCR model
out <- cv_gspcr(
  dv = y[train],
  ivs = X[train, ]
)
```

We can then **extract** the cross-validated **solutions** from the resulting object.

```r
# Extract solutions
out$sol_table
```

```
         thr_value thr_number Q
standard -1268.236          2 1
oneSE    -1268.236          2 1
```

# Graphical output

We can visually **examine the solution paths** produced by the cross-validation procedure by using the `plot()` function.

```r
# Plot the solution paths
plot(out)
```
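
# Supplementary sketch of the SPCR steps

To make the four SPCR steps listed at the start of this vignette more concrete, the following is a minimal sketch in base R. It is not the `gspcr` implementation. It assumes simulated data, uses the R-squared of each simple regression as the bivariate association measure, and fixes the threshold (`0.05`) and the number of PCs (`q = 1`) by hand instead of choosing them by cross-validation. All object names (`X_sim`, `y_sim`, `r2`, `keep`, and so on) are hypothetical and chosen only for illustration.

```r
# Illustrative base R sketch of the four SPCR steps.
# Assumptions: simulated data, R-squared as association measure,
# hand-picked threshold and number of PCs (no cross-validation).
set.seed(1)

# Simulated data: 100 observations, 20 predictors, the first 5 related to y
n <- 100
p <- 20
X_sim <- matrix(rnorm(n * p), nrow = n, ncol = p)
colnames(X_sim) <- paste0("x", seq_len(p))
y_sim <- as.numeric(X_sim[, 1:5] %*% rep(1, 5) + rnorm(n))

# Step 1: p simple linear regressions, one association measure per predictor
r2 <- apply(X_sim, 2, function(x) summary(lm(y_sim ~ x))$r.squared)

# Step 2: discard predictors whose association measure is below the threshold
threshold <- 0.05
keep <- r2 >= threshold

# Step 3: estimate q PCs from the retained predictors
q <- 1
pca <- prcomp(X_sim[, keep, drop = FALSE], center = TRUE, scale. = TRUE)
pcs <- pca$x[, seq_len(q), drop = FALSE]

# Step 4: regress the dependent variable onto the q PCs
fit <- lm(y_sim ~ pcs)
summary(fit)$r.squared
```

In practice, `cv_gspcr()` takes care of choosing the threshold value and the number of PCs, so a sketch like this is only useful for building intuition about what is being tuned.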