14 oct 2018

CARET: Splitting Based on the Predictors

I'm practicing with caret, and the best way to do that is to follow the tutorials on its web page. This time the topic is how to split the data with caret:

4.2: Splitting Based on the Predictors

Read it and try to understand the concept.
I tried to write the code that reproduces the plot in the tutorial's figure, and in the end I more or less managed it:

library(caret)    # maxDissim()
library(mlbench)  # BostonHousing data set
data(BostonHousing)

## Centre and scale the two predictors used in the plot
testing <- scale(BostonHousing[, c("age", "nox")])
set.seed(5)
## A random sample of 5 data points
startSet <- sample(1:dim(testing)[1], 5)
samplePool <- testing[-startSet, ]
start <- testing[startSet, ]
## Row indices (within samplePool) of the 20 most dissimilar points
newSamp <- maxDissim(start, samplePool, n = 20)
newSamp <- samplePool[newSamp, ]
## Label the new samples 1..20 in the order they were chosen
rownames(newSamp) <- 1:20

## Remaining pool in grey
plot(samplePool[, 1], samplePool[, 2], pch = 20,
     col = "grey", xlim = c(-2.500, 1.400),
     ylim = c(-1.600, 2.900), xlab = "age", ylab = "nox")
## Starting set marked with a red "S"
points(start[, 1], start[, 2], pch = "S",
       col = "red", cex = 1.3, font = 2)
## New samples in blue, labelled with their selection order
points(newSamp[, 1], newSamp[, 2], col = "blue")
text(newSamp[, 1], newSamp[, 2],
     labels = rownames(newSamp), cex = 1.3,
     font = 2)

The chosen samples differ from those in the tutorial's figure because of the random order. Notice how the chosen samples spread out to cover the structure of the data.
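To make the concept a bit more concrete, here is a minimal base R sketch of a greedy maximum-dissimilarity selection. It is only an illustration and not caret's own implementation: the helper name greedy_max_dissim is hypothetical, and it assumes Euclidean distance and scores each candidate by its minimum distance to the points already selected.

```r
# Illustrative sketch only (not caret's code): greedy_max_dissim is a
# hypothetical helper. At each step it adds the pool point whose minimum
# Euclidean distance to the current set is largest.
greedy_max_dissim <- function(start, pool, n) {
  chosen <- integer(0)   # row indices of pool picked so far
  current <- start       # points already in the set
  for (i in seq_len(n)) {
    candidates <- setdiff(seq_len(nrow(pool)), chosen)
    # For each candidate, the distance to its nearest point in the current set
    min_dist <- vapply(candidates, function(j) {
      diffs <- sweep(current, 2, pool[j, ])
      min(sqrt(rowSums(diffs^2)))
    }, numeric(1))
    best <- candidates[which.max(min_dist)]
    chosen <- c(chosen, best)
    current <- rbind(current, pool[best, , drop = FALSE])
  }
  chosen
}

# Small usage example on simulated data (assumed, not from the tutorial)
set.seed(1)
pts <- matrix(rnorm(200), ncol = 2)
start_idx <- sample(nrow(pts), 5)
picked <- greedy_max_dissim(pts[start_idx, , drop = FALSE],
                            pts[-start_idx, , drop = FALSE], n = 10)
```

Because each new point is the one farthest from everything selected so far, the picks tend to land on the edges and in the empty regions first, which is why they end up covering the shape of the data.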
