I'm practicing with caret, and the best way to do it is to follow the tutorials on its website. This time the topic is how to split data with caret:
4.2: Splitting Based on the Predictors
Read and try to understand the concept.
I tried to write the code that reproduces the plot in that section's figure, and in the end I more or less managed it:
library(caret)    # maxDissim()
library(mlbench)  # BostonHousing data set
data(BostonHousing)

## Standardize the two predictors used in the figure
testing <- scale(BostonHousing[, c("age", "nox")])
set.seed(5)
## A random sample of 5 data points as the starting set
startSet <- sample(seq_len(nrow(testing)), 5)
samplePool <- testing[-startSet, ]
start <- testing[startSet, ]
## Row indices (in samplePool) of the 20 most dissimilar points, in selection order
newSamp <- maxDissim(start, samplePool, n = 20)
newSamp <- samplePool[newSamp, ]
## Label the selected points 1..20 by the order in which they were chosen
rownames(newSamp) <- 1:20

## Pool in grey, starting points as red "S", selected points in blue with their order
plot(samplePool[, 1], samplePool[, 2], pch = 20, col = "grey",
     xlim = c(-2.5, 1.4), ylim = c(-1.6, 2.9),
     xlab = "age", ylab = "nox")
points(start[, 1], start[, 2], pch = "S", col = "red", cex = 1.3, font = 2)
points(newSamp[, 1], newSamp[, 2], col = "blue")
text(newSamp[, 1], newSamp[, 2], labels = rownames(newSamp),
     cex = 1.3, font = 2)
The selected samples differ from the ones in the tutorial's figure because the starting set is drawn at random. Notice how the chosen samples spread out to cover the structure of the data.
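As I understand caret's documentation, maxDissim works greedily: with its default objective (obj = minDiss), each candidate in the pool is scored by its distance to the nearest point already selected, and the highest-scoring candidate is added to the set, repeating until n points are chosen. That rule is why the new points push away from each other and from the starting set. The base R function below is only a sketch of that idea under my own assumptions (max_min_select is a hypothetical name, Euclidean distance is assumed, and caret's randomFrac option is ignored); it is not caret's implementation.

```r
# Hypothetical sketch of greedy max-min dissimilarity selection, the idea
# behind caret::maxDissim with obj = minDiss. Not caret's actual code.
# start: matrix of already-selected points; pool: matrix of candidates.
# Returns the row indices of pool in the order they were selected.
max_min_select <- function(start, pool, n) {
  stopifnot(n <= nrow(pool))
  chosen <- integer(0)
  current <- start
  for (step in seq_len(n)) {
    candidates <- setdiff(seq_len(nrow(pool)), chosen)
    # Score each candidate by its Euclidean distance to the nearest selected point
    scores <- vapply(candidates, function(j) {
      min(sqrt(colSums((t(current) - pool[j, ])^2)))
    }, numeric(1))
    # Add the candidate that is farthest from everything selected so far
    best <- candidates[which.max(scores)]
    chosen <- c(chosen, best)
    current <- rbind(current, pool[best, , drop = FALSE])
  }
  chosen
}

# Toy example: one starting point at the origin, three candidates on a line
start <- matrix(c(0, 0), nrow = 1)
pool <- rbind(c(1, 0), c(5, 0), c(2, 0))
max_min_select(start, pool, n = 2)
#> [1] 2 3
```

In the toy example, (5, 0) is picked first because it is farthest from the origin; after that, (2, 0) wins over (1, 0) because its nearest selected neighbour is 2 units away instead of 1. With the objects from the code above, max_min_select(start, samplePool, 20) should behave like maxDissim(start, samplePool, n = 20), although I have not checked that both return exactly the same indices.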