upaksta’s blog

Notes to myself, written down before I forget.

Standardizing data in R

How to standardize the data in a vector, a matrix, and a data frame.

① Vector data

> x <- c(1,2,3,4,5)

> zx <- (x - mean(x))/sd(x)
> zx
[1] -1.2649111 -0.6324555  0.0000000  0.6324555  1.2649111

The scale() function does the same thing:

> scale(x)
           [,1]
[1,] -1.2649111
[2,] -0.6324555
[3,]  0.0000000
[4,]  0.6324555
[5,]  1.2649111
attr(,"scaled:center")
[1] 3
attr(,"scaled:scale")
[1] 1.581139
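As a quick check of the two approaches above, the following sketch compares the hand-computed z-scores with the result of scale(). The variable name sx is only an illustrative choice and does not appear in the notes above. Note that sd() uses the sample standard deviation (n - 1 in the denominator), and scale() returns a one-column matrix with attributes, so the sketch converts it with as.vector() before comparing.

```r
x <- c(1, 2, 3, 4, 5)

# Manual standardization: subtract the mean, divide by the sample SD
zx <- (x - mean(x)) / sd(x)

# scale() returns a 5 x 1 matrix carrying "scaled:center" and "scaled:scale"
sx <- scale(x)

# Drop the matrix shape and attributes before comparing the values
same <- all.equal(zx, as.vector(sx))
print(same)

# A standardized vector should have mean 0 (up to rounding) and SD 1
print(mean(zx))
print(sd(zx))

# The attributes hold the mean and SD that were used
print(attr(sx, "scaled:center"))
print(attr(sx, "scaled:scale"))
```

If a plain vector is wanted rather than a matrix, wrapping the call as as.vector(scale(x)) is one simple option.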

② Matrix data

> m <- matrix(1:9,3,3)
> m
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9

> scale(m)
     [,1] [,2] [,3]
[1,]   -1   -1   -1
[2,]    0    0    0
[3,]    1    1    1
attr(,"scaled:center")
[1] 2 5 8
attr(,"scaled:scale")
[1] 1 1 1

For comparison, standardize each column of m as a separate vector:

> m1 <- c(1,2,3)
> m2 <- c(4,5,6)
> m3 <- c(7,8,9)

> scale(m1)
     [,1]
[1,]   -1
[2,]    0
[3,]    1
attr(,"scaled:center")
[1] 2
attr(,"scaled:scale")
[1] 1

> scale(m2)
     [,1]
[1,]   -1
[2,]    0
[3,]    1
attr(,"scaled:center")
[1] 5
attr(,"scaled:scale")
[1] 1

> scale(m3)
     [,1]
[1,]   -1
[2,]    0
[3,]    1
attr(,"scaled:center")
[1] 8
attr(,"scaled:scale")
[1] 1
→ Each result matches the corresponding column of scale(m), so scale() standardizes a matrix column by column.
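The same column-by-column comparison can be done in one step instead of three. The sketch below uses apply() over the columns of m; the names by_column and same are only illustrative. check.attributes = FALSE is passed to all.equal() because the scale() result carries the extra "scaled:center" and "scaled:scale" attributes that the apply() result lacks.

```r
m <- matrix(1:9, 3, 3)

# Standardize every column by hand: MARGIN = 2 means "apply over columns"
by_column <- apply(m, 2, function(col) (col - mean(col)) / sd(col))

# Compare the values only, ignoring the attributes that scale() adds
same <- all.equal(by_column, scale(m), check.attributes = FALSE)
print(same)

# The column means and SDs that scale() used
print(attr(scale(m), "scaled:center"))
print(attr(scale(m), "scaled:scale"))
```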

③ Standardizing a data frame

> zmtcars <- scale(mtcars)
> zmtcars   # the standardized values are omitted here because there are many of them

attr(,"scaled:center")
       mpg        cyl       disp         hp       drat         wt       qsec         vs
 20.090625   6.187500 230.721875 146.687500   3.596563   3.217250  17.848750   0.437500
        am       gear       carb
  0.406250   3.687500   2.812500
attr(,"scaled:scale")
        mpg         cyl        disp          hp        drat          wt        qsec          vs
  6.0269481   1.7859216 123.9386938  68.5628685   0.5346787   0.9784574   1.7869432   0.5040161
         am        gear        carb
  0.4989909   0.7378041   1.6152000

→ As with a matrix, each column is standardized separately.
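One thing worth keeping in mind is that scale() hands back a matrix, not a data frame, even when it is given a data frame. The sketch below shows one way to get a data frame again and to confirm that every column ends up with mean 0 and SD 1. The names zdf and mpg_by_hand are only illustrative and are not part of the notes above.

```r
zmtcars <- scale(mtcars)

# The result is a numeric matrix, not a data frame
print(is.matrix(zmtcars))
print(is.data.frame(zmtcars))

# Convert back if a data frame is needed for later steps
zdf <- as.data.frame(zmtcars)

# Every column should now have mean 0 (up to rounding) and SD 1
print(round(colMeans(zdf), 10))
print(sapply(zdf, sd))

# Spot check one column against the manual formula
mpg_by_hand <- (mtcars$mpg - mean(mtcars$mpg)) / sd(mtcars$mpg)
print(all.equal(unname(zmtcars[, "mpg"]), mpg_by_hand))
```

This works on mtcars because all of its columns are numeric; a data frame with factor or character columns would need those columns set aside first.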