This function applies the equal-count algorithm to divide a set of observations into intervals that may overlap to a given degree. It calls `lattice::equal.count` and extends its output.
equal_count(df, vble, n_int = 6, frac = 0.5)
Argument | Description |
---|---|
df | data frame |
vble | numeric variable to be analyzed |
n_int | number of intervals (default 6) |
frac | fraction of overlap between consecutive intervals (default 0.5) |
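As a rough illustration of what the arguments control, the sketch below calls `lattice::equal.count` directly. It assumes that `n_int` and `frac` are passed on as `number` and `overlap`; that mapping is inferred from the matching defaults and is not confirmed by this page. The name `bounds` is only illustrative.

```r
library(lattice)

# Assumed mapping: n_int -> number, frac -> overlap
sh <- equal.count(iris$Sepal.Length, number = 6, overlap = 0.5)

# The levels of a shingle are a list of c(lower, upper) pairs, one per interval
bounds <- do.call(rbind, levels(sh))
colnames(bounds) <- c("lower", "upper")
bounds
```

Each row of `bounds` is one interval. A larger `overlap` makes consecutive intervals share more observations.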
A list with two elements:
`intervals`: a tibble in which each row refers to one of the generated intervals, giving its lower and upper limits, the number of values in it, and the number of values it shares with the next interval.
`df_long`: a tibble in long format in which each observation appears once for every interval it belongs to, with an identifier of the observation (`id`, its position in the original data frame) and an identifier of the interval (`interval`).
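The two elements described above could be derived from a shingle along the following lines. This is a minimal base-R sketch, not the package's actual implementation: it assumes closed interval ends and plain data frames instead of tibbles, and the helper names (`member`, `hits`) are hypothetical.

```r
library(lattice)

x <- iris$Sepal.Length
sh <- equal.count(x, number = 15, overlap = 0.3)

# One row per interval: column 1 is the lower limit, column 2 the upper limit
bounds <- do.call(rbind, levels(sh))
lower <- bounds[, 1]
upper <- bounds[, 2]
k <- nrow(bounds)

# member[i, j] is TRUE when observation i lies in interval j (closed ends assumed)
member <- outer(seq_along(x), seq_len(k),
                function(i, j) x[i] >= lower[j] & x[i] <= upper[j])

# Interval summary: size of each interval and values shared with the next one
intervals <- data.frame(
  n = seq_len(k),
  lower = lower,
  upper = upper,
  count = colSums(member),
  overlap = c(colSums(member[, -k, drop = FALSE] & member[, -1, drop = FALSE]), NA)
)

# Long format: one row per (observation, interval) membership
hits <- which(member, arr.ind = TRUE)
hits <- hits[order(hits[, 1], hits[, 2]), , drop = FALSE]
df_long <- cbind(iris[hits[, 1], ], id = hits[, 1], interval = factor(hits[, 2]))
rownames(df_long) <- NULL
```

Because every membership becomes one row, `nrow(df_long)` equals `sum(intervals$count)`, and the last interval has no successor, so its `overlap` is `NA`.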
equal_count(iris, Sepal.Length, 15, 0.3)
#> $intervals
#> # A tibble: 15 × 5
#>        n lower upper count   overlap
#>    <int> <dbl> <dbl> <table>   <int>
#>  1     1  4.25  4.85 16            7
#>  2     2  4.65  5.05 23           16
#>  3     3  4.85  5.15 25           19
#>  4     4  4.95  5.25 23           13
#>  5     5  5.05  5.55 27           13
#>  6     6  5.35  5.65 19           13
#>  7     7  5.45  5.75 21            8
#>  8     8  5.65  5.95 18           10
#>  9     9  5.75  6.15 22           12
#> 10    10  5.95  6.35 25           13
#> 11    11  6.15  6.45 20           16
#> 12    12  6.25  6.65 23            7
#> 13    13  6.45  6.85 18           11
#> 14    14  6.65  7.25 20            9
#> 15    15  6.85  7.95 17           NA
#>
#> $df_long
#> # A tibble: 317 × 7
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width Species    id interval
#>           <dbl>       <dbl>        <dbl>       <dbl> <fct>   <int> <fct>
#>  1          5.1         3.5          1.4         0.2 setosa      1 3
#>  2          5.1         3.5          1.4         0.2 setosa      1 4
#>  3          5.1         3.5          1.4         0.2 setosa      1 5
#>  4          4.9         3            1.4         0.2 setosa      2 2
#>  5          4.9         3            1.4         0.2 setosa      2 3
#>  6          4.7         3.2          1.3         0.2 setosa      3 1
#>  7          4.7         3.2          1.3         0.2 setosa      3 2
#>  8          4.6         3.1          1.5         0.2 setosa      4 1
#>  9          5           3.6          1.4         0.2 setosa      5 2
#> 10          5           3.6          1.4         0.2 setosa      5 3
#> # … with 307 more rows
#>