Random Planted Forest
Usage
rpf(x, ...)
# S3 method for class 'data.frame'
rpf(
x,
y,
max_interaction = 1,
ntrees = 50,
splits = 30,
split_try = 10,
t_try = 0.4,
deterministic = FALSE,
nthreads = 1,
purify = FALSE,
cv = FALSE,
loss = "L2",
delta = 0,
epsilon = 0.1,
...
)
# S3 method for class 'matrix'
rpf(
x,
y,
max_interaction = 1,
ntrees = 50,
splits = 30,
split_try = 10,
t_try = 0.4,
deterministic = FALSE,
nthreads = 1,
purify = FALSE,
cv = FALSE,
loss = "L2",
delta = 0,
epsilon = 0.1,
...
)
# S3 method for class 'formula'
rpf(
formula,
data,
max_interaction = 1,
ntrees = 50,
splits = 30,
split_try = 10,
t_try = 0.4,
deterministic = FALSE,
nthreads = 1,
purify = FALSE,
cv = FALSE,
loss = "L2",
delta = 0,
epsilon = 0.1,
...
)
# S3 method for class 'recipe'
rpf(
x,
data,
max_interaction = 1,
ntrees = 50,
splits = 30,
split_try = 10,
t_try = 0.4,
deterministic = FALSE,
nthreads = 1,
purify = FALSE,
cv = FALSE,
loss = "L2",
delta = 0,
epsilon = 0.1,
...
)
Arguments
- x, data
Feature matrix, or data.frame, or recipe.
- ...
(Unused).
- y
Target vector for use with x. The class of y (either numeric or factor) determines whether regression or classification is performed.
- max_interaction
[1]: Maximum level of interaction, which determines the maximum number of split dimensions for a tree. The default 1 corresponds to main effects only. If 0, the number of columns in x is used, i.e. for 10 predictors, this is equivalent to setting max_interaction = 10.
- ntrees
[50]: Number of trees generated per family.
- splits
[30]: Number of splits performed for each tree family.
- split_try
[10]: Number of split points to be considered when choosing a split candidate.
- t_try
[0.4]: A value in (0,1] specifying the proportion of viable split candidates in each round.
- deterministic
[FALSE]: Whether the approach is deterministic or random.
- nthreads
[1L]: Number of threads used for computation, defaulting to serial execution.
- purify
[FALSE]: Whether the forest should be purified. Set to TRUE so that components extracted with predict_components() are valid. Purification can also be performed after fitting with purify().
- cv
[FALSE]: Determines if cross validation is performed.
- loss
["L2"]: For regression, only "L2" is supported. For classification, "L1", "logit" and "exponential" are also available. "exponential" yields similar results to "logit" while being significantly faster.
- delta
[0]: Only used if loss is "logit" or "exponential". Proportion of class membership is truncated to be smaller than 1-delta when calculating the loss to determine the optimal split.
- epsilon
[0.1]: Only used if loss is "logit" or "exponential". Proportion of class membership is truncated to be smaller than 1-epsilon when calculating the fit in a leaf.
- formula
Formula specification, e.g. y ~ x1 + x2.
Examples
# Regression with x and y
rpfit <- rpf(x = mtcars[, c("cyl", "wt")], y = mtcars$mpg)
# Regression with formula
rpfit <- rpf(mpg ~ cyl + wt, data = mtcars)
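
The arguments above also cover classification and purification, which the two regression calls do not exercise. The following is a minimal sketch rather than an official example: it assumes the randomPlantedForest package is installed, and the choice of mtcars$am as a binary target and of hp as a third predictor is purely illustrative.

```r
# Assumption: the randomPlantedForest package is installed and provides rpf() and purify().
library(randomPlantedForest)

# Classification: passing a factor as y switches rpf() to classification.
# loss = "logit" is one of the classification losses listed under Arguments.
rpfit_cls <- rpf(
  x = mtcars[, c("cyl", "wt")],
  y = factor(mtcars$am),  # illustrative binary target
  loss = "logit"
)

# Regression allowing pairwise interactions, purified at fit time
rpfit_int <- rpf(
  mpg ~ cyl + wt + hp,
  data = mtcars,
  max_interaction = 2,
  purify = TRUE
)

# Alternatively, fit first and purify afterwards
rpfit <- rpf(mpg ~ cyl + wt, data = mtcars)
purify(rpfit)
```

Purifying at fit time or afterwards should be equivalent as far as the argument description goes; doing it afterwards may be convenient when purification is only needed for some of the fitted models.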