Random Planted Forest

Usage

rpf(x, ...)

# S3 method for data.frame
rpf(
  x,
  y,
  max_interaction = 1,
  ntrees = 50,
  splits = 30,
  split_try = 10,
  t_try = 0.4,
  deterministic = FALSE,
  nthreads = 1,
  purify = FALSE,
  cv = FALSE,
  loss = "L2",
  delta = 0,
  epsilon = 0.1,
  ...
)

# S3 method for matrix
rpf(
  x,
  y,
  max_interaction = 1,
  ntrees = 50,
  splits = 30,
  split_try = 10,
  t_try = 0.4,
  deterministic = FALSE,
  nthreads = 1,
  purify = FALSE,
  cv = FALSE,
  loss = "L2",
  delta = 0,
  epsilon = 0.1,
  ...
)

# S3 method for formula
rpf(
  formula,
  data,
  max_interaction = 1,
  ntrees = 50,
  splits = 30,
  split_try = 10,
  t_try = 0.4,
  deterministic = FALSE,
  nthreads = 1,
  purify = FALSE,
  cv = FALSE,
  loss = "L2",
  delta = 0,
  epsilon = 0.1,
  ...
)

# S3 method for recipe
rpf(
  x,
  data,
  max_interaction = 1,
  ntrees = 50,
  splits = 30,
  split_try = 10,
  t_try = 0.4,
  deterministic = FALSE,
  nthreads = 1,
  purify = FALSE,
  cv = FALSE,
  loss = "L2",
  delta = 0,
  epsilon = 0.1,
  ...
)

Arguments

x, data

Feature matrix, data.frame, or recipe.

...

(Unused).

y

Target vector for use with x. The class of y (either numeric or factor) determines if regression or classification will be performed.

max_interaction

[1]: Maximum level of interaction, which determines the maximum number of split dimensions for a tree. The default, 1, corresponds to main effects only. If 0, the number of columns in x is used, i.e. for 10 predictors this is equivalent to setting max_interaction = 10.

ntrees

[50]: Number of trees generated per family.

splits

[30]: Number of splits performed for each tree family.

split_try

[10]: Number of split points to be considered when choosing a split candidate.

t_try

[0.4]: A value in (0,1] specifying the proportion of viable split-candidates in each round.

deterministic

[FALSE]: Whether the approach is deterministic (TRUE) or random (FALSE).

nthreads

[1L]: Number of threads used for computation, defaulting to serial execution.

purify

[FALSE]: Whether the forest should be purified. Set to TRUE so that components extracted with predict_components() are valid. Purification can also be performed after fitting with purify().

cv

[FALSE]: Determines if cross validation is performed.

loss

["L2"]: For regression, only "L2" is supported. For classification, "L1", "logit" and "exponential" are also available. "exponential" yields results similar to "logit" while being significantly faster.

delta

[0]: Only used if loss is "logit" or "exponential". The proportion of class membership is truncated to be smaller than 1 - delta when calculating the loss to determine the optimal split.

epsilon

[0.1]: Only used if loss is "logit" or "exponential". The proportion of class membership is truncated to be smaller than 1 - epsilon when calculating the fit in a leaf.

formula

Formula specification, e.g. y ~ x1 + x2.
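
As a minimal sketch of how several of the arguments above combine, the following fits a binary classifier by passing a factor as y, allows second-order interactions, and uses the "exponential" loss. The use of mtcars$am as the target and the particular values of ntrees and delta are illustrative assumptions, not recommendations.

```r
library(randomPlantedForest)

# Assumption: mtcars$am (0/1), converted to a factor, serves as a toy binary target.
y_class <- factor(mtcars$am)

# A factor y selects classification; a numeric y would select regression.
clfit <- rpf(
  x = mtcars[, c("cyl", "wt", "hp")],
  y = y_class,
  max_interaction = 2,   # main effects plus pairwise interactions
  ntrees = 30,           # illustrative value, smaller than the default of 50
  loss = "exponential",  # only available for classification
  delta = 0.1,           # illustrative truncation used when evaluating splits
  epsilon = 0.1          # default truncation used when fitting leaves
)

class(clfit)
```

Setting max_interaction = 0 instead would use the number of columns in x, which is 3 in this sketch.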

Value

Object of class "rpf" with model object contained in $fit.
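
The returned object can be inspected and passed on to predict(). The sketch below assumes that the package's predict() method takes a new_data argument and returns one row per observation; consult ?predict.rpf for the exact interface.

```r
library(randomPlantedForest)

rpfit <- rpf(mpg ~ cyl + wt, data = mtcars)

# The fitted object carries the class "rpf"
class(rpfit)

# The underlying model object is stored in the $fit element
rpfit$fit

# Assumption: predict() accepts `new_data`, as documented in ?predict.rpf
pred <- predict(rpfit, new_data = mtcars)
head(pred)
```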

Examples

# Regression with x and y
rpfit <- rpf(x = mtcars[, c("cyl", "wt")], y = mtcars$mpg)

# Regression with formula
rpfit <- rpf(mpg ~ cyl + wt, data = mtcars)
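
A further sketch relates to the purify argument: a forest can be purified at fit time or afterwards with purify(), and its components can then be extracted with predict_components(). The call form predict_components(object, new_data) is an assumption here; check that function's help page before relying on it.

```r
library(randomPlantedForest)

# Option 1: purify while fitting
rpfit_pure <- rpf(mpg ~ cyl + wt, data = mtcars, max_interaction = 2, purify = TRUE)

# Option 2: fit first, then purify afterwards with purify()
rpfit_late <- rpf(mpg ~ cyl + wt, data = mtcars, max_interaction = 2)
purify(rpfit_late)

# Assumption: predict_components() takes the fitted object and the data to predict on
components <- predict_components(rpfit_pure, mtcars)
str(components)
```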