Random Planted Forest

Usage

rpf(x, ...)

# S3 method for class 'data.frame'
rpf(
  x,
  y,
  max_interaction = 1,
  ntrees = 50,
  splits = 30,
  split_try = 10,
  t_try = 0.4,
  deterministic = FALSE,
  nthreads = 1,
  purify = FALSE,
  cv = FALSE,
  loss = "L2",
  delta = 0,
  epsilon = 0.1,
  ...
)

# S3 method for class 'matrix'
rpf(
  x,
  y,
  max_interaction = 1,
  ntrees = 50,
  splits = 30,
  split_try = 10,
  t_try = 0.4,
  deterministic = FALSE,
  nthreads = 1,
  purify = FALSE,
  cv = FALSE,
  loss = "L2",
  delta = 0,
  epsilon = 0.1,
  ...
)

# S3 method for class 'formula'
rpf(
  formula,
  data,
  max_interaction = 1,
  ntrees = 50,
  splits = 30,
  split_try = 10,
  t_try = 0.4,
  deterministic = FALSE,
  nthreads = 1,
  purify = FALSE,
  cv = FALSE,
  loss = "L2",
  delta = 0,
  epsilon = 0.1,
  ...
)

# S3 method for class 'recipe'
rpf(
  x,
  data,
  max_interaction = 1,
  ntrees = 50,
  splits = 30,
  split_try = 10,
  t_try = 0.4,
  deterministic = FALSE,
  nthreads = 1,
  purify = FALSE,
  cv = FALSE,
  loss = "L2",
  delta = 0,
  epsilon = 0.1,
  ...
)

Arguments

x, data: Feature matrix, or data.frame, or recipe.
...: (Unused).
y: Target vector for use with x. The class of y (either numeric or factor) determines if regression or classification will be performed.
max_interaction: [1]: Maximum order of interaction, which bounds the number of split dimensions of a tree. The default 1 corresponds to main effects only. If 0, the number of columns in x is used, i.e. for 10 predictors, this is equivalent to setting max_interaction = 10.
ntrees: [50]: Number of trees generated per family.
splits: [30]: Number of splits performed for each tree family.
split_try: [10]: Number of split points to be considered when choosing a split candidate.
t_try: [0.4]: A value in (0,1] specifying the proportion of viable split-candidates in each round.
deterministic: [FALSE]: Whether the approach is deterministic (TRUE) or random (FALSE).
nthreads: [1]: Number of threads used for computation, defaulting to serial execution.
purify: [FALSE]: Whether the forest should be purified. Set to TRUE so that components extracted with predict_components() are valid. Purification can also be performed after fitting with purify().
cv: [FALSE]: Determines if cross validation is performed.
loss: ["L2"]: For regression, only "L2" is supported. For classification, "L1", "logit" and "exponential" are also available. "exponential" yields results similar to "logit" while being significantly faster.
delta: [0]: Only used if loss is "logit" or "exponential". Proportion of class membership is truncated to be smaller than 1-delta when calculating the loss to determine the optimal split.
epsilon: [0.1]: Only used if loss is "logit" or "exponential". Proportion of class membership is truncated to be smaller than 1-epsilon when calculating the fit in a leaf.
formula: Formula specification, e.g. y ~ x1 + x2.

Value

An object of class "rpf", with the fitted model object contained in $fit.

Examples

# Regression with x and y
rpfit <- rpf(x = mtcars[, c("cyl", "wt")], y = mtcars$mpg)
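
# Classification with x and y (illustrative sketch, not from the package docs):
# a factor y switches rpf() to classification, where loss = "logit" is available.
# The two-class iris subset and the name rpfit_cls are assumptions made for
# this example only.
iris_bin <- droplevels(iris[iris$Species != "setosa", ])
rpfit_cls <- rpf(
  x = iris_bin[, c("Petal.Length", "Petal.Width")],
  y = iris_bin$Species,
  loss = "logit"
)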

# Regression with formula
rpfit <- rpf(mpg ~ cyl + wt, data = mtcars)
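
# Regression with pairwise interactions (illustrative sketch): max_interaction = 2
# allows splits in up to two dimensions per tree, and purify = TRUE purifies the
# forest at fit time so that extracted components are valid.
# The predictor choice and the name rpfit_int are assumptions for this example.
rpfit_int <- rpf(
  mpg ~ cyl + wt + hp,
  data = mtcars,
  max_interaction = 2,
  ntrees = 50,
  purify = TRUE
)

# Alternatively, fit with purify = FALSE and purify afterwards; the exact call
# form below is assumed from the argument description of purify above.
rpfit_post <- rpf(mpg ~ cyl + wt + hp, data = mtcars, max_interaction = 2)
purify(rpfit_post)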