Package 'deltatest'

Title: Statistical Hypothesis Testing Using the Delta Method
Description: Statistical hypothesis testing using the Delta method as proposed by Deng et al. (2018) <doi:10.1145/3219819.3219919>. This method replaces the standard variance estimation formula in the Z-test with an approximate formula derived via the Delta method, which can account for within-user correlation.
Authors: Koji Makiyama [aut, cre, cph], Shinichi Takayanagi [med]
Maintainer: Koji Makiyama <[email protected]>
License: MIT + file LICENSE
Version: 0.1.0.9000
Built: 2025-03-15 11:13:37 UTC
Source: https://github.com/hoxo-m/deltatest


The Delta Method for Ratio

Description

Applies the Delta method to the ratio of two random variables, $f(X, Y) = X / Y$, to estimate the expected value, variance, standard error, and confidence interval.
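For reference, the standard Delta-method approximations for a ratio of sample means, of the kind used in Deng et al. (2018), are sketched below. The notation is assumed here rather than taken from the package: $n$ is the number of randomization units, $\mu_X$ and $\mu_Y$ are the means, $\sigma_X^2$ and $\sigma_Y^2$ the per-unit variances, and $\sigma_{XY}$ the per-unit covariance of the numerator and denominator. The $1/n$ term in the first line is the second-order (bias) term of the Taylor expansion; the variance keeps only first-order terms.

```latex
\begin{aligned}
\mathbb{E}\!\left[\frac{\bar{X}}{\bar{Y}}\right]
  &\approx \frac{\mu_X}{\mu_Y}
   + \frac{1}{n}\left(\frac{\mu_X \sigma_Y^2}{\mu_Y^3} - \frac{\sigma_{XY}}{\mu_Y^2}\right), \\
\operatorname{Var}\!\left(\frac{\bar{X}}{\bar{Y}}\right)
  &\approx \frac{1}{n}\left(\frac{\sigma_X^2}{\mu_Y^2}
   - \frac{2\,\mu_X \sigma_{XY}}{\mu_Y^3}
   + \frac{\mu_X^2 \sigma_Y^2}{\mu_Y^4}\right).
\end{aligned}
```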

Methods

Public methods


Method new()

Initialize a new DeltaMethodForRatio object.

Usage
DeltaMethodForRatio$new(numerator, denominator, bias_correction = FALSE)
Arguments
numerator, denominator

numeric vectors sampled from the distributions of the random variables in the numerator and denominator of the ratio.

bias_correction

logical value indicating whether correction to the mean of the metric is performed using the second-order term of the Taylor expansion. The default is FALSE.


Method get_expected_value()

Get the expected value.

Usage
DeltaMethodForRatio$get_expected_value()
Returns

numeric estimate of the expected value of the ratio.


Method get_variance()

Get the variance.

Usage
DeltaMethodForRatio$get_variance()
Returns

numeric estimate of the variance of the ratio.


Method get_squared_standard_error()

Get the squared standard error.

Usage
DeltaMethodForRatio$get_squared_standard_error()
Returns

numeric estimate of the squared standard error of the ratio.


Method get_standard_error()

Get the standard error.

Usage
DeltaMethodForRatio$get_standard_error()
Returns

numeric estimate of the standard error of the ratio.


Method get_confidence_interval()

Get the confidence interval.

Usage
DeltaMethodForRatio$get_confidence_interval(
  alternative = c("two.sided", "less", "greater"),
  conf_level = 0.95
)
Arguments
alternative

character string specifying the alternative hypothesis, must be one of "two.sided" (default), "greater", or "less". You can specify just the initial letter.

conf_level

numeric value specifying the confidence level of the interval. The default is 0.95.

Returns

numeric estimates of the lower and upper bounds of the confidence interval of the ratio.


Method get_info()

Get statistical information.

Usage
DeltaMethodForRatio$get_info(
  alternative = c("two.sided", "less", "greater"),
  conf_level = 0.95
)
Arguments
alternative

character string specifying the alternative hypothesis, must be one of "two.sided" (default), "greater", or "less". You can specify just the initial letter.

conf_level

numeric value specifying the confidence level of the interval. The default is 0.95.

Returns

numeric estimates of the expected value, variance, standard error, and confidence interval of the ratio.


Method compute_expected_value()

Class method to compute the expected value of the ratio using the Delta method.

Usage
DeltaMethodForRatio$compute_expected_value(
  mean1,
  mean2,
  var2,
  cov = 0,
  bias_correction = FALSE
)
Arguments
mean1

numeric value of the mean of the numerator of the ratio.

mean2

numeric value of the mean of the denominator of the ratio.

var2

numeric value of the variance of the denominator of the ratio.

cov

numeric value of the covariance between the numerator and denominator of the ratio. The default is 0.

bias_correction

logical value indicating whether correction to the mean of the metric is performed using the second-order term of the Taylor expansion. The default is FALSE.

Returns

numeric estimate of the expected value of the ratio.
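A minimal base-R sketch of the second-order correction, assuming per-unit (not per-mean) moments and an explicit sample size n. The helper name ratio_expected_value() and its n argument are hypothetical and not part of the package. It estimates the bias of the plug-in ratio mean1 / mean2 from the second-order Taylor term and, when bias_correction = TRUE, subtracts it.

```r
# Hypothetical helper, not the deltatest API.
# mean1, mean2: sample means of the numerator and denominator
# var2, cov:    per-unit variance of the denominator and covariance of the pair
# n:            number of randomization units (assumed extra argument)
ratio_expected_value <- function(mean1, mean2, var2, cov = 0, n,
                                 bias_correction = FALSE) {
  plug_in <- mean1 / mean2                     # first-order estimate
  if (!bias_correction) return(plug_in)
  # Second-order Taylor term of E[mean(x) / mean(y)] for f(x, y) = x / y
  bias <- (mean1 * var2 / mean2^3 - cov / mean2^2) / n
  plug_in - bias                               # bias-reduced estimate
}

ratio_expected_value(2, 4, var2 = 2, cov = 0.5, n = 10)                          # 0.5
ratio_expected_value(2, 4, var2 = 2, cov = 0.5, n = 10, bias_correction = TRUE)  # 0.496875
```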


Method compute_variance()

Class method to compute the variance of the ratio using the Delta method.

Usage
DeltaMethodForRatio$compute_variance(mean1, mean2, var1, var2, cov = 0)
Arguments
mean1

numeric value of the mean of the numerator of the ratio.

mean2

numeric value of the mean of the denominator of the ratio.

var1

numeric value of the variance of the numerator of the ratio.

var2

numeric value of the variance of the denominator of the ratio.

cov

numeric value of the covariance between the numerator and denominator of the ratio. The default is 0.

Returns

numeric estimate of the variance of the ratio.
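As an illustration only, the first-order Delta-method variance of $X / Y$ can be written in a few lines of base R. The helper ratio_variance() is hypothetical, the simulated click and page-view data are made up, and the step "squared standard error = variance / n" is an assumption about how the variance relates to the standard error. The last lines check the formula against the sample variance of the linearized ratio, which it equals exactly.

```r
# Hypothetical helper, not the deltatest API: first-order Delta-method
# variance of f(X, Y) = X / Y, using per-unit moments.
ratio_variance <- function(mean1, mean2, var1, var2, cov = 0) {
  var1 / mean2^2 - 2 * mean1 * cov / mean2^3 + mean1^2 * var2 / mean2^4
}

# Simulated per-user clicks and page views (illustrative data only)
set.seed(1)
pageview <- rpois(500, lambda = 5) + 1
click <- rbinom(500, size = pageview, prob = 0.1)

v <- ratio_variance(mean(click), mean(pageview),
                    var(click), var(pageview), cov(click, pageview))
se <- sqrt(v / length(click))  # assumes squared SE = variance / n

# Sanity check: the formula equals the sample variance of the linearized
# ratio x / mu_y - mu_x * y / mu_y^2
lin <- click / mean(pageview) - mean(click) * pageview / mean(pageview)^2
all.equal(v, var(lin))
```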


Method compute_confidence_interval()

Class method to compute the confidence interval of the ratio using the Delta method.

Usage
DeltaMethodForRatio$compute_confidence_interval(
  mean,
  standard_error,
  alternative = c("two.sided", "less", "greater"),
  conf_level = 0.95
)
Arguments
mean

numeric value of the estimated mean of the ratio.

standard_error

numeric value of the estimated standard error of the mean of the ratio.

alternative

character string specifying the alternative hypothesis, must be one of "two.sided" (default), "greater", or "less". You can specify just the initial letter.

conf_level

numeric value specifying the confidence level of the interval. The default is 0.95.

Returns

numeric estimates of the lower and upper bounds of the confidence interval of the ratio.
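A normal-approximation interval of this kind can be sketched as follows. The helper ratio_confidence_interval() is hypothetical and only mirrors the arguments listed above; match.arg() provides the partial matching that lets "t", "l", or "g" stand for the full alternative.

```r
# Hypothetical helper, not the deltatest API: normal-approximation interval.
ratio_confidence_interval <- function(mean, standard_error,
                                      alternative = c("two.sided", "less", "greater"),
                                      conf_level = 0.95) {
  alternative <- match.arg(alternative)  # partial matching allows "t", "l", "g"
  switch(alternative,
    two.sided = {
      z <- qnorm(1 - (1 - conf_level) / 2)
      c(lower = mean - z * standard_error, upper = mean + z * standard_error)
    },
    less = c(lower = -Inf, upper = mean + qnorm(conf_level) * standard_error),
    greater = c(lower = mean - qnorm(conf_level) * standard_error, upper = Inf)
  )
}

ratio_confidence_interval(0.5, 0.01)       # roughly 0.480 to 0.520
ratio_confidence_interval(0.5, 0.01, "g")  # one-sided lower bound, upper = Inf
```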


Method clone()

The objects of this class are cloneable with this method.

Usage
DeltaMethodForRatio$clone(deep = FALSE)
Arguments
deep

Whether to make a deep clone.


Two Sample Z-Test for Ratio Metrics Using the Delta Method

Description

Performs a two-sample Z-test to compare a ratio metric between two groups using the Delta method. The Delta method estimates the variance while accounting for the correlation between the numerator and denominator of the ratio metric.
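The procedure can be sketched in base R as follows, assuming the difference is taken as treatment minus control and the two groups are independent. The function delta_z_test() and its arguments are hypothetical; the package's own entry point is deltatest(). The simulated data are made up for illustration.

```r
# A minimal sketch, not the package's implementation: two-sample Z-test for
# the difference (treatment minus control) in a ratio metric.
delta_z_test <- function(num1, den1, num2, den2) {
  # Per-group ratio estimate and Delta-method squared standard error
  group_stats <- function(num, den) {
    mx <- mean(num)
    my <- mean(den)
    v <- var(num) / my^2 - 2 * mx * cov(num, den) / my^3 + mx^2 * var(den) / my^4
    list(ratio = mx / my, se2 = v / length(num))
  }
  control <- group_stats(num1, den1)
  treatment <- group_stats(num2, den2)
  estimate <- treatment$ratio - control$ratio
  stderr <- sqrt(control$se2 + treatment$se2)  # groups assumed independent
  z <- estimate / stderr
  list(estimate = estimate, stderr = stderr, statistic = z,
       p.value = 2 * pnorm(-abs(z)))           # two-sided p-value
}

# Illustrative data: per-user clicks and page views in two groups
set.seed(42)
pv1 <- rpois(1000, 4) + 1; ck1 <- rbinom(1000, pv1, 0.10)
pv2 <- rpois(1000, 4) + 1; ck2 <- rbinom(1000, pv2, 0.11)
res <- delta_z_test(ck1, pv1, ck2, pv2)
```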

Usage

deltatest(
  data,
  formula,
  by,
  group_names = "auto",
  type = c("difference", "relative_change"),
  bias_correction = FALSE,
  alternative = c("two.sided", "less", "greater"),
  conf.level = 0.95,
  na.rm = FALSE,
  quiet = FALSE
)

Arguments

data

data.frame containing the numerator and denominator columns of the ratio metric, aggregated by randomization unit, plus a column indicating the assigned group (control or treatment). For example, if users are randomized and the metric is click-through rate (CTR) per page view, the numerator is the number of clicks per user and the denominator is the number of page views per user.

formula

expression representing the ratio metric. It can be written in three styles: standard formula x/y ~ group, lambda formula ~ x/y, or NSE expression x/y.

by

character string or symbol that indicates the group column. If the group column is specified in the formula argument, it is not required.

group_names

character vector of length 2, or "auto". It specifies which of the two values in the group column is the control group and which is the treatment group: the first element is the control group, and the second is the treatment group. If "auto", the two values in the group column are sorted in lexicographical order and assigned in that order. The default is "auto".

type

character string specifying the test type. If "difference" (default), the hypothesis test evaluates the difference in means of the ratio metric between the two groups. If "relative_change", it evaluates the relative change $(\mu_2 - \mu_1) / \mu_1$ instead, where $\mu_1$ and $\mu_2$ are the means of the control and treatment groups. You can specify just the initial letter.

bias_correction

logical value indicating whether correction to the mean of the metric is performed using the second-order term of the Taylor expansion. The default is FALSE.

alternative

character string specifying the alternative hypothesis, must be one of "two.sided" (default), "greater", or "less". You can specify just the initial letter.

conf.level

numeric value specifying the confidence level of the interval. The default is 0.95.

na.rm

logical value. If TRUE, rows containing NA values in the data will be excluded from the analysis. The default is FALSE.

quiet

logical value indicating whether to suppress messages during the execution of the function. The default is FALSE.
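The "auto" rule for group_names can be illustrated with a small helper, resolve_group_names(), which is hypothetical and not part of the package. Base R's sort() collates according to the current locale, so this sketch assumes ordinary ASCII labels such as "control" and "treatment".

```r
# Hypothetical helper (not part of deltatest): map group labels to roles.
resolve_group_names <- function(group, group_names = "auto") {
  if (identical(group_names, "auto")) {
    # Sorted distinct values: first is control, second is treatment
    group_names <- sort(unique(as.character(group)))
  }
  stopifnot(length(group_names) == 2)
  c(control = group_names[[1]], treatment = group_names[[2]])
}

resolve_group_names(c("treatment", "control", "treatment"))
resolve_group_names(c("A", "B"), group_names = c("B", "A"))  # "B" is control
```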

Value

A list with class "htest" containing the following components:

statistic

the value of the Z-statistic.

p.value

the p-value for the test.

conf.int

a confidence interval for the difference or relative change appropriate to the specified alternative hypothesis.

estimate

the estimated means of the two groups, and the difference or relative change.

null.value

the hypothesized value of the difference or relative change in means under the null hypothesis.

stderr

the standard error of the difference or relative change.

alternative

a character string describing the alternative hypothesis.

method

a character string describing the method used.

data.name

the name of the data.
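For type = "relative_change", a common Delta-method approach (assumed here, not confirmed as the package's exact formula) treats the relative change as $\hat{\mu}_2 / \hat{\mu}_1 - 1$ for two independent group estimates, so its standard error follows from the ratio formula with zero covariance. relative_change_se() is a hypothetical helper and the numbers are made up.

```r
# Hypothetical helper: Delta-method SE of (mu2 - mu1) / mu1 given each group's
# ratio estimate (mu1, mu2) and standard error (se1, se2); the covariance term
# is zero because the groups are assumed independent.
relative_change_se <- function(mu1, se1, mu2, se2) {
  sqrt(se2^2 / mu1^2 + mu2^2 * se1^2 / mu1^4)
}

mu1 <- 0.50; se1 <- 0.01  # control
mu2 <- 0.55; se2 <- 0.01  # treatment
rel <- (mu2 - mu1) / mu1  # relative change, about 0.1
se_rel <- relative_change_se(mu1, se1, mu2, se2)
z <- rel / se_rel
```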

References

  • Deng, A., Knoblich, U., & Lu, J. (2018). Applying the Delta Method in Metric Analytics: A Practical Guide with Novel Ideas. Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. doi:10.1145/3219819.3219919

Examples

library(dplyr)
library(deltatest)

n_user <- 2000

set.seed(314)
df <- deltatest::generate_dummy_data(n_user) |>
  group_by(user_id, group) |>
  summarise(click = sum(metric), pageview = n(), .groups = "drop")

deltatest(df, click / pageview, by = group)
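If dplyr is not available, the same per-user aggregation can be sketched in base R with aggregate(). The small page-view-level data frame below is made up; it only mimics the user_id, group, and metric columns that generate_dummy_data() is documented to return.

```r
# Made-up page-view-level data: one row per page view, metric = click (0/1)
pv <- data.frame(
  user_id = c(1, 1, 1, 2, 2, 3),
  group   = c("control", "control", "control", "treatment", "treatment", "treatment"),
  metric  = c(1, 0, 0, 1, 1, 0)
)
pv$pageview <- 1  # each row is one page view

# Sum clicks and page views per randomization unit (user)
df <- aggregate(cbind(metric, pageview) ~ user_id + group, data = pv, FUN = sum)
names(df)[names(df) == "metric"] <- "click"
df
```

The resulting df has one row per user with click and pageview columns, the same shape as in the dplyr example above.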

Generate Dummy Data

Description

Generate random dummy data for simulation studies. For details, see Section 4.3 in Deng et al. (2017).

Usage

generate_dummy_data(
  n_user,
  model = c("Bernoulli", "normal"),
  xi = 0,
  sigma = 0,
  random_unit = c("user", "session", "pageview"),
  treatment_ratio = 0.5
)

Arguments

n_user

integer value specifying the number of users included in the generated data. Since multiple rows are generated for each user, the number of rows in the data exceeds the number of users.

model

character string specifying the model that generates the potential outcomes. It must be one of "Bernoulli" (default) or "normal". You can specify just the initial letter.

xi

numeric value specifying the treatment effect variation (TEV) under the Bernoulli model, where $\mathrm{TEV} = 2\xi$. This argument is ignored if the model argument is set to "normal". The default is 0.

sigma

numeric value specifying the treatment effect variation (TEV) under the normal model, where $\mathrm{TEV} = \sigma$. This argument is ignored if the model argument is set to "Bernoulli". The default is 0.

random_unit

character string specifying the randomization unit. It must be one of "user" (default), "session", or "pageview". You can specify just the initial letter.

treatment_ratio

numeric value specifying the proportion assigned to the treatment group. The default is 0.5.

Value

data.frame with the columns user_id, group, and metric, where each row represents the metric value for one page view.

References

  • Deng, A., Lu, J., & Litz, J. (2017). Trustworthy Analysis of Online A/B Tests: Pitfalls, challenges and solutions. Proceedings of the Tenth ACM International Conference on Web Search and Data Mining. doi:10.1145/3018661.3018677

Examples

library(deltatest)

set.seed(314)
generate_dummy_data(n_user = 2000)