| Title: | Statistics Utilities |
|---|---|
| Description: | Facilitate reporting for regression and correlation modeling, hypothesis testing, variance analysis, outlier detection, and detailed descriptive statistics. |
| Authors: | Etienne Camenen [aut, cre] |
| Maintainer: | Etienne Camenen <[email protected]> |
| License: | GPL-3 |
| Version: | 1.0.0 |
| Built: | 2026-05-09 08:08:43 UTC |
| Source: | https://github.com/ecamenen/gimmemystats |
Redefines the default parameters of rstatix::add_significance(), which adds p-value significance symbols to a data frame.
add_significance0(data, p.col = NULL, output.col = NULL)

| Argument | Description |
|---|---|
| data | A data frame containing a p-value column. |
| p.col | Column name containing p-values. |
| output.col | The output column name to hold the significance symbols. |

A data frame.
library(magrittr)
library(rstatix, warn.conflicts = FALSE)
data("ToothGrowth")
ToothGrowth %>%
  t_test(len ~ dose) %>%
  adjust_pvalue() %>%
  add_significance0("p.adj")
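To make the idea of significance symbols concrete, here is a minimal base R sketch that maps p-values to symbols with cut(). The helper name `p_to_symbol`, the cutpoints, and the symbols are assumptions chosen for illustration; they are not taken from the package and may differ from its defaults.

```r
# Hypothetical helper: map p-values to significance symbols.
# Cutpoints and symbols are illustrative assumptions, not package defaults.
p_to_symbol <- function(p,
                        cutpoints = c(0, 0.001, 0.01, 0.05, 1),
                        symbols = c("***", "**", "*", "ns")) {
  as.character(cut(p, breaks = cutpoints, labels = symbols,
                   include.lowest = TRUE))
}

res <- data.frame(p.adj = c(0.0004, 0.03, 0.2))
res$p.adj.signif <- p_to_symbol(res$p.adj)
res
```

Each interval is closed on the right, so a p-value exactly equal to a cutpoint receives the more significant symbol.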
Formats a data frame or vector containing categorical variables and calculates the frequency of each category.
count_category(x, width = 15, collapse = FALSE, sort = TRUE, format = TRUE)

| Argument | Description |
|---|---|
| x | Data frame or vector containing categorical variables. |
| width | Integer specifying the maximum width for wrapping text. |
| collapse | Logical specifying whether to merge categories with identical proportions. |
| sort | Logical or character vector. If |
| format | Logical specifying whether to format category names if the input is a vector. |

A tibble with one row per category and the following columns:
Factor specifying the category labels, possibly wrapped to the specified width. When collapse = TRUE, multiple categories with identical frequencies are merged into a single label separated by commas.
Integer specifying the frequency count for each category.
# Vector of categorical variable
library(magrittr)
k <- 5
n <- runif(k, 1, 10) %>% round()
x <- paste("Level", seq(k)) %>%
  mapply(function(x, y) rep(x, y), ., n) %>%
  unlist()
count_category(x)

# Data frame of categorical variable
df <- sapply(seq(k), function(x) runif(10) %>% round()) %>% as.data.frame()
colnames(df) <- paste("Level", seq(k))
count_category(df)

count_category(x, sort = FALSE, width = 5)
count_category(x, sort = seq(k), format = FALSE)
x2 <- c(x, rep("Level 6", n[1]))
count_category(x2, collapse = TRUE)
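The core of a category count can be sketched in base R with table() and sort(). This is only an illustration of the counting and ordering step, not the package's implementation; the column names `f` and `n` are hypothetical, and wrapping and collapsing are left out.

```r
# Count each category and order by decreasing frequency (base R only).
x <- c(rep("Level 1", 3), rep("Level 2", 5), rep("Level 3", 1))
counts <- sort(table(x), decreasing = TRUE)

# Hypothetical column names: f (category label) and n (frequency)
freq <- data.frame(
  f = factor(names(counts), levels = names(counts)),
  n = as.integer(counts)
)
freq
```

Storing the labels as a factor whose levels follow the sorted order keeps that order in later plots or tables.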
A dataset containing the distribution of household tasks among different arrangements: Wife, Alternating, Husband, and Jointly. The data represents the frequency of each task performed by each arrangement.
data(housetasks)
A data.frame with 13 rows (tasks) and 4 columns (arrangements):
Wife: Numeric, the frequency of the task performed primarily by the wife.
Alternating: Numeric, the frequency of the task performed in an alternating manner.
Husband: Numeric, the frequency of the task performed primarily by the husband.
Jointly: Numeric, the frequency of the task performed jointly by both partners.
The dataset was downloaded from the ggpubr GitHub repository:
https://raw.githubusercontent.com/kassambara/ggpubr/refs/heads/master/inst/demo-data/housetasks.txt
data(housetasks)
head(housetasks)
Detects outliers using methods like IQR, percentiles, Hampel, MAD, or SD.
identify_outliers(
  x,
  probabilities = c(0.25, 0.75),
  method = "iqr",
  weight = 1.5,
  replace = FALSE
)

| Argument | Description |
|---|---|
| x | Vector containing numerical values. |
| probabilities | Numeric vector specifying probabilities for percentiles. |
| method | Character specifying the method: |
| weight | Double specifying the multiplier for the detection threshold. |
| replace | Logical specifying whether to replace outliers with NA. |

A numeric vector whose content depends on the value of replace:
If replace = FALSE: a numeric vector containing only the detected outlier values. The vector is named with the original indices or names of x.
If replace = TRUE: a numeric vector of the same length as x, where detected outliers are replaced by NA.
x <- rnorm(100)
identify_outliers(x, method = "iqr")
identify_outliers(x, method = "percentiles", probabilities = c(0.1, 0.9))
identify_outliers(x, method = "sd", weight = 3)
identify_outliers(x, method = "mad", replace = TRUE)
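For readers unfamiliar with the IQR rule, the following base R sketch shows one common formulation: a value is flagged when it lies more than weight times the interquartile range below the lower quantile or above the upper quantile. The helper `iqr_outliers` is hypothetical and only mirrors the argument names above; the package's actual thresholds may be computed differently.

```r
# Hypothetical helper illustrating the IQR rule (not the package's code).
iqr_outliers <- function(x, probabilities = c(0.25, 0.75), weight = 1.5,
                         replace = FALSE) {
  q <- stats::quantile(x, probabilities, na.rm = TRUE, names = FALSE)
  spread <- q[2] - q[1]
  lower <- q[1] - weight * spread
  upper <- q[2] + weight * spread
  is_out <- !is.na(x) & (x < lower | x > upper)
  if (replace) {
    x[is_out] <- NA  # keep the original length, blank out the outliers
    return(x)
  }
  # Return only the outliers, named by their position in x
  stats::setNames(x[is_out], which(is_out))
}

x <- c(10, 12, 11, 13, 12, 11, 100)
out <- iqr_outliers(x)
out
cleaned <- iqr_outliers(x, replace = TRUE)
cleaned
```

With these values the quartiles are 11 and 12.5, so the thresholds are 8.75 and 14.75 and only the value 100 (position 7) is flagged.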
Calculates correlations between multiple variables.
mcor_test(
  x,
  y = NULL,
  estimate = TRUE,
  p.value = FALSE,
  method = "spearman",
  method_adjust = "BH"
)

| Argument | Description |
|---|---|
| x | Data frame containing numerical variables. |
| y | Data frame containing numerical variables. If |
| estimate | Logical specifying whether to return correlation coefficients. |
| p.value | Logical specifying whether to return adjusted p-values. |
| method | Character specifying the correlation method: |
| method_adjust | Character specifying the p-value adjustment method. |

Depending on the values of estimate and p.value, one of the following:
If only estimate is TRUE: a numeric matrix of correlation coefficients, with columns corresponding to variables in x and rows to variables in y.
If only p.value is TRUE: a numeric matrix of adjusted p-values, with columns corresponding to variables in x and rows to variables in y.
If both are TRUE: a named list with two elements, a numeric matrix of correlation coefficients and a numeric matrix of adjusted p-values.
library(magrittr)
x0 <- runif(20)
x <- lapply(
  c(1, -1),
  function(i) sapply(seq(10), function(j) x0 * i + runif(10, max = 1))
) %>%
  Reduce(cbind, .) %>%
  set_colnames(paste("Variable", seq(20)))
y <- lapply(
  c(1, -1),
  function(i) sapply(seq(10), function(j) x0 * i + runif(10, max = 1))
) %>%
  Reduce(cbind, .) %>%
  set_colnames(paste("Variable", seq(20))) %>%
  .[, seq(5)]
mcor_test(x)
mcor_test(
  x,
  y,
  p.value = TRUE,
  method = "pearson",
  method_adjust = "bonferroni"
)
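The general technique behind such a function, pairwise correlation tests followed by a joint p-value adjustment, can be sketched with base R alone. The code below is an assumption-laden illustration, not the package's implementation: the variable names, the simulated data, and the list element names `estimate` and `p.value` are all hypothetical.

```r
# Sketch: Spearman correlations between the columns of x and y,
# with BH-adjusted p-values (base R only, not the package's code).
set.seed(1)
x <- data.frame(v1 = rnorm(30), v2 = rnorm(30))
y <- data.frame(w1 = x$v1 + rnorm(30, sd = 0.1), w2 = rnorm(30))

# Rows correspond to variables in y, columns to variables in x
estimate <- stats::cor(y, x, method = "spearman")

# Raw p-value for every (y, x) pair of columns
p_raw <- sapply(x, function(xj) {
  sapply(y, function(yi) stats::cor.test(xj, yi, method = "spearman")$p.value)
})

# Adjust all p-values jointly, then restore the matrix shape
p_adj <- matrix(
  stats::p.adjust(p_raw, method = "BH"),
  nrow = nrow(p_raw), dimnames = dimnames(p_raw)
)
res <- list(estimate = estimate, p.value = p_adj)
res
```

Adjusting all p-values in a single call treats every pair as part of one family of tests; adjusting per row or per column would be a different, less conservative choice.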
Identifies pairwise differences between categories following a chi-squared or Fisher's exact test.
post_hoc_chi2(
  x,
  method = "fisher",
  method_adjust = "BH",
  digits = 3,
  count = FALSE,
  ...
)

| Argument | Description |
|---|---|
| x | Data frame, vector, or table. If numeric, treated as a contingency table and the names are considered as categories; otherwise, the levels of the factor or the characters are used. |
| method | Character specifying the statistical test: |
| method_adjust | Character specifying the p-value adjustment method. |
| digits | Integer specifying the number of decimal places for the test statistic. |
| count | Logical specifying if |
| ... | Additional arguments passed to |

A tibble with pairwise test results containing the following columns:
Character vectors specifying the pair of groups being compared.
Numeric vector specifying the total count or sample size for the comparison.
Numeric vector specifying the test statistic (for chi-squared tests only).
Numeric vector specifying the degrees of freedom (for chi-squared tests only).
Raw p-value for the pairwise comparison, formatted as numeric or character ("< 0.001" for very small p-values).
Character vectors specifying the significance codes for raw p-values: 'ns' (not significant).
P-value adjusted for multiple comparisons using the specified method, formatted as numeric or character ("< 0.001" for very small values).
Character vectors specifying the significance codes for adjusted p-values: 'ns' (not significant), '*' (p < 0.05), '**' (p < 0.01), '***' (p < 0.001).
For Fisher's exact tests, the statistic and df columns are not included.
library(magrittr)
x <- c(rep("A", 100), rep("B", 78), rep("C", 25))
post_hoc_chi2(x)

x <- data.frame(G1 = c(Yes = 100, No = 78), G2 = c(Yes = 75, No = 23))
post_hoc_chi2(x, count = TRUE, method = "chisq")

data("housetasks")
housetasks[, c("Wife", "Husband")] %>%
  t() %>%
  post_hoc_chi2(count = TRUE, workspace = 1e6)

x <- cbind(
  mapply(function(x, y) rep(x, y), letters[seq(3)], c(7, 5, 8)) %>% unlist(),
  mapply(function(x, y) rep(x, y), LETTERS[seq(3)], c(6, 6, 8)) %>% unlist()
)
post_hoc_chi2(x)
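As a rough picture of what a pairwise post hoc procedure does, the base R sketch below runs Fisher's exact test on every pair of columns of a contingency table and then adjusts the p-values with the BH method. The table values and the column names `group1`, `group2`, `n`, `p`, and `p.adj` are assumptions made for this illustration; the package may organise its tests and output differently.

```r
# Sketch: pairwise Fisher's exact tests between the columns of a
# contingency table, followed by BH adjustment (base R only).
tab <- rbind(
  Yes = c(G1 = 100, G2 = 75, G3 = 20),
  No  = c(G1 = 78, G2 = 23, G3 = 60)
)
pairs <- utils::combn(colnames(tab), 2, simplify = FALSE)

res <- do.call(rbind, lapply(pairs, function(g) {
  sub <- tab[, g]  # 2 x 2 sub-table for this pair of groups
  data.frame(
    group1 = g[1], group2 = g[2],
    n = sum(sub),
    p = stats::fisher.test(sub)$p.value
  )
}))
res$p.adj <- stats::p.adjust(res$p, method = "BH")
res
```

Collecting the raw p-values first and adjusting them in one call ensures the correction accounts for all pairwise comparisons together.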
Calculates and prints frequency counts and percentages for binomial (two-level) categorical variables.
print_binomial(x, digits = 1, width = 15)

| Argument | Description |
|---|---|
| x | Data frame, matrix, or vector containing binomial variables. |
| digits | Integer specifying the number of decimal places for the test statistic. |
| width | Integer specifying the maximum width for wrapping text. |

A tibble with one row per level of each categorical variable, containing the following columns:
Character vector specifying the name of each variable.
Character vector specifying the category level for each variable.
Character vector combining the frequency count and the percentage for each level.
x <- data.frame(A = sample(c("X", "Y"), 100, replace = TRUE))
print_binomial(x)
print_binomial(x, digits = 2, width = 5)
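Combining a count and a percentage into one label is straightforward in base R. The sketch below uses a hypothetical helper, `format_count`, and an assumed "count (percent%)" layout; the package's exact output format is not shown in this manual and may differ.

```r
# Hypothetical helper: one "count (percent%)" label per level.
# The label layout is an illustrative assumption.
format_count <- function(x, digits = 1) {
  counts <- table(x)
  pct <- formatC(100 * as.numeric(counts) / sum(counts),
                 format = "f", digits = digits)
  data.frame(
    level = names(counts),
    value = paste0(as.integer(counts), " (", pct, "%)")
  )
}

lab <- format_count(c("X", "Y", "X", "X"))
lab
```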
Formats the results of a Chi-squared or Fisher's exact test.
print_chi2_test(x, digits = 3)

| Argument | Description |
|---|---|
| x | Test object from |
| digits | Integer specifying the number of decimal places for the test statistic. |

A character string containing the formatted test results with:
Test statistic (for the Chi-squared test only).
Formatted p-value with significance stars.
Total count for sample size.
For Fisher's exact test, only the P-value and sample size are included.
x <- c(A = 100, B = 78, C = 25)
library(rstatix)
print_chi2_test(chisq_test(x))

xtab <- as.table(rbind(c(490, 10), c(400, 100)))
dimnames(xtab) <- list(
  group = c("grp1", "grp2"),
  smoker = c("yes", "no")
)
print_chi2_test(fisher_test(xtab))
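To show how a test object can be turned into a one-line summary, here is a base R sketch built on stats::chisq.test(). The helper `format_chisq` and the layout of the string are assumptions for illustration only and do not reproduce the package's output (for example, no significance stars are added).

```r
# Hypothetical helper: summarise a chi-squared test as one string.
# The string layout is an illustrative assumption.
format_chisq <- function(test, n, digits = 3) {
  p <- test$p.value
  p_txt <- if (p < 0.001) {
    "p < 0.001"
  } else {
    paste0("p = ", formatC(p, format = "f", digits = 3))
  }
  paste0(
    "X2(", as.integer(test$parameter), ") = ",
    formatC(unname(test$statistic), format = "f", digits = digits),
    ", ", p_txt, ", N = ", n
  )
}

x <- c(A = 100, B = 78, C = 25)
out <- format_chisq(stats::chisq.test(x), n = sum(x))
out
```

For a named count vector, stats::chisq.test() performs a goodness-of-fit test against equal proportions, which is why the degrees of freedom equal the number of categories minus one.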
Calculates and prints the median and interquartile range (IQR) or the mean and standard deviation (SD).
print_dispersion(x, digits = 1, width = 15, method = "median")

| Argument | Description |
|---|---|
| x | Vector containing numerical values. |
| digits | Integer specifying the number of decimal places for the test statistic. |
| width | Integer specifying the maximum width for wrapping text. |
| method | Character specifying the method: |

A character string containing a measure of central tendency and
dispersion. Depending on method, this is either the median and
interquartile range or the mean and standard deviation.
print_dispersion(runif(10))
print_dispersion(runif(10), method = "mean", digits = 2, width = 5)
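The two summaries described above can be computed with base R as follows. The helper `describe` and its "centre (spread)" layout are hypothetical; the package may format the string differently.

```r
# Hypothetical helper: "median (IQR)" or "mean (SD)" as a single string.
# The layout is an illustrative assumption.
describe <- function(x, digits = 1, method = "median") {
  fmt <- function(v) formatC(v, format = "f", digits = digits)
  if (method == "mean") {
    paste0(fmt(mean(x)), " (", fmt(stats::sd(x)), ")")
  } else {
    paste0(fmt(stats::median(x)), " (", fmt(stats::IQR(x)), ")")
  }
}

v <- c(1, 2, 3, 4, 10)
med <- describe(v)
avg <- describe(v, method = "mean")
c(med, avg)
```

The example vector contains one large value, which pulls the mean and SD upward while the median and IQR stay close to the bulk of the data; this is the usual reason to prefer the median-based summary for skewed variables.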
Calculates and prints frequency counts and percentages for multinomial (multi-level) categorical variables.
print_multinomial(x, label = NULL, digits = 1, width = 15, n = nrow(x), ...)

| Argument | Description |
|---|---|
| x | Data frame, matrix, or vector containing multinomial variables. |
| label | Character vector specifying the names of the categorical variables. |
| digits | Integer specifying the number of decimal places for the test statistic. |
| width | Integer specifying the maximum width for wrapping text. |
| n | Integer specifying the total number of observations. |
| ... | Additional arguments passed to |

A tibble with one row per level of each categorical variable, containing the following columns:
Character vector specifying the name of each variable.
Character vector specifying the category level for each variable.
Character vector combining the frequency count and the percentage for each level.
x <- data.frame(A = sample(c("X", "Y", "Z"), 100, replace = TRUE))
print_multinomial(x, label = "A")
x2 <- rbind(x, data.frame(A = rep("Level A", length(x[x == "Level X", ]))))
print_multinomial(
  x,
  label = "Variable A",
  sort = FALSE,
  n = 90,
  digits = 2,
  width = 5
)
Prints summary statistics (mean, median, quartiles, range, etc.) for numeric variables.
print_numeric(x, digits = 1, width = 15)

| Argument | Description |
|---|---|
| x | Data frame, matrix, or vector containing numerical variables. |
| digits | Integer specifying the number of decimal places for the test statistic. |
| width | Integer specifying the maximum width for wrapping text. |

A tibble with one row per numeric variable and the following columns:
Character specifying the variable name.
Character specifying the mean and standard deviation.
Character specifying the median and interquartile range.
Character specifying the first and third quartiles.
Character specifying the minimum and maximum values.
Numeric specifying the kurtosis coefficient.
Numeric specifying the skewness coefficient.
Character specifying the Shapiro-Wilk normality test significance code.
Integer specifying the number of zero values.
Integer specifying the number of missing values.
x <- data.frame(A = rnorm(100), B = rnorm(100))
print_numeric(x)
print_numeric(x, digits = 2, width = 5)
Formats the results of a hypothesis test (ANOVA, Kruskal-Wallis, Wilcoxon, t-test, Friedman, or mixed-effects model).
print_test(x, digits = 0, digits_p = 2)

| Argument | Description |
|---|---|
| x | Test object from |
| digits | Integer specifying the number of decimal places for the test statistic. |
| digits_p | Integer specifying the number of decimal places for the p-value. |

A character string containing the formatted test results with:
Name of the statistical test (ANOVA, Kruskal-Wallis, Wilcoxon, t-test, Friedman, or mixed-effects model).
Test statistic (F, K, W, T, or χ²) with degrees of freedom when applicable.
P-value with significance stars.
library(rstatix)
data("ToothGrowth")
res <- anova_test(ToothGrowth, len ~ dose)
print_test(res)
res <- kruskal_test(ToothGrowth, len ~ dose)
print_test(res)
res <- wilcox_test(ToothGrowth, len ~ supp)
print_test(res)

library(lmerTest)
data("sleepstudy", package = "lme4")
res <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy)
print_test(res)
Summarizes descriptive statistics for binomial variables.
summary_binomial(x, ...)

| Argument | Description |
|---|---|
| x | Data frame, matrix, or vector containing binomial variables. |
| ... | Additional arguments passed to |

A tibble with descriptive statistics containing the following columns:
Character vector specifying the name of each variable.
Character vector combining the reference level of a variable with its frequency count and its percentage.
x <- data.frame(A = sample(c("X", "Y"), 100, replace = TRUE))
summary_binomial(x)
summary_binomial(x, digits = 2, width = 5)
Formats the output of print_numeric into a concise summary.
summary_numeric(x, ...)

| Argument | Description |
|---|---|
| x | Data frame, matrix, or vector containing numerical variables. |
| ... | Additional arguments passed to |

A tibble with one row per numeric variable and the following columns:
Character specifying the variable name.
Character specifying the median and interquartile range.
x <- data.frame(A = rnorm(100), B = rnorm(100))
summary_numeric(x)
summary_numeric(x, digits = 2, width = 5)
Converts the first character of each string to uppercase and the rest to lowercase.
to_title(x)

| Argument | Description |
|---|---|
| x | A character vector or a list containing strings to convert to title case. |

A character vector with the same length as x, where each element has its first character converted to uppercase and its remaining characters converted to lowercase.
to_title(c("hELLO", "WoRLD", "R"))
# Returns: "Hello" "World" "R"
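The conversion itself needs only base R string functions. The sketch below is one possible way to obtain the behaviour described above; `to_title_sketch` is a hypothetical name and this is not necessarily how the package implements to_title().

```r
# Hypothetical re-implementation: uppercase the first character,
# lowercase the rest (base R only).
to_title_sketch <- function(x) {
  x <- as.character(unlist(x))  # accept a list or a character vector
  paste0(toupper(substr(x, 1, 1)), tolower(substring(x, 2)))
}

titles <- to_title_sketch(c("hELLO", "WoRLD", "R"))
titles
```

For a single-character string, substring(x, 2) returns an empty string, so the input is simply uppercased.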