Data Input
Data Upload
Data Selection
Data Quality Check
Data Structure
Missing Values by Column
Data Editor
Data Filtering
Filter Your Data
Select which rows to keep based on column values:
Transform Tools - Stack/Unstack/Subsets
Data Lab: Transform & Clean
Step 1: What's your goal?
Step 2: Select columns
Step 3: Run & Save
Preview
Step 1: Pick a column to filter
Step 2: Set your condition
Step 3: Apply & Save
Preview
Step 1: What needs cleaning?
Step 2: Fill or replace NAs
Step 3: Apply & Save
Cleaning Report
Step 1: What do you need?
Step 2: Configure
Step 3: Run & Save
Result
Transformation History
Every action you've taken is logged here. You can undo or download any snapshot.
Save Current Data
Save your transformed data to use in Visualization or Statistical Analysis modules.
Download Current Data
Quick Data Summary
Sample Size Calculator
Sample Size & Power Calculator
Plan your study with confidence - supports t-tests, ANOVA, proportions, correlation & more
STEP 1: Analysis Type
What are you comparing?
STEP 2: What to Calculate?
Choose your unknown
STEP 3: Basic Parameters
Standard settings
STEP 4: Effect Size
The difference you want to detect
STEP 5: Get Results
Results update automatically
Results recalculate automatically as you change inputs (2 seconds after the last change)
RESULTS
Power Curve
Effect Size Guide: What Numbers Should I Use?
What is Effect Size?
Effect size = How big is the difference you want to detect?
Think of it like this: If you're looking for a needle in a haystack...
- Large effect: Looking for a sword (easy to find, need fewer samples)
- Medium effect: Looking for a key (moderate difficulty)
- Small effect: Looking for a needle (hard to find, need MANY samples)
How to Choose Your Effect Size
- Best approach: Use historical data or a pilot study to estimate the real difference
- Six Sigma projects: Start with medium effect size for process improvements
- When unsure: Use small effect size (conservative, ensures adequate power)
- Breakthrough changes: You can expect large effects (e.g., new technology vs old)
Effect Size Reference Tables
Cohen's d - For Comparing Means (t-tests)
What it measures: The difference between two means, expressed in standard deviation units.
Formula: d = (Mean₁ - Mean₂) / Standard Deviation
| Size | d value | Real-World Example |
|---|---|---|
| Small | 0.2 | Height difference between 15- and 16-year-old girls (~0.5 inch) |
| Medium | 0.5 | Height difference between 14- and 18-year-old girls (~1 inch) |
| Large | 0.8 | Height difference between 13- and 18-year-old girls |
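If pilot or historical data are available, d can be computed directly instead of picked from the table. A minimal sketch, assuming the "Standard Deviation" in the formula is the pooled sample standard deviation of the two groups (the tool may use a different denominator); the measurements are hypothetical:

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d with a pooled sample standard deviation (assumed choice)."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


# Hypothetical pilot measurements, for illustration only
before = [10.2, 9.8, 10.5, 10.1, 9.9]
after = [10.9, 10.4, 11.0, 10.6, 10.8]
d = cohens_d(after, before)
print(f"Estimated Cohen's d: {d:.2f}")
```

The sign of d only reflects which group is listed first; for sample size planning the absolute value is what matters.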
Cohen's f - For ANOVA (3+ Groups)
What it measures: The spread of group means relative to within-group variation.
When to use: Comparing 3 or more groups (e.g., 3 machines, 4 suppliers, 5 shift teams).
| Size | f value | Real-World Example |
|---|---|---|
| Small | 0.10 | Subtle difference between 4 suppliers (hard to notice visually) |
| Medium | 0.25 | Noticeable difference between machines (visible in box plots) |
| Large | 0.40 | Obvious difference between methods (anyone can see it) |
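"Spread of group means relative to within-group variation" can be made concrete as a ratio of two standard deviations. A minimal sketch, assuming equal group sizes and a known common within-group standard deviation; the function name and the example numbers are hypothetical:

```python
from math import sqrt


def cohens_f(group_means, within_sd):
    """Cohen's f for equal-sized groups: SD of group means / within-group SD."""
    k = len(group_means)
    grand_mean = sum(group_means) / k
    # Population-style SD of the means (divide by k, not k - 1)
    sd_of_means = sqrt(sum((m - grand_mean) ** 2 for m in group_means) / k)
    return sd_of_means / within_sd


# Hypothetical: three machines with expected means 50, 51, 52 and within-machine SD 3
f = cohens_f([50.0, 51.0, 52.0], within_sd=3.0)
print(f"Cohen's f: {f:.2f}")
```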
Cohen's w - For Chi-Square (Categorical Data)
What it measures: How much the observed proportions differ from expected.
When to use: Contingency tables, testing independence (e.g., defect type vs shift, pass/fail vs supplier).
| Size | w value | Real-World Example |
|---|---|---|
| Small | 0.10 | Slight preference in customer survey (55% vs 45%, against an expected 50/50 split) |
| Medium | 0.30 | Clear pattern in defect distribution (65% vs 35%, against an expected 50/50 split) |
| Large | 0.50 | Strong relationship (e.g., 75% defects from one machine) |
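Cohen's w can be computed from the proportions expected under the null hypothesis and the proportions you expect to observe. A minimal sketch of the standard formula, w = sqrt(sum((observed - expected)² / expected)); the function name and inputs are illustrative:

```python
from math import sqrt


def cohens_w(observed_props, expected_props):
    """Cohen's w from observed and expected cell proportions (each summing to 1)."""
    return sqrt(
        sum((o - e) ** 2 / e for o, e in zip(observed_props, expected_props))
    )


# Hypothetical two-category example against an expected even split
w = cohens_w([0.55, 0.45], [0.50, 0.50])
print(f"Cohen's w: {w:.2f}")
```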
Correlation (r) - For Relationship Strength
What it measures: How strongly two variables move together (ranges from -1 to +1).
When to use: Testing if X and Y are related (e.g., temperature vs yield, training hours vs performance).
| Size | r value | Real-World Example |
|---|---|---|
| Small | ±0.10 | Weak link: coffee consumption vs productivity |
| Medium | ±0.30 | Moderate link: study time vs exam scores |
| Large | ±0.50 | Strong link: height vs weight, practice vs skill |
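To see how the expected correlation drives sample size, the Fisher z approximation is a common shortcut. This is a sketch under that approximation for a two-sided test of r = 0; it is not necessarily the method the calculator uses, and the function name is hypothetical:

```python
from math import atanh, ceil
from statistics import NormalDist


def n_for_correlation(r, alpha=0.05, power=0.80):
    """Approximate n needed to detect correlation r (two-sided, Fisher z method)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return ceil(((z_alpha + z_power) / atanh(abs(r))) ** 2 + 3)


for r in (0.10, 0.30, 0.50):
    print(f"r = {r:.2f}: n is about {n_for_correlation(r)}")
```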
Cohen's f² - For Regression (R² Significance)
What it measures: How much variance in Y is explained by your predictors (X variables).
Formula: f² = R² / (1 - R²)
| Size | f² value | Equivalent R² | Meaning |
|---|---|---|---|
| Small | 0.02 | ~2% | Model explains little variance (but may still be useful) |
| Medium | 0.15 | ~13% | Model explains moderate variance (typical for social sciences) |
| Large | 0.35 | ~26% | Model explains substantial variance (strong predictive model) |
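The formula can be inverted to move between the two scales, which is how the "Equivalent R²" column follows from the f² column. A short sketch with illustrative helper names:

```python
def f2_from_r2(r2):
    """Cohen's f-squared from R-squared."""
    if not 0 <= r2 < 1:
        raise ValueError("R-squared must be in [0, 1)")
    return r2 / (1 - r2)


def r2_from_f2(f2):
    """Inverse of the formula above: R-squared from f-squared."""
    return f2 / (1 + f2)


for f2 in (0.02, 0.15, 0.35):
    print(f"f2 = {f2:.2f} -> R2 = {r2_from_f2(f2):.3f}")
# f2 = 0.02 -> R2 = 0.020
# f2 = 0.15 -> R2 = 0.130
# f2 = 0.35 -> R2 = 0.259
```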
Quick Decision Guide: Which Effect Size Should I Use?
| Situation | Recommendation |
|---|---|
| Don't know what to expect? | Use SMALL (conservative, won't under-power your study) |
| Typical process improvement? | Use MEDIUM (most common in Six Sigma) |
| Major change or new technology? | Use LARGE (breakthrough improvements) |
| Have pilot data or historical data? | CALCULATE your actual expected effect size! |
How Effect Size Impacts Sample Size
Example: Two-sample t-test at α=5%, Power=80%
| Effect Size | Cohen's d | n per group | Total N |
|---|---|---|---|
| Small | 0.2 | 393 | 786 |
| Medium | 0.5 | 64 | 128 |
| Large | 0.8 | 26 | 52 |
Notice: Detecting a small effect (d = 0.2) requires about 15 times as many samples as detecting a large effect (d = 0.8): 393 vs 26 per group.
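The inverse-square relationship between effect size and sample size can be reproduced with the textbook normal approximation, n per group = 2 × ((z₁₋α/₂ + z_power) / d)². This is a sketch only: an exact calculation based on the t distribution gives slightly larger n for medium and large effects, so the approximation can come out one or so below tabulated values.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate n per group for a two-sided, two-sample comparison of means."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) / d) ** 2)


for label, d in (("Small", 0.2), ("Medium", 0.5), ("Large", 0.8)):
    n = n_per_group(d)
    print(f"{label}: d = {d}, n per group is about {n}, total is about {2 * n}")
```

Halving the effect size roughly quadruples the required sample size, which is why small effects are so expensive to detect.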
Analysis
Statistical Analysis
Six Sigma Inferential Statistics Tool
Enter your sample data to calculate confidence intervals or test hypotheses.
Perfect for DMAIC projects when you have collected measurements.
Step 1: Choose Your Analysis
Step 2: Enter Your Sample Data
Step 3: Analysis Settings
Results Summary
Visual Results
Detailed Analysis
Six Sigma Interpretation
Statistical Assumptions
Data Visualization
Plot Mode
Variable Selection
X-Axis Variable(s)
Y-Axis Variable(s) (Optional)
Choose Plot Type
Layout Options
Additional Mappings
Comparison Setup
Distribution Overview
Advanced Visualization
Advanced Plot Setup
Data Requirements
Map Your Data Columns
Additional Mappings
Upload Your Data
Data Preview
Export Plot
Statistical Process Control
Control Chart Selection Guide
Control Rules Selection
Select which rules to detect out-of-control conditions:
• Rule 1: Any point beyond control limits
• Rule 2: Process shift or bias detected
• Rule 3: Systematic trend in process
• Rule 4: Excessive variation or overcontrol
• Rule 5: Points near control limits
• Rule 6: Process moving away from center
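The rule list describes what each rule signals but not its exact definition. The sketch below shows how two such checks are commonly implemented on an individuals chart: points beyond 3-sigma limits (Rule 1) and a run of consecutive points on one side of the center line (a typical "process shift" rule). The run length of 8 and the moving-range sigma estimate are assumptions, not settings confirmed by the tool.

```python
from statistics import mean


def individuals_limits(values):
    """3-sigma limits for an individuals chart, sigma estimated from moving ranges."""
    center = mean(values)
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    sigma = mean(moving_ranges) / 1.128  # d2 constant for moving ranges of size 2
    return center - 3 * sigma, center, center + 3 * sigma


def beyond_limits(values, lcl, ucl):
    """Rule 1: indices of points outside the control limits."""
    return [i for i, v in enumerate(values) if v < lcl or v > ucl]


def run_on_one_side(values, center, run_length=8):
    """Shift check: indices where a run of `run_length` same-side points is reached.

    run_length=8 is an assumed default; references use 7, 8, or 9.
    """
    signals, streak, last_side = [], 0, 0
    for i, v in enumerate(values):
        side = (v > center) - (v < center)  # +1 above, -1 below, 0 on the line
        if side != 0 and side == last_side:
            streak += 1
        else:
            streak = 1 if side != 0 else 0
        last_side = side
        if streak >= run_length:
            signals.append(i)
    return signals
```

In practice the limits would be computed from a stable baseline period and then applied to new data, so that an out-of-control point does not inflate its own limits.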
Download Data Template
Control Charts
Process Statistics
Out of Control Signals
Pareto Analysis
Pareto Analysis Settings
Pareto Chart
Analysis Results
Pareto Summary
80/20 Analysis
Process Capability Analysis
Process Capability Analysis Settings
Process Capability Chart
Capability Metrics
Overall Capability
Potential (Within)
Performance
Z Benchmark
Process Capability Sixpack (Minitab Style)
Normal Probability Plot
Process Performance Metrics
Detailed Capability Analysis Results
Non-Normal Capability Analysis
Non-Normal Process Capability Analysis Settings
Non-Normal Process Capability Chart
Non-Normal Capability Analysis Results
Distribution Fitting Details:
About Non-Normal Capability Analysis
Non-normal capability analysis uses fitted distributions to properly calculate capability indices when data doesn't follow a normal distribution. Standard Cp and Cpk indices can lead to incorrect conclusions with non-normal data.
Metrics Provided:
- Z-bench: Calculates process capability from percentiles of the fitted distribution
- Pp(percentile): Process performance index based on percentiles
- Ppk(percentile): Process performance index taking into account process centering
- PPM (Parts Per Million): Expected defect rates based on the fitted distribution
Distribution Selection:
- Auto (Best Fit): Automatically selects the best-fitting distribution using Anderson-Darling statistic
- Manual Selection: Choose a specific distribution that might be appropriate for your process
Non-normal capability analysis is particularly important for processes with natural skewness, such as those with physical boundaries at zero (e.g., diameter, surface roughness).
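As an illustration of the percentile-based indices, the sketch below fits a lognormal distribution and computes Pp and Ppk from its 0.135th, 50th, and 99.865th percentiles. The percentile definitions follow the common ISO-style method and the fit is a simple log-scale mean and standard deviation; both are assumptions, and the tool's own fitting and formulas may differ. Data and specification limits are hypothetical.

```python
from math import exp, log
from statistics import NormalDist, mean, stdev


def lognormal_percentile_capability(data, lsl, usl):
    """Percentile-based Pp and Ppk from a lognormal fit (data must be > 0)."""
    logs = [log(x) for x in data]
    fitted = NormalDist(mean(logs), stdev(logs))  # normal fit on the log scale
    p_low = exp(fitted.inv_cdf(0.00135))    # 0.135th percentile
    median = exp(fitted.inv_cdf(0.5))       # 50th percentile
    p_high = exp(fitted.inv_cdf(0.99865))   # 99.865th percentile
    pp = (usl - lsl) / (p_high - p_low)
    ppu = (usl - median) / (p_high - median)
    ppl = (median - lsl) / (median - p_low)
    return {"Pp": pp, "Ppk": min(ppu, ppl)}


# Hypothetical surface roughness readings and spec limits
readings = [1.0, 1.2, 0.9, 1.5, 1.1, 1.3, 0.8, 1.0]
result = lognormal_percentile_capability(readings, lsl=0.2, usl=3.0)
print(result)
```

Using the median rather than the mean as the center is what lets the two sides of a skewed distribution be judged against their own tail lengths.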
Distribution Fitting
Distribution Fitting & Identification
Identify the best-fitting distribution for your data - Minitab-style analysis
STEP 1: Select Data
Choose numeric variable to analyze
STEP 2: Select Distributions
Choose distributions to fit (Minitab-style)
Symmetric Distributions
Reliability & Life Data
Right-Skewed Distributions
For Min/Max Data
For Data with Outliers
For Bounded Data
Data Transformations
STEP 3: Options (Optional)
Specification limits & settings
Specification Limits (for Capability)
Advanced Settings
DISTRIBUTION FITTING RESULTS
Distribution Rankings
Legend:
Interpretation Guide
Histogram with Fitted Distributions
Probability Plots
Q-Q Plot (Quantile-Quantile)
P-P Plot (Probability-Probability)
All Distributions - Q-Q Grid
Parameter Estimates
All Parameters Summary:
Percentile Estimates
Inverse Lookup: Find Percentile for a Value
Process Capability (Non-Normal)
Random Data Generation
Generated Data Summary:
Download Data
Histogram of Generated Data:
Distribution Selection Guide
Continuous Data
- Normal: Symmetric, bell-shaped
- Lognormal: Right-skewed, positive values
- Gamma: Right-skewed, waiting times
Reliability/Lifetime
- Weibull: Failure times, bathtub curve
- Exponential: Constant failure rate
- Loglogistic: Accelerated life testing
Extreme Values
- Gumbel: Maximum values
- SEV: Minimum values
- Pareto: Heavy-tailed phenomena
Special Cases
- Beta: Bounded [0,1] proportions
- Uniform: Equal probability
- Cauchy: Heavy tails, no mean
Data Transformation
Data Transformation Tools
Before and After Transformation
Transformation Results
Normality Test Results:
Transformed Specification Limits:
Use these values for normal capability calculations:
One-Way ANOVA Settings
• Select a numeric response variable (continuous outcome)
• Select a categorical factor variable (groups to compare)
• Numeric variables with ≤10 unique values are included as potential factors
• For continuous predictors with >10 values, use regression analysis instead
Two-Way ANOVA Settings
• Select a numeric response variable (continuous outcome)
• Select two different categorical factor variables
• Numeric variables with ≤10 unique values are included as potential factors
• For continuous predictors with >10 values, use regression analysis instead
• Interaction term tests if the effect of one factor depends on the other
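The column-selection rule above can be sketched as a small classifier: non-numeric columns and numeric columns with at most 10 unique values are offered as factors, and numeric columns are offered as responses. The function name, the dict-of-lists input, and the example columns are hypothetical; the tool's actual logic may differ in detail.

```python
def classify_columns(columns, max_levels=10):
    """Split columns (name -> list of values) into candidate factors and responses."""
    factors, responses = [], []
    for name, values in columns.items():
        is_numeric = all(
            isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
        )
        if not is_numeric or len(set(values)) <= max_levels:
            factors.append(name)  # categorical, or numeric with few levels
        if is_numeric:
            responses.append(name)
    return factors, responses


# Hypothetical data set with 12 rows
data = {
    "machine": ["A", "B", "C"] * 4,
    "shift": [1, 2] * 6,
    "yield_pct": [90.0 + 0.5 * i for i in range(12)],
}
factors, responses = classify_columns(data)
print(factors)    # ['machine', 'shift']
print(responses)  # ['shift', 'yield_pct']
```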
Generalized ANOVA Settings
Analysis Results
Generalized ANOVA Information
About Generalized ANOVA
Generalized ANOVA allows you to analyze the relationship between one continuous response variable and multiple factors and/or covariates.
- Factors: Categorical variables (groups)
- Covariates: Continuous variables used as controls
- Interactions: Test whether the effect of one variable depends on another
- Model Types: Choose between ANOVA, Linear Model, or Mixed Effects approaches
Model Interpretation
- Main Effects: Individual contribution of each factor/covariate
- Interaction Effects: Combined effects between variables
- F-statistic: Test of significance for each effect
- p-value < 0.05: Statistically significant effect
Gage R&R (Continuous)
Gage R&R Analysis
Data Input
New to Gage R&R?
Download a template file to see the required data format:
Download Template CSV
The template includes:
- Part column: Unique part identifiers
- Operator column: Operator names/IDs
- Measurement column: Numeric measurements
- Multiple measurements per part-operator combination
Nested: Each operator measures different/unique parts
Gage Evaluation
ANOVA Results
Range Chart
Xbar Chart
These 6 plots mirror Minitab's Gage R&R output. Use them together to diagnose measurement issues.
1. Components of Variation
2. R Chart by Operator
3. Xbar Chart by Operator
4. Measurements by Part
5. Measurements by Operator
6. Operator x Part Interaction
Bootstrap Confidence Intervals on Variance Components
95% BCa bootstrap intervals - more robust than Minitab's Satterthwaite approximation.
Generate Management Report
Create a comprehensive HTML report for management presentation.
Generate & Download Report
Import Previous Study
Upload a previous JSON export to compare trends.
Report Preview
Click 'Generate & Download Report' above to create and download a comprehensive HTML report with all analysis results and charts.
Attribute Agreement Analysis (Gage R&R for Attributes)
Before Regression Analysis
Check your data quality and assumptions before running regression
Regression Diagnostics
Fit a linear model, check all assumptions, download a complete report
Residuals vs Fitted
Normal Q-Q Plot
Scale-Location
Cook's Distance
Correlation Circle Plot
Correlation Heatmap
Regression and Correlation Analysis
Correlation Highlighting
Categorical Variables
Ridge Regression Parameters
Advanced Analysis
Correlation Matrix
High Correlation Warnings
Correlation Visualization
Regression Plot
Regression Summary
Variance Inflation Factor (VIF) - Multicollinearity Analysis
VIF values indicate the degree of multicollinearity. Generally:
- VIF = 1: No correlation with the other predictors
- 1 < VIF < 5: Moderate correlation (acceptable)
- 5 ≤ VIF ≤ 10: High correlation (concerning)
- VIF > 10: Severe multicollinearity (problematic)
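For predictor j, VIF is 1 / (1 - R²ⱼ), where R²ⱼ comes from regressing that predictor on all the others. With only two predictors, R²ⱼ is the squared Pearson correlation between them, which allows a dependency-free sketch (the general case needs a regression per predictor). Names and data are illustrative.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def vif_two_predictors(x1, x2):
    """VIF shared by both predictors in a two-predictor model."""
    r = pearson_r(x1, x2)
    return 1 / (1 - r ** 2)  # undefined when the predictors are perfectly correlated


# Hypothetical predictors: nearly collinear, so the VIF is large
temperature = [1.0, 2.0, 3.0, 4.0, 5.0]
pressure = [1.1, 2.0, 2.9, 4.2, 5.0]
print(f"VIF: {vif_two_predictors(temperature, pressure):.1f}")
```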
Multiple Regression Summary
Variance Inflation Factor (VIF) - Multicollinearity Analysis
VIF values indicate the degree of multicollinearity. Generally:
- VIF = 1: No correlation with the other predictors
- 1 < VIF < 5: Moderate correlation (acceptable)
- 5 ≤ VIF ≤ 10: High correlation (concerning)
- VIF > 10: Severe multicollinearity (problematic)
Pareto Chart of Standardized Effects
Bars extending beyond the reference line indicate statistically significant predictors at α = 0.05
Ridge Regression Summary
Diagnostic Plots
Regression Equation and Model Details
Model Performance
ANOVA Table
Prediction Tool
Enter Values for Prediction
Prediction Results
Logistic Regression Analysis
Binary Logistic Regression Analysis
Model Equation
Business Insights & Strategic Recommendations
Model Summary
Model Performance Metrics
Model Coefficients & Business Impact Analysis
Odds Ratios & Strategic Impact
Confusion Matrix
Classification Metrics
Diagnostic Plots
ROC Curve Analysis
ROC Statistics
Strategic Scenario Planning Tool
Enter Values for Prediction
Prediction Results
Design of Experiments (DOE) Analysis
Taguchi Design of Experiments
Robust parameter design for process optimization using orthogonal arrays and signal-to-noise ratios
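The signal-to-noise ratios mentioned above are usually computed per run of the orthogonal array from its replicate measurements. A minimal sketch of the three standard Taguchi formulas (in decibels); which of them the tool offers is an assumption, and the function names are illustrative:

```python
from math import log10
from statistics import mean, variance


def sn_smaller_is_better(y):
    """S/N = -10 * log10(mean(y^2)); use when the target is zero."""
    return -10 * log10(mean([v ** 2 for v in y]))


def sn_larger_is_better(y):
    """S/N = -10 * log10(mean(1 / y^2)); use when bigger responses are better."""
    return -10 * log10(mean([1 / v ** 2 for v in y]))


def sn_nominal_is_best(y):
    """S/N = 10 * log10(mean^2 / variance); use when hitting a target matters."""
    return 10 * log10(mean(y) ** 2 / variance(y))


# Hypothetical replicate measurements for one experimental run
run = [9.0, 10.0, 11.0]
print(f"Nominal-is-best S/N: {sn_nominal_is_best(run):.1f} dB")
```

In all three cases a higher S/N ratio is better, so factor levels are chosen to maximize it.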
CI