What This Node Does
The Describe Data node generates statistical summaries and profiles of your dataset. It analyzes numeric columns (mean, median, quartiles) and categorical columns (top values, frequencies), providing a comprehensive overview of your data’s characteristics and distributions.
[SCREENSHOT: Describe Data node showing statistical summary output]
When to Use This Node
Use the Describe Data node when you need to:
- Understand your data - Get a quick statistical overview of all columns
- Profile numeric columns - See mean, median, quartiles, min/max for numeric data
- Analyze categories - Identify most frequent values in categorical columns
- Check data quality - Spot outliers, missing-data patterns, or unusual distributions
Step-by-Step Usage Guide
1. Add Describe Data node
2. Connect input data
3. Configure quantiles
   Set the number of quantiles for numeric columns (default: 2, which yields the median).
   [SCREENSHOT: Quantiles configuration]
4. Configure top K
   Set the number of top values to show for categorical columns (default: 3).
   [SCREENSHOT: Top K configuration]
5. Review statistical summary
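To make the quantiles and top K settings from steps 3 and 4 concrete, here is a minimal Python sketch of the same ideas outside the tool. It is an illustration only: the function names `quantile_cut_points` and `top_k_values` and the sample data are hypothetical, and the node's own interpolation method for quantiles is not documented here, so its numbers may differ slightly from Python's default.

```python
from collections import Counter
from statistics import quantiles


def quantile_cut_points(values, n_quantiles=2):
    """Return the n_quantiles - 1 cut points that split values into equal-sized groups."""
    # Assumption: Python's default "exclusive" method; the node may interpolate differently.
    return quantiles(values, n=n_quantiles)


def top_k_values(values, top_k=3):
    """Return the top_k most frequent values together with their counts."""
    return Counter(values).most_common(top_k)


# Made-up sample columns, for illustration only.
ages = [1, 2, 3, 4, 5, 6, 7, 8, 9]
colors = ["red", "blue", "red", "green", "red", "blue"]

print(quantile_cut_points(ages))      # 2 quantiles: one cut point, the median -> [5.0]
print(quantile_cut_points(ages, 4))   # 4 quantiles: the quartiles -> [2.5, 5.0, 7.5]
print(top_k_values(colors, top_k=2))  # -> [('red', 3), ('blue', 2)]
```

Note that a setting of N quantiles produces N - 1 cut points, which is why the default of 2 reports a single value (the median) and 4 reports the 25th, 50th, and 75th percentiles.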
Tips and Best Practices
Use Early in Workflows: Add Describe Data right after Input to understand data before transformations.
Adjust Quantiles for Detail: Use 4 quantiles for quartile analysis (25th, 50th, 75th percentiles). Use 10 for decile analysis.
Top K for Categories: Increase Top K to 5-10 for high-cardinality columns to see more of the most frequent values.
Spot Data Issues: Look for unexpected mins/maxes, unusual distributions, or missing data patterns.
Compare Before/After: Use Describe Data before and after cleaning steps to verify transformations worked correctly.
Not for ML Preprocessing: This node describes data but doesn’t transform it. Use Formula, Convert, or other transformation nodes for actual data prep.
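The Compare Before/After tip can be pictured with a small Python sketch: profile a column before and after a cleaning step, then list only the statistics that changed. The helpers `profile` and `changed_stats`, the sample values, and the use of 999 as a bad-reading sentinel are all hypothetical; this is not how the node itself is implemented.

```python
from statistics import mean


def profile(values):
    """Summarize one numeric column; None marks a missing value."""
    present = [v for v in values if v is not None]
    return {
        "count": len(values),
        "missing": len(values) - len(present),
        "min": min(present),
        "max": max(present),
        "mean": mean(present),
    }


def changed_stats(before, after):
    """Return only the statistics that differ, as (before, after) pairs."""
    return {k: (before[k], after[k]) for k in before if before[k] != after[k]}


# Made-up column: one missing value and one sentinel (999) standing in for a bad reading.
raw = [12, 15, None, 14, 999, 13]
cleaned = [v for v in raw if v is not None and v != 999]

diff = changed_stats(profile(raw), profile(cleaned))
print(diff)  # count, missing, max, and mean change; min stays at 12
```

If the cleaning step worked, the missing count drops to zero and the suspicious maximum disappears, while statistics that should be unaffected (here, the minimum) stay the same.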

