
What This Node Does

The Describe Data node generates statistical summaries and profiles of your dataset. It analyzes numeric columns (mean, median, min/max, and configurable quantiles such as quartiles) and categorical columns (the most frequent values and their frequencies), giving you a quick overview of each column’s characteristics and distribution.
[SCREENSHOT: Describe Data node showing statistical summary output]
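As a rough mental model of what the node computes, the sketch below profiles a small column-oriented dataset in plain Python. It is not the node’s implementation: the dict-of-lists layout (`Table`), `None` as the missing-value marker, the rule that a column is numeric when all of its non-missing values are numbers, and the helper names `is_numeric` and `describe` are all assumptions made for illustration.

```python
from collections import Counter
from statistics import mean, median

# Assumed layout (hypothetical): column name -> list of values, with None marking a missing value.
Table = dict[str, list]


def is_numeric(values: list) -> bool:
    """Assumption: a column is numeric if every non-missing value is an int or float (bools excluded)."""
    present = [v for v in values if v is not None]
    return bool(present) and all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in present
    )


def describe(table: Table) -> dict[str, dict]:
    """Build a per-column summary loosely modeled on the statistics the node reports."""
    summary = {}
    for name, values in table.items():
        present = [v for v in values if v is not None]
        stats = {"count": len(present), "missing": len(values) - len(present)}
        if is_numeric(values):
            stats.update(
                kind="numeric",
                mean=mean(present),
                median=median(present),
                min=min(present),
                max=max(present),
            )
        else:
            # Everything else is profiled as categorical: distinct values and their frequencies.
            counts = Counter(present)
            stats.update(kind="categorical", unique=len(counts), frequencies=dict(counts))
        summary[name] = stats
    return summary
```

For example, `describe({"price": [3, 5, None, 10], "city": ["Oslo", "Lima", "Oslo", None]})` treats `price` as numeric (three values, one missing) and `city` as categorical, with `Oslo` appearing twice.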

When to Use This Node

Use the Describe Data node when you need to:
  • Understand your data - Get a quick statistical overview of all columns
  • Profile numeric columns - See the mean, median, quartiles, and min/max of numeric data
  • Analyze categories - Identify the most frequent values in categorical columns
  • Check data quality - Spot outliers, missing-value patterns, or unusual distributions
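For the data-quality use case, one common way to turn quartiles into outlier flags is Tukey’s fences: values below Q1 - 1.5 × IQR or above Q3 + 1.5 × IQR are treated as suspicious. The node is not documented to do this itself; the hypothetical `quality_flags` helper below only shows how you might read quartile and missing-count output. Its quartiles come from Python’s `statistics.quantiles`, whose interpolation may differ from the node’s, and `None` is assumed to mark a missing value.

```python
from statistics import quantiles


def quality_flags(values: list, k: float = 1.5) -> dict:
    """Hypothetical check: count missing values and flag points outside Tukey's fences.

    The fences are Q1 - k*IQR and Q3 + k*IQR, where IQR = Q3 - Q1 and k = 1.5 by convention.
    """
    present = [v for v in values if v is not None]  # None is assumed to mark a missing value
    flags = {"missing": len(values) - len(present), "outliers": []}
    # statistics.quantiles needs at least two data points on Python versions before 3.13.
    if len(present) >= 2:
        q1, _, q3 = quantiles(present, n=4)  # quartile cut points: Q1, median, Q3
        iqr = q3 - q1
        low, high = q1 - k * iqr, q3 + k * iqr
        flags["outliers"] = [v for v in present if v < low or v > high]
    return flags
```

For example, `quality_flags([1, 2, 3, 4, 5, 6, 7, 8, None, 100])` reports one missing value and flags `100` as an outlier.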

Step-by-Step Usage Guide

1. Add the Describe Data node

2. Connect your input data

3. Configure quantiles
Set the number of quantiles for numeric columns, that is, how many equal-sized groups to split each column into (default: 2, which reports the median).
[SCREENSHOT: Quantiles configuration]

4. Configure top K
Set how many of the most frequent values to show for categorical columns (default: 3).
[SCREENSHOT: Top K configuration]

5. Review the statistical summary
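The quantile and top K settings from steps 3 and 4 can be approximated with the Python standard library. In this sketch, the number of quantiles is read as the number of equal-sized groups, so the function returns one fewer cut point than the setting (2 gives the median, 4 gives quartiles, 10 gives deciles). The helper names `quantile_cut_points` and `top_k_values` are hypothetical, and `statistics.quantiles` uses its default `exclusive` interpolation, which may not match the node’s method.

```python
from collections import Counter
from statistics import quantiles


def quantile_cut_points(values: list, n_quantiles: int = 2) -> list:
    """Return the n_quantiles - 1 cut points that split the data into n_quantiles groups.

    2 -> [median], 4 -> [Q1, median, Q3], 10 -> deciles.
    Assumption: statistics.quantiles' default 'exclusive' interpolation; the node may differ.
    """
    present = [v for v in values if v is not None]  # ignore missing values (None)
    return quantiles(present, n=n_quantiles)


def top_k_values(values: list, k: int = 3) -> list:
    """Return up to k (value, count) pairs for the most frequent non-missing values."""
    counts = Counter(v for v in values if v is not None)
    # Values with equal counts keep first-seen order, per Counter.most_common's documented behavior.
    return counts.most_common(k)
```

How ties at the K-th position are broken is also an assumption here: `Counter.most_common` keeps equally frequent values in the order they were first seen, so raising K is the simplest way to see values that would otherwise be cut off.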

Tips and Best Practices

Use Early in Workflows: Add Describe Data right after your Input node to understand the data before applying any transformations.
Adjust Quantiles for Detail: Use 4 quantiles for quartile analysis (25th, 50th, and 75th percentiles). Use 10 for decile analysis (10th through 90th percentiles).
Top K for Categories: For high-cardinality columns, increase Top K to 5 to 10 to see more of the most common values.
Spot Data Issues: Look for unexpected minimum or maximum values, unusual distributions, or patterns of missing data.
Compare Before/After: Run Describe Data before and after cleaning steps to verify that the transformations had the intended effect.
Not for ML Preprocessing: This node only describes data; it doesn’t transform it. For actual data preparation, use Formula, Convert, or other transformation nodes.
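The Compare Before/After tip amounts to diffing two summaries of the same column. The sketch below assumes a hypothetical cleaning step that drops missing values and a `-999` placeholder, and uses hypothetical helpers `numeric_summary` and `compare_summaries`; in a real workflow, the two summaries would come from Describe Data nodes placed before and after your transformation nodes.

```python
from statistics import mean


def numeric_summary(values: list) -> dict:
    """Hypothetical helper: a few statistics worth comparing before and after cleaning."""
    present = [v for v in values if v is not None]  # None is assumed to mark a missing value
    return {
        "count": len(present),
        "missing": len(values) - len(present),
        "min": min(present) if present else None,
        "max": max(present) if present else None,
        "mean": mean(present) if present else None,
    }


def compare_summaries(before: dict, after: dict) -> dict:
    """Hypothetical helper: map each statistic whose value changed to a (before, after) pair."""
    return {key: (before[key], after.get(key)) for key in before if before[key] != after.get(key)}


# Hypothetical cleaning step: drop missing values and a -999 placeholder.
raw = [12, -999, 15, None, 18]
cleaned = [v for v in raw if v is not None and v != -999]
changes = compare_summaries(numeric_summary(raw), numeric_summary(cleaned))
```

Here `max` drops out of `changes` because cleaning did not affect it, while the shifts in `min` and `mean` show that the placeholder was distorting the summary.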