
What This Node Does

The Aggregation node groups rows by one or more columns and calculates summary statistics such as sum, average, count, min, and max. It’s the workflow equivalent of SQL’s GROUP BY clause, and it is essential for creating reports, dashboards, and analytics. [SCREENSHOT: Aggregation node on canvas showing “10,000 rows → 50 groups”]
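As a rough illustration of what "group, then summarize" means, the sketch below does the same thing in plain Python. The sample rows and the column names region and amount are invented for the example; this is not the node's implementation.

```python
from collections import defaultdict

# Hypothetical input rows; column names are illustrative only.
rows = [
    {"region": "East", "amount": 100},
    {"region": "West", "amount": 250},
    {"region": "East", "amount": 50},
    {"region": "West", "amount": 150},
    {"region": "East", "amount": 30},
]

# Collect the amounts for each region (the "group by" step).
groups = defaultdict(list)
for row in rows:
    groups[row["region"]].append(row["amount"])

# Compute one summary row per group (the "aggregate" step).
summary = [
    {
        "region": region,
        "total_amount": sum(amounts),
        "avg_amount": sum(amounts) / len(amounts),
        "order_count": len(amounts),
        "min_amount": min(amounts),
        "max_amount": max(amounts),
    }
    for region, amounts in groups.items()
]

for line in summary:
    print(line)
```

Five input rows collapse into two output rows, one per region, in the same way the node turns many rows into a smaller number of groups.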

When to Use This Node

Use the Aggregation node when you need to:
  • Summarize data - Calculate totals, averages, counts across groups
  • Create reports - Generate “Sales by Region”, “Orders by Month”, “Revenue by Product”
  • Prepare dashboard data - Aggregate metrics for chart widgets
  • Find patterns - Identify top customers, peak sales periods, popular products

Step-by-Step Usage Guide

1. Add Aggregation node to canvas

2. Connect to upstream data

3. Select Group By columns

Choose one or more columns to group by (e.g., region, product_category). Leave empty for a single grand total row. [SCREENSHOT: Group By section with column selection]

4. Add aggregations

For each metric, select the column, aggregate function (SUM, AVG, COUNT, etc.), and output name. [SCREENSHOT: Aggregation configuration showing multiple functions]

5. Preview aggregated results
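The configuration in steps 3 and 4 (group-by columns plus a list of column, function, and output name) can be sketched in plain Python as follows. The aggregate function, the FUNCTIONS table, and the sample data are hypothetical names made up for this example, and None stands in for NULL. Skipping None in AVG, MIN, and MAX is an assumption of the sketch, not documented node behavior.

```python
from collections import defaultdict

# Illustrative aggregate functions operating on the non-None values of a group.
# COUNT skips None and SUM adds nothing for it; AVG/MIN/MAX skipping None is an assumption.
FUNCTIONS = {
    "SUM": lambda values: sum(values),
    "AVG": lambda values: sum(values) / len(values) if values else None,
    "COUNT": lambda values: len(values),
    "MIN": lambda values: min(values) if values else None,
    "MAX": lambda values: max(values) if values else None,
}


def aggregate(rows, group_by, aggregations):
    """Group rows by the group_by columns and apply each aggregation.

    aggregations is a list of (column, function, output_name) tuples.
    An empty group_by puts every row into one group (a grand total).
    """
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[col] for col in group_by)
        groups[key].append(row)

    results = []
    for key, members in groups.items():
        out = dict(zip(group_by, key))
        for column, function, output_name in aggregations:
            values = [m[column] for m in members if m[column] is not None]
            out[output_name] = FUNCTIONS[function](values)
        results.append(out)
    return results


# Hypothetical sample data and metric definitions.
orders = [
    {"region": "East", "product_category": "Toys", "sales_amount": 100},
    {"region": "East", "product_category": "Toys", "sales_amount": 50},
    {"region": "West", "product_category": "Books", "sales_amount": 200},
    {"region": "West", "product_category": "Books", "sales_amount": None},
]
metrics = [
    ("sales_amount", "SUM", "total_revenue"),
    ("sales_amount", "COUNT", "order_count"),
]

by_group = aggregate(orders, ["region", "product_category"], metrics)
grand_total = aggregate(orders, [], metrics)
print(by_group)
print(grand_total)
```

With two group-by columns the result has one row per region and product_category combination; with an empty group-by list the result is a single grand total row.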

Tips and Best Practices

Filter Before Aggregating: Apply Filter nodes before Aggregation to reduce data volume. When the filter applies to input rows, Filter → Aggregate is usually faster than Aggregate → Filter because fewer rows have to be grouped.
Descriptive Output Names: Use clear names like total_revenue instead of sum_sales_amount. This makes dashboards and reports easier to understand.
Handle NULLs: COUNT ignores NULL values, but SUM treats them as zero. Filter out NULLs before aggregating if this affects your calculations.
Sort After Aggregate: Add a Sort node after Aggregation to order results by your metrics (e.g., highest to lowest sales).
Date Grouping: Extract date parts (year, month, quarter) using a Formula node before aggregating. Group by order_year instead of full timestamps: timestamps are often nearly unique, so grouping on them can produce almost one group per row and performs worse.
Fewer Groups = Faster: Grouping by columns with fewer unique values (region: 4 values) is faster than high-cardinality columns (customer_id: 1M values).
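Several of the tips above can be combined into one pipeline: filter, derive a date part, aggregate, then sort. The following is a minimal plain-Python sketch of that order of operations. The order rows and the column names order_ts, status, and amount are invented for the example, and None stands in for NULL.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical order rows; column names are illustrative only.
orders = [
    {"order_ts": datetime(2023, 3, 14, 9, 30), "status": "paid", "amount": 120.0},
    {"order_ts": datetime(2023, 11, 2, 16, 5), "status": "paid", "amount": 80.0},
    {"order_ts": datetime(2024, 1, 20, 11, 0), "status": "cancelled", "amount": 300.0},
    {"order_ts": datetime(2024, 6, 8, 13, 45), "status": "paid", "amount": None},
    {"order_ts": datetime(2024, 9, 1, 8, 15), "status": "paid", "amount": 450.0},
]

# 1. Filter first: drop rows that should not be counted, including NULL amounts.
kept = [o for o in orders if o["status"] == "paid" and o["amount"] is not None]

# 2. Formula step: derive a coarse grouping column from the timestamp.
with_year = [{**o, "order_year": o["order_ts"].year} for o in kept]

# 3. Aggregate: one output row per order_year.
totals = defaultdict(float)
counts = defaultdict(int)
for o in with_year:
    totals[o["order_year"]] += o["amount"]
    counts[o["order_year"]] += 1

summary = [
    {"order_year": year, "total_revenue": totals[year], "order_count": counts[year]}
    for year in totals
]

# 4. Sort after aggregating: highest revenue first.
summary.sort(key=lambda r: r["total_revenue"], reverse=True)
print(summary)
```

Grouping on order_year yields two groups here, whereas grouping on the raw order_ts values would yield one group per row.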