What This Node Does
The Distinct node removes duplicate rows from your dataset, keeping only unique rows. By default it compares all columns, or it can compare only the specific columns you choose. When duplicates are found, you can choose whether the first or last occurrence is kept.
[SCREENSHOT: Distinct node on canvas showing “10,000 → 8,543 rows (1,457 duplicates removed)”]
When to Use This Node
Use the Distinct node when you need to:
- Remove duplicate records: Clean datasets with accidentally duplicated rows
- Find unique values: Get a list of unique customers, products, or categories
- Deduplicate before joins: Remove duplicate keys to prevent Cartesian explosions (multiplied rows) in joins
- Clean imported data: Remove duplicates from CSV uploads or sync connectors
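To see why duplicates inflate joins: in an inner join, each key produces (matching rows on the left) × (matching rows on the right) output rows. The plain-Python sketch below counts join output rows for a single key column; the `join_row_count` function and the sample data are hypothetical and are not part of the product.

```python
from collections import Counter


def join_row_count(left_keys, right_keys):
    """Count the rows an inner join on one key column would produce.

    Each key contributes (left matches x right matches) rows,
    so a duplicated key on either side multiplies the result.
    """
    left = Counter(left_keys)
    right = Counter(right_keys)
    return sum(n * right[k] for k, n in left.items() if k in right)


orders = ["c1", "c1", "c1", "c2"]  # customer id on each order
customers = ["c1", "c1", "c2"]     # customer table with an accidental duplicate of c1

print(join_row_count(orders, customers))                       # 3*2 + 1*1 = 7
print(join_row_count(orders, list(dict.fromkeys(customers))))  # 3*1 + 1*1 = 4
```

Deduplicating the customer table first (here with `dict.fromkeys`, which keeps first occurrences in order) brings the join back to one output row per order.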
Step-by-Step Usage Guide
1. Add the Distinct node to the canvas.
2. Connect it to the upstream data.
3. Choose All Columns or Specific Columns mode.
   All Columns: A row is a duplicate only if ALL columns match.
   Specific Columns: A row is a duplicate if the selected columns match.
   [SCREENSHOT: Distinct mode selection]
4. Select columns (if using Specific Columns mode).
   Check the columns that define uniqueness.
   [SCREENSHOT: Column selection for distinct]
5. Preview the deduplicated results.
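To make the two modes and the first/last option concrete, here is a minimal Python sketch of the deduplication logic. It assumes rows are dictionaries with hashable values. The `distinct` function and its `columns` and `keep` parameters are hypothetical names for illustration, not the node's actual implementation.

```python
def distinct(rows, columns=None, keep="first"):
    """Hypothetical model of the Distinct node.

    rows:    list of dicts, one per row
    columns: None for All Columns mode, or a list of key columns
             for Specific Columns mode
    keep:    "first" or "last" occurrence of each duplicate group
    """
    if keep not in ("first", "last"):
        raise ValueError("keep must be 'first' or 'last'")
    # For "last", scan from the end so the last occurrence is seen first.
    ordered = rows if keep == "first" else list(reversed(rows))
    seen = set()
    result = []
    for row in ordered:
        cols = columns if columns is not None else sorted(row)
        key = tuple((c, row[c]) for c in cols)  # values that define uniqueness
        if key not in seen:
            seen.add(key)
            result.append(row)
    # Restore the original row order for the kept rows.
    return result if keep == "first" else list(reversed(result))


orders = [
    {"customer": "c1", "city": "Oslo", "amount": 10},
    {"customer": "c1", "city": "Oslo", "amount": 10},  # exact duplicate
    {"customer": "c1", "city": "Bergen", "amount": 25},
    {"customer": "c2", "city": "Oslo", "amount": 5},
]

print(len(distinct(orders)))  # 3: only the exact duplicate is dropped
print([r["amount"] for r in distinct(orders, ["customer"])])               # [10, 5]
print([r["amount"] for r in distinct(orders, ["customer"], keep="last")])  # [25, 5]
```

All Columns mode drops only the exact duplicate. Specific Columns mode on `customer` keeps one row per customer, and `keep` decides which of the c1 rows survives.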
Tips and Best Practices
Sort Before Distinct: To control which duplicate row is kept, sort first. For example, Sort by date (descending) → Distinct (keeping the first occurrence) → the most recent row for each key is kept.
Specific Columns for Unique Values: To find unique values in one column, use Specific Columns mode with just that column selected.
All Columns for Exact Duplicates: Use All Columns mode to remove only exact duplicate rows (every column matches).
Distinct Before Aggregation: Remove duplicates before aggregating so that duplicated rows don't inflate COUNT (or SUM) results.
Check Duplicate Count: After running, check how many duplicates were removed. If the count is 0, verify that you selected the right columns; the data may also simply contain no duplicates.
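Use with Select: Distinct runs faster when it compares fewer columns. Use Select before Distinct to keep only the columns you need. In All Columns mode, this also changes which rows count as duplicates, because dropped columns are no longer compared.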
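The Sort → Distinct tip and the duplicate-count check can be sketched together in Python. This sketch assumes Distinct keeps the first occurrence of each key. The `keep_latest_per_key` helper and the sample data are hypothetical.

```python
from datetime import date


def keep_latest_per_key(rows, key, date_col):
    """Hypothetical helper: sort by date descending, then keep the first row per key."""
    # Sort step: newest rows first.
    ordered = sorted(rows, key=lambda r: r[date_col], reverse=True)
    seen = set()
    kept = []
    for row in ordered:  # Distinct step: the first occurrence of each key wins
        if row[key] not in seen:
            seen.add(row[key])
            kept.append(row)
    return kept


signups = [
    {"email": "a@example.com", "plan": "free", "updated": date(2024, 1, 5)},
    {"email": "a@example.com", "plan": "pro", "updated": date(2024, 3, 9)},
    {"email": "b@example.com", "plan": "free", "updated": date(2024, 2, 1)},
]

latest = keep_latest_per_key(signups, "email", "updated")
removed = len(signups) - len(latest)
print([r["plan"] for r in latest])  # ['pro', 'free']
print(f"{len(signups)} → {len(latest)} rows, {removed} duplicate(s) removed")
```

Because the newest row for each email comes first after sorting, it is the one Distinct keeps. A removed count of 0 here would suggest the wrong key column was chosen.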

