Aggregate
The Aggregate component consolidates data by applying functions such as sum, average, count, min, max, and others to combine multiple rows into single outputs. It efficiently extracts key metrics, trends, or summaries from large datasets.
It supports various data types, including numeric, textual, and date-based data, making it adaptable for a wide range of analytical tasks.
Additionally, users can optionally group rows by a category column.
Configuration
Upon clicking the Aggregate node, users are presented with the following fields in the configuration section:
Defining and Naming New Aggregate Columns
Users can define new aggregate columns by selecting columns and specifying aggregate functions. This section contains three fields:
Aggregate Type: Users can select appropriate aggregate type
Count
Count Distinct Values
First Value
Last Value
Max
Min
Sum
Mean (Average)
Select Column: Users can select the column for which they want to apply the aggregate component.
Alias Name: Users can provide a name for the new aggregated column.
Clicking on the + Add Selector button allows selection of multiple columns.
Selecting columns for grouping (Optional)
When columns are selected for grouping, the previously defined aggregate operations (such as sum, average, count) will be applied based on these grouping columns. This organizes the data into groups according to the selected columns, and the specified aggregate operations are then performed on each group separately.
Select group by pass through (Optional)
Regardless of the columns used for grouping or aggregation, all selected columns in this group by pass through will be displayed in the results.
If no columns are selected for grouping, and no columns are included for the group by pass-through, only the resultant aggregated new column will be displayed in the output.
Example Usage
Let's consider a scenario where we have a dataset containing information about sales transactions at a retail store. Each row represents a transaction, and the columns include the transaction ID, product name, quantity sold, and the total sale amount.
Dataset
Transaction ID | Product Name | Sale Amount | Sale Date |
|---|---|---|---|
1 | Laptop | 1200 | 05-01-2023 |
2 | Smartphone | 800 | 10-01-2023 |
3 | Tablet | 500 | 15-01-2023 |
4 | Laptop | 1500 | 20-01-2023 |
5 | Headphones | 100 | 25-01-2023 |
Here's how we can apply each aggregate function:
Count
Problem Statement: To determine the total number of transactions for a particular product.
Using the COUNT function, we can count the number of rows, which gives us the total transaction count.
This can be of two ways:
Enabling Group By Pass Through
Transaction ID | Product Name | Sale Amount | Sale Date | Transaction Count |
|---|---|---|---|---|
1 | Laptop | 1200 | 05-01-2023 | 2 |
2 | Smartphone | 800 | 10-01-2023 | 1 |
3 | Tablet | 500 | 15-01-2023 | 1 |
4 | Laptop | 1500 | 20-01-2023 | 2 |
5 | Headphones | 100 | 25-01-2023 | 1 |
Disabling Group By Pass Through
Product Name | Transaction Count |
|---|---|
Smartphone | 1 |
Headphones | 1 |
Tablet | 1 |
Laptop | 2 |
Count Distinct Values
Problem Statement: To determine the number of unique products sold.
By applying the COUNT_DISTINCT_VALUES function on the "product name" column and selecting the same column in the GROUP BY section, we can calculate the total number of unique products sold.
Enabling Group By Pass Through
Transaction ID | Product Name | Sale Amount | Sale Date | Unique Count |
|---|---|---|---|---|
1 | Laptop | 1200 | 05-01-2023 | 1 |
2 | Smartphone | 800 | 10-01-2023 | 1 |
3 | Tablet | 500 | 15-01-2023 | 1 |
4 | Laptop | 1500 | 20-01-2023 | 1 |
5 | Headphones | 100 | 25-01-2023 | 1 |
Disabling Group By Pass Through
Product Name | Unique Count |
|---|---|
Tablet | 1 |
Laptop | 1 |
Smartphone | 1 |
Headphones | 1 |
First Value and Last Value
Problem Statement: To retrieve the initial and final transaction IDs for a specific product.
We can utilize the FIRST_VALUE and LAST_VALUE functions on the "transaction ID" column, with the product name selected as the group by column. This allows us to pinpoint the first and last transaction IDs, respectively.
Enabling Group By Pass Through
Transaction ID | Product Name | Sale Amount | Sale Date | First Transaction | Last Transaction |
|---|---|---|---|---|---|
1 | Laptop | 1200 | 05-01-2023 | 1 | 4 |
2 | Smartphone | 800 | 10-01-2023 | 2 | 2 |
3 | Tablet | 500 | 15-01-2023 | 3 | 3 |
4 | Laptop | 1500 | 20-01-2023 | 1 | 4 |
5 | Headphones | 100 | 25-01-2023 | 5 | 5 |
Disabling Group By Pass Through
Product Name | First Transaction | Last Transaction |
|---|---|---|
Headphones | 5 | 5 |
Tablet | 3 | 3 |
Laptop | 1 | 4 |
Smartphone | 2 | 2 |
MAX and MIN
Problem Statement: To determine the highest and lowest sales amount for any product.
By applying the MAX function on the "Sale Amount" column, we can find the maximum Sale Amount, and similarly, by applying the MIN function, we can find the minimum Sale Amount.
Enabling Group By Pass Through
Transaction ID | Product Name | Sale Amount | Sale Date | Min Sale Amount | Max Sale Amount |
|---|---|---|---|---|---|
1 | Laptop | 1200 | 05-01-2023 | 1200 | 4 |
2 | Smartphone | 800 | 10-01-2023 | 800 | 2 |
3 | Tablet | 500 | 15-01-2023 | 500 | 3 |
4 | Laptop | 1500 | 20-01-2023 | 1200 | 4 |
5 | Headphones | 100 | 25-01-2023 | 100 | 5 |
Disabling Group By Pass Through
Product Name | Min Sale Amount | Max Sale Amount |
|---|---|---|
Smartphone | 800 | 2 |
Headphones | 100 | 5 |
Laptop | 1200 | 4 |
Tablet | 500 | 3 |
SUM
Problem Statement: To calculate the total sale amount generated from all transactions of a particular "product name".
By applying SUM function on the "Sale Amount" column and selecting "product name" column in GROUP BY section. This allows us to sum up the sale amounts across all transactions, providing the total sale amount of that particular product.
Enabling Group By Pass Through
Transaction ID | Product Name | Sale Amount | Sale Date | Sum Amount |
|---|---|---|---|---|
1 | Laptop | 1200 | 05-01-2023 | 2700 |
2 | Smartphone | 800 | 10-01-2023 | 800 |
3 | Tablet | 500 | 15-01-2023 | 500 |
4 | Laptop | 1500 | 20-01-2023 | 2700 |
5 | Headphones | 100 | 25-01-2023 | 100 |
Disabling Group By Pass Through
Product Name | Sum Amount |
|---|---|
Laptop | 2700 |
Headphones | 100 |
Tablet | 500 |
Smartphone | 800 |
MEAN
Problem Statement: To compute the average sale amount for a specific product.
We apply the MEAN function on the "Sale Amount" column and select the "product name" column in the GROUP BY section. This allows us to calculate the average sale amount for each product.
Enabling Group By Pass Through
Transaction ID | Product Name | Sale Amount | Sale Date | Mean Amount |
|---|---|---|---|---|
1 | Laptop | 1200 | 05-01-2023 | 1350 |
2 | Smartphone | 800 | 10-01-2023 | 800 |
3 | Tablet | 500 | 15-01-2023 | 500 |
4 | Laptop | 1500 | 20-01-2023 | 1350 |
5 | Headphones | 100 | 25-01-2023 | 100 |
Disabling Group By Pass Through
Product Name | Mean Amount |
|---|---|
Tablet | 500 |
Headphones | 100 |
Smartphone | 800 |
Laptop | 1350 |