xflow Help

Aggregate

The Aggregate component consolidates data by applying functions such as sum, average, count, min, max, and others to combine multiple rows into single outputs. It efficiently extracts key metrics, trends, or summaries from large datasets.

  • It supports various data types, including numeric, textual, and date-based data, making it adaptable for a wide range of analytical tasks.

  • Additionally, users can optionally group rows by a category column.

Configuration

Upon clicking the Aggregate node, users are presented with the following fields in the configuration section:

Defining and Naming New Aggregate Columns

Users can define new aggregate columns by selecting columns and specifying aggregate functions. This section contains three fields:

Aggregate Type: Users can select appropriate aggregate type

  • Count

  • Count Distinct Values

  • First Value

  • Last Value

  • Max

  • Min

  • Sum

  • Mean (Average)

Select Column: Users can select the column for which they want to apply the aggregate component.

Alias Name: Users can provide a name for the new aggregated column.

  • Clicking on the + Add Selector button allows selection of multiple columns.

Selecting columns for grouping (Optional)

When columns are selected for grouping, the previously defined aggregate operations (such as sum, average, count) will be applied based on these grouping columns. This organizes the data into groups according to the selected columns, and the specified aggregate operations are then performed on each group separately.

Select group by pass through (Optional)

Regardless of the columns used for grouping or aggregation, all selected columns in this group by pass through will be displayed in the results.

If no columns are selected for grouping, and no columns are included for the group by pass-through, only the resultant aggregated new column will be displayed in the output.

Example Usage

Let's consider a scenario where we have a dataset containing information about sales transactions at a retail store. Each row represents a transaction, and the columns include the transaction ID, product name, quantity sold, and the total sale amount.

Dataset

Transaction ID

Product Name

Sale Amount

Sale Date

1

Laptop

1200

05-01-2023

2

Smartphone

800

10-01-2023

3

Tablet

500

15-01-2023

4

Laptop

1500

20-01-2023

5

Headphones

100

25-01-2023

Here's how we can apply each aggregate function:

Count

Problem Statement: To determine the total number of transactions for a particular product.

  • Using the COUNT function, we can count the number of rows, which gives us the total transaction count.

This can be of two ways:

Enabling Group By Pass Through

Transaction ID

Product Name

Sale Amount

Sale Date

Transaction Count

1

Laptop

1200

05-01-2023

2

2

Smartphone

800

10-01-2023

1

3

Tablet

500

15-01-2023

1

4

Laptop

1500

20-01-2023

2

5

Headphones

100

25-01-2023

1

Disabling Group By Pass Through

Product Name

Transaction Count

Smartphone

1

Headphones

1

Tablet

1

Laptop

2

Count Distinct Values

Problem Statement: To determine the number of unique products sold.

  • By applying the COUNT_DISTINCT_VALUES function on the "product name" column and selecting the same column in the GROUP BY section, we can calculate the total number of unique products sold.

Enabling Group By Pass Through

Transaction ID

Product Name

Sale Amount

Sale Date

Unique Count

1

Laptop

1200

05-01-2023

1

2

Smartphone

800

10-01-2023

1

3

Tablet

500

15-01-2023

1

4

Laptop

1500

20-01-2023

1

5

Headphones

100

25-01-2023

1

Disabling Group By Pass Through

Product Name

Unique Count

Tablet

1

Laptop

1

Smartphone

1

Headphones

1

First Value and Last Value

Problem Statement: To retrieve the initial and final transaction IDs for a specific product.

  • We can utilize the FIRST_VALUE and LAST_VALUE functions on the "transaction ID" column, with the product name selected as the group by column. This allows us to pinpoint the first and last transaction IDs, respectively.

Enabling Group By Pass Through

Transaction ID

Product Name

Sale Amount

Sale Date

First Transaction

Last Transaction

1

Laptop

1200

05-01-2023

1

4

2

Smartphone

800

10-01-2023

2

2

3

Tablet

500

15-01-2023

3

3

4

Laptop

1500

20-01-2023

1

4

5

Headphones

100

25-01-2023

5

5

Disabling Group By Pass Through

Product Name

First Transaction

Last Transaction

Headphones

5

5

Tablet

3

3

Laptop

1

4

Smartphone

2

2

MAX and MIN

Problem Statement: To determine the highest and lowest sales amount for any product.

  • By applying the MAX function on the "Sale Amount" column, we can find the maximum Sale Amount, and similarly, by applying the MIN function, we can find the minimum Sale Amount.

Enabling Group By Pass Through

Transaction ID

Product Name

Sale Amount

Sale Date

Min Sale Amount

Max Sale Amount

1

Laptop

1200

05-01-2023

1200

4

2

Smartphone

800

10-01-2023

800

2

3

Tablet

500

15-01-2023

500

3

4

Laptop

1500

20-01-2023

1200

4

5

Headphones

100

25-01-2023

100

5

Disabling Group By Pass Through

Product Name

Min Sale Amount

Max Sale Amount

Smartphone

800

2

Headphones

100

5

Laptop

1200

4

Tablet

500

3

SUM

Problem Statement: To calculate the total sale amount generated from all transactions of a particular "product name".

  • By applying SUM function on the "Sale Amount" column and selecting "product name" column in GROUP BY section. This allows us to sum up the sale amounts across all transactions, providing the total sale amount of that particular product.

Enabling Group By Pass Through

Transaction ID

Product Name

Sale Amount

Sale Date

Sum Amount

1

Laptop

1200

05-01-2023

2700

2

Smartphone

800

10-01-2023

800

3

Tablet

500

15-01-2023

500

4

Laptop

1500

20-01-2023

2700

5

Headphones

100

25-01-2023

100

Disabling Group By Pass Through

Product Name

Sum Amount

Laptop

2700

Headphones

100

Tablet

500

Smartphone

800

MEAN

Problem Statement: To compute the average sale amount for a specific product.

  • We apply the MEAN function on the "Sale Amount" column and select the "product name" column in the GROUP BY section. This allows us to calculate the average sale amount for each product.

Enabling Group By Pass Through

Transaction ID

Product Name

Sale Amount

Sale Date

Mean Amount

1

Laptop

1200

05-01-2023

1350

2

Smartphone

800

10-01-2023

800

3

Tablet

500

15-01-2023

500

4

Laptop

1500

20-01-2023

1350

5

Headphones

100

25-01-2023

100

Disabling Group By Pass Through

Product Name

Mean Amount

Tablet

500

Headphones

100

Smartphone

800

Laptop

1350

Last modified: 21 February 2025