xflow Help

Window Functions

The Window Functions can analyze and compute values not just for individual rows, but for groups of rows that have a defined relationship based on partitioning and ordering criteria. This allows to perform operations such as calculating running totals, ranks, or comparing values from previous or subsequent rows within the same partition.

It supports various data types, including numeric, textual, and date-based data, making it adaptable for a wide range of analytical tasks.

Configuration

Upon clicking the Window Functions node, users are presented with the following fields in the configuration section:

Defining and Naming New Window Function Columns

Users can define new Window Function columns by selecting columns and specifying aggregate functions.

Function Type: Choose from the available functions:

  • CUME_DIST

  • DENSE_RANK

  • FIRST_VALUE

  • LAST_VALUE

  • LEAD

  • LAG

  • NTILE

  • PERCENT_RANK

  • RANK

Alias Name: Users can provide a name for the new column.

Select Column: Users can select the column for which they want to apply the function.

  • Clicking on the + Add Selector button allows selection of multiple columns.

Order By (Mandatory)

Specifies the order of rows within each partition for the function application. Ordering determines the sequence in which rows are processed within each partition, which is essential for functions like ranking, lead, lag, and cumulative calculations.

  • Each sorting column includes dropdown option to select the column and set the sorting order (ascending or descending).

Partition By (Optional)

When columns are selected for partitioning, the previously defined window functions will be applied based on these partitioning columns. This organizes the data into partitions according to the selected columns, and the specified window functions are then performed within each partition separately.

Example: If you partition by "Product Name", each product's sales data is treated as a separate group for calculations.

Example Usage

Let's consider a scenario where we have a dataset containing information about sales transactions at a retail store. Each row represents a transaction, and the columns include the transaction ID, product name, quantity sold, and the total sale amount.

Dataset

Transaction ID

Product Name

Sale Amount

Sale Date

1

Laptop

1200

05-01-2023

2

Smartphone

800

10-01-2023

3

Tablet

500

15-01-2023

4

Laptop

1500

20-01-2023

5

Headphones

100

25-01-2023

Here's how we can apply each window function:

CUME_DIST

CASE 1: Without Partition By

Problem Statement: To determine the cumulative distribution of sale amounts across different products, indicating each product's position relative to others in terms of sales.

Configuration:

  • Function Type: CUME_DIST

  • Alias Name: CUME_DIST

  • Order By: Sort the Sale Amount in ascending order.

Resultant Output

CUME_DIST calculates the cumulative distribution of Sale Amounts across all products, showing their relative positions based on ascending Sale Amounts.

Transaction ID

Product Name

Sale Amount

Sale Date

CUME_DIST

5

Headphones

100

25/01/2023

0.2

3

Tablet

500

15/01/2023

0.4

2

Smartphone

800

10/01/2023

0.6

1

Laptop

1200

05/01/2023

0.8

4

Laptop

1500

20/01/2023

1

CASE 2: With Partition By

Problem Statement: To determine the cumulative distribution of sale amounts across different products within each product category, indicating each product's position relative to others in terms of sales within its category.

Configuration:

  • Function Type: CUME_DIST

  • Alias Name: CUME_DIST

  • Order By: Sort the Sale Amount in ascending order.

  • Partition By: Partition the data by Product Name to calculate distribution within each product category.

Resultant Output

With partition by Product Name, CUME_DIST calculates the cumulative distribution within each product category, indicating each product's position relative to others within its category.

Transaction ID

Product Name

Sale Amount

Sale Date

CUME_DIST

5

Headphones

100

25/01/2023

1

3

Tablet

500

15/01/2023

1

2

Smartphone

800

10/01/2023

1

1

Laptop

1200

05/01/2023

0.5

4

Laptop

1500

20/01/2023

1

RANK and DENSE_RANK

CASE 1: Without Partition By

Problem Statement: To rank products based on their Product Names.

Configuration

  • Function Type: RANK, DENSE_RANK

  • Alias Name: Rank, Dense_Rank

  • Order By: Sort the Product Name in descending order.

Resultant Output

Transaction ID

Product Name

Sale Amount

Sale Date

Rank

Dense_Rank

3

Tablet

500

15/01/2023

1

1

2

Smartphone

800

10/01/2023

2

2

1

Laptop

1200

05/01/2023

3

3

4

Laptop

1500

20/01/2023

3

3

5

Headphones

100

25/01/2023

5

4

CASE 2: With Partition By

Problem Statement: To rank products based on their Sale Amount within each Product Name category.

Configuration

  • Function Type: RANK, DENSE_RANK

  • Alias Name: Rank, Dense Rank

  • Partition By: Product Name

  • Order By: Sort the Sale Amount in descending order within each product category.

Resultant Output

Transaction ID

Product Name

Sale Amount

Sale Date

Rank

Dense_Rank

2

Smartphone

800

10/01/2023

1

1

4

Laptop

1500

20/01/2023

1

1

1

Laptop

1200

05/01/2023

2

2

3

Tablet

500

15/01/2023

1

1

5

Headphones

100

25/01/2023

1

1

FIRST_VALUE and LAST_VALUE

CASE 1: Without Partition By

Problem Statement: To fetch the first and last sale amounts recorded in the dataset.

Configuration

  • Function Type: FIRST_VALUE, LAST_VALUE

  • Alias Names: FIRST_VALUE, LAST_VALUE

  • Column Name: Sale Amount

  • Order By: Sort the Sale Date in ascending order.

Resultant Output

In this scenario without partition by, FIRST_VALUE retrieves the first recorded sale amount and LAST_VALUE retrieves the last recorded sale amount, both based on the earliest and latest sale dates respectively.

Transaction ID

Product Name

Sale Amount

Sale Date

FIRST_VALUE

LAST_VALUE

1

Laptop

1200

05/01/2023

1200

100

2

Smartphone

800

10/01/2023

1200

100

3

Tablet

500

15/01/2023

1200

100

4

Laptop

1500

20/01/2023

1200

100

5

Headphones

100

25/01/2023

1200

100

CASE 2: With Partition By

Problem Statement: To fetch the first and last sale amounts recorded for each product category.

Configuration

  • Function Type: FIRST_VALUE, LAST_VALUE

  • Alias Names: FIRST_VALUE, LAST_VALUE

  • Partition By: Product Name

  • Order By: Sort the Sale Date in ascending order within each product category.

Resultant Output

Explanation:

In this scenario with partition by Product Name, FIRST_VALUE retrieves the first recorded sale amount and LAST_VALUE retrieves the last recorded sale amount within each product category, based on the earliest and latest sale dates for each category.

Transaction ID

Product Name

Sale Amount

Sale Date

FIRST_VALUE

LAST_VALUE

1

Laptop

1200

05/01/2023

1200

1500

2

Smartphone

800

10/01/2023

800

800

3

Tablet

500

15/01/2023

500

500

5

Headphones

100

25/01/2023

100

100

4

Laptop

1500

20/01/2023

1200

1500

LEAD and LAG

CASE 1: Without Partition By

Problem Statement: To analyze the difference in Sale Amount between consecutive transactions.

Configuration

  • Function Type: LEAD, LAG

  • Alias Names: Lead, Lag

  • Order By: Sort by Transaction ID.

Resultant Output

Transaction ID

Product Name

Sale Amount

Sale Date

Lead_Sale_Amount

Lag_Sale_Amount

1

Laptop

1200

05/01/2023

800

NULL

2

Smartphone

800

10/01/2023

500

1200

3

Tablet

500

15/01/2023

1500

800

4

Laptop

1500

20/01/2023

100

500

5

Headphones

100

25/01/2023

NULL

1500

CASE 2: With Partition By

Problem Statement: To analyze the difference in Sale Amount within each Product Name category.

Configuration

  • Function Type: LEAD, LAG

  • Alias Names: Lead, Lag

  • Partition By: Product Name

  • Order By: Sort by Sale Date within each Product Name.

Resultant Output

Transaction ID

Product Name

Sale Amount

Sale Date

Lead_Sale_Amount

Lag_Sale_Amount

1

Laptop

1200

05/01/2023

1500

NULL

2

Smartphone

800

10/01/2023

NULL

NULL

3

Tablet

500

15/01/2023

NULL

NULL

4

Laptop

1500

20/01/2023

NULL

1200

5

Headphones

100

25/01/2023

NULL

NULL

NTILE

CASE 1: Without Partition By

Problem Statement: To categorize transactions into quartiles based on their Sale Amount.

Configuration

  • Function Type: NTILE

  • Alias Name: NTILE

  • Column Name: 4 (input Value)

  • Order By: Sort by Sale Amount in descending order.

Resultant Output

Transaction ID

Product Name

Sale Amount

Sale Date

NTILE

4

Laptop

1500

20/01/2023

1

1

Laptop

1200

05/01/2023

1

2

Smartphone

800

10/01/2023

2

3

Tablet

500

15/01/2023

3

5

Headphones

100

25/01/2023

4

CASE 2: With Partition By

Problem Statement: To categorize transactions into quartiles within each Product Name category.

Configuration

  • Function Type: NTILE

  • Alias Name: NTILE

  • Partition By: Product Name

  • Order By: Sort by Sale Amount in descending order within each Product Name.

Resultant Output

Transaction ID

Product Name

Sale Amount

Sale Date

NTILE

4

Laptop

1500

20/01/2023

1

1

Laptop

1200

05/01/2023

2

2

Smartphone

800

10/01/2023

1

3

Tablet

500

15/01/2023

1

5

Headphones

100

25/01/2023

1

PERCENT_RANK

CASE 1: Without Partition By

Problem Statement: To calculate the relative rank of transactions based on their Sale Amounts as a percentage.

Configuration

  • Function Type: PERCENT_RANK

  • Alias Name: Percent_Rank

  • Order By: Sort by Sale Amount in ascending order.

Resultant Output

Transaction ID

Product Name

Sale Amount

Sale Date

Percent_Rank

5

Headphones

100

25-01-2023

0

3

Tablet

500

15-01-2023

0.25

2

Smartphone

800

10-01-2023

0.5

1

Laptop

1200

05-01-2023

0.75

4

Laptop

1500

20-01-2023

1

CASE 2: With Partition By

Problem Statement: To calculate the relative rank of transactions based on their Sale Amounts within each Product Name category as a percentage.

Configuration

  • Function Type: PERCENT_RANK

  • Alias Name: PERCENT_RANK

  • Partition By: Product Name

  • Order By: Sort by Sale Amount in ascending order within each Product Name.

Resultant Output

Transaction ID

Product Name

Sale Amount

Sale Date

ONE

4

Laptop

1500

20/01/2023

0

1

Laptop

1200

05/01/2023

1

2

Smartphone

800

10/01/2023

0

3

Tablet

500

15/01/2023

0

5

Headphones

100

25/01/2023

0

Last modified: 21 February 2025