Union

The Union component lets users merge two or more datasets with similar structures into a single dataset. It supports two operation types:

Union: The Union operation combines the connected datasets while eliminating duplicate rows. If a row appears in more than one dataset, it is included only once in the result. This operation is useful when users want to merge datasets but do not need to retain duplicate records.
Union All: The Union All operation also combines the connected datasets but retains all rows, including duplicates. If a row appears in more than one dataset, it is included in the result once for each occurrence. This operation is helpful when users want to merge datasets while preserving every record.
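
The difference between the two operations can be illustrated with a minimal Python sketch. It is not the component's implementation: the function names union and union_all are hypothetical, rows are modelled as tuples, and the sketch assumes that Union keeps the first occurrence of each row and also removes duplicates that occur within a single dataset.

```python
from itertools import chain


def union_all(*datasets):
    """Concatenate every row from every dataset, keeping duplicates."""
    return list(chain.from_iterable(datasets))


def union(*datasets):
    """Concatenate rows, keeping only the first occurrence of each row.

    Assumption: duplicates inside a single dataset are removed as well.
    """
    seen = set()
    result = []
    for row in chain.from_iterable(datasets):
        if row not in seen:
            seen.add(row)
            result.append(row)
    return result


# Three small illustrative datasets with overlapping rows.
a = [(1, "x"), (2, "y")]
b = [(2, "y"), (3, "z")]
c = [(3, "z"), (4, "w")]

combined_all = union_all(a, b, c)  # 6 rows, duplicates kept
combined = union(a, b, c)          # 4 rows, each distinct row once
```

Both functions accept any number of datasets, which mirrors connecting two or more nodes to the Union node.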

Configuration

Upon clicking the Union component, users are presented with the following fields in the configuration section:

Defining and Naming Union Columns

Users can choose between two operation types: union and union all.
Upon selecting the operation type, a table is displayed.
The table has one header for each node connected to the Union node.

Case 1: When there are matching columns in all the connected nodes.

By default, columns from all connected nodes with the same name and datatype are displayed under their respective node name headers.

For each row, the Alias name column is filled in automatically from the matched columns. For example, a matched column UserID of type integer gets the Alias name "UserID (integer)".
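
The default matching in Case 1 could be sketched as follows. This is an illustration under stated assumptions, not the product's code: the function match_columns, the schema dictionaries, and the node names node_a and node_b are hypothetical. Only the rule (same name and same datatype in every connected node) and the alias format "UserID (integer)" come from the description above.

```python
def match_columns(node_schemas):
    """Return one table row per column that matches in every node.

    node_schemas maps a node name to a dict of column name -> datatype.
    A column matches when every node has a column with the same name
    and the same datatype.
    """
    schemas = list(node_schemas.values())
    if not schemas:
        return []
    first, others = schemas[0], schemas[1:]
    rows = []
    for name, dtype in first.items():
        if all(schema.get(name) == dtype for schema in others):
            rows.append({
                "alias": f"{name} ({dtype})",
                # The same column name is selected under every node header.
                "columns": {node: name for node in node_schemas},
            })
    return rows


# Hypothetical schemas for two connected nodes.
node_schemas = {
    "node_a": {"UserID": "integer", "Name": "string", "SignupDate": "date"},
    "node_b": {"UserID": "integer", "Name": "string", "SignupDate": "string"},
}

rows = match_columns(node_schemas)
aliases = [row["alias"] for row in rows]
```

In this sketch SignupDate is not matched because its datatype differs between the two nodes. If no column matches at all, the function returns an empty list, which corresponds to Case 2.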

Case 2: When there are no matching columns across the connected nodes.

By default, no columns are added. Users can add columns manually using the + Add Row button.

Adding Additional Rows:

If needed, users can add more rows by clicking the + Add Row option.
In each additional row, all selected columns must have the same datatype. Not every column in the row needs to be filled.
For these rows, the Alias name is not filled in automatically; users must enter it manually.
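
The rules for manually added rows can be expressed as a small validation sketch. The function validate_manual_row, the message texts, and the example node and column names are all hypothetical. The sketch only checks the two rules stated above and makes no assumption about how the component treats a column that is left unfilled.

```python
def validate_manual_row(alias, selection, node_schemas):
    """Check a manually added row and return a list of problems.

    selection maps each node name to the chosen column name, or None
    when the column for that node is left unfilled.
    """
    problems = []
    if not alias.strip():
        problems.append("alias name is required")
    # Collect the datatypes of the filled columns only.
    datatypes = {
        node_schemas[node][column]
        for node, column in selection.items()
        if column is not None
    }
    if len(datatypes) > 1:
        problems.append(
            "selected columns have different datatypes: "
            + ", ".join(sorted(datatypes))
        )
    return problems


# Hypothetical schemas for three connected nodes.
node_schemas = {
    "orders": {"OrderID": "integer", "Total": "decimal"},
    "refunds": {"RefundID": "integer", "Reason": "string"},
    "returns": {"ReturnID": "integer"},
}

# Valid: same datatype in the filled columns, one column left unfilled.
ok = validate_manual_row(
    "RecordID",
    {"orders": "OrderID", "refunds": "RefundID", "returns": None},
    node_schemas,
)

# Invalid: missing alias and mixed datatypes.
bad = validate_manual_row(
    "",
    {"orders": "Total", "refunds": "Reason", "returns": None},
    node_schemas,
)
```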

Auto-Select Button:

Users can click the Auto-Select button to reset the configuration so that columns are again matched automatically by name and datatype.
This is useful when users have manually modified the column selections and wish to revert to a system-suggested configuration.

Example Usage

Consider two datasets that hold employee information from the same organization.

Problem Statement: Merge the two employee datasets into a single dataset, using Union to eliminate duplicate records or Union All to retain all records, including duplicates.

Dataset 1

ID	Name	Age	City
1	Alice	25	New York
2	Bob	30	Los Angeles
3	Charlie	28	Chicago

Dataset 2

ID	Name	Age	City
3	Charlie	28	Chicago
4	David	35	Houston
5	Emily	27	Boston

Selecting Union combines the datasets and removes duplicate entries. The record for Charlie (ID 3) appears in both datasets but only once in the result:

ID	Name	Age	City
1	Alice	25	New York
2	Bob	30	Los Angeles
3	Charlie	28	Chicago
4	David	35	Houston
5	Emily	27	Boston

Conversely, selecting Union All keeps every record from both datasets, so the record for Charlie (ID 3) appears twice:

ID	Name	Age	City
1	Alice	25	New York
2	Bob	30	Los Angeles
3	Charlie	28	Chicago
3	Charlie	28	Chicago
4	David	35	Houston
5	Emily	27	Boston
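
The two result tables can be reproduced with the SQL UNION and UNION ALL operators, shown here through Python's built-in sqlite3 module. This is a sketch for checking the expected output. It assumes the component's operations behave like their SQL counterparts, which the description above suggests but does not state. The table names dataset1 and dataset2 are chosen for illustration, and ORDER BY is added only to make the row order predictable.

```python
import sqlite3

dataset1 = [
    (1, "Alice", 25, "New York"),
    (2, "Bob", 30, "Los Angeles"),
    (3, "Charlie", 28, "Chicago"),
]
dataset2 = [
    (3, "Charlie", 28, "Chicago"),
    (4, "David", 35, "Houston"),
    (5, "Emily", 27, "Boston"),
]

# Load both datasets into an in-memory database.
conn = sqlite3.connect(":memory:")
for table, rows in (("dataset1", dataset1), ("dataset2", dataset2)):
    conn.execute(
        f"CREATE TABLE {table} (ID INTEGER, Name TEXT, Age INTEGER, City TEXT)"
    )
    conn.executemany(f"INSERT INTO {table} VALUES (?, ?, ?, ?)", rows)

# UNION removes duplicate rows.
union_rows = conn.execute(
    "SELECT ID, Name, Age, City FROM dataset1 "
    "UNION "
    "SELECT ID, Name, Age, City FROM dataset2 "
    "ORDER BY ID"
).fetchall()

# UNION ALL keeps every row, including duplicates.
union_all_rows = conn.execute(
    "SELECT ID, Name, Age, City FROM dataset1 "
    "UNION ALL "
    "SELECT ID, Name, Age, City FROM dataset2 "
    "ORDER BY ID"
).fetchall()

conn.close()

for row in union_rows:
    print(*row, sep="\t")
```

Here union_rows holds the five distinct records and union_all_rows holds six, with the Charlie record present twice.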

Last modified: 21 February 2025