Join
The Join component is utilized to merge rows from two or more datasets based on specified criteria. It combines columns from input tables using one or multiple joining columns, offering various join modes to tailor the merging process according to user requirements.
Types of Joins
In the dropdown menu for the Join Type section, users can choose from the following types of joins:
Inner Join: Includes only rows that have matching values in both input tables.
Outer Join: Includes all rows from both input tables, filling in missing values with NULL where necessary.
Left Join: Includes all rows from the left input table and matching rows from the right input table.
Right Join: Includes all rows from the right input table and matching rows from the left input table.
Configuration
Upon clicking the Join node, users are presented with the following fields in the configuration section:
Defining Join Conditions and Selectors:
Operation Type: Select the type of join from a dropdown menu.
Source Column: Choose the left and right data sources for the join.
On Conditions: Define the condition for datasets based on specified column relationships.
Selectors: Selectors allow users to specify what should be included in the output by providing aliases for both data sources.
Users must provide an alias and enter a unique prefix for both the left and right aliases to ensure column names are distinct. These prefixes are automatically appended to each column name in the result.
Example Usage
Let's consider a scenario where we have two datasets: one containing sales transaction details and another containing customer information.
Transaction Data
Transaction ID | Customer ID | Product | Quantity | Total Amount |
|---|---|---|---|---|
1 | 101 | Laptop | 2 | 140000 |
2 | 102 | Smartphone | 1 | 56000 |
3 | 103 | Tablet | 3 | 105000 |
4 | 104 | Headphones | 2 | 7500 |
Customer Data
Customer ID | Customer Name | Age | City |
|---|---|---|---|
101 | Alice | 30 | New York |
102 | Bob | 28 | Los Angeles |
103 | Charlie | 35 | Chicago |
105 | Emily | 27 | Boston |
We want to merge these datasets to gain insights into customer behavior and their corresponding sales activities. To accomplish this, we'll explore various types of joins to combine the data effectively.
Configuration:
Upon clicking the Join node, users are presented with the following fields in the configuration section:
Operation Type: Select 'Join Type' from the dropdown menu.
Source Column: Choose the left source as "Transaction Data" and the right source as "Customer Data".
On Conditions: Define the condition to match records based on the common column "Customer ID".
Selectors: Use 'L_' as the alias prefix for the left source and 'R_' for the right source to ensure distinct column names in the result.
Inner Join
Operation Type: Select "Inner Join" from the dropdown menu.
Combines rows where there's a matching Customer ID in both datasets. We get a dataset with only transactions made by existing customers.
Resultant Output
L_Trans ID | L_Customer ID | L_Product | L_Quantity | L_Total Amount | R_Customer ID | R_Customer Name | R_Age | R_City |
|---|---|---|---|---|---|---|---|---|
1 | 101 | Laptop | 2 | 140000 | 101 | Alice | 30 | New York |
2 | 102 | Smartphone | 1 | 56000 | 102 | Bob | 28 | Los Angeles |
3 | 103 | Tablet | 3 | 105000 | 103 | Charlie | 35 | Chicago |
Outer Join
Operation Type: Select "Outer Join" from the dropdown menu.
Includes all rows from both datasets, ensuring no information is lost. Provides a complete view of sales transactions and customer details.
Resultant Output
L_Transaction ID | L_Customer ID | L_Product | L_Quantity | L_Total Amount | R_Customer ID | R_Customer Name | R_Age | R_City |
|---|---|---|---|---|---|---|---|---|
1.0 | 101.0 | Laptop | 2.0 | 140000.0 | 101.0 | Alice | 30.0 | New York |
2.0 | 102.0 | Smartphone | 1.0 | 56000.0 | 102.0 | Bob | 28.0 | Los Angeles |
3.0 | 103.0 | Tablet | 3.0 | 105000.0 | 103.0 | Charlie | 35.0 | Chicago |
4.0 | 104.0 | Headphones | 2.0 | 7500.0 | ||||
Null | Null | Null | Null | Null | 105.0 | Emily | 27.0 | Boston |
Left Join
Operation Type: Select "Left Join" from the dropdown menu.
Includes all sales transactions and adds customer details where available. Gives us sales information along with customer demographics.
Resultant Output
L_Transaction ID | L_Customer ID | L_Product | L_Quantity | L_Total Amount | R_Customer ID | R_Customer Name | R_Age | R_City |
|---|---|---|---|---|---|---|---|---|
1 | 101 | Laptop | 2 | 140000 | 101.0 | Alice | 30.0 | New York |
2 | 102 | Smartphone | 1 | 56000 | 102.0 | Bob | 28.0 | Los Angeles |
3 | 103 | Tablet | 3 | 105000 | 103.0 | Charlie | 35.0 | Chicago |
4 | 104 | Headphones | 2 | 7500 | Null | Null | Null | Null |
Right Join
Operation Type: Select "Right Join" from the dropdown menu.
Includes all customer details and adds sales information where available. Provides insights into customer profiles alongside transaction history.
Resultant Output
L_Transaction ID | L_Customer ID | L_Product | L_Quantity | L_Total Amount | R_Customer ID | R_Customer Name | R_Age | R_City |
|---|---|---|---|---|---|---|---|---|
1.0 | 101.0 | Laptop | 2.0 | 140000.0 | 101 | Alice | 30 | New York |
2.0 | 102.0 | Smartphone | 1.0 | 56000.0 | 102 | Bob | 28 | Los Angeles |
3.0 | 103.0 | Tablet | 3.0 | 105000.0 | 103 | Charlie | 35 | Chicago |
Null | Null | Null | Null | Null | 105 | Emily | 27 | Boston |