Text Operation
The String Manipulation node allows users to modify or transform text data by performing various operations such as concatenation, substring extraction, search and replace, converting case (uppercase or lowercase), and other transformations applied to strings of characters.
Configuration
Upon selecting the Text Operation node, users are presented with the following configuration options:
Choosing Columns: Users can select the column they want to manipulate from a dropdown list containing all columns with string datatype.
Selecting Functions: Users can choose from a dropdown list of functions to apply to the selected column. Available functions include:
1. CONCAT: Concatenates strings together. Users can choose to input the column name or a specific value.
On Column: Select the target column.
Concat with: Choose a column or specify a value. 2. SUBSTRING: Extracts a portion of a string based on specijfied start and end positions.
On Column: Select the target column.
Substring Type: Specify the extraction type (Left, Right, or Index based).
Position: Specify the extraction position by providing the Start Index and End Index.
3. REPLACE: Replaces occurrences of a specified substring with another string.
On Column: Select the target column.
Find: Specify the substring to be replaced by selecting an Existing Column or a Value.
Replace: Specify the replacement string by selecting an Existing Column or a Value.
4. SPLIT: Splits a string into substrings based on a specified delimiter.
On Column: Select the target column.
Split By: Choose a function from the dropdown or enter a custom value.
Split Into: Specify the number of resulting columns.
Prefix: Enter a prefix for the new column names.
The split operation generates the specified number of columns plus one additional column by default, which includes any remaining data beyond the split.
5. TRIM: Removes leading and trailing whitespace from a string.
TRIM: Removes leading and trailing whitespace from a string.
On Column: Select the target column.
Trim Type: Select the trim type from the dropdown (Left Trim, Right Trim, or Trim Both Sides).
6. UPPER: Converts all characters in a string to uppercase.
7. LOWER: Converts all characters in a string to lowercase.
8. INITCAP: Converts the first character of each word in a string to uppercase.
9. REVERSE: Reverses the order of characters in a string.
10. STATIC: Allows users to assign a custom static value to a new or existing column.
Output Options: Users can choose to replace the existing column with the manipulated data or add a new column for the output of string manipulations. This provides flexibility in managing the output of the manipulation process.
Example Usage
Let’s consider the following dataset to demonstrate each operation:
Dataset
ItemID | Description |
|---|---|
1 | Large Red Apple |
2 | Small Green Avocado |
3 | Juicy Orange |
4 | Sweet Pineapple |
Concat
Problem Statement: Concatenate the Description column with the ItemID column to create unique identifiers for each item.
Configuration:
Operation Type: CONCAT
On Column: Description
Concat with: Existing Column and choose ItemID column.
Output Column: Create a new column named Unique Id.
Resultant Output
ItemID | Description | Unique Id |
|---|---|---|
1 | Large Red Apple | Large Red Apple 1 |
2 | Small Green Avocado | Small Green Avocado 2 |
3 | Juicy Orange | Juicy Orange 3 |
4 | Sweet Pineapple | Sweet Pineapple 4 |
Substring
Problem Statement: Extract a specific part of the Description column to focus on descriptive keywords.
Configuration:
Operation Type: SUBSTRING
On Column: Description
Substring Type: Left
Left Index: 5
Output Column: Create a new column named Keyword.
Resultant Output
ItemID | Description | Keyword |
|---|---|---|
1 | Large Red Apple | Large |
2 | Small Green Avocado | Small |
3 | Juicy Orange | Juicy |
4 | Sweet Pineapple | Sweet |
Replace
Problem Statement: Correct the color reference in the Description column by replacing "Green" with "Yellow".
Configuration:
Operation Type: REPLACE
On Column: Description
Find: Choose Value
Find Value: Green
Replace: Choose Value
Replace Value: Yellow
Output Column: Choose Existing Column
Resultant Output
ItemID | Description |
|---|---|
1 | Large Red Apple |
2 | Small Yellow Avocado |
3 | Juicy Orange |
4 | Sweet Pineapple |
Split
Problem Statement: Separate the Description column into two parts to isolate the primary description.
Configuration:
Operation Type: SPLIT
On Column: Description
Split By: give space
Split Into: 2
Prefix: Description
Resultant Output
ItemID | Description | Description 1 | Description 2 | Description Others |
|---|---|---|---|---|
1 | Large Red Apple | Large | Red | Apple |
2 | Small Green Avocado | Small | Green | Avocado |
3 | Juicy Orange | Juicy | Orange | |
4 | Sweet Pineapple | Sweet | Pineapple |
Trim
Problem Statement: Clean up the Description column by removing leading and trailing whitespace.
Configuration:
Operation Type: TRIM
On Column: Description
Trim Type: TRIM BOTH SIDES
Output Column: Existing Column
Resultant Output
ItemID | Description |
|---|---|
1 | Large Red Apple |
2 | Small Green Avocado |
3 | Juicy Orange |
4 | Sweet Pineapple |
Upper
Problem Statement: Standardize the Description column by converting all text to uppercase.
Configuration:
Operation Type: UPPER
On Column: Description
Output Column: Existing Column
Resultant Output
ItemID | Description |
|---|---|
1 | LARGE RED APPLE |
2 | SMALL GREEN AVOCADO |
3 | JUICY ORANGE |
4 | SWEET PINEAPPLE |
Lower
Problem Statement: Normalize the Description column by converting all text to lowercase.
Configuration:
Operation Type: LOWER
On Column: Description
Output Column: Existing Column
Resultant Output
ItemID | Description |
|---|---|
1 | large red apple |
2 | small green avocado |
3 | juicy orange |
4 | sweet pineapple |
Initcap
Problem Statement: Enhance readability in the Description column by capitalizing the first letter of each word.
Configuration:
Operation Type: INITCAP
On Column: Description
Output Column: Existing Column
Resultant Output
ItemID | Description |
|---|---|
1 | Large Red Apple |
2 | Small Green Avocado |
3 | Juicy Orange |
4 | Sweet Pineapple |
Reverse
Problem Statement: Present the Description column in reverse order to explore new perspectives.
Configuration:
Operation Type: INITCAP
On Column: Description
Output Column: Create a new column named Reverse Description.
Resultant Output
ItemID | Description | Reverse Description |
|---|---|---|
1 | Large Red Apple | elppA deR egraL |
2 | Small Green Avocado | odacovA neerG llamS |
3 | Juicy Orange | egnarO yciuJ |
4 | Sweet Pineapple | elppaeniP teewS |
Static
Problem Statement: Assign a static category label to all items in the dataset.
Configuration:
Operation Type: STATIC
Static Value: Fruit Category
Output Column: Create a new column named Category
Resultant Output
ItemID | Description | Category |
|---|---|---|
1 | Large Red Apple | Fruit Category |
2 | Small Green Avocado | Fruit Category |
3 | Juicy Orange | Fruit Category |
4 | Sweet Pineapple | Fruit Category |