xflow Help

Text Operation

The String Manipulation node allows users to modify or transform text data by performing various operations such as concatenation, substring extraction, search and replace, converting case (uppercase or lowercase), and other transformations applied to strings of characters.

Configuration

Upon selecting the Text Operation node, users are presented with the following configuration options:

Choosing Columns: Users can select the column they want to manipulate from a dropdown list containing all columns with string datatype.

Selecting Functions: Users can choose from a dropdown list of functions to apply to the selected column. Available functions include:

1. CONCAT: Concatenates strings together. Users can choose to input the column name or a specific value.

  • On Column: Select the target column.

  • Concat with: Choose a column or specify a value. 2. SUBSTRING: Extracts a portion of a string based on specijfied start and end positions.

  • On Column: Select the target column.

  • Substring Type: Specify the extraction type (Left, Right, or Index based).

  • Position: Specify the extraction position by providing the Start Index and End Index.

3. REPLACE: Replaces occurrences of a specified substring with another string.

  • On Column: Select the target column.

  • Find: Specify the substring to be replaced by selecting an Existing Column or a Value.

  • Replace: Specify the replacement string by selecting an Existing Column or a Value.

4. SPLIT: Splits a string into substrings based on a specified delimiter.

  • On Column: Select the target column.

  • Split By: Choose a function from the dropdown or enter a custom value.

  • Split Into: Specify the number of resulting columns.

  • Prefix: Enter a prefix for the new column names.

The split operation generates the specified number of columns plus one additional column by default, which includes any remaining data beyond the split.

5. TRIM: Removes leading and trailing whitespace from a string.

  • TRIM: Removes leading and trailing whitespace from a string.

  • On Column: Select the target column.

  • Trim Type: Select the trim type from the dropdown (Left Trim, Right Trim, or Trim Both Sides).

6. UPPER: Converts all characters in a string to uppercase.

7. LOWER: Converts all characters in a string to lowercase.

8. INITCAP: Converts the first character of each word in a string to uppercase.

9. REVERSE: Reverses the order of characters in a string.

10. STATIC: Allows users to assign a custom static value to a new or existing column.

Output Options: Users can choose to replace the existing column with the manipulated data or add a new column for the output of string manipulations. This provides flexibility in managing the output of the manipulation process.

Example Usage

Let’s consider the following dataset to demonstrate each operation:

Dataset

ItemID

Description

1

Large Red Apple

2

Small Green Avocado

3

Juicy Orange

4

Sweet Pineapple

Concat

Problem Statement: Concatenate the Description column with the ItemID column to create unique identifiers for each item.

Configuration:

  • Operation Type: CONCAT

  • On Column: Description

  • Concat with: Existing Column and choose ItemID column.

  • Output Column: Create a new column named Unique Id.

Resultant Output

ItemID

Description

Unique Id

1

Large Red Apple

Large Red Apple 1

2

Small Green Avocado

Small Green Avocado 2

3

Juicy Orange

Juicy Orange 3

4

Sweet Pineapple

Sweet Pineapple 4

Substring

Problem Statement: Extract a specific part of the Description column to focus on descriptive keywords.

Configuration:

  • Operation Type: SUBSTRING

  • On Column: Description

  • Substring Type: Left

  • Left Index: 5

  • Output Column: Create a new column named Keyword.

Resultant Output

ItemID

Description

Keyword

1

Large Red Apple

Large

2

Small Green Avocado

Small

3

Juicy Orange

Juicy

4

Sweet Pineapple

Sweet

Replace

Problem Statement: Correct the color reference in the Description column by replacing "Green" with "Yellow".

Configuration:

  • Operation Type: REPLACE

  • On Column: Description

  • Find: Choose Value

  • Find Value: Green

  • Replace: Choose Value

  • Replace Value: Yellow

  • Output Column: Choose Existing Column

Resultant Output

ItemID

Description

1

Large Red Apple

2

Small Yellow Avocado

3

Juicy Orange

4

Sweet Pineapple

Split

Problem Statement: Separate the Description column into two parts to isolate the primary description.

Configuration:

  • Operation Type: SPLIT

  • On Column: Description

  • Split By: give space

  • Split Into: 2

  • Prefix: Description

Resultant Output

ItemID

Description

Description 1

Description 2

Description Others

1

Large Red Apple

Large

Red

Apple

2

Small Green Avocado

Small

Green

Avocado

3

Juicy Orange

Juicy

Orange

4

Sweet Pineapple

Sweet

Pineapple

Trim

Problem Statement: Clean up the Description column by removing leading and trailing whitespace.

Configuration:

  • Operation Type: TRIM

  • On Column: Description

  • Trim Type: TRIM BOTH SIDES

  • Output Column: Existing Column

Resultant Output

ItemID

Description

1

Large Red Apple

2

Small Green Avocado

3

Juicy Orange

4

Sweet Pineapple

Upper

Problem Statement: Standardize the Description column by converting all text to uppercase.

Configuration:

  • Operation Type: UPPER

  • On Column: Description

  • Output Column: Existing Column

Resultant Output

ItemID

Description

1

LARGE RED APPLE

2

SMALL GREEN AVOCADO

3

JUICY ORANGE

4

SWEET PINEAPPLE

Lower

Problem Statement: Normalize the Description column by converting all text to lowercase.

Configuration:

  • Operation Type: LOWER

  • On Column: Description

  • Output Column: Existing Column

Resultant Output

ItemID

Description

1

large red apple

2

small green avocado

3

juicy orange

4

sweet pineapple

Initcap

Problem Statement: Enhance readability in the Description column by capitalizing the first letter of each word.

Configuration:

  • Operation Type: INITCAP

  • On Column: Description

  • Output Column: Existing Column

Resultant Output

ItemID

Description

1

Large Red Apple

2

Small Green Avocado

3

Juicy Orange

4

Sweet Pineapple

Reverse

Problem Statement: Present the Description column in reverse order to explore new perspectives.

Configuration:

  • Operation Type: INITCAP

  • On Column: Description

  • Output Column: Create a new column named Reverse Description.

Resultant Output

ItemID

Description

Reverse Description

1

Large Red Apple

elppA deR egraL

2

Small Green Avocado

odacovA neerG llamS

3

Juicy Orange

egnarO yciuJ

4

Sweet Pineapple

elppaeniP teewS

Static

Problem Statement: Assign a static category label to all items in the dataset.

Configuration:

  • Operation Type: STATIC

  • Static Value: Fruit Category

  • Output Column: Create a new column named Category

Resultant Output

ItemID

Description

Category

1

Large Red Apple

Fruit Category

2

Small Green Avocado

Fruit Category

3

Juicy Orange

Fruit Category

4

Sweet Pineapple

Fruit Category

Last modified: 21 February 2025