Advanced Data Transformations with the Script Operator
In addition to the standard SQL operators – like join or union – the Data Flow includes a Script operator, which you can use to perform more advanced transformations using Python code snippets.
In this example, we’re building a data flow to combine product reviews with the structured product master data. We’ve already applied several SQL operators to narrow down and combine the product review data, but it still needs some cleanup. As you can see, some reviews in our data sources are prefixed with the word “Review”. We don’t want this prefix in our data, and we can use Python code to remove it.
We could do this with SQL, but that would take quite a lot of code. With Python, we can do it in a single line of code. To do so, let’s grab a Script operator, drag it onto the canvas, and connect it to our data flow.
Then, in the Properties panel, click on the pencil icon to open the script editor.
In the script editor, the incoming data is available as a Pandas DataFrame, and you can use the permitted NumPy and Pandas objects and functions to perform advanced data transformations. Note that the script is executed in sandbox mode, meaning that some functionality is not available. The most up-to-date documentation is available directly in the tool – simply click the question mark icon to open the help window.
In our example, we want to use Python code to remove the prefix. We simply add one line of code in the script editor as the body of the transform function. It’s as easy as that: with the Script operator, you can quickly achieve data transformations that would take considerable effort with SQL.
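As a rough illustration, the prefix removal described above might look like the following sketch. It is not the exact code from this example: the column name REVIEW_TEXT, the sample rows, and the exact form of the prefix (the word “Review”, optionally followed by a colon or spaces) are assumptions made for illustration. The import and the sample data are only there so the sketch can run outside the tool. In the script editor itself, only the line inside the transform function would be needed.

```python
import pandas as pd


def transform(data):
    # Hypothetical column name; replace REVIEW_TEXT with the actual review column.
    # Removes a leading "Review" and any colon or whitespace that follows it.
    data["REVIEW_TEXT"] = data["REVIEW_TEXT"].str.replace(r"^Review\b[:\s]*", "", regex=True)
    return data


# Made-up sample rows, used only to try the function outside the data flow.
sample = pd.DataFrame(
    {"REVIEW_TEXT": ["Review: Great product", "Works as expected", "Review Too loud"]}
)
result = transform(sample.copy())
print(result)
```

The pattern is anchored to the start of the text, so the word “Review” is left untouched when it appears later in a sentence, and the word boundary keeps values that start with a longer word such as “Reviews” intact.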
With the Script operator in place, we can now finish our data flow, save it, and execute it. From then on, it can be used to create views in the View Builders or visualized in stories.
For more details about using the Script operator, take a look at the following resources on SAP Help Portal: