Intro
Domo's Magic ETL has received a major upgrade. The data processing engine has been significantly upgraded, so most DataFlows run faster and more efficiently, and new tiles and functionality have been added to help you get the most out of transforming your data. The new features fall into four main categories: Performance, New/Updated Tiles, User Interface, and Advanced Options.
Performance
Faster Runtimes
The new engine is much faster than the old Magic ETL. Although the improvement varies from one DataFlow to another, most DataFlows will run significantly faster simply by flipping the Try the New Magic ETL toggle switch.
We have even found that the new Magic ETL often outperforms MySQL and Redshift DataFlows.
DataSet Views as inputs
DataSet Views can now be used as inputs in the new Magic ETL. This makes it easier to filter, aggregate, rename, or drop columns, and to apply Beast Mode functions, before bringing your data into the new Magic ETL. Try filtering out all unnecessary rows in the DataSet View first to reduce runtimes.
Append Processing
One of the limitations of the old Magic ETL is that on every DataFlow run, the engine loads all of the data in each input DataSet, even if that data is unnecessary for the transformations taking place. With the new Magic ETL, the system reviews the state of the DataFlow's inputs and outputs at the start of each run. Where possible, only the rows added to the inputs since the last DataFlow execution are processed. Those rows are then automatically appended to the output DataSet(s), producing exactly the same output data in a dramatically reduced runtime. To learn more about this optimization, see New Magic ETL DataFlow Auto Append Processing.
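The idea behind append processing can be sketched in a few lines of Python. This is a conceptual illustration only, not Domo's implementation: the names `run_full`, `run_append`, and `add_total` are hypothetical, and the sketch assumes that rows are only ever appended to the input and that the transform works on one row at a time.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


def run_full(inputs: List[Row], transform: Callable[[Row], Row]) -> List[Row]:
    """Reprocess every input row on every run (the old behavior)."""
    return [transform(row) for row in inputs]


def run_append(
    inputs: List[Row],
    previous_output: List[Row],
    rows_already_processed: int,
    transform: Callable[[Row], Row],
) -> List[Row]:
    """Transform only the rows added since the last run, then append them."""
    new_rows = inputs[rows_already_processed:]
    return previous_output + [transform(row) for row in new_rows]


def add_total(row: Row) -> Row:
    # Hypothetical row-by-row transform: total = price * quantity.
    return {**row, "total": row["price"] * row["quantity"]}


first_load = [{"price": 2, "quantity": 3}]
first_output = run_full(first_load, add_total)

# One new row arrives before the second run.
second_load = first_load + [{"price": 5, "quantity": 1}]
second_output = run_append(second_load, first_output, len(first_load), add_total)
```

Under these assumptions `second_output` matches what a full reprocess of `second_load` would produce, while only one row was actually transformed on the second run.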
New/Updated Tiles
New Tiles
Add Formula
If you're a SQL power user, transitioning to the old Magic ETL often felt clunky. At times, what took one line of MySQL code could take 15+ mouse clicks in the old Magic ETL. With the Add Formula tile, that frustration is no more. The Add Formula tile is a row-by-row expression evaluator that allows you to write SQL-style syntax directly in your new Magic ETL DataFlow. Create and modify your columns with compound expressions. CASE statements, statistical utility functions, and time-value-of-money operations are all easy to accomplish with this new tile.
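To make "row-by-row expression evaluator" concrete, here is a minimal Python sketch of the concept. It does not use Domo's formula syntax or API: `add_formula` and `order_size` are hypothetical names, and the SQL-style CASE expression in the comment is only illustrative.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


def order_size(row: Row) -> str:
    # Roughly what an illustrative SQL-style expression would compute:
    #   CASE WHEN amount >= 1000 THEN 'Large'
    #        WHEN amount >= 100  THEN 'Medium'
    #        ELSE 'Small' END
    amount = row["amount"]
    if amount >= 1000:
        return "Large"
    if amount >= 100:
        return "Medium"
    return "Small"


def add_formula(rows: List[Row], new_column: str, formula: Callable[[Row], Any]) -> List[Row]:
    """Evaluate the formula once per row and store the result in a new column."""
    return [{**row, new_column: formula(row)} for row in rows]


orders = [{"amount": 1500}, {"amount": 250}, {"amount": 20}]
sized = add_formula(orders, "order_size", order_size)
```

Each output row depends only on the values in its own input row, which is why this style of evaluation cannot express aggregations.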
Alter Columns
The Alter Columns tile is an upgraded version of the Set Column Type tile. Now you can easily rename, remove, or change the data types of your columns in one simple tile.
Dynamic Unpivot
This new tile is the inverse of the Unpivot tile. If you expect the schema of an input DataSet to change, the Dynamic Unpivot tile allows you to narrow a table by pivoting the data in all columns, except the columns you specify, into new rows. Any columns not excluded in the configuration will become row values.
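The behavior described above can be sketched in Python as follows. This is an illustration of the technique rather than the tile's actual implementation; the function name `dynamic_unpivot` and the output column names `column` and `value` are assumptions made for the example.

```python
from typing import Any, Dict, Iterable, List

Row = Dict[str, Any]


def dynamic_unpivot(
    rows: List[Row],
    keep: Iterable[str],
    key_name: str = "column",   # hypothetical name for the former column header
    value_name: str = "value",  # hypothetical name for the cell value
) -> List[Row]:
    """Turn every column that is NOT listed in `keep` into its own output row."""
    keep_set = set(keep)
    output: List[Row] = []
    for row in rows:
        fixed = {name: row[name] for name in row if name in keep_set}
        for name, value in row.items():
            if name not in keep_set:
                output.append({**fixed, key_name: name, value_name: value})
    return output


sales = [{"region": "West", "jan": 10, "feb": 12}]
narrow = dynamic_unpivot(sales, keep=["region"])
```

Because the excluded columns are named and everything else is unpivoted, a new month column appearing in the input would flow through as additional rows without any change to the configuration.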
Updated Tiles
Group By
One important question to ask when using the Add Formula tile is "What type of transformation am I trying to perform?" If the answer involves aggregation in any way, use the Group By tile, which now supports SQL-style expressions with formula support. The Add Formula tile performs row-by-row operations and does not support aggregations. If you want to use operations like SUM, MEDIAN, or PERCENTILE, be sure to select the Group By tile to aggregate your data.
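The difference from row-by-row evaluation is easiest to see in code. The following Python sketch is illustrative only: `group_by` is a hypothetical helper, not part of Domo, and it uses the standard library's `statistics.median` to stand in for a MEDIAN aggregate.

```python
from collections import defaultdict
from statistics import median
from typing import Any, Dict, List

Row = Dict[str, Any]


def group_by(rows: List[Row], key: str, column: str) -> Dict[Any, Dict[str, float]]:
    """Collect the values of `column` per group, then aggregate each group."""
    groups: Dict[Any, List[float]] = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[column])
    return {
        group: {"sum": sum(values), "median": median(values)}
        for group, values in groups.items()
    }


sales = [
    {"region": "West", "amount": 10},
    {"region": "West", "amount": 30},
    {"region": "West", "amount": 20},
    {"region": "East", "amount": 5},
    {"region": "East", "amount": 7},
]
summary = group_by(sales, key="region", column="amount")
```

Each result here depends on many input rows at once, which is exactly what a row-by-row evaluator cannot see.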
Filter Rows
The Filter Rows tile now supports SQL-style expressions as well. In the old Magic ETL, performing a compound filter statement required multiple mouse clicks. In the new Magic ETL, complex filter rules are quickly configurable with this expression evaluator. For more information on how to write filter formulas, see How to Write a Filter Formula in Magic ETL.
Pivot & Unpivot
What was the Uncollapse Columns tile is now the Pivot tile. What was the Collapse Columns tile is now the Unpivot tile.
Join Data
Historically, column name collisions were difficult to handle when joining data in the old Magic ETL. In the new Magic ETL, the Join Data tile enables you to easily specify what should happen when duplicate column names occur. With easy column name conflict resolution and drop-column options, joining data has never been easier.
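As an illustration of one way a column name collision can be resolved, the Python sketch below performs an inner join and renames any right-hand column whose name already exists on the left. The function `join_rows` and the `_right` suffix are assumptions for this example; the Join Data tile's actual options may differ.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def join_rows(left: List[Row], right: List[Row], key: str, right_suffix: str = "_right") -> List[Row]:
    """Inner join on `key`; colliding right-hand columns get a suffix."""
    # Index the right-hand rows by join key for quick lookup.
    index: Dict[Any, List[Row]] = {}
    for row in right:
        index.setdefault(row[key], []).append(row)

    output: List[Row] = []
    for left_row in left:
        for right_row in index.get(left_row[key], []):
            merged = dict(left_row)
            for name, value in right_row.items():
                if name == key:
                    continue  # the join key is already present from the left side
                target = name + right_suffix if name in left_row else name
                merged[target] = value
            output.append(merged)
    return output


customers = [{"id": 1, "name": "Acme", "status": "open"}]
invoices = [{"id": 1, "status": "paid", "total": 50}]
joined = join_rows(customers, invoices, key="id")
```

Renaming is only one possible policy. Dropping the duplicate column from one side is another, and this sketch does not guard against the suffixed name itself colliding with an existing column.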
Python & R Scripting
Scripting tiles are now much more flexible. Specifying a schema for your output DataSet is now optional. Executing your script provides the needed schema for your output DataSet, saving you time. You can run a preview to generate the schema or specify it when you need it as part of downstream tiles.
User Interface
Color-coded, curved lines
In the old Magic ETL, complex ETLs were difficult to follow and understand by viewing the graph alone. Now the lines between tiles are curved and, by default, color-coded by data source, allowing you to grasp the flow of your transformations more quickly. Don't like the default colors? You can change them using the color picker.
Notes on tiles
Have you ever opened a complex Magic ETL DataFlow that you did not create, or one that has been running untouched for months? Deciphering what transformations are taking place, or even why, can be difficult. With notes on individual tiles, you can write a detailed explanation of what the DataFlow is doing at each tile.
New tile categorizations
Finding the specific tile you needed in the old Magic ETL could be difficult given that the drop-down categories were limited to DataSets, Edit Columns, Edit Data, and Combine Data. The new Magic ETL has much more detailed categorization: Text, Dates and Numbers, Utility, Filter, Combine Data, Aggregate, and Pivot. The tile tooltips have also been updated to reflect exactly what's new about each tile.
Selectable text in Data Previews
Easily interact with the preview data in a new Magic ETL DataFlow with selectable text. Now you can quickly copy and paste individual cells or a group of cells from the Preview or Data tabs.
Advanced Options
Enhancements to data type handling
The inclusion of formulas makes the new Magic ETL a more versatile tool. It also highlighted the need for data type handling to be configurable both when a DataSet is first loaded and throughout the DataFlow. You can easily specify what data type a column should be, the expected format, and what to do when a value cannot be read as the specified data type.
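The three choices described above (target type, expected format, and behavior for unreadable values) can be sketched in Python as follows. The function `parse_date` and the `on_error` values `"null"` and `"error"` are hypothetical names chosen for this illustration and are not Domo settings.

```python
from datetime import date, datetime
from typing import Optional


def parse_date(text: str, fmt: str = "%m/%d/%Y", on_error: str = "null") -> Optional[date]:
    """Read `text` as a date in the expected format.

    on_error="null"  -> return None when the value cannot be read (assumed policy name)
    on_error="error" -> re-raise so that the run fails (assumed policy name)
    """
    try:
        return datetime.strptime(text, fmt).date()
    except ValueError:
        if on_error == "null":
            return None
        raise


good = parse_date("03/15/2021")
bad = parse_date("not a date")
iso = parse_date("2021-03-15", fmt="%Y-%m-%d")
```

Making the failure policy explicit per column keeps one malformed value from silently changing the meaning of downstream formulas.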
Preview and Data Table on Input DataSet tile
Because data types are now classified on input, we've introduced a new Preview tab on the Input DataSet tile. You can see the raw data in the Data tab and, in the new Preview tab, how that data has changed under your configured transform settings.
DataFlow Transform Settings
The new Magic ETL lets you configure time zones, locales, and collation modes, and specify default date and timestamp formats for your DataFlow. These settings are available at the DataFlow level and also at the individual tile level, for cases where a tile needs to run in a specific time zone.