Intro
Domo’s Magic ETL is getting a major upgrade with the new Magic ETL, a new data processing engine with the same easy-to-use graphical interface as its predecessor. The new Magic ETL is broadly compatible with the old Magic ETL; most existing transforms can be upgraded frictionlessly. This new engine is intended as a replacement for the old, and with time, we will be deprecating the old Magic ETL.
The new Magic ETL is more performant, more feature-rich, and more consistent. Some of these improvements come at the cost of exact behavioral compatibility with the old Magic ETL. In this document, we hope to outline all the differences in behavior that you can expect to see when switching an existing transform to the new Magic ETL. We hope you will do so for as much of your data pipeline as possible, as the improvements are substantial.
Video - Upgrading to the new Magic ETL
Getting this feature
If you are interested in using this feature, please contact your Customer Success Manager (CSM).
Note: This feature is available on demand and paid.
To request this feature be enabled,
-
Reach out to your Domo Customer Success Manager, Technical Consultant, or AE.
-
If you do not have contact information for your CSM, TC, or AE, contact Technical Support by email at support@domo.com
Depending on the feature, you may be required to complete training before you can use the feature.
Why Upgrade?
There are some types of major improvements that can be done invisibly to the user. If our only goal was to improve performance, for example, we could have done so “in place”, without requiring DataFlow authors to elect to use the new release. But we also wanted to make the new Magic ETL more internally consistent, and more aligned with other Domo features (like Beast Mode). One of our goals was to eliminate all cases where Magic resolves column name collisions without the user’s input. Joins and Append Rows both deal with name collisions, and in the new Magic ETL they both deal with them differently than they did in the old Magic ETL. We heard from many customers that this change is important for data cleanliness as we’d often see columns coming out of the old Magic ETL with names like “id_1_2_1_1”. However, we couldn’t make these changes without allowing the user a choice.
Rethinking auto-rename was just the start. The new Magic ETL features a new formula system with over 200 functions inspired by Domo Beast Mode, various SQL databases, and Spreadsheet software. Formulas are a game-changer for the new Magic ETL. The functionality offered by the formula system goes beyond expressing a cluster of traditionally-configured tiles more concisely. In some cases, it even surpasses what is possible in a MySQL DataFlow. Formulas can be used to edit columns in place or add new columns (see the new Add Formula tile), or as part of a Group By or Filter Rows tile, to construct complex aggregations or filter conditions. The complete list of supported functions available in the new formula system can be found here.
Notable Behavior Changes
There are changes in the new Magic ETL that might cause transforms to work differently than they did in the old Magic ETL. This could manifest as errors like “Column not found” or “Duplicate column name”, or more subtly as different output data. As such, it is important to be aware of all of these changes before migrating an existing DataFlow to the new Magic ETL or before writing a new DataFlow if you are already very familiar with the old Magic ETL behavior.
Filter Rows
Null values are treated differently by the comparison functions. In SQL and the new Magic ETL DataFlows, the result of any comparison with null is always null, which is treated by the Filter Rows tile like a false result. In most contexts in the new Magic ETL, null is not equal to null, but null is also not not equal to null; null is not greater than zero, but it is also not less than zero, nor is it equal to zero. If you have nulls in columns referenced by Filter Rows tiles, all of those rows will be dropped, regardless of the function or its other argument. The only functions that can deal with nulls truthfully are “is null” and “is not null”. In the old Magic ETL, a filter might have tested “column = null”. This will no longer have the same result; such filters should be changed to use the “is null” operator.
If after switching a flow from the old Magic ETL to the new Magic ETL, it runs successfully, but the output seems to be missing rows, this change is a likely culprit. It can be especially surprising with the “is not equal to” function. It might seem like “column <> 5” should only drop rows where column’s value is 5, but in fact it will also drop rows where column is null, for the reason described above. The table below has additional examples of null behavior in Filter Rows.
Test |
Evaluated Result |
Filter Effect |
---|---|---|
4 <> 5 | true | keep |
5 <> 5 | false | drop |
null <> 5 | null | drop |
null = null | null | drop |
null <> null | null | drop |
null is null | true | keep |
null is not null | false | drop |
5 is not null | true | keep |
5 is null | false | drop |
To handle nulls in more complex ways, use the new formula system. The Filter Rows tile is one of three tiles (Group By and Add Formula being the other two) which supports formula evaluation. The IFNULL() function works particularly well for handling a null result from a filter expression. For example, if column “col” has null values, and we want to filter for rows whose value for “col” is not 5, preserving our nulls, we could write this expression: IFNULL(col <> 5, TRUE)
Besides null handling, there are many other scenarios that filter formulas make easier. In the old Magic ETL, the single formula below would have required two separate Filter Rows tiles with multiple conditions each.
Date & Time Parsing
Date and time parsing, the process of converting incoming text into Date or Date & Time values, has changed significantly. Generally, it is stricter than it was in the old Magic ETL. Some text that parsed successfully in the old Magic ETL might result in an error in the new Magic ETL. The goal of being stricter is avoiding scenarios where a value is silently misinterpreted, like a month being mixed up with a day. The new Magic ETL uses a fixed list of unambiguous date and time formats, plus a locale-specific list of ambiguous formats (e.g. “01/02/2020” is January 2nd in the US and February 1st in most other places). If the new Magic ETL fails to parse a date, a custom format can be specified on the Input DataSet tile, on a per-column basis. For information on the format specification, refer to Oracle’s documentation of the Java DateTimeFormatter class.
Date & Time Operations
Date and time operations in the new Magic ETL are performed by default in the company timezone specified in the Domo Admin menu. The old Magic ETL performed these operations in UTC, the international standard time. This means that a company whose timezone is America/Denver that uses the “Month of date” operation on a Date & Time column will get the month of that timestamp from Denver’s perspective, not that of Greenwich (UTC).
The WEEK_OF_YEAR() function in the Date Operations tile is also now different in the new Magic ETL. Week 1 is the first week with a Sunday in the year. This differs from the old Magic ETL and Beast Mode, where week 1 is the first partial week.
Using the Add Formula tile, week behavior can be adjusted: the WEEK() function accepts a second argument: an integer from 0 to 7, representing the “week mode” to use when calculating the week of a date. These modes are described in Oracle’s MySQL documentation, and are reflected in the behavior of Beast Mode. Mode 0 (or WEEK() without the mode argument) yields the same result as the WEEK_OF_YEAR() function in the Date Operations tile.
Joins
There are two major changes to the Join Data tile in the new Magic ETL:
- It is no longer important to ensure that the table with fewer duplicate join keys is on the left of the join. The old Magic ETL issue manifesting as the error, “The left input cannot include over 10,000 duplicates”, has been eliminated in the new Magic ETL.
- Name conflicts are no longer resolved without user input. Instead, we have added a section to the Join Data tile’s configuration for describing name conflict resolution. This section can be filled out automatically or manually. In addition to renaming conflicting columns, it is now possible to remove them in the Join Data tile’s configuration as well.
Append Rows (Union)
In the old Magic ETL, when two or more columns going into an Append Rows tile had the same names but different data types, they would not be combined. Instead, two or more new columns would be created, with names like “ID Whole Number” and “ID Text”. In the new Magic ETL, the Append Rows tile never changes column names. Columns that have different types will have their data converted to the best type possible given the types of all the columns that share a name. For example, if one column had type Whole Number, while another had type Decimal, the columns would be combined into a column with type Decimal. In cases where the types are wholly incompatible, like Dates and Decimals, the resulting column will be Text type. This behavior can be replaced with stricter behavior using a new option in the Append Rows tile configuration. The stricter behavior is to throw an error (i.e. fail the execution) if two columns with the same names have different types.
Other Considerations
Beyond understanding ways that the new Magic ETL alters the behavior of the old Magic ETL, it is valuable to learn the ways that the new Magic ETL increases transform functionality. Those familiar with the old Magic ETL who are new to the new Magic ETL should be sure to try out the following:
- The new Add Formula tile—which supports over 200 functions, most of which are unique to the new Magic ETL. When using Add Formula for the first time, be sure to expand the formula editor and browse the function list.
- The Input DataSet tile—which now supports changing the type of incoming columns, as well as configuring how errors and nulls should be handled. If you want to interpret all nulls in a Text column as the Empty String instead, you can now do so.
- The new Dynamic Unpivot tile—which makes more sense in some scenarios than the older Unpivot (Collapse Columns) tile. If you want to pivot all columns except a few, rather than pivoting a few columns and leaving most alone, consider using this new tile.
Comments
0 comments
Please sign in to leave a comment.