Teradata has long distinguished itself not simply by scale or performance, but by an advanced SQL engine designed to handle extremely complex operations such as recursive queries and implicit joins, along with unique syntax and custom logic for parallelizing workloads. As a result, Teradata has long positioned itself for organizations with the most challenging analytic problems. And with Vantage, it has finally embraced the cloud aggressively.
The Redshifts, Synapses, Snowflakes, and BigQueries of the world have been positioned as modern, hyperscale cloud alternatives whose pay-as-you-go pricing should make them more economical options than legacy Teradata platforms. For many organizations, though, functionality gaps or requirements to change source code and/or schema have been the showstoppers for migration.
Naturally, there’s a startup that thinks it has the answer to that.
Datometry says that the answer isn't data virtualization, but database virtualization. Its approach is to insert a runtime that acts as a buffer between your Teradata SQL statements and the target cloud data warehouse. The idea is to enable Teradata customers to run their Teradata queries on different targets without having to modify or completely rewrite their existing SQL programs. Its product, Hyper-Q, is adding Oracle to the list of database sources.
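To picture what "no code change" means in practice, here is a minimal sketch, assuming a hypothetical ODBC DSN named hyperq_gateway rather than any real Datometry configuration: the application keeps issuing its Teradata-dialect SQL, and the only thing that moves is where the connection points.

```python
import pyodbc

# Before: the application connected straight to Teradata.
# conn = pyodbc.connect("DSN=teradata_prod;UID=etl_user;PWD=secret")

# After: the same application connects to the virtualization endpoint, which
# relays emulated calls to the cloud warehouse behind it. The DSN and
# credentials here are placeholders, not real configuration.
conn = pyodbc.connect("DSN=hyperq_gateway;UID=etl_user;PWD=secret")

cursor = conn.cursor()
# Teradata-specific syntax (SEL as shorthand for SELECT, TOP n) is passed
# through unchanged; the layer in the middle handles the translation.
cursor.execute("SEL TOP 10 * FROM sales ORDER BY revenue DESC")
for row in cursor.fetchall():
    print(row)
```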
The core of Datometry’s approach is its own proprietary hypervisor that emulates the SQL database calls on the fly. Under the hood, it breaks down those complex calls, stored procedures, and/or macros into atomic operations that the target data warehouse can understand. For instance, a recursive query, which is used for querying nested or hierarchical data structures, is translated on the fly to a series of simple individual calls to the target, with intermediate results stored in temporary tables managed by the hypervisor. Given that these operations are likely to be complex, it offers policy-based queueing that fits with the existing policies run on the source. It provides JDBC and ODBC APIs for BI and ETL tools.
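To make the recursive-query example concrete, here is a rough Python sketch of that kind of rewrite: recursion is emulated by iterating over a temporary table until no new rows appear. The table and column names and the execute helper are illustrative assumptions, not Datometry's actual implementation.

```python
def expand_hierarchy(execute, max_depth=50):
    """Expand an employee -> manager hierarchy one level per pass.

    `execute(sql)` is assumed to run a statement against the target warehouse
    and return the number of rows it affected.
    """
    # Seed a temporary working table with the top of the hierarchy (depth 0),
    # standing in for the anchor part of the recursive query.
    execute("""
        CREATE TEMPORARY TABLE org_expanded AS
        SELECT emp_id, manager_id, 0 AS depth
        FROM employees
        WHERE manager_id IS NULL
    """)

    # Each pass joins the previous level to its direct reports, which is the
    # iterative equivalent of one step of recursion.
    for depth in range(1, max_depth):
        rows_added = execute(f"""
            INSERT INTO org_expanded
            SELECT e.emp_id, e.manager_id, {depth} AS depth
            FROM employees e
            JOIN org_expanded p
              ON e.manager_id = p.emp_id
             AND p.depth = {depth - 1}
        """)
        if rows_added == 0:
            # Fixed point: no deeper levels exist, so stop iterating.
            break
```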
Of course, Datometry is not the first to promise “don’t change your programs.” There are SQL translators, but Datometry claims their effectiveness is often spotty; by its estimate, code converters handle only roughly 60-70% of workloads. The traditional workaround was adding non-SQL code in the application to compensate for the differences between Teradata SQL and the SQL of the target database. Likewise, custom data types and structures are also often missed by cloud database schema migration tools.
Can Datometry handle all the idiosyncrasies of Teradata SQL? The company claims 99% coverage of Teradata workloads. Admittedly, there is a cost: Datometry’s virtualization layer adds 1-2% of overhead, although, as with EPA ratings, your mileage will vary depending on the workload. The company claims that’s a small price to pay compared to the burden of maintaining the code produced by SQL and schema conversion tools.
Datometry performed its initial proof of concept with SQL Server on HPE Superdome machines on-premises about four years ago, and has since pivoted to supporting Azure Synapse and Google BigQuery in the cloud. As noted above, it has just announced a preview for Oracle. Significantly, Datometry has not yet targeted Amazon Redshift or Snowflake — so it still has its work cut out for it.