What’s an ELT?
An ELT (Extract, Load, Transform) tool is a data-transformation tool that relies on the underlying database engine to actually perform the required data transformations. More precisely, when you want to transform data with an ELT tool, you follow these steps:
- Extract the required datasets/tables
- Load all the datasets/tables into whatever database you are using. For example, the database can be Hive if you are using Hadoop.
- Transform the data using SQL scripts
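The three steps above can be sketched as follows. This is a minimal illustration only: it uses Python's built-in SQLite as a stand-in for the target database, and the `sales` table, the `RAW_CSV` sample data and the `run_elt` function are hypothetical names invented for the example.

```python
import csv
import io
import sqlite3

# Hypothetical source data standing in for an extracted table.
RAW_CSV = """customer,amount
alice,10
bob,5
alice,7
"""


def run_elt(raw_csv: str) -> list:
    # Extract: read the rows from the source.
    rows = list(csv.DictReader(io.StringIO(raw_csv)))

    # Load: copy the raw, untransformed rows into the database.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (customer TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (:customer, :amount)", rows)

    # Transform: the database engine does the work, driven by a SQL script.
    result = conn.execute(
        "SELECT customer, SUM(amount) FROM sales "
        "GROUP BY customer ORDER BY customer"
    ).fetchall()
    conn.close()
    return result


print(run_elt(RAW_CSV))  # [('alice', 17), ('bob', 5)]
```

Note that the transformation itself is expressed only in SQL, so it can do no more than the SQL dialect of the target database allows.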
The ELT approach is strongly encouraged by database vendors.
However, the limited expressiveness of the SQL language strongly restricts the type of operations that you can perform using an ELT approach: most Anatella data-transformation graphs cannot be reproduced in simple SQL, or only with great difficulty.
For example, pivoting the data (i.e. the flatten and unflatten operations inside Anatella) is a very complex operation to reproduce in SQL. Another example is the “timeTravel” action inside Anatella: while it’s possible to code this action in SQL, the running time is typically multiplied by 1000.
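To give an idea of why pivoting is awkward in SQL, here is a small sketch of one common workaround, conditional aggregation. It again uses SQLite as a stand-in database; the `sales` table, its contents and the `pivot_by_month` function are hypothetical. The point to notice is that the output columns must be known before the query is written, so a first query is needed to discover them and the pivot query has to be generated as text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (customer TEXT, month TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("alice", "jan", 10), ("alice", "feb", 7), ("bob", "jan", 5)],
)


def pivot_by_month(conn):
    # Step 1: a first query, only to discover which output columns exist.
    months = [
        row[0]
        for row in conn.execute("SELECT DISTINCT month FROM sales ORDER BY month")
    ]
    # Step 2: generate one CASE expression per discovered column.
    # The values are spliced into the SQL text, which is acceptable only
    # for trusted data such as this toy table.
    cases = ", ".join(
        f"SUM(CASE WHEN month = '{m}' THEN amount ELSE 0 END) AS \"{m}\""
        for m in months
    )
    sql = (
        f"SELECT customer, {cases} FROM sales "
        "GROUP BY customer ORDER BY customer"
    )
    return months, conn.execute(sql).fetchall()


months, rows = pivot_by_month(conn)
print(months)  # ['feb', 'jan']
print(rows)    # [('alice', 7, 10), ('bob', 0, 5)]
```

Some databases offer a dedicated pivot operator, but it is a vendor extension rather than portable SQL.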
Furthermore, the “advanced” functionalities in SQL are not standardized (i.e. PL/SQL from Oracle is different from Teradata SQL, from Transact-SQL from SQL Server, and from HiveQL from Hadoop/Hive). This means that an advanced SQL data transformation coded in one of the above will most likely not run if you try to execute it on another database.
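The portability problem shows up even for very basic operations. The sketch below is an illustration under assumptions: it feeds three common ways of writing "return the first 2 rows" to SQLite, used here only as a convenient test engine, and records which ones it accepts. The table `t`, the `FIRST_ROWS` dictionary and the `try_dialects` function are hypothetical names.

```python
import sqlite3

# The same intent, "return the first 2 rows", written in a few dialects.
FIRST_ROWS = {
    "Transact-SQL (SQL Server)": "SELECT TOP 2 name FROM t ORDER BY name",
    "HiveQL": "SELECT name FROM t ORDER BY name LIMIT 2",
    "Oracle 12c and later": (
        "SELECT name FROM t ORDER BY name FETCH FIRST 2 ROWS ONLY"
    ),
}


def try_dialects(queries):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (name TEXT)")
    conn.executemany("INSERT INTO t VALUES (?)", [("a",), ("b",), ("c",)])
    outcome = {}
    for dialect, sql in queries.items():
        try:
            outcome[dialect] = conn.execute(sql).fetchall()
        except sqlite3.OperationalError as exc:
            # The engine does not understand this dialect's syntax.
            outcome[dialect] = f"rejected: {exc}"
    conn.close()
    return outcome


for dialect, result in try_dialects(FIRST_ROWS).items():
    print(dialect, "->", result)
```

Only the `LIMIT` form is accepted by this engine; the other two are rejected as syntax errors. Procedural extensions such as stored procedures tend to differ even more from one vendor to another.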
On the other hand, since Anatella uses its own engine to compute all the data transformations, we can
guarantee that any and all Anatella-data-transformations will always run, whatever the infrastructure
available on the premises.
Independence from the current load of the underlying database
To compute the data transformations, Anatella uses the CPU of the local machine. This means that you have guaranteed computing power at your disposal at all times. With a database-driven approach, a typical scenario is to receive error messages from the database because your DBA (database administrator) did not grant you the rights to execute such time-consuming and resource-intensive SQL commands.
Furthermore, all users in a company typically share the same database engine, which means that you depend on the goodwill of the other users: if one user decides to consume a large part of the database’s resources, you are stuck and can no longer work. This never happens with Anatella because its computations run on the local CPU rather than on the shared database.