In the second of our three-part blog series on AI solutions, we take a look at how we used Azure Databricks to run the highly customisable Amdaris AI accelerator.
The figure above shows the typical workflow in a data science project.
Typically, data science projects are developed in the Python programming language on local machines and then deployed on production-ready infrastructure. In production, the Extract, Transform, Load (ETL) stage can be handled by a data engineer using Spark through the easy-to-use Spark DataFrame API. The DataFrame API is one of the most accessible ways to work with data in Spark, performing map-reduce operations behind the scenes.
The Spark DataFrame API resembles the API of Pandas, a widely used Python data science package. The data engineer often uses the Scala programming language inside Databricks, while the data scientist uses Python on a local computer. However, Databricks supports the Python, R and Java programming languages as well.
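To illustrate how closely the two APIs resemble each other, here is a minimal sketch of the same aggregation in both styles. The column names and figures are purely illustrative, and the PySpark equivalent is shown in comments since it needs a running SparkSession:

```python
import pandas as pd

# Local prototype in Pandas: total sales per region.
pdf = pd.DataFrame({"region": ["UK", "UK", "RO"], "sales": [10.0, 20.0, 5.0]})
summary = pdf.groupby("region", as_index=False)["sales"].sum()

# The PySpark equivalent reads almost identically (inside Databricks,
# where a SparkSession named `spark` is available):
#
#   sdf = spark.createDataFrame(pdf)
#   sdf.groupBy("region").sum("sales").show()
```

Because the idioms map so directly, prototype transformations written in Pandas on a laptop translate naturally into distributed Spark jobs in production.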
In a typical data science project, the business requirements need to be included in ETL pipelines over time, requiring an increased level of flexibility.
Amending the data as business dynamics require also affects the inputs to the AI algorithms, so the algorithms need to be retrained automatically whenever the data structure changes.
Moreover, as time goes by and the data volume increases, AI algorithms will usually perform better if they are retrained periodically in an automated manner. At first glance, this seems a highly complicated task that requires not only AI skills but also the efforts of a DevOps engineer.
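The idea of automated periodic retraining can be sketched as follows. Here `load_latest_data` and the simple linear fit are illustrative stand-ins, not the accelerator's actual data source or model:

```python
import numpy as np

def load_latest_data():
    # Stand-in for reading the latest business data from storage;
    # in production this would query the growing dataset.
    X = np.arange(10, dtype=float)
    y = 2.0 * X + 1.0
    return X, y

def retrain():
    """Refit the model on whatever data is currently available."""
    X, y = load_latest_data()
    slope, intercept = np.polyfit(X, y, 1)
    return slope, intercept

# A scheduler (for example, a Databricks job running a notebook on a
# daily or weekly cadence) would call retrain() so the model keeps
# tracking the growing dataset without manual intervention.
slope, intercept = retrain()
```

The point is that once retraining is a single callable entry point, scheduling it becomes an infrastructure detail rather than a data science task.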
Amdaris can tackle the problem of scheduled training of AI models with minimal effort through an ingenious approach, outlined below.
Azure Databricks and the Amdaris AI accelerator
By using Azure Databricks, you can separate cluster resources from storage resources. This saves money, as compute and storage are charged separately, and storage does not incur high costs for generic business data. In addition to the Spark API, Azure Databricks supports the creation of notebooks employing data science frameworks and libraries such as scikit-learn and TensorFlow in a Python environment.
This feature of Azure Databricks has been explored by our data scientists, who have built an in-house AI accelerator in the form of a software framework developed in the Python programming language.
The Amdaris data science framework was built using robust software design principles. It is composed of multiple layers stacked on top of each other, and each layer contains multiple modules that share the same API, so they can be used selectively to solve a problem. For a classical data science problem, the data science accelerator includes three categories of layers:
- layers specific to the ETL of available data
- layers specific to the business scenario
- layers specific to AI solutions.
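The layered design can be sketched in a few lines. The class names below are hypothetical, chosen only to show how modules sharing one interface can be stacked and swapped per problem:

```python
from abc import ABC, abstractmethod

class Layer(ABC):
    """Common interface that every module in every layer exposes."""
    @abstractmethod
    def run(self, data):
        ...

class CleaningLayer(Layer):      # an ETL-category module
    def run(self, data):
        return [x for x in data if x is not None]

class ScalingLayer(Layer):       # a business-scenario-category module
    def run(self, data):
        top = max(data)
        return [x / top for x in data]

class Pipeline:
    """Chains layers; any module honouring the interface can be slotted in."""
    def __init__(self, layers):
        self.layers = layers

    def run(self, data):
        for layer in self.layers:
            data = layer.run(data)
        return data

result = Pipeline([CleaningLayer(), ScalingLayer()]).run([2, None, 4])
# result == [0.5, 1.0]
```

Because every module honours the same `run` contract, swapping an ETL step or a business rule means replacing one object in the pipeline, not rewriting it.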
More information about the Amdaris in-house AI accelerator can be found in our blog "Harness the benefits of customisable data science accelerator".
AI projects do not usually start with coding on cloud infrastructure, but with prototype code on local machines. Benefiting from the in-house AI accelerator, the data scientist or another AI specialist develops the required data transformations to satisfy business requirements.
Subsequently, the data scientist will focus on the AI modelling stage of the project by employing the software layers created in the accelerator that focus on machine learning (ML) algorithms.
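A minimal sketch of what such an ML-focused layer might look like, assuming it wraps a standard scikit-learn estimator behind the same layer interface (the class and names here are illustrative, not the accelerator's real API):

```python
from sklearn.tree import DecisionTreeClassifier

class MLLayer:
    """Hypothetical modelling layer wrapping a scikit-learn estimator."""
    def __init__(self, estimator):
        self.estimator = estimator

    def fit(self, X, y):
        self.estimator.fit(X, y)
        return self

    def predict(self, X):
        return self.estimator.predict(X)

# Tiny toy dataset: inputs below 2 belong to class 0, the rest to class 1.
layer = MLLayer(DecisionTreeClassifier(random_state=0))
layer.fit([[0], [1], [2], [3]], [0, 0, 1, 1])
preds = layer.predict([[0], [3]])
```

Keeping the estimator behind a thin wrapper means the modelling stage plugs into the same pipeline as the ETL and business layers, and the underlying algorithm can be exchanged without touching the rest of the project.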
What next?
If you would like to speak to someone at Amdaris about AI solutions, just get in touch. Call +44(0)117 935 3444 or contact us using the form below and let us know about your next plans. We will help you choose the best technology for making your project a success.
Look out for part 3 of this blog series which focuses on how Amdaris can support your data science projects.