New opportunities for hardware acceleration in data analytics
In a previous post, we introduced hardware acceleration and types of hardware platforms, and reviewed a few commercial applications of hardware platforms. This article advocates an end-to-end hardware platform for accelerating data analytics applications.
A Hardware Platform for Processing Data
Repeatedly transferring data from the CPU to an accelerator (and back) during processing is computationally expensive. An end-to-end platform that performs all the data processing steps required to generate insights can completely eliminate these transfer costs.
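To make the transfer argument concrete, here is a minimal cost sketch. It assumes a fixed cost per crossing of the CPU/accelerator boundary and counts only crossings between consecutive stages; the stage times, the placement of stages and the 0.5 s transfer cost are hypothetical values chosen for illustration, not measurements.

```python
def boundary_crossings(placement):
    """Count how often data moves between devices from one stage to the next."""
    return sum(1 for a, b in zip(placement, placement[1:]) if a != b)


def pipeline_time(compute_s, placement, transfer_s):
    """Total time = per-stage compute time + one transfer per boundary crossing."""
    return sum(compute_s) + boundary_crossings(placement) * transfer_s


# Hypothetical six-stage pipeline, 1 second of compute per stage.
compute_s = [1.0] * 6

# Co-processor model: only some stages run on the accelerator ("acc").
mixed = ["cpu", "acc", "cpu", "acc", "cpu", "cpu"]
# End-to-end model: every stage stays on the accelerator.
end_to_end = ["acc"] * 6

mixed_time = pipeline_time(compute_s, mixed, transfer_s=0.5)
end_to_end_time = pipeline_time(compute_s, end_to_end, transfer_s=0.5)
print(mixed_time, end_to_end_time)
```

In this toy model the mixed placement pays for four crossings, while the end-to-end placement pays for none. Any speedup of the stages themselves is deliberately left out so that only the transfer term differs.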
Framework to Identify New Opportunities
The analytics framework displayed above is one way to view the stages of data processing to generate insights. These are:
‘Collection’: acquiring or extracting source data. Depending on the origin, the data could be streaming or historical.
‘Cleaning’: rectifying anomalies and inconsistencies, and normalizing the syntax of the data.
‘Integration’: aligning the data to existing datasets, or to a common vocabulary.
‘Analyze’: applying descriptive, predictive or prescriptive models. Data mining techniques such as machine learning, deep learning and natural language processing fall in this category.
‘Visualization’: creating images that illustrate the models for a wider audience and deeper exploration.
‘Alerting’: operationalizing a model by generating notifications and updating relevant stakeholders.
Action on new insights leads to generating new, more enriched data. Thus, data analytics is an iterative approach and all steps are computationally expensive.
We highlight in blue the steps in which hardware platforms are already employed for accelerating applications.
There are no commercially available hardware-accelerated solutions for cleaning, integration or alerting. We now identify approaches to parallelize these applications and thus make them suitable for hardware acceleration.
Accelerating Data Cleaning
Enterprises, internet of things (IoT) and healthcare companies generate and acquire data at unprecedented scale. The likelihood of ‘dirty’ data is high and there exists a growing demand for cleaning data in real-time.
The key components of Data Cleaning are identifying missing data, computing normality and linearity, and determining outliers. The building blocks for all these components shown in the figure above have parallel solutions. Accelerating all these components in hardware will greatly reduce the time to clean data. Further, data cleaning is a multi-pass and iterative process, and thus applies well to hardware acceleration. In addition, hardware platforms allow data from multiple sources to be cleaned and prepped in parallel. Therefore, the total time to clean data can be significantly reduced with hardware acceleration.
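As a rough illustration of two of these building blocks, the sketch below counts missing values and flags outliers for several sources at once. The sensor names and readings are hypothetical, the outlier rule (a modified z-score based on the median absolute deviation) is our choice rather than anything prescribed above, and Python threads merely stand in for the parallel lanes a hardware platform would provide.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import median


def clean_source(values, threshold=3.5):
    """Count missing entries and split the rest into outliers and clean values."""
    present = [v for v in values if v is not None]
    n_missing = len(values) - len(present)
    if not present:
        return {"missing": n_missing, "outliers": [], "cleaned": []}

    med = median(present)
    mad = median(abs(v - med) for v in present)  # median absolute deviation

    def is_outlier(v):
        if mad == 0:
            return v != med
        # Modified z-score; 0.6745 rescales the MAD for normally distributed data.
        return 0.6745 * abs(v - med) / mad > threshold

    return {
        "missing": n_missing,
        "outliers": [v for v in present if is_outlier(v)],
        "cleaned": [v for v in present if not is_outlier(v)],
    }


# Hypothetical readings from two independent sources.
sources = {
    "sensor_a": [10.1, 9.9, None, 10.0, 55.0, 10.2],
    "sensor_b": [1, 2, None, None, 3, 2, 100],
}

# Each source is cleaned independently, so the work parallelizes trivially.
with ThreadPoolExecutor() as pool:
    reports = dict(zip(sources, pool.map(clean_source, sources.values())))

print(reports)
```

Because no source depends on another, the same structure maps onto one hardware pipeline per source.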
Accelerating Data Integration
Data is now aggregated from multiple data sources such as data warehouses, public or federal database systems, and web-based systems. Data is formatted differently across legacy systems and these systems often contain redundant data.
Traditionally, the schema on the left was employed to aggregate data from a variety of sources into a data warehouse. This approach offers a tightly coupled architecture because the data are already physically reconciled in a single query-able repository.
More recently, newer schemas such as the one on the right (above) have been employed. They favor loose coupling between data sets and provide a unified query interface to access real-time data over a ‘mediated’ schema or an indirectly mapped model. This approach allows information to be retrieved in parallel from multiple original databases: each query is transformed into specialized sub-queries that match the schemas of the original databases, and these sub-queries can be executed in parallel and accelerated via hardware.
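The mediated-schema idea can be sketched in a few lines. Everything here is hypothetical: the two in-memory "databases", their field names and the unit scale are invented for illustration, and a thread pool stands in for parallel hardware. One query against the mediated fields (name, revenue) is rewritten into a sub-query per source, and the sub-queries run concurrently.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical source databases with different schemas and units.
CRM = [{"cust_name": "Ada", "rev_usd": 120.0}, {"cust_name": "Bo", "rev_usd": 40.0}]
ERP = [{"client": "Cy", "revenue_cents": 9900}, {"client": "Di", "revenue_cents": 2500}]

# Mapping from the mediated schema (name, revenue in dollars) to each source.
SOURCES = {
    "crm": {"rows": CRM, "name": "cust_name", "revenue": "rev_usd", "scale": 1},
    "erp": {"rows": ERP, "name": "client", "revenue": "revenue_cents", "scale": 100},
}


def sub_query(source, min_revenue):
    """Rewrite 'revenue >= min_revenue' into the source's own field and units."""
    s = SOURCES[source]
    threshold = min_revenue * s["scale"]
    hits = [r for r in s["rows"] if r[s["revenue"]] >= threshold]
    # Translate matching rows back into the mediated schema.
    return [{"name": r[s["name"]], "revenue": r[s["revenue"]] / s["scale"]} for r in hits]


def mediated_query(min_revenue):
    """Run one sub-query per source in parallel and merge the results."""
    with ThreadPoolExecutor() as pool:
        parts = list(pool.map(lambda src: sub_query(src, min_revenue), SOURCES))
    return sorted((row for part in parts for row in part), key=lambda r: r["name"])


result = mediated_query(50)
print(result)
```

The sub-queries share no state, which is what makes them candidates for parallel execution.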
Another approach for data integration is ontology-based matching. Consider the case where two companies merge their databases. Certain concepts and definitions in their respective schemas, like ‘earnings’, inevitably have different meanings. In one database it may mean profits in dollars (a floating-point number), while in the other it might represent the number of sales (an integer). This is a conflict that needs to be resolved for the merge to succeed. One strategy for resolving such conflicts uses ontologies that define schema terms in order to resolve semantic conflicts and create a match. However, matching large ontologies is challenging since it involves a large number of comparisons between concepts. This leads to high execution time and requires a large amount of computing resources. Novel approaches [1] that distribute the comparisons between concepts among parallel nodes reduce matching runtimes, and can be further accelerated via hardware platforms.
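A minimal sketch of distributing concept comparisons is shown below. It is not the algorithm of the cited work: the concept labels are hypothetical, string similarity from Python's `difflib` is a crude stand-in for a real semantic matcher, and the worker threads play the role of parallel nodes. The point is only that the comparison workload splits cleanly into independent chunks.

```python
from concurrent.futures import ThreadPoolExecutor
from difflib import SequenceMatcher


def similarity(a, b):
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_chunk(chunk, ontology_b, threshold):
    """Compare each concept in the chunk against every concept in ontology_b."""
    matches = []
    for a in chunk:
        best = max(ontology_b, key=lambda b: similarity(a, b))
        if similarity(a, best) >= threshold:
            matches.append((a, best))
    return matches


def parallel_match(ontology_a, ontology_b, workers=2, threshold=0.75):
    """Split ontology_a across workers; each worker matches its share independently."""
    chunks = [ontology_a[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(lambda c: match_chunk(c, ontology_b, threshold), chunks))
    return sorted(m for part in parts for m in part)


# Hypothetical concept labels from the two companies' schemas.
company_a = ["Customer", "Earnings", "Invoice"]
company_b = ["customers", "net_earnings", "shipment"]

matches = parallel_match(company_a, company_b)
print(matches)
```

A label match like this would still leave the ‘earnings’ conflict described above unresolved, since the two fields differ in meaning and type. A real matcher would compare the ontology definitions, not just the names.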
False or redundant alerts reduce the efficiency of data-driven processes. Correlating alerts and aggregating them with source ‘event data’ or ‘time series data’ to infer causality, and thus identify the origin of an anomaly, is computationally expensive. However, there is published research that parallelizes the key components of an alerting system, including anomaly detection [2], correlation analysis [3] and causality inference [4]. All of these components can be further accelerated via hardware in order to deliver real-time alerts and feedback.
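The sketch below illustrates the first two components in a deliberately simple form. The metric names, readings, fixed limit and time window are all hypothetical. Anomaly detection is a plain threshold check run per metric in parallel, and correlation just groups alerts that fire close together in time. Treating the earliest alert in a group as the likely origin is a naive proxy, not causal inference.

```python
from concurrent.futures import ThreadPoolExecutor


def detect(series, limit):
    """Return the timestamps at which a metric exceeds a fixed limit."""
    return [t for t, v in series if v > limit]


def correlate(alerts, window):
    """Group (time, metric) alerts whose gap to the previous alert is <= window."""
    groups = []
    for t, metric in sorted(alerts):
        if groups and t - groups[-1][-1][0] <= window:
            groups[-1].append((t, metric))
        else:
            groups.append([(t, metric)])
    return groups


def incidents(metrics, limit=20, window=2):
    """Detect anomalies per metric in parallel, then merge related alerts."""
    def alerts_for(item):
        name, series = item
        return [(t, name) for t in detect(series, limit)]

    with ThreadPoolExecutor() as pool:
        parts = list(pool.map(alerts_for, metrics.items()))
    return correlate([a for part in parts for a in part], window)


# Hypothetical (timestamp, value) readings for three metrics.
metrics = {
    "disk_latency": [(0, 5), (1, 50), (2, 6)],
    "api_errors": [(0, 1), (2, 30), (9, 2)],
    "queue_depth": [(8, 3), (9, 3), (10, 90)],
}

groups = incidents(metrics)
origins = [group[0][1] for group in groups]  # earliest alert in each group
print(groups, origins)
```

Here three raw alerts collapse into two incidents, which is the kind of reduction in redundant alerts the paragraph above is after.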
Integrated Platforms are Ideal
There are several efficiency gains from employing an end-to-end hardware platform.
Programmers won’t need to design algorithms around asynchronous transfers to try to hide these transfer costs. We also avoid the errors and non-determinism introduced by communication systems and protocols. An end-to-end platform can completely eliminate data transfer costs and is therefore a more efficient approach.
Hierarchical memory systems speed up data processing steps by placing the hot data in caches and registers.
Data is already in the format and structure expected by the compute units of the follow-on step.
Feedback loops in analyzing data will be tighter and thus more efficient.
Finally, the growing popularity of, and demand for, hardware platforms in turn advances the technology, tools and methodology for hardware acceleration.
Approximate computation is computation that returns a possibly inaccurate result rather than a guaranteed accurate one. It is based on the realization that the human brain’s neurons don’t do exact arithmetic; they perform roughly 99% of the computation. Similarly, approximate computation exploits the gap between the level of accuracy provided by the computing system and the level required by the application. The latter is often less than 100% because many systems and applications can tolerate a calculated loss of quality or optimality in a computed result. MIT, UT Austin, the University of Washington and several other universities are exploring approximate computing techniques and applications. These techniques include statistical inference, probability models and uncertainty propagation, among others. All these approaches lend themselves well to hardware platforms. Further, hardware platforms allow bits to be truncated while storing and communicating data, which lowers power consumption while improving speed [5]. Approximate computing is potentially applicable to all steps of data processing.
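To give a feel for bit truncation, here is a small software sketch. It is not the hardware truncated-multiplier design from the reference list, which works at the level of partial products; it simply zeroes the lowest bits of each operand before multiplying. The operand values and the number of dropped bits are hypothetical.

```python
def truncated_multiply(a, b, drop_bits):
    """Multiply non-negative integers after zeroing the lowest drop_bits of each."""
    mask = ~((1 << drop_bits) - 1)  # e.g. drop_bits=4 clears the low four bits
    return (a & mask) * (b & mask)


def relative_error(a, b, drop_bits):
    """Fraction of the exact product lost to truncation."""
    exact = a * b
    return (exact - truncated_multiply(a, b, drop_bits)) / exact


# Hypothetical operands: compare the error as more bits are dropped.
for bits in (0, 2, 4, 6):
    print(bits, truncated_multiply(1000, 3000, bits), relative_error(1000, 3000, bits))
```

Dropping four bits from these operands costs roughly one percent of accuracy. In hardware, narrower operands mean fewer bits to store, move and multiply, which is where the power and speed benefits would come from.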
End-to-end hardware platforms are inevitable
CPUs simply can’t keep up with the growing demands of data analytics applications. Hardware platforms have already accelerated analytics applications. An end-to-end system eliminates the cost of repeatedly transferring data to and from a co-processor platform. Further, the open opportunities in data cleaning, data integration, alerting systems and approximate computing applications lend themselves nicely to hardware acceleration. An end-to-end platform will further accelerate feedback loops in iterative and multi-pass data processes. With the increasing popularity of hardware platforms, the tools and methodologies needed for implementing hardware acceleration will continue to evolve.
[1] “A Parallel Approach for Matching Large-scale Ontologies”, Tiago Brasileiro Araújo; Carlos Eduardo Pires; Thiago Pereira da Nobrega; Dimas C. Nascimento
[2] “Massively Parallel Anomaly Detection in Online Network Measurement”, S. Shanbhag; T. Wolf
[3] “Supporting correlation analysis on scientific datasets in parallel and distributed settings”, Y. Su; G. Agrawal; J. Woodring; A. Biswas; H. Shen
[4] “Order-independent constraint-based causal structure learning”, Diego Colombo; Marloes H. Maathuis
[5] “Hardware Implementation of Truncated Multipliers Using Spartan-3AN, Virtex-4 and Virtex-5 FPGA Devices”, M. H. Rais