Hardware acceleration in data analytics
General purpose CPUs are not a sustainable platform for real-time analytics. This is because:
a) data is growing in volume, velocity, complexity and variability among other dimensions;
b) data analytics applications increasingly require analysis results in real time; and
c) CPUs are unable to continue scaling in computational power, frequency, memory size and bandwidth.
Hardware acceleration is the use of specialized computer hardware to execute certain functions more efficiently than is possible on a general purpose CPU. Hardware platforms have accelerated applications, increasingly data analytics applications, by up to 1,000x and have proven themselves to be commercially viable.
Most hardware accelerators are not standalone platforms but are co-processors to a CPU. In other words, a CPU is needed for initial processing before the compute-intensive task is offloaded to the hardware accelerators. In some cases, the hardware accelerator might communicate with the CPU during the computation. However, the typical process of applying hardware acceleration is:
1. Decide what to offload to the accelerator (usually the compute-intensive functions);
2. Program the host (CPU) modules;
3. Program the data movement code between the CPU and the hardware platform, usually through dedicated APIs; then
4. Write the accelerator (hardware platform) modules using a dedicated language.
Throughout, the programmer employs parallel or multi-threaded constructs (data, task, model or hybrid parallelism).
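The four steps above can be sketched in a few lines of Python. This is only an illustration under loud assumptions: `MockAccelerator`, its methods and `square_kernel` are hypothetical names invented here, and a thread pool stands in for the device. Real accelerators are programmed through vendor APIs and dedicated languages, which this sketch does not attempt to reproduce.

```python
from concurrent.futures import ThreadPoolExecutor


class MockAccelerator:
    """Hypothetical stand-in for a co-processor; not a real device API."""

    def __init__(self, lanes=4):
        self.lanes = lanes   # number of parallel workers we pretend to have
        self.memory = {}     # pretend device memory, keyed by buffer name

    def copy_to_device(self, name, host_data):
        # Step 3: explicit host -> device data movement.
        self.memory[name] = list(host_data)

    def launch(self, kernel, src, dst):
        # Step 4: apply the kernel to every element (data parallelism).
        with ThreadPoolExecutor(max_workers=self.lanes) as pool:
            self.memory[dst] = list(pool.map(kernel, self.memory[src]))

    def copy_to_host(self, name):
        # Step 3 again: device -> host data movement.
        return list(self.memory[name])


def square_kernel(x):
    # Step 1: the compute-intensive function chosen for offload.
    return x * x


def host_program(values):
    # Step 2: the host (CPU) module prepares data and drives the device.
    device = MockAccelerator(lanes=4)
    device.copy_to_device("in", values)
    device.launch(square_kernel, "in", "out")
    squares = device.copy_to_host("out")
    return sum(squares)  # the final reduction stays on the host


if __name__ == "__main__":
    print(host_program(range(10)))
```

The point of the sketch is the shape of the program: the host code, the two explicit copies and the kernel are separate pieces, and the cost of the copies is part of what decides whether offloading pays off.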
The most common types of hardware platforms are Graphics Processing Units (GPUs), Field Programmable Gate Arrays (FPGAs) and Custom Integrated Circuits (ICs). These platforms are described below along with current examples of data analytics applications leveraging these platforms.
GPUs were traditionally designed for graphics applications. They use a few thousand ‘small’ cores (vs. 2–4 ‘big’ cores in CPUs) and operate in a Single Instruction Multiple Data (SIMD) fashion. More recently, GPUs have been explored for general purpose computations. GPU architectures have been continuously evolving towards higher performance, with larger memory sizes and higher memory bandwidths, because parallel processing and high memory bandwidth hide the latency that limits single-threaded performance. The rapid increase in the number and diversity of scientific communities exploring the computational power of GPUs for their data-intensive algorithms has arguably encouraged GPU manufacturers to design easily programmable general purpose GPUs (GPGPUs). Programming frameworks for GPUs such as CUDA and OpenCL have been gaining popularity.
GPUs in Data Analytics
Since GPUs are best suited for data-parallel and iterative applications with SIMD characteristics, they have been successfully applied in speeding up:
computational finance applications such as Monte Carlo simulations, risk analytics and option pricing;
defense and intelligence applications [6, 7] such as image processing, signal processing and video surveillance; and
visual data exploration [8, 9] such as 3D visualization, real-time rendering, shape similarity detection and search, and anomaly detection.
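To see why a workload such as Monte Carlo option pricing suits a GPU, consider the minimal CPU-only sketch below. It is an illustration rather than a production pricer: the function names, the parameter values and the choice of a European call under geometric Brownian motion are all assumptions made here. What matters is that every simulated path runs the same instructions on different random inputs and never depends on another path, which is the SIMD-friendly pattern described above.

```python
import math
import random


def simulate_payoff(rng, spot, strike, rate, vol, maturity):
    # One independent path: terminal price under geometric Brownian motion.
    # On a GPU, each path like this could be assigned to its own thread.
    z = rng.gauss(0.0, 1.0)
    drift = (rate - 0.5 * vol * vol) * maturity
    terminal = spot * math.exp(drift + vol * math.sqrt(maturity) * z)
    return max(terminal - strike, 0.0)


def monte_carlo_call(spot, strike, rate, vol, maturity, n_paths, seed=0):
    # Sequential loop over paths; the iterations do not depend on each other.
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_paths):
        total += simulate_payoff(rng, spot, strike, rate, vol, maturity)
    # Discount the average payoff back to today.
    return math.exp(-rate * maturity) * total / n_paths


def black_scholes_call(spot, strike, rate, vol, maturity):
    # Closed-form reference value, used only to sanity-check the simulation.
    sqrt_t = math.sqrt(maturity)
    d1 = (math.log(spot / strike) + (rate + 0.5 * vol * vol) * maturity) / (vol * sqrt_t)
    d2 = d1 - vol * sqrt_t

    def cdf(x):
        return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

    return spot * cdf(d1) - strike * math.exp(-rate * maturity) * cdf(d2)


if __name__ == "__main__":
    estimate = monte_carlo_call(100.0, 100.0, 0.05, 0.2, 1.0, n_paths=50000)
    reference = black_scholes_call(100.0, 100.0, 0.05, 0.2, 1.0)
    print(estimate, reference)
```

The only step that combines results across paths is the final sum, a reduction that parallel hardware also handles well.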
Further, the growing popularity of artificial intelligence (AI) has created an opportunity to apply GPUs to compute-intensive machine learning and deep learning algorithms. Large corporations including Facebook, Baidu, IBM, Microsoft and Yandex, and early-stage startups including MetaMind, Nervana Systems and Minds.ai, are all leveraging the computational power of GPUs at a relatively low cost and advancing the field of AI.
FPGAs are arrays of reconfigurable logic and are popular devices for hardware prototyping. High performance systems increasingly utilize FPGAs because of improvements in speed and density. FPGAs provide higher memory bandwidth and lower power consumption than CPUs (and GPUs), and enable streaming dataflow computations. However, they are not suited for double precision floating point computations or for workloads whose memory accesses are not well-structured. FPGAs were traditionally programmed using synthesizable hardware description languages such as Verilog and VHDL. These languages include complex constructs for describing parallel simulations and timing delays but are not easy to program or debug. Recent improvements in FPGA tool flows, such as the adoption of C-to-gate techniques and compilers and support for OpenCL in FPGA programming, have helped increase the use of FPGAs, but these techniques are fairly nascent.
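The streaming dataflow style mentioned above can be loosely illustrated in software with Python generators. This is an analogy only, with hypothetical stage names and parameters chosen here: on an FPGA each stage would be dedicated logic running concurrently with the others, whereas the generators below simply pass one sample at a time from stage to stage without materializing intermediate arrays.

```python
from collections import deque


def scale(stream, factor):
    # Stage 1: multiply every incoming sample by a constant.
    for sample in stream:
        yield sample * factor


def moving_sum(stream, window):
    # Stage 2: sliding-window sum; emits once the window is full.
    buffer = deque()
    total = 0
    for sample in stream:
        buffer.append(sample)
        total += sample
        if len(buffer) > window:
            total -= buffer.popleft()
        if len(buffer) == window:
            yield total


def threshold(stream, limit):
    # Stage 3: flag window sums that exceed the limit.
    for value in stream:
        yield value > limit


def pipeline(samples):
    # Stages are chained; each sample flows through all of them in turn.
    return list(threshold(moving_sum(scale(iter(samples), 2), 3), 20))


if __name__ == "__main__":
    print(pipeline([1, 2, 3, 4, 5]))  # [False, False, True]
```

Each stage keeps only a small, fixed amount of state (here, one window of samples), which is part of what makes this style a good fit for simple, repetitive, well-structured tasks.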
FPGAs in Data Analytics
FPGAs are most suited for simple, repetitive tasks that require high performance and low power. There are several examples of employing FPGAs in data analytics applications. FPGA-based solutions have been used to enable real-time event streaming and processing at SAP and Netezza. Microsoft now applies FPGA acceleration to Bing Search. Ryft provides an FPGA-based appliance (and libraries) for accelerating (fuzzy) search on historical and streaming data. Pico Computing employs FPGAs to accelerate image filtering and object tracking applications.
Custom-designed ICs are arguably the fastest accelerators available today, offering speedups of several orders of magnitude over single-threaded software performance on the CPU. These chips are application-specific and thus deliver high performance at low power in a tiny form factor for the target application, although at a high cost. However, the high manufacturing (and programming) cost of the IC can be amortized at high production volumes.
Custom ICs in Data Analytics
Custom ICs have been the most suitable accelerators for compute-intensive space, military and medical applications because of their high performance and small footprint. A few corporations are now employing custom chips in data analytics applications. These include Movidius, which designs ICs for use in computer vision, and Intel, which provides ICs with specialized logic for decoding video. Accelerating neural networks for deep learning algorithms using custom ICs is currently being explored at large corporations such as Samsung, Qualcomm, MobileEye and Orcam, and at startups such as Nervana Systems and Teradeep.
General purpose CPUs are not a sustainable platform for real-time analytics due to the rapidly evolving characteristics of data and data analytics applications. Hardware acceleration platforms have demonstrated multiple successes in compute intensive applications. These platforms are getting better and faster with increased popularity, and are currently employed in quite a few data analytics applications. There are still several opportunities in data analytics that can benefit from applying hardware acceleration, some of which will be suggested and reviewed in the talk at Strata Hadoop Conference on Tuesday March 29th, 2016.