The trick with machines is to get them to learn the characteristics or properties of these different classification levels and then be able to use that learning to accurately classify a new object they haven’t been previously exposed to. That’s the gist of the “artificial intelligence” label that gets used to describe these efforts. In other words, while computers have long been able to identify things they’ve seen before, learning to recognize that a new image is not just a dog but a long-haired miniature dachshund, after they’ve “seen” enough pictures of dogs, is a critical capability. Actually, what’s really important, and really new, is the ability to do this extremely rapidly and accurately.
Like most computer-related problems, the work to enable this has to be broken down into a number of individual steps. The word “convolution” literally refers to something that folds back on itself. In mathematics, it describes an operation that slides a small filter across the input and combines the two at every position. In a convolutional network, the results from one level of filters are fed forward to the next level in order to improve the accuracy of the process. The phrase “neural network” stems from early efforts to create a computing system that emulated the human brain’s individual neurons working together to solve a problem. While most computer scientists now seem to discount the comparison to the functioning of a real human brain, the idea of a number of very simple elements connected together in a network and working together to solve a complex problem has stuck, hence convolutional neural networks (CNNs).
Deep learning refers to the number, or depth, of filtering and classification levels used to recognize an object. While there seems to be debate about how many levels are necessary to justify the phrase “deep learning,” many people seem to suggest 10 or more. (Although Microsoft’s research work on visual recognition went to 152 levels!)
A key point to understanding deep learning is that there are two critical but separate steps involved in the process.
The first involves doing extensive analysis of enormous data sets and automatically generating “rules” or algorithms that can accurately describe the various characteristics of different objects. The second involves using those rules to identify objects or situations based on real-time data, a process known as inferencing.
The “rule” creation efforts necessary to build these classification filters are done offline in large data centers using a variety of different computing architectures. NVIDIA has had great success with its GPU-compute initiatives based on Tesla (the chip, not the car). These leverage the floating point performance of graphics chips and the company’s GPU Inference Engine (GIE) software platform to help cut the time needed for the data input and analysis tasks of categorizing raw data from months to days, or in some cases hours. We’ve also seen some companies talk about the ability of other customizable chip architectures, notably FPGAs (Field Programmable Gate Arrays), to handle some of these tasks as well. Intel recently purchased Altera specifically to bring FPGAs into its data center family of processors, in an effort to drive the creation of servers that are even more powerful and uniquely suited to these (and other) types of analytics workloads.
Once the basic “rules” of classification have been created in these non-real-time environments, they have to be deployed on devices that accept live data input and make real-time classifications. Though related, this is a different set of tasks and a different type of work than what’s used to create the rules in the first place. In this inferencing area, we’re just starting to see a number of companies talking about bringing deep learning and artificial intelligence to a variety of devices.
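To make the split between the two steps concrete, here is a minimal Python sketch. It is a toy, not a CNN: a single-layer perceptron stands in for the many-layered networks described above, and the function names (`train`, `infer`), the two made-up features and the sample values are all assumptions invented for illustration. The point is only that the “rules” (here, a handful of weights) are produced once, offline, and then reused unchanged on new data.

```python
# Toy illustration only: a one-layer perceptron stands in for a deep CNN.
# All names and numbers here are hypothetical.

def score(weights, bias, features):
    """Weighted sum of the input features plus a bias term."""
    return sum(w * x for w, x in zip(weights, features)) + bias

def train(samples, labels, epochs=20, learning_rate=0.1):
    """Offline step: derive the "rules" (weights) from labeled examples."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in zip(samples, labels):
            predicted = 1 if score(weights, bias, features) > 0 else 0
            error = label - predicted  # 0 when the guess was right
            weights = [w + learning_rate * error * x
                       for w, x in zip(weights, features)]
            bias += learning_rate * error
    return weights, bias

def infer(model, features):
    """Real-time step: apply the frozen rules. Nothing is learned here."""
    weights, bias = model
    return 1 if score(weights, bias, features) > 0 else 0

# Made-up features: (ear length, body length); label 1 means "dachshund".
training_samples = [(0.9, 0.8), (0.8, 0.9), (0.2, 0.1), (0.1, 0.3)]
training_labels = [1, 1, 0, 0]

model = train(training_samples, training_labels)  # done once, in the data center

# On the device: classify inputs the model has never seen before.
print(infer(model, (0.85, 0.75)))
print(infer(model, (0.15, 0.20)))
```

In a real system the training step would chew through millions of images on GPU or FPGA servers, and only the finished model would be shipped to the phone, camera or car that runs the equivalent of `infer`.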
In truth, there’s little to no new “learning” going on in these implementations. They’re focused almost entirely on recognizing the objects, situations or data points they are pre-programmed to look for, based on the rules or algorithms that have been loaded onto them for a particular application. Still, this is an enormously difficult task because of the need to run the multiple layers of a convolutional neural network in real time.
Qualcomm, for example, just announced that its 820 chip, known primarily as the compute engine inside many of today’s high-end smartphones, can be used for deep learning and neural network applications. The new ingredient required to make this work is the Snapdragon Neural Processing Engine, an SDK powered by the company’s Zeroth Machine Intelligence Platform. The combination can be used on the 820 to speed the performance of CNNs and deep learning on devices ranging from connected video cameras to cars and much more. The 820 incorporates a CPU, GPU and DSP, all of which could potentially be used to run deep learning algorithms for different applications.
In the case of autonomous cars, which are expected to be one of the key beneficiaries of deep learning and neural networks, NVIDIA’s liquid-cooled Drive PX2 platform can also accelerate neural network performance. Announced at this year’s CES, the Drive PX2 includes two next-generation SoCs (System on Chip: essentially a CPU, GPU and other computing elements all connected together on a single chip). It is specifically designed to monitor the camera, LIDAR and other sensor inputs from a car, then recognize objects or situations and react accordingly.
Future iterations of AI and deep learning accelerators will likely be able to bring some of the offline “rule creating” mechanisms onboard so that devices equipped with these components will be able to get smarter over time.
Of course, it’s also possible to update the algorithms on existing devices to achieve a similar result. Regardless of how the technology evolves, it’s going to be a critical element in the devices around us for some time to come, so it’s important to understand at least a little bit about how the magic works.
Source: techpinions.com