Guest Article: Ivan Corbett
Way back in 2007 the Wii surpassed everyone's expectations, shooting straight to number one in console sales. Its innovative use of motion controls and family-oriented approach rapidly gained popularity among a new wave of casual gamers. Suddenly everyone could play; no longer was the video game console solely the domain of the kids. This new and massive market greatly worried the competition at Microsoft and Sony: something had to be done.
Microsoft immediately started work on emulating the Wii: "Project Newton", an internal engineering project, was cranking out motion wands in no time, and Microsoft came very close to bringing these to market. But then the inevitable happened. The Wii's market expansion rate dropped, millions of consoles started to gather dust, and third-party developers cried foul at the Wii's comparatively low horsepower and imprecise motion controller.
It was then that a small Israeli company named PrimeSense pitched the idea of gesture-based interfaces to several companies. Apple even took them up on it, although strict rules against licensing the technology to other companies and Apple's legendary secrecy eventually killed the deal. Microsoft then swooped in and picked up the fledgling company with a better offer, and this was the day "Project Natal" started. It was a simple idea: create a new type of natural user interface, or "NUI", for computers, a way of interacting with them without the need for any controller at all, just your body movement, gestures and voice. In early 2008 a prototype was shown; costing an estimated $30,000 and being massive, it was difficult to see it going anywhere outside of the labs.
But a challenge was set to the Microsoft labs incubation team, named after the town of Natal in Brazil: miniaturise this technology and reduce its cost enough that it could be sold as an attachment for the Xbox 360 console. This was to be Microsoft's answer to the Wii, sufficiently different in approach to catch the attention of the market the Wii had created, and advanced enough technically that third-party developers would take interest. When Project Natal was announced to the public at E3 2009 it captured worldwide attention and set the target high for the now-global teams working on the technology. The work was divided into three parts: PrimeSense and the Microsoft Israel team would work on the 3D depth sensor technology, the most costly part of the package; Microsoft Seattle and their Stanford research team would work on the software to turn the depth map produced by the sensor into usable data for a game to interpret; and Microsoft Research in India would work on "Project Berlin", the voice command segment of the technology, based on existing voice command tech but using a new four-microphone approach to cancel background noise and provide acoustic source location (ASL), a technology whereby the sensor could tell where sound was coming from.
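To give a feel for how acoustic source location works, here is a minimal sketch in Python (not Microsoft's actual code): with two microphones a known distance apart, the tiny difference in when a sound arrives at each one reveals the angle it came from. The microphone spacing and delay values below are purely illustrative.

```python
import math

SPEED_OF_SOUND = 343.0  # metres per second, at room temperature


def arrival_angle(delay_s: float, mic_spacing_m: float) -> float:
    """Estimate a sound source's direction (degrees, 0 = straight ahead)
    from the time difference of arrival between two microphones.

    A sound from one side reaches the nearer microphone first; the extra
    path length to the farther microphone is delay * speed of sound.
    """
    path_difference = SPEED_OF_SOUND * delay_s
    # Clamp the ratio so measurement noise can't push asin out of range.
    ratio = max(-1.0, min(1.0, path_difference / mic_spacing_m))
    return math.degrees(math.asin(ratio))


# A source straight ahead arrives at both microphones simultaneously.
print(arrival_angle(0.0, 0.1))  # 0.0 degrees
# A delay of ~0.146 ms across a 10 cm spacing puts the source at
# roughly 30 degrees off-centre.
print(round(arrival_angle(0.000146, 0.1)))  # 30
```

With four microphones rather than two, the same idea yields multiple pairwise delays, which is what lets a sensor pin down direction rather than just a single angle.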
By early 2010 it had become a reality and draft sensors were starting to appear, combining a near-infrared projector; a CMOS camera tuned to see the resulting projection (the distortion of this projection from the source image allows the sensor to judge accurately how far away objects are); a colour camera used for video chat and facial recognition; and four microphones spaced precisely for ASL (this led to the sensor's elongated shape), all mounted on a motorised tilt base so the sensor could adjust its field of view automatically. Only the software was left to complete.
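The depth-from-distortion trick described above is classic structured-light triangulation: the farther away an object is, the less the projected pattern appears to shift between the projector and the camera. A minimal sketch of the underlying relationship, using illustrative numbers rather than the Kinect's real calibration values:

```python
def depth_from_disparity(focal_px: float, baseline_m: float,
                         disparity_px: float) -> float:
    """Structured-light triangulation: an object's depth is inversely
    proportional to how far its projected pattern appears to shift
    (the disparity) between the projector and the camera."""
    return (focal_px * baseline_m) / disparity_px


# Illustrative numbers only: a 580-pixel focal length and a 7.5 cm
# projector-to-camera baseline. A 10-pixel pattern shift would put the
# object at about 4.35 metres; doubling the shift halves the distance.
print(depth_from_disparity(580.0, 0.075, 10.0))
print(depth_from_disparity(580.0, 0.075, 20.0))
```

Running this relationship over every pixel of the infrared camera's image is what produces the depth map the software teams had to interpret.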
The sensor was producing an excellent depth map of the room, but interpreting this data was a different story. First the player had to be separated from the background, so the software looks for human-shaped objects and cuts out everything behind their depth plane. It then uses a pre-computed library of tens of thousands of possible human body positions to work out where each of your limbs is (or to guess, where it can't see them, based on a basic understanding of how humans can move and bend) and assigns a point to each joint (22 in total per player). These points form a skeleton of reference points that games can interpret, all in a fraction of a second.

Meanwhile the voice technology had matured very well. The team had created an excellent system for filtering out both the sound the console itself was producing, whether from a game or a movie, and the ambient sounds of the room, responding to voice commands rapidly and with few mistakes, albeit a little sensitive to accents and requiring recalibration for each new room it was used in (the sensor plays a series of test sounds and measures the time they take to bounce off objects in the room so it can filter out the echo). It was quite the technical achievement, and one Microsoft was rightly proud of, touting the newly renamed Kinect sensor as nothing short of revolutionary at E3 2010.
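As a rough illustration of that first step, cutting away everything behind the player's depth plane, here is a toy sketch in Python. The depth values, tolerance, and data layout are invented for illustration and bear no relation to Microsoft's actual implementation:

```python
def segment_player(depth_map, player_depth_mm, tolerance_mm=400):
    """Keep only pixels near the player's depth plane; everything
    farther away (the background) is cut out (set to None)."""
    cutoff = player_depth_mm + tolerance_mm
    return [[d if d is not None and d <= cutoff else None for d in row]
            for row in depth_map]


# A hypothetical 3x4 depth map in millimetres (None = no IR reading),
# with a player standing around 2 m away and a wall at around 3.5 m.
frame = [[2000, 2100, 3500, 3600],
         [2050, 1950, 3550, None],
         [2000, 2020, 3400, 3450]]

fg = segment_player(frame, player_depth_mm=2000)
print(sum(d is not None for row in fg for d in row))  # 6 player pixels survive
```

In the real pipeline the surviving silhouette would then be matched against the pre-computed library of body poses to place the joint points.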
Of course, while Microsoft may have thought of Kinect as a sure-fire hit, the rest of the world, and indeed their game development partners, still needed convincing. Sure, the technology was great on paper and technically advanced. But was it any good in the real world? Was it fun? And, most importantly, would it sell?