Searching for anything over broadband web and advanced TV systems is going to have to evolve. We are still in the dark ages when it comes to getting relevant content; being forced to type in ‘keywords’ seems, to me at least, such a crude, clunky way to get to stuff. I sit at computer and TV screens and would ideally just like to ‘think’ of (visualize, imagine, gesture) something I would like and, voilà, up pops a very select group of items I can then refine. But the telepathic interface is probably still a few years off. For now we move slowly forward, in the consumer-facing domain at least, one step at a time. The reason for this post is to highlight two areas of development that are definitely part of the next step: the first is the interface, the second the use of metadata. I will be looking at some baby steps Yahoo! is taking with its Mindset work later, but first there is a conference in Genoa in a couple of weeks looking at a key new area, ‘Enactive Interfaces’. Love the term. Here is their scope:
The scope of the conference is creating a truly multidisciplinary research community on the new generation of human-computer interfaces called Enactive Interfaces.
Enactive Interfaces are related to a fundamental “interaction” concept which is not exploited by most of the existing human-computer interface technologies.
In the symbolic way of learning, knowledge is stored as words, mathematical symbols or other symbol systems, while in the iconic stage knowledge is stored in the form of visual images, such as diagrams and illustrations.
On the other hand, ENACTIVE knowledge is a form of knowledge based on action for apprehension tasks. Enactive knowledge is not simply multisensory mediated knowledge, but knowledge stored in the form of motor responses and acquired by the act of “doing”.
A typical example of enactive knowledge is constituted by the competence required by tasks such as driving a car, dancing, playing a musical instrument, modelling objects from clay, performing sports.
This type of knowledge transmission can be considered the most direct, in the sense that it is natural and intuitive, since it is based on the experience and on the perceptual responses to motor acts.
If you look at their programme pages there is a vast range of alternative ways to interact with our wonderful portals of personalized media: everything from facial expressions controlling what we get, through lots of ‘hand movement in the air’ gesture futures, to a pot-pourri of audio enactions. I also like the emphasis placed on resonant interfaces, ones that adapt themselves to you. Personalized media of course includes the way we get to our most relevant content. I spent quite a few years looking at personalized interfaces when I was running various BBC cross-media navigator projects. This became too big to handle at the time, and it was at least 8 years ahead of its time, but I managed to generate quite a few demonstrators of a web, TV, mobile, PVR future, all exhibiting cross-functionality and personalization – it even included a BBC personality avatar that reflected your personality as you browsed through your content. I might post full details of those projects soon if you’re nice to me! Anyway, back to enaction. This is the holy grail of interface design: avoiding as far as possible picking up a keyboard and typing in text, and instead using our bodies and gestures to signal intention – which dovetails nicely into the second part of this post, focusing on Yahoo!’s Mindset project.
I remember seeing a great demo at IBC a few years ago which showed how you could use a range of classification data and MPEG-7 user-profile data to narrow down and personalize selection. It used a bunch of sliders with emotional options, so you could slide, for example, between exciting and laid back, or between silly and serious. It only worked as well as the ‘richness’ of the metadata attached to the content, of course, but it did pave the way for further thinking in this area. Yahoo! Research Labs have picked this up and are now running a beta of a similar approach applied to their search algorithms.
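To make the slider idea concrete, here is a minimal sketch in Python of just the matching step – nothing like the actual IBC demo or real MPEG-7 tooling, and the titles, axes and scores are all invented – showing how slider positions could become a preference profile that pulls the closest-tagged items to the top:

```python
# Hypothetical sketch of slider-driven selection: item titles, axes and
# scores are invented for illustration, not real MPEG-7 profile data.

CATALOGUE = [
    {"title": "Late Night Jazz Hour",   "exciting": 0.2, "serious": 0.6},
    {"title": "Extreme Sports Weekly",  "exciting": 0.9, "serious": 0.3},
    {"title": "Election Night Special", "exciting": 0.7, "serious": 0.9},
    {"title": "Celebrity Bloopers",     "exciting": 0.6, "serious": 0.1},
]

def rank_by_sliders(items, sliders):
    """Rank items by closeness to the user's slider positions.

    `sliders` maps an axis (e.g. 'exciting', where 0.0 means 'laid back'
    and 1.0 means 'exciting') to the chosen position; the closest
    metadata match comes first.
    """
    def distance(item):
        return sum((item.get(axis, 0.5) - value) ** 2
                   for axis, value in sliders.items())
    return sorted(items, key=distance)

# The viewer slides towards 'laid back' and 'silly'.
preferences = {"exciting": 0.2, "serious": 0.1}
for item in rank_by_sliders(CATALOGUE, preferences):
    print(item["title"])
```

In a real system the per-item scores would come from the classification metadata attached to the content rather than being hard-coded, which is exactly why the ‘richness’ of that metadata matters so much.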
They call it intent-driven search, and if you play with the beta version you can start to see the potential behind it. At the moment they have one slider that goes between shopping and researching. OK, strange bedfellows, but it does highlight that ‘passive’ web usage may be dominated by commerce on one hand and education on the other. I have put in a range of search terms and some are more successful than others – shame they have a sponsored block at the top of the results, which is somewhat confusing. Nevertheless I am sure they are much further down the road than this beta suggests. Here is a little more from their FAQ:
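Yahoo! haven’t published how Mindset works beyond the FAQ below, but the visible behaviour is a single control blending two signals. A rough sketch of that blending – my own invented function and field names, assuming each result already carries a machine-learned ‘commercial’ score of the kind the FAQ mentions:

```python
# Rough sketch of a Mindset-style slider: not Yahoo!'s actual algorithm,
# just the obvious way a single shopping<->researching control could
# re-weight results that already carry a learned 'commercial' score.

def rerank(results, slider):
    """Re-order results for a slider value in [-1.0, 1.0].

    -1.0 = 'researching' (push commercial results down),
    +1.0 = 'shopping'    (push commercial results up),
     0.0 = leave the base relevance ranking alone.
    """
    def blended(result):
        return result["relevance"] + slider * result["commercial"]
    return sorted(results, key=blended, reverse=True)

results = [
    {"url": "example-shop.com/dvd-recorders",      "relevance": 0.8, "commercial": 0.9},
    {"url": "example.edu/how-dvd-recording-works", "relevance": 0.7, "commercial": 0.1},
]
print([r["url"] for r in rerank(results, slider=-1.0)])  # the 'researching' end
```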
We’d like to keep improving and developing Mindset, and so we really value your feedback. Our primary goal here isn’t to impress you with the results, it’s to give you a look at the underlying technology. We believe machine learning technology is powerful and has many uses. This demo was just one example – using the commercial/non-commercial classification and the search metaphor – of how this technology could be used. After you’ve read more about machine learning in the next section, perhaps you’ll think of other ways it could be used. (snip)
The field of machine learning studies and develops computer algorithms that improve automatically through experience. Machine learning can tackle the problem of automatically learning and replicating a human activity. For example, think of a baby watching what the adults do and mimicking them. Loosely speaking, machine learning technology is like that baby. Machine learning starts with a “seed set” of human-generated data. This seed set is divided into a training set and a test set. Using the training set, the machine “learns” what the human was doing in creating that set, and then it tries to apply that learning to the test set. If it “fails the test”, it goes back to the training set and “relearns”. This iteration continues until the learning is complete. For further reading on machine learning, check out www-2.cs.cmu.edu/~tom/mlbook.html and jmlr.csail.mit.edu/.
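The loop the FAQ describes is easy to picture in code. Here is a toy Python illustration – the ‘model’ is just a threshold on one invented feature (a page’s commercial-word frequency) and the data is made up, so it bears no relation to Yahoo!’s real classifiers, but the divide, train, test and relearn shape is the same:

```python
# Toy illustration of the train/test loop described in the FAQ above.
# The 'model' is a single threshold on one invented feature; the seed
# data is fabricated purely for illustration.

seed_set = [  # (commercial_word_frequency, human label)
    (0.82, "commercial"), (0.75, "commercial"),
    (0.91, "commercial"), (0.66, "commercial"),
    (0.12, "non-commercial"), (0.05, "non-commercial"),
    (0.30, "non-commercial"), (0.21, "non-commercial"),
]

# The seed set is divided into a training set and a test set.
training_set, test_set = seed_set[::2], seed_set[1::2]

def predict(threshold, feature):
    return "commercial" if feature >= threshold else "non-commercial"

def accuracy(threshold, examples):
    hits = sum(1 for f, label in examples if predict(threshold, f) == label)
    return hits / len(examples)

# 'Learn' on the training set, check against the test set, and keep
# refining until the test is passed (the FAQ's relearn loop).
candidates = [i / 100 for i in range(1, 100)]
for _ in range(10):
    threshold = max(candidates, key=lambda t: accuracy(t, training_set))
    if accuracy(threshold, test_set) == 1.0:
        break  # learning is complete
    candidates = [t for t in candidates if t != threshold]  # relearn differently

print("learned threshold:", threshold,
      "test accuracy:", accuracy(threshold, test_set))
```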
It is a start, and hats off to Yahoo! Research for bringing this forward-looking area into the popular public domain. And so to building bridges. Looking at enactive interfaces and combining them with intent-driven search, we have an interesting ‘mash’. Given richly tagged content running in a machine-learning-based engine, imagine being able to gesture with your face or hands to control the search! Subtle hand movements suggest you want more ‘breathtaking’ content; a smile suggests you want something more ‘humorous’. There are several standards, including the one I keep harping on about, TV-Anytime, that have defined rich classifications we can attach to content. Here is the ‘atmosphere’ element, a small part of a much bigger classification matrix defined when I was at TV-Anytime (a rough sketch of the gesture-plus-metadata idea follows the list below).
TV-Anytime classification dictionary – atmosphere element
Alternative, Analytical, Astonishing, Ambitious, Black, Breathtaking, Chilling, Coarse, Compelling, Confrontational, Contemporary, Crazy, Cutting edge, Eclectic, Edifying, Exciting, Fast-moving, Frantic, Fun, Gripping, Gritty, Gutsy, Happy, Heart-rending, Heart-warming, Hot, Humorous, Innovative, Insightful, Inspirational, Intriguing, Irreverent, Laid back, Outrageous, Peaceful, Powerful, Practical, Rollercoaster, Romantic, Rousing, Sad, Satirical, Serious, Sexy, Shocking, Silly, Spooky, Stunning, Stylish, Terrifying, Thriller, Violent, Wacky
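Here is what that gesture-plus-metadata combination might look like as a minimal Python sketch. The gesture labels, segment titles and the mapping itself are invented for illustration – a real system would sit behind a gesture or expression recognizer – but the atmosphere terms are taken from the dictionary above:

```python
# Hypothetical glue between a gesture recognizer and atmosphere metadata.
# Gesture labels, segments and the mapping are invented; the atmosphere
# terms come from the TV-Anytime dictionary listed above.

GESTURE_TO_ATMOSPHERE = {
    "smile": "Humorous",
    "wide_eyes": "Breathtaking",
    "slow_hand_sweep": "Laid back",
    "fast_hand_wave": "Fast-moving",
}

SEGMENTS = [
    {"title": "Stand-up highlights",    "atmosphere": {"Humorous", "Irreverent"}},
    {"title": "Aerial glacier flyover", "atmosphere": {"Breathtaking", "Peaceful"}},
    {"title": "Car chase montage",      "atmosphere": {"Fast-moving", "Exciting"}},
]

def refine_by_gestures(segments, observed_gestures):
    """Keep and rank segments whose atmosphere tags match the gestures seen."""
    wanted = {GESTURE_TO_ATMOSPHERE[g] for g in observed_gestures
              if g in GESTURE_TO_ATMOSPHERE}
    scored = [(len(seg["atmosphere"] & wanted), seg) for seg in segments]
    return [seg for score, seg in sorted(scored, key=lambda s: -s[0]) if score > 0]

# A smile from the viewer nudges the results towards 'Humorous' segments.
for seg in refine_by_gestures(SEGMENTS, ["smile"]):
    print(seg["title"])
```

The interesting design question is how quickly such a mapping should react – a fleeting smile probably should not flip the whole result set, so in practice you would want to accumulate gesture evidence over time before re-ranking.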
I do like the idea of generating personalized search, and access even to parts of content (segments), using a compelling mixture of gesture and ‘intention/atmosphere’ metadata. Let’s hope someone out there is working in this domain – of course someone is. Perhaps this post may open up a few ideas and someone can run away and create some wonderful IP for themselves. Then again, why am I posting this? Where’s the delete button… oh, too late 😉
Posted by Gary Hayes ©2005