Yahoo's Long Term View on Search, Personalisation...
She was getting some sun; I was scraping a fascinating interview with Yahoo's new Chief of Research. She didn't exactly care when I started reading snippets from the article (sunglasses back on/eyes closed/iPod volume up): "Yahoo accounts for 12 to 15 percent of all the Web activity worldwide (Yahoo’s numbers). "We have an amazing outreach," Raghavan said. "Ten terabytes of data, which for a scientist is pretty appealing."
"We have two views of better search. Most people are not interested in search—they want to get things done. The future has to be more friendly to people getting tasks done. You don’t want to spend two weeks of evenings sitting at a keyboard and piecing together a vacation plan. You want a system to go out and find the answers, based on future technology that goes beyond crawling and indexing pages."
"I hesitate to use the buzzword of 'Semantic Web'–but it is about entity extraction, XML queries, unstructured queries, semantic ambiguity. We have to build a view of the world. When you issue a query, it has a richer view than a text index. We’ll start to see manifestations of this in five years."
"We want to inspire the audience to give more data and more. If someone creates a snippet of music and others remix it and it finally becomes a hit, how do you divvy up the proceeds amongst all the constituents? That [economic incentive network] has to be figured out. There is a lot of microeconomics that is not fully understood, and it’s one of the areas we want to understand. There will be a Nobel Prize in economics awarded for this stuff, and I wouldn’t be unhappy if it came from our group."
"We have a plethora of opportunities looking at different social networks, such as blogs, instant messaging, My Web, Yahoo 360, and other services, across Yahoo properties," Raghavan said. Yahoo's social search engine My Web 2.0, for example, allows Yahoo users to archive, tag and annotate search results and share them with other people using the service. Users can also search their contacts' My Web and browse content that others on Yahoo's network have shared.
But determining what data from the pools of Yahoo services and billions of inputs is useful to people and will create a breakthrough in the user experience is one of his team's challenges. "It’s a classic problem in statistical machine learning—you might have 200 data points, but how do you zero in on the three that make a difference?"
"Personalizing is a loaded word, and it sometimes gets trivialized. It’s not about customizing the colors on the MyYahoo page," Raghavan said. "It’s more of a social phenomenon that takes into account what others are doing, especially people like yourself. Content, context and community coming together is a long-standing dream in our business—we are all going after it. But, the catch is when the user is not only a consumer but also creator of content. It leads to interesting possibilities in tandem with data mining and the user experience. You have to decide what content to show that users will find valuable, and not irritate users with too much content."
Raghavan has also spent time looking at how to mine blogs for predicting the movement of products and developing new user experiences. "We are looking at sources of information–text, photos, podcasts–whatever we can mine from the back end. Then we look at what users want, and bring the two together to create an application from all the chatter going on," Raghavan said. "We can dream up cool experiences, but they have to be grounded in product reality. As we develop technology, markets start to react, so mining begets a reaction from the market and begets more mining, so we are constantly working on more scenarios."
With 345 million unique users per month across 25 countries and in 13 languages, Yahoo, as well as its competitors–especially Google–has some experience in planetary-scale computing.
While the progress over the last ten years of the Web has been significant, we are still in the Stone Age of search, social networks, incentive models and personalization. With the competitive juices flowing in research labs, and wide open commercial opportunities, the next ten years will be more about answers than links, but not without some serious flailing…"


