Results from Wolfram Alpha: All the Questions We Ever Wanted to Ask About Software as a Service

May 6, 2009

Software as a Service, known in earlier decades as Application Service Providers, upends the relationship between computer users and software. Although it works deceptively like a stand-alone software installation, SaaS really changes everything. And as technologists have watched its tremendous growth over the years, they’ve asked lots of questions about SaaS. Most of the questions fall along the lines of data integrity and reliability, but some concern privacy and innovation as well.

As inadvisable as I know it always is to suggest that any technology has reached a “limit” or “ultimate” point, I’m seriously tempted to say that Wolfram Alpha takes the SaaS model to its extreme. So Wolfram Alpha’s chances at scaling the heights of fame should force us to stop for a moment and run our own calculations concerning the value to us of data integrity, reliability, privacy, and innovation.

Wolfram Alpha: an introduction

Alpha is a kind of “instant research associate” that can answer questions posed by visitors on a huge number of topics. It combines high-quality data sources with intelligently chosen calculations to return information tailored to each question. Alpha’s inventor, the world-renowned Stephen Wolfram, has been giving semi-private demos of Alpha over the past few months, one of which I caught this week in Cambridge, Massachusetts. For half an hour—enough time to bore me a bit—he pummeled the Alpha site with questions ranging from population to weather to nutrition to genetics.

Instead of covering the functionality of the service in detail, I’ll hone in on the characteristics that highlight the issues in this article.

Alpha’s behavior certainly comes across at first as magic. One can trigger a calculation through an explicitly mathematical syntax, such as asking for “Europe nation population/area,” or pose the question in more natural language, such as “What is the relative population density in European nations?” In either case, Alpha retrieves the populations and areas of the specified nations and performs the calculation. But like an overly eager store clerk or a garrulous acquaintance starved for social contact, Alpha feels it has to go further than you ask, extending the screen with cascades of related facts. The overall effect is of bottomless insight.

The magic fades rapidly as you probe areas where the Wolfram Research team has accumulated fewer databases. Soon your results are reduced to a few bare facts, and most pitiably to “splat screens” that suggest alternative questions.

(Pop culture prediction: a new pastime of the socially challenged will be finding queries that cause Alpha to return amusing splat screens.)

Alpha is good, I’ll acknowledge that right off the bat. Any bored computer addict can load up statistics till his hard disk grinds to a halt, but Alpha serves up statistics with a panache that suggests excellent linguistic analysis and the kind of fancy analysis long associated with Dr. Wolfram’s most famous product, Mathematica.

How does Alpha work?

Alpha relies first of all on reliable data sources. Wolfram Research staff spend a good deal of time checking for accuracy and consistency; Dr. Wolfram even said they rejected many proprietary sources that looked promising because they turned out to have lousy quality.

But Alpha relies even more on knowing what calculations to apply. Its storehouse has to pack up not only data, or even algorithms, but rules about which algorithms to apply to which data. You’d be ill-served by crunching ordinal data with a measurement designed for interval data. I assume that Alpha’s designers have taken care to avoid such mistakes.

Will Alpha handle being popular?

Performance is not a topic of this article, but the unusual service offered by Alpha lends itself to conjecture. Its searches and reductions make ideal candidates for both data parallelism and task parallelism, so Alpha naturally runs on a massively distributed grid. Even so, I wonder whether the impressive response time it offers when Dr. Wolfram is using it alone will hold up if millions of people run queries on Alpha with the frequency they visit Wikipedia or Google—a popularity to which Dr. Wolfram clearly aspires.

One key task in maintaining good response time is keeping each query short. As long as you can return results quickly, you can process lots of queries effortlessly. If a processor gets stuck on one query, though, others tend to queue up and eventually you start timing out on visitors. Alpha looks like the kind of service that could require a lot of processing time, especially if visitors start entering long strings and request complex operations. (How much of each protein will they have to search to find a requested sequence?)

No doubt Wolfram Research has hired top performance experts. Their business model seems ready to handle necessary hardware investments. But we won’t know whether they can handle large loads until they become a household word. Industry observers and media reports seem to be taking on that job for them.

The questions we need to ask about Software as a Service

Let’s look now at each of the key issues I listed at the beginning of the article.

Data integrity

Researchers sweat a good deal over obtaining the best possible data. Most of us downgrade our standards for data validity when we go online for casual searches, just as we downgrade our standards for taste and texture when we get lunch from a hot dog stand. But when we depend on SaaS, we need tough standards.

The other side of the coin in data integrity is trusting a service not to lose or corrupt any data we upload, as with Google Docs or Amazon S3. Wolfram plans to offer a Alpha for-pay service that handles client data, so it may face the usual questions of trust there, but these won’t be the subject of this section; I’ll touch on them later in the article.

Wolfram Alpha aspires to the status of a serious research tool, so we have to return to the way researchers work. If they don’t collect their data directly, they query and compare sources. And Dr. Wolfram assures us his staff has done the same for Alpha. But behind every serious measurement lies the question how far one can trust the data, and we have to ask how far we trust Alpha’s data.

During the Cambridge demo, Dr. Wolfram pointed out that sources are listed in footnotes to the results. Interestingly, I heard from someone who attended an earlier demo that Dr. Wolfram had touted Alpha as a trustworthy primary source and said nothing about footnotes. Perhaps they were a recent addition. In any case, they’re a crucial backstop for people worried about such matters as the age, margin of error, and consistency of their data.

Validity is still at risk whenever you combine data from different sources, which Alpha does promiscuously. Jonathan Zittrain, the Harvard Law School professor who moderated the Cambridge presentation, raised the example of comparing daily temperatures in two different places, only to find that one place collected data in the morning and the other at noon. This is only one link in the chain of our next issue, reliability.

Reliability

Nearly every real-world calculation, like nearly every data source, has a margin of error. Alpha provides, on a click, the code to each calculation it performs. But Dr. Wolfram pointed out that questions of validity quickly become intractable when running multiple calculations involving lots of different data. Is Wolfram Alpha better or worse in this regard than our current ways of reaching decisions?

For most research, frankly, I’d trust Alpha over my own calculations. For instance, Wolfram Research probably consulted with top nutritionists concerning cholesterol levels, so if I’m trying to figure out the meaning of various tests that came back from my blood sample, I’m more likely to get good answers from Alpha than from whatever hare-brained calculations I might do on my own. But a trained nutritionist has more options.

I predict, then, that Wolfram Alpha will become a trusted stand-by in three kinds of situations:

Casual searches done out of curiosity, as people do on Google and Wikipedia.
Speculative, back-of-the-envelope calculations. For instance, when I decide whether to propose a new book on JavaScript at O’Reilly Media, I do some crude searches to determine whether its use is increasing and how it compares to other programming technologies. Everybody knows that the results will be very rough and provisional, so a source like Alpha will do fine.
Preliminary tests for hunches and alternative hypotheses. For instance, when a sociologist is considering the factors that could have an impact on observed phenomena—changes in immigration, changes in income, changes in educational level, and a few others—she may query Alpha to find which offer the most promise and research those first.

So Alpha could change the way science is done—at early stages. But no competent researcher will put Alpha results in a paper, any more than he’d quote Wikipedia as an authoritative source.

But the most serious limitations of Alpha don’t stem from the questions of data integrity and reliability, but from the questions we’ll get to later of innovation.

Privacy

Storing data in a SaaS service creates privacy risks, as the Electronic Privacy Information Center points out in its page on cloud computing. About six weeks ago they actually filed a complaint against Google with the Federal Trade Commission, which shows how seriously privacy experts take this matter.

Yes, well-known data stores have bugs and present an appealing target to crackers. But my personal opinion is that, given all the cross-site scripting web hacks and zero-day exploits in software, it’s a toss-up whether your data is safer in the cloud or tucked away inside your desktop computer.

When privacy crosses the minds of cloud computer users, they worry more about what Google or other services will do themselves with the data they collect. This has even become a barrier to governments’ efforts to reach out and interact more closely with the public online. Because governments take the easy way out and use popular commercial services such as Twitter and YouTube instead of developing new, non-profit services, they suddenly find terms of service that run up against their own regulations, along with policies that sacrifice privacy to the goal of maximizing the services’ advertising reach. (The New York Times recently highlighted some sample problems.)

Nor is government always an innocent participant. Another consideration to give us pause is the lower threshold for protection against government search and seizure when your data is on someone else’s system.

But the privacy concern most relevant to Wolfram Alpha is aggregation of personal information from multiple sources. Dr. Wolfram talked a bit in his Cambridge demo about the potential to abuse Alpha’s powerful tools for correlation and data extraction to find fun facts on individuals. He explicitly ruled out plans to collect data on individuals outside the limelight, precisely to avoid raising such privacy risks.

But private data could still end up in Alpha, even if there’s no concerted effort to put it there. And even if Wolfram Research doesn’t perform data mining on personal data, other people may do so.

The risks of data collection touch on every networking activity today, including SaaS. Each of us chooses his or her place on the spectrum between the McNealy doctrine (“Get over it”) and the people who buy goods with cash and use TOR for all their online communications. Things will probably get worse unless we start making decisions as a social collective.

Innovation

I mentioned at the beginning of the article that the most potent ingredient in Alpha’s secret sauce is the rules about which algorithms to apply to which data. And I pointed out that for conventional inquiries such as the meaning of cholesterol levels, Alpha will probably serve us well.

But great research depends on unconventional inquiries. Ordinary calculations produce ordinary results, which can be valuable to confirm that an organization’s activities are safe and responsible. But paradigm shifts are more likely to result from a researcher thinking up a new angle calling for a new calculation.

The choice to use an SaaS site for calculations depends on one’s knowledge and resources, on how heavily people depend on the results, and on the depth to which you want to explore a data set. Thousands of programmers are creating mash-ups on the Web. Media and government sites are exposing data through APIs to encourage even more mash-ups. Wolfram Alpha may be a boon to amateurs who want to process all this information, and could thereby contribute to the public good. (Dr. Wolfram has also promised a Wolfram Alpha API.) We should welcome the chances for more innovation at this level—but such innovation in a box has limitations.

SaaS has long been criticized by free software advocates on grounds related to innovation. For someone who doesn’t like functionality locked up (as free software proponents view it) behind proprietary barriers, the worst nightmare is SaaS. Free software can be extended at the whim of any programmer; proprietary software is more rigid because it can’t be enhanced unless it offers hooks and APIs; SaaS is theoretically even worse because the vendor can actually take away functionality. You may depend on some feature that the vendor decides is not of interest to their customers, or too hard to support, or detrimental to their business plans, and it could just vanish one morning.

Innovation—or generativity, as Professor Zittrain likes to call it—lies at the foundation of networking. When SaaS vendors provide APIs onto their services, they foster innovation. But there are always features you can’t get to (perhaps for good reason, because you could undermine their service) and that makes them less generative than free or open source software.

Data hidden behind a service is also less free than shared data. Dr. Wolfram himself expressed hopes in his talk for the release of more high-quality data.

Software as a Service and the three stillborn revolutions

Wolfram Alpha may be, as so many enthusiasts say, a game-changer. But free software is always the master game-changer. The free software revolution progressed further than most people expected, and as a game-changer it is still in just the ante-up stage. If Android or another free software system turns every mobile device into an open system…if underdeveloped nations leap forward by creating free software systems for populations not served by current software products…if governments provision their infrastructure with free software and open formats, we’ll really see what it can do.

SaaS vendors are happy to use free software as the underpinnings to their services, and many even contribute back enhancements, but insofar as people depend on SaaS, the free software revolution will be stillborn.

A second revolution waiting for its day in the limelight is the peer-to-peer movement. Early in this decade, its promises included:

Letting people keep control of their data, while combining it in ever-creative ways
Exploiting the compute power at the endpoints to cut down on central server workload (albeit at the cost of increasing network traffic)
Generally thwarting all efforts to hold back and tie down use of the Internet

The difficulties of implementing true peer-to-peer systems quickly emerged, as I pointed out in two articles (From P2P to Web Services: Addressing and Coordination and From P2P to Web Services: Trust) and the movement went into the background as its impulse to contribute and participate got absorbed by SaaS services, the movement eventually dubbed Web 2.0. People are, indeed sharing compute power and data, but they use a web site as a middleman.

A third stillborn revolution lurks in the shadows of SaaS, this one without an easy and familiar moniker like open source or peer-to-peer. This revolution can be seen when people work together online, reach out to diverse populations for fresh viewpoints, and discover where the surpluses of one population can plug into the needs of another. One might call it the tech-splicing revolution, and like the others it builds on both human urges and progress in electronics.

For instance, UC Berkeley professor Tapan S. Parikh reported in a recent Communications of the ACM (January 2009, pp. 54-63) how remote rural farmers in Mexico and Guatemala could follow rigorous quality standards for organic agriculture, and thus become competitive with more technologically advanced areas, through the integration of mobile phones and simple Web forms into an existing certification process.

Tech-splicing is not driven merely by a decrease in the costs of computer technologies, or by their wider dissemination. Nor is it just technology transfer, or the local design of appropriate technology. At its most fertile, the movement is a collaboration across geographic and cultural lines that finds solutions meeting the needs of both sides, blending technologies from each side in creative ways. Tech-splicing is really the game-changer now. SaaS finds a role in this process, but free software and peer-to-peer could trigger explosive progress.

SaaS’s use will continue to grow for the foreseeable future. Its advantages just can’t be ignored:

You’d be crazy to hold on to responsibility for backing up your own data when you can just leave it on an Internet-accessible location all the time. (The exception is when you’re a large organization with so many petabytes that third-party storage is unfeasible—and even then, some services claim to handle it efficiently.)
And who wants to be responsible for installing all the critical updates to his software, including security fixes? Let a professional handle it.
The increasing efficiencies provided by virtualization on the server side, and by shrinking notebooks and mobile devices on the client side, make SaaS more attractive to all parties.

But we can look toward the three stillborn revolutions as beneficial models and as goals for research and experimentation. I’m happy the world will have Wolfram Alpha, and I’ll probably visit it from time to time. Still, the future of the Internet lies in different directions.

We can better determine the role of SaaS, as well as the three revolutions, by continuing to ask questions. How much can we rely on the data and calculations returned by a web site? Can we share data while guarding our privacy through legal or technological limitations? Some standards in these areas may prove valuable. And more open access to data will lead to more and better applications for everyone.

You can reply to this article by signing up for an account on the O’Reilly Media blog site and adding a comment to my associated blog.

May 9, 2009, afternoon: The Aspen Institute has published a book titled Identity in the Age of Cloud Computing: The next-generation Internet’s impact on business, governance and social interaction, available for purchase or free download, that covers many interesting issues related to this article.

This work is licensed under a Creative Commons Attribution 4.0 International License.

Andy Oram is an editor at O’Reilly Media. This article represents his views only.

Author’s home page
Other articles in chronological order
Index to other articles