Search experiments, large and small

In my previous post, I described the components of your web search experience and the principles behind creating a great one. There are complex algorithms underlying simple features such as spelling correction and the two-line snippets that describe each search result. We figure out what works by running experiments - tiny tests shown to a small number of users that help us determine whether a feature helps or hurts.



Experimentation is a very powerful tool, and we use it widely to test potential changes to search. At any given time, we run anywhere from 50 to 200 experiments on Google sites all over the world. I'll start by describing experimental changes so small that you can barely tell the difference after staring at the page, and end with a couple of much more visually obvious experiments that we have run. There are a lot of people dedicated to detecting everything Google changes - and occasionally to imagining changes we did not make! - and they do latch on to a lot of our more prominent experiments. But the experiments with smaller changes are almost never noticed.
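To give a sense of how a change can be shown to only a small number of users, here is a minimal sketch of one common approach: deterministic bucketing by a stable identifier. This is purely an illustration of the general technique - the identifier, bucket counts, and fractions are assumptions, not a description of Google's actual experiment infrastructure.

```python
import hashlib

# Loose sketch (not Google's actual code): hash a stable identifier, such as a
# browser cookie, into a fixed number of buckets and reserve a few buckets per
# experiment, so only a small, consistent fraction of users sees a given change.

NUM_BUCKETS = 1000

def bucket_for(cookie_id: str) -> int:
    """Deterministically map an identifier to one of NUM_BUCKETS buckets."""
    digest = hashlib.sha256(cookie_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_BUCKETS

def in_experiment(cookie_id: str, experiment_buckets: range) -> bool:
    """True if this user falls into the buckets reserved for an experiment."""
    return bucket_for(cookie_id) in experiment_buckets

# Example: show the alternate page layout to roughly 1% of traffic
# (buckets 0-9 out of 1000).
variant = "experiment" if in_experiment("example-cookie", range(0, 10)) else "control"
print(variant)
```

Because the assignment is a pure function of the identifier, the same user keeps seeing the same version of the page for the life of the experiment.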



For example, can you tell the difference between the two pages below?



Choice 1:



Choice 2:



I'm pretty sure I would not be able to tell the difference if I were to see each of them on their own. But apparently you can! At least in the aggregate, there is a measurable difference with a change like this. In case you can't tell after staring, the white space around the first search result has changed, which makes the first result in Choice 2 slightly more visually prominent. This prominence conveys that, according to our ranking signals, the first result is a substantially better match than the next result. On the plus side, it helps you focus on the first result. But if you were looking for one of the other results, it can disrupt your scanning of the page. An experiment helps us determine which effect wins out, and whether the change would help you search faster.
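To make the "search faster" question concrete, an experiment like this ultimately comes down to comparing an aggregate metric between the two layouts. The sketch below is hypothetical; the metric (seconds until the first click) and the numbers are invented purely for illustration.

```python
from math import sqrt
from statistics import mean, stdev

# Illustrative only: compare a made-up metric (time to first click, in seconds)
# between the control layout and the experimental layout.
control_times = [4.2, 3.8, 5.1, 4.7, 3.9, 4.4]
experiment_times = [3.9, 3.6, 4.8, 4.1, 3.7, 4.0]

def summarize(samples):
    """Return the sample mean and its standard error."""
    return mean(samples), stdev(samples) / sqrt(len(samples))

control_mean, control_se = summarize(control_times)
experiment_mean, experiment_se = summarize(experiment_times)

print(f"control:    {control_mean:.2f}s ± {control_se:.2f}")
print(f"experiment: {experiment_mean:.2f}s ± {experiment_se:.2f}")
```

If the experimental layout reliably shaves time off without hurting other measures, that is evidence the extra white space is helping rather than distracting.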



Another change, almost as minimal visually, is between these two results:



The change here is in the thickness of the plus box next to the stock quote, and the difference in user interaction was so clear and marked that we could tell extremely quickly which one worked better. Now, coming to the conclusion that one is "better" is tricky, and there's many a possible slip on the way there. Does more interaction with the plus box mean that it is better? What if users then miss good results because they are distracted by the more prominent plus box? Keep watching Google to see which version won! If we've done our job right, almost without your noticing, things will work just that little bit better for you. The world will seem rosier. Birds will sing. Or maybe not - but at least you will have the best-designed plus box we can come up with :)



Okay, so not all of our experiments are insane eye tests. My main point in highlighting the above experiments is that we test almost everything, even things that you would think are so small that we could not possibly care (nor could they possibly matter). In fact, small changes do matter, and we do care.



Another class of experiments has to do with changes that are not purely visual, but rather involve changes to the underlying presentation algorithms. For instance, the algorithm responsible for the titles and snippets of result pages now highlights stems and some synonyms of the original query terms. For the query [hp printer drivers], we will also return results that include and highlight the word "driver". This sort of "stemming," as it's called, is generally, though not always, a good idea, because it helps you better identify results that match your query. Experiments of this sort help us verify (or, occasionally, overturn) our assumptions about changes in these algorithms.
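As a rough illustration of the highlighting idea (not the actual snippet-generation algorithm, and with a deliberately crude suffix-stripping stemmer that stands in for real stemming and synonym handling), the following sketch bolds snippet words that share a stem with the query terms.

```python
import re

def crude_stem(word: str) -> str:
    """Strip a few common English suffixes (purely illustrative)."""
    for suffix in ("ings", "ing", "ers", "er", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def highlight(snippet: str, query: str) -> str:
    """Bold any snippet word whose stem matches a query-term stem."""
    query_stems = {crude_stem(term.lower()) for term in query.split()}

    def mark(match):
        word = match.group(0)
        return f"<b>{word}</b>" if crude_stem(word.lower()) in query_stems else word

    return re.sub(r"[A-Za-z]+", mark, snippet)

print(highlight("Download the latest driver for your HP printer.", "hp printer drivers"))
# -> Download the latest <b>driver</b> for your <b>HP</b> <b>printer</b>.
```

Even with a toy stemmer, the example shows why the behavior needs testing: "drivers" and "driver" collapse to the same stem and get highlighted, which is usually helpful, but overly aggressive stemming could just as easily highlight words the user never meant.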



There is a further class of experiments - the kind that is hard to miss - which introduces fairly prominent features. Even with these larger features, the goal of experimentation always remains the same: are we adding something that really helps people, or is this just another distraction? Google does not really come with a user manual (actually, there are some nicely written help pages, but we're pretty sure most of you don't bother to read them!). So features need to stand on their own feet, without the help of a careful explanation. Part of the goal of an experiment is to understand just how a feature will be used, which might be quite different from what we initially intended.



Here's an example of an experiment that lets you comment on search results and move them around on the result page:


At this point, I can't say what we expect from this feature; we're just curious to see how it will be used.



This is just a small sample of the kinds of experiments we run as we test everything from the barely visible to the glaringly obvious. So the next time you use Google and it seems a little different - well, maybe it is. Just for you!



Posted by Ben Gomes, Distinguished Engineer