Microsoft Research News & Highlights
99,999 Innocent Bystanders Spammed
by Suzanne Ross

If only one out of every 100,000 people responds to spam and the spammers earn around $11 from that response, spamming is profitable. That's because the cost of spam is so low. The other 99,999 people who don't respond, who never respond, are innocent bystanders.

But it's not free.

Researchers at Microsoft found out about the costs of spam during their research into stopping spam. "Some people say spamming is free, and that you just have to raise the cost a tiny amount to make it unprofitable. That turns out not to be true. There are already a lot of costs involved," said Joshua Goodman, a researcher in the Machine Learning and Applied Statistics group at Microsoft Research.

"Spammers have to get bandwidth from somewhere to send their messages. Their ISPs might cut them off, so they may need to get a 'pink letter,' where they bribe the ISP not to cut them off as quickly. They need to buy lists of addresses. Most Web hosts don't like hosting spammers, so they have to pay extra for hosting services that will tolerate bad behavior. They also have the risk that they'll be caught and prosecuted. We've estimated the cost of sending a spam message today to be about .01 cents.

"That's still incredibly cheap. In terms of how much spammers earn, you can cut the math all kinds of ways. Say they sell something where they make a $11 profit per item. If their cost is only .01 cents, they can afford to send out millions of messages and still make a profit. You don't need a very high click through rate to be profitable," said Goodman.

"That also means that when we think about solutions to spam, we can't just raise the cost by a thousandth of a cent - spammers wouldn't notice that - we have to have solutions that raise the cost much more. We're making good progress in this regard."

Challenging Spammers
Goodman and his colleagues, including Cynthia Dwork, have come up with a plan they think will raise the cost for spammers.

Cynthia Dwork and Moni Naor first introduced the concept of requiring a spammer to solve a computational puzzle. Dwork further refined her work with Ted Wobber when she came to work at Microsoft Research.

Goodman and the Anti-Spam team refined her ideas, using safe lists and the email filter. If you receive a message from someone who isn't on your safe list, the email filter will evaluate it to see if it might be spam. If the message is suspicious, your machine will send a challenge to the sender, requiring the sender's machine to solve a computational puzzle.

You might wonder, 'what does that prove?' It isn't intended to prove that the sender is legitimate; it is intended to cost the spammer computational power, increasing the cost of doing business. An individual who isn't a spammer and sends out a normal amount of email will not even notice. But the extra drain on computational resources will impact a spammer who sends out millions of messages a day. The amount of computation would push spammers into buying new computers, so that the cost of spam would far exceed .01 cents.
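
To make the idea of a computational puzzle concrete, here is a minimal sketch. It uses a hash-based puzzle (find a number whose SHA-256 digest starts with a given count of zero bits), which is an assumption chosen for brevity and not necessarily the puzzle Dwork and Naor proposed. The function names and the difficulty value are hypothetical. What it illustrates is the asymmetry: solving takes many hash evaluations, while checking takes one.

```python
import hashlib
from itertools import count


def leading_zero_bits(digest: bytes) -> int:
    """Count the zero bits at the start of a digest."""
    as_int = int.from_bytes(digest, "big")
    return len(digest) * 8 - as_int.bit_length()


def puzzle_digest(challenge: bytes, nonce: int) -> bytes:
    """Hash the challenge together with a candidate answer."""
    return hashlib.sha256(challenge + b":" + str(nonce).encode()).digest()


def solve(challenge: bytes, difficulty_bits: int) -> int:
    """Sender side: brute-force search, about 2**difficulty_bits hashes on average."""
    for nonce in count():
        if leading_zero_bits(puzzle_digest(challenge, nonce)) >= difficulty_bits:
            return nonce


def verify(challenge: bytes, nonce: int, difficulty_bits: int) -> bool:
    """Recipient side: a single hash confirms the work was done."""
    return leading_zero_bits(puzzle_digest(challenge, nonce)) >= difficulty_bits


if __name__ == "__main__":
    # Hypothetical challenge string; a real one would be unique per message.
    challenge = b"message-123|recipient@example.com"
    nonce = solve(challenge, 12)
    print(nonce, verify(challenge, nonce, 12))
```

Each extra bit of difficulty roughly doubles the sender's work. One such puzzle is a brief delay for an ordinary sender, but the same delay repeated across millions of messages is the cost the researchers want spammers to bear.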

Moni Naor suggested using Human Interactive Proofs (HIPs) to combat spam. A HIP is an image that contains distorted letters and numbers. Humans can read the HIP, but a machine cannot. Since spammers rely on machines to send and receive messages without expensive human intervention, if spammers were challenged to read a HIP their machines would fail the test. The Anti-Spam team thought this was a good idea.

"You take Cynthia's idea of using computational puzzles, and you take the idea of using HIPs and you combine that with the machine learning filter. If something comes in that's suspicious, the computer sends a challenge. The challenge says solve this computational puzzle, or solve this Human Interactive Proof," said Goodman.

By giving the user the option to solve a HIP, the user won't have to download special software to solve the computational puzzle or update their legacy computer so that it can handle the computation. They can just read the HIP and send back the answer. Spammers would have to employ and pay a lot of human workers to read HIPs, which would make spamming unprofitable.
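
The flow described above (safe list first, then the filter, then a challenge that the sender can answer with either a puzzle solution or a HIP) can be sketched as a small decision function. Everything here is illustrative: the threshold, the `spam_score` input, and the names are assumptions, not details of the Microsoft system.

```python
from enum import Enum


class Action(Enum):
    DELIVER = "deliver"
    CHALLENGE = "challenge"


# Hypothetical cut-off; the article does not say how "suspicious" is decided.
SUSPICION_THRESHOLD = 0.5


def handle_incoming(sender: str, spam_score: float, safe_list: set[str]) -> Action:
    """Decide what to do with a message.

    spam_score stands in for the machine learning filter's output, assumed
    here to be a probability between 0 and 1. safe_list is assumed to hold
    lowercase addresses.
    """
    if sender.lower() in safe_list:
        return Action.DELIVER          # known correspondents are never challenged
    if spam_score < SUSPICION_THRESHOLD:
        return Action.DELIVER          # the filter sees nothing suspicious
    return Action.CHALLENGE            # sender must solve a puzzle or a HIP


def challenge_passed(puzzle_solved: bool, hip_solved: bool) -> bool:
    """Either proof is enough; the sender picks whichever is easier."""
    return puzzle_solved or hip_solved
```

Because only suspicious mail from unknown senders reaches the last branch, most legitimate senders would never see a challenge, while a spammer would face one on nearly every message.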

"It costs at least .2 cents to pay someone to solve a HIP, compared to the .01 cents spammers currently pay - that's part of the reason we studied spammer costs. This changes their profit model by a factor of 20.

"It's less annoying for the sender, but it's a lot harder for the spammer, because the spammer is getting challenged on all of his messages. The idea of using challenges, using the filter, using safe-listing and combining that with Cynthia's great idea of using puzzles, is something that we think makes the whole thing much more feasible," said Goodman.

But no single tactic is enough to foil the spammers completely. The researchers at Microsoft Research and on the Anti-Spam team have been gathering data over the last few years that will help them refine filters and find out how to attack spammers on multiple fronts.

By O/ur Kool s t u ff
One of the lessons that the researchers learned over the past year was that people answer almost illegible spam messages. Spammers obscure and misspell words to fool spam filters. Spammers might write the word 'loan' as 'l/oan' for instance.

At first the researchers weren't worried. They thought that if spammers messed up the words, people would know that the message was spam, and clearly from someone they couldn't trust. The researchers didn't think people would respond, so they saw no need to allow for it in the filter. They found they were wrong. People still responded.

Drugs, sex products, or fake diplomas garner high interest from spam respondents. It seems that if you're interested in getting a fake diploma you don't care if the person trying to sell it isn't erudite. Geoff Hulten and Goodman worked together and found an easy way to adjust the filter to automatically identify obscured words as well as the obvious 'bad' ones.
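
The article does not describe how the filter was adjusted, so the following is only one plausible sketch of the general idea: strip the punctuation that spammers insert into a word, then compare the result against a list of watched terms. The word list, function names, and normalization rule are all hypothetical.

```python
import re

# Hypothetical watch list; a real filter would learn its terms from data.
WATCHED_WORDS = {"loan", "diploma"}


def normalize(token: str) -> str:
    """Drop everything except letters, so 'l/oan' and 'l.o.a.n' both become 'loan'."""
    return re.sub(r"[^a-z]", "", token.lower())


def find_obscured_words(text: str) -> set[str]:
    """Return watched words that appear in the text, plainly or with inserted symbols."""
    hits = set()
    for token in text.split():
        cleaned = normalize(token)
        if cleaned in WATCHED_WORDS:
            hits.add(cleaned)
    return hits


if __name__ == "__main__":
    print(find_obscured_words("Apply for a l/oan today"))
```

This handles symbols inserted inside a word. Letters separated by spaces, or deliberate misspellings, would need further steps that this sketch does not attempt.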

Dictionary Attacks
Spammers need to find your email address somehow. One of the most common techniques spammers use for this is called a 'dictionary attack.' They try literally millions or billions of email addresses, and see which ones bounce back to them as undeliverable. The ones that don't bounce are valid email addresses that they can spam with obnoxious ads.

They try all sorts of addresses -- first names (Joshua), last names (Goodman), first names + numbers, such as Joshua100. They are very good at guessing popular addresses. In addition to letting spammers find valid email addresses, these attacks send so many test messages that they tie up email servers substantially, so they're doubly bad.

"The anti-spam team wanted to detect and stop these attacks," said Goodman. "It turns out there are different numbers you can look at to figure out if someone is doing a dictionary attack. One of the things I did was help the Anti-spam team figure out just what numbers to look at. Unfortunately, computing some of those numbers in the obvious way is very inefficient. Dan Simon, a cryptographer in Microsoft Research, provided a great solution." Cryptographers have been looking at how to thwart dictionary attacks for password protection.

"One of the best thing about working at Microsoft Research is that we work in so many different areas, so when I need help solving a problem, I don't have any trouble finding an expert who can help me. The dictionary attack work was a great example of that," said Goodman.

Phishing for $
One of the nastiest spam scams around these days is called 'phishing.' A phisher steals your identity, and a company's identity, so that he can clean out your bank account. He baits the hook by sending an email that claims to be from a legitimate company - usually a bank - that reports that your account numbers have been stolen. The next step is to send you to the company site to 'verify' your account numbers. The link that you click in the email goes to a site that looks exactly like what you expect. But the phisher has redirected you to a site that mirrors the legitimate company's site.

You, worried someone has stolen everything, enter your account information. After all, it looks just like your bank's site, so in your panic you don't stop to question. The phisher now has what he needs to steal your identity.

Phishers have made a lot of money with this scam. Recently, Joshua Goodman and his manager, David Heckerman, discovered a new variant of the phisher scam.

Heckerman called Goodman to say, 'I think I have a message from a phisher.' Goodman did some investigation, and found out that Heckerman had indeed been targeted by a phisher scam. EBay and its payment arm, PayPal, were the company targets.

The eBay/PayPal phishing scam has been around for a while, but phishers added a new twist to make it seem even more legitimate. In the weeks before Heckerman received the fake notice from PayPal saying that his account had been compromised, he received emails from various senders asking about items that he had for sale on eBay. However, Heckerman didn't have any items for sale on eBay at the time. His immediate thought was, 'I wonder if someone has my account information.'

When he received the message from the phisher, it should have confirmed his fears, making him more susceptible to the scam. However, Heckerman is a spam expert and recognized the nefarious goings-on. Goodman verified that the emails from senders asking about items were fakes as well and part of a new kind of phisher scam.

Since phishers make much more money from one victim than a typical spammer selling fake diplomas, they can go to much more trouble to hook their prey.

The International Language of Spam
Microsoft has been actively pursuing legislation and prosecutions to stop spammers. For instance, Microsoft supported the U.S. CAN-SPAM Act. However, spammers know no boundaries. So Microsoft has also collaborated with government agencies around the world to bring enforcement actions in 14 countries against illegal, deceptive, or fraudulent email.

Goodman and his colleagues Geoff Hulten and Robert Rounthwaite wanted to know whether laws and prosecutions could solve the spam problem. They studied over two million email messages from Hotmail accounts. The messages came from the Hotmail Feedback Loop, a mechanism that allowed over one hundred thousand Hotmail users to give feedback about which of their messages were good and which they considered spam.

Using the data, they were able to find out which countries generated the most spam. Their research showed about 15% of our total spam email originates from IP addresses in China. However, it may actually originate elsewhere, as spammers use tactics to disguise their identity and location. China is probably not the "true" originator of the spam - much of it is sent by US or other spammers who use Chinese Web hosting. In the United States, there are equal parts of good and spam email. Very little spam comes from Japan.

The image on the left shows a color-coded map of the countries with the most spam. Orange areas represent high proportions of spam, while dark green represents lower proportions of spam messages.

The researchers found out that the international language of spam is English. Spammers send more emails in English than any other language. Email users rarely see spam messages in Korean or Japanese, though they are seeing increasing volumes of non-English spam.

When the researchers connect this geographic information with what the spammers are selling, they're able to paint a picture of how laws might be successful in stopping spammers.

They grouped the types of products being sold into three categories: domestic, semi-domestic, and international. Domestic products require a domestic presence. They include financial services, insurance, and items too expensive to ship internationally.

Semi-domestic products require shipping, but the cost is low. These include products such as Viagra, college diplomas, and magazines.

International products or services include those that don't require physical shipping or a domestic presence. They range from software, to porn, to swindles such as the infamous Nigerian scam.

About one-third of the spam was domestic, about a third semi-domestic, and about a third international. "This kind of analysis is really useful," said Goodman. "Given the large amount of domestic and semi-domestic spam, it shows that good laws and prosecutions in the US and neighboring countries can really help. Also, when you see such a large portion of spam that appears to be coming from overseas, you realize there's a need to globally coordinate efforts to address the problem. It also shows us that while laws can be very useful, they won't solve the problem by themselves. We have to keep working on improving the technology and standards too."

©2006 Microsoft Corporation. All rights reserved.