Corante

Many-to-Many

March 7, 2007

Spam that knows you: anyone else getting this?

Posted by Clay Shirky

A few weeks ago, I started getting spam referencing O’Reilly books in the subject line. I thought the spammers had just gotten lucky: that their arsenal now included generating so many different subject lines that at least some of them got through to my inbox. Recently, though, I’ve started to get more of this kind of spam, as with:

  • Subject: definition of what “free software” means. Outgrowing its
  • Subject: What makes it particularly interesting to private users is that there has been much activity to bring free UNIXoid operating systems to the PC,
  • Subject: and so have been long-haul links using public telephone lines. A rapidly growing conglomerate of world-wide networks has, however, made joining the global

(All are phrases drawn from http://tldp.org/LDP/nag/node2.html.)

Can it be that spammers are starting to associate context with individual email addresses, in an effort to evade Bayesian filters? (If you wanted to make sure a message got to my inbox, references to free software, open source, and telecom networks would be a pretty good way to do it. I mean, what are the chances?) Some of this stuff is so close to my interests that I thought I’d written some of the subject lines and was receiving this as a reply. Or is this just general Bayes-busting that happens to overlap with my interests?

If it’s the former, then Teilhard de Chardin is laughing it up in some odd corner of the noosphere, as our public expressions are being reflected back to us as a come-on. History repeats itself, first as self-expression, then as ad copy…
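To make the Bayes-busting question concrete, here is a minimal sketch of a naive Bayes spam score, assuming equal priors and add-one smoothing. The training messages, the function names, and the padded example are all made up for illustration; this is not how any particular filter is implemented. It only shows why appending words that resemble a recipient's legitimate mail can pull a message's score down.

```python
import math
from collections import Counter

def train(messages):
    """Count word occurrences across a list of message strings."""
    counts = Counter()
    for text in messages:
        counts.update(text.lower().split())
    return counts

def spam_probability(text, spam_counts, ham_counts):
    """Naive Bayes estimate of P(spam | words): equal priors, add-one smoothing."""
    vocab = set(spam_counts) | set(ham_counts)
    spam_total = sum(spam_counts.values()) + len(vocab)
    ham_total = sum(ham_counts.values()) + len(vocab)
    log_odds = 0.0
    for word in text.lower().split():
        p_word_spam = (spam_counts[word] + 1) / spam_total
        p_word_ham = (ham_counts[word] + 1) / ham_total
        log_odds += math.log(p_word_spam / p_word_ham)
    return 1 / (1 + math.exp(-log_odds))

# Hypothetical training data: "ham" stands in for one reader's legitimate mail.
ham_counts = train([
    "free software and open source licensing",
    "unix operating systems on the pc",
    "telephone lines and global networks",
])
spam_counts = train([
    "cheap pills buy now",
    "buy cheap watches now",
    "hot stock tip buy now",
])

plain = "buy cheap pills now"
# The same pitch padded with words lifted from text the reader cares about.
padded = plain + " free unixoid operating systems on the pc"

plain_score = spam_probability(plain, spam_counts, ham_counts)
padded_score = spam_probability(padded, spam_counts, ham_counts)
print(f"plain:  {plain_score:.3f}")
print(f"padded: {padded_score:.3f}")
```

In this toy setup the padded message scores lower than the plain one, because every borrowed word is more likely under the ham counts than under the spam counts. Whether that is enough to reach an inbox depends on the filter's threshold and on how much padding the sender adds.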

Comments (15) + TrackBacks (0) | Category: social software


COMMENTS

1. Ivo on March 7, 2007 2:20 PM writes...

Quite odd, I was recently wondering this exact same thing.

I don't recall being fooled into opening a spam email in a long time, but this week it has happened three times. On all occasions the subject line was on a topic of interest. Twice I realized it was spam immediately after opening it; the last one I had to reread, since I wasn't sure it was spam from simply reading the text.


2. Ann E. Mouse on March 7, 2007 6:10 PM writes...

It's the new trend in spam: they went from short one-, two- and three-word subjects with literary text quoted in the body to the new excessively long subjects that are pulled from newsgroups, mailing lists and newsfeeds, with the body text being pulled from similar sources. For instance, news and weather alerts are popular since they tend to change and are very timely. But it's still generic, AFAIK.

3. Steve on March 7, 2007 6:14 PM writes...

Yep - I've received mails with a blend of the ATOM draft spec and text from various blogs (some of which I read, some of which I don't)

In each case the true message is a spam image at the beginning of the mail.

My guess is the same as yours - just swamp people with mail that will pass their Bayesian filters.

4. Dave Orchard on March 7, 2007 6:40 PM writes...

I've gotten spam that mentioned W3C Specifications, XPath data model and new XQuery operators in the subject. I can't find the email any more but I did open up that one.

5. Steve on March 7, 2007 6:45 PM writes...

I looked through my spam catcher to find other mails and confirmed that yes, this is still just a shotgun approach, not targeted subjects. The same spammer (or maybe just spamming application) has sent a variety of snippets at me; most just didn't get through the filter.

6. Geoffrey Wiseman on March 8, 2007 12:58 AM writes...

I get spam all day at work about Cialis and Viagra -- they /must/ know me! :)

7. Justin Mason on March 8, 2007 7:33 AM writes...

'Can it be that spammers are starting to associate context with individual email addresses, in an effort to evade Bayesian filters?'

As Craig Hughes and I noted back in 2001 when developing Bayesian filtering in SpamAssassin, this is entirely practical for them to do. They already seed their address lists from scraped google searches, so all they'd need to do is collect the surrounding text near the address on the scraped page, and use that text as a "seed" for a search through other, similar texts -- or just google again -- to collect probably-workable Bayes-buster strings.

However, in this case, I don't think they're personalising the Bayes-busters in this way just yet; every address I have is receiving similar LDP-quoting spam, including the spamtrap addresses which are not publicly correlated to me (or any other techie person, either), in any way.

8. Jamie McCarthy on March 8, 2007 10:57 AM writes...

Yes, as of a couple of months ago, I think. It's very poor at getting past SpamSieve's Bayesian filter. I'm pretty sure I've only seen the Google-scraped phrases in my spam folder.

9. Chico on March 9, 2007 1:10 AM writes...

I've noticed this with my Yahoo! acct for a while, and assumed AT&T which now owns Yahoo! was just continuing to turn over customers' Internet data to the federal govt to scan and sell back to their corporate buddies.

10. Neil Rahillly on March 9, 2007 4:11 PM writes...

I've had the same impression too. I also occasionally get spam from familiar names, except with minor (sometimes humorous) spelling mistakes or hybrid names, with the first name of one of my contacts and the last name of another. My initial conclusion was that spammers must have gotten their hands on my personal information. It later occurred to me, though, that this assumption could be flawed in the same way as the assumption from the order of nature that evolution is purposive. What might really be happening, instead, is that the Bayesian filter is acting as a sort of "natural selection" on the population of spam, so that the ones which survive to your inbox appear to have an "intelligent design". Ha, a bit of a ridiculous analogy, and somewhat misleading since the next round of spam will not represent the offspring of these survivors (unless, of course, you respond to them), but I thought I'd share it nonetheless. I would be interested in any answers you find on the matter.
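The selection effect described in this comment can be sketched in a few lines. This is a toy simulation under loud assumptions: the topic list, the `passes_filter` stand-in (a simple word match, not a real Bayesian filter), and the reader's interests are all hypothetical. The sender picks topics at random, yet everything that reaches the inbox looks targeted.

```python
import random

# Hypothetical pool of scraped subject lines, keyed by topic.
TOPICS = {
    "linux": "free unix operating systems for the pc",
    "weather": "storm warning issued for coastal areas",
    "sports": "late goal decides the cup final",
    "xml": "new xquery operators and the xpath data model",
}

def passes_filter(subject, interests):
    """Stand-in for a trained filter: pass if any word matches the reader's interests."""
    return any(word in interests for word in subject.split())

def simulate(n_messages, interests, seed=0):
    """Send n_messages on randomly chosen topics; return (sent, inbox) topic lists."""
    rng = random.Random(seed)
    sent = [rng.choice(list(TOPICS)) for _ in range(n_messages)]
    inbox = [topic for topic in sent if passes_filter(TOPICS[topic], interests)]
    return sent, inbox

sent, inbox = simulate(1000, interests={"unix", "xpath"})
print("topics sent:    ", sorted(set(sent)))
print("topics in inbox:", sorted(set(inbox)))
```

The sender in this sketch knows nothing about the reader. The apparent personalisation comes entirely from which messages the filter lets through.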

11. mike on March 9, 2007 8:36 PM writes...

It is quite possible that spammers using specially programmed spiders, crawling pages for email addresses left unsecured, could associate the page content [e.g. slashdot = computers, linux, etc.] with each address and thus generate a subject line suited to it.

12. Bob DuCharme on March 10, 2007 11:31 AM writes...

I got one that caught me off guard and wrote about it at One Namespace to Rule Them All.

13. Anshu Sharma on March 11, 2007 10:22 PM writes...

Spam is an economic problem, and the only solution that works will be one that makes the costs higher than the expected returns. This is true for many challenging problems we face (global warming, deforestation, immigration) and is true for spam. Technology solutions are like fences on the Mexican border: they won't work in the long term.

15. lucychili on March 28, 2007 11:45 PM writes...

Justin: "They already seed their address lists from scraped google searches, so all they'd need to do is collect the surrounding text near the address on the scraped page, and use that text as a "seed" for a search through other, similar texts -- or just google again -- to collect probably-workable Bayes-buster strings."

Yup this makes sense. I get spates of political spam after I do political posts and linux spam other times. I am also getting Brazilian spam which makes me wonder how I mesh into different language sets.

I have been writing education stuff lately and am getting Gagne spam. What happens when the spam reaches higher signal to noise than the threads?

Interesting times.

Janet
