Wednesday, July 1, 2009

Thoughts on Exchange

A geekish thought, so you non-geek friends (do I have any?) can skip this one.

OK, now that I'm done with the disclaimer, here we go. I think the way Exchange stores email needs to change drastically. I've been battling with backups, and our major resource hog is the Exchange database. It is extremely bloated. I know you are all just shocked at this statement. I think MS needs to rethink how they store emails. How much space is wasted on mail servers when several people receive the exact same email? Now think about that same email with that funny video you've seen God knows how many times. Some backup systems seem to have solved a similar problem with files by detecting duplicates, backing up only one copy, and using (I would assume) some kind of pointer to stand in for the rest. I heard earlier today that ZFS is also going to incorporate the same thing into its file system.

De-duplication could same a huge about of disk space for Exchange. So that when I send out an email to 300+ people in house to let them know that the network is going to be down, there isn't 302+ (this 2 extra are my send mail message and the copy I will get myself since I am part of the distro list). There only needs to be one copy made and 301+ pointers to that message. So that from everyones perspective they each got the message. Can you imagine how much space this would save if for intance that email had an attachment? I will admit for small emails only to a few people this method may not save much space and may not be needed, so that is where some sort of smart algorithm comes into play to see if De-Duplication is really neccesary (works for zipping files). De-Duplication could also save space on incomming emails from the outside world. How many people are signed up for LiveNation, Ticketmaster, Circuit City, Amazon, and the millions of other newsletters and advertisements out there? (Oh and don't forget the penis ads, and that super hot girl that for some reason wants to show you her naked pics) Imagine if the email server so see these messages come in and determine they are all the same (or mostly the same) and could process them in a sort of shapshot and keep one copy and send pointers to everyone. And then say (maybe this could be an optional setting) enough people mark the message as SPAM/phishing and it then gets send to the junk mail folder (or erased) from everyone's inbox? Now we have a community of people helping catch what SPAM filters miss and we only have to store one copy of this garbage in the database.

What if people use POP3 or archive their mailbox? Fine: the message gets copied to their hard drive, and now they have to deal with the mess; it isn't our problem anymore. What if the last pointer is deleted? Then the message itself is deleted.
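That deletion rule is plain reference counting. A minimal sketch, again with hypothetical names and a toy in-memory store: each shared body tracks how many mailboxes point at it, and the body is freed when that count reaches zero. The `pop_download` method assumes a POP3 client that removes mail from the server after downloading, so the user walks away with the bytes and the store just drops that user's pointer.

```python
class RefCountedStore:
    """Toy store where a shared message body lives only while some mailbox points at it."""

    def __init__(self) -> None:
        self.blobs: dict[str, bytes] = {}          # message ID -> the single stored body
        self.refcount: dict[str, int] = {}         # message ID -> number of pointers to it
        self.mailboxes: dict[str, list[str]] = {}  # user -> message IDs they can see

    def add_pointer(self, user: str, msg_id: str, body: bytes) -> None:
        """Give `user` a pointer to the message, storing the body only the first time."""
        self.blobs.setdefault(msg_id, body)
        self.refcount[msg_id] = self.refcount.get(msg_id, 0) + 1
        self.mailboxes.setdefault(user, []).append(msg_id)

    def delete(self, user: str, msg_id: str) -> None:
        """Remove one user's pointer; free the body when the last pointer goes away."""
        self.mailboxes[user].remove(msg_id)  # KeyError/ValueError if the user never had it
        self.refcount[msg_id] -= 1
        if self.refcount[msg_id] == 0:
            del self.refcount[msg_id]
            del self.blobs[msg_id]

    def pop_download(self, user: str, msg_id: str) -> bytes:
        """Hand the user the message bytes, then drop their server-side pointer."""
        body = self.blobs[msg_id]
        self.delete(user, msg_id)
        return body  # still valid for the caller even if the store freed its entry


if __name__ == "__main__":
    store = RefCountedStore()
    store.add_pointer("alice", "notice-1", b"Network outage Saturday")
    store.add_pointer("bob", "notice-1", b"Network outage Saturday")
    saved = store.pop_download("alice", "notice-1")  # alice now keeps it on her own disk
    store.delete("bob", "notice-1")                  # last pointer gone
    print("notice-1" in store.blobs)                 # False
```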

I think this could save hard drive space, backup time, money, and effort, and it could save the lives of the people who never delete ANYTHING and insist on keeping everything because someday they may need it. And I just can't imagine the overhead of de-duplication being any worse than the virus scanning and spam filtering that already happen, let alone outweighing the positives. If done right, all these processes could be combined and complement each other. And it isn't like any of the concepts are new; this would just be applying them in a new place.

What are your thoughts? Am I crazy in thinking something like this needs to be done?