Friday, January 30, 2009

Replicating Boot Volumes

One key strategic decision that allowed for almost immediate failover to a DR site was replicating the boot volumes of tier 1 servers. I architected this with VMs, but the same can be done with a Windows server booting from the SAN. I have never personally done much booting from SAN with physical servers; I tried it back in the Windows 2000 days, but there were issues with the page file.

Replicating the boot volume also makes things easier to manage because you don't have to build out a server on the target side, ensure it is up to the same spec as the source, and so on. When you power up the server in the DR site, it not only has the replicated data LUNs, whether that be SQL, Exchange, flat files, etc., but also an exact copy of the underlying operating system.

I used RecoverPoint to do the replication. The last couple of versions have the option to specifically replicate the boot volume, which made it easy. One thing you really have to pay attention to is the need to quiesce and create point-in-time copies of the boot volume, not just replicate it synchronously or asynchronously. If you skip that step, there is a real chance of lost in-flight writes and the server will blue screen (which I have done, of course).

RecoverPoint has a command-line utility that I run from Scheduled Tasks; it quiesces the server and creates a point-in-time image that is sent to the remote side. When you fail over, always choose one of these images and not the latest I/O. The good thing with RecoverPoint is that you can choose any point in time and then change it if it doesn't work. The problem with the boot volume is that you have to allow direct access to the LUN, which erases all of the other point-in-time copies, so you have to get it right the first time or you are screwed.
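To give an idea of what that scheduled job looks like, here is a minimal Python sketch. The utility path, flags, and consistency group name below are placeholders, not the actual RecoverPoint syntax, so check your RecoverPoint documentation for the real command; the point is just quiesce, bookmark, and fail loudly so Scheduled Tasks flags it.

# Minimal sketch of the scheduled quiesce/bookmark job described above.
# NOTE: "rp_quiesce.exe", its path, its flags, and the group name are
# placeholders, NOT the real RecoverPoint utility -- substitute whatever
# your RecoverPoint version ships for VSS-quiesced bookmarks.
import subprocess
import sys
from datetime import datetime

CONSISTENCY_GROUP = "Tier1-SQL-CG"  # assumed consistency group name
BOOKMARK = "quiesced-" + datetime.now().strftime("%Y%m%d-%H%M")

def create_quiesced_bookmark():
    """Quiesce the host and ask RecoverPoint to bookmark a point-in-time image."""
    cmd = [
        r"C:\Program Files\RecoverPoint\rp_quiesce.exe",  # placeholder path
        "--group", CONSISTENCY_GROUP,
        "--bookmark", BOOKMARK,
    ]
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        # Log the error and exit non-zero so the Scheduled Task shows a failure
        print("Bookmark failed:", result.stderr, file=sys.stderr)
        sys.exit(result.returncode)
    print("Created bookmark", BOOKMARK)

if __name__ == "__main__":
    create_quiesced_bookmark()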

Once you choose the latest clean image on the target, you do the same for the data LUNs and boot the server. It will come up with the same IP address, so you need a solution for that. There are a couple of ways of doing this.

One way is to use a global load balancer such as Cisco's or an F5 BIG-IP. This provides a front-end IP address that end users connect to, and the load balancer forwards traffic to the back-end servers: to the primary site while it is up, and to the DR site when the primary is down. Never have both sites active at the same time; if they are, make sure the load balancer is set to prefer the primary whenever it is available.
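To make that behavior concrete, here is a rough sketch of the decision logic a global load balancer applies. This is not F5 code, and the addresses and port are made-up examples; a real BIG-IP does this with configured health monitors rather than a script.

# Rough sketch of global load balancer failover logic (not F5 code).
# The addresses below are made-up examples.
import socket

PRIMARY_SITE = ("10.1.1.50", 1433)  # example: SQL server at the primary site
DR_SITE = ("10.2.1.50", 1433)       # example: same server booted at the DR site

def is_up(addr, timeout=2.0):
    """Very naive health check: can we open a TCP connection?"""
    try:
        with socket.create_connection(addr, timeout=timeout):
            return True
    except OSError:
        return False

def resolve_target():
    """Prefer the primary site whenever it is available; otherwise use DR."""
    if is_up(PRIMARY_SITE):
        return PRIMARY_SITE
    if is_up(DR_SITE):
        return DR_SITE
    raise RuntimeError("Neither site is answering")

print("Clients should be sent to:", resolve_target())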

The other way to handle this is to use a stretched VLAN. There are downsides, such as increased traffic (including broadcast traffic) that can fill the pipe between sites. It does, however, allow you to boot up a server with the same IP address at a different site, and the switches will see the change and forward accordingly.

There is, of course, the option to bring up the server, change its IP address, and then change DNS to point to the new IP, but you will have to flush all of the end users' DNS caches. This can be done automatically with third-party products, but I think the first two options are a better fit.
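For completeness, here is a hedged sketch of that third option using standard Windows tools (dnscmd and ipconfig). The DNS server, zone, host, and IPs are made-up examples, and you would want a low TTL on the record ahead of time.

# Sketch of the DNS option: repoint an A record at the DR IP.
# dnscmd and ipconfig /flushdns are standard Windows tools, but the server,
# zone, host, and IPs here are made-up examples.
import subprocess

DNS_SERVER = "dc01"            # example DNS server
ZONE = "corp.example.com"      # example zone
HOST = "sqlprod01"             # example record
OLD_IP = "10.1.1.50"           # primary-site IP
NEW_IP = "10.2.1.50"           # DR-site IP

def repoint_record():
    # Remove the old A record, then add one pointing at the DR site.
    subprocess.run(["dnscmd", DNS_SERVER, "/RecordDelete", ZONE, HOST, "A", OLD_IP, "/f"], check=True)
    subprocess.run(["dnscmd", DNS_SERVER, "/RecordAdd", ZONE, HOST, "A", NEW_IP], check=True)

def flush_local_cache():
    # Flushes the resolver cache on this machine only; every end-user machine
    # needs the same treatment (or has to wait out the record's TTL).
    subprocess.run(["ipconfig", "/flushdns"], check=True)

if __name__ == "__main__":
    repoint_record()
    flush_local_cache()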

More Text Mining Reference

Categorization: Identifies the main themes in a document by placing the document into a pre-defined set of topics. Relies on a thesaurus.

Clustering: Groups documents on the fly rather than into pre-defined categories (see the sketch after this list).

Concept Linking: Links documents based on the concepts they share. Helps users find information they wouldn't normally find using traditional searching.

Information Visualization: A visual representation of a document or corpus.

Information Retrieval: Indexing and retrieval of textual documents; finding a set of ranked documents that are relevant to a query.
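As a toy example of the clustering idea, here is a minimal sketch using scikit-learn. The documents and the choice of two clusters are made up; it just shows documents being grouped on the fly from their TF-IDF vectors rather than matched against pre-defined topics.

# Toy sketch of clustering documents "on the fly" (as opposed to
# categorization against a pre-defined topic list).
# Requires scikit-learn; the documents and k=2 are made-up examples.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

docs = [
    "SAN replication and boot volume failover to the DR site",
    "RecoverPoint point-in-time images for disaster recovery",
    "De-duplication ratios on backup storage arrays",
    "Primary storage optimization with compression and single instancing",
]

# Turn the documents into TF-IDF vectors, then group them into 2 clusters.
vectors = TfidfVectorizer(stop_words="english").fit_transform(docs)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(vectors)

for doc, label in zip(docs, labels):
    print(label, "-", doc)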

Tuesday, January 20, 2009

SANPulse

Over the many years of doing data center migrations, tech refreshes, data migrations, tiering, increasing utilization, etc., I have always imagined a tool that could automate these manual tasks.

When vendors like EMC, HP, IBM, etc. go in and do data migrations, they normally do it manually: send in a bunch of heads who write scripts by hand to do the migration.  I have managed similar efforts with and without the storage vendors.  They suck.

No situation is the same, of course, but the similarities are never used to your advantage.  Every migration is different, so every data migration is written basically from scratch.  The vendors don't care, as they are billing per hour per head.

This year is a big year for consolidations and migrations.  As companies merge, go out of business, cut costs, or take advantage of low-cost infrastructure, the need to move data around the data center becomes critical.

I have been talking with a couple of people from SANPulse Technologies.  Some of them actually worked in the same data/data center migration organization that I was a part of.  They created a product that I could only imagine years ago.

You can read their white papers on their website: http://sanpulse.com 

This takes all of the manual headache out of migrating data.  From what I gather, it offers the ability to see the environment, make changes, migrate data, and report the results up to management.

I don't see how anyone could do a data migration without this in the coming months.  I think they will continue to be successful.  I suggest you read their stuff if you are a migrator.

I am a migrator.

Thursday, January 8, 2009

Primary Storage Optimization

There is now a lot of talk in the industry about Primary Storage Optimization.  I would define Primary Storage Optimization (PSO) as reducing the amount of physical capacity used relative to the amount of actual data on your primary storage.

Techniques used for this include compression, de-duplication, single instancing, etc.  

PSO is tough, though.  It is easy to optimize non-primary storage using Storage Capacity Optimization techniques similar to those above, but primary storage has different requirements and properties.

First of all, you can get more optimization out of backup storage because it contains more redundancy.

The PSO solution must run with minimum latency so it doesn't affect the application.  This has been a sticking point.

Who wants to add another point of failure or dynamic to the simple act of reading and writing storage?

There are some solutions out there, mostly software, that can do PSO.  To get around the latency, I believe some of them do the optimization as post-processing rather than in real time.
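To illustrate the post-processing idea only (this is not how any particular vendor does it), here is a toy sketch that scans existing data in fixed-size chunks, hashes each chunk, and keeps one physical copy per unique hash.

# Toy post-process de-duplication: scan data in fixed-size chunks, hash each
# chunk, and store only one copy per unique hash.  Concept illustration only;
# real products dedupe at the block/array level with far more sophistication.
import hashlib

CHUNK_SIZE = 4096  # assumed fixed chunk size

def dedupe(data: bytes):
    """Return (chunk_store, recipe): unique chunks keyed by hash, plus the
    ordered list of hashes needed to rebuild the original data."""
    chunk_store = {}
    recipe = []
    for i in range(0, len(data), CHUNK_SIZE):
        chunk = data[i:i + CHUNK_SIZE]
        digest = hashlib.sha256(chunk).hexdigest()
        chunk_store.setdefault(digest, chunk)  # keep the first copy only
        recipe.append(digest)
    return chunk_store, recipe

if __name__ == "__main__":
    data = b"A" * 4096 * 8 + b"B" * 4096 * 2   # highly redundant sample data
    store, recipe = dedupe(data)
    logical = len(data)
    physical = sum(len(c) for c in store.values())
    print(f"logical {logical} bytes -> physical {physical} bytes "
          f"({logical / physical:.0f}:1 ratio)")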

To me it makes sense to do this at the SAN level: pop in an SSM blade and de-duplicate the storage at the network level.  I also think this is the best way to virtualize storage.  The two can work hand-in-hand.

Tuesday, January 6, 2009

Virtualization Cost Model


Back after the New Year.  Feels good.  I wanted to post my virtualization cost model that I developed when virtualization got out of control at a previous engagement.  See my virtualization crack blog posting for more information on that.

The cost model front end was easy to develop.  If you have been doing consulting for many years, I am sure your Excel skills cover the lists, IF statements, links, and data filtering that make up the front end of this model.

The hard part is the back end.  There are a lot of equations that I worked on for days to figure out the specifics of the model.  Some examples: How do you calculate price per port?  How do you calculate the price of cooling?  How do you calculate the cost of physical space?  Etc.

I had to figure out these equations and then sit down with business owners, finance, and IT to get the base costs to plug into them.

Do we include the purchase price of past equipment?  How do you calculate the cost of a GB?  Is it just the cost of the array divided by the number of GB?  What if you add disk?

I came up with some weird ideas, but they worked and were agreed upon in the end.  I also had to work with someone across the world to calculate DR and backup replication costs.

For the network, I just divided the total cost of the equipment by the number of ports to get the cost per port.  This isn't the best way to do it, but for the network gear, that is what was agreed on.

Backup costs include de-duplication.  How do you figure that out?  Actually, that is easy.  I just used an average de-duplication ratio and came up with an effective total size based on that.  For example, at a 10:1 ratio, a 5 TB de-duplicated array really holds 50 TB, so I used that number.
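To show the kind of back-end arithmetic I mean, here is a small sketch covering cost per port, cost per GB, and the de-duplication adjustment. Every dollar figure and capacity in it is a made-up example, not a number from the actual model.

# Minimal sketch of the back-end arithmetic described above.
# All dollar figures and capacities are made-up examples.
def cost_per_port(total_equipment_cost: float, port_count: int) -> float:
    """Network cost per port = total cost of the gear / number of ports."""
    return total_equipment_cost / port_count

def cost_per_gb(array_cost: float, usable_gb: float) -> float:
    """Simplest cost per GB = array cost / usable capacity."""
    return array_cost / usable_gb

def effective_backup_capacity(physical_tb: float, dedupe_ratio: float) -> float:
    """Effective (logical) capacity of a de-duplicated array."""
    return physical_tb * dedupe_ratio

if __name__ == "__main__":
    print(f"${cost_per_port(120_000, 96):,.2f} per port")          # example switch
    print(f"${cost_per_gb(250_000, 50_000):,.2f} per GB")          # example array
    print(f"{effective_backup_capacity(5, 10):.0f} TB effective")  # 5 TB at 10:1 -> 50 TB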

Anyway, after I figured out all of the equations, I plugged the numbers into a worksheet and the front end references those.  There are drop-down lists, and IF statements key off the choices made in them.

Here is a screenshot of the first version of the model.  For some reason I can't find the latest version on my USB stick.

Let me know if you have questions.