Tuesday, December 1, 2009

Exchange Top Level Security Permissions

In order to see the top-level inherited security permissions in Exchange 2003, you have to add the following registry key and restart Exchange System Manager (ESM):

HKEY_CURRENT_USER\Software\Microsoft\Exchange\ExAdmin

Create a new DWORD value named "ShowSecurityPage" and give it a value of 1.
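
If you would rather script it, the same value can be added with reg.exe (run it as the user who opens ESM, since the key lives under HKCU):

rem adds the ShowSecurityPage DWORD with a value of 1
reg add "HKCU\Software\Microsoft\Exchange\ExAdmin" /v ShowSecurityPage /t REG_DWORD /d 1 /f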

Now restart ESM and you will see the security tab at the Exchange org level.

Thursday, November 19, 2009

Messaging Records Management (MRM) Exchange 2007 Entire Mailbox

If you are using managed folder policies and would like to apply deletion rules to folders outside of the default folders (for example, a user who creates a folder at the root of the mailbox rather than in the Inbox), you may have to work with "Entire Mailbox".

I have used this setting in the past and it can cause havoc. In other words, it will apply to every single folder, including Calendar, Contacts, etc.

The only way to ensure that it won't apply to a specific folder is to make that managed folder part of the same policy.

So if you have Entire Mailbox set to delete emails after 90 days, everything in Outlook older than 90 days will be deleted. If the same policy also includes a Calendar managed folder set to 120 days, the 90-day rule won't apply to Calendar: calendar items will be deleted after 120 days and everything else after 90 days.

Another option is to configure the Entire Mailbox content settings to delete only emails rather than all items. If you do this, it will delete only email across the entire mailbox and leave items like calendar entries, tasks, etc. alone.
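
As a rough sketch, the email-only variant looks something like this in the Exchange 2007 Management Shell (the setting name and the 90-day limit are made up for the example; verify the parameters with Get-Help New-ManagedContentSettings before using it):

# IPM.Note restricts the content settings to email rather than all item types
New-ManagedContentSettings -Name "EntireMailbox-Mail-90" -FolderName "Entire Mailbox" -MessageClass "IPM.Note" -RetentionEnabled $true -AgeLimitForRetention 90 -RetentionAction DeleteAndAllowRecovery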

Be careful.

Tuesday, November 10, 2009

Exchange 2007 File Share Witness Multi-subnet clustering

If you are placing the file share witness at the same site as your active Exchange 2007 server, then when the primary site fails, Exchange will not be able to start at the DR site. This is because the surviving node has only 1 of 3 votes, as both the file share witness and the active node are down.

You will have to run a /fq (net start clussvc /fq) to force quorum at the secondary site.
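
On a surviving node at the DR site, that is simply:

rem force the cluster service to form quorum on this node
net start clussvc /fq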

Check this one out:
http://blogs.technet.com/timmcmic/archive/2009/04/26/file-share-witness-fsw-placement-and-the-cluster-group.aspx

Wednesday, October 21, 2009

Migrating user from Exchange 2003 to 2007 BES

Most people I talk to suggest restarting the BES services when you migrate a user between Exchange servers. I have found that all you have to do is run handheldcleanup -u. It has worked for me every time. Try it.

Thursday, September 17, 2009

Windows 2008, Exchange 2007 CCR Cluster multi-subnet

To decrease the fail-over time of a multi-subnet CCR cluster running Exchange 2007, decrease the TTL of the DNS record. When a fail-over occurs in a multi-subnet environment, the DNS record must change to the new IP. To make this happen faster, run the below command.

The default is 20 minutes, plus the 10 minutes it takes for the cluster to even change the record, plus the amount of time a client caches the record. This can lead to long fail-over times.

To change the TTL to 5 minutes, run: cluster.exe res <NetworkNameResource> /priv HostRecordTTL=300
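
For example, with a clustered mailbox server named EXCH01 (the resource name here is an assumption; list yours with cluster.exe res), and cycling the network name so the new TTL takes effect, which briefly takes the CMS offline:

cluster.exe res "Network Name (EXCH01)" /priv HostRecordTTL=300
cluster.exe res "Network Name (EXCH01)" /off
cluster.exe res "Network Name (EXCH01)" /on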

Windows Server 2008, Cannot Create cluster

If you are having issues creating a cluster in Windows 2008, try this:

Change the value for the MS failover cluster virtual adapter
1. Open Registry Editor.
2. Locate the following registry subkey:
HKEY_LOCAL_MACHINE\SYSTEM\ControlSet001\Control\Class\{4D36E972-E325-11CE-BFC1-08002BE10318}
3. Under this subkey, find the subkey that holds a DriverDesc string value entry whose value is "Microsoft Failover Cluster Virtual Adapter."
4. Under the subkey that you found in step 3, add the following string value registry entry:
Name: DatalinkAddress
Value data: 02-AA-BB-CC-DD-01
5. Restart the computer.
6. Repeat steps 1 through 5 on the other computers on which you experience this problem. When you do this on other computers, replace the value data of the registry entry with different values in order to set a unique value for each node. For example, set the value on the second node to 02-AA-BB-CC-DD-02, and the value on the third node to 02-AA-BB-CC-DD-03. If you notice this behavior on distinct clusters, make sure that you use an address for each node that is unique across all clusters.
7. Try creating the cluster again.
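
Step 4 can also be scripted with reg.exe. The 0011 subkey index below is made up; substitute whichever subkey you found in step 3:

reg add "HKLM\SYSTEM\ControlSet001\Control\Class\{4D36E972-E325-11CE-BFC1-08002BE10318}\0011" /v DatalinkAddress /t REG_SZ /d 02-AA-BB-CC-DD-01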

Exchange 2007 cannot uninstall Exchange Tools

If you can't uninstall the Exchange tools because the option is grayed out, run this command:

MsiExec.exe /X{24B2C164-DE66-44FE-B468-A46D9D5E6B31}

Windows 2008 CCR Cluster, Exchange 2007 error

Microsoft.Exchange.Cluster.ReplayService (7012) Log Verifier e0a 31573001: An attempt to open the device name "\\source\share$" containing "\\source\share$\" failed with system error 5 (0x00000005): "Access is denied. ". The operation will fail with error -1032 (0xfffffbf8).

If you get this error in your Windows 2008 CCR cluster on Exchange 2007, you can ignore it for now; it should be fixed in future releases. Check this link:

http://blogs.technet.com/timmcmic/archive/2008/12/21/windows-2008-exchange-2007-sp1-ese-522-errors-on-ccr-passive-or-scr-target-machine.aspx

Friday, September 11, 2009

SnapDrive Windows 2008 Cluster Access Denied

After installing SnapDrive on a cluster member, I got "access denied" when trying to add a disk. Apparently you have to install SnapDrive on both nodes before proceeding, even though it is a CCR cluster.

Wednesday, September 9, 2009

Netapp Cannot Delete Snapshot Error LUN Clone

I ran into this error whilst trying to delete a snapshot created by SME. I found this page:

http://www.oneandonemakesthree.com/?q=node/50

Which states:

When you clone a LUN, you are left with snapshots that won't be deleted even after you delete the cloned LUN. These snapshots will start to consume huge amounts of space.

If you try to delete the snapshot, it will say that the snapshot is busy and can't be deleted. You have to delete any snapshots that were taken while the LUN clone existed, as the LUN clone exists in those snapshots.

Issue the command:

lun snap usage volume_name snapshot_name

Here, snapshot_name is the busy snapshot that you are trying to delete. The output is a list of snapshots that you have to delete before the busy snapshot can be deleted; once they are gone, you can reclaim all that space.
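
A made-up example (the volume and snapshot names are hypothetical):

lun snap usage vol1 exchsnap_busy

Delete each snapshot the command lists with snap delete vol1 (name), and the busy snapshot will then delete normally.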

Monday, August 17, 2009

iPhone, Exchange 2003, OWA, ISA 2006

I just got done setting up OWA with ISA 2006 for iPhone test users. One thing that I ran into, which I have seen in the past, is an issue when you have an ISA server with only one leg, purely in the DMZ.

With OWA, if you don't want users to have to type /Exchange when connecting to the Exchange server, you basically have to create a redirect rule that kicks in when a user types just the root of the OWA URL. That rule then forwards them to /Exchange. Simple, but a PIA if you are trying to figure it out.

Here is how to do it: http://www.messagingtalk.org/exchange-2007-owa-url-redirection-using-isa-2006

Friday, August 7, 2009

vsphere, TPS, high memory utilization on Nehalem processors

After upgrading from ESX 3.5 to vSphere, all of my VMs were throwing memory alerts. I found this article.

Basically, with hardware-assisted MMU and large pages enabled, TPS doesn't work so well because of the larger page size. For now I disabled the alerts, but there is also the option of disabling large pages.
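
If you go the large-page route, the host knob is, I believe, Mem.AllocGuestLargePage (verify before relying on it). From the service console:

# 0 disables backing guest memory with large pages
esxcfg-advcfg -s 0 /Mem/AllocGuestLargePage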

Thursday, August 6, 2009

vsphere 4, paravirtual SCSI

From the Admin Guide:

VMware recommends that you create a primary adapter (LSI Logic by default) for use with a disk that will host the system software (boot disk) and a separate PVSCSI adapter for the disk that will store user data, such as a database.

Monday, August 3, 2009

ESX using NTP has Wrong Time zone on command line

The ESX servers at my customer's site all use NTP, and it is working great. All the VMs have the correct time via VMware Tools, the time shown under Configuration is correct, etc. However, when I log into the ESX server itself via SSH and run the date command, it shows the wrong time.

This is because an incorrect time zone is set. It is easy to fix and shouldn't affect your NTP settings.

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1436
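
The usual service console fix looks like this (a sketch; the zone is just an example, and the KB above is the authoritative procedure):

# substitute your own zone file
cp /usr/share/zoneinfo/America/New_York /etc/localtime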

Wednesday, July 29, 2009

HP NIC Teaming, Netapp CIFS, vRanger Pro Issue

I was seeing extremely slow CIFS performance on servers with HP NIC teams, including our vRanger backup server. Mapping a drive would take minutes to come up, etc. I finally found a NOW article suggesting to run:

options ip.fastpath.enable off

This command, from my understanding, stops the filer from using cached MAC addresses. Because of the way our HP NIC team is set up, different MACs were being presented, and this caused the issue. So far so good.

Virtualizing Citrix

I have recently virtualized a customer's entire Citrix environment. It is actually performing quite well; no issues so far. I did follow some of the recommendations here.

I did not, however, follow everyone's suggestion not to P2V the Citrix servers. I was forced to P2V them, and they are running perfectly thus far. The two main things I did were disabling the memory balloon driver and disabling shared folders within VMware Tools.

The memory balloon driver acts as a program within the VM and will give and take memory based on host and VM requirements, among other things. With Citrix, I decided not to enable it.

For information on how the balloon driver and the swap file work, check it here
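
For reference, one standard way to disable ballooning per VM is a one-line vmx entry added while the VM is powered off; sched.mem.maxmemctl caps the balloon size, and 0 keeps it from ever inflating:

sched.mem.maxmemctl = "0"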

Varonis on Netapp

To get Varonis to manage Netapp CIFS volumes, you must run the following commands on the filer to create an fpolicy:

fpolicy create Varonis screen
fpolicy options Varonis required off
fpolicy enable Varonis

Monday, July 27, 2009

Replicating and failing over VMs

When you automate the replication and failover of virtual machines, you will need to make a change to the config file if you don't want to answer a question per VM on failover because the UUID changed in the new environment.

In order to automate this process, when you build a VM edit the vmx file to include:

uuid.action = "keep"

This will keep ESX from asking a question about whether or not you want to change the UUID.

Friday, July 24, 2009

Windows Server 2008 Cannot Ping or Connect to Internet

I am building out some Windows Server 2008 machines. One of them couldn't be pinged and couldn't connect to the Internet, but it could reach servers on its own subnet. When I ran ipconfig, I could see that one of the default gateways was 0.0.0.0. It didn't show up in network properties, and deleting the entry from the registry didn't work.

The solution was simple: run the command route delete 0.0.0.0 and then go into network settings and add the real gateway again.
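
Roughly, the whole sequence (the IP addresses and interface name are made up; netsh is just an alternative to the GUI):

route delete 0.0.0.0
netsh interface ip set address name="Local Area Connection" static 192.168.1.10 255.255.255.0 192.168.1.1 1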

Monday, July 20, 2009

ESX NFS Snapshot Issues

I have been having snapshot issues with ESX on Netapp NFS: the server hangs for a long time when deleting snapshots, and we are also seeing ESX Ranger issues when doing backups.

The Netapp guide mentions installing patch ESX350-200808401-BG. I am on Update 3 of ESX 3.5, so it wasn't necessary. All I had to do was add this line to the ESX config file:

prefvmx.consolidateDeleteNFSLocks = "TRUE"
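
On ESX 3.5 the file in question is /etc/vmware/config (my understanding from the Netapp docs; verify for your build):

# append the NFS lock consolidation setting
echo 'prefvmx.consolidateDeleteNFSLocks = "TRUE"' >> /etc/vmware/config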

Tuesday, June 30, 2009

ESX Ranger, ESX, Netapp, NFS

If you have ESX running on Netapp, you sure as hell should have installed the Netapp ESX tools; if you didn't, you should. Among other things, the tools change NFS settings within ESX.

I ran into an issue with ESX Ranger failing on backups from Netapp NFS to a Netapp CIFS share. The error was:

The Backup file /vmfs/volumes/NFS_TIER_BLAH/Server1//Server1-flat.vmdk that was transfered appears to be Invalid! Transfered Size: 112546265463, Expected Size: 94667312308.

The solution seems to be changing the NFS timeout values within ESX by:

Changing the value of NFS.HeartbeatFrequency to 12
Changing the value of NFS.HeartbeatMaxFailures to 10
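
From the service console, those two map to standard esxcfg-advcfg calls:

esxcfg-advcfg -s 12 /NFS/HeartbeatFrequency
esxcfg-advcfg -s 10 /NFS/HeartbeatMaxFailures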


This will work for most people who haven't already installed the Netapp tools (which set these values for you). It didn't fix my problem, but it may fix yours.

My solution was to use NFS instead of CIFS for the target. It has worked thus far.

Sean

Wednesday, June 24, 2009

Some Basic Netapp Tips 1

Here are some tips that I share with clients. They are not rules of thumb, simply ways of doing things. For most of my clients I share a VM tips file, which I update constantly, and a storage tips file, in this case for Netapp.

After creating volumes log into the filer and turn on volume autosize with the trigger set to volume

After creating volumes, log into the filer and set snap autodelete to on.

Set the snap reserve % to 0 when creating volumes that will house LUNs. Just size the volume manually to be big enough to handle the LUN plus the snapshots.

Make sure that when you grow a volume, you log into the filer and reconfigure volume autosize. This can be done by typing vol autosize reset, followed by vol autosize on (see the examples below).

Always set the default security to restricted for iSCSI and add initiators as needed
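
As a concrete sketch of the autosize and autodelete items above (the volume name and sizes are made up; Data ONTAP 7.x syntax):

vol autosize vol1 -m 600g -i 20g on
snap autodelete vol1 on
snap reserve vol1 0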

Continued.....

Tuesday, June 23, 2009

Extreme Networks ESX Port Grouping

By default, a port group (link aggregation group) created on an Extreme switch hashes at layer 2. If you then create a port group or vSwitch in ESX and choose IP hash for load balancing, it won't work properly. You need to change the port grouping on the switch to the L3_L4 algorithm.

The same applies to a Netapp dynamic VIF, as Netapp uses IP-based load balancing as well.
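
On ExtremeXOS I believe the switch-side change looks roughly like this (the ports are made up; verify the syntax against your XOS version):

enable sharing 1 grouping 1-2 algorithm address-based L3_L4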

Tuesday, June 16, 2009

Netapp and ESX Network Setup


Here is a quick snapshot of a setup I have done in the past with ESX and Netapp.

Netapp Volume Creation

I have decided to make a quick reminder of things that I usually do after creating a Netapp volume:

vol autosize (vol) on - This will autogrow the volume once a threshold is reached
snap autodelete (vol) on - This will automatically delete snapshots when a threshold is reached
sis on /vol/(vol) - This turns on single instance storage (de-duplication). You can check the status of SIS by running df -s
Change the security on the NFS export for NFS volumes to allow the IPs of the servers that will use the volume.

Normally this is enough. I will have more information if you are creating a volume that is going to be SnapMirrored.

Netapp SnapMirror with SMVI

If you are using SnapManager for Virtual Infrastructure with SnapMirror, you may have seen this error:

[ERROR - SnapMirror update from source snapshot null to destination location DESTFILER:dest_vol failed]

The reason for this is that I had edited the snapmirror.conf file in order to force SnapMirror to use a specific NIC port.

When you do this, SMVI cannot initiate SnapMirror because it expects the original filer name in snapmirror.conf. Since I changed the name to point to the other interface, you must add the following connection line to snapmirror.conf in order for SMVI to work:

sourcefilername = multi(sourcefilerIP,targetfilerIP)
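
A made-up example (filer names, IPs, and volumes are all hypothetical). The connection name on the first line must match the source filer name SMVI expects, and the mirror entry then references it:

# connection line: use the IPs on the interfaces dedicated to SnapMirror
NETAPP1 = multi(10.1.1.50,10.2.1.50)
NETAPP1:vm_vol DESTFILER:vm_vol_mirror - - - - -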

Wednesday, May 13, 2009

HP DL 380 G6 Memory Configuration


Here is how to populate an HP DL380 G6 with memory. With the new BIOS you can populate 12 out of 18 slots and still run at 1333 MHz, as long as you have the right memory.

Thursday, April 30, 2009

DL 380G6, Netapp, iSCSI, NFS, Network Config



Here is a good ESX config for a DL380-G6 with a Netapp array using NFS and iSCSI for VM placement.

Wednesday, April 29, 2009

HP DL380 G6 using Intel X5550 processors


I recently purchased some servers to host our ESX environment for a client. We went with the X5550 processors. Be careful how you purchase the memory, because the more you buy, the slower it runs.

Basically, the server has 3 memory channels with 1 CPU and 6 channels with 2 CPUs. Each channel has 3 slots, for a total of 18 slots with 2 CPUs. If you populate 1 slot per channel, you can run at 1333 MHz. Put one more DIMM in a channel and you drop to 1066 MHz. One more and you go down to 800 MHz.

The 8 GB DIMMs do not run at 1333 MHz, so basically if you want to run at that speed you can only have 24 GB of memory. Here is the table:

DIMMs per channel    Memory speed
1                    1333 MHz
2                    1066 MHz
3                    800 MHz

Tuesday, April 21, 2009

Convert VMFS VMDK to RDM

vmkfstools -i /vmfs/volumes/data1/W2K3standardgoldenmaster/W2K3standardgoldenmaster.vmdk -d rdm:/vmfs/devices/disks/vmhba1:0:6:0 /vmfs/volumes/data1/rdmvir/rdmvir.vmdk

Vsphere 4

I am in the middle of designing a virtualization solution and I knew this was coming. I think I may have to step back a bit. Check out the release of the new VMware platform, now called vSphere 4.

http://searchservervirtualization.techtarget.com/news/article/0,289142,sid94_gci1354214,00.html

Wednesday, April 1, 2009

Running VMs in NFS

I am currently working on purchasing a Netapp array for a client, along with some VMware. My plan is to run most of the VMs on NFS, for various reasons. First of all, I think it is cool, and that is always most important. Since the VMs are just chilling on a remote file system, you think of them like you should: just files. Create directory structures, say one for email with all your email servers, one for your file servers, etc.

With Netapp de-duplication, which is built into the array, you can save a lot of space when running the VMs on NFS. You can do something similar when running VMFS on a Netapp SAN, but the space doesn't get returned to the file system like it does with NFS.

If you look at the performance numbers, only FC beats it out, and iSCSI is dead even. With Netapp you get all of the NAS features with it, including snapshots, replication, etc.

I will post my results with this client. I have had luck doing this in the past, but never in production. From what I read and hear it works like a charm. Some say it is the best thing to happen in their DCs in a while.

Replicating VMs

When replicating VMs to a remote site:

If the LUNs being presented at the remote site are to different ESX hosts, you can set LVM.DisallowSnapshotLun = 0 and then rescan, which will allow you to see the LUN without resignaturing it. You can then browse to the vmx and register it.

You could also set LVM.EnableResignature = 1, which will resignature the LUN and therefore require the UUID in the vmx to be changed before registering it.
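
Both are host advanced settings you can flip from the service console with esxcfg-advcfg (pick one approach, not both):

esxcfg-advcfg -s 0 /LVM/DisallowSnapshotLun
esxcfg-advcfg -s 1 /LVM/EnableResignature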

Sean

Friday, March 27, 2009

equallogic

It looks like Dell/EqualLogic won the mid-size storage array of the year. I am currently working with a customer to get rid of one. My guess as to why it has such a large install base in this industry is that it is the standard at one of the large IT support firms that do support in my industry.

It does work. It can be fairly slow, and it uses RAID-50. Yes, RAID-50. Each unit can only have 14 disks and dual controllers, and it is pure iSCSI. If you want to add disk, you add a unit and create a group. The group is basically a multitude of units across which a pool of storage is created. Hosts connect to the group IP, and that forwards you to the correct unit to get your data.

It is basically the same thing the big guys are trying to do with storage virtualization. EMC and Netapp have been doing similar stuff on a larger scale. The idea is cool. You pop one of these and add it to the group and it auto-configures its RAID groups, IP info, disks, access, etc. Pretty neat.

It is a slow performer, though, and that is why we are getting rid of it, on top of the fact that the way it grows vertically is something we don't desire. I do suggest reading a bit about RAID-50 though.

Friday, February 20, 2009

vmware navisphere agent install

Here is how to install the Navisphere agent on ESX:

Log into the ESX server via iLO
vi /etc/ssh/sshd_config
Uncomment the PermitRootLogin yes line
Comment out the PermitRootLogin no line
Run service sshd restart
Use winscp to connect to the server via ssh
Copy the navisphere files to /home
Log into the ESX server via ssh
See if the agent is already installed by running rpm -qi naviagent
rpm -ivh naviagent-.rpm
Add user system@(IP address of SPA) to /etc/navisphere/agent.config using vi
Add user system@(IP address of SPB) to /etc/navisphere/agent.config using vi
You can also do the above steps using the navicli by typing: navicli remoteconfig -setconfig -adduser system@SPA IP (Do the same for SPB)
run service naviagent start from the command line
Open up firewall ports by typing the below commands
[root@server home]# esxcfg-firewall -o 6390,tcp,in,naviagent
[root@server home]# esxcfg-firewall -o 6391,tcp,in,naviagent
[root@server home]# esxcfg-firewall -o 6389,tcp,in,naviagent
[root@server home]# esxcfg-firewall -o 6392,tcp,in,naviagent
[root@server home]# esxcfg-firewall -o 443,tcp,out,naviagent
[root@server home]# esxcfg-firewall -o 2163,tcp,out,naviagent
[root@server home]# esxcfg-firewall -o 6389,tcp,out,naviagent
run service firewall restart
run esxcfg-firewall -q to see open ports
run service naviagent restart

Friday, February 13, 2009

VMware round robin load balancing

I have been curious to test round robin load balancing in ESX 3.5. Recently I have been working only with Clariion (active/passive) arrays. Now, at my new client, they have active/active arrays (Symmetrix).

From my understanding, the way it works is that it pushes I/O down one HBA path until a set number of blocks has been sent. At that point it looks to see which HBA has the smaller queue and uses that path until the block count is reached again.

You don't have to set it like this. You can have it use a preferred path, etc., but this option seems to make the most sense.

I don't know if this is how it really works, but that is my understanding. Pretty cool stuff if it works. To this day it is still marked experimental.

Anyone use this?

Wednesday, February 11, 2009

SYSLOG with ESX

To use a syslog server with your ESX environment:

In the /etc/syslog.conf file, add the line "*.* @<IP address>"
service syslog restart
"esxcfg-firewall -o 514,udp,out,syslog" to allow outgoing syslog traffic
"esxcfg-firewall -l" to load the config

Monday, February 9, 2009

Convert RDM to VMFS

To convert a VMware RDM, which is harder to manage than a VMFS-backed disk, you can run the command below. I have needed to do this for various reasons, namely data center migrations. You can convert it back later if needed, which I have also done.

vmkfstools -i /vmfs/volumes/data1/rdmvir/rdmvir.vmdk /vmfs/volumes/data1/W2K3standardgoldenmaster/W2K3standardgoldenmaster.vmdk

VMware queue depth

I know every ESX admin has dealt with storage performance issues at some point. Most detailed research ends up at modifying the HBA queue depth.

Basically, queue depth is the number of outstanding requests between the HBA and the storage. I believe the default is 32. This is normally fine, but with VMware you are sharing the HBA, so if you have a multitude of high-I/O servers, you may have more than 32 requests.

When the queue length is reached, there is basically a SCSI reset and communication is briefly halted. This can cause performance issues if it happens often.

You can use esxtop in disk mode to see your queues. If you decide to change them, you can use esxcfg-module as below (from http://communities.vmware.com/message/790859#790859):

Set the HBA queue depth to 64 on all adapters.

To get the driver name:
vmkload_mod -l | grep qla (QLogic)
vmkload_mod -l | grep lpfcdd (Emulex)

To change it on QLogic:
esxcfg-module -s "ql2xmaxqdepth=nn" (driver name from above)
esxcfg-boot -b or esxcfg-boot -m
reboot

To change it on Emulex:
esxcfg-module -s "lpfc0_lun_queue_depth=nn lpfc1_lun_queue_depth=nn" (driver name from above)
esxcfg-boot -b or esxcfg-boot -m
reboot

Make sure to defragment your VC DB

I have been doing this for years, but wanted to make sure that I shared it. If you see performance issues with Virtual Center, this usually helps.

I would assume everyone does it, but when I talk to people about it, they generally don't.

This is how to defrag the DB:

Log in to Microsoft SQL Server Management Studio as an administrator.
Right-click on the database that VirtualCenter is using.
Click New Query.
In the New Query window type:
Use (DB)
go
dbcc showcontig (VPX_HIST_STAT,VPXII_HIST_STAT)
go
where (DB) represents the name of the database that VirtualCenter is using.
Click Execute.
Look at the amount of fragmentation
To defragment
Log in to Microsoft SQL Server Management Studio as an administrator.
Right-click on the database that VirtualCenter is using.
Click New Query.
In the New Query window type:
dbcc indexdefrag ('', 'VPX_HIST_STAT', 'VPXII_HIST_STAT')
go
where the first parameter represents the name of the database that VirtualCenter is using.
Click Execute.

Friday, January 30, 2009

Replicating Boot Volumes

One key strategic decision that allowed almost immediate failover to a DR site was replicating the boot volumes of tier 1 servers. I personally architected this with VMs, but the same can be done with a Windows server booting from the SAN. I have never personally done much booting from SAN with non-VMs; I tried it back in the Windows 2000 days, but there were issues with the page file.

It makes it easier to manage as well because you don't have to build out a server on the target side, ensure that it is up to the same spec as the source, etc. When you power up the server in the DR site it not only has the replicated data LUNs, whether that be SQL, Exchange, flat files, etc., but you also have an exact copy of the underlying Operating System.

I used RecoverPoint to do the replication. The last couple of versions have had the option to specifically replicate the boot volume, which made it easy. One thing you really have to pay attention to is the need to quiesce and create point-in-time copies of the boot volume, not just replicate it sync or async. Otherwise, there is a probability of losing in-flight transactions, and you will blue screen (which I have done, of course).

RecoverPoint has a command line utility that I scheduled in scheduled tasks, which quiesces the server and creates a point-in-time image that is sent to the remote side. When you failover, always choose one of these images and not the latest I/O. The good thing with RecoverPoint is that you can choose any point in time and then change it if it doesn't work. The problem with the boot volume is that you have to allow direct access to the LUN, which then erases all of the other point-in-time copies, so you have to get it right the first time or you are screwed.

Once you choose the latest clean image on the target, you then do the same for the data LUNs and boot the server. It will come up with the same IP address so you have to have a solution for this. There are a couple ways of doing this.

One way is to use a global load balancer like Cisco or F5 BIG-IP. This provides a front-end IP address that end users use to connect to the back-end servers. If the primary is up, the F5 forwards traffic there; if the DR side is up and the primary isn't, it forwards traffic to the DR site. Never have them both up at the same time. If you do, make sure to set the F5 to use the primary at all times when available.

The other way to handle this is to use a stretched VLAN. There are downsides to this such as increased traffic, including broadcast traffic, that could fill the pipe. It does, however, allow you to boot up a server with the same IP address at a different site and the switches will see the change and forward accordingly.

There is, of course, the option to bring up the server and change the IP address, then change DNS to point to the new IP, but you will have to flush all of the end users' DNS caches. This can be done automatically with third-party products, but I think the first two options are a better fit.

More text mining reference

Categorization: Identifies main themes in a document by placing the document into a pre-defined set of topics. Relies on a thesaurus.

Clustering: Groups documents on the fly instead of into pre-defined categories.

Concept Linking: Links documents based on their common shared concepts. Helps users find information they wouldn't normally find using traditional searching.

Information Visualization: A visual representation of documents or a corpus.

Information Retrieval: Indexing and retrieval of textual documents; finding a set of ranked documents that are relevant to a query.

Tuesday, January 20, 2009

SANPulse

Over the many years of doing data center migrations, tech refreshes, data migrations, tiering, increasing utilization, etc.  I have always imagined a tool that could perform these manual tasks.

When vendors like EMC, HP, IBM, etc. go in and do data migrations, they normally do it manually: send in a bunch of heads and hand-write scripts to do the migration. I have managed similar efforts with and without the storage vendors. They suck.

No situation is the same, of course, but the similarities are never used to your advantage. Every migration is different, so every data migration is written basically from scratch. The vendors don't care, as they are billing per hour for heads.

This year is a big year for consolidations/migrations.  As companies merge, go out of business, cut costs or take advantage of low cost infrastructure, the need to move data around the data center becomes critical.

I have been talking with a couple people from SANPulse technologies.  Some of them actually worked in the same organization that I was a part of which did data/data center migrations.  They created a product which I could only imagine years ago.

You can read their white papers on their website: http://sanpulse.com 

This takes all of the manual headache out of migrating data.  From what I gather it offers the ability to see the environment, make changes, migrate data and report the results up to management.

I don't see how anyone could do a data migration without this in the coming months.  I think they will continue to be successful.  I suggest you read their stuff if you are a migrator.

I am a migrator.

Thursday, January 8, 2009

Primary Storage Optimization

There is now a lot of talk in the industry on Primary Storage Optimization.  I would define Primary Storage Optimization (PSO) as reducing the amount of physical capacity used compared to the amount of actual data in your primary storage.

Techniques used for this include compression, de-duplication, single instancing, etc.  

PSO is tough though.  It is easy to optimize non-primary storage using Storage Capacity Optimization techniques similar to above, but primary storage has different requirements and properties.

First of all you can get more optimization out of backup storage because there is more redundancy.

The PSO solution must run with minimum latency so it doesn't affect the application.  This has been a sticking point.

Who wants to add another point of failure or dynamic to the simple act of reading and writing storage?

There are some solutions out there, mostly software solutions, that can do PSO. To get around the latency, I believe some of them work post-process rather than in real time.

To me it makes sense to do this on the SAN level.  Pop in an SSM blade and de-duplicate the storage on the network level.  I also think this is the best way to virtualize storage.  They can work hand-in-hand.

Tuesday, January 6, 2009

Virtualization Cost Model


Back after the New Year.  Feels good.  I wanted to post my virtualization cost model that I developed when virtualization got out of control at a previous engagement.  See my virtualization crack blog posting for more information on that.

The cost model front end was easy to develop. If you have been doing consulting for many years, I am sure your Excel skills can handle the lists, if statements, links, and data filtering that are on the front end of this model.

The hard part is the back end. There are a lot of equations that I worked on for days to figure out the specifics of the model. Some examples: How do you calculate price per port? How do you calculate the price of cooling? How do you calculate the cost of physical space? Etc.

I had to sit down and figure out these equations and then sit down with business owners, finance and IT to get these base costs to plug into the equations.  

Do we include purchase price of past equipment?  How do you calculate the cost of a GB?  Is it just the cost of the array/# of GB?  What if you add disk?

I came up with some weird ideas, but it worked and was agreed upon in the end.  I also had to work with someone across the world to calculate DR and backup replication costs.

For the network, I just divided the total cost of the equipment by the number of ports to get the cost per port. This isn't the best way to do it, but for the network gear, that is what was agreed on.

Backup costs include de-duplication. How do you figure that out? Actually, that is easy. I just assumed an average de-duplication ratio and came up with an effective size based on that. For example, a 5 TB de-duplicated array really holds 50 TB, so I used that number.

Anyways, after I figured out all of the equations I plugged in the numbers into a worksheet and the front end references those.  There are drop down lists, which contain if statements based on the choice.

Here is a screenshot of the first version of the model. For some reason I can't find the latest version on my USB stick.

Let me know if you have questions.