Software Marketing People Have Too Much Fun

8/27/2008 10:02:00 AM

Tags:

Matt

8/22/2008 5:31:00 PM

I don't know who the hell Matt is, or where the hell Matt is, but damn, I think he's the greatest.

Tags:

General

Volume Shadow Copy Service operation failed, error 0x800423f0

8/17/2008 2:27:00 PM

I run WS08 on my laptop, mostly so I can run Hyper-V, but also because it seems to provide a better Vista experience than Vista does. Part of what I like about WS08 is Windows Server Backup. WSB has some quirks and hassles (e.g. system state backup is just awful), but for the basic "back up your machine to a hard disk" scenarios, it's easy and fast. I generally back up to my BFS[1] over the network, which takes about 20 minutes for a full 90GB.

I've been building virtual machine images for the workshops I'm doing for Pacific IT Pros next week, and after I got the Active Directory Disaster Recovery VMs all put to bed, I thought it would be a good time to back up the machine. I ran WSB and tried to perform a full backup to the BFS, and I received the error "Volume Shadow Copy Service operation failed. Error 0x800423f0. Backup not started." Ick, I was not expecting that.

A little poking around determined that the error was due to the inability of one of the VSS writers to create a consistent snapshot prior to the backup. I ran VSSADMIN to see if it would provide a clue, with the following results:

Microsoft Windows [Version 6.0.6001]
Copyright (c) 2006 Microsoft Corporation.  All rights reserved.

C:\Windows\system32>vssadmin list writers
vssadmin 1.1 - Volume Shadow Copy Service administrative command-line tool
(C) Copyright 2001-2005 Microsoft Corp.

Writer name: 'Microsoft Hyper-V VSS Writer'
   Writer Id: {66841cd4-6ded-4f4b-8f17-fd23f8ddc3de}
   Writer Instance Id: {6162e336-7448-4371-93a7-29581512b103}
   State: [8] Failed
   Last error: Inconsistent shadow copy

Writer name: 'System Writer'
   Writer Id: {e8132975-6f93-4464-a53e-1050253ae220}
   Writer Instance Id: {b1c47cec-0e12-4ac6-b84b-9c1d8292a9f3}
   State: [5] Waiting for completion
   Last error: No error

Writer name: 'Registry Writer'
<... more similar stuff...>

The interesting bit is the Hyper-V entry. It was apparently the VSS writer that could not establish a consistent snapshot. Not coincidentally, I had just finished building a bunch of images with Hyper-V.
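If you have a lot of writers to wade through, something like the following can pick out the unhappy ones. This is only a rough sketch: it assumes the output looks exactly like the listing above (a "Writer name" line followed by indented "State" and "Last error" lines), and the function names parse_vss_writers and failed_writers are my own invention, not part of any Microsoft tool.

```python
import re

# Sample text copied from the vssadmin listing above (GUID lines omitted).
SAMPLE = """\
Writer name: 'Microsoft Hyper-V VSS Writer'
   State: [8] Failed
   Last error: Inconsistent shadow copy

Writer name: 'System Writer'
   State: [5] Waiting for completion
   Last error: No error

Writer name: 'Registry Writer'
"""


def parse_vss_writers(text):
    """Parse `vssadmin list writers` style output into a list of dicts."""
    writers = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        match = re.match(r"Writer name: '(.*)'$", line)
        if match:
            # A new writer block starts here.
            current = {"name": match.group(1)}
            writers.append(current)
            continue
        if current is None:
            continue  # banner lines before the first writer
        match = re.match(r"State: \[(\d+)\] (.*)$", line)
        if match:
            current["state_code"] = int(match.group(1))
            current["state"] = match.group(2)
            continue
        match = re.match(r"Last error: (.*)$", line)
        if match:
            current["last_error"] = match.group(1)
    return writers


def failed_writers(writers):
    """Return writers that reported something other than 'No error'."""
    # Writers with no 'Last error' line at all are treated as not failed.
    return [w for w in writers if w.get("last_error", "No error") != "No error"]


if __name__ == "__main__":
    for writer in failed_writers(parse_vss_writers(SAMPLE)):
        print(writer["name"], "->", writer["last_error"])
```

In practice you would feed it the captured output of vssadmin instead of the SAMPLE string.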

The cause of the problem: I built my VMs using differencing disks (a type of virtual disk that starts with a read-only base image, and writes all modifications to a separate image). To improve performance, I placed the base image on my laptop's internal drive, and the differencing images on an external USB drive. When I started the backups, I had disconnected the external USB drive. The VSS writer for Hyper-V tried to build a consistent snapshot of the differencing drives, and couldn't because the differencing images were offline. Hence the error.

I plugged the USB drive back in, and fired up the backup with no trouble at all.

Note to WSB product team: A little more effort in the error message department wouldn't hurt.

[1] BFS = big freakin' server

Mysterious Authoritative Restore Behavior - Solved!

8/16/2008 1:55:00 PM

Quite a while ago, Brent Harman and I delivered the world-famous Guido and Gil's Masters of Disaster AD Disaster Recovery Workshop in Dallas. Things went pretty smoothly except for one mysterious behavior with authoritative restore. We figured it out after an hour or so, but I never wrote it up until now.

The lab setup included two domains, with two DCs in each domain. The first DC in the root domain was the sole DNS name server. The first DC in each domain was also a GC. Each domain contained OUs with groups and users in them, and there were domain local as well as universal groups containing members from both domains. The domains and the forest were at the WS 2003 functional level.

The lab included two auth restore exercises. Each DC had a current system state backup. The first exercise had the student delete a user from an OU in the child domain, and then use auth restore to recover the user in the child domain, and non-auth restore and LDIFDE to recover the domain local group memberships in the root domain. The second exercise was essentially the same, except the goal was to delete and auth restore an entire OU, including the group memberships of the users in the OU. Pretty basic, if tedious, stuff.

To refresh your memory, the auth restore process in a multi-domain environment looks something like this:

  1. Boot DC in child domain into DSRM
  2. Use NTBACKUP to perform system state (non-auth) restore of DIT
  3. Use NTDSUTIL to perform auth restore of objects to be recovered. This increases the version number on the object's attributes by 10000 for each day between the date of the backup and the date of the restore, which causes the object to replicate out to the other DCs, rather than being overwritten by the tombstoned object from the other DCs. The auth restore process also creates LDIF files containing forward link information (e.g. group memberships) and a text file containing the GUID of the restored object for use in other domains.
  4. Boot the child DC into normal mode.
  5. Use LDIFDE to recreate the group memberships for the restored object in the child domain (not necessary in our case since we were in WS2K3 DFL/FFL)
  6. Boot a DC in the root domain into DSRM
  7. Use NTBACKUP to perform system state (non-auth) restore of DIT
  8. Use NTDSUTIL to create LDIF files for the root domain group memberships of the restored object, using the text file from step 3 as input.
  9. Boot the root DC into normal mode.
  10. Use LDIFDE to recreate the group memberships of the restored object in the root domain

There were no significant problems with the first exercise. Pretty much everyone was able to restore the deleted user, along with the user's group memberships in both domains, with no trouble. The problem showed up in the second exercise, where we recovered the OU.

After restoring the OU (which happened to contain the user they recovered in the first exercise), several students discovered that the user they restored in the first exercise was gone! Even worse, when they tried to auth restore the user again, the user would appear, and then magically disappear again within a few minutes.

After making sure the students were actually doing the auth restore properly, we pondered the problem for a while until we understood what was happening. A local Dallas MSFT fellow who was sitting in on the class figured it out.

Basically, the problem occurs when you auth restore the same object from the same backup more than once in a single day. Let's just look at a single attribute of the object, say distinguishedName. The version number of the attribute is initially 1. This is the value of the version number that is stored in the system state backup as well. After the object is deleted, the version number is set to 2, and the new distinguishedName value replicates out from the originating DC. When you use NTBACKUP to non-authoritatively restore the entire DIT, the version number of the distinguishedName attribute is again 1. If you were to restart the DC in normal mode, the other DCs in the domain would overwrite the distinguishedName attribute because their version number is higher. That's why you have to use NTDSUTIL to perform an authoritative restore of the object before restarting the DC in normal mode. The authoritative restore process increments the version number of all of the object's attributes by 10000 per day since the backup, which results in a version number larger than the corresponding version numbers on the other DCs in the domain. The attributes from the authoritatively restored object replicate out to the DC's replication partners, and overwrite the values there.

When the students went through the same process with the OU, they auth restored the OU subtree, including all the objects contained in the OU. Because the date of the backup and the date of the restore were unchanged, the version numbers were all incremented by 10000 again. The attributes of the OU and its contained objects all replicated out, overwriting the corresponding attributes on the other DCs, except for the attributes of the originally deleted user object. Why? Let's trace through what happened to the originally deleted user object and see.

When the object was originally created, the version number on its attributes was set to 1 (say). When it was deleted, the version numbers were incremented to 2. When the object was non-authoritatively restored from backup, the version numbers were restored to 1. When the object was authoritatively restored, the version numbers went to 10001, and that's what replicated out to the other DCs. When the OU (and the objects contained in it) was deleted for the second exercise, the version numbers went to 10002. The students did the non-authoritative restore from backup, which returned the version numbers to 1. The authoritative restore process then set the version numbers for all the restored objects to 10001. This caused the attributes for all the restored objects to replicate out to the other DCs, except for the attributes of the originally deleted object. Its version numbers on the other DCs were 10002, which is still greater than 10001. So the originally deleted object was restored, but inbound replication overwrote it, even though it had been authoritatively restored!
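The arithmetic is easier to follow when you can step through it, so here is a toy model of the trace above. It is a sketch under some loud assumptions: it tracks a single version number per object, it uses the 10000-per-day figure and the one-day gap from this walkthrough as parameters, it treats "higher version wins" as the whole conflict rule (real AD replication also has timestamp and originating-DSA tiebreakers), and the names Attr, delete_object, and auth_restore are made up for the example.

```python
from dataclasses import dataclass

# Increment figure as quoted in this post; treat it as a parameter, not gospel.
INCREMENT_PER_DAY = 10000


@dataclass
class Attr:
    name: str
    backup_version: int   # version captured in the system state backup
    partner_version: int  # version currently held by the other DCs


def delete_object(attr):
    """A deletion bumps the version on the DCs that still hold the object."""
    attr.partner_version += 1


def auth_restore(attr, days_since_backup=1, increment=INCREMENT_PER_DAY):
    """Restore from backup, bump the version, and replicate.

    Returns (restored_version, survived). The restore only "sticks" if the
    bumped version is strictly higher than what the partner DCs hold.
    """
    restored = attr.backup_version + increment * days_since_backup
    survived = restored > attr.partner_version
    if survived:
        attr.partner_version = restored
    return restored, survived


if __name__ == "__main__":
    user = Attr("user from exercise 1", backup_version=1, partner_version=1)
    other = Attr("another user in the OU", backup_version=1, partner_version=1)

    # Exercise 1: delete and auth restore the single user.
    delete_object(user)
    print("exercise 1:", auth_restore(user))

    # Exercise 2: delete the whole OU, then auth restore the subtree
    # from the same backup on the same day.
    delete_object(user)
    delete_object(other)
    print("exercise 2, other user:", auth_restore(other))
    print("exercise 2, first user:", auth_restore(user))
```

The last call is the one that bites: the restored version ties with what the first restore produced, but the second delete has already pushed the partners one past it.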

There are a couple of ways around this problem. One is to use the VERINC option of NTDSUTIL to increase the version numbers by more than 10000 per day. The other is to perform another backup immediately after the original authoritative restore. Or in the case of this workshop, just delete and restore a different OU :).
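For the VERINC route, the only real question is how big the increment needs to be. A back-of-the-envelope helper, assuming the same simplified "higher version wins" rule and using hypothetical function names of my own (this is arithmetic, not NTDSUTIL syntax):

```python
def min_winning_increment(backup_version, partner_version):
    """Smallest total increase that lifts the backup's version above the partners'."""
    return max(partner_version - backup_version + 1, 0)


def restore_wins(backup_version, partner_version, increment):
    """True if backup_version + increment beats the version on the other DCs."""
    return backup_version + increment > partner_version


if __name__ == "__main__":
    # Numbers from the workshop trace: backup holds 1, other DCs hold 10002.
    print(min_winning_increment(1, 10002))
    print(restore_wins(1, 10002, 10000))
    print(restore_wins(1, 10002, 20000))
```

In other words, any increment comfortably above what the first restore used does the job; there is no need to be precise about it.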

You Had Me At EHLO... : The Autodiscover Song

8/14/2008 9:18:00 AM

The intersection between those who develop software and those who write music is broad, and a source of constant amusement.

You Had Me At EHLO... : The Autodiscover Song

First Ignite Phoenix a Success

8/13/2008 9:48:00 AM

I dropped by the first Ignite Phoenix event last night (it was Kiry's birthday, so no way could I stay for the whole thing). Really interesting motif: Random speakers, random topics, 5 minutes, 20 slides, exactly 15 seconds per slide, go! Apparently Ignite is something that O'Reilly started a while ago, but this is the first time it has appeared in Phoenix. There were about 100 people in the room (at Jobing.com on the 2nd floor of the NetPro building, which was nice), and apparently about 40 more watching the video stream. The two talks I saw were interesting and pretty well done: a perspective on human vs. geologic time scales, and a discussion of the diversity of peanut butter & jelly sandwich styles (I am not making this up). If the remaining talks were as well done as the first two, then I would say I was impressed.

The population was largely 20-to-30-somethings, most with a tech bent as near as I could tell, although the woman sitting next to me was in PR/marketing. The point? I'm not entirely sure... networking, conversation, thought-provocation, hanging with some interesting people (and I guess drinking at the Half Moon afterwards).

Congrats to the Ignite Phoenix organizers for pulling off a fine event. I'll definitely participate in the next one.

Pacific IT Pros Meeting

8/6/2008 2:23:00 PM

I had a chance to speak to the Pacific IT Pros admin group meeting last night about diagnosing AD performance problems. I did the session over Live Meeting and the telephone, so it's difficult to say exactly how it all went. There were a lot of good questions, and I got a nice round of applause over the phone, so I guess it went ok. I did a demo of Server Performance Advisor also, and that went well too. Amazingly, almost 2 years after writing about SPA in Windows IT Pro magazine, only a couple of the 150 people present had ever used SPA.

One fellow asked if SPA would run on Windows 2000. I said yes, but I may be wrong about that, at least according to this KB article. I wonder if the issue is just not having the .NET 2.0 framework installed, or if it's something deeper. SPA relies heavily on Event Tracing for Windows (ETW), and I know W2K AD was an ETW provider, so it should work. I'll have to try it out.

There was another question about SPA providing Exchange performance information. The answer is no, not really. The Exchange store (in EXCH2003 and I presume in EXCH2007) provides ETW counters that you can collect with SPA, and there are a host of Exchange-related perf counters as well. And you can compose a new data collector in SPA easily. The tricky part is creating the report and rules, which requires some XML/XSL wrangling. In theory it should be doable, but in practice it will be hard, as there is nothing but a couple of samples to give you guidance.

If you're an IT Pro-type person in the Northern California area, check out PacITPros. They host their meetings primarily in San Francisco, but they do live 'casts to other locations as well. They're very active, despite the relatively stale website. Check them out at www.pacitpros.org.

Disclaimer

The opinions expressed herein are my own personal opinions and do not represent my employer's view in any way.

© Copyright 2008