[Casper] Timing logins and applicaion startup
Thomas Larkin
tlarki at kckps.org
Fri Feb 13 06:58:49 PST 2009
Replies in bold...
___________________________
Thomas Larkin
TIS Department
KCKPS USD500
tlarki at kckps.org
blackberry: 913-449-7589
office: 913-627-0351
>>> Clinton Blackmore <clinton.blackmore at westwind.ab.ca> 02/12/09 6:37 PM >>>
Replies inline:
On 12-Feb-09, at 9:11 AM, Thomas Larkin wrote:
Sigh. No users experienced this problem when I was nearby today, so I was unable to do that. I did look through the log files on a unit that exhibited problems, and have a long section at the end of this where I have annotated a log. Incidentally, I found that colorizing the log made it less mind-numbing to trawl through. A perl script called loco http://www.zjuul.net/~jules/loco/ ( http://www.zjuul.net/~jules/loco/ ) , when used like "./loco CJHS-iMacLab-22/system.log | less -R" worked nicely. [I wonder if TextWrangler's syntax highlighting could be (ab)used to do this.]
I'm now using the 10.5.6 server admin tools -- does that cause problems with previous versions of the OS (are are problems more likely when using outdated tools?)
Yes, using the wrong server tools versions can cause issues, especially with Work Group Manager, it can cause BSD database corruption, which will hose your LDAP. Ever see a new user generate a negative UID number?
I don't believe so. Indeed, we seldom even touch our servers to upgrade them.
Huh. Well, this is the first year we've used replicas -- previously ever site had its own master and was a universe unto itself.
We had a small problem. In the master image the client is bound to the ODM. I then have casper policies that change bindings via a shell script. Well, for some reason they weren't running and all 6,000 clients were bound to the ODM and they proceeded to bend over my Xserve and throw it to it's knees. Everything ran slow. That is remedied now, the casper policy is running and all client machines get bound to the ODR in their building.
The CPU graph on the server at a school called CJHS -- where, in particular, I was having problems -- is at a constant 75% -- which is about 10 times what I would expect. [I wish I had proper monitoring in place and could go back further than seven days. It was hovering at a constant 60% almost a week ago, and then jumped up to 75% and remained there.] Running top on the machine, I see that AFP has gone off the deep end -- using 599.9% of the available CPU time. Time to reboot that box. [Only one of our other servers was misbehaving in the same way.] I had turned on all the AFP logging features on that machine, and now, when they could be useful, the access log starts at Jan 29 and ends on Feb 5th. It was too verbose, so I have turned off many of the logging features.
how many connections are you seeing on AFP? I assume that all home folders are on AFP? Do you do portable home directories?
Seeing the AFP problem, I've changed my mind about putting up long traces from the current log. This does rather explain why rebooting the open directory master didn't help this particular student much.
I am suspecting that this instance of the user (repeatedly) being unable to log in is attributable to the AFP service on the school's server being too busy. The log has lines like:
Feb 11 13:52:31 CJHS-iMacLab-22 com.apple.loginwindow[1964]: Checking for policies triggered by "login" for user "Nels288"...
Feb 11 13:52:31 CJHS-iMacLab-22 com.apple.loginwindow[1964]: Gathering Policy Information from https://192.168.65.185:8443/...
Feb 11 13:52:31 CJHS-iMacLab-22 com.apple.loginwindow[1964]: The disk you specified could not be found.
Feb 11 13:52:31 CJHS-iMacLab-22 loginwindow[1964]: USER_PROCESS: 1964 console
Feb 11 13:52:33 CJHS-iMacLab-22 loginwindow[1964]: Couldn't create temp file /Network/Servers/cjhs.wwsd.net/Volumes/DataHD/CJHSstudents/CJHS_Grade_07/[full name redacted]/Library/Keychains/~hQVDheOBTW3E955I: Unknown error: 118
Feb 11 13:52:33 CJHS-iMacLab-22 loginwindow[1964]: ERROR | -[Login1 setupEnvironment] | Unable to unlock the keychain, SecKeychainLogin returned 100118
Feb 11 13:52:33 CJHS-iMacLab-22 com.apple.launchd[1] (com.apple.UserEventAgent-LoginWindow[1970]): Exited: Terminated
Feb 11 13:52:33 CJHS-iMacLab-22 com.apple.launchd[2026] (0x103c30.zombie[1972]): Failed to add kevent for PID 1972. Will unload at MIG return
Feb 11 13:52:36 CJHS-iMacLab-22 SecurityAgent[1974]: NSExceptionHandler has recorded the following exception:\nNSUncaughtSystemExceptionException -- Uncaught system exception: signal 11\nStack trace: 0x3721e 0x9183309b 0xffffffff 0x61466 0x6e84d 0x6491d 0x62067 0x907ba95e 0x92e0cb45 0x92e0ccf8 0x962ebda4 0x962ebbbd 0x962eba31 0x91ca6505 0x91ca5db8 0x91c9edf3 0x10fc7 0x202a 0x1
Feb 11 13:52:37 CJHS-iMacLab-22 ReportCrash[2192]: Formulating crash report for process SecurityAgent[1974]
Feb 11 13:52:38 CJHS-iMacLab-22 ReportCrash[2192]: Saved crashreport to /Library/Logs/CrashReporter/SecurityAgent_2009-02-11-135236_CJHS-iMacLab-22.crash using uid: 0 gid: 0, euid: 0 egid: 0
Feb 11 13:52:42 CJHS-iMacLab-22 ARDAgent [2162]: ********ARDAgent Launched********
Feb 11 13:52:42 CJHS-iMacLab-22 blued[46]: [_setUserPreference] syncs returns false
Feb 11 13:52:43 CJHS-iMacLab-22 ARDAgent [2162]: ********ARDAgent Ready********
Feb 11 13:52:43 CJHS-iMacLab-22 blued[46]: [_setUserPreference] syncs returns false
and lots and lots of lines like:
Feb 11 13:53:38 CJHS-iMacLab-22 /System/Library/CoreServices/SystemUIServer.app/Contents/MacOS/SystemUIServer[4176]: FolderManager: Failed looking up user domain root; url='file://localhost/Network/Servers/cjhs.wwsd.net/Volumes/DataHD/CJHSstudents/CJHS_Grade_07/[full name redacted]/' path=/Network/Servers/cjhs.wwsd.net/Volumes/DataHD/CJHSstudents/CJHS_Grade_07/[full name redacted]/ err=-120 uid=7100 euid=7100
Thanks for your time. I will see if I am able to get a proper trace of what is going on, especially if I can attribute it to something other than AFP.
Cheers,
Clinton Blackmore
That last line where it can't look up the home folder path, kind of makes me think, DNS issue. Is your DNS fully resolved both forwards and backwards? In OS X Server the changeip command is actually what is used to check this, and of course set this. I have had my share of small DNS issues and they will always come back to bite your leg off. So, make sure you get your DNS in order. So, you can ssh into your server and run this command
xs106-a:~ root# changeip -checkhostname
Primary address = 10.160.3.30
Current HostName = xs106-a.kckps.org
DNS HostName = xs106-a.kckps.org
The names match. There is nothing to change.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://list.jamfsoftware.com/pipermail/casper/attachments/20090213/756025a1/attachment.htm
More information about the Casper
mailing list