[Casper] Timing logins and applicaion startup
Clinton Blackmore
clinton.blackmore at westwind.ab.ca
Thu Feb 12 16:37:50 PST 2009
Replies inline:
On 12-Feb-09, at 9:11 AM, Thomas Larkin wrote:
> Are there any errors with the log ins? Like if you ssh into a
> client and watch it's system log while a user tries to log in, does
> it produce any errors? If all your servers are 10.5.5 you should be
> in good shape. I did notice vast amounts of improvements when we
> ditched 10.5.3 and 10.5.4 on our servers. 10.5.4 was a pile of dung
> if you ask me. Also, make sure you are using the correct version of
> server tools, as this can also cause issues if you are using
> mismatched versions.
>
> I would suggest you watch a client log in and see what happens by
> ssh into it and watching the systemlog while it tires to log in.
>
Sigh. No users experienced this problem when I was nearby today, so I
was unable to do that. I did look through the log files on a unit
that exhibited problems, and have a long section at the end of this
where I have annotated a log. Incidentally, I found that colorizing
the log made it less mind-numbing to trawl through. A perl script
called loco http://www.zjuul.net/~jules/loco/ , when used like "./loco
CJHS-iMacLab-22/system.log | less -R" worked nicely. [I wonder if
TextWrangler's syntax highlighting could be (ab)used to do this.]
I'm now using the 10.5.6 server admin tools -- does that cause
problems with previous versions of the OS (are are problems more
likely when using outdated tools?)
> Also, have there been any changes to your servers and I assume that
> at one point in time this was all working great?
>
I don't believe so. Indeed, we seldom even touch our servers to
upgrade them.
> When we had our similar problems I got an Apple engineer involved
> and they pretty much told me that OD Masters and Replicas are kind
> of built around the idea of having no more than 1,000 simultaneous
> connections at once.
Huh. Well, this is the first year we've used replicas -- previously
ever site had its own master and was a universe unto itself.
>
> Also, if you do folder syncing you may want to look at your AFP data
> throughput charts in Server Admin and see if they fall way below for
> any reason, then also check out your servers CPU usage history as
> well.
>
The CPU graph on the server at a school called CJHS -- where, in
particular, I was having problems -- is at a constant 75% -- which is
about 10 times what I would expect. [I wish I had proper monitoring
in place and could go back further than seven days. It was hovering
at a constant 60% almost a week ago, and then jumped up to 75% and
remained there.] Running top on the machine, I see that AFP has gone
off the deep end -- using 599.9% of the available CPU time. Time to
reboot that box. [Only one of our other servers was misbehaving in
the same way.] I had turned on all the AFP logging features on that
machine, and now, when they could be useful, the access log starts at
Jan 29 and ends on Feb 5th. It was too verbose, so I have turned off
many of the logging features.
Seeing the AFP problem, I've changed my mind about putting up long
traces from the current log. This does rather explain why rebooting
the open directory master didn't help this particular student much.
I am suspecting that this instance of the user (repeatedly) being
unable to log in is attributable to the AFP service on the school's
server being too busy. The log has lines like:
Feb 11 13:52:31 CJHS-iMacLab-22 com.apple.loginwindow[1964]: Checking
for policies triggered by "login" for user "Nels288"...
Feb 11 13:52:31 CJHS-iMacLab-22 com.apple.loginwindow[1964]: Gathering
Policy Information from https://192.168.65.185:8443/...
Feb 11 13:52:31 CJHS-iMacLab-22 com.apple.loginwindow[1964]: The disk
you specified could not be found.
Feb 11 13:52:31 CJHS-iMacLab-22 loginwindow[1964]: USER_PROCESS: 1964
console
Feb 11 13:52:33 CJHS-iMacLab-22 loginwindow[1964]: Couldn't create
temp file /Network/Servers/cjhs.wwsd.net/Volumes/DataHD/CJHSstudents/
CJHS_Grade_07/[full name redacted]/Library/Keychains/
~hQVDheOBTW3E955I: Unknown error: 118
Feb 11 13:52:33 CJHS-iMacLab-22 loginwindow[1964]: ERROR | -[Login1
setupEnvironment] | Unable to unlock the keychain, SecKeychainLogin
returned 100118
Feb 11 13:52:33 CJHS-iMacLab-22 com.apple.launchd[1]
(com.apple.UserEventAgent-LoginWindow[1970]): Exited: Terminated
Feb 11 13:52:33 CJHS-iMacLab-22 com.apple.launchd[2026]
(0x103c30.zombie[1972]): Failed to add kevent for PID 1972. Will
unload at MIG return
Feb 11 13:52:36 CJHS-iMacLab-22 SecurityAgent[1974]:
NSExceptionHandler has recorded the following exception:
\nNSUncaughtSystemExceptionException -- Uncaught system exception:
signal 11\nStack trace: 0x3721e 0x9183309b 0xffffffff 0x61466
0x6e84d 0x6491d 0x62067 0x907ba95e 0x92e0cb45 0x92e0ccf8
0x962ebda4 0x962ebbbd 0x962eba31 0x91ca6505 0x91ca5db8
0x91c9edf3 0x10fc7 0x202a 0x1
Feb 11 13:52:37 CJHS-iMacLab-22 ReportCrash[2192]: Formulating crash
report for process SecurityAgent[1974]
Feb 11 13:52:38 CJHS-iMacLab-22 ReportCrash[2192]: Saved crashreport
to /Library/Logs/CrashReporter/SecurityAgent_2009-02-11-135236_CJHS-
iMacLab-22.crash using uid: 0 gid: 0, euid: 0 egid: 0
Feb 11 13:52:42 CJHS-iMacLab-22 ARDAgent [2162]: ********ARDAgent
Launched********
Feb 11 13:52:42 CJHS-iMacLab-22 blued[46]: [_setUserPreference] syncs
returns false
Feb 11 13:52:43 CJHS-iMacLab-22 ARDAgent [2162]: ********ARDAgent
Ready********
Feb 11 13:52:43 CJHS-iMacLab-22 blued[46]: [_setUserPreference] syncs
returns false
and lots and lots of lines like:
Feb 11 13:53:38 CJHS-iMacLab-22 /System/Library/CoreServices/
SystemUIServer.app/Contents/MacOS/SystemUIServer[4176]: FolderManager:
Failed looking up user domain root; url='file://localhost/Network/Servers/cjhs.wwsd.net/Volumes/DataHD/CJHSstudents/CJHS_Grade_07/
[full name redacted]/' path=/Network/Servers/cjhs.wwsd.net/Volumes/
DataHD/CJHSstudents/CJHS_Grade_07/[full name redacted]/ err=-120
uid=7100 euid=7100
Thanks for your time. I will see if I am able to get a proper trace
of what is going on, especially if I can attribute it to something
other than AFP.
Cheers,
Clinton Blackmore
This email has been scanned by Barracuda Network's Anti-Virus and Spam Firewall.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://list.jamfsoftware.com/pipermail/casper/attachments/20090212/ce48a117/attachment.htm
More information about the Casper
mailing list