[Casper] Timing logins and applicaion startup

Clinton Blackmore clinton.blackmore at westwind.ab.ca
Thu Feb 12 16:37:50 PST 2009


Replies inline:

On 12-Feb-09, at 9:11 AM, Thomas Larkin wrote:

> Are there any errors with the log ins?  Like if you ssh into a  
> client and watch it's system log while a user tries to log in, does  
> it produce any errors?  If all your servers are 10.5.5 you should be  
> in good shape.  I did notice vast amounts of improvements when we  
> ditched 10.5.3 and 10.5.4 on our servers.  10.5.4 was a pile of dung  
> if you ask me.  Also, make sure you are using the correct version of  
> server tools, as this can also cause issues if you are using  
> mismatched versions.
>
> I would suggest you watch a client log in and see what happens by  
> ssh into it and watching the systemlog while it tires to log in.
>

Sigh.  No users experienced this problem when I was nearby today, so I  
was unable to do that.  I did look through the log files on a unit  
that exhibited problems, and have a long section at the end of this  
where I have annotated a log.  Incidentally, I found that colorizing  
the log made it less mind-numbing to trawl through.  A perl script  
called loco http://www.zjuul.net/~jules/loco/ , when used like "./loco  
CJHS-iMacLab-22/system.log | less -R" worked nicely. [I wonder if  
TextWrangler's syntax highlighting could be (ab)used to do this.]

I'm now using the 10.5.6 server admin tools -- does that cause  
problems with previous versions of the OS (are are problems more  
likely when using outdated tools?)


> Also, have there been any changes to your servers and I assume that  
> at one point in time this was all working great?
>

I don't believe so.  Indeed, we seldom even touch our servers to  
upgrade them.

> When we had our similar problems I got an Apple engineer involved  
> and they pretty much told me that OD Masters and Replicas are kind  
> of built around the idea of having no more than 1,000 simultaneous  
> connections at once.

Huh.  Well, this is the first year we've used replicas -- previously  
ever site had its own master and was a universe unto itself.
>
> Also, if you do folder syncing you may want to look at your AFP data  
> throughput charts in Server Admin and see if they fall way below for  
> any reason, then also check out your servers CPU usage history as  
> well.
>

The CPU graph on the server at a school called CJHS -- where, in  
particular, I was having problems -- is at a constant 75% -- which is  
about 10 times what I would expect.  [I wish I had proper monitoring  
in place and could go back further than seven days.  It was hovering  
at a constant 60% almost a week ago, and then jumped up to 75% and  
remained there.]  Running top on the machine, I see that AFP has gone  
off the deep end -- using 599.9% of the available CPU time.  Time to  
reboot that box.  [Only one of our other servers was misbehaving in  
the same way.]  I had turned on all the AFP logging features on that  
machine, and now, when they could be useful, the access log starts at  
Jan 29 and ends on Feb 5th.  It was too verbose, so I have turned off  
many of the logging features.


Seeing the AFP problem, I've changed my mind about putting up long  
traces from the current log.  This does rather explain why rebooting  
the open directory master didn't help this particular student much.

I am suspecting that this instance of the user (repeatedly) being  
unable to log in is attributable to the AFP service on the school's  
server being too busy.  The log has lines like:


Feb 11 13:52:31 CJHS-iMacLab-22 com.apple.loginwindow[1964]: Checking  
for policies triggered by "login" for user "Nels288"...
Feb 11 13:52:31 CJHS-iMacLab-22 com.apple.loginwindow[1964]: Gathering  
Policy Information from https://192.168.65.185:8443/...
Feb 11 13:52:31 CJHS-iMacLab-22 com.apple.loginwindow[1964]: The disk  
you specified could not be found.
Feb 11 13:52:31 CJHS-iMacLab-22 loginwindow[1964]: USER_PROCESS: 1964  
console
Feb 11 13:52:33 CJHS-iMacLab-22 loginwindow[1964]: Couldn't create  
temp file /Network/Servers/cjhs.wwsd.net/Volumes/DataHD/CJHSstudents/ 
CJHS_Grade_07/[full name redacted]/Library/Keychains/ 
~hQVDheOBTW3E955I: Unknown error: 118
Feb 11 13:52:33 CJHS-iMacLab-22 loginwindow[1964]: ERROR | -[Login1  
setupEnvironment] | Unable to unlock the keychain, SecKeychainLogin  
returned 100118
Feb 11 13:52:33 CJHS-iMacLab-22 com.apple.launchd[1]  
(com.apple.UserEventAgent-LoginWindow[1970]): Exited: Terminated
Feb 11 13:52:33 CJHS-iMacLab-22 com.apple.launchd[2026]  
(0x103c30.zombie[1972]): Failed to add kevent for PID 1972. Will  
unload at MIG return
Feb 11 13:52:36 CJHS-iMacLab-22 SecurityAgent[1974]:  
NSExceptionHandler has recorded the following exception: 
\nNSUncaughtSystemExceptionException -- Uncaught system exception:  
signal 11\nStack trace:  0x3721e  0x9183309b  0xffffffff  0x61466   
0x6e84d  0x6491d  0x62067  0x907ba95e  0x92e0cb45  0x92e0ccf8   
0x962ebda4  0x962ebbbd  0x962eba31  0x91ca6505  0x91ca5db8   
0x91c9edf3  0x10fc7  0x202a  0x1
Feb 11 13:52:37 CJHS-iMacLab-22 ReportCrash[2192]: Formulating crash  
report for process SecurityAgent[1974]
Feb 11 13:52:38 CJHS-iMacLab-22 ReportCrash[2192]: Saved crashreport  
to /Library/Logs/CrashReporter/SecurityAgent_2009-02-11-135236_CJHS- 
iMacLab-22.crash using uid: 0 gid: 0, euid: 0 egid: 0
Feb 11 13:52:42 CJHS-iMacLab-22 ARDAgent [2162]: ********ARDAgent  
Launched********
Feb 11 13:52:42 CJHS-iMacLab-22 blued[46]: [_setUserPreference] syncs  
returns false
Feb 11 13:52:43 CJHS-iMacLab-22 ARDAgent [2162]: ********ARDAgent  
Ready********
Feb 11 13:52:43 CJHS-iMacLab-22 blued[46]: [_setUserPreference] syncs  
returns false

and lots and lots of lines like:

Feb 11 13:53:38 CJHS-iMacLab-22 /System/Library/CoreServices/ 
SystemUIServer.app/Contents/MacOS/SystemUIServer[4176]: FolderManager:  
Failed looking up user domain root; url='file://localhost/Network/Servers/cjhs.wwsd.net/Volumes/DataHD/CJHSstudents/CJHS_Grade_07/ 
[full name redacted]/' path=/Network/Servers/cjhs.wwsd.net/Volumes/ 
DataHD/CJHSstudents/CJHS_Grade_07/[full name redacted]/ err=-120  
uid=7100 euid=7100



Thanks for your time.  I will see if I am able to get a proper trace  
of what is going on, especially if I can attribute it to something  
other than AFP.

Cheers,
Clinton Blackmore






This email has been scanned by Barracuda Network's Anti-Virus and Spam Firewall.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://list.jamfsoftware.com/pipermail/casper/attachments/20090212/ce48a117/attachment.htm 


More information about the Casper mailing list