
Dev Chat 06/23/2011

Created on: Aug 3, 2011 9:55 AM by Nick Yeates - Last Modified:  Aug 3, 2011 10:45 AM by Nick Yeates

Service Dynamics explained, where Core is going, daemon stopping, programmatically deleting roles, Avalon MySQL details

 

[23-Jun-2011 10:52:59] <nyeates> Hi all
[23-Jun-2011 10:53:05] <Hackman238> nyeates: hey
[23-Jun-2011 10:53:12] <nyeates> dev meeting in 8 mins
[23-Jun-2011 10:53:15] <rocket> run its nick!
[23-Jun-2011 10:53:17] <nyeates> bbiab
[23-Jun-2011 10:53:29] <nyeates> agggh, the two headed community mgr!
[23-Jun-2011 10:53:31] <nyeates> :-)
[23-Jun-2011 10:54:07] <rmatte> 3 headed is cooler, like cerberus or hydra
[23-Jun-2011 10:54:09] <Hackman238> dhopp: As funny as this sounds my massive Avalon test has 69 zenoss daemons excluding zenhub children
[23-Jun-2011 10:54:09] <rmatte>
[23-Jun-2011 10:54:28] <Hackman238> dhopp: And its a 8 core box
[23-Jun-2011 10:54:43] <rmatte> then again hydra technically had more than that
[23-Jun-2011 10:55:06] <dhopp> Hackman238:  funny because it's that many, or funny because it's '69'...
[23-Jun-2011 10:55:10] <dhopp> sorry couldn't resist
[23-Jun-2011 10:55:13] <rmatte> lol
[23-Jun-2011 10:55:21] <Hackman238> dhopp: Both. LOL.
[23-Jun-2011 10:55:41] <Hackman238> dhopp: Have almost 3000 devices monitoring on this single box using 3 instances of zenperfsnmp
[23-Jun-2011 10:56:01] <dhopp> Hackman238:  poll time?
[23-Jun-2011 10:56:06] <dhopp> Hackman238:  how many data points?
[23-Jun-2011 10:56:07] <Hackman238> dhopp: Has only a raid 1 for os, raid 10 for zenoss and 32GB ram
[23-Jun-2011 10:56:18] <Hackman238> 5 minutes- all stock datapoints
[23-Jun-2011 10:57:00] <dhopp> Hackman238:  I'm confused…how do you have that many daemons then?
[23-Jun-2011 10:57:29] <rocket> dhopp: multiple enterprise collectors on the same box
[23-Jun-2011 10:57:30] <Hackman238> dhopp: Avalon has more daemons per instance
[23-Jun-2011 10:57:37] <dhopp> ahh
[23-Jun-2011 10:57:43] <rocket> and there are more daemons etc ..
[23-Jun-2011 10:57:45] <Hackman238> dhopp: And 3 instances on this box
[23-Jun-2011 10:57:53] <dhopp> Hackman238:  gotcha
[23-Jun-2011 10:58:15] <davetoo> how many datapoints, what kind of physical disk?
[23-Jun-2011 10:58:15] <rocket> those daemons perform a heck of a lot better now though ..
[23-Jun-2011 10:58:40] <rocket> Hackman238: werent you saying you were able to consolidate 5 collectors down to 1
[23-Jun-2011 10:58:42] <davetoo> (oh, sorry, still waking up, I see somebody already asked the first question)
[23-Jun-2011 10:59:07] <Hackman238> dhopp: 2142K datapoints
[23-Jun-2011 10:59:28] <Hackman238> rocket: Physical I meant
[23-Jun-2011 10:59:57] <davetoo> 2.1 million datapoints?
[23-Jun-2011 11:00:00] <Hackman238> load average: 2.59, 3.66, 4.26
[23-Jun-2011 11:00:20] <rocket> Hackman238: yes I meant physical servers as well ..
[23-Jun-2011 11:00:44] <Hackman238> rocket: oh. yeah this is all on one box
[23-Jun-2011 11:01:09] <Hackman238> rocket: added two extra local collectors to breakup load so that config times werent crazy and the cycle times were reasonable
[23-Jun-2011 11:01:32] <dhopp> Hackman238:  davetoo has a good question…did you mean 2142 data points..or 2142*1000 data points?
[23-Jun-2011 11:01:35] <davetoo> Where are your mysql table files, and what kind of mysql tuning have you done?
[23-Jun-2011 11:01:49] <Hackman238> yes, 2.1 mil approx
[23-Jun-2011 11:02:02] <Hackman238> About 700 dp per device, most are large switches
[23-Jun-2011 11:02:05] <davetoo> Hackman238: so, a lot of datacenter switches?
[23-Jun-2011 11:02:06] <davetoo> heh
[23-Jun-2011 11:02:18] <Hackman238> davetoo: yep
[23-Jun-2011 11:02:24] <dhopp> Hackman238:  makes sense..now that I think about it..2142 datapoints for 3000 devices would be pretty much impossible math
[23-Jun-2011 11:02:24] <dhopp> heh
[23-Jun-2011 11:02:32] <davetoo> are you keeping the interface error rrds?
[23-Jun-2011 11:02:45] <Hackman238> davetoo: Just the stock tests right now
[23-Jun-2011 11:02:50] zenphil_ is now known as zenphil
[23-Jun-2011 11:03:22] <davetoo> I rarely found it useful to add error-rate datapoints, at least on a permanent basis
[23-Jun-2011 11:03:50] <Hackman238> davetoo: We generally collect them, yes
[23-Jun-2011 11:04:46] <nyeates> Hi all! Welcome to this weeks dev meet
[23-Jun-2011 11:05:08] davetoo is now known as dcarmean
[23-Jun-2011 11:05:41] <nyeates> To start things off, I know I have a line up of general non-dev questions that have come from Jane. Jane are you around?
[23-Jun-2011 11:06:25] <themactech> answer her question anyway, she will get the meat of it in the chat session transcripts
[23-Jun-2011 11:06:32] <rmatte> Perhaps you could answer the questions anyways and we can relay them back to her?
[23-Jun-2011 11:06:40] <nyeates> yup
[23-Jun-2011 11:07:13] <nyeates> So yesterday Zenoss Inc released Zenoss Service Dynamics and a series of new features.
[23-Jun-2011 11:07:25] <nyeates> Her questions revolved around what is this and how does it relate to core.
[23-Jun-2011 11:07:35] <nyeates> You can see details at our .com site zenoss.com
[23-Jun-2011 11:07:38] <rmatte> I assume that's the new impact manager
[23-Jun-2011 11:07:41] <rmatte> and it's enterprise only
[23-Jun-2011 11:07:42] <rmatte> correct?
[23-Jun-2011 11:09:13] <nyeates> Generally it is our enterprise offering - avalon was our code name and service dynamics is the go to market name and includes impact (service-based monitoring) and analytics (new reporting) among other improvements
[23-Jun-2011 11:09:37] <dhopp> Avalon is a cooler name…just saying
[23-Jun-2011 11:09:51] <nyeates> The product formerly known as Zenoss Enterprise is encapsulated in the resource management/monitoring component of ZSD.
[23-Jun-2011 11:09:58] <Hackman238> dhopp: Agreed- service dynamics sounds bloated
[23-Jun-2011 11:10:08] <dcarmean> PFKA
[23-Jun-2011 11:10:10] <nyeates> How does core relate...
[23-Jun-2011 11:10:53] <nyeates> There will be a core release sometime after the enterprise release - date TBD. The community release will provide resource management/monitoring only.
[23-Jun-2011 11:11:24] <Hackman238> nyeates: AKA no impact?
[23-Jun-2011 11:11:36] <nyeates> not with core, no
[23-Jun-2011 11:11:40] <rmatte> So from my point of view, this means that the focus on event correlation and such is now on an enterprise only tool.  Does this mean that basic correlation features for Zenoss Core are never going to be developed?  Even something as simple as being able to define manual dependencies (this device depends on this device and so on), would be a huge improvement.
[23-Jun-2011 11:12:20] <nyeates> we are feeling out other parts to be included in core - and I have a good feeling about it - but i cant give any details yet. The details are coming likely tomorrow or beginning of next week in a blog post from Bill
[23-Jun-2011 11:12:37] <rmatte> k
[23-Jun-2011 11:12:41] <Hackman238> nyeates: I dont know why zenoss excludes major functionality
[23-Jun-2011 11:12:59] <Hackman238> nyeates: Isnt it easy enough to sell service? Zenoss is complicated as hell!
[23-Jun-2011 11:13:15] <Jane_Curry> hi guys and gals
[23-Jun-2011 11:13:24] <Hackman238> Jane_Curry: You're late!
[23-Jun-2011 11:13:29] <Hackman238> Jane_Curry: 
[23-Jun-2011 11:13:46] <Jane_Curry> yup - still hacking roles
[23-Jun-2011 11:14:08] <nyeates> On a separate note, we also recognize that our "feedback channel" has not been nurtured as much as it should be lately. You can expect to see more requests for feedback and community involvement in the near future.
[23-Jun-2011 11:14:21] <Hackman238> Jane_Curry: Live dangerous-all users now zenmanagers!
[23-Jun-2011 11:14:51] <rmatte> The way it looks to be is that more and more core functionality is being shifted to other enterprise only products.  As an example, I get a feeling that Zenoss' mindset is "Well, there's no point in further improving the (somewhat broken) reports packaged with core, because we now have a new reporting solution." or... "Why would we focus on developing any type of event correlation system in Core when we have a brand new shiny product that does it?"
[23-Jun-2011 11:15:25] <nyeates> Again, a large directive announcement for core directions and strategy will come out this week or next and I think that everything is going in the right direction. More resources will be focused on core.
[23-Jun-2011 11:15:44] <Hackman238> rmatte: Its a complex engine, but I doubt it'll take long for someone to write a core equal
[23-Jun-2011 11:15:57] <nyeates> ok done, any additional questions, id like to save for after or as side convos
[23-Jun-2011 11:16:04] * rmatte nods
[23-Jun-2011 11:16:08] <rmatte> k
[23-Jun-2011 11:16:13] <nyeates> unless you all really want to squabble amongst yerselves :-)
[23-Jun-2011 11:16:14] <Jane_Curry> rmatte +1
[23-Jun-2011 11:16:23] <Hackman238> nyeates: I have one
[23-Jun-2011 11:16:40] pmcguire is now known as ptmcg
[23-Jun-2011 11:16:59] <Hackman238> nyeates: Can we please get the daemon restart functionality to actually wait until the daemon stops rather than just waiting a stock 10 seconds?
[23-Jun-2011 11:17:39] <Hackman238> nyeates: Its super annoying when I have size zenperfsnmp's that I need to restart and I cant just issue a restart
[23-Jun-2011 11:17:54] <Hackman238> *s/size/six/
[23-Jun-2011 11:18:10] <rmatte> Hackman238: haha, I don't even use restart anymore, I do zenoss stop, zenoss status (to make sure they all actually stopped) then zenoss start
[23-Jun-2011 11:18:39] <Hackman238> rmatte: We cant do that, starting an entire collector all at one would blow up the NOC
[23-Jun-2011 11:18:45] * rmatte nods
[23-Jun-2011 11:18:58] <rmatte> well, you can always just do it with the one daemon
[23-Jun-2011 11:18:59] <themactech> Maybe you can answer a question for me Jane, since you are playing with roles now.  I want my clients to have bare ZenUser privileges because I don't want them to be able to mess stuff up, but I would like them to be able to make maintenance windows so I don't get waken up at 3 am when they decide to do maintenance without telling me, is that possible?
[23-Jun-2011 11:19:01] <rmatte> zenperfsnmp stop
[23-Jun-2011 11:19:02] <nyeates> hackman: Much agreed. Lets make it into a feature request and see if we cant hack something up ourselves or push it through here. It would be nice to know if daemons ACTUALLY stop
[23-Jun-2011 11:19:05] <rmatte> zenperfsnmp status
[23-Jun-2011 11:19:09] <rmatte> etc...
[23-Jun-2011 11:19:23] <Hackman238> rmatte: Hubs too....I dont want to have to issue stop and keep checking to start it back up
[23-Jun-2011 11:19:31] <rmatte> yeh
[23-Jun-2011 11:19:34] <Hackman238> Zen1.dfw1 has 9 hubs alone
[23-Jun-2011 11:20:14] <Hackman238> Its crazy to have to wait for each. Sometimes I just wanna toss a kill -9  at them since I dont have time to baby sit the process to restart it
[23-Jun-2011 11:20:34] <Hackman238> Other than that...serious WIN on avalon
[23-Jun-2011 11:20:34] <rmatte> lol
[23-Jun-2011 11:20:46] <zenphil> The daemon restart commands now wait only as long as needed.
[23-Jun-2011 11:20:55] <dhopp> Hackman238:  I think you mean service dymanics :-P
[23-Jun-2011 11:21:12] <Hackman238> zenphil: Then thats a b6 bug- because they dont
[23-Jun-2011 11:21:29] <rmatte> dhopp: that's what happens when you let sales guys name a product lol
[23-Jun-2011 11:21:43] <Hackman238> dhopp: right right
[23-Jun-2011 11:21:56] <Hackman238> rmatte: Hum...theres not enough BS in the word Avalon
[23-Jun-2011 11:22:03] <rmatte> yeh, exactly
[23-Jun-2011 11:22:13] <rmatte> they do the same things here with our services, drives me nuts
[23-Jun-2011 11:22:14] <Hackman238> rmatte: ...let me just get out my sales BS thesaurus here...
[23-Jun-2011 11:22:15] <themactech> wait until the marketing folks pitch in
[23-Jun-2011 11:22:21] <rmatte> I'm a firm believer than simple is better
[23-Jun-2011 11:22:25] <rmatte> that*
[23-Jun-2011 11:24:13] <nyeates> I follow you guys on the naming - apparently from what ive learned, we - tech folk, speak and enjoy a different language and naming pattern than CXOs at big enterprises
[23-Jun-2011 11:24:57] <Hackman238> nyeates: Nice polite way of agreeing Nick XD
[23-Jun-2011 11:24:57] <rmatte> lol
[23-Jun-2011 11:25:35] <themactech> that's true, key words unlock budgets too, it's strange.  We were talking up our SNMP monitoring to a client and he wasn't biting.  Then we mentioned Disaster Recovery as part of it, that was it, done deal.  He had a budget for DR
[23-Jun-2011 11:25:37] <rmatte> "This next release of Zenoss is so amazing that we couldn't actually find a word in the english dictionary to describe it, so we made one up... Zenoss Strumbolicious (tm)"
[23-Jun-2011 11:25:48] <nyeates> Any devs know what effort it would take to implement daemon service restarts and stops to be atomic? aka, give effective feedback on if the service was actually stopped?
[23-Jun-2011 11:26:19] <Jane_Curry> themactech: The ZenUser role has Maintenance windows View role but not MaintenanceWindows Edit - is this the extra permission that you want to grant??
[23-Jun-2011 11:27:26] <themactech> Yes, though I admit I haven't really looked into it, was asking in case you had an easy answer
[23-Jun-2011 11:27:35] <Hackman238> Bah- all management should be 75% as qualified in the areas of the people they oversee.
[23-Jun-2011 11:27:55] <Hackman238> Anything less leads to such nonsense as "I need pretty graphs"
[23-Jun-2011 11:28:03] <rmatte> Hackman238: in a perfect world
[23-Jun-2011 11:28:04] <themactech> I have had 120 alerts come up because a client decided to power cycle in tech core in middle of night without telling us first
[23-Jun-2011 11:28:10] <rocket> Hackman238: bleh .. I better start finding smarter guys if I want to be a manager?
[23-Jun-2011 11:28:14] <rmatte> Hackman238: In reality it's closer to 0%
[23-Jun-2011 11:28:14] <Hackman238> And "Service dynamics" sounds better than "Avalon"
[23-Jun-2011 11:28:36] <rmatte> You know what sounds better than Service Dynamics?  Enterprise
[23-Jun-2011 11:28:52] <Hackman238> rmatte: Im fortunate here at Rackspace everyone up to our director actually knows what I mean when I talk about python, GIL, etc.
[23-Jun-2011 11:28:53] <rmatte> lol
[23-Jun-2011 11:29:01] <rmatte> nice
[23-Jun-2011 11:29:02] <Hackman238> rmatte: Agreed!
[23-Jun-2011 11:29:05] <themactech> At least Enterprise gives us nerd Trekkies the warm and fuzzies
[23-Jun-2011 11:29:33] <Hackman238> rocket: Well theres a limit LOL
[23-Jun-2011 11:29:42] <mrchippy> rmatte: there were some great internal suggestions such as The Cromulent Embiggenator, Cloudy with a Chance of Awesome, Cat Herder and ServiceWax
[23-Jun-2011 11:30:10] <Hackman238> mrchippy: XD
[23-Jun-2011 11:30:14] <Jane_Curry> I vote for The Cromulent Embiggenator
[23-Jun-2011 11:30:41] <dhopp> wow…I didn't mean to start a coup
[23-Jun-2011 11:30:43] <rmatte> When I hear something like "Service Dynamics", I think of a used car salesman trying to sell me something, lol
[23-Jun-2011 11:30:51] <rocket> mrchippy: I thought Cat Herder won ..
[23-Jun-2011 11:31:00] <rocket>
[23-Jun-2011 11:31:03] <rmatte> cat herder, lol
[23-Jun-2011 11:31:43] <dhopp> However, I haven't seen an answer to: nyeates
[23-Jun-2011 11:31:43] <dhopp> 10:25
[23-Jun-2011 11:31:43] <dhopp> Any devs know what effort it would take to implement daemon service restarts and stops to be atomic? aka, give effective feedback on if the service was actually stopped?
[23-Jun-2011 11:31:48] <Hackman238> rmatte: Agreed. Or like a as seen on TV guy pitching something
[23-Jun-2011 11:32:00] <Jane_Curry> themactech:  the work I am currently doing on roles includes creating a new role on ZenPack install but if you just want to augment the existing ZenUser role
[23-Jun-2011 11:32:02] <dhopp> Hackman238:  buy wait there's MORE!
[23-Jun-2011 11:32:06] <dhopp> *but
[23-Jun-2011 11:32:17] <rocket> dhopp: it would take monitoring the pid is gone etc ..
[23-Jun-2011 11:32:33] <Hackman238> nyeates: Thats what you need- you need a Billy Mayes. Whos as good as him who is still alive?
[23-Jun-2011 11:32:47] <rocket> dhopp: it probably isnt significant work to do .. it just takes time to fit into the schedule and prioritize etc ..
[23-Jun-2011 11:32:52] <Hackman238> nyeates: Ron Ronco?
[23-Jun-2011 11:32:54] <Jane_Curry> then goto http://<your zenoss>:8080/zport/manage_access
[23-Jun-2011 11:33:26] <rmatte> words that drive me nuts in product names: "Plus", "Dynamic", "Guarantee", "Expert", "Super", "Digital", and any other catch words
[23-Jun-2011 11:33:30] <themactech> I will give that a shot when I am back in the lab, i'm out in the field today so I can't test this now, thanks for the info.
[23-Jun-2011 11:33:31] <Hackman238> Aw....fail. Zenhub just exploded in Avalon
[23-Jun-2011 11:33:31] <Jane_Curry> scroll down the permissions list and add a tick alongside MaintenanceWindows Edit in the ZenUser role and then Save
[23-Jun-2011 11:33:39] <Jane_Curry> I think that should do it
[23-Jun-2011 11:33:40] <zenphil> For the restart command, it's been improved to where it will wait for the daemon to stop (up to 10 seconds) then start it up  again.
[23-Jun-2011 11:33:57] <Jane_Curry> Does anyone know whether this is going to cause havoc???????
[23-Jun-2011 11:34:01] <rmatte> Hackman238: I've been hearing about your Avalon exploding daily and it's making me nervous about that lol
[23-Jun-2011 11:34:20] <Hackman238> rmatte: I'm intentionally abusing the product to baseline
[23-Jun-2011 11:34:25] <rmatte> ah
[23-Jun-2011 11:34:36] <Hackman238> rmatte: Takes a lot of abuse compared to v2.5 or 3
[23-Jun-2011 11:34:45] <rocket> rmatte: I am sure there is no tuning or tweaking either
[23-Jun-2011 11:34:51] <rmatte> cool
[23-Jun-2011 11:34:55] <Hackman238> Oops...my bad.
[23-Jun-2011 11:34:58] <Hackman238> Out of RAM
[23-Jun-2011 11:34:59] <Hackman238> LOL
[23-Jun-2011 11:35:10] <rmatte> lol
[23-Jun-2011 11:35:19] <dhopp> zenphil:  wait…it's been improved to wait only 10 seconds?
[23-Jun-2011 11:35:24] <rocket> rmatte: we typically tune the large installs like Rackspace quite a bit differently than most small shops
[23-Jun-2011 11:35:43] <dhopp> Hackman238:  OOM Killer?
[23-Jun-2011 11:35:44] <rocket> typically we set it up out of the box for the smaller shops
[23-Jun-2011 11:35:48] <rmatte> rocket: well, just the sheer volume is pretty crazy
[23-Jun-2011 11:35:54] <Hackman238> rmatte: Rockets right. Usually we need high cache settings and all sorts of magery to make things run smooth
[23-Jun-2011 11:36:01] <rmatte> hehe
[23-Jun-2011 11:36:18] <Hackman238> Shit my drac locked up too
[23-Jun-2011 11:36:21] <rocket> if its running at all at Rackspace without tuning thats impressive in itself ..
[23-Jun-2011 11:36:23] <Hackman238> Grr.
[23-Jun-2011 11:36:34] <Hackman238> rocket: Agree.
[23-Jun-2011 11:36:35] <rocket> and you are using older hardware too right Hackman?
[23-Jun-2011 11:36:48] <nyeates> dhopp: agreed, 10 seconds doesnt seem like it will help the situation - would be better if after 10 sec it said "service process did not shutdown"
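The behaviour being asked for here (watch for the pid to go away, and say so if it has not gone by the deadline) could be sketched roughly as below. This is only an illustration under stated assumptions, not the code Zenoss ships: `pid_is_running` and `wait_for_stop` are hypothetical names, the 10-second default merely mirrors the timeout mentioned in the chat, and the existence check assumes a POSIX system.

```python
import errno
import os
import time


def pid_is_running(pid):
    """Return True if a process with this pid exists (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0 delivers nothing; it only checks existence
    except OSError as exc:
        # EPERM: the process exists but is owned by another user.
        return exc.errno == errno.EPERM
    return True


def wait_for_stop(is_running, timeout=10.0, poll_interval=0.5):
    """Poll is_running() until it returns False or the timeout expires.

    Returns True if the daemon stopped in time and False otherwise, so the
    caller can report the failure instead of blindly starting a new copy.
    """
    deadline = time.monotonic() + timeout
    while is_running():
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)
    return True
```

A restart wrapper built on this would issue the stop, call `wait_for_stop(lambda: pid_is_running(pid))`, and only start the daemon again when the result is True; on False it could print "service process did not shut down" as suggested above.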
[23-Jun-2011 11:36:49] <Hackman238> rocket: All of IAD on a single box...once re just retired
[23-Jun-2011 11:37:07] <Hackman238> *s/once/one/
[23-Jun-2011 11:37:16] <Hackman238> *s/re/we/
[23-Jun-2011 11:37:26] <Hackman238> Sorry...heavy threading over here.
[23-Jun-2011 11:37:27] <dhopp> nyeates:  or generated a zenoss event..heh…although that wouldn't do much if it was zenhub or zenactions
[23-Jun-2011 11:37:44] <nyeates> dhopp: good point on the event derrr
[23-Jun-2011 11:38:00] <Hackman238> rocket: Though I really hate that the heartbeat madness seems to be the same
[23-Jun-2011 11:38:14] <dhopp> nyeates:  we have a number of application servers that for our bigger/busier clients 10 seconds is not enough time to wait for all db connections to close properly
[23-Jun-2011 11:39:06] <rocket> Hackman238: which madness do you speak?
[23-Jun-2011 11:39:27] <Hackman238> rocket: toggle between hb fail and clear
[23-Jun-2011 11:40:26] <Hackman238> rocket: some fuzzy logic in there would be nice. Like if we have 8 cores and our load is 12, lets let the daemons exceed their hb timeout period by 5%
[23-Jun-2011 11:40:29] <zenphil> Yes, the restart command currently has a timeout but it sounds like that is causing problems, so I'm entering a ticket for that.
[23-Jun-2011 11:41:15] <Hackman238> rocket: Just antialiasing around the hb events
[23-Jun-2011 11:42:02] <rocket> Hackman238: open a ticket .. usually people dont care about heartbeats unless there are two or more ..
[23-Jun-2011 11:42:35] <rmatte> the heartbeats are very touchy
[23-Jun-2011 11:42:39] <rmatte> I just outright ignore them
[23-Jun-2011 11:42:40] <nyeates> dhopp: hackman238: I think zenphil is referring to a ticket about the daemons shutting down thing
[23-Jun-2011 11:42:48] <rocket> they ignore them unless they have escalated one level and then send alerts etc ..
[23-Jun-2011 11:42:58] <Hackman238> rocket: ticket for a fix or ticket for a workaround?
[23-Jun-2011 11:43:27] <rocket> just a ticket describing the issue, let the devs figure out a fix etc
[23-Jun-2011 11:44:13] <rocket> most likely the heartbeat code can be updated in place now as most if not all daemons share the same framework
[23-Jun-2011 11:44:25] <rocket> and the heartbeat code is part of that framework if I recall properly
[23-Jun-2011 11:44:35] <nyeates> oops i misread something back there, ignore last comment
[23-Jun-2011 11:45:29] <nyeates> zenphil: can u put me as a CC on that ticket that you create?
[23-Jun-2011 11:45:34] <dhopp> yeah I think the 10 seconds is probably find for smaller shops…but you get up into the 1000+ devices and I could see that being an issue if the daemon is busy when you attempt a restart..glad to hear there is a ticket going to be opened
[23-Jun-2011 11:45:45] <dhopp> *fine
[23-Jun-2011 11:46:43] <nyeates> zenphil: also, can you put it in trac external so that core users can reference it
[23-Jun-2011 11:47:55] <Hackman238> rocket: Gotcha. TY
[23-Jun-2011 11:48:26] <Jane_Curry> Another roles question....
[23-Jun-2011 11:48:39] <Jane_Curry> Anyone know how to programmatically delete a role?
[23-Jun-2011 11:49:17] <nyeates> Jane: btw, from before - changing that MaintenanceWindow edit role should have no adverse affects that I know of. Worthy of testing though, Im sure.
[23-Jun-2011 11:49:18] <jrh0090> I think this will work: zport.acl_users.roleManager.removeRole(role)
[23-Jun-2011 11:49:33] <Jane_Curry> There is code to add a role in ZenModel/migrate/zenmanagerrole.py that uses the addRole method on a roleManager
[23-Jun-2011 11:49:51] <Jane_Curry> but there's no deleteRole
[23-Jun-2011 11:50:13] <Jane_Curry> and grep'ing through directories (both Zenoss and Zope) I cannot find that addRole method
[23-Jun-2011 11:52:09] <rocket> http://www.networkcomputing.com/private-cloud/231000254
[23-Jun-2011 11:52:26] <rocket> http://blog.zenoss.com/2011/06/the-next-step-zenoss-service-dynamics/
[23-Jun-2011 11:53:24] <Hackman238> rocket: That second article talks up a bit of the analysis I've observed
[23-Jun-2011 11:53:41] <Hackman238> rocket: Maybe I'm just doing something wrong
[23-Jun-2011 11:54:05] <rocket> Hackman238: hrmm?
[23-Jun-2011 11:54:55] <Hackman238> rocket: Looking at impact
[23-Jun-2011 11:55:05] <zenphil> Hackman, can you describe the fuzzy logic you meant about the heartbeats? Would it be reasonable to simply make the timeouts configurable? Currently the heartbeats are twice the cycle interval, so if you lengthen the cycle times you're less likely to get heartbeat timeout events.
[23-Jun-2011 11:55:43] <Hackman238> zenphil: I was thinking a dynamic offset scaled with the load on the box
[23-Jun-2011 11:56:04] <Hackman238> zenphil: but flat configuration dialog is just as good
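To make the two ideas concrete: zenphil describes the heartbeat timeout as twice the cycle interval, and Hackman238 floats a small grace margin when the box is loaded beyond its core count. A minimal sketch of that arithmetic follows. The function name, its parameters, and the rule "add the grace only when load exceeds the number of cores" are assumptions made for illustration; nothing here is taken from the Zenoss code.

```python
def heartbeat_timeout(cycle_interval, load_average, cpu_count, grace=0.05):
    """Return a heartbeat timeout in seconds.

    The baseline is twice the cycle interval, as described in the chat.
    When the load average exceeds the number of cores, the timeout is
    stretched by `grace` (hypothetical knob; 5% is the figure from the chat).
    """
    base = 2.0 * cycle_interval
    if cpu_count > 0 and load_average > cpu_count:
        return base * (1.0 + grace)
    return base
```

With a 300-second cycle the baseline is 600 seconds; on the 8-core box from the example, a load of 12 would stretch that to roughly 630 seconds, while the load of 2.59 quoted earlier would leave it unchanged. On a Unix host the inputs could come from `os.getloadavg()` and `os.cpu_count()`.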
[23-Jun-2011 11:57:49] <rocket> Hackman238: I am not sure I understand what your analysis was.  I am just not following that line of thought
[23-Jun-2011 11:58:35] <dhopp> so if heartbeats give a lot of false positives, what are people doing as a work around?
[23-Jun-2011 11:58:49] <Hackman238> zenphil: Found another bug- where more than one local collector is in addition to the initial instance of collectors, the daemons page of the initial instance only shows the daemons from the initial instance and the next instance
[23-Jun-2011 11:58:58] <Hackman238> rocket: Less automagical than stated.
[23-Jun-2011 11:59:23] <rocket> jane removeRole is found in lib/python/Products/PluggableAuthService/plugins/ZODBRoleManager.py
[23-Jun-2011 11:59:53] <nyeates> Everyone - I have to run - I have another meeting. If you have any additional questions specific to me, feel free to PM me
[23-Jun-2011 12:00:01] [disconnected at Thu Jun 23 12:00:01 2011]
[23-Jun-2011 12:00:02] [connected at Thu Jun 23 12:00:02 2011]
[23-Jun-2011 12:00:34] [zenoss-logger (logger bot) has joined #zenoss]
[23-Jun-2011 12:00:40] <Hackman238> nyeates: later!
[23-Jun-2011 12:00:50] <nyeates> Thx all, later
[23-Jun-2011 12:01:01] <rocket> Hackman238: I believe its just a wrapper around the rrdcache daemon
[23-Jun-2011 12:01:10] <Hackman238> rocket: Ah gotcha.
[23-Jun-2011 12:01:47] <rocket> we may not expose any .. I havent played with it
[23-Jun-2011 12:01:51] <Jane_Curry> Thanks jrh0090 - removeRole seems to work.  Do you know how I also remove a role from the manage_access panel programmatically?
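Jane's follow-up question is not answered in the chat, so the sketch below is a guess rather than a confirmed recipe. `removeRole` on `zport.acl_users.roleManager` is the call suggested above; the second half assumes that the manage_access panel lists user-defined roles from Zope's `__ac_roles__` tuple on `zport`, which should be verified on a test instance first. The helper name `remove_role` is hypothetical.

```python
def remove_role(zport, role):
    """Remove `role` from the role manager and from zport's own role list."""
    role_manager = zport.acl_users.roleManager
    # removeRole is the method named in the chat; listRoleIds() guards
    # against trying to remove a role that is not defined.
    if role in role_manager.listRoleIds():
        role_manager.removeRole(role)
    # Assumption: manage_access shows the roles stored in __ac_roles__,
    # so the role is filtered out of that tuple as well.
    current = getattr(zport, "__ac_roles__", ())
    zport.__ac_roles__ = tuple(r for r in current if r != role)
```

Run from zendmd, this would presumably be followed by `commit()` to persist the change; that step is also an assumption here.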
[23-Jun-2011 12:02:23] <rocket> that daemon does have some options according to the man page. You might need to modify the wrapper script if for some reason you really needed to change something
[23-Jun-2011 12:02:35] <rocket> Hackman238: what do you need to tweak?
[23-Jun-2011 12:02:43] <rocket> or is it more curiosity?
[23-Jun-2011 12:02:57] <Hackman238> rocket: Just hunting for options
[23-Jun-2011 12:03:15] <rocket> http://oss.oetiker.ch/rrdtool/doc/rrdcached.en.html
[23-Jun-2011 12:03:28] <Hackman238> rocket: Wanted to twist some dials to see if I could improve ui performance- is terrible
[23-Jun-2011 12:03:38] <tsener> aww bash and bc is not meant for that
[23-Jun-2011 12:03:42] <Hackman238> rocket: But thats to be expected since this box is hosed with work
[23-Jun-2011 12:03:53] <tsener> would be great if in zenoss there is a ready option for a dynamic threshold
[23-Jun-2011 12:04:02] <tsener> based on statistics
[23-Jun-2011 12:04:29] <tsener> which should not be very hard
[23-Jun-2011 12:04:44] <Hackman238> rocket: very cool- TY
[23-Jun-2011 12:04:50] <rocket> tsener: you can always write your own threshold code ..
[23-Jun-2011 12:05:14] <rocket> tsener: just take MinMaxThreshold.py as your example and hack away.
[23-Jun-2011 12:05:37] <rocket> or use the point threshold I wrote as a community member
[23-Jun-2011 12:05:59] <rocket> in either case adding threshold types isnt hard
[23-Jun-2011 12:06:19] <rocket> but what do you mean based on statistics?
[23-Jun-2011 12:07:30] <Hackman238> rocket: Question
[23-Jun-2011 12:07:48] <Hackman238> rocket: What are the implications of mysql blowing up or restarting while zope is running?
[23-Jun-2011 12:08:28] <rocket> things break?
[23-Jun-2011 12:08:47] <Hackman238> rocket: How badly? E.x. I can restart zeo and have nothing bad happen
[23-Jun-2011 12:08:59] <Hackman238> rocket: Unless I'm running a bunch of dmd ops, etc
[23-Jun-2011 12:09:21] <rocket> is this in SD? or 3.X?
[23-Jun-2011 12:09:28] <Hackman238> rocket: Avalon
[23-Jun-2011 12:09:32] <Hackman238> rocket: ...I wont call it SD
[23-Jun-2011 12:09:34] <Hackman238> LOL
[23-Jun-2011 12:09:38] <rocket>
[23-Jun-2011 12:09:49] <rocket> I have to .. lol
[23-Jun-2011 12:10:07] <tsener> rocket: e.g. using standard deviation - collecting from the RRDs values from the same day time and weekday
[23-Jun-2011 12:10:09] <rocket> I thought the zeo daemon was gone? is that still there?
[23-Jun-2011 12:10:18] <tsener> and get a standard deviation as a threshold
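What tsener describes can be illustrated with the standard library alone. The sketch assumes the historical values for the same time of day and weekday have already been pulled out of the RRDs into a plain list; how they are fetched is left out. The function names and the multiplier `k` are hypothetical, and this is not how any existing Zenoss threshold class works.

```python
import statistics


def deviation_band(samples, k=2.0):
    """Return (low, high) as mean +/- k sample standard deviations.

    `samples` holds past values for the same time slot; at least two
    values are needed for statistics.stdev().
    """
    mean = statistics.mean(samples)
    spread = statistics.stdev(samples)
    return mean - k * spread, mean + k * spread


def breaches(value, samples, k=2.0):
    """Return True if `value` falls outside the deviation band."""
    low, high = deviation_band(samples, k)
    return value < low or value > high
```

A custom threshold modelled on MinMaxThreshold.py, as rocket suggests, could use a check like `breaches` in place of fixed minimum and maximum values.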
[23-Jun-2011 12:10:26] <Hackman238> rocket: Referring to zeo, I meant in v2.5 and v3, talking about mysql in Avalon
[23-Jun-2011 12:10:26] <rocket> tsener: did you see the predictive threshold zenpack?
[23-Jun-2011 12:10:43] <rocket> zeo is gone in SD, its all dumped into mysql
[23-Jun-2011 12:10:45] <Hackman238> rocket: Sorry, I wasnt clear
[23-Jun-2011 12:10:49] <tsener> hmm nope
[23-Jun-2011 12:10:55] <tsener> i ll check it
[23-Jun-2011 12:11:14] <Hackman238> rocket: Right. What I mean is in v2.5 and v3 I can restart zeo live and hurt nothing. Can I do the same with mysql in Avalon?
[23-Jun-2011 12:11:29] <rocket> Hackman238: I believe so
[23-Jun-2011 12:11:35] <rocket> Hackman238: I have NOT tested that
[23-Jun-2011 12:11:55] <Hackman238> rocket: I figured as much which is why I'm taring up my /var/lib/mysql
[23-Jun-2011 12:12:00] <rocket> Hackman238: but I think it will do the same thing about complaining it can not find the data .. but it should reconnect
[23-Jun-2011 12:12:07] <Hackman238> rocket: Gotcha
[23-Jun-2011 12:12:26] <rocket> mysql IS the crown jewel in SD
[23-Jun-2011 12:12:42] <rocket> it needs to be taken care of
[23-Jun-2011 12:12:59] <Hackman238> rocket: Bah- just the hub and daemon config system being so unfuxored is nice
[23-Jun-2011 12:13:03] <tsener> rocket: looks useful
[23-Jun-2011 12:13:25] <tsener> still i ll have to write similar for nagios
[23-Jun-2011 12:13:26] <rocket> there should be a common.conf file now in SD
[23-Jun-2011 12:13:39] <Hackman238> rocket: Holy crap....my mysql dir is like 16GB already
[23-Jun-2011 12:13:44] <tsener> not using zenoss in production yet
[23-Jun-2011 12:13:54] <rocket> where common settings can be set for all the daemons ..
[23-Jun-2011 12:14:00] <Hackman238> rocket: Very nice idea
[23-Jun-2011 12:14:01] <rocket> in one place ..
[23-Jun-2011 12:14:19] <tsener> bye guys
[23-Jun-2011 12:14:24] <Hackman238> rocket: concerning zodb in mysql- does it still need packing?
[23-Jun-2011 12:14:27] <Hackman238> tsener: Later
[23-Jun-2011 12:14:31] <tsener> exit
[23-Jun-2011 12:14:40] <tsener> lol too much shell 4 2day
[23-Jun-2011 12:14:51] <Hackman238> rocket: ...this instance was put up yesterday and the db is already 16GB
[23-Jun-2011 12:15:14] <rocket> Hackman238: I am not sure if it does .. my understanding is that it does not
[23-Jun-2011 12:15:22] <rocket> Hackman238: do you have that many events?
[23-Jun-2011 12:15:42] <Hackman238> No....no syslog or other bulk events, just ping/snmp
[23-Jun-2011 12:17:59] <Hackman238> rocket: Darn it....looks like it imploded my db
[23-Jun-2011 12:19:29] <Hackman238> rocket: Wow.....mysql going on 5 minutes to start back up
[23-Jun-2011 12:22:42] <rmatte> probably lots of events in it and it's scanning for errors
[23-Jun-2011 12:22:57] <Hackman238> rmatte: Thats the thing....very few events
[23-Jun-2011 12:23:08] <rmatte> hmmm, strange then
[23-Jun-2011 12:23:10] <Hackman238> rmatte: Less than 4,000 I'd say
[23-Jun-2011 12:23:21] <rmatte> maybe there's a hung mysql process?
[23-Jun-2011 12:23:24] <Hackman238> rmatte: 4,000 total
[23-Jun-2011 12:23:35] <Hackman238> no such luck
[23-Jun-2011 12:23:55] <rmatte> one of our MySQL databases takes 2 hours to start up if I don't disable the scanning on it
[23-Jun-2011 12:24:05] <rmatte> (not zenoss)
[23-Jun-2011 12:24:07] <Hackman238> rmatte: Damnit....skip locking did it
[23-Jun-2011 12:24:15] <rmatte> ah
[23-Jun-2011 12:24:40] <Hackman238> rmatte: LOL drives me crazy. How does one disable scanning?
[23-Jun-2011 12:25:29] <rmatte> let me see if I can remember
[23-Jun-2011 12:26:30] <rmatte> I disabled it ages ago, just need to dig through the files and see what I did lol
[23-Jun-2011 12:26:51] <Hackman238> rmatte: LOL. I appreciate it
[23-Jun-2011 12:28:42] <zenphil> I'm leaving for another meeting but it was good talking to you about your experiences. I'm entering 4 tickets based on today's chat so hopefully your issues can get addressed. Goodbye!
[23-Jun-2011 12:28:57] <Hackman238> zenphil: Thanks! TTYL
[23-Jun-2011 12:29:13] <nyeates> Thx Phil!
