About that issue that Erik Brooks found
March 5 2010
A few days ago, Erik Brooks wrote a blog entry entitled, "8.5.1 FAIL. Your code may just break." Unsurprisingly, that blog post got a fair bit of attention, from many of you as customers/partners as well as within IBM. I received a number of emails and pings from people who were worried about the issue, even though some said they couldn't reproduce it.
At any rate, a few of you opened PMRs and SPRs, and I quickly escalated them from my side as well. I didn't "turf mark" the conversation on Erik's blog, but our best people were in there already and I knew the issue was getting needed focus.
This morning, Chad Scott from our support organization posted an update in comments on Erik's site...if you aren't monitoring his comments, you might not have seen it. So, here it is as well:
IBM understands the implications of this issue and has put a concerted effort into identifying a resolution. The development team has investigated the GetDocumentByKey and GetAllDocumentsByKey issue that was first introduced by the fix delivered for SPR AJMO7LHMK9. A plan has been defined that will change the way we fixed the issue in SPR AJMO7LHMK9 to avoid the problem. This will allow code to execute without error or going into an infinite loop. The fix will not require a developer to edit their LotusScript or Java code nor will it require the use of a Notes.ini parameter. Customers seeking the fix to this issue should refer to SPR CSCT836HFL (see my post above for key SPRs). We plan to have this fix delivered by March 12th. Note: The current issue does not affect XPage server-side JavaScript.
There are some good reasons why the feature was coded in 8.5.1 the way it was, but in the real world, things sometimes play out differently. That's why we have a support organization, and some really great people in the labs. They've worked hard for the last few days to find the right solution here. But I want to thank all of you for helping the process along with your reports, ideas, and tests. Fix is coming soon.
Post a Comment
- 2
John James http://www.wildunknown.com | 3/5/2010 12:19:13 PM
Hi Ed,
Thanks for the follow-up and the nudge from the 'inside'.
- 3
Denny Russell http://www.sherpasoftware.com | 3/5/2010 12:38:59 PM
Ed,
I appreciate this more than you know. I guess next time I'll start to blog issues like this. We have been going round and round for months on this issue with Support. In fact, the original error for this was
'%a's certification error'.
The fix that IBM produced simply renamed this error to 'The collection has become invalid' and told us we should try other ways around using this code.
Either way, you will hear our developers celebrating when I pass this along.
- 4
John Vaughan http://jonvon.net | 3/5/2010 1:01:52 PM
This fix is very much appreciated and will be just in time for us. Thanks to everyone involved!
- 5
Rita-Lyn Sanders http://systeminetwork.com | 3/5/2010 2:36:22 PM
Ed, always mollifying the masses!
- 6
Paul Withers http://hermes.intec.co.uk/intec/blog.nsf | 3/5/2010 3:30:26 PM
Will the fix be available for the 7.0.x codebase? I have quite a large customer encountering the %a's certification log error periodically on 7.0.4 clients. Although there are plans to migrate to 8.5.x in the future, they are not ready to do so yet and a fix to the 7.0.x codebase would be welcome.
- 7
Erik Brooks | 3/5/2010 6:09:45 PM
Thanks for the help, Ed. When the support rep I was dealing with over the phone kept repeating that it wasn't a bug and that "this was how it was supposed to work all along" I proceeded to explain the obviously huge impact this would have, and the huge impact my subsequent blog entry would have.
He wasn't too keen on bumping this up the chain, just ready to help me write an enhancement request. Which I knew wouldn't get this thing addressed in any sort of a reasonable timeframe, if at all.
When the door was politely (but decidedly) slammed in my face I thought about emailing you or Scott Morris directly at first (I don't want to contribute negative press). But I figured this was such a big deal that others needed to know ASAP. Now that I've got Pete's email address I'll bug him in the future. ;-)
You may want to have somebody look into this procedure-wise a bit though, because apparently I wasn't the only one hitting a brick wall. From John L James' comment on my blog:
"Funny how I felt like I was fighting a battle when I opened a support ticket with IBM regarding this. Had to keep providing existing SPR numbers and the like. I only got anywhere once I provided the link to this post, almost as if I had to point out that I was not the only one experiencing this."
Overall though, this is still a really scary thing to have had happen on the backwards-compatibility front. I mean, why was there zero mention of this in the Readme documentation? I don't mean to be insulting, but it's as if nobody involved in this change took a second to stop and think about it before checking it in.
In any case, thanks for jumping on this. And most importantly thanks for being so keyed-in with this community. Things were sounding very "Big Blue Megacorp" for a minute there but you got on it *fast*.
- 8
Nathan T. Freeman http://nathan.lotus911.com | 3/5/2010 9:17:45 PM
@7 - Why be abashed about it? The bug reporting process for the entire IBM product line is stuck in the 90s. It's in dire need of an overhaul. IBM support needs to wake up and smell the internet.
Support is still a cathedral in a world of bazaars. PMRs should be crowd-sourced. It's almost (but not quite) as important as having a Lotus App Store. This latest experience is simply one more example of how the 21st-century Lotus is having to drag IBM's process along by the nose, one executive order at a time.
Ed got them to change the Passport download file names. I think we have a new mission assignment. ;-)
- 9
Tinus Riyanto | 3/6/2010 12:03:17 AM
@7 That reminds me of the time I opened a PMR to report the bug in the NotesDocument.CopyAllItems and CopyItem methods that would break the link of a second (and so forth) document link or attachment inside a rich text field. The official response was that they were aware of the flaw in the code but chose not to fix it because it was deemed not important. Something to do with the low number of customers reporting this issue.
I understand that this is a rarely used method, but the fact that they know it exists and choose not to repair it freaks me out. Furthermore, there is no workaround for this error, so we have to tell the user to only put one attachment or document link per field.
Not sure if this is fixed yet; a guy in the notes.net forum mentioned that this was "unintentionally" fixed in 8 (8.0.1, I think) when IBM recoded how they work with rich text fields. I guess I will find out once we upgrade.
- 10
GarryL | 3/6/2010 8:12:56 AM
@7 "Overall though, this is still a really scary thing to have had happen on the backwards-compatibility front..."
That is the most worrying thing about all of this. Backwards compatibility has been a major plus of Notes and to just implement this 'fix' the way it was should not have happened.
Hopefully after all of the hoo-hah raised this time somebody, somewhere will think twice before doing something like this again.
- 12
Rob McDonagh http://www.CaptainOblivious.com | 3/6/2010 11:59:07 AM
@11 This happened to us recently, too, where we were reporting a bug and IBM couldn't find the problem until we pointed them to a blog posting about a similar issue. In this case, there was an open PMR and SPR for the same issue, but we only knew that because of the blog posting. More to the point, IBM only knew it because we told them about the blog posting.
We've been saying for years that when IBM says "too few reports" that's a fundamentally flawed answer when the people reporting these issues are getting blocked at the 1st or 2nd level and (more importantly) not being given the option to look at existing open issues so that they could jump up and down and say "Hey, that's my issue, right there!!!" - just like several people did on Erik's blog. I doubt it's in IBM's interest to have us all blog every PMR we open, but in self defense we might have to. Now, if only there was some social networking solution to help us share all this knowledge and let us find the Connections between our various problems.......
- 13
Erik Brooks | 3/6/2010 1:34:38 PM
@12 - I was just thinking that. I've been asking Bruce to add some SPR tracking to IdeaJam...
SPRJam?
How about a modification to the Designer help template (or an Eclipse plugin) that lists all known SPRs/limitations around a certain function?
If we knew that a particular class/function/etc. "doesn't work as expected when XXXXX happens, SPR# abcdef" that would be a huge help. Hmmm...
- 14
Nathan T. Freeman http://nathan.lotus911.com | 3/6/2010 2:04:11 PM
@12 - Amen
@11 - Of course it's a diagnosis & troubleshooting problem. That's the challenge of every bug search. Fixing bugs is easy. Finding them is hard. The reason to crowd-source their identification is to make that hard part easier. This is even more true of the kind of elusive, data-change sensitive bug that we're talking about in this case.
The idea of open bug reporting has been knocked around on the devWorks forums for years. Even Design Partners don't have access to PMRs, so we can't easily compare issues. Creating a social context to help identify problems that could be escalated through support to development would be of enormous value to IBM's customers, partners and bottom line.
The one time in the last 10 years that I've reported a bug through channels, I got a "no plans to fix at this time" reply. Should I blog about it so others can say "me too?"
Y'know what's really interesting? You already do it. Through the Eclipse Foundation. https://bugs.eclipse.org/bugs/ Imagine that.
- 15
Bill McCuistion http://www.edna1550.com | 3/6/2010 3:08:29 PM
It used to be, Before IBM, that Lotus Support would support the caller and actually work to resolve any issue.
There was no barrier to getting any level of support. Sometimes these required only minutes. Sometimes these issues took days, but always were somehow resolved.
Later, as in now-a-days, it seems that support is hard to get.
If IBM/Lotus provides a spec for a function, I would expect it to work. In the normal course of business, I would trust IBM/Lotus not to break this contract.
Who's to say that a function is not "widely-used"?
If IBM/Lotus provides a spec, it should support it.
This is one of the main selling-points of the IBM/Lotus Notes platform.
This type of skulduggery is not in the best interest of anyone.
- 16
Carsten | 3/7/2010 5:23:21 AM
@11/12
I had the same experience with SPR BTRS7XEU4R. When I opened my PMR, I had to keep telling Lotus support that there were other people who had also opened a PMR with the same problem as mine. I even gave them the existing PMR and SPR numbers from the other customer, but still they could not really help me.
After the problem was identified and a HotFix was created for another OS, I got the hotfix for Windows very quickly. I really wanted this hotfix to be part of FP1, but that was turned down. I have asked for this HotFix to be part of FP2, but I got no answer.
It looks to be part of 8.5.2, but I will ask (again) for a new HotFix on top of FP2 when FP2 is released.
- 17
Frank Paolino http://blog.maysoft.org/ | 3/7/2010 11:06:21 AM
Ed, At the meet the developers session at Lotusphere, I asked if we could get support questions answered in 2 business days, like we get from MSDN. Like the partner forum, we get an answer from other developers, but the MS people always confirm that answer or provide more accurate answers.
I even said that those of us who develop for a living would pay for this service (we pay for it at MSDN).
Has this idea gotten any traction inside IBM?
- 18
Chad Scott | 3/7/2010 11:22:07 AM
@15
There was never an intention to change the way these functions work. The prior fix was meant to resolve a specific problem that can cause a hang condition. It did, unfortunately, have a side effect that introduced a new problem. The quick action by our Development team to fix the regression should be proof that yes, we do support these APIs.
- 19
Erik Brooks | 3/7/2010 6:31:51 PM
@18 - I believe @15 is referring to policy in-general, not this specific SPR.
There are many, many bugs that are simply "no plans to fix" because of "not enough customer weight." Things that flat-out don't work properly or have unintended side-consequences. I can name 7-8 programmatic functions off the top of my head that have been broken for years, all with "No plans to fix." It would be nice to see these caveats in the Designer Help but, alas, we get to stumble upon these on our own.
Every shop I know would agree that top-priority bugs are those that cause either of these two things:
- A crash, a hang, or irretrievable loss of data
- A feature to be completely unusable
These types of bugs simply *must* be fixed. Unfortunately IBM doesn't always agree.
The one exception to this theme that I've encountered is with the Lotus Web Server team. They seem to be insulated enough and have the autonomy to make their own calls, and I've seen some miraculous work come out of them on countless occasions.
- 20
Nathan T. Freeman http://nathan.lotus911.com | 3/7/2010 9:05:54 PM
@19 - Customer demand is not a static value. The perception of it can be generated. Indeed, that is what your blog post did. It's essentially bug advertising. People read it and go "oh yeah! THAT'S what's wrong with my server, and I didn't even realize it."
There was a time when SCOS and DXL were labelled with "no plans to fix." Sometimes a plan is just a list of things that never happen. :-)
- 21
Gavin Bollard http://dominogavin.blogspot.com | 3/8/2010 9:42:51 PM
Thanks Ed; The push is very much appreciated.
I'm also extremely appreciative of the line "The fix will not require a developer to edit their LotusScript or Java code nor will it require the use of a Notes.ini parameter."
After the "resolution" I got on;
PMR 86153,999,616 "Attachment name missing into RTF field "
Which didn't actually give me the same functionality I had before the upgrade - and resulted in us having to make design changes to a whole lot of templates.
Glad to see that IBM is beginning to respond better.
I do think that Nathan has a point though, PMR reporting has got to be easier, more "2.0" and more "shared". It's not good that we get better responses when we blog about our issues. If IBM "owned" a social network on which PMRs were reported, they could moderate the content a little and promote positive views and discussion.
For example; don't underestimate the positive publicity that IdeaJam promotes in everyone who mistakenly believes it to be an IBM product.
- 22
Simon O’Doherty http://www.bleedyellow.com/blogs/Simon?lang=en_us | 3/9/2010 4:33:43 AM
I really wish I could respond in more detail to a lot of these, but it is outside my scope to detail internal processes. Some good ideas though.
There is one point I wish to make that everyone should be aware of, though!
Being fully satisfied with the support call is a priority for IBM Lotus Support. As such, if at *any time* you feel you are not getting the support you expect, you should ask to talk to a Manager at that point in time.
The Engineer will not get upset and will arrange the meeting. It is better that you ask to speak to a Manager at the time the issue is occurring (rather than in a blog, or a survey after the fact). That way we can do something about it.
- 23
tom oneil http://www.codepress.net | 3/9/2010 11:18:41 AM
@22 I understand what you're trying to say. Honestly, I really don't like working with companies where I have to say "Let me talk to your manager." It's a big turn-off if you know what I mean.
If that happened at a store or restaurant, I would be much less likely to return. If I'm spending close to $1 million on a product, I expect better service than that.
- 24
Lance Haverly | 3/9/2010 3:55:26 PM
@3 I recently needed to upgrade my mail servers from Domino 7.03 HF769 to Domino 7.04 FP1 to prevent Blackberry 5.0.1 from crashing multiple times a day. Now after upgrading to 7.04 FP1 I receive dozens of error messages from Mail Attender Agent_ProcessMailUsers : Notes error: %a's Certification Log (4000)
Question for Ed Brill: Is there going to be an FP2 for Domino 7.04 users? I am trapped at this release for the foreseeable future. R8.5 will not be implemented for quite some time to come.
- 25
Simon O’Doherty http://www.bleedyellow.com/blogs/Simon/entry/getting_the_most_from_lotus_support7?lang=en_us | 3/10/2010 7:17:53 AM
@23, I understand what you're saying, and for many this appears to be a culture thing.
I have seen customer reports where the customer has been upset and, when asked why they didn't ask for a manager at that point, the response has been "I didn't want to trouble anyone" or "I didn't want to get the engineer in trouble".
There are many reasons you may want to talk to a Manager; it isn't always because of the engineer dealing with you. For example, depending on the support level and where in support the PMR is, that engineer may not have access to the full context of your impact. So a business impact is important when opening a PMR. I blogged on this a long time ago (see link).
You are paying for the Lotus Customer Support. It is not free. So if you do not feel you are getting the service you paid for, then it is best to say so at that point. The @7 comment is a good example of where Erik should have asked to speak to a manager.
He could have sent a mail to Ed and it would have trickled down, but as soon as the manager got it, the first thing they would ask the engineer is "Did the customer ask to speak to me?".
Btw, in EMEA support all engineers are required to have the manager's name in their signature for this very reason. So, we don't take it personally. :)
- 26
Erik Brooks | 3/10/2010 9:18:11 PM
@25 - I've opened a bazillion PMRs, and have an extremely high PMR-becomes-SPR ratio. I.e., I try not to waste support's time with a usage question or a non-reproducible scenario. When I report something, the majority of the time it ends up being a flat-out bug.
This was the *third* severity-1 PMR I'd ever opened. That right there should have tipped off the rep that this was serious. I don't "cry wolf". Additional comments in the PMR such as "showstopper" and "slam the brakes on our 8.5.1 upgrade" should have also tipped off somebody to the fact that a simple "closed-no-plans-to-fix" would not suffice.
I'm familiar with support's signatures mentioning managers. But I (and I'm sure others) have always taken that to be an avenue for feedback about typical customer service traits: courtesy, professionalism, etc. I've sent feedback to managers many times (nearly all overwhelmingly positive, I'm thankful to say). If the signatures said "Feel free to contact my manager if you've found a bug and we're not doing anything about it and that's not acceptable" then perhaps I might have done so in this case. Hell, I *know* I would have.
Given all of this information combined with the explanation (not that it should have been needed) that virtually every customer upgrading to 8.5.1 would hit this issue I 100% expected the rep to say "wow, yeah, this really stinks. Let me talk to L3 about it and see what we can do. This is a big deal."
But since the rep said that I was "not the only customer running into this" and that this was "how it should have worked all along" it definitely sounded like somebody else had already reported this. There was even an SPR already written up as an enhancement request in an attempt to fix this. I figured it *had* already been run up-the-chain, and this rep wasn't forthcoming with any options.
In any case, I'll definitely live and learn and ask for a manager in the future.
- 27
Nathan T. Freeman http://nathan.lotus911.com | 3/11/2010 12:43:35 PM
@25 - Simon, I'll accept that you guys are happy to let callers speak to your manager, but if you think that's the proper way to handle support in the 21st century, you need to think again. It simply doesn't scale.
@26 - Your PMR/SPR ratio would be an excellent example of what a karma-based support system would reveal. Consider how effective it might be if a newcomer posted a possible bug, and Erik Brooks came along and confirmed it.
IBM tends to think along the lines of "deployment blocker" bugs, and in your case, that's easy to identify, because that means you simply won't run your SaaS app on the new version. But what does it mean for an existing customer who's looking to deploy new functionality on an existing infrastructure? Some problem might not keep them from deploying the version, but it might keep them from using Domino as the solution to some new problem. And frankly, I think that's worse. Having a customer not move from 8.0.2 to 8.5.1 until a fixpack comes out is much less impactful than that customer simply deciding not to use Domino as, say, their supplier collaboration platform because they hit a wall on their development efforts.
- 28
Erik Brooks | 3/11/2010 3:23:40 PM
@27 - Yup, a karma-based approach would rock. Plus the OSS community loves that kind of thing, which IBM seems to have been trying to align themselves with for the past few years.
If I downloaded the free Domino Designer I might be happy as a clam. I might go to OpenNTF and be happy as a clam. But then if I actually had to report a bug via the IBM Support Portal -> ESR -> PMR route I'd be way, WAY turned off.
I would think a karma/crowd-based bug tracking system would fall right in line with IBM's OSS-embracing strategy. And it'd be a win-win-win all around. I know I've got one outstanding PMR/SPR from forever ago that *almost* has enough weight to get fixed - the Optimize Document Table Map regression bug. If there was some public facility to report a bug and give it more visibility then I would be encouraged to contribute since any exposure I provide might cause somebody else to add their weight to my ticket.
As it stands now there's just the blogosphere. Hmm...
*5 minutes later*
I just registered "notesbugs.com" and the .org and .net variants. Maybe we can help move this process along.
- 30
Charles Robinson http://www.cubert.net | 3/12/2010 8:13:32 AM
I have no idea how many PMRs I opened in the 9 years I did Notes development. A lot of them were eventually resolved by working around the original issue, and as far as I can remember those were all closed with "working as designed". I only ever had one get converted to an SPR, and that wasn't even my issue; it was one I opened for a friend.
For my own issues I eventually gave up contacting Lotus support at all and went through my network of contacts and the blogosphere instead. It was far more reliable and much less frustrating.
@8 - "Ed got them to change the Passport download file names."
Not all the file names were updated, just the ones from a certain point in time forward. The others are still the annoying old machine-generated names.
@28 - I've been thinking for a while about replicating the fixlist database and giving people the option to subscribe via RSS so they can know when an SPR has been assigned to a release. That's just an idea, though, I haven't actually done anything with it.
- 31
Maria Helm | 3/19/2010 9:24:22 AM
The currently available hotfixes only address the server-side lookups. Any lookups performed on the client side still potentially have this problem.
NEW SPR # MWID83NSL7 is for the client-side issues. Readers: Please dogpile into this SPR to get it weighted, as the current line from support is that no hotfix will be created, so you won't have a fix until maybe an 8.5.1FP3 "in a couple months".
I will cross-post elsewhere.
- 32
Charles Robinson http://www.cubert.net | 3/19/2010 12:15:59 PM
@31 - Thanks for the follow up. It's pretty obnoxious that there is no concept of "do the right thing" at play here. It's broken, it should be fixed. Why does it require an unknown number of problem reports to motivate someone to actually do it?
- 34
Charles Robinson http://www.cubert.net | 3/22/2010 7:20:53 AM
@33- I'm not being cynical. I'm being honest. It never occurred to me that a "critical client hotfix" would not be strenuously and rigorously tested. After all, you're releasing it because a critical bug made it into the wild. Fixing it is only risky if the fix isn't properly tested. So, putting all that together, it sounds like you need more people in testing. Who can we lobby to get that moving?
- 36
Charles Robinson http://www.cubert.net | 3/22/2010 7:58:48 AM
@35 - Reading back through I think Erik hit the nail on the head @19. I won't flay that particular dead horse. :-) In this particular case I'm looking forward to a CCF so we can put this behind us... hopefully. (Crossing my fingers and hoping there isn't a regression.)
- 37
Volker Weber http://vowe.net | 3/22/2010 4:38:40 PM
Charles, IBM has had a lot more people in testing, but they have been hit hardest by "resource actions".


Ed,
Thanks for the great followup on this. I discovered this issue this morning and it's been the focus of my day so far. Opened a PMR and started making plans.
Just to clarify - will a fix be offered for the client as well as the server (iSeries)? I've seen it in both places.