#1
Nvidia plays the meltdown blame game
Comment: Story doesn't mesh with reality
By Charlie Demerjian: Monday, 07 July 2008, 4:32 PM

NVIDIA'S STOCK TOOK a long overdue beating the other day, more because Wall Street is collectively horrified that it has been lied to than any fundamentals that are public. That said, the 8K keeps up the firm's tradition of honesty and integrity.

The root of the problem is, so far, HP notebooks, but likely others. You can see the HP page here, and at least one lawsuit about the same thing here. No mention of this in the Nvidia statement though. Why would they?

If you look at what Nvidia says, it isn't their fault, it is those damn suppliers. The official line is: "While we have not been able to determine a root cause for these failures, testing suggests a weak material set of die/package combination, system thermal management designs, and customer use patterns are contributing factors". Parsing that, you see that they are blaming fabs and packaging suppliers first, OEMs second, and those damn users third, but they have no fault here, NV can do no wrong.

This is really dangerous for three reasons: they are annoying suppliers, annoying OEMs and annoying users. Last we checked, they need all three to remain in business.

The weak die/packaging excuse doesn't wash at all. Nvidia is blaming TSMC behind the scenes, trashing them pretty hard through 'unofficial' channels to deflect blame. They are likely to be doing the same to packaging suppliers as well, and others.

The reason this doesn't wash is that there are only a handful of suppliers in each of these fields. If they had a problem with Nvidia, there would be problems with other companies. ATI, Altera and dozens of others, would have chips crapping out left and right, especially designs where they are meant to run 24/7 like embedded parts. You would see an industry rife with failures and warnings like the bad caps problem of a few years ago. You simply aren't seeing that. Period. No warning from others, no recalls, no TSMC warnings, no nothing. This is a sham to deflect blame from Nvidia, they don't want to dent their shiny image, much less slow down the 'can of whoop-ass' opening. I am calling bull**** on the supplier-blaming problem.

Suppliers are a problem for Nvidia though, at least they are now. Trashing your suppliers like this is a dangerous thing to do, Nvidia needs them more than they need Nvidia. Can you imagine the scene at the next TSMC planning meeting where they are discussing who gets what allocation on the next tight process, and how much they pay?

TSMC Planner 1: How many wafers do we allocate for Nvidia a month?
TSMC Planner 2: The 40nm process is looking tight at first, do you agree?
TSMC Planner 1: Yeah, really tight.
TSMC Planner 2: Remember that time when NV was calling us [male rooster euphemism][oral suction euphemism]s to anyone who would listen? Wasn't that a fun time.
TSMC Planner 1: So 4 then?
TSMC Planner 2: 4K? That seems high.
TSMC Planner 1: No, 4.

Blaming your suppliers publicly is bad. When it isn't their fault, it is worse. Doing so in the sleazy backhanded ways that Nvidia knows so well is tantamount to corporate suicide. Suppliers will find a way to make you pay, and they will get the knife in somehow. Nvidia being bossy and arrogant only makes the situation more enjoyable for them. Look for this PR blunder to have massive long-term effects that manifest themselves in dropped margins, critical parts shortages, and missed deadlines. Bad move #1.

Bad move #2 is blaming the OEMs, this is done with the subtle phrase "system thermal management designs" in the 8K. This is engineering code for, "we didn't do anything wrong, those nitwits at HP did".

It works like this, Nvidia makes a part and it has a variety of constraints it is meant to be used within. Things like power draw, minimum and maximum temperature, and other things. NV specs these things, and HP makes a notebook to the specs that NV gives them, a process that happens long before the chips come out of the fabs in any decent volume. If the chips are within the promised specs, things go well. If they are not, there are some tweaks you can pull, but if they are too far out of spec, you are basically screwed.

Now this assumes both sides are honest, and people are trying to solve problems, not deflect blame. Nvidia is really good at the latter, bad at the former. They also can't make a chip that isn't a blast furnace. Most of their recent woes, including the massively delayed current round of MCPs, are down to out of control thermals, just like the last round.

How do you fix a systemic design problem in silicon on a time scale that doesn't sink an entire season's notebook sales? Easy, you fudge the spec sheet. If you have a TDP of 20W for a part, and it is coming in at 25W from the fab, you can lower the speed or change what TDP means.

If you promised HP a chipset that has an 800FSB and it can only hit 667, well, that is problematic. If you give them a chipset with a 20W TDP, and the definition of TDP changed between the last generation and this one, well, "that is how we do it now".

If it is HP incompetence as Nvidia is stating, then it would simply be a case of a line or two of notebooks that went bad. HP system engineering is one of the very best in the industry, period, subject to management whims. This is not to say they can't screw up, they most definitely can, but it is pretty rare on anything major. HP does seem to have QC process engineering down well.

Does this mean they are perfect? No, not even close. Have they screwed up on a notebook? Sure, probably several here and there over the past few years. If you look at the HP page, once again here, you will see there are 24 models affected. I can believe there are one, two, maybe four screwups, but 24 model lines all with the same problem? All with cooling related failures? All with cooling related video failures? All with cooling related video failures on Nvidia parts?

What NV is doing is smearing the good name of HP and its engineers here. There is no way in hell that HP totally botched every Nvidia based notebook for a generation in the same way. Not a chance. This is once again a smear job, and it will once again come back to bite Nvidia in the bottom line, give it time. Companies like this have long memories.

The only thing you can say from this is that it is not HP's fault. Well, actually, you can say more. If HP specced cooling for a theoretical 20W, and the Nvidia chip puts out more than 20W, what happens is you get more heat in the system than you can get rid of, and temperatures slowly climb. It will either keep climbing, or level off, but likely it is out of the thermal bounds set by Nvidia. The system will get really hot or simply crash.

The problem? This puts them out of the thermal tolerances for the packaging. That is OK for short periods, but repeatedly staying above the limits causes the packaging material to degrade prematurely. Worse yet, repeated heating and cooling caused by the laptops heating up and then crashing, then being left off for a bit to cool and 'work again', is horrible for the packaging. This is how solder joints and bumps crack, and substrate warps. Couple that with weakened materials from overheating, and you have dead GPUs.

This is hugely unlikely to be a HP problem, or a substrate problem. It is most likely a bad engineering design decision that Nvidia tried to sweep under the rug. Sometimes it works, other times it doesn't. This time is an 'other', and companies like TSMC and HP don't like being publicly crucified for Nvidia's screwups. They really don't like it.

The third bad move is 'customer use patterns': so, it isn't our fault, it is those crazy kids! A Scooby Doo villain couldn't have said it better after a failed whoop-ass attempt. From the look of things, the customers are doing things like turning on and off laptops, something likely unanticipated by Nvidia product planners. I mean who does that?

Blaming customers would be bad move number three, but I doubt most of them will realise it is Nvidia's fault, they will blame HP or the host of other OEMs that haven't been named yet. Either way, if you take bad move #2 into account, if I were an OEM, I would tell everyone calling in for warranty support unequivocally that it is Nvidia's fault for supplying bum chips. In this case, it wouldn't be deflecting blame.

In any case, the 'crazy kids' blame game is pointless and will only hurt Nvidia if people hear it. They likely won't, but there is no upside unless they think analysts are several steps dumber than a slow sheep.

In the end, the whole thing can be summed up by bad engineering, covering your ass, and hoping it blows over. Nvidia corporate messaging is pretty much incompetent, more driven by the fact that they are pawns of people higher up the food chain than anything else, and they only have one tool, a hammer. When something goes wrong, they don't know how to solve problems, only hit things.

This situation was dealt with by surprising Wall Street with a collective kick in the hedge funds. There was no explanation, no softening of the blow, and no word to the press, just a 'Surprise, we are tanking' governmental form, followed by stonewalling and finger pointing at blameless people.

Botched doesn't begin to describe this response, but it is a good start. They utterly flunked Crisis Management 101. Given the last sentence of the 8K, "There can be no assurance that we will not discover defects in other MCP or GPU products," this is far from over. In fact, we know it is; there are many more lines and products affected.

Now that you know about how the Nvidia parts failed leading to the massive loss, plummeting stock, and management fast-talking, what everyone wants to figure out is where the buck stops. That is not a simple question, but several industry insiders have told us the same story, it all depends on who got burned, and how big they are.

The one we know about is HP, here and here, but it is far from over. Nvidia is chiming in now because it is very likely they are footing the bill for the class action settlement, or at least a very large chunk of it. When they gave the prescient advice that, "There can be no assurance that we will not discover defects in other MCP or GPU products", they aren't joking, this problem hasn't cropped up in desktop parts yet, but it most assuredly will. We are getting reports of other afflicted items, but it is premature to name them.

So, basically, Nvidia totally screwed up, and is blaming everyone but the one company they should, itself. The OEMs know it, consumers know it, suppliers know it, and since the "OMFG, our hair is on fire" performance of last week, just about the entire world knows about it. Everyone who has one of these parts will be seeking restitution, just watch the bills mount now that word has spread.

But that brings up the costs and payments. Nvidia took a $150-200 million hit initially over this, but what does that cover? Looking at Dell's web site, going from an integrated GPU to an external Nvidia GPU is either a $50 or $130 upgrade, maybe more on a low volume gaming part. That is what Dell sells the module for, plus profit and overhead. The chips that Nvidia sells, minus GDDR memory, construction etc, are probably in the $10-40 range.

If you look at that, there are three million or so parts affected, and they can likely be fixed by swapping out a PCIe card. With chipsets, well, things get interesting: they are soldered to the mobo, as are many CPUs, especially in thinner notebooks. In this case, the replacement means a new mobo minimum, possibly a CPU thrown in for good measure.

Then there is the cost of fielding the support call, not a trivial matter for a dead notebook. Shipping the part back to the depot, labour to replace the mobo, and shipping it back as well. Added staffing to handle the returns of large portions of 24 notebook lines adds to the bottom line as well.

That leads to intangibles like customer ill will, lost productivity, and the odd executive who gets a bum laptop for their kids. You can't put a dollar value on these, but they do have an effect, much of Dell's current woes are due to treating customers like dirt three to five years ago.

So, once again, who pays for all of these costs? That is an unequivocal "it depends". Depends on how contracts are written, how much leverage the OEMs have, and how much good will Nvidia has built up.

On one side, you have Dell, one-time masters of the supply chain, and squeezers of every penny they can get. Industry insiders tell us that Dell will be billing Nvidia for everything, from bad GPUs, mobos, replacement costs, help desk, lawyers, and every truck roll needed to fix something in the field. If Nvidia wriggles out of paying for something, they will pay for it in other ways.

HP is a little more flexible, but since Nvidia has been effectively blaming their engineering for it, I can see how they would lean a bit more toward the "right royal *******" side of things. They are close to Dell in what they will charge, but may let some minor things slide.

As you move down the food chain to smaller people: mobo makers, Tier 2 computer makers, and even little shops, NV will disclaim more and more. Asus and Gigabyte will likely not get everything covered, not even close. Smaller board makers might get credit for the cost of MCPs and GPUs. Unhappiness will abound.

They will all get their pound of flesh, it may just take a bit of time. Lawsuits seem to have forced disclosure, and NV is still trying to spin, minimize the downside, and point fingers. This, however, is far from over. Look for desktops to be affected as well as discrete GPUs before this is over, most of them use the same ICs as the mobile parts.

There seem to be two currently-affected products, the low-end and the mid-range parts of the last generation. Depending on the failure rate, Nvidia could be looking to eat the majority of a generation's products plus the cost of things they were soldered to, and the tech school dropout used to screw new parts in.

This will be very ugly before it is done, very very ugly. Finger pointing early on and the blame game will only harden resolve on the other side, and add to costs. There go their cash reserves, we guess. It couldn't come at a worse time. Then again, doing everything wrong does have a cost. µ

http://www.theinquirer.net/gb/inquir...own-blame-game
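For anyone wanting to sanity-check the article's cost figures, here is a minimal back-of-the-envelope sketch. It takes the two numbers quoted in the article (the $150-200 million charge and "three million or so" affected parts) and divides one by the other. The function name and the assumption that the charge is spread evenly across parts are mine, not anything Nvidia or the article states.

```python
def per_part_budget(charge_usd, parts):
    """Charge divided evenly across affected parts (an illustrative assumption)."""
    return charge_usd / parts


AFFECTED_PARTS = 3_000_000                 # "three million or so" per the article
CHARGE_LOW, CHARGE_HIGH = 150e6, 200e6     # charge range quoted in the article

low = per_part_budget(CHARGE_LOW, AFFECTED_PARTS)
high = per_part_budget(CHARGE_HIGH, AFFECTED_PARTS)

print(f"${low:.2f} to ${high:.2f} per part")
```

On those assumptions the charge works out to more than the $10-40 the article guesses for a bare chip, which would leave some room for repair costs on top of the silicon, but whether it covers a motherboard swap plus shipping and labour is exactly the open question the article raises.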
#2
Why is it that hammerheads like you will post 100+ lines of unoriginal copy and paste (mostly bull**** to boot) instead of providing a link for a pointless post. A link with identical crap. From a 100% reliable source like theinquirer.net. You're killfile material.
#3
"Augustus" wrote:
From a 100% reliable source like theinquirer.net.

True enough, but The Inq is sticking by their reading of it:
"All Nvidia G84 and G86s are bad"
http://www.theinquirer.net/gb/inquir...vidia-g84-g86-bad

I have no opinion on the matter.

--
Regards, Bob Niland
http://www.access-one.com/rjn
email4rjn AT yahoo DOT com
NOT speaking for any employer, client or Internet Service Provider.
#4
"rjn" wrote in message ...
"Augustus" wrote:
From a 100% reliable source like theinquirer.net.

True enough, but The Inq is sticking by their reading of it:
"All Nvidia G84 and G86s are bad"
http://www.theinquirer.net/gb/inquir...vidia-g84-g86-bad
I have no opinion on the matter.

A better headline would be "All G84 and G86's are lousy 3D performers". Do you seriously believe that every single 8300GS, 8400GS, 8500GT, 8600GS, 8600GT and 8600GTS of the how many millions ever made are faulty? Unlikely....
#5
Augustus wrote:
Why is it that hammerheads like you will post 100+ lines of unoriginal copy and paste (mostly bull**** to boot) instead of providing a link for a pointless post. A link with identical crap. From a 100% reliable source like theinquirer.net. You're killfile material.

A reasonable thing to do (KF the idiot), but be aware he nym-shifts constantly to avoid filters.
#6
"Augustus" wrote:
Do you seriously believe that every single 8300GS, 8400GS, 8500GT, 8600GS, 8600GT and 8600GTS of the how many millions ever made are faulty? Unlikely....

Even the Inq, which is not letting go of this story, is not claiming "every single", but "... graphics chips fail at alarming rates ..." from a recent story:
"HP pays half for Nvidia's graphic problems"
http://www.theinquirer.net/gb/inquir.../hp-pays-half-nvidia-problems
or http://snipurl.com/39gfy [www_theinquirer_net]

Independently, The Inq is also reporting that the NV 790i has a problem:
"Nvidia 790i board pulled by makers"
http://www.theinquirer.net/gb/inquir...1/nvidia-790i-board-pulled-makers
or http://snipurl.com/39gh7 [www_theinquirer_net]

Inq reporting can be iffy. Here's a more respected source:
"Nvidia reports problem with laptop chips"
http://snipurl.com/39gn9 [www_computerworld_com]

"Nvidia Corp. has uncovered a problem with some older graphics chips that shipped in "significant quantities" of laptop PCs ...
... Nvidia will take a charge against second-quarter earnings of $150 million to $200 million to cover the expected cost of repairing and replacing the products ...
... The products have been failing in the field at "higher than normal rates," Nvidia said. ..."

I'm only following this because I'm getting ready to build a new PC, and obviously want to avoid any known problems. My current PC has an older NV chipset, and frankly it was never very reliable, although the blue screens may not necessarily be from the chipset.

--
Regards, Bob Niland
http://www.access-one.com/rjn
email4rjn AT yahoo DOT com
NOT speaking for any employer, client or Internet Service Provider.
#7
rjn wrote:
I'm only following this because I'm getting ready to build a new PC, and obviously want to avoid any known problems. My current PC has an older NV chipset, and frankly it was never very reliable, although the blue screens may not necessarily be from the chipset.

The problem is the heating-cooling cycle: on-off, hot-cold, expand-contract. It's always been a problem to some degree, but correct engineering and manufacture minimized the impact. Mobile units are under frequent hot-cold cycles and generally have poorer cooling solutions, so they run hotter and the change in temperature during each cycle is greater.

Like lab rats injected with poison to see how many get cancer to extrapolate the data to humans, you need to see how many notebook chipsets have failed over how long (and under what usage pattern) to see how many desktop chips will fail. An ugly analogy, but appropriate.

Data is still being suppressed, so no proper predictions regarding desktop components can be made; however, the suppression of that data is enough to red flag those chips *as if they have failed already*. If it is not known good, it is not reliable.

Besides, the Intel P and X series have previously demonstrated superior performance over the nForce 5, 6 and 7 series for mobos, and high-end GeForce cards do not seem to be affected.

Strongly consider an Intel system board for your next build, as well as an ATI video card; that combination is currently very strong. But you could still consider a desktop 7 series Nvidia board and GeForce card, as long as you understand the potential risks, can minimize them individually, and have a lengthy warranty.

I would not even consider a mobile with any Nvidia chip; it's too high a risk for premature failure.
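To put rough numbers on the hot-cold cycling argument, here is a minimal sketch built on two textbook approximations: a lumped steady-state thermal model (temperature = ambient + power x thermal resistance) and a Coffin-Manson style power law, in which cycles to failure scale as the temperature swing raised to a negative exponent. Every value below (power, thermal resistances, ambient, exponent) is a made-up illustration and not measured data for any Nvidia part or any notebook.

```python
def steady_state_temp(power_w, r_theta_c_per_w, ambient_c=25.0):
    """Lumped model: temperature reached once heat in equals heat out."""
    return ambient_c + power_w * r_theta_c_per_w


def relative_cycle_life(delta_t, delta_t_ref, exponent=2.0):
    """Coffin-Manson style ratio of cycles-to-failure versus a reference swing."""
    return (delta_t_ref / delta_t) ** exponent


AMBIENT_C = 25.0   # hypothetical room temperature
POWER_W = 20.0     # hypothetical chip power
R_DESKTOP = 2.0    # hypothetical thermal resistance, C per W (better cooling)
R_LAPTOP = 3.0     # hypothetical thermal resistance, C per W (poorer cooling)

desktop_hot = steady_state_temp(POWER_W, R_DESKTOP, AMBIENT_C)
laptop_hot = steady_state_temp(POWER_W, R_LAPTOP, AMBIENT_C)

# Swing between powered-off (ambient) and fully warmed up.
desktop_swing = desktop_hot - AMBIENT_C
laptop_swing = laptop_hot - AMBIENT_C
life_ratio = relative_cycle_life(laptop_swing, desktop_swing)

print(f"desktop peak {desktop_hot:.0f} C, laptop peak {laptop_hot:.0f} C")
print(f"laptop cycles-to-failure relative to desktop: {life_ratio:.2f}x")
```

With these invented inputs the laptop part survives fewer than half as many cycles as the desktop part, and that is before counting that a notebook is switched on and off more often. The real exponent and thermal resistances depend on the solder, package and chassis, so treat this only as a way to see the shape of the argument.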
#8
"Augustus" wrote in message news:sR8dk.2961$1o6.2773@edtnps83
"rjn" wrote in message ...
"Augustus" wrote:
From a 100% reliable source like theinquirer.net.

True enough, but The Inq is sticking by their reading of it:
"All Nvidia G84 and G86s are bad"
http://www.theinquirer.net/gb/inquir...vidia-g84-g86-bad
I have no opinion on the matter.

A better headline would be "All G84 and G86's are lousy 3D performers". Do you seriously believe that every single 8300GS, 8400GS, 8500GT, 8600GS, 8600GT and 8600GTS of the how many millions ever made are faulty? Unlikely....

If the numbers were minimal you'd expect Nvidia to defuse the situation by providing the data. Instead it has lied at every stage. First it said there was no problem. Then it said it was only one bad batch that went to HP. Then HP and Dell said there is a problem (HP North America has extended the warranty on its affected models by 24 months and Dell is under pressure to follow suit), yet Nvidia won't even publicly identify the problematic GPUs, let alone talk numbers. The collective corporate refusals to reduce customer angst are indirect evidence that the problem is much bigger than anyone wants to acknowledge.
#9
"Mr.E Solved!" wrote:
Mobile units are under frequent hot-cold cycles and generally have poorer cooling solutions so they run hotter so the change in temps is greater during their cycle.

Fudzilla has now piled on, with the same perspective:
"Nvidia having issues with desktop GPUs, as well"
http://www.fudzilla.com/index.php?option=com_content&task=view&id=8730

"What it boils down to is the solder material between the chip and the packaging being sub-standard. Issues only occur if the GPU is heated up and cooled down repeatedly, much as it would be in a laptop ..."

Let me guess that this is not lead (Pb)-based solder. Assuming the purported failure reports, and presumed root cause, are true, is this one of the unintended (but entirely predicted) side effects of RoHS?

--
Regards, Bob Niland
http://www.access-one.com/rjn
email4rjn AT yahoo DOT com
NOT speaking for any employer, client or Internet Service Provider.
#10
rjn wrote:
"Mr.E Solved!" wrote:
Mobile units are under frequent hot-cold cycles and generally have poorer cooling solutions so they run hotter so the change in temps is greater during their cycle.

Fudzilla has now piled on, with the same perspective:
"Nvidia having issues with desktop GPUs, as well"
http://www.fudzilla.com/index.php?option=com_content&task=view&id=8730

"What it boils down to is the solder material between the chip and the packaging being sub-standard. Issues only occur if the GPU is heated up and cooled down repeatedly, much as it would be in a laptop ..."

Let me guess that this is not lead (Pb)-based solder. Assuming the purported failure reports, and presumed root cause, are true, is this one of the unintended (but entirely predicted) side effects of RoHS?

--
Regards, Bob Niland
http://www.access-one.com/rjn
email4rjn AT yahoo DOT com
NOT speaking for any employer, client or Internet Service Provider.

So far, every RoHS disadvantage has been demonstrated: cracks, brittleness, susceptibility to thermal cycling. All we need is some tin whiskers and the worst case scenario has come to pass.

Don't blame RoHS however, as other industries besides the computer manufacturing industry have used RoHS to advantage, such as the automotive industry. I'm guessing TSMC or whatever foundry made these chips simply had bad materials and/or technique from lack of experience, assuming no malice was involved. Who can say at this point?