#1
NVIDIA GT200 GPU and Architecture Analysis
http://www.beyond3d.com/content/reviews/51
Published on 16th Jun 2008, written by Rys for Consumer Graphics - Last updated: 15th Jun 2008

Introduction

Sorry G80, your time is up. There's no arguing that NVIDIA's flagship D3D10 GPU has held a reign over 3D graphics that never truly saw it usurped, even by G92 and a dubiously named GeForce 9-series range. The high-end launch product based on G80, GeForce 8800 GTX, is still within spitting distance of anything that's come out since in terms of raw single-chip performance. It flaunts its 8 clusters, 384-bit memory bus and 24 ROPs in the face of G92, meaning that products like 9800 GTX have never really felt like true upgrades to owners of G80-based products. That I type this text on my own PC powered by a GeForce 8800 GTX, one that I bought -- which is largely unheard of in the world of tech journalism; as a herd, we never usually buy PC components -- with my own hard-earned, and on launch day no less, speaks wonders for the chip's longevity. I'll miss you, old girl; your 20 month spell at the top of the pile is now honestly up.

So what chip the usurper, and how far has it moved the game on? Rumours about GT200 have swirled for some time, and recently the rumour mill has mostly got it right. The basic architecture is pretty much a known quantity at this point, and it's a basic architecture that shares a lot of common ground with the one powering the chip we've just eulogised. Why mess too much with what's worked so well, surely? "Correctamundo", says the Fonz, and the Fonz is always right. It's all about the detail now, so we'll try and reveal as much as possible to see where the deviance can be found. We'll delve into the architecture first, before taking a look at the first two products it powers, looking back to previous NVIDIA D3D10 hardware as necessary to paint the picture.

NVIDIA GT200 Overview

The following diagram represents a high-level look at how GT200 is architected and what some of the functional units are capable of. It's a similar chip to G80, of that there's no doubt, but the silicon surgery undertaken by NVIDIA's architects to create it means we have quite a different beast when you take a look under the surface.

http://www.beyond3d.com/images/revie...2-26-05-08.png

If it's not clear from the above diagram, like G80, GT200 is a fully-unified, heavily-threaded, self load-balancing (full time, agnostic of API) shading architecture. It has decoupled and threaded data processing, allowing the hardware to fully realise the goal of hiding sampler latency by scheduling sampler threads independently of, and asynchronously with, shading threads. The design goals of the chip appear to be the improvement of D3D10 performance in general, especially at the Geometry Shader stage, with the end result presumably as close to doubling the performance of a similarly clocked G92 as possible. There's not 2x the raw performance available everywhere on the chip of course, but the increase in certain computation resources should see it achieve something like that in practice, depending on what's being rendered or computed.

Let's look closer at the chip architecture, then. The analysis was written with our original look at G80 in mind. The architecture we discussed there is the basis for what we'll talk about today, so have a good read of that to refresh your memory, and/or ask in the forums if anything doesn't make sense.
The original piece is a little outdated in places, as we've discovered more about the chip over the last year and a half, so just ask about, or let us know about, anything that doesn't quite fit.

GT200: The Shading Core

http://www.beyond3d.com/images/revie...hader-core.png

GT200 demonstrates subtle yet distinct architectural differences when compared to G80, the chip that pioneered the basic traits of this generation of GPUs from Kirk and Co. As we've alluded to, G80 led a family of chips that have underpinned the company's dominance over AMD in the graphics space since its launch, so it's no surprise to see NVIDIA stick to the same themes of execution, use of on-chip memories, and approach to acceleration of graphics and non-graphics computation.

At its core, GT200 is a MIMD array of SIMD processors, partitioned into what we call clusters, with each cluster a 3-way collection of the shader processor blocks we call SMs. Each SM, or streaming multiprocessor, comprises 8 scalar ALUs, each capable of FP32 and 32-bit integer computation (the only exception being multiplication, which is INT24 and therefore still takes 4 cycles for INT32), a single 64-bit ALU for brand new FP64 support, and a discrete pool of shared memory 16KiB in size.

The FP64 ALU is notable not just in its inclusion, NVIDIA supporting 64-bit computation for the first time in one of its graphics processors, but in its ability. It's capable of a double precision MAD (or MUL or ADD) per clock, supports 32-bit integer computation, and, somewhat surprisingly, signalling of a denorm at full speed with no cycle penalty, something you won't see in any other readily available DP processor (such as any x86 or Cell). The ALU uses the MAD to accelerate software support for specials and divides, where possible.

Those ALUs are paired with another per-SM block of computation units, just like G80, which provides scalar interpolation of attributes for shading and a single FP-only MUL ALU. That lets each SM potentially dual-issue 8 MAD+MUL instruction pairs per clock for general shading, with the MUL also assisting in attribute setup when required. However, as you'll see, that dual-issue performance depends heavily on input operand bandwidth.

Each warp of threads still runs for four clocks per SM, with up to 1024 threads managed per SM by the scheduler (which has knock-on effects for the programmer when thinking about thread blocks per cluster). The hardware still scales back threads in flight if there's register pressure of course, but that's going to happen less now the RF has doubled in size per SM (and it might happen more gracefully now to boot). So, along with that pool of shared memory is a connection to a per-SM register file comprising 16384 32-bit registers, double that available for each SM in G80.

Each SP in each SM runs the same instruction per clock as the others, but each SM in a cluster can run its own instruction. Therefore in any given cycle, SMs in a cluster are potentially executing different instructions in a shader program, in SIMD fashion. That goes for the FP64 ALU per SM too, which could execute at the same time as the FP32 units, but it shares datapaths to the RF, shared memory pools, and scheduling hardware with them, so the two can't go full-on at the same time (presumably it takes the place of the MUL/SFU, but perhaps it's more flexible than that). Either way, it's not currently exposed outside of CUDA or used to boost FP32 performance.
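To put the doubled register file and the 1024-thread limit in perspective, here's a back-of-envelope occupancy sketch built only from the figures above (16384 registers per SM, 1024 threads, 32-wide warps from 8 SPs over four clocks). The helper is our own hypothetical illustration, not an NVIDIA formula or API:

// Rough per-SM occupancy sketch from the GT200 figures quoted above.
// Helper name and rounding are our own illustration, not NVIDIA documentation.
#include <algorithm>
#include <cstdio>

int residentThreadsPerSM(int regsPerThread,
                         int regFileEntries = 16384, // 32-bit registers per SM (2x G80)
                         int maxThreads     = 1024,  // scheduler limit per SM
                         int warpSize       = 32)    // 8 SPs x 4 clocks per warp
{
    int byRegisters = regFileEntries / regsPerThread;
    int threads = std::min(byRegisters, maxThreads);
    return (threads / warpSize) * warpSize;          // whole warps only
}

int main()
{
    // At 16 registers per thread the full 1024 threads still fit; heavier kernels
    // scale back, but only half as hard as on G80's smaller register file.
    printf("%d\n", residentThreadsPerSM(16)); // 1024
    printf("%d\n", residentThreadsPerSM(32)); // 512
    return 0;
}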
That covers basic execution across a cluster using its own memory pools. Across the shader core, each SM in each cluster is able to run a different instruction for a shader program, giving each SM its own program counter, scheduling resources, and discrete register file block. A processing thread started on one cluster can never execute on any other, although another thread can take its place every cycle. The SM schedulers implement execution scoreboarding and are fed from the global scheduler and per thread-type setup engines, one for VS, one for GS and one for PS threads.
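Before moving on to sampling, here's a minimal CUDA kernel sketch to make those per-SM resources concrete: each thread block stages its inputs in the SM's 16KiB shared memory pool and issues the FP32 MAD that the scalar ALUs are built around. The kernel, its names and the launch configuration are our own illustrative assumptions, not code from NVIDIA or Beyond3D.

#include <cuda_runtime.h>
#include <cstdio>

// Illustrative only: each block stages data in the SM's shared memory
// (1KiB per array, well inside the 16KiB pool) and each thread retires
// a single FP32 MAD, the operation each scalar ALU handles per hot clock.
__global__ void madFromShared(const float* a, const float* b, float* out, int n)
{
    __shared__ float sa[256];
    __shared__ float sb[256];

    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        sa[threadIdx.x] = a[i];
        sb[threadIdx.x] = b[i];
    }
    // Shared memory is private to the SM, so this only synchronises the
    // threads of this block -- all of which live on the same SM.
    __syncthreads();

    if (i < n)
        out[i] = sa[threadIdx.x] * sb[threadIdx.x] + 1.0f;  // one MAD per thread
}

int main()
{
    const int n = 1 << 20;
    float *a, *b, *out;
    cudaMalloc(&a,   n * sizeof(float));
    cudaMalloc(&b,   n * sizeof(float));
    cudaMalloc(&out, n * sizeof(float));

    // 256 threads per block stays well inside GT200's per-block limits while
    // letting the scheduler pack several blocks per SM, up to 1024 threads.
    madFromShared<<<n / 256, 256>>>(a, b, out, n);
    cudaDeviceSynchronize();

    cudaFree(a); cudaFree(b); cudaFree(out);
    return 0;
}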
#2
NVIDIA GT200 GPU and Architecture Analysis
GT200: Sampling and the ROP

http://www.beyond3d.com/images/revie...0-arch/tpc.png

For data fetch and filtering, each cluster is connected to its own discrete sampler unit (the cluster plus samplers being called the texture processing cluster, or TPC, by NVIDIA), with each one able to calculate 8 sample addresses and bilinearly filter 8 samples per clock. That's unchanged compared to G92, but it's worth pointing out that prior hardware could never reach the bilinear peak outside of (strangely enough) scalar FP32 textures. It's now obtainable (or at least much closer) thanks to, according to NVIDIA, tweaks to the thread scheduler and sampler I/O. We still heavily suspect, though, that one of the key reasons is additional shared INT16 hardware for what we imagine is actually a shared addressing/filtering unit. Either way, each sampler has a dedicated L1 cache, which is likely 16KiB, and all sampler units share a global L2 cache that we believe is double the size of that in G80 at 256KiB.

The sampler hardware runs at the chip base clock, whereas the shading units run at the chip hot clock, which is most easily thought of as being 2x the scheduler clock. Along with the memory clock, those clocks comprise the main domains in GT200, just like they did in G80.

The hardware is advertised as supporting D3D10.0, since its architecture is marginally incapable of supporting 10.1, by virtue of the ROP hardware. D3D10 compliance means the ability in hardware to recycle data from the GS stage of the computation model back through the chip for another pass. The output buffer for that is six times larger in GT200 than in G80, although NVIDIA don't disclose the exact size. Given that the GS stage is capable of data amplification (and de-amplification of course), the increased buffer size represents a significant change in what the architecture is capable of in a performance sense, if not a theoretical sense. The same per-thread output limits are present, but more GS threads can now be run at the same time. That covers the changes to on-chip memories that each cluster has access to.

Quickly returning to the front of the chip, it appears that the hardware can still only set up a single triangle per clock, and the rasteriser is largely unchanged. Remember that in G80, the rasteriser worked on 32 pixel blocks, correlating to the pixel batch size. GT200 continues to work on the same size pixel blocks as it sends the screen down through the clusters as screen tiles for shading.

http://www.beyond3d.com/images/revie...0-quad-rop.png

At the back of the chip, after computation via each TPC, the same basic ROP architecture as G80 is present. With the external memory bus 512 bits wide this time and each 64-bit memory channel serving a ROP partition, that means 8 ROP partitions, each partition housing a quartet of ROP units. 32 in total, then. Each ROP is now capable of a full-speed INT8 or FP16 channel blend per cycle, whereas G80 needed two cycles to complete the same operations. This guarantees that blending isn't ROP limited, which could already be the case on G80 and would have become even more of a problem with a higher memory/core clock ratio. It might also initially seem odd that FP16 is supported at full speed despite being certainly bandwidth limited, but remember that full-speed FP16 also means that 32-bit floating point pixels made up of three FP10 channels for colour and 2 bits for alpha also go faster for free, and that's not easy to do otherwise. The ROP partitions talk to GDDR3 memory only in GT200.
We mention that in passing since it affects how the architecture works due to burst length, where you need to be sure to match what the DRAM wants every time you feed it or ask for data in any given clock cycle, especially when sampling. GDDR4 support seems non-existent, and we're certain there's no GDDR5 support in the physical interface (PHY) either. The number of ROP partitions means that, with suitably fast memory, GT200 easily joins that exclusive club of microprocessors with more than 100GB/sec to their external DRAM devices. No other class of processor in consumer computing enjoys that at the time of writing. The ROP also improves on peak compression performance compared to both G80 and G92, allowing it to do more with the available memory bandwidth, not that 512-bit and fast graphics DRAMs mean there's a lack of the stuff available to GT200-based SKUs, more on which later.

That's largely it in terms of the chip's new or changed architectural traits in a basic sense. The questions posed now mostly become ones of scheduling changes, and how memory access differs when compared to prior implementations of the same basic architecture in the G8x and G9x family of GPUs.

GT200: General Architecture Notes

We mentioned that the big questions posed now mostly become ones of scheduling changes, and how memory access differs when compared to prior implementations of the same basic architecture in the G8x and G9x family of GPUs. Where it concerns the former question, it becomes prudent to wonder whether the 'missing' MUL is finally available for general shading (along with the revelation about its inclusion in G8x and G9x, which we might one day share). We've been able to verify freer issue of the instruction in general shading, but not near the theoretical peak when the chip is executing graphics codes. NVIDIA mention improvements to register allocation and scheduling as the reason behind the freer execution of the MUL, and we believe them. However, it looks likely that it's only able to retire a result every second clock because of operand fetch in graphics mode, effectively halving its throughput. In CUDA mode, operand fetch seems more flexible, with throughput nearer peak, although we've not spent enough time with the hardware yet to really be perfectly sure. Regardless, at this point it seems impossible to extract the peak figure of 933Gflops FP32 with our in-house graphics codes. How much this matters depends on whether you can use the MUL implicitly through attribute interpolation the rest of the time, which we aren't sure about just yet either. After that it's probably best to worry about GS performance in D3D10 graphical applications, which we'll do when it comes time to benchmark the hardware.

The new output buffer size increase is one of the bigger architectural differences, maybe even more so than the addition of the extra SM per cluster. Adoption of the GS stage in the D3D10 pipe has undoubtedly been held back a little by the typical NVIDIA tactic of building just enough in silicon to make a feature work, but building too little to make it immediately useful.

The increase in register file, a doubling over the number of per-SM registers available to G8x and G9x chips, means that there's less pressure for the chip to decrease the number of possible in-flight threads, letting latency hiding from the sampler hardware (it's the same 200+ cycles of latency to DRAM as with G80, from the core clock's point of view) become more effective than it ever has done in the past with this architecture.
Performance becomes freer and easier, in other words, the schedulers more able to keep the cluster busy under heavy shading loads. Developers now need to worry less about their utilisation of the chip, not that we guess many really were with G80 and G92. The other G8x and G9x parts have different performance traits for a developer to consider there, given how NVIDIA (annoyingly in the low-end, from a developer perspective) scaled them down from the grandfather parts.

That per-SM shared memory didn't increase is interesting too. The way the CUDA programming model works means that a static shared memory size across generations is attractive for the application developer. He or she doesn't have to tweak their codes too much to make the best use of GT200, given that shared memory size didn't change. However, given that CUDA codes will have to be rewritten for GT200 anyway if the application developer wants to make serious use of FP64 support.... ah, but that's comparatively slow in GT200, and heck, 16KiB for every SM is a fair aggregate chunk of SRAM when multiplied out across the whole chip. 1.4B transistors sounds like room to breathe, but we doubt NVIDIA see it as an excuse to be so blasé about on-chip SRAM pools, even if they are inherently redundant parts of the chip which will help yields of the beast.

Minor additional notes about the processing architecture include improvements to how the input assembler can communicate with the DRAM devices through the memory crossbar, allowing more efficient indexing into memory contents when fetching primitive data, and a larger post-transform cache to help feed the rasteriser a bit better. Primitive setup rate is unchanged, which is a little disappointing given how much you can be limited there during certain common and intensive graphics operations. Assuming there's no catch, this is likely one of the big reasons why performance improvements over G80 are more impressive at ultra-high-end resolutions (along with the improved bilinear filtering and ALU performance, which also become more important there).

GT200: Thoughts on positioning and the NVIO Display Pipe

It's easy enough to be blasé as the writer talking about the architecture. Here's hoping the differences present don't add up to conclusions of "it's just a wider G80" in the technical press. It's a bit more than that, when surfaces are scratched (and sampled and filtered, since we're talking about graphics). The raw numbers do tell a tale, though, and it's no small piece of silicon even in 55nm form as a 'GT200b'. In fact, it's easily the biggest single piece of silicon ever sold to the general PC-buying populace, and we're confident it'll hold that crown until well into 2009.

When writing about GT200 I've found my mind wandering to that horribly cheesy analogy that everyone loves to read about from the linguistically-challenged technical writer. What do I compare it to that everyone will recognise, that does it justice? I can't help but imagine the Cloverfield monster wearing a dainty pair of pink ballerina shoes, as it destroys everything in the run to the end game. Elegant brawn, or something like that. You know what I mean. That also means I get to wonder out loud and ask if ATI are ready to execute the Hammer Down protocol. They'll need to if they want to conquer a product stack that'll see NVIDIA make use of cheap G92 and G92b (55nm) based products underneath the GT200-based models it's introducing today.
That leads us on nicely to talking about how NVIDIA can scale GT200 in order to have it drive multiple products, scaled not just in clock but in enabled unit count. GT200 can be scaled in terms of active cluster count and the number of active ROP partitions, at a basic level. At a more advanced level, the FP64 ALU can be removed freely, and we fully expect that to be the case for lower-end derivatives. For this chip, though, it follows the same redundancy and product scaling model that we famously saw with G80 and then G92. So initially we'll see a product based on the full configuration of 10 clusters and 8 ROP partitions, with the full 512-bit external memory bus that brings. Along with that there'll be an 8 cluster model with a 448-bit memory interface (so a single ROP partition disabled there); the unit-count arithmetic is sketched at the end of this post. Nothing exciting then, and what one would reasonably expect given the past history of chips with the same basic architecture.

Display Pipe

We've tacked it on to the back end of the architecture discussion, but it's worth mentioning because of how it's manifest in hardware. As far as the display pipe goes, you've got the same 10bpc support as G80, and it's via NVIO (a new revision) again this time. The video engine is almost a direct cut and paste from G84, G92 et al, so we get to call it VP2 and grumble under our breath about the overall state of PC HD video in the wake of HD DVD losing out to BluRay. It's based on Tensilica IP (just like AMD's UVD), NVIDIA using the company's area-efficient DSP cores to create the guts of the video decode hardware, with the shader core used to improve video quality rather than assist in the decode process. The chip supports a full range of analogue and digital display outputs, including HDMI with HDCP protection, as you'd expect from a graphics product in the middle of 2008. As for DisplayPort support.... it's possible, but that's up to the board vendor and whether they want to use an external transmitter. Portunately, they can.
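Picking up the unit-count scaling mentioned above, the headline configuration figures fall straight out of the cluster and ROP-partition counts. The quick sketch below only restates the article's per-cluster and per-partition figures; the struct and helper names are our own:

#include <cstdio>

// Unit-count arithmetic for the two GT200 configurations described above.
struct Config { int clusters; int ropPartitions; };

void describe(Config c)
{
    int sps      = c.clusters * 3 * 8;     // 3 SMs per cluster, 8 FP32 SPs per SM
    int fp64     = c.clusters * 3;         // one FP64 ALU per SM
    int rops     = c.ropPartitions * 4;    // a quartet of ROPs per partition
    int busWidth = c.ropPartitions * 64;   // one 64-bit memory channel per partition
    printf("%2d clusters: %d SPs, %d FP64 ALUs, %d ROPs, %d-bit bus\n",
           c.clusters, sps, fp64, rops, busWidth);
}

int main()
{
    describe({10, 8});   // full chip:     240 SPs, 30 FP64, 32 ROPs, 512-bit
    describe({ 8, 7});   // cut-down SKU:  192 SPs, 24 FP64, 28 ROPs, 448-bit
    return 0;
}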
#3
NVIDIA GT200 GPU and Architecture Analysis
Physicals and GeForce GTX 200 Products
Physically

So what about GT200 physically? We need to talk about that before we discuss the products it's going to enable initially, because those physical properties have the most direct effect on board-level metrics like size, power, heat and noise. NVIDIA won't say exactly how big, but we're pretty confident based on a variety of data that the chip is just about 600mm² in size, at roughly 24.5mm x 24.5mm. It's built by TSMC on their 65nm major process node, and it's the biggest mass-market chip they've ever produced. It's 1.4 billion transistors heavy, and comes clocked in the same rough ranges that G80 was. We'll note the exact clocks for the two launch SKUs shortly. The wafer shot on page one hints at ~90-95 candidate dice per 300mm start. That's not a lot. We'd call it the Kim Kardashian's ass of the graphics world, but that wouldn't be appropriate for such a sober, laugh-free publication as ours.

The package for the chip has 2485 pins on the underside as you can see (compared to 2140 on the already massive R600), connecting the processor to the NVIO chip (which handles all I/O signalling), to power, and to the connected DRAMs. A significant portion of the pins are for the power plane, more than the chip needs for I/O and memory connectivity combined.

In terms of clocking, the chip sports more aggressive levels of clock gating, helping GT200 achieve considerably lower idle power draw, at about 25W, than a high-end G92-based product like GeForce 9800 GTX (about 3x higher or so). Interestingly, this clock gating helps keep average power consumption a bit lower than you'd expect given the TDP. The chip also has a dedicated power mode for video playback, turning off a significant percentage of the chip to achieve low power figures when pretty much only the VP2 silicon and NVIO are working. The chip supports HybridPower, which lets an IGP on the mainboard be responsible for display output when the chip is mostly idle, including when using it to display HD video content via VP2. The potential power consumption savings and noise benefits are quite large, especially in SLI, should your mainboard support a compatible IGP from NVIDIA.

GeForce GTX 200 Series

NVIDIA are redefining the product name and model numbering scheme for GeForce, starting with GT200-based products. The first two SKUs are called GTX 260 and GTX 280, the prefix defining the rough segment of performance and the number the relative positioning in that segment. We wonder if the first digit in the number will denote the D3D class, with 2 correlating to D3D10. GTX 3xx for D3D11 or D3D10.1? We doubt even NVIDIA Marketing knows at this point.

GeForce GTX 280

http://www.beyond3d.com/images/revie...gtx280-big.jpg

The GTX 280 is the current GT200-based flagship. It uses GT200 with all units enabled, giving it 240 FP32 SPs, 30 FP64 SPs, a 512-bit external memory bus and 1024MiB of connected DRAMs. The chip is clocked at 602MHz for the base clock, 1296MHz for the hot clock, and thus 648MHz for the global scheduler. DRAM clock is 1107MHz, or 2214MHz effective. That gives rise to headline figures of 933Gflops peak FP32 programmable shading rate (1296 x 240 x 3), 77.7Gflops DP (1296 x 30 x 2), 141.7GB/sec of memory bandwidth, 48.16Gsamples/sec bilinear and nearly 20Gpixel/sec colour fill out of the ROPs. Board power at peak is 236W, and it requires two power connections, one of them the newer 8-pin standard, the other the established 6-pin.
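Those headline figures are straightforward products of the clocks and unit counts listed above. The short check below restates the arithmetic; the graphics-mode line is our own inference from the half-rate MUL behaviour discussed earlier, and none of this is an NVIDIA-published formula:

#include <cstdio>

int main()
{
    // GeForce GTX 280 clocks and unit counts from above.
    const double hotClk  = 1296e6;   // shader hot clock, Hz
    const double baseClk = 602e6;    // base clock, Hz
    const double memClk  = 1107e6;   // GDDR3 clock, Hz (2214MHz effective)
    const int    sps = 240, fp64Alus = 30, bilerps = 80, rops = 32;

    // Peak FP32: a MAD (2 flops) dual-issued with the MUL (1 flop) per SP per clock.
    printf("FP32 peak  : %.0f Gflops\n", hotClk * sps * 3 / 1e9);        // 933
    // Graphics mode, assuming the MUL retires only every other clock (2 + 0.5 flops).
    printf("FP32 gfx   : %.0f Gflops\n", hotClk * sps * 2.5 / 1e9);      // ~778
    // FP64: one DP MAD per FP64 ALU per clock.
    printf("FP64 peak  : %.2f Gflops\n", hotClk * fp64Alus * 2 / 1e9);   // 77.76
    // Memory: 512-bit bus, double data rate GDDR3.
    printf("Bandwidth  : %.1f GB/s\n", memClk * 2 * (512 / 8) / 1e9);    // 141.7
    // Texturing: 8 bilinear samples per clock per cluster, 10 clusters.
    printf("Bilinear   : %.2f Gsamples/s\n", baseClk * bilerps / 1e9);   // 48.16
    // Fill: 32 ROPs at the base clock.
    printf("Colour fill: %.1f Gpixels/s\n", baseClk * rops / 1e9);       // 19.3
    return 0;
}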
Sadly, there's no way to run the board with only 6-pin power, because of the extra demands on power draw in graphics mode. The cooler is dual-slot, exhausting air out of the case via the backplane, using a cooler similar to that seen recently with GeForce 9800 GTX, and earlier with GeForce 8800 GTS 512. The backplane sports two dual-link DVI ports, with HDMI possible via an active dongle connected to either port. HDTV output via component is supported too, via the smaller 7-pin analogue connector. All GeForce GTX 280s will support HDCP on both outputs, with dual-link encryption. The board also supports two SLI connectors, for 3-way SLI ability.

GeForce GTX 260

http://www.beyond3d.com/images/revie...gtx260-big.jpg

The GTX 260 shares almost identical physical properties, including a very similar PCB and component layout, an identical cooler and display output options, and 3-way SLI support. With clocks of 576MHz base, 1242MHz hot, 621MHz global scheduler and 999MHz (1998MHz effective) memory, along with the disabling of two clusters and a ROP channel, the power requirements are lessened. The disabled ROP channel means 896MiB of connected GDDR3. Peak board power is 182 watts, and power supply is just 2 x 6-pin this time. The clocks give rise to headline numbers of 715Gflops FP32 via 192 SPs, 59.6Gflops DP via 24 FP64 SPs, 111.9GB/sec memory bandwidth via a 448-bit memory interface, 36.9Gsamples/sec bilinear, and 16.1Gpixels/sec colour fillrate out of the ROPs.

Both boards are PCI Express 2.0 native as mentioned, and both sport the VP2 silicon as you know. Our 3D Tables will let you compare to older products to see where things have improved, if you didn't pick it up from the text.

Architecture Summary

Because GT200 doesn't implement a brand new architecture or change possible image quality compared to G80 or G92, we've been able to skip discussion of large parts of the chip simply because they're unchanged. There's nothing new to talk about in terms of maximum per-pixel IQ, because the crucial components of the chip that make that all happen have no improvements or changes to speak of. It's purely a question of performance and how that's derived.

http://www.beyond3d.com/images/revie...200die-big.jpg

If you've got your wonky monocle on (every graphics enthusiast has one, so they can squint and replicate Quincunx in real-time, with REAL pixels), it's possible to look at GT200, see 1.4B transistors, and wonder why it isn't 2x G92 across the board, because, after all, it's nearly double the transistor count. The reality is that transistors have been spent elsewhere in the chip, however, for CUDA among other things. Furthermore, and perhaps more importantly, some potential minor bottlenecks such as triangle setup remain seemingly unchanged while clocks also went down.

The stark reality is that GT200 has more of an eye on CUDA and non-graphics compute than any other NVIDIA processor before, and it speaks volumes, especially as they continue to ramp up the CUDA message and deploy Tesla into outposts of industry the company would previously have no business visiting. Oil and gas, computational finance, medical imaging, seismic exploration, bioinformatics and a whole host of other little niches are starting to open up, and what's primarily a processor designed for drawing things on your screen is now tasked with doing a lot more these days. The base computational foundation laid down by G80 now has DP and full-speed denormal support, which is no small matter as a new industry grows up.
We'll cover that separately, since we took in a recent Editor's Day in Satan Clara related to just that. We've not been able to spend much time with hardware to date, but we've been able to throw some graphics and CUDA programs at a chip or two (and a special thank you to the guy who helped me run some simple D3D9 on a GTX 280 last night, très bien monsieur!). Performance is high when testing theoretical rates in the shader core, and although we can't see NVIDIA's claimed 93-94% efficiency when scheduling the SFU, ~1.5 MULs/clock/SP is easy enough with graphics code, and we see a fair bit higher under CUDA. That contrasts with the same shaders on G80, where the MUL is resolutely missing in graphics mode, regardless of what you dual-issue with it. We can almost hit the bilinear peak when texturing, which bears out the general performance claims there, and if I could be bothered to reboot to Windows and fire up Photoshop, I could make a version of the old 70Gpixel/sec Z-only image, only this time it'd be around the 1/10th of a terazixel mark. That's right, I said terazixel. We haven't measured blend rate yet, it's next on our list to do properly, but we're confident the theoretical figures will be borne out during testing.

With 3D games (and it's prudent to talk about the split execution personalities of the chip, a graphics mode and a compute mode, because they affect performance and power consumption), 30-100% or so more performance is visible with GeForce GTX 280 compared to a single GeForce 9800 GTX, depending on the game and resolution tested (of course, caveats ahoy here). Quite a swing, and the upper end of that range seems to be consumed by D3D9 app performance. Compared to the multi-GPU product du jour right now, GeForce 9800 GX2, it does less well and is often beaten at high resolution, depending on memory usage. Yes, we're waving our hands around a bit here without graphs to back it up, so we forward you to the fine fellows at The Tech Report and Hardware.fr for more data.

As the single-chip paragon of graphics virtue and greater internets justice, GT200 has no peers, and GeForce GTX 280 looks like a fine caddy for this latest of our silicon overlords. We question heavily whether the asking price is worth it, with GTX 260 looking much better value there if you must buy GT200 right now, but we daren't say much more until we've had proper hands-on testing with games and some more CUDA codes. There's something about having 512-bit and 1GiB on one chip, though.

We mentioned at the top of the article that G80 has finally been truly usurped in our eyes for 3D. At a simple level, 1.4B transistors holding huge peak bilinear texturing rates, 256 Z-only writes per clock and around 780 available Gflops in graphics mode will tend to do that. More on GT200 and its competition over the coming couple of weeks, and keep your eyes peeled on the forums for more data before new articles show up.

B3D discussion thread: http://forum.beyond3d.com/showthread.php?t=48563
#4
NVIDIA GT200 GPU and Architecture Analysis
Your copy and paste posts direct from web pages are so helpful to people that don't know how to use a web browser! Here's a great article I found on bowling!

http://www.articlesbase.com/sports-a...es-313261.html

Further Enhancing Your Bowling Strategies
Author: Jimmy Cox

The general style of the advanced bowler is already set. Below are listed pointers for eliminating faults, increasing speed and handling spares. These are a great start to improving your game!

It might be well to point out right here that any change in one's style almost automatically means a temporary drop in average. For instance, if you decide to change your footwork, you might as well face the fact that you will lose points while correcting yourself. The important thing to remember, if and when you are satisfied in your own mind that you are doing something fundamentally wrong, is that by correcting the fault you will bring your average up higher than it was. The best time to do this correction work or practice is in the summertime, when your experiments will not be at the expense of your teammates. During this period, you have three or four months to work out those kinks and to incorporate into your style the correct methods you failed to use previously.

One fault leads to another. It is an axiom of bowling that one key fault can cause two or three other faults. Suppose a bowler takes his first step too fast. That is the key fault, but it also results in poor timing, too fast footwork, and being off balance at the foul line. Another key fault might be allowing the right shoulder to be pulled back and out of line, which brings on such other faults as improperly facing the pins, finishing sideways at the foul line and a poor follow-through. The key fault of lunging at the foul line ruins timing, makes the release jerky, and may cause the bowler to hop.

Get rid of individual faults only when necessary. You may have a particular flaw in your game, but if you do the same thing consistently and successfully, do not change. There are bowlers today averaging 200 who do not have a good follow-through, or who have too high a backswing, or who possess some other fault. But they have learned to incorporate that flaw into their game so well that they are consistent, and their game might fall apart if they attempted to change it. In this regard, I might point out that I am not referring here to those bowlers who are not high average bowlers and are afraid to change, despite the fact that they possess an obvious flaw in their game.

There are several ways in which to increase your speed. You might use any or all of these to succeed. Here they are:

a. Hold the ball higher in your starting position. This will help give you a longer pendulum swing.
b. Use more pushaway when you begin. Push it farther out, if you have been negligent in that phase.
c. Increase your backswing. Perhaps you have been bringing the ball no higher than your waist on the backswing. Remember that you can bring it back as high as the shoulder without violating the fundamental rule in this regard.
d. Work on more perfect timing. Perfect timing gives you the maximum amount of natural speed. If you have had trouble getting good speed, perhaps you have been coming to a full stop at the foul line before your right arm begins its swing. Perfect timing will increase your speed and is far better for you and for your game than trying to force the ball.

Do you play spares properly? Here are the three rules:

a. Face your target from the correct angle. Square your shoulders to the target.
b. Walk directly toward your target. In the cases of the 7-pin and the 10-pin, this means walking directly toward that pin, which will cause you to go to the foul line at a slight angle.
c. Make sure that you have your right arm following through directly toward your target. Get your right arm out to where you are looking, whether this be a pin or a spot.

Work on the above points conscientiously and your game will improve dramatically. Just keep going!
#5
NVIDIA GT200 GPU and Architecture Analysis
Tim O wrote:
Your copy and paste posts direct from web pages are so helpful to people that don't know how to use a web browser! Here's a great article I found on bowling! http://www.articlesbase.com/sports-a...es-313261.html

And here's a great recipe I found that NV55 is sure to enjoy! Bon appétit.

http://www.cooks.com/rec/view/0,193,...238200,00.html

WIENER WATER SOUP

1 pkg. wieners
3 c. water

Combine wieners and water in a two quart saucepan. Bring to a boil until wieners are cooked. Throw the wieners in the garbage. Serve soup. Serves 3.
#6
NVIDIA GT200 GPU and Architecture Analysis
On Mon, 16 Jun 2008 15:26:57 -0400, Tim O wrote:

Your copy and paste posts direct from web pages are so helpful to people that don't know how to use a web browser!

But I don't know how to read. Ya think the moron could come to my house and read it to me?
#7
NVIDIA GT200 GPU and Architecture Analysis

"NV55" wrote in message ...

Yup and they can keep it. Not impressed at all.
#8
NVIDIA GT200 GPU and Architecture Analysis
On Thu, 19 Jun 2008 07:17:09 GMT, "Cool" wrote:

Yup and they can keep it. Not impressed at all.

I won't be dumping my GX2 anyway. :-)