#1
Increasing disk performance with many small files (NTFS / Windows roaming profiles)

Due to applications such as the SAP client and AutoCAD 2002, our users' roaming profiles contain thousands of very small files. I have noticed that the average transfer rate of those small files (~350 bytes in size) over the network is extremely slow compared to normal to large files (300KB up to a few MB). With the normal-sized files I'm seeing transfer rates to the workstations of 4MB to 15MB per second; with the small files this drops to as low as 75KB per second, with an average of ~200KB per second.

The roaming profiles are stored on a RAID5 logical drive with a 64KB stripe size (I think this is the maximum for the Smart Array 5300 controller) and the NTFS partition is formatted with the default 4KB cluster size. The array controller cache is configured 25% read / 75% write to compensate for the slower RAID5 writes.

The server is a Windows 2000 SP4 machine; the workstations are NT4 SP6a. The network is 100Mb switched with a 1000Mb connection to the file server.

Is there anything I can do with the RAID stripe size or the cluster size to increase the throughput of those small files without affecting the transfer speed of the normal-sized files too much? Are there any benchmark programs that I can use to test this? Could the TCP/IP window size be an issue here?

-- Thanks, Benno...
#2
Benno... wrote:
> The roaming profiles are stored on a RAID5 logical drive with a 64KB stripe size (I think this is the maximum for the Smart Array 5300 controller) and the NTFS partition is formatted with the default 4KB cluster size. The array controller cache is configured 25% read / 75% write to compensate for the slower RAID5 writes.

I was thinking: the RAID5 drive consists of 6 disks. Normally, the more spindles the better the performance, but is this also true with those very small files? Could a large number of spindles have a negative performance effect?

-- Benno...
#3
"Benno..." wrote in message ...

> Due to applications such as the SAP client and AutoCAD 2002, our users' roaming profiles contain thousands of very small files. I have noticed that the average transfer rate of those small files (~350 bytes in size) over the network is extremely slow

From your later comments, it sounds as if you already recognize that your performance problem likely has little to do with the network: the performance that you're seeing is consistent with the requirement for a separate disk access for each small file (on a fairly fast disk).

> compared to normal to large files (300KB up to a few MB). With the normal-sized files I'm seeing transfer rates to the workstations of 4MB to 15MB per second; with the small files this drops to as low as 75KB per second, with an average of ~200KB per second. The roaming profiles are stored on a RAID5 logical drive with a 64KB stripe size (I think this is the maximum for the Smart Array 5300 controller) and the NTFS partition is formatted with the default 4KB cluster size. The array controller cache is configured 25% read / 75% write to compensate for the slower RAID5 writes. The server is a Windows 2000 SP4 machine; the workstations are NT4 SP6a. The network is 100Mb switched with a 1000Mb connection to the file server. Is there anything I can do with the RAID stripe size or the cluster size to increase the throughput of those small files without affecting the transfer speed of the normal-sized files too much?

No. The only thing that could help in that area is sufficient cache, either on the array (you might consider changing the read/write balance: if the cache isn't helping much at all now, that may not help much more, but your current heavy skew toward writes may not be helping much either) or in the system file cache, to keep the small files memory-resident.

A file system like ReiserFS that can aggregate many such small files in a single directory node might help, if the accesses to them are clustered within directories. The only analogous approach with NTFS would be somehow to create the files in a clean MFT in the order that they're accessed by the user, and to depend on the disk's read-ahead mechanism to prefetch multiple files at a time (though if other activity is also contending for the disk, that could interfere with the read-aheads, or vice versa if you force read-ahead on every access).

- bill
#4
On Mon, 19 Jul 2004 09:28:50 +0200, "Benno..." wrote:
> Benno... wrote:
>> The roaming profiles are stored on a RAID5 logical drive with a 64KB stripe size (I think this is the maximum for the Smart Array 5300 controller) and the NTFS partition is formatted with the default 4KB cluster size. The array controller cache is configured 25% read / 75% write to compensate for the slower RAID5 writes.
> I was thinking: the RAID5 drive consists of 6 disks. Normally, the more spindles the better the performance, but is this also true with those very small files? Could a large number of spindles have a negative performance effect?

More spindles also give better performance with those very small files. The array controller then has the option to read multiple files simultaneously from different spindles.

Especially with small files you should set the stripe size to the maximum that the controller supports. So it already has the optimum setting. (The idea behind that is that the small files are stored on as few disks as possible, which will increase the chance that you can read several files simultaneously.)

But whatever you do, you might increase performance, but you will never get large transfer rates with small files. The reason is that the time it takes to find the file on the disk and open it is very large compared to the time it takes to transfer it.

The same might also happen on the network. I'm not sure what overhead you get on small files in the network. But you might want to take a look at what happens when you access those small files directly on the server, compared to what happens when you access them over the network.

Are you doing a lot of writing to the array controller? You have to accept that writes will never be fast. Extra cache will not fix that. And it might slow the reads down a bit in such a way that on average the user experience is slower. I don't have personal experience with roaming profiles, but I'd guess they require more read than write capacity.

More cache in the array controller might help, if you have the option to add more.

You could experiment with the cluster size, but I don't think that will help. The cause of the slow performance is the relatively large seek time when accessing small files, and that doesn't change when the cluster size is smaller.

I'm afraid that I can't think of much to improve the situation. Basically the applications shouldn't create so many extremely small files, because that will always hurt performance. (Your backup software is probably not too happy about it either.)

Marc
#5
Marc de Vries wrote:
> On Mon, 19 Jul 2004 09:28:50 +0200, "Benno..." wrote:
>> Benno... wrote:
>>> The roaming profiles are stored on a RAID5 logical drive with a 64KB stripe size (I think this is the maximum for the Smart Array 5300 controller) and the NTFS partition is formatted with the default 4KB cluster size. The array controller cache is configured 25% read / 75% write to compensate for the slower RAID5 writes.
>> I was thinking: the RAID5 drive consists of 6 disks. Normally, the more spindles the better the performance, but is this also true with those very small files? Could a large number of spindles have a negative performance effect?
> More spindles also give better performance with those very small files. The array controller then has the option to read multiple files simultaneously from different spindles.
> The same might also happen on the network. I'm not sure what overhead you get on small files in the network. But you might want to take a look at what happens when you access those small files directly on the server, compared to what happens when you access them over the network.

I set up a test server to do some performance tests. I collected a dataset of 26 profiles (216MB in 46,075 files and 1,773 directories). Copying them from the server to a workstation gives an average speed of 420KB/sec. (The test server is newer and therefore has better-performing disks and array controller than the production server in my previous post. The production server gets around 230KB/sec on the test dataset.) If I copy this dataset on the server itself from the RAID1 boot/system partition to the RAID5 data partition, I see 2500KB/sec.
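One way to read these test figures is per file rather than per byte. The sketch below uses only the numbers quoted in this post and converts each average throughput into an approximate files-per-second rate and total copy time. It assumes 1 MB = 1024 KB, and the function and label names are made up for illustration.

```python
# Dataset figures quoted in the post: 216 MB in 46,075 files.
DATASET_KB = 216 * 1024   # assumption: 1 MB = 1024 KB
FILE_COUNT = 46_075

# Average throughputs quoted in the post, in KB/s. The labels are illustrative.
RATES_KB_S = {
    "test server -> workstation": 420,
    "production server -> workstation": 230,
    "local copy, RAID1 -> RAID5": 2500,
}


def files_per_second(throughput_kb_s: float) -> float:
    """Approximate number of files copied per second at a given throughput."""
    avg_file_kb = DATASET_KB / FILE_COUNT
    return throughput_kb_s / avg_file_kb


def copy_time_minutes(throughput_kb_s: float) -> float:
    """Approximate time to copy the whole dataset, in minutes."""
    return DATASET_KB / throughput_kb_s / 60


if __name__ == "__main__":
    for label, rate in RATES_KB_S.items():
        print(f"{label}: {files_per_second(rate):.0f} files/s, "
              f"{copy_time_minutes(rate):.1f} min")
```

Under these assumptions the network copy from the test server works out to somewhat under 100 files per second, which suggests looking at per-file overhead rather than raw bandwidth.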
#6
"Marc de Vries" wrote in message

> On Mon, 19 Jul 2004 09:28:50 +0200, "Benno..." wrote:
>> Benno... wrote:
>>> The roaming profiles are stored on a RAID5 logical drive with a 64KB stripe size (I think this is the maximum for the Smart Array 5300 controller) and the NTFS partition is formatted with the default 4KB cluster size. The array controller cache is configured 25% read / 75% write to compensate for the slower RAID5 writes.
>> I was thinking: the RAID5 drive consists of 6 disks. Normally, the more spindles the better the performance, but is this also true with those very small files? Could a large number of spindles have a negative performance effect?
> More spindles also give better performance with those very small files.

Nope, only on busy servers that do a lot of them simultaneously.

> The array controller then has the option to read multiple files simultaneously from different spindles.

Therefore it still reads at full stripe-width transfer rates.

> Especially with small files you should set the stripe size to the maximum that the controller supports. So it already has the optimum setting.

Not if this is not a "busy" server. The bigger the stripe size, the more small files sit on a single disk and transfer at single-disk speeds. If that's not compensated for by the sheer number of them, part of which is read simultaneously all the time, then you lose.

> (The idea behind that is that the small files are stored on as few disks as possible, which will increase the chance that you can read several files simultaneously)

So you actually make them slower, in order to read more of them simultaneously. On a not-so-busy server you are ensuring that the small files will transfer even slower compared to doing nothing. Nice one.

When you leave it as it is, which you thought was best, at least you don't make the ones that fill a stripe width slower, and, when they are smaller than that, smaller files automatically fill up the gap when the server is busy and has many outstanding IOs.

> But whatever you do, you might increase performance, but you will never get large transfer rates with small files.

Right, now let that sink in yourself.

> The reason is that the time it takes to find the file on the disk and open it is very large compared to the time it takes to transfer it. The same might also happen on the network. I'm not sure what overhead you get on small files in the network. But you might want to take a look at what happens when you access those small files directly on the server, compared to what happens when you access them over the network. Are you doing a lot of writing to the array controller? You have to accept that writes will never be fast. Extra cache will not fix that.

Not on a busy server, no. And not if the write speed is not disk-related. It will if it is, and the cache can catch up in less busy periods, acting as a buffer.

> And it

What "it"?

> might slow the reads down a bit in such a way that on average the user experience is slower. I don't have personal experience with roaming profiles, but I'd guess they require more read than write capacity. More cache in the array controller might help, if you have the option to add more. You could experiment with the cluster size, but I don't think that will help. The cause of the slow performance is the relatively large seek time when accessing small files, and that doesn't change when the cluster size is smaller.

Actually it does, when already small files fragment because of it.

> I'm afraid that I can't think of much to improve the situation. Basically the applications shouldn't create so many extremely small files, because that will always hurt performance.

Unless they sit on a dedicated drive that is not mechanical in nature: a solid state disk.

> (Your backup software is probably not too happy about it either)

That obviously depends on the type of backup.

> Marc
#7
"Benno" wrote in message
> Benno... wrote:
>> The roaming profiles are stored on a RAID5 logical drive with a 64KB stripe size (I think this is the maximum for the Smart Array 5300 controller) and the NTFS partition is formatted with the default 4KB cluster size. The array controller cache is configured 25% read / 75% write to compensate for the slower RAID5 writes.
> I was thinking: the RAID5 drive consists of 6 disks. Normally, the more spindles the better the performance

Depends on file size and stripe size, really. When a transfer is not the full stripe width, the transfer is slower than optimal.

> but is this also true with those very small files?

Depends on file size and stripe size. You choose your stripe size depending on file size and stripe width. If you change the stripe width without changing the stripe size, then small files may find themselves sitting in a less-than-full stripe (width) and not benefit from the same full-stripe-width transfer rate that bigger files get.

> Could a large number of spindles have a negative performance effect?

Sure, when you don't adjust your stripe size accordingly.
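To make the file-size versus stripe-size point concrete, here is a minimal sketch that counts how many data disks a single contiguous file touches. It rests on several simplifying assumptions that are not from the thread: the 64KB figure is treated as the per-disk stripe unit, a 6-disk RAID5 is modelled as 5 data units per stripe, parity rotation is ignored, and the file is assumed unfragmented. Real controllers may lay data out differently, and the function name is hypothetical.

```python
STRIPE_UNIT = 64 * 1024   # assumption: 64KB is the per-disk stripe unit
DATA_DISKS = 5            # assumption: 6-disk RAID5 = 5 data units per stripe


def data_disks_touched(file_bytes: int, offset_bytes: int = 0,
                       stripe_unit: int = STRIPE_UNIT,
                       data_disks: int = DATA_DISKS) -> int:
    """Number of distinct data disks a contiguous extent spans.

    Simplified model: consecutive stripe units go to consecutive disks,
    so an extent covering N units touches min(N, data_disks) disks.
    """
    if file_bytes <= 0:
        return 0
    first_unit = offset_bytes // stripe_unit
    last_unit = (offset_bytes + file_bytes - 1) // stripe_unit
    units_spanned = last_unit - first_unit + 1
    return min(units_spanned, data_disks)


if __name__ == "__main__":
    for label, size in [("350 B", 350), ("4 KB", 4096),
                        ("300 KB", 300 * 1024), ("2 MB", 2 * 1024 * 1024)]:
        print(f"{label}: {data_disks_touched(size)} data disk(s)")
```

In this model a 350-byte or 4KB file sits on one disk unless it happens to straddle a stripe-unit boundary, while a file of a few hundred KB already spans every data disk.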
#8
"Benno" wrote in message
> Due to applications such as the SAP client and AutoCAD 2002, our users' roaming profiles contain thousands of very small files. I have noticed that the average transfer rate of those small files (~350 bytes in size) over the network is extremely slow compared to normal to large files (300KB up to a few MB). With the normal-sized files I'm seeing transfer rates to the workstations of 4MB to 15MB per second; with the small files this drops to as low as 75KB per second, with an average of ~200KB per second.

512 bytes (one sector) or 4 kB (one cluster) reside in a single 64kB stripe, so the file transfers at single-drive speed. At an STR of 51MB/s this file transfers in .1 ms or .4 ms. With an average access time of 12 ms, your average transfer rate is from (.1/12.1) * 51MB/s = 420kB/s to (.4/12.4) * 51MB/s = 1.65MB/s. Your 350-byte file may run at 350/4096 * 1.65 MB/s = 400KB/s. (And yes, because of that huge difference between access time and actual transfer time, it makes little difference whether the disk system reads a sector or a cluster.)

> The roaming profiles are stored on a RAID5 logical drive with a 64KB stripe size

Any file of 64kB or less is now a small file. Whether it is read in parallel now depends on whether it is fragmented, and how.

> (I think this is the maximum for the Smart Array 5300 controller) and the NTFS partition is formatted with the default 4KB cluster size. The array controller cache is configured 25% read / 75% write to compensate for the slower RAID5 writes. The server is a Windows 2000 SP4 machine; the workstations are NT4 SP6a. The network is 100Mb switched with a 1000Mb connection to the file server. Is there anything I can do with the RAID stripe size or the cluster size to increase the throughput of those small files without affecting the transfer speed of the normal-sized files too much?

Little to none.

> Are there any benchmark programs that I can use to test this? Could the TCP/IP window size be an issue here?

Maybe, for the difference between your 75kB/s and my 400kB/s.
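The arithmetic in this post can be redone as a short sketch. It assumes the same simple model the post uses: every file costs one 12 ms access plus its transfer time at a 51 MB/s sustained rate, with no caching, queuing, or network overhead. The function names are illustrative, and 1 MB is taken as 10^6 bytes. Computed exactly, the transfer times for 512 bytes and 4 KB come out smaller than the rounded .1 ms and .4 ms used in the post, so the resulting rates are lower; both sets of numbers are best treated as order-of-magnitude estimates.

```python
ACCESS_TIME_S = 0.012       # average access time quoted in the post
STR_BYTES_S = 51_000_000    # sustained transfer rate; assumes 1 MB = 10**6 bytes


def transfer_time_ms(nbytes: int) -> float:
    """Pure media transfer time for nbytes, in milliseconds."""
    return nbytes / STR_BYTES_S * 1000


def effective_rate_kb_s(nbytes: int) -> float:
    """Effective throughput in kB/s when each file costs one access plus its transfer."""
    total_s = ACCESS_TIME_S + nbytes / STR_BYTES_S
    return nbytes / total_s / 1000


if __name__ == "__main__":
    sizes = [("350 B file", 350), ("512 B sector", 512),
             ("4 KB cluster", 4096), ("300 KB file", 300 * 1024)]
    for label, size in sizes:
        print(f"{label}: transfer {transfer_time_ms(size):.3f} ms, "
              f"effective {effective_rate_kb_s(size):.0f} kB/s")
```

Under this model the access time dominates completely for anything up to a cluster in size, which is the post's main point: throughput scales almost linearly with file size until the transfer time becomes comparable to the 12 ms access.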
#9
"Benno..." wrote in message ...

> Due to applications such as the SAP client and AutoCAD 2002, our users' roaming profiles contain thousands of very small files. I have noticed that the average transfer rate of those small files (~350 bytes in size) over the network is extremely slow compared to normal to large files (300KB up to a few MB). With the normal-sized files I'm seeing transfer rates to the workstations of 4MB to 15MB per second; with the small files this drops to as low as 75KB per second, with an average of ~200KB per second.

The small files are stored in the MFT, so a single read opens the file and reads the data. Since 10K drives do about 100 IO/s, you won't ever copy more than 100 files/s with a single-threaded copy. Actually, it compares timestamps before copying, but the same argument applies.

The problem is roaming profiles. Create a home directory for each user instead.

> The roaming profiles are stored on a RAID5 logical drive with a 64KB stripe size (I think this is the maximum for the Smart Array 5300 controller) and the NTFS partition is formatted with the default 4KB cluster size. The array controller cache is configured 25% read / 75% write to compensate for the slower RAID5 writes. The server is a Windows 2000 SP4 machine; the workstations are NT4 SP6a. The network is 100Mb switched with a 1000Mb connection to the file server.
#10
"Benno..." wrote in message ...

> I set up a test server to do some performance tests. I collected a dataset of 26 profiles (216MB in 46,075 files and 1,773 directories). Copying them from the server to a workstation gives an average speed of 420KB/sec (the test server is newer and therefore has better-performing disks and array controller than the production server in my previous post.

Try the same experiment twice, once pushing (xcopy on the server) the file set and once pulling (xcopy on the workstation) the file set.

> The production server gets around 230KB/sec on the test dataset). If I copy this dataset on the server itself from the RAID1 boot/system partition to the RAID5 data partition, I see 2500KB/sec.