#1
interfilesystem copies: large du diffs
I recently rsync'd around 2.8TB between a RHEL server (jfs filesystem) and a NetApp system, then did a 'du -sk' against each to verify the transfers:

2894932960 KB, source total
2751664496 KB, destination total

That's a 140GB discrepancy. Subsequent verbose rsyncs have turned up nothing that was not originally transferred. I often notice similar behaviour with smaller transfers between servers with similar OS/filesystem combinations, and have always seen it to some extent with transfers between systems of any type; it's just that the usual discrepancies are magnified in this case by the sheer volume of data. Needless to say, 140GB going missing would be a bit of a problem, and it's not much fun picking through 2.8TB for MIA data. Can anyone shed some light on why this happens?

tia
#2
orgone wrote:
> I recently rsync'd around 2.8TB between a RHEL server (jfs filesystem) and a
> NetApp system, then did a 'du -sk' against each to verify the transfers:
>
> 2894932960 KB, source total
> 2751664496 KB, destination total

"df" uses actual blocks allocated. "du" takes the file size and concludes that all blocks are allocated.

> That's a 140GB discrepancy. Subsequent verbose rsyncs have turned up
> nothing that was not originally transferred. [...] Can anyone shed some
> light on why this happens?

My best guess is that the NetApp somehow handles sparsely allocated files differently, so that "du" sees the blocks actually allocated rather than just the file size taken from the address of the last byte.

An alternate theory that is far less likely: on your source tree you have a history of creating hundreds of thousands of files and then deleting nearly all of them, leaving a lot of very large directories, while on your target tree the directories are much smaller.

Yet another alternate theory: a smaller block/fragment/extent size on the target. On the source, every file would then have a fairly large minimum block count, while on the target smaller files take fewer blocks. You would need very many small files to account for a ~5% difference, but a few hundred thousand files under 512 bytes would do it.
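The sparse-file guess is easy to test locally. Here is a minimal sketch, assuming GNU coreutils (truncate, stat -c, du): the file's apparent size is 100MB, yet almost no blocks are allocated, so 'du' and 'ls -l' disagree wildly.

```shell
#!/bin/sh
# Create a sparse file: 100MB apparent size, but no data blocks written.
set -e
f=$(mktemp)
truncate -s 100M "$f"

apparent=$(stat -c %s "$f")        # bytes, from the file-size field
allocated=$(du -k "$f" | cut -f1)  # kilobytes actually allocated on disk

echo "apparent=${apparent} bytes, allocated=${allocated} KB"
rm -f "$f"
```

Copy such a file with a tool that does not preserve holes and 'du' on the destination reports the full 100MB. GNU du's --apparent-size switch reports the file-size number instead of blocks, which is handy when comparing trees across filesystems with different allocation behaviour.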
#3
In article .com, orgone wrote:
> I recently rsync'd around 2.8TB between a RHEL server (jfs filesystem) and a
> NetApp system, then did a 'du -sk' against each to verify the transfers:
>
> 2894932960 KB, source total
> 2751664496 KB, destination total
>
> That's a 140GB discrepancy. [...] Can anyone shed some light on why this
> happens?

First, this is only a 5% difference; I could easily imagine the difference being much larger. The du command (and the underlying st_blocks field in the result of the stat() system call) reports the amount of space used. But:

- A filesystem uses space not only for the data component (the bytes stored in the files), but also for overhead: directories, per-file overhead like inodes and indirect blocks, and more, often referred to as metadata. How efficiently this overhead is stored varies considerably by filesystem, and whether it is reported as part of the answer from du also varies; in some extreme cases (filesystems that separate their data and metadata physically) the overhead is not reported at all. The ratio of metadata to data varies considerably by filesystem type and by file/directory size, but for many small files 5% is not out of line.

- The amount of space allocated to a file typically has some granularity, often 4KB or 16KB (historically it has ranged from 128 bytes for the CP/M filesystem to 256KB for some filesystems used in high-performance computing).
This means the size of the file is rounded up to this granularity, which can make a huge difference if your files are typically small. Say your files are all 2KB, and you store them on one filesystem with a 512B allocation granularity and on another with a 16KB granularity: you'll get results from du that differ by a factor of 32!

- Are any of your files sparse? I think every commercial filesystem in mass production today supports sparse files, but exactly how can vary widely. What is the granularity of holes in the file? What is the metadata overhead for holes (in extent-based filesystems this can make a significant difference if implemented carelessly)? It is also quite possible (maybe even likely) that your rsync copying turned sparse files into contiguous files; but given that your total space usage shrank instead of increasing, that doesn't seem likely to be the main effect here.

- On the NetApp, did you have snapshots turned on? If yes, does the result from du include the snapshots?

- It isn't even completely clear what the result from du is supposed to be: the real disk usage, or the size of the file rounded to kilobytes? Here is a suggestion to stir the pot. Assume you have a 1MB file stored on a RAID-1 (mirrored) disk array. I think du should report the space usage as 2MB, because you are actually storing two copies of the file (you are using 2MB worth of disks). If you now migrate the file to a compressing filesystem that is not mirrored, du should report the space usage as 415KB, if that's how much disk space it really uses. No filesystem today would report those values; they would all report something pretty close to 1MB.

For you, my suggestion is this: instead of looking only at the total, make a complete list of the disk usage for each file. An easy way to do this from the command line: make two listings of space usage, one each for source and destination, merge the lists, and look at the differences.
Here is a quick attempt at a script which does this (just typed in, so you may have to debug it a little; it also assumes you don't have spaces in file names, and if you do, you'll have to do a lot of quoting and null-terminating):

cd $SOURCE
find . -type f | xargs du -k | sort -k2 > /tmp/source.du
cd $TARGET
find . -type f | xargs du -k | sort -k2 > /tmp/target.du
cd /tmp
join -j 2 source.du target.du > both.du
awk '{print $1, $3 - $2}' both.du | sort -k2 -n > diff.du

In the end you'll have a listing of the difference in space usage in diff.du, sorted (I hope; I can never remember whether the -n switch to sort works correctly for negative numbers). Then pick a few examples of files that have large differences, or see whether you can make out a trend (maybe most files have a small difference), and spot-check a few files to make sure they were copied correctly. You can also use "join -j 2 -v 1 source.du target.du" to find files that were not copied, and the same with "-v 2" to find files that showed up in the copy uninvited.

Now changing gears: speaking as a filesystem implementor (and somewhat of an expert), I would wish that the du command and the underlying information returned by the stat() system call would go away. On one hand they are just too crude, and don't begin to describe the complexity of space usage in a modern (complex) filesystem. On the other hand they don't give the answers that a system administrator (or an automated administration tool) really needs. As we saw above, for a 1GB file the correct answer for space usage might be any of (all the numbers are made up):

- 1GB worth of bytes.
- 1GB is the file size, but it is sparse, so it only uses 876MB.
- 1GB worth of bytes on the data disk, plus 7.4MB of metadata on the metadata disk.
- 2GB worth of bytes, because of RAID-1.
- 437MB worth of bytes, because of compression.
- 0.456GB on datadisk_123, 1.234GB on datadisk_456, and 2.345GB on datadisk_789, plus 7.4MB on metadisk_abc and 3.7MB on metadisk_def.
- 5.678GB on disk, because of RAID-1, asynchronous remote copy (still 0.3GB worth of copying to be done, currently held in NVRAM), and fourteen snapshot copies, all slightly different; not to mention that the remote copy is compressed, and this figure includes metadata overhead on the metadata disks.
- 4.567GB on expensive SCSI disks (at $3/GB plus $0.50/year/GB), and 1.234GB on cheap SATA disks (at $1/GB plus $0.25/year/GB).

As you see, returning one number is woefully inadequate. We need to ask ourselves: what is the purpose of the space usage information? It is not to verify that the filesystem has correctly stored the data (for that it is too crude); it is to enable administering the filesystem, so it needs to give the information a system administrator might care about. If I had my way (fortunately, nobody ever listens to me), I would remove the du command, completely remove all notions of space usage from the user-mode application API, and put all space usage information into a filesystem management interface. There, questions like the following need to be answered:

- How much space is user fred using (or files used by the wombat project, or files stored on storage device foobar)?
- Has fred's usage increased recently?
- How expensive is the storage used by fred? Original purchase, lease payments, yearly provisioning and administration cost?
- Are the wombat project's requirements for data availability being met, or could I improve them by allocating more space to it and storing more redundant copies of their data?
- If I move the wombat project to the NetApp, and then use the free space on the cluster filesystem to put fred's files on, would that save me money or increase speed or availability?
- Is the NetApp still a cost-effective device, given that we just started using the fancy new foobar device from Irish Baloney Machines with the new cluster filesystem from Hockey-Puckered?

(If it isn't clear: all mentions of the word "netapp" and oblique references to large computer companies are meant as humor, and are intended to neither praise nor denigrate my current, former or future employers.)

--
Ralph Becker-Szendy
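The per-file comparison from the post above can be made tolerant of spaces in file names by keeping the pipeline null-terminated up to the point where du emits tab-separated records. A sketch, assuming GNU find/du/awk; the tiny demo trees created here are stand-ins, so point SOURCE and TARGET at your real trees instead:

```shell
#!/bin/sh
# Per-file space comparison of two trees, tolerant of spaces in names.
# The demo trees below are stand-ins for the real SOURCE and TARGET.
set -e
SOURCE=$(mktemp -d); TARGET=$(mktemp -d)
echo "same content" > "$SOURCE/a file.txt"
echo "same content" > "$TARGET/a file.txt"
echo "not copied"   > "$SOURCE/extra.txt"

# du -k --files0-from=- reads null-terminated names, emits "KB<TAB>name".
( cd "$SOURCE" && find . -type f -print0 | du -k --files0-from=- ) > /tmp/source.du
( cd "$TARGET" && find . -type f -print0 | du -k --files0-from=- ) > /tmp/target.du

# Key on the file name (field 2): print per-file KB differences and
# files that exist on only one side.
awk -F'\t' '
    NR == FNR { src[$2] = $1; next }       # pass 1: source sizes by name
    {
        if ($2 in src) {
            if ($1 != src[$2]) print ($1 - src[$2]) "\t" $2
            delete src[$2]
        } else { print "only-in-target\t" $2 }
    }
    END { for (f in src) print "only-in-source\t" f }
' /tmp/source.du /tmp/target.du > /tmp/diff.du

cat /tmp/diff.du
rm -rf "$SOURCE" "$TARGET"
```

On a real 2.8TB tree this yields one line per mismatched or one-sided file; sorting /tmp/diff.du numerically then points you at the biggest offenders first.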
#4
On 24 Aug 2005 02:08:46 -0700, orgone said something similar to:
: I recently rsync'd around 2.8TB between a RHEL server (jfs filesystem) and a
: NetApp system, then did a 'du -sk' against each to verify the transfers:
:
: 2894932960 KB, source total
: 2751664496 KB, destination total
:
: That's a 140GB discrepancy. Subsequent verbose rsyncs have turned up
: nothing that was not originally transferred.

What are the native block sizes of the two filesystems? If you've got a large enough number of files and directories, a smaller block size on the destination could account for the discrepancy, in terms of less unused space at the end of the last block of each file.

Another thing I've seen cause discrepancies like this on occasion is when the source directories once held many more files than they currently do. Once blocks have been allocated to a directory, they don't get deallocated when the number of files drops.
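The directory effect can be watched directly. A minimal sketch, assuming a filesystem such as ext3/ext4 where directory blocks are not released when entries are removed; the exact sizes reported are filesystem-dependent:

```shell
#!/bin/sh
# Watch a directory's own size as entries are added and then removed.
# On ext3/ext4 the "after" size typically stays at the "full" size;
# other filesystems may shrink it back down.
set -e
d=$(mktemp -d)
empty=$(stat -c %s "$d")

i=0
while [ "$i" -lt 2000 ]; do    # enough entries to force extra dir blocks
    : > "$d/f$i"
    i=$((i + 1))
done
full=$(stat -c %s "$d")

rm -f "$d"/f*
after=$(stat -c %s "$d")

echo "empty=$empty full=$full after=$after"
rmdir "$d"
```

If "after" stays well above "empty" on your source filesystem, a tree with a history of mass file deletion will show a larger du total there than a fresh copy of the same tree does on the destination.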
#5
orgone wrote:
> I recently rsync'd around 2.8TB between a RHEL server (jfs filesystem) and a
> NetApp system. [...] Needless to say, 140GB going missing would be a bit of
> a problem and it's not much fun picking through 2.8TB for MIA data.

Rsync has a "-c" option for comparing checksums; I imagine that would give some reassurance that the transfer occurred correctly. There is also the "-v" verbose option, as you noted. To be certain, I'd consider checksumming all the files on each system, e.g. something like

find mydirectory -type f -exec sum {} \; > sysname.sums

and using diff to compare the results. If really paranoid, I'd use md5sum instead of sum. I imagine this will take considerable time on 2.8TB, so I'd try it on small subsets first :-)
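That checksum-listing idea, sketched with md5sum and a sorted, null-safe pipeline; the demo tree here is a stand-in for the real directory, and on the real systems you would run the same pipeline on source and destination and diff the two listings:

```shell
#!/bin/sh
# Build a sorted checksum listing for a tree. Run the same pipeline on
# both systems, then 'diff source.sums dest.sums': any output at all
# means a file differs, is missing, or is extra.
set -e
tree=$(mktemp -d)                 # stand-in for the real directory
echo "payload one" > "$tree/one.txt"
echo "payload two" > "$tree/two.txt"

( cd "$tree" && find . -type f -print0 | sort -z | xargs -0 md5sum ) > /tmp/demo.sums

cat /tmp/demo.sums
rm -rf "$tree"
```

Sorting the name list before checksumming keeps the two listings in the same order, so diff lines up corresponding files even when find traverses the trees in different orders.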