cygwin + bash + rsync
You could have setup cygwin with bash and rsync on the windows machines as well. Rsync is better then cp when it comes to moving files between systems.
Recently I copied 60 million files from one Windows file server to another. Tools used to move files from system to another are integrated into every operating system, and there are third party options too. The tasks they perform are so common that we tend to ignore their limits. Many systems administrators are guilty of not …
I have used rsync extensively, one of the things i commonly use it for is duplicating entire system installs from one system to another... I can do an initial copy while the source system is live to get 99% of the data, and then only need to copy the differences once i have shut down the source system. I have a busy mailserver which uses the maildir format (one file per message) and that had more than 60 million small files on it when i migrated it to new hardware.
You've tried this in cygwin on a windows box have you? And it works?
No'one is disputing rsync works on linux. The article is about how to copy that many files from a windows box. If someone suggests doing it in cygwin it's a more useful suggestion if they actually know whether it works, rather than leaving someone else to do their work for them.
In theory any of the other tools discussed n the article should also work. In practice they didn't.
It has options for 'synchronising' two folders, including file permissions.
Dump the text output into a file then search for 'ERROR', or better, pass it through FINDSTR to filter out anything but lines beginning with 'ERROR'
Sure, it bombs out om folder/filenames longer than 256 characters, but those are an abomination and the users who made them(usually by using filenames suggested by MS Office apps) really needs to be punished anyway.
Been pushing around a few million files this spring and summer...
robocopy also has the /create option which copies the files with 0 data. OK, a total operation take 2+ passes, but it has several advantages :
1. 60million files WILL cause the MFT to expand. If you are filling the disk with data as well as MFT then you can end up fragenting the MFT which can lead to performance problems. If the disk is only writing MFT (such as during a create), then the MFT can expand to adjacent space.
2. Since there is no data, the operation completes is a fraction of the time, so if you log the operation, you can see where any failures will occur, and fix them for the final copy.
For planned migrations, you can run the same job several times (only run the /create the first time), and therefore the final sync should take a lot less time :)
I asked is "cygwin" had been considered, I did not say "Use cygwin! It's the wins! L0LZ!!11!" The author had looked at various tools and not mentioned "cygwin", so my question seems perfectly reasonable to me (others have asked the same question).
You seem to have taken umbrage at a few "cygwin" related posts, do you have an issue with this tool (I used it for some light-weight ssh and rsync work and really like it). If you do, have you filled bugs, got involved?
Or do you know something about "cygwin" and large jobs? "Oh, you can't use cygwin for that because it's job index will overflow, see bug-1234".
Or are you just some reactionary pillock who can't see through the Windows? "!Microsoft==Bad"
I know which conclusion I am drawing at the moment...
You're jumping to the wrong conclusions. The point is simply that before trying you'd expect half the tools tried in the article to work on windows. However, they didn't. So it's helpful to make clear whether you know the solution you're suggesting works or not. It sounds like you don't. Also, you didn't ask whether cygwin had been considered. Re-read what you posted. It's nothing to do with windows / linux / os of choice. I work mostly on bsd and linux derivatives for a living. Oh, and i like cygwin.
And you do know the price of a tape drive, right? Company I worked for bought a pair of Ultrium-4 drives from IBM. The bloody thing itself costs a little over US$3000 a pop, and you need two. If you work in a company where accounts is a /b/tard, you'll know how painful it is to get them to approve the upgrade for a drive, let alone two drives.
+1 This has the most desirable side effect of stopping any bugger from trying to update the "wrong" file, or creating new ones while the copy is in progress.
p.s. Don't forget the second part of any professional data copying activity is to VERIFY that what you copied actually did turn out to be the same as what you copied from. Many a backup has turned out to be just a blank tape and an error message without this stage.
Clearly you've never worked in a decent sized enterprise.
There are LOTS of scenario's when you can't just move the disks or use backup hardware. Maybe not all the data on one disk is being moved. Maybe the data is on different SANs or NAS and can't be swapped or zoned. maybe you can't attach the same type of tape to each server.
"So the best way to move 60 million files from one Windows server to another turns out to be: use Linux."
D'oh? On a serious note, you should also give rsync a go. And try also _storing_ the data on an OS that is not windows, you might find that you are then able to do things you need to do, like say copy 60m files, without resorting to booting virtual machines of another OS just to use that OS's utilities to manage your main OS. Just saying. Please don't flame me.
"Not easy once you try doing a file transfer via rsync through an ssh tunnel, like your suggesting, but the destination server isn't running an ssh server....let alone use / as a path convention."
Well, if the target system is running LInux, then turn on its sshd service!
If it's running Windoze ... well, borrow another PC, boot an appropriate Linux live CD, mount the MS Shared folder as CIFS, start the ssh daemon, and rsync through the temporary Linux system to the MS system.
Linux: the system that provides answers and encourages creativity.
Windoze: the system that erects obstacles and encourages stupidity.
Personally, I swear by Directory Opus, by GPsoftware.
I can attest to it's incredibly reliable performance, error handling and insanely flexible advanced features.
Aside from being able to copy vast quantities of data, handle errors, log all actions, migrate NTFS properties, automatically unprotect restricted files and re-copy files if the source is modified, it also has built-in FTP, an advanced synchronisation feature (useful for mopping up failed files after you've fixed the problem that stopped them being copied), and a truly unparralelled batch renaming system which among other things, can use Regular Expressions.
It also has tabbed browsing (you can save groups of tabs), duplicate file finding, built in Zip management, custom toolbar command creation, file/folder listing and printing....
Stangely, not a lot of sysadmins know about DOpus. I learnt of it during my Amiga days, in what seems like a lifetime ago. I always have a copy installed on my workstation, and at least 1 of my servers
I like it! I'm often frustrated dealing with 10,000 small files in Windows, never mind 60m! On a desktop Windows 7 is painfully slow displaying 2000 photos in a single folder. It shows them straight away (detailed view, not thumbs) but then takes 20 secs to sort them by date modified! Aah!
But why do you have 60m files? Could you store that data in a better way? Could it be put into a database for example?
From the article:
I wanted to give several command-line tools a go as well. XCopy and Robocopy most likely would have been able to handle the file volume but - like Windows Explorer - they are bound by the fact that NTFS can store files with longer names and greater path than CMD can handle. I tried ever more complicated batch files, with various loops in them, in an attempt to deal with the path depth issues. I failed.
....which makes me very interested in what he was doing, and why robocopy wasn't an option
I've managed to use robocopy to create files I couldn't delete from windows before (because I'd gone from c:\ to d:\somefolder\someotherfolder and that pushed the bottom of the folders past the filepath limit)
"Richcopy can handle large quantities of files, but can multi-thread the copy, and so is several hours faster than using a Linux server as an intermediary"
Richcopy is not so fast that there is time to defragment an NTFS partition with 60 million files on it before CP would have finished."
Wut?
Richcopy copies more than one file a time.
If your destination is a Windows server, then doing this causes massive fragmentation. So the total time to finish the copy is "time to copy" +"time to defrag". The goal is not just to get files from server A to server B, but to get them there in a ashion that ensures that server B is ready for prime time.
rsync
I don't know what the state of rsync servers on windows is, but on unix systems it's the one true way for copying files over a network.
It's fast, handles failure well, doesn't get it's nickers in a twist when doing large recursive copies, and will run over ssl with a bit of work. It can also give you plenty of feedback about progress if you need it.
The idea of copying that many files using something like a file manager or cp, no matter how good, just fills me with horror.