Been there, done that with both styles of migration. They all suck.
I manage enterprise NFS storage, and over the past 15+ years I've worked with various tools and solutions to the problem of moving old files off expensive storage onto cheaper storage, whether disk or tape. And it all sucks, sucks, sucks.
If you do backups to tape, then suddenly you need tight integration between the storage and backup vendors; otherwise restores (and people only care about restores!) become horribly, horribly painful to do.
We tested out a product from Acopia, which had an appliance that sat in front of our NFS storage and automatically moved data between backend storage tiers. Really, really slick and quite neat. Then when we implemented it we ran into problems.

1. With too many files, it just fell over. Oops. We had single volumes with 10TB of data and 30 million files. Yeah, my users don't clean up for shit.

2. Backups sucked. If you backed up through the Acopia box, you were bottlenecked there. If you backed up directly from the NFS storage (either NFS mounts or NDMP), then your data was scattered across multiple volumes/tapes. Restores, especially single-file restores, were a nightmare.
We then tried a system where our backup vendor (CommVault) hooked into the NetApp NFS filer and used FPolicy to stub out files that were archived to cheaper disk and then to tape. This worked... backups were consistent and easy to restore, since CommVault handled all the issues for us.
But god help you if a file got stubbed out and then the link between the stub and the archive got broken, or you ran into vendor bugs on either end. It also turned into a nightmare, though less of one, I admit. It didn't scale well either, and it took a ton of processing and babying from the admins.
I'd really like to get a filesystem (POSIX-compliant, please!) that could do data placement on a per-directory basis across different backend block storage volumes. So you could have some fast storage that everything gets written to by default, and then slower storage for long-term archiving.
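The closest existing thing I can point at is the pool mechanism in parallel filesystems like Lustre, where a directory tree can be pinned to a set of slow storage targets. Just a rough sketch of the shape of it, not a claim that it solves the whole problem; the filesystem name, pool name, and OST indices below are made up:

    # On the MGS: define a pool made up of the cheap/slow storage targets
    lctl pool_new archfs.slow
    lctl pool_add archfs.slow archfs-OST[0010-001f]

    # On a client: anything written under this tree lands on the slow pool
    lfs setstripe --pool slow /archfs/projects/old_stuff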
Again, the key is to make restores *trivial* and *painless*. Otherwise it's not worth the hassle. And it needs to be transparent to the end users, without random stubs or broken links showing up. Of course, being able to do NDMP from the block storage to tape would be awesome, but hard to do.
Replication, snapshots, cloning. All great things to have. Hard to do well. It's not simple.
Just scanning a 10TB volume with 30 million files takes time; the metadata alone is huge. I use 'duc' (https://github.com/zevv/duc) to build reports that my users can access via a web page to find directory trees with large file sizes across multiple volumes, so they can target their cleanups. And so I can yell at them, with data to back me up.
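The workflow is basically index at night, browse during the day. A rough sketch of the indexing side (the volume paths and database location are placeholders, and you should check duc's own docs for the exact flags):

    #!/bin/sh
    # Nightly duc re-index of each NFS volume, run from cron during quiet hours
    DB=/var/cache/duc/nfs.db
    for vol in /srv/nfs/vol1 /srv/nfs/vol2; do
        duc index -d "$DB" "$vol"
    done
    # Sanity check: list the indexed trees and when they were last scanned
    duc info -d "$DB"

The web page side can then just be duc's CGI frontend pointed at the same database.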
Running 'du -sk | sort -n' on large directory trees is just painful. Again, having a filesystem that could keep this data reasonably up to date for quick and easy queries would be wonderful as well. No need to keep it completely consistent. Do updates in the background, or even drop them if the system is busy. That's what late-night scanning during quiet times is for.
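For what it's worth, the poor man's version of that "update in the background, drop it if the box is busy" idea is just a low-priority cron job that caches the answer overnight. All paths and schedules here are made up for illustration:

    # Illustrative /etc/crontab entry: refresh cached sizes at 02:00,
    # at idle I/O priority so it loses out to real work if things are busy
    0 2 * * * root nice -n 19 ionice -c 3 /usr/local/sbin/refresh-dirsizes.sh

    #!/bin/sh
    # refresh-dirsizes.sh -- volume list and cache path are placeholders
    CACHE=/var/cache/dirsizes
    mkdir -p "$CACHE"
    for vol in /srv/nfs/vol1 /srv/nfs/vol2; do
        name=$(basename "$vol")
        # -x stays on one filesystem; sort biggest-first so a quick 'head'
        # answers "which trees are the problem" without rescanning anything
        du -xk "$vol" | sort -rn > "$CACHE/$name.txt.tmp" \
            && mv "$CACHE/$name.txt.tmp" "$CACHE/$name.txt"
    done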
It's not a simple problem, and it's nice to see various people trying to solve the issue, but ... been there. Tried it. Gave up. It's just too painful.