The standard tool for copying data between two clusters is probably distcp. It can also be used to keep the data of two clusters updated. Here the update process is a asynchronous process using a fairly basic update strategy. Distcp is a simple tool, but some edge cases can get complicated. For once the distributed copy between two HA clusters is such a case. Also important to know is that since the versions of RPC used by HDFS can be different it is always a good idea to use a read only protocol like hftp or webhdfs to copy the data from the source system. So the URL could look like this hftp://source.cluster/users/me . WebHDFS would also work, because it is not using RPC.
Another corner case using distcp is the need to copy data between a secure and none secure cluster. Such a process should always be triggered from the secure cluster. This would be the cluster the owner of the cluster has a valid ticket to authenticate against the secure cluster. But this would still yield an exception as the system would complain about a missing fallback mechanism. On the secure cluster it is important to set the ipc.client.fallback-to-simple-auth-allowed to true in the core-site.xml in order to make this work.
What is left to to is make sure the user has the needed right on both systems to read and write the data.