Horovod with XLA in Tensorflow¶
Basic usage¶
XLA Horovod ops can be enabled by setting HOROVOD_ENABLE_XLA_OPS = 1,
which controls whether the ops are registered with TensorFlow/XLA.
There are two main ways to enable XLA, and each interacts with Horovod differently:
For explicit compilation with tf.function(jit_compile=True):
os.environ["HOROVOD_ENABLE_XLA_OPS"] = "1"

@tf.function(jit_compile=True)
def compiled_hvd_allreduce(dtype, dim):
    tensor = tf.random.uniform([17] * dim, -100, 100, dtype=dtype)
    summed = hvd.allreduce(tensor, average=False)
    return summed
In this way, all the ops in the compiled_hvd_allreduce
function are lowered into XLA, as the compilation requires. If the XLA Horovod ops are not enabled, XLA reports compilation errors.
For auto-clustering:
Auto-clustering is a convenient way to use XLA: simply set TF_XLA_FLAGS=--tf_xla_auto_jit=2,
and the XLA JIT automatically selects ops in the TensorFlow graph to be lowered into XLA. In this mode, enabling XLA Horovod ops is optional, because auto-clustering works even if the Horovod ops are left to run on TensorFlow (devices) while only parts of the graph are lowered onto XLA (devices).
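Because auto-clustering is driven entirely by environment variables, it can be enabled without touching the training script. A minimal sketch, assuming an existing Horovod training script (train.py and the worker count of 4 are placeholders, not part of the Horovod documentation):

```shell
# Let the XLA JIT automatically pick clusters of TensorFlow ops to compile.
export TF_XLA_FLAGS=--tf_xla_auto_jit=2
# Optional in this mode: also register the Horovod ops with XLA.
export HOROVOD_ENABLE_XLA_OPS=1
# Launch with horovodrun as usual; the flags apply to every worker process.
horovodrun -np 4 python train.py
```

Setting the variables in the launching shell (rather than inside the script) ensures they are visible to TensorFlow before it initializes.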