Horovod with XLA in TensorFlow
XLA Horovod ops are enabled by setting
HOROVOD_ENABLE_XLA_OPS=1, which controls whether the ops are registered with TensorFlow/XLA.
There are two main ways to enable XLA, and each works with Horovod differently.
Explicit compilation with tf.function(jit_compile=True):
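    import os

    # Enable the XLA Horovod ops before Horovod is imported.
    os.environ["HOROVOD_ENABLE_XLA_OPS"] = "1"

    import tensorflow as tf
    import horovod.tensorflow as hvd

    # Method of a test class; self.random_uniform is a helper on that class.
    @tf.function(jit_compile=True)
    def compiled_hvd_allreduce(self, dtype, dim):
        tensor = self.random_uniform([17] * dim, -100, 100, dtype=dtype)
        summed = hvd.allreduce(tensor, average=False)
        return summed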
With this approach, every op in the
compiled_hvd_allreduce function must be lowered to XLA, because jit_compile=True requires the whole function to compile. If the XLA Horovod ops are not enabled, XLA reports a compilation error.
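Because a missing HOROVOD_ENABLE_XLA_OPS setting only shows up later as an XLA compilation error, it can help to check the variable up front. The following is a minimal sketch, not part of Horovod: the helper names horovod_xla_ops_enabled and require_horovod_xla_ops are hypothetical, and the check assumes "1" is the enabling value, as in the example above.

```python
import os


def horovod_xla_ops_enabled(environ=None):
    """Return True if HOROVOD_ENABLE_XLA_OPS is set to "1".

    Hypothetical helper; assumes "1" is the enabling value.
    """
    env = os.environ if environ is None else environ
    return env.get("HOROVOD_ENABLE_XLA_OPS", "") == "1"


def require_horovod_xla_ops(environ=None):
    """Fail early instead of waiting for an XLA compilation error."""
    if not horovod_xla_ops_enabled(environ):
        raise RuntimeError(
            "HOROVOD_ENABLE_XLA_OPS=1 is required to call Horovod ops "
            "inside tf.function(jit_compile=True)"
        )
```

Calling require_horovod_xla_ops() at the top of a training script, before any jit-compiled function is traced, gives a clearer message than the compiler error would.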
Auto-clustering is a convenient way to use XLA: set
TF_XLA_FLAGS=--tf_xla_auto_jit=2, and the XLA JIT automatically selects which ops in the TensorFlow graph to lower to XLA. In this mode, enabling the XLA Horovod ops is optional, because auto-clustering still works when the Horovod ops are left to run on TensorFlow devices and only parts of the graph are lowered to XLA devices.
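The auto-clustering setup can also be done from Python rather than the shell. This is a minimal sketch under stated assumptions: the helper enable_auto_clustering is hypothetical, it treats TF_XLA_FLAGS as a space-separated list of flags, and it should run before TensorFlow and Horovod are imported so that the variables are already in place when the libraries read them.

```python
import os


def enable_auto_clustering(environ=None, level=2, horovod_xla_ops=False):
    """Set TF_XLA_FLAGS for auto-clustering, keeping unrelated flags.

    Hypothetical helper. `environ` defaults to os.environ; a plain dict
    can be passed instead to inspect the result without side effects.
    """
    env = os.environ if environ is None else environ

    # Keep existing flags, but drop any earlier auto_jit setting.
    existing = env.get("TF_XLA_FLAGS", "").split()
    kept = [f for f in existing if not f.startswith("--tf_xla_auto_jit=")]
    env["TF_XLA_FLAGS"] = " ".join(kept + [f"--tf_xla_auto_jit={level}"])

    # Optional in this mode: Horovod ops may stay on TensorFlow devices.
    if horovod_xla_ops:
        env["HOROVOD_ENABLE_XLA_OPS"] = "1"
    return env
```

Leaving horovod_xla_ops at False matches the case described above, where Horovod ops run in TensorFlow and only the remaining clusters are compiled by XLA.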
List of supported XLA Horovod ops
The supported op list is: