Horovod documentation
Horovod improves the speed, scale, and resource utilization of deep learning training.
Get started
Choose your deep learning framework to learn how to get started with Horovod.
To use Horovod with TensorFlow on your laptop:
- Install Open MPI 3.1.2 or 4.0.0, or another MPI implementation.
- If you've installed TensorFlow from PyPI, make sure that g++-5 or above is installed.
  If you've installed TensorFlow from Conda, make sure that the gxx_linux-64 Conda package is installed.
- Install the Horovod pip package:
pip install horovod
- Read Horovod with TensorFlow for best practices and examples.
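Before running pip install horovod, it can help to confirm that the prerequisites listed above are actually on your PATH. The following is a minimal sketch using only the Python standard library. The helper names parse_gxx_major and check_prerequisites are hypothetical, and the sketch assumes the MPI launcher is called mpirun (as in Open MPI) and that the compiler is invoked as g++. Other MPI implementations or Conda toolchains may use different executable names.

```python
import shutil
import subprocess


def parse_gxx_major(version_output):
    """Return the major version from `g++ -dumpversion` output, or None."""
    first = version_output.strip().split(".")[0]
    return int(first) if first.isdigit() else None


def check_prerequisites():
    """Report whether an MPI launcher and a new-enough g++ are on PATH."""
    # Assumption: the launcher is named `mpirun`, as it is for Open MPI.
    report = {"mpirun": shutil.which("mpirun") is not None, "gxx_major": None}
    gxx = shutil.which("g++")
    if gxx is not None:
        try:
            out = subprocess.run(
                [gxx, "-dumpversion"], capture_output=True, text=True, timeout=10
            ).stdout
            report["gxx_major"] = parse_gxx_major(out)
        except (OSError, subprocess.SubprocessError):
            pass  # Leave gxx_major as None if the compiler cannot be queried.
    # The list above asks for g++-5 or above when the framework comes from PyPI.
    report["gxx_ok"] = report["gxx_major"] is not None and report["gxx_major"] >= 5
    return report


if __name__ == "__main__":
    for name, value in check_prerequisites().items():
        print(f"{name}: {value}")
```

The same check applies to the Keras and PyTorch lists, which name the same compiler requirement. It does not cover the Conda case, where the gxx_linux-64 package supplies the compiler instead.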
To use Horovod with Keras on your laptop:
- Install Open MPI 3.1.2 or 4.0.0, or another MPI implementation.
- If you've installed TensorFlow from PyPI, make sure that g++-5 or above is installed.
  If you've installed TensorFlow from Conda, make sure that the gxx_linux-64 Conda package is installed.
- Install the Horovod pip package:
pip install horovod
- Read Horovod with Keras for best practices and examples.
To use Horovod with PyTorch on your laptop:
- Install Open MPI 3.1.2 or 4.0.0, or another MPI implementation.
- If you've installed PyTorch from PyPI, make sure that g++-5 or above is installed.
  If you've installed PyTorch from Conda, make sure that the gxx_linux-64 Conda package is installed.
- Install the Horovod pip package:
pip install horovod
- Read Horovod with PyTorch for best practices and examples.
To use Horovod with Apache MXNet on your laptop:
- Install Open MPI 3.1.2 or 4.0.0, or another MPI implementation.
- Install the Horovod pip package:
pip install horovod
- Read Horovod with MXNet for best practices and examples.
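Whichever framework you chose, a quick way to confirm that the pip install horovod step succeeded is to ask Python's packaging metadata for the installed version. This is a minimal sketch using the standard library's importlib.metadata. The helper name installed_version is hypothetical, and the check only confirms that the package is present, not that it was built with support for your framework.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a pip package, or None."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("horovod")
    if version is None:
        print("horovod is not installed in this environment")
    else:
        print(f"horovod {version} is installed")
```

If the package is reported as missing, check that the script runs under the same Python environment in which you ran pip.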
Guides
- Overview
- Concepts
- Horovod Installation Guide
- API
- Horovod with TensorFlow
- Horovod with XLA in TensorFlow
- Horovod with Keras
- Horovod with PyTorch
- Horovod with MXNet
- Run Horovod
- Elastic Horovod
- Benchmarks
- Inference
- Horovod on GPU
- Horovod with MPI
- Horovod with Intel(R) oneCCL
- Build a Conda Environment with GPU Support for Horovod
- Horovod in Docker
- Horovod on Spark
- Horovod on Ray
- Horovod in LSF
- Tensor Fusion
- AdaSum with Horovod
- Analyze Performance
- Distributed Hyperparameter Search
- Autotune: Automated Performance Tuning
- Process Sets: Concurrently Running Collective Operations
- Troubleshooting
  - Import TensorFlow failed during installation
  - MPI is not found during installation
  - Error during installation: invalid conversion from ‘const void*’ to ‘void*’ [-fpermissive]
  - Error during installation: fatal error: pyconfig.h: No such file or directory
  - NCCL 2 is not found during installation
  - Pip install: no such option: --no-cache-dir
  - ncclAllReduce failed: invalid data type
  - transport/p2p.cu:431 WARN failed to open CUDA IPC handle : 30 unknown error
  - Running out of memory
  - libcudart.so.X.Y: cannot open shared object file: No such file or directory
  - FORCE-TERMINATE AT Data unpack would read past end of buffer
  - segmentation fault with tensorflow 1.14 or higher mentioning hwloc
  - bash: orted: command not found
- Contributor Guide