                               README Notes
                    Broadcom NetXtreme bnxt - IB Peer Memory Driver

                              Broadcom Inc.
                         5300 California Avenue,
                            Irvine, CA 92617

                   Copyright (c) 2015-2024 Broadcom Inc.
                           All rights reserved


Table of Contents
=================

  Introduction
  Prerequisites
  Installation
  Configuration Tips & Known Issues

Introduction
============

This package includes the components required for testing GPUDirect over
Broadcom NetXtreme-E adapters.

bnxt_en: Ethernet driver for Broadcom adapters.
bnxt_re: RoCE driver for Broadcom adapters.
ib_peer_mem: Interface module between ib_core and the bnxt_re and
                nvidia_peermem modules. It registers with ib_core as a client
                module and receives a callback for every RoCE device
                registered with the IB stack, so it can build a list of IB
                devices that support GPUDirect.

                - Exports functions so that nvidia_peermem can register with
                  this module.
                - Exports functions so that bnxt_re or other vendor drivers
                  can request that peer memory be registered.
                - Maintains a mapping between the ib_core user memory
                  structures and the GPU peer memory structures.
nvidia_peermem: Interface module between the NVIDIA kernel driver and the
                ib_peer_mem module. It can be downloaded from the GitHub
                repository maintained by NVIDIA:

                https://github.com/NVIDIA/open-gpu-kernel-modules

                Minor changes are required in the build scripts/Kbuild so
                that this module can be compiled and used with Broadcom's
                ib_peer_mem module.

                A sample patch file, "gpudirectbuild.patch", is included in
                this release to help users modify the nvidia-peermem module
                and use it with NetXtreme adapters.

Prerequisites
=============

        - Install all the InfiniBand packages and development tools on the OS.

        - If NVIDIA GPUs are used, download NVIDIA CUDA from NVIDIA and follow
          the installation instructions
          (https://developer.nvidia.com/cuda-downloads).

        - For AMD GPUs, install the ROCm packages as per the AMD ROCm package
          instructions.

        - The package is tested with:
              Ubuntu 16.04 (4.4.0-21-generic)
              Ubuntu 18.04 (4.15.0-20-generic)
              Ubuntu 20.2
              CentOS 7.3 + OFED 4.8-2
              CentOS 7.x and later
              CentOS 8.x and later

Note: For NVIDIA GPUs, the nv_peer_mem module is deprecated by NVIDIA. Use the
      nvidia-peermem module instead.
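
A quick way to review the prerequisites above before building is sketched
below. The helper name missing_tools and the choice of tools to check
(ibv_devinfo from the libibverbs utilities, plus make and gcc) are
assumptions made for illustration and are not part of this package.

```shell
#!/bin/sh
# Hypothetical helper: print each named command that is not found in PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Show the running kernel so it can be compared with the tested list above.
echo "Kernel: $(uname -r)"

# ibv_devinfo comes with the InfiniBand user-space utilities;
# make and gcc are needed to build the kernel modules.
missing=$(missing_tools ibv_devinfo make gcc)
if [ -n "$missing" ]; then
    # Left unquoted on purpose so the names print on one line.
    echo "Missing tools:" $missing
else
    echo "All checked tools are present"
fi
```

Extend the tool list to match the distribution in use; GPU-side tools such
as the CUDA or ROCm utilities can be added in the same way.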

Installation
============

Installation using Source:

If an external OFED is used, export the OFED version using the following
command, after changing the version to match the installed OFED release.

export OFED_VERSION=OFED-4.8-1
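
The value can also be derived from the installed OFED instead of being typed
by hand. The sketch below assumes that the ofed_info utility is installed and
that "ofed_info -s" prints a one-line version string with a trailing colon;
the helper name normalize_ofed_version is hypothetical. Check that the result
has the same form as the example above before building.

```shell
#!/bin/sh
# Hypothetical helper: strip the trailing colon from an OFED version string.
normalize_ofed_version() {
    printf '%s\n' "$1" | tr -d ':'
}

if command -v ofed_info >/dev/null 2>&1; then
    # Assumption: "ofed_info -s" prints something like "OFED-4.8-1:".
    OFED_VERSION=$(normalize_ofed_version "$(ofed_info -s)")
    export OFED_VERSION
    echo "OFED_VERSION=$OFED_VERSION"
else
    echo "ofed_info not found; no external OFED detected"
fi
```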


Compilation Instructions:

  - Install Broadcom drivers and ib_peer_mem
        - Untar netxtreme-peer-mem-<version>.tar.gz
        - cd netxtreme-peer-mem-<version>
        - make
        - make install

  - Install the NVIDIA peer memory module
        - Uninstall any existing CUDA package and NVIDIA drivers.

        - Re-install the CUDA package with the kernel open module flavor
          enabled. Select the following option from the CUDA installation
          menu:
              Options->Driver Options->"Install the kernel open module flavor"

        - Download open-gpu-kernel-modules from the GitHub project below:

              https://github.com/NVIDIA/open-gpu-kernel-modules

        - Make sure to fetch the specific version that matches the CUDA
          version installed on the system.

        - Refer to the "gpudirectbuild.patch" file and make the required
          changes in the build scripts/Kbuild files.

        - Set the BNXT_PEER_MEM_INC environment variable to the absolute
          path of Broadcom's peer_mem module. Ensure that the path contains
          the Module.symvers file. If Module.symvers is not available,
          build the peer mem module again.

        - Follow the instructions in README.md (included in the
          open-gpu-kernel-modules package) to complete the installation.
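
The Broadcom side of the steps above can be sketched as two small shell
functions. The function names build_peer_mem and set_peer_mem_inc are
hypothetical. The directory passed to set_peer_mem_inc is whichever directory
of the package holds the built peer mem module; its location inside the
package is not stated here, so it is left as an argument.

```shell
#!/bin/sh
# Hypothetical wrapper: unpack the tarball, then build and install.
# "make install" needs root privileges.
build_peer_mem() {
    tarball=$1
    srcdir=$(basename "$tarball" .tar.gz)
    tar xzf "$tarball" || return 1
    # Run in a subshell so the caller's working directory is unchanged.
    ( cd "$srcdir" && make && make install )
}

# Hypothetical helper: export BNXT_PEER_MEM_INC as an absolute path, but
# only if the directory contains Module.symvers.
set_peer_mem_inc() {
    dir=$(cd "$1" 2>/dev/null && pwd) || {
        echo "no such directory: $1" >&2
        return 1
    }
    if [ ! -f "$dir/Module.symvers" ]; then
        echo "Module.symvers missing in $dir; build the peer mem module again" >&2
        return 1
    fi
    export BNXT_PEER_MEM_INC="$dir"
}
```

After set_peer_mem_inc succeeds, continue with the open-gpu-kernel-modules
build from the same shell so that the exported variable is visible to it.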

Module load instructions:

  - To load Broadcom Drivers and ib_peer_mem
	- modprobe ib_peer_mem
	- modprobe bnxt_en
	- modprobe bnxt_re
  - To load the NVIDIA peer memory module
	- modprobe nvidia_peermem
  - For AMD GPUs, reload the amdgpu driver after loading ib_peer_mem
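
The load order above can be wrapped in a small script that stops at the first
failure. This is a minimal sketch for the NVIDIA case; the function names
is_loaded and load_gpudirect_modules are hypothetical, and modprobe must be
run as root.

```shell
#!/bin/sh
# Return success if module $1 is listed in the module list file
# ($2, defaulting to /proc/modules).
is_loaded() {
    grep -q "^$1 " "${2:-/proc/modules}"
}

# Load the modules in the order given above; stop at the first failure.
load_gpudirect_modules() {
    for mod in ib_peer_mem bnxt_en bnxt_re nvidia_peermem; do
        modprobe "$mod" || {
            echo "failed to load $mod" >&2
            return 1
        }
        is_loaded "$mod" && echo "$mod loaded"
    done
}
```

Call load_gpudirect_modules as root after installation. For AMD GPUs, drop
nvidia_peermem from the list and reload amdgpu as described above.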

Installation using DKMS:

The DKMS package distributed by Broadcom includes only the bnxt_en, bnxt_re
and ib_peer_mem binaries. The interface driver for the graphics accelerator
(similar to nvidia_peermem) must be installed separately.

Customers using NVIDIA GPUs need to download nvidia_peermem from GitHub, make
the required changes as described in this document, and install it.

Configuration Tips & Known Issues
=================================

1. Tested with perftest using the --use_cuda option. Use a perftest build
   with CUDA support:
    https://github.com/linux-rdma/perftest.git

2. FW crash seen while testing RoCE traffic with GPU CUDA memory on a PLX
   PCIe bridge

   This happens because the root port responds with a UR (Unsupported
   Request) to an access to the peer device, and the bnxt adapter FW hangs
   on this response. To avoid this failure, disable all the control fields
   in the ACS (Access Control Services) PCIe extended capability.

   Example usage for a PLX switch (PLX BDF is 19:10.0):
       1. lspci -nn | grep PLX
       2. lspci -s 19:10.0 -vvv          (check for the ACS capability)
       3. setpci -s 19:10.0 f2a.w=0000   (disable the ACS controls)
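
   The raw register offset used in the example differs between devices. As
   an alternative sketch, setpci can address the ACS Control register by
   capability name (ECAP_ACS, offset 6), which avoids looking up the offset.
   The function name disable_acs and the APPLY variable are hypothetical; by
   default the command is only printed.

```shell
#!/bin/sh
# Print (default) or execute the setpci command that clears the ACS Control
# register of the device at PCI address $1. Set APPLY=1 and run as root to
# execute it.
disable_acs() {
    bdf=$1
    # ECAP_ACS is the pciutils name for the ACS extended capability;
    # the ACS Control register is at offset 6 within that capability.
    cmd="setpci -s $bdf ECAP_ACS+6.w=0000"
    if [ "${APPLY:-0}" = 1 ]; then
        $cmd
    else
        echo "$cmd"
    fi
}

# Dry run for the PLX switch from the example above.
disable_acs 19:10.0
```

   Values written with setpci do not persist across a reboot, so the command
   has to be reapplied after each boot.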

3. To support IB atomic operations (for example, ib_atomic_bw) over peer
   devices such as GPUs, the peer device must advertise the Atomic Operation
   completion capability. To confirm that this capability is enabled, check
   AtomicOpsCap in the PCI Express capability using:

       lspci -s <b:d.f> -vvv

   Attempting atomic operations over a peer device that does not support
   them can cause failures due to PCI errors seen by NetXtreme devices.
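
   The check can be narrowed to the relevant field with a small filter. This
   is a sketch that assumes lspci prints the field in the form
   "AtomicOpsCap: 32bit+ 64bit+ 128bitCAS-"; the function names atomic_caps
   and supports_atomic64 are hypothetical.

```shell
#!/bin/sh
# Read "lspci -vvv" output on stdin and print only the AtomicOpsCap field(s).
atomic_caps() {
    grep -o 'AtomicOpsCap:.*'
}

# Succeed if the lspci output on stdin reports 64-bit AtomicOp support.
supports_atomic64() {
    atomic_caps | grep -qF '64bit+'
}

# Usage (run as root so that lspci can read the capability list):
#   lspci -s <b:d.f> -vvv | atomic_caps
#   lspci -s <b:d.f> -vvv | supports_atomic64 && echo "64-bit atomics advertised"
```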

4. If the kernel supports the peer_mem API, the user has to select the
   recommended client driver module from the GPU vendor. For example, if the
   system has an NVIDIA GPU, refer to the link below for the client driver
   recommended by NVIDIA:

   https://docs.nvidia.com/cuda/gpudirect-rdma/index.html#nvidia-peermem

5. While compiling open-gpu-kernel-modules against the netxtreme-peer-mem
   package, the user might encounter the build error shown below:

       ERROR: modpost: GPL-incompatible module nvidia-peermem.ko uses
       GPL-only symbol 'ib_unregister_peer_memory_client'
       ERROR: modpost: GPL-incompatible module nvidia-peermem.ko uses
       GPL-only symbol 'ib_register_peer_memory_client'

   This is a known issue with certain open-gpu-kernel-modules versions.
   Enable the USE_NVIDIA_GPU flag to work around this issue, for example:

       make USE_NVIDIA_GPU=1
