SMC Troubleshooting

Hardware Setup Verification

Verify Device Availability

Use lspci to check RoCE and ISM device availability:
  $ lspci
  0000:00:00.0 Ethernet controller: Mellanox Technologies MT27500/MT27520
               Family [ConnectX-3/ConnectX-3 Pro Virtual Function]
  0001:00:00.0 Non-VGA unclassified device: IBM Internal Shared Memory (ISM)
               virtual PCI device
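To script this check, the sketch below filters lspci output for the two device types shown above. It assumes that RoCE Express cards show up with "Mellanox" in their description and ISM devices with "Internal Shared Memory", as in the sample output. The function name smc_pci_devices is hypothetical.

```shell
# Hypothetical helper: print the lspci lines that look like RoCE (Mellanox)
# or ISM devices. Reads lspci output on stdin and exits non-zero if none match.
# Assumption: device descriptions match the sample output above.
smc_pci_devices() {
    grep -E 'Mellanox|Internal Shared Memory'
}
```

Usage: `lspci | smc_pci_devices || echo "no RoCE or ISM device found"`.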

z/VM Only: Verify Card Attachment

  $ vmcp 'QUERY PCIFUNCTION'
  PCIF 00000280 ATTACHED TO S8360018 00000280 ENABLED  10GbE RoCE
  PCIF 000002E2 ATTACHED TO S8360018 000002E2 ENABLED  ISM
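When several PCI functions are attached, a small filter can list the ones reported as ENABLED. This is a sketch that assumes the column layout of the sample output above (PCIF, function ID, ATTACHED TO, user ID, virtual function ID, state, type). The function name pcif_enabled is hypothetical.

```shell
# Hypothetical helper: read `vmcp 'QUERY PCIFUNCTION'` output on stdin and
# print "<function ID> <type>" for every function whose state is ENABLED.
# Assumption: the state is the 7th field and the type starts at the 8th,
# as in the sample output above.
pcif_enabled() {
    awk '$1 == "PCIF" && $7 == "ENABLED" {
        type = $8
        for (i = 9; i <= NF; i++) type = type " " $i
        print $2, type
    }'
}
```

Usage: `vmcp 'QUERY PCIFUNCTION' | pcif_enabled`. For the sample above this prints one line per card, for example `00000280 10GbE RoCE`.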

IBM Z Only: PNET ID Verification

Verify that the PNET IDs are set and that they match. Use smc_rnics and/or smc_dbg if available. Otherwise, read them from sysfs:

  # RoCE device
  $ cat /sys/class/net/ens1/device/util_string | iconv -f IBM-1047 -t ASCII
  NetworkA

  # ISM device with PCI ID 0001:00:00.0 according to lspci:
  $ cat /sys/bus/pci/devices/0001:00:00.0/util_string | iconv -f IBM-1047 \
        -t ASCII
  NetworkA

  # OSA or HiperSockets device
  $ chpid=$(cat /sys/class/net/enccw0.0.f500/device/chpid)
  $ cat /sys/devices/css0/chp0.$chpid/util_string | iconv -f IBM-1047 -t ASCII
  NetworkA
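The comparison can also be scripted. The sketch below reuses the iconv conversion from the commands above and assumes that the raw value may be padded with trailing blanks or NUL bytes, which it strips before comparing. The function names read_pnetid and pnetids_match are hypothetical.

```shell
# Hypothetical helper: print the PNET ID stored in a util_string file,
# converted from EBCDIC (IBM-1047) to ASCII.
# Assumption: trailing blanks and NUL bytes are padding and are removed.
read_pnetid() {
    iconv -f IBM-1047 -t ASCII < "$1" | tr -d '\000' | sed 's/[[:space:]]*$//'
}

# Hypothetical helper: succeed if all arguments are the same non-empty
# PNET ID (trailing whitespace ignored), fail otherwise.
pnetids_match() {
    first=$(printf '%s' "$1" | sed 's/[[:space:]]*$//')
    [ -n "$first" ] || return 1    # an empty PNET ID counts as "not set"
    shift
    for id in "$@"; do
        [ "$(printf '%s' "$id" | sed 's/[[:space:]]*$//')" = "$first" ] || return 1
    done
    return 0
}
```

Usage, with the RoCE and ISM paths from above:

  $ pnetids_match "$(read_pnetid /sys/class/net/ens1/device/util_string)" \
        "$(read_pnetid /sys/bus/pci/devices/0001:00:00.0/util_string)" \
        && echo "PNET IDs match"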


OS Setup Verification

RoCE Express Cards: Verify Port and NIC States

  $ cat /sys/class/infiniband/mlx4_0/ports/1/phys_state
  5: LinkUp
  $ cat /sys/class/infiniband/mlx4_0/ports/1/state
  4: ACTIVE
  $ ip link show ens2
  3: ens2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 \
    qdisc mq state UP mode DEFAULT group default qlen 1000
      link/ether 82:03:14:32:f1:a0 brd ff:ff:ff:ff:ff:ff
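The two sysfs checks above can be combined into one function. This is a sketch that treats a port as healthy only if phys_state ends in "LinkUp" and state ends in "ACTIVE", matching the values shown above. The function name roce_port_ok is hypothetical.

```shell
# Hypothetical helper: check a port directory such as
# /sys/class/infiniband/mlx4_0/ports/1 for the values shown above.
# Prints a message to stderr and returns non-zero on the first failed check.
roce_port_ok() {
    port_dir=$1
    grep -q 'LinkUp$' "$port_dir/phys_state" 2>/dev/null ||
        { echo "$port_dir: physical link is not up" >&2; return 1; }
    # Anchored at end of line so that, e.g., "ACTIVE_DEFER" does not match.
    grep -q 'ACTIVE$' "$port_dir/state" 2>/dev/null ||
        { echo "$port_dir: port is not active" >&2; return 1; }
}
```

Usage: `roce_port_ok /sys/class/infiniband/mlx4_0/ports/1 && echo "port OK"`.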

RoCE Express Cards with VLANs: Verify Interfaces

Make sure the VLAN interface exists and has an IP address assigned.
Note that the interface can be down.
  $ ip addr show ens2.201
  7: ens2.201: <BROADCAST,MULTICAST,DOWN,LOWER_UP> mtu 1500 qdisc \
    fq_codel master virbr0 state UNKNOWN group default qlen 1000
      link/ether fe:54:00:f9:cf:be brd ff:ff:ff:ff:ff:ff
      inet 192.168.23.42/24 scope global vnet3
         valid_lft forever preferred_lft forever
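To check the address assignment in a script, the sketch below extracts the IPv4 addresses from `ip addr show` output and fails if there are none. It assumes the usual layout where each IPv4 address sits on a line starting with "inet". The function name ipv4_addrs is hypothetical.

```shell
# Hypothetical helper: read `ip addr show <dev>` output on stdin and print
# each IPv4 address with its prefix length. Exits non-zero if none is found.
ipv4_addrs() {
    awk '$1 == "inet" { print $2; found = 1 } END { exit !found }'
}
```

Usage: `ip addr show ens2.201 | ipv4_addrs || echo "no IPv4 address on ens2.201"`.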

Check Available Free Memory via /proc/meminfo

  $ grep "^Mem" /proc/meminfo
  MemTotal:        1710584 kB
  MemFree:           83404 kB
  MemAvailable:    1125752 kB
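A single field can be read with awk. MemAvailable is usually the more meaningful figure, since MemFree does not count reclaimable page cache; in the sample above MemFree is about 83 MB while MemAvailable is about 1.1 GB. The function name meminfo_kb is hypothetical.

```shell
# Hypothetical helper: print the value (in kB) of a /proc/meminfo field.
# Usage: meminfo_kb <field> [file]; the file defaults to /proc/meminfo.
# Exits non-zero if the field is not present.
meminfo_kb() {
    awk -v key="$1:" '$1 == key { print $2; found = 1 } END { exit !found }' \
        "${2:-/proc/meminfo}"
}
```

Usage: `echo "$(( $(meminfo_kb MemAvailable) / 1024 )) MB available"`.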

If things do not work: Check smcss Output for Reason Code

  $ smcss -a
  State  UID   Inode  Local Address    [...] Intf Mode
  ACTIVE 20000 115762 10.101.4.8:60594 [...] 0000 TCP 0x05000000/0x0000521e
See the smcss man page (smc-tools v1.3 or later, or see here for the latest version) for an explanation of the reason codes.
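With many connections, a filter helps to pick out the ones that show TCP in the Mode column, i.e. that did not use SMC. This sketch assumes, as in the sample above, that the local address is the 4th field and that the reason codes follow "TCP" directly; the column layout may differ between smc-tools versions. The function name smc_fallbacks is hypothetical.

```shell
# Hypothetical helper: read `smcss -a` output on stdin and print
# "<local address> <reason codes>" for every connection in TCP mode.
# Assumption: field layout as in the sample output above.
smc_fallbacks() {
    awk '{
        for (i = 1; i < NF; i++)
            if ($i == "TCP") { print $4, $(i + 1); break }
    }'
}
```

Usage: `smcss -a | smc_fallbacks`, then look up the printed codes in the smcss man page.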

Troubleshooting z/OS

The "Autonomics" function might disable SMC for connections to peers that are unlikely to benefit from it (most typically, short-lived connections that exchange little data) for a certain period of time. To turn this function off, add NOAUTOSMC to the TCP/IP profile as follows:
  GLOBALCONFIG SMCGLOBAL NOAUTOSMC
See here for further details.


How to enable Unruly Applications

Some applications have involved startup procedures that will not easily work with smc_run.

Example

DB2 requires registration of environment variables through the db2set command:
  $ db2set -i db2inst1 DB2ENVLIST="LD_LIBRARY_PATH LD_PRELOAD"
  $ smc_run db2start

If smc_run does not work for an application with PID <p>:
Check /proc/<p>/environ to see whether LD_PRELOAD is set correctly and points to libsmc-preload.so.
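The entries in /proc/<p>/environ are NUL-separated, so a plain grep on the file is awkward. The sketch below converts them to lines first. It accepts either a PID or a path to an environ-style file; reading another user's /proc/<p>/environ typically requires root. The function name smc_preload_set is hypothetical.

```shell
# Hypothetical helper: succeed if LD_PRELOAD in a process environment
# mentions libsmc-preload.so. Argument: a PID or a path to an environ file.
smc_preload_set() {
    case $1 in
        */*) env_file=$1 ;;
        *)   env_file=/proc/$1/environ ;;
    esac
    # environ entries are NUL-separated; turn them into lines before matching.
    tr '\000' '\n' < "$env_file" | grep -q '^LD_PRELOAD=.*libsmc-preload\.so'
}
```

Usage: `smc_preload_set 4711 || echo "LD_PRELOAD not set for PID 4711"`, where 4711 stands for the PID in question.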

Alternative Approaches

Set LD_PRELOAD in the user ID’s profile that starts the respective processes, e.g. the DB2 instance owner:
  $ echo "export LD_PRELOAD=libsmc-preload.so" >> ~/.profile
Or use /etc/ld.so.preload to enable the entire system:
  $ cat /etc/ld.so.preload
  libsmc-preload.so
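Appending with a plain echo adds a duplicate entry every time it is run. The sketch below adds the library only if no line with exactly that name exists yet. It assumes one entry per line, as in the example above, and must be run as root for /etc/ld.so.preload. The function name add_smc_preload is hypothetical.

```shell
# Hypothetical helper: add libsmc-preload.so to a preload file
# (default /etc/ld.so.preload) unless a line with exactly that name exists.
# Assumption: one entry per line, as in the example above.
add_smc_preload() {
    preload_file=${1:-/etc/ld.so.preload}
    grep -qx 'libsmc-preload\.so' "$preload_file" 2>/dev/null ||
        echo 'libsmc-preload.so' >> "$preload_file"
}
```

The caveat below applies to this file in full: every process on the system picks up the entry.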

Note: These approaches add the preload library to all processes started by the respective user, or to all processes on the entire system! This includes processes that do not perform any socket calls at all, e.g. the ls command.
Therefore, always prefer smc_run.
