qeth Devices: Promiscuous Modes, Live Guest Migration, and more

qeth devices, namely OSA-Express and HiperSockets, offer a vast array of functions that are easy to get lost in. This entry illustrates some of the most commonly sought-after ones while trying not to confuse the reader with too much background information.

 

Bridgeport Mode

Initially, the only way to enable promiscuous mode on OSA-Express adapters and HiperSockets was the so-called bridgeport mode. The bridgeport concept distinguishes between the following port roles:

  • Primary bridgeport: The primary bridgeport receives all traffic for destination addresses unknown to the device. In other words: if the device receives data for a destination it does not know, it forwards the data to the current primary bridgeport instead of dropping it. This further implies that as soon as an operating system registers a MAC address with the device, traffic destined for that MAC address becomes "invisible" to the bridgeport.
    Note: Only a single operating system can use the primary bridgeport on an adapter at any time.
  • Secondary bridgeport: Whenever the operating system that currently holds the primary bridgeport gives it up, one of the secondary bridgeports becomes the new primary. Any number of operating systems can register as a secondary bridgeport.

Bridgeport mode is available in Layer 2 mode only. Furthermore, HiperSockets devices need to be defined as external-bridged in the IOCDS.
Use the following attributes in the device's sysfs directory:

  • bridge_role: Set the desired role (none, primary, or secondary), and query the current one.
  • bridge_state: Query the current state (active or inactive).

Bridgeport mode effectively provides a promiscuous mode. Note, however, that in addition to enabling the primary bridgeport mode, you still have to set promiscuous mode on the respective interface!
All in all, typical usage of this feature looks like this:

  $ echo primary >/sys/devices/qeth/0.0.bd00/bridge_role

  # verify that we got primary bridgeport, not secondary, and are active:

  $ cat /sys/devices/qeth/0.0.bd00/bridge_state
  active
  $ cat /sys/devices/qeth/0.0.bd00/bridge_role
  primary

  # enable promiscuous mode on the interface
  $ ip link set <interface> promisc on
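The manual check can also be scripted, so that promiscuous mode is only switched on once the primary bridgeport is really active. What follows is a minimal sketch, not an official tool. The function name is_active_primary and the QETH_SYSFS override are hypothetical; the override exists only so the function can be tried against a test directory. The sysfs path defaults to the one used in this article.

```shell
# Hypothetical override so the function can be tried against a test
# directory; defaults to the qeth sysfs location used in this article.
QETH_SYSFS="${QETH_SYSFS:-/sys/devices/qeth}"

# Succeeds (exit status 0) only if the device given by its bus ID
# (e.g. 0.0.bd00) holds the primary bridgeport role and that role is
# active. Returns 2 if the attributes cannot be read.
is_active_primary() {
    dev="$QETH_SYSFS/$1"
    role="$(cat "$dev/bridge_role")" || return 2
    state="$(cat "$dev/bridge_state")" || return 2
    [ "$role" = "primary" ] && [ "$state" = "active" ]
}
```

For example, the following only enables promiscuous mode on the interface if the check succeeds:

  is_active_primary 0.0.bd00 && ip link set <interface> promisc on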

The downside of this approach is that only a single operating system per device can enable the primary bridgeport mode, which limits scalability. Therefore, a better approach with more functionality was introduced to the platform.


VNIC Characteristics

Introduced with IBM z14/LinuxONE II for HiperSockets and with IBM z15/LinuxONE III for OSA, the VNIC characteristics feature provides promiscuous mode for multiple operating systems attached to the same device. It also adds functionality that can be very handy, especially with KVM.
The VNIC characteristics are controlled through a number of attributes located in a separate subdirectory called vnicc in the device's sysfs directory.

Let us focus on two main functionalities.

Promiscuous Mode

Technically, the VNIC characteristics do not provide a traditional promiscuous mode (just as bridgeport mode did not in the literal sense). Instead, they emulate a self-learning switch. However, for users looking for a promiscuous mode to use in conjunction with a Linux bridge or an Open vSwitch, the end result is the same.

To activate, set the attributes as follows:

  echo 1 >/sys/devices/qeth/0.0.bd00/vnicc/flooding
  echo 1 >/sys/devices/qeth/0.0.bd00/vnicc/mcast_flooding
  echo 1 >/sys/devices/qeth/0.0.bd00/vnicc/learning

Again, in addition to enabling promiscuous mode on the device, you still have to set promiscuous mode on the respective interface:

  ip link set <interface> promisc on
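To check that the device actually accepted the settings, you can read the attributes back after writing them. The following is a minimal sketch under assumptions. The function name vnicc_enable_promisc and the QETH_SYSFS override are hypothetical, and the sketch assumes that an enabled characteristic reads back as 1.

Note the space between 1 and >. Without it, the shell parses 1> as a redirection of standard output and writes an empty line instead of the value 1.

```shell
# Hypothetical override so the function can be tried against a test
# directory; defaults to the qeth sysfs location used in this article.
QETH_SYSFS="${QETH_SYSFS:-/sys/devices/qeth}"

# Enables flooding, mcast_flooding, and learning for the device given by
# its bus ID (e.g. 0.0.bd00) and verifies each value by reading it back.
# Returns 1 as soon as a write fails or a value does not read back as 1.
vnicc_enable_promisc() {
    vnicc="$QETH_SYSFS/$1/vnicc"
    for attr in flooding mcast_flooding learning; do
        echo 1 > "$vnicc/$attr" || return 1
        [ "$(cat "$vnicc/$attr")" = "1" ] || return 1
    done
}
```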


KVM Live Guest Migration

To provide connectivity to virtual servers running in KVM, administrators have two choices:

  • Via Open vSwitch: Requires promiscuous mode, see above. Because the devices are configured in promiscuous mode, virtual servers migrated between two Open vSwitches keep uninterrupted connectivity, provided that the network architecture is set up accordingly. The two Open vSwitches may or may not share the same network device.
  • Via MAC address takeover: This is only required if both the source and the target KVM host share the same device and use MacVTap to connect to it. The traffic still runs through the same device, but some handshaking has to take place. It ensures that the MAC address is configured correctly and that traffic is forwarded to the target KVM host once the migration has completed. This takeover has to be authorized; otherwise, an attacker could divert traffic elsewhere.

Luckily, the VNIC characteristics offer functionality for MAC address takeover, too. To enable it, set the VNIC characteristics as follows:

On the source KVM host:

  echo 1 >/sys/devices/qeth/0.0.bd00/vnicc/takeover_learning

On the target KVM host:

  echo 1 >/sys/devices/qeth/0.0.bd00/vnicc/takeover_setvmac
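The two hosts need different attributes, so a small helper can make that mapping explicit. This is a minimal sketch. The function name vnicc_prepare_takeover and the QETH_SYSFS override are hypothetical; only the two attribute names come from the steps above.

```shell
# Hypothetical override so the function can be tried against a test
# directory; defaults to the qeth sysfs location used in this article.
QETH_SYSFS="${QETH_SYSFS:-/sys/devices/qeth}"

# $1: role of this KVM host in the migration ("source" or "target")
# $2: bus ID of the qeth device on this host, e.g. 0.0.bd00
vnicc_prepare_takeover() {
    case "$1" in
        source) attr=takeover_learning ;;
        target) attr=takeover_setvmac ;;
        *)
            echo "usage: vnicc_prepare_takeover source|target BUSID" >&2
            return 2
            ;;
    esac
    echo 1 > "$QETH_SYSFS/$2/vnicc/$attr"
}
```

Run it on each host with that host's role, for example:

  vnicc_prepare_takeover source 0.0.bd00   # on the source KVM host
  vnicc_prepare_takeover target 0.0.bd00   # on the target KVM host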


Final Words

Note that bridgeport mode and VNIC characteristics are mutually exclusive! As soon as, for example, a single VNIC characteristics attribute is activated, bridgeport functionality is no longer available until that VNIC characteristic is disabled again.

Furthermore, check your Linux distribution's tools for how to persist the changes outlined above. On many distributions, chzdev (part of the s390-tools package) does the job, but not (yet) on all.

This article only provides a brief overview. Both promiscuous mode and the VNIC characteristics have a lot more to them than what was covered here. The aim is merely to provide just enough information to get readers started with the most common use cases. For a deeper understanding, check the respective sections in the Device Drivers, Features, and Commands book.

New Release: Systemd v249

Systemd v249 has been released. One substantial change that users need to be aware of is that this version introduces a new naming scheme for network interfaces on RoCE Express adapters!

While previous versions assigned a variety of almost unpredictable interface names to RoCE Express adapters, systemd v249 switches to a new, simple scheme based on the user-defined identifier (UID):

  eno<UID in decimal>

Use UIDs 1-9 for your interfaces to avoid having to convert the hexadecimal UID value into the decimal value used in the Linux interface name.
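Converting a hexadecimal UID into the decimal interface name is simple arithmetic. For example, UID 0x1a becomes eno26 (1*16 + 10 = 26). For UIDs 0x1 to 0x9, hex and decimal are identical, which is why sticking to 1-9 avoids the conversion. The following helper is a minimal sketch. The function name uid_to_ifname is hypothetical, and it accepts only hex digits with an optional 0x prefix.

```shell
# Prints the interface name that the systemd v249 scheme (eno<UID in
# decimal>) yields for a UID given in hex, e.g. "0x1a" or "1A".
uid_to_ifname() {
    uid="${1#0[xX]}"                 # strip an optional 0x/0X prefix
    case "$uid" in
        ''|*[!0-9a-fA-F]*)
            echo "invalid UID: $1" >&2
            return 1
            ;;
    esac
    printf 'eno%d\n' "0x$uid"        # printf converts hex to decimal
}
```

For example, uid_to_ifname 0x1a prints eno26, and uid_to_ifname 0x100 prints eno256.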

However, depending on the Linux distribution in use, this change may break network configurations, as distributions often reference configurations by interface name. Therefore, adding a copy of the existing configuration under the new interface name might be a good idea! This change has not yet reached any of the major Linux distributions on Z, so we will have to see what the actual experience is like as things develop further.

Note that usage of interface altnames might be a handy tool for backwards compatibility in existing scripts.
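As a sketch of that idea, the old name can be attached to the renamed interface so that ip commands in existing scripts keep working. This assumes iproute2 with alternative name support and uses hypothetical names: ens1234 for the name an older systemd version assigned, and eno26 for the new name of UID 0x1a. It requires root privileges, and the altname does not survive a reboot.

```shell
# Hypothetical names: eno26 is the new systemd v249 name, ens1234 the
# name that existing scripts still reference.
ip link property add dev eno26 altname ens1234

# The old name now resolves to the same interface:
ip link show ens1234
```

For a persistent setup, systemd's .link files offer an AlternativeName= setting.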

The exact interface name is usually of little interest. Otherwise, users would have revolted by now against the common PCI-based naming schemes that produce easy-to-remember names like enP3p0s0np0... However, it can be crucial to know for some distributions, for example when installing to a new partition: interface names might be used in PRM files for network-specific configuration at installation time.
With the previous scheme, predicting the interface name required a deep technical understanding of the interaction between systemd, the specific kernel level, and the IBM Z I/O configuration. With the new scheme, it becomes very easy. Just keep in mind that the UID is displayed and handled in hex in most places: the interface name is the only place where it appears in decimal!

This change requires Linux kernel 5.13 or later.

New video: Enterprise Key Management for Pervasive Encryption of Data Volumes

A new video has been released on how to easily manage pervasive encryption keys using an enterprise key management solution for Linux on Z and LinuxONE. See the IBM Z version here, and the LinuxONE version here.

New Release: s390-tools v2.17

s390-tools v2.17 is out. This release is in support of Linux kernels 5.12 and 5.13.

For a detailed list of changes, see the changelog.

New Release: smc-tools v1.6

smc-tools v1.6 is now available for download here.

Highlights in this release:

smcd/smcr: New stats Command

Display statistics on SMC usage to gain valuable insights: 

  root@tux> smcd stats
  SMC-D Connections Summary
    Total connections handled       152730
    SMC connections                 152730
    Handshake errors                     0
    Avg requests per SMC conn          813.1
    TCP fallback                         0

  RX Stats
    Data transmitted (Bytes)  270311225427 (270.3G)
    Total requests                61619256
    Buffer full                     114746 (0.19%)
             8KB   16KB   32KB   64KB  128KB  256KB  512KB >512KB
    Bufs       0      2    140 2.103K      0      2      0 95.97K
    Reqs  54.07M      3 7.552M      0      0      0      0      0

  TX Stats
    Data transmitted (Bytes)  271274963896 (271.3G)
    Total requests                62565728
    Buffer full                          0 (0.00%)
    Buffer full(remote)              90038 (0.14%)
    Buffer too small                     0 (0.00%)
    Buffer too small(remote)             0 (0.00%)
             8KB   16KB   32KB   64KB  128KB  256KB  512KB >512KB
    Bufs       0 2.384K    142      0      0      2      0      0
    Reqs  54.92M      3 7.552M      0      0      0      0      0

  Extras
    Special socket calls                 0

Requires Linux kernel 5.14. See the respective man page for further details.
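One commonly wanted insight is how many connections fell back to TCP instead of using SMC. The following minimal sketch extracts this from the text output with awk. The function name smc_fallback_summary is hypothetical. It assumes the output keeps the "Total connections handled" and "TCP fallback" labels with the count in the last column, as in the sample above.

```shell
# Reads the text output of "smcd stats" or "smcr stats" on stdin and
# prints how many of the handled connections fell back to TCP.
smc_fallback_summary() {
    awk '
        /Total connections handled/ { total = $NF }
        /TCP fallback/              { fallback = $NF }
        END {
            if (total > 0)
                printf "TCP fallback: %d of %d connections (%.2f%%)\n",
                       fallback, total, 100 * fallback / total
            else
                print "no connections handled"
        }'
}
```

For example, piping the sample output shown above through it (smcd stats | smc_fallback_summary) prints "TCP fallback: 0 of 152730 connections (0.00%)".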

smc_run: New Command-Line Switches

Use the command-line options -r and -t to request specific sizes for the receive and transmit buffers, respectively:

  root@tux> smc_run -r 1M -t 128K ./foo

Alternatively, set the respective environment variables when using the preload library directly:

  root@tux> export SMC_RCVBUF=1M
  root@tux> export SMC_SNDBUF=128K

Again, see the respective man page for further details.
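The variables only take effect for a program started with the preload library in place. Here is a hedged sketch that mirrors the smc_run example. It assumes the library is installed as libsmc-preload.so in a directory the dynamic linker searches, so check your installation and the man page for the actual name and location.

```shell
# Request a 1M receive and a 128K transmit buffer for ./foo by preloading
# the SMC library directly (library file name assumed, see above).
SMC_RCVBUF=1M SMC_SNDBUF=128K LD_PRELOAD=libsmc-preload.so ./foo
```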

For more information, see the README.

Documentation: New Publications

Available now on the IBM Docs site for Linux on Z and LinuxONE:

  • openCryptoki - An Open Source Implementation of PKCS #11 (new)
    openCryptoki is an open source implementation of the Cryptoki API as defined by the PKCS #11 Cryptographic Token Interface Standard. This new publication explains how to manage tokens and how to write PKCS #11 programs that use the cryptographic services of tokens, with a focus on services that exploit IBM Z® cryptographic hardware through IBM cryptographic libraries.
  • Secure Key Solution with the Common Cryptographic Architecture 7.3: Application Programmer's Guide
    CCA now offers new services for format preserving cryptography. It also supports Koblitz elliptic curve cryptography.
    In addition, CCA supports new quantum safe keys for digital signatures, AES keys with derived unique key per transaction (DUKPT) processing, and a new enhanced key wrapping method for applicable verbs.
  • Distribution-specific documentation for SUSE Linux Enterprise Server 15 SP3 on IBM Z (with GA, 6/22)
    You can now process leap seconds through STP, control the LPAR configuration state of cryptographic devices from Linux, and use new commands to display information about SMC-R and SMC-D link groups and devices.
  • kdump - Recommendations for Linux on Z
    This paper gives an overview and recommendations for configuring kdump on Red Hat and SUSE distributions. It provides an introduction to configuring and using kdump. This includes the kdump configuration, kernel page filtering, hardware-accelerated compression and debugging of the kdump setup.



