Performance Tuning Guidelines for Low Latency Response on AMD EPYC™-Based Servers
Application Note
Issue Date: June 2018
Advanced Micro Devices
© 2018 Advanced Micro Devices, Inc. All rights reserved. The information contained herein is for informational purposes only, and is subject to change without notice. While every precaution has been taken in the preparation of this document, it may contain technical inaccuracies, omissions and typographical errors, and AMD is under no obligation to update or otherwise correct this information. Advanced Micro Devices, Inc. makes no representations or warranties with respect to the accuracy or completeness of the contents of this document, and assumes no liability of any kind, including the implied...
Contents
Overview
56263 Rev. 3.01 June 2018
Overview
Low latency market segments, such as financial trading or real-time processing, require a server to provide consistent system response in under 10 µs. This document provides guidance for tuning servers that use the AMD EPYC™ processor to meet this requirement. The guidelines cover hardware configuration, BIOS settings, operating system kernel configurations, scripts to control the environment of the target applications, and a description of the proper technique for collecting code execution...
[Figure: terminal output captured on root@localhost; the values were lost in extraction. Fields shown: Architecture, Byte Order, On-line CPU(s) list, Thread(s) per core, Core(s) per socket, Socket(s), NUMA node(s), Model name, NUMA node0 CPU(s) through NUMA node7 CPU(s), Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc art rep_good nopl nonstop_tsc extd_apicid...]
Multiple options for the output format are available. An example of the graphical output generated by the following command line is provided in Figure 2.

Figure 2. Output from lstopo for a One-Socket System Under Test

Hardware Configuration
To achieve the best response times, optimize the system topology where possible to match your operational needs. Be aware of memory placement, install memory evenly across the NUMA nodes, and try to maximize the use of local memory. Isolate the cores executing your time-critical application from the operating system scheduler so that other applications and kernel threads do not steal execution time from your application. Isolate your application’s cores from interrupts as much as possible. Utilize...
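Placing the time-critical process on its dedicated cores can be sketched as follows. This is a minimal illustration, not the application note's own method: it assumes a Linux host and uses Python's `os.sched_setaffinity`; the helper name `pin_to_cpus` is hypothetical. Affinity only places the process; keeping other tasks off those cores is handled separately by the isolation settings.

```python
import os


def pin_to_cpus(cpus, pid=0):
    """Restrict process `pid` (0 means the calling process) to `cpus`.

    The kernel raises OSError if none of the requested CPUs can be used.
    Returns the affinity set the kernel reports after the change.
    """
    os.sched_setaffinity(pid, set(cpus))
    return os.sched_getaffinity(pid)
```

For example, `pin_to_cpus(range(8, 16))` would request cores 8 through 15; that range is an assumption chosen for illustration and must match the cores actually reserved on the system.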
• Prefetcher - Enabled
• L2 Stream HW Prefetcher - Enabled
• SR-IOV - Disabled
• Min Processor Idle Power Core C-State - No C-State - Disabled
• NUMA Group Size Optimization - Clustered
• Memory patrol scrubbing - Disabled
• Memory Refresh Rate - 1X

If other options trade power against performance, latency, or deterministic performance, choose the setting that favors performance, latency, or determinism over power savings. For example, if available, choose Performance Determinism to minimize performance jitter....
Memory Options
Disable Node Interleaving to preserve the use of local node memory allocation. Local memory should provide the lowest access latency. The new AMD Secure Memory Encryption feature could cause a small latency increase. Disable it if low latency is a higher priority than the added security.

Virtualization Options
If your application scenario does not require virtualization, then disable AMD Virtualization Technology. With virtualization disabled, also disable AMD IOMMU. It can...
Consider using the RHEL 7 real-time kernel when the latency requirements are below 10 microseconds. This kernel has been tuned specifically for low latency, deterministic response. The real-time kernel can be installed with the following commands:

However, assuming the user wants to use only the standard RHEL 7.4 kernel, the remaining tuning tips are illustrated with a two-socket AMD EPYC™-based system with the standard kernel installed.

Note: The 10 µs target...
Enter the following command to verify the kernel command line parameters used when the kernel was booted:

BOOT_IMAGE=/vmlinuz-3.10.0-362.el7.x86_64 root=/dev/mapper/rhel-root ro crashkernel=auto rd.lvm.lv=rhel/root rd.lvm.lv=rhel/swap rhgb quiet idle=poll transparent_hugepage=never audit=0 selinux=0 nmi_watchdog=0 nohz=on clocksource=tsc nosoftlockup mce=ignore_ce cpuidle.off=1 skew_tick=1 processor.max_cstate=0 isolcpus=8-15 rcu_nocbs rcu_nocb_poll nohz_full acpi_irq_noba...
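The check of the booted parameters can also be scripted. The sketch below is an illustration under stated assumptions: it reads `/proc/cmdline` (where Linux exposes the booted command line), and the helper names `parse_cmdline`, `check_params`, and `read_cmdline` are hypothetical. It does not handle quoted values containing spaces, and a parameter given more than once keeps only its last value.

```python
def parse_cmdline(text):
    """Split a kernel command line into {name: value}.

    Bare flags such as `nosoftlockup` map to None.
    """
    params = {}
    for token in text.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def check_params(params, expected):
    """Return (name, wanted, found) for every expected parameter that
    is missing or has a different value."""
    problems = []
    for name, wanted in expected.items():
        found = params.get(name, "<missing>")
        if found != wanted:
            problems.append((name, wanted, found))
    return problems


def read_cmdline(path="/proc/cmdline"):
    """Read the command line the running kernel was booted with."""
    with open(path) as f:
        return f.read()
```

A typical use would be `check_params(parse_cmdline(read_cmdline()), {"idle": "poll", "isolcpus": "8-15", "nosoftlockup": None})`, where the expected values are copied from the example command line above; an empty list means every listed parameter is present as expected.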
dirty_background_ratio is the percentage of memory that dirty pages may occupy before they start being written to disk in the background. sched_latency_ns represents the preemption latency of a CPU-bound task. Increasing this value increases the task's timeslice. sched_min_granularity_ns is the minimum preemption granularity for CPU-bound tasks. sched_migration_cost_ns determines how long a task remains cache-hot after its last execution and is therefore not migrated off the CPU. Increasing this...
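Before changing these tunables it helps to record their current values. The sketch below is a hedged illustration: it assumes the settings are exposed under `/proc/sys` as `vm.dirty_background_ratio` and `kernel.sched_*`, which holds for the 3.10-based kernel discussed here but not necessarily for newer kernels, so a missing file is reported as None. The helper name `read_tunables` is hypothetical.

```python
import os


def read_tunables(names, root="/proc/sys"):
    """Return {sysctl_name: value string, or None if not present}.

    A dotted name such as `vm.dirty_background_ratio` is mapped to the
    file `<root>/vm/dirty_background_ratio`.
    """
    values = {}
    for name in names:
        path = os.path.join(root, *name.split("."))
        try:
            with open(path) as f:
                values[name] = f.read().strip()
        except OSError:
            # Not exposed on this kernel, or not readable by this user.
            values[name] = None
    return values


# The four settings described in the text above.
TUNABLES = [
    "vm.dirty_background_ratio",
    "kernel.sched_latency_ns",
    "kernel.sched_min_granularity_ns",
    "kernel.sched_migration_cost_ns",
]
```

Calling `read_tunables(TUNABLES)` on the target system returns a snapshot that can be compared before and after tuning.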