RHCSA Series (4): Managing Memory
Alphabetical List of Abbreviations Used in This Article:
CPU = Central Processing Unit
cron = Time-Based Job Scheduler (from the Greek chronos, "time")
dstat = Versatile Resource Statistics Tool
gdb = GNU Debugger
GNU = GNU's Not Unix
htop = Interactive Process Viewer
I/O = Input/Output
LVM = Logical Volume Manager
LLM = Large Language Model
MEM = Memory
MMU = Memory Management Unit
NUMA = Non-Uniform Memory Access
OOM = Out-Of-Memory
pid = Process ID
RAM = Random Access Memory
RHCSA = Red Hat Certified System Administrator
RSS = Resident Set Size
sar = System Activity Reporter
Slab = Memory Pool
strace = System Call Tracer
sysstat = System Statistics
THP = Transparent Huge Pages
TLB = Translation Lookaside Buffer
VSZ = Virtual Memory Size
vmstat = Virtual Memory Statistics
Executive Summary
As a Red Hat Certified System Administrator (RHCSA), mastering effective memory management is critical for optimizing system performance and stability.
This involves a foundational understanding of memory architecture. Virtual memory allows the operating system to extend physical RAM capacity by temporarily moving data to disk storage, while the kernel's paging mechanism moves fixed-size data blocks between RAM and swap space.
Swap space, a dedicated disk partition or file, serves as a fallback resource when physical memory is exhausted, ensuring system operations can continue despite limited RAM.
Monitoring memory usage through tools like `free`, `vmstat`, and `top` enables administrators to assess real-time metrics such as available memory, swap activity, and resource-heavy processes, providing insights into potential performance bottlenecks or inefficiencies.
Configuring and managing swap space, using commands such as `mkswap`, `swapon`, and `swapoff`, requires strategic planning to align swap size with system workload demands and hardware constraints.
Additionally, tuning kernel parameters in `/proc/sys/vm`, such as `vm.swappiness` to control swap prioritization, `vm.vfs_cache_pressure` to manage file system cache reclamation, and `vm.dirty_ratio` to regulate memory dirty page thresholds, allows administrators to fine-tune memory behavior for specific use cases.
Memory caching further enhances performance by leveraging unused RAM for disk I/O buffering, with features like Transparent Huge Pages optimizing large memory block management.
Advanced diagnostics using tools like `perf` and `SystemTap` enable granular analysis of memory, CPU, and I/O interactions, helping identify resource contention or inefficient processes.
Troubleshooting common issues, such as excessive swapping, memory leaks, or unresponsive applications, relies on log analysis, process tracing with `strace`, and debugging utilities like `gdb` to resolve root causes and maintain system reliability. Together, these practices form a comprehensive approach to managing memory efficiently in Red Hat environments.
In this article, we'll take a deep dive into the skills and disciplines that will allow an RHCSA to manage memory on GNU/Linux computer systems with a high level of professional and technical expertise.
How I Used Reference 1 in This Article:
Reference 1 cites many features that make up GNU/Linux and other computer operating systems. The third of these features is "managing memory", and it is the sole focus of this article.
Credits
The following research assistants were invaluable tools that allowed me to complete this article in a timely manner: Mistral (an open-source local large language model - LLM) and HuggingChat (an online portal to about a dozen open-source LLMs).
Understanding the GNU/Linux Memory Architecture
RHCSAs must understand GNU/Linux’s virtual memory system, which allows processes to use more memory than physically available by mapping virtual address spaces to physical RAM and disk-based swap. This abstraction relies on paging, where fixed-size memory blocks (pages, typically 4KB) are moved between RAM and swap space as needed. The kernel manages page tables to track these mappings, while the CPU’s Memory Management Unit (MMU) translates virtual addresses to physical ones.
Linux divides memory into zones to accommodate hardware limitations. For example, ZONE_DMA reserves memory below 16MB for legacy devices, ZONE_NORMAL covers up to 896MB (on 32-bit systems), and ZONE_HIGHMEM handles memory beyond that. This ensures compatibility with devices that cannot access all physical memory.
The kernel also supports memory overcommit, allowing processes to allocate more memory than available under the assumption that not all allocations will be used. This behavior is controlled by /proc/sys/vm/overcommit_memory and /proc/sys/vm/overcommit_ratio. RHCSAs must tune these settings to balance resource efficiency and system stability.
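As a minimal sketch, inspecting and adjusting overcommit behavior might look like this (the values shown are illustrative, not recommendations):

```
# 0 = heuristic (default), 1 = always overcommit, 2 = never
cat /proc/sys/vm/overcommit_memory

# Strict accounting: commit limit = swap + 80% of RAM (runtime only)
sudo sysctl -w vm.overcommit_memory=2
sudo sysctl -w vm.overcommit_ratio=80
```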
Linux distinguishes between anonymous memory (e.g., process heap/stack, not tied to files) and file-backed memory (e.g., memory-mapped files). Anonymous memory is swapped to disk when physical RAM is full, while file-backed memory can be reclaimed by dropping cached data. The slab allocator optimizes small object allocations (e.g., inodes, dentries) by preallocating caches, reducing fragmentation.
For performance, RHCSAs should understand Transparent Huge Pages (THP), which automatically manages larger memory blocks (e.g., 2MB) to reduce Translation Lookaside Buffer (TLB) overhead. This benefits memory-intensive applications but may require tuning for latency-sensitive workloads.
Finally, RHCSAs must interpret memory metrics from /proc/meminfo, including Active/Inactive memory, Slab usage, PageTables, and Dirty pages. Tools like numastat help diagnose memory access patterns in NUMA (Non-Uniform Memory Access) systems, where CPUs access local vs. remote memory at different speeds.
When memory is exhausted, the Out-Of-Memory (OOM) killer terminates processes to free resources. RHCSAs can adjust process OOM scores via /proc/<pid>/oom_score_adj to prioritize critical applications during memory pressure.
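A minimal sketch of shielding a critical process follows; the PID is hypothetical, and valid values range from -1000 (never kill) to 1000 (preferred victim):

```
PID=1234                                       # hypothetical process ID
echo -500 | sudo tee /proc/$PID/oom_score_adj  # make it less likely to be killed
cat /proc/$PID/oom_score                       # inspect the resulting badness score
```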
Checking Memory Usage on GNU/Linux
Effective memory management for RHCSAs requires proficiency in monitoring tools and interpreting system metrics to identify bottlenecks or inefficiencies. The `free` command provides a snapshot of total, used, and available memory, including buffers, cache, and swap usage. RHCSAs must distinguish between used memory (actively allocated) and available memory (free plus reclaimable cache), as Linux aggressively caches disk data to improve performance, which can mislead simplistic interpretations of memory pressure.
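A quick check might look like this (field names vary slightly between versions):

```
# Human-readable snapshot; the "available" column estimates memory
# that new workloads can claim without forcing the system to swap
free -h
```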
The `vmstat` utility offers detailed insights into memory, paging, and system load. Its output includes metrics like `si` (swap-in) and `so` (swap-out), which indicate active swapping that may signal RAM exhaustion. High page-fault rates or frequent swapping can degrade performance, requiring administrators to correlate this data with process behavior from tools like `top` or `htop`. These tools display real-time memory usage per process, helping identify resource-hungry applications or memory leaks.
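To watch for active swapping, `vmstat` can be run at an interval, for example:

```
# Report memory and swap activity every 2 seconds, 5 samples;
# sustained nonzero si/so columns point to memory pressure
vmstat 2 5
```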
Interpreting `/proc/meminfo` grants deeper visibility into memory subsystems. Key entries include MemTotal, MemFree, Buffers, Cached, and Slab. RHCSAs should understand how Linux categorizes memory: Buffers store temporary disk metadata, while Cached holds file data. The Active/Inactive memory fields show recently used versus reclaimable pages, and PageTables track virtual-to-physical address mappings. Monitoring these values over time using `sar` (from the sysstat package) helps detect trends in memory utilization across workloads.
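A sketch of pulling these fields directly (the grep pattern is illustrative):

```
# Extract the key memory fields discussed above
grep -E 'MemTotal|MemFree|Buffers|^Cached|Active|Inactive|Slab|PageTables' /proc/meminfo

# Review memory utilization history (requires sysstat collection
# to be enabled and gathering data)
sar -r
```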
Advanced tools like `atop` and `glances` consolidate memory metrics with CPU, disk, and network data, offering a holistic view of system performance. RHCSAs should also leverage `smem` to analyze memory usage by process, user, or mapping, which is critical for troubleshooting. Understanding the difference between resident set size (RSS) and virtual memory size (VSZ) in processes helps diagnose excessive memory consumption.
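For example, a sketch of comparing RSS and VSZ per process (`smem` may need to be installed separately):

```
# Top five processes by resident memory, comparing RSS vs VSZ (in KiB)
ps -eo pid,comm,rss,vsz --sort=-rss | head -6

# Per-process memory with human-readable units, sorted descending
smem -r -k | head
```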
Regular memory monitoring should be paired with proactive thresholds. For example, configuring `cron` jobs to log `free` or `vmstat` output enables historical analysis, while tools like `nmon` or `dstat` provide customizable dashboards. RHCSAs must correlate memory metrics with system logs (`/var/log/messages`, `journalctl`) to identify anomalies, such as sudden spikes in memory usage or OOM killer activity, ensuring timely intervention before performance degradation impacts users.
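A hypothetical cron entry (e.g., dropped into `/etc/cron.d/memlog`; the log path is illustrative) could capture snapshots every 15 minutes:

```
# Append a timestamped memory snapshot for later trend analysis
*/15 * * * * root (date; free -m) >> /var/log/memlog 2>&1
```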
Configuring Swap Space on GNU/Linux
RHCSAs must understand how to configure and manage swap space, a critical resource for handling memory pressure when physical RAM is exhausted. Swap space can be implemented as a dedicated partition, a swap file, or a Logical Volume Manager (LVM) logical volume. It serves as an overflow area for the kernel to move less frequently used memory pages, freeing up physical RAM for active processes. While excessive swapping can degrade performance, proper configuration ensures system stability during peak workloads or memory leaks.
Creating swap space involves initializing the device or file with the `mkswap` command. For a partition (e.g., `/dev/sdb1`), this is done with `mkswap /dev/sdb1`. For a file, RHCSAs use `dd` to create a file of the desired size (e.g., `dd if=/dev/zero of=/swapfile bs=1G count=4`) followed by `mkswap /swapfile`. After initialization, `swapon` enables the swap space immediately (e.g., `swapon /swapfile`). To persist across reboots, an entry in `/etc/fstab` (e.g., `/swapfile none swap defaults 0 0`) ensures automatic activation.
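Putting those steps together, a minimal swap-file workflow might look like this (the size and path are illustrative; the `chmod` addresses the permission requirement discussed below):

```
sudo dd if=/dev/zero of=/swapfile bs=1G count=4   # create a 4 GiB file
sudo chmod 600 /swapfile                          # must not be world-readable
sudo mkswap /swapfile                             # initialize as swap
sudo swapon /swapfile                             # activate immediately
echo '/swapfile none swap defaults 0 0' | sudo tee -a /etc/fstab   # persist
```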
Monitoring active swap usage is done via tools like `free`, `swapon --show`, or `/proc/swaps`. RHCSAs should regularly assess swap utilization to identify trends, such as recurring high usage that may indicate under-provisioned RAM, memory-hungry applications, or leaks. Adjusting swap size requires careful planning: traditional guidelines suggest swap space equal to RAM size for systems requiring hibernation support, while servers often use 4-8GB of swap even with large RAM. Modern systems may reduce swap reliance but still require it for edge cases.
Tuning swap behavior involves adjusting `/proc/sys/vm/swappiness`, which controls how aggressively the kernel swaps memory pages. A value of 0 minimizes swapping (favoring RAM), while 100 maximizes it. RHCSAs typically set this to 10-30 for servers to balance performance and stability. The `vm.vfs_cache_pressure` parameter also impacts swapping by prioritizing inode/dentry cache reclamation over process memory. These settings can be applied temporarily via `sysctl -w` or permanently in `/etc/sysctl.conf`.
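For example (the value 10 is illustrative):

```
# Runtime change only
sudo sysctl -w vm.swappiness=10

# Persist across reboots (a drop-in under /etc/sysctl.d/ also works)
echo 'vm.swappiness = 10' | sudo tee -a /etc/sysctl.conf
sudo sysctl -p
```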
Troubleshooting swap issues includes verifying `swapon` status, checking file permissions (swap files must not be world-readable), and ensuring `/etc/fstab` entries are correct. If swap space becomes fragmented or corrupted, recreating it after backing up data is often necessary. For LVM-based swap, extending the logical volume with `lvresize` and reinitializing with `mkswap` allows dynamic resizing. RHCSAs should also correlate swap metrics with system logs to detect OOM events or misbehaving applications.
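A sketch of growing an LVM-backed swap volume follows; the volume group and logical volume names are hypothetical, and the swap space must be offline while it is reinitialized:

```
sudo swapoff /dev/vg0/swap          # take the swap LV offline
sudo lvresize -L +2G /dev/vg0/swap  # grow the logical volume by 2 GiB
sudo mkswap /dev/vg0/swap           # reinitialize the swap signature
sudo swapon /dev/vg0/swap           # bring it back online
```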
Memory Tuning Parameters on GNU/Linux
RHCSAs must understand how to fine-tune memory behavior using kernel parameters, primarily located in `/proc/sys/vm` and managed via `sysctl`. Key parameters include `vm.swappiness`, which controls how aggressively the kernel swaps memory pages. A value of 0 minimizes swapping (favoring RAM), while 100 maximizes it. For servers prioritizing performance, RHCSAs typically set this to 10-30, balancing RAM utilization with swap as a fallback. This value can be adjusted temporarily with `sysctl -w vm.swappiness=20` or permanently in `/etc/sysctl.conf`.
Another critical parameter is `vm.vfs_cache_pressure`, which governs how the kernel reclaims memory used for caching directory and inode objects. A higher value (default 100) prioritizes reclaiming these caches, freeing memory for processes, while a lower value retains them for faster file system access. Tuning this is useful for systems with heavy file system activity, such as web servers or databases.
The `vm.dirty_ratio` and `vm.dirty_background_ratio` parameters control how the kernel writes dirty pages (modified memory not yet written to disk) to storage. `vm.dirty_ratio` sets the percentage of system memory that can be filled with dirty data before the process writing the data must wait for writes to complete. `vm.dirty_background_ratio` sets the threshold at which the kernel's background flusher threads (formerly `pdflush`) start writing data asynchronously. RHCSAs adjust these to optimize disk I/O performance, especially for systems with high write loads.
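Illustrative values for a write-heavy host (not universal recommendations):

```
# Start background writeback early, at 5% of RAM...
sudo sysctl -w vm.dirty_background_ratio=5
# ...but only stall writing processes once dirty pages reach 20%
sudo sysctl -w vm.dirty_ratio=20
```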
`vm.overcommit_memory` and `vm.overcommit_ratio` govern memory overcommit behavior. Setting `overcommit_memory` to 2 disables overcommit entirely, ensuring processes do not allocate more memory than the system can physically back. This prevents out-of-memory (OOM) conditions but may limit application flexibility. The `overcommit_ratio` adjusts the percentage of RAM considered available for overcommit when enabled. RHCSAs configure these based on workload requirements, such as databases requiring strict memory guarantees.
The `vm.zone_reclaim_mode` parameter optimizes memory reclamation on NUMA systems, where memory is divided across CPU nodes. Enabling this forces the kernel to reclaim memory locally, reducing cross-node latency. RHCSAs set this for performance-critical applications on multi-socket servers.
Other parameters include `vm.min_free_kbytes`, which sets the minimum amount of memory reserved for atomic allocations (critical for kernel operations), and `vm.drop_caches`, which allows manual freeing of page cache, dentries, and inodes by writing to `/proc/sys/vm/drop_caches`. This is useful for temporary memory relief but should not be used in automated scripts.
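For example (one-off relief, not for automation):

```
# Inspect the reserve kept for atomic kernel allocations
cat /proc/sys/vm/min_free_kbytes

# Flush dirty data first, then drop the page cache (1); use 2 for
# slab caches or 3 for both
sync
echo 1 | sudo tee /proc/sys/vm/drop_caches
```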
RHCSAs must test tuning changes in non-production environments, applying them temporarily via `sysctl -w` before committing to `/etc/sysctl.conf`. Monitoring tools like `sar`, `vmstat`, and `/proc/meminfo` help validate adjustments, ensuring tuning aligns with system performance goals without introducing instability.
Memory Caching on GNU/Linux
RHCSAs must understand how GNU/Linux leverages memory caching to improve system performance by reducing disk I/O latency. Linux automatically uses available free RAM to cache frequently accessed disk data, including files, directory entries (dentries), and metadata (inodes). This cached memory is released when applications require more physical memory, ensuring optimal resource allocation. The `Cached` field in `/proc/meminfo` and `free` command output reflects this cached data, while `SReclaimable` and `Slab` metrics in `/proc/meminfo` track reclaimable cache tied to kernel data structures.
The `page cache` is central to memory caching, storing recently read or written file data. When a process reads a file, the kernel caches it in RAM, allowing subsequent reads to access the faster memory instead of slower disk storage. Similarly, write operations are temporarily held in the page cache before being flushed to disk, improving performance. RHCSAs should monitor `dirty_ratio` and `dirty_background_ratio` to control how aggressively the kernel writes dirty data to disk, balancing performance with data integrity.
Tools like `vmstat`, `sar`, and `smem` help analyze caching behavior. The `vmstat -a` command shows active and inactive cache pages, while `sar -B` tracks page-in and page-out activity. For granular insights, `smem -c "pid,user,cache,swap"` displays memory usage by process, highlighting applications consuming significant cache. RHCSAs must correlate these metrics with system performance to identify inefficiencies, such as excessive cache pressure or underutilized memory.
Tuning cache behavior involves adjusting `vm.vfs_cache_pressure`, which prioritizes reclaiming dentry and inode caches over process memory. A higher value (e.g., 150) accelerates cache reclamation, freeing memory for applications, while a lower value (e.g., 50) retains caches longer. This parameter is critical for systems with heavy file system activity, such as web servers or container hosts. For manual cache clearing, `echo 1 > /proc/sys/vm/drop_caches` frees the page cache, while `echo 3 > /proc/sys/vm/drop_caches` clears the page cache, dentries, and inodes simultaneously; either should be used sparingly to avoid performance dips.
Transparent Huge Pages (THP) further optimize memory management by automatically allocating large memory blocks (e.g., 2MB) for processes, reducing Translation Lookaside Buffer (TLB) misses. Enabled by default via `/sys/kernel/mm/transparent_hugepage/enabled`, THP settings can be adjusted to `always`, `madvise`, or `never` depending on workload needs. For latency-sensitive applications, `madvise` allows explicit THP usage via application hints, while `never` disables THP entirely. RHCSAs can check THP status with `cat /sys/kernel/mm/transparent_hugepage/enabled` and manage huge pages with tools like `hugeadm`.
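A quick sketch of checking and adjusting the THP policy at runtime (persisting the change requires a kernel boot parameter or a tuned profile):

```
# The bracketed word in the output is the active policy
cat /sys/kernel/mm/transparent_hugepage/enabled

# Limit THP to applications that opt in via madvise()
echo madvise | sudo tee /sys/kernel/mm/transparent_hugepage/enabled
```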
Best practices include regularly monitoring cache metrics, aligning `vfs_cache_pressure` with workload profiles, and avoiding aggressive cache drops in production environments. Understanding caching behavior ensures RHCSAs maximize system responsiveness while maintaining stability under varying memory demands.
Monitoring and Analyzing System Performance on GNU/Linux
RHCSAs must master performance monitoring tools to identify bottlenecks in memory, CPU, disk I/O, and network activity. The `top` and `htop` utilities provide real-time overviews of system resource usage, highlighting processes consuming excessive memory or CPU. `htop` offers enhanced features like color-coded metrics and interactive process management. RHCSAs should interpret fields like `%MEM`, `RES` (resident memory size), and `TIME+` (CPU time) to prioritize troubleshooting efforts.
The `vmstat` command delivers granular insights into memory, swapping, and system load. Running `vmstat 1` polls metrics every second, showing `si` (swap-in) and `so` (swap-out) activity that signals memory pressure. High `bi` (blocks read from disk) and `bo` (blocks written to disk) values indicate frequent disk I/O, which may degrade performance. RHCSAs correlate these metrics with `iostat` output to isolate disk subsystem issues.
For disk I/O analysis, `iostat -x` from the `sysstat` package provides extended statistics like `%util` (device utilization) and `await` (average I/O request time). A `%util` value consistently over 70% may signal disk saturation, requiring optimization or hardware upgrades. `pidstat -d` tracks I/O metrics per process, helping identify rogue applications causing disk contention.
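For example:

```
# Extended device statistics every 5 seconds; watch %util and await
iostat -x 5

# Per-process disk I/O at the same interval to find the culprit PID
pidstat -d 5
```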
The `sar` utility, also part of `sysstat`, logs historical performance data for trend analysis. Running `sar -r` shows memory usage trends, while `sar -u` tracks CPU utilization across intervals. RHCSAs configure `sysstat` to persistently log data via `/etc/cron.d/sysstat`, enabling long-term analysis of performance patterns. Tools like `mpstat` and `pidstat` further dissect CPU and process-specific metrics.
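On RHEL-family systems, enabling collection and querying the history might look like this:

```
# Start and persist the sysstat collector
sudo systemctl enable --now sysstat

# Memory usage and CPU utilization from today's collected data
sar -r
sar -u
```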
Advanced monitoring tools like `dstat` and `nmon` consolidate metrics into customizable dashboards. `dstat --top-mem` highlights top memory consumers, while `nmon` offers interactive views of memory, CPU, and disk activity. For network performance, `nload` or `iftop` visualize bandwidth usage, helping RHCSAs detect network-related bottlenecks.
For deep diagnostics, `perf` and `SystemTap` trace kernel-level events. `perf top` shows real-time CPU consumption by functions, while `perf record` and `perf report` analyze historical data. `SystemTap` scripts (e.g., `stap -l 'kernel.function("*")'`) probe kernel behavior, such as lock contention or slab allocator inefficiencies. RHCSAs use these tools to troubleshoot latency issues in high-performance environments.
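A minimal `perf` profiling session might look like this (requires the perf package and usually root privileges):

```
# Sample all CPUs with call graphs for 30 seconds, then summarize
sudo perf record -a -g -- sleep 30
sudo perf report
```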
Finally, correlating performance metrics with system logs is critical. `journalctl -b` filters logs for the current boot session, helping RHCSAs link performance anomalies to kernel oops, OOM killer activity, or service failures. Scripts combining `sar`, `vmstat`, and `cron` automate alerts for thresholds, such as sustained high memory usage or disk latency.
Troubleshooting Common Memory-Related Issues on GNU/Linux
RHCSAs must identify and resolve memory-related issues to maintain system stability and performance. High memory usage often stems from memory leaks in applications, excessive cache consumption, or improper swap configuration. The `free` command provides a quick overview of total, used, and available memory, while `vmstat` and `sar` help identify trends in memory pressure. If `free` shows low available memory and high `buff/cache`, Linux is likely leveraging unused RAM for caching, which is normal and not a performance concern.
Persistent high swap usage (`si`/`so` in `vmstat`) signals insufficient physical RAM or misconfigured swap behavior. RHCSAs adjust `vm.swappiness` to reduce aggressive swapping or add more RAM. The `top` or `htop` command highlights processes with high `%MEM` values, indicating memory-hungry applications. For processes consuming excessive memory, RHCSAs can prioritize restarting or tuning the application, terminating non-critical processes with `kill` or `kill -9`, or adjusting their `OOM score` via `/proc/<pid>/oom_score_adj`.
The `dmesg` command is critical for diagnosing Out-Of-Memory (OOM) events. Messages like `oom-killer: gfp_mask=0x201da, order=0` indicate the kernel forcibly terminating processes to free memory. RHCSAs correlate these logs with timestamps from `/var/log/messages` or `journalctl -b` to identify root causes, such as unbounded memory growth in applications or insufficient swap space.
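For example:

```
# Surface OOM-killer activity with human-readable kernel timestamps
dmesg -T | grep -iE 'out of memory|oom-killer'

# Cross-check kernel messages from the current boot
journalctl -k -b | grep -i oom
```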
Memory leaks in user-space applications can be diagnosed with tools like `strace` and `gdb`. Attaching `strace -p <pid>` to a suspected process traces system calls related to memory allocation (`brk`, `mmap`), while `gdb` can analyze core dumps to pinpoint faulty code. For kernel-level memory leaks, `slabtop` identifies excessive slab cache usage (`SUnreclaim` in `/proc/meminfo`), often tied to file system metadata (`dentry`, `inode`) or network buffers. Clearing reclaimable slab caches via `echo 2 > /proc/sys/vm/drop_caches` temporarily frees memory but does not address root causes.
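A sketch of tracing just the allocation-related system calls (the PID is hypothetical):

```
# Watch brk/mmap/munmap activity of a suspect process
sudo strace -e trace=brk,mmap,munmap -p 1234
```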
High page faults (`pgfault/s` in `vmstat`) may indicate inefficient memory access patterns. RHCSAs optimize applications using `perf` to profile memory-intensive functions or enable Transparent Huge Pages (`THP`) for workloads with large memory footprints. The `numastat` command helps diagnose NUMA-related memory imbalances, where processes prefer non-local memory nodes, increasing latency.
Proactive monitoring with `cron` jobs running `sar -r` or `vmstat` logs baseline memory metrics, enabling RHCSAs to detect anomalies early. For recurring issues, tuning `/proc/sys/vm/min_free_kbytes` or `vm.zone_reclaim_mode` improves memory allocation efficiency. If memory pressure persists despite tuning, scaling horizontally (adding more servers) or vertically (upgrading hardware) may be necessary.
Conclusions
This concludes Article 4 of my RHCSA series. We discussed many aspects of managing memory on GNU/Linux computer systems:
- RHCSAs must understand GNU/Linux’s virtual memory system, which allows processes to use more memory than physically available by mapping virtual address spaces to physical RAM and disk-based swap.
- Effective memory management for RHCSAs requires proficiency in monitoring tools and interpreting system metrics to identify bottlenecks or inefficiencies.
- RHCSAs must understand how to configure and manage swap space, a critical resource for handling memory pressure when physical RAM is exhausted.
- RHCSAs must understand how to fine-tune memory behavior using kernel parameters, primarily located in `/proc/sys/vm` and managed via `sysctl`.
- RHCSAs must understand how GNU/Linux leverages memory caching to improve system performance by reducing disk I/O latency.
- RHCSAs must master performance monitoring tools to identify bottlenecks in memory, CPU, disk I/O, and network activity.
- RHCSAs must identify and resolve memory-related issues to maintain system stability and performance.
References:
[1] Moe Hassan, "Lecture - CSCI 275: Linux Systems Administration and Security," CUNY John Jay College, NYC Tech-in-Residence Corps, 2020. Retrieved June 26, 2025 from https://academicworks.cuny.edu/cgi/viewcontent.cgi?article=1053&context=jj_oers