Due: 4/21/2025 11:59pm
Recommended Background Reading
- Architecture Page Table Helpers
- Linux page table management
- Example heatmaps
- DAMON: Data Access MONitor
- Paging
- Examining Process Page Tables
Part 1: Page Table Walk via /proc/$pid/pagemap
- Starter code: https://github.com/MoatLab/LeapIO/blob/master/Runtime/pagemap.c
Please use the starter code above and enhance it to measure the VA->PA address translations using the /proc pagemap
interface. Detailed instructions below:
1.1 Write a simple program (name it memalloc.c) that allocates 2GB of DRAM. Use calloc() or an equivalent to make sure the allocated memory buffer is zeroed out. Do the memory allocation in the following three ways:
- allocate 4KB DRAM each time, loop until you get 2GB DRAM in total
- allocate 2MB DRAM each time, loop until you get 2GB DRAM in total
- allocate 2GB DRAM in one calloc() call.
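The three schemes differ only in chunk size, so one helper can cover all of them. The following is a minimal sketch, not the required implementation; the names alloc_in_chunks and free_chunks are hypothetical and do not come from the starter code.

```c
#include <stddef.h>
#include <stdlib.h>

/* Free the first n chunk buffers and then the pointer array itself. */
static void free_chunks(void **bufs, size_t n)
{
    for (size_t i = 0; i < n; i++)
        free(bufs[i]);
    free(bufs);
}

/*
 * Allocate `total` bytes as total/chunk zeroed buffers of `chunk` bytes each.
 * Returns an array of chunk pointers and stores its length in *nchunks,
 * or returns NULL if the sizes do not divide evenly or an allocation fails.
 */
static void **alloc_in_chunks(size_t total, size_t chunk, size_t *nchunks)
{
    if (chunk == 0 || total % chunk != 0)
        return NULL;

    size_t n = total / chunk;
    void **bufs = malloc(n * sizeof(*bufs));
    if (bufs == NULL)
        return NULL;

    for (size_t i = 0; i < n; i++) {
        bufs[i] = calloc(1, chunk);   /* calloc returns zeroed memory */
        if (bufs[i] == NULL) {
            free_chunks(bufs, i);     /* release what was allocated so far */
            return NULL;
        }
    }
    *nchunks = n;
    return bufs;
}
```

On a 64-bit system the three cases would then be alloc_in_chunks((size_t)2 << 30, 4096, &n), alloc_in_chunks((size_t)2 << 30, (size_t)2 << 20, &n), and alloc_in_chunks((size_t)2 << 30, (size_t)2 << 30, &n). Keep in mind that Linux populates pages lazily, so a page that has never been touched may be reported as not present in pagemap; writing one byte per page before translating is one way to make sure it is backed by a physical frame.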
1.2 Enhance pagemap.c to translate the virtual addresses of the 2GB of DRAM allocated in memalloc.c. Log the VA->PA mappings to a file and visualize the layout of the virtual and physical addresses in the virtual and physical address space respectively. Describe your observations (e.g., are the virtual/physical addresses contiguous? How does the layout differ across the three allocation schemes above?).
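For reference, each entry in /proc/$pid/pagemap is a 64-bit value per virtual page: bit 63 says whether the page is present in RAM, and bits 0-54 hold the page frame number (PFN) when it is. A minimal sketch of the lookup and decoding follows; the helper names pagemap_read_entry and pagemap_entry_decode are hypothetical and may not match the starter code. Note that since Linux 4.2 the PFN field reads as zero unless the reader has CAP_SYS_ADMIN, so run the tool as root.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

#define PM_PFN_MASK ((UINT64_C(1) << 55) - 1)   /* bits 0-54: PFN */
#define PM_PRESENT  (UINT64_C(1) << 63)         /* bit 63: page present */

/* Read the 64-bit pagemap entry covering vaddr from an open pagemap fd. */
static int pagemap_read_entry(int fd, uint64_t vaddr, uint64_t page_size,
                              uint64_t *entry)
{
    /* One 8-byte entry per virtual page, indexed by virtual page number. */
    off_t off = (off_t)(vaddr / page_size * sizeof(uint64_t));
    ssize_t n = pread(fd, entry, sizeof(*entry), off);
    return n == (ssize_t)sizeof(*entry) ? 0 : -1;
}

/*
 * Turn a pagemap entry into a physical address.
 * Returns 0 and fills *paddr if the page is present, -1 otherwise.
 */
static int pagemap_entry_decode(uint64_t entry, uint64_t vaddr,
                                uint64_t page_size, uint64_t *paddr)
{
    if (!(entry & PM_PRESENT))
        return -1;
    uint64_t pfn = entry & PM_PFN_MASK;
    *paddr = pfn * page_size + (vaddr % page_size);
    return 0;
}
```

The page size passed in would typically come from sysconf(_SC_PAGESIZE). Logging one "VA PA" pair per line keeps the file easy to load into a plotting script.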
1.3 Enhance pagemap.c to track the latency of each VA->PA translation, and plot the latency CDF (as a PDF file):
- X is latency in microseconds
- Y is the percentile
- Title: $YourFirstName-$YourLastName-Pagemap-Lat-CDF
- Caption: describe your observations
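One possible way to collect the data for this plot is to timestamp each translation with a monotonic clock, sort the samples, and emit (latency, percentile) pairs. The sketch below assumes nanosecond samples stored in a uint64_t array; the function names are hypothetical, and the percentile uses the nearest-rank definition, which is only one of several common choices.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Current monotonic time in nanoseconds. */
static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * UINT64_C(1000000000) + (uint64_t)ts.tv_nsec;
}

static int cmp_u64(const void *a, const void *b)
{
    uint64_t x = *(const uint64_t *)a, y = *(const uint64_t *)b;
    return (x > y) - (x < y);
}

/* Sort latency samples in ascending order, in place. */
static void sort_latencies(uint64_t *s, size_t n)
{
    qsort(s, n, sizeof(*s), cmp_u64);
}

/* Nearest-rank percentile (p in 1..100) of an already sorted array. */
static uint64_t percentile_sorted(const uint64_t *s, size_t n, unsigned p)
{
    if (n == 0)
        return 0;
    size_t rank = ((size_t)p * n + 99) / 100;   /* ceil(p * n / 100) */
    if (rank < 1)
        rank = 1;
    if (rank > n)
        rank = n;
    return s[rank - 1];
}

/* Write "latency_us percentile" rows for a sorted array, one per sample. */
static void write_cdf(FILE *out, const uint64_t *s, size_t n)
{
    for (size_t i = 0; i < n; i++)
        fprintf(out, "%.3f %.2f\n", (double)s[i] / 1000.0,
                100.0 * (double)(i + 1) / (double)n);
}
```

A sample would be taken as t0 = now_ns() before the pagemap read and now_ns() - t0 after it. The two-column output of write_cdf can be fed directly to a plotting tool with latency on X and percentile on Y.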
1.4 Submissions
- Your modified pagemap.c file
- The visualization of both the VA and PA layouts in their respective address spaces.
- The CDF figure of translation latencies.
Part 2: Roll Your Own Linux Kernel Page Table Walker
In this part, you will implement a Linux kernel module that performs various address-translation operations.
Let’s simplify memalloc.c to allocate only a 1GB buffer in one malloc() call.
2.1 Refer to the code here for translating VAs to PAs in the kernel.
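To make the walk concrete, it may help to see how a virtual address is split into per-level table indices. The sketch below is a userspace illustration that assumes x86-64 with 4-level paging and 4KB pages (9 index bits per level, 12 offset bits); with 5-level paging or another architecture the bit positions differ. In the kernel, the corresponding steps are performed by the helpers pgd_offset(), p4d_offset(), pud_offset(), pmd_offset(), and pte_offset_map() rather than by manual shifts. The struct and function names here are hypothetical.

```c
#include <stdint.h>

/* Table indices and page offset extracted from one virtual address. */
struct pt_indices {
    unsigned pgd;     /* bits 47-39 */
    unsigned pud;     /* bits 38-30 */
    unsigned pmd;     /* bits 29-21 */
    unsigned pte;     /* bits 20-12 */
    unsigned offset;  /* bits 11-0  */
};

/* Assumes x86-64, 4-level paging, 4KB pages. */
static struct pt_indices split_va(uint64_t va)
{
    struct pt_indices idx;
    idx.pgd    = (unsigned)((va >> 39) & 0x1ff);
    idx.pud    = (unsigned)((va >> 30) & 0x1ff);
    idx.pmd    = (unsigned)((va >> 21) & 0x1ff);
    idx.pte    = (unsigned)((va >> 12) & 0x1ff);
    idx.offset = (unsigned)(va & 0xfff);
    return idx;
}
```

Once the walk reaches the PTE, the physical address is the PFN stored in that entry shifted left by 12, plus the page offset.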
2.2 Write a Linux kernel module (lkp25p3.ko) to
- Expose a /proc/lkp25p3 interface to allow communication between userspace and lkp25p3.ko
- In particular, /proc/lkp25p3 should accept VAs written to it (e.g., echo "0x123456" > /proc/lkp25p3), which triggers lkp25p3.ko to translate the VA and log the latency of the page translation to the kernel log file (/var/log/kern.log)
- As in the previous part, plot the CDF of address translation latencies. In the caption, explain the latency difference compared to the pagemap approach in Part 1.
2.3 Enhance the kernel module to support the following feature:
echo 0 > /proc/lkp25p3 should iterate through the entire struct vm_area_struct list of the process with PID=1 and print out the following fields for each area: vma->vm_start, vma->vm_end, vma->vm_flags, vma->vm_file, vma->vm_pgoff
Sample output format:
VMA1: [0x00400000 - 0x00452000] (size: 328 KB)
Permissions: r-xp
Type: File-backed Mapping
File: /usr/bin/bash
Offset: 0x0
VMA2: [0x00651000 - 0x00652000] (size: 4 KB)
Permissions: r--p
Type: File-backed Mapping
File: /usr/bin/bash
Offset: 0x51000
VMA3: [0x00652000 - 0x0065b000] (size: 36 KB)
Permissions: rw-p
Type: File-backed Mapping
File: /usr/bin/bash
Offset: 0x52000
VMA4: [0x00e5a000 - 0x00e7b000] (size: 132 KB)
Permissions: rw-p
Type: Heap
…
- You can validate your implementation by running cat /proc/1/maps in userspace and comparing its output with that of your kernel module.
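The "Permissions" and "size" fields in the sample output can be derived from vm_flags, vm_start, and vm_end. The following userspace sketch shows one way to do the formatting; the flag values mirror VM_READ, VM_WRITE, VM_EXEC, and VM_SHARED from the kernel's linux/mm.h, and the function names are hypothetical. Inside the module you would print with pr_info() (or into a seq_file) instead of snprintf().

```c
#include <stddef.h>
#include <stdio.h>

/* Values mirror the kernel's definitions in linux/mm.h. */
#define VM_READ   0x1UL
#define VM_WRITE  0x2UL
#define VM_EXEC   0x4UL
#define VM_SHARED 0x8UL

/* Build an "rwxp"-style string like the one shown in /proc/<pid>/maps. */
static void vma_perms(unsigned long flags, char out[5])
{
    out[0] = (flags & VM_READ)   ? 'r' : '-';
    out[1] = (flags & VM_WRITE)  ? 'w' : '-';
    out[2] = (flags & VM_EXEC)   ? 'x' : '-';
    out[3] = (flags & VM_SHARED) ? 's' : 'p';
    out[4] = '\0';
}

/* Format the first line of a VMA record in the sample output format. */
static int vma_format_header(char *buf, size_t len, int idx,
                             unsigned long start, unsigned long end)
{
    return snprintf(buf, len, "VMA%d: [0x%08lx - 0x%08lx] (size: %lu KB)",
                    idx, start, end, (end - start) >> 10);
}
```

For the first area in the sample, 0x00452000 - 0x00400000 = 0x52000 bytes, which is 328 KB, matching the expected output.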
2.4 Page access monitoring
Improve your kernel module to run a kernel thread in the background. The thread needs to periodically scan the access bit (A) of all the 4KB pages in the 1GB buffer and aggregate the access counts every 100ms. In your memalloc.c, keep reading all the pages in [0, 256MB] sequentially as fast as you can, and read all the pages in [256MB, 512MB] every 1ms. Run the entire process for 2 minutes; your kernel module should log the overall access frequency statistics for all the 4KB pages to the kernel log every 1s. Then plot a heatmap from the log, where the X-axis is time in seconds (1-120), the Y-axis is the VA range of the 1GB buffer, and different colors represent the heat (i.e., per-page access frequencies).
Notes: The “A-bit” of a PTE is used to discover whether a PTE was accessed during virtual address translation. If it was, the bit is set; otherwise, it is not. The CPU never clears this bit, so that burden falls on the OS (if it needs this bit at all).
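The scan described above boils down to a test-and-clear loop: if the A bit is set, count one access for that page and clear the bit so the next scan only sees new accesses. The sketch below simulates this in userspace on an array of mock PTE values, assuming the x86-64 layout where the accessed flag is bit 5; the names MOCK_PAGE_ACCESSED and scan_and_clear are hypothetical. In the kernel thread, the per-PTE step would instead use helpers such as pte_young() and ptep_test_and_clear_young() on the real page table entries.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: x86-64, where the accessed flag is bit 5 of a PTE. */
#define MOCK_PAGE_ACCESSED (UINT64_C(1) << 5)

/*
 * One scan pass over npages mock PTEs: for every entry with the A bit set,
 * bump that page's counter and clear the bit. Returns the number of pages
 * found accessed in this pass.
 */
static size_t scan_and_clear(uint64_t *ptes, uint32_t *counts, size_t npages)
{
    size_t accessed = 0;
    for (size_t i = 0; i < npages; i++) {
        if (ptes[i] & MOCK_PAGE_ACCESSED) {
            counts[i]++;
            ptes[i] &= ~MOCK_PAGE_ACCESSED;   /* the OS, not the CPU, clears A */
            accessed++;
        }
    }
    return accessed;
}
```

With a 100ms scan period, a page can contribute at most 10 counts per 1s log interval, which bounds the color scale of the heatmap.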
Submit ${YourPID}-lkp25-p3.tar.gz on Canvas.
- Source code: part1/*.[ch], part2/*.[ch], Makefile
- Figures properly named under the corresponding part