MemoryLayouts.jl
Optimize your memory layout for maximum cache efficiency.
Documentation for MemoryLayouts.
The Problem vs. The Solution
Standard collections in Julia (Dicts, Arrays of Arrays, structs) often scatter data across memory, causing frequent cache misses. MemoryLayouts.jl packs this data into contiguous blocks.
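To make the idea of packing concrete, here is a minimal sketch that uses only Base Julia. It is not how MemoryLayouts.jl is implemented; `packviews` is a hypothetical helper that copies every inner vector into one flat buffer and hands back views into that buffer, so that consecutive elements of the outer collection sit next to each other in memory.

```julia
# Hypothetical helper (not part of MemoryLayouts): pack a vector of vectors
# into one contiguous buffer and return views into that buffer.
function packviews( arrays )
    flat = reduce( vcat, arrays )                  # one contiguous copy of all elements
    stops = cumsum( length.( arrays ) )            # last index of each inner vector in flat
    starts = stops .- length.( arrays ) .+ 1       # first index of each inner vector in flat
    return [ view( flat, starts[i]:stops[i] ) for i in eachindex( arrays ) ]
end

data = [ rand( 3 ), rand( 5 ), rand( 2 ) ]
packed = packviews( data )

# The second view starts exactly where the first one ends.
gapfree = pointer( packed[2] ) == pointer( packed[1] ) + length( packed[1] ) * sizeof( Float64 )
```

The views hold the same values as the originals, but a loop over `packed` walks through a single block of memory instead of jumping between separately allocated arrays.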
How it works
| Function | Description | Analogy |
|---|---|---|
| layout( x ) | Aligns immediate fields of x | Like copy( x ) but packed |
| deeplayout( x ) | Recursively aligns nested structures | Like deepcopy( x ) but packed |
| layout!( x ) | In-place alignment (e.g. for Dicts) | Like layout( x ) but in-place |
| withlayout( f ) | Runs f with a scoped layout handle | Automatic memory management |
| layoutstats( x ) | Dry run statistics for layout( x ) | |
| deeplayoutstats( x ) | Dry run statistics for deeplayout( x ) | |
| visualizelayout( x ) | Visualizes memory layout using terminal graphics | |
| deepvisualizelayout( x ) | Recursively visualizes memory layout | |
Usage
The package exports layout, deeplayout, layout!, withlayout, layoutstats, deeplayoutstats, visualizelayout and deepvisualizelayout. layout applies only to top-level objects, whereas deeplayout applies to objects at all levels; the two examples below demonstrate their use. The stats functions perform a dry run and report statistics on the contiguity improvement a user can expect. The visualize functions draw the memory layout in the terminal.
Example for layout
The example below demonstrates how to use layout.
using MemoryLayouts, BenchmarkTools, StyledStrings

function original( A = 10_000, L = 100, S = 5000 )
    x = Vector{Vector{Float64}}( undef, A )
    s = Vector{Vector{Float64}}( undef, A )
    for i ∈ 1:A
        x[i] = rand( L )
        s[i] = rand( S )   # filler allocations that push the x[i] apart in memory
    end
    return x
end

function computeme( X )
    Σ = 0.0
    for x ∈ X
        Σ += x[5]
    end
    return Σ
end

print( styled"{(fg=0xff9999):original}: " ); @btime computeme( X ) setup=( X = original(); );
print( styled"{(fg=0x99ff99):layout}: " ); @btime computeme( X ) setup=( X = layout( original() ); );
Example for deeplayout
The example below illustrates the use of deeplayout.
using MemoryLayouts, BenchmarkTools, StyledStrings

struct Triple{X,Y,Z}
    x :: X
    y :: Y
    z :: Z
end

function original( A = 10_000, L = 100, S = 5000 )
    x = Vector{Vector{Float64}}( undef, A )
    s = Vector{Vector{Float64}}( undef, A )
    for i ∈ 1:A
        x[i] = rand( L )
        s[i] = rand( S )   # filler allocations that push the x[i] apart in memory
    end
    # split the arrays into three roughly equal groups, one per field
    return Triple(
        [ x[i] for i ∈ 1:div( A, 3 ) ],
        [ x[i] for i ∈ div( A, 3 )+1:div( 2*A, 3 ) ],
        [ x[i] for i ∈ div( 2*A, 3 )+1:A ] )
end

function computeme( X )
    Σ = 0.0
    for x ∈ X.x
        Σ += x[5]
    end
    for y ∈ X.y
        Σ += y[37]
    end
    for z ∈ X.z
        Σ += z[5]
    end
    return Σ
end

println( layoutstats( original() ) )
println( visualizelayout( original() ) )
println( deeplayoutstats( original() ) )
println( deepvisualizelayout( original() ) )

print( styled"{(fg=0xff9999):original}: " ); @btime computeme( X ) setup=( X = original(); );
print( styled"{(fg=0x99ff99):layout}: " ); @btime computeme( X ) setup=( X = layout( original() ); );
print( styled"{(fg=0x9999ff):deeplayout}: " ); @btime computeme( X ) setup=( X = deeplayout( original() ); );
Dry Run / Statistics
You can inspect the potential improvements in memory contiguity without performing the actual allocation using layoutstats and deeplayoutstats or visually by using visualizelayout and deepvisualizelayout.
using MemoryLayouts
data = [ rand( 10 ) for _ in 1:5 ];
layoutstats( data )
visualizelayout( data )

Memory Layout Visualization
Span: 640 b
Min : 0x7f918c0a3580 (leftmost point)
Max : 0x7f918c0a3800 (rightmost point)
Scale: 8 b / char
[80-character bar drawn with terminal block graphics]

The output indicates:
- packed: The total size (in bytes) of the data if packed.
- blocks: The number of individual arrays identified.
- span: The current distance between the minimum and maximum memory addresses of the data.
- reduction: The potential reduction in memory span.
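As a rough illustration of how these four numbers relate, the sketch below computes comparable quantities with Base Julia only. `spanstats` is a hypothetical helper, not the package's layoutstats, and it assumes a flat collection of non-overlapping isbits arrays; the package may count or measure things differently.

```julia
# Hypothetical helper (not part of MemoryLayouts): compute span statistics
# for a collection of arrays from their addresses and sizes.
function spanstats( arrays )
    nonempty = filter( !isempty, arrays )
    packed = sum( sizeof, nonempty; init = 0 )                        # bytes if stored back to back
    starts = [ UInt( pointer( a ) ) for a in nonempty ]               # first byte of each array
    stops = [ UInt( pointer( a ) ) + sizeof( a ) for a in nonempty ]  # one past the last byte
    span = isempty( nonempty ) ? 0 : Int( maximum( stops ) - minimum( starts ) )
    return ( packed = packed, blocks = length( nonempty ), span = span, reduction = span - packed )
end

data = [ rand( 10 ) for _ in 1:5 ]
stats = spanstats( data )
```

Here `stats.packed` is 400 (five arrays of ten 8-byte floats), while `stats.span` depends on where the allocator happened to place the arrays; the difference between the two is the reduction.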
Compatibility
MemoryLayouts.jl is compatible with a set of other packages through weak dependencies; this assumes that those packages are loaded by the user.
SIMD Alignment
Both layout and deeplayout accept an optional alignment keyword argument (default 1). This allows you to specify the byte alignment for the start of each array in the contiguous memory block.
Proper memory alignment matters for getting the most out of SIMD (Single Instruction, Multiple Data) instructions (e.g., AVX2, AVX-512). On the other hand, alignment costs contiguity: with alignment = 64, every array whose size is not a multiple of 64 bytes is followed by a gap.
- AVX2 typically wants 32-byte alignment (its registers are 32 bytes wide).
- AVX-512 typically wants 64-byte alignment (its registers are 64 bytes wide).
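The trade-off between tight packing and alignment comes down to simple offset arithmetic. The sketch below is an assumption about how such offsets could be computed, not the package's code; `alignup` and `packedoffsets` are hypothetical names, and the block's base address is assumed to be aligned already.

```julia
# Round offset up to the next multiple of alignment.
alignup( offset, alignment ) = cld( offset, alignment ) * alignment

# Hypothetical helper: byte offset of each array inside one block, plus the block's total size.
function packedoffsets( sizes, alignment )
    offsets = Int[]
    cursor = 0
    for s in sizes
        cursor = alignup( cursor, alignment )   # skip padding so the array starts aligned
        push!( offsets, cursor )
        cursor += s
    end
    return offsets, cursor
end

sizes = [ 800, 100, 24 ]                        # array sizes in bytes
tight, tighttotal = packedoffsets( sizes, 1 )   # offsets [0, 800, 900], total 924
wide, widetotal = packedoffsets( sizes, 64 )    # offsets [0, 832, 960], total 984
```

With alignment 64 the same three arrays need 60 more bytes, all of it padding after arrays whose size is not a multiple of 64.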
Example
using MemoryLayouts

struct MyData
    a::Vector{Float64}
    b::Vector{Float64}
end

data = MyData( rand( 100 ), rand( 100 ) )
aligneddata = layout( data; alignment = 64 )
pointer( aligneddata.a ) # will be a multiple of 64
pointer( aligneddata.b ) # will be a multiple of 64

Scoped Layout Handles
Use withlayout to automatically manage backing memory. All calls to layout, layout!, and deeplayout inside the block (that do not pass an explicit handle) use a temporary LayoutHandle that is released automatically when the block exits (or throws):
result = withlayout() do
    x = deeplayout( a )   # a and b stand for existing collections
    y = deeplayout( b )
    compute( x, y )       # compute stands for any function of the packed data
end

This is the preferred way to manage memory: no explicit release! call is needed, and there is no risk of forgetting to free the backing memory.
Arrays created inside the scope are invalidated when the scope exits. Do not let them escape the block.
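The scoping behaviour described above follows a common Julia pattern: a function takes a do-block, sets up a resource, and releases it in a finally clause. The sketch below shows that pattern with made-up names (`DemoHandle`, `withdemohandle`); it is not the package's LayoutHandle or withlayout, and unlike withlayout it passes the handle to the block explicitly to keep the example short.

```julia
# Hypothetical stand-in for a handle that owns backing memory.
mutable struct DemoHandle
    released :: Bool
end

# Run f with a fresh handle and release the handle when f returns or throws.
function withdemohandle( f )
    handle = DemoHandle( false )
    try
        return f( handle )
    finally
        handle.released = true   # runs on normal exit and on exceptions
    end
end

escaped = Ref{Any}( nothing )
result = withdemohandle() do handle
    escaped[] = handle           # deliberately leaked so it can be inspected afterwards
    sum( 1:10 )                  # a plain value is safe to return from the block
end
```

After the block, `result` is an ordinary number and remains valid, whereas anything tied to the handle (here `escaped[]`) has already been released. This is why arrays produced inside a withlayout block should not escape it.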
Performance Mode (Live Dangerously)
By default, MemoryLayouts performs checks to ensure robustness:
- Cycle Detection: Prevents StackOverflowError if your data structure has cycles (e.g. A -> B -> A).
- Aliasing Warnings: Warns if multiple fields point to the same array (which MemoryLayouts will duplicate, breaking the shared reference).
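To see what aliasing means in practice, the following Base-only sketch counts arrays that are referenced more than once and shows that copying each field separately breaks the sharing. `aliasedcount` is a hypothetical helper for illustration; how MemoryLayouts detects aliasing internally is not shown here.

```julia
shared = rand( 4 )
fields = ( a = shared, b = shared, c = rand( 4 ) )   # a and b are the same array

# Count arrays that appear more than once, by identity rather than by value.
function aliasedcount( arrays )
    seen = IdDict{Any,Int}()
    for a in arrays
        seen[a] = get( seen, a, 0 ) + 1
    end
    return count( >( 1 ), values( seen ) )
end

naliased = aliasedcount( values( fields ) )

# Copying every field on its own yields equal values but distinct arrays.
copies = map( copy, fields )
stillshared = copies.a === copies.b
```

Before copying, a write through `fields.a` is visible through `fields.b`; after copying, the two fields are independent, which is the behaviour the aliasing warning is there to flag.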
If you are confident your data is acyclic and you don't care about shared references (or know you don't have them), you can disable these checks for a small performance boost by setting livedangerously = true.
# Faster, but crashes on cycles!
fast_result = deeplayout( huge_tree; livedangerously = true )

Things to be mindful of
- it operates on various types of collections including structs, arrays, and dicts
  - operating on means that these collections are traversed, possibly recursively
- the only objects that are copied into contiguous memory are isbits arrays (think arrays of numbers, InlineStrings (but not regular strings), etcetera)
- the more scattered the memory is before the layout change, the greater the potential speed gain
- layout copies, but only the top level; see example 2 above
- deeplayout copies all levels
- no attempt is made to make empty arrays contiguous
- no attempt is made to make objects that are not one of the covered collections contiguous
- the package assigns one memory block and within that block uses unsafe_wrap to obtain Julia arrays
  - this can have 'interesting' consequences if misused
  - ergo, this package should not be used by those new to programming
- objects can be excluded from layout changes via the exclude keyword
- there is overhead in laying out memory initially and (to a much lesser extent) in running the finalizer
  - thus, MemoryLayouts works best for aligning memory in a collection once and then using it for an extended stretch
- resizing one of the arrays whose memory was laid out by MemoryLayouts is safe, but likely results in that array being moved to another location in memory assigned by Julia (not by MemoryLayouts)
- reassigning an array assigned by MemoryLayouts to another location, e.g. by writing y[i] = ..., does not release the entire memory block
  - the entire memory block is only released if the entire collection loses scope
- by default, MemoryLayouts packs the isbits arrays as tightly as it can
  - this may not be optimal, e.g. for AVX-512 computations
  - use the alignment=64 option to give up some contiguity and regain the alignment desired for optimal AVX-512 performance
- the code has a number of safety checks and features:
  - it throws an error on detecting cyclic content (a depends on b depends on a)
  - it warns for aliasing
  - alignment used is the maximum of user-specified alignment and machine-required alignment for the type
- the code makes an attempt to skip types that are not suitable for aligning, but it may not always succeed; use exclude to exclude such fields
- also exclude fields with low-level objects like pointers
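The remark about one memory block and unsafe_wrap can be illustrated with Base Julia alone. The sketch below is an assumption about the general technique, not the package's code: `wrapdemo` is a hypothetical function that carves two arrays out of a single backing vector.

```julia
# Hypothetical demonstration: two Julia arrays that live inside one backing buffer.
function wrapdemo()
    backing = Vector{Float64}( undef, 8 )
    total = 0.0
    adjacent = false
    GC.@preserve backing begin
        a = unsafe_wrap( Array, pointer( backing, 1 ), 4 )   # elements 1 to 4 of backing
        b = unsafe_wrap( Array, pointer( backing, 5 ), 4 )   # elements 5 to 8 of backing
        a .= 1.0
        b .= 2.0
        total = sum( backing )                               # writes through a and b land in backing
        adjacent = pointer( b ) == pointer( a ) + sizeof( a )
    end
    return ( total = total, adjacent = adjacent )
end

demo = wrapdemo()
```

The wrapped arrays do not own their memory: they are only valid while the backing buffer is alive, which is the kind of 'interesting' consequence the list above warns about.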
Suppressing the Banner
You can suppress the startup banner by setting the environment variable MEMORYLAYOUTS to "false" or "no".
export MEMORYLAYOUTS="false"

Function documentation
MemoryLayouts.layout - Function
layout(s; exclude = Symbol[], alignment :: Int = 1, livedangerously :: Bool = false)
layout aligns the memory of arrays within the object s, whose type should be one of struct, AbstractArray, or AbstractDict.
layout creates a new instance of s (or copy of s) where the arrays are stored contiguously in memory.
The alignment keyword argument specifies the memory alignment in bytes. This is particularly useful for SIMD operations, where aligning data to 16, 32, or 64 bytes can improve performance.
The livedangerously keyword argument (default false) disables safety checks for:
- Cyclic dependencies (prevents StackOverflow)
- Shared references / aliasing (prevents silent duplication)
Enable this only if you are certain your data is acyclic and you accept duplication of shared arrays.
Excluded items are preserved as-is (or deep-copied in some contexts) but not packed into the contiguous memory block.
Users should be mindful of the following important implementation details:
- aligned arrays share a single contiguous memory block
- resizing any of the arrays (push!, append!) will break this contiguity for that array (it will be reallocated elsewhere)
- contiguity is maintained until an array is resized or reassigned
- please read the documentation
MemoryLayouts.deeplayout - Function
deeplayout( x; exclude = Symbol[], alignment :: Int = 1, livedangerously :: Bool = false )
deeplayout recursively aligns the memory of arrays within x and its fields.
Unlike layout, which only aligns the immediate fields/elements of x, deeplayout traverses the structure recursively. In other words, deeplayout is to layout what deepcopy is to copy.
The alignment keyword argument specifies the memory alignment in bytes. This is particularly useful for SIMD operations, where aligning data to 16, 32, or 64 bytes can improve performance.
The livedangerously keyword argument (default false) disables safety checks for:
- cyclic dependencies (prevents StackOverflow)
- shared references / aliasing (prevents silent duplication)
Enable this only if you are certain your data is acyclic and you accept duplication of shared arrays.
Excluded items are preserved as-is (or deep-copied in some contexts) but not packed into the contiguous memory block.
Users should be mindful of the following important implementation details:
- aligned arrays share a single contiguous memory block
- resizing any of the arrays (push!, append!) will break this contiguity for that array (it will be reallocated elsewhere)
- contiguity is maintained until an array is resized or reassigned
- please read the documentation
MemoryLayouts.layout! - Function
layout!( s :: AbstractDict; exclude = Symbol[], alignment :: Int = 1, livedangerously :: Bool = false )
In-place version of layout for AbstractDict.
layout! modifies s such that its values are stored contiguously in memory.
The alignment keyword argument specifies the memory alignment in bytes. This is particularly useful for SIMD operations, where aligning data to 16, 32, or 64 bytes can improve performance.
The livedangerously keyword argument (default false) disables safety checks for:
- Cyclic dependencies (prevents StackOverflow)
- Shared references / aliasing (prevents silent duplication)
Enable this only if you are certain your data is acyclic and you accept duplication of shared arrays.
Excluded items are preserved as-is (or deep-copied in some contexts) but not packed into the contiguous memory block.
Users should be mindful of the following important implementation details:
- aligned arrays share a single contiguous memory block
- resizing any of the arrays (push!, append!) will break this contiguity for that array (it will be reallocated elsewhere)
- contiguity is maintained until an array is resized or reassigned
- please read the documentation
MemoryLayouts.withlayout - Function
withlayout( f :: Function )
Run f in a scope with a fresh LayoutHandle. All calls to layout, layout!, and deeplayout inside f (that do not pass an explicit handle) will use this handle. The backing memory is released automatically when f returns (or throws).
Example
result = withlayout() do
    x = deeplayout( a )
    y = deeplayout( b )
    compute( x, y )
end

MemoryLayouts.layoutstats - Function
layoutstats( s; exclude = Symbol[], alignment :: Int = 1 )
Returns a LayoutStats object containing statistics about the memory layout that would result if layout( s ) were called.
The returned object includes:
- bytes: Total size (in bytes) of the data that would be packed.
- blocks: Number of individual arrays identified.
- span: The distance between the minimum and maximum memory addresses of the data.
- reduction: The potential reduction in memory span (span - bytes).
MemoryLayouts.deeplayoutstats - Function
deeplayoutstats( x; exclude = Symbol[], alignment :: Int = 1 )
Returns a LayoutStats object containing statistics about the memory layout that would result if deeplayout( x ) were called.
The returned object includes:
- bytes: Total size (in bytes) of the data that would be packed.
- blocks: Number of individual arrays identified.
- span: The distance between the minimum and maximum memory addresses of the data.
- reduction: The potential reduction in memory span (span - bytes).
MemoryLayouts.visualizelayout - Function
visualizelayout( x; exclude = Symbol[], width = 80 )
Like layoutstats, but also provides a graphical representation of the current memory distribution of x.
MemoryLayouts.deepvisualizelayout - Function
deepvisualizelayout( x; exclude = Symbol[], width = 80 )
Like deeplayoutstats, but also provides a graphical representation of the current memory distribution of x.