MemoryLayouts.jl 🧠⚡

Optimize your memory layout for maximum cache efficiency.

Documentation for MemoryLayouts.

🚀 The Problem vs. The Solution

Standard collections in Julia (Dicts, Arrays of Arrays, structs) often scatter data across memory, causing frequent cache misses. MemoryLayouts.jl packs this data into contiguous blocks.

🔮 How it works

Function                    Description                                        Analogy
layout( x )                 Aligns immediate fields of x                       Like copy( x ) but packed
deeplayout( x )             Recursively aligns nested structures               Like deepcopy( x ) but packed
layout!( x )                In-place alignment (e.g. for Dicts)                Like layout( x ) but in-place
withlayout( f )             Runs f with a scoped layout handle                 Automatic memory management
layoutstats( x )            Dry run statistics for layout( x )
deeplayoutstats( x )        Dry run statistics for deeplayout( x )
visualizelayout( x )        Visualizes memory layout using terminal graphics
deepvisualizelayout( x )    Recursively visualizes memory layout

🛠️ Usage

The package exports layout, deeplayout, layout!, withlayout, layoutstats, deeplayoutstats, visualizelayout, and deepvisualizelayout. layout applies only to the top level of an object, whereas deeplayout applies to every level; the two examples below demonstrate their use. The stats functions perform a dry run and report how much improvement in contiguity to expect. The visualize functions draw the current memory layout in the terminal.
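The copy/deepcopy analogy can be made concrete with plain Base Julia. The sketch below does not call MemoryLayouts at all; it only illustrates what "top level" versus "all levels" means for a nested collection, and every variable name in it is made up for the illustration.

```julia
# A vector whose elements are themselves vectors of vectors.
nested = [ [ [ 1.0, 2.0 ] ], [ [ 3.0 ] ] ]

shallow = copy( nested )      # new outer vector, same inner objects
deep    = deepcopy( nested )  # new objects at every level

# The shallow copy shares its elements with the original ...
sharedtop = shallow[1] === nested[1]
# ... while the deep copy shares nothing, at any depth.
shareddeep = deep[1] === nested[1] || deep[1][1] === nested[1][1]
```

By analogy, layout stops at the top level of x, while deeplayout keeps following nested containers.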

💡 Example for layout

The example below demonstrates how to use layout.

using MemoryLayouts, BenchmarkTools, StyledStrings

# Build A short vectors whose storage is scattered across the heap.
function original( A = 10_000, L = 100, S = 5000 )
    x = Vector{Vector{Float64}}( undef, A )
    s = Vector{Vector{Float64}}( undef, A )
    for i ∈ 1:A
        x[i] = rand( L )
        s[i] = rand( S )   # s is discarded; its allocations sit between the x[i] and spread them out
    end
    return x
end

# Touch one element of every inner vector, so each access lands in a different array.
function computeme( X )
    Σ = 0.0
    for x ∈ X
        Σ += x[5]
    end
    return Σ
end

print( styled"{(fg=0xff9999):original}: " ); @btime computeme( X ) setup=( X = original(); );
print( styled"{(fg=0x99ff99):layout}: " ); @btime computeme( X ) setup=( X = layout( original() ); );

💡 Example for deeplayout

The example below illustrates the use of deeplayout.

using MemoryLayouts, BenchmarkTools, StyledStrings

# Three groups of vectors held in one struct; the arrays sit one level below the fields.
struct 𝒮{X,Y,Z}
    x :: X
    y :: Y
    z :: Z
end

function original( A = 10_000, L = 100, S = 5000 )
    x = Vector{Vector{Float64}}( undef, A )
    s = Vector{Vector{Float64}}( undef, A )
    for i ∈ 1:A
        x[i] = rand( L )
        s[i] = rand( S )   # s is discarded; its allocations sit between the x[i] and spread them out
    end
    # Split the A vectors into three groups of roughly equal size.
    return 𝒮( [ x[i] for i ∈ 1:div( A, 3 ) ], [ x[i] for i ∈ div( A, 3 )+1:div( 2*A, 3 ) ], [ x[i] for i ∈ div( 2*A, 3 )+1:A ] )
end

# Touch one element of every inner vector in each of the three fields.
function computeme( X )
    Σ = 0.0
    for x ∈ X.x
        Σ += x[5]
    end
    for y ∈ X.y
        Σ += y[37]
    end
    for z ∈ X.z
        Σ += z[5]
    end
    return Σ
end

println( layoutstats( original() ) )
println( visualizelayout( original() ) )

println( deeplayoutstats( original() ) )
println( deepvisualizelayout( original() ) )

print( styled"{(fg=0xff9999):original}: " ); @btime computeme( X ) setup=( X = original(); );
print( styled"{(fg=0x99ff99):layout}: " ); @btime computeme( X ) setup=( X = layout( original() ); );
print( styled"{(fg=0x9999ff):deeplayout}: " ); @btime computeme( X ) setup=( X = deeplayout( original() ); );

📊 Dry Run / Statistics

You can inspect the potential improvement in memory contiguity without performing the actual allocation, either numerically with layoutstats and deeplayoutstats or visually with visualizelayout and deepvisualizelayout.

using MemoryLayouts

data = [ rand( 10 ) for _ in 1:5 ];

layoutstats( data )

visualizelayout( data )
Memory Layout Visualization
  Span: 640 b
  Min : 0x7f918c0a3580 (leftmost point)
  Max : 0x7f918c0a3800 (rightmost point)
  Scale: 8 b / char
██████████░░░░██████████░░░░██████████░░░░░░░░░░░░░░░░░░██████████░░░░██████████

The output indicates:

  • packed: The total size (in bytes) of the data if packed.
  • blocks: The number of individual arrays identified.
  • span: The current distance between the minimum and maximum memory addresses of the data.
  • reduction: The potential reduction in memory span.
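To make these four quantities concrete, here is a minimal Base-only sketch of how they could be computed for a flat vector of isbits vectors. The helper spanstats is hypothetical and is not how MemoryLayouts necessarily computes its statistics; it only mirrors the definitions listed above.

```julia
# Hypothetical helper, not part of MemoryLayouts: measure how spread out
# a collection of isbits vectors currently is in memory.
function spanstats( vs )
    lo = minimum( UInt( pointer( v ) ) for v in vs )                # lowest start address
    hi = maximum( UInt( pointer( v ) ) + sizeof( v ) for v in vs )  # highest end address
    packed = sum( sizeof, vs )                                      # payload bytes if packed
    span = Int( hi - lo )                                           # current address range
    return ( packed = packed, blocks = length( vs ), span = span, reduction = span - packed )
end

data = [ rand( 10 ) for _ in 1:5 ]
stats = spanstats( data )
```

For five vectors of ten Float64 values, packed is 5 × 80 = 400 bytes regardless of where the allocator put them; span and reduction depend on the actual addresses and vary from run to run.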

🤝 Compatibility

⚡ SIMD Alignment

Both layout and deeplayout accept an optional alignment keyword argument (default 1). This allows you to specify the byte alignment for the start of each array in the contiguous memory block.

Proper memory alignment matters for getting the most out of SIMD (Single Instruction, Multiple Data) instructions such as AVX2 and AVX-512. The trade-off is padding: with alignment = 64, for example, any array whose length in bytes is not a multiple of 64 is followed by a gap before the next array starts.

  • AVX2 works on 32-byte vectors, so 32-byte alignment is the usual choice.
  • AVX-512 works on 64-byte vectors, so 64-byte alignment is the usual choice.
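The padding cost of alignment is simple arithmetic. The sketch below uses two hypothetical helpers, alignup and packedoffsets, to compute where each array would start inside a single block; it assumes the block itself starts on an aligned address and is not taken from the package's source.

```julia
# Round `offset` up to the next multiple of `alignment` (both in bytes).
alignup( offset :: Integer, alignment :: Integer ) = cld( offset, alignment ) * alignment

# Start offsets of arrays with the given byte sizes when placed one after another,
# with every start rounded up to `alignment`. Returns the offsets and the total length.
function packedoffsets( sizes, alignment )
    offsets = Int[]
    cursor = 0
    for s in sizes
        cursor = alignup( cursor, alignment )
        push!( offsets, cursor )
        cursor += s
    end
    return offsets, cursor
end

sizes = [ 800, 800 ]   # two vectors of 100 Float64 values each
tight, tighttotal = packedoffsets( sizes, 1 )    # offsets [0, 800], 1600 bytes in total
wide, widetotal = packedoffsets( sizes, 64 )     # offsets [0, 832], 1632 bytes in total
```

Since 800 is not a multiple of 64, the second array moves from offset 800 to offset 832, which leaves a 32-byte gap.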

💡 Example

using MemoryLayouts
struct MyData
    a::Vector{Float64}
    b::Vector{Float64}
end

data = MyData( rand( 100 ), rand( 100 ) )

aligneddata = layout( data; alignment = 64 )

pointer( aligneddata.a ) # Will be a multiple of 64
pointer( aligneddata.b ) # Will be a multiple of 64

🔒 Scoped Layout Handles

Use withlayout to automatically manage backing memory. All calls to layout, layout!, and deeplayout inside the block (that do not pass an explicit handle) use a temporary LayoutHandle that is released automatically when the block exits (or throws):

result = withlayout() do
    x = deeplayout( a )
    y = deeplayout( b )
    compute( x, y )
end

This is the preferred way to manage memory: no explicit release! call is needed, and there is no risk of forgetting to free the backing memory.

Warning

Arrays created inside the scope are invalidated when the scope exits. Do not let them escape the block.

๐Ÿˆ๏ธ Performance Mode (Live Dangerously)

By default, MemoryLayouts performs checks to ensure robustness:

  1. Cycle Detection: Prevents StackOverflowError if your data structure has cycles (e.g. A -> B -> A).
  2. Aliasing Warnings: Warns if multiple fields point to the same array (which MemoryLayouts will duplicate, breaking the shared reference).

If you are confident your data is acyclic and you don't care about shared references (or know you don't have them), you can disable these checks for a small performance boost by setting livedangerously = true.

# Faster, but crashes on cycles!
fast_result = deeplayout( huge_tree; livedangerously = true )
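To see why a recursive traversal needs the cycle check, and how a cycle differs from mere aliasing, consider this Base-only sketch. The helper hascycle is hypothetical and only follows Vectors; it is not the check MemoryLayouts implements.

```julia
# Hypothetical helper, not the package's code: does following references
# from `x` ever lead back to a vector that is still being traversed?
function hascycle( x, onpath = IdDict{Any,Bool}() )
    x isa Vector || return false               # only vectors are followed in this sketch
    isbitstype( eltype( x ) ) && return false  # e.g. Vector{Float64} holds no references
    haskey( onpath, x ) && return true         # reached an ancestor again: cycle
    onpath[x] = true
    for i in eachindex( x )
        isassigned( x, i ) || continue
        hascycle( x[i], onpath ) && return true
    end
    delete!( onpath, x )                       # done with this branch
    return false
end

a = Any[]
b = Any[ a ]
push!( a, b )                       # a -> b -> a

shared = [ 1.0, 2.0 ]
aliased = Any[ shared, shared ]     # the same array twice, but no cycle
```

Here hascycle( a ) is true and hascycle( aliased ) is false. A traversal without such a check would recurse on a until the stack overflows, whereas aliased is traversed without trouble but ends up with two separate copies of shared.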

⚠️ Things to be mindful of

Important details
  • it operates on various types of collections, including structs, arrays, and dicts
    • "operating on" means that these collections are traversed, possibly recursively
  • the only objects that are copied into contiguous memory are isbits arrays (think arrays of numbers, InlineStrings (but not regular strings), etc.)
  • the more scattered the memory is before the layout change, the greater the potential speed gain
  • layout copies, but only the top level; see the deeplayout example above
    • deeplayout copies all levels
  • no attempt is made to make empty arrays contiguous
  • no attempt is made to make objects that are not one of the covered collections contiguous
  • the package allocates one memory block and, within that block, uses unsafe_wrap to obtain Julia arrays
    • this can have 'interesting' consequences if misused
    • ergo, this package should not be used by those new to programming
  • objects can be excluded from layout changes via the exclude keyword
  • there is overhead in laying out memory initially and (to a much lesser extent) in running the finalizer
  • thus, MemoryLayouts works best when a collection is laid out once and then used for an extended stretch
  • resizing one of the arrays whose memory was laid out by MemoryLayouts is safe, but likely results in that array being moved to another location in memory assigned by Julia (not by MemoryLayouts)
  • replacing an array laid out by MemoryLayouts, e.g. by writing y[i] = ..., does not release the memory block
  • the memory block is released only once the entire collection goes out of scope
  • by default, MemoryLayouts packs the isbits arrays as tightly as it can
    • this may not be optimal, e.g. for AVX-512 computations
    • use the alignment = 64 option to give up some contiguity and regain the alignment desired for optimal AVX-512 performance
  • the code has a number of safety checks and features:
    • it throws an error on detecting cyclic content (a depends on b depends on a)
    • it warns about aliasing
    • the alignment used is the maximum of the user-specified alignment and the machine-required alignment for the type
  • the code attempts to skip types that are not suitable for aligning, but it may not always succeed; use exclude to exclude such fields
  • also exclude fields that hold low-level objects such as pointers
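The "one memory block plus unsafe_wrap" idea, and the reason misuse is dangerous, can be sketched in Base Julia. The function packvectors below is a hypothetical illustration using Libc.malloc; the package's actual allocation strategy, handle type, and finalizer are not shown here and may differ.

```julia
# Illustration only: copy Float64 vectors into one manually allocated block.
function packvectors( vs :: Vector{Vector{Float64}} )
    nbytes = sum( sizeof, vs; init = 0 )
    block = Ptr{Float64}( Libc.malloc( nbytes ) )
    packed = Vector{Vector{Float64}}( undef, length( vs ) )
    offset = 0                                   # byte offset into the block
    for ( i, v ) in enumerate( vs )
        # Julia does not own this memory (own = false is the default).
        w = unsafe_wrap( Array, block + offset, length( v ) )
        copyto!( w, v )
        packed[i] = w
        offset += sizeof( v )
    end
    return packed, block
end

scattered = [ rand( 10 ) for _ in 1:5 ]
packed, block = packvectors( scattered )

samevalues = packed == scattered
# Each array starts exactly where the previous one ends.
contiguous = all( pointer( packed[i+1] ) == pointer( packed[i] ) + sizeof( packed[i] ) for i in 1:4 )

total = sum( sum, packed )   # use the data while the block is still alive
Libc.free( block )           # from here on, every array in `packed` dangles
```

After the block is freed, the wrapped arrays still look like ordinary Julia arrays but point at released memory. The warning about arrays escaping a withlayout scope guards against that same kind of hazard.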

🔇 Suppressing the Banner

You can suppress the startup banner by setting the environment variable MEMORYLAYOUTS to "false" or "no".

export MEMORYLAYOUTS="false"

📖 Function documentation

MemoryLayouts.layout - Function
layout(s; exclude = Symbol[], alignment :: Int = 1, livedangerously :: Bool = false)

layout aligns the memory of arrays within the object s, whose type should be a struct, an AbstractArray, or an AbstractDict.

layout creates a new instance of s (or copy of s) where the arrays are stored contiguously in memory.

The alignment keyword argument specifies the memory alignment in bytes. This is particularly useful for SIMD operations, where aligning data to 16, 32, or 64 bytes can improve performance.

The livedangerously keyword argument (default false) disables safety checks for:

  • Cyclic dependencies (the check prevents a StackOverflowError)
  • Shared references / aliasing (the check prevents silent duplication)

Enable this only if you are certain your data is acyclic and you accept duplication of shared arrays.

Excluded items are preserved as-is (or deep-copied in some contexts) but not packed into the contiguous memory block.

important implementation details

Users should be mindful of the following important implementation details:

  • aligned arrays share a single contiguous memory block
  • resizing any of the arrays (push!, append!) will break this contiguity for that array (it will be reallocated elsewhere)
  • contiguity is maintained until an array is resized or reassigned
  • please read the documentation

MemoryLayouts.deeplayout - Function
deeplayout( x; exclude = Symbol[], alignment :: Int = 1, livedangerously :: Bool = false )

deeplayout recursively aligns the memory of arrays within x and its fields.

Unlike layout, which only aligns the immediate fields/elements of x, deeplayout traverses the structure recursively. In other words, deeplayout is to layout what deepcopy is to copy.

The alignment keyword argument specifies the memory alignment in bytes. This is particularly useful for SIMD operations, where aligning data to 16, 32, or 64 bytes can improve performance.

The livedangerously keyword argument (default false) disables safety checks for:

  • Cyclic dependencies (the check prevents a StackOverflowError)
  • Shared references / aliasing (the check prevents silent duplication)

Enable this only if you are certain your data is acyclic and you accept duplication of shared arrays.

Excluded items are preserved as-is (or deep-copied in some contexts) but not packed into the contiguous memory block.

important implementation details

Users should be mindful of the following important implementation details:

  • aligned arrays share a single contiguous memory block
  • resizing any of the arrays (push!, append!) will break this contiguity for that array (it will be reallocated elsewhere)
  • contiguity is maintained until an array is resized or reassigned
  • please read the documentation

MemoryLayouts.layout! - Function
layout!( s :: AbstractDict; exclude = Symbol[], alignment :: Int = 1, livedangerously :: Bool = false )

In-place version of layout for AbstractDict.

layout! modifies s such that its values are stored contiguously in memory.

The alignment keyword argument specifies the memory alignment in bytes. This is particularly useful for SIMD operations, where aligning data to 16, 32, or 64 bytes can improve performance.

The livedangerously keyword argument (default false) disables safety checks for:

  • Cyclic dependencies (the check prevents a StackOverflowError)
  • Shared references / aliasing (the check prevents silent duplication)

Enable this only if you are certain your data is acyclic and you accept duplication of shared arrays.

Excluded items are preserved as-is (or deep-copied in some contexts) but not packed into the contiguous memory block.

important implementation details

Users should be mindful of the following important implementation details:

  • aligned arrays share a single contiguous memory block
  • resizing any of the arrays (push!, append!) will break this contiguity for that array (it will be reallocated elsewhere)
  • contiguity is maintained until an array is resized or reassigned
  • please read the documentation

MemoryLayouts.withlayout - Function
withlayout( f :: Function )

Run f in a scope with a fresh LayoutHandle. All calls to layout, layout!, and deeplayout inside f (that do not pass an explicit handle) will use this handle. The backing memory is released automatically when f returns (or throws).

Example

result = withlayout() do
    x = deeplayout( a )
    y = deeplayout( b )
    compute( x, y )
end
Warning

Arrays created inside the scope are invalidated when the scope exits. Do not let them escape the block.

MemoryLayouts.layoutstats - Function
layoutstats( s; exclude = Symbol[], alignment :: Int = 1 )

Returns a LayoutStats object containing statistics about the memory layout if layout( s ) were called.

The returned object includes:

  • bytes: Total size (in bytes) of the data that would be packed.
  • blocks: Number of individual arrays identified.
  • span: The distance between the minimum and maximum memory addresses of the data.
  • reduction: The potential reduction in memory span (span - bytes).

MemoryLayouts.deeplayoutstats - Function
deeplayoutstats( x; exclude = Symbol[], alignment :: Int = 1 )

Returns a LayoutStats object containing statistics about the memory layout if deeplayout( x ) were called.

The returned object includes:

  • bytes: Total size (in bytes) of the data that would be packed.
  • blocks: Number of individual arrays identified.
  • span: The distance between the minimum and maximum memory addresses of the data.
  • reduction: The potential reduction in memory span (span - bytes).

MemoryLayouts.visualizelayout - Function
visualizelayout( x; exclude = Symbol[], width = 80 )

Like layoutstats, but also draws a graphical representation of the current memory distribution of x.

MemoryLayouts.deepvisualizelayout - Function
deepvisualizelayout( x; exclude = Symbol[], width = 80 )

Like deeplayoutstats, but also draws a graphical representation of the current memory distribution of x.
