A video is not a stack of images. It is an X-Y-T spacetime volume. Slitscan cuts surfaces through that volume — each spatial column of the output is sourced from a different moment in time, warping footage into shapes that no single frame can contain.
slitscan render input.mp4 output.mp4 --profile ramp --fill wrap
video → video with per-column time displacement
slitscan collapse input.mp4 photofinish.png --slit-position 0.5
video → single image, slit history accumulated over time
A normal video reads the X-Y plane at each time step T, showing every pixel from the same moment. Slit-scan breaks that synchrony. The engine maps each spatial column to a different T-slice, assembling a single output frame from many different moments.
The core operation reduces to one function evaluated once per output frame: output[x] = source[t − delay(x)][x], where delay(x) is the delay in frames assigned to column x.
The vanguard edge (delay = 0) tracks the present. The lagging edge (delay = max_delay) looks furthest into the past. As the clip plays, this temporal rake sweeps through the footage — reading across a single output frame from left to right is equivalent to reading backward through time.
    input frames:   ←──── time ────→
                    [t-N] ... [t-2] [t-1] [t]

    output frame at time t:
      col 0   (vanguard) ← from frame t        (delay=0)
      col 1              ← from frame t-1      (delay=1)
      col 2              ← from frame t-2      (delay=2)
      ...
      col W-1 (lagging)  ← from frame t-(W-1)  (delay=W-1)
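The per-column gather in the diagram above can be sketched with NumPy. This is a minimal illustration, not the project's engine: `render_frame`, the in-memory `(T, H, W, C)` frame array, and the black fill for out-of-range indices are all assumptions made for the example.

```python
import numpy as np

def render_frame(frames: np.ndarray, t: int, delays: np.ndarray) -> np.ndarray:
    """Assemble one output frame: column x is copied from frame t - delays[x].

    frames: (T, H, W, C) array of decoded frames (hypothetical in-memory buffer).
    delays: (W,) integer delay per column, in frames.
    Columns whose source index falls outside the clip are left black.
    """
    T, H, W, C = frames.shape
    src = t - delays                       # source frame index for each column
    valid = (src >= 0) & (src < T)         # columns that land inside the clip
    cols = np.arange(W)
    out = np.zeros((H, W, C), dtype=frames.dtype)
    # Pairing src with cols picks one column per source frame: shape (n_valid, H, C).
    picked = frames[src[valid], :, cols[valid]]
    out[:, valid] = picked.transpose(1, 0, 2)
    return out
```

With `delays = np.arange(W)` this reproduces the ramp in the diagram: column 0 reads frame `t`, column 1 reads `t-1`, and so on.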
The pipeline is fully layered — each stage is independently testable and can be swapped without touching the others.
    ┌──────────┐   ┌───────────┐   ┌───────────────┐   ┌──────────────────────────────┐   ┌──────────┐
    │ Decode   │──▶│ Normalize │──▶│ Frame Buffer  │──▶│ Engine                       │──▶│ Encode   │
    │ (PyAV)   │   │ resize    │   │ full · ring   │   │ delay_map · profile · mods   │   │ (PyAV)   │
    └──────────┘   └───────────┘   └───────────────┘   └──────────────────────────────┘   └──────────┘
A profile is a pure function delay_map(x_coords, output_t, params) → ndarray
that maps each band's position to a delay value. Three profiles ship with v1.
The vanguard position (0.0–1.0) controls which end of the frame leads in time.
| Profile | Delay surface shape | Character |
|---|---|---|
| ramp | Linearly increases from vanguard to opposite edge | Classic temporal rake. One edge leads, one lags. Reading across the frame = reading through time. |
| tent | Delay peaks at center, falls to zero at both edges | Temporal fold. Both edges are "now"; the center is furthest in the past. Creates symmetrical distortion. |
| reverse | Mirror of ramp (vanguard on opposite side) | Inverted rake. Right edge leads. Useful for comparing temporal direction or for paired motion studies. |
Left edge reads frame t, right edge reads t − 1279. The dancer's motion is sheared diagonally across time.
Both edges read the present; the center looks furthest back. Produces a bilateral temporal mirror.
Inverted ramp with a reduced spread of 150 frames — the right edge now leads. Delay range compressed for comparison.
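The three delay surfaces described above can be sketched as functions of a normalized band position `u` in [0, 1]. This is an illustration only: the simplified signatures, the rounding to whole frames, and the vanguard fixed at the left edge are assumptions, and the shipped `delay_map(x_coords, output_t, params)` may differ.

```python
import numpy as np

def ramp(u: np.ndarray, max_delay: int) -> np.ndarray:
    """Delay grows linearly from 0 at the left edge to max_delay at the right."""
    return np.rint(u * max_delay).astype(int)

def tent(u: np.ndarray, max_delay: int) -> np.ndarray:
    """Delay is 0 at both edges and peaks at max_delay in the center."""
    return np.rint((1.0 - np.abs(2.0 * u - 1.0)) * max_delay).astype(int)

def reverse(u: np.ndarray, max_delay: int) -> np.ndarray:
    """Mirror of ramp: the right edge leads, the left edge lags."""
    return ramp(1.0 - u, max_delay)

u = np.linspace(0.0, 1.0, 5)       # five bands, evenly spaced
print(ramp(u, 4).tolist())         # [0, 1, 2, 3, 4]
print(tent(u, 4).tolist())         # [0, 2, 4, 2, 0]
print(reverse(u, 4).tolist())      # [4, 3, 2, 1, 0]
```

For a 1280-pixel-wide frame, `ramp(np.linspace(0, 1, 1280), 1279)` yields delays 0 through 1279, matching the "left edge reads frame t, right edge reads t − 1279" case.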
The default axis is x — bands are vertical slices (columns),
delay varies left-to-right. Switching to --axis y makes bands
horizontal slices (rows), delay varying top-to-bottom. Same formula,
rotated 90°.
Vertical column bands. Horizontal motion shears the figure temporally across the frame width.
Horizontal row bands. Vertical motion — like the dancer's arms and legs — is sheared temporally top-to-bottom.
Tent profile on the y-axis. Top and bottom rows are "now"; the horizontal mid-band looks furthest back.
Same as above with --fill wrap, eliminating the black fill zones at clip boundaries for seamless looping.
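One way to read "same formula, rotated 90°" is that a row scan is a column scan on frames whose height and width axes have been swapped. The sketch below assumes that reading; `gather_columns` and `scan` are hypothetical names, and out-of-range indices are simply clamped to keep the example short.

```python
import numpy as np

def gather_columns(frames: np.ndarray, t: int, delays: np.ndarray) -> np.ndarray:
    """Column x of the result comes from frame t - delays[x] (indices clamped)."""
    T = frames.shape[0]
    src = np.clip(t - delays, 0, T - 1)
    cols = np.arange(frames.shape[2])
    # frames[src, :, cols] has the band axis first; move it back to position 1.
    return frames[src, :, cols].swapaxes(0, 1)

def scan(frames: np.ndarray, t: int, delays: np.ndarray, axis: str = "x") -> np.ndarray:
    """axis='x': one delay per column. axis='y': one delay per row."""
    if axis == "x":
        return gather_columns(frames, t, delays)
    # Swap H and W so rows become columns, reuse the same gather, swap back.
    swapped = frames.swapaxes(1, 2)
    return gather_columns(swapped, t, delays).swapaxes(0, 1)
```

For `axis="y"` the `delays` array needs one entry per row rather than one per column.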
--max-delay
--max-delay N sets how many frames separate the vanguard and lagging
edges. The default is extent − 1 (full width or height in pixels).
Smaller values compress the temporal window; larger values stretch it across
more of the clip's history.
With the 30fps source at 1280px wide, the default max-delay of 1279 frames spans ~42.6 seconds of footage in a single output frame. Reducing to 300 frames (10s) concentrates the distortion; the dancer's body remains more coherent while subtle temporal seams appear at phase transitions.
Tight temporal window. The figure is recognisable; distortion is subtle shearing.
Wide spread. The trailing edge is 30 seconds behind the leading edge within a single output frame.
Every pixel column references a unique frame. The full 992-frame clip is visible in a single output frame as a temporal panorama.
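The frame-to-seconds arithmetic used in this section is easy to check. The helper names below are hypothetical and exist only for this example.

```python
def default_max_delay(extent_px: int) -> int:
    """Default described above: one frame of delay per pixel, minus one."""
    return extent_px - 1

def delay_span_seconds(max_delay_frames: int, fps: float) -> float:
    """Time separating the vanguard and lagging edges within one output frame."""
    return max_delay_frames / fps

print(default_max_delay(1280))                    # 1279
print(round(delay_span_seconds(1279, 30), 1))     # 42.6
print(delay_span_seconds(300, 30))                # 10.0
```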
--slice-width
By default each band is 1 pixel wide, producing a smooth temporal gradient.
Increasing --slice-width groups pixels into wider bands that share
a single source frame, making the discrete temporal structure explicit.
Wide bands reveal the underlying mechanism: each block is a vertical strip
taken from a specific frame, placed side by side. At --slice-width 1
these strips are individual pixel columns and the seams disappear; at wider
values they become visible as a temporal mosaic.
Smooth temporal gradient — 1280 individual column sources, each a unique frame.
32 wide bands of 40px each. The seams between temporal blocks become legible. Each block shows the same column of pixels from its assigned frame.
Banded with fill=wrap. The clip loops seamlessly at band boundaries, allowing infinite playback as an installation.
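The banding can be sketched as integer division of the column index by the slice width, so that every pixel in a band shares one delay. The function name and the choice to spread band delays linearly from 0 to `max_delay` are assumptions for illustration.

```python
def band_delays(width: int, slice_width: int, max_delay: int) -> list[int]:
    """Per-column delays for a left-to-right ramp, quantized into bands."""
    n_bands = -(-width // slice_width)        # ceiling division
    delays = []
    for x in range(width):
        band = x // slice_width               # all pixels in a band share this index
        pos = band / (n_bands - 1) if n_bands > 1 else 0.0
        delays.append(round(pos * max_delay))
    return delays

banded = band_delays(1280, 40, 1279)
print(len(set(banded)))                       # 32 distinct source frames
```

At `slice_width=1` the same function returns one distinct delay per column, which is the smooth gradient case.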
When a band references a frame before index 0 or after the last frame,
the fill mode determines what appears. Five modes: black,
white, transparent, hold, and wrap.
Wrap is the key to seamless infinite loops.
| Mode | Out-of-range behavior | Use case |
|---|---|---|
| black | Solid black pixel | Default. Shows the clip boundaries explicitly as a fill zone that sweeps across the frame. |
| white | Solid white pixel | Same behavior as black, with white fill; suited to lighter-background compositions. |
| transparent | Alpha=0 pixel (RGBA output) | Compositing workflows. Requires .mov (ProRes 4444) or .png/.tiff sequence. |
| hold | Clamp to first/last frame | Freeze-frame at clip boundaries. Avoids black zones without looping. |
| wrap | frame_index % frame_count | Seamless infinite loop. Out-of-range indices wrap around, so the output plays forever without a seam. |
With --fill black, the lagging bands reference negative frames
at the start of the clip — a black fill zone sweeps across the output from
right to left. With --fill wrap, those same indices are mapped
to index % 992, connecting the end of the clip to its beginning.
The output video has no seam and can play indefinitely in an installation loop.
Note: the black band visible in the --fill black examples below
is also partly a property of the source footage — the clip opens and closes
on near-black frames. This makes the fill zone especially clean here and
usefully illustrates the temporal boundary sweeping across the frame.
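The fill modes in the table can be sketched as a small index resolver. `resolve_index` is a hypothetical helper: it returns the source frame to read, or `None` when the band should be painted with a solid fill instead.

```python
from typing import Optional

def resolve_index(index: int, frame_count: int, fill: str) -> Optional[int]:
    """Map a possibly out-of-range frame index to a usable one."""
    if 0 <= index < frame_count:
        return index                              # in range: no fill needed
    if fill == "wrap":
        return index % frame_count                # Python's % is non-negative here
    if fill == "hold":
        return min(max(index, 0), frame_count - 1)  # clamp to first/last frame
    if fill in ("black", "white", "transparent"):
        return None                               # caller paints a solid band
    raise ValueError(f"unknown fill mode: {fill}")

print(resolve_index(-1, 992, "wrap"))    # 991
print(resolve_index(992, 992, "wrap"))   # 0
print(resolve_index(-5, 992, "hold"))    # 0
```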
The lagging-edge fill zone sweeps left as the clip progresses and resets at the end.
Same parameters; every band is always populated. The clip loops without any visible seam.
Designed for continuous video art playback. The dancer's motion cycles through the temporal rake indefinitely — no hard cut, no fill zone, no visible loop point.
Any render parameter can be driven by an oscillator that varies as a function of output time. The modulation engine resolves new parameters before each frame, keeping the core formula unchanged.
Modulation patches are specified inline with --mod or
in a YAML file with --mod-file. Multiple patches can be
stacked on the same destination.
    # Oscillate the vanguard position with a 0.1 Hz sine, ±0.4 amplitude
    slitscan render 2020.mp4 out.mp4 \
      --profile tent \
      --mod "vanguard=sine:rate=0.1hz,depth=0.4"

    # Oscillate max_delay (temporal spread breathes in and out)
    slitscan render 2020.mp4 out.mp4 \
      --mod "max_delay=sine:rate=0.25hz,depth=400"

    # Both simultaneously: compound modulation
    slitscan render 2020.mp4 out.mp4 \
      --profile tent \
      --mod "vanguard=sine:rate=0.1hz,depth=0.4" \
      --mod "max_delay=sine:rate=0.05hz,depth=300"
The temporal window expands and contracts sinusoidally. At its widest, the rake spans deep time; at its narrowest, the image nearly collapses to a single moment.
Same modulation with wrap fill. The breathing temporal spread loops without interruption.
The tent profile's peak oscillates — the temporal fold shifts back and forth, swinging the center of symmetry across the frame.
Two independent oscillators — vanguard at 0.1 Hz and max_delay at 0.05 Hz — produce compound motion with a beat period of ~20s.
The modulated vanguard on a y-axis scan causes the temporal fold to ripple horizontally, tracing the dancer's vertical movements through time.
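A sine patch such as `vanguard=sine:rate=0.1hz,depth=0.4` can be sketched as a parser plus an oscillator evaluated at each output time. Both functions are hypothetical, and two points are assumptions rather than documented behavior: that `depth` is added symmetrically around a base value, and that the spec grammar is exactly `dest=shape:key=value,...`.

```python
import math

def parse_mod(spec: str) -> tuple[str, str, float, float]:
    """Split 'dest=shape:rate=<n>hz,depth=<n>' into its parts."""
    dest, rest = spec.split("=", 1)
    shape, _, args = rest.partition(":")
    params = dict(kv.split("=", 1) for kv in args.split(","))
    rate = params["rate"].lower()
    if rate.endswith("hz"):
        rate = rate[:-2]
    return dest, shape, float(rate), float(params["depth"])

def modulated(base: float, rate_hz: float, depth: float, t_seconds: float) -> float:
    """Parameter value at output time t: base plus a sine of amplitude depth."""
    return base + depth * math.sin(2.0 * math.pi * rate_hz * t_seconds)

dest, shape, rate_hz, depth = parse_mod("vanguard=sine:rate=0.1hz,depth=0.4")
fps = 30.0
# Value of the modulated parameter at output frame 75 (t = 2.5 s)
value = modulated(0.5, rate_hz, depth, 75 / fps)
```

Because the value is resolved before each frame, the gather itself only ever sees an ordinary parameter value.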
The collapse command accumulates a single slit's history
across the entire clip into one image. Each frame contributes one column
(or row) at the slit position, laid sequentially — time becomes the
horizontal axis, space the vertical.
This is the technique of photofinish cameras and 19th-century chronophotography: a fixed slit records motion as it passes, the film advances, and the result is a spatial-temporal composite in which stillness appears sharp and movement creates smeared streaks. Different slit positions reveal different aspects of the subject.
Each image below is 992 pixels wide (one column per frame) and 720 pixels tall. The slit position selects which vertical slice of the original frame is accumulated. Moving the slit across the dancer's body traces different planes of motion.
Background and edge of the parkway. Sparse motion; parked cars produce faint vertical striations.
The dancer's cyclical movement leaves a sinusoidal trace in the image. Each repetition of the dance appears as a periodic waveform.
Centered slit through the dancer's midline. The full 33-second performance visible as a temporal panorama, left to right.
Same slit position with time accumulated right-to-left. The performance reads in reverse — the end of the clip appears at the left edge.
Edge of frame; cars passing on the right occasionally break through as horizontal streaks.
Wider slit averages multiple columns per frame, softening edge detail and integrating spatial information into the temporal record.
Switching to --axis y accumulates horizontal rows instead of columns.
The output is 1280 pixels wide and 992 pixels tall (one row per frame).
The slit position now selects a height in the frame — feet, waist, chest, or head —
revealing how different parts of the body move independently over time.
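The accumulation for both axes can be sketched as a single slice through the frame stack. This is a minimal illustration that assumes the whole clip fits in one `(T, H, W, C)` array and that `slit_position` maps to a pixel index by simple scaling; `collapse` here is a hypothetical function, not the CLI's implementation.

```python
import numpy as np

def collapse(frames: np.ndarray, slit_position: float = 0.5, axis: str = "x") -> np.ndarray:
    """Stack one slit per frame into a single image."""
    T, H, W = frames.shape[:3]
    if axis == "x":
        x = min(int(slit_position * W), W - 1)
        strips = frames[:, :, x]              # (T, H, C): one column per frame
        return np.swapaxes(strips, 0, 1)      # (H, T, C): time runs left to right
    y = min(int(slit_position * H), H - 1)
    return frames[:, y]                       # (T, W, C): time runs top to bottom

clip = np.zeros((6, 4, 8, 3), dtype=np.uint8)     # tiny stand-in clip: 6 frames of 8x4
print(collapse(clip, 0.5, axis="x").shape)        # (4, 6, 3)
print(collapse(clip, 0.5, axis="y").shape)        # (6, 8, 3)
```

Applied to a 992-frame, 1280×720 clip, the same slicing gives a 992×720 image for `axis="x"` and a 1280×992 image for `axis="y"`.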
High-frequency stepping motion; feet leave rapid oscillating traces.
Lower-frequency lateral hip motion. The waveform period corresponds to the choreography's rhythm.
Composite torso motion — a blend of hip sway and arm movement at this height.
Arms and shoulders: wider lateral range, more complex trajectory than lower body.
Pass a .gif extension and the encoder switches to Pillow's
palette quantizer: 256 adaptive colors per frame, Floyd-Steinberg dithering,
infinite loop. No flags needed.
GIF encoding reduces each frame to an 8-bit palette independently.
The dithering introduces a characteristic grain that reads as motion artifact
and complements the temporal distortion.
Files are large at source resolution — use --resize and
--fps for web deployment.
    # Output format is inferred from the extension; no codec flag needed
    slitscan render 2020.mp4 out.gif --fill wrap
    slitscan render 2020.mp4 out.gif --fill wrap --resize 640x360 --fps 15
Classic ramp profile as an infinite GIF. 256-color dithering visible at full scale.
Modulated tent profile — the temporal fold shifts while the palette flickers at peaks of oscillation.
Y-axis scan as GIF. Horizontal banding from the row-slicer is amplified by palette quantization.
In standard slit-scan, each output column is gathered from its own column position in its source frame. The Trumbull mode changes the gather so that every output column is taken from a single fixed slit in each source frame — replicating the technique used for the 2001: A Space Odyssey Stargate corridor sequence.
The result: a single vertical slit of the source footage (one specific
column of the dancer, picked by --slit-source) is
gathered at 1280 different delays, one per output column, and laid across the full width
of the output frame. Rather than seeing a spatial panorama, you see a
temporal panorama of a single spatial column.
When combined with a ramp delay, the slit at position x=640 (center)
sampled from frames t, t-1, t-2, ... t-1279 is tiled across the
full output width, with each column showing a different moment. The
characteristic visual is streaks and halos radiating from motion paths through
the fixed slit — the figure's arms sweep ghostly contrails; static background
elements produce clean vertical bands.
    # Slit at frame center, gathering ~42s of history from that one column
    slitscan render 2020.mp4 out.mp4 \
      --slit-source 0.5 \
      --profile ramp \
      --fill wrap

    # Slit at left edge of dancer (approx.)
    slitscan render 2020.mp4 out.mp4 --slit-source 0.35 --profile ramp --fill wrap

    # Tent profile + fixed slit: bilateral temporal fold from one spatial point
    slitscan render 2020.mp4 out.mp4 --slit-source 0.5 --profile tent --fill wrap
The frame center slit (x = 640) is sampled from 1280 different moments and tiled horizontally. The dancer's limbs passing through the slit become horizontal streaks; the parkway background — stationary through the slit — produces clean repeating bands. Motion amplitude and timing become directly readable in the horizontal structure of the image.
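In formula form, the fixed-slit gather is output[x] = source[t − delay(x)][slit_x]: the delay still varies per output column x, but every column reads the same source column slit_x.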
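The fixed-slit gather can be sketched in a few lines of NumPy. `render_fixed_slit` is a hypothetical name, the frames are assumed to sit in one `(T, H, W, ...)` array, and wrap fill is hard-coded to keep the example short.

```python
import numpy as np

def render_fixed_slit(frames: np.ndarray, t: int, delays: np.ndarray, slit_x: int) -> np.ndarray:
    """Every output column x reads column slit_x of frame (t - delays[x]) mod T."""
    T = frames.shape[0]
    src = (t - delays) % T                 # wrap fill: indices stay inside the clip
    slit = frames[:, :, slit_x]            # (T, H, ...): the slit's full history
    picked = slit[src]                     # (W_out, H, ...): one slit per output column
    return np.swapaxes(picked, 0, 1)       # (H, W_out, ...)
```

Compared with the standard gather, the only change is that the source column index is the constant `slit_x` instead of the output column `x`.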