Aria-HDL: a single .ahdl source through ten backends (updated, WIP)

fpga-meta-compiler-public is an HDL and Rust compiler called Aria-HDL. One .ahdl source, ten backends: Verilog, VHDL, Verilator bundle, SymbiYosys/SVA, Lean 4 proof obligations, CUDA, Metal, OpenCL, a JSON resource report, and an IR dump. Marked WIP in the README and here, honestly.

The name overlap with the gpu-backtest Aria DSL is the |> pipe token and the let / var binding shape. They’re separate compilers with separate frontends; don’t read “same language.”

Backends

From src/main.rs CLI flags and the per-backend files in src/:

Backend Flag File
Verilog --emit-verilog verilog.rs (+ verilog_lint.rs)
VHDL --emit-vhdl vhdl.rs
Verilator bundle --emit-verilator verilator.rs, verilator_driver.rs
SVA/PSL + SymbiYosys --emit-sby formal.rs, formal_aware.rs
Lean 4 --emit-lean4 lean4.rs
CUDA --emit-cuda gpu.rs
Metal --emit-metal metal.rs
OpenCL --emit-opencl opencl.rs
Resource report (JSON) --resource-report resource.rs
IR dump --emit-ir ir/pretty.rs

DSL surface (verified in token.rs / parser.rs)

  • Hardware primitives: module, pipeline (with |> stage boundaries), systolic, stream<T>, fifo<depth=N>, approx().
  • Types: int<N>, uint<N>, fix<W,I>, fp32/fp16/bf16/fp8e4m3, bits<N>, struct, enum (tagged unions), mx<T,N> microscaling blocks.
  • Bindings: let (combinational) and var (register).
  • Inline formal: assert, assume, cover, always, never, eventually are keywords in the token table — not comments or external annotations.

Most of that compiles through lowering.rs into the IR under src/ir/ (expr, module, types, formal) before any backend sees it.

Passes

Under src/passes/:

  • retiming.rs — move registers across combinational logic to balance stages. The scaffolding for Leiserson-Saxe is in leiserson_saxe.rs but most pass calls go through the plainer retiming for now.
  • c_slow.rs — trade latency for throughput by stage insertion.
  • cdc.rs — clock-domain-crossing enforcement. This one actually returns a hard Err (see cdc.rs:102, “errors” vector pushed and surfaced) rather than a warning; unregistered crossings fail the compile.
  • balance.rs — equalise latency across pipeline paths.

Examples that currently parse and emit

examples/ contains: alu.ahdl, blinker.ahdl, constrained_mac.ahdl, cpu8051.ahdl, crypto.ahdl, dnn_inference.ahdl, dsl_showcase.ahdl, pcie_mac.ahdl, pipeline_demo.ahdl, resource_demo.ahdl, sigmoid_lut.ahdl, tcp_ip.ahdl, plus ecp5_85k.arsc (device resource file) and an ONNX-text MNIST classifier used by onnx_import.rs. So the use-case families — systolic MAC, networking offload, small CPU core, activation LUT, crypto unit, ONNX import — have at least one worked example each.

GPU semantic emulation, honestly

The CUDA / Metal / OpenCL emitters are faster-than-RTL functional emulation of the compiled IR. It’s not a cycle-accurate replacement for Verilator — it’s a “does the semantics still do what I meant” check before you start a multi-hour Yosys or Vivado run. Useful for design-space sweeps (e.g. varying a PE grid) where you’re burning iterations on the wrong thing.

What’s real vs thin vs aspirational — being honest

The repo is WIP and so are several of the features. Ranking roughly by maturity:

  • Solid: frontend (lexer, parser, typecheck, lowering), the IR under src/ir/, the CDC pass returning real Err, the example .ahdl files parsing and type-checking.
  • Working but not tooling-validated: the Verilog and VHDL emitters produce output, and the Verilator bundle wires up a build flow — but I have not driven an example all the way through Yosys / Vivado / NVC to a bitstream or a finished sim run, so “synthesizable” is a claim about shape, not a claim I’ve watched a toolchain accept.
  • Thin: formal.rs is a light SVA/PSL + SymbiYosys scaffold, not a fleshed-out proof backend. The Lean 4 emitter produces type and obligation skeletons; closed proofs are not shipped. The default retiming pass is the plain one; the Leiserson-Saxe implementation is in the tree but not on the default path.
  • Aspirational / skeletal: pcie_emu.rs, host_cosim.rs, and the ONNX import path (which covers a partial op subset via a hand-written protobuf reader and a companion textual format) are starting points, not production pipelines. Don’t read them as working end-to-end.

The resource-report JSON (resource.rs) is the one feature I’d describe as “substantial and probably accurate for the design shapes currently exercised” — LUT/register/DSP/BRAM estimates plus an approx-unit Pareto front. But I haven’t cross-checked its numbers against a real Vivado report_utilization output for the same module, so treat the figures as self-consistent rather than vendor-validated.

What I’d do next

  • Drive one example all the way: .ahdl → Verilog → Yosys → bitstream on an ECP5, with a Verilator co-sim parity test in CI. That’s the first honest “end-to-end” claim. Today the compiler emits all ten outputs but I haven’t pinned a full toolchain journey.
  • Promote a Lean 4 emitted obligation into a closed proof for one non-trivial module (e.g. the systolic MAC from constrained_mac.ahdl), to demonstrate the backend does more than scaffold types.
  • Replace the default retimer with Leiserson-Saxe properly and benchmark against the current pass on the DNN example.
  • Add a GPU-vs-Verilator parity test on sigmoid_lut.ahdl so the “semantic emulation” claim has teeth.

Code @ github.com/zeta1999/fpga-meta-compiler-public.

Written on April 12, 2026