Argon 0.1.0
Loading...
Searching...
No Matches
Argon

License Archs C++ Version GitHub Actions Workflow Status GitHub Release Argon

Argon is a library that provides a sane, minimal-overhead C++ interface to ARM's NEON SIMD intrinsics.

Why?

Argon grew out of a desire to use C++ abstractions for vector intrinsics, without sacrificing expressiveness or specificity. Most C++ vector libraries (highway, xsimd) abstract away all platform uniqueness in favor of common denominators. When considering the use case of those libraries, namely HPC or multi-platform portability, that's a strength. Unfortunately, when you're targeting a very specific subset of those architectures, or you have one architecture that needs to be optimized above all others, such abstractions fall short.

This is where Argon was created to exist: A NEON-first, others-second zero-overhead abstraction library with modern constructs and capabilities.

What Argon is:

  • A low-level intrinsics abstraction layer with sane naming conventions and overloaded functions
  • Higher-level encapsulated vector object types built on that layer

What Argon is not (yet?):

  • A library of optimized commonly used algorithms

API Documentation

Documentation on usage and internal details (generated by Doxygen) can be found here

Examples

Object-oriented Argon interface

#include <argon/argon.hpp>
#include <iostream>
Argon<float> triangle(Argon<float> phase) {
phase = argon::ternary(phase > 0.5f, 1.f, 0.f) - phase;
return Argon{-1.f}.MultiplyAdd(phase.Absolute(), 4.f);
}
int main() {
float frequency = 440.f;
float phase_increment = frequency / 44100.f;
Argon<float> phase = Argon{0}.MultiplyAdd(phase_increment, Argon<float>::Iota(0))
Argon<float> wave = triangle(phase);
// Print the result
for (size_t i = 0; i < result.size(); ++i) {
std::cout << "Result[" << i << "]: " << result[i] << std::endl;
}
}
Definition argon_full.hpp:24
static ace argon_type Iota(scalar_type start)
Definition vector.hpp:154
ace argon_type MultiplyAdd(argon_type b, argon_type c) const
Multiply two vectors and add a third vector.
Definition vector.hpp:441
ace argon_type Absolute() const
Get the absolute value of the vector.
Definition vector.hpp:557

Low-level NEON interface

#include <argon/arm_simd.hpp>
#include <iostream>
int main() {
// Example: Multiply-add two vectors using NEON
float32x4_t a = {1.0f, 2.0f, 3.0f, 4.0f};
float32x4_t b = {5.0f, 6.0f, 7.0f, 8.0f};
float32x4_t c = {9.0f, 10.0f, 11.0f, 12.0f};
// Perform vector multiply-add
float32x4_t result = neon::multiply_add(a, b, c);
// Print the result
for (size_t i = 0; i < result.size(); ++i) {
std::cout << "Result[" << i << "]: " << result[i] << std::endl;
}
return 0;
}

Key Differences between NEON and MVE

NEON MVE
Number of quad-word registers 16 8
Narrowing support (see note) to lower double-word register to even or odd scalar registers
Double-word instructions :heavy_check_mark: :x:
64-bit floating point :heavy_check_mark: :x:
Predicated lane instructions :x: :heavy_check_mark:
Native Scatter-Gather :x: :heavy_check_mark:

Narrowing support

  • NEON supports double-word and quad-word operations, MVE only has quad-word.
    • This means that operations that would result in a double-word (such as narrowing) instead go into either the top or bottom registers of the current quadword.
    • i.e. in NEON, an int32x4_t narrow to int16x4_t {x, x, x, x} => {x, x, x, x, o, o, o, o}
    • while in MVE they are placed in the bottom or top of a pair: {x, x, x, x} => {x, o, x, o, x, o, x, o}

Backend Support

Backend Architectures Status Notes
ARM NEON (ARMv7) VFPv3, VFPv3-FP16, VFPv4 :white_check_mark: Primary target
ARM NEON (ARMv8+) AArch32, AArch64 :white_check_mark: Primary target
ARM MVE (Helium) ARMv8.1-M :warning: Secondary target (in-progress)
SIMDe x86-64(SSE2/AVX), RISCV :white_check_mark: Tertiary target, used for portability and testing

Compatibility

Argon can be compiled using the following tool sets:

Compilers:

  • GCC 14.2 or later
  • LLVM Embedded Toolchain for ARM 19 or later (ARMv7-A)
  • LLVM/Clang 20.1 or later (AArch32/AArch64)
  • MSVC 19.44 or later

Testing is currently done across a range of platforms and compilers, including:

Architectures

Compiler ARMv7 ARMv8 ARMv8.1-M X86-64
GCC :heavy_check_mark: :heavy_check_mark: :heavy_check_mark: :heavy_check_mark:
Clang :x: :heavy_check_mark: (TBD) :heavy_check_mark:
MSVC :x: (TBD) :x: :heavy_check_mark:

Target ABI

Compiler Bare-metal Linux macOS Windows
GCC :heavy_check_mark: :heavy_check_mark: :heavy_check_mark: :warning:*
Clang :warning:* :heavy_check_mark: :warning:** :heavy_check_mark:
MSVC :x: :x: :x: :heavy_check_mark:

*: Windows/GCC (via MinGW64) and Bare-metal/Clang (via the LLVM Embedded Toolchain for ARM) are used regularly but not tested via CI.

**: In order to compile with Clang on macOS, you'll need to use the brew-bundled versions of libc++, as Apple's system libraries do not support required C++23 features.

Host Platform

Host ARMv7 ARMv8 X86-64
Linux :warning: :heavy_check_mark: :heavy_check_mark:
macOS :x: :heavy_check_mark: :heavy_check_mark:
Windows :x: (TBD) :heavy_check_mark: