
Arbiter - EVM Logic Simulator for Security and Performance Testing

Designed to execute EVM logic with full parity as quickly as possible


This article is a written breakdown of the Primitive Finance team’s seminar at Spearbit on Arbiter, a tool designed to execute EVM logic with full parity as quickly as possible. It does this by interfacing directly with revm, the Rust implementation of the EVM, and it can simulate price processes and price action against smart contracts in a DeFi context.

This seminar was delivered by:

Spearbit is a decentralized, industry-leading blockchain security services firm that pairs protocols with top security researchers who have deep subject-matter expertise, identifying vulnerabilities in an ever-evolving landscape.

This article is not meant to replace Primitive Finance’s seminar; it is supplementary material to support deeper understanding. We highly recommend watching the video seminar as well.

Seminar Link:

Why should you use Arbiter?


When the team compared Arbiter to existing tools, they found similar solutions, but Arbiter has some key advantages.

Note: The team notes that the frameworks mentioned above have their own use cases per their own relevant areas of content. However, when assessing EVM logic for security or performance-related testing, Arbiter shines in certain key areas as shown above.

  1. Open-Source vs. Service-Based: Arbiter is an open-source tool that users can freely modify or adapt, whereas other solutions may operate as service providers.

  2. Historical Back Testing: We've incorporated historical back-testing capabilities into Arbiter. However, whether other services have similar features is not definitively known.

Arbiter works alongside Foundry using its Forge tool. Foundry, though, isn't a comparable tool as it doesn't aim to offer functionalities like forward testing or the use of back testing data for simulations.

A more comparable open-source project is Apeworx, which allows users to build on top of an interface similar to Anvil. Users can run contracts directly and simulate transactions as if on a live chain, enabling the creation of forward testing suites. However, Apeworx does not inherently provide back testing data or agent-based simulation capabilities.

In essence, while similarities exist, Arbiter stands apart due to its open-source nature, comprehensive testing capabilities, and the ease with which it can be adapted to fit user-specific needs.

What does Arbiter do?

Arbiter can simulate stochastically generated price paths and include multiple agents, each with its own predefined traits, which allows users to test composable environments:

  • All output from simulations is recorded, and data can be post-processed, visualized, and analyzed statistically.

  • The tool is designed to be general so that users can interact with real running networks like mainnet or layer twos.

  • Users have the ability to filter and display logs that come out of the Ethereum mainnet, watch these logs, have agents act upon them, see historical data while taking it into account when making decisions, and bundle transactions to send on-chain.

Motivation Behind Building Arbiter

The impetus to develop this new tool came from the recognition that existing solutions could not adequately meet certain performance-centric needs. Although various tools offer similar functionality, none completely fulfilled the development team's criteria. Coming from an academic background, the team had a keen interest in being thorough and in assessing mechanism design, smart contract vulnerabilities, and DeFi primitives during the development process.

Uniswap V2 Example

Let’s introduce an example using Uniswap V2 to showcase the current capabilities of the Arbiter tool. Uniswap V2 is a protocol that has been rigorously battle-tested and proven reliable, thus it was used to illustrate the tool's simulation testing capabilities. Uniswap V2 is not expected to have any bugs. However, due to the direct running of contracts through the Arbiter tool, there might be an opportunity to uncover potential glitches. These could arise from repeated swapping or adding liquidity, potentially revealing unexpected reverts or unusual states derived from peculiar price paths. The team seeks to uncover these anomalies in their testing process.

The team hypothesizes that Uniswap V2, while a generalist, may not be optimized for specific types of strategies, such as volatility harvesting on a stable pair like USDT and USDC. To test this, an environment is set up using the Arbiter tool.

In this environment, let’s say we establish a single pair of tokens – Arbiter Token X and Arbiter Token Y. A single Uniswap pool is deployed and initialized. Additionally, a dummy contract, referred to as the 'Liquid Exchange,' is also set up.

This contract does not impact the price when swapped against, which makes it a useful testing tool. It also allows prices to be seeded into the environment for the arbitrageur, the agent operating in this setup.

The goal of the arbitrage agent is to detect and rectify any disparities between the Liquid Exchange and the Uniswap pool, restoring price alignment through arbitrage trades. This creates a classic scenario that allows for performance recording and observing the potential gains over time. Through this, the team can observe all the price action taking place and monitor the resulting outcomes.
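To make the arbitrage step concrete, here is a minimal sketch in Python of how such an agent could size a trade. It is not Arbiter's agent implementation: the function name and the example numbers are hypothetical, and it assumes a fee-less constant-product pool (x * y = k) so that the closed-form solution stays short. The Liquid Exchange is modeled only as a reference price with no price impact.

```python
import math


def arbitrage_to_target(reserve_x: float, reserve_y: float, target_price: float):
    """Return (dx, dy), the reserve changes that move a fee-less
    constant-product pool's price (reserve_y / reserve_x) to target_price.

    Assumption: swap fees are ignored to keep the algebra simple.
    A positive value means the arbitrageur pays that token into the pool.
    """
    k = reserve_x * reserve_y
    # Solve x * y = k together with y / x = target_price.
    new_x = math.sqrt(k / target_price)
    new_y = math.sqrt(k * target_price)
    return new_x - reserve_x, new_y - reserve_y


# Hypothetical numbers: the pool is priced at 1.00 Y per X, while the
# reference price on the Liquid Exchange has moved to 1.21 Y per X.
target = 1.21
dx, dy = arbitrage_to_target(1000.0, 1000.0, target)

# Value both legs at the reference price to get the profit in token Y.
profit_in_y = -(dx * target + dy)
new_pool_price = (1000.0 + dy) / (1000.0 + dx)
```

With these inputs the agent pays Y into the pool, takes X out, and sells the X at the reference price. A real agent would also need to account for the pool's swap fee, which shrinks the profitable trade size.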


The configuration of this scenario involves the implementation of a price process known as the Ornstein-Uhlenbeck process. This mean-reverting process simulates the behavior of stable tokens, where the mean price is set at 1 and stochastic processes ensure that the price hovers close to this value, demonstrating the typical behavior of a stable token's price.
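As an illustration of the price process described above, the following sketch generates an Ornstein-Uhlenbeck path with a simple Euler-Maruyama discretization. It is written in Python for brevity and is not Arbiter's Rust implementation. The function name and the parameter values (mean-reversion speed `theta`, volatility `sigma`, time step `dt`) are assumptions chosen for the example.

```python
import math
import random


def ou_path(n_steps, x0=1.0, mean=1.0, theta=5.0, sigma=0.01, dt=1 / 365, seed=0):
    """Simulate an Ornstein-Uhlenbeck price path using Euler-Maruyama steps.

    Each step pulls the price toward `mean` at speed `theta` and adds
    Gaussian noise scaled by `sigma * sqrt(dt)`.
    """
    rng = random.Random(seed)  # seeded so that a path is reproducible
    path = [x0]
    for _ in range(n_steps):
        x = path[-1]
        drift = theta * (mean - x) * dt
        shock = sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        path.append(x + drift + shock)
    return path


prices = ou_path(1000)
```

A larger `theta` keeps the price closer to the mean, and a larger `sigma` widens the band it hovers in, which is how different parameter sets produce different stable-token behavior.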

Key parameters of the setup are as follows:

  1. T prices are generated along each price path.

  2. For each seed process, M different paths are established.

  3. Each process is run over N different parameter sets.

Configuration Results

This configuration leads to M × N unique scenarios, for a total of M × N × T price updates. These updates facilitate the observation of any reverts or other potential anomalies.
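The scenario count can be sketched as a simple parameter sweep. The Python snippet below is illustrative only: the parameter names and the values of M, N, and T are hypothetical, not the team's actual configuration.

```python
from itertools import product


def enumerate_scenarios(parameter_sets, n_paths):
    """Pair every parameter set with every path index (used here as a seed)."""
    return [(params, seed) for params, seed in product(parameter_sets, range(n_paths))]


# Hypothetical sweep: N = 4 parameter sets, M = 25 paths each, T = 1000 prices per path.
parameter_sets = [{"theta": theta, "sigma": 0.01} for theta in (1.0, 5.0, 10.0, 20.0)]
M, T = 25, 1000

scenarios = enumerate_scenarios(parameter_sets, M)
total_price_updates = len(scenarios) * T  # M * N * T
```

Here the sweep yields 100 scenarios and 100,000 price updates in total.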

On the practical side, running this simulation through the Arbiter tool is a straightforward process. As shown in an example from a team member's terminal, by inputting the command for a Uniswap simulation, the system quickly performs the requested operation. In a case of running around 100 different simulations with a thousand price processes, it took approximately 1.4 seconds, resulting in a clear and concise output.

Visualization and Analysis

After generating the output, it's crucial to visualize and analyze the results to ensure the intended operations are functioning correctly. The first priority is to confirm whether the arbitrage agent is operating as intended. To do this, the team inspects recorded data from the simulation.

Simulation Results and Observations

  • In this simulation, the Liquid Exchange and the Uniswap pool are both changing prices due to the arbitrage agent's actions. The agent swaps against both entities, aiming to make a profit and equilibrate the two entities' prices. Because these swaps are consistently occurring, the liquidity provider (LP) earns fee revenue over time through Uniswap V2's 30 basis point (0.30%) fee mechanism.

  • The data also records the reserves for both the X and Y tokens, which fluctuate significantly as the arbitrage agent continues to swap. However, the LP's liquidity consistently increases over time, aligning with the team's expectations.

  • This testing phase yielded no bugs in Uniswap V2, which was in line with the team's initial assumption. The performance recorded was as expected. The team believes that the testing capabilities of the tool can also be used to compare the performance of Uniswap V2 with other, potentially more optimized exchanges such as Uniswap V3 or future offerings from Primitive.
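The observation that the LP's liquidity grows while reserves fluctuate follows from the fee being left in the pool on every swap. The Python sketch below reproduces the Uniswap V2 output-amount formula (0.30% fee, integer arithmetic) and tracks sqrt(x * y) as a measure of pool liquidity. The reserve sizes, trade size, and number of swaps are hypothetical, and this is a standalone illustration rather than output from Arbiter.

```python
import math


def get_amount_out(amount_in: int, reserve_in: int, reserve_out: int) -> int:
    """Uniswap V2 swap output with the 0.30% fee, rounded down."""
    amount_in_with_fee = amount_in * 997
    return (amount_in_with_fee * reserve_out) // (reserve_in * 1000 + amount_in_with_fee)


# Hypothetical starting reserves for tokens X and Y.
reserve_x, reserve_y = 1_000_000, 1_000_000
liquidity_before = math.isqrt(reserve_x * reserve_y)

# Swap back and forth repeatedly; reserves move, but fees accumulate in the pool.
for _ in range(100):
    out_y = get_amount_out(10_000, reserve_x, reserve_y)  # sell X for Y
    reserve_x += 10_000
    reserve_y -= out_y
    out_x = get_amount_out(out_y, reserve_y, reserve_x)  # sell the Y back for X
    reserve_y += out_y
    reserve_x -= out_x

liquidity_after = math.isqrt(reserve_x * reserve_y)
```

Every swap leaves x * y at least slightly larger than before, so `liquidity_after` exceeds `liquidity_before` regardless of the direction of the trades.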

Design and Performance

Design Decisions and Performance

For users interested in designing their own simulation using Arbiter, the process involves three main steps, each with its own sub-steps:

  1. Installing Git Submodules: The first step in the tool's architecture is installing your Git submodules. This is accomplished with Foundry's forge install command. The team opted for this approach to enable Arbiter to work with any arbitrary smart contract repository.

  2. Generating Bindings: The next step is generating the bindings for the contracts. Under the hood, this is done using forge bind, but Arbiter's bash script provides an abstraction over it. From the Rust bindings, a user-friendly interface is created over the contract APIs, which aids in designing the simulation.

  3. Designing the Simulation: This final step is divided into three sub-steps:

    • Deployment of Contracts: The first sub-step is deploying all of the contracts your simulation will interact with.

    • Sending Initialization Calls: The second sub-step involves sending all initialization calls. These are operations such as token approvals or mints, enabling you to set the state of the Rust EVM to your desired starting point for the simulation.

    • Determining Agents and Price Processes: The final sub-step is deciding on your agents and the price processes you want your simulation to run against. This can be configured in a file that the team has provided an example of for easy use.

This three-step process reflects the thoughtful design and architecture of the Arbiter tool, which has been built to provide a flexible, user-friendly environment for designing and running simulations on smart contract behavior and interactions.

Simulation Design Space

Over the past three to four months, the development team has primarily focused on the Simulate crate. This crate, the largest in the tool, houses most of the features discussed so far, excluding the command-line interface.

  1. Stochastic Module: This module contains the price processes the team has been discussing. The plan is to expand this module to include other agents that also behave stochastically, such as retail traders, who can often behave unpredictably.

  2. Agent Module: All the traits and behaviors of agents are located in this module. The team plans to continue developing and expanding this module significantly.

  3. Contract Modules: These modules serve as the interface with the Rust EVM. Their purpose is to efficiently deploy contracts and keep track of their deployment status and locations.

  4. Environment Module: The Environment module handles transactions and database requirements necessary to track state changes.

  5. Historic Module: A relatively newer module, 'Historic' is designed to interface with back-testing data or historical data.

  6. Manager Module: The Manager module is effectively the orchestrator of all the components. It manages all agents and the environment, allowing these entities to work cohesively to run simulations, as demonstrated in the previous example.

The Simulate crate's structure is integral to the function and efficiency of the Arbiter tool, showcasing its extensive capability for smart contract simulation.
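To show how these pieces could fit together, here is a deliberately tiny orchestration loop in Python. It is a conceptual sketch only: the class names borrow the module names above, but every class, method, and field is hypothetical and does not reflect Arbiter's actual Rust API. The environment here is a plain dictionary rather than an EVM.

```python
from dataclasses import dataclass, field


@dataclass
class Environment:
    """Holds state and an append-only log of every applied transaction."""
    state: dict = field(default_factory=dict)
    log: list = field(default_factory=list)

    def execute(self, sender, key, value):
        self.state[key] = value
        self.log.append((sender, key, value))


class PriceFollower:
    """Toy agent: moves the pool price whenever it differs from the reference."""
    name = "price_follower"

    def act(self, env, price):
        if env.state.get("pool_price") != price:
            env.execute(self.name, "pool_price", price)


class Manager:
    """Feeds each new price into the environment, then lets every agent react."""

    def __init__(self, env, agents):
        self.env = env
        self.agents = agents

    def run(self, prices):
        for price in prices:
            self.env.execute("price_process", "reference_price", price)
            for agent in self.agents:
                agent.act(self.env, price)
        return self.env.log


log = Manager(Environment(), [PriceFollower()]).run([1.0, 1.01, 1.01])
```

The recorded log plays the role of the simulation output: it can be post-processed after the run to check what each agent did at each price update.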

Forward Testing

The Arbiter tool has demonstrated robust performance in forward testing. Even in an unoptimized environment with dev compile settings, the tool has been able to process over a thousand swaps per second. This measurement was based on data derived from two different stochastic price processes.

In the future, the development team plans to enhance the tool by adding jump diffusion to its stochastic price processes. This feature will allow the simulation of larger volatility events, further expanding the tool's capability to mimic real-world market scenarios.

When compared with the capabilities of tools like Anvil, Arbiter, according to the team, demonstrates a substantial increase in performance. It can process orders of magnitude more swaps per second, underlining the tool's superior processing power and performance efficiency in smart contract simulations.

Repository Architecture

The repository architecture of the Arbiter library is designed to be simple and intuitive, mainly consisting of two primary directories at the top level.

  1. Contracts Directory: This is where git submodules are installed. Any submodule can be installed in this directory, offering flexibility and ease of access for contract management.

  2. Lib Directory: This directory contains two crates - the 'Simulate' crate and the 'On-Chain' crate. These crates are fundamental components of the library, providing the core functionalities and capabilities of the tool.

Lastly, the Arbiter binary is present at the top level, leveraging these resources to perform the operations and tasks necessary for smart contract simulation. This simple and efficient architecture makes it straightforward for users to interact with and use the Arbiter tool effectively.

Looking Forward

Reflecting on the progress and future plans for Arbiter, let's delve into the journey the Arbiter team has embarked upon and their forthcoming intentions.

  • Beginning of the year: At the onset of this year, the Arbiter team dedicated considerable effort to construct scaffolding around revm, learning to interface with revm in an efficient manner. This process was characterized by a steep learning curve, with the team grappling with various challenges initially.

  • End of Q2: Recently, the team has been successful in bringing simulations to life using example data, marking a significant milestone in the project. This achievement allowed the team to accomplish their minimum viable product (MVP) and primary feature set.

  • Q3: In the current quarter, the Arbiter team is focusing on resolving their technical debt and collaborating with various teams and individuals to enhance the usage of Arbiter. Their goal is to improve the usability of Arbiter and extend the test coverage of the repository.

  • Q4: Looking towards the final quarter of this year, the team is planning to concentrate on introducing new features. This will encompass developing new agents, promoting ecosystem modularity, integrating with other valuable tools in the ecosystem, and crafting post-processing tools for advanced statistical analysis and visualization.

The Arbiter team is committed to continuous improvements and looks forward to the growth and evolution of Arbiter in the coming months.

Community Involvement

The Arbiter team has been committed to open-source principles from day one, valuing the shared learning experience and the potential of community-driven development. They see a profound importance in community interaction and aim to build tools that extend beyond their own utility to assist others within the ecosystem. Arbiter, being a relatively new project, offers ample opportunities for enhancements and growth. The team acknowledges that many individuals within the community might have fresh perspectives and innovative ideas to enrich the tool further, ideas that may not have yet occurred to the team itself.

To facilitate such community involvement, the Arbiter team has devoted substantial time and effort to making the repository welcoming and accessible to new contributors. They have striven to thoroughly document issues, maintaining a range of 'good first issues' for newcomers to tackle. The team is more than willing to engage in discussions with anyone interested in contributing to the project or those who have particular features in mind that would address their specific use cases. They hold a sincere commitment to the betterment of the broader ecosystem.

FAQ and Spearbit Q&A

Q1: Can you command and upload its state into revm?

A: Yes, it is indeed possible and not too challenging to upload its state into revm, but it's not a feature we have built around, so it needs to be done manually. There is a great example of how to do this in the revm repository, maintained by Dragan. As for incorporating this feature into Arbiter, it is high on our priority list. We plan to abstract this feature, making it very user-friendly.

Q2: Can you model block space?

A: You absolutely can model block space. The power of interfacing with revm directly is that it gives us fine-grained control over what we can do, so nothing is really outside of our reach. The only concern would be how challenging it is to ship that feature and how much of a priority it should be.

Q3: Can you build complicated simulations?

A: Building complicated simulations is possible, and we have somewhat modularized this process. However, we aim to further increase the usability and the ease with which users can design these simulations.

Q4: Who are the primary users of Arbiter?

A: When considering the primary users of Arbiter, four categories come to mind:

  1. Sophisticated financial actors for on-chain activity, including searchers and different funds involved in statistical arbitrage.

  2. Mechanism auditors, who analyze risk even in formally verified contracts.

  3. Economic mechanism designers building their own protocols who want to evaluate the performance of features iteratively.

  4. Academics in the space, such as those from IC3 at Cornell, who may use Arbiter for research as our industry continues to mature and become more studied in the academic domain.

Q5: Have you had any issues with primitive types mismatch?

A: Initially, we did experience issues with primitive types mismatch. Breaking changes with ethers three or four months ago were also challenging for us. However, after discussions with Dragan and Georgios, it seems we're converging on a singular types library, and we always strive to use the latest stable versions.

Q6: What can you do that is more granular than when using a higher-level tool like Anvil?

A: Arbiter provides significant advantages in performance, allowing us to build scenarios that would be impossible to model with rigor for smaller scoped simulations. It offers a huge edge with a 20x to 200x speed and performance increase when compared to Anvil. Backtesting and forward testing are not available out of the box with Anvil, but we can perform these with Arbiter.

Q7: How much performance does this tool introduce over Anvil?

A: Preliminary testing has shown orders of magnitude speed improvement over Anvil, with a thousand swaps being processed in a second, in comparison to a thousand reads from Anvil taking 20 seconds. Further rigorous analysis of performance and speed is planned.

Q8: Can this tool be used for economic reviews?

A: Arbiter can be a powerful tool for studying economic performance, portfolio rebalancing strategies, and evaluating economic risk during Black Swan events. Due to its performance and speed, it can also be a valuable tool for evaluating the security of contracts.

Q9: Any recommendations on what stochastic processes work well in the context of DeFi?

A: While geometric Brownian motion can be useful for studying short bursts, volatility changes over time make it less accurate for long periods. The Ornstein-Uhlenbeck process is great for testing something like a stable token, and jump diffusion processes are useful for testing price shocks. Poisson processes that model wait times could be used to model retail agents who trade stochastically over time.
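Two of the processes mentioned in this answer can be sketched in a few lines of Python. The function names and parameter values are hypothetical and chosen only for illustration. The first function uses the exact log-normal step for geometric Brownian motion, and the second draws exponential waiting times, which is the standard way to generate the arrival times of a Poisson process.

```python
import math
import random


def gbm_path(n_steps, s0=1.0, mu=0.0, sigma=0.5, dt=1 / 365, seed=0):
    """Geometric Brownian motion path using the exact log-normal step."""
    rng = random.Random(seed)
    path = [s0]
    for _ in range(n_steps):
        z = rng.gauss(0.0, 1.0)
        growth = (mu - 0.5 * sigma**2) * dt + sigma * math.sqrt(dt) * z
        path.append(path[-1] * math.exp(growth))
    return path


def poisson_arrival_times(rate, horizon, seed=0):
    """Trade arrival times on [0, horizon] with exponential waits of mean 1 / rate."""
    rng = random.Random(seed)
    t, times = 0.0, []
    while True:
        t += rng.expovariate(rate)
        if t > horizon:
            return times
        times.append(t)


prices = gbm_path(365)
retail_trade_times = poisson_arrival_times(rate=10.0, horizon=1.0)
```

A retail agent could then be modeled as placing one trade at each arrival time against whatever price the chosen process has reached.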

Q10: When will Monte Carlo simulation support be available?

A: The timing depends on the specific type of Monte Carlo simulations users are looking for. The team is open to community feedback on what features to build, so users are encouraged to open an issue if they want Monte Carlo simulation support.

Q11: Did you write any really low-level assertions, such as tests for memory access or other unusual behavior?

A: Yes, we do have a variety of tests but admittedly our test coverage right now is pretty low. We've been sprinting to ship this MVP (Minimum Viable Product) and our focus over the next quarter is going to be on more extensive testing.

Note: We are fortunate that revm (which Arbiter depends on) is being used in the Reth client and is well maintained and tested. It's a critical dependency to the greater ecosystem. If you're not familiar with revm, it's the EVM (Ethereum Virtual Machine) stack machine written in Rust.

Q12: Who are the main potential users of Arbiter?

A: We see significant interest from arbitrageurs and researchers due to the high performance of Arbiter and the quantity and quality of data it can produce. Those familiar with high-frequency trading in traditional finance and what goes into statistical arbitrage may find the value of Arbiter more obvious.

Note: The backstory of Arbiter is rooted in MEV (Miner Extractable Value) and LP (Liquidity Provider) performance. We want users to be able to build agents, deploy them in a simulation environment, and then do live network testing.

Q13: Are you open to suggestions or contributions for the project?

A: Absolutely. We are always open to feedback and collaboration. If you're interested in specific features or want to contribute, feel free to open an issue or start a discussion on GitHub. Alternatively, you can reach out to us directly through the Primitive Community Discord. We are more than happy to discuss further and are excited to work with the community to make this tool as valuable as possible.
