
Package Management Wars

A New Hope...

Over the past 7 years, I have been involved in various efforts to try and combat the widespread issues with managing code for Ethereum applications. Most notably, I have contributed to the development and adoption of the EthPM v3 standard, first with Brownie and then over the past 3 years integrating it as a core part of Ape.

I know as well as anyone that if you ask any seasoned developer, whether they are protocol developers who need to integrate with other applications, security researchers trying to understand the impact of potential bugs, or really just anyone trying to get an ABI to interact with a contract from a web page or Python script, you will hear a litany of issues around figuring out what the source code is, how it was compiled, and how to publish that information correctly. It is probably the single biggest issue holding us back from making truly accelerated progress as an ecosystem, and it is arguably Ethereum's worst original sin that there was never a consistent way to share this information from within the protocol.

I probably can't sum up the current state of affairs any better than banteg has:

However, the biggest problem of them all is that there is a fundamental disagreement about what we can do about it. Many people think we just need one more tool to rule them all, but that's wrong: we already have dozens of tools with dozens of slightly different ways to accomplish the exact same tasks within different frameworks, and no single tool is going to solve this problem sufficiently well for everyone. Some people think we just need to focus on how the data is hosted, as if yet another SQL database were the solution, so long as we can force everyone onto a single database and thereby reduce the problem to one we can solve. A few out-of-the-box thinkers believe we can leverage existing tools in specific ways to solve the problem better, and for the most part they are correct, except in the few places where we have specific requirements that most existing tools do not account for well.

No, the real problem is simply one of coordination, and we are not alone in having it. Once you look at other programming language ecosystems, such as JavaScript and Python, you can't un-see it: the root cause is that developers think they can simply build their way out of the problem, without realizing that the first step to recovery is deciding to work together and acknowledging that the problem exists in the first place.

It is only with this enlightenment that we can actually build solutions that truly move the needle.

Inventorying the Problem

Okay, so we've taken the first step of admitting we have a coordination problem, what do we do next?

The second step in solving a coordination problem is finding the shape of the problem. Earlier this year, we interviewed several prominent teams building at different parts of the packaging spectrum about their processes, their problems, and their ideas for solutions. We collected and analyzed the results, and recently published that analysis here:

We suggest reading the full report for more information, but a few of the key findings are summarized below:

  1. Make finding sources, compiler settings, and build artifacts (such as ABIs) as trivial as possible

This was our top finding. Almost everyone agreed that one of the biggest DevEx issues across the entire stack is how difficult it is to quickly, reliably, and correctly find the information that downstream projects need. In fact, they ranked it as the single biggest barrier to developing new and innovative projects with smart contracts, for newcomers and advanced users alike. (A sketch of what that friction looks like in practice follows this list.)

  2. Make the solution flexible enough to handle complex environments, via old and new tools alike

There is quite a lot of history now to EVM development, and one of the biggest challenges for any potential solution is that it must be flexible enough both to handle past projects and to "level up" Ethereum DevEx with new innovations for years to come. If it's not flexible enough, it won't be as useful as it could be within developers' stacks as they work on increasingly complex projects, and the value proposition won't be there to justify upgrading. If it's too difficult to use, that will also be a barrier to adoption, and will prevent us from establishing the better patterns of behavior that unlock better DevEx for more advanced use cases.

  3. Avoid centralized chokepoints, but also avoid DevEx issues that decentralized solutions have

One of the biggest issues preventing broader use of smart contract data today is that the data ends up in centralized silos that are hard to query across. If we want to unlock massive wins in security and developer tooling, we need the ability to obtain large sets of smart contract data for machine learning, security trend analysis, and more. However, acknowledging that the data today lives in federated repositories that are already quite sticky for end users, we need to create a pathway toward greater interoperability of that data without accidentally threatening perceived competitive moats. In other words, our solution needs to show existing participants in our ecosystem why having a slightly smaller piece of a much larger pie is good for all of us.
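To make that first finding concrete, below is a minimal sketch of the kind of ad-hoc plumbing developers write today just to get an ABI, assuming an Etherscan-style `getabi` endpoint and a hypothetical API key in the environment. Even when this works, it only recovers the ABI; the sources and exact compiler settings still have to be hunted down separately.

```python
# A minimal sketch (not part of any proposed standard): fetch a verified ABI from an
# Etherscan-style block explorer API. Assumes requests is installed and that
# ETHERSCAN_API_KEY is set in the environment (hypothetical setup for illustration).
import json
import os

import requests

ETHERSCAN_API = "https://api.etherscan.io/api"


def fetch_abi(address: str) -> list:
    """Fetch the verified ABI for `address`, if the contract is verified at all."""
    resp = requests.get(
        ETHERSCAN_API,
        params={
            "module": "contract",
            "action": "getabi",
            "address": address,
            "apikey": os.environ.get("ETHERSCAN_API_KEY", ""),
        },
        timeout=10,
    )
    resp.raise_for_status()
    payload = resp.json()
    if payload.get("status") != "1":
        # Unverified contract, rate limiting, bad address... every project
        # re-invents this error handling for itself.
        raise RuntimeError(f"Could not fetch ABI: {payload.get('result')}")
    return json.loads(payload["result"])
```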

If we use these desires as the starting point for assessing potential solutions to the problem, we can be sure that we will find something that most people find easy to use, flexible and powerful, and also compatible with all the ways that users want to interact with the data, both now and in the future.

Analyze Existing Solutions

The next step in finding a solution to our problem is to assess what solutions already exist and compare and contrast them, looking for areas of improvement and for requirements a new solution must satisfy to stay compatible with what users have come to expect.

In terms of current solutions, there are a few ways that people build and share sources, compiler settings, and build artifacts today:

  1. Publish their project to GitHub, usually just the sources plus framework-specific compiler settings and dependency information, which other tools have to add custom handling for if they want their own users to be able to download those projects as dependencies within their own projects

  2. Publish their smart contract build via block explorer verification, either by flattening the code or through multi-file verification (in both cases manually specifying the compilation settings; a sketch of what those settings look like follows this list), which others (primarily security researchers or those scripting in non-EVM environments) can consume

  3. Manually create some build artifacts locally to try to match what they expect to interact with on-chain (for example, using Curve from a Solidity project, or working with a contract that has a corrupted ABI published on Etherscan, or no ABI at all)
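As a concrete illustration of the "manually specifying compilation settings" step mentioned above, here is a minimal sketch built around solc's standard-JSON input format. The file names, optimizer settings, and EVM version are placeholder assumptions; getting any of them wrong means the rebuilt artifacts silently diverge from what was actually deployed.

```python
# A sketch of hand-assembling solc standard-JSON input for (re)building or verifying
# a contract. All paths and settings below are hypothetical placeholders.
import json
from pathlib import Path

standard_json_input = {
    "language": "Solidity",
    "sources": {
        # Every source file has to be located and supplied by hand.
        "contracts/Token.sol": {"content": Path("contracts/Token.sol").read_text()},
    },
    "settings": {
        # These must exactly match the original build, or the output diverges.
        "optimizer": {"enabled": True, "runs": 200},
        "evmVersion": "london",
        "outputSelection": {"*": {"*": ["abi", "evm.bytecode", "metadata"]}},
    },
}

# Feed this to `solc --standard-json < input.json`, or transcribe the equivalent
# values into a block explorer's multi-file verification form.
Path("input.json").write_text(json.dumps(standard_json_input, indent=2))
```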

There are also a couple of recent developments we can check out:

  1. Soldeer shipping Foundry-specific package management as a core feature within Foundry

  2. Vyper v0.4 will ship with zip archive-based packaging of an existing project's sources and settings, enabling near-trivial rebuilding of any Vyper contract (such as for verification purposes)

Additionally, we can think outside the box and compare our needs to those other open source packaging communities have adopted:

  1. NPM and PyPI (and other language communities' registries) are quite large and successful repositories of source-available package data, useful for composing into other projects via shared standards

  2. nix is a functional language (and package manager) that lets you specify how to build packages from scratch, reliably and without dependency issues, because complete dependency information is part of every package's specification

Lastly, there is also EIP-2678 (aka EthPM v3), which was an attempt to solve this problem many moons ago, and (as I stated before) it is in use within Ape (albeit in a slightly modified form) with a great degree of success in reducing the pain that comes from the multi-framework universe of different build systems we need to handle for our users.
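For reference, an EIP-2678 manifest is a JSON document; the heavily abridged sketch below (written as a Python dict for readability) only shows its general shape with placeholder values, and the EIP itself remains the authoritative schema.

```python
# Rough shape of an EthPM v3 manifest (see EIP-2678 for the real schema).
# All values below are placeholders for illustration.
manifest = {
    "manifest": "ethpm/3",
    "name": "my-token",
    "version": "1.0.0",
    "sources": {
        "./Token.sol": {"urls": ["ipfs://Qm..."]},  # or inline "content"
    },
    "contractTypes": {
        "Token": {
            "sourceId": "./Token.sol",
            "abi": [],  # ABI entries omitted here; the artifact consumers want most
        },
    },
    "compilers": [
        {"name": "solc", "version": "0.8.26", "settings": {"optimize": True}},
    ],
}
```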

Finding Patterns

Obviously, one of the biggest differences in comparing our needs to those of the packaging systems in use by some of the biggest language communities is that our ecosystem is inherently multi-language, meaning we have to take all comparisons with a grain of salt because we are well past the point where focusing only on a single language community is even practical for EVM development (with the rise of Vyper, Huff, many new language communities being built for L2s, etc).

That makes something like nix interesting, because it is inherently language-agnostic, which is critical for us both in terms of the EVM languages we want to work with and in terms of how we want to consume this information (within frameworks and devtools written in JS, Python, Rust, Go, etc.); that kind of agnosticism is basically a requirement for any prominent EVM devtool in this day and age. Still, something like nix may be overkill for what we need, as it's not always practical to ship what is essentially an entire functional OS as a dependency in all the environments we need to work with today (such as the web browser).

What these all have in common, though, is a single common package specification, so that even if multiple tools or languages are used, we can be sure they work with the same data in a context-independent manner, no matter what tool is currently in vogue (and which out-of-style tools we still need to work with).

This is what building in a future-compatible way looks like.


The second thing we can see is that there is a stark divide between "filesystem-like" archive formats (e.g. git/GitHub, PyPI/NPM, Soldeer and Vyper v0.4 zip archives, etc.) and "virtualized" document formats (e.g. JSON documents from REST APIs like Etherscan, EthPM, etc.) that we can work with. It's also worth noting that while REST APIs (which communicate via JSON documents) can obviously be a convenient way to index and share metadata for packages, they are not actually a convenient way to share source files or even the hierarchical file-layout information needed to reproduce a build. All "virtualized" formats end up needing to "re-create" the filesystem context from the metadata they have, which runs into difficulties wherever that metadata is under-specified.

Since most successful implementations of this idea use an archive format (the most common of which is a zip archive of a particular folder structure, with defined files in it, where the archive has a special extension), we probably want to use the same thing in our solution, but also be smart about what metadata we want to expose for easy access within the archives, for use cases such as querying packages from a repository or pre-determining what build artifacts we will have access to once we completely build the package.
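To illustrate the idea (and this is only a sketch of the concept, not any agreed-upon format), such a package could be a zip with a fixed internal layout plus a small metadata document at a well-known path, so that a registry can answer queries about name, version, and expected build artifacts without unpacking the sources. All file names and fields below are hypothetical.

```python
# A conceptual sketch of an archive-based package: sources plus a manifest.json at a
# well-known path inside a zip. The layout, field names, and ".vy" source glob are
# hypothetical, not a proposed specification.
import json
import zipfile
from pathlib import Path


def build_package(project_dir: Path, out_file: Path) -> None:
    """Bundle a project's sources and a small metadata document into one archive."""
    metadata = {
        "name": project_dir.name,
        "version": "0.1.0",
        "sources": sorted(
            str(p.relative_to(project_dir)) for p in project_dir.rglob("*.vy")
        ),
        "artifacts": ["abi", "bytecode"],  # what a full build is expected to produce
    }
    with zipfile.ZipFile(out_file, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("manifest.json", json.dumps(metadata, indent=2))
        for source in metadata["sources"]:
            archive.write(project_dir / source, arcname=f"sources/{source}")


def read_metadata(package_file: Path) -> dict:
    """Read only the metadata, without extracting the rest of the archive."""
    with zipfile.ZipFile(package_file) as archive:
        return json.loads(archive.read("manifest.json"))
```

The useful property is that the same file serves both roles: a registry can index it cheaply from the metadata alone, while a framework can extract it and rebuild from the sources.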

Which takes us to...

Building A(nother) Standard

We likely have yet more to do in inventorying, analyzing, and finding patterns vs. what exists out there, but I am gonna walk through what comes next after we do that. Namely, everyone's favorite XKCD comic:

But seriously, we can already see that creating a standard will be extremely useful: it formally specifies a format so that other tools and languages can use it, helping us avoid siloing information and unlock the potential that broad accessibility to smart contract data can give us. Thankfully, we already have the EIP process for doing this (it should most likely be developed as an ERC), as well as previous examples (namely EthPM v1, v2, and v3) that we can reference for what information was deemed important to share in the past.

Since we are already talking about departing from the previous standard and adopting an archive-like format instead, we can avoid tethering ourselves to the mistaken designs of the past and establish a more reasonable scope for what such a format must satisfy and what it can leave up to future iteration. This will likely take many months, inevitably some bikeshedding, and likely a hot discussion or two, but ultimately what we are building is a strong, framework-independent, multi-registry-compatible format that we can start using to solve production issues, not just for one particular user or use case but across the whole ecosystem. Doesn't that sound great?

Getting Standards Adoption

The last step in our journey is getting (mass) adoption of our new standard, which is what determines whether it ends up as another unfortunate piece of roadkill in the long list of poorly adopted Ethereum devtool standards, or becomes what we need to hyper-accelerate Ethereum DevEx and create the smart contract data packaging of our dreams! It's probably easier said than done, but fortunately we have already interviewed many major players in the Ethereum dev stack to get their input.

The only question that remains to be answered by them (and you, reading this article) is:

Will you join us?
