In light of the recent Farcaster conversation on using black box algorithms for access gating based on reputation, we interviewed Dan (web3pm.eth). Below is a summary, followed by the full transcript.
Summary
While any system for gating access to anything of value (perceived or real) will be divisive, black box reputation scoring systems tend to be frustrating due to their lack of transparency. However, not all open systems are equal either. Systems that rely on behavioral definitions (e.g., did you do X in the last 30 days), as opposed to demographic ones (e.g., are you a member of Y society), are generally perceived as fairer.
Cross-ecosystem reputation scoring is still nascent because interoperable identity is in its infancy. There are only a handful of networks where reputation is open right now, with most of them coming from the crypto ecosystem. As a result, our tools for evaluating reputation are also quite limited and mostly reimplementations of previous generation scoring systems which were built for single closed ecosystems.
Future systems for evaluating reputation need to become much more nuanced. We have new data inputs, new evaluation models, and new application surfaces where these models can be used. Every application and ecosystem will ultimately implement its own, though implementations may share inputs, models, or surfaces with others.
Trust networks are how the closest knit groups operate in the real world (e.g., invite-only societies and organizations). They also have a unique advantage through the reputational staking that occurs when one member invites another. Only recently have we begun developing the tools to recreate these types of networks digitally. We are very excited about this at Icebreaker because we believe such tools can take the best aspects of the cozy web and make it interoperable, anywhere on the internet.
Full Transcript
esteez.eth: So yesterday this cast caught my eye.
I’m curious what your opinion is here. I think it was in response to David Furlong’s new project Larry, which currently requires a Neynar score of 0.95 or higher to participate. What's your first reaction to that critique? Or maybe an easier way to start is asking if that’s fair criticism?
web3pm.eth: Yeah.
esteez.eth: And I’m also curious if your POV would be different if the Neynar score was open versus closed, and you had a level of transparency around the algorithm. I’m pretty sure this is your favorite topic.
web3pm.eth: Yeah, so first of all, anytime you have an airdrop, or access to an airdrop, or some sort of trading opportunity that you're using some criteria to gate access to, you're going to have people who don't have access feel very angry that they didn't get it. And so if you are using a gate at all, if it is some sort of objective criteria that a user cannot directly qualify for after the fact, then you're going to have people that are unhappy because of FOMO and the desire for gain. And we see this already with existing airdrops. Every time a new protocol launches a token, there are always people that are very angry. And some of these are even driven by networks of people or bots or organizations that are trying to influence the protocols to change the definitions to benefit them. So, it's also hard to tell what is really authentic or not. But in principle, if you're a user and you do not qualify for whatever the thing is and you feel like then you should, you're going to be angry. So I think this is an inherent challenge with any sort of gate that you employ.
esteez.eth: It’s a great example of why you should study humans, not tech. Especially since it’s a financial incentive.
web3pm.eth: Yeah. I think that right now we are still in the early phases of using interoperable identity and reputation. I think onchain is certainly an important component here, but there's also offchain data and pseudo-onchain identity data, like Farcaster. So we're still in early phases and we're figuring out that as a result, we don't have that many tools we can use for gating. There's basically Icebreaker, Neynar, and what used to be Airstack but is now Moxie. There's onchain data like token holdings and POAP and ENS, but beyond that, there's nothing that open to use. There’s of course data from closed ecosystems like Twitter, LinkedIn, or Google, but none of that is super open, and that means that the set of tools we have right now for determining who can get access to something are limited. The easiest thing to use right now are scores, but these have inherent limitations they are based on some algorithm.
I think that we get into hotter water when we are choosing a score that is based on some sort of closed algorithm that is controlled by a company. If you think back to one of the reasons Bitcoin was successful, it’s because it had this sort of immaculate conception in the early phases where there was no gating. Anybody could participate and there's this feeling that it was inherently fair the way that it launched. One of the complaints that Bitcoiners have against Ethereum is almost this sort of “original sin” that it had this insider sale because of the ICO.
So I think with any sort of closed algorithm, people are always going to be unhappy if they're not eligible, and they're going to have something to point to as to why it's unfair. And so that's what I think we saw also manifesting in this cast. Neynar could be doing this as a public good, or not, but the fact that the score is a closed algorithm with no visibility or transparency into what signals they're using to compute it is, I think, what undermines trust in using the scoring system as a way to gate access to something that has value, whether it's a token or an opportunity or an ecosystem, something like that. So, I think that not only is the score itself a challenge, but the way that the score is made is also a problem. This isn't to say that everything needs to always be open. It's just that if you are dealing with gating access to an opportunity that has financial value, perceived or real, you are going to have a harder time if you are choosing a closed system versus an open system, even if you're using a score to ultimately dictate this.
esteez.eth: Because of the trust issue.
web3pm.eth: Yeah, exactly. The trust and transparency issue. I think if we go even further there's one additional distinction that I would make in thinking about fairness from a psychological perspective, and that is the difference between demographic and behavioral data. Demographics are based on who you are. Usually immutable traits, like what school you went to, how much money is in your bank account, right? These are things that may change slowly over time, but generally it's not something that you can change very quickly.
esteez.eth: Okay.
web3pm.eth: Whereas behavioral traits are based on what you do. So an example of a behavioral trait or just a behavioral signal would be: have you used this protocol 10 times in the last 10 days, or have you used Farcaster at least three times in the last year, or have you gone running for at least five miles in the last year, right? These are things where you can do them regardless of whether you went to Harvard or you're part of a secret society or not.
And I think that the reason this distinction matters is because there is a much higher perception of fairness when you choose a behavioral definition as opposed to a demographic definition. And so again, the problem with this score is that it's not clear whether it's a demographic definition or not. If the rule is that you have at least $100,000 of onchain assets in your wallet, it's easy to see how that's kind of unfair, right? It unfairly privileges the people who've been around for a while, the people who have a lot of money. It's not easy to change if you're coming in new, versus if the main way to get that score is by being active every single day on Farcaster for the last month.
esteez.eth: Which would be a behavioral trait.
web3pm.eth: Right, and I think if you were to break down that score further and you were to say, we're going to make it completely transparent, there's still an argument to be made that the way that you manifest that score has a huge impact on the ultimate trust. Just because it's open does not necessarily mean that it'll be perceived as fair. And if you think about what makes it perceived as more fair versus less fair, generally demographic based definitions are going to be perceived as less fair and behavioral definitions are going to be perceived as more fair. Now this isn't to say that you should never use demographic based definitions. Right?
If you're trying to create an event for your college alumni association, you probably do want to gate to people who actually graduated from your college. But all else equal, people are generally going to trust behavior-based definitions as more fair.
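To make the behavioral-versus-demographic distinction concrete, here is a minimal sketch of the two kinds of gate. The signal names and thresholds are hypothetical, not the inputs of Neynar's or any other real scoring system.

```typescript
// Hypothetical signals; not the inputs of any real scoring system.
interface UserSignals {
  graduatedFromCollegeX: boolean;  // demographic: who you are
  onchainAssetsUsd: number;        // demographic: changes slowly, if at all
  castsInLast30Days: number;       // behavioral: what you did recently
  protocolTxsInLast10Days: number; // behavioral: what you did recently
}

// Demographic gate: based on traits a newcomer cannot realistically change.
function demographicGate(u: UserSignals): boolean {
  return u.graduatedFromCollegeX || u.onchainAssetsUsd >= 100_000;
}

// Behavioral gate: anyone can qualify by doing the thing, regardless of pedigree.
function behavioralGate(u: UserSignals): boolean {
  return u.castsInLast30Days >= 30 || u.protocolTxsInLast10Days >= 10;
}

const newcomer: UserSignals = {
  graduatedFromCollegeX: false,
  onchainAssetsUsd: 500,
  castsInLast30Days: 42,
  protocolTxsInLast10Days: 12,
};
demographicGate(newcomer); // false: nothing they do this month changes this
behavioralGate(newcomer);  // true: earned through recent activity
```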
esteez.eth: Which makes sense, because you can often influence your behavior but not your demographics. It's just objectively more fair in that sense. I have another question for you. I reread the cast a handful of times, and of course my immediate takeaway is that it's a great example of why human-powered trust, or peer-to-peer trust, is just better than any sort of score, especially if it's closed.
But it seems like, and perhaps this is why it was deleted, the main gripe of the cast is that ultimately it's a “VC tight grip.” The user made a point to call out that there is an investor overlap between Neynar and Farcaster. And Dan and V have also had a lot of public criticism around “shadow banning”, or lack of transparency into algorithms, like with power badges and boosts. So to me, I'm taking away something very different from maybe the plain-text read of the cast, which is the critique that there is actually a very centralized cabal or center of power here. That's not even so much an indictment of the lack of transparency around the score as of the company making it, which I thought was interesting. I'm curious if that resonates with you.
web3pm.eth: Yeah. Mhm.
esteez.eth: I guess I’m thinking that this critique wouldn’t have the same impact if it wasn’t on Farcaster, where there’s already a lot of chatter around this topic, and where Dan and Varun have faced a lot of feedback around this.
web3pm.eth: I think crypto in general has a number of very tight communities, and there are a lot of ingroup / outgroup dynamics at play. I think Farcaster is a particular example, but even if you were to move away from Farcaster and just look at Crypto Twitter, the same arguments can be made. If you look at a crypto startup's fundraise page, they will show all of the crypto insiders who have invested as angels in their project. It's almost like the same people are investing in all of each other's projects. I think that there is truth to that. Granted, when you are an outsider, you tend to overestimate the intensity and the closeness of those circles.
esteez.eth: Okay.
web3pm.eth: So I think the criticism would be made regardless of whether it is a practical inhibitor or not. But I think that the criticism is valid in principle, which is that when you have a black box approach in particular, people are always afraid of things that they can't see or understand, more than when they actually see them. And so I think that that is one of the reasons outsiders always overestimate the insider nature of whatever it is that's in question. We often fear the things that we don't actually understand.
It’s just a human bias that we generally have, which I think we've evolved to exhibit because one in five times it will cause us to be appropriately paranoid, so it pays off. But unfortunately, that means the other four out of five times we're just living in a heightened sense of FOMO and anxiety. And that FOMO is basically an underlying worry about being abandoned by our tribe. One of our most deep-seated fears as human beings is this notion of becoming an outcast, and worrying that everybody else is moving forward without you. So, it's one of our oldest human impulses, one that evolved even before we became humans.
esteez.eth: Which is kind of a crazy concept to apply, but I do think it contextualizes what’s happening here. There’s a very intense sense of community on Farcaster, especially around builders and people that were early participants. It’s why having a low FID is still something that's talked about and has a lot of debate and lore around it.
There’s a lot of conversation about Farcaster being sufficiently decentralized. Obviously, it's a massive open protocol, and there have been a lot of people that have built on top of it. And so I wonder, based on what you're just saying, if this FOMO is not so much a gut reaction to the Neynar score being unfair and the desire to want to participate in a new product launch, for example, but rather this idea that “I came here to be able to participate because it's open, and I had some sort of expectation that it wasn't going to be like every other web2 platform.”
web3pm.eth: Mhm. Yeah.
esteez.eth: I don’t know if you saw that other cast I sent, where someone clearly wants to participate, and is willing to work to influence or improve their score but either can’t or doesn’t know how.
I feel like Rish gets this question on a near daily basis. My understanding of the answer is that it’s basically “study good users and copy them”, or just simply have good interactions on Farcaster.
web3pm.eth: Yeah. If you’re an outsider, I don’t think that’s a satisfying response.
esteez.eth: Definitely doesn't do anything for trust either.
web3pm.eth: I mean, the analogy is like saying, "I see all these billionaires with private jets and their own islands. How do I become like them?"
esteez.eth: Right.
web3pm.eth: And the elite saying, “study them and emulate them,” when really they are who they are because they had all of these advantages to begin with. You can't just copy their behavior, because what separates them is actually a demographic difference to begin with; it's not something you can achieve through behavior.
In order to get into that demographic, if you even can, it requires a different set of behaviors than the ones that demographic is currently exhibiting. Which is a common thing with founders: when you see them after they become successful, they're living very different lifestyles than the ones they had when they were in the process of becoming successful. And so there's this false signaling.
So yeah, I think that that's not a good answer. I think it's not very fair, and I think people are rightfully dissatisfied with that kind of answer. For that reason, and also for a number of other reasons, the era of using scores to determine important stuff is maybe not over, but I think we've essentially reached the limit of what scores can tell us in many realms. And I think we have become overly reliant on using universal scores to dictate things. We're actually moving towards an era where the more impactful scores are going to be essentially localized ones. So in other words, you shouldn't have to care so much about the Neynar score if you have a bunch of other ways of easily developing scores that are more application-specific.
esteez.eth: Right.
web3pm.eth: And so in the case of David Furlong's app, he could have launched it in a way where he was using his own score that was based on who the people that he trusts, trust. And that's a much more permeable set of members because there's more than one way to qualify for that list of users. You can get it from David or you could get it from somebody that David trusts, or theoretically, from anybody. And I think that those types of systems have just not been possible before because we haven't even had decentralized identity for more than two years really, in practice. We are now just developing the tools around making identity and user data interoperable, and usable onchain and offchain.
And that's where I'm excited to take Icebreaker: making it possible to expand the universe of the types of definitions and rules and scores, and the ways you can use them.
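As an illustration of what such a localized score could look like, here is a hypothetical sketch of an allowlist built from second-degree trust, where anyone trusted by someone the founder trusts also qualifies. The graph structure, names, and function are invented for the example and are not Icebreaker's or Larry's actual implementation.

```typescript
// Hypothetical trust graph: maps each account to the accounts it explicitly trusts.
type Account = string;

function secondDegreeTrust(
  root: Account,
  trustGraph: Map<Account, Account[]>
): Set<Account> {
  const allowed = new Set<Account>([root]);
  for (const friend of trustGraph.get(root) ?? []) {
    allowed.add(friend);
    // Anyone trusted by someone the root trusts also qualifies,
    // so there is more than one path into the set.
    for (const friendOfFriend of trustGraph.get(friend) ?? []) {
      allowed.add(friendOfFriend);
    }
  }
  return allowed;
}

// Example: david trusts alice, alice trusts bob, so bob qualifies through alice
// even though david has never attested to him directly.
const graph = new Map<Account, Account[]>([
  ["david", ["alice"]],
  ["alice", ["bob"]],
]);
secondDegreeTrust("david", graph); // Set { "david", "alice", "bob" }
```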
esteez.eth: Yeah, one thing that I’m realizing too as you’re saying this is that it would actually be sort of interesting if a founder only let their trusted circle beta test their product, or it could even be their trusted circle extended with second-degree trust. It’s actually a great group of beta testers, because they’re far more forgiving, and would probably give a lot better signal and product feedback than all of these angry people casting about how they’re excluded from participating.
And then I start thinking about why using trust feels better and more fair. I think it's really because trust is almost entirely based on behavioral experiences and not ever really on demographic traits. I actually think this happens all the time with Jack, because Jack knows so many people where I'm like, “this person looks like a bot or gives me a weird vibe." And he basically says they’re not and knows their whole life story.
web3pm.eth: Yeah.
esteez.eth: And in nearly all cases, I just end up trusting them because I really trust Jack. And there's something really nice and serendipitous about having this transitive trust or de facto trust. And I guess it becomes one of the fairest ways of gaining access to something because it's not based on anything concrete or immutable. It is purely based on human experience.
I also want to touch on something you said around how we're sort of hitting the limit around scores and rankings. And it reminds me how every time we use a new platform or a vendor, right now I’m thinking about Intercom, everything is about being AI-enhanced, or every service has its own AI version.
There are a lot of companies that I think we used to consider competitors that had AI-driven network matchmaking features, and it always reminds me of a tweet I saw that said something like “if AI is so good, why are the outputs or recommendations so bad?”
web3pm.eth: Yeah.
esteez.eth: I guess I should say they’re not all bad, but they’re definitely not all good. And it sort of begs the question: if we know that these scores or rankings aren't that good or reliable, is the reason we're still using them just that there hasn't been an alternative?
web3pm.eth: We're using what we're used to and there hasn't been an alternative. I think AI will become better given its exponential improvement path. But every culture around the world has its own Golem myth of basically being afraid of what humans create and losing control over it.
And I think that part of what is scary with these automated systems that are black boxes is this underlying evolved fear about building something that we then lose control over, and that no longer does what it's supposed to do. There are plenty of AI systems that are extremely good at doing what they do, so I don't want to rule out technology as a means of achieving high accuracy for whatever it is that we need to do. But the point I want to emphasize is that we are at the dawn of an era where we now have a new toolkit of data that we can use to inform reputation, as well as new algorithms we can use to propagate and calculate reputation, and new methods of applying reputation across different products and applications that were not feasible before. And this is largely driven by interoperable data and identity. As AI tools get better and better, this toolkit is going to be extremely important, not only as a complement to AI but also, in some ways, as a solution to the increasing preponderance of AI throughout our digital and physical worlds.
So let me go back to the concrete example of gating with a minimum Neynar score. There's only one path to get to that score, which is to basically show Neynar enough data that it will give you that score. There may be different things you can do to achieve that score, but all of those paths go through Neynar. With something like Icebreaker's qBuilder credential, there are actually infinitely many paths one can take to achieve qBuilder status. And so just from that perspective, there are many more options when you are employing a decentralized architecture.
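The architectural difference is easiest to see by putting the two gate shapes side by side. The sketch below is illustrative only: the 0.95 threshold echoes the Larry example above, while the credential labels and functions are placeholders rather than Neynar's or Icebreaker's actual APIs.

```typescript
// Hypothetical applicant shape; credential labels are placeholders.
interface Applicant {
  providerScore: number | null; // a single closed score from one vendor
  credentials: Set<string>;     // open attestations from many possible issuers
}

// Single-path gate: the only way in is to satisfy one provider's algorithm.
function scoreGate(a: Applicant, minScore = 0.95): boolean {
  return a.providerScore !== null && a.providerScore >= minScore;
}

// Multi-path gate: any one of several independently earnable credentials is enough,
// so there are many routes to qualification.
function credentialGate(a: Applicant, accepted: string[]): boolean {
  return accepted.some((c) => a.credentials.has(c));
}

const accepted = ["builder-attestation", "hackathon-poap", "peer-vouch"]; // made-up labels
const applicant: Applicant = { providerScore: null, credentials: new Set(["peer-vouch"]) };
scoreGate(applicant);                // false: no score from the one provider
credentialGate(applicant, accepted); // true: qualified through a different path
```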
And let me also flip it to the perspective of the giver, and think about what types of gating generate the best result for your community. If you essentially outsource it to an algorithm, whether it's operated by your own community or, in this case, by some third party like Neynar, you are basically relying on them to do the gating. But then once the user has passed that bar, they have very little incentive to continue acting well or to uphold the values of that community, beyond what's strictly required for them to maintain the score and stay in the community, if you're even continuing to evaluate that.
But when you do it with something like recursive trust networks or some other peer-to-peer attestation, you actually have not only the reputation of the person who is in the group on the line, but also the reputation of the people who invited them. They’re both on the line. And so think about who you expect to act better once they've been invited into the community: the person who just passed the strength test, or the person who is being vouched for by the three strongest people in the room?
You're going to have a lot more accountability when it's based on essentially a decentralized social attestation graph.
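To sketch what that accountability could look like mechanically, here is a toy model of vouch-based admission in which a member's misbehavior also costs the people who vouched for them. The names, thresholds, and reputation arithmetic are all invented for illustration.

```typescript
// Toy vouching model; not a real protocol.
type Member = string;

interface Community {
  reputation: Map<Member, number>;  // standing of each member
  vouchedBy: Map<Member, Member[]>; // new member -> members who vouched for them
}

// Admission requires vouches from members in good standing, and the vouches are
// recorded so accountability persists after entry.
function admit(c: Community, candidate: Member, vouchers: Member[], minVouches = 2): boolean {
  const inGoodStanding = vouchers.filter((v) => (c.reputation.get(v) ?? 0) > 0);
  if (inGoodStanding.length < minVouches) return false;
  c.vouchedBy.set(candidate, inGoodStanding);
  c.reputation.set(candidate, 1);
  return true;
}

// Misbehavior costs both the offender and everyone who vouched for them,
// which is the ongoing incentive a one-time score check never creates.
function penalize(c: Community, offender: Member, amount = 1): void {
  const hit = (m: Member) => c.reputation.set(m, (c.reputation.get(m) ?? 0) - amount);
  hit(offender);
  for (const voucher of c.vouchedBy.get(offender) ?? []) {
    hit(voucher);
  }
}
```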
esteez.eth: I would argue there’s a third reputation, too, which is the reputation of the group itself, which relies on the collective behavior of its members to preserve its integrity. I think about this a lot because there’s a lot of chatter around there being cabals on Farcaster, and secret group chats on Frens that are pumping and dumping coins together. I remember feeling almost a sense of anxiety when I was invited into what I’ll call a high-signal group chat, like I knew that my reputation and the reputation of the person who added me needed to be upheld, and then by proxy, the reputation or integrity of the group.
It’s actually one of the reasons we shut down our Telegram and made it an update-only channel. When anyone can join, there isn’t really an incentive to participate or uphold your reputation. It’s why there are so many tools and jobs around community and content moderation: there’s no incentive to actually behave well.
web3pm.eth: Yeah. I mean, go back in time before the internet existed and think about the clubs and social networks that existed. All of them, at least in part, are based on this notion that somebody or multiple people who are current members of the group have to essentially vouch for someone to join. And the more important that organization is, generally the more stringent the requirements are. So that not only do you have to be a good person, but also that even if you were thinking of relaxing your standards, you know that other people's reputations are on the line, so you're going to act better in that environment than you might otherwise in a world where there is no such accountability.
We never achieved that on the internet for a variety of reasons, but now we can start bringing that back. This isn’t to say that we want to recreate every single organization on the internet in this same way; it’s to underscore that the most powerful and trusted environments in the offline world are based on this notion of recursive reputation.
esteez.eth: That's great. It’s basically that classic Munger quote “show me the incentive and I’ll show you the outcome.” I would argue there are certain places on the internet where there’s actually an incentive to behave more poorly than you otherwise would have.
web3pm.eth: Exactly. Yeah.
esteez.eth: Like the places that feel like the wild west.
web3pm.eth: Right. It’s like that saying “when you're on the internet nobody knows you're a dog,” right? If you have complete anonymity, you can be as mean as you want, whereas in real life, in person, most of these people are often quite nice or at least agreeable. This isn’t to say that in order to have nice places on the internet you need to be doxed, but I think that's another topic: how you can blend aspects of anonymity or pseudonymity with reputation. And that is a case where zero-knowledge cryptography is uniquely suited to help, when you basically want to show aspects of your identity, proving that you’re trusted by people that others would find trustworthy, without revealing your full name, or address, or handle.
esteez.eth: Yeah it reminds me of a cast I saw in response to the recent activity from Anoncast, just stating that it’s sort of boring and uninteresting, almost like a “race to the bottom” vibe. The bar to post isn’t very high, the environment is low trust, and you have zero context for who is posting, so it’s just sort of uninteresting. I think it’s a really interesting implementation of anonymity and I’m excited to keep following it. But that critique stuck out to me because it felt like a disappointment with the quality of casts, in an environment with again, low trust, low context, and no incentive to post anything “good.”
web3pm.eth: I think this kind of brings us back to the original discussion around challenges with black box algorithms that determine membership in a community. If you again think pre-internet in the real world, the best communities are the ones where you understand the rules for how you get into the community, and you believe that those rules are good and hard and that they ensure that other people in the community share your values in some respect. Like in a college community: if you go to the reunion for the class of 2012 at MIT, you feel like you can go up to anybody else there, and even if you don't know who that person is, you feel like you have a lot in common and a level of trust with that person. It’s the same thing with a country club where everybody understands the conditions around getting in. They understand that it's hard, and that you can't simply buy your way in.
Crypto is still in its early days and still maturing, but we’re finally getting to the point where we have a better way of creating digital communities where it's easy to know that anybody who is in a given community did something that was hard, and that not everybody necessarily can do or will do. That is how we bring the cozy communities we have in the physical world into the digital world. And it’s finally becoming possible, which is really, really exciting.