This is in response to V's article on Scaling Farcaster:
Here are my stipulations, observations, and ideas:
Polynya is right about strict global consensus being expensive.
If you've been following Polynya's writings for some time, you'll know their main hobby horse has been advocating for rollups and the modular architecture, but their recent blog post on The Drawbacks of Strict Global Consensus got me thinking about Farcaster Hubs. Currently, FC's Conflict-free Replicated Data Type (CRDT) architecture is more 'eventual consistency' than 'strict consensus', but it is certainly not sharded. Contrast this with RSS or the web writ large, which is naturally sharded due to its client-server architecture. Facebook and Twitter also do some amount of sharding internally, but this is more disk/db sharding of a single unified database that is still 'ACID'.
RSS+: The original plan for Farcaster looked a lot more like Bluesky.
The original/v1 Farcaster protocol back in 2022 looked a lot more like Bluesky: each FID would have a 'home' Hub, a URL that could be updated onchain. Basically, the Farcaster contracts wouldn't just be a canonical registry of FIDs; there would also be a phone book mapping FIDs to Home Hub URLs. While hubs solve a lot of problems, having an alternate place to download casts could be useful (see next point). Bringing this back would allow PDS-like (Personal Data Server) functionality, and Bluesky may want to do this independently for their own DiD purposes.
Users have limited time & attention.
They don't care about ingesting every message on the network, but the current architecture requires every hub to ingest every message in the hope that some of them are relevant to its users/customers. This improves discoverability but causes massive hardware and bandwidth pains. State expiry helps a little, but PDSs solve both for expiry (just keep your own messages as long as you want on your own server) and for relevance (just like RSS, you only subscribe to feeds you are interested in, so you only download data from those URLs during sync).
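To make the phone book idea and the RSS-style sync a bit more concrete, here is a minimal sketch. Everything in it is an assumption for illustration: `HomeHubRegistry` stands in for an onchain FID-to-URL mapping, `sync_plan` is a hypothetical client helper, and the URLs are placeholders. It is not how Farcaster or Bluesky actually implement this.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class HomeHubRegistry:
    """Hypothetical stand-in for an onchain mapping of FID -> home hub URL."""
    urls: Dict[int, str] = field(default_factory=dict)

    def set_home(self, fid: int, url: str) -> None:
        # In the v1-style design the owner of the FID could update this onchain.
        self.urls[fid] = url

    def resolve(self, fid: int) -> Optional[str]:
        return self.urls.get(fid)


def sync_plan(registry: HomeHubRegistry, following: Set[int]) -> Dict[int, str]:
    """Return the URLs a client would fetch from: only the feeds it follows."""
    plan = {}
    for fid in sorted(following):
        url = registry.resolve(fid)
        if url is not None:  # skip FIDs that never registered a home
            plan[fid] = url
    return plan


registry = HomeHubRegistry()
registry.set_home(1, "https://hub-a.example")
registry.set_home(2, "https://hub-b.example")
registry.set_home(3, "https://hub-c.example")
registry.set_home(2, "https://hub-b2.example")  # user 2 moves to a new home

# The client follows 2 and 3 (and 99, who has no home), so it never touches hub-a.
plan = sync_plan(registry, following={2, 3, 99})
```

The point of the sketch is the shape of the work: sync cost scales with the number of accounts you follow, not with the size of the network.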
Gossipsub Hubs should be used for important things.
For a post that is heavily recasted or liked, a SNARK proof could be generated to show that at least N valid FIDs have liked the post, as a condition for publishing it on the hub. (To be clear, these 'like' actions can live on the user's PDS, so they don't also have to live on the hub.) Casts from active/Powerbadged users or channels could also be included on the hub automatically. The hub history could also be made shorter, since most data would be available on PDSs over IPFS. The CRDT expiry could be as short as 24 hours, since that's the natural cycle of our lives and, therefore, of our news cycle.
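The inclusion rule described above could look roughly like the sketch below. Note the swap: it checks likes in the clear rather than verifying a SNARK, so it only shows the policy a proof would attest to. The `Cast` type, `qualifies_for_hub`, `prune`, and the threshold values are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet, List

HUB_TTL_SECONDS = 24 * 60 * 60  # the 24-hour expiry suggested above


@dataclass(frozen=True)
class Cast:
    author_fid: int
    timestamp: int               # unix seconds
    liker_fids: FrozenSet[int]   # likes would live on PDSs; gathered here for the check


def qualifies_for_hub(cast: Cast, valid_fids: FrozenSet[int],
                      power_fids: FrozenSet[int], min_likes: int) -> bool:
    """Plain-text version of the statement a SNARK would prove."""
    if cast.author_fid in power_fids:
        return True  # active/power-badged authors are included automatically
    valid_likes = cast.liker_fids & valid_fids  # ignore likes from unknown FIDs
    return len(valid_likes) >= min_likes


def prune(casts: List[Cast], now: int) -> List[Cast]:
    """Drop casts older than the hub's short retention window."""
    return [c for c in casts if now - c.timestamp < HUB_TTL_SECONDS]
```

With a real proof system, the hub would verify a proof of the `len(valid_likes) >= min_likes` statement instead of seeing the likes themselves.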
What I'm advocating for is a hybrid system of Farcaster-esque GossipSub Hubs and Bluesky AT Proto XRPC Personal Data Servers. Convergent evolution is good! Casts should live on both. Hubs are really good for making sure everyone sees the same set of casts eventually, and PDSs might be great for light/mobile clients that want to efficiently find casts from users they care about by going straight to the source.
In theory, these protocols could interoperate. Ethereum accounts are already a type of Decentralized Identifier ('DiD'), which is the root identity primitive on Bluesky. Some of the schemas for posts and likes may need some work to promote interop, but I think it would be worth it.
Timestamping
This section is more of an addendum that I couldn't fit logically with the content above, but it's still on my mind.
Currently, neither Farcaster nor Bluesky has a really solid solution for timestamping when a cast actually occurred. On Farcaster, the CRDT creates a sort of 'Merkle tree across time' where each subtree contains casts from a specific time window (say, one hour), and hubs come to a consensus on whether a cast should go into the previous time bin or the next. Bluesky is even worse and relies on PDSs to self-publish a timestamp, similar to any other webpage. The suggestion is to 'just reject crazy values', which is not great for microblogging, where content can be very time-sensitive and the real order of events is worth caring about.
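A tiny sketch shows why both approaches are weak. It assumes one-hour bins as in the example above and a made-up skew tolerance (`MAX_SKEW_SECONDS`); the function names are hypothetical, and this is not the actual sync code of either protocol.

```python
BIN_SECONDS = 3600       # one-hour window, as in the example above
MAX_SKEW_SECONDS = 600   # hypothetical tolerance for "crazy" future values


def time_bin(timestamp: int) -> int:
    """Index of the time window a self-reported timestamp falls into."""
    return timestamp // BIN_SECONDS


def accept_timestamp(claimed: int, local_now: int,
                     max_skew: int = MAX_SKEW_SECONDS) -> bool:
    """'Just reject crazy values': refuse timestamps too far in the future."""
    return claimed <= local_now + max_skew


now = 1_700_000_000
backdated = now - 24 * 3600  # the author claims the cast is a day old
```

A timestamp one second apart can land in different bins (`time_bin(3599)` versus `time_bin(3600)`), and a backdated value like `backdated` passes the sanity check, because nothing ties the claimed time to when the cast was really published.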
The best solution here is Stuart Haber and W. Scott Stornetta's 1991 paper, How to Time-Stamp a Digital Document, where they introduced the idea of a chain of blocks, a blockchain if you will... Perhaps Farcaster could launch this blockchain thingamajigger, with a token economic incentive for maintaining the chain that timestamps casts. Anyway, don't say I didn't tell you so. 😏
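For readers who haven't seen the linking idea, here is a heavily simplified sketch of a hash-linked timestamp log. It is my own toy version, not the paper's exact scheme: there is no signing, no timestamping service, and no incentive layer, and the helper names (`link_hash`, `append`, `verify`) are made up.

```python
import hashlib
import json
from typing import Dict, List

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def link_hash(prev_hash: str, doc_digest: str, timestamp: int) -> str:
    """Hash that commits to the previous entry, the document, and the time."""
    payload = json.dumps([prev_hash, doc_digest, timestamp]).encode()
    return hashlib.sha256(payload).hexdigest()


def append(chain: List[Dict], document: bytes, timestamp: int) -> Dict:
    """Add a document to the log, linking it to the entry before it."""
    prev = chain[-1]["hash"] if chain else GENESIS
    digest = hashlib.sha256(document).hexdigest()
    entry = {"prev": prev, "digest": digest, "timestamp": timestamp,
             "hash": link_hash(prev, digest, timestamp)}
    chain.append(entry)
    return entry


def verify(chain: List[Dict]) -> bool:
    """Recompute every link; any edited entry breaks the chain from there on."""
    prev = GENESIS
    for e in chain:
        if e["prev"] != prev:
            return False
        if e["hash"] != link_hash(prev, e["digest"], e["timestamp"]):
            return False
        prev = e["hash"]
    return True


chain: List[Dict] = []
append(chain, b"gm", 1_700_000_000)
append(chain, b"breaking news", 1_700_000_060)
```

Because each entry commits to the one before it, backdating a cast means rewriting every later entry too, which is exactly the property self-reported timestamps lack.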
Prior Art
I recommend this article from Bluesky engineer Paul Frazee, where he explains they might want to use some sort of onchain registry for DiDs (similar to point #2 from before) but he also goes into why he thinks pure p2p doesn't work: https://www.pfrazee.com/blog/why-not-p2p