Network Science-Based “Fingerprints” in Gitcoin Grants
In our previous article in BlockScience’s ongoing analysis of the Gitcoin Grants ecosystem, we briefly looked at Quadratic Funding and the challenges faced by this resource allocation policy in providing fair allotments of matching funds to community grants. In this article, we will dive deeper into using cadCAD to identify specific instances of potentially collusive or fraudulent behavior and discuss the ramifications of policies aimed at reducing exploitive behavior in the Gitcoin Grants ecosystem. Article by Danilo Lessa Bernardineli, Michael Zargham & Jeff Emmett.
Before we Begin…
As a research organization dedicated to #OpenScience, we intend to approach the data analysis of the Gitcoin Grants ecosystem with rigor, and present the results to our audience as we work. For this reason, you will notice that this article follows the format below:
3. Methodology: data & model
4. Analysis of data
5. Interpretation & discussion
Another important thing to establish before we dive in is a definition of “collusion” and “fraud”, since these are among the Gitcoin team’s primary concerns in ensuring a fair and transparent public goods funding platform. According to Buterin, Hitzig & Weyl’s academic paper on CLR, collusion is defined as “multiple agents acting in their mutual interest to the detriment of other participants”, and fraud is defined as “a single citizen misrepresenting herself as many.” These definitions are analytically tricky: because of the saturation effect in grant rounds, they implicate almost all community-level mobilization in the Gitcoin Grants system as collusion, since when communities share a limited pool of sponsor funds, any gain for one community comes at the expense of the others. Wikipedia, by contrast, defines collusion as a “secret or illegal cooperation or conspiracy, especially in order to cheat or deceive others”, which better suits our purposes because it allows us to account for intent in our analysis.
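To make the saturation effect concrete, here is a minimal Python sketch assuming plain quadratic funding: each grant's ideal match is the square of the sum of the square roots of its contributions, minus the contributions themselves, and all matches are scaled down proportionally whenever they exceed the pool. Gitcoin's production matching adds further adjustments that are not modelled here, and the grant names, amounts and pool size are made up for illustration.

```python
from math import sqrt


def ideal_match(contributions):
    """Unconstrained quadratic-funding subsidy for one grant:
    (sum of square roots)^2 minus the sum of contributions."""
    return sum(sqrt(c) for c in contributions) ** 2 - sum(contributions)


def saturated_matches(grants, pool):
    """Scale every ideal match down proportionally when the total exceeds the pool."""
    ideal = {g: ideal_match(cs) for g, cs in grants.items()}
    total = sum(ideal.values())
    scale = min(1.0, pool / total) if total > 0 else 0.0
    return {g: m * scale for g, m in ideal.items()}


pool = 20.0
before = {"A": [1.0] * 4, "B": [1.0] * 4}
after = {"A": [1.0] * 8, "B": [1.0] * 4}  # community A mobilizes four more donors

print({g: round(m, 2) for g, m in saturated_matches(before, pool).items()})
print({g: round(m, 2) for g, m in saturated_matches(after, pool).items()})
```

With a pool of 20, grants A and B each receive 10.0 before the mobilization. Once A's community brings in four more donors, A's match rises to about 16.47 while B's falls to about 3.53, even though nothing about B's own support changed.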
Through the course of this article, we will begin to examine the collaboration patterns exhibited by various communities participating in Gitcoin Grants, to see if we can identify colluding or fraudulent behavior and how it can be mitigated.
There is an underlying “shape” to the communities that use Gitcoin Grants. Subjectively, this shape can take several forms:
- Dense communities, like the Commons Stack / Token Engineering cluster, where a large number of correlated projects share overlapping contributors
- Highly integrated communities, like the Ethereum Foundation, with high connectivity but without tight clustering
- Isolated communities, like DAppChaser and other regional projects, where a large number of contributors support projects with limited connectivity to other grants
Our hypothesis in this article is that we can interpret these shapes as a “fingerprint” to pattern-match organic community behaviors to better identify exploitive behavior and reduce the ability to siphon away matching funds from the Gitcoin community at large. But before we are able to identify the network structure of a colluding attack, we first must understand the structure of the Gitcoin community. This shape can be detected quantitatively through the usage of network science techniques, like community detection algorithms, which we will explore further in the next section.
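One way to turn these subjective shapes into numbers is a small per-community fingerprint. The sketch below uses our own illustrative choice of metrics, not a method taken from the cadCAD model: bipartite density (how many of the possible contributor-to-grant links actually exist) as a rough proxy for dense clustering, and the share of a community's contributor dollars that flows to grants outside it as a proxy for integration versus isolation. The `kind` node attribute, the `usd` edge attribute and the helper name are hypothetical.

```python
import networkx as nx


def community_fingerprint(G, community):
    """Summarize a grant/contributor community with a few shape metrics.

    Assumes a bipartite graph where each node has a hypothetical 'kind'
    attribute ('grant' or 'contributor') and edges carry a 'usd' weight.
    """
    members = set(community)
    grants = [n for n in members if G.nodes[n]["kind"] == "grant"]
    contributors = [n for n in members if G.nodes[n]["kind"] == "contributor"]

    # Bipartite density: share of possible contributor-grant links present.
    internal_edges = G.subgraph(members).number_of_edges()
    possible_edges = len(grants) * len(contributors)
    density = internal_edges / possible_edges if possible_edges else 0.0

    # Share of the community's contributor USD that goes to outside grants.
    total_usd = external_usd = 0.0
    for c in contributors:
        for grant, data in G[c].items():
            total_usd += data["usd"]
            if grant not in members:
                external_usd += data["usd"]
    external_share = external_usd / total_usd if total_usd else 0.0

    return {
        "grants": len(grants),
        "contributors": len(contributors),
        "density": density,
        "external_share": external_share,
    }


# Toy graph: c2 also gives to g3, which sits outside the community below.
G = nx.Graph()
G.add_nodes_from(["g1", "g2", "g3"], kind="grant")
G.add_nodes_from(["c1", "c2", "c3"], kind="contributor")
G.add_edge("c1", "g1", usd=10.0)
G.add_edge("c1", "g2", usd=10.0)
G.add_edge("c2", "g1", usd=5.0)
G.add_edge("c2", "g2", usd=5.0)
G.add_edge("c2", "g3", usd=10.0)
G.add_edge("c3", "g3", usd=20.0)

print(community_fingerprint(G, {"g1", "g2", "c1", "c2"}))
# {'grants': 2, 'contributors': 2, 'density': 1.0, 'external_share': 0.25}
```

Under these assumptions, a dense community would show high density, an integrated one a high external share, and an isolated one a low external share.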
In our analysis we make use of several different tools and assumptions, which we will list here given the subjectivity they impose on our analysis.
- Currency aggregation
This analysis compresses all contributions to their USD value equivalent regardless of the currency that donations were made in. This is a simplifying assumption that follows how Gitcoin’s fund matching policies operate.
- Excluding matching funds
This report does not include rigorous analysis of matching funds via the Gitcoin Quadratic Funding algorithm, but rather focuses on the structure of the Gitcoin Grants network. Subsequent analyses can dive deeper into effects of matching algorithms.
- Community detection algorithm: Fluid Communities
Fluid Communities is a propagation-based algorithm capable of identifying a variable (user-specified) number of communities in a network. It is based on the idea of fluids interacting in an environment, expanding and contracting as a result of that interaction. Fluid Communities is able to find communities in synthetic graphs with an accuracy close to the current best alternatives. We will refer to these communities interchangeably as ‘subgraphs’ in this article.
- Choice of community cut size: 5
We chose to cut into 5 sub-communities: you always need to start somewhere, and 5 is few enough to still get a sense of the big picture and a large enough number to make the structure interpretable. Keep in mind that these 5 subgraphs have cross links so they can be pasted back together to form the full contribution graph.
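As a sketch of how this step could be run, NetworkX ships an implementation of Fluid Communities as `asyn_fluidc(G, k, max_iter=100, seed=None)`. It only accepts a connected, undirected graph, so the hypothetical helper below runs it on the largest connected component and reports the remaining nodes separately. The toy Zachary karate club graph stands in for the contribution graph, and we have not checked how the cadCAD model itself handles disconnected components.

```python
import networkx as nx
from networkx.algorithms.community import asyn_fluidc


def fluid_communities(G, k=5, seed=0):
    """Split G into (at most) k communities with Fluid Communities.

    asyn_fluidc rejects disconnected graphs, so this runs it on the largest
    connected component and returns any other nodes as leftovers.
    """
    largest = max(nx.connected_components(G), key=len)
    core = G.subgraph(largest).copy()
    communities = [set(c) for c in asyn_fluidc(core, k, seed=seed)]
    leftovers = set(G) - largest
    return communities, leftovers


G = nx.karate_club_graph()
G.add_edge("x1", "x2")  # a small piece disconnected from the main component

communities, leftovers = fluid_communities(G, k=5, seed=42)
print([len(c) for c in communities], sorted(leftovers))
```

Because Fluid Communities starts from randomly placed seed nodes, fixing `seed` makes a run reproducible, but different seeds can produce different partitions.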
Although we simply picked a community detection algorithm and a number of subgraphs for the purpose of rapid exploratory analysis, a goal of our ongoing research is to inspect more rigorously the sensitivity of our results to hyperparameters such as community cut size, and to select algorithms through a mix of measures like modularity maximization, domain-specific supervised learning, and percolation analysis of the clustering coefficient. In keeping with the practice of reflexivity, we will revisit these choices in the future to determine whether changes in algorithms or parameters affect our conclusions.
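A first pass at the sensitivity question could look like the sketch below, which assumes modularity as the quality measure: for each candidate cut size `k`, it runs Fluid Communities with several seeds and averages the modularity of the resulting partitions. The helper name, the number of seeds and the toy graph are illustrative choices, not part of the model.

```python
import networkx as nx
from networkx.algorithms.community import asyn_fluidc, modularity


def modularity_by_k(G, ks, seeds=range(5)):
    """For each k, run Fluid Communities with several seeds and return the
    mean modularity of the resulting partitions. G must be connected."""
    results = {}
    for k in ks:
        scores = []
        for seed in seeds:
            parts = [set(c) for c in asyn_fluidc(G, k, seed=seed)]
            scores.append(modularity(G, parts))
        results[k] = sum(scores) / len(scores)
    return results


G = nx.karate_club_graph()
for k, q in modularity_by_k(G, ks=range(2, 8)).items():
    print(f"k={k}: mean modularity {q:.3f}")
```

Averaging over seeds matters because a single Fluid Communities run can land on a different partition; a `k` whose score is stable across seeds is a more defensible choice than one that wins on a single run.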
Methodology: Data & Model
For this analysis, we use the Gitcoin Grants round 8 donation data (before matching, as matching does not affect the shape of the Gitcoin Grants network structure). We feed this data into the Gitcoin cadCAD model to perform our data analysis. Through this model we are able to approximate the final matches, up to any interventions by the Gitcoin team to mitigate attacks.
Using the cadCAD model, we generate a graph representing the interconnections between grants and donors in the Gitcoin Grants contributions network. We then apply the Fluid Communities algorithm to this NetworkX Graph object, which returns a list of the detected communities. These communities are composed of both grants and contributors as nodes, with the contribution flows between them as edges.
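The graph-building step can be sketched as follows, assuming donation records arrive as `(contributor, grant, usd)` tuples already converted to USD, in line with the currency aggregation assumption above. This is not the cadCAD model's own code; the helper name, the node-id prefixes and the `kind`/`usd` attributes are hypothetical. The graph is undirected because `asyn_fluidc` does not accept directed graphs, so the contributor-to-grant direction is implied by node kind rather than stored on the edge.

```python
from collections import defaultdict

import networkx as nx


def build_contribution_graph(donations):
    """Build an undirected bipartite grant/contributor graph.

    `donations` is an iterable of (contributor, grant, usd) tuples, with
    every amount already converted to USD. Repeated donations from the same
    contributor to the same grant are summed into a single edge.
    """
    totals = defaultdict(float)
    for contributor, grant, usd in donations:
        totals[(contributor, grant)] += usd

    G = nx.Graph()
    for (contributor, grant), usd in totals.items():
        # Prefix node ids so a contributor and a grant can never collide.
        c, g = f"contributor:{contributor}", f"grant:{grant}"
        G.add_node(c, kind="contributor")
        G.add_node(g, kind="grant")
        G.add_edge(c, g, usd=usd)
    return G


# Hypothetical toy records, not Gitcoin data.
donations = [
    ("alice", "grant_1", 10.0),
    ("alice", "grant_1", 5.0),
    ("alice", "grant_2", 2.0),
    ("bob", "grant_2", 3.0),
]
G = build_contribution_graph(donations)
print(G.number_of_nodes(), G.number_of_edges())  # 4 3
print(G["contributor:alice"]["grant:grant_1"]["usd"])  # 15.0
```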
Analysis of Data
Visualizations of the Gitcoin Grants collaboration subgraphs can be seen below. In these graphs, grants are shown as blue nodes and contributors as orange nodes. The size of each grant node is determined by the total contributions it receives from all donors, and the size of each contributor node by the total amount they contribute to all grants. The edges between nodes denote donations flowing from contributors to grants.
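This visual encoding can be reproduced with a weighted degree: summing the `usd` weight over a grant's edges gives the total it received, and over a contributor's edges the total they gave. The linear size scaling, the `base`/`scale` parameters and the attribute names are assumptions for illustration; the original figures may have used a different scaling.

```python
import networkx as nx


def node_styles(G, base=20.0, scale=5.0):
    """Return per-node sizes and colours for plotting.

    Size grows linearly with the total USD on a node's edges (received for
    grants, given for contributors). Grants are blue, contributors orange.
    """
    usd = dict(G.degree(weight="usd"))
    sizes = [base + scale * usd[n] for n in G]
    colors = [
        "tab:blue" if G.nodes[n]["kind"] == "grant" else "tab:orange" for n in G
    ]
    return sizes, colors


G = nx.Graph()
G.add_node("g1", kind="grant")
G.add_node("c1", kind="contributor")
G.add_node("c2", kind="contributor")
G.add_edge("c1", "g1", usd=10.0)
G.add_edge("c2", "g1", usd=2.0)

sizes, colors = node_styles(G)
print(sizes)   # [80.0, 70.0, 30.0]
print(colors)  # ['tab:blue', 'tab:orange', 'tab:orange']
```

The two lists can be passed directly to `nx.draw(G, node_size=sizes, node_color=colors)` when matplotlib is installed.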
Digging into the data, we chose to analyze three (of many) different community “fingerprints” to better understand the interaction patterns of different groups and how we might identify collusive or unfair behavior in Gitcoin Grants. cadCAD and MetaGame were selected because we have first-hand experience with these communities, which improves our ability to interpret the data with social context. DAppChaser was selected to investigate interesting patterns that emerged during this analysis.
The cadCAD Subgraph
This subgraph has an organic appearance, with a tight collaboration cluster around the ecosystem of cadCAD-related grants, plus connections to several projects across the ecosystem.
A few statistics about the cadCAD subgraph:
Total subgraph grants: 110
Total subgraph collaborators: 888
Total USD value inside the subgraph: 41,047.76 USD
Top 5 grants in this subgraph:
1. The Gitcoin Open Source Support Fund: 10,333.02 USD
2. Commons Stack Community Fund — Panvala League: 2,628.53 USD
3. Wallkanda: 2,510.39 USD
4. Autonio: 2,500.89 USD
5. DistributedTown: 2,500.89 USD
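Statistics like these can be computed with a short helper such as the one below. We are assuming that “Total USD value inside the subgraph” counts only donations where both the contributor and the grant belong to the community, and that grants are ranked by the USD they receive from inside it; the article does not spell out either definition, so treat this as a hypothetical reading. The node and edge attribute names are also assumptions.

```python
import networkx as nx


def subgraph_summary(G, community, top=5):
    """Count grants and contributors in a community, total the USD on edges
    inside it, and rank its grants by USD received from inside it."""
    H = G.subgraph(community)
    grants = [n for n in H if H.nodes[n]["kind"] == "grant"]
    contributors = [n for n in H if H.nodes[n]["kind"] == "contributor"]
    total_usd = H.size(weight="usd")  # sum of 'usd' over internal edges
    received = {g: H.degree(g, weight="usd") for g in grants}
    top_grants = sorted(received.items(), key=lambda kv: kv[1], reverse=True)[:top]
    return {
        "grants": len(grants),
        "contributors": len(contributors),
        "total_usd": total_usd,
        "top_grants": top_grants,
    }


G = nx.Graph()
G.add_nodes_from(["g1", "g2"], kind="grant")
G.add_nodes_from(["c1", "c2", "c3"], kind="contributor")
G.add_edge("c1", "g1", usd=7.0)
G.add_edge("c2", "g1", usd=3.0)
G.add_edge("c2", "g2", usd=4.0)
G.add_edge("c3", "g2", usd=100.0)  # c3 sits outside the community below

print(subgraph_summary(G, {"g1", "g2", "c1", "c2"}))
# {'grants': 2, 'contributors': 2, 'total_usd': 14.0, 'top_grants': [('g1', 10.0), ('g2', 4.0)]}
```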
The MetaGame Subgraph
This subgraph also demonstrates an organic structure, although more dispersed, with a lot of collaboration connectivity but no dense clustering.
A few statistics about the MetaGame subgraph:
Total grants: 171
Total collaborators: 825
Total USD value inside the subgraph: 14,859.97 USD
Top 5 grants in this subgraph:
1. Scribble Specifications and Runtime Verification Fund: 2,958.19 USD
2. BeyondNFT: 2,616.18 USD
3. Rekt- The dark web of DeFi journalism: 2,499.94 USD
4. Unitimes-Ethereum developer community platform: 1,552.87 USD
5. vfat.tools Yield Farming Calculators: 1,411.17 USD
The DAppChaser Subgraph
Compared with other subgraphs, this network neighborhood looks a bit different. The network does not look as organic as our previous examples, and there is little connectivity between donors in this ecosystem and other Gitcoin grants, suggesting less ecosystem collaboration.
A few statistics about the DAppChaser subgraph:
Total grants: 85
Total collaborators: 966
Total USD value inside the subgraph: 26,243.74 USD
Top 5 grants in this subgraph:
1. KERNEL Block II [Panvala League]: 11,755.68 USD
2. EU Crypto Initiative — educating policy makers: 9,866.64 USD
3. Catnip: 3,038.81 USD
4. Meme of the Day: 2,516.04 USD
5. COVID-19 Moment || Food Bag Delivery for families: 2,118.16 USD
At first glance, this subgraph could offer a few fingerprint patterns that might suggest collusive or fraudulent behavior: large numbers of donors for individual grants, with sparse connections to other grants in the Gitcoin network. But on closer inspection, these could also be well-supported communities in new markets (in one case, China) that are only beginning to interact with Gitcoin’s public goods funding ecosystem.
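The pattern described above can be expressed as a simple grant-level heuristic: flag grants that have many donors, most of whom give to no other grant. The sketch below is hypothetical and is not Gitcoin's flagging algorithm; the thresholds (`min_donors`, `min_exclusive_share`) are uncalibrated placeholders, and the `kind` attribute is an assumed name. A flag like this should only prompt human review, because a well-supported community in a new market can produce exactly the same shape.

```python
import networkx as nx


def flag_isolated_grants(G, min_donors=50, min_exclusive_share=0.8):
    """Flag grants with many donors, most of whom give to no other grant.

    A donor counts as 'exclusive' when this grant is their only edge.
    Thresholds are illustrative placeholders, not calibrated values.
    """
    flagged = []
    for g, data in G.nodes(data=True):
        if data["kind"] != "grant":
            continue
        donors = list(G[g])
        if len(donors) < min_donors:
            continue
        exclusive = sum(1 for d in donors if G.degree(d) == 1)
        share = exclusive / len(donors)
        if share >= min_exclusive_share:
            flagged.append((g, len(donors), share))
    return flagged


# Toy graph: g_iso's donors give nowhere else; g_mixed's donors also give to g_other.
G = nx.Graph()
G.add_nodes_from(["g_iso", "g_mixed", "g_other"], kind="grant")
for i in range(60):
    G.add_node(f"a{i}", kind="contributor")
    G.add_edge(f"a{i}", "g_iso")
    G.add_node(f"b{i}", kind="contributor")
    G.add_edge(f"b{i}", "g_mixed")
    G.add_edge(f"b{i}", "g_other")

print(flag_isolated_grants(G))  # [('g_iso', 60, 1.0)]
```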
These distinctions can be made clearer thanks to BrightID, a proof-of-identity system used to provide a bonus in matching funds when contributors are uniquely identified. The BrightID team deserves credit for raising the flag on this particular pattern, and for their work determining the unique identities associated with the grant. Given that Gitcoin specifically created a grant pool for projects from East Asia, attracting more users from these locations is clearly a system goal. So while we need to be wary of collusion, we also need to make sure detection algorithms are not tuned so aggressively that they deter newcomers in different markets from joining future Gitcoin Grants rounds.
Interpretation & Discussion
In our analysis above, we can see certain patterns in the data that give us insight into where to dig for more information. In the DAppChaser cluster pattern, an algorithm could flag potentially collusive behavior, but the question may not be resolvable by an algorithm alone. It may be necessary to build data collection and community governance tools to decide how to craft policy around collusive behavior and, in the event of a dispute, how to interpret collusion policy on a grant-by-grant basis. Ultimately, we are interested in how Gitcoin’s system elevates relevant data about these behaviors to the governance layer of the system for community dispute resolution.
While this fingerprint could serve as a pattern for potentially collusive behavior, we need to be careful that our algorithms aren’t naive and don’t mistake new entrants for a community of colluders, resulting in unjust punishment. If our algorithms are too aggressive, we risk excluding entrants from new markets, which is not healthy for the long-term growth of the Grants ecosystem either.
Our aim with this analysis is not to distinguish between “right” and “wrong” policy choices in the Gitcoin ecosystem, but rather to explore the subjective choices of algorithmic policy design using data-driven analysis. In the discussion above, it becomes clear that there is no objective “correct policy” that will address all the challenges in Gitcoin Grants matching. Therefore, there must be subjective choices regarding what trade-offs should be made to preserve fairness as defined by Gitcoin, taking into account the norms and practices of their community and their desire for progressive decentralization. This analysis aims to clarify those trade-offs and provide decision support to the Gitcoin team and community in making policy choices in pursuit of their goals.
We commend Gitcoin on their use of a flagging algorithm that recognizes ranges of potentially collusive behavior to direct team attention appropriately, and we’re interested in how that algorithm can be further fine-tuned. Given that “collusion” in Gitcoin Grants is more of a spectrum than a discrete incident, we recommend keeping humans in the loop on collusion analysis and decision making.
While there are few hard answers when dealing with subjective measures like “fairness”, there are some very good questions we can ask to help us home in on what we consider exploitative behavior in the Gitcoin Grants system, and make policy choices accordingly:
1. How much funding is allocated from more organic communities towards more collusive contribution patterns?
2. Examining current Gitcoin Grant policies, how much money would grants get with or without each policy? (Some work is underway on this already)
3. What are the early warning signs of collusion, and how do we escalate those signals to the governance layer of the system?
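Question 2 lends itself to a counterfactual calculation: compute each grant's match with and without a policy, and compare. The sketch below assumes the same simplified capital-constrained quadratic-funding rule as a stand-in for Gitcoin's actual algorithm, and models a hypothetical policy that drops contributions from a set of flagged contributors; the grant names, contributor ids, amounts and pool size are made up.

```python
from math import sqrt


def matches(grants, pool):
    """Simplified quadratic funding, scaled down proportionally to fit the pool.

    `grants` maps grant -> list of (contributor, usd) pairs.
    """
    ideal = {
        g: sum(sqrt(c) for _, c in cs) ** 2 - sum(c for _, c in cs)
        for g, cs in grants.items()
    }
    total = sum(ideal.values())
    scale = min(1.0, pool / total) if total > 0 else 0.0
    return {g: m * scale for g, m in ideal.items()}


def policy_delta(grants, pool, excluded):
    """Match per grant with and without a policy dropping `excluded` contributors."""
    filtered = {
        g: [(who, c) for who, c in cs if who not in excluded]
        for g, cs in grants.items()
    }
    base, policy = matches(grants, pool), matches(filtered, pool)
    return {g: (base[g], policy[g], policy[g] - base[g]) for g in grants}


grants = {
    "A": [("a1", 1.0), ("a2", 1.0), ("s1", 1.0), ("s2", 1.0)],
    "B": [("b1", 1.0), ("b2", 1.0), ("b3", 1.0)],
}
for g, (before, after, delta) in policy_delta(grants, 10.0, {"s1", "s2"}).items():
    print(f"{g}: {before:.2f} -> {after:.2f} ({delta:+.2f})")
# A: 6.67 -> 2.00 (-4.67)
# B: 3.33 -> 6.00 (+2.67)
```

With these toy numbers, excluding the two flagged contributors cuts grant A's match and raises B's, which is the kind of reallocation question 1 asks about. The filtered round also no longer saturates the pool, so under this simple rule part of the pool goes unspent; how a real policy redistributes that remainder is itself a design choice.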
Given the upside of solving these problems for funding public goods, and the novelty of the Gitcoin Grants dataset, we are keen to unravel more insights from this data to help build more collusion-resistant systems in the next year.
Happy Holidays! 🎄🎅🎉
We encourage the Gitcoin community to explore and experiment with the Gitcoin cadCAD model repository, where you can access much of the data explored in this analysis. You can even play with interactive graphs:
GitHub Repository: https://github.com/gitcoinco/gitcoin_cadcad_model
Stay tuned for our next research digest in the new year.