paint-brush
Behind the Scenes: Data Collection and Statistical Tools for NFT Bias Studyby@cryptopunkforum

Behind the Scenes: Data Collection and Statistical Tools for NFT Bias Study

by CryptoPunk ForumJanuary 25th, 2025
Read on Terminal Reader
Read this story w/o Javascript
tldt arrow

Too Long; Didn't Read

This section explains the technical details behind analyzing NFT pricing biases. Data from OpenSea was compiled and filtered to identify gender- and race-labeled collections. Statistical methods, including paired and unpaired t-tests with log transformations and outlier trimming, were applied to ensure robust analysis. Tools like SciPy and PySpark facilitated data processing and hypothesis testing.
featured image - Behind the Scenes: Data Collection and Statistical Tools for NFT Bias Study
CryptoPunk Forum HackerNoon profile picture
0-item

Authors:

(1) Howard Zhong, MIT CSAIL ([email protected]);

(2) Mark Hamilton, MIT CSAIL, Microsoft ([email protected]).

Abstract and 1 Introduction

2 Methodology

3 Results

3.1 CryptoPunks

3.2 Aggregate NFT Market

4 Discussion

5. Conclusion/ Acknowledgements/ References


A Appendix

A.1 Implementation Details

A.2 Detailed NFT Information & A.3 Google NFT Searches Map

A.1 Implementation Details

We elaborate on the methods and the implementation details. This sections follows the structure Section 2.

A.1.1 Initial Data Collection

Our dataset only includes NFTs transacted on OpenSea, which is the primary marketplace for NFTs on the Ethereum blockchain. We query the OpenSea “v1/collections” endpoint at the end of November 2022 [15] to retrieve collection metadata as well as each individual NFT’s metadata and last sale price. We choose 790 collection names from the Kaggle Ethereum NFTs dataset [11] and NFTs from the top 30-day and all-time OpenSea volume leaderboard around November 2021. Most collections have around 5,000 to 10,000 items, and our queries yield around 70-80% of the data as the other 20-30% do not have transactions.


After querying the Application Programming Interface (API) using a script written with the PySpark framework [17] on Databricks, we create a dataframe containing each NFT’s collection name, id, sale information, metadata, and image url. We then loop through this dataframe with multiple workers to download the images in parallel. In the end, we obtain a dataset of about ∼ 2.5 NFTs, each that have been transacted upon.

A.1.2 Retrieving Race and Gender Labels

To get gender-labelled data, we select collections that have metadata with the words “male” and “female” to find collections where at least 50% of the images have gender labels. We filter out collections that have more than 75% male or female to avoid rarity factors affecting the price. We find a total of 44 such collections with gender labels that satisfy the criteria.


These collections with gender labels are


It is more challenging to determine the race of an NFT collection because most collections do not directly annotate this information. For the CryptoPunks collection, we label each NFT based on the frequencies of the four different pixel values representing the four different skin tones as displayed in Figure 6. The above process was inspired by [5]. For our analysis, we group darkest and mid as Dark and lighter and lightest as Light. For other collections, we were unable to retrieve race labels from the metadata.


Figure 6: The 4 Different Shades of CryptoPunks. From left to right, the shades are darkest, mid, lighter, and lightest. We group darkest and mid as Dark and lighter and lightest as Light.


We also attempted to find race labels for a subset of the 44 collections with gender labels with the criteria that there were a high proportion of items that could be categorized as either dark or light skinned. We filtered out collections that have more than 75% light-skinned or dark-skinned to avoid rarity factors affecting the price. This required manual review because most collections do not have easy-to-search race labels in the metadata, or even way to categorize race. For Avastar, we categorized “Amber Brown” as dark-skinned and “Pale Pink” as light-skinned in the “skin tone” attribute. For Dynamic Duelers, we categorized “Black” as dark-skinned and “White” as lightskinned in the “Skin color” attribute.

A.1.3 Statistical Tools to Analyze Bias in Gender and Race

We detail the statistical methods used to analyze the bias in NFT pricing for gender and race. Specifically, we go into more detail about the implementation and underlying assumptions of these tests.


Unpaired t-test assumes


  1. independence of observations and each point belongs to either male or female


  2. the variances of male and female prices are equal for the population


  3. the prices follows a normal distribution


  4. no significant outliers in each group


The unpaired t-test is robust to all but large deviation from these assumptions. Assumption 1 and 2 are satisfied, but 3 and 4 require more careful consideration.


Log Transformation: For assumption 3, with rare NFTs worth significantly more than common NFTs, NFT price distributions tend to follow a power law distribution [14]. Because it’s well known that stock prices follow a log-normal distribution [1], we apply the same transformation and found log of price tended to be closer to a normal distribution. We thus decided to also run the t-test on the log of prices, which we refer to as log t-test.


Outlier Trimming: For assumption 4, as outliers may occur due to very high selling prices for rare NFTs or very low selling prices due to humans errors while listing, we address this issue by Winsorization [19], or trimming outliers past a certain percentile. Specifically, we report t-test results while trimming both left and right tails of the distribution at 0.1%, 1%, 2.5%, and 5% percentiles.


Paired t-test: In addition to the unpaired t-test for independent samples, we also run a paired t-test for two dependent samples in order to better isolate the male versus female price difference while fixing time. Specifically, we calculate t-statistics on the paired difference of male and female prices marked to the daily mean price and the weekly mean price of the NFT. Assumptions 1 and 2 are now (1) male - female price must be continuous and (2) observations are independent of one another. Assumptions 3 and 4 from unpaired test remain the same and. We address assumptions 3 and 4 with the log-transformation and Winsorization, as described earlier.


The approach described above is also used to compare the prices of lighter-skinned and darkerskinned CryptoPunks. We utilize the SciPy Python package [21] to implement both types of t-tests.


This paper is available on arxiv under CC BY 4.0 DEED license.