I am not a security expert, but I do love Clean Code and Code Smells.
TL;DR: don't trust your hashes.
Yesterday, October 7th, 2022, one of the larger blockchains had to be halted. This news was shocking since most blockchains are decentralized by definition. Halting a large blockchain is not common news. However, this is not the first time it has happened.
I pay attention to blockchain and security news even though it is far from my comfort zone when writing technical articles.
However, I've written more than 180 code smells and refactorings. From experience, you learn that there's always an unspoken tension between doing things the right, clean way and optimizing for performance.
Blockchains should be fast.
However, many vulnerabilities are related to cryptic and optimized code. Some of the code used in blockchains would be unacceptable in many large mission-critical systems and codebases.
As performance and security are the main drivers on Web3, blockchain and contract code usually has ample room to be exploited.
Clean code is not so easily exploitable.
I've read a lot of forensic analysis on the problem. One of the best explanations is here:
This tweet has a lot of resources for research.
I will address its main ideas:
What does matter is that due to the way that hash functions are intended to work, we can basically say with certainty that any (path, nleaf) pair will produce a unique hash. If we want to forge a proof, those will need to stay the same
In summary, there was a bug in the way that the Binance Bridge verified proofs which could have allowed attackers to forge arbitrary messages. Fortunately, the attacker here only forged two messages, but the damage could have been far worse
TL;DR: A hash function was exploited.
I've been using hashing functions for decades (not on blockchains, of course). There's been a lot of research on the math behind hashing functions. We teach our students at the university about hash collisions and how we design math functions to avoid them.
We also teach them some corollaries:
Two objects with the same hash might not be the same.
If we override an object's equality, we need to also override the hash.
The last one is very important for hashed collections.
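The two corollaries above can be seen in a small Java sketch. The `BrokenPoint`, `FixedPoint`, and `CorollariesDemo` names are hypothetical and exist only for this illustration. The strings "Aa" and "BB" are a well-known pair that collides under Java's `String.hashCode()`.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical class: equality is redefined but hashCode() is not (the smell)
final class BrokenPoint {
    final int x;
    final int y;

    BrokenPoint(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public boolean equals(Object other) {
        if (!(other instanceof BrokenPoint)) {
            return false;
        }
        BrokenPoint that = (BrokenPoint) other;
        return x == that.x && y == that.y;
    }
    // hashCode() is inherited from Object, so equal points get unrelated hashes
}

// Hypothetical class: equality and hash are redefined together
final class FixedPoint {
    final int x;
    final int y;

    FixedPoint(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public boolean equals(Object other) {
        if (!(other instanceof FixedPoint)) {
            return false;
        }
        FixedPoint that = (FixedPoint) other;
        return x == that.x && y == that.y;
    }

    @Override
    public int hashCode() {
        return Objects.hash(x, y);
    }
}

class CorollariesDemo {
    // Corollary 1: same hash, different objects
    static boolean sameHashDifferentObjects() {
        return "Aa".hashCode() == "BB".hashCode() && !"Aa".equals("BB");
    }

    // Corollary 2: a hashed collection looks up by hash first, then by equality
    static boolean setFinds(Object stored, Object probe) {
        Set<Object> set = new HashSet<>();
        set.add(stored);
        return set.contains(probe);
    }

    public static void main(String[] args) {
        System.out.println(sameHashDifferentObjects());
        // Very likely false: the lookup usually lands in a different bucket
        System.out.println(setFinds(new BrokenPoint(1, 2), new BrokenPoint(1, 2)));
        System.out.println(setFinds(new FixedPoint(1, 2), new FixedPoint(1, 2)));
    }
}
```

With `BrokenPoint`, the set will most likely report that an equal point is missing, which is the kind of silent mistake hashed collections make when hash and equality disagree.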
A clean code lesson should be:
Use (fast) hash for fast discard, and use (slow) equality to ensure you are right.
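As a minimal sketch of that lesson, assume a hypothetical `Payload` value object that caches a cheap hash of its bytes. A hash mismatch lets us discard quickly, and a hash match is only a hint that must be confirmed by comparing the full content. This is an illustration in plain Java, not the code of any blockchain.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical value object used only for this illustration
final class Payload {
    private final byte[] content;
    private final int cachedHash;

    Payload(byte[] content) {
        this.content = content.clone();
        this.cachedHash = Arrays.hashCode(this.content);
    }

    @Override
    public int hashCode() {
        return cachedHash;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof Payload)) {
            return false;
        }
        Payload that = (Payload) other;
        // Fast discard: different hashes guarantee different content
        if (cachedHash != that.cachedHash) {
            return false;
        }
        // Slow confirmation: equal hashes guarantee nothing by themselves
        return Arrays.equals(content, that.content);
    }
}

class PayloadDemo {
    public static void main(String[] args) {
        // The bytes of "Aa" and "BB" collide under Arrays.hashCode
        Payload first = new Payload("Aa".getBytes(StandardCharsets.US_ASCII));
        Payload second = new Payload("BB".getBytes(StandardCharsets.US_ASCII));
        System.out.println(first.hashCode() == second.hashCode());
        System.out.println(first.equals(second));
    }
}
```

Trusting the first comparison alone would treat two different payloads as the same one.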
The beautiful image you see as the cover is a PNG image which is a hash itself.
Now, I need to come back to my comfort zone and write this lesson in the standard code smell template I've been using for years. If you like the format, you can read 166 more here:
How to Find the Stinky parts of your Code
Different hashes guarantee that two objects are different. Equal hashes do not guarantee that they are the same.
TL;DR: If you check for the hash, you should also check for equality
public class Person {
    public String name;
    // Public attributes are another smell

    @Override
    public boolean equals(Object anotherObject) {
        // equals must take an Object to override Object.equals
        if (!(anotherObject instanceof Person)) {
            return false;
        }
        return name.equals(((Person) anotherObject).name);
    }

    @Override
    public int hashCode() {
        return (int)(Math.random()*256);
    }
    // This is just an example of non correlation
    // When using HashMaps we can make a mistake
    // and guess the object is not present in the collection
}
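public class Person {
    public String name;
    // Public attributes are another smell

    @Override
    public boolean equals(Object anotherObject) {
        // equals must take an Object to override Object.equals
        if (!(anotherObject instanceof Person)) {
            return false;
        }
        return name.equals(((Person) anotherObject).name);
    }

    @Override
    public int hashCode() {
        return name.hashCode();
    }
    // Hash and equality are now correlated:
    // equal names always produce equal hashes
}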
Many linters have rules for hash and equality redefinition.
With mutation testing, we can seed different objects with the same hash and check our tests.
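One way to seed such objects by hand is a key whose hash always collides. The sketch below is only an assumption about how such a check could look. `CollidingKey` and `CollisionSeedingCheck` are hypothetical names, and the constant `42` is arbitrary.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical test double: every instance has the same hash on purpose
final class CollidingKey {
    final String name;

    CollidingKey(String name) {
        this.name = name;
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof CollidingKey
            && name.equals(((CollidingKey) other).name);
    }

    @Override
    public int hashCode() {
        return 42; // forced collision, legal but slow
    }
}

class CollisionSeedingCheck {
    // Counts entries after inserting two different keys and one repeated key
    static int distinctEntries() {
        Map<CollidingKey, String> people = new HashMap<>();
        people.put(new CollidingKey("Alice"), "first");
        people.put(new CollidingKey("Bob"), "second");
        people.put(new CollidingKey("Alice"), "replaced");
        return people.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctEntries());
    }
}
```

The map still keeps two entries because it confirms every hash match with `equals`. Code that compares only hashes would fail a test like this one.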
Every performance improvement has its drawbacks.
Caches and replications are notable examples.
We can use them, but we must do so carefully.
Also published here.
Code Smell 150 - Equal Comparison
Code Smells are just my opinion.
This will surprise some of your readers, but my primary interest is not with computer security. I am primarily interested in writing software that works as intended.
Wietse Venema