Find the Length of Any String in Solidity

Written by deeppatel | Published 2022/05/31


Why bytes(str).length is not enough for getting the length of a string in Solidity, and understanding the strlen method from contracts of ens.

In the world of JavaScript, finding the length of a string is an easy thing. Just do str.length and that’s all 🤌

But strings are not so friendly to work with in Solidity ❗. In Solidity, a string is stored as a dynamically sized array of UTF-8 encoded bytes, not as an array of characters.

The string type has no length member and no index access.

I was going through Buildspace’s build-polygon-ens project and found the link to StringUtils.sol. I knew that to find the length of a string in Solidity we can convert the string into bytes and take the length of that. So it should have been as easy as doing bytes(str).length; 🤌 but the method in this util file was a bit different:

// SPDX-License-Identifier: MIT
// Source:
// https://github.com/ensdomains/ens-contracts/blob/master/contracts/ethregistrar/StringUtils.sol
pragma solidity >=0.8.4;

library StringUtils {
    /**
     * @dev Returns the length of a given string
     *
     * @param s The string to measure the length of
     * @return The length of the input string
     */
    function strlen(string memory s) internal pure returns (uint256) {
        uint256 len;
        uint256 i = 0;
        uint256 bytelength = bytes(s).length;

        for (len = 0; i < bytelength; len++) {
            bytes1 b = bytes(s)[i];
            if (b < 0x80) {
                i += 1;
            } else if (b < 0xE0) {
                i += 2;
            } else if (b < 0xF0) {
                i += 3;
            } else if (b < 0xF8) {
                i += 4;
            } else if (b < 0xFC) {
                i += 5;
            } else {
                i += 6;
            }
        }
        return len;
    }
}

It had this weird ‘for’ loop that I couldn’t understand.

So, the developer in me googled it 🕵️‍♀️, but all the articles I came across did this to find the length of the string: bytes(str).length; I found some similar code on Stack Overflow, but no one actually explained what is happening inside.

for (len = 0; i < bytelength; len++) {
    bytes1 b = bytes(s)[i];
    if (b < 0x80) {
        i += 1;
    } else if (b < 0xE0) {
        i += 2;
    } else if (b < 0xF0) {
        i += 3;
    } else if (b < 0xF8) {
        i += 4;
    } else if (b < 0xFC) {
        i += 5;
    } else {
        i += 6;
    }
}

After 3 hours of self-exploration I was able to figure it out myself (a little slow, but I did it 🍾).

So I thought I’d write it down so it would be helpful for all the folks like me (not so experienced with bits and bytes 0️⃣1️⃣).

Let’s try to unblock/decode this.

How bytes(str).length works

When we convert a string to bytes, this is what Solidity does:

// if we do bytes("xyz"), Solidity gives us the UTF-8 bytes (shown in hex)
xyz -> 78 79 7a // 78=x, 79=y, 7a=z
ABC -> 41 42 43 // 41=A, 42=B, 43=C
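To try the conversion above on-chain, here is a minimal sketch. The contract name BytesDemo and its function names are made up for illustration and are not part of the ENS code.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity >=0.8.4;

// Hypothetical demo contract (illustration only, not from ens-contracts).
contract BytesDemo {
    // Returns the raw UTF-8 bytes behind a string.
    // For "xyz" this is 0x78797a, and for "ABC" it is 0x414243.
    function toBytes(string memory s) public pure returns (bytes memory) {
        return bytes(s);
    }

    // Returns the number of bytes in the string. This matches the
    // number of characters only when every character is ASCII.
    function byteLength(string memory s) public pure returns (uint256) {
        return bytes(s).length;
    }
}
```

Calling byteLength("xyz") returns 3, which matches the character count because x, y and z each take one byte.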

You can use this website to convert strings to bytes.

As you can see, each character generates 1 byte; that’s why bytes(str).length gives us the length of the string. But some characters generate more than one byte, because Solidity strings are UTF-8 encoded. For example:

€ -> e2 82 ac

The Euro symbol generates 3 bytes.


So if we try to find the length of a string that includes the Euro symbol (€) 🤑, bytes(str).length will not return the correct string length, because 3 bytes are generated for €.
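The mismatch is easy to reproduce. The sketch below uses a made-up contract name (EuroDemo) and relies on the unicode"..." literal prefix, which Solidity requires for non-ASCII characters in string literals since 0.7.0.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity >=0.8.4;

// Hypothetical demo contract (illustration only).
contract EuroDemo {
    // Returns the byte length of the one-character string "€".
    function euroByteLength() public pure returns (uint256) {
        // Non-ASCII characters need the unicode"" prefix.
        string memory s = unicode"€";
        // The Euro sign is encoded as e2 82 ac, so this returns 3,
        // even though the string holds a single character.
        return bytes(s).length;
    }
}
```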


That’s when the ‘for’ loop we've seen above comes to the rescue ⛑️

Let’s iterate over this e2 82 ac bytes array and check what’s happening inside that loop:

for (len = 0; i < bytelength; len++) {
    bytes1 b = bytes(s)[i];
    // b = e2 for first iteration
    if (b < 0x80) {
        i += 1;
    } else if (b < 0xE0) {
        i += 2;
    } else if (b < 0xF0) {
        i += 3;
    } else if (b < 0xF8) {
        i += 4;
    } else if (b < 0xFC) {
        i += 5;
    } else {
        i += 6;
    }
}

For the first iteration b = e2, and the first condition is on the following line:

if(b < 0x80) {
     i += 1;
}

Let's decode this. This condition compares the numeric values of the two bytes, shown here in decimal:

0x80 -> 128
// our b is e2 at the moment, decimal value for e2 = 226
0xe2 -> 226

For regular (ASCII) characters, the decimal value of their byte is < 128; for example, for a it is 97.

So, if we check all the conditions like this:

for (len = 0; i < bytelength; len++) {
    bytes1 b = bytes(s)[i];
    if (b < 0x80) { // 0x80 = 128 => 226 < 128 ❌
        i += 1;
    } else if (b < 0xE0) { // 0xE0 = 224 => 226 < 224 ❌
        i += 2;
    } else if (b < 0xF0) { // 0xF0 = 240 => 226 < 240 ✅
        i += 3;
    }
    ...
}

The third condition matches, so i becomes 3 and len is incremented to 1. The condition in the ‘for’ loop is now 3 < 3, which is false, so the loop ends with len equal to 1.

And that’s it: 1 is the correct length of the string “€”.
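The thresholds in the loop line up with the way UTF-8 marks the first byte of each character: the leading bits say how many bytes the character occupies. As a rough sketch, that decision can be pulled out into its own helper. The library and function names here (Utf8Width, sequenceLength) are invented for illustration; the ENS code keeps the checks inline.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity >=0.8.4;

// Hypothetical helper (illustration only, not from ens-contracts).
library Utf8Width {
    // Given the first byte of a character, returns how many bytes
    // the loop skips for that character.
    function sequenceLength(bytes1 b) internal pure returns (uint256) {
        if (b < 0x80) return 1; // 0xxxxxxx: ASCII, e.g. 0x61 for "a"
        if (b < 0xE0) return 2; // 110xxxxx, e.g. 0xC2 for the cent sign
        if (b < 0xF0) return 3; // 1110xxxx, e.g. 0xE2 for the Euro sign
        if (b < 0xF8) return 4; // 11110xxx, e.g. most emoji
        if (b < 0xFC) return 5; // legacy 5-byte form, not valid in modern UTF-8
        return 6;               // legacy 6-byte form, not valid in modern UTF-8
    }
}
```

Two caveats apply to this sketch and to the original loop alike: the checks assume the string is valid UTF-8 (a stray continuation byte in the range 0x80 to 0xBF would be treated as the start of a 2-byte character), and the result counts Unicode code points, so an emoji built from several code points counts as more than 1.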

If you want to try some more strings like “€”, here is a small list of characters that occupy more than 1 byte:

€ -> e2 82 ac
à -> c3 83
¢ -> c2 a2

Create a random string, anything like abc¢Ã for example, and try it out.
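Here is one way to try it out, as a sketch. The StrlenDemo contract and its compare function are made up for this example; the library body repeats the strlen loop shown earlier so that the file compiles on its own. It assumes the source file is saved as UTF-8 and that à is the single precomposed character (bytes c3 83).

```solidity
// SPDX-License-Identifier: MIT
pragma solidity >=0.8.4;

// Copy of the strlen loop discussed above, repeated so this file is self-contained.
library StringUtils {
    function strlen(string memory s) internal pure returns (uint256) {
        uint256 len;
        uint256 i = 0;
        uint256 bytelength = bytes(s).length;

        // Each pass skips one whole character and counts it once.
        for (len = 0; i < bytelength; len++) {
            bytes1 b = bytes(s)[i];
            if (b < 0x80) {
                i += 1;
            } else if (b < 0xE0) {
                i += 2;
            } else if (b < 0xF0) {
                i += 3;
            } else if (b < 0xF8) {
                i += 4;
            } else if (b < 0xFC) {
                i += 5;
            } else {
                i += 6;
            }
        }
        return len;
    }
}

// Hypothetical demo contract (illustration only).
contract StrlenDemo {
    using StringUtils for string;

    // Compares the byte length with the character count of "abc¢Ã".
    function compare() public pure returns (uint256 byteLen, uint256 charLen) {
        string memory s = unicode"abc¢Ã";
        byteLen = bytes(s).length; // 3 + 2 + 2 = 7 bytes
        charLen = s.strlen();      // 5 characters
    }
}
```

Deploying this in Remix and calling compare() should show the two numbers side by side: 7 for the byte length and 5 for the character count.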

Ta-Da 🎉, and now it works.

Connect with me on Twitter: @pateldeep_eth or LinkedIn

My DMs are open for any improvements or suggestions.

Originally published here.


Written by deeppatel | Full stack & Blockchain developer
Published by HackerNoon on 2022/05/31