Validating Usernames in a Smart Contract
Introduction
One of the first challenges in building an application is handling user registration.
We often need to allow users to choose their own username, while ensuring that each one is unique.
Usernames also usually require validation, with constraints on the minimum and maximum number of characters and the types of characters allowed.
These constraints are easy to handle in traditional web applications, but they can be tricky to implement within the confines of a smart contract.
The Puzzle
Let's write a smart contract method that validates a username against the following constraints:
A username must contain 5-15 characters (same as X handles)
A username can only contain lowercase letters, digits, and hyphens
A username cannot contain consecutive hyphens
A username cannot start or end with a hyphen
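Before moving on-chain, it can help to pin the rules down as an off-chain reference. The following is a minimal plain-Python sketch, not contract code; the names `ALLOWED` and `is_valid_username` are made up for illustration.

```python
import string

# Hypothetical helper names; this runs in ordinary Python, not in a contract.
ALLOWED = set(string.ascii_lowercase + string.digits + "-")


def is_valid_username(username: str) -> bool:
    """Off-chain reference check for the four rules listed above."""
    if not 5 <= len(username) <= 15:
        return False  # rule 1: length
    if any(c not in ALLOWED for c in username):
        return False  # rule 2: character set
    if "--" in username:
        return False  # rule 3: no consecutive hyphens
    # rule 4: no leading or trailing hyphen
    return not (username.startswith("-") or username.endswith("-"))


print(is_valid_username("alice-01"))   # True
print(is_valid_username("alice--01"))  # False
```

A reference like this is handy for generating test cases to compare against the contract's behaviour.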
Validating the Character Set
The valid character set is:
>>> import string
>>> print(f"{string.ascii_lowercase}{string.digits}-")
abcdefghijklmnopqrstuvwxyz0123456789-
In regular Python, it's easy to check whether a string only contains these characters:
VALID_CHARS = "abcdefghijklmnopqrstuvwxyz0123456789-"
def is_valid(username: str) -> bool:
    return all(c in VALID_CHARS for c in username)
But if we adopt a similar approach in a smart contract, we'll end up iterating over VALID_CHARS too many times and running out of opcodes.
It would look something like this:
for a in username:
    for b in VALID_CHARS:
        if a == b:
            break  # match found, move on to the next character
    else:
        # the inner loop finished without finding a match
        raise ValueError("Invalid character")
The goal should be to iterate over the username once and avoid iterating over anything else.
So how can we achieve that?
Encoding
Lowercase letters, digits, and hyphens are all supported in ASCII encoding.
ASCII defines 128 characters, so each one fits in a single byte (an 8-bit integer, which can hold 256 distinct values).
To check whether a particular character in a string is valid as part of a username, we can convert it to an integer and check whether it falls in the correct range.
The valid numbers corresponding to the character set are:
45 = hyphen
[48, 57] = the digits [0, 9]
[97, 122] = the lowercase letters, a-z.
Which we can check for in Python using:
valid_number = lambda x: x == 45 or 48 <= x <= 57 or 97 <= x <= 122
Adapting the previous example:
def is_valid(username: str) -> bool:
    valid_number = lambda x: x == 45 or 48 <= x <= 57 or 97 <= x <= 122
    return all(valid_number(ord(c)) for c in username)
The ord function in Python converts a character to its Unicode code point, which is the same as its ASCII value for the characters we care about.
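As a quick sanity check on those ranges, this small sketch brute-forces every possible byte value and confirms that the range test accepts exactly the intended character set. The names `accepted` and `expected` are only for illustration.

```python
import string

valid_number = lambda x: x == 45 or 48 <= x <= 57 or 97 <= x <= 122

# Every character whose byte value passes the range check.
accepted = {chr(x) for x in range(256) if valid_number(x)}
# The character set we actually want to allow.
expected = set(string.ascii_lowercase + string.digits + "-")

print(accepted == expected)  # True
print(ord("-"), ord("0"), ord("9"), ord("a"), ord("z"))  # 45 48 57 97 122
```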
Smart Contract: First Approach
To make it a bit more readable, let's define some constants:
HYPHEN = 45
ZERO = 48
NINE = 57
LOWER_A = 97
LOWER_Z = 122
Then we can validate a username against the constraints as follows:
from algopy import ARC4Contract, String, UInt64, arc4, op
class Registration(ARC4Contract):
    @arc4.abimethod
    def validate_username(self, username: String) -> None:
        assert 5 <= username.bytes.length <= 15, "Username must be between 5 and 15 characters"
        prev = UInt64(HYPHEN)  # sentinel: treat the start as if it followed a hyphen
        for byte in username.bytes:
            curr = op.btoi(byte)  # ord
            assert not curr == prev == HYPHEN, "Username cannot start with a hyphen or contain consecutive hyphens"
            assert curr == HYPHEN or LOWER_A <= curr <= LOWER_Z or ZERO <= curr <= NINE, "Username can only contain lowercase letters, digits, or hyphens"
            prev = curr
        assert prev != HYPHEN, "Username cannot end with a hyphen"
A username cannot start or end with a hyphen, and it can't contain any consecutive hyphens.
To track this, we iterate over the username one byte at a time, storing the last byte seen in a variable called prev.
If at any point we find that the current character and the previous character are both hyphens, the username is invalid.
prev is initially set to the hyphen number, which means we don't need to implement separate logic to check that the first character is not a hyphen.
If it is, this will raise an error:
assert not curr == prev == HYPHEN
After the username has been iterated over, we need to add one more step to check that the last character is not a hyphen:
assert prev != HYPHEN
This works pretty well, and can validate a username of up to 15 characters while staying within the opcode budget.
But the conditional steps leave some room for improvement.
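The sentinel trick is easy to try out off-chain. Below is a plain-Python model of the same single pass, assuming the input arrives as raw bytes; `passes_single_pass` is a hypothetical name, and it returns a bool where the contract would fail an assert.

```python
HYPHEN = 45


def passes_single_pass(username: bytes) -> bool:
    """Plain-Python model of the contract loop (returns False instead of asserting)."""
    if not 5 <= len(username) <= 15:
        return False
    prev = HYPHEN  # sentinel: a leading hyphen looks like a double hyphen
    for curr in username:  # iterating over bytes yields integers
        if curr == HYPHEN and prev == HYPHEN:
            return False  # leading hyphen or two hyphens in a row
        if not (curr == HYPHEN or 97 <= curr <= 122 or 48 <= curr <= 57):
            return False  # outside the allowed character set
        prev = curr
    return prev != HYPHEN  # trailing hyphen


print(passes_single_pass(b"alice-01"))  # True
print(passes_single_pass(b"-alice"))    # False
```

Because prev starts out as a hyphen, the first and third rules collapse into a single comparison per character.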
Smart Contract: Second Approach
Since a byte can only take 256 different values, we can store the mapping in a 256-bit (32-byte) bitmask.
Each set bit represents a valid character.
For example, if the bitmask is 00100000..., it means that the ASCII character corresponding to the number 2 is valid (bits are counted from the left, starting at 0).
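To make the bit numbering concrete, here is a small plain-Python stand-in for a bit lookup on a byte string, where index 0 is the leftmost bit of the first byte. The function name `getbit` is borrowed for illustration only; this is a sketch of the indexing convention, not the on-chain opcode itself.

```python
def getbit(data: bytes, index: int) -> int:
    """Return the bit at `index`, counting from the leftmost bit of the first byte."""
    byte = data[index // 8]                # which byte holds the bit
    return (byte >> (7 - index % 8)) & 1   # shift the wanted bit into position 0


mask = bytes([0b00100000])
print([getbit(mask, i) for i in range(8)])  # [0, 0, 1, 0, 0, 0, 0, 0]
```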
Instead of chaining conditions:
curr == HYPHEN or LOWER_A <= curr <= LOWER_Z or ZERO <= curr <= NINE
All we need to do is get the corresponding bit: op.getbit(bitmask, curr), and check whether it's set.
We can construct the bitmask in regular Python, and then use it as a constant in algopy:
from functools import reduce
VALID_CHARS = "abcdefghijklmnopqrstuvwxyz0123456789-"
ords = {ord(c) for c in VALID_CHARS}
bitmask = reduce(lambda acc, i: acc | (1 << (255 - i)), filter(ords.__contains__, range(256)), 0).to_bytes(32, "big")
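If the one-liner is hard to follow, the same mask can be built with an explicit loop. This is an equivalent sketch under the same convention (bit 0 is the leftmost bit of the first byte); the variable name `mask` is just for illustration.

```python
VALID_CHARS = "abcdefghijklmnopqrstuvwxyz0123456789-"

mask = bytearray(32)  # 256 bits, all initially clear
for c in VALID_CHARS:
    i = ord(c)
    # Set bit i, counting from the leftmost bit of the first byte.
    mask[i // 8] |= 0x80 >> (i % 8)

bitmask = bytes(mask)
print(bitmask.hex())
```

The printed hex string can be pasted into the contract as a bytes constant.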
The updated contract is:
from algopy import ARC4Contract, Bytes, String, UInt64, arc4, op
USERNAME_CHARS = b'\x00\x00\x00\x00\x00\x04\xff\xc0\x00\x00\x00\x00\x7f\xff\xff\xe0\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
class Registration(ARC4Contract):
    @arc4.abimethod
    def validate_username(self, username: String) -> None:
        assert 5 <= username.bytes.length <= 15, "Username must be between 5 and 15 characters"
        prev = UInt64(HYPHEN)
        for byte in username.bytes:
            curr = op.btoi(byte)
            assert not curr == prev == HYPHEN, "Username cannot start with a hyphen or contain consecutive hyphens"
            # a set bit at position curr means the character is allowed
            assert op.getbit(Bytes(USERNAME_CHARS), curr), "Username must only contain lowercase alphanumeric characters and hyphens"
            prev = curr
        assert prev != HYPHEN, "Username cannot end with a hyphen"
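One way to gain confidence in the swap is to check, off-chain, that the bitmask lookup and the chained range check agree for every possible byte value. The sketch below does that in plain Python; `MASK`, `bit_at` and `in_ranges` are illustrative names, and `bit_at` assumes bit 0 is the leftmost bit of the first byte.

```python
# The 32-byte constant from the contract above.
MASK = b"\x00\x00\x00\x00\x00\x04\xff\xc0\x00\x00\x00\x00\x7f\xff\xff\xe0" + b"\x00" * 16


def bit_at(data: bytes, index: int) -> int:
    """Bit lookup with index 0 at the leftmost bit of the first byte."""
    return (data[index // 8] >> (7 - index % 8)) & 1


def in_ranges(x: int) -> bool:
    """The chained condition from the first approach."""
    return x == 45 or 97 <= x <= 122 or 48 <= x <= 57


# Byte values where the two checks disagree; an empty list means they are equivalent.
mismatches = [x for x in range(256) if bool(bit_at(MASK, x)) != in_ranges(x)]
print(mismatches)  # []
```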
Conclusion
Bitmasks can be a great way to save opcodes in smart contracts.
They allow us to efficiently represent a large number of binary variables.
We can precompute the values off chain, enabling quick single-operation lookups within the contract.
Check out my previous article 'Building a Sudoku Validator on Algorand' if you want to take a deeper dive on bitmasks.