TIL: You can have "duplicate" keys in Python dicts

Lemma 1: You can override __hash__ (and __eq__) for any class you write.

Lemma 2: You can also override the __str__ and __repr__ methods.

Combine these two, and you can end up in a situation like this:

...
>>> mydict
{"a": 1, "b": 2, "a": 3}  # notice something funny?
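One way to produce output like this, as a hypothetical reconstruction rather than the elided code above: a made-up Tagged class hashes and compares on a hidden tag (Lemma 1) but prints only its value (Lemma 2). Note that Python's own repr uses single quotes.

```python
class Tagged:
    """Hypothetical key type: prints as its value, but a hidden tag takes part in hashing and equality."""

    def __init__(self, val: str, tag: int) -> None:
        self.val = val
        self.tag = tag

    def __hash__(self) -> int:
        # Lemma 1: the hash covers the hidden tag, not just the visible value
        return hash((self.val, self.tag))

    def __eq__(self, other: object) -> bool:
        # Same value but different tag means "not equal", so the dict keeps both keys
        if not isinstance(other, Tagged):
            return NotImplemented
        return (self.val, self.tag) == (other.val, other.tag)

    def __repr__(self) -> str:
        # Lemma 2: only the value is shown, so the tag is invisible when printed
        return repr(self.val)


mydict = {Tagged("a", 1): 1, Tagged("b", 1): 2, Tagged("a", 2): 3}
print(mydict)       # {'a': 1, 'b': 2, 'a': 3}
print(len(mydict))  # 3
```

Each "a" can still be looked up individually, as long as you rebuild the key with the right hidden tag.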

Behind the scenes, we have two distinct a objects that print the same way ("a") but that the dict does not treat as the same key: either their hashes differ, or their __eq__ says they are not equal. This becomes particularly relevant when using dataclasses with inheritance:

from dataclasses import dataclass

@dataclass(unsafe_hash=True)
class FunkyStr:
    val: str
    def __repr__(self):
        return self.val  # prints the bare value, e.g. a

@dataclass(unsafe_hash=True, repr=False)  # repr=False keeps the inherited __repr__
class SubclassStr(FunkyStr):
    # no extra members; a distinct type is all I wanted
    pass

mydict = {FunkyStr("a"): 1, FunkyStr("b"): 2, SubclassStr("a"): 3}
print(mydict)  # gives {a: 1, b: 2, a: 3}

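Both a objects actually have the same hash: the generated __hash__ is built from the field values only, so the class plays no part. What keeps them apart is the generated __eq__, which only considers two instances equal if they belong to exactly the same class. FunkyStr("a") == SubclassStr("a") is therefore False, and the dict stores both keys. The real title of this post could have been: Python dataclass equality is nominal, not structural! (but that's not clickbait-y enough)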
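A quick check of that claim, as a sketch using the same two classes: the hashes collide, but equality fails across the class boundary, so the dict ends up with two entries.

```python
from dataclasses import dataclass


@dataclass(unsafe_hash=True)
class FunkyStr:
    val: str

    def __repr__(self) -> str:
        return self.val


@dataclass(unsafe_hash=True, repr=False)
class SubclassStr(FunkyStr):
    pass


base, sub = FunkyStr("a"), SubclassStr("a")

# The generated __hash__ only looks at the field values, so these match
print(hash(base) == hash(sub))  # True
# The generated __eq__ returns NotImplemented unless the classes match exactly,
# so Python falls back to an identity check, which is False here
print(base == sub)              # False
print(base == FunkyStr("a"))    # True
d = {base: 1, sub: 3}
print(len(d))                   # 2
```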

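To be honest, in the above example the real crime is abusing __repr__ to return the bare internal value. The Python docs say __repr__ should, if at all possible, look like a valid Python expression that could recreate the object (such as FunkyStr(val='a')); a friendlier display form belongs in __str__.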
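A sketch of the more conventional setup, assuming you still want str() to give the bare value: keep the dataclass-generated __repr__ and put the friendly form in __str__. Printing the dict then shows which class each key belongs to, and the "duplicate" is no longer hidden.

```python
from dataclasses import dataclass


@dataclass(unsafe_hash=True)
class FunkyStr:
    val: str

    # The generated __repr__ (FunkyStr(val='a')) is kept; __str__ gives the friendly form
    def __str__(self) -> str:
        return self.val


@dataclass(unsafe_hash=True)
class SubclassStr(FunkyStr):
    pass  # gets its own generated __repr__ and inherits __str__


mydict = {FunkyStr("a"): 1, FunkyStr("b"): 2, SubclassStr("a"): 3}
print(mydict)                 # {FunkyStr(val='a'): 1, FunkyStr(val='b'): 2, SubclassStr(val='a'): 3}
print(str(SubclassStr("a")))  # a
```

Containers like dict always use repr() for their contents, which is why __str__ is the safe place for a display-only format.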

More info at: Multiple identical keys in a Python dict - yes, you can!