User:Ottomata/CamelCase is bad

Capital letters in data keys are a really bad idea

Data moves around. It will be used in different languages with different type systems and different naming rules. It will certainly be used in SQL systems, which for the most part treat identifiers as case-insensitive. The only common identifier naming convention that works in all of these systems is snake_case (as opposed to CamelCase).

Any time data passes through a case-insensitive system, its field names will be normalized, most likely to all lower case. Names like isPartOf and mainEntity will become ispartof and mainentity. Longer names that include acronyms get even worse, because in camelCase it isn't clear what the acronym capitalization rules are. Is it HTTPURLID or HttpUrlId? Whatever the camelCase acronym rule is, SQL systems will normalize the name to e.g. httpurlid. Data integration automation code has to reason about which fields are the same. When ingesting data whose field names contain capital letters, two different fields can end up normalized to the same lower-cased name. At that point we can only guess how to ingest the data.
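To make the collision problem concrete, here is a minimal sketch of the check an ingestion job would need. The function name find_case_collisions and the sample field list are made up for illustration; lower-casing stands in for whatever normalization a particular case-insensitive system applies.

```python
from collections import defaultdict


def find_case_collisions(field_names):
    """Group field names that collapse to the same lower-cased name.

    Hypothetical helper: lower-casing is assumed to be the normalization
    that the downstream case-insensitive system applies.
    """
    groups = defaultdict(list)
    for name in field_names:
        groups[name.lower()].append(name)
    # Keep only normalized names claimed by more than one distinct original.
    return {norm: names for norm, names in groups.items() if len(set(names)) > 1}


# Illustrative field names taken from the examples in the text.
fields = ["isPartOf", "mainEntity", "HTTPURLID", "HttpUrlId", "page_id"]
print(find_case_collisions(fields))
# → {'httpurlid': ['HTTPURLID', 'HttpUrlId']}
```

Detecting the collision is the easy part. Nothing in the data says which of the two original fields should win, which is the guessing mentioned above.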

Every time someone needs to move camelCased data identifiers through case-insensitive systems, they will have to write code that reasons about the case changes. If we avoid upper-case letters in the field names of our schemas, we are less likely to encounter bugs and breakages in data pipelines.
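As a rough sketch of what that extra code tends to look like, here is one common regex-based way to turn camelCase names into snake_case. The function name camel_to_snake is hypothetical, and the acronym handling is just one possible convention, not a rule taken from any schema guideline.

```python
import re


def camel_to_snake(name):
    """Convert a camelCase identifier to snake_case (one possible convention)."""
    # Split an acronym run from a following capitalized word:
    # "HTTPRequest" -> "HTTP_Request".
    s = re.sub(r"([A-Z]+)([A-Z][a-z])", r"\1_\2", name)
    # Split a lower-case letter or digit from a following upper-case letter:
    # "incomingHTTP" -> "incoming_HTTP".
    s = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", s)
    return s.lower()


print(camel_to_snake("isPartOf"))                      # → is_part_of
print(camel_to_snake("incomingHTTPRequestIpAddress"))  # → incoming_http_request_ip_address
print(camel_to_snake("HttpUrlId"))                     # → http_url_id
print(camel_to_snake("HTTPURLID"))                     # → httpurlid
```

The last two lines show the acronym problem: HttpUrlId and HTTPURLID name the same thing but convert differently, because an all-caps run carries no word boundaries to recover. Names that are snake_case from the start never need this step.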

Additionally, I've heard that camelCase can be difficult for non-native English speakers. incomingHTTPRequestIpAddress (which is normalized to incominghttprequestipaddress) is (subjectively) more difficult to read than incoming_http_request_ip_address.

See also: