Snowflake Case Sensitivity: A Migration Perspective

Ilan Zaitoun
Jun 1
3 min read

The Case Sensitivity Challenge in Data Migration

Imagine meticulously planning a seamless data warehouse migration, only to find that something as simple as uppercase and lowercase letters can disrupt queries, data transformations, and reports. This isn't just a minor technicality—it’s a fundamental shift in how Snowflake handles text if your origin DWH was defined to be case insensitive. Understanding and addressing this early can make all the difference in ensuring a smooth transition.

The Case Sensitivity Conundrum

In databases, collation rules are used to dictate how characters will be treated. Some traditional relational databases, such as SQL Server and MySQL, often default to case-insensitive comparisons, meaning 'CUSTOMER' and 'customer' are treated as the same. Snowflake, however, defaults to case-sensitive comparisons, treating them as distinct values.

This difference affects a variety of operations, including (but not limited to): simple comparison, sorting, grouping, join conditions, window functions, scalar / aggregate functions, date clustering and more.

Here are some examples:

SELECT * FROM customers WHERE column1 = column2;
ORDER BY may place lowercase and uppercase versions of the same word in separate positions. In addition, some characters (‘_’, ‘-’, etc..) can be sorted differently. SELECT * FROM products ORDER BY name;
SELECT category, COUNT(*) FROM inventory GROUP BY category;
SELECT customer_name, SUM(sales) OVER (PARTITION BY region ORDER BY customer_name) FROM sales;
SELECT LEAST(name1, name2, name3) FROM clients;
CREATE TABLE customers CLUSTER BY (name);

For example, consider a table with values ('Apple'), ('apple'), ('APPLE'). In some legacy systems, GROUP BY name treats them as one group, while in Snowflake, they’re distinct unless explicitly handled.

Handling Case Sensitivity in Snowflake

Rather than forcing a global collation change, a good approach is adjusting queries and educating consumers to the new system. Here are some techniques:

1. Query-Level Adjustments

Using UPPER() or LOWER() for uniform comparisons: SELECT * FROM customers WHERE UPPER(name) = UPPER('John Doe');
Using ILIKE for case-insensitive pattern matching: SELECT * FROM products WHERE product_code ILIKE 'abc%';
Optimizing performance by applying case transformation in indexed conditions: SELECT * FROM orders WHERE UPPER(status) = 'COMPLETED';

2. Using Snowflake’s Collation Options

Snowflake provides collation settings at different levels:

Account Level: ALTER ACCOUNT SET DEFAULT_DDL_COLLATION = 'en-ci';
Table Level: CREATE TABLE customers (name STRING COLLATE 'en-ci');
Column Level (on queries): SELECT name COLLATE 'en-ci' FROM customers;

The Double-Edged Sword of Changing Collation

Advantages:

Easier migration from case-insensitive databases like SQL Server and MySQL.
Fewer query modifications needed.
More familiar behavior for consumers accustomed to legacy systems.

Disadvantages:

Limited inheritance: CTAS (CREATE TABLE AS SELECT) does not always retain collation.
Performance concerns: Case-insensitive comparisons can affect performance.
Snowflake’s limitations: Some operations override collation settings.
Inconsistencies: Mixed-collation environments/objects can become difficult to manage.

The Verdict: Embrace the Default

After extensive testing, our recommendation is clear: stick with Snowflake’s default case-sensitive behavior. While it may require adjustments, this approach aligns with Snowflake’s architecture and ensures long-term performance and maintainability.

If case insensitivity is absolutely required, apply it selectively at the query level, rather than changing account-wide or object settings. This balances flexibility and consistency without introducing unnecessary complexity.

Migrating to Snowflake isn’t just about replicating old behavior—it’s an opportunity to adopt modern best practices and set your data platform up for future success.