An overview of what’s to come:

I was originally writing this as a high-level introduction to send people when they encounter things like 6/4 returning 1 instead of 1.5 in some Database Management Systems (DBMS)… overtime its sprawled far beyond that. The points below serve as a TLDR, with a sprawling discussion expanding afterward.


  • Operations like division in SQL can be confusing for novices or anyone without a programming background

  • In some DBMS, 6/4 may equal 1 because, dividing two integers a.k.a. integer division returns an integer (the remainder is discarded)

     Casting one of the operands as a float or numeric/decimal will ensure floating point
     division. Be careful with the precision needed in your calculation as there are
     tradeoffs to using certain types.
    

    Syntax and behavior will be different for various DBMS, but if you want 6 / 4 to equal 1.5 and not 1 then you can use something like:

PostgreSQL:
   SELECT 6::FLOAT / 4 AS result -- Type casting using "::" operator
   SELECT 6 / 4::FLOAT AS result -- Casting the divisor
   SELECT CAST(6 AS FLOAT) / 4 AS result -- Using the CAST function

MySQL:
   SELECT 6 / 4 AS result -- MySQL automatically performs type conversion
   SELECT 6 / CAST(4 AS DECIMAL(10,2)) AS result -- Explicit CAST to DECIMAL
   SELECT 6 / CONVERT(4, DECIMAL(10,2)) AS result -- Using the CONVERT function

SQLServer:
   SELECT 6.0 / 4 AS result -- Implicit conversion using decimal point
   SELECT CAST(6 AS FLOAT) / 4 AS result -- Explicit CAST to FLOAT
   SELECT CONVERT(FLOAT, 6) / 4 AS result -- Using the CONVERT function

Oracle:
   SELECT 6 / 4 AS result FROM DUAL -- Automatic type conversion
   SELECT 6 / TO_NUMBER(4) AS result FROM DUAL -- Using the TO_NUMBER function
   SELECT 6 / 4d AS result FROM DUAL -- Explicit type declaration

SQLite:
   SELECT CAST(6 AS REAL) / 4 AS result -- CAST to REAL
   SELECT 6.0 / 4 AS result -- Implicit conversion using decimal point
   SELECT ROUND(6.0 / 4, 2) AS result -- Throwing ROUND in there

-- Methods may have trade-offs in terms of precision, performance, and portability.
-- The choice depends on your specific requirements and the DBMS you're using.

This is NOT an exhaustive list of solutions/syntax for all DBMS as there are many DBMS

  • The idea here is that if you desire a specific output, explicitly casting an operand(s) is the safe bet. If you are the data producer, understanding the consumption patterns and applying the correct type will be critical for your consumers downstream

  • It’s crucial to understand the database management system (DBMS) and environment you’re working in before curating datasets, creating models, or exporting data for reporting


I think that’s a fair synopsis of what I’ve written below. If you’re interested, I’ll continue to cover division across a variety of DBMS, the behavior of data types/operations in SQL, Hotdogs, Docker, rug pulls and BIG DOGS!.

Observations about SQL Writers

Division is a frequent source of logical and syntactical mistakes for novice SQL users. SQL has some round edges for non-technical users like 1 based indexing. 1 based indexing is the SQL standard, and its more straight forward for people who would typically begin counting at 1 and have no familiarity with 0 based indexing found in languages like Python, Java, or C.

Division is more of a sharp edge. Not because it’s difficult to do division, but when you take someone who has always done =6/4 in Excel and received a result of 1.5 and now you show them that it evaluates to 1 when they write SQL against a database… they don’t anticipate that, nor would they have thought to check for it!

Some Typing and Coercion Background

Many users of SQL coming from a “business” background will have encountered different data types in Excel, but they will likely not have given any thought to Static vs Dynamically typed programming languages. I’m now going to play with fire lay out some definitions to provide some context to the previous sentence that are debatable, but I believe they provide a good mental model for SQL.

  • SQL is typically considered to be Statically typed, meaning:

    Every data item has an associated data type defined at compile-time

SQL does have predefined data types for columns in table definitions, which are determined at “compile-time” (more accurately, at table creation time). However, there are dynamic aspects… SQL allows for implicit type conversions in many operations, and the actual type checking often occurs at runtime.

  • SQL is also most properly defined as Strongly Typed:

    Every data item has an associated data type, defining its behavior and allowed usage

SQL uses a mix of static and dynamic typing. Data types are associated with column definitions at table creation time, but many type-related decisions and checks occur at runtime. SQL’s type system allows for implicit conversions and exhibits some characteristics of both strong and weak typing, with the specific behavior often depending on the particular SQL implementation.

Great, so when I think of SQL I’ll think Strong and Static, thanks! Well… no, like I mentioned, this behavior depends on the DBMS. SQLite uses a more general dynamic type system where the datatype of a value is associated with the value itself, not with the column it sits in. This behavior can be modified with the “STRICT” table option keyword.

An in-depth image for visual learners (I’m an aspiring artist): DBMS_DIVISION

Ok… that’s cool, but in SQL what happens when I divide an integer by a float or what if I evaluate an integer and a string for equality? In a more general sense we are asking: How do DBMS handle type coercion?

  • Type Coercion: is an implicit process where the language runtime automatically converts a value from one type to another. This is done to make operations between different data types possible without explicit instructions from the programmer.

  • Type Conversion: is an explicit process where the programmer manually converts a value from one type to another using specific functions or methods. This requires a deliberate action and is often clearer in terms of code readability and intent.

You’ll see that DBMS have extensive facilities for evaluating mixed-type expressions. I’m adding the Postgres docs here because they are good, but there are other docs for other DBMS.

Why should I care, and how about a few examples?

To make this implicit conversion explicitly clear, lets directly compare two values that are explicitly cast as different types:

SELECT CAST(5 AS INT) = CAST('5' AS VARCHAR(10))

  • Postgres: ERROR: operator does not exist: integer = character
  • SQLite: 1
  • Redshift: true

Some programming languages are quite strict when doing comparisons or operations on multiple types. Here is an example in Rust where someone has attempted to add an integer and a float and received an error.

fn main() {
    let float = 5.0;
    let integer = 4;
    let result = float + integer;

    println!("Result: {}", result);
}
//error[E0277]: cannot add an integer to a float
//     let result = float + integer;
//                          ^ no implementation for `{float} + {integer}`
  1. Executing SELECT 5.0 + 4 or even SELECT '5' + 4 in most (maybe all?) DBMS returns 9.
  2. So then what about SELECT CAST(5 AS INT) + CAST('4' AS VARCHAR(10))? This still returns 9 in some DBMS but NOT in Postgres:

In Postgres, an undecorated string literal (e.g., ‘4’) without an explicit type is treated as an unknown type. This is actually one of a few Pseudo-Types that allows Postgres to delay type resolution until more context is available. Here is a nice response from Tom Lane posted 13 years ago.

Using pg_typeof we can see that SELECT pg_typeof(CAST('4' AS VARCHAR(10))); returns character varying while SELECT pg_typeof('4'); returns unknown. Due to the specific casting of ‘4’, the second example errors (integer + character varying) while SELECT '5' + 4 evaluates an unknown + integer which Postgres implicitly coerces to an integer result of 9.

However, this is where you need to be careful. For instance, Redshift is a fork of Postgres, but… guess what: SELECT CAST(5 AS INT) + CAST('4' AS VARCHAR(10)) returns ‘54’. A few other interesting anecdotes from Redshift:

----------------------
-- Updates
----------------------
CREATE TEMP TABLE SOME_TUPLES (
    MY_NUMBER INT,
    MY_WORD VARCHAR(100)
);

INSERT INTO SOME_TUPLES(MY_NUMBER, MY_WORD) VALUES
(4, 'Hello'),
(6, 'Howdy');

-- Redshift has no problem updating the VARCHAR(100) MY_WORD column with a value I explicitly cast as an INT
UPDATE SOME_TUPLES SET MY_WORD = 4::INT
WHERE MY_NUMBER = 4;

-- How about a decimal instead? No problem
UPDATE SOME_TUPLES SET MY_WORD = 4::DECIMAL(38,0)
WHERE MY_NUMBER = 4;

----------------------
-- What about AVG()?
-- I see a lot of people use AVG in SQL that actually don't want the arithmetic mean
----------------------
WITH CHECK123 AS (
    SELECT 6 AS NUM
    UNION ALL
    SELECT 4 AS NUM
    UNION ALL
    SELECT 1 AS NUM
    )
SELECT AVG(NUM) FROM CHECK123 -- Results in 3

----------------------
-- Oh, Also
----------------------
-- Returns 2 in Postgres/Redshift
-- Returns 1 in SQL Server
SELECT CAST(1.99 AS INTEGER)

Ooooook so can we go back to Division now?

In some DBMS, SELECT 6 / 4; would result in 1 due to integer division, other DBMS will still return 1.5 even after being passed two “integers” to operate on.

  • Integer division is the division of two integers resulting in another integer. It discards the fractional part of the division and returns only the whole number quotient
  • In Python print(6/4) returns 1.5 (round edge); however, print(6//4) would return 1, discarding the remainder. We’ll see later on that DuckDB has similar syntax!

However, SELECT 6.0 / 4; or SELECT CAST(6 AS FLOAT) / 4; would result in 1.5 as the DBMS performs implicit type coercion and returns the result to the client as a float.

  • Floating-point division is an operation involving at least one floating-point number that results in a floating-point (decimal) number. Floating-point division is used when precision is important, and the fractional part of the result is significant.

DBMS_DIVISION

Different DBMS have different defaults for this coercion behavior. Sometimes its: “Hey Big Dog looks like you gave me two integers two divide, I’m going to assume that the remainder is important to you and not return an integer” OR “Big DOG! You just served me up two integers and guess what… you are getting an integer back!”

SQL Standard is for the DBMS to refer to their users as Big Dog, just like Tiger.

I’m only talking about type coercion, but keep in mind that there are A LOT of DBMS and they do A LOT of things differently.

    Quick side note, in DuckDB there are two division operators: / and //.
    They are equivalent when at least one of the operands is a FLOAT or a DOUBLE.
    When both operands are integers, / performs floating point division (5 / 2 = 2.5)
    while // performs integer division (5 // 2 = 2).

On top of this, admin can make configuration changes to the DBMS you’re using, possibly making its behavior different than what I describe here! Developing a sound understanding of the environment you plan to write queries in before you or some text to SQL engine starts spinning off queries is important!

If you’d like to compare for yourself, SQL Fiddle is a nice online SQL compiler. You can also use docker. With docker installed, you can pull the official MySQL image and try for yourself:

# Start a new MySQL container named 'testsql' with the root password set
# The -d flag runs the container in detached mode (in the background)
$ docker run --name testsql -e MYSQL_ROOT_PASSWORD=ilovemysql -d mysql:latest

# List all running Docker containers
# This should show the 'testsql' container we just started
$ docker ps

# Start an interactive bash shell inside the running 'testsql' container
# The -it flags allow for an interactive terminal session
$ docker exec -it testsql bash

# Inside the container, start the MySQL client
# The -u flag specifies the user (root in this case)
# The -p flag prompts for the password we set earlier
$ mysql -u root -p

# Now we're in the MySQL prompt. We can run SQL commands here, for example:
SHOW DATABASES;
SELECT 6/4;  # This will return 1.5000 in MySQL

# To exit the MySQL client
EXIT;

# To exit the container's bash shell
$ exit

# Stop the 'testsql' container
$ docker stop testsql

# List running containers again to verify 'testsql' has stopped
$ docker ps

# List all containers, including stopped ones
# This will still show 'testsql', but in a stopped state
$ docker ps -a

In Practice

In my day job, 99% of the time people want floating point division, and I will advise them to explicitly cast one of the operands. For example in Redshift I would write: SELECT 6/4::FLOAT.

  • A good question might be: “Why bother casting if one or both of my columns is already a Decimal/Float, isn’t it irrelevant?”

In current state you would be 100% correct, but are you 100% sure nobody will alter the table and change that one column that was a Decimal to an INT…

  1. Hey remember that old column we created as a Decimal type, we only write integer values between +/- 100000. We can save some space by storing it as an integer… nobody will have any problem with that! Sounds good: ALTER TABLE TBLONE ALTER COLUMN COLUMNONE TYPE INTEGER USING (NULLIF(COLUMNONE, ‘’)::INTEGER);

  2. Hey that table we produce yeah we are actually deleting that column you use, but its ok because you can recalculate it using a JOIN on this other table to bring in another column… make sure it doesn’t break your transform… thanks! (We sent the email about the migration 3 months ago but must have missed you.. haha sorry lol… well we are making the change on Friday, thanks so much!)

I typically prefer writing defensive SQL and checking what I’m being defensive against rather than having the data producer pull the rug on me. Which is why I prefer to be the data producer, and why I prefer not to buy Meme Coins.

I’ll leave a final furthering thought for anyone interested. I’ve gone ahead and stolen this little exercise from the MySQL docs and re-written it for Redshift (should work identically in Postgres as well)

-- Three variables: i (an integer), d (a decimal with 10 digits total and 4 digits after the decimal point), and f (a floating point).
-- The WHILE loop runs 10,000 times, incrementing both d and f by 0.0001
-- After the loop, RAISE INFO prints the values of d and f.
CREATE OR REPLACE PROCEDURE test_precision()
LANGUAGE plpgsql
AS $$
DECLARE
    i INT := 0;
    d DECIMAL(10,4) := 0;
    f FLOAT := 0;
BEGIN
    WHILE i < 10000 LOOP
        d := d + 0.0001;
        f := f + 0.0001;
        i := i + 1;
    END LOOP;

    RAISE INFO 'DECIMAL result: %, FLOAT result: %', d, f;
END;
$$;

-- Call the stored procedure
CALL test_precision();

Here’s a hint if you’re curious about the output, or you can just smash this stored procedure into ChatGPT and it’ll rattle off a coherent answer the choice is yours.