paint-brush
ClickHouse DateTime Parsing: Should You Use 'Base' or 'Best_effort' Mode?by@temirlan100
New Story

ClickHouse DateTime Parsing: Should You Use 'Base' or 'Best_effort' Mode?

by Temirlan AmanbayevJanuary 29th, 2025
Read on Terminal Reader
Read this story w/o Javascript
tldt arrow

Too Long; Didn't Read

ClickHouse has a special setting called `date_time_ input_format` This setting determines how the server interprets incoming string data. To simplify integration, ClickHouse introduced a "strict" mode (`base`), which expects a single format. A "flexible" mode, 'best_effort' mode, attempts to recognize the input date in several popular formats.
featured image - ClickHouse DateTime Parsing: Should You Use 'Base' or 'Best_effort' Mode?
Temirlan Amanbayev HackerNoon profile picture
0-item


In this article, we will take a detailed look at how ClickHouse parses dates and times depending on the date_time_input_format setting. We will cover general information and technical implementation details provide practical examples compare performance and offer recommendations on choosing the optimal parsing mode.


1. Overview of the date_time_input_format Setting

ClickHouse has a special setting called date_time_input_format, which determines how the server interprets incoming string data when converting it to DateTime, DateTime64 types, and during input/output formats (e.g., FORMAT JSONEachRow, FORMAT CSV, etc.).

1.1. Main Modes: 'base' and 'best_effort'

The most commonly mentioned modes are:

  1. base (or alias basic):
    • A more strict parsing mode.
    • Expects a date and time format strictly like YYYY-MM-DD HH:MM:SS or with an additional fractional part of seconds YYYY-MM-DD HH:MM:SS[.fractional] (e.g., 2025-01-14 09:31:30.123456).
    • Parsing errors occur if deviations from this format are detected.
  2. best_effort:
    • A flexible mode that supports a wide range of common date/time formats (ISO 8601 with T separator, Z suffix, fractional seconds, timezone offsets, etc.).
    • Can correctly parse strings like YYYY-MM-DDTHH:MM:SSZ, 2025-01-14T09:31:30.725617089Z, and others.
    • However, due to the extra logic for recognizing multiple formats, it might work slightly slower on large data volumes.

1.2. Why Multiple Modes Are Needed

Different applications, systems, and libraries may output date and time differently. The most common standard is ISO 8601, but there are other variations (with or without spaces, fractional seconds, timezone letters, etc.). To simplify integration as much as possible, ClickHouse introduced the idea of having a "strict" mode (base), which expects a single format, and a "flexible" mode (best_effort), which attempts to recognize the input date in several popular formats.


2. Technical Details

The implementation of date and time parsing in ClickHouse is divided into several functions. Two key ones are:

  1. parseDateTimeBestEffort.cpp — for the 'best_effort' mode.
  2. parseDateTime.cpp — for the 'base' mode (sometimes referred to as 'basic').


The relevant source code links in ClickHouse (GitHub repository) are:

  • parseDateTimeBestEffort.cpp#L86 — where the "best effort" mode iterates over a list of formats.
  • parseDateTime.cpp#L638 — the code section where strict date and time reading in the YYYY-MM-DD HH:MM:SS[.fractional] format occurs.

2.1. Parsing in best_effort Mode

In best_effort mode, the input string is sequentially matched against predefined templates in ClickHouse. These include:

  • ISO 8601 with a T separator (e.g., YYYY-MM-DDTHH:MM:SS).
  • Versions with the Z suffix (UTC) (YYYY-MM-DDTHH:MM:SSZ).
  • Fractional seconds: YYYY-MM-DDTHH:MM:SS.ssssss[Z].
  • Possible timezone offsets like +HH:MM or HH:MM.
  • In some cases (depending on the ClickHouse version), more relaxed variants are supported.


If the string doesn't fit one format, the algorithm tries the next one, and so on. If no matches are found, parsing fails with an error.

2.2. Parsing in base Mode (or basic)

The base mode (often referred to as basic in documentation and code) assumes a strictly defined format:

  • Primary: YYYY-MM-DD HH:MM:SS (e.g., 2025-01-14 09:31:30).
  • Additional (if fractional seconds are present): YYYY-MM-DD HH:MM:SS.fffffffff (e.g., 2025-01-14 09:31:30.123456789).


Any deviation (e.g., T instead of a space, presence of Z, missing seconds section) will result in an error:

Code: 41. DB::ParsingException: Cannot parse datetime ...

3. Practical Examples

Below are several scenarios that illustrate the differences between base and best_effort.

3.1. When best_effort Is Essential

Imagine receiving data in ISO 8601 format, including a T separator and Z (UTC) suffix. For instance, 2025-01-14T09:31:30.725617089Z.


In base mode, ClickHouse expects a string like 2025-01-14 09:31:30.725617089 (with a space, without Z). Therefore, an insertion attempt will produce an error. In best_effort mode, ClickHouse correctly recognizes this timestamp, accounts for fractional seconds, and interprets Z as UTC.

3.2. Example of Manual INSERT with JSONEachRow

Suppose we have a table:

CREATE TABLE events (
    event_date_time DateTime64(9),
    event_value Int32
)
ENGINE = MergeTree
ORDER BY event_date_time;

And we want to insert data in JSONEachRow format:

INSERT INTO events FORMAT JSONEachRow
{"event_date_time": "2025-01-14T09:31:30.725617089Z", "event_value": 100}
  • With date_time_input_format = 'base':

    A parsing error occurs because 2025-01-14 09:31:30.725617089 (without T and Z) is expected.

  • With date_time_input_format = 'best_effort':

    The string is successfully parsed and written to the table.

3.3. Using SET date_time_input_format='best_effort'

To enable flexible parsing locally within the current session, you can execute:

SET date_time_input_format = 'best_effort';

After this, any date-time insertion/read within the session will work in "best effort" mode.


An analogous example with INSERT:

INSERT INTO events
SETTINGS date_time_input_format='best_effort'
FORMAT JSONEachRow
{"event_date_time": "2025-01-14T09:31:30.725617089Z", "event_value": 100}

3.4. Globally enabling best_effort via users.xml

If you want all requests from a particular user or role profile to use best_effort by default, you can configure this in the users.xml configuration file. For example:

<yandex>
    <profiles>
        <my_profile>
            ...
            <date_time_input_format>best_effort</date_time_input_format>
        </my_profile>
    </profiles>
    <users>
        <my_user>
            <profile>my_profile</profile>
            ...
        </my_user>
    </users>
</yandex>

When ClickHouse is restarted, all new sessions of user my_user will run in best_effort mode.

3.5. Example of inserting a large JSON string with several columns

When parsing JSON data in ClickHouse, you often encounter a situation where a JSON file may contain fields that are not present in the target table. For example:

{"event_date_time": "2025-01-14T09:31:30Z", "event_value": 42, "some_extra_field": "ignored"}


If FORMAT JSONEachRow is used, the default “extra” fields may cause an error. To skip them, you can enable the setting:

SET input_format_skip_unknown_fields = 1;


Then ClickHouse will safely ignore some_extra_field, and parse the event_date_time according to the logic of the current date_time_input_format. If we have best_effort, then 2025-01-14T09:31:30Z will be successfully read.


4. Comparative performance analysis

4.1. Why base is faster

The base mode uses a clearly fixed parsing function: it checks strictly positively that the year consists of 4 digits, followed by a hyphen, followed by 2 digits of the month, and so on. Virtually no branching or checks for “what if there's a T or a space, if there's a Z suffix” and the like.


The best_effort mode, on the other hand, stores a set of patterns - sometimes quite large (especially in modern versions of ClickHouse that support time zone offsets, different delimiter options, fractional seconds, etc.). Each string is checked sequentially by several rules, which gives additional load.

4.2. When this can become noticeable

On small amounts of data, the difference is unlikely to be noticeable. However, if you are loading, say, billions of rows, and a large number of values have to be converted from rows to dates/times, the parsing time can become a bottle neck.

4.3. A small benchmark

To test the difference, you can prepare a dataset yourself (e.g. with 10^7 or 10^8 rows in CSV/JSON format) and run the insert twice - once with date_time_input_format='base', another time with best_effort'. Measure the query execution time or average insertion speed (rows/sec).


There are no official benchmarks from ClickHouse developers on this topic (at least at the time of writing), so you will have to make your own estimation based on actual usage conditions.


Benchmark:

-- Data preparation (10M rows)
SELECT now() + number AS ts FROM numbers(10000000) INTO OUTFILE 'data.tsv'

-- Insertion time measurement
clickhouse-client --query "INSERT INTO test_table FORMAT TSV" < data.tsv

5. Recommendations and Best Practices

5.1. When to Enable best_effort

  • Your data comes from external systems in various formats (e.g., ISO 8601 with T, Z, or timezone offsets).
  • You lack strict control at the application level, requiring flexible parsing capabilities.
  • Maximum insertion speed is not critical, but error-free handling of external date formats is more important.

5.2. When to Prefer base

  • The input date and time format is exactly known and does not change.
  • You need maximum performance for large data volumes.
  • It is critical that ClickHouse produces an error for any format deviation (strict control).

5.3. Accuracy Support (DateTime64(N))

For DateTime64(N) types, fractional second parsing is particularly important. In base mode, the exact number of digits after the decimal point must match the declared precision. In best_effort, ClickHouse can recognize and truncate fractional seconds to the required precision.

5.4. Impact of T, Z, and Timezone Offsets

  • base mode expects a space between the date and time and does not account for Z (UTC).
  • best_effort mode recognizes T as a separator, interprets the Z suffix as UTC, and parses timezone offsets (e.g., +03:00, 05:00).
  • To store local time consistently, use timezone-aware columns or store all data in UTC and convert it during read operations.

5.5. Debugging Parsing Issues

  • Use INSERT INTO ... FORMAT JSONEachRow with additional settings:

    SET input_format_skip_unknown_fields = 1;
    
    
  • Functions like parseDateTime64BestEffort(...) or parseDateTimeBestEffort(...) can assist during SELECT/INSERT operations to convert string columns into dates/times.

  • Check ClickHouse logs (clickhouse-server.log) for specific parsing error messages.

  • For detailed logs, increase the logging level carefully (large logs may be generated).


6. Conclusions

The choice between 'base' and 'best_effort' primarily depends on the nature of your data and performance requirements:

  • base (or basic):
    • Offers maximum parsing speed with a strict YYYY-MM-DD HH:MM:SS[.fractional] template.
    • Useful when you have complete control over the input date format.
    • Suitable for high-load systems where every millisecond matters.
  • best_effort:
    • A flexible mode that understands various date and time formats (e.g., ISO 8601 T, Z, offsets, fractional seconds).
    • Convenient for integration with systems where date formats are inconsistent or may change.
    • Might be slightly slower with large data volumes but minimizes errors due to incompatible formats.


In practice, if you exchange data with microservices that output ISO 8601 formats (T, Z), enabling 'best_effort' is reasonable. For fixed formats with performance-critical requirements, 'base' is better.


Thus, the date_time_input_format setting in ClickHouse helps balance strict format control with flexibility in handling diverse date/time formats. Choose the approach that best fits your project.