Add date parsing to SQLite using Rust

Superintendent.app, a SQL spreadsheet desktop app, was based on SQLite before moving to DuckDB. Anyone who uses SQLite a bit regularly would quickly find out that SQLite lacks many expected SQL functionalities like date parsing, form satting, or regular expression.

Initially, I was trying to use WITH RECURSIVE to parse YYYY-MM-DD, but the SQL code was too mind-bending for anyone to read and understand, which might be an interesting software engineering interview question.

Eventually, I decided to extend SQLite to support date parsing, formatting, and regular expression, so I embarked on the journey to achieve it, and that's what I wanted to share today.

The first task to figure out is to: How to write date parsing in C or whatever language that can be compiled into a SQLite loadable extension

Using C would be the easiest path, but there isn't really a good C solution that would work on Windows, Mac, and Linux. Then, I've realized that Rust works pretty well on all of these platforms. Rust offers rich standard libraries that include date parsing, formatting, and regular expression. Rust also can interop well with C. So, Rust was chosen as the programming language to make this work.

Quick requirement: we probably want a date_parse(pattern, value) for parsing value (varchar) into the Date type. However, a date in SQLite is just a string whose format is YYYY-MM-DD. In practice, we would parse value into a date in Rust and format it back to YYYY-MM-DD.

Following the tutorial in SQLite, first I'd need to initialize my extension in ext.c with:

#ifdef _WIN32
__declspec(dllexport)
#endif
int sqlite3_ext_init(
  sqlite3 *db,
  char **pzErrMsg,
  const sqlite3_api_routines *pApi
){
  int rc = SQLITE_OK;
  SQLITE_EXTENSION_INIT2(pApi);
  (void)pzErrMsg;  /* Unused */

  rc = sqlite3_create_function(db, "date_parse", 2, SQLITE_UTF8|SQLITE_INNOCUOUS, 0, wrap_date_parse, 0, 0);
  if (rc != SQLITE_OK)  {
    return rc;
  }

  return rc;
}

The key part is: sqlite3_create_function(db, "date_parse", 2, SQLITE_UTF8|SQLITE_INNOCUOUS, 0, wrap_date_parse, 0, 0). Here is the breakdown:

  • date_parse is the SQLite function that your users would be able to use in their SQL statement.
  • 2 is the number of arguments for the date_parse function.
  • wrap_date_parse is the C function that you will implement.

Other parts of the code are merely ceremonies, so I won't explain them.

Since we just referenced wrap_date_parse, we will implement it as follows:

#include 
#include "sqlite3ext.h"
#include "rust_ext.h"
SQLITE_EXTENSION_INIT1

static void wrap_date_parse(
  sqlite3_context *context,
  int argc,
  sqlite3_value **argv
){
  char *pattern = (char*) sqlite3_value_text(argv[0]);
  char *value = (char*) sqlite3_value_text(argv[1]);

  if (value == NULL) {
    sqlite3_result_null(context);
    return;
  }

  char *ret = date_parse(pattern, value);

  if (ret != NULL) {
    sqlite3_result_text(context, ret, -1, free);
  } else {
    sqlite3_result_null(context);
  }
}

As you can see here, we expect 2 arguments where the first argument is the pattern and the second argument is the value. Then, we pass these 2 arguments to the date_parse(...) function which is defined in Rust.

To make it interop with Rust, we will need to declare the function signature first in rust_ext.h as shown below:

char *date_parse(char *pattern, char *value); // Rust function

Then, in Rust, we will write code to parse the date:

extern crate chrono;
extern crate libc;
extern crate regex;

use std::ffi::CString;
use std::mem::{forget, transmute};
use std::ptr::null;

use chrono::{NaiveDate, SecondsFormat, Utc, TimeZone};
use libc::c_char;

fn get_string(input: *mut c_char) -> String {
    let c_str = unsafe { CString::from_raw(input) };
    let s = c_str.to_str().unwrap().to_owned();
    forget(c_str);

    return s;
}

fn to_c_char_pointer(input: String) -> *const c_char {
    let c_str = CString::new(input).unwrap();

    let data: *const CString = unsafe { transmute(&c_str) };
    forget(c_str);
    return unsafe { (&*data).as_ptr() };
}

#[no_mangle]
pub extern "C" fn date_parse(
    pattern: *mut c_char,
    value: *mut c_char,
) -> *const c_char {
    let pattern = get_string(pattern);
    let value = get_string(value);

    return match Utc.datetime_from_str(&value, &pattern) {
        Ok(time) => to_c_char_pointer(time.to_rfc3339_opts(SecondsFormat::Millis, true)),
        Err(_) =>  match NaiveDate::parse_from_str(&value, &pattern) {
            Ok(time) => to_c_char_pointer(time.format("%Y-%m-%d").to_string()),
            Err(_) => null(),
        },
    };
}

The above code is basic Rust code, so I'm not gonna go into the detail. One key detail worth paying attention to is how we handle CString.

Next we set up Cargo to build the Rust code and produce a static library by using the below Cargo.toml:

[package]
name = "ext"
version = "0.0.1"

[lib]
crate-type = ["staticlib"]

[dependencies]
libc = "0.2"
chrono = "0.4"

Since Rust has a great build system, we can just run `cargo build --release` on all platforms (Windows, Mac, and Linux), and it'll produce a static library for each platform. .lib for Windows, and .a for Mac and Linux.

Then, we will compile our C code with the output static library to produce a dynamic library on each platform. Here is the compile command for each platform:

  • Mac: gcc -g -fPIC -dynamiclib ext.c target/release/libext.a -o ext.dylib
  • Linux: gcc -g -fPIC -shared ext.c target/release/libext.a -I c -o ext.so
  • Windows: vcvars64.bat && cl ext.c target\\release\\ext.lib advapi32.lib ws2_32.lib userenv.lib -I c -link -dll -out:ext.dll

Windows is a little weird. First, we'll need to load the C compiler environment by running vcvars64.bat. You can download it from the Microsoft C++ toolset. Then, we'll run the C compiler, cl, and add 3 specific libraries: advapi32.lib ws2_32.lib userenv.lib, which are needed for some reasons.

That is it. Now you can load the extension in your SQLite environment.