BigQuery: A guide to managing Cron Expression using UDF

Carlo Nuccio
2 min readFeb 23, 2022

Let’s see how to manage cron expression on BigQuery using User-defined functions.

User-defined functions (UDFs) are a feature of SQL supported by BigQuery that enables a user to create a function using another SQL expression or JavaScript. These functions accept columns of input and perform actions, returning the result of those actions as a value.

Imagine a scenario where you have to check a cron expression with a timestamp value:

Cron expression: 0 7 * * 3
Timestamp: 2021-11-02 09:57:47.558576 UTC

I’ve defined a UDF:

CREATE OR REPLACE FUNCTION `carlonuccio.CronMatch`(
cron STRING,
custom_timestamp TIMESTAMP
) AS ((
with schedule as (
select regexp_substr(cron,'[0-9,*]+',1,1) min,
regexp_substr(cron,'[0-9,*]+',1,2) hour,
regexp_substr(cron,'[0-9,*]+',1,3) day,
regexp_substr(cron,'[0-9,*]+',1,4) mon,
regexp_substr(cron,'[0-9,*]+',1,5) wday
)
SELECT IF (
(min = '*' or cast(extract(MINUTE from custom_timestamp) as string) in UNNEST(SPLIT(min, ',')))
and (hour = '*' or cast(extract(HOUR from custom_timestamp) as string) in UNNEST(SPLIT(hour, ',')))
and (day = '*' or cast(extract(DAY from custom_timestamp) as string) in UNNEST(SPLIT(day, ',')))
and (mon = '*' or cast(extract(MONTH from custom_timestamp) as string) in UNNEST(SPLIT(mon, ',')))
and (wday = '*' or cast(extract(DAYOFWEEK from custom_timestamp) -1 as string) in UNNEST(SPLIT(wday, ','))) ,
1,
0
)
FROM schedule
));

So let’s see in the action:

SELECT carlonuccio.CronMatch('0 7 * * 3', timestamp('2021-11-02 09:57:47.558576 UTC'))0
SELECT carlonuccio.CronMatch('0 7 * * 3', timestamp('2021-11-17 07:00:47.558576 UTC'))1

Let’s try multiple schedules:

SELECT carlonuccio.CronMatch('0 7 2,17 * 3', timestamp('2021-11-17 07:00:47.558576 UTC'))1
SELECT carlonuccio.CronMatch('0 7 2,17 * 3', timestamp('2021-11-16 07:00:47.558576 UTC'))0

You can use it to check if a Cron Job has run:

with cron_table as (
select 1 as job_id, '0 7 * * 3' as cron
),
job_logs as (
select job_id, timestamp_job
from `<project_id>.<dataset>.job_logs`
)
select c.job_id,
j.timestamp_job,
carlonuccio.CronMatch(c.cron, timestamp_job)
from job_logs j join cron_table c on j.job_id = c.job_id

or to calculate next runs:

with cron_table as (
select 1 as job_id, '0 7 * * 3' as cron
),
job_logs as (
select job_id, max(timestamp_job) as last_timestamp_job
from `<project_id>.<dataset>.job_logs`
group by job_id
),
next_minutes as (
SELECT *
FROM UNNEST( GENERATE_TIMESTAMP_ARRAY(TIMESTAMP('2021-01-01 00:00:00'), TIMESTAMP('2022-01-01 00:00:00'), INTERVAL 1 MINUTE)) AS next_timestamp_job
)
select c.job_id,
next_timestamp_job
from next_minutes n, cron_table c join job_logs j on c.job_id = j.job_id
where carlonuccio.CronMatch(c.cron, next_timestamp_job) = 1
and next_timestamp_job > last_timestamp_job
order by 2

I hope you have enjoyed this UDF.

Documentation

  • Implement / and -

Documentation

--

--

Carlo Nuccio

Senior data Engineer. GCP Certified Professional Data Engineer and dbt Certified developer.