My experience with TimescaleDB compression
TL;DR: I got a response from Michael about the space inflation problem I am observing; you can see the response in the comments. The problem is that my chunks are too small (<1 MB), and compress_segmentby by stock ticker makes each segment too small, hence a lot of overhead at the database level. Either use the orderby option instead, or use set_chunk_time_interval to set a bigger chunk time interval.
Full story.
I took ~28 million data points of stock price data and put it in Postgres.
It was daily data for all tradable stocks, going back as far in history as available. The total size of the data was ~1.8 GB.
The schema looks like this:
CREATE TABLE stock_price_daily_all_tradable (
ticker TEXT NOT NULL,
date TIMESTAMP NOT NULL,
open REAL NOT NULL,
high REAL,
low REAL,
close REAL,
volume FLOAT,
PRIMARY KEY (date, ticker)
);
I migrated this to a TimescaleDB hypertable using this tutorial from the Timescale docs.
Short summary of the migration:
CREATE TABLE new_table (LIKE old_table INCLUDING DEFAULTS INCLUDING CONSTRAINTS EXCLUDING INDEXES);
-- Assuming 'time' is the time column for the dataset
SELECT create_hypertable('new_table', 'time');
-- Insert everything from old_table
INSERT INTO new_table SELECT * FROM old_table;
Timescale hypertable size
The hypertable size is much higher than in plain Postgres: roughly ~2.3 GB. That is a big hit in space, about 500 MB extra.
daily_price=# SELECT hypertable_detailed_size('stock_price_hyper');
(2024120320,264527872,25296896,2313945088,)
The fields are table_bytes, index_bytes, toast_bytes, total_bytes, and node_name, so the total is ~2.3 GB.
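To see the chunk-level picture rather than the hypertable total, a per-chunk breakdown can be pulled with chunks_detailed_size (available in TimescaleDB 2.x). A sketch; the exact rows depend on your chunk layout:

```sql
-- Per-chunk size breakdown for the hypertable
SELECT chunk_name, pg_size_pretty(total_bytes) AS total
FROM chunks_detailed_size('stock_price_hyper')
ORDER BY chunk_name;
```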
Now I thought turning on compression would help, but it didn't; in reality it increased the size.
daily_price=# ALTER TABLE stock_price_hyper SET (
    timescaledb.compress,
    timescaledb.compress_segmentby = 'ticker'
);
daily_price=# SELECT add_compression_policy('stock_price_hyper', INTERVAL '7 days');
The total size actually doubled.
daily_price=# SELECT hypertable_detailed_size('stock_price_hyper') ;
(4452524032,238190592,50585600,4741300224,)
All the chunks are already compressed, as shown by the following query:
SELECT before_compression_total_bytes, after_compression_total_bytes, node_name
FROM chunk_compression_stats('stock_price_hyper');
 606208 | 1105920 |
 499712 | 1114112 |
 606208 | 1114112 |
 606208 | 1114112 |
 606208 | 1114112 |
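A rough way to see why segmenting by ticker hurts here: with the default 7-day chunk interval and one row per trading day, each (chunk, ticker) segment holds at most ~5 rows, so the per-segment overhead dominates. A sketch that estimates this for one chunk's span (the date range is hypothetical):

```sql
-- Average rows per ticker inside one week-sized chunk:
-- with daily data this is at most ~5 (one per trading day).
SELECT count(*)::numeric / count(DISTINCT ticker) AS avg_rows_per_ticker
FROM stock_price_hyper
WHERE date >= '2024-01-01' AND date < '2024-01-08';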
The problem here is that my chunks are too small (<1 MB), and compress_segmentby by stock ticker makes each segment too small, hence a lot of overhead at the database level. Either use the orderby option instead, or use set_chunk_time_interval to set a bigger chunk time interval.
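Both fixes can be sketched like this (the interval value is a guess for daily data, not from the original; note that set_chunk_time_interval only affects chunks created after the call, and compression settings can only be changed while no chunks are compressed):

```sql
-- Option 1: much larger chunks, so each ticker's segment has enough rows
SELECT set_chunk_time_interval('stock_price_hyper', INTERVAL '5 years');

-- Option 2: drop segmentby and rely on orderby within compressed batches
ALTER TABLE stock_price_hyper SET (
    timescaledb.compress,
    timescaledb.compress_orderby = 'ticker, date DESC'
);
```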