My Experience with TimescaleDB Compression

Deepak Pant
2 min read · Mar 6, 2021

TL;DR: I got a response from Michael about the space-inflation problem I am observing; you can see it in the comments. The problem is that my chunks are too small (<1 MB), so segmenting by stock ticker with compress_segmentby makes each segment tiny and adds a lot of overhead at the database level. Either use the orderby option or use set_chunk_time_interval to set a bigger chunk time interval.

Full story.

I took ~28 million data points of stock price data and put them in Postgres.

It was daily data for all tradable stocks, going as far back in history as available. The total size of the data was ~1.8 GB.

The schema looks like this:

CREATE TABLE stock_price_daily_all_tradable (
ticker TEXT NOT NULL,
date TIMESTAMP NOT NULL,
open REAL NOT NULL,
high REAL,
low REAL ,
close REAL,
volume FLOAT,
PRIMARY KEY (date , ticker)
);

I migrated this to a TimescaleDB hypertable following the migration tutorial on the TimescaleDB website.

A short summary of the migration:

CREATE TABLE new_table (LIKE old_table INCLUDING DEFAULTS INCLUDING CONSTRAINTS EXCLUDING INDEXES);
-- Assuming 'time' is the time column for the dataset
SELECT create_hypertable('new_table', 'time');
-- Insert everything from old_table
INSERT INTO new_table SELECT * FROM old_table;
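For my table the time column is date rather than time, so the commands I actually ran would have looked roughly like the sketch below (the table names match the schema above; this is not a copy of my exact session). Note that create_hypertable uses a default chunk_time_interval of 7 days, which, as it turns out, is what produces the many tiny chunks seen later.

-- Sketch of the migration for this schema; 'date' is the time column
CREATE TABLE stock_price_hyper (LIKE stock_price_daily_all_tradable
    INCLUDING DEFAULTS INCLUDING CONSTRAINTS EXCLUDING INDEXES);
SELECT create_hypertable('stock_price_hyper', 'date');
INSERT INTO stock_price_hyper SELECT * FROM stock_price_daily_all_tradable;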

Timescale hypertable size

The hypertable is much bigger than the plain Postgres table: roughly ~2.3 GB, a hit of about ~500 MB of extra space.

daily_price=# SELECT hypertable_detailed_size('stock_price_hyper') ;
(2024120320,264527872,25296896,2313945088,)
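If I read the TimescaleDB 2.x docs correctly, that tuple is (table_bytes, index_bytes, toast_bytes, total_bytes, node_name), so the total here is ~2.3 GB. A sketch of the same query with human-readable sizes:

SELECT pg_size_pretty(table_bytes) AS table_size,
       pg_size_pretty(index_bytes) AS index_size,
       pg_size_pretty(toast_bytes) AS toast_size,
       pg_size_pretty(total_bytes) AS total_size
FROM hypertable_detailed_size('stock_price_hyper');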

Now I thought turning on compression would help, but it didn't; in reality it increased the size.

daily_price=# ALTER TABLE stock_price_hyper SET ( timescaledb.compress,
timescaledb.compress_segmentby = 'ticker'
);
SELECT add_compression_policy('stock_price_hyper', INTERVAL '7 days');
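The policy only compresses chunks older than the given interval, and since all of this data is historical that is effectively every chunk. To compress everything immediately instead of waiting for the background job, something like the sketch below should work (compress_chunk and show_chunks are standard TimescaleDB functions, but check the signature in your version):

-- Compress all existing chunks right away, skipping any that are
-- already compressed
SELECT compress_chunk(c, if_not_compressed => true)
FROM show_chunks('stock_price_hyper') c;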

The total size actually doubled.

daily_price=# SELECT hypertable_detailed_size('stock_price_hyper') ;
(4452524032,238190592,50585600,4741300224,)

All the chunks are already compressed, as the output of the following command shows:

SELECT before_compression_total_bytes, after_compression_total_bytes, node_name
FROM chunk_compression_stats('stock_price_hyper');
606208 | 1105920 |
499712 | 1114112 |
606208 | 1114112 |
606208 | 1114112 |
606208 | 1114112 |
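Every chunk roughly doubles in size after compression. Summing over all chunks confirms the overall inflation; a sketch of an aggregate query over the same view:

SELECT pg_size_pretty(sum(before_compression_total_bytes)) AS before_total,
       pg_size_pretty(sum(after_compression_total_bytes))  AS after_total
FROM chunk_compression_stats('stock_price_hyper');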

The problem here is that my chunks are too small (<1 MB), so segmenting by stock ticker with compress_segmentby makes each segment tiny, which means a lot of per-segment overhead at the database level. Either use the orderby option or use set_chunk_time_interval to make the chunk time interval a bigger value.
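Based on that advice, the fix would look something like the sketch below. Both statements use standard TimescaleDB functions, but I have not re-run the experiment with them yet: set_chunk_time_interval only affects newly created chunks (so the data has to be reloaded for it to matter), and older versions may require decompressing existing chunks before the compression settings can be changed.

-- Use a much larger chunk interval so each chunk holds enough rows;
-- this only applies to chunks created after the call
SELECT set_chunk_time_interval('stock_price_hyper', INTERVAL '5 years');

-- Or drop the per-ticker segmenting and instead order rows inside each
-- compressed batch so that the same ticker's rows sit together
ALTER TABLE stock_price_hyper SET (
    timescaledb.compress,
    timescaledb.compress_segmentby = '',
    timescaledb.compress_orderby = 'ticker, date DESC'
);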
