My experience with TimescaleDB compression
TL;DR: I got a response from Michael about the space inflation problem I am observing; you can see the response in the comments. The problem is that my chunks are too small (<1 MB), and compress_segmentby by stock ticker makes each segment too small, hence a lot of overhead at the database level. Either use the orderby option instead, or use set_chunk_time_interval to set a bigger chunk time interval.
Full story.
I took ~28 million data points of stock price data and put it in Postgres.
It was daily data for all tradable stocks, going back as far in history as available. The total size of the data was ~1.8 GB.
The schema looks like this:
CREATE TABLE stock_price_daily_all_tradable (
ticker TEXT NOT NULL,
date TIMESTAMP NOT NULL,
open REAL NOT NULL,
high REAL,
low REAL,
close REAL,
volume FLOAT,
PRIMARY KEY (date, ticker)
);
I migrated this to a TimescaleDB hypertable using this tutorial from the Timescale docs.
Short summary of the migration:
CREATE TABLE new_table (LIKE old_table INCLUDING DEFAULTS INCLUDING CONSTRAINTS EXCLUDING INDEXES);
-- Assuming 'time' is the time column for the dataset
SELECT create_hypertable('new_table', 'time');
-- Insert everything from old_table
INSERT INTO new_table SELECT * FROM old_table;
Timescale hypertable size
The hypertable size is much higher than in plain Postgres: roughly ~2.3 GB. That is a big hit in space, about 500 MB extra.
daily_price=# SELECT hypertable_detailed_size('stock_price_hyper');
(2024120320,264527872,25296896,2313945088,)
The fields are table_bytes, index_bytes, toast_bytes, total_bytes, and node_name, so the total is ~2.3 GB.
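To see the chunk-level picture rather than the hypertable total, a per-chunk breakdown can be pulled with chunks_detailed_size (available in TimescaleDB 2.x). A sketch; the exact rows depend on your chunk layout:

```sql
-- Per-chunk size breakdown for the hypertable
SELECT chunk_name, pg_size_pretty(total_bytes) AS total
FROM chunks_detailed_size('stock_price_hyper')
ORDER BY chunk_name;
```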
Now I thought turning on compression would help, but it didn't; in reality it increased the size.
daily_price=# ALTER TABLE stock_price_hyper SET (
    timescaledb.compress,
    timescaledb.compress_segmentby = 'ticker'
);
daily_price=# SELECT add_compression_policy('stock_price_hyper', INTERVAL '7 days');
The total size actually doubled.
daily_price=# SELECT hypertable_detailed_size('stock_price_hyper') ;
(4452524032,238190592,50585600,4741300224,)
All the chunks are already compressed, as shown by the following query:
SELECT before_compression_total_bytes, after_compression_total_bytes, node_name
FROM chunk_compression_stats('stock_price_hyper');
 606208 | 1105920 |
 499712 | 1114112 |
 606208 | 1114112 |
 606208 | 1114112 |
 606208 | 1114112 |
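A rough way to see why segmenting by ticker hurts here: with the default 7-day chunk interval and one row per trading day, each (chunk, ticker) segment holds at most ~5 rows, so the per-segment overhead dominates. A sketch that estimates this for one chunk's span (the date range is hypothetical):

```sql
-- Average rows per ticker inside one week-sized chunk:
-- with daily data this is at most ~5 (one per trading day).
SELECT count(*)::numeric / count(DISTINCT ticker) AS avg_rows_per_ticker
FROM stock_price_hyper
WHERE date >= '2024-01-01' AND date < '2024-01-08';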
The problem here is that my chunks are too small (<1 MB), and compress_segmentby by stock ticker makes each segment too small, hence a lot of overhead at the database level. Either use the orderby option instead, or use set_chunk_time_interval to set a bigger chunk time interval.
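Both fixes can be sketched like this (the interval value is a guess for daily data, not from the original; note that set_chunk_time_interval only affects chunks created after the call, and compression settings can only be changed while no chunks are compressed):

```sql
-- Option 1: much larger chunks, so each ticker's segment has enough rows
SELECT set_chunk_time_interval('stock_price_hyper', INTERVAL '5 years');

-- Option 2: drop segmentby and rely on orderby within compressed batches
ALTER TABLE stock_price_hyper SET (
    timescaledb.compress,
    timescaledb.compress_orderby = 'ticker, date DESC'
);
```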