A one-billion-record table in PostgreSQL
I have a simple table in a PostgreSQL database with a character key:
CREATE TABLE public.zero_hids (
    hexblock character(32) COLLATE pg_catalog."default" NOT NULL,
    hid bigint,
    CONSTRAINT zero_hids_pkey PRIMARY KEY (hexblock) USING INDEX TABLESPACE alcal
);
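(As a side note, not from the original post: the on-disk size of the table and of its primary-key index can be watched during the load with a query like the one below; zero_hids_pkey is the index created by the PRIMARY KEY constraint above.)

-- Sketch: report the current size of the table and its primary-key index.
SELECT pg_size_pretty(pg_relation_size('public.zero_hids'))      AS table_size,
       pg_size_pretty(pg_relation_size('public.zero_hids_pkey')) AS pkey_index_size;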
The task is to insert 1,000,000,000 unique records into this table by loading them from a CSV file. We can split this file into 5 parts of 200 million records each and load them from the command line:
#!/bin/bash
time psql --port=5433 -d db1 -c "COPY zero_hids FROM '/tmp/table0.csv' delimiter AS ',' HEADER CSV"
...
time psql --port=5433 -d db1 -c "COPY zero_hids FROM '/tmp/table4.csv' delimiter AS ',' HEADER CSV"
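(Another aside, not in the original post: after each part, the row count reported by COPY can be cross-checked with a plain count; note that an exact count over hundreds of millions of rows is itself slow.)

-- Sketch: verify how many rows have been loaded so far.
SELECT count(*) FROM zero_hids;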
Each part is about 9 GB in size. PostgreSQL is allowed 32 GB of memory. Here are the results:
COPY 200000000
real    55m16.461s
user    0m0.040s
sys     0m0.008s

COPY 200000000
real    89m48.861s
user    0m0.048s
sys     0m0.016s

COPY 200000000
real    202m21.283s
user    0m0.044s
sys     0m0.024s

COPY 200000000
real    602m34.156s
user    0m0.056s
sys     0m0.008s

COPY 200000000
real    898m45.511s
user    0m0.052s
sys     0m0.016s
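(The post does not say which PostgreSQL setting makes up the 32 GB memory limit; assuming it is shared_buffers, the relevant limits can be inspected as sketched below. maintenance_work_mem is shown as well because it caps the memory used for index builds and VACUUM.)

SHOW shared_buffers;        -- assumption: the 32 GB "memory limit" refers to shared_buffers
SHOW maintenance_work_mem;  -- memory cap for CREATE INDEX and VACUUM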
One record actually takes 36 bytes plus its share of the index. Adding up the timings, the five COPY commands took about 55 + 90 + 202 + 603 + 899 ≈ 1849 minutes, i.e. roughly 30 hours. The time grew about 1.5 times between the first and second blocks, but the third 200M block took more than twice as long as the second, and the fourth took about 3 times longer again. The fifth block, however, was only about 1.5 times slower than the previous one. I think that after inserting roughly 550 million records, the database has to use the hard disk to maintain the index. So if I had increased the memory limit to 64 GB before starting the import, it would have finished in around 10 hours instead of 30.
In conclusion, set memory limits appropriately to get good PostgreSQL performance: I believe this import would have finished in about 10 hours instead of 30 with a 64 GB memory limit.
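(Again an assumption, since the post does not show how the limit would be raised: on PostgreSQL 9.4 or later the change could be made with ALTER SYSTEM, which writes it to postgresql.auto.conf; shared_buffers only takes effect after a server restart.)

-- Sketch: raise the memory limit to the 64 GB suggested above.
ALTER SYSTEM SET shared_buffers = '64GB';  -- assumption: the "memory limit" means shared_buffers
-- Restart the server for the new shared_buffers value to take effect.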