A one-billion-record table in PostgreSQL

I have a simple table in a PostgreSQL database with a character key.

CREATE TABLE public.zero_hids
(
    hexblock character(32) COLLATE pg_catalog."default" NOT NULL,
    hid bigint,
    CONSTRAINT zero_hids_pkey PRIMARY KEY (hexblock)
        USING INDEX TABLESPACE alcal
)

I need to insert 1,000,000,000 unique records into this table from a CSV file. The file can be split into 5 parts of 200 million rows each and loaded from the command line (a sketch of the split itself follows the script below):

#!/bin/bash
time psql --port=5433 -d db1 -c "COPY zero_hids FROM '/tmp/table0.csv' delimiter AS ',' HEADER CSV"
...
time psql --port=5433 -d db1 -c "COPY zero_hids FROM '/tmp/table4.csv' delimiter AS ',' HEADER CSV"
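
One way to do the split itself is with GNU split; this is only a sketch, the source file name zero_hids.csv is an assumption, and the CSV header needs separate handling, since COPY ... HEADER expects a header line in every part:

#!/bin/bash
# Split the source CSV into 5 parts of 200 million lines each,
# named /tmp/table0.csv ... /tmp/table4.csv (GNU coreutils split).
split -l 200000000 -d -a 1 --additional-suffix=.csv zero_hids.csv /tmp/table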

Each part is around 9 GB in size, and the memory limit allowed to PostgreSQL is 32 GB. Here are the results:

COPY 200000000
real 55m16.461s
user 0m0.040s
sys 0m0.008s
COPY 200000000
real 89m48.861s
user 0m0.048s
sys 0m0.016s
COPY 200000000
real 202m21.283s
user 0m0.044s
sys 0m0.024s
COPY 200000000
real 602m34.156s
user 0m0.056s
sys 0m0.008s
COPY 200000000
real 898m45.511s
user 0m0.052s
sys 0m0.016s

In reality, one record uses only about 36 bytes plus its index entry. Between the first and second blocks the load time grew roughly 1.5 times, for the next 200M block it more than doubled, and for the next one (the 4th) it tripled again! The 5th block was then only about 1.5 times slower than the previous one. My guess is that after inserting ~550M records the primary-key index no longer fits in memory, and the database has to go to the hard disk to keep building it.
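
To check the actual footprint, the table and primary-key index sizes can be queried directly with pg_relation_size (port and database name taken from the load script above):

psql --port=5433 -d db1 -c "SELECT pg_size_pretty(pg_relation_size('zero_hids')) AS table_size, pg_size_pretty(pg_relation_size('zero_hids_pkey')) AS index_size"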

In conclusion, set the right memory limits to get good PostgreSQL import performance: I believe that if I had raised the memory limit to 64 GB before starting, the import would have finished in about 10 hours instead of 30.
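
A minimal sketch of how the limit could be raised before the import; the 48GB value and the data directory path are assumptions, the only point being that the growing B-tree index should keep fitting in memory:

#!/bin/bash
# Raise the buffer cache so the primary-key index stays in memory
# (48GB is an assumed value for a 64GB memory budget).
psql --port=5433 -d db1 -c "ALTER SYSTEM SET shared_buffers = '48GB'"
# shared_buffers only takes effect after a restart; the data directory
# path below is a placeholder.
pg_ctl restart -D /path/to/data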
