Find the Byte Size of a Row in Postgresql

Find the byte size of a row in PostgreSQL

Use pg_column_size and octet_length.

See:

  • How can pg_column_size be smaller than octet_length?
  • How can I find out how big a large TEXT field is in Postgres?
  • How can pg_column_size be smaller than octet_length?

Get size for Bytea column in postgreSQL

You can use the length() function to get the length in bytes:

select length(bytea_column) as filesize
from file_info;

If you want to format the output you can use pg_size_pretty()

select pg_size_pretty(length(bytea_column)) as filesize
from file_info;

That will adapt the "kb", "mb" output based on the size. If you always want MB, you can use:

select concat(length(bytea_column) / 1048576.0, ' MB') as filesize
from file_info;

Making sense of Postgres row sizes

Calculation of row size is much more complex than that.

Storage is typically partitioned in 8 kB data pages. There is a small fixed overhead per page, possible remainders not big enough to fit another tuple, and more importantly dead rows or a percentage initially reserved with the FILLFACTOR setting.

And there is even more overhead per row (tuple): an item identifier of 4 bytes at the start of the page, the HeapTupleHeader of 23 bytes and alignment padding. The start of the tuple header as well as the start of tuple data are aligned at a multiple of MAXALIGN, which is 8 bytes on a typical 64-bit machine. Some data types require alignment to the next multiple of 2, 4 or 8 bytes.

Quoting the manual on the system table pg_tpye:

typalign is the alignment required when storing a value of this type.
It applies to storage on disk as well as most representations of the
value inside PostgreSQL. When multiple values are stored
consecutively, such as in the representation of a complete row on
disk, padding is inserted before a datum of this type so that it
begins on the specified boundary. The alignment reference is the
beginning of the first datum in the sequence.

Possible values are:

  • c = char alignment, i.e., no alignment needed.

  • s = short alignment (2 bytes on most machines).

  • i = int alignment (4 bytes on most machines).

  • d = double alignment (8 bytes on many machines, but by no means all).

Read about the basics in the manual here.

Your example

This results in 4 bytes of padding after your 3 integer columns, because the timestamp column requires double alignment and needs to start at the next multiple of 8 bytes.

So, one row occupies:

   23   -- heaptupleheader
+ 1 -- padding or NULL bitmap
+ 12 -- 3 * integer (no alignment padding here)
+ 4 -- padding after 3rd integer
+ 8 -- timestamp
+ 0 -- no padding since tuple ends at multiple of MAXALIGN

Plus item identifier per tuple in the page header (as pointed out by @A.H. in the comment):

 +  4   -- item identifier in page header
------
= 52 bytes

So we arrive at the observed 52 bytes.

The calculation pg_relation_size(tbl) / count(*) is a pessimistic estimation. pg_relation_size(tbl) includes bloat (dead rows) and space reserved by fillfactor, as well as overhead per data page and per table. (And we didn't even mention compression for long varlena data in TOAST tables, since it doesn't apply here.)

You can install the additional module pgstattuple and call SELECT * FROM pgstattuple('tbl_name'); for more information on table and tuple size.

Related:

  • Table size with page layout
  • Calculating and saving space in PostgreSQL

Determine size of a table or selected records in terms of bytes/megabytes

For the table:

SELECT  pg_size_pretty(pg_total_relation_size('name_of_tbl'));

For a particular table row and/or column in it:

SELECT pg_column_size(t) AS row_size_on_disk
pg_column_size(t.column1) AS column1_size_on_disk
FROM tbl t
WHERE tbl_id = ?;

See:

  • Measure the size of a PostgreSQL table row
  • What is the overhead for varchar(n)?

Get PostgreSql table size on disk (byte) in ORMlite

int result = OGGETTO_DAO.executeRaw(SQL);

Yeah that's not right. Looking at the javadocs for executeRaw(...) they say that it returns the number of rows affected not the result.

SQL = "SELECT pg_table_size('" + TABLE_NAME + "');" ;
GenericRawResults ARRAY = OGGETTO_DAO.queryRaw(SQL);
String result= ARRAY.getFirstResult();

Looking at the javadocs for queryRaw(...), the problem here is it returns a GenericRawResults<String[]> and not <String>. It returns a collection of raw results, each row being represented by a string array. I'm really surprised that your code even compiles.

It should be:

GenericRawResults<String[]> ARRAY = OGGETTO_DAO.queryRaw(SQL);
String result= ARRAY.getFirstResult()[0];

Probably the best way to do this is to use queryRawValue(...) which performs a raw query and returns a single value.

// throws an exception if there are no results or if the first one isn't a number
long size = OGGETTO_DAO.queryRawValue(SQL);

PostgreSQL column size vs table size

From https://www.postgresql.org/docs/14/functions-admin.html

pg_table_size - Computes the disk space used by the specified table,
excluding indexes (but including its TOAST table if any, free space
map, and visibility map).

So pg_table_size gives you the amount of disk postgres is using for the table and for some metadata that postgres keeps about the table (Visibility Map and Free Space Map). Deleting a row will not decrease this number (unless you do a VACUUM FULL), so we wouldn't expect the disk used by a table to match the sum of the data in each visible row. Instead, the disk used by a table would be larger.

pg_column_size - Shows the number of bytes used to store any individual
data value. If applied directly to a table column value, this reflects
any compression that was done.

So this returns the size of each row on disk (including the row-header information stored on disk).

I'm not sure whether you'd consider the row header information 'real', but it does take up space on your harddrive, so whether this is correct or not depends on your use case.

Using an example table from a database I have:

SELECT pg_table_size('users')
-- 3751936 <-- the size of my 'users' table on disk, including table meta-data
SELECT (SUM(pg_column_size(t.*))) FROM users AS t;
-- 3483028 <-- the total size on disk of the visible rows, including row "header" metadata.
SELECT (SUM(pg_column_size(t.*)-24)) FROM users AS t;
-- 3069412 <-- the size of the data in visible rows, excluding row "header" meta-data

We'd expect each of these queries to return different numbers, and each is useful for a different purpose.

As for the specific numbers that you've posted (with the pg_column_size being larger than the pg_table_size) I can't explain.

How to determine how much space 1 row will take in Postgres db?

That "something inside me says that can't be right" is wrong. Actually trying id determine the size of each row is impractical. You can calculate the average row, and given a large number of rows the better that average get. Part of that reason is variable length columns. Your definition varchar(50) does not required bytes of storage unless unless it contains 50 bytes, if it has 20 then it only takes up 20 bytes (plus overhead), even then it's not exact as the padding may change. The definition only specifies the Maximum not the actual, storage is on actual.

As far a your 24kb that doesn't seem out-of-line at all. Keep in mind that physical I/O is the slowest possible individual operation and trying to adjust to individual rows for I/O would bring your system to a screeching halt. Postgres therefore only reads in full blocks (and allocates space the same), and/or multiple blocks. Typically with a block size of 8K (8192 bytes). This is the trade off I/O performance vs. space allocation. It appears your system has a multi-block read of 3 blocks (??). If anything is surprising it would that is is that small.

In short trying to get the size of a row not piratical, instead get several hundred representative rows and calculate the average.

BTW you can change the length just by rearranging your columns:

1 row =
23 (heaptupleheader)
+ 1 (padding)
+ 8 (id)
+ 8 (size)
+ 8 (timestamp)
+ 8 (timestamp)
+ 50 (content_id)
+ 2 (padding) (if content contains all 50 char)
+ 100 (type) (if type contains all 100 char)
= 208 bytes

Get size of large object in PostgreSQL query?

Not that I've used large objects, but looking at the docs: http://www.postgresql.org/docs/current/interactive/lo-interfaces.html#LO-TELL

I think you have to use the same technique as some file system APIs require: seek to the end, then tell the position. PostgreSQL has SQL functions that appear to wrap the internal C functions. I couldn't find much documentation, but this worked:

CREATE OR REPLACE FUNCTION get_lo_size(oid) RETURNS bigint
VOLATILE STRICT
LANGUAGE 'plpgsql'
AS $$
DECLARE
fd integer;
sz bigint;
BEGIN
-- Open the LO; N.B. it needs to be in a transaction otherwise it will close immediately.
-- Luckily a function invocation makes its own transaction if necessary.
-- The mode x'40000'::int corresponds to the PostgreSQL LO mode INV_READ = 0x40000.
fd := lo_open($1, x'40000'::int);
-- Seek to the end. 2 = SEEK_END.
PERFORM lo_lseek(fd, 0, 2);
-- Fetch the current file position; since we're at the end, this is the size.
sz := lo_tell(fd);
-- Remember to close it, since the function may be called as part of a larger transaction.
PERFORM lo_close(fd);
-- Return the size.
RETURN sz;
END;
$$;

Testing it:

-- Make a new LO, returns an OID e.g. 1234567
SELECT lo_create(0);

-- Populate it with data somehow
...

-- Get the length.
SELECT get_lo_size(1234567);

It seems the LO functionality is designed to be used mostly through the client or through low-level server programming, but at least they've provided some SQL visible functions for it, which makes the above possible. I did a query for SELECT relname FROM pg_proc where relname LIKE 'lo%' to get myself started. Vague memories of C programming and a bit of research for the mode x'40000'::int and SEEK_END = 2 value were needed for the rest!



Related Topics



Leave a reply



Submit