How to Expand a "Condensed" PostgreSQL Row into Separate Columns

Expand column into multiple columns in postgres

Basically, functions that return a set of rows should be called in the FROM clause. That way you get regular columns in the result set instead of a single record-typed column.

SELECT upd.*
FROM input,
update_record(input) AS upd
WHERE upd.id IS NOT NULL

id | certified
----+-----------
1 | t
3 | t
5 | t
(3 rows)


How to pass table rows to a plpgsql function?

After digging for a couple more weeks, I found the answer: it's LATERAL JOIN.
In my example, the query I need is:

WITH data AS (
    SELECT 1 * i AS a, 2 * i AS b
    FROM GENERATE_SERIES(1, 3, 1) AS i
)
SELECT f.*
FROM data, LATERAL toy_function(a, b) AS f;

which gives the result I was looking for:

 x  | y
----+----
  4 |  2
  5 |  4
  7 |  8
  8 | 16
 10 | 18
 11 | 36
(6 rows)

(NOTE: the LATERAL keyword is optional for function calls in the FROM clause.)

This kind of join was added in PostgreSQL 9.3, and the documentation explicitly mentions this usage: "A common application is providing an argument value for a set-returning function". Also, the runtime of the query is now fine; it no longer takes 3x as long.
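The same FROM-clause pattern exists in other databases. As a rough, runnable illustration of the idea (using SQLite so it works anywhere; the table and data here are invented), SQLite's json_each() is a table-valued function that can be cross-joined against a table in exactly this implicitly-lateral way:

```python
import json
import sqlite3

# Hypothetical table: each row carries a JSON-encoded list in "tags".
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE input (id INTEGER, tags TEXT)")
conn.executemany("INSERT INTO input VALUES (?, ?)",
                 [(1, json.dumps(["a", "b"])), (2, json.dumps(["c"]))])

# json_each() is a table-valued function; referencing input.tags from it
# in the FROM clause is an (implicitly lateral) cross join, analogous to
# calling a set-returning function in the FROM clause in Postgres.
rows = conn.execute("""
    SELECT input.id, j.value
    FROM input, json_each(input.tags) AS j
    ORDER BY input.id, j.key
""").fetchall()
print(rows)
```

Each input row is expanded into one output row per element, with regular columns rather than a composite value.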

Related posts (for reference):

How can you expand a "condensed" PostgreSQL row into separate columns?

What is the difference between LATERAL and a subquery in PostgreSQL?

Call a set-returning function with an array argument multiple times

As for the reason for the increased runtime when wrapping the function call in ().*: it turns out to be caused by a macro-like expansion in the parser, which duplicates the function call once per output column and which doesn't happen when you use a LATERAL join. See here for more details:

How to avoid multiple function evals with the (func()).* syntax in an SQL query?

Alternate output format for psql showing one column per line with column name

I just needed to spend more time staring at the documentation. This command:

\x on

will do exactly what I wanted. Here is some sample output:

select * from dda where u_id=24 and dda_is_deleted='f';
-[ RECORD 1 ]----+---------------------------
dda_id           | 1121
u_id             | 24
ab_id            | 10304
dda_type         | CHECKING
dda_status       | PENDING_VERIFICATION
dda_is_deleted   | f
dda_verify_op_id | 44938
version          | 2
created          | 2012-03-06 21:37:50.585845
modified         | 2012-03-06 21:37:50.593425
c_id             |
dda_nickname     |
dda_account_name |
cu_id            | 1
abd_id           |

Appending data to existing CSV file by columns/rows

The smallest fix, as Shubham P. pointed out, is to make sure you are writing a "line" by including a line break, like '\n':

file = open('dogedata.csv', 'a')
file.write(f'{current_time},{num_of_wallets}\n')
file.close()

The far better fix, as martineau pointed out, is to use the csv module: it's standard, well-tested, and widely used, and importantly it takes care of issues like escaping and quoting characters. It only takes one more line:

import csv

file = open('dogedata.csv', 'a', newline='')
writer = csv.writer(file)
writer.writerow([current_time, num_of_wallets])
file.close()

Note that newline='' was added to the open() call; it tells open() not to translate newlines and instead defers to the writer, which has CSV-specific logic for dealing with newlines.

Also, instead of using string formatting, you just wrap your data in a list ([current_time, num_of_wallets]); the writer will convert everything to strings/text.
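Putting the pieces together, a minimal sketch of the csv-module approach using a with statement (so the file is closed even on error); current_time and num_of_wallets are placeholder values here:

```python
import csv

current_time = "2021-05-01 12:00:00"  # placeholder timestamp
num_of_wallets = 42                   # placeholder count

# newline='' lets the csv module manage line endings itself
with open('dogedata.csv', 'a', newline='') as f:
    writer = csv.writer(f)
    writer.writerow([current_time, num_of_wallets])
```

Each call appends one properly terminated, properly quoted row.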

How to store a list in a column of a database table

No, there is no "better" way to store a sequence of items in a single column. Relational databases are designed specifically to store one value per row/column combination. In order to store more than one value, you must serialize your list into a single value for storage, then deserialize it upon retrieval. There is no other way to do what you're talking about (because what you're talking about is a bad idea that should, in general, never be done).
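If you do go the single-column route anyway, the serialize/deserialize round trip described above might look like this (JSON is one common choice of serialization format; the list here is hypothetical):

```python
import json

favorites = [3, 1, 4, 1, 5]      # hypothetical list to store

# Serialize the list into a single string for the column...
stored = json.dumps(favorites)   # a TEXT value like '[3, 1, 4, 1, 5]'

# ...and deserialize it after reading the row back.
restored = json.loads(stored)
assert restored == favorites
```

Note that the database can no longer index, filter, or join on the individual elements; they are opaque text as far as SQL is concerned.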

I understand that you think it's silly to create another table to store that list, but this is exactly what relational databases do. You're fighting an uphill battle and violating one of the most basic principles of relational database design for no good reason. Since you state that you're just learning SQL, I would strongly advise you to avoid this idea and stick with the practices recommended to you by more seasoned SQL developers.

The principle you're violating is called first normal form, which is the first step in database normalization.

At the risk of oversimplifying things, database normalization is the process of defining your database based upon what the data is, so that you can write sensible, consistent queries against it and be able to maintain it easily. Normalization is designed to limit logical inconsistencies and corruption in your data, and there are a lot of levels to it. The Wikipedia article on database normalization is actually pretty good.

Basically, the first rule (or form) of normalization states that your table must represent a relation. This means that:

  • You must be able to differentiate one row from any other row (in other words, your table must have something that can serve as a primary key). This also means that no row should be duplicated.
  • Any ordering of the data must be defined by the data, not by the physical ordering of the rows (SQL is based upon the idea of a set, meaning that the only ordering you should rely on is the one you explicitly define in your query).
  • Every row/column intersection must contain one and only one value

The last point is obviously the salient point here. SQL is designed to store your sets for you, not to provide you with a "bucket" for you to store a set yourself. Yes, it's possible to do. No, the world won't end. You have, however, already crippled yourself in understanding SQL and the best practices that go along with it by immediately jumping into using an ORM. LINQ to SQL is fantastic, just like graphing calculators are. In the same vein, however, they should not be used as a substitute for knowing how the processes they employ actually work.

Your list may be entirely "atomic" now, and that may not change for this project. But you will, however, get into the habit of doing similar things in other projects, and you'll eventually (likely quickly) run into a scenario where you're now fitting your quick-n-easy list-in-a-column approach where it is wholly inappropriate. There is not much additional work in creating the correct table for what you're trying to store, and you won't be derided by other SQL developers when they see your database design. Besides, LINQ to SQL is going to see your relation and give you the proper object-oriented interface to your list automatically. Why would you give up the convenience offered to you by the ORM so that you can perform nonstandard and ill-advised database hackery?
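For contrast, a minimal sketch of the normalized approach (schema and names invented for illustration, using SQLite so it runs anywhere): the list lives in its own table, one value per row, keyed back to the owner.

```python
import sqlite3

# One person, many hobbies: one value per row/column intersection.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE person_hobby (
        person_id INTEGER REFERENCES person(id),
        hobby TEXT
    );
""")
conn.execute("INSERT INTO person VALUES (1, 'Ada')")
conn.executemany("INSERT INTO person_hobby VALUES (1, ?)",
                 [("chess",), ("hiking",)])

# The "list" is just an ordinary query.
hobbies = [h for (h,) in conn.execute(
    "SELECT hobby FROM person_hobby WHERE person_id = 1 ORDER BY hobby")]
print(hobbies)
```

Every element is now queryable, indexable, and updatable on its own.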

Does excess whitespace in SQL queries affect performance?

No. The additional white space is not going to noticeably affect performance. There might be a little impact in two places:

  • The larger string might be passed through a network. So a really slow network (remember dialups?) might slow it down.
  • The tokenization phase of the compiler has to skip over the white space. String processing is pretty fast these days, and tokenization is only a small fraction of the compiler's work anyway.

The cost of a select query is rarely in the compile phase anyway. Reading the data and processing it is usually where (almost) all the time is spent.

Note: You could trivially create edge cases, such as select 1 followed by or preceded by a million spaces where the compilation would be noticeable. But you would have to be intentionally creating such a query string.
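A quick way to convince yourself (shown here with SQLite, though the same holds for any SQL engine): two queries that differ only in whitespace tokenize to the same statement and return identical results.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(1,), (2,)])

compact = "SELECT x FROM t ORDER BY x"
padded = "SELECT    x\n\n    FROM     t\n    ORDER BY    x"

# Only the tokenizer sees the extra whitespace; the results are identical.
assert conn.execute(compact).fetchall() == conn.execute(padded).fetchall()
```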


