Trim trailing spaces with PostgreSQL
There are many different invisible characters. Many of them have the property WSpace=Y ("whitespace") in Unicode. But some special characters are not considered "whitespace" and still have no visible representation. The excellent Wikipedia articles about space (punctuation) and whitespace characters should give you an idea.
<rant>Unicode sucks in this regard: introducing lots of exotic characters that mainly serve to confuse people.</rant>
The standard SQL trim() function by default only trims the basic Latin space character (Unicode: U+0020 / ASCII 32). Same with the rtrim() and ltrim() variants. Your call also only targets that particular character.
Use regular expressions with regexp_replace() instead.
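A minimal sketch of the difference on a made-up literal, assuming standard_conforming_strings is on (the default since PostgreSQL 9.1) so that '\s' reaches the regex engine unchanged. rtrim() stops at the trailing newline, while regexp_replace() removes the whole run of white space.

```sql
-- rtrim() only strips U+0020: the trailing tab and newline stay put
SELECT rtrim(E'abc \t\n') = E'abc \t\n'               AS rtrim_left_it_alone,
       -- \s matches space, tab, newline (and more, depending on the locale)
       regexp_replace(E'abc \t\n', '\s+$', '') = 'abc' AS regexp_removed_all;
-- both columns return true
```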
Trailing
To remove all trailing white space (but not white space inside the string):
SELECT regexp_replace(eventdate, '\s+$', '') FROM eventdates;
The regular expression explained:
\s ... regular expression class shorthand for [[:space:]], which is the set of white-space characters (see limitations below)
+ ... 1 or more consecutive matches
$ ... end of string
Demo:
SELECT regexp_replace('inner white ', '\s+$', '') || '|'
Returns:
inner white|
Yes, that's a single backslash (\). Details in this related answer:
- SQL select where column begins with \
Leading
To remove all leading white space (but not white space inside the string):
regexp_replace(eventdate, '^\s+', '')
^ ... start of string
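A small self-contained check of the leading-space pattern on a literal (the string is just an example). Only the white space at the start goes; the inner and trailing blanks survive.

```sql
-- '^\s+' is anchored at the start: inner and trailing blanks are kept
SELECT '|' || regexp_replace(E' \t inner white ', '^\s+', '') || '|' AS result;
-- returns: |inner white |
```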
Both
To remove both, you can chain the above function calls:
regexp_replace(regexp_replace(eventdate, '^\s+', ''), '\s+$', '')
Or you can combine both in a single call with two branches. Add 'g' as the 4th parameter to replace all matches, not just the first:
regexp_replace(eventdate, '^\s+|\s+$', '', 'g')
But that should typically be faster with substring():
substring(eventdate, '\S(?:.*\S)*')
\S ... everything but white space
(?:re) ... non-capturing set of parentheses
.* ... any string of 0-n characters
Or one of these:
substring(eventdate, '^\s*(.*\S)')
substring(eventdate, '(\S.*\S)') -- only works for 2+ printing characters
(re) ... capturing set of parentheses
Effectively takes the first non-whitespace character and everything up to the last non-whitespace character if available.
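The variants above can be compared side by side on a few sample strings (invented for illustration and supplied with VALUES instead of the eventdates table). One difference worth knowing: when a string contains nothing but white space, regexp_replace() returns an empty string, while substring() finds no match and returns NULL. The '(\S.*\S)' variant also returns NULL for a single printing character.

```sql
SELECT s                                                          AS input,
       regexp_replace(regexp_replace(s, '^\s+', ''), '\s+$', '') AS chained,
       regexp_replace(s, '^\s+|\s+$', '', 'g')                   AS one_call,
       substring(s, '\S(?:.*\S)*')                               AS via_substring,
       substring(s, '(\S.*\S)')                                  AS two_plus_only
FROM  (VALUES (E'  a b\t'), ('x'), ('   ')) AS t(s);
-- '  a b<tab>' -> 'a b' in all four columns
-- 'x'          -> 'x', 'x', 'x', NULL
-- '   '        -> '', '', NULL, NULL
```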
Whitespace?
There are a few more related characters which are not classified as "whitespace" in Unicode, so they are not contained in the character class [[:space:]].
These print as invisible glyphs in pgAdmin for me: "mongolian vowel", "zero width space", "zero width non-joiner", "zero width joiner":
SELECT E'\u180e', E'\u200B', E'\u200C', E'\u200D';
'' | '' | '' | ''
Two more, printing as visible glyphs in pgAdmin, but invisible in my browser: "word joiner", "zero width non-breaking space":
SELECT E'\u2060', E'\uFEFF';
'' | ''
Ultimately, whether characters are rendered invisible or not also depends on the font used for display.
To remove all of these as well, replace '\s' with '[\s\u180e\u200B\u200C\u200D\u2060\uFEFF]' or '[\s]' (note trailing invisible characters!).
For example, instead of:
regexp_replace(eventdate, '\s+$', '')
use:
regexp_replace(eventdate, '[\s\u180e\u200B\u200C\u200D\u2060\uFEFF]+$', '')
or:
regexp_replace(eventdate, '[\s]+$', '') -- note invisible characters
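A sketch that flags and strips such characters, assuming the database encoding is UTF8 (required for the E'\uXXXX' escapes in the sample data). It relies on \s being allowed inside a bracket expression in PostgreSQL's regex flavor. Whether plain '\s' would also catch these characters depends on the locale, so the example only relies on the explicit list.

```sql
-- Assumes server_encoding = UTF8 (needed for the E'\uXXXX' escapes)
SELECT s ~ '[\u180e\u200B\u200C\u200D\u2060\uFEFF]'                      AS has_invisible,
       regexp_replace(s, '[\s\u180e\u200B\u200C\u200D\u2060\uFEFF]+$', '') AS trimmed
FROM  (VALUES (E'abc \u200B\uFEFF'), ('plain ')) AS t(s);
-- first row:  true,  'abc'
-- second row: false, 'plain'
```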
Limitations
There is also the POSIX character class [[:graph:]], supposed to represent "visible characters". Example:
substring(eventdate, '([[:graph:]].*[[:graph:]])')
It works reliably for ASCII characters in every setup (where it boils down to [\x21-\x7E]), but beyond that you currently (incl. pg 10) depend on information provided by the underlying OS (to define ctype) and possibly locale settings.
Strictly speaking, that's the case for every reference to a character class, but there seems to be more disagreement with the less commonly used ones like graph. You may also have to add more characters to the character class [[:space:]] (shorthand \s) to catch all whitespace characters. For example, \u2007, \u202f and \u00a0 also seemed to be missing for @XiCoN JFS.
The manual:
Within a bracket expression, the name of a character class enclosed in [: and :] stands for the list of all characters belonging to that class. Standard character class names are: alnum, alpha, blank, cntrl, digit, graph, lower, punct, space, upper, xdigit. These stand for the character classes defined in ctype. A locale can provide others.
Bold emphasis mine.
Also note this limitation that was fixed with Postgres 10:
Fix regular expressions' character class handling for large character codes, particularly Unicode characters above U+7FF (Tom Lane)
Previously, such characters were never recognized as belonging to locale-dependent character classes such as [[:alpha:]].
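Two quick checks for this section, again assuming a UTF8 database: [[:graph:]] applied to plain ASCII input, and \s extended by hand with the no-break space (U+00A0), figure space (U+2007) and narrow no-break space (U+202F) mentioned above. Listing those characters explicitly makes the result independent of how the locale classifies them.

```sql
-- ASCII input: [[:graph:]] behaves the same in every setup
SELECT substring('  ab c  ', '([[:graph:]].*[[:graph:]])') AS graph_trim;
-- returns: ab c

-- Assumes UTF8: list U+00A0, U+2007, U+202F explicitly next to \s
SELECT regexp_replace(E'abc\u00A0\u2007\u202F', '[\s\u00a0\u2007\u202f]+$', '') AS trimmed;
-- returns: abc
```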
Prevent trailing spaces during insert?
Use the PostgreSQL trim() function.
There are trim(), rtrim() and ltrim().
To trim trailing spaces:
...
rtrim(b.acct_type_desc) as acct_desc,
...
If acct_type_desc is not of type text or varchar, cast it to text first:
...
rtrim(b.acct_type_desc::text) as acct_desc,
...
If acct_type_desc is of type char(n), casting it to text removes trailing spaces automatically, no trim() necessary.
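A sketch of these options on literals (the values are made up; acct_type_desc is not involved). trim(trailing from ...) is the SQL-standard spelling of rtrim(). For char(n), the padding spaces are treated as semantically insignificant, which is why the cast to text drops them and length() does not count them.

```sql
SELECT rtrim('abc   '::varchar)     AS rtrim_varchar,  -- 'abc'
       trim(trailing from 'abc   ') AS sql_standard,   -- 'abc'
       'abc   '::char(6)::text      AS char_to_text,   -- 'abc': padding dropped by the cast
       length('abc   '::char(6))    AS char_length;    -- 3: trailing spaces not counted
```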
How to trim/remove leading/left white spaces from a text file using Windows Batch?
Have a look at the -t and -A psql parameters:
-t removes headers and footers from the results
-A switches off aligned mode (which is most likely where your whitespace is coming from: alignment into columns).
So the command should look something like the following:
psql -d databasename -p port -U username -t -A -f filename -o "C:\text.txt"
So, basically, you shouldn't need to modify the resulting file - you can modify your psql command to get results in a format you want.
How to trim trailing spaces from every column in all tables in a PostgreSQL database
The following query returns all tables and their text columns, which may or may not have trailing spaces.
NOTE: I'm assuming that all tables have the tbl_ prefix.
select
table_name,COLUMN_NAME
from
INFORMATION_SCHEMA.COLUMNS
where
table_name LIKE 'tbl_%' and (data_type='text' or data_type='character varying')
To get the UPDATE query for all tables, use the following select:
select
'UPDATE '||quote_ident(c.table_name)||' SET '||quote_ident(c.COLUMN_NAME)||'=TRIM('||quote_ident(c.COLUMN_NAME)||')
WHERE '||quote_ident(c.COLUMN_NAME)||' ILIKE ''% '' ' as script
from (
select
table_name,COLUMN_NAME
from
INFORMATION_SCHEMA.COLUMNS
where
table_name LIKE 'tbl_%' and (data_type='text' or data_type='character varying')
) c
This returns rows like UPDATE tbl_sale SET product=TRIM(product) WHERE product ILIKE '% ', one UPDATE statement per column in each table.
Finally, use this method to update all columns in a database that have trailing spaces:
do $$
declare
selectrow record;
begin
for selectrow in
select
'UPDATE '||quote_ident(c.table_name)||' SET '||quote_ident(c.COLUMN_NAME)||'=TRIM('||quote_ident(c.COLUMN_NAME)||') WHERE '||quote_ident(c.COLUMN_NAME)||' ILIKE ''% '' ' as script
from (
select
table_name,COLUMN_NAME
from
INFORMATION_SCHEMA.COLUMNS
where
table_name LIKE 'tbl_%' and (data_type='text' or data_type='character varying' )
) c
loop
execute selectrow.script;
end loop;
end;
$$;
Wrap the above method into a function, so that it is more convenient to use in the future:
create function rm_trail_spaces() returns void as
$$
declare
selectrow record;
begin
for selectrow in
select
'UPDATE '||quote_ident(c.table_name)||' SET '||quote_ident(c.COLUMN_NAME)||'=TRIM('||quote_ident(c.COLUMN_NAME)||') WHERE '||quote_ident(c.COLUMN_NAME)||' ILIKE ''% '' ' as script
from (
select
table_name,COLUMN_NAME
from
INFORMATION_SCHEMA.COLUMNS
where
table_name LIKE 'tbl_%' and (data_type='text' or data_type='character varying' )
) c
loop
execute selectrow.script;
end loop;
end;
$$
language plpgsql;
Usage: SELECT rm_trail_spaces();
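A variant of the same idea, offered as a sketch rather than a drop-in replacement: format() with %I quotes every identifier (schema, table and column, including mixed-case names), rtrim() touches only trailing spaces (the code above uses TRIM(), which also strips leading ones), and 'tbl\_%' escapes the underscore, which LIKE otherwise treats as a single-character wildcard. Joining information_schema.tables and keeping only BASE TABLE skips views. The table tbl_demo and its rows are hypothetical; the DO block updates every matching tbl_ table the current user can see, so try it on a test database first.

```sql
-- Hypothetical demo table following the tbl_ prefix convention above
DROP TABLE IF EXISTS tbl_demo;
CREATE TABLE tbl_demo (product text, "Mixed Case" varchar(20), qty int);
INSERT INTO tbl_demo VALUES ('apple  ', 'x ', 1), ('pear', ' y', 2);

DO $$
DECLARE
  col record;
BEGIN
  FOR col IN
    SELECT table_schema, table_name, column_name
    FROM   information_schema.columns
    JOIN   information_schema.tables USING (table_schema, table_name)
    WHERE  table_type = 'BASE TABLE'                  -- skip views
    AND    table_schema NOT IN ('pg_catalog', 'information_schema')
    AND    table_name LIKE 'tbl\_%'                   -- \_ = literal underscore
    AND    data_type IN ('text', 'character varying')
  LOOP
    -- %I quotes identifiers as needed, %L quotes the LIKE pattern as a literal;
    -- rtrim() removes trailing U+0020 spaces only
    EXECUTE format('UPDATE %I.%I SET %I = rtrim(%I) WHERE %I LIKE %L',
                   col.table_schema, col.table_name,
                   col.column_name, col.column_name, col.column_name, '% ');
  END LOOP;
END
$$;

SELECT * FROM tbl_demo ORDER BY qty;
-- ('apple', 'x', 1) and ('pear', ' y', 2): the leading space is kept
```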
How to show leading/trailing whitespace in a PostgreSQL column?
If you don't mind substituting all whitespace characters whether or not they are leading/trailing, something like the following will do it:
SELECT REPLACE(REPLACE(REPLACE(REPLACE(txt, ' ', '_'),
E'\t', '\t'),
E'\r', '\r'),
E'\n', '\n') AS txt
FROM test;
This is using an underscore to mark the spaces but of course you are free to choose your own. See SQL fiddle demo.
If you strictly only want to show the leading/trailing ones, it gets more complex, but if this is really desired, something may be possible using regexp_replace.
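One possible approach for showing only the leading/trailing part, as a sketch: wrap the value in delimiters and count the white-space characters at each end with substring(). The VALUES list stands in for the test table; '^\s*' and '\s*$' match an empty string when there is nothing to report, so the counts come out as 0 rather than NULL.

```sql
SELECT '|' || txt || '|'               AS delimited,
       length(substring(txt, '^\s*')) AS leading_ws,
       length(substring(txt, '\s*$')) AS trailing_ws
FROM  (VALUES ('  two leading'), (E'tab trailing\t'), ('none')) AS test(txt);
-- leading_ws / trailing_ws per row: 2/0, 0/1, 0/0
```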
PostgreSQL regexp_replace() to keep just one whitespace
SELECT trim(regexp_replace(col_name, '\s+', ' ', 'g')) as col_name FROM table_name;
Or, in case of an UPDATE:
UPDATE table_name SET col_name = trim(regexp_replace(col_name, '\s+', ' ', 'g'));
The regexp_replace flags are described in the POSIX Regular Expressions section of the manual.
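A short demonstration of why the 'g' flag matters (sample strings invented for illustration): without it only the first run of white space is collapsed. After the global replacement every run is a single space, so the plain trim() is enough to clean up both ends.

```sql
SELECT regexp_replace('a  b  c', '\s+', ' ')                  AS first_only,  -- 'a b  c'
       regexp_replace('a  b  c', '\s+', ' ', 'g')             AS all_runs,    -- 'a b c'
       trim(regexp_replace(E'  a \t\n b  ', '\s+', ' ', 'g')) AS collapsed;   -- 'a b'
```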