Trimmining a Column with Bad Data

Trimmining a column with bad data

Use CHARINDEX(), LEN() and RIGHT() instead.

RIGHT(LPNumber, LEN(LPNumber) - CHARINDEX('#', LPNumber, 0))

Is there a way to TRIM all data in a SELECT * FROM statement?

You need to specify each string column by hand:

SELECT TRIM(col1),       --LTRIM(RTRIM(...)) If RDBMS is SQL Server
       TRIM(col2),
       TRIM(col3),
       TRIM(col4)
      -- ...
FROM table

There is another problem with your proposal. * is placeholder for each column in table so there will be problem with trimming date/decimal/spatial data ....

Addendum

Using Oracle 18c Polymorphic Table Functions(provided code is just PoC, there is a space for a lot of improvements):

CREATE TABLE tab(id INT, d DATE,
                 v1 VARCHAR2(100), v2 VARCHAR2(100), v3 VARCHAR2(100) );

INSERT INTO tab(id, d,v1, v2, v3) 
VALUES (1, SYSDATE, '  aaaa   ', '     b   ', '  c');
INSERT INTO tab(id, d,v1, v2, v3)
VALUES (2, SYSDATE+1, '   afasd', '    ', '  d');
COMMIT;

SELECT * FROM tab;
-- Output
.----.-----------.-----------.-----------.-----.
| ID |     D     |    V1     |    V2     | V3  |
:----+-----------+-----------+-----------+-----:
|  1 | 02-MAR-18 |   aaaa    |      b    |   c |
:----+-----------+-----------+-----------+-----:
|  2 | 03-MAR-18 |     afasd |           |   d |
'----'-----------'-----------'-----------'-----'

And table function:

CREATE OR REPLACE PACKAGE ptf AS
  FUNCTION describe(tab IN OUT dbms_tf.table_t)RETURN dbms_tf.describe_t;      
  PROCEDURE FETCH_ROWS;
END ptf; 
/
CREATE OR REPLACE PACKAGE BODY  ptf AS
  FUNCTION describe(tab IN OUT dbms_tf.table_t) RETURN dbms_tf.describe_t AS
  new_cols DBMS_TF.COLUMNS_NEW_T;
  BEGIN
    FOR i IN 1 .. tab.column.count LOOP
        IF tab.column(i).description.type IN ( dbms_tf.type_varchar2) THEN
           tab.column(i).pass_through:=FALSE;
           tab.column(i).for_read:= TRUE;
           NEW_COLS(i) :=
              DBMS_TF.COLUMN_METADATA_T(name=> tab.column(i).description.name,
                                       type => tab.column(i).description.type);
        END IF;
    END LOOP;
    RETURN DBMS_TF.describe_t(new_columns=>new_cols, row_replication=>true);
    END;
  
  PROCEDURE FETCH_ROWS AS
    inp_rs DBMS_TF.row_set_t;
    out_rs DBMS_TF.row_set_t;
    rows   PLS_INTEGER;
  BEGIN 
    DBMS_TF.get_row_set(inp_rs, rows);
    FOR c IN 1 .. inp_rs.count() LOOP
      FOR r IN 1 .. rows LOOP
          out_rs(c).tab_varchar2(r) := TRIM(inp_rs(c).tab_varchar2(r));
      END LOOP;
    END LOOP;
    DBMS_TF.put_row_set(out_rs, replication_factor => 1);
  END;
END ptf;

And final call:

CREATE OR REPLACE FUNCTION trim_col(tab TABLE)
RETURN TABLE pipelined row polymorphic USING ptf;

SELECT *
FROM trim_col(tab);    -- passing table as table function argument

.----.-----------.-------.-----.----.
| ID |     D     |  V1   | V2  | V3 |
:----+-----------+-------+-----+----:
|  1 | 02-MAR-18 | aaaa  | b   | c  |
:----+-----------+-------+-----+----:
|  2 | 03-MAR-18 | afasd |  -  | d  |
'----'-----------'-------'-----'----'

db<>fiddle demo

Trim all database fields

No cursors. Copy and paste the output. Works also for SQL 2000, which doesn't have varchar(max). This can be easily extended to add a GO line to the end of each UPDATE if desired.

SELECT  SQL
FROM    (       SELECT  t.TABLE_CATALOG
                 ,      t.TABLE_SCHEMA
                 ,      t.TABLE_NAME
                 ,      0               SORT
                 ,      'UPDATE  ' + QUOTENAME(t.TABLE_CATALOG) + '.' + QUOTENAME(t.TABLE_SCHEMA) + '.' + QUOTENAME(t.TABLE_NAME)       SQL
                FROM    INFORMATION_SCHEMA.TABLES       t
                 JOIN   INFORMATION_SCHEMA.COLUMNS      c
                  ON    t.TABLE_CATALOG = c.TABLE_CATALOG
                   AND  t.TABLE_SCHEMA  = c.TABLE_SCHEMA
                   AND  t.TABLE_NAME    = c.TABLE_NAME
                WHERE   t.TABLE_TYPE    = 'BASE TABLE'
                 AND    c.DATA_TYPE     IN ('char','nchar','varchar','nvarchar')
                GROUP BY        t.TABLE_CATALOG
                 ,              t.TABLE_SCHEMA
                 ,              t.TABLE_NAME
                UNION ALL
                SELECT  x.TABLE_CATALOG
                 ,      x.TABLE_SCHEMA
                 ,      x.TABLE_NAME
                 ,      CASE    WHEN    x.COLUMN_NAME_MIN       = y.COLUMN_NAME
                                THEN    1
                                ELSE    2
                        END                                                     SORT
                 ,      CASE    WHEN    x.COLUMN_NAME_MIN       = y.COLUMN_NAME
                                THEN    'SET     '
                                ELSE    ' ,      '
                        END + y.SQL                                             SQL
                FROM    (       SELECT  t.TABLE_CATALOG
                                 ,      t.TABLE_SCHEMA
                                 ,      t.TABLE_NAME
                                 ,      MIN(c.COLUMN_NAME)      COLUMN_NAME_MIN
                                FROM    INFORMATION_SCHEMA.TABLES       t
                                 JOIN   INFORMATION_SCHEMA.COLUMNS      c
                                  ON    t.TABLE_CATALOG = c.TABLE_CATALOG
                                   AND  t.TABLE_SCHEMA  = c.TABLE_SCHEMA
                                   AND  t.TABLE_NAME    = c.TABLE_NAME
                                WHERE   t.TABLE_TYPE    = 'BASE TABLE'
                                 AND    c.DATA_TYPE     IN ('char','nchar','varchar','nvarchar')
                                GROUP BY        t.TABLE_CATALOG
                                 ,              t.TABLE_SCHEMA
                                 ,              t.TABLE_NAME
                        )       x
                 JOIN   (       SELECT  t.TABLE_CATALOG
                                 ,      t.TABLE_SCHEMA
                                 ,      t.TABLE_NAME
                                 ,      c.COLUMN_NAME
                                 ,      QUOTENAME(c.COLUMN_NAME) + ' = LTRIM(RTRIM(' + QUOTENAME(c.COLUMN_NAME) + '))'  SQL
                                FROM    INFORMATION_SCHEMA.TABLES       t
                                 JOIN   INFORMATION_SCHEMA.COLUMNS      c
                                  ON    t.TABLE_CATALOG = c.TABLE_CATALOG
                                   AND  t.TABLE_SCHEMA  = c.TABLE_SCHEMA
                                   AND  t.TABLE_NAME    = c.TABLE_NAME
                                WHERE   t.TABLE_TYPE    = 'BASE TABLE'
                                 AND    c.DATA_TYPE     IN ('char','nchar','varchar','nvarchar')
                        )       y
                  ON    x.TABLE_CATALOG = y.TABLE_CATALOG
                   AND  x.TABLE_SCHEMA  = y.TABLE_SCHEMA
                   AND  x.TABLE_NAME    = y.TABLE_NAME
        )       x
ORDER BY        x.TABLE_CATALOG
 ,              x.TABLE_SCHEMA
 ,              x.TABLE_NAME
 ,              x.SORT
 ,              x.SQL

when we need trim when we don't need in oracle

Looking at the Oracle Documentation on literals:

Text literals have properties of both the CHAR and VARCHAR2 datatypes:
Within expressions and conditions, Oracle treats text literals as though they have the datatype CHAR by comparing them using blank-padded comparison semantics.

and the documentation of blank-padded comparison semantics states:

With blank-padded semantics, if the two values have different lengths, then Oracle first adds blanks to the end of the shorter one so their lengths are equal. Oracle then compares the values character by character up to the first character that differs. The value with the greater character in the first differing position is considered greater. If two values have no differing characters, then they are considered equal. This rule means that two values are equal if they differ only in the number of trailing blanks. Oracle uses blank-padded comparison semantics only when both values in the comparison are either expressions of datatype CHAR, NCHAR, text literals, or values returned by the USER function.

So, for your examples:

SELECT * FROM AAA WHERE T1 = 'A'

Since the left-hand side of the comparison, T1 which equals 'A ', is a CHAR(3) and the right-hand side is a text literal 'A' then blank-padded comparison semantics are used and 'A ' = 'A' is true.

Then:

SELECT  * FROM AAA WHERE T2 = 'A  '

The left-hand side of the comparison, T2 which equals 'A ', is a VARCHAR2(3) and the right-hand side is a text literal 'A ' then blank-padded comparison semantics are not used (both sides are not CHAR or text literals) but the values are equal ('A ' = 'A ') so the condition is true.

And:

SELECT  * FROM AAA WHERE T2 = 'A    '

The LHS is a VARCHAR2 equal to 'A ' and the RHS is a text literal equal to 'A '; blank-padded comparison semantics will not be used and the values are not equal so the condition is not true and the row will not be returned.

Then:

SELECT  * FROM AAA WHERE TRIM(T2) = TRIM('A   ')

The TRIM() function states that:

The string returned is of VARCHAR2 datatype if trim_source is a character datatype and a LOB if trim_source is a LOB datatype.

Both values are trimmed so you have the condition 'A' = 'A' where both sides are a VARCAHR2 data type so blank-padded comparison semantics will not be used but the values are equal so the condition is true and the row will be returned.

Finally:

SELECT CASE WHEN 'A' = 'A ' THEN '1' ELSE '0' END T1, 
       CASE WHEN 'A       ' = 'A' THEN '1' ELSE '0' END T2,
       CASE WHEN 'A ' = 'A ' THEN '1' ELSE '0' END T2   
FROM DUAL;

In all cases both sides of the equality are text literals so blank-padded comparison semantics will be used so for

'A' = 'A ' the LHS will be padded so the strings are the same length;
'A ' = 'A' the RHS will be padded;
'A ' = 'A ' both sides are equal length so no padding is needed.

And in each case the equalities are equal.

If you use CAST to change from text literals to a VARCHAR2 data type:

SELECT CASE WHEN 'A' = CAST( 'A ' AS VARCHAR2(5) ) THEN 1 ELSE 0 END AS T1
FROM   DUAL

The LHS is a text literal but the RHS is a VARCHAR2 and blank-padded comparison semantics are not used and the equality is not true.

Trimming all cells in DataTable

OleDb Objects

Actually what I meant is, to get formatted/trimmed string values from the Excel Sheet and create a DataTable with DataColumn objects of string type only, use the forward-only OleDbDataReader to create both, DataColumn and DataRow objects as it reads. Doing so, the data will be modified and filled in one step hence no need to call another routine to loop again and waste some more time. Also, consider using asynchronous calls to speed up the process and avoid freezing the UI while executing the lengthy task.

Something might help you to go:

private async void TheCaller()
{
    using (var ofd = new OpenFileDialog
    {
        Title = "Select File",
        Filter = "Excel WorkBook|*.xlsx|Excel WorkBook 97 - 2003|*.xls|All Files(*.*)|*.*",
        AutoUpgradeEnabled = true,
    })
    {
        if (ofd.ShowDialog() != DialogResult.OK) return;

        var conString = string.Empty;
        var msg = "Loading... Please wait.";

        try
        {
            switch (ofd.FilterIndex)
            {
                case 1: //xlsx
                    conString = $"Provider=Microsoft.ACE.OLEDB.12.0;Data Source={ofd.FileName};Extended Properties='Excel 12.0;HDR=Yes;IMEX=1;'";                            
                    break;
                case 2: //xls
                    conString = $"Provider=Microsoft.Jet.OLEDB.4.0;Data Source={ofd.FileName};Extended Properties='Excel 8.0;HDR=Yes;IMEX=1;'";
                    break;
                default:
                    throw new FileFormatException();
            }

            var sheetName = "sheet1";
            var dt = new DataTable();

            //Optional: a label to show the current status
            //or maybe show a ProgressBar with ProgressBarStyle = Marquee
            lblStatus.Text = msg;

            await Task.Run(() =>
            {
                using (var con = new OleDbConnection(conString))
                using (var cmd = new OleDbCommand($"SELECT * From [{sheetName}$]", con))
                {
                    con.Open();

                    using (var r = cmd.ExecuteReader())
                        while (r.Read())
                        {
                            if (dt.Columns.Count == 0)
                                for (var i = 0; i < r.FieldCount; i++)
                                    dt.Columns.Add(r.GetName(i).Trim(), typeof(string));

                            object[] values = new object[r.FieldCount];

                            r.GetValues(values);
                            dt.Rows.Add(values.Select(x => x?.ToString().Trim()).ToArray());
                        }
                }
            });

            //If you want...
            dataGridView1.DataSource = null;
            dataGridView1.DataSource = dt;

            msg = "Loading Completed";
        }
        catch (FileFormatException)
        {
            msg = "Unknown Excel file!";
        }
        catch (Exception ex)
        {
            msg = ex.Message;
        }
        finally
        {
            lblStatus.Text = msg;
        }
    }
}

Here's a demo, reading sheets with 8 columns and 5000 rows from both xls and xlsx files:

SOA62682518

Less than a second. Not bad.

However, this will not work correctly if the Sheet has mixed-types columns like your case where the third column has string and int values in different rows. That because the data type of a column is guessed in Excel by examining the first 8 rows by default. Changing this behavior requires changing the registry value of TypeGuessRows in HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Jet\x.0\Engines\Excel from 8 to 0 to force checking all the rows instead of just the first 8. This action will dramatically slow down the performance.

Office Interop Objects

Alternatively, you could use the Microsoft.Office.Interop.Excel objects to read the Excel Sheet, get and format the values of the cells regardless of their types.

using Excel = Microsoft.Office.Interop.Excel;
//...

private async void TheCaller()
{
    using (var ofd = new OpenFileDialog
    {
        Title = "Select File",
        Filter = "Excel WorkBook|*.xlsx|Excel WorkBook 97 - 2003|*.xls|All Files(*.*)|*.*",
        AutoUpgradeEnabled = true,
    })
    {
        if (ofd.ShowDialog() != DialogResult.OK) return;

        var msg = "Loading... Please wait.";
        Excel.Application xlApp = null;
        Excel.Workbook xlWorkBook = null;

        try
        {
            var dt = new DataTable();

            lblStatus.Text = msg;

            await Task.Run(() =>
            {
                xlApp = new Excel.Application();
                xlWorkBook = xlApp.Workbooks.Open(ofd.FileName, Type.Missing, true);

                var xlSheet = xlWorkBook.Sheets[1] as Excel.Worksheet;
                var xlRange = xlSheet.UsedRange;

                dt.Columns.AddRange((xlRange.Rows[xlRange.Row] as Excel.Range)
                .Cells.Cast<Excel.Range>()
                .Where(h => h.Value2 != null)
                .Select(h => new DataColumn(h.Value2.ToString()
                .Trim(), typeof(string))).ToArray());

                foreach (var r in xlRange.Rows.Cast<Excel.Range>().Skip(1))
                    dt.Rows.Add(r.Cells.Cast<Excel.Range>()
                        .Take(dt.Columns.Count)
                        .Select(v => v.Value2 is null
                        ? string.Empty
                        : v.Value2.ToString().Trim()).ToArray());
            });

            (dataGridView1.DataSource as DataTable)?.Dispose();
            dataGridView1.DataSource = null;
            dataGridView1.DataSource = dt;

            msg = "Loading Completed";
        }
        catch (FileFormatException)
        {
            msg = "Unknown Excel file!";
        }
        catch (Exception ex)
        {
            msg = ex.Message;
        }
        finally
        {
            xlWorkBook?.Close(false);
            xlApp?.Quit();

            Marshal.FinalReleaseComObject(xlWorkBook);
            Marshal.FinalReleaseComObject(xlApp);

            xlWorkBook = null;
            xlApp = null;

            GC.Collect();
            GC.WaitForPendingFinalizers();

            lblStatus.Text = msg;
        }
    }
}

^{Note: You need to add reference to the mentioned library.}

SOA62682518B

Not fast especially with a big number of cells but it gets the desired output.

How to avoid space(s) while inserting data in a table in oracle

You can use a trigger:

CREATE OR REPLACE TRIGGER EMPLOYEE_BIU
  BEFORE INSERT OR UPDATE ON EMPLOYEE
  FOR EACH ROW
BEGIN
  :NEW.DEPT := TRIM(:NEW.DEPT);
END EMPLOYEE_BIU;

Trim a part of a String

Since you are using SQL Server 2016, you can use it's built in STRING_SPLIT method to convert your string to a table.
Then all you have to do is to select the value from the table that starts with CUSTOMER NAME:, like this:

DECLARE @s varchar(300) = 'QSTY-IOM-HFVNBJGYG | Mobile #: 9876541323 | CUSTOMER NAME: MNOP UNIPOUYTREA POIUY | INVOICE DATE:02/28/18 | EMP#: 101466 | EMPLOYEE NAME: ANGELINA CASIANO'

SELECT [value]
FROM STRING_SPLIT(@s, '|')
WHERE LTRIM([value]) LIKE 'CUSTOMER NAME:%'

However, you should read Is storing a delimited list in a database column really that bad? and normalize your database if possible.

Trimmining a Column with Bad Data