Trimmining a column with bad data
Use CHARINDEX(), LEN() and RIGHT() instead.
RIGHT(LPNumber, LEN(LPNumber) - CHARINDEX('#', LPNumber, 0))
Is there a way to TRIM all data in a SELECT * FROM statement?
You need to specify each string column by hand:
SELECT TRIM(col1), --LTRIM(RTRIM(...)) If RDBMS is SQL Server
TRIM(col2),
TRIM(col3),
TRIM(col4)
-- ...
FROM table
There is another problem with your proposal. *
is placeholder for each column in table so there will be problem with trimming date/decimal/spatial data ...
.
Addendum
Using Oracle 18c Polymorphic Table Functions
(provided code is just PoC, there is a space for a lot of improvements):
CREATE TABLE tab(id INT, d DATE,
v1 VARCHAR2(100), v2 VARCHAR2(100), v3 VARCHAR2(100) );
INSERT INTO tab(id, d,v1, v2, v3)
VALUES (1, SYSDATE, ' aaaa ', ' b ', ' c');
INSERT INTO tab(id, d,v1, v2, v3)
VALUES (2, SYSDATE+1, ' afasd', ' ', ' d');
COMMIT;
SELECT * FROM tab;
-- Output
.----.-----------.-----------.-----------.-----.
| ID | D | V1 | V2 | V3 |
:----+-----------+-----------+-----------+-----:
| 1 | 02-MAR-18 | aaaa | b | c |
:----+-----------+-----------+-----------+-----:
| 2 | 03-MAR-18 | afasd | | d |
'----'-----------'-----------'-----------'-----'
And table function:
CREATE OR REPLACE PACKAGE ptf AS
FUNCTION describe(tab IN OUT dbms_tf.table_t)RETURN dbms_tf.describe_t;
PROCEDURE FETCH_ROWS;
END ptf;
/
CREATE OR REPLACE PACKAGE BODY ptf AS
FUNCTION describe(tab IN OUT dbms_tf.table_t) RETURN dbms_tf.describe_t AS
new_cols DBMS_TF.COLUMNS_NEW_T;
BEGIN
FOR i IN 1 .. tab.column.count LOOP
IF tab.column(i).description.type IN ( dbms_tf.type_varchar2) THEN
tab.column(i).pass_through:=FALSE;
tab.column(i).for_read:= TRUE;
NEW_COLS(i) :=
DBMS_TF.COLUMN_METADATA_T(name=> tab.column(i).description.name,
type => tab.column(i).description.type);
END IF;
END LOOP;
RETURN DBMS_TF.describe_t(new_columns=>new_cols, row_replication=>true);
END;
PROCEDURE FETCH_ROWS AS
inp_rs DBMS_TF.row_set_t;
out_rs DBMS_TF.row_set_t;
rows PLS_INTEGER;
BEGIN
DBMS_TF.get_row_set(inp_rs, rows);
FOR c IN 1 .. inp_rs.count() LOOP
FOR r IN 1 .. rows LOOP
out_rs(c).tab_varchar2(r) := TRIM(inp_rs(c).tab_varchar2(r));
END LOOP;
END LOOP;
DBMS_TF.put_row_set(out_rs, replication_factor => 1);
END;
END ptf;
And final call:
CREATE OR REPLACE FUNCTION trim_col(tab TABLE)
RETURN TABLE pipelined row polymorphic USING ptf;
SELECT *
FROM trim_col(tab); -- passing table as table function argument
.----.-----------.-------.-----.----.
| ID | D | V1 | V2 | V3 |
:----+-----------+-------+-----+----:
| 1 | 02-MAR-18 | aaaa | b | c |
:----+-----------+-------+-----+----:
| 2 | 03-MAR-18 | afasd | - | d |
'----'-----------'-------'-----'----'
db<>fiddle demo
Trim all database fields
No cursors. Copy and paste the output. Works also for SQL 2000, which doesn't have varchar(max). This can be easily extended to add a GO line to the end of each UPDATE if desired.
SELECT SQL
FROM ( SELECT t.TABLE_CATALOG
, t.TABLE_SCHEMA
, t.TABLE_NAME
, 0 SORT
, 'UPDATE ' + QUOTENAME(t.TABLE_CATALOG) + '.' + QUOTENAME(t.TABLE_SCHEMA) + '.' + QUOTENAME(t.TABLE_NAME) SQL
FROM INFORMATION_SCHEMA.TABLES t
JOIN INFORMATION_SCHEMA.COLUMNS c
ON t.TABLE_CATALOG = c.TABLE_CATALOG
AND t.TABLE_SCHEMA = c.TABLE_SCHEMA
AND t.TABLE_NAME = c.TABLE_NAME
WHERE t.TABLE_TYPE = 'BASE TABLE'
AND c.DATA_TYPE IN ('char','nchar','varchar','nvarchar')
GROUP BY t.TABLE_CATALOG
, t.TABLE_SCHEMA
, t.TABLE_NAME
UNION ALL
SELECT x.TABLE_CATALOG
, x.TABLE_SCHEMA
, x.TABLE_NAME
, CASE WHEN x.COLUMN_NAME_MIN = y.COLUMN_NAME
THEN 1
ELSE 2
END SORT
, CASE WHEN x.COLUMN_NAME_MIN = y.COLUMN_NAME
THEN 'SET '
ELSE ' , '
END + y.SQL SQL
FROM ( SELECT t.TABLE_CATALOG
, t.TABLE_SCHEMA
, t.TABLE_NAME
, MIN(c.COLUMN_NAME) COLUMN_NAME_MIN
FROM INFORMATION_SCHEMA.TABLES t
JOIN INFORMATION_SCHEMA.COLUMNS c
ON t.TABLE_CATALOG = c.TABLE_CATALOG
AND t.TABLE_SCHEMA = c.TABLE_SCHEMA
AND t.TABLE_NAME = c.TABLE_NAME
WHERE t.TABLE_TYPE = 'BASE TABLE'
AND c.DATA_TYPE IN ('char','nchar','varchar','nvarchar')
GROUP BY t.TABLE_CATALOG
, t.TABLE_SCHEMA
, t.TABLE_NAME
) x
JOIN ( SELECT t.TABLE_CATALOG
, t.TABLE_SCHEMA
, t.TABLE_NAME
, c.COLUMN_NAME
, QUOTENAME(c.COLUMN_NAME) + ' = LTRIM(RTRIM(' + QUOTENAME(c.COLUMN_NAME) + '))' SQL
FROM INFORMATION_SCHEMA.TABLES t
JOIN INFORMATION_SCHEMA.COLUMNS c
ON t.TABLE_CATALOG = c.TABLE_CATALOG
AND t.TABLE_SCHEMA = c.TABLE_SCHEMA
AND t.TABLE_NAME = c.TABLE_NAME
WHERE t.TABLE_TYPE = 'BASE TABLE'
AND c.DATA_TYPE IN ('char','nchar','varchar','nvarchar')
) y
ON x.TABLE_CATALOG = y.TABLE_CATALOG
AND x.TABLE_SCHEMA = y.TABLE_SCHEMA
AND x.TABLE_NAME = y.TABLE_NAME
) x
ORDER BY x.TABLE_CATALOG
, x.TABLE_SCHEMA
, x.TABLE_NAME
, x.SORT
, x.SQL
when we need trim when we don't need in oracle
Looking at the Oracle Documentation on literals:
Text literals have properties of both the
CHAR
andVARCHAR2
datatypes:
- Within expressions and conditions, Oracle treats text literals as though they have the datatype
CHAR
by comparing them using blank-padded comparison semantics.
and the documentation of blank-padded comparison semantics states:
With blank-padded semantics, if the two values have different lengths, then Oracle first adds blanks to the end of the shorter one so their lengths are equal. Oracle then compares the values character by character up to the first character that differs. The value with the greater character in the first differing position is considered greater. If two values have no differing characters, then they are considered equal. This rule means that two values are equal if they differ only in the number of trailing blanks. Oracle uses blank-padded comparison semantics only when both values in the comparison are either expressions of datatype
CHAR
,NCHAR
, text literals, or values returned by theUSER
function.
So, for your examples:
SELECT * FROM AAA WHERE T1 = 'A'
Since the left-hand side of the comparison, T1
which equals 'A '
, is a CHAR(3)
and the right-hand side is a text literal 'A'
then blank-padded comparison semantics are used and 'A ' = 'A'
is true.
Then:
SELECT * FROM AAA WHERE T2 = 'A '
The left-hand side of the comparison, T2
which equals 'A '
, is a VARCHAR2(3)
and the right-hand side is a text literal 'A '
then blank-padded comparison semantics are not used (both sides are not CHAR
or text literals) but the values are equal ('A ' = 'A '
) so the condition is true.
And:
SELECT * FROM AAA WHERE T2 = 'A '
The LHS is a VARCHAR2
equal to 'A '
and the RHS is a text literal equal to 'A '
; blank-padded comparison semantics will not be used and the values are not equal so the condition is not true and the row will not be returned.
Then:
SELECT * FROM AAA WHERE TRIM(T2) = TRIM('A ')
The TRIM()
function states that:
The string returned is of
VARCHAR2
datatype if trim_source is a character datatype and aLOB
if trim_source is aLOB
datatype.
Both values are trimmed so you have the condition 'A' = 'A'
where both sides are a VARCAHR2
data type so blank-padded comparison semantics will not be used but the values are equal so the condition is true and the row will be returned.
Finally:
SELECT CASE WHEN 'A' = 'A ' THEN '1' ELSE '0' END T1,
CASE WHEN 'A ' = 'A' THEN '1' ELSE '0' END T2,
CASE WHEN 'A ' = 'A ' THEN '1' ELSE '0' END T2
FROM DUAL;
In all cases both sides of the equality are text literals so blank-padded comparison semantics will be used so for
'A' = 'A '
the LHS will be padded so the strings are the same length;'A ' = 'A'
the RHS will be padded;'A ' = 'A '
both sides are equal length so no padding is needed.
And in each case the equalities are equal.
If you use CAST
to change from text literals to a VARCHAR2
data type:
SELECT CASE WHEN 'A' = CAST( 'A ' AS VARCHAR2(5) ) THEN 1 ELSE 0 END AS T1
FROM DUAL
The LHS is a text literal but the RHS is a VARCHAR2
and blank-padded comparison semantics are not used and the equality is not true.
Trimming all cells in DataTable
OleDb Objects
Actually what I meant is, to get formatted/trimmed string values from the Excel Sheet and create a DataTable with DataColumn objects of string type only, use the forward-only OleDbDataReader to create both, DataColumn and DataRow objects as it reads. Doing so, the data will be modified and filled in one step hence no need to call another routine to loop again and waste some more time. Also, consider using asynchronous calls to speed up the process and avoid freezing the UI while executing the lengthy task.
Something might help you to go:
private async void TheCaller()
{
using (var ofd = new OpenFileDialog
{
Title = "Select File",
Filter = "Excel WorkBook|*.xlsx|Excel WorkBook 97 - 2003|*.xls|All Files(*.*)|*.*",
AutoUpgradeEnabled = true,
})
{
if (ofd.ShowDialog() != DialogResult.OK) return;
var conString = string.Empty;
var msg = "Loading... Please wait.";
try
{
switch (ofd.FilterIndex)
{
case 1: //xlsx
conString = $"Provider=Microsoft.ACE.OLEDB.12.0;Data Source={ofd.FileName};Extended Properties='Excel 12.0;HDR=Yes;IMEX=1;'";
break;
case 2: //xls
conString = $"Provider=Microsoft.Jet.OLEDB.4.0;Data Source={ofd.FileName};Extended Properties='Excel 8.0;HDR=Yes;IMEX=1;'";
break;
default:
throw new FileFormatException();
}
var sheetName = "sheet1";
var dt = new DataTable();
//Optional: a label to show the current status
//or maybe show a ProgressBar with ProgressBarStyle = Marquee
lblStatus.Text = msg;
await Task.Run(() =>
{
using (var con = new OleDbConnection(conString))
using (var cmd = new OleDbCommand($"SELECT * From [{sheetName}$]", con))
{
con.Open();
using (var r = cmd.ExecuteReader())
while (r.Read())
{
if (dt.Columns.Count == 0)
for (var i = 0; i < r.FieldCount; i++)
dt.Columns.Add(r.GetName(i).Trim(), typeof(string));
object[] values = new object[r.FieldCount];
r.GetValues(values);
dt.Rows.Add(values.Select(x => x?.ToString().Trim()).ToArray());
}
}
});
//If you want...
dataGridView1.DataSource = null;
dataGridView1.DataSource = dt;
msg = "Loading Completed";
}
catch (FileFormatException)
{
msg = "Unknown Excel file!";
}
catch (Exception ex)
{
msg = ex.Message;
}
finally
{
lblStatus.Text = msg;
}
}
}
Here's a demo, reading sheets with 8 columns and 5000 rows from both xls
and xlsx
files:
Less than a second. Not bad.
However, this will not work correctly if the Sheet has mixed-types columns like your case where the third column has string
and int
values in different rows. That because the data type of a column is guessed in Excel by examining the first 8 rows by default. Changing this behavior requires changing the registry value of TypeGuessRows
in HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Jet\x.0\Engines\Excel
from 8 to 0 to force checking all the rows instead of just the first 8. This action will dramatically slow down the performance.
Office Interop Objects
Alternatively, you could use the Microsoft.Office.Interop.Excel
objects to read the Excel Sheet, get and format the values of the cells regardless of their types.
using Excel = Microsoft.Office.Interop.Excel;
//...
private async void TheCaller()
{
using (var ofd = new OpenFileDialog
{
Title = "Select File",
Filter = "Excel WorkBook|*.xlsx|Excel WorkBook 97 - 2003|*.xls|All Files(*.*)|*.*",
AutoUpgradeEnabled = true,
})
{
if (ofd.ShowDialog() != DialogResult.OK) return;
var msg = "Loading... Please wait.";
Excel.Application xlApp = null;
Excel.Workbook xlWorkBook = null;
try
{
var dt = new DataTable();
lblStatus.Text = msg;
await Task.Run(() =>
{
xlApp = new Excel.Application();
xlWorkBook = xlApp.Workbooks.Open(ofd.FileName, Type.Missing, true);
var xlSheet = xlWorkBook.Sheets[1] as Excel.Worksheet;
var xlRange = xlSheet.UsedRange;
dt.Columns.AddRange((xlRange.Rows[xlRange.Row] as Excel.Range)
.Cells.Cast<Excel.Range>()
.Where(h => h.Value2 != null)
.Select(h => new DataColumn(h.Value2.ToString()
.Trim(), typeof(string))).ToArray());
foreach (var r in xlRange.Rows.Cast<Excel.Range>().Skip(1))
dt.Rows.Add(r.Cells.Cast<Excel.Range>()
.Take(dt.Columns.Count)
.Select(v => v.Value2 is null
? string.Empty
: v.Value2.ToString().Trim()).ToArray());
});
(dataGridView1.DataSource as DataTable)?.Dispose();
dataGridView1.DataSource = null;
dataGridView1.DataSource = dt;
msg = "Loading Completed";
}
catch (FileFormatException)
{
msg = "Unknown Excel file!";
}
catch (Exception ex)
{
msg = ex.Message;
}
finally
{
xlWorkBook?.Close(false);
xlApp?.Quit();
Marshal.FinalReleaseComObject(xlWorkBook);
Marshal.FinalReleaseComObject(xlApp);
xlWorkBook = null;
xlApp = null;
GC.Collect();
GC.WaitForPendingFinalizers();
lblStatus.Text = msg;
}
}
}
Note: You need to add reference to the mentioned library.
Not fast especially with a big number of cells but it gets the desired output.
How to avoid space(s) while inserting data in a table in oracle
You can use a trigger:
CREATE OR REPLACE TRIGGER EMPLOYEE_BIU
BEFORE INSERT OR UPDATE ON EMPLOYEE
FOR EACH ROW
BEGIN
:NEW.DEPT := TRIM(:NEW.DEPT);
END EMPLOYEE_BIU;
Trim a part of a String
Since you are using SQL Server 2016, you can use it's built in STRING_SPLIT
method to convert your string to a table.
Then all you have to do is to select the value from the table that starts with CUSTOMER NAME:
, like this:
DECLARE @s varchar(300) = 'QSTY-IOM-HFVNBJGYG | Mobile #: 9876541323 | CUSTOMER NAME: MNOP UNIPOUYTREA POIUY | INVOICE DATE:02/28/18 | EMP#: 101466 | EMPLOYEE NAME: ANGELINA CASIANO'
SELECT [value]
FROM STRING_SPLIT(@s, '|')
WHERE LTRIM([value]) LIKE 'CUSTOMER NAME:%'
However, you should read Is storing a delimited list in a database column really that bad? and normalize your database if possible.
Related Topics
Interval (Days) in Postgresql with Two Parameters
How to Extract Multiple Strings from Single Rows in SQL Server
Dbms_Metadata.Get_Ddl Not Working
Check Constraint of String to Contain Only Digits. (Oracle SQL)
Problem with Execute Procedure in Pl/SQL Developer
Cannot Validate, with Novalidate Option
SQL Query: How to Create Subtotal Rows When There Is No Aggregate Function
How to Create Text File Using SQL Script with Text "|"
How to Make This Query in SQL Server Compact Edition
Performance of Querying for a String That Starts and Ends with Something
Select Query Does Not Work When Converted to Vba - Invalid SQL Statement
How to Pass a Variable That Contains a List to a Dynamic SQL Query
Get All Punch in and Out for Each Employee
SQL Where in (...) Sort by Order of the List
Ddmmyyyy to SQL Datetime in SQL