How to extract values from column and update result in another column
If open to a helper Table-Valued Function:
Example
Declare @YourTable table (IdDate int,FullDate varchar(max))
Insert Into @YourTable values
(0,'Nº1 (26) - Friday 4, January 2014')
,(0,'Nº2 (64) - Monday 10, February 2015')
Update A
set IdDate = substring(Pos1,3,10)
+ try_convert(varchar(10),try_convert(date,Pos6+' '+Pos5+' '+Pos7),112)
From @YourTable A
Cross Apply [dbo].[tvf-Str-Parse-Row](FullDate,' ') B
Returns
IdDate     FullDate
120140104  Nº1 (26) - Friday 4, January 2014
220150210  Nº2 (64) - Monday 10, February 2015
If it helps with the visualization: the TVF returns one row per input string, with the individual tokens in columns Pos1 through Pos9.
The function, if you're interested:
CREATE FUNCTION [dbo].[tvf-Str-Parse-Row] (@String varchar(max),@Delimiter varchar(10))
Returns Table
As
Return (
Select Pos1 = ltrim(rtrim(xDim.value('/x[1]','varchar(max)')))
,Pos2 = ltrim(rtrim(xDim.value('/x[2]','varchar(max)')))
,Pos3 = ltrim(rtrim(xDim.value('/x[3]','varchar(max)')))
,Pos4 = ltrim(rtrim(xDim.value('/x[4]','varchar(max)')))
,Pos5 = ltrim(rtrim(xDim.value('/x[5]','varchar(max)')))
,Pos6 = ltrim(rtrim(xDim.value('/x[6]','varchar(max)')))
,Pos7 = ltrim(rtrim(xDim.value('/x[7]','varchar(max)')))
,Pos8 = ltrim(rtrim(xDim.value('/x[8]','varchar(max)')))
,Pos9 = ltrim(rtrim(xDim.value('/x[9]','varchar(max)')))
From (Select Cast('<x>' + replace((Select replace(@String,@Delimiter,'§§Split§§') as [*] For XML Path('')),'§§Split§§','</x><x>')+'</x>' as xml) as xDim) as A
)
Or Without the Function
Update A
set IdDate = substring(Pos1,3,10)
+ try_convert(varchar(10),try_convert(date,Pos6+' '+Pos5+' '+Pos7),112)
From @YourTable A
Cross Apply (
Select Pos1 = ltrim(rtrim(xDim.value('/x[1]','varchar(max)')))
,Pos2 = ltrim(rtrim(xDim.value('/x[2]','varchar(max)')))
,Pos3 = ltrim(rtrim(xDim.value('/x[3]','varchar(max)')))
,Pos4 = ltrim(rtrim(xDim.value('/x[4]','varchar(max)')))
,Pos5 = ltrim(rtrim(xDim.value('/x[5]','varchar(max)')))
,Pos6 = ltrim(rtrim(xDim.value('/x[6]','varchar(max)')))
,Pos7 = ltrim(rtrim(xDim.value('/x[7]','varchar(max)')))
From (Select Cast('<x>' + replace((Select replace(FullDate,' ','§§Split§§') as [*] For XML Path('')),'§§Split§§','</x><x>')+'</x>' as xml) as xDim) as A
) B
EDIT
This is an expanded version of Shawn's cleaner solution
Update @YourTable
set IdDate = substring(left(FullDate,charindex(' ',FullDate)-1),3,25)
+try_convert(varchar(10),try_convert(date,replace(substring(FullDate, charindex(',', FullDate) - 2, 100), ',', '')),112)
Select * from @YourTable
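The transformation both T-SQL versions perform (take the digit after "Nº", then append the date reformatted as yyyymmdd) can be sketched in Python for clarity. This is an illustration, not part of the answer; the regex assumes the fixed "Nºn (x) - Day d, Month yyyy" layout seen in the sample rows:

```python
import re
from datetime import datetime

def make_id_date(full_date: str) -> int:
    """Rebuild IdDate: the number after 'Nº' followed by the date as yyyymmdd."""
    # 'Nº1 (26) - Friday 4, January 2014' -> prefix '1', day 4, January, 2014
    m = re.match(r"Nº(\d+) \(\d+\) - \w+ (\d+), (\w+) (\d+)", full_date)
    prefix, day, month, year = m.groups()
    d = datetime.strptime(f"{day} {month} {year}", "%d %B %Y")
    return int(prefix + d.strftime("%Y%m%d"))

print(make_id_date('Nº1 (26) - Friday 4, January 2014'))  # 120140104
```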
Extract column value based on another column in Pandas
You could use loc to get a Series that satisfies your condition, and then iloc to get its first element:
In [2]: df
Out[2]:
A B
0 p1 1
1 p1 2
2 p3 3
3 p2 4
In [3]: df.loc[df['B'] == 3, 'A']
Out[3]:
2 p3
Name: A, dtype: object
In [4]: df.loc[df['B'] == 3, 'A'].iloc[0]
Out[4]: 'p3'
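One caveat worth noting (my addition, not part of the answer): .iloc[0] raises IndexError when no row matches the condition. A minimal sketch of a safer lookup, using the same sample frame:

```python
import pandas as pd

df = pd.DataFrame({'A': ['p1', 'p1', 'p3', 'p2'], 'B': [1, 2, 3, 4]})

# Guard against an empty result before taking the first element
matches = df.loc[df['B'] == 3, 'A']
value = matches.iloc[0] if not matches.empty else None  # 'p3'

missing = df.loc[df['B'] == 99, 'A']
fallback = missing.iloc[0] if not missing.empty else None  # None
```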
extract values into new column for each unique values in another column
If you share the data, I can reproduce it and add the result. Hopefully this will answer your question:
df.groupby(['ngram','date','rating','attraction','indo'])['review_id'].agg(list).reset_index()
ngram date rating attraction indo review_id
0 bigram 2018 10 uss sangat lengkap [911, 977, 3531]
1 bigram 2019 9 uss agak bingung [2919]
2 bigram 2019 10 sea_aquarium sangat blengkap [4282]
3 bigram 2019 10 uss agak bingung [1062]
4 bigram 2019 10 uss sangat lengkap [359]
5 bigram 2021 10 uss sangat lengkap [4]
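Since the original data wasn't shared, here is a minimal reproduction with invented rows (column names taken from the answer, values made up) showing what the groupby produces:

```python
import pandas as pd

# Invented sample rows mirroring the answer's columns
df = pd.DataFrame({
    'ngram':      ['bigram', 'bigram', 'bigram'],
    'date':       [2018, 2018, 2019],
    'rating':     [10, 10, 9],
    'attraction': ['uss', 'uss', 'uss'],
    'indo':       ['sangat lengkap', 'sangat lengkap', 'agak bingung'],
    'review_id':  [911, 977, 2919],
})

# Rows sharing all five key columns get their review_ids collected into a list
out = (df.groupby(['ngram', 'date', 'rating', 'attraction', 'indo'])['review_id']
         .agg(list)
         .reset_index())
print(out)
```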
Extract data from column and use it to update another column
If you have MySQL version 5.7, you can do the following using MySQL's XML functions:
UPDATE t1 SET col1 = ExtractValue(data, '/col1'), col2 = ExtractValue(data, '/col2');
Test data and output:
DROP TABLE IF EXISTS t1;
CREATE TABLE t1 (
id INT UNSIGNED NOT NULL,
data TEXT,
col1 VARCHAR(255),
col2 VARCHAR(255)
);
INSERT INTO t1 (id, data) VALUES
(1, '<col1>data1</col1><col2>data2</col2>');
SELECT * FROM t1;
UPDATE t1 SET col1 = ExtractValue(data, '/col1'), col2 = ExtractValue(data, '/col2');
SELECT * FROM t1;
Before update:
+----+--------------------------------------+------+------+
| id | data | col1 | col2 |
+----+--------------------------------------+------+------+
| 1 | <col1>data1</col1><col2>data2</col2> | NULL | NULL |
+----+--------------------------------------+------+------+
After update:
+----+--------------------------------------+-------+-------+
| id | data | col1 | col2 |
+----+--------------------------------------+-------+-------+
| 1 | <col1>data1</col1><col2>data2</col2> | data1 | data2 |
+----+--------------------------------------+-------+-------+
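What the two ExtractValue calls do corresponds, roughly, to XPath extraction over the column's XML fragment. A hedged Python sketch of the same extraction (the wrapper element is my addition, since the stored fragment has no single root):

```python
import xml.etree.ElementTree as ET

data = '<col1>data1</col1><col2>data2</col2>'
# Wrap the fragment so it parses as a well-formed document
root = ET.fromstring(f'<row>{data}</row>')

col1 = root.findtext('col1')  # 'data1'
col2 = root.findtext('col2')  # 'data2'
```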
Extract pattern from a column based on another column's value
You can use a regex with str.extract in a groupby + apply:
import re
df['match'] = (df.groupby('root')['word']
.apply(lambda g: g.str.extract(f'^(.*{re.escape(g.name)})'))
)
Or, if you expect few repeated "root" values:
import re
df['match'] = df.apply(lambda r: m.group()
if (m:=re.match(f'.*{re.escape(r["root"])}', r['word']))
else None, axis=1)
output:
word root match
0 replay play replay
1 replayed play replay
2 playable play play
3 thinker think think
4 think think think
5 thoughtful think NaN
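For a self-contained run, here is the per-row variant applied to the sample words (the walrus operator requires Python 3.8+; the input frame is reconstructed from the output table shown above):

```python
import re
import pandas as pd

df = pd.DataFrame({
    'word': ['replay', 'replayed', 'playable', 'thinker', 'think', 'thoughtful'],
    'root': ['play', 'play', 'play', 'think', 'think', 'think'],
})

# Keep the prefix of `word` that ends with `root`, or None when `root`
# does not occur in `word` (e.g. 'thoughtful' does not contain 'think')
df['match'] = df.apply(
    lambda r: m.group()
    if (m := re.match(f'.*{re.escape(r["root"])}', r['word']))
    else None,
    axis=1,
)
print(df)
```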
Extract column values from one table and insert with modifications into another
-- DROP FUNCTION alt_edger(_s text, _v text, _relation text, _tbl text, _tbl_src text)
CREATE OR REPLACE FUNCTION alt_edger(_s text, _v text, _relation text, _tbl text, _tbl_src text, OUT row_count int)
LANGUAGE plpgsql AS
$func$
DECLARE
_sql text := format(
'INSERT INTO pg_temp.%3$I (label, source, target)
SELECT DISTINCT $1, %1$I, %2$I FROM pg_temp.%4$I
WHERE (%1$I, %2$I) IS NOT NULL'
, _s, _v, _tbl, _tbl_src);
BEGIN
-- RAISE NOTICE '%', _sql; -- debug
EXECUTE _sql USING _relation;
GET DIAGNOSTICS row_count = ROW_COUNT; -- return number of inserted rows
END
$func$;
db<>fiddle here
Most importantly, use format() to concatenate your dynamic SQL commands safely, and use the format specifier %I for identifiers. This way SQL injection is not possible, and identifiers are double-quoted properly, preserving non-standard names like Document Number. That's where your original failed.
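The point about %I can be illustrated outside Postgres. A minimal Python sketch of what identifier quoting amounts to (simplified: format('%I', ...) only quotes when necessary, while this always quotes; the doubling of embedded quotes is the essential part):

```python
def quote_ident(name: str) -> str:
    """Sketch of SQL identifier quoting: wrap in double quotes,
    doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'

print(quote_ident('Document Number'))  # "Document Number"
```

Without this, a name containing a space (or a quote) would break, or worse, be interpreted as injected SQL.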
We could concatenate _relation as a string to be inserted into label, too. But the preferable way to pass values to EXECUTE is with the USING clause. $1 inside the SQL string passed to EXECUTE is a placeholder for the first USING argument. Not to be confused with $1 referencing function parameters in the context of the function body outside EXECUTE! (You can pass any string; a leading colon (:) does not matter, as the string is not interpreted when done right.)
See:
- Format specifier for integer variables in format() for EXECUTE?
- Table name as a PostgreSQL function parameter
I replaced the DELETE in your original with a WHERE clause on the SELECT of the INSERT. Don't insert rows in the first place, instead of deleting them again later. (%1$I, %2$I) IS NOT NULL only qualifies when both values are NOT NULL.
About that:
- Check if a Postgres composite field is null/empty
Don't use the prefix "pg_" for your table names. That's what Postgres uses for system tables. Don't mess with those.
I schema-qualify known temporary tables with pg_temp. That's typically optional, as the temporary schema comes first in the search_path by default. But that can be changed (maliciously), and then the table name would resolve to any existing regular table of the same name in the search_path. So better safe than sorry. See:
I made the function return the number of inserted rows. That's totally optional! Since I do that with an OUT parameter, I am allowed to skip the RETURNS clause. See:
Extract values for a column from another column based on another column in data frame R
If you don't want to hard-code all of the column names, you can use something like this.
comp.cols <- colnames(df)[grepl("_comp", colnames(df))]
non.comp.cols <- sub("_comp", "", comp.cols)
df[df[,"reg"] == "a", comp.cols] <- df[df[,"reg"] == "a", non.comp.cols]
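For readers coming from pandas, the same idea (copy the base columns into their *_comp counterparts wherever reg == "a") can be sketched like this; the frame here is invented for illustration:

```python
import pandas as pd

# Invented example: one base column x and its companion x_comp
df = pd.DataFrame({
    'reg':    ['a', 'b', 'a'],
    'x':      [1, 2, 3],
    'x_comp': [0, 0, 0],
})

# Derive the column pairs from the names, as the R code does
comp_cols = [c for c in df.columns if c.endswith('_comp')]
base_cols = [c.replace('_comp', '') for c in comp_cols]

mask = df['reg'] == 'a'
df.loc[mask, comp_cols] = df.loc[mask, base_cols].to_numpy()
```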
How to extract values from a column into the dataframe by matching two other columns in R
If you use dplyr, this is pretty straightforward: first join the two dataframes, then select the right values from columns bb.x and bb.y based on the NA values in bb.x, and finally keep only the required columns.
dfa %>%
dplyr::left_join(dfb, by = "aa") %>%
dplyr::mutate(bb = ifelse(is.na(bb.y), bb.x, bb.y)) %>%
dplyr::select(aa, bb)
Result
aa bb
1 1 10
2 2 0
3 3 8
4 4 0
5 5 6
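The same join-then-coalesce pattern can be sketched in pandas (a hedged translation; dfa and dfb are invented here so that the result matches the table shown above):

```python
import pandas as pd

# Invented inputs consistent with the answer's result table
dfa = pd.DataFrame({'aa': [1, 2, 3, 4, 5], 'bb': [1, 0, 3, 0, 5]})
dfb = pd.DataFrame({'aa': [1, 3, 5], 'bb': [10, 8, 6]})

# Left join, then coalesce: take bb from dfb where present, else keep dfa's bb
merged = dfa.merge(dfb, on='aa', how='left', suffixes=('_x', '_y'))
merged['bb'] = merged['bb_y'].combine_first(merged['bb_x'])
result = merged[['aa', 'bb']]
print(result)
```

combine_first plays the role of the ifelse(is.na(...)) coalesce in the dplyr version.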