How to Extract Values from Column and Update Result in Another Column

If you're open to a helper Table-Valued Function:

Example

Declare @YourTable table (IdDate int,FullDate varchar(max))
Insert Into @YourTable values
(0,'Nº1 (26) - Friday 4, January 2014')
,(0,'Nº2 (64) - Monday 10, February 2015')

Update A
set IdDate = substring(Pos1,3,10)
+ try_convert(varchar(10),try_convert(date,Pos6+' '+Pos5+' '+Pos7),112)
From @YourTable A
Cross Apply [dbo].[tvf-Str-Parse-Row](FullDate,' ') B

Returns

IdDate      FullDate
120140104   Nº1 (26) - Friday 4, January 2014
220150210   Nº2 (64) - Monday 10, February 2015

If it helps with the visualization: the TVF returns one column per token (Pos1 through Pos9); the sample image is omitted here.

The function, if interested:

CREATE FUNCTION [dbo].[tvf-Str-Parse-Row] (@String varchar(max),@Delimiter varchar(10))
Returns Table
As
Return (
Select Pos1 = ltrim(rtrim(xDim.value('/x[1]','varchar(max)')))
,Pos2 = ltrim(rtrim(xDim.value('/x[2]','varchar(max)')))
,Pos3 = ltrim(rtrim(xDim.value('/x[3]','varchar(max)')))
,Pos4 = ltrim(rtrim(xDim.value('/x[4]','varchar(max)')))
,Pos5 = ltrim(rtrim(xDim.value('/x[5]','varchar(max)')))
,Pos6 = ltrim(rtrim(xDim.value('/x[6]','varchar(max)')))
,Pos7 = ltrim(rtrim(xDim.value('/x[7]','varchar(max)')))
,Pos8 = ltrim(rtrim(xDim.value('/x[8]','varchar(max)')))
,Pos9 = ltrim(rtrim(xDim.value('/x[9]','varchar(max)')))
From (Select Cast('<x>' + replace((Select replace(@String,@Delimiter,'§§Split§§') as [*] For XML Path('')),'§§Split§§','</x><x>')+'</x>' as xml) as xDim) as A
)

Or Without the Function

Update A
set IdDate = substring(Pos1,3,10)
+ try_convert(varchar(10),try_convert(date,Pos6+' '+Pos5+' '+Pos7),112)
From @YourTable A
Cross Apply (
Select Pos1 = ltrim(rtrim(xDim.value('/x[1]','varchar(max)')))
,Pos2 = ltrim(rtrim(xDim.value('/x[2]','varchar(max)')))
,Pos3 = ltrim(rtrim(xDim.value('/x[3]','varchar(max)')))
,Pos4 = ltrim(rtrim(xDim.value('/x[4]','varchar(max)')))
,Pos5 = ltrim(rtrim(xDim.value('/x[5]','varchar(max)')))
,Pos6 = ltrim(rtrim(xDim.value('/x[6]','varchar(max)')))
,Pos7 = ltrim(rtrim(xDim.value('/x[7]','varchar(max)')))
From (Select Cast('<x>' + replace((Select replace(FullDate,' ','§§Split§§') as [*] For XML Path('')),'§§Split§§','</x><x>')+'</x>' as xml) as xDim) as A
) B

EDIT

This is an expanded version of Shawn's cleaner solution

Update @YourTable 
set IdDate = substring(left(FullDate,charindex(' ',FullDate)-1),3,25)
+try_convert(varchar(10),try_convert(date,replace(substring(FullDate, charindex(',', FullDate) - 2, 100), ',', '')),112)

Select * from @YourTable
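For comparison, the same parse can be sketched in Python, assuming the `Nº<seq> (<n>) - <weekday> <day>, <month> <year>` shape shown in the sample rows (`make_id_date` is a hypothetical helper name, not part of the original answer):

```python
from datetime import datetime

def make_id_date(full_date: str) -> int:
    # e.g. 'Nº1 (26) - Friday 4, January 2014' -> 120140104
    seq = full_date.split(' ')[0][2:]          # strip the leading 'Nº'
    parts = full_date.replace(',', '').split(' ')
    day, month, year = parts[4], parts[5], parts[6]
    d = datetime.strptime(f'{day} {month} {year}', '%d %B %Y')
    return int(seq + d.strftime('%Y%m%d'))
```

This mirrors the T-SQL: sequence number prefix, then the date reformatted to style 112 (yyyymmdd).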

Extract column value based on another column in Pandas

You could use loc to get a Series satisfying your condition, and then iloc to get its first element:

In [2]: df
Out[2]:
    A  B
0  p1  1
1  p1  2
2  p3  3
3  p2  4

In [3]: df.loc[df['B'] == 3, 'A']
Out[3]:
2    p3
Name: A, dtype: object

In [4]: df.loc[df['B'] == 3, 'A'].iloc[0]
Out[4]: 'p3'
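Note that `iloc[0]` raises an IndexError when no row matches; a small sketch that guards against that case:

```python
import pandas as pd

df = pd.DataFrame({'A': ['p1', 'p1', 'p3', 'p2'], 'B': [1, 2, 3, 4]})

matches = df.loc[df['B'] == 3, 'A']
value = matches.iloc[0] if not matches.empty else None   # 'p3'

no_match = df.loc[df['B'] == 99, 'A']
fallback = no_match.iloc[0] if not no_match.empty else None  # None
```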

extract values into new column for each unique values in another column

If you share the data, I can reproduce it and add the result. In the meantime, this hopefully will answer your question:

df.groupby(['ngram','date','rating','attraction','indo'])['review_id'].agg(list).reset_index()
    ngram  date  rating    attraction             indo         review_id
0  bigram  2018      10           uss   sangat lengkap  [911, 977, 3531]
1  bigram  2019       9           uss     agak bingung            [2919]
2  bigram  2019      10  sea_aquarium  sangat blengkap            [4282]
3  bigram  2019      10           uss     agak bingung            [1062]
4  bigram  2019      10           uss   sangat lengkap             [359]
5  bigram  2021      10           uss   sangat lengkap               [4]
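Absent the original data, here is a minimal reproducible sketch of the same groupby + `agg(list)` pattern (the sample values are made up to mirror the output above):

```python
import pandas as pd

df = pd.DataFrame({
    'ngram':      ['bigram', 'bigram', 'bigram'],
    'date':       [2018, 2018, 2019],
    'rating':     [10, 10, 9],
    'attraction': ['uss', 'uss', 'uss'],
    'indo':       ['sangat lengkap', 'sangat lengkap', 'agak bingung'],
    'review_id':  [911, 977, 2919],
})

# collect review_id into a list per unique combination of the other columns
out = (df.groupby(['ngram', 'date', 'rating', 'attraction', 'indo'])['review_id']
         .agg(list)
         .reset_index())
```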

Extract data from column and use it to update another column

If you have MySQL version 5.7, you can do the following using MySQL's XML functions:

UPDATE t1 SET col1 = ExtractValue(data, '/col1'), col2 = ExtractValue(data, '/col2');

Test data and output:

DROP TABLE IF EXISTS t1;

CREATE TABLE t1 (
id INT UNSIGNED NOT NULL,
data TEXT,
col1 VARCHAR(255),
col2 VARCHAR(255)
);

INSERT INTO t1 (id, data) VALUES
(1, '<col1>data1</col1><col2>data2</col2>');

SELECT * FROM t1;

UPDATE t1 SET col1 = ExtractValue(data, '/col1'), col2 = ExtractValue(data, '/col2');

SELECT * FROM t1;

Before update:

+----+--------------------------------------+------+------+
| id | data                                 | col1 | col2 |
+----+--------------------------------------+------+------+
|  1 | <col1>data1</col1><col2>data2</col2> | NULL | NULL |
+----+--------------------------------------+------+------+

After update:

+----+--------------------------------------+-------+-------+
| id | data                                 | col1  | col2  |
+----+--------------------------------------+-------+-------+
|  1 | <col1>data1</col1><col2>data2</col2> | data1 | data2 |
+----+--------------------------------------+-------+-------+
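Outside of MySQL, the same element extraction can be sketched with Python's standard library (wrapping the stored fragment in a dummy root, since it has two top-level elements):

```python
import xml.etree.ElementTree as ET

data = '<col1>data1</col1><col2>data2</col2>'
root = ET.fromstring(f'<r>{data}</r>')  # wrap: the fragment has no single root
col1 = root.findtext('col1')  # 'data1'
col2 = root.findtext('col2')  # 'data2'
```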

Extract pattern from a column based on another column's value

You can use a regex with str.extract in a groupby+apply:

import re
df['match'] = (df.groupby('root')['word']
.apply(lambda g: g.str.extract(f'^(.*{re.escape(g.name)})'))
)

Or, if you expect few repeated "root" values:

import re
df['match'] = df.apply(lambda r: m.group()
if (m:=re.match(f'.*{re.escape(r["root"])}', r['word']))
else None, axis=1)

output:

         word   root   match
0      replay   play  replay
1    replayed   play  replay
2    playable   play    play
3     thinker  think   think
4       think  think   think
5  thoughtful  think     NaN
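A self-contained sketch of the row-wise variant above (Python 3.8+ for the walrus operator; the sample rows mirror part of the output table):

```python
import re
import pandas as pd

df = pd.DataFrame({
    'word': ['replay', 'replayed', 'playable', 'thoughtful'],
    'root': ['play', 'play', 'play', 'think'],
})

# re.match anchors at the start; the greedy .* still yields a match
# ending at the last occurrence of the escaped root ('replayed' -> 'replay'),
# and rows whose word does not contain the root get None
df['match'] = df.apply(
    lambda r: m.group() if (m := re.match(f".*{re.escape(r['root'])}", r['word'])) else None,
    axis=1,
)
```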

Extract column values from one table and insert with modifications into another

-- DROP FUNCTION alt_edger(_s text, _v text, _relation text, _tbl text, _tbl_src text)
CREATE OR REPLACE FUNCTION alt_edger(_s text, _v text, _relation text, _tbl text, _tbl_src text, OUT row_count int)
LANGUAGE plpgsql AS
$func$
DECLARE
_sql text := format(
'INSERT INTO pg_temp.%3$I (label, source, target)
SELECT DISTINCT $1, %1$I, %2$I FROM pg_temp.%4$I
WHERE (%1$I, %2$I) IS NOT NULL'
, _s, _v, _tbl, _tbl_src);
BEGIN
-- RAISE NOTICE '%', _sql; -- debug
EXECUTE _sql USING _relation;
GET DIAGNOSTICS row_count = ROW_COUNT; -- return number of inserted rows
END
$func$;

db<>fiddle here

Most importantly, use format() to concatenate your dynamic SQL commands safely. And use the format specifier %I for identifiers. This way, SQL injection is not possible and identifiers are double-quoted properly - preserving non-standard names like Document Number. That's where your original failed.
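The quoting behavior of `%I` can be mimicked in client code; a minimal sketch (this always quotes, whereas `format('%I', ...)` quotes only when the identifier needs it, and it is no substitute for doing the formatting server-side):

```python
def quote_ident(name: str) -> str:
    # double-quote an SQL identifier and escape embedded double quotes,
    # roughly mirroring what PostgreSQL's format('%I', ...) produces
    return '"' + name.replace('"', '""') + '"'
```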

We could concatenate _relation as string to be inserted into label, too. But the preferable way to pass values to EXECUTE is with the USING clause. $1 inside the SQL string passed to EXECUTE is a placeholder for the first USING argument. Not to be confused with $1 referencing function parameters in the context of the function body outside EXECUTE! (You can pass any string, leading colon (:) does not matter, the string is not interpreted when done right.)
See:

  • Format specifier for integer variables in format() for EXECUTE?
  • Table name as a PostgreSQL function parameter

I replaced the DELETE in your original with a WHERE clause to the SELECT of the INSERT. Don't insert rows in the first place, instead of deleting them again later.

(%1$I, %2$I) IS NOT NULL only qualifies when both values are NOT NULL.
About that:

  • Check if a Postgres composite field is null/empty

Don't use the prefix "pg_" for your table names. That's what Postgres uses for system tables. Don't mess with those.

I schema-qualify known temporary tables with pg_temp. That's typically optional as the temporary schema comes first in the search_path by default. But that can be changed (maliciously), and then the table name would resolve to any existing regular table of the same name in the search_path. So better safe than sorry. See:

  • How does the search_path influence identifier resolution and the "current schema"

I made the function return the number of inserted rows. That's totally optional!

Since I do that with an OUT parameter, I am allowed to skip the RETURNS clause. See:

  • Can I make a plpgsql function return an integer without using a variable?

Extract values for a column from another column based on another column in data frame R

If you don't want to have to hard code all of the column names you can use something like this.

comp.cols <- colnames(df)[grepl("_comp", colnames(df))]
non.comp.cols <- sub("_comp", "", comp.cols)

df[df[,"reg"] == "a", comp.cols] <- df[df[,"reg"] == "a", non.comp.cols]
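The same paired-column update translates to pandas, if that's your environment (a sketch with made-up column names `x`/`x_comp`):

```python
import pandas as pd

df = pd.DataFrame({
    'reg':    ['a', 'b', 'a'],
    'x':      [1, 2, 3],
    'x_comp': [0, 0, 0],
})

# find the *_comp columns and their non-suffixed counterparts
comp_cols = [c for c in df.columns if c.endswith('_comp')]
non_comp_cols = [c[: -len('_comp')] for c in comp_cols]

# where reg == 'a', overwrite the _comp columns with the plain columns
mask = df['reg'] == 'a'
df.loc[mask, comp_cols] = df.loc[mask, non_comp_cols].to_numpy()
```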

How to extract values from a column into the dataframe by matching two other columns in R

If you use dplyr, this is pretty straightforward: first join the two dataframes, then select the right values from columns bb.x and bb.y based on NA values in bb.y. Finally, keep only the required columns.

dfa %>% 
dplyr::left_join(dfb, by = "aa") %>%
dplyr::mutate(bb = ifelse(is.na(bb.y), bb.x, bb.y)) %>%
dplyr::select(aa, bb)

Result

  aa bb
1  1 10
2  2  0
3  3  8
4  4  0
5  5  6
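The same join-then-coalesce idea in pandas (a sketch; the `dfa`/`dfb` values here are made up, with `dfb` overriding `dfa` where it has a match):

```python
import pandas as pd

dfa = pd.DataFrame({'aa': [1, 2, 3, 4, 5], 'bb': [10, 0, 8, 0, 6]})
dfb = pd.DataFrame({'aa': [1, 3], 'bb': [99, 7]})

merged = dfa.merge(dfb, on='aa', how='left', suffixes=('_x', '_y'))
# prefer dfb's value; fall back to dfa's where the join found no match
merged['bb'] = merged['bb_y'].fillna(merged['bb_x'])
result = merged[['aa', 'bb']]
```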
