How to extract values from column and update result in another column
If open to a helper Table-Valued Function:
Example
Declare @YourTable table (IdDate int,FullDate varchar(max))
Insert Into @YourTable values
(0,'Nº1 (26) - Friday 4, January 2014')
,(0,'Nº2 (64) - Monday 10, February 2015')
Update A
set IdDate = substring(Pos1,3,10)
+ try_convert(varchar(10),try_convert(date,Pos6+' '+Pos5+' '+Pos7),112)
From @YourTable A
Cross Apply [dbo].[tvf-Str-Parse-Row](FullDate,' ') B
Returns
IdDate     FullDate
120140104  Nº1 (26) - Friday 4, January 2014
220150210  Nº2 (64) - Monday 10, February 2015
If it helps with the visualization: the TVF returns one row per input string, with the individual tokens in columns Pos1 through Pos9.
The function, if you're interested:
CREATE FUNCTION [dbo].[tvf-Str-Parse-Row] (@String varchar(max),@Delimiter varchar(10))
Returns Table
As
Return (
Select Pos1 = ltrim(rtrim(xDim.value('/x[1]','varchar(max)')))
,Pos2 = ltrim(rtrim(xDim.value('/x[2]','varchar(max)')))
,Pos3 = ltrim(rtrim(xDim.value('/x[3]','varchar(max)')))
,Pos4 = ltrim(rtrim(xDim.value('/x[4]','varchar(max)')))
,Pos5 = ltrim(rtrim(xDim.value('/x[5]','varchar(max)')))
,Pos6 = ltrim(rtrim(xDim.value('/x[6]','varchar(max)')))
,Pos7 = ltrim(rtrim(xDim.value('/x[7]','varchar(max)')))
,Pos8 = ltrim(rtrim(xDim.value('/x[8]','varchar(max)')))
,Pos9 = ltrim(rtrim(xDim.value('/x[9]','varchar(max)')))
From (Select Cast('<x>' + replace((Select replace(@String,@Delimiter,'§§Split§§') as [*] For XML Path('')),'§§Split§§','</x><x>')+'</x>' as xml) as xDim) as A
)
Or Without the Function
Update A
set IdDate = substring(Pos1,3,10)
+ try_convert(varchar(10),try_convert(date,Pos6+' '+Pos5+' '+Pos7),112)
From @YourTable A
Cross Apply (
Select Pos1 = ltrim(rtrim(xDim.value('/x[1]','varchar(max)')))
,Pos2 = ltrim(rtrim(xDim.value('/x[2]','varchar(max)')))
,Pos3 = ltrim(rtrim(xDim.value('/x[3]','varchar(max)')))
,Pos4 = ltrim(rtrim(xDim.value('/x[4]','varchar(max)')))
,Pos5 = ltrim(rtrim(xDim.value('/x[5]','varchar(max)')))
,Pos6 = ltrim(rtrim(xDim.value('/x[6]','varchar(max)')))
,Pos7 = ltrim(rtrim(xDim.value('/x[7]','varchar(max)')))
From (Select Cast('<x>' + replace((Select replace(FullDate,' ','§§Split§§') as [*] For XML Path('')),'§§Split§§','</x><x>')+'</x>' as xml) as xDim) as A
) B
EDIT
This is an expanded version of Shawn's cleaner solution
Update @YourTable
set IdDate = substring(left(FullDate,charindex(' ',FullDate)-1),3,25)
+try_convert(varchar(10),try_convert(date,replace(substring(FullDate, charindex(',', FullDate) - 2, 100), ',', '')),112)
Select * from @YourTable
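The transformation both T-SQL versions perform (take the digit after "Nº", then append the date reformatted as yyyymmdd) can be sketched in Python for clarity. This is an illustration, not part of the answer; the regex assumes the fixed "Nºn (x) - Day d, Month yyyy" layout seen in the sample rows:

```python
import re
from datetime import datetime

def make_id_date(full_date: str) -> int:
    """Rebuild IdDate: the number after 'Nº' followed by the date as yyyymmdd."""
    # 'Nº1 (26) - Friday 4, January 2014' -> prefix '1', day 4, January, 2014
    m = re.match(r"Nº(\d+) \(\d+\) - \w+ (\d+), (\w+) (\d+)", full_date)
    prefix, day, month, year = m.groups()
    d = datetime.strptime(f"{day} {month} {year}", "%d %B %Y")
    return int(prefix + d.strftime("%Y%m%d"))

print(make_id_date('Nº1 (26) - Friday 4, January 2014'))  # 120140104
```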
Extract column value based on another column in Pandas
You could use loc to get a Series that satisfies your condition, and then iloc to get its first element:
In [2]: df
Out[2]:
A B
0 p1 1
1 p1 2
2 p3 3
3 p2 4
In [3]: df.loc[df['B'] == 3, 'A']
Out[3]:
2 p3
Name: A, dtype: object
In [4]: df.loc[df['B'] == 3, 'A'].iloc[0]
Out[4]: 'p3'
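One caveat worth noting (my addition, not part of the answer): .iloc[0] raises IndexError when no row matches the condition. A minimal sketch of a safer lookup, using the same sample frame:

```python
import pandas as pd

df = pd.DataFrame({'A': ['p1', 'p1', 'p3', 'p2'], 'B': [1, 2, 3, 4]})

# Guard against an empty result before taking the first element
matches = df.loc[df['B'] == 3, 'A']
value = matches.iloc[0] if not matches.empty else None  # 'p3'

missing = df.loc[df['B'] == 99, 'A']
fallback = missing.iloc[0] if not missing.empty else None  # None
```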
extract values into new column for each unique values in another column
If you share the data, I can reproduce it and add the result. Hopefully this will answer your question:
df.groupby(['ngram','date','rating','attraction','indo'])['review_id'].agg(list).reset_index()
ngram date rating attraction indo review_id
0 bigram 2018 10 uss sangat lengkap [911, 977, 3531]
1 bigram 2019 9 uss agak bingung [2919]
2 bigram 2019 10 sea_aquarium sangat blengkap [4282]
3 bigram 2019 10 uss agak bingung [1062]
4 bigram 2019 10 uss sangat lengkap [359]
5 bigram 2021 10 uss sangat lengkap [4]
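Since the original data wasn't shared, here is a minimal reproduction with invented rows (column names taken from the answer, values made up) showing what the groupby produces:

```python
import pandas as pd

# Invented sample rows mirroring the answer's columns
df = pd.DataFrame({
    'ngram':      ['bigram', 'bigram', 'bigram'],
    'date':       [2018, 2018, 2019],
    'rating':     [10, 10, 9],
    'attraction': ['uss', 'uss', 'uss'],
    'indo':       ['sangat lengkap', 'sangat lengkap', 'agak bingung'],
    'review_id':  [911, 977, 2919],
})

# Rows sharing all five key columns get their review_ids collected into a list
out = (df.groupby(['ngram', 'date', 'rating', 'attraction', 'indo'])['review_id']
         .agg(list)
         .reset_index())
print(out)
```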
Extract data from column and use it to update another column
If you have MySQL version 5.7, you can do the following using MySQL's XML functions:
UPDATE t1 SET col1 = ExtractValue(data, '/col1'), col2 = ExtractValue(data, '/col2');
Test data and output:
DROP TABLE IF EXISTS t1;
CREATE TABLE t1 (
id INT UNSIGNED NOT NULL,
data TEXT,
col1 VARCHAR(255),
col2 VARCHAR(255)
);
INSERT INTO t1 (id, data) VALUES
(1, '<col1>data1</col1><col2>data2</col2>');
SELECT * FROM t1;
UPDATE t1 SET col1 = ExtractValue(data, '/col1'), col2 = ExtractValue(data, '/col2');
SELECT * FROM t1;
Before update:
+----+--------------------------------------+------+------+
| id | data | col1 | col2 |
+----+--------------------------------------+------+------+
| 1 | <col1>data1</col1><col2>data2</col2> | NULL | NULL |
+----+--------------------------------------+------+------+
After update:
+----+--------------------------------------+-------+-------+
| id | data | col1 | col2 |
+----+--------------------------------------+-------+-------+
| 1 | <col1>data1</col1><col2>data2</col2> | data1 | data2 |
+----+--------------------------------------+-------+-------+
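What the two ExtractValue calls do corresponds, roughly, to XPath extraction over the column's XML fragment. A hedged Python sketch of the same extraction (the wrapper element is my addition, since the stored fragment has no single root):

```python
import xml.etree.ElementTree as ET

data = '<col1>data1</col1><col2>data2</col2>'
# Wrap the fragment so it parses as a well-formed document
root = ET.fromstring(f'<row>{data}</row>')

col1 = root.findtext('col1')  # 'data1'
col2 = root.findtext('col2')  # 'data2'
```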
Extract pattern from a column based on another column's value
You can use a regex with str.extract in a groupby + apply:
import re
df['match'] = (df.groupby('root')['word']
.apply(lambda g: g.str.extract(f'^(.*{re.escape(g.name)})'))
)
Or, if you expect few repeated "root" values:
import re
df['match'] = df.apply(lambda r: m.group()
if (m:=re.match(f'.*{re.escape(r["root"])}', r['word']))
else None, axis=1)
output:
word root match
0 replay play replay
1 replayed play replay
2 playable play play
3 thinker think think
4 think think think
5 thoughtful think NaN
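For a self-contained run, here is the per-row variant applied to the sample words (the walrus operator requires Python 3.8+; the input frame is reconstructed from the output table shown above):

```python
import re
import pandas as pd

df = pd.DataFrame({
    'word': ['replay', 'replayed', 'playable', 'thinker', 'think', 'thoughtful'],
    'root': ['play', 'play', 'play', 'think', 'think', 'think'],
})

# Keep the prefix of `word` that ends with `root`, or None when `root`
# does not occur in `word` (e.g. 'thoughtful' does not contain 'think')
df['match'] = df.apply(
    lambda r: m.group()
    if (m := re.match(f'.*{re.escape(r["root"])}', r['word']))
    else None,
    axis=1,
)
print(df)
```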
Extract column values from one table and insert with modifications into another
-- DROP FUNCTION alt_edger(_s text, _v text, _relation text, _tbl text, _tbl_src text)
CREATE OR REPLACE FUNCTION alt_edger(_s text, _v text, _relation text, _tbl text, _tbl_src text, OUT row_count int)
LANGUAGE plpgsql AS
$func$
DECLARE
_sql text := format(
'INSERT INTO pg_temp.%3$I (label, source, target)
SELECT DISTINCT $1, %1$I, %2$I FROM pg_temp.%4$I
WHERE (%1$I, %2$I) IS NOT NULL'
, _s, _v, _tbl, _tbl_src);
BEGIN
-- RAISE NOTICE '%', _sql; -- debug
EXECUTE _sql USING _relation;
GET DIAGNOSTICS row_count = ROW_COUNT; -- return number of inserted rows
END
$func$;
db<>fiddle here
Most importantly, use format() to concatenate your dynamic SQL commands safely, and use the format specifier %I for identifiers. This way SQL injection is not possible, and identifiers are double-quoted properly, preserving non-standard names like Document Number. That's where your original failed.
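The point about %I can be illustrated outside Postgres. A minimal Python sketch of what identifier quoting amounts to (simplified: format('%I', ...) only quotes when necessary, while this always quotes; the doubling of embedded quotes is the essential part):

```python
def quote_ident(name: str) -> str:
    """Sketch of SQL identifier quoting: wrap in double quotes,
    doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'

print(quote_ident('Document Number'))  # "Document Number"
```

Without this, a name containing a space (or a quote) would break, or worse, be interpreted as injected SQL.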
We could concatenate _relation as a string to be inserted into label, too. But the preferable way to pass values to EXECUTE is with the USING clause. $1 inside the SQL string passed to EXECUTE is a placeholder for the first USING argument. Not to be confused with $1 referencing function parameters in the context of the function body outside EXECUTE! (You can pass any string; a leading colon (:) does not matter, as the string is not interpreted when done right.)
See:
- Format specifier for integer variables in format() for EXECUTE?
- Table name as a PostgreSQL function parameter
I replaced the DELETE in your original with a WHERE clause on the SELECT of the INSERT. Don't insert rows in the first place, instead of deleting them again later. (%1$I, %2$I) IS NOT NULL only qualifies when both values are NOT NULL.
About that:
- Check if a Postgres composite field is null/empty
Don't use the prefix "pg_" for your table names. That's what Postgres uses for system tables. Don't mess with those.
I schema-qualify known temporary tables with pg_temp. That's typically optional, as the temporary schema comes first in the search_path by default. But that can be changed (maliciously), and then the table name would resolve to any existing regular table of the same name in the search_path. So better safe than sorry. See:
I made the function return the number of inserted rows. That's totally optional! Since I do that with an OUT parameter, I am allowed to skip the RETURNS clause. See:
Extract values for a column from another column based on another column in data frame R
If you don't want to hard-code all of the column names, you can use something like this.
comp.cols <- colnames(df)[grepl("_comp", colnames(df))]
non.comp.cols <- sub("_comp", "", comp.cols)
df[df[,"reg"] == "a", comp.cols] <- df[df[,"reg"] == "a", non.comp.cols]
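For readers coming from pandas, the same idea (copy the base columns into their *_comp counterparts wherever reg == "a") can be sketched like this; the frame here is invented for illustration:

```python
import pandas as pd

# Invented example: one base column x and its companion x_comp
df = pd.DataFrame({
    'reg':    ['a', 'b', 'a'],
    'x':      [1, 2, 3],
    'x_comp': [0, 0, 0],
})

# Derive the column pairs from the names, as the R code does
comp_cols = [c for c in df.columns if c.endswith('_comp')]
base_cols = [c.replace('_comp', '') for c in comp_cols]

mask = df['reg'] == 'a'
df.loc[mask, comp_cols] = df.loc[mask, base_cols].to_numpy()
```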
How to extract values from a column into the dataframe by matching two other columns in R
If you use dplyr, this is pretty straightforward: first join the two dataframes, then select the right values from columns bb.x and bb.y based on the NA values in bb.x, and finally keep only the required columns.
dfa %>%
dplyr::left_join(dfb, by = "aa") %>%
dplyr::mutate(bb = ifelse(is.na(bb.y), bb.x, bb.y)) %>%
dplyr::select(aa, bb)
Result
aa bb
1 1 10
2 2 0
3 3 8
4 4 0
5 5 6
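The same join-then-coalesce pattern can be sketched in pandas (a hedged translation; dfa and dfb are invented here so that the result matches the table shown above):

```python
import pandas as pd

# Invented inputs consistent with the answer's result table
dfa = pd.DataFrame({'aa': [1, 2, 3, 4, 5], 'bb': [1, 0, 3, 0, 5]})
dfb = pd.DataFrame({'aa': [1, 3, 5], 'bb': [10, 8, 6]})

# Left join, then coalesce: take bb from dfb where present, else keep dfa's bb
merged = dfa.merge(dfb, on='aa', how='left', suffixes=('_x', '_y'))
merged['bb'] = merged['bb_y'].combine_first(merged['bb_x'])
result = merged[['aa', 'bb']]
print(result)
```

combine_first plays the role of the ifelse(is.na(...)) coalesce in the dplyr version.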