Summarize the List into a Comma-Separated String

Use:

declare @t table(Number int, Grade varchar)

insert @t values(1, 'a'), (1, 'c'), (2, 'a'), (2, 'b'), (2, 'c'),
(3, 'b'), (3, 'a')

select t1.Number
     , stuff((
           select ',' + Grade
           from @t t2
           where t2.Number = t1.Number
           for xml path(''), type
       ).value('.', 'varchar(max)'), 1, 1, '') [values]
from @t t1
group by t1.Number
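
For the sample data this should return one row per Number with the grades collapsed (the order of grades within each group is not guaranteed, since the inner query has no ORDER BY):

Number  values
1       a,c
2       a,b,c
3       b,a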

Capturing comma inside text of comma separated values

I would suggest you follow the advice given in the comments and not use regex.

However, if you did need to do this using regex, the following should do the trick:

(.*?)=("?)([^"]+?)\2(?:,|$)
  • (.*?)= Captures the key to the left of the = sign. It only captures one key because the ? makes it match as few characters as possible.
  • ("?) Captures whether or not the value is in quotes.
  • ([^"]+?)\2(?:,|$)
    • ([^"]+?) Captures one or more characters that are not ", but as few as possible.
    • \2(?:,|$) This ends the value: \2 matches the closing quote if the value started with one (otherwise it matches nothing), and the match then stops at the next comma or at the end of the string.

Test online
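
If you want to see the pattern in action, here is a minimal Python sketch (the sample string and key names are made up for illustration):

import re

pattern = r'(.*?)=("?)([^"]+?)\2(?:,|$)'
sample = 'a=1,b="2,3",c=4'   # hypothetical input: the quoted value contains a comma

# each match is a (key, quote-or-empty, value) tuple
print(re.findall(pattern, sample))
# [('a', '', '1'), ('b', '"', '2,3'), ('c', '', '4')]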

Sum values in a comma-separated string

This looks a bit ugly but should work, assuming column QTY is stored as character:

your_df$QTY_new <- sapply(strsplit(your_df$QTY, ", "), function(x) sum(as.numeric(x)))
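
For comparison, the same idea in pandas (an equivalent sketch, not part of the original R answer; the frame and column names are assumptions):

import pandas as pd

# hypothetical frame with comma-separated quantities stored as strings
your_df = pd.DataFrame({'QTY': ['1, 2, 3', '10, 20']})

# split on ", ", convert each piece to a number, and sum per row
your_df['QTY_new'] = your_df['QTY'].str.split(', ').apply(lambda xs: sum(map(float, xs)))
print(your_df['QTY_new'].tolist())   # [6.0, 30.0]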

Dataframe: Cell Level: Convert Comma Separated String to List

  • Use pandas.Series.str.split to split the string into a list.
# use str split on the column
df.mgrs_grids = df.mgrs_grids.str.split(',')

# display(df)
driver_code journey_code mgrs_grids
0 7211863 7211863-140 [18TWL927129, 18TWL888113, 18TWL888113, 18TWL887113, 18TWL888113, 18TWL887113, 18TWL887113, 18TWL887113, 18TWL903128]
1 7211863 7211863-105 [18TWL927129, 18TWL939112, 18TWL939112, 18TWL939113, 18TWL939113, 18TWL939113, 18TWL939113, 18TWL939113, 18TWL939113, 18TWL960111, 18TWL960112]
2 7211863 7211863-50 [18TWL927129, 18TWL889085, 18TWL889085, 18TWL888085, 18TWL888085, 18TWL888085, 18TWL888085, 18TWL888085, 18TWL890085]
3 7211863 7211863-109 [18TWL927129, 18TWL952106, 18TWL952106, 18TWL952106, 18TWL952106, 18TWL952106, 18TWL952106, 18TWL952106, 18TWL952105, 18TWL951103]

print(type(df.loc[0, 'mgrs_grids']))
[out]:
<class 'list'>
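
For reference, here is a minimal, self-contained reconstruction of the raw input frame (only two rows, with the grid strings shortened; values taken from the display above). The str.split call shown above can then be applied to it:

import pandas as pd

# abbreviated version of the frame shown above
df = pd.DataFrame({
    'driver_code': ['7211863', '7211863'],
    'journey_code': ['7211863-140', '7211863-105'],
    'mgrs_grids': ['18TWL927129,18TWL888113,18TWL887113',
                   '18TWL927129,18TWL939112,18TWL939113'],
})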

separate row per value

  • After creating a column of lists.
  • Use pandas.DataFrame.explode to create separate rows for each value in the list.
# get a separate row for each value
df = df.explode('mgrs_grids').reset_index(drop=True)

# display(df.head())
driver_code journey_code mgrs_grids
0 7211863 7211863-140 18TWL927129
1 7211863 7211863-140 18TWL888113
2 7211863 7211863-140 18TWL888113
3 7211863 7211863-140 18TWL887113
4 7211863 7211863-140 18TWL888113

Update

  • Here is another option, which combines the 'journey_code' to the front of 'mgrs_grids', and then splits the string into a list.
    • This list is assigned back to 'mgrs_grids', but can also be assigned to a new column.
# add the journey code to mgrs_grids and then split
df.mgrs_grids = (df.journey_code + ',' + df.mgrs_grids).str.split(',')

# display(df.head())
driver_code journey_code mgrs_grids
0 7211863 7211863-140 [7211863-140, 18TWL927129, 18TWL888113, 18TWL888113, 18TWL887113, 18TWL888113, 18TWL887113, 18TWL887113, 18TWL887113, 18TWL903128]
1 7211863 7211863-105 [7211863-105, 18TWL927129, 18TWL939112, 18TWL939112, 18TWL939113, 18TWL939113, 18TWL939113, 18TWL939113, 18TWL939113, 18TWL939113, 18TWL960111, 18TWL960112]
2 7211863 7211863-50 [7211863-50, 18TWL927129, 18TWL889085, 18TWL889085, 18TWL888085, 18TWL888085, 18TWL888085, 18TWL888085, 18TWL888085, 18TWL890085]
3 7211863 7211863-109 [7211863-109, 18TWL927129, 18TWL952106, 18TWL952106, 18TWL952106, 18TWL952106, 18TWL952106, 18TWL952106, 18TWL952106, 18TWL952105, 18TWL951103]

# output to nested list
df.mgrs_grids.tolist()

[out]:
[['7211863-140', '18TWL927129', '18TWL888113', '18TWL888113', '18TWL887113', '18TWL888113', '18TWL887113', '18TWL887113', '18TWL887113', '18TWL903128'],
['7211863-105', '18TWL927129', '18TWL939112', '18TWL939112', '18TWL939113', '18TWL939113', '18TWL939113', '18TWL939113', '18TWL939113', '18TWL939113', '18TWL960111', '18TWL960112'],
['7211863-50', '18TWL927129', '18TWL889085', '18TWL889085', '18TWL888085', '18TWL888085', '18TWL888085', '18TWL888085', '18TWL888085', '18TWL890085'],
['7211863-109', '18TWL927129', '18TWL952106', '18TWL952106', '18TWL952106', '18TWL952106', '18TWL952106', '18TWL952106', '18TWL952106', '18TWL952105', '18TWL951103']]

Collapse / concatenate / aggregate a column to a single comma separated string within each group

Here are some options using toString, a function that concatenates a vector of strings using comma and space to separate components. If you don't want commas, you can use paste() with the collapse argument instead.

data.table

# alternative using data.table
library(data.table)
as.data.table(data)[, toString(C), by = list(A, B)]

aggregate

This uses no packages:

# alternative using aggregate from the stats package in the core of R
aggregate(C ~., data, toString)

sqldf

And here is an alternative using the SQL function group_concat via the sqldf package:

library(sqldf)
sqldf("select A, B, group_concat(C) C from data group by A, B", method = "raw")

dplyr

A dplyr alternative:

library(dplyr)
data %>%
  group_by(A, B) %>%
  summarise(test = toString(C)) %>%
  ungroup()

plyr

# plyr
library(plyr)
ddply(data, .(A,B), summarize, C = toString(C))
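
And, for readers coming from Python, the pandas equivalent of the same collapse (not part of the original R answer; the column names mirror the ones used above):

import pandas as pd

data = pd.DataFrame({'A': [1, 1, 2], 'B': ['x', 'x', 'y'], 'C': ['a', 'b', 'c']})

# collapse C into one comma-separated string per (A, B) group
out = data.groupby(['A', 'B'])['C'].apply(', '.join).reset_index()
print(out)
#    A  B     C
# 0  1  x  a, b
# 1  2  y     c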

Sum values using lookup for comma-separated list in one of the columns in Excel

This formula splits the comma-separated cell into its parts, uses SUMIFS to look up the number for each part, and lets SUMPRODUCT add up the results:

=SUMPRODUCT(SUMIFS(B:B,A:A,TRIM(MID(SUBSTITUTE(E2,",",REPT(" ",999)),(ROW($ZZ$1:INDEX($ZZ:$ZZ,LEN(E2)-LEN(SUBSTITUTE(E2,",",""))+1))-1)*999+1,999))))

No VBA or named-range workarounds are needed.
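
The logic of the formula, expressed as a short Python sketch (the lookup table and the cell contents are made-up illustrations):

# hypothetical lookup: column A keys mapped to column B numbers
lookup = {'apple': 3, 'pear': 5, 'plum': 2}

cell = 'apple,plum'   # comma-separated keys, as in E2

# split the cell, look each part up, and add the results -
# the same thing SUMPRODUCT(SUMIFS(...)) does above
total = sum(lookup[part.strip()] for part in cell.split(','))
print(total)   # 5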


comma separated field and comma separated search data

You are currently comparing two arrays using ANY, which is incorrect here.

If you want to see if the two arrays are identical in their contents, you can simply remove ANY:

string_to_array(data,',') = (string_to_array('25,2,3,15', ',')::character varying[])

If you want to see if there is overlap between the two arrays, you use &&:

string_to_array(data,',') && (string_to_array('25,2,3,15', ',')::character varying[])

Additional operators are available here: https://www.postgresql.org/docs/current/static/functions-array.html
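
If it helps to see the difference in semantics, here is a rough Python analogy (array = is order-sensitive, so list equality is the closer comparison; && corresponds to set overlap):

left = '25,2,3'.split(',')           # e.g. the value stored in data
right = '25,2,3,15'.split(',')       # the search list

# "=" asks whether the two arrays are identical (same elements, same order)
print(left == right)                 # False

# "&&" asks whether the arrays share at least one element
print(bool(set(left) & set(right)))  # True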

Comma separated results in SQL

Update (As suggested by @Aaron in the comment)

STRING_AGG is the preferred way of doing this in the modern versions of SQL Server (2017 or later). It also supports easy ordering.

SELECT
      STUDENTNUMBER
    , STRING_AGG(INSTITUTIONNAME, ', ') AS StringAggList
    , STRING_AGG(INSTITUTIONNAME, ', ') WITHIN GROUP (ORDER BY INSTITUTIONNAME DESC) AS StringAggListDesc
FROM Education E
GROUP BY E.STUDENTNUMBER;

Original Answer:

Use FOR XML PATH('') to convert the entries to a comma separated string, and STUFF() to trim off the leading comma, as follows. This gives you the same comma separated result:

SELECT
    STUFF((SELECT ',' + INSTITUTIONNAME
           FROM EDUCATION EE
           WHERE EE.STUDENTNUMBER = E.STUDENTNUMBER
           ORDER BY sortOrder
           FOR XML PATH(''), TYPE).value('text()[1]', 'nvarchar(max)')
          , 1, LEN(','), '') AS XmlPathList
FROM EDUCATION E
GROUP BY E.STUDENTNUMBER

Here is the FIDDLE showing results for both STRING_AGG and FOR XML PATH('').

Reverse summary to expand comma separated strings in dataframe

data.frame(
  group = c("cat", "dog", "horse"),
  value = c("1", "2", "3"),
  list = c("siamese,burmese,balinese", "corgi,sheltie,collie", "arabian,friesian,andalusian"),
  stringsAsFactors = FALSE
) -> xdf

tidyverse:

tidyr::separate_rows(xdf, list, sep=",")
##   group value       list
## 1   cat     1    siamese
## 2   cat     1    burmese
## 3   cat     1   balinese
## 4   dog     2      corgi
## 5   dog     2    sheltie
## 6   dog     2     collie
## 7 horse     3    arabian
## 8 horse     3   friesian
## 9 horse     3 andalusian

Base R:

do.call(
  rbind.data.frame,
  lapply(1:nrow(xdf), function(idx) {
    data.frame(
      group = xdf[idx, "group"],
      value = xdf[idx, "value"],
      list = strsplit(xdf[idx, "list"], ",")[[1]],
      stringsAsFactors = FALSE
    )
  })
)
##   group value       list
## 1   cat     1    siamese
## 2   cat     1    burmese
## 3   cat     1   balinese
## 4   dog     2      corgi
## 5   dog     2    sheltie
## 6   dog     2     collie
## 7 horse     3    arabian
## 8 horse     3   friesian
## 9 horse     3 andalusian

The shootout:

microbenchmark::microbenchmark(

  unnest = transform(xdf, list = strsplit(list, ",")) %>%
    tidyr::unnest(list),

  separate_rows = tidyr::separate_rows(xdf, list, sep = ","),

  base = do.call(
    rbind.data.frame,
    lapply(1:nrow(xdf), function(idx) {
      data.frame(
        group = xdf[idx, "group"],
        value = xdf[idx, "value"],
        list = strsplit(xdf[idx, "list"], ",")[[1]],
        stringsAsFactors = FALSE
      )
    })
  )
)
## Unit: microseconds
##          expr      min        lq     mean   median        uq       max neval
##        unnest 3689.890 4280.7045 6326.231 4881.160  6428.508 16670.715   100
## separate_rows 5093.618 5602.2510 8479.712 6289.193 10352.847 24447.528   100
##          base  872.343  975.1615 1589.915 1099.391  1660.324  6663.132   100

I'm constantly surprised at the horrible performance of tidyr ops.


