Group by Repeating Attribute

Group by repeating attribute

Proper SQL

I want to get 3 message groups in the below order: [1,2], [3,4], [5]

To get the requested order, add ORDER BY min(id):

SELECT grp, user_id, array_agg(id) AS ids
FROM  (
   SELECT id
        , user_id
        , row_number() OVER (ORDER BY id) -
          row_number() OVER (PARTITION BY user_id ORDER BY id) AS grp
   FROM   tbl
   ORDER  BY 1   -- for ordered arrays in result
   ) t
GROUP  BY grp, user_id
ORDER  BY min(id);

db<>fiddle here

_{Old sqliddle}

The addition would barely warrant another answer. The more important issue is this:

Faster with PL/pgSQL

I'm using PostgreSQL and would be happy to use something specific to it, whatever would give the best performance.

Pure SQL is all nice and shiny, but a procedural server-side function is much faster for this task. While processing rows procedurally is generally slower, plpgsql wins this competition big-time, because it can make do with a single table scan and a single ORDER BY operation:

CREATE OR REPLACE FUNCTION f_msg_groups()
  RETURNS TABLE (ids int[])
  LANGUAGE plpgsql AS
$func$
DECLARE
   _id    int;
   _uid   int;
   _id0   int;                         -- id of last row
   _uid0  int;                         -- user_id of last row
BEGIN
   FOR _id, _uid IN
       SELECT id, user_id FROM messages ORDER BY id
   LOOP
       IF _uid <> _uid0 THEN
          RETURN QUERY VALUES (ids);   -- output row (never happens after 1 row)
          ids := ARRAY[_id];           -- start new array
       ELSE
          ids := ids || _id;           -- add to array
       END IF;

       _id0  := _id;
       _uid0 := _uid;                  -- remember last row
   END LOOP;

   RETURN QUERY VALUES (ids);          -- output last iteration
END
$func$;

Call:

SELECT * FROM f_msg_groups();

Benchmark and links

I ran a quick test with EXPLAIN ANALYZE on a similar real life table with 60k rows (execute several times, pick fastest result to exclude cashing effects):

SQL:

Total runtime: 1009.549 ms

Pl/pgSQL:

Total runtime: 336.971 ms

GROUP BY and aggregate sequential numeric values
GROUP BY consecutive dates delimited by gaps
Ordered count of consecutive repeats / duplicates

Postgree double group by repeating attribute

I found only 1 way:

With window function lead() find a moment when was changed dict (user, message type)
With window function sum() set sequnce number for each new dict
Group by sequence and select what you need:

Checking:

create table test (
  id serial primary key,
  user_id integer,
  message_id integer,
  message_type varchar
);

insert into test (user_id, message_id, message_type) 
values 
(1, 4, 'Warning'),
(1, 5, 'Warning'),
(1, 6, 'Warning'),
(2, 4, 'Error'),
(2, 1, 'Exception'),
(1, 2, 'Exception'),
(1, 3, 'Exception'),
(2, 4, 'Exception')
;

select 
    array_agg(grouped.id) as record_ids,
    grouped.user_id,
    array_agg(grouped.message_id) as message_ids,
    grouped.message_type
from (
    select changed.*, 
        sum(changed.changed) over (order by changed.id desc) as group_n
    from (
        select tt.*,
            case when lag((user_id, message_type)) over (order by tt.id desc) is distinct from (user_id, message_type) then 1 else 0 end as changed
        from test tt
    ) changed
    order by id desc
) grouped
group by grouped.group_n, grouped.user_id, grouped.message_type
order by grouped.group_n
;

Result:

record_ids | user_id | message_ids | message_type 
------------+---------+-------------+--------------
 {8}        |       2 | {4}         | Exception
 {7,6}      |       1 | {3,2}       | Exception
 {5}        |       2 | {1}         | Exception
 {4}        |       2 | {4}         | Error
 {3,2,1}    |       1 | {6,5,4}     | Warning
(5 rows)

Normalization: What does repeating groups mean?

The term "repeating group" originally meant the concept in CODASYL and COBOL based languages where a single field could contain an array of repeating values. When E.F.Codd described his First Normal Form that was what he meant by a repeating group. The concept does not exist in any modern relational or SQL-based DBMS.

The term "repeating group" has also come to be used informally and imprecisely by database designers to mean a repeating set of columns, meaning a collection of columns containing similar kinds of values in a table. This is different to its original meaning in relation to 1NF. For instance in the case of a table called Families with columns named Parent1, Parent2, Child1, Child2, Child3, ... etc the collection of Child N columns is sometimes referred to as a repeating group and assumed to be in violation of 1NF even though it is not a repeating group in the sense that Codd intended.

This latter sense of a so-called repeating group is not technically a violation of 1NF if each attribute is only single-valued. The attributes themselves do not contain repeating values and therefore there is no violation of 1NF for that reason. Such a design is often considered an anti-pattern however because it constrains the table to a predetermined fixed number of values (maximum N children in a family) and because it forces queries and other business logic to be repeated for each of the columns. In other words it violates the "DRY" principle of design. Because it is generally considered poor design it suits database designers and sometimes even teachers to refer to repeated columns of this kind as a "repeating group" and a violation of the spirit of the First Normal Form.

This informal usage of terminology is slightly unfortunate because it can be a little arbitrary and confusing (when does a set of columns actually constitute a repetition?) and also because it is a distraction from a more fundamental issue, namely the Null problem. All of the Normal Forms are concerned with relations that don't permit the possibility of nulls. If a table permits a null in any column then it doesn't meet the requirements of a relation schema satisfying 1NF. In the case of our Families table, if the Child columns permit nulls (to represent families who have fewer than N children) then the Families table doesn't satisfy 1NF. The possibility of nulls is often forgotten or ignored in normalization exercises but the avoidance of unnecessary nullable columns is one very good reason for avoiding repeating sets of columns, whether or not you call them "repeating groups".

LINQ how to use group by to simplify lists with duplicate attributes

The general approach is as follows:

// First GroupBy compound type
.GroupBy(i => new { i.Type, i.Year })

// Then select from the Group Key and
// apply an Aggregate/query on the Grouped Values
.Select(g => new {
   Type = g.Key.Type,         // Pull out key values
   Year = g.Key.Year,
   Cost = g.Sum(i => i.Cost)  // Sum all items in group
})

DB Design: 1st Normal Form and Repeating Groups

Design 2 and Design 4 are the best ways to go provided the results will not always be present (aka NULLs in Desigin 1). If they always are taken, then the first design is fine.

I believe repeating groups in SQL would actually be if you have a column stuffed with add'l values e.g. Phone_Number contains "123-444-4444,123-333-3334" etc.

Anyway, the later designs are suboptimal -- you continue to take that to the final level and have the "One True Lookup Table" http://www.dbazine.com/ofinterest/oi-articles/celko22 or Entity Attribute Value http://tonyandrews.blogspot.com/2004/10/otlt-and-eav-two-big-design-mistakes.html

http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:10678084117056

Either way, it's almost always a bad thing. Although they may share a common datatype/domain, the meaning differs -- thus they should remain individual attributes (maxtemp, mintemp, etc.)

XSLT Group sibling elements between repeating attributes

Well XPath allows you to select the following or preceding sibling elements with the corresponding axes. And for efficiency you can use a key:

<xsl:stylesheet version="1.0" 
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:key name="subs" match="widget[@type = 'nav']/links/tab[@type = 'page']"
  use="generate-id(preceding-sibling::tab[@type = 'label'][1])"/>

<xsl:output method="html" indent="yes"/>
<xsl:strip-space elements="*"/>

<xsl:template match="container/widget[@type = 'nav']">
  <ul>
    <xsl:apply-templates select="links/tab[@type = 'label' and starts-with(@header, 'Top Level')]"/>
  </ul>
</xsl:template>

<xsl:template match="links/tab[@type = 'label' and starts-with(@header, 'Top Level')]">
  <li>
    <a href="link for label">
      <xsl:value-of select="@header"/>
    </a>
    <div class="submenu">
      <ul>
        <xsl:apply-templates select="key('subs', generate-id())"/>
      </ul>
    </div>
  </li>
</xsl:template>

<xsl:template match="links/tab[@type = 'page']">
  <li>
    <xsl:value-of select="@header"/>
  </li>
</xsl:template>

</xsl:stylesheet>

That stylesheet transforms

<Root>
   <container>
        <widget type="nav">
            <links>
                <tab type="label" header="Top Level Tab1"/>
                <tab type="page" header="submenu of tab1"/>
                <tab type="page" header="submenu of tab1"/>
                <tab type="page" header="submenu of tab1"/>
                <tab type="label" header="Top Level Tab2"/>
                <tab type="page" header="submenu of tab2"/>
                <tab type="page" header="submenu of tab2"/>
                <tab type="page" header="submenu of tab2"/>
                <tab type="page" header="submenu of tab2"/>
                <tab type="page" header="submenu of tab2"/>
                <tab type="label" header="Top Level Tab3"/>
                <tab type="page" header="submenu of tab3"/>
                <tab type="page" header="submenu of tab3"/>
                <tab type="page" header="submenu of tab3"/>
                <tab type="page" header="submenu of tab3"/>
                <tab type="page" header="submenu of tab3"/>
            </links>
        </widget>
   </container>
</Root>

into

<ul>
   <li><a href="link for label">Top Level Tab1</a><div class="submenu">
         <ul>
            <li>submenu of tab1</li>
            <li>submenu of tab1</li>
            <li>submenu of tab1</li>
         </ul>
      </div>
   </li>
   <li><a href="link for label">Top Level Tab2</a><div class="submenu">
         <ul>
            <li>submenu of tab2</li>
            <li>submenu of tab2</li>
            <li>submenu of tab2</li>
            <li>submenu of tab2</li>
            <li>submenu of tab2</li>
         </ul>
      </div>
   </li>
   <li><a href="link for label">Top Level Tab3</a><div class="submenu">
         <ul>
            <li>submenu of tab3</li>
            <li>submenu of tab3</li>
            <li>submenu of tab3</li>
            <li>submenu of tab3</li>
            <li>submenu of tab3</li>
         </ul>
      </div>
   </li>
</ul>

Handling repeating Attributes and its value in xml using XSLT code

What you could do is apply-templates to children of GrossWeightTransportationQuantity and InsuranceDeclaredAmount except for text and change those to attributes. Then just output the value of text.

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output indent="yes"/>
  <xsl:strip-space elements="*"/>

  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="GrossWeightTransportationQuantity|InsuranceDeclaredAmount">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()[not(self::text)]"/>
      <xsl:value-of select="text"/>
    </xsl:copy>
  </xsl:template>
  
  <xsl:template match="GrossWeightTransportationQuantity/*|InsuranceDeclaredAmount/*">
    <xsl:attribute name="{name()}">
      <xsl:value-of select="."/>
    </xsl:attribute>
  </xsl:template>
</xsl:stylesheet>

Parsing repeating groups from a string

This code will parse out the groups/subgroups. You can replace the system.print statements with your map building. You may want to rethink the format, however, because it could be a lot clearer if you used a format that naturally supports nesting like XML

@Test
public void testname() throws Exception {
    parseText("453=3~^448=0A~!447=D~!452=1~!~^448=0A~!447=D~!452=17~!~^448=81~!447=D~!452=7~!~^");
}

private int subgroupLength = 0;

public void parseText(String text) {
    for (String group : text.split("~\\^")) {
        System.out.println("Group");
        parseGroup(group);
    }
}

public void parseGroup(String group) {
    for (String attribute : group.split("~!"))
        parseAttribute(attribute);
}

public void parseAttribute(String attribute) {
    String[] split = attribute.split("=");
    if (split.length != 2)
        return;

    if (split[0].equals("453")) {
        System.out.println("\tSubgroup length " + split[1]);
        subgroupLength = Integer.parseInt(split[1]);
    } else if (subgroupLength > 0) {
        subgroupLength--;
        System.out.println("\t\t" + split[0] + " = " + split[1]);
    } else
        System.out.println("\t" + split[0] + " = " + split[1]);
}

How will I repeat group header of same group where I only want to repeat the second portion of the group?

To repeat only a part of a group header there are (at least) two different approaches:

multiple (sub-)header on a group and suppressing the unwanted header
multiple, identical groups, repeating only some of them

To use multiple group header:

Create a group a field (PONo) in your case).
Check Repeat Group Header On Each Page in the Group Options in the Group Expert.
Create group headers for the group (#1a and #1b in your case)
Add a Suppression Formula for the group header that should not repeat (#1a), put InRepeatedGroupHeader in the formula body

This will print both group header when a group is printed first but only group header #1b on following pages.

To use multiple groups:

Create two groups on the same field (Group 1 and 2 on PONo)
Check Repeat Group Header On Each Page only for group 2

This will print both group header when a group is printed first but only group header #2 on following pages.

Group by Repeating Attribute