Sql - Finding Students Taking 2 or More Classes

SQL -- Finding students taking 2 or more classes

This query will do what you want:

SELECT s.student_name, COUNT(DISTINCT r.class_id) AS num_classes
FROM student s
LEFT JOIN rosters r ON r.student_id = s.student_id
GROUP BY s.student_id
HAVING num_classes >= 2

It counts all the distinct class_id values in roster for each student (num_classes) and returns only students with 2 or more (HAVING num_classes >= 2).

Note I've used a LEFT JOIN to catch all students, however since you want only those with more than 1 class this is not necessary and you could use a straight JOIN.

Also note that it's not necessary to JOIN the class table for this question, however if you wanted the names of the classes the student was taking you would need to.

Find students who take most courses SQL

Your idea is good. Now that you have the students' semester course counts along with the maximum semester course counts, compare the two:

SELECT semester, num_courses, student
FROM
(
SELECT
student,
semester,
num_courses,
MAX(num_courses) OVER (PARTITION BY semester) AS semester_max_num_courses
FROM students s
) with_max
WHERE num_courses = semester_max_num_courses
ORDER BY semester, student;

Another approach would be to select all maximum semester course counts and then use this to get the students:

SELECT semester, num_courses, student
FROM students
WHERE (semester, num_courses) IN
(
SELECT semester, MAX(num_courses)
from students
GROUP BY semester
)
ORDER BY semester, student;

Find pair of students who take exactly the same classes

Given sample data:

CREATE TABLE student_course (
student_id integer,
course_id integer,
PRIMARY KEY (student_id, course_id)
);

INSERT INTO student_course (student_id, course_id)
VALUES (1, 1), (1, 2), (1, 3), (2, 1), (3, 1), (3, 2), (3, 3) ;

Use array aggregation

One option is to use a CTE to join on the ordered lists of courses each student is taking:

WITH student_coursearray(student_id, courses) AS (
SELECT student_id, array_agg(course_id ORDER BY course_id)
FROM student_course
GROUP BY student_id
)
SELECT a.student_id, b.student_id
FROM student_coursearray a INNER JOIN student_coursearray b ON (a.courses = b.courses)
WHERE a.student_id > b.student_id;

array_agg is actually part of the SQL standard, as is the WITH common-table expression syntax. Neither are supported by MySQL so you'll have to express this a different way if you want to support MySQL.

Find missing course pairings per-student

Another way to think about this would be "for every student pairing, find out if one is taking a class the other is not". This would lend its self to a FULL OUTER JOIN, but it's pretty awkward to express. You have to determine the pairings of student IDs of interest, then for each pairing do a full outer join across the set of classes each takes. If there are any null rows then one took a class the other didn't, so you can use that with a NOT EXISTS filter to exclude such pairings. That gives you this monster:

WITH student_id_pairs(left_student, right_student) AS (
SELECT DISTINCT a.student_id, b.student_id
FROM student_course a
INNER JOIN student_course b ON (a.student_id > b.student_id)
)
SELECT left_student, right_student
FROM student_id_pairs
WHERE NOT EXISTS (
SELECT 1
FROM (SELECT course_id FROM student_course WHERE student_id = left_student) a
FULL OUTER JOIN (SELECT course_id FROM student_course b WHERE student_id = right_student) b
ON (a.course_id = b.course_id)
WHERE a.course_id IS NULL or b.course_id IS NULL
);

The CTE is optional and may be replaced by a CREATE TEMPORARY TABLE AS SELECT ... or whatever if your DB doesn't support CTEs.

Which to use?

I'm very confident that the array approach will perform better in all cases, particularly because for a really large data set you can take the WITH expression, create a temporary table from the query instead, add an index on (courses, student_id) to it and do crazy-fast equality searching that'll well and truly pay off the cost of the index creation time. You can't do that with the subquery joins approach.

Return a list of students who attended the same class

You can do this with a self-join:

select c1.student, c2.student, c1.class
from ClassStudents c1 join
ClassStudents c2
on c1.class = c2.class and
c1.student < c2.student;

If two students attended multiple classes together, this will return one row per class.

If you are using ids and not names in the tables (a good idea), then you will need to join in the appropriate information from other tables. I'll leave that up to you.

EDIT:

To count the number of classes two students have had together, you would use group by:

select c1.student, c2.student, count(*) as NumClasses
from ClassStudents c1 join
ClassStudents c2
on c1.class = c2.class and
c1.student < c2.student
group by c1.student, c2.student;

You can use a having clause to filter by count(*). You can join this back to the other tables to get the specific classes.

Here is how you can join this back to the original tables:

select s1.name as student1, s2.name as student2, count(*) as NumClasses
from ClassStudents c1 join
ClassStudents c2
on c1.class = c2.class and
c1.studentid < c2.studentid join
Students s1
on c1.studentid = s1.studentid join
Students s2
on c2.studentid = s2.studentid
group by c1.name, c2.name
having count(*) > 1;

You could also do this as a subquery, but this is easy enough.



Related Topics



Leave a reply



Submit