Aggregation Filter After $Lookup

The question here is actually about something different and does not need $lookup at all. But for anyone arriving here purely from the title, "filtering after $lookup", these are the techniques for you:

MongoDB 3.6 - Sub-pipeline

db.test.aggregate([
  { "$match": { "id": 100 } },
  { "$lookup": {
    "from": "test",
    "let": { "id": "$id" },
    "pipeline": [
      { "$match": {
        "value": "1",
        "$expr": { "$in": [ "$$id", "$contain" ] }
      }}
    ],
    "as": "childs"
  }}
])

Earlier - $lookup + $unwind + $match coalescence

db.test.aggregate([
  { "$match": { "id": 100 } },
  { "$lookup": {
    "from": "test",
    "localField": "id",
    "foreignField": "contain",
    "as": "childs"
  }},
  { "$unwind": "$childs" },
  { "$match": { "childs.value": "1" } },
  { "$group": {
    "_id": "$_id",
    "id": { "$first": "$id" },
    "value": { "$first": "$value" },
    "contain": { "$first": "$contain" },
    "childs": { "$push": "$childs" }
  }}
])

If you wonder why you would $unwind as opposed to using $filter on the array, read "Aggregate $lookup Total size of documents in matching pipeline exceeds maximum document size" for all the detail on why this is generally necessary and far more optimal.

For releases of MongoDB 3.6 and onwards, the more expressive "sub-pipeline" is generally what you want, since it "filters" the results of the foreign collection before anything gets returned into the array at all.
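To make the difference concrete, here is a rough plain-JavaScript sketch (no MongoDB involved) of what the two approaches compute. The array `testDocs` stands in for the "test" collection, the document with id 130 is a hypothetical extra row added so the filter has something to drop, and both function names are made up for illustration. It only mimics the semantics and says nothing about how the server executes either pipeline:

```javascript
// Stand-in for the "test" collection. The id 130 document is hypothetical.
const testDocs = [
  { id: 100, value: "0", contain: [] },
  { id: 110, value: "1", contain: [100] },
  { id: 120, value: "1", contain: [100] },
  { id: 130, value: "2", contain: [100] },
];

// Mimics $lookup with a sub-pipeline: foreign documents are matched
// (value is "1" and the parent id is in "contain") before reaching the array.
function lookupWithPipeline(parent, foreign) {
  return {
    ...parent,
    childs: foreign.filter(
      (doc) => doc.value === "1" && doc.contain.includes(parent.id)
    ),
  };
}

// Mimics plain $lookup followed by $filter: every joined document is
// collected first, and only then is the array trimmed.
function lookupThenFilter(parent, foreign) {
  const joined = foreign.filter((doc) => doc.contain.includes(parent.id));
  return { ...parent, childs: joined.filter((doc) => doc.value === "1") };
}

const parentDoc = testDocs.find((doc) => doc.id === 100);
const viaPipeline = lookupWithPipeline(parentDoc, testDocs);
const viaFilter = lookupThenFilter(parentDoc, testDocs);

console.log(viaPipeline.childs.map((doc) => doc.id)); // [ 110, 120 ]
console.log(viaFilter.childs.map((doc) => doc.id));   // [ 110, 120 ]
```

Both produce the same array. The point of the sub-pipeline form is that the intermediate `joined` array in the second function never has to exist.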

Back to the answer though which actually describes why the question asked needs "no join" at all....



Original

Using $lookup like this is not the most "efficient" way to do what you want here. But more on this later.

As a basic concept, just use $filter on the resulting array:

db.test.aggregate([ 
{ "$match": { "id": 100 } },
{ "$lookup": {
"from": "test",
"localField": "id",
"foreignField": "contain",
"as": "childs"
}},
{ "$project": {
"id": 1,
"value": 1,
"contain": 1,
"childs": {
"$filter": {
"input": "$childs",
"as": "child",
"cond": { "$eq": [ "$$child.value", "1" ] }
}
}
}}
]);

Or use $redact instead:

db.test.aggregate([ 
{ "$match": { "id": 100 } },
{ "$lookup": {
"from": "test",
"localField": "id",
"foreignField": "contain",
"as": "childs"
}},
{ "$redact": {
"$cond": {
"if": {
"$or": [
{ "$eq": [ "$value", "0" ] },
{ "$eq": [ "$value", "1" ] }
]
},
"then": "$$DESCEND",
"else": "$$PRUNE"
}
}}
]);

Both get the same result:

{  
"_id":ObjectId("570557d4094a4514fc1291d6"),
"id":100,
"value":"0",
"contain":[ ],
"childs":[ {
"_id":ObjectId("570557d4094a4514fc1291d7"),
"id":110,
"value":"1",
"contain":[ 100 ]
},
{
"_id":ObjectId("570557d4094a4514fc1291d8"),
"id":120,
"value":"1",
"contain":[ 100 ]
}
]
}

Bottom line is that, at the time of this original answer (before MongoDB 3.6), $lookup itself could not query to select only certain data. So all "filtering" needed to happen after the $lookup.

But really for this type of "self join" you are better off not using $lookup at all and avoiding the overhead of an additional read and "hash-merge" entirely. Just fetch the related items and $group instead:

db.test.aggregate([
{ "$match": {
"$or": [
{ "id": 100 },
{ "contain.0": 100, "value": "1" }
]
}},
{ "$group": {
"_id": {
"$cond": {
"if": { "$eq": [ "$value", "0" ] },
"then": "$id",
"else": { "$arrayElemAt": [ "$contain", 0 ] }
}
},
"value": { "$first": { "$literal": "0"} },
"childs": {
"$push": {
"$cond": {
"if": { "$ne": [ "$value", "0" ] },
"then": "$$ROOT",
"else": null
}
}
}
}},
{ "$project": {
"value": 1,
"childs": {
"$filter": {
"input": "$childs",
"as": "child",
"cond": { "$ne": [ "$$child", null ] }
}
}
}}
])

Which only comes out a little different because I deliberately removed the extraneous fields. Add them in yourself if you really want to:

{
"_id" : 100,
"value" : "0",
"childs" : [
{
"_id" : ObjectId("570557d4094a4514fc1291d7"),
"id" : 110,
"value" : "1",
"contain" : [ 100 ]
},
{
"_id" : ObjectId("570557d4094a4514fc1291d8"),
"id" : 120,
"value" : "1",
"contain" : [ 100 ]
}
]
}

So the only real issue here is "filtering" any null result from the array, created when the current document was the parent in processing items to $push.
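If it helps to see where that null comes from, here is a small plain-JavaScript sketch of the same grouping logic, run on the three documents the $or match would return. It is only an approximation of the $group and $filter stages, and the variable names are made up for illustration:

```javascript
// Documents as selected by the $or match: the parent plus its children.
const matched = [
  { id: 100, value: "0", contain: [] },
  { id: 110, value: "1", contain: [100] },
  { id: 120, value: "1", contain: [100] },
];

// Rough equivalent of the $group stage: key on the parent id, and push
// either the document itself (child) or a null placeholder (parent).
const groups = new Map();
for (const doc of matched) {
  const isParent = doc.value === "0";
  const key = isParent ? doc.id : doc.contain[0];
  if (!groups.has(key)) {
    groups.set(key, { _id: key, value: "0", childs: [] });
  }
  groups.get(key).childs.push(isParent ? null : doc);
}
const rawGroups = [...groups.values()];

// Rough equivalent of the $project with $filter: drop the null placeholders.
const cleaned = rawGroups.map((group) => ({
  ...group,
  childs: group.childs.filter((child) => child !== null),
}));

console.log(rawGroups[0].childs.length); // 3 (one null plus two children)
console.log(cleaned[0].childs.length);   // 2
```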


What you also seem to be missing here is that the result you are looking for does not need aggregation or "sub-queries" at all. The structure that you have arrived at or possibly found elsewhere is "designed" so that you can get a "node" and all of its "children" in a single query request.

That means just the "query" is all that is really needed, and the data collection ( which is all that is happening since no content is really being "reduced" ) is just a function of iterating the cursor result:

var result = {};

db.test.find({
  "$or": [
    { "id": 100 },
    { "contain.0": 100, "value": "1" }
  ]
}).sort({ "contain.0": 1 }).forEach(function(doc) {
  if ( doc.id == 100 ) {
    result = doc;
    result.childs = []
  } else {
    result.childs.push(doc)
  }
})

printjson(result);

This does exactly the same thing:

{
"_id" : ObjectId("570557d4094a4514fc1291d6"),
"id" : 100,
"value" : "0",
"contain" : [ ],
"childs" : [
{
"_id" : ObjectId("570557d4094a4514fc1291d7"),
"id" : 110,
"value" : "1",
"contain" : [
100
]
},
{
"_id" : ObjectId("570557d4094a4514fc1291d8"),
"id" : 120,
"value" : "1",
"contain" : [
100
]
}
]
}

And serves as proof that all you really need to do here is issue the "single" query to select both the parent and children. The returned data is just the same, and all you are doing on either server or client is "massaging" into another collected format.

This is one of those cases where you can get "caught up" in thinking of how you did things in a "relational" database, and not realize that since the way the data is stored has "changed", you no longer need to use the same approach.

That is exactly the point of the documentation example "Model Tree Structures with Child References": its structure makes it easy to select parents and children within one query.

Aggregation $filter is not working after $lookup

The user_info field is an array, but your $filter condition checks it for equality. Change the $filter condition as shown below:

  • Accessing mapped_id through the array field, as in $$child.config.user_info.mapped_id, returns an array of ids, so an $in condition is needed
  • $ifNull returns an empty array when the user_info field is not present
  • $in checks whether 1 is in the array of mapped_id values
{
  $project: {
    mac_id: 1,
    childs: {
      $filter: {
        "input": "$childs",
        "as": "child",
        "cond": {
          "$in": [
            1,
            { $ifNull: ["$$child.config.user_info.mapped_id", []] }
          ]
        }
      }
    }
  }
}
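
The reason equality fails can be seen in a few lines of plain JavaScript. This is just a sketch of the path semantics, not MongoDB code: the sample documents and the helper name `mappedIds` are hypothetical, and only the field names config, user_info and mapped_id come from the question.

```javascript
// Hypothetical joined documents shaped like the "childs" entries.
const childWithInfo = {
  config: { user_info: [{ mapped_id: 1 }, { mapped_id: 2 }] },
};
const childWithoutInfo = { config: {} };

// Mimics "$$child.config.user_info.mapped_id" wrapped in $ifNull:
// a path through an array yields an array of values, and a missing
// user_info falls back to an empty array.
function mappedIds(child) {
  const info = child.config && child.config.user_info;
  return Array.isArray(info) ? info.map((entry) => entry.mapped_id) : [];
}

// Comparing the whole array to 1 can never be true (the $eq situation) ...
console.log(mappedIds(childWithInfo) === 1); // false

// ... whereas a membership test behaves like $in.
console.log(mappedIds(childWithInfo).includes(1));    // true
console.log(mappedIds(childWithoutInfo).includes(1)); // false
```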

The second option, and the right way to handle this situation, is $lookup with a pipeline:

  • let to pass mac_id to pipeline
  • check $expr condition for mac_id
  • match mapped_id condition
db.gateway.aggregate([
{ $match: { group_id: "0" } },
{
$lookup: {
from: "commands",
let: { mac_id: "$mac_id" },
pipeline: [
{
$match: {
$expr: { $eq: ["$mac_id", "$$mac_id"] },
"config.user_info.mapped_id": 1
}
}
],
as: "childs"
}
},
{
$project: {
_id: 0,
mac_id: 1,
childs: 1
}
}
])

If you also want to filter the user_info array itself, you can add one more stage after the $match stage inside the $lookup pipeline:

{
$addFields: {
"config.user_info": {
$filter: {
input: "$config.user_info",
cond: { $eq: ["$$this.mapped_id", 1] }
}
}
}
}

How to filter entire dataset after $lookup aggregate operation in mongodb?

Instead of using an $addFields stage to get the size of the count array field and then a $match stage to keep the documents with a size greater than zero, you can combine both into a single $match stage. The $expr operator allows aggregation operators to be used within the $match stage (and also within the find method). Using $expr, build the $match stage as follows:

{ $match: { $expr: { $gt: [ { $size: "$count" }, 0 ] } } }

This stage follows the $lookup in the pipeline. Doing the work in fewer stages is good practice, and it can improve performance, especially when the number of documents being processed is large.
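
As a quick sanity check that the combined stage selects the same documents, here is a plain-JavaScript sketch of both variants. The sample documents and the intermediate field name `countSize` are assumptions for illustration, and the code only mimics the stage semantics:

```javascript
// Hypothetical documents as they might look right after the $lookup,
// with the joined results in the "count" array.
const afterLookup = [
  { _id: 1, count: [{ item: "a" }] },
  { _id: 2, count: [] },
  { _id: 3, count: [{ item: "b" }, { item: "c" }] },
];

// Two stages: $addFields with $size, then $match on the added field.
const twoStage = afterLookup
  .map((doc) => ({ ...doc, countSize: doc.count.length }))
  .filter((doc) => doc.countSize > 0);

// One stage: $match with $expr, $gt and $size.
const oneStage = afterLookup.filter((doc) => doc.count.length > 0);

console.log(twoStage.map((doc) => doc._id)); // [ 1, 3 ]
console.log(oneStage.map((doc) => doc._id)); // [ 1, 3 ]
```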

MongoDB Aggregation: How to $match after a $lookup?

Solution 1

Use $filter in $project stage to filter the document(s) from the array (bs).

db.collA.aggregate([
{
$lookup: {
from: "collB",
localField: "_id",
foreignField: "refId",
as: "bs"
}
},
{
$project: {
_id: 1,
bs: {
"$filter": {
"input": "$bs",
"cond": {
$eq: [
"$$this.name",
/* Filter value */
]
}
}
}
}
}
])



Solution 2

Use $lookup with pipeline.

db.collA.aggregate([
{
$lookup: {
from: "collB",
let: {
id: "$_id"
},
pipeline: [
{
$match: {
$expr: {
$and: [
{
$eq: [
"$$id",
"$refId"
]
},
{
$eq: [
"$name",
/* Filter value */
]
}
]
}
}
}
],
as: "bs"
}
}
])

Filter on lookup collection in MongoDB Aggregation

  • start with a simple $lookup
  • then $match the documents whose joined result is not an empty array
db.users.aggregate([
{
$lookup: {
from: "worspace",
localField: "_id",
foreignField: "admins",
as: "workspaces"
}
},
{ $match: { workspaces: { $ne: [] } } }
])

MongoDB lookup and filter by foreign documents

The isActive match should be the first stage:

AggregationOperation isActiveMatch = Aggregation.match(Criteria.where("isActive").is(true));

LookupOperation lookupOperation = LookupOperation.newLookup().from("roles").localField("roleId")
.foreignField("_id").as("roles");

AggregationOperation match = Aggregation.match(Criteria.where("type").is("internal"));

Aggregation aggregation = Aggregation.newAggregation(isActiveMatch,lookupOperation, match);

Update 1

You may be looking for something like this:

db.user.aggregate([
{
"$lookup": {
"from": "roles",
"localField": "role.roleId",
"foreignField": "id",
"as": "roles"
}
},
{
$project: {
roles: {
"$filter": {
"input": "$roles",
"cond": {
$eq: [
"$$this.type",
"internal"
]
}
}
}
}
},
{
$match: {
roles: {
$ne: []
}
}
}
])

Here you need to add two stages after the lookup. The first one filters the roles down to the "internal" type, and the second eliminates documents with an empty roles array.

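ProjectionOperation as =
    project()
        .and(
            ArrayOperators.Filter.filter("roles")
                .as("role")
                // equalToValue compares against the literal "internal";
                // equalTo(String) would treat its argument as a field reference
                .by(ComparisonOperators.Eq.valueOf("role.type").equalToValue("internal")))
        .as("roles");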

I have added the project stage only; the match stage can be added in the same way. The above code is not tested, but is written based on the working script.

Aggregation filter and lookup on Mongodb

It was not possible to achieve both the filtering for communityId = 1001 and the grouping in a single aggregation without losing the count = 0 category. The way to do it is to start from the complaints collection, filter the communityId = 1001 documents, and write them to a temp collection. Then, from the employeecategory collection, $lookup to join with that temp collection and $group by name. You will have your result at that point, and can then drop the temp collection.

// will not modify the complaints collection, will create a filtered temp collection
db.complaints.aggregate(
[{
$match: {
communityId: 1001
}
},
{
$out: "temp"
}
]
);

// will return the answer that is requested by OP
db.employeecategory.aggregate(
[{
$lookup: {
from: "temp",
localField: "name",
foreignField: "category",
as: "array"
}
}, {
$group: {
_id: "$name",
count: {
$sum: {
$size: "$array"
}
}
}
}]
).pretty();

db.temp.drop(); // to get rid of this temporary collection

This results in:

{ _id: "PLUMBER", count: 0},
{ _id: "SECURITY", count: 2},
{ _id: "GARDENING", count: 1}

for this test data:

db.employeecategory.insertMany([
{ name: "GARDENING" },
{ name: "SECURITY" },
{ name: "PLUMBER" }
]);

db.complaints.insertMany([
{ category: "GARDENING", communityId: 1001 },
{ category: "SECURITY", communityId: 1001 },
{ category: "SECURITY", communityId: 1001 },
{ category: "SECURITY", communityId: 1002 }
]);
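
For anyone who wants to check the counts without a server, the same filter-then-join-then-count logic can be sketched in plain JavaScript over that test data. This only mimics what the two aggregations compute; the variable names `tempDocs` and `counts` are made up for illustration:

```javascript
// The test data from above, as plain arrays.
const employeecategory = [
  { name: "GARDENING" },
  { name: "SECURITY" },
  { name: "PLUMBER" },
];
const complaints = [
  { category: "GARDENING", communityId: 1001 },
  { category: "SECURITY", communityId: 1001 },
  { category: "SECURITY", communityId: 1001 },
  { category: "SECURITY", communityId: 1002 },
];

// Step 1: the $match + $out part, keeping only communityId 1001.
const tempDocs = complaints.filter((doc) => doc.communityId === 1001);

// Step 2: start from every category (so none is lost), join to the
// filtered documents, and sum the size of each joined array per name.
const counts = new Map();
for (const category of employeecategory) {
  const joined = tempDocs.filter((doc) => doc.category === category.name);
  counts.set(category.name, (counts.get(category.name) || 0) + joined.length);
}

console.log(Object.fromEntries(counts));
// { GARDENING: 1, SECURITY: 2, PLUMBER: 0 }
```

Because the loop starts from the categories rather than from the complaints, PLUMBER still shows up with a count of 0.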

MongoDb: aggregation $lookup with filtering over the foreign documents

You can use the $filter array aggregation operator on the pets array that is produced by your $lookup stage.

To output pets that are at least 1 year old, use

db.users.aggregate([ 
{
$lookup:
{
from: "pets",
localField: "id",
foreignField: "owner",
as: "pets"
}
},
{
$project:
{
name: 1,
pets:
{
$filter:
{
input: "$pets",
as: "pet",
cond: { $gte: [ "$$pet.age", 1 ] }
}
}
}
}
]);

To output only the oldest pets, simply replace the cond field of the $filter operator in the previous aggregation pipeline with

cond: { $eq: [ "$$pet.age", { $max: "$pets.age" } ] }
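
Here is a rough plain-JavaScript picture of what those two conditions select. The user and pet documents are hypothetical sample data, and the code only imitates the $filter semantics rather than running against MongoDB:

```javascript
// Hypothetical user document as it might look after the $lookup stage.
const userWithPets = {
  name: "alice",
  pets: [
    { name: "rex", age: 7 },
    { name: "tom", age: 0.5 },
    { name: "kit", age: 7 },
    { name: "bob", age: 3 },
  ],
};

// cond: { $gte: [ "$$pet.age", 1 ] }
const atLeastOne = userWithPets.pets.filter((pet) => pet.age >= 1);

// cond: { $eq: [ "$$pet.age", { $max: "$pets.age" } ] }
// The maximum is taken over the whole pets array, so ties are all kept.
const maxAge = Math.max(...userWithPets.pets.map((pet) => pet.age));
const oldest = userWithPets.pets.filter((pet) => pet.age === maxAge);

console.log(atLeastOne.map((pet) => pet.name)); // [ 'rex', 'kit', 'bob' ]
console.log(oldest.map((pet) => pet.name));     // [ 'rex', 'kit' ]
```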

