What Is LINQ and What Does It Do

What is LINQ and what does it do?

LINQ stands for Language Integrated Query.

Instead of writing YAQL (Yet Another Query Language), Microsoft language developers provided a way to express queries directly in their languages (such as C# and Visual Basic). The techniques for forming these queries do not rely on the implementation details of the thing being queried, so that you can write valid queries against many targets (databases, in-memory objects, XML) with practically no consideration of the underlying way in which the query will be executed.

Let's start this exploration with the parts belonging to the .NET Framework (3.5).

  • LINQ To Objects - examine System.Linq.Enumerable for query methods. These target IEnumerable<T>, allowing any typed loopable collection to be queried in a type-safe manner. These queries rely on compiled .NET methods, not Expressions.

  • LINQ To Anything - examine System.Linq.Queryable for some query methods. These target IQueryable<T>, allowing the construction of Expression Trees that can be translated by the underlying implementation.

  • Expression Trees - examine System.Linq.Expressions namespace. This is code as data. In practice, you should be aware of this stuff, but don't really need to write code against these types. Language features (such as lambda expressions) can allow you to use various short-hands to avoid dealing with these types directly.

  • LINQ To SQL - examine the System.Data.Linq namespace. Especially note the DataContext. This is a DataAccess technology built by the C# team. It just works.

  • LINQ To Entities - examine the System.Data.Objects namespace. Especially note the ObjectContext. This is a DataAccess technology built by the ADO.NET team. It is complex, powerful, and harder to use than LINQ To SQL.

  • LINQ To XML - examine the System.Xml.Linq namespace. Essentially, people weren't satisfied with the stuff in System.Xml. So Microsoft re-wrote it and took advantage of the re-write to introduce some methods that make it easier to use LINQ To Objects against XML.

  • Some nice helper types, such as Func and Action. These types are delegates with generic support. Gone are the days of declaring your own custom (and non-interchangeable) delegate types.

All of the above is part of the .NET Framework, and available from any .NET language (VB.NET, C#, IronPython, COBOL .NET etc).
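To make a few of those pieces concrete, here is a minimal sketch that combines LINQ To XML, LINQ To Objects and a Func delegate. The class name, method name and the shape of the XML (a `customer` element with a `name` attribute) are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class FrameworkPiecesDemo
{
    // Hypothetical helper: returns the "name" attribute of every <customer>
    // child element whose name starts with the given prefix.
    public static List<string> NamesStartingWith(string xml, string prefix)
    {
        // Func<XElement, bool> is one of the generic helper delegates.
        Func<XElement, bool> matches = e =>
            ((string)e.Attribute("name") ?? "").StartsWith(prefix, StringComparison.Ordinal);

        // XElement (System.Xml.Linq) exposes its children as IEnumerable<XElement>,
        // so the Enumerable methods (LINQ To Objects) apply directly.
        return XElement.Parse(xml)
                       .Elements("customer")
                       .Where(matches)
                       .Select(e => (string)e.Attribute("name"))
                       .ToList();
    }
}
```

Calling it with a small document containing customers named Bob, Alice and Barb and the prefix "B" should give back Bob and Barb, in document order.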


Ok, on to language features. I'm going to stick to C#, since that's what I know best. VB.NET also had several similar improvements (and a couple that C# didn't get - XML literals). This is a short and incomplete list.

  • Extension Methods - this allows you to "add" a method to a type. The method is really a static method that is passed an instance of the type, and is restricted to the public contract of the type, but it is very useful for adding methods to types you don't control (string), or adding (fully implemented) helper methods to interfaces.

  • Query Comprehension Syntax - this allows you to write in a SQL Like structure. All of this stuff gets translated to the methods on System.Linq.Queryable or System.Linq.Enumerable (depending on the Type of myCustomers). It is completely optional and you can use LINQ well without it. One advantage to this style of query declaration is that the range variables are scoped: they do not need to be re-declared for each clause.

    IEnumerable<string> result =
    from c in myCustomers
    where c.Name.StartsWith("B")
    select c.Name;
  • Lambda Expressions - This is a shorthand for specifying a method. The C# compiler will translate each into either an anonymous method or a true System.Linq.Expressions.Expression. You really need to understand these to use Linq well. There are three parts: a parameter list, an arrow, and a method body.

    IEnumerable<string> result = myCustomers
    .Where(c => c.Name.StartsWith("B"))
    .Select(c => c.Name);
  • Anonymous Types - Sometimes the compiler has enough information to create a type for you. These types aren't truly anonymous: the compiler names them when it makes them. But those names are made at compile time, which is too late for a developer to use that name at design time.

    myCustomers.Select(c => new
    {
        Name = c.Name,
        Age = c.Age
    })
  • Implicit Types - Sometimes the compiler has enough information from an initialization that it can figure out the type for you. You can instruct the compiler to do so by using the var keyword. Implicit typing is required to declare variables for Anonymous Types, since programmers may not use the name of an anonymous type.

    // The compiler will determine that names is an IEnumerable<string>
    var names = myCustomers.Select(c => c.Name);
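Putting the language features together, a small sketch might look like the following. The Customer class, the StartsWithLetter extension method and the sample data are assumptions made up for this example; only the language features themselves come from the list above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical model type for the example.
public class Customer
{
    public string Name { get; set; }
    public int Age { get; set; }
}

public static class StringExtensions
{
    // Extension method: a static method callable as if it were declared on string.
    public static bool StartsWithLetter(this string text, char letter)
    {
        return !string.IsNullOrEmpty(text) && text[0] == letter;
    }
}

public static class LanguageFeaturesDemo
{
    public static List<string> Describe(IEnumerable<Customer> customers)
    {
        // Query comprehension syntax, projecting into an anonymous type.
        // "var" is required here because the anonymous type has no usable name.
        var projected = from c in customers
                        where c.Name.StartsWithLetter('B')
                        select new { c.Name, c.Age };

        // Lambda expression passed to Enumerable.Select.
        return projected.Select(p => p.Name + " (" + p.Age + ")").ToList();
    }
}
```

For customers Bob (30), Alice (25) and Barb (41), Describe returns "Bob (30)" and "Barb (41)".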

What does Include() do in LINQ?

Let's say for instance you want to get a list of all your customers:

var customers = context.Customers.ToList();

And let's assume that each Customer object has a reference to its set of Orders, and that each Order has references to LineItems which may also reference a Product.

As you can see, selecting a top-level object with many related entities could result in a query that needs to pull in data from many sources. As a performance measure, Include() allows you to indicate which related entities should be read from the database as part of the same query.

Using the same example, this might bring in all of the related order headers, but none of the other records:

var customersWithOrderDetail = context.Customers.Include("Orders").ToList();

As a final point since you asked for SQL, the first statement without Include() could generate a simple statement:

SELECT * FROM Customers;

The final statement which calls Include("Orders") may look like this:

SELECT *
FROM Customers JOIN Orders ON Customers.Id = Orders.CustomerId;

Understanding the basics of LINQ queries

See the links below for an intro to LINQ:

What is Linq and what does it do?

http://weblogs.asp.net/scottgu/using-linq-to-sql-part-1

LINQ provides a means of querying data, but you still need to give LINQ a way to access that data, be it through LINQ to SQL classes, ADO.NET, Entity Framework, etc.

I'm a fan of Entity Framework (EF) where you set up objects that represent your data, and use a context to populate those objects.

It could look something like this:

public class Table1
{
    public string FirstName { get; set; }
    public string SurName { get; set; }
    public DateTime DOB { get; set; }
}

public class Table1Repository
{
    private readonly MyEntities _context;

    public Table1Repository()
    {
        this._context = new MyEntities();
    }

    // In effect your "SELECT * FROM Table1"
    public IQueryable<Table1> Get()
    {
        return this._context.Table1;
    }

    // Pulls records with a DOB matching the parameter. A lambda is used here,
    // but there is also "query expression syntax", which looks more like SQL.
    public IQueryable<Table1> GetByDob(DateTime dob)
    {
        return this._context.Table1.Where(w => w.DOB == dob);
    }
}

Note that you're performing LINQ queries on the context that represents the data, not on the database itself. LINQ is very powerful, but you need to provide it with a means of accessing the data, whether that data is XML, a file, a database, or anything else.
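To illustrate that the query side doesn't care where the data lives, here is a rough sketch of the same repository shape over a plain in-memory list instead of an Entity Framework context. Person and InMemoryPersonRepository are hypothetical names for this example; AsQueryable() is what exposes the list as an IQueryable<T>.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical model type for the example.
public class Person
{
    public string FirstName { get; set; }
    public DateTime DOB { get; set; }
}

// Hypothetical repository: the "context" here is just a list
// exposed through IQueryable<Person>.
public class InMemoryPersonRepository
{
    private readonly IQueryable<Person> _people;

    public InMemoryPersonRepository(IEnumerable<Person> people)
    {
        _people = people.AsQueryable();
    }

    // Everything, unfiltered.
    public IQueryable<Person> Get()
    {
        return _people;
    }

    // Same lambda-based filter as a database-backed repository would use.
    public IQueryable<Person> GetByDob(DateTime dob)
    {
        return _people.Where(p => p.DOB == dob);
    }
}
```

The calling code composes further Where, OrderBy or Select calls on the returned IQueryable<Person> exactly as it would against a database-backed source.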

What is LINQ exactly?

LINQ is many things; it's the combination of many smaller things.

This answer is going to be a jumble of information, I apologize. Your best bet is to wait a bit and see if someone else summarizes it better, and do google for the keywords I use.

LINQ stands for "Language INtegrated Query", and the most naive interpretation is that they added SQL-like syntax to the C# programming language.

So instead of:

IEnumerable<int> values = otherValues.Where(i => i > 5);

they have the syntax:

IEnumerable<int> values = from i in otherValues
                          where i > 5
                          select i;

The C# compiler will actually translate the second piece of code above to the first piece of code, so in reality, you're just calling methods on the collections.

However, here's another part of the puzzle: those methods are not actually defined in the collections at all. They're defined as extension methods, which means they're defined somewhere else, with some trickery that basically says "let the programmer use these methods as though they were defined in the collection type to begin with, and just fix the code during compilation".

So the first piece of code above:

IEnumerable<int> values = otherValues.Where(i => i > 5);

actually ends up being compiled as:

IEnumerable<int> values = Enumerable.Where(otherValues, i => i > 5);

The Where method is defined here: Enumerable.Where.

Next piece of magic is that the C# compiler doesn't hard-code Enumerable.Where when it translates the query syntax. It just rewrites the query on the fly into the method-call form (otherValues.Where(i => i > 5)) and lets normal overload resolution and type inference work it out. In other words, it's going to pretend you actually wrote the method-call form, and then see that "otherValues" is a List<T> where T is an int, and then find that Enumerable.Where is the one to call.

This means that you can, for other types than collections, actually make your own implementations of Where, and the LINQ syntax would be none the wiser.

This means ... that things that aren't really in-memory collections can be queried. For instance, if "otherValues" above is something that knows how to get data from a database, a different Where method will be called, not the one in Enumerable.Where.

This allows those other implementations to do their things in their own way, for instance by writing the SQL for you, executing it, and packaging up the result so that it looks to the calling code as though it actually was an in-memory collection to begin with.
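Here's a rough sketch of that idea. TracedNumbers is a made-up type that is not an IEnumerable<T> at all, but because it has an instance method called Where with a suitable signature, the query syntax binds to it. Note that the code doesn't even import System.Linq.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical type for illustration: it does not implement IEnumerable<int>.
public sealed class TracedNumbers
{
    public List<int> Values { get; private set; }
    public string Trace { get; private set; }

    public TracedNumbers(IEnumerable<int> values, string trace)
    {
        Values = new List<int>(values);
        Trace = trace;
    }

    // The "where" clause of a query expression is compiled into a call to this method.
    public TracedNumbers Where(Func<int, bool> predicate)
    {
        List<int> kept = Values.FindAll(v => predicate(v));
        return new TracedNumbers(kept, Trace + "custom Where called;");
    }
}

public static class CustomWhereDemo
{
    public static TracedNumbers Run()
    {
        var source = new TracedNumbers(new[] { 3, 6, 9 }, "");

        // Translated by the compiler into source.Where(i => i > 5).
        // The trailing "select i" is dropped because it selects the range variable unchanged.
        return from i in source
               where i > 5
               select i;
    }
}
```

The result of Run() holds 6 and 9, and its Trace shows that the custom Where was the one that ran. A real provider would do something more interesting in there, such as building SQL.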

Next piece of magic is expressions. The parameter to the Where method above, i => i > 5 is a lambda expression, or an anonymous method, in most cases, and you could actually declare it like this for an in-memory collection:

Func<int, bool> w = delegate(int i) { return i > 5; };
IEnumerable<int> values = otherValues.Where(w);

However, expression support in C# means that you can also declare it as:

Expression<Func<int, bool>> w = i => i > 5;

Here, the compiler isn't actually storing it as a compiled piece of code, but rather an in-memory data structure that knows that it takes one argument, compares it to 5 with a greater-than comparison and returns the result. Note that you have to use the lambda way of writing it, not as a delegate.

This knowledge allows those other Where implementations, if they're declared to take expressions, to not only get a hold of the "where clause", but to look at it, pick it apart, and rewrite it.

Which means that generating that SQL can be done in the Where method that knows how to deal with SQL code.
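As a small sketch of what "picking it apart" can look like, the helper below (a made-up name, and it only handles the simple "parameter operator constant" shape) reads the pieces of an Expression<Func<int, bool>> rather than running it.

```csharp
using System;
using System.Linq.Expressions;

public static class ExpressionInspection
{
    // Hypothetical helper: describes predicates of the form "parameter <op> constant".
    // Other shapes would make the casts below throw InvalidCastException.
    public static string Describe(Expression<Func<int, bool>> predicate)
    {
        var body = (BinaryExpression)predicate.Body;
        var left = (ParameterExpression)body.Left;
        var right = (ConstantExpression)body.Right;
        return left.Name + " " + body.NodeType + " " + right.Value;
    }

    // The same tree can still be turned into executable code on demand.
    public static bool Evaluate(Expression<Func<int, bool>> predicate, int value)
    {
        Func<int, bool> compiled = predicate.Compile();
        return compiled(value);
    }
}
```

Describe(i => i > 5) returns "i GreaterThan 5". A SQL-generating provider does essentially the same kind of walk, just over many more node types, and emits something like "> 5" in a WHERE clause instead of a description string.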

Here's the declaration of the Where method that LINQ to SQL queries go through: Queryable.Where.

So LINQ is the combination of many smaller pieces of technology added to the C# compiler:

  • LINQ syntax
  • Extension methods
  • LINQ extension methods (+ other implementations, in particular look at LINQ to SQL.)
  • Lambda expressions and Expression trees.

LINQ to Entities vs LINQ to Objects - are they the same?

That is definitely not the case.

LINQ-to-Objects is a set of extension methods on IEnumerable<T> that allow you to perform in-memory query operations on arbitrary sequences of objects. The methods accept simple delegates when necessary.

LINQ-to-Entities is a LINQ provider that has a set of extension methods on IQueryable<T>. The methods build up an expression tree (which is why delegates are actually passed as Expression<>s), and the provider will build up a SQL query based on its parsing of that expression tree.

As an example, consider the following queries:

var query1 = mydb.MyEntity.Select(x => x.SomeProp).Where(x => x == "Prop");
var query2 = mydb.MyEntity.Select(x => x.SomeProp).AsEnumerable().Where(x => x == "Prop");

The first query will build up an expression tree consisting of a select and a where, with the two lambdas actually considered as LambdaExpressions. The LINQ-to-Entities provider will translate that into SQL that both selects and filters.

The second query inserts an AsEnumerable(), which will force the remainder of the query to use LINQ-to-Objects. In that case, the provider will generate SQL based on only the selection, return all those records from the database, and then the filtering will occur in-memory. Obviously, that's likely going to be much slower.
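You can see the switch without a database. The sketch below uses an in-memory array wrapped with AsQueryable() as a stand-in for mydb.MyEntity (an assumption for illustration, so no SQL is generated here); it only checks which Where method each query ends up bound to.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

public static class ProviderSwitchDemo
{
    // Stand-in for mydb.MyEntity.Select(x => x.SomeProp).
    private static IQueryable<string> Source()
    {
        return new[] { "Prop", "Other", "Prop" }.AsQueryable();
    }

    // Without AsEnumerable(): the filter is recorded in the expression tree
    // as a call to Queryable.Where, so a provider could translate it.
    public static bool FilterIsInExpressionTree()
    {
        IQueryable<string> query1 = Source().Select(x => x).Where(x => x == "Prop");
        var call = (MethodCallExpression)query1.Expression;
        return call.Method.DeclaringType == typeof(Queryable) && call.Method.Name == "Where";
    }

    // With AsEnumerable(): Enumerable.Where is chosen, the result is a plain
    // in-memory sequence, and the provider never sees the filter.
    public static bool FilterRunsInMemory()
    {
        IEnumerable<string> query2 = Source().Select(x => x).AsEnumerable().Where(x => x == "Prop");
        return !(query2 is IQueryable);
    }
}
```

Both methods return true. The choice between the two Where overloads is made at compile time from the static type of the sequence, which is why a single AsEnumerable() call is enough to move the rest of the query into LINQ-to-Objects.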

Can someone please explain what this LINQ query does?

(from stat in XpoSession.Query<STUDENT>()
where stat.ID_PERSON.DT_BIRTH >= DateTime.Now.AddYears(-20)
&& stat.Status.Where(val => val.CD_REASON == Constants.REASON_PLACED).Count() == 0
select stat.CD_STATUS.Trim()).ToList().GroupBy(val => val);

Looks like it's getting the trimmed CD_STATUS value for students born within the last 20 years (i.e. aged 20 or younger) who have no Status entry whose CD_REASON equals Constants.REASON_PLACED. The ToList() call runs the query, and the GroupBy then groups the resulting strings in memory by their value.

Without more context or additional code, it's not really possible to provide a better answer here.

What does => mean in a LINQ expression?

The => notation denotes a lambda expression.

example:

Enumerable.Range(0,100).Where(x=>x==1);

Here x => x == 1 is an anonymous delegate accepting an int as a parameter and returning a bool. Its delegate type would be:

delegate bool SomeDelegate(int x);

and its body is equivalent to this method:

bool Function(int x)
{
return x==1;
}

A lambda expression is an anonymous function that you can use to
create delegates or expression tree types. By using lambda
expressions, you can write local functions that can be passed as
arguments or returned as the value of function calls. Lambda
expressions are particularly helpful for writing LINQ query
expressions.

To create a lambda expression, you specify input parameters (if any)
on the left side of the lambda operator =>, and you put the expression
or statement block on the other side. For example, the lambda
expression x => x * x specifies a parameter that’s named x and returns
the value of x squared. You can assign this expression to a delegate
type, as the following example shows:
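The example from the quoted source isn't reproduced here, so the following is only a sketch of what such an assignment can look like. IntTransform and LambdaDemo are names made up for illustration.

```csharp
using System;

// Hypothetical delegate type: takes an int and returns an int.
public delegate int IntTransform(int i);

public static class LambdaDemo
{
    public static int SquareOfFive()
    {
        // The lambda x => x * x is converted to the delegate type on assignment.
        IntTransform square = x => x * x;
        return square(5); // 25
    }
}
```

The built-in Func<int, int> would work just as well as the custom delegate type here.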

source:
Read about lambda expressions

Here is a SO question about why to use lambdas: C# Lambda expressions: Why should I use them?

How does LINQ work internally?

It makes more sense to ask about a particular aspect of LINQ. It's a bit like asking "How Windows works" otherwise.

For me, the key parts of LINQ from a C# perspective are:

  • Expression trees. These are representations of code as data. For instance, an expression tree could represent the notion of "take a string parameter, call the Length property on it, and return the result". The fact that these exist as data rather than as compiled code means that LINQ providers such as LINQ to SQL can analyze them and convert them into SQL.
  • Lambda expressions. These are expressions like this:

    x => x * 2
    (int x, int y) => x * y
    () => { Console.WriteLine("Block"); Console.WriteLine("Lambda"); }

    Lambda expressions are converted either into delegates or expression trees.

  • Anonymous types. These are expressions like this:

    new { X=10, Y=20 }

    These are still statically typed, it's just the compiler generates an immutable type for you with properties X and Y. These are usually used with var which allows the type of a local variable to be inferred from its initialization expression.

  • Query expressions. These are expressions like this:

    from person in people
    where person.Age < 18
    select person.Name

    These are translated by the C# compiler into "normal" C# 3.0 (i.e. a form which doesn't use query expressions). Overload resolution etc is applied afterwards, which is absolutely key to being able to use the same query syntax with multiple data types, without the compiler having any knowledge of types such as Queryable. The above expression would be translated into:

    people.Where(person => person.Age < 18)
    .Select(person => person.Name)
  • Extension methods. These are static methods which can be used as if they were instance methods of the type of the first parameter. For example, an extension method like this:

    public static int CountAsciiDigits(this string text)
    {
    return text.Count(letter => letter >= '0' && letter <= '9');
    }

    can then be used like this:

    string foo = "123abc456";
    int count = foo.CountAsciiDigits();

    Note that the implementation of CountAsciiDigits uses another extension method, Enumerable.Count().

That's most of the relevant language aspects. Then there are the implementations of the standard query operators, in LINQ providers such as LINQ to Objects and LINQ to SQL etc. I have a presentation about how it's reasonably simple to implement LINQ to Objects - it's on the "Talks" page of the C# in Depth web site.
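To give a feel for how small a LINQ to Objects operator can be, here is a minimal sketch of a Where-like method. It is named MyWhere (a made-up name, to avoid clashing with Enumerable.Where) and is not the framework's actual implementation.

```csharp
using System;
using System.Collections.Generic;

public static class SimpleLinq
{
    // A Where-like operator: validates arguments eagerly, filters lazily.
    public static IEnumerable<T> MyWhere<T>(this IEnumerable<T> source, Func<T, bool> predicate)
    {
        if (source == null) throw new ArgumentNullException("source");
        if (predicate == null) throw new ArgumentNullException("predicate");
        return MyWhereIterator(source, predicate);
    }

    // Iterator block: nothing runs until the caller starts enumerating.
    private static IEnumerable<T> MyWhereIterator<T>(IEnumerable<T> source, Func<T, bool> predicate)
    {
        foreach (T item in source)
        {
            if (predicate(item))
            {
                yield return item;
            }
        }
    }
}
```

With this in scope, new[] { 1, 6, 9 }.MyWhere(i => i > 5) yields 6 and 9. Splitting the method in two is a deliberate choice: the argument checks happen at call time, while the filtering itself is deferred until the sequence is enumerated.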

The way providers such as LINQ to SQL work is generally via the Queryable class. At their core, they translate expression trees into other query formats, and then construct appropriate objects with the results of executing those out-of-process queries.

Does that cover everything you were interested in? If there's anything in particular you still want to know about, just edit your question and I'll have a go.

When should I use LINQ for C#?

I find that I'm using LINQ just about any time that I would have previously written a loop to fill a container. I use LINQ to SQL as my ORM and lots of LINQ everywhere else.

Here's a little snippet that I wrote for an Active Directory helper class that finds out if a particular user is in a particular group. Note the use of the Any() method to iterate over the user's authorization groups until it finds one with a matching SID. Much cleaner code than the alternative.

private bool IsInGroup( GroupPrincipal group, UserPrincipal user )
{
    if (group == null || group.Sid == null)
    {
        return false;
    }
    return user.GetAuthorizationGroups()
               .Any( g => g.Sid != null && g.Sid.CompareTo( group.Sid ) == 0 );
}

Alternative:

private bool IsInGroup( GroupPrincipal group, UserPrincipal user )
{
    if (group == null || group.Sid == null)
    {
        return false;
    }
    bool inGroup = false;
    foreach (var g in user.GetAuthorizationGroups())
    {
        // Plain condition here: a lambda is only needed when passing the test to Any()
        if ( g.Sid != null && g.Sid.CompareTo( group.Sid ) == 0 )
        {
            inGroup = true;
            break;
        }
    }
    return inGroup;
}

or

private bool IsInGroup( GroupPrincipal group, UserPrincipal user )
{
    if (group == null || group.Sid == null)
    {
        return false;
    }

    foreach (var g in user.GetAuthorizationGroups())
    {
        // Plain condition here: a lambda is only needed when passing the test to Any()
        if ( g.Sid != null && g.Sid.CompareTo( group.Sid ) == 0 )
        {
            return true;
        }
    }
    return false;
}

Here's a snippet that does a search against a repository, orders, and converts the first 10 matching business objects into a view-specific model (Distance is the Levenshtein edit distance of the matching model's unique id from the uniqueID parameter).

model.Results = this.Repository.FindGuestByUniqueID( uniqueID, withExpired )
                    .OrderBy( g => g.Distance )
                    .Take( 10 )
                    .ToList()
                    .Select( g => new GuestGridModel( g ) );

