Multimap in .Net

multimap in .NET

Because it's almost christmas :)

//////////////////////////////////////////////////////////////////////
// Algorithmia is (c) 2008 Solutions Design. All rights reserved.
// http://www.sd.nl
//////////////////////////////////////////////////////////////////////
// COPYRIGHTS:
// Copyright (c) 2008 Solutions Design. All rights reserved.
//
// The Algorithmia library sourcecode and its accompanying tools, tests and support code
// are released under the following license: (BSD2)
// ----------------------------------------------------------------------
// Redistribution and use in source and binary forms, with or without modification,
// are permitted provided that the following conditions are met:
//
// 1) Redistributions of source code must retain the above copyright notice, this list of
// conditions and the following disclaimer.
// 2) Redistributions in binary form must reproduce the above copyright notice, this list of
// conditions and the following disclaimer in the documentation and/or other materials
// provided with the distribution.
//
// THIS SOFTWARE IS PROVIDED BY SOLUTIONS DESIGN ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES,
// INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
// PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SOLUTIONS DESIGN OR CONTRIBUTORS BE LIABLE FOR
// ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
// NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR
// BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
// STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE
// USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// The views and conclusions contained in the software and documentation are those of the authors
// and should not be interpreted as representing official policies, either expressed or implied,
// of Solutions Design.
//
//////////////////////////////////////////////////////////////////////
// Contributers to the code:
// - Frans Bouma [FB]
//////////////////////////////////////////////////////////////////////
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using SD.Tools.Algorithmia.UtilityClasses;

namespace SD.Tools.Algorithmia.GeneralDataStructures
{
/// <summary>
/// Extension to the normal Dictionary. This class can store more than one value for every key. It keeps a HashSet for every Key value.
/// Calling Add with the same Key and multiple values will store each value under the same Key in the Dictionary. Obtaining the values
/// for a Key will return the HashSet with the Values of the Key.
/// </summary>
/// <typeparam name="TKey">The type of the key.</typeparam>
/// <typeparam name="TValue">The type of the value.</typeparam>
public class MultiValueDictionary<TKey, TValue> : Dictionary<TKey, HashSet<TValue>>
{
/// <summary>
/// Initializes a new instance of the <see cref="MultiValueDictionary<TKey, TValue>"/> class.
/// </summary>
public MultiValueDictionary()
: base()
{
}

/// <summary>
/// Adds the specified value under the specified key
/// </summary>
/// <param name="key">The key.</param>
/// <param name="value">The value.</param>
public void Add(TKey key, TValue value)
{
ArgumentVerifier.CantBeNull(key, "key");

HashSet<TValue> container = null;
if(!this.TryGetValue(key, out container))
{
container = new HashSet<TValue>();
base.Add(key, container);
}
container.Add(value);
}

/// <summary>
/// Determines whether this dictionary contains the specified value for the specified key
/// </summary>
/// <param name="key">The key.</param>
/// <param name="value">The value.</param>
/// <returns>true if the value is stored for the specified key in this dictionary, false otherwise</returns>
public bool ContainsValue(TKey key, TValue value)
{
ArgumentVerifier.CantBeNull(key, "key");
bool toReturn = false;
HashSet<TValue> values = null;
if(this.TryGetValue(key, out values))
{
toReturn = values.Contains(value);
}
return toReturn;
}

/// <summary>
/// Removes the specified value for the specified key. It will leave the key in the dictionary.
/// </summary>
/// <param name="key">The key.</param>
/// <param name="value">The value.</param>
public void Remove(TKey key, TValue value)
{
ArgumentVerifier.CantBeNull(key, "key");

HashSet<TValue> container = null;
if(this.TryGetValue(key, out container))
{
container.Remove(value);
if(container.Count <= 0)
{
this.Remove(key);
}
}
}

/// <summary>
/// Merges the specified multivaluedictionary into this instance.
/// </summary>
/// <param name="toMergeWith">To merge with.</param>
public void Merge(MultiValueDictionary<TKey, TValue> toMergeWith)
{
if(toMergeWith==null)
{
return;
}

foreach(KeyValuePair<TKey, HashSet<TValue>> pair in toMergeWith)
{
foreach(TValue value in pair.Value)
{
this.Add(pair.Key, value);
}
}
}

/// <summary>
/// Gets the values for the key specified. This method is useful if you want to avoid an exception for key value retrieval and you can't use TryGetValue
/// (e.g. in lambdas)
/// </summary>
/// <param name="key">The key.</param>
/// <param name="returnEmptySet">if set to true and the key isn't found, an empty hashset is returned, otherwise, if the key isn't found, null is returned</param>
/// <returns>
/// This method will return null (or an empty set if returnEmptySet is true) if the key wasn't found, or
/// the values if key was found.
/// </returns>
public HashSet<TValue> GetValues(TKey key, bool returnEmptySet)
{
HashSet<TValue> toReturn = null;
if(!base.TryGetValue(key, out toReturn) && returnEmptySet)
{
toReturn = new HashSet<TValue>();
}
return toReturn;
}
}
}

How does a STL multimap differ from a .NET Dictionary key, List values ?

multimap.count: O(log n + m) where n is number of keys and m is number of items associated with a given key.

For a Dictionary<TKey, List<TValue>> the equivalent functionality would be:

int count = dictionary[key].Count;

And safer is to say

int count;
List<TValue> list;
if(dictionary.TryGetValue(key, out list)) {
int count = list.Count;
}

This is an O(1) operation because lookup is O(1)1 and List<T>.Count is O(1).

multimap.find: O(log n) where n is number of keys

For a Dictionary<TKey, List<TValue>> the equivalent functionality would be:

List<TValue> elements = dictionary[key];

And safer is to say

List<TValue> list;
if(dictionary.TryGetValue(key, out list)) {
// safe to iterate list
}

This is O(1). See the previous remark on lookup by key in a Dictionary<TKey, TValue>.

multimap.insert: O(log n) where n is the number of keys.

For a Dictionary<TKey, List<TValue>> the equivalent functionality would be:

// value is TValue to insert
List<TValue> list;
if(!dictionary.TryGetValue(key, out list)) {
list = new List<TValue>();
dictionary.Add(key, list);
}
list.Add(value);

This is usually O(1) but can be O(n) when the capacity of the dictionary must be increased to accomodate the new element.

multimap.remove: There are three overloads of this method; I will only consider the one that accepts a key and removes all occurrences of that key from the multimap. This is an O(log n + m) operation where there n keys and m objects associate with a given key.

For a Dictionary<TKey, List<TValue>> the equivalent functionality would be:

 dictionary.Remove(key);

From the documentation: "This method approaches an O(1) operation." Same comment applies.

1: From the documentation: "Retrieving a value by using its key is very fast, close to O(1)." Why the documentation is vague on this point is confusing to me. Either an operation is O(1) or it isn't. There is no such thing as "close" to O(1).

MultiMap implementation

I would remove the collections so that your MultiMap has consistent behavior. If I used your MultiMap I would be very surprised (and unhappy) to find that a missing key behaves differently depending on whether a key was previously in the MultiMap or not.

Does Clear() remove the the Collections?

You may also create an unintended memory leak if you do not remove the collections. A developer may add many items, then remove them. Memory usage (after GC) should return to the same amount as before those items were added.

I would not worry about the cost of creating Collections. I would worry about the contract you create for your MultiMap. If after profiling your application you find that to be a concerned, you could modify or create a special MultiMap for that behavior. Don't fall into the trap of premature optimization.

MultiMap implementation

I would remove the collections so that your MultiMap has consistent behavior. If I used your MultiMap I would be very surprised (and unhappy) to find that a missing key behaves differently depending on whether a key was previously in the MultiMap or not.

Does Clear() remove the the Collections?

You may also create an unintended memory leak if you do not remove the collections. A developer may add many items, then remove them. Memory usage (after GC) should return to the same amount as before those items were added.

I would not worry about the cost of creating Collections. I would worry about the contract you create for your MultiMap. If after profiling your application you find that to be a concerned, you could modify or create a special MultiMap for that behavior. Don't fall into the trap of premature optimization.

Is there a C# Data structure to map keys to multiple values?

What you're looking for is a multimap.

You may want to take a look at the answer to this question.

You may also want to look into the C5 Generic Collection library, which is free and has an implementation of a multimap.

If you feel like rolling your own, a simple place to start is a dictionary of lists:

Dictionary<TKey,List<TValue>>

However, you can't add to such a dictionary the normal way. You have to first check if the key already exists, and if so, fetch the value (list) and add to it. Otherwise, you need to create the list and populate it with a value.

If you are so inclined, I would suggest you consider using a set of extension methods to simplify the Add/Remove operations:

public static class MultimapExt
{
public static void Add<TKey,TValue>(
this Dictionary<TKey,List<TValue>> dictionary, TKey key, TValue value )
{
List<TValue> valueList;
if( !dictionary.TryGetValue( key, out valueList )
{
valueList = new List<TValue>();
dictionary.Add( key, valueList );
}
valueList.Add( value );
}

public static void Remove<TKey,TValue>(
this Dictionary<TKey,List<TValue>> dictionary, TKey key, TValue value )
{
List<TValue> valueList;
if( dictionary.TryGetValue( key, out valueList ) )
{
valueList.Remove( value );
if( valueList.Count == 0 )
dictionary.Remove( key );
}
}
}

What is the equivalent of a map of maps in C#

It should be Dictionary<string, Dictionary<string, string>>, documention on MSDN.

Updated the sample based on comment :



Dictionary<string, Dictionary<string, string>> map = new Dictionary<string, Dictionary<string, string>>();
map["test1"] = new Dictionary<string, string>();
map["test1"]["test2"] = "anything";

How to forward MultiMap .Net to Java?

If these are two independent applications, there is no way for them to see each others' variables without an explicit communication channel. Your best bet is to open some sort of Socket between the two and specify how you want to data over the line to be structured.

JSON example(you probably don't need to do this manually, check for a library do to the object->JSON conversion.

{
{
"key":"mykey"
"values":["val1","val2","val3"]
}

{
"key":"mykey2"
"values":["val12","val22","val32"]
}
}

Once you know your data format and have a socket connection, you just need the C# and Java to be able to convert to and from the data format.

If you meant javascript interacting with server side C#, there are ways to bind variables but I don't know if they work with complex data types. How do you pass variables from c# to javascript?

Red Black Tree and Multimap

1) Nope, not true.

2) Modifying a single mapping red black tree to map keys to multiple values would be trivial. It would just require using a second data structure and mapping key -> collection.

For example, instead of mapping from a string to and int, you could map from a string to a vector of ints. Or a string to a linked list of ints. Or a string to a single-mapping RBT. So on :).


Revisiting #1: Technically that would still be mapping a key to a single value, just the value wouldn't be the directly mapped type. Depending on what you consider a "DataValue", then yes, the statement is true.


Also, the auxiliary data structure isn't actually necessary; it just simplifies traversal. Basically to accommodate duplicates, instead of a strict less than/greater than relation between parent/left and parent/right, you have one of the sides also include equal.

For example:

      5
3 7
3

Duplicate keys in .NET dictionaries?

If you're using .NET 3.5, use the Lookup class.

EDIT: You generally create a Lookup using Enumerable.ToLookup. This does assume that you don't need to change it afterwards - but I typically find that's good enough.

If that doesn't work for you, I don't think there's anything in the framework which will help - and using the dictionary is as good as it gets :(



Related Topics



Leave a reply



Submit