Url Slugify Algorithm in C#

URL Slugify algorithm in C#?

http://predicatet.blogspot.com/2009/04/improved-c-slug-generator-or-how-to.html

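// Requires: using System.Text.RegularExpressions;
// Extension methods must live in a static class; the class name here is arbitrary.
public static class SlugExtensions
{
    public static string GenerateSlug(this string phrase)
    {
        // ToLowerInvariant avoids culture-specific casing (e.g. Turkish "I" -> "ı")
        string str = phrase.RemoveAccent().ToLowerInvariant();
        // remove invalid chars
        str = Regex.Replace(str, @"[^a-z0-9\s-]", "");
        // convert multiple spaces into one space
        str = Regex.Replace(str, @"\s+", " ").Trim();
        // cut to at most 45 characters and trim
        str = str.Substring(0, str.Length <= 45 ? str.Length : 45).Trim();
        // replace the remaining spaces with hyphens
        str = Regex.Replace(str, @"\s", "-");
        return str;
    }

    public static string RemoveAccent(this string txt)
    {
        // Round-trips through the Cyrillic code page; accented Latin letters
        // come back as their unaccented ASCII forms.
        byte[] bytes = System.Text.Encoding.GetEncoding("Cyrillic").GetBytes(txt);
        return System.Text.Encoding.ASCII.GetString(bytes);
    }
}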

Slugify and Character Transliteration in C#

I would also like to add that //TRANSLIT removes the apostrophes, and that @jxac's solution doesn't address that. I'm not sure why, but by first encoding the string to Cyrillic and then to ASCII you get behavior similar to //TRANSLIT.

var str = "éåäöíØ";
var noApostrophes = Encoding.ASCII.GetString(Encoding.GetEncoding("Cyrillic").GetBytes(str));

=> "eaaoiO"
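
For comparison, here is a minimal sketch of the other common way to strip accents: Unicode decomposition followed by dropping the combining marks. The class and method names are hypothetical and do not come from the answers above. It may also be worth knowing that on newer .NET runtimes a code-page encoding such as "Cyrillic" may need the code pages encoding provider registered before GetEncoding can find it, whereas this approach does not depend on any code page.

```csharp
using System.Globalization;
using System.Text;

public static class AccentStripper
{
    // Hypothetical helper name; not part of the answers quoted here.
    public static string RemoveDiacritics(string text)
    {
        // FormD splits "é" into "e" followed by a combining acute accent.
        string decomposed = text.Normalize(NormalizationForm.FormD);
        var sb = new StringBuilder(decomposed.Length);
        foreach (char ch in decomposed)
        {
            // Drop the combining marks, keep everything else.
            if (CharUnicodeInfo.GetUnicodeCategory(ch) != UnicodeCategory.NonSpacingMark)
                sb.Append(ch);
        }
        // Recompose whatever is left.
        return sb.ToString().Normalize(NormalizationForm.FormC);
    }
}
```

AccentStripper.RemoveDiacritics("éåäöíØ") gives "eaaoiØ". The Ø survives because it has no Unicode decomposition, which is one place where the Cyrillic round-trip above behaves differently.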

URL and Title with Foreign Characters

It's hard to tell how to fix this problem without knowing the way your columns are declared in your MySQL table, and how your entities (characters) are stored. This kind of multinational stuff is easiest to handle if your columns are declared something like this:

  Title VARCHAR(50) CHARACTER SET utf8 COLLATE utf8_turkish_ci

This allows the table to contain your Turkish characters (and Greek and Hungarian, for that matter) without having to be entity-coded (&uuml;, etc.).

If in fact your tables are coded this way, try the following SELECT statement:

  SELECT *
    FROM Article
   WHERE Title = 'ococu'
 COLLATE utf8_general_ci

Being ignorant of Turkish, I don't know the reasons for this, but it's clear that the Turkish collation treats ö as a different letter from o, and likewise ç from c and ü from u. However, the utf8_general_ci collation treats those letters as the same. That's why the SELECT statement above works.

If your data in the table is stored entity-coded (&uuml;, etc.), you really ought to translate it to utf8 so you can use this kind of search.

Finally, the fragment of URL you're mentioning with the value ococu is often called a slug in the trade. Your title Öçöçü needs to be converted to the slug value for searching. My suggestion above employs the collation to do that. It's worth mentioning that content management systems often store an article's title and slug in separate columns in the database table. This allows the creation of the slug from the title to be done explicitly at the time the article is created.

Here's a Stack Overflow item explaining how to use C# to turn a unicode phrase into a slug.

URL Slugify algorithm in C#?

Generate custom URL - showing description in URL

I believe this is called a URL slug; knowing the term will make it 100x easier to search for. I suggest you start with https://stackoverflow.com/a/2921135/507025 for an algorithm to help you slugify your URL.

If you have a database with the information, you might want to save the slugified description to it so you can check for duplicates upon the creation of a new item.

Routing will be very similar, although you'll need to change your routing attribute to:

// No inline constraint is needed: route values are strings by default.
[Route("detail/{productName}")]
public ViewResult Detail(string productName)
{
    return View();
}

You can then search your DB for the slugified description to return the item searched for OR use the terms given in a search to return multiple results.

There are probably different ways of doing this but now that you know it's called 'slug' you will have an easier time finding info about them.

Helper to generate friendly URL in Razor (C#/MVC 4)

I've done this before. While I dig up the code, here are things to keep in consideration:

  1. Make sure you store your generated URLs so you can do collision detection; converting strings to friendly URLs is almost certainly going to be lossy, so you need logic to resolve conflicted names.
  2. You ought to try to convert diacritical marks into more easily-typable characters.
  3. Consider making url-to-resource mappings a 1:many relationship; if the name of your resource changes, you may want to generate a new URL and have the old URL redirect to the new.
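
Points 1 and 3 can be sketched together as a small slug registry. This is an illustration under stated assumptions, not the code the answer goes on to show: the SlugRegistry class, its method names, and the in-memory dictionaries are all hypothetical, and a real site would back them with a database table.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory store standing in for a database table.
public class SlugRegistry
{
    // Every slug ever issued maps to the id of the resource it was issued for.
    private readonly Dictionary<string, int> slugToResource =
        new Dictionary<string, int>(StringComparer.Ordinal);

    // Each resource has exactly one current (canonical) slug.
    private readonly Dictionary<int, string> currentSlug = new Dictionary<int, string>();

    // Returns false when the slug already belongs to a different resource (a collision).
    public bool TryAssign(int resourceId, string slug)
    {
        int owner;
        if (slugToResource.TryGetValue(slug, out owner) && owner != resourceId)
            return false;
        slugToResource[slug] = resourceId;
        currentSlug[resourceId] = slug;
        return true;
    }

    // Returns null for an unknown slug; otherwise the canonical slug.
    // needsRedirect is true when the requested slug is an old one.
    public string Resolve(string requestedSlug, out bool needsRedirect)
    {
        needsRedirect = false;
        int owner;
        if (!slugToResource.TryGetValue(requestedSlug, out owner))
            return null;
        string canonical = currentSlug[owner];
        needsRedirect = canonical != requestedSlug;
        return canonical;
    }
}
```

Old slugs stay in the first dictionary after a rename, so a request for one still resolves, and the caller can issue a permanent redirect to the canonical slug.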

UPDATE: Here is my code for this. The Stack Overflow approach is OK, but I like mine better; instead of using a set of character substitutions, it uses the great .NET Unicode library to create friendlier characters:

public static string ConvertToFriendlyUrl(string text)
{
    var decomposed = text.Normalize(NormalizationForm.FormKD);
    var builder = new StringBuilder();
    foreach (var ch in decomposed)
    {
        var charInfo = CharUnicodeInfo.GetUnicodeCategory(ch);
        switch (charInfo)
        {
            // Keep these as they are
            case UnicodeCategory.DecimalDigitNumber:
            case UnicodeCategory.LetterNumber:
            case UnicodeCategory.LowercaseLetter:
            case UnicodeCategory.CurrencySymbol:
            case UnicodeCategory.OtherLetter:
            case UnicodeCategory.OtherNumber:
                builder.Append(ch);
                break;

            // Convert these to dashes
            case UnicodeCategory.DashPunctuation:
            case UnicodeCategory.MathSymbol:
            case UnicodeCategory.ModifierSymbol:
            case UnicodeCategory.OtherPunctuation:
            case UnicodeCategory.OtherSymbol:
            case UnicodeCategory.SpaceSeparator:
                builder.Append('-');
                break;

            // Convert to lower-case
            case UnicodeCategory.TitlecaseLetter:
            case UnicodeCategory.UppercaseLetter:
                builder.Append(char.ToLowerInvariant(ch));
                break;

            // Ignore certain types of characters
            case UnicodeCategory.OpenPunctuation:
            case UnicodeCategory.ClosePunctuation:
            case UnicodeCategory.ConnectorPunctuation:
            case UnicodeCategory.Control:
            case UnicodeCategory.EnclosingMark:
            case UnicodeCategory.FinalQuotePunctuation:
            case UnicodeCategory.Format:
            case UnicodeCategory.InitialQuotePunctuation:
            case UnicodeCategory.LineSeparator:
            case UnicodeCategory.ModifierLetter:
            case UnicodeCategory.NonSpacingMark:
            case UnicodeCategory.OtherNotAssigned:
            case UnicodeCategory.ParagraphSeparator:
            case UnicodeCategory.PrivateUse:
            case UnicodeCategory.SpacingCombiningMark:
            case UnicodeCategory.Surrogate:
                break;
        }
    }

    var built = builder.ToString();
    while (built.Contains("--"))
        built = built.Replace("--", "-");
    while (built.EndsWith("-"))
    {
        built = built.Substring(0, built.Length - 1);
    }
    while (built.StartsWith("-"))
    {
        built = built.Substring(1, built.Length - 1);
    }
    return built;
}

public static string GetIncrementedUrl(string url)
{
    var parts = url.Split('-');
    var lastPortion = parts.LastOrDefault();
    int numToInc;
    bool incExisting;
    if (lastPortion == null)
    {
        numToInc = 1;
        incExisting = false;
    }
    else
    {
        if (int.TryParse(lastPortion, out numToInc))
        {
            incExisting = true;
        }
        else
        {
            incExisting = false;
            numToInc = 1;
        }
    }

    var fragToKeep = incExisting
        ? string.Join("-", parts.Take(parts.Length - 1).ToArray())
        : url;
    return fragToKeep + "-" + (numToInc + 1).ToString();
}

public static string SeekUrl(
    string name, Func<string, bool> uniquenessCheck)
{
    var urlName = UrlUtils.ConvertToFriendlyUrl(name);
    while (!uniquenessCheck(urlName))
    {
        urlName = UrlUtils.GetIncrementedUrl(urlName);
    }

    return urlName;
}
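
To see how the collision loop behaves, here is a minimal self-contained sketch. It is a simplified stand-in, not the code above: the SlugCollisionDemo class, the Reserve method, and the HashSet playing the role of the uniqueness check are all assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class SlugCollisionDemo
{
    // Simplified stand-in for GetIncrementedUrl: "a" -> "a-2", "a-2" -> "a-3".
    private static string Increment(string slug)
    {
        int dash = slug.LastIndexOf('-');
        int n;
        if (dash >= 0 && int.TryParse(slug.Substring(dash + 1), out n))
            return slug.Substring(0, dash) + "-" + (n + 1);
        return slug + "-2";
    }

    // Same loop shape as SeekUrl, but it takes an already-slugified candidate
    // and records the result in the set of taken slugs.
    public static string Reserve(string candidate, HashSet<string> taken)
    {
        while (taken.Contains(candidate))
            candidate = Increment(candidate);
        taken.Add(candidate);
        return candidate;
    }
}
```

Reserving "my-post" three times against the same set yields "my-post", "my-post-2", and "my-post-3". One quirk, which GetIncrementedUrl shares: a slug that legitimately ends in a number, such as "top-10", is bumped to "top-11" rather than "top-10-2".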

How to make SEO friendly extensionless urls dynamically in ASP.NET 4.0 webforms

Here is a simple approach I have just implemented for your requirement. There may be other ways to do the same.

I created a class: Images

public class Images
{
    public string CurrentImage { get; set; }
    public string NextImage { get; set; }
    public string PreviousImage { get; set; }
    public string CurrentImagePhysicalName { get; set; }
    public Images(string currentImagePhysicalName, string Current, string Next, string Previous)
    {
        this.CurrentImagePhysicalName = currentImagePhysicalName;
        this.CurrentImage = Current;
        this.NextImage = Next;
        this.PreviousImage = Previous;
    }
}

Register the route and initialize the image collection at application startup:

public class Global : HttpApplication
{
    public static List<Images> col = new List<Images>();
    private void GetImages()
    {
        // Build this collection to suit your requirements; this is just a sample.
        // The idea is to store the current, next, and previous image details for each displayed image/page.
        // Each image is assumed to get a unique name before it is saved, with its details (path, display name, etc.) kept in the database.

        col.Add(new Images("orderedList0.png", "orderedList0", "orderedList1", ""));
        col.Add(new Images("orderedList1.png", "orderedList1", "orderedList2", "orderedList0"));
        col.Add(new Images("orderedList2.png", "orderedList2", "orderedList3", "orderedList1"));
        col.Add(new Images("orderedList3.png", "orderedList3", "orderedList4", "orderedList2"));
        col.Add(new Images("orderedList4.png", "orderedList4", "", "orderedList3"));
    }

    void Application_Start(object sender, EventArgs e)
    {
        GetImages();
        RegisterRoutes(RouteTable.Routes);
    }

    public static void RegisterRoutes(RouteCollection routeCollection)
    {
        routeCollection.MapPageRoute("RouteForImage", "Posts/{Name}", "~/Posts.aspx");
    }
}

Posts.aspx

protected void Page_PreRender(object sender, EventArgs e)
{
    // "as string" yields null instead of throwing when the route value is missing.
    string currentImage = RouteData.Values["Name"] as string;
    if (!String.IsNullOrEmpty(currentImage))
    {
        Images image = Global.col.Find(x => x.CurrentImage == currentImage);
        // Find returns null when no entry matches the requested name.
        if (image == null)
            return;

        // Render the current image here, using image.CurrentImagePhysicalName
        // to locate the stored file.

        // Set the Next / Previous image URLs
        if (!String.IsNullOrEmpty(image.NextImage))
        {
            hyperlink_next.Visible = true;
            hyperlink_next.Text = image.NextImage;
            hyperlink_next.NavigateUrl = GetRouteUrl("RouteForImage", new { Name = image.NextImage });
        }
        else
            hyperlink_next.Visible = false;

        if (!String.IsNullOrEmpty(image.PreviousImage))
        {
            hyperlink_previous.Visible = true;
            hyperlink_previous.Text = image.PreviousImage;
            hyperlink_previous.NavigateUrl = GetRouteUrl("RouteForImage", new { Name = image.PreviousImage });
        }
        else
            hyperlink_previous.Visible = false;
    }
}

This is just a sample demonstration. The main idea is to read RouteData.Values["Name"] and use it to map the dynamic URL to the right image.

Hope this will be useful for you.

How does Stack Overflow generate its SEO-friendly URLs?

Here's how we do it. Note that there are probably more edge conditions than you realize at first glance.

This is the second version, unrolled for 5x more performance (and yes, I benchmarked it). I figured I'd optimize it because this function can be called hundreds of times per page.

/// <summary>
/// Produces optional, URL-friendly version of a title, "like-this-one".
/// hand-tuned for speed, reflects performance refactoring contributed
/// by John Gietzen (user otac0n)
/// </summary>
public static string URLFriendly(string title)
{
    if (title == null) return "";

    const int maxlen = 80;
    int len = title.Length;
    bool prevdash = false;
    var sb = new StringBuilder(len);
    char c;

    for (int i = 0; i < len; i++)
    {
        c = title[i];
        if ((c >= 'a' && c <= 'z') || (c >= '0' && c <= '9'))
        {
            sb.Append(c);
            prevdash = false;
        }
        else if (c >= 'A' && c <= 'Z')
        {
            // tricky way to convert to lowercase
            sb.Append((char)(c | 32));
            prevdash = false;
        }
        else if (c == ' ' || c == ',' || c == '.' || c == '/' ||
            c == '\\' || c == '-' || c == '_' || c == '=')
        {
            if (!prevdash && sb.Length > 0)
            {
                sb.Append('-');
                prevdash = true;
            }
        }
        else if ((int)c >= 128)
        {
            int prevlen = sb.Length;
            sb.Append(RemapInternationalCharToAscii(c));
            if (prevlen != sb.Length) prevdash = false;
        }
        if (i == maxlen) break;
    }

    if (prevdash)
        return sb.ToString().Substring(0, sb.Length - 1);
    else
        return sb.ToString();
}

To see the previous version of the code this replaced (but is functionally equivalent to, and 5x faster), view revision history of this post (click the date link).

Also, the RemapInternationalCharToAscii method source code can be found here.
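
Since the method's source is linked rather than shown here, the following is only a hypothetical stand-in that illustrates the contract URLFriendly relies on: return an ASCII replacement for a non-ASCII character, or an empty string when there is none. The class name, method name, and the tiny mapping table are assumptions, not the actual Stack Overflow code.

```csharp
using System.Collections.Generic;

public static class AsciiRemapSketch
{
    // Deliberately tiny, illustrative table; a production version would cover many more characters.
    private static readonly Dictionary<char, string> Map = new Dictionary<char, string>
    {
        { 'é', "e" }, { 'è', "e" }, { 'ö', "o" }, { 'ø', "o" },
        { 'ü', "u" }, { 'ç', "c" }, { 'ñ', "n" }, { 'ß', "ss" }
    };

    // Returns "" for anything unmapped, so a caller that compares the builder
    // length before and after appending sees no change.
    public static string Remap(char c)
    {
        char lower = char.ToLowerInvariant(c);
        string ascii;
        return Map.TryGetValue(lower, out ascii) ? ascii : "";
    }
}
```

Returning an empty string for unmapped characters matters because URLFriendly only resets prevdash when the builder actually grew, so a dropped character never causes a doubled dash.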
