Thursday, March 15, 2007

Parsing tags input using Split function

User input validation is one of the more complex issues in developing a solid data-entry form. Input validation is the basis of application security, database integrity, application stability and more.

Today I encountered an input validation issue with the input that users enter as "tags", the popular new type of information used in many User Generated Content (UGC) sites.

When I used the Split function of the String class to parse the tags into an array of strings, I found that if the user enters several consecutive space characters (e.g. "tag1 tag2    tag3", with four spaces before tag3), every space after the first one in the run produces an empty cell in the tags array. Splitting these sample tags results in this array:
array[0]: "tag1"
array[1]: "tag2"
array[2]: ""
array[3]: ""
array[4]: ""
array[5]: "tag3"
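
To reproduce the behaviour, here is a minimal sketch. The class and method names (SplitDemo, SplitOnSpaces) are made up for illustration, and the sample input assumes four spaces between tag2 and tag3:

```csharp
using System;

static class SplitDemo
{
    // Hypothetical helper: splits on single spaces only, with no cleanup.
    public static string[] SplitOnSpaces(string raw)
    {
        return raw.Split(' ');
    }

    static void Main()
    {
        // Assumed sample input: four spaces between tag2 and tag3
        string[] parts = SplitOnSpaces("tag1 tag2    tag3");
        for (int i = 0; i < parts.Length; i++)
        {
            Console.WriteLine("array[{0}]: \"{1}\"", i, parts[i]);
        }
    }
}
```

Split treats every delimiter as a boundary, so two adjacent spaces have an empty string between them. That is where the blank cells come from.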

In order to avoid entering blank tags in the database, I wrote a simple function that prepares a clean array of tags from the raw tags string. Here it is:
(Besides removing empty cells, this function also converts commas to blanks and lowercases all tags. This avoids multiple-word tags and case differences, as our project demands.)


// Requires: using System.Collections;
private string[] PrepareTags(string tags)
{
    char[] delim = { ' ' };
    // Replace commas with blanks so both act as delimiters
    tags = tags.Replace(",", " ");
    // Split into an array on the blank delimiter
    string[] tagsArray = tags.Split(delim);
    ArrayList tagsList = new ArrayList();
    // Eliminate empty cells
    for (int i = 0; i < tagsArray.Length; i++)
    {
        if (tagsArray[i].Trim().Length > 0)
        {
            // Make all tags lowercase
            tagsList.Add(tagsArray[i].ToLower());
        }
    }
    string[] preparedTags = new string[tagsList.Count];
    tagsList.CopyTo(preparedTags);
    return preparedTags;
}
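
As a possible shortcut, .NET 2.0 and later offer a Split overload that takes StringSplitOptions.RemoveEmptyEntries and drops the empty cells for you. The sketch below is only an illustration of that option, not the function used in our project. The names TagParser and PrepareTagsCompact are made up, and it treats only spaces and commas as delimiters, so a cell holding other whitespace (a tab, for example) is kept rather than discarded:

```csharp
using System;

static class TagParser
{
    // Hypothetical alternative to PrepareTags, for illustration only.
    public static string[] PrepareTagsCompact(string tags)
    {
        // Treat both blanks and commas as delimiters
        char[] delims = { ' ', ',' };
        // RemoveEmptyEntries discards the empty strings produced by
        // consecutive delimiters
        string[] parts = tags.Split(delims, StringSplitOptions.RemoveEmptyEntries);
        // Make all tags lowercase
        for (int i = 0; i < parts.Length; i++)
        {
            parts[i] = parts[i].ToLower();
        }
        return parts;
    }
}
```

Calling TagParser.PrepareTagsCompact("Tag1, tag2    TAG3") should give the three tags "tag1", "tag2" and "tag3", with no blank cells.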


If you find this function useful please comment and tell me.
