Express solutions elegantly using your programming language

Posted on Jul 16, 2017

Introduction

This post features a case study of a real-life problem and the C# programming language. It explains how we went about expressing the solution as cleanly as possible within the bounds of the language.

Although we use C# here, the ideas shared in this post aren’t specific to it and can be applied to any high-level language you work with.

Problem

Recently, for a customer requirement, we had to work with a document that contains both metadata and data. This document needed to be parsed and interpreted. Each metadata tag name is enclosed within # symbols, and the associated value follows immediately after the closing #.

Example:

...
#BookName#Foobar
#ChapterName#Adventures of Foo
#SectionName#Introducing Foo
...

Tags identify various aspects of the document, such as book names, chapters and sections, among others.

The task was to parse out the document, identify sections and extract the necessary information.

Solution

The first step is to identify the paragraphs that contain tags that are of importance to us. Following this, we need to be able to parse out the data within these tags.

Finally, we put the metadata together with the data and return a parsedDocument.

Part 1

A loop that iterates paragraph by paragraph must check the contents of each paragraph for the relevant tags. For this, the tags have to be declared up front. One way to do this is by declaring simple constants in the class:

private const string chapterNameTag = "#ChapterName#";
private const string sectionNameTag = "#SectionName#";

So the check block becomes:

if (text.Has(chapterNameTag))
{
  // ...
}

(Note: Has is a simple C# extension method built on top of the StartsWith method that performs the check case-insensitively.)
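The post doesn’t show Has itself, so here is a minimal sketch of what it might look like, assuming it only wraps StartsWith with a case-insensitive comparison. The class names StringExtensions and HasDemo are made up for illustration.

```csharp
using System;

public static class StringExtensions
{
    // Hypothetical sketch of Has: a case-insensitive StartsWith.
    public static bool Has(this string text, string tag)
        => text != null
           && tag != null
           && text.StartsWith(tag, StringComparison.OrdinalIgnoreCase);
}

public static class HasDemo
{
    public static void Main()
    {
        // The tag matches regardless of case.
        Console.WriteLine("#chaptername#Adventures of Foo".Has("#ChapterName#")); // True
        // A paragraph without the tag does not match.
        Console.WriteLine("Just a paragraph".Has("#ChapterName#"));               // False
    }
}
```

An ordinal comparison is used here because tags are fixed markers rather than natural-language text; the original may well have chosen a different StringComparison.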

Simple constants work, but I felt I could do better. So I pulled out all the tags and moved them to a nested private class as public constants:

public class Parser
{
  private class Tags
  {
    public const string ChapterName = "#ChapterName#";
    public const string SectionName = "#SectionName#";
    // ...
  }
}

The checks now become:

if (text.Has(Tags.ChapterName))
{
  // ...
}

This feels better than plain constants declared at the top of the class. It is more intuitive that new tags have a designated place to be declared, vis-à-vis being declared as just more constants in the class.

Part 2

Once we have identified the tag, the next step is to extract the value that follows it. In our case, we’d like to get Introducing Foo out of the paragraph that starts with the #SectionName# tag. The simplest, initial approach to this was:

if (text.Has(Tags.SectionName))
{
  parsedDocument.SectionName = ExtractTextFromTag(text, Tags.SectionName);
}
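ExtractTextFromTag isn’t shown in this post. As a rough sketch, and assuming all it needs to do is drop a leading tag, such a free-standing helper might look like this (TagHelpers and ExtractDemo are hypothetical names):

```csharp
using System;

public static class TagHelpers
{
    // Hypothetical helper: removes a leading tag (ignoring case) and returns the rest.
    // Text that does not start with the tag is returned unchanged.
    public static string ExtractTextFromTag(string text, string tag)
        => text.StartsWith(tag, StringComparison.OrdinalIgnoreCase)
            ? text.Substring(tag.Length)
            : text;
}

public static class ExtractDemo
{
    public static void Main()
    {
        var value = TagHelpers.ExtractTextFromTag("#SectionName#Introducing Foo", "#SectionName#");
        Console.WriteLine(value); // Introducing Foo
    }
}
```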

This works, but ExtractTextFromTag is just a free-standing function, which didn’t quite sit well with me (pun intended). I thought it belonged inside the Tags class.

To scratch that itch, here’s what I could do:

if (text.Has(Tags.SectionName))
{
  parsedDocument.SectionName = Tags.ExtractFromText(text, Tags.SectionName);
}

This solves it, sure. But I still felt I could do better because there is a repetition of Tags.SectionName. I wanted to be able to call a method on Tags.SectionName to parse it out. Think of it like calling DateTime.Parse.

This is what I wanted:

parsedDocument.SectionName = Tags.SectionName.Parse(text);

Hard to achieve, I thought, since we’ve declared Tags.SectionName as a string.

My first thought was to add an extension method on the string class (like we did for the Has function). But this is a dead end, since Parse would then be available on every string in the code base, not just on tags.
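To make the dead end concrete, here is a small sketch of that extension-method route. TagStringExtensions and ExtensionDemo are hypothetical names, and the Regex-based body is an assumption about how the stripping might be done.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TagStringExtensions
{
    // Hypothetical extension: treats the receiver as a tag and strips it from the text.
    public static string Parse(this string tag, string textWithTag)
        => Regex.Replace(textWithTag, Regex.Escape(tag), "", RegexOptions.IgnoreCase);
}

public static class ExtensionDemo
{
    private const string SectionName = "#SectionName#";

    public static void Main()
    {
        // The call we wanted works...
        Console.WriteLine(SectionName.Parse("#SectionName#Introducing Foo")); // Introducing Foo

        // ...but it also compiles on a string that is not a tag at all.
        Console.WriteLine("hello".Parse("hello world")); // prints " world"
    }
}
```

The second call is the problem: nothing stops Parse from showing up on strings that have nothing to do with tags.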

Let’s restate the problem we’re facing:

“I want a way to describe Tags.SectionName that lets me use it as a regular string for comparison purposes, while still being a class on which I can declare the Parse functionality.”

With this thought in mind, a bit of googling led me to the solution:

C# lets you define conversion operators in a class using the implicit keyword. Such an operator defines how the object should behave when it is treated as another type. Think of it like explicit type casting (à la OtherType otherType = (OtherType)myType), except that the conversion happens implicitly, without writing the cast.
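Here is a small sketch contrasting the two kinds of conversion operator. PageNumber and ConversionDemo are hypothetical types invented purely for this illustration and are not part of the parser.

```csharp
using System;

// Hypothetical type, used only to illustrate conversion operators.
public class PageNumber
{
    private readonly int _value;

    public PageNumber(int value) => _value = value;

    // Implicit: applied automatically wherever an int is expected.
    public static implicit operator int(PageNumber p) => p._value;

    // Explicit: the caller has to write the cast, e.g. (PageNumber)42.
    public static explicit operator PageNumber(int value) => new PageNumber(value);
}

public static class ConversionDemo
{
    public static void Main()
    {
        PageNumber page = (PageNumber)42; // explicit conversion, cast required
        int raw = page;                   // implicit conversion, no cast
        Console.WriteLine(raw + 1);       // prints 43
    }
}
```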

With that in mind, I defined a Tag class like so:

(Note: Error handling is omitted for the sake of brevity and discussion.)

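using System.Text.RegularExpressions;

public class Tag
{
    // The raw tag text, e.g. "#SectionName#".
    private readonly string _value;

    public Tag(string tag)
        => _value = tag;

    // Strips the tag from the text, ignoring case. The tag is escaped so it is matched literally.
    public string Parse(string textWithTag)
        => Regex.Replace(textWithTag, Regex.Escape(_value), "", RegexOptions.IgnoreCase);

    // Lets a Tag be used wherever a string is expected.
    public static implicit operator string(Tag t)
        => t._value;
}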

Internally a Tag is just a value, so we mandate a tag to contain a value by accepting it in its constructor.

This is followed by the definition of the Parse function. It takes the text with the tag (say #SectionName#Introducing Foo) and strips out the #SectionName# part, ignoring case. We use the Regex class from System.Text.RegularExpressions to do this for us. Simple and straightforward.

What follows is the implicit operator string, which tells C#: “When this class is used as a string, return the _value field”.

SUUUPER neat. WOO!

Now this allowed me to redefine the original Tags class like so:

public class Parser
{
  private class Tags
  {
    public static readonly Tag ChapterName = new Tag("#ChapterName#");
    public static readonly Tag SectionName = new Tag("#SectionName#");
    // ...
  }
}

This means that I can now use a tag both as a string and as an instance on which I have a Parse method. So my if condition and the parse block become:

if (text.Has(Tags.SectionName))
{
    parsedDocument.SectionName = Tags.SectionName.Parse(text);
}

This made me happy so I stopped right here.
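For anyone who wants to try the whole thing end to end, here is a self-contained sketch that puts the pieces together. ParsedDocument, the Parser.Parse signature, the Has body and the Program class are my assumptions for illustration; only Tag and the nested Tags class follow the snippets above.

```csharp
using System;
using System.Text.RegularExpressions;

public class Tag
{
    private readonly string _value;

    public Tag(string tag) => _value = tag;

    // Strips the tag from the text, ignoring case.
    public string Parse(string textWithTag)
        => Regex.Replace(textWithTag, Regex.Escape(_value), "", RegexOptions.IgnoreCase);

    public static implicit operator string(Tag t) => t._value;
}

public static class StringExtensions
{
    // Assumed shape of Has: a case-insensitive StartsWith.
    public static bool Has(this string text, string tag)
        => text.StartsWith(tag, StringComparison.OrdinalIgnoreCase);
}

// Hypothetical result type; the post only mentions a parsedDocument.
public class ParsedDocument
{
    public string ChapterName { get; set; }
    public string SectionName { get; set; }
}

public class Parser
{
    private static class Tags
    {
        public static readonly Tag ChapterName = new Tag("#ChapterName#");
        public static readonly Tag SectionName = new Tag("#SectionName#");
    }

    public ParsedDocument Parse(string[] paragraphs)
    {
        var parsedDocument = new ParsedDocument();
        foreach (var text in paragraphs)
        {
            // Each Tag converts implicitly to string when passed to Has.
            if (text.Has(Tags.ChapterName))
                parsedDocument.ChapterName = Tags.ChapterName.Parse(text);
            else if (text.Has(Tags.SectionName))
                parsedDocument.SectionName = Tags.SectionName.Parse(text);
        }
        return parsedDocument;
    }
}

public static class Program
{
    public static void Main()
    {
        var doc = new Parser().Parse(new[]
        {
            "#BookName#Foobar",
            "#ChapterName#Adventures of Foo",
            "#sectionname#Introducing Foo",
        });
        Console.WriteLine(doc.ChapterName); // Adventures of Foo
        Console.WriteLine(doc.SectionName); // Introducing Foo
    }
}
```

Note that the #BookName# paragraph is simply skipped here, because this sketch declares no tag for it.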

Conclusion

I’m quite happy that everything is well contained and works together. However, this isn’t the approach I took in the actual program, since I think it’s overkill for that particular scenario. But it was a good opportunity to push myself to learn better ways to express a solution to such a problem.

Takeaways

  • When you think something isn’t possible with the language, dig deeper. If you can think of it, chances are that someone else already has, and that it has been solved.
  • Learn the technical terms (in this case, searching for “implicit type conversion C#” gave me what I was looking for). It becomes that much easier to narrow down your search when looking for answers.
  • Don’t settle for what just works. Ask yourself “what is the best way to represent my problem?” and try to achieve that.

Credits

Major props to Bargava for helping proofread the blog, and to VK for being a sounding board for ideas on a lazy Sunday morning.