A blog on Technology | Culture | Life | People | Experiences

.NET HTML Parsers

Recently, I was looking for HTML parsers for use in a .NET project, and I came across these:

  1. HTML Tidy – seems very popular, and it has been ported to the Java platform as well. You can create a .NET wrapper around this C library, and a few people have already done this for you (like here 🙂). A couple of GUI tools are also available, such as Tidy UI. The documentation seems a little complex, so I will try Tidy last!
  2. ACRUX HTML Parser – I installed the trial version, but it is not a fully functional, time-bound trial.
  3. Html Agility Pack – “This is an agile HTML parser that builds a read/write DOM and supports plain XPATH or XSLT (you actually don’t HAVE to understand XPATH nor XSLT to use it, don’t worry…). It is a .NET code library that allows you to parse “out of the web” HTML files. The parser is very tolerant with “real world” malformed HTML. The object model is very similar to what proposes System.Xml, but for HTML documents (or streams).” This seems easy to use, and it is coded directly in .NET!
  4. Html DOM – “A class library that implements HTML DOM (Document object Model) for .Net platform.”
  5. WebLexicon – “Open-Source Markup Language Parser Library for .NET (XHTML/HTML/SGML/XML/MATHML)”


Filed under: Programming
