Markdown Parser in JavaScript
This challenge asks you to implement a basic Markdown parser in JavaScript. Markdown is a lightweight markup language with plain text formatting syntax. Building a parser allows you to convert Markdown text into HTML, enabling rich text formatting in various applications.
Problem Description
You are tasked with creating a JavaScript function that takes a Markdown string as input and returns the corresponding HTML string. The parser should support the following Markdown features:
- Headers: `#` for `<h1>`, `##` for `<h2>`, `###` for `<h3>`, etc. (up to `######` for `<h6>`).
- Emphasis: `*text*` or `_text_` for `<em>text</em>`.
- Strong Emphasis: `**text**` or `__text__` for `<strong>text</strong>`.
- Paragraphs: Text separated by one or more blank lines should be rendered as `<p>` tags.
- Unordered Lists: `*` or `-` at the beginning of a line followed by a space should create `<li>` elements within a `<ul>` tag.
- Code Blocks: Lines starting with a single backtick (`` ` ``) should be rendered as `<code>` elements.
The parser should handle multiple lines of Markdown input. It should correctly identify and convert the specified Markdown elements into their HTML equivalents. Whitespace should be preserved where appropriate (e.g., within paragraphs).
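One possible way to organize such a parser is as a line-by-line loop that delegates to small helpers. The sketch below is only an illustration under stated assumptions: the names `parseMarkdown`, `inline`, `header`, and `listItem` are hypothetical, consecutive paragraph lines are joined with a single space, and code blocks are left out to keep it short.

```javascript
// Inline step (assumed order): strong emphasis first, then emphasis.
function inline(text) {
  return text
    .replace(/(\*\*|__)(.+?)\1/g, "<strong>$2</strong>")
    .replace(/(\*|_)(.+?)\1/g, "<em>$2</em>");
}

// Returns header HTML, or null when the line is not a header.
function header(line) {
  const m = /^(#{1,6})\s+(.+)$/.exec(line);
  return m ? `<h${m[1].length}>${inline(m[2])}</h${m[1].length}>` : null;
}

// Returns <li> HTML, or null when the line is not a list item.
function listItem(line) {
  const m = /^[*-]\s+(.+)$/.exec(line);
  return m ? `<li>${inline(m[1])}</li>` : null;
}

function parseMarkdown(markdown) {
  // Normalize \r\n and bare \r so the loop only sees \n.
  const lines = markdown.replace(/\r\n?/g, "\n").split("\n");
  const out = [];
  let paragraph = []; // lines of the paragraph being collected
  let items = [];     // <li> strings of the list being collected

  const flushParagraph = () => {
    if (paragraph.length > 0) {
      out.push(`<p>${inline(paragraph.join(" "))}</p>`);
      paragraph = [];
    }
  };
  const flushList = () => {
    if (items.length > 0) {
      out.push(`<ul>${items.join("")}</ul>`);
      items = [];
    }
  };

  for (const raw of lines) {
    const line = raw.trim();
    const h = header(line);
    const li = listItem(line);
    if (line === "") {
      // A blank line ends whatever block is open.
      flushParagraph();
      flushList();
    } else if (h !== null) {
      flushParagraph();
      flushList();
      out.push(h);
    } else if (li !== null) {
      flushParagraph();
      items.push(li);
    } else {
      flushList();
      paragraph.push(line);
    }
  }
  flushParagraph();
  flushList();
  return out.join("");
}

console.log(parseMarkdown("# Title\n\n* one\n- two"));
// <h1>Title</h1><ul><li>one</li><li>two</li></ul>
```

Collecting list items and paragraph lines in buffers and flushing them when the block type changes keeps the `<ul>` and `<p>` wrappers balanced without a second pass.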
Edge Cases to Consider:
- Empty input string.
- Markdown elements nested within each other (e.g., `**_text_**`). While full nesting support is not required, basic nesting should be handled correctly.
- Lines starting with `#` that are not headers (e.g., `* #comment`).
- Multiple consecutive list items without blank lines in between.
- Code blocks spanning multiple lines.
- Whitespace at the beginning and end of lines.
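Several of these edge cases come down to how strictly a header line is recognized. As a rough sketch, assuming a header needs one to six `#` characters followed by at least one space, and that surrounding whitespace is trimmed first (the helper name `parseHeaderLine` is hypothetical):

```javascript
// Hypothetical helper: returns the HTML for a header line, or null if the
// line is not a header.
function parseHeaderLine(line) {
  // Trim so leading and trailing whitespace does not affect detection.
  const trimmed = line.trim();
  // One to six '#' characters, at least one space, then the header text.
  const match = /^(#{1,6})\s+(.+)$/.exec(trimmed);
  if (match === null) {
    return null;
  }
  const level = match[1].length;
  return `<h${level}>${match[2]}</h${level}>`;
}

console.log(parseHeaderLine("  ## Another header  ")); // <h2>Another header</h2>
console.log(parseHeaderLine("#comment"));              // null
console.log(parseHeaderLine("* #comment"));            // null
```

Returning `null` for non-headers lets the caller fall through to the list and paragraph rules instead of guessing.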
Examples
Example 1:
Input: "# Hello\n\nThis is a paragraph.\n* List item 1\n- List item 2\n\n## Another header"
Output: "<h1>Hello</h1><p>This is a paragraph.</p><ul><li>List item 1</li><li>List item 2</li></ul><h2>Another header</h2>"
Explanation: The input is parsed into a header, a paragraph, an unordered list, and another header, all correctly formatted as HTML.
Example 2:
Input: "This is a **bold** and _italic_ text.\n\n```\nfunction hello() {\n console.log('Hello');\n}\n```"
Output: "<p>This is a <strong>bold</strong> and <em>italic</em> text.</p><pre>function hello() {<br> console.log('Hello');<br>}</pre>"
Explanation: The input includes bold and italic text, and a code block which is rendered using the `<pre>` tag.
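The code block in this example could be handled by a small helper that consumes lines until the closing fence. This is a sketch, assuming fences are lines consisting of three backticks and that lines are joined with `<br>` as in the output above; the name `parseFencedBlock` and its return shape are hypothetical, and HTML escaping of the code is not covered.

```javascript
// Hypothetical helper: `start` is the index of the opening fence line.
// Returns the HTML for the block and the index of the first line after it.
function parseFencedBlock(lines, start) {
  const body = [];
  let i = start + 1;
  // Copy lines verbatim until the closing fence or the end of the input.
  while (i < lines.length && lines[i].trim() !== "```") {
    body.push(lines[i]);
    i += 1;
  }
  // Step past the closing fence when there is one.
  const next = i < lines.length ? i + 1 : i;
  return { html: `<pre>${body.join("<br>")}</pre>`, next };
}

const sample = ["```", "function hello() {", " console.log('Hello');", "}", "```"];
console.log(parseFencedBlock(sample, 0).html);
// <pre>function hello() {<br> console.log('Hello');<br>}</pre>
```

Because the body lines are copied without trimming or inline processing, indentation and characters such as `*` inside the block are left untouched.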
Example 3:
Input: "# Header 1\n## Header 2\n### Header 3\n#### Header 4\n##### Header 5\n###### Header 6"
Output: "<h1>Header 1</h1><h2>Header 2</h2><h3>Header 3</h3><h4>Header 4</h4><h5>Header 5</h5><h6>Header 6</h6>"
Explanation: Demonstrates parsing of all header levels.
Constraints
- The input Markdown string can be up to 1000 characters long.
- The output HTML string should be valid HTML.
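- The parser should be reasonably efficient: for inputs of this size, a single pass over the lines with a few regular-expression replacements per line is sufficient, so avoid excessive looping or recursion.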
- The parser should handle Unicode characters correctly.
Notes
- You can use regular expressions to simplify the parsing process, but they are not strictly required.
- Consider breaking down the problem into smaller, manageable functions (e.g., a function to handle headers, a function to handle emphasis, etc.).
- Focus on correctly parsing the specified Markdown features. Advanced features like links, images, or tables are not required for this challenge.
- Think about how to handle different line endings (e.g., `\n` vs. `\r\n`). Normalizing line endings can simplify the parsing logic.
- The order of operations matters. For example, `**_text_**` should be parsed as `<strong><em>text</em></strong>`, not `<em><strong>text</strong></em>`.
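The last two notes can be illustrated with a short sketch. It assumes a regex-based approach in which strong emphasis is replaced before emphasis; `normalizeLineEndings` and `renderInline` are hypothetical names, and underscores inside words (such as `snake_case`) are not treated specially.

```javascript
// Convert \r\n and bare \r to \n so later steps can split on "\n" only.
function normalizeLineEndings(text) {
  return text.replace(/\r\n?/g, "\n");
}

// Replace the two-character markers first. If the single-character rules
// ran first, they would consume part of each ** or __ pair.
function renderInline(text) {
  return text
    .replace(/\*\*(.+?)\*\*/g, "<strong>$1</strong>")
    .replace(/__(.+?)__/g, "<strong>$1</strong>")
    .replace(/\*(.+?)\*/g, "<em>$1</em>")
    .replace(/_(.+?)_/g, "<em>$1</em>");
}

console.log(renderInline("**_text_**"));
// <strong><em>text</em></strong>
console.log(normalizeLineEndings("a\r\nb").split("\n").length);
// 2
```

The non-greedy `.+?` keeps each match as short as possible, so two emphasized spans on the same line are converted separately rather than as one long span.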