
safe-docx · XML Parsing

Element structure preservation through XML round-trip

When round-tripping XML content, serialized output needs to preserve the element names, attributes, and text that carry document meaning. A parser that drops or rewrites those pieces can make a later document operation work from a different structure than the one the caller supplied.

parseXml turns an XML string into a DOM document using XML parsing rules, and serializeXml turns that document back into XML text. The round-trip matters because parsing is often only the first step before another primitive reads or mutates the document tree.[1]

The test scenario below covers the baseline success case for parseXml: valid XML is parsed and immediately serialized.

The scenario

Given valid XML with a child element, an attribute, text, and a sibling element,
When the XML is parsed and immediately serialized,
Then

  • the serialized output contains child,
  • the serialized output contains attr="val",
  • the serialized output contains hello,
  • the serialized output contains sibling.
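
The four Then clauses are plain substring checks against the serialized string. As a minimal sketch, they can be expressed as one pure function that reports which expected fragments are absent. The name missingFragments and the sample strings are hypothetical and used only for illustration; they are not part of safe-docx.

```typescript
// Hypothetical helper, not part of safe-docx: returns the expected fragments
// that do not appear anywhere in the serialized output.
function missingFragments(output: string, expected: readonly string[]): string[] {
  return expected.filter((fragment) => !output.includes(fragment));
}

// The four fragments named in the scenario's Then clauses.
const expectedFragments = ['child', 'attr="val"', 'hello', 'sibling'];

// A faithful round-trip leaves nothing missing.
const faithful = '<root><child attr="val">hello</child><sibling/></root>';
console.log(missingFragments(faithful, expectedFragments));

// Output that lost the attribute is reported by the missing fragment.
const lossy = '<root><child>hello</child><sibling/></root>';
console.log(missingFragments(lossy, expectedFragments));
```

The first call returns an empty array. The second returns an array holding only attr="val", which points directly at the piece that did not survive.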

Test fixture

The scenario records the serialized output after the round-trip and checks the pieces that must survive both parsing and serialization. The DOM document is captured in result.doc, but the assertions are made only against result.output.[2]

Below is the test fixture code.

test.openspec('parse and serialize preserves element structure')('Scenario: parse and serialize preserves element structure', async ({ when, then, attachPrettyJson }: AllureBddContext) => {
  // Input covers an element with an attribute and text, plus an empty sibling.
  const input = '<root><child attr="val">hello</child><sibling/></root>';
  // Assigned in the `when` step and read in the `then` step.
  let result!: { doc: ReturnType<typeof parseXml>; output: string };

  await when('valid XML is parsed and immediately serialized', async () => {
    const doc = parseXml(input);
    const output = serializeXml(doc);
    // Attach the input and output to the test report for inspection.
    await attachPrettyJson('Round-trip', { input, output });
    result = { doc, output };
  });

  await then('the output SHALL contain all original elements, attributes, and text', () => {
    // Substring checks: element name, attribute, text, and sibling element name.
    expect(result.output).toContain('child');
    expect(result.output).toContain('attr="val"');
    expect(result.output).toContain('hello');
    expect(result.output).toContain('sibling');
  });
});
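
The fixture stores the parsed document in result.doc without asserting on it. A test that also wanted to inspect the parsed tree could read the element directly. The sketch below assumes the document returned by parseXml exposes the standard DOM members getElementsByTagName, getAttribute, and textContent. The ElementLike and DocumentLike types, the readChild helper, and the stub document are hypothetical and exist only to keep the example self-contained.

```typescript
// Minimal structural types covering only the standard DOM members used below.
// They are illustrative stand-ins, not types exported by safe-docx.
interface ElementLike {
  getAttribute(name: string): string | null;
  textContent: string | null;
}
interface DocumentLike {
  getElementsByTagName(name: string): ArrayLike<ElementLike>;
}

// Hypothetical helper: reads the attribute and text of the first <child> element,
// or returns undefined when the document has no such element.
function readChild(doc: DocumentLike): { attr: string | null; text: string | null } | undefined {
  const child = doc.getElementsByTagName('child')[0];
  if (child === undefined) return undefined;
  return { attr: child.getAttribute('attr'), text: child.textContent };
}

// Hand-built stub standing in for a parsed document, so the sketch runs
// without a DOM implementation.
const stubDoc: DocumentLike = {
  getElementsByTagName: (name: string) =>
    name === 'child'
      ? [{ getAttribute: (attrName: string) => (attrName === 'attr' ? 'val' : null), textContent: 'hello' }]
      : [],
};

console.log(readChild(stubDoc));
```

With the stub, readChild returns an object whose attr is val and whose text is hello. Against a real parsed document, the same call would check the tree itself rather than its serialized form.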

Expected outcome

The scenario's assertions run against the serialized XML string, so the expected outcome is the set of containment checks that must pass after parsing and serialization.

Below are the assertions that describe the expected serialized output for this scenario.

expect(result.output).toContain('child');
expect(result.output).toContain('attr="val"');
expect(result.output).toContain('hello');
expect(result.output).toContain('sibling');

The result.output field is expected to contain child, attr="val", hello, and sibling, because those values represent the original element name, attribute, text content, and sibling element that must remain present after the round-trip.
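
Containment checks are intentionally loose: toContain('child') passes for any output where those characters appear, including inside a longer element name. As a hedged illustration of a tighter check, the sketch below tests for a start tag with an exact name. The helpers hasStartTag and escapeRegExp are hypothetical, and a regular expression is only a rough approximation here, since it does not distinguish tags from text inside comments or CDATA sections.

```typescript
// Escapes characters that have special meaning in a regular expression.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: true only if `output` contains a start tag or
// empty-element tag whose name is exactly `name`.
function hasStartTag(output: string, name: string): boolean {
  return new RegExp(`<${escapeRegExp(name)}(?=[\\s/>])`).test(output);
}

// Output in which the element was renamed during the round-trip.
const renamed = '<root><grandchild attr="val">hello</grandchild><sibling/></root>';

console.log(renamed.includes('child'));     // true: the substring occurs inside "grandchild"
console.log(hasStartTag(renamed, 'child')); // false: there is no <child ...> start tag
```

The trade-off is that the substring checks stay simple and independent of serializer formatting, while a name-exact check catches a renamed element that the substring form would miss.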