
safe-docx · Parsing Heuristics

Parenthesized letter label extraction

Agreement paragraphs that use typed list labels must have their leading marker separated from the paragraph content before classification can proceed. The label (i.e., the visible numbering or lettering marker at the start of the paragraph) can determine how later style inference treats the paragraph.

extractListLabel returns the extracted label and its label type for list labels that are typed as plain text. Because parenthesized letters can overlap with other parenthesized patterns, the parser classifies a label as a letter only after checking the higher-priority label forms handled by the same primitive.[1]
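The priority ordering described above can be illustrated with a minimal sketch. This is not the safe-docx implementation: only LabelType.LETTER and the label and label_type fields come from the text here. The function name extractListLabelSketch, the NUMBER and ROMAN members, the regular expressions, and the null result for unlabeled text are all assumptions made for illustration.

```typescript
// Hypothetical sketch of priority-ordered label classification.
// Only LabelType.LETTER, `label`, and `label_type` appear in the source;
// NUMBER, ROMAN, the patterns, and the null fallback are assumptions.
enum LabelType {
  NUMBER = 'NUMBER', // assumed member
  ROMAN = 'ROMAN',   // assumed member
  LETTER = 'LETTER',
}

interface ListLabelResult {
  label: string | null;
  label_type: LabelType | null;
}

// Ordered from highest to lowest priority; the first matching pattern wins.
// The ROMAN pattern requires at least two characters, so an ambiguous
// single-character label such as "(i)" falls through to LETTER in this sketch.
const LABEL_PATTERNS: Array<[LabelType, RegExp]> = [
  [LabelType.NUMBER, /^\s*(\(\d+\))\s+/],
  [LabelType.ROMAN, /^\s*(\([ivxlcdm]{2,}\))\s+/i],
  [LabelType.LETTER, /^\s*(\([a-z]\))\s+/i],
];

function extractListLabelSketch(text: string): ListLabelResult {
  for (const [labelType, pattern] of LABEL_PATTERNS) {
    const match = pattern.exec(text);
    if (match) {
      // Capture group 1 holds the label including its parentheses.
      return { label: match[1], label_type: labelType };
    }
  }
  // No recognized label at the start of the paragraph.
  return { label: null, label_type: null };
}

const sketchResult = extractListLabelSketch('(a) First item of the agreement');
console.log(sketchResult);
```

Keeping the patterns in a single ordered list makes the precedence explicit: a new label form is added by inserting a row at the right position, and the letter pattern stays last so it only matches when nothing more specific does.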

Scenario

The scenario below covers the baseline success case for extractListLabel: a parenthesized letter label is classified as a letter label.

The scenario

Given the input text is "(a) First item of the agreement",
When extractListLabel is called with that text,
Then

  • result.label_type is LabelType.LETTER.
  • result.label is "(a)".

Test fixture

The fixture calls extractListLabel with a paragraph-like string and checks the returned label classification.

Below is the test fixture code.

  test.openspec('extract parenthesized letter labels')('Scenario: extract parenthesized letter labels', async ({ when, then, attachPrettyJson }: AllureBddContext) => {
    const text = '(a) First item of the agreement';

    let result!: ReturnType<typeof extractListLabel>;
    await when('extractListLabel is called', async () => {
      result = extractListLabel(text);
      await attachPrettyJson('Result', result);
    });

    await then('the result SHALL have label_type LETTER', () => {
      expect(result.label_type).toBe(LabelType.LETTER);
      expect(result.label).toBe('(a)');
    });
  });

Expected result shape

The scenario asserts the label type and the extracted label returned by extractListLabel.[2]

Below are the assertions the scenario makes against the result returned by extractListLabel.

expect(result.label_type).toBe(LabelType.LETTER);
expect(result.label).toBe('(a)');

Below is a description of the expected fields: