UseJunior

safe-docx · Text Matching

Quote-normalized match across quote styles

When locating a defined term in paragraph text, the quote style in the document can differ from the quote style in the search phrase. A matching step that treats quote variants as the same character keeps punctuation style from hiding a real match.
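A minimal sketch of that idea follows. It assumes normalization rewrites the four common curly quote characters (U+2018, U+2019, U+201C, U+201D) to ASCII quotes. The helper `normalizeQuotes` and its character set are hypothetical and are not taken from safe-docx.

```typescript
// Hypothetical helper: curly quote code points mapped to ASCII quotes.
const QUOTE_MAP: Record<string, string> = {
  '\u2018': "'", // left single quotation mark
  '\u2019': "'", // right single quotation mark
  '\u201C': '"', // left double quotation mark
  '\u201D': '"', // right double quotation mark
};

// Replaces each curly quote with its straight form; other characters pass through.
function normalizeQuotes(text: string): string {
  return text.replace(/[\u2018\u2019\u201C\u201D]/g, (ch) => QUOTE_MAP[ch] ?? ch);
}

const haystack = '\u201CCompany\u201D means ABC Corp.';
const needle = '"Company" means ABC Corp.';

console.log(haystack === needle); // false
console.log(normalizeQuotes(haystack) === normalizeQuotes(needle)); // true
console.log(normalizeQuotes(haystack).length === haystack.length); // true
```

Each replacement swaps one UTF-16 code unit for one, so the normalized text keeps the original length. An offset found in the normalized haystack therefore also points into the original text. Whether safe-docx relies on that property is not stated here.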

findUniqueSubstringMatch compares a haystack (the paragraph text being searched) with a needle (the smaller phrase, or substring, being located). The function tries its matching stages in order. The quote-normalized stage maps curly and straight quote characters to a common form before searching, and the returned mode still records which stage found the substring.[1]
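The staged search can be sketched roughly as follows. This is an illustration under stated assumptions, not the library's implementation. The name `sketchUniqueMatch`, the `'multiple'` and `'not_found'` statuses, and the `start`/`end` fields are hypothetical. The "clean" stage mentioned later on this page is omitted because its behavior is not described.

```typescript
// Hypothetical sketch of a staged unique-substring search, not safe-docx's code.
// The "clean" stage is left out because its behavior is not described.
type MatchMode = 'exact' | 'quote_normalized';

// Assumed result shape; 'multiple' and 'not_found' are illustrative statuses.
type SketchResult =
  | { status: 'unique'; mode: MatchMode; start: number; end: number }
  | { status: 'multiple'; mode: MatchMode }
  | { status: 'not_found' };

// Length-preserving: each curly quote becomes exactly one ASCII quote.
function normalizeQuotes(text: string): string {
  return text.replace(/[\u2018\u2019]/g, "'").replace(/[\u201C\u201D]/g, '"');
}

// Collects every start index of needle in haystack, overlaps included.
function findAll(haystack: string, needle: string): number[] {
  const hits: number[] = [];
  if (needle.length === 0) return hits;
  for (let i = haystack.indexOf(needle); i !== -1; i = haystack.indexOf(needle, i + 1)) {
    hits.push(i);
  }
  return hits;
}

function sketchUniqueMatch(haystack: string, needle: string): SketchResult {
  // Stages run in order; each applies the same transform to both strings.
  const stages: Array<[MatchMode, (s: string) => string]> = [
    ['exact', (s) => s],
    ['quote_normalized', normalizeQuotes],
  ];
  for (const [mode, transform] of stages) {
    const hits = findAll(transform(haystack), transform(needle));
    if (hits.length === 1) {
      // Offsets stay valid for the original haystack because lengths are unchanged.
      return { status: 'unique', mode, start: hits[0], end: hits[0] + needle.length };
    }
    // Later stages only merge characters, so they cannot reduce the count.
    if (hits.length > 1) return { status: 'multiple', mode };
  }
  return { status: 'not_found' };
}

const result = sketchUniqueMatch('\u201CCompany\u201D means ABC Corp.', '"Company" means ABC Corp.');
console.log(result.status); // unique
if (result.status === 'unique') {
  console.log(result.mode, result.start, result.end); // quote_normalized 0 25
}
```

Stopping at the first stage with any hits is a design choice in this sketch. Normalization applies the same character map to both strings, so every exact occurrence is still an occurrence after normalizing. If the exact stage already finds several matches, a later stage cannot narrow them to one.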

Below is a test scenario of findUniqueSubstringMatch: quote-normalized matching treats curly quotes in the haystack as equivalent to straight quotes in the needle.

The scenario

When findUniqueSubstringMatch is called with a haystack containing curly quotes and a needle containing straight quotes,
Then

  • the result status is unique;
  • the result mode is quote_normalized.

The test fixture

The fixture supplies one paragraph-like haystack and one needle, then checks the returned matching status and mode. The haystack and needle differ only in quote style, so the scenario isolates the quote-normalized matching stage.[2]

Below is the test fixture code.

test.openspec('quote_normalized matches curly quotes against straight quotes')('Scenario: quote_normalized matches curly quotes against straight quotes', async ({ when, then, attachPrettyJson }: AllureBddContext) => {
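  // Curly double quotes (U+201C, U+201D) around Company
  const haystack = '\u201CCompany\u201D means ABC Corp.';
  // Same text with straight ASCII double quotes
  const needle = '"Company" means ABC Corp.';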
  let result!: ReturnType<typeof findUniqueSubstringMatch>;

  await when('findUniqueSubstringMatch is called', async () => {
    result = findUniqueSubstringMatch(haystack, needle);
    await attachPrettyJson('Result', result);
  });

  await then('the result SHALL have status unique and mode quote_normalized', () => {
    expect(result.status).toBe('unique');
    if (result.status !== 'unique') return;
    expect(result.mode).toBe('quote_normalized');
  });
});
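The `then` step checks `result.status` before it reads `result.mode`. This page shows only the asserted fields. Assume the real return type is a discriminated union in which only the `'unique'` branch carries `mode`. Under that assumption, the early return is what lets TypeScript accept `result.mode`. The type below is hypothetical: the `'not_found'` status and the `'exact'` and `'clean'` mode names are guesses based on the stage names on this page.

```typescript
// Hypothetical result type illustrating discriminated-union narrowing.
type SubstringMatch =
  | { status: 'unique'; mode: 'exact' | 'clean' | 'quote_normalized' }
  | { status: 'not_found' };

function describeMatch(result: SubstringMatch): string {
  // Reading result.mode before this check would be a compile error.
  if (result.status !== 'unique') return 'no unique match';
  // Narrowed: TypeScript now knows result has a mode field.
  return `matched via ${result.mode}`;
}

console.log(describeMatch({ status: 'unique', mode: 'quote_normalized' })); // matched via quote_normalized
console.log(describeMatch({ status: 'not_found' })); // no unique match
```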

The expected result shape

The scenario asserts two fields on the returned value, so the expected shape shows those fields rather than the unasserted offsets or matched substring.

Below is the result that findUniqueSubstringMatch is expected to return for this scenario.

{
  status: 'unique',
  mode: 'quote_normalized',
}

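Below is a description of the expected fields:

  • status: 'unique' means findUniqueSubstringMatch found exactly one match for the needle in the haystack;
  • mode: 'quote_normalized' means that match came from the quote-normalized stage rather than an earlier stage.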

A non-obvious detail

The quote-normalized stage runs after the exact and clean stages, so the returned mode records the first stage that finds a unique match. In this scenario, the curly quotes in the haystack prevent the exact and clean stages from returning a match. Quote normalization then produces the single match that the assertions check.