Customization of the rules

The term checker does not know your technical terms (although it knows some proper nouns) until you add them to the disambiguation files. If you do not add your organization's technical names and technical verbs to the term checker, the term checker cannot fully analyse your text.

To customize the term checker, you must know these:

To learn how to write rules, refer to https://dev.languagetool.org/#rule-development. Use an XML editor or a text editor that has syntax highlighting (https://en.wikipedia.org/wiki/Syntax_highlighting).

You can write rules that use regular expressions. To learn about regular expressions, refer to www.regular-expressions.info.

For information about terminology management, refer to Case study: text simplification for shipping procedures (www.techscribe.co.uk/techw/text-simplification-for-shipping-procedures.htm).

To buy customization services, contact TechScribe. TechScribe can do these tasks for you:

To customize the term checker

  1. Make sure that you did the installation procedure 'Download the templates for your project terms'.
  2. Add technical names and technical verbs to disambiguation-projectterms.xml.
  3. Add not-approved terms and misused terms to grammar-projectterms.xml.
  4. To make sure that complex rules are correct, use testrules.

Add terms to disambiguation-projectterms.xml

The term checker contains technical names from these sources:

The term checker contains most technical verbs that are in rule 1.12.

For each approved term that is not in the term checker, add each inflection of the term. Use the rules that are in disambiguation-projectterms.xml as examples. For British English, the term checker uses the Oxford spelling. Thus, if applicable, use the Oxford spelling for the terms that you add.
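The sketch that follows shows one possible form of a disambiguation rule for a 1-word technical name and its plural. It is not copied from disambiguation-projectterms.xml. The term microchip and the rule ID are hypothetical. The POS tag IS_NOUN is an assumption taken from the PROJECT_SENTENCE_START rule later in this document. Before you use a rule like this, compare it with the rules in your copy of disambiguation-projectterms.xml and use the tags that those rules use.

```xml
<!-- Sketch only: 'microchip' is a hypothetical project term. -->
<!-- Assumption: IS_NOUN is the tag that your project rules use for technical names. -->
<rule id="PROJECT_TN_microchip" name="Project technical name: microchip, microchips">
  <pattern>
    <marker>
      <token regexp="yes">microchips?</token><!-- Matches the singular and the plural -->
    </marker>
  </pattern>
  <disambig action="add">
    <wd pos="IS_NOUN"/><!-- One wd element for the one token in the marker -->
  </disambig>
</rule>
```

A 1-word term is a single token, so one regular expression (microchips?) can match the singular and the plural in the same rule.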

Multi-word terms are more difficult to add than 1-word terms because a separate rule for each inflection is necessary. If you keep the technical terms in a spreadsheet, you can use a script to convert the data to XML. TechScribe uses a series of regular expressions in PowerGREP (www.powergrep.com) to convert the terms to XML.
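For a multi-word term, the sketch that follows uses one rule for the singular and one rule for the plural. Each rule has one wd element for each token in the marker. The term filter unit (from the noun-cluster example in this document), the rule IDs and the tag IS_NOUN on each token are assumptions. If the rules in your copy of disambiguation-projectterms.xml use a different tag for multi-word terms, use that tag.

```xml
<!-- Sketch only: 'filter unit' is used as a hypothetical multi-word technical name. -->
<!-- Assumption: each token of the term gets IS_NOUN, as in the PROJECT_SENTENCE_START rule. -->
<rule id="PROJECT_TN_filter_unit_SINGULAR" name="Project technical name: filter unit">
  <pattern>
    <marker>
      <token>filter</token>
      <token>unit</token>
    </marker>
  </pattern>
  <disambig action="add">
    <wd pos="IS_NOUN"/><!-- filter -->
    <wd pos="IS_NOUN"/><!-- unit -->
  </disambig>
</rule>
<rule id="PROJECT_TN_filter_unit_PLURAL" name="Project technical name: filter units">
  <pattern>
    <marker>
      <token>filter</token>
      <token>units</token>
    </marker>
  </pattern>
  <disambig action="add">
    <wd pos="IS_NOUN"/><!-- filter -->
    <wd pos="IS_NOUN"/><!-- units -->
  </disambig>
</rule>
```

If you keep your terms in a spreadsheet, a conversion script can write one rule like these for each term and each inflection.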

If a term is approved for only 1 meaning, and if you want to give guidelines to technical writers, add a grammar rule for that term.

ASD-STE100 rule 1.5 does not tell you the part of speech that a technical name has. The rule only tells you the categories of terms that you can add. All the examples are nouns or noun clusters. Rule 1.5.17 tells you that colours, which are adjectives, are technical names in STE. Thus, to comply with the STE specification, add only nouns, noun clusters and colours as technical names.

When you write a disambiguation rule, do not use the immunize attribute. Immunization can cause a grammar rule not to find text (https://dev.languagetool.org/developing-a-disambiguator#immunizing-words-from-matching).

Numbers, units of measurement and time (and their symbols) (rule 1.5.9)

These terms are approved in the term checker:

The term checker uses SI units of measurement. These terms are examples in rule 1.5.9, but they are not in the term checker because they are imperial units of measurement: knot, mile, inch.

To simplify noun clusters (part of rule 2.2)

To simplify a noun cluster, you can "use hyphens (-) between words that are used as a single unit." Sometimes, hyphens in different locations are possible. For example, for the noun cluster filter unit top cover, hyphens in these locations are possible:

You must make a decision about where to put the hyphens.

To add project terms that are not approved in STE

A 1-word not-approved term in the STE dictionary can be a technical term if the meaning of the technical term is different from the meaning of the not-approved term. The table gives examples:

A not-approved term can be a technical term
Term | Not-approved meaning | Project approved meaning
case (n) | condition | a type of bag: briefcase, suitcase
chip (n) | particle | semiconductors: integrated circuit, microchip
collapse (v) | close, fall | astronomy: for a star to fall in on itself
compile (v) | make a list, record, collect | software development: to change a high-level programming language to binary code to make an executable program
deposit (n) | particle, contamination | geophysics: a natural underground layer of rock or other material
route (n, v) | noun: routing [direction of cables and pipes]; verb: put | logistics: noun: the course to go from a start location to an end location; verb: to calculate or to specify the course of a transport vehicle

To ignore the default rule for the not-approved term, do one of these tasks:

For example, the noun case is not approved. With the two sentences that follow, you will see different messages for the noun case and for the verb case:
Each passenger is permitted to put 1 case in the overhead locker.
To prevent an accident, case the gun after you use it.

The not-approved noun 'case' has a different message to the unknown verb 'case'.

If you add the singular noun case to disambiguation-projectterms.xml, the term checker ignores the rule for the not-approved noun case and uses the rules for technical names and technical verbs.

The technical name 'case' has different messages to the not-approved noun 'case'.

To tell writers to use the term with the approved meaning, add a grammar rule for the term.
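A minimal sketch of such a grammar rule for the technical name case follows. It copies the structure of the PROJECT_NOT_APPROVED_screen rule in this document, but the rule ID, message text and example are hypothetical. The exception for the POS tag IS_VERB is an assumption. Its purpose is to prevent the message for the verb case. Use testrules to make sure that the example gives the expected result with your files.

```xml
<!-- Sketch only: the rule ID, message and example are hypothetical. -->
<rule id="PROJECT_MEANING_case" name="Project meaning: case (n)">
  <pattern>
    <!-- Find 'case' and 'cases', except if the term checker tags the word as a verb (assumption) -->
    <token regexp="yes">cases?<exception postag="IS_VERB"/></token>
  </pattern>
  <message>Make sure that the noun '\1' has the project meaning 'a type of bag' (briefcase, suitcase), not the meaning 'condition'.</message>
  <short>Project Dictionary. Technical name: case</short>
  <!-- The rule always shows a reminder for the noun, thus the example is of type 'incorrect' -->
  <example type="incorrect">Each passenger is permitted to put 1 <marker>case</marker> in the overhead locker.</example>
</rule>
```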

Most rules for not-approved terms have an exception for words that have a different part of speech. Thus, the rule STE_NOT_APPROVED_case_CASE does not give a warning for the verb case. Most rules for terms that are not approved as more than one part of speech do not have exceptions for other parts of speech. For example, the word route is not approved as a noun or as a verb. The rule for route has exceptions only for multi-word project terms and for proper nouns.

If you add the noun route to disambiguation-projectterms.xml, the term checker gives a correct analysis. But rule STE_NOT_APPROVED_route_ROUTE contains examples. Thus, if you use testrules, you will see an error message that contains this text:

Errors expected: 1
Errors found   : 0

To add adjectives

Usually, an adjective is not a technical name or a technical verb. (Rule 1.5.17 tells you that colours are technical names.) Thus, you must not add an adjective to the disambiguation rules unless it is a colour. As an alternative, add the full technical term.

Examples of technical names that contain adjectives are as follows: achromatic material, allergic reaction, biodegradable container, sterile surgical equipment.

If you add only the adjective part of a technical name, you will not get an error message for incorrect STE. Examples:
The material is achromatic.
Are you allergic to penicillin?
This container is biodegradable.
The needle is not sterile.

To add adverbs

An adverb is not a technical name or a technical verb. Thus, you must not add an adverb to the disambiguation rules. As an alternative, add the full technical term.

For example, the term intrinsically safe is a technical term (www.osha.gov/laws-regs/regulations/standardnumber/1910/1910.307/).

Examples of technical names that contain intrinsically are as follows: intrinsically safe apparatus, intrinsically safe equipment, intrinsically safe tool.
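Add the full term, not the adverb. The sketch that follows is a disambiguation rule for the singular forms of the three terms above. The rule ID and the tag IS_NOUN on each token are assumptions. The rule matches only the full 3-word term. Thus, intrinsically and safe stay unknown in other contexts, and the term checker continues to give warnings for the incorrect sentences that follow. For the plural forms (for example, intrinsically safe tools), add a separate rule.

```xml
<!-- Sketch only: assumes that IS_NOUN is the tag for each token of a technical name. -->
<rule id="PROJECT_TN_intrinsically_safe_SINGULAR" name="Project technical name: intrinsically safe apparatus, equipment, tool">
  <pattern>
    <marker>
      <token>intrinsically</token>
      <token>safe</token>
      <token regexp="yes">apparatus|equipment|tool</token>
    </marker>
  </pattern>
  <disambig action="add">
    <wd pos="IS_NOUN"/><!-- intrinsically: tagged only in this 3-word context -->
    <wd pos="IS_NOUN"/><!-- safe: tagged only in this 3-word context -->
    <wd pos="IS_NOUN"/><!-- apparatus, equipment or tool -->
  </disambig>
</rule>
```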

If you add intrinsically as an adverb, you will not get an error message for this incorrect STE:
The test is intrinsically easy to do.

The term intrinsically safe is an adjective, not a technical name. Thus, if you add intrinsically safe as an adjective, you will not get an error message for this incorrect STE:
The equipment is intrinsically safe.

In ASD-STE100, for an example of a technical name that contains an adverb, refer to the approved example for view (v). The term n o'clock position, where n is an integer between 1 and 12, is a technical name:
The bolt will be at the 2 o'clock position when you look at the pump from the rear.

To specify that text is part of a list item that starts a sentence

In a regular expression, the ^ character (caret) matches the start of a string. In LanguageTool, the largest possible string is a sentence. LanguageTool has a POS tag SENT_START, which is equivalent to a caret.

Technical documentation frequently contains numbered instructions. The term checker gives the POS tag NLI_SENT_START to the last token in some basic number patterns. Examples: 2), 3.c). The term checker cannot identify all possible number sequences. For example, Step n) is an unknown number sequence. The term checker gives warnings for the sentence that follows:
Step 3) Open the window.

The term checker gives a warning for the undefined list number 'Step 3)'.

If Step n) is an approved number sequence, add the rule that follows to disambiguation-projectterms.xml. The rule makes the term checker ignore the number sequence and give a correct analysis of the words at the start of the sentence after the number sequence:


<rule id="PROJECT_SENTENCE_START" name="Project sentence start: Step n)">
  <pattern>
    <token postag="SENT_START"/>
    <marker>
      <token case_sensitive="yes">Step</token>
      <token regexp="yes">[1-9]|1[0-9]</token><!-- The approved number range is 1 to 19 -->
      <token>)</token>
    </marker>
  </pattern>
  <disambig action="add">
    <wd pos="IS_NOUN"/><!-- In the context of the pattern, disambiguate the word to assert that it is a noun -->
    <wd pos="IS_NOUN"/><!-- In STE, a number is a noun -->
    <wd pos="NLI_SENT_START"/>
  </disambig>
  <example type="untouched">Step 99) If necessary, change the number range.</example>
  <example type="untouched">STEP 5) Use initial capitals only.</example>
  <example type="untouched">Before you do Step 5) Open the window.</example>
  <example type="ambiguous" inputform=")[)]" outputform=")[)/NLI_SENT_START]">Step 3<marker>)</marker> Open the window.</example>
</rule>

If the sentences do not end with a full stop (period) and are not separated by an empty line, LanguageTool does not find the end of a sentence, and you will see incorrect warnings.

Add terms to grammar-projectterms.xml

To give guidelines to technical writers, add terms to grammar-projectterms.xml. Typically, add rules for these:

For examples of the types of rules that you can write, refer to grammar-projectterms.xml.

In English, many words have more than one part of speech. To prevent unwanted warnings, you can make a rule that shows a message only if a term has (or does not have) a specified part of speech. This example is from Managing terminology with term checker, Jake Cahill, 2018:


<rule id="PROJECT_NOT_APPROVED_screen" name="Project Not Approved noun: screen">
  <pattern>
    <token regexp="yes">screens?<exception postag="IS_VERB"/></token>
  </pattern>
  <message>The noun '\1' is not approved. Possible replacements: <suggestion><match no="1" postag_regexp="yes" postag="(NNS?)" postag_replace="$1">page</match></suggestion></message>
  <short>Project Dictionary. Not approved noun: screen</short>
  <example correction="page" type="incorrect">This <marker>screen</marker> displays the results.</example>
  <example correction="pages" type="incorrect">If the <marker>screens</marker> do not show these messages, stop the test.</example>
  <example type="correct">On this <marker>page</marker> you can enter a new name.</example>
  <example type="correct">When you <marker>screen</marker> the drugs for side-effects...</example>
  <example type="correct">Who <marker>screens</marker> the drugs for side-effects?</example>
  <example type="triggers_error">When the medical technicians <marker>screen</marker> the drugs for side-effects...</example><!-- False positive -->
</rule>

This line in the rule tells the term checker to find the words screen and screens, except if they are verbs:

<token regexp="yes">screens?<exception postag="IS_VERB"/></token>

In the term checker, the noun screen is approved as a technical name. The word is unknown as a verb. Thus, until you add the verb screen and its approved inflections in disambiguation-projectterms.xml, you will see a message that tells you not to use a technical name as a verb. (You can deactivate the rule.)

In grammar-projectterms.xml, you can use these values with the postag attribute:

Notes:

To find the part of speech that a word has

  1. In LanguageTool, select Text Checking>Tag Text.
  2. The Tagger Result screen shows the parts of speech that a word has:
    Tagger Result screen shows the parts of speech for words in the sentence: The word 'disambiguation' is unknown.

To make sure that complex rules are correct, use testrules

If you write complex rules, use testrules to make sure that the rules are correct. Refer to https://dev.languagetool.org/development-overview#testing-rules.

STE rule 1.6 shows that a not-approved STE term can be an approved project term. For example, the word regulation is not approved as a noun, but rule 1.5.15 and the example in people (n) show that it can be a technical name. The word is in the term checker, and a rule tells you to make sure that it has the correct meaning.

Not all the not-approved STE terms that can be technical names or technical verbs are in the term checker. For example, ASD-STE100 does not approve the word route as a noun or as a verb, and route is not in the term checker as a technical name or a technical verb. If route is an approved term in your organization and you add its approved inflections to disambiguation-projectterms.xml, testrules gives an error message. The reason is that the term checker has a dictionary rule for the word. That rule contains an example of incorrect text and also an exception for project terms. Testrules finds this conflict.

Local files version only: to prevent the testrules error message, put the STE rule into comments or delete the rule from grammar-ste8.xml. Then change the POS tags in the applicable rules in disambiguation-ste8.xml.

Other customization

You can customize the rules to make other types of language quality-assurance software, for example:
