Detailed Code Completion filtering, selection, and replacement algorithms

classic Classic list List threaded Threaded
1 message Options
Reply | Threaded
Open this post in threaded view

Detailed Code Completion filtering, selection, and replacement algorithms

Sam Harwell

In the past I alluded to spending a great deal of time thinking about ways to improve the performance and usefulness of a code completion feature. This algorithm is complicated but ends up producing consistent, predictable behavior. It truly excels (blows the competition out of the water) with COMPLETION_AUTO_POPUP_DELAY set to 0 and low-latency implementations of AsyncCompletionTask.query, but still performs better than alternatives when latencies are observable.


To start with, a couple definitions from the subject line:


1.       Filtering algorithm: the algorithm used after semantic context evaluation to restrict the identifiers actually displayed in the completion dropdown.

2.       Selection algorithm: the algorithm used to select an initial “best match” item within the completion dropdown.

3.       Replacement algorithm: the algorithm used to determine what text gets replaced - currently chosen by the user pressing Enter or Ctrl+Enter.


I noticed that Ctrl+Enter deletes the current identifier before calling defaultAction. I’ll refer to the behavior of Ctrl+Enter as “Extend” because it extends the completion to the end of the current identifier. I’ll refer to the behavior of Enter as “No-Extend”.


For this email, I’m examining the following questions:


1.       Why is there a difference between Enter and Ctrl+Enter?

2.       When would you want the behavior of Enter, and when would you want the behavior of Ctrl+Enter?

3.       How should the completion dropdown get filtered?

4.       How should the “best match” be determined?


Part 0 - Progress:


I have modified the Editor Code Completion module to support everything described in this email without any breaking API changes relative to the current specification 1.28. In addition to preserving API compatibility, my current implementation exactly follows the existing code completion behavior if a developer does not explicitly override it.


Part 1 - My answer to #1:


The difference is present because the current algorithm cannot reliably answer question #2.


Part 2 - My answer to #2:


From what I can tell, the code completion algorithm uses this feature to compensate for not keeping track of information available when the completion was invoked. For example, suppose you are trying to complete the identifier getStuff, and the following currently present. For reference, assume this is columns 0 before the ‘g’ through 5 after the ‘t’.




When the code completion (Ctrl+Space) is invoked at positions p=0 or p=5, the user expects the behavior of Enter. When the code completion is invoked at positions 0<p<5, the user expects the behavior of Ctrl+Enter. Also note that at position p=5, the algorithms of Enter and Ctrl+Enter are equivalent.


Part 3 - Code completion selection in “Extend” mode:


Using the example from Part 2, it’s clear that the current selection algorithm used in the completion dropdown does not properly handle the “Extend” behavior, because it only considers the text before the caret when selecting an item. When the completion algorithm in invoked in “Extend” mode, all of the text of the current identifier should be considered when choosing the default selection.


Part 4 - Code completion filtering in both “Extend” and “No-Extend” modes:


If the code selection algorithm of Part 3 is implemented, then the user will encounter major problems under the current filtering algorithm. In addition, it should be immediately apparent that the current filtering algorithm is crippled because it is fully incapable of handling even the simplest typos when completion is invoked at the end of an identifier (position p=5 in the previous example). While I do not believe the following rules are ideal for all situations, I designed them to be easily implemented and feel similar to the current rules while preserving the ability to support the selection mode of Part 3 as well as handling many misspelling cases:


1.       For prefix filtering using the prefix text in span [0,x) with the caret at position c, you should always have x<=c.

2.       If filtering on the prefix [0,x+1), where x>=0, produces an empty result, then filtering should be on [0,x) instead.


In “No-Extend” mode with no misspellings before the caret, these rules produce exactly the same result as the current implementation. In “No-Extend” mode with misspellings present before the caret, these rules prevent having an empty (useless) dropdown appear. Unfortunately, if the user attempting to complete “getShell” types “getSt” and presses Ctrl+Enter, when both “getStuff” and “getShell” are available, the filtering above would result in only showing “getStuff”. To resolve this problem, the solution is adding the following rule which has much larger ramifications:


3.       The completion list should not be filtered before the user types a character *after* the list is initially shown. When the dropdown appears in response to an automatic trigger, the list should not be initially filtered even if the trigger is a word character (e.g. from option to automatically display the completion dropdown when an identifier character is typed).


To allow even more convenient typing, the filtering algorithm can be updated to also allow the following.


4.       When filtering on a prefix span [0,x), with x>1 (at least 2 characters), the filter allows items containing the substring [0,x). (substring match, may or may not be case sensitive).

5.       When filtering on a prefix span [0,x) which matches the regular expression ([A-Z]\w*){2,}, we call each group ([A-Z]\w*) a “word prefix”. The filter allows items containing words in the same relative order that match the word prefix. Note that this means the prefix “UnsExc” will allow “UnsupportedOperationException” even though the “Operation” appears between the words matched by “Uns” and “Exc”. When evaluating the match, the “words” of a completion item start when [A-Z] follows ^ or [^A-Z], and when [A-Za-z] follows ^ or [^A-Za-z]. The latter covers multi_word_ids_with_underscores.


Part 5 - Instant completion with rule 4.3 in place:


Currently instant completion only operates if the filtered list has a single item in it. It also only works if the caret is located at the end of an identifier. This algorithm would need to be updated as follows:


1.       The evaluated prefix for instant completion is the span [0,a), where “Extend” mode chooses the current identifier block and “No-Extend” mode chooses MIN(caretPosition, endOfIdentifier).

2.       The evaluated prefix must match the prefix of exactly 1 item in the completion list (unique match constraint). The match may or may not be case sensitive (this is an option in the Java editor, and some languages don’t consider case at all). Note that only prefix matching is used, even if the filtering algorithm implements the advanced filters 4.4 and 4.5 above.


Part 6 - Initial selection with filtering rules 4.2, 4.3, 4.4, and/or 4.5 in place:


It should be clear that if the filtering rules are relaxed per the advanced rules in Part 4 (especially rule 4.3), the current selection algorithm of first prefix match will do a poor job of choosing items the user is trying to complete. The following selection rules are prioritized for ideal behavior, but an implementation may use variations for efficiency as long as the variations result in predictable behavior (typically restricted to specific simplifications in evaluating rules 1 and 9).


Unless otherwise specified, character matches are case-insensitive. While the user has an option to explicitly disable case-insensitive matching, if all of the rules from this email are in place then they won’t need to use that option.


1.       Evaluated span: Due to the relaxation of filtering rules in Part 4, the completion dropdown may include items even when the current identifier block does not match any item. The largest possible span is the same as the one defined in Part 5.1. A match which includes more characters from this span wins over one that includes fewer.

2.       Exact match (case-sensitive): When possible, choose a completion item whose text exactly matches the evaluated span text.

3.       Prefix match (case-sensitive): When possible, choose a completion item that starts with the evaluated span text.

4.       Exact match (case-insensitive)

5.       Prefix match (case-insensitive)

6.       Substring match: When possible, choose a completion item that contains the evaluated span as a substring (filtering rule 4.4).

7.       Word boundary match: When possible, choose a completion item that matches the evaluated span as described in filtering rule 4.5.

8.       Validity match: When possible, choose a “smart match” completion item (sort priority < 0).

9.       Recently used: When possible, choose a completion item which was successfully inserted by a more recent completion than one inserted by an earlier completion (or not previously inserted).

a.       If semantics are tracked as well as the actual inserted text (e.g. for Java an MRU list of ElementHandle instead of String), then use a weighting algorithm to balance between text inserted more recently and having a full semantic match. One way to handle this having one MRU that tracks semantic elements and one that tracks strings. In the semantic element list make sure there are never 2 items present which are valid within the same context, then always prefer a semantic match over a plain string match. The net impact of this is very cool but hard to explain the nuances (the end user will just see it as the algorithm always knowing what to type).

10.   Case sensitivity: When possible, choose a completion item that matches the case of every matched character from the evaluated span. This decision is Boolean – the match is either all characters or does not match.

11.   Alphabetical order: If evaluation of all of the above still results in multiple potential “best match” items, choose the first one in alphabetical order.





Sam Harwell

Owner, Lead Developer

Description: Description: C:\Users\sam\Documents\Work\TVL\tvl_logo_small.png