Notes

What I would otherwise forget

Digging Into SourceKit Queries

Valid July 2017 with swift:master

Background

SourceKit is the component of Swift that provides an API for understanding and manipulating Swift source code. Its conceptual model is higher-level and simpler than the Swift compiler data structures: SourceKit itself builds and processes those data structures to provide what it thinks is useful to clients.

The main client of SourceKit is Xcode. It is pretty clear that changes to the API are made organically to meet Xcode’s needs: the various parts of the API do not hang together as a coherent, designed whole. The SourceKit API is not publicly specified and is barely documented – it is not covered by the swift-evolution process. This probably gives us a better Xcode experience but makes it tough (and interesting) to work with.

The point of this piece is to compare the various APIs and figure out what best to change to improve SourceKitten & clients, particularly Jazzy.

Query Request Types

source.request.editor.open
Returns the 'structure' of the code. Issued by sourcekitd-test -req=structure and sourcekitten structure.
  • Bugs: Omits (at least) subscript and typealias declarations.
  • The only SourceKit API to return accessibility fields.
  • No USR, declaration annotation. No XML doc comment.
  • Lists attributes by names but drops all content.
  • (Requires key.syntactic_only: 1 otherwise times out waiting for sema???)
source.request.docinfo
Intended to be a more 'useful' view of code? Issued by sourcekitd-test -req=doc-info.
  • Includes USR and fully-annotated declaration, XML doc comment.
  • Has great decoding of @available attributes but ignores all others.
source.request.cursorinfo
Returns details of a single declaration. SourceKitten uses this after structure to fill in blanks. Issued by sourcekitd-test -req=cursor.
  • Includes USR, annotated declarations, XML doc comment.
  • Does not include ACL; does not include attribute info.
  • Offset + Length refer to the identifier only.
source.request.indexsource
Returns a thorough view of the file structure. Issued by sourcekitd-test -req=index and sourcekitten index.
  • Includes detailed references (for 'follow-symbol' type use).
  • File positions based on column + line numbers.
  • Includes USR, no declaration annotation, no XML doc comment.
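
To make the four request types concrete, here is a minimal sketch of how their payloads might be assembled. Only the request names and key.syntactic_only come from the notes above; the other key names (key.sourcefile, key.name, key.offset, key.compilerargs) and both helper functions are assumptions for illustration, so check them against the SourceKit sources before relying on them.

```python
# Hypothetical helpers, not part of SourceKit or SourceKitten.
# Key names other than key.request / key.syntactic_only are assumptions.

def build_request(kind, source_file, offset=None, compiler_args=None):
    """Return a dict describing one request, e.g. kind='cursorinfo'."""
    request = {
        "key.request": "source.request." + kind,
        "key.sourcefile": source_file,
    }
    if kind == "editor.open":
        # Parse-only request: name the document and skip sema.
        request["key.name"] = source_file
        request["key.syntactic_only"] = 1
    else:
        # The other three need a type-checked AST, so pass compiler arguments.
        request["key.compilerargs"] = list(compiler_args or [source_file])
    if kind == "cursorinfo":
        if offset is None:
            raise ValueError("cursorinfo needs the offset of an identifier")
        request["key.offset"] = offset
    return request


def to_yaml(request):
    """Render the dict as simple YAML-like text (no escaping of quotes)."""
    lines = []
    for key, value in request.items():
        if isinstance(value, list):
            lines.append(key + ":")
            lines.extend('  - "' + item + '"' for item in value)
        elif isinstance(value, int):
            lines.append(key + ": " + str(value))
        elif key == "key.request":
            lines.append(key + ": " + value)  # request UIDs are left unquoted
        else:
            lines.append(key + ': "' + value + '"')
    return "\n".join(lines)


print(to_yaml(build_request("cursorinfo", "Foo.swift", offset=21)))
```

The split between the parse-only editor.open and the three requests that carry compiler arguments mirrors the parsed versus type-checked distinction these queries rely on.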

So this is a bit of a mess. It looks like structure and cursorinfo should fit together, and this is SourceKitten’s main strategy, but that leads to problems:

  1. Due to structure bugs, subscripts and typealiases are missing. SourceKitten valiantly tries to deal with this by sending cursorinfos after doc-comments that don’t crop up in the structure. This means undocumented decls are omitted, and there is no ACL or attribute info for the documented ones.
  2. The rich decode of @available is … unavailable.
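
The structure-plus-cursorinfo strategy can be sketched roughly as below. This is not SourceKitten's actual code: the function, the field lists and the sample values are all hypothetical, and the field names are the short names used in these notes rather than literal wire keys.

```python
# Hypothetical sketch of combining one structure entry with its cursorinfo.
# Field names are the short names from these notes, not real response keys.

STRUCTURE_FIELDS = ("kind", "name", "offset", "length", "nameoffset",
                    "namelength", "accesslevel", "setteraccesslevel")
CURSOR_FIELDS = ("usr", "annotated_decl", "fully_annotated_decl",
                 "full_as_xml", "typename")


def merge_declaration(structure_entry, cursor_info):
    """Keep positions and ACL from structure, add USR and docs from cursorinfo."""
    merged = {f: structure_entry[f] for f in STRUCTURE_FIELDS
              if f in structure_entry}
    # cursorinfo's own offset/length cover the identifier only, so they are
    # deliberately not copied over the structure values.
    for field in CURSOR_FIELDS:
        if field in cursor_info:
            merged[field] = cursor_info[field]
    return merged


# Made-up sample data for illustration only.
structure_entry = {
    "kind": "source.lang.swift.decl.function.method.instance",
    "name": "bar()", "offset": 16, "length": 13,
    "nameoffset": 21, "namelength": 5,
    "accesslevel": "source.lang.swift.accessibility.internal",
}
cursor_info = {
    "kind": "source.lang.swift.decl.function.method.instance",
    "name": "bar()", "offset": 21, "length": 5,
    "usr": "s:hypothetical-usr",
    "annotated_decl": "<Declaration>func bar()</Declaration>",
}

merged = merge_declaration(structure_entry, cursor_info)
print(sorted(merged))
```

Note that nothing in the merged record carries availability information: neither input supplies it, which is problem 2 in a nutshell.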

Let’s try to figure out why these problems exist. I hope to add the missing types to structure, but am somewhat resigned to having to add an additional docinfo query to SourceKitten in order to access the @available stuff.

Code Investigation

Briefly:

Queries

Structure - source.request.editor.open
CursorInfo - source.request.cursorinfo
DocInfo - source.request.docinfo

These APIs are almost entirely independent, each having their own AST-traversal logic, their own intermediate data structure, and their own serialization logic. Indexing is separate again. The structure returned by editoropen uses an AST that has only been parsed: the others use a type-checked ‘sema’ed AST.

Would be fascinating to know the development history here – feels like the classic “user interfaces are easy” plus drip-drip of requirements antipatterns.

The fields returned form a classic three-way Venn diagram:

Key                   | editoropen | cursorinfo | docinfo
kind[1]               | Y          | Y          | Y
offset[2]             | Y          | Y          | Y
length[2]             | Y          | Y          | Y
name                  | Y          | Y          | Y
nameoffset            | Y          |            |
namelength            | Y          |            |
bodyoffset            | Y          |            |
bodylength            | Y          |            |
usr                   |            | Y          | Y
accesslevel           | Y          |            |
setteraccesslevel     | Y          |            |
typename              | Y[3]       | Y          |
runtime_name          | Y[4]       |            |
selector_name         | Y[5]       |            |
attributes            | Y[6]       |            | Y[7]
full_as_xml           |            | Y          | Y[8]
annotated_decl        |            | Y          |
fully_annotated_decl  |            | Y          | Y
groupname, modulename |            | Y          |
localization_key      |            | Y          | Y
filepath              |            | Y          |
parent_loc            |            | Y          |
is_system             |            | Y          |
typeusr               |            | Y          |
containertypeusr      |            | Y          |
unavailable           |            |            | Y
deprecated            |            |            | Y
optional              |            |            | Y
generic_requirements  |            |            | Y
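
One way to put the field table to work is to encode it as data and ask which request can supply a given field. The sketch below does exactly that; the dictionary is transcribed from the table, while the two helper functions are hypothetical and ignore the footnoted caveats (a Y[3] counts the same as a Y).

```python
# Field availability transcribed from the table; helper names are made up.
COMMON = {"kind", "offset", "length", "name"}

FIELDS = {
    "editoropen": COMMON | {
        "nameoffset", "namelength", "bodyoffset", "bodylength",
        "accesslevel", "setteraccesslevel", "typename",
        "runtime_name", "selector_name", "attributes",
    },
    "cursorinfo": COMMON | {
        "usr", "typename", "full_as_xml", "annotated_decl",
        "fully_annotated_decl", "groupname", "modulename",
        "localization_key", "filepath", "parent_loc", "is_system",
        "typeusr", "containertypeusr",
    },
    "docinfo": COMMON | {
        "usr", "attributes", "full_as_xml", "fully_annotated_decl",
        "localization_key", "unavailable", "deprecated", "optional",
        "generic_requirements",
    },
}


def providers(field):
    """Requests that return the given field, in alphabetical order."""
    return sorted(q for q, fields in FIELDS.items() if field in fields)


def exclusive_fields(query):
    """Fields that only this request returns."""
    others = set()
    for name, fields in FIELDS.items():
        if name != query:
            others |= fields
    return sorted(FIELDS[query] - others)


print(providers("accesslevel"))
print(exclusive_fields("docinfo"))
```

Asking for the fields exclusive to docinfo shows at a glance what a structure-plus-cursorinfo client cannot reach without a third query.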

Then more complex:

Concept                  | editoropen             | cursorinfo      | docinfo
Inherited classes        | inheritedtypes - names |                 | inherits - name/usr/kind
Conformed protocols      | inheritedtypes - names |                 | conforms - name/usr/kind
Overridden class members |                        | overrides - usr | inherits - name/usr/kind
Overridden proto members |                        | overrides - usr | conforms - name/usr/kind
Extended types           |                        |                 | extends - name/usr/kind
Overloaded functions     |                        | related_decls   |
Generic parameters       |                        |                 | generic_params - names; generic_type_param entity - name/usr/declaration

Notes:

  1. Kind space is different
  2. editoropen and docinfo disagree on parameters. editoropen generates a decl.var.parameter for the argument, with the correct name, offset pointing to the name of the arg, and length covering the entire arg declaration. docinfo instead generates a decl.var.local with the correct name but with offset and length pointing at the type of the arg. For cursorinfo the input offset must be that of the identifier in question, i.e. nameoffset from editoropen.
  3. Only for parameters
  4. Only for top-level non-generic classes and protocols.
  5. Only for @IBActions!
  6. Decl attribute names only, mixing up @attributes with stuff like override that users do not think of as attributes.
  7. @available only, all parameters decoded
  8. Omits doc comments that should be inherited from protocol conformances into nominal types.
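
Note 2 implies a small tree walk: collect the nameoffset of each declaration in the editoropen response and use those as cursorinfo inputs. A minimal sketch follows, assuming the response has been decoded into nested dictionaries with key.substructure, key.kind, key.name and key.nameoffset entries (key names as I understand SourceKitten's JSON output, so treat them as assumptions). The function and the sample tree are hypothetical.

```python
# Hypothetical walker over a decoded editoropen (structure) response.
DECL_PREFIX = "source.lang.swift.decl."


def cursor_offsets(node):
    """Yield (name, nameoffset) for each declaration below node, depth first."""
    for child in node.get("key.substructure", []):
        kind = child.get("key.kind", "")
        if kind.startswith(DECL_PREFIX) and "key.nameoffset" in child:
            yield child.get("key.name"), child["key.nameoffset"]
        # Recurse so that members of types are visited too.
        yield from cursor_offsets(child)


# Made-up tree for: class Foo { func bar() {} }
sample = {
    "key.substructure": [
        {
            "key.kind": "source.lang.swift.decl.class",
            "key.name": "Foo",
            "key.offset": 0,
            "key.nameoffset": 6,
            "key.substructure": [
                {
                    "key.kind": "source.lang.swift.decl.function.method.instance",
                    "key.name": "bar()",
                    "key.offset": 16,
                    "key.nameoffset": 21,
                },
            ],
        },
    ],
}

for name, offset in cursor_offsets(sample):
    print(name, offset)
```

Using nameoffset rather than offset matters here: offset points at the start of the whole declaration, which is not where cursorinfo expects to land.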

Fixes

Fixing Structure/source.request.editor.open