libclang: how to get token semantics
libclang defines only 5 types of tokens: CXToken_Punctuation, CXToken_Keyword, CXToken_Identifier, CXToken_Literal and CXToken_Comment.
Is it possible to get more detailed information about tokens? For example, for the following source code:
struct Type;
void foo(Type param);
I would expect the output to be like:
I also need to map those entities to file locations.
You probably need a bit of background on how parsing works; a textbook on compilers would be a useful resource. First, the file is converted into a series of tokens, which gives you identifiers, punctuation, etc. The code that does this is called a lexer. Then the parser runs; it converts the list of tokens into an AST (structured declarations, expressions, etc.).
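To make the lexer step concrete, here is a toy sketch in plain Python. It is not clang's lexer: the KEYWORDS set, the TOKEN_RE pattern and the lex function are made up for this illustration and only cover the two-line sample from the question.

```python
import re

# Hypothetical, deliberately tiny token grammar; real C++ lexing is far richer.
KEYWORDS = {"struct", "void"}
TOKEN_RE = re.compile(r"(?P<word>[A-Za-z_]\w*)|(?P<punct>[;(){},*&])")


def lex(source):
    """Yield (kind, spelling, line, column) tuples with 1-based positions."""
    for line_no, line in enumerate(source.splitlines(), start=1):
        for match in TOKEN_RE.finditer(line):
            spelling = match.group()
            if match.lastgroup == "punct":
                kind = "PUNCTUATION"
            elif spelling in KEYWORDS:
                kind = "KEYWORD"
            else:
                kind = "IDENTIFIER"
            yield kind, spelling, line_no, match.start() + 1


tokens = list(lex("struct Type;\nvoid foo(Type param);\n"))
for token in tokens:
    print(*token)
```

Both occurrences of Type come out as plain identifiers here. A lexer cannot tell a declaration from a type reference; that distinction only appears once a parser has built the AST.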
clang does keep track of the various parts of declarations and expressions, but not in the way you're describing. For a given function declaration, it keeps track of things like the location of the name of the function and the start of the parameter list, but it keeps those in terms of locations in the file, not tokens.
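As a rough sketch of what "locations in the file" looks like from the Python bindings: each cursor has a location (for a declaration, the position of its name) and an extent (the source range it covers). The helper names format_decl and list_declarations are invented for this example, and the import sits inside the function only so that the formatting helper can be used without libclang installed.

```python
def format_decl(kind, name, name_pos, start, end):
    """Render one AST node; each position is a (line, column) pair."""
    return "{} {!r}: name at {}:{}, extent {}:{} to {}:{}".format(
        kind, name, *name_pos, *start, *end)


def list_declarations(source, filename="tmp.cpp"):
    """Parse `source` in memory and describe every cursor that has a file location."""
    import clang.cindex  # assumption: the libclang Python bindings are installed

    index = clang.cindex.Index.create()
    tu = index.parse(filename, args=["-std=c++11"],
                     unsaved_files=[(filename, source)])
    lines = []
    for cursor in tu.cursor.walk_preorder():
        if cursor.location.file is None:
            continue  # the translation unit root has no file location
        extent = cursor.extent
        lines.append(format_decl(
            cursor.kind.name, cursor.spelling,
            (cursor.location.line, cursor.location.column),
            (extent.start.line, extent.start.column),
            (extent.end.line, extent.end.column)))
    return lines
```

Calling list_declarations("struct Type;\nvoid foo(Type param);\n") should return one line per AST node; the exact set of nodes depends on the libclang version.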
A CXToken is just a token; there isn't any additional associated semantic information beyond the five types you listed. (You can get the actual text of the token with clang_getTokenSpelling, and the location with clang_getTokenExtent.) clang_annotateTokens gives you CXCursors, which let you examine the relevant declarations.
Note that some details aren't exposed by the libclang API; if you need more detail, you might need to use clang's C++ API instead.
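In the Python bindings the annotation step is available as the Token.cursor property, which calls clang_annotateTokens for that single token. A minimal sketch that pairs each token with the kind of its cursor and its file location follows; describe and annotate are names invented for this example, and the import is placed inside the function only so the formatting helper works without libclang installed.

```python
def describe(token_kind, spelling, cursor_kind, line, column):
    """Format one token together with the kind of the cursor it maps to."""
    return "{} {!r} -> {} at {}:{}".format(
        token_kind, spelling, cursor_kind, line, column)


def annotate(source, filename="tmp.cpp"):
    """Return one description per token of `source`, parsed from memory."""
    import clang.cindex  # assumption: the libclang Python bindings are installed

    index = clang.cindex.Index.create()
    tu = index.parse(filename, args=["-std=c++11"],
                     unsaved_files=[(filename, source)])
    rows = []
    for token in tu.get_tokens(extent=tu.cursor.extent):
        cursor = token.cursor  # annotates this one token via clang_annotateTokens
        rows.append(describe(token.kind.name, token.spelling, cursor.kind.name,
                             token.location.line, token.location.column))
    return rows
```

The cursor kind is what carries the semantics the token kind lacks, for example whether an identifier names a declaration or refers to a type. Which kind is reported for each token can vary between libclang versions, particularly for punctuation, so no expected output is listed here.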
You're looking for the token spelling and location attributes exposed by libclang. In the C API these can be retrieved with the functions clang_getTokenLocation and clang_getTokenSpelling. A minimal use of these functions (using their Python equivalents) would be:
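import clang.cindex

# Parse the sample from an in-memory buffer instead of a file on disk.
s = '''
struct Type;
void foo(Type param);
'''
idx = clang.cindex.Index.create()
tu = idx.parse('tmp.cpp', args=['-std=c++11'], unsaved_files=[('tmp.cpp', s)], options=0)
# Print the kind, text and source location of every token in the translation unit.
for t in tu.get_tokens(extent=tu.cursor.extent):
    print(t.kind, t.spelling, t.location)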
Gives:
TokenKind.KEYWORD struct <SourceLocation file 'tmp.cpp', line 2, column 1>
TokenKind.IDENTIFIER Type <SourceLocation file 'tmp.cpp', line 2, column 8>
TokenKind.PUNCTUATION ; <SourceLocation file 'tmp.cpp', line 2, column 12>
TokenKind.KEYWORD void <SourceLocation file 'tmp.cpp', line 3, column 1>
TokenKind.IDENTIFIER foo <SourceLocation file 'tmp.cpp', line 3, column 6>
TokenKind.PUNCTUATION ( <SourceLocation file 'tmp.cpp', line 3, column 9>
TokenKind.IDENTIFIER Type <SourceLocation file 'tmp.cpp', line 3, column 10>
TokenKind.IDENTIFIER param <SourceLocation file 'tmp.cpp', line 3, column 15>
TokenKind.PUNCTUATION ) <SourceLocation file 'tmp.cpp', line 3, column 20>
TokenKind.PUNCTUATION ; <SourceLocation file 'tmp.cpp', line 3, column 21>