Static Analysis of Source Code with libClang in Python
libClang provides a set of Python bindings that support static analysis of a single(?) C/C++ file.
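Drawbacks
- It does not handle macros the way I need. With `#ifdef`, for example, if the condition is not satisfied, the code inside the block is not parsed at all. That is reasonable for compilation, but it does not meet my needs.
- For reasons I do not understand, the AST produced by libClang differs from the output of `clang -Xclang -ast-dump`. The latter is clear, while the former sometimes reports nodes as `UNEXPOSED_DECL` instead of their actual kind.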
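One possible workaround for the `#ifdef` limitation, assuming the relevant macros are known in advance, is to parse the file several times with different `-D` flags passed through the `args` parameter of `Index.parse`. The sketch below is only an illustration: `define_args` and `parse_with_defines` are made-up helper names, and any block whose condition is still false in a given pass remains skipped.

```python
from typing import Optional


def define_args(macros: dict[str, Optional[str]]) -> list[str]:
    """Turn {"FOO": None, "BAR": "1"} into ["-DFOO", "-DBAR=1"]."""
    args = []
    for name, value in macros.items():
        args.append(f"-D{name}" if value is None else f"-D{name}={value}")
    return args


def parse_with_defines(path: str, macros: dict[str, Optional[str]]):
    """Parse `path` with the given macros defined (requires libclang)."""
    # Imported lazily so that define_args can be used without libclang.
    import clang.cindex as cindex

    index = cindex.Index.create()
    return index.parse(path, args=define_args(macros))
```

Parsing once per macro configuration and merging the results is more work than a single pass, but it stays within what the bindings already offer.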
0. Prep
libClang is bundled with clang.
- Install clang: `pacman -S clang`
- Find libClang's Python binding: `pacman -Ql clang | grep cindex.py` (for example, `clang /usr/lib/python3.11/site-packages/clang/cindex.py`)
- [optional] I use a pipenv environment, which cannot reach the system libraries, so I simply copied the `/usr/lib/python3.11/site-packages/clang` folder into the current directory
- Find the location of the libclang shared library: `pacman -Ql clang | grep clang.so` (for example, `clang /usr/lib/libclang.so`)
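The shared library can also be located from Python. The sketch below is a hypothetical helper (`find_libclang_dir` is not part of the bindings) that scans candidate directories for a file whose name starts with `libclang.so`. The default directory list is an assumption and will differ between distributions.

```python
from pathlib import Path
from typing import Iterable, Optional

# Assumed candidate directories; adjust for the distribution in use.
DEFAULT_DIRS = ("/usr/lib", "/usr/lib64", "/usr/local/lib")


def find_libclang_dir(dirs: Iterable[str] = DEFAULT_DIRS) -> Optional[str]:
    """Return the first directory containing libclang.so*, or None."""
    for d in dirs:
        folder = Path(d)
        if folder.is_dir() and any(folder.glob("libclang.so*")):
            return str(folder)
    return None
```

The returned directory can then be handed to `cindex.Config.set_library_path` before the first index is created.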
1. Usage
1.0 Basic Use Case
    import clang.cindex as cindex
    from pathlib import Path

    # Folder containing libclang.so, found in the last step of section 0 (Prep)
    cindex.Config.set_library_path("/usr/lib")

    file = Path("test.c")
    parser = cindex.Index.create()
    # Pass a plain string; older versions of the bindings do not accept Path objects
    tu = parser.parse(str(file.absolute()))

    for node in tu.cursor.get_children():
        node: cindex.Cursor = node  # for type hint

        # Skip nodes that come from header files
        if not str(node.location.file).endswith(".c"):
            continue
        print(node.kind, node.spelling, node.extent.start)
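The basic loop only visits top-level declarations. When comparing against the output of `clang -Xclang -ast-dump`, it can help to print the whole tree instead. A minimal sketch follows. `dump_tree` is a hypothetical helper, and it relies only on the `kind`, `spelling`, and `get_children()` members of `Cursor`, so any object exposing those works.

```python
def dump_tree(node, depth: int = 0) -> list[str]:
    """Return one indented line per node, in preorder."""
    lines = [f"{'  ' * depth}{node.kind} {node.spelling}"]
    for child in node.get_children():
        lines.extend(dump_tree(child, depth + 1))
    return lines
```

With a parsed translation unit `tu`, `print("\n".join(dump_tree(tu.cursor)))` prints the full tree, header contents included.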
1.1 Getting the Called Function's Name from a CALL_EXPR
    import logging
    from typing import Optional

    DEBUG = logging.getLogger(__name__).debug  # debug logger used by the helpers


    def find_function_name_in_CALL_EXPR(node: cindex.Cursor) -> Optional[str]:
        assert node.kind == cindex.CursorKind.CALL_EXPR
        # The first child of a CALL_EXPR is the callee expression
        tmp = next(node.get_children(), None)
        if tmp is None:
            return None
        if tmp.kind == cindex.CursorKind.DECL_REF_EXPR:
            DEBUG(f"Fall back to DECL_REF_EXPR {tmp.spelling} {tmp.kind} {tmp.extent.start}")
            name = tmp.spelling
        else:
            # Otherwise expect a wrapper node with exactly one DECL_REF_EXPR child
            children = list(tmp.get_children())
            if len(children) != 1:
                return None
            if children[0].kind != cindex.CursorKind.DECL_REF_EXPR:
                DEBUG(f"Skipping indirect call: {children[0].spelling} {children[0].kind} {children[0].extent.start}")
                return None
            name = children[0].spelling
        return name
1.2 Collecting the Names of Functions Called Inside a Function
    def parse_function(node: cindex.Cursor) -> set[str]:
        assert node.kind == cindex.CursorKind.FUNCTION_DECL
        dep_funcs = set()
        func_body = [n for n in node.get_children() if n.kind == cindex.CursorKind.COMPOUND_STMT]
        if len(func_body) == 0:
            # No body: this may be a forward declaration
            DEBUG(f"body not found, may be a forward declaration: {node.spelling}")
            return dep_funcs
        assert len(func_body) == 1
        body: cindex.Cursor = func_body[0]
        for child in body.walk_preorder():
            child: cindex.Cursor = child  # for type hint
            if child.kind == cindex.CursorKind.CALL_EXPR:
                name = find_function_name_in_CALL_EXPR(child)
                if name is not None:  # skip indirect calls that could not be resolved
                    dep_funcs.add(name)
        return dep_funcs
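The sets returned by `parse_function` describe direct calls only. Assuming they have been collected into a dict keyed by function name (the `call_map` argument below is that assumed structure, and `transitive_callees` is a hypothetical helper), indirect dependencies can be computed with a plain graph traversal:

```python
def transitive_callees(call_map: dict[str, set[str]], root: str) -> set[str]:
    """Return every function reachable from `root` through calls."""
    seen: set[str] = set()
    stack = list(call_map.get(root, ()))
    while stack:
        name = stack.pop()
        if name in seen:
            continue
        seen.add(name)
        # Functions without a known body (e.g. libc) simply have no entry.
        stack.extend(call_map.get(name, ()))
    return seen
```

A recursive function shows up in its own result set, because it is reachable from itself.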