CodeQL Study Notes

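Disclaimer: the techniques, ideas, and tools covered in this article are provided solely for security-oriented study and discussion. If you do not agree, please close this page. No one may use them for illegal purposes or for profit; anyone who does so bears the consequences.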

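Preface

I won't cover setup here, since there are already plenty of guides online. Most of this content comes from the internet; I am only keeping a record so I can look things up later.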

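Basic query structure

To use CodeQL for custom analysis, we can write our own queries to find vulnerabilities or bugs. CodeQL has the following query types:

Alert queries: queries that highlight issues at specific locations in the code.
Path queries: queries that describe the flow of information between a source and a sink in the code.

Query files written in CodeQL have the extension .ql and contain a select clause.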

import <language> /* import the library for the target language */

/* optional predicate and class definitions */

from /* variable declarations */
where /* logical formula */
select /* results to output */

CodeQL is built mainly on logical connectives (such as and, or, not), quantifiers (such as forall, exists), and predicates. It also supports recursion and aggregates (such as count, sum, avg).
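The connectives, quantifiers, and aggregates listed above can be combined in a few lines. The following is a minimal sketch that uses only primitive types; the predicate name isPrime is a hypothetical helper introduced for illustration.

```ql
// Hypothetical helper: holds for the prime numbers between 2 and 20.
predicate isPrime(int n) {
  n in [2 .. 20] and
  // forall: every candidate divisor d in 2..n-1 fails to divide n
  forall(int d | d in [2 .. n - 1] | n % d != 0)
}

// Aggregates: count the primes and add them up.
select count(int n | isPrime(n)), sum(int n | isPrime(n) | n)
```

For n = 2 the range [2 .. 1] is empty, so the forall holds vacuously and 2 is counted as prime.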

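import statement

Each query usually contains one or more import statements, which specify the libraries or modules to bring into the query.

From clause

Each declaration must take the form <type> <variable name>.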

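Where clause

This clause uses aggregates, predicates, and logical formulas to narrow the declared variables down to a smaller set of values that satisfy the given conditions.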

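Select clause

select element, string

Element: the code element identified by the query. This determines the location of the alert.
String: the message displayed for that code element, explaining why the alert was raised.

To illustrate the above, here is a simple example that finds calls to the system function: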

import cpp

from FunctionCall f
where f.getTarget().getName() = "system"
select f, "system call"
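When a query like the one above is meant to run as an alert query, it normally also carries a metadata comment with @kind problem. The sketch below shows one possible shape; the @id, @name, and @description values are placeholders chosen for illustration, not values from any official query pack.

```ql
/**
 * @name Call to system
 * @description Example alert query that flags calls to system().
 * @kind problem
 * @problem.severity warning
 * @id cpp/example/system-call
 */

import cpp

from FunctionCall call
where call.getTarget().hasGlobalName("system")
// First column: the element that locates the alert. Second column: the message.
select call, "Call to system() with " + call.getNumberOfArguments().toString() + " argument(s)."
```

Using hasGlobalName instead of comparing getName() restricts the match to the global function named system, which avoids flagging member functions that happen to share the name.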

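Advanced syntax

Be sure to read the official documentation! Be sure to read the official documentation! Be sure to read the official documentation!

Predicates

In CodeQL, functions are not called "functions"; they are called predicates.

Predicates without a result

A predicate without a result starts with the keyword predicate. It behaves a little like a macro: a call to it is effectively replaced by the condition in its body. For example: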

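// i ranges over all int values (an infinite set) until the predicate body constrains it
predicate isTest(int i) {
  i in [1 .. 9] // holds when i is a positive integer less than 10
}

from int i
where isTest(i)
select i // the values of i for which isTest holds: 1 through 9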

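Predicates with a result

When a predicate needs to return something, it uses result. Unlike a return value in other languages, result is a special variable.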

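int isTest(int i) {
  // This reads like a conditional. QL has no for loops: iteration is
  // generally expressed through recursion, and conditions through logical formulas.
  i in [1 .. 9] and result = i + 1
}

select isTest(3) // outputs 4
// select isTest(33) // no output

A predicate can also have multiple results: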

string getANeighbor(string country) {
    country = "France" and result = "Belgium"
    or
    country = "France" and result = "Germany"
    or
    country = "Germany" and result = "Austria"
    or
    country = "Germany" and result = "Belgium"
}
select getANeighbor("France")
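// returns two results, "Belgium" and "Germany"

If a predicate's body does not restrict its parameters to a finite set of values, add a bindingset annotation. Without it, the predicate below is rejected, because nothing in its body limits x or y. Writing two separate annotations means the predicate is valid as long as either x or y is bound where it is called.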

bindingset[x] bindingset[y]
predicate isTest(int x, int y) {
  x + 1 = y
}

from int x, int y
where y = 42 and isTest(x, y)
select x, y
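The same annotation also applies to predicates with a result. A minimal sketch, assuming a hypothetical predicate named timesFour that is not part of any library:

```ql
// i must be bound by the caller; the body alone would accept every int.
bindingset[i]
int timesFour(int i) { result = i * 4 }

from int i
where i in [1 .. 10] // this restricts i to a finite range before the call
select i, timesFour(i)
```

Here the where clause supplies the finite range, so the call satisfies the binding requirement.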

Recursion

/* From the official documentation:
you could use recursion to refine the above example. As it stands, the relation defined in getANeighbor is not symmetric—it does not capture the fact that if x is a neighbor of y, then y is a neighbor of x. 
*/
string getANeighbor(string country) {
  country = "France" and result = "Belgium"
  or
  country = "France" and result = "Germany"
  or
  country = "Germany" and result = "Austria"
  or
  country = "Germany" and result = "Belgium"
  or
  country = getANeighbor(result)
}
select getANeighbor("Belgium")
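// outputs France and Germany
// The last disjunct makes the relation symmetric: if country is a neighbor of result,
// then result is also a neighbor of country, so each hard-coded fact works in both directions.

A second example, this time with arithmetic: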

int getANumber() {
  result = 0
  or
  result <= 100 and result = getANumber() + 1
}

select getANumber()
// outputs 0 through 100
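Recursion with a base case and a bounded recursive case can also express familiar computations. The sketch below defines a hypothetical factorial predicate; the upper bound of 10 is an arbitrary choice that keeps the set of results finite and within the int range.

```ql
int factorial(int n) {
  // base case
  n = 0 and result = 1
  or
  // recursive case, bounded so the predicate has finitely many results
  n in [1 .. 10] and result = n * factorial(n - 1)
}

select factorial(5) // outputs 120
```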

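Transitive closure

The + operator denotes the transitive closure: the predicate is applied one or more times. p.getAParent+() is equivalent to calling the following recursive member predicate on p:

Person getAnAncestor() {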
  result = this.getAParent()
  or
  result = this.getAParent().getAnAncestor()
}

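The * operator denotes the reflexive transitive closure: the predicate is applied zero or more times, so p.getAParent*() yields either an ancestor of p or p itself. The call is equivalent to the following predicate:

Person getAnAncestor2() {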
  result = this
  or
  result = this.getAParent().getAnAncestor2()
}
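The two closure operators can be tried out without a Person class. The following is a small self-contained sketch over strings; the names and the getAParent facts are made up for illustration.

```ql
// Hypothetical family facts: Alice's parent is Bob, Bob's parent is Carol.
string getAParent(string child) {
  child = "Alice" and result = "Bob"
  or
  child = "Bob" and result = "Carol"
}

from string ancestor
where ancestor = getAParent+("Alice") // one or more steps
select ancestor // outputs Bob and Carol
```

Replacing + with * would allow zero steps as well, so "Alice" would appear in the output too.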

Classes

A class in CodeQL does not create new objects; it only describes a set of values of a particular kind.

class OneTwoThree extends int {
  OneTwoThree() { // characteristic predicate
    this = 1 or this = 2 or this = 3
  }
 
  string getAString() { // member predicate
    result = "One, two or three: " + this.toString()
  }

  predicate isEven() { // member predicate
    this = 2 // 2 is the only even value in this class
  }
}

from OneTwoThree i 
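// the first disjunct selects i = 1; the second selects the value whose getAString() ends in 2
where i = 1 or i.getAString() = "One, two or three: 2"
select i
// outputs 1 and 2

The characteristic predicate is similar to a class constructor in C++, except that it further restricts the set of values the class represents. Take the characteristic predicate above: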

OneTwoThree() { // characteristic predicate
  this = 1 or this = 2 or this = 3
}
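// It narrows the set of values from all of int down to the range 1 to 3.

The this variable refers to a value belonging to the current class. Like result, this is a variable used to express relationships between values.

string getAString() { // member predicate
  result = "One, two or three: " + this.toString()
}
// This member predicate relates each value to the string "One, two or three: " followed by that value.
// Because the characteristic predicate OneTwoThree() limits the values to 1 through 3,
// the string "One, two or three: 4" is never a result.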
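The same pattern applies to classes that extend library types instead of int. As a sketch, the class below wraps the earlier system-call filter; SystemCall and getCommand are hypothetical names introduced here, while FunctionCall, getTarget, hasGlobalName, and getArgument come from the C/C++ standard library.

```ql
import cpp

// The set of all calls whose target is the global function system().
class SystemCall extends FunctionCall {
  SystemCall() { this.getTarget().hasGlobalName("system") } // characteristic predicate

  // Member predicate: the expression passed as the command string.
  Expr getCommand() { result = this.getArgument(0) }
}

from SystemCall call
select call, call.getCommand()
```

Putting the condition in the characteristic predicate lets every later query write from SystemCall call instead of repeating the name check.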

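Source and Sink

Security auditing theory uses a triple of concepts: source, sink, and sanitizer.

Source: the entry point of the taint chain. It can be request parameters (GET, POST, and so on), uploaded files, cookies, database records, or anything else the user controls directly or indirectly.
Sink: the point of the taint chain where the data is acted on. In C++, examples include functions such as system, scanf, and sprintf.
Sanitizer: a function that filters, encodes, or decodes data. Such functions alter the input and make exploitation less predictable.

A rough flow diagram: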

graph LR
start((source)) --> node1[node1]
node1 --> node2[node2]
node2 --> stop((Sink))
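The diagram above can be modelled directly in QL as a toy graph. This is only a sketch of the idea, with hard-coded string nodes and hypothetical predicate names (edge, isSanitizer, step); it does not use the CodeQL data flow library, which real path queries rely on.

```ql
// Edges of the diagram: source -> node1 -> node2 -> sink.
predicate edge(string a, string b) {
  a = "source" and b = "node1"
  or
  a = "node1" and b = "node2"
  or
  a = "node2" and b = "sink"
}

// No sanitizers in this toy model; change the body to n = "node2" to cut the path.
predicate isSanitizer(string n) { none() }

// A flow step is an edge that does not enter a sanitizer.
predicate step(string a, string b) { edge(a, b) and not isSanitizer(b) }

from string src, string snk
where src = "source" and snk = "sink" and step+(src, snk)
select src, snk, "tainted data can reach the sink"
```

With the sanitizer predicate empty, the transitive closure step+ connects source to sink and the query reports one row; marking any intermediate node as a sanitizer breaks the chain and the result set becomes empty.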

References

https://codeql.github.com/codeql-standard-libraries/cpp/
https://codeql.github.com/docs/ql-language-reference/about-the-ql-language/
https://lab.github.com/githubtraining/codeql-u-boot-challenge-(cc++)