Beware of "short" token patterns when detecting SQLi with lexical analysis

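Running user input through a lexer to make an is_sqlinjection decision is now fairly widespread, but relying on this method alone can produce serious false positives in some cases.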

Here are a few notes drawn from practical experience, kept mainly for my own reference.

==Definitions==

Token type definitions:

KEYWORD     = 'k'    UNION     = 'U'    GROUP          = 'B'
EXPRESSION  = 'E'    SQLTYPE   = 't'    FUNCTION       = 'f'
BAREWORD    = 'n'    NUMBER    = '1'    VARIABLE       = 'v'
STRING      = 's'    OPERATOR  = 'o'    LOGIC_OPERATOR = '&'
COMMENT     = 'c'    COLLATE   = 'A'    LEFTPARENS     = '('
RIGHTPARENS = ')'    LEFTBRACE = '{'    RIGHTBRACE     = '}'
DOT         = '.'    COMMA     = ','    COLON          = ':'
SEMICOLON   = ';'    TSQL      = 'T'    UNKNOWN        = '?'
EVIL        = 'X'    BACKSLASH = '\\'
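To make the one-character codes above concrete, here is a minimal sketch of how a sequence of token types collapses into a fingerprint string such as `sos` or `s&s`. The dictionary mirrors the table above; the `fingerprint` helper and its input format (a list of type names) are assumptions made for illustration, not the API of any particular detection library.

```python
# One-character code for each token type, copied from the table above.
TOKEN_TYPES = {
    "KEYWORD": "k", "UNION": "U", "GROUP": "B",
    "EXPRESSION": "E", "SQLTYPE": "t", "FUNCTION": "f",
    "BAREWORD": "n", "NUMBER": "1", "VARIABLE": "v",
    "STRING": "s", "OPERATOR": "o", "LOGIC_OPERATOR": "&",
    "COMMENT": "c", "COLLATE": "A", "LEFTPARENS": "(",
    "RIGHTPARENS": ")", "LEFTBRACE": "{", "RIGHTBRACE": "}",
    "DOT": ".", "COMMA": ",", "COLON": ":",
    "SEMICOLON": ";", "TSQL": "T", "UNKNOWN": "?",
    "EVIL": "X", "BACKSLASH": "\\",
}


def fingerprint(token_type_names):
    """Join the codes of a token-type sequence into one fingerprint string.

    Hypothetical helper: a real lexer would produce the token types itself.
    """
    return "".join(TOKEN_TYPES[name] for name in token_type_names)


# A string, an operator, and another string collapse to the rule "sos".
print(fingerprint(["STRING", "OPERATOR", "STRING"]))
```

Because every string, bareword, or comment is reduced to a single character, a lot of unrelated input ends up sharing the same fingerprint.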

Lexical rules that easily cause false positives:

  • sos: as well as comment and "like" in the Moments.
  • sc: c-sharp is NOT C-井, sharp means "#".
  • s&s: oil production declined by300,000 bpd, the first decline in several years," and "net petroleumimports were up 25 pct to 5.3 mln bp.
  • n&nof: StandardChartered and co-lead Mitsui Trust Finance (Hong Kong) Ltd will takeup some 60 pct of the loa.
  • skn&n: "When Pickens and Icahn get together we wantpeople to know about it," Proxmire sai.
  • son&n: He also said a proposal forestablishing formal procedures for international negotiations oncurrency exchange rates "is unrealistic and could well have damagingeffect."
  • 1oo(): {"password": "1234567890!@#$%^&*()", "username": "test-user"}

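In practice, "short" token patterns are noticeably more likely to produce false positives. The problem is worst when a rule starts or ends with a token such as 'c', 's', or 'n'. These token types are only generic descriptions: each one stands for an element that carries no real meaning as far as SQL grammar is concerned, so a rule built on them cannot effectively tell an attack apart from ordinary natural language.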
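The observation above can be turned into a rough check that marks a rule as risky when it is short and begins or ends with one of the generic token types. This is only a sketch of the observation, not the adjustment strategy mentioned in this post; the helper name `is_short_generic` and the length threshold of 5 are assumptions chosen for illustration.

```python
# Token types that describe generic content rather than SQL structure:
# 's' (string), 'c' (comment), 'n' (bareword).
GENERIC_TYPES = frozenset("scn")


def is_short_generic(fingerprint, max_len=5):
    """Return True if a fingerprint is short and starts or ends with a generic token.

    Hypothetical helper; max_len=5 is an assumed threshold, not a measured one.
    """
    if not fingerprint or len(fingerprint) > max_len:
        return False
    return fingerprint[0] in GENERIC_TYPES or fingerprint[-1] in GENERIC_TYPES


# Check the rules from the false-positive list above.
for rule in ["sos", "sc", "s&s", "n&nof", "skn&n", "son&n", "1oo()"]:
    print(rule, is_short_generic(rule))
```

Under these assumptions every rule in the list is flagged except `1oo()`, which neither starts nor ends with a generic token. A flag like this could be used to route a match to a secondary check instead of blocking on it directly.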

We have also worked out adjustments to address this weakness, which will be covered later.
