How does SQL query parameterisation work?

2018-06-05 06:23:11

I feel a little silly for asking this since I seem to be the only person in the world who doesn't get it, but here goes anyway. I'm going to use Python as an example. When I use raw SQL queries (I usually use ORMs) I use parameterisation, like this example using SQLite:

Method A:

username = "wayne"
query_params = (username,)  # trailing comma makes this a one-element tuple
cursor.execute("SELECT * FROM mytable WHERE user=?", query_params)

I know this works and I know this is the generally recommended way to do it. A SQL injection-vulnerable way to do the same thing would be something like this:

Method B:

username = "wayne"
cursor.execute("SELECT * FROM mytable WHERE user='%s'" % username)

As far as I can tell I understand SQL injection, as explained in this Wikipedia article. My question is simply: How is method A really different to method B? Why is the end result of method A not the same as method B? I assume that the cursor.execute() method (part of Python's DB-API specification) takes care of correctly escaping and type-checking the input, but this is never explicitly stated anywhere. Is that all that parameterisation in this context is? To me, when we say "parameterisation", all that means is "string substitution", like %-formatting. Is that incorrect?

A parameterized query doesn't actually do string replacement. If you use string substitution, then the SQL engine actually sees a query that looks like

SELECT * FROM mytable WHERE user='wayne'

If you use a ? parameter, then the SQL engine sees a query that looks like

SELECT * FROM mytable WHERE user=<some value>

Which means that before it even sees the string "wayne", it can fully parse the query and understand, generally, what the query does. It sticks "wayne" into its own representation of the query, not the SQL string that describes the query. Thus, SQL injection is impossible, since we've already passed the SQL stage of the process.

(The above is generalized, but it more or less conveys the idea.)
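
To make the difference concrete, here is a minimal sketch using Python's built-in sqlite3 module and an in-memory database. The table contents and the malicious input are made up for illustration; the two queries are Method A and Method B from the question.

```python
import sqlite3

# Throwaway in-memory database with two made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (user TEXT, secret TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [("wayne", "w-secret"), ("alice", "a-secret")],
)

# Hypothetical attacker-controlled input.
malicious = "nobody' OR '1'='1"

# Method B: the input becomes part of the SQL text, so the engine parses
#   SELECT * FROM mytable WHERE user='nobody' OR '1'='1'
rows_b = conn.execute(
    "SELECT * FROM mytable WHERE user='%s'" % malicious
).fetchall()

# Method A: the SQL text is parsed with a placeholder, and the input is
# bound afterwards as a plain value that is only ever compared to the column.
rows_a = conn.execute(
    "SELECT * FROM mytable WHERE user=?", (malicious,)
).fetchall()

print(len(rows_b))  # 2: the injected OR clause matched every row
print(len(rows_a))  # 0: no user is literally named "nobody' OR '1'='1"
```

With Method B the input changed the structure of the query; with Method A it could only ever be a value.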

Using parameterized queries is a good way to hand the task of escaping and preventing injections over to the DB client library. The library escapes your values before it substitutes them for the "?" placeholders. This is done in the client library, before the query reaches the DB server.

If you have MySQL running, turn on the SQL log and try a few parameterized queries. You will see that the MySQL server receives fully substituted queries with no "?" in them, and that the MySQL client library has already escaped any quotes in your "parameter" for you.

If you use method B with just string replacement, quote characters are not automatically escaped.
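
A small sketch of what an unescaped quote does to method B, even with no attacker involved. It assumes sqlite3 rather than MySQL (sqlite3 binds the value natively instead of escaping it in the client, but the visible outcome is the same), and the name "O'Brien" is just an illustrative value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (user TEXT)")
conn.execute("INSERT INTO mytable VALUES (?)", ("O'Brien",))

name = "O'Brien"  # a perfectly legitimate value that contains a quote

# Method B: the quote inside the value ends the SQL string literal early,
# leaving `Brien'` dangling, so the statement fails to parse.
try:
    conn.execute("SELECT * FROM mytable WHERE user='%s'" % name).fetchall()
    method_b_failed = False
except sqlite3.OperationalError:
    method_b_failed = True

# Method A: the value is passed separately, so the quote needs no escaping.
rows = conn.execute("SELECT * FROM mytable WHERE user=?", (name,)).fetchall()

print(method_b_failed)  # True
print(rows)             # [("O'Brien",)]
```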

In addition, with MySQL, you can prepare a parameterized query ahead of time, and then use the prepared statement repeatedly later. When you prepare a query, MySQL parses it and gives you back a prepared statement -- some parsed representation MySQL understands. Each time you use the prepared statement, not only are you guarded against injection, but you also avoid the cost of parsing the query again.
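
Python's DB-API has no explicit "prepare" call, so the closest everyday equivalent is to keep one parameterized SQL string and run it many times with different values. The sketch below assumes sqlite3 instead of MySQL; the table and values are invented. Whether the parsed statement is actually reused is up to the driver (the sqlite3 module keeps an internal statement cache for this), so treat the performance benefit as driver-dependent.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (user TEXT, visits INTEGER)")

# One SQL text, many parameter sets: the statement text never changes,
# only the bound values do.
insert_sql = "INSERT INTO mytable (user, visits) VALUES (?, ?)"
conn.executemany(insert_sql, [("wayne", 3), ("alice", 5), ("o'brien", 1)])

# The same idea for lookups: reuse the identical query string each time.
lookup_sql = "SELECT visits FROM mytable WHERE user=?"
visits = {
    name: conn.execute(lookup_sql, (name,)).fetchone()[0]
    for name in ("wayne", "alice", "o'brien")
}

print(visits)  # {'wayne': 3, 'alice': 5, "o'brien": 1}
```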

And, if you really want to be secure, you can modify your DB access/ORM layer so that 1) web server code can only use prepared statements, and 2) statements can only be prepared before your web server starts. Then, even if your web app is hacked into (say via a buffer overrun exploit), the hacker can still only use the prepared statements and nothing more. For this you need to jail your web app and only allow access to the database via your DB access/ORM layer.

When you do text replacement (like your method B), you have to be wary of quotes and such, because the server will get a single piece of text, and it has to determine where the value ends.

With parameterized statements, OTOH, the DB server gets the statement as is, without the parameter. The value is sent to the server as a different piece of data, using a simple binary safe protocol. Therefore, your program doesn't have to put quotes around the value, and of course it doesn't matter if there were already quotes in the value itself.

An analogy is about source and compiled code: in your method B, you're building the source code of a procedure, so you have to be sure to strictly follow the language syntax. With Method A, you first build and compile a procedure, then (immediately after, in your example), you call that procedure with your value as a parameter. And of course, in-memory values aren't subject to syntax limitations.

Umm... that wasn't really an analogy, it's really what is happening under the hood (roughly).
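
The source-versus-compiled idea can be sketched in plain Python, with no database involved. Here find_user is a hypothetical stand-in for the "procedure", and eval plays the role of a SQL engine parsing text that was assembled from a value.

```python
def find_user(name):
    # Hypothetical stand-in for the compiled "procedure".
    return "looking up " + name

# A value that happens to contain Python syntax.
name = "wayne') + str(6 * 7) + ('"

# Like method B: build source code around the value, then parse and run it.
# The text that gets evaluated is:
#   find_user('wayne') + str(6 * 7) + ('')
via_source = eval("find_user('%s')" % name)

# Like method A: the procedure already exists; the value is only an argument.
via_call = find_user(name)

print(via_source)  # looking up wayne42
print(via_call)    # looking up wayne') + str(6 * 7) + ('
```

In the first case the value was executed as code; in the second it stayed an in-memory string, which is the property parameterized queries give you.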
