Lets talk about web applications and logging. For most people, when talking about log analysis or log events, the Apache log format is the first that comes to mind as a good example of a log event. But is Apache log content enough to understand what is going on in your system? Let’s take a closer look.
Custom output of Apache access log
When sending an ordinary GET request to your web server, everything seems perfect. For the following request
GET /index.php?page_id=1 HTTP/1.1Host: www.unionselect1.eu
You will see a proper event description in the Apache access log:
423.619.551.990 — — [06/Jul/2018:14:31:52 +0300] "GET /index.php?page_id=1 HTTP/1.1" 200 795 "-" "-"
At first this looks quite good: you have the timestamp, IP address that the request originated from, URI that was visited and even the ID of the page that was requested. Or do you?
To understand if logging is enough you first have to know your application and its implmentation in your environment. To illustrate my example, I will use the popular LAMP setup and PHP parameter handling in the web application.
One of the common mistakes is to use the $_REQUEST associative array instead of GET or POST arrays to get the parameters sent to the page. I’m not going too deep about the differences and I’m not going to talk about CSRF and other harm this approach can bring to your application or customers. I’m only focusing on the logging part.
To illustrate my example, I’m creating a web page with the following content:
<?phpprintf("Current page ID is %d", intval($_REQUEST['page_id']) );?>
As a result of the last request, the output on the page would be: “Current page ID is 1”. So far so good, right?
Lets make another request. This time lets perform a POST request:
POST /index.php?page_id=1 HTTP/1.1Host: www.unionselect1.euContent-Length: 9Content-Type: application/octet-stream application/x-www-form-urlencoded
page_id=2
and check the respective entry in the log file. The event is as follows:
423.619.551.990 — — [06/Jul/2018:14:40:13 +0300] "POST /index.php?page_id=1 HTTP/1.1" 200 795 "-" "-"
However — the output on the page tells us “Current page ID is 2”. How come the page ID on the page differs from the one in the access log event? Seems that the POST parameter was used, but why? I’ll get to that, but first things first.
How could we get the correct information to the logs, i.e. how could we log the correct page_id used by the application? One approach would be to log the POST parameters as well. This would give us the full payload but this might give you additional overhead on logging (e.g. when logging file content that are POST-ed) or it could create problems with compliance when saving all the post request parameters to the log file (including usernames, password, credit card numbers etc). But it’s doable, e.g. with parameter filtering. Could that solve the issue?
Lets try another example:
POST /index.php?page_id=1 HTTP/1.1Host: www.unionselect1.euCookie: page_id=3Content-Length: 9Content-Type: application/octet-stream application/x-www-form-urlencoded
page_id=2
This time I added the cookie to the equation. And yes, you are right: the log file still shows
423.619.551.990 — — [06/Jul/2018:14:44:05 +0300] "POST /index.php?page_id=1 HTTP/1.1" 200 795 "-" "-"
but this time the output on the page shows “Current page ID is 3”. Seems that this time, the Cookie value was used by the application.
A solution could be to log all the cookie parameters as well, but the amount that you’d log would only increase and would it actually help you? If at some point you will go through your logs, you still have to understand, which parameters were used by the application.
To illustrate, here’s a little quiz: what would the following request show on the page and what would you find in the log event?
POST /index.php?page_id=1&page_id=4 HTTP/1.1Host: www.unionselect1.euContent-Length: 29Content-Type: application/octet-stream application/x-www-form-urlencoded
page_id=2&page_id=5
(Hint: the page shows 5)
But how about this one?
POST /index.php?page_id=1&page_id=4 HTTP/1.1Host: www.unionselect1.euCookie: page_id=3; page_id=6Content-Length: 19Content-Type: application/octet-stream application/x-www-form-urlencoded
page_id=2&page_id=5
(Hint: it’s not 6, it’s 3)
Generated at https://imgflip.com/memegenerator/Jackie-Chan-WTF
To answer Jackie: it has to do with three factors:
In other words: a combination of poor software development practices and application server configuration might lead to a situation where the information logged might not express the behavior of the application. The same code can work differently in different servers, depending on the configuration of PHP in the server by using different request parameters (depending on their order).
By secretlondon123 (originally posted to Flickr as way out) [CC BY-SA 2.0], via Wikimedia Commons