Regenerating a raw request payload — an impossible task?

by Kevin Lee, May 3rd, 2017

Background

I was creating a Facebook Messenger bot and needed to verify the signature of the raw payload received at my webhook. My webhook was powered by the HTTP trigger of Google Cloud Functions (beta), which automatically provisions an HTTPS endpoint to invoke the function without hassle.

However, I encountered some problems along the way that had to do with platform limitations, JSON encoding and vague documentation. I thought it’d be insightful to share how I approached the problems one at a time.

The Facebook docs

The HTTP request will contain an X-Hub-Signature header which contains the SHA1 signature of the request payload, using the app secret as the key, and prefixed with sha1=. Your callback endpoint can verify this signature to validate the integrity and origin of the payload.

Please note that the calculation is made on the escaped unicode version of the payload, with lower case hex digits. If you just calculate against the decoded bytes, you will end up with a different signature. For example, the string äöå should be escaped to \u00e4\u00f6\u00e5.

Source: Facebook docs as at 3 May 2017
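In other words, this is a standard HMAC-SHA1 check over the raw bytes. Had the raw body been available, the verification in Node.js would look roughly like the minimal sketch below; isValidSignature, rawBody and FB_APP_SECRET are my own names, not anything Facebook or Google provides.

const crypto = require('crypto');

// Minimal sketch: `rawBody` must be the exact bytes Facebook sent,
// and FB_APP_SECRET is assumed to hold the app secret.
function isValidSignature(rawBody, signatureHeader) {
  const expected = 'sha1=' + crypto
    .createHmac('sha1', process.env.FB_APP_SECRET)
    .update(rawBody)
    .digest('hex');
  if (!signatureHeader || signatureHeader.length !== expected.length) {
    return false;
  }
  // Constant-time comparison avoids leaking timing information
  return crypto.timingSafeEqual(Buffer.from(expected), Buffer.from(signatureHeader));
}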

The main problem

If my application had access to the raw payload, I could sign it myself, compare the result with the X-Hub-Signature header, and the two should match perfectly.

But I didn’t have access to the raw payload! I only had access to an already-parsed JSON payload.

Why?

Update: Google has fixed the issue by making the raw body accessible.

As of 3 May 2017, Google Cloud Functions (beta) parses the request body automatically based on the Content-Type header and stores the result under req.body. It does not expose the raw buffer to the developer, and it offers no way to customize the body parser used under the hood.

Upon further research, I found that this problem was echoed by someone else as well.
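For comparison, on a server where you control the middleware stack, the usual way around this is body-parser's verify hook, which hands you the raw bytes before parsing; at the time, Cloud Functions offered no equivalent, which is the whole point. A sketch with Express:

const express = require('express');
const bodyParser = require('body-parser');

const app = express();

// body-parser's `verify` callback receives the raw request bytes
// before parsing, so they can be stashed for signature verification.
app.use(bodyParser.json({
  verify: (req, res, buf) => {
    req.rawBody = buf;
  }
}));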

A possible solution

What if we could take the already-parsed request body and work backwards to “regenerate” the raw payload? We could then verify the signature of the raw payload.

Attempt 1

I took req.body and called JSON.stringify on it to produce the JSON string. Thereafter, I signed the string and compared the signatures.



function regenerateRawPayload(req) {
  return JSON.stringify(req.body);
}

Result of Attempt 1

It worked perfectly for a simple “hello”, until I tried sending weird characters to the bot such as “äöå”.
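In hindsight the failure makes sense: JSON.stringify leaves non-ASCII characters as literal characters, whereas Facebook signs the \uXXXX-escaped form, so the two byte sequences differ. Roughly, with an illustrative payload of my own:

// What Attempt 1 produces:
JSON.stringify({ text: 'äöå' });   // '{"text":"äöå"}'
// What Facebook actually signed, per its docs:
// '{"text":"\u00e4\u00f6\u00e5"}'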

Attempt 2

I tried to produce an “escaped unicode version” of the JSON payload that satisfies the requirement below.

Please note that the calculation is made on the escaped unicode version of the payload, with lower case hex digits…

For example, the string äöå should be escaped to \u00e4\u00f6\u00e5.

Result of Attempt 2

I found the jsesc library, which provides a robust replacement for JSON stringification that escapes characters "where needed". Regenerating the raw payloads for "hello" and "äöå" and then signing them produced the correct signatures.

In particular, the Facebook docs mention "lower case hex digits". The jsesc library caters for that requirement with its lowercaseHex option.






function regenerateRawPayload(req) {
  return jsesc(req.body, { lowercaseHex: true, json: true });
}
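A quick illustration of what jsesc's JSON mode should produce for the earlier example, based on its documented options (the payload shape is again my own illustration):

const jsesc = require('jsesc');

jsesc({ text: 'äöå' }, { json: true, lowercaseHex: true });
// expected output: '{"text":"\u00e4\u00f6\u00e5"}' (non-ASCII escaped, lowercase hex)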

However, I soon realized that the requirement to produce an “escaped unicode version” was too vague. Exactly which unicode characters need to be escaped? ASCII characters are also unicode characters — do they have to be escaped too? But that can’t be the case, because I tried a simple “hello” without escaping in Attempt 1 and that worked… Do they mean to unicode-escape all non-ASCII characters instead?

Given that the docs did not explain how Facebook creates the escaped unicode version of the payload, the only way to find out was to send a variety of different characters to the bot and test whether escaping them produced the correct signature.

Attempt 3

I sent in a random sample of funny characters to the bot.

äöå/\/hi你好こんにちは디제이맥йн£ ÃƒÆ’Ã¢â‚¬Å¡Ãƒâ€šÃ‚Â£ШđђбÉ╡tü³géŰáúæø"'‘’“”&`~!@#$%^&*()-_+=\|{[;:?<>]},23:59\z\b\t/‹

Result of Attempt 3

It was a frustrating experience.

The signature did not match when I sent in the entire string, so I sent in substrings one at a time to narrow down the offenders.

Ironically, the culprits were ASCII characters rather than the odd characters.

The first offending character I found was a forward slash.

I thought the solution was to unicode-escape the forward slash. However, it didn’t work.

After many guesses, I found that Facebook had preceded it with a single backslash, as in ‘\/’. Escaping the forward slash does not seem to be required by the JSON spec, but it isn’t wrong either.

I had some trouble adding a single backslash in front of the forward slash in my JavaScript code, because the backslash is an escape character in JS. I had to “escape the escape” by using two backslashes to represent one backslash literal. Furthermore, I had to escape the forward slash inside the regexp.
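A small node REPL illustration of the escaping involved (the strings here are just for demonstration):

// In JS source, '\\/' is a two-character string: a backslash followed by a slash.
// Inside the regex literal, the forward slash itself must be escaped as \/.
'a/b'.replace(/\//g, '\\/');   // result contains a\/b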

This was the revised method.





function regenerateRawPayload(req) {
  return jsesc(req.body, { lowercaseHex: true, json: true })
    .replace(/\//g, '\\/');
}

I moved on to test other characters and detected three more culprits: ‘@’, ‘%’ and ‘<’. Once again, through trial and error, I realized that they had to be unicode-escaped as well.

But there was one more bump on the road.

Sometimes, the docs can be wrong.

Please note that the calculation is made on the escaped unicode version of the payload, with lower case hex digits.

For the ‘<’ character, I tried escaping it to ‘\u003c’ and ‘\<’, but neither worked. What worked was actually ‘\u003C’. The docs said that all hex digits had to be lower case, but this was clearly a violation of that.

This was the final method.









function regenerateRawPayload(req) {
  return jsesc(req.body, { lowercaseHex: true, json: true })
    .replace(/\//g, '\\/')
    .replace(/@/g, '\\u0040')
    .replace(/%/g, '\\u0025')
    .replace(/</g, '\\u003C');
}
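To tie it together, the webhook handler can then regenerate the payload and reuse a check like the isValidSignature sketch from earlier (hypothetical wiring; the handler and helper names are mine):

exports.webhook = (req, res) => {
  const raw = regenerateRawPayload(req);
  const signature = req.headers['x-hub-signature'] || '';

  if (!isValidSignature(Buffer.from(raw, 'utf8'), signature)) {
    return res.status(403).send('Invalid signature');
  }

  // ... handle the Messenger event as usual ...
  res.sendStatus(200);
};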

Limitations

So, is regenerating the raw payload from an already-parsed JSON payload still an impossible task?

Yes. Even though we managed to create a proof of concept that works on a few samples, it remains a fool’s errand. As long as Facebook does not tell us exactly how it produces the “escaped unicode version” of the payload, we cannot be confident of regenerating it perfectly, and we do not want to reject legitimate requests just because our regeneration is wrong.

Let’s see a simple example of why it is fundamentally impossible: parsing discards information that cannot be recovered.





// Calling JSON.parse and then JSON.stringify
// on both payloads produces the same output,
// but the original payloads clearly differ.
var payload1 = '{"foo":"bar"}';
var payload2 = '{"foo": "bar"}';
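Concretely, both payloads parse to the same object, so any re-stringification collapses them (a quick REPL check):

JSON.stringify(JSON.parse(payload1));   // '{"foo":"bar"}'
JSON.stringify(JSON.parse(payload2));   // '{"foo":"bar"}' (the whitespace is gone)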

Workaround

Since we cannot verify the signature, what else can we do to ensure that the request came from Facebook?

If we use a function name with enough entropy, i.e. a name made of enough random (but legal) characters, the full HTTPS endpoint containing that name as the resource path should not be guessable by other parties. There is no reason to share the endpoint URL publicly either.

When using such an endpoint, it is reasonable to believe that all requests come from Facebook only. In fact, this is the approach taken by Telegram’s setWebhook!
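If you go this route, the only requirement is that the name carries enough entropy to be unguessable, for example (a sketch; the exact naming scheme is up to you and the platform’s naming rules):

const crypto = require('crypto');

// 16 random bytes give about 128 bits of entropy, hex-encoded for a legal name
const token = crypto.randomBytes(16).toString('hex');
console.log(`webhook-${token}`);   // deploy the function under this name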

Hopefully, Google will soon make the raw payload data available in its functions, or Facebook will document how it creates the escaped payload.

I hope you learnt something from my experience. Till next time!