Today, weāre going to be wrapping up this series on file uploads for the web. If youāve been following along, you should now be familiar withĀ
Weāve covered architectural decisions toĀ
So, I thought we would wrap up the series today by covering security as it relates to file uploads.
In case youād like to go back and revisit any earlier blogs in the series, hereās a list of what weāve covered so far:
Upload files with HTML Upload files with JavaScript Receive uploads in Node.js (Nuxt.js) Optimize storage costs with Object Storage Optimize performance with a CDN Upload security & malware protection
https://www.youtube.com/watch?v=FARa5NHZh90&embedable=true
Introduction
Anytime I discuss the topic of security, I like to consult the experts atĀ
Today, weāll walk through this cheat sheet and how to implement some of their recommendations into an existing application.
For a bit of background, the application has a frontend with a form that has a single file input that uploads that file to a backend.
The backend is powered byĀ event
ā object, detects whether itās aĀ multipart/form-data
Ā request (always true for file uploads), and passes the underlyingĀ IncomingMessage
) to this custom function calledĀ parseMultipartNodeRequest
.
import formidable from 'formidable';
/* global defineEventHandler, getRequestHeaders, readBody */
/**
* @see https://nuxt.com/docs/guide/concepts/server-engine
* @see https://github.com/unjs/h3
*/
export default defineEventHandler(async (event) => {
let body;
const headers = getRequestHeaders(event);
if (headers['content-type']?.includes('multipart/form-data')) {
body = await parseMultipartNodeRequest(event.node.req);
} else {
body = await readBody(event);
}
console.log(body);
return { ok: true };
});
All the code weāll be focusing on today will live within thisĀ parseMultipartNodeRequest
Ā function. And since it works with the Node.js primitives, everything we do should work in any Node environment, regardless of whether youāre using Nuxt or Next or any other sort of framework or library.
InsideĀ parseMultipartNodeRequest
Ā we:
-
Create a new Promise
-
Instantiate aĀ
multipart/form-data
Ā parser using a library calledĀformidable -
Parse the incoming Node request object
-
The parser writes files to their storage location
-
The parser provides information about the fields and the files in the request
Once itās done parsing, we resolveĀ parseMultipartNodeRequest
ās Promise with the fields and files.
/**
* @param {import('http').IncomingMessage} req
*/
function parseMultipartNodeRequest(req) {
return new Promise((resolve, reject) => {
const form = formidable({
multiples: true,
});
form.parse(req, (error, fields, files) => {
if (error) {
reject(error);
return;
}
resolve({ ...fields, ...files });
});
});
}
Thatās what weāre starting with today, but if you want a better understanding of the low-level concepts for handlingĀ multipart/form-data
Ā requests in Node, check out, ā
It covers low-level topics like chunks, streams, and buffers, then shows how to use a library instead of writing one from scratch.
Securing Uploads
With our app set up and running, we can start to implement some of the recommendations from OWASPās cheat sheet.
Extension Validation
With this technique, we check the uploading file name extensions and only allow files with the allowed extension types into our system.
Fortunately, this is pretty easy to implement with formidable. When we initialize the library, we can pass aĀ filter
Ā configuration option which should be a function that has access to aĀ file
Ā object parameter that provides some details about the file, including the original file name.
The function must return a boolean that tells formidable whether to allow writing it to the storage location or not.
const form = formidable({
// other config options
filter(file) {
// filter logic here
}
});
We could checkĀ file.originalFileName
Ā against a regular expression that tests whether a string ends with one of the allowed file extensions.
For any upload that doesnāt pass the test, we can returnĀ false
Ā to tell formidable to skip that file and for everything else, we can returnĀ true
Ā to tell formidable to write the file to the system.
const form = formidable({
// other config options
filter(file) {
const originalFilename = file.originalFilename ?? '';
// Enforce file ends with allowed extension
const allowedExtensions = /\.(jpe?g|png|gif|avif|webp|svg|txt)$/i;
if (!allowedExtensions.test(originalFilename)) {
return false;
}
return true;
}
});
Filename Sanitization
Filename sanitization is a good way to protect against file names that may be too long or include characters that are not acceptable for the operating system.
The recommendation is to generate a new filename for any upload. Some options may be a random string generator, a UUID, or some sort of hash.
Once again, formidable makes this easy for us by providing aĀ filename
Ā configuration option.
And once again, it should be a function that provides details about the file, but this time, it expects a string.
const form = formidable({
// other config options
filename(file) {
// return some random string
},
});
We can actually skip this step because formidableās default behavior is to generate a random hash for every upload. So, weāre already following best practices just by using the default settings.
Upload and Download Limits
Next, weāll tackle upload limits. This protects our application from running out of storage, limits how much we pay for storage, and limits how much data could be transferred if those files get downloaded, which may also affect how much we have to pay.
Once again, we get some basic protection just by using formidable because it sets a default value of 200 megabytes as the maximum file upload size.
If we want, we could override that value with a customĀ maxFileSize
Ā configuration option. For example, we could set it to 10 megabytes like this:
const form = formidable({
// other config options
maxFileSize: 1024 * 1024 * 10,
});
The right value to choose is highly subjective based on your application needs. For example, an application that accepts high-definition video files will need a much higher limit than one that expects only PDFs.
Youāll want to choose the lowest conservative value without being so low that it hinders normal users.
File Storage Location
Itās important to be intentional about where uploaded files get stored. The top recommendation is to store uploaded files in a completely different location than where your application server is running.
That way, if malware does get into the system, it will still be quarantined without access to the running application. This can prevent access to sensitive user information, environment variables, and more.
In one of my previous posts, ā
But if storing files on a different host isnāt an option, the next best thing we can do is make sure that uploaded files do not end up in the root folder on the application server.
Again, formidable handles this by default. It stores any uploaded files in the operating systemās temp folder. Thatās good for security, but if you want to access those files later on, the temp folder is probably not the best place to store them.
Fortunately, thereās another formidable configuration setting calledĀ uploadDir
Ā to explicitly set the upload location. It can be either a relative path or an absolute path.
So, for example, I may want to store files in a folder called ā/uploadsā inside my project folder. This folder must already exist, and if I want to use a relative path, it must be relative to the application runtime (usually the project root). That being the case, I can set the config option like this:
const form = formidable({
// other config options
uploadDir: './uploads',
});
Content-Type Validation
Content-Type validation is important to ensure that the uploaded files match a given list of allowedĀ
Itās similar to extension validation, but itās important to also check a fileās MIME-type because itās easy for an attacker to simply rename a file to include a file extension thatās in our allowed list.
Looking back at formidableās filter function, weāll see that it also provides us with the fileās MIME-type. So we could add some logic that enforces the file MIME-type matches our allow list.
We could modify our old function to also filter out any upload that is not an image.
const form = formidable({
// other config options
filter(file) {
const originalFilename = file.originalFilename ?? '';
// Enforce file ends with allowed extension
const allowedExtensions = /\.(jpe?g|png|gif|avif|webp|svg|txt)$/i;
if (!allowedExtensions.test(originalFilename)) {
return false;
}
const mimetype = file.mimetype ?? '';
// Enforce file uses allowed mimetype
return Boolean(mimetype && (mimetype.includes('image')));
}
});
Now, this would be great in theory, but the reality is thatĀ
That makes it no more useful than our extension validation. Itās unfortunate, but it also makes sense and is likely to remain the case.
Formidableās filter function is designed to prevent files from being written to disk. It runs as itās parsing uploads. But the only reliable way to know a fileās MIME-type is by checking the fileās contents. And you can only do that after the file has already been written to the disk.
So we technically havenāt solved this issue yet, but checking file contents actually brings us to the next issue, file content validation.
Intermission
Before we get into that, letās check the current functionality. I can upload several files, including three JPEGs and one text file (note that one of the JPEGs is quite large).
When I upload this list of files, Iāll get a failed request with a status code of 500. The server console reports the error is because the maximum allowed file size was exceeded.
This is good.
Weāve prevented a file from being uploaded into our system that exceeds the maximum file limit size (we should probably do a better job of handling errors on the backend, but thatās a job for another day).
Now, what happens when we upload all those files except the big one?
No error.
And looking in the āuploadsā folder, weāll see that despite uploading three files, only two were saved. TheĀ .txt
Ā file did not get past our file extension filter.
Weāll also notice that the names of the two saved files are random hash values. Once again, thatās thanks to formidable default behavior.
Now thereās just one problem. One of those two successful uploads came from the ābad-dog.jpegā file I selected. That file was actually a copy of the ābad-dog.txtā that I renamed. And THAT file actually contains malware š±š±š±
We can prove it by running one of the most popular Linux antivirus tools on the uploads folder,Ā
(Side note: The file I used was created for testing malware software. So itās harmless, but itās designed to trigger malware scanners. But that meant I had to get around browser warnings, virus scanner warnings, firewall blockers, AND angry emails from our IT department just to get a copy.
So you better learn something.)
OK,Ā nowĀ letās talk about file content validation.
File Content Validation
File content validation is a fancy way of saying, āScan the file for malwareā, and itās one of the more important security steps you can take when accepting file uploads.
We used ClamScan above, so now you might be thinking, āAha, why donāt I just scan the files as formidable parses them?ā
Similar to MIME-type checking, malware scanning can only happen after the file has already been written to disc. Additionally, scanning file contents can take a long time. Far longer than is appropriate in a request-response cycle. You wouldnāt want to keep the user waiting that long.
So we have two potential problems:
- By the time we can start scanning a file for malware, itās already on our server.
- We canāt wait for scans to finish before responding to usersā upload requests.
Bummerā¦
Malware Scanning Architecture
Running a malware scan on every single upload request is probably not an option, but there are solutions. Remember that the goal is to protect our application from malicious uploads as well as to protect our users from malicious downloads.
Instead of scanning uploads during the request-response cycle, we could accept all uploaded files, store them in a safe location, and add a record in a database containing the fileās metadata, storage location, and a flag to track whether the file has been scanned.
Next, we could schedule a background process that locates and scans all the records in the database for unscanned files. If it finds any malware, it could remove it, quarantine it, and/or notify us. For all the clean files, it can update their respective database records to mark them as scanned.
Lastly, there are considerations to make for the front end. Weāll likely want to show any previously uploaded files, but we have to be careful about providing access to potentially dangerous ones.
Here are a couple of different options:
- After an upload, only show the file information to the user that uploaded it, letting them know that it wonāt be available to others until after itās been scanned. You may even email them when itās complete.
- After an upload, show the file to every user, but do not provide a way to download the file until after it has been scanned. Include some messaging to tell users the file is pending a scan, but they can still see the fileās metadata.
Which option is right for you really depends on your application use case. And of course, these examples assume your application already has a database and the ability to schedule background tasks.
Itās also worth mentioning here that one of the OWASP recommendations was to limit file upload capabilities to authenticated users. This makes it easier to track and prevent abuse.
Unfortunately, databases, user accounts, and background tasks all require more time than I have to cover in todayās article, but I hope these concepts give you more ideas on how you can improve your upload security methods.
Block Malware at the Edge
Before we finish up today, thereās one more thing that I want to mention. If youāre anĀ
I got to play around with it briefly and want to show it off because itās super cool.
I have an application up and running at uploader.austingil.com. Itās already integrated withĀ
I configured the Malware Protection policy to just deny any request containing malware or a content type mismatch.
Now, if I go to my application and try to upload a file that has known malware, Iāll see almost immediately the response is rejected with a 403 status code.
To be clear, thatās logic I didnāt actually write into my application. Thatās happening thanks to Akamaiās malware protection, and I really like this product for a number of reasons.
- Itās convenient and easy to set up and modify from within the Akamai UI.
- I love that I donāt have to modify my application to integrate the product.
- It does its job well, and I donāt have to manage maintenance on it.
- Last, but not least, the files are scanned on Akamaiās edge servers, which means itās not only faster, but it also keeps blocked malware from ever even reaching my servers. This is probably my favorite feature.
Due to time and resource restrictions, I think Malware Protection can only scan files up to a certain size, so it wonāt work for everything, but itās a great addition for blocking some files from even getting into your system.
Closing Thoughts
Itās important to remember that there is no one-and-done solution when it comes to security. Each of the steps we covered has its own pros and cons, and itās generally a good idea to add multiple layers of security to your application.
Okay, thatās going to wrap up this series on file uploads for the web. If you havenāt yet, consider reading some of the other articles.
Upload files with HTML Upload files with JavaScript Receive uploads in Node.js (Nuxt.js) Optimize storage costs with Object Storage Optimize performance with a CDN Upload security & malware protection
Please let me know if you found this useful, or if you have ideas on other series youād like me to cover. Iād love to hear from you.
Thank you so much for reading. If you liked this article, and want to support me, the best ways to do so are toĀ
Also published here.