When you have been working as a software engineer for a while, you learn which areas of the craft are swampy.
You know exactly how to handle them, you know all the problems that may come up, and you approach such areas with special attention, laughing out loud when someone thinks they can do it faster and better without knowing what's ahead of them.
This is the story of one such area, one I've had a lot of experience with during my career.
Around 2007, YouTube was already two years old, yet still almost no one knew about it. People were recording videos in SD and storing them on CDs or DVDs.
Back in those days no one imagined you'd upload every video to the Internet, because downloading them was very expensive.
Well, except torrents maybe. But there were already apps to record your screen and add captions, so it was interesting to see if you could upload video tutorials online.
With that in mind, I bought a CentOS server and started to assemble an online video tutorial site.
For it to function I had to figure out two things: how to enable people to upload tutorials and how to enable people to watch them.
Uploading a video was quite easy, I thought. You just add an input field in HTML and receive the file on the server.
But it turned out to be harder than predicted. The first problem was the size of the video, and the second was the quality of the internet connection.
My server ran PHP, the only language that offered everything you needed (and it still does), which was great. It ran on Apache at first, and later on Nginx.
The thing with the early versions of Nginx was that they didn't expect such payloads. I had to configure several settings so the server would accept 30-60 minutes of constant upload and the resulting transfer size. It took some experimentation, but eventually it worked.
Another problem was the way Nginx forked processes. The early versions of Nginx had some bugs that caused forks to hang unexpectedly.
A fork that hung for one user was later assigned to other users, meaning they couldn't upload videos either.
I had to store information about who was using which process and kill those processes when the time came. Moreover, I had to kill all processes every night, because the bugs accumulated during the day.
After long hours spent on Nginx and PHP config, eventually uploading started to work.
The next step was to display videos to the users. At that time there was one online player available. I don't remember the name, but it was good and offered subtitles. It was important because I wanted everyone to be able to learn from the video tutorials.
But another problem came up. Playing a raw video was very hard. Even on a fairly fast connection it glitched a lot, and sometimes it even crashed the browser.
There was already ffmpeg, so I figured out how to optimize the video. The important aspect of video tutorials is that you want to show the whole screen in high quality, because otherwise people won't see what you click or what text is displayed on the monitor.
Compared to the live recordings of that time, video tutorials had higher quality requirements (that's why video tutorials weren't YouTube's strong suit for many, many years).
It took a lot of experimentation with ffmpeg to strike a balance between high quality and small video size. Of course the conversion couldn't be done instantly, so there was a cron job that queried the MySQL database and picked up videos that hadn't been converted properly yet.
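Here is a rough sketch of what such a polling job could look like. It is written in Python with an in-memory SQLite database standing in for MySQL so that it runs on its own. The `videos` table, its columns, and the status names are hypothetical, made up for illustration rather than taken from the original system.

```python
import sqlite3


def create_schema(conn):
    # Hypothetical schema: one row per uploaded video, with a processing status.
    conn.execute(
        """CREATE TABLE videos (
               id INTEGER PRIMARY KEY,
               source_path TEXT NOT NULL,
               status TEXT NOT NULL DEFAULT 'uploaded'
           )"""
    )


def claim_pending(conn, limit=5):
    """Return up to `limit` unconverted videos and mark them as 'converting'."""
    rows = conn.execute(
        "SELECT id, source_path FROM videos "
        "WHERE status = 'uploaded' ORDER BY id LIMIT ?",
        (limit,),
    ).fetchall()
    for video_id, _path in rows:
        # Mark the row so the next cron run does not pick it up again.
        conn.execute(
            "UPDATE videos SET status = 'converting' WHERE id = ?", (video_id,)
        )
    conn.commit()
    return rows
```

Each cron run would call `claim_pending` and hand the returned paths to the converter. Marking rows as 'converting' before the work starts is one simple way to keep two runs from converting the same video twice.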
It was also important to check more than just whether ffmpeg had produced a video. While ffmpeg was great, sometimes it didn't like the video or audio compression format, or even the size of the input video. As a result it produced an output video, but the video was black.
So I added another cron job. Its task was to run ffmpeg again on the output video, read its size and resolution, and grab one frame that I also needed for a thumbnail. Next, the process checked the pixels of the thumbnail to see if it was black. If there were no errors and the thumbnail wasn't black, the conversion was assumed to be successful.
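The black-thumbnail check can be sketched in a few lines of Python. This assumes the frame has already been decoded into a sequence of (R, G, B) tuples with values from 0 to 255. The function name and both thresholds are illustrative guesses, not the values the original job used.

```python
def is_black_frame(pixels, dark_threshold=16, dark_ratio=0.98):
    """Return True when nearly every pixel is darker than `dark_threshold`.

    `pixels` is an iterable of (r, g, b) tuples with values in 0..255.
    """
    pixels = list(pixels)
    if not pixels:
        # An empty frame is treated the same as a black one: conversion failed.
        return True
    # A pixel counts as dark when its brightest channel is below the threshold.
    dark = sum(1 for r, g, b in pixels if max(r, g, b) < dark_threshold)
    return dark / len(pixels) >= dark_ratio
```

Using a ratio instead of requiring every pixel to be exactly zero leaves room for compression noise, which rarely produces a perfectly black frame.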
I thought that would be enough, but it turned out it wasn't, because sometimes ffmpeg or another library just hung. That meant the conversion or validation process hung too and couldn't record that something had gone wrong.
So I added another cron process. This third process checked on the previous two. It looked up in the database when a conversion process had started and when it had last reported being alive, and compared those times to the current time.
So, for example, if a conversion didn't respond for three hours, I marked it as failed and queued it for a re-run. If a validation process didn't respond after 30 minutes of work, the validation was marked as failed in the database, so the video had to be converted from scratch and the processes had to be killed.
Usually that helped. If it still didn't work on the third attempt, the system sent an email saying something was wrong.
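The watchdog's decision logic is small enough to sketch in Python. The timeouts and the three-attempt limit come from the story above. The function name, the `kind` labels, and the returned action strings are hypothetical, and killing processes or sending the email is left to the caller.

```python
from datetime import datetime, timedelta

# Timeouts from the story: three hours for conversion, 30 minutes for validation.
TIMEOUTS = {
    "conversion": timedelta(hours=3),
    "validation": timedelta(minutes=30),
}
MAX_ATTEMPTS = 3


def watchdog_action(kind, last_heartbeat, now, failed_attempts):
    """Decide what to do with a job based on its last sign of life.

    Returns "ok" while the job is still within its timeout, "retry" when it
    should be killed and queued again, and "alert" when the failure being
    recorded now is the third one and a human should be emailed.
    """
    if now - last_heartbeat < TIMEOUTS[kind]:
        return "ok"
    if failed_attempts + 1 >= MAX_ATTEMPTS:
        return "alert"
    return "retry"
```

Keeping the decision a pure function of timestamps and counters makes it easy to test without spawning any real processes.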
Eventually the system worked and people were able to upload videos. Videos were converted and displayed on the page with a thumbnail and subtitles.
It was an interesting project that I spent over 400 hours on. It taught me a lot about video and audio processing and validation.
But it also taught me that handling uploads is a swampy area that I have to focus on. It paid off…
You might think that now, in 2023, the situation is different. But it's not. A lot of things have changed, for sure. It is easier to get processing power and storage. Internet connections are better. Tools are better and have more detailed documentation.
But still, with the default settings some of the uploads will fail because of the size. A default upload form won't work on Android devices. I see this problem over and over again in a lot of web apps.
Handling multiple video formats, and even photo formats, is difficult, especially now that there are new experimental formats.
For example some years ago I was developing a content management and planning system for social media.
It had to offer marketers the ability to edit an uploaded photo. Just basic stuff like cropping and filters. Even with the latest libraries it was not 100% reliable and required a lot of tweaking to make it usable to an acceptable extent.
In many situations uploading a file is an important step of a process, for example when you fill out a long form and have to upload a file in the last step.
If the upload fails for any reason, you have to protect the user from losing all the data they have already entered.
You have to handle the scenario where the upload fails, or where any subsequent processing you apply to the file fails.
One way of solving some of these problems is to restrict the size and format of what users can upload. It sounds tempting.
I recently worked on an ecommerce project where people could create their own products by uploading photos. The app, however, limited images to 2MB in size and a resolution of 1000x1000px.
There was a proposal to increase these limits to 3MB and 2000x2000px. I took out my phone, took a photo and showed people what the size of the photo was.
It was 4MB and around 3000x3000px. So another proposal was made to account for that. So I increased the resolution my phone takes photos at and told the team we had to remove these limits altogether.
There was also a proposal to keep the limits because people can convert photos and shrink them before uploading.
That's true. But the majority of people don't know how to do it, don't want to think about it, and may find it difficult. I was able to convince the client and the team to lift the limits altogether.
It was a great move: sales increased drastically and the exit rate dropped too, proving that people wanted to buy products on their phones but were blocked by the requirements. Lifting the limits was one easy solution that enabled a lot of people to buy the products they wanted.
Uploading files is a swampy area, but in some cases it may be critical for sales, and customer satisfaction (again sales).
That's why I pay special attention to the parts of the systems I develop that handle file uploads: they are the place that can impact the customer's business the most.
Do you have any file upload related stories? Share them in the comments!
Also click the heart icon, like, and share on social media. It motivates me to write more about such topics!
If you don't want to think about it too much as a software engineer, check out Filestack too. It is an API for handling JavaScript file uploads. They sponsor a contest I'm participating in with this story. Filestack and HackerNoon, thanks for the motivation to share! It was great to do it!