OrangeFox's DDoS and my response
The DDoS attack started at 01:21:43 (UTC).
At 04:25 I woke up and found that the OrangeFox downloads website (https://orangefox.download/, served by a project named dsite) was down.
The first thing I did was log into the server responsible for it. I found that the web server container had crashed, so I re-created the container, which fixed the issue immediately. I had no further downtime notifications.
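For reference, re-creating a crashed container comes down to something like this. It's only a sketch: the dsite-web name, the image, and the port are placeholders, not our actual configuration.

    # remove the crashed web server container
    podman rm -f dsite-web
    # start a fresh one from the same image
    podman run -d --name dsite-web -p 8080:8080 registry.example.com/orangefox/dsite:latest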
At about 3 PM, I started analyzing what had happened, so let's go through the process I followed.
Step 0 - E-Mail
I got a lot of E-Mails from Cloudflare saying the site was under a DDoS attack.

The first one, as you can see, is from 01:22 (UTC), the second is from the same time, and the third is from 04:57.
This is important since I restarted the web server at 04:25.
I also got some E-Mails from Netdata saying the server was overloaded.

Step 1 - Cloudflare
Next, I checked Cloudflare's metrics:


This is obviously a DDoS attack.
The firewall events show something like this:

And the traffic page shows this:

As you can see, the attack came from a lot of countries, though mostly from Turkey.
Step 2 - Logs
The next thing I did was check the access logs; at OrangeFox we keep a small amount of the most recent access logs. I did that to look for a common pattern in the requests. The previous DDoS attack used the same User-Agent everywhere and was pretty easy to filter, so I expected something similar here. Instead, I found that every request had its own natural-looking User-Agent and there were no similarities between them, except for one thing: the URL. Every single request, from every single country, was made to "https://orangefox.download/ru-RU".
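To give an idea of the kind of filtering involved, here is a sketch assuming a standard combined access-log format and a placeholder access.log path, which is not necessarily how our logs are stored:

    # top requested URLs: during the attack nearly everything points at /ru-RU
    awk '{print $7}' access.log | sort | uniq -c | sort -rn | head
    # how many distinct User-Agents hit that URL (the UA is the 6th quote-delimited field)
    grep ' /ru-RU ' access.log | awk -F'"' '{print $6}' | sort -u | wc -l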
dsite (the project that serves the orangefox.download site) has a peculiarity: it parses the browser's language from its settings and adds the language part to the URL.
I'm sure that whoever copied the link into the botnet had Russian set as their browser language.
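If you want to poke at this behaviour yourself, a quick check from the command line looks roughly like this. Take it as a sketch: what you actually see depends on whether dsite does the language routing on the server or in the browser.

    # request the front page while advertising Russian as the preferred language
    curl -sI -H 'Accept-Language: ru-RU' https://orangefox.download/ | head -n 1
    # the language-specific URL that every attack request targeted
    curl -sI https://orangefox.download/ru-RU | head -n 1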
Step 3 - Cloudflare again
Here I made a few rules to filter this exact DDoS pattern. I won't show them, for obvious reasons, but I did make them.
Step 4 - Analysis of the Technical side of the issue
The next thing that confused me was why the web server would crash in the first place. I turned to the different sources of logs.
The first thing that comes to mind is checking the web server's logs. Unfortunately, that was impossible. In the setup I'm using, the web server runs in a podman container, and you'd usually take the logs from it, but when I got up I completely re-created it. If I had been a little smarter, I would have renamed the old container and run a new one alongside, so I could grab the artifacts from the old one later. Take this as a suggestion; a sketch follows below.
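Something along these lines keeps the evidence around (only a sketch; dsite-web and the image are placeholder names):

    # keep the crashed container under a different name instead of deleting it
    podman rename dsite-web dsite-web-crashed
    # bring a fresh container up in its place
    podman run -d --name dsite-web -p 8080:8080 registry.example.com/orangefox/dsite:latest
    # the old container's logs stay available for later analysis
    podman logs dsite-web-crashed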
Of course, some of the containers copy their logs to the syslog, which is nowadays handled by systemd's journal. By running journalctl --since "2023-01-16 01:25:00" I was able to retrieve the logs. And I saw this:

The dsite engine logged a lot of requests in the Russian language, and at exactly 01:27:20 it stopped. Nothing more came until I manually restarted the web server, and there isn't a single line saying the web server was killed or anything like that.
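For completeness, this is roughly how that slice of the journal can be pulled out and narrowed down; grepping for dsite assumes the container's log lines carry that name, which may differ in other setups:

    # everything logged around the crash window
    journalctl --since "2023-01-16 01:25:00" --until "2023-01-16 01:35:00"
    # only the web server container's lines
    journalctl --since "2023-01-16 01:25:00" --until "2023-01-16 01:35:00" | grep -i dsite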
Step 5 - Struggles
Let's turn to Netdata and find out what happened to the server during the attack.

At 01:24 (UTC) the server got a huge spike, and that's when the web server died.
At 04:45 (UTC) I restarted the web server. At 04:57 (UTC) we can see another spike, the biggest one, yet it did not crash the web server.
My second guess was the OOMKiller. Let's also check the memory information.

We have plenty of RAM, don't we? There's even more RAM in use afterwards. So the web server was not killed by the OOMKiller; besides, if it had been, I'd have a syslog entry about it.
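This is easy to verify either way, since OOM-killer activity ends up in the kernel log. A check like this (assuming journald keeps the kernel log for that window) is what I mean by "the syslog about it":

    # look for OOM-killer activity around the attack
    journalctl -k --since "2023-01-16 01:20:00" | grep -iE 'out of memory|oom'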
Step 6 - Workaround
Since we can't trace what happened to the web server, perhaps it's a bug or some other issue in the web server itself.
So I made a few changes to the server's settings. The first is updating the web server's container. The second is auto-restarting the web server's container every time it fails, as sketched below.
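A minimal sketch of the auto-restart part, assuming the container is managed through a systemd unit generated by podman (dsite-web is again a placeholder name):

    # generate a unit that restarts the container whenever it exits with an error
    podman generate systemd --new --name --restart-policy=on-failure dsite-web \
        > /etc/systemd/system/dsite-web.service
    systemctl daemon-reload
    systemctl enable --now dsite-web.service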
Conclusion
Even though the site was down for only around 3 hours, during the night and in the least popular period of time, I consider it a serious issue, as it shows how unreliable OrangeFox's infrastructure is.
I have already made a few simple changes and am looking forward to implementing more serious ones.
Plans and stuff
My plan is to trace the web server's issue. If you have any questions or suggestions, please ask me directly on Telegram - @MrYacha.
Ideally, I want to find someone who can stress-test our infrastructure again.
I hope I can continue this story.