Behind the scenes: why a new version of the USB-IDS-TC dataset

Applying xAI techniques to the original version of USB-IDS-TC showed that recognizing slowhttptest attacks required no particular capability: the attack flows were easily identified by very high values of a single feature, Fwd Packet Length Max. Machine/deep learning techniques were therefore simply overkill. Further (long and tedious) investigation into the causes of this “strange” behavior showed that the slowhttptest tool available in most Linux repositories:

1) is affected by a (minor) bug: commas are missing between some of the user_agents strings.
As of 9th January 2026, the tool's GitHub repository (https://github.com/shekyan/slowhttptest) still defines the user_agents in the file slowhttptest.cc as follows:

user_agents definition in slowhttptest.cc

The missing commas after (some) user agent strings cause the compiler to concatenate adjacent string literals, “fusing” two strings into a single, particularly long user_agent. As a result, a very long packet is sent to the server at the start of every connection, which makes the flow easily recognizable.
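The effect of a missing comma can be reproduced in isolation. The following is a minimal sketch, not the actual slowhttptest code: the user agent strings are shortened, made-up placeholders, and the names kBuggyAgents, kFixedAgents, array_size and check_tables are hypothetical. It only illustrates that C++ merges adjacent string literals, so the table silently loses an entry and gains an overlong one.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical, shortened user agent strings (NOT the ones in slowhttptest.cc).
// The first entry lacks a trailing comma, so the compiler concatenates it with
// the next literal: the table ends up with two entries instead of three.
static const char* kBuggyAgents[] = {
    "Mozilla/5.0 (ExampleOS) AgentA/1.0"  // <- missing comma
    "Mozilla/5.0 (ExampleOS) AgentB/2.0",
    "Mozilla/5.0 (ExampleOS) AgentC/3.0",
};

// Same strings with the comma restored: three separate entries.
static const char* kFixedAgents[] = {
    "Mozilla/5.0 (ExampleOS) AgentA/1.0",
    "Mozilla/5.0 (ExampleOS) AgentB/2.0",
    "Mozilla/5.0 (ExampleOS) AgentC/3.0",
};

// Number of elements in a built-in array, known at compile time.
template <typename T, std::size_t N>
constexpr std::size_t array_size(const T (&)[N]) { return N; }

// Checks that the buggy table has one entry fewer and that its first entry
// is exactly as long as the two strings it was fused from.
void check_tables() {
    assert(array_size(kBuggyAgents) == 2);
    assert(array_size(kFixedAgents) == 3);
    assert(std::strlen(kBuggyAgents[0]) ==
           std::strlen(kFixedAgents[0]) + std::strlen(kFixedAgents[1]));
}
```

Calling check_tables() from a test or from main passes silently. The compiler emits no diagnostic for the missing comma, which is presumably why the issue went unnoticed.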

2) does not rotate user_agents: every connection started by the same run of the tool uses the same user_agent. This is not a bug, but it was particularly detrimental to our dataset collection, which relies on a single slowhttptest call.

In light of the above, we added the missing commas and patched slowhttptest so that it chooses a random user_agent for each connection within a single run of the tool. This brief report is just one example of the subtle dataset collection issues that often lead to recognition rates close to 100%. We also extended the capture times to obtain a larger dataset.
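The per-connection choice described above could look roughly like the sketch below. This is an illustration under stated assumptions, not the patch we applied: the class UserAgentPicker, its interface, and the use of std::mt19937 are all hypothetical. The only point it makes is that the selection happens each time a connection is opened, rather than once per run.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not the actual slowhttptest patch): picks a user agent
// uniformly at random every time a new connection is about to be opened.
class UserAgentPicker {
public:
    UserAgentPicker(std::vector<std::string> agents, unsigned seed)
        : agents_(std::move(agents)), rng_(seed) {
        if (agents_.empty()) {
            throw std::invalid_argument("at least one user agent is required");
        }
        // Inclusive range of valid indices into agents_.
        dist_ = std::uniform_int_distribution<std::size_t>(0, agents_.size() - 1);
    }

    // Call once per connection, not once per run.
    const std::string& next() {
        const std::size_t idx = dist_(rng_);
        assert(idx < agents_.size());
        return agents_[idx];
    }

private:
    std::vector<std::string> agents_;
    std::mt19937 rng_;
    std::uniform_int_distribution<std::size_t> dist_;
};
```

A fixed seed keeps a capture reproducible, while seeding from std::random_device gives a different sequence on every run. Either way, successive connections no longer share a single user_agent, so their initial packet lengths vary.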