Data breaches happen.
Today, as never before, data plays a fundamental role in our real life. Everybody is both a data producer and a data consumer.
We are data producers simply by moving from one building to another
with a smartphone in our pocket, by surfing the web, or just by tapping
on smartphone applications. We are data consumers when we buy things on
Amazon, when we read information on social networks, or when we
consume raw data through APIs. Some refer to data as the “new oil”
(a concept usually credited to Clive Humby).
Data is what we leave in the digital world, and it is what we keep very
close to our physical life. Data is what we are in cyberspace.
Data is what we need to protect.
Protecting our private data is like protecting ourselves in the cyber
world, and for that reason the protection needs to be regulated (GDPR teaches us that). So it might be very interesting to understand how data breaches happen.
Unfortunately there is no single standard path to protect against: a
data breach might come through an insider attack, by clicking on a
malspam campaign, by falling for email phishing, or again through common
vulnerabilities. But one of the main paths, so far, is driven by
vulnerabilities. One of the vulnerabilities most exploited by attackers to
illegally collect data is SQL injection (SQLi). It is pretty easy to detect
and to exploit, even for unsophisticated attackers. But on the other
side of the coin there are a lot of frameworks, design patterns and
methodologies to prevent and to block such a vulnerability. From here,
I started my research. I wanted to prove that SQLi vulnerabilities are quite “rare” (or difficult to find) in 2019, but, unfortunately, I acknowledged that I was wrong when I found these fresh pastes (here, here and here). The “possible attacker” exposed a set of “presumed” SQLi-vulnerable websites harvested in a matter of 24 hours of internet scanning.
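
To make the countermeasure concrete, here is a minimal sketch, using Python’s built-in sqlite3 module and an invented users table (the table and values are illustrative only), of why a parametrised query blocks the classic payload that string concatenation lets through:

```python
import sqlite3

# In-memory demo database with an invented "users" table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 's3cr3t')")

payload = "' OR '1'='1"  # classic SQLi payload

# VULNERABLE: the payload becomes part of the SQL statement itself,
# so the WHERE clause is always true and every row leaks.
leaked = conn.execute(
    "SELECT * FROM users WHERE name = '" + payload + "'"
).fetchall()

# SAFE: a parametrised query treats the payload as plain data,
# so it matches no user and nothing leaks.
safe = conn.execute(
    "SELECT * FROM users WHERE name = ?", (payload,)
).fetchall()

print(leaked)  # every row of the table
print(safe)    # []
```

This is exactly what “native parametrised queries” buy you: the attacker-controlled string never reaches the SQL parser as code.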

According to the “pastes”, the attacker harvested circa 327
vulnerable websites in less than a day! So let’s dig a little bit into
them to see if we can find some interesting correlations.
A first interesting result comes from the top-level domain names.
Leaving out “.com” (which is actually the most commonly used domain name),
it is possible to see additional interesting domain names such as “.ca”,
“.it”, “.ir”, “.ch”, “.il” and so on, which are mostly country-code
domain names. I agree with those who might think that the dataset used
cannot be considered a “significant dataset”, since 24 hours of
internet scraping is very far from giving a significant view of the
internet, but we might agree that it can be considered an
“indicative dataset”. In other words, if in only 24 hours of internet
scraping he/she found circa 327 vulnerable websites, let’s imagine what
an attacker could do with weeks or months of scraping power. It is still
interesting to see that no specific geographic targets and/or country
patterns emerged (for example: only the richest/poorest countries, or
European countries, or countries with cyber activists, or countries in a
war conflict, etc.), suggesting that the issue (having SQLi-vulnerable
websites) is still quite widespread all over the world. The following map
shows the geo-distribution of domain names; domains such as
“.ld”, “.dk”, “.nz”, “.ug”, “.gk”, …, took a single hit, so they are not
visualised.
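
Out of curiosity, a TLD breakdown like the one above takes only a few lines of Python; the URL list below is a hypothetical stand-in for the harvested paste entries:

```python
from collections import Counter
from urllib.parse import urlparse

# Hypothetical sample standing in for the harvested paste entries.
urls = [
    "http://shop.example.com/item.php?id=1",
    "http://example.ca/page.asp?id=2",
    "http://news.example.it/view.php?id=3",
    "http://example.it/cart.php?id=4",
]

# Count the top-level domain: the last dot-separated label of the host.
tlds = Counter(urlparse(u).hostname.rsplit(".", 1)[-1] for u in urls)

print(tlds.most_common())  # [('it', 2), ('com', 1), ('ca', 1)]
```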

The following histogram shows the percentage of web server technologies
found in the “presumed” vulnerable websites. Apache and Nginx are the most
commonly used technologies. I am not saying that Apache and Nginx are
vulnerable to SQLi, or that they somehow enable
vulnerable web pages, nor am I saying that they are responsible in
any way for serving vulnerable applications. Indeed, vulnerable
applications have no direct link to the web server in use; I am
just observing the analysed data. It could be an “obvious consequence”,
since Apache and Nginx are the most used web server technologies over
the web, or maybe not.
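
For what it’s worth, the web server technology can usually be guessed from the HTTP Server response header; the sketch below (the family list and the helper name are my own assumptions, not the tool actually used for the histogram) shows the idea:

```python
def server_family(server_header: str) -> str:
    """Map an HTTP Server header onto a coarse technology family."""
    s = server_header.lower()
    for family in ("apache", "nginx", "iis", "litespeed"):
        if family in s:
            return family
    return "other"

# For a live host the header could be fetched like this (assuming the
# target answers HEAD requests):
#   import urllib.request
#   req = urllib.request.Request(url, method="HEAD")
#   hdr = urllib.request.urlopen(req).headers.get("Server", "")
#   family = server_family(hdr)

print(server_family("Apache/2.4.29 (Ubuntu)"))  # apache
print(server_family("Microsoft-IIS/8.5"))       # iis
```

Note that the Server header is self-reported and easily spoofed, so counts derived this way are, again, indicative rather than authoritative.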

A little bit more interesting is the distribution of database technologies
used in the presumed SQLi-vulnerable websites. It might highlight the application
“type”. For example, we might believe that applications built on top of
Microsoft Access are quite “old applications” (this is not always true,
I am aware of it, but it might be an indicative parameter to be
considered in SQLi research), or that applications built on top of Oracle
databases are corporate applications rather than open-source and/or
“mockup” applications. Or we might stretch this concept a little by
assuming that applications built on top of Microsoft SQL Server are
professional/company applications, and so on and so forth. Of course
we cannot walk the same way starting from MySQL or PostgreSQL, since both
of them are used in open-source/free applications as well as corporate
and professional ones.
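
A distribution like this is typically derived by fingerprinting the error messages that vulnerable pages leak. The signature table below is a hypothetical, simplified example of how each engine betrays itself, not the actual signature set behind the histogram:

```python
# Hypothetical error-message fragments commonly associated with each engine.
DB_SIGNATURES = {
    "mysql": ["you have an error in your sql syntax", "mysql_fetch"],
    "mssql": ["microsoft sql server", "unclosed quotation mark"],
    "oracle": ["ora-00933", "ora-01756"],
    "postgresql": ["pg_query", "unterminated quoted string"],
    "access": ["microsoft jet database", "odbc microsoft access"],
}

def guess_db(error_page: str) -> str:
    """Guess the backend database from the body of an error page."""
    page = error_page.lower()
    for db, signatures in DB_SIGNATURES.items():
        if any(sig in page for sig in signatures):
            return db
    return "unknown"

print(guess_db("Warning: mysql_fetch_array() expects..."))   # mysql
print(guess_db("ORA-00933: SQL command not properly ended"))  # oracle
```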

Conclusions: Every day we read about personal data breaches. One of the latest happened in German politics (more info here, here and here). Data breaches might sap our companies and our digital identities; regulations have been made trying to normalise and to block breaches, but unfortunately in 2019 it is still easy to get random personal data out of the internet. In this personal research, which started on the dark web and finally ended up on “paste” websites, I found out that a common and quite easy way to mine personal data, even in 2019, is through SQL injection, which is surprisingly still effective despite hundreds of countermeasures (such as frameworks, design patterns, native parametrised queries, etc.). The main reason behind the circa 327 vulnerable websites found in less than a day (according to the found pasties) is unpatched software versions; in fact, it is easy to recognise common Google dorks in the attacker’s patterns. Blocking well-known SQLi vulnerabilities is as simple as patching your website. Please do it for the safety of your users.