Data breaches happen.
Today, as never before, data plays a fundamental role in our real lives. Everybody is both a data producer and a data consumer. We are data producers simply by moving from one building to another with a smartphone in our pocket, by surfing the web, or just by tapping on smartphone applications. We are data consumers when we buy things on Amazon, when we read information on social networks, or when we consume raw data through an API. Some refer to data as the “new oil” (a concept usually credited to Clive Humby). Data is what we leave behind in the digital world, and it sits very close to our physical life. Data is what we are in cyberspace. Data is what we need to protect.
Protecting our private data is like protecting ourselves in the cyber world, and for this reason the protection needs to be regulated (as GDPR teaches us). So it is worth understanding how data breaches might happen.
Unfortunately there is no standard path to protect: a data breach might come through an insider attack, a click on a malspam campaign, an email phishing attempt or, again, through common vulnerabilities. One of the main paths so far is driven by vulnerabilities, and one of the vulnerabilities most exploited by attackers to illegally collect data is SQL injection (SQLi). It is pretty easy to detect and to exploit, even for unsophisticated attackers. On the other side of the coin, there are a lot of frameworks, design patterns and methodologies to prevent and block such a vulnerability. This is where I started my research. I wanted to prove that SQLi vulnerabilities are quite “rare” (or difficult to find) in 2019 but, unfortunately, I had to acknowledge that I was wrong when I found these fresh pastes (here, here and here). The “possible attacker” exposed a set of “presumed” SQLi-vulnerable websites harvested in a matter of 24 hours of internet scanning.
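To make the difference between a vulnerable query and a parametrised one concrete, here is a minimal sketch using Python's standard `sqlite3` module. The table, the function names and the payload are hypothetical and only serve as an illustration; they are not taken from the websites listed in the pastes.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with two made-up users."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, email TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "alice@example.com"), ("bob", "bob@example.com")],
    )
    return conn


def find_user_vulnerable(conn, name):
    # User input is concatenated into the SQL text, so quotes inside
    # `name` can change the structure of the query itself.
    query = "SELECT name, email FROM users WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()


def find_user_safe(conn, name):
    # The `?` placeholder sends `name` as data only; it is never
    # interpreted as SQL.
    query = "SELECT name, email FROM users WHERE name = ?"
    return conn.execute(query, (name,)).fetchall()


conn = build_demo_db()
payload = "' OR '1'='1"  # classic always-true injection string

leaked = find_user_vulnerable(conn, payload)   # every row in the table
blocked = find_user_safe(conn, payload)        # no row matches the literal string
```

With the concatenated query the payload turns the `WHERE` clause into an always-true condition and the whole table comes back, while the parametrised version treats the same payload as an ordinary (non-matching) name.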

Principal domain names with SQLi

According to the “pastes”, the attacker harvested about 327 vulnerable websites in less than a day! So let’s dig into them a little to see if we can find some interesting correlations.
A first interesting result comes from the top-level domain names. Leaving out “.com” (which is the most commonly used one), it is possible to see other interesting domain names such as “.ca”, “.it”, “.ir”, “.ch”, “.il” and so on, which are mostly country-based. I agree with those who might think that this dataset cannot be considered “significant”, since 24 hours of internet scraping is very far from giving a representative view of the internet, but we might agree that it can be considered an “indicative dataset”. In other words, if in only 24 hours of internet scraping he/she found about 327 vulnerable websites, let’s imagine what an attacker could do with weeks or months of scraping power. It is also interesting that no specific geographic targets or country patterns emerged (for example: only the richest/poorest countries, or European countries, or countries with cyber activists, or countries at war, etc.), suggesting that the issue (having a SQLi-vulnerable website) is still spread all over the world. The following map shows the geographic distribution of the domain names; domains such as “.ld”, “.dk”, “.nz”, “.ug”, “gk”, … took a single hit each, so they are not visualised.
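A distribution like the one discussed above can be computed with a few lines of Python. This is only a sketch of one possible approach, not the tooling actually used for this post: the function names are made up and the URLs are placeholder `example` hosts, not entries from the pastes.

```python
from collections import Counter
from urllib.parse import urlparse


def top_level_domain(url):
    """Return the last label of the host name (e.g. '.it'), or '' if none."""
    host = urlparse(url).hostname or ""
    return "." + host.rsplit(".", 1)[-1] if "." in host else ""


def tld_distribution(urls):
    """Map each top-level domain to its percentage share of the URL list."""
    counts = Counter(top_level_domain(u) for u in urls)
    counts.pop("", None)  # ignore entries without a recognisable host
    total = sum(counts.values())
    return {tld: round(100 * n / total, 1) for tld, n in counts.most_common()}


# Placeholder URLs, assumed for illustration only.
sample_urls = [
    "http://shop.example.com/item.php?id=1",
    "http://news.example.it/article.php?id=7",
    "http://example.ca/view.php?cat=3",
    "http://blog.example.com/post.php?p=12",
]

distribution = tld_distribution(sample_urls)
```

The same counting approach would work for any other attribute collected per website, as long as one label per site is available.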

Domain Names Geographically Distributed

The following histogram shows the percentage of web server technologies found in front of the “presumed” vulnerable websites. Apache and Nginx are the most commonly used. I am not saying that Apache and Nginx are vulnerable to SQLi, or that they somehow cause or enable vulnerable web pages, nor am I saying that they are in any way responsible for serving vulnerable applications. Indeed, a vulnerable application has no direct link to the web server in use; I am just observing the analysed data. It could be an “obvious consequence”, since Apache and Nginx are the most used web servers, or maybe not.

Percentage of WebServer Technology in front of vulnerable websites

A little more interesting is the distribution of DB technologies used in the presumed SQLi-vulnerable websites. It might highlight the application “type”. For example, we might believe that applications built on top of Microsoft Access are quite “old applications” (this is not always true, I’m aware of it, but it might be an indicative parameter to consider in SQLi research), or that applications built on top of Oracle databases are corporate applications rather than open source and/or “mockup” applications. We might stretch this concept a little further by assuming that applications built on top of Microsoft SQL Server are professional/company applications, and so on. Of course we cannot reason the same way starting from MySQL or PostgreSQL, since both are used in open source/free applications as well as in corporate and professional ones.

Percentage of Database Technology in the backend of vulnerable websites

Conclusions: Every day we read about personal data breaches. One of the latest ones hit German politics (more info here, here and here).
Data breaches might sap our companies and our digital identities. Regulations have been made to try to normalise and block breaches, but unfortunately in 2019 it is still easy to get random personal data out of the internet. In this personal research, which started on the dark web and finally ended up on “paste” websites, I found out that a common and quite easy way to mine personal data, even in 2019, is SQL injection, which is surprisingly still effective despite hundreds of countermeasures (such as frameworks, design patterns, native parametrised queries, etc.). The main reason behind the roughly 327 vulnerable websites found in less than a day (according to the pastes) is unpatched software. In fact, it is easy to spot common Google dorks in the attacker’s patterns. Blocking well-known SQLi vulnerabilities is as simple as patching your website. Please do it for the safety of your users.