Recently, while we were discussing a contract, an industry executive needed me to explain the difference between a “security audit” and a “penetration test.” The party with whom the executive was negotiating the contract had changed the contractual requirement from the one to the other. Since this might be of interest to others, I’ll provide the explanation here on the blog.

The short explanation is that a “penetration test” is just one small component of a “security audit,” and that the software / service provider was aiming for a much lower bar than we were aiming for on behalf of the executive and her MLS. Following is an explanation of how much lower it is.

A penetration test is an attempt to find weaknesses in a computer system’s security defenses. The term is sometimes applied to non-computer security, but not typically. The test usually combines automated and manual testing to find specific attacks that enable the attacker to bypass certain defenses, and then to find additional vulnerabilities that are reachable only once those initial defenses are defeated. As a result of this testing, risk can theoretically be assessed and a remediation plan put in place. One can also evaluate how well an organization detects and responds to an attack.

One significant problem with penetration testing is that, if the system being tested has good outer defenses, a penetration test may not find security risks lurking inside of the defenses. It may then present as its finding that risk is low and no action is needed. Then when there is a change to the outer defense that causes a vulnerability (or a vulnerability is discovered in that outer defense) actual attackers will breach all of the inner layers where security has not been well designed. It’s a well-known principle of security design that one designs multiple levels of security (a/k/a “defense in depth”). Penetration testing, on its own, doesn’t reliably measure whether this has been done properly.

For the more technical readers, I will provide an example:

Let’s say a web application has been written with excellent protections against SQL injection attacks. The penetration test is run against the application, no SQL injection issues are found, and the system passes with flying colors. But let’s say the web application had full database server privileges (a “db_owner” role on MS SQL, or a DBA or other user with too many global privileges on MySQL), and that the database platform service had full system privileges (an “Administrator” role user on Windows or “root” on Linux). Or let’s say that someone didn’t have MySQL’s “Outfile” disabled, along with other issues allowing remote file access. Then, one day, a programmer makes a single mistake (that never happens, right?), all of the poor configuration behind the web application is exposed, and a hacker can easily grab database contents and take over the database server or even the whole network. I’ve taken an attack exactly that far – from the login prompt on the outside of an application – during a sanctioned security audit, of course!
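To make the point concrete, here is a minimal Python sketch of the kind of inner-configuration check an audit performs and a penetration test may never reach. Everything in it is an assumption for illustration: the `DeploymentConfig` fields, the role and account lists, and the wording of the findings are hypothetical, and a real audit checklist would be far longer.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class DeploymentConfig:
    """Hypothetical snapshot of settings an auditor collected by hand."""
    app_db_roles: set[str]        # roles held by the web app's database login
    db_service_account: str       # OS account the database service runs as
    db_file_export_enabled: bool  # server-side file read/write features left on


# Illustrative lists only, not an exhaustive or authoritative catalog.
OVERPRIVILEGED_DB_ROLES = {"db_owner", "sysadmin", "DBA"}
OVERPRIVILEGED_OS_ACCOUNTS = {"root", "Administrator"}


def audit_inner_defenses(cfg: DeploymentConfig) -> list[str]:
    """Return findings about what sits behind the application's outer defenses."""
    findings = []
    risky_roles = cfg.app_db_roles & OVERPRIVILEGED_DB_ROLES
    if risky_roles:
        findings.append(f"web app database login holds broad roles: {sorted(risky_roles)}")
    if cfg.db_service_account in OVERPRIVILEGED_OS_ACCOUNTS:
        findings.append(f"database service runs as '{cfg.db_service_account}'")
    if cfg.db_file_export_enabled:
        findings.append("database file export/import features are enabled")
    return findings
```

An application with flawless input handling would still produce three findings here if it ran with `db_owner`, as `root`, with file export enabled. None of those would show up in a penetration test that never gets past the login form.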

The other problem with using a penetration test as the sole way to measure security is that information security is a much broader area of exploration, normally measured during a full security audit. Most of the breaches we’ve had in this industry have been the result of weak policy and procedure or the contracts that reflect those policies, inadequate human resources practices, and physical security issues. Still other technical security issues have resulted from a lack of protection against screen scraping, and yet others from authentication related issues – both items that most penetration testing tools cannot easily uncover. Looking at the security configuration of network equipment, servers and workstation operating systems, platforms, and installed software, mobile devices, antivirus, printer and copier configuration (yes, I really said copiers!), password selection, backup practices and so much more – all of that falls outside the purview of typical penetration testing and is only reliably addressed via a full security audit.

There’s nothing wrong with using a penetration test as a part of a security audit – just don’t mistake the part for the whole.

When organizations create policy requiring screen-scraping and other automated attack prevention and monitoring, it’s important for those organizations to be specific enough to ensure that compliance with policy can be measured in some way. Indeed, it is equally important for each organization to ensure that its technology contracts contain clear and explicit terms that implement those policies.

If policies and contracts do not contain specific anti-scraping technology requirements, one can easily end up in an argument over whether the steps taken to prevent scraping are sufficient, even if those steps are demonstrably ineffective. For example, a website provider might implement a “CAPTCHA” on login and say, “This should be enough to prove the humanity of the user. It’s not a computer program using the website.” But not only are many CAPTCHA tests easy for computers to defeat (it’s an arms race!), the barrier is also not very high if all a data pirate needs to do is have a human being log in and/or complete a CAPTCHA test once per day, then have the cookie (containing session information) captured by a computer for use in scraping data. Likewise, an anti-scraping solution might block an IP address as being used by a scraper if the website gets more than 20 or 30 information requests per minute from that address. While that seems like a reasonable step, these days the more advanced scrapers spin up a hundred servers on different IP addresses and have each of them grab the data from just a few pages, then move those servers to different IP addresses.

Thus, anti-scraping is difficult, and while the mechanisms mentioned above might play a part in a solution, one must include a more comprehensive solution if one wishes to actually have a reasonable chance of stopping the screen scrapers. Moreover, that more comprehensive solution should be detailed explicitly both in the terms of any contracts executed by the defending organization and in any policies it implements regarding reasonable security measures.
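The weakness of per-IP blocking can be shown with a little arithmetic. The following Python sketch is a toy simulation, not a description of any real product: the `PerIpRateLimiter` class, the `simulate` helper, the 20-requests-per-minute limit, and the made-up private addresses are all assumptions chosen to mirror the numbers in the paragraph above.

```python
from collections import defaultdict, deque


class PerIpRateLimiter:
    """Sliding-window limiter: at most max_requests per IP per window."""

    def __init__(self, max_requests: int, window_seconds: float):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # ip -> timestamps of allowed requests

    def allow(self, ip: str, now: float) -> bool:
        hits = self._hits[ip]
        # Forget requests that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True


def simulate(num_ips: int, requests_per_ip: int, limiter: PerIpRateLimiter) -> int:
    """Count requests that get through when each IP spreads its requests over one minute."""
    allowed = 0
    for i in range(num_ips):
        ip = f"10.0.{i // 256}.{i % 256}"  # made-up addresses for the simulation
        for r in range(requests_per_ip):
            now = 60.0 * r / requests_per_ip
            if limiter.allow(ip, now):
                allowed += 1
    return allowed


# One noisy scraper on a single address: 500 attempts in a minute.
single = simulate(1, 500, PerIpRateLimiter(20, 60.0))
# A hundred servers, five requests each: the same 500 attempts.
distributed = simulate(100, 5, PerIpRateLimiter(20, 60.0))
print(single, distributed)  # prints "20 500"
```

The single-address scraper is cut off after 20 pages, while the distributed one retrieves all 500 without any address coming close to the threshold.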

Anti-scraping requirements might look something like the following:

The display (website, app’s API) must implement technology that prevents, detects, and proactively mitigates scraping. This means implementing an effective combination of the countermeasures defined in the “OWASP Automated Threat Handbook” “Figure 7: Automated Threat Countermeasure Classes” (reproduced below and available from OWASP). Those countermeasures must be demonstrably effective against commercial scraping services as well as advanced and evolving scraping techniques.

The anti-scraping solution must comprise multiple countermeasures in all three classes of countermeasures (prevent, detect, and recover) as defined by OWASP, sufficient to address all aspects of the security threat model, including at least complete implementations of all of the following: Fingerprinting, Reputation, Rate, Monitoring, and Instrumentation.

Those fielding displays and APIs requiring anti-scraping technology must demonstrate compliance with the above requirements using technology they have built or a commercial product/service. It must be demonstrated that the technology meets those requirements and that it has been properly configured to effectively address scraping.
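Requirements written this specifically can be checked mechanically, which is the point of writing them that way. The Python sketch below is one hypothetical way to do so. The `check_compliance` function and its input format are assumptions, and the class coverage for each countermeasure is whatever the provider declares, not a mapping taken from OWASP.

```python
from __future__ import annotations

REQUIRED_CLASSES = {"prevent", "detect", "recover"}
REQUIRED_COUNTERMEASURES = {
    "Fingerprinting", "Reputation", "Rate", "Monitoring", "Instrumentation",
}


def check_compliance(declared: dict[str, set[str]]) -> list[str]:
    """List the gaps in a provider's declaration.

    `declared` maps each implemented countermeasure to the classes the
    provider claims it covers (hypothetical input format).
    """
    gaps = []
    for name in sorted(REQUIRED_COUNTERMEASURES - set(declared)):
        gaps.append(f"missing required countermeasure: {name}")
    covered = set().union(*declared.values())
    for cls in sorted(REQUIRED_CLASSES - covered):
        gaps.append(f"no countermeasure covers class: {cls}")
    # The requirement asks for multiple countermeasures in every class.
    for cls in sorted(REQUIRED_CLASSES & covered):
        count = sum(1 for classes in declared.values() if cls in classes)
        if count < 2:
            gaps.append(f"only one countermeasure covers class: {cls}")
    return gaps
```

A check like this only confirms that the declaration is complete. Whether each countermeasure is properly configured and actually effective still has to be demonstrated, as the requirement above says.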

Following is some more detail about each countermeasure:


•    Requirements. Define relevant threats and assess effects on site’s performance toward business objectives
•    Obfuscation. Hide assets, add overhead to screen scraping and hinder theft of assets
•    Fingerprinting. Identify automated usage by user agent string, HTTP request format, and/or device fingerprint content
•    Reputation. Use reputation analysis of user identity, user behavior, resources accessed, not accessed, or repeatedly accessed
•    Rate. Limit number and/or rate of usage per user, IP address/range, device ID / fingerprint, etc.
•    Monitoring. Monitor errors, anomalies, function usage/sequencing, and provide alerting and/or monitoring dashboard
•    Instrumentation. Perform real-time attack detection and automated response
•    Response. Use incident data to feed back into adjustments to countermeasures (e.g. requirements, testing, monitoring)
•    Sharing. Share fingerprints and bot detection signals across infrastructure and clients

The appendix that follows provides more thorough OWASP countermeasure definitions.

Appendix: OWASP Countermeasure Definitions

The following is excerpted from “OWASP Automated Threat Handbook Web Applications”:

•    Requirements. Identify relevant automated threats in security risk assessment and assess effects of alternative countermeasures on functionality, usability and accessibility. Use this to then define additional application development and deployment requirements.
•    Obfuscation. Hinder automated attacks by dynamically changing URLs, field names and content, or limiting access to indexing data, or adding extra headers/fields dynamically, or converting data into images, or adding page and session-specific tokens.
•    Fingerprinting. Identification and restriction of automated usage by automation identification techniques, including utilization of user agent string, and/or HTTP request format (e.g. header ordering), and/or HTTP header anomalies (e.g. HTTP protocol, header inconsistencies), dynamic injections, and/or device fingerprint content to determine whether a user is likely to be a human or not. As a result of these countermeasures, for example, browsers automated via tools such as Selenium must certainly be blocked. The technology should use machine learning or behavioral analysis to detect automation patterns and adapt to the evolving threat on an ongoing basis.
•    Reputation. Identification and restriction of automated usage by utilizing reputation analysis of user identity (e.g. web browser fingerprint, device fingerprint, username, session, IP address/range/geolocation), and/or user behavior (e.g. previous site, entry point, time of day, rate of requests, rate of new session generation, paths through application), and/or types of resources accessed (e.g. static vs dynamic, invisible/hidden links, robots.txt file, paths excluded in robots.txt, honey trap resources, cache-defined resources), and/or types of resources not accessed (e.g. JavaScript generated links), and/or types of resources repeatedly accessed. As a result of these countermeasures, for example, known commercial scraping tools and the use of data center IP addresses must certainly be identified and blocked.
•    Rate. Set upper and/or lower limits and/or trend thresholds, and limit number and/or rate of usage per user, per group of users, per IP address/range, and per device ID/fingerprint. Note that this kind of countermeasure cannot stand alone as hackers commonly utilize a slow crawl from many rotating IP addresses that can simulate the activity of legitimate users.
•    Monitoring. Monitor errors, anomalies, function usage/sequencing, and provide alerting and/or monitoring dashboard.
•    Instrumentation. Build in application-wide instrumentation to perform real-time attack detection and automated response including locking users out, blocking, delaying, changing behavior, altering capacity/capability, enhanced identity authentication, CAPTCHA, penalty box, or other technique needed to ensure that automated attacks are unsuccessful.
•    Response. Define actions in an incident response plan for various automated attack scenarios. Consider automated responses once an attack is detected. Consider using actual incident data to feed back into other countermeasures (e.g. Requirements, Testing, Monitoring).
•    Sharing. Share information about automated attacks, such as IP addresses or known violator device fingerprints, with others in same sector, with trade organizations, and with national CERTs.
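
To show how several of the countermeasures defined above might work together, here is a deliberately simplified Python sketch that combines Fingerprinting, Reputation, and Rate signals into one score and picks an automated response, in the spirit of Instrumentation. The `RequestInfo` fields, the weights, the thresholds, and the honey trap path are all hypothetical. The user agent markers are common defaults that any serious scraper will spoof, which is why the definitions above call for more than string matching.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class RequestInfo:
    user_agent: str
    headers: dict[str, str]
    path: str
    requests_last_minute: int  # assumed to be tracked per user or device elsewhere


# Illustrative values only; every one of these is an assumption.
AUTOMATION_UA_MARKERS = ("python-requests", "curl", "scrapy", "headlesschrome")
HONEY_TRAP_PATHS = {"/listings/hidden-trap"}  # a link no human visitor would see
RATE_THRESHOLD = 30


def score_request(req: RequestInfo) -> int:
    """Add up suspicion points from several independent signals."""
    score = 0
    ua = req.user_agent.lower()
    if not ua or any(marker in ua for marker in AUTOMATION_UA_MARKERS):
        score += 2  # Fingerprinting: user agent string
    if "accept-language" not in {name.lower() for name in req.headers}:
        score += 1  # Fingerprinting: HTTP header anomaly
    if req.path in HONEY_TRAP_PATHS:
        score += 3  # Reputation: honey trap resource accessed
    if req.requests_last_minute > RATE_THRESHOLD:
        score += 2  # Rate: usage above threshold
    return score


def choose_response(score: int) -> str:
    """Instrumentation: map the score to an automated response."""
    if score >= 5:
        return "block"
    if score >= 2:
        return "challenge"
    return "allow"
```

No single signal decides the outcome here, so a slow crawl that stays under the rate threshold can still be challenged or blocked on the strength of the other signals.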


mattsretechblog: matt cohen (Default)
Matt's Real Estate Tech Blog

This blog is for informational purposes only. The author shall have no liability in connection with any inaccuracies or omissions herein. All trademarks are the property of their respective holders. The views expressed on this blog are those of the author and do not necessarily reflect the views of his employer. Non plaudite, modo pecuniam jacite.