Everyone, sorry for the loooong delay on the response, the past weeks have
been totally crazy.
Anyhow, here is what I think and what I would suggest, based on the current
draft:
1. Tool Setup and Installation
"Setup and Installation" is not really interesting, I believe. What matters
more is the platform support (can I run the tool from my Linux box, my
Mac, our Windows server, etc.?).
1.1 Time required to perform initial installation
That's usually subjective, unless you say something like "always less than 2
hours", "always less than a day", etc. But then again, I find this quite
irrelevant to the problem.
1.2 Skills required to perform initial installation
1.3 Privileges required to perform initial installation
I don't find this item very informative. Okay, you need to have root access,
or admin access on the machine... or not.
1.4 Documentation setup accuracy
1.5 Platform Support
This one is interesting for the customers.
2. Performing a Scan
Logically, I would not talk about scanning just yet. After the platform
support section, I would talk about language and framework support.
2.1 Time required to perform a scan
This does not make any sense. "Time required to scan"... what? This question
is, however, answerable if we provide a proper test case and environment in
which to run the tool. But then again, it's quite misleading information.
2.2 Number of steps required to perform a scan
Many tools have scripting interfaces. Using scripts, you reduce your steps
from 7 to 1 (i.e., run the script). How does that count?
In summary, I find this information not interesting at all.
2.3 Skills required to perform a scan
I understand that some tools (like PolySpace) require someone to actually
design and model the suspected behavior of the program. But most tools do
not require that. Then again, how do we rate the user? Do we assume the user
(who runs the scan) will also look at the findings? Does he also set up the
scan? I definitely see the scan being run by security operations (mostly for
monitoring), and being set up by security engineers...
3. Tool Coverage:
"Tool Coverage" might be the most misleading term here. Coverage of what?!
Coverage of supported weaknesses, languages, version of languages,
framework, application coverage, entry point coverage, etc.?
3.1 Languages supported by the tool
Very important. Now, we should not limit ourselves to the languages; we
should go down to the framework-version level. Nowadays, the language is just
a means; most of the juicy stuff happens in the interaction with the
frameworks... Also, the behavior of a framework might differ from
one version to another...
3.2 Support for Semantic Analysis
3.3 Support for Syntactic Analysis
I do not understand these items. (Usually, "semantic" is used to mean
something like AST-level knowledge.) I would honestly be
more interested to know if the tool is properly capable of inter-procedural
data flow analysis, or if it has some other limitations. Then again, I would
prefer not to talk about the underlying logic (and modeling) of the tool,
since I believe this is out of scope. Users don't really care about that;
they just want the tool to work perfectly. Whether you use a dataflow-based
model, abstract interpretation, or whatever one comes up with... they don't
care.
3.4 Ability of the tool to understand different components of a project
(.sql, .xml, .xsd, .properties…etc)
This is a very interesting item. When generalized a little bit, we can
derive several items from it.
Another item that would be quite interesting is the support for "new
extensions", or the redefinition of extensions. Say the tool
recognizes ".pl" as Perl, but I have all my stored procedures (in
PL/SQL) with this extension; I'd like to be able to tell the tool to
consider .pl to be PL/SQL for this application. The same reasoning applies
to new extensions.
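To make this concrete, here is a minimal sketch of what per-application
extension remapping could look like, assuming a hypothetical scanner that
classifies files by extension (the function names and default mapping are
invented for illustration):

```python
from pathlib import Path

# Hypothetical default extension-to-language table of a scanner.
DEFAULT_LANGUAGES = {".pl": "perl", ".java": "java", ".sql": "plsql"}

def classify(path, overrides=None):
    """Return the language the scanner should assume for this file,
    letting per-application overrides win over the defaults."""
    ext = Path(path).suffix.lower()
    mapping = {**DEFAULT_LANGUAGES, **(overrides or {})}
    return mapping.get(ext, "unknown")

# This application ships PL/SQL stored procedures as .pl files.
project_overrides = {".pl": "plsql"}
```

With no override, classify("proc.pl") falls back to the default Perl
mapping; with project_overrides, the same file is treated as PL/SQL.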
3.5 Coverage of Industry Standard Vulnerability Categories (OWASP Top 10,
SANS Top 25…etc)
Static analysis tools do not find vulnerabilities. They find source code
weaknesses (there is a huge difference). Also, I do not understand what
"coverage of industry standard vulnerability categories" means.
4. Detection Accuracy
Usually, that does not mean anything.
4.1 Number of false positives
4.2 Number of true negatives
My first comment here was "Gniii?"; then I did s/Number/Rate/ and it made a
bit more sense.
I can understand why someone would want to get a rate of false positives
and false negatives, but true negatives? True negatives are the things that
are not reported by the tool, where it's good for the tool not to report
them; an example would be a data flow path that uses a proper validation
routine before sending the data to a sink. You do not want the tool to
report that, and this is a true negative.
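To illustrate, here is a toy sketch of such a validated flow (all names are
invented, and the sanitizer is deliberately simplistic):

```python
def sanitize_sql_literal(value):
    """A proper validation routine: escape single quotes before the sink."""
    return value.replace("'", "''")

def build_query(user_input, sanitized):
    """The sink: string-built SQL, which a SAST tool watches."""
    data = sanitize_sql_literal(user_input) if sanitized else user_input
    return "SELECT * FROM users WHERE name = '%s'" % data

tainted = "x' OR '1'='1"
unsafe = build_query(tainted, sanitized=False)  # the tool SHOULD report this path
safe = build_query(tainted, sanitized=True)     # silence here is a true negative
```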
By the way, the FP/FN rates are very interesting from an experimental point
of view, but there is no way to get this data to mean anything for Joe the
project manager who wants to get a tool. Most likely your data will be very
different from his (if you're running the same experiment on your own
applications). Sad reality check: tool results depend a lot on the
application being scanned.
4.3 Accuracy %
Accuracy of what? Compared to what? Nonsense to me; cf. the previous point. We
cannot measure that in a meaningful way.
5. Triage and Remediation Process
Do we want to talk about the quality of the UI provided by the tool to
facilitate the triage? IMO, the remediation process is out of scope for a
tool evaluation.
5.1 Average time to triage a finding
This seems to me like rating your assessor more than the tool you use.
5.2 Quality of data surrounding a finding (explanation, tracing, trust…etc)
That is indeed very important information. As an assessor, I want to know
why the heck this tool reported this finding to me. Not only do I want to have
paths, confidence, data flow info, etc., but I want to know the internals.
Some tools will report the pre-conditions and post-conditions that generated
the finding. This is extremely useful for advanced use of the tools. I
understand that most tools do not report that, so at least reporting the rule
ID (or something I can track later on and make sense of) is important.
5.3 Ability to mark findings as false positive
Marking a finding as FP might have several meanings. Does this mean:
5.4 Ability to “diff” assessments
Very important indeed.
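For instance, a diff can be approximated by fingerprinting findings and
comparing sets. This is only a sketch; the fingerprint fields are
assumptions, and real tools use more robust, location-independent hashes:

```python
def fingerprint(finding):
    """Reduce a finding to a comparable identity (assumed fields)."""
    return (finding["rule_id"], finding["file"], finding["sink_line"])

def diff_assessments(old, new):
    """Return (introduced, fixed) finding fingerprints between two scans."""
    old_fp = {fingerprint(f) for f in old}
    new_fp = {fingerprint(f) for f in new}
    return new_fp - old_fp, old_fp - new_fp

scan_1 = [{"rule_id": "SQLI-1", "file": "a.java", "sink_line": 10}]
scan_2 = [{"rule_id": "SQLI-1", "file": "a.java", "sink_line": 10},
          {"rule_id": "XSS-3", "file": "b.jsp", "sink_line": 42}]
introduced, fixed = diff_assessments(scan_1, scan_2)
```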
5.5 Ability to merge assessments
Tracking, merging, and combining assessments is definitely part of the
triage process.
5.6 Correctness of remediation advice
5.7 Completeness of remediation advice
I hope no one actually relies on the tool to give proper remediation advice.
Tools are usually fine for giving an idea, but there is no way they will give
you a good solution for your case (even though, in theory, they have lots of
information to do so).
5.8 Does the tool automatically prioritize defects
Prioritize what? Is this category supposed to be talking about the severity
rating? Is this talking about prioritization at the engine level, so that the
tool misses lots of stuff (yeah, that's usually what happens when the flow
analysis is pruned for performance)?
6. UI Simplicity and Intuitiveness
6.1 Quality of triage interface (need a way to measure this)
6.2 Quality of remediation interface (need a way to measure this)
6.3 Support for IDE plug-ins both out of the box and on-demand
"Integration with IDEs", and possibly support for new IDEs. Yes, it's
important to get, at least, a list of integrated IDEs.
6.4 Quality of tools’ out of the box plugin UI
Subjective. Why not talk about the features available through the plugin
instead?
7. Product Update Process
It's indeed good to know that automated/federated/etc. updates are possible.
7.1 Frequency of signature update
Interesting, but the reader must be careful not to base much of the decision
on that. If the tool gets a new pack of rules every week or every month,
that does not say much about the quality...
7.2 Relevance of signatures to evolving threats
7.3 Re-activeness to evolving threats
Are we talking about new weaknesses? The word "threat" is very confusing
here... and does not make sense to me in the context of SAST.
8. Product Maturity and Scalability
Would be good to know indeed, though... how do we get the data?
8.1 Peak memory usage
42GB?! That's very subjective data that depends on many factors (machine,
configuration, application, etc.).
8.2 Number of scans done before a crash or serious degradation in
performance
42, but only because it was 71 degrees in the room, and the train was passing
every 2.5 days.
8.3 Maximum lines of code the tool can scan per project
It would be good to talk about the scalability of the tool, and how to improve
it. For example, can I scan the same application with several machines
(parallelism)? If I add more RAM/CPU, do I get much better results? Is there
a known limit?
8.4 What languages does the tool support?
This should be covered in a different section.
9. Enterprise Offerings
This is also very interesting for companies. However,
the enterprise offerings are usually a central solution to host findings,
review findings, etc. This is not really SAST, but SAST management. Do we
want to talk about that? I'm happy to have this in the criteria...
9.1 Ability to integrate with major bug tracking systems
This is mostly a general comment, but instead of a boolean answer, we should
ask for the list of supported bug tracking systems.
Also, it's important to be able to customize this, and to integrate with
in-house or custom trackers.
9.2 Ability to integrate with enterprise software configuration management
In what regard?
10. Reporting Capabilities
10.1 Quality of reports
10.2 Availability of role-based reports
It's indeed important to report different kinds of data for the engineers,
devs, QA, managers, etc. Eventually, we're talking about data reporting here,
and tools should provide several ways to slice and represent the data for
the different audiences.
10.3 Availability of report customization
Yup; though, to what extent is the report customizable? Can I just change
the logo, or can I integrate the findings into my Word template?
11. Tool Customization and Automation
I feel that we're finally getting to the interesting part. Every mature
use of SAST has to make use of automation and tool customization. This
section is a very important one, and we should emphasize it as much as we
can.
11.1 Can custom rules be added?
Right, that's the first question to ask. Does the tool support rule
customization? Now, we need many other points, such as: what kind
of rules are supported? Can we specify/create a new type of
finding?
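As a toy illustration of what a "custom rule" could look like as data — a
(source, sink) pair matched against a data-flow trace. All names here are
invented, and real rule formats are much richer:

```python
# A hypothetical custom rule: taint from a source reaching a sink.
CUSTOM_RULE = {
    "id": "APP-001",
    "source": "request.getParameter",
    "sink": "Statement.executeQuery",
    "severity": "high",
}

def rule_matches(rule, trace):
    """Does a data-flow trace start at the rule's source and end at its sink?"""
    return bool(trace) and trace[0] == rule["source"] and trace[-1] == rule["sink"]
```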
11.2 Do the rules need learning new language\script?
Most likely it will be "yes", unless it's GUI-only. My point is that
even XML rules represent a "language" for describing the rules...
11.3 Can the tool be scripted? (e.g. integrated into ANT build script or
other build script)
Build automation is crucial, but to me it is different from automation in
general. This item should be in a different section.
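As an aside, build integration usually boils down to exit-code gating. A
sketch, assuming a hypothetical "mytool" CLI that emits JSON findings (the
command and its flags are placeholders, not a real product's interface):

```python
import json
import subprocess

def gate(findings, max_high=0):
    """Exit code for the build: nonzero if too many high-severity findings."""
    high = [f for f in findings if f.get("severity") == "high"]
    return 1 if len(high) > max_high else 0

def run_scan(project_dir):
    # "mytool scan" is a placeholder, not a real scanner's command line.
    out = subprocess.run(["mytool", "scan", "--format", "json", project_dir],
                         capture_output=True, text=True).stdout
    return json.loads(out or "[]")

# In a build script: sys.exit(gate(run_scan("src/"))) fails the build
# whenever high-severity findings come back.
```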
11.4 Can documentation be customized (installation instructions, remediation
advice, finding explanation…etc)
Interesting point. Can we override the remediation advice given by the tool?
11.5 Can the defect prioritization scheme be customized?
Right! Can I integrate the results within my risk management system?
11.6 Can the tool be extended so that custom plugins could be developed for
it?
That part should go in the IDE integration section.
In summary, I believe that the SATEC needs to be restructured to address the
actual problems. We should also move away from any subjective criterion. I
believe that the SATEC should be able to be filled in by a tool vendor, or by
someone who will evaluate the tool. Eventually, we should provide a
spreadsheet that can be filled in.
Concerning the overall sections, the order should make sense as well.
Anyhow, I suggest the list rethink the current criteria and see
what can be measured properly, and what needs to be captured by any tool
evaluator. The following is just a suggestion (I came up with it in too
little time), but I believe it captures the interesting parts in a better
way.
Btw, I'm sorry to come back with such feedback quite late... but the
deadlines are too aggressive for me.