wasc-satec@lists.webappsec.org

WASC Static Analysis Tool Evaluation Criteria


Fwd: Comments on the direction of SATEC - And on every items. - RFC

Sherif Koussa
Tue, Sep 13, 2011 5:27 PM

Hi All,

I am forwarding the email below to everyone on the list due to the nature of
the email. Romain comments on the direction the SATEC project is going in, and
requests some major changes to the categories and sub-categories.

I would like to hear your opinion on what is proposed below.

Regards,
Sherif

---------- Forwarded message ----------
From: Romain Gaucher romain@webappsec.org
Date: Mon, Aug 22, 2011 at 11:47 AM
Subject: [WASC-SATEC] Comments on the direction of SATEC - And on every
items.
To: wasc-satec@lists.webappsec.org

Everyone, sorry for the loooong delay on the response, the past weeks have
been totally crazy.
Anyhow, here is what I think and what I would suggest, based on the current draft:

1. Tool Setup and Installation

"Setup and Installation" is not really interesting, I believe. The more
important is the platform support (can I run the tool from my linux box, my
mac, our windows server, etc).

1.1 Time required to perform initial installation

That's usually subjective, unless you say something like "always less than 2
hours", "always less than a day", etc. But then again, I find this quite
irrelevant to the problem.

1.2 Skills required to perform initial installation

Subjective.

1.3 Privileges required to perform initial installation

I don't find this item very informative. Okay, you need to have root access,
or admin access on the machine... or not.

1.4 Documentation setup accuracy

Subjective.

1.5 Platform Support

This one is interesting for the customers.

2. Performing a Scan

Logically, I would not talk about scanning just yet. After the platform
support section, I would talk about language and framework support.

2.1 Time required to perform a scan

This does not make any sense. "Time required to scan"... what? This question
is answerable, however, if we provide a proper test case and environment to
run the tool. But then again, it's quite misleading information.

2.2 Number of steps required to perform a scan

Many tools have scripting interfaces. Using scripts, you reduce your steps
from 7 to 1 (i.e., run the script). How does that count?
In summary, I find this information not interesting at all.
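
Just to make the "from 7 steps to 1" point concrete, here is a tiny sketch of
such a wrapper. Everything in it is hypothetical: the "sast-cli" command, its
sub-commands and the paths are invented for illustration and do not come from
any real tool.

    import java.io.IOException;
    import java.util.List;

    /**
     * Hypothetical wrapper that collapses a multi-step manual scan
     * (configure, translate sources, analyze, export results) into a
     * single invocation. The "sast-cli" command and its arguments are
     * invented for illustration only.
     */
    public class ScanRunner {

        // Run one external command and fail fast if it returns non-zero.
        private static void run(List<String> command) throws IOException, InterruptedException {
            Process p = new ProcessBuilder(command)
                    .inheritIO()   // show the tool's own output
                    .start();
            if (p.waitFor() != 0) {
                throw new IllegalStateException("Step failed: " + command);
            }
        }

        public static void main(String[] args) throws Exception {
            String project = args.length > 0 ? args[0] : "my-webapp";

            // Steps that would otherwise be clicked through one by one:
            run(List.of("sast-cli", "configure", "--project", project));
            run(List.of("sast-cli", "translate", "--source", "src/main/java"));
            run(List.of("sast-cli", "analyze", "--project", project));
            run(List.of("sast-cli", "export", "--project", project, "--format", "xml"));
        }
    }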

2.3 Skills required to perform a scan

I understand that some tools (like PolySpace) require someone to actually
design and model the suspected behavior of the program. But most tools do not
require that. Then again, how do we rate the user? Do we assume the user (who
runs the scan) will also look at the findings? Does he also set up the scan? I
definitely see the scan being run by security operations (mostly for
monitoring), and being set up by security engineers...

3. Tool Coverage:

"Tool Coverage" might be the most misleading term here. Coverage of what?!
Coverage of supported weaknesses, languages, version of languages,
framework, application coverage, entry point coverage, etc.?

3.1 Languages supported by the tool

Very important. Now, we should not limit ourselves to the languages; we should
go down to the framework-version level. Nowadays, the language is just a
means; most of the juicy stuff happens in the relationship with the
frameworks... Also, the behavior of a framework might differ from one version
to another...

3.2 Support for Semantic Analysis
3.3 Support for Syntactic Analysis

I do not understand these items. (Usually, "semantic" is used to refer to
something like AST-level knowledge.) I would honestly be more interested to
know whether the tool is properly capable of inter-procedural data flow
analysis, or whether it has some other limitations. Then again, I would prefer
not to talk about the underlying logic (and modeling) of the tool, since I
believe this is out of scope. Users don't really care about that; they just
want the tool to work perfectly. Whether you use a dataflow-based model,
abstract interpretation, or whatever one comes up with... don't care.

3.4 Ability of the tool to understand different components of a project
(.sql, .xml, .xsd, .properties…etc)

This is a very interesting item. When generalized a little bit, we can
derive several items:

  • Analysis support of configuration files (i.e., the tool gets knowledge
    from the configuration files)
  • Analysis support for multiple languages in separate files
  • Cross-language analysis support (the tool is capable of carrying its
    analysis from one language, let's say Java, to SQL, and back to Java);
    a sketch of this follows below
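
To make the cross-language point concrete, here is a small sketch. The stored
procedure name, the table and the parameter are invented for the example; the
point is only that a Java-only analysis stops at the prepareCall, while a
cross-language analysis would follow the value into the PL/SQL body and report
the issue back at the Java call site.

    import java.sql.CallableStatement;
    import java.sql.Connection;
    import java.sql.SQLException;

    public class OrderLookup {

        /**
         * The Java side looks harmless: the user-supplied value is passed as a
         * bind parameter to a stored procedure (names invented for the example).
         *
         * A Java-only analysis stops here. A cross-language analysis would follow
         * the value into the PL/SQL body of FIND_ORDERS, e.g.:
         *
         *   -- inside FIND_ORDERS(p_customer IN VARCHAR2)
         *   EXECUTE IMMEDIATE
         *     'SELECT * FROM orders WHERE customer = ''' || p_customer || '''';
         *
         * where the concatenation re-introduces SQL injection.
         */
        public void findOrders(Connection db, String customerFromRequest) throws SQLException {
            try (CallableStatement call = db.prepareCall("{call FIND_ORDERS(?)}")) {
                call.setString(1, customerFromRequest); // tainted value crosses into SQL here
                call.execute();
            }
        }
    }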

Another item that would be quite interesting is support for "new extensions",
or the redefinition of extensions. Let's say the tool recognizes ".pl" as Perl,
but I have all my stored procedures (in PL/SQL) using this extension; I'd like
to be able to tell the tool to treat .pl as PL/SQL for this application. The
same reasoning applies to brand-new extensions.

3.5 Coverage of Industry Standard Vulnerability Categories (OWASP Top 10,
SANS Top 25…etc)

Static analysis tools do not find vulnerabilities. They find source code
weaknesses (there is a huge difference). Now, I do not understand what
"coverage of industry standard vulnerability categories" means.

  • Is this category supposed to be about coverage of the types of "stuff" (or
    weaknesses, flaws, bugs, issues, etc.) that the tool can find? If so, we
    should use CWE, and nothing else.
  • Is this category about the reporting and classification of findings?
    (such as, "Oh noes, this finding is mapped to OWASP Top 10 risks... that's
    very bad for your PCI compliance!")

4. Detection Accuracy

Usually, that does not mean anything.

4.1 Number of false positives
4.2 Number of true negatives

My first comment here was "Gniii?", then I did s/Number/Rate and it made a
bit more sense.
I could understand why someone would want a rate of false positives and false
negatives, but true negatives? True negatives are the things that are not
reported by the tool, and it's good for the tool not to report them; an
example would be a data flow path that uses a proper validation routine before
sending the data to a sink. You do not want the tool to report that, and this
is a true negative.
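
Here is a minimal sketch of such a true negative (the table name and the
validation routine are made up; assume a plain JDBC sink): the user-controlled
value reaches a classic SQL sink, but only after a strict allow-list check, so
a tool that stays silent on this path is right, not blind.

    import java.sql.Connection;
    import java.sql.SQLException;
    import java.sql.Statement;
    import java.util.regex.Pattern;

    public class AccountLookup {

        // Strict allow-list: account IDs are 1 to 12 digits, nothing else.
        private static final Pattern ACCOUNT_ID = Pattern.compile("\\d{1,12}");

        /**
         * The user-controlled value reaches a classic SQL sink, but only after a
         * proper validation routine rejects anything that is not a plain number.
         * A tool that does NOT report this path is producing a true negative.
         */
        public void printAccount(Connection db, String accountIdFromRequest) throws SQLException {
            if (!ACCOUNT_ID.matcher(accountIdFromRequest).matches()) {
                throw new IllegalArgumentException("invalid account id");
            }
            // Safe by construction: the value can only be digits at this point.
            try (Statement stmt = db.createStatement()) {
                stmt.executeQuery("SELECT balance FROM accounts WHERE id = " + accountIdFromRequest);
            }
        }
    }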

By the way, the rates of FP/FN are very interesting from an experimental point
of view, but there is no way to get this data to mean anything for Joe the
project manager who wants to buy a tool. Most likely your data will be very
different from his (if he runs the same experiment on his own applications).
Sad reality check: tool results depend a lot on the application.

4.3 Accuracy %

Accuracy of what? Compared to what? Nonsense to me, cf. the previous point. We
cannot measure that in a meaningful way.

5. Triage and Remediation Process

Do we want to talk about the quality of the UI provided by the tool to
facilitate the triage? IMO, the remediation process is out of scope for a SAST
tool.

5.1 Average time to triage a finding

This seems to me like rating your assessor more than the tool you use.

5.2 Quality of data surrounding a finding (explanation, tracing, trust
level…etc)

That is indeed very important information. As an assessor, I want to know why
the heck this tool reported this finding to me. Not only do I want paths,
confidence, data flow info, etc., but I also want to know the internals. Some
tools will report the pre-conditions and post-conditions that generated the
finding. This is extremely useful for advanced use of the tools. I understand
that most tools do not report that, so at least reporting the rule ID (or
something I can track later on and make sense of) is important.
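
To illustrate what "data surrounding a finding" could look like, here is a
small, hypothetical structure; none of these field names come from a real
tool, it only lists the kind of information an assessor would want attached to
each finding (rule ID, classification, confidence, trace, pre/post-conditions).

    import java.util.List;

    /**
     * Hypothetical shape of the data an assessor would want with each finding.
     * Field names are illustrative only, not taken from any real tool's format.
     */
    public record Finding(
            String ruleId,            // e.g. "SQLI-042", something trackable later on
            String cweId,             // classification, e.g. "CWE-89"
            double confidence,        // how sure the engine is, 0.0 - 1.0
            List<TraceStep> path,     // source-to-sink data flow trace
            String precondition,      // e.g. "argument 1 of executeQuery is attacker-controlled"
            String postcondition) {   // e.g. "query string reaches the database unmodified"

        /** One hop in the data flow trace. */
        public record TraceStep(String file, int line, String description) {}
    }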

5.3 Ability to mark findings as false positive

Marking a finding as FP might have several meanings. Does this mean:

  • Mark the findings as FP for the report?
  • Mark the findings as FP for the engine, so that next time it will
    encounter a similar case, it won't report it?

5.4 Ability to “diff” assessments

Very important indeed.

5.5 Ability to merge assessments

Tracking, merging, and combining assessments is definitely part of the
workflow...

5.6 Correctness of remediation advice
5.7 Completeness of remediation advice

I hope no one actually relies on the tool to give proper remediation advice.
Tools are usually fine for giving an idea, but there is no way they will give
you a good solution for your case (even though, in theory, they have lots of
information to do so).

5.8 Does the tool automatically prioritize defects

Prioritize what? Is this category supposed to be about the severity rating? Or
is it about prioritization at the engine level, so that the tool misses lots
of stuff (yeah, that's usually what happens when the flow gets complex)?

6. UI Simplicity and Intuitiveness
6.1 Quality of triage interface (need a way to measure this)
6.2 Quality of remediation interface (need a way to measure this)

Subjective.

6.3 Support for IDE plug-ins both out of the box and on-demand

"Integration with IDEs", and possible support for new IDEs. Yes, that's
important to get at least, a list of integrated IDEs.

6.4 Quality of tools’ out of the box plugin UI

Subjective. Why not talk about the features available through the plugin instead?

7. Product Update Process

It's indeed good to know whether automated/federated/etc. updates are possible.

7.1 Frequency of signature update

Interesting, but the reader must be careful not to base much of the decision
on that. If the tool gets a new pack of rules every week or every month, that
does not say much about the quality...

7.2 Relevance of signatures to evolving threats
7.3 Re-activeness to evolving threats

Are we talking about new weaknesses? The word "threat" is very confusing
here... and does not make sense to me in the context of SAST.

8. Product Maturity and Scalability

Would be good to know indeed, though... how do we get the data?

8.1 Peak memory usage

42GB?! That's very subjective data that depends on many factors (machine,
configuration, application, etc.).

8.2 Number of scans done before a crash or serious degradation in performance

42, but only because it was 71 degrees in the room and the train was passing
every 2.5 days.

8.3 Maximum lines of code the tool can scan per project

It would be good to talk about the scalability of the tool, and how to improve
it. For example, can I scan the same application with several machines
(parallelism)? If I add more RAM/CPU, do I get much better results? Is there a
known limit?

8.4 What languages does the tool support?

This should be covered in a different section.

9. Enterprise Offerings

This is also very interesting for companies. However, the enterprise offerings
are usually a central solution to host findings, review findings, etc. This is
not really SAST, but SAST-management. Do we want to talk about that? I'm happy
to have this in the criteria...

9.1 Ability to integrate with major bug tracking systems

This is mostly a general comment, but instead of a boolean answer, we should
ask for the list of supported bug tracking systems. Also, it's important to be
able to customize this, and to be able to integrate with JoeBugTracker...

9.2 Ability to integrate with enterprise software configuration management

In what regard?

10. Reporting Capabilities
10.1 Quality of reports

Subjective.

10.2 Availability of role-based reports

It's indeed important to report different kinds of data for the engineer, dev,
QA, managers, etc. Eventually, we're talking about data reporting here, and
tools should provide several ways to slice and represent the data for the
different audiences.

10.3 Availability of report customization

Yup, though, to what extent is the report customizable? Can I just change the
logo, or can I integrate the findings into my Word template?

11. Tool Customization and Automation

I feel that we're finally getting to the interesting part. Every mature use of
SAST has to make use of automation and tool customization. This section is a
very important one, and we should emphasize it as much as we can.

11.1 Can custom rules be added?

Right, that's the first question to ask. Does the tool support finding
customization? Now, we need many other points, such as... What kinds of rules
are supported? Can we specify/create a new type of weakness/finding/category?

11.2 Do the rules need learning new language\script?

Most likely the answer will be "yes", unless it's only GUI based. My point is
that even XML rules represent a "language" to describe the rules...
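
Whatever the rule "language" is (XML, a DSL, or actual code), the idea is the
same. Here is a purely hypothetical sketch of a code-level custom rule; the
Rule, MethodCall and Issue types are invented for illustration and no real
tool exposes exactly this API.

    /**
     * Hypothetical custom-rule API: the Rule, MethodCall and Issue types are
     * invented for illustration and do not correspond to any real product.
     */
    public class BannedRandomRule implements Rule {

        @Override
        public Issue check(MethodCall call) {
            // Flag java.util.Random used in security-sensitive code and map the
            // finding to a custom category defined alongside the rule.
            if ("java.util.Random".equals(call.ownerType()) && "nextInt".equals(call.methodName())) {
                return new Issue("CUSTOM-CRYPTO-001",
                                 "Insecure randomness: use java.security.SecureRandom instead",
                                 call.location());
            }
            return null; // no finding for this call
        }
    }

    /* Minimal interfaces so the sketch is self-contained. */
    interface Rule { Issue check(MethodCall call); }
    interface MethodCall { String ownerType(); String methodName(); String location(); }
    record Issue(String id, String message, String location) {}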

11.3 Can the tool be scripted? (e.g. integrated into ANT build script or other
build script)

Build automation is crucial, but to me it is different from automation. This
item should be in a different section.

11.4 Can documentation be customized (installation instructions, remediation
advice, finding explanation…etc)

Interesting point. Can we override the remediation advice given by the tool?

11.5 Can the defect prioritization scheme be customized?

Right! Can I integrate the results with my risk management system?

11.6 Can the tool be extended so that custom plugins could be developed for
other IDEs?

That part should be in the IDE integration section.

In summary, I believe that the SATEC needs to be restructured to address the
actual problems. We should also move away from any subjective criterion. I
believe that the SATEC should be able to be filled in by a tool vendor or by
someone who will evaluate the tool. Eventually, we should provide a
spreadsheet that can be filled in.

Concerning the overall sections, the order should make sense as well.

Anyhow, I suggest the list rethink the current criteria and see what can be
measured properly, and what needs to be captured by any tool evaluator. The
following is just a suggestion (I came up with it in very little time), but I
believe it captures the interesting parts in a better order:

  1. Platform support
    1.1 OS support
    1.2 Scalability tuning (support for 64bits, etc.)
  2. Application technology support
    2.1 Language support (up to the version of language)
    2.2 Framework support
  3. Scan, command and control
    3.1 Scan configuration
    3.2 Build system integration
    3.3 IDE integration
    3.4 Command line support
    3.5 Automation support
    3.6 Enterprise offerings (need of a better terminology)
  4. Application analysis
    4.1 Testing capabilities (weakness coverage, finding-level data, etc.)
    4.2 Customization
    4.3 Triage capabilities
    4.4 Scan results post-processing
  5. Reporting
    5.1 Reports for different audiences
    5.2 Report customization
    5.3 Finding-level reporting information
    5.3.1 Classification/Taxonomy mapping (i.e., CWE, OWASP, WASC, etc.)
    5.3.2 Finding description (paths, pre-post conditions, etc.)
    5.3.3 Finding remediation (available, customizable, etc.)
  6. Miscellaneous
    6.1 Knowledge update (rules update)
    6.2 Integration in bug trackers (list of supported BT, customization, etc.)

Btw, I'm sorry to come back with such feedback quite late... but the
deadlines are too aggressive for me.

Romain


wasc-satec mailing list
wasc-satec@lists.webappsec.org
http://lists.webappsec.org/mailman/listinfo/wasc-satec_lists.webappsec.org

Alen Zukich
Tue, Sep 13, 2011 7:12 PM

My comments below.

alen

From: wasc-satec-bounces@lists.webappsec.org [mailto:wasc-satec-bounces@lists.webappsec.org] On Behalf Of Sherif Koussa
Sent: September-13-11 1:28 PM
To: wasc-satec@lists.webappsec.org
Subject: [WASC-SATEC] Fwd: Comments on the direction of SATEC - And on every items. - RFC

Hi All,

I am forwarding the email below to everyone on the list due to the nature of the email. Romain comments on the direction the SATEC project is going in, and requests some major changes to the categories and sub-categories.

I would like to hear your opinion on what is proposed below.

Regards,
Sherif
---------- Forwarded message ----------
From: Romain Gaucher <romain@webappsec.org>
Date: Mon, Aug 22, 2011 at 11:47 AM
Subject: [WASC-SATEC] Comments on the direction of SATEC - And on every items.
To: wasc-satec@lists.webappsec.org

Everyone, sorry for the loooong delay on the response, the past weeks have been totally crazy.
Anyhow, here is what I think and what I would suggest, based on the current draft:

  1. Tool Setup and Installation

"Setup and Installation" is not really interesting, I believe. The more important is the platform support (can I run the tool from my linux box, my mac, our windows server, etc).
[az] Agree.  Platform is really the only criteria.

1.1 Time required to perform initial installation

That's usually subjective, unless you say something like "always less than 2 hours", "always less than a day", etc. But then again, I find this totally quite irrelevant to the problem.
[az] Is this really a pain point these days?

1.2 Skills required to perform initial installation

Subjective.
[az] I don't know if it is subjective but it doesn't help with much.  Installation used to be a problem with tools but that really has gone away.

1.3 Privileges required to perform initial installation

I don't find this item very informative. Okay, you need to have root access, or admin access on the machine... or not.
[az] Agreed.

1.4 Documentation setup accuracy

Subjective.
[az] Not even sure what this means?

1.5 Platform Support

This one is interesting for the customers.

  2. Performing a Scan

Logically, I would not talk about scanning just now. But, after the platform support section, I would talk about language, framework support.

2.1 Time required to perform a scan
This does not make any sense. "Time required to scan"... what? This question however, is answerable if we provide a proper test case and environment to run the tool. But then again, it's a quite misleading information.
[az] This is actually a very common criterion.  A better way is to measure relative to the native build time; i.e. typically the tool takes 1-3X the native build (or compile) time.

2.2 Number of steps required to perform a scan
Many tools have scripting interfaces. Using scripts, you reduce your steps from 7, to 1 (i.e., run the script). How does that count?
In summary, I find this information not interesting at all.
[az] Agreed.

2.3 Skills required to perform a scan

I understand that some tools (like PolySpace) require someone to actually design and model the suspected behavior of the program. But most tools do not require that. Then again, how to rate the user? Do we assume the user (who runs the scan) will also look at the findings? Does he also setup the scan? I definitely see the scan being run by security operation (mostly for monitoring), and being setup by security engineers...
[az] Don't know if this category is helpful.

  3. Tool Coverage:
    "Tool Coverage" might be the most misleading term here. Coverage of what?! Coverage of supported weaknesses, languages, version of languages, framework, application coverage, entry point coverage, etc.?

3.1 Languages supported by the tool
Very important. Now, we should not limit ourselves to the languages, but we should go at the version of framework level. Nowadays, the language is just a mean, most of the juicy stuff happen in the relationship with the frameworks... Also, the behavior of the frameworks might be different from one version to one another...

3.2 Support for Semantic Analysis
3.3 Support for Syntactic Analysis
I do not understand these items. (Usually, "semantic" is used to say something like AST-level type of knowledge). I would be, honestly, more interested to know if the tool is properly capable of inter-procedural data flow analysis, or if it has some other limitations. Then again, I would prefer not to talk about the underlying logics (and modeling) of the tool since I believe this is out of scope. Users don't really care about that, they just want the tool to work perfectly. If you use a dataflow based model, abstract interpretation, or whatever one comes up with ... don't care.

3.4 Ability of the tool to understand different components of a project (.sql, .xml, .xsd, .properties...etc)
This is a very interesting item. When generalized a little bit, we can derive several items:

  • Analysis support of configuration files (i.e., the tool gets knowledge from the configuration files)
  • Analysis support for multiple languages in separated files
  • Cross-languages analysis support (the tool is capable of performing its analysis from one language, let's say Java, to SQL, and back to Java)

Another item that would be quite interesting, is the support for "new extensions", or redefinition of extensions. Let's say the tool does recognize ".pl" as perl, but that I have all my stored procedures (in PL/SQL) with this extension, I'd like to be able to tell the tool to consider the .pl to be PL/SQL for this application. The same reasoning needs to be done for new extensions.

3.5 Coverage of Industry Standard Vulnerability Categories (OWASP Top 10, SANS Top 25...etc)

Static analysis tools do not find vulnerabilities. They find source code weaknesses (there is a huge difference). Now, I do not understand what "coverage of industry standard vulnerability categories" mean.

  • Is this category supposed to be about coverage of type of "stuff" (or weaknesses, flaws, bugs, issues, etc.) that the tool can find? If so, we should use CWE, and nothing else.
  • Is this category about the reporting and classification of findings? (such as, "Oh noes, this finding is mapped to OWASP Top 10 risks... that's very bad for your PCI compliance!")

  4. Detection Accuracy

Usually, that does not mean anything.

4.1 Number of false positives
4.2 Number of true negatives

My first comment here was "Gniii?", then I did s/Number/Rate and it made a bit more sense.
I could understand why someone would want to get a rate of false-positive, and false-negatives, but true-negatives? True negatives, are the things that are not reported by the tool, and it's good from the tool not to report them, and examples would be data flow path that uses a proper validation routine before sending the data to a sink. You do not want the tool to report such, and this is a true-negative.

By the way, the rate of FP/FN are very interesting for an experiment point of view, but there is no way to get this data to mean anything for Joe the project manager who wants to get a tool. Most likely your data will be very different than his (if you're making the same experiment on your applications). Sad reality fix: tools results depend a lot on the application.
[az] True, but people always ask this.  Every vendor has a number they use.

4.3 Accuracy %

Accuracy of what? Compared to what? Non-sense to me, cf. previous point. We cannot measure that in a meaningful way.
[az] Agreed.

  5. Triage and Remediation Process

Do we want to talk about the quality of the UI provided by the tool to facilitate the triage? IMO, the remediation process is out of scope for a SAST.
[az] Don't agree, I think this is very important.  Especially when it comes to duplication among targets.

5.1 Average time to triage a finding

This seems to me like rating your assessor more than the tool you use.
[az] Plus it is hard to really put any average value.

5.2 Quality of data surrounding a finding (explanation, tracing, trust level...etc)

Those are indeed very important information. As an assessor, I want to know why the heck this tool reported this finding to me. Not only I want to have paths, confidence, data flow info, etc. but I want to know the internals.
Some tools will report the pre-conditions and post-conditions that generated the finding. This is extremely useful for advanced use of the tools. I understand that most tools do not report that, so at least reporting the rule ID (or something I can track later on, and make sense of) is important.

5.3 Ability to mark findings as false positive

Mark a finding as FP might have several meaning. Does this mean:

  • Mark the findings as FP for the report?
  • Mark the findings as FP for the engine, so that next time it will encounter a similar case, it won't report it?

5.4 Ability to "diff" assessments

Very important indeed.

5.5 Ability to merge assessments

Tracking, merging, combining assessment is definitely part of the workflow...

5.6 Correctness of remediation advice
5.7 Completeness of remediation advice

I hope no one actually relies on the tool to give proper remediation advice. They're usually fine to give an idea, but no way they will give you a good solution, for your case (even though, in theory they have lots of information to do so).

5.8 Does the tool automatically prioritize defects

Prioritize what? Is this category supposed to be talking about the severity rating? Is this talking about prioritization at the engine level so that the tool misses lots of stuff (yeah, that's usually what happen when the flow gets complex).
[az] I understand it as severity rating.

  6. UI Simplicity and Intuitiveness
    6.1 Quality of triage interface (need a way to measure this)
    6.2 Quality of remediation interface (need a way to measure this)

Subjective.
[az] Perhaps this can be measured by looking if vendors have status (label a defect) and comment changes per vulnerability.  The ability to keep track of these status and comment changes.

6.3 Support for IDE plug-ins both out of the box and on-demand

"Integration with IDEs", and possible support for new IDEs. Yes, that's important to get at least, a list of integrated IDEs.

6.4 Quality of tools' out of the box plugin UI

Subjective. Why not talking about the features available though the plugin.

  7. Product Update Process

It's indeed good to know that automated/federated/etc. updates are possible.

7.1 Frequency of signature update

Interesting, but the reader must be careful not to make much decision based on that. If the tool gets a new pack of rules every week or every months, that does not mean much about the quality...

7.2 Relevance of signatures to evolving threats
7.3 Re-activeness to evolving threats

Are we talking about new weaknesses? The word "threat" is very confusing here... and does not make sense to me in the context of SAST.

  8. Product Maturity and Scalability

Would be good to know indeed, though... how to get the data?

8.1 Peak memory usage

42GB?! That's a very subjective data that depends on many factors (machine, configuration, application, etc. etc.)

8.2 Number of scans done before a crash or serious degradation in performance

42, but only because it was 71 degree in the room, and the train was passing every 2.5 days.

8.3 Maximum lines of code the tool can scan per project

It would be good to talk about scalability of the tool, and how to improve it. For examples, can I scan the same application with several machines (parallelism)? If I add more RAM/CPU, do I get much better results? Is there a known limit?

8.4 What languages does the tool support?

This should be covered in a different section.

  9. Enterprise Offerings

This is also very interesting for companies. However, the enterprise offerings, are usually, central solution host findings, review findings, etc. This is not really SAST, but SAST-management. Do we want to talk about that? I'm happy to have this in the criteria...

9.1 Ability to integrate with major bug tracking systems

This is mostly a general comment, but instead of a boolean answer. We should ask for the supported bug tracking systems.
Also, it's important to customize this, and to be able to integrate with JoeBugTracker...

9.2 Ability to integrate with enterprise software configuration management

To what regard?
[az] Not really necessary for static analysis tools, but it is a very common question.

  10. Reporting Capabilities
    10.1 Quality of reports

Subjective.
[az] Sticking to types of reports below is probably fine.

10.2 Availability of role-based reports

It's indeed important to report different kind of data for the engineer, dev, QA, managers, etc. Eventually, we're talking about data reporting here, and tools should provide several ways to slice and represent the data for the different audience.

10.3 Availability of report customization

Yup, though, to what extent is the report customizable? Can I just change the logo, or can I integrate the findings in my word template?

  11. Tool Customization and Automation

I feel that we're finally going to touch the interesting part. Every mature use of SAST have to make use of automation, and tool customization. This section is a very important one, and we should emphasize it as much as we can.

11.1 Can custom rules be added?

Right, that's the first question to ask. Does the tool support finding support customization? Now, we need many other points, such as ... What kind of rules are supported? Can we specific/create a new type of weakness/findings/category?

11.2 Do the rules need learning new language\script?

Most likely it will be "yes", unless it's only GUI based. My point is that even XML rules represent a "language" to describe the rules...

11.3 Can the tool be scripted? (e.g. integrated into ANT build script or other build script)

Build automation is crucial, but to me, is different than automation. This item should be in a different section.

11.4 Can documentation be customized (installation instructions, remediation advice, finding explanation...etc)

Interesting point. Can we overwrite the remediation given by a tool?

11.5 Can the defect prioritization scheme customized?

Right! Can I integrate the results within my risk management system?

11.6 Can the tool be extended so that custom plugins could be developed for other IDEs?

That part should be in the IDE integration.

In summary, I believe that the SATEC needs to be restructured to address the actual problems. We should also move away from any subjective criterion. I believe that the SATEC should be able to be filled-in by a tool vendor, or someone who will evaluate the tool. Eventually, we should provide a spreadsheet that could be filled.

Concerning the overall sections, the order should make sense as well.

Anyhow, I suggest the list to rethink about the current criteria and see what can be measured properly, and what needs to be captured by any tool evaluator. The following is just a suggestion (came up with that in too little time), but I believe it captures the interesting part in a better order:

  1. Platform support
    1.1 OS support
    1.2 Scalability tuning (support for 64bits, etc.)
  2. Application technology support
    2.1 Language support (up to the version of language)
    2.2 Framework support
  3. Scan, command and control
    3.1 Scan configuration
    3.2 Build system integration
    3.3 IDE integration
    3.4 Command line support
    3.5 Automation support
    3.6 Enterprise offerings (need of a better terminology)
  4. Application analysis
    4.1 Testing capabilities (weakness coverage, finding-level data, etc.)
    4.2 Customization
    4.3 Triage capabilities
    4.4 Scan results post-processing
  5. Reporting
    5.1 Reports for different audiences
    5.2 Report customization
    5.3 Finding-level reporting information
    5.3.1 Classification/Taxonomy mapping (i.e., CWE, OWASP, WASC, etc.)
    5.3.2 Finding description (paths, pre-post conditions, etc.)
    5.3.3 Finding remediation (available, customizable, etc.)
  6. Miscellaneous
    6.1 Knowledge update (rules update)
    6.2 Integration in bug trackers (list of supported BT, customization, etc.)

Btw, I'm sorry to come back with such feedback quite late... but the deadlines are too aggressive for me.

Romain


wasc-satec mailing list
wasc-satec@lists.webappsec.org
http://lists.webappsec.org/mailman/listinfo/wasc-satec_lists.webappsec.org

HS
Herman Stevens
Wed, Sep 14, 2011 6:58 AM

1.1 Time required to perform initial installation
1.2 Skills required to perform initial installation

This might be an issue since some of the tools require 'consultancy' services to do the installation. Look at some vendors' sites: they sell you 10 days of consultancy just to do a POC ...

It might be better to rename this 'effort required': easy installer (internal staff can do it), light integration (internal staff), heavy integration/configuration (only trained staff can do it), or extensive manual integration/configuration by external consultants ...

Herman

From: wasc-satec-bounces@lists.webappsec.org [mailto:wasc-satec-bounces@lists.webappsec.org] On Behalf Of Sherif Koussa
Sent: Wednesday, 14 September, 2011 1:28 AM
To: wasc-satec@lists.webappsec.org
Subject: [WASC-SATEC] Fwd: Comments on the direction of SATEC - And on every items. - RFC

Hi All,

I am forwarding the email below to everyone on the list due to the nature of the email. Romain basically comments on the direction the SATEC project is going in, in addition to requesting some major changes to the categories and sub-categories.

I would like to hear your opinion on what is proposed below.

Regards,
Sherif
---------- Forwarded message ----------
From: Romain Gaucher <romain@webappsec.org>
Date: Mon, Aug 22, 2011 at 11:47 AM
Subject: [WASC-SATEC] Comments on the direction of SATEC - And on every items.
To: wasc-satec@lists.webappsec.org

Everyone, sorry for the loooong delay on the response, the past weeks have been totally crazy.
Anyhow, here is what I think and what I would suggest, based on the current draft:

  1. Tool Setup and Installation

"Setup and Installation" is not really interesting, I believe. More important is the platform support (can I run the tool from my Linux box, my Mac, our Windows server, etc.).

1.1 Time required to perform initial installation

That's usually subjective, unless you say something like "always less than 2 hours", "always less than a day", etc. But then again, I find this quite irrelevant to the problem.

1.2 Skills required to perform initial installation

Subjective.

1.3 Privileges required to perform initial installation

I don't find this item very informative. Okay, you need to have root access, or admin access on the machine... or not.

1.4 Documentation setup accuracy

Subjective.

1.5 Platform Support

This one is interesting for the customers.

  2. Performing a Scan

Logically, I would not talk about scanning just now. But, after the platform support section, I would talk about language, framework support.

2.1 Time required to perform a scan
This does not make any sense. "Time required to scan"... what? This question, however, is answerable if we provide a proper test case and environment to run the tool. But then again, it's quite misleading information.

2.2 Number of steps required to perform a scan
Many tools have scripting interfaces. Using scripts, you reduce your steps from 7, to 1 (i.e., run the script). How does that count?
In summary, I find this information not interesting at all.

2.3 Skills required to perform a scan

I understand that some tools (like PolySpace) require someone to actually design and model the suspected behavior of the program. But most tools do not require that. Then again, how to rate the user? Do we assume the user (who runs the scan) will also look at the findings? Does he also setup the scan? I definitely see the scan being run by security operation (mostly for monitoring), and being setup by security engineers...

  3. Tool Coverage:
    "Tool Coverage" might be the most misleading term here. Coverage of what?! Coverage of supported weaknesses, languages, version of languages, framework, application coverage, entry point coverage, etc.?

3.1 Languages supported by the tool
Very important. Now, we should not limit ourselves to the languages, but we should go down to the framework-version level. Nowadays, the language is just a means; most of the juicy stuff happens in the relationship with the frameworks... Also, the behavior of the frameworks might be different from one version to another...

3.2 Support for Semantic Analysis
3.3 Support for Syntactic Analysis
I do not understand these items. (Usually, "semantic" is used to say something like AST-level type of knowledge.) I would be, honestly, more interested to know if the tool is properly capable of inter-procedural data flow analysis, or if it has some other limitations. Then again, I would prefer not to talk about the underlying logic (and modeling) of the tool since I believe this is out of scope. Users don't really care about that; they just want the tool to work perfectly. If you use a dataflow-based model, abstract interpretation, or whatever one comes up with ... don't care.

3.4 Ability of the tool to understand different components of a project (.sql, .xml, .xsd, .properties...etc)
This is a very interesting item. When generalized a little bit, we can derive several items:

  • Analysis support of configuration files (i.e., the tool gets knowledge from the configuration files)
  • Analysis support for multiple languages in separate files
  • Cross-language analysis support (the tool is capable of performing its analysis from one language, let's say Java, to SQL, and back to Java; see the sketch below)

Another item that would be quite interesting is support for "new extensions", or redefinition of extensions. Say the tool recognizes ".pl" as Perl, but I have all my stored procedures (in PL/SQL) with this extension; I'd like to be able to tell the tool to treat .pl as PL/SQL for this application. The same reasoning applies to entirely new extensions.
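To make the cross-language bullet above concrete, here is a purely illustrative Java snippet (the class, method, and column names are invented for this email, not taken from any tool or benchmark). The tainted parameter crosses from Java into the embedded SQL text, and the query result flows back into Java; a tool that stops at the string concatenation only sees half of the picture.

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;

// Hypothetical example: a tainted HTTP parameter crosses from Java into SQL,
// and the query result flows back into Java (and, eventually, into HTML output).
// A single-language analysis stops at the string concatenation; a cross-language
// analysis also parses the embedded SQL to keep following the data.
public class OrderLookup {

    // 'customerName' is attacker-controlled (e.g., taken from a request parameter).
    public String findOrderIds(Connection db, String customerName) throws Exception {
        // Java -> SQL: the tainted value is spliced into the query text.
        String sql = "SELECT order_id FROM orders WHERE customer = '" + customerName + "'";
        try (Statement stmt = db.createStatement();
             ResultSet rs = stmt.executeQuery(sql)) {      // SQL injection sink
            StringBuilder out = new StringBuilder();
            while (rs.next()) {
                // SQL -> Java: data read back from the database; if it is later
                // echoed into a page without encoding, it is also an XSS candidate.
                out.append(rs.getString("order_id")).append('\n');
            }
            return out.toString();
        }
    }
}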

3.5 Coverage of Industry Standard Vulnerability Categories (OWASP Top 10, SANS Top 25...etc)

Static analysis tools do not find vulnerabilities. They find source code weaknesses (there is a huge difference). Now, I do not understand what "coverage of industry standard vulnerability categories" means.

  • Is this category supposed to be about coverage of the types of "stuff" (or weaknesses, flaws, bugs, issues, etc.) that the tool can find? If so, we should use CWE, and nothing else.
  • Is this category about the reporting and classification of findings? (Such as, "Oh noes, this finding is mapped to OWASP Top 10 risks... that's very bad for your PCI compliance!")
  4. Detection Accuracy

Usually, that does not mean anything.

4.1 Number of false positives
4.2 Number of true negatives

My first comment here was "Gniii?", then I did s/Number/Rate and it made a bit more sense.
I could understand why someone would want to get a rate of false positives and false negatives, but true negatives? True negatives are the things that are not reported by the tool, and it is good that the tool does not report them; an example would be a data flow path that uses a proper validation routine before sending the data to a sink. You do not want the tool to report such a path, and that is a true negative.

By the way, the rates of FP/FN are very interesting from an experimental point of view, but there is no way to get this data to mean anything for Joe the project manager who wants to get a tool. Most likely your data will be very different from his (if you're running the same experiment on your own applications). Sad reality check: tool results depend a lot on the application.

4.3 Accuracy %

Accuracy of what? Compared to what? Nonsense to me, cf. previous point. We cannot measure that in a meaningful way.
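For what it's worth, if someone does run such an experiment on their own applications, the arithmetic is the easy part; the hard part is having a labeled benchmark in the first place. A minimal, made-up illustration in Java (the counts below are invented and carry no meaning outside this example):

// Illustrative only: turning FP/FN/TN counts from a labeled benchmark into rates.
// The numbers are made up; as noted above, they are only meaningful for the
// application(s) they were measured on.
public class BenchmarkRates {
    public static void main(String[] args) {
        int truePositives  = 40;   // real weaknesses the tool reported
        int falsePositives = 10;   // findings that are not real weaknesses
        int falseNegatives = 20;   // real weaknesses the tool missed
        int trueNegatives  = 930;  // clean locations the tool correctly stayed quiet about

        double precision         = (double) truePositives / (truePositives + falsePositives);
        double recall            = (double) truePositives / (truePositives + falseNegatives);
        double falsePositiveRate = (double) falsePositives / (falsePositives + trueNegatives);

        System.out.printf("precision=%.2f recall=%.2f fp-rate=%.3f%n",
                precision, recall, falsePositiveRate);
        // The true-negative count depends entirely on how you decide to enumerate
        // "places where nothing should be reported", which is why a raw "number of
        // true negatives" is so hard to interpret.
    }
}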

  5. Triage and Remediation Process

Do we want to talk about the quality of the UI provided by the tool to facilitate the triage? IMO, the remediation process is out of scope for a SAST.

5.1 Average time to triage a finding

This seems to me like rating your assessor more than the tool you use.

5.2 Quality of data surrounding a finding (explanation, tracing, trust level...etc)

That is indeed very important information. As an assessor, I want to know why the heck this tool reported this finding to me. Not only do I want to have paths, confidence, data flow info, etc., but I want to know the internals.
Some tools will report the pre-conditions and post-conditions that generated the finding. This is extremely useful for advanced use of the tools. I understand that most tools do not report that, so at least reporting the rule ID (or something I can track later on, and make sense of) is important.

5.3 Ability to mark findings as false positive

Marking a finding as FP might have several meanings. Does this mean:

  • Mark the findings as FP for the report?
  • Mark the findings as FP for the engine, so that next time it will encounter a similar case, it won't report it?

5.4 Ability to "diff" assessments

Very important indeed.

5.5 Ability to merge assessments

Tracking, merging, and combining assessments is definitely part of the workflow...
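As a purely hypothetical sketch of what "diff" boils down to (the finding key, rule IDs, and file names below are invented; real tools each have their own matching heuristics): reduce each finding to a stable key and compare the key sets of two scans.

import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: findings are reduced to a stable key (rule ID + file +
// sink, deliberately not the line number, which moves between scans), and the
// key sets of two scans are compared to label findings as fixed, new, or recurring.
public class AssessmentDiff {

    static String key(String ruleId, String file, String sink) {
        return ruleId + "|" + file + "|" + sink;
    }

    public static void main(String[] args) {
        Set<String> previousScan = new HashSet<>(Set.of(
                key("SQLI-001", "OrderLookup.java", "executeQuery"),
                key("XSS-004", "ReportPage.java", "print")));
        Set<String> currentScan = new HashSet<>(Set.of(
                key("SQLI-001", "OrderLookup.java", "executeQuery"),
                key("PATH-002", "Export.java", "FileOutputStream")));

        Set<String> fixed = new HashSet<>(previousScan);
        fixed.removeAll(currentScan);        // reported before, gone now
        Set<String> introduced = new HashSet<>(currentScan);
        introduced.removeAll(previousScan);  // new in this scan
        Set<String> recurring = new HashSet<>(currentScan);
        recurring.retainAll(previousScan);   // still open

        System.out.println("fixed=" + fixed);
        System.out.println("new=" + introduced);
        System.out.println("recurring=" + recurring);
    }
}

The interesting evaluation question is then how stable the tool's own matching is when code moves around between two scans.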

5.6 Correctness of remediation advice
5.7 Completeness of remediation advice

I hope no one actually relies on the tool to give proper remediation advice. They're usually fine for giving an idea, but there is no way they will give you a good solution for your case (even though, in theory, they have lots of information to do so).

5.8 Does the tool automatically prioritize defects

Prioritize what? Is this category supposed to be talking about the severity rating? Is this talking about prioritization at the engine level, so that the tool misses lots of stuff? (Yeah, that's usually what happens when the flow gets complex.)

  6. UI Simplicity and Intuitiveness
    6.1 Quality of triage interface (need a way to measure this)
    6.2 Quality of remediation interface (need a way to measure this)

Subjective.

6.3 Support for IDE plug-ins both out of the box and on-demand

"Integration with IDEs", and possible support for new IDEs. Yes, it's important to get at least a list of integrated IDEs.

6.4 Quality of tools' out of the box plugin UI

Subjective. Why not talk about the features available through the plugin?

  7. Product Update Process

It's indeed good to know that automated/federated/etc. updates are possible.

7.1 Frequency of signature update

Interesting, but the reader must be careful not to base much of a decision on that. If the tool gets a new pack of rules every week or every month, that does not say much about the quality...

7.2 Relevance of signatures to evolving threats
7.3 Re-activeness to evolving threats

Are we talking about new weaknesses? The word "threat" is very confusing here... and does not make sense to me in the context of SAST.

  8. Product Maturity and Scalability

Would be good to know indeed, though... how to get the data?

8.1 Peak memory usage

42GB?! That's a very subjective data point that depends on many factors (machine, configuration, application, etc.).

8.2 Number of scans done before a crash or serious degradation in performance

42, but only because it was 71 degrees in the room, and the train was passing every 2.5 days.

8.3 Maximum lines of code the tool can scan per project

It would be good to talk about the scalability of the tool, and how to improve it. For example, can I scan the same application with several machines (parallelism)? If I add more RAM/CPU, do I get much better results? Is there a known limit?

8.4 What languages does the tool support?

This should be covered in a different section.

  9. Enterprise Offerings

This is also very interesting for companies. However, the enterprise offerings are usually a central solution to host findings, review findings, etc. This is not really SAST, but SAST-management. Do we want to talk about that? I'm happy to have this in the criteria...

9.1 Ability to integrate with major bug tracking systems

This is mostly a general comment, but instead of a boolean answer, we should ask for the list of supported bug tracking systems.
Also, it's important to be able to customize this, and to be able to integrate with JoeBugTracker...
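Purely for illustration of what "ability to integrate" could mean beyond a boolean answer, here is a hypothetical Java sketch that pushes one finding to the made-up JoeBugTracker over plain HTTP (the URL, token, and JSON fields are all invented):

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical sketch: export a single finding to a made-up bug tracker REST API.
// Nothing here refers to a real product; it only shows that "integration" can be
// as thin as an HTTP POST with a well-defined payload.
public class BugTrackerExport {
    public static void main(String[] args) throws Exception {
        String json = "{\"title\":\"SQL injection in OrderLookup.findOrderIds\","
                    + "\"severity\":\"high\",\"cwe\":89,"
                    + "\"file\":\"OrderLookup.java\",\"line\":14}";

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("https://joebugtracker.example.com/api/issues"))
                .header("Content-Type", "application/json")
                .header("Authorization", "Bearer <api-token>")  // placeholder credential
                .POST(HttpRequest.BodyPublishers.ofString(json))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println("tracker replied: " + response.statusCode());
    }
}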

9.2 Ability to integrate with enterprise software configuration management

In what regard?

  10. Reporting Capabilities
    10.1 Quality of reports

Subjective.

10.2 Availability of role-based reports

It's indeed important to report different kinds of data for the engineer, dev, QA, managers, etc. Ultimately, we're talking about data reporting here, and tools should provide several ways to slice and represent the data for the different audiences.

10.3 Availability of report customization

Yup, though, to what extent is the report customizable? Can I just change the logo, or can I integrate the findings into my Word template?

  11. Tool Customization and Automation

I feel that we're finally getting to the interesting part. Every mature use of SAST has to make use of automation and tool customization. This section is a very important one, and we should emphasize it as much as we can.

11.1 Can custom rules be added?

Right, that's the first question to ask. Does the tool support rule customization? Now, we need many other points, such as... What kinds of rules are supported? Can we specify/create a new type of weakness/finding/category?

11.2 Do the rules need learning new language\script?

Most likely it will be "yes", unless it's only GUI based. My point is that even XML rules represent a "language" to describe the rules...

11.3 Can the tool be scripted? (e.g. integrated into ANT build script or other build script)

Build automation is crucial but, to me, is different from automation. This item should be in a different section.

11.4 Can documentation be customized (installation instructions, remediation advice, finding explanation...etc)

Interesting point. Can we overwrite the remediation given by a tool?

11.5 Can the defect prioritization scheme be customized?

Right! Can I integrate the results within my risk management system?

11.6 Can the tool be extended so that custom plugins could be developed for other IDEs?

That part should be in the IDE integration.

In summary, I believe that the SATEC needs to be restructured to address the actual problems. We should also move away from any subjective criterion. I believe that the SATEC should be able to be filled in by a tool vendor, or by someone who will evaluate the tool. Eventually, we should provide a spreadsheet that can be filled in.

Concerning the overall sections, the order should make sense as well.

Anyhow, I suggest the list rethink the current criteria and see what can be measured properly, and what needs to be captured by any tool evaluator. The following is just a suggestion (I came up with it in too little time), but I believe it captures the interesting parts in a better order:

  1. Platform support
    1.1 OS support
    1.2 Scalability tuning (support for 64-bit, etc.)
  2. Application technology support
    2.1 Language support (up to the version of language)
    2.2 Framework support
  3. Scan, command and control
    3.1 Scan configuration
    3.2 Build system integration
    3.3 IDE integration
    3.4 Command line support
    3.5 Automation support
    3.6 Enterprise offerings (need of a better terminology)
  4. Application analysis
    4.1 Testing capabilities (weakness coverage, finding-level data, etc.)
    4.2 Customization
    4.3 Triage capabilities
    4.4 Scan results post-processing
  5. Reporting
    5.1 Reports for different audiences
    5.2 Report customization
    5.3 Finding-level reporting information
    5.3.1 Classification/Taxonomy mapping (e.g., CWE, OWASP, WASC, etc.)
    5.3.2 Finding description (paths, pre/post conditions, etc.)
    5.3.3 Finding remediation (available, customizable, etc.)
  6. Miscellaneous
    6.1 Knowledge update (rules update)
    6.2 Integration in bug trackers (list of supported BT, customization, etc.)

Btw, I'm sorry to come back with such feedback quite late... but the deadlines are too aggressive for me.

Romain


wasc-satec mailing list
wasc-satec@lists.webappsec.org
http://lists.webappsec.org/mailman/listinfo/wasc-satec_lists.webappsec.org

SK
Sherif Koussa
Thu, Sep 15, 2011 6:57 PM

Please find my replies below

---------- Forwarded message ----------

From: Romain Gaucher romain@webappsec.org
Date: Mon, Aug 22, 2011 at 11:47 AM
Subject: [WASC-SATEC] Comments on the direction of SATEC - And on every
items.
To: wasc-satec@lists.webappsec.org

Everyone, sorry for the loooong delay on the response, the past weeks have
been totally crazy.
Anyhow, here is what I think and what I would suggest, based on the current draft:

1. Tool Setup and Installation

"Setup and Installation" is not really interesting, I believe. More important
is the platform support (can I run the tool from my Linux box, my Mac, our
Windows server, etc.).

Sherif: My question here: would a software company with a large number of
developers be interested in a central kind of service where developers don't
have to install any software at all; it just runs the tool on check-ins, sends
the report via email to the developers, and that's it?

1.1 Time required to perform initial installation

That's usually subjective, unless you say something like "always less than
2 hours", "always less than a day", etc. But then again, I find this quite
irrelevant to the problem.

Sherif: Agreed.

1.2 Skills required to perform initial installation

Subjective.

Sherif: Agreed

1.3 Privileges required to perform initial installation

I don't find this item very informative. Okay, you need to have root
access, or admin access on the machine... or not.

Sherif: That's it, whether admin\root access is needed or not.

1.4 Documentation setup accuracy

Subjective.

Sherif: Agreed that it is subjective. But I would argue that it is still
important to quantify it somehow. Some tools are REALLY challenging if the
installation hits a snag; without proper documentation, it is almost
impossible for an average developer to get it right.

1.5 Platform Support

This one is interesting for the customers.

2. Performing a Scan

Logically, I would not talk about scanning just now. But, after the
platform support section, I would talk about language, framework support.

2.1 Time required to perform a scan

This does not make any sense. "Time required to scan"... what? This
question, however, is answerable if we provide a proper test case and
environment to run the tool. But then again, it's quite misleading
information.

Sherif: Although I see your point, I still think there is something here that
might be interesting to developers. Say the developer has to wait 2 hours for
the scan to finish before he can check in code, only to find that there is a
critical XSS that he needs to fix. It takes him\her 5 minutes to fix the XSS,
but then he\she has to wait 2 more hours for another scan. Is that a valid
concern?

2.2 Number of steps required to perform a scan

Many tools have scripting interfaces. Using scripts, you reduce your steps
from 7, to 1 (i.e., run the script). How does that count?
In summary, I find this information not interesting at all.

Sherif: Makes sense

2.3 Skills required to perform a scan

I understand that some tools (like PolySpace) require someone to actually
design and model the suspected behavior of the program. But most tools do
not require that. Then again, how to rate the user? Do we assume the user
(who runs the scan) will also look at the findings? Does he also setup the
scan? I definitely see the scan being run by security operation (mostly for
monitoring), and being setup by security engineers...

Sherif: Well, so we need to make that clear: does the tool need a security
engineer, or can a developer do it? I would think that this is a very
interesting piece of information. I agree with you, though, that how to
articulate\quantify\verbalize the "skills" is challenging. But in my
opinion, it is important. It matters for a company that buys a tool and
then nobody knows how to run it, and it gets canned because they don't have
any security engineers on staff.

3. Tool Coverage:

"Tool Coverage" might be the most misleading term here. Coverage of what?!
Coverage of supported weaknesses, languages, version of languages,
framework, application coverage, entry point coverage, etc.?

Sherif: That's what the sub-categories are for :)

3.1 Languages supported by the tool

Very important. Now, we should not limit ourselves to the languages, but we
should go down to the framework-version level. Nowadays, the language is just
a means; most of the juicy stuff happens in the relationship with the
frameworks... Also, the behavior of the frameworks might be different from
one version to another...

Sherif: Very good point

3.2 Support for Semantic Analysis
3.3 Support for Syntactic Analysis

I do not understand these items. (Usually, "semantic" is used to say
something like AST-level type of knowledge.) I would be, honestly,
more interested to know if the tool is properly capable of inter-procedural
data flow analysis, or if it has some other limitations. Then again, I would
prefer not to talk about the underlying logic (and modeling) of the tool
since I believe this is out of scope. Users don't really care about that;
they just want the tool to work perfectly. If you use a dataflow-based
model, abstract interpretation, or whatever one comes up with ... don't care.

Sherif: I see your point. I would like to hear what developers on the list
have to say about that.

3.4 Ability of the tool to understand different components of a project
(.sql, .xml, .xsd, .properties…etc)

This is a very interesting item. When generalized a little bit, we can
derive several items:

  • Analysis support of configuration files (i.e., the tool gets knowledge
    from the configuration files)
  • Analysis support for multiple languages in separate files
  • Cross-language analysis support (the tool is capable of performing its
    analysis from one language, let's say Java, to SQL, and back to Java)

Another item that would be quite interesting is support for "new
extensions", or redefinition of extensions. Say the tool recognizes ".pl"
as Perl, but I have all my stored procedures (in PL/SQL) with this
extension; I'd like to be able to tell the tool to treat .pl as PL/SQL for
this application. The same reasoning applies to entirely new extensions.

3.5 Coverage of Industry Standard Vulnerability Categories (OWASP Top 10,
SANS Top 25…etc)

Static analysis tools do not find vulnerabilities. They find source code
weaknesses (there is a huge difference). Now, I do not understand what
"coverage of industry standard vulnerability categories" means.

  • Is this category supposed to be about coverage of the types of "stuff" (or
    weaknesses, flaws, bugs, issues, etc.) that the tool can find? If so, we
    should use CWE, and nothing else.
  • Is this category about the reporting and classification of findings?
    (Such as, "Oh noes, this finding is mapped to OWASP Top 10 risks... that's
    very bad for your PCI compliance!")

Sherif: The thought behind this is that I sometimes come across organizations
that are starting from square one. The common question is: where do we start,
given our portfolio of applications? What do we focus on? A point of reference
sometimes is to start from the OWASP Top 10 or SANS Top 25 or whatever.

4. Detection Accuracy

Usually, that does not mean anything.

4.1 Number of false positives
4.2 Number of true negatives

My first comment here was "Gniii?", then I did s/Number/Rate and it made a
bit more sense.
I could understand why someone would want to get a rate of false positives
and false negatives, but true negatives? True negatives are the things that
are not reported by the tool, and it is good that the tool does not report
them; an example would be a data flow path that uses a proper validation
routine before sending the data to a sink. You do not want the tool to
report such a path, and that is a true negative.

By the way, the rates of FP/FN are very interesting from an experimental
point of view, but there is no way to get this data to mean anything for Joe
the project manager who wants to get a tool. Most likely your data will be
very different from his (if you're running the same experiment on your own
applications). Sad reality check: tool results depend a lot on the
application.

4.3 Accuracy %

Accuracy of what? Compared to what? Nonsense to me, cf. previous point. We
cannot measure that in a meaningful way.

5. Triage and Remediation Process

Do we want to talk about the quality of the UI provided by the tool to
facilitate the triage? IMO, the remediation process is out of scope for a
SAST.

Sherif: I would like to understand why it would be out of SAST scope.

5.1 Average time to triage a finding

This seems to me like rating your assessor more than the tool you use.

Sherif: Makes sense

5.2 Quality of data surrounding a finding (explanation, tracing, trust
level…etc)

That is indeed very important information. As an assessor, I want to know
why the heck this tool reported this finding to me. Not only do I want to
have paths, confidence, data flow info, etc., but I want to know the
internals.
Some tools will report the pre-conditions and post-conditions that
generated the finding. This is extremely useful for advanced use of the
tools. I understand that most tools do not report that, so at least reporting
the rule ID (or something I can track later on, and make sense of) is
important.

5.3 Ability to mark findings as false positive

Marking a finding as FP might have several meanings. Does this mean:

  • Mark the findings as FP for the report?
  • Mark the findings as FP for the engine, so that next time it will
    encounter a similar case, it won't report it?

Sherif: I agree that we need to make it more specific

5.4 Ability to “diff” assessments

Very important indeed.

5.5 Ability to merge assessments

Tracking, merging, and combining assessments is definitely part of the
workflow...

5.6 Correctness of remediation advice
5.7 Completeness of remediation advice

I hope no one actually relies on the tool to give proper remediation
advice. They're usually fine for giving an idea, but there is no way they
will give you a good solution for your case (even though, in theory, they
have lots of information to do so).

5.8 Does the tool automatically prioritize defects

Prioritize what? Is this category supposed to be talking about the severity
rating? Is this talking about prioritization at the engine level, so that
the tool misses lots of stuff? (Yeah, that's usually what happens when the
flow gets complex.)

Sherif: Agreed that we need to be more specific

6. UI Simplicity and Intuitiveness
6.1 Quality of triage interface (need a way to measure this)
6.2 Quality of remediation interface (need a way to measure this)

Subjective.

6.3 Support for IDE plug-ins both out of the box and on-demand

"Integration with IDEs", and possible support for new IDEs. Yes, it's
important to get at least a list of integrated IDEs.

6.4 Quality of tools’ out of the box plugin UI

Subjective. Why not talk about the features available through the plugin?

7. Product Update Process

It's indeed good to know that automated/federated/etc. updates are
possible.

7.1 Frequency of signature update

Interesting, but the reader must be careful not to base much of a decision
on that. If the tool gets a new pack of rules every week or every month,
that does not say much about the quality...

7.2 Relevance of signatures to evolving threats
7.3 Re-activeness to evolving threats

Are we talking about new weaknesses? The word "threat" is very confusing
here... and does not make sense to me in the context of SAST.

Sherif: Agreed that we need to be more specific

8. Product Maturity and Scalability

Would be good to know indeed, though... how to get the data?

Sherif: Agreed. But I think the presence of this item implicitly indicates
to the reader that not all the tools on the market are equally stable.

8.1 Peak memory usage

42GB?! That's a very subjective data point that depends on many factors
(machine, configuration, application, etc.).

8.2 Number of scans done before a crash or serious degradation in
performance

42, but only because it was 71 degrees in the room, and the train was
passing every 2.5 days.

Sherif: Point taken :)

8.3 Maximum lines of code the tool can scan per project

It would be good to talk about the scalability of the tool, and how to
improve it. For example, can I scan the same application with several
machines (parallelism)? If I add more RAM/CPU, do I get much better results?
Is there a known limit?

Sherif: very interesting point

8.4 What languages does the tool support?

This should be covered in a different section.

9. Enterprise Offerings

This is also very interesting for companies. However, the enterprise
offerings are usually a central solution to host findings, review findings,
etc. This is not really SAST, but SAST-management. Do we want to talk about
that? I'm happy to have this in the criteria...

Sherif: I don't think we should delve into that, but I think people should
be aware of whether the tool "supports" or is "capable" of enterprise
SAST-Management or not.

9.1 Ability to integrate with major bug tracking systems

This is mostly a general comment, but instead of a boolean answer, we
should ask for the list of supported bug tracking systems.
Also, it's important to be able to customize this, and to be able to
integrate with JoeBugTracker...

Sherif: Agreed, I think the "ability" to integrate is more important than
what it supports out of the box.

9.2 Ability to integrate with enterprise software configuration management

In what regard?

Sherif: Building mostly

10. Reporting Capabilities
10.1 Quality of reports

Subjective.

10.2 Availability of role-based reports

It's indeed important to report different kinds of data for the engineer,
dev, QA, managers, etc. Ultimately, we're talking about data reporting here,
and tools should provide several ways to slice and represent the data for
the different audiences.

10.3 Availability of report customization

Yup, though, to what extent is the report customizable? Can I just change
the logo, or can I integrate the findings into my Word template?

11. Tool Customization and Automation

I feel that we're finally getting to the interesting part. Every mature
use of SAST has to make use of automation and tool customization. This
section is a very important one, and we should emphasize it as much as we
can.

11.1 Can custom rules be added?

Right, that's the first question to ask. Does the tool support rule
customization? Now, we need many other points, such as... What kinds of
rules are supported? Can we specify/create a new type of
weakness/finding/category?

11.2 Do the rules need learning new language\script?

Most likely it will be "yes", unless it's only GUI based. My point is that
even XML rules represent a "language" to describe the rules...

11.3 Can the tool be scripted? (e.g. integrated into ANT build script or
other build script)

Build automation is crucial but, to me, is different from automation. This
item should be in a different section.

11.4 Can documentation be customized (installation instructions,
remediation advice, finding explanation…etc)

Interesting point. Can we overwrite the remediation given by a tool?

11.5 Can the defect prioritization scheme be customized?

Right! Can I integrate the results within my risk management system?

11.6 Can the tool be extended so that custom plugins could be developed for
other IDEs?

That part should be in the IDE integration.

In summary, I believe that the SATEC needs to be restructured to address
the actual problems. We should also move away from any subjective criterion.

Sherif: I partially agree with the first statement. I disagree with the
second statement. Just because a criterion is subjective does not mean we
ought to avoid it completely. If the criterion is important but subjective,
then we might just mention it as a factor for people to watch out for while
they are evaluating. If we didn't tell people that some tools are RAM hogs,
they wouldn't necessarily pay attention to this fact. Now, it would be up to
them to act upon this note, whether to ask the vendor for the system
requirements, or whether to experiment themselves...etc

I believe that the SATEC should be able to be filled in by a tool vendor,
or by someone who will evaluate the tool. Eventually, we should provide a
spreadsheet that can be filled in.

Sherif: 100% agree

Concerning the overall sections, the order should make sense as well.

Anyhow, I suggest the list rethink the current criteria and see
what can be measured properly, and what needs to be captured by any tool
evaluator. The following is just a suggestion (I came up with it in too
little time), but I believe it captures the interesting parts in a better
order:

  1. Platform support
    1.1 OS support
    1.2 Scalability tuning (support for 64-bit, etc.)
  2. Application technology support
    2.1 Language support (up to the version of language)
    2.2 Framework support
  3. Scan, command and control
    3.1 Scan configuration
    3.2 Build system integration
    3.3 IDE integration
    3.4 Command line support
    3.5 Automation support
    3.6 Enterprise offerings (need of a better terminology)
  4. Application analysis
    4.1 Testing capabilities (weakness coverage, finding-level data, etc.)
    4.2 Customization
    4.3 Triage capabilities
    4.4 Scan results post-processing
  5. Reporting
    5.1 Reports for different audiences
    5.2 Report customization
    5.3 Finding-level reporting information
    5.3.1 Classification/Taxonomy mapping (e.g., CWE, OWASP, WASC, etc.)
    5.3.2 Finding description (paths, pre/post conditions, etc.)
    5.3.3 Finding remediation (available, customizable, etc.)
  6. Miscellaneous
    6.1 Knowledge update (rules update)
    6.2 Integration in bug trackers (list of supported BT, customization, etc.)

Btw, I'm sorry to come back with such feedback quite late... but the
deadlines are too aggressive for me.

Sherif: Thanks for such quality feedback  :)

RG
Romain Gaucher
Fri, Sep 16, 2011 1:46 AM

Glad to see some replies. Thanks, Sherif, for forwarding my email :)

On Thu, Sep 15, 2011 at 2:57 PM, Sherif Koussa sherif.koussa@gmail.com wrote:

  1. Tool Setup and Installation
    "Setup and Installation" is not really interesting, I believe. The more
    important is the platform support (can I run the tool from my linux box, my
    mac, our windows server, etc).

Sherif: My question here. Would a software company with a large number of
developers be interested in a central kind of service where developers don't
have to install any software at all, just run the tool on check ins, send
the report via email to the developers and that's it?

IMO, this is a different question from "Tool Setup and
Installation". You're talking about a bureau service, which can be
deployed within a company in many ways (SaaS, internal security
team, etc.).
The criteria need to stay tied to the tool itself.

1.3 Privileges required to perform initial installation
I don't find this item very informative. Okay, you need to have root
access, or admin access on the machine... or not.

Sherif:That's it, whether admin\root access is needed or not.

Okay, it does not hurt to have this kind of information, but still...
I don't see much interest here.

1.4 Documentation setup accuracy
Subjective.

Sherif: Agreed with subjective. But I would argue that it is still important
to quantify it somehow. Some tools are REALLY challenging if the
installation hit a snag, without proper documentation, it is almost
impossible to get it right for an average developer.

So, let's run a controlled experiment to get the data. We need to get
30 developers and 30 security folks who don't know the tool at all, and
ask them to install it properly (meaning we have to have someone
verify the installation). Yeah, it's a bit expensive.

On subjects like this, I do not think we should include the items in the
eval criteria. We simply cannot get the data. All we can say is "hey,
test it and judge for yourself if it's okay with you".

2.1 Time required to perform a scan

This does not make any sense. "Time required to scan"... what? This
question however, is answerable if we provide a proper test case and
environment to run the tool. But then again, it's a quite misleading
information.

Sherif: Although I see your point, I still there is something here that
might be interesting to developers. If the developer is going to wait for 2
hours for the scan to finish before he could check in code, only to find
that there is a critical XSS that he needs to get fixed. It takes him\her 5
minutes to fix the XSS but then he\she has to wait 2 more hours for another
scan. Is that a valid concern?

It is indeed a valid concern: time is critical, time is money.
However, the issue is that we cannot get data here. If you do not have
data, or any answer that is valid, you are just talking bullshit.
I like the comment from Alen, who talked about comparing the scanning
time with the compile time. However, there are big limitations here:
1- it only works for compiled languages
2- it's only valid with no customization (or assuming that the
customization does not impact the time to scan, which is wrong)

  3. Tool Coverage:

"Tool Coverage" might be the most misleading term here. Coverage of what?!
Coverage of supported weaknesses, languages, version of languages,
framework, application coverage, entry point coverage, etc.?

Sherif: That's what the sub-categories are for :)

The coverage section talked about languages, types of analysis... it
didn't make much sense to me (the analysis-engine items, at least, are
glaring outliers).

3.2 Support for Semantic Analysis
3.3 Support for Syntactic Analysis

I do not understand these items. (Usually, "semantic" is used to say
something like AST-level type of knowledge). I would be, honestly,
more interested to know if the tool is properly capable of inter-procedural
data flow analysis, or if it has some other limitations. Then again, I would
prefer not to talk about the underlying logics (and modeling) of the tool
since I believe this is out of scope. Users don't really care about that,
they just want the tool to work perfectly. If you use a dataflow based
model, abstract interpretation, or whatever one comes up with ... don't
care
.

Sherif: I see your point. I would like to hear what developers in the list
has to say about that

Others: PLEASE REPLY. Thanks.
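
To make the point concrete, here is a minimal Java sketch (class and
method names are made up for illustration) of the kind of flow we are
talking about: the source and the sink live in different methods, so a
tool limited to intra-procedural analysis will not connect them, while an
inter-procedural data flow engine should.

    // Hypothetical example: taint enters in one method and reaches the
    // sink in another, so only inter-procedural data flow links the two.
    import java.sql.Connection;
    import java.sql.SQLException;
    import java.sql.Statement;
    import javax.servlet.http.HttpServletRequest;

    public class OrderLookup {

        // Source: untrusted input enters here...
        public String readOrderId(HttpServletRequest request) {
            return request.getParameter("orderId");
        }

        // ...and is passed along to the sink two calls away.
        public void findOrder(Connection conn, HttpServletRequest request) throws SQLException {
            runQuery(conn, readOrderId(request));
        }

        private void runQuery(Connection conn, String orderId) throws SQLException {
            try (Statement stmt = conn.createStatement()) {
                // SQL injection sink: looking at runQuery() alone, the tool
                // cannot tell whether orderId was ever attacker-controlled.
                stmt.executeQuery("SELECT * FROM orders WHERE id = '" + orderId + "'");
            }
        }
    }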

3.5 Coverage of Industry Standard Vulnerability Categories (OWASP Top 10,
SANS Top 25…etc)
Static analysis tools do not find vulnerabilities. They find source code
weaknesses (there is a huge difference). Now, I do not understand what
"coverage of industry standard vulnerability categories" mean.

  • Is this category supposed to be about coverage of type of "stuff" (or
    weaknesses, flaws, bugs, issues, etc.) that the tool can find? If so, we
    should use CWE, and nothing else.
  • Is this category about the the reporting and classification of findings?
    (such as, "Oh noes, this finding is mapped to OWASP top 10 risks... that's
    very bad for your PCI compliance!"

Sherif: The thought behind this is sometimes I come across organization that
are starting from square 1. The common question is where do we start given
our portfolio of applications? what do we focus on? A point of reference
sometimes is to start from OWASP Top 10 or SANS 25 or whatever.

This is very true. The SANS Top 25 makes sense here; the OWASP Top 10
risks do not. If the tool provides a mapping to the OWASP Top 10 risks,
though, that's okay. Still, that is not coverage, just a way to
represent the findings afterwards (I might be nitpicking here).

  5. Triage and Remediation Process
    Do we want to talk about the quality of the UI provided by the tool to
    facilitate the triage? IMO, the remediation process is out of scope for a
    SAST.

Sherif: I would like to understand why would it be out of a SAST scope?

Remediation is not part of what a SAST does. SAST finds issues that
need to be fixed. The remediation process is a separate thing to me.

  9. Enterprise Offerings
    This is also very interesting for companies. However,
    the enterprise offerings, are usually, central solution host findings,
    review findings, etc. This is not really SAST, but SAST-management. Do we
    want to talk about that? I'm happy to have this in the criteria...

Sherif: I don't think we should delve into that, but I think people should
be aware whether the tools "supports" or "capable" of the enterprise
SAST-Management or not

Yeah, I agree. This is usually a vendor offering, and it's good to know about.

9.2 Ability to integrate with enterprise software configuration management
To what regard?

Sherif: Building mostly

But isn't that mostly integration with the build system, which should be
covered in an earlier category?

In summary, I believe that the SATEC needs to be restructured to address
the actual problems. We should also move away from any subjective criterion.

Sherif: I partially agree with the first statement. I disagree with the
second statement. Not because it subjective, then we ought to avoid it
completely. If the criterion is important but subjective, then we might just
mention it as a factor for people to watch out for it while they are
evaluating. If we didn't tell people that some tools are RAM huggers, they
wouldn't necessarily pay attention to this fact. Now, it would be up to them
to act upon this note, whether to ask for the system requirements from the
vendor, or whether to do experiment themselves....etc

My main issue with subjective criteria is that people will be driven
to make inaccurate conclusions. Also, memory issues aren't my main
problem here. Think about tools' accuracy across languages,
frameworks, etc. This is really where a bad decision has a bad impact,
to me: "Oh, this tool worked well with .NET, and it supports C++, so it
must be good there too."

This makes me think that in addition to the eval criteria, we should
have testing guidelines, a la "101 - How to practically test a tool".
Meaning that in addition to the checklist and the features listed
(mostly) by the tool vendors, people need to run the tools on some of
their own software and see what it's like.

Btw, I'm sorry to come back with such feedback quite late... but the
deadlines are too aggressive for me.

Sherif: Thanks for such quality feedback  :)

I'm sure many on the list are better qualified than I am; I would love
to hear from them too. Another option, because we need more comments,
would be to reach out to tool vendors (the ones that aren't on the
list) for some feedback. All we want is a bit of fairness in these
eval criteria ;)

Thanks again for sending my email again :)

Romain
http://rgaucher.info | @rgaucher

BG
Benoit Guerette (OWASP)
Sun, Sep 18, 2011 1:18 AM

I don't agree with redoing the categories again, but I agree with taking
Romain's comments (very good comments, by the way) into account, in
addition to our September 12th answers for the subcategories; that's the
current step anyway. A bit late, but we are in it.

No offense, we are all very busy, but deadlines are there to make sure
that we can deliver the project one day, so that we will not redo the
whole project again and again.

So why don't we focus on the subcategories? Here are my answers to
Romain's comments. I didn't reply where I agree with him.

  1. Tool Setup and Installation
    "Setup and Installation" is not really interesting, I believe. The more
    important is the platform support (can I run the tool from my linux box, my
    mac, our windows server, etc).
    1.1 Time required to perform initial installation
    That's usually subjective, unless you say something like "always less than 2
    hours", "always less than a day", etc. But then again, I find this totally
    quite irrelevant to the problem.

Don't agree: some vendors are standalone on the desktop, some require
one server, and others require multiple servers. In a large-scale
deployment, this section is relevant.

1.2 Skills required to perform initial installation
Subjective.

Agree

1.3 Privileges required to perform initial installation
I don't find this item very informative. Okay, you need to have root access,
or admin access on the machine... or not.

Agree, they all need root/admin.

1.4 Documentation setup accuracy
Subjective.

Don't agree: vendors will supply a sample installation guide, and the
quality varies a lot.

1.5 Platform Support
This one is interesting for the customers.

  2. Performing a Scan
    Logically, I would not talk about scanning just now. But, after the platform
    support section, I would talk about language, framework support.
    2.1 Time required to perform a scan

This does not make any sense. "Time required to scan"... what? This question
however, is answerable if we provide a proper test case and environment to
run the tool. But then again, it's a quite misleading information.

Don't agree, you would be very surprised to see the gap between vendors.

2.2 Number of steps required to perform a scan

Many tools have scripting interfaces. Using scripts, you reduce your steps
from 7, to 1 (i.e., run the script). How does that count?
In summary, I find this information not interesting at all.

Well, I agree; vendors vary from 2 to 7 steps in most cases.

2.3 Skills required to perform a scan
I understand that some tools (like PolySpace) require someone to actually
design and model the suspected behavior of the program. But most tools do
not require that. Then again, how to rate the user? Do we assume the user
(who runs the scan) will also look at the findings? Does he also setup the
scan? I definitely see the scan being run by security operation (mostly for
monitoring), and being setup by security engineers...

Could be removed; most vendors will answer 'no skills required'.

  3. Tool Coverage:

"Tool Coverage" might be the most misleading term here. Coverage of what?!
Coverage of supported weaknesses, languages, version of languages,
framework, application coverage, entry point coverage, etc.?
3.1 Languages supported by the tool

Very important. Now, we should not limit ourselves to the languages, but we
should go at the version of framework level. Nowadays, the language is just
a mean, most of the juicy stuff happen in the relationship with the
frameworks... Also, the behavior of the frameworks might be different from
one version to one another...

Good point; we should ask about frameworks, libraries, etc.

3.2 Support for Semantic Analysis
3.3 Support for Syntactic Analysis

I do not understand these items. (Usually, "semantic" is used to say
something like AST-level type of knowledge). I would be, honestly,
more interested to know if the tool is properly capable of inter-procedural
data flow analysis, or if it has some other limitations. Then again, I would
prefer not to talk about the underlying logics (and modeling) of the tool
since I believe this is out of scope. Users don't really care about that,
they just want the tool to work perfectly. If you use a dataflow based
model, abstract interpretation, or whatever one comes up with ... don't
care
.

Don't agree: not all vendors support those points, so it is interesting
to know. Don't forget, this is a security tool that we ask DEV teams to
use, so they are interested in that kind of verification.

3.4 Ability of the tool to understand different components of a project
(.sql, .xml, .xsd, .properties…etc)

This is a very interesting item. When generalized a little bit, we can
derive several items:
 - Analysis support of configuration files (i.e., the tool gets knowledge
from the configuration files)
 - Analysis support for multiple languages in separated files
 - Cross-languages analysis support (the tool is capable of performing its
analysis from one language, let's say Java, to SQL, and back to Java)
Another item that would be quite interesting, is the support for "new
extensions", or redefinition of extensions. Let's say the tool does
recognize ".pl" as perl, but that I have all my stored procedures (in
PL/SQL) with this extension, I'd like to be able to tell the tool to
consider the .pl to be PL/SQL for this application. The same reasoning needs
to be done for new extensions.

Ok, that's a good one; we should expand it. Most of them can scan config
files, but not all of them can scan SQL, so we should split this
question.
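
To illustrate the cross-language point, a small Java sketch (the stored
procedure name and file name are invented): the Java side looks harmless,
so whether there is an injection depends entirely on what the PL/SQL side
does with the argument, and only a cross-language analysis can follow the
data there and back.

    // Hypothetical Java -> SQL -> Java flow.
    import java.sql.CallableStatement;
    import java.sql.Connection;
    import java.sql.SQLException;

    public class CommentService {

        public void storeComment(Connection conn, String userComment) throws SQLException {
            // Parameter binding is fine on the Java side...
            try (CallableStatement call = conn.prepareCall("{call PROCESS_COMMENT(?)}")) {
                call.setString(1, userComment);
                call.execute();
            }
            // ...but if PROCESS_COMMENT (defined in stored_procs.pl, really
            // PL/SQL) builds dynamic SQL from its argument, only a tool that
            // also parses that file will report the issue.
        }
    }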

3.5 Coverage of Industry Standard Vulnerability Categories (OWASP Top 10,
SANS Top 25…etc)
Static analysis tools do not find vulnerabilities. They find source code
weaknesses (there is a huge difference). Now, I do not understand what
"coverage of industry standard vulnerability categories" mean.

  • Is this category supposed to be about coverage of type of "stuff" (or
    weaknesses, flaws, bugs, issues, etc.) that the tool can find? If so, we
    should use CWE, and nothing else.
  • Is this category about the the reporting and classification of findings?
    (such as, "Oh noes, this finding is mapped to OWASP top 10 risks... that's
    very bad for your PCI compliance!"

I don't understand your point on this one, but you are usually very
clear, so maybe we can chat about it. I want to know if they check only
against the Top 10, or also the SANS Top 25, which is very specific on
injections (command injection, for example). Also, from a compliance
point of view, there is big value in clearly showing which model is
scanned against (PCI 1.2 is mapped to the Top Ten, etc.). Weakness in
code = vulnerability in binary, so it's all the same from my point of view.

  4. Detection Accuracy
    Usually, that does not mean anything.
    4.1 Number of false positives
    4.2 Number of true negatives
    My first comment here was "Gniii?", then I did s/Number/Rate and it made a
    bit more sense.
    I could understand why someone would want to get a rate of false-positive,
    and false-negatives, but true-negatives? True negatives, are the things that
    are not reported by the tool, and it's good from the tool not to report
    them, and examples would be data flow path that uses a proper validation
    routine before sending the data to a sink. You do not want the tool to
    report such, and this is a true-negative.
    By the way, the rate of FP/FN are very interesting for an experiment point
    of view, but there is no way to get this data to mean anything for Joe the
    project manager who wants to get a tool. Most likely your data will be very
    different than his (if you're making the same experiment on your
    applications). Sad reality fix: tools results depend a lot on the
    application.
    4.3 Accuracy %

Accuracy of what? Compared to what? Non-sense to me, cf. previous point. We
cannot measure that in a meaningful way.

Well, vendors should answer these questions, and it is very interesting
to see their positions. Also, from a lab perspective, scanning sample
code and looking at the false positives it generates is important.
Imagine that you use ESAPI for filtering but the tool doesn't know the
library; it could generate a lot of false positives.

That's an important point: imagine a developer scanning his code for the
first time...
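
To make the ESAPI point concrete, a small Java sketch (the servlet class
is invented for illustration): with knowledge of the library this is a
true negative, while a tool that does not model ESAPI's encoder will
report the same line as an XSS false positive.

    // Hypothetical servlet snippet illustrating the ESAPI point.
    import java.io.IOException;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;
    import org.owasp.esapi.ESAPI;

    public class GreetingServlet {

        public void doGet(HttpServletRequest request, HttpServletResponse response) throws IOException {
            String name = request.getParameter("name");
            // The input is HTML-encoded before it reaches the output sink.
            // With library knowledge this flow is a true negative; without it,
            // the println below shows up as a reflected XSS finding.
            String safeName = ESAPI.encoder().encodeForHTML(name);
            response.getWriter().println("Hello, " + safeName);
        }
    }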

  5. Triage and Remediation Process
    Do we want to talk about the quality of the UI provided by the tool to
    facilitate the triage? IMO, the remediation process is out of scope for a
    SAST.

Why is remediation out of scope? I scan a huge portal that we developed
for the first time and get 1000 alerts; triage and remediation is a huge
point.

5.1 Average time to triage a finding
This seems to me like rating your assessor more than the tool you use.

What if you need to mark the same problem as a false positive, one
finding after the other? That's not a code 19 problem ;)

5.2 Quality of data surrounding a finding (explanation, tracing, trust
level…etc)
Those are indeed very important information. As an assessor, I want to know
why the heck this tool reported this finding to me. Not only I want to have
paths, confidence, data flow info, etc. but I want to know the internals.
Some tools will report the pre-conditions and post conditions that generated
the finding. This is extremely useful for advanced use of the tools. I
undersand that most tools do not report that, so at least reporting the rule
ID (or something I can track later on, and make sense of) is important.

I totally agree on this comment

5.3 Ability to mark findings as false positive
Mark a finding as FP might have several meaning. Does this mean:

  • Mark the findings as FP for the report?
  • Mark the findings as FP for the engine, so that next time it will
    encounter a similar case, it won't report it?
    5.4 Ability to “diff” assessments
    Very important indeed.
    5.5 Ability to merge assessments
    Tracking, merging, combining assessment is definitely part of the
    workflow...
    5.6 Correctness of remediation advice
    5.7 Completeness of remediation advice
    I hope no one actually relies on the tool to give proper remediation advice.
    They're usually fine to give an idea, but no way they will give you a good
    solution, for your case (even though, in theory they have lots of
    information to do so).

Are we talking about security people doing the scan, or developers
doing the scan? Unless you have an appsec team supporting all
developers, completeness is very important: the more we guide the DEVs,
the more they feel in control and feel that the tool is good for them
(no need to call that appsec guy every time I find a problem).
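
On the earlier 5.4/5.5 items (diff and merge of assessments), here is a
minimal Java sketch of what a "diff" boils down to, assuming each finding
can be reduced to some stable fingerprint (the data model is invented,
not any vendor's format):

    // Hypothetical sketch: diffing two assessments by finding fingerprint
    // (e.g. rule ID + file + sink location).
    import java.util.HashSet;
    import java.util.Set;

    public class AssessmentDiff {

        // Findings present in the current scan but not in the previous one.
        public static Set<String> newFindings(Set<String> previousScan, Set<String> currentScan) {
            Set<String> added = new HashSet<>(currentScan);
            added.removeAll(previousScan);
            return added;
        }

        // Findings present in the previous scan but gone from the current one.
        public static Set<String> fixedFindings(Set<String> previousScan, Set<String> currentScan) {
            Set<String> fixed = new HashSet<>(previousScan);
            fixed.removeAll(currentScan);
            return fixed;
        }
    }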

5.8 Does the tool automatically prioritize defects
Prioritize what? Is this category supposed to be talking about the severity
rating? Is this talking about prioritization at the engine level so that the
tool misses lots of stuff (yeah, that's usually what happen when the flow
gets complex).

Reporting, in my mind: how do you show the defects, and in which order?
Don't forget, the first scan is usually a nightmare.

  6. UI Simplicity and Intuitiveness
    6.1 Quality of triage interface (need a way to measure this)
    6.2 Quality of remediation interface (need a way to measure this)
    Subjective.
    6.3 Support for IDE plug-ins both out of the box and on-demand
    "Integration with IDEs", and possible support for new IDEs. Yes, that's
    important to get at least, a list of integrated IDEs.
    6.4 Quality of tools’ out of the box plugin UI
    Subjective. Why not talking about the features available though the plugin.

  7. Product Update Process
    It's indeed good to know that automated/federated/etc. updates are possible.
    7.1 Frequency of signature update
    Interesting, but the reader must be careful not to make much decision based
    on that. If the tool gets a new pack of rules every week or every months,
    that does not mean much about the quality...

No, but in a large-scale deployment it means something to download
1500 x 100MB from the internet on Monday at 9am when all the developers
arrive at their desks.

7.2 Relevance of signatures to evolving threats
7.3 Re-activeness to evolving threats
Are we talking about new weaknesses? The word "threat" is very confusing
here... and does not make sense to me in the context of SAST.

I kind of agree about the usage of "threats", because threats are
hackers, insiders, etc. "Vulnerability" should be used instead.

  8. Product Maturity and Scalability
    Would be good to know indeed, though... how to get the data?
    8.1 Peak memory usage
    42GB?! That's a very subjective data that depends on many factors (machine,
    configuration, application, etc. etc.)

Sorry, that's an important point. Some vendors scan in the cloud, others
on a centralized server or build server, and others in the IDE directly.
Adding 2 to 4GB of RAM to a developer's setup is a significant cost... I
am sure you understand the impact of saying that this tool will use up
to 4GB of RAM on the DEV workstation. We need a metric.

8.2 Number of scans done before a crash or serious degradation in
performance
42, but only because it was 71 degree in the room, and the train was passing
every 2.5 days.

Your example will not drive a new release of the tool. Analyzing the
last 3 years of releases and fixes will help us understand whether they
fix crashes or add functionality.

8.3 Maximum lines of code the tool can scan per project
It would be good to talk about scalability of the tool, and how to improve
it. For examples, can I scan the same application with several machines
(parallelism)? If I add more RAM/CPU, do I get much better results? Is there
a known limit?
8.4 What languages does the tool support?
This should be covered in a different section.

I don't get it... But yes, we can move it; just propose something.

  9. Enterprise Offerings
    This is also very interesting for companies. However,
    the enterprise offerings, are usually, central solution host findings,
    review findings, etc. This is not really SAST, but SAST-management. Do we
    want to talk about that? I'm happy to have this in the criteria...

Important point. As I already said, I still don't understand why
pricing is not included...

9.1 Ability to integrate with major bug tracking systems
This is mostly a general comment, but instead of a boolean answer. We should
ask for the supported bug tracking systems.
Also, it's important to customize this, and to be able to integrate with
JoeBugTracker...

We can reformulate; the vendors usually understand that we need this input.

9.2 Ability to integrate with enterprise software configuration management
To what regard?

  10. Reporting Capabilities
    10.1 Quality of reports
    Subjective.

Ok, but would you buy a tool with thin or unreadable reports? Quality
is an important factor, and reporting is usually a management concern ->
and they pay for the product.

10.2 Availability of role-based reports
It's indeed important to report different kind of data for the engineer,
dev, QA, managers, etc. Eventually, we're talking about data reporting here,
and tools should provide several ways to slice and represent the data for
the different audience.
10.3 Availability of report customization
Yup, though, to what extent is the report customizable? Can I just change
the logo, or can I integrate the findings in my word template?

I don't think we need a paragraph for each question, but we should
split this particular question into multiple questions.

  11. Tool Customization and Automation
    I feel that we're finally going to touch the interesting part. Every mature
    use of SAST have to make use of automation, and tool customization. This
    section is a very important one, and we should emphasize it as much as we
    can.
    11.1 Can custom rules be added?
    Right, that's the first question to ask. Does the tool support finding
    support customization? Now, we need many other points, such as ... What kind
    of rules are supported? Can we specific/create a new type of
    weakness/findings/category?
    11.2 Do the rules need learning new language\script?
    Most likely it will be "yes", unless it's only GUI based. My point is that
    even XML rules represent a "language" to describe the rules...

Most of them use XML, some use a kind of SQL, and others use a GUI. I
agree, we should reformulate the question.
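
As a concrete (and entirely hypothetical) example of why 11.1 matters,
whatever the rule format turns out to be: an in-house API can be a sink
the tool knows nothing about until a custom rule declares it as one.

    // Hypothetical in-house API: without a custom rule declaring
    // TemplateRenderer.renderRaw() as an XSS sink, no tool will flag this flow.
    import javax.servlet.http.HttpServletRequest;

    public class ProfilePage {

        public String render(HttpServletRequest request, TemplateRenderer renderer) {
            String bio = request.getParameter("bio");
            // A custom rule would mark renderRaw() as a sink (and assign it a
            // weakness category) so that tainted data reaching it is reported.
            return renderer.renderRaw("<div class='bio'>" + bio + "</div>");
        }
    }

    // Stub for the invented internal API, just so the sketch is self-contained.
    class TemplateRenderer {
        public String renderRaw(String html) {
            return html; // imagine this writes unescaped HTML into the page
        }
    }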

11.3 Can the tool be scripted? (e.g. integrated into ANT build script or
other build script)
Build automation is crucial, but to me, is different than automation. This
item should be in a different section.

Not sure about this one, but we can split the section

11.4 Can documentation be customized (installation instructions, remediation
advice, finding explanation…etc)
Interesting point. Can we overwrite the remediation given by a tool?

Can we change the language, point to an internal security library, etc.? Good point.

11.5 Can the defect prioritization scheme customized?
Right! Can I integrate the results within my risk management system?
11.6 Can the tool be extended so that custom plugins could be developed for
other IDEs?
That part should be in the IDE integration.

Maybe, but most vendors can provide an API to automate or script things;
I like to see it in this section.

In summary, I believe that the SATEC needs to be restructured to address the
actual problems. We should also move away from any subjective criterion. I
believe that the SATEC should be able to be filled-in by a tool vendor, or
someone who will evaluate the tool. Eventually, we should provide a
spreadsheet that could be filled.

That's a good point, but understand that we would then need two lists of
questions, because the quality-of-UI kind of questions cannot be sent
out. We used the WASC-SATEC for an RFI to the major vendors, so we
created an Excel sheet and removed the 'quality' items, which will be
filled in by the DEV and APPSEC teams.

But again, if we create two lists, then we should add non-technical
items such as pricing, which is part of the decision.

Concerning the overall sections, the order should make sense as well.
Anyhow, I suggest the list to rethink about the current criteria and see
what can be measured properly, and what needs to be captured by any tool
evaluator. The following is just a suggestion (came up with that in too
little time), but I believe it captures the interesting part in a better
order:

  1. Platform support
    1.1 OS support
    1.2 Scalability tuning (support for 64bits, etc.)
  2. Application technology support
    2.1 Language support (up to the version of language)
    2.2 Framework support
  3. Scan, command and control
    3.1 Scan configuration
    3.2 Build system integration
    3.3 IDE integration
    3.4 Command line support
    3.5 Automation support
    3.6 Enterprise offerings (need of a better terminology)
  4. Application analysis
    4.1 Testing capabilities (weakness coverage, finding-level data, etc.)
    4.2 Customization
    4.3 Triage capabilities
    4.4 Scan results post-processing
  5. Reporting
    5.1 Reports for different audiences
    5.2 Report customization
    5.3 Finding-level reporting information
    5.3.1 Classification/Taxonomy mapping (i.e., CWE, OWASP, WASC, etc.)
    5.3.2 Finding description (paths, pre-post conditions, etc.)
    5.3.3 Finding remediation (available, customizable, etc.)
  6. Miscellanies
    6.1 Knowledge update (rules update)
    6.2 Integration in bug trackers (list of supported BT, customization, etc.)

Well, if we do this we restart from scratch, and that's not fair to all
of us who spent a lot of time on this project.
I think the current categories should remain; only the sub-category
questions need to be challenged.

From what I see here, all your remarks fit well in the current
categories; we should put together a task force to analyze your very
good comments and readjust the questions.

Btw, I'm sorry to come back with such feedback quite late... but the
deadlines are too aggressive for me.
Romain

As I said, it is not too late for the subcategories; I see that your
comments fit well in the current categories. But if we redo all the
categories again, it will be an endless project, as someone else could
again answer late with very good remarks.

So we should now focus on compiling the September 12th answers with yours.

SK
Sherif Koussa
Fri, Sep 23, 2011 3:36 AM

Thanks for your comments Benoit.

I am going to give this another week for comments since a lot of folks are
in OWASP AppSec USA.

Regards,
Sherif

On Sat, Sep 17, 2011 at 9:18 PM, Benoit Guerette (OWASP) gueb@owasp.org wrote:

I don't agree to redo the categories again, but I agree to take
Romain's comments (very good comments, by the way) in addition to our
September 12th answers for the subcategories; that's the current step
anyway. A bit late, but we are in it.

No offense, we are all very busy, but deadlines are there to make sure
that we can deliver the project one day, so that we will not redo the
whole project again and again.

So why don't we focus on the subcategories? Here are my answers to Romain's
comments. I didn't reply to the points where I agree with him.

  1. Tool Setup and Installation
    "Setup and Installation" is not really interesting, I believe. The more
    important is the platform support (can I run the tool from my linux box,

my

mac, our windows server, etc).
1.1 Time required to perform initial installation
That's usually subjective, unless you say something like "always less

than 2

hours", "always less than a day", etc. But then again, I find this

totally

quite irrelevant to the problem.

Don't agree, some vendors are stand alone on the desktop, some
requires 1 server, other requires multiple server. In a large scale
deployment, this section is relevant.

1.2 Skills required to perform initial installation
Subjective.

Agree

1.3 Privileges required to perform initial installation
I don't find this item very informative. Okay, you need to have root access,
or admin access on the machine... or not.

Agree, they all need root/admin.

1.4 Documentation setup accuracy
Subjective.

Don't agree. Vendors will supply a sample installation guide, and
quality varies a lot.

1.5 Platform Support
This one is interesting for the customers.

  2. Performing a Scan
    Logically, I would not talk about scanning just now. But, after the
    platform support section, I would talk about language, framework support.
2.1 Time required to perform a scan

This does not make any sense. "Time required to scan"... what? This question
however, is answerable if we provide a proper test case and environment to
run the tool. But then again, it's a quite misleading information.

Don't agree, you would be very surprised to see the gap between vendors.

2.2 Number of steps required to perform a scan

Many tools have scripting interfaces. Using scripts, you reduce your steps
from 7, to 1 (i.e., run the script). How does that count?
In summary, I find this information not interesting at all.

Well, I agree; vendors vary from 2 to 7 steps in most cases.

2.3 Skills required to perform a scan
I understand that some tools (like PolySpace) require someone to actually
design and model the suspected behavior of the program. But most tools do
not require that. Then again, how to rate the user? Do we assume the user
(who runs the scan) will also look at the findings? Does he also setup the
scan? I definitely see the scan being run by security operation (mostly for
monitoring), and being setup by security engineers...

Could be removed; most vendors will answer 'no skills required'.

  3. Tool Coverage:

"Tool Coverage" might be the most misleading term here. Coverage of what?!
Coverage of supported weaknesses, languages, version of languages,
framework, application coverage, entry point coverage, etc.?
3.1 Languages supported by the tool

Very important. Now, we should not limit ourselves to the languages, but we
should go at the version of framework level. Nowadays, the language is just
a mean, most of the juicy stuff happen in the relationship with the
frameworks... Also, the behavior of the frameworks might be different from
one version to one another...

Good point, we should ask for frameworks, libraries, etc.

3.2 Support for Semantic Analysis
3.3 Support for Syntactic Analysis

I do not understand these items. (Usually, "semantic" is used to say
something like AST-level type of knowledge). I would be, honestly,
more interested to know if the tool is properly capable of inter-procedural
data flow analysis, or if it has some other limitations. Then again, I would
prefer not to talk about the underlying logics (and modeling) of the tool
since I believe this is out of scope. Users don't really care about that,
they just want the tool to work perfectly. If you use a dataflow based
model, abstract interpretation, or whatever one comes up with ... don't
care.

Don't agree; not all vendors support those points, so it is interesting
to know. Don't forget, this is a security tool that we ask DEV teams to
use, so they are interested in that kind of verification.

3.4 Ability of the tool to understand different components of a project
(.sql, .xml, .xsd, .properties…etc)

This is a very interesting item. When generalized a little bit, we can
derive several items:

  • Analysis support of configuration files (i.e., the tool gets knowledge
    from the configuration files)
  • Analysis support for multiple languages in separated files
  • Cross-languages analysis support (the tool is capable of performing its
    analysis from one language, let's say Java, to SQL, and back to Java)
Another item that would be quite interesting, is the support for "new
extensions", or redefinition of extensions. Let's say the tool does
recognize ".pl" as perl, but that I have all my stored procedures (in
PL/SQL) with this extension, I'd like to be able to tell the tool to
consider the .pl to be PL/SQL for this application. The same reasoning needs
to be done for new extensions.

OK, that's a good one; we should expand it. Most of them can scan config
files, but not all of them can scan SQL, so we should split this
question.
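
To make the cross-language point concrete, here is a minimal, hedged sketch
in Java (the class, table and procedure names are invented for illustration,
not taken from the discussion): the interesting behaviour lives in the stored
procedure, so a Java-only scan sees nothing suspicious.

    import java.sql.CallableStatement;
    import java.sql.Connection;

    public class ReportDao {

        private final Connection connection;

        public ReportDao(Connection connection) {
            this.connection = connection;
        }

        // 'accountName' comes straight from an HTTP request parameter.
        public void archiveReports(String accountName) throws Exception {
            // The Java side looks safe: a parameterized call, no concatenation.
            try (CallableStatement call =
                     connection.prepareCall("{call ARCHIVE_REPORTS(?)}")) {
                call.setString(1, accountName);
                call.execute();
            }
            // If ARCHIVE_REPORTS (defined in archive_reports.sql, or in a file
            // with a remapped ".pl" extension as in the example above) builds
            // dynamic SQL by string concatenation, only a tool that follows the
            // data into the procedure body can report the injection.
        }
    }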

3.5 Coverage of Industry Standard Vulnerability Categories (OWASP Top 10,
SANS Top 25…etc)
Static analysis tools do not find vulnerabilities. They find source code
weaknesses (there is a huge difference). Now, I do not understand what
"coverage of industry standard vulnerability categories" mean.

  • Is this category supposed to be about coverage of type of "stuff" (or
    weaknesses, flaws, bugs, issues, etc.) that the tool can find? If so, we
    should use CWE, and nothing else.
  • Is this category about the reporting and classification of findings?
    (such as, "Oh noes, this finding is mapped to OWASP top 10 risks... that's
    very bad for your PCI compliance!"

I don't understand your point on this one, but you are usually very
clear, so maybe we can chat about it. I want to know if they check only
against the OWASP Top 10, or also the SANS Top 25, which is very specific on
injections (command injection, for example). Also, from a compliance point of
view, there is big value in clearly showing which model is scanned (PCI 1.2
is mapped to the Top Ten, etc.). A weakness in code = a vulnerability in the
binary, so it is all the same from my point of view.

  4. Detection Accuracy
    Usually, that does not mean anything.
    4.1 Number of false positives
    4.2 Number of true negatives
    My first comment here was "Gniii?", then I did s/Number/Rate and it made a
    bit more sense.
I could understand why someone would want to get a rate of false-positive,
and false-negatives, but true-negatives? True negatives, are the things that
are not reported by the tool, and it's good from the tool not to report
them, and examples would be data flow path that uses a proper validation
routine before sending the data to a sink. You do not want the tool to
report such, and this is a true-negative.
By the way, the rate of FP/FN are very interesting for an experiment point
of view, but there is no way to get this data to mean anything for Joe the
project manager who wants to get a tool. Most likely your data will be very
different than his (if you're making the same experiment on your
applications). Sad reality fix: tools results depend a lot on the
application.
4.3 Accuracy %

Accuracy of what? Compared to what? Non-sense to me, cf. previous point. We
cannot measure that in a meaningful way.

Well, vendors should answer these questions, and it is very interesting
to see their positions. Also, from a lab perspective, scanning sample code
and looking at the false positives the tool generates is important.
Imagine that you use ESAPI for filtering but the tool doesn't know the
library: it could generate a lot of false positives.

That's an important point; imagine a developer scanning his code for the
first time...
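
As a hedged illustration of both points (servlet and method names are
invented; the ESAPI call assumes the usual OWASP ESAPI encoder API): the
first flow below is the "true negative" Romain describes, the second is the
library-related false positive Benoit describes.

    import java.io.IOException;
    import java.io.PrintWriter;

    import javax.servlet.http.HttpServlet;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;

    public class GreetingServlet extends HttpServlet {

        @Override
        protected void doGet(HttpServletRequest req, HttpServletResponse resp)
                throws IOException {
            PrintWriter out = resp.getWriter();

            // True negative: the tainted value goes through a strict whitelist
            // validation routine before reaching the HTML sink, so a good tool
            // should stay quiet here.
            String page = validatePageName(req.getParameter("page"));
            out.println("<a href=\"/view/" + page + "\">view</a>");

            // Potential false positive: the value is neutralized by a security
            // library (OWASP ESAPI's HTML encoder). A tool with no model of the
            // library still sees "request parameter -> HTML output" and may
            // raise an XSS finding on the very first scan.
            String name = org.owasp.esapi.ESAPI.encoder()
                    .encodeForHTML(req.getParameter("name"));
            out.println("<p>Hello, " + name + "</p>");
        }

        // Whitelist validation: only short alphanumeric page names are accepted.
        private static String validatePageName(String raw) {
            if (raw != null && raw.matches("[A-Za-z0-9]{1,32}")) {
                return raw;
            }
            return "home";
        }
    }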

  5. Triage and Remediation Process
    Do we want to talk about the quality of the UI provided by the tool to
    facilitate the triage? IMO, the remediation process is out of scope for a
    SAST.

Why is remediation out of scope? When I scan a huge portal that we developed
for the first time and get 1000 alerts, triage and remediation are a huge
point.

5.1 Average time to triage a finding
This seems to me like rating your assessor more than the tool you use.

What if you need to mark the same problem as a false positive over and over?
That's not a code 19 problem ;)

5.2 Quality of data surrounding a finding (explanation, tracing, trust
level…etc)
Those are indeed very important information. As an assessor, I want to know
why the heck this tool reported this finding to me. Not only I want to have
paths, confidence, data flow info, etc. but I want to know the internals.
Some tools will report the pre-conditions and post conditions that generated
the finding. This is extremely useful for advanced use of the tools. I
understand that most tools do not report that, so at least reporting the rule
ID (or something I can track later on, and make sense of) is important.

I totally agree with this comment.

5.3 Ability to mark findings as false positive
Mark a finding as FP might have several meaning. Does this mean:

  • Mark the findings as FP for the report?
  • Mark the findings as FP for the engine, so that next time it will
    encounter a similar case, it won't report it?
    5.4 Ability to “diff” assessments
    Very important indeed.
    5.5 Ability to merge assessments
    Tracking, merging, combining assessment is definitely part of the
    workflow...
    5.6 Correctness of remediation advice
    5.7 Completeness of remediation advice
    I hope no one actually relies on the tool to give proper remediation
    advice.
They're usually fine to give an idea, but no way they will give you a good
solution, for your case (even though, in theory they have lots of
information to do so).

Are we talking about security people doing the scan, or developers
doing the scan? Unless you have an appsec team supporting all the
developers, completeness is very important: the more we guide the DEV,
the more they feel in control and feel that the tool is good for them
(no need to call that appsec guy every time I find a problem).

5.8 Does the tool automatically prioritize defects
Prioritize what? Is this category supposed to be talking about the severity
rating? Is this talking about prioritization at the engine level so that the
tool misses lots of stuff (yeah, that's usually what happen when the flow
gets complex).

In my mind this is about reporting: in which order do you show the defects?
Don't forget, the first scan is usually a nightmare.

  6. UI Simplicity and Intuitiveness
    6.1 Quality of triage interface (need a way to measure this)
    6.2 Quality of remediation interface (need a way to measure this)
    Subjective.
    6.3 Support for IDE plug-ins both out of the box and on-demand
    "Integration with IDEs", and possible support for new IDEs. Yes, that's
    important to get at least, a list of integrated IDEs.
    6.4 Quality of tools’ out of the box plugin UI
    Subjective. Why not talking about the features available though the
    plugin.

  7. Product Update Process
    It's indeed good to know that automated/federated/etc. updates are
    possible.

7.1 Frequency of signature update
Interesting, but the reader must be careful not to make much decision based
on that. If the tool gets a new pack of rules every week or every months,
that does not mean much about the quality...

No, but in a large-scale deployment it means something to download
1500 x 100MB from the internet at 9am on Monday, when all the developers
arrive at their desks.

7.2 Relevance of signatures to evolving threats
7.3 Re-activeness to evolving threats
Are we talking about new weaknesses? The word "threat" is very confusing
here... and does not make sense to me in the context of SAST.

I kind of agree about the usage of "threats", because threats are hackers,
insiders, etc. "Vulnerability" should be used instead.

  8. Product Maturity and Scalability
    Would be good to know indeed, though... how to get the data?
    8.1 Peak memory usage
    42GB?! That's a very subjective data that depends on many factors (machine,
    configuration, application, etc. etc.)

Sorry, that's an important point. Some vendors scan in the cloud, others
on a centralized server or build server, and others in the IDE directly.
Adding 2 to 4GB of RAM to a developer setup is an important
point... I am sure you understand the impact of saying that this tool
will use up to 4GB of RAM on the DEV workstation. We need a metric.

8.2 Number of scans done before a crash or serious degradation in
performance
42, but only because it was 71 degree in the room, and the train was passing
every 2.5 days.

Your example will not trigger a new release of the tool. Analyzing the
release fixes of the last 3 years will help to understand whether they fix
crashes or add functionality.

8.3 Maximum lines of code the tool can scan per project
It would be good to talk about scalability of the tool, and how to improve
it. For examples, can I scan the same application with several machines
(parallelism)? If I add more RAM/CPU, do I get much better results? Is there
a known limit?
8.4 What languages does the tool support?
This should be covered in a different section.

I don't get it... But yes, we can move it; please propose something.

  9. Enterprise Offerings
    This is also very interesting for companies. However,
    the enterprise offerings, are usually, central solution host findings,
    review findings, etc. This is not really SAST, but SAST-management. Do we
    want to talk about that? I'm happy to have this in the criteria...

Important point; as I already said, I still don't understand why
pricing is not included....

9.1 Ability to integrate with major bug tracking systems
This is mostly a general comment, but instead of a boolean answer. We should
ask for the supported bug tracking systems.
Also, it's important to customize this, and to be able to integrate with
JoeBugTracker...

We can reformulate; the vendors usually understand that we need this input.
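
As a sketch of what such an integration looks like in practice, here is a
hedged Java example that pushes one finding into a hypothetical tracker REST
API (the URL, token placeholder and JSON shape are all invented; real tools
ship dedicated connectors per tracker):

    import java.io.OutputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.nio.charset.StandardCharsets;

    public class BugTrackerExport {

        public static void main(String[] args) throws Exception {
            // One finding, serialized the way a tracker connector might send it.
            String issueJson = "{\"title\":\"SQL injection in ReportDao\","
                    + "\"severity\":\"high\",\"ruleId\":\"CWE-89\"}";

            URL url = new URL("https://joebugtracker.example.com/api/issues");
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setRequestMethod("POST");
            conn.setRequestProperty("Content-Type", "application/json");
            conn.setRequestProperty("Authorization", "Bearer <api-token>");
            conn.setDoOutput(true);
            try (OutputStream body = conn.getOutputStream()) {
                body.write(issueJson.getBytes(StandardCharsets.UTF_8));
            }
            System.out.println("Tracker answered HTTP " + conn.getResponseCode());
        }
    }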

9.2 Ability to integrate with enterprise software configuration management
To what regard?

  10. Reporting Capabilities
    10.1 Quality of reports
    Subjective.

OK, but would you buy a tool with skinny or unreadable reports? Quality
is an important factor, and reporting is usually a management issue ->
and they pay for the product.

10.2 Availability of role-based reports
It's indeed important to report different kind of data for the engineer,
dev, QA, managers, etc. Eventually, we're talking about data reporting here,
and tools should provide several ways to slice and represent the data for
the different audience.
10.3 Availability of report customization
Yup, though, to what extent is the report customizable? Can I just change
the logo, or can I integrate the findings in my word template?

I don't think we need a paragraph for each question, but we should
split this particular question into multiple questions.

  11. Tool Customization and Automation
    I feel that we're finally going to touch the interesting part. Every mature
use of SAST have to make use of automation, and tool customization. This
section is a very important one, and we should emphasize it as much as we
can.
11.1 Can custom rules be added?
Right, that's the first question to ask. Does the tool support finding
support customization? Now, we need many other points, such as ... What kind
of rules are supported? Can we specific/create a new type of
weakness/findings/category?
11.2 Do the rules need learning new language\script?
Most likely it will be "yes", unless it's only GUI based. My point is that
even XML rules represent a "language" to describe the rules...

Most of them use XML, some use a kind of SQL, and others use a GUI. I
agree, we should reformulate the question.
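
To illustrate why custom rules matter, here is a hedged Java sketch (every
class and method name is invented): without project-specific rules, a scanner
has no reason to treat the in-house call below as a SQL sink, or the in-house
scrubber as a sanitizer.

    public class CustomRuleExample {

        public static void main(String[] args) {
            String filter = args.length > 0 ? args[0] : "";

            // Should be reported once a custom "sink" rule declares that
            // LegacyReportStore.runRaw() executes raw SQL.
            LegacyReportStore.runRaw(
                    "SELECT * FROM reports WHERE owner = '" + filter + "'");

            // Should be suppressed once a custom "sanitizer" rule declares that
            // AuditLog.scrub() neutralizes the data.
            LegacyReportStore.runRaw(
                    "SELECT * FROM reports WHERE owner = '"
                            + AuditLog.scrub(filter) + "'");
        }

        // Stand-ins for the in-house library so the example compiles on its own.
        static final class LegacyReportStore {
            static void runRaw(String sql) {
                System.out.println("executing: " + sql);
            }
        }

        static final class AuditLog {
            static String scrub(String value) {
                return value.replaceAll("[^A-Za-z0-9_]", "");
            }
        }
    }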

11.3 Can the tool be scripted? (e.g. integrated into ANT build script or
other build script)
Build automation is crucial, but to me, is different than automation. This
item should be in a different section.

Not sure about this one, but we can split the section.

11.4 Can documentation be customized (installation instructions, remediation
advice, finding explanation…etc)
Interesting point. Can we overwrite the remediation given by a tool?

Can we change the language, point to an internal security library, etc.?
Good point.
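
A small hedged sketch of what customized remediation advice buys you
(SecurityLib is an invented in-house wrapper): generic advice says
"HTML-encode the value", customized advice can point developers at the one
call they are expected to use in this code base.

    public final class SecurityLib {

        private SecurityLib() {
        }

        // Customized advice: "wrap untrusted output in SecurityLib.forHtml()"
        // instead of describing HTML encoding in the abstract, so developers
        // do not need to call the appsec team every time.
        public static String forHtml(String untrusted) {
            if (untrusted == null) {
                return "";
            }
            return untrusted
                    .replace("&", "&amp;")
                    .replace("<", "&lt;")
                    .replace(">", "&gt;")
                    .replace("\"", "&quot;")
                    .replace("'", "&#x27;");
        }
    }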

11.5 Can the defect prioritization scheme customized?
Right! Can I integrate the results within my risk management system?
11.6 Can the tool be extended so that custom plugins could be developed for
other IDEs?
That part should be in the IDE integration.

Maybe, but most vendors can provide an API to automate or script, so I like
to see it in this section.

In summary, I believe that the SATEC needs to be restructured to address the
actual problems. We should also move away from any subjective criterion. I
believe that the SATEC should be able to be filled-in by a tool vendor, or
someone who will evaluate the tool. Eventually, we should provide a
spreadsheet that could be filled.

That's a good point, but understand that we then need two lists of
questions, because the "quality of UI" kind of questions cannot be sent to
vendors. We used the WASC-SATEC for an RFI to the major vendors, so we
created an Excel sheet and removed the 'quality' items, which will be filled
in by the DEV and APPSEC teams.

But again, if we create two lists, then we should add non-technical
items such as pricing, which is part of the decision.

Concerning the overall sections, the order should make sense as well.
Anyhow, I suggest the list to rethink about the current criteria and see
what can be measured properly, and what needs to be captured by any tool
evaluator. The following is just a suggestion (came up with that in too
little time), but I believe it captures the interesting part in a better
order:

  1. Platform support
    1.1 OS support
    1.2 Scalability tuning (support for 64bits, etc.)
  2. Application technology support
    2.1 Language support (up to the version of language)
    2.2 Framework support
  3. Scan, command and control
    3.1 Scan configuration
    3.2 Build system integration
    3.3 IDE integration
    3.4 Command line support
    3.5 Automation support
    3.6 Enterprise offerings (need of a better terminology)
  4. Application analysis
    4.1 Testing capabilities (weakness coverage, finding-level data, etc.)
    4.2 Customization
    4.3 Triage capabilities
    4.4 Scan results post-processing
  5. Reporting
    5.1 Reports for different audiences
    5.2 Report customization
    5.3 Finding-level reporting information
    5.3.1 Classification/Taxonomy mapping (i.e., CWE, OWASP, WASC, etc.)
    5.3.2 Finding description (paths, pre-post conditions, etc.)
    5.3.3 Finding remediation (available, customizable, etc.)
  6. Miscellanies
    6.1 Knowledge update (rules update)
    6.2 Integration in bug trackers (list of supported BT, customization,
    etc.)

Well, if we do this we restart from scratch, and that's not fair to
all of us who spent a lot of time on this project.
I think the current categories should remain; only the sub-category
questions need to be challenged.

From what I see here, all your remarks fit well in the current
categories. We should set up a task force to analyze your very good
comments and readjust the questions.

Btw, I'm sorry to come back with such feedback quite late... but the
deadlines are too aggressive for me.
Romain

As I said, it is not too late for the subcategories; I see that your
comments fit well in the current categories. But if we redo all the
categories again, it will be an endless project, as someone else
could again answer late with very good remarks.

So we should now focus on compiling the September 12th answers with yours.



SK
Sherif Koussa
Thu, Sep 29, 2011 6:17 PM

Hi Everyone,

Let's try to get as many comments on the direction and suggestions provided
by Romain and Benoit in this thread before tomorrow. If you have something
to say but won't be able to put it down in writing before tomorrow, please
let me know as well and we can arrange that.

Regards,
Sherif

On Thu, Sep 22, 2011 at 11:36 PM, Sherif Koussa sherif.koussa@gmail.com wrote:

Thanks for your comments, Benoit.

I am going to give this another week for comments since a lot of folks are
at OWASP AppSec USA.

Regards,
Sherif

On Sat, Sep 17, 2011 at 9:18 PM, Benoit Guerette (OWASP) gueb@owasp.org wrote:

I don't agree with redoing the categories again, but I do agree to take
Romain's comments (very good comments, by the way) into account, in addition
to our September 12th answers for the subcategories. That's the current step
anyway; a bit late, but we are in it.

No offense, we are all very busy, but deadlines are there to make sure
that we can deliver the project one day, so that we will not redo the
whole project again and again.

So why don't we focus on the subcategories? Here are my answers to Romain's
comments; I did not reply to the points where I agree with him.

1. Tool Setup and Installation
"Setup and Installation" is not really interesting, I believe. The more
important is the platform support (can I run the tool from my linux box, my
mac, our windows server, etc).

1.1 Time required to perform initial installation
That's usually subjective, unless you say something like "always less than 2
hours", "always less than a day", etc. But then again, I find this totally
quite irrelevant to the problem.

Don't agree: some vendors are standalone on the desktop, some require one
server, others require multiple servers. In a large-scale deployment, this
section is relevant.

1.2 Skills required to perform initial installation
Subjective.

Agree

1.3 Privileges required to perform initial installation
I don't find this item very informative. Okay, you need to have root access,
or admin access on the machine... or not.

Agree, they all need root/admin.

1.4 Documentation setup accuracy
Subjective.

Don't agree: vendors will supply a sample installation guide, and the
quality varies a lot.

1.5 Platform Support
This one is interesting for the customers.

2. Performing a Scan
Logically, I would not talk about scanning just now. But, after the platform
support section, I would talk about language, framework support.

2.1 Time required to perform a scan
This does not make any sense. "Time required to scan"... what? This question
however, is answerable if we provide a proper test case and environment to
run the tool. But then again, it's a quite misleading information.

Don't agree: you would be very surprised to see the gap between vendors.

2.2 Number of steps required to perform a scan
Many tools have scripting interfaces. Using scripts, you reduce your steps
from 7, to 1 (i.e., run the script). How does that count?
In summary, I find this information not interesting at all.

Well, I agree: vendors vary from 2 to 7 steps in most cases.

2.3 Skills required to perform a scan
I understand that some tools (like PolySpace) require someone to actually
design and model the suspected behavior of the program. But most tools do
not require that. Then again, how to rate the user? Do we assume the user
(who runs the scan) will also look at the findings? Does he also setup the
scan? I definitely see the scan being run by security operation (mostly for
monitoring), and being setup by security engineers...

Could be removed; most vendors will answer "no skills required".

3. Tool Coverage:
"Tool Coverage" might be the most misleading term here. Coverage of what?!
Coverage of supported weaknesses, languages, version of languages,
framework, application coverage, entry point coverage, etc.?

3.1 Languages supported by the tool
Very important. Now, we should not limit ourselves to the languages, but we
should go at the version of framework level. Nowadays, the language is just
a mean, most of the juicy stuff happen in the relationship with the
frameworks... Also, the behavior of the frameworks might be different from
one version to one another...

Good point, we should ask about frameworks, libraries, etc.

3.2 Support for Semantic Analysis
3.3 Support for Syntactic Analysis
I do not understand these items. (Usually, "semantic" is used to say
something like AST-level type of knowledge). I would be, honestly, more
interested to know if the tool is properly capable of inter-procedural
data flow analysis, or if it has some other limitations. Then again, I would
prefer not to talk about the underlying logics (and modeling) of the tool
since I believe this is out of scope. Users don't really care about that,
they just want the tool to work perfectly. If you use a dataflow based
model, abstract interpretation, or whatever one comes up with ... don't care.

Don't agree: not all vendors support those points, so it is interesting to
know. Don't forget, this is a security tool that we ask DEV teams to use,
so they are interested in that kind of verification.

3.4 Ability of the tool to understand different components of a project
(.sql, .xml, .xsd, .properties…etc)
This is a very interesting item. When generalized a little bit, we can
derive several items:

  • Analysis support of configuration files (i.e., the tool gets knowledge
    from the configuration files)
  • Analysis support for multiple languages in separated files
  • Cross-languages analysis support (the tool is capable of performing its
    analysis from one language, let's say Java, to SQL, and back to Java)

Another item that would be quite interesting is the support for "new
extensions", or redefinition of extensions. Let's say the tool recognizes
".pl" as perl, but I have all my stored procedures (in PL/SQL) with this
extension; I'd like to be able to tell the tool to consider the .pl files to
be PL/SQL for this application. The same reasoning needs to be done for new
extensions.

Ok, that's a good one, we should expand it. Most of them can scan config
files, but not all of them can scan SQL, so we should split this question.
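
To make the extension-remapping idea concrete, here is a rough sketch of the
kind of control an evaluator would look for; the scanner name
("example-sast-cli"), its flags and the mapping format are hypothetical, not
any particular vendor's interface:

    # Hypothetical example: remap file extensions before a scan, so that
    # ".pl" files in this project are analyzed as PL/SQL instead of Perl.
    import subprocess

    # Hypothetical mapping: extension -> language the engine should assume.
    extension_overrides = {
        ".pl": "plsql",            # stored procedures kept with a .pl extension
        ".properties": "java-config",
    }

    args = ["example-sast-cli", "scan", "--project", "webapp/"]
    for ext, lang in extension_overrides.items():
        args += ["--treat-extension", f"{ext}={lang}"]  # hypothetical flag

    subprocess.run(args, check=True)

Whether this is done on the command line, in a project file or in a GUI
matters less than the fact that it is possible at all.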

3.5 Coverage of Industry Standard Vulnerability Categories (OWASP Top 10,
SANS Top 25…etc)
Static analysis tools do not find vulnerabilities. They find source code
weaknesses (there is a huge difference). Now, I do not understand what
"coverage of industry standard vulnerability categories" means.

  • Is this category supposed to be about coverage of the types of "stuff"
    (weaknesses, flaws, bugs, issues, etc.) that the tool can find? If so, we
    should use CWE, and nothing else.
  • Is this category about the reporting and classification of findings?
    (Such as, "Oh noes, this finding is mapped to OWASP top 10 risks...
    that's very bad for your PCI compliance!")

I don't understand your point on this one, but you are usually very clear,
so maybe we can chat together. I want to know if they look only against the
Top 10, or also the SANS 25, which is very specific on injections (command
injection, for example). Also, from a compliance point of view, there is big
value in clearly showing which model is scanned against (PCI 1.2 is mapped
to the Top Ten, etc.). A weakness in code = a vulnerability in the binary,
so it is all the same from my point of view.

4. Detection Accuracy
Usually, that does not mean anything.
4.1 Number of false positives
4.2 Number of true negatives
My first comment here was "Gniii?", then I did s/Number/Rate and it made a
bit more sense.
I could understand why someone would want to get a rate of false-positives
and false-negatives, but true-negatives? True negatives are the things that
are not reported by the tool, and it's good for the tool not to report them;
an example would be a data flow path that uses a proper validation routine
before sending the data to a sink. You do not want the tool to report such a
path, and this is a true-negative.
By the way, the rates of FP/FN are very interesting from an experiment point
of view, but there is no way to get this data to mean anything for Joe the
project manager who wants to get a tool. Most likely your data will be very
different from his (if you're making the same experiment on your
applications). Sad reality fix: tool results depend a lot on the
application.

4.3 Accuracy %
Accuracy of what? Compared to what? Non-sense to me, cf. previous point. We
cannot measure that in a meaningful way.

Well, vendors should answer these questions, and it is very interesting to
see their positions. Also, from a lab perspective, scanning a sample code
base and looking at the false positives it generates is important. Imagine
that you use ESAPI for filtering but the tool doesn't know the library; it
could generate a lot of false positives.

That's an important point: imagine a developer scanning his code for the
first time...
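
For what it's worth, the s/Number/Rate point is easy to make concrete. A
small sketch with made-up confusion-matrix counts (the numbers are purely
illustrative) shows why a raw count of true negatives says very little on
its own:

    # Illustrative only: computing rates from a labelled benchmark scan.
    # The TP/FP/FN/TN counts below are made up for the example.
    tp, fp, fn, tn = 40, 10, 20, 930   # counts for one test application

    false_positive_rate = fp / (fp + tn)   # reported, but not a real weakness
    false_negative_rate = fn / (fn + tp)   # real weakness, not reported
    precision = tp / (tp + fp)             # how much of the report is signal
    recall = tp / (tp + fn)                # how many real weaknesses are found

    print(f"FP rate={false_positive_rate:.2%}, FN rate={false_negative_rate:.2%}")
    print(f"precision={precision:.2%}, recall={recall:.2%}")

The true-negative count mostly reflects how big the benchmark is, which is
exactly why a rate (or precision/recall pair) is the only comparable form.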

5. Triage and Remediation Process
Do we want to talk about the quality of the UI provided by the tool to
facilitate the triage? IMO, the remediation process is out of scope for a
SAST.

Why is remediation out of scope? The first time I scan a huge portal that we
developed, we get 1000 alerts; triage and remediation is a huge point.

5.1 Average time to triage a finding
This seems to me like rating your assessor more than the tool you use.

What if you need to mark the same problem as a false positive, one finding
after the other? That's not a code 19 problem ;)

5.2 Quality of data surrounding a finding (explanation, tracing, trust
level…etc)
Those are indeed very important pieces of information. As an assessor, I
want to know why the heck this tool reported this finding to me. Not only do
I want to have paths, confidence, data flow info, etc., but I want to know
the internals. Some tools will report the pre-conditions and post-conditions
that generated the finding. This is extremely useful for advanced use of the
tools. I understand that most tools do not report that, so at least
reporting the rule ID (or something I can track later on, and make sense of)
is important.

I totally agree with this comment.

5.3 Ability to mark findings as false positive
Marking a finding as FP might have several meanings. Does this mean:

  • Mark the finding as FP for the report?
  • Mark the finding as FP for the engine, so that next time it encounters a
    similar case, it won't report it?

5.4 Ability to “diff” assessments
Very important indeed.

5.5 Ability to merge assessments
Tracking, merging, combining assessments is definitely part of the
workflow...

5.6 Correctness of remediation advice
5.7 Completeness of remediation advice
I hope no one actually relies on the tool to give proper remediation advice.
They're usually fine to give an idea, but there is no way they will give you
a good solution for your case (even though, in theory, they have lots of
information to do so).

Are we talking about security people doing the scan, or developers doing the
scan? Unless you have an appsec team supporting all developers, completeness
is very important: the more we guide the DEV, the more they feel in control
and feel that the tool is good for them (no need to call that appsec guy
every time I find a problem).

5.8 Does the tool automatically prioritize defects
Prioritize what? Is this category supposed to be talking about the severity
rating? Is it talking about prioritization at the engine level, so that the
tool misses lots of stuff (yeah, that's usually what happens when the flow
gets complex)?

Reporting, in my mind: in which order do you show the defects?
Don't forget, the first scan is usually a nightmare.

6. UI Simplicity and Intuitiveness
6.1 Quality of triage interface (need a way to measure this)
6.2 Quality of remediation interface (need a way to measure this)
Subjective.

6.3 Support for IDE plug-ins both out of the box and on-demand
"Integration with IDEs", and possible support for new IDEs. Yes, it's
important to get, at least, a list of integrated IDEs.

6.4 Quality of tools’ out of the box plugin UI
Subjective. Why not talk about the features available through the plugin?

7. Product Update Process
It's indeed good to know that automated/federated/etc. updates are possible.

7.1 Frequency of signature update
Interesting, but the reader must be careful not to base much of the decision
on that. If the tool gets a new pack of rules every week or every month,
that does not mean much about the quality...

No, but in a large-scale deployment it means something to download
1500 x 100MB (roughly 150GB) from the internet on Monday at 9am, when all
the developers arrive at their desks.

7.2 Relevance of signatures to evolving threats
7.3 Re-activeness to evolving threats
Are we talking about new weaknesses? The word "threat" is very confusing
here... and does not make sense to me in the context of SAST.

I kind of agree on the usage of "threats", because threats are hackers,
insiders, etc. "Vulnerability" should be used instead.

8. Product Maturity and Scalability
Would be good to know indeed, though... how to get the data?

8.1 Peak memory usage
42GB?! That's a very subjective data point that depends on many factors
(machine, configuration, application, etc.)

Sorry, that's an important point. Some vendors scan in the cloud, others on
a centralized server or build server, and others in the IDE directly. Adding
2 to 4GB of RAM to a developer setup is an important point... I am sure you
understand the impact of saying that this tool will use up to 4GB of RAM on
the DEV workstation. We need a metric.

8.2 Number of scans done before a crash or serious degradation in
performance
42, but only because it was 71 degrees in the room, and the train was
passing every 2.5 days.

Joking aside, your example will not hold for a new release of the tool.
Analyzing the fixes in the last 3 years of releases will help to understand
whether they fix crashes or just add functionality.

8.3 Maximum lines of code the tool can scan per project
It would be good to talk about the scalability of the tool, and how to
improve it. For example, can I scan the same application with several
machines (parallelism)? If I add more RAM/CPU, do I get much better results?
Is there a known limit?

8.4 What languages does the tool support?
This should be covered in a different section.

I don't get it... but yes, we can move it; propose something.

9. Enterprise Offerings
This is also very interesting for companies. However, the enterprise
offerings are usually a central solution to host findings, review findings,
etc. This is not really SAST, but SAST-management. Do we want to talk about
that? I'm happy to have this in the criteria...

Important point. As I already said, I still don't understand why pricing is
not included....

9.1 Ability to integrate with major bug tracking systems
This is mostly a general comment, but instead of a boolean answer, we should
ask for the list of supported bug tracking systems.
Also, it's important to be able to customize this, and to be able to
integrate with JoeBugTracker...

We can reformulate; the vendors usually understand that we need their input.
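
As a rough illustration of the "integrate with JoeBugTracker" case, this is
the kind of glue an evaluator would want the tool's export to make possible;
the export file, field names and tracker endpoint are all assumptions made
for the example, not a real vendor or tracker interface:

    # Hypothetical glue: push findings exported by a SAST tool into a
    # custom bug tracker over a plain REST API.
    import json
    import urllib.request

    with open("findings-export.json") as fh:   # hypothetical export file
        findings = json.load(fh)

    for finding in findings:
        ticket = {
            "title": f"[SAST] {finding['rule_id']} in {finding['file']}",
            "description": finding.get("trace", ""),
            "severity": finding.get("severity", "medium"),
        }
        req = urllib.request.Request(
            "https://joebugtracker.example.com/api/issues",  # hypothetical URL
            data=json.dumps(ticket).encode(),
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        urllib.request.urlopen(req)

If the tool only offers a fixed list of connectors and no machine-readable
export, this kind of custom integration is simply not possible.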

9.2 Ability to integrate with enterprise software configuration management
In what regard?

10. Reporting Capabilities
10.1 Quality of reports
Subjective.

Ok, but would you buy a tool with skinny or unreadable reports? Quality is
an important factor, and reporting is usually a management concern, and
management pays for the product.

10.2 Availability of role-based reports
It's indeed important to report different kinds of data for the engineer,
dev, QA, managers, etc. Eventually, we're talking about data reporting here,
and tools should provide several ways to slice and represent the data for
the different audiences.

10.3 Availability of report customization
Yup, though, to what extent is the report customizable? Can I just change
the logo, or can I integrate the findings into my Word template?

I don't think we need a paragraph for each question, but we should split
this particular question into multiple questions.

11. Tool Customization and Automation
I feel that we're finally getting to the interesting part. Every mature use
of SAST has to make use of automation and tool customization. This section
is a very important one, and we should emphasize it as much as we can.

11.1 Can custom rules be added?
Right, that's the first question to ask. Does the tool support rule
customization? Now, we need many other points, such as... what kinds of
rules are supported? Can we specify/create a new type of
weakness/finding/category?

11.2 Do the rules require learning a new language/script?
Most likely it will be "yes", unless it's only GUI based. My point is that
even XML rules represent a "language" to describe the rules...

Most of them use XML, some use a kind of SQL, and others use a GUI. I agree,
we should reformulate the question.
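
To give an idea of what these rule definitions usually capture, independent
of any vendor's XML/SQL/GUI format, a custom taint rule typically declares
sources, sinks and sanitizers; the field names below are hypothetical, not a
real tool's schema:

    # Hypothetical, vendor-neutral sketch of a custom taint rule.
    # Real tools express this in their own formats; the point is only to
    # show the kind of information such a rule usually carries.
    custom_rule = {
        "id": "CUSTOM-0001",
        "weakness": "CWE-89",   # SQL injection
        "sources": ["javax.servlet.ServletRequest.getParameter"],
        "sinks": ["java.sql.Statement.executeQuery"],
        "sanitizers": ["org.owasp.esapi.Encoder.encodeForSQL"],
        "severity": "high",
        "remediation": "Use parameterized queries (PreparedStatement).",
    }

    print(custom_rule["id"], "covers", custom_rule["weakness"])

Asking vendors which of these elements their rule format can express (new
sources, new sinks, new sanitizers, new weakness types) is more informative
than a yes/no on "custom rules".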

11.3 Can the tool be scripted? (e.g. integrated into an ANT build script or
other build script)
Build automation is crucial, but to me, it is different from automation.
This item should be in a different section.

Not sure about this one, but we can split the section.
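
For the build-integration angle, the wrapping people usually mean looks
something like the sketch below: run the scanner from the build script and
fail the build above a severity threshold. The CLI name, flags and report
format are assumptions, not a specific vendor's interface:

    # Hypothetical build-step wrapper around a SAST command-line scanner.
    import json
    import subprocess
    import sys

    # Run the (hypothetical) scanner and write a machine-readable report.
    subprocess.run(
        ["example-sast-cli", "scan", "--project", ".",
         "--output", "report.json"],
        check=True,
    )

    with open("report.json") as fh:
        findings = json.load(fh)

    # Break the build if any high-severity finding is present.
    high = [item for item in findings if item.get("severity") == "high"]
    if high:
        print(f"Build failed: {len(high)} high-severity findings")
        sys.exit(1)

The evaluation question is really whether the tool exposes a command line
and a parseable output so that this kind of wrapper can exist at all.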

11.4 Can documentation be customized (installation instructions, remediation
advice, finding explanation…etc)
Interesting point. Can we overwrite the remediation given by the tool?

Can we change the language, point to an internal security library, etc.?
Good point.

11.5 Can the defect prioritization scheme be customized?
Right! Can I integrate the results within my risk management system?

11.6 Can the tool be extended so that custom plugins could be developed for
other IDEs?
That part should be in the IDE integration section.

Maybe, but most vendors can provide an API to automate or script, so I'd
like to see it in this section.

In summary, I believe that the SATEC needs to be restructured to address the
actual problems. We should also move away from any subjective criterion. I
believe that the SATEC should be able to be filled in by a tool vendor, or
by someone who will evaluate the tool. Eventually, we should provide a
spreadsheet that could be filled in.

That's a good point, but understand that we then need 2 lists of questions,
because the UI-quality kind of questions cannot be sent out. We used the
WASC-SATEC for an RFI to the major vendors, so we created an Excel sheet and
removed the 'quality' items, which will be filled in by the DEV and APPSEC
teams.

But again, if we create 2 lists, then we should add non-technical stuff such
as pricing, which is part of the decision.

Concerning the overall sections, the order should make sense as well.
Anyhow, I suggest the list rethink the current criteria and see what can be
measured properly, and what needs to be captured by any tool evaluator. The
following is just a suggestion (I came up with it in too little time), but I
believe it captures the interesting parts in a better order:

1. Platform support
   1.1 OS support
   1.2 Scalability tuning (support for 64 bits, etc.)
2. Application technology support
   2.1 Language support (up to the version of the language)
   2.2 Framework support
3. Scan, command and control
   3.1 Scan configuration
   3.2 Build system integration
   3.3 IDE integration
   3.4 Command line support
   3.5 Automation support
   3.6 Enterprise offerings (needs a better terminology)
4. Application analysis
   4.1 Testing capabilities (weakness coverage, finding-level data, etc.)
   4.2 Customization
   4.3 Triage capabilities
   4.4 Scan results post-processing
5. Reporting
   5.1 Reports for different audiences
   5.2 Report customization
   5.3 Finding-level reporting information
       5.3.1 Classification/Taxonomy mapping (i.e., CWE, OWASP, WASC, etc.)
       5.3.2 Finding description (paths, pre-post conditions, etc.)
       5.3.3 Finding remediation (available, customizable, etc.)
6. Miscellaneous
   6.1 Knowledge update (rules update)
   6.2 Integration in bug trackers (list of supported BT, customization,
       etc.)

Well, if we do this we restart from scratch, and that's not fair to all of
us who spent a lot of time on this project. I think the current categories
should remain; only the sub-category questions need to be challenged.

From what I see here, all your remarks fit well in the current categories;
we should do a task force to analyze your very good comments and readjust
the questions.

Btw, I'm sorry to come back with such feedback quite late... but the
deadlines are too aggressive for me.
Romain

As I said, it is not too late for the subcategories; I see that your
comments fit well in the current categories. But if we redo all the
categories again, it will be an endless project, as someone else could again
answer late with very good remarks.

So we should now focus on compiling the September 12th answers with yours.


_______________________________________________
wasc-satec mailing list
wasc-satec@lists.webappsec.org
http://lists.webappsec.org/mailman/listinfo/wasc-satec_lists.webappsec.org
