[WEB SECURITY] program to crawl website looking for string patterns

Adam Muntner unix23 at gmail.com
Fri Sep 16 13:23:52 EDT 2011


FuzzDB has a set of regex patterns for PII. The docs and the set of
test cases are here, respectively:

http://code.google.com/p/fuzzdb/source/browse/trunk/regex/pii.readme.txt
http://code.google.com/p/fuzzdb/source/browse/trunk/regex/pii.fuzz.txt

Depending on your purpose, how much depth and completeness matter to
your test, and the structure of the sites being evaluated, you may not
want to rely entirely on an automated crawler. Many websites have
entire sections that automated web crawlers can never reach, and a
crawler may not handle authenticated session state well without some
effort. If depth is critical, make sure to do a manual crawl as well.
You can load the PII regex patterns into something like Burp to
monitor the traffic passively for them, then have your automated
spider tools run through Burp too. You may also want Burp to log the
full requests and responses to a file for later, deeper analysis.
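
If you do log traffic to a file, the later analysis could be as simple
as the rough Python sketch below. Everything in it is an assumption for
illustration: the two patterns are simplified stand-ins (not the FuzzDB
patterns), the names scan_text and scan_log are made up, and the log is
assumed to be a plain text file read line by line. The Luhn check is
there only to cut down on false positives for card-like digit runs.

```python
import re

# Illustrative patterns only; these are NOT the FuzzDB PII patterns.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    # 13 to 16 digits, optionally separated by single spaces or hyphens.
    "card": re.compile(r"\b(?:\d[ -]?){12,15}\d\b"),
}


def luhn_ok(digits):
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def scan_text(text):
    """Return a list of (label, matched_text) pairs found in text."""
    hits = []
    for label, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            value = m.group(0)
            if label == "card" and not luhn_ok(re.sub(r"\D", "", value)):
                continue  # digit run that is not a plausible card number
            hits.append((label, value))
    return hits


def scan_log(path):
    """Yield (line_number, label, matched_text) for a saved log file."""
    with open(path, encoding="utf-8", errors="replace") as fh:
        for lineno, line in enumerate(fh, 1):
            for label, value in scan_text(line):
                yield lineno, label, value
```

Something like "for hit in scan_log('proxy.log'): print(hit)" would then
list candidate matches, where proxy.log is a placeholder file name.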

One thing to check for in post-crawl analysis: the results may contain
PII encoded in a way that your regex would miss. Examples: an SSN that
is URL-encoded, base64-encoded, Unicode-encoded, etc.
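
As a rough sketch of that idea in Python, you could decode each chunk
of captured text a few ways and run the same regex over every variant.
The names here (decoded_variants, find_ssns) and the simplified SSN
pattern are assumptions for illustration, and the sketch only covers
URL encoding and base64; other encodings would need their own decoding
step.

```python
import base64
import binascii
import re
from urllib.parse import unquote

# Simplified SSN pattern for illustration; not the FuzzDB pattern.
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
# Runs of base64-alphabet characters long enough to be worth decoding.
B64_TOKEN = re.compile(r"[A-Za-z0-9+/]{8,}={0,2}")


def decoded_variants(text):
    """Yield (encoding_label, candidate_text) pairs to search."""
    yield "raw", text

    url_decoded = unquote(text)
    if url_decoded != text:
        yield "url", url_decoded

    for token in B64_TOKEN.findall(url_decoded):
        try:
            raw = base64.b64decode(token, validate=True)
        except (binascii.Error, ValueError):
            continue  # not valid base64 after all
        yield "base64", raw.decode("utf-8", errors="replace")


def find_ssns(text):
    """Return (encoding_label, ssn) pairs found in any decoded variant."""
    found = []
    for encoding, candidate in decoded_variants(text):
        for m in SSN.finditer(candidate):
            found.append((encoding, m.group(0)))
    return found
```

For example, find_ssns("ssn=078%2D05%2D1120") finds nothing in the raw
text but does find the SSN in the URL-decoded variant.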

Sounds like an interesting project. Good luck!
Adam

On Fri, Sep 16, 2011 at 9:55 AM, Youngquist, Jason R.
<jryoungquist at ccis.edu> wrote:
> We are looking for a tool that can be configured to crawl for string patterns (e.g., SSNs, credit card numbers, etc.). Cornell's Spider 2008 beta has this capability, but every time we used it, it crashed on us.
>
> We also found a program called webshag, but it would only look for pre-defined stuff like email addresses or external links.
>
> Did some googling, but haven't really found anything.  Thoughts?
>
>
>
> Thanks.
> Jason Youngquist, CISSP
> Information Technology Security Engineer
> Technology Services
> Columbia College
> 1001 Rogers Street, Columbia, MO  65216
> (573) 875-7334
> jryoungquist at ccis.edu
> http://www.ccis.edu
>
>



