TODO

- Add an ORCONN_BW event to Tor to emit read/write info and also queue sizes
  - See tordiffs/orconn-bw.diff but it probably should be a separate event,
    not hacked onto ORCONN
  - Use nodemon.py to rank nodes based on total bytes, queue sizes, and the 
    ratio of these two
    - Does it agree with results from metatroller's bandwidth stats?
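
A minimal sketch of the ranking step above, assuming per-node totals have
already been gathered from the proposed event. The NodeSample record and its
field names are hypothetical; they are not part of Tor or nodemon.py.

```python
from collections import namedtuple

# Hypothetical per-node totals; these field names are illustrative only.
NodeSample = namedtuple("NodeSample", ["nick", "total_bytes", "queue_size"])

def queue_ratio(sample):
    """Queue size per byte transferred; an idle node with a queue sorts first."""
    if sample.total_bytes == 0:
        return float("inf") if sample.queue_size else 0.0
    return float(sample.queue_size) / sample.total_bytes

def rank_nodes(samples):
    """Return nicknames ranked by bytes, by queue size, and by their ratio."""
    def names(key):
        return [s.nick for s in sorted(samples, key=key, reverse=True)]
    return {
        "bytes": names(lambda s: s.total_bytes),
        "queue": names(lambda s: s.queue_size),
        "ratio": names(queue_ratio),
    }
```

Comparing the "bytes" ranking against metatroller's bandwidth stats would be
one way to approach the agreement question.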

- More NodeRestrictions/PathRestrictions in TorCtl/PathSupport.py
  - BwWeightedGenerator
  - NodeRestrictions:
    - Uptime/LongLivedPorts (Does/should hibernation count?)
    - Published/Updated
    - GeoIP (http://www.maxmind.com/app/python)
      - NodeCountry
  - PathRestrictions:
    - Family
    - GeoIP (http://www.maxmind.com/app/python)
      - OceanPhobicRestrictor (avoids Pacific Ocean or two Atlantic crossings)
        or ContinentRestrictor (avoids doing more than N continent crossings)
      - OceanPhilicRestrictor or ContinentJumperRestrictor
        - Can be used as counterpoint to see how bad for performance it is
      - EchelonPhobicRestrictor
        - Does not cross international boundaries for client->Entry or
          Exit->destination hops
  - Perform statistical analysis on paths
    - How often does Tor choose foolish paths normally? 
      - (4 Atlantic/Pacific crossings)
      - Use speedracer to determine how much slower these paths really are
    - What is the distribution for Pr(ClientLocation|MiddleNode,ExitNode)
      and Pr(EntryNode|MiddleNode,ExitNode) for these various path choices?
      - Mathematical analysis probably required because this is a large joint
        distribution (not GSoC)
      - Empirical observation possible if you limit to the top 10% of the
        nodes (which carry something like 90% of bandwidth anyway).
        - Make a few million paths without actually building real
          circuits and tally them up in a 3D table
        - See PathSupport.py unit tests for some examples on this
  - See also:
    http://swiki.cc.gatech.edu:8080/ugResearch/uploads/7/ImprovingTor.pdf
    - You can also perform predecessor observation of this strategy
      empirically. But it is likely the GeoIP stuff is easier to implement 
      and just as effective.
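
A rough sketch of the ContinentRestrictor idea combined with the "make paths
without building circuits and tally them" experiment. Everything here is an
assumption for illustration: routers are plain (nickname, continent,
bandwidth) tuples rather than TorCtl router objects, the bandwidth weighting
is a simple proportional draw, and a Counter keyed by (entry, middle, exit)
stands in for the 3D table.

```python
import random
from collections import Counter

# Hypothetical router records: (nickname, continent code, bandwidth).
ROUTERS = [
    ("a", "NA", 500), ("b", "EU", 300), ("c", "EU", 150), ("d", "AS", 50),
]

def continent_crossings(path):
    """Count hops whose two endpoints sit on different continents."""
    return sum(1 for x, y in zip(path, path[1:]) if x[1] != y[1])

def sample_path(routers, rng, length=3):
    """Pick distinct nodes with probability proportional to bandwidth."""
    pool = list(routers)
    path = []
    for _ in range(length):
        pick = rng.choices(pool, weights=[r[2] for r in pool])[0]
        pool.remove(pick)
        path.append(pick)
    return path

def tally_paths(routers, n, max_crossings, seed=0):
    """Generate n paths; tally accepted ones and count the rejected ones."""
    rng = random.Random(seed)
    table = Counter()   # sparse 3D table keyed by (entry, middle, exit)
    rejected = 0
    for _ in range(n):
        path = sample_path(routers, rng)
        if continent_crossings(path) <= max_crossings:
            table[tuple(r[0] for r in path)] += 1
        else:
            rejected += 1
    return table, rejected
```

The rejected count gives a quick estimate of how often unrestricted selection
would have produced a path the restriction forbids.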

- Create a PathWatcher that StatsHandler can extend so people can gather
  stats from regular Tor usage

- Use GeoIP to make a map of Tor servers color-coded by their reliability
  - Or augment an existing Tor map project with this data

- Add circuit prebuilding and port history learning for keeping an optimal
  pool of circuits available for use
  - Build circuits in parallel to speed up scanning

- Rewrite soat.pl in python
   - Improve SSL cert handling/verification. openssl client is broken.
     - The way we store certs is lame. No need to store so many copies
       for diff IPs if they are all the same.
     - Also verify STARTTLS is not molested on smtp, pop and imap ports
       - Means need to make sure openssl lib supports STARTTLS
   - Report failing nodes via SETCONF AuthDirBadExit, potentially to a
     different control port than the one used by metatroller
   - dynamic content scanning
     - tag structure fingerprinting
     - Optionally use same origin policy for dynamic content checks
       - Anything in same origin should not change?
     - filter out dynamic tags with multiple fetches outside of Tor?
       - Or just target specific tags and verify their content
         doesn't change
         - css, script, and object tags and tags that can contain script 
           (there are a LOT of these, but we'd only need to check
            their attributes)
     - Perhaps "double check" to see if a document has changed
       outside of Tor after a failure through Tor
     - GeoIP-based exit node grouping to reduce geo-location false positives?
   - make sure all http headers match a real browser
   - DNS rebind attack scan
     - http://christ1an.blogspot.com/2007/07/dns-pinning-explained.html
     - Basically we want to make sure that no exit nodes resolve arbitrary
       domains to internal IP addresses
       - http://www.faqs.org/rfcs/rfc1918.html
     - This could be done with periodic calls to 
       "getinfo address-mappings/cache" during scanning, or by 
       changing metatroller to inspect STREAM NEWRESOLVE/REMAP events
   - Improve checking of changes to documents outside of Tor
   - Make a multilingual keyword list of commonly censored terms to google for
     using this scanner
   - Check Exit policy for sketchiness. Mark BadExit if they allow:
     - pop but not pops
     - imap but not imaps
     - telnet but not ssh
     - smtp but not smtps
     - http but not https
     - This also means we have to verify encrypted ports actually work and
       all exits will honor connections through them (in addition to
       checking certs)
   - Support multiple scanners in metatroller
     - Improve interaction between soat+metatroller so soat knows
       which exit was responsible for a given ip/url
   - SYN+Reverse DNS resolve scan
     - This can detect exit sniffers that reverse resolve IPs. However,
       it is high-effort (requires someone to run reverse DNS for us), 
       and requires keeping their IP range secret.
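
Two of the checks above are simple enough to sketch: flagging resolutions
that land in RFC 1918 space (the DNS rebind scan) and flagging exit policies
that allow a plaintext port but not its encrypted counterpart. The input
shapes are assumptions: mappings are (hostname, ip) pairs however they were
obtained, and the exit policy is reduced to a set of accepted ports. The
port numbers are the usual well-known assignments, with 465 assumed for
smtps.

```python
import ipaddress

# The three private IPv4 ranges defined by RFC 1918.
RFC1918_NETS = [ipaddress.ip_network(n)
                for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]

def is_rfc1918(addr):
    """True if addr (an IP address string) falls in an RFC 1918 range."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in RFC1918_NETS)

def suspicious_mappings(mappings):
    """Return the (hostname, ip) pairs that resolved to an internal address."""
    return [(host, ip) for host, ip in mappings if is_rfc1918(ip)]

# (plaintext port, encrypted counterpart): pop/pops, imap/imaps,
# telnet/ssh, smtp/smtps, http/https.
PORT_PAIRS = [(110, 995), (143, 993), (23, 22), (25, 465), (80, 443)]

def sketchy_pairs(accepted_ports):
    """Return pairs where the plaintext port is allowed but the encrypted
    one is not. accepted_ports is a set of ports the exit policy accepts."""
    return [(plain, enc) for plain, enc in PORT_PAIRS
            if plain in accepted_ports and enc not in accepted_ports]
```

A non-empty result from either function would only be a reason to look
closer; other internal ranges (loopback, link-local) are not covered here.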
 
- Design Reputation System
  - Emit some kind of penalty multiplier based on circuit/stream failure rate
    and the ratio of directory "observed" bandwidth vs avg stream bandwidth
    - Add keyword to directory for clients to use instead of observed
      bandwidth for routing decisions
      - Make sure scanners don't listen to this keyword to avoid "Creeping
        Death"
    - Queue lengths from the node monitor can also figure into this penalty
      multiplier
  - Figure out interface to report this and also BadExit determinations
    - Probably involves voting among many scanners
  - Justify that this is worthwhile, sane, and at least as resistant to
    attack as the current Tor network
    - Does a reputation system make it easier for an adversary w/ X% of the
      network to influence it?
      - Preliminary: http://archives.seul.org/or/dev/Nov-2006/msg00004.html
      - Sybil attacks
    - What about clients that ignore the reputations? Can their behavior game
      the system, or are they just behaving suboptimally?
      - First impressions: meh; suboptimal
    - Do changes in ratings leak any information about clients?
      - Does it influence their paths in predictable ways to a greater degree
        than bandwidth ranking already does?
    - What about detecting the scan and giving better service? Time of day, 
      source IP, exit IP?
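
To make the penalty multiplier idea concrete, here is one possible formula,
offered purely as a strawman. The function name, the choice to multiply the
two factors, the cap at 1.0, and the typical_ratio parameter (for example a
network-wide median supplied by the scanner) are all assumptions, not a
settled design.

```python
def penalty_multiplier(failure_rate, observed_bw, avg_stream_bw, typical_ratio):
    """Return a factor in [0, 1] to scale a node's weight.

    failure_rate:  fraction of circuits/streams through the node that failed
    observed_bw:   directory "observed" bandwidth for the node
    avg_stream_bw: average measured stream bandwidth through the node
    typical_ratio: avg_stream_bw / observed_bw for a typical node
    """
    if observed_bw <= 0 or typical_ratio <= 0:
        return 0.0
    reliability = 1.0 - min(max(failure_rate, 0.0), 1.0)
    # Nodes delivering less per advertised byte than typical are penalized;
    # nodes delivering more are not rewarded beyond 1.0.
    delivery = min(1.0, (float(avg_stream_bw) / observed_bw) / typical_ratio)
    return reliability * delivery
```

Queue lengths from the node monitor could enter as a third factor in the
same product.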

- Stopgap for bootstrapping
  - push traffic through servers running 0.1.1.x or earlier with 0 dirport
    that claim less than 20KB traffic