* Goals
- auto configuration
- internet on wan port automatically detected and shared
- cloud without internet access announces the sad fact
- no persistent writes during normal use (e.g. avoid uci commit for things like internet up/down)
- splash status of users is distributed
- tbc...
* Networks
- 10.17.?.0/?
  - semi-public IPv4
  - each GW and each client gets an address in this range
  - routes to IPv4 internet
  - GW addresses are managed by P2P table
- site-wide global IPv6
  - every node & client
  - public in IPv6 internet
  - automatic address for mesh nodes, DHCPv6 for clients
- mesh-wide link-local IPv6
  - used for UDP broadcasts and default for all other node2node communication
- Robinson networks (see below)
* State machines (FSMs)
State machines are implemented using the /sbin/fsm script (see below).
** inetable
Controls the different network states that result from the local availability of an internet connection and the state of the cloud.
#+begin_dot FSM_Update.png -Tpng
digraph dsd {
  Boot -> {Queen; Drone; Robinson};
  Queen -> {Ghost; Robinson};
  Ghost -> {Queen; Drone; Robinson};
  Drone -> {Queen; Robinson};
  Robinson -> {Queen; Drone};
}
#+end_dot
*** Boot
The node has recently started and is still looking for its mommy.
- gw_mode=0
*** Queen
The node has a working direct internet connection.
- gw_mode=1, bandwidth >> 0
- DHCP range: derived from router-id
*** TODO Ghost
The node was a queen recently (within the last 3600s), but its direct internet access no longer works. There still is a working connection in the cloud.
- gw_mode=0
- all traffic redirected to another GW (determine how?)
- no new DHCP leases (handled via BATMAN by GW nodes)
*** Drone
The node has no direct internet connection but is in a cloud with a working internet connection.
- gw_mode=0
- no DHCP (handled via BATMAN by GW nodes)
*** TODO Robinson
The node is in a cloud without a working internet connection.
- gw_mode=0
- random IP in 21.x.x.1
- DHCP range: 21.x.x.2 - 21.x.x.254
- fake DNS, resolving all A queries to the Robinson net; host part of the addr is derived from a hash of the name to resolve
- all internet traffic is redirected to a local httpd, yelling the network status and explaining FFJ
** update
Implements all-or-nothing update of nodes (e.g. if the network protocol changes incompatibly). Synchronized via the P2P table firmware-versions with the fields
- machine_id
- current firmware (some human readable version string)
- SHA256 of target firmware; empty if no update shall be performed
- time target: set by admin to the time when the update shall happen
- acknowledge time: set by device to time target once ready for an upgrade

The security model relies on the requirement to store the update in a secure location on the node. This is intended to happen via ssh.
#+begin_dot FSM_Update.png -Tpng
digraph {
  Idle -> Ready;
  Ready -> {Idle; Scheduled}
  Scheduled -> {Idle; Scheduled; Applying}
}
#+end_dot
*** Idle
Current firmware is installed and no update is required/possible.
*** Ready
Target firmware is stored in /tmp/firmware-update and verified.
*** Scheduled
Node has received the target time and copied the value to acknowledge time, and this point in time has not passed yet.
*** Applying
For every node in the firmware-versions table, one of the following conditions holds:
1. target firmware, update time target and acknowledge update time are empty
2. time target == acknowledge time, and target firmware points to a new version that is locally stored and verified
Once this state is reached the update is performed.
* Components
** Firmware ID
/etc/firmware stores a string identifying the current firmware. It consists of
1. the date of the git commit of the FFJ config
2. a hash of the git commit of the FFJ config
3. the OpenWRT major version
4. the OpenWRT revision
Example: 2011-12-06_a4fa439-modified_backfire-29460
** Router IDs
- unique ID :: all routers use /proc/sys/kernel/random/boot_id as unique ID
- node ID :: /etc/nodeid is used as unique identifier across reboots and firmware upgrades; it is initialized with the unique ID of the first boot
- gateway ID :: 0..254, given only to Queens and Ghosts, managed via p2ptbl "gwid"
** Connectivity tests
- /sbin/test_connectivity
  - ping some test hosts over a specified interface; if at least one responds, we are online
  - returns connectivity status
  - TODO: ping multiple hosts in parallel
** Finite state machines
FSMs are implemented using
- /sbin/fsm :: a script to monitor and change the state:
  - fsm watch :: check whether a state change shall occur
  - fsm change :: force a state transition
- /etc/fsm//initial_state :: the state set on startup
- /etc/fsm//watch/ :: watch scripts that print the next state; if that file does not exist, /etc/fsm//watch/default is tried. The scripts may assume that:
  - the state they denote is the current state reached via non-failing transition functions
  - the CWD is /etc/fsm//watch
  - cmd line param $1 is set to the current state
- /etc/fsm//trans/ :: scripts implementing the transition between states, probed in the following order:
  1. If a transition name -.trans exists, it is executed.
  2. Otherwise first .leave and then .enter are executed if they exist.
  3. If one of them does not exist, the corresponding default.leave or default.enter is tried.
  4. If none exists, the state transition happens, but has no effect.
  The scripts may assume that:
  - the CWD is /etc/fsm//trans
  - cmd line param $1 is set to the old state and $2 is set to the new state
  - it is called exactly once for a state change
- /var/fsm/ :: a tmpfs-based storage of the current state
TODO:
- proper handling of errors occurring in one of the many scripts (e.g. changing to an error-state or rebooting the device).
- handle invalid states
** HBBP: Home-Based Broadcast Protocol
- UDP `broadcast` and `listener`
- transmit a zero-terminated key and an optional arbitrary-binary payload: the key is comparable to an HTTP URI, the payload to HTTP POST data
- IPv6-only
- restricted to a single network by using link-local broadcast and listening on only one interface
*** Usage
*** Wire format
One of:
- the key alone
- the key, followed by \0 and the payload
encapsulated in IPv6 UDP. The key must not contain \0.
** P2P tables
P2P tables are a lightweight distributed key-value store with built-in collision arbitration. Eventual consistency is maintained using an HBBP-based gossip protocol.
*** Usage
- p2ptbl init :: create a new table named
- p2ptbl update [iface] :: set the value of a key in a table, no matter whether the key existed before; if given, broadcast the update over [iface]
- p2ptbl get :: print the value of a key in a table, or nothing if the key does not exist in the table
- p2ptbl gossip :: broadcast a table over an interface; send at most a given number of bytes of compressed table data: if the table is larger, a random subset is sent

All tables are stored in /tmp/p2ptbl/table. The above tools require the full path to the table. To be synchronized via the gossip protocol, a table must be enabled for receiving updates by symlinking /hbbp/p2ptbl/ to /sbin/p2ptbl-recv.
*** P2P table format
- tab separated
- fields
  - key :: per-table unique token
  - version :: integer
  - value(s) :: anything, tab-separated
- on merge of two tables, for each key the variant with the largest version number wins
- on update, the version number is incremented by some sufficiently large random amount (to avoid collisions) ... e.g. 2^32
*** Gossip protocol
HBBP with key "p2ptbl/" and gzip-compressed shuffled random subsets of a table as payload.
** Preferred gateway
- each node has a preferred gateway, which is used to access the internet if no local connection is available
- how to determine? ... extract from batman?
** Robinson net
- captured .mil-network (/16)
- when no internet is available, fake DNS responses resolve to a stable address in this range (via hash of name)
- once internet becomes available and the names are known, a redirection is set up via iptables
- after a certain time, the redirection is forgotten
** Multiple web servers
Two uhttpd services with www root /www/ for the following purposes:
- service :: self-service / debugging / status.xml
  - listening on port 80 of link-local IPv6 of br-mesh and br-lan; public IPv6 of br-mesh; and gateway IPv4 (if existing)
- redirection :: used for splash/robinson redirection
  - redirects all traffic to the URL given by /tmp/redirection_target
  - listening on port 81 of gw/robinson IPv4
* Thoughts, Fragments, Questions
- VPN node takes part in batman mesh?
- no (memory intensive) NAT on mesh nodes
- roaming without sticking to the old gateway
- continuous bandwidth tests for internet uplinks to update advertised batman gw capabilities?
- occasional flooding to/from VPN node (with idle QoS class)
- IPv6: use multiple routers for roaming w/o breaking existing connections?
- how to support uplinks that do not use the WAN port (e.g. 3G modems)?
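* Appendix: P2P table merge sketch
The merge and update rules from the P2P table format section (largest version wins; an update bumps the version by a large random amount) can be sketched in a few lines. This is only an illustration in Python, not the actual p2ptbl tool. The function names parse_table, merge and update are made up here. Two further assumptions are not specified in these notes: rows with equal versions keep the local variant, and the random increment is drawn uniformly from 1..2^32.

#+begin_src python
import random


def parse_table(text):
    """Parse tab-separated rows into {key: (version, [values...])}."""
    table = {}
    for line in text.splitlines():
        if not line:
            continue  # skip empty lines
        key, version, *values = line.split("\t")
        table[key] = (int(version), values)
    return table


def merge(local, remote):
    """Return a new table; for each key the larger version wins.

    Assumption: on equal versions the local row is kept.
    """
    merged = dict(local)
    for key, (version, values) in remote.items():
        if key not in merged or version > merged[key][0]:
            merged[key] = (version, values)
    return merged


def update(table, key, values, rng=random):
    """Set key to values and bump its version by a random amount.

    Assumption: increment is uniform in 1..2**32, so two nodes updating
    the same key concurrently are unlikely to pick the same version.
    """
    old_version = table[key][0] if key in table else 0
    new_version = old_version + rng.randrange(1, 2**32 + 1)
    table[key] = (new_version, list(values))
    return new_version
#+end_src

Because merge only compares version numbers per key, it can be applied to gossip payloads in any order and repeatedly; nodes that have seen the same set of rows end up with the same table as long as no two variants of a key share a version.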