1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
|
* Goals
- auto configuration
- internet on wan port automatically detected and shared
- cloud without internet access announces the sad fact
- no persistent writes during normal use (e.g. avoid uci commit for
things like internet up/down)
- splash status of users is distributed
- tbc...
* Networks
- 10.17.?.0/? - semi-public IPv4
- each GW and each client gets an address in this range
- routes to IPv4 internet
- GW addresses a managed by P2P table
- site-wide global IPv6
- every node & client
- public in IPv6 internet
- automatic address for mesh nodes, DHCPv6 for clients
- mesh-wide link-local IPv6
- used for UDP broadcasts and default for all other node2node
communication
- Robinson networks (see below)
* State machines (FSMs)
State machines are implemented using the /sbin/fsm script (see
below).
** inetable
Controls the different network states that result of the local
availability of internet connection and the state of the cloud.
#+begin_dot FSM_Update.png -Tpng
digraph dsd {
Boot -> {Queen; Drone; Robinson};
Queen -> {Ghost; Robinson};
Ghost -> {Queen; Drone; Robinson};
Drone -> {Queen; Robinson};
Robinson -> {Queen; Drone};
}
#+end_dot
*** Boot
The node has recently started and is still looking for its mommy.
- gw_mode=0
*** Queen
The node as a working direct internet connection.
- gw_mode=1, bandwidth >> 0
- DHCP range: derived from router-id
*** TODO Ghost
The node was a queen recently (within the last 3600s), but now its
direct internet access does not work anymore. There still is a
working connection in the cloud.
- gw_mode=0
- all traffic redirected to another GW (determine how?)
- no new DHCP leases (handled via BATMAN by GW nodes)
*** Drone
The node has no direct internet connection but is in a cloud with
working internet connection.
- gw_mode=0
- no DHCP (handled via BATMAN by GW nodes)
*** TODO Robinson
The node is in a cloud without working internet connection.
- gw_mode=0
- random IP in 21.x.x.1
- DHCP range: 21.x.x.2 - 21.x.x.254
- fake DNS, resolving all A queries to a the Robinson net; host
part of the addr is derived from hash of name to resolve
- all internet traffic is redirected to a local httpd, yelling the
network status and explaining FFJ
** update
Implements all-or-nothing update of nodes (e.g. if the network
protocol changes incompatibly). Synchronized via p2ptable
firmware-versions with the fields
- machine_id
- current firmware (some human readable version string)
- SHA256 of target firmware; empty if no update shall be performed
- time target: set by admin to time when update shall happen
- acknowledge time: set by device to time target once ready for an
upgrade
The security model relies on the requirement to store the update in
a secure location on the node. This is intended to happen via ssh.
#+begin_dot FSM_Update.png -Tpng
digraph {
Idle -> Ready;
Ready ->{Idle; Scheduled}
Scheduled ->{Idle; Scheduled; Applying}
}
#+end_dot
*** Idle
Current firmware is installed and no update is required/possible.
*** Ready
Target firmware is stored in /tmp/firmware-update and verified.
*** Scheduled
Node has received target time and copied the value to
acknowledge time. And this time point has not passed, yet.
*** Applying
For all nodes of the firmware-versions table one of the following
conditions hold:
1. target firmware, update time target and acknowledge update time
are empty
2. time target == acknowledge time; And target
firmware points to a new version that is locally stored and
verified
Once this state is reached the update is performed.
* Components
** Firmware ID
/etc/firmware stores a string identifying the current firmware. It
consists of
1. the date of the git commit of the FFJ config
2. a hash of the git commit of the FFJ config
3. the OpenWRT major version
4. the OpenWRT revision
Example:
2011-12-06_a4fa439-modified_backfire-29460
** Router IDs
- unique ID :: all routers use /proc/sys/kernel/random/boot_id as
unique ID
- node ID :: /etc/nodeid is used as unique identifier across
reboots and firmware upgrades; it is initialized with
the unique ID of the first boot
- gateway ID :: 0..254, given only to Queens and Ghosts, managed
via p2ptbl "gwid"
** Connectivity tests
- /sbin/test_connectivity <internet|vpn>
- ping some test hosts over a specified interface; if at least one
responds, we are online
- returns connectivity status
- TODO: ping multiple hosts in parallel
** Finite state machines
FSMs are implemented using
- /sbin/fsm :: a script to monitor and change the state:
- fsm watch <name> :: check whether a state change shall occur
- fsm change <name> <new-state> :: force a state transition
- /etc/fsm/<name>/initial_state :: the state set on startup
- /etc/fsm/<name>/watch/<state> :: watch scripts that print the
next state; If that file does not exist
/etc/fsm/<name>/watch/default is tried. The script may assume that:
- the state they denote is the current state reached via
non-failing transition functions
- the CWD is /etc/fsm/<name>/watch
- cmd line param $1 is set to the current state
- /etc/fsm/<name>/trans/<transition> :: scripts implementing the
transition between states, probed in the following order:
1. If a transition name <oldstate>-<newstate>.trans exists it
is executed
2. Otherwise first <oldstate>.leave and then <newstate>.enter
are executed if they exist.
3. If one of them does not exist default.enter and
default.leave is tried.
4. If none exists, the state transition happens, but has no
effect.
The script may assume that:
- the CWD is /etc/fsm/<name>/trans
- cmd line param $1 is set to the old state and $2 is set to
the new state
- it is called exactly once for a state change
- /var/fsm/<name> :: a tmpfs-based storage of the current state
TODO:
- proper handling of errors occurring in one of the many scripts
(e.g. changing to an error-state or rebooting the device).
- handle invalid states
** HBBP: Home-Based Broadcast Protocol
- UDP `broadcast` and `listener`
- transmit a zero-terminated key and an optional arbitrary-binary
payload: key is comparable to an HTTP URI, the payload to HTTP
POST data
- IPv6-only
- restricted to a single network using link-local broadcast and
listening on only interface
*** Usage
*** Wire format
One of:
- <key>
- <key> \0 <payload>
encapsulated in IPv6 UDP. <key> must not contain \0.
** P2P tables
P2P tables are a lightweight distributed key-value store with
built-in collision arbitration. Eventual consistency is maintained
using a HBBP-based gossip protocol.
*** Usage
- p2ptbl init <table> :: create a new table named <table>
- p2ptbl update <table> <key> <value> [iface] :: set the value of
<key> to <value> in <table> no matter if <key> existed before
or not; If given, broadcast the update over [iface]
- p2ptbl get <table> <key> :: get the value of <key> in <table> or
zero output if <key> does not exist in <table>
- p2ptbl gossip <table> <size> <iface> :: broadcast <table> over
<iface>; Send at most <size> bytes compressed table data: if
the table is larger, a random subset is sent
All tables are stored in /tmp/p2ptbl/table. The above tools
require the full path to the table.
To be synchronized via gossip protocol, a table must be enabled
for receiving updates by symlinking /hbbp/p2ptbl/<table> to
/sbin/p2ptbl-recv.
*** P2P table format
- tab separated
- fields
- key :: per-table unique token
- version :: integer
- value(s) :: anything, tab-separated
- on merge of two tables, for each key the variant with the
largest version number wins
- on update, the version number is incremented by some
sufficiently large random amount (to avoid collisions)
... e.g. 2^32
*** Gossip protocol
HBBP with key "p2ptbl/<table-name>" and gzip-compressed shuffled
random subsets of a table as payload.
** Preferred gateway
- each node has a preferred gateway, which is used to access the
internets if no local connection is available
- how to determine? ... extract from batman?
** Robinson net
- captured .mil-network (/16)
- when no internet is available, fake DNS responses resolve to a
stable address in this range (via hash of name)
- once internet becomes available and the names known, a
redirection is set up via iptables
- after a certain time, the redirection is forgotten
** Multiple web servers
Two uhttpd services with www root /www/<servicename> for the
following purposes:
- service :: self-service / debugging / status.xml
- listening on port 80 of link-local IPv6 pf br-mesh and br-lan;
public IPv6 of br-mesh; and gateway IPv4 (if existing)
- redirection :: use for splash/robinson redirection
- redirects all traffic to the URL given by
/tmp/redirection_target
- listening on port 81 gw/robinson IPv4
* Thoughts, Fragments, Questions
- VPN node takes part in batman mesh?
- no (memory intensive) NAT on mesh nodes
- roaming without sticking to the old gateway
- continuous bandwidth tests for internet uplinks to update
advertised batman gw capabilities?
- occasional flooding to/from VPN node (with idle QoS class)
- IPv6: use multiple routers for roaming w/o breaking existing
connections?
- how to support uplinks that do not use the WAN port (e.g. 3G
modems)?
|