PF load balance problem

From: Diego Linke (gamkgamk.com.br)
Date: Thu Jun 01 2006 - 06:41:20 CDT


Hello Everybody.

I sent this question to the misc@openbsd.org list; as I did not get a
reply, I am sending it to this list as well.

I have a small but relevant question regarding PF's load balancing
features. Today I run PF with load balancing as a replacement for
Layer 3 load balancer switches in two types of scenarios: the first
where applications share sessions, and the other where sessions are
not shared.

My problem is...

Here is my environment.

Basically, the example environment is one server running PF and three
web servers which do not share their sessions:

table <lb> { 10.0.0.1, 10.0.0.2, 10.0.0.3 }
rdr on xl0 inet proto tcp from any to IP_PUBLICO port 80 -> { <lb> } \
    round-robin sticky-address
pass in quick log on xl0 proto tcp from any to <lb> port 80 flags S/SA \
    modulate state (src.track 1800)

The "sticky-address" option makes PF always redirect a given client to
the same server. It creates an entry in the "Source" table (source
tracking, which can be seen with "pfctl -vs Source"), and while this
entry is alive, every other request from the same IP address is
forwarded to the same web server. By default, the entry stays in
"Source" only until the last state that references it is gone.

To keep the entry longer, we need to raise the "src.track" timeout
(set timeout src.track). I did this in the rule which allows the
connection, as you can see in the pass rule above.

In short, PF will load balance connections among the servers in the
<lb> table and keep the same server for the same client for up to 1800
seconds (30 minutes) after the last state is removed.

My problem starts here:

Everything mentioned above works perfectly. The issue starts when we
have to delete an IP from the load balancing table. For example, if the
10.0.0.2 server is down, I need to take it out of the table:

pfctl -t lb -T del 10.0.0.2

In this case, new connections will technically be balanced only between
10.0.0.1 and 10.0.0.3, which are the only addresses that still exist in
the <lb> table. But the problem is that, even though the just deleted
10.0.0.2 server is no longer in <lb>, the client entries which were
already in "Source" and pointed to 10.0.0.2 will still be there.
Redirections to 10.0.0.2 will therefore continue to happen until
src.track expires (30 minutes in this setup), or until I run
"pfctl -F Source". But if I take the second approach, I flush the
source-tracking entries for every client and every server on my
firewall, not only the ones that point to 10.0.0.2.
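
The behaviour described above can be illustrated with a toy model. This
is only a sketch in Python, not PF's code: the class StickyBalancer, its
method names and the client addresses (taken from the 198.51.100.0/24
documentation range) are all assumptions made for illustration. It shows
that a sticky lookup which never consults the table keeps returning a
deleted server until the source entry times out.

```python
class StickyBalancer:
    """Toy model of round-robin plus sticky-address (hypothetical, not PF code)."""

    def __init__(self, backends, src_track_timeout=1800):
        self.table = list(backends)        # models the <lb> table
        self.timeout = src_track_timeout   # models "src.track 1800"
        self.sources = {}                  # client -> (backend, expiry or None)
        self._next = 0                     # round-robin cursor

    def _pick(self):
        backend = self.table[self._next % len(self.table)]
        self._next += 1
        return backend

    def redirect(self, client, now):
        entry = self.sources.get(client)
        if entry is not None:
            backend, expires = entry
            # expires is None while at least one state is still alive
            if expires is None or now < expires:
                # Sticky hit: note that self.table is never consulted here.
                self.sources[client] = (backend, None)
                return backend
        backend = self._pick()
        self.sources[client] = (backend, None)
        return backend

    def last_state_closed(self, client, now):
        # The source entry lingers for `timeout` seconds after the last state.
        backend, _ = self.sources[client]
        self.sources[client] = (backend, now + self.timeout)

    def table_delete(self, backend):
        # Models "pfctl -t lb -T del": only the table changes.
        self.table.remove(backend)


lb = StickyBalancer(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
lb.redirect("198.51.100.7", now=0)            # first client
lb.redirect("198.51.100.8", now=0)            # second client, lands on 10.0.0.2
lb.last_state_closed("198.51.100.8", now=10)
lb.table_delete("10.0.0.2")
stale = lb.redirect("198.51.100.8", now=600)  # still 10.0.0.2
```

In this model the second client is only moved to a live server once its
entry has been idle for the full 1800 seconds.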

Possible solutions I see:

The only solution I found was to change the PF source code, where we
could:

1) Create something similar to "pfctl -k", which is used for states,
but for "Source" entries.

In this case, to delete a server, we would run:

pfctl -t lb -T del 10.0.0.2
pfctl -new -flag 10.0.0.2

2) Make sticky-address verify that the IP address is still among the
load balancing targets (in this case, that it is still in the <lb>
table). This second approach might suffer from performance issues,
since we would be adding a new check before sticky-address handles
each request.
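
The two ideas can be sketched on a simplified data model. This is a
rough illustration in Python, not a patch for PF: the function names
kill_sources_for and sticky_lookup, the dict that stands in for the
"Source" table, and the client addresses are all hypothetical.

```python
def kill_sources_for(sources, backend):
    """Idea 1: drop only the source entries that point at `backend`."""
    stale = [client for client, target in sources.items() if target == backend]
    for client in stale:
        del sources[client]
    return len(stale)


def sticky_lookup(sources, table, client):
    """Idea 2: honour a sticky entry only if its backend is still in the table."""
    target = sources.get(client)
    if target is not None and target not in table:
        # Backend was removed from the table: forget the entry so the
        # caller falls back to a normal round-robin pick.
        del sources[client]
        return None
    return target


table = {"10.0.0.1", "10.0.0.3"}   # <lb> after 10.0.0.2 was deleted
sources = {
    "198.51.100.7": "10.0.0.1",
    "198.51.100.8": "10.0.0.2",
    "198.51.100.9": "10.0.0.2",
}

kept = sticky_lookup(sources, table, "198.51.100.7")     # entry is honoured
dropped = sticky_lookup(sources, table, "198.51.100.8")  # entry is discarded
killed = kill_sources_for(sources, "10.0.0.2")           # removes the remaining one
```

Idea 1 does its work once, when the administrator removes the server.
Idea 2 adds one membership test to every sticky lookup, which is the
extra per-request cost mentioned above.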

Does anyone have a better option?
Does any hacker have time available to do this?

Thanks a lot.

--
Diego Linke
Public Key: http://www.gamk.com.br/gamk.asc