HFSC (7) & (8) documentation + assorted changes

This patch adds detailed documentation for HFSC scheduler. It roughly
follows HFSC paper, but tries to not rely too much on math side of things.
Post-paper/Linux specific subjects (timer resolution, ul service curve, etc.)
are also discussed.

I've read it many times over, but it's a lengthy chunk of text - so try
to be understanding in case I made some mistakes.

tc-hfsc(7): explains algorithm in detail (very long)
tc-hfsc(8): explains command line options briefly
tc(8): adds references to new man pages
Makefile: adds man7 directory to install target
q_hfsc.c: minimal help text changes, consistency with tc-hfsc(8)

Signed-off-by: Mike Frysinger <vapier@gentoo.org>
This commit is contained in:
Michal Soltys 2011-10-26 04:15:21 -04:00 committed by Stephen Hemminger
parent aa48b5931a
commit 41f6004139
6 changed files with 752 additions and 1 deletions

View File

@ -60,6 +60,8 @@ install: all
install -m 0644 $(shell find etc/iproute2 -maxdepth 1 -type f) $(DESTDIR)$(CONFDIR)
install -m 0755 -d $(DESTDIR)$(MANDIR)/man8
install -m 0644 $(shell find man/man8 -maxdepth 1 -type f) $(DESTDIR)$(MANDIR)/man8
install -m 0755 -d $(DESTDIR)$(MANDIR)/man7
install -m 0644 $(shell find man/man7 -maxdepth 1 -type f) $(DESTDIR)$(MANDIR)/man7
ln -sf tc-bfifo.8 $(DESTDIR)$(MANDIR)/man8/tc-pfifo.8
ln -sf lnstat.8 $(DESTDIR)$(MANDIR)/man8/rtstat.8
ln -sf lnstat.8 $(DESTDIR)$(MANDIR)/man8/ctstat.8

525
man/man7/tc-hfsc.7 Normal file
View File

@ -0,0 +1,525 @@
.TH HFSC 7 "25 February 2009" iproute2 Linux
.ce 1
\fBHIERARCHICAL FAIR SERVICE CURVE\fR
.
.SH "HISTORY & INTRODUCTION"
.
HFSC \- \fBHierarchical Fair Service Curve\fR was first presented at
SIGCOMM'97. Developed as a part of ALTQ (ALTernative Queuing) on NetBSD, found
its way quickly to other BSD systems, and then a few years ago became part of
the linux kernel. Still, it's not the most popular scheduling algorithm \-
especially if compared to HTB \- and it's not well documented from enduser's
perspective. This introduction aims to explain how HFSC works without
going to deep into math side of things (although some if it will be
inevitable).
In short HFSC aims to:
.
.RS 4
.IP \fB1)\fR 4
guarantee precise bandwidth and delay allocation for all leaf classes (realtime
criterion)
.IP \fB2)\fR
allocate excess bandwidth fairly as specified by class hierarchy (linkshare &
upperlimit criterion)
.IP \fB3)\fR
minimize any discrepancy between the service curve and the actual amount of
service provided during linksharing
.RE
.PP
.
The main "selling" point of HFSC is feature \fB(1)\fR, which is achieved by
using nonlinear service curves (more about what it actually is later). This is
particularly useful in VoIP or games, where not only guarantee of consistent
bandwidth is important, but initial delay of a data stream as well. Note that
it matters only for leaf classes (where the actual queues are) \- thus class
hierarchy is ignored in realtime case.
Feature \fB(2)\fR is well, obvious \- any algorithm featuring class hierarchy
(such as HTB or CBQ) strives to achieve that. HFSC does that well, although
you might end with unusual situations, if you define service curves carelessly
\- see section CORNER CASES for examples.
Feature \fB(3)\fR is mentioned due to the nature of the problem. There may be
situations where it's either not possible to guarantee service of all curves at
the same time, and/or it's impossible to do so fairly. Both will be explained
later. Note that this is mainly related to interior (aka aggregate) classes, as
the leafs are already handled by \fB(1)\fR. Still \- it's perfectly possible to
create a leaf class w/o realtime service, and in such case \- the caveats will
naturally extend to leaf classes as well.
.SH ABBREVIATIONS
For the remaining part of the document, we'll use following shortcuts:
.nf
.RS 4
RT \- realtime
LS \- linkshare
UL \- upperlimit
SC \- service curve
.fi
.
.SH "BASICS OF HFSC"
.
To understand how HFSC works, we must first introduce a service curve.
Overall, it's a nondecreasing function of some time unit, returning amount of
service (allowed or allocated amount of bandwidth) by some specific point in
time. The purpose of it should be subconsciously obvious \- if a class was
allowed to transfer not less than the amount specified by its service curve \-
then service curve is not violated.
Still \- we need more elaborate criterion than just the above (although in
most generic case it can be reduced to it). The criterion has to take two
things into account:
.
.RS 4
.IP \(bu 4
idling periods
.IP \(bu
ability to "look back", so if during current active period service curve is violated, maybe it
isn't if we count excess bandwidth received during earlier active period(s)
.RE
.PP
Let's define the criterion as follows:
.RS 4
.nf
.IP "\fB(1)\fR" 4
For each t1, there must exist t0 in set B, so S(t1\-t0)\~<=\~w(t0,t1)
.fi
.RE
.
.PP
Here 'w' denotes the amount of service received during some time period between t0
and t1. B is a set of all times, where a session becomes active after idling
period (further denoted as 'becoming backlogged'). For a clearer picture,
imagine two situations:
.
.RS 4
.IP \fBa)\fR 4
our session was active during two periods, with a small time gap between them
.IP \fBb)\fR
as in (a), but with a larger gap
.RE
.
.PP
Consider \fB(a)\fR \- if the service received during both periods meets
\fB(1)\fR, then all is good. But what if it doesn't do so during the 2nd
period ? If the amount of service received during the 1st period is bigger
than the service curve, then it might compensate for smaller service during
the 2nd period \fIand\fR the gap \- if the gap is small enough.
If the gap is larger \fB(b)\fR \- then it's less likely to happen (unless the
excess bandwidth allocated during the 1st part was really large). Still, the
larger the gap \- the less interesting is what happened in the past (e.g. 10
minutes ago) \- what matters is the current traffic that just started.
From HFSC's perspective, more interesting is answering the following question:
when should we start transferring packets, so a service curve of a class is not
violated. Or rephrasing it: How much X() amount of service should a session
receive by time t, so the service curve is not violated. Function X() defined
as below is the basic building block of HFSC, used in: eligible, deadline,
virtual\-time and fit\-time curves. Of course, X() is based on equation
\fB(1)\fR and is defined recursively:
.RS 4
.IP \(bu 4
At the 1st backlogged period beginning function X is initialized to generic
service curve assigned to a class
.IP \(bu
At any subsequent backlogged period, X() is:
.nf
\fBmin(X() from previous period ; w(t0)+S(t\-t0) for t>=t0),\fR
.fi
\&... where t0 denotes the beginning of the current backlogged period.
.RE
.
.PP
HFSC uses either linear, or two\-piece linear service curves. In case of
linear or two\-piece linear convex functions (first slope < second slope),
min() in X's definition reduces to the 2nd argument. But in case of two\-piece
concave functions, the 1st argument might quickly become lesser for some
t>=t0. Note, that for some backlogged period, X() is defined only from that
period's beginning. We also define X^(\-1)(w) as smallest t>=t0, for which
X(t)\~=\~w. We have to define it this way, as X() is usually not an injection.
The above generic X() can be one of the following:
.
.RS 4
.IP "E()" 4
In realtime criterion, selects packets eligible for sending. If none are
eligible, HFSC will use linkshare criterion. Eligible time \&'et' is calculated
with reference to packets' heads ( et\~=\~E^(\-1)(w) ). It's based on RT
service curve, \fIbut in case of a convex curve, uses its 2nd slope only.\fR
.IP "D()"
In realtime criterion, selects the most suitable packet from the ones chosen
by E(). Deadline time \&'dt' corresponds to packets' tails
(dt\~=\~D^(\-1)(w+l), where \&'l' is packet's length). Based on RT service
curve.
.IP "V()"
In linkshare criterion, arbitrates which packet to send next. Note that V() is
function of a virtual time \- see \fBLINKSHARE CRITERION\fR section for
details. Virtual time \&'vt' corresponds to packets' heads
(vt\~=\~V^(\-1)(w)). Based on LS service curve.
.IP "F()"
An extension to linkshare criterion, used to limit at which speed linkshare
criterion is allowed to dequeue. Fit\-time 'ft' corresponds to packets' heads
as well (ft\~=\~F^(\-1)(w)). Based on UL service curve.
.RE
Be sure to make clean distinction between session's RT, LS and UL service
curves and the above "utility" functions.
.
.SH "REALTIME CRITERION"
.
RT criterion \fIignores class hierarchy\fR and guarantees precise bandwidth and
delay allocation. We say that packet is eligible for sending, when current real
time is bigger than eligible time. From all packets eligible, the one most
suited for sending, is the one with the smallest deadline time. Sounds simply,
but consider following example:
Interface 10mbit, two classes, both with two\-piece linear service curves:
.RS 4
.IP \(bu 4
1st class \- 2mbit for 100ms, then 7mbit (convex \- 1st slope < 2nd slope)
.IP \(bu
2nd class \- 7mbit for 100ms, then 2mbit (concave \- 1st slope > 2nd slope)
.RE
.PP
Assume for a moment, that we only use D() for both finding eligible packets,
and choosing the most fitting one, thus eligible time would be computed as
D^(\-1)(w) and deadline time would be computed as D^(\-1)(w+l). If the 2nd
class starts sending packets 1 second after the 1st class, it's of course
impossible to guarantee 14mbit, as the interface capability is only 10mbit.
The only workaround in this scenario is to allow the 1st class to send the
packets earlier that would normally be allowed. That's where separate E() comes
to help. Putting all the math aside (see HFSC paper for details), E() for RT
concave service curve is just like D(), but for the RT convex service curve \-
it's constructed using \fIonly\fR RT service curve's 2nd slope (in our example
\- 7mbit).
The effect of such E() \- packets will be sent earlier, and at the same time
D() \fIwill\fR be updated \- so current deadline time calculated from it will
be bigger. Thus, when the 2nd class starts sending packets later, both the 1st
and the 2nd class will be eligible, but the 2nd session's deadline time will be
smaller and its packets will be sent first. When the 1st class becomes idle at
some later point, the 2nd class will be able to "buffer" up again for later
active period of the 1st class.
A short remark \- in a situation, where the total amount of bandwidth
available on the interface is bigger than the allocated total realtime parts
(imagine interface 10 mbit, but 1mbit/2mbit and 2mbit/1mbit classes), the sole
speed of the interface could suffice to guarantee the times.
Important part of RT criterion is that apart from updating its D() and E(),
also V() used by LS criterion is updated. Generally the RT criterion is
secondary to LS one, and used \fIonly\fR if there's a risk of violating precise
realtime requirements. Still, the "participation" in bandwidth distributed by
LS criterion is there, so V() has to be updated along the way. LS criterion can
than properly compensate for non\-ideal fair sharing situation, caused by RT
scheduling. If you use UL service curve its F() will be updated as well (UL
service curve is an extension to LS one \- see \fBUPPERLIMIT CRITERION\fR
section).
Anyway \- careless specification of LS and RT service curves can lead to
potentially undesired situations (see CORNER CASES for examples). This wasn't
the case in HFSC paper where LS and RT service curves couldn't be specified
separately.
.SH "LINKSHARING CRITERION"
.
LS criterion's task is to distribute bandwidth according to specified class
hierarchy. Contrary to RT criterion, there're no comparisons between current
real time and virtual time \- the decision is based solely on direct comparison
of virtual times of all active subclasses \- the one with the smallest vt wins
and gets scheduled. One immediate conclusion from this fact is that absolute
values don't matter \- only ratios between them (so for example, two children
classes with simple linear 1mbit service curves will get the same treatment
from LS criterion's perspective, as if they were 5mbit). The other conclusion
is, that in perfectly fluid system with linear curves, all virtual times across
whole class hierarchy would be equal.
Why is VC defined in term of virtual time (and what is it) ?
Imagine an example: class A with two children \- A1 and A2, both with let's say
10mbit SCs. If A2 is idle, A1 receives all the bandwidth of A (and update its
V() in the process). When A2 becomes active, A1's virtual time is already
\fIfar\fR bigger than A2's one. Considering the type of decision made by LS
criterion, A1 would become idle for a lot of time. We can workaround this
situation by adjusting virtual time of the class becoming active \- we do that
by getting such time "up to date". HFSC uses a mean of the smallest and the
biggest virtual time of currently active children fit for sending. As it's not
real time anymore (excluding trivial case of situation where all classes become
active at the same time, and never become idle), it's called virtual time.
Such approach has its price though. The problem is analogous to what was
presented in previous section and is caused by non\-linearity of service
curves:
.IP 1) 4
either it's impossible to guarantee both service curves and satisfy fairness
during certain time periods:
.RS 4
Recall the example from RT section, slightly modified (with 3mbit slopes
instead of 2mbit ones):
.IP \(bu 4
1st class \- 3mbit for 100ms, then 7mbit (convex \- 1st slope < 2nd slope)
.IP \(bu
2nd class \- 7mbit for 100ms, then 3mbit (concave \- 1st slope > 2nd slope)
.PP
They sum up nicely to 10mbit \- interface's capacity. But if we wanted to only
use LS for guarantees and fairness \- it simply won't work. In LS context,
only V() is used for making decision which class to schedule. If the 2nd class
becomes active when the 1st one is in its second slope, the fairness will be
preserved \- ratio will be 1:1 (7mbit:7mbit), but LS itself is of course
unable to guarantee the absolute values themselves \- as it would have to go
beyond of what the interface is capable of.
.RE
.IP 2) 4
and/or it's impossible to guarantee service curves of all classes at all
.RS 4
Even if we didn't use virtual time and allowed a session to be "punished",
there's a possibility that service curves of all classes couldn't be
guaranteed for a brief period. Consider following, a bit more complicated
example:
Root interface, classes A and B with concave and convex curve (summing up to
root), A1 & A2 (children of A), \fIboth\fR with concave curves summing up to A,
B1 & B2 (children of B), \fIboth\fR with convex curves summing up to B.
Assume that A2, B1 and B2 are constantly backlogged, and at some later point
A1 becomes backlogged. We can easily choose slopes, so that even if we
"punish" A2 for earlier excess bandwidth received, A1 will have no chance of
getting bandwidth corresponding to its first slope. Following from the above
example:
.nf
A \- 7mbit, then 3mbit
A1 \- 5mbit, then 2mbit
A2 \- 2mbit, then 1mbit
B \- 3mbit, then 7mbit
B1 \- 2mbit, then 5mbit
B2 \- 1mbit, then 2mbit
.fi
At the point when A1 starts sending, it should get 5mbit to not violate its
service curve. A2 gets punished and doesn't send at all, B1 and B2 both keep
sending at their 5mbit and 2mbit. But as you can see, we already are beyond
interface's capacity \- at 12mbit. A1 could get 3mbit at most. If we used
virtual times and kept fairness property, A1 and A2 would send at 3mbit
together with 5:2 ratio (so respectively at ~2.14mbit and ~0.86mbit).
.RE
.
.SH "UPPERLIMIT CRITERION"
.
UL criterion is an extensions to LS one, that permits sending packets only
if current real time is bigger than fit\-time ('ft'). So the modified LS
criterion becomes: choose the smallest virtual time from all active children,
such that fit\-time < current real time also holds. Fit\-time is calculated
from F(), which is based on UL service curve. As you can see, it's role is
kinda similar to E() used in RT criterion. Also, for obvious reasons \- you
can't specify UL service curve without LS one.
Main purpose of UL service curve is to limit HFSC to bandwidth available on the
upstream router (think adsl home modem/router, and linux server as
nat/firewall/etc. with 100mbit+ connection to mentioned modem/router).
Typically, it's used to create a single class directly under root, setting
linear UL service curve to available bandwidth \- and then creating your class
structure from that class downwards. Of course, you're free to add UL service
(linear or not) curve to any class with LS criterion.
Important part about UL service curve is, that whenever at some point in time
a class doesn't qualify for linksharing due to its fit\-time, the next time it
does qualify, it will update its virtual time to the smallest virtual time of
all active children fit for linksharing. This way, one of the main things LS
criterion tries to achieve \- equality of all virtual times across whole
hierarchy \- is preserved (in perfectly fluid system with only linear curves,
all virtual times would be equal).
Without that, 'vt' would lag behind other virtual times, and could cause
problems. Consider interface with capacity 10mbit, and following leaf classes
(just in case you're skipping this text quickly \- this example shows behavior
that \f(BIdoesn't happen\fR):
.nf
A \- ls 5.0mbit
B \- ls 2.5mbit
C \- ls 2.5mbit, ul 2.5mbit
.fi
If B was idle, while A and C were constantly backlogged, they would normally
(as far as LS criterion is concerned) divide bandwidth in 2:1 ratio. But due
to UL service curve in place, C would get at most 2.5mbit, and A would get the
remaining 7.5mbit. The longer the backlogged period, the more virtual times of
A and C would drift apart. If B became backlogged at some later point in time,
its virtual time would be set to (A's\~vt\~+\~C's\~vt)/2, thus blocking A from
sending any traffic, until B's virtual time catches up with A.
.
.SH "SEPARATE LS / RT SCs"
.
Another difference from original HFSC paper, is that RT and LS SCs can be
specified separately. Moreover \- leaf classes are allowed to have only either
RT SC or LS SC. For interior classes, only LS SCs make sense \- Any RT SC will
be ignored.
.
.SH "CORNER CASES"
.
Separate service curves for LS and RT criteria can lead to certain traps,
that come from "fighting" between ideal linksharing and enforced realtime
guarantees. Those situations didn't exist in original HFSC paper, where
specifying separate LS / RT service curves was not discussed.
Consider interface with capacity 10mbit, with following leaf classes:
.nf
A \- ls 5.0mbit, rt 8mbit
B \- ls 2.5mbit
C \- ls 2.5mbit
.fi
Imagine A and C are constantly backlogged. As B is idle, A and C would divide
bandwidth in 2:1 ratio, considering LS service curve (so in theory \- 6.66 and
3.33). Alas RT criterion takes priority, so A will get 8mbit and LS will be
able to compensate class C for only 2 mbit \- this will cause discrepancy
between virtual times of A and C.
Assume this situation lasts for a lot of time with no idle periods, and
suddenly B becomes active. B's virtual time will be updated to
(A's\~vt\~+\~C's\~vt)/2, effectively landing in the middle between A's and C's
virtual time. The effect \- B, having no RT guarantees, will be punished and
will not be allowed to transfer until C's virtual time catches up.
If the interface had higher capacity \- for example 100mbit, this example
would behave perfectly fine though.
Let's look a bit closer at the above example \- it "cleverly" invalidates one
of the basic things LS criterion tries to achieve \- equality of all virtual
times across class hierarchy. Leaf classes without RT service curves are
literally left to their own fate (governed by messed up virtual times).
Also - it doesn't make much sense. Class A will always be guaranteed up to
8mbit, and this is more than any absolute bandwidth that could happen from its
LS criterion (excluding trivial case of only A being active). If the bandwidth
taken by A is smaller than absolute value from LS criterion, the unused part
will be automatically assigned to other active classes (as A has idling periods
in such case). The only "advantage" is, that even in case of low bandwidth on
average, bursts would be handled at the speed defined by RT criterion. Still,
if extra speed is needed (e.g. due to latency), non linear service curves
should be used in such case.
In the other words - LS criterion is meaningless in the above example.
You can quickly "workaround" it by making sure each leaf class has RT service
curve assigned (thus guaranteeing all of them will get some bandwidth), but it
doesn't make it any more valid.
.
.SH "LINUX AND TIMER RESOLUTION"
.
In certain situations, the scheduler can throttle itself and setup so
called watchdog to wakeup dequeue function at some time later. In case of HFSC
it happens when for example no packet is eligible for scheduling, and UL
service curve is used to limit the speed at which LS criterion is allowed to
dequeue packets. It's called throttling, and accuracy of it is dependent on
how the kernel is compiled.
There're 3 important options in modern kernels, as far as timers' resolution
goes: \&'tickless system', \&'high resolution timer support' and \&'timer
frequency'.
If you have \&'tickless system' enabled, then the timer interrupt will trigger
as slowly as possible, but each time a scheduler throttles itself (or any
other part of the kernel needs better accuracy), the rate will be increased as
needed / possible. The ceiling is either \&'timer frequency' if \&'high
resolution timer support' is not available or not compiled in. Otherwise it's
hardware dependent and can go \fIfar\fR beyond the highest \&'timer frequency'
setting available.
If \&'tickless system' is not enabled, the timer will trigger at a fixed rate
specified by \&'timer frequency' \- regardless if high resolution timers are
or aren't available.
This is important to keep those settings in mind, as in scenario like: no
tickless, no HR timers, frequency set to 100hz \- throttling accuracy would be
at 10ms. It doesn't automatically mean you would be limited to ~0.8mbit/s
(assuming packets at ~1KB) \- as long as your queues are prepared to cover for
timer inaccuracy. Of course, in case of e.g. locally generated udp traffic \-
appropriate socket size is needed as well. Short example to make it more
understandable (assume hardcore anti\-schedule settings \- HZ=100, no HR
timers, no tickless):
.nf
tc qdisc add dev eth0 root handle 1:0 hfsc default 1
tc class add dev eth0 parent 1:0 classid 1:1 hfsc rt m2 10mbit
.fi
Assuming packet of ~1KB size and HZ=100, that averages to ~0.8mbit \- anything
beyond it (e.g. the above example with specified rate over 10x bigger) will
require appropriate queuing and cause bursts every ~10 ms. As you can
imagine, any HFSC's RT guarantees will be seriously invalidated by that.
Aforementioned example is mainly important if you deal with old hardware \- as
it's particularly popular for home server chores. Even then, you can easily
set HZ=1000 and have very accurate scheduling for typical adsl speeds.
Anything modern (apic or even hpet msi based timers + \&'tickless system')
will provide enough accuracy for superb 1gbit scheduling. For example, on one
of basically cheap dual core AMD boards I have with following settings:
.nf
tc qdisc add dev eth0 parent root handle 1:0 hfsc default 1
tc class add dev eth0 paretn 1:0 classid 1:1 hfsc rt m2 300mbit
.fi
And simple:
.nf
nc \-u dst.host.com 54321 </dev/zero
nc \-l \-p 54321 >/dev/null
.fi
\&...will yield following effects over period of ~10 seconds (taken from
/proc/interrupts):
.nf
319: 42124229 0 HPET_MSI\-edge hpet2 (before)
319: 42436214 0 HPET_MSI\-edge hpet2 (after 10s.)
.fi
That's roughly 31000/s. Now compare it with HZ=1000 setting. The obvious
drawback of it is that cpu load can be rather extensive with servicing that
many timer interrupts. Example with 300mbit RT service curve on 1gbit link is
particularly ugly, as it requires a lot of throttling with minuscule delays.
Also note that it's just an example showing capability of current hardware.
The above example (essentially 300mbit TBF emulator) is pointless on internal
interface to begin with \- you will pretty much always want regular LS service
curve there, and in such scenario HFSC simply doesn't throttle at all.
300mbit RT service curve (selected columns from mpstat \-P ALL 1):
.nf
10:56:43 PM CPU %sys %irq %soft %idle
10:56:44 PM all 20.10 6.53 34.67 37.19
10:56:44 PM 0 35.00 0.00 63.00 0.00
10:56:44 PM 1 4.95 12.87 6.93 73.27
.fi
So, in rare case you need those speeds with only RT service curve, or with UL
service curve \- remember about drawbacks.
.
.SH "LAYER2 ADAPTATION"
.
Please refer to \fBtc\-stab\fR(8)
.
.SH "SEE ALSO"
.
\fBtc\fR(8), \fBtc\-hfsc\fR(8), \fBtc\-stab\fR(8)
Please direct bugreports and patches to: <net...@vger.kernel.org>
.
.SH "AUTHOR"
.
Manpage created by Michal Soltys (sol...@ziu.info)

61
man/man8/tc-hfsc.8 Normal file
View File

@ -0,0 +1,61 @@
.TH HFSC 8 "25 February 2009" iproute2 Linux
.
.SH NAME
HFSC \- Hierarchical Fair Service Curve's control under linux
.
.SH SYNOPSIS
.nf
tc qdisc add ... hfsc [ \fBdefault\fR CLASSID ]
tc class add ... hfsc [ [ \fBrt\fR SC ] [ \fBls\fR SC ] | [ \fBsc\fR SC ] ] [ \fBul\fR SC ]
\fBrt\fR : realtime service curve
\fBls\fR : linkshare service curve
\fBsc\fR : rt+ls service curve
\fBul\fR : upperlimit service curve
\(bu at least one of \fBrt\fR, \fBls\fR or \fBsc\fR must be specified
\(bu \fBul\fR can only be specified with \fBls\fR or \fBsc\fR
.
.IP "SC := [ [ \fBm1\fR BPS ] \fBd\fR SEC ] \fBm2\fR BPS"
\fBm1\fR : slope of the first segment
\fBd\fR : x\-coordinate of intersection
\fBm2\fR : slope of the second segment
.PP
.IP "SC := [ [ \fBumax\fR BYTE ] \fBdmax\fR SEC ] \fBrate\fR BPS"
\fBumax\fR : maximum unit of work
\fBdmax\fR : maximum delay
\fBrate\fR : rate
.PP
.fi
For description of BYTE, BPS and SEC \- please see \fBUNITS\fR
section of \fBtc\fR(8).
.
.SH DESCRIPTION (qdisc)
HFSC qdisc has only one optional parameter \- \fBdefault\fR. CLASSID specifies
the minor part of the default classid, where packets not classified by other
means (e.g. u32 filter, CLASSIFY target of iptables) will be enqueued. If
\fBdefault\fR is not specified, unclassified packets will be dropped.
.
.SH DESCRIPTION (class)
HFSC class is used to create a class hierarchy for HFSC scheduler. For
explanation of the algorithm, and the meaning behind \fBrt\fR, \fBls\fR,
\fBsc\fR and \fBul\fR service curves \- please refer to \fBtc\-hfsc\fR(7).
As you can see in \fBSYNOPSIS\fR, service curve (SC) can be specified in two
ways. Either as maximum delay for certain amount of work, or as a bandwidth
assigned for certain amount of time. Obviously, \fBm1\fR is simply
\fBumax\fR/\fBdmax\fR.
Both \fBm2\fR and \fBrate\fR are mandatory. If you omit other
parameters, you will specify linear service curve.
.
.SH "SEE ALSO"
.
\fBtc\fR(8), \fBtc\-hfsc\fR(7), \fBtc\-stab\fR(8)
Please direct bugreports and patches to: <net...@vger.kernel.org>
.
.SH "AUTHOR"
.
Manpage created by Michal Soltys (sol...@ziu.info)

156
man/man8/tc-stab.8 Normal file
View File

@ -0,0 +1,156 @@
.TH STAB 8 "25 February 2009" iproute2 Linux
.
.SH NAME
tc\-stab \- Generic size table manipulations
.
.SH SYNOPSIS
.nf
tc qdisc add ... stab \\
.RS 4
[ \fBmtu\fR BYTES ] [ \fBtsize\fR SLOTS ] \\
[ \fBmpu\fR BYTES ] [ \fBoverhead\fR BYTES ] [ \fBlinklayer\fR TYPE ] ...
.RE
TYPE := adsl | atm | ethernet
.fi
For the description of BYTES \- please refer to the \fBUNITS\fR
section of \fBtc\fR(8).
.IP \fBmtu\fR 4
.br
maximum packet size we create size table for, assumed 2048 if not specified explicitly
.IP \fBtsize\fR
.br
required table size, assumed 512 if not specified explicitly
.IP \fBmpu\fR
.br
minimum packet size used in computations
.IP \fBoverhead\fR
.br
per\-packet size overhead (can be negative) used in computations
.IP \fBlinklayer\fR
.br
required linklayer adaptation.
.PP
.
.SH DESCRIPTION
.
Size tables allow manipulation of packet size, as seen by whole scheduler
framework (of course, the actual packet size remains the same). Adjusted packet
size is calculated only once \- when a qdisc enqueues the packet. Initial root
enqueue initializes it to the real packet's size.
Each qdisc can use different size table, but the adjusted size is stored in
area shared by whole qdisc hierarchy attached to the interface (technically,
it's stored in skb). The effect is, that if you have such setup, the last qdisc
with a stab in a chain "wins". For example, consider HFSC with simple pfifo
attached to one of its leaf classes. If that pfifo qdisc has stab defined, it
will override lengths calculated during HFSC's enqueue, and in turn, whenever
HFSC tries to dequeue a packet, it will use potentially invalid size in its
calculations. Normal setups will usually include stab defined only on root
qdisc, but further overriding gives extra flexibility for less usual setups.
Initial size table is calculated by \fBtc\fR tool using \fBmtu\fR and
\fBtsize\fR parameters. The algorithm sets each slot's size to the smallest
power of 2 value, so the whole \fBmtu\fR is covered by the size table. Neither
\fBtsize\fR, nor \fBmtu\fR have to be power of 2 value, so the size
table will usually support more than is required by \fBmtu\fR.
For example, with \fBmtu\fR\~=\~1500 and \fBtsize\fR\~=\~128, a table with 128
slots will be created, where slot 0 will correspond to sizes 0\-16, slot 1 to
17\~\-\~32, \&..., slot 127 to 2033\~\-\~2048. Note, that the sizes
are shifted 1 byte (normally you would expect 0\~\-\~15, 16\~\-\~31, \&...,
2032\~\-\~2047). Sizes assigned to each slot depend on \fBlinklayer\fR parameter.
Stab calculation is also safe for an unusual case, when a size assigned to a
slot would be larger than 2^16\-1 (you will lose the accuracy though).
During kernel part of packet size adjustment, \fBoverhead\fR will be added to
original size, and after subtracting 1 (to land in the proper slot \- see above
about shifting by 1 byte) slot will be calculated. If the size would cause
overflow, more than 1 slot will be used to get the final size. It of course will
affect accuracy, but it's only a guard against unusual situations.
Currently there're two methods of creating values stored in the size table \-
ethernet and atm (adsl):
.IP ethernet 4
.br
This is basically 1\-1 mapping, so following our example from above
(disregarding \fBmpu\fR for a moment) slot 0 would have 8, slot 1 would have 16
and so on, up to slot 127 with 2048. Note, that \fBmpu\fR\~>\~0 must be
specified, and slots that would get less than specified by \fBmpu\fR, will get
\fBmpu\fR instead. If you don't specify \fBmpu\fR, the size table will not be
created at all, although any \fBoverhead\fR value will be respected during
calculations.
.IP "atm, adsl"
.br
ATM linklayer consists of 53 byte cells, where each of them provides 48 bytes
for payload. Also all the cells must be fully utilized, thus the last one is
padded if/as necessary.
When size table is calculated, adjusted size that fits properly into lowest
amount of cells is assigned to a slot. For example, a 100 byte long packet
requires three 48\-byte payloads, so the final size would require 3 ATM cells
\- 159 bytes.
For ATM size tables, 16\~bytes sized slots are perfectly enough. The default
values of \fBmtu\fR and \fBtsize\fR create 4\~bytes sized slots.
.PP
.
.SH "TYPICAL OVERHEADS"
The following values are typical for different adsl scenarios (based on
\fB[1]\fR and \fB[2]\fR):
.nf
LLC based:
.RS 4
PPPoA \- 14 (PPP \- 2, ATM \- 12)
PPPoE \- 40+ (PPPoE \- 8, ATM \- 18, ethernet 14, possibly FCS \- 4+padding)
Bridged \- 32 (ATM \- 18, ethernet 14, possibly FCS \- 4+padding)
IPoA \- 16 (ATM \- 16)
.RE
VC Mux based:
.RS 4
PPPoA \- 10 (PPP \- 2, ATM \- 8)
PPPoE \- 32+ (PPPoE \- 8, ATM \- 10, ethernet 14, possibly FCS \- 4+padding)
Bridged \- 24+ (ATM \- 10, ethernet 14, possibly FCS \- 4+padding)
IPoA \- 8 (ATM \- 8)
.RE
.fi
\p There're few important things regarding the above overheads:
.
.IP \(bu 4
IPoA in LLC case requires SNAP, instead of LLC\-NLPID (see rfc2684) \- this is
the reason, why it actually takes more space than PPPoA.
.IP \(bu
In rare cases, FCS might be preserved on protocols that include ethernet frame
(Bridged and PPPoE). In such situation, any ethernet specific padding
guaranteeing 64 bytes long frame size has to be included as well (see rfc2684).
In the other words, it also guarantees that any packet you send will take
minimum 2 atm cells. You should set \fBmpu\fR accordingly for that.
.IP \(bu
When size table is consulted, and you're shaping traffic for the sake of
another modem/router, ethernet header (without padding) will already be added
to initial packet's length. You should compensate for that by subtracting 14
from the above overheads in such case. If you're shaping directly on the router
(for example, with speedtouch usb modem) using ppp daemon, layer2 header will
not be added yet.
For more thorough explanations, please see \fB[1]\fR and \fB[2]\fR.
.
.SH "SEE ALSO"
.
\fBtc\fR(8), \fBtc\-hfsc\fR(7), \fBtc\-hfsc\fR(8),
.br
\fB[1]\fR http://ace\-host.stuart.id.au/russell/files/tc/tc\-atm/
.br
\fB[2]\fR http://www.faqs.org/rfcs/rfc2684.html
Please direct bugreports and patches to: <net...@vger.kernel.org>
.
.SH "AUTHOR"
.
Manpage created by Michal Soltys (sol...@ziu.info)

View File

@ -370,12 +370,15 @@ was written by Alexey N. Kuznetsov and added in Linux 2.2.
.BR tc-choke (8),
.BR tc-drr (8),
.BR tc-htb (8),
.BR tc-hfsc (8),
.BR tc-hfsc (7),
.BR tc-sfq (8),
.BR tc-red (8),
.BR tc-tbf (8),
.BR tc-pfifo (8),
.BR tc-bfifo (8),
.BR tc-pfifo_fast (8),
.BR tc-stab (8),
.br
.RB "User documentation at " http://lartc.org/ ", but please direct bugreports and patches to: " <netdev@vger.kernel.org>

View File

@ -43,7 +43,7 @@ explain_class(void)
fprintf(stderr,
"Usage: ... hfsc [ [ rt SC ] [ ls SC ] | [ sc SC ] ] [ ul SC ]\n"
"\n"
"SC := [ [ m1 BPS ] [ d SEC ] m2 BPS\n"
"SC := [ [ m1 BPS ] d SEC ] m2 BPS\n"
"\n"
" m1 : slope of first segment\n"
" d : x-coordinate of intersection\n"
@ -57,6 +57,10 @@ explain_class(void)
" dmax : maximum delay\n"
" rate : rate\n"
"\n"
"Remarks:\n"
" - at least one of 'rt', 'ls' or 'sc' must be specified\n"
" - 'ul' can only be specified with 'ls' or 'sc'\n"
"\n"
);
}