libqb/lib/qblog_script.ld.in
Jan Pokorný 20246f544f
High: bare fix for libqb logging not working with ld.bfd/binutils 2.29+
(or rather [read on]: "bare" fix, now that we established means to
analyse the impact of the linker-dependent misbehaviour and to detect
some of its symptoms in preceding two commits, respectively)

Initially with the help of the internal test suite and the failing log
test, it was eventually discovered[1] that these binutils commits going
to the recent 2.29 release affected the treatment of _start_SECNAME
and __stop_SECNAME symbols denoting the boundary start/stop addresses
of a SECNAME orphan section -- specifically in libqb context a custom
section (SECNAME=__verbose) used for link-time ("run-time amortizing")
callsite collection when there's a support in the toolchain[*]:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=cbd0eecf261c2447781f8c89b0d955ee66fae7e9
https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=b27685f2016c510d03ac9a64f7b04ce8efcf95c4
https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=7dba9362c172f1073487536eb137feb2da30b0ff

The first one explicitly states:
> Also __start_SECNAME and __stop_SECNAME symbols are marked as hidden
> by ELF linker so that __start_SECNAME and __stop_SECNAME symbols for
> section SECNAME in different modules are unique.

The problem is that libqb silently depends on the previous status quo
ld.bfd linker behaviour of keeping those symbols externally visible,
which was apparently not granted as it has deliberately changed per
above.

And then for 2.29.1 release of binutils once again, as someone actually
noticed something went overboard with the 2.29 changes:

http://lists.gnu.org/archive/html/bug-binutils/2017-08/msg00195.html
(overview of the original bug discussion, rather than directly
https://sourceware.org/bugzilla/show_bug.cgi?id=21964, which is
a result of a conflct resolution when restoring bugzilla backup)
https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=487b6440dad57440939fab7afdd84a218b612796

At least that change doesn't invalidate all the effort being put into
the original version of the changeset, only the configure script check
had to be refined so as not to miss the "orphan section magic not
working properly out of the box, without band aid" observation
(see the inline comment) -- the workaround arrangement needs
to be applied in that case as well.

* * *

So regarding the solution itself, the core of the fix was sketched at
the original Fedora targeted bug against binutils[2].  In short, we are
using a custom linker script that (re)describes the mentioned custom
orphan output section, or better yet, assuredly pushes that section, and
more importantly, it's own boundary denoting symbols, through into the
resulting executable when it's being linked (as in compile-time step).

This solution alone, while working for the non-libqb (more on that
below) logging participants, is not good enough, as it requires all
libqb targets to start using new incantation (namely "-Wl,foo.t" switch)
in the final link step during compilation, which might be solvable
with a tweak in libqb's pkg-config file under assumption that practice
of using "pkg-config --libs libqb" is rigidly followed.  Which is likely
a false expectation, and furthermore only for the regular consumption
model, as it doesn't cover the least bit the developmental one (refer
to previous-but-one "tests" commit message), e.g. applied for internal
examples + tests (but no local sub-checkout tree usage can be excluded).

So further extensions were devised to cover both consumption models:

- a. regular:
  courtesy of binutils maintainer[3], we follow an idea to make libqb.so
  (i.e. what the targets link against) rather a linker script on its
  own, which first include the version-specified (e.g.  libqb.so.0) file
  into the link, then lists, in situ, the content of the linker script
  per above, hence -lqb linking has the same effect as having both
  "-lqb -Wl,foo.t" explicitly in the link command prior to this trick

- b. developmental:
  to eliminate any kind of race condition arising from the attempt
  to post-modify libqb.la libtool archive file generated internally
  by libtool, we sort of abuse "inherited_linker_flags" variable
  within this file format, as it forms an accumulative value across
  the whole transitive dependencies chain (if not impaired per the
  note below), fitting exactly our purpose of injecting "-Wl,foo.t"
  switch equivalent for those libtool-linking by L{D,IB}ADD'ing
  libqb.la; it's then enough to craft a custom libtool archive file
  declaring that value, and hook it into such dependency chain through
  libqb_la_LIBADD, and with a little bit of further fiddling, it works
  as desired (note that double occurrence of "-Wl,foo.t" equivalent
  present at some stages of sorting this trick turned out to be,
  surprisingly, counter-productive, which should now demistify the
  very existence of effectively empty qblog_script_noop.ld file);

  NOTE: some forms of libtool distribution (debian + derivatives ones
  in particular) undermine natural transitive dependency propagation
  with a deliberate cut off (https://bugs.debian.org/702737), so we
  need to ensure the "impairment" is not happening by force (corosync
  precedent: https://github.com/corosync/corosync/commit/0f1dc5c1)
  ^ something like this needs to be applied for any such "private
    consumer" (although it hopefully goes without saying this way
    of consuming libqb outside of it's own playground is hardly
    the Right Thing) if portability is important, nonetheless!

* * *

On the address of linker script workaround, there are linkers out there
that do not support the trick, for instance:
- ld.gold from binutils (but it has hardly ever been working with
  orphan sections, anyway:
  https://sourceware.org/bugzilla/show_bug.cgi?id=22291)
- ancient versions of ld.bfd, e.g. 10+ years old one used as a native
  system linker even in the most recent releases of FreeBSD, unless
  GCC toolchain is used instead
If these are hit when (because) the compiler has already demonstrated it
supports "section" attribute, the build system configuration is forcibly
stopped, simply to stay conceptually compatible with the prior state in
which the affinity to leverage that feature hasn't been called off
under any circumstances.  One is, however, able to achieve exactly
this behaviour with --enable-nosection-fallback switch, but if some
other participants in the logging, possibly linked with a more friendly
linker, do utilize this orphan section, logging may silently break
(another reason to require an explicit sign-off).

Another note, the particular self-check change slightly touched in the
previous commit but otherwise predating this whole effort by far needs
to be modified now once again, this time because linker-script-based
workaround for newer linkers as stated causes the section boundary
symbols to be present regardless if that section is utilized, leading
to a self-inflicted breakage due to these empty section symbols suddenly
winning in the symbol resolution mechanism (previously the empty section
would be dropped incl. the boundary symbols), causing problems down the
line.  It also makes this very check self-contained in the same
compilation unit that trigggers it, whereas previously it used to be the
said "arbitrary" winner and things kept silently working just because
failure condition -- empty section -- would be implicitly isolated.

Last but not least, libqb itself needs to be linked with the mentioned
"-Wl,foo.t" equivalent for its own outgoing log messages to be honoured
under all circumstances, which is already achieved with the arrangement
for b. above, and by experiments, further redefinition of those boundary
denoting symbols as weak was necessary so as to make them truly global
within libqb.so proper (at least with binutils 2.29).

* * *

To provide a high-level prioritized overview of what drove the approach:

- PRESERVATION OF BINARY COMPATIBILITY (ABI), which is achieved except
  for a single "ABI nongracefulness" I am aware of but that's more
  a consequence of slightly incorrect assumptions in the logic of
  QB_LOG_INIT_DATA macro function predating this whole affair by
  a long shot and which the patchset finally rectifies:

  if in the run-time dynamic link, following is combined:
  (. libqb, arbitrary variant: pre-/post-fix, binutils < / >= 2.29)
  . an "intermediate" library (something that the end executable links
    with) triggering QB_LOG_INIT_DATA macro and being built with
    pre-fix libqb (and perhaps only with binutils < 2.29)
  . end executable using no libqb's logging at all, but being built
    with post-fix libqb (and arbitrary binutils < / >= 2.29)
  then, unlike when executable is built with pre-fix libqb, the
  special callsite data containing section in the ELF structure
  of the executable is created + its boundary denoting symbols
  defined within, despite the section being empty (did not happen
  with pre-fix libqb), and because the symbols defined within the
  target program have priority over that of shared libraries in the
  symbol resolution fallback scheme, the assertion of QB_LOG_INIT_DATA
  of the mentioned intermediate library will actually be evaluating
  the inequality of boundaries for the section of the executable(!)
  rather than it's own (or whatever higher prio symbols are hit,
  presumably only present if the section at that level is non-empty,
  basically a generalization of the story so far);

  the problem then manifests as unability to run said executable
  as it will fail because of the intermediate library inflicted
  assertion (sadly with very unhelpful "Assertion `0' failed"
  message);

  fortunately, there's enough flexibility so as how to fix
  this, either should be fine:
  . have everything in the executable's library dependency closure
    that links against libqb assurably (compile-time) linked with one
    variant of libqb only (either all pre-fix or post-fix, mind the
    apparent limitation of binutils' versions with the former)
  . have the end executable (that does not use logging at all as
    discussed precondition) linked using substitution like this:
    s/-lqb/-l:libqb.so.0/  (you may need to adapt the number later)
    and you may also need to add this CPPFLAG for the executable:
    -DQB_KILL_ATTRIBUTE_SECTION

- as high level of isolation of the client space from the linker
  (respectively toolchain) subtleties as possible (no new compilation
  flags and such required, plus there's no way to hook any dynamic
  computational ad-hoc decision when the compilation is about to
  happen, anyway), and in turn, versatility is preserved as much as
  possible

* * *

Finally, let's have a look how the already well-known test matrix
overview changes as of this commit, but first as a recap,
"X(Y)" denotes "X linked with linker Y":
  X(a) .. ld.bfd < 2.29
  X(b) .. ld.bfd = 2.29 (+ 2.29.1 and hopefully on)

and here you are (values in <angle brackets> denote non-trivial change
[not mere rewording] introduced as of this commit, in comparison to the
table stated in the preceding commit):

+=========+=========+=========+=========+=========+=========+=========+
#client(x)#        libqb(a) usage       #        libqb(b) usage       #
#   vvv   #---------+---------+---------+---------+---------+---------+
#    V    #  direct | libX(a) : libX(b) #  direct | libX(a) : libX(b) #
+=========+=========+=========+=========+=========+=========+=========+
#  x = a  #   OK    |   OK    :  <OK>   #  <OK>   |  <OK>   :  <OK>   #
#  x = b  #  <OK>   |  <OK>   :  <OK>   #  <OK>   |  <OK>   :  <OK>   #
+=========+=========+=========+=========+=========+=========+=========+

Everything is green \o/

* * *

Note: as of this fix, it is assumed that the non-green counterpart of
this table in the message for the preceding commit (loosely though[!],
as the occurrence of empty callsite section can no longer be attributed
to something bad going on as of this fix that enforces its presence
unconditionally, whereas it would be suppressed when unused before
with kind linkers, hence some other conditions can be witnessed
especially when QB_LOG_INIT_DATA misused in no-logging context)
doubles as an indicator how will mixing the logging participants wrt.
linker+libqb version work out, when "X(Y)" becomes read as "X linked
with linker Y under additional restriction on libqb version when
compile-time link is performed of the particular part":
  X(a) .. ld.bfd < 2.29 OR [arbitrary ld.bfd AND libqb after this fix)
  X(b) .. ld.bfd = 2.29 (and likely on) AND libqb up to, but excluding
          this fix

* * *

Let's also state some imperfections and loops kept open:

Deficiencies:
* whenever anything is compiled against our install-time-modified
  libqb.so so as to force the visibility of the discussed symbols
  (or when compiling [with] libqb internally):
> /usr/bin/ld: warning: ../lib/qblog_script.ld contains output sections; did you forget -T?
  - not solvable as long as we use the linker script, and there's
    hardly any other way not requiring the libqb consumers to adapt
    in any aspect
* as already mentioned, lacking compatibility with ld.gold linker and
  won't foreseeably be (cf. https://bugzilla.redhat.com/1500898#c7)
  - please stick with ld.bfd (i.e. default ld linker), which you
    had to do in the past anyway (at least for compiling libqb
    itself)

Open questions:
* should we enable attribute((__section__)) for powerpc and other minor
  platforms if the feature is proved to be working there as well?
  and if/when that's going to happen, we need to figure out the
  transition plan to be spread throughout an extended period to keep
  the transition smooth -- notably when now-with-callsite-section
  clients will get run-time linked with callsite-section-not-a-default
  libqb (say upon it's downgrade), and for that, the libqb's support
  alone should be enabled year(s) ahead of the actual client space...

* * *

[*] basically GCC's section("SECNAME") __attribute__ annotation of the
global variables + linker described behaviour previously mistakenly
taken for granted

References:
[1] http://oss.clusterlabs.org/pipermail/developers/2017-July/000503.html
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1477354#c2 + comment 8
[3] https://bugzilla.redhat.com/show_bug.cgi?id=1477354#c9

Signed-off-by: Jan Pokorný <jpokorny@redhat.com>
2017-12-12 23:23:35 +01:00

31 lines
1.4 KiB
Plaintext

#include <qb/qblog.h>
/* GNU ld script
This atypical arrangement enforces global visibility of boundary symbols
(QB_ATTR_SECTION_START, QB_ATTR_SECTION_STOP) for the custom section
QB_ATTR_SECTION used for compile-time offloading of the logging call sites
tracking. While libqb relies on these being global, default linker from
binutils change the visibility as of version 2.29, making the logging
unusable without artificial stimulus: https://bugzilla.redhat.com/1477354
NOTE: If -lqb not usable for linking (e.g. linker not compatible with
linker scripts ad-hoc modifying output sections), try recent
ld.bfd/binutils linker first when available, otherwise you can
try "-l:libqb.so.<digit>" link switch that bypasses said linker
script -- but beware, logging may be less efficient and may lack
possible future optimizations and extra features. Consequently,
logging issues (typically bound to QB_LOG_INIT_DATA macro) can be
mitigated with QB_KILL_ATTRIBUTE_SECTION macro defined for a build.
NOTE: When concerned about a warning coming from the build process like
warning: [...]libqb.so contains output sections; did you forget -T?
while it finishes OK, consider it merely a harmless side-effect.
*/
SECTIONS {
#ifdef QB_HAVE_ATTRIBUTE_SECTION
QB_ATTR_SECTION : {
QB_ATTR_SECTION_START = .;
*(QB_ATTR_SECTION);
QB_ATTR_SECTION_STOP = .;
}
#endif
}