https://elixir.bootlin.com/linux/latest/source/kernel/gcov/gcc_4_7.c
https://github.com/gcc-mirror/gcc/blob/master/gcc/gcov-io.c
https://github.com/gcc-mirror/gcc/blob/master/gcc/gcov-io.h
https://stackoverflow.com/questions/36839354/generating-gcda-coverage-files-via-qemu-gdb
http://blog.techveda.org/howsourcedebuggerswork/

Coverage information is held in two files: a notes file, which is generated by the compiler, and a data file, which is generated by the program under test. Both files use a similar structure. We do not attempt to make these files backwards compatible with previous versions, as coverage information is only needed while developing a program. We do hold version information, so that mismatches can be detected, and we use a format that allows tools to skip information they do not understand or are not interested in.

Numbers are recorded as 32-bit unsigned binary values in the byte order of the machine generating the file. 64-bit numbers are stored as two 32-bit numbers, the low part first. Strings are padded with 1 to 4 NUL bytes to bring the length up to a multiple of 4. The length of the padded string, counted in 4-byte words, is stored first, followed by the padded string itself. Zero-length and NULL strings are simply stored as a length of zero (they have no trailing NUL or padding).

    int32:   byte3 byte2 byte1 byte0 | byte0 byte1 byte2 byte3
    int64:   int32:low int32:high
    string:  int32:0 | int32:length char* char:0 padding
    padding: | char:0 | char:0 char:0 | char:0 char:0 char:0
    item:    int32 | int64 | string

The basic format of the notes file is

    file: int32:magic int32:version int32:stamp int32:support_unexecuted_blocks record*

The basic format of the data file is

    file: int32:magic int32:version int32:stamp record*

The magic ident is different for the notes and the data files. When reading, the magic ident is used to determine the endianness of the file. The version is the same for both files and is derived from gcc's version number.
The stamp value is used to synchronize the notes and data files and to synchronize merging within a data file. It need not be an absolute time stamp. A ticker is enough, provided it increments fast enough and cycles slowly enough to distinguish different compile/run/compile cycles.

Although the ident and version are formally 32-bit numbers, they are derived from 4-character ASCII strings. The version number has three parts:

- a two-character major version number (the first digit starts from the letter 'A' so as not to clash with the older numbering scheme);
- a single-character minor version number;
- a single character indicating the status of the release: 'e' for experimental, 'p' for prerelease and 'r' for release.

Because, by good fortune, the status characters are in alphabetical order, string collating can be used to compare version strings. Be aware that the 'e' designation will (naturally) be unstable and might be incompatible with itself. For gcc 17.0 experimental, the version would be 'B70e' (0x42373065). As we currently do not release more than 5 minor releases, a single character should always be enough for the minor number. The major number currently changes roughly every year, which leaves room for the next 250 years (the maximum allowed number would be 259.9).

A record has a tag, a length and a variable amount of data.

    record: header data
    header: int32:tag int32:length
    data:   item*

Records are not nested, but there is a record hierarchy, and tag numbers reflect it. Tags are unique across the notes and data files. Some record types have a varying amount of data. The LENGTH is the number of 4-byte words that follow the header and is usually used to determine how much data there is. The tag value is split into four 8-bit fields, one for each of four possible levels. The most significant field is allocated first. Unused levels are zero. Active levels are odd-valued, so that the LSB of the level is one. A sub-level incorporates the values of its superlevels.
Encoding the hierarchy in the tag value this way lets a tool determine the tag hierarchy without understanding the tags themselves. It works much like the standard section numbering used in technical documents. Level values [1..3f] are used for common tags, values [41..9f] for the notes file and [a1..ff] for the data file.

The notes file contains the following records:

    note: unit function-graph*
    unit: header int32:checksum string:source
    function-graph: announce_function basic_blocks {arcs | lines}*
    announce_function: header int32:ident
        int32:lineno_checksum int32:cfg_checksum
        string:name string:source int32:start_lineno int32:start_column int32:end_lineno
    basic_block: header int32:flags*
    arcs: header int32:block_no arc*
    arc:  int32:dest_block int32:flags
    lines: header int32:block_no line* int32:0 string:NULL
    line:  int32:line_no | int32:0 string:filename

The BASIC_BLOCK record holds per-bb flags; the number of blocks can be inferred from its data length. There is one ARCS record per basic block. The number of arcs from a bb is implicit from the data length. The record enumerates the destination bb and per-arc flags.

There is one LINES record per basic block, and it enumerates the source lines that belong to that basic block. A line number of 0 introduces a source file name, and the lines that follow are from that new source file. The initial source file for the function is NULL, but the current source file should be remembered from one LINES record to the next. An empty filename marks the end of a block; this does not reset the current source file.

Note that there is no ordering of the ARCS and LINES records: they may be in any order, interleaved in any manner. The current filename follows the order in which the LINES records are stored in the file, *not* the ordering of the blocks they are for.

The data file contains the following records.
    data: {unit summary:object summary:program* function-data*}*
    unit: header int32:checksum
    function-data: announce_function present counts
    announce_function: header int32:ident
        int32:lineno_checksum int32:cfg_checksum
    present: header int32:present
    counts: header int64:count*
    summary: int32:checksum {count-summary}GCOV_COUNTERS_SUMMABLE
    count-summary: int32:num int32:runs int64:sum int64:max int64:sum_max histogram
    histogram: {int32:bitvector}8 histogram-buckets*
    histogram-buckets: int32:num int64:min int64:sum

The ANNOUNCE_FUNCTION record is the same as in the notes file, but without the source location. The COUNTS record gives the counter values for instrumented features. The program summary records hold information about the whole program. The checksum is used for whole-program summaries. It also tells apart different programs that include the same instrumented object file. There may be several program summaries, each with a unique checksum. The object summary's checksum is zero. Note that the data file might contain information from several runs concatenated, or the data might be merged.

The gcov-io.h header is included by the compiler, the gcov tools, and the runtime support library libgcov. IN_LIBGCOV and IN_GCOV are used to distinguish which case is which:

- If IN_LIBGCOV is nonzero, libgcov is being built.
- If IN_GCOV is nonzero, the gcov tools are being built.
- Otherwise, the compiler is being built.

IN_GCOV may be positive or negative. If it is positive, we are compiling a tool that requires additional functions (see the code for which functions those are).