Constructing and Matching Binaries
This section gives a few examples of how to handle binaries efficiently. The sections that follow take an in-depth look at how binaries are implemented and how to best take advantage of the optimizations done by the compiler and runtime system.
Binaries can be efficiently built in the following way:
DO
my_list_to_binary(List) ->
my_list_to_binary(List, <<>>).
my_list_to_binary([H|T], Acc) ->
my_list_to_binary(T, <<Acc/binary,H>>);
my_list_to_binary([], Acc) ->
Acc.
Appending data to a binary as in the example is efficient because it is
specially optimized by the runtime system to avoid copying the Acc
binary
every time.
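As a usage sketch (assuming my_list_to_binary/1 is placed in a module and exported so that it can be called):

my_list_to_binary([1,2,3]).  %% returns <<1,2,3>>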
Prepending data to a binary in a loop is not efficient:
DO NOT
rev_list_to_binary(List) ->
rev_list_to_binary(List, <<>>).
rev_list_to_binary([H|T], Acc) ->
rev_list_to_binary(T, <<H,Acc/binary>>);
rev_list_to_binary([], Acc) ->
Acc.
This is not efficient for long lists because the Acc
binary is copied every
time. One way to make the function more efficient is like this:
DO NOT
rev_list_to_binary(List) ->
rev_list_to_binary(lists:reverse(List), <<>>).
rev_list_to_binary([H|T], Acc) ->
rev_list_to_binary(T, <<Acc/binary,H>>);
rev_list_to_binary([], Acc) ->
Acc.
Another way to avoid copying the binary each time is like this:
DO
rev_list_to_binary([H|T]) ->
RevTail = rev_list_to_binary(T),
<<RevTail/binary,H>>;
rev_list_to_binary([]) ->
<<>>.
Note that in each of the DO examples, the binary to be appended to is always given as the first segment.
Binaries can be efficiently matched in the following way:
DO
my_binary_to_list(<<H,T/binary>>) ->
[H|my_binary_to_list(T)];
my_binary_to_list(<<>>) -> [].
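Assuming my_binary_to_list/1 is exported from a module so that it can be called, it behaves like the binary_to_list/1 BIF:

my_binary_to_list(<<1,2,3>>).  %% returns [1,2,3]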
How Binaries are Implemented
Internally, binaries and bitstrings are implemented in the same way. In this section, they are called binaries because that is what they are called in the emulator source code.
Four types of binary objects are available internally:
Two are containers for binary data and are called:
- Refc binaries (short for reference-counted binaries)
- Heap binaries
Two are merely references to a part of a binary and are called:
- sub binaries
- match contexts
Change
In Erlang/OTP 27, the handling of binaries and bitstrings was rewritten. To fully leverage those changes in the runtime system, the compiler needs to be updated, which is planned for a future release.
Since, practically speaking, not much has changed from an efficiency and optimization perspective, the following description has not yet been updated to describe the implementation in Erlang/OTP 27.
Refc Binaries
Refc binaries consist of two parts:
- An object stored on the process heap, called a ProcBin
- The binary object itself, stored outside all process heaps
The binary object can be referenced by any number of ProcBins from any number of processes. The object contains a reference counter to keep track of the number of references, so that it can be removed when the last reference disappears.
All ProcBin objects in a process are part of a linked list, so that the garbage collector can keep track of them and decrement the reference counters in the binary when a ProcBin disappears.
Heap Binaries
Heap binaries are small binaries, up to 64 bytes, and are stored directly on the process heap. They are copied when the process is garbage-collected and when they are sent as a message. They do not require any special handling by the garbage collector.
Sub Binaries
The reference objects sub binaries and match contexts can reference part of a refc binary or heap binary.
A sub binary is created by
split_binary/2
and when a binary is matched out in a
binary pattern. A sub binary is a reference into a part of another binary (refc
or heap binary, but never into another sub binary). Therefore, matching out a
binary is relatively cheap because the actual binary data is never copied.
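The following is a minimal sketch (not part of the original examples) showing that a sub binary refers to its parent's data instead of copying it; binary:referenced_byte_size/1 reports the size of the underlying binary object:

Parent = binary:copy(<<1>>, 100),        %% 100 bytes, stored as a refc binary
{First,_Rest} = split_binary(Parent, 3), %% First is a sub binary into Parent
byte_size(First),                        %% 3
binary:referenced_byte_size(First).      %% 100 - still references the whole parent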
Match Context
A match context is similar to a sub binary, but is optimized for binary matching. For example, it contains a direct pointer to the binary data. For each field that is matched out of a binary, the position in the match context is incremented.
The compiler tries to avoid generating code that creates a sub binary, only to shortly afterwards create a new match context and discard the sub binary. Instead of creating a sub binary, the match context is kept.
The compiler can only do this optimization if it knows that the match context will not be shared. If it were shared, the functional properties (also called referential transparency) of Erlang would break.
Constructing Binaries
Appending to a binary or bitstring in the following way is specially optimized to avoid copying the binary:
<<Binary/binary, ...>>
%% - OR -
<<Binary/bitstring, ...>>
This optimization is applied by the runtime system in a way that makes it effective in most circumstances (for exceptions, see Circumstances That Force Copying). The optimization in its basic form does not need any help from the compiler. However, the compiler adds hints to the runtime system when it is safe to apply the optimization in a more efficient way.
Change
The compiler support for making the optimization more efficient was added in Erlang/OTP 26.
To explain how the basic optimization works, let us examine the following code line by line:
Bin0 = <<0>>, %% 1
Bin1 = <<Bin0/binary,1,2,3>>, %% 2
Bin2 = <<Bin1/binary,4,5,6>>, %% 3
Bin3 = <<Bin2/binary,7,8,9>>, %% 4
Bin4 = <<Bin1/binary,17>>, %% 5 !!!
{Bin4,Bin3} %% 6
- Line 1 (marked with the %% 1 comment) assigns a heap binary to the Bin0 variable.
- Line 2 is an append operation. As Bin0 has not been involved in an append operation, a new refc binary is created and the contents of Bin0 are copied into it. The ProcBin part of the refc binary has its size set to the size of the data stored in the binary, while the binary object has extra space allocated. The size of the binary object is either twice the size of Bin1 or 256, whichever is larger. In this case it is 256.
- Line 3 is more interesting. Bin1 has been used in an append operation, and it has 252 bytes of unused storage at the end, so the 3 new bytes are stored there.
- Line 4. The same applies here. There are 249 bytes left, so there is no problem storing another 3 bytes.
- Line 5. Here something interesting happens. Notice that the result is not appended to the previous result in Bin3, but to Bin1. It is expected that Bin4 will be assigned the value <<0,1,2,3,17>>. It is also expected that Bin3 will retain its value (<<0,1,2,3,4,5,6,7,8,9>>). Clearly, the runtime system cannot write byte 17 into the binary, because that would change the value of Bin3 to <<0,1,2,3,4,17,6,7,8,9>>.
- To ensure that the value of Bin3 is retained, the runtime system copies the contents of Bin1 to a new refc binary before storing the 17 byte.
- How the runtime system can know that it is not allowed to write into Bin1 is not explained here; it is left as an exercise to the curious reader to figure it out by reading the emulator sources, primarily erl_bits.c.
Compiler Support For Constructing Binaries
Change
The compiler support for making the optimization more efficient was added in Erlang/OTP 26.
In the example in the previous section, it was shown that the runtime system can handle an append operation to a heap binary by copying it to a refc binary (line 2), and also handle an append operation to a previous version of the binary by copying it (line 5). The support for doing that does not come for free. For example, to make it possible to know when it is necessary to copy the binary, for every append operation, the runtime system must create a sub binary.
When the compiler can determine that none of those situations need to be handled and that the append operation cannot possibly fail, the compiler generates code that causes the runtime system to apply a more efficient variant of the optimization.
Example:
-module(repack).
-export([repack/1]).
repack(Bin) when is_binary(Bin) ->
repack(Bin, <<>>).
repack(<<C:8,T/binary>>, Result) ->
repack(T, <<Result/binary,C:16>>);
repack(<<>>, Result) ->
Result.
The repack/2
function only keeps a single version of the binary, so there is
never any need to copy the binary. The compiler rewrites the creation of the
empty binary in repack/1
to instead create a refc binary with 256 bytes
already reserved; thus, the append operation in repack/2
never needs to handle
a binary not prepared for appending.
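As a usage sketch, each 8-bit value in the input is repacked into a 16-bit field:

repack:repack(<<1,2,3>>).  %% returns <<0,1,0,2,0,3>>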
Circumstances That Force Copying
The optimization of the binary append operation requires that there is a single ProcBin and a single reference to the ProcBin for the binary. The reason is that the binary object can be moved (reallocated) during an append operation, and when that happens, the pointer in the ProcBin must be updated. If there were more than one ProcBin pointing to the binary object, it would not be possible to find and update all of them.
Therefore, certain operations on a binary mark it so that any future append operation will be forced to copy the binary. In most cases, the binary object will be shrunk at the same time to reclaim the extra space allocated for growing.
When appending to a binary as follows, only the binary returned from the latest append operation will support further cheap append operations:
Bin = <<Bin0,...>>
In the code fragment in the beginning of this section, appending to Bin
will
be cheap, while appending to Bin0
will force the creation of a new binary and
copying of the contents of Bin0
.
If a binary is sent as a message to a process or port, the binary will be shrunk
and any further append operation will copy the binary data into a new binary.
For example, in the following code fragment Bin1
will be copied in the third
line:
Bin1 = <<Bin0,...>>,
PortOrPid ! Bin1,
Bin = <<Bin1,...>> %% Bin1 will be COPIED
The same happens if you insert a binary into an Ets table, send it to a port
using erlang:port_command/2
, or pass it to
enif_inspect_binary in a NIF.
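For example, a hedged sketch of the Ets case (the table and key names are only for illustration):

Table = ets:new(example_table, []),
Bin0 = <<0>>,
Bin1 = <<Bin0/binary,1,2,3>>,
true = ets:insert(Table, {some_key, Bin1}),
Bin = <<Bin1/binary,4,5,6>>.  %% Bin1 will be COPIED here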
Matching a binary will also cause it to shrink and the next append operation will copy the binary data:
Bin1 = <<Bin0,...>>,
<<X,Y,Z,T/binary>> = Bin1,
Bin = <<Bin1,...>> %% Bin1 will be COPIED
The reason is that a match context contains a direct pointer to the binary data.
If a process simply keeps binaries (either in "loop data" or in the process dictionary), the garbage collector can eventually shrink the binaries. If only one such binary is kept, it will not be shrunk. If the process later appends to a binary that has been shrunk, the binary object will be reallocated to make room for the data to be appended.
Matching Binaries
Let us revisit the example in the beginning of the previous section:
DO
my_binary_to_list(<<H,T/binary>>) ->
[H|my_binary_to_list(T)];
my_binary_to_list(<<>>) -> [].
The first time my_binary_to_list/1
is called, a
match context is created. The match context
points to the first byte of the binary. 1 byte is matched out and the match
context is updated to point to the second byte in the binary.
At this point it would make sense to create a
sub binary, but in this particular example the
compiler sees that there will soon be a call to a function (in this case, to
my_binary_to_list/1
itself) that immediately will create a new match context
and discard the sub binary.
Therefore my_binary_to_list/1
calls itself with the match context instead of
with a sub binary. The instruction that initializes the matching operation
basically does nothing when it sees that it was passed a match context instead
of a binary.
When the end of the binary is reached and the second clause matches, the match context will simply be discarded (removed in the next garbage collection, as there is no longer any reference to it).
To summarize, my_binary_to_list/1
only needs to create one match context and
no sub binaries.
Notice that the match context in my_binary_to_list/1
was discarded when the
entire binary had been traversed. What happens if the iteration stops before it
has reached the end of the binary? Will the optimization still work?
after_zero(<<0,T/binary>>) ->
T;
after_zero(<<_,T/binary>>) ->
after_zero(T);
after_zero(<<>>) ->
<<>>.
Yes, it will. The compiler will remove the building of the sub binary in the second clause:
...
after_zero(<<_,T/binary>>) ->
after_zero(T);
...
But it will generate code that builds a sub binary in the first clause:
after_zero(<<0,T/binary>>) ->
T;
...
Therefore, after_zero/1
builds one match context and one sub binary (assuming
it is passed a binary that contains a zero byte).
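For example (assuming after_zero/1 is exported so that it can be called), everything up to and including the first zero byte is skipped:

after_zero(<<1,2,0,3,4>>).  %% returns <<3,4>>
after_zero(<<1,2,3>>).      %% returns <<>> (no zero byte found)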
Code like the following will also be optimized:
all_but_zeroes_to_list(Buffer, Acc, 0) ->
{lists:reverse(Acc),Buffer};
all_but_zeroes_to_list(<<0,T/binary>>, Acc, Remaining) ->
all_but_zeroes_to_list(T, Acc, Remaining-1);
all_but_zeroes_to_list(<<Byte,T/binary>>, Acc, Remaining) ->
all_but_zeroes_to_list(T, [Byte|Acc], Remaining-1).
The compiler removes the building of sub binaries in the second and third clauses, and it adds an instruction to the first clause that converts Buffer from a match context to a sub binary (or does nothing if Buffer is already a binary).
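For example, with three bytes to consume, the zero byte is dropped and the rest of the buffer is returned as-is:

all_but_zeroes_to_list(<<1,0,2,3,4>>, [], 3).  %% returns {[1,2],<<3,4>>}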
But in more complicated code, how can one know whether the optimization is applied or not?
Option bin_opt_info
Use the bin_opt_info
option to have the compiler print a lot of information
about binary optimizations. It can be given either to the compiler or erlc
:
erlc +bin_opt_info Mod.erl
or passed through an environment variable:
export ERL_COMPILER_OPTIONS=bin_opt_info
Notice that bin_opt_info is not meant to be a permanent option added to your Makefiles, because not all messages that it generates can be eliminated. Therefore, passing the option through the environment is in most cases the most practical approach.
The warnings look as follows:
./efficiency_guide.erl:60: Warning: NOT OPTIMIZED: binary is returned from the function
./efficiency_guide.erl:62: Warning: OPTIMIZED: match context reused
To make it clearer exactly what code the warnings refer to, the warnings in the following examples are inserted as comments after the clause they refer to, for example:
after_zero(<<0,T/binary>>) ->
%% BINARY CREATED: binary is returned from the function
T;
after_zero(<<_,T/binary>>) ->
%% OPTIMIZED: match context reused
after_zero(T);
after_zero(<<>>) ->
<<>>.
The warning for the first clause says that the creation of a sub binary cannot be delayed, because it will be returned. The warning for the second clause says that a sub binary will not be created (yet).
Unused Variables
The compiler figures out if a variable is unused. The same code is generated for each of the following functions:
count1(<<_,T/binary>>, Count) -> count1(T, Count+1);
count1(<<>>, Count) -> Count.
count2(<<H,T/binary>>, Count) -> count2(T, Count+1);
count2(<<>>, Count) -> Count.
count3(<<_H,T/binary>>, Count) -> count3(T, Count+1);
count3(<<>>, Count) -> Count.
In each iteration, the first 8 bits in the binary will be skipped, not matched out.