On Bitcoin script and witness sizes

The following investigation deals with the size requirements for scripts and witnesses of different transaction types. Through analysis of first principles and empirical data, a set of estimates is established for scripts and witnesses. These estimates can serve as inputs for quantitative models whose projections might help make future block-size discussions more objective.

This article is structured as follows. The next section contains a brief discussion on relevant terminology. The following two sections focus on the encodings of public keys and signatures used in Bitcoin. Then all Bitcoin transaction output types are discussed individually and in detail. Finally, the results section summarizes the findings.

Terminology

To avoid misconceptions and have a rational discussion, a brief definition of key terms is in order. In particular, a differentiation between transaction type and transaction output type is helpful, for both terms are sometimes conflated, which can lead to confusion.

The transaction type defines two properties that transactions can have. The first property has to do with the creation of new coins: each block contains exactly one coinbase transaction, which is used by miners to generate new coins; all other transactions in the block spend previously existing coins and are non-coinbase transactions. The second property concerns Bitcoin's Segregated Witness (segwit) upgrade: when one or more of a transaction's inputs is associated with witness data, the entire transaction is considered a segwit transaction; otherwise, the transaction is considered a non-segwit transaction.

The transaction output type, on the other hand, defines the type of a transaction output. Each output contains a locking script that sets the conditions under which the coins locked in the output can be spent, and an output's type is determined by the kind of script used in the output: when, for example, an output's script specifies that a public key is required to unlock its coins, the output's type is Pay-to-Public-Key (P2PK).

The process of spending an output is sometimes referred to as a transaction, so spending a P2PK output becomes a P2PK transaction. This, however, means that the term transaction now has two different meanings: on the one hand, it refers to Bitcoin's transaction data structure; on the other, to the act of spending an output. In line with this terminology, a single transaction (first meaning) can contain multiple transactions (second meaning), which can be confusing.

In conclusion, it can be established that only coinbase and segwit transactions refer to genuine transaction types, while all other transaction types, in fact, refer to an input spending an output with a particular transaction output type.

Encoding of Public Keys

Bitcoin uses elliptic curve cryptography, where public keys correspond to x- and y-coordinates of points on an elliptic curve. The elliptic curve used by Bitcoin, secp256k1, is defined in the Standards for Efficient Cryptography (SEC), and uses 256-bit numbers to represent the x- and y-coordinates of points on the curve.

In addition to establishing parameters for different elliptic curves, the SEC also defines standards to encode public keys. Originally, Bitcoin only supported the uncompressed SEC format. In this format, a public key is represented by a one-byte magic number indicating the kind of SEC encoding used (in this case, uncompressed) and the encoding of the key's two 32-byte x- and y-coordinates. The uncompressed SEC format thus uses a total of 65 bytes.

After some time, support for the compressed SEC format was added to Bitcoin. The compressed format reduces the size of the encoded public key significantly by taking advantage of the fact that a public key's y-coordinate can be derived from its x-coordinate up to one ambiguity: for each valid x-coordinate there are two possible y-coordinates, one even and one odd. Thus, instead of encoding both coordinates, only the x-coordinate is encoded when using the compressed SEC format. The ambiguity concerning the y-coordinate is resolved by using two different one-byte magic numbers as the SEC prefix: 0x02 indicates the compressed format with an even y-coordinate, 0x03 with an odd y-coordinate. By omitting the y-coordinate, the encoding's size is reduced by 32 bytes. Thus, using the compressed format, public keys can be encoded using only 33 bytes.
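As a minimal sketch of the two formats, the Python snippet below serializes a public key in uncompressed and compressed SEC form and recovers the y-coordinate from the compressed form. It uses the secp256k1 generator point G purely as a well-known sample key; the function names are illustrative, not taken from any library, and the square-root shortcut relies on the secp256k1 field prime satisfying p mod 4 = 3.

```python
# secp256k1 field prime: p = 2^256 - 2^32 - 977
P = 2**256 - 2**32 - 977

# Coordinates of the secp256k1 generator point G, used here only as a sample public key.
GX = 0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798
GY = 0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8


def encode_uncompressed(x: int, y: int) -> bytes:
    """Uncompressed SEC format: 0x04 || x (32 bytes) || y (32 bytes) = 65 bytes."""
    return b"\x04" + x.to_bytes(32, "big") + y.to_bytes(32, "big")


def encode_compressed(x: int, y: int) -> bytes:
    """Compressed SEC format: 0x02 (even y) or 0x03 (odd y) || x (32 bytes) = 33 bytes."""
    prefix = b"\x03" if y % 2 else b"\x02"
    return prefix + x.to_bytes(32, "big")


def decode_compressed(data: bytes) -> tuple[int, int]:
    """Recover (x, y) from a compressed key via the curve equation y^2 = x^3 + 7 (mod p)."""
    if len(data) != 33 or data[0] not in (2, 3):
        raise ValueError("not a compressed SEC public key")
    x = int.from_bytes(data[1:], "big")
    y_squared = (pow(x, 3, P) + 7) % P
    # Because p % 4 == 3, one square root is y_squared^((p + 1) / 4) mod p.
    y = pow(y_squared, (P + 1) // 4, P)
    if y * y % P != y_squared:
        raise ValueError("x-coordinate is not on the curve")
    # The other root is p - y; keep the one whose parity matches the prefix byte.
    if y % 2 != data[0] % 2:
        y = P - y
    return x, y


uncompressed = encode_uncompressed(GX, GY)
compressed = encode_compressed(GX, GY)
print(len(uncompressed), len(compressed))  # 65 33
print(decode_compressed(compressed) == (GX, GY))  # True
```

Since p is odd, the two roots y and p - y always differ in parity, which is why a single prefix byte is enough to select the right one.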

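[Figure: share of uncompressed and compressed public-key encodings in P2PK, P2PKH, and P2WSH transactions over time]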

The figure above shows the share of uncompressed and compressed encodings in P2PK, P2PKH, and P2WSH transactions. The empirical data reveals a steady decline in the use of uncompressed keys in Bitcoin over the years. In fact, in the recent past the share of transactions using uncompressed keys has been so low that, for practical purposes, the size of today's encoded public keys can be assumed to be 33 bytes.

Encoding of Signatures

Bitcoin's signature encoding follows the Distinguished Encoding Rules (DER). A signature consists of two 256-bit numbers, r and s. When encoded with DER, each of the numbers is prefixed by two bytes: one to encode the data type, which is integer; the other to indicate the number's length in bytes.

Next is the encoding of the values of r and s. Although these are 256-bit values, their encoding is not always 32 bytes. For example, the fact that DER uses signed integers has implications for the size of the encoding: signed integers use the highest bit of a number's binary representation to distinguish between positive and negative values, so values of r and s whose highest bit is set would be interpreted as negative numbers. To prevent this, a zero-byte is prepended in cases where the highest bit is set: this leaves the value unchanged but ensures that the highest bit of the encoding is not set. Values of r and s that have their highest bit set and require the zero-padding are also referred to as high values, while those that do not have the highest bit set and do not require the zero-padding are called low values.

As of Bitcoin Core release 0.11.1 (in October 2015), the “low s” rule ensures that only transactions with low s values are relayed, which implies that the encoding of s has a size of at most 32 bytes; only miners can bypass this restriction and include signatures with a high s value in a block. For r, this restriction does not exist. Assuming a uniform distribution of values, a signature should contain a high r value half of the time. On average, the size of the encoding of r is therefore 32.5 bytes.

With the two bytes indicating the data type and length for each value, the average size so far is 34.5 bytes for r and 34 bytes for s, or 68.5 bytes for both. To get a valid DER signature, two more bytes are required: one to indicate that the encoding holds multiple values, and another to indicate the total length of the encoding. A full DER signature consists of these two bytes followed by the previously discussed encoding of r and s values. The average size of a DER signature is thus 70.5 bytes. Finally, it should be pointed out that Bitcoin supports multiple signature types. To differentiate between these, a byte indicating the signature type is appended to the DER signature. Thus, encoded signatures have an average size of 71.5 bytes.
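The byte layout described above can be sketched in Python as follows. The r and s values are arbitrary illustrative numbers chosen to produce one high-r and one low-r signature (they are not real signatures), the sighash byte 0x01 (SIGHASH_ALL) is assumed, and the helper functions are hypothetical, not part of any library.

```python
def der_integer(value: int) -> bytes:
    """DER INTEGER: 0x02 || length || minimal big-endian bytes (with sign padding)."""
    body = value.to_bytes(max(1, (value.bit_length() + 7) // 8), "big")
    # DER integers are signed; if the highest bit is set, prepend 0x00 so the
    # value is not interpreted as negative (the "high" case in the text).
    if body[0] & 0x80:
        body = b"\x00" + body
    return bytes([0x02, len(body)]) + body


def encode_signature(r: int, s: int, sighash_type: int = 0x01) -> bytes:
    """DER SEQUENCE 0x30 || length || INTEGER(r) || INTEGER(s), then the sighash byte."""
    content = der_integer(r) + der_integer(s)
    return bytes([0x30, len(content)]) + content + bytes([sighash_type])


# Illustrative values only (not taken from a real signature).
HIGH_R = 2**255 + 1  # highest bit set -> 33-byte body
LOW_R = 2**254 + 1   # highest bit clear -> 32-byte body
LOW_S = 2**254 + 3   # low s, highest bit clear -> 32-byte body

high_r_sig = encode_signature(HIGH_R, LOW_S)
low_r_sig = encode_signature(LOW_R, LOW_S)
print(len(high_r_sig), len(low_r_sig))         # 72 71
print((len(high_r_sig) + len(low_r_sig)) / 2)  # 71.5
```

Averaging the two equally likely cases reproduces the 71.5-byte estimate: 2 bytes of sequence header, 34.5 bytes for r, 34 bytes for s, and 1 sighash byte.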

To complete the analysis of signature sizes, two more aspects require discussion. The first concerns the fact that the signature's r and s values are random. If the binary representation of a random 256-bit value has nine or more leading zero bits, the value can be encoded using fewer than 32 bytes. (With exactly eight leading zero bits, the remaining 31 bytes have their highest bit set, so the zero-byte padding brings the encoding back to 32 bytes.) The second aspect concerns a signature-size optimization that is made possible by the fact that the generation of r and s values involves a random input: some implementations discard signatures with high r values; instead, they select a new random input and derive new r and s values until a low r value is found. Together with a low s value, which is enforced for standard transactions, the resulting signatures have a size of at most 71 bytes.
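A rough Monte Carlo check of both effects is sketched below. It does not compute real ECDSA signatures: as a simplifying assumption, r is modeled as a uniformly random 256-bit integer, the low-s rule is modeled by drawing s with its highest bit clear, and low-r grinding is modeled by redrawing r until its highest bit is clear. Only the DER size rules are applied to these modeled numbers, and all names are illustrative.

```python
import random


def der_int_len(value: int) -> int:
    """Length of a DER INTEGER body: minimal big-endian bytes plus sign padding."""
    length = max(1, (value.bit_length() + 7) // 8)
    if (value >> (8 * length - 1)) & 1:  # highest bit of the body set -> pad with 0x00
        length += 1
    return length


def signature_size(r: int, s: int) -> int:
    """0x30 + length, two INTEGERs (tag + length + body each), and the sighash byte."""
    return 2 + (2 + der_int_len(r)) + (2 + der_int_len(s)) + 1


def simulate(samples: int, grind_low_r: bool, seed: int = 1) -> list[int]:
    """Signature sizes for modeled (not real) r and s values."""
    rng = random.Random(seed)
    sizes = []
    for _ in range(samples):
        r = rng.getrandbits(256)
        # Modeled low-r grinding: redraw until the highest bit of r is clear.
        while grind_low_r and r >> 255:
            r = rng.getrandbits(256)
        s = rng.getrandbits(255)  # modeled low-s rule: highest of 256 bits is clear
        sizes.append(signature_size(r, s))
    return sizes


plain = simulate(50_000, grind_low_r=False)
ground = simulate(50_000, grind_low_r=True)
print(sum(plain) / len(plain), max(plain))     # mean close to 71.5, max 72
print(sum(ground) / len(ground), max(ground))  # mean close to 71.0, max 71
```

In this model, the rare values with nine or more leading zero bits pull both means slightly below 71.5 and 71.0 bytes, respectively.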

[Figure: histogram of signature sizes in the two years before block 626,267]

To quantify the overall impact of the different factors influencing the signature size, an analysis of empirical data is in order. The histogram above shows the occurrence of different signature sizes in the Bitcoin blockchain in the two years before block 626,267. As expected, the bulk of signatures have a size of 71 or 72 bytes. The slight bias toward 71 bytes can be credited to the previously discussed low-r optimization. Moreover, a small number of 70-byte signatures can be observed. These can be attributed to the previously discussed occurrence of leading zero bits in the binary representation of r and s values. In fact, there are even signatures that are only 69 or 68 bytes; however, these occur so infrequently that they are not visible in the histogram. The average signature size according to empirical data is 71.46 bytes, which closely matches the analytic estimate of 71.5 bytes. Thus, for practical purposes, the size of today's signatures can be assumed to be 71.5 bytes.

Pay-to-Public-Key

The locking-script format of Pay-to-Public-Key (P2PK) outputs is <len> <pubkey> OP_CHECKSIG, with <len>, the length of the following public key in bytes; <pubkey>, an SEC-encoded public key; and OP_CHECKSIG, a Bitcoin Script instruction for signature verification.

The encodings of the length of the public key and the Bitcoin script instruction require one byte each. The previously determined estimate for the public key is 33 bytes. The estimate for the locking-script size of P2PK outputs is thus 35 bytes.

The unlocking-script format of P2PK outputs is <len> <sig>, with <len>, the size of the following signature in bytes; and <sig>, a DER-encoded signature.

The encoding of the length of the signature requires one byte and the previously established estimate for the signature's size is 71.5 bytes. The estimate for P2PK unlocking scripts is thus 72.5 bytes.

The overall contribution of P2PK scripts is given by the sum of the locking and unlocking scripts' sizes. The former have a size of 35 bytes; the latter 72.5 bytes. The estimate for P2PK scripts, therefore, is 107.5 bytes.
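A minimal Python sketch of the P2PK arithmetic: it assembles the locking script byte by byte and adds the unlocking-script estimate. The 33-byte public key is a placeholder of the right length (not a valid curve point); only the opcode value of OP_CHECKSIG (0xac) comes from Bitcoin Script, and the helper names are illustrative.

```python
OP_CHECKSIG = 0xAC


def push(data: bytes) -> bytes:
    """Direct push (1 to 75 bytes): one length byte followed by the data."""
    if not 1 <= len(data) <= 75:
        raise ValueError("direct pushes cover 1 to 75 bytes")
    return bytes([len(data)]) + data


PLACEHOLDER_PUBKEY = b"\x02" + bytes(32)  # right length, not a real key
AVG_SIGNATURE = 71.5                      # DER signature + sighash byte, on average

locking_script = push(PLACEHOLDER_PUBKEY) + bytes([OP_CHECKSIG])  # <len> <pubkey> OP_CHECKSIG
unlocking_size = 1 + AVG_SIGNATURE                                # <len> <sig>

print(len(locking_script), unlocking_size, len(locking_script) + unlocking_size)
# 35 72.5 107.5
```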

[Figure: histogram of combined P2PK script sizes for all spent P2PK outputs as of block 626,267]

The validity of the estimate is corroborated by the empirical data shown in the figure above, which features a histogram of P2PK script sizes for all spent P2PK outputs as of block 626,267. As expected, the bulk of transactions have a combined script size of 107 and 108 bytes (with a slight bias toward 107 bytes due to the low-r optimization). The second cluster of sizes centered around 141 bytes corresponds to transactions using uncompressed public keys. Note that the histogram includes data from all P2PK transactions. The second cluster is only an artifact from the early days of Bitcoin when uncompressed keys were more common, so its existence has no impact on the validity of the 107.5-byte estimate for scripts in today's P2PK transactions.

Pay-to-Public-Key-Hash

The locking-script format of Pay-to-Public-Key-Hash (P2PKH) outputs is OP_DUP OP_HASH160 0x14 <hash> OP_EQUALVERIFY OP_CHECKSIG, with OP_DUP, the Bitcoin Script instruction to duplicate the top stack item; OP_HASH160, the Bitcoin Script instruction to apply the HASH-160 function to the top stack item; 0x14, the length of the following 20-byte hash in bytes (using hexadecimal representation); <hash>, a 20-byte HASH-160 of a public key; OP_EQUALVERIFY, the Bitcoin Script instruction to make the transaction invalid if the two top stack items differ; and OP_CHECKSIG, a Bitcoin Script instruction to verify a signature.

The four Bitcoin Script instructions and the encoding of the length of the hash require one byte each. Together with the 20-byte hash, this implies a locking-script size of 25 bytes.

The unlocking-script format of P2PKH outputs is <len> <sig> <len> <pubkey>, with <len>, the size of the following signature in bytes; <sig>, a DER-encoded signature, created using the private key from which the public key that is presented next in the unlocking script was derived; <len>, the size of the following public key in bytes; and <pubkey>, the SEC-encoded public key from which the HASH-160 used in the locking script was derived.

The encodings of the lengths of the signature and the public key require one byte each. Together with the previously established estimates for signatures and public keys of 71.5 and 33 bytes, respectively, this results in an unlocking-script size estimate of 106.5 bytes.

The overall size of P2PKH scripts is the sum of the sizes of the locking and unlocking scripts, which are 25 and 106.5 bytes, respectively. Thus, the estimate for P2PKH scripts is 131.5 bytes.
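The same arithmetic for P2PKH, as a short Python sketch. The 20-byte hash is an all-zero placeholder standing in for a real HASH-160, and the signature and key sizes are the averages established above; the opcode values are the standard Bitcoin Script ones.

```python
OP_DUP, OP_HASH160, OP_EQUALVERIFY, OP_CHECKSIG = 0x76, 0xA9, 0x88, 0xAC

PLACEHOLDER_HASH = bytes(20)  # stands in for HASH-160(pubkey)
AVG_SIGNATURE = 71.5          # average encoded signature size
PUBKEY_SIZE = 33              # compressed SEC public key

# OP_DUP OP_HASH160 0x14 <hash> OP_EQUALVERIFY OP_CHECKSIG
locking_script = (
    bytes([OP_DUP, OP_HASH160, len(PLACEHOLDER_HASH)])
    + PLACEHOLDER_HASH
    + bytes([OP_EQUALVERIFY, OP_CHECKSIG])
)
# <len> <sig> <len> <pubkey>
unlocking_size = (1 + AVG_SIGNATURE) + (1 + PUBKEY_SIZE)

print(len(locking_script), unlocking_size, len(locking_script) + unlocking_size)
# 25 106.5 131.5
print(locking_script.hex()[:6])  # 76a914
```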

[Figure: histogram of combined P2PKH script sizes as of block 626,267]

As before, the validity of this estimate is corroborated by empirical data. The figure above contains a histogram showing the combined script sizes of all P2PKH transactions as of block 626,267. As expected, the bulk of transactions have a combined script size of 131 and 132 bytes. The second cluster at around 164 bytes corresponds to transactions using uncompressed public keys. Note that the histogram includes all P2PKH transactions, so the second cluster is again an artifact from Bitcoin's early days when uncompressed keys were more widespread. The second cluster thus does not diminish the validity of the 131.5-byte estimate for the combined script size of today's P2PKH transactions.

Bare Multi-Signature

The locking-script format of bare Multi-Signature (multisig) outputs is OP_m <len> <pubkey1> ... <len> <pubkeyn> OP_n OP_CHECKMULTISIG, with OP_m, specifying m, the number of signatures required to spend the output; <len>, the length of the following encoded key in bytes; <pubkey1> ... <pubkeyn>, a list of n SEC-encoded public keys; OP_n, encoding n, the number of public keys; and OP_CHECKMULTISIG, the Bitcoin Script instruction for multi-signature verification.

In contrast to the P2PK and P2PKH output types, where locking scripts had a fixed size, the locking-script size of multisig depends on the number of public keys used in the script. To get a better idea of the most popular multisig use cases and the corresponding locking-script sizes, an analysis of empirical data is in order.

Type      Count    Frequency  Locking-script sizes (frequency within type)
1-of-1    2,438    0.5%       37 B (69.2%), 69 B (30.8%)
1-of-2    58,432   12.0%      71 B (73.1%), 103 B (26.9%), 135 B (<0.1%)
2-of-2    3,674    0.8%       71 B (98.5%), 103 B (0.2%), 135 B (1.3%)
1-of-3    419,671  86.4%      105 B (82.6%), 137 B (17.4%), 169 B (<0.1%), 201 B (<0.1%)
2-of-3    1,617    0.3%       105 B (94.4%), 137 B (0.3%), 169 B (0%), 201 B (5.3%)
3-of-3    38       <0.1%      105 B (92.1%), 137 B (0%), 169 B (0%), 201 B (7.9%)

The table above shows the count, frequency, and locking-script sizes (and their frequency) for the different bare multisig output types as of block 626,267. The data shows that 1-of-2 and 1-of-3 are by far the most popular bare multisig variants—together amounting to more than 98% of all multisig outputs. Consequently, in the following, the focus will be on estimates for those particular variants.

Moreover, the data indicates a strong tendency toward the smallest output size: in the last column, the first entry in each row represents the case where all public keys are encoded using the compressed SEC format; for each type, this is the most frequently observed encoding. Moving one entry to the right within a row corresponds to one more public key being represented in uncompressed rather than compressed SEC format, which increases the locking script's size by 32 bytes. Further analysis reveals that multisig outputs from the recent past have a stronger bias toward all-compressed keys; the small percentage of outputs using one or more uncompressed public keys is caused by outputs from Bitcoin's early days, when uncompressed keys were frequently used.

To conclude the discussion about locking-script sizes, it can be established that 1-of-2 and 1-of-3 multisig outputs are the two most important use cases; and, today, the estimates for their sizes are 71 and 105 bytes, respectively.

The unlocking-script format of multisig outputs is OP_0 <len> <sig1> ... <len> <sigm>, with OP_0, a dummy Bitcoin Script instruction that works around a bug in the implementation of OP_CHECKMULTISIG, which removes one more item from the stack than it uses; <len>, the length of the following signature in bytes; and <sig1> ... <sigm>, a list of m DER-encoded signatures.

As was the case for the locking script, the unlocking-script size of multisig outputs is not fixed; in case of unlocking scripts, the size depends on the number of signatures. The dummy Script instruction OP_0 always contributes 1 byte; each signature-length encoding contributes another byte; and each signature another 71.5 bytes. In both of the two most popular use cases (1-of-2 and 1-of-3 multisig), one signature must be provided in the unlocking script. Consequently, the estimate of the unlocking-script size for both variants is 73.5 bytes.

The estimate for the overall size of multisig scripts is the sum of the sizes of the locking and unlocking scripts. For 1-of-2 multisig transactions the respective scripts contribute 71 and 73.5 bytes, resulting in a script-size estimate of 144.5 bytes; for the 1-of-3 variant, the contributions are 105 and 73.5 bytes, yielding a script estimate of 178.5 bytes.
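A Python sketch of the bare-multisig size rules, under the stated averages. The public keys are placeholders of the right length, and the function names are hypothetical; the opcode values (OP_1 to OP_16 as 0x51 to 0x60, OP_CHECKMULTISIG as 0xae) are the standard ones.

```python
OP_CHECKMULTISIG = 0xAE


def op_small_int(n: int) -> int:
    """OP_1 ... OP_16 are the single-byte opcodes 0x51 ... 0x60."""
    if not 1 <= n <= 16:
        raise ValueError("only 1 to 16 is encodable as OP_n")
    return 0x50 + n


def multisig_locking_script(m: int, pubkeys: list[bytes]) -> bytes:
    """OP_m <len> <pubkey1> ... <len> <pubkeyn> OP_n OP_CHECKMULTISIG."""
    script = bytes([op_small_int(m)])
    for key in pubkeys:
        script += bytes([len(key)]) + key
    return script + bytes([op_small_int(len(pubkeys)), OP_CHECKMULTISIG])


def multisig_unlocking_size(m: int, avg_signature: float = 71.5) -> float:
    """OP_0 dummy plus m pushes of <len> <sig>."""
    return 1 + m * (1 + avg_signature)


COMPRESSED_KEY = b"\x02" + bytes(32)    # placeholder, 33 bytes
UNCOMPRESSED_KEY = b"\x04" + bytes(64)  # placeholder, 65 bytes

for m, n in [(1, 2), (1, 3)]:
    lock = len(multisig_locking_script(m, [COMPRESSED_KEY] * n))
    unlock = multisig_unlocking_size(m)
    print(f"{m}-of-{n}: {lock} + {unlock} = {lock + unlock}")
# 1-of-2: 71 + 73.5 = 144.5
# 1-of-3: 105 + 73.5 = 178.5

# Swapping one compressed key for an uncompressed one adds 32 bytes.
print(len(multisig_locking_script(1, [COMPRESSED_KEY, UNCOMPRESSED_KEY])))  # 103
```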

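[Figure: histogram of combined script sizes of 1-of-2 and 1-of-3 bare multisig transactions as of block 626,267]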

As usual, these estimates are supported by empirical data. The figure above contains a histogram of the combined script sizes of all 1-of-2 and 1-of-3 multisig transactions as of block 626,267. For 1-of-2 multisig, a small number of transactions with script sizes of 176 and 177 bytes can be observed, which correspond to transactions using an uncompressed public key. As before, this artifact of old transactions from Bitcoin's early days does not affect estimates for today's transactions. As expected, the script sizes of most 1-of-2 multisig transactions are in line with the 144.5-byte estimate, at either 144 or 145 bytes. In the case of 1-of-3 multisig, most transactions' script sizes agree with the 178.5-byte estimate, at either 178 or 179 bytes.

Pay-to-Script-Hash

The locking-script format of Pay-to-Script-Hash (P2SH) outputs is OP_HASH160 0x14 <hash> OP_EQUAL, with OP_HASH160, the Bitcoin Script instruction to apply the HASH-160 function to the top stack item; 0x14, the length of the following 20-byte hash in bytes (using hexadecimal representation); <hash>, a 20-byte HASH-160 of the redeem script locking the output; and OP_EQUAL, the Bitcoin Script instruction to determine whether the two top stack items are equal.

Each of the Bitcoin script instructions contributes one byte, as does the encoding of the length of the hash. Together with the 20-byte hash, the total locking-script size is thus 23 bytes.

The unlocking-script format of P2SH outputs is <data> <len> <redeem script>, with <data>, some data to fulfill the conditions set by the redeem script; <len>, the length of the following redeem script; and <redeem script>, the redeem script that will be interpreted as the locking script.

In theory, redeem scripts can set arbitrary spending conditions, which means that the size of redeem scripts varies; similarly, the size of the data needed to fulfill redeem scripts varies significantly depending on the conditions they set forth. In light of this, giving estimates for the size of P2SH unlocking scripts seems impractical. However, as the data in the figure below shows, in practice most redeem scripts fall into just three categories. In the year before block 626,267, 69.3% of all redeem scripts were of type P2SH-Pay-to-Witness-Public-Key-Hash (P2SH-P2WPKH); 21.1% of type P2SH-Pay-to-Witness-Script-Hash-multisig (P2SH-P2WSH-multisig); and 9.2% of type P2SH-multisig. Since only 0.4% of redeem scripts do not fall into any of these categories, it seems acceptable to focus on estimates for the most relevant use cases and neglect the rest.

[Figure: redeem-script types of P2SH spends in the year before block 626,267]

Pay-to-Script-Hash-Pay-to-Witness-Public-Key-Hash

For the most popular use case, P2SH-P2WPKH, the redeem script is OP_0 0x14 <hash>, with OP_0, a Bitcoin Script instruction used to represent the version of the witness program; 0x14, the length of the following 20-byte hash in bytes (using hexadecimal representation); and <hash>, a 20-byte HASH-160 of a public key. The witness-program version and the encoding of the length of the hash contribute one byte each. Together with the 20-byte hash, this results in a redeem-script size of 22 bytes. Since the unlocking script also includes a byte encoding the length of the 22-byte redeem script, the total unlocking-script size is 23 bytes.
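The nested structure can be sketched in a few lines of Python: the redeem script is itself pushed as a single item in the unlocking script. The all-zero hash is a placeholder for a real HASH-160, and the helper name is illustrative.

```python
OP_0 = 0x00


def push(data: bytes) -> bytes:
    """Direct push (1 to 75 bytes): one length byte followed by the data."""
    if not 1 <= len(data) <= 75:
        raise ValueError("direct pushes cover 1 to 75 bytes")
    return bytes([len(data)]) + data


PLACEHOLDER_HASH = bytes(20)                            # stands in for HASH-160(pubkey)
redeem_script = bytes([OP_0]) + push(PLACEHOLDER_HASH)  # OP_0 0x14 <hash>
unlocking_script = push(redeem_script)                  # <len> <redeem script>

print(len(redeem_script), len(unlocking_script))  # 22 23
print(unlocking_script[:3].hex())                 # 160014
```

The leading bytes 0x16 0x00 0x14 are simply the length of the redeem script (22) followed by its first two bytes.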

The witness for P2WPKH has the format 0x02 <len> <sig> <len> <pubkey>, with 0x02, the number of items contained in the witness (two in the case of P2WPKH: a signature and the corresponding public key); <len>, the size of the following signature in bytes; <sig>, a DER-encoded signature created using the private key from which the public key that is presented next in the witness was derived; <len>, the size of the following public key in bytes; and <pubkey>, the public key from which the HASH-160 used in the redeem script was derived. The average size of the witness is given by the byte indicating the number of items in the witness, the two bytes indicating the signature's and public key's lengths, the average signature size of 71.5 bytes, and the public key's size of 33 bytes. The witness-size estimate for P2WPKH is thus 107.5 bytes.

Pay-to-Script-Hash-Pay-to-Witness-Script-Hash-Multi-Signature

For P2SH-P2WSH-multisig the redeem script is OP_0 0x20 <hash>, with OP_0, a Bitcoin Script instruction used to represent the version of the witness program; 0x20, the length of the following 32-byte hash in bytes (using hexadecimal representation); and <hash>, a 32-byte SHA-256 hash of the witness script locking the output. The version of the witness program and the encoding of the length of the hash contribute one byte each. Together with the 32-byte hash, this results in a redeem-script size of 34 bytes. In addition to the redeem script, the unlocking script also contains an encoding of the length of the redeem script, which contributes one byte. The total unlocking-script size is thus 35 bytes.

The witness for a P2SH-P2WSH-multisig output is <nitems> <data> <len> <witness script>, with <nitems>, the number of items contained in the witness; <data>, some data to fulfill the conditions set by the witness script; <len>, the length of the following witness script; and <witness script>, the witness script that will be interpreted as locking script.

Empirical data shows that the 2-of-2 and 2-of-3 multisig variants account for more than 90% of all P2SH-P2WSH-multisig transactions, so witness-size estimates focus on these two cases.

In the case of 2-of-2 multisig, the witness script corresponds to the locking script for 2-of-2 multisig, which is OP_2 <len> <pubkey1> <len> <pubkey2> OP_2 OP_CHECKMULTISIG. The three Bitcoin Script instructions contribute one byte each; the two encodings of the lengths of the public keys contribute one byte each, too; finally, each of the public keys contributes 33 bytes. The witness-script size estimate is thus 71 bytes. To satisfy the witness script, two 71.5-byte signatures, an encoding of each signature's length, and an OP_0, amounting to a total of 146 bytes, must be supplied in the witness. Finally, the encodings of the number of items in the witness and the length of the witness script contribute one byte each. The total estimate for the size of the witness is thus 219 bytes.

In the case of 2-of-3 multisig, an additional 33-byte public key and an encoding of the key's length must be provided in the witness script, so the witness script's size increases from 71 bytes to 105 bytes. As before, two signatures are required to spend the funds, so the size of the witness's <data> part remains 146 bytes. Given one byte each for the encodings of the number of items in the witness and the length of the witness script, the 146-byte <data> part, and the 105-byte witness script itself, the total estimate for the size of the witness is 253 bytes.

Pay-to-Script-Hash-Multi-Signature

As was the case for P2SH-P2WSH-multisig, for P2SH-multisig the 2-of-2 and 2-of-3 use cases account for more than 90% of all transactions, so unlocking-script size estimates will be given only for these two cases.

For 2-of-2 P2SH-multisig, the redeem script is a 2-of-2 multisig locking script, whose size estimate was previously established to be 71 bytes. To satisfy the redeem script, two 71.5-byte signatures, an encoding of each signature's length, and an OP_0, amounting to a total of 146 bytes, must be supplied in the unlocking script. The total unlocking-script size thus corresponds to the sum of the 146-byte signature part, the 71-byte redeem script, and the byte for the encoding of the redeem script's length. The corresponding unlocking-script size estimate is thus 218 bytes.

For 2-of-3 P2SH-multisig, the redeem script is a 2-of-3 multisig locking script, whose size estimate was previously established to be 105 bytes. Since the redeem script is larger than 75 bytes, it can no longer be pushed with a single length byte; instead, OP_PUSHDATA1 followed by a one-byte length is used, so two bytes are required to encode its size. All other contributions remain the same, so the unlocking-script size for 2-of-3 P2SH-multisig corresponds to the sum of the 146-byte <data> part, the 105-byte redeem script, and the two bytes required to encode the redeem script's size. The corresponding unlocking-script estimate is thus 253 bytes.
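The push-size rule that distinguishes the two cases can be sketched in Python as follows. This is a simplified model assuming compressed keys and the 71.5-byte average signature; the function names are hypothetical, and pushes longer than 255 bytes (OP_PUSHDATA2 and OP_PUSHDATA4) are deliberately left out.

```python
def push_overhead(length: int) -> int:
    """Bytes preceding a pushed item of the given length in a legacy script."""
    if length <= 75:
        return 1  # the opcode 0x01..0x4b itself encodes the length
    if length <= 255:
        return 2  # OP_PUSHDATA1 (0x4c) followed by a one-byte length
    raise ValueError("longer pushes (OP_PUSHDATA2/4) are outside this sketch")


def multisig_redeem_size(n: int, key_size: int = 33) -> int:
    """OP_m, n pushes of a compressed key, OP_n, OP_CHECKMULTISIG."""
    return 1 + n * (1 + key_size) + 2


def p2sh_multisig_unlocking_size(m: int, n: int, avg_signature: float = 71.5) -> float:
    signatures = 1 + m * (1 + avg_signature)  # OP_0 dummy + m pushes of <len> <sig>
    redeem = multisig_redeem_size(n)
    return signatures + push_overhead(redeem) + redeem


for m, n in [(2, 2), (2, 3)]:
    redeem = multisig_redeem_size(n)
    print(f"{m}-of-{n}: redeem {redeem} B, push overhead {push_overhead(redeem)} B, "
          f"unlocking {p2sh_multisig_unlocking_size(m, n)} B")
# 2-of-2: redeem 71 B, push overhead 1 B, unlocking 218.0 B
# 2-of-3: redeem 105 B, push overhead 2 B, unlocking 253.0 B
```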

For previously discussed transaction types, the combined script size was given by the sum of the locking and unlocking script sizes. However, in case of nested Segwit transactions (i.e., P2SH-P2WPKH and P2SH-P2WSH), the witness' size must be included when determining the overall contribution of scripts and witnesses.

So, the script and witness size estimate for P2SH-P2WPKH transactions takes into account the locking script size of 23 bytes, the unlocking-script size of 23 bytes, and the witness size of 107.5 bytes. The corresponding estimate for P2SH-P2WPKH transactions is thus 153.5 bytes.

For P2SH-P2WSH-multisig transactions the locking and unlocking script sizes are 23 and 35 bytes, respectively. For 2-of-2 the witness size is 219 bytes; for 2-of-3 it is 253 bytes. The combined script and witness size estimates for 2-of-2 and 2-of-3 P2SH-P2WSH-multisig transactions are thus 277 and 311 bytes, respectively.

For P2SH-multisig transactions the locking script size is 23 bytes. The unlocking-script size for 2-of-2 P2SH-multisig is 218 bytes; for 2-of-3 it is 253 bytes. Hence, the combined script estimates for 2-of-2 and 2-of-3 P2SH-multisig transactions are 241 and 276 bytes, respectively.
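The combined estimates are plain sums of the component sizes derived above; a small Python table makes the bookkeeping explicit. The numbers are copied from the text, with a witness size of 0 for the non-segwit P2SH-multisig variants.

```python
# (locking script, unlocking script, witness) size estimates in bytes, as derived above.
ESTIMATES = {
    "P2SH-P2WPKH": (23, 23, 107.5),
    "2-of-2 P2SH-P2WSH-multisig": (23, 35, 219),
    "2-of-3 P2SH-P2WSH-multisig": (23, 35, 253),
    "2-of-2 P2SH-multisig": (23, 218, 0),  # no witness
    "2-of-3 P2SH-multisig": (23, 253, 0),  # no witness
}

totals = {name: sum(parts) for name, parts in ESTIMATES.items()}
for name, total in sorted(totals.items(), key=lambda item: item[1]):
    print(f"{total:>6} B  {name}")
```

Sorted by size, the totals come out as 153.5, 241, 276, 277, and 311 bytes.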

[Figure: histogram of combined script and witness sizes of P2SH transactions as of block 626,267]

All of the estimates are supported by empirical data. The figure above contains a histogram of the combined script (and witness) sizes of all P2SH transactions as of block 626,267. Going from left to right, the first cluster corresponds to the P2SH-P2WPKH estimate of 153.5 bytes; the next to the 2-of-2 P2SH-multisig estimate of 241 bytes; the third to the 2-of-3 P2SH-multisig estimate of 276 bytes; the fourth, which slightly overlaps the third, to the 2-of-2 P2SH-P2WSH-multisig estimate of 277 bytes; and finally, the fifth to the 2-of-3 P2SH-P2WSH-multisig estimate of 311 bytes.

Null Data

The locking-script format of Null Data outputs is OP_RETURN <len> <data> ... <len> <data>, with OP_RETURN, the Bitcoin Script instruction to indicate a Null Data output type; <len>, the length of the following data item; and <data>, arbitrary data.

The Bitcoin script instruction contributes one byte. The encoding of the length of the data contributes one byte if the data is at most 75 bytes and two bytes if the data is larger than 75 bytes. In Bitcoin Core 0.11, the size limitation of the data part was increased from 40 to 80 bytes. The total locking-script size is thus between one byte (just OP_RETURN and no data) and 83 bytes (OP_RETURN, two bytes to encode the length of the data, and 80 bytes of data). In light of this wide range, an analysis of empirical data is necessary to derive a meaningful estimate for the locking-script size.
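The range of Null Data locking-script sizes can be sketched in Python. As a simplifying assumption, the sketch models a single data push (outputs with several pushes pay one length encoding per push), and the 80-byte limit is the figure stated in the text rather than something the code checks against Bitcoin Core; the function name is illustrative.

```python
OP_RETURN = 0x6A
OP_PUSHDATA1 = 0x4C
MAX_DATA_BYTES = 80  # data limit as stated in the text (Bitcoin Core 0.11 and later)


def null_data_script(data: bytes) -> bytes:
    """OP_RETURN followed by a single push of `data` (empty data: OP_RETURN only)."""
    if len(data) > MAX_DATA_BYTES:
        raise ValueError("data exceeds the 80-byte limit")
    if not data:
        return bytes([OP_RETURN])
    if len(data) <= 75:
        return bytes([OP_RETURN, len(data)]) + data  # direct push: one length byte
    return bytes([OP_RETURN, OP_PUSHDATA1, len(data)]) + data  # two bytes for the length


for size in (0, 20, 75, 76, 80):
    print(size, len(null_data_script(bytes(size))))
# 0 1
# 20 22
# 75 77
# 76 79
# 80 83
```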

[Figure: histogram of Null Data locking-script sizes as of block 626,267]

The figure above contains a histogram of the locking-script sizes of all Null Data outputs as of block 626,267. The data reveals two dominant use cases, which together make up over 90% of all Null Data outputs. The first peak at 22 bytes corresponds to a data size of 20 bytes, conceivably instances of recording HASH-160 or RIPEMD-160 hashes in Null Data outputs. The second peak at 83 bytes represents a data size of 80 bytes, which is the upper limit for Null Data outputs. Possibly, these instances correspond to use cases where a large amount of data is split across multiple Null Data outputs, each storing as much data as possible. The data in the histogram indicates that the bulk-data use case is slightly more popular. Based on this empirical data, the estimate for the locking-script size is 53.6 bytes.

Null Data outputs are designed to be unspendable, so giving estimates for the unlocking-script size is pointless. Moreover, since there can be no Null Data transactions, giving combined script-size estimates is equally pointless.

Pay-to-Witness-Public-Key-Hash

The locking-script format of Pay-to-Witness-Public-Key-Hash (P2WPKH) outputs is OP_0 0x14 <hash>, with OP_0, a Bitcoin Script instruction used to represent the version of the witness program; 0x14, the length of the following 20-byte hash in bytes (using hexadecimal representation); and <hash>, a 20-byte HASH-160 of a public key.

The locking-script size of P2WPKH outputs is always 22 bytes: the 20-byte hash as well as one byte each for the Bitcoin script instruction and the encoding of the length of the hash.

The witness for a P2WPKH output is 0x02 <len> <sig> <len> <pubkey>, with 0x02, the number of items contained in the witness (two in the case of P2WPKH: a signature and the corresponding public key); <len>, the size of the following signature in bytes; <sig>, a signature created using the private key from which the public key that is presented next in the witness was derived; <len>, the size of the following public key in bytes; and <pubkey>, the public key from which the HASH-160 used in the locking script was derived.

The encodings of the number of items in the witness and of the lengths of the signature and the public key contribute one byte each. Together with a 71.5-byte signature and a 33-byte public key, this results in a witness-size estimate of 107.5 bytes.

The overall contribution of scripts and witnesses of P2WPKH transactions is the sum of the locking-script and witness sizes, which are 22 and 107.5 bytes, respectively. The script-and-witness estimate for P2WPKH transactions is thus 129.5 bytes.
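The witness serialization can be sketched in Python: a CompactSize item count followed by each item with a CompactSize length prefix. The signatures and the public key are all-zero placeholders of the right lengths (71 bytes for a low-r and 72 bytes for a high-r signature, sighash byte included), and the helper names are illustrative.

```python
def compact_size(n: int) -> bytes:
    """Bitcoin's CompactSize encoding: a single byte for values below 0xFD."""
    if n < 0xFD:
        return bytes([n])
    if n <= 0xFFFF:
        return b"\xfd" + n.to_bytes(2, "little")
    if n <= 0xFFFFFFFF:
        return b"\xfe" + n.to_bytes(4, "little")
    return b"\xff" + n.to_bytes(8, "little")


def serialize_witness(items: list[bytes]) -> bytes:
    """Witness of one input: item count, then each item prefixed with its length."""
    out = compact_size(len(items))
    for item in items:
        out += compact_size(len(item)) + item
    return out


PLACEHOLDER_PUBKEY = b"\x02" + bytes(32)  # 33 bytes, not a real key
LOCKING_SCRIPT_SIZE = 22                  # OP_0 0x14 <20-byte hash>

# Placeholder signatures: 71 bytes (low r) and 72 bytes (high r).
witness_sizes = [
    len(serialize_witness([bytes(sig_len), PLACEHOLDER_PUBKEY])) for sig_len in (71, 72)
]
average_witness = sum(witness_sizes) / len(witness_sizes)
print(witness_sizes, average_witness, LOCKING_SCRIPT_SIZE + average_witness)
# [107, 108] 107.5 129.5
```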

[Figure: histogram of combined script and witness sizes of P2WPKH transactions as of block 626,267]

The 129.5-byte estimate is corroborated by empirical data: the figure above, which contains a histogram of the combined script and witness sizes of all P2WPKH transactions as of block 626,267, shows that more than 99% of transactions have a combined script and witness size of either 129 or 130 bytes.

Pay-to-Witness-Script-Hash

The locking-script format of Pay-to-Witness-Script-Hash (P2WSH) outputs is OP_0 0x20 <hash>, with OP_0, a Bitcoin Script instruction used to represent the version of the witness program; 0x20, the length of the following 32-byte hash in bytes (using hexadecimal representation); and <hash>, a 32-byte SHA-256 hash of the witness script locking the output.

The locking-script size of P2WSH outputs is always 34 bytes: the 32-byte hash as well as one byte each for the Bitcoin script instruction and the encoding of the length of the hash.

The general witness format of P2WSH inputs is <nitems> <data> <len> <witness script>, with <nitems>, the number of items contained in the witness; <data>, some data to fulfill the conditions set by the witness script; <len>, the length of the following witness script; and <witness script>, the witness script that will be interpreted as the locking script.

In theory, witness scripts can set arbitrary spending conditions, so the size of witness scripts is variable. Likewise, the size of the data needed to satisfy a witness script varies significantly with the conditions it sets. Giving general estimates for the size of P2WSH witnesses therefore seems impractical. However, as the data in the figure below shows, most witness scripts in practice impose multisig spending conditions.

[Figure: Witness-script types used in P2WSH transactions]

Moreover, only three use cases account for more than 98% of all P2WSH-multisig transactions, namely the 1-of-1 (20%), 2-of-2 (17%), and 2-of-3 (62%) multisig variants. Taking this into account, it seems acceptable to focus on estimates for these relevant use cases and neglect the rest.

For 1-of-1 P2WSH-multisig, the witness script corresponds to a 1-of-1 multisig locking script: OP_1 <len> <pubkey> OP_1 OP_CHECKMULTISIG. The three Bitcoin Script instructions contribute one byte each, as does the encoding of the public key's length, and the public key itself contributes 33 bytes. The witness-script size estimate is thus 37 bytes.

To satisfy the witness script, the witness must supply a 71.5-byte signature and the encoding of the signature's length, preceded by an OP_0. The OP_0 is an empty item required because OP_CHECKMULTISIG pops one more stack element than it uses. The data part therefore contributes 73.5 bytes.

Finally, the encodings of the number of items in the witness and of the length of the witness script contribute one byte each. The corresponding witness-size estimate is thus 112.5 bytes.
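The following Python sketch constructs the 1-of-1 witness script byte by byte and repeats the witness arithmetic. The public key is a hypothetical dummy value, and the 71.5-byte signature is the article's average. The opcode values used are OP_1 = 0x51 and OP_CHECKMULTISIG = 0xAE.

```python
OP_1, OP_CHECKMULTISIG = 0x51, 0xAE

def multisig_script(m: int, pubkeys: list[bytes]) -> bytes:
    """OP_m <len><pubkey> ... OP_n OP_CHECKMULTISIG (1 <= m <= n <= 16)."""
    n = len(pubkeys)
    assert 1 <= m <= n <= 16
    script = bytes([OP_1 + m - 1])          # OP_1..OP_16 are 0x51..0x60
    for pk in pubkeys:
        script += bytes([len(pk)]) + pk     # direct push: 1 length byte + key
    return script + bytes([OP_1 + n - 1, OP_CHECKMULTISIG])

pubkey = b"\x02" + bytes(32)                # dummy 33-byte compressed key
ws = multisig_script(1, [pubkey])
print(len(ws))                              # 37

# <nitems> <empty item (OP_0)> <len><sig> <len><witness script>
witness_estimate = 1 + 1 + (1 + 71.5) + (1 + len(ws))
print(witness_estimate)                     # 112.5
```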

For 2-of-2 P2WSH-multisig, the witness script contains a second public key and the encoding of its length, so the witness-script size increases by 34 bytes to 71 bytes. To satisfy the witness script, the witness must also supply a second 71.5-byte signature and the encoding of its length. This increases the contribution of the data part by 72.5 bytes to 146 bytes. The two bytes for the encodings of the number of items in the witness and of the length of the witness script remain unchanged. The corresponding witness-size estimate is therefore 219 bytes.

For 2-of-3 P2WSH-multisig, the witness script contains a third public key and the encoding of the key's length, so the witness-script size is increased by another 34 bytes to 105 bytes. All other contributions remain, so the corresponding witness-size estimate is 253 bytes.

For all P2WSH-multisig variants, the overall script and witness contribution is the sum of the locking-script and witness sizes. The locking script is always 34 bytes. The witness is 112.5 bytes for 1-of-1, 219 bytes for 2-of-2, and 253 bytes for 2-of-3 P2WSH-multisig. The combined script and witness size estimate is thus 146.5 bytes for 1-of-1, 253 bytes for 2-of-2, and 287 bytes for 2-of-3 P2WSH-multisig.
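The three P2WSH-multisig estimates follow one formula in m and n. The Python sketch below implements that formula, assuming the article's 71.5-byte average signature, 33-byte compressed public keys, and one-byte CompactSize encodings. The last assumption holds for witness scripts shorter than 253 bytes, which covers the three variants discussed here.

```python
LOCKING_SCRIPT = 34   # OP_0 0x20 <32-byte hash>
SIG = 71.5            # average signature size used in this article
PUBKEY = 33           # compressed public key

def witness_script_size(n: int) -> int:
    # OP_m + n * (length byte + key) + OP_n + OP_CHECKMULTISIG
    return 3 + n * (1 + PUBKEY)

def witness_size(m: int, n: int) -> float:
    ws = witness_script_size(n)
    assert ws < 0xFD  # one-byte CompactSize length assumed
    # nitems + empty dummy item + m signatures with lengths + script length + script
    return 1 + 1 + m * (1 + SIG) + 1 + ws

for m, n in [(1, 1), (2, 2), (2, 3)]:
    w = witness_size(m, n)
    print(f"{m}-of-{n}: witness {w} B, combined {LOCKING_SCRIPT + w} B")
```

The loop reproduces 112.5/146.5, 219/253, and 253/287 bytes. Each additional key adds 34 bytes, and each additional required signature adds 72.5 bytes.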

[Figure: Histogram of combined script and witness sizes of P2WSH transactions as of block 626,267]

Empirical data supports all of these estimates. The figure above contains a histogram of the combined script and witness sizes of all P2WSH transactions as of block 626,267. Going from left to right, the first cluster corresponds to the 1-of-1 P2WSH-multisig estimate of 146.5 bytes; the second to the 2-of-2 P2WSH-multisig estimate of 253 bytes; and finally, the third to the 2-of-3 P2WSH-multisig estimate of 287 bytes.

Results and Conclusion

The sizes of scripts and witnesses for all transaction output types (as of May 2020) were investigated using first-principles analysis and empirical data. Based on this analysis, estimates were established for all relevant use cases, and each estimate was checked against empirical data. The following table summarizes the findings. For all relevant use cases, it lists the locking-script, unlocking-script, and witness sizes. It also gives the combined script (and witness) size in bytes, the total weight in weight units (WU), and the virtual size (vsize) in virtual bytes (vB).

| Type | Locking script | Unlocking script | Witness | Total size | Total weight | Total vsize |
|---|---|---|---|---|---|---|
| P2PK | 35 B | 72.5 B | | 107.5 B | 430 WU | 107.5 vB |
| P2PKH | 25 B | 106.5 B | | 131.5 B | 526 WU | 131.5 vB |
| P2WPKH (native segwit) | 22 B | | 107.5 B | 129.5 B | 195.5 WU | 48.875 vB |
| P2WSH-1-of-1-multisig (native segwit) | 34 B | | 112.5 B | 146.5 B | 248.5 WU | 62.125 vB |
| P2WSH-2-of-2-multisig (native segwit) | 34 B | | 219 B | 253 B | 355 WU | 88.75 vB |
| P2WSH-2-of-3-multisig (native segwit) | 34 B | | 253 B | 287 B | 389 WU | 97.25 vB |
| P2SH-2-of-2-multisig | 23 B | 218 B | | 241 B | 964 WU | 241 vB |
| P2SH-2-of-3-multisig | 23 B | 253 B | | 276 B | 1104 WU | 276 vB |
| P2SH-P2WPKH (nested segwit) | 23 B | 23 B | 107.5 B | 153.5 B | 291.5 WU | 72.875 vB |
| P2SH-P2WSH-2-of-2-multisig (nested segwit) | 23 B | 35 B | 219 B | 277 B | 451 WU | 112.75 vB |
| P2SH-P2WSH-2-of-3-multisig (nested segwit) | 23 B | 35 B | 253 B | 311 B | 485 WU | 121.25 vB |
| Null Data | 53.6 B | | | | | |
| Bare multisig (1-of-2) | 71 B | 73.5 B | | 144.5 B | 578 WU | 144.5 vB |
| Bare multisig (1-of-3) | 105 B | 73.5 B | | 178.5 B | 714 WU | 178.5 vB |
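The weight and vsize columns follow from the BIP 141 rule: non-witness bytes count four times, and witness bytes count once. The minimal Python sketch below applies that rule to a few rows of the table above. BIP 141 rounds vsize up to an integer for whole transactions, but this sketch keeps fractional values because the figures are per-output estimates.

```python
def weight(non_witness: float, witness: float) -> float:
    # BIP 141: non-witness bytes count four times, witness bytes once.
    return 4 * non_witness + witness

def vsize(non_witness: float, witness: float) -> float:
    # Virtual size is weight / 4 (left unrounded for per-output estimates).
    return weight(non_witness, witness) / 4

# (locking script + unlocking script, witness) in bytes
rows = {
    "P2PKH": (25 + 106.5, 0),
    "P2WPKH": (22, 107.5),
    "P2WSH-2-of-3-multisig": (34, 253),
    "P2SH-P2WPKH": (23 + 23, 107.5),
}
for name, (base, wit) in rows.items():
    print(f"{name}: {weight(base, wit)} WU, {vsize(base, wit)} vB")
```

For non-segwit types, the witness is empty, so vsize equals the total size in bytes. For segwit types, only the non-witness part is scaled by four, which is where the discount comes from.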

This table represents the essence of the findings of the previous investigation. It is meant to serve as a starting point for developing quantitative models whose projections might provide some objectivity to future discussions about various Bitcoin improvements.

If you found the information in this article useful, feel free to contribute: 16pGpaoAhzoneLdRdxPSo9xAAPhzWnP2dA. If you have scientific, Bitcoin-related freelance work, let me know.