Data Set Memory Sizer

Use the table below to calculate how much memory a data set requires by entering the number of columns for each data type.

Because numeric data types have a fixed size, calculating their memory requirements is simply a matter of entering the number of columns for each byte size.
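
As a rough illustration, the fixed-size calculation can be sketched in Python. This is a minimal sketch that assumes the estimate is the per-row byte total multiplied by the number of rows, with no per-row or per-column overhead. The column counts, the row count, and the function name `numeric_bytes_per_row` are hypothetical.

```python
# Hypothetical inputs: byte size -> number of columns stored with that size.
numeric_columns = {1: 2, 2: 0, 4: 3, 8: 5}
rows = 1_000_000  # hypothetical row count

def numeric_bytes_per_row(columns_by_size):
    """Sum (byte size x column count) over every fixed-size group.

    Assumes no per-row or per-column overhead, which real tools usually add.
    """
    return sum(size * count for size, count in columns_by_size.items())

per_row = numeric_bytes_per_row(numeric_columns)  # 1*2 + 4*3 + 8*5 = 54
total = per_row * rows
print(per_row, total)  # prints 54 54000000
```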

However, text columns can vary in the number of characters they contain, and they can use different numbers of bytes per character depending on how the characters are encoded. The character encoding determines how many distinct characters can be represented. For example, plain ASCII defines only 128 characters (8-bit extended variants allow 256), so it needs just one byte per character. Unicode encodings support many additional symbols and characters from other languages, so they can use up to four bytes per character; UTF-8, for instance, uses one to four bytes depending on the character. For character-based columns, therefore, first determine the byte size of the encoding in use, and then determine the average number of characters per row for each byte size.
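
Once the bytes per character are known, the text calculation works the same way as the numeric one. The sketch below assumes each byte-size group is described by the average number of characters per row stored at that size. The helper names `text_bytes_per_row` and `average_chars` and the sample values are hypothetical. The final loop uses Python's built-in `str.encode` to show that UTF-8 is variable width, so measuring encoded samples is one way to estimate an average byte count for UTF-8 columns.

```python
def text_bytes_per_row(avg_chars_by_size):
    """Map bytes-per-character -> average characters per row at that size,
    and return the estimated text bytes per row (no overhead included)."""
    return sum(size * avg_chars for size, avg_chars in avg_chars_by_size.items())

def average_chars(values):
    """Average number of characters in a sample of column values."""
    return sum(len(v) for v in values) / len(values)

# Hypothetical sample of one single-byte (e.g. ASCII) text column.
names = ["Ann", "Roberto", "Li", "Kathryn"]
avg = average_chars(names)  # (3 + 7 + 2 + 7) / 4 = 4.75

# Hypothetical: the ASCII column above plus 10 characters per row at 4 bytes each.
print(text_bytes_per_row({1: avg, 4: 10}))  # 4.75*1 + 10*4 -> prints 44.75

# Variable width: UTF-8 uses 1 to 4 bytes per character, UTF-32 always uses 4.
# Each line prints: string, characters, UTF-8 bytes, UTF-32 bytes.
for s in ["data", "café", "日本"]:
    print(s, len(s), len(s.encode("utf-8")), len(s.encode("utf-32-le")))
```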




* A bit takes up only 1/8 of a byte, but storing it on its own still requires at least one whole byte of memory. However, where a data set has multiple single-bit columns, some applications and programming languages can pack up to eight bit values into a single byte to conserve memory. Check the "Condense" button to indicate whether your application can pack single-bit columns together instead of storing each in its own byte.
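
The effect of the "Condense" option can be expressed as a small function. This sketch assumes condensing packs bit columns eight to a byte, with any remainder rounded up to a full byte. `bit_column_bytes` and `pack_bits` are hypothetical helpers, and real applications may pack bits differently (for example, in a different bit order).

```python
def bit_column_bytes(bit_columns, condense):
    """Bytes per row needed for single-bit columns.

    Without condensing, each bit column occupies a whole byte.
    With condensing, up to 8 bit columns share one byte (rounded up).
    """
    if condense:
        return (bit_columns + 7) // 8  # integer ceiling of bit_columns / 8
    return bit_columns

def pack_bits(bits):
    """Pack a list of 0/1 values into bytes, 8 per byte.

    The first value goes in the most significant bit; a short final
    chunk is padded with zeros on the right.
    """
    out = bytearray()
    for start in range(0, len(bits), 8):
        chunk = bits[start:start + 8]
        byte = 0
        for bit in chunk:
            byte = (byte << 1) | bit
        byte <<= 8 - len(chunk)  # left-align a partial final byte
        out.append(byte)
    return bytes(out)

print(bit_column_bytes(10, condense=False))  # prints 10
print(bit_column_bytes(10, condense=True))   # prints 2
print(list(pack_bits([1, 0, 1, 1, 0, 0, 0, 0, 1, 1])))  # prints [176, 192]
```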

**This table is based on byte size, not data type. The data types listed above are provided only as common examples. Different applications and programming languages may store a given data type using a different number of bytes, and newer versions of an application may change the number of bytes they use for a given data type. Always check your application's documentation for the exact byte size of each data type it uses. For example, different C compilers and target platforms store the int data type using either 16 or 32 bits. Similarly, R stores each logical (TRUE/FALSE) value in a vector using 4 bytes, even though it carries only one bit of information, and its default numeric type uses 8 bytes per value.
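
One way to check byte sizes directly, rather than relying on a table, is to ask the runtime. The sketch below uses Python's standard `ctypes` and `struct` modules to report the sizes of C types as compiled for the current platform. Results vary by platform (C `long` is typically 8 bytes on 64-bit Linux but 4 bytes on 64-bit Windows), so treat the output as specific to the machine that runs it, and treat your own tool's documentation as the final word.

```python
import ctypes
import struct

# Native sizes of common C types on the platform running this script.
c_types = {
    "char": ctypes.c_char,
    "short": ctypes.c_short,
    "int": ctypes.c_int,
    "long": ctypes.c_long,
    "long long": ctypes.c_longlong,
    "float": ctypes.c_float,
    "double": ctypes.c_double,
}
for name, ctype in c_types.items():
    print(f"C {name}: {ctypes.sizeof(ctype)} bytes")

# struct's "=" prefix uses fixed standard sizes; "@" uses native sizes,
# so the two can disagree (e.g. for "l" on 64-bit Linux).
print("standard long:", struct.calcsize("=l"), "native long:", struct.calcsize("@l"))
```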

To recreate a customized version of this table yourself, start with the following basic formulas and modify as desired: