In this article you will learn what the CTS is, along with primitive types, value types, reference types, boxing, unboxing, and blittable and non-blittable types, and how they affect performance.
In Microsoft’s .NET Framework, the Common Type System (CTS) is a standard that specifies how Type definitions and specific values of Types are represented in computer memory. It is intended to allow programs written in different programming languages to easily share information. The CTS specifies no particular syntax or keywords, but instead defines a common set of types that can be used with many different language syntaxes.
For example, the CTS defines System.Int32, a 4-byte integer, and C# maps its keywords onto CTS types:
int -> System.Int32
string -> System.String
object -> System.Object
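The alias relationship can be checked directly. The following is a minimal sketch; the class name AliasDemo and its helper methods are made up for illustration and are not part of any library.

```csharp
using System;

// Hypothetical demo class: shows that C# keywords are aliases for CTS types.
public static class AliasDemo
{
    // Each helper is true when the keyword and the CTS name denote the same type.
    public static bool IntIsInt32() => typeof(int) == typeof(System.Int32);
    public static bool StringIsSystemString() => typeof(string) == typeof(System.String);
    public static bool ObjectIsSystemObject() => typeof(object) == typeof(System.Object);

    public static void Main()
    {
        Console.WriteLine(IntIsInt32());            // True
        Console.WriteLine(StringIsSystemString());  // True
        Console.WriteLine(ObjectIsSystemObject());  // True
        Console.WriteLine(typeof(int).FullName);    // System.Int32
    }
}
```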
The specification for the CTS is contained in Ecma standard 335, “Common Language Infrastructure (CLI).” The CLI and the CTS were created by Microsoft, and the Microsoft .NET framework is an implementation of the standard.
Functions of CTS
- To establish a framework that helps enable cross-language integration, type safety, and high performance code execution.
- To provide an object-oriented model that supports the complete implementation of many programming languages.
- To define rules that languages must follow, which helps ensure that objects written in different languages can interact with each other.
- The CTS also defines the rules that ensure that the data types of objects written in various languages are able to interact with each other.
- Languages supported by .NET can implement all or some common data types.
Primitive Types
Certain data types are used so commonly that many compilers allow your code to manipulate them using simplified syntax. For example, you could allocate an integer using the following syntax in C#:
System.Int32 a = new System.Int32();
But I’m sure you’ll agree that declaring and initializing an integer using this syntax is rather cumbersome. Fortunately, many compilers (including C#) allow you to use syntax similar to the following instead:
int a = 0;
This certainly makes the code more readable. And, of course, the intermediate language (IL) that is generated when using either syntax is identical. Any data types directly supported by the compiler are called primitive types. Primitive types map directly to types that exist in the base class library. For example, in C# an int maps directly to the System.Int32 type.
C# Primitive Type | BCL Type | Description |
sbyte | System.SByte | Signed 8-bit value |
byte | System.Byte | Unsigned 8-bit value |
short | System.Int16 | Signed 16-bit value |
ushort | System.UInt16 | Unsigned 16-bit value |
int | System.Int32 | Signed 32-bit value |
uint | System.UInt32 | Unsigned 32-bit value |
long | System.Int64 | Signed 64-bit value |
ulong | System.UInt64 | Unsigned 64-bit value |
char | System.Char | 16-bit Unicode character |
float | System.Single | IEEE 32-bit float |
double | System.Double | IEEE 64-bit float |
bool | System.Boolean | A True/False value |
decimal | System.Decimal | 128-bit value: a 96-bit integer with a sign, divided by a power of 10 from 10^0 through 10^28 (common for financial calculations where rounding errors can’t be tolerated) |
string | System.String | String type |
object | System.Object | Base of all types |
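The sizes in the table can be confirmed with the sizeof operator, which is a compile-time constant for the built-in types. This is a small sketch only; the PrimitiveSizes class and its Table field are illustrative names, not part of the BCL.

```csharp
using System;

// Hypothetical demo class: lists C# keywords, their BCL types, and their sizes.
public static class PrimitiveSizes
{
    // One row per built-in value type; Bytes comes from the sizeof operator.
    public static readonly (string Keyword, Type BclType, int Bytes)[] Table =
    {
        ("sbyte",   typeof(sbyte),   sizeof(sbyte)),
        ("byte",    typeof(byte),    sizeof(byte)),
        ("short",   typeof(short),   sizeof(short)),
        ("ushort",  typeof(ushort),  sizeof(ushort)),
        ("int",     typeof(int),     sizeof(int)),
        ("uint",    typeof(uint),    sizeof(uint)),
        ("long",    typeof(long),    sizeof(long)),
        ("ulong",   typeof(ulong),   sizeof(ulong)),
        ("char",    typeof(char),    sizeof(char)),
        ("float",   typeof(float),   sizeof(float)),
        ("double",  typeof(double),  sizeof(double)),
        ("bool",    typeof(bool),    sizeof(bool)),
        ("decimal", typeof(decimal), sizeof(decimal)),
    };

    public static void Main()
    {
        foreach (var row in Table)
        {
            // Example line: "int      System.Int32     4 byte(s)"
            Console.WriteLine($"{row.Keyword,-8} {row.BclType.FullName,-16} {row.Bytes} byte(s)");
        }
    }
}
```

string and object are left out because they are reference types and sizeof does not apply to them.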
Reference and Value Types
The common type system supports two general categories of types: Value Type (lightweight types) & Reference Type.
Reference types : When an object is allocated from the managed heap, the new operator returns the memory address of the object. You usually store this address in a variable. This is called a reference type variable because the variable does not actually contain the object’s bits; instead, the variable refers to the object’s bits.
There are some performance issues to consider when working with reference types. First, the memory must be allocated from the managed heap, which could force a garbage collection to occur. Second, reference types are always accessed via their pointers. So every time your code references any member of an object on the heap, code must be generated and executed to dereference the pointer in order to perform the desired action. This adversely affects both size and speed. Reference types can be self-describing types, pointer types, or interface types. The type of a reference type can be determined from values of self-describing types. Self-describing types are further split into arrays and class types. The class types are user-defined classes, boxed value types, and delegates.
For example:
// Reference Type (because of ‘class’)
class RectRef { public int x, y, cx, cy; }
Value types : Unboxed value type instances are not usually allocated on the garbage-collected heap, and the variable representing the instance does not contain a pointer to an object; the variable contains the instance itself. Since the variable contains the instance, a pointer does not have to be dereferenced in order to manipulate it, which improves performance. Value types are either allocated on the stack or allocated inline in a containing structure or object. Value types can be built-in (implemented by the runtime), user-defined, or enumerations.
For example:
// Value type (because of ‘struct’)
struct RectVal { public int x, y, cx, cy; }
RectRef rr1 = new RectRef(); // Allocated in heap
RectVal rv1 = new RectVal(); // Allocated on stack, all fields zeroed
rr1.x = 10; // Pointer dereference
rv1.x = 10; // Changed on stack
RectRef rr2 = rr1; // Copies pointer only
RectVal rv2 = rv1; // Allocate on stack & copies members
rr1.x = 20; // Changes rr1 and rr2
rv1.x = 20; // Changes rv1, not rv2
The RectVal type is declared using struct instead of the more common class. In C#, a type declared using struct is a value type, while types declared using class are reference types.
When a type fits the criteria below, prefer a value type over a reference type, because it avoids a heap allocation and pointer dereferences. In particular, you should declare a type as a value type if all of the following are true:
- The type acts like a primitive type.
- The type doesn’t need to inherit from any other type.
- The type will not have any other types derived from it.
- Objects of the type are not frequently passed as method arguments since this would cause frequent memory copy operations, hurting performance. The next section on boxing and unboxing will explain this in more detail.
The main advantage of value types is that they are not allocated in the managed heap. Of course, value types have several limitations compared with reference types. Here are some of the ways in which value types and reference types differ.
Value type objects have two representations: an unboxed form and a boxed form. Reference types are always in a boxed form.
Value types are implicitly derived from System.ValueType. This type offers the same methods as defined by System.Object. However, System.ValueType overrides the Equals method so that it returns true if the values of the two objects’ instance fields match. In addition, System.ValueType overrides the GetHashCode method so that it produces a hash code value using an algorithm that takes into account the values in the objects’ instance fields. When defining your own value types, it is highly recommended that you override and provide explicit implementations for the Equals and GetHashCode methods.
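One way to follow the recommendation about Equals and GetHashCode is sketched below. The type name Point2D and the hash formula are assumptions chosen for illustration; any formula that combines the fields consistently would do.

```csharp
using System;

// Hypothetical value type with explicit equality, avoiding the default
// System.ValueType implementation.
public struct Point2D : IEquatable<Point2D>
{
    public int X;
    public int Y;

    public Point2D(int x, int y) { X = x; Y = y; }

    // Strongly typed comparison: no boxing of the argument.
    public bool Equals(Point2D other) => X == other.X && Y == other.Y;

    // Object-based comparison delegates to the typed version.
    public override bool Equals(object obj) => obj is Point2D other && Equals(other);

    // Equal values must produce equal hash codes.
    public override int GetHashCode() => unchecked((X * 397) ^ Y);
}

public static class EqualityDemo
{
    public static void Main()
    {
        var a = new Point2D(1, 2);
        var b = new Point2D(1, 2);
        Console.WriteLine(a.Equals(b));                         // True
        Console.WriteLine(a.GetHashCode() == b.GetHashCode());  // True
        Console.WriteLine(a.Equals(new Point2D(2, 1)));         // False
    }
}
```

Implementing IEquatable<T> is a design choice here: it lets callers compare two values without boxing either of them.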
Since you cannot declare a new value type or a new reference type using a value type as a base class, value types should not have virtual functions, cannot be abstract, and are implicitly sealed (a sealed type cannot be used as the base of a new type).
Reference type variables contain the memory address of objects in the heap. By default, when a reference type variable is created, it is initialized to null, indicating that the reference type variable doesn’t currently point to a valid object. Attempting to use a null reference type variable causes a NullReferenceException exception. By contrast, value type variables always contain a value of the underlying type. By default, all members of the value type are initialized to zero. It is not possible to generate a NullReferenceException exception when accessing a value type.
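The difference in default values can be seen in a short sketch. RefBox, ValBox and DefaultsDemo are hypothetical names used only for this illustration.

```csharp
using System;

public class RefBox { public int Value; }   // reference type
public struct ValBox { public int Value; }  // value type

// Hypothetical demo class comparing default values.
public static class DefaultsDemo
{
    // The default of a reference type is null.
    public static bool DefaultReferenceIsNull() => default(RefBox) == null;

    // The default of a value type has every field set to zero.
    public static int DefaultValueField() => default(ValBox).Value;

    // Reading a field through a null reference throws NullReferenceException.
    public static bool ReadingThroughNullThrows()
    {
        RefBox r = null;
        try
        {
            Console.WriteLine(r.Value); // never prints
            return false;
        }
        catch (NullReferenceException)
        {
            return true;
        }
    }

    public static void Main()
    {
        Console.WriteLine(DefaultReferenceIsNull());    // True
        Console.WriteLine(DefaultValueField());         // 0
        Console.WriteLine(ReadingThroughNullThrows());  // True
    }
}
```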
When you assign a value type variable to another value type variable, a copy of the value is made. When you assign a reference type variable to another reference type variable, only the memory address is copied. Because of the previous point, two or more reference type variables may refer to a single object in the heap. This allows operations on one variable to affect the object referenced by the other variable. On the other hand, value type variables each have their own copy of the object’s data, and it is not possible for operations on one value type variable to affect another.
There are rare situations when the runtime must initialize a value type and is unable to call its default constructor. For example, this can happen when a thread local value type must be allocated and initialized when an unmanaged thread first executes managed code. In this situation, the runtime can’t call the type’s constructor but still ensures that all members are initialized to zero or null. For this reason, it is recommended that you don’t define a parameterless constructor on a value type. In fact, versions of the C# compiler before C# 10 consider this an error and won’t compile the code. This problem is rare, and it never occurs on reference types. There are no restrictions on parameterized constructors for either value types or reference types.
Since unboxed value types are not allocated on the heap, the storage allocated for them is freed as soon as the method that defines an instance of the type is no longer active. This also means that unboxed value type objects cannot receive a notification when their memory is reclaimed. However, a boxed value type will have its Finalize method called when it is garbage-collected. You are strongly discouraged from implementing a value type with a Finalize method. C# does not allow a finalizer on a struct and will not compile the source code.
Boxing and Unboxing
There are many situations in which it is convenient to treat a value type as a reference type. Let’s say that you wanted to create an ArrayList object (a type defined in the System.Collections namespace) to hold a set of Points. The code might look like this:
// Requires: using System.Collections;
// Declare a value type
struct Point {
    public int x, y;
}
ArrayList a = new ArrayList();
for (int i = 0; i < 10; i++) {
    Point p;        // Allocate a Point (not in the heap)
    p.x = p.y = i;  // Initialize the members in the value type
    a.Add(p);       // Box the value type and add the
                    // reference to the ArrayList
}
When the Add method is called, memory is allocated in the heap for a Point object. The members currently residing in the Point value type (p) are copied into the newly allocated Point object. The address of the Point object (a reference type) is returned and is then passed to the Add method. The Point object will remain in the heap until it is garbage-collected. The Point value type variable (p) can be reused or freed since the ArrayList never knows anything about it. Boxing enables a unified view of the type system, where a value of any type can ultimately be treated as an object.
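Because boxing copies the value into a new heap object, later changes to the original variable do not reach the boxed copy. The sketch below illustrates this under that assumption; Pt and BoxingCopyDemo are hypothetical names.

```csharp
using System;

public struct Pt { public int x, y; }

// Hypothetical demo class: shows that a box is an independent copy.
public static class BoxingCopyDemo
{
    // Boxes p, then mutates p; the boxed copy keeps the old value.
    public static (int Boxed, int Original) BoxThenMutate()
    {
        Pt p;
        p.x = p.y = 1;
        object boxed = p;  // box: fields are copied into a new heap object
        p.x = 99;          // changes only the local variable
        return (((Pt)boxed).x, p.x);
    }

    // Boxing the same value twice yields two separate objects.
    public static bool TwoBoxesAreDistinct()
    {
        int v = 5;
        object first = v;
        object second = v;
        return !ReferenceEquals(first, second) && first.Equals(second);
    }

    public static void Main()
    {
        var result = BoxThenMutate();
        Console.WriteLine(result.Boxed);           // 1
        Console.WriteLine(result.Original);        // 99
        Console.WriteLine(TwoBoxesAreDistinct());  // True
    }
}
```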
The opposite of boxing is, of course, unboxing. Unboxing retrieves a reference to the value type (data fields) contained within an object. Internally, the following is what happens when a reference type is unboxed:
- The common language runtime first checks that the reference type variable is not null; if it is null, a NullReferenceException is generated. It then checks that the variable refers to an object that is a boxed value of the desired value type; if it does not, an InvalidCastException is generated.
- If the types do match, then a pointer to the value type contained inside the object is returned. The value type that this pointer refers to does not include the usual overhead associated with a true object: a pointer to a virtual method table and a sync block.
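The checks performed during unboxing can be observed with a small sketch. UnboxDemo and TryUnboxToInt are hypothetical names; note that in practice a null reference produces a NullReferenceException, while a boxed value of the wrong type produces an InvalidCastException.

```csharp
using System;

// Hypothetical demo class: reports what happens when unboxing to int.
public static class UnboxDemo
{
    public static string TryUnboxToInt(object o)
    {
        try
        {
            int value = (int)o;  // unbox, then copy the value
            return "ok: " + value;
        }
        catch (InvalidCastException)
        {
            return "InvalidCastException";
        }
        catch (NullReferenceException)
        {
            return "NullReferenceException";
        }
    }

    public static void Main()
    {
        Console.WriteLine(TryUnboxToInt(5));     // ok: 5
        Console.WriteLine(TryUnboxToInt(5L));    // InvalidCastException (boxed long, not int)
        Console.WriteLine(TryUnboxToInt("5"));   // InvalidCastException (not a boxed value)
        Console.WriteLine(TryUnboxToInt(null));  // NullReferenceException
    }
}
```

The boxed long case is worth noting: unboxing requires the exact value type, even when a numeric conversion between the two types exists.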
Note that boxing always creates a new object and copies the unboxed value’s bits to the object. On the other hand, unboxing simply returns a pointer to the data within a boxed object: no memory copy occurs. However, it is commonly the case that your code will cause the data pointed to by the unboxed reference to be copied anyway.
The following code demonstrates boxing and unboxing:
public static void Main() {
    Int32 v = 5;   // Create an unboxed value type variable
    Object o = v;  // o refers to a boxed version of v
    v = 123;       // Changes the unboxed value to 123
    Console.WriteLine(v + ", " + (Int32) o); // Displays "123, 5"
}
First, an Int32 unboxed value type (v) is created and initialized to 5. Then an Object reference type (o) is created and it wants to point to v. But reference types must always point to objects in the heap, so C# generates the proper IL code to box v and stores the address of the boxed version of v in o. Next, 123 is assigned to the unboxed value type v; this has no effect on the boxed version of v, so the boxed version keeps its value of 5. The (Int32) o cast in the WriteLine call is where o is unboxed (which returns a pointer to the data in o), and the data in o is then memory copied to a temporary unboxed Int32.
Now, you have the call to WriteLine. WriteLine wants a String object passed to it but you don’t have a String object. Instead, you have these three items: an Int32 unboxed value type (v), a string, and an Int32 reference (or boxed) type (o). These must somehow be combined to create a String. To accomplish this, the C# compiler generates code that calls the String object’s static Concat method. There are several overloaded versions of Concat. All of them perform identically; the difference is in the number of parameters. Since you want to format a string from three items, the compiler chooses the following version of the Concat method:
public static String Concat(Object arg0, Object arg1, Object arg2);
For the first parameter, arg0, v is passed. But v is an unboxed value parameter and arg0 is an Object, so v must be boxed and the address to the boxed v is passed for arg0. For the arg1 parameter, the address of the “, ” string is passed, identifying the address of a String object. Finally, for the arg2 parameter, o (a reference to an Object) was cast to an Int32. This creates a temporary Int32 value type that receives the unboxed version of the value currently referred to by o. This temporary Int32 value type must be boxed once again with the memory address being passed for Concat’s arg2 parameter.
Once Concat is called, it calls each of the specified object’s ToString methods and concatenates each object’s string representation. The String object returned from Concat is then passed to WriteLine to show the final result.
I should point out that the generated IL code would be more efficient if the call to WriteLine were written as follows:
Console.WriteLine(v + ", " + o); // Displays "123, 5"
This line is identical to the previous version except that I’ve removed the (Int32) cast that preceded the variable o. This code is more efficient because o is already a reference type to an Object and its address may simply be passed to the Concat method. So, removing the cast saved both an unbox and a box operation.
Here is another example that demonstrates boxing and unboxing:
public static void Main() {
    Int32 v = 5;           // Create an unboxed value type variable
    Object o = v;          // o refers to the boxed version of v
    v = 123;               // Changes the unboxed value type to 123
    Console.WriteLine(v);  // Displays "123"
    v = (Int32) o;         // Unboxes o into v
    Console.WriteLine(v);  // Displays "5"
}
How many boxing operations do you count in this code? The answer is one. There is only one boxing operation because there is a WriteLine method that accepts an Int32 as a parameter:
public static void WriteLine(Int32 value);
Note: Stack or Heap
It’s more complicated than you might think. Even the common claim that “value types are allocated on the stack” isn’t always correct. For example:
class Foo
{
int x;
}
Here int is a value type, but the value for x will always be on the heap because it is stored with the rest of the data for the instance of Foo, which is a class.
Remember the rule: reference type instances always go on the heap, whereas value types go wherever they are declared. If a value type is declared outside of a method but inside a reference type, it is placed within the reference type on the heap.
For more detail, look for articles on C# heap and stack memory, and read Eric Lippert’s blog post “The stack is an implementation detail”.
Blittable/Non Blittable types
Blittable types are defined as having an identical representation in memory for managed and unmanaged (for example, COM) environments, and can be directly shared. Understanding the difference between blittable and non-blittable types can aid in using COM Interop or P/Invoke, two techniques for interoperability in .NET applications.
Blittable data can be shared in place: by pinning the data in memory, the garbage collector is prevented from moving it, so the unmanaged application can work on it directly. This means that both managed and unmanaged code will alter the memory locations of these types in a consistent manner, and much less effort is required by the marshaler to maintain data integrity. The following are some examples of blittable types available in the .NET framework:
- System.Byte
- System.SByte
- System.Int16
- System.UInt16
- System.Int32
- System.UInt32
- System.Int64
- System.IntPtr
- System.UIntPtr
Additionally, one-dimensional arrays of these types as well as complex types containing only fields of these types are blittable.
If a type is not one of the blittable types, then it is classified as non-blittable. The reason a type is considered non-blittable is that for one representation in managed memory, it may have several potential representations in unmanaged memory or vice-versa. Alternatively, there may be exactly one representation on each side, but the two may not be identical. It is also often the case that there simply is no representation on one side or the other. The following are some commonly-used non-blittable types in the .NET framework:
- System.Boolean
- System.Char
- System.Object
- System.String
There are many more blittable and non-blittable types, and user-defined types may fit in either category depending on how they are defined.
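One rough way to see the difference is to compare a type’s managed size (sizeof) with the size the marshaler would use for it (Marshal.SizeOf). This is only a sketch: BlittableDemo is a hypothetical name, and the marshaled sizes for bool and char depend on the default marshaling rules, so treat the values in the comments as typical rather than guaranteed.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical demo class: compares managed and marshaled sizes.
public static class BlittableDemo
{
    // For a blittable type such as int, both sizes agree.
    public static bool IntSizesMatch() => sizeof(int) == Marshal.SizeOf<int>();

    // For bool, the default marshaled form is usually wider than the managed one.
    public static bool BoolSizesMatch() => sizeof(bool) == Marshal.SizeOf<bool>();

    public static void Main()
    {
        Console.WriteLine($"int:  managed {sizeof(int)}, marshaled {Marshal.SizeOf<int>()}");
        // bool is 1 byte in managed code; the default marshaled form is typically 4 bytes.
        Console.WriteLine($"bool: managed {sizeof(bool)}, marshaled {Marshal.SizeOf<bool>()}");
        // char is 2 bytes in managed code; the default marshaled form is typically 1 byte.
        Console.WriteLine($"char: managed {sizeof(char)}, marshaled {Marshal.SizeOf<char>()}");
    }
}
```

A size mismatch is a hint that the marshaler has to convert the data rather than share it in place; matching sizes alone do not prove that a type is blittable.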
Interoperability overview
Interoperability can be bidirectional sharing of data and methods between unmanaged code and managed .NET code. .NET provides two ways of interoperating between the two: COM Interop and P/Invoke. Though the methodology is different, in both cases marshalling (conversion between representations of data, formats for calling functions and formats for returning values) must take place. COM Interop deals with this conversion between managed code and COM objects, whereas P/Invoke handles interactions between managed code and Win32 code. The concept of blittable and non-blittable data types applies to both — specifically to the problem of converting data between managed and unmanaged memory. This marshalling is performed by the interop marshaller, which is invoked automatically by the CLR when needed.