Sunday, August 31, 2008

JAVA AND C SHARP COMPARISON

This is a comparison of the C# programming language with the Java programming language. As the two are both garbage-collected runtime-compiled languages with syntax derived from C and C++, there are many similarities between Java and C#. However, there are also many differences, with C# being described as a hybrid of C++ and Java, with additional new features and changes. This page documents the strong general similarities of the languages and then points out those instances where the languages differ.

Object handling
Both C# and Java are designed from the ground up as VMT-based object oriented languages, with a syntax similar to C++. (C++ in turn is derived from C.) Neither language is a superset of C or C++, however. Both use garbage collection as a means of reclaiming memory resources, rather than explicit deallocation of memory. Both include thread synchronization mechanisms as part of their language syntax.


References
C# allows restricted use of pointers. Pointers and pointer arithmetic are potentially unsafe in a managed environment, as they can be used to bypass the strict rules for object access. C# addresses that concern by requiring that code blocks or methods that use the feature be marked with the unsafe keyword, so that all clients of such code can be aware that the code may be less secure than otherwise. The compiler requires the /unsafe switch to allow compilation of a program that uses such code. Generally, unsafe code is used either to allow better interoperability with unmanaged APIs or system calls (which are inherently "unsafe"), or for performance reasons. Java does not allow pointers or pointer arithmetic to be used.


Data types
Both languages support the idea of primitive types (all of which, except for string, are value types in C#/.NET). C# has more primitive types than Java, with unsigned as well as signed integer types being supported, and a special decimal type for decimal floating-point calculations. Java lacks unsigned integer types; in particular, it has no primitive type for an unsigned byte. Strings are treated as (immutable) objects in both languages, but support for string literals provides a specialized means of constructing them. C# also allows verbatim strings for quotation without escape sequences, which also allow newlines.
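Because Java's byte is signed, code that reads raw binary data usually widens each byte to an int and masks off the sign extension. The following is a minimal sketch of that common workaround; the class and method names (UnsignedByteDemo, toUnsigned) are invented for illustration.

```java
public class UnsignedByteDemo {
    // Interpret a signed Java byte as an unsigned value in the range 0..255.
    static int toUnsigned(byte b) {
        return b & 0xFF; // widen to int, then mask off the sign-extended bits
    }

    public static void main(String[] args) {
        byte raw = (byte) 0xF0;              // stored as -16, because byte is signed
        System.out.println(raw);             // signed view of the bits
        System.out.println(toUnsigned(raw)); // unsigned view of the same bits
    }
}
```

In C# the same value could simply be held in the unsigned byte type, with sbyte as the signed counterpart.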

Both allow automatic boxing and unboxing to translate primitive data to and from their object form. Effectively, this makes the primitive types a subtype of the Object type. In C# this also means that primitive types can define methods, such as an override of Object's ToString() method. In Java, separate primitive wrapper classes provide such functionality, which means it requires a static call Integer.toString(42) instead of an instance call 42.ToString(). Another difference is that Java makes heavy use of boxed types in generics (see below), and as such allows an implicit unboxing conversion (in C# this requires a cast). This conversion can potentially throw a null pointer exception, which may not be obvious by code review in Java.
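The Java side of this can be sketched as follows: the static call on the wrapper class, autoboxing, and an implicit unboxing that fails when the boxed reference is null. The class and helper names (BoxingDemo, unboxingThrows) are hypothetical and exist only for this illustration.

```java
public class BoxingDemo {
    // Returns true if implicitly unboxing the given Integer throws.
    static boolean unboxingThrows(Integer boxed) {
        try {
            int value = boxed; // implicit unboxing: compiled as boxed.intValue()
            return false;
        } catch (NullPointerException e) {
            return true;       // reached when boxed is null
        }
    }

    public static void main(String[] args) {
        String s = Integer.toString(42); // static call on the wrapper class
        Integer boxed = 42;              // autoboxing from int to Integer
        int plain = boxed;               // implicit unboxing back to int
        System.out.println(s + " " + plain);
        System.out.println(unboxingThrows(null));
    }
}
```

The assignment inside unboxingThrows looks like a plain copy, which is why the possible exception is easy to miss when reading the code.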


Value types
C# allows the programmer to create user-defined value types, using the struct keyword. From the programmer's perspective, they can be seen as lightweight classes. Unlike regular classes, and like the standard primitives, such value types are allocated on the stack rather than on the heap. They can also be part of an object (either as a field or boxed), or stored in an array, without the memory indirection that normally exists for class types. Structs also come with a number of limitations. Because structs have no notion of a null value and can be used in arrays without initialization, they always come with an implicit default constructor that essentially fills the struct memory space with zeroes. The programmer can only define additional constructors with one or more arguments. This also means that structs lack a virtual method table, and because of that (and the fixed memory footprint), they cannot allow inheritance (but can implement interfaces).


Enumerations
Enumerations in C# are derived from a primitive 8, 16, 32, or 64 bit integer type. Any value of the underlying primitive type is a valid value of the enumeration type, though an explicit cast may be needed to assign it. C# also supports bitmapped enumerations where an actual value may be a combination of enumerated values bitwise or'ed together. Enumerations in Java, on the other hand, are objects. The only valid values in a Java enumeration are the ones listed in the enumeration. As objects, each enumeration can contain its own fields which can be modified. Special enumeration set and map collections provide fully type-safe functionality with minimal overhead. Java enumerations allow differing method implementations for each value in the enumeration. Both C# and Java enumerations can be converted to strings and can be used in a switch statement.
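A short sketch of the Java behaviour described above: each constant supplies its own method implementation, the enumeration is used in a switch statement, and EnumSet provides the specialised set collection. The names EnumDemo, Op, apply and describe are made up for the example.

```java
import java.util.EnumSet;

public class EnumDemo {
    // Each constant supplies its own implementation of apply().
    enum Op {
        ADD { int apply(int a, int b) { return a + b; } },
        MUL { int apply(int a, int b) { return a * b; } };

        abstract int apply(int a, int b);
    }

    // Enumeration constants can be used directly as switch labels.
    static String describe(Op op) {
        switch (op) {
            case ADD: return "addition";
            case MUL: return "multiplication";
            default:  return "unknown";
        }
    }

    public static void main(String[] args) {
        EnumSet<Op> ops = EnumSet.allOf(Op.class); // type-safe set of constants
        for (Op op : ops) {
            // String concatenation uses the constant's name via toString().
            System.out.println(op + " (" + describe(op) + "): " + op.apply(3, 4));
        }
    }
}
```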


Arrays
Array and collection types are also given significance in the syntax of both languages, thanks to an iterator-based foreach statement loop. In C# an array corresponds to an object of the Array class, while in Java each array is a direct subclass of the Object class (but can be cast to an array of an element type that is an ancestor of its true element type), and does not implement any of the collection interfaces. C# has true multidimensional arrays, as well as the arrays-of-arrays that are available in Java (and which in C# are commonly called jagged arrays). Multidimensional arrays can in some cases increase performance because of increased locality (as there is a single pointer dereference, instead of one for every dimension of the array as is the case for jagged arrays). Another advantage is that the entire multidimensional array can be allocated with a single application of operator new, while jagged arrays require loops and allocations for every dimension. Note, though, that Java provides a syntactic construct for allocating a multidimensional jagged array with regular lengths (a rectangular array in the C# terminology); the loops and multiple allocations are then performed by the virtual machine and need not be explicit at the source level.
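The Java array behaviour described above can be sketched as follows: a regular array of arrays allocated in one expression, a jagged array built with an explicit loop, and the run-time check that applies when an array is viewed through an ancestor element type. The names ArrayDemo, triangle and covariantStoreFails are hypothetical.

```java
public class ArrayDemo {
    // Builds a jagged array by hand: one allocation per row.
    static int[][] triangle(int rows) {
        int[][] result = new int[rows][];
        for (int i = 0; i < rows; i++) {
            result[i] = new int[i + 1]; // row i has i + 1 elements
        }
        return result;
    }

    // A String[] may be treated as an Object[], but stores are checked at run time.
    static boolean covariantStoreFails() {
        Object[] objects = new String[1];
        try {
            objects[0] = Integer.valueOf(1); // wrong element type for a String[]
            return false;
        } catch (ArrayStoreException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        int[][] grid = new int[3][4]; // regular array of arrays, one expression
        System.out.println(grid.length + " x " + grid[0].length);
        System.out.println(triangle(3)[2].length);
        System.out.println(covariantStoreFails());
    }
}
```

Even in the one-expression form, grid is still an array of three separate row arrays rather than a single contiguous block.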


Inner classes
Both languages allow inner classes, where a class is defined entirely within another class. In Java, these classes have access to both the static and non-static members of the outer class (unless the inner class is declared static, then it only has access to the static members). Local classes can be defined within a method and have access to the method's local variables declared final, and anonymous local classes allow the creation of class instances that override some of their class methods.
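As an illustration of the Java rules above, here is a minimal sketch with an inner class, a static nested class, and an anonymous class that captures a final local variable. All names (OuterDemo, Inner, Nested, IntSource, adder) are invented for the example.

```java
public class OuterDemo {
    private static int created = 0; // static member of the outer class
    private int base = 10;          // non-static member of the outer class

    // Inner (non-static) class: sees static and non-static outer members.
    class Inner {
        int sum() { return base + created; }
    }

    // Static nested class: sees only the static outer members.
    static class Nested {
        int count() { return created; }
    }

    interface IntSource { int get(); }

    // Anonymous class capturing a final local variable and an outer field.
    IntSource adder(final int delta) {
        return new IntSource() {
            public int get() { return base + delta; }
        };
    }

    public static void main(String[] args) {
        OuterDemo outer = new OuterDemo();
        OuterDemo.Inner inner = outer.new Inner(); // needs an outer instance
        System.out.println(inner.sum());
        System.out.println(new Nested().count());
        System.out.println(outer.adder(5).get());
    }
}
```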

C# also provides inner classes, but an inner class needs an explicit reference to an instance of the outer class in order to access that class's non-static members. C# also provides anonymous delegates as a construct that can access local variables and members.

Partial classes
C# allows a class definition to be split across several source files using a feature called partial classes. Each part must be marked with the keyword partial. All the parts must be presented to the compiler as part of a single compilation. Parts can reference members from other parts. Parts can implement interfaces, and one part can define a base class. The feature is useful in code generation scenarios, where a code generator can supply one part and the developer another part to be compiled together. The developer can thus edit their part without the risk of a code generator overwriting it at some later time. Unlike the class extension mechanism, a partial class allows "circular" dependencies amongst its parts, as they are guaranteed to be resolved at compile time. Java has no corresponding concept.

Thursday, August 28, 2008

JAVACC

JavaCC (Java Compiler Compiler) is an open source parser generator for the Java programming language. JavaCC is similar to Yacc in that it generates a parser for a formal grammar provided in EBNF notation, except the output is Java source code. Unlike Yacc, however, JavaCC generates top-down parsers, which limits it to the LL(k) class of grammars (in particular, left recursion cannot be used). The tree builder that accompanies it, JJTree, constructs its trees from the bottom up.
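To see why left recursion is a problem for top-down parsing, consider the hand-written sketch below. It is not JavaCC output and does not use any JavaCC API; it is a plain recursive-descent parser, with the hypothetical name SumParser, for sums of unsigned integers with no whitespace. The left-recursive rule expr := expr "+" term is rewritten as expr := term ( "+" term )*, which is the kind of rewrite an LL grammar requires.

```java
public class SumParser {
    private final String input;
    private int pos = 0;

    SumParser(String input) { this.input = input; }

    // expr := term ( "+" term )*
    // A left-recursive rule, expr := expr "+" term, would make this method
    // call itself before consuming any input, so the repetition is a loop.
    int expr() {
        int value = term();
        while (pos < input.length() && input.charAt(pos) == '+') {
            pos++;           // consume '+'
            value += term();
        }
        return value;
    }

    // term := digit+
    int term() {
        int start = pos;
        while (pos < input.length() && Character.isDigit(input.charAt(pos))) {
            pos++;
        }
        if (start == pos) {
            throw new IllegalArgumentException("digit expected at position " + pos);
        }
        return Integer.parseInt(input.substring(start, pos));
    }

    // Parses the whole text and returns the sum; rejects trailing input.
    static int evaluate(String text) {
        SumParser parser = new SumParser(text);
        int result = parser.expr();
        if (parser.pos != text.length()) {
            throw new IllegalArgumentException("unexpected input at position " + parser.pos);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(evaluate("1+20+300")); // prints 321
    }
}
```

Each grammar rule becomes one method, and a single character of lookahead is enough to decide whether to continue the loop.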

In 1996, Sun Microsystems released a parser generator called Jack. The developers responsible for Jack created their own company called Metamata and changed the Jack name to JavaCC. Metamata eventually became part of WebGain. After WebGain shut down its operations, JavaCC was moved to its current home.

Java Compiler Compiler [tm] (JavaCC [tm]) is the most popular parser generator for use with Java [tm] applications. A parser generator is a tool that reads a grammar specification and converts it to a Java program that can recognize matches to the grammar. In addition to the parser generator itself, JavaCC provides other standard capabilities related to parser generation such as tree building (via a tool called JJTree included with JavaCC), actions, debugging, etc.

JavaCC works with any Java VM version 1.2 or greater. It has been certified to be 100% Pure Java. JavaCC has been run on countless different platforms without any special porting requirements. Given that we have tested JavaCC on only around 5 or 6 platforms, we think this is a great testimonial to the "Write Once Run Anywhere" aspect of the Java programming language. We say this as engineers who have personally experienced the benefits of writing Java applications.