Analysis and reduction of memory inefficiencies in Java strings
Abstract
This paper describes a novel approach to reducing the memory consumption of Java programs by focusing on their string memory inefficiencies. In recent Java applications, string data occupies a large part of the heap. For example, about 40% of the live heap is used for string data when a production J2EE application server is running. By investigating the string data in the live heap, we identified two types of memory inefficiency: duplication and unused literals. The heap contains many string objects that have the same value, and many string literals whose values are never actually used by the application. Since these inefficiencies exist as live objects, they cannot be eliminated by existing garbage collection techniques, which remove only dead objects. Quantitative analysis of Java heaps in real applications revealed that more than 50% of the string data in the live heap is wasted by these inefficiencies. To reduce them, this paper proposes two techniques at the Java virtual machine level: StringGC, which eliminates duplicated strings at garbage collection time, and Lazy Body Creation, which delays part of the literal instantiation until the literal's value is actually used. We also present a technique at the Java program level, called BundleConverter, which prevents unused message literals from being instantiated. Prototype implementations on a production Java virtual machine achieved about an 18% reduction of the live heap in the production application server. The proposed techniques also reduced the live heap of standard Java benchmarks by 11.6% on average, without noticeable performance degradation. Copyright © 2008 ACM.
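
To make the two inefficiencies concrete, the following is a minimal application-level sketch, not the JVM-level implementation the paper proposes. It shows several live string objects holding the same value, a canonicalization pass that keeps one object per value (loosely analogous to what StringGC achieves inside the collector), and a value that is only built on first use (loosely analogous to Lazy Body Creation). The names `StringInefficiencySketch`, `countObjects`, `deduplicate`, and `LazyString` are hypothetical and chosen for illustration only.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

public class StringInefficiencySketch {

    /** Counts distinct String objects by identity, not by value. */
    static int countObjects(List<String> strings) {
        Map<String, Boolean> seen = new IdentityHashMap<>();
        for (String s : strings) {
            seen.put(s, Boolean.TRUE);
        }
        return seen.size();
    }

    /**
     * Returns a list with the same values in which equal strings share
     * one canonical object. Assumption: this user-level pass only stands
     * in for the effect of collector-level deduplication.
     */
    static List<String> deduplicate(List<String> strings) {
        Map<String, String> canonical = new HashMap<>();
        List<String> result = new ArrayList<>(strings.size());
        for (String s : strings) {
            // putIfAbsent returns the earlier instance, or null if s is new.
            String previous = canonical.putIfAbsent(s, s);
            result.add(previous != null ? previous : s);
        }
        return result;
    }

    /** Hypothetical holder whose string is created only when first read. */
    static final class LazyString {
        private final Supplier<String> factory;
        private String value;

        LazyString(Supplier<String> factory) {
            this.factory = factory;
        }

        boolean isCreated() {
            return value != null;
        }

        String get() {
            if (value == null) {
                value = factory.get();
            }
            return value;
        }
    }

    public static void main(String[] args) {
        List<String> live = new ArrayList<>();
        for (int i = 0; i < 4; i++) {
            // new String(...) forces a distinct object with the same value.
            live.add(new String("content-type"));
        }
        live.add(new String("accept"));

        System.out.println("objects before: " + countObjects(live));
        System.out.println("objects after:  " + countObjects(deduplicate(live)));

        LazyString message = new LazyString(() -> "rarely shown error message");
        System.out.println("created before use: " + message.isCreated());
        message.get();
        System.out.println("created after use:  " + message.isCreated());
    }
}
```

With the five strings built in `main`, the identity count drops from five objects to two after canonicalization, while the list's values stay the same. The sketch only removes duplicates that the program explicitly passes through `deduplicate`; the point of doing this at garbage collection time, as the paper proposes, is that no application changes are needed.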