Overview

Liferay is at the forefront of providing integration with many popular platforms for Documents and Media. One of the options Liferay provides out of the box is integration with Amazon S3.

Amazon S3 (Simple Storage Service) is an online file storage web service offered by Amazon Web Services (AWS). It provides storage through web service interfaces (REST, SOAP, and BitTorrent).

Liferay comes prepackaged with the libraries required for S3 integration. The artifact that contains the required classes, jets3t.jar, is located under ROOT/WEB-INF/lib of the deployed Liferay web application.

Integrate Liferay with S3

To integrate Liferay with S3, the following properties should be added to portal-ext.properties.

#
# S3Store
#
dl.store.impl=com.liferay.portlet.documentlibrary.store.S3Store
dl.store.s3.access.key=
dl.store.s3.secret.key=
dl.store.s3.bucket.name=
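The three S3 keys are left blank in the snippet above and have to be filled in with your own access key, secret key, and bucket name. A quick check of the file before restarting Liferay can catch a value that was left empty. The sketch below is hypothetical: the class PortalExtCheck is not part of Liferay, it only uses java.util.Properties, and it assumes the S3Store class name for dl.store.impl shown above.

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical helper, not part of Liferay: reports S3Store settings that are
// missing or blank in a portal-ext.properties file.
class PortalExtCheck {

    // Assumption: the store class name used in the configuration above.
    static final String S3_STORE = "com.liferay.portlet.documentlibrary.store.S3Store";

    static final String[] REQUIRED_KEYS = {
        "dl.store.s3.access.key",
        "dl.store.s3.secret.key",
        "dl.store.s3.bucket.name"
    };

    // Returns human-readable problems; an empty list means the S3 settings look complete.
    static List<String> check(Properties props) {
        List<String> problems = new ArrayList<>();
        String impl = props.getProperty("dl.store.impl", "").trim();
        if (!S3_STORE.equals(impl)) {
            problems.add("dl.store.impl is not " + S3_STORE);
        }
        for (String key : REQUIRED_KEYS) {
            String value = props.getProperty(key);
            if (value == null || value.trim().isEmpty()) {
                problems.add(key + " is missing or empty");
            }
        }
        return problems;
    }

    public static void main(String[] args) throws IOException {
        Path path = Paths.get(args.length > 0 ? args[0] : "portal-ext.properties");
        Properties props = new Properties();
        try (Reader reader = Files.newBufferedReader(path)) {
            props.load(reader);
        }
        List<String> problems = check(props);
        if (problems.isEmpty()) {
            System.out.println("S3Store settings look complete");
        } else {
            problems.forEach(p -> System.out.println("WARN: " + p));
        }
    }
}
```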

If the document library is small, with few documents and a low volume of reads and writes, no further changes are required: adding this configuration to portal-ext.properties and restarting Liferay enables S3 storage for the document library.

Issues with implementing a large document library with Liferay and S3

One of our clients was using Liferay as a document library on the Amazon AWS cloud, with Amazon S3 as the document store. While the number of documents being uploaded and retrieved was small, there were no issues. However, as concurrent uploads increased and more users accessed the platform, response times degraded rapidly. The thread dumps gave us insight into the problem.

Here’s an example:

"http-bio-8080-exec-240" daemon prio=10 tid=0x00007fc6944b8800 nid=0x4b43 waiting on condition [0x00007fc629849000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0x0000000758590838> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:156)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
    at org.apache.http.impl.conn.tsccm.WaitingThread.await(WaitingThread.java:158)
    at org.apache.http.impl.conn.tsccm.ConnPoolByRoute.getEntryBlocking(ConnPoolByRoute.java:402)
    at org.apache.http.impl.conn.tsccm.ConnPoolByRoute$1.getPoolEntry(ConnPoolByRoute.java:299)
    at org.apache.http.impl.conn.tsccm.ThreadSafeClientConnManager$1.getConnection(ThreadSafeClientConnManager.java:242)
    at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:456)
    at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906)
    at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805)
    at org.jets3t.service.impl.rest.httpclient.RestStorageService.performRequest(RestStorageService.java:334)
    at org.jets3t.service.impl.rest.httpclient.RestStorageService.performRequest(RestStorageService.java:281)
    at org.jets3t.service.impl.rest.httpclient.RestStorageService.performRestGet(RestStorageService.java:981)
    at org.jets3t.service.impl.rest.httpclient.RestStorageService.listObjectsInternal(RestStorageService.java:1455)
    at org.jets3t.service.impl.rest.httpclient.RestStorageService.listObjectsImpl(RestStorageService.java:1414)
    at org.jets3t.service.StorageService.listObjects(StorageService.java:623)
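Spotting this pattern by eye is easy for one thread and tedious for hundreds. As an illustration only (not the tool we used), the sketch below uses the standard java.lang.management API to count live threads whose stack contains a given frame, here ConnPoolByRoute.getEntryBlocking from the dump above. It inspects the JVM it runs in, so it would have to be invoked from inside the Liferay process; the class name PoolWaitScanner is hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical diagnostic helper: counts threads in the current JVM that are
// blocked waiting for a pooled HttpClient connection.
class PoolWaitScanner {

    // Frame that appears in the thread dump when a thread waits for a free connection.
    static final String POOL_CLASS = "org.apache.http.impl.conn.tsccm.ConnPoolByRoute";
    static final String POOL_METHOD = "getEntryBlocking";

    // True if any frame of the stack matches the given class and method.
    static boolean hasFrame(StackTraceElement[] stack, String className, String methodName) {
        for (StackTraceElement frame : stack) {
            if (frame.getClassName().equals(className) && frame.getMethodName().equals(methodName)) {
                return true;
            }
        }
        return false;
    }

    // Counts live threads whose current stack contains the given frame.
    static int countThreadsIn(String className, String methodName) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        int count = 0;
        // (false, false): skip lock details, only the stack frames are needed
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            if (info != null && hasFrame(info.getStackTrace(), className, methodName)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        int waiting = countThreadsIn(POOL_CLASS, POOL_METHOD);
        int total = ManagementFactory.getThreadMXBean().getThreadCount();
        System.out.println(waiting + " of " + total + " live threads are waiting for an HTTP connection");
    }
}
```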

Every blocked thread was waiting for a connection from HttpClient, which jets3t uses for its REST calls. This was the breakthrough in our troubleshooting. ThreadSafeClientConnManager limits the number of connections both per route and in total. By default, it creates no more than 2 concurrent connections per route and no more than 20 connections in total. For many real-world applications these limits are too constraining, especially for those that use HTTP as the transport protocol for their services. The connection limits can, however, be adjusted through HTTP parameters.
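To see why a small per-route limit hurts, the sketch below models a connection pool with a java.util.concurrent.Semaphore. This is a simplified stand-in, not HttpClient's implementation (ThreadSafeClientConnManager uses its own ConnPoolByRoute, as the dump shows). Requests to the same S3 endpoint typically share one route, so with the default limit of 2 the remaining request threads block, much like the parked threads in the dump above. The class RoutePoolModel and its timings are hypothetical.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Simplified model (not HttpClient code): a per-route connection limit enforced
// with a Semaphore. Threads that cannot get a permit block until one is released.
class RoutePoolModel {

    private final Semaphore permits;
    private final AtomicInteger inUse = new AtomicInteger();
    private final AtomicInteger peak = new AtomicInteger();

    RoutePoolModel(int maxPerRoute) {
        this.permits = new Semaphore(maxPerRoute, true); // fair: waiters are served in order
    }

    // Simulates one request that holds a "connection" for holdMillis.
    void request(long holdMillis) throws InterruptedException {
        permits.acquire(); // blocks while all connections for the route are leased
        try {
            int now = inUse.incrementAndGet();
            peak.accumulateAndGet(now, Math::max);
            Thread.sleep(holdMillis);
        } finally {
            inUse.decrementAndGet();
            permits.release();
        }
    }

    // Highest number of connections that were leased at the same time.
    int peakInUse() {
        return peak.get();
    }

    // Submits `requests` concurrent requests and returns the wall-clock time in ms.
    static long run(RoutePoolModel pool, int requests, long holdMillis) throws InterruptedException {
        ExecutorService workers = Executors.newFixedThreadPool(requests);
        long start = System.nanoTime();
        for (int i = 0; i < requests; i++) {
            workers.submit(() -> { pool.request(holdMillis); return null; });
        }
        workers.shutdown();
        workers.awaitTermination(1, TimeUnit.MINUTES);
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    public static void main(String[] args) throws InterruptedException {
        for (int limit : new int[] {2, 20}) {
            RoutePoolModel pool = new RoutePoolModel(limit);
            long elapsed = run(pool, 40, 100);
            System.out.println("limit=" + limit + " peak=" + pool.peakInUse() + " elapsed=" + elapsed + " ms");
        }
    }
}
```

With 40 simultaneous requests each holding a connection for 100 ms, a limit of 2 needs on the order of 20 rounds (about 2 seconds), while a limit of 20 finishes in about two rounds.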

Adjusting HTTP parameters with jets3t.properties

To increase the number of connections from Liferay to AWS S3, we created a jets3t.properties file and added it to the global classpath. Because the client was running Tomcat, we placed the jets3t.properties file under ${CATALINA_BASE}/lib.

# The maximum number of simultaneous connections to allow globally
#Default: 20
# Note: If you have a fast Internet connection, you can improve the performance
# of your S3 client by increasing this setting and the corresponding S3 Service
# properties s3service.max-thread-count and s3service.admin-max-thread-count.
# However, be careful because if you increase this value too much for your
# connection you may exceed your available bandwidth and cause communications
# errors.
httpclient.max-connections=200
# How many milliseconds to wait before a connection times out. 0 means infinity.
# Default: 60000
httpclient.connection-timeout-ms=120000
# How many milliseconds to wait before a socket connection times out. 0 means
# infinity.
# Default: 60000
httpclient.socket-timeout-ms=120000
# The maximum number of concurrent communication threads that will be started
# by the S3ServiceMulti/S3ServiceSimpleMulti multi-threaded services for upload
# and download operations. This value should not be too high, otherwise you
# risk I/O errors due to bandwidth starvation when transferring many large files.
# Default: 2
# Note: This value must not exceed the maximum number of HTTP connections
# available to JetS3t, as set by the property httpclient.max-connections
s3service.max-thread-count=200
# The maximum number of concurrent communication threads that will be started
# by the ThreadedStorageService/SimpleThreadedStorageService services for
# upload and download operations. This value should not be too high, otherwise
# you risk I/O errors due to bandwidth starvation when transferring many large
# files.
# Default: 2
# Note: This value must not exceed the maximum number of HTTP connections
# available to JetS3t, as set by the property httpclient.max-connections
threaded-service.max-thread-count=200
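The comments in the file above state that both thread-count settings must not exceed httpclient.max-connections. A small check like the following can catch a violation, and confirm that the file is visible on the classpath, before a restart. It is a hypothetical helper, not part of JetS3t: it looks the file up with ClassLoader.getResourceAsStream, which is an assumption about how the file is found (JetS3t's own lookup may differ), and it falls back to the defaults listed in the comments (20 connections, 2 threads).

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical helper, not part of JetS3t: loads jets3t.properties from a class
// loader and checks the constraint documented in the file's comments.
class Jets3tConfigCheck {

    // Returns the properties, or null if the file is not visible to the loader.
    static Properties load(ClassLoader loader) throws IOException {
        try (InputStream in = loader.getResourceAsStream("jets3t.properties")) {
            if (in == null) {
                return null;
            }
            Properties props = new Properties();
            props.load(in);
            return props;
        }
    }

    // Reads an integer setting; throws NumberFormatException for non-numeric values.
    static int intValue(Properties props, String key, int defaultValue) {
        String value = props.getProperty(key);
        return value == null ? defaultValue : Integer.parseInt(value.trim());
    }

    // Each thread-count setting must not exceed httpclient.max-connections.
    static List<String> check(Properties props) {
        List<String> problems = new ArrayList<>();
        int maxConnections = intValue(props, "httpclient.max-connections", 20);
        for (String key : new String[] {"s3service.max-thread-count", "threaded-service.max-thread-count"}) {
            int threads = intValue(props, key, 2);
            if (threads > maxConnections) {
                problems.add(key + "=" + threads + " exceeds httpclient.max-connections=" + maxConnections);
            }
        }
        return problems;
    }

    public static void main(String[] args) throws IOException {
        Properties props = load(Thread.currentThread().getContextClassLoader());
        if (props == null) {
            System.out.println("jets3t.properties not found on the classpath");
            return;
        }
        List<String> problems = check(props);
        System.out.println(problems.isEmpty() ? "jets3t.properties looks consistent" : String.join("\n", problems));
    }
}
```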

Additional properties can be found at the following webpage:

http://www.jets3t.org/toolkit/configuration.html

After we created the jets3t.properties file and restarted the Liferay instances, we observed no further performance degradation in the environment.