How to fix a startup error in Pentaho Data Integration (Kettle)

How do you fix the startup error of Pentaho Data Integration (Kettle) on CentOS 6? You just need to modify the spoon.sh startup script after downloading and unzipping Pentaho Data Integration. The modification updates the Java runtime options so that Kettle starts up properly. Open

[code language="bash"]spoon.sh[/code]

and change the end of the file in the following way:

[code language="bash"]
# ******************************************************************
# ** Set java runtime options **
# ** Change 512m to higher values in case you run out of memory **
# ** or set the PENTAHO_DI_JAVA_OPTIONS environment variable **
# ******************************************************************

if [ -z "$PENTAHO_DI_JAVA_OPTIONS" ]; then
  PENTAHO_DI_JAVA_OPTIONS="-Xmx512m -XX:MaxPermSize=256m"
fi

OPT="$OPT $PENTAHO_DI_JAVA_OPTIONS -Djava.library.path=$LIBPATH -DKETTLE_HOME=$KETTLE_HOME -DKETTLE_REPOSITORY=$KETTLE_REPOSITORY -DKETTLE_USER=$KETTLE_USER -DKETTLE_PASSWORD=$KETTLE_PASSWORD -DKETTLE_PLUGIN_PACKAGES=$KETTLE_PLUGIN_PACKAGES -DKETTLE_LOG_SIZE_LIMIT=$KETTLE_LOG_SIZE_LIMIT -Dorg.eclipse.swt.browser.XULRunnerPath=/dev/null"
# ***************
# ** Run... **
# ***************
"$_PENTAHO_JAVA" $OPT $STARTUP -lib $LIBPATH "${1+$@}"
[/code]

All I did was append

[code language="bash"]-Dorg.eclipse.swt.browser.XULRunnerPath=/dev/null[/code]

to the

[code language="bash"]OPT[/code]

variable, as described in ticket TDI-24139.
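Alternatively, since spoon.sh only applies its default Java options when PENTAHO_DI_JAVA_OPTIONS is empty, the same flag should also work without editing the script at all by exporting that variable first. Here is a minimal, untested sketch (the memory values are just the defaults shown above, adjust as needed):

[code language="bash"]
# Untested sketch: pass the flag via PENTAHO_DI_JAVA_OPTIONS instead of editing spoon.sh.
# The script skips its default -Xmx/-XX:MaxPermSize settings when this variable is set,
# so repeat them here together with the XULRunner workaround.
export PENTAHO_DI_JAVA_OPTIONS="-Xmx512m -XX:MaxPermSize=256m -Dorg.eclipse.swt.browser.XULRunnerPath=/dev/null"
./spoon.sh
[/code]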

[video] – How is Hadoop used at Twitter?

In the following video, Dmitriy Ryaboy, a Twitter Analytics Engineer and former Cloudera intern, explains how Twitter uses Hadoop and Pig. Enjoy the video and have a good weekend!

[vimeo 11110059]

[video] – HBase and Pig: The Hadoop Ecosystem at Twitter

I have just found this very interesting video on how Twitter uses HBase and Pig together with Hadoop:

Twitter’s use of Cassandra, Hadoop, Pig and HBase for highly distributed Data Processing and Analysis

Kevin Weil, Analytics Lead at Twitter, recently gave a presentation on Twitter’s use of Cassandra, Pig and HBase. Especially interesting is how Twitter uses Hadoop and Pig in its data analysis process.

[slideshare id=3806196&doc=nosqlattwitter-nosqleu2010-100421124212-phpapp02]

(via @kevinweil)

Another great presentation, by Tobias Ivarsson, gives an overview of NoSQL:

[slideshare id=3735857&doc=nosqlfordummies-100415085745-phpapp01]

(via @thobe)

[video] – The History of Business Intelligence

I have found a really nice video explaining Business Intelligence as a concept and highlighting the development phases of Business Intelligence throughout history.

[youtube _1y5jBESLPE]
