I started working on a lab assignment from my MapReduce course at Big Data University a week ago and finished it off today. Not that the lab assignment was hard – you simply had to paste some supplied code into a few files, compile them and run them against some test data they gave you. Simple enough.
Big Data University (BDU) supplied me with a login to a cloud server, so I set up the job with the command line tools, compiled it and tried to run it. It blew up on some errors that seemed to be related to the environment. Since I couldn’t get at the environment, I decided to go ahead and set up my own Hadoop environment. So I did. Three times.
At first I just installed it on my laptop, but I soon discovered that some component (I suspect YARN) was messing with my USB mounts, so I decided to install it on Cloud 9 instead. Every time I brought Hadoop up, the server went down. Then I set up a VirtualBox install of LXLE, a chopped version of Ubuntu, and installed Hadoop there. This time things stayed up, even when I installed and ran Eclipse.
I always find Eclipse gives me a headache, probably because it was written by aliens, and this time was no different. Even the simplest MapReduce program involves a lot of libraries to access the Hadoop file system, juggle all the components and run them through the gyrations required. And finding a list of the required libraries? Good luck: they and their locations change every time.
After several days of flailing I got everything together and working – flailing is actually quite educational.