Hello folks, the map-reduce concept is one of the powerful concepts in computer programming, particularly on functional programming which utilizes the power of distributed and parallel processing to solve a big and heavy problem in a quick time. From Java 8 onwards, Java also got this powerful feature from the functional programming world. Many of the services provided on the internet like Google Search are based on the concept of the map and reduce. In map-reduce, a job is usually split from the input data-set into independent chunks which are processed by the map tasks in a completely parallel manner. The framework then sorts the outputs of the map operation, which are then supplied to the reduce tasks.
For example, suppose you want to calculate the average age of all the people in a given town, instead of counting sequentially you can divide the problem into locality or zip code and then calculate the average age for each locality and then combine them using a reduction operation to calculate the average age for the town.
It wasn't possible earlier to perform functional programming in Java earlier, at least, not so easy, but Java 8 also introduced a powerful Stream API that provides many functional programming operations like map, filter, flatmap, reduce, collect, and so on.
There are many such scenarios where map-reduce can really be a game-changer and solve a problem in quick time which wasn't possible earlier due to the sheer scale of data.
As I have said, earlier Java didn't have the support of performing bulk data operation and it wasn't possible to load all the data into memory because of their size but from Java 8 onwards, we have an abstraction called Stream which allows us to perform the bulk data operation on a large chunk of data without loading them into memory.
By using these methods, together with implicit parallelism provided by Stream API, you can now write code in Java that can work efficiently with Big data which wasn't that easy earlier.
If you want to learn more about Stream API features and how to use them efficiently in your code, I suggest you take a look at these best Java Stream and Lambda Courses. It will provide you with first-hand experience of using Stream API in your code.
Again, the old method is many lines of code which is designed to do this all very sequentially. This code doesn't take advantage of a multi-core processor, which is available in all modern servers.
Sure, the example is simple, but imagine if you have millions of items you are processing and you have an 8 core machine or 32 core server. Most of those CPU cores would be completely wasted and idle during this long calculation and it will frustrate your clients who are waiting for the response.
Now if we look at the simple one-line Java 8 way of doing it:
This uses the concept of parallelism, where it creates a parallel stream out of the array, which can be processed by multiple cores and then finally joined back into to map the results together.
The map function will create a stream containing only the values with meet the given criteria, then the average function will reduce this map into a single value.
Now all 8 of your cores are processing this calculation so it should run much faster.
As some wise man has said, a picture is worth more than a thousand words, here is a picture that shows how the map-reduce concept works in practice. In this, the map function could be a cutting operation that applies to each vegetable and then their result is combined using Reduce to prepare a nice Sandwich you can enjoy.
If you want to learn more about the map and reduce operation, or in general, functional programming features introduced in Java 8 then I suggest you see the Java Functional Programming using the Lambdas and Stream course by Ranga Rao Karnam on Udemy. An interactive, hands-on, and useful course to learn all the Java 8 features that matter.
You can perform a lot of bulk operations, calculating stats using the map and reduce in Java 8. As a Java developer you must know how to use map(), flatMap(), and filter() method, these three are key methods for doing functional programming in Java 8.
If you want to learn more about functional programming in Java 8, I also suggest joining From Collections to Streams in Java 8 Using Lambda Expressions course on Pluralsight, one of the best courses to start with functional programming in Java.
You can see that the Java 8 way is much more succinct and readable than the iterative version of the pre-Java 8 code.
Though the code doesn't tell you the impact when the amount of data increases, you can easily make the Java 8 version parallel by just replacing stream() with the parallelStream() but you need to do a lot of hard work to parallelize the iterative version of the code.
That's all about how to do map-reduce in Java 8. This is rather a simple example of a powerful concept like map-reduce but the most important thing is that now you can use this concept in Java 8 with a built-in map() and reduce() method of Stream class. If you are interested in learning more about Stream API or Functional Programming in Java, here are some of the useful resources to learn further and strengthen your knowledge.
Other Java 8 tutorials you may like
Thanks for reading this article so far. If you liked this Java Map and Reduce tutorial and example then please share it with your friends and colleagues. If you have any questions or feedback then please drop a note.
For example, suppose you want to calculate the average age of all the people in a given town, instead of counting sequentially you can divide the problem into locality or zip code and then calculate the average age for each locality and then combine them using a reduction operation to calculate the average age for the town.
It wasn't possible earlier to perform functional programming in Java earlier, at least, not so easy, but Java 8 also introduced a powerful Stream API that provides many functional programming operations like map, filter, flatmap, reduce, collect, and so on.
There are many such scenarios where map-reduce can really be a game-changer and solve a problem in quick time which wasn't possible earlier due to the sheer scale of data.
As I have said, earlier Java didn't have the support of performing bulk data operation and it wasn't possible to load all the data into memory because of their size but from Java 8 onwards, we have an abstraction called Stream which allows us to perform the bulk data operation on a large chunk of data without loading them into memory.
By using these methods, together with implicit parallelism provided by Stream API, you can now write code in Java that can work efficiently with Big data which wasn't that easy earlier.
If you want to learn more about Stream API features and how to use them efficiently in your code, I suggest you take a look at these best Java Stream and Lambda Courses. It will provide you with first-hand experience of using Stream API in your code.
Map Reduce Example in Java 8
In this Java 8 tutorial, we will go over the map function in Java 8. It is used to implement MapReduce type operations. Essentially we map a set of values then we reduce it with a function such as average or sum into a single number. Let's take a look at this sample which will do an average of numbers in the old and new ways.Again, the old method is many lines of code which is designed to do this all very sequentially. This code doesn't take advantage of a multi-core processor, which is available in all modern servers.
Sure, the example is simple, but imagine if you have millions of items you are processing and you have an 8 core machine or 32 core server. Most of those CPU cores would be completely wasted and idle during this long calculation and it will frustrate your clients who are waiting for the response.
Now if we look at the simple one-line Java 8 way of doing it:
double average = peoples.stream().mapToInt(p-> p.getAge()) .average().getAsDouble();
This uses the concept of parallelism, where it creates a parallel stream out of the array, which can be processed by multiple cores and then finally joined back into to map the results together.
The map function will create a stream containing only the values with meet the given criteria, then the average function will reduce this map into a single value.
Now all 8 of your cores are processing this calculation so it should run much faster.
As some wise man has said, a picture is worth more than a thousand words, here is a picture that shows how the map-reduce concept works in practice. In this, the map function could be a cutting operation that applies to each vegetable and then their result is combined using Reduce to prepare a nice Sandwich you can enjoy.
If you want to learn more about the map and reduce operation, or in general, functional programming features introduced in Java 8 then I suggest you see the Java Functional Programming using the Lambdas and Stream course by Ranga Rao Karnam on Udemy. An interactive, hands-on, and useful course to learn all the Java 8 features that matter.
Java Program to demonstrate Map-reduce operation
Here is our Java program which will teach you how to use the map and reduce it in Java 8. The reduced operation is also known as the fold in the functional programming world and is very helpful with the Collection class which holds a lot of items.You can perform a lot of bulk operations, calculating stats using the map and reduce in Java 8. As a Java developer you must know how to use map(), flatMap(), and filter() method, these three are key methods for doing functional programming in Java 8.
If you want to learn more about functional programming in Java 8, I also suggest joining From Collections to Streams in Java 8 Using Lambda Expressions course on Pluralsight, one of the best courses to start with functional programming in Java.
package test; import java.util.ArrayList; import java.util.List; /** * Java Program to demonstrate how to do map reduce in Java. Map, reduce also * known as fold is common operation while dealing with Collection in Java. * @author Javin Paul */ public class Test { public static void main(String args[]) { List<Employee> peoples = new ArrayList<>(); peoples.add(new Employee(101, "Victor", 23)); peoples.add(new Employee(102, "Rick", 21)); peoples.add(new Employee(103, "Sam", 25)); peoples.add(new Employee(104, "John", 27)); peoples.add(new Employee(105, "Grover", 23)); peoples.add(new Employee(106, "Adam", 22)); peoples.add(new Employee(107, "Samy", 224)); peoples.add(new Employee(108, "Duke", 29)); double average = calculateAverage(peoples); System.out.println("Average age of employees are (classic way) : " + average); average = average(peoples); System.out.println("Average age of employees are (lambda way) : " + average); } /** * Java Method to calculate average from a list of object without using * lambdas, doing it on classical java way. * @param employees * @return average age of given list of Employee */ private static double calculateAverage(List<? extends Employee> employees){ int totalEmployee = employees.size(); double sum = 0; for(Employee e : employees){ sum += e.getAge(); } double average = sum/totalEmployee; return average; } /** * Java method which uses map reduce to calculate average of list of * employees in JDK 8. * @param peoples * @return average age of given list of Employees */ private static double average(List<? extends Employee> peoples){ return peoples.stream().mapToInt(p-> p.getAge()) .average() .getAsDouble(); } } class Employee{ private final int id; private final String name; private final int age; public Employee(int id, String name, int age){ this.id = id; this.name = name; this.age = age; } public int getId(){ return id; } public String getName(){ return name; } public int getAge(){ return age; } } Output: Average age of employees are (classic way) : 49.25 Average age of employees are (lambda way) : 49.25
You can see that the Java 8 way is much more succinct and readable than the iterative version of the pre-Java 8 code.
Though the code doesn't tell you the impact when the amount of data increases, you can easily make the Java 8 version parallel by just replacing stream() with the parallelStream() but you need to do a lot of hard work to parallelize the iterative version of the code.
That's all about how to do map-reduce in Java 8. This is rather a simple example of a powerful concept like map-reduce but the most important thing is that now you can use this concept in Java 8 with a built-in map() and reduce() method of Stream class. If you are interested in learning more about Stream API or Functional Programming in Java, here are some of the useful resources to learn further and strengthen your knowledge.
Other Java 8 tutorials you may like
- 10 Example of Lambda Expression in Java 8 (see here)
- 5 Courses to learn Java 8 and Java 9 features (courses)
- 10 Example of forEach() method in Java 8 (example)
- Top 10 Courses to learn Java for Beginners (courses)
- 10 Example of Joining String in Java 8 (see here)
- Top 5 Courses to become a full-stack Java developer (courses)
- 10 Example of Stream API in Java 8 (see here)
- How to use the peek() method of Stream in Java 8? (example)
- 10 Free Courses to learn Spring Framework (courses)
- 10 Example of converting a List to Map in Java 8 (example)
- How to use Stream.flatMap in Java 8(example)
- How to use Stream.map() in Java 8 (example)
- 5 Books to Learn Java 8 and Functional Programming (list)
- 5 Free Courses to learn Java 8 and 9 (courses)
- 20 Example of LocalDate and LocalTime in Java 8 (see here)
- Difference between map() and flatMap() in Java 8? (answer)
Thanks for reading this article so far. If you liked this Java Map and Reduce tutorial and example then please share it with your friends and colleagues. If you have any questions or feedback then please drop a note.
If you are performing map-reduce in large scale, which is always the case, you should use parallelStream() instead of stream() to get full advantage of multiple cores of your server.
ReplyDeleteYes I think Javin is right. If you are using stream it will still be executed on a single core just like a normal sequential routine. For using parallelism we will have to use parallelSteam.
ReplyDeleteYes, that's correct.
DeleteWhen you use parallel stream check you cpu in task manager. All core of cpu are used . So in future if we moved from quad core to octa core performance will increase.
ReplyDeleteHello Rohan, where exactly do you see the core get utilized? Do you mean Task Manager - Performance tab - Open resource Monitor and then CPU tab?
DeleteNot sure why, but the lambda code takes at least twice as long as your 'classical' code. I'm running with parallel stream on a 16 core machine. To make a realistic test, I added 3 million employees to the collection.
ReplyDeleteHmm, interesting, did you checked your JVM is running on server mode?
DeleteThe classical code ran faster for 3 million employees, but i started seeing the difference at around 10 million. Fron this point on, the parallel streams ran faster. So i guess the conclusion is that parallelism comes with an overhead which is only worthwhile once you cross a certain threshold of data volume.
Delete