Understanding Map Reduce Computing

Feb 15, 2024

MapReduce is a powerful framework for handling large datasets, commonly used in big data processing. It works by splitting the data into smaller pieces and applying two key operations: mapping and reducing.
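To make the two operations concrete, here is a minimal single-process sketch of the whole pipeline. `run_mapreduce` is a hypothetical helper, not part of any framework, and it assumes the entire input fits in memory; a real MapReduce system runs each phase in parallel across many machines. The word-count input is purely illustrative.

```python
from collections import defaultdict

def run_mapreduce(records, mapper, reducer):
    """Run map, group, and reduce in a single process (no distribution)."""
    # Map phase: each input (key, value) record yields zero or more pairs.
    intermediate = []
    for key, value in records:
        intermediate.extend(mapper(key, value))

    # Group phase: collect every value emitted under the same key.
    groups = defaultdict(list)
    for key, value in intermediate:
        groups[key].append(value)

    # Reduce phase: combine each key's list of values into one result.
    return {key: reducer(key, values) for key, values in groups.items()}


# Usage: count how many times each word appears across two short lines.
lines = [(1, "map then reduce"), (2, "map again")]
counts = run_mapreduce(
    lines,
    mapper=lambda line_no, text: [(word, 1) for word in text.split()],
    reducer=lambda word, ones: sum(ones),
)
print(counts)  # {'map': 2, 'then': 1, 'reduce': 1, 'again': 1}
```

The caller supplies only `mapper` and `reducer`; the grouping in the middle is the same for every job, which is the division of labour MapReduce is built around.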

Mapping:

Each piece of input data (for example, a college name together with its numerical value) is transformed into zero or more key-value pairs. This breaks the data down into small, independent units that can be processed in parallel.

Reducing:

After mapping, values with the same key are grouped together. Then, a “reduce” function combines these values into a single result. For example, we could sum the numerical values for each college.
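Grouping and summing can be sketched in a few lines. One common approach, similar in spirit to the sort step Hadoop performs before its reducers run, is to sort the mapped pairs by key so that equal keys sit next to each other, then process each run of equal keys. The pairs and values below are hypothetical.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical mapped output: the same college key can appear several times.
pairs = [("JIET", 4), ("JECRC", 3), ("JIET", 7), ("JECRC", 7)]

# Group: sort by key so equal keys become adjacent, then group the runs.
pairs.sort(key=itemgetter(0))
totals = {
    college: sum(value for _, value in group)
    for college, group in groupby(pairs, key=itemgetter(0))
}
print(totals)  # {'JECRC': 10, 'JIET': 11}
```

Note that `itertools.groupby` only merges *adjacent* equal keys, which is why the sort must come first.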

One of the primary applications of distributed computing is processing big data efficiently. In MapReduce, large datasets are abstracted as collections of key-value pairs, where each pair consists of a key (RK) and a corresponding value (VA).

# Example of a key-value pair: a college name as the key, its numerical value as the value
RK = 'JECRC'
VA = 10

The Map Function

The map function f takes an input pair (RK, VA) and produces a new set of key-value pairs, denoted (RK1, VA1), (RK2, VA2), and so forth. In the spirit of functional programming, f does not modify its input; it transforms each input pair into zero or more output pairs.

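# Example of map function application
def map_function(RK, VA):
    # Transform one input pair (RK, VA) into a list of output pairs.
    # Placeholder logic: emit the input pair once, unchanged.
    # A real mapper may emit several pairs whose keys differ from RK.
    return [(RK, VA)]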
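As a concrete instance of a map function whose output keys differ from its input key, consider building an inverted index, an illustrative task not taken from this article: the input pair is (document id, text) and each output pair is (word, document id). `index_map` is a hypothetical name.

```python
def index_map(doc_id, text):
    """Map one document to (word, doc_id) pairs.

    The input key is a document id, but every output key is a word,
    so the emitted keys differ from the original key.
    """
    # set() emits each word at most once per document; sorted() fixes the order.
    return [(word, doc_id) for word in sorted(set(text.lower().split()))]


pairs = index_map("doc1", "Map emits pairs map")
print(pairs)  # [('emits', 'doc1'), ('map', 'doc1'), ('pairs', 'doc1')]
```

After grouping, each word would map to the list of documents that contain it.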

Key-Value Pairs

In MapReduce, keys play a role similar to keys in a hash map: they determine which group a value belongs to, and they are treated as immutable. Unlike a hash map, however, the map output may contain the same key many times. The keys the map function produces (RK1, RK2, etc.) may also differ from the original input key RK.

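# Example of key-value pairs emitted by the map phase
# (a list rather than a dict, because the same key may appear more than once)
key_value_pairs = [
    ('RK1', 'VA1'),
    ('RK2', 'VA2'),
    ('RK1', 'VB1'),
]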
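Because the same key can be emitted many times, the map output behaves like a list of pairs rather than a hash map. This short check shows what goes wrong if the pairs are stored in a Python dict instead.

```python
# Pairs emitted by two map calls; the key 'RK1' appears twice.
emitted = [("RK1", "VA1"), ("RK2", "VA2"), ("RK1", "VB1")]

as_dict = dict(emitted)   # later pairs overwrite earlier ones with the same key
print(as_dict)            # {'RK1': 'VB1', 'RK2': 'VA2'}  ('VA1' is lost)
print(len(emitted))       # 3: a list keeps every emitted pair
```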

Grouping and Reducing

After mapping, the next step is grouping, where pairs with the same key are collected together. If two emitted pairs (RK1, VA1) and (KB1, VB1) have identical keys, that is, RK1 equals KB1, their values VA1 and VB1 are grouped under that key.

# Example of grouping: each key maps to the list of all values emitted with it
grouped_values = {
    'RK1': ['VA1', 'VB1'],
    'RK2': ['VA2'],
}
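In a distributed run, grouping also decides which reducer receives each key. The original MapReduce paper's default partitioning rule is hash(key) mod R, where R is the number of reducers. The sketch below applies that rule before grouping; `partition`, `shuffle`, and the use of CRC-32 as the hash are illustrative assumptions.

```python
import zlib
from collections import defaultdict

def partition(key, num_reducers):
    # crc32 gives a stable hash across runs (Python's built-in hash() for
    # strings is randomized per process), so each key always lands on the
    # same reducer.
    return zlib.crc32(key.encode("utf-8")) % num_reducers

def shuffle(pairs, num_reducers):
    """Route each pair to a reducer, then group values by key within it."""
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for key, value in pairs:
        buckets[partition(key, num_reducers)][key].append(value)
    return buckets

buckets = shuffle([("RK1", "VA1"), ("RK2", "VA2"), ("RK1", "VB1")], num_reducers=2)
for reducer_id, groups in enumerate(buckets):
    print(reducer_id, dict(groups))
```

Since every pair with a given key goes to the same bucket, each reducer sees the complete list of values for the keys it owns.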

Following grouping, the reduce operation combines values with the same key into a single value. For instance, a reduction operation like summation aggregates values under the same key, producing a consolidated result.

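# Example of reduce operation
def reduce_function(key, values):
    # Combine all values grouped under one key into a single result.
    # Summation is the example reduction here, so values must be numeric.
    aggregated_value = sum(values)
    return key, aggregated_value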
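The name "reduce" comes from the fold operation found in functional languages. Python's `functools.reduce` performs such a fold, so a reduce function can be written as a fold over one key's values. `reduce_sum` and `reduce_max` are hypothetical names; the sample values are the factors of 12 (excluding 1) used later in this article.

```python
from functools import reduce
import operator

def reduce_sum(key, values):
    # Fold the list left to right: ((0 + v1) + v2) + v3 ...
    return key, reduce(operator.add, values, 0)

def reduce_max(key, values):
    # A different reduction over the same grouped values.
    return key, reduce(max, values)

print(reduce_sum("RKS", [2, 3, 4, 6, 12]))  # ('RKS', 27)
print(reduce_max("RKS", [2, 3, 4, 6, 12]))  # ('RKS', 12)
```

The initial value `0` in `reduce_sum` lets it handle an empty list; `reduce_max` has no sensible initial value and would raise on an empty list.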

Example Illustration

Let’s illustrate this concept with a concrete example: the undergraduate colleges at RK University. Each college, such as JECRC and JIET, is assigned a numerical value.

# Example input data
college_data = {
    'JECRC': 10,
    'JIET': 11,
    'RKS': 12,
    'VYAS': 13
}

Applying Map and Reduce

We start with a map operation that enumerates each college’s value into its factors, excluding 1. Then we perform a reduce operation, summation, to add up the factors for each college.


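# Example of map operation: emit (college, factor) for every factor greater than 1
def map_function(RK, VA):
    return [(RK, factor) for factor in range(2, VA + 1) if VA % factor == 0]

mapped_data = [pair for RK, VA in college_data.items() for pair in map_function(RK, VA)]
# Factors emitted for each college:
#    'JECRC': 10 -> 2, 5, 10
#    'JIET':  11 -> 11
#    'RKS':   12 -> 2, 3, 4, 6, 12
#    'VYAS':  13 -> 13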
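Enumerating factors naively means testing every candidate from 2 up to the value itself. Because divisors come in pairs (d, n / d), it is enough to test candidates up to the square root of n, which matters when the values are large. `factors_above_one` is a hypothetical helper that could be used inside the mapper; like the example above, it excludes 1.

```python
import math

def factors_above_one(n):
    """Return the factors of n that are greater than 1, in ascending order."""
    small, large = [], []
    # Divisors come in pairs (d, n // d); only d <= sqrt(n) needs checking.
    for d in range(1, math.isqrt(n) + 1):
        if n % d == 0:
            small.append(d)
            if d != n // d:          # avoid listing a square root twice
                large.append(n // d)
    divisors = small + large[::-1]   # ascending order
    return [d for d in divisors if d > 1]

print(factors_above_one(12))  # [2, 3, 4, 6, 12]
```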

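# Example of reduce operation: group the factors by college, then sum each group
grouped = {}
for college, factor in mapped_data:
    grouped.setdefault(college, []).append(factor)
reduced_data = {college: sum(factors) for college, factors in grouped.items()}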
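Because summation is associative and commutative, partial sums can be computed on each mapper before anything is sent to the reducers. The original MapReduce design calls this optional step a combiner; it reduces the number of intermediate pairs transferred over the network without changing the final result. The split of RKS’s factors into two chunks below is hypothetical, and `combine` is an illustrative name.

```python
from collections import Counter

def combine(mapped_chunk):
    """Pre-sum one mapper's output locally, emitting one pair per key."""
    partial = Counter()
    for college, factor in mapped_chunk:
        partial[college] += factor
    return list(partial.items())

# Hypothetical split: two mappers each saw part of the factor pairs for RKS.
chunk_a = [("RKS", 2), ("RKS", 3), ("RKS", 4)]
chunk_b = [("RKS", 6), ("RKS", 12)]

combined = combine(chunk_a) + combine(chunk_b)   # 2 pairs instead of 5
total = Counter()
for college, partial_sum in combined:            # the reducer sums partial sums
    total[college] += partial_sum
print(combined)     # [('RKS', 9), ('RKS', 18)]
print(dict(total))  # {'RKS': 27}
```

A combiner is only safe for reductions like this one, where combining partial results gives the same answer as reducing everything at once.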

Output Analysis

Upon completion, we obtain the results of the reduce phase: the aggregated value for each college. In this scenario, RKS College has the highest aggregate, 27, the sum of its factors 2 + 3 + 4 + 6 + 12.


# Example output data
reduced_data = {
    'JECRC': 17,
    'JIET': 11,
    'RKS': 27,
    'VYAS': 13
}
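The reduce output can be cross-checked without MapReduce at all. Summing every divisor of n gives the divisor function σ(n), and since 1 divides every n, the sum of the factors greater than 1 is σ(n) - 1. The sketch below computes σ(n) from the prime factorization; `sum_of_divisors` and `college_values` are illustrative names.

```python
def sum_of_divisors(n):
    """sigma(n) via prime factorization: product of (p**(a+1) - 1) // (p - 1)."""
    total, p = 1, 2
    while p * p <= n:
        a = 0
        while n % p == 0:          # strip out every factor p, counting them
            n //= p
            a += 1
        if a:
            total *= (p ** (a + 1) - 1) // (p - 1)
        p += 1
    if n > 1:                      # leftover prime factor with exponent 1
        total *= n + 1
    return total

college_values = {"JECRC": 10, "JIET": 11, "RKS": 12, "VYAS": 13}
# Factors greater than 1 sum to sigma(n) - 1, since 1 divides every n.
check = {college: sum_of_divisors(v) - 1 for college, v in college_values.items()}
print(check)  # {'JECRC': 17, 'JIET': 11, 'RKS': 27, 'VYAS': 13}
```

For example, 12 = 2² · 3, so σ(12) = (2³ - 1)/(2 - 1) · (3² - 1)/(3 - 1) = 7 · 4 = 28, and 28 - 1 = 27 matches the RKS total.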

Conclusion

In summary, MapReduce offers a powerful framework for processing big data built on two fundamental operations: mapping and reducing.

By specifying only these two functions, programmers can process vast datasets, while the framework takes care of splitting the input, grouping intermediate pairs by key, and distributing the work across machines.