MapReduce for performance

Using an algorithm to analyze big data better

Overview

There are 2 reasons you are reading this week’s post: either you are the teacher wanting to mark me on what I did in the last week, or you are interested in what I did/am doing in data science. Since you want to know, I will tell you. We continued to learn about how to analyze big data, and this week we were taught how to use an algorithm called MapReduce. This minimizes the time spent analyzing the data, so it gives better performance. As said last time, using these methods to reduce the data can lead to data being left out and possibly a misleading data representation.

What was learned

Are you still interested in this algorithm and how useful it really is? If that is the case, I will continue to explain. Firstly, you are still limited by performance (so if your computer is a potato, it will be faster than before but still slow). So first understand that its whole purpose is to reduce the dataset: it tries to make it smaller without losing too much information. Not going to lie, I don’t fully understand the actual method behind it. I don’t think we need to know everything that is happening, as for our next assignment we are recommended to use an external library to do it for us.

The usefulness is very high, because analyzing big data without a big computer is not easy. Unless you have a supercomputer with lots of processing power, big data is practically impossible to analyze. But even if you have such a computer, if the data is really large then it may still take a long time to analyze. So, if your data is taking a long time to load, then you may want to implement something to make the data smaller (such as MapReduce).
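To make the idea a bit more concrete, here is a minimal sketch of the MapReduce pattern in plain Python, counting words in a couple of made-up lines. It is only an illustration under my own assumptions: the function names (map_step, shuffle_step, reduce_step, map_reduce) and the sample data are hypothetical, everything runs on one machine, and it is not the external library recommended for the assignment.

```python
from collections import defaultdict


def map_step(line):
    """Map: turn one record (a line of text) into (key, value) pairs."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_step(pairs):
    """Shuffle: group together all values that share the same key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_step(key, values):
    """Reduce: collapse all the values for one key into a single result."""
    return key, sum(values)


def map_reduce(lines):
    """Run map, shuffle and reduce one after another in a single process."""
    pairs = [pair for line in lines for pair in map_step(line)]
    grouped = shuffle_step(pairs)
    return dict(reduce_step(key, values) for key, values in grouped.items())


# Hypothetical sample data, just for illustration.
lines = ["big data is big", "data science is fun"]
counts = map_reduce(lines)
print(counts)
```

The result is a small summary (one count per word) instead of the full text, which is the "make it smaller" part. In a real MapReduce setup the map and reduce steps would be spread across many cores or machines, which is where the time saving comes from. Here everything runs in one process, so it only shows the shape of the idea.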

Reflection

Did you understand the content?

It depends on what is meant by "did I understand the content". I understand the theory and what it is trying to do, but it sounds confusing to implement (and I don’t understand that part). In the upcoming week I should do some more research on it and find out more (and possibly even implement it in Python, though that is a stretch). This is also one of the first times this term that I don’t fully understand a topic (but I should understand how to use it, just not how to implement it from scratch).

How can you elaborate more?

As said in the previous question, I could have elaborated more by finding out more about this algorithm. This would give me a better understanding of the concept, so that I can write more about it and understand why it is doing what it is doing. If I am making a Python script with a library, I would then know more about what is happening when the visualization turns out very different. It would also let me try to overcome some of the bias involved in using this process.

What are your next steps?

My next steps are to be excited for the upcoming holidays, and to do some more learning about this. As said many times, I should learn more about this, so that I won’t have to worry about being confused (or just not knowing fully). There are also other assignments that have to be done over the holidays (so it isn’t really a rest time 😢). Other than that, I should continue to look back on previous tasks, so that when I get the assignment I have a better understanding of what to do.