Random Sampling

Used to make an analysis with a subset of a dataset

Overview

I heard that because you are reading this, you must be after this week's data science recap. I will gladly share it, but first a shout out to our sponsor for today's post, welcome brand name.... JK, there is no sponsor, but if there was I could have used random sampling to pick which one to feature (implying there were lots of requests), and/or to estimate how much money they were willing to spend. It is not a perfect example, since random sampling is mostly used on 'big data' and not many brands know about me, but the concept is the same. All it does is get an idea of the data by taking a subset of it (selected randomly) and then analyzing that subset like small data (i.e. normally).

What was learned

The basic method (Simple Random Sampling) gives every element of the data an equal chance of being chosen. It is also the easiest, and it is what most people picture when they think of random sampling.
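Here is a minimal sketch of what that could look like in Python, using the built-in random module; the dataset and sample size are made up for illustration.

```python
import random

# Hypothetical dataset: pretend these are one million records
data = list(range(1_000_000))

# Simple random sampling: every element has an equal chance of being picked
sample = random.sample(data, k=100)

print(sample[:5])  # a peek at the first few sampled elements
```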

A more complex method is called Stratified Random Sampling. This splits the data into groups (strata) and then samples from each group. It is not that different from the basic method, because within each group the elements still have an equal chance of being chosen. However, this method can give the smaller groups a bigger voice, which can skew the results if you are not careful.
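Below is one simple way to sketch that in Python, matching the equal-sample-per-group idea described above; the group labels and sizes are invented for the example.

```python
import random

# Hypothetical dataset: records tagged with a group (stratum),
# where group "C" is deliberately tiny
data = [{"group": g, "value": i}
        for i, g in enumerate(["A"] * 900 + ["B"] * 90 + ["C"] * 10)]

# Split the data into strata by group label
strata = {}
for record in data:
    strata.setdefault(record["group"], []).append(record)

# Draw the same number of records from every stratum; this is
# exactly what gives a small group like "C" a bigger voice
sample = []
for group, records in strata.items():
    sample.extend(random.sample(records, k=min(5, len(records))))

print(len(sample))  # up to 5 records per group
```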

If you want to be fancier you can use the Cluster Random Sampling method. This separates all the data elements into clusters (ideally each cluster contains some members from every group), and then selects whole clusters at random.
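Here is a rough sketch of the idea in Python; the cluster names and contents are invented, and in practice clusters often come from something natural like geography.

```python
import random

# Hypothetical data already divided into clusters (e.g. by region);
# ideally each cluster contains a mix of the different groups
clusters = {
    "north": list(range(0, 100)),
    "south": list(range(100, 200)),
    "east":  list(range(200, 300)),
    "west":  list(range(300, 400)),
}

# Cluster sampling: pick whole clusters at random, then keep
# every element inside the chosen clusters
chosen = random.sample(list(clusters), k=2)
sample = [item for name in chosen for item in clusters[name]]

print(chosen, len(sample))
```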

Or, maybe you want something back to basics but with order. This is where Systematic Random Sampling comes in: like the name suggests, it does not choose the data completely at random but rather follows a pattern (i.e. every nth element, usually from a random starting point).
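A quick sketch, again in Python; the dataset size and step are arbitrary choices for the example.

```python
import random

# Hypothetical dataset of 1,000 records
data = list(range(1_000))

# Systematic sampling: start at a random offset within the first n
# elements, then take every nth element after that
n = 50
start = random.randrange(n)
sample = data[start::n]

print(len(sample))  # 1000 / 50 = 20 elements
```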

But why? One of these methods needs to be used because humans can't comprehend that much data (remember, we are talking about big data), and otherwise getting an idea of the data would be hard. You just need to choose which one, since some of them are harder to set up or more expensive in resources.

Reflection

How can you elaborate more?

I most definitely could have elaborated more than I did last week. We had a task to complete, but I didn't do much of it, and what I did was not done that well. We were also given two weeks for it (I only looked at it in the second week because 1. I had the exam and 2. I thought I had heaps of time). Overall, there is a significant amount of room for improvement, but that should happen over time (I should elaborate more in everything; just compare my first blog post to now, big change!)

What made you curious?

Learning about the many ways that big data can be analyzed was interesting. I got curious because originally there was only one way (that I knew of), which was to spend lots of time going through the large amounts of data to get an idea of it. But now, with the new knowledge, I can choose the method that is best suited to the result I want. I also wonder how many other ways there are to analyze big data.

Which activities helped you learn the most?

The activities that helped me learn the most were looking over the material from class. In this instance, that meant looking over the PowerPoint slides that contained the reasoning and instructions on how to analyze data with those many methods. It is good because when I re-read the slides I can sometimes remember how the teacher explained a particular one while simply reading the text on that slide.