Although the “big data” buzzword has been around for a couple of years, there’s still a lot of confusion about what exactly it is, how it can help businesses, and what people working on big data actually do. From Wikipedia:
Big data is a broad term for data sets so large or complex that traditional data processing applications are inadequate.
Today, you can use freely available technologies to process and analyse very large data sets and extract information from them. For example, some of the applications I worked on involved analysing network packets from all the routers of a large telco. Although the data is sampled, it is still huge: on the order of terabytes per day.
Given that you can process such volumes of data fairly cheaply, the hardest question you need to answer as a business is:
What do I want to know from my data?
Once you figure out what information you need, the next questions to answer are:
- What algorithms & techniques can I use to get the information I need from my raw data? The people who can help you here are data scientists and statisticians.
- What technologies can I use to implement and run my algorithms? Some of the popular technologies here are Hadoop, Hive, Spark, Elasticsearch, etc. You would expect the software engineers on your big data team to be comfortable with one or more of these.
If you are considering using big data technologies at your company and are planning to hire data scientists and/or software engineers, the Toptal blog recently published a detailed big data hiring guide that outlines the concepts and technologies you would expect a potential hire to know. It covers the techniques that data scientists should be aware of and some of the technologies that a big data software engineer should be comfortable with. They’ve also included a few sample interview questions with answers, which give you an idea of the range of topics on which to evaluate candidates.
I would highly recommend giving it a read.