发布时间: 2015-09-22 阅读数: 918

本文翻译自:9 Must-Have Skills to Land Top Big Data Jobs in 2015 by datanami (datanami 是著名的大数据网站) 作者:@善良的右行 from NLPJob翻译小组

The secret is out, and the mad rush is on to leverage big data analytics tools and techniques for competitive advantage before they become commoditized. If you’re in the market for a big data job in 2015, these are the nine skills that will garner you a job offer.

在大数据商品化之前, 利用大数据分析工具和技术来取得竞争优势已不再是秘密。2015年, 如果你还在职场上寻找大数据的相关工作, 那么, 这里介绍的9种技能,将帮助你得到一个工作机会。

1. Apache Hadoop

Sure, it’s entering its second decade now, but there’s no denying that Hadoop had a monstrous year in 2014 and is positioned for an even bigger 2015 as test clusters are moved into production and software vendors increasingly target the distributed storage and processing architecture. While the big data platform is powerful, Hadoop can be a fussy beast and requires care and feeding by proficient technicians. Those who know there way around the core components of the Hadoop stack–such as HDFS, MapReduce, Flume, Oozie, Hive, Pig, HBase, and YARN–will be in high demand.

Hadoop现在已经进入第二个10年发展期了, 但不可否认的是, Hadoop在2014年出现了井喷式发展, 由于Hadoop从测试集群向生产和软件供应商方向不断转移, 其越来越接近于分布式存储和处理机架构, 因此, 这一势头在2015年会更加猛烈。由于大数据平台的强大, Hadoop可能是一个挑剔的怪兽, 它需要熟悉的技术人员细心的照顾和喂养。掌握Hadoop最核心技术 (例如, HDFS, MapReduce, Flume, Oozie, Hive, Pig, HBase, and YARN) 的技术人员在职场上的需求将越来越大。


2. Apache Spark

If Hadoop is a known quantity in the big data world, then Spark is a black horse candidate that has the raw potential to eclipse its elephantine cousin. The rapid rise of the in-memory stack is being proffered as a faster and simpler alternative to MapReduce-style analytics, either within a Hadoop framework or outside it. Best positioned as one of the components in a big data pipeline, Spark still requires technical expertise to program and run, thereby providing job opportunities for those in the know.

如果说Hadoop在大数据世界中已广为人知, 那么Spark就是一匹黑马, 它所蕴含的原始潜力使Hadoop黯然失色。无论是否是Hadoop架构, 快速崛起的内存计算技术被认为是MapReduce风格分析框架更快和更简洁的替代方案。Spark最佳的定位应当是大数据技术族中重要的一个成员。Spark仍然需要专业技术进行编程和运行, 这为知晓该技术的工程师提供了不错的工作机会。


3. NoSQL

On the operational side of the big data house, distributed, scale-out NoSQL databases like MongoDB and Couchbase are taking over jobs previously handled by monolithic SQL databases like Oracle and IBM DB2. On the Web and with mobile apps, NoSQL databases are often the source of data crunched in Hadoop, as well as the destination for application changes put in place after insight is gleaned from Hadoop. In the world of big data, Hadoop and NoSQL occupy opposite sides of a virtuous cycle.

在大数据的操作层面, 诸如 MongoDB 和 Couchbase 等分布式、可扩展的 NoSQL 数据库正在接管市场份额极为庞大的的 SQL 数据库, 例如 Oracle 和 IBM DB2。在 WEB 和移动 app 层面, NoSQL数据库常常被做为 Hadoop分析的数据源。在大数据领域, Hadoop 和 NoSQL 分别成为良性循环的两个端点。


4. Machine Learning and Data Mining(机器学习和数据挖掘)

People have been mining for data as long as they’ve been collecting it. But in today’s big data world, data mining has reached a whole new level. One of the hottest fields in big data last year is machine learning, which is poised for a breakout year in 2015. Big data pros who can harness machine learning technology to build and train predictive analytic apps such as classification, recommendation, and personalization systems are in super high demand, and can command top dollar in the job market.

人们习惯于对收集的数据进行挖掘,但是, 在当今大数据的世界里, 数据挖掘已经达到了一个全新的高度。机器学习成为去年大数据技术最热门的领域之一,2015年顺理成章地成为它的突破之年。大数据将会使那些能够利用机器学习技术去构建和训练像分类、推荐和个性化系统等预测分析应用程序的人成为职场宠儿, 取得就业市场上的顶级薪金。


5. Statistical and Quantitative Analysis(统计和定量分析)

This is what big data is all about. If you have a background in quantitative reasoning and a degree in a field like mathematics or statistics, you’re already halfway there. Add in expertise with a statistical tool like R, SAS, Matlab, SPSS, or Stata, and you’ve got this category locked down. In the past, most quants went to work on Wall Street, but thanks to the big data boom, companies in all sorts of industries across the country are in need of geeks with quantitative backgrounds.

这就是大数据。如果你有定量推理背景和数学或统计学等方面的学位,那么你就成功了一半。此外,再加上一些使用统计工具经验,例如 R, SAS, Matlab, SPSS, 或者是 Stata, 你就能够锁定这些工作岗位啦。在过去,许多量化工程师都会选择在华尔街工作, 但由于大数据的快速发展, 现在各行各样都需要大量的具有定量分析背景的 极客。


6. SQL

The data-centric language is more than 40 years old, but the old grandpa still has a lot of life yet in today’s big data age. While it won’t be used with all big data challenges (see: NoSQL above), the simplify of Structured Query Language makes it a no-brainer for many of them. And thanks to initiatives like Cloudera‘s Impala, SQL is seeing new life as the lingua franca for the next-generation of Hadoop-scale data warehouses.

以数据为中心的语言已有超过40年的历史了, 但是这种祖父级的语言在当前的大数据时代仍然具有生命力。尽管它难以应对大数据的挑战 (见上文NoSQL部分), 但是, 简化了的结构化语言使其在许多方面变得十分容易。同时应该感谢来自于Cloudera所发布的Impala等开源项目, SQL获得了新生, 成为下一代Hadoop规模的数据仓库的通用语言。

7. Data Visualization(数据可视化)

Big data can be tough to comprehend, but in some circumstances there’s no replacement for actually getting your eyeballs onto data. You can do multivariate or logistic regression analysis on your data until the cows come home, but sometimes exploring just a sample of your data in a tool like Tableau or Qlikview can tell you the shape of your data, and even reveal hidden details that change how you proceed. And if you want to be a data artist when you grow up, being well-versed in one or more visualization tools is practically a requirement.

大数据可能不是那么容易理解, 但在某些情况下, 通过鲜活的数据吸引眼球仍然是不可替代的方法。你可以一直采用多元或逻辑回归分析方法解析数据, 但是, 有时候使用类似 Tableau 或 Qlikview 这样的可视化工具探索数据样本能够直观的告诉你所拥有的数据的形态, 甚至是发现那些能够改变你处理数据方法的一些隐蔽细节。当然,如果你长大后想成为数据艺术家, 那么, 精通一个甚至是更多的可视化工具就是必不可少的了。

8. General Purpose Programming Languages

Having experience programming applications in general-purpose languages like Java, C, Python, or Scala could give you the edge over other candidates whose skill sets are confined to analytics. According to Wanted Analytics, there was a 337 percent increase in the number of job postings for “computer programmers” that required background in data analytics. Those who are comfortable at the intersection of traditional app dev and emerging analytics will be able to write their own tickets and move freely between end-user companies and big data startups.

在类似 Java, C, Python, 或 Scala 等通用语言中拥有编程应用经验能够使你相对于那些局限于分析技术的人更具有优势。根据 WantedAnalytics的统计,招聘具有数据分析背景的“计算机编程”职位的数量增长了337%。具有传统应用程序开发和新兴数据分析能力的人将会有极大的就业选择空间, 能够自由的在终端用户企业和大数据创业公司之间进行流动。


9. Creativity and Problem Solving(创造力和问题解决能力)

No matter how many advanced analytic tools and techniques you have on your belt, nothing can replace the ability to think your way through a situation. The implements of big data will inevitably evolve and new technologies will replace the ones listed here. But if you’re equipped with a natural desire to know and a bulldog-like determination to find solutions, then you’ll always have a job offer waiting somewhere.

无论你在高级分析工具和技术方面有多大优势,自主思考能力仍然是无可替代的。大数据处理工具会不可避免的进行演化发展,新技术会不断涌现并替代这里所列 出的技术。但是,如果你能出于本能的渴求新的知识,并且能够像猎犬一样发现问题 的解决方案,就会有大量的工作机会在等着你。




2015年07月23日更新 20990次阅读


2016年05月09日更新 14605次阅读

清华大学教授:大数据时代 统计学依

2015年08月07日更新 11272次阅读


2015年12月31日更新 10628次阅读

精通 R plot—第一部分:颜色

2016年01月20日更新 9409次阅读

非统计学专业的人该如何学习 R 语

2015年07月24日更新 8085次阅读


2015年02月20日发布 3225名学员


2015年08月30日发布 1677名学员


2016年05月10日发布 1555名学员


2016年06月03日发布 1157名学员


2016年11月08日发布 1017名学员


2015年12月17日发布 951名学员
登录 注册