What Does 2017 Have in Store for Apache Spark?

The past few years have seen the emergence of big data technologies. Whether you are a big data analytics company or an individual big data specialist, these advances have something to offer. With each new release, well-known big data platforms and vendors such as Hadoop and Hortonworks exceed the expectations of the companies that use big data services.

Apache Spark has been a reliable data processing software for a long time now. Companies that use big data services are loyal to Apache Spark because of its performance in processing large data sets, in both batch and streaming workloads. That is why all eyes are on what Apache Spark will offer next.

The News

Matei Zaharia, co-founder and Chief Technologist at Databricks, made some interesting announcements about upcoming Spark features at Spark Summit East 2017, held in Boston from February 7 to 9, 2017.

The Key Takeaways

Zaharia shared how Spark has grown over the last year, from tens of thousands to more than a hundred thousand meetup members. Industry giants such as Microsoft and Facebook have embraced Apache Spark, which is an achievement in itself.

You can watch the complete talk on YouTube. Here, I will highlight only the key points.

  • After introducing the three core focus areas (hardware, users, and applications), Zaharia talked about Project Tungsten, which is dedicated to innovation on the hardware-related side. The results of the research and work done under Project Tungsten have allowed Databricks to make useful updates in Spark 2.0.
  • He provided a glimpse into the work in progress on data transfer and on integration with deep learning libraries like Intel BigDL.
  • The primary languages used with Spark remain Java, Scala, Python, and R, along with SQL. The plan is to improve the experience with Python and R and to create single-node compatibility.
  • The highlight of the talk was the idea of creating a “single API for continuous apps.” For now, this remains a long-term goal.

Zaharia came across as a visionary with a plan. It will be interesting to see how the work in progress becomes a reality and how the big data world receives these changes. With so much to look forward to, let’s wait for the next Apache Spark release.

The big data community has evolved in the past few years and continues to grow quickly. When a large organization partners with a big data analytics company, it sees the benefit that analysis can bring to business decision-making. The number of such organizations is growing, and they are now sharing their stories as case studies that showcase the strength of big data. With Spark Summit East 2017 over, one thing is for sure: 2017 will be a breakthrough year for big data.