A big data architect is trained to describe the structure and behaviour of a big data solution and how that solution can be delivered using big data technology such as Hadoop. He or she needs hands-on experience with Hadoop applications (e.g. development, administration, configuration management, monitoring, debugging, and performance tuning).
A Big Data Architect is required in any organisation that wants to build a big data environment, whether on premises or in the cloud. The architect is the link between the needs of the organisation and the big data scientists and engineers. The big data solutions architect is responsible for managing the full life cycle of a Hadoop solution. This includes requirements analysis, platform selection, design of the technical architecture, application design and development, testing, and deployment of the proposed solution.
A Big Data Architect should generally have gained substantial experience in conventional solutions architecture before moving to big data solutions. Between 8 and 15 years of working experience is very common for this position. He or she needs experience with the major big data technologies such as Hadoop, MapReduce, Hive, HBase, MongoDB and Cassandra. Quite often, experience with related tools such as Impala, Oozie, Mahout, Flume, ZooKeeper and/or Sqoop is also required.
In addition to big data technologies, a big data solutions architect needs a firm understanding of major programming and scripting languages such as Java, Python and/or R, and of working in a Linux environment. He or she should also have experience working with ETL tools such as Informatica, Talend and/or Pentaho, and in designing solutions for multiple large data warehouses, with a good understanding of cluster and parallel architecture as well as high-scale or distributed RDBMS and/or NoSQL platforms. When the big data solution will be developed in the cloud, the big data solutions architect should have experience with one of the large cloud-computing infrastructure platforms, such as Amazon Web Services and its Elastic MapReduce service.
As may be clear by now, the Big Data Architect is a highly skilled architect with cross-industry, cross-functional and cross-domain know-how. He or she sketches the big data solution architecture, then monitors and governs its implementation. The design of the big data architecture is the basis of a big data platform, so the big data solutions architect should understand, and have experience with, the data security and privacy concerns that could arise and that should be addressed from the start.
