WHAT IS BOX?
Box is the world's leading Content Cloud. We are trusted by more than 115K organizations around the world today, including nearly 70% of the Fortune 500 and leaders across deeply regulated industries (such as AstraZeneca, JLL, and Nationwide), to protect their data, fuel collaboration, and power critical workflows with secure, enterprise AI.
By joining Box, you will have the unique opportunity to continue driving our platform forward. Content powers how we work. It's the billions of files and information flowing across teams, departments, and key business processes every single day: contracts, invoices, employee records, financials, product specs, marketing assets, and more. Our mission is to bring intelligence to the world of content management and empower our customers to completely transform workflows across their organizations. With the combination of AI and enterprise content, the opportunity has never been greater to transform how the world works together, and at Box you will be on the front lines of this massive shift.
Founded in 2005, Box is headquartered in Redwood City, CA, and we have offices across the United States, Europe, and Asia.
WHY BOX NEEDS YOU
The Data Engineering initiative inside Box is expanding, and this role will help build the data platform engineering features and capabilities of our cloud cost management platform.
In this role, you will work alongside our team to build data pipelines, support our product and analytics team members, data analysts, and data scientists on data initiatives, and ensure that optimal data delivery architecture is consistent across ongoing projects.
WHAT YOU'LL DO
Work with a team of high-performing data engineers and analysts to identify business opportunities and design and build scalable data solutions
Build and own data pipelines that clean, transform, and aggregate data from disparate sources
Create and maintain optimal data pipeline architecture
Assemble large, complex data sets that meet functional/non-functional business requirements
Identify, design, and implement internal process improvements: automating manual processes, optimizing data delivery, re-designing infrastructure for greater scalability
Build the infrastructure required for optimal extraction, transformation, and loading of data from a wide variety of data sources using GCP BigQuery and Spark (see the sketch after this list)
Build analytics tools that utilize the data pipeline to provide actionable insights into operational efficiency and other key business performance metrics
Work with stakeholders including the Executive, Product, Data and Design teams to assist with data-related technical issues and support their data infrastructure needs
Create data tools for analytics and data science team members that assist them in building and optimizing our product into an innovative industry leader
Influence across teams and other functions, and build best practices across the organization
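For illustration, a minimal sketch of the kind of Spark-to-BigQuery pipeline described above, assuming a Dataproc-style cluster with the spark-bigquery connector available; all bucket, project, table, and field names are hypothetical placeholders, not Box's actual stack:

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("cost-aggregation").getOrCreate()

# Extract: read raw usage records from a hypothetical GCS location.
raw = spark.read.json("gs://example-bucket/raw/usage/*.json")

# Transform: drop malformed rows and aggregate daily cost per service.
daily_cost = (
    raw.filter(F.col("cost").isNotNull())
       .withColumn("day", F.to_date("event_ts"))
       .groupBy("day", "service")
       .agg(F.sum("cost").alias("total_cost"))
)

# Load: write the aggregate to BigQuery via the spark-bigquery connector,
# staging through a temporary GCS bucket as the connector's indirect
# write method requires.
(daily_cost.write.format("bigquery")
    .option("table", "example_project.cost_mgmt.daily_service_cost")
    .option("temporaryGcsBucket", "example-staging-bucket")
    .mode("overwrite")
    .save())

A job like this would typically be submitted to Dataproc and scheduled by an orchestrator; the specifics here are assumptions for illustration only.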
WHO YOU ARE
5+ years of relevant industry or academic experience working with large amounts of data
Experience building and optimizing scalable data pipelines, architectures and data sets
Awareness of emerging technology trends and the ability to identify adoption opportunities that improve existing development processes
Expert in SQL
Experience with at least one of the following programming languages: Scala or Java
Experience with a scripting language: Python or Node.js
Experience with GCP (BigQuery, Dataproc, Dataflow/Fusion); a small BigQuery query sketch follows this list
Experience with big data tools: Hadoop, Spark, Kafka, etc.
Strong analytical skills related to working with structured and unstructured datasets
Experience supporting and working with cross-functional teams in a dynamic environment
Familiarity with virtualization/container abstractions and orchestration (Kubernetes, Docker, etc.)
Familiarity with visualization software: Tableau
Familiarity with frontend web frameworks: React
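As a companion to the pipeline sketch above, a hypothetical example of querying that BigQuery aggregate with the google-cloud-bigquery Python client; the project and table names remain placeholders:

from google.cloud import bigquery

client = bigquery.Client(project="example_project")

# Sum the last 30 days of cost per service from the hypothetical table
# populated by the pipeline sketched earlier.
sql = """
    SELECT service, SUM(total_cost) AS month_cost
    FROM `example_project.cost_mgmt.daily_service_cost`
    WHERE day >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
    GROUP BY service
    ORDER BY month_cost DESC
"""

for row in client.query(sql).result():
    print(f"{row.service}: {row.month_cost:.2f}")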