

A Beginner’s Guide to Apache Iceberg with Amazon S3 Tables
Learn Apache Iceberg and Amazon S3 Tables with this beginner's guide. Create real Iceberg tables, load data, query, update, and evolve schemas using SageMaker Unified Studio and Athena Spark. Discover ACID guarantees, time travel capabilities, and schema evolution without file rewrites. Perfect for analytics workloads requiring transactional reliability, point-in-time accuracy, and multi-engine support on S3. From the AWS Builder Center blog: https://builder.aws.com/content/3

David McAmis
Mar 29 · 1 min read


Analyzing Insurance Churn with SageMaker Data Agent, Amazon S3 Tables and Faker
Learn how to use Amazon SageMaker Data Agent, Amazon S3 Tables, and the Python Faker library to generate realistic synthetic insurance data for churn analysis, without exposing live policyholder records. This post walks through generating synthetic tables from scratch and creating synthetic twins from existing tables, then shows how to use plain-language prompts to analyze churn patterns across products, demographics, payment behavior, and customer interactions. From the AWS

David McAmis
Mar 22 · 1 min read


Federated Healthcare Analytics Across Snowflake and Amazon S3 Tables Using Iceberg
Learn how to run federated healthcare analytics across Snowflake and Amazon S3 Tables using Apache Iceberg — no ETL pipelines or data movement required. This guide demonstrates how to join governed patient demographics in Snowflake with ICU admissions and clinical measurements stored in S3 Tables, enabling mortality risk scoring, readmission prediction, and ventilator utilization analysis while keeping sensitive data in place and governance controls intact. From the AWS Build

David McAmis
Mar 14 · 1 min read


Building a Student 360 View with SageMaker Data Agent, Databricks Unity Catalog and S3 Tables -- Without Moving a Single Row of Data
Universities can build a Student 360 view, combining student information system data in Databricks Unity Catalog with learning management data in Amazon S3 Tables, without moving or duplicating data. Using catalog federation and Amazon SageMaker Unified Studio, institutions can query across federated datasets through natural language with Data Agent, enabling early intervention, improved retention, and personalized learning at scale. From the AWS Builder Center blog:...

David McAmis
Mar 11 · 1 min read


Building Scalable Synthetic Data Pipelines with Amazon S3 Tables, Apache Iceberg and Faker
Are you keen to try out Amazon S3 Tables? Learn how to generate millions of highly realistic synthetic financial transactions using Python's Faker library and AWS Glue, then store them in S3 Tables with Apache Iceberg for testing and development without exposing sensitive production data. This hands-on tutorial demonstrates partition optimization and reproducible data generation at scale. From the AWS Builder Center blog: https://builder.aws.com/content/3A5pA0YR3Ee4qM1Wuv59dz

David McAmis
Feb 28 · 1 min read


A Beginner’s Guide to Apache Parquet
Apache Parquet is a widely used file format in modern data analytics and data engineering. It is especially common in data lakes and data lakehouse architectures, where performance, scalability, and efficient storage are critical. This guide explains what Parquet is, where it came from, how it is structured internally, and how to query it effectively. From the AWS Builder Center Blog: https://builder.aws.com/content/38xMNi5KpMwMMVNvBaHPjEOlv8X/a-beginners-guide-to-apache-p

David McAmis
Jan 30 · 1 min read


Inside AWS Glue: Understanding the Spark Engine and Using the Spark UI for Troubleshooting
Learn how AWS Glue uses the Spark engine to create scalable, performant data pipelines, and how to monitor background processes using the Spark UI. From the AWS Builder Center blog: https://builder.aws.com/content/38Rp4bJuE5lSsX89iee3gjDeZkO/inside-aws-glue-understanding-the-spark-engine-and-using-the-spark-ui-for-troubleshooting

David McAmis
Jan 19 · 1 min read


A Beginner’s Guide to Orchestrating AWS Glue Jobs with Amazon Managed Workflows for Apache Airflow (MWAA)
If you’re building data pipelines on AWS, you’ve probably used AWS Glue to run ETL (Extract, Transform, Load) jobs. Glue automates data movement and transformation so you can focus on insights rather than infrastructure. But what if you need to schedule those jobs, run them in sequence, or trigger them based on the completion of other tasks? That’s where Amazon Managed Workflows for Apache Airflow (MWAA) comes in. This blog will provide everything you need to set up your firs

David McAmis
Jan 19 · 1 min read


Connecting to Salesforce Data Using AWS Glue
Integrating Salesforce data into your AWS analytics ecosystem is an essential step in building a comprehensive view of your customers, sales, and operations. With the growing number of options available in AWS for ingesting and transforming external data, it’s important to understand which approach best suits your needs—especially when comparing traditional Glue ETL jobs with newer Zero-ETL features. From the AWS Builder blog: https://builder.aws.com/content/2zqZrESSbXDWzl2ft

David McAmis
Nov 7, 2025 · 1 min read


Troubleshooting AWS Glue Jobs
AWS Glue is a powerful serverless ETL service designed to simplify data integration tasks at scale. However, like any data engineering tool, Glue jobs can—and will—fail due to a variety of issues: configuration errors, data mismatches, IAM permission problems, or underlying infrastructure limits. This post is a practical guide to troubleshooting AWS Glue jobs. From the AWS Builder Blog: https://builder.aws.com/content/2y4nDmkmTBfknTTWR5wwtfrbECQ/troubleshooting-aws-glue-jobs

David McAmis
Nov 7, 2025 · 1 min read


Ingest Excel Files into a Data Lake Using AWS Glue
As organizations modernize their data infrastructure, ingesting legacy Excel files into cloud-based data lakes is becoming increasingly important. Whether you’re dealing with departmental spreadsheets or externally sourced data, AWS Glue provides a serverless, low-code approach for transforming and loading Excel files from Amazon S3 into your data lake. From the AWS Builder blog: https://builder.aws.com/content/2y1jJ2tU85XtCpO5CkHsX1EfQNW/how-to-ingest-excel-files-into-a-dat

David McAmis
Jun 5, 2025 · 1 min read


AWS Machine Learning - The Art of the Possible (Twitch Series)
Hi everyone, I recently got to host an episode on "AWS Machine Learning - The Art of the Possible" on Twitch. The team covered a super...

David McAmis
May 4, 2021 · 1 min read


Snowflake + AWS Resources
Snowflake, the data warehouse built for the cloud, on AWS is an industry-leading platform for both advanced data analytics and machine...

David McAmis
Jul 31, 2020 · 2 min read


Generating Leads and Opportunities with Webinars: Getting Started
In this new article series on Medium, we will be looking at new ways of generating leads and opportunities for channel partners and how to a

David McAmis
Apr 22, 2020 · 1 min read


Getting Started with AWS Glue
AWS Glue is a fully managed, clusterless ETL service that allows you to quickly extract and prep data for a wide variety of use cases. In...

David McAmis
Nov 7, 2018 · 1 min read