Last Updated on May 12, 2025
Here we go a little deeper into the processes and services we may use to help your team leverage AI solutions on AWS.
But before we start, here are some things to consider to determine whether you're in a position to move into cost-effective AI development:
- Are you a small business looking to dip your toes in the world of Generative AI but have no idea where to start?
- Is your HR department planning a budget and gasping at the salaries of data scientists, machine learning operations specialists and software developers you will need to even think about deploying an AI MVP?
- Have you corresponded with large consulting agencies with an overhead that is pricing you out?
- Are you discussing plans with an offshore consulting agency whose business hours and ethos don't align with yours? Do you simply feel you can't trust them to get the job done in scope and on target?
- Do you want to learn more about how to harness the power of AI with Machine Learning, Conversational AI, Predictive Analytics, Computer Vision, Natural Language Processing, Business Intelligence or Recommendation Engines?
Why use Silver State Data Solutions?
Machine learning and artificial intelligence have emerged as game-changers in business and can uncover valuable insights, automate processes, and help your team make data-driven decisions.
At Silver State Data Solutions we focus on using automated, low-code/no-code solutions as much as possible to keep your costs at a minimum. The flexibility of AWS pay-as-you-go pricing and cost threshold alerts allows you to monitor every facet of development and cost-effectively build, train and deploy models for your business applications.
Here we outline some of the services and how we’ll use them.
Custom Model Development Overview
After identifying your business problem and determining a solution, there are four general domains involved with delivering a machine learning solution:
Data Processing – Collect, clean, and transform your data for feature engineering specific to your business problem.
Model Development – Define the architecture, pick the right model (or LLM) for your problem, then train, tune, build, deploy and evaluate it.
Deployment – Deploy trained models to production for predictions and inference; this involves choices about budget, hardware and compute power.
Monitoring – Early detection of deviations and timely mitigation (including retraining) are key to protecting your model from data drift so it continues to effectively answer your business needs.
Bedrock
With the Amazon Bedrock serverless experience, you don’t need to manage the infrastructure. For security, we use PrivateLink and IAM permissions to establish private connectivity between your foundation models and on-premises networks without exposing your traffic to the internet – and control which services can receive inferences.
Bedrock Agents are fully managed capabilities that let developers use AI to complete complex tasks across a wide range of use cases and deliver up-to-date answers based on proprietary knowledge sources. An agent securely connects to company data through an API, automatically converts the data into a machine-readable format, and augments the request with relevant information to generate the most accurate response.
Bedrock Knowledge Bases can give foundation models and agents contextual information from your company's private data sources for Retrieval Augmented Generation (RAG).
Fine Tuning. By providing your own labeled training dataset you can fine-tune a foundation model to improve the model’s performance on specific tasks. Bedrock makes a separate copy of the base foundation model and trains this private copy of the model on your data.
Continued pre-training adapts the model from a general domain to a more specific domain, such as medical, law, or finance – and is available for Amazon Titan models in Bedrock.
Some typical use cases for Bedrock are: text generation, virtual assistants, text and image search, text summarization and image generation – all with guardrails to keep answers safe and relevant to your data and your artificial intelligence policies.
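To give a sense of how little code a Bedrock call requires, here is a minimal sketch using the boto3 Converse API. The region, model ID and prompt are assumptions you would replace with a model enabled in your own account.

```python
import boto3

# Bedrock runtime client in the region where your chosen model is enabled (assumed region)
bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

response = bedrock.converse(
    modelId="amazon.titan-text-express-v1",  # assumed model ID; swap in any model enabled in your account
    messages=[{
        "role": "user",
        "content": [{"text": "Summarize our return policy for a customer in two sentences."}],
    }],
    inferenceConfig={"maxTokens": 256, "temperature": 0.2},
)

print(response["output"]["message"]["content"][0]["text"])
```

Because Bedrock is serverless, this is essentially the whole integration surface for simple text generation; Agents and Knowledge Bases layer retrieval and tool use on top of the same pattern.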
AWS offers model development options that cost-effectively simplify and automate the process of preparing, training and delivering AI models for your business's needs – without significant coding or data science skills. Below we outline a few of the AWS services we use to implement AI applications for small businesses.
Automated and No Code / Low Code Services
Canvas is a no-code visual interface that allows you to build and deploy models and start making predictions immediately.
Amazon SageMaker Autopilot is a set of capabilities designed to democratize AI for your small business by automating the entire machine learning process, from data preprocessing and feature engineering to model training, tuning, deployment and monitoring.
Autopilot automates the key tasks of an automated ML (AutoML) process, like exploring data, selecting the algorithm relevant to your problem type, and then training and tuning it.
These services allow small businesses like yours to cost-effectively build and deploy high-quality models in a robust machine learning pipeline with minimal expertise and effort, making AI accessible for a wide range of productivity-enhancing tasks, including content writing, code generation, question-and-answer chatbots, copywriting, summarization, classification, and information retrieval.
Data Preparation & Transformation
Clean, organized data is key to training a model well, and Data Wrangler provides an intuitive ETL (extract, transform, load) pipeline that integrates seamlessly with Amazon S3, SageMaker Pipelines, and other AWS services – reducing the need for manual coding or operational overhead. Data Wrangler is suitable for quick, Python-based data manipulation tasks, integrating AWS services into data science workflows, or prototyping, especially when your data is destined for SageMaker.
It provides quick data import, visualization, transformation and includes more than 300 built-in transformations to process and prepare your data for model training – all without needing to code data preparation steps. You can quickly cleanse and graphically understand your data and detect outliers with preconfigured visualization templates.
If your data needs more refining, Glue is a fully managed ETL service designed for data processing tasks like custom ETL jobs or periodic processing of large datasets. Glue also auto-generates Python code to handle concerns like distributed processing, scheduling, and integration with data sources when necessary. Glue DataBrew and Glue Studio provide a visual interface for creating ETL workflows, again with no code required. Glue is appropriate for large-scale, production-grade ETL pipelines or when you need serverless scalability and built-in metadata cataloging.
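To make the Glue workflow concrete, here is a minimal sketch of a Glue ETL script; the "sales_db" catalog database, "raw_orders" table and output bucket are hypothetical placeholders, and Glue Studio can generate similar code for you.

```python
import sys
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.transforms import ApplyMapping
from awsglue.utils import getResolvedOptions

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read a table registered in the Glue Data Catalog (hypothetical database/table names)
orders = glue_context.create_dynamic_frame.from_catalog(
    database="sales_db", table_name="raw_orders"
)

# Cast and rename columns so they are ready for model training
cleaned = ApplyMapping.apply(
    frame=orders,
    mappings=[
        ("order_id", "string", "order_id", "string"),
        ("amount", "string", "amount", "double"),
        ("order_date", "string", "order_date", "timestamp"),
    ],
)

# Land the cleaned data in S3 as Parquet for SageMaker to pick up (assumed bucket)
glue_context.write_dynamic_frame.from_options(
    frame=cleaned,
    connection_type="s3",
    connection_options={"path": "s3://your-training-data-bucket/clean/orders/"},
    format="parquet",
)
job.commit()
```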
More Services for Data Processing, Integrity & Validation
Glue DataBrew – Automatically generate statistics and visualizations to understand your data quality, and access over 350 pre-built transformations for common data-quality tasks. You can also create custom Python transformations to handle complex data-quality needs.
Glue Data Quality – Define custom rules using SQL to check for data quality issues, schedule checks to run automatically on a recurring basis, and visualize data quality metrics and trends through interactive dashboards.
Data Catalog – the persistent metadata store for your data assets, regardless of where they are located. It contains table definitions, job definitions, schemas, and other control information to help you manage your Glue environment. You can also change data types and column names, add or remove columns, and then view these changes in Athena – an interactive query service for analyzing data in S3 using SQL.
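For example, once a table is in the Data Catalog, you can kick off a query against it with a few lines of boto3; the database, table and results bucket below are placeholders.

```python
import boto3

athena = boto3.client("athena", region_name="us-east-1")

# Run a SQL query against a Data Catalog table (hypothetical database, table and results bucket)
query = athena.start_query_execution(
    QueryString="SELECT product_category, COUNT(*) AS orders "
                "FROM raw_orders GROUP BY product_category",
    QueryExecutionContext={"Database": "sales_db"},
    ResultConfiguration={"OutputLocation": "s3://your-query-results-bucket/athena/"},
)
print("Query execution ID:", query["QueryExecutionId"])
```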
Explainability and Model Behavior
Autopilot Explainability – helps provide insights into how machine learning (ML) models make predictions so your stakeholders can understand model characteristics. It uses SageMaker Clarify and Shapley values to give transparency into how the model arrives at predictions, offering clarity on potential bias that can be addressed with human intervention.
Comprehend – extracts meaning from your documents: it identifies named entities, detects language, and surfaces key topics, sentiment and themes present in your text data.
SageMaker Clarify – visually inspect your dataset and machine learning predictions for several types of bias, automatically detect data points that are statistical outliers, and generate feature importance and SHAP values explaining which features drive predictions, along with statistics and plots about data distribution, missing values and errors.
Databricks – a fast, easy and collaborative Apache Spark™-based analytics platform with one-click setup, streamlined workflows, and the scalability and security of AWS.
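As a rough sketch of how a SHAP explainability report could be generated with SageMaker Clarify, the snippet below assumes an already-deployed SageMaker model named "churn-model", a labeled validation CSV and an execution role; every name and path is a placeholder.

```python
from sagemaker import Session
from sagemaker.clarify import (
    DataConfig,
    ModelConfig,
    SageMakerClarifyProcessor,
    SHAPConfig,
)

session = Session()
role = "arn:aws:iam::123456789012:role/SageMakerExecutionRole"  # hypothetical role

processor = SageMakerClarifyProcessor(
    role=role, instance_count=1, instance_type="ml.m5.xlarge", sagemaker_session=session
)

data_config = DataConfig(
    s3_data_input_path="s3://your-training-data-bucket/validation/churn.csv",  # assumed dataset
    s3_output_path="s3://your-training-data-bucket/clarify-report/",
    label="churned",                 # assumed target column
    dataset_type="text/csv",
)
model_config = ModelConfig(
    model_name="churn-model",        # assumed existing SageMaker model
    instance_type="ml.m5.large",
    instance_count=1,
    accept_type="text/csv",
)
shap_config = SHAPConfig(num_samples=50, agg_method="mean_abs")

# Writes a SHAP-based feature-importance report to the output S3 path
processor.run_explainability(
    data_config=data_config,
    model_config=model_config,
    explainability_config=shap_config,
)
```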
Training Models with SageMaker Built-In Algorithms
Linear Learner: predicting customer churn
XGBoost: gradient-boosted trees that resist overfitting; a good choice for imbalanced data
K-Means Clustering: for grouping data
Random Cut Forest: for anomaly detection
Seq2Seq: sequence-to-sequence text tasks such as translation and summarization
PCA (Principal Component Analysis): dimensionality reduction while retaining the meaning of the data
NTM (Neural Topic Model): discovering the topics and meaning of documents
BlazingText: text classification
DeepAR: time-series forecasting, such as forecasting demand or stock prices
Factorization Machines: recommendation engines
Some other benefits of using SageMaker are that it provides a simple interface that lets you build and train machine learning models with just a few clicks, and a managed, pay-as-you-go, scalable infrastructure to run experiments and deploy endpoints without the overhead of setting up and maintaining the underlying infrastructure.
SageMaker integrates seamlessly with many AWS services, such as S3 and Lambda, making it easy to build end-to-end machine learning solutions. SageMaker also parallelizes training across compute instances, with GPU support right out of the box, reducing the time and cost to train and tune models at scale without the need to manage infrastructure.
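As an illustrative sketch, training with a built-in algorithm such as XGBoost comes down to pointing the managed container at data in S3; the role, bucket and hyperparameters below are assumptions for a churn-style binary classification job.

```python
import sagemaker
from sagemaker.estimator import Estimator
from sagemaker.inputs import TrainingInput

session = sagemaker.Session()
role = "arn:aws:iam::123456789012:role/SageMakerExecutionRole"  # hypothetical execution role

# Managed container image for the built-in XGBoost algorithm in this region
image_uri = sagemaker.image_uris.retrieve("xgboost", session.boto_region_name, version="1.7-1")

estimator = Estimator(
    image_uri=image_uri,
    role=role,
    instance_count=1,
    instance_type="ml.m5.large",
    output_path="s3://your-training-data-bucket/models/churn/",  # assumed bucket
    sagemaker_session=session,
)
estimator.set_hyperparameters(
    objective="binary:logistic",  # churn yes/no
    num_round=100,
    scale_pos_weight=5,           # illustrative weighting for an imbalanced dataset
)

# Train on CSV data with the label in the first column, as the built-in algorithm expects
estimator.fit({"train": TrainingInput("s3://your-training-data-bucket/train/", content_type="text/csv")})
```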
Machine Learning Pipeline
One of the core tenets of cloud computing is automating infrastructure provisioning through CloudFormation templates – Infrastructure as Code (IaC). These same standards of resource provisioning are applied throughout software development, and especially in machine learning.
Some of the tools we'll use to maintain the ML pipeline include: SageMaker Studio, S3, Athena, Glue, EMR, CodePipeline, CodeBuild, CloudFormation, Elastic Container Registry, Lambda, and Step Functions.
Security
To access your on-premises data and move it to S3 for storage, where we will start preparing it for model training, we first set up a Site-to-Site VPN (virtual private network) using strongSwan to allow private traffic from your site to your VPC (virtual private cloud). A VPC interface endpoint is used as the target within the VPC for traffic going to S3, and the endpoint routes that traffic across PrivateLink to keep data internal to the AWS infrastructure. The S3 buckets are then accessed through a URL, and we use Route 53 to provide DNS services so internal and external clients can resolve URLs to the needed IP address. A private hosted zone provides DNS services within the private network, and Route 53 can provide split-horizon DNS, which resolves a domain name to different IP addresses depending on the location of the requester.
Together, these services allow on-premises clients to request a DNS lookup of the S3 bucket URL from the on-premises DNS system. The lookup is answered by the Route 53 private zone, and once the on-premises client has the IP address, it sends traffic across the VPN to the interface endpoint in your VPC. That traffic then traverses AWS PrivateLink to reach S3 without exposing the bucket to the internet.
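For reference, the interface-endpoint piece of that design can be provisioned with a few lines of boto3 (or equivalent CloudFormation); the VPC, subnet and security group IDs below are placeholders, and DNS resolution is still handled by the Route 53 private zone described above.

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Interface endpoint that keeps S3 traffic on AWS PrivateLink (hypothetical resource IDs)
endpoint = ec2.create_vpc_endpoint(
    VpcEndpointType="Interface",
    VpcId="vpc-0abc1234def567890",
    ServiceName="com.amazonaws.us-east-1.s3",
    SubnetIds=["subnet-0abc1234def567890"],
    SecurityGroupIds=["sg-0abc1234def567890"],
)
print("Endpoint ID:", endpoint["VpcEndpoint"]["VpcEndpointId"])
```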
Encryption & Data Storage Options:
KMS – used to encrypt data stored in a variety of AWS services, including Amazon SageMaker and Amazon Redshift.
EBS – enable volume encryption.
SSE-C – gives the customer more control over the key management process.
RDS encryption – when enabled, all new data written to the underlying storage is encrypted.
Data-at-rest encryption with KMS – enabled on Redshift, ElastiCache, SageMaker notebooks and training job storage (a minimal sketch follows this list).
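As a minimal example of the KMS piece, default SSE-KMS encryption can be switched on for the training-data bucket in one call; the bucket name and key ARN are placeholders.

```python
import boto3

s3 = boto3.client("s3")

# Require SSE-KMS for all new objects in the training-data bucket (hypothetical bucket and key)
s3.put_bucket_encryption(
    Bucket="your-training-data-bucket",
    ServerSideEncryptionConfiguration={
        "Rules": [
            {
                "ApplyServerSideEncryptionByDefault": {
                    "SSEAlgorithm": "aws:kms",
                    "KMSMasterKeyID": "arn:aws:kms:us-east-1:123456789012:key/your-key-id",
                },
                "BucketKeyEnabled": True,  # reduces KMS request costs
            }
        ]
    },
)
```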
- IAM Access Analyzer: helps generate least-privilege policies.
- Network Isolation: ensures secure training and inference containers and blocks internet access for your ML workflow by using a VPC (virtual private cloud) with VPC endpoints to securely access data stored in S3.
- Internet Access: notebooks are internet-enabled by default, so we disable this as described above and use a PrivateLink interface endpoint or a NAT Gateway to allow outbound connections for training and hosting.
Model Development with AutoPilot
SageMaker Autopilot is a fully managed model development and tuning service that evaluates multiple candidate models to determine their predictive power for your needs with minimal manual effort. Just set up the wizard and it automates algorithm selection, data preprocessing, tuning, infrastructure and trial-and-error, and creates an optimized model for your specific needs – essentially giving you machine-learning-expert results without the experience.
Start by loading data from S3, selecting the target column and choosing automatic model creation; Autopilot then creates a model notebook which can be used as your starting point. Once a candidate is selected, you deploy and monitor the model, and you can refine the notebook and add human guidance, all with the flexibility of working with or without code.
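The same workflow can also be driven from the SageMaker Python SDK; the sketch below assumes a customer CSV in S3 with a "churned" target column and an execution role, all of which are placeholders.

```python
from sagemaker.automl.automl import AutoML

role = "arn:aws:iam::123456789012:role/SageMakerExecutionRole"  # hypothetical role

automl = AutoML(
    role=role,
    target_attribute_name="churned",                        # assumed target column
    output_path="s3://your-training-data-bucket/autopilot/",
    max_candidates=10,                                      # cap trials to control cost
)

# Autopilot explores algorithms and hyperparameters for you
automl.fit(
    inputs="s3://your-training-data-bucket/train/customers.csv",
    job_name="churn-autopilot",
    wait=False,
)

# Once the job finishes, the best candidate can be deployed behind an endpoint:
# predictor = automl.deploy(initial_instance_count=1, instance_type="ml.m5.large")
```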
For cost and infrastructure choices, we use Inference Recommender to analyze your ML model and workload patterns, and provide recommendations for the best compute resources and configurations.
SageMaker Autopilot also automatically generates Autopilot Explainability reports, using Clarify and Shapley values to give transparency into how it arrives at predictions – offering clarity on potential bias, which can be addressed with human intervention.
With Quick Model you can test how your choices actually affect the model down the pipeline, export code to a Jupyter notebook, and generate Python code for your pipeline that performs the data transformations.
Another no/low-code solution is Canvas, which allows you to just upload data, select the column you want to predict, build the model and immediately start making predictions. It offers a variety of prebuilt Docker images with models you can use right out of the box or from within a Jupyter notebook, and you can quickly spin up servers and deploy endpoints with ease.
The Generative AI Application Builder on AWS is another solution that simplifies the development, rapid experimentation, and deployment of generative AI applications without requiring deep AI expertise. It allows users to easily incorporate business data, compare the performance of LLMs, run multi-step tasks through AI agents, and deploy applications with enterprise-grade architecture.
JumpStart offers a range of pre-built solutions, including models, notebooks, and algorithms for both machine learning and generative AI use cases. Pre-trained models – covering tasks like summarization and image generation – are fully customizable for your use case with your data. You can evaluate and compare models, quickly select one for your needs, and deploy it into production with just a few clicks from the console UI or the SDK.
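Deploying a JumpStart model from the SDK is similarly short; the model ID below is a placeholder – browse the JumpStart catalog in SageMaker Studio for the one that fits your use case.

```python
from sagemaker.jumpstart.model import JumpStartModel

role = "arn:aws:iam::123456789012:role/SageMakerExecutionRole"  # hypothetical role

# Hypothetical JumpStart model ID for a text-summarization model
model = JumpStartModel(model_id="huggingface-summarization-bart-large-cnn-samsum", role=role)

# Deploy behind a real-time endpoint using the model's default instance type
predictor = model.deploy()
print(predictor.predict({"inputs": "Meeting notes: the team agreed to ship the beta in June..."}))
```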
Deployment
SageMaker Pipelines provides an intuitive drag-and-drop UI and a Python SDK for creating, executing, scaling, monitoring and auditing end-to-end ML workflows – including deployment, retraining, experiments, testing and more. Fully automated: you just kick off a pipeline and SageMaker Pipelines manages all the steps for you, sending alerts (by way of EventBridge) when human intervention – such as an approval step – is needed.
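A bare-bones pipeline with a single training step looks roughly like this in the SDK; the role, bucket and XGBoost settings mirror the earlier training sketch and are assumptions, not a finished production pipeline.

```python
import sagemaker
from sagemaker.estimator import Estimator
from sagemaker.inputs import TrainingInput
from sagemaker.workflow.pipeline import Pipeline
from sagemaker.workflow.steps import TrainingStep

session = sagemaker.Session()
role = "arn:aws:iam::123456789012:role/SageMakerExecutionRole"  # hypothetical role

image_uri = sagemaker.image_uris.retrieve("xgboost", session.boto_region_name, version="1.7-1")
estimator = Estimator(
    image_uri=image_uri,
    role=role,
    instance_count=1,
    instance_type="ml.m5.large",
    output_path="s3://your-training-data-bucket/models/",
    sagemaker_session=session,
)
estimator.set_hyperparameters(objective="binary:logistic", num_round=100)

train_step = TrainingStep(
    name="TrainChurnModel",
    estimator=estimator,
    inputs={"train": TrainingInput("s3://your-training-data-bucket/train/", content_type="text/csv")},
)

pipeline = Pipeline(name="ChurnTrainingPipeline", steps=[train_step])
pipeline.upsert(role_arn=role)  # create or update the pipeline definition
pipeline.start()                # kick off an execution
```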
Some of the tools we may use in the CI/CD pipeline include:
Step Functions and state machines give you a visual interface to coordinate and manage the components of a task-driven workflow. Lambda, a pay-as-you-go code-running platform, can perform specific tasks during the deployment process. CodeBuild provides consistent builds and artifact creation.
CodeDeploy can be utilized as part of a broader CI/CD pipeline setup alongside services like CodePipeline and CodeBuild.
CloudFormation – a core tenet of AWS and cloud development – is the main tool of the IaC (infrastructure as code) method and is the key to consistent, transparent and automatic provisioning of system resources.
ECR helps organize and maintain container storage, and Model Registry is used for centralized model storage and versioning including Lineage Tracking.
Monitoring Model Performance & Detecting Drift
Model Monitor is specifically designed to detect data quality changes and data and model drift, and to alert you when they occur. Model Registry is used to catalog models for production, manage versions, share, deploy, build model libraries, and automate these tasks within the CI/CD pipeline.
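Here is a hedged sketch of wiring up Model Monitor for data-quality drift: it assumes a training CSV in S3 and an existing endpoint named "churn-endpoint" with data capture already enabled; all names are placeholders.

```python
from sagemaker.model_monitor import CronExpressionGenerator, DefaultModelMonitor
from sagemaker.model_monitor.dataset_format import DatasetFormat

role = "arn:aws:iam::123456789012:role/SageMakerExecutionRole"  # hypothetical role

monitor = DefaultModelMonitor(role=role, instance_count=1, instance_type="ml.m5.xlarge")

# Baseline statistics and constraints computed from the training data
monitor.suggest_baseline(
    baseline_dataset="s3://your-training-data-bucket/train/churn.csv",  # assumed training set
    dataset_format=DatasetFormat.csv(header=True),
    output_s3_uri="s3://your-training-data-bucket/monitoring/baseline/",
)

# Hourly comparison of captured endpoint traffic against the baseline
monitor.create_monitoring_schedule(
    monitor_schedule_name="churn-data-quality",
    endpoint_input="churn-endpoint",  # assumed endpoint with data capture enabled
    output_s3_uri="s3://your-training-data-bucket/monitoring/reports/",
    statistics=monitor.baseline_statistics(),
    constraints=monitor.suggested_constraints(),
    schedule_cron_expression=CronExpressionGenerator.hourly(),
)
```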
Model Cards store model information to help you centralize and standardize model documentation, while Lineage Tracking helps you investigate the pipelines and events that create model artifacts.
CloudWatch can monitor SageMaker metrics, which include pipeline execution metrics and step metrics for pipeline executions and steps.
SageMaker Debugger saves internal model state, such as gradients and tensors, so it can be compared against previous states. You can define rules for unwanted conditions beforehand, and it includes built-in rules for framework operations such as debugging model parameters and gaining insight into resource utilization and potential bottlenecks. Debugger saves logs and can fire CloudWatch events that send alerts via SNS when thresholds are met, and it also offers a dashboard view of these processes.
Conclusion
With this vast set of services and tools, AWS enables small businesses to harness the power of AI solutions with cost-effective, pay-as-you-go pricing, helping our team follow the AWS Well-Architected Framework to ensure your development process embraces the Six Pillars of architectural best practices:
- Operational Excellence: automation, scalability, monitoring and incident response.
- Security: protect data and systems from unauthorized access and malicious activity.
- Reliability: ability to recover from failures.
- Performance efficiency: meet workload requirements.
- Cost Optimization: right-sizing resources and optimizing costs.
- Sustainability: consider environmental impacts.
Please contact us to learn more and schedule a consultation today!