
Managed File Transfer using AWS Transfer Family and Amazon S3

Financial services, healthcare, retail, and other companies exchange many different types of data with their partners, including stock information, healthcare claims, and product data files. These companies need a managed file transfer solution that supports data transformation and the exchange of data over File Transfer Protocol over SSL (FTPS) and Secure File Transfer Protocol (SFTP). This post describes how you can build a managed file transfer solution on Amazon Web Services (AWS) that supports data transformation and both inbound and outbound file transfers over FTP-based protocols.

Managed file transfer using AWS Transfer Family

For this solution, you will use AWS services to build a managed file transfer solution that supports inbound and outbound transfers over FTP-based protocols. You will use AWS Transfer Family to host endpoints that support SFTP, FTPS, and File Transfer Protocol (FTP). Because FTP is not encrypted, we suggest using SFTP or FTPS when possible. AWS Transfer Family provides fully managed support for file transfers directly into and out of Amazon Simple Storage Service (Amazon S3) or Amazon Elastic File System (Amazon EFS). The AWS Transfer Family integration with Amazon Route 53 can be used for DNS routing. With your data in Amazon S3, you can use AWS services for processing, analytics, machine learning, archiving, home directories, and developer tools.
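
For example, here is a minimal sketch of how such an endpoint could be created with the AWS SDK for Python (Boto3). The API Gateway URL, invocation role, and account number are placeholders for illustration:

```python
import boto3

transfer = boto3.client("transfer")

# Create an SFTP-only endpoint backed by Amazon S3 that authenticates users
# through a custom identity provider hosted in API Gateway (values below are
# placeholders).
response = transfer.create_server(
    Protocols=["SFTP"],
    Domain="S3",
    EndpointType="PUBLIC",
    IdentityProviderType="API_GATEWAY",
    IdentityProviderDetails={
        "Url": "https://example.execute-api.us-east-1.amazonaws.com/prod",
        "InvocationRole": "arn:aws:iam::123456789012:role/TransferInvocationRole",
    },
)
print("Transfer Family server ID:", response["ServerId"])
```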

AWS Transfer Family integration with Amazon S3 can be used for storing file data. You can take advantage of industry-leading scalability, data availability, security, and performance. This means customers of all sizes and industries can use it to store and exchange any amount of data. Note that Amazon S3 has a maximum size of 5 TB per individual object. AWS Transfer Family supports a custom API for user authentication. This enables you to use existing credentials or authentication providers, such as Amazon Cognito, Okta, LDAP, and more. You can build a serverless authentication API using Amazon API Gateway and AWS Lambda. Another benefit of using AWS serverless services is that you do not need to manage and maintain servers. AWS Batch can run jobs with your custom code for transforming data files and processing outbound transfers. To store application data, including entitlements and transaction data, you can use a managed relational database service such as Amazon Aurora. Based on your application needs, you can also use a NoSQL database service such as Amazon DynamoDB.
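
As a minimal sketch of that serverless authentication API, the Lambda function behind API Gateway might validate the supplied credentials against an Amazon Cognito user pool. The event field names and the app client ID environment variable are assumptions for illustration; the actual event shape depends on your API Gateway mapping:

```python
import os

import boto3
from botocore.exceptions import ClientError

cognito = boto3.client("cognito-idp")


def lambda_handler(event, context):
    # The "username" and "password" keys are illustrative; the real event
    # shape depends on how API Gateway maps the Transfer Family request.
    username = event.get("username", "")
    password = event.get("password", "")

    try:
        cognito.initiate_auth(
            ClientId=os.environ["COGNITO_APP_CLIENT_ID"],
            AuthFlow="USER_PASSWORD_AUTH",
            AuthParameters={"USERNAME": username, "PASSWORD": password},
        )
    except ClientError:
        # An empty response (no Role) causes AWS Transfer Family to deny access.
        return {}

    # Authentication succeeded; the entitlement lookup and the response
    # returned to AWS Transfer Family are sketched in the inbound flow below.
    return {"username": username}
```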

Managed File Transfer Process Flow

Figure 1. Managed File Transfer Process Flow

This architecture uses Amazon Aurora for storing application data and Amazon Cognito for identity management, where user credentials are stored. The solution provides a process for inbound transfers and outbound transfers. In the inbound transfer flow, customers interact with the SFTP endpoint to download or upload data files. In the outbound transfer flow, the system sends data files to the customer's SFTP location based on their entitlements.

Inbound Transfer Flow:

  1. When you request the SFTP domain URL, the request goes to Amazon Route 53 for DNS resolution. Route 53 resolves the domain name and provides the URL of the AWS Transfer Family endpoint.
  2. The user request is sent to the AWS Transfer Family endpoint along with user credentials (user name and password).
  3. AWS Transfer Family invokes a custom authentication API hosted in API Gateway, which reviews and validates the user credentials.
  4. The API Gateway and Lambda integration invokes the AuthLogic Lambda function, which authenticates the user credentials by calling the Amazon Cognito API. The AuthLogic Lambda function implements custom authentication logic and complex business rules around entitlements.
  5. Once the user is authenticated, the AuthLogic Lambda function queries the Aurora database to get user entitlements.
  6. Based on the user entitlements, a dynamic AWS Identity and Access Management (IAM) policy and a logical directory mapping are returned to AWS Transfer Family (see the sketch after this list).
  7. AWS Transfer Family uses the logical directory-mapping feature to provide user access to logical directories that map to data stored in an S3 bucket. S3 applies the provided IAM policy, which validates and approves user access to data.
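
As a minimal sketch of steps 6 and 7, the AuthLogic Lambda function might return a scoped-down IAM session policy and a logical directory mapping that restricts the authenticated user to their entitled prefix in the S3 bucket. The bucket name, role ARN, and prefix layout are assumptions for illustration:

```python
import json


def build_transfer_response(entitled_prefix: str) -> dict:
    """Build the response AWS Transfer Family expects from a custom identity
    provider: an IAM role, a scoped-down session policy, and a logical home
    directory mapping. The bucket name and role ARN are placeholders."""
    bucket = "example-mft-bucket"

    # Session policy limiting the session to the user's entitled prefix.
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": f"arn:aws:s3:::{bucket}",
                "Condition": {"StringLike": {"s3:prefix": [f"{entitled_prefix}/*"]}},
            },
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": f"arn:aws:s3:::{bucket}/{entitled_prefix}/*",
            },
        ],
    }

    return {
        "Role": "arn:aws:iam::123456789012:role/TransferAccessRole",
        "Policy": json.dumps(policy),
        "HomeDirectoryType": "LOGICAL",
        # The user sees "/files" while the data lives under their S3 prefix.
        "HomeDirectoryDetails": json.dumps(
            [{"Entry": "/files", "Target": f"/{bucket}/{entitled_prefix}"}]
        ),
    }
```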

Outbound Transfer Flow:

This component is optional, depending on your application needs. You may have analytics or artificial intelligence/machine learning (AI/ML) applications that use data stored in Amazon S3.

  1. When a customer uploads a file to S3 through an SFTP endpoint, an S3 event notification is created, which invokes a Lambda function through the Amazon S3 event integration with Lambda.
  2. The Lambda function runs code to get subscription data from the Aurora database for the given file group. It then creates a transform job for business use cases where data transformation is required. An example of a transformation would be converting data from JSON to CSV format, or converting healthcare data to comply with HL7 standards. Because the transform and send jobs can be long-running, they can be run on AWS Batch (see the sketch after this list).
  3. Once data transformation is complete, the transform job creates a send job for the transformed file.
  4. The send job sends the given file to the subscribed customers’ SFTP location.
  5. For business use cases where data transformation is not required, only the send job is created.
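
As a minimal sketch of steps 2 through 4, the S3-triggered Lambda function could submit a transform job to AWS Batch and chain a dependent send job behind it. Using a Batch job dependency instead of having the transform job create the send job itself is one possible implementation; the job queue and job definition names are hypothetical placeholders:

```python
import re

import boto3

batch = boto3.client("batch")


def lambda_handler(event, context):
    # Standard Amazon S3 event notification format.
    record = event["Records"][0]
    bucket = record["s3"]["bucket"]["name"]
    key = record["s3"]["object"]["key"]

    # Batch job names only allow letters, numbers, hyphens, and underscores.
    safe_name = re.sub(r"[^A-Za-z0-9_-]", "-", key)[:100]

    # Long-running data transformation runs as an AWS Batch job.
    transform = batch.submit_job(
        jobName=f"transform-{safe_name}",
        jobQueue="mft-job-queue",
        jobDefinition="mft-transform-job",
        parameters={"bucket": bucket, "key": key},
    )

    # The send job starts only after the transform job succeeds.
    batch.submit_job(
        jobName=f"send-{safe_name}",
        jobQueue="mft-job-queue",
        jobDefinition="mft-send-job",
        dependsOn=[{"jobId": transform["jobId"]}],
        parameters={"bucket": bucket, "key": key},
    )
```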

Conclusion

In this blog post, we showed how you can use AWS Transfer Family, Amazon S3, and other AWS services to build a managed file transfer application for your business. Using AWS managed services for your managed file transfer application, you can take full advantage of AWS and achieve agility, elasticity, and cost savings. You can also use this architecture to migrate an existing homegrown or proprietary vendor-managed file transfer application. To get started on inbound transfers, see the Building a Simple Data Distribution Service blog post for guidance.

Dathu Patil

Dathu is a Solutions Architect based out of Boston, MA. He helps customers architect scalable, highly available applications that leverage AWS services. He works as a technical leader alongside customer business, development, and infrastructure teams, providing deep software knowledge with respect to cloud architecture, design patterns, and programming.