How to Develop an SSIS Integration
SQL Server Integration Services (SSIS) is a powerful data integration, transformation, and migration tool, and it's part of Microsoft SQL Server. It allows developers and database administrators to design, build, and execute workflows to extract, transform, and load data (ETL).
In this tutorial, we'll demonstrate the ETL process, where data from Salesforce and CSV files will be cleansed, transformed, and loaded into an SQL Server database, ready for analysis or further processing. An ETL pipeline ensures data quality and consistency across different data sources, while also providing logging for successful or failed ETL runs.
This scenario utilizes several popular SSIS tasks and components to achieve a reliable data integration process.
Why choose SSIS Components by Devart?
Devart SSIS Data Flow Components are powerful tools designed to simplify the ETL process within SSIS packages and let developers connect cloud applications and databases from their SSIS workflows without writing complex code.
The most powerful features are:
- Export data from various sources to different file formats
- Import XML, CSV, and other files to cloud applications and databases
- Synchronize cloud applications and databases
- Migrate from one cloud application to another
- Replicate data from various cloud applications to relational databases
- Load data between various cloud and relational data sources
- Integrate multiple data sources via SSIS
- Migrate from one database to another
Create a new SSIS project
Before we start, download and install SSIS if you don't have it. SSIS is part of SQL Server Data Tools (SSDT), which can be added as a workload in Visual Studio. You can refer to the How to Install SSIS page for installation instructions.
To create the project, follow these steps:
1. In Visual Studio, click Create a new project. In the Search for template field, type integration and select Integration Services Project:
2. Name the project, specify its location, and click Create.
Develop an SSIS package
In SSIS, there are different kinds of tasks for different purposes. To learn more about available tasks, go to the SSIS Tasks page.
Stage 1: Extract data from Salesforce and CSV
Salesforce Source Task
The first thing to do is extract data from the Contact object in Salesforce. The Data Flow Task is used for any kind of data transfer in SSIS, and you can drag and drop it from the SSIS Toolbox into the Control Flow window. If the toolbox is not open, you can open it from the top right side of the Control Flow window.
1. Drag and drop the Data Flow Task into the Control Flow window. Rename the Task to Get data from Salesforce.
2. Double-click the Data Flow Task and move the Salesforce Source component into the Task.
3. To open the Devart Salesforce Source Editor window, right-click the component and select Edit.
From the Salesforce Connection drop-down menu, select Create New Connection Manager. Under Host, select login.salesforce.com, then choose either AccessRefreshToken or Username and Password under Authentication and provide the corresponding values.
If you select AccessRefreshToken, click Web Login to go to the Salesforce login page and enter your credentials. After that, you should see a window where you can verify the connection by clicking Test Connection.
Now, you can see all the objects from Salesforce listed inside the Devart Salesforce Source component. You can drag and drop a table from the left side into the Query window to select data from that table. For example, we can view data from the Contact table. To do this, click the top right corner of the Query window.
You can close this window and click OK. In this way, we've configured the Devart Salesforce Source component.
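For illustration, the query that selects data from the Contact table might look like this in the Query window (the column list is shortened here; the full Contact object has many more fields):
SELECT Id, AccountId, FirstName, LastName, Email, MailingStreet, MailingCity, MailingState, MailingPostalCode
FROM Contact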
CSV Source Task
1. Drag and drop the Flat File Source component into the Data Flow Task.
2. Right-click Flat File Source and select Edit. To create a new Flat File Connection Manager, click New. Then, browse to the CSV file you want to load. For example, we use the account_data.csv file.
SSIS Flat File Source will automatically detect the delimiter used in the file. Ensure that the Column names in the first data row option is enabled and that the CSV or text file actually contains a header row. To verify the data, click the Preview tab.
If you want to rename a field or change its data type, click Advanced. You can also increase the length of the field if required.
3. To save the changes, click OK twice.
4. Drag and drop two Sort Transformations into the Data Flow Task. They will be used to sort the data on the specified column. Connect the Devart Salesforce Source and Flat File Source components to their corresponding Sort Transformations.
5. Right-click the first Sort Transformation and select Edit. To sort the data based on the AccountId column, select the checkbox to the left of AccountId, and click OK.
6. Repeat the same operation for the second Sort Transformation and drag and drop the Merge Join Transformation into the Data Flow Task.
7. Connect the left Sort Transformation to Merge Join and select Merge Join Left Input. Link the right Sort Transformation to Merge Join and Merge Join Right Input will be automatically selected.
8. Right-click Merge Join Transformation and select Edit.
9. Inner Join is used to join the two sources. Under Sort, select the top left checkbox, which will add all of its columns to Output Alias. Then, under Sort 1, select the checkboxes for ResAddress, ResCity, ResState, and ResPostalCode; these columns will be added to the data flow. Finally, click OK. (A T-SQL sketch of the equivalent join is shown after these steps.)
Thus, we've prepared the Merge Join Transformation.
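Conceptually, the Merge Join Transformation configured above produces the same result as an inner join on AccountId, with the four Res* columns coming from the CSV side. The following T-SQL is only a sketch of that logic; salesforce_contacts and csv_accounts are placeholder names for the two sorted inputs, not real tables created in this tutorial:
-- Inner join of the two sorted inputs on AccountId (illustrative only)
SELECT sf.*, csv.ResAddress, csv.ResCity, csv.ResState, csv.ResPostalCode
FROM salesforce_contacts AS sf
INNER JOIN csv_accounts AS csv
    ON sf.AccountId = csv.AccountId;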
Stage 2: Perform data transformation
Data Conversion Task
Let's add the Data Conversion Task from the toolbox to the Data Flow Task, and then connect the Merge Join Transformation to the Data Conversion Task.
For example, there is an AccountId field of the nvarchar(18) data type, and you need to convert it to varchar(18). To do this, right-click Data Conversion Task and click Edit. Select the checkbox next to AccountId. As a result, AccountId is added to the list, and a new column for it will be created in the Data Flow. Change the data type of the column by selecting string [DT_STR] and setting Length to 18.
Copy of AccountId is set in Output Alias to add a new column, Copy of AccountId, with the varchar(18) data type.
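In T-SQL terms, this conversion is roughly equivalent to the cast below. The source_rows name is only a placeholder for the rows flowing through the package, and the 1252 code page used by DT_STR corresponds to a Latin-1, non-Unicode string:
-- Illustrative T-SQL equivalent of the nvarchar(18) to varchar(18) conversion
SELECT AccountId,
       CAST(AccountId AS varchar(18)) AS [Copy of AccountId]
FROM source_rows;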
Derived Column Task
To add a new column to the Data Flow, embed the Derived Column Transformation from the toolbox into the Data Flow Task. Connect the Data Conversion Task with the Derived Column Transformation.
To configure Derived Column Transformation, double-click it. To add a new column to the Data Flow and name it Created, type the following inside Expression:
GETDATE()
To insert the CreatedBy column, paste this code into a new row inside Expression:
(DT_STR, 100, 1252) @[System::UserName]
The (DT_STR, 100, 1252) cast converts the System::UserName system variable into a 100-character string with code page 1252, so the value fits the varchar(100) CreatedBy column in the destination table.
To save the changes, click OK.
Stage 3: Cleanse and validate data
Conditional Split Task
Let's set up the Conditional Split Transformation.
1. Connect Derived Column Transformation with Conditional Split Transformation.
2. To filter out records with blank Mailing Addresses, create the Blank Mailing Address output and add this expression to the Condition field:
LEN(REPLACENULL(MailingStreet,"")) == 0
All other records with Mailing Addresses will be sent to the Conditional Split Default Output.
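For readers who think in T-SQL, the rows routed to the Blank Mailing Address output are roughly those that the filter below would return; source_rows is again just a placeholder, and ISNULL plays the role of the SSIS REPLACENULL function:
-- Rows whose MailingStreet is NULL or empty (illustrative only)
SELECT *
FROM source_rows
WHERE LEN(ISNULL(MailingStreet, '')) = 0;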
Lookup Transformation
We'll use the Lookup Transformation to determine whether the data being inserted into SQL Server already exists in the destination table. Only rows that are not yet in the table will be added (a T-SQL sketch of this behavior follows the steps below).
1. Create the Contact table:
CREATE TABLE Contact (
    [Id] nvarchar(18), [IsDeleted] bit, [MasterRecordId] nvarchar(18), [AccountId] varchar(18),
    [LastName] nvarchar(80), [FirstName] nvarchar(40), [Salutation] nvarchar(255), [Name] nvarchar(121),
    [OtherStreet] nvarchar(255), [OtherCity] nvarchar(40), [OtherState] nvarchar(80), [OtherPostalCode] nvarchar(20),
    [OtherCountry] nvarchar(80), [OtherLatitude] float, [OtherLongitude] float, [OtherGeocodeAccuracy] nvarchar(255),
    [OtherAddress] nvarchar(max), [MailingStreet] nvarchar(255), [MailingCity] nvarchar(40), [MailingState] nvarchar(80),
    [MailingPostalCode] nvarchar(20), [MailingCountry] nvarchar(80), [MailingLatitude] float, [MailingLongitude] float,
    [MailingGeocodeAccuracy] nvarchar(255), [MailingAddress] nvarchar(max), [Phone] nvarchar(40), [Fax] nvarchar(40),
    [MobilePhone] nvarchar(40), [HomePhone] nvarchar(40), [OtherPhone] nvarchar(40), [AssistantPhone] nvarchar(40),
    [ReportsToId] nvarchar(18), [Email] nvarchar(80), [Title] nvarchar(128), [Department] nvarchar(80),
    [AssistantName] nvarchar(40), [LeadSource] nvarchar(255), [Birthdate] date, [Description] nvarchar(max),
    [OwnerId] nvarchar(18), [CreatedDate] datetime, [CreatedById] nvarchar(18), [LastModifiedDate] datetime,
    [LastModifiedById] nvarchar(18), [SystemModstamp] datetime, [LastActivityDate] date, [LastCURequestDate] datetime,
    [LastCUUpdateDate] datetime, [LastViewedDate] datetime, [LastReferencedDate] datetime, [EmailBouncedReason] nvarchar(255),
    [EmailBouncedDate] datetime, [IsEmailBounced] bit, [PhotoUrl] nvarchar(255), [Jigsaw] nvarchar(20),
    [JigsawContactId] nvarchar(20), [CleanStatus] nvarchar(255), [IndividualId] nvarchar(18), [IsPriorityRecord] bit,
    [ContactSource] nvarchar(255), [Level__c] nvarchar(255), [Languages__c] nvarchar(100), [ResAddress] varchar(50),
    [ResCity] varchar(50), [ResState] varchar(50), [ResPostalCode] varchar(50), [Created] datetime, [CreatedBy] varchar(100)
)
2. Drag and drop the Lookup Transformation from the toolbox into the Data Flow Task. Then, connect Conditional Split Default Output with Lookup Transformation Task.
3. Double-click the Lookup Transformation to open its editor. In Specify how to handle rows with no matching entries, select Redirect rows to no match output, and then click Connections.
4. To create an OLE DB Connection Manager, click New. Next, provide the SQL Server instance name in Server Name and the database name in Select or enter a database name.
5. To make sure the connection is successful, click Test Connection.
6. Select Use results of an SQL query and type this query:
SELECT AccountId FROM Contact
7. Click Columns and map Copy of AccountId from the left side with AccountId.
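With Redirect rows to no match output, the Lookup Transformation behaves much like an anti-join: only rows whose Copy of AccountId value has no match among the existing AccountId values in Contact continue down the data flow. A rough T-SQL sketch of that logic, with source_rows as a placeholder for the incoming rows:
-- Keep only rows whose AccountId is not already present in Contact (illustrative only)
SELECT s.*
FROM source_rows AS s
WHERE NOT EXISTS (
    SELECT 1
    FROM Contact AS c
    WHERE c.AccountId = s.[Copy of AccountId]
);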
Stage 4: Load data into SQL Server
SQL Server Destination Task
1. Drag and drop the OLE DB Destination from the toolbox into the Data Flow Task; it will be used to write data into the Contact table. Connect the Lookup No Match Output from the Lookup Transformation to the OLE DB Destination.
2. To configure the OLE DB Destination, double-click it. In Data access mode, select Table or view - fast load, and then select the Contact table from the database.
3. To ensure all input columns are mapped to destination columns, click Mappings. For AccountId, map Copy of AccountId on the left side to AccountId on the right side. Great! The SSIS package is ready.
4. To execute the package, click Execute or press F5.
You should see that 30 records have been inserted into the Contact table.
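To double-check the results, you can run a quick verification query in SSMS against the destination database. The exact row count you see will depend on your source data:
-- Confirm the number of loaded rows and inspect a sample
SELECT COUNT(*) AS LoadedRows FROM Contact;
SELECT TOP (10) Id, AccountId, LastName, FirstName, ResCity, Created, CreatedBy FROM Contact;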
Video tutorial: Practical SSIS Webinar
This video shows you how to build an efficient, automated workflow in SSIS to streamline data flows across multiple systems, providing a unified view of customer order status for better decision-making and customer outreach.
Conclusion
If you want your data to be clean, modified, and loaded efficiently into target systems, you should consider building an SSIS integration. This approach ensures scalable, reliable solutions that boost data accuracy and accessibility across your organization. Whether you're integrating data from Salesforce, CSV files, or other sources, proficiency in SSIS not only enhances data management but also supports informed decision-making.