SSIS Best Practices | Microsoft Sql Server | Central Processing Unit
Short Description
Connect Output to RowCount transform ⢠See Performance Best Practices ... Catch is that it requires Sorted inputs ⢠...
Description
Irish SQL Academy 2008. Level 300
Bob Duffy
DTS 2000
SSIS 2005
1.75 Developers
*Figures are only approximations and should not be referenced or quoted
Optimize and Stabilize the basics Measure
Tune
Parallelize
• Minimize Minimize staging staging (else use RawFiles RawFiles if possible) • Hardware Hardware Infrastructure: Disks, RAM, CPU, Network Ne twork • SQL Infrastructure: File Groups, Indexing, Partitioning
• Replace destinations with RowCount • Source->Row Source->RowCoun Countt throughp throughput ut • Source->Destination throughput • OVAL OVAL performance tuning strategy • The Three Three S‟s S‟s • Data Flow Bag of Tricks
• Lookup patterns • Script vs custom transform transform
Sharpen
• Increase the efficiency of every aspect aspect
Share
• Parallelize, partition, par tition, pipeline
Spend
• Buy faster, bigger, better hardware
But be aware of limitations
Row Based (synchronous)
Partially Blocking (asynchronous)
Blocking (asynchronous)
http://msdn.microsoft.com/en-us/library/ms345346.aspx
Source data Source servers EMC CX600 run SSIS 2 Gb Gb Fiber Channel Channel
Destination server runs SQL Server
Database EMC CX3-80
1 Gb Gb Ethern Ethernet et connections
4 Gb Fiber Fiber Channe Channell Source servers: Unisys ES3220L 2 sockets each with 4 core Intel 2 GHz CPUs 4 GB RAM Windows Server 2008 SQL Server 2008
Destination server: Unisys ES7000/One 32 sockets each with dual core Intel 3.4 GHz CPUs 256 GB RAM Windows Server 2008 SQL Server 2008
Make: Model: OS: CPU: RAM: HBA: NIC: Database: Storage:
Unisys ES7000/one Enterprise Server Microsoft Windows Server 2008 x64 Datacenter Edition 32 socket dual core Intel® Xeon 3.4 GHz (7140M) 256 GB 8 dual port 4Gbit FC Intel® PRO/1000 MT Server Adapter Pre-release build of SQL Server 20 2008 Enterprise Edition (V10.0.1300.4) EMC Clariion CX3-80 (Qty 1) 11 trays of 15 disks; 165 spindles x 146 GB 15Krpm; 4Gbit FC
Quantity: Make: Model: OS: CPU: RAM: HBA: NIC: Database: Storage:
4 Unisys ES3220L Windows2008 x64 Enterprise Edition 2 socket quad core Intel® Xeon processors @ 2.0GHz 4 GB 1 dual port 4Gbit Emulex FC Intel PRO1000/PT dual port Pre-release build of of SQ SQL Server 20 2008 In Integration Services (V (V10.0.1300.4) 2x EMC CLARiiON CX600 (ea: 45 spindles, 4 2Gbit FC)
C1
C1
C1
C1
Orders Table Partition 1
Partition 2
Partition 3
Partition 4
Partition 5
Partition 6
Orders_1
Orders_2
Orders_3
Orders_4
Orders_5
Orders_6
S S I S
S S I S
S S I S
S S I S
S S I S
S S I S
orders.tbl.1
orders.tbl.2
orders.tbl.3
orders.tbl.4
orders.tbl.5
orders.tbl.6
Partition 55
... ...
Orders_55
S S I S
orders.tbl.55
Partition 56
Orders_56
S S I S
orders.tbl.56
(Package details removed to protect the innocent)
Follow Microsoft Development Guidelines
• Iterative design, development & testing
Understand the Business
• People & Processes • Kimball‟s ETL and SSIS books are are an excellent reference
Get the big picture
Platform consideration considerationss
• Resource contention, processing windows, windows, … • SSIS does not forgive bad database design • Old principles still apply – e.g. load with/without indexes? • Will this run on IA64 / X64? • No BIDS on IA64 – how will I debug? • Is OLE-DB driver XXX available on IA64? • Memory and resource usage on different platforms
Process Modularity
• Break complex ETL into logically distinct packages (vs monolithic design) • Improves development & debug experience
Package Modularity
• Separate sub-processes within package into separate Containers • More elegant, easier to develop • Simple to disable whole Containers when debugging
Component Modularity
• Use Script Task/Transform for one-off problems • Build custom components for maximum re-use
Concise naming conventions Conformed “blueprint” design patterns Presentable layout Annotations Error Logging Configurations
Get as close to the data as possible • Limit number of columns • Filter number of rows
Don‟t be afraid to leverage TSQL • Type conversions, null coercing, coalescing, data type sharpening • select nullif (name, (name, „‟) from contacts order by 1 • select convert(tinyint, code) from sales
Performance Testing & Tuning • Connec Connectt Output to RowCount RowCount trans transform form • See Performance Best Practices
„FastParse FastParse‟‟ for text files
BEFORE:
AFTER:
select select * from etl.uf_FactStoreSales( etl.uf_FactStoreSales(@Date @Date)) dbo.Tbl_Dim_Store.SK_Store_ID dbo .Tbl_Dim_Store.SK_Store_ID , Tbl_Dim_Store.Store_Num ,isnull isnull(dbo.Tbl_Dim_Merchant_Division.SK_Merch_Di (dbo.Tbl_Dim_Merchant_Division.SK_Merch_Di v_ID, 0) as SK_Merch_Div_ID from dbo.Tbl_Dim_Store dbo.Tbl_Dim_Store left outer join dbo.Tbl_Dim_Merchant_Division on dbo.Tbl_D dbo.Tbl_Dim_St im_Store. ore.Merch Merch_Div_ _Div_Num Num = dbo.Tbl_Dim_Merchant_Division.Merch_Div_N um where Current_Row = 1
Use the power of TSQL to clean the data 'on the fly'
Avoid overdesign Maximize Parallelism Minimize blocking Minimize ancillary data
• Too many moving parts is inelegant and likely slow • But don‟t don‟t be afraid to experimen experimentt – there are many ways to solve a problem
• Allocate enough threads • EngineThr EngineThreads eads property property on DataFlow DataFlow Task • See Performance Talk
• Synchronous vs. Asynchronous components • Memcopy is expensive expensive
• For example, minimize data retrieved by LookupTx
Three Modes of Operation Tradeoff memory vs. performance Can use Merge Join component instead
• Full Cache – for small lookup datasets • No Cache – for volatile lookup datasets • Partial Cache – for large lookup datasets
• Full Cache is optimal, but uses the most memory, also takes time to load • Partial Cache can be expensive since it populates on the fly using singleton SELECTs • No Cache uses no memory, but takes longer
• Catch is that it requires Sorted inputs • See SSIS Performance white paper for more details
Custom components
Scripts
• Can written written in in any .Net .Net language language • Must be signed, registered and installed – but can be widely re-used • Quite fiddly for single task
• Can be written written in VisualBa VisualBasic.Ne sic.Nett or C# • Are persisted within a package – and have limited reuse • Have template methods already created for you
http://sqlcat.com
http://technet.microsoft.com/en-us/library/bb961995.aspx
http://blogs.msdn.com/sqlperf/archive/2008/02/27/e http://blogs.msdn.com /sqlperf/archive/2008/02/27/etl-world-recor tl-world-record.aspx d.aspx
View more...
Comments