SSIS Best Practices | Microsoft Sql Server | Central Processing Unit

April 22, 2017 | Author: Anonymous | Category: SQL, Microsoft SQL Server
Share Embed


Short Description

Connect Output to RowCount transform • See Performance Best Practices ... Catch is that it requires Sorted inputs • ...

Description

Irish SQL Academy 2008. Level 300

Bob Duffy

DTS 2000

SSIS 2005

1.75 Developers

*Figures are only approximations and should not be referenced or quoted

Optimize and Stabilize the basics Measure

Tune

Parallelize

• Minimize Minimize staging staging (else use RawFiles RawFiles if possible) • Hardware Hardware Infrastructure: Disks, RAM, CPU, Network Ne twork • SQL Infrastructure: File Groups, Indexing, Partitioning

• Replace destinations with RowCount • Source->Row Source->RowCoun Countt throughp throughput ut • Source->Destination throughput • OVAL OVAL performance tuning strategy • The Three Three S‟s S‟s • Data Flow Bag of Tricks

• Lookup patterns • Script vs custom transform transform

Sharpen

• Increase the efficiency of  every aspect aspect

Share

• Parallelize, partition, par tition, pipeline

Spend

• Buy faster, bigger, better hardware

But be aware of limitations

Row Based (synchronous)

Partially Blocking (asynchronous)

Blocking (asynchronous)

http://msdn.microsoft.com/en-us/library/ms345346.aspx

Source data Source servers EMC CX600 run SSIS 2 Gb Gb Fiber Channel Channel

Destination server runs SQL Server

Database EMC CX3-80

1 Gb Gb Ethern Ethernet et connections

4 Gb Fiber Fiber Channe Channell Source servers: Unisys ES3220L 2 sockets each with 4 core Intel 2 GHz CPUs 4 GB RAM Windows Server 2008 SQL Server 2008

Destination server: Unisys ES7000/One 32 sockets each with dual core Intel 3.4 GHz CPUs 256 GB RAM Windows Server 2008 SQL Server 2008

Make: Model: OS: CPU: RAM: HBA: NIC: Database: Storage:

Unisys ES7000/one Enterprise Server Microsoft Windows Server 2008 x64 Datacenter Edition 32 socket dual core Intel® Xeon 3.4 GHz (7140M) 256 GB 8 dual port 4Gbit FC Intel® PRO/1000 MT Server Adapter Pre-release build of SQL Server 20 2008 Enterprise Edition (V10.0.1300.4) EMC Clariion CX3-80 (Qty 1) 11 trays of 15 disks; 165 spindles x 146 GB 15Krpm; 4Gbit FC

Quantity: Make: Model: OS: CPU: RAM: HBA: NIC: Database: Storage:

4 Unisys ES3220L Windows2008 x64 Enterprise Edition 2 socket quad core Intel® Xeon processors @ 2.0GHz 4 GB 1 dual port 4Gbit Emulex FC Intel PRO1000/PT dual port Pre-release build of of SQ SQL Server 20 2008 In Integration Services (V (V10.0.1300.4) 2x EMC CLARiiON CX600 (ea: 45 spindles, 4 2Gbit FC)

C1

C1

C1

C1

Orders Table Partition 1

Partition 2

Partition 3

Partition 4

Partition 5

Partition 6

Orders_1

Orders_2

Orders_3

Orders_4

Orders_5

Orders_6

 S    S   I       S  

 S    S   I       S  

 S    S   I       S  

 S    S   I       S  

 S    S   I       S  

 S    S   I       S  

orders.tbl.1

orders.tbl.2

orders.tbl.3

orders.tbl.4

orders.tbl.5

orders.tbl.6

Partition 55

... ...

Orders_55

 S    S   I       S  

orders.tbl.55

Partition 56

Orders_56

 S    S   I       S  

orders.tbl.56

(Package details removed to protect the innocent)

Follow Microsoft Development Guidelines

• Iterative design, development & testing

Understand the Business

• People & Processes • Kimball‟s ETL and SSIS books are are an excellent reference

Get the big picture

Platform consideration considerationss

• Resource contention, processing windows, windows, … • SSIS does not forgive bad database design • Old principles still apply – e.g. load with/without indexes? • Will this run on IA64 / X64? • No BIDS on IA64 – how will I debug? • Is OLE-DB driver XXX available on IA64? • Memory and resource usage on different platforms

Process Modularity

• Break complex ETL into logically distinct packages (vs monolithic design) • Improves development & debug experience

Package Modularity

• Separate sub-processes within package into separate Containers • More elegant, easier to develop • Simple to disable whole Containers when debugging

Component Modularity

• Use Script Task/Transform for one-off problems • Build custom components for maximum re-use

Concise naming conventions Conformed “blueprint” design patterns Presentable layout Annotations Error Logging Configurations

Get as close to the data as possible • Limit number of columns • Filter number of rows

Don‟t be afraid to leverage TSQL • Type conversions, null coercing, coalescing, data type sharpening • select nullif (name, (name, „‟) from contacts order by 1 • select convert(tinyint, code) from sales

Performance Testing & Tuning • Connec Connectt Output to RowCount RowCount trans transform form • See Performance Best Practices

„FastParse FastParse‟‟ for text files

BEFORE:

AFTER:

select select * from etl.uf_FactStoreSales( etl.uf_FactStoreSales(@Date @Date)) dbo.Tbl_Dim_Store.SK_Store_ID dbo .Tbl_Dim_Store.SK_Store_ID , Tbl_Dim_Store.Store_Num ,isnull isnull(dbo.Tbl_Dim_Merchant_Division.SK_Merch_Di (dbo.Tbl_Dim_Merchant_Division.SK_Merch_Di v_ID, 0) as SK_Merch_Div_ID from dbo.Tbl_Dim_Store dbo.Tbl_Dim_Store left outer join dbo.Tbl_Dim_Merchant_Division on dbo.Tbl_D dbo.Tbl_Dim_St im_Store. ore.Merch Merch_Div_ _Div_Num Num = dbo.Tbl_Dim_Merchant_Division.Merch_Div_N um where Current_Row = 1

Use the power of TSQL to clean the data 'on the fly'

Avoid overdesign Maximize Parallelism Minimize blocking Minimize ancillary data

• Too many moving parts is inelegant and likely slow • But don‟t don‟t be afraid to experimen experimentt – there are many ways to solve a problem

• Allocate enough threads • EngineThr EngineThreads eads property property on DataFlow DataFlow Task • See Performance Talk

• Synchronous vs. Asynchronous components • Memcopy is expensive expensive

• For example, minimize data retrieved by LookupTx

Three Modes of  Operation Tradeoff  memory vs. performance Can use Merge Join component instead

• Full Cache – for small lookup datasets • No Cache – for volatile lookup datasets • Partial Cache – for large lookup datasets

• Full Cache is optimal, but uses the most memory, also takes time to load • Partial Cache can be expensive since it populates on the fly using singleton SELECTs • No Cache uses no memory, but takes longer

• Catch is that it requires Sorted inputs • See SSIS Performance white paper for more details

Custom components

Scripts

• Can written written in in any .Net .Net language language • Must be signed, registered and installed – but can be widely re-used • Quite fiddly for single task

• Can be written written in VisualBa VisualBasic.Ne sic.Nett or C# • Are persisted within a package – and have limited reuse • Have template methods already created for you

http://sqlcat.com

http://technet.microsoft.com/en-us/library/bb961995.aspx

http://blogs.msdn.com/sqlperf/archive/2008/02/27/e http://blogs.msdn.com /sqlperf/archive/2008/02/27/etl-world-recor tl-world-record.aspx d.aspx

View more...

Comments

Copyright © 2017 DATENPDF Inc.