Wednesday, November 12, 2008

Using BCP Format Files and SQL BULK IMPORT

You have a bunch of data - let's say locations that you are putting in a store locator database and application - which was given to you by a customer as an Excel spreadsheet. Or maybe even a CSV or Tab-delimited file.

You want to import that data into an SQL database table you coded up with a very similar schema to the data file. Here's what your database table looks like - pretty standard and minimal for this domain:

CREATE TABLE [dbo].[Locations](
[LocationID] [int] IDENTITY(1,1) NOT NULL,
[LocationName] [varchar](500) NOT NULL,
[PhoneNumber] [varchar](50) NOT NULL,
[Address1] [varchar](100) NOT NULL,
[Address2] [varchar](100) NOT NULL,
[City] [varchar](100) NOT NULL,
[StateProvince] [varchar](20) NOT NULL,
[ZipPostalCode] [varchar](20) NOT NULL,
[Latitude] [decimal](18, 9) NULL,
[Longitude] [decimal](18, 9) NULL,
CONSTRAINT [PK_Locations] PRIMARY KEY CLUSTERED ([LocationID] ASC)
)


Here is an example of what your data file might look like, if it is Comma Delimited for instance:

Bob's Meat Mart,565-555-1236,Bob Wallace,423 Highland Ave.,Juneau,WI,54701,44.778885,-91.478939
Alvin Dairy Ranch,565-555-4875,Greg Alvin,1032 Caledonia St.,Plain,WI,54603,43.838412,-91.246566


So you can see in the data file, the order of the columns is like this:

LocationName
PhoneNumber
OwnerName
Address
City
State
ZIP
Latitude
Longitude


It doesn't exactly match up with the order, and number of columns, in our database. If you try to import this data file using a minimal BULK INSERT command such as the following, you will get some error messages like I show below the command and the process will fail completely:

BULK INSERT Locations
FROM 'C:\Work\StoreLocator\Docs\TestData.csv' WITH (
FIELDTERMINATOR = ',',
ROWTERMINATOR = '\n'
)


Msg 4864, Level 16, State 1, Line 3
Bulk load data conversion error (type mismatch or invalid character for the specified codepage) for row 1, column 1 (LocationID).
Msg 4864, Level 16, State 1, Line 3
Bulk load data conversion error (type mismatch or invalid character for the specified codepage) for row 2, column 1 (LocationID).


Format File to the Rescue



What we need is a BCP Format File to tell the BULK INSERT command how to manage the difference in number of columns and order of columns. I'll brush over the steps I used here to get one of those - there maybe a different or better way, but this works for me.

You can use the bcp.exe tool that comes with SQL Server 2005 (earlier versions too I think) to generate a "first draft" of a format file. BCP can connect to your database, look at the schema of a table you specify, and then generate a format file which, in its first draft, would describe a 1:1 mapping between a data file and that table.

Note: if you are having trouble getting bcp.exe to connect to your database, make sure that you turn on remote connections (I read somewhere that that is how bcp.exe connects), and that the user you are running the command as has proper access on the database/table. Another trick I figured out was to specify the server instance name using the -S ServerName parameter on the command - I have a couple of database instances, and for some reason these tools tend to default to the wrong one.

Another Note: After struggling for hours trying to get this to work with the XML format file, I tried it with the non-XML format file and got it working in minutes. I won't waste your time trying to explain that whole frustrating process, but I suggest you stick with the non-XML format file.

So, here is the bcp.exe command I came up with to create my non-XML format file, and what it generates:

> bcp StoreLocator.dbo.Locations format nul -T -n -f LocationsFormatFile.fmt -S localhost\SQLEXPRESS


9.0
10
1 SQLINT 0 4 "" 1 LocationID ""
2 SQLCHAR 2 500 "" 2 LocationName SQL_Latin1_General_CP1_CI_AS
3 SQLCHAR 2 50 "" 3 PhoneNumber SQL_Latin1_General_CP1_CI_AS
4 SQLCHAR 2 100 "" 4 Address1 SQL_Latin1_General_CP1_CI_AS
5 SQLCHAR 2 100 "" 5 Address2 SQL_Latin1_General_CP1_CI_AS
6 SQLCHAR 2 100 "" 6 City SQL_Latin1_General_CP1_CI_AS
7 SQLCHAR 2 20 "" 7 StateProvince SQL_Latin1_General_CP1_CI_AS
8 SQLCHAR 2 20 "" 8 ZipPostalCode SQL_Latin1_General_CP1_CI_AS
9 SQLDECIMAL 1 19 "" 9 Latitude ""
10 SQLDECIMAL 1 19 "" 10 Longitude ""


Now we need to edit this file so that it actually represents what is in a row on the input data file and things get mapped to the correct column in the table. Also, we'll need to make BULK IMPORT skip the extra "OwnerName" field we have in the data, but not in the table. Here are the articles I read over to figure out how to do that, and the format file I ended up with:



9.0
9
1 SQLCHAR 0 500 "," 2 LocationName SQL_Latin1_General_CP1_CI_AS
2 SQLCHAR 0 50 "," 3 PhoneNumber SQL_Latin1_General_CP1_CI_AS
3 SQLCHAR 0 500 "," 0 EXTRA SQL_Latin1_General_CP1_CI_AS
4 SQLCHAR 0 100 "," 4 Address1 SQL_Latin1_General_CP1_CI_AS
5 SQLCHAR 0 100 "," 6 City SQL_Latin1_General_CP1_CI_AS
6 SQLCHAR 0 20 "," 7 StateProvince SQL_Latin1_General_CP1_CI_AS
7 SQLCHAR 0 20 "," 8 ZipPostalCode SQL_Latin1_General_CP1_CI_AS
8 SQLCHAR 0 19 "," 9 Latitude ""
9 SQLCHAR 0 19 "\r\n" 10 Longitude ""



So, now everything was looking good, except I got a complaint about the Address2 field not being NULL, so I had to modify the script a little bit to support this. Nothing you can do in the Format file.

-- Temporarily set Locations.Address2 to NULLable
ALTER TABLE Locations
ALTER COLUMN [Address2] [varchar](100) NULL

BULK INSERT Locations
FROM 'C:\Work\StoreLocator\Docs\TestData_TSV.txt' WITH (
FORMATFILE='C:\Work\StoreLocator\Database\LocationsFormat.fmt'
)

-- Blank all of the NULL Address2 fields, and rest to NOT NULLable
UPDATE Locations
SET Address2 = ''
WHERE Address2 IS NULL

ALTER TABLE Locations
ALTER COLUMN [Address2] [varchar](100) NOT NULL


Success!

Disqus for A Nofsinger's Blog