|
Home > Archive > microsoft.public.sqlserver.server > June 2002 > Getting correct sequence numbers in a table.
You are viewing an archived Text-only version of the thread.
To view this thread in it's original format and/or if you want to reply to
this thread please [click here]
| Author |
Getting correct sequence numbers in a table.
|
|
| Ben Brugman 2002-06-28, 4:25 am |
| This is partly a repeat.
We had (it is solved now) a problem. During solving it I tried some
constructions which did not work. In the end the problem was solved
using a Delphi program with an ADO interface. This took about 2
hours to run which was acceptable in our situation.
My problem (partly psycological) ;
I wasn't able to solve the problem within the given time frame using
only tools supplied with SQL-server !
The problem.
An imported table with 3.6 millions rows, about 20 fields, 6 indexes
and quite some relations with other tables.
Three important fields where
Object_id OID
Seq_id S_ID
date D
OID and D together form a unique key. S_ID should be in the order
of the date. But during transport and import this order was not
kept correctly. So the S_ID had to be 'renumbered', with a specific
range.
The end solution was the Delphi program running over all OID, doing a
SELECT on each OID with an ORDER on date and UPDATE the S_ID with a
number then increase the number with one and continue.
(The Delphi program was an adaptation from an existing program and
took only minutes to build).
My Question is :
How to solve this kind of problem with only SQL-server tools ?
Especially on larger tables. (millions or tens of millions of
records).
I tried to cluster the table in the correct order (OID,D), and
make the S_ID an identity key starting with a correct seed
for the range. (Tried this on a small table.) Problems it
took too long, and the S_ID didn't get changed. From experience
I know that the identity key does not always get created in
the order of which the table is clustered. As said because
of the time pressure with did the Delphi solution.
Thanks in advance
Ben Brugman
| |
| Roy Harvey 2002-06-28, 7:25 am |
| Ben,
Interesting problem, and I appreciate your need for a SQL based
solution.
There are a couple of points I am not clear on. First, are the
duplicate dates within a single OID? This can make a big difference
when doing this in SQL. Second, it sounds like the first S_ID within
an OID is not 1. If so, how is that initial S_ID determined?
Lets look at a simplified case, by assuming that the dates within OID
are unique, and S_ID starts at 1 within a given OID. This would
benefit by clustering on (OID,D) until S_ID is correct.
UPDATE T
SET S_ID = (select count(*) from T as T2
where T.OID = T2.OID
and T.D >= T2.D)
But it could get a LOT more complicated, depending on how innacurate
my assumptions were.
Roy
benbrugman@onbekend.nl (Ben Brugman) wrote:
>This is partly a repeat.
>
>We had (it is solved now) a problem. During solving it I tried some
>constructions which did not work. In the end the problem was solved
>using a Delphi program with an ADO interface. This took about 2
>hours to run which was acceptable in our situation.
>
>My problem (partly psycological) ;
>I wasn't able to solve the problem within the given time frame using
>only tools supplied with SQL-server !
>
>The problem.
>An imported table with 3.6 millions rows, about 20 fields, 6 indexes
>and quite some relations with other tables.
>Three important fields where
>Object_id OID
>Seq_id S_ID
>date D
>
>OID and D together form a unique key. S_ID should be in the order
>of the date. But during transport and import this order was not
>kept correctly. So the S_ID had to be 'renumbered', with a specific
>range.
>
>The end solution was the Delphi program running over all OID, doing a
>SELECT on each OID with an ORDER on date and UPDATE the S_ID with a
>number then increase the number with one and continue.
>(The Delphi program was an adaptation from an existing program and
>took only minutes to build).
>
>My Question is :
>How to solve this kind of problem with only SQL-server tools ?
>Especially on larger tables. (millions or tens of millions of
>records).
>
>I tried to cluster the table in the correct order (OID,D), and
>make the S_ID an identity key starting with a correct seed
>for the range. (Tried this on a small table.) Problems it
>took too long, and the S_ID didn't get changed. From experience
>I know that the identity key does not always get created in
>the order of which the table is clustered. As said because
>of the time pressure with did the Delphi solution.
>
>Thanks in advance
>
>Ben Brugman
| |
| Steve Kass 2002-06-28, 7:25 am |
| Ben,
There's a trick that has a good chance of working if there is a clustered
index on (Object_id,date). I think it may also help if Seq_id does not
participate in other indexes.
The problem is that there is no documentation that this works, so you
would have to check after trying it to see if it did. But if it turns out to
be reliable and you do a check that invokes a slower backup plan if it
fails, it might be worth considering. Here's an example using a smaller
table with similar structure:
create table T (
Obj_id int,
Seq_id int,
Dat datetime,
other varchar(1000) default replicate('ABCD',200),
constraint T_id_date primary key (Obj_id, Dat)
)
--sample data
insert into T(Obj_id, Seq_id, Dat)
select orderid, 0, getdate()-productid
from northwind..[order details]
GO
--the update query
declare @count int
set @count = 0
declare @lastObj int
set @lastObj = -1
update T
set @count = Seq_id = case when @lastObj = Obj_id then @count + 1 else 1 end,
@lastObj = Obj_id
go
select * from T
go
drop table T
Steve Kass
Drew University
Ben Brugman wrote:
> This is partly a repeat.
>
> We had (it is solved now) a problem. During solving it I tried some
> constructions which did not work. In the end the problem was solved
> using a Delphi program with an ADO interface. This took about 2
> hours to run which was acceptable in our situation.
>
> My problem (partly psycological) ;
> I wasn't able to solve the problem within the given time frame using
> only tools supplied with SQL-server !
>
> The problem.
> An imported table with 3.6 millions rows, about 20 fields, 6 indexes
> and quite some relations with other tables.
> Three important fields where
> Object_id OID
> Seq_id S_ID
> date D
>
> OID and D together form a unique key. S_ID should be in the order
> of the date. But during transport and import this order was not
> kept correctly. So the S_ID had to be 'renumbered', with a specific
> range.
>
> The end solution was the Delphi program running over all OID, doing a
> SELECT on each OID with an ORDER on date and UPDATE the S_ID with a
> number then increase the number with one and continue.
> (The Delphi program was an adaptation from an existing program and
> took only minutes to build).
>
> My Question is :
> How to solve this kind of problem with only SQL-server tools ?
> Especially on larger tables. (millions or tens of millions of
> records).
>
> I tried to cluster the table in the correct order (OID,D), and
> make the S_ID an identity key starting with a correct seed
> for the range. (Tried this on a small table.) Problems it
> took too long, and the S_ID didn't get changed. From experience
> I know that the identity key does not always get created in
> the order of which the table is clustered. As said because
> of the time pressure with did the Delphi solution.
>
> Thanks in advance
>
> Ben Brugman
| |
| Dan Guzman 2002-06-28, 7:25 am |
| I believe the fundamental issue here is that you are materializing data
which should be derived. You might consider calculating S_ID with a
subquery, user-defined function or computed column. Examples below.
--subquery
SELECT
OID,
(SELECT COUNT(*)
FROM Table1 a
WHERE a.OID = b.OID AND a.D <= b.D) AS S_ID,
D
FROM Table1 b
ORDER BY OID
--user-defined function
CREATE FUNCTION dbo.udf_Get_S_ID(@OID int, @D datetime)
RETURNS int
AS
BEGIN
RETURN( SELECT COUNT(*)
FROM Table1
WHERE OID = @OID AND D <= @D)
END
GO
SELECT
OID,
dbo.udf_Get_S_ID(OID, D) AS S_ID,
D
FROM Table1 a
ORDER BY OID
--computed column based on udf
CREATE TABLE Table1(
OID int NOT NULL,
S_ID AS (dbo.udf_Get_S_ID(OID, D)),
D datetime NOT NULL,
OtherData varchar(100))
GO
ALTER TABLE Table1 ADD CONSTRAINT PK_Table1 PRIMARY KEY (OID, D)
GO
SELECT
OID,
S_ID,
D
FROM Table1 a
ORDER BY OID
GO
--
Hope this helps.
Dan Guzman
SQL Server MVP
-----------------------
SQL FAQ links (courtesy Neil Pike):
http://www.ntfaq.com/Articles/Index...epartmentID=800
http://www.sqlserverfaq.com
http://www.mssqlserver.com/faq
-----------------------
"Ben Brugman" <benbrugman@onbekend.nl> wrote in message
news:3d1c2c2a.8015093@news.nl.uu.net...
> This is partly a repeat.
>
> We had (it is solved now) a problem. During solving it I tried some
> constructions which did not work. In the end the problem was solved
> using a Delphi program with an ADO interface. This took about 2
> hours to run which was acceptable in our situation.
>
> My problem (partly psycological) ;
> I wasn't able to solve the problem within the given time frame using
> only tools supplied with SQL-server !
>
> The problem.
> An imported table with 3.6 millions rows, about 20 fields, 6 indexes
> and quite some relations with other tables.
> Three important fields where
> Object_id OID
> Seq_id S_ID
> date D
>
> OID and D together form a unique key. S_ID should be in the order
> of the date. But during transport and import this order was not
> kept correctly. So the S_ID had to be 'renumbered', with a specific
> range.
>
> The end solution was the Delphi program running over all OID, doing a
> SELECT on each OID with an ORDER on date and UPDATE the S_ID with a
> number then increase the number with one and continue.
> (The Delphi program was an adaptation from an existing program and
> took only minutes to build).
>
> My Question is :
> How to solve this kind of problem with only SQL-server tools ?
> Especially on larger tables. (millions or tens of millions of
> records).
>
> I tried to cluster the table in the correct order (OID,D), and
> make the S_ID an identity key starting with a correct seed
> for the range. (Tried this on a small table.) Problems it
> took too long, and the S_ID didn't get changed. From experience
> I know that the identity key does not always get created in
> the order of which the table is clustered. As said because
> of the time pressure with did the Delphi solution.
>
> Thanks in advance
>
> Ben Brugman
| |
| Ben Brugman 2002-06-28, 9:25 am |
| Xref: tkmsftngp01 microsoft.public.sqlserver.server:220161
On Fri, 28 Jun 2002 08:55:24 -0400, Roy Harvey
<RoyHarv@compuserve.com> wrote:
>Ben,
>
>Interesting problem, and I appreciate your need for a SQL based
>solution.
>
>There are a couple of points I am not clear on. First, are the
>duplicate dates within a single OID? This can make a big difference
>when doing this in SQL. Second, it sounds like the first S_ID within
>an OID is not 1. If so, how is that initial S_ID determined?
dates within a single OID are assumed to be unique.
The range for S_ID is known and goes for all rows.
There are no duplicates of S_ID. (over the complete tabel)
>
>Lets look at a simplified case, by assuming that the dates within OID
>are unique, and S_ID starts at 1 within a given OID. This would
>benefit by clustering on (OID,D) until S_ID is correct.
>
>UPDATE T
> SET S_ID = (select count(*) from T as T2
> where T.OID = T2.OID
> and T.D >= T2.D)
For the offset of S_ID a constant can be added, so that's no problem.
(The S_ID should be unique but I didn't really specify, so let's
ignore this fact)
But would this run efficiently for 3.6 millions rows, because this
would result in 3.6 million subqueries as wel.
I'll have a go with the above script on my small set.
Just wait, I'll have a go. Back in a moment.
; ;
; ;
; ;
OKEE I am back with the results.
The update ran fast. (For 3090 in the QA under a second).
Altered the update to
UPDATE T
SET S_ID = (select count(*) from T as T2
where T.OID > T2.OID OR
( T.OID = T2.OID
and T.D >= T2.D))+offset-1
This gave exactly the results I was looking for. For 3090 rows
this took 9 seconds. For 3.6 million rows if this scales that would
be around 10000 seconds. Probably it will not scale well.
The addition I made does slow the statement down.
Thanks for you help,
For next time I'll keep this technique in mind.
ben brugman
>
>But it could get a LOT more complicated, depending on how innacurate
>my assumptions were.
>
>Roy
>
>
>benbrugman@onbekend.nl (Ben Brugman) wrote:
>
>>This is partly a repeat.
>>
>>We had (it is solved now) a problem. During solving it I tried some
>>constructions which did not work. In the end the problem was solved
>>using a Delphi program with an ADO interface. This took about 2
>>hours to run which was acceptable in our situation.
>>
>>My problem (partly psycological) ;
>>I wasn't able to solve the problem within the given time frame using
>>only tools supplied with SQL-server !
>>
>>The problem.
>>An imported table with 3.6 millions rows, about 20 fields, 6 indexes
>>and quite some relations with other tables.
>>Three important fields where
>>Object_id OID
>>Seq_id S_ID
>>date D
>>
>>OID and D together form a unique key. S_ID should be in the order
>>of the date. But during transport and import this order was not
>>kept correctly. So the S_ID had to be 'renumbered', with a specific
>>range.
>>
>>The end solution was the Delphi program running over all OID, doing a
>>SELECT on each OID with an ORDER on date and UPDATE the S_ID with a
>>number then increase the number with one and continue.
>>(The Delphi program was an adaptation from an existing program and
>>took only minutes to build).
>>
>>My Question is :
>>How to solve this kind of problem with only SQL-server tools ?
>>Especially on larger tables. (millions or tens of millions of
>>records).
>>
>>I tried to cluster the table in the correct order (OID,D), and
>>make the S_ID an identity key starting with a correct seed
>>for the range. (Tried this on a small table.) Problems it
>>took too long, and the S_ID didn't get changed. From experience
>>I know that the identity key does not always get created in
>>the order of which the table is clustered. As said because
>>of the time pressure with did the Delphi solution.
>>
>>Thanks in advance
>>
>>Ben Brugman
>
Ben Brugman
| |
| Ben Brugman 2002-06-28, 9:25 am |
| On Fri, 28 Jun 2002 09:22:42 -0400, Steve Kass <skass@drew.edu> wrote:
There was not a correct clustered index, but my line of thought was
similar with using an identity column, (this performed horrible). This
is a far better solution. THANKS.
I did run a slightly changed query :
offset being the starting number.
The query did run very fast less than a second on 3090 rows.
(Statistics : Cumulative client processing time 8).
So for the table of 3.6 million I could make a copy of the table
with just the 3 columns of interrest, make a clustered index,
run the update query on the script and then do an update with
this information on the actual table. (Joining over het two columns)
THANKS FOR PUTTING ME ON THIS TRACK,
this wil be my starting point if for similar issue's in the future.
Thanks again.
ben brugman
>Ben,
>
> There's a trick that has a good chance of working if there is a clustered
>index on (Object_id,date). I think it may also help if Seq_id does not
>participate in other indexes.
>
> The problem is that there is no documentation that this works, so you
>would have to check after trying it to see if it did. But if it turns out to
>be reliable and you do a check that invokes a slower backup plan if it
>fails, it might be worth considering. Here's an example using a smaller
>table with similar structure:
>
>create table T (
> Obj_id int,
> Seq_id int,
> Dat datetime,
> other varchar(1000) default replicate('ABCD',200),
> constraint T_id_date primary key (Obj_id, Dat)
> )
>
>--sample data
>insert into T(Obj_id, Seq_id, Dat)
>select orderid, 0, getdate()-productid
>from northwind..[order details]
>GO
>
>--the update query
>declare @count int
>set @count = 0
>declare @lastObj int
>set @lastObj = -1
>update T
> set @count = Seq_id = case when @lastObj = Obj_id then @count + 1 else 1 end,
> @lastObj = Obj_id
>go
>
>select * from T
>go
>drop table T
>
>Steve Kass
>Drew University
>
>Ben Brugman wrote:
>
>> This is partly a repeat.
>>
>> We had (it is solved now) a problem. During solving it I tried some
>> constructions which did not work. In the end the problem was solved
>> using a Delphi program with an ADO interface. This took about 2
>> hours to run which was acceptable in our situation.
>>
>> My problem (partly psycological) ;
>> I wasn't able to solve the problem within the given time frame using
>> only tools supplied with SQL-server !
>>
>> The problem.
>> An imported table with 3.6 millions rows, about 20 fields, 6 indexes
>> and quite some relations with other tables.
>> Three important fields where
>> Object_id OID
>> Seq_id S_ID
>> date D
>>
>> OID and D together form a unique key. S_ID should be in the order
>> of the date. But during transport and import this order was not
>> kept correctly. So the S_ID had to be 'renumbered', with a specific
>> range.
>>
>> The end solution was the Delphi program running over all OID, doing a
>> SELECT on each OID with an ORDER on date and UPDATE the S_ID with a
>> number then increase the number with one and continue.
>> (The Delphi program was an adaptation from an existing program and
>> took only minutes to build).
>>
>> My Question is :
>> How to solve this kind of problem with only SQL-server tools ?
>> Especially on larger tables. (millions or tens of millions of
>> records).
>>
>> I tried to cluster the table in the correct order (OID,D), and
>> make the S_ID an identity key starting with a correct seed
>> for the range. (Tried this on a small table.) Problems it
>> took too long, and the S_ID didn't get changed. From experience
>> I know that the identity key does not always get created in
>> the order of which the table is clustered. As said because
>> of the time pressure with did the Delphi solution.
>>
>> Thanks in advance
>>
>> Ben Brugman
>
Ben Brugman
| |
| Roy Harvey 2002-06-28, 10:25 am |
| Ben,
I think I understand a bit better now.
With the S_ID unique across the entire table things change a bit. I
would suggest trying the following for the really large table.
1) Create a work table:
SELECT OID, count(*) as Rows, convert(int, 0) as Running
INTO OID_Totals
FROM T
GROUP BY OID
2) Index the work table:
CREATE UNIQUE CLUSTERED INDEX TX ON OID_Totals(OID)
3) Update the work table:
UPDATE OID_Totals
SET Running = COALESCE((select sum(Rows) from OID_Totals as T2
where OID_Totals.OID > T2.OID), 0)
4) Try the query:
UPDATE T
SET S_ID = (select Running from OID_Totals
where T.OID = OID_Totals.OID)
+ (select count(*) from T as T2
where T.OID = T2.OID
and T.D >= T2.D)
Roy
benbrugman@onbekend.nl (Ben Brugman) wrote:
>On Fri, 28 Jun 2002 08:55:24 -0400, Roy Harvey
><RoyHarv@compuserve.com> wrote:
>
>>Ben,
>>
>>Interesting problem, and I appreciate your need for a SQL based
>>solution.
>>
>>There are a couple of points I am not clear on. First, are the
>>duplicate dates within a single OID? This can make a big difference
>>when doing this in SQL. Second, it sounds like the first S_ID within
>>an OID is not 1. If so, how is that initial S_ID determined?
>dates within a single OID are assumed to be unique.
>The range for S_ID is known and goes for all rows.
>There are no duplicates of S_ID. (over the complete tabel)
>
>>
>>Lets look at a simplified case, by assuming that the dates within OID
>>are unique, and S_ID starts at 1 within a given OID. This would
>>benefit by clustering on (OID,D) until S_ID is correct.
>>
>>UPDATE T
>> SET S_ID = (select count(*) from T as T2
>> where T.OID = T2.OID
>> and T.D >= T2.D)
>For the offset of S_ID a constant can be added, so that's no problem.
>(The S_ID should be unique but I didn't really specify, so let's
>ignore this fact)
>
>But would this run efficiently for 3.6 millions rows, because this
>would result in 3.6 million subqueries as wel.
>
>I'll have a go with the above script on my small set.
>Just wait, I'll have a go. Back in a moment.
>; ;
>; ;
>; ;
>OKEE I am back with the results.
>The update ran fast. (For 3090 in the QA under a second).
>Altered the update to
>
>UPDATE T
> SET S_ID = (select count(*) from T as T2
> where T.OID > T2.OID OR
> ( T.OID = T2.OID
> and T.D >= T2.D))+offset-1
>
>This gave exactly the results I was looking for. For 3090 rows
>this took 9 seconds. For 3.6 million rows if this scales that would
>be around 10000 seconds. Probably it will not scale well.
>The addition I made does slow the statement down.
>
>Thanks for you help,
>For next time I'll keep this technique in mind.
>
>ben brugman
>
>>
>>But it could get a LOT more complicated, depending on how innacurate
>>my assumptions were.
>>
>>Roy
>>
>>
>>benbrugman@onbekend.nl (Ben Brugman) wrote:
>>
>>>This is partly a repeat.
>>>
>>>We had (it is solved now) a problem. During solving it I tried some
>>>constructions which did not work. In the end the problem was solved
>>>using a Delphi program with an ADO interface. This took about 2
>>>hours to run which was acceptable in our situation.
>>>
>>>My problem (partly psycological) ;
>>>I wasn't able to solve the problem within the given time frame using
>>>only tools supplied with SQL-server !
>>>
>>>The problem.
>>>An imported table with 3.6 millions rows, about 20 fields, 6 indexes
>>>and quite some relations with other tables.
>>>Three important fields where
>>>Object_id OID
>>>Seq_id S_ID
>>>date D
>>>
>>>OID and D together form a unique key. S_ID should be in the order
>>>of the date. But during transport and import this order was not
>>>kept correctly. So the S_ID had to be 'renumbered', with a specific
>>>range.
>>>
>>>The end solution was the Delphi program running over all OID, doing a
>>>SELECT on each OID with an ORDER on date and UPDATE the S_ID with a
>>>number then increase the number with one and continue.
>>>(The Delphi program was an adaptation from an existing program and
>>>took only minutes to build).
>>>
>>>My Question is :
>>>How to solve this kind of problem with only SQL-server tools ?
>>>Especially on larger tables. (millions or tens of millions of
>>>records).
>>>
>>>I tried to cluster the table in the correct order (OID,D), and
>>>make the S_ID an identity key starting with a correct seed
>>>for the range. (Tried this on a small table.) Problems it
>>>took too long, and the S_ID didn't get changed. From experience
>>>I know that the identity key does not always get created in
>>>the order of which the table is clustered. As said because
>>>of the time pressure with did the Delphi solution.
>>>
>>>Thanks in advance
>>>
>>>Ben Brugman
>>
>
>Ben Brugman
|
|
|
|
|