Nearest neighbor - Joining two tables

Hello there! :)

Just want to say, I'm still very very new to messing around with geospatial data - The reason for doing it, is to make a better analysis for my master thesis - So please be patient if you can :)

I have two tables: DAWA and Node

DAWA contains: 800.000 entries

  • ConCatAddress = varchar(255)
  • wgs84koordinat_long = float
  • wgs84_koordinat_lat = float
  • Geog = ([GEOGRAPHY]::STGeomFromText(((('Point('+CONVERT([varchar](32),[wgs84koordinat_long]))+' ')+CONVERT([varchar](32),[wgs84koordinat_lat]))+')',(4326)))

Node contains: 900 entries

  • NodeID = int
  • wgs84_long = float
  • wgs84_lat = float
  • Geog ([GEOGRAPHY]::STGeomFromText(((('Point('+CONVERT([varchar](32),[wgs84_long]))+' ')+CONVERT([varchar](32),[wgs84_lat]))+')',(4326)))

What I need to to, is to find the nearest point from an address in DAWA with an entry from Node and Update DAWA with a column called "NearestPoint" and "distance" with the finding.

Right now, I'm using the SQL in the bottom of this post, but here I have to run i manually 900 times, which most likely will get the same address more than once and will take me forever. As you can see, the SQL below will answer my question: Which addresses is nearest to this point.

What I want it to answer is: Which point, from Node, is nearest to this specific address in DAWA.

SQL:

DECLARE @Lat AS float, @Long AS float, @adresse AS varchar(255);
DECLARE @MaxRange AS int, @MinRange AS int, @RangeDivisor AS float, @RangeUnit AS varchar(10) , @RangeDelimiter AS varchar(2);
SET @MinRange = 0;
SET @MaxRange = 10;
SET @RangeDivisor = 1000;
SET @RangeDelimiter = 4;
SET @RangeUnit = 'KM';

SELECT @Lat = dbo.DAWA.wgs84koordinat_lat, @Long = dbo.DAWA.wgs84koordinat_long, @adresse = dbo.DAWA.ConCatAddress
FROM dbo.DAWA --WHERE postnr = '6700';
WHERE wgs84koordinat_long = '10.1894185761792' AND wgs84koordinat_lat = '56.1940385664007'; --FiberNode lokation (NodeID='8' Adr='8200 Aarhus N Skejbyparken 4 1 1')

WITH Distance AS (
SELECT
  KVHx,
  wgs84koordinat_lat,
  wgs84koordinat_long,
  ConCatAddress,
  Geog.STDistance(geography::Point(@Lat, @Long, 4326))/@RangeDivisor [Dist],
  @RangeUnit [Unit]
FROM dbo.DAWA)

SELECT KVHx,  ConCatAddress AS [Address], wgs84koordinat_lat AS latgrad, wgs84koordinat_long AS longgrad, LEFT(Dist,@RangeDelimiter) AS Dist, Unit
FROM Distance
WHERE Dist <= @MaxRange AND Dist >= @MinRange 
ORDER BY Dist

Can any of you help me? I've tried to make some JOINS, but I can't make it how I want.. ! :(

I've found a picture that maybe explaining it better than I:

Picture showing how I want the end result

August 27th, 2015 12:42pm

Hi Patrick

That is unfortunately the down side of the cross product, you are essentially taking a two table one will 900 records the other with 200 000 and multiplying them together, this results in a temporary table with 180 000 000 records which then needs to be sorted, while not the biggest data set it is certainly not a small one and you seem to have hit a limit on the size of your log file.

You can use the Cursor as Naomi suggests which will loop through them one by one or you can update them in a batches using a while loop (To do this it will be easiest to have an identity column on the DAWA table) 

CREATE TABLE #DAWA
(
  	DawaID int identity(1,1),
	ConcatAddress VARCHAR(255) PRIMARY KEY,
	Geog GEOMETRY,
	NearestNeighborNodeId INT,
	NearestNeighborNodeDistance FLOAT
)

create Table #Node
(
	NodeId INT IDENTITY(1,1),
	Geog GEOMETRY,
)

INSERT INTO #DAWA(ConcatAddress,Geog) VALUES
('Adress 1',GEOMETRY::STGeomFromText('POINT (1 7)', 0)),
('Adress 2',GEOMETRY::STGeomFromText('POINT (9 2)', 0)),
('Adress 3',GEOMETRY::STGeomFromText('POINT (10 0)', 0)),
('Adress 4',GEOMETRY::STGeomFromText('POINT (5 5)', 0)),
('Adress 5',GEOMETRY::STGeomFromText('POINT (1 1)', 0))

INSERT INTO #node(Geog) VALUES
(GEOMETRY::STGeomFromText('POINT (6 2)', 0)),
(GEOMETRY::STGeomFromText('POINT (4 5)', 0)),
(GEOMETRY::STGeomFromText('POINT (8 7)', 0));

DECLARE @MaxValue INT = (select max(DawaID) from #DAWA)
DECLARE @BatchSize INT = 2
DECLARE @Counter INT = 1

WHILE (@Counter <= @MaxValue)
BEGIN
    
    select @Counter,@Counter + @BatchSize 

    ;WITH NearestNeighbor AS
    (
	    SELECT 
		    ROW_NUMBER() OVER (PARTITION BY D.ConcatAddress ORDER BY D.Geog.STDistance(N.Geog) ) [NeighborRank],
		    D.ConcatAddress,
		    N.NodeId [NearestNeighborNodeId],
		    D.Geog.STDistance(N.Geog) [Distance]
	    FROM #DAWA D
	    CROSS JOIN #Node N
      where D.DawaID >= @Counter and D.DawaID < @Counter + @BatchSize 
    )

    UPDATE #DAWA
    SET	NearestNeighborNodeId = NN.[NearestNeighborNodeId],
	    NearestNeighborNodeDistance = nn.[Distance]
    FROM #DAWA D inner join NearestNeighbor NN on NN.ConcatAddress = D.ConcatAddress
    WHERE [NeighborRank] = 1

    --View intermediate results
    --select * from #DAWA

    set @Counter = @Counter + @BatchSize
END

select * from #DAWA

DROP TABLE #DAWA;
DROP TABLE #Node;
you would want to use much larger batch sizes, start with 100 and test higher values until you get a good compromise between speed and transaction size.

Free Windows Admin Tool Kit Click here and download it now
August 29th, 2015 2:35pm

This topic is archived. No further replies will be accepted.

Other recent topics Other recent topics