Assuming no nulls, you GROUP BY
the unique columns, and SELECT
the MIN (or MAX)
RowId as the row to keep. Then, just delete everything that didn't have a row id:
DELETE FROM MyTable
LEFT OUTER JOIN (
SELECT MIN(RowId) as RowId, Col1, Col2, Col3
FROM MyTable
GROUP BY Col1, Col2, Col3
) as KeepRows ON
MyTable.RowId = KeepRows.RowId
WHERE
KeepRows.RowId IS NULL
In case you have a GUID instead of an integer, you can replace
MIN(RowId)
with
CONVERT(uniqueidentifier, MIN(CONVERT(char(36), MyGuidColumn)))
The *args
and **kwargs
is a common idiom to allow arbitrary number of arguments to functions as described in the section more on defining functions in the Python documentation.
The *args
will give you all function parameters as a tuple:
def foo(*args):
for a in args:
print(a)
foo(1)
# 1
foo(1,2,3)
# 1
# 2
# 3
The **kwargs
will give you all
keyword arguments except for those corresponding to a formal parameter as a dictionary.
def bar(**kwargs):
for a in kwargs:
print(a, kwargs[a])
bar(name='one', age=27)
# name one
# age 27
Both idioms can be mixed with normal arguments to allow a set of fixed and some variable arguments:
def foo(kind, *args, **kwargs):
pass
It is also possible to use this the other way around:
def foo(a, b, c):
print(a, b, c)
obj = {'b':10, 'c':'lee'}
foo(100,**obj)
# 100 10 lee
Another usage of the *l
idiom is to unpack argument lists when calling a function.
def foo(bar, lee):
print(bar, lee)
l = [1,2]
foo(*l)
# 1 2
In Python 3 it is possible to use *l
on the left side of an assignment (Extended Iterable Unpacking), though it gives a list instead of a tuple in this context:
first, *rest = [1,2,3,4]
first, *l, last = [1,2,3,4]
Also Python 3 adds new semantic (refer PEP 3102):
def func(arg1, arg2, arg3, *, kwarg1, kwarg2):
pass
Such function accepts only 3 positional arguments, and everything after *
can only be passed as keyword arguments.
Note:
- A Python
dict
, semantically used for keyword argument passing, are arbitrarily ordered. However, in Python 3.6, keyword arguments are guaranteed to remember insertion order.
- "The order of elements in
**kwargs
now corresponds to the order in which keyword arguments were passed to the function." - What’s New In Python 3.6
- In fact, all dicts in CPython 3.6 will remember insertion order as an implementation detail, this becomes standard in Python 3.7.
Best Solution
First, simplify the address string by collapsing all whitespace to a single space between each word, and forcing everything to lower case (or upper case if you prefer):
Then, I would strip out things like "st" in "41st Street" or "nd" in "42nd Street":
Note that the second sub() will work with a space between the "2" and the "nd", but I didn't set the first one to do that; because I'm not sure how you can tell the difference between "41 St Ave" and "41 St" (that second one is "41 Street" abbreviated).
Be sure to read all the help for the re module; it's powerful but cryptic.
Then, I would split what you have left into a list of words, and apply the Soundex algorithm to list items that don't look like numbers:
http://en.wikipedia.org/wiki/Soundex
http://wwwhomes.uni-bielefeld.de/gibbon/Forms/Python/SEARCH/soundex.html
Then you can work with the list or join it back to a string as you think best.
The whole idea of the Soundex thing is to handle misspelled addresses. That may not be what you want, in which case just ignore this Soundex idea.
Good luck.