Python – Splitting a file into lines in Python using re.split

list-comprehension, python, regex

I'm trying to split a file with a list comprehension using code similar to:

lines = [x for x in re.split(r"\n+", file.read()) if not re.match(r"com", x)]

However, the lines list always has an empty string as its last element. Does anyone know a way to avoid this (other than the kludge of calling pop() afterwards)?
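
For reference, the empty string seems to come from the file's final newline: re.split returns whatever follows the last separator, even when that is nothing, and re.match(r"com", "") does not match, so the filter lets it through. A minimal sketch, with a made-up string standing in for file.read():

```python
import re

# Hypothetical file contents standing in for file.read(); note the final newline.
text = "alpha\ncomment\n\nbeta\n"

# Nothing follows the last newline, so the last element of the result is "".
parts = re.split(r"\n+", text)
print(parts)

# For comparison, str.splitlines() adds no trailing empty string,
# although it keeps the blank line in the middle.
print(text.splitlines())
```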

Best Solution

Put the regular expression hammer away :-)

  1. You can iterate over a file directly; readlines() is almost obsolete these days.
  2. Read about str.strip() (and its friends, lstrip() and rstrip()).
  3. Don't use file as a variable name. It's bad form, because in Python 2 file is a built-in type and your variable shadows it.
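
To make point 2 concrete, here is a small sketch of what the three strip methods do to a line as it comes out of a file. The sample line is invented for illustration:

```python
# A made-up line with leading spaces and a trailing newline.
line = "  some text\n"

print(repr(line.strip()))   # whitespace removed from both ends
print(repr(line.lstrip()))  # leading whitespace removed, newline kept
print(repr(line.rstrip()))  # trailing whitespace (including the newline) removed
```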

You can write your code as:

lines = []
f = open(filename)
for line in f:
    if not line.startswith('com'):
        lines.append(line.strip())
f.close()

If you are still getting blank lines in there, you can add in a test:

lines = []
f = open(filename)
for line in f:
    if line.strip() and not line.startswith('com'):
        lines.append(line.strip())
f.close()

If you really want it in one line:

lines = [line.strip() for line in open(filename) if line.strip() and not line.startswith('com')]

Finally, if you're on Python 2.6 or later, look at the with statement, which closes the file for you when the block ends.
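
A rough sketch of that last suggestion, wrapped in a helper so it can be reused. The name read_lines is made up for this example and is not part of the original code:

```python
def read_lines(filename):
    """Return stripped, non-blank lines that do not start with 'com'."""
    # The with statement closes the file when the block exits, even on an error.
    with open(filename) as f:
        return [line.strip() for line in f
                if line.strip() and not line.startswith('com')]
```

Calling read_lines(filename) should give the same list as the one-liner above, without leaving the file open.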