I have a string that is randomly generated:
polymer_str = "diol diNCO diamine diNCO diamine diNCO diamine diNCO diol diNCO diamine"
I'd like to find the longest sequence of "diNCO diol" and the longest of "diNCO diamine". So in the case above the longest "diNCO diol" sequence is 1 and the longest "diNCO diamine" is 3.
How would I go about doing this using Python's re module?
Thanks in advance.
EDIT:
I mean the largest number of consecutive repeats of a given string. So the longest run of "diNCO diamine" is 3:
diol diNCO diamine diNCO diamine diNCO diamine diNCO diol diNCO diamine
Best Answer
Expanding on Ealdwulf's answer:
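A minimal sketch of this approach, assuming the repeats are separated by whitespace. The helper name longest_run is made up for illustration and is not from the original answer. It collects each run of consecutive copies with re.findall, then counts the copies in the longest run:

```python
import re

polymer_str = "diol diNCO diamine diNCO diamine diNCO diamine diNCO diol diNCO diamine"

def longest_run(unit, text):
    """Return the largest number of consecutive repeats of `unit` in `text`.

    Hypothetical helper; assumes copies of `unit` are separated by whitespace.
    """
    # Non-capturing group, so findall returns each whole run as one string.
    pattern = r"(?:%s\s*)+" % re.escape(unit)
    runs = re.findall(pattern, text)
    # Count the copies of `unit` in each run; 0 if there were no matches.
    return max((run.count(unit) for run in runs), default=0)

print(longest_run("diNCO diol", polymer_str))     # 1
print(longest_run("diNCO diamine", polymer_str))  # 3
```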
Documentation on re.findall can be found in the documentation for Python's re module. This could be written as one line, but it becomes less readable in that form.
Alternative:
If polymer_str is huge, it will be more memory efficient to use re.finditer.

The biggest difference between findall and finditer is that findall returns a list built all at once, while finditer returns an iterator that yields Match objects one at a time. Also, the finditer approach will be somewhat slower.
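The finditer variant can be sketched the same way, under the same assumptions (whitespace-separated repeats, and a hypothetical helper name, longest_run_iter). Matches are consumed one at a time, so the full list of runs is never built:

```python
import re

polymer_str = "diol diNCO diamine diNCO diamine diNCO diamine diNCO diol diNCO diamine"

def longest_run_iter(unit, text):
    """Like a findall-based count, but iterates over Match objects lazily.

    Hypothetical helper; assumes copies of `unit` are separated by whitespace.
    """
    pattern = re.compile(r"(?:%s\s*)+" % re.escape(unit))
    # match.group(0) is the whole run; count the copies of `unit` inside it.
    return max(
        (match.group(0).count(unit) for match in pattern.finditer(text)),
        default=0,
    )

print(longest_run_iter("diNCO diol", polymer_str))     # 1
print(longest_run_iter("diNCO diamine", polymer_str))  # 3
```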