You want a regex to remove control characters (anything below chr(32) or above
chr(126)) from strings, i.e.
line = re.sub(r"[^A-Za-z0-9';.-]", " ", line)  # replace every char that is NOT A-Z, a-z, 0-9 or one of - ' ; . with " "
1. What is the best way to include all the required chars rather than listing them all within the r"" pattern?
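You have to list either the chars you want, as you have done, or the
ones you don't want. Ranges keep either list short. You could use
r'[\x00-\x1f\x7f-\xff]' (the unwanted chars, but only up to chr(255)) or
r'[^\x20-\x7e]' (everything that is not chr(32) through chr(126))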
line = re.sub(r'[\x00-\x1f\x7f-\xff]', "", "test \x01 \x02 testing this again")  # \x01 and \x02 are ^A and ^B
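The two patterns above can be compared side by side. This is only a minimal sketch, assuming Python 3 str input; the names CONTROL_OR_HIGH, NOT_PRINTABLE_ASCII and strip_with are made up for illustration and are not part of the re module.

```python
import re

# Hypothetical names, chosen for this sketch only.
# Pattern A: the unwanted code points, listed explicitly (stops at U+00FF).
CONTROL_OR_HIGH = re.compile(r'[\x00-\x1f\x7f-\xff]')
# Pattern B: anything that is not printable ASCII (space through tilde).
NOT_PRINTABLE_ASCII = re.compile(r'[^\x20-\x7e]')


def strip_with(pattern, text):
    """Return text with every character matched by pattern removed."""
    return pattern.sub("", text)


# \x01 and \x02 are the control characters usually displayed as ^A and ^B.
sample = "test \x01 \x02 testing this again"
print(repr(strip_with(CONTROL_OR_HIGH, sample)))
print(repr(strip_with(NOT_PRINTABLE_ASCII, sample)))

# The patterns only differ for code points above U+00FF, such as the euro sign.
euro = "price: 5\u20ac"
print(repr(strip_with(CONTROL_OR_HIGH, euro)))      # euro sign is kept
print(repr(strip_with(NOT_PRINTABLE_ASCII, euro)))  # euro sign is removed
```

If the strings can contain characters beyond chr(255), the negated class is probably the safer of the two, since it does not depend on where the listed ranges stop.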