About Me

Working as a Technical Lead at CollabNET Software Private Limited.

Friday 18 September, 2009

Removing control characters - python

You want a regex to remove control characters (anything below chr(32) or
above chr(126)) from strings, e.g.:

line = re.sub(r"[^A-Za-z0-9';.-]", " ", line) # replace all chars NOT in A-Z, a-z, 0-9, [-';.] with " "

1. What is the best way to include all the required chars rather than listing them all within the r""?

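You have to list either the chars you want, as you have done, or the
ones you don't want. To strip everything outside printable ASCII, you could use
r'[\x00-\x1f\x7f-\xff]' or
r'[^\x20-\x7e]'

The two are equivalent for byte strings. On Unicode strings, only the
second also matches characters above \xff.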

line = re.sub(r'[\x00-\x1f\x7f-\xff]', "", "test \x01 \x02 testing this again") # \x01 and \x02 are Ctrl-A and Ctrl-B
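
As a rough sketch, the negated-range pattern could be compiled once and wrapped in a small helper. The names NON_PRINTABLE and strip_control_chars are hypothetical, chosen only for illustration, and the optional replacement argument is an assumption about how you might want to use it.

```python
import re

# Matches any single character outside printable ASCII (space through tilde).
# Hypothetical name; compiled once so repeated calls reuse the pattern.
NON_PRINTABLE = re.compile(r'[^\x20-\x7e]')


def strip_control_chars(text, replacement=""):
    # Replace every character outside \x20-\x7e with `replacement`
    # (by default, remove it entirely).
    return NON_PRINTABLE.sub(replacement, text)


# Ctrl-A (\x01) and Ctrl-B (\x02) are dropped; the spaces around them stay.
print(repr(strip_control_chars("test \x01 \x02 testing this again")))
```

Note that tab and newline fall below \x20, so this pattern removes them too. If you want to keep them, pass a replacement such as " " or add them to the allowed set.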
