Channel: What is the correct way to use unicode characters in a python regex - Stack Overflow

Answer by Bohemian for What is the correct way to use unicode characters in a python regex


Rather than seeking out specific unwanted characters, you could remove everything that is not explicitly allowed:

re.sub(r'[^\s!-~]', '', my_str)

This removes every character that is not one of:

  • whitespace (spaces, tabs, newlines, etc.)
  • printable "normal" ASCII characters (! is the first printable non-space character at decimal 33, and ~ is the last at decimal 126, so the range !-~ covers all of them)
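As a minimal sketch of how that behaves in practice (the function name and the `ascii_whitespace_only` parameter are illustrative assumptions, not part of the original answer): in Python 3, `\s` in a `str` pattern also matches Unicode whitespace such as the non-breaking space U+00A0, so those characters survive unless `re.ASCII` is passed.

```python
import re

def strip_non_ascii_printable(text: str, ascii_whitespace_only: bool = False) -> str:
    """Remove every character that is neither whitespace nor in the ASCII range '!'..'~'.

    ascii_whitespace_only is a hypothetical convenience switch: when True,
    re.ASCII restricts \\s to ASCII whitespace, so Unicode whitespace such as
    U+00A0 (non-breaking space) is removed as well.
    """
    flags = re.ASCII if ascii_whitespace_only else 0
    return re.sub(r'[^\s!-~]', '', text, flags=flags)

# The accented letters and the en dash are dropped; spaces, tab and newline stay.
cleaned = strip_non_ascii_printable("caf\u00e9 \u2013 na\u00efve\ttext\n")

# A non-breaking space counts as whitespace by default, but not under re.ASCII.
kept_nbsp = strip_non_ascii_printable("a\u00a0b")
dropped_nbsp = strip_non_ascii_printable("a\u00a0b", ascii_whitespace_only=True)
```

Whether non-ASCII whitespace should be kept depends on the data; the default above matches the one-liner exactly.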

You could allow more characters if needed: just add them to the character class.
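For instance, here is a sketch that also keeps the Latin-1 accented letters. The range U+00C0 to U+00FF is chosen purely for illustration; note that it also includes the symbols × (U+00D7) and ÷ (U+00F7).

```python
import re

# Allowed: whitespace, printable ASCII ('!'..'~'), and U+00C0..U+00FF.
# The extra range is an assumption for this example; pick whatever your data needs.
ALLOWED_PLUS_LATIN1 = re.compile(r'[^\s!-~\u00C0-\u00FF]')

def strip_keep_latin1(text: str) -> str:
    # Delete every character that falls outside the allowed set.
    return ALLOWED_PLUS_LATIN1.sub('', text)

# The accented letters survive; the en dash (U+2013) is still removed.
cleaned_latin1 = strip_keep_latin1("caf\u00e9 \u2013 na\u00efve")
```

Precompiling the pattern is optional; it only saves repeating the pattern string when the same cleanup runs in many places.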

