Don’t know if you’ve also stumbled across this problem: you design a simple menu or app that needs to take in someone’s name and email or phone number, and some people just enter ‘fuckyou100’ as their name and no proper email or phone number.

YOU WILL NEED THE FOLLOWING TRICKS!

A few things to note when validating a name:

Sure, people can use the F word as their first or last name. Numbers and special characters though? Nah, not really.

A space (“ ”) should be allowed, but that really depends on how you structure your code: if you prompt the user to enter their first name and then their last name separately, a space might be a no-go.

Sometimes people enter a mix of upper and lower case, or only one of them, but that is not something to flag as an error.
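To make the second point concrete: if you ask for the first and last name in separate prompts, each field can be checked on its own with no space allowed at all. A minimal sketch, using Python’s re module (covered just below); `validate_name_part` is a hypothetical helper name, and it assumes plain ASCII letters are all you want to accept.

```python
import re

def validate_name_part(part):
    """Check a first or last name entered on its own (hypothetical helper).

    Accepts one or more ASCII letters in any case; spaces, digits and
    symbols are rejected because the prompt asks for a single name.
    """
    return bool(re.fullmatch(r'[A-Za-z]+', part))

print(validate_name_part("Aroha"))        # True
print(validate_name_part("aroha"))        # True  (case doesn't matter)
print(validate_name_part("Aroha Ngata"))  # False (no space in a single field)
print(validate_name_part("Jane99"))       # False (digits)
```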

I’ve got to show you Python’s built-in re module. You can read all about it here: https://docs.python.org/3/library/re.html. We’re gonna turn this problem into a regular expression matching problem.

You can define a function to validate a name input like the following:

```python
import re

def validate_name(name):
    """Validate a full name entered as a single input.

    - Allows one or more letters (upper or lower case), exactly one
      whitespace character, then one or more letters (upper or lower case)
    - Format example: 'Firstname Lastname'
    """
    regex = r'^[a-zA-Z]+\s[a-zA-Z]+$'

    return bool(re.fullmatch(regex, name))

# test:
# print(validate_name("1234"))  # False
```

Let’s break down the regular expression in the new function, ^[a-zA-Z]+\s[a-zA-Z]+$:

  1. ^ means the start of the string.
  2. [a-zA-Z]+ matches one or more letters (both uppercase and lowercase). This is the first name.
  3. \s matches a single whitespace character. Because the letter sets on either side don’t include \s, this ensures there is exactly one space between the first name and the last name. (Strictly, \s matches any whitespace character, such as a tab; use a literal space instead if you want to be stricter.)
  4. [a-zA-Z]+ again matches one or more letters. This is the last name.
  5. $ means the end of the string. (re.fullmatch() already requires the whole string to match, so ^ and $ are redundant here, but harmless.)
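A quick way to sanity-check a “Firstname Lastname” pattern is to run it over a handful of good and bad inputs. This sketch assumes the letters-only pattern ^[a-zA-Z]+\s[a-zA-Z]+$ (letters, one whitespace character, letters); the test cases are made up for illustration.

```python
import re

def validate_name(name):
    """Return True for 'Firstname Lastname': letters, one whitespace char, letters."""
    regex = r'^[a-zA-Z]+\s[a-zA-Z]+$'
    return bool(re.fullmatch(regex, name))

# Illustrative inputs and whether the pattern should accept them
cases = {
    "John Smith": True,     # the expected format
    "john SMITH": True,     # mixed case is fine
    "1234": False,          # digits only
    "John": False,          # no last name
    "John  Smith": False,   # two spaces
    " John Smith": False,   # leading space
    "John Smith2": False,   # digit in the last name
}

for text, expected in cases.items():
    result = validate_name(text)
    print(f"{text!r:16} -> {result}")
    assert result == expected
```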

re.fullmatch() returns a Match object when the whole string matches and None when it doesn’t, so we wrap it in bool() to get a plain True or False (skip that if you’d rather work with the Match object directly).
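To see what bool() is converting, here is what re.fullmatch() actually returns for a matching and a non-matching input. A Match object is always truthy and None is falsy, so you can also use the result directly in an if statement.

```python
import re

pattern = r'[a-zA-Z]+\s[a-zA-Z]+'

match = re.fullmatch(pattern, "John Smith")
print(match)          # <re.Match object; span=(0, 10), match='John Smith'>
print(bool(match))    # True

no_match = re.fullmatch(pattern, "1234")
print(no_match)       # None
print(bool(no_match)) # False

# The Match object's truthiness is enough for a plain if check
if re.fullmatch(pattern, "Jane Doe"):
    print("valid")
```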

As for phone numbers, here in New Zealand we’re looking at formats like 0226666666, (021)666 6666 or 022 66 666: with or without spaces, maybe even brackets. Again, we don’t expect letters here, and the only symbols we need are brackets, spaces, a plus sign and a hyphen. You can just swap out the regex in the code above for this:

```python
regex = r'^[\d\(\)\s\+\-]+$'
```
Explanation (skip this if you already understand the idea):
  1. ^: indicates the start of the string, ensuring that the pattern matches from the beginning of the string.
  2. [\d\(\)\s\+\-]+:
    • [\d\(\)\s\+\-]: This is a character set (enclosed within square brackets []); any single character listed inside it is allowed.
      • \d: Matches any digit 0-9 (for str patterns, other Unicode decimal digits match too).
      • \( and \): Match the opening and closing parentheses, respectively. Outside a character set, parentheses are special characters (they create groups) and must be escaped with a backslash (\) to be treated as literals; inside a set the backslash isn’t strictly needed, but it does no harm.
      • \s: Matches a whitespace character. Don’t add a ? after it: inside a character set, ? loses its “optional” meaning and simply allows a literal question mark.
      • \+: Matches the plus sign (+).
      • \-: Matches a hyphen character.
    • +: This is a quantifier that matches one or more occurrences of the characters within the character set.
  3. $: again indicates the end of the string.

Note that this only checks which characters appear, not how many digits there are or where the brackets go, so something like ‘((( )))’ would still pass.
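Here is the character-set check run against the NZ examples from this post, next to a stricter variant that also counts digits. `validate_phone_strict` is a hypothetical helper, and its 7 to 11 digit range is an assumption picked so the examples above pass; check it against the numbering rules you actually need.

```python
import re

def validate_phone(phone):
    """Character-set check: digits, brackets, whitespace, '+' and '-' only."""
    regex = r'^[\d\(\)\s\+\-]+$'
    return bool(re.fullmatch(regex, phone))

def validate_phone_strict(phone, min_digits=7, max_digits=11):
    """Hypothetical stricter variant: allowed characters plus a digit count.

    ASSUMPTION: 7-11 digits is an illustrative range, not an official NZ rule.
    """
    if not validate_phone(phone):
        return False
    digits = re.sub(r'\D', '', phone)  # drop everything that isn't a digit
    return min_digits <= len(digits) <= max_digits

for number in ["0226666666", "(021)666 6666", "022 66 666", "((( )))", "021-abc"]:
    print(f"{number!r:16} basic={validate_phone(number)} "
          f"strict={validate_phone_strict(number)}")
```

The basic check accepts ‘((( )))’ because every character is in the set; counting digits after stripping the formatting is one simple way to close that gap without writing a much longer regex.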

For validating an email address, we could swap in:

```python
regex = r'^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$'
```
Detailed explanation
  1. ^: start of the string.
  2. [A-Za-z0-9._%+-]+@:
    • [A-Za-z0-9._%+-]+: This is a character set for the part before the @ symbol; the trailing + means one or more of these characters.
      • A-Za-z: Matches any uppercase or lowercase letter.
      • 0-9: Matches any digit.
      • ._%+-: Matches the characters period (.), underscore (_), percent sign (%), plus sign (+), and hyphen (-). The hyphen is placed last in the set so it is treated as a literal rather than a range.
    • @: Matches the literal @ symbol.
  3. [A-Za-z0-9.-]+:
    • [A-Za-z0-9.-]+: This is another character set that defines the allowed characters for the domain part of the email address (after the @ symbol).
      • A-Za-z: Matches any uppercase or lowercase letter.
      • 0-9: Matches any digit.
      • .-: Matches the characters period (.) and hyphen (-), which are allowed in domain names.
  4. \.: Matches the literal . character. It’s escaped with a backslash (\) because the period is a special character in regular expressions and needs to be treated as a literal.
  5. [A-Za-z]{2,}: Matches two or more uppercase or lowercase letters, i.e. the final part of the domain such as com or nz. Without {2,}, the set would match exactly one letter and reject ordinary addresses like jane@example.com.
  6. $: Anchors the regex at the end of the string, ensuring that the pattern matches until the end of the string.
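Dropping the email pattern into the same function shape gives something like the sketch below; `validate_email` is a hypothetical name and the addresses are made up for illustration. This is a pragmatic filter, not a full RFC 5322 validator.

```python
import re

def validate_email(email):
    """Pragmatic email check (hypothetical helper, not a full RFC validator)."""
    regex = r'^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$'
    return bool(re.fullmatch(regex, email))

for address in ["jane.doe@example.co.nz", "j+news@mail.example.com",
                "no-at-sign.example.com", "jane@localhost", "jane@example.c"]:
    print(f"{address!r:28} -> {validate_email(address)}")
```

It will still let through some oddities, such as consecutive dots in the domain, so if an address really matters the dependable check is sending a confirmation email.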

What are the differences between re.match(), re.fullmatch() and re.search()? Check out the quick summary below (Source: Stack Overflow) of the differences between these three commonly used string matching functions from the re module:

Quick overview
  • re.match is anchored at the start ^pattern
    • Ensures the string begins with the pattern
  • re.fullmatch is anchored at the start and end of the string ^pattern$
    • Ensures the full string matches the pattern. Strictly, it behaves like \Apattern\Z: unlike $, it won’t accept a trailing newline
  • re.search is not anchored pattern
    • Ensures the string contains the pattern somewhere
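The three functions side by side on one made-up string, plus the one case where re.fullmatch() and a ^…$ pattern with re.match() disagree (a trailing newline):

```python
import re

text = "John Smith 42"

print(re.match(r'John', text))      # a Match object: the string starts with 'John'
print(re.match(r'Smith', text))     # None: 'Smith' is not at the start
print(re.search(r'Smith', text))    # a Match object: 'Smith' appears somewhere
print(re.fullmatch(r'John', text))  # None: 'John' is not the whole string
print(re.fullmatch(r'[A-Za-z ]+\d+', text))  # a Match object for the whole string

# '$' also matches just before a trailing newline; fullmatch does not allow it
print(re.match(r'^John Smith 42$', text + "\n"))   # a Match object
print(re.fullmatch(r'John Smith 42', text + "\n"))  # None
```

That last pair is one reason re.fullmatch() is a good default for validating a whole input field.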

A more in-depth comparison of re.match vs re.search can be found here

