How to use Regular Expressions (Regex) in Microsoft Excel

2018-06-10 01:27:16

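How can I use regular expressions in Excel and take advantage of Excel's powerful grid-like setup for data manipulation?

An in-cell function that returns a matched pattern or a replaced value in a string.

A Sub that loops through a column of data and extracts matches to adjacent cells.

What setup is necessary?

What are Excel's special characters for regular expressions?

I understand that regex is not ideal in many situations (To use or not to use regular expressions?), since Excel can use Left, Mid, Right and InStr type commands for similar manipulations.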
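For comparison, here is a rough sketch of what the Left/Mid style of manipulation looks like for one simple task: removing up to two leading digits from a string. The function name StripLeadingDigits is made up for illustration; it uses only built-in VBA string functions and the Like operator, and no regex library.

```vba
' Hypothetical helper: removes at most two leading digits using only
' built-in string functions (no regular expression object involved).
Function StripLeadingDigits(ByVal strInput As String) As String
    Dim i As Long
    i = 1

    ' Advance past digits, but never past the second character
    Do While i <= Len(strInput) And i <= 2
        If Mid$(strInput, i, 1) Like "[0-9]" Then
            i = i + 1
        Else
            Exit Do
        End If
    Loop

    ' Return everything from the first character that was not skipped
    StripLeadingDigits = Mid$(strInput, i)
End Function
```

With this sketch, StripLeadingDigits("12abc") gives abc and StripLeadingDigits("abc123") returns the input unchanged. It works for a fixed, simple rule, but the loop has to be rewritten for every new rule, which is where a pattern language becomes attractive.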

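Regular expressions are used for pattern matching.

To use them in Excel, follow these steps:

Step 1: Add a VBA reference to "Microsoft VBScript Regular Expressions 5.5"

Select the "Developer" tab (if the tab is not visible, it can be enabled under File > Options > Customize Ribbon).

Select the "Visual Basic" icon from the "Code" ribbon section.

In the "Microsoft Visual Basic for Applications" window, select "Tools" from the top menu.

Select "References".

Check the box next to "Microsoft VBScript Regular Expressions 5.5" to include it in your workbook.

Click "OK".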
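If adding the reference is not practical (for example, when a workbook is passed between machines), the same regex object can usually be created with late binding instead. The following is a minimal sketch, assuming the VBScript regex engine is installed on the machine; the function name HasLeadingDigits is hypothetical.

```vba
' Late-bound alternative: no entry under Tools > References is needed.
' HasLeadingDigits is a made-up name used only for this illustration.
Function HasLeadingDigits(ByVal strInput As String) As Boolean
    Dim regEx As Object
    Set regEx = CreateObject("VBScript.RegExp")   ' created at run time

    With regEx
        .Global = False
        .IgnoreCase = False
        .Pattern = "^[0-9]{1,2}"   ' one or two digits at the start
    End With

    HasLeadingDigits = regEx.Test(strInput)
End Function
```

Typing ?HasLeadingDigits("12abc") in the Immediate window prints True, and ?HasLeadingDigits("abc123") prints False. The trade-off is that late binding gives up IntelliSense and compile-time checking for the regex object.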

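Step 2: Define your pattern

Basic definitions:

- Range.

E.g. a-z matches the lower case letters from a to z

E.g. 0-5 matches any number from 0 to 5

[] Matches exactly one of the objects inside these brackets.

E.g. [a] matches the letter a

E.g. [abc] matches a single letter which can be a, b or c

E.g. [a-z] matches any single lower case letter of the alphabet.

() Groups different matches for return purposes. See the examples below.

{} Multiplier for repeated copies of the pattern defined before it.

E.g. [a]{2} matches two consecutive lower case letter a: aa

E.g. [a]{1,3} matches at least one and up to three lower case letter a: a, aa, aaa

+ Matches at least one, or more, of the pattern defined before it.

E.g. a+ matches consecutive a's: a, aa, aaa, and so on

? Matches zero or one of the pattern defined before it.

E.g. the pattern may or may not be present, but can only be matched one time.

E.g. [a-z]? matches the empty string or any single lower case letter.

* Matches zero or more of the pattern defined before it.

E.g. a wildcard for a pattern that may or may not be present.

E.g. [a-z]* matches the empty string or a string of lower case letters.

. Matches any character except the newline \n

E.g. a. matches a two-character string starting with a and ending with anything except \n

| OR operator

E.g. a|b means either a or b can be matched.

E.g. red|white|orange matches exactly one of the colors.

^ NOT operator (when it is the first character inside [])

E.g. [^0-9] the character cannot be a digit

E.g. [^aA] the character cannot be lower case a or upper case A

\ Escapes the special character that follows (overrides the behavior above)

E.g. \., \\, \(, \?, \$, \^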

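Anchoring patterns:

^ Match must occur at the start of the string

E.g. ^a the first character must be lower case letter a

E.g. ^[0-9] the first character must be a digit.

$ Match must occur at the end of the string

E.g. a$ the last character must be lower case letter a

Precedence table: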

Order  Name                Representation
1      Parentheses         ( )
2      Multipliers         ? + * {m,n} {m, n}?
3      Sequence & Anchors  abc ^ $
4      Alternation         |

Predefined character abbreviations:

abr    same as         meaning
\d     [0-9]           Any single digit
\D     [^0-9]          Any single character that's not a digit
\w     [a-zA-Z0-9_]    Any word character
\W     [^a-zA-Z0-9_]   Any non-word character
\s     [ \r\t\n\f]     Any space character
\S     [^ \r\t\n\f]    Any non-space character
\n     [\n]            New line
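A quick way to get a feel for these building blocks is to test a handful of patterns against sample strings and print the outcome. The sketch below does that in the Immediate window (Ctrl+G in the VBA editor). The Sub name DemoPatterns and the sample strings are made up for illustration, and late binding is used so the sketch does not depend on the library reference.

```vba
' Prints pattern, input and True/False for a few sample pairs.
' DemoPatterns and the test data are illustrative only.
Sub DemoPatterns()
    Dim regEx As Object
    Dim tests As Variant
    Dim i As Long

    Set regEx = CreateObject("VBScript.RegExp")

    ' Each entry is Array(pattern, input)
    tests = Array( _
        Array("^[a-z]{2}$", "ab"), _
        Array("^\d+$", "2018"), _
        Array("^\d+$", "20a8"), _
        Array("red|white|orange", "a white car"), _
        Array("[^0-9]", "123"), _
        Array("^\w+\s\w+$", "Peter Gordon"))

    For i = LBound(tests) To UBound(tests)
        regEx.Pattern = tests(i)(0)
        ' Test returns True when the pattern is found in the input
        Debug.Print tests(i)(0), tests(i)(1), regEx.Test(tests(i)(1))
    Next i
End Sub
```

Running it should report True, True, False, True, False, True for the six pairs: "20a8" fails because the anchors require digits only, and "123" fails [^0-9] because it contains no non-digit character.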

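Example 1: Run as a macro

The following example macro looks at the value in cell A1 to see whether the first 1 or 2 characters are digits. If so, they are removed and the rest of the string is displayed. If not, a box appears telling you that no match was found. A cell A1 value of 12abc returns abc, a value of 1abc returns abc, and a value of abc123 returns "Not matched" because the digits are not at the start of the string.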

Private Sub simpleRegex()
    Dim strPattern As String: strPattern = "^[0-9]{1,2}"
    Dim strReplace As String: strReplace = ""
    Dim regEx As New RegExp
    Dim strInput As String
    Dim Myrange As Range

    Set Myrange = ActiveSheet.Range("A1")

    If strPattern <> "" Then
        strInput = Myrange.Value

        With regEx
            .Global = True
            .MultiLine = True
            .IgnoreCase = False
            .Pattern = strPattern
        End With

        If regEx.Test(strInput) Then
            MsgBox (regEx.Replace(strInput, strReplace))
        Else
            MsgBox ("Not matched")
        End If
    End If
End Sub

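Example 2: Run as an in-cell function

This example is the same as Example 1 but is set up to run as an in-cell function (note that the pattern used here removes up to three leading digits). To use it, change the code to this: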

Function simpleCellRegex(Myrange As Range) As String
    Dim regEx As New RegExp
    Dim strPattern As String
    Dim strInput As String
    Dim strReplace As String
    Dim strOutput As String


    strPattern = "^[0-9]{1,3}"

    If strPattern <> "" Then
        strInput = Myrange.Value
        strReplace = ""

        With regEx
            .Global = True
            .MultiLine = True
            .IgnoreCase = False
            .Pattern = strPattern
        End With

        If regEx.test(strInput) Then
            simpleCellRegex = regEx.Replace(strInput, strReplace)
        Else
            simpleCellRegex = "Not matched"
        End If
    End If
End Function

Place your string ("12abc") in cell A1. Enter the formula =simpleCellRegex(A1) in cell B1 and the result will be "abc".

Example 3: Loop through a range

This example is the same as Example 1 but loops through a range of cells.

Private Sub simpleRegex()
    Dim strPattern As String: strPattern = "^[0-9]{1,2}"
    Dim strReplace As String: strReplace = ""
    Dim regEx As New RegExp
    Dim strInput As String
    Dim Myrange As Range
    Dim cell As Range   ' loop variable; required when Option Explicit is on

    Set Myrange = ActiveSheet.Range("A1:A5")

    For Each cell In Myrange
        If strPattern <> "" Then
            strInput = cell.Value

            With regEx
                .Global = True
                .MultiLine = True
                .IgnoreCase = False
                .Pattern = strPattern
            End With

            If regEx.Test(strInput) Then
                MsgBox (regEx.Replace(strInput, strReplace))
            Else
                MsgBox ("Not matched")
            End If
        End If
    Next
End Sub

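Example 4: Splitting apart different patterns

This example loops through a range (A1, A2 and A3) and looks for a string starting with three digits, followed by a single alpha character and then four numeric digits. The output splits the pattern matches apart into adjacent cells by using (). $1 represents the first pattern matched within the first set of ().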

Private Sub splitUpRegexPattern()
    Dim regEx As New RegExp
    Dim strPattern As String
    Dim strInput As String
    Dim Myrange As Range
    Dim C As Range   ' loop variable; required when Option Explicit is on

    Set Myrange = ActiveSheet.Range("A1:A3")

    For Each C In Myrange
        strPattern = "(^[0-9]{3})([a-zA-Z])([0-9]{4})"

        If strPattern <> "" Then
            strInput = C.Value

            With regEx
                .Global = True
                .MultiLine = True
                .IgnoreCase = False
                .Pattern = strPattern
            End With

            If regEx.test(strInput) Then
                C.Offset(0, 1) = regEx.Replace(strInput, "$1")
                C.Offset(0, 2) = regEx.Replace(strInput, "$2")
                C.Offset(0, 3) = regEx.Replace(strInput, "$3")
            Else
                C.Offset(0, 1) = "(Not matched)"
            End If
        End If
    Next
End Sub

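Results: for a cell in column A containing, for example, 123a4567, the three cells to its right receive 123, a and 4567. A cell that does not match gets "(Not matched)" in the adjacent cell.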
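The same split can also be done by reading the capture groups directly from the match object instead of calling Replace once per group. The following is a sketch of that variant, not the original answer's code; the Sub name SplitWithSubMatches is hypothetical and late binding is assumed.

```vba
' Sketch: read capture groups through SubMatches rather than Replace.
' Writes into the columns to the right of each cell in A1:A3.
Sub SplitWithSubMatches()
    Dim regEx As Object
    Dim matches As Object
    Dim c As Range
    Dim i As Long

    Set regEx = CreateObject("VBScript.RegExp")
    regEx.Pattern = "^([0-9]{3})([a-zA-Z])([0-9]{4})"

    For Each c In ActiveSheet.Range("A1:A3")
        Set matches = regEx.Execute(CStr(c.Value))
        If matches.Count > 0 Then
            ' SubMatches is zero-based: index 0 holds the first group
            For i = 0 To matches(0).SubMatches.Count - 1
                c.Offset(0, i + 1).Value = matches(0).SubMatches(i)
            Next i
        Else
            c.Offset(0, 1).Value = "(Not matched)"
        End If
    Next c
End Sub
```

One difference in behavior is worth knowing: Replace keeps any text outside the matched portion, whereas SubMatches returns only what each group captured.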

Additional pattern examples

String   Regex Pattern                  Explanation
a1aaa    [a-zA-Z][0-9][a-zA-Z]{3}       Single alpha, single digit, three alpha characters
a1aaa    [a-zA-Z]?[0-9][a-zA-Z]{3}      May or may not have preceding alpha character
a1aaa    [a-zA-Z][0-9][a-zA-Z]{0,3}     Single alpha, single digit, 0 to 3 alpha characters
a1aaa    [a-zA-Z][0-9][a-zA-Z]*         Single alpha, single digit, followed by any number of alpha characters
</i8>    \<\/[a-zA-Z][0-9]\>            Literal </ and >, with any single alpha followed by any single digit in between

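To use regular expressions directly in Excel formulas, the following UDF (user-defined function) can help. It more or less directly exposes regular expression functionality as an Excel function.

How it works

It takes 2-3 parameters.

The text to apply the regular expression to.

The regular expression.

A format string specifying how the result should look. It can contain $0, $1, $2, and so on. $0 is the entire match, $1 and up correspond to the respective match groups in the regular expression. Defaults to $0.

Some examples

Extracting an email address:

=regex("Peter Gordon: some@email.com, 47", "\w+@\w+\.\w+")
=regex("Peter Gordon: some@email.com, 47", "\w+@\w+\.\w+", "$0")

Results in: some@email.com

Extracting several substrings:

=regex("Peter Gordon: some@email.com, 47", "^(.+): (.+), (\d+)$", "E-Mail: $2, Name: $1")

Results in: E-Mail: some@email.com, Name: Peter Gordon

To take apart a combined string in a single cell into its components in multiple cells:

=regex("Peter Gordon: some@email.com, 47", "^(.+): (.+), (\d+)$", "$" & 1)
=regex("Peter Gordon: some@email.com, 47", "^(.+): (.+), (\d+)$", "$" & 2)

Results in: Peter Gordon some@email.com ...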

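How to use

To use this UDF, do the following (roughly based on this Microsoft page, which has some good additional information):

In Excel, in a macro-enabled file ('.xlsm'), press ALT+F11 to open the Microsoft Visual Basic for Applications editor.

Add a VBA reference to the regular expressions library (shamelessly copied from Portland Runner++'s answer):

Click Tools -> References.

Find Microsoft VBScript Regular Expressions 5.5 in the list and tick the checkbox next to it.

Click OK.

Click Insert -> Module. If you give your module a different name, make sure the module does not have the same name as the UDF below (e.g. naming the module Regex and the function regex causes #NAME! errors).

(Alternatively: the second icon in the icon row -> Module.)

Insert the following into the big text window in the middle: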

Function regex(strInput As String, matchPattern As String, Optional ByVal outputPattern As String = "$0") As Variant
    Dim inputRegexObj As New VBScript_RegExp_55.RegExp, outputRegexObj As New VBScript_RegExp_55.RegExp, outReplaceRegexObj As New VBScript_RegExp_55.RegExp
    Dim inputMatches As Object, replaceMatches As Object, replaceMatch As Object
    Dim replaceNumber As Integer

    With inputRegexObj
        .Global = True
        .MultiLine = True
        .IgnoreCase = False
        .Pattern = matchPattern
    End With
    With outputRegexObj
        .Global = True
        .MultiLine = True
        .IgnoreCase = False
        .Pattern = "\$(\d+)"   ' finds $0, $1, $2, ... tokens in the format string
    End With
    With outReplaceRegexObj
        .Global = True
        .MultiLine = True
        .IgnoreCase = False
    End With

    Set inputMatches = inputRegexObj.Execute(strInput)
    If inputMatches.Count = 0 Then
        regex = False
    Else
        Set replaceMatches = outputRegexObj.Execute(outputPattern)
        For Each replaceMatch In replaceMatches
            replaceNumber = replaceMatch.SubMatches(0)
            outReplaceRegexObj.Pattern = "\$" & replaceNumber   ' escaped so $ is a literal, not an anchor

            If replaceNumber = 0 Then
                outputPattern = outReplaceRegexObj.Replace(outputPattern, inputMatches(0).Value)
            Else
                If replaceNumber > inputMatches(0).SubMatches.Count Then
                    'regex = "A too high $ tag found. Largest allowed is $" & inputMatches(0).SubMatches.Count & "."
                    regex = CVErr(xlErrValue)
                    Exit Function
                Else
                    outputPattern = outReplaceRegexObj.Replace(outputPattern, inputMatches(0).SubMatches(replaceNumber - 1))
                End If
            End If
        Next
        regex = outputPattern
    End If
End Function

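Save and close the Microsoft Visual Basic for Applications editor window.

Expanding on patszim's answer for those in a rush.

Open the Excel workbook.

Press Alt+F11 to open the VBA/Macros window.

Add a reference to the regular expressions library under Tools, then References,

and select Microsoft VBScript Regular Expressions 5.5.

Insert a new module (the code needs to reside in a module, otherwise it does not work).

In the newly inserted module,

add the following code: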

Function RegxFunc(strInput As String, regexPattern As String) As String
    Dim regEx As New RegExp
    Dim matches As Object   ' match collection returned by Execute
    With regEx
        .Global = True
        .MultiLine = True
        .IgnoreCase = False
        .pattern = regexPattern
    End With

    If regEx.Test(strInput) Then
        Set matches = regEx.Execute(strInput)
        RegxFunc = matches(0).Value
    Else
        RegxFunc = "not matched"
    End If
End Function

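The regex pattern is placed in one of the cells and absolute referencing is used on it. The function will be tied to the workbook in which it is created.
If it needs to be used in different workbooks, store the function in Personal.XLSB.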
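RegxFunc returns only the first match. If every match is wanted, one possible extension is to walk the whole match collection and join the values. The sketch below assumes late binding; the name RegxAll and the optional delimiter parameter are made up for illustration.

```vba
' Hypothetical companion to RegxFunc: returns all matches joined by a delimiter.
Function RegxAll(strInput As String, regexPattern As String, _
                 Optional delimiter As String = ", ") As String
    Dim regEx As Object
    Dim matches As Object
    Dim result As String
    Dim i As Long

    Set regEx = CreateObject("VBScript.RegExp")
    regEx.Global = True            ' find every match, not just the first
    regEx.Pattern = regexPattern

    Set matches = regEx.Execute(strInput)
    For i = 0 To matches.Count - 1
        If i > 0 Then result = result & delimiter
        result = result & matches(i).Value
    Next i

    RegxAll = result               ' empty string when nothing matched
End Function
```

For example, =RegxAll("a1b22c333", "\d+") returns 1, 22, 333. Unlike RegxFunc, this sketch returns an empty string rather than "not matched" when the pattern is not found.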
