{ "cells": [ { "cell_type": "markdown", "metadata": { "toc-hr-collapsed": false }, "source": [ "# Strings" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "In a computer, everything is ultimately converted to numeric values. Because a computer is built from circuits, it can ultimately handle only `1` and `0`, so the most basic numeric values are binary, and even integers and floating-point numbers end up as binary values. This is why, in Python and in most other languages that use binary floating point, `1.1 + 2.2` is not the `3.3` you might expect." ] },
{ "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "3.3000000000000003" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "1.1 + 2.2" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "Because every value is ultimately stored in binary, most decimal fractions cannot be represented exactly and lose a little precision. When several floating-point numbers are converted to binary, combined, and the result is converted back to decimal, the error can accumulate. On a computer, therefore, floating-point precision always has a limit.\n", "\n", "Strings are no different. A string consists of zero or more characters, and it too is ultimately converted to numeric values and then to binary. A string with zero characters is the empty string, written `''`; it has length `0` and is not the same thing as `None`, although both count as false in a condition." ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "## Converting between characters and code points" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "The function that converts a **single character** to its code value is `ord()`. It accepts exactly one character and raises an error otherwise, and it returns that character's Unicode code point. Its counterpart is `chr()`, which takes exactly one integer argument and returns the corresponding character." ] },
{ "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "97" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" }, { "data": { "text/plain": [ "'z'" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" }, { "data": { "text/plain": [ "27653" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" }, { "data": { "text/plain": [ "'挊'" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from IPython.core.interactiveshell import InteractiveShell\n", "InteractiveShell.ast_node_interactivity = \"all\"\n", "\n", "ord('a')\n", "chr(122)\n", "\n", "ord('氅')\n", "chr(25354)\n", "\n", "# ord('Python') # this line would raise a TypeError" ] },
{ "cell_type": "markdown", "metadata": { "button": false, "new_sheet": false, "run_control": { "read_only": false } }, "source": [ "## Writing string literals" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "There are four ways to write a string literal: with single quotes, with double quotes, with three single quotes, or with three double quotes:" ] },
{ "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'Simple is better than complex.'" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "'Simple is better than complex.' # single quotes" ] },
{ "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'Simple is better than complex.'" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "\"Simple is better than complex.\" # double quotes" ] },
{ "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'\\nSimple is better than complex.\\nComplex is better than complicated.\\n'" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Three single quotes. Note the \\n in the output.\n", "# The string looks like two lines, but in memory or in a variable\n", "# it is one sequence in which each line break is a \\n character.\n", "'''\n", "Simple is better than complex.\n", "Complex is better than complicated.\n", "''' " ] },
{ "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'\\nSimple is better than complex.\\nComplex is better than complicated.\\n'" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Three double quotes. Note the \\n in the output.\n", "\"\"\"\n", "Simple is better than complex.\n", "Complex is better than complicated.\n", "\"\"\" " ] },
{ "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "Simple is better than complex.\n", "Complex is better than complicated.\n", "\n" ] } ], "source": [ "print(\n", "\"\"\"\n", "Simple is better than complex.\n", "Complex is better than complicated.\n", "\"\"\"\n", ") # with print(), each \\n is an invisible character; the string itself is:\n", "# '\\nSimple is better than complex.\\nComplex is better than complicated.\\n'\n", "# and each \\n in it is shown as a line break when printed" ] },
{ "cell_type": "markdown", "metadata": { "toc-hr-collapsed": true }, "source": [ "## Converting between strings and numbers" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "A string made up of digits can be converted to a number: use `int()` for integers and `float()` for floating-point numbers.\n", "\n", "Conversely, `str()` converts a number to a string.\n", "\n", "Note that when `int()` receives a string, that string must represent an integer. The last line of the code below would raise an error:" ] },
{ "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "3" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" }, { "data": { "text/plain": [ "3.0" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" }, { "data": { "text/plain": [ "'3.1415926'" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from IPython.core.interactiveshell import InteractiveShell\n", "InteractiveShell.ast_node_interactivity = \"all\"\n", "\n", "int('3')\n", "float('3')\n", "str(3.1415926)\n", "# int('3.1415926') # this line would raise a ValueError" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "The built-in function `input()` reads what the user types on the keyboard and returns it as a string. It accepts an optional string argument, which is printed to the screen as a prompt before the input is read. If you call `input()` with no argument, the user is asked for input without any prompt.\n", "\n", "The following code raises an error. `age` is the string returned by `input()`, not a number, so `age < 18` would compare a string with an integer, which Python does not support." ] },
{ "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Please tell me your age: 19\n" ] }, { "ename": "TypeError", "evalue": "'<' not supported between instances of 'str' and 'int'", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0mage\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0minput\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'Please tell me your age: 
'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 2\u001b[0;31m \u001b[0;32mif\u001b[0m \u001b[0mage\u001b[0m \u001b[0;34m<\u001b[0m \u001b[0;36m18\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 3\u001b[0m \u001b[0mprint\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'I can not sell you drinks...'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 4\u001b[0m \u001b[0;32melse\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 5\u001b[0m \u001b[0mprint\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'Have a nice drink!'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;31mTypeError\u001b[0m: '<' not supported between instances of 'str' and 'int'" ] } ], "source": [ "age = input('Please tell me your age: ')\n", "if age < 18:\n", "    print('I can not sell you drinks...')\n", "else:\n", "    print('Have a nice drink!')" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "It has to be changed as in the next cell before it has a chance of working.\n", "\n", "Why only a chance? If the user types `eighteen` or `十八`, `int()` still fails and raises a `ValueError`. User input is outside our control and can trigger all sorts of errors. For now we keep things simple: the prompt includes a correct example, and we assume the user follows it." ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "age = int(input('''Please tell me your age:\n", "an integer, e.g. 22\n", "'''))\n", "if age < 18:\n", "    print('I can not sell you drinks...')\n", "else:\n", "    print('Have a nice drink!')" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "## The escape character" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "One important character is the backslash `\\`, known as the _escape character_. On its own it is not treated as an ordinary character: to put a backslash into a string you have to write `\\\\`:" ] },
{ "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'\\\\'" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "'\\\\'" ] },
{ "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "ename": "SyntaxError", "evalue": "EOL while scanning string literal (, line 1)", "output_type": "error", "traceback": [ "\u001b[0;36m File \u001b[0;32m\"\"\u001b[0;36m, line \u001b[0;32m1\u001b[0m\n\u001b[0;31m '\\'\u001b[0m\n\u001b[0m ^\u001b[0m\n\u001b[0;31mSyntaxError\u001b[0m\u001b[0;31m:\u001b[0m EOL while scanning string literal\n" ] } ], "source": [ "'\\'" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "Suppose you want to output the string `He said, it's fine.`. Enclosing it in double quotes (`\"`) causes no problem, but enclosing it in single quotes does, because Python treats the single quote `'` after `it` as the end of the string." ] },
{ "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "ename": "SyntaxError", "evalue": "invalid syntax (, line 1)", "output_type": "error", "traceback": [ "\u001b[0;36m File \u001b[0;32m\"\"\u001b[0;36m, line \u001b[0;32m1\u001b[0m\n\u001b[0;31m 'He said, it's fine.'\u001b[0m\n\u001b[0m ^\u001b[0m\n\u001b[0;31mSyntaxError\u001b[0m\u001b[0;31m:\u001b[0m invalid syntax\n" ] } ], "source": [ "'He said, it's fine.'" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "So you need the escape character `\\`:" ] },
{ "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "\"He said, it's fine.\"" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" }, { "data": { "text/plain": [ "\"He said, it's fine.\"" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" }, { "data": { "text/plain": [ "\"He said, it's fine.\"" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from IPython.core.interactiveshell import InteractiveShell\n", "InteractiveShell.ast_node_interactivity = \"all\"\n", "\n", "# Either write it like this:\n", "'He said, it\\'s fine.'\n", "# or like this:\n", "\"He said, it's fine.\"\n", "# or, whichever quotes delimit the string, always write the quotes inside it as \\' and \\\"\n", "\"He said, it\\'s fine.\"" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "The escape character `\\` has two other common uses, combined with `t` and `n`: `\\t` stands for a tab (what the TAB `⇥` key produces) and `\\n` stands for a newline (what the Enter `⏎` key produces).\n", "\n", "So a string has two forms, **raw** and **presentation**; in the latter, `\\t` is rendered as a tab and `\\n` as a line break. (Here *raw* means the text as typed in the source code, not Python's raw string literals such as `r'...'`.)\n", "\n", "When we write a program, what we type in the code is the _raw_ form; when we call `print()`, for example, what appears on the screen is the _presentation_ form:" ] },
{ "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "He said, it's fine.\n" ] } ], "source": [ "s = \"He said, it\\'s fine.\" # raw\n", "print(s) # presentation" ] },
{ "cell_type": "markdown", "metadata": { "button": false, "new_sheet": false, "run_control": { "read_only": false } }, "source": [ "## String operators" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "Strings can be concatenated with `+`. Adjacent string literals separated only by whitespace are also joined automatically:" ] },
{ "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'Hey! You!'" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "'Hey!' + ' ' + 'You!' # using the + operator" ] },
{ "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'Hey!You!'" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "'Hey!' 'You!' 
# adjacent literals are joined as if by + (this works for literals only, not for variables)" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "A string can also be combined with an integer through the `*` operator: `'Ha' * 3` means the string `'Ha'` repeated three times:" ] },
{ "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'HaHaHa'" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "'Ha' * 3" ] },
{ "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'3.143.143.14'" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "'3.14' * 3" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "Strings also support the `in` and `not in` operators, which test whether a character or a substring is contained in a string and return a boolean:" ] },
{ "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "'o' in 'Hey, You!'" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "## String indexing" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "A string is made up of a sequence of characters. Python has the concept of a container, which was mentioned earlier and is covered in depth later. For now it is enough to know that a string is one kind of container, and that containers are either ordered or unordered; a string is an **ordered container**.\n", "\n", "Each character in a string has an index, starting from `0`. Interestingly, an index can also be negative:\n", "\n", "\n", "| 0 | 1 | 2 | 3 | 4 | 5 |\n", "| ---- | ---- | ---- | ---- | ---- | ---- |\n", "| P | y | t | h | o | n |\n", "| -6 | -5 | -4 | -3 | -2 | -1 |" ] },
{ "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0 P\n", "1 y\n", "2 t\n", "3 h\n", "4 o\n", "5 n\n" ] } ], "source": [ "s = 'Python'\n", "for i, char in enumerate(s):\n", "    print(i, char)" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "Because the elements of an ordered container are indexed (a string is an ordered container of characters), we can retrieve a value from the container by its index. You can think of `[]` as one of the operators of ordered containers; let us call it the *index operator*. Note the `[]` after `s` on line 3 of the code below, and the variable `i` inside it:" ] },
{ "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "P\n", "y\n", "t\n", "h\n", "o\n", "n\n", "P\n", "y\n", "t\n", "h\n", "o\n", "n\n" ] } ], "source": [ "s = 'Python'\n", "for i in range(len(s)):\n", "    print(s[i])\n", "\n", "# The loop above only demonstrates the index operator; a more concise way is:\n", "for char in s:\n", "    print(char)" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "With the *index operator* we can use *indexes* to **extract** one or more elements, that is, a character or a substring, from the *ordered container* that a string is. This extraction has a dedicated name: slicing. The index operator `[]` can take one, two, or three integer arguments; when there is more than one, they are separated by `:`. It can be written in the following 5 forms:\n", "\n", "> * `s[index]`: returns the character at index `index`\n", "> * `s[start:]`: returns all characters from index `start` to the end of the string\n", "> * `s[start:stop]`: returns all characters from index `start` up to, but *not including*, index `stop`\n", "> * `s[:stop]`: returns all characters from the beginning of the string up to, but *not including*, index `stop`\n", "> * `s[start:stop:step]`: returns the characters from index `start` up to, but *not including*, index `stop`, taking every `step`-th character" ] },
{ "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'y'" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" }, { "data": { "text/plain": [ "'thon'" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" }, { "data": { "text/plain": [ "'tho'" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" }, { "data": { "text/plain": [ "'Pytho'" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" }, { "data": { "text/plain": [ "'yh'" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from IPython.core.interactiveshell import InteractiveShell\n", "InteractiveShell.ast_node_interactivity = \"all\"\n", "\n", "s = 'Python'\n", "s[1]\n", "s[2:]\n", "s[2:5]\n", "s[:5]\n", "s[1:5:2]" ] },
{ "cell_type": "markdown", "metadata": { "button": false, "new_sheet": false, "run_control": { "read_only": false } }, "source": [ "## Built-in functions for strings" ] },
{ "cell_type": "markdown", "metadata": { "button": false, "new_sheet": false, "run_control": { "read_only": false } }, "source": [ "[Python 
built-in functions](https://docs.python.org/3/library/functions.html#slice) that take strings as their subject include `ord()`, `input()`, `int()`, `float()`, `len()`, and `print()`. Note once more that `ord()` accepts only a single character." ] },
{ "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "10" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" }, { "data": { "text/plain": [ "9" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" }, { "data": { "text/plain": [ "13" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" }, { "data": { "text/plain": [ "'A'" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" }, { "name": "stdout", "output_type": "stream", "text": [ "Please copy this number exactly, 3.14: 3.14\n" ] }, { "data": { "text/plain": [ "3" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" }, { "data": { "text/plain": [ "28.26" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" }, { "data": { "text/plain": [ "4" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" }, { "name": "stdout", "output_type": "stream", "text": [ "3.143.143.14\n" ] } ], "source": [ "from IPython.core.interactiveshell import InteractiveShell\n", "InteractiveShell.ast_node_interactivity = \"all\"\n", "\n", "ord('\\n')\n", "ord('\\t')\n", "ord('\\r')\n", "chr(65) # the counterpart of ord()\n", "s = input('Please copy this number exactly, 3.14: ')\n", "int('3')\n", "# int(s) would raise a ValueError here, so it is commented out for now\n", "float(s) * 9\n", "len(s)\n", "print(s*3)" ] },
{ "cell_type": "markdown", "metadata": { "button": false, "new_sheet": false, "run_control": { "read_only": false }, "toc-hr-collapsed": true }, "source": [ "## String methods" ] },
{ "cell_type": "markdown", "metadata": { "button": false, "new_sheet": false, "run_control": { "read_only": false } }, "source": [ "In Python a string is an **object**; more precisely, it is an instance of the `str` class.\n", "\n", "Methods of the `str` class are called with the `.` notation, for example:\n", "\n", "```python\n", "'Python'.upper()\n", "```" ] 
}, { "cell_type": "markdown", "metadata": { "button": false, "new_sheet": false, "run_control": { "read_only": false }, "toc-hr-collapsed": false }, "source": [ "### Changing case" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "`str.upper()` and `str.lower()` convert a string to upper or lower case. There are also `str.capitalize()`, which capitalizes the first character of the string and lowercases the rest, and `str.title()`, which capitalizes the first letter of every word:" ] },
{ "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'NOW IS BETTER THAN NEVER.'" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" }, { "data": { "text/plain": [ "'now is better than never.'" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from IPython.core.interactiveshell import InteractiveShell\n", "InteractiveShell.ast_node_interactivity = \"all\"\n", "\n", "'Now is better than never.'.upper()\n", "\n", "'Now is better than never.'.lower()" ] },
{ "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'Now is better than never.'" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" }, { "data": { "text/plain": [ "'Now Is Better Than Never.'" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from IPython.core.interactiveshell import InteractiveShell\n", "InteractiveShell.ast_node_interactivity = \"all\"\n", "\n", "s = 'Now is better than never.'\n", "s.capitalize() # capitalize the first letter of the sentence\n", "s.title() # capitalize the first letter of every word" ] },
{ "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'nOW IS BETTER THAN NEVER.'" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" }, { "data": { "text/plain": [ "'Now Is Better Than Never.'" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" }, { "data": { "text/plain": [ "'nOW iS bETTER tHAN nEVER.'" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": 
[ "s = 'Now is better than never.'\n", "s.swapcase() # swap the case of every character\n", "s.title()\n", "s.title().swapcase()" ] },
{ "cell_type": "markdown", "metadata": { "button": false, "new_sheet": false, "run_control": { "read_only": false } }, "source": [ "There is also `str.encode()`, which is often needed when handling non-English strings such as Chinese:" ] },
{ "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "b'\\xe7\\xae\\x80\\xe5\\x8d\\x95\\xe4\\xbc\\x98\\xe4\\xba\\x8e\\xe5\\xa4\\x8d\\xe6\\x9d\\x82\\xe3\\x80\\x82'" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# str.encode(encoding=\"utf-8\", errors=\"strict\")\n", "# For the list of available encodings, see:\n", "# https://docs.python.org/3/library/codecs.html#standard-encodings\n", "s = '简单优于复杂。'\n", "s.encode()" ] },
{ "cell_type": "markdown", "metadata": { "button": false, "new_sheet": false, "run_control": { "read_only": false } }, "source": [ "### Searching and replacing" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "Let us start with `str.count()`, a method (that is, a function defined in the class `str`) that counts how many times a substring occurs.\n", "\n", "The official documentation gives its signature as:\n", "\n", "> `str.count(sub[,start[,end]])`\n", "\n", "The version below adds the default values so that it is easier to read the first time:\n", "\n", "> `str.count(sub [,start=0[, end=len(str)]])`\n", "\n", "The square brackets `[]` mean that an argument is optional. One pair of brackets is nested inside another, which means that only when the optional argument `start` is present can a further optional argument, `end`, follow it.\n", "\n", "The `=` indicates that the argument has a default value.\n", "\n", "> * With only `sub` given, the search runs from the first character to the end of the string;\n", "> * if one more argument follows, it is `start`, and the search runs from `start` to the end of the string;\n", "> * if another argument follows `start`, it is `end`, and the search runs from `start` to `end - 1` (the character at index `end` is not included).\n", "> \n", "> The return value is the number of non-overlapping occurrences of `sub` in the string.\n", "\n", "Note: the index of the first character of a string is `0`." ] },
{ "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "4" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" }, { "data": { "text/plain": [ "' '" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" }, { "data": { "text/plain": [ "3" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" }, { "data": { "text/plain": [ "1" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from IPython.core.interactiveshell import InteractiveShell\n", "InteractiveShell.ast_node_interactivity = \"all\"\n", "\n", "s = \"\"\"Simple is better than complex.\n", "Complex is better than complicated.\"\"\"\n", "s.lower().count('mp')\n", "s[6]\n", "s.lower().count('mp', 10)\n", "s.lower().count('mp', 10, 30)" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "The cell below shows examples of the `str` search methods `str.find()`, `str.rfind()`, and `str.index()`:" ] },
{ "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Example of str.find():\n" ] }, { "data": { "text/plain": [ "2" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" }, { "data": { "text/plain": [ "24" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" }, { "data": { "text/plain": [ "-1" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Example of str.rfind():\n" ] }, { "data": { "text/plain": [ "56" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" }, { "data": { "text/plain": [ "56" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" }, { "data": { "text/plain": [ "-1" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Example of str.index():\n" ] }, { "data": { "text/plain": [ "2" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" }, { "data": { "text/plain": [ "56" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" }, { "name": "stdout", "output_type": "stream", "text": [ "\n" ] } ], "source": [ "from 
IPython.core.interactiveshell import InteractiveShell\n", "InteractiveShell.ast_node_interactivity = \"all\"\n", "\n", "# str.find(sub[, start[, end]])\n", "print('Example of str.find():')\n", "s = \"\"\"Simple is better than complex.\n", "Complex is better than complicated.\"\"\"\n", "s.lower().find('mpl')\n", "s.lower().find('mpl', 10)\n", "s.lower().find('mpl', 10, 20) # returns -1 when nothing is found\n", "print()\n", "\n", "print('Example of str.rfind():')\n", "# str.rfind(sub[, start[, end]])\n", "# rfind() returns the position of the last occurrence of sub; find() returns the first\n", "s.lower().rfind('mpl')\n", "s.lower().rfind('mpl', 10)\n", "s.lower().rfind('mpl', 10, 20) # returns -1 when nothing is found\n", "print()\n", "\n", "print('Example of str.index():')\n", "# str.index(sub[, start[, end]])\n", "# Works like find(), but raises ValueError when nothing is found\n", "# https://docs.python.org/3/library/exceptions.html#ValueError\n", "s.lower().index('mpl')\n", "# str.rindex(sub[, start[, end]])\n", "# Works like rfind(), but raises ValueError when nothing is found\n", "s.lower().rindex('mpl')\n", "print()" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "`str.startswith()` and `str.endswith()` test whether a *string* begins or ends with a given *substring*:" ] },
{ "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "s.lower().startswith('S'): False\n", "s.lower().startswith('b', 10): True\n", "s.lower().startswith('e', 11, 20): True\n", "s.lower().endswith('.'): True\n", "s.lower().endswith('.', 10): True\n", "s.lower().endswith('.', 10, 20): False\n", "s.lower().startswith('S'): False\n" ] } ], "source": [ "s = \"\"\"Simple is better than complex.\n", "Complex is better than complicated.\"\"\"\n", "\n", "# str.startswith(prefix[, start[, end]])\n", "print(\"s.lower().startswith('S'):\", \\\n", " s.lower().startswith('S'))\n", "print(\"s.lower().startswith('b', 10):\", \\\n", " s.lower().startswith('b', 10))\n", "print(\"s.lower().startswith('e', 11, 20):\", \\\n", " s.lower().startswith('e', 11, 20))\n", "\n", "# 
str.endswith(suffix[, start[, end]])\n", "print(\"s.lower().endswith('.'):\", \\\n", " s.lower().endswith('.'))\n", "print(\"s.lower().endswith('.', 10):\", \\\n", " s.lower().endswith('.', 10))\n", "print(\"s.lower().endswith('.', 10, 20):\", \\\n", " s.lower().endswith('.', 10, 20))\n", "\n", "\n", "print(\"s.lower().startswith('S'):\", \n", " s.lower().startswith('S'))" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "Before searching for a position, you often need to confirm that the string you are looking for is present at all. For that you can use the `in` operator:" ] },
{ "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "True\n" ] } ], "source": [ "s = \"\"\"Simple is better than complex.\n", "Complex is better than complicated.\"\"\"\n", "# If you only need to know whether it is there, not where, you can use:\n", "print('mpl' in s)" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "If you can search, you should be able to replace: that is `str.replace()`, whose signature is:\n", "\n", "> `str.replace(old, new[, count])`\n", "\n", "It replaces occurrences of `old` with `new`. The optional `count` argument limits the replacement to the first `count` occurrences; without it, every occurrence is replaced." ] },
{ "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "s.lower().replace('mp', '[ ]', 2):\n", "\n", "si[ ]le is better than co[ ]lex.\n", "complex is better than complicated.\n" ] } ], "source": [ "s = \"\"\"Simple is better than complex.\n", "Complex is better than complicated.\"\"\"\n", "\n", "# str.replace(old, new[, count])\n", "print(\"s.lower().replace('mp', '[ ]', 2):\\n\")\n", "print(s.lower().replace('mp', '[ ]', 2))" ] },
{ "cell_type": "markdown", "metadata": { "button": false, "new_sheet": false, "run_control": { "read_only": false } }, "source": [ "There is also a method dedicated to replacing TABs (`\\t`):\n", "\n", "> `str.expandtabs(tabsize=8)`\n", "\n", "It replaces every TAB (`\\t`) in the string with spaces, padding the text out to the next tab stop. By default tab stops are `8` columns apart, and you can pass a different `tabsize`." ] },
{ "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "\"Special cases aren't 
special enough to break the rules.\"" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" }, { "data": { "text/plain": [ "\"Special cases aren't special enough to break the rules.\"" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from IPython.core.interactiveshell import InteractiveShell\n", "InteractiveShell.ast_node_interactivity = \"all\"\n", "\n", "# str.expandtabs(tabsize=8)\n", "s = \"Special\\tcases\\taren't\\tspecial\\tenough\\tto\\tbreak\\tthe\\trules.\"\n", "s.expandtabs()\n", "s.expandtabs(2)" ] },
{ "cell_type": "markdown", "metadata": { "button": false, "new_sheet": false, "run_control": { "read_only": false } }, "source": [ "### Stripping characters" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "> `str.strip([chars])`\n", "\n", "Its most common use is to remove all whitespace, including spaces, TABs, and newlines, from both ends of a string." ] },
{ "cell_type": "code", "execution_count": 9, "metadata": { "button": false, "collapsed": true, "new_sheet": false, "run_control": { "read_only": false } }, "outputs": [ { "data": { "text/plain": [ "'\\t Simple is better than complex. \\t \\n'" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" }, { "name": "stdout", "output_type": "stream", "text": [ "\t Simple is better than complex. \t \n", "\n" ] }, { "data": { "text/plain": [ "'Simple is better than complex.'" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from IPython.core.interactiveshell import InteractiveShell\n", "InteractiveShell.ast_node_interactivity = \"all\"\n", "\n", "s = \"\\t Simple is better than complex. 
\\t \\n\"\n", "s\n", "print(s)\n", "s.strip()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "但是,如果给定了一个字符串作为参数,那么参数字符串中的所有字母都会被当做需要从首尾剔除的对象,直到新的首尾字母不包含在参数中,就会停止剔除:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'Simple is better than complex.'" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" }, { "data": { "text/plain": [ "'mple is better than comple'" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" }, { "data": { "text/plain": [ "' is better than co'" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from IPython.core.interactiveshell import InteractiveShell\n", "InteractiveShell.ast_node_interactivity = \"all\"\n", "\n", "s = \"Simple is better than complex.\"\n", "s\n", "s.strip('Six.p') # p 全部处理完之后,p 并不在首尾,所以原字符串中的 p 字母不受影响;\n", "s.strip('pSix.mle') # 这一次,首尾的 p 被处理了…… 参数中的字符顺序对结果没有影响,换成 Sipx.mle 也一样……" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "还可以只对左侧处理,`str.lstrip()` 或者只对右侧处理,`str.rstrip()`" ] }, { "cell_type": "code", "execution_count": 37, "metadata": { "button": false, "collapsed": true, "new_sheet": false, "run_control": { "read_only": false } }, "outputs": [ { "data": { "text/plain": [ "'Simple is better than complex.'" ] }, "execution_count": 37, "metadata": {}, "output_type": "execute_result" }, { "data": { "text/plain": [ "'mple is better than complex.'" ] }, "execution_count": 37, "metadata": {}, "output_type": "execute_result" }, { "data": { "text/plain": [ "' is better than complex.'" ] }, "execution_count": 37, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from IPython.core.interactiveshell import InteractiveShell\n", "InteractiveShell.ast_node_interactivity = \"all\"\n", "\n", "# str.lstrip([chars])\n", "s = \"Simple is better than complex.\"\n", "s\n", "s.lstrip('Six.p') # p 全部处理完之后,p 并不在首部,所以原字符串中的 p 字母不受影响;\n", "s.lstrip('pSix.mle') # 
this time the p at the left end is stripped too; the order of the characters in the argument does not matter, 'Sipx.mle' gives the same result" ] }, { "cell_type": "code", "execution_count": 38, "metadata": { "button": false, "collapsed": true, "new_sheet": false, "run_control": { "read_only": false } }, "outputs": [ { "data": { "text/plain": [ "'Simple is better than complex.'" ] }, "execution_count": 38, "metadata": {}, "output_type": "execute_result" }, { "data": { "text/plain": [ "'Simple is better than comple'" ] }, "execution_count": 38, "metadata": {}, "output_type": "execute_result" }, { "data": { "text/plain": [ "'Simple is better than co'" ] }, "execution_count": 38, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from IPython.core.interactiveshell import InteractiveShell\n", "InteractiveShell.ast_node_interactivity = \"all\"\n", "\n", "# str.rstrip([chars])\n", "s = \"Simple is better than complex.\"\n", "s\n", "s.rstrip('Six.p') # stripping stops at 'e', so no p ever reaches the right end and the p's in the string are untouched\n", "s.rstrip('pSix.mle') # this time the p at the right end is stripped too; the order of the characters in the argument does not matter, 'Sipx.mle' gives the same result" ] }, { "cell_type": "markdown", "metadata": { "button": false, "new_sheet": false, "run_control": { "read_only": false } }, "source": [ "### Splitting Strings" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "On a computer, data is usually stored in files. Computers are good at processing formatted data, that is, data laid out in a fixed format; spreadsheets and databases are two such ways of storing it. Both Microsoft Excel and Apple Numbers can export a table as a `.csv` file. This is a plain-text file in which each line may hold several values, separated by `,` (or `;` or `\\t`):\n", "\n", "```text\n", "Name,Age,Location\n", "John,18,New York\n", "Mike,22,San Francisco\n", "Janny,25,Miami\n", "Sunny,21,Shanghai\n", "```\n", "\n", "Once content like this is read from a text file and stored in a variable, the value of that variable looks like this:\n", "\n", "> `'Name,Age,Location\\nJohn,18,New York\\nMike,22,San Francisco\\nJanny,25,Miami\\nSunny,21,Shanghai'`" ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'Name,Age,Location\\nJohn,18,New York\\nMike,22,San Francisco\\nJanny,25,Miami\\nSunny,21,Shanghai'" ] }, "execution_count": 39, "metadata": {}, "output_type": "execute_result" }, { 
"data": { "text/plain": [ "['Name,Age,Location',\n", " 'John,18,New York',\n", " 'Mike,22,San Francisco',\n", " 'Janny,25,Miami',\n", " 'Sunny,21,Shanghai']" ] }, "execution_count": 39, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from IPython.core.interactiveshell import InteractiveShell\n", "InteractiveShell.ast_node_interactivity = \"all\"\n", "\n", "s = \"\"\"Name,Age,Location\n", "John,18,New York\n", "Mike,22,San Francisco\n", "Janny,25,Miami\n", "Sunny,21,Shanghai\"\"\"\n", "\n", "s # s 被打印出来的时候,\\n 都被转换成换行了\n", "s.splitlines() # 注意输出结果前后的方括号,[],表示这个返回结果是一个 List" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`str.split()`, 是将一个字符串,根据分隔符进行拆分:\n", "\n", "> `str.split(sep=None, maxsplit=-1)`" ] }, { "cell_type": "code", "execution_count": 40, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'Mike,22,San Francisco'" ] }, "execution_count": 40, "metadata": {}, "output_type": "execute_result" }, { "data": { "text/plain": [ "['Mike,22,San', 'Francisco']" ] }, "execution_count": 40, "metadata": {}, "output_type": "execute_result" }, { "data": { "text/plain": [ "['Mike', '22', 'San Francisco']" ] }, "execution_count": 40, "metadata": {}, "output_type": "execute_result" }, { "data": { "text/plain": [ "['Mike', '22', 'San Francisco']" ] }, "execution_count": 40, "metadata": {}, "output_type": "execute_result" }, { "data": { "text/plain": [ "['Mike', '22,San Francisco']" ] }, "execution_count": 40, "metadata": {}, "output_type": "execute_result" }, { "data": { "text/plain": [ "['Mike,22,San Francisco']" ] }, "execution_count": 40, "metadata": {}, "output_type": "execute_result" }, { "data": { "text/plain": [ "['Mike', '22', 'San Francisco']" ] }, "execution_count": 40, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from IPython.core.interactiveshell import InteractiveShell\n", "InteractiveShell.ast_node_interactivity = \"all\"\n", "\n", "s = \"\"\"Name,Age,Location\n", "John,18,New York\n", "Mike,22,San 
Francisco\n", "Janny,25,Miami\n", "Sunny,21,Shanghai\"\"\"\n", "\n", "r = s.splitlines()[2] # take the line at index 2 of the returned list\n", "r\n", "r.split() # with no arguments, sep defaults to None and the string is split on runs of whitespace (spaces, \\t, \\r, \\n and so on)\n", "r.split(sep=',') \n", "r.split(',') # the line above can also be written like this\n", "\n", "r.split(sep=',', maxsplit=1) # the second argument sets the maximum number of splits\n", "# r.split(sep=',', 1) # the line above cannot be written like this: a positional argument may not follow a keyword argument\n", "r.split(sep=',', maxsplit=0) # 0 splits, so the string is not split at all\n", "r.split(sep=',', maxsplit=-1) # the default is -1, which splits at every separator" ] }, { "cell_type": "markdown", "metadata": { "button": false, "new_sheet": false, "run_control": { "read_only": false } }, "source": [ "### Joining Strings" ] }, { "cell_type": "code", "execution_count": 41, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'Python'" ] }, "execution_count": 41, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s = ''\n", "t = ['P', 'y', 't', 'h', 'o', 'n']\n", "s.join(t)" ] }, { "cell_type": "markdown", "metadata": { "button": false, "new_sheet": false, "run_control": { "read_only": false } }, "source": [ "### Aligning Strings" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To center a string, you first have to fix the total width of the line:\n", "\n", "> `str.center(width[, fillchar])`\n", "\n", "Note that the second argument is optional and accepts only a single character; `char` is short for _character_." ] }, { "cell_type": "code", "execution_count": 42, "metadata": { "button": false, "new_sheet": false, "run_control": { "read_only": false }, "scrolled": false }, "outputs": [ { "data": { "text/plain": [ "' Sparse Is Better Than Dense! 
'" ] }, "execution_count": 42, "metadata": {}, "output_type": "execute_result" }, { "data": { "text/plain": [ "'================Sparse Is Better Than Dense!================'" ] }, "execution_count": 42, "metadata": {}, "output_type": "execute_result" }, { "data": { "text/plain": [ "'Sparse Is Better Than Dense!'" ] }, "execution_count": 42, "metadata": {}, "output_type": "execute_result" }, { "data": { "text/plain": [ "' Sparse Is Better Than Dense!'" ] }, "execution_count": 42, "metadata": {}, "output_type": "execute_result" }, { "data": { "text/plain": [ "'................................Sparse Is Better Than Dense!'" ] }, "execution_count": 42, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from IPython.core.interactiveshell import InteractiveShell\n", "InteractiveShell.ast_node_interactivity = \"all\"\n", "\n", "s = 'Sparse is better than dense!'\n", "s.title().center(60)\n", "s.title().center(60, '=')\n", "s.title().center(10) # 如果宽度参数小于字符串长度,则返回原字符串\n", "\n", "\n", "s = 'Sparse is better than dense!'\n", "s.title().rjust(60)\n", "s.title().rjust(60, '.')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "将字符串靠左或者靠右对齐放置:\n", "\n", "> * `str.ljust(width)`\n", "> * `str.rjust(width)`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "另外,还有个字符串 Method 是,将字符串转换成左侧由 `0` 填充的指定长度字符串。例如,这在批量生成文件名的时候就很有用……" ] }, { "cell_type": "code", "execution_count": 43, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "001.mp3\n", "002.mp3\n", "003.mp3\n", "004.mp3\n", "005.mp3\n", "006.mp3\n", "007.mp3\n", "008.mp3\n", "009.mp3\n", "010.mp3\n" ] } ], "source": [ "for i in range(1, 11):\n", " filename = str(i).zfill(3) + '.mp3'\n", " print(filename)" ] }, { "cell_type": "markdown", "metadata": { "button": false, "new_sheet": false, "run_control": { "read_only": false }, "toc-hr-collapsed": true }, "source": [ "### 格式化字符串" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ 
"所谓对字符串进行格式化,指的是将特定变量插入字符串特定位置的过程。常用的 Methods 有两个,一个是 `str.format()`,另外一个是 `f-string`。" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 使用 str.format() " ] }, { "cell_type": "code", "execution_count": 44, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'John is 25 years old.'" ] }, "execution_count": 44, "metadata": {}, "output_type": "execute_result" }, { "data": { "text/plain": [ "'Are you John? :-{+}'" ] }, "execution_count": 44, "metadata": {}, "output_type": "execute_result" }, { "data": { "text/plain": [ "'John is a grown up? True'" ] }, "execution_count": 44, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from IPython.core.interactiveshell import InteractiveShell\n", "InteractiveShell.ast_node_interactivity = \"all\"\n", "\n", "name = 'John'\n", "age = 25\n", "'{} is {} years old.'.format(name, age)\n", "# 不写占位符索引就默认每个占位符的索引从第一个开始是 0, 1, 2 ...(占位符数量 - 1)\n", "# '{} {}'.format(a, b) 和 '{0} {1}'.format(a, b) 是一样的。\n", "\n", "# '{0} is {2} years old.'.format(name, age)\n", "# 这一句会报错,因为 2 超出实际参数索引极限\n", "\n", "# 两个连续使用的大括号,不被认为是占位符;且只打印出一对大括号\n", "\"Are you {0}? :-{{+}}\".format(name)\n", "\n", "# \"%s is %d years old.\" % (name, age)\n", "# 上一行这是兼容 Python 2 的老式写法,可以从此忽略……\n", "\n", "# str.format() 里可以直接写表达式……\n", "'{} is a grown up? {}'.format(name, age >= 18)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 使用 f-string" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "_f-string_ 与 `str.format()` 的功用差不多,只是写法简洁一些 —— 在字符串标示之前加上一个字母 `f`:" ] }, { "cell_type": "code", "execution_count": 45, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'John is 25 years old.'" ] }, "execution_count": 45, "metadata": {}, "output_type": "execute_result" }, { "data": { "text/plain": [ "'John is a grown up? 
True'" ] }, "execution_count": 45, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from IPython.core.interactiveshell import InteractiveShell\n", "InteractiveShell.ast_node_interactivity = \"all\"\n", "\n", "# https://docs.python.org/3/library/stdtypes.html#printf-style-bytes-formatting\n", "# f-string\n", "\n", "name = 'John'\n", "age = 25\n", "f'{name} is {age} years old.'\n", "f'{name} is a grown up? {age >= 18}'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "只不过,str.format() 的用法中,索引顺序可以任意指定,于是相对更为灵活,下面的例子只是为了演示参数位置可以任意指定:" ] }, { "cell_type": "code", "execution_count": 46, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'25 is John years old.'" ] }, "execution_count": 46, "metadata": {}, "output_type": "execute_result" } ], "source": [ "name = 'John'\n", "age = 25\n", "'{1} is {0} years old.'.format(name, age)" ] }, { "cell_type": "markdown", "metadata": { "button": false, "new_sheet": false, "run_control": { "read_only": false }, "toc-hr-collapsed": false }, "source": [ "### 字符串属性" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "字符串还有一系列的 Methods,返回的是布尔值,用来判断字符串的构成属性:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "'1234567890'.isalnum(): True\n", "'abcdefghij'.isalpha(): True\n", "'山巅一寺一壶酒'.isascii(): False\n", "'0.123456789'.isdecimal(): False\n", "'0.123456789'.isdigit(): False\n", "'0.123456789'.isnumeric(): False\n", "'Continue'.islower(): False\n", "'Simple Is Better Than Complex'.isupper(): False\n", "'Simple Is Better Than Complex'.istitle(): True\n", "'\t'.isprintable(): False\n", "'\t'.isspace(): True\n", "'for'.isidentifier(): True\n" ] } ], "source": [ "# str.isalnum()\n", "print(\"'1234567890'.isalnum():\", \\\n", " '1234567890'.isalnum()) # '3.14'.isalnum() 返回的是 False\n", "\n", "# str.isalpha()\n", "print(\"'abcdefghij'.isalpha():\", \\\n", " 'abcdefghij'.isalpha()) \n", "\n", "# str.isascii()\n", 
"print(\"'山巅一寺一壶酒'.isascii():\", \\\n", " '山巅一寺一壶酒'.isascii())\n", "\n", "# str.isdecimal()\n", "print(\"'0.123456789'.isdecimal():\", \\\n", " '0.1234567890'.isdecimal())\n", "\n", "# str.isdigit()\n", "print(\"'0.123456789'.isdigit():\", \\\n", " '0.1234567890'.isdigit()) # 注意,如果字符串是 identifier,返回值也是 False\n", "\n", "# str.isnumeric()\n", "print(\"'0.123456789'.isnumeric():\", \\\n", " '0.1234567890'.isnumeric())\n", "\n", "# str.islower()\n", "print(\"'Continue'.islower():\", \\\n", " 'Continue'.islower())\n", "\n", "# str.isupper()\n", "print(\"'Simple Is Better Than Complex'.isupper():\", \\\n", " 'Simple Is Better Than Complex'.isupper())\n", "\n", "# str.istitle()\n", "print(\"'Simple Is Better Than Complex'.istitle():\", \\\n", " 'Simple Is Better Than Complex'.istitle())\n", "\n", "# str.isprintable()\n", "print(\"'\\t'.isprintable():\", \\\n", " '\\t'.isprintable())\n", "\n", "# str.isspace()\n", "print(\"'\\t'.isspace():\", \\\n", " '\\t'.isspace())\n", "\n", "# str.isidentifier()\n", "print(\"'for'.isidentifier():\", \\\n", " 'for'.isidentifier())" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.4" }, "toc-autonumbering": true }, "nbformat": 4, "nbformat_minor": 2 }