1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 892 893 894 895 896 897 898 899 900 901 902 903 904 905 906 907 908 909 910 911 912 913 914 915 916 917 918 919 920 921 922 923 924 925 926 927 928 929 930 931 932 933 934 935 936 937 938 939 940 941 942 943 944 945 946 947 948 949 950 951 952 953 954 955 956 957 958 959 960 961 962 963 964 965 966 967 968 969 970 971 972 973 974 975 976 977 978 979 980 981 982 983 984 985 986 987 988 989 990 991 992 993 994 995 996 997 998 999 1000 1001 1002 1003 1004 1005 1006 1007 1008 1009 1010 1011 1012 1013 1014 1015 1016 1017 1018 1019 1020 1021 1022 1023 1024 1025 1026 1027 1028 1029 1030 1031 1032 1033 1034 1035 1036 1037 1038 1039 1040 1041 1042 1043 1044 1045 1046 1047 1048 1049 1050 1051 1052 1053 1054 1055 1056 1057 1058 1059 1060 1061 1062 1063 1064 1065 1066 1067 1068 1069 1070 1071 1072 1073 1074 1075 1076 1077 1078 1079 1080 1081 1082 1083 1084 1085 1086 1087 1088 1089 1090 1091 1092 1093 1094 1095 1096 1097 1098 1099 1100 1101 1102 1103 1104 1105 1106 1107 1108 1109 1110 1111 1112 1113 1114 1115 1116 1117 1118 1119 1120 1121 1122 1123 1124 1125 1126 1127 1128 1129 1130 1131 1132 1133 1134 1135 1136 1137 1138 1139 1140 1141 1142 1143 1144 1145 1146 1147 1148 1149 1150 1151 1152 1153 1154 1155 1156 1157 1158 1159 1160 1161 1162 1163 1164 1165 1166 1167 1168 1169 1170 1171 1172 1173 1174 1175 1176 1177 1178 1179 1180 1181 1182 1183 1184 1185 1186 1187 1188 1189 1190 1191 1192 1193 1194 1195 1196 1197 1198 1199 1200 1201 1202 1203 1204 1205 1206 1207 1208 1209 1210 1211 1212 1213 1214 1215 1216 1217 1218 1219 1220 1221 1222 1223 1224 1225 1226 1227 1228 1229 1230 1231 1232 1233 1234 1235 1236 1237 1238 1239 1240 1241 1242 1243 1244 1245 1246 1247 1248 1249 1250 1251 1252 1253 1254 1255 1256 1257 1258 1259 1260 1261 1262 1263 1264 1265 1266 1267 1268 1269 1270 1271 1272 1273 1274 1275 1276 1277 1278 1279 1280 1281 1282 1283 1284 1285 1286 1287 1288 1289 1290 1291 1292 1293 1294 1295 1296 1297 1298 1299 1300 1301 1302 1303 1304 1305 1306 1307 1308 1309 1310 1311 1312 1313 1314 1315 1316 1317 1318 1319 1320 1321 1322 1323 1324 1325 1326 1327 1328 1329 1330 1331 1332 1333 1334 1335 1336 1337 1338 1339 1340 1341 1342 1343 1344 1345 1346 1347 1348 1349 1350 1351 1352 1353 1354 1355 1356 1357 1358 1359 1360 1361 1362 1363 1364 1365 1366 1367 1368 1369 1370 1371 1372 1373 1374 1375 1376 1377 1378 1379 1380 1381 1382 1383 1384 1385 1386 1387 1388 1389 1390 1391 1392 1393 1394 1395 1396 1397 1398 1399 1400 1401 1402 1403 1404 1405 1406 1407 1408 1409 1410 1411 1412 1413 1414 1415 1416 1417 1418 1419 1420 1421 1422 1423 1424 1425 1426 1427 1428 1429 1430 1431 1432 1433 1434 1435 1436 1437 1438 1439 1440 1441 1442 1443 1444 1445 1446 1447 1448 1449 1450 1451 1452 1453 1454 1455 1456 1457 1458 1459 1460 1461 1462 1463 1464 1465 1466 1467 1468 1469 1470 1471 1472 1473 1474 1475 1476 1477 1478 1479 1480 1481 1482 1483 1484 1485 1486 1487 1488 1489 1490 1491 1492 1493 1494 1495 1496 1497 1498 1499 1500 1501 1502 1503 1504 1505 1506 1507 1508 1509 1510 1511 1512 1513 1514 1515 1516 1517 1518 1519 1520 1521 1522 1523 1524 1525 1526 1527 1528 1529 1530 1531 1532 1533 1534 1535 1536 1537 1538 1539 1540 1541 1542 1543 1544 1545 1546 1547 1548 1549 1550 1551 1552 1553 1554 1555 1556 1557 1558 1559 1560 1561 1562 1563 1564 1565 1566 1567 1568 1569 1570 1571 1572 1573 1574 1575 1576 1577 1578 1579 1580 1581 1582 1583 1584 1585 1586 1587 1588 1589 1590 1591 1592 1593 1594 1595 1596 1597 1598 1599 1600 1601 1602 1603 1604 1605 1606 1607 1608 1609 1610 1611 1612 1613 1614 1615 1616 1617 1618 1619 1620 1621 1622 1623 1624 1625 1626 1627 1628 1629 1630 1631 1632 1633 1634 1635 1636 1637 1638 1639 1640 1641 1642 1643 1644 1645 1646 1647 1648 1649 1650 1651 1652 1653 1654 1655 1656 1657 1658 1659 1660 1661 1662 1663 1664 1665 1666 1667 1668 1669 1670 1671 1672 1673 1674 1675 1676 1677 1678 1679 1680 1681 1682 1683 1684 1685 1686 1687 1688 1689 1690 1691 1692 1693 1694 1695 1696 1697 1698 1699 1700 1701 1702 1703 1704 1705 1706 1707 1708 1709 1710 1711
|
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<meta name="generator" content="pandoc" />
<meta http-equiv="X-UA-Compatible" content="IE=EDGE" />
<meta name="viewport" content="width=device-width, initial-scale=1" />
<title>Pivoting</title>
<script>// Pandoc 2.9 adds attributes on both header and div. We remove the former (to
// be compatible with the behavior of Pandoc < 2.8).
document.addEventListener('DOMContentLoaded', function(e) {
var hs = document.querySelectorAll("div.section[class*='level'] > :first-child");
var i, h, a;
for (i = 0; i < hs.length; i++) {
h = hs[i];
if (!/^h[1-6]$/i.test(h.tagName)) continue; // it should be a header h1-h6
a = h.attributes;
while (a.length > 0) h.removeAttribute(a[0].name);
}
});
</script>
<style type="text/css">
code{white-space: pre-wrap;}
span.smallcaps{font-variant: small-caps;}
span.underline{text-decoration: underline;}
div.column{display: inline-block; vertical-align: top; width: 50%;}
div.hanging-indent{margin-left: 1.5em; text-indent: -1.5em;}
ul.task-list{list-style: none;}
</style>
<style type="text/css">
code {
white-space: pre;
}
.sourceCode {
overflow: visible;
}
</style>
<style type="text/css" data-origin="pandoc">
pre > code.sourceCode { white-space: pre; position: relative; }
pre > code.sourceCode > span { line-height: 1.25; }
pre > code.sourceCode > span:empty { height: 1.2em; }
.sourceCode { overflow: visible; }
code.sourceCode > span { color: inherit; text-decoration: inherit; }
div.sourceCode { margin: 1em 0; }
pre.sourceCode { margin: 0; }
@media screen {
div.sourceCode { overflow: auto; }
}
@media print {
pre > code.sourceCode { white-space: pre-wrap; }
pre > code.sourceCode > span { text-indent: -5em; padding-left: 5em; }
}
pre.numberSource code
{ counter-reset: source-line 0; }
pre.numberSource code > span
{ position: relative; left: -4em; counter-increment: source-line; }
pre.numberSource code > span > a:first-child::before
{ content: counter(source-line);
position: relative; left: -1em; text-align: right; vertical-align: baseline;
border: none; display: inline-block;
-webkit-touch-callout: none; -webkit-user-select: none;
-khtml-user-select: none; -moz-user-select: none;
-ms-user-select: none; user-select: none;
padding: 0 4px; width: 4em;
color: #aaaaaa;
}
pre.numberSource { margin-left: 3em; border-left: 1px solid #aaaaaa; padding-left: 4px; }
div.sourceCode
{ }
@media screen {
pre > code.sourceCode > span > a:first-child::before { text-decoration: underline; }
}
code span.al { color: #ff0000; font-weight: bold; }
code span.an { color: #60a0b0; font-weight: bold; font-style: italic; }
code span.at { color: #7d9029; }
code span.bn { color: #40a070; }
code span.bu { color: #008000; }
code span.cf { color: #007020; font-weight: bold; }
code span.ch { color: #4070a0; }
code span.cn { color: #880000; }
code span.co { color: #60a0b0; font-style: italic; }
code span.cv { color: #60a0b0; font-weight: bold; font-style: italic; }
code span.do { color: #ba2121; font-style: italic; }
code span.dt { color: #902000; }
code span.dv { color: #40a070; }
code span.er { color: #ff0000; font-weight: bold; }
code span.ex { }
code span.fl { color: #40a070; }
code span.fu { color: #06287e; }
code span.im { color: #008000; font-weight: bold; }
code span.in { color: #60a0b0; font-weight: bold; font-style: italic; }
code span.kw { color: #007020; font-weight: bold; }
code span.op { color: #666666; }
code span.ot { color: #007020; }
code span.pp { color: #bc7a00; }
code span.sc { color: #4070a0; }
code span.ss { color: #bb6688; }
code span.st { color: #4070a0; }
code span.va { color: #19177c; }
code span.vs { color: #4070a0; }
code span.wa { color: #60a0b0; font-weight: bold; font-style: italic; }
</style>
<script>
// apply pandoc div.sourceCode style to pre.sourceCode instead
(function() {
var sheets = document.styleSheets;
for (var i = 0; i < sheets.length; i++) {
if (sheets[i].ownerNode.dataset["origin"] !== "pandoc") continue;
try { var rules = sheets[i].cssRules; } catch (e) { continue; }
var j = 0;
while (j < rules.length) {
var rule = rules[j];
// check if there is a div.sourceCode rule
if (rule.type !== rule.STYLE_RULE || rule.selectorText !== "div.sourceCode") {
j++;
continue;
}
var style = rule.style.cssText;
// check if color or background-color is set
if (rule.style.color === '' && rule.style.backgroundColor === '') {
j++;
continue;
}
// replace div.sourceCode by a pre.sourceCode rule
sheets[i].deleteRule(j);
sheets[i].insertRule('pre.sourceCode{' + style + '}', j);
}
}
})();
</script>
<style type="text/css">body {
background-color: #fff;
margin: 1em auto;
max-width: 700px;
overflow: visible;
padding-left: 2em;
padding-right: 2em;
font-family: "Open Sans", "Helvetica Neue", Helvetica, Arial, sans-serif;
font-size: 14px;
line-height: 1.35;
}
#TOC {
clear: both;
margin: 0 0 10px 10px;
padding: 4px;
width: 400px;
border: 1px solid #CCCCCC;
border-radius: 5px;
background-color: #f6f6f6;
font-size: 13px;
line-height: 1.3;
}
#TOC .toctitle {
font-weight: bold;
font-size: 15px;
margin-left: 5px;
}
#TOC ul {
padding-left: 40px;
margin-left: -1.5em;
margin-top: 5px;
margin-bottom: 5px;
}
#TOC ul ul {
margin-left: -2em;
}
#TOC li {
line-height: 16px;
}
table {
margin: 1em auto;
border-width: 1px;
border-color: #DDDDDD;
border-style: outset;
border-collapse: collapse;
}
table th {
border-width: 2px;
padding: 5px;
border-style: inset;
}
table td {
border-width: 1px;
border-style: inset;
line-height: 18px;
padding: 5px 5px;
}
table, table th, table td {
border-left-style: none;
border-right-style: none;
}
table thead, table tr.even {
background-color: #f7f7f7;
}
p {
margin: 0.5em 0;
}
blockquote {
background-color: #f6f6f6;
padding: 0.25em 0.75em;
}
hr {
border-style: solid;
border: none;
border-top: 1px solid #777;
margin: 28px 0;
}
dl {
margin-left: 0;
}
dl dd {
margin-bottom: 13px;
margin-left: 13px;
}
dl dt {
font-weight: bold;
}
ul {
margin-top: 0;
}
ul li {
list-style: circle outside;
}
ul ul {
margin-bottom: 0;
}
pre, code {
background-color: #f7f7f7;
border-radius: 3px;
color: #333;
white-space: pre-wrap;
}
pre {
border-radius: 3px;
margin: 5px 0px 10px 0px;
padding: 10px;
}
pre:not([class]) {
background-color: #f7f7f7;
}
code {
font-family: Consolas, Monaco, 'Courier New', monospace;
font-size: 85%;
}
p > code, li > code {
padding: 2px 0px;
}
div.figure {
text-align: center;
}
img {
background-color: #FFFFFF;
padding: 2px;
border: 1px solid #DDDDDD;
border-radius: 3px;
border: 1px solid #CCCCCC;
margin: 0 5px;
}
h1 {
margin-top: 0;
font-size: 35px;
line-height: 40px;
}
h2 {
border-bottom: 4px solid #f7f7f7;
padding-top: 10px;
padding-bottom: 2px;
font-size: 145%;
}
h3 {
border-bottom: 2px solid #f7f7f7;
padding-top: 10px;
font-size: 120%;
}
h4 {
border-bottom: 1px solid #f7f7f7;
margin-left: 8px;
font-size: 105%;
}
h5, h6 {
border-bottom: 1px solid #ccc;
font-size: 105%;
}
a {
color: #0033dd;
text-decoration: none;
}
a:hover {
color: #6666ff; }
a:visited {
color: #800080; }
a:visited:hover {
color: #BB00BB; }
a[href^="http:"] {
text-decoration: underline; }
a[href^="https:"] {
text-decoration: underline; }
code > span.kw { color: #555; font-weight: bold; }
code > span.dt { color: #902000; }
code > span.dv { color: #40a070; }
code > span.bn { color: #d14; }
code > span.fl { color: #d14; }
code > span.ch { color: #d14; }
code > span.st { color: #d14; }
code > span.co { color: #888888; font-style: italic; }
code > span.ot { color: #007020; }
code > span.al { color: #ff0000; font-weight: bold; }
code > span.fu { color: #900; font-weight: bold; }
code > span.er { color: #a61717; background-color: #e3d2d2; }
</style>
</head>
<body>
<h1 class="title toc-ignore">Pivoting</h1>
<div id="introduction" class="section level2">
<h2>Introduction</h2>
<p>This vignette describes the use of the new
<code>pivot_longer()</code> and <code>pivot_wider()</code> functions.
Their goal is to improve the usability of <code>gather()</code> and
<code>spread()</code>, and incorporate state-of-the-art features found
in other packages.</p>
<p>For some time, it’s been obvious that there is something
fundamentally wrong with the design of <code>spread()</code> and
<code>gather()</code>. Many people don’t find the names intuitive and
find it hard to remember which direction corresponds to spreading and
which to gathering. It also seems surprisingly hard to remember the
arguments to these functions, meaning that many people (including me!)
have to consult the documentation every time.</p>
<p>There are two important new features inspired by other R packages
that have been advancing reshaping in R:</p>
<ul>
<li><p><code>pivot_longer()</code> can work with multiple value
variables that may have different types, inspired by the enhanced
<code>melt()</code> and <code>dcast()</code> functions provided by the
<a href="https://github.com/Rdatatable/data.table/wiki">data.table</a>
package by Matt Dowle and Arun Srinivasan.</p></li>
<li><p><code>pivot_longer()</code> and <code>pivot_wider()</code> can
take a data frame that specifies precisely how metadata stored in column
names becomes data variables (and vice versa), inspired by the <a href="https://winvector.github.io/cdata/">cdata</a> package by John
Mount and Nina Zumel.</p></li>
</ul>
<p>In this vignette, you’ll learn the key ideas behind
<code>pivot_longer()</code> and <code>pivot_wider()</code> as you see
them used to solve a variety of data reshaping challenges ranging from
simple to complex.</p>
<p>To begin we’ll load some needed packages. In real analysis code, I’d
imagine you’d do with the <code>library(tidyverse)</code>, but I can’t
do that here since this vignette is embedded in a package.</p>
<div class="sourceCode" id="cb1"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb1-1"><a href="#cb1-1" tabindex="-1"></a><span class="fu">library</span>(tidyr)</span>
<span id="cb1-2"><a href="#cb1-2" tabindex="-1"></a><span class="fu">library</span>(dplyr)</span>
<span id="cb1-3"><a href="#cb1-3" tabindex="-1"></a><span class="fu">library</span>(readr)</span></code></pre></div>
</div>
<div id="longer" class="section level2">
<h2>Longer</h2>
<p><code>pivot_longer()</code> makes datasets <strong>longer</strong> by
increasing the number of rows and decreasing the number of columns. I
don’t believe it makes sense to describe a dataset as being in “long
form”. Length is a relative term, and you can only say (e.g.) that
dataset A is longer than dataset B.</p>
<p><code>pivot_longer()</code> is commonly needed to tidy wild-caught
datasets as they often optimise for ease of data entry or ease of
comparison rather than ease of analysis. The following sections show how
to use <code>pivot_longer()</code> for a wide range of realistic
datasets.</p>
<div id="pew" class="section level3">
<h3>String data in column names</h3>
<p>The <code>relig_income</code> dataset stores counts based on a survey
which (among other things) asked people about their religion and annual
income:</p>
<div class="sourceCode" id="cb2"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb2-1"><a href="#cb2-1" tabindex="-1"></a>relig_income</span>
<span id="cb2-2"><a href="#cb2-2" tabindex="-1"></a><span class="co">#> # A tibble: 18 × 11</span></span>
<span id="cb2-3"><a href="#cb2-3" tabindex="-1"></a><span class="co">#> religion `<$10k` `$10-20k` `$20-30k` `$30-40k` `$40-50k` `$50-75k` `$75-100k`</span></span>
<span id="cb2-4"><a href="#cb2-4" tabindex="-1"></a><span class="co">#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl></span></span>
<span id="cb2-5"><a href="#cb2-5" tabindex="-1"></a><span class="co">#> 1 Agnostic 27 34 60 81 76 137 122</span></span>
<span id="cb2-6"><a href="#cb2-6" tabindex="-1"></a><span class="co">#> 2 Atheist 12 27 37 52 35 70 73</span></span>
<span id="cb2-7"><a href="#cb2-7" tabindex="-1"></a><span class="co">#> 3 Buddhist 27 21 30 34 33 58 62</span></span>
<span id="cb2-8"><a href="#cb2-8" tabindex="-1"></a><span class="co">#> 4 Catholic 418 617 732 670 638 1116 949</span></span>
<span id="cb2-9"><a href="#cb2-9" tabindex="-1"></a><span class="co">#> 5 Don’t k… 15 14 15 11 10 35 21</span></span>
<span id="cb2-10"><a href="#cb2-10" tabindex="-1"></a><span class="co">#> 6 Evangel… 575 869 1064 982 881 1486 949</span></span>
<span id="cb2-11"><a href="#cb2-11" tabindex="-1"></a><span class="co">#> 7 Hindu 1 9 7 9 11 34 47</span></span>
<span id="cb2-12"><a href="#cb2-12" tabindex="-1"></a><span class="co">#> 8 Histori… 228 244 236 238 197 223 131</span></span>
<span id="cb2-13"><a href="#cb2-13" tabindex="-1"></a><span class="co">#> 9 Jehovah… 20 27 24 24 21 30 15</span></span>
<span id="cb2-14"><a href="#cb2-14" tabindex="-1"></a><span class="co">#> 10 Jewish 19 19 25 25 30 95 69</span></span>
<span id="cb2-15"><a href="#cb2-15" tabindex="-1"></a><span class="co">#> # ℹ 8 more rows</span></span>
<span id="cb2-16"><a href="#cb2-16" tabindex="-1"></a><span class="co">#> # ℹ 3 more variables: `$100-150k` <dbl>, `>150k` <dbl>,</span></span>
<span id="cb2-17"><a href="#cb2-17" tabindex="-1"></a><span class="co">#> # `Don't know/refused` <dbl></span></span></code></pre></div>
<p>This dataset contains three variables:</p>
<ul>
<li><code>religion</code>, stored in the rows,</li>
<li><code>income</code> spread across the column names, and</li>
<li><code>count</code> stored in the cell values.</li>
</ul>
<p>To tidy it we use <code>pivot_longer()</code>:</p>
<div class="sourceCode" id="cb3"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb3-1"><a href="#cb3-1" tabindex="-1"></a>relig_income <span class="sc">%>%</span> </span>
<span id="cb3-2"><a href="#cb3-2" tabindex="-1"></a> <span class="fu">pivot_longer</span>(</span>
<span id="cb3-3"><a href="#cb3-3" tabindex="-1"></a> <span class="at">cols =</span> <span class="sc">!</span>religion, </span>
<span id="cb3-4"><a href="#cb3-4" tabindex="-1"></a> <span class="at">names_to =</span> <span class="st">"income"</span>, </span>
<span id="cb3-5"><a href="#cb3-5" tabindex="-1"></a> <span class="at">values_to =</span> <span class="st">"count"</span></span>
<span id="cb3-6"><a href="#cb3-6" tabindex="-1"></a> )</span>
<span id="cb3-7"><a href="#cb3-7" tabindex="-1"></a><span class="co">#> # A tibble: 180 × 3</span></span>
<span id="cb3-8"><a href="#cb3-8" tabindex="-1"></a><span class="co">#> religion income count</span></span>
<span id="cb3-9"><a href="#cb3-9" tabindex="-1"></a><span class="co">#> <chr> <chr> <dbl></span></span>
<span id="cb3-10"><a href="#cb3-10" tabindex="-1"></a><span class="co">#> 1 Agnostic <$10k 27</span></span>
<span id="cb3-11"><a href="#cb3-11" tabindex="-1"></a><span class="co">#> 2 Agnostic $10-20k 34</span></span>
<span id="cb3-12"><a href="#cb3-12" tabindex="-1"></a><span class="co">#> 3 Agnostic $20-30k 60</span></span>
<span id="cb3-13"><a href="#cb3-13" tabindex="-1"></a><span class="co">#> 4 Agnostic $30-40k 81</span></span>
<span id="cb3-14"><a href="#cb3-14" tabindex="-1"></a><span class="co">#> 5 Agnostic $40-50k 76</span></span>
<span id="cb3-15"><a href="#cb3-15" tabindex="-1"></a><span class="co">#> 6 Agnostic $50-75k 137</span></span>
<span id="cb3-16"><a href="#cb3-16" tabindex="-1"></a><span class="co">#> 7 Agnostic $75-100k 122</span></span>
<span id="cb3-17"><a href="#cb3-17" tabindex="-1"></a><span class="co">#> 8 Agnostic $100-150k 109</span></span>
<span id="cb3-18"><a href="#cb3-18" tabindex="-1"></a><span class="co">#> 9 Agnostic >150k 84</span></span>
<span id="cb3-19"><a href="#cb3-19" tabindex="-1"></a><span class="co">#> 10 Agnostic Don't know/refused 96</span></span>
<span id="cb3-20"><a href="#cb3-20" tabindex="-1"></a><span class="co">#> # ℹ 170 more rows</span></span></code></pre></div>
<ul>
<li><p>The first argument is the dataset to reshape,
<code>relig_income</code>.</p></li>
<li><p><code>cols</code> describes which columns need to be reshaped. In
this case, it’s every column apart from <code>religion</code>.</p></li>
<li><p><code>names_to</code> gives the name of the variable that will be
created from the data stored in the column names,
i.e. <code>income</code>.</p></li>
<li><p><code>values_to</code> gives the name of the variable that will
be created from the data stored in the cell value,
i.e. <code>count</code>.</p></li>
</ul>
<p>Neither the <code>names_to</code> nor the <code>values_to</code>
column exists in <code>relig_income</code>, so we provide them as
strings surrounded by quotes.</p>
</div>
<div id="billboard" class="section level3">
<h3>Numeric data in column names</h3>
<p>The <code>billboard</code> dataset records the billboard rank of
songs in the year 2000. It has a form similar to the
<code>relig_income</code> data, but the data encoded in the column names
is really a number, not a string.</p>
<div class="sourceCode" id="cb4"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb4-1"><a href="#cb4-1" tabindex="-1"></a>billboard</span>
<span id="cb4-2"><a href="#cb4-2" tabindex="-1"></a><span class="co">#> # A tibble: 317 × 79</span></span>
<span id="cb4-3"><a href="#cb4-3" tabindex="-1"></a><span class="co">#> artist track date.entered wk1 wk2 wk3 wk4 wk5 wk6 wk7 wk8</span></span>
<span id="cb4-4"><a href="#cb4-4" tabindex="-1"></a><span class="co">#> <chr> <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl></span></span>
<span id="cb4-5"><a href="#cb4-5" tabindex="-1"></a><span class="co">#> 1 2 Pac Baby… 2000-02-26 87 82 72 77 87 94 99 NA</span></span>
<span id="cb4-6"><a href="#cb4-6" tabindex="-1"></a><span class="co">#> 2 2Ge+her The … 2000-09-02 91 87 92 NA NA NA NA NA</span></span>
<span id="cb4-7"><a href="#cb4-7" tabindex="-1"></a><span class="co">#> 3 3 Doors D… Kryp… 2000-04-08 81 70 68 67 66 57 54 53</span></span>
<span id="cb4-8"><a href="#cb4-8" tabindex="-1"></a><span class="co">#> 4 3 Doors D… Loser 2000-10-21 76 76 72 69 67 65 55 59</span></span>
<span id="cb4-9"><a href="#cb4-9" tabindex="-1"></a><span class="co">#> 5 504 Boyz Wobb… 2000-04-15 57 34 25 17 17 31 36 49</span></span>
<span id="cb4-10"><a href="#cb4-10" tabindex="-1"></a><span class="co">#> 6 98^0 Give… 2000-08-19 51 39 34 26 26 19 2 2</span></span>
<span id="cb4-11"><a href="#cb4-11" tabindex="-1"></a><span class="co">#> 7 A*Teens Danc… 2000-07-08 97 97 96 95 100 NA NA NA</span></span>
<span id="cb4-12"><a href="#cb4-12" tabindex="-1"></a><span class="co">#> 8 Aaliyah I Do… 2000-01-29 84 62 51 41 38 35 35 38</span></span>
<span id="cb4-13"><a href="#cb4-13" tabindex="-1"></a><span class="co">#> 9 Aaliyah Try … 2000-03-18 59 53 38 28 21 18 16 14</span></span>
<span id="cb4-14"><a href="#cb4-14" tabindex="-1"></a><span class="co">#> 10 Adams, Yo… Open… 2000-08-26 76 76 74 69 68 67 61 58</span></span>
<span id="cb4-15"><a href="#cb4-15" tabindex="-1"></a><span class="co">#> # ℹ 307 more rows</span></span>
<span id="cb4-16"><a href="#cb4-16" tabindex="-1"></a><span class="co">#> # ℹ 68 more variables: wk9 <dbl>, wk10 <dbl>, wk11 <dbl>, wk12 <dbl>,</span></span>
<span id="cb4-17"><a href="#cb4-17" tabindex="-1"></a><span class="co">#> # wk13 <dbl>, wk14 <dbl>, wk15 <dbl>, wk16 <dbl>, wk17 <dbl>, wk18 <dbl>,</span></span>
<span id="cb4-18"><a href="#cb4-18" tabindex="-1"></a><span class="co">#> # wk19 <dbl>, wk20 <dbl>, wk21 <dbl>, wk22 <dbl>, wk23 <dbl>, wk24 <dbl>,</span></span>
<span id="cb4-19"><a href="#cb4-19" tabindex="-1"></a><span class="co">#> # wk25 <dbl>, wk26 <dbl>, wk27 <dbl>, wk28 <dbl>, wk29 <dbl>, wk30 <dbl>,</span></span>
<span id="cb4-20"><a href="#cb4-20" tabindex="-1"></a><span class="co">#> # wk31 <dbl>, wk32 <dbl>, wk33 <dbl>, wk34 <dbl>, wk35 <dbl>, wk36 <dbl>,</span></span>
<span id="cb4-21"><a href="#cb4-21" tabindex="-1"></a><span class="co">#> # wk37 <dbl>, wk38 <dbl>, wk39 <dbl>, wk40 <dbl>, wk41 <dbl>, wk42 <dbl>, …</span></span></code></pre></div>
<p>We can start with the same basic specification as for the
<code>relig_income</code> dataset. Here we want the names to become a
variable called <code>week</code>, and the values to become a variable
called <code>rank</code>. I also use <code>values_drop_na</code> to drop
rows that correspond to missing values. Not every song stays in the
charts for all 76 weeks, so the structure of the input data force the
creation of unnecessary explicit <code>NA</code>s.</p>
<div class="sourceCode" id="cb5"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb5-1"><a href="#cb5-1" tabindex="-1"></a>billboard <span class="sc">%>%</span> </span>
<span id="cb5-2"><a href="#cb5-2" tabindex="-1"></a> <span class="fu">pivot_longer</span>(</span>
<span id="cb5-3"><a href="#cb5-3" tabindex="-1"></a> <span class="at">cols =</span> <span class="fu">starts_with</span>(<span class="st">"wk"</span>), </span>
<span id="cb5-4"><a href="#cb5-4" tabindex="-1"></a> <span class="at">names_to =</span> <span class="st">"week"</span>, </span>
<span id="cb5-5"><a href="#cb5-5" tabindex="-1"></a> <span class="at">values_to =</span> <span class="st">"rank"</span>,</span>
<span id="cb5-6"><a href="#cb5-6" tabindex="-1"></a> <span class="at">values_drop_na =</span> <span class="cn">TRUE</span></span>
<span id="cb5-7"><a href="#cb5-7" tabindex="-1"></a> )</span>
<span id="cb5-8"><a href="#cb5-8" tabindex="-1"></a><span class="co">#> # A tibble: 5,307 × 5</span></span>
<span id="cb5-9"><a href="#cb5-9" tabindex="-1"></a><span class="co">#> artist track date.entered week rank</span></span>
<span id="cb5-10"><a href="#cb5-10" tabindex="-1"></a><span class="co">#> <chr> <chr> <date> <chr> <dbl></span></span>
<span id="cb5-11"><a href="#cb5-11" tabindex="-1"></a><span class="co">#> 1 2 Pac Baby Don't Cry (Keep... 2000-02-26 wk1 87</span></span>
<span id="cb5-12"><a href="#cb5-12" tabindex="-1"></a><span class="co">#> 2 2 Pac Baby Don't Cry (Keep... 2000-02-26 wk2 82</span></span>
<span id="cb5-13"><a href="#cb5-13" tabindex="-1"></a><span class="co">#> 3 2 Pac Baby Don't Cry (Keep... 2000-02-26 wk3 72</span></span>
<span id="cb5-14"><a href="#cb5-14" tabindex="-1"></a><span class="co">#> 4 2 Pac Baby Don't Cry (Keep... 2000-02-26 wk4 77</span></span>
<span id="cb5-15"><a href="#cb5-15" tabindex="-1"></a><span class="co">#> 5 2 Pac Baby Don't Cry (Keep... 2000-02-26 wk5 87</span></span>
<span id="cb5-16"><a href="#cb5-16" tabindex="-1"></a><span class="co">#> 6 2 Pac Baby Don't Cry (Keep... 2000-02-26 wk6 94</span></span>
<span id="cb5-17"><a href="#cb5-17" tabindex="-1"></a><span class="co">#> 7 2 Pac Baby Don't Cry (Keep... 2000-02-26 wk7 99</span></span>
<span id="cb5-18"><a href="#cb5-18" tabindex="-1"></a><span class="co">#> 8 2Ge+her The Hardest Part Of ... 2000-09-02 wk1 91</span></span>
<span id="cb5-19"><a href="#cb5-19" tabindex="-1"></a><span class="co">#> 9 2Ge+her The Hardest Part Of ... 2000-09-02 wk2 87</span></span>
<span id="cb5-20"><a href="#cb5-20" tabindex="-1"></a><span class="co">#> 10 2Ge+her The Hardest Part Of ... 2000-09-02 wk3 92</span></span>
<span id="cb5-21"><a href="#cb5-21" tabindex="-1"></a><span class="co">#> # ℹ 5,297 more rows</span></span></code></pre></div>
<p>It would be nice to easily determine how long each song stayed in the
charts, but to do that, we’ll need to convert the <code>week</code>
variable to an integer. We can do that by using two additional
arguments: <code>names_prefix</code> strips off the <code>wk</code>
prefix, and <code>names_transform</code> converts <code>week</code> into
an integer:</p>
<div class="sourceCode" id="cb6"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb6-1"><a href="#cb6-1" tabindex="-1"></a>billboard <span class="sc">%>%</span> </span>
<span id="cb6-2"><a href="#cb6-2" tabindex="-1"></a> <span class="fu">pivot_longer</span>(</span>
<span id="cb6-3"><a href="#cb6-3" tabindex="-1"></a> <span class="at">cols =</span> <span class="fu">starts_with</span>(<span class="st">"wk"</span>), </span>
<span id="cb6-4"><a href="#cb6-4" tabindex="-1"></a> <span class="at">names_to =</span> <span class="st">"week"</span>, </span>
<span id="cb6-5"><a href="#cb6-5" tabindex="-1"></a> <span class="at">names_prefix =</span> <span class="st">"wk"</span>,</span>
<span id="cb6-6"><a href="#cb6-6" tabindex="-1"></a> <span class="at">names_transform =</span> as.integer,</span>
<span id="cb6-7"><a href="#cb6-7" tabindex="-1"></a> <span class="at">values_to =</span> <span class="st">"rank"</span>,</span>
<span id="cb6-8"><a href="#cb6-8" tabindex="-1"></a> <span class="at">values_drop_na =</span> <span class="cn">TRUE</span>,</span>
<span id="cb6-9"><a href="#cb6-9" tabindex="-1"></a> )</span></code></pre></div>
<p>Alternatively, you could do this with a single argument by using
<code>readr::parse_number()</code> which automatically strips
non-numeric components:</p>
<div class="sourceCode" id="cb7"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb7-1"><a href="#cb7-1" tabindex="-1"></a>billboard <span class="sc">%>%</span> </span>
<span id="cb7-2"><a href="#cb7-2" tabindex="-1"></a> <span class="fu">pivot_longer</span>(</span>
<span id="cb7-3"><a href="#cb7-3" tabindex="-1"></a> <span class="at">cols =</span> <span class="fu">starts_with</span>(<span class="st">"wk"</span>), </span>
<span id="cb7-4"><a href="#cb7-4" tabindex="-1"></a> <span class="at">names_to =</span> <span class="st">"week"</span>, </span>
<span id="cb7-5"><a href="#cb7-5" tabindex="-1"></a> <span class="at">names_transform =</span> readr<span class="sc">::</span>parse_number,</span>
<span id="cb7-6"><a href="#cb7-6" tabindex="-1"></a> <span class="at">values_to =</span> <span class="st">"rank"</span>,</span>
<span id="cb7-7"><a href="#cb7-7" tabindex="-1"></a> <span class="at">values_drop_na =</span> <span class="cn">TRUE</span>,</span>
<span id="cb7-8"><a href="#cb7-8" tabindex="-1"></a> )</span></code></pre></div>
</div>
<div id="many-variables-in-column-names" class="section level3">
<h3>Many variables in column names</h3>
<p>A more challenging situation occurs when you have multiple variables
crammed into the column names. For example, take the <code>who</code>
dataset:</p>
<div class="sourceCode" id="cb8"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb8-1"><a href="#cb8-1" tabindex="-1"></a>who</span>
<span id="cb8-2"><a href="#cb8-2" tabindex="-1"></a><span class="co">#> # A tibble: 7,240 × 60</span></span>
<span id="cb8-3"><a href="#cb8-3" tabindex="-1"></a><span class="co">#> country iso2 iso3 year new_sp_m014 new_sp_m1524 new_sp_m2534 new_sp_m3544</span></span>
<span id="cb8-4"><a href="#cb8-4" tabindex="-1"></a><span class="co">#> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl></span></span>
<span id="cb8-5"><a href="#cb8-5" tabindex="-1"></a><span class="co">#> 1 Afghani… AF AFG 1980 NA NA NA NA</span></span>
<span id="cb8-6"><a href="#cb8-6" tabindex="-1"></a><span class="co">#> 2 Afghani… AF AFG 1981 NA NA NA NA</span></span>
<span id="cb8-7"><a href="#cb8-7" tabindex="-1"></a><span class="co">#> 3 Afghani… AF AFG 1982 NA NA NA NA</span></span>
<span id="cb8-8"><a href="#cb8-8" tabindex="-1"></a><span class="co">#> 4 Afghani… AF AFG 1983 NA NA NA NA</span></span>
<span id="cb8-9"><a href="#cb8-9" tabindex="-1"></a><span class="co">#> 5 Afghani… AF AFG 1984 NA NA NA NA</span></span>
<span id="cb8-10"><a href="#cb8-10" tabindex="-1"></a><span class="co">#> 6 Afghani… AF AFG 1985 NA NA NA NA</span></span>
<span id="cb8-11"><a href="#cb8-11" tabindex="-1"></a><span class="co">#> 7 Afghani… AF AFG 1986 NA NA NA NA</span></span>
<span id="cb8-12"><a href="#cb8-12" tabindex="-1"></a><span class="co">#> 8 Afghani… AF AFG 1987 NA NA NA NA</span></span>
<span id="cb8-13"><a href="#cb8-13" tabindex="-1"></a><span class="co">#> 9 Afghani… AF AFG 1988 NA NA NA NA</span></span>
<span id="cb8-14"><a href="#cb8-14" tabindex="-1"></a><span class="co">#> 10 Afghani… AF AFG 1989 NA NA NA NA</span></span>
<span id="cb8-15"><a href="#cb8-15" tabindex="-1"></a><span class="co">#> # ℹ 7,230 more rows</span></span>
<span id="cb8-16"><a href="#cb8-16" tabindex="-1"></a><span class="co">#> # ℹ 52 more variables: new_sp_m4554 <dbl>, new_sp_m5564 <dbl>,</span></span>
<span id="cb8-17"><a href="#cb8-17" tabindex="-1"></a><span class="co">#> # new_sp_m65 <dbl>, new_sp_f014 <dbl>, new_sp_f1524 <dbl>,</span></span>
<span id="cb8-18"><a href="#cb8-18" tabindex="-1"></a><span class="co">#> # new_sp_f2534 <dbl>, new_sp_f3544 <dbl>, new_sp_f4554 <dbl>,</span></span>
<span id="cb8-19"><a href="#cb8-19" tabindex="-1"></a><span class="co">#> # new_sp_f5564 <dbl>, new_sp_f65 <dbl>, new_sn_m014 <dbl>,</span></span>
<span id="cb8-20"><a href="#cb8-20" tabindex="-1"></a><span class="co">#> # new_sn_m1524 <dbl>, new_sn_m2534 <dbl>, new_sn_m3544 <dbl>,</span></span>
<span id="cb8-21"><a href="#cb8-21" tabindex="-1"></a><span class="co">#> # new_sn_m4554 <dbl>, new_sn_m5564 <dbl>, new_sn_m65 <dbl>, …</span></span></code></pre></div>
<p><code>country</code>, <code>iso2</code>, <code>iso3</code>, and
<code>year</code> are already variables, so they can be left as is. But
the columns from <code>new_sp_m014</code> to <code>newrel_f65</code>
encode four variables in their names:</p>
<ul>
<li><p>The <code>new_</code>/<code>new</code> prefix indicates these are
counts of new cases. This dataset only contains new cases, so we’ll
ignore it here because it’s constant.</p></li>
<li><p><code>sp</code>/<code>rel</code>/<code>ep</code> describe how the
case was diagnosed.</p></li>
<li><p><code>m</code>/<code>f</code> gives the gender.</p></li>
<li><p><code>014</code>/<code>1524</code>/<code>2535</code>/<code>3544</code>/<code>4554</code>/<code>65</code>
supplies the age range.</p></li>
</ul>
<p>We can break these variables up by specifying multiple column names
in <code>names_to</code>, and then either providing
<code>names_sep</code> or <code>names_pattern</code>. Here
<code>names_pattern</code> is the most natural fit. It has a similar
interface to <code>extract</code>: you give it a regular expression
containing groups (defined by <code>()</code>) and it puts each group in
a column.</p>
<div class="sourceCode" id="cb9"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb9-1"><a href="#cb9-1" tabindex="-1"></a>who <span class="sc">%>%</span> </span>
<span id="cb9-2"><a href="#cb9-2" tabindex="-1"></a> <span class="fu">pivot_longer</span>(</span>
<span id="cb9-3"><a href="#cb9-3" tabindex="-1"></a> <span class="at">cols =</span> new_sp_m014<span class="sc">:</span>newrel_f65,</span>
<span id="cb9-4"><a href="#cb9-4" tabindex="-1"></a> <span class="at">names_to =</span> <span class="fu">c</span>(<span class="st">"diagnosis"</span>, <span class="st">"gender"</span>, <span class="st">"age"</span>), </span>
<span id="cb9-5"><a href="#cb9-5" tabindex="-1"></a> <span class="at">names_pattern =</span> <span class="st">"new_?(.*)_(.)(.*)"</span>,</span>
<span id="cb9-6"><a href="#cb9-6" tabindex="-1"></a> <span class="at">values_to =</span> <span class="st">"count"</span></span>
<span id="cb9-7"><a href="#cb9-7" tabindex="-1"></a> )</span>
<span id="cb9-8"><a href="#cb9-8" tabindex="-1"></a><span class="co">#> # A tibble: 405,440 × 8</span></span>
<span id="cb9-9"><a href="#cb9-9" tabindex="-1"></a><span class="co">#> country iso2 iso3 year diagnosis gender age count</span></span>
<span id="cb9-10"><a href="#cb9-10" tabindex="-1"></a><span class="co">#> <chr> <chr> <chr> <dbl> <chr> <chr> <chr> <dbl></span></span>
<span id="cb9-11"><a href="#cb9-11" tabindex="-1"></a><span class="co">#> 1 Afghanistan AF AFG 1980 sp m 014 NA</span></span>
<span id="cb9-12"><a href="#cb9-12" tabindex="-1"></a><span class="co">#> 2 Afghanistan AF AFG 1980 sp m 1524 NA</span></span>
<span id="cb9-13"><a href="#cb9-13" tabindex="-1"></a><span class="co">#> 3 Afghanistan AF AFG 1980 sp m 2534 NA</span></span>
<span id="cb9-14"><a href="#cb9-14" tabindex="-1"></a><span class="co">#> 4 Afghanistan AF AFG 1980 sp m 3544 NA</span></span>
<span id="cb9-15"><a href="#cb9-15" tabindex="-1"></a><span class="co">#> 5 Afghanistan AF AFG 1980 sp m 4554 NA</span></span>
<span id="cb9-16"><a href="#cb9-16" tabindex="-1"></a><span class="co">#> 6 Afghanistan AF AFG 1980 sp m 5564 NA</span></span>
<span id="cb9-17"><a href="#cb9-17" tabindex="-1"></a><span class="co">#> 7 Afghanistan AF AFG 1980 sp m 65 NA</span></span>
<span id="cb9-18"><a href="#cb9-18" tabindex="-1"></a><span class="co">#> 8 Afghanistan AF AFG 1980 sp f 014 NA</span></span>
<span id="cb9-19"><a href="#cb9-19" tabindex="-1"></a><span class="co">#> 9 Afghanistan AF AFG 1980 sp f 1524 NA</span></span>
<span id="cb9-20"><a href="#cb9-20" tabindex="-1"></a><span class="co">#> 10 Afghanistan AF AFG 1980 sp f 2534 NA</span></span>
<span id="cb9-21"><a href="#cb9-21" tabindex="-1"></a><span class="co">#> # ℹ 405,430 more rows</span></span></code></pre></div>
<p>We could go one step further use readr functions to convert the
gender and age to factors. I think this is good practice when you have
categorical variables with a known set of values.</p>
<div class="sourceCode" id="cb10"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb10-1"><a href="#cb10-1" tabindex="-1"></a>who <span class="sc">%>%</span> </span>
<span id="cb10-2"><a href="#cb10-2" tabindex="-1"></a> <span class="fu">pivot_longer</span>(</span>
<span id="cb10-3"><a href="#cb10-3" tabindex="-1"></a> <span class="at">cols =</span> new_sp_m014<span class="sc">:</span>newrel_f65,</span>
<span id="cb10-4"><a href="#cb10-4" tabindex="-1"></a> <span class="at">names_to =</span> <span class="fu">c</span>(<span class="st">"diagnosis"</span>, <span class="st">"gender"</span>, <span class="st">"age"</span>), </span>
<span id="cb10-5"><a href="#cb10-5" tabindex="-1"></a> <span class="at">names_pattern =</span> <span class="st">"new_?(.*)_(.)(.*)"</span>,</span>
<span id="cb10-6"><a href="#cb10-6" tabindex="-1"></a> <span class="at">names_transform =</span> <span class="fu">list</span>(</span>
<span id="cb10-7"><a href="#cb10-7" tabindex="-1"></a> <span class="at">gender =</span> <span class="sc">~</span> readr<span class="sc">::</span><span class="fu">parse_factor</span>(.x, <span class="at">levels =</span> <span class="fu">c</span>(<span class="st">"f"</span>, <span class="st">"m"</span>)),</span>
<span id="cb10-8"><a href="#cb10-8" tabindex="-1"></a> <span class="at">age =</span> <span class="sc">~</span> readr<span class="sc">::</span><span class="fu">parse_factor</span>(</span>
<span id="cb10-9"><a href="#cb10-9" tabindex="-1"></a> .x,</span>
<span id="cb10-10"><a href="#cb10-10" tabindex="-1"></a> <span class="at">levels =</span> <span class="fu">c</span>(<span class="st">"014"</span>, <span class="st">"1524"</span>, <span class="st">"2534"</span>, <span class="st">"3544"</span>, <span class="st">"4554"</span>, <span class="st">"5564"</span>, <span class="st">"65"</span>), </span>
<span id="cb10-11"><a href="#cb10-11" tabindex="-1"></a> <span class="at">ordered =</span> <span class="cn">TRUE</span></span>
<span id="cb10-12"><a href="#cb10-12" tabindex="-1"></a> )</span>
<span id="cb10-13"><a href="#cb10-13" tabindex="-1"></a> ),</span>
<span id="cb10-14"><a href="#cb10-14" tabindex="-1"></a> <span class="at">values_to =</span> <span class="st">"count"</span>,</span>
<span id="cb10-15"><a href="#cb10-15" tabindex="-1"></a>)</span></code></pre></div>
<p>Doing it this way is a little more efficient than doing a mutate
after the fact, <code>pivot_longer()</code> only has to transform one
occurence of each name where a <code>mutate()</code> would need to
transform many repetitions.</p>
</div>
<div id="multiple-observations-per-row" class="section level3">
<h3>Multiple observations per row</h3>
<p>So far, we have been working with data frames that have one
observation per row, but many important pivoting problems involve
multiple observations per row. You can usually recognise this case
because name of the column that you want to appear in the output is part
of the column name in the input. In this section, you’ll learn how to
pivot this sort of data.</p>
<p>The following example is adapted from the <a href="https://CRAN.R-project.org/package=data.table/vignettes/datatable-reshape.html">data.table
vignette</a>, as inspiration for tidyr’s solution to this problem.</p>
<div class="sourceCode" id="cb11"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb11-1"><a href="#cb11-1" tabindex="-1"></a>household</span>
<span id="cb11-2"><a href="#cb11-2" tabindex="-1"></a><span class="co">#> # A tibble: 5 × 5</span></span>
<span id="cb11-3"><a href="#cb11-3" tabindex="-1"></a><span class="co">#> family dob_child1 dob_child2 name_child1 name_child2</span></span>
<span id="cb11-4"><a href="#cb11-4" tabindex="-1"></a><span class="co">#> <int> <date> <date> <chr> <chr> </span></span>
<span id="cb11-5"><a href="#cb11-5" tabindex="-1"></a><span class="co">#> 1 1 1998-11-26 2000-01-29 Susan Jose </span></span>
<span id="cb11-6"><a href="#cb11-6" tabindex="-1"></a><span class="co">#> 2 2 1996-06-22 NA Mark <NA> </span></span>
<span id="cb11-7"><a href="#cb11-7" tabindex="-1"></a><span class="co">#> 3 3 2002-07-11 2004-04-05 Sam Seth </span></span>
<span id="cb11-8"><a href="#cb11-8" tabindex="-1"></a><span class="co">#> 4 4 2004-10-10 2009-08-27 Craig Khai </span></span>
<span id="cb11-9"><a href="#cb11-9" tabindex="-1"></a><span class="co">#> 5 5 2000-12-05 2005-02-28 Parker Gracie</span></span></code></pre></div>
<p>Note that we have two pieces of information (or values) for each
child: their <code>name</code> and their <code>dob</code> (date of
birth). These need to go into separate columns in the result. Again we
supply multiple variables to <code>names_to</code>, using
<code>names_sep</code> to split up each variable name. Note the special
name <code>.value</code>: this tells <code>pivot_longer()</code> that
that part of the column name specifies the “value” being measured (which
will become a variable in the output).</p>
<div class="sourceCode" id="cb12"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb12-1"><a href="#cb12-1" tabindex="-1"></a>household <span class="sc">%>%</span> </span>
<span id="cb12-2"><a href="#cb12-2" tabindex="-1"></a> <span class="fu">pivot_longer</span>(</span>
<span id="cb12-3"><a href="#cb12-3" tabindex="-1"></a> <span class="at">cols =</span> <span class="sc">!</span>family, </span>
<span id="cb12-4"><a href="#cb12-4" tabindex="-1"></a> <span class="at">names_to =</span> <span class="fu">c</span>(<span class="st">".value"</span>, <span class="st">"child"</span>), </span>
<span id="cb12-5"><a href="#cb12-5" tabindex="-1"></a> <span class="at">names_sep =</span> <span class="st">"_"</span>, </span>
<span id="cb12-6"><a href="#cb12-6" tabindex="-1"></a> <span class="at">values_drop_na =</span> <span class="cn">TRUE</span></span>
<span id="cb12-7"><a href="#cb12-7" tabindex="-1"></a> )</span>
<span id="cb12-8"><a href="#cb12-8" tabindex="-1"></a><span class="co">#> # A tibble: 9 × 4</span></span>
<span id="cb12-9"><a href="#cb12-9" tabindex="-1"></a><span class="co">#> family child dob name </span></span>
<span id="cb12-10"><a href="#cb12-10" tabindex="-1"></a><span class="co">#> <int> <chr> <date> <chr> </span></span>
<span id="cb12-11"><a href="#cb12-11" tabindex="-1"></a><span class="co">#> 1 1 child1 1998-11-26 Susan </span></span>
<span id="cb12-12"><a href="#cb12-12" tabindex="-1"></a><span class="co">#> 2 1 child2 2000-01-29 Jose </span></span>
<span id="cb12-13"><a href="#cb12-13" tabindex="-1"></a><span class="co">#> 3 2 child1 1996-06-22 Mark </span></span>
<span id="cb12-14"><a href="#cb12-14" tabindex="-1"></a><span class="co">#> 4 3 child1 2002-07-11 Sam </span></span>
<span id="cb12-15"><a href="#cb12-15" tabindex="-1"></a><span class="co">#> 5 3 child2 2004-04-05 Seth </span></span>
<span id="cb12-16"><a href="#cb12-16" tabindex="-1"></a><span class="co">#> 6 4 child1 2004-10-10 Craig </span></span>
<span id="cb12-17"><a href="#cb12-17" tabindex="-1"></a><span class="co">#> 7 4 child2 2009-08-27 Khai </span></span>
<span id="cb12-18"><a href="#cb12-18" tabindex="-1"></a><span class="co">#> 8 5 child1 2000-12-05 Parker</span></span>
<span id="cb12-19"><a href="#cb12-19" tabindex="-1"></a><span class="co">#> 9 5 child2 2005-02-28 Gracie</span></span></code></pre></div>
<p>Note the use of <code>values_drop_na = TRUE</code>: the input shape
forces the creation of explicit missing variables for observations that
don’t exist.</p>
<p>A similar problem problem also exists in the <code>anscombe</code>
dataset built in to base R:</p>
<div class="sourceCode" id="cb13"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb13-1"><a href="#cb13-1" tabindex="-1"></a>anscombe</span>
<span id="cb13-2"><a href="#cb13-2" tabindex="-1"></a><span class="co">#> x1 x2 x3 x4 y1 y2 y3 y4</span></span>
<span id="cb13-3"><a href="#cb13-3" tabindex="-1"></a><span class="co">#> 1 10 10 10 8 8.04 9.14 7.46 6.58</span></span>
<span id="cb13-4"><a href="#cb13-4" tabindex="-1"></a><span class="co">#> 2 8 8 8 8 6.95 8.14 6.77 5.76</span></span>
<span id="cb13-5"><a href="#cb13-5" tabindex="-1"></a><span class="co">#> 3 13 13 13 8 7.58 8.74 12.74 7.71</span></span>
<span id="cb13-6"><a href="#cb13-6" tabindex="-1"></a><span class="co">#> 4 9 9 9 8 8.81 8.77 7.11 8.84</span></span>
<span id="cb13-7"><a href="#cb13-7" tabindex="-1"></a><span class="co">#> 5 11 11 11 8 8.33 9.26 7.81 8.47</span></span>
<span id="cb13-8"><a href="#cb13-8" tabindex="-1"></a><span class="co">#> 6 14 14 14 8 9.96 8.10 8.84 7.04</span></span>
<span id="cb13-9"><a href="#cb13-9" tabindex="-1"></a><span class="co">#> 7 6 6 6 8 7.24 6.13 6.08 5.25</span></span>
<span id="cb13-10"><a href="#cb13-10" tabindex="-1"></a><span class="co">#> 8 4 4 4 19 4.26 3.10 5.39 12.50</span></span>
<span id="cb13-11"><a href="#cb13-11" tabindex="-1"></a><span class="co">#> 9 12 12 12 8 10.84 9.13 8.15 5.56</span></span>
<span id="cb13-12"><a href="#cb13-12" tabindex="-1"></a><span class="co">#> 10 7 7 7 8 4.82 7.26 6.42 7.91</span></span>
<span id="cb13-13"><a href="#cb13-13" tabindex="-1"></a><span class="co">#> 11 5 5 5 8 5.68 4.74 5.73 6.89</span></span></code></pre></div>
<p>This dataset contains four pairs of variables (<code>x1</code> and
<code>y1</code>, <code>x2</code> and <code>y2</code>, etc) that underlie
Anscombe’s quartet, a collection of four datasets that have the same
summary statistics (mean, sd, correlation etc), but have quite different
data. We want to produce a dataset with columns <code>set</code>,
<code>x</code> and <code>y</code>.</p>
<div class="sourceCode" id="cb14"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb14-1"><a href="#cb14-1" tabindex="-1"></a>anscombe <span class="sc">%>%</span> </span>
<span id="cb14-2"><a href="#cb14-2" tabindex="-1"></a> <span class="fu">pivot_longer</span>(</span>
<span id="cb14-3"><a href="#cb14-3" tabindex="-1"></a> <span class="at">cols =</span> <span class="fu">everything</span>(), </span>
<span id="cb14-4"><a href="#cb14-4" tabindex="-1"></a> <span class="at">cols_vary =</span> <span class="st">"slowest"</span>,</span>
<span id="cb14-5"><a href="#cb14-5" tabindex="-1"></a> <span class="at">names_to =</span> <span class="fu">c</span>(<span class="st">".value"</span>, <span class="st">"set"</span>), </span>
<span id="cb14-6"><a href="#cb14-6" tabindex="-1"></a> <span class="at">names_pattern =</span> <span class="st">"(.)(.)"</span></span>
<span id="cb14-7"><a href="#cb14-7" tabindex="-1"></a> )</span>
<span id="cb14-8"><a href="#cb14-8" tabindex="-1"></a><span class="co">#> # A tibble: 44 × 3</span></span>
<span id="cb14-9"><a href="#cb14-9" tabindex="-1"></a><span class="co">#> set x y</span></span>
<span id="cb14-10"><a href="#cb14-10" tabindex="-1"></a><span class="co">#> <chr> <dbl> <dbl></span></span>
<span id="cb14-11"><a href="#cb14-11" tabindex="-1"></a><span class="co">#> 1 1 10 8.04</span></span>
<span id="cb14-12"><a href="#cb14-12" tabindex="-1"></a><span class="co">#> 2 1 8 6.95</span></span>
<span id="cb14-13"><a href="#cb14-13" tabindex="-1"></a><span class="co">#> 3 1 13 7.58</span></span>
<span id="cb14-14"><a href="#cb14-14" tabindex="-1"></a><span class="co">#> 4 1 9 8.81</span></span>
<span id="cb14-15"><a href="#cb14-15" tabindex="-1"></a><span class="co">#> 5 1 11 8.33</span></span>
<span id="cb14-16"><a href="#cb14-16" tabindex="-1"></a><span class="co">#> 6 1 14 9.96</span></span>
<span id="cb14-17"><a href="#cb14-17" tabindex="-1"></a><span class="co">#> 7 1 6 7.24</span></span>
<span id="cb14-18"><a href="#cb14-18" tabindex="-1"></a><span class="co">#> 8 1 4 4.26</span></span>
<span id="cb14-19"><a href="#cb14-19" tabindex="-1"></a><span class="co">#> 9 1 12 10.8 </span></span>
<span id="cb14-20"><a href="#cb14-20" tabindex="-1"></a><span class="co">#> 10 1 7 4.82</span></span>
<span id="cb14-21"><a href="#cb14-21" tabindex="-1"></a><span class="co">#> # ℹ 34 more rows</span></span></code></pre></div>
<p>Setting <code>cols_vary</code> to <code>"slowest"</code> groups the
values from columns <code>x1</code> and <code>y1</code> together in the
rows of the output before moving on to <code>x2</code> and
<code>y2</code>. This argument often produces more intuitively ordered
output when you are pivoting every column in your dataset.</p>
<p>A similar situation can arise with panel data. For example, take this
example dataset provided by <a href="https://github.com/gesistsa/rio/issues/193">Thomas Leeper</a>. We
can tidy it using the same approach as for <code>anscombe</code>:</p>
<div class="sourceCode" id="cb15"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb15-1"><a href="#cb15-1" tabindex="-1"></a>pnl <span class="ot"><-</span> <span class="fu">tibble</span>(</span>
<span id="cb15-2"><a href="#cb15-2" tabindex="-1"></a> <span class="at">x =</span> <span class="dv">1</span><span class="sc">:</span><span class="dv">4</span>,</span>
<span id="cb15-3"><a href="#cb15-3" tabindex="-1"></a> <span class="at">a =</span> <span class="fu">c</span>(<span class="dv">1</span>, <span class="dv">1</span>,<span class="dv">0</span>, <span class="dv">0</span>),</span>
<span id="cb15-4"><a href="#cb15-4" tabindex="-1"></a> <span class="at">b =</span> <span class="fu">c</span>(<span class="dv">0</span>, <span class="dv">1</span>, <span class="dv">1</span>, <span class="dv">1</span>),</span>
<span id="cb15-5"><a href="#cb15-5" tabindex="-1"></a> <span class="at">y1 =</span> <span class="fu">rnorm</span>(<span class="dv">4</span>),</span>
<span id="cb15-6"><a href="#cb15-6" tabindex="-1"></a> <span class="at">y2 =</span> <span class="fu">rnorm</span>(<span class="dv">4</span>),</span>
<span id="cb15-7"><a href="#cb15-7" tabindex="-1"></a> <span class="at">z1 =</span> <span class="fu">rep</span>(<span class="dv">3</span>, <span class="dv">4</span>),</span>
<span id="cb15-8"><a href="#cb15-8" tabindex="-1"></a> <span class="at">z2 =</span> <span class="fu">rep</span>(<span class="sc">-</span><span class="dv">2</span>, <span class="dv">4</span>),</span>
<span id="cb15-9"><a href="#cb15-9" tabindex="-1"></a>)</span>
<span id="cb15-10"><a href="#cb15-10" tabindex="-1"></a></span>
<span id="cb15-11"><a href="#cb15-11" tabindex="-1"></a>pnl <span class="sc">%>%</span> </span>
<span id="cb15-12"><a href="#cb15-12" tabindex="-1"></a> <span class="fu">pivot_longer</span>(</span>
<span id="cb15-13"><a href="#cb15-13" tabindex="-1"></a> <span class="at">cols =</span> <span class="sc">!</span><span class="fu">c</span>(x, a, b), </span>
<span id="cb15-14"><a href="#cb15-14" tabindex="-1"></a> <span class="at">names_to =</span> <span class="fu">c</span>(<span class="st">".value"</span>, <span class="st">"time"</span>), </span>
<span id="cb15-15"><a href="#cb15-15" tabindex="-1"></a> <span class="at">names_pattern =</span> <span class="st">"(.)(.)"</span></span>
<span id="cb15-16"><a href="#cb15-16" tabindex="-1"></a> )</span>
<span id="cb15-17"><a href="#cb15-17" tabindex="-1"></a><span class="co">#> # A tibble: 8 × 6</span></span>
<span id="cb15-18"><a href="#cb15-18" tabindex="-1"></a><span class="co">#> x a b time y z</span></span>
<span id="cb15-19"><a href="#cb15-19" tabindex="-1"></a><span class="co">#> <int> <dbl> <dbl> <chr> <dbl> <dbl></span></span>
<span id="cb15-20"><a href="#cb15-20" tabindex="-1"></a><span class="co">#> 1 1 1 0 1 -0.516 3</span></span>
<span id="cb15-21"><a href="#cb15-21" tabindex="-1"></a><span class="co">#> 2 1 1 0 2 2.48 -2</span></span>
<span id="cb15-22"><a href="#cb15-22" tabindex="-1"></a><span class="co">#> 3 2 1 1 1 0.240 3</span></span>
<span id="cb15-23"><a href="#cb15-23" tabindex="-1"></a><span class="co">#> 4 2 1 1 2 0.233 -2</span></span>
<span id="cb15-24"><a href="#cb15-24" tabindex="-1"></a><span class="co">#> 5 3 0 1 1 -1.33 3</span></span>
<span id="cb15-25"><a href="#cb15-25" tabindex="-1"></a><span class="co">#> 6 3 0 1 2 -0.986 -2</span></span>
<span id="cb15-26"><a href="#cb15-26" tabindex="-1"></a><span class="co">#> 7 4 0 1 1 0.401 3</span></span>
<span id="cb15-27"><a href="#cb15-27" tabindex="-1"></a><span class="co">#> 8 4 0 1 2 -0.965 -2</span></span></code></pre></div>
</div>
</div>
<div id="wider" class="section level2">
<h2>Wider</h2>
<p><code>pivot_wider()</code> is the opposite of
<code>pivot_longer()</code>: it makes a dataset <strong>wider</strong>
by increasing the number of columns and decreasing the number of rows.
It’s relatively rare to need <code>pivot_wider()</code> to make tidy
data, but it’s often useful for creating summary tables for
presentation, or data in a format needed by other tools.</p>
<div id="capture-recapture-data" class="section level3">
<h3>Capture-recapture data</h3>
<p>The <code>fish_encounters</code> dataset, contributed by <a href="https://fishsciences.github.io/post/visualizing-fish-encounter-histories/">Myfanwy
Johnston</a>, describes when fish swimming down a river are detected by
automatic monitoring stations:</p>
<div class="sourceCode" id="cb16"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb16-1"><a href="#cb16-1" tabindex="-1"></a>fish_encounters</span>
<span id="cb16-2"><a href="#cb16-2" tabindex="-1"></a><span class="co">#> # A tibble: 114 × 3</span></span>
<span id="cb16-3"><a href="#cb16-3" tabindex="-1"></a><span class="co">#> fish station seen</span></span>
<span id="cb16-4"><a href="#cb16-4" tabindex="-1"></a><span class="co">#> <fct> <fct> <int></span></span>
<span id="cb16-5"><a href="#cb16-5" tabindex="-1"></a><span class="co">#> 1 4842 Release 1</span></span>
<span id="cb16-6"><a href="#cb16-6" tabindex="-1"></a><span class="co">#> 2 4842 I80_1 1</span></span>
<span id="cb16-7"><a href="#cb16-7" tabindex="-1"></a><span class="co">#> 3 4842 Lisbon 1</span></span>
<span id="cb16-8"><a href="#cb16-8" tabindex="-1"></a><span class="co">#> 4 4842 Rstr 1</span></span>
<span id="cb16-9"><a href="#cb16-9" tabindex="-1"></a><span class="co">#> 5 4842 Base_TD 1</span></span>
<span id="cb16-10"><a href="#cb16-10" tabindex="-1"></a><span class="co">#> 6 4842 BCE 1</span></span>
<span id="cb16-11"><a href="#cb16-11" tabindex="-1"></a><span class="co">#> 7 4842 BCW 1</span></span>
<span id="cb16-12"><a href="#cb16-12" tabindex="-1"></a><span class="co">#> 8 4842 BCE2 1</span></span>
<span id="cb16-13"><a href="#cb16-13" tabindex="-1"></a><span class="co">#> 9 4842 BCW2 1</span></span>
<span id="cb16-14"><a href="#cb16-14" tabindex="-1"></a><span class="co">#> 10 4842 MAE 1</span></span>
<span id="cb16-15"><a href="#cb16-15" tabindex="-1"></a><span class="co">#> # ℹ 104 more rows</span></span></code></pre></div>
<p>Many tools used to analyse this data need it in a form where each
station is a column:</p>
<div class="sourceCode" id="cb17"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb17-1"><a href="#cb17-1" tabindex="-1"></a>fish_encounters <span class="sc">%>%</span> </span>
<span id="cb17-2"><a href="#cb17-2" tabindex="-1"></a> <span class="fu">pivot_wider</span>(</span>
<span id="cb17-3"><a href="#cb17-3" tabindex="-1"></a> <span class="at">names_from =</span> station, </span>
<span id="cb17-4"><a href="#cb17-4" tabindex="-1"></a> <span class="at">values_from =</span> seen</span>
<span id="cb17-5"><a href="#cb17-5" tabindex="-1"></a> )</span>
<span id="cb17-6"><a href="#cb17-6" tabindex="-1"></a><span class="co">#> # A tibble: 19 × 12</span></span>
<span id="cb17-7"><a href="#cb17-7" tabindex="-1"></a><span class="co">#> fish Release I80_1 Lisbon Rstr Base_TD BCE BCW BCE2 BCW2 MAE MAW</span></span>
<span id="cb17-8"><a href="#cb17-8" tabindex="-1"></a><span class="co">#> <fct> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int></span></span>
<span id="cb17-9"><a href="#cb17-9" tabindex="-1"></a><span class="co">#> 1 4842 1 1 1 1 1 1 1 1 1 1 1</span></span>
<span id="cb17-10"><a href="#cb17-10" tabindex="-1"></a><span class="co">#> 2 4843 1 1 1 1 1 1 1 1 1 1 1</span></span>
<span id="cb17-11"><a href="#cb17-11" tabindex="-1"></a><span class="co">#> 3 4844 1 1 1 1 1 1 1 1 1 1 1</span></span>
<span id="cb17-12"><a href="#cb17-12" tabindex="-1"></a><span class="co">#> 4 4845 1 1 1 1 1 NA NA NA NA NA NA</span></span>
<span id="cb17-13"><a href="#cb17-13" tabindex="-1"></a><span class="co">#> 5 4847 1 1 1 NA NA NA NA NA NA NA NA</span></span>
<span id="cb17-14"><a href="#cb17-14" tabindex="-1"></a><span class="co">#> 6 4848 1 1 1 1 NA NA NA NA NA NA NA</span></span>
<span id="cb17-15"><a href="#cb17-15" tabindex="-1"></a><span class="co">#> 7 4849 1 1 NA NA NA NA NA NA NA NA NA</span></span>
<span id="cb17-16"><a href="#cb17-16" tabindex="-1"></a><span class="co">#> 8 4850 1 1 NA 1 1 1 1 NA NA NA NA</span></span>
<span id="cb17-17"><a href="#cb17-17" tabindex="-1"></a><span class="co">#> 9 4851 1 1 NA NA NA NA NA NA NA NA NA</span></span>
<span id="cb17-18"><a href="#cb17-18" tabindex="-1"></a><span class="co">#> 10 4854 1 1 NA NA NA NA NA NA NA NA NA</span></span>
<span id="cb17-19"><a href="#cb17-19" tabindex="-1"></a><span class="co">#> # ℹ 9 more rows</span></span></code></pre></div>
<p>This dataset only records when a fish was detected by the station -
it doesn’t record when it wasn’t detected (this is common with this type
of data). That means the output data is filled with <code>NA</code>s.
However, in this case we know that the absence of a record means that
the fish was not <code>seen</code>, so we can ask
<code>pivot_wider()</code> to fill these missing values in with
zeros:</p>
<div class="sourceCode" id="cb18"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb18-1"><a href="#cb18-1" tabindex="-1"></a>fish_encounters <span class="sc">%>%</span> </span>
<span id="cb18-2"><a href="#cb18-2" tabindex="-1"></a> <span class="fu">pivot_wider</span>(</span>
<span id="cb18-3"><a href="#cb18-3" tabindex="-1"></a> <span class="at">names_from =</span> station, </span>
<span id="cb18-4"><a href="#cb18-4" tabindex="-1"></a> <span class="at">values_from =</span> seen,</span>
<span id="cb18-5"><a href="#cb18-5" tabindex="-1"></a> <span class="at">values_fill =</span> <span class="dv">0</span></span>
<span id="cb18-6"><a href="#cb18-6" tabindex="-1"></a> )</span>
<span id="cb18-7"><a href="#cb18-7" tabindex="-1"></a><span class="co">#> # A tibble: 19 × 12</span></span>
<span id="cb18-8"><a href="#cb18-8" tabindex="-1"></a><span class="co">#> fish Release I80_1 Lisbon Rstr Base_TD BCE BCW BCE2 BCW2 MAE MAW</span></span>
<span id="cb18-9"><a href="#cb18-9" tabindex="-1"></a><span class="co">#> <fct> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int></span></span>
<span id="cb18-10"><a href="#cb18-10" tabindex="-1"></a><span class="co">#> 1 4842 1 1 1 1 1 1 1 1 1 1 1</span></span>
<span id="cb18-11"><a href="#cb18-11" tabindex="-1"></a><span class="co">#> 2 4843 1 1 1 1 1 1 1 1 1 1 1</span></span>
<span id="cb18-12"><a href="#cb18-12" tabindex="-1"></a><span class="co">#> 3 4844 1 1 1 1 1 1 1 1 1 1 1</span></span>
<span id="cb18-13"><a href="#cb18-13" tabindex="-1"></a><span class="co">#> 4 4845 1 1 1 1 1 0 0 0 0 0 0</span></span>
<span id="cb18-14"><a href="#cb18-14" tabindex="-1"></a><span class="co">#> 5 4847 1 1 1 0 0 0 0 0 0 0 0</span></span>
<span id="cb18-15"><a href="#cb18-15" tabindex="-1"></a><span class="co">#> 6 4848 1 1 1 1 0 0 0 0 0 0 0</span></span>
<span id="cb18-16"><a href="#cb18-16" tabindex="-1"></a><span class="co">#> 7 4849 1 1 0 0 0 0 0 0 0 0 0</span></span>
<span id="cb18-17"><a href="#cb18-17" tabindex="-1"></a><span class="co">#> 8 4850 1 1 0 1 1 1 1 0 0 0 0</span></span>
<span id="cb18-18"><a href="#cb18-18" tabindex="-1"></a><span class="co">#> 9 4851 1 1 0 0 0 0 0 0 0 0 0</span></span>
<span id="cb18-19"><a href="#cb18-19" tabindex="-1"></a><span class="co">#> 10 4854 1 1 0 0 0 0 0 0 0 0 0</span></span>
<span id="cb18-20"><a href="#cb18-20" tabindex="-1"></a><span class="co">#> # ℹ 9 more rows</span></span></code></pre></div>
</div>
<div id="aggregation" class="section level3">
<h3>Aggregation</h3>
<p>You can also use <code>pivot_wider()</code> to perform simple
aggregation. For example, take the <code>warpbreaks</code> dataset built
in to base R (converted to a tibble for the better print method):</p>
<div class="sourceCode" id="cb19"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb19-1"><a href="#cb19-1" tabindex="-1"></a>warpbreaks <span class="ot"><-</span> warpbreaks <span class="sc">%>%</span> </span>
<span id="cb19-2"><a href="#cb19-2" tabindex="-1"></a> <span class="fu">as_tibble</span>() <span class="sc">%>%</span> </span>
<span id="cb19-3"><a href="#cb19-3" tabindex="-1"></a> <span class="fu">select</span>(wool, tension, breaks)</span>
<span id="cb19-4"><a href="#cb19-4" tabindex="-1"></a>warpbreaks</span>
<span id="cb19-5"><a href="#cb19-5" tabindex="-1"></a><span class="co">#> # A tibble: 54 × 3</span></span>
<span id="cb19-6"><a href="#cb19-6" tabindex="-1"></a><span class="co">#> wool tension breaks</span></span>
<span id="cb19-7"><a href="#cb19-7" tabindex="-1"></a><span class="co">#> <fct> <fct> <dbl></span></span>
<span id="cb19-8"><a href="#cb19-8" tabindex="-1"></a><span class="co">#> 1 A L 26</span></span>
<span id="cb19-9"><a href="#cb19-9" tabindex="-1"></a><span class="co">#> 2 A L 30</span></span>
<span id="cb19-10"><a href="#cb19-10" tabindex="-1"></a><span class="co">#> 3 A L 54</span></span>
<span id="cb19-11"><a href="#cb19-11" tabindex="-1"></a><span class="co">#> 4 A L 25</span></span>
<span id="cb19-12"><a href="#cb19-12" tabindex="-1"></a><span class="co">#> 5 A L 70</span></span>
<span id="cb19-13"><a href="#cb19-13" tabindex="-1"></a><span class="co">#> 6 A L 52</span></span>
<span id="cb19-14"><a href="#cb19-14" tabindex="-1"></a><span class="co">#> 7 A L 51</span></span>
<span id="cb19-15"><a href="#cb19-15" tabindex="-1"></a><span class="co">#> 8 A L 26</span></span>
<span id="cb19-16"><a href="#cb19-16" tabindex="-1"></a><span class="co">#> 9 A L 67</span></span>
<span id="cb19-17"><a href="#cb19-17" tabindex="-1"></a><span class="co">#> 10 A M 18</span></span>
<span id="cb19-18"><a href="#cb19-18" tabindex="-1"></a><span class="co">#> # ℹ 44 more rows</span></span></code></pre></div>
<p>This is a designed experiment with nine replicates for every
combination of <code>wool</code> (<code>A</code> and <code>B</code>) and
<code>tension</code> (<code>L</code>, <code>M</code>,
<code>H</code>):</p>
<div class="sourceCode" id="cb20"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb20-1"><a href="#cb20-1" tabindex="-1"></a>warpbreaks <span class="sc">%>%</span> </span>
<span id="cb20-2"><a href="#cb20-2" tabindex="-1"></a> <span class="fu">count</span>(wool, tension)</span>
<span id="cb20-3"><a href="#cb20-3" tabindex="-1"></a><span class="co">#> # A tibble: 6 × 3</span></span>
<span id="cb20-4"><a href="#cb20-4" tabindex="-1"></a><span class="co">#> wool tension n</span></span>
<span id="cb20-5"><a href="#cb20-5" tabindex="-1"></a><span class="co">#> <fct> <fct> <int></span></span>
<span id="cb20-6"><a href="#cb20-6" tabindex="-1"></a><span class="co">#> 1 A L 9</span></span>
<span id="cb20-7"><a href="#cb20-7" tabindex="-1"></a><span class="co">#> 2 A M 9</span></span>
<span id="cb20-8"><a href="#cb20-8" tabindex="-1"></a><span class="co">#> 3 A H 9</span></span>
<span id="cb20-9"><a href="#cb20-9" tabindex="-1"></a><span class="co">#> 4 B L 9</span></span>
<span id="cb20-10"><a href="#cb20-10" tabindex="-1"></a><span class="co">#> 5 B M 9</span></span>
<span id="cb20-11"><a href="#cb20-11" tabindex="-1"></a><span class="co">#> 6 B H 9</span></span></code></pre></div>
<p>What happens if we attempt to pivot the levels of <code>wool</code>
into the columns?</p>
<div class="sourceCode" id="cb21"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb21-1"><a href="#cb21-1" tabindex="-1"></a>warpbreaks <span class="sc">%>%</span> </span>
<span id="cb21-2"><a href="#cb21-2" tabindex="-1"></a> <span class="fu">pivot_wider</span>(</span>
<span id="cb21-3"><a href="#cb21-3" tabindex="-1"></a> <span class="at">names_from =</span> wool, </span>
<span id="cb21-4"><a href="#cb21-4" tabindex="-1"></a> <span class="at">values_from =</span> breaks</span>
<span id="cb21-5"><a href="#cb21-5" tabindex="-1"></a> )</span>
<span id="cb21-6"><a href="#cb21-6" tabindex="-1"></a><span class="co">#> Warning: Values from `breaks` are not uniquely identified; output will contain</span></span>
<span id="cb21-7"><a href="#cb21-7" tabindex="-1"></a><span class="co">#> list-cols.</span></span>
<span id="cb21-8"><a href="#cb21-8" tabindex="-1"></a><span class="co">#> • Use `values_fn = list` to suppress this warning.</span></span>
<span id="cb21-9"><a href="#cb21-9" tabindex="-1"></a><span class="co">#> • Use `values_fn = {summary_fun}` to summarise duplicates.</span></span>
<span id="cb21-10"><a href="#cb21-10" tabindex="-1"></a><span class="co">#> • Use the following dplyr code to identify duplicates.</span></span>
<span id="cb21-11"><a href="#cb21-11" tabindex="-1"></a><span class="co">#> {data} |></span></span>
<span id="cb21-12"><a href="#cb21-12" tabindex="-1"></a><span class="co">#> dplyr::summarise(n = dplyr::n(), .by = c(tension, wool)) |></span></span>
<span id="cb21-13"><a href="#cb21-13" tabindex="-1"></a><span class="co">#> dplyr::filter(n > 1L)</span></span>
<span id="cb21-14"><a href="#cb21-14" tabindex="-1"></a><span class="co">#> # A tibble: 3 × 3</span></span>
<span id="cb21-15"><a href="#cb21-15" tabindex="-1"></a><span class="co">#> tension A B </span></span>
<span id="cb21-16"><a href="#cb21-16" tabindex="-1"></a><span class="co">#> <fct> <list> <list> </span></span>
<span id="cb21-17"><a href="#cb21-17" tabindex="-1"></a><span class="co">#> 1 L <dbl [9]> <dbl [9]></span></span>
<span id="cb21-18"><a href="#cb21-18" tabindex="-1"></a><span class="co">#> 2 M <dbl [9]> <dbl [9]></span></span>
<span id="cb21-19"><a href="#cb21-19" tabindex="-1"></a><span class="co">#> 3 H <dbl [9]> <dbl [9]></span></span></code></pre></div>
<p>We get a warning that each cell in the output corresponds to multiple
cells in the input. The default behaviour produces list-columns, which
contain all the individual values. A more useful output would be summary
statistics, e.g. <code>mean</code> breaks for each combination of wool
and tension:</p>
<div class="sourceCode" id="cb22"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb22-1"><a href="#cb22-1" tabindex="-1"></a>warpbreaks <span class="sc">%>%</span> </span>
<span id="cb22-2"><a href="#cb22-2" tabindex="-1"></a> <span class="fu">pivot_wider</span>(</span>
<span id="cb22-3"><a href="#cb22-3" tabindex="-1"></a> <span class="at">names_from =</span> wool, </span>
<span id="cb22-4"><a href="#cb22-4" tabindex="-1"></a> <span class="at">values_from =</span> breaks,</span>
<span id="cb22-5"><a href="#cb22-5" tabindex="-1"></a> <span class="at">values_fn =</span> mean</span>
<span id="cb22-6"><a href="#cb22-6" tabindex="-1"></a> )</span>
<span id="cb22-7"><a href="#cb22-7" tabindex="-1"></a><span class="co">#> # A tibble: 3 × 3</span></span>
<span id="cb22-8"><a href="#cb22-8" tabindex="-1"></a><span class="co">#> tension A B</span></span>
<span id="cb22-9"><a href="#cb22-9" tabindex="-1"></a><span class="co">#> <fct> <dbl> <dbl></span></span>
<span id="cb22-10"><a href="#cb22-10" tabindex="-1"></a><span class="co">#> 1 L 44.6 28.2</span></span>
<span id="cb22-11"><a href="#cb22-11" tabindex="-1"></a><span class="co">#> 2 M 24 28.8</span></span>
<span id="cb22-12"><a href="#cb22-12" tabindex="-1"></a><span class="co">#> 3 H 24.6 18.8</span></span></code></pre></div>
<p>For more complex summary operations, I recommend summarising before
reshaping, but for simple cases it’s often convenient to summarise
within <code>pivot_wider()</code>.</p>
</div>
<div id="generate-column-name-from-multiple-variables" class="section level3">
<h3>Generate column name from multiple variables</h3>
<p>Imagine, as in <a href="https://stackoverflow.com/questions/24929954" class="uri">https://stackoverflow.com/questions/24929954</a>, that we
have information containing the combination of product, country, and
year. In tidy form it might look like this:</p>
<div class="sourceCode" id="cb23"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb23-1"><a href="#cb23-1" tabindex="-1"></a>production <span class="ot"><-</span> </span>
<span id="cb23-2"><a href="#cb23-2" tabindex="-1"></a> <span class="fu">expand_grid</span>(</span>
<span id="cb23-3"><a href="#cb23-3" tabindex="-1"></a> <span class="at">product =</span> <span class="fu">c</span>(<span class="st">"A"</span>, <span class="st">"B"</span>), </span>
<span id="cb23-4"><a href="#cb23-4" tabindex="-1"></a> <span class="at">country =</span> <span class="fu">c</span>(<span class="st">"AI"</span>, <span class="st">"EI"</span>), </span>
<span id="cb23-5"><a href="#cb23-5" tabindex="-1"></a> <span class="at">year =</span> <span class="dv">2000</span><span class="sc">:</span><span class="dv">2014</span></span>
<span id="cb23-6"><a href="#cb23-6" tabindex="-1"></a> ) <span class="sc">%>%</span></span>
<span id="cb23-7"><a href="#cb23-7" tabindex="-1"></a> <span class="fu">filter</span>((product <span class="sc">==</span> <span class="st">"A"</span> <span class="sc">&</span> country <span class="sc">==</span> <span class="st">"AI"</span>) <span class="sc">|</span> product <span class="sc">==</span> <span class="st">"B"</span>) <span class="sc">%>%</span> </span>
<span id="cb23-8"><a href="#cb23-8" tabindex="-1"></a> <span class="fu">mutate</span>(<span class="at">production =</span> <span class="fu">rnorm</span>(<span class="fu">nrow</span>(.)))</span>
<span id="cb23-9"><a href="#cb23-9" tabindex="-1"></a>production</span>
<span id="cb23-10"><a href="#cb23-10" tabindex="-1"></a><span class="co">#> # A tibble: 45 × 4</span></span>
<span id="cb23-11"><a href="#cb23-11" tabindex="-1"></a><span class="co">#> product country year production</span></span>
<span id="cb23-12"><a href="#cb23-12" tabindex="-1"></a><span class="co">#> <chr> <chr> <int> <dbl></span></span>
<span id="cb23-13"><a href="#cb23-13" tabindex="-1"></a><span class="co">#> 1 A AI 2000 0.722 </span></span>
<span id="cb23-14"><a href="#cb23-14" tabindex="-1"></a><span class="co">#> 2 A AI 2001 2.79 </span></span>
<span id="cb23-15"><a href="#cb23-15" tabindex="-1"></a><span class="co">#> 3 A AI 2002 0.0848</span></span>
<span id="cb23-16"><a href="#cb23-16" tabindex="-1"></a><span class="co">#> 4 A AI 2003 0.351 </span></span>
<span id="cb23-17"><a href="#cb23-17" tabindex="-1"></a><span class="co">#> 5 A AI 2004 1.12 </span></span>
<span id="cb23-18"><a href="#cb23-18" tabindex="-1"></a><span class="co">#> 6 A AI 2005 -2.26 </span></span>
<span id="cb23-19"><a href="#cb23-19" tabindex="-1"></a><span class="co">#> 7 A AI 2006 0.566 </span></span>
<span id="cb23-20"><a href="#cb23-20" tabindex="-1"></a><span class="co">#> 8 A AI 2007 -0.451 </span></span>
<span id="cb23-21"><a href="#cb23-21" tabindex="-1"></a><span class="co">#> 9 A AI 2008 -0.0190</span></span>
<span id="cb23-22"><a href="#cb23-22" tabindex="-1"></a><span class="co">#> 10 A AI 2009 -1.69 </span></span>
<span id="cb23-23"><a href="#cb23-23" tabindex="-1"></a><span class="co">#> # ℹ 35 more rows</span></span></code></pre></div>
<p>We want to widen the data so we have one column for each combination
of <code>product</code> and <code>country</code>. The key is to specify
multiple variables for <code>names_from</code>:</p>
<div class="sourceCode" id="cb24"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb24-1"><a href="#cb24-1" tabindex="-1"></a>production <span class="sc">%>%</span> </span>
<span id="cb24-2"><a href="#cb24-2" tabindex="-1"></a> <span class="fu">pivot_wider</span>(</span>
<span id="cb24-3"><a href="#cb24-3" tabindex="-1"></a> <span class="at">names_from =</span> <span class="fu">c</span>(product, country), </span>
<span id="cb24-4"><a href="#cb24-4" tabindex="-1"></a> <span class="at">values_from =</span> production</span>
<span id="cb24-5"><a href="#cb24-5" tabindex="-1"></a> )</span>
<span id="cb24-6"><a href="#cb24-6" tabindex="-1"></a><span class="co">#> # A tibble: 15 × 4</span></span>
<span id="cb24-7"><a href="#cb24-7" tabindex="-1"></a><span class="co">#> year A_AI B_AI B_EI</span></span>
<span id="cb24-8"><a href="#cb24-8" tabindex="-1"></a><span class="co">#> <int> <dbl> <dbl> <dbl></span></span>
<span id="cb24-9"><a href="#cb24-9" tabindex="-1"></a><span class="co">#> 1 2000 0.722 0.410 -0.270 </span></span>
<span id="cb24-10"><a href="#cb24-10" tabindex="-1"></a><span class="co">#> 2 2001 2.79 -0.402 1.17 </span></span>
<span id="cb24-11"><a href="#cb24-11" tabindex="-1"></a><span class="co">#> 3 2002 0.0848 0.789 -0.399 </span></span>
<span id="cb24-12"><a href="#cb24-12" tabindex="-1"></a><span class="co">#> 4 2003 0.351 0.164 -0.0338</span></span>
<span id="cb24-13"><a href="#cb24-13" tabindex="-1"></a><span class="co">#> 5 2004 1.12 0.344 -1.01 </span></span>
<span id="cb24-14"><a href="#cb24-14" tabindex="-1"></a><span class="co">#> 6 2005 -2.26 -1.70 0.692 </span></span>
<span id="cb24-15"><a href="#cb24-15" tabindex="-1"></a><span class="co">#> 7 2006 0.566 -0.661 -1.05 </span></span>
<span id="cb24-16"><a href="#cb24-16" tabindex="-1"></a><span class="co">#> 8 2007 -0.451 1.38 0.221 </span></span>
<span id="cb24-17"><a href="#cb24-17" tabindex="-1"></a><span class="co">#> 9 2008 -0.0190 0.456 -0.608 </span></span>
<span id="cb24-18"><a href="#cb24-18" tabindex="-1"></a><span class="co">#> 10 2009 -1.69 0.0122 0.771 </span></span>
<span id="cb24-19"><a href="#cb24-19" tabindex="-1"></a><span class="co">#> # ℹ 5 more rows</span></span></code></pre></div>
<p>When either <code>names_from</code> or <code>values_from</code>
select multiple variables, you can control how the column names in the
output constructed with <code>names_sep</code> and
<code>names_prefix</code>, or the workhorse <code>names_glue</code>:</p>
<div class="sourceCode" id="cb25"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb25-1"><a href="#cb25-1" tabindex="-1"></a>production <span class="sc">%>%</span> </span>
<span id="cb25-2"><a href="#cb25-2" tabindex="-1"></a> <span class="fu">pivot_wider</span>(</span>
<span id="cb25-3"><a href="#cb25-3" tabindex="-1"></a> <span class="at">names_from =</span> <span class="fu">c</span>(product, country), </span>
<span id="cb25-4"><a href="#cb25-4" tabindex="-1"></a> <span class="at">values_from =</span> production,</span>
<span id="cb25-5"><a href="#cb25-5" tabindex="-1"></a> <span class="at">names_sep =</span> <span class="st">"."</span>,</span>
<span id="cb25-6"><a href="#cb25-6" tabindex="-1"></a> <span class="at">names_prefix =</span> <span class="st">"prod."</span></span>
<span id="cb25-7"><a href="#cb25-7" tabindex="-1"></a> )</span>
<span id="cb25-8"><a href="#cb25-8" tabindex="-1"></a><span class="co">#> # A tibble: 15 × 4</span></span>
<span id="cb25-9"><a href="#cb25-9" tabindex="-1"></a><span class="co">#> year prod.A.AI prod.B.AI prod.B.EI</span></span>
<span id="cb25-10"><a href="#cb25-10" tabindex="-1"></a><span class="co">#> <int> <dbl> <dbl> <dbl></span></span>
<span id="cb25-11"><a href="#cb25-11" tabindex="-1"></a><span class="co">#> 1 2000 0.722 0.410 -0.270 </span></span>
<span id="cb25-12"><a href="#cb25-12" tabindex="-1"></a><span class="co">#> 2 2001 2.79 -0.402 1.17 </span></span>
<span id="cb25-13"><a href="#cb25-13" tabindex="-1"></a><span class="co">#> 3 2002 0.0848 0.789 -0.399 </span></span>
<span id="cb25-14"><a href="#cb25-14" tabindex="-1"></a><span class="co">#> 4 2003 0.351 0.164 -0.0338</span></span>
<span id="cb25-15"><a href="#cb25-15" tabindex="-1"></a><span class="co">#> 5 2004 1.12 0.344 -1.01 </span></span>
<span id="cb25-16"><a href="#cb25-16" tabindex="-1"></a><span class="co">#> 6 2005 -2.26 -1.70 0.692 </span></span>
<span id="cb25-17"><a href="#cb25-17" tabindex="-1"></a><span class="co">#> 7 2006 0.566 -0.661 -1.05 </span></span>
<span id="cb25-18"><a href="#cb25-18" tabindex="-1"></a><span class="co">#> 8 2007 -0.451 1.38 0.221 </span></span>
<span id="cb25-19"><a href="#cb25-19" tabindex="-1"></a><span class="co">#> 9 2008 -0.0190 0.456 -0.608 </span></span>
<span id="cb25-20"><a href="#cb25-20" tabindex="-1"></a><span class="co">#> 10 2009 -1.69 0.0122 0.771 </span></span>
<span id="cb25-21"><a href="#cb25-21" tabindex="-1"></a><span class="co">#> # ℹ 5 more rows</span></span>
<span id="cb25-22"><a href="#cb25-22" tabindex="-1"></a></span>
<span id="cb25-23"><a href="#cb25-23" tabindex="-1"></a>production <span class="sc">%>%</span> </span>
<span id="cb25-24"><a href="#cb25-24" tabindex="-1"></a> <span class="fu">pivot_wider</span>(</span>
<span id="cb25-25"><a href="#cb25-25" tabindex="-1"></a> <span class="at">names_from =</span> <span class="fu">c</span>(product, country), </span>
<span id="cb25-26"><a href="#cb25-26" tabindex="-1"></a> <span class="at">values_from =</span> production,</span>
<span id="cb25-27"><a href="#cb25-27" tabindex="-1"></a> <span class="at">names_glue =</span> <span class="st">"prod_{product}_{country}"</span></span>
<span id="cb25-28"><a href="#cb25-28" tabindex="-1"></a> )</span>
<span id="cb25-29"><a href="#cb25-29" tabindex="-1"></a><span class="co">#> # A tibble: 15 × 4</span></span>
<span id="cb25-30"><a href="#cb25-30" tabindex="-1"></a><span class="co">#> year prod_A_AI prod_B_AI prod_B_EI</span></span>
<span id="cb25-31"><a href="#cb25-31" tabindex="-1"></a><span class="co">#> <int> <dbl> <dbl> <dbl></span></span>
<span id="cb25-32"><a href="#cb25-32" tabindex="-1"></a><span class="co">#> 1 2000 0.722 0.410 -0.270 </span></span>
<span id="cb25-33"><a href="#cb25-33" tabindex="-1"></a><span class="co">#> 2 2001 2.79 -0.402 1.17 </span></span>
<span id="cb25-34"><a href="#cb25-34" tabindex="-1"></a><span class="co">#> 3 2002 0.0848 0.789 -0.399 </span></span>
<span id="cb25-35"><a href="#cb25-35" tabindex="-1"></a><span class="co">#> 4 2003 0.351 0.164 -0.0338</span></span>
<span id="cb25-36"><a href="#cb25-36" tabindex="-1"></a><span class="co">#> 5 2004 1.12 0.344 -1.01 </span></span>
<span id="cb25-37"><a href="#cb25-37" tabindex="-1"></a><span class="co">#> 6 2005 -2.26 -1.70 0.692 </span></span>
<span id="cb25-38"><a href="#cb25-38" tabindex="-1"></a><span class="co">#> 7 2006 0.566 -0.661 -1.05 </span></span>
<span id="cb25-39"><a href="#cb25-39" tabindex="-1"></a><span class="co">#> 8 2007 -0.451 1.38 0.221 </span></span>
<span id="cb25-40"><a href="#cb25-40" tabindex="-1"></a><span class="co">#> 9 2008 -0.0190 0.456 -0.608 </span></span>
<span id="cb25-41"><a href="#cb25-41" tabindex="-1"></a><span class="co">#> 10 2009 -1.69 0.0122 0.771 </span></span>
<span id="cb25-42"><a href="#cb25-42" tabindex="-1"></a><span class="co">#> # ℹ 5 more rows</span></span></code></pre></div>
</div>
<div id="tidy-census" class="section level3">
<h3>Tidy census</h3>
<p>The <code>us_rent_income</code> dataset contains information about
median income and rent for each state in the US for 2017 (from the
American Community Survey, retrieved with the <a href="https://walker-data.com/tidycensus/">tidycensus</a> package).</p>
<div class="sourceCode" id="cb26"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb26-1"><a href="#cb26-1" tabindex="-1"></a>us_rent_income</span>
<span id="cb26-2"><a href="#cb26-2" tabindex="-1"></a><span class="co">#> # A tibble: 104 × 5</span></span>
<span id="cb26-3"><a href="#cb26-3" tabindex="-1"></a><span class="co">#> GEOID NAME variable estimate moe</span></span>
<span id="cb26-4"><a href="#cb26-4" tabindex="-1"></a><span class="co">#> <chr> <chr> <chr> <dbl> <dbl></span></span>
<span id="cb26-5"><a href="#cb26-5" tabindex="-1"></a><span class="co">#> 1 01 Alabama income 24476 136</span></span>
<span id="cb26-6"><a href="#cb26-6" tabindex="-1"></a><span class="co">#> 2 01 Alabama rent 747 3</span></span>
<span id="cb26-7"><a href="#cb26-7" tabindex="-1"></a><span class="co">#> 3 02 Alaska income 32940 508</span></span>
<span id="cb26-8"><a href="#cb26-8" tabindex="-1"></a><span class="co">#> 4 02 Alaska rent 1200 13</span></span>
<span id="cb26-9"><a href="#cb26-9" tabindex="-1"></a><span class="co">#> 5 04 Arizona income 27517 148</span></span>
<span id="cb26-10"><a href="#cb26-10" tabindex="-1"></a><span class="co">#> 6 04 Arizona rent 972 4</span></span>
<span id="cb26-11"><a href="#cb26-11" tabindex="-1"></a><span class="co">#> 7 05 Arkansas income 23789 165</span></span>
<span id="cb26-12"><a href="#cb26-12" tabindex="-1"></a><span class="co">#> 8 05 Arkansas rent 709 5</span></span>
<span id="cb26-13"><a href="#cb26-13" tabindex="-1"></a><span class="co">#> 9 06 California income 29454 109</span></span>
<span id="cb26-14"><a href="#cb26-14" tabindex="-1"></a><span class="co">#> 10 06 California rent 1358 3</span></span>
<span id="cb26-15"><a href="#cb26-15" tabindex="-1"></a><span class="co">#> # ℹ 94 more rows</span></span></code></pre></div>
<p>Here both <code>estimate</code> and <code>moe</code> are values
columns, so we can supply them to <code>values_from</code>:</p>
<div class="sourceCode" id="cb27"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb27-1"><a href="#cb27-1" tabindex="-1"></a>us_rent_income <span class="sc">%>%</span> </span>
<span id="cb27-2"><a href="#cb27-2" tabindex="-1"></a> <span class="fu">pivot_wider</span>(</span>
<span id="cb27-3"><a href="#cb27-3" tabindex="-1"></a> <span class="at">names_from =</span> variable, </span>
<span id="cb27-4"><a href="#cb27-4" tabindex="-1"></a> <span class="at">values_from =</span> <span class="fu">c</span>(estimate, moe)</span>
<span id="cb27-5"><a href="#cb27-5" tabindex="-1"></a> )</span>
<span id="cb27-6"><a href="#cb27-6" tabindex="-1"></a><span class="co">#> # A tibble: 52 × 6</span></span>
<span id="cb27-7"><a href="#cb27-7" tabindex="-1"></a><span class="co">#> GEOID NAME estimate_income estimate_rent moe_income moe_rent</span></span>
<span id="cb27-8"><a href="#cb27-8" tabindex="-1"></a><span class="co">#> <chr> <chr> <dbl> <dbl> <dbl> <dbl></span></span>
<span id="cb27-9"><a href="#cb27-9" tabindex="-1"></a><span class="co">#> 1 01 Alabama 24476 747 136 3</span></span>
<span id="cb27-10"><a href="#cb27-10" tabindex="-1"></a><span class="co">#> 2 02 Alaska 32940 1200 508 13</span></span>
<span id="cb27-11"><a href="#cb27-11" tabindex="-1"></a><span class="co">#> 3 04 Arizona 27517 972 148 4</span></span>
<span id="cb27-12"><a href="#cb27-12" tabindex="-1"></a><span class="co">#> 4 05 Arkansas 23789 709 165 5</span></span>
<span id="cb27-13"><a href="#cb27-13" tabindex="-1"></a><span class="co">#> 5 06 California 29454 1358 109 3</span></span>
<span id="cb27-14"><a href="#cb27-14" tabindex="-1"></a><span class="co">#> 6 08 Colorado 32401 1125 109 5</span></span>
<span id="cb27-15"><a href="#cb27-15" tabindex="-1"></a><span class="co">#> 7 09 Connecticut 35326 1123 195 5</span></span>
<span id="cb27-16"><a href="#cb27-16" tabindex="-1"></a><span class="co">#> 8 10 Delaware 31560 1076 247 10</span></span>
<span id="cb27-17"><a href="#cb27-17" tabindex="-1"></a><span class="co">#> 9 11 District of Columbia 43198 1424 681 17</span></span>
<span id="cb27-18"><a href="#cb27-18" tabindex="-1"></a><span class="co">#> 10 12 Florida 25952 1077 70 3</span></span>
<span id="cb27-19"><a href="#cb27-19" tabindex="-1"></a><span class="co">#> # ℹ 42 more rows</span></span></code></pre></div>
<p>Note that the name of the variable is automatically appended to the
output columns.</p>
</div>
<div id="implicit-missing-values" class="section level3">
<h3>Implicit missing values</h3>
<p>Occasionally, you’ll come across data where your names variable is
encoded as a factor, but not all of the data will be represented.</p>
<div class="sourceCode" id="cb28"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb28-1"><a href="#cb28-1" tabindex="-1"></a>weekdays <span class="ot"><-</span> <span class="fu">c</span>(<span class="st">"Mon"</span>, <span class="st">"Tue"</span>, <span class="st">"Wed"</span>, <span class="st">"Thu"</span>, <span class="st">"Fri"</span>, <span class="st">"Sat"</span>, <span class="st">"Sun"</span>)</span>
<span id="cb28-2"><a href="#cb28-2" tabindex="-1"></a></span>
<span id="cb28-3"><a href="#cb28-3" tabindex="-1"></a>daily <span class="ot"><-</span> <span class="fu">tibble</span>(</span>
<span id="cb28-4"><a href="#cb28-4" tabindex="-1"></a> <span class="at">day =</span> <span class="fu">factor</span>(<span class="fu">c</span>(<span class="st">"Tue"</span>, <span class="st">"Thu"</span>, <span class="st">"Fri"</span>, <span class="st">"Mon"</span>), <span class="at">levels =</span> weekdays),</span>
<span id="cb28-5"><a href="#cb28-5" tabindex="-1"></a> <span class="at">value =</span> <span class="fu">c</span>(<span class="dv">2</span>, <span class="dv">3</span>, <span class="dv">1</span>, <span class="dv">5</span>)</span>
<span id="cb28-6"><a href="#cb28-6" tabindex="-1"></a>)</span>
<span id="cb28-7"><a href="#cb28-7" tabindex="-1"></a></span>
<span id="cb28-8"><a href="#cb28-8" tabindex="-1"></a>daily</span>
<span id="cb28-9"><a href="#cb28-9" tabindex="-1"></a><span class="co">#> # A tibble: 4 × 2</span></span>
<span id="cb28-10"><a href="#cb28-10" tabindex="-1"></a><span class="co">#> day value</span></span>
<span id="cb28-11"><a href="#cb28-11" tabindex="-1"></a><span class="co">#> <fct> <dbl></span></span>
<span id="cb28-12"><a href="#cb28-12" tabindex="-1"></a><span class="co">#> 1 Tue 2</span></span>
<span id="cb28-13"><a href="#cb28-13" tabindex="-1"></a><span class="co">#> 2 Thu 3</span></span>
<span id="cb28-14"><a href="#cb28-14" tabindex="-1"></a><span class="co">#> 3 Fri 1</span></span>
<span id="cb28-15"><a href="#cb28-15" tabindex="-1"></a><span class="co">#> 4 Mon 5</span></span></code></pre></div>
<p><code>pivot_wider()</code> defaults to generating columns from the
values that are actually represented in the data, but you might want to
include a column for each possible level in case the data changes in the
future.</p>
<div class="sourceCode" id="cb29"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb29-1"><a href="#cb29-1" tabindex="-1"></a>daily <span class="sc">%>%</span></span>
<span id="cb29-2"><a href="#cb29-2" tabindex="-1"></a> <span class="fu">pivot_wider</span>(</span>
<span id="cb29-3"><a href="#cb29-3" tabindex="-1"></a> <span class="at">names_from =</span> day, </span>
<span id="cb29-4"><a href="#cb29-4" tabindex="-1"></a> <span class="at">values_from =</span> value</span>
<span id="cb29-5"><a href="#cb29-5" tabindex="-1"></a> )</span>
<span id="cb29-6"><a href="#cb29-6" tabindex="-1"></a><span class="co">#> # A tibble: 1 × 4</span></span>
<span id="cb29-7"><a href="#cb29-7" tabindex="-1"></a><span class="co">#> Tue Thu Fri Mon</span></span>
<span id="cb29-8"><a href="#cb29-8" tabindex="-1"></a><span class="co">#> <dbl> <dbl> <dbl> <dbl></span></span>
<span id="cb29-9"><a href="#cb29-9" tabindex="-1"></a><span class="co">#> 1 2 3 1 5</span></span></code></pre></div>
<p>The <code>names_expand</code> argument will turn implicit factor
levels into explicit ones, forcing them to be represented in the result.
It also sorts the column names using the level order, which produces
more intuitive results in this case.</p>
<div class="sourceCode" id="cb30"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb30-1"><a href="#cb30-1" tabindex="-1"></a>daily <span class="sc">%>%</span> </span>
<span id="cb30-2"><a href="#cb30-2" tabindex="-1"></a> <span class="fu">pivot_wider</span>(</span>
<span id="cb30-3"><a href="#cb30-3" tabindex="-1"></a> <span class="at">names_from =</span> day, </span>
<span id="cb30-4"><a href="#cb30-4" tabindex="-1"></a> <span class="at">values_from =</span> value, </span>
<span id="cb30-5"><a href="#cb30-5" tabindex="-1"></a> <span class="at">names_expand =</span> <span class="cn">TRUE</span></span>
<span id="cb30-6"><a href="#cb30-6" tabindex="-1"></a> )</span>
<span id="cb30-7"><a href="#cb30-7" tabindex="-1"></a><span class="co">#> # A tibble: 1 × 7</span></span>
<span id="cb30-8"><a href="#cb30-8" tabindex="-1"></a><span class="co">#> Mon Tue Wed Thu Fri Sat Sun</span></span>
<span id="cb30-9"><a href="#cb30-9" tabindex="-1"></a><span class="co">#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl></span></span>
<span id="cb30-10"><a href="#cb30-10" tabindex="-1"></a><span class="co">#> 1 5 2 NA 3 1 NA NA</span></span></code></pre></div>
<p>If multiple <code>names_from</code> columns are provided,
<code>names_expand</code> will generate a Cartesian product of all
possible combinations of the <code>names_from</code> values. Notice that
the following data has omitted some rows where the percentage value
would be <code>0</code>. <code>names_expand</code> allows us to make
those explicit during the pivot.</p>
<div class="sourceCode" id="cb31"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb31-1"><a href="#cb31-1" tabindex="-1"></a>percentages <span class="ot"><-</span> <span class="fu">tibble</span>(</span>
<span id="cb31-2"><a href="#cb31-2" tabindex="-1"></a> <span class="at">year =</span> <span class="fu">c</span>(<span class="dv">2018</span>, <span class="dv">2019</span>, <span class="dv">2020</span>, <span class="dv">2020</span>),</span>
<span id="cb31-3"><a href="#cb31-3" tabindex="-1"></a> <span class="at">type =</span> <span class="fu">factor</span>(<span class="fu">c</span>(<span class="st">"A"</span>, <span class="st">"B"</span>, <span class="st">"A"</span>, <span class="st">"B"</span>), <span class="at">levels =</span> <span class="fu">c</span>(<span class="st">"A"</span>, <span class="st">"B"</span>)),</span>
<span id="cb31-4"><a href="#cb31-4" tabindex="-1"></a> <span class="at">percentage =</span> <span class="fu">c</span>(<span class="dv">100</span>, <span class="dv">100</span>, <span class="dv">40</span>, <span class="dv">60</span>)</span>
<span id="cb31-5"><a href="#cb31-5" tabindex="-1"></a>)</span>
<span id="cb31-6"><a href="#cb31-6" tabindex="-1"></a></span>
<span id="cb31-7"><a href="#cb31-7" tabindex="-1"></a>percentages</span>
<span id="cb31-8"><a href="#cb31-8" tabindex="-1"></a><span class="co">#> # A tibble: 4 × 3</span></span>
<span id="cb31-9"><a href="#cb31-9" tabindex="-1"></a><span class="co">#> year type percentage</span></span>
<span id="cb31-10"><a href="#cb31-10" tabindex="-1"></a><span class="co">#> <dbl> <fct> <dbl></span></span>
<span id="cb31-11"><a href="#cb31-11" tabindex="-1"></a><span class="co">#> 1 2018 A 100</span></span>
<span id="cb31-12"><a href="#cb31-12" tabindex="-1"></a><span class="co">#> 2 2019 B 100</span></span>
<span id="cb31-13"><a href="#cb31-13" tabindex="-1"></a><span class="co">#> 3 2020 A 40</span></span>
<span id="cb31-14"><a href="#cb31-14" tabindex="-1"></a><span class="co">#> 4 2020 B 60</span></span>
<span id="cb31-15"><a href="#cb31-15" tabindex="-1"></a></span>
<span id="cb31-16"><a href="#cb31-16" tabindex="-1"></a>percentages <span class="sc">%>%</span> </span>
<span id="cb31-17"><a href="#cb31-17" tabindex="-1"></a> <span class="fu">pivot_wider</span>(</span>
<span id="cb31-18"><a href="#cb31-18" tabindex="-1"></a> <span class="at">names_from =</span> <span class="fu">c</span>(year, type),</span>
<span id="cb31-19"><a href="#cb31-19" tabindex="-1"></a> <span class="at">values_from =</span> percentage,</span>
<span id="cb31-20"><a href="#cb31-20" tabindex="-1"></a> <span class="at">names_expand =</span> <span class="cn">TRUE</span>,</span>
<span id="cb31-21"><a href="#cb31-21" tabindex="-1"></a> <span class="at">values_fill =</span> <span class="dv">0</span></span>
<span id="cb31-22"><a href="#cb31-22" tabindex="-1"></a> )</span>
<span id="cb31-23"><a href="#cb31-23" tabindex="-1"></a><span class="co">#> # A tibble: 1 × 6</span></span>
<span id="cb31-24"><a href="#cb31-24" tabindex="-1"></a><span class="co">#> `2018_A` `2018_B` `2019_A` `2019_B` `2020_A` `2020_B`</span></span>
<span id="cb31-25"><a href="#cb31-25" tabindex="-1"></a><span class="co">#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl></span></span>
<span id="cb31-26"><a href="#cb31-26" tabindex="-1"></a><span class="co">#> 1 100 0 0 100 40 60</span></span></code></pre></div>
<p>A related problem can occur when there are implicit missing factor
levels or combinations in the <code>id_cols</code>. In this case, there
are missing rows (rather than columns) that you’d like to explicitly
represent. For this example, we’ll modify our <code>daily</code> data
with a <code>type</code> column, and pivot on that instead, keeping
<code>day</code> as an id column.</p>
<div class="sourceCode" id="cb32"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb32-1"><a href="#cb32-1" tabindex="-1"></a>daily <span class="ot"><-</span> <span class="fu">mutate</span>(daily, <span class="at">type =</span> <span class="fu">factor</span>(<span class="fu">c</span>(<span class="st">"A"</span>, <span class="st">"B"</span>, <span class="st">"B"</span>, <span class="st">"A"</span>)))</span>
<span id="cb32-2"><a href="#cb32-2" tabindex="-1"></a>daily</span>
<span id="cb32-3"><a href="#cb32-3" tabindex="-1"></a><span class="co">#> # A tibble: 4 × 3</span></span>
<span id="cb32-4"><a href="#cb32-4" tabindex="-1"></a><span class="co">#> day value type </span></span>
<span id="cb32-5"><a href="#cb32-5" tabindex="-1"></a><span class="co">#> <fct> <dbl> <fct></span></span>
<span id="cb32-6"><a href="#cb32-6" tabindex="-1"></a><span class="co">#> 1 Tue 2 A </span></span>
<span id="cb32-7"><a href="#cb32-7" tabindex="-1"></a><span class="co">#> 2 Thu 3 B </span></span>
<span id="cb32-8"><a href="#cb32-8" tabindex="-1"></a><span class="co">#> 3 Fri 1 B </span></span>
<span id="cb32-9"><a href="#cb32-9" tabindex="-1"></a><span class="co">#> 4 Mon 5 A</span></span></code></pre></div>
<p>All of our <code>type</code> levels are represented in the columns,
but we are missing some rows related to the unrepresented
<code>day</code> factor levels.</p>
<div class="sourceCode" id="cb33"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb33-1"><a href="#cb33-1" tabindex="-1"></a>daily <span class="sc">%>%</span></span>
<span id="cb33-2"><a href="#cb33-2" tabindex="-1"></a> <span class="fu">pivot_wider</span>(</span>
<span id="cb33-3"><a href="#cb33-3" tabindex="-1"></a> <span class="at">names_from =</span> type, </span>
<span id="cb33-4"><a href="#cb33-4" tabindex="-1"></a> <span class="at">values_from =</span> value,</span>
<span id="cb33-5"><a href="#cb33-5" tabindex="-1"></a> <span class="at">values_fill =</span> <span class="dv">0</span></span>
<span id="cb33-6"><a href="#cb33-6" tabindex="-1"></a> )</span>
<span id="cb33-7"><a href="#cb33-7" tabindex="-1"></a><span class="co">#> # A tibble: 4 × 3</span></span>
<span id="cb33-8"><a href="#cb33-8" tabindex="-1"></a><span class="co">#> day A B</span></span>
<span id="cb33-9"><a href="#cb33-9" tabindex="-1"></a><span class="co">#> <fct> <dbl> <dbl></span></span>
<span id="cb33-10"><a href="#cb33-10" tabindex="-1"></a><span class="co">#> 1 Tue 2 0</span></span>
<span id="cb33-11"><a href="#cb33-11" tabindex="-1"></a><span class="co">#> 2 Thu 0 3</span></span>
<span id="cb33-12"><a href="#cb33-12" tabindex="-1"></a><span class="co">#> 3 Fri 0 1</span></span>
<span id="cb33-13"><a href="#cb33-13" tabindex="-1"></a><span class="co">#> 4 Mon 5 0</span></span></code></pre></div>
<p>We can use <code>id_expand</code> in the same way that we used
<code>names_expand</code>, which will expand out (and sort) the implicit
missing rows in the <code>id_cols</code>.</p>
<div class="sourceCode" id="cb34"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb34-1"><a href="#cb34-1" tabindex="-1"></a>daily <span class="sc">%>%</span> </span>
<span id="cb34-2"><a href="#cb34-2" tabindex="-1"></a> <span class="fu">pivot_wider</span>(</span>
<span id="cb34-3"><a href="#cb34-3" tabindex="-1"></a> <span class="at">names_from =</span> type, </span>
<span id="cb34-4"><a href="#cb34-4" tabindex="-1"></a> <span class="at">values_from =</span> value,</span>
<span id="cb34-5"><a href="#cb34-5" tabindex="-1"></a> <span class="at">values_fill =</span> <span class="dv">0</span>,</span>
<span id="cb34-6"><a href="#cb34-6" tabindex="-1"></a> <span class="at">id_expand =</span> <span class="cn">TRUE</span></span>
<span id="cb34-7"><a href="#cb34-7" tabindex="-1"></a> )</span>
<span id="cb34-8"><a href="#cb34-8" tabindex="-1"></a><span class="co">#> # A tibble: 7 × 3</span></span>
<span id="cb34-9"><a href="#cb34-9" tabindex="-1"></a><span class="co">#> day A B</span></span>
<span id="cb34-10"><a href="#cb34-10" tabindex="-1"></a><span class="co">#> <fct> <dbl> <dbl></span></span>
<span id="cb34-11"><a href="#cb34-11" tabindex="-1"></a><span class="co">#> 1 Mon 5 0</span></span>
<span id="cb34-12"><a href="#cb34-12" tabindex="-1"></a><span class="co">#> 2 Tue 2 0</span></span>
<span id="cb34-13"><a href="#cb34-13" tabindex="-1"></a><span class="co">#> 3 Wed 0 0</span></span>
<span id="cb34-14"><a href="#cb34-14" tabindex="-1"></a><span class="co">#> 4 Thu 0 3</span></span>
<span id="cb34-15"><a href="#cb34-15" tabindex="-1"></a><span class="co">#> 5 Fri 0 1</span></span>
<span id="cb34-16"><a href="#cb34-16" tabindex="-1"></a><span class="co">#> 6 Sat 0 0</span></span>
<span id="cb34-17"><a href="#cb34-17" tabindex="-1"></a><span class="co">#> 7 Sun 0 0</span></span></code></pre></div>
</div>
<div id="unused-columns" class="section level3">
<h3>Unused columns</h3>
<p>Imagine you’ve found yourself in a situation where you have columns
in your data that are completely unrelated to the pivoting process, but
you’d still like to retain their information somehow. For example, in
<code>updates</code> we’d like to pivot on the <code>system</code>
column to create one row summaries of each county’s system updates.</p>
<div class="sourceCode" id="cb35"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb35-1"><a href="#cb35-1" tabindex="-1"></a>updates <span class="ot"><-</span> <span class="fu">tibble</span>(</span>
<span id="cb35-2"><a href="#cb35-2" tabindex="-1"></a> <span class="at">county =</span> <span class="fu">c</span>(<span class="st">"Wake"</span>, <span class="st">"Wake"</span>, <span class="st">"Wake"</span>, <span class="st">"Guilford"</span>, <span class="st">"Guilford"</span>),</span>
<span id="cb35-3"><a href="#cb35-3" tabindex="-1"></a> <span class="at">date =</span> <span class="fu">c</span>(<span class="fu">as.Date</span>(<span class="st">"2020-01-01"</span>) <span class="sc">+</span> <span class="dv">0</span><span class="sc">:</span><span class="dv">2</span>, <span class="fu">as.Date</span>(<span class="st">"2020-01-03"</span>) <span class="sc">+</span> <span class="dv">0</span><span class="sc">:</span><span class="dv">1</span>),</span>
<span id="cb35-4"><a href="#cb35-4" tabindex="-1"></a> <span class="at">system =</span> <span class="fu">c</span>(<span class="st">"A"</span>, <span class="st">"B"</span>, <span class="st">"C"</span>, <span class="st">"A"</span>, <span class="st">"C"</span>),</span>
<span id="cb35-5"><a href="#cb35-5" tabindex="-1"></a> <span class="at">value =</span> <span class="fu">c</span>(<span class="fl">3.2</span>, <span class="dv">4</span>, <span class="fl">5.5</span>, <span class="dv">2</span>, <span class="fl">1.2</span>)</span>
<span id="cb35-6"><a href="#cb35-6" tabindex="-1"></a>)</span>
<span id="cb35-7"><a href="#cb35-7" tabindex="-1"></a></span>
<span id="cb35-8"><a href="#cb35-8" tabindex="-1"></a>updates</span>
<span id="cb35-9"><a href="#cb35-9" tabindex="-1"></a><span class="co">#> # A tibble: 5 × 4</span></span>
<span id="cb35-10"><a href="#cb35-10" tabindex="-1"></a><span class="co">#> county date system value</span></span>
<span id="cb35-11"><a href="#cb35-11" tabindex="-1"></a><span class="co">#> <chr> <date> <chr> <dbl></span></span>
<span id="cb35-12"><a href="#cb35-12" tabindex="-1"></a><span class="co">#> 1 Wake 2020-01-01 A 3.2</span></span>
<span id="cb35-13"><a href="#cb35-13" tabindex="-1"></a><span class="co">#> 2 Wake 2020-01-02 B 4 </span></span>
<span id="cb35-14"><a href="#cb35-14" tabindex="-1"></a><span class="co">#> 3 Wake 2020-01-03 C 5.5</span></span>
<span id="cb35-15"><a href="#cb35-15" tabindex="-1"></a><span class="co">#> 4 Guilford 2020-01-03 A 2 </span></span>
<span id="cb35-16"><a href="#cb35-16" tabindex="-1"></a><span class="co">#> 5 Guilford 2020-01-04 C 1.2</span></span></code></pre></div>
<p>We could do that with a typical <code>pivot_wider()</code> call, but
we completely lose all information about the <code>date</code>
column.</p>
<div class="sourceCode" id="cb36"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb36-1"><a href="#cb36-1" tabindex="-1"></a>updates <span class="sc">%>%</span> </span>
<span id="cb36-2"><a href="#cb36-2" tabindex="-1"></a> <span class="fu">pivot_wider</span>(</span>
<span id="cb36-3"><a href="#cb36-3" tabindex="-1"></a> <span class="at">id_cols =</span> county, </span>
<span id="cb36-4"><a href="#cb36-4" tabindex="-1"></a> <span class="at">names_from =</span> system, </span>
<span id="cb36-5"><a href="#cb36-5" tabindex="-1"></a> <span class="at">values_from =</span> value</span>
<span id="cb36-6"><a href="#cb36-6" tabindex="-1"></a> )</span>
<span id="cb36-7"><a href="#cb36-7" tabindex="-1"></a><span class="co">#> # A tibble: 2 × 4</span></span>
<span id="cb36-8"><a href="#cb36-8" tabindex="-1"></a><span class="co">#> county A B C</span></span>
<span id="cb36-9"><a href="#cb36-9" tabindex="-1"></a><span class="co">#> <chr> <dbl> <dbl> <dbl></span></span>
<span id="cb36-10"><a href="#cb36-10" tabindex="-1"></a><span class="co">#> 1 Wake 3.2 4 5.5</span></span>
<span id="cb36-11"><a href="#cb36-11" tabindex="-1"></a><span class="co">#> 2 Guilford 2 NA 1.2</span></span></code></pre></div>
<p>For this example, we’d like to retain the most recent update date
across all systems in a particular county. To accomplish that we can use
the <code>unused_fn</code> argument, which allows us to summarize values
from the columns not utilized in the pivoting process.</p>
<div class="sourceCode" id="cb37"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb37-1"><a href="#cb37-1" tabindex="-1"></a>updates <span class="sc">%>%</span> </span>
<span id="cb37-2"><a href="#cb37-2" tabindex="-1"></a> <span class="fu">pivot_wider</span>(</span>
<span id="cb37-3"><a href="#cb37-3" tabindex="-1"></a> <span class="at">id_cols =</span> county, </span>
<span id="cb37-4"><a href="#cb37-4" tabindex="-1"></a> <span class="at">names_from =</span> system, </span>
<span id="cb37-5"><a href="#cb37-5" tabindex="-1"></a> <span class="at">values_from =</span> value,</span>
<span id="cb37-6"><a href="#cb37-6" tabindex="-1"></a> <span class="at">unused_fn =</span> <span class="fu">list</span>(<span class="at">date =</span> max)</span>
<span id="cb37-7"><a href="#cb37-7" tabindex="-1"></a> )</span>
<span id="cb37-8"><a href="#cb37-8" tabindex="-1"></a><span class="co">#> # A tibble: 2 × 5</span></span>
<span id="cb37-9"><a href="#cb37-9" tabindex="-1"></a><span class="co">#> county A B C date </span></span>
<span id="cb37-10"><a href="#cb37-10" tabindex="-1"></a><span class="co">#> <chr> <dbl> <dbl> <dbl> <date> </span></span>
<span id="cb37-11"><a href="#cb37-11" tabindex="-1"></a><span class="co">#> 1 Wake 3.2 4 5.5 2020-01-03</span></span>
<span id="cb37-12"><a href="#cb37-12" tabindex="-1"></a><span class="co">#> 2 Guilford 2 NA 1.2 2020-01-04</span></span></code></pre></div>
<p>You can also retain the data but delay the aggregation entirely by
using <code>list()</code> as the summary function.</p>
<div class="sourceCode" id="cb38"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb38-1"><a href="#cb38-1" tabindex="-1"></a>updates <span class="sc">%>%</span> </span>
<span id="cb38-2"><a href="#cb38-2" tabindex="-1"></a> <span class="fu">pivot_wider</span>(</span>
<span id="cb38-3"><a href="#cb38-3" tabindex="-1"></a> <span class="at">id_cols =</span> county, </span>
<span id="cb38-4"><a href="#cb38-4" tabindex="-1"></a> <span class="at">names_from =</span> system, </span>
<span id="cb38-5"><a href="#cb38-5" tabindex="-1"></a> <span class="at">values_from =</span> value,</span>
<span id="cb38-6"><a href="#cb38-6" tabindex="-1"></a> <span class="at">unused_fn =</span> <span class="fu">list</span>(<span class="at">date =</span> list)</span>
<span id="cb38-7"><a href="#cb38-7" tabindex="-1"></a> )</span>
<span id="cb38-8"><a href="#cb38-8" tabindex="-1"></a><span class="co">#> # A tibble: 2 × 5</span></span>
<span id="cb38-9"><a href="#cb38-9" tabindex="-1"></a><span class="co">#> county A B C date </span></span>
<span id="cb38-10"><a href="#cb38-10" tabindex="-1"></a><span class="co">#> <chr> <dbl> <dbl> <dbl> <list> </span></span>
<span id="cb38-11"><a href="#cb38-11" tabindex="-1"></a><span class="co">#> 1 Wake 3.2 4 5.5 <date [3]></span></span>
<span id="cb38-12"><a href="#cb38-12" tabindex="-1"></a><span class="co">#> 2 Guilford 2 NA 1.2 <date [2]></span></span></code></pre></div>
</div>
<div id="contact-list" class="section level3">
<h3>Contact list</h3>
<p>A final challenge is inspired by <a href="https://github.com/jienagu/tidyverse_examples/blob/master/example_long_wide.R">Jiena
Gu</a>. Imagine you have a contact list that you’ve copied and pasted
from a website:</p>
<div class="sourceCode" id="cb39"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb39-1"><a href="#cb39-1" tabindex="-1"></a>contacts <span class="ot"><-</span> <span class="fu">tribble</span>(</span>
<span id="cb39-2"><a href="#cb39-2" tabindex="-1"></a> <span class="sc">~</span>field, <span class="sc">~</span>value,</span>
<span id="cb39-3"><a href="#cb39-3" tabindex="-1"></a> <span class="st">"name"</span>, <span class="st">"Jiena McLellan"</span>,</span>
<span id="cb39-4"><a href="#cb39-4" tabindex="-1"></a> <span class="st">"company"</span>, <span class="st">"Toyota"</span>, </span>
<span id="cb39-5"><a href="#cb39-5" tabindex="-1"></a> <span class="st">"name"</span>, <span class="st">"John Smith"</span>, </span>
<span id="cb39-6"><a href="#cb39-6" tabindex="-1"></a> <span class="st">"company"</span>, <span class="st">"google"</span>, </span>
<span id="cb39-7"><a href="#cb39-7" tabindex="-1"></a> <span class="st">"email"</span>, <span class="st">"john@google.com"</span>,</span>
<span id="cb39-8"><a href="#cb39-8" tabindex="-1"></a> <span class="st">"name"</span>, <span class="st">"Huxley Ratcliffe"</span></span>
<span id="cb39-9"><a href="#cb39-9" tabindex="-1"></a>)</span></code></pre></div>
<p>This is challenging because there’s no variable that identifies which
observations belong together. We can fix this by noting that every
contact starts with a name, so we can create a unique id by counting
every time we see “name” as the <code>field</code>:</p>
<div class="sourceCode" id="cb40"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb40-1"><a href="#cb40-1" tabindex="-1"></a>contacts <span class="ot"><-</span> contacts <span class="sc">%>%</span> </span>
<span id="cb40-2"><a href="#cb40-2" tabindex="-1"></a> <span class="fu">mutate</span>(</span>
<span id="cb40-3"><a href="#cb40-3" tabindex="-1"></a> <span class="at">person_id =</span> <span class="fu">cumsum</span>(field <span class="sc">==</span> <span class="st">"name"</span>)</span>
<span id="cb40-4"><a href="#cb40-4" tabindex="-1"></a> )</span>
<span id="cb40-5"><a href="#cb40-5" tabindex="-1"></a>contacts</span>
<span id="cb40-6"><a href="#cb40-6" tabindex="-1"></a><span class="co">#> # A tibble: 6 × 3</span></span>
<span id="cb40-7"><a href="#cb40-7" tabindex="-1"></a><span class="co">#> field value person_id</span></span>
<span id="cb40-8"><a href="#cb40-8" tabindex="-1"></a><span class="co">#> <chr> <chr> <int></span></span>
<span id="cb40-9"><a href="#cb40-9" tabindex="-1"></a><span class="co">#> 1 name Jiena McLellan 1</span></span>
<span id="cb40-10"><a href="#cb40-10" tabindex="-1"></a><span class="co">#> 2 company Toyota 1</span></span>
<span id="cb40-11"><a href="#cb40-11" tabindex="-1"></a><span class="co">#> 3 name John Smith 2</span></span>
<span id="cb40-12"><a href="#cb40-12" tabindex="-1"></a><span class="co">#> 4 company google 2</span></span>
<span id="cb40-13"><a href="#cb40-13" tabindex="-1"></a><span class="co">#> 5 email john@google.com 2</span></span>
<span id="cb40-14"><a href="#cb40-14" tabindex="-1"></a><span class="co">#> 6 name Huxley Ratcliffe 3</span></span></code></pre></div>
<p>Now that we have a unique identifier for each person, we can pivot
<code>field</code> and <code>value</code> into the columns:</p>
<div class="sourceCode" id="cb41"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb41-1"><a href="#cb41-1" tabindex="-1"></a>contacts <span class="sc">%>%</span> </span>
<span id="cb41-2"><a href="#cb41-2" tabindex="-1"></a> <span class="fu">pivot_wider</span>(</span>
<span id="cb41-3"><a href="#cb41-3" tabindex="-1"></a> <span class="at">names_from =</span> field, </span>
<span id="cb41-4"><a href="#cb41-4" tabindex="-1"></a> <span class="at">values_from =</span> value</span>
<span id="cb41-5"><a href="#cb41-5" tabindex="-1"></a> )</span>
<span id="cb41-6"><a href="#cb41-6" tabindex="-1"></a><span class="co">#> # A tibble: 3 × 4</span></span>
<span id="cb41-7"><a href="#cb41-7" tabindex="-1"></a><span class="co">#> person_id name company email </span></span>
<span id="cb41-8"><a href="#cb41-8" tabindex="-1"></a><span class="co">#> <int> <chr> <chr> <chr> </span></span>
<span id="cb41-9"><a href="#cb41-9" tabindex="-1"></a><span class="co">#> 1 1 Jiena McLellan Toyota <NA> </span></span>
<span id="cb41-10"><a href="#cb41-10" tabindex="-1"></a><span class="co">#> 2 2 John Smith google john@google.com</span></span>
<span id="cb41-11"><a href="#cb41-11" tabindex="-1"></a><span class="co">#> 3 3 Huxley Ratcliffe <NA> <NA></span></span></code></pre></div>
</div>
</div>
<div id="longer-then-wider" class="section level2">
<h2>Longer, then wider</h2>
<p>Some problems can’t be solved by pivoting in a single direction. The
examples in this section show how you might combine
<code>pivot_longer()</code> and <code>pivot_wider()</code> to solve more
complex problems.</p>
<div id="world-bank" class="section level3">
<h3>World bank</h3>
<p><code>world_bank_pop</code> contains data from the World Bank about
population per country from 2000 to 2018.</p>
<div class="sourceCode" id="cb42"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb42-1"><a href="#cb42-1" tabindex="-1"></a>world_bank_pop</span>
<span id="cb42-2"><a href="#cb42-2" tabindex="-1"></a><span class="co">#> # A tibble: 1,064 × 20</span></span>
<span id="cb42-3"><a href="#cb42-3" tabindex="-1"></a><span class="co">#> country indicator `2000` `2001` `2002` `2003` `2004` `2005` `2006`</span></span>
<span id="cb42-4"><a href="#cb42-4" tabindex="-1"></a><span class="co">#> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl></span></span>
<span id="cb42-5"><a href="#cb42-5" tabindex="-1"></a><span class="co">#> 1 ABW SP.URB.TOTL 4.16e4 4.20e+4 4.22e+4 4.23e+4 4.23e+4 4.24e+4 4.26e+4</span></span>
<span id="cb42-6"><a href="#cb42-6" tabindex="-1"></a><span class="co">#> 2 ABW SP.URB.GROW 1.66e0 9.56e-1 4.01e-1 1.97e-1 9.46e-2 1.94e-1 3.67e-1</span></span>
<span id="cb42-7"><a href="#cb42-7" tabindex="-1"></a><span class="co">#> 3 ABW SP.POP.TOTL 8.91e4 9.07e+4 9.18e+4 9.27e+4 9.35e+4 9.45e+4 9.56e+4</span></span>
<span id="cb42-8"><a href="#cb42-8" tabindex="-1"></a><span class="co">#> 4 ABW SP.POP.GROW 2.54e0 1.77e+0 1.19e+0 9.97e-1 9.01e-1 1.00e+0 1.18e+0</span></span>
<span id="cb42-9"><a href="#cb42-9" tabindex="-1"></a><span class="co">#> 5 AFE SP.URB.TOTL 1.16e8 1.20e+8 1.24e+8 1.29e+8 1.34e+8 1.39e+8 1.44e+8</span></span>
<span id="cb42-10"><a href="#cb42-10" tabindex="-1"></a><span class="co">#> 6 AFE SP.URB.GROW 3.60e0 3.66e+0 3.72e+0 3.71e+0 3.74e+0 3.81e+0 3.81e+0</span></span>
<span id="cb42-11"><a href="#cb42-11" tabindex="-1"></a><span class="co">#> 7 AFE SP.POP.TOTL 4.02e8 4.12e+8 4.23e+8 4.34e+8 4.45e+8 4.57e+8 4.70e+8</span></span>
<span id="cb42-12"><a href="#cb42-12" tabindex="-1"></a><span class="co">#> 8 AFE SP.POP.GROW 2.58e0 2.59e+0 2.61e+0 2.62e+0 2.64e+0 2.67e+0 2.70e+0</span></span>
<span id="cb42-13"><a href="#cb42-13" tabindex="-1"></a><span class="co">#> 9 AFG SP.URB.TOTL 4.31e6 4.36e+6 4.67e+6 5.06e+6 5.30e+6 5.54e+6 5.83e+6</span></span>
<span id="cb42-14"><a href="#cb42-14" tabindex="-1"></a><span class="co">#> 10 AFG SP.URB.GROW 1.86e0 1.15e+0 6.86e+0 7.95e+0 4.59e+0 4.47e+0 5.03e+0</span></span>
<span id="cb42-15"><a href="#cb42-15" tabindex="-1"></a><span class="co">#> # ℹ 1,054 more rows</span></span>
<span id="cb42-16"><a href="#cb42-16" tabindex="-1"></a><span class="co">#> # ℹ 11 more variables: `2007` <dbl>, `2008` <dbl>, `2009` <dbl>, `2010` <dbl>,</span></span>
<span id="cb42-17"><a href="#cb42-17" tabindex="-1"></a><span class="co">#> # `2011` <dbl>, `2012` <dbl>, `2013` <dbl>, `2014` <dbl>, `2015` <dbl>,</span></span>
<span id="cb42-18"><a href="#cb42-18" tabindex="-1"></a><span class="co">#> # `2016` <dbl>, `2017` <dbl></span></span></code></pre></div>
<p>My goal is to produce a tidy dataset where each variable is in a
column. It’s not obvious exactly what steps are needed yet, but I’ll
start with the most obvious problem: year is spread across multiple
columns.</p>
<div class="sourceCode" id="cb43"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb43-1"><a href="#cb43-1" tabindex="-1"></a>pop2 <span class="ot"><-</span> world_bank_pop <span class="sc">%>%</span> </span>
<span id="cb43-2"><a href="#cb43-2" tabindex="-1"></a> <span class="fu">pivot_longer</span>(</span>
<span id="cb43-3"><a href="#cb43-3" tabindex="-1"></a> <span class="at">cols =</span> <span class="st">`</span><span class="at">2000</span><span class="st">`</span><span class="sc">:</span><span class="st">`</span><span class="at">2017</span><span class="st">`</span>, </span>
<span id="cb43-4"><a href="#cb43-4" tabindex="-1"></a> <span class="at">names_to =</span> <span class="st">"year"</span>, </span>
<span id="cb43-5"><a href="#cb43-5" tabindex="-1"></a> <span class="at">values_to =</span> <span class="st">"value"</span></span>
<span id="cb43-6"><a href="#cb43-6" tabindex="-1"></a> )</span>
<span id="cb43-7"><a href="#cb43-7" tabindex="-1"></a>pop2</span>
<span id="cb43-8"><a href="#cb43-8" tabindex="-1"></a><span class="co">#> # A tibble: 19,152 × 4</span></span>
<span id="cb43-9"><a href="#cb43-9" tabindex="-1"></a><span class="co">#> country indicator year value</span></span>
<span id="cb43-10"><a href="#cb43-10" tabindex="-1"></a><span class="co">#> <chr> <chr> <chr> <dbl></span></span>
<span id="cb43-11"><a href="#cb43-11" tabindex="-1"></a><span class="co">#> 1 ABW SP.URB.TOTL 2000 41625</span></span>
<span id="cb43-12"><a href="#cb43-12" tabindex="-1"></a><span class="co">#> 2 ABW SP.URB.TOTL 2001 42025</span></span>
<span id="cb43-13"><a href="#cb43-13" tabindex="-1"></a><span class="co">#> 3 ABW SP.URB.TOTL 2002 42194</span></span>
<span id="cb43-14"><a href="#cb43-14" tabindex="-1"></a><span class="co">#> 4 ABW SP.URB.TOTL 2003 42277</span></span>
<span id="cb43-15"><a href="#cb43-15" tabindex="-1"></a><span class="co">#> 5 ABW SP.URB.TOTL 2004 42317</span></span>
<span id="cb43-16"><a href="#cb43-16" tabindex="-1"></a><span class="co">#> 6 ABW SP.URB.TOTL 2005 42399</span></span>
<span id="cb43-17"><a href="#cb43-17" tabindex="-1"></a><span class="co">#> 7 ABW SP.URB.TOTL 2006 42555</span></span>
<span id="cb43-18"><a href="#cb43-18" tabindex="-1"></a><span class="co">#> 8 ABW SP.URB.TOTL 2007 42729</span></span>
<span id="cb43-19"><a href="#cb43-19" tabindex="-1"></a><span class="co">#> 9 ABW SP.URB.TOTL 2008 42906</span></span>
<span id="cb43-20"><a href="#cb43-20" tabindex="-1"></a><span class="co">#> 10 ABW SP.URB.TOTL 2009 43079</span></span>
<span id="cb43-21"><a href="#cb43-21" tabindex="-1"></a><span class="co">#> # ℹ 19,142 more rows</span></span></code></pre></div>
<p>Next we need to consider the <code>indicator</code> variable:</p>
<div class="sourceCode" id="cb44"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb44-1"><a href="#cb44-1" tabindex="-1"></a>pop2 <span class="sc">%>%</span> </span>
<span id="cb44-2"><a href="#cb44-2" tabindex="-1"></a> <span class="fu">count</span>(indicator)</span>
<span id="cb44-3"><a href="#cb44-3" tabindex="-1"></a><span class="co">#> # A tibble: 4 × 2</span></span>
<span id="cb44-4"><a href="#cb44-4" tabindex="-1"></a><span class="co">#> indicator n</span></span>
<span id="cb44-5"><a href="#cb44-5" tabindex="-1"></a><span class="co">#> <chr> <int></span></span>
<span id="cb44-6"><a href="#cb44-6" tabindex="-1"></a><span class="co">#> 1 SP.POP.GROW 4788</span></span>
<span id="cb44-7"><a href="#cb44-7" tabindex="-1"></a><span class="co">#> 2 SP.POP.TOTL 4788</span></span>
<span id="cb44-8"><a href="#cb44-8" tabindex="-1"></a><span class="co">#> 3 SP.URB.GROW 4788</span></span>
<span id="cb44-9"><a href="#cb44-9" tabindex="-1"></a><span class="co">#> 4 SP.URB.TOTL 4788</span></span></code></pre></div>
<p>Here <code>SP.POP.GROW</code> is population growth,
<code>SP.POP.TOTL</code> is total population, and <code>SP.URB.*</code>
are the same but only for urban areas. Let’s split this up into two
variables: <code>area</code> (total or urban) and the actual variable
(population or growth):</p>
<div class="sourceCode" id="cb45"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb45-1"><a href="#cb45-1" tabindex="-1"></a>pop3 <span class="ot"><-</span> pop2 <span class="sc">%>%</span> </span>
<span id="cb45-2"><a href="#cb45-2" tabindex="-1"></a> <span class="fu">separate</span>(indicator, <span class="fu">c</span>(<span class="cn">NA</span>, <span class="st">"area"</span>, <span class="st">"variable"</span>))</span>
<span id="cb45-3"><a href="#cb45-3" tabindex="-1"></a>pop3</span>
<span id="cb45-4"><a href="#cb45-4" tabindex="-1"></a><span class="co">#> # A tibble: 19,152 × 5</span></span>
<span id="cb45-5"><a href="#cb45-5" tabindex="-1"></a><span class="co">#> country area variable year value</span></span>
<span id="cb45-6"><a href="#cb45-6" tabindex="-1"></a><span class="co">#> <chr> <chr> <chr> <chr> <dbl></span></span>
<span id="cb45-7"><a href="#cb45-7" tabindex="-1"></a><span class="co">#> 1 ABW URB TOTL 2000 41625</span></span>
<span id="cb45-8"><a href="#cb45-8" tabindex="-1"></a><span class="co">#> 2 ABW URB TOTL 2001 42025</span></span>
<span id="cb45-9"><a href="#cb45-9" tabindex="-1"></a><span class="co">#> 3 ABW URB TOTL 2002 42194</span></span>
<span id="cb45-10"><a href="#cb45-10" tabindex="-1"></a><span class="co">#> 4 ABW URB TOTL 2003 42277</span></span>
<span id="cb45-11"><a href="#cb45-11" tabindex="-1"></a><span class="co">#> 5 ABW URB TOTL 2004 42317</span></span>
<span id="cb45-12"><a href="#cb45-12" tabindex="-1"></a><span class="co">#> 6 ABW URB TOTL 2005 42399</span></span>
<span id="cb45-13"><a href="#cb45-13" tabindex="-1"></a><span class="co">#> 7 ABW URB TOTL 2006 42555</span></span>
<span id="cb45-14"><a href="#cb45-14" tabindex="-1"></a><span class="co">#> 8 ABW URB TOTL 2007 42729</span></span>
<span id="cb45-15"><a href="#cb45-15" tabindex="-1"></a><span class="co">#> 9 ABW URB TOTL 2008 42906</span></span>
<span id="cb45-16"><a href="#cb45-16" tabindex="-1"></a><span class="co">#> 10 ABW URB TOTL 2009 43079</span></span>
<span id="cb45-17"><a href="#cb45-17" tabindex="-1"></a><span class="co">#> # ℹ 19,142 more rows</span></span></code></pre></div>
<p>Now we can complete the tidying by pivoting <code>variable</code> and
<code>value</code> to make <code>TOTL</code> and <code>GROW</code>
columns:</p>
<div class="sourceCode" id="cb46"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb46-1"><a href="#cb46-1" tabindex="-1"></a>pop3 <span class="sc">%>%</span> </span>
<span id="cb46-2"><a href="#cb46-2" tabindex="-1"></a> <span class="fu">pivot_wider</span>(</span>
<span id="cb46-3"><a href="#cb46-3" tabindex="-1"></a> <span class="at">names_from =</span> variable, </span>
<span id="cb46-4"><a href="#cb46-4" tabindex="-1"></a> <span class="at">values_from =</span> value</span>
<span id="cb46-5"><a href="#cb46-5" tabindex="-1"></a> )</span>
<span id="cb46-6"><a href="#cb46-6" tabindex="-1"></a><span class="co">#> # A tibble: 9,576 × 5</span></span>
<span id="cb46-7"><a href="#cb46-7" tabindex="-1"></a><span class="co">#> country area year TOTL GROW</span></span>
<span id="cb46-8"><a href="#cb46-8" tabindex="-1"></a><span class="co">#> <chr> <chr> <chr> <dbl> <dbl></span></span>
<span id="cb46-9"><a href="#cb46-9" tabindex="-1"></a><span class="co">#> 1 ABW URB 2000 41625 1.66 </span></span>
<span id="cb46-10"><a href="#cb46-10" tabindex="-1"></a><span class="co">#> 2 ABW URB 2001 42025 0.956 </span></span>
<span id="cb46-11"><a href="#cb46-11" tabindex="-1"></a><span class="co">#> 3 ABW URB 2002 42194 0.401 </span></span>
<span id="cb46-12"><a href="#cb46-12" tabindex="-1"></a><span class="co">#> 4 ABW URB 2003 42277 0.197 </span></span>
<span id="cb46-13"><a href="#cb46-13" tabindex="-1"></a><span class="co">#> 5 ABW URB 2004 42317 0.0946</span></span>
<span id="cb46-14"><a href="#cb46-14" tabindex="-1"></a><span class="co">#> 6 ABW URB 2005 42399 0.194 </span></span>
<span id="cb46-15"><a href="#cb46-15" tabindex="-1"></a><span class="co">#> 7 ABW URB 2006 42555 0.367 </span></span>
<span id="cb46-16"><a href="#cb46-16" tabindex="-1"></a><span class="co">#> 8 ABW URB 2007 42729 0.408 </span></span>
<span id="cb46-17"><a href="#cb46-17" tabindex="-1"></a><span class="co">#> 9 ABW URB 2008 42906 0.413 </span></span>
<span id="cb46-18"><a href="#cb46-18" tabindex="-1"></a><span class="co">#> 10 ABW URB 2009 43079 0.402 </span></span>
<span id="cb46-19"><a href="#cb46-19" tabindex="-1"></a><span class="co">#> # ℹ 9,566 more rows</span></span></code></pre></div>
</div>
<div id="multi-choice" class="section level3">
<h3>Multi-choice</h3>
<p>Based on a suggestion by <a href="https://github.com/MaximeWack">Maxime Wack</a>, <a href="https://github.com/tidyverse/tidyr/issues/384" class="uri">https://github.com/tidyverse/tidyr/issues/384</a>), the
final example shows how to deal with a common way of recording multiple
choice data. Often you will get such data as follows:</p>
<div class="sourceCode" id="cb47"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb47-1"><a href="#cb47-1" tabindex="-1"></a>multi <span class="ot"><-</span> <span class="fu">tribble</span>(</span>
<span id="cb47-2"><a href="#cb47-2" tabindex="-1"></a> <span class="sc">~</span>id, <span class="sc">~</span>choice1, <span class="sc">~</span>choice2, <span class="sc">~</span>choice3,</span>
<span id="cb47-3"><a href="#cb47-3" tabindex="-1"></a> <span class="dv">1</span>, <span class="st">"A"</span>, <span class="st">"B"</span>, <span class="st">"C"</span>,</span>
<span id="cb47-4"><a href="#cb47-4" tabindex="-1"></a> <span class="dv">2</span>, <span class="st">"C"</span>, <span class="st">"B"</span>, <span class="cn">NA</span>,</span>
<span id="cb47-5"><a href="#cb47-5" tabindex="-1"></a> <span class="dv">3</span>, <span class="st">"D"</span>, <span class="cn">NA</span>, <span class="cn">NA</span>,</span>
<span id="cb47-6"><a href="#cb47-6" tabindex="-1"></a> <span class="dv">4</span>, <span class="st">"B"</span>, <span class="st">"D"</span>, <span class="cn">NA</span></span>
<span id="cb47-7"><a href="#cb47-7" tabindex="-1"></a>)</span></code></pre></div>
<p>But the actual order isn’t important, and you’d prefer to have the
individual questions in the columns. You can achieve the desired
transformation in two steps. First, you make the data longer,
eliminating the explicit <code>NA</code>s, and adding a column to
indicate that this choice was chosen:</p>
<div class="sourceCode" id="cb48"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb48-1"><a href="#cb48-1" tabindex="-1"></a>multi2 <span class="ot"><-</span> multi <span class="sc">%>%</span> </span>
<span id="cb48-2"><a href="#cb48-2" tabindex="-1"></a> <span class="fu">pivot_longer</span>(</span>
<span id="cb48-3"><a href="#cb48-3" tabindex="-1"></a> <span class="at">cols =</span> <span class="sc">!</span>id, </span>
<span id="cb48-4"><a href="#cb48-4" tabindex="-1"></a> <span class="at">values_drop_na =</span> <span class="cn">TRUE</span></span>
<span id="cb48-5"><a href="#cb48-5" tabindex="-1"></a> ) <span class="sc">%>%</span> </span>
<span id="cb48-6"><a href="#cb48-6" tabindex="-1"></a> <span class="fu">mutate</span>(<span class="at">checked =</span> <span class="cn">TRUE</span>)</span>
<span id="cb48-7"><a href="#cb48-7" tabindex="-1"></a>multi2</span>
<span id="cb48-8"><a href="#cb48-8" tabindex="-1"></a><span class="co">#> # A tibble: 8 × 4</span></span>
<span id="cb48-9"><a href="#cb48-9" tabindex="-1"></a><span class="co">#> id name value checked</span></span>
<span id="cb48-10"><a href="#cb48-10" tabindex="-1"></a><span class="co">#> <dbl> <chr> <chr> <lgl> </span></span>
<span id="cb48-11"><a href="#cb48-11" tabindex="-1"></a><span class="co">#> 1 1 choice1 A TRUE </span></span>
<span id="cb48-12"><a href="#cb48-12" tabindex="-1"></a><span class="co">#> 2 1 choice2 B TRUE </span></span>
<span id="cb48-13"><a href="#cb48-13" tabindex="-1"></a><span class="co">#> 3 1 choice3 C TRUE </span></span>
<span id="cb48-14"><a href="#cb48-14" tabindex="-1"></a><span class="co">#> 4 2 choice1 C TRUE </span></span>
<span id="cb48-15"><a href="#cb48-15" tabindex="-1"></a><span class="co">#> 5 2 choice2 B TRUE </span></span>
<span id="cb48-16"><a href="#cb48-16" tabindex="-1"></a><span class="co">#> 6 3 choice1 D TRUE </span></span>
<span id="cb48-17"><a href="#cb48-17" tabindex="-1"></a><span class="co">#> 7 4 choice1 B TRUE </span></span>
<span id="cb48-18"><a href="#cb48-18" tabindex="-1"></a><span class="co">#> 8 4 choice2 D TRUE</span></span></code></pre></div>
<p>Then you make the data wider, filling in the missing observations
with <code>FALSE</code>:</p>
<div class="sourceCode" id="cb49"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb49-1"><a href="#cb49-1" tabindex="-1"></a>multi2 <span class="sc">%>%</span> </span>
<span id="cb49-2"><a href="#cb49-2" tabindex="-1"></a> <span class="fu">pivot_wider</span>(</span>
<span id="cb49-3"><a href="#cb49-3" tabindex="-1"></a> <span class="at">id_cols =</span> id,</span>
<span id="cb49-4"><a href="#cb49-4" tabindex="-1"></a> <span class="at">names_from =</span> value, </span>
<span id="cb49-5"><a href="#cb49-5" tabindex="-1"></a> <span class="at">values_from =</span> checked, </span>
<span id="cb49-6"><a href="#cb49-6" tabindex="-1"></a> <span class="at">values_fill =</span> <span class="cn">FALSE</span></span>
<span id="cb49-7"><a href="#cb49-7" tabindex="-1"></a> )</span>
<span id="cb49-8"><a href="#cb49-8" tabindex="-1"></a><span class="co">#> # A tibble: 4 × 5</span></span>
<span id="cb49-9"><a href="#cb49-9" tabindex="-1"></a><span class="co">#> id A B C D </span></span>
<span id="cb49-10"><a href="#cb49-10" tabindex="-1"></a><span class="co">#> <dbl> <lgl> <lgl> <lgl> <lgl></span></span>
<span id="cb49-11"><a href="#cb49-11" tabindex="-1"></a><span class="co">#> 1 1 TRUE TRUE TRUE FALSE</span></span>
<span id="cb49-12"><a href="#cb49-12" tabindex="-1"></a><span class="co">#> 2 2 FALSE TRUE TRUE FALSE</span></span>
<span id="cb49-13"><a href="#cb49-13" tabindex="-1"></a><span class="co">#> 3 3 FALSE FALSE FALSE TRUE </span></span>
<span id="cb49-14"><a href="#cb49-14" tabindex="-1"></a><span class="co">#> 4 4 FALSE TRUE FALSE TRUE</span></span></code></pre></div>
</div>
</div>
<div id="manual-specs" class="section level2">
<h2>Manual specs</h2>
<p>The arguments to <code>pivot_longer()</code> and
<code>pivot_wider()</code> allow you to pivot a wide range of datasets.
But the creativity that people apply to their data structures is
seemingly endless, so it’s quite possible that you will encounter a
dataset that you can’t immediately see how to reshape with
<code>pivot_longer()</code> and <code>pivot_wider()</code>. To gain more
control over pivoting, you can instead create a “spec” data frame that
describes exactly how data stored in the column names becomes variables
(and vice versa). This section introduces you to the spec data
structure, and show you how to use it when <code>pivot_longer()</code>
and <code>pivot_wider()</code> are insufficient.</p>
<div id="longer-1" class="section level3">
<h3>Longer</h3>
<p>To see how this works, lets return to the simplest case of pivoting
applied to the <code>relig_income</code> dataset. Now pivoting happens
in two steps: we first create a spec object (using
<code>build_longer_spec()</code>) then use that to describe the pivoting
operation:</p>
<div class="sourceCode" id="cb50"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb50-1"><a href="#cb50-1" tabindex="-1"></a>spec <span class="ot"><-</span> relig_income <span class="sc">%>%</span> </span>
<span id="cb50-2"><a href="#cb50-2" tabindex="-1"></a> <span class="fu">build_longer_spec</span>(</span>
<span id="cb50-3"><a href="#cb50-3" tabindex="-1"></a> <span class="at">cols =</span> <span class="sc">!</span>religion, </span>
<span id="cb50-4"><a href="#cb50-4" tabindex="-1"></a> <span class="at">names_to =</span> <span class="st">"income"</span>,</span>
<span id="cb50-5"><a href="#cb50-5" tabindex="-1"></a> <span class="at">values_to =</span> <span class="st">"count"</span></span>
<span id="cb50-6"><a href="#cb50-6" tabindex="-1"></a> )</span>
<span id="cb50-7"><a href="#cb50-7" tabindex="-1"></a><span class="fu">pivot_longer_spec</span>(relig_income, spec)</span>
<span id="cb50-8"><a href="#cb50-8" tabindex="-1"></a><span class="co">#> # A tibble: 180 × 3</span></span>
<span id="cb50-9"><a href="#cb50-9" tabindex="-1"></a><span class="co">#> religion income count</span></span>
<span id="cb50-10"><a href="#cb50-10" tabindex="-1"></a><span class="co">#> <chr> <chr> <dbl></span></span>
<span id="cb50-11"><a href="#cb50-11" tabindex="-1"></a><span class="co">#> 1 Agnostic <$10k 27</span></span>
<span id="cb50-12"><a href="#cb50-12" tabindex="-1"></a><span class="co">#> 2 Agnostic $10-20k 34</span></span>
<span id="cb50-13"><a href="#cb50-13" tabindex="-1"></a><span class="co">#> 3 Agnostic $20-30k 60</span></span>
<span id="cb50-14"><a href="#cb50-14" tabindex="-1"></a><span class="co">#> 4 Agnostic $30-40k 81</span></span>
<span id="cb50-15"><a href="#cb50-15" tabindex="-1"></a><span class="co">#> 5 Agnostic $40-50k 76</span></span>
<span id="cb50-16"><a href="#cb50-16" tabindex="-1"></a><span class="co">#> 6 Agnostic $50-75k 137</span></span>
<span id="cb50-17"><a href="#cb50-17" tabindex="-1"></a><span class="co">#> 7 Agnostic $75-100k 122</span></span>
<span id="cb50-18"><a href="#cb50-18" tabindex="-1"></a><span class="co">#> 8 Agnostic $100-150k 109</span></span>
<span id="cb50-19"><a href="#cb50-19" tabindex="-1"></a><span class="co">#> 9 Agnostic >150k 84</span></span>
<span id="cb50-20"><a href="#cb50-20" tabindex="-1"></a><span class="co">#> 10 Agnostic Don't know/refused 96</span></span>
<span id="cb50-21"><a href="#cb50-21" tabindex="-1"></a><span class="co">#> # ℹ 170 more rows</span></span></code></pre></div>
<p>(This gives the same result as before, just with more code. There’s
no need to use it here, it is presented as a simple example for using
<code>spec</code>.)</p>
<p>What does <code>spec</code> look like? It’s a data frame with one row
for each column in the wide format version of the data that is not
present in the long format, and two special columns that start with
<code>.</code>:</p>
<ul>
<li><code>.name</code> gives the name of the column.</li>
<li><code>.value</code> gives the name of the column that the values in
the cells will go into.</li>
</ul>
<p>There is also one column in <code>spec</code> for each column present
in the long format of the data that is not present in the wide format of
the data. This corresponds to the <code>names_to</code> argument in
<code>pivot_longer()</code> and <code>build_longer_spec()</code> and the
<code>names_from</code> argument in <code>pivot_wider()</code> and
<code>build_wider_spec()</code>. In this example, the income column is a
character vector of the names of columns being pivoted.</p>
<div class="sourceCode" id="cb51"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb51-1"><a href="#cb51-1" tabindex="-1"></a>spec</span>
<span id="cb51-2"><a href="#cb51-2" tabindex="-1"></a><span class="co">#> # A tibble: 10 × 3</span></span>
<span id="cb51-3"><a href="#cb51-3" tabindex="-1"></a><span class="co">#> .name .value income </span></span>
<span id="cb51-4"><a href="#cb51-4" tabindex="-1"></a><span class="co">#> <chr> <chr> <chr> </span></span>
<span id="cb51-5"><a href="#cb51-5" tabindex="-1"></a><span class="co">#> 1 <$10k count <$10k </span></span>
<span id="cb51-6"><a href="#cb51-6" tabindex="-1"></a><span class="co">#> 2 $10-20k count $10-20k </span></span>
<span id="cb51-7"><a href="#cb51-7" tabindex="-1"></a><span class="co">#> 3 $20-30k count $20-30k </span></span>
<span id="cb51-8"><a href="#cb51-8" tabindex="-1"></a><span class="co">#> 4 $30-40k count $30-40k </span></span>
<span id="cb51-9"><a href="#cb51-9" tabindex="-1"></a><span class="co">#> 5 $40-50k count $40-50k </span></span>
<span id="cb51-10"><a href="#cb51-10" tabindex="-1"></a><span class="co">#> 6 $50-75k count $50-75k </span></span>
<span id="cb51-11"><a href="#cb51-11" tabindex="-1"></a><span class="co">#> 7 $75-100k count $75-100k </span></span>
<span id="cb51-12"><a href="#cb51-12" tabindex="-1"></a><span class="co">#> 8 $100-150k count $100-150k </span></span>
<span id="cb51-13"><a href="#cb51-13" tabindex="-1"></a><span class="co">#> 9 >150k count >150k </span></span>
<span id="cb51-14"><a href="#cb51-14" tabindex="-1"></a><span class="co">#> 10 Don't know/refused count Don't know/refused</span></span></code></pre></div>
</div>
<div id="wider-1" class="section level3">
<h3>Wider</h3>
<p>Below we widen <code>us_rent_income</code> with
<code>pivot_wider()</code>. The result is ok, but I think it could be
improved:</p>
<div class="sourceCode" id="cb52"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb52-1"><a href="#cb52-1" tabindex="-1"></a>us_rent_income <span class="sc">%>%</span> </span>
<span id="cb52-2"><a href="#cb52-2" tabindex="-1"></a> <span class="fu">pivot_wider</span>(</span>
<span id="cb52-3"><a href="#cb52-3" tabindex="-1"></a> <span class="at">names_from =</span> variable, </span>
<span id="cb52-4"><a href="#cb52-4" tabindex="-1"></a> <span class="at">values_from =</span> <span class="fu">c</span>(estimate, moe)</span>
<span id="cb52-5"><a href="#cb52-5" tabindex="-1"></a> )</span>
<span id="cb52-6"><a href="#cb52-6" tabindex="-1"></a><span class="co">#> # A tibble: 52 × 6</span></span>
<span id="cb52-7"><a href="#cb52-7" tabindex="-1"></a><span class="co">#> GEOID NAME estimate_income estimate_rent moe_income moe_rent</span></span>
<span id="cb52-8"><a href="#cb52-8" tabindex="-1"></a><span class="co">#> <chr> <chr> <dbl> <dbl> <dbl> <dbl></span></span>
<span id="cb52-9"><a href="#cb52-9" tabindex="-1"></a><span class="co">#> 1 01 Alabama 24476 747 136 3</span></span>
<span id="cb52-10"><a href="#cb52-10" tabindex="-1"></a><span class="co">#> 2 02 Alaska 32940 1200 508 13</span></span>
<span id="cb52-11"><a href="#cb52-11" tabindex="-1"></a><span class="co">#> 3 04 Arizona 27517 972 148 4</span></span>
<span id="cb52-12"><a href="#cb52-12" tabindex="-1"></a><span class="co">#> 4 05 Arkansas 23789 709 165 5</span></span>
<span id="cb52-13"><a href="#cb52-13" tabindex="-1"></a><span class="co">#> 5 06 California 29454 1358 109 3</span></span>
<span id="cb52-14"><a href="#cb52-14" tabindex="-1"></a><span class="co">#> 6 08 Colorado 32401 1125 109 5</span></span>
<span id="cb52-15"><a href="#cb52-15" tabindex="-1"></a><span class="co">#> 7 09 Connecticut 35326 1123 195 5</span></span>
<span id="cb52-16"><a href="#cb52-16" tabindex="-1"></a><span class="co">#> 8 10 Delaware 31560 1076 247 10</span></span>
<span id="cb52-17"><a href="#cb52-17" tabindex="-1"></a><span class="co">#> 9 11 District of Columbia 43198 1424 681 17</span></span>
<span id="cb52-18"><a href="#cb52-18" tabindex="-1"></a><span class="co">#> 10 12 Florida 25952 1077 70 3</span></span>
<span id="cb52-19"><a href="#cb52-19" tabindex="-1"></a><span class="co">#> # ℹ 42 more rows</span></span></code></pre></div>
<p>I think it would be better to have columns <code>income</code>,
<code>rent</code>, <code>income_moe</code>, and <code>rent_moe</code>,
which we can achieve with a manual spec. The current spec looks like
this:</p>
<div class="sourceCode" id="cb53"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb53-1"><a href="#cb53-1" tabindex="-1"></a>spec1 <span class="ot"><-</span> us_rent_income <span class="sc">%>%</span> </span>
<span id="cb53-2"><a href="#cb53-2" tabindex="-1"></a> <span class="fu">build_wider_spec</span>(</span>
<span id="cb53-3"><a href="#cb53-3" tabindex="-1"></a> <span class="at">names_from =</span> variable, </span>
<span id="cb53-4"><a href="#cb53-4" tabindex="-1"></a> <span class="at">values_from =</span> <span class="fu">c</span>(estimate, moe)</span>
<span id="cb53-5"><a href="#cb53-5" tabindex="-1"></a> )</span>
<span id="cb53-6"><a href="#cb53-6" tabindex="-1"></a>spec1</span>
<span id="cb53-7"><a href="#cb53-7" tabindex="-1"></a><span class="co">#> # A tibble: 4 × 3</span></span>
<span id="cb53-8"><a href="#cb53-8" tabindex="-1"></a><span class="co">#> .name .value variable</span></span>
<span id="cb53-9"><a href="#cb53-9" tabindex="-1"></a><span class="co">#> <chr> <chr> <chr> </span></span>
<span id="cb53-10"><a href="#cb53-10" tabindex="-1"></a><span class="co">#> 1 estimate_income estimate income </span></span>
<span id="cb53-11"><a href="#cb53-11" tabindex="-1"></a><span class="co">#> 2 estimate_rent estimate rent </span></span>
<span id="cb53-12"><a href="#cb53-12" tabindex="-1"></a><span class="co">#> 3 moe_income moe income </span></span>
<span id="cb53-13"><a href="#cb53-13" tabindex="-1"></a><span class="co">#> 4 moe_rent moe rent</span></span></code></pre></div>
<p>For this case, we mutate <code>spec</code> to carefully construct the
column names:</p>
<div class="sourceCode" id="cb54"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb54-1"><a href="#cb54-1" tabindex="-1"></a>spec2 <span class="ot"><-</span> spec1 <span class="sc">%>%</span></span>
<span id="cb54-2"><a href="#cb54-2" tabindex="-1"></a> <span class="fu">mutate</span>(</span>
<span id="cb54-3"><a href="#cb54-3" tabindex="-1"></a> <span class="at">.name =</span> <span class="fu">paste0</span>(variable, <span class="fu">ifelse</span>(.value <span class="sc">==</span> <span class="st">"moe"</span>, <span class="st">"_moe"</span>, <span class="st">""</span>))</span>
<span id="cb54-4"><a href="#cb54-4" tabindex="-1"></a> )</span>
<span id="cb54-5"><a href="#cb54-5" tabindex="-1"></a>spec2</span>
<span id="cb54-6"><a href="#cb54-6" tabindex="-1"></a><span class="co">#> # A tibble: 4 × 3</span></span>
<span id="cb54-7"><a href="#cb54-7" tabindex="-1"></a><span class="co">#> .name .value variable</span></span>
<span id="cb54-8"><a href="#cb54-8" tabindex="-1"></a><span class="co">#> <chr> <chr> <chr> </span></span>
<span id="cb54-9"><a href="#cb54-9" tabindex="-1"></a><span class="co">#> 1 income estimate income </span></span>
<span id="cb54-10"><a href="#cb54-10" tabindex="-1"></a><span class="co">#> 2 rent estimate rent </span></span>
<span id="cb54-11"><a href="#cb54-11" tabindex="-1"></a><span class="co">#> 3 income_moe moe income </span></span>
<span id="cb54-12"><a href="#cb54-12" tabindex="-1"></a><span class="co">#> 4 rent_moe moe rent</span></span></code></pre></div>
<p>Supplying this spec to <code>pivot_wider()</code> gives us the result
we’re looking for:</p>
<div class="sourceCode" id="cb55"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb55-1"><a href="#cb55-1" tabindex="-1"></a>us_rent_income <span class="sc">%>%</span> </span>
<span id="cb55-2"><a href="#cb55-2" tabindex="-1"></a> <span class="fu">pivot_wider_spec</span>(spec2)</span>
<span id="cb55-3"><a href="#cb55-3" tabindex="-1"></a><span class="co">#> # A tibble: 52 × 6</span></span>
<span id="cb55-4"><a href="#cb55-4" tabindex="-1"></a><span class="co">#> GEOID NAME income rent income_moe rent_moe</span></span>
<span id="cb55-5"><a href="#cb55-5" tabindex="-1"></a><span class="co">#> <chr> <chr> <dbl> <dbl> <dbl> <dbl></span></span>
<span id="cb55-6"><a href="#cb55-6" tabindex="-1"></a><span class="co">#> 1 01 Alabama 24476 747 136 3</span></span>
<span id="cb55-7"><a href="#cb55-7" tabindex="-1"></a><span class="co">#> 2 02 Alaska 32940 1200 508 13</span></span>
<span id="cb55-8"><a href="#cb55-8" tabindex="-1"></a><span class="co">#> 3 04 Arizona 27517 972 148 4</span></span>
<span id="cb55-9"><a href="#cb55-9" tabindex="-1"></a><span class="co">#> 4 05 Arkansas 23789 709 165 5</span></span>
<span id="cb55-10"><a href="#cb55-10" tabindex="-1"></a><span class="co">#> 5 06 California 29454 1358 109 3</span></span>
<span id="cb55-11"><a href="#cb55-11" tabindex="-1"></a><span class="co">#> 6 08 Colorado 32401 1125 109 5</span></span>
<span id="cb55-12"><a href="#cb55-12" tabindex="-1"></a><span class="co">#> 7 09 Connecticut 35326 1123 195 5</span></span>
<span id="cb55-13"><a href="#cb55-13" tabindex="-1"></a><span class="co">#> 8 10 Delaware 31560 1076 247 10</span></span>
<span id="cb55-14"><a href="#cb55-14" tabindex="-1"></a><span class="co">#> 9 11 District of Columbia 43198 1424 681 17</span></span>
<span id="cb55-15"><a href="#cb55-15" tabindex="-1"></a><span class="co">#> 10 12 Florida 25952 1077 70 3</span></span>
<span id="cb55-16"><a href="#cb55-16" tabindex="-1"></a><span class="co">#> # ℹ 42 more rows</span></span></code></pre></div>
</div>
<div id="by-hand" class="section level3">
<h3>By hand</h3>
<p>Sometimes it’s not possible (or not convenient) to compute the spec,
and instead it’s more convenient to construct the spec “by hand”. For
example, take this <code>construction</code> data, which is lightly
modified from Table 5 “completions” found at <a href="https://www.census.gov/construction/nrc/index.html" class="uri">https://www.census.gov/construction/nrc/index.html</a>:</p>
<div class="sourceCode" id="cb56"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb56-1"><a href="#cb56-1" tabindex="-1"></a>construction</span>
<span id="cb56-2"><a href="#cb56-2" tabindex="-1"></a><span class="co">#> # A tibble: 9 × 9</span></span>
<span id="cb56-3"><a href="#cb56-3" tabindex="-1"></a><span class="co">#> Year Month `1 unit` `2 to 4 units` `5 units or more` Northeast Midwest South</span></span>
<span id="cb56-4"><a href="#cb56-4" tabindex="-1"></a><span class="co">#> <dbl> <chr> <dbl> <lgl> <dbl> <dbl> <dbl> <dbl></span></span>
<span id="cb56-5"><a href="#cb56-5" tabindex="-1"></a><span class="co">#> 1 2018 Janua… 859 NA 348 114 169 596</span></span>
<span id="cb56-6"><a href="#cb56-6" tabindex="-1"></a><span class="co">#> 2 2018 Febru… 882 NA 400 138 160 655</span></span>
<span id="cb56-7"><a href="#cb56-7" tabindex="-1"></a><span class="co">#> 3 2018 March 862 NA 356 150 154 595</span></span>
<span id="cb56-8"><a href="#cb56-8" tabindex="-1"></a><span class="co">#> 4 2018 April 797 NA 447 144 196 613</span></span>
<span id="cb56-9"><a href="#cb56-9" tabindex="-1"></a><span class="co">#> 5 2018 May 875 NA 364 90 169 673</span></span>
<span id="cb56-10"><a href="#cb56-10" tabindex="-1"></a><span class="co">#> 6 2018 June 867 NA 342 76 170 610</span></span>
<span id="cb56-11"><a href="#cb56-11" tabindex="-1"></a><span class="co">#> 7 2018 July 829 NA 360 108 183 594</span></span>
<span id="cb56-12"><a href="#cb56-12" tabindex="-1"></a><span class="co">#> 8 2018 August 939 NA 286 90 205 649</span></span>
<span id="cb56-13"><a href="#cb56-13" tabindex="-1"></a><span class="co">#> 9 2018 Septe… 835 NA 304 117 175 560</span></span>
<span id="cb56-14"><a href="#cb56-14" tabindex="-1"></a><span class="co">#> # ℹ 1 more variable: West <dbl></span></span></code></pre></div>
<p>This sort of data is not uncommon from government agencies: the
column names actually belong to different variables, and here we have
summaries for number of units (1, 2-4, 5+) and regions of the country
(NE, NW, midwest, S, W). We can most easily describe that with a
tibble:</p>
<div class="sourceCode" id="cb57"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb57-1"><a href="#cb57-1" tabindex="-1"></a>spec <span class="ot"><-</span> <span class="fu">tribble</span>(</span>
<span id="cb57-2"><a href="#cb57-2" tabindex="-1"></a> <span class="sc">~</span>.name, <span class="sc">~</span>.value, <span class="sc">~</span>units, <span class="sc">~</span>region, </span>
<span id="cb57-3"><a href="#cb57-3" tabindex="-1"></a> <span class="st">"1 unit"</span>, <span class="st">"n"</span>, <span class="st">"1"</span>, <span class="cn">NA</span>, </span>
<span id="cb57-4"><a href="#cb57-4" tabindex="-1"></a> <span class="st">"2 to 4 units"</span>, <span class="st">"n"</span>, <span class="st">"2-4"</span>, <span class="cn">NA</span>, </span>
<span id="cb57-5"><a href="#cb57-5" tabindex="-1"></a> <span class="st">"5 units or more"</span>, <span class="st">"n"</span>, <span class="st">"5+"</span>, <span class="cn">NA</span>, </span>
<span id="cb57-6"><a href="#cb57-6" tabindex="-1"></a> <span class="st">"Northeast"</span>, <span class="st">"n"</span>, <span class="cn">NA</span>, <span class="st">"Northeast"</span>, </span>
<span id="cb57-7"><a href="#cb57-7" tabindex="-1"></a> <span class="st">"Midwest"</span>, <span class="st">"n"</span>, <span class="cn">NA</span>, <span class="st">"Midwest"</span>, </span>
<span id="cb57-8"><a href="#cb57-8" tabindex="-1"></a> <span class="st">"South"</span>, <span class="st">"n"</span>, <span class="cn">NA</span>, <span class="st">"South"</span>, </span>
<span id="cb57-9"><a href="#cb57-9" tabindex="-1"></a> <span class="st">"West"</span>, <span class="st">"n"</span>, <span class="cn">NA</span>, <span class="st">"West"</span>, </span>
<span id="cb57-10"><a href="#cb57-10" tabindex="-1"></a>)</span></code></pre></div>
<p>Which yields the following longer form:</p>
<div class="sourceCode" id="cb58"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb58-1"><a href="#cb58-1" tabindex="-1"></a>construction <span class="sc">%>%</span> <span class="fu">pivot_longer_spec</span>(spec)</span>
<span id="cb58-2"><a href="#cb58-2" tabindex="-1"></a><span class="co">#> # A tibble: 63 × 5</span></span>
<span id="cb58-3"><a href="#cb58-3" tabindex="-1"></a><span class="co">#> Year Month units region n</span></span>
<span id="cb58-4"><a href="#cb58-4" tabindex="-1"></a><span class="co">#> <dbl> <chr> <chr> <chr> <dbl></span></span>
<span id="cb58-5"><a href="#cb58-5" tabindex="-1"></a><span class="co">#> 1 2018 January 1 <NA> 859</span></span>
<span id="cb58-6"><a href="#cb58-6" tabindex="-1"></a><span class="co">#> 2 2018 January 2-4 <NA> NA</span></span>
<span id="cb58-7"><a href="#cb58-7" tabindex="-1"></a><span class="co">#> 3 2018 January 5+ <NA> 348</span></span>
<span id="cb58-8"><a href="#cb58-8" tabindex="-1"></a><span class="co">#> 4 2018 January <NA> Northeast 114</span></span>
<span id="cb58-9"><a href="#cb58-9" tabindex="-1"></a><span class="co">#> 5 2018 January <NA> Midwest 169</span></span>
<span id="cb58-10"><a href="#cb58-10" tabindex="-1"></a><span class="co">#> 6 2018 January <NA> South 596</span></span>
<span id="cb58-11"><a href="#cb58-11" tabindex="-1"></a><span class="co">#> 7 2018 January <NA> West 339</span></span>
<span id="cb58-12"><a href="#cb58-12" tabindex="-1"></a><span class="co">#> 8 2018 February 1 <NA> 882</span></span>
<span id="cb58-13"><a href="#cb58-13" tabindex="-1"></a><span class="co">#> 9 2018 February 2-4 <NA> NA</span></span>
<span id="cb58-14"><a href="#cb58-14" tabindex="-1"></a><span class="co">#> 10 2018 February 5+ <NA> 400</span></span>
<span id="cb58-15"><a href="#cb58-15" tabindex="-1"></a><span class="co">#> # ℹ 53 more rows</span></span></code></pre></div>
<p>Note that there is no overlap between the <code>units</code> and
<code>region</code> variables; here the data would really be most
naturally described in two independent tables.</p>
</div>
<div id="theory" class="section level3">
<h3>Theory</h3>
<p>One neat property of the <code>spec</code> is that you need the same
spec for <code>pivot_longer()</code> and <code>pivot_wider()</code>.
This makes it very clear that the two operations are symmetric:</p>
<div class="sourceCode" id="cb59"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb59-1"><a href="#cb59-1" tabindex="-1"></a>construction <span class="sc">%>%</span> </span>
<span id="cb59-2"><a href="#cb59-2" tabindex="-1"></a> <span class="fu">pivot_longer_spec</span>(spec) <span class="sc">%>%</span> </span>
<span id="cb59-3"><a href="#cb59-3" tabindex="-1"></a> <span class="fu">pivot_wider_spec</span>(spec)</span>
<span id="cb59-4"><a href="#cb59-4" tabindex="-1"></a><span class="co">#> # A tibble: 9 × 9</span></span>
<span id="cb59-5"><a href="#cb59-5" tabindex="-1"></a><span class="co">#> Year Month `1 unit` `2 to 4 units` `5 units or more` Northeast Midwest South</span></span>
<span id="cb59-6"><a href="#cb59-6" tabindex="-1"></a><span class="co">#> <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl></span></span>
<span id="cb59-7"><a href="#cb59-7" tabindex="-1"></a><span class="co">#> 1 2018 Janua… 859 NA 348 114 169 596</span></span>
<span id="cb59-8"><a href="#cb59-8" tabindex="-1"></a><span class="co">#> 2 2018 Febru… 882 NA 400 138 160 655</span></span>
<span id="cb59-9"><a href="#cb59-9" tabindex="-1"></a><span class="co">#> 3 2018 March 862 NA 356 150 154 595</span></span>
<span id="cb59-10"><a href="#cb59-10" tabindex="-1"></a><span class="co">#> 4 2018 April 797 NA 447 144 196 613</span></span>
<span id="cb59-11"><a href="#cb59-11" tabindex="-1"></a><span class="co">#> 5 2018 May 875 NA 364 90 169 673</span></span>
<span id="cb59-12"><a href="#cb59-12" tabindex="-1"></a><span class="co">#> 6 2018 June 867 NA 342 76 170 610</span></span>
<span id="cb59-13"><a href="#cb59-13" tabindex="-1"></a><span class="co">#> 7 2018 July 829 NA 360 108 183 594</span></span>
<span id="cb59-14"><a href="#cb59-14" tabindex="-1"></a><span class="co">#> 8 2018 August 939 NA 286 90 205 649</span></span>
<span id="cb59-15"><a href="#cb59-15" tabindex="-1"></a><span class="co">#> 9 2018 Septe… 835 NA 304 117 175 560</span></span>
<span id="cb59-16"><a href="#cb59-16" tabindex="-1"></a><span class="co">#> # ℹ 1 more variable: West <dbl></span></span></code></pre></div>
<p>The pivoting spec allows us to be more precise about exactly how
<code>pivot_longer(df, spec = spec)</code> changes the shape of
<code>df</code>: it will have <code>nrow(df) * nrow(spec)</code> rows,
and <code>ncol(df) - nrow(spec) + ncol(spec) - 2</code> columns.</p>
</div>
</div>
<!-- code folding -->
<!-- dynamically load mathjax for compatibility with self-contained -->
<script>
(function () {
var script = document.createElement("script");
script.type = "text/javascript";
script.src = "https://mathjax.rstudio.com/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML";
document.getElementsByTagName("head")[0].appendChild(script);
})();
</script>
</body>
</html>
|