1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 892 893 894 895 896 897 898 899 900 901 902 903 904 905 906 907 908 909 910 911 912 913 914 915 916 917 918 919 920 921 922 923 924 925 926 927 928 929 930 931 932 933 934 935 936 937 938 939 940 941 942 943 944 945 946 947 948 949 950 951 952 953 954 955 956 957 958 959 960 961 962 963 964 965 966 967 968 969 970 971 972 973 974 975 976 977 978 979 980 981 982 983 984 985 986 987 988 989 990 991 992 993 994 995 996 997 998 999 1000 1001 1002 1003 1004 1005 1006 1007 1008 1009 1010 1011 1012 1013 1014 1015 1016 1017 1018 1019 1020 1021 1022 1023 1024 1025 1026 1027 1028 1029 1030 1031 1032 1033 1034 1035 1036 1037 1038 1039 1040 1041 1042 1043 1044 1045 1046 1047 1048 1049 1050 1051 1052 1053 1054 1055 1056 1057 1058 1059 1060 1061 1062 1063 1064 1065 1066 1067 1068 1069 1070 1071 1072 1073 1074 1075 1076 1077 1078 1079 1080 1081 1082 1083 1084 1085 1086 1087 1088 1089 1090 1091 1092 1093 1094 1095 1096 1097 1098 1099 1100 1101 1102 1103 1104 1105 1106 1107 1108 1109 1110 1111 1112 1113 1114 1115 1116 1117 1118 1119 1120 1121 1122 1123 1124 1125 1126 1127 1128 1129 1130 1131 1132 1133 1134 1135 1136 1137 1138 1139 1140 1141 1142 1143 1144 1145 1146 1147 1148 1149 1150 1151 1152 1153 1154 1155 1156 1157 1158 1159 1160 1161 1162 1163 1164 1165 1166 1167 1168 1169 1170 1171 1172 1173 1174 1175 1176 1177 1178 1179 1180 1181 1182 1183 1184 1185 1186 1187 1188 1189 1190 1191 1192 1193 1194 1195 1196 1197 1198 1199 1200 1201 1202 1203 1204 1205 1206 1207 1208 1209 1210 1211 1212 1213 1214 1215 1216 1217 1218 1219 1220 1221 1222 1223 1224 1225 1226 1227 1228 1229 1230 1231 1232 1233 1234 1235 1236 1237 1238 1239 1240 1241 1242 1243 1244 1245 1246 1247 1248 1249 1250 1251
|
// SPDX-License-Identifier: GPL-2.0-only
/*
* Stand-alone page-table allocator for hyp stage-1 and guest stage-2.
* No bombay mix was harmed in the writing of this file.
*
* Copyright (C) 2020 Google LLC
* Author: Will Deacon <will@kernel.org>
*/
#include <linux/bitfield.h>
#include <asm/kvm_pgtable.h>
#include <asm/stage2_pgtable.h>
#define KVM_PTE_TYPE BIT(1)
#define KVM_PTE_TYPE_BLOCK 0
#define KVM_PTE_TYPE_PAGE 1
#define KVM_PTE_TYPE_TABLE 1
#define KVM_PTE_LEAF_ATTR_LO GENMASK(11, 2)
#define KVM_PTE_LEAF_ATTR_LO_S1_ATTRIDX GENMASK(4, 2)
#define KVM_PTE_LEAF_ATTR_LO_S1_AP GENMASK(7, 6)
#define KVM_PTE_LEAF_ATTR_LO_S1_AP_RO 3
#define KVM_PTE_LEAF_ATTR_LO_S1_AP_RW 1
#define KVM_PTE_LEAF_ATTR_LO_S1_SH GENMASK(9, 8)
#define KVM_PTE_LEAF_ATTR_LO_S1_SH_IS 3
#define KVM_PTE_LEAF_ATTR_LO_S1_AF BIT(10)
#define KVM_PTE_LEAF_ATTR_LO_S2_MEMATTR GENMASK(5, 2)
#define KVM_PTE_LEAF_ATTR_LO_S2_S2AP_R BIT(6)
#define KVM_PTE_LEAF_ATTR_LO_S2_S2AP_W BIT(7)
#define KVM_PTE_LEAF_ATTR_LO_S2_SH GENMASK(9, 8)
#define KVM_PTE_LEAF_ATTR_LO_S2_SH_IS 3
#define KVM_PTE_LEAF_ATTR_LO_S2_AF BIT(10)
#define KVM_PTE_LEAF_ATTR_HI GENMASK(63, 51)
#define KVM_PTE_LEAF_ATTR_HI_SW GENMASK(58, 55)
#define KVM_PTE_LEAF_ATTR_HI_S1_XN BIT(54)
#define KVM_PTE_LEAF_ATTR_HI_S2_XN BIT(54)
#define KVM_PTE_LEAF_ATTR_S2_PERMS (KVM_PTE_LEAF_ATTR_LO_S2_S2AP_R | \
KVM_PTE_LEAF_ATTR_LO_S2_S2AP_W | \
KVM_PTE_LEAF_ATTR_HI_S2_XN)
#define KVM_INVALID_PTE_OWNER_MASK GENMASK(9, 2)
#define KVM_MAX_OWNER_ID 1
struct kvm_pgtable_walk_data {
struct kvm_pgtable *pgt;
struct kvm_pgtable_walker *walker;
u64 addr;
u64 end;
};
#define KVM_PHYS_INVALID (-1ULL)
static bool kvm_phys_is_valid(u64 phys)
{
return phys < BIT(id_aa64mmfr0_parange_to_phys_shift(ID_AA64MMFR0_EL1_PARANGE_MAX));
}
static bool kvm_block_mapping_supported(u64 addr, u64 end, u64 phys, u32 level)
{
u64 granule = kvm_granule_size(level);
if (!kvm_level_supports_block_mapping(level))
return false;
if (granule > (end - addr))
return false;
if (kvm_phys_is_valid(phys) && !IS_ALIGNED(phys, granule))
return false;
return IS_ALIGNED(addr, granule);
}
static u32 kvm_pgtable_idx(struct kvm_pgtable_walk_data *data, u32 level)
{
u64 shift = kvm_granule_shift(level);
u64 mask = BIT(PAGE_SHIFT - 3) - 1;
return (data->addr >> shift) & mask;
}
static u32 __kvm_pgd_page_idx(struct kvm_pgtable *pgt, u64 addr)
{
u64 shift = kvm_granule_shift(pgt->start_level - 1); /* May underflow */
u64 mask = BIT(pgt->ia_bits) - 1;
return (addr & mask) >> shift;
}
static u32 kvm_pgd_page_idx(struct kvm_pgtable_walk_data *data)
{
return __kvm_pgd_page_idx(data->pgt, data->addr);
}
static u32 kvm_pgd_pages(u32 ia_bits, u32 start_level)
{
struct kvm_pgtable pgt = {
.ia_bits = ia_bits,
.start_level = start_level,
};
return __kvm_pgd_page_idx(&pgt, -1ULL) + 1;
}
static bool kvm_pte_table(kvm_pte_t pte, u32 level)
{
if (level == KVM_PGTABLE_MAX_LEVELS - 1)
return false;
if (!kvm_pte_valid(pte))
return false;
return FIELD_GET(KVM_PTE_TYPE, pte) == KVM_PTE_TYPE_TABLE;
}
static kvm_pte_t kvm_phys_to_pte(u64 pa)
{
kvm_pte_t pte = pa & KVM_PTE_ADDR_MASK;
if (PAGE_SHIFT == 16)
pte |= FIELD_PREP(KVM_PTE_ADDR_51_48, pa >> 48);
return pte;
}
static kvm_pte_t *kvm_pte_follow(kvm_pte_t pte, struct kvm_pgtable_mm_ops *mm_ops)
{
return mm_ops->phys_to_virt(kvm_pte_to_phys(pte));
}
static void kvm_clear_pte(kvm_pte_t *ptep)
{
WRITE_ONCE(*ptep, 0);
}
static void kvm_set_table_pte(kvm_pte_t *ptep, kvm_pte_t *childp,
struct kvm_pgtable_mm_ops *mm_ops)
{
kvm_pte_t old = *ptep, pte = kvm_phys_to_pte(mm_ops->virt_to_phys(childp));
pte |= FIELD_PREP(KVM_PTE_TYPE, KVM_PTE_TYPE_TABLE);
pte |= KVM_PTE_VALID;
WARN_ON(kvm_pte_valid(old));
smp_store_release(ptep, pte);
}
static kvm_pte_t kvm_init_valid_leaf_pte(u64 pa, kvm_pte_t attr, u32 level)
{
kvm_pte_t pte = kvm_phys_to_pte(pa);
u64 type = (level == KVM_PGTABLE_MAX_LEVELS - 1) ? KVM_PTE_TYPE_PAGE :
KVM_PTE_TYPE_BLOCK;
pte |= attr & (KVM_PTE_LEAF_ATTR_LO | KVM_PTE_LEAF_ATTR_HI);
pte |= FIELD_PREP(KVM_PTE_TYPE, type);
pte |= KVM_PTE_VALID;
return pte;
}
static kvm_pte_t kvm_init_invalid_leaf_owner(u8 owner_id)
{
return FIELD_PREP(KVM_INVALID_PTE_OWNER_MASK, owner_id);
}
static int kvm_pgtable_visitor_cb(struct kvm_pgtable_walk_data *data, u64 addr,
u32 level, kvm_pte_t *ptep,
enum kvm_pgtable_walk_flags flag)
{
struct kvm_pgtable_walker *walker = data->walker;
return walker->cb(addr, data->end, level, ptep, flag, walker->arg);
}
static int __kvm_pgtable_walk(struct kvm_pgtable_walk_data *data,
kvm_pte_t *pgtable, u32 level);
static inline int __kvm_pgtable_visit(struct kvm_pgtable_walk_data *data,
kvm_pte_t *ptep, u32 level)
{
int ret = 0;
u64 addr = data->addr;
kvm_pte_t *childp, pte = *ptep;
bool table = kvm_pte_table(pte, level);
enum kvm_pgtable_walk_flags flags = data->walker->flags;
if (table && (flags & KVM_PGTABLE_WALK_TABLE_PRE)) {
ret = kvm_pgtable_visitor_cb(data, addr, level, ptep,
KVM_PGTABLE_WALK_TABLE_PRE);
}
if (!table && (flags & KVM_PGTABLE_WALK_LEAF)) {
ret = kvm_pgtable_visitor_cb(data, addr, level, ptep,
KVM_PGTABLE_WALK_LEAF);
pte = *ptep;
table = kvm_pte_table(pte, level);
}
if (ret)
goto out;
if (!table) {
data->addr = ALIGN_DOWN(data->addr, kvm_granule_size(level));
data->addr += kvm_granule_size(level);
goto out;
}
childp = kvm_pte_follow(pte, data->pgt->mm_ops);
ret = __kvm_pgtable_walk(data, childp, level + 1);
if (ret)
goto out;
if (flags & KVM_PGTABLE_WALK_TABLE_POST) {
ret = kvm_pgtable_visitor_cb(data, addr, level, ptep,
KVM_PGTABLE_WALK_TABLE_POST);
}
out:
return ret;
}
static int __kvm_pgtable_walk(struct kvm_pgtable_walk_data *data,
kvm_pte_t *pgtable, u32 level)
{
u32 idx;
int ret = 0;
if (WARN_ON_ONCE(level >= KVM_PGTABLE_MAX_LEVELS))
return -EINVAL;
for (idx = kvm_pgtable_idx(data, level); idx < PTRS_PER_PTE; ++idx) {
kvm_pte_t *ptep = &pgtable[idx];
if (data->addr >= data->end)
break;
ret = __kvm_pgtable_visit(data, ptep, level);
if (ret)
break;
}
return ret;
}
static int _kvm_pgtable_walk(struct kvm_pgtable_walk_data *data)
{
u32 idx;
int ret = 0;
struct kvm_pgtable *pgt = data->pgt;
u64 limit = BIT(pgt->ia_bits);
if (data->addr > limit || data->end > limit)
return -ERANGE;
if (!pgt->pgd)
return -EINVAL;
for (idx = kvm_pgd_page_idx(data); data->addr < data->end; ++idx) {
kvm_pte_t *ptep = &pgt->pgd[idx * PTRS_PER_PTE];
ret = __kvm_pgtable_walk(data, ptep, pgt->start_level);
if (ret)
break;
}
return ret;
}
int kvm_pgtable_walk(struct kvm_pgtable *pgt, u64 addr, u64 size,
struct kvm_pgtable_walker *walker)
{
struct kvm_pgtable_walk_data walk_data = {
.pgt = pgt,
.addr = ALIGN_DOWN(addr, PAGE_SIZE),
.end = PAGE_ALIGN(walk_data.addr + size),
.walker = walker,
};
return _kvm_pgtable_walk(&walk_data);
}
struct leaf_walk_data {
kvm_pte_t pte;
u32 level;
};
static int leaf_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
enum kvm_pgtable_walk_flags flag, void * const arg)
{
struct leaf_walk_data *data = arg;
data->pte = *ptep;
data->level = level;
return 0;
}
int kvm_pgtable_get_leaf(struct kvm_pgtable *pgt, u64 addr,
kvm_pte_t *ptep, u32 *level)
{
struct leaf_walk_data data;
struct kvm_pgtable_walker walker = {
.cb = leaf_walker,
.flags = KVM_PGTABLE_WALK_LEAF,
.arg = &data,
};
int ret;
ret = kvm_pgtable_walk(pgt, ALIGN_DOWN(addr, PAGE_SIZE),
PAGE_SIZE, &walker);
if (!ret) {
if (ptep)
*ptep = data.pte;
if (level)
*level = data.level;
}
return ret;
}
struct hyp_map_data {
u64 phys;
kvm_pte_t attr;
struct kvm_pgtable_mm_ops *mm_ops;
};
static int hyp_set_prot_attr(enum kvm_pgtable_prot prot, kvm_pte_t *ptep)
{
bool device = prot & KVM_PGTABLE_PROT_DEVICE;
u32 mtype = device ? MT_DEVICE_nGnRE : MT_NORMAL;
kvm_pte_t attr = FIELD_PREP(KVM_PTE_LEAF_ATTR_LO_S1_ATTRIDX, mtype);
u32 sh = KVM_PTE_LEAF_ATTR_LO_S1_SH_IS;
u32 ap = (prot & KVM_PGTABLE_PROT_W) ? KVM_PTE_LEAF_ATTR_LO_S1_AP_RW :
KVM_PTE_LEAF_ATTR_LO_S1_AP_RO;
if (!(prot & KVM_PGTABLE_PROT_R))
return -EINVAL;
if (prot & KVM_PGTABLE_PROT_X) {
if (prot & KVM_PGTABLE_PROT_W)
return -EINVAL;
if (device)
return -EINVAL;
} else {
attr |= KVM_PTE_LEAF_ATTR_HI_S1_XN;
}
attr |= FIELD_PREP(KVM_PTE_LEAF_ATTR_LO_S1_AP, ap);
attr |= FIELD_PREP(KVM_PTE_LEAF_ATTR_LO_S1_SH, sh);
attr |= KVM_PTE_LEAF_ATTR_LO_S1_AF;
attr |= prot & KVM_PTE_LEAF_ATTR_HI_SW;
*ptep = attr;
return 0;
}
enum kvm_pgtable_prot kvm_pgtable_hyp_pte_prot(kvm_pte_t pte)
{
enum kvm_pgtable_prot prot = pte & KVM_PTE_LEAF_ATTR_HI_SW;
u32 ap;
if (!kvm_pte_valid(pte))
return prot;
if (!(pte & KVM_PTE_LEAF_ATTR_HI_S1_XN))
prot |= KVM_PGTABLE_PROT_X;
ap = FIELD_GET(KVM_PTE_LEAF_ATTR_LO_S1_AP, pte);
if (ap == KVM_PTE_LEAF_ATTR_LO_S1_AP_RO)
prot |= KVM_PGTABLE_PROT_R;
else if (ap == KVM_PTE_LEAF_ATTR_LO_S1_AP_RW)
prot |= KVM_PGTABLE_PROT_RW;
return prot;
}
static bool hyp_map_walker_try_leaf(u64 addr, u64 end, u32 level,
kvm_pte_t *ptep, struct hyp_map_data *data)
{
kvm_pte_t new, old = *ptep;
u64 granule = kvm_granule_size(level), phys = data->phys;
if (!kvm_block_mapping_supported(addr, end, phys, level))
return false;
data->phys += granule;
new = kvm_init_valid_leaf_pte(phys, data->attr, level);
if (old == new)
return true;
if (!kvm_pte_valid(old))
data->mm_ops->get_page(ptep);
else if (WARN_ON((old ^ new) & ~KVM_PTE_LEAF_ATTR_HI_SW))
return false;
smp_store_release(ptep, new);
return true;
}
static int hyp_map_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
enum kvm_pgtable_walk_flags flag, void * const arg)
{
kvm_pte_t *childp;
struct hyp_map_data *data = arg;
struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
if (hyp_map_walker_try_leaf(addr, end, level, ptep, arg))
return 0;
if (WARN_ON(level == KVM_PGTABLE_MAX_LEVELS - 1))
return -EINVAL;
childp = (kvm_pte_t *)mm_ops->zalloc_page(NULL);
if (!childp)
return -ENOMEM;
kvm_set_table_pte(ptep, childp, mm_ops);
mm_ops->get_page(ptep);
return 0;
}
int kvm_pgtable_hyp_map(struct kvm_pgtable *pgt, u64 addr, u64 size, u64 phys,
enum kvm_pgtable_prot prot)
{
int ret;
struct hyp_map_data map_data = {
.phys = ALIGN_DOWN(phys, PAGE_SIZE),
.mm_ops = pgt->mm_ops,
};
struct kvm_pgtable_walker walker = {
.cb = hyp_map_walker,
.flags = KVM_PGTABLE_WALK_LEAF,
.arg = &map_data,
};
ret = hyp_set_prot_attr(prot, &map_data.attr);
if (ret)
return ret;
ret = kvm_pgtable_walk(pgt, addr, size, &walker);
dsb(ishst);
isb();
return ret;
}
struct hyp_unmap_data {
u64 unmapped;
struct kvm_pgtable_mm_ops *mm_ops;
};
static int hyp_unmap_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
enum kvm_pgtable_walk_flags flag, void * const arg)
{
kvm_pte_t pte = *ptep, *childp = NULL;
u64 granule = kvm_granule_size(level);
struct hyp_unmap_data *data = arg;
struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
if (!kvm_pte_valid(pte))
return -EINVAL;
if (kvm_pte_table(pte, level)) {
childp = kvm_pte_follow(pte, mm_ops);
if (mm_ops->page_count(childp) != 1)
return 0;
kvm_clear_pte(ptep);
dsb(ishst);
__tlbi_level(vae2is, __TLBI_VADDR(addr, 0), 0);
} else {
if (end - addr < granule)
return -EINVAL;
kvm_clear_pte(ptep);
dsb(ishst);
__tlbi_level(vale2is, __TLBI_VADDR(addr, 0), level);
data->unmapped += granule;
}
dsb(ish);
isb();
mm_ops->put_page(ptep);
if (childp)
mm_ops->put_page(childp);
return 0;
}
u64 kvm_pgtable_hyp_unmap(struct kvm_pgtable *pgt, u64 addr, u64 size)
{
struct hyp_unmap_data unmap_data = {
.mm_ops = pgt->mm_ops,
};
struct kvm_pgtable_walker walker = {
.cb = hyp_unmap_walker,
.arg = &unmap_data,
.flags = KVM_PGTABLE_WALK_LEAF | KVM_PGTABLE_WALK_TABLE_POST,
};
if (!pgt->mm_ops->page_count)
return 0;
kvm_pgtable_walk(pgt, addr, size, &walker);
return unmap_data.unmapped;
}
int kvm_pgtable_hyp_init(struct kvm_pgtable *pgt, u32 va_bits,
struct kvm_pgtable_mm_ops *mm_ops)
{
u64 levels = ARM64_HW_PGTABLE_LEVELS(va_bits);
pgt->pgd = (kvm_pte_t *)mm_ops->zalloc_page(NULL);
if (!pgt->pgd)
return -ENOMEM;
pgt->ia_bits = va_bits;
pgt->start_level = KVM_PGTABLE_MAX_LEVELS - levels;
pgt->mm_ops = mm_ops;
pgt->mmu = NULL;
pgt->force_pte_cb = NULL;
return 0;
}
static int hyp_free_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
enum kvm_pgtable_walk_flags flag, void * const arg)
{
struct kvm_pgtable_mm_ops *mm_ops = arg;
kvm_pte_t pte = *ptep;
if (!kvm_pte_valid(pte))
return 0;
mm_ops->put_page(ptep);
if (kvm_pte_table(pte, level))
mm_ops->put_page(kvm_pte_follow(pte, mm_ops));
return 0;
}
void kvm_pgtable_hyp_destroy(struct kvm_pgtable *pgt)
{
struct kvm_pgtable_walker walker = {
.cb = hyp_free_walker,
.flags = KVM_PGTABLE_WALK_LEAF | KVM_PGTABLE_WALK_TABLE_POST,
.arg = pgt->mm_ops,
};
WARN_ON(kvm_pgtable_walk(pgt, 0, BIT(pgt->ia_bits), &walker));
pgt->mm_ops->put_page(pgt->pgd);
pgt->pgd = NULL;
}
struct stage2_map_data {
u64 phys;
kvm_pte_t attr;
u8 owner_id;
kvm_pte_t *anchor;
kvm_pte_t *childp;
struct kvm_s2_mmu *mmu;
void *memcache;
struct kvm_pgtable_mm_ops *mm_ops;
/* Force mappings to page granularity */
bool force_pte;
};
u64 kvm_get_vtcr(u64 mmfr0, u64 mmfr1, u32 phys_shift)
{
u64 vtcr = VTCR_EL2_FLAGS;
u8 lvls;
vtcr |= kvm_get_parange(mmfr0) << VTCR_EL2_PS_SHIFT;
vtcr |= VTCR_EL2_T0SZ(phys_shift);
/*
* Use a minimum 2 level page table to prevent splitting
* host PMD huge pages at stage2.
*/
lvls = stage2_pgtable_levels(phys_shift);
if (lvls < 2)
lvls = 2;
vtcr |= VTCR_EL2_LVLS_TO_SL0(lvls);
#ifdef CONFIG_ARM64_HW_AFDBM
/*
* Enable the Hardware Access Flag management, unconditionally
* on all CPUs. In systems that have asymmetric support for the feature
* this allows KVM to leverage hardware support on the subset of cores
* that implement the feature.
*
* The architecture requires VTCR_EL2.HA to be RES0 (thus ignored by
* hardware) on implementations that do not advertise support for the
* feature. As such, setting HA unconditionally is safe, unless you
* happen to be running on a design that has unadvertised support for
* HAFDBS. Here be dragons.
*/
if (!cpus_have_final_cap(ARM64_WORKAROUND_AMPERE_AC03_CPU_38))
vtcr |= VTCR_EL2_HA;
#endif /* CONFIG_ARM64_HW_AFDBM */
/* Set the vmid bits */
vtcr |= (get_vmid_bits(mmfr1) == 16) ?
VTCR_EL2_VS_16BIT :
VTCR_EL2_VS_8BIT;
return vtcr;
}
static bool stage2_has_fwb(struct kvm_pgtable *pgt)
{
if (!cpus_have_const_cap(ARM64_HAS_STAGE2_FWB))
return false;
return !(pgt->flags & KVM_PGTABLE_S2_NOFWB);
}
#define KVM_S2_MEMATTR(pgt, attr) PAGE_S2_MEMATTR(attr, stage2_has_fwb(pgt))
static int stage2_set_prot_attr(struct kvm_pgtable *pgt, enum kvm_pgtable_prot prot,
kvm_pte_t *ptep)
{
bool device = prot & KVM_PGTABLE_PROT_DEVICE;
kvm_pte_t attr = device ? KVM_S2_MEMATTR(pgt, DEVICE_nGnRE) :
KVM_S2_MEMATTR(pgt, NORMAL);
u32 sh = KVM_PTE_LEAF_ATTR_LO_S2_SH_IS;
if (!(prot & KVM_PGTABLE_PROT_X))
attr |= KVM_PTE_LEAF_ATTR_HI_S2_XN;
else if (device)
return -EINVAL;
if (prot & KVM_PGTABLE_PROT_R)
attr |= KVM_PTE_LEAF_ATTR_LO_S2_S2AP_R;
if (prot & KVM_PGTABLE_PROT_W)
attr |= KVM_PTE_LEAF_ATTR_LO_S2_S2AP_W;
attr |= FIELD_PREP(KVM_PTE_LEAF_ATTR_LO_S2_SH, sh);
attr |= KVM_PTE_LEAF_ATTR_LO_S2_AF;
attr |= prot & KVM_PTE_LEAF_ATTR_HI_SW;
*ptep = attr;
return 0;
}
enum kvm_pgtable_prot kvm_pgtable_stage2_pte_prot(kvm_pte_t pte)
{
enum kvm_pgtable_prot prot = pte & KVM_PTE_LEAF_ATTR_HI_SW;
if (!kvm_pte_valid(pte))
return prot;
if (pte & KVM_PTE_LEAF_ATTR_LO_S2_S2AP_R)
prot |= KVM_PGTABLE_PROT_R;
if (pte & KVM_PTE_LEAF_ATTR_LO_S2_S2AP_W)
prot |= KVM_PGTABLE_PROT_W;
if (!(pte & KVM_PTE_LEAF_ATTR_HI_S2_XN))
prot |= KVM_PGTABLE_PROT_X;
return prot;
}
static bool stage2_pte_needs_update(kvm_pte_t old, kvm_pte_t new)
{
if (!kvm_pte_valid(old) || !kvm_pte_valid(new))
return true;
return ((old ^ new) & (~KVM_PTE_LEAF_ATTR_S2_PERMS));
}
static bool stage2_pte_is_counted(kvm_pte_t pte)
{
/*
* The refcount tracks valid entries as well as invalid entries if they
* encode ownership of a page to another entity than the page-table
* owner, whose id is 0.
*/
return !!pte;
}
static void stage2_put_pte(kvm_pte_t *ptep, struct kvm_s2_mmu *mmu, u64 addr,
u32 level, struct kvm_pgtable_mm_ops *mm_ops)
{
/*
* Clear the existing PTE, and perform break-before-make with
* TLB maintenance if it was valid.
*/
kvm_pte_t pte = *ptep;
if (kvm_pte_valid(pte)) {
kvm_clear_pte(ptep);
if (kvm_pte_table(pte, level))
level = 0;
kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, mmu, addr, level);
}
mm_ops->put_page(ptep);
}
static bool stage2_pte_cacheable(struct kvm_pgtable *pgt, kvm_pte_t pte)
{
u64 memattr = pte & KVM_PTE_LEAF_ATTR_LO_S2_MEMATTR;
return memattr == KVM_S2_MEMATTR(pgt, NORMAL);
}
static bool stage2_pte_executable(kvm_pte_t pte)
{
return !(pte & KVM_PTE_LEAF_ATTR_HI_S2_XN);
}
static bool stage2_leaf_mapping_allowed(u64 addr, u64 end, u32 level,
struct stage2_map_data *data)
{
if (data->force_pte && (level < (KVM_PGTABLE_MAX_LEVELS - 1)))
return false;
return kvm_block_mapping_supported(addr, end, data->phys, level);
}
static int stage2_map_walker_try_leaf(u64 addr, u64 end, u32 level,
kvm_pte_t *ptep,
struct stage2_map_data *data)
{
kvm_pte_t new, old = *ptep;
u64 granule = kvm_granule_size(level), phys = data->phys;
struct kvm_pgtable *pgt = data->mmu->pgt;
struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
if (!stage2_leaf_mapping_allowed(addr, end, level, data))
return -E2BIG;
if (kvm_phys_is_valid(phys))
new = kvm_init_valid_leaf_pte(phys, data->attr, level);
else
new = kvm_init_invalid_leaf_owner(data->owner_id);
if (stage2_pte_is_counted(old)) {
/*
* Skip updating the PTE if we are trying to recreate the exact
* same mapping or only change the access permissions. Instead,
* the vCPU will exit one more time from guest if still needed
* and then go through the path of relaxing permissions.
*/
if (!stage2_pte_needs_update(old, new))
return -EAGAIN;
stage2_put_pte(ptep, data->mmu, addr, level, mm_ops);
}
/* Perform CMOs before installation of the guest stage-2 PTE */
if (mm_ops->dcache_clean_inval_poc && stage2_pte_cacheable(pgt, new))
mm_ops->dcache_clean_inval_poc(kvm_pte_follow(new, mm_ops),
granule);
if (mm_ops->icache_inval_pou && stage2_pte_executable(new))
mm_ops->icache_inval_pou(kvm_pte_follow(new, mm_ops), granule);
smp_store_release(ptep, new);
if (stage2_pte_is_counted(new))
mm_ops->get_page(ptep);
if (kvm_phys_is_valid(phys))
data->phys += granule;
return 0;
}
static int stage2_map_walk_table_pre(u64 addr, u64 end, u32 level,
kvm_pte_t *ptep,
struct stage2_map_data *data)
{
if (data->anchor)
return 0;
if (!stage2_leaf_mapping_allowed(addr, end, level, data))
return 0;
data->childp = kvm_pte_follow(*ptep, data->mm_ops);
kvm_clear_pte(ptep);
/*
* Invalidate the whole stage-2, as we may have numerous leaf
* entries below us which would otherwise need invalidating
* individually.
*/
kvm_call_hyp(__kvm_tlb_flush_vmid, data->mmu);
data->anchor = ptep;
return 0;
}
static int stage2_map_walk_leaf(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
struct stage2_map_data *data)
{
struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
kvm_pte_t *childp, pte = *ptep;
int ret;
if (data->anchor) {
if (stage2_pte_is_counted(pte))
mm_ops->put_page(ptep);
return 0;
}
ret = stage2_map_walker_try_leaf(addr, end, level, ptep, data);
if (ret != -E2BIG)
return ret;
if (WARN_ON(level == KVM_PGTABLE_MAX_LEVELS - 1))
return -EINVAL;
if (!data->memcache)
return -ENOMEM;
childp = mm_ops->zalloc_page(data->memcache);
if (!childp)
return -ENOMEM;
/*
* If we've run into an existing block mapping then replace it with
* a table. Accesses beyond 'end' that fall within the new table
* will be mapped lazily.
*/
if (stage2_pte_is_counted(pte))
stage2_put_pte(ptep, data->mmu, addr, level, mm_ops);
kvm_set_table_pte(ptep, childp, mm_ops);
mm_ops->get_page(ptep);
return 0;
}
static int stage2_map_walk_table_post(u64 addr, u64 end, u32 level,
kvm_pte_t *ptep,
struct stage2_map_data *data)
{
struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
kvm_pte_t *childp;
int ret = 0;
if (!data->anchor)
return 0;
if (data->anchor == ptep) {
childp = data->childp;
data->anchor = NULL;
data->childp = NULL;
ret = stage2_map_walk_leaf(addr, end, level, ptep, data);
} else {
childp = kvm_pte_follow(*ptep, mm_ops);
}
mm_ops->put_page(childp);
mm_ops->put_page(ptep);
return ret;
}
/*
* This is a little fiddly, as we use all three of the walk flags. The idea
* is that the TABLE_PRE callback runs for table entries on the way down,
* looking for table entries which we could conceivably replace with a
* block entry for this mapping. If it finds one, then it sets the 'anchor'
* field in 'struct stage2_map_data' to point at the table entry, before
* clearing the entry to zero and descending into the now detached table.
*
* The behaviour of the LEAF callback then depends on whether or not the
* anchor has been set. If not, then we're not using a block mapping higher
* up the table and we perform the mapping at the existing leaves instead.
* If, on the other hand, the anchor _is_ set, then we drop references to
* all valid leaves so that the pages beneath the anchor can be freed.
*
* Finally, the TABLE_POST callback does nothing if the anchor has not
* been set, but otherwise frees the page-table pages while walking back up
* the page-table, installing the block entry when it revisits the anchor
* pointer and clearing the anchor to NULL.
*/
static int stage2_map_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
enum kvm_pgtable_walk_flags flag, void * const arg)
{
struct stage2_map_data *data = arg;
switch (flag) {
case KVM_PGTABLE_WALK_TABLE_PRE:
return stage2_map_walk_table_pre(addr, end, level, ptep, data);
case KVM_PGTABLE_WALK_LEAF:
return stage2_map_walk_leaf(addr, end, level, ptep, data);
case KVM_PGTABLE_WALK_TABLE_POST:
return stage2_map_walk_table_post(addr, end, level, ptep, data);
}
return -EINVAL;
}
int kvm_pgtable_stage2_map(struct kvm_pgtable *pgt, u64 addr, u64 size,
u64 phys, enum kvm_pgtable_prot prot,
void *mc)
{
int ret;
struct stage2_map_data map_data = {
.phys = ALIGN_DOWN(phys, PAGE_SIZE),
.mmu = pgt->mmu,
.memcache = mc,
.mm_ops = pgt->mm_ops,
.force_pte = pgt->force_pte_cb && pgt->force_pte_cb(addr, addr + size, prot),
};
struct kvm_pgtable_walker walker = {
.cb = stage2_map_walker,
.flags = KVM_PGTABLE_WALK_TABLE_PRE |
KVM_PGTABLE_WALK_LEAF |
KVM_PGTABLE_WALK_TABLE_POST,
.arg = &map_data,
};
if (WARN_ON((pgt->flags & KVM_PGTABLE_S2_IDMAP) && (addr != phys)))
return -EINVAL;
ret = stage2_set_prot_attr(pgt, prot, &map_data.attr);
if (ret)
return ret;
ret = kvm_pgtable_walk(pgt, addr, size, &walker);
dsb(ishst);
return ret;
}
int kvm_pgtable_stage2_set_owner(struct kvm_pgtable *pgt, u64 addr, u64 size,
void *mc, u8 owner_id)
{
int ret;
struct stage2_map_data map_data = {
.phys = KVM_PHYS_INVALID,
.mmu = pgt->mmu,
.memcache = mc,
.mm_ops = pgt->mm_ops,
.owner_id = owner_id,
.force_pte = true,
};
struct kvm_pgtable_walker walker = {
.cb = stage2_map_walker,
.flags = KVM_PGTABLE_WALK_TABLE_PRE |
KVM_PGTABLE_WALK_LEAF |
KVM_PGTABLE_WALK_TABLE_POST,
.arg = &map_data,
};
if (owner_id > KVM_MAX_OWNER_ID)
return -EINVAL;
ret = kvm_pgtable_walk(pgt, addr, size, &walker);
return ret;
}
static int stage2_unmap_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
enum kvm_pgtable_walk_flags flag,
void * const arg)
{
struct kvm_pgtable *pgt = arg;
struct kvm_s2_mmu *mmu = pgt->mmu;
struct kvm_pgtable_mm_ops *mm_ops = pgt->mm_ops;
kvm_pte_t pte = *ptep, *childp = NULL;
bool need_flush = false;
if (!kvm_pte_valid(pte)) {
if (stage2_pte_is_counted(pte)) {
kvm_clear_pte(ptep);
mm_ops->put_page(ptep);
}
return 0;
}
if (kvm_pte_table(pte, level)) {
childp = kvm_pte_follow(pte, mm_ops);
if (mm_ops->page_count(childp) != 1)
return 0;
} else if (stage2_pte_cacheable(pgt, pte)) {
need_flush = !stage2_has_fwb(pgt);
}
/*
* This is similar to the map() path in that we unmap the entire
* block entry and rely on the remaining portions being faulted
* back lazily.
*/
stage2_put_pte(ptep, mmu, addr, level, mm_ops);
if (need_flush && mm_ops->dcache_clean_inval_poc)
mm_ops->dcache_clean_inval_poc(kvm_pte_follow(pte, mm_ops),
kvm_granule_size(level));
if (childp)
mm_ops->put_page(childp);
return 0;
}
int kvm_pgtable_stage2_unmap(struct kvm_pgtable *pgt, u64 addr, u64 size)
{
struct kvm_pgtable_walker walker = {
.cb = stage2_unmap_walker,
.arg = pgt,
.flags = KVM_PGTABLE_WALK_LEAF | KVM_PGTABLE_WALK_TABLE_POST,
};
return kvm_pgtable_walk(pgt, addr, size, &walker);
}
struct stage2_attr_data {
kvm_pte_t attr_set;
kvm_pte_t attr_clr;
kvm_pte_t pte;
u32 level;
struct kvm_pgtable_mm_ops *mm_ops;
};
static int stage2_attr_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
enum kvm_pgtable_walk_flags flag,
void * const arg)
{
kvm_pte_t pte = *ptep;
struct stage2_attr_data *data = arg;
struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
if (!kvm_pte_valid(pte))
return 0;
data->level = level;
data->pte = pte;
pte &= ~data->attr_clr;
pte |= data->attr_set;
/*
* We may race with the CPU trying to set the access flag here,
* but worst-case the access flag update gets lost and will be
* set on the next access instead.
*/
if (data->pte != pte) {
/*
* Invalidate instruction cache before updating the guest
* stage-2 PTE if we are going to add executable permission.
*/
if (mm_ops->icache_inval_pou &&
stage2_pte_executable(pte) && !stage2_pte_executable(*ptep))
mm_ops->icache_inval_pou(kvm_pte_follow(pte, mm_ops),
kvm_granule_size(level));
WRITE_ONCE(*ptep, pte);
}
return 0;
}
static int stage2_update_leaf_attrs(struct kvm_pgtable *pgt, u64 addr,
u64 size, kvm_pte_t attr_set,
kvm_pte_t attr_clr, kvm_pte_t *orig_pte,
u32 *level)
{
int ret;
kvm_pte_t attr_mask = KVM_PTE_LEAF_ATTR_LO | KVM_PTE_LEAF_ATTR_HI;
struct stage2_attr_data data = {
.attr_set = attr_set & attr_mask,
.attr_clr = attr_clr & attr_mask,
.mm_ops = pgt->mm_ops,
};
struct kvm_pgtable_walker walker = {
.cb = stage2_attr_walker,
.arg = &data,
.flags = KVM_PGTABLE_WALK_LEAF,
};
ret = kvm_pgtable_walk(pgt, addr, size, &walker);
if (ret)
return ret;
if (orig_pte)
*orig_pte = data.pte;
if (level)
*level = data.level;
return 0;
}
int kvm_pgtable_stage2_wrprotect(struct kvm_pgtable *pgt, u64 addr, u64 size)
{
return stage2_update_leaf_attrs(pgt, addr, size, 0,
KVM_PTE_LEAF_ATTR_LO_S2_S2AP_W,
NULL, NULL);
}
kvm_pte_t kvm_pgtable_stage2_mkyoung(struct kvm_pgtable *pgt, u64 addr)
{
kvm_pte_t pte = 0;
stage2_update_leaf_attrs(pgt, addr, 1, KVM_PTE_LEAF_ATTR_LO_S2_AF, 0,
&pte, NULL);
dsb(ishst);
return pte;
}
kvm_pte_t kvm_pgtable_stage2_mkold(struct kvm_pgtable *pgt, u64 addr)
{
kvm_pte_t pte = 0;
stage2_update_leaf_attrs(pgt, addr, 1, 0, KVM_PTE_LEAF_ATTR_LO_S2_AF,
&pte, NULL);
/*
* "But where's the TLBI?!", you scream.
* "Over in the core code", I sigh.
*
* See the '->clear_flush_young()' callback on the KVM mmu notifier.
*/
return pte;
}
bool kvm_pgtable_stage2_is_young(struct kvm_pgtable *pgt, u64 addr)
{
kvm_pte_t pte = 0;
stage2_update_leaf_attrs(pgt, addr, 1, 0, 0, &pte, NULL);
return pte & KVM_PTE_LEAF_ATTR_LO_S2_AF;
}
int kvm_pgtable_stage2_relax_perms(struct kvm_pgtable *pgt, u64 addr,
enum kvm_pgtable_prot prot)
{
int ret;
u32 level;
kvm_pte_t set = 0, clr = 0;
if (prot & KVM_PTE_LEAF_ATTR_HI_SW)
return -EINVAL;
if (prot & KVM_PGTABLE_PROT_R)
set |= KVM_PTE_LEAF_ATTR_LO_S2_S2AP_R;
if (prot & KVM_PGTABLE_PROT_W)
set |= KVM_PTE_LEAF_ATTR_LO_S2_S2AP_W;
if (prot & KVM_PGTABLE_PROT_X)
clr |= KVM_PTE_LEAF_ATTR_HI_S2_XN;
ret = stage2_update_leaf_attrs(pgt, addr, 1, set, clr, NULL, &level);
if (!ret)
kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, pgt->mmu, addr, level);
return ret;
}
static int stage2_flush_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
enum kvm_pgtable_walk_flags flag,
void * const arg)
{
struct kvm_pgtable *pgt = arg;
struct kvm_pgtable_mm_ops *mm_ops = pgt->mm_ops;
kvm_pte_t pte = *ptep;
if (!kvm_pte_valid(pte) || !stage2_pte_cacheable(pgt, pte))
return 0;
if (mm_ops->dcache_clean_inval_poc)
mm_ops->dcache_clean_inval_poc(kvm_pte_follow(pte, mm_ops),
kvm_granule_size(level));
return 0;
}
int kvm_pgtable_stage2_flush(struct kvm_pgtable *pgt, u64 addr, u64 size)
{
struct kvm_pgtable_walker walker = {
.cb = stage2_flush_walker,
.flags = KVM_PGTABLE_WALK_LEAF,
.arg = pgt,
};
if (stage2_has_fwb(pgt))
return 0;
return kvm_pgtable_walk(pgt, addr, size, &walker);
}
int __kvm_pgtable_stage2_init(struct kvm_pgtable *pgt, struct kvm_s2_mmu *mmu,
struct kvm_pgtable_mm_ops *mm_ops,
enum kvm_pgtable_stage2_flags flags,
kvm_pgtable_force_pte_cb_t force_pte_cb)
{
size_t pgd_sz;
u64 vtcr = mmu->arch->vtcr;
u32 ia_bits = VTCR_EL2_IPA(vtcr);
u32 sl0 = FIELD_GET(VTCR_EL2_SL0_MASK, vtcr);
u32 start_level = VTCR_EL2_TGRAN_SL0_BASE - sl0;
pgd_sz = kvm_pgd_pages(ia_bits, start_level) * PAGE_SIZE;
pgt->pgd = mm_ops->zalloc_pages_exact(pgd_sz);
if (!pgt->pgd)
return -ENOMEM;
pgt->ia_bits = ia_bits;
pgt->start_level = start_level;
pgt->mm_ops = mm_ops;
pgt->mmu = mmu;
pgt->flags = flags;
pgt->force_pte_cb = force_pte_cb;
/* Ensure zeroed PGD pages are visible to the hardware walker */
dsb(ishst);
return 0;
}
static int stage2_free_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
enum kvm_pgtable_walk_flags flag,
void * const arg)
{
struct kvm_pgtable_mm_ops *mm_ops = arg;
kvm_pte_t pte = *ptep;
if (!stage2_pte_is_counted(pte))
return 0;
mm_ops->put_page(ptep);
if (kvm_pte_table(pte, level))
mm_ops->put_page(kvm_pte_follow(pte, mm_ops));
return 0;
}
void kvm_pgtable_stage2_destroy(struct kvm_pgtable *pgt)
{
size_t pgd_sz;
struct kvm_pgtable_walker walker = {
.cb = stage2_free_walker,
.flags = KVM_PGTABLE_WALK_LEAF |
KVM_PGTABLE_WALK_TABLE_POST,
.arg = pgt->mm_ops,
};
WARN_ON(kvm_pgtable_walk(pgt, 0, BIT(pgt->ia_bits), &walker));
pgd_sz = kvm_pgd_pages(pgt->ia_bits, pgt->start_level) * PAGE_SIZE;
pgt->mm_ops->free_pages_exact(pgt->pgd, pgd_sz);
pgt->pgd = NULL;
}
|