From 72a671ced66db6d1c2bfff1c930a101ac8d08204 Mon Sep 17 00:00:00 2001 From: Suresh Siddha Date: Tue, 24 Jul 2012 16:05:29 -0700 Subject: x86, fpu: Unify signal handling code paths for x86 and x86_64 kernels Currently for x86 and x86_32 binaries, fpstate in the user sigframe is copied to/from the fpstate in the task struct. And in the case of signal delivery for x86_64 binaries, if the fpstate is live in the CPU registers, then the live state is copied directly to the user sigframe. Otherwise fpstate in the task struct is copied to the user sigframe. During restore, fpstate in the user sigframe is restored directly to the live CPU registers. Historically, different code paths led to different bugs. For example, x86_64 code path was not preemption safe till recently. Also there is lot of code duplication for support of new features like xsave etc. Unify signal handling code paths for x86 and x86_64 kernels. New strategy is as follows: Signal delivery: Both for 32/64-bit frames, align the core math frame area to 64bytes as needed by xsave (this where the main fpu/extended state gets copied to and excludes the legacy compatibility fsave header for the 32-bit [f]xsave frames). If the state is live, copy the register state directly to the user frame. If not live, copy the state in the thread struct to the user frame. And for 32-bit [f]xsave frames, construct the fsave header separately before the actual [f]xsave area. Signal return: As the 32-bit frames with [f]xstate has an additional 'fsave' header, copy everything back from the user sigframe to the fpstate in the task structure and reconstruct the fxstate from the 'fsave' header (Also user passed pointers may not be correctly aligned for any attempt to directly restore any partial state). At the next fpstate usage, everything will be restored to the live CPU registers. For all the 64-bit frames and the 32-bit fsave frame, restore the state from the user sigframe directly to the live CPU registers. 64-bit signals always restored the math frame directly, so we can expect the math frame pointer to be correctly aligned. For 32-bit fsave frames, there are no alignment requirements, so we can restore the state directly. "lat_sig catch" microbenchmark numbers (for x86, x86_64, x86_32 binaries) are with in the noise range with this change. Signed-off-by: Suresh Siddha Link: http://lkml.kernel.org/r/1343171129-2747-4-git-send-email-suresh.b.siddha@intel.com [ Merged in compilation fix ] Link: http://lkml.kernel.org/r/1344544736.8326.17.camel@sbsiddha-desk.sc.intel.com Signed-off-by: H. Peter Anvin --- arch/x86/kernel/process.c | 10 ---------- 1 file changed, 10 deletions(-) (limited to 'arch/x86/kernel/process.c') diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c index ef6a8456f719..30069d1a6a4d 100644 --- a/arch/x86/kernel/process.c +++ b/arch/x86/kernel/process.c @@ -97,16 +97,6 @@ void arch_task_cache_init(void) SLAB_PANIC | SLAB_NOTRACK, NULL); } -static inline void drop_fpu(struct task_struct *tsk) -{ - /* - * Forget coprocessor state.. - */ - tsk->fpu_counter = 0; - clear_fpu(tsk); - clear_used_math(); -} - /* * Free current thread data structures etc.. */ -- cgit v1.2.3 From 304bceda6a18ae0b0240b8aac9a6bdf8ce2d2469 Mon Sep 17 00:00:00 2001 From: Suresh Siddha Date: Fri, 24 Aug 2012 14:13:02 -0700 Subject: x86, fpu: use non-lazy fpu restore for processors supporting xsave Fundamental model of the current Linux kernel is to lazily init and restore FPU instead of restoring the task state during context switch. This changes that fundamental lazy model to the non-lazy model for the processors supporting xsave feature. Reasons driving this model change are: i. Newer processors support optimized state save/restore using xsaveopt and xrstor by tracking the INIT state and MODIFIED state during context-switch. This is faster than modifying the cr0.TS bit which has serializing semantics. ii. Newer glibc versions use SSE for some of the optimized copy/clear routines. With certain workloads (like boot, kernel-compilation etc), application completes its work with in the first 5 task switches, thus taking upto 5 #DNA traps with the kernel not getting a chance to apply the above mentioned pre-load heuristic. iii. Some xstate features (like AMD's LWP feature) don't honor the cr0.TS bit and thus will not work correctly in the presence of lazy restore. Non-lazy state restore is needed for enabling such features. Some data on a two socket SNB system: * Saved 20K DNA exceptions during boot on a two socket SNB system. * Saved 50K DNA exceptions during kernel-compilation workload. * Improved throughput of the AVX based checksumming function inside the kernel by ~15% as xsave/xrstor is faster than the serializing clts/stts pair. Also now kernel_fpu_begin/end() relies on the patched alternative instructions. So move check_fpu() which uses the kernel_fpu_begin/end() after alternative_instructions(). Signed-off-by: Suresh Siddha Link: http://lkml.kernel.org/r/1345842782-24175-7-git-send-email-suresh.b.siddha@intel.com Merge 32-bit boot fix from, Link: http://lkml.kernel.org/r/1347300665-6209-4-git-send-email-suresh.b.siddha@intel.com Cc: Jim Kukunas Cc: NeilBrown Cc: Avi Kivity Signed-off-by: H. Peter Anvin --- arch/x86/include/asm/fpu-internal.h | 96 ++++++++++++++++++++++++------------- arch/x86/include/asm/i387.h | 1 + arch/x86/include/asm/xsave.h | 1 + arch/x86/kernel/cpu/bugs.c | 7 ++- arch/x86/kernel/i387.c | 20 ++++++-- arch/x86/kernel/process.c | 12 +++-- arch/x86/kernel/process_32.c | 4 -- arch/x86/kernel/process_64.c | 4 -- arch/x86/kernel/traps.c | 5 +- arch/x86/kernel/xsave.c | 57 +++++++++++++++++----- 10 files changed, 146 insertions(+), 61 deletions(-) (limited to 'arch/x86/kernel/process.c') diff --git a/arch/x86/include/asm/fpu-internal.h b/arch/x86/include/asm/fpu-internal.h index 52202a6b12aa..8ca0f9f45ac4 100644 --- a/arch/x86/include/asm/fpu-internal.h +++ b/arch/x86/include/asm/fpu-internal.h @@ -291,15 +291,48 @@ static inline void __thread_set_has_fpu(struct task_struct *tsk) static inline void __thread_fpu_end(struct task_struct *tsk) { __thread_clear_has_fpu(tsk); - stts(); + if (!use_xsave()) + stts(); } static inline void __thread_fpu_begin(struct task_struct *tsk) { - clts(); + if (!use_xsave()) + clts(); __thread_set_has_fpu(tsk); } +static inline void __drop_fpu(struct task_struct *tsk) +{ + if (__thread_has_fpu(tsk)) { + /* Ignore delayed exceptions from user space */ + asm volatile("1: fwait\n" + "2:\n" + _ASM_EXTABLE(1b, 2b)); + __thread_fpu_end(tsk); + } +} + +static inline void drop_fpu(struct task_struct *tsk) +{ + /* + * Forget coprocessor state.. + */ + preempt_disable(); + tsk->fpu_counter = 0; + __drop_fpu(tsk); + clear_used_math(); + preempt_enable(); +} + +static inline void drop_init_fpu(struct task_struct *tsk) +{ + if (!use_xsave()) + drop_fpu(tsk); + else + xrstor_state(init_xstate_buf, -1); +} + /* * FPU state switching for scheduling. * @@ -333,7 +366,12 @@ static inline fpu_switch_t switch_fpu_prepare(struct task_struct *old, struct ta { fpu_switch_t fpu; - fpu.preload = tsk_used_math(new) && new->fpu_counter > 5; + /* + * If the task has used the math, pre-load the FPU on xsave processors + * or if the past 5 consecutive context-switches used math. + */ + fpu.preload = tsk_used_math(new) && (use_xsave() || + new->fpu_counter > 5); if (__thread_has_fpu(old)) { if (!__save_init_fpu(old)) cpu = ~0; @@ -345,14 +383,14 @@ static inline fpu_switch_t switch_fpu_prepare(struct task_struct *old, struct ta new->fpu_counter++; __thread_set_has_fpu(new); prefetch(new->thread.fpu.state); - } else + } else if (!use_xsave()) stts(); } else { old->fpu_counter = 0; old->thread.fpu.last_cpu = ~0; if (fpu.preload) { new->fpu_counter++; - if (fpu_lazy_restore(new, cpu)) + if (!use_xsave() && fpu_lazy_restore(new, cpu)) fpu.preload = 0; else prefetch(new->thread.fpu.state); @@ -372,7 +410,7 @@ static inline void switch_fpu_finish(struct task_struct *new, fpu_switch_t fpu) { if (fpu.preload) { if (unlikely(restore_fpu_checking(new))) - __thread_fpu_end(new); + drop_init_fpu(new); } } @@ -400,17 +438,6 @@ static inline int restore_xstate_sig(void __user *buf, int ia32_frame) return __restore_xstate_sig(buf, buf_fx, size); } -static inline void __drop_fpu(struct task_struct *tsk) -{ - if (__thread_has_fpu(tsk)) { - /* Ignore delayed exceptions from user space */ - asm volatile("1: fwait\n" - "2:\n" - _ASM_EXTABLE(1b, 2b)); - __thread_fpu_end(tsk); - } -} - /* * Need to be preemption-safe. * @@ -431,24 +458,18 @@ static inline void user_fpu_begin(void) static inline void save_init_fpu(struct task_struct *tsk) { WARN_ON_ONCE(!__thread_has_fpu(tsk)); + + if (use_xsave()) { + xsave_state(&tsk->thread.fpu.state->xsave, -1); + return; + } + preempt_disable(); __save_init_fpu(tsk); __thread_fpu_end(tsk); preempt_enable(); } -static inline void drop_fpu(struct task_struct *tsk) -{ - /* - * Forget coprocessor state.. - */ - tsk->fpu_counter = 0; - preempt_disable(); - __drop_fpu(tsk); - preempt_enable(); - clear_used_math(); -} - /* * i387 state interaction */ @@ -503,12 +524,21 @@ static inline void fpu_free(struct fpu *fpu) } } -static inline void fpu_copy(struct fpu *dst, struct fpu *src) +static inline void fpu_copy(struct task_struct *dst, struct task_struct *src) { - memcpy(dst->state, src->state, xstate_size); -} + if (use_xsave()) { + struct xsave_struct *xsave = &dst->thread.fpu.state->xsave; -extern void fpu_finit(struct fpu *fpu); + memset(&xsave->xsave_hdr, 0, sizeof(struct xsave_hdr_struct)); + xsave_state(xsave, -1); + } else { + struct fpu *dfpu = &dst->thread.fpu; + struct fpu *sfpu = &src->thread.fpu; + + unlazy_fpu(src); + memcpy(dfpu->state, sfpu->state, xstate_size); + } +} static inline unsigned long alloc_mathframe(unsigned long sp, int ia32_frame, unsigned long *buf_fx, diff --git a/arch/x86/include/asm/i387.h b/arch/x86/include/asm/i387.h index 257d9cca214f..6c3bd3782818 100644 --- a/arch/x86/include/asm/i387.h +++ b/arch/x86/include/asm/i387.h @@ -19,6 +19,7 @@ struct pt_regs; struct user_i387_struct; extern int init_fpu(struct task_struct *child); +extern void fpu_finit(struct fpu *fpu); extern int dump_fpu(struct pt_regs *, struct user_i387_struct *); extern void math_state_restore(void); diff --git a/arch/x86/include/asm/xsave.h b/arch/x86/include/asm/xsave.h index c1d989a15193..2ddee1b87793 100644 --- a/arch/x86/include/asm/xsave.h +++ b/arch/x86/include/asm/xsave.h @@ -34,6 +34,7 @@ extern unsigned int xstate_size; extern u64 pcntxt_mask; extern u64 xstate_fx_sw_bytes[USER_XSTATE_FX_SW_WORDS]; +extern struct xsave_struct *init_xstate_buf; extern void xsave_init(void); extern void update_regset_xstate_info(unsigned int size, u64 xstate_mask); diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c index c97bb7b5a9f8..d0e910da16c5 100644 --- a/arch/x86/kernel/cpu/bugs.c +++ b/arch/x86/kernel/cpu/bugs.c @@ -165,10 +165,15 @@ void __init check_bugs(void) print_cpu_info(&boot_cpu_data); #endif check_config(); - check_fpu(); check_hlt(); check_popad(); init_utsname()->machine[1] = '0' + (boot_cpu_data.x86 > 6 ? 6 : boot_cpu_data.x86); alternative_instructions(); + + /* + * kernel_fpu_begin/end() in check_fpu() relies on the patched + * alternative instructions. + */ + check_fpu(); } diff --git a/arch/x86/kernel/i387.c b/arch/x86/kernel/i387.c index ab6a2e8028ae..528557470ddb 100644 --- a/arch/x86/kernel/i387.c +++ b/arch/x86/kernel/i387.c @@ -22,7 +22,15 @@ /* * Were we in an interrupt that interrupted kernel mode? * - * We can do a kernel_fpu_begin/end() pair *ONLY* if that + * For now, on xsave platforms we will return interrupted + * kernel FPU as not-idle. TBD: As we use non-lazy FPU restore + * for xsave platforms, ideally we can change the return value + * to something like __thread_has_fpu(current). But we need to + * be careful of doing __thread_clear_has_fpu() before saving + * the FPU etc for supporting nested uses etc. For now, take + * the simple route! + * + * On others, we can do a kernel_fpu_begin/end() pair *ONLY* if that * pair does nothing at all: the thread must not have fpu (so * that we don't try to save the FPU state), and TS must * be set (so that the clts/stts pair does nothing that is @@ -30,6 +38,9 @@ */ static inline bool interrupted_kernel_fpu_idle(void) { + if (use_xsave()) + return 0; + return !__thread_has_fpu(current) && (read_cr0() & X86_CR0_TS); } @@ -73,7 +84,7 @@ void kernel_fpu_begin(void) __save_init_fpu(me); __thread_clear_has_fpu(me); /* We do 'stts()' in kernel_fpu_end() */ - } else { + } else if (!use_xsave()) { this_cpu_write(fpu_owner_task, NULL); clts(); } @@ -82,7 +93,10 @@ EXPORT_SYMBOL(kernel_fpu_begin); void kernel_fpu_end(void) { - stts(); + if (use_xsave()) + math_state_restore(); + else + stts(); preempt_enable(); } EXPORT_SYMBOL(kernel_fpu_end); diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c index 30069d1a6a4d..c21e30f8923b 100644 --- a/arch/x86/kernel/process.c +++ b/arch/x86/kernel/process.c @@ -66,15 +66,13 @@ int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src) { int ret; - unlazy_fpu(src); - *dst = *src; if (fpu_allocated(&src->thread.fpu)) { memset(&dst->thread.fpu, 0, sizeof(dst->thread.fpu)); ret = fpu_alloc(&dst->thread.fpu); if (ret) return ret; - fpu_copy(&dst->thread.fpu, &src->thread.fpu); + fpu_copy(dst, src); } return 0; } @@ -153,7 +151,13 @@ void flush_thread(void) flush_ptrace_hw_breakpoint(tsk); memset(tsk->thread.tls_array, 0, sizeof(tsk->thread.tls_array)); - drop_fpu(tsk); + drop_init_fpu(tsk); + /* + * Free the FPU state for non xsave platforms. They get reallocated + * lazily at the first use. + */ + if (!use_xsave()) + free_thread_xstate(tsk); } static void hard_disable_TSC(void) diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c index 516fa186121b..b9ff83c7135b 100644 --- a/arch/x86/kernel/process_32.c +++ b/arch/x86/kernel/process_32.c @@ -190,10 +190,6 @@ start_thread(struct pt_regs *regs, unsigned long new_ip, unsigned long new_sp) regs->cs = __USER_CS; regs->ip = new_ip; regs->sp = new_sp; - /* - * Free the old FP and other extended state - */ - free_thread_xstate(current); } EXPORT_SYMBOL_GPL(start_thread); diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c index 0a980c9d7cb8..8a6d20ce1978 100644 --- a/arch/x86/kernel/process_64.c +++ b/arch/x86/kernel/process_64.c @@ -232,10 +232,6 @@ start_thread_common(struct pt_regs *regs, unsigned long new_ip, regs->cs = _cs; regs->ss = _ss; regs->flags = X86_EFLAGS_IF; - /* - * Free the old FP and other extended state - */ - free_thread_xstate(current); } void diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c index b481341c9369..ac7d5275f6e8 100644 --- a/arch/x86/kernel/traps.c +++ b/arch/x86/kernel/traps.c @@ -613,11 +613,12 @@ void math_state_restore(void) } __thread_fpu_begin(tsk); + /* * Paranoid restore. send a SIGSEGV if we fail to restore the state. */ if (unlikely(restore_fpu_checking(tsk))) { - __thread_fpu_end(tsk); + drop_init_fpu(tsk); force_sig(SIGSEGV, tsk); return; } @@ -629,6 +630,8 @@ EXPORT_SYMBOL_GPL(math_state_restore); dotraplinkage void __kprobes do_device_not_available(struct pt_regs *regs, long error_code) { + BUG_ON(use_xsave()); + #ifdef CONFIG_MATH_EMULATION if (read_cr0() & X86_CR0_EM) { struct math_emu_info info = { }; diff --git a/arch/x86/kernel/xsave.c b/arch/x86/kernel/xsave.c index 4ac5f2e135b4..e7752bd7cac8 100644 --- a/arch/x86/kernel/xsave.c +++ b/arch/x86/kernel/xsave.c @@ -21,7 +21,7 @@ u64 pcntxt_mask; /* * Represents init state for the supported extended state. */ -static struct xsave_struct *init_xstate_buf; +struct xsave_struct *init_xstate_buf; static struct _fpx_sw_bytes fx_sw_reserved, fx_sw_reserved_ia32; static unsigned int *xstate_offsets, *xstate_sizes, xstate_features; @@ -268,7 +268,7 @@ int save_xstate_sig(void __user *buf, void __user *buf_fx, int size) if (use_fxsr() && save_xstate_epilog(buf_fx, ia32_fxstate)) return -1; - drop_fpu(tsk); /* trigger finit */ + drop_init_fpu(tsk); /* trigger finit */ return 0; } @@ -340,7 +340,7 @@ int __restore_xstate_sig(void __user *buf, void __user *buf_fx, int size) config_enabled(CONFIG_IA32_EMULATION)); if (!buf) { - drop_fpu(tsk); + drop_init_fpu(tsk); return 0; } @@ -380,15 +380,30 @@ int __restore_xstate_sig(void __user *buf, void __user *buf_fx, int size) */ struct xsave_struct *xsave = &tsk->thread.fpu.state->xsave; struct user_i387_ia32_struct env; + int err = 0; + /* + * Drop the current fpu which clears used_math(). This ensures + * that any context-switch during the copy of the new state, + * avoids the intermediate state from getting restored/saved. + * Thus avoiding the new restored state from getting corrupted. + * We will be ready to restore/save the state only after + * set_used_math() is again set. + */ drop_fpu(tsk); if (__copy_from_user(xsave, buf_fx, state_size) || - __copy_from_user(&env, buf, sizeof(env))) - return -1; + __copy_from_user(&env, buf, sizeof(env))) { + err = -1; + } else { + sanitize_restored_xstate(tsk, &env, xstate_bv, fx_only); + set_used_math(); + } - sanitize_restored_xstate(tsk, &env, xstate_bv, fx_only); - set_used_math(); + if (use_xsave()) + math_state_restore(); + + return err; } else { /* * For 64-bit frames and 32-bit fsave frames, restore the user @@ -396,7 +411,7 @@ int __restore_xstate_sig(void __user *buf, void __user *buf_fx, int size) */ user_fpu_begin(); if (restore_user_xstate(buf_fx, xstate_bv, fx_only)) { - drop_fpu(tsk); + drop_init_fpu(tsk); return -1; } } @@ -435,10 +450,28 @@ static void prepare_fx_sw_frame(void) */ static inline void xstate_enable(void) { + clts(); set_in_cr4(X86_CR4_OSXSAVE); xsetbv(XCR_XFEATURE_ENABLED_MASK, pcntxt_mask); } +/* + * This is same as math_state_restore(). But use_xsave() is not yet + * patched to use math_state_restore(). + */ +static inline void init_restore_xstate(void) +{ + init_fpu(current); + __thread_fpu_begin(current); + xrstor_state(init_xstate_buf, -1); +} + +static inline void xstate_enable_ap(void) +{ + xstate_enable(); + init_restore_xstate(); +} + /* * Record the offsets and sizes of different state managed by the xsave * memory layout. @@ -479,7 +512,6 @@ static void __init setup_xstate_init(void) __alignof__(struct xsave_struct)); init_xstate_buf->i387.mxcsr = MXCSR_DEFAULT; - clts(); /* * Init all the features state with header_bv being 0x0 */ @@ -489,7 +521,6 @@ static void __init setup_xstate_init(void) * of any feature which is not represented by all zero's. */ xsave_state(init_xstate_buf, -1); - stts(); } /* @@ -533,6 +564,10 @@ static void __init xstate_enable_boot_cpu(void) pr_info("enabled xstate_bv 0x%llx, cntxt size 0x%x\n", pcntxt_mask, xstate_size); + + current->thread.fpu.state = + alloc_bootmem_align(xstate_size, __alignof__(struct xsave_struct)); + init_restore_xstate(); } /* @@ -551,6 +586,6 @@ void __cpuinit xsave_init(void) return; this_func = next_func; - next_func = xstate_enable; + next_func = xstate_enable_ap; this_func(); } -- cgit v1.2.3 From 5d2bd7009f306c82afddd1ca4d9763ad8473c216 Mon Sep 17 00:00:00 2001 From: Suresh Siddha Date: Thu, 6 Sep 2012 14:58:52 -0700 Subject: x86, fpu: decouple non-lazy/eager fpu restore from xsave Decouple non-lazy/eager fpu restore policy from the existence of the xsave feature. Introduce a synthetic CPUID flag to represent the eagerfpu policy. "eagerfpu=on" boot paramter will enable the policy. Requested-by: H. Peter Anvin Requested-by: Linus Torvalds Signed-off-by: Suresh Siddha Link: http://lkml.kernel.org/r/1347300665-6209-2-git-send-email-suresh.b.siddha@intel.com Signed-off-by: H. Peter Anvin --- Documentation/kernel-parameters.txt | 4 ++ arch/x86/include/asm/cpufeature.h | 2 + arch/x86/include/asm/fpu-internal.h | 54 ++++++++++++++++------- arch/x86/kernel/cpu/common.c | 2 - arch/x86/kernel/i387.c | 25 ++++------- arch/x86/kernel/process.c | 2 +- arch/x86/kernel/traps.c | 2 +- arch/x86/kernel/xsave.c | 87 ++++++++++++++++++++++++------------- 8 files changed, 112 insertions(+), 66 deletions(-) (limited to 'arch/x86/kernel/process.c') diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt index ad7e2e5088c1..741d064fdc6a 100644 --- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt @@ -1833,6 +1833,10 @@ bytes respectively. Such letter suffixes can also be entirely omitted. and restore using xsave. The kernel will fallback to enabling legacy floating-point and sse state. + eagerfpu= [X86] + on enable eager fpu restore + off disable eager fpu restore + nohlt [BUGS=ARM,SH] Tells the kernel that the sleep(SH) or wfi(ARM) instruction doesn't work correctly and not to use it. This is also useful when using JTAG debugger. diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h index 6b7ee5ff6820..5dd2b473ccff 100644 --- a/arch/x86/include/asm/cpufeature.h +++ b/arch/x86/include/asm/cpufeature.h @@ -97,6 +97,7 @@ #define X86_FEATURE_EXTD_APICID (3*32+26) /* has extended APICID (8 bits) */ #define X86_FEATURE_AMD_DCM (3*32+27) /* multi-node processor */ #define X86_FEATURE_APERFMPERF (3*32+28) /* APERFMPERF */ +#define X86_FEATURE_EAGER_FPU (3*32+29) /* "eagerfpu" Non lazy FPU restore */ /* Intel-defined CPU features, CPUID level 0x00000001 (ecx), word 4 */ #define X86_FEATURE_XMM3 (4*32+ 0) /* "pni" SSE-3 */ @@ -305,6 +306,7 @@ extern const char * const x86_power_flags[32]; #define cpu_has_perfctr_core boot_cpu_has(X86_FEATURE_PERFCTR_CORE) #define cpu_has_cx8 boot_cpu_has(X86_FEATURE_CX8) #define cpu_has_cx16 boot_cpu_has(X86_FEATURE_CX16) +#define cpu_has_eager_fpu boot_cpu_has(X86_FEATURE_EAGER_FPU) #if defined(CONFIG_X86_INVLPG) || defined(CONFIG_X86_64) # define cpu_has_invlpg 1 diff --git a/arch/x86/include/asm/fpu-internal.h b/arch/x86/include/asm/fpu-internal.h index 8ca0f9f45ac4..0ca72f0d4b41 100644 --- a/arch/x86/include/asm/fpu-internal.h +++ b/arch/x86/include/asm/fpu-internal.h @@ -38,6 +38,7 @@ int ia32_setup_frame(int sig, struct k_sigaction *ka, extern unsigned int mxcsr_feature_mask; extern void fpu_init(void); +extern void eager_fpu_init(void); DECLARE_PER_CPU(struct task_struct *, fpu_owner_task); @@ -84,6 +85,11 @@ static inline int is_x32_frame(void) #define X87_FSW_ES (1 << 7) /* Exception Summary */ +static __always_inline __pure bool use_eager_fpu(void) +{ + return static_cpu_has(X86_FEATURE_EAGER_FPU); +} + static __always_inline __pure bool use_xsaveopt(void) { return static_cpu_has(X86_FEATURE_XSAVEOPT); @@ -99,6 +105,14 @@ static __always_inline __pure bool use_fxsr(void) return static_cpu_has(X86_FEATURE_FXSR); } +static inline void fx_finit(struct i387_fxsave_struct *fx) +{ + memset(fx, 0, xstate_size); + fx->cwd = 0x37f; + if (cpu_has_xmm) + fx->mxcsr = MXCSR_DEFAULT; +} + extern void __sanitize_i387_state(struct task_struct *); static inline void sanitize_i387_state(struct task_struct *tsk) @@ -291,13 +305,13 @@ static inline void __thread_set_has_fpu(struct task_struct *tsk) static inline void __thread_fpu_end(struct task_struct *tsk) { __thread_clear_has_fpu(tsk); - if (!use_xsave()) + if (!use_eager_fpu()) stts(); } static inline void __thread_fpu_begin(struct task_struct *tsk) { - if (!use_xsave()) + if (!use_eager_fpu()) clts(); __thread_set_has_fpu(tsk); } @@ -327,10 +341,14 @@ static inline void drop_fpu(struct task_struct *tsk) static inline void drop_init_fpu(struct task_struct *tsk) { - if (!use_xsave()) + if (!use_eager_fpu()) drop_fpu(tsk); - else - xrstor_state(init_xstate_buf, -1); + else { + if (use_xsave()) + xrstor_state(init_xstate_buf, -1); + else + fxrstor_checking(&init_xstate_buf->i387); + } } /* @@ -370,7 +388,7 @@ static inline fpu_switch_t switch_fpu_prepare(struct task_struct *old, struct ta * If the task has used the math, pre-load the FPU on xsave processors * or if the past 5 consecutive context-switches used math. */ - fpu.preload = tsk_used_math(new) && (use_xsave() || + fpu.preload = tsk_used_math(new) && (use_eager_fpu() || new->fpu_counter > 5); if (__thread_has_fpu(old)) { if (!__save_init_fpu(old)) @@ -383,14 +401,14 @@ static inline fpu_switch_t switch_fpu_prepare(struct task_struct *old, struct ta new->fpu_counter++; __thread_set_has_fpu(new); prefetch(new->thread.fpu.state); - } else if (!use_xsave()) + } else if (!use_eager_fpu()) stts(); } else { old->fpu_counter = 0; old->thread.fpu.last_cpu = ~0; if (fpu.preload) { new->fpu_counter++; - if (!use_xsave() && fpu_lazy_restore(new, cpu)) + if (!use_eager_fpu() && fpu_lazy_restore(new, cpu)) fpu.preload = 0; else prefetch(new->thread.fpu.state); @@ -452,6 +470,14 @@ static inline void user_fpu_begin(void) preempt_enable(); } +static inline void __save_fpu(struct task_struct *tsk) +{ + if (use_xsave()) + xsave_state(&tsk->thread.fpu.state->xsave, -1); + else + fpu_fxsave(&tsk->thread.fpu); +} + /* * These disable preemption on their own and are safe */ @@ -459,8 +485,8 @@ static inline void save_init_fpu(struct task_struct *tsk) { WARN_ON_ONCE(!__thread_has_fpu(tsk)); - if (use_xsave()) { - xsave_state(&tsk->thread.fpu.state->xsave, -1); + if (use_eager_fpu()) { + __save_fpu(tsk); return; } @@ -526,11 +552,9 @@ static inline void fpu_free(struct fpu *fpu) static inline void fpu_copy(struct task_struct *dst, struct task_struct *src) { - if (use_xsave()) { - struct xsave_struct *xsave = &dst->thread.fpu.state->xsave; - - memset(&xsave->xsave_hdr, 0, sizeof(struct xsave_hdr_struct)); - xsave_state(xsave, -1); + if (use_eager_fpu()) { + memset(&dst->thread.fpu.state->xsave, 0, xstate_size); + __save_fpu(dst); } else { struct fpu *dfpu = &dst->thread.fpu; struct fpu *sfpu = &src->thread.fpu; diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c index a5fbc3c5fccc..b0fe078614d8 100644 --- a/arch/x86/kernel/cpu/common.c +++ b/arch/x86/kernel/cpu/common.c @@ -1297,7 +1297,6 @@ void __cpuinit cpu_init(void) dbg_restore_debug_regs(); fpu_init(); - xsave_init(); raw_local_save_flags(kernel_eflags); @@ -1352,6 +1351,5 @@ void __cpuinit cpu_init(void) dbg_restore_debug_regs(); fpu_init(); - xsave_init(); } #endif diff --git a/arch/x86/kernel/i387.c b/arch/x86/kernel/i387.c index 528557470ddb..6782e3983865 100644 --- a/arch/x86/kernel/i387.c +++ b/arch/x86/kernel/i387.c @@ -22,9 +22,8 @@ /* * Were we in an interrupt that interrupted kernel mode? * - * For now, on xsave platforms we will return interrupted - * kernel FPU as not-idle. TBD: As we use non-lazy FPU restore - * for xsave platforms, ideally we can change the return value + * For now, with eagerfpu we will return interrupted kernel FPU + * state as not-idle. TBD: Ideally we can change the return value * to something like __thread_has_fpu(current). But we need to * be careful of doing __thread_clear_has_fpu() before saving * the FPU etc for supporting nested uses etc. For now, take @@ -38,7 +37,7 @@ */ static inline bool interrupted_kernel_fpu_idle(void) { - if (use_xsave()) + if (use_eager_fpu()) return 0; return !__thread_has_fpu(current) && @@ -84,7 +83,7 @@ void kernel_fpu_begin(void) __save_init_fpu(me); __thread_clear_has_fpu(me); /* We do 'stts()' in kernel_fpu_end() */ - } else if (!use_xsave()) { + } else if (!use_eager_fpu()) { this_cpu_write(fpu_owner_task, NULL); clts(); } @@ -93,7 +92,7 @@ EXPORT_SYMBOL(kernel_fpu_begin); void kernel_fpu_end(void) { - if (use_xsave()) + if (use_eager_fpu()) math_state_restore(); else stts(); @@ -122,7 +121,6 @@ static void __cpuinit mxcsr_feature_mask_init(void) { unsigned long mask = 0; - clts(); if (cpu_has_fxsr) { memset(&fx_scratch, 0, sizeof(struct i387_fxsave_struct)); asm volatile("fxsave %0" : : "m" (fx_scratch)); @@ -131,7 +129,6 @@ static void __cpuinit mxcsr_feature_mask_init(void) mask = 0x0000ffbf; } mxcsr_feature_mask &= mask; - stts(); } static void __cpuinit init_thread_xstate(void) @@ -185,9 +182,8 @@ void __cpuinit fpu_init(void) init_thread_xstate(); mxcsr_feature_mask_init(); - /* clean state in init */ - current_thread_info()->status = 0; - clear_used_math(); + xsave_init(); + eager_fpu_init(); } void fpu_finit(struct fpu *fpu) @@ -198,12 +194,7 @@ void fpu_finit(struct fpu *fpu) } if (cpu_has_fxsr) { - struct i387_fxsave_struct *fx = &fpu->state->fxsave; - - memset(fx, 0, xstate_size); - fx->cwd = 0x37f; - if (cpu_has_xmm) - fx->mxcsr = MXCSR_DEFAULT; + fx_finit(&fpu->state->fxsave); } else { struct i387_fsave_struct *fp = &fpu->state->fsave; memset(fp, 0, xstate_size); diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c index c21e30f8923b..dc3567e083f9 100644 --- a/arch/x86/kernel/process.c +++ b/arch/x86/kernel/process.c @@ -156,7 +156,7 @@ void flush_thread(void) * Free the FPU state for non xsave platforms. They get reallocated * lazily at the first use. */ - if (!use_xsave()) + if (!use_eager_fpu()) free_thread_xstate(tsk); } diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c index ac7d5275f6e8..4f4aba0551b0 100644 --- a/arch/x86/kernel/traps.c +++ b/arch/x86/kernel/traps.c @@ -630,7 +630,7 @@ EXPORT_SYMBOL_GPL(math_state_restore); dotraplinkage void __kprobes do_device_not_available(struct pt_regs *regs, long error_code) { - BUG_ON(use_xsave()); + BUG_ON(use_eager_fpu()); #ifdef CONFIG_MATH_EMULATION if (read_cr0() & X86_CR0_EM) { diff --git a/arch/x86/kernel/xsave.c b/arch/x86/kernel/xsave.c index e7752bd7cac8..c0afd2c43761 100644 --- a/arch/x86/kernel/xsave.c +++ b/arch/x86/kernel/xsave.c @@ -400,7 +400,7 @@ int __restore_xstate_sig(void __user *buf, void __user *buf_fx, int size) set_used_math(); } - if (use_xsave()) + if (use_eager_fpu()) math_state_restore(); return err; @@ -450,28 +450,10 @@ static void prepare_fx_sw_frame(void) */ static inline void xstate_enable(void) { - clts(); set_in_cr4(X86_CR4_OSXSAVE); xsetbv(XCR_XFEATURE_ENABLED_MASK, pcntxt_mask); } -/* - * This is same as math_state_restore(). But use_xsave() is not yet - * patched to use math_state_restore(). - */ -static inline void init_restore_xstate(void) -{ - init_fpu(current); - __thread_fpu_begin(current); - xrstor_state(init_xstate_buf, -1); -} - -static inline void xstate_enable_ap(void) -{ - xstate_enable(); - init_restore_xstate(); -} - /* * Record the offsets and sizes of different state managed by the xsave * memory layout. @@ -500,17 +482,20 @@ static void __init setup_xstate_features(void) /* * setup the xstate image representing the init state */ -static void __init setup_xstate_init(void) +static void __init setup_init_fpu_buf(void) { - setup_xstate_features(); - /* * Setup init_xstate_buf to represent the init state of * all the features managed by the xsave */ init_xstate_buf = alloc_bootmem_align(xstate_size, __alignof__(struct xsave_struct)); - init_xstate_buf->i387.mxcsr = MXCSR_DEFAULT; + fx_finit(&init_xstate_buf->i387); + + if (!cpu_has_xsave) + return; + + setup_xstate_features(); /* * Init all the features state with header_bv being 0x0 @@ -523,6 +508,17 @@ static void __init setup_xstate_init(void) xsave_state(init_xstate_buf, -1); } +static int disable_eagerfpu; +static int __init eager_fpu_setup(char *s) +{ + if (!strcmp(s, "on")) + setup_force_cpu_cap(X86_FEATURE_EAGER_FPU); + else if (!strcmp(s, "off")) + disable_eagerfpu = 1; + return 1; +} +__setup("eagerfpu=", eager_fpu_setup); + /* * Enable and initialize the xsave feature. */ @@ -559,15 +555,10 @@ static void __init xstate_enable_boot_cpu(void) update_regset_xstate_info(xstate_size, pcntxt_mask); prepare_fx_sw_frame(); - - setup_xstate_init(); + setup_init_fpu_buf(); pr_info("enabled xstate_bv 0x%llx, cntxt size 0x%x\n", pcntxt_mask, xstate_size); - - current->thread.fpu.state = - alloc_bootmem_align(xstate_size, __alignof__(struct xsave_struct)); - init_restore_xstate(); } /* @@ -586,6 +577,42 @@ void __cpuinit xsave_init(void) return; this_func = next_func; - next_func = xstate_enable_ap; + next_func = xstate_enable; this_func(); } + +static inline void __init eager_fpu_init_bp(void) +{ + current->thread.fpu.state = + alloc_bootmem_align(xstate_size, __alignof__(struct xsave_struct)); + if (!init_xstate_buf) + setup_init_fpu_buf(); +} + +void __cpuinit eager_fpu_init(void) +{ + static __refdata void (*boot_func)(void) = eager_fpu_init_bp; + + clear_used_math(); + current_thread_info()->status = 0; + if (!cpu_has_eager_fpu) { + stts(); + return; + } + + if (boot_func) { + boot_func(); + boot_func = NULL; + } + + /* + * This is same as math_state_restore(). But use_xsave() is + * not yet patched to use math_state_restore(). + */ + init_fpu(current); + __thread_fpu_begin(current); + if (cpu_has_xsave) + xrstor_state(init_xstate_buf, -1); + else + fxrstor_checking(&init_xstate_buf->i387); +} -- cgit v1.2.3 From e76623d69408d0bd66a296c6ee5eae1b17a6adfc Mon Sep 17 00:00:00 2001 From: Al Viro Date: Thu, 2 Aug 2012 22:12:06 +0400 Subject: x86: get rid of TIF_IRET hackery TIF_NOTIFY_RESUME will work in precisely the same way; all that is achieved by TIF_IRET is appearing that there's some work to be done, so we end up on the iret exit path. Just use NOTIFY_RESUME. And for execve() do that in 32bit start_thread(), not sys_execve() itself. Signed-off-by: Al Viro --- arch/x86/include/asm/thread_info.h | 2 -- arch/x86/kernel/process.c | 8 -------- arch/x86/kernel/process_32.c | 5 +++++ arch/x86/kernel/signal.c | 4 ---- arch/x86/kernel/vm86_32.c | 6 +++--- 5 files changed, 8 insertions(+), 17 deletions(-) (limited to 'arch/x86/kernel/process.c') diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h index 89f794f007ec..c509d07bdbd7 100644 --- a/arch/x86/include/asm/thread_info.h +++ b/arch/x86/include/asm/thread_info.h @@ -79,7 +79,6 @@ struct thread_info { #define TIF_SIGPENDING 2 /* signal pending */ #define TIF_NEED_RESCHED 3 /* rescheduling necessary */ #define TIF_SINGLESTEP 4 /* reenable singlestep on user return*/ -#define TIF_IRET 5 /* force IRET */ #define TIF_SYSCALL_EMU 6 /* syscall emulation active */ #define TIF_SYSCALL_AUDIT 7 /* syscall auditing active */ #define TIF_SECCOMP 8 /* secure computing */ @@ -104,7 +103,6 @@ struct thread_info { #define _TIF_SIGPENDING (1 << TIF_SIGPENDING) #define _TIF_SINGLESTEP (1 << TIF_SINGLESTEP) #define _TIF_NEED_RESCHED (1 << TIF_NEED_RESCHED) -#define _TIF_IRET (1 << TIF_IRET) #define _TIF_SYSCALL_EMU (1 << TIF_SYSCALL_EMU) #define _TIF_SYSCALL_AUDIT (1 << TIF_SYSCALL_AUDIT) #define _TIF_SECCOMP (1 << TIF_SECCOMP) diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c index ef6a8456f719..7162e9c1f598 100644 --- a/arch/x86/kernel/process.c +++ b/arch/x86/kernel/process.c @@ -351,14 +351,6 @@ long sys_execve(const char __user *name, if (IS_ERR(filename)) return error; error = do_execve(filename, argv, envp, regs); - -#ifdef CONFIG_X86_32 - if (error == 0) { - /* Make sure we don't return using sysenter.. */ - set_thread_flag(TIF_IRET); - } -#endif - putname(filename); return error; } diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c index 516fa186121b..75fcad146def 100644 --- a/arch/x86/kernel/process_32.c +++ b/arch/x86/kernel/process_32.c @@ -194,6 +194,11 @@ start_thread(struct pt_regs *regs, unsigned long new_ip, unsigned long new_sp) * Free the old FP and other extended state */ free_thread_xstate(current); + /* + * force it to the iret return path by making it look as if there was + * some work pending. + */ + set_thread_flag(TIF_NOTIFY_RESUME); } EXPORT_SYMBOL_GPL(start_thread); diff --git a/arch/x86/kernel/signal.c b/arch/x86/kernel/signal.c index b280908a376e..c648fc529872 100644 --- a/arch/x86/kernel/signal.c +++ b/arch/x86/kernel/signal.c @@ -800,10 +800,6 @@ do_notify_resume(struct pt_regs *regs, void *unused, __u32 thread_info_flags) } if (thread_info_flags & _TIF_USER_RETURN_NOTIFY) fire_user_return_notifiers(); - -#ifdef CONFIG_X86_32 - clear_thread_flag(TIF_IRET); -#endif /* CONFIG_X86_32 */ } void signal_fault(struct pt_regs *regs, void __user *frame, char *where) diff --git a/arch/x86/kernel/vm86_32.c b/arch/x86/kernel/vm86_32.c index 54abcc0baf23..5c9687b1bde6 100644 --- a/arch/x86/kernel/vm86_32.c +++ b/arch/x86/kernel/vm86_32.c @@ -561,9 +561,9 @@ int handle_vm86_trap(struct kernel_vm86_regs *regs, long error_code, int trapno) if ((trapno == 3) || (trapno == 1)) { KVM86->regs32->ax = VM86_TRAP + (trapno << 8); /* setting this flag forces the code in entry_32.S to - call save_v86_state() and change the stack pointer - to KVM86->regs32 */ - set_thread_flag(TIF_IRET); + the path where we call save_v86_state() and change + the stack pointer to KVM86->regs32 */ + set_thread_flag(TIF_NOTIFY_RESUME); return 0; } do_int(regs, trapno, (unsigned char __user *) (regs->pt.ss << 4), SP(regs)); -- cgit v1.2.3 From 7076aada1040de4ed79a5977dbabdb5e5ea5e249 Mon Sep 17 00:00:00 2001 From: Al Viro Date: Mon, 10 Sep 2012 16:44:54 -0400 Subject: x86: split ret_from_fork Signed-off-by: Al Viro --- arch/x86/Kconfig | 1 + arch/x86/include/asm/processor.h | 5 ----- arch/x86/kernel/entry_32.S | 15 ++++++++++----- arch/x86/kernel/entry_64.S | 27 +++++++++++---------------- arch/x86/kernel/process.c | 38 -------------------------------------- arch/x86/kernel/process_32.c | 31 ++++++++++++++++++++++++------- arch/x86/kernel/process_64.c | 35 +++++++++++++++++++++-------------- 7 files changed, 67 insertions(+), 85 deletions(-) (limited to 'arch/x86/kernel/process.c') diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 8ec3a1aa4abd..d93eb9d1bb97 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -97,6 +97,7 @@ config X86 select KTIME_SCALAR if X86_32 select GENERIC_STRNCPY_FROM_USER select GENERIC_STRNLEN_USER + select GENERIC_KERNEL_THREAD config INSTRUCTION_DECODER def_bool (KPROBES || PERF_EVENTS || UPROBES) diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h index d048cad9bcad..078f3fdedf95 100644 --- a/arch/x86/include/asm/processor.h +++ b/arch/x86/include/asm/processor.h @@ -589,11 +589,6 @@ typedef struct { } mm_segment_t; -/* - * create a kernel thread without removing it from tasklists - */ -extern int kernel_thread(int (*fn)(void *), void *arg, unsigned long flags); - /* Free all resources held by a thread. */ extern void release_thread(struct task_struct *); diff --git a/arch/x86/kernel/entry_32.S b/arch/x86/kernel/entry_32.S index 623f28837476..ac1107346fc9 100644 --- a/arch/x86/kernel/entry_32.S +++ b/arch/x86/kernel/entry_32.S @@ -994,15 +994,20 @@ END(spurious_interrupt_bug) */ .popsection -ENTRY(kernel_thread_helper) - pushl $0 # fake return address for unwinder +ENTRY(ret_from_kernel_thread) CFI_STARTPROC - movl %edi,%eax - call *%esi + pushl_cfi %eax + call schedule_tail + GET_THREAD_INFO(%ebp) + popl_cfi %eax + pushl_cfi $0x0202 # Reset kernel eflags + popfl_cfi + movl PT_EBP(%esp),%eax + call *PT_EBX(%esp) call do_exit ud2 # padding for call trace CFI_ENDPROC -ENDPROC(kernel_thread_helper) +ENDPROC(ret_from_kernel_thread) #ifdef CONFIG_XEN /* Xen doesn't set %esp to be precisely what the normal sysenter diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S index 69babd8c834f..5526d17db676 100644 --- a/arch/x86/kernel/entry_64.S +++ b/arch/x86/kernel/entry_64.S @@ -450,7 +450,7 @@ ENTRY(ret_from_fork) RESTORE_REST testl $3, CS-ARGOFFSET(%rsp) # from kernel_thread? - jz retint_restore_args + jz 1f testl $_TIF_IA32, TI_flags(%rcx) # 32-bit compat task needs IRET jnz int_ret_from_sys_call @@ -458,6 +458,16 @@ ENTRY(ret_from_fork) RESTORE_TOP_OF_STACK %rdi, -ARGOFFSET jmp ret_from_sys_call # go to the SYSRET fastpath +1: + subq $REST_SKIP, %rsp # move the stack pointer back + CFI_ADJUST_CFA_OFFSET REST_SKIP + movq %rbp, %rdi + call *%rbx + # exit + mov %eax, %edi + call do_exit + ud2 # padding for call trace + CFI_ENDPROC END(ret_from_fork) @@ -1206,21 +1216,6 @@ bad_gs: jmp 2b .previous -ENTRY(kernel_thread_helper) - pushq $0 # fake return address - CFI_STARTPROC - /* - * Here we are in the child and the registers are set as they were - * at kernel_thread() invocation in the parent. - */ - call *%rsi - # exit - mov %eax, %edi - call do_exit - ud2 # padding for call trace - CFI_ENDPROC -END(kernel_thread_helper) - /* * execve(). This function needs to use IRET, not SYSRET, to set up all state properly. * diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c index 7162e9c1f598..6947ec968bf8 100644 --- a/arch/x86/kernel/process.c +++ b/arch/x86/kernel/process.c @@ -298,44 +298,6 @@ sys_clone(unsigned long clone_flags, unsigned long newsp, return do_fork(clone_flags, newsp, regs, 0, parent_tid, child_tid); } -/* - * This gets run with %si containing the - * function to call, and %di containing - * the "args". - */ -extern void kernel_thread_helper(void); - -/* - * Create a kernel thread - */ -int kernel_thread(int (*fn)(void *), void *arg, unsigned long flags) -{ - struct pt_regs regs; - - memset(®s, 0, sizeof(regs)); - - regs.si = (unsigned long) fn; - regs.di = (unsigned long) arg; - -#ifdef CONFIG_X86_32 - regs.ds = __USER_DS; - regs.es = __USER_DS; - regs.fs = __KERNEL_PERCPU; - regs.gs = __KERNEL_STACK_CANARY; -#else - regs.ss = __KERNEL_DS; -#endif - - regs.orig_ax = -1; - regs.ip = (unsigned long) kernel_thread_helper; - regs.cs = __KERNEL_CS | get_kernel_rpl(); - regs.flags = X86_EFLAGS_IF | X86_EFLAGS_BIT1; - - /* Ok, create the new process.. */ - return do_fork(flags | CLONE_VM | CLONE_UNTRACED, 0, ®s, 0, NULL, NULL); -} -EXPORT_SYMBOL(kernel_thread); - /* * sys_execve() executes a new program. */ diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c index 75fcad146def..c9939875d267 100644 --- a/arch/x86/kernel/process_32.c +++ b/arch/x86/kernel/process_32.c @@ -57,6 +57,7 @@ #include asmlinkage void ret_from_fork(void) __asm__("ret_from_fork"); +asmlinkage void ret_from_kernel_thread(void) __asm__("ret_from_kernel_thread"); /* * Return saved PC of a blocked thread. @@ -127,23 +128,39 @@ void release_thread(struct task_struct *dead_task) } int copy_thread(unsigned long clone_flags, unsigned long sp, - unsigned long unused, + unsigned long arg, struct task_struct *p, struct pt_regs *regs) { - struct pt_regs *childregs; + struct pt_regs *childregs = task_pt_regs(p); struct task_struct *tsk; int err; - childregs = task_pt_regs(p); + p->thread.sp = (unsigned long) childregs; + p->thread.sp0 = (unsigned long) (childregs+1); + + if (unlikely(!regs)) { + /* kernel thread */ + memset(childregs, 0, sizeof(struct pt_regs)); + p->thread.ip = (unsigned long) ret_from_kernel_thread; + task_user_gs(p) = __KERNEL_STACK_CANARY; + childregs->ds = __USER_DS; + childregs->es = __USER_DS; + childregs->fs = __KERNEL_PERCPU; + childregs->bx = sp; /* function */ + childregs->bp = arg; + childregs->orig_ax = -1; + childregs->cs = __KERNEL_CS | get_kernel_rpl(); + childregs->flags = X86_EFLAGS_IF | X86_EFLAGS_BIT1; + p->fpu_counter = 0; + p->thread.io_bitmap_ptr = NULL; + memset(p->thread.ptrace_bps, 0, sizeof(p->thread.ptrace_bps)); + return 0; + } *childregs = *regs; childregs->ax = 0; childregs->sp = sp; - p->thread.sp = (unsigned long) childregs; - p->thread.sp0 = (unsigned long) (childregs+1); - p->thread.ip = (unsigned long) ret_from_fork; - task_user_gs(p) = get_user_gs(regs); p->fpu_counter = 0; diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c index 0a980c9d7cb8..937f2af6f2d4 100644 --- a/arch/x86/kernel/process_64.c +++ b/arch/x86/kernel/process_64.c @@ -146,29 +146,18 @@ static inline u32 read_32bit_tls(struct task_struct *t, int tls) } int copy_thread(unsigned long clone_flags, unsigned long sp, - unsigned long unused, + unsigned long arg, struct task_struct *p, struct pt_regs *regs) { int err; struct pt_regs *childregs; struct task_struct *me = current; - childregs = ((struct pt_regs *) - (THREAD_SIZE + task_stack_page(p))) - 1; - *childregs = *regs; - - childregs->ax = 0; - if (user_mode(regs)) - childregs->sp = sp; - else - childregs->sp = (unsigned long)childregs; - + p->thread.sp0 = (unsigned long)task_stack_page(p) + THREAD_SIZE; + childregs = task_pt_regs(p); p->thread.sp = (unsigned long) childregs; - p->thread.sp0 = (unsigned long) (childregs+1); p->thread.usersp = me->thread.usersp; - set_tsk_thread_flag(p, TIF_FORK); - p->fpu_counter = 0; p->thread.io_bitmap_ptr = NULL; @@ -178,6 +167,24 @@ int copy_thread(unsigned long clone_flags, unsigned long sp, p->thread.fs = p->thread.fsindex ? 0 : me->thread.fs; savesegment(es, p->thread.es); savesegment(ds, p->thread.ds); + memset(p->thread.ptrace_bps, 0, sizeof(p->thread.ptrace_bps)); + + if (unlikely(!regs)) { + /* kernel thread */ + memset(childregs, 0, sizeof(struct pt_regs)); + childregs->sp = (unsigned long)childregs; + childregs->ss = __KERNEL_DS; + childregs->bx = sp; /* function */ + childregs->bp = arg; + childregs->orig_ax = -1; + childregs->cs = __KERNEL_CS | get_kernel_rpl(); + childregs->flags = X86_EFLAGS_IF | X86_EFLAGS_BIT1; + return 0; + } + *childregs = *regs; + + childregs->ax = 0; + childregs->sp = sp; err = -ENOMEM; memset(p->thread.ptrace_bps, 0, sizeof(p->thread.ptrace_bps)); -- cgit v1.2.3 From 6783eaa2e1253fbcbe2c2f6bb4c843abf1343caf Mon Sep 17 00:00:00 2001 From: Al Viro Date: Thu, 2 Aug 2012 23:05:11 +0400 Subject: x86, um/x86: switch to generic sys_execve and kernel_execve 32bit wrapper is lost on that; 64bit one is *not*, since we need to arrange for full pt_regs on stack when we call sys_execve() and we need to load callee-saved ones from there afterwards. Signed-off-by: Al Viro --- arch/um/kernel/exec.c | 25 ++------------------- arch/um/kernel/internal.h | 1 - arch/um/kernel/syscall.c | 17 --------------- arch/x86/ia32/ia32entry.S | 2 +- arch/x86/ia32/sys_ia32.c | 15 ------------- arch/x86/include/asm/sys_ia32.h | 2 -- arch/x86/include/asm/syscalls.h | 2 +- arch/x86/include/asm/unistd.h | 2 ++ arch/x86/kernel/Makefile | 2 +- arch/x86/kernel/asm-offsets.c | 3 +++ arch/x86/kernel/entry_32.S | 11 +++++++--- arch/x86/kernel/entry_64.S | 47 ++++++++++++---------------------------- arch/x86/kernel/process.c | 19 ---------------- arch/x86/kernel/process_32.c | 1 + arch/x86/kernel/sys_i386_32.c | 40 ---------------------------------- arch/x86/syscalls/syscall_32.tbl | 2 +- arch/x86/um/sys_call_table_32.c | 1 - 17 files changed, 34 insertions(+), 158 deletions(-) delete mode 100644 arch/um/kernel/internal.h delete mode 100644 arch/x86/kernel/sys_i386_32.c (limited to 'arch/x86/kernel/process.c') diff --git a/arch/um/kernel/exec.c b/arch/um/kernel/exec.c index 8c82786da823..e427301f55d6 100644 --- a/arch/um/kernel/exec.c +++ b/arch/um/kernel/exec.c @@ -16,7 +16,6 @@ #include "mem_user.h" #include "skas.h" #include "os.h" -#include "internal.h" void flush_thread(void) { @@ -49,27 +48,7 @@ void start_thread(struct pt_regs *regs, unsigned long eip, unsigned long esp) } EXPORT_SYMBOL(start_thread); -long um_execve(const char *file, const char __user *const __user *argv, const char __user *const __user *env) +void __noreturn ret_from_kernel_execve(struct pt_regs *unused) { - long err; - - err = do_execve(file, argv, env, ¤t->thread.regs); - if (!err) - UML_LONGJMP(current->thread.exec_buf, 1); - return err; -} - -long sys_execve(const char __user *file, const char __user *const __user *argv, - const char __user *const __user *env) -{ - long error; - char *filename; - - filename = getname(file); - error = PTR_ERR(filename); - if (IS_ERR(filename)) goto out; - error = do_execve(filename, argv, env, ¤t->thread.regs); - putname(filename); - out: - return error; + UML_LONGJMP(current->thread.exec_buf, 1); } diff --git a/arch/um/kernel/internal.h b/arch/um/kernel/internal.h deleted file mode 100644 index 5bf97db24a04..000000000000 --- a/arch/um/kernel/internal.h +++ /dev/null @@ -1 +0,0 @@ -extern long um_execve(const char *file, const char __user *const __user *argv, const char __user *const __user *env); diff --git a/arch/um/kernel/syscall.c b/arch/um/kernel/syscall.c index a4c6d8eee74c..a5639c472772 100644 --- a/arch/um/kernel/syscall.c +++ b/arch/um/kernel/syscall.c @@ -13,7 +13,6 @@ #include "asm/mman.h" #include "asm/uaccess.h" #include "asm/unistd.h" -#include "internal.h" long sys_fork(void) { @@ -50,19 +49,3 @@ long old_mmap(unsigned long addr, unsigned long len, out: return err; } - -int kernel_execve(const char *filename, - const char *const argv[], - const char *const envp[]) -{ - mm_segment_t fs; - int ret; - - fs = get_fs(); - set_fs(KERNEL_DS); - ret = um_execve(filename, (const char __user *const __user *)argv, - (const char __user *const __user *) envp); - set_fs(fs); - - return ret; -} diff --git a/arch/x86/ia32/ia32entry.S b/arch/x86/ia32/ia32entry.S index 20e5f7ba0e6b..e75f941bd2b2 100644 --- a/arch/x86/ia32/ia32entry.S +++ b/arch/x86/ia32/ia32entry.S @@ -459,7 +459,7 @@ GLOBAL(\label) PTREGSCALL stub32_rt_sigreturn, sys32_rt_sigreturn, %rdi PTREGSCALL stub32_sigreturn, sys32_sigreturn, %rdi PTREGSCALL stub32_sigaltstack, sys32_sigaltstack, %rdx - PTREGSCALL stub32_execve, sys32_execve, %rcx + PTREGSCALL stub32_execve, compat_sys_execve, %rcx PTREGSCALL stub32_fork, sys_fork, %rdi PTREGSCALL stub32_clone, sys32_clone, %rdx PTREGSCALL stub32_vfork, sys_vfork, %rdi diff --git a/arch/x86/ia32/sys_ia32.c b/arch/x86/ia32/sys_ia32.c index 4540bece0946..6b31144589d0 100644 --- a/arch/x86/ia32/sys_ia32.c +++ b/arch/x86/ia32/sys_ia32.c @@ -385,21 +385,6 @@ asmlinkage long sys32_sendfile(int out_fd, int in_fd, return ret; } -asmlinkage long sys32_execve(const char __user *name, compat_uptr_t __user *argv, - compat_uptr_t __user *envp, struct pt_regs *regs) -{ - long error; - char *filename; - - filename = getname(name); - error = PTR_ERR(filename); - if (IS_ERR(filename)) - return error; - error = compat_do_execve(filename, argv, envp, regs); - putname(filename); - return error; -} - asmlinkage long sys32_clone(unsigned int clone_flags, unsigned int newsp, struct pt_regs *regs) { diff --git a/arch/x86/include/asm/sys_ia32.h b/arch/x86/include/asm/sys_ia32.h index 3fda9db48819..1ac127f41fe6 100644 --- a/arch/x86/include/asm/sys_ia32.h +++ b/arch/x86/include/asm/sys_ia32.h @@ -54,8 +54,6 @@ asmlinkage long sys32_pwrite(unsigned int, const char __user *, u32, u32, u32); asmlinkage long sys32_personality(unsigned long); asmlinkage long sys32_sendfile(int, int, compat_off_t __user *, s32); -asmlinkage long sys32_execve(const char __user *, compat_uptr_t __user *, - compat_uptr_t __user *, struct pt_regs *); asmlinkage long sys32_clone(unsigned int, unsigned int, struct pt_regs *); long sys32_lseek(unsigned int, int, unsigned int); diff --git a/arch/x86/include/asm/syscalls.h b/arch/x86/include/asm/syscalls.h index f1d8b441fc77..2be0b880417e 100644 --- a/arch/x86/include/asm/syscalls.h +++ b/arch/x86/include/asm/syscalls.h @@ -25,7 +25,7 @@ int sys_fork(struct pt_regs *); int sys_vfork(struct pt_regs *); long sys_execve(const char __user *, const char __user *const __user *, - const char __user *const __user *, struct pt_regs *); + const char __user *const __user *); long sys_clone(unsigned long, unsigned long, void __user *, void __user *, struct pt_regs *); diff --git a/arch/x86/include/asm/unistd.h b/arch/x86/include/asm/unistd.h index 0d9776e9e2dc..55d155560fdf 100644 --- a/arch/x86/include/asm/unistd.h +++ b/arch/x86/include/asm/unistd.h @@ -50,6 +50,8 @@ # define __ARCH_WANT_SYS_TIME # define __ARCH_WANT_SYS_UTIME # define __ARCH_WANT_SYS_WAITPID +# define __ARCH_WANT_SYS_EXECVE +# define __ARCH_WANT_KERNEL_EXECVE /* * "Conditional" syscalls diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile index 8215e5652d97..566100002233 100644 --- a/arch/x86/kernel/Makefile +++ b/arch/x86/kernel/Makefile @@ -23,7 +23,7 @@ obj-y += time.o ioport.o ldt.o dumpstack.o nmi.o obj-y += setup.o x86_init.o i8259.o irqinit.o jump_label.o obj-$(CONFIG_IRQ_WORK) += irq_work.o obj-y += probe_roms.o -obj-$(CONFIG_X86_32) += sys_i386_32.o i386_ksyms_32.o +obj-$(CONFIG_X86_32) += i386_ksyms_32.o obj-$(CONFIG_X86_64) += sys_x86_64.o x8664_ksyms_64.o obj-y += syscall_$(BITS).o obj-$(CONFIG_X86_64) += vsyscall_64.o diff --git a/arch/x86/kernel/asm-offsets.c b/arch/x86/kernel/asm-offsets.c index 68de2dc962ec..28610822fb3c 100644 --- a/arch/x86/kernel/asm-offsets.c +++ b/arch/x86/kernel/asm-offsets.c @@ -69,4 +69,7 @@ void common(void) { OFFSET(BP_kernel_alignment, boot_params, hdr.kernel_alignment); OFFSET(BP_pref_address, boot_params, hdr.pref_address); OFFSET(BP_code32_start, boot_params, hdr.code32_start); + + BLANK(); + DEFINE(PTREGS_SIZE, sizeof(struct pt_regs)); } diff --git a/arch/x86/kernel/entry_32.S b/arch/x86/kernel/entry_32.S index ac1107346fc9..b6bb69239296 100644 --- a/arch/x86/kernel/entry_32.S +++ b/arch/x86/kernel/entry_32.S @@ -298,6 +298,13 @@ ENTRY(ret_from_fork) CFI_ENDPROC END(ret_from_fork) +ENTRY(ret_from_kernel_execve) + movl %eax, %esp + movl $0,PT_EAX(%esp) + GET_THREAD_INFO(%ebp) + jmp syscall_exit +END(ret_from_kernel_execve) + /* * Interrupt exit functions should be protected against kprobes */ @@ -322,8 +329,7 @@ ret_from_intr: andl $(X86_EFLAGS_VM | SEGMENT_RPL_MASK), %eax #else /* - * We can be coming here from a syscall done in the kernel space, - * e.g. a failed kernel_execve(). + * We can be coming here from child spawned by kernel_thread(). */ movl PT_CS(%esp), %eax andl $SEGMENT_RPL_MASK, %eax @@ -727,7 +733,6 @@ ENDPROC(ptregs_##name) PTREGSCALL1(iopl) PTREGSCALL0(fork) PTREGSCALL0(vfork) -PTREGSCALL3(execve) PTREGSCALL2(sigaltstack) PTREGSCALL0(sigreturn) PTREGSCALL0(rt_sigreturn) diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S index 5526d17db676..053c9552ffd9 100644 --- a/arch/x86/kernel/entry_64.S +++ b/arch/x86/kernel/entry_64.S @@ -767,7 +767,6 @@ ENTRY(stub_execve) PARTIAL_FRAME 0 SAVE_REST FIXUP_TOP_OF_STACK %r11 - movq %rsp, %rcx call sys_execve RESTORE_TOP_OF_STACK %r11 movq %rax,RAX(%rsp) @@ -817,8 +816,7 @@ ENTRY(stub_x32_execve) PARTIAL_FRAME 0 SAVE_REST FIXUP_TOP_OF_STACK %r11 - movq %rsp, %rcx - call sys32_execve + call compat_sys_execve RESTORE_TOP_OF_STACK %r11 movq %rax,RAX(%rsp) RESTORE_REST @@ -1216,36 +1214,19 @@ bad_gs: jmp 2b .previous -/* - * execve(). This function needs to use IRET, not SYSRET, to set up all state properly. - * - * C extern interface: - * extern long execve(const char *name, char **argv, char **envp) - * - * asm input arguments: - * rdi: name, rsi: argv, rdx: envp - * - * We want to fallback into: - * extern long sys_execve(const char *name, char **argv,char **envp, struct pt_regs *regs) - * - * do_sys_execve asm fallback arguments: - * rdi: name, rsi: argv, rdx: envp, rcx: fake frame on the stack - */ -ENTRY(kernel_execve) - CFI_STARTPROC - FAKE_STACK_FRAME $0 - SAVE_ALL - movq %rsp,%rcx - call sys_execve - movq %rax, RAX(%rsp) - RESTORE_REST - testq %rax,%rax - je int_ret_from_sys_call - RESTORE_ARGS - UNFAKE_STACK_FRAME - ret - CFI_ENDPROC -END(kernel_execve) +ENTRY(ret_from_kernel_execve) + movq %rdi, %rsp + movl $0, RAX(%rsp) + // RESTORE_REST + movq 0*8(%rsp), %r15 + movq 1*8(%rsp), %r14 + movq 2*8(%rsp), %r13 + movq 3*8(%rsp), %r12 + movq 4*8(%rsp), %rbp + movq 5*8(%rsp), %rbx + addq $(6*8), %rsp + jmp int_ret_from_sys_call +END(ret_from_kernel_execve) /* Call softirq on interrupt stack. Interrupts are off. */ ENTRY(call_softirq) diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c index 6947ec968bf8..eae2dd5cd5a0 100644 --- a/arch/x86/kernel/process.c +++ b/arch/x86/kernel/process.c @@ -298,25 +298,6 @@ sys_clone(unsigned long clone_flags, unsigned long newsp, return do_fork(clone_flags, newsp, regs, 0, parent_tid, child_tid); } -/* - * sys_execve() executes a new program. - */ -long sys_execve(const char __user *name, - const char __user *const __user *argv, - const char __user *const __user *envp, struct pt_regs *regs) -{ - long error; - char *filename; - - filename = getname(name); - error = PTR_ERR(filename); - if (IS_ERR(filename)) - return error; - error = do_execve(filename, argv, envp, regs); - putname(filename); - return error; -} - /* * Idle related variables and functions */ diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c index c9939875d267..25e7e9390d26 100644 --- a/arch/x86/kernel/process_32.c +++ b/arch/x86/kernel/process_32.c @@ -207,6 +207,7 @@ start_thread(struct pt_regs *regs, unsigned long new_ip, unsigned long new_sp) regs->cs = __USER_CS; regs->ip = new_ip; regs->sp = new_sp; + regs->flags = X86_EFLAGS_IF; /* * Free the old FP and other extended state */ diff --git a/arch/x86/kernel/sys_i386_32.c b/arch/x86/kernel/sys_i386_32.c deleted file mode 100644 index 0b0cb5fede19..000000000000 --- a/arch/x86/kernel/sys_i386_32.c +++ /dev/null @@ -1,40 +0,0 @@ -/* - * This file contains various random system calls that - * have a non-standard calling sequence on the Linux/i386 - * platform. - */ - -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include - -#include -#include - -#include - -/* - * Do a system call from kernel instead of calling sys_execve so we - * end up with proper pt_regs. - */ -int kernel_execve(const char *filename, - const char *const argv[], - const char *const envp[]) -{ - long __res; - asm volatile ("int $0x80" - : "=a" (__res) - : "0" (__NR_execve), "b" (filename), "c" (argv), "d" (envp) : "memory"); - return __res; -} diff --git a/arch/x86/syscalls/syscall_32.tbl b/arch/x86/syscalls/syscall_32.tbl index 7a35a6e71d44..a47103fbc692 100644 --- a/arch/x86/syscalls/syscall_32.tbl +++ b/arch/x86/syscalls/syscall_32.tbl @@ -17,7 +17,7 @@ 8 i386 creat sys_creat 9 i386 link sys_link 10 i386 unlink sys_unlink -11 i386 execve ptregs_execve stub32_execve +11 i386 execve sys_execve stub32_execve 12 i386 chdir sys_chdir 13 i386 time sys_time compat_sys_time 14 i386 mknod sys_mknod diff --git a/arch/x86/um/sys_call_table_32.c b/arch/x86/um/sys_call_table_32.c index b5408cecac6c..232e60504b3a 100644 --- a/arch/x86/um/sys_call_table_32.c +++ b/arch/x86/um/sys_call_table_32.c @@ -25,7 +25,6 @@ #define old_mmap sys_old_mmap #define ptregs_fork sys_fork -#define ptregs_execve sys_execve #define ptregs_iopl sys_iopl #define ptregs_vm86old sys_vm86old #define ptregs_clone i386_clone -- cgit v1.2.3