7 Hits in 0.43 sec

Illumination-invariant Face recognition by fusing thermal and visual images via gradient transfer [article]

Sumit Agarwal, Harshit S. Sikchi, Suparna Rooj, Shubhobrata Bhattacharya, Aurobinda Routray
2019 arXiv   pre-print
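$\|\nabla x - \nabla \mathrm{vi}\|_s^s$.  ...  Hence from Eq. 1 and Eq. 2, the fusion problem is formulated as minimization of the following objective function: $\varepsilon(x) = \varepsilon_1(x) + \lambda\,\varepsilon_2(x) = \frac{1}{r}\|x - \mathrm{ir}\|_r^r + \lambda\,\frac{1}{s}\|\nabla x - \nabla \mathrm{vi}\|_s^s$ (3) where the first term  ... 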
arXiv:1902.08802v1 fatcat:v6wtc7tmo5hajlqvx2basxdj7a
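
The objective in Eq. 3 of the snippet above can be evaluated numerically. The following is a minimal sketch, not the authors' implementation: the function names `gradient` and `fusion_objective`, the forward-difference gradient, the elementwise (anisotropic) reading of the norms, and the default values of `lam`, `r`, and `s` are all assumptions made for illustration.

```python
import numpy as np


def gradient(img):
    """Forward-difference image gradient, stacked as (2, H, W).

    Assumption: the last column / row has zero difference (no wrap-around).
    """
    img = np.asarray(img, dtype=float)
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    gx[:, :-1] = img[:, 1:] - img[:, :-1]   # horizontal differences
    gy[:-1, :] = img[1:, :] - img[:-1, :]   # vertical differences
    return np.stack([gx, gy])


def fusion_objective(x, ir, vi, lam=1.0, r=2, s=1):
    """Evaluate (1/r)*||x - ir||_r^r + lam*(1/s)*||grad x - grad vi||_s^s.

    x, ir, vi are 2-D arrays of equal shape (fused, thermal, visual images).
    lam, r, s are hypothetical defaults, not values taken from the paper.
    """
    x, ir, vi = (np.asarray(a, dtype=float) for a in (x, ir, vi))
    data_term = np.sum(np.abs(x - ir) ** r) / r
    grad_term = np.sum(np.abs(gradient(x) - gradient(vi)) ** s) / s
    return data_term + lam * grad_term
```

Under these assumptions, a fused image equal to both inputs gives an objective of zero; the first term grows as `x` drifts from the thermal intensities and the second as its gradients drift from those of the visual image.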

Lyapunov Barrier Policy Optimization [article]

Harshit Sikchi, Wenxuan Zhou, David Held
2021 arXiv   pre-print
a|s) − π B (a|s)) γ s P (s |s, a)L π B , (s ) ˆ (s) ≥ ( a (π(a|s) − π B (a|s)) γ s P (s |s, a)L π B , (s ) + d(s) +ˆ (s) ˆ (s)≥ a (π(a|s) − π B (a|s))Q Lπ B , (s, a) where Q Lπ B , (s, a) = d(s) +ˆ (s)  ...  ,ˆ (s, a) = c(s) +ˆ + γ s P (s |s, a)L π B ,ˆ (s ) = c(s) +ˆ + γ s P (s |s, a)[c(s ) +ˆ + s P π B (s |s ) (L π B ,ˆ (s ))] = ∞ t=0 γ tˆ + E ∞ t=0 γ t c(s t )|π B , a 0 = a, s 0 = s = ∞ t=0 γ tˆ + Q C  ... 
arXiv:2103.09230v1 fatcat:eypiinfoibfmfhz37ppjcghhda

Imitative Planning using Conditional Normalizing Flow [article]

Shubhankar Agarwal, Harshit Sikchi, Cole Gulino, Eric Wilkinson
2020 arXiv   pre-print
as b(s).  ...  transition function from state s to s′ when taking action a.  ... 
arXiv:2007.16162v2 fatcat:d5ifpjbgj5guvaoydlbpaxf3ny

Learning Off-Policy with Online Planning [article]

Harshit Sikchi, Wenxuan Zhou, David Held
2021 arXiv   pre-print
Let E s∼π D,t D T V (M (.|s, a)||M (.|s, a)) ≤˜ m ∀s and max D T V (π D (a|s)||π * H (a|s)), D T V (π D (a|s)||π * H (a|s)) ≤˜ π ∀s.  ...  Let the concentrability coefficient C be such that, ∀s, a, $\frac{\nu(s,a)}{d^{\pi_D}(s,a)} \le C$, where ν(s, a) is the state-action distribution induced by any non-stationary policy.  ... 
arXiv:2008.10066v5 fatcat:xf7huncicna67a6uanzvxatgcm

A Ranking Game for Imitation Learning [article]

Harshit Sikchi, Akanksha Saran, Wonjoon Goo, Scott Niekum
2022 arXiv   pre-print
a) [f π (s, a)] ≥ E ρ π (s,a) [f π (s, a)], ∀π.  ...  a) [R(s, a)] ≤ E ρ π 2 (s,a) [R(s, a)], ∀π 1 π 2 ∈ D.  ... 
arXiv:2202.03481v1 fatcat:5eue7bjio5baflnlxlbqfbn6km

f-IRL: Inverse Reinforcement Learning via State Marginal Matching [article]

Tianwei Ni, Harshit Sikchi, Yufei Wang, Tejus Gupta, Lisa Lee, Benjamin Eysenbach
2020 arXiv   pre-print
t )] − E st∼ρ θ,t [∇r θ (s t )] = T α (E s∼ρ E [∇r θ (s)] − E s∼ρ θ [∇r θ (s)]) = T α E s∼ρ θ ρ E (s) ρ θ (s) ∇r θ (s) − E s∼ρ θ [∇r θ (s)] (43) where ρ t (s) is state marginal at timestamp t, and ρ(s)  ...  = dρ θ (s) dr θ (s * ) dr θ (s * ) dθ ds * = 1 α p(τ )e T t=1 r θ (st)/α η τ (s)η τ (s * )dτ Z − T ρ θ (s)ρ θ (s * ) dr θ (s * ) dθ ds * = 1 αZ p(τ )e T t=1 r θ (st)/α η τ (s)η τ (s * ) dr θ (s * ) dθ  ... 
arXiv:2011.04709v2 fatcat:ei6lsqdoyzfxbgouxsgrmdgcim

SOPE: Spectrum of Off-Policy Estimators [article]

Christina J. Yuan, Yash Chandak, Stephen Giguere, Philip S. Thomas, Scott Niekum
2021 arXiv   pre-print
We especially thank Jordan Schneider, Harshit Sikchi, and Prasoon Goyal for reading and giving suggestions on early drafts.  ...  Assumption 1. For all $s \in S$ and $a \in A$, the ratio $\frac{\pi_e(a|s)}{\pi_b(a|s)} < \infty$.  ... 
arXiv:2111.03936v3 fatcat:2xnufhiq6jgf7pgmvpzb5ruhl4
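
Assumption 1 in the snippet above requires the ratio of evaluation-policy to behavior-policy probabilities to be finite for every state-action pair. The following is a minimal sketch of checking this for tabular policies. The function name `importance_ratios`, the tabular array layout, and the convention of treating 0/0 as 0 are assumptions made for illustration and do not come from the paper.

```python
import numpy as np


def importance_ratios(pi_e, pi_b):
    """Return pi_e(a|s) / pi_b(a|s) for tabular policies of shape (S, A).

    Raises ValueError when the ratio would be infinite, i.e. when the
    behavior policy assigns zero probability to an action that the
    evaluation policy can take. Assumed convention: 0/0 is treated as 0.
    """
    pi_e = np.asarray(pi_e, dtype=float)
    pi_b = np.asarray(pi_b, dtype=float)
    if pi_e.shape != pi_b.shape:
        raise ValueError("policies must have the same shape")
    if np.any((pi_b == 0) & (pi_e > 0)):
        raise ValueError("ratio is infinite for some (s, a)")
    ratios = np.zeros_like(pi_e)
    # Divide only where the behavior policy has support; elsewhere keep 0.
    np.divide(pi_e, pi_b, out=ratios, where=pi_b > 0)
    return ratios
```

For example, `importance_ratios([[0.5, 0.5]], [[0.25, 0.75]])` gives a ratio of 2 for the first action, while a behavior policy of `[[1.0, 0.0]]` against the same evaluation policy raises an error because the ratio is unbounded.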