Distorting an image using a height map?

I have a height map for an image, which tells me the offset of each pixel in the Z direction. My goal is to flatten a distorted image using only its height map.

How would I go about doing this? I know the position of the camera, if that helps.


To do this, I was thinking of treating each pixel as a point on a plane, and then translating each of those points vertically according to the Z-value I get from the height map. Viewed from above, that vertical shift makes the point appear to move around in the image plane.

From that projected shift, I could extract the X and Y shift of each pixel, which I could feed into cv.Remap().

But I have no idea how I could get the projected 3D offset of a point with OpenCV, let alone construct an offset map out of it.


Here are my reference images for what I'm doing:

[Calibration image] [Distorted image]

I know the angle of the lasers (45 degrees), and from the calibration images, I can calculate the height of the book really easily:

h(x) = sin(theta) * abs(calibration(x) - actual(x))
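A quick numeric check of this formula (the pixel values below are made up for illustration):

```python
import math

# Worked example of h(x) = sin(theta) * abs(calibration(x) - actual(x)).
# The laser angle is 45 degrees; calibration_x and actual_x are the
# laser-line pixel rows with and without the book (hypothetical values).
theta = math.radians(45)
calibration_x = 120.0  # laser row in the flat calibration image
actual_x = 160.0       # laser row on the book surface

h = math.sin(theta) * abs(calibration_x - actual_x)
print(h)  # about 28.28 pixels of height
```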

I do this for both lines and linearly interpolate the two lines to generate a surface using this approach (Python code. It's inside a loop):

height_grid[x][y] = heights_top[x] * (cv.GetSize(image)[1] - y) + heights_bottom[x] * y
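Note that as written, the two weights sum to the image height rather than to 1, so the interpolated surface is scaled by the height of the image; here is a NumPy sketch of the normalized, vectorized version (the array names and small sizes are my assumption, not the original code):

```python
import numpy as np

# Vectorized linear interpolation between the top and bottom laser lines.
# heights_top / heights_bottom are assumed to be 1-D arrays, one height
# per image column; W and H stand in for the image width and height.
W, H = 6, 4
heights_top = np.linspace(0.0, 5.0, W)
heights_bottom = np.linspace(2.0, 3.0, W)

y = np.arange(H)[:, None]        # column vector of row indices
w_bottom = y / float(H - 1)      # weight: 0 at the top row, 1 at the bottom
height_grid = heights_top * (1.0 - w_bottom) + heights_bottom * w_bottom
# height_grid has shape (H, W); its first row equals heights_top and
# its last row equals heights_bottom.
```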

I hope this helps ;)


Right now, this is what I have to dewarp the image. All that strange stuff in the middle projects a 3D coordinate onto the camera plane, given its position (and the camera's location, rotation, etc.):

class Point:
  def __init__(self, x = 0, y = 0, z = 0):
    self.x = x
    self.y = y
    self.z = z

mapX = cv.CreateMat(cv.GetSize(image)[1], cv.GetSize(image)[0], cv.CV_32FC1)
mapY = cv.CreateMat(cv.GetSize(image)[1], cv.GetSize(image)[0], cv.CV_32FC1)

c = Point(CAMERA_POSITION[0], CAMERA_POSITION[1], CAMERA_POSITION[2])
theta = Point(CAMERA_ROTATION[0], CAMERA_ROTATION[1], CAMERA_ROTATION[2])
d = Point()
e = Point(0, 0, CAMERA_POSITION[2] + SENSOR_OFFSET)

costx = cos(theta.x)
costy = cos(theta.y)
costz = cos(theta.z)

sintx = sin(theta.x)
sinty = sin(theta.y)
sintz = sin(theta.z)


for x in xrange(cv.GetSize(image)[0]):
  for y in xrange(cv.GetSize(image)[1]):

    a = Point(x, y, heights_top[x / 2] * (cv.GetSize(image)[1] - y) + heights_bottom[x / 2] * y)

    d.x = costy * (sintz * (a.y - c.y) + costz * (a.x - c.x)) - sinty * (a.z - c.z)
    d.y = sintx * (costy * (a.z - c.z) + sinty * (sintz * (a.y - c.y) + costz * (a.x - c.x))) + costx * (costz * (a.y - c.y) - sintz * (a.x - c.x))
    d.z = costx * (costy * (a.z - c.z) + sinty * (sintz * (a.y - c.y) + costz * (a.x - c.x))) - sintx * (costz * (a.y - c.y) - sintz * (a.x - c.x))

    mapX[y, x] = x + (d.x - e.x) * (e.z / d.z)
    mapY[y, x] = y + (d.y - e.y) * (e.z / d.z)


print
print 'Remapping original image using map...'

remapped = cv.CreateImage(cv.GetSize(image), 8, 3)
cv.Remap(image, remapped, mapX, mapY, cv.CV_INTER_LINEAR)

This is turning into a huge thread of images and code now... Anyway, this code chunk takes me about 7 minutes to run on an 18 MP camera image; that's way too long, and in the end, this approach does nothing to the image (the offset for each pixel is << 1).
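One way to attack the runtime: the per-pixel loop above can be vectorized with NumPy, which should cut minutes down to seconds. This is a sketch under my own assumptions, not the original code; the rotation is expressed as a 3x3 matrix R (equivalent to the expanded trig expressions in the loop), and c and e are the camera position and sensor point as before:

```python
import numpy as np

def build_remap(height_grid, c, e, R):
    """Build cv.Remap-style maps without a Python-level per-pixel loop.

    height_grid: (H, W) array of surface heights (the a.z values)
    c: camera position (cx, cy, cz); e: sensor point (ex, ey, ez)
    R: 3x3 rotation matrix for the camera orientation
    """
    H, W = height_grid.shape
    xs, ys = np.meshgrid(np.arange(W, dtype=np.float64),
                         np.arange(H, dtype=np.float64))
    # each pixel's 3D point, relative to the camera position
    p = np.stack([xs - c[0], ys - c[1], height_grid - c[2]], axis=-1)
    d = p @ R.T                       # rotate into camera coordinates
    scale = e[2] / d[..., 2]          # perspective divide onto the plane z = e.z
    map_x = (xs + (d[..., 0] - e[0]) * scale).astype(np.float32)
    map_y = (ys + (d[..., 1] - e[1]) * scale).astype(np.float32)
    return map_x, map_y
```

The resulting float32 maps can then be passed to the remap call in place of `mapX` and `mapY`.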

Any ideas?


I ended up implementing my own solution:

for x in xrange(cv.GetSize(image)[0]):
  for y in xrange(cv.GetSize(image)[1]):

    a = Point(x, y, heights_top[x / 2] * (cv.GetSize(image)[1] - y) + heights_bottom[x / 2] * y)

    d.x = costy * (sintz * (a.y - c.y) + costz * (a.x - c.x)) - sinty * (a.z - c.z)
    d.y = sintx * (costy * (a.z - c.z) + sinty * (sintz * (a.y - c.y) + costz * (a.x - c.x))) + costx * (costz * (a.y - c.y) - sintz * (a.x - c.x))
    d.z = costx * (costy * (a.z - c.z) + sinty * (sintz * (a.y - c.y) + costz * (a.x - c.x))) - sintx * (costz * (a.y - c.y) - sintz * (a.x - c.x))

    mapX[y, x] = x + 100.0 * (d.x - e.x) * (e.z / d.z)
    mapY[y, x] = y + 100.0 * (d.y - e.y) * (e.z / d.z)


print
print 'Remapping original image using map...'

remapped = cv.CreateImage(cv.GetSize(image), 8, 3)
cv.Remap(image, remapped, mapX, mapY, cv.CV_INTER_LINEAR)

This (slowly) remaps each pixel using the cv.Remap function, and this seems to kind of work...


Distortion based on distance from the camera only happens with a perspective projection. If you have the (x,y,z) position of a pixel, you can use the projection matrix of the camera to unproject the pixels back into world-space. With that information, you can render the pixels in an orthographic way. However, you may have missing data, due to the original perspective projection.
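To make the unprojection step concrete, here is a minimal sketch assuming a pinhole camera model; the intrinsic matrix K (focal lengths and principal point) is my assumption, not something given in the question:

```python
import numpy as np

# Unproject a screen pixel (u, v) with known depth z back to a 3D point
# in camera coordinates, using an assumed pinhole intrinsic matrix K.
def unproject(u, v, z, K):
    fx, fy = K[0, 0], K[1, 1]   # focal lengths in pixels
    cx, cy = K[0, 2], K[1, 2]   # principal point
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.array([x, y, z])
```

With the height map supplying z for each pixel, applying this per pixel recovers world-space positions that can then be re-rendered orthographically.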


Separate your scene out as follows:

  • you have an unknown bitmap image I (x,y) -> (r,g,b)
  • you have a known height field H (x,y) -> h
  • you have a camera transform C (x,y,z) -> (u,v) which projects the scene to a screen plane
  • Note that the camera transform throws information away (you do not get a depth value for each screen pixel). You may also have bits of scene overlap on screen, in which case only the foremost gets shown - the rest is discarded. So in general this is not perfectly reversible.

  • you have a screenshot S (u,v) which is a result of C (x,y, H (x,y)) for x,y in I
  • you want to generate a screenshot S' (u',v') which is a result of C (x,y,0) for x,y in I
  • There are two obvious ways to approach this; both depend on having accurate values for the camera transform.

  • Ray-casting: for each pixel in S , cast a ray back into the scene. Find out where it hits the heightfield; this gives you (x,y) in the original image I , and the screen pixel gives you the color at that point. Once you have as much of I as you can recover, re-transform it to find S' .

  • Double-rendering: for each x,y in I , project to find (u,v) and (u',v'). Take the pixel-color from S (u,v) and copy it to S' (u',v').

  • Both methods will have sampling problems, which can be helped by super-sampling or interpolation; method 1 will leave empty spaces in occluded areas of the image, while method 2 will 'project through' from the first surface.
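The double-rendering method can be sketched as follows; `project` stands in for any camera transform C, and the function signature is my assumption:

```python
import numpy as np

# Sketch of method 2 ('double-rendering'): for each source pixel, project
# it once with its height and once with height 0, then copy the color
# from the warped screenshot S to the flat result S'.
def double_render(S, height, project, out_shape):
    """S: screenshot; height: (H, W) heightfield over I's domain;
    project(x, y, z) -> (u, v) is the camera transform C."""
    Sp = np.zeros(out_shape, dtype=S.dtype)
    H, W = height.shape
    for y in range(H):
        for x in range(W):
            u, v = project(x, y, height[y, x])   # screen position with relief
            u0, v0 = project(x, y, 0.0)          # screen position when flat
            u, v = int(round(u)), int(round(v))
            u0, v0 = int(round(u0)), int(round(v0))
            if (0 <= v < S.shape[0] and 0 <= u < S.shape[1] and
                    0 <= v0 < out_shape[0] and 0 <= u0 < out_shape[1]):
                Sp[v0, u0] = S[v, u]
    return Sp
```

Rounding to the nearest pixel is the crudest resampling choice; as noted above, super-sampling or interpolation would reduce the holes and aliasing this produces.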

    Edit:

    I had presumed you meant a CG-style heightfield, where each pixel in S is directly above the corresponding location in S'; but this is not how a page drapes over a surface. A page is fixed at the spine and is non-stretchy - lifting the center of a page pulls the free edge toward the spine.

    Based on your sample image, you'll have to reverse this cumulative pulling - detect the spine centerline location and orientation and work progressively left and right, finding the change in height across the top and bottom of each vertical strip of page, calculating the resulting aspect-narrowing and skew, and reversing it to re-create the original flat page.
